All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/9] cxl: Add support for Restricted CXL hosts (RCD mode)
@ 2022-11-09 10:40 Robert Richter
  2022-11-09 10:40 ` [PATCH v3 1/9] cxl/acpi: Register CXL host ports by bridge device Robert Richter
                   ` (8 more replies)
  0 siblings, 9 replies; 52+ messages in thread
From: Robert Richter @ 2022-11-09 10:40 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky, Dan Williams
  Cc: linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang,
	Robert Richter

In Restricted CXL Device (RCD) mode (formerly referred to as CXL 1.1)
the PCIe enumeration hierarchy is different from CXL VH Enumeration
(formerly referred to as 2.0, for both modes see CXL spec 3.0: 9.11
and 9.12, [1]). This series adds support for RCD mode. It implements
the detection of Restricted CXL Hosts (RCHs) and its corresponding
Restricted CXL Devices (RCDs). It does the necessary enumeration of
ports and connects the endpoints. With all the plumbing an RCH/RCD
pair is registered at the Linux CXL bus and becomes visible in sysfs
in the same way as CXL VH hosts and devices do already. RCDs are
brought up as CXL endpoints and bound to subsequent drivers such as
cxl_mem.

For CXL VH the host driver (cxl_acpi) starts host bridge discovery
once the ACPI0017 CXL root device is detected and then searches for
ACPI0016 host bridges to enable CXL. This implementation requires the
ACPI device enumeration for RCD mode. It also expects the host's
downstream and upstream port RCRBs base address being reported by
firmware using the optional CEDT CHBS entry of the host bridge (see
CXL spec 3.0, 9.17.1.2).

RCD mode does not support hot-plug, so host discovery is at boot time
only.

This version bases on 2b76fc22aefd ("cxl/acpi: Improve debug messages
in cxl_acpi_probe()") containing the already accepted patches.

[1] https://www.computeexpresslink.org/spec-landing

---

v3:

 * Rebased onto 2b76fc22aefd ("cxl/acpi: Improve debug messages in
   cxl_acpi_probe()").
 * Added Co-developed-by tag. (Rafael)
 * Added public function cxl_rcrb_to_component() to regs.c for later
   reuse. Added arg to switch between upstream and downstream RCRB.
   (Dan, Dave)
 * Added patch to register CXL host ports by bridge device. Note the
   alias detection in ndctl (cxl list command) need to check both
   sysfs entries, firmware_node and physical_node. A rework is needed
   here. (Dan, Vishal)
 * Reworked implementation to skip intermediate port enumeration of
   restricted endpoints (RCDs). (Dan, Dave)
 * Added check to only register RCDs with device 0, function 0 as CXL
   memory.
 * Added Terry's patch to set ACPI's CXL _OSC to indicate CXL1.1
   support.

v2:
 * Reworked series to use add_host_bridge_dport() and
   add_host_bridge_uport(). There is a single cxl root device
   (ACPI0017) also for RCHs (must have a ACPI0016 id). (Dan)
 * Rebased onto 6.1-rc1.
 * Added a WARN_ON_ONCE() to CXL_RESOURCE_NONE check. Updated patch
   description with an example case. (Dan, Jonathan)
 * Added wrapper functions to devm_cxl_add_port() and
   devm_cxl_add_dport(). (Dan)
 * Dropped "PCI/ACPI: Link host bridge to its ACPI fw node" patch.
 * Updated spec refs to use 3.0. Added PCIe base spec refs. (Jonathan)
 * Reused UID detect code. (Dan)
 * Dropped "cxl/acpi: Specify module load order dependency for the
   cxl_acpi module" patch. Return -EINVAL if host not yet ready. (Dan)
 * Minor other changes.
 * Note: I haven't included most of the received Reviewed-by tags due
   to the major rework. In any case, thanks to all here.

Robert Richter (8):
  cxl/acpi: Register CXL host ports by bridge device
  cxl/acpi: Extract component registers of restricted hosts from RCRB
  cxl/mem: Adjust cxl_mem_find_port() to find an RCH's port
  cxl/mem: Skip intermediate port enumeration of restricted endpoints
    (RCDs)
  cxl/pci: Only register RCDs with device 0, function 0 as CXL memory
    device
  cxl/pci: Do not ignore PCI config read errors in match_add_dports()
  cxl/pci: Factor out code in match_add_dports() to pci_dev_add_dport()
  cxl/pci: Extend devm_cxl_port_enumerate_dports() to support restricted
    hosts (RCH)

Terry Bowman (1):
  cxl/acpi: Set ACPI's CXL _OSC to indicate CXL1.1 support

 drivers/acpi/pci_root.c |  1 +
 drivers/cxl/acpi.c      | 89 +++++++++++++++++++++++++++++------------
 drivers/cxl/core/pci.c  | 74 ++++++++++++++++++++++++++++------
 drivers/cxl/core/port.c | 56 +++++++++++++++++++++++++-
 drivers/cxl/core/regs.c | 46 +++++++++++++++++++++
 drivers/cxl/cxl.h       |  8 ++++
 drivers/cxl/pci.c       | 25 +++++++++++-
 7 files changed, 257 insertions(+), 42 deletions(-)


base-commit: 2b76fc22aefd39820c0520255875f99b326ede99
-- 
2.30.2


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 1/9] cxl/acpi: Register CXL host ports by bridge device
  2022-11-09 10:40 [PATCH v3 0/9] cxl: Add support for Restricted CXL hosts (RCD mode) Robert Richter
@ 2022-11-09 10:40 ` Robert Richter
  2022-11-09 23:11   ` Bjorn Helgaas
  2022-11-14 20:22   ` Dan Williams
  2022-11-09 10:40 ` [PATCH v3 2/9] cxl/acpi: Extract component registers of restricted hosts from RCRB Robert Richter
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 52+ messages in thread
From: Robert Richter @ 2022-11-09 10:40 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky, Dan Williams
  Cc: linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang,
	Robert Richter

A port of a CXL host bridge links to the bridge's acpi device
(&adev->dev) with its corresponding uport/dport device (uport_dev and
dport_dev respectively). The device is not a direct parent device in
the PCI topology as pdev->dev.parent points to a PCI bridge's (struct
pci_host_bridge) device. The following CXL memory device hierarchy
would be valid for an endpoint once an RCD EP would be enabled (note
this will be done in a later patch):

VH mode:

 cxlmd->dev.parent->parent
        ^^^\^^^^^^\ ^^^^^^\
            \      \       pci_dev (Type 1, Downstream Port)
             \      pci_dev (Type 0, PCI Express Endpoint)
              cxl mem device

RCD mode:

 cxlmd->dev.parent->parent
        ^^^\^^^^^^\ ^^^^^^\
            \      \       pci_host_bridge
             \      pci_dev (Type 0, RCiEP)
              cxl mem device

In VH mode a downstream port is created by port enumeration and thus
always exists.

Now, in RCD mode the host bridge also already exists but it references
to an ACPI device. A port lookup by the PCI device's parent device
will fail as a direct link to the registered port is missing. The ACPI
device of the bridge must be determined first.

To prevent this, change port registration of a CXL host to use the
bridge device instead. Do this also for the VH case as port topology
will better reflect the PCI topology then.

If a mock device is registered by a test driver, the bridge pointer
can be NULL. Keep using the matching ACPI device (&adev->dev) as a
fallback in this case.

Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/acpi.c | 48 ++++++++++++++++++++++++++++++----------------
 1 file changed, 31 insertions(+), 17 deletions(-)

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index fb9f72813067..06150c953f58 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -185,6 +185,17 @@ __mock struct acpi_device *to_cxl_host_bridge(struct device *host,
 	return NULL;
 }
 
+static inline struct acpi_pci_root *to_cxl_pci_root(struct device *host,
+						    struct device *match)
+{
+	struct acpi_device *adev = to_cxl_host_bridge(host, match);
+
+	if (!adev)
+		return NULL;
+
+	return acpi_pci_find_root(adev->handle);
+}
+
 /*
  * A host bridge is a dport to a CFMWS decode and it is a uport to the
  * dport (PCIe Root Ports) in the host bridge.
@@ -193,35 +204,35 @@ static int add_host_bridge_uport(struct device *match, void *arg)
 {
 	struct cxl_port *root_port = arg;
 	struct device *host = root_port->dev.parent;
-	struct acpi_device *bridge = to_cxl_host_bridge(host, match);
-	struct acpi_pci_root *pci_root;
+	struct acpi_pci_root *pci_root = to_cxl_pci_root(host, match);
 	struct cxl_dport *dport;
 	struct cxl_port *port;
+	struct device *bridge;
 	int rc;
 
-	if (!bridge)
+	if (!pci_root)
 		return 0;
 
-	dport = cxl_find_dport_by_dev(root_port, match);
+	/*
+	 * If it is a mock dev, the bridge can be NULL, use matching
+	 * device (&adev->dev) as a fallback then then.
+	 */
+	bridge = pci_root->bus->bridge ?: match;
+	dport = cxl_find_dport_by_dev(root_port, bridge);
 	if (!dport) {
 		dev_dbg(host, "host bridge expected and not found\n");
 		return 0;
 	}
 
-	/*
-	 * Note that this lookup already succeeded in
-	 * to_cxl_host_bridge(), so no need to check for failure here
-	 */
-	pci_root = acpi_pci_find_root(bridge->handle);
-	rc = devm_cxl_register_pci_bus(host, match, pci_root->bus);
+	rc = devm_cxl_register_pci_bus(host, bridge, pci_root->bus);
 	if (rc)
 		return rc;
 
-	port = devm_cxl_add_port(host, match, dport->component_reg_phys, dport);
+	port = devm_cxl_add_port(host, bridge, dport->component_reg_phys, dport);
 	if (IS_ERR(port))
 		return PTR_ERR(port);
 
-	dev_info(pci_root->bus->bridge, "host supports CXL\n");
+	dev_info(bridge, "host supports CXL\n");
 
 	return 0;
 }
@@ -258,13 +269,16 @@ static int add_host_bridge_dport(struct device *match, void *arg)
 	struct cxl_chbs_context ctx;
 	struct cxl_port *root_port = arg;
 	struct device *host = root_port->dev.parent;
-	struct acpi_device *bridge = to_cxl_host_bridge(host, match);
+	struct acpi_pci_root *pci_root = to_cxl_pci_root(host, match);
+	struct device *bridge;
+	acpi_handle handle;
 
-	if (!bridge)
+	if (!pci_root)
 		return 0;
 
-	status = acpi_evaluate_integer(bridge->handle, METHOD_NAME__UID, NULL,
-				       &uid);
+	bridge = pci_root->bus->bridge ?: match;
+	handle = pci_root->device->handle;
+	status = acpi_evaluate_integer(handle, METHOD_NAME__UID, NULL, &uid);
 	if (status != AE_OK) {
 		dev_err(match, "unable to retrieve _UID\n");
 		return -ENODEV;
@@ -285,7 +299,7 @@ static int add_host_bridge_dport(struct device *match, void *arg)
 
 	dev_dbg(match, "CHBCR found: 0x%08llx\n", (u64)ctx.chbcr);
 
-	dport = devm_cxl_add_dport(root_port, match, uid, ctx.chbcr);
+	dport = devm_cxl_add_dport(root_port, bridge, uid, ctx.chbcr);
 	if (IS_ERR(dport))
 		return PTR_ERR(dport);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 2/9] cxl/acpi: Extract component registers of restricted hosts from RCRB
  2022-11-09 10:40 [PATCH v3 0/9] cxl: Add support for Restricted CXL hosts (RCD mode) Robert Richter
  2022-11-09 10:40 ` [PATCH v3 1/9] cxl/acpi: Register CXL host ports by bridge device Robert Richter
@ 2022-11-09 10:40 ` Robert Richter
  2022-11-14 21:30   ` Dan Williams
  2022-11-09 10:40 ` [PATCH v3 3/9] cxl/mem: Adjust cxl_mem_find_port() to find an RCH's port Robert Richter
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 52+ messages in thread
From: Robert Richter @ 2022-11-09 10:40 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky, Dan Williams
  Cc: linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang,
	Robert Richter, Terry Bowman

A downstream port must be connected to a component register block.
For restricted hosts the base address is determined from the RCRB. The
RCRB is provided by the host's CEDT CHBS entry. Rework CEDT parser to
get the RCRB and add code to extract the component register block from
it.

RCRB's BAR[0..1] point to the component block containing CXL subsystem
component registers. MEMBAR extraction follows the PCI base spec here,
esp. 64 bit extraction and memory range alignment (6.0, 7.5.1.2.1).

Note: Right now the component register block is used for HDM decoder
capability only which is optional for RCDs. If unsupported by the RCD,
the HDM init will fail. It is future work to bypass it in this case.

Co-developed-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/acpi.c      | 43 +++++++++++++++++++++++++++++---------
 drivers/cxl/core/regs.c | 46 +++++++++++++++++++++++++++++++++++++++++
 drivers/cxl/cxl.h       |  8 +++++++
 3 files changed, 87 insertions(+), 10 deletions(-)

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 06150c953f58..caea42cf9522 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -9,6 +9,8 @@
 #include "cxlpci.h"
 #include "cxl.h"
 
+#define CXL_RCRB_SIZE	SZ_8K
+
 static unsigned long cfmws_to_decoder_flags(int restrictions)
 {
 	unsigned long flags = CXL_DECODER_F_ENABLE;
@@ -240,27 +242,46 @@ static int add_host_bridge_uport(struct device *match, void *arg)
 struct cxl_chbs_context {
 	struct device *dev;
 	unsigned long long uid;
-	resource_size_t chbcr;
+	struct acpi_cedt_chbs chbs;
 };
 
-static int cxl_get_chbcr(union acpi_subtable_headers *header, void *arg,
-			 const unsigned long end)
+static int cxl_get_chbs(union acpi_subtable_headers *header, void *arg,
+			const unsigned long end)
 {
 	struct cxl_chbs_context *ctx = arg;
 	struct acpi_cedt_chbs *chbs;
 
-	if (ctx->chbcr)
+	if (ctx->chbs.base)
 		return 0;
 
 	chbs = (struct acpi_cedt_chbs *) header;
 
 	if (ctx->uid != chbs->uid)
 		return 0;
-	ctx->chbcr = chbs->base;
+	ctx->chbs = *chbs;
 
 	return 0;
 }
 
+static resource_size_t cxl_get_chbcr(struct cxl_chbs_context *ctx)
+{
+	struct acpi_cedt_chbs *chbs = &ctx->chbs;
+
+	if (!chbs->base)
+		return CXL_RESOURCE_NONE;
+
+	if (chbs->cxl_version != ACPI_CEDT_CHBS_VERSION_CXL11)
+		return chbs->base;
+
+	if (chbs->length != CXL_RCRB_SIZE)
+		return CXL_RESOURCE_NONE;
+
+	dev_dbg(ctx->dev, "RCRB found for UID %lld: 0x%08llx\n",
+		ctx->uid, (u64)chbs->base);
+
+	return cxl_rcrb_to_component(ctx->dev, chbs->base, CXL_RCRB_DOWNSTREAM);
+}
+
 static int add_host_bridge_dport(struct device *match, void *arg)
 {
 	acpi_status status;
@@ -272,6 +293,7 @@ static int add_host_bridge_dport(struct device *match, void *arg)
 	struct acpi_pci_root *pci_root = to_cxl_pci_root(host, match);
 	struct device *bridge;
 	acpi_handle handle;
+	resource_size_t component_reg_phys;
 
 	if (!pci_root)
 		return 0;
@@ -287,19 +309,20 @@ static int add_host_bridge_dport(struct device *match, void *arg)
 	dev_dbg(match, "UID found: %lld\n", uid);
 
 	ctx = (struct cxl_chbs_context) {
-		.dev = host,
+		.dev = match,
 		.uid = uid,
 	};
-	acpi_table_parse_cedt(ACPI_CEDT_TYPE_CHBS, cxl_get_chbcr, &ctx);
+	acpi_table_parse_cedt(ACPI_CEDT_TYPE_CHBS, cxl_get_chbs, &ctx);
 
-	if (ctx.chbcr == 0) {
+	component_reg_phys = cxl_get_chbcr(&ctx);
+	if (component_reg_phys == CXL_RESOURCE_NONE) {
 		dev_warn(match, "No CHBS found for Host Bridge (UID %lld)\n", uid);
 		return 0;
 	}
 
-	dev_dbg(match, "CHBCR found: 0x%08llx\n", (u64)ctx.chbcr);
+	dev_dbg(match, "CHBCR found: 0x%08llx\n", (u64)component_reg_phys);
 
-	dport = devm_cxl_add_dport(root_port, bridge, uid, ctx.chbcr);
+	dport = devm_cxl_add_dport(root_port, bridge, uid, component_reg_phys);
 	if (IS_ERR(dport))
 		return PTR_ERR(dport);
 
diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
index ec178e69b18f..7a5bde81e949 100644
--- a/drivers/cxl/core/regs.c
+++ b/drivers/cxl/core/regs.c
@@ -307,3 +307,49 @@ int cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type,
 	return -ENODEV;
 }
 EXPORT_SYMBOL_NS_GPL(cxl_find_regblock, CXL);
+
+resource_size_t cxl_rcrb_to_component(struct device *dev,
+				      resource_size_t rcrb,
+				      enum cxl_rcrb which)
+{
+	resource_size_t component_reg_phys;
+	u32 bar0, bar1;
+	void *addr;
+
+	if (which == CXL_RCRB_UPSTREAM)
+		rcrb += SZ_4K;
+
+	/*
+	 * RCRB's BAR[0..1] point to component block containing CXL
+	 * subsystem component registers. MEMBAR extraction follows
+	 * the PCI Base spec here, esp. 64 bit extraction and memory
+	 * ranges alignment (6.0, 7.5.1.2.1).
+	 */
+	addr = ioremap(rcrb, PCI_BASE_ADDRESS_0 + SZ_8);
+	if (!addr) {
+		dev_err(dev, "Failed to map region %pr\n", addr);
+		return CXL_RESOURCE_NONE;
+	}
+
+	bar0 = readl(addr + PCI_BASE_ADDRESS_0);
+	bar1 = readl(addr + PCI_BASE_ADDRESS_1);
+	iounmap(addr);
+
+	/* sanity check */
+	if (bar0 & (PCI_BASE_ADDRESS_MEM_TYPE_1M | PCI_BASE_ADDRESS_SPACE_IO))
+		return CXL_RESOURCE_NONE;
+
+	component_reg_phys = bar0 & PCI_BASE_ADDRESS_MEM_MASK;
+	if (bar0 & PCI_BASE_ADDRESS_MEM_TYPE_64)
+		component_reg_phys |= ((u64)bar1) << 32;
+
+	if (!component_reg_phys)
+		return CXL_RESOURCE_NONE;
+
+	/* MEMBAR is block size (64k) aligned. */
+	if (!IS_ALIGNED(component_reg_phys, CXL_COMPONENT_REG_BLOCK_SIZE))
+		return CXL_RESOURCE_NONE;
+
+	return component_reg_phys;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_rcrb_to_component, CXL);
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index ac8998b627b5..d6b4fe68a821 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -219,6 +219,14 @@ enum cxl_regloc_type;
 int cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type,
 		      struct cxl_register_map *map);
 
+enum cxl_rcrb {
+	CXL_RCRB_DOWNSTREAM,
+	CXL_RCRB_UPSTREAM,
+};
+resource_size_t cxl_rcrb_to_component(struct device *dev,
+				      resource_size_t rcrb,
+				      enum cxl_rcrb which);
+
 #define CXL_RESOURCE_NONE ((resource_size_t) -1)
 #define CXL_TARGET_STRLEN 20
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 3/9] cxl/mem: Adjust cxl_mem_find_port() to find an RCH's port
  2022-11-09 10:40 [PATCH v3 0/9] cxl: Add support for Restricted CXL hosts (RCD mode) Robert Richter
  2022-11-09 10:40 ` [PATCH v3 1/9] cxl/acpi: Register CXL host ports by bridge device Robert Richter
  2022-11-09 10:40 ` [PATCH v3 2/9] cxl/acpi: Extract component registers of restricted hosts from RCRB Robert Richter
@ 2022-11-09 10:40 ` Robert Richter
  2022-11-14 23:45   ` Dan Williams
  2022-11-09 10:40 ` [PATCH v3 4/9] cxl/mem: Skip intermediate port enumeration of restricted endpoints (RCDs) Robert Richter
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 52+ messages in thread
From: Robert Richter @ 2022-11-09 10:40 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky, Dan Williams
  Cc: linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang,
	Robert Richter

The PCIe software view of an RCH and RCD is different to VH mode. An
RCD is paired with an RCH and shows up as RCiEP with a parent already
pointing to a PCI bridge (struct pci_host_bridge). In contrast, in VH
mode an PCI Express Endpoint is a PCI type 0 device with a PCI type 1
device as parent (struct pci_dev, most of the time a downstream switch
port, but could also be a root port). The following hierarchy applies
in VH mode:

 CXL memory device, cxl_memdev                               endpoint
 └──PCIe Endpoint (type 0), pci_dev                           |
    └──Downstream Port (type 1), pci_dev (Nth switch)        portN
       └──Upstream Port (type 1), pci_dev (Nth switch)        |
          :                                                   :
          └──Downstream Port (type 1), pci_dev (1st switch)  port1
             └──Upstream Port (type 1), pci_dev (1st switch)  |
                └──Root Port (type 1), pci_dev                |
                   └──PCI host bridge, pci_host_bridge       port0
                      :                                       |
                      :..ACPI0017, acpi_dev                  root

 (There can be zero or any other number of switches in between.)

An iterator through the grandparents takes us to the root port which
is registered as dport to the bridge. The next port an endpoint is
connected to can be determined by using the grandparent of the memory
device as a dport_dev in cxl_mem_find_port().

The same does not work in RCD mode where only an RCiEP is connected to
the host bridge:

 CXL memory device, cxl_memdev                               endpoint
 └──PCIe Endpoint (type 0), pci_dev                           |
    └──PCI host bridge, pci_host_bridge                      port0
       :                                                      |
       :..ACPI0017, acpi_dev                                 root

Here, an endpoint is directly connected to the host bridge without a
type 1 PCI device (root or downstream port) in between. To link the
endpoint to the correct port, the endpoint's PCI device (parent of the
memory device) must be taken as dport_dev arg in cxl_mem_find_port().

Change cxl_mem_find_port() to find an RCH's port.

Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/core/port.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 0431ed860d8e..d10c3580719b 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1354,6 +1354,14 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd,
 	return rc;
 }
 
+static inline bool is_cxl_restricted(struct cxl_memdev *cxlmd)
+{
+	struct device *parent = cxlmd->dev.parent;
+	if (!dev_is_pci(parent))
+		return false;
+	return pci_pcie_type(to_pci_dev(parent)) == PCI_EXP_TYPE_RC_END;
+}
+
 int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
 {
 	struct device *dev = &cxlmd->dev;
@@ -1433,9 +1441,39 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
 }
 EXPORT_SYMBOL_NS_GPL(devm_cxl_enumerate_ports, CXL);
 
+/*
+ * CXL memory device and port hierarchy:
+ *
+ * VH mode:
+ *
+ * CXL memory device, cxl_memdev                               endpoint
+ * └──PCIe Endpoint (type 0), pci_dev                           |
+ *    └──Downstream Port (type 1), pci_dev (Nth switch)        portN
+ *       └──Upstream Port (type 1), pci_dev (Nth switch)        |
+ *          :                                                   :
+ *          └──Downstream Port (type 1), pci_dev (1st switch)  port1
+ *             └──Upstream Port (type 1), pci_dev (1st switch)  |
+ *                └──Root Port (type 1), pci_dev                |
+ *                   └──PCI host bridge, pci_host_bridge       port0
+ *                      :                                       |
+ *                      :..ACPI0017, acpi_dev                  root
+ *
+ * (There can be zero or any other number of switches in between.)
+ *
+ * RCD mode:
+ *
+ * CXL memory device, cxl_memdev                               endpoint
+ * └──PCIe Endpoint (type 0), pci_dev                           |
+ *    └──PCI host bridge, pci_host_bridge                      port0
+ *       :                                                      |
+ *       :..ACPI0017, acpi_dev                                 root
+ */
 struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd,
 				   struct cxl_dport **dport)
 {
+	if (is_cxl_restricted(cxlmd))
+		return find_cxl_port(cxlmd->dev.parent, dport);
+
 	return find_cxl_port(grandparent(&cxlmd->dev), dport);
 }
 EXPORT_SYMBOL_NS_GPL(cxl_mem_find_port, CXL);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 4/9] cxl/mem: Skip intermediate port enumeration of restricted endpoints (RCDs)
  2022-11-09 10:40 [PATCH v3 0/9] cxl: Add support for Restricted CXL hosts (RCD mode) Robert Richter
                   ` (2 preceding siblings ...)
  2022-11-09 10:40 ` [PATCH v3 3/9] cxl/mem: Adjust cxl_mem_find_port() to find an RCH's port Robert Richter
@ 2022-11-09 10:40 ` Robert Richter
  2022-11-09 16:55   ` Dave Jiang
                     ` (2 more replies)
  2022-11-09 10:40 ` [PATCH v3 5/9] cxl/pci: Only register RCDs with device 0, function 0 as CXL memory device Robert Richter
                   ` (4 subsequent siblings)
  8 siblings, 3 replies; 52+ messages in thread
From: Robert Richter @ 2022-11-09 10:40 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky, Dan Williams
  Cc: linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang,
	Robert Richter

When an endpoint is found, all ports in beetween the endpoint and the
CXL host bridge need to be created. In the RCH case there are no ports
in between a host bridge and the endpoint. Skip the enumeration of
intermediate ports.

The port enumeration does not only create all ports, it also
initializes the endpoint chain by adding the endpoint to every
downstream port up to the root bridge. This must be done also in RCD
mode, but is much more simple as the endpoint only needs to be added
to the host bridge's dport.

Note: For endpoint removal the cxl_detach_ep() is not needed as it is
released in cxl_port_release().

Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/core/port.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index d10c3580719b..e21a9c3fe4da 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1366,8 +1366,24 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
 {
 	struct device *dev = &cxlmd->dev;
 	struct device *iter;
+	struct cxl_dport *dport;
+	struct cxl_port *port;
 	int rc;
 
+	/*
+	 * Skip intermediate port enumeration in the RCH case, there
+	 * are no ports in between a host bridge and an endpoint. Only
+	 * initialize the EP chain.
+	 */
+	if (is_cxl_restricted(cxlmd)) {
+		port = cxl_mem_find_port(cxlmd, &dport);
+		if (!port)
+			return -ENXIO;
+		rc = cxl_add_ep(dport, &cxlmd->dev);
+		put_device(&port->dev);
+		return rc;
+	}
+
 	rc = devm_add_action_or_reset(&cxlmd->dev, cxl_detach_ep, cxlmd);
 	if (rc)
 		return rc;
@@ -1381,8 +1397,6 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
 	for (iter = dev; iter; iter = grandparent(iter)) {
 		struct device *dport_dev = grandparent(iter);
 		struct device *uport_dev;
-		struct cxl_dport *dport;
-		struct cxl_port *port;
 
 		if (!dport_dev)
 			return 0;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 5/9] cxl/pci: Only register RCDs with device 0, function 0 as CXL memory device
  2022-11-09 10:40 [PATCH v3 0/9] cxl: Add support for Restricted CXL hosts (RCD mode) Robert Richter
                   ` (3 preceding siblings ...)
  2022-11-09 10:40 ` [PATCH v3 4/9] cxl/mem: Skip intermediate port enumeration of restricted endpoints (RCDs) Robert Richter
@ 2022-11-09 10:40 ` Robert Richter
  2022-11-16 19:24   ` Dan Williams
  2022-11-09 10:40 ` [PATCH v3 6/9] cxl/pci: Do not ignore PCI config read errors in match_add_dports() Robert Richter
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 52+ messages in thread
From: Robert Richter @ 2022-11-09 10:40 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky, Dan Williams
  Cc: linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang,
	Robert Richter

The Device 0, Function 0 DVSEC controls the CXL functionality of the
entire device. Add a check to prevent registration of any other PCI
device on the bus as a CXL memory device.

Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/pci.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index faeb5d9d7a7a..cc4f206f24b3 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -428,11 +428,26 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
 	}
 }
 
+static int check_restricted_device(struct pci_dev *pdev, u16 pcie_dvsec)
+{
+	if (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_END)
+		return 0;		/* no RCD */
+
+	if (pdev->devfn == PCI_DEVFN(0, 0) && pcie_dvsec)
+		return 0;		/* ok */
+
+	dev_warn(&pdev->dev, "Skipping RCD: devfn=0x%02x dvsec=%u\n",
+		pdev->devfn, pcie_dvsec);
+
+	return -ENODEV;
+}
+
 static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	struct cxl_register_map map;
 	struct cxl_memdev *cxlmd;
 	struct cxl_dev_state *cxlds;
+	u16 pcie_dvsec;
 	int rc;
 
 	/*
@@ -442,6 +457,13 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	BUILD_BUG_ON(offsetof(struct cxl_regs, memdev) !=
 		     offsetof(struct cxl_regs, device_regs.memdev));
 
+	pcie_dvsec = pci_find_dvsec_capability(
+		pdev, PCI_DVSEC_VENDOR_ID_CXL, CXL_DVSEC_PCIE_DEVICE);
+
+	rc = check_restricted_device(pdev, pcie_dvsec);
+	if (rc)
+		return rc;
+
 	rc = pcim_enable_device(pdev);
 	if (rc)
 		return rc;
@@ -451,8 +473,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		return PTR_ERR(cxlds);
 
 	cxlds->serial = pci_get_dsn(pdev);
-	cxlds->cxl_dvsec = pci_find_dvsec_capability(
-		pdev, PCI_DVSEC_VENDOR_ID_CXL, CXL_DVSEC_PCIE_DEVICE);
+	cxlds->cxl_dvsec = pcie_dvsec;
 	if (!cxlds->cxl_dvsec)
 		dev_warn(&pdev->dev,
 			 "Device DVSEC not present, skip CXL.mem init\n");
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 6/9] cxl/pci: Do not ignore PCI config read errors in match_add_dports()
  2022-11-09 10:40 [PATCH v3 0/9] cxl: Add support for Restricted CXL hosts (RCD mode) Robert Richter
                   ` (4 preceding siblings ...)
  2022-11-09 10:40 ` [PATCH v3 5/9] cxl/pci: Only register RCDs with device 0, function 0 as CXL memory device Robert Richter
@ 2022-11-09 10:40 ` Robert Richter
  2022-11-09 23:09   ` Bjorn Helgaas
  2022-11-16 19:36   ` Dan Williams
  2022-11-09 10:40 ` [PATCH v3 7/9] cxl/pci: Factor out code in match_add_dports() to pci_dev_add_dport() Robert Richter
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 52+ messages in thread
From: Robert Richter @ 2022-11-09 10:40 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky, Dan Williams
  Cc: linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang,
	Robert Richter

The link capabilities of a PCI device are read when enumerating its
dports. This is done by reading the PCI config space. If that fails
port enumeration ignores that error. However, reading the PCI config
space should reliably work.

To reduce some complexity to the code flow when factoring out parts of
the code in match_add_dports() for later reuse, change this to throw
an error.

Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/core/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 0dbbe8d39b07..8271b8abde7a 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -47,7 +47,7 @@ static int match_add_dports(struct pci_dev *pdev, void *data)
 		return 0;
 	if (pci_read_config_dword(pdev, pci_pcie_cap(pdev) + PCI_EXP_LNKCAP,
 				  &lnkcap))
-		return 0;
+		return -ENXIO;
 
 	rc = cxl_find_regblock(pdev, CXL_REGLOC_RBI_COMPONENT, &map);
 	if (rc)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 7/9] cxl/pci: Factor out code in match_add_dports() to pci_dev_add_dport()
  2022-11-09 10:40 [PATCH v3 0/9] cxl: Add support for Restricted CXL hosts (RCD mode) Robert Richter
                   ` (5 preceding siblings ...)
  2022-11-09 10:40 ` [PATCH v3 6/9] cxl/pci: Do not ignore PCI config read errors in match_add_dports() Robert Richter
@ 2022-11-09 10:40 ` Robert Richter
  2022-11-16 19:37   ` Dan Williams
  2022-11-09 10:40 ` [PATCH v3 8/9] cxl/pci: Extend devm_cxl_port_enumerate_dports() to support restricted hosts (RCH) Robert Richter
  2022-11-09 10:40 ` [PATCH v3 9/9] cxl/acpi: Set ACPI's CXL _OSC to indicate CXL1.1 support Robert Richter
  8 siblings, 1 reply; 52+ messages in thread
From: Robert Richter @ 2022-11-09 10:40 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky, Dan Williams
  Cc: linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang,
	Robert Richter

Factor out the code to register a PCI device's dport to a port. It
will be reused to implement RCD mode.

Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/core/pci.c | 37 ++++++++++++++++++++++++++-----------
 1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 8271b8abde7a..667de4f125f6 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -29,14 +29,32 @@ struct cxl_walk_context {
 	int count;
 };
 
+static int pci_dev_add_dport(struct pci_dev *pdev, struct cxl_port *port,
+			      resource_size_t component_reg_phys)
+{
+	struct cxl_dport *dport;
+	u32 lnkcap, port_num;
+
+	if (pci_read_config_dword(pdev, pci_pcie_cap(pdev) + PCI_EXP_LNKCAP,
+				  &lnkcap))
+		return -ENXIO;
+
+	port_num = FIELD_GET(PCI_EXP_LNKCAP_PN, lnkcap);
+	dport = devm_cxl_add_dport(port, &pdev->dev, port_num,
+				   component_reg_phys);
+	if (IS_ERR(dport))
+		return PTR_ERR(dport);
+
+	return 0;
+}
+
 static int match_add_dports(struct pci_dev *pdev, void *data)
 {
 	struct cxl_walk_context *ctx = data;
 	struct cxl_port *port = ctx->port;
 	int type = pci_pcie_type(pdev);
 	struct cxl_register_map map;
-	struct cxl_dport *dport;
-	u32 lnkcap, port_num;
+	resource_size_t component_reg_phys;
 	int rc;
 
 	if (pdev->bus != ctx->bus)
@@ -45,21 +63,18 @@ static int match_add_dports(struct pci_dev *pdev, void *data)
 		return 0;
 	if (type != ctx->type)
 		return 0;
-	if (pci_read_config_dword(pdev, pci_pcie_cap(pdev) + PCI_EXP_LNKCAP,
-				  &lnkcap))
-		return -ENXIO;
 
 	rc = cxl_find_regblock(pdev, CXL_REGLOC_RBI_COMPONENT, &map);
 	if (rc)
 		dev_dbg(&port->dev, "failed to find component registers\n");
 
-	port_num = FIELD_GET(PCI_EXP_LNKCAP_PN, lnkcap);
-	dport = devm_cxl_add_dport(port, &pdev->dev, port_num,
-				   cxl_regmap_to_base(pdev, &map));
-	if (IS_ERR(dport)) {
-		ctx->error = PTR_ERR(dport);
-		return PTR_ERR(dport);
+	component_reg_phys = cxl_regmap_to_base(pdev, &map);
+	rc = pci_dev_add_dport(pdev, port, component_reg_phys);
+	if (rc) {
+		ctx->error = rc;
+		return rc;
 	}
+
 	ctx->count++;
 
 	return 0;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 8/9] cxl/pci: Extend devm_cxl_port_enumerate_dports() to support restricted hosts (RCH)
  2022-11-09 10:40 [PATCH v3 0/9] cxl: Add support for Restricted CXL hosts (RCD mode) Robert Richter
                   ` (6 preceding siblings ...)
  2022-11-09 10:40 ` [PATCH v3 7/9] cxl/pci: Factor out code in match_add_dports() to pci_dev_add_dport() Robert Richter
@ 2022-11-09 10:40 ` Robert Richter
  2022-11-11 11:59   ` Robert Richter
  2022-11-16 20:55   ` Dan Williams
  2022-11-09 10:40 ` [PATCH v3 9/9] cxl/acpi: Set ACPI's CXL _OSC to indicate CXL1.1 support Robert Richter
  8 siblings, 2 replies; 52+ messages in thread
From: Robert Richter @ 2022-11-09 10:40 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky, Dan Williams
  Cc: linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang,
	Robert Richter

The PCIe Software View of an RCH and RCD is different to VH mode. An
RCD is paired with an RCH and shows up as RCiEP. Its downstream and
upstream ports are hidden to the PCI hierarchy. This different PCI
topology requires a different handling of RCHs.

Extend devm_cxl_port_enumerate_dports() to support restricted hosts
(RCH). If an RCH is detected, register its port as dport to the
device. An RCH is found if the host's dev 0 func 0 devices is an RCiEP
with an existing PCIe DVSEC for CXL Devices (ID 0000h).

Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/core/pci.c | 37 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 667de4f125f6..a6b1a1501db3 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -48,6 +48,37 @@ static int pci_dev_add_dport(struct pci_dev *pdev, struct cxl_port *port,
 	return 0;
 }
 
+static int restricted_host_enumerate_dport(struct cxl_port *port,
+					   struct pci_bus *bus)
+{
+	struct pci_dev *pdev;
+	bool is_restricted_host;
+	int rc;
+
+	/* Check CXL DVSEC of dev 0 func 0 */
+	pdev = pci_get_slot(bus, PCI_DEVFN(0, 0));
+
+	is_restricted_host = pdev
+		&& (pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_END)
+		&& pci_find_dvsec_capability(pdev,
+					PCI_DVSEC_VENDOR_ID_CXL,
+					CXL_DVSEC_PCIE_DEVICE);
+	if (is_restricted_host)
+		rc = pci_dev_add_dport(pdev, port, CXL_RESOURCE_NONE);
+
+	pci_dev_put(pdev);
+
+	if (!is_restricted_host)
+		return 0;
+
+	dev_dbg(bus->bridge, "CXL restricted host found\n");
+
+	if (rc)
+		return rc;
+
+	return 1;
+}
+
 static int match_add_dports(struct pci_dev *pdev, void *data)
 {
 	struct cxl_walk_context *ctx = data;
@@ -91,11 +122,15 @@ int devm_cxl_port_enumerate_dports(struct cxl_port *port)
 {
 	struct pci_bus *bus = cxl_port_to_pci_bus(port);
 	struct cxl_walk_context ctx;
-	int type;
+	int type, count;
 
 	if (!bus)
 		return -ENXIO;
 
+	count = restricted_host_enumerate_dport(port, bus);
+	if (count)
+		return count;
+
 	if (pci_is_root_bus(bus))
 		type = PCI_EXP_TYPE_ROOT_PORT;
 	else
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 9/9] cxl/acpi: Set ACPI's CXL _OSC to indicate CXL1.1 support
  2022-11-09 10:40 [PATCH v3 0/9] cxl: Add support for Restricted CXL hosts (RCD mode) Robert Richter
                   ` (7 preceding siblings ...)
  2022-11-09 10:40 ` [PATCH v3 8/9] cxl/pci: Extend devm_cxl_port_enumerate_dports() to support restricted hosts (RCH) Robert Richter
@ 2022-11-09 10:40 ` Robert Richter
  2022-11-09 12:22   ` Rafael J. Wysocki
  2022-11-09 23:35   ` Bjorn Helgaas
  8 siblings, 2 replies; 52+ messages in thread
From: Robert Richter @ 2022-11-09 10:40 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	Dan Williams, Bjorn Helgaas, Rafael J. Wysocki
  Cc: linux-cxl, linux-kernel, Len Brown, Jonathan Cameron,
	Davidlohr Bueso, Dave Jiang, Robert Richter, Terry Bowman,
	linux-pci, linux-acpi

From: Terry Bowman <terry.bowman@amd.com>

ACPI includes a CXL _OSC support procedure to communicate the available
CXL support to FW. The CXL support _OSC includes a field to indicate
CXL1.1 RCH RCD support. The OS sets this bit to 1 if it supports access
to RCD and RCH Port registers.[1] FW can potentially change it's operation
depending on the _OSC support setting reported by the OS.

The ACPI driver does not currently set the ACPI _OSC support to indicate
CXL1.1 RCD RCH support. Change the capability reported to include CXL1.1.

[1] CXL3.0 Table 9-26 'Interpretation of CXL _OSC Support Field'

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/acpi/pci_root.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index c8385ef54c37..094a59b216ae 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -492,6 +492,7 @@ static u32 calculate_cxl_support(void)
 	u32 support;
 
 	support = OSC_CXL_2_0_PORT_DEV_REG_ACCESS_SUPPORT;
+	support |= OSC_CXL_1_1_PORT_REG_ACCESS_SUPPORT;
 	if (pci_aer_available())
 		support |= OSC_CXL_PROTOCOL_ERR_REPORTING_SUPPORT;
 	if (IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE))
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 9/9] cxl/acpi: Set ACPI's CXL _OSC to indicate CXL1.1 support
  2022-11-09 10:40 ` [PATCH v3 9/9] cxl/acpi: Set ACPI's CXL _OSC to indicate CXL1.1 support Robert Richter
@ 2022-11-09 12:22   ` Rafael J. Wysocki
  2022-11-09 23:35   ` Bjorn Helgaas
  1 sibling, 0 replies; 52+ messages in thread
From: Rafael J. Wysocki @ 2022-11-09 12:22 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	Dan Williams, Bjorn Helgaas, Rafael J. Wysocki, linux-cxl,
	linux-kernel, Len Brown, Jonathan Cameron, Davidlohr Bueso,
	Dave Jiang, Terry Bowman, linux-pci, linux-acpi

On Wed, Nov 9, 2022 at 11:41 AM Robert Richter <rrichter@amd.com> wrote:
>
> From: Terry Bowman <terry.bowman@amd.com>
>
> ACPI includes a CXL _OSC support procedure to communicate the available
> CXL support to FW. The CXL support _OSC includes a field to indicate
> CXL1.1 RCH RCD support. The OS sets this bit to 1 if it supports access
> to RCD and RCH Port registers.[1] FW can potentially change it's operation
> depending on the _OSC support setting reported by the OS.
>
> The ACPI driver does not currently set the ACPI _OSC support to indicate
> CXL1.1 RCD RCH support. Change the capability reported to include CXL1.1.
>
> [1] CXL3.0 Table 9-26 'Interpretation of CXL _OSC Support Field'
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Signed-off-by: Robert Richter <rrichter@amd.com>

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
>  drivers/acpi/pci_root.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
> index c8385ef54c37..094a59b216ae 100644
> --- a/drivers/acpi/pci_root.c
> +++ b/drivers/acpi/pci_root.c
> @@ -492,6 +492,7 @@ static u32 calculate_cxl_support(void)
>         u32 support;
>
>         support = OSC_CXL_2_0_PORT_DEV_REG_ACCESS_SUPPORT;
> +       support |= OSC_CXL_1_1_PORT_REG_ACCESS_SUPPORT;
>         if (pci_aer_available())
>                 support |= OSC_CXL_PROTOCOL_ERR_REPORTING_SUPPORT;
>         if (IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE))
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 4/9] cxl/mem: Skip intermediate port enumeration of restricted endpoints (RCDs)
  2022-11-09 10:40 ` [PATCH v3 4/9] cxl/mem: Skip intermediate port enumeration of restricted endpoints (RCDs) Robert Richter
@ 2022-11-09 16:55   ` Dave Jiang
  2022-11-15  0:07   ` Dan Williams
  2022-11-15  0:24   ` Dan Williams
  2 siblings, 0 replies; 52+ messages in thread
From: Dave Jiang @ 2022-11-09 16:55 UTC (permalink / raw)
  To: Robert Richter, Alison Schofield, Vishal Verma, Ira Weiny,
	Ben Widawsky, Dan Williams
  Cc: linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso



On 11/9/2022 2:40 AM, Robert Richter wrote:
> When an endpoint is found, all ports in beetween the endpoint and the
s/beetween/between/

DJ
> CXL host bridge need to be created. In the RCH case there are no ports
> in between a host bridge and the endpoint. Skip the enumeration of
> intermediate ports.
> 
> The port enumeration does not only create all ports, it also
> initializes the endpoint chain by adding the endpoint to every
> downstream port up to the root bridge. This must be done also in RCD
> mode, but is much more simple as the endpoint only needs to be added
> to the host bridge's dport.
> 
> Note: For endpoint removal the cxl_detach_ep() is not needed as it is
> released in cxl_port_release().
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
>   drivers/cxl/core/port.c | 18 ++++++++++++++++--
>   1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index d10c3580719b..e21a9c3fe4da 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1366,8 +1366,24 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
>   {
>   	struct device *dev = &cxlmd->dev;
>   	struct device *iter;
> +	struct cxl_dport *dport;
> +	struct cxl_port *port;
>   	int rc;
>   
> +	/*
> +	 * Skip intermediate port enumeration in the RCH case, there
> +	 * are no ports in between a host bridge and an endpoint. Only
> +	 * initialize the EP chain.
> +	 */
> +	if (is_cxl_restricted(cxlmd)) {
> +		port = cxl_mem_find_port(cxlmd, &dport);
> +		if (!port)
> +			return -ENXIO;
> +		rc = cxl_add_ep(dport, &cxlmd->dev);
> +		put_device(&port->dev);
> +		return rc;
> +	}
> +
>   	rc = devm_add_action_or_reset(&cxlmd->dev, cxl_detach_ep, cxlmd);
>   	if (rc)
>   		return rc;
> @@ -1381,8 +1397,6 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
>   	for (iter = dev; iter; iter = grandparent(iter)) {
>   		struct device *dport_dev = grandparent(iter);
>   		struct device *uport_dev;
> -		struct cxl_dport *dport;
> -		struct cxl_port *port;
>   
>   		if (!dport_dev)
>   			return 0;

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 6/9] cxl/pci: Do not ignore PCI config read errors in match_add_dports()
  2022-11-09 10:40 ` [PATCH v3 6/9] cxl/pci: Do not ignore PCI config read errors in match_add_dports() Robert Richter
@ 2022-11-09 23:09   ` Bjorn Helgaas
  2022-11-11 11:56     ` Robert Richter
  2022-11-16 19:36   ` Dan Williams
  1 sibling, 1 reply; 52+ messages in thread
From: Bjorn Helgaas @ 2022-11-09 23:09 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	Dan Williams, linux-cxl, linux-kernel, Bjorn Helgaas,
	Rafael J. Wysocki, Len Brown, Jonathan Cameron, Davidlohr Bueso,
	Dave Jiang

On Wed, Nov 09, 2022 at 11:40:56AM +0100, Robert Richter wrote:
> The link capabilities of a PCI device are read when enumerating its
> dports. This is done by reading the PCI config space. If that fails
> port enumeration ignores that error. However, reading the PCI config
> space should reliably work.
> 
> To reduce some complexity to the code flow when factoring out parts of
> the code in match_add_dports() for later reuse, change this to throw
> an error.
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
>  drivers/cxl/core/pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 0dbbe8d39b07..8271b8abde7a 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -47,7 +47,7 @@ static int match_add_dports(struct pci_dev *pdev, void *data)
>  		return 0;
>  	if (pci_read_config_dword(pdev, pci_pcie_cap(pdev) + PCI_EXP_LNKCAP,
>  				  &lnkcap))

You didn't change this, but I recommend using
pcie_capability_read_dword() when reading the PCIe Capability.  It
takes care of some annoying corner cases like devices that don't
implement Link Cap and the different versions of the PCIe Capability.

> -		return 0;
> +		return -ENXIO;
>  
>  	rc = cxl_find_regblock(pdev, CXL_REGLOC_RBI_COMPONENT, &map);
>  	if (rc)
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 1/9] cxl/acpi: Register CXL host ports by bridge device
  2022-11-09 10:40 ` [PATCH v3 1/9] cxl/acpi: Register CXL host ports by bridge device Robert Richter
@ 2022-11-09 23:11   ` Bjorn Helgaas
  2022-11-14 20:22   ` Dan Williams
  1 sibling, 0 replies; 52+ messages in thread
From: Bjorn Helgaas @ 2022-11-09 23:11 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	Dan Williams, linux-cxl, linux-kernel, Bjorn Helgaas,
	Rafael J. Wysocki, Len Brown, Jonathan Cameron, Davidlohr Bueso,
	Dave Jiang

On Wed, Nov 09, 2022 at 11:40:51AM +0100, Robert Richter wrote:
> A port of a CXL host bridge links to the bridge's acpi device

s/acpi/ACPI/ to match usage below.

> (&adev->dev) with its corresponding uport/dport device (uport_dev and
> dport_dev respectively). The device is not a direct parent device in
> the PCI topology as pdev->dev.parent points to a PCI bridge's (struct
> pci_host_bridge) device. The following CXL memory device hierarchy
> would be valid for an endpoint once an RCD EP would be enabled (note
> this will be done in a later patch):
> 
> VH mode:
> 
>  cxlmd->dev.parent->parent
>         ^^^\^^^^^^\ ^^^^^^\
>             \      \       pci_dev (Type 1, Downstream Port)
>              \      pci_dev (Type 0, PCI Express Endpoint)
>               cxl mem device
> 
> RCD mode:
> 
>  cxlmd->dev.parent->parent
>         ^^^\^^^^^^\ ^^^^^^\
>             \      \       pci_host_bridge
>              \      pci_dev (Type 0, RCiEP)
>               cxl mem device
> 
> In VH mode a downstream port is created by port enumeration and thus
> always exists.
> 
> Now, in RCD mode the host bridge also already exists but it references
> to an ACPI device. A port lookup by the PCI device's parent device
> will fail as a direct link to the registered port is missing. The ACPI
> device of the bridge must be determined first.
> 
> To prevent this, change port registration of a CXL host to use the
> bridge device instead. Do this also for the VH case as port topology
> will better reflect the PCI topology then.
> 
> If a mock device is registered by a test driver, the bridge pointer
> can be NULL. Keep using the matching ACPI device (&adev->dev) as a
> fallback in this case.
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 9/9] cxl/acpi: Set ACPI's CXL _OSC to indicate CXL1.1 support
  2022-11-09 10:40 ` [PATCH v3 9/9] cxl/acpi: Set ACPI's CXL _OSC to indicate CXL1.1 support Robert Richter
  2022-11-09 12:22   ` Rafael J. Wysocki
@ 2022-11-09 23:35   ` Bjorn Helgaas
  2022-11-10  0:51     ` Verma, Vishal L
  2022-11-10 19:43     ` Terry Bowman
  1 sibling, 2 replies; 52+ messages in thread
From: Bjorn Helgaas @ 2022-11-09 23:35 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	Dan Williams, Bjorn Helgaas, Rafael J. Wysocki, linux-cxl,
	linux-kernel, Len Brown, Jonathan Cameron, Davidlohr Bueso,
	Dave Jiang, Terry Bowman, linux-pci, linux-acpi

On Wed, Nov 09, 2022 at 11:40:59AM +0100, Robert Richter wrote:
> From: Terry Bowman <terry.bowman@amd.com>
> 
> ACPI includes a CXL _OSC support procedure to communicate the available
> CXL support to FW. The CXL support _OSC includes a field to indicate
> CXL1.1 RCH RCD support. The OS sets this bit to 1 if it supports access
> to RCD and RCH Port registers.[1] FW can potentially change it's operation

s/it's/its/

> depending on the _OSC support setting reported by the OS.
> 
> The ACPI driver does not currently set the ACPI _OSC support to indicate
> CXL1.1 RCD RCH support. Change the capability reported to include CXL1.1.

Eight instances of "support" above seems like it might be more than
necessary.

I don't know the history, but OSC_CXL_1_1_PORT_REG_ACCESS_SUPPORT and
OSC_CXL_2_0_PORT_DEV_REG_ACCESS_SUPPORT seem like sort of weird names
since they don't match the spec at all ("RCD and RCH Port Register
Access Supported" and "CXL VH Register Access Supported").

> [1] CXL3.0 Table 9-26 'Interpretation of CXL _OSC Support Field'
> 
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
>  drivers/acpi/pci_root.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
> index c8385ef54c37..094a59b216ae 100644
> --- a/drivers/acpi/pci_root.c
> +++ b/drivers/acpi/pci_root.c
> @@ -492,6 +492,7 @@ static u32 calculate_cxl_support(void)
>  	u32 support;
>  
>  	support = OSC_CXL_2_0_PORT_DEV_REG_ACCESS_SUPPORT;
> +	support |= OSC_CXL_1_1_PORT_REG_ACCESS_SUPPORT;
>  	if (pci_aer_available())
>  		support |= OSC_CXL_PROTOCOL_ERR_REPORTING_SUPPORT;
>  	if (IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE))
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 9/9] cxl/acpi: Set ACPI's CXL _OSC to indicate CXL1.1 support
  2022-11-09 23:35   ` Bjorn Helgaas
@ 2022-11-10  0:51     ` Verma, Vishal L
  2022-11-10 17:10       ` Bjorn Helgaas
  2022-11-10 19:43     ` Terry Bowman
  1 sibling, 1 reply; 52+ messages in thread
From: Verma, Vishal L @ 2022-11-10  0:51 UTC (permalink / raw)
  To: helgaas, rrichter
  Cc: terry.bowman, Jiang, Dave, rafael, Schofield, Alison, linux-cxl,
	lenb, linux-kernel, Williams, Dan J, Weiny, Ira, bwidawsk,
	Jonathan.Cameron, bhelgaas, dave, linux-acpi, linux-pci

On Wed, 2022-11-09 at 17:35 -0600, Bjorn Helgaas wrote:
> On Wed, Nov 09, 2022 at 11:40:59AM +0100, Robert Richter wrote:
> > From: Terry Bowman <terry.bowman@amd.com>
> > 
> > ACPI includes a CXL _OSC support procedure to communicate the available
> > CXL support to FW. The CXL support _OSC includes a field to indicate
> > CXL1.1 RCH RCD support. The OS sets this bit to 1 if it supports access
> > to RCD and RCH Port registers.[1] FW can potentially change it's operation
> 
> s/it's/its/
> 
> > depending on the _OSC support setting reported by the OS.
> > 
> > The ACPI driver does not currently set the ACPI _OSC support to indicate
> > CXL1.1 RCD RCH support. Change the capability reported to include CXL1.1.
> 
> Eight instances of "support" above seems like it might be more than
> necessary.
> 
> I don't know the history, but OSC_CXL_1_1_PORT_REG_ACCESS_SUPPORT and
> OSC_CXL_2_0_PORT_DEV_REG_ACCESS_SUPPORT seem like sort of weird names
> since they don't match the spec at all ("RCD and RCH Port Register
> Access Supported" and "CXL VH Register Access Supported").

Ah the RCH/RCD and VH terminology was only introduced in the CXL-3.0
spec. When the above defines were added, the spec was at 2.0, and it
used the descriptions: "CXL 1.1 Port Register Access supported", and
"CXL 2.0 Port/Device Register Access supported" (Table 217 in 2.0).

> 
> > [1] CXL3.0 Table 9-26 'Interpretation of CXL _OSC Support Field'
> > 
> > Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> > Signed-off-by: Robert Richter <rrichter@amd.com>
> > ---
> >  drivers/acpi/pci_root.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
> > index c8385ef54c37..094a59b216ae 100644
> > --- a/drivers/acpi/pci_root.c
> > +++ b/drivers/acpi/pci_root.c
> > @@ -492,6 +492,7 @@ static u32 calculate_cxl_support(void)
> >         u32 support;
> >  
> >         support = OSC_CXL_2_0_PORT_DEV_REG_ACCESS_SUPPORT;
> > +       support |= OSC_CXL_1_1_PORT_REG_ACCESS_SUPPORT;
> >         if (pci_aer_available())
> >                 support |= OSC_CXL_PROTOCOL_ERR_REPORTING_SUPPORT;
> >         if (IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE))
> > -- 
> > 2.30.2
> > 


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 9/9] cxl/acpi: Set ACPI's CXL _OSC to indicate CXL1.1 support
  2022-11-10  0:51     ` Verma, Vishal L
@ 2022-11-10 17:10       ` Bjorn Helgaas
  0 siblings, 0 replies; 52+ messages in thread
From: Bjorn Helgaas @ 2022-11-10 17:10 UTC (permalink / raw)
  To: Verma, Vishal L
  Cc: rrichter, terry.bowman, Jiang, Dave, rafael, Schofield, Alison,
	linux-cxl, lenb, linux-kernel, Williams, Dan J, Weiny, Ira,
	bwidawsk, Jonathan.Cameron, bhelgaas, dave, linux-acpi,
	linux-pci

On Thu, Nov 10, 2022 at 12:51:02AM +0000, Verma, Vishal L wrote:
> On Wed, 2022-11-09 at 17:35 -0600, Bjorn Helgaas wrote:

> > I don't know the history, but OSC_CXL_1_1_PORT_REG_ACCESS_SUPPORT and
> > OSC_CXL_2_0_PORT_DEV_REG_ACCESS_SUPPORT seem like sort of weird names
> > since they don't match the spec at all ("RCD and RCH Port Register
> > Access Supported" and "CXL VH Register Access Supported").
> 
> Ah the RCH/RCD and VH terminology was only introduced in the CXL-3.0
> spec. When the above defines were added, the spec was at 2.0, and it
> used the descriptions: "CXL 1.1 Port Register Access supported", and
> "CXL 2.0 Port/Device Register Access supported" (Table 217 in 2.0).

Haha, that's annoying :)  I didn't dig back through the old versions.
I guess CXL folks can decide whether to keep the old names or update.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 9/9] cxl/acpi: Set ACPI's CXL _OSC to indicate CXL1.1 support
  2022-11-09 23:35   ` Bjorn Helgaas
  2022-11-10  0:51     ` Verma, Vishal L
@ 2022-11-10 19:43     ` Terry Bowman
  1 sibling, 0 replies; 52+ messages in thread
From: Terry Bowman @ 2022-11-10 19:43 UTC (permalink / raw)
  To: Bjorn Helgaas, Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	Dan Williams, Bjorn Helgaas, Rafael J. Wysocki, linux-cxl,
	linux-kernel, Len Brown, Jonathan Cameron, Davidlohr Bueso,
	Dave Jiang, linux-pci, linux-acpi



On 11/9/22 17:35, Bjorn Helgaas wrote:
> On Wed, Nov 09, 2022 at 11:40:59AM +0100, Robert Richter wrote:
>> From: Terry Bowman <terry.bowman@amd.com>
>>
>> ACPI includes a CXL _OSC support procedure to communicate the available
>> CXL support to FW. The CXL support _OSC includes a field to indicate
>> CXL1.1 RCH RCD support. The OS sets this bit to 1 if it supports access
>> to RCD and RCH Port registers.[1] FW can potentially change it's operation
> 
> s/it's/its/
> 

Ok, will fix.

>> depending on the _OSC support setting reported by the OS.
>>
>> The ACPI driver does not currently set the ACPI _OSC support to indicate
>> CXL1.1 RCD RCH support. Change the capability reported to include CXL1.1.
> 
> Eight instances of "support" above seems like it might be more than
> necessary.
> 
> I don't know the history, but OSC_CXL_1_1_PORT_REG_ACCESS_SUPPORT and
> OSC_CXL_2_0_PORT_DEV_REG_ACCESS_SUPPORT seem like sort of weird names
> since they don't match the spec at all ("RCD and RCH Port Register
> Access Supported" and "CXL VH Register Access Supported").
> 

I'll reword to not excessively reuse 'support'. Thanks for point out.

Regards,
Terry

>> [1] CXL3.0 Table 9-26 'Interpretation of CXL _OSC Support Field'
>>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>> Signed-off-by: Robert Richter <rrichter@amd.com>
>> ---
>>  drivers/acpi/pci_root.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
>> index c8385ef54c37..094a59b216ae 100644
>> --- a/drivers/acpi/pci_root.c
>> +++ b/drivers/acpi/pci_root.c
>> @@ -492,6 +492,7 @@ static u32 calculate_cxl_support(void)
>>  	u32 support;
>>  
>>  	support = OSC_CXL_2_0_PORT_DEV_REG_ACCESS_SUPPORT;
>> +	support |= OSC_CXL_1_1_PORT_REG_ACCESS_SUPPORT;
>>  	if (pci_aer_available())
>>  		support |= OSC_CXL_PROTOCOL_ERR_REPORTING_SUPPORT;
>>  	if (IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE))
>> -- 
>> 2.30.2
>>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 6/9] cxl/pci: Do not ignore PCI config read errors in match_add_dports()
  2022-11-09 23:09   ` Bjorn Helgaas
@ 2022-11-11 11:56     ` Robert Richter
  2022-11-11 12:07       ` Robert Richter
  0 siblings, 1 reply; 52+ messages in thread
From: Robert Richter @ 2022-11-11 11:56 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	Dan Williams, linux-cxl, linux-kernel, Bjorn Helgaas,
	Rafael J. Wysocki, Len Brown, Jonathan Cameron, Davidlohr Bueso,
	Dave Jiang

Bjorn,

thank you for your review, I will address all your comments you made
also for other patches.

On 09.11.22 17:09:56, Bjorn Helgaas wrote:
> On Wed, Nov 09, 2022 at 11:40:56AM +0100, Robert Richter wrote:

> > @@ -47,7 +47,7 @@ static int match_add_dports(struct pci_dev *pdev, void *data)
> >  		return 0;
> >  	if (pci_read_config_dword(pdev, pci_pcie_cap(pdev) + PCI_EXP_LNKCAP,
> >  				  &lnkcap))
> 
> You didn't change this, but I recommend using
> pcie_capability_read_dword() when reading the PCIe Capability.  It
> takes care of some annoying corner cases like devices that don't
> implement Link Cap and the different versions of the PCIe Capability.

This is a good hint, I looked into pcie_capability_read_dword() and
found the function is checking the pci_pcie_type(). For CXL VH it is
ok as the type is an endpoint, but for the RCD case it is an RCiEP.

Two issues arise here, there are those options:

1) Device implements CXL UP RCRB (3.0, 8.2.1.2)

The link capability must be read from PCIe caps of the UP RCRB instead
of the RCiEP. The implementation of pci_dev_add_dport() and
restricted_host_enumerate_dport() in a later patch of this series
("cxl/pci: Extend devm_cxl_port_enumerate_dports() to support
restricted hosts (RCH)") is not sufficient and must be changed to use
RCRB to determine the port id if it is an RCD.

2) Device does not implement CXL UP RCRB (3.0, 9.11.8)

The RCRB reads all FFs and CXL DVSEC 7 is accessible through the
device's config space now to avoid register remappings. Since there is
no RCD UP type 0 config space any longer, I would assume the UP's PCIe
caps with the link cap would be also made available through the
endpoint, but now it is an RCiEP. This violates the PCIe base spec
which does not allow to implement the link caps (PCIe base 6.0, 7.5.3
PCI Express Capability Structure, Link Capabilities). This makes sense
as an RCiEP is not connected to a root or downstream port and thus
does not have an upstream port. Strictly looking into the wording of
the PCI base spec, Link caps are not required for RCiEP and thus is
not prohibitedi and could be implemented optionnally. But CXL 3.0 spec
is more explicit here: "[The Link Capabilities] for an RCiEP [...]
shall not be implemented by the Device 0, Function 0" (CXL 3.0,
8.2.1.2).

I think, the spec's intention in 9.11.8 is to reduce remappings and
the device should just pass through the Link caps as there is a hidden
upstream port the RCiEP connected to it. Anyway, a CXL spec
clarification is needed here and pcie_capability_read_dword() needs to
be adjusted then for the RCiEP/RCD case.

Which raises another question to extend struct pci_dev the following:

#ifdef CXL_PCI
	u16	cxl_dev_cap;	/* CXL DVSEC 3, 8.1.3.1 DVSEC CXL Capability (Offset 0Ah)*/
	u16	cxl_port_cap;	/* CXL DVSEC 7, 8.2.1.3.1 DVSEC Flex Bus Port Capability (Offset 0Ah) */
#endif

Note: At least one cap is mandatory for all kind of CXL devices, see
CXL 3.0, Table 8-2.

There could be a helper then for a CXL check:

static inline bool dev_is_cxl(struct pci_dev *dev)
{
	return dev->cxl_dev_cap || dev->cxl_port_cap;
}

Thanks,

-Robert

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 8/9] cxl/pci: Extend devm_cxl_port_enumerate_dports() to support restricted hosts (RCH)
  2022-11-09 10:40 ` [PATCH v3 8/9] cxl/pci: Extend devm_cxl_port_enumerate_dports() to support restricted hosts (RCH) Robert Richter
@ 2022-11-11 11:59   ` Robert Richter
  2022-11-16 20:55   ` Dan Williams
  1 sibling, 0 replies; 52+ messages in thread
From: Robert Richter @ 2022-11-11 11:59 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky, Dan Williams
  Cc: linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang

On 09.11.22 11:40:58, Robert Richter wrote:

> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 667de4f125f6..a6b1a1501db3 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -48,6 +48,37 @@ static int pci_dev_add_dport(struct pci_dev *pdev, struct cxl_port *port,
>  	return 0;
>  }
>  
> +static int restricted_host_enumerate_dport(struct cxl_port *port,
> +					   struct pci_bus *bus)
> +{
> +	struct pci_dev *pdev;
> +	bool is_restricted_host;
> +	int rc;
> +
> +	/* Check CXL DVSEC of dev 0 func 0 */
> +	pdev = pci_get_slot(bus, PCI_DEVFN(0, 0));
> +
> +	is_restricted_host = pdev
> +		&& (pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_END)
> +		&& pci_find_dvsec_capability(pdev,
> +					PCI_DVSEC_VENDOR_ID_CXL,
> +					CXL_DVSEC_PCIE_DEVICE);
> +	if (is_restricted_host)
> +		rc = pci_dev_add_dport(pdev, port, CXL_RESOURCE_NONE);

See my comment in patch #6. This reads the port id from RCiEP's PCIe
cap, but instead the RCD UP RCRB should be used for this.
pci_dev_add_dport() needs to be updated to handle that.

-Robert

> +
> +	pci_dev_put(pdev);
> +
> +	if (!is_restricted_host)
> +		return 0;
> +
> +	dev_dbg(bus->bridge, "CXL restricted host found\n");
> +
> +	if (rc)
> +		return rc;
> +
> +	return 1;
> +}

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 6/9] cxl/pci: Do not ignore PCI config read errors in match_add_dports()
  2022-11-11 11:56     ` Robert Richter
@ 2022-11-11 12:07       ` Robert Richter
  0 siblings, 0 replies; 52+ messages in thread
From: Robert Richter @ 2022-11-11 12:07 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	Dan Williams, linux-cxl, linux-kernel, Bjorn Helgaas,
	Rafael J. Wysocki, Len Brown, Jonathan Cameron, Davidlohr Bueso,
	Dave Jiang

On 11.11.22 12:56:56, Robert Richter wrote:
> Which raises another question to extend struct pci_dev the following:
> 
> #ifdef CXL_PCI
> 	u16	cxl_dev_cap;	/* CXL DVSEC 3, 8.1.3.1 DVSEC CXL Capability (Offset 0Ah)*/

Correct is CXL DVSEC 0 here.

> 	u16	cxl_port_cap;	/* CXL DVSEC 7, 8.2.1.3.1 DVSEC Flex Bus Port Capability (Offset 0Ah) */
> #endif
> 
> Note: At least one cap is mandatory for all kind of CXL devices, see
> CXL 3.0, Table 8-2.

-Robert

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [PATCH v3 1/9] cxl/acpi: Register CXL host ports by bridge device
  2022-11-09 10:40 ` [PATCH v3 1/9] cxl/acpi: Register CXL host ports by bridge device Robert Richter
  2022-11-09 23:11   ` Bjorn Helgaas
@ 2022-11-14 20:22   ` Dan Williams
  2022-11-15 10:37     ` Robert Richter
  1 sibling, 1 reply; 52+ messages in thread
From: Dan Williams @ 2022-11-14 20:22 UTC (permalink / raw)
  To: Robert Richter, Alison Schofield, Vishal Verma, Ira Weiny,
	Ben Widawsky, Dan Williams
  Cc: linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang,
	Robert Richter

Robert Richter wrote:
> A port of a CXL host bridge links to the bridge's acpi device
> (&adev->dev) with its corresponding uport/dport device (uport_dev and
> dport_dev respectively). The device is not a direct parent device in
> the PCI topology as pdev->dev.parent points to a PCI bridge's (struct
> pci_host_bridge) device. The following CXL memory device hierarchy
> would be valid for an endpoint once an RCD EP would be enabled (note
> this will be done in a later patch):
> 
> VH mode:
> 
>  cxlmd->dev.parent->parent
>         ^^^\^^^^^^\ ^^^^^^\
>             \      \       pci_dev (Type 1, Downstream Port)
>              \      pci_dev (Type 0, PCI Express Endpoint)
>               cxl mem device
> 
> RCD mode:
> 
>  cxlmd->dev.parent->parent
>         ^^^\^^^^^^\ ^^^^^^\
>             \      \       pci_host_bridge
>              \      pci_dev (Type 0, RCiEP)
>               cxl mem device
> 
> In VH mode a downstream port is created by port enumeration and thus
> always exists.
> 
> Now, in RCD mode the host bridge also already exists but it references
> to an ACPI device. A port lookup by the PCI device's parent device
> will fail as a direct link to the registered port is missing. The ACPI
> device of the bridge must be determined first.
> 
> To prevent this, change port registration of a CXL host to use the
> bridge device instead. Do this also for the VH case as port topology
> will better reflect the PCI topology then.
> 
> If a mock device is registered by a test driver, the bridge pointer
> can be NULL. Keep using the matching ACPI device (&adev->dev) as a
> fallback in this case.
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
>  drivers/cxl/acpi.c | 48 ++++++++++++++++++++++++++++++----------------
>  1 file changed, 31 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index fb9f72813067..06150c953f58 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -185,6 +185,17 @@ __mock struct acpi_device *to_cxl_host_bridge(struct device *host,
>  	return NULL;
>  }
>  
> +static inline struct acpi_pci_root *to_cxl_pci_root(struct device *host,
> +						    struct device *match)
> +{
> +	struct acpi_device *adev = to_cxl_host_bridge(host, match);
> +
> +	if (!adev)
> +		return NULL;
> +
> +	return acpi_pci_find_root(adev->handle);
> +}
> +
>  /*
>   * A host bridge is a dport to a CFMWS decode and it is a uport to the
>   * dport (PCIe Root Ports) in the host bridge.
> @@ -193,35 +204,35 @@ static int add_host_bridge_uport(struct device *match, void *arg)
>  {
>  	struct cxl_port *root_port = arg;
>  	struct device *host = root_port->dev.parent;
> -	struct acpi_device *bridge = to_cxl_host_bridge(host, match);
> -	struct acpi_pci_root *pci_root;
> +	struct acpi_pci_root *pci_root = to_cxl_pci_root(host, match);
>  	struct cxl_dport *dport;
>  	struct cxl_port *port;
> +	struct device *bridge;
>  	int rc;
>  
> -	if (!bridge)
> +	if (!pci_root)
>  		return 0;
>  
> -	dport = cxl_find_dport_by_dev(root_port, match);
> +	/*
> +	 * If it is a mock dev, the bridge can be NULL, use matching
> +	 * device (&adev->dev) as a fallback then then.
> +	 */
> +	bridge = pci_root->bus->bridge ?: match;

While I appreciate that you ran this against the unit tests, production
code should not know or care about the presence of mock devices. So,
this was showing a gap in the mock implementation, now addressed here:

http://lore.kernel.org/r/166845667383.1449826.14492184009399164787.stgit@dwillia2-xfh.jf.intel.com

With that, this approach can be simplified to the following:

-- >8 --
From 3cb7a46e100d016132727ad32904b629061e40d5 Mon Sep 17 00:00:00 2001
From: Robert Richter <rrichter@amd.com>
Date: Mon, 14 Nov 2022 12:20:27 -0800
Subject: [PATCH v4] cxl/acpi: Register CXL host ports by bridge device

A port of a CXL host bridge links to the bridge's acpi device
(&adev->dev) with its corresponding uport/dport device (uport_dev and
dport_dev respectively). The device is not a direct parent device in
the PCI topology as pdev->dev.parent points to a PCI bridge's (struct
pci_host_bridge) device. The following CXL memory device hierarchy
would be valid for an endpoint once an RCD EP would be enabled (note
this will be done in a later patch):

VH mode:

 cxlmd->dev.parent->parent
        ^^^\^^^^^^\ ^^^^^^\
            \      \       pci_dev (Type 1, Downstream Port)
             \      pci_dev (Type 0, PCI Express Endpoint)
              cxl mem device

RCD mode:

 cxlmd->dev.parent->parent
        ^^^\^^^^^^\ ^^^^^^\
            \      \       pci_host_bridge
             \      pci_dev (Type 0, RCiEP)
              cxl mem device

In VH mode a downstream port is created by port enumeration and thus
always exists.

Now, in RCD mode the host bridge also already exists but it references
to an ACPI device. A port lookup by the PCI device's parent device
will fail as a direct link to the registered port is missing. The ACPI
device of the bridge must be determined first.

To prevent this, change port registration of a CXL host to use the
bridge device instead. Do this also for the VH case as port topology
will better reflect the PCI topology then.

Signed-off-by: Robert Richter <rrichter@amd.com>
Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/acpi.c | 35 +++++++++++++++++++----------------
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index fb9f72813067..71f8bdedd676 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -193,35 +193,34 @@ static int add_host_bridge_uport(struct device *match, void *arg)
 {
 	struct cxl_port *root_port = arg;
 	struct device *host = root_port->dev.parent;
-	struct acpi_device *bridge = to_cxl_host_bridge(host, match);
+	struct acpi_device *hb = to_cxl_host_bridge(host, match);
 	struct acpi_pci_root *pci_root;
 	struct cxl_dport *dport;
 	struct cxl_port *port;
+	struct device *bridge;
 	int rc;
 
-	if (!bridge)
+	if (!hb)
 		return 0;
 
-	dport = cxl_find_dport_by_dev(root_port, match);
+	pci_root = acpi_pci_find_root(hb->handle);
+	bridge = pci_root->bus->bridge;
+	dport = cxl_find_dport_by_dev(root_port, bridge);
 	if (!dport) {
 		dev_dbg(host, "host bridge expected and not found\n");
 		return 0;
 	}
 
-	/*
-	 * Note that this lookup already succeeded in
-	 * to_cxl_host_bridge(), so no need to check for failure here
-	 */
-	pci_root = acpi_pci_find_root(bridge->handle);
-	rc = devm_cxl_register_pci_bus(host, match, pci_root->bus);
+	rc = devm_cxl_register_pci_bus(host, bridge, pci_root->bus);
 	if (rc)
 		return rc;
 
-	port = devm_cxl_add_port(host, match, dport->component_reg_phys, dport);
+	port = devm_cxl_add_port(host, bridge, dport->component_reg_phys,
+				 dport);
 	if (IS_ERR(port))
 		return PTR_ERR(port);
 
-	dev_info(pci_root->bus->bridge, "host supports CXL\n");
+	dev_info(bridge, "host supports CXL\n");
 
 	return 0;
 }
@@ -253,18 +252,20 @@ static int cxl_get_chbcr(union acpi_subtable_headers *header, void *arg,
 static int add_host_bridge_dport(struct device *match, void *arg)
 {
 	acpi_status status;
+	struct device *bridge;
 	unsigned long long uid;
 	struct cxl_dport *dport;
 	struct cxl_chbs_context ctx;
+	struct acpi_pci_root *pci_root;
 	struct cxl_port *root_port = arg;
 	struct device *host = root_port->dev.parent;
-	struct acpi_device *bridge = to_cxl_host_bridge(host, match);
+	struct acpi_device *hb = to_cxl_host_bridge(host, match);
 
-	if (!bridge)
+	if (!hb)
 		return 0;
 
-	status = acpi_evaluate_integer(bridge->handle, METHOD_NAME__UID, NULL,
-				       &uid);
+	status =
+		acpi_evaluate_integer(hb->handle, METHOD_NAME__UID, NULL, &uid);
 	if (status != AE_OK) {
 		dev_err(match, "unable to retrieve _UID\n");
 		return -ENODEV;
@@ -285,7 +286,9 @@ static int add_host_bridge_dport(struct device *match, void *arg)
 
 	dev_dbg(match, "CHBCR found: 0x%08llx\n", (u64)ctx.chbcr);
 
-	dport = devm_cxl_add_dport(root_port, match, uid, ctx.chbcr);
+	pci_root = acpi_pci_find_root(hb->handle);
+	bridge = pci_root->bus->bridge;
+	dport = devm_cxl_add_dport(root_port, bridge, uid, ctx.chbcr);
 	if (IS_ERR(dport))
 		return PTR_ERR(dport);
 
-- 
2.38.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* RE: [PATCH v3 2/9] cxl/acpi: Extract component registers of restricted hosts from RCRB
  2022-11-09 10:40 ` [PATCH v3 2/9] cxl/acpi: Extract component registers of restricted hosts from RCRB Robert Richter
@ 2022-11-14 21:30   ` Dan Williams
  2022-11-15 12:17     ` Robert Richter
  0 siblings, 1 reply; 52+ messages in thread
From: Dan Williams @ 2022-11-14 21:30 UTC (permalink / raw)
  To: Robert Richter, Alison Schofield, Vishal Verma, Ira Weiny,
	Ben Widawsky, Dan Williams
  Cc: linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang,
	Robert Richter, Terry Bowman

Robert Richter wrote:
> A downstream port must be connected to a component register block.
> For restricted hosts the base address is determined from the RCRB. The
> RCRB is provided by the host's CEDT CHBS entry. Rework CEDT parser to
> get the RCRB and add code to extract the component register block from
> it.
> 
> RCRB's BAR[0..1] point to the component block containing CXL subsystem
> component registers. MEMBAR extraction follows the PCI base spec here,
> esp. 64 bit extraction and memory range alignment (6.0, 7.5.1.2.1).
> 
> Note: Right now the component register block is used for HDM decoder
> capability only which is optional for RCDs. If unsupported by the RCD,
> the HDM init will fail. It is future work to bypass it in this case.
> 
> Co-developed-by: Terry Bowman <terry.bowman@amd.com>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
>  drivers/cxl/acpi.c      | 43 +++++++++++++++++++++++++++++---------
>  drivers/cxl/core/regs.c | 46 +++++++++++++++++++++++++++++++++++++++++
>  drivers/cxl/cxl.h       |  8 +++++++
>  3 files changed, 87 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index 06150c953f58..caea42cf9522 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -9,6 +9,8 @@
>  #include "cxlpci.h"
>  #include "cxl.h"
>  
> +#define CXL_RCRB_SIZE	SZ_8K
> +
>  static unsigned long cfmws_to_decoder_flags(int restrictions)
>  {
>  	unsigned long flags = CXL_DECODER_F_ENABLE;
> @@ -240,27 +242,46 @@ static int add_host_bridge_uport(struct device *match, void *arg)
>  struct cxl_chbs_context {
>  	struct device *dev;
>  	unsigned long long uid;
> -	resource_size_t chbcr;
> +	struct acpi_cedt_chbs chbs;
>  };
>  
> -static int cxl_get_chbcr(union acpi_subtable_headers *header, void *arg,
> -			 const unsigned long end)
> +static int cxl_get_chbs(union acpi_subtable_headers *header, void *arg,
> +			const unsigned long end)
>  {
>  	struct cxl_chbs_context *ctx = arg;
>  	struct acpi_cedt_chbs *chbs;
>  
> -	if (ctx->chbcr)
> +	if (ctx->chbs.base)
>  		return 0;
>  
>  	chbs = (struct acpi_cedt_chbs *) header;
>  
>  	if (ctx->uid != chbs->uid)
>  		return 0;
> -	ctx->chbcr = chbs->base;
> +	ctx->chbs = *chbs;
>  
>  	return 0;
>  }
>  
> +static resource_size_t cxl_get_chbcr(struct cxl_chbs_context *ctx)
> +{
> +	struct acpi_cedt_chbs *chbs = &ctx->chbs;
> +
> +	if (!chbs->base)
> +		return CXL_RESOURCE_NONE;
> +
> +	if (chbs->cxl_version != ACPI_CEDT_CHBS_VERSION_CXL11)
> +		return chbs->base;
> +
> +	if (chbs->length != CXL_RCRB_SIZE)
> +		return CXL_RESOURCE_NONE;
> +
> +	dev_dbg(ctx->dev, "RCRB found for UID %lld: 0x%08llx\n",
> +		ctx->uid, (u64)chbs->base);
> +
> +	return cxl_rcrb_to_component(ctx->dev, chbs->base, CXL_RCRB_DOWNSTREAM);
> +}
> +
>  static int add_host_bridge_dport(struct device *match, void *arg)
>  {
>  	acpi_status status;
> @@ -272,6 +293,7 @@ static int add_host_bridge_dport(struct device *match, void *arg)
>  	struct acpi_pci_root *pci_root = to_cxl_pci_root(host, match);
>  	struct device *bridge;
>  	acpi_handle handle;
> +	resource_size_t component_reg_phys;
>  
>  	if (!pci_root)
>  		return 0;
> @@ -287,19 +309,20 @@ static int add_host_bridge_dport(struct device *match, void *arg)
>  	dev_dbg(match, "UID found: %lld\n", uid);
>  
>  	ctx = (struct cxl_chbs_context) {
> -		.dev = host,
> +		.dev = match,
>  		.uid = uid,
>  	};
> -	acpi_table_parse_cedt(ACPI_CEDT_TYPE_CHBS, cxl_get_chbcr, &ctx);
> +	acpi_table_parse_cedt(ACPI_CEDT_TYPE_CHBS, cxl_get_chbs, &ctx);
>  
> -	if (ctx.chbcr == 0) {
> +	component_reg_phys = cxl_get_chbcr(&ctx);
> +	if (component_reg_phys == CXL_RESOURCE_NONE) {
>  		dev_warn(match, "No CHBS found for Host Bridge (UID %lld)\n", uid);
>  		return 0;
>  	}
>  
> -	dev_dbg(match, "CHBCR found: 0x%08llx\n", (u64)ctx.chbcr);
> +	dev_dbg(match, "CHBCR found: 0x%08llx\n", (u64)component_reg_phys);
>  
> -	dport = devm_cxl_add_dport(root_port, bridge, uid, ctx.chbcr);
> +	dport = devm_cxl_add_dport(root_port, bridge, uid, component_reg_phys);
>  	if (IS_ERR(dport))
>  		return PTR_ERR(dport);
>  
> diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
> index ec178e69b18f..7a5bde81e949 100644
> --- a/drivers/cxl/core/regs.c
> +++ b/drivers/cxl/core/regs.c
> @@ -307,3 +307,49 @@ int cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type,
>  	return -ENODEV;
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_find_regblock, CXL);
> +
> +resource_size_t cxl_rcrb_to_component(struct device *dev,
> +				      resource_size_t rcrb,
> +				      enum cxl_rcrb which)
> +{
> +	resource_size_t component_reg_phys;
> +	u32 bar0, bar1;
> +	void *addr;
> +
> +	if (which == CXL_RCRB_UPSTREAM)
> +		rcrb += SZ_4K;
> +
> +	/*
> +	 * RCRB's BAR[0..1] point to component block containing CXL
> +	 * subsystem component registers. MEMBAR extraction follows
> +	 * the PCI Base spec here, esp. 64 bit extraction and memory
> +	 * ranges alignment (6.0, 7.5.1.2.1).
> +	 */

A request_mem_region() is needed here to ensure ownership and expected
sequencing of accessing the RCRB to locate the component registers, and
accessing the RCRB to manipulate the component registers. It also helps
to sanity check that the BIOS mapped an exclusive range for the RCRB.

> +	addr = ioremap(rcrb, PCI_BASE_ADDRESS_0 + SZ_8);

That PCI_BASE_ADDRESS_0 does not belong there. It ends up being benign
and forcing ioremap to map 12K instead of 8K, but it is a
config-register offset, not part of the RCRB size.

> +	if (!addr) {
> +		dev_err(dev, "Failed to map region %pr\n", addr);
> +		return CXL_RESOURCE_NONE;
> +	}
> +
> +	bar0 = readl(addr + PCI_BASE_ADDRESS_0);
> +	bar1 = readl(addr + PCI_BASE_ADDRESS_1);
> +	iounmap(addr);

...corresponding release_mem_region() would go here.

> +
> +	/* sanity check */
> +	if (bar0 & (PCI_BASE_ADDRESS_MEM_TYPE_1M | PCI_BASE_ADDRESS_SPACE_IO))
> +		return CXL_RESOURCE_NONE;

I would have also expected:

- a sanity check for "Memory Space Enable" being set in the command
  register.

- an explicit check for 0xffffffff for the case when the upstream-port
  implements "no RCRB" mode.

- some check that BIOS initialized the BAR values post reset given these
  BARs are invisible to the PCI core resource assignment 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [PATCH v3 3/9] cxl/mem: Adjust cxl_mem_find_port() to find an RCH's port
  2022-11-09 10:40 ` [PATCH v3 3/9] cxl/mem: Adjust cxl_mem_find_port() to find an RCH's port Robert Richter
@ 2022-11-14 23:45   ` Dan Williams
  2022-11-15 13:12     ` Robert Richter
  0 siblings, 1 reply; 52+ messages in thread
From: Dan Williams @ 2022-11-14 23:45 UTC (permalink / raw)
  To: Robert Richter, Alison Schofield, Vishal Verma, Ira Weiny,
	Ben Widawsky, Dan Williams
  Cc: linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang,
	Robert Richter

Robert Richter wrote:
> The PCIe software view of an RCH and RCD is different to VH mode. An
> RCD is paired with an RCH and shows up as RCiEP with a parent already
> pointing to a PCI bridge (struct pci_host_bridge). In contrast, in VH
> mode an PCI Express Endpoint is a PCI type 0 device with a PCI type 1
> device as parent (struct pci_dev, most of the time a downstream switch
> port, but could also be a root port). The following hierarchy applies
> in VH mode:
> 
>  CXL memory device, cxl_memdev                               endpoint
>  └──PCIe Endpoint (type 0), pci_dev                           |
>     └──Downstream Port (type 1), pci_dev (Nth switch)        portN
>        └──Upstream Port (type 1), pci_dev (Nth switch)        |
>           :                                                   :
>           └──Downstream Port (type 1), pci_dev (1st switch)  port1
>              └──Upstream Port (type 1), pci_dev (1st switch)  |
>                 └──Root Port (type 1), pci_dev                |
>                    └──PCI host bridge, pci_host_bridge       port0
>                       :                                       |
>                       :..ACPI0017, acpi_dev                  root
> 
>  (There can be zero or any other number of switches in between.)
> 
> An iterator through the grandparents takes us to the root port which
> is registered as dport to the bridge. The next port an endpoint is
> connected to can be determined by using the grandparent of the memory
> device as a dport_dev in cxl_mem_find_port().
> 
> The same does not work in RCD mode where only an RCiEP is connected to
> the host bridge:
> 
>  CXL memory device, cxl_memdev                               endpoint
>  └──PCIe Endpoint (type 0), pci_dev                           |
>     └──PCI host bridge, pci_host_bridge                      port0
>        :                                                      |
>        :..ACPI0017, acpi_dev                                 root
> 
> Here, an endpoint is directly connected to the host bridge without a
> type 1 PCI device (root or downstream port) in between. To link the
> endpoint to the correct port, the endpoint's PCI device (parent of the
> memory device) must be taken as dport_dev arg in cxl_mem_find_port().
> 
> Change cxl_mem_find_port() to find an RCH's port.
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
>  drivers/cxl/core/port.c | 38 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 38 insertions(+)
> 
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 0431ed860d8e..d10c3580719b 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1354,6 +1354,14 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd,
>  	return rc;
>  }
>  
> +static inline bool is_cxl_restricted(struct cxl_memdev *cxlmd)
> +{
> +	struct device *parent = cxlmd->dev.parent;
> +	if (!dev_is_pci(parent))
> +		return false;
> +	return pci_pcie_type(to_pci_dev(parent)) == PCI_EXP_TYPE_RC_END;
> +}
> +
>  int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
>  {
>  	struct device *dev = &cxlmd->dev;
> @@ -1433,9 +1441,39 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
>  }
>  EXPORT_SYMBOL_NS_GPL(devm_cxl_enumerate_ports, CXL);
>  
> +/*
> + * CXL memory device and port hierarchy:
> + *
> + * VH mode:
> + *
> + * CXL memory device, cxl_memdev                               endpoint
> + * └──PCIe Endpoint (type 0), pci_dev                           |
> + *    └──Downstream Port (type 1), pci_dev (Nth switch)        portN
> + *       └──Upstream Port (type 1), pci_dev (Nth switch)        |
> + *          :                                                   :
> + *          └──Downstream Port (type 1), pci_dev (1st switch)  port1
> + *             └──Upstream Port (type 1), pci_dev (1st switch)  |
> + *                └──Root Port (type 1), pci_dev                |
> + *                   └──PCI host bridge, pci_host_bridge       port0
> + *                      :                                       |
> + *                      :..ACPI0017, acpi_dev                  root
> + *
> + * (There can be zero or any other number of switches in between.)
> + *
> + * RCD mode:
> + *
> + * CXL memory device, cxl_memdev                               endpoint
> + * └──PCIe Endpoint (type 0), pci_dev                           |
> + *    └──PCI host bridge, pci_host_bridge                      port0
> + *       :                                                      |
> + *       :..ACPI0017, acpi_dev                                 root
> + */
>  struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd,
>  				   struct cxl_dport **dport)
>  {
> +	if (is_cxl_restricted(cxlmd))
> +		return find_cxl_port(cxlmd->dev.parent, dport);
> +
>  	return find_cxl_port(grandparent(&cxlmd->dev), dport);

I do not see why this change is needed. For example:

# readlink -f /sys/bus/cxl/devices/mem0
/sys/devices/pci0000:38/0000:38:00.0/mem0
# cxl list -BT
[
  {
    "bus":"root0",
    "provider":"ACPI.CXL",
    "nr_dports":1,
    "dports":[
      {
        "dport":"pci0000:38",
        "id":49
      }
    ]
  }
]

...so, in this case, the grandparent of "mem0" is "pci0000:38", and
"pci0000:38" is a dport. Unmodified cxl_mem_find_port() will do the
right thing and find that this CXL RCIEP is directly connected to
"root0".

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [PATCH v3 4/9] cxl/mem: Skip intermediate port enumeration of restricted endpoints (RCDs)
  2022-11-09 10:40 ` [PATCH v3 4/9] cxl/mem: Skip intermediate port enumeration of restricted endpoints (RCDs) Robert Richter
  2022-11-09 16:55   ` Dave Jiang
@ 2022-11-15  0:07   ` Dan Williams
  2022-11-15 13:17     ` Robert Richter
  2022-11-15  0:24   ` Dan Williams
  2 siblings, 1 reply; 52+ messages in thread
From: Dan Williams @ 2022-11-15  0:07 UTC (permalink / raw)
  To: Robert Richter, Alison Schofield, Vishal Verma, Ira Weiny,
	Ben Widawsky, Dan Williams
  Cc: linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang,
	Robert Richter

Robert Richter wrote:
> When an endpoint is found, all ports in beetween the endpoint and the
> CXL host bridge need to be created. In the RCH case there are no ports
> in between a host bridge and the endpoint. Skip the enumeration of
> intermediate ports.
> 
> The port enumeration does not only create all ports, it also
> initializes the endpoint chain by adding the endpoint to every
> downstream port up to the root bridge. This must be done also in RCD
> mode, but is much more simple as the endpoint only needs to be added
> to the host bridge's dport.
> 
> Note: For endpoint removal the cxl_detach_ep() is not needed as it is
> released in cxl_port_release().
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
>  drivers/cxl/core/port.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index d10c3580719b..e21a9c3fe4da 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1366,8 +1366,24 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
>  {
>  	struct device *dev = &cxlmd->dev;
>  	struct device *iter;
> +	struct cxl_dport *dport;
> +	struct cxl_port *port;
>  	int rc;
>  
> +	/*
> +	 * Skip intermediate port enumeration in the RCH case, there
> +	 * are no ports in between a host bridge and an endpoint. Only
> +	 * initialize the EP chain.
> +	 */
> +	if (is_cxl_restricted(cxlmd)) {

I changed this to:

	if (cxlmd->cxlds->rcd) {

...where the cxl_pci driver sets that rcd flag. Aside from keeping some
PCI specifics out of this helper it also allows RCH/RCD configurations
to be mocked with cxl_test.

> +		port = cxl_mem_find_port(cxlmd, &dport);
> +		if (!port)
> +			return -ENXIO;
> +		rc = cxl_add_ep(dport, &cxlmd->dev);

Ah, good catch, I had missed this detail previously.

> +		put_device(&port->dev);
> +		return rc;
> +	}
> +
>  	rc = devm_add_action_or_reset(&cxlmd->dev, cxl_detach_ep, cxlmd);
>  	if (rc)
>  		return rc;
> @@ -1381,8 +1397,6 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
>  	for (iter = dev; iter; iter = grandparent(iter)) {
>  		struct device *dport_dev = grandparent(iter);
>  		struct device *uport_dev;
> -		struct cxl_dport *dport;
> -		struct cxl_port *port;
>  
>  		if (!dport_dev)
>  			return 0;
> -- 
> 2.30.2
> 



^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [PATCH v3 4/9] cxl/mem: Skip intermediate port enumeration of restricted endpoints (RCDs)
  2022-11-09 10:40 ` [PATCH v3 4/9] cxl/mem: Skip intermediate port enumeration of restricted endpoints (RCDs) Robert Richter
  2022-11-09 16:55   ` Dave Jiang
  2022-11-15  0:07   ` Dan Williams
@ 2022-11-15  0:24   ` Dan Williams
  2022-11-15 13:28     ` Robert Richter
  2 siblings, 1 reply; 52+ messages in thread
From: Dan Williams @ 2022-11-15  0:24 UTC (permalink / raw)
  To: Robert Richter, Alison Schofield, Vishal Verma, Ira Weiny,
	Ben Widawsky, Dan Williams
  Cc: linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang,
	Robert Richter

Robert Richter wrote:
> When an endpoint is found, all ports in beetween the endpoint and the
> CXL host bridge need to be created. In the RCH case there are no ports
> in between a host bridge and the endpoint. Skip the enumeration of
> intermediate ports.
> 
> The port enumeration does not only create all ports, it also
> initializes the endpoint chain by adding the endpoint to every
> downstream port up to the root bridge. This must be done also in RCD
> mode, but is much more simple as the endpoint only needs to be added
> to the host bridge's dport.
> 
> Note: For endpoint removal the cxl_detach_ep() is not needed as it is
> released in cxl_port_release().
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
>  drivers/cxl/core/port.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index d10c3580719b..e21a9c3fe4da 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1366,8 +1366,24 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
>  {
>  	struct device *dev = &cxlmd->dev;
>  	struct device *iter;
> +	struct cxl_dport *dport;
> +	struct cxl_port *port;
>  	int rc;
>  
> +	/*
> +	 * Skip intermediate port enumeration in the RCH case, there
> +	 * are no ports in between a host bridge and an endpoint. Only
> +	 * initialize the EP chain.
> +	 */
> +	if (is_cxl_restricted(cxlmd)) {
> +		port = cxl_mem_find_port(cxlmd, &dport);
> +		if (!port)
> +			return -ENXIO;
> +		rc = cxl_add_ep(dport, &cxlmd->dev);

On second look, this seems problematic. While yes it will be deleted
when the root CXL port dies, it will not be deleted if the cxl_pci
driver is reloaded. I will code up a unit test to double check.

I note that cxl_add_ep() for the VH case is skipped for the root CXL
port, so I do not suspect it is needed here either. Did you add it for a
specific reason?

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 1/9] cxl/acpi: Register CXL host ports by bridge device
  2022-11-14 20:22   ` Dan Williams
@ 2022-11-15 10:37     ` Robert Richter
  0 siblings, 0 replies; 52+ messages in thread
From: Robert Richter @ 2022-11-15 10:37 UTC (permalink / raw)
  To: Dan Williams
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang

On 14.11.22 12:22:49, Dan Williams wrote:
> Robert Richter wrote:
> > A port of a CXL host bridge links to the bridge's acpi device
> > (&adev->dev) with its corresponding uport/dport device (uport_dev and
> > dport_dev respectively). The device is not a direct parent device in
> > the PCI topology as pdev->dev.parent points to a PCI bridge's (struct
> > pci_host_bridge) device. The following CXL memory device hierarchy
> > would be valid for an endpoint once an RCD EP would be enabled (note
> > this will be done in a later patch):
> > 
> > VH mode:
> > 
> >  cxlmd->dev.parent->parent
> >         ^^^\^^^^^^\ ^^^^^^\
> >             \      \       pci_dev (Type 1, Downstream Port)
> >              \      pci_dev (Type 0, PCI Express Endpoint)
> >               cxl mem device
> > 
> > RCD mode:
> > 
> >  cxlmd->dev.parent->parent
> >         ^^^\^^^^^^\ ^^^^^^\
> >             \      \       pci_host_bridge
> >              \      pci_dev (Type 0, RCiEP)
> >               cxl mem device
> > 
> > In VH mode a downstream port is created by port enumeration and thus
> > always exists.
> > 
> > Now, in RCD mode the host bridge also already exists but it references
> > to an ACPI device. A port lookup by the PCI device's parent device
> > will fail as a direct link to the registered port is missing. The ACPI
> > device of the bridge must be determined first.
> > 
> > To prevent this, change port registration of a CXL host to use the
> > bridge device instead. Do this also for the VH case as port topology
> > will better reflect the PCI topology then.
> > 
> > If a mock device is registered by a test driver, the bridge pointer
> > can be NULL. Keep using the matching ACPI device (&adev->dev) as a
> > fallback in this case.
> > 
> > Signed-off-by: Robert Richter <rrichter@amd.com>
> > ---
> >  drivers/cxl/acpi.c | 48 ++++++++++++++++++++++++++++++----------------
> >  1 file changed, 31 insertions(+), 17 deletions(-)
> > 
> > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> > index fb9f72813067..06150c953f58 100644
> > --- a/drivers/cxl/acpi.c
> > +++ b/drivers/cxl/acpi.c
> > @@ -185,6 +185,17 @@ __mock struct acpi_device *to_cxl_host_bridge(struct device *host,
> >  	return NULL;
> >  }
> >  
> > +static inline struct acpi_pci_root *to_cxl_pci_root(struct device *host,
> > +						    struct device *match)
> > +{
> > +	struct acpi_device *adev = to_cxl_host_bridge(host, match);
> > +
> > +	if (!adev)
> > +		return NULL;
> > +
> > +	return acpi_pci_find_root(adev->handle);
> > +}
> > +
> >  /*
> >   * A host bridge is a dport to a CFMWS decode and it is a uport to the
> >   * dport (PCIe Root Ports) in the host bridge.
> > @@ -193,35 +204,35 @@ static int add_host_bridge_uport(struct device *match, void *arg)
> >  {
> >  	struct cxl_port *root_port = arg;
> >  	struct device *host = root_port->dev.parent;
> > -	struct acpi_device *bridge = to_cxl_host_bridge(host, match);
> > -	struct acpi_pci_root *pci_root;
> > +	struct acpi_pci_root *pci_root = to_cxl_pci_root(host, match);
> >  	struct cxl_dport *dport;
> >  	struct cxl_port *port;
> > +	struct device *bridge;
> >  	int rc;
> >  
> > -	if (!bridge)
> > +	if (!pci_root)
> >  		return 0;
> >  
> > -	dport = cxl_find_dport_by_dev(root_port, match);
> > +	/*
> > +	 * If it is a mock dev, the bridge can be NULL, use matching
> > +	 * device (&adev->dev) as a fallback then then.
> > +	 */
> > +	bridge = pci_root->bus->bridge ?: match;
> 
> While I appreciate that you ran this against the unit tests, production
> code should not know or care about the presence of mock devices. So,
> this was showing a gap in the mock implementation, now addressed here:
> 
> http://lore.kernel.org/r/166845667383.1449826.14492184009399164787.stgit@dwillia2-xfh.jf.intel.com

Yes, with that update the above check can be dropped and the code can
be implemented now without the helper. The patch below looks good to
me. Going to run a test with it.

> 
> With that, this approach can be simplified to the following:
> 
> -- >8 --
> >From 3cb7a46e100d016132727ad32904b629061e40d5 Mon Sep 17 00:00:00 2001
> From: Robert Richter <rrichter@amd.com>
> Date: Mon, 14 Nov 2022 12:20:27 -0800
> Subject: [PATCH v4] cxl/acpi: Register CXL host ports by bridge device

Thanks,

-Robert

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 2/9] cxl/acpi: Extract component registers of restricted hosts from RCRB
  2022-11-14 21:30   ` Dan Williams
@ 2022-11-15 12:17     ` Robert Richter
  2022-11-15 17:54       ` Dan Williams
  0 siblings, 1 reply; 52+ messages in thread
From: Robert Richter @ 2022-11-15 12:17 UTC (permalink / raw)
  To: Dan Williams
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang,
	Terry Bowman

On 14.11.22 13:30:01, Dan Williams wrote:
> Robert Richter wrote:

> > diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
> > index ec178e69b18f..7a5bde81e949 100644
> > --- a/drivers/cxl/core/regs.c
> > +++ b/drivers/cxl/core/regs.c
> > @@ -307,3 +307,49 @@ int cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type,
> >  	return -ENODEV;
> >  }
> >  EXPORT_SYMBOL_NS_GPL(cxl_find_regblock, CXL);
> > +
> > +resource_size_t cxl_rcrb_to_component(struct device *dev,
> > +				      resource_size_t rcrb,
> > +				      enum cxl_rcrb which)
> > +{
> > +	resource_size_t component_reg_phys;
> > +	u32 bar0, bar1;
> > +	void *addr;
> > +
> > +	if (which == CXL_RCRB_UPSTREAM)
> > +		rcrb += SZ_4K;
> > +
> > +	/*
> > +	 * RCRB's BAR[0..1] point to component block containing CXL
> > +	 * subsystem component registers. MEMBAR extraction follows
> > +	 * the PCI Base spec here, esp. 64 bit extraction and memory
> > +	 * ranges alignment (6.0, 7.5.1.2.1).
> > +	 */
> 
> A request_mem_region() is needed here to ensure ownership and expected
> sequencing of accessing the RCRB to locate the component registers, and
> accessing the RCRB to manipulate the component registers. It also helps
> to sanity check that the BIOS mapped an exclusive range for the RCRB.

Right, that is missing.

> 
> > +	addr = ioremap(rcrb, PCI_BASE_ADDRESS_0 + SZ_8);
> 
> That PCI_BASE_ADDRESS_0 does not belong there. It ends up being benign
> and forcing ioremap to map 12K instead of 8K, but it is a
> config-register offset, not part of the RCRB size.

Note this is BAR0 + 8 bytes, not 8k, and it does not map the whole
RCRB region but instead the first part of the config space up to
including the 64 bit BAR.

> 
> > +	if (!addr) {
> > +		dev_err(dev, "Failed to map region %pr\n", addr);
> > +		return CXL_RESOURCE_NONE;
> > +	}
> > +
> > +	bar0 = readl(addr + PCI_BASE_ADDRESS_0);
> > +	bar1 = readl(addr + PCI_BASE_ADDRESS_1);
> > +	iounmap(addr);
> 
> ...corresponding release_mem_region() would go here.
> 
> > +
> > +	/* sanity check */
> > +	if (bar0 & (PCI_BASE_ADDRESS_MEM_TYPE_1M | PCI_BASE_ADDRESS_SPACE_IO))
> > +		return CXL_RESOURCE_NONE;
> 
> I would have also expected:
> 
> - a sanity check for "Memory Space Enable" being set in the command
>   register.

Ok.

> 
> - an explicit check for 0xffffffff for the case when the upstream-port
>   implements "no RCRB" mode.

Yes, I left support for this to a later patch, but it's better to
check it here already and possibly fall back to reg loc DVSEC then.

> 
> - some check that BIOS initialized the BAR values post reset given these
>   BARs are invisible to the PCI core resource assignment 

What check do you have in mind here? There is already the NULL check
which would be the out-of-reset value.

Thanks,

-Robert

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 3/9] cxl/mem: Adjust cxl_mem_find_port() to find an RCH's port
  2022-11-14 23:45   ` Dan Williams
@ 2022-11-15 13:12     ` Robert Richter
  2022-11-15 18:06       ` Dan Williams
  0 siblings, 1 reply; 52+ messages in thread
From: Robert Richter @ 2022-11-15 13:12 UTC (permalink / raw)
  To: Dan Williams
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang

On 14.11.22 15:45:19, Dan Williams wrote:
> Robert Richter wrote:
> > The PCIe software view of an RCH and RCD is different to VH mode. An
> > RCD is paired with an RCH and shows up as RCiEP with a parent already
> > pointing to a PCI bridge (struct pci_host_bridge). In contrast, in VH
> > mode an PCI Express Endpoint is a PCI type 0 device with a PCI type 1
> > device as parent (struct pci_dev, most of the time a downstream switch
> > port, but could also be a root port). The following hierarchy applies
> > in VH mode:
> > 
> >  CXL memory device, cxl_memdev                               endpoint
> >  └──PCIe Endpoint (type 0), pci_dev                           |
> >     └──Downstream Port (type 1), pci_dev (Nth switch)        portN
> >        └──Upstream Port (type 1), pci_dev (Nth switch)        |
> >           :                                                   :
> >           └──Downstream Port (type 1), pci_dev (1st switch)  port1
> >              └──Upstream Port (type 1), pci_dev (1st switch)  |
> >                 └──Root Port (type 1), pci_dev                |
> >                    └──PCI host bridge, pci_host_bridge       port0
> >                       :                                       |
> >                       :..ACPI0017, acpi_dev                  root
> > 
> >  (There can be zero or any other number of switches in between.)
> > 
> > An iterator through the grandparents takes us to the root port which
> > is registered as dport to the bridge. The next port an endpoint is
> > connected to can be determined by using the grandparent of the memory
> > device as a dport_dev in cxl_mem_find_port().
> > 
> > The same does not work in RCD mode where only an RCiEP is connected to
> > the host bridge:
> > 
> >  CXL memory device, cxl_memdev                               endpoint
> >  └──PCIe Endpoint (type 0), pci_dev                           |
> >     └──PCI host bridge, pci_host_bridge                      port0
> >        :                                                      |
> >        :..ACPI0017, acpi_dev                                 root
> > 
> > Here, an endpoint is directly connected to the host bridge without a
> > type 1 PCI device (root or downstream port) in between. To link the
> > endpoint to the correct port, the endpoint's PCI device (parent of the
> > memory device) must be taken as dport_dev arg in cxl_mem_find_port().
> > 
> > Change cxl_mem_find_port() to find an RCH's port.
> > 
> > Signed-off-by: Robert Richter <rrichter@amd.com>
> > ---
> >  drivers/cxl/core/port.c | 38 ++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 38 insertions(+)
> > 
> > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > index 0431ed860d8e..d10c3580719b 100644
> > --- a/drivers/cxl/core/port.c
> > +++ b/drivers/cxl/core/port.c
> > @@ -1354,6 +1354,14 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd,
> >  	return rc;
> >  }
> >  
> > +static inline bool is_cxl_restricted(struct cxl_memdev *cxlmd)
> > +{
> > +	struct device *parent = cxlmd->dev.parent;
> > +	if (!dev_is_pci(parent))
> > +		return false;
> > +	return pci_pcie_type(to_pci_dev(parent)) == PCI_EXP_TYPE_RC_END;
> > +}
> > +
> >  int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> >  {
> >  	struct device *dev = &cxlmd->dev;
> > @@ -1433,9 +1441,39 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> >  }
> >  EXPORT_SYMBOL_NS_GPL(devm_cxl_enumerate_ports, CXL);
> >  
> > +/*
> > + * CXL memory device and port hierarchy:
> > + *
> > + * VH mode:
> > + *
> > + * CXL memory device, cxl_memdev                               endpoint
> > + * └──PCIe Endpoint (type 0), pci_dev                           |
> > + *    └──Downstream Port (type 1), pci_dev (Nth switch)        portN
> > + *       └──Upstream Port (type 1), pci_dev (Nth switch)        |
> > + *          :                                                   :
> > + *          └──Downstream Port (type 1), pci_dev (1st switch)  port1
> > + *             └──Upstream Port (type 1), pci_dev (1st switch)  |
> > + *                └──Root Port (type 1), pci_dev                |
> > + *                   └──PCI host bridge, pci_host_bridge       port0
> > + *                      :                                       |
> > + *                      :..ACPI0017, acpi_dev                  root
> > + *
> > + * (There can be zero or any other number of switches in between.)
> > + *
> > + * RCD mode:
> > + *
> > + * CXL memory device, cxl_memdev                               endpoint
> > + * └──PCIe Endpoint (type 0), pci_dev                           |
> > + *    └──PCI host bridge, pci_host_bridge                      port0
> > + *       :                                                      |
> > + *       :..ACPI0017, acpi_dev                                 root
> > + */
> >  struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd,
> >  				   struct cxl_dport **dport)
> >  {
> > +	if (is_cxl_restricted(cxlmd))
> > +		return find_cxl_port(cxlmd->dev.parent, dport);
> > +
> >  	return find_cxl_port(grandparent(&cxlmd->dev), dport);
> 
> I do not see why this change is needed. For example:
> 
> # readlink -f /sys/bus/cxl/devices/mem0
> /sys/devices/pci0000:38/0000:38:00.0/mem0
> # cxl list -BT
> [
>   {
>     "bus":"root0",
>     "provider":"ACPI.CXL",
>     "nr_dports":1,
>     "dports":[
>       {
>         "dport":"pci0000:38",
>         "id":49
>       }
>     ]
>   }
> ]
> 
> ...so, in this case, the grandparent of "mem0" is "pci0000:38", and
> "pci0000:38" is a dport. Unmodified cxl_mem_find_port() will do the
> right thing and find that this CXL RCIEP is directly connected to
> "root0".

find_cxl_port() uses the dport_dev, not the uport_dev. A lookup of
pci0000:38 gives the cxl root (ACPI.CXL). Instead, the endpoint's
device (0000:38:00.0) must be used to get to the bridge
("pci0000:38").

There is a parent missing because there is no Root Port in the RCD
hierarchy, simplified example:

VH mode:

 CXL memory device, cxl_memdev                         endpoint <- cxlmd
 └──PCIe Endpoint (type 0), pci_dev                     |
    └──Downstream Port (type 1), pci_dev (1st switch)  port1    <- port1: registered as dport at port0
       └──Upstream Port (type 1), pci_dev (1st switch)  |          port1: grandparent(cxlmd)
          └──Root Port (type 1), pci_dev                |	<- pdev:  registered as dport at port0
             |                                          |          pdev:  grandparent(port1)
             └──PCI host bridge, pci_host_bridge       port0    <- find_cxl_port(grandparent(grandparent(cxlmd)))
                :                                       |          port0 enumerates root ports (pdev) as dports
                :..ACPI0017, acpi_dev                  root

Additional switches add another grandparent() level each (adding an
dport and uport).

RCD mode:

 CXL memory device, cxl_memdev                         endpoint <- cxlmd
 └──PCIe Endpoint (type 0), pci_dev                     |       <- pdev: registered as dport at port0
    |                                                   |          pdev: parent(cxlmd)
    └──PCI host bridge, pci_host_bridge                port0    <- find_cxl_port(parent(endpoint))
       :                                                |          port0 enumerates endpoint (RCiEP) as dports
       :..ACPI0017, acpi_dev                           root

I hope I could shed some light here.

-Robert

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 4/9] cxl/mem: Skip intermediate port enumeration of restricted endpoints (RCDs)
  2022-11-15  0:07   ` Dan Williams
@ 2022-11-15 13:17     ` Robert Richter
  2022-11-15 18:08       ` Dan Williams
  0 siblings, 1 reply; 52+ messages in thread
From: Robert Richter @ 2022-11-15 13:17 UTC (permalink / raw)
  To: Dan Williams
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang

On 14.11.22 16:07:58, Dan Williams wrote:
> Robert Richter wrote:
> > When an endpoint is found, all ports in beetween the endpoint and the
> > CXL host bridge need to be created. In the RCH case there are no ports
> > in between a host bridge and the endpoint. Skip the enumeration of
> > intermediate ports.
> > 
> > The port enumeration does not only create all ports, it also
> > initializes the endpoint chain by adding the endpoint to every
> > downstream port up to the root bridge. This must be done also in RCD
> > mode, but is much more simple as the endpoint only needs to be added
> > to the host bridge's dport.
> > 
> > Note: For endpoint removal the cxl_detach_ep() is not needed as it is
> > released in cxl_port_release().
> > 
> > Signed-off-by: Robert Richter <rrichter@amd.com>
> > ---
> >  drivers/cxl/core/port.c | 18 ++++++++++++++++--
> >  1 file changed, 16 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > index d10c3580719b..e21a9c3fe4da 100644
> > --- a/drivers/cxl/core/port.c
> > +++ b/drivers/cxl/core/port.c
> > @@ -1366,8 +1366,24 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> >  {
> >  	struct device *dev = &cxlmd->dev;
> >  	struct device *iter;
> > +	struct cxl_dport *dport;
> > +	struct cxl_port *port;
> >  	int rc;
> >  
> > +	/*
> > +	 * Skip intermediate port enumeration in the RCH case, there
> > +	 * are no ports in between a host bridge and an endpoint. Only
> > +	 * initialize the EP chain.
> > +	 */
> > +	if (is_cxl_restricted(cxlmd)) {
> 
> I changed this to:
> 
> 	if (cxlmd->cxlds->rcd) {

I a mail to Bjorn I suggested to have cxl_dev_cap and a cxl_port_cap
in the pci_dev struct that could be looked up too including RCD mode.
Checking the pci_dev looks more reasonable to me here, though we could
have a flag of it in cxlds too.

-Robert

> 
> ...where the cxl_pci driver sets that rcd flag. Aside from keeping some
> PCI specifics out of this helper it also allows RCH/RCD configurations
> to be mocked with cxl_test.
> 
> > +		port = cxl_mem_find_port(cxlmd, &dport);
> > +		if (!port)
> > +			return -ENXIO;
> > +		rc = cxl_add_ep(dport, &cxlmd->dev);
> 
> Ah, good catch, I had missed this detail previously.
> 
> > +		put_device(&port->dev);
> > +		return rc;
> > +	}
> > +
> >  	rc = devm_add_action_or_reset(&cxlmd->dev, cxl_detach_ep, cxlmd);
> >  	if (rc)
> >  		return rc;
> > @@ -1381,8 +1397,6 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> >  	for (iter = dev; iter; iter = grandparent(iter)) {
> >  		struct device *dport_dev = grandparent(iter);
> >  		struct device *uport_dev;
> > -		struct cxl_dport *dport;
> > -		struct cxl_port *port;
> >  
> >  		if (!dport_dev)
> >  			return 0;
> > -- 
> > 2.30.2
> > 
> 
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 4/9] cxl/mem: Skip intermediate port enumeration of restricted endpoints (RCDs)
  2022-11-15  0:24   ` Dan Williams
@ 2022-11-15 13:28     ` Robert Richter
  2022-11-15 18:09       ` Dan Williams
  0 siblings, 1 reply; 52+ messages in thread
From: Robert Richter @ 2022-11-15 13:28 UTC (permalink / raw)
  To: Dan Williams
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang

On 14.11.22 16:24:06, Dan Williams wrote:
> Robert Richter wrote:
> > When an endpoint is found, all ports in beetween the endpoint and the
> > CXL host bridge need to be created. In the RCH case there are no ports
> > in between a host bridge and the endpoint. Skip the enumeration of
> > intermediate ports.
> > 
> > The port enumeration does not only create all ports, it also
> > initializes the endpoint chain by adding the endpoint to every
> > downstream port up to the root bridge. This must be done also in RCD
> > mode, but is much more simple as the endpoint only needs to be added
> > to the host bridge's dport.
> > 
> > Note: For endpoint removal the cxl_detach_ep() is not needed as it is
> > released in cxl_port_release().
> > 
> > Signed-off-by: Robert Richter <rrichter@amd.com>
> > ---
> >  drivers/cxl/core/port.c | 18 ++++++++++++++++--
> >  1 file changed, 16 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > index d10c3580719b..e21a9c3fe4da 100644
> > --- a/drivers/cxl/core/port.c
> > +++ b/drivers/cxl/core/port.c
> > @@ -1366,8 +1366,24 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> >  {
> >  	struct device *dev = &cxlmd->dev;
> >  	struct device *iter;
> > +	struct cxl_dport *dport;
> > +	struct cxl_port *port;
> >  	int rc;
> >  
> > +	/*
> > +	 * Skip intermediate port enumeration in the RCH case, there
> > +	 * are no ports in between a host bridge and an endpoint. Only
> > +	 * initialize the EP chain.
> > +	 */
> > +	if (is_cxl_restricted(cxlmd)) {
> > +		port = cxl_mem_find_port(cxlmd, &dport);
> > +		if (!port)
> > +			return -ENXIO;
> > +		rc = cxl_add_ep(dport, &cxlmd->dev);
> 
> On second look, this seems problematic. While yes it will be deleted
> when the root CXL port dies, it will not be deleted if the cxl_pci
> driver is reloaded. I will code up a unit test to double check.
> 
> I note that cxl_add_ep() for the VH case is skipped for the root CXL
> port, so I do not suspect it is needed here either. Did you add it for a
> specific reason?

Yes, all endpoint iterators need to be reworked. Also true, the first
endpoint is skipped in the chain. So only intermediate EP structs are
touched by the loops actually.

In particular, cxl_ep_load() returned NULL for the first lookup if the
ep is missing. We could stop the iteration then. I tried to avoid a
rework here, but maybe it is not too extensive as I expected first.

-Robert

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 2/9] cxl/acpi: Extract component registers of restricted hosts from RCRB
  2022-11-15 12:17     ` Robert Richter
@ 2022-11-15 17:54       ` Dan Williams
  2022-11-17 12:43         ` Robert Richter
  0 siblings, 1 reply; 52+ messages in thread
From: Dan Williams @ 2022-11-15 17:54 UTC (permalink / raw)
  To: Robert Richter, Dan Williams
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang,
	Terry Bowman

Robert Richter wrote:
> On 14.11.22 13:30:01, Dan Williams wrote:
> > Robert Richter wrote:
> 
> > > diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
> > > index ec178e69b18f..7a5bde81e949 100644
> > > --- a/drivers/cxl/core/regs.c
> > > +++ b/drivers/cxl/core/regs.c
> > > @@ -307,3 +307,49 @@ int cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type,
> > >  	return -ENODEV;
> > >  }
> > >  EXPORT_SYMBOL_NS_GPL(cxl_find_regblock, CXL);
> > > +
> > > +resource_size_t cxl_rcrb_to_component(struct device *dev,
> > > +				      resource_size_t rcrb,
> > > +				      enum cxl_rcrb which)
> > > +{
> > > +	resource_size_t component_reg_phys;
> > > +	u32 bar0, bar1;
> > > +	void *addr;
> > > +
> > > +	if (which == CXL_RCRB_UPSTREAM)
> > > +		rcrb += SZ_4K;
> > > +
> > > +	/*
> > > +	 * RCRB's BAR[0..1] point to component block containing CXL
> > > +	 * subsystem component registers. MEMBAR extraction follows
> > > +	 * the PCI Base spec here, esp. 64 bit extraction and memory
> > > +	 * ranges alignment (6.0, 7.5.1.2.1).
> > > +	 */
> > 
> > A request_mem_region() is needed here to ensure ownership and expected
> > sequencing of accessing the RCRB to locate the component registers, and
> > accessing the RCRB to manipulate the component registers. It also helps
> > to sanity check that the BIOS mapped an exclusive range for the RCRB.
> 
> Right, that is missing.
> 
> > 
> > > +	addr = ioremap(rcrb, PCI_BASE_ADDRESS_0 + SZ_8);
> > 
> > That PCI_BASE_ADDRESS_0 does not belong there. It ends up being benign
> > and forcing ioremap to map 12K instead of 8K, but it is a
> > config-register offset, not part of the RCRB size.
> 
> Note this is BAR0 + 8 bytes, not 8k, and it does not map the whole
> RCRB region but instead the first part of the config space up to
> including the 64 bit BAR.

Oh, sorry, yes, my mistake. However, there is not much value in mapping
less than 4K since all ioremap requests are rounded up to PAGE_SIZE.
Since an RCRB is only 4K per port lets just map the whole thing.

> > > +	if (!addr) {
> > > +		dev_err(dev, "Failed to map region %pr\n", addr);
> > > +		return CXL_RESOURCE_NONE;
> > > +	}
> > > +
> > > +	bar0 = readl(addr + PCI_BASE_ADDRESS_0);
> > > +	bar1 = readl(addr + PCI_BASE_ADDRESS_1);
> > > +	iounmap(addr);
> > 
> > ...corresponding release_mem_region() would go here.
> > 
> > > +
> > > +	/* sanity check */
> > > +	if (bar0 & (PCI_BASE_ADDRESS_MEM_TYPE_1M | PCI_BASE_ADDRESS_SPACE_IO))
> > > +		return CXL_RESOURCE_NONE;
> > 
> > I would have also expected:
> > 
> > - a sanity check for "Memory Space Enable" being set in the command
> >   register.
> 
> Ok.
> 
> > 
> > - an explicit check for 0xffffffff for the case when the upstream-port
> >   implements "no RCRB" mode.
> 
> Yes, I left support for this to a later patch, but it's better to
> check it here already and possibly fall back to reg loc DVSEC then.

Yeah, I think simply failing on 0xffffffff is sufficient for now.

> > 
> > - some check that BIOS initialized the BAR values post reset given these
> >   BARs are invisible to the PCI core resource assignment 
> 
> What check do you have in mind here? There is already the NULL check
> which would be the out-of-reset value.

I was thinking more along the lines of sanity checking that the
programmed RCRB range falls within the assigned MMIO space of the
host-bridge, but perhaps that is overkill since it would just be
validating self consistency between 2 BIOS provided values. Robustness
principle would say try to continue if those disagree.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 3/9] cxl/mem: Adjust cxl_mem_find_port() to find an RCH's port
  2022-11-15 13:12     ` Robert Richter
@ 2022-11-15 18:06       ` Dan Williams
  2022-11-17 18:13         ` Robert Richter
  0 siblings, 1 reply; 52+ messages in thread
From: Dan Williams @ 2022-11-15 18:06 UTC (permalink / raw)
  To: Robert Richter, Dan Williams
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang

Robert Richter wrote:
> On 14.11.22 15:45:19, Dan Williams wrote:
> > Robert Richter wrote:
> > > The PCIe software view of an RCH and RCD is different to VH mode. An
> > > RCD is paired with an RCH and shows up as RCiEP with a parent already
> > > pointing to a PCI bridge (struct pci_host_bridge). In contrast, in VH
> > > mode an PCI Express Endpoint is a PCI type 0 device with a PCI type 1
> > > device as parent (struct pci_dev, most of the time a downstream switch
> > > port, but could also be a root port). The following hierarchy applies
> > > in VH mode:
> > > 
> > >  CXL memory device, cxl_memdev                               endpoint
> > >  └──PCIe Endpoint (type 0), pci_dev                           |
> > >     └──Downstream Port (type 1), pci_dev (Nth switch)        portN
> > >        └──Upstream Port (type 1), pci_dev (Nth switch)        |
> > >           :                                                   :
> > >           └──Downstream Port (type 1), pci_dev (1st switch)  port1
> > >              └──Upstream Port (type 1), pci_dev (1st switch)  |
> > >                 └──Root Port (type 1), pci_dev                |
> > >                    └──PCI host bridge, pci_host_bridge       port0
> > >                       :                                       |
> > >                       :..ACPI0017, acpi_dev                  root
> > > 
> > >  (There can be zero or any other number of switches in between.)
> > > 
> > > An iterator through the grandparents takes us to the root port which
> > > is registered as dport to the bridge. The next port an endpoint is
> > > connected to can be determined by using the grandparent of the memory
> > > device as a dport_dev in cxl_mem_find_port().
> > > 
> > > The same does not work in RCD mode where only an RCiEP is connected to
> > > the host bridge:
> > > 
> > >  CXL memory device, cxl_memdev                               endpoint
> > >  └──PCIe Endpoint (type 0), pci_dev                           |
> > >     └──PCI host bridge, pci_host_bridge                      port0
> > >        :                                                      |
> > >        :..ACPI0017, acpi_dev                                 root
> > > 
> > > Here, an endpoint is directly connected to the host bridge without a
> > > type 1 PCI device (root or downstream port) in between. To link the
> > > endpoint to the correct port, the endpoint's PCI device (parent of the
> > > memory device) must be taken as dport_dev arg in cxl_mem_find_port().
> > > 
> > > Change cxl_mem_find_port() to find an RCH's port.
> > > 
> > > Signed-off-by: Robert Richter <rrichter@amd.com>
> > > ---
> > >  drivers/cxl/core/port.c | 38 ++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 38 insertions(+)
> > > 
> > > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > > index 0431ed860d8e..d10c3580719b 100644
> > > --- a/drivers/cxl/core/port.c
> > > +++ b/drivers/cxl/core/port.c
> > > @@ -1354,6 +1354,14 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd,
> > >  	return rc;
> > >  }
> > >  
> > > +static inline bool is_cxl_restricted(struct cxl_memdev *cxlmd)
> > > +{
> > > +	struct device *parent = cxlmd->dev.parent;
> > > +	if (!dev_is_pci(parent))
> > > +		return false;
> > > +	return pci_pcie_type(to_pci_dev(parent)) == PCI_EXP_TYPE_RC_END;
> > > +}
> > > +
> > >  int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> > >  {
> > >  	struct device *dev = &cxlmd->dev;
> > > @@ -1433,9 +1441,39 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> > >  }
> > >  EXPORT_SYMBOL_NS_GPL(devm_cxl_enumerate_ports, CXL);
> > >  
> > > +/*
> > > + * CXL memory device and port hierarchy:
> > > + *
> > > + * VH mode:
> > > + *
> > > + * CXL memory device, cxl_memdev                               endpoint
> > > + * └──PCIe Endpoint (type 0), pci_dev                           |
> > > + *    └──Downstream Port (type 1), pci_dev (Nth switch)        portN
> > > + *       └──Upstream Port (type 1), pci_dev (Nth switch)        |
> > > + *          :                                                   :
> > > + *          └──Downstream Port (type 1), pci_dev (1st switch)  port1
> > > + *             └──Upstream Port (type 1), pci_dev (1st switch)  |
> > > + *                └──Root Port (type 1), pci_dev                |
> > > + *                   └──PCI host bridge, pci_host_bridge       port0
> > > + *                      :                                       |
> > > + *                      :..ACPI0017, acpi_dev                  root
> > > + *
> > > + * (There can be zero or any other number of switches in between.)
> > > + *
> > > + * RCD mode:
> > > + *
> > > + * CXL memory device, cxl_memdev                               endpoint
> > > + * └──PCIe Endpoint (type 0), pci_dev                           |
> > > + *    └──PCI host bridge, pci_host_bridge                      port0
> > > + *       :                                                      |
> > > + *       :..ACPI0017, acpi_dev                                 root
> > > + */
> > >  struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd,
> > >  				   struct cxl_dport **dport)
> > >  {
> > > +	if (is_cxl_restricted(cxlmd))
> > > +		return find_cxl_port(cxlmd->dev.parent, dport);
> > > +
> > >  	return find_cxl_port(grandparent(&cxlmd->dev), dport);
> > 
> > I do not see why this change is needed. For example:
> > 
> > # readlink -f /sys/bus/cxl/devices/mem0
> > /sys/devices/pci0000:38/0000:38:00.0/mem0
> > # cxl list -BT
> > [
> >   {
> >     "bus":"root0",
> >     "provider":"ACPI.CXL",
> >     "nr_dports":1,
> >     "dports":[
> >       {
> >         "dport":"pci0000:38",
> >         "id":49
> >       }
> >     ]
> >   }
> > ]
> > 
> > ...so, in this case, the grandparent of "mem0" is "pci0000:38", and
> > "pci0000:38" is a dport. Unmodified cxl_mem_find_port() will do the
> > right thing and find that this CXL RCIEP is directly connected to
> > "root0".
> 
> find_cxl_port() uses the dport_dev, not the uport_dev. A lookup of
> pci0000:38 gives the cxl root (ACPI.CXL).

...but that is what I would expect. I.e. that RCDs appear directly
connected to the cxl root with no intervening cxl_port instance, and RCH
host-bridges only serve the role of dport devices. This also matches the
diagram from 9.11.8 CXL Devices Attached to an RCH where a
downstream-port RCRB in the host bridge is directly connected to the
RCIEP endpoint.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 4/9] cxl/mem: Skip intermediate port enumeration of restricted endpoints (RCDs)
  2022-11-15 13:17     ` Robert Richter
@ 2022-11-15 18:08       ` Dan Williams
  2022-11-17 18:46         ` Robert Richter
  0 siblings, 1 reply; 52+ messages in thread
From: Dan Williams @ 2022-11-15 18:08 UTC (permalink / raw)
  To: Robert Richter, Dan Williams
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang

Robert Richter wrote:
> On 14.11.22 16:07:58, Dan Williams wrote:
> > Robert Richter wrote:
> > > When an endpoint is found, all ports in beetween the endpoint and the
> > > CXL host bridge need to be created. In the RCH case there are no ports
> > > in between a host bridge and the endpoint. Skip the enumeration of
> > > intermediate ports.
> > > 
> > > The port enumeration does not only create all ports, it also
> > > initializes the endpoint chain by adding the endpoint to every
> > > downstream port up to the root bridge. This must be done also in RCD
> > > mode, but is much more simple as the endpoint only needs to be added
> > > to the host bridge's dport.
> > > 
> > > Note: For endpoint removal the cxl_detach_ep() is not needed as it is
> > > released in cxl_port_release().
> > > 
> > > Signed-off-by: Robert Richter <rrichter@amd.com>
> > > ---
> > >  drivers/cxl/core/port.c | 18 ++++++++++++++++--
> > >  1 file changed, 16 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > > index d10c3580719b..e21a9c3fe4da 100644
> > > --- a/drivers/cxl/core/port.c
> > > +++ b/drivers/cxl/core/port.c
> > > @@ -1366,8 +1366,24 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> > >  {
> > >  	struct device *dev = &cxlmd->dev;
> > >  	struct device *iter;
> > > +	struct cxl_dport *dport;
> > > +	struct cxl_port *port;
> > >  	int rc;
> > >  
> > > +	/*
> > > +	 * Skip intermediate port enumeration in the RCH case, there
> > > +	 * are no ports in between a host bridge and an endpoint. Only
> > > +	 * initialize the EP chain.
> > > +	 */
> > > +	if (is_cxl_restricted(cxlmd)) {
> > 
> > I changed this to:
> > 
> > 	if (cxlmd->cxlds->rcd) {
> 
> I a mail to Bjorn I suggested to have cxl_dev_cap and a cxl_port_cap
> in the pci_dev struct that could be looked up too including RCD mode.
> Checking the pci_dev looks more reasonable to me here, though we could
> have a flag of it in cxlds too.

Would that not need the PCI core to understand how to walk the RCRB
generically? As far as I understand the RCRB association is ACPI.CEDT
specific.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 4/9] cxl/mem: Skip intermediate port enumeration of restricted endpoints (RCDs)
  2022-11-15 13:28     ` Robert Richter
@ 2022-11-15 18:09       ` Dan Williams
  0 siblings, 0 replies; 52+ messages in thread
From: Dan Williams @ 2022-11-15 18:09 UTC (permalink / raw)
  To: Robert Richter, Dan Williams
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang

Robert Richter wrote:
> On 14.11.22 16:24:06, Dan Williams wrote:
> > Robert Richter wrote:
> > > When an endpoint is found, all ports in beetween the endpoint and the
> > > CXL host bridge need to be created. In the RCH case there are no ports
> > > in between a host bridge and the endpoint. Skip the enumeration of
> > > intermediate ports.
> > > 
> > > The port enumeration does not only create all ports, it also
> > > initializes the endpoint chain by adding the endpoint to every
> > > downstream port up to the root bridge. This must be done also in RCD
> > > mode, but is much more simple as the endpoint only needs to be added
> > > to the host bridge's dport.
> > > 
> > > Note: For endpoint removal the cxl_detach_ep() is not needed as it is
> > > released in cxl_port_release().
> > > 
> > > Signed-off-by: Robert Richter <rrichter@amd.com>
> > > ---
> > >  drivers/cxl/core/port.c | 18 ++++++++++++++++--
> > >  1 file changed, 16 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > > index d10c3580719b..e21a9c3fe4da 100644
> > > --- a/drivers/cxl/core/port.c
> > > +++ b/drivers/cxl/core/port.c
> > > @@ -1366,8 +1366,24 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> > >  {
> > >  	struct device *dev = &cxlmd->dev;
> > >  	struct device *iter;
> > > +	struct cxl_dport *dport;
> > > +	struct cxl_port *port;
> > >  	int rc;
> > >  
> > > +	/*
> > > +	 * Skip intermediate port enumeration in the RCH case, there
> > > +	 * are no ports in between a host bridge and an endpoint. Only
> > > +	 * initialize the EP chain.
> > > +	 */
> > > +	if (is_cxl_restricted(cxlmd)) {
> > > +		port = cxl_mem_find_port(cxlmd, &dport);
> > > +		if (!port)
> > > +			return -ENXIO;
> > > +		rc = cxl_add_ep(dport, &cxlmd->dev);
> > 
> > On second look, this seems problematic. While yes it will be deleted
> > when the root CXL port dies, it will not be deleted if the cxl_pci
> > driver is reloaded. I will code up a unit test to double check.
> > 
> > I note that cxl_add_ep() for the VH case is skipped for the root CXL
> > port, so I do not suspect it is needed here either. Did you add it for a
> > specific reason?
> 
> Yes, all endpoint iterators need to be reworked. Also true, the first
> endpoint is skipped in the chain. So only intermediate EP structs are
> touched by the loops actually.
> 
> In particular, cxl_ep_load() returned NULL for the first lookup if the
> ep is missing. We could stop the iteration then. I tried to avoid a
> rework here, but maybe it is not too extensive as I expected first.

Hmm, ok, let me get the unit test topology working for this to make sure
my assumptions are correct about when an @ep reference is used / needed.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [PATCH v3 5/9] cxl/pci: Only register RCDs with device 0, function 0 as CXL memory device
  2022-11-09 10:40 ` [PATCH v3 5/9] cxl/pci: Only register RCDs with device 0, function 0 as CXL memory device Robert Richter
@ 2022-11-16 19:24   ` Dan Williams
  2022-11-17 15:56     ` Robert Richter
  0 siblings, 1 reply; 52+ messages in thread
From: Dan Williams @ 2022-11-16 19:24 UTC (permalink / raw)
  To: Robert Richter, Alison Schofield, Vishal Verma, Ira Weiny,
	Ben Widawsky, Dan Williams
  Cc: linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang,
	Robert Richter

Robert Richter wrote:
> The Device 0, Function 0 DVSEC controls the CXL functionality of the
> entire device. Add a check to prevent registration of any other PCI
> device on the bus as a CXL memory device.

Can you reference the specification wording that indicates that the OS
needs to actively avoid these situations, or otherwise point to the real
world scenario where this filtering is needed?

> 
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
>  drivers/cxl/pci.c | 25 +++++++++++++++++++++++--
>  1 file changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index faeb5d9d7a7a..cc4f206f24b3 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -428,11 +428,26 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
>  	}
>  }
>  
> +static int check_restricted_device(struct pci_dev *pdev, u16 pcie_dvsec)
> +{
> +	if (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_END)
> +		return 0;		/* no RCD */
> +
> +	if (pdev->devfn == PCI_DEVFN(0, 0) && pcie_dvsec)
> +		return 0;		/* ok */
> +
> +	dev_warn(&pdev->dev, "Skipping RCD: devfn=0x%02x dvsec=%u\n",

s/0x%02x/%#02x/

> +		pdev->devfn, pcie_dvsec);

This looks like a dev_dbg() to me. Otherwise a warning will always fire
on a benign condition.

> +
> +	return -ENODEV;
> +}
> +
>  static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  {
>  	struct cxl_register_map map;
>  	struct cxl_memdev *cxlmd;
>  	struct cxl_dev_state *cxlds;
> +	u16 pcie_dvsec;
>  	int rc;
>  
>  	/*
> @@ -442,6 +457,13 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  	BUILD_BUG_ON(offsetof(struct cxl_regs, memdev) !=
>  		     offsetof(struct cxl_regs, device_regs.memdev));
>  
> +	pcie_dvsec = pci_find_dvsec_capability(
> +		pdev, PCI_DVSEC_VENDOR_ID_CXL, CXL_DVSEC_PCIE_DEVICE);
> +
> +	rc = check_restricted_device(pdev, pcie_dvsec);
> +	if (rc)
> +		return rc;
> +
>  	rc = pcim_enable_device(pdev);
>  	if (rc)
>  		return rc;
> @@ -451,8 +473,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  		return PTR_ERR(cxlds);
>  
>  	cxlds->serial = pci_get_dsn(pdev);
> -	cxlds->cxl_dvsec = pci_find_dvsec_capability(
> -		pdev, PCI_DVSEC_VENDOR_ID_CXL, CXL_DVSEC_PCIE_DEVICE);
> +	cxlds->cxl_dvsec = pcie_dvsec;
>  	if (!cxlds->cxl_dvsec)
>  		dev_warn(&pdev->dev,
>  			 "Device DVSEC not present, skip CXL.mem init\n");
> -- 
> 2.30.2
> 



^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [PATCH v3 6/9] cxl/pci: Do not ignore PCI config read errors in match_add_dports()
  2022-11-09 10:40 ` [PATCH v3 6/9] cxl/pci: Do not ignore PCI config read errors in match_add_dports() Robert Richter
  2022-11-09 23:09   ` Bjorn Helgaas
@ 2022-11-16 19:36   ` Dan Williams
  1 sibling, 0 replies; 52+ messages in thread
From: Dan Williams @ 2022-11-16 19:36 UTC (permalink / raw)
  To: Robert Richter, Alison Schofield, Vishal Verma, Ira Weiny,
	Ben Widawsky, Dan Williams
  Cc: linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang,
	Robert Richter

Robert Richter wrote:
> The link capabilities of a PCI device are read when enumerating its
> dports. This is done by reading the PCI config space. If that fails
> port enumeration ignores that error. However, reading the PCI config
> space should reliably work.
> 
> To reduce some complexity to the code flow when factoring out parts of
> the code in match_add_dports() for later reuse, change this to throw
> an error.
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
>  drivers/cxl/core/pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 0dbbe8d39b07..8271b8abde7a 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -47,7 +47,7 @@ static int match_add_dports(struct pci_dev *pdev, void *data)

match_add_dports() never comes into play in the RCH topology case. There
are no switch ports to handle and CXL host-bridges are only ever dports
in the RCH case.

I will post the cxl_test enabling for an RCH topology so we can compare
notes there.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [PATCH v3 7/9] cxl/pci: Factor out code in match_add_dports() to pci_dev_add_dport()
  2022-11-09 10:40 ` [PATCH v3 7/9] cxl/pci: Factor out code in match_add_dports() to pci_dev_add_dport() Robert Richter
@ 2022-11-16 19:37   ` Dan Williams
  0 siblings, 0 replies; 52+ messages in thread
From: Dan Williams @ 2022-11-16 19:37 UTC (permalink / raw)
  To: Robert Richter, Alison Schofield, Vishal Verma, Ira Weiny,
	Ben Widawsky, Dan Williams
  Cc: linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang,
	Robert Richter

Robert Richter wrote:
> Factor out the code to register a PCI device's dport to a port. It
> will be reused to implement RCD mode.

Per the comment on [PATCH v3 6/9] I am not seeing why this is needed.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [PATCH v3 8/9] cxl/pci: Extend devm_cxl_port_enumerate_dports() to support restricted hosts (RCH)
  2022-11-09 10:40 ` [PATCH v3 8/9] cxl/pci: Extend devm_cxl_port_enumerate_dports() to support restricted hosts (RCH) Robert Richter
  2022-11-11 11:59   ` Robert Richter
@ 2022-11-16 20:55   ` Dan Williams
  1 sibling, 0 replies; 52+ messages in thread
From: Dan Williams @ 2022-11-16 20:55 UTC (permalink / raw)
  To: Robert Richter, Alison Schofield, Vishal Verma, Ira Weiny,
	Ben Widawsky, Dan Williams
  Cc: linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang,
	Robert Richter

Robert Richter wrote:
> The PCIe Software View of an RCH and RCD is different to VH mode. An
> RCD is paired with an RCH and shows up as RCiEP. Its downstream and
> upstream ports are hidden to the PCI hierarchy. This different PCI
> topology requires a different handling of RCHs.
> 
> Extend devm_cxl_port_enumerate_dports() to support restricted hosts
> (RCH). If an RCH is detected, register its port as dport to the
> device. An RCH is found if the host's dev 0 func 0 devices is an RCiEP
> with an existing PCIe DVSEC for CXL Devices (ID 0000h).

It is not clear to me what this extra dport represents. Here are the
Linux CXL objects I see in a VH vs an RCH topology:

               VH
          ┌──────────┐
          │ ACPI0017 │
          │  root0   │
          └─────┬────┘
                │
          ┌─────┴────┐
          │  dport0  │
    ┌─────┤ ACPI0016 ├─────┐
    │     │  port1   │     │
    │     └────┬─────┘     │
    │          │           │
 ┌──┴───┐   ┌──┴───┐   ┌───┴──┐
 │dport0│   │dport1│   │dport2│
 │ RP0  │   │ RP1  │   │ RP2  │
 └──────┘   └──┬───┘   └──────┘
               │
           ┌───┴─────┐
           │endpoint0│
           │  port2  │
           └─────────┘


              RCH
          ┌──────────┐
          │ ACPI0017 │
          │  root0   │
          └────┬─────┘
               │
           ┌───┴────┐
           │ dport0 │
           │ACPI0016│
           └───┬────┘
               │
          ┌────┴─────┐
          │endpoint0 │
          │  port1   │
          └──────────┘

So in the RCH case the only dport is the dport that root0 targets, and
then that dport is directly connected to the RCIEP endpoint-port.

In the VH case another level of dports are needed to route from root0 to
the fan out across the CXL root ports.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 2/9] cxl/acpi: Extract component registers of restricted hosts from RCRB
  2022-11-15 17:54       ` Dan Williams
@ 2022-11-17 12:43         ` Robert Richter
  2022-11-17 17:20           ` Dan Williams
  0 siblings, 1 reply; 52+ messages in thread
From: Robert Richter @ 2022-11-17 12:43 UTC (permalink / raw)
  To: Dan Williams
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang,
	Terry Bowman

On 15.11.22 09:54:16, Dan Williams wrote:
> Robert Richter wrote:
> > On 14.11.22 13:30:01, Dan Williams wrote:
> > > Robert Richter wrote:
> > 
> > > > diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
> > > > index ec178e69b18f..7a5bde81e949 100644
> > > > --- a/drivers/cxl/core/regs.c
> > > > +++ b/drivers/cxl/core/regs.c
> > > > @@ -307,3 +307,49 @@ int cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type,
> > > >  	return -ENODEV;
> > > >  }
> > > >  EXPORT_SYMBOL_NS_GPL(cxl_find_regblock, CXL);
> > > > +
> > > > +resource_size_t cxl_rcrb_to_component(struct device *dev,
> > > > +				      resource_size_t rcrb,
> > > > +				      enum cxl_rcrb which)
> > > > +{
> > > > +	resource_size_t component_reg_phys;
> > > > +	u32 bar0, bar1;
> > > > +	void *addr;
> > > > +
> > > > +	if (which == CXL_RCRB_UPSTREAM)
> > > > +		rcrb += SZ_4K;
> > > > +
> > > > +	/*
> > > > +	 * RCRB's BAR[0..1] point to component block containing CXL
> > > > +	 * subsystem component registers. MEMBAR extraction follows
> > > > +	 * the PCI Base spec here, esp. 64 bit extraction and memory
> > > > +	 * ranges alignment (6.0, 7.5.1.2.1).
> > > > +	 */
> > > 
> > > A request_mem_region() is needed here to ensure ownership and expected
> > > sequencing of accessing the RCRB to locate the component registers, and
> > > accessing the RCRB to manipulate the component registers. It also helps
> > > to sanity check that the BIOS mapped an exclusive range for the RCRB.
> > 
> > Right, that is missing.
> > 
> > > 
> > > > +	addr = ioremap(rcrb, PCI_BASE_ADDRESS_0 + SZ_8);
> > > 
> > > That PCI_BASE_ADDRESS_0 does not belong there. It ends up being benign
> > > and forcing ioremap to map 12K instead of 8K, but it is a
> > > config-register offset, not part of the RCRB size.
> > 
> > Note this is BAR0 + 8 bytes, not 8k, and it does not map the whole
> > RCRB region but instead the first part of the config space up to
> > including the 64 bit BAR.
> 
> Oh, sorry, yes, my mistake. However, there is not much value in mapping
> less than 4K since all ioremap requests are rounded up to PAGE_SIZE.
> Since an RCRB is only 4K per port lets just map the whole thing.

I was going to keep the ranges small to avoid conflicts with other
requests for the same page (though request_mem_region() was missing
yet).

> 
> > > > +	if (!addr) {
> > > > +		dev_err(dev, "Failed to map region %pr\n", addr);
> > > > +		return CXL_RESOURCE_NONE;
> > > > +	}
> > > > +
> > > > +	bar0 = readl(addr + PCI_BASE_ADDRESS_0);
> > > > +	bar1 = readl(addr + PCI_BASE_ADDRESS_1);
> > > > +	iounmap(addr);
> > > 
> > > ...corresponding release_mem_region() would go here.
> > > 
> > > > +
> > > > +	/* sanity check */
> > > > +	if (bar0 & (PCI_BASE_ADDRESS_MEM_TYPE_1M | PCI_BASE_ADDRESS_SPACE_IO))
> > > > +		return CXL_RESOURCE_NONE;
> > > 
> > > I would have also expected:
> > > 
> > > - a sanity check for "Memory Space Enable" being set in the command
> > >   register.
> > 
> > Ok.
> > 
> > > 
> > > - an explicit check for 0xffffffff for the case when the upstream-port
> > >   implements "no RCRB" mode.
> > 
> > Yes, I left support for this to a later patch, but it's better to
> > check it here already and possibly fall back to reg loc DVSEC then.
> 
> Yeah, I think simply failing on 0xffffffff is sufficient for now.
> 
> > > 
> > > - some check that BIOS initialized the BAR values post reset given these
> > >   BARs are invisible to the PCI core resource assignment 
> > 
> > What check do you have in mind here? There is already the NULL check
> > which would be the out-of-reset value.
> 
> I was thinking more along the lines of sanity checking that the
> programmed RCRB range falls within the assigned MMIO space of the
> host-bridge, but perhaps that is overkill since it would just be
> validating self consistency between 2 BIOS provided values. Robustness
> principle would say try to continue if those disagree.

Ok, will drop a check here.

Thanks,

-Robert

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 5/9] cxl/pci: Only register RCDs with device 0, function 0 as CXL memory device
  2022-11-16 19:24   ` Dan Williams
@ 2022-11-17 15:56     ` Robert Richter
  2022-11-17 17:27       ` Dan Williams
  0 siblings, 1 reply; 52+ messages in thread
From: Robert Richter @ 2022-11-17 15:56 UTC (permalink / raw)
  To: Dan Williams
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang

On 16.11.22 11:24:48, Dan Williams wrote:
> Robert Richter wrote:
> > The Device 0, Function 0 DVSEC controls the CXL functionality of the
> > entire device. Add a check to prevent registration of any other PCI
> > device on the bus as a CXL memory device.
> 
> Can you reference the specification wording that indicates that the OS
> needs to actively avoid these situations, or otherwise point to the real
> world scenario where this filtering is needed?

CXL 3.0

8.1.3 PCIe DVSEC for CXL Device

"""
An RCD creates a new PCIe enumeration hierarchy. As such, it spawns a new Root Bus
and can expose one or more PCIe device numbers and function numbers at this bus
number. These are exposed as Root Complex Integrated Endpoints (RCiEP). The PCIe
Configuration Space of Device 0, Function 0 shall include the CXL PCIe DVSEC as shown
in Figure 8-1.
"""

"""
In either case, the capability, status, and control fields in Device 0, Function 0 DVSEC
control the CXL functionality of the entire device.
"""

There are some other occurrences. I think this is even true for VH
mode, as multiple CXL devices on the bus are exposed through multiple
DSPs or Root Ports.

Anyway, I limited this to an RCD only, esp. because its counterpart
would be missing and thus port mapping would fail otherwise. See
restricted_host_enumerate_dport() of this series.

> 
> > 
> > Signed-off-by: Robert Richter <rrichter@amd.com>
> > ---
> >  drivers/cxl/pci.c | 25 +++++++++++++++++++++++--
> >  1 file changed, 23 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > index faeb5d9d7a7a..cc4f206f24b3 100644
> > --- a/drivers/cxl/pci.c
> > +++ b/drivers/cxl/pci.c
> > @@ -428,11 +428,26 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
> >  	}
> >  }
> >  
> > +static int check_restricted_device(struct pci_dev *pdev, u16 pcie_dvsec)
> > +{
> > +	if (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_END)
> > +		return 0;		/* no RCD */
> > +
> > +	if (pdev->devfn == PCI_DEVFN(0, 0) && pcie_dvsec)
> > +		return 0;		/* ok */
> > +
> > +	dev_warn(&pdev->dev, "Skipping RCD: devfn=0x%02x dvsec=%u\n",
> 
> s/0x%02x/%#02x/
> 
> > +		pdev->devfn, pcie_dvsec);

Ok.

> This looks like a dev_dbg() to me. Otherwise a warning will always fire
> on a benign condition.

I have chosen dev_warn() here as this is a non-compliant unexpected
behavior of the device. There are no (legal) cases this may happen. I
suppose you are worried about spamming the console here, but that
error should be reported somewhere and thus being visible.

Note: There can be multiple devices on the bus, but those shouldn't
have a CXL mem dev class code and the devices shouldn't being probed
by cxl_pci_probe() which contains the check and later creates a cxlmd
dev.

-Robert



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 2/9] cxl/acpi: Extract component registers of restricted hosts from RCRB
  2022-11-17 12:43         ` Robert Richter
@ 2022-11-17 17:20           ` Dan Williams
  2022-11-17 18:25             ` Robert Richter
  0 siblings, 1 reply; 52+ messages in thread
From: Dan Williams @ 2022-11-17 17:20 UTC (permalink / raw)
  To: Robert Richter, Dan Williams
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang,
	Terry Bowman

Robert Richter wrote:
> On 15.11.22 09:54:16, Dan Williams wrote:
> > Robert Richter wrote:
> > > On 14.11.22 13:30:01, Dan Williams wrote:
> > > > Robert Richter wrote:
> > > 
> > > > > diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
> > > > > index ec178e69b18f..7a5bde81e949 100644
> > > > > --- a/drivers/cxl/core/regs.c
> > > > > +++ b/drivers/cxl/core/regs.c
> > > > > @@ -307,3 +307,49 @@ int cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type,
> > > > >  	return -ENODEV;
> > > > >  }
> > > > >  EXPORT_SYMBOL_NS_GPL(cxl_find_regblock, CXL);
> > > > > +
> > > > > +resource_size_t cxl_rcrb_to_component(struct device *dev,
> > > > > +				      resource_size_t rcrb,
> > > > > +				      enum cxl_rcrb which)
> > > > > +{
> > > > > +	resource_size_t component_reg_phys;
> > > > > +	u32 bar0, bar1;
> > > > > +	void *addr;
> > > > > +
> > > > > +	if (which == CXL_RCRB_UPSTREAM)
> > > > > +		rcrb += SZ_4K;
> > > > > +
> > > > > +	/*
> > > > > +	 * RCRB's BAR[0..1] point to component block containing CXL
> > > > > +	 * subsystem component registers. MEMBAR extraction follows
> > > > > +	 * the PCI Base spec here, esp. 64 bit extraction and memory
> > > > > +	 * ranges alignment (6.0, 7.5.1.2.1).
> > > > > +	 */
> > > > 
> > > > A request_mem_region() is needed here to ensure ownership and expected
> > > > sequencing of accessing the RCRB to locate the component registers, and
> > > > accessing the RCRB to manipulate the component registers. It also helps
> > > > to sanity check that the BIOS mapped an exclusive range for the RCRB.
> > > 
> > > Right, that is missing.
> > > 
> > > > 
> > > > > +	addr = ioremap(rcrb, PCI_BASE_ADDRESS_0 + SZ_8);
> > > > 
> > > > That PCI_BASE_ADDRESS_0 does not belong there. It ends up being benign
> > > > and forcing ioremap to map 12K instead of 8K, but it is a
> > > > config-register offset, not part of the RCRB size.
> > > 
> > > Note this is BAR0 + 8 bytes, not 8k, and it does not map the whole
> > > RCRB region but instead the first part of the config space up to
> > > including the 64 bit BAR.
> > 
> > Oh, sorry, yes, my mistake. However, there is not much value in mapping
> > less than 4K since all ioremap requests are rounded up to PAGE_SIZE.
> > Since an RCRB is only 4K per port lets just map the whole thing.
> 
> I was going to keep the ranges small to avoid conflicts with other
> requests for the same page (though request_mem_region() was missing
> yet).

What else will be conflicting the RCRB? Linux has never accessed an RCRB
in the past as far as I can see. If there is a conflict then we may need
to move this mapping to the PCI core so that it is managed like other
mmconf space.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 5/9] cxl/pci: Only register RCDs with device 0, function 0 as CXL memory device
  2022-11-17 15:56     ` Robert Richter
@ 2022-11-17 17:27       ` Dan Williams
  2022-11-18  8:27         ` Robert Richter
  0 siblings, 1 reply; 52+ messages in thread
From: Dan Williams @ 2022-11-17 17:27 UTC (permalink / raw)
  To: Robert Richter, Dan Williams
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang

Robert Richter wrote:
> On 16.11.22 11:24:48, Dan Williams wrote:
> > Robert Richter wrote:
> > > The Device 0, Function 0 DVSEC controls the CXL functionality of the
> > > entire device. Add a check to prevent registration of any other PCI
> > > device on the bus as a CXL memory device.
> > 
> > Can you reference the specification wording that indicates that the OS
> > needs to actively avoid these situations, or otherwise point to the real
> > world scenario where this filtering is needed?
> 
> CXL 3.0
> 
> 8.1.3 PCIe DVSEC for CXL Device
> 
> """
> An RCD creates a new PCIe enumeration hierarchy. As such, it spawns a new Root Bus
> and can expose one or more PCIe device numbers and function numbers at this bus
> number. These are exposed as Root Complex Integrated Endpoints (RCiEP). The PCIe
> Configuration Space of Device 0, Function 0 shall include the CXL PCIe DVSEC as shown
> in Figure 8-1.
> """
> 
> """
> In either case, the capability, status, and control fields in Device 0, Function 0 DVSEC
> control the CXL functionality of the entire device.
> """
> 
> There are some other occurrences. I think this is even true for VH
> mode, as multiple CXL devices on the bus are exposed through multiple
> DSPs or Root Ports.
> 
> Anyway, I limited this to an RCD only, esp. because its counterpart
> would be missing and thus port mapping would fail otherwise. See
> restricted_host_enumerate_dport() of this series.
> 
> > 
> > > 
> > > Signed-off-by: Robert Richter <rrichter@amd.com>
> > > ---
> > >  drivers/cxl/pci.c | 25 +++++++++++++++++++++++--
> > >  1 file changed, 23 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > > index faeb5d9d7a7a..cc4f206f24b3 100644
> > > --- a/drivers/cxl/pci.c
> > > +++ b/drivers/cxl/pci.c
> > > @@ -428,11 +428,26 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
> > >  	}
> > >  }
> > >  
> > > +static int check_restricted_device(struct pci_dev *pdev, u16 pcie_dvsec)
> > > +{
> > > +	if (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_END)
> > > +		return 0;		/* no RCD */
> > > +
> > > +	if (pdev->devfn == PCI_DEVFN(0, 0) && pcie_dvsec)
> > > +		return 0;		/* ok */
> > > +
> > > +	dev_warn(&pdev->dev, "Skipping RCD: devfn=0x%02x dvsec=%u\n",
> > 
> > s/0x%02x/%#02x/
> > 
> > > +		pdev->devfn, pcie_dvsec);
> 
> Ok.
> 
> > This looks like a dev_dbg() to me. Otherwise a warning will always fire
> > on a benign condition.
> 
> I have chosen dev_warn() here as this is a non-compliant unexpected
> behavior of the device. There are no (legal) cases this may happen. I
> suppose you are worried about spamming the console here, but that
> error should be reported somewhere and thus being visible.

There are so many spec illegal values and conditions that the driver
could checki, but does not. The reason I am poking here is why does the
driver need to be explicit about *this* illegal condition versus all the
other potential conditions? What is the practical end user impact if
Linux does not include this change? For example, if it is just one
vendor that made this mistake that can be an explicit quirk.

A dev_warn() is not necessary for simple quirks.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 3/9] cxl/mem: Adjust cxl_mem_find_port() to find an RCH's port
  2022-11-15 18:06       ` Dan Williams
@ 2022-11-17 18:13         ` Robert Richter
  0 siblings, 0 replies; 52+ messages in thread
From: Robert Richter @ 2022-11-17 18:13 UTC (permalink / raw)
  To: Dan Williams
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang

On 15.11.22 10:06:28, Dan Williams wrote:
> Robert Richter wrote:
> > On 14.11.22 15:45:19, Dan Williams wrote:
> > > Robert Richter wrote:
> > > > The PCIe software view of an RCH and RCD is different to VH mode. An
> > > > RCD is paired with an RCH and shows up as RCiEP with a parent already
> > > > pointing to a PCI bridge (struct pci_host_bridge). In contrast, in VH
> > > > mode an PCI Express Endpoint is a PCI type 0 device with a PCI type 1
> > > > device as parent (struct pci_dev, most of the time a downstream switch
> > > > port, but could also be a root port). The following hierarchy applies
> > > > in VH mode:
> > > > 
> > > >  CXL memory device, cxl_memdev                               endpoint
> > > >  └──PCIe Endpoint (type 0), pci_dev                           |
> > > >     └──Downstream Port (type 1), pci_dev (Nth switch)        portN
> > > >        └──Upstream Port (type 1), pci_dev (Nth switch)        |
> > > >           :                                                   :
> > > >           └──Downstream Port (type 1), pci_dev (1st switch)  port1
> > > >              └──Upstream Port (type 1), pci_dev (1st switch)  |
> > > >                 └──Root Port (type 1), pci_dev                |
> > > >                    └──PCI host bridge, pci_host_bridge       port0
> > > >                       :                                       |
> > > >                       :..ACPI0017, acpi_dev                  root
> > > > 
> > > >  (There can be zero or any other number of switches in between.)
> > > > 
> > > > An iterator through the grandparents takes us to the root port which
> > > > is registered as dport to the bridge. The next port an endpoint is
> > > > connected to can be determined by using the grandparent of the memory
> > > > device as a dport_dev in cxl_mem_find_port().
> > > > 
> > > > The same does not work in RCD mode where only an RCiEP is connected to
> > > > the host bridge:
> > > > 
> > > >  CXL memory device, cxl_memdev                               endpoint
> > > >  └──PCIe Endpoint (type 0), pci_dev                           |
> > > >     └──PCI host bridge, pci_host_bridge                      port0
> > > >        :                                                      |
> > > >        :..ACPI0017, acpi_dev                                 root
> > > > 
> > > > Here, an endpoint is directly connected to the host bridge without a
> > > > type 1 PCI device (root or downstream port) in between. To link the
> > > > endpoint to the correct port, the endpoint's PCI device (parent of the
> > > > memory device) must be taken as dport_dev arg in cxl_mem_find_port().
> > > > 
> > > > Change cxl_mem_find_port() to find an RCH's port.
> > > > 
> > > > Signed-off-by: Robert Richter <rrichter@amd.com>
> > > > ---
> > > >  drivers/cxl/core/port.c | 38 ++++++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 38 insertions(+)
> > > > 
> > > > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > > > index 0431ed860d8e..d10c3580719b 100644
> > > > --- a/drivers/cxl/core/port.c
> > > > +++ b/drivers/cxl/core/port.c
> > > > @@ -1354,6 +1354,14 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd,
> > > >  	return rc;
> > > >  }
> > > >  
> > > > +static inline bool is_cxl_restricted(struct cxl_memdev *cxlmd)
> > > > +{
> > > > +	struct device *parent = cxlmd->dev.parent;
> > > > +	if (!dev_is_pci(parent))
> > > > +		return false;
> > > > +	return pci_pcie_type(to_pci_dev(parent)) == PCI_EXP_TYPE_RC_END;
> > > > +}
> > > > +
> > > >  int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> > > >  {
> > > >  	struct device *dev = &cxlmd->dev;
> > > > @@ -1433,9 +1441,39 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> > > >  }
> > > >  EXPORT_SYMBOL_NS_GPL(devm_cxl_enumerate_ports, CXL);
> > > >  
> > > > +/*
> > > > + * CXL memory device and port hierarchy:
> > > > + *
> > > > + * VH mode:
> > > > + *
> > > > + * CXL memory device, cxl_memdev                               endpoint
> > > > + * └──PCIe Endpoint (type 0), pci_dev                           |
> > > > + *    └──Downstream Port (type 1), pci_dev (Nth switch)        portN
> > > > + *       └──Upstream Port (type 1), pci_dev (Nth switch)        |
> > > > + *          :                                                   :
> > > > + *          └──Downstream Port (type 1), pci_dev (1st switch)  port1
> > > > + *             └──Upstream Port (type 1), pci_dev (1st switch)  |
> > > > + *                └──Root Port (type 1), pci_dev                |
> > > > + *                   └──PCI host bridge, pci_host_bridge       port0
> > > > + *                      :                                       |
> > > > + *                      :..ACPI0017, acpi_dev                  root
> > > > + *
> > > > + * (There can be zero or any other number of switches in between.)
> > > > + *
> > > > + * RCD mode:
> > > > + *
> > > > + * CXL memory device, cxl_memdev                               endpoint
> > > > + * └──PCIe Endpoint (type 0), pci_dev                           |
> > > > + *    └──PCI host bridge, pci_host_bridge                      port0
> > > > + *       :                                                      |
> > > > + *       :..ACPI0017, acpi_dev                                 root
> > > > + */
> > > >  struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd,
> > > >  				   struct cxl_dport **dport)
> > > >  {
> > > > +	if (is_cxl_restricted(cxlmd))
> > > > +		return find_cxl_port(cxlmd->dev.parent, dport);
> > > > +
> > > >  	return find_cxl_port(grandparent(&cxlmd->dev), dport);
> > > 
> > > I do not see why this change is needed. For example:
> > > 
> > > # readlink -f /sys/bus/cxl/devices/mem0
> > > /sys/devices/pci0000:38/0000:38:00.0/mem0
> > > # cxl list -BT
> > > [
> > >   {
> > >     "bus":"root0",
> > >     "provider":"ACPI.CXL",
> > >     "nr_dports":1,
> > >     "dports":[
> > >       {
> > >         "dport":"pci0000:38",
> > >         "id":49
> > >       }
> > >     ]
> > >   }
> > > ]
> > > 
> > > ...so, in this case, the grandparent of "mem0" is "pci0000:38", and
> > > "pci0000:38" is a dport. Unmodified cxl_mem_find_port() will do the
> > > right thing and find that this CXL RCIEP is directly connected to
> > > "root0".
> > 
> > find_cxl_port() uses the dport_dev, not the uport_dev. A lookup of
> > pci0000:38 gives the cxl root (ACPI.CXL).
> 
> ...but that is what I would expect. I.e. that RCDs appear directly
> connected to the cxl root with no intervening cxl_port instance, and RCH
> host-bridges only serve the role of dport devices. This also matches the
> diagram from 9.11.8 CXL Devices Attached to an RCH where a
> downstream-port RCRB in the host bridge is directly connected to the
> RCIEP endpoint.

All devices connected to ACPI.CXL? IMO this needs to be bound to the
host bridge. The hierarchy should show a host/device mapping. In fact,
in a VH the root port is also not part of the mapping, instead the
host bridge shows up there. Also, there can be multiple RCHs.

-Robert

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 2/9] cxl/acpi: Extract component registers of restricted hosts from RCRB
  2022-11-17 17:20           ` Dan Williams
@ 2022-11-17 18:25             ` Robert Richter
  2022-11-17 19:23               ` Dan Williams
  0 siblings, 1 reply; 52+ messages in thread
From: Robert Richter @ 2022-11-17 18:25 UTC (permalink / raw)
  To: Dan Williams
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang,
	Terry Bowman

On 17.11.22 09:20:55, Dan Williams wrote:
> Robert Richter wrote:
> > On 15.11.22 09:54:16, Dan Williams wrote:
> > > Robert Richter wrote:
> > > > On 14.11.22 13:30:01, Dan Williams wrote:
> > > > > Robert Richter wrote:
> > > > 
> > > > > > diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
> > > > > > index ec178e69b18f..7a5bde81e949 100644
> > > > > > --- a/drivers/cxl/core/regs.c
> > > > > > +++ b/drivers/cxl/core/regs.c
> > > > > > @@ -307,3 +307,49 @@ int cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type,
> > > > > >  	return -ENODEV;
> > > > > >  }
> > > > > >  EXPORT_SYMBOL_NS_GPL(cxl_find_regblock, CXL);
> > > > > > +
> > > > > > +resource_size_t cxl_rcrb_to_component(struct device *dev,
> > > > > > +				      resource_size_t rcrb,
> > > > > > +				      enum cxl_rcrb which)
> > > > > > +{
> > > > > > +	resource_size_t component_reg_phys;
> > > > > > +	u32 bar0, bar1;
> > > > > > +	void *addr;
> > > > > > +
> > > > > > +	if (which == CXL_RCRB_UPSTREAM)
> > > > > > +		rcrb += SZ_4K;
> > > > > > +
> > > > > > +	/*
> > > > > > +	 * RCRB's BAR[0..1] point to component block containing CXL
> > > > > > +	 * subsystem component registers. MEMBAR extraction follows
> > > > > > +	 * the PCI Base spec here, esp. 64 bit extraction and memory
> > > > > > +	 * ranges alignment (6.0, 7.5.1.2.1).
> > > > > > +	 */
> > > > > 
> > > > > A request_mem_region() is needed here to ensure ownership and expected
> > > > > sequencing of accessing the RCRB to locate the component registers, and
> > > > > accessing the RCRB to manipulate the component registers. It also helps
> > > > > to sanity check that the BIOS mapped an exclusive range for the RCRB.
> > > > 
> > > > Right, that is missing.
> > > > 
> > > > > 
> > > > > > +	addr = ioremap(rcrb, PCI_BASE_ADDRESS_0 + SZ_8);
> > > > > 
> > > > > That PCI_BASE_ADDRESS_0 does not belong there. It ends up being benign
> > > > > and forcing ioremap to map 12K instead of 8K, but it is a
> > > > > config-register offset, not part of the RCRB size.
> > > > 
> > > > Note this is BAR0 + 8 bytes, not 8k, and it does not map the whole
> > > > RCRB region but instead the first part of the config space up to
> > > > including the 64 bit BAR.
> > > 
> > > Oh, sorry, yes, my mistake. However, there is not much value in mapping
> > > less than 4K since all ioremap requests are rounded up to PAGE_SIZE.
> > > Since an RCRB is only 4K per port lets just map the whole thing.
> > 
> > I was going to keep the ranges small to avoid conflicts with other
> > requests for the same page (though request_mem_region() was missing
> > yet).
> 
> What else will be conflicting the RCRB? Linux has never accessed an RCRB
> in the past as far as I can see. If there is a conflict then we may need
> to move this mapping to the PCI core so that it is managed like other
> mmconf space.

The capabilities (PCIe and DVSEC) could be used by various subsystems
and parts of the driver. I am thinking of the various RAS caps (UP,
DP, CXL, AER variants) that are accessed from different parts of the
driver. Of curse, access could be delegated but else there is the
option to directly map and access that parts. In the component reg
block we already see issues with that broad mappings.

-Robert

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 4/9] cxl/mem: Skip intermediate port enumeration of restricted endpoints (RCDs)
  2022-11-15 18:08       ` Dan Williams
@ 2022-11-17 18:46         ` Robert Richter
  0 siblings, 0 replies; 52+ messages in thread
From: Robert Richter @ 2022-11-17 18:46 UTC (permalink / raw)
  To: Dan Williams
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang

On 15.11.22 10:08:51, Dan Williams wrote:
> Robert Richter wrote:
> > On 14.11.22 16:07:58, Dan Williams wrote:
> > > Robert Richter wrote:
> > > > When an endpoint is found, all ports in beetween the endpoint and the
> > > > CXL host bridge need to be created. In the RCH case there are no ports
> > > > in between a host bridge and the endpoint. Skip the enumeration of
> > > > intermediate ports.
> > > > 
> > > > The port enumeration does not only create all ports, it also
> > > > initializes the endpoint chain by adding the endpoint to every
> > > > downstream port up to the root bridge. This must be done also in RCD
> > > > mode, but is much more simple as the endpoint only needs to be added
> > > > to the host bridge's dport.
> > > > 
> > > > Note: For endpoint removal the cxl_detach_ep() is not needed as it is
> > > > released in cxl_port_release().
> > > > 
> > > > Signed-off-by: Robert Richter <rrichter@amd.com>
> > > > ---
> > > >  drivers/cxl/core/port.c | 18 ++++++++++++++++--
> > > >  1 file changed, 16 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > > > index d10c3580719b..e21a9c3fe4da 100644
> > > > --- a/drivers/cxl/core/port.c
> > > > +++ b/drivers/cxl/core/port.c
> > > > @@ -1366,8 +1366,24 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> > > >  {
> > > >  	struct device *dev = &cxlmd->dev;
> > > >  	struct device *iter;
> > > > +	struct cxl_dport *dport;
> > > > +	struct cxl_port *port;
> > > >  	int rc;
> > > >  
> > > > +	/*
> > > > +	 * Skip intermediate port enumeration in the RCH case, there
> > > > +	 * are no ports in between a host bridge and an endpoint. Only
> > > > +	 * initialize the EP chain.
> > > > +	 */
> > > > +	if (is_cxl_restricted(cxlmd)) {
> > > 
> > > I changed this to:
> > > 
> > > 	if (cxlmd->cxlds->rcd) {
> > 
> > I a mail to Bjorn I suggested to have cxl_dev_cap and a cxl_port_cap
> > in the pci_dev struct that could be looked up too including RCD mode.
> > Checking the pci_dev looks more reasonable to me here, though we could
> > have a flag of it in cxlds too.
> 
> Would that not need the PCI core to understand how to walk the RCRB
> generically? As far as I understand the RCRB association is ACPI.CEDT
> specific.

I am thinking of doing some sort of cxl_setup_pci_dev(pdev) in
cxl_pci_probe() which extracts the caps and writes them into a struct
pci_dev. Possibly the CXL 68B Flit and VH Capable/Enable bit could be
used for RCD mode. But if that is not feasible for some reason a flag
somewhere could work too.

-Robert

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 2/9] cxl/acpi: Extract component registers of restricted hosts from RCRB
  2022-11-17 18:25             ` Robert Richter
@ 2022-11-17 19:23               ` Dan Williams
  2022-11-18  8:12                 ` Robert Richter
  0 siblings, 1 reply; 52+ messages in thread
From: Dan Williams @ 2022-11-17 19:23 UTC (permalink / raw)
  To: Robert Richter, Dan Williams
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang,
	Terry Bowman

Robert Richter wrote:
> On 17.11.22 09:20:55, Dan Williams wrote:
> > Robert Richter wrote:
> > > On 15.11.22 09:54:16, Dan Williams wrote:
> > > > Robert Richter wrote:
> > > > > On 14.11.22 13:30:01, Dan Williams wrote:
> > > > > > Robert Richter wrote:
> > > > > 
> > > > > > > diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
> > > > > > > index ec178e69b18f..7a5bde81e949 100644
> > > > > > > --- a/drivers/cxl/core/regs.c
> > > > > > > +++ b/drivers/cxl/core/regs.c
> > > > > > > @@ -307,3 +307,49 @@ int cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type,
> > > > > > >  	return -ENODEV;
> > > > > > >  }
> > > > > > >  EXPORT_SYMBOL_NS_GPL(cxl_find_regblock, CXL);
> > > > > > > +
> > > > > > > +resource_size_t cxl_rcrb_to_component(struct device *dev,
> > > > > > > +				      resource_size_t rcrb,
> > > > > > > +				      enum cxl_rcrb which)
> > > > > > > +{
> > > > > > > +	resource_size_t component_reg_phys;
> > > > > > > +	u32 bar0, bar1;
> > > > > > > +	void *addr;
> > > > > > > +
> > > > > > > +	if (which == CXL_RCRB_UPSTREAM)
> > > > > > > +		rcrb += SZ_4K;
> > > > > > > +
> > > > > > > +	/*
> > > > > > > +	 * RCRB's BAR[0..1] point to component block containing CXL
> > > > > > > +	 * subsystem component registers. MEMBAR extraction follows
> > > > > > > +	 * the PCI Base spec here, esp. 64 bit extraction and memory
> > > > > > > +	 * ranges alignment (6.0, 7.5.1.2.1).
> > > > > > > +	 */
> > > > > > 
> > > > > > A request_mem_region() is needed here to ensure ownership and expected
> > > > > > sequencing of accessing the RCRB to locate the component registers, and
> > > > > > accessing the RCRB to manipulate the component registers. It also helps
> > > > > > to sanity check that the BIOS mapped an exclusive range for the RCRB.
> > > > > 
> > > > > Right, that is missing.
> > > > > 
> > > > > > 
> > > > > > > +	addr = ioremap(rcrb, PCI_BASE_ADDRESS_0 + SZ_8);
> > > > > > 
> > > > > > That PCI_BASE_ADDRESS_0 does not belong there. It ends up being benign
> > > > > > and forcing ioremap to map 12K instead of 8K, but it is a
> > > > > > config-register offset, not part of the RCRB size.
> > > > > 
> > > > > Note this is BAR0 + 8 bytes, not 8k, and it does not map the whole
> > > > > RCRB region but instead the first part of the config space up to
> > > > > including the 64 bit BAR.
> > > > 
> > > > Oh, sorry, yes, my mistake. However, there is not much value in mapping
> > > > less than 4K since all ioremap requests are rounded up to PAGE_SIZE.
> > > > Since an RCRB is only 4K per port lets just map the whole thing.
> > > 
> > > I was going to keep the ranges small to avoid conflicts with other
> > > requests for the same page (though request_mem_region() was missing
> > > yet).
> > 
> > What else will be conflicting the RCRB? Linux has never accessed an RCRB
> > in the past as far as I can see. If there is a conflict then we may need
> > to move this mapping to the PCI core so that it is managed like other
> > mmconf space.
> 
> The capabilities (PCIe and DVSEC) could be used by various subsystems
> and parts of the driver. I am thinking of the various RAS caps (UP,
> DP, CXL, AER variants) that are accessed from different parts of the
> driver. Of curse, access could be delegated but else there is the
> option to directly map and access that parts. In the component reg
> block we already see issues with that broad mappings.

Sure, but lets cross that bridge when we get to that point. Something is
broken if these competing usages can not at least have their own page
mapping since that limits being able to hand out control across security
boundaries (like VMs or userspace). Any ioremap less than PAGE_SIZE is
somewhat suspect.

The cxl_port driver so far seems to be sufficient for owning the entire
component register space.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 2/9] cxl/acpi: Extract component registers of restricted hosts from RCRB
  2022-11-17 19:23               ` Dan Williams
@ 2022-11-18  8:12                 ` Robert Richter
  0 siblings, 0 replies; 52+ messages in thread
From: Robert Richter @ 2022-11-18  8:12 UTC (permalink / raw)
  To: Dan Williams
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang,
	Terry Bowman

On 17.11.22 11:23:16, Dan Williams wrote:
> Robert Richter wrote:
> > On 17.11.22 09:20:55, Dan Williams wrote:
> > > Robert Richter wrote:
> > > > On 15.11.22 09:54:16, Dan Williams wrote:
> > > > > Robert Richter wrote:
> > > > > > On 14.11.22 13:30:01, Dan Williams wrote:

> > > > > Oh, sorry, yes, my mistake. However, there is not much value in mapping
> > > > > less than 4K since all ioremap requests are rounded up to PAGE_SIZE.
> > > > > Since an RCRB is only 4K per port lets just map the whole thing.
> > > > 
> > > > I was going to keep the ranges small to avoid conflicts with other
> > > > requests for the same page (though request_mem_region() was missing
> > > > yet).
> > > 
> > > What else will be conflicting the RCRB? Linux has never accessed an RCRB
> > > in the past as far as I can see. If there is a conflict then we may need
> > > to move this mapping to the PCI core so that it is managed like other
> > > mmconf space.
> > 
> > The capabilities (PCIe and DVSEC) could be used by various subsystems
> > and parts of the driver. I am thinking of the various RAS caps (UP,
> > DP, CXL, AER variants) that are accessed from different parts of the
> > driver. Of curse, access could be delegated but else there is the
> > option to directly map and access that parts. In the component reg
> > block we already see issues with that broad mappings.
> 
> Sure, but lets cross that bridge when we get to that point. Something is
> broken if these competing usages can not at least have their own page
> mapping since that limits being able to hand out control across security
> boundaries (like VMs or userspace). Any ioremap less than PAGE_SIZE is
> somewhat suspect.
> 
> The cxl_port driver so far seems to be sufficient for owning the entire
> component register space.

Ok, I can change that.

Thanks,

-Robert

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 5/9] cxl/pci: Only register RCDs with device 0, function 0 as CXL memory device
  2022-11-17 17:27       ` Dan Williams
@ 2022-11-18  8:27         ` Robert Richter
  2022-11-18 16:55           ` Dan Williams
  0 siblings, 1 reply; 52+ messages in thread
From: Robert Richter @ 2022-11-18  8:27 UTC (permalink / raw)
  To: Dan Williams
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang

On 17.11.22 09:27:23, Dan Williams wrote:
> Robert Richter wrote:
> > On 16.11.22 11:24:48, Dan Williams wrote:
> > > Robert Richter wrote:
> > > > The Device 0, Function 0 DVSEC controls the CXL functionality of the
> > > > entire device. Add a check to prevent registration of any other PCI
> > > > device on the bus as a CXL memory device.
> > > 
> > > Can you reference the specification wording that indicates that the OS
> > > needs to actively avoid these situations, or otherwise point to the real
> > > world scenario where this filtering is needed?
> > 
> > CXL 3.0
> > 
> > 8.1.3 PCIe DVSEC for CXL Device
> > 
> > """
> > An RCD creates a new PCIe enumeration hierarchy. As such, it spawns a new Root Bus
> > and can expose one or more PCIe device numbers and function numbers at this bus
> > number. These are exposed as Root Complex Integrated Endpoints (RCiEP). The PCIe
> > Configuration Space of Device 0, Function 0 shall include the CXL PCIe DVSEC as shown
> > in Figure 8-1.
> > """
> > 
> > """
> > In either case, the capability, status, and control fields in Device 0, Function 0 DVSEC
> > control the CXL functionality of the entire device.
> > """
> > 
> > There are some other occurrences. I think this is even true for VH
> > mode, as multiple CXL devices on the bus are exposed through multiple
> > DSPs or Root Ports.
> > 
> > Anyway, I limited this to an RCD only, esp. because its counterpart
> > would be missing and thus port mapping would fail otherwise. See
> > restricted_host_enumerate_dport() of this series.
> > 
> > > 
> > > > 
> > > > Signed-off-by: Robert Richter <rrichter@amd.com>
> > > > ---
> > > >  drivers/cxl/pci.c | 25 +++++++++++++++++++++++--
> > > >  1 file changed, 23 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > > > index faeb5d9d7a7a..cc4f206f24b3 100644
> > > > --- a/drivers/cxl/pci.c
> > > > +++ b/drivers/cxl/pci.c
> > > > @@ -428,11 +428,26 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
> > > >  	}
> > > >  }
> > > >  
> > > > +static int check_restricted_device(struct pci_dev *pdev, u16 pcie_dvsec)
> > > > +{
> > > > +	if (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_END)
> > > > +		return 0;		/* no RCD */
> > > > +
> > > > +	if (pdev->devfn == PCI_DEVFN(0, 0) && pcie_dvsec)
> > > > +		return 0;		/* ok */
> > > > +
> > > > +	dev_warn(&pdev->dev, "Skipping RCD: devfn=0x%02x dvsec=%u\n",
> > > 
> > > s/0x%02x/%#02x/
> > > 
> > > > +		pdev->devfn, pcie_dvsec);
> > 
> > Ok.
> > 
> > > This looks like a dev_dbg() to me. Otherwise a warning will always fire
> > > on a benign condition.
> > 
> > I have chosen dev_warn() here as this is a non-compliant unexpected
> > behavior of the device. There are no (legal) cases this may happen. I
> > suppose you are worried about spamming the console here, but that
> > error should be reported somewhere and thus being visible.
> 
> There are so many spec illegal values and conditions that the driver
> could checki, but does not. The reason I am poking here is why does the
> driver need to be explicit about *this* illegal condition versus all the
> other potential conditions? What is the practical end user impact if
> Linux does not include this change? For example, if it is just one
> vendor that made this mistake that can be an explicit quirk.
> 
> A dev_warn() is not necessary for simple quirks.

This is not simply a cross check, the driver prevents enablement of
CXL mem devs other than PCI_DEVFN(0, 0). It shouldn't silently drop
out then. It's a device malfunction which should appropriate reported
and not only visible if dbg is enabled.

As written above, the check is necessary as the counterpart is missing
otherwise and init would fail later with a more obfuscating error
message.

-Robert

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 5/9] cxl/pci: Only register RCDs with device 0, function 0 as CXL memory device
  2022-11-18  8:27         ` Robert Richter
@ 2022-11-18 16:55           ` Dan Williams
  2022-11-18 19:53             ` Robert Richter
  0 siblings, 1 reply; 52+ messages in thread
From: Dan Williams @ 2022-11-18 16:55 UTC (permalink / raw)
  To: Robert Richter, Dan Williams
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang

Robert Richter wrote:
> On 17.11.22 09:27:23, Dan Williams wrote:
> > Robert Richter wrote:
> > > On 16.11.22 11:24:48, Dan Williams wrote:
> > > > Robert Richter wrote:
> > > > > The Device 0, Function 0 DVSEC controls the CXL functionality of the
> > > > > entire device. Add a check to prevent registration of any other PCI
> > > > > device on the bus as a CXL memory device.
> > > > 
> > > > Can you reference the specification wording that indicates that the OS
> > > > needs to actively avoid these situations, or otherwise point to the real
> > > > world scenario where this filtering is needed?
> > > 
> > > CXL 3.0
> > > 
> > > 8.1.3 PCIe DVSEC for CXL Device
> > > 
> > > """
> > > An RCD creates a new PCIe enumeration hierarchy. As such, it spawns a new Root Bus
> > > and can expose one or more PCIe device numbers and function numbers at this bus
> > > number. These are exposed as Root Complex Integrated Endpoints (RCiEP). The PCIe
> > > Configuration Space of Device 0, Function 0 shall include the CXL PCIe DVSEC as shown
> > > in Figure 8-1.
> > > """
> > > 
> > > """
> > > In either case, the capability, status, and control fields in Device 0, Function 0 DVSEC
> > > control the CXL functionality of the entire device.
> > > """
> > > 
> > > There are some other occurrences. I think this is even true for VH
> > > mode, as multiple CXL devices on the bus are exposed through multiple
> > > DSPs or Root Ports.
> > > 
> > > Anyway, I limited this to an RCD only, esp. because its counterpart
> > > would be missing and thus port mapping would fail otherwise. See
> > > restricted_host_enumerate_dport() of this series.
> > > 
> > > > 
> > > > > 
> > > > > Signed-off-by: Robert Richter <rrichter@amd.com>
> > > > > ---
> > > > >  drivers/cxl/pci.c | 25 +++++++++++++++++++++++--
> > > > >  1 file changed, 23 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > > > > index faeb5d9d7a7a..cc4f206f24b3 100644
> > > > > --- a/drivers/cxl/pci.c
> > > > > +++ b/drivers/cxl/pci.c
> > > > > @@ -428,11 +428,26 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
> > > > >  	}
> > > > >  }
> > > > >  
> > > > > +static int check_restricted_device(struct pci_dev *pdev, u16 pcie_dvsec)
> > > > > +{
> > > > > +	if (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_END)
> > > > > +		return 0;		/* no RCD */
> > > > > +
> > > > > +	if (pdev->devfn == PCI_DEVFN(0, 0) && pcie_dvsec)
> > > > > +		return 0;		/* ok */
> > > > > +
> > > > > +	dev_warn(&pdev->dev, "Skipping RCD: devfn=0x%02x dvsec=%u\n",
> > > > 
> > > > s/0x%02x/%#02x/
> > > > 
> > > > > +		pdev->devfn, pcie_dvsec);
> > > 
> > > Ok.
> > > 
> > > > This looks like a dev_dbg() to me. Otherwise a warning will always fire
> > > > on a benign condition.
> > > 
> > > I have chosen dev_warn() here as this is a non-compliant unexpected
> > > behavior of the device. There are no (legal) cases this may happen. I
> > > suppose you are worried about spamming the console here, but that
> > > error should be reported somewhere and thus being visible.
> > 
> > There are so many spec illegal values and conditions that the driver
> > could checki, but does not. The reason I am poking here is why does the
> > driver need to be explicit about *this* illegal condition versus all the
> > other potential conditions? What is the practical end user impact if
> > Linux does not include this change? For example, if it is just one
> > vendor that made this mistake that can be an explicit quirk.
> > 
> > A dev_warn() is not necessary for simple quirks.
> 
> This is not simply a cross check, the driver prevents enablement of
> CXL mem devs other than PCI_DEVFN(0, 0). It shouldn't silently drop
> out then. It's a device malfunction which should appropriate reported
> and not only visible if dbg is enabled.
> 
> As written above, the check is necessary as the counterpart is missing

It is only necessary if this condition happens in practice, not a
theoretically. So I am asking, are you seeing this with an actual device
that someone will use in production? If so, that's what pci quirks are
for to keep those workarounds organized in a common location.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 5/9] cxl/pci: Only register RCDs with device 0, function 0 as CXL memory device
  2022-11-18 16:55           ` Dan Williams
@ 2022-11-18 19:53             ` Robert Richter
  2022-11-18 20:30               ` Dan Williams
  0 siblings, 1 reply; 52+ messages in thread
From: Robert Richter @ 2022-11-18 19:53 UTC (permalink / raw)
  To: Dan Williams
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang

On 18.11.22 08:55:13, Dan Williams wrote:
> Robert Richter wrote:
> > On 17.11.22 09:27:23, Dan Williams wrote:
> > > Robert Richter wrote:
> > > > On 16.11.22 11:24:48, Dan Williams wrote:
> > > > > Robert Richter wrote:
> > > > > > The Device 0, Function 0 DVSEC controls the CXL functionality of the
> > > > > > entire device. Add a check to prevent registration of any other PCI
> > > > > > device on the bus as a CXL memory device.
> > > > > 
> > > > > Can you reference the specification wording that indicates that the OS
> > > > > needs to actively avoid these situations, or otherwise point to the real
> > > > > world scenario where this filtering is needed?
> > > > 
> > > > CXL 3.0
> > > > 
> > > > 8.1.3 PCIe DVSEC for CXL Device
> > > > 
> > > > """
> > > > An RCD creates a new PCIe enumeration hierarchy. As such, it spawns a new Root Bus
> > > > and can expose one or more PCIe device numbers and function numbers at this bus
> > > > number. These are exposed as Root Complex Integrated Endpoints (RCiEP). The PCIe
> > > > Configuration Space of Device 0, Function 0 shall include the CXL PCIe DVSEC as shown
> > > > in Figure 8-1.
> > > > """
> > > > 
> > > > """
> > > > In either case, the capability, status, and control fields in Device 0, Function 0 DVSEC
> > > > control the CXL functionality of the entire device.
> > > > """
> > > > 
> > > > There are some other occurrences. I think this is even true for VH
> > > > mode, as multiple CXL devices on the bus are exposed through multiple
> > > > DSPs or Root Ports.
> > > > 
> > > > Anyway, I limited this to an RCD only, esp. because its counterpart
> > > > would be missing and thus port mapping would fail otherwise. See
> > > > restricted_host_enumerate_dport() of this series.
> > > > 
> > > > > 
> > > > > > 
> > > > > > Signed-off-by: Robert Richter <rrichter@amd.com>
> > > > > > ---
> > > > > >  drivers/cxl/pci.c | 25 +++++++++++++++++++++++--
> > > > > >  1 file changed, 23 insertions(+), 2 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > > > > > index faeb5d9d7a7a..cc4f206f24b3 100644
> > > > > > --- a/drivers/cxl/pci.c
> > > > > > +++ b/drivers/cxl/pci.c
> > > > > > @@ -428,11 +428,26 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
> > > > > >  	}
> > > > > >  }
> > > > > >  
> > > > > > +static int check_restricted_device(struct pci_dev *pdev, u16 pcie_dvsec)
> > > > > > +{
> > > > > > +	if (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_END)
> > > > > > +		return 0;		/* no RCD */
> > > > > > +
> > > > > > +	if (pdev->devfn == PCI_DEVFN(0, 0) && pcie_dvsec)
> > > > > > +		return 0;		/* ok */
> > > > > > +
> > > > > > +	dev_warn(&pdev->dev, "Skipping RCD: devfn=0x%02x dvsec=%u\n",
> > > > > 
> > > > > s/0x%02x/%#02x/
> > > > > 
> > > > > > +		pdev->devfn, pcie_dvsec);
> > > > 
> > > > Ok.
> > > > 
> > > > > This looks like a dev_dbg() to me. Otherwise a warning will always fire
> > > > > on a benign condition.
> > > > 
> > > > I have chosen dev_warn() here as this is a non-compliant unexpected
> > > > behavior of the device. There are no (legal) cases this may happen. I
> > > > suppose you are worried about spamming the console here, but that
> > > > error should be reported somewhere and thus being visible.
> > > 
> > > There are so many spec illegal values and conditions that the driver
> > > could checki, but does not. The reason I am poking here is why does the
> > > driver need to be explicit about *this* illegal condition versus all the
> > > other potential conditions? What is the practical end user impact if
> > > Linux does not include this change? For example, if it is just one
> > > vendor that made this mistake that can be an explicit quirk.
> > > 
> > > A dev_warn() is not necessary for simple quirks.
> > 
> > This is not simply a cross check, the driver prevents enablement of
> > CXL mem devs other than PCI_DEVFN(0, 0). It shouldn't silently drop
> > out then. It's a device malfunction which should appropriate reported
> > and not only visible if dbg is enabled.
> > 
> > As written above, the check is necessary as the counterpart is missing
> 
> It is only necessary if this condition happens in practice, not a
> theoretically. So I am asking, are you seeing this with an actual device
> that someone will use in production? If so, that's what pci quirks are
> for to keep those workarounds organized in a common location.

I can make it a dev_dbg() message. But I do not understand the ratio
behind this. This is not a quirk nor a workaround or a fix for
something. The likely paths are the conditions checked that return 0.
Only if the unlikely case happens where a CXL mem dev is not a dev 0,
func 0, a warning is shown to inform the user that this dev is not
enabled. So yes, this might be theoretical similar to that a driver
cannot allocate memory. But why not print this as a warning message
then?

Anyway, let's make it a dev_dbg().

Thanks,

-Robert

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 5/9] cxl/pci: Only register RCDs with device 0, function 0 as CXL memory device
  2022-11-18 19:53             ` Robert Richter
@ 2022-11-18 20:30               ` Dan Williams
  0 siblings, 0 replies; 52+ messages in thread
From: Dan Williams @ 2022-11-18 20:30 UTC (permalink / raw)
  To: Robert Richter, Dan Williams
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	linux-cxl, linux-kernel, Bjorn Helgaas, Rafael J. Wysocki,
	Len Brown, Jonathan Cameron, Davidlohr Bueso, Dave Jiang

Robert Richter wrote:
> On 18.11.22 08:55:13, Dan Williams wrote:
> > Robert Richter wrote:
> > > On 17.11.22 09:27:23, Dan Williams wrote:
> > > > Robert Richter wrote:
> > > > > On 16.11.22 11:24:48, Dan Williams wrote:
> > > > > > Robert Richter wrote:
> > > > > > > The Device 0, Function 0 DVSEC controls the CXL functionality of the
> > > > > > > entire device. Add a check to prevent registration of any other PCI
> > > > > > > device on the bus as a CXL memory device.
> > > > > > 
> > > > > > Can you reference the specification wording that indicates that the OS
> > > > > > needs to actively avoid these situations, or otherwise point to the real
> > > > > > world scenario where this filtering is needed?
> > > > > 
> > > > > CXL 3.0
> > > > > 
> > > > > 8.1.3 PCIe DVSEC for CXL Device
> > > > > 
> > > > > """
> > > > > An RCD creates a new PCIe enumeration hierarchy. As such, it spawns a new Root Bus
> > > > > and can expose one or more PCIe device numbers and function numbers at this bus
> > > > > number. These are exposed as Root Complex Integrated Endpoints (RCiEP). The PCIe
> > > > > Configuration Space of Device 0, Function 0 shall include the CXL PCIe DVSEC as shown
> > > > > in Figure 8-1.
> > > > > """
> > > > > 
> > > > > """
> > > > > In either case, the capability, status, and control fields in Device 0, Function 0 DVSEC
> > > > > control the CXL functionality of the entire device.
> > > > > """
> > > > > 
> > > > > There are some other occurrences. I think this is even true for VH
> > > > > mode, as multiple CXL devices on the bus are exposed through multiple
> > > > > DSPs or Root Ports.
> > > > > 
> > > > > Anyway, I limited this to an RCD only, esp. because its counterpart
> > > > > would be missing and thus port mapping would fail otherwise. See
> > > > > restricted_host_enumerate_dport() of this series.
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > Signed-off-by: Robert Richter <rrichter@amd.com>
> > > > > > > ---
> > > > > > >  drivers/cxl/pci.c | 25 +++++++++++++++++++++++--
> > > > > > >  1 file changed, 23 insertions(+), 2 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > > > > > > index faeb5d9d7a7a..cc4f206f24b3 100644
> > > > > > > --- a/drivers/cxl/pci.c
> > > > > > > +++ b/drivers/cxl/pci.c
> > > > > > > @@ -428,11 +428,26 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
> > > > > > >  	}
> > > > > > >  }
> > > > > > >  
> > > > > > > +static int check_restricted_device(struct pci_dev *pdev, u16 pcie_dvsec)
> > > > > > > +{
> > > > > > > +	if (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_END)
> > > > > > > +		return 0;		/* no RCD */
> > > > > > > +
> > > > > > > +	if (pdev->devfn == PCI_DEVFN(0, 0) && pcie_dvsec)
> > > > > > > +		return 0;		/* ok */
> > > > > > > +
> > > > > > > +	dev_warn(&pdev->dev, "Skipping RCD: devfn=0x%02x dvsec=%u\n",
> > > > > > 
> > > > > > s/0x%02x/%#02x/
> > > > > > 
> > > > > > > +		pdev->devfn, pcie_dvsec);
> > > > > 
> > > > > Ok.
> > > > > 
> > > > > > This looks like a dev_dbg() to me. Otherwise a warning will always fire
> > > > > > on a benign condition.
> > > > > 
> > > > > I have chosen dev_warn() here as this is a non-compliant unexpected
> > > > > behavior of the device. There are no (legal) cases this may happen. I
> > > > > suppose you are worried about spamming the console here, but that
> > > > > error should be reported somewhere and thus being visible.
> > > > 
> > > > There are so many spec illegal values and conditions that the driver
> > > > could checki, but does not. The reason I am poking here is why does the
> > > > driver need to be explicit about *this* illegal condition versus all the
> > > > other potential conditions? What is the practical end user impact if
> > > > Linux does not include this change? For example, if it is just one
> > > > vendor that made this mistake that can be an explicit quirk.
> > > > 
> > > > A dev_warn() is not necessary for simple quirks.
> > > 
> > > This is not simply a cross check, the driver prevents enablement of
> > > CXL mem devs other than PCI_DEVFN(0, 0). It shouldn't silently drop
> > > out then. It's a device malfunction which should appropriate reported
> > > and not only visible if dbg is enabled.
> > > 
> > > As written above, the check is necessary as the counterpart is missing
> > 
> > It is only necessary if this condition happens in practice, not a
> > theoretically. So I am asking, are you seeing this with an actual device
> > that someone will use in production? If so, that's what pci quirks are
> > for to keep those workarounds organized in a common location.
> 
> I can make it a dev_dbg() message. But I do not understand the ratio
> behind this. This is not a quirk nor a workaround or a fix for
> something. The likely paths are the conditions checked that return 0.
> Only if the unlikely case happens where a CXL mem dev is not a dev 0,
> func 0, a warning is shown to inform the user that this dev is not
> enabled. So yes, this might be theoretical similar to that a driver
> cannot allocate memory. But why not print this as a warning message
> then?
> 
> Anyway, let's make it a dev_dbg().

Sorry for the thrash, lets set aside the the dev_dbg() vs dev_warn()
issue. It is minor compared to *why* this patch needs to be applied. I
would expect all production devices to be spec compliant and not
advertise the CXL memory device class code on anything but function0.

So either, there is a real threat that someone will build such a mistake
and Linux needs to take this action to protect itself, or no one will
ever build such a device and this patch is not needed.

Basically I read the changelog and it answered the "What?" question, but
it did not answer the "Why?" question.

^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2022-11-18 20:30 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-09 10:40 [PATCH v3 0/9] cxl: Add support for Restricted CXL hosts (RCD mode) Robert Richter
2022-11-09 10:40 ` [PATCH v3 1/9] cxl/acpi: Register CXL host ports by bridge device Robert Richter
2022-11-09 23:11   ` Bjorn Helgaas
2022-11-14 20:22   ` Dan Williams
2022-11-15 10:37     ` Robert Richter
2022-11-09 10:40 ` [PATCH v3 2/9] cxl/acpi: Extract component registers of restricted hosts from RCRB Robert Richter
2022-11-14 21:30   ` Dan Williams
2022-11-15 12:17     ` Robert Richter
2022-11-15 17:54       ` Dan Williams
2022-11-17 12:43         ` Robert Richter
2022-11-17 17:20           ` Dan Williams
2022-11-17 18:25             ` Robert Richter
2022-11-17 19:23               ` Dan Williams
2022-11-18  8:12                 ` Robert Richter
2022-11-09 10:40 ` [PATCH v3 3/9] cxl/mem: Adjust cxl_mem_find_port() to find an RCH's port Robert Richter
2022-11-14 23:45   ` Dan Williams
2022-11-15 13:12     ` Robert Richter
2022-11-15 18:06       ` Dan Williams
2022-11-17 18:13         ` Robert Richter
2022-11-09 10:40 ` [PATCH v3 4/9] cxl/mem: Skip intermediate port enumeration of restricted endpoints (RCDs) Robert Richter
2022-11-09 16:55   ` Dave Jiang
2022-11-15  0:07   ` Dan Williams
2022-11-15 13:17     ` Robert Richter
2022-11-15 18:08       ` Dan Williams
2022-11-17 18:46         ` Robert Richter
2022-11-15  0:24   ` Dan Williams
2022-11-15 13:28     ` Robert Richter
2022-11-15 18:09       ` Dan Williams
2022-11-09 10:40 ` [PATCH v3 5/9] cxl/pci: Only register RCDs with device 0, function 0 as CXL memory device Robert Richter
2022-11-16 19:24   ` Dan Williams
2022-11-17 15:56     ` Robert Richter
2022-11-17 17:27       ` Dan Williams
2022-11-18  8:27         ` Robert Richter
2022-11-18 16:55           ` Dan Williams
2022-11-18 19:53             ` Robert Richter
2022-11-18 20:30               ` Dan Williams
2022-11-09 10:40 ` [PATCH v3 6/9] cxl/pci: Do not ignore PCI config read errors in match_add_dports() Robert Richter
2022-11-09 23:09   ` Bjorn Helgaas
2022-11-11 11:56     ` Robert Richter
2022-11-11 12:07       ` Robert Richter
2022-11-16 19:36   ` Dan Williams
2022-11-09 10:40 ` [PATCH v3 7/9] cxl/pci: Factor out code in match_add_dports() to pci_dev_add_dport() Robert Richter
2022-11-16 19:37   ` Dan Williams
2022-11-09 10:40 ` [PATCH v3 8/9] cxl/pci: Extend devm_cxl_port_enumerate_dports() to support restricted hosts (RCH) Robert Richter
2022-11-11 11:59   ` Robert Richter
2022-11-16 20:55   ` Dan Williams
2022-11-09 10:40 ` [PATCH v3 9/9] cxl/acpi: Set ACPI's CXL _OSC to indicate CXL1.1 support Robert Richter
2022-11-09 12:22   ` Rafael J. Wysocki
2022-11-09 23:35   ` Bjorn Helgaas
2022-11-10  0:51     ` Verma, Vishal L
2022-11-10 17:10       ` Bjorn Helgaas
2022-11-10 19:43     ` Terry Bowman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.