Changes since v1 [1]:
- Add stronger rationale for the 'struct cxl_regs' arrangement that
  accommodates devices that compose register blocks mixing "component",
  "device", and "memory device" register sets. (Christoph)
- Reorganize root, address space, and port device creation into separate
  'alloc' and 'add' steps.
- Rename "upstream_port" to "port_host". This is the PCI or ACPI device
  that implements the CXL port capability.
- Rename @parent to @host in init routines to distinguish the device that
  is the context for devm from the device-model parent of the device
  being created.
- Rename port.cxl_regs_phys to port.component_regs_phys
- Add initialization for
  cxl_root.port.{target_id,component_regs_phys,port_host}, where
  target_id and component_regs_phys are set to invalid values (they do
  not apply to cxl_root, as the first port under root is the first actual
  device in the system) and port_host is the ACPI0017 device.

[1]: http://lore.kernel.org/r/161662142382.1723715.5934723983022398253.stgit@dwillia2-desk3.amr.corp.intel.com

---

This series is a preview of the proposed infrastructure for enabling
dynamic mapping of Host-managed Device Memory (HDM) Decoders. It
includes a not-for-upstream hack at the tail of the series to stand in
for the in-flight ACPICA enabling. The goal is to get review of the
proposal in parallel with other in-flight dependencies. The next step
after this is to add dynamic enumeration and assignment of HDM Decoders
in coordination with per-cxl_port driver instances.

---

The enumeration starts with the ACPI0017 driver registering a 'struct
cxl_root' object to establish the top of a cxl_port topology. It then
scans the ACPI bus looking for ACPI0016 instances. The cxl_root object
is a singleton* anchor to hang "address-space" objects and be a parent
device for the downstream 'struct cxl_port' instances. An address-space
has a 1:1 relationship with a platform defined memory resource range,
like _CRS for PCIE Host Bridges.
Use module parameters to model a root-level HDM decoder that all
downstream ports further decode, to be replaced with a Code First ECN to
do the same.

Each address space is modeled as a sysfs object that also shows up in
/proc/iomem as "CXL Address Space". That iomem resource is functionally
equivalent to the root-level 'PCI Bus' resources for PCIE.mmio while
'CXL Address Space' indicates space for CXL.mem to be mapped. "System
RAM" and "Persistent Memory", when mapped by HDM decoders, will appear
as child CXL.mem resources.

Once a 'struct cxl_root' is established the host bridge is modeled as 1
upstream 'struct cxl_port' and N downstream 'struct cxl_port' instances
(one per Root Port), just like a PCIE switch. The host-bridge upstream
port optionally has the HDM decoder registers from the CHBCR if the
host-bridge has multiple PCIE/CXL root ports. Single-ported host bridges
will not have HDM decoders in the CHBCR space (see CHBCR note in
8.2.5.12 CXL HDM Decoder Capability Structure), but the 'struct
cxl_port' object is still needed to represent other CXL capabilities and
access port-specific component registers outside of HDM decoders.

Each 'struct cxl_port' has a 'target_id' attribute that answers the
question "what port am I in my upstream port's HDM decoder target
list?". For the host-bridge struct cxl_port, the first tier of ports
below cxl_root.port, the id is derived from the ordinal mapping of the
ACPI0016 id (instance id, _UID, or other handle TBD), for all other
ports the id is the PCIE Root Port ID from the Link Capabilities
register [1]. The mapping of ordinal port identifiers relative to their
parent may change once libcxl and cxl-cli prove out region creation, or
a better option is found to establish a static device path / persistent
naming scheme. System software must not assume that 'struct cxl_port'
device names will be static from one boot to the next.

See patch7 for a tree(1) topology picture of what QEMU is producing
today with this enabling.
* cxl_root is singleton only by convention. A given cxl_root could
  represent 1 to N address spaces, this patch set chooses to implement 1
  cxl_root for all address spaces.

[1]: CXL 2.0 8.2.5.12.8 CXL HDM Decoder 0 Target List Low Register
     (Offset 24h) ...The Target Port Identifier for a given Downstream
     Port is reported via Port Number field in Link Capabilities
     Register. (See PCI Express Base Specification).

---

Dan Williams (8):
      cxl/mem: Move some definitions to mem.h
      cxl/mem: Introduce 'struct cxl_regs' for "composable" CXL devices
      cxl/core: Rename bus.c to core.c
      cxl/core: Refactor CXL register lookup for bridge reuse
      cxl/acpi: Introduce ACPI0017 driver and cxl_root
      cxl/Kconfig: Default drivers to CONFIG_CXL_BUS
      cxl/port: Introduce cxl_port objects
      cxl/acpi: Add module parameters to stand in for ACPI tables

 Documentation/driver-api/cxl/memory-devices.rst |    6 
 drivers/cxl/Kconfig                             |   16 +
 drivers/cxl/Makefile                            |    6 
 drivers/cxl/acpi.c                              |  215 +++++++++
 drivers/cxl/bus.c                               |   29 -
 drivers/cxl/core.c                              |  553 +++++++++++++++++++++++
 drivers/cxl/cxl.h                               |  146 ++++--
 drivers/cxl/mem.c                               |   97 +---
 drivers/cxl/mem.h                               |   82 +++
 9 files changed, 990 insertions(+), 160 deletions(-)
 create mode 100644 drivers/cxl/acpi.c
 delete mode 100644 drivers/cxl/bus.c
 create mode 100644 drivers/cxl/core.c
 create mode 100644 drivers/cxl/mem.h

base-commit: a38fd8748464831584a19438cbb3082b5a2dab15
In preparation for sharing cxl.h with other generic CXL consumers, move / consolidate some of the memory device specifics to mem.h. Reviewed-by: Ben Widawsky <ben.widawsky@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- drivers/cxl/cxl.h | 57 ------------------------------------ drivers/cxl/mem.c | 25 +--------------- drivers/cxl/mem.h | 85 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 86 insertions(+), 81 deletions(-) create mode 100644 drivers/cxl/mem.h diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 6f14838c2d25..2e3bdacb32e7 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -34,62 +34,5 @@ #define CXLDEV_MBOX_BG_CMD_STATUS_OFFSET 0x18 #define CXLDEV_MBOX_PAYLOAD_OFFSET 0x20 -/* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */ -#define CXLMDEV_STATUS_OFFSET 0x0 -#define CXLMDEV_DEV_FATAL BIT(0) -#define CXLMDEV_FW_HALT BIT(1) -#define CXLMDEV_STATUS_MEDIA_STATUS_MASK GENMASK(3, 2) -#define CXLMDEV_MS_NOT_READY 0 -#define CXLMDEV_MS_READY 1 -#define CXLMDEV_MS_ERROR 2 -#define CXLMDEV_MS_DISABLED 3 -#define CXLMDEV_READY(status) \ - (FIELD_GET(CXLMDEV_STATUS_MEDIA_STATUS_MASK, status) == \ - CXLMDEV_MS_READY) -#define CXLMDEV_MBOX_IF_READY BIT(4) -#define CXLMDEV_RESET_NEEDED_MASK GENMASK(7, 5) -#define CXLMDEV_RESET_NEEDED_NOT 0 -#define CXLMDEV_RESET_NEEDED_COLD 1 -#define CXLMDEV_RESET_NEEDED_WARM 2 -#define CXLMDEV_RESET_NEEDED_HOT 3 -#define CXLMDEV_RESET_NEEDED_CXL 4 -#define CXLMDEV_RESET_NEEDED(status) \ - (FIELD_GET(CXLMDEV_RESET_NEEDED_MASK, status) != \ - CXLMDEV_RESET_NEEDED_NOT) - -struct cxl_memdev; -/** - * struct cxl_mem - A CXL memory device - * @pdev: The PCI device associated with this CXL device. 
- * @regs: IO mappings to the device's MMIO - * @status_regs: CXL 2.0 8.2.8.3 Device Status Registers - * @mbox_regs: CXL 2.0 8.2.8.4 Mailbox Registers - * @memdev_regs: CXL 2.0 8.2.8.5 Memory Device Registers - * @payload_size: Size of space for payload - * (CXL 2.0 8.2.8.4.3 Mailbox Capabilities Register) - * @mbox_mutex: Mutex to synchronize mailbox access. - * @firmware_version: Firmware version for the memory device. - * @enabled_commands: Hardware commands found enabled in CEL. - * @pmem_range: Persistent memory capacity information. - * @ram_range: Volatile memory capacity information. - */ -struct cxl_mem { - struct pci_dev *pdev; - void __iomem *regs; - struct cxl_memdev *cxlmd; - - void __iomem *status_regs; - void __iomem *mbox_regs; - void __iomem *memdev_regs; - - size_t payload_size; - struct mutex mbox_mutex; /* Protects device mailbox and firmware */ - char firmware_version[0x10]; - unsigned long *enabled_cmds; - - struct range pmem_range; - struct range ram_range; -}; - extern struct bus_type cxl_bus_type; #endif /* __CXL_H__ */ diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c index 244cb7d89678..45871ef65152 100644 --- a/drivers/cxl/mem.c +++ b/drivers/cxl/mem.c @@ -12,6 +12,7 @@ #include <linux/io-64-nonatomic-lo-hi.h> #include "pci.h" #include "cxl.h" +#include "mem.h" /** * DOC: cxl mem @@ -29,12 +30,6 @@ * - Handle and manage error conditions. 
*/ -/* - * An entire PCI topology full of devices should be enough for any - * config - */ -#define CXL_MEM_MAX_DEVS 65536 - #define cxl_doorbell_busy(cxlm) \ (readl((cxlm)->mbox_regs + CXLDEV_MBOX_CTRL_OFFSET) & \ CXLDEV_MBOX_CTRL_DOORBELL) @@ -91,24 +86,6 @@ struct mbox_cmd { #define CXL_MBOX_SUCCESS 0 }; -/** - * struct cxl_memdev - CXL bus object representing a Type-3 Memory Device - * @dev: driver core device object - * @cdev: char dev core object for ioctl operations - * @cxlm: pointer to the parent device driver data - * @ops_active: active user of @cxlm in ops handlers - * @ops_dead: completion when all @cxlm ops users have exited - * @id: id number of this memdev instance. - */ -struct cxl_memdev { - struct device dev; - struct cdev cdev; - struct cxl_mem *cxlm; - struct percpu_ref ops_active; - struct completion ops_dead; - int id; -}; - static int cxl_mem_major; static DEFINE_IDA(cxl_memdev_ida); static struct dentry *cxl_debugfs; diff --git a/drivers/cxl/mem.h b/drivers/cxl/mem.h new file mode 100644 index 000000000000..daa9aba0e218 --- /dev/null +++ b/drivers/cxl/mem.h @@ -0,0 +1,85 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* Copyright(c) 2020-2021 Intel Corporation. 
*/ +#ifndef __CXL_MEM_H__ +#define __CXL_MEM_H__ + +/* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */ +#define CXLMDEV_STATUS_OFFSET 0x0 +#define CXLMDEV_DEV_FATAL BIT(0) +#define CXLMDEV_FW_HALT BIT(1) +#define CXLMDEV_STATUS_MEDIA_STATUS_MASK GENMASK(3, 2) +#define CXLMDEV_MS_NOT_READY 0 +#define CXLMDEV_MS_READY 1 +#define CXLMDEV_MS_ERROR 2 +#define CXLMDEV_MS_DISABLED 3 +#define CXLMDEV_READY(status) \ + (FIELD_GET(CXLMDEV_STATUS_MEDIA_STATUS_MASK, status) == \ + CXLMDEV_MS_READY) +#define CXLMDEV_MBOX_IF_READY BIT(4) +#define CXLMDEV_RESET_NEEDED_MASK GENMASK(7, 5) +#define CXLMDEV_RESET_NEEDED_NOT 0 +#define CXLMDEV_RESET_NEEDED_COLD 1 +#define CXLMDEV_RESET_NEEDED_WARM 2 +#define CXLMDEV_RESET_NEEDED_HOT 3 +#define CXLMDEV_RESET_NEEDED_CXL 4 +#define CXLMDEV_RESET_NEEDED(status) \ + (FIELD_GET(CXLMDEV_RESET_NEEDED_MASK, status) != \ + CXLMDEV_RESET_NEEDED_NOT) + +/* + * An entire PCI topology full of devices should be enough for any + * config + */ +#define CXL_MEM_MAX_DEVS 65536 + +/** + * struct cxl_memdev - CXL bus object representing a Type-3 Memory Device + * @dev: driver core device object + * @cdev: char dev core object for ioctl operations + * @cxlm: pointer to the parent device driver data + * @ops_active: active user of @cxlm in ops handlers + * @ops_dead: completion when all @cxlm ops users have exited + * @id: id number of this memdev instance. + */ +struct cxl_memdev { + struct device dev; + struct cdev cdev; + struct cxl_mem *cxlm; + struct percpu_ref ops_active; + struct completion ops_dead; + int id; +}; + +/** + * struct cxl_mem - A CXL memory device + * @pdev: The PCI device associated with this CXL device. 
+ * @regs: IO mappings to the device's MMIO + * @status_regs: CXL 2.0 8.2.8.3 Device Status Registers + * @mbox_regs: CXL 2.0 8.2.8.4 Mailbox Registers + * @memdev_regs: CXL 2.0 8.2.8.5 Memory Device Registers + * @payload_size: Size of space for payload + * (CXL 2.0 8.2.8.4.3 Mailbox Capabilities Register) + * @mbox_mutex: Mutex to synchronize mailbox access. + * @firmware_version: Firmware version for the memory device. + * @enabled_commands: Hardware commands found enabled in CEL. + * @pmem_range: Persistent memory capacity information. + * @ram_range: Volatile memory capacity information. + */ +struct cxl_mem { + struct pci_dev *pdev; + void __iomem *regs; + struct cxl_memdev *cxlmd; + + void __iomem *status_regs; + void __iomem *mbox_regs; + void __iomem *memdev_regs; + + size_t payload_size; + struct mutex mbox_mutex; /* Protects device mailbox and firmware */ + char firmware_version[0x10]; + unsigned long *enabled_cmds; + + struct range pmem_range; + struct range ram_range; +}; +#endif /* __CXL_MEM_H__ */
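
For reviewers who want to sanity-check the Memory Device Status bit layout that this patch moves into mem.h, here is a minimal userspace sketch. It is not part of the patch: the demo_* names are hypothetical, and the shifts are open-coded stand-ins for the kernel's BIT(), GENMASK() and FIELD_GET() helpers. The bit positions themselves are taken from the CXLMDEV_* definitions above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit layout mirrors the CXLMDEV_* definitions; demo_* names are hypothetical. */
#define DEMO_DEV_FATAL		(1ull << 0)		/* BIT(0) */
#define DEMO_FW_HALT		(1ull << 1)		/* BIT(1) */
#define DEMO_MEDIA_STATUS(s)	(((s) >> 2) & 0x3)	/* GENMASK(3, 2) */
#define DEMO_MBOX_IF_READY	(1ull << 4)		/* BIT(4) */
#define DEMO_RESET_NEEDED(s)	(((s) >> 5) & 0x7)	/* GENMASK(7, 5) */

#define DEMO_MS_READY		1
#define DEMO_RESET_NEEDED_NOT	0

/* Compile-time check that the open-coded field extraction covers bits 3:2. */
static_assert(DEMO_MEDIA_STATUS(0xcull) == 3, "media status is bits 3:2");

/* Same test as CXLMDEV_READY(): media status field equals "ready". */
static bool demo_memdev_ready(uint64_t status)
{
	return DEMO_MEDIA_STATUS(status) == DEMO_MS_READY;
}

/* Same test as CXLMDEV_RESET_NEEDED(): any non-zero reset type. */
static bool demo_reset_needed(uint64_t status)
{
	return DEMO_RESET_NEEDED(status) != DEMO_RESET_NEEDED_NOT;
}

/* The mailbox is usable only if the interface and the media are both ready. */
static bool demo_mbox_usable(uint64_t status)
{
	return (status & DEMO_MBOX_IF_READY) && demo_memdev_ready(status);
}
```

A status value of 0x14 (mailbox interface ready, media ready) passes demo_mbox_usable(), while 0x50 (mailbox interface ready, media not ready, warm reset needed) does not.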
CXL MMIO register blocks are organized by device type and capabilities. There are Component registers, Device registers (yes, an ambiguous name), and Memory Device registers (a specific extension of Device registers). It is possible for a given device instance (endpoint or port) to implement register sets from multiple of the above categories. The driver code that enumerates and maps the registers is type specific so it is useful to have a dedicated type and helpers for each block type. At the same time, once the registers are mapped the origin type does not matter. It is overly pedantic to reference the register block type in code that is using the registers. In preparation for the endpoint driver to incorporate Component registers into its MMIO operations reorganize the registers to allow typed enumeration + mapping, but anonymous usage. With the end state of 'struct cxl_regs' to be: struct cxl_regs { union { struct { CXL_DEVICE_REGS(); }; struct cxl_device_regs device_regs; }; union { struct { CXL_COMPONENT_REGS(); }; struct cxl_component_regs component_regs; }; }; With this arrangement the driver can share component init code with ports, but when using the registers it can directly reference the component register block type by name without the 'component_regs' prefix. So, map + enumerate can be shared across drivers of different CXL classes e.g.: void cxl_setup_device_regs(struct device *dev, void __iomem *base, struct cxl_device_regs *regs); void cxl_setup_component_regs(struct device *dev, void __iomem *base, struct cxl_component_regs *regs); ...while inline usage in the driver need not indicate where the registers came from: readl(cxlm->regs.mbox + MBOX_OFFSET); readl(cxlm->regs.hdm + HDM_OFFSET); ...instead of: readl(cxlm->regs.device_regs.mbox + MBOX_OFFSET); readl(cxlm->regs.component_regs.hdm + HDM_OFFSET); This complexity of the definition in .h yields improvement in code readability in .c while maintaining type-safety for organization of setup code. 
It prepares the implementation to maintain organization in the face of CXL devices that compose register interfaces consisting of multiple types. Reviewed-by: Ben Widawsky <ben.widawsky@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- drivers/cxl/cxl.h | 33 +++++++++++++++++++++++++++++++++ drivers/cxl/mem.c | 44 ++++++++++++++++++++++++-------------------- drivers/cxl/mem.h | 13 +++++-------- 3 files changed, 62 insertions(+), 28 deletions(-) diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 2e3bdacb32e7..37325e504fb7 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -34,5 +34,38 @@ #define CXLDEV_MBOX_BG_CMD_STATUS_OFFSET 0x18 #define CXLDEV_MBOX_PAYLOAD_OFFSET 0x20 +/* See note for 'struct cxl_regs' for the rationale of this organization */ +#define CXL_DEVICE_REGS() \ + void __iomem *status; \ + void __iomem *mbox; \ + void __iomem *memdev + +/** + * struct cxl_device_regs - Common container of CXL Device register + * block base pointers + * @status: CXL 2.0 8.2.8.3 Device Status Registers + * @mbox: CXL 2.0 8.2.8.4 Mailbox Registers + * @memdev: CXL 2.0 8.2.8.5 Memory Device Registers + */ +struct cxl_device_regs { + CXL_DEVICE_REGS(); +}; + +/* + * Note, the anonymous union organization allows for per + * register-block-type helper routines, without requiring block-type + * agnostic code to include the prefix. I.e. + * cxl_setup_device_regs(&cxlm->regs.dev) vs readl(cxlm->regs.mbox). + * The specificity reads naturally from left-to-right. 
+ */ +struct cxl_regs { + union { + struct { + CXL_DEVICE_REGS(); + }; + struct cxl_device_regs device_regs; + }; +}; + extern struct bus_type cxl_bus_type; #endif /* __CXL_H__ */ diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c index 45871ef65152..6951243d128e 100644 --- a/drivers/cxl/mem.c +++ b/drivers/cxl/mem.c @@ -31,7 +31,7 @@ */ #define cxl_doorbell_busy(cxlm) \ - (readl((cxlm)->mbox_regs + CXLDEV_MBOX_CTRL_OFFSET) & \ + (readl((cxlm)->regs.mbox + CXLDEV_MBOX_CTRL_OFFSET) & \ CXLDEV_MBOX_CTRL_DOORBELL) /* CXL 2.0 - 8.2.8.4 */ @@ -271,7 +271,7 @@ static void cxl_mem_mbox_timeout(struct cxl_mem *cxlm, static int __cxl_mem_mbox_send_cmd(struct cxl_mem *cxlm, struct mbox_cmd *mbox_cmd) { - void __iomem *payload = cxlm->mbox_regs + CXLDEV_MBOX_PAYLOAD_OFFSET; + void __iomem *payload = cxlm->regs.mbox + CXLDEV_MBOX_PAYLOAD_OFFSET; u64 cmd_reg, status_reg; size_t out_len; int rc; @@ -314,12 +314,12 @@ static int __cxl_mem_mbox_send_cmd(struct cxl_mem *cxlm, } /* #2, #3 */ - writeq(cmd_reg, cxlm->mbox_regs + CXLDEV_MBOX_CMD_OFFSET); + writeq(cmd_reg, cxlm->regs.mbox + CXLDEV_MBOX_CMD_OFFSET); /* #4 */ dev_dbg(&cxlm->pdev->dev, "Sending command\n"); writel(CXLDEV_MBOX_CTRL_DOORBELL, - cxlm->mbox_regs + CXLDEV_MBOX_CTRL_OFFSET); + cxlm->regs.mbox + CXLDEV_MBOX_CTRL_OFFSET); /* #5 */ rc = cxl_mem_wait_for_doorbell(cxlm); @@ -329,7 +329,7 @@ static int __cxl_mem_mbox_send_cmd(struct cxl_mem *cxlm, } /* #6 */ - status_reg = readq(cxlm->mbox_regs + CXLDEV_MBOX_STATUS_OFFSET); + status_reg = readq(cxlm->regs.mbox + CXLDEV_MBOX_STATUS_OFFSET); mbox_cmd->return_code = FIELD_GET(CXLDEV_MBOX_STATUS_RET_CODE_MASK, status_reg); @@ -339,7 +339,7 @@ static int __cxl_mem_mbox_send_cmd(struct cxl_mem *cxlm, } /* #7 */ - cmd_reg = readq(cxlm->mbox_regs + CXLDEV_MBOX_CMD_OFFSET); + cmd_reg = readq(cxlm->regs.mbox + CXLDEV_MBOX_CMD_OFFSET); out_len = FIELD_GET(CXLDEV_MBOX_CMD_PAYLOAD_LENGTH_MASK, cmd_reg); /* #8 */ @@ -400,7 +400,7 @@ static int cxl_mem_mbox_get(struct cxl_mem *cxlm) 
goto out; } - md_status = readq(cxlm->memdev_regs + CXLMDEV_STATUS_OFFSET); + md_status = readq(cxlm->regs.memdev + CXLMDEV_STATUS_OFFSET); if (!(md_status & CXLMDEV_MBOX_IF_READY && CXLMDEV_READY(md_status))) { dev_err(dev, "mbox: reported doorbell ready, but not mbox ready\n"); rc = -EBUSY; @@ -868,7 +868,7 @@ static int cxl_mem_setup_regs(struct cxl_mem *cxlm) int cap, cap_count; u64 cap_array; - cap_array = readq(cxlm->regs + CXLDEV_CAP_ARRAY_OFFSET); + cap_array = readq(cxlm->base + CXLDEV_CAP_ARRAY_OFFSET); if (FIELD_GET(CXLDEV_CAP_ARRAY_ID_MASK, cap_array) != CXLDEV_CAP_ARRAY_CAP_ID) return -ENODEV; @@ -881,25 +881,25 @@ static int cxl_mem_setup_regs(struct cxl_mem *cxlm) u16 cap_id; cap_id = FIELD_GET(CXLDEV_CAP_HDR_CAP_ID_MASK, - readl(cxlm->regs + cap * 0x10)); - offset = readl(cxlm->regs + cap * 0x10 + 0x4); - register_block = cxlm->regs + offset; + readl(cxlm->base + cap * 0x10)); + offset = readl(cxlm->base + cap * 0x10 + 0x4); + register_block = cxlm->base + offset; switch (cap_id) { case CXLDEV_CAP_CAP_ID_DEVICE_STATUS: dev_dbg(dev, "found Status capability (0x%x)\n", offset); - cxlm->status_regs = register_block; + cxlm->regs.status = register_block; break; case CXLDEV_CAP_CAP_ID_PRIMARY_MAILBOX: dev_dbg(dev, "found Mailbox capability (0x%x)\n", offset); - cxlm->mbox_regs = register_block; + cxlm->regs.mbox = register_block; break; case CXLDEV_CAP_CAP_ID_SECONDARY_MAILBOX: dev_dbg(dev, "found Secondary Mailbox capability (0x%x)\n", offset); break; case CXLDEV_CAP_CAP_ID_MEMDEV: dev_dbg(dev, "found Memory Device capability (0x%x)\n", offset); - cxlm->memdev_regs = register_block; + cxlm->regs.memdev = register_block; break; default: dev_dbg(dev, "Unknown cap ID: %d (0x%x)\n", cap_id, offset); @@ -907,11 +907,11 @@ static int cxl_mem_setup_regs(struct cxl_mem *cxlm) } } - if (!cxlm->status_regs || !cxlm->mbox_regs || !cxlm->memdev_regs) { + if (!cxlm->regs.status || !cxlm->regs.mbox || !cxlm->regs.memdev) { dev_err(dev, "registers not found: 
%s%s%s\n", - !cxlm->status_regs ? "status " : "", - !cxlm->mbox_regs ? "mbox " : "", - !cxlm->memdev_regs ? "memdev" : ""); + !cxlm->regs.status ? "status " : "", + !cxlm->regs.mbox ? "mbox " : "", + !cxlm->regs.memdev ? "memdev" : ""); return -ENXIO; } @@ -920,7 +920,7 @@ static int cxl_mem_setup_regs(struct cxl_mem *cxlm) static int cxl_mem_setup_mailbox(struct cxl_mem *cxlm) { - const int cap = readl(cxlm->mbox_regs + CXLDEV_MBOX_CAPS_OFFSET); + const int cap = readl(cxlm->regs.mbox + CXLDEV_MBOX_CAPS_OFFSET); cxlm->payload_size = 1 << FIELD_GET(CXLDEV_MBOX_CAP_PAYLOAD_SIZE_MASK, cap); @@ -980,7 +980,7 @@ static struct cxl_mem *cxl_mem_create(struct pci_dev *pdev, u32 reg_lo, mutex_init(&cxlm->mbox_mutex); cxlm->pdev = pdev; - cxlm->regs = regs + offset; + cxlm->base = regs + offset; cxlm->enabled_cmds = devm_kmalloc_array(dev, BITS_TO_LONGS(cxl_cmd_count), sizeof(unsigned long), @@ -1495,6 +1495,10 @@ static __init int cxl_mem_init(void) dev_t devt; int rc; + /* Double check the anonymous union trickery in struct cxl_regs */ + BUILD_BUG_ON(offsetof(struct cxl_regs, memdev) != + offsetof(struct cxl_regs, device_regs.memdev)); + rc = alloc_chrdev_region(&devt, 0, CXL_MEM_MAX_DEVS, "cxl"); if (rc) return rc; diff --git a/drivers/cxl/mem.h b/drivers/cxl/mem.h index daa9aba0e218..c247cf9c71af 100644 --- a/drivers/cxl/mem.h +++ b/drivers/cxl/mem.h @@ -53,10 +53,9 @@ struct cxl_memdev { /** * struct cxl_mem - A CXL memory device * @pdev: The PCI device associated with this CXL device. - * @regs: IO mappings to the device's MMIO - * @status_regs: CXL 2.0 8.2.8.3 Device Status Registers - * @mbox_regs: CXL 2.0 8.2.8.4 Mailbox Registers - * @memdev_regs: CXL 2.0 8.2.8.5 Memory Device Registers + * @base: IO mappings to the device's MMIO + * @cxlmd: Logical memory device chardev / interface + * @regs: Parsed register blocks * @payload_size: Size of space for payload * (CXL 2.0 8.2.8.4.3 Mailbox Capabilities Register) * @mbox_mutex: Mutex to synchronize mailbox access. 
@@ -67,12 +66,10 @@ struct cxl_memdev { */ struct cxl_mem { struct pci_dev *pdev; - void __iomem *regs; + void __iomem *base; struct cxl_memdev *cxlmd; - void __iomem *status_regs; - void __iomem *mbox_regs; - void __iomem *memdev_regs; + struct cxl_regs regs; size_t payload_size; struct mutex mbox_mutex; /* Protects device mailbox and firmware */
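
The anonymous union arrangement described in this patch can be tried outside the kernel. The sketch below is illustrative only: the demo_* names are hypothetical, plain 'void *' replaces 'void __iomem *', and the offsets handed out by the setup helper are made up. It shows the two properties the changelog relies on: a typed helper fills in the named member, and callers read the same pointers without the prefix.

```c
#include <assert.h>
#include <stddef.h>

/* One member list, expanded twice: once named, once anonymous. */
#define DEMO_DEVICE_REGS()	\
	void *status;		\
	void *mbox;		\
	void *memdev

struct demo_device_regs {
	DEMO_DEVICE_REGS();
};

struct demo_regs {
	union {
		struct {
			DEMO_DEVICE_REGS();
		};
		struct demo_device_regs device_regs;
	};
};

/* Userspace analogue of the BUILD_BUG_ON() added to cxl_mem_init(). */
static_assert(offsetof(struct demo_regs, status) ==
	      offsetof(struct demo_regs, device_regs.status), "status aliases");
static_assert(offsetof(struct demo_regs, mbox) ==
	      offsetof(struct demo_regs, device_regs.mbox), "mbox aliases");
static_assert(offsetof(struct demo_regs, memdev) ==
	      offsetof(struct demo_regs, device_regs.memdev), "memdev aliases");

/* Typed setup helper: only accepts the device register container. */
static void demo_setup_device_regs(char *base, struct demo_device_regs *regs)
{
	/* Offsets are invented for the demo, not taken from the spec. */
	regs->status = base + 0x10;
	regs->mbox = base + 0x20;
	regs->memdev = base + 0x30;
}
```

Calling demo_setup_device_regs(base, &regs.device_regs) and then reading regs.mbox returns base + 0x20, i.e. setup stays typed while usage stays anonymous.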
In preparation for more generic shared functionality across endpoint consumers of core cxl resources, and platform-firmware producers of those resources, rename bus.c to core.c. In addition to the central rendezvous for interleave coordination, the core will also define common routines like CXL register block mapping. Acked-by: Ben Widawsky <ben.widawsky@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- Documentation/driver-api/cxl/memory-devices.rst | 6 ++--- drivers/cxl/Makefile | 4 ++- drivers/cxl/bus.c | 29 ---------------------- drivers/cxl/core.c | 30 +++++++++++++++++++++++ 4 files changed, 35 insertions(+), 34 deletions(-) delete mode 100644 drivers/cxl/bus.c create mode 100644 drivers/cxl/core.c diff --git a/Documentation/driver-api/cxl/memory-devices.rst b/Documentation/driver-api/cxl/memory-devices.rst index 1bad466f9167..71495ed77069 100644 --- a/Documentation/driver-api/cxl/memory-devices.rst +++ b/Documentation/driver-api/cxl/memory-devices.rst @@ -28,10 +28,10 @@ CXL Memory Device .. kernel-doc:: drivers/cxl/mem.c :internal: -CXL Bus +CXL Core ------- -.. kernel-doc:: drivers/cxl/bus.c - :doc: cxl bus +.. kernel-doc:: drivers/cxl/core.c + :doc: cxl core External Interfaces =================== diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile index a314a1891f4d..3808e39dd31f 100644 --- a/drivers/cxl/Makefile +++ b/drivers/cxl/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 -obj-$(CONFIG_CXL_BUS) += cxl_bus.o +obj-$(CONFIG_CXL_BUS) += cxl_core.o obj-$(CONFIG_CXL_MEM) += cxl_mem.o ccflags-y += -DDEFAULT_SYMBOL_NAMESPACE=CXL -cxl_bus-y := bus.o +cxl_core-y := core.o cxl_mem-y := mem.o diff --git a/drivers/cxl/bus.c b/drivers/cxl/bus.c deleted file mode 100644 index 58f74796d525..000000000000 --- a/drivers/cxl/bus.c +++ /dev/null @@ -1,29 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-only -/* Copyright(c) 2020 Intel Corporation. All rights reserved. 
*/ -#include <linux/device.h> -#include <linux/module.h> - -/** - * DOC: cxl bus - * - * The CXL bus provides namespace for control devices and a rendezvous - * point for cross-device interleave coordination. - */ -struct bus_type cxl_bus_type = { - .name = "cxl", -}; -EXPORT_SYMBOL_GPL(cxl_bus_type); - -static __init int cxl_bus_init(void) -{ - return bus_register(&cxl_bus_type); -} - -static void cxl_bus_exit(void) -{ - bus_unregister(&cxl_bus_type); -} - -module_init(cxl_bus_init); -module_exit(cxl_bus_exit); -MODULE_LICENSE("GPL v2"); diff --git a/drivers/cxl/core.c b/drivers/cxl/core.c new file mode 100644 index 000000000000..7f8d2034038a --- /dev/null +++ b/drivers/cxl/core.c @@ -0,0 +1,30 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright(c) 2020 Intel Corporation. All rights reserved. */ +#include <linux/device.h> +#include <linux/module.h> + +/** + * DOC: cxl core + * + * The CXL core provides a sysfs hierarchy for control devices and a rendezvous + * point for cross-device interleave coordination through cxl ports. + */ + +struct bus_type cxl_bus_type = { + .name = "cxl", +}; +EXPORT_SYMBOL_GPL(cxl_bus_type); + +static __init int cxl_core_init(void) +{ + return bus_register(&cxl_bus_type); +} + +static void cxl_core_exit(void) +{ + bus_unregister(&cxl_bus_type); +} + +module_init(cxl_core_init); +module_exit(cxl_core_exit); +MODULE_LICENSE("GPL v2");
While CXL Memory Device endpoints locate the CXL MMIO registers in a PCI BAR, CXL root bridges have their MMIO base address described by platform firmware. Refactor the existing register lookup into a generic facility for endpoints and bridges to share. Reviewed-by: Ben Widawsky <ben.widawsky@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- drivers/cxl/core.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++- drivers/cxl/cxl.h | 3 +++ drivers/cxl/mem.c | 50 +++++----------------------------------------- 3 files changed, 65 insertions(+), 45 deletions(-) diff --git a/drivers/cxl/core.c b/drivers/cxl/core.c index 7f8d2034038a..2ab467ef9909 100644 --- a/drivers/cxl/core.c +++ b/drivers/cxl/core.c @@ -1,7 +1,8 @@ // SPDX-License-Identifier: GPL-2.0-only -/* Copyright(c) 2020 Intel Corporation. All rights reserved. */ +/* Copyright(c) 2020-2021 Intel Corporation. All rights reserved. */ #include <linux/device.h> #include <linux/module.h> +#include "cxl.h" /** * DOC: cxl core @@ -10,6 +11,60 @@ * point for cross-device interleave coordination through cxl ports. 
*/ +/* + * cxl_setup_device_regs() - Detect CXL Device register blocks + * @dev: Host device of the @base mapping + * @base: mapping of CXL 2.0 8.2.8 CXL Device Register Interface + */ +void cxl_setup_device_regs(struct device *dev, void __iomem *base, + struct cxl_device_regs *regs) +{ + int cap, cap_count; + u64 cap_array; + + *regs = (struct cxl_device_regs) { 0 }; + + cap_array = readq(base + CXLDEV_CAP_ARRAY_OFFSET); + if (FIELD_GET(CXLDEV_CAP_ARRAY_ID_MASK, cap_array) != + CXLDEV_CAP_ARRAY_CAP_ID) + return; + + cap_count = FIELD_GET(CXLDEV_CAP_ARRAY_COUNT_MASK, cap_array); + + for (cap = 1; cap <= cap_count; cap++) { + void __iomem *register_block; + u32 offset; + u16 cap_id; + + cap_id = FIELD_GET(CXLDEV_CAP_HDR_CAP_ID_MASK, + readl(base + cap * 0x10)); + offset = readl(base + cap * 0x10 + 0x4); + register_block = base + offset; + + switch (cap_id) { + case CXLDEV_CAP_CAP_ID_DEVICE_STATUS: + dev_dbg(dev, "found Status capability (0x%x)\n", offset); + regs->status = register_block; + break; + case CXLDEV_CAP_CAP_ID_PRIMARY_MAILBOX: + dev_dbg(dev, "found Mailbox capability (0x%x)\n", offset); + regs->mbox = register_block; + break; + case CXLDEV_CAP_CAP_ID_SECONDARY_MAILBOX: + dev_dbg(dev, "found Secondary Mailbox capability (0x%x)\n", offset); + break; + case CXLDEV_CAP_CAP_ID_MEMDEV: + dev_dbg(dev, "found Memory Device capability (0x%x)\n", offset); + regs->memdev = register_block; + break; + default: + dev_dbg(dev, "Unknown cap ID: %d (0x%x)\n", cap_id, offset); + break; + } + } +} +EXPORT_SYMBOL_GPL(cxl_setup_device_regs); + struct bus_type cxl_bus_type = { .name = "cxl", }; diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 37325e504fb7..cbd29650c4e2 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -67,5 +67,8 @@ struct cxl_regs { }; }; +void cxl_setup_device_regs(struct device *dev, void __iomem *base, + struct cxl_device_regs *regs); + extern struct bus_type cxl_bus_type; #endif /* __CXL_H__ */ diff --git a/drivers/cxl/mem.c 
b/drivers/cxl/mem.c index 6951243d128e..ee55abfa147e 100644 --- a/drivers/cxl/mem.c +++ b/drivers/cxl/mem.c @@ -865,53 +865,15 @@ static int cxl_mem_mbox_send_cmd(struct cxl_mem *cxlm, u16 opcode, static int cxl_mem_setup_regs(struct cxl_mem *cxlm) { struct device *dev = &cxlm->pdev->dev; - int cap, cap_count; - u64 cap_array; + struct cxl_regs *regs = &cxlm->regs; - cap_array = readq(cxlm->base + CXLDEV_CAP_ARRAY_OFFSET); - if (FIELD_GET(CXLDEV_CAP_ARRAY_ID_MASK, cap_array) != - CXLDEV_CAP_ARRAY_CAP_ID) - return -ENODEV; - - cap_count = FIELD_GET(CXLDEV_CAP_ARRAY_COUNT_MASK, cap_array); - - for (cap = 1; cap <= cap_count; cap++) { - void __iomem *register_block; - u32 offset; - u16 cap_id; - - cap_id = FIELD_GET(CXLDEV_CAP_HDR_CAP_ID_MASK, - readl(cxlm->base + cap * 0x10)); - offset = readl(cxlm->base + cap * 0x10 + 0x4); - register_block = cxlm->base + offset; - - switch (cap_id) { - case CXLDEV_CAP_CAP_ID_DEVICE_STATUS: - dev_dbg(dev, "found Status capability (0x%x)\n", offset); - cxlm->regs.status = register_block; - break; - case CXLDEV_CAP_CAP_ID_PRIMARY_MAILBOX: - dev_dbg(dev, "found Mailbox capability (0x%x)\n", offset); - cxlm->regs.mbox = register_block; - break; - case CXLDEV_CAP_CAP_ID_SECONDARY_MAILBOX: - dev_dbg(dev, "found Secondary Mailbox capability (0x%x)\n", offset); - break; - case CXLDEV_CAP_CAP_ID_MEMDEV: - dev_dbg(dev, "found Memory Device capability (0x%x)\n", offset); - cxlm->regs.memdev = register_block; - break; - default: - dev_dbg(dev, "Unknown cap ID: %d (0x%x)\n", cap_id, offset); - break; - } - } + cxl_setup_device_regs(dev, cxlm->base, &regs->device_regs); - if (!cxlm->regs.status || !cxlm->regs.mbox || !cxlm->regs.memdev) { + if (!regs->status || !regs->mbox || !regs->memdev) { dev_err(dev, "registers not found: %s%s%s\n", - !cxlm->regs.status ? "status " : "", - !cxlm->regs.mbox ? "mbox " : "", - !cxlm->regs.memdev ? "memdev" : ""); + !regs->status ? "status " : "", + !regs->mbox ? "mbox " : "", + !regs->memdev ? 
"memdev" : ""); return -ENXIO; }
While CXL builds upon the PCI software model for dynamic enumeration and control, a static platform component is required to bootstrap the CXL memory layout. In addition to identifying the host bridges ACPI is responsible for enumerating the CXL memory space that can be addressed by decoders. This is similar to the requirement for ACPI to publish resources reported by _CRS for PCI host bridges. Introduce the cxl_root object as an abstract "port" into the CXL.mem address space described by HDM decoders identified by the ACPI CEDT.CHBS. For now just establish the initial boilerplate and sysfs attributes, to be followed by enumeration of the ports within the host bridge. Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- drivers/cxl/Kconfig | 14 ++ drivers/cxl/Makefile | 2 drivers/cxl/acpi.c | 39 ++++++ drivers/cxl/core.c | 349 ++++++++++++++++++++++++++++++++++++++++++++++++++ drivers/cxl/cxl.h | 64 +++++++++ 5 files changed, 468 insertions(+) create mode 100644 drivers/cxl/acpi.c diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig index 97dc4d751651..fb282af84afd 100644 --- a/drivers/cxl/Kconfig +++ b/drivers/cxl/Kconfig @@ -50,4 +50,18 @@ config CXL_MEM_RAW_COMMANDS potential impact to memory currently in use by the kernel. If developing CXL hardware or the driver say Y, otherwise say N. + +config CXL_ACPI + tristate "CXL ACPI: Platform Support" + depends on ACPI + help + Enable support for host managed device memory (HDM) resources + published by a platform's ACPI CXL memory layout description. + See Chapter 9.14.1 CXL Early Discovery Table (CEDT) in the CXL + 2.0 specification. The CXL core consumes these resource to + publish port and address_space objects used to map regions + that represent System RAM, or Persistent Memory regions to be + managed by LIBNVDIMM. + + If unsure say 'm'. 
endif diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile index 3808e39dd31f..f429ca6b59d9 100644 --- a/drivers/cxl/Makefile +++ b/drivers/cxl/Makefile @@ -1,7 +1,9 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_CXL_BUS) += cxl_core.o obj-$(CONFIG_CXL_MEM) += cxl_mem.o +obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o ccflags-y += -DDEFAULT_SYMBOL_NAMESPACE=CXL cxl_core-y := core.o cxl_mem-y := mem.o +cxl_acpi-y := acpi.o diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c new file mode 100644 index 000000000000..d54c2d5de730 --- /dev/null +++ b/drivers/cxl/acpi.c @@ -0,0 +1,39 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright(c) 2021 Intel Corporation. All rights reserved. */ +#include <linux/platform_device.h> +#include <linux/module.h> +#include <linux/device.h> +#include <linux/kernel.h> +#include <linux/acpi.h> +#include "cxl.h" + +static int cxl_acpi_probe(struct platform_device *pdev) +{ + struct device *dev = &pdev->dev; + struct cxl_root *cxl_root; + + cxl_root = devm_cxl_add_root(dev, NULL, 0); + if (IS_ERR(cxl_root)) + return PTR_ERR(cxl_root); + dev_dbg(dev, "register: %s\n", dev_name(&cxl_root->port.dev)); + + return 0; +} + +static const struct acpi_device_id cxl_acpi_ids[] = { + { "ACPI0017", 0 }, + { "", 0 }, +}; +MODULE_DEVICE_TABLE(acpi, cxl_acpi_ids); + +static struct platform_driver cxl_acpi_driver = { + .probe = cxl_acpi_probe, + .driver = { + .name = KBUILD_MODNAME, + .acpi_match_table = cxl_acpi_ids, + }, +}; + +module_platform_driver(cxl_acpi_driver); +MODULE_LICENSE("GPL v2"); +MODULE_IMPORT_NS(CXL); diff --git a/drivers/cxl/core.c b/drivers/cxl/core.c index 2ab467ef9909..46c3b2588d2f 100644 --- a/drivers/cxl/core.c +++ b/drivers/cxl/core.c @@ -2,6 +2,8 @@ /* Copyright(c) 2020-2021 Intel Corporation. All rights reserved. */ #include <linux/device.h> #include <linux/module.h> +#include <linux/slab.h> +#include <linux/idr.h> #include "cxl.h" /** @@ -11,6 +13,353 @@ * point for cross-device interleave coordination through cxl ports. 
*/ +static DEFINE_IDA(cxl_port_ida); + +static ssize_t devtype_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + return sysfs_emit(buf, "%s\n", dev->type->name); +} +static DEVICE_ATTR_RO(devtype); + +static struct attribute *cxl_base_attributes[] = { + &dev_attr_devtype.attr, + NULL, +}; + +static struct attribute_group cxl_base_attribute_group = { + .attrs = cxl_base_attributes, +}; + +static struct cxl_address_space *dev_to_address_space(struct device *dev) +{ + struct cxl_address_space_dev *cxl_asd = to_cxl_address_space(dev); + + return cxl_asd->address_space; +} + +static ssize_t start_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct cxl_address_space *space = dev_to_address_space(dev); + + return sysfs_emit(buf, "%#llx\n", space->range.start); +} +static DEVICE_ATTR_RO(start); + +static ssize_t end_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct cxl_address_space *space = dev_to_address_space(dev); + + return sysfs_emit(buf, "%#llx\n", space->range.end); +} +static DEVICE_ATTR_RO(end); + +#define CXL_ATTR_SUPPORTS(name, flag) \ +static ssize_t supports_##name##_show( \ + struct device *dev, struct device_attribute *attr, char *buf) \ +{ \ + struct cxl_address_space *space = dev_to_address_space(dev); \ + \ + return sysfs_emit(buf, "%s\n", \ + (space->flags & (flag)) ? 
"1" : "0"); \ +} \ +static DEVICE_ATTR_RO(supports_##name) + +CXL_ATTR_SUPPORTS(pmem, CXL_ADDRSPACE_PMEM); +CXL_ATTR_SUPPORTS(ram, CXL_ADDRSPACE_RAM); +CXL_ATTR_SUPPORTS(type2, CXL_ADDRSPACE_TYPE2); +CXL_ATTR_SUPPORTS(type3, CXL_ADDRSPACE_TYPE3); + +static struct attribute *cxl_address_space_attributes[] = { + &dev_attr_start.attr, + &dev_attr_end.attr, + &dev_attr_supports_pmem.attr, + &dev_attr_supports_ram.attr, + &dev_attr_supports_type2.attr, + &dev_attr_supports_type3.attr, + NULL, +}; + +static umode_t cxl_address_space_visible(struct kobject *kobj, + struct attribute *a, int n) +{ + struct device *dev = container_of(kobj, struct device, kobj); + struct cxl_address_space *space = dev_to_address_space(dev); + + if (a == &dev_attr_supports_pmem.attr && + !(space->flags & CXL_ADDRSPACE_PMEM)) + return 0; + + if (a == &dev_attr_supports_ram.attr && + !(space->flags & CXL_ADDRSPACE_RAM)) + return 0; + + if (a == &dev_attr_supports_type2.attr && + !(space->flags & CXL_ADDRSPACE_TYPE2)) + return 0; + + if (a == &dev_attr_supports_type3.attr && + !(space->flags & CXL_ADDRSPACE_TYPE3)) + return 0; + + return a->mode; +} + +static struct attribute_group cxl_address_space_attribute_group = { + .attrs = cxl_address_space_attributes, + .is_visible = cxl_address_space_visible, +}; + +static const struct attribute_group *cxl_address_space_attribute_groups[] = { + &cxl_address_space_attribute_group, + &cxl_base_attribute_group, + NULL, +}; + +static void cxl_address_space_release(struct device *dev) +{ + struct cxl_address_space_dev *cxl_asd = to_cxl_address_space(dev); + + remove_resource(&cxl_asd->res); + kfree(cxl_asd); +} + +static const struct device_type cxl_address_space_type = { + .name = "cxl_address_space", + .release = cxl_address_space_release, + .groups = cxl_address_space_attribute_groups, +}; + +struct cxl_address_space_dev *to_cxl_address_space(struct device *dev) +{ + if (dev_WARN_ONCE(dev, dev->type != &cxl_address_space_type, + "not a cxl_address_space 
device\n")) + return NULL; + return container_of(dev, struct cxl_address_space_dev, dev); +} + +static void cxl_root_release(struct device *dev) +{ + struct cxl_root *cxl_root = to_cxl_root(dev); + + ida_free(&cxl_port_ida, cxl_root->port.id); + kfree(cxl_root); +} + +static ssize_t target_id_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct cxl_port *cxl_port = to_cxl_port(dev); + + return sysfs_emit(buf, "%d\n", cxl_port->target_id); +} +static DEVICE_ATTR_RO(target_id); + +static struct attribute *cxl_port_attributes[] = { + &dev_attr_target_id.attr, + NULL, +}; + +static struct attribute_group cxl_port_attribute_group = { + .attrs = cxl_port_attributes, +}; + +static const struct attribute_group *cxl_port_attribute_groups[] = { + &cxl_port_attribute_group, + &cxl_base_attribute_group, + NULL, +}; + +static const struct device_type cxl_root_type = { + .name = "cxl_root", + .release = cxl_root_release, + .groups = cxl_port_attribute_groups, +}; + +struct cxl_root *to_cxl_root(struct device *dev) +{ + if (dev_WARN_ONCE(dev, dev->type != &cxl_root_type, + "not a cxl_root device\n")) + return NULL; + return container_of(dev, struct cxl_root, port.dev); +} + +struct cxl_port *to_cxl_port(struct device *dev) +{ + if (dev_WARN_ONCE(dev, dev->type != &cxl_root_type, + "not a cxl_port device\n")) + return NULL; + return container_of(dev, struct cxl_port, dev); +} + +static void unregister_dev(void *dev) +{ + device_unregister(dev); +} + +static struct cxl_root *cxl_root_alloc(struct device *parent, + struct cxl_address_space *cxl_space, + int nr_spaces) +{ + struct cxl_root *cxl_root; + struct cxl_port *port; + struct device *dev; + int rc; + + cxl_root = kzalloc(struct_size(cxl_root, address_space, nr_spaces), + GFP_KERNEL); + if (!cxl_root) + return ERR_PTR(-ENOMEM); + + memcpy(cxl_root->address_space, cxl_space, + flex_array_size(cxl_root, address_space, nr_spaces)); + cxl_root->nr_spaces = nr_spaces; + + rc = ida_alloc(&cxl_port_ida, 
GFP_KERNEL); + if (rc < 0) + goto err; + port = &cxl_root->port; + port->id = rc; + + /* + * Root does not have a cxl_port as its parent and it does not + * have any corresponding component registers it is only a + * logical anchor to the first level of actual ports that decode + * the root address spaces. + */ + port->port_host = parent; + port->target_id = -1; + port->component_regs_phys = -1; + + dev = &port->dev; + device_initialize(dev); + device_set_pm_not_required(dev); + dev->parent = parent; + dev->bus = &cxl_bus_type; + dev->type = &cxl_root_type; + + return cxl_root; + +err: + kfree(cxl_root); + return ERR_PTR(rc); +} + +static struct cxl_address_space_dev * +cxl_address_space_dev_alloc(struct device *parent, + struct cxl_address_space *space) +{ + struct cxl_address_space_dev *cxl_asd; + struct resource *res; + struct device *dev; + int rc; + + cxl_asd = kzalloc(sizeof(*cxl_asd), GFP_KERNEL); + if (!cxl_asd) + return ERR_PTR(-ENOMEM); + + res = &cxl_asd->res; + res->name = "CXL Address Space"; + res->start = space->range.start; + res->end = space->range.end; + res->flags = IORESOURCE_MEM; + + rc = insert_resource(&iomem_resource, res); + if (rc) + goto err; + + cxl_asd->address_space = space; + dev = &cxl_asd->dev; + device_initialize(dev); + device_set_pm_not_required(dev); + dev->parent = parent; + dev->type = &cxl_address_space_type; + + return cxl_asd; + +err: + kfree(cxl_asd); + return ERR_PTR(rc); +} + +static int cxl_address_space_dev_add(struct device *host, + struct cxl_address_space_dev *cxl_asd, + int id) +{ + struct device *dev = &cxl_asd->dev; + int rc; + + rc = dev_set_name(dev, "address_space%d", id); + if (rc) + goto err; + + rc = device_add(dev); + if (rc) + goto err; + + dev_dbg(host, "%s: register %s\n", dev_name(dev->parent), + dev_name(dev)); + + return devm_add_action_or_reset(host, unregister_dev, dev); + +err: + put_device(dev); + return rc; +} + +struct cxl_root *devm_cxl_add_root(struct device *host, + struct cxl_address_space 
*cxl_space, + int nr_spaces) +{ + struct cxl_root *cxl_root; + struct cxl_port *port; + struct device *dev; + int i, rc; + + cxl_root = cxl_root_alloc(host, cxl_space, nr_spaces); + if (IS_ERR(cxl_root)) + return cxl_root; + + port = &cxl_root->port; + dev = &port->dev; + rc = dev_set_name(dev, "root%d", port->id); + if (rc) + goto err; + + rc = device_add(dev); + if (rc) + goto err; + + rc = devm_add_action_or_reset(host, unregister_dev, dev); + if (rc) + return ERR_PTR(rc); + + for (i = 0; i < nr_spaces; i++) { + struct cxl_address_space *space = &cxl_root->address_space[i]; + struct cxl_address_space_dev *cxl_asd; + + if (!range_len(&space->range)) + continue; + + cxl_asd = cxl_address_space_dev_alloc(dev, space); + if (IS_ERR(cxl_asd)) + return ERR_CAST(cxl_asd); + + rc = cxl_address_space_dev_add(host, cxl_asd, i); + if (rc) + return ERR_PTR(rc); + } + + return cxl_root; + +err: + put_device(dev); + return ERR_PTR(rc); +} +EXPORT_SYMBOL_GPL(devm_cxl_add_root); + /* * cxl_setup_device_regs() - Detect CXL Device register blocks * @dev: Host device of the @base mapping diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index cbd29650c4e2..559f8343fee4 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -70,5 +70,69 @@ struct cxl_regs { void cxl_setup_device_regs(struct device *dev, void __iomem *base, struct cxl_device_regs *regs); +/* + * Address space properties derived from: + * CXL 2.0 8.2.5.12.7 CXL HDM Decoder 0 Control Register + */ +#define CXL_ADDRSPACE_RAM BIT(0) +#define CXL_ADDRSPACE_PMEM BIT(1) +#define CXL_ADDRSPACE_TYPE2 BIT(2) +#define CXL_ADDRSPACE_TYPE3 BIT(3) +#define CXL_ADDRSPACE_MASK GENMASK(3, 0) + +struct cxl_address_space { + struct range range; + int interleave_size; + unsigned long flags; + unsigned long targets; +}; + +struct cxl_address_space_dev { + struct device dev; + struct resource res; + struct cxl_address_space *address_space; +}; + +/** + * struct cxl_port - object representing a root, upstream, or downstream port + * 
@dev: this port's device + * @port_host: PCI or platform device host of the CXL capability + * @id: id for port device-name + * @target_id: this port's HDM decoder id in the parent port + * @component_regs_phys: component register capability array base address + */ +struct cxl_port { + struct device dev; + struct device *port_host; + int id; + int target_id; + resource_size_t component_regs_phys; +}; + +/* + * struct cxl_root - platform object parent of CXL host bridges + * + * A cxl_root object represents a set of address spaces that are + * interleaved across a set of child host bridges, but never interleaved + * to another cxl_root object. It contains a cxl_port that is a special + * case in that it does not have a parent port and related HDMs, instead + * its decode is derived from the root (platform firmware defined) + * address space description. Not to be confused with CXL Root Ports + * that are the PCIE Root Ports within PCIE Host Bridges that are + * flagged by platform firmware (ACPI0016 on ACPI platforms) as having + * CXL capabilities. + */ +struct cxl_root { + struct cxl_port port; + int nr_spaces; + struct cxl_address_space address_space[]; +}; + +struct cxl_root *to_cxl_root(struct device *dev); +struct cxl_port *to_cxl_port(struct device *dev); +struct cxl_address_space_dev *to_cxl_address_space(struct device *dev); +struct cxl_root *devm_cxl_add_root(struct device *parent, + struct cxl_address_space *cxl_space, + int nr_spaces); extern struct bus_type cxl_bus_type; #endif /* __CXL_H__ */
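A note on the sysfs side of this patch: cxl_address_space_visible() hides every supports_* attribute whose flag is clear, so the presence of an attribute already implies a value of "1". The mapping can be sketched in userspace as below. The flag bit positions are the ones from cxl.h above; the helper name visible_attrs() and its space-separated output are hypothetical and exist only for this illustration.

```c
#include <stdio.h>
#include <string.h>

#define CXL_ADDRSPACE_RAM	(1UL << 0)
#define CXL_ADDRSPACE_PMEM	(1UL << 1)
#define CXL_ADDRSPACE_TYPE2	(1UL << 2)
#define CXL_ADDRSPACE_TYPE3	(1UL << 3)

/*
 * Hypothetical helper: list the attribute names that would be visible
 * for an address space with @flags, in attribute-array order.
 */
static void visible_attrs(unsigned long flags, char *buf, size_t len)
{
	static const struct {
		unsigned long flag;
		const char *name;
	} attrs[] = {
		{ CXL_ADDRSPACE_PMEM,  "supports_pmem" },
		{ CXL_ADDRSPACE_RAM,   "supports_ram" },
		{ CXL_ADDRSPACE_TYPE2, "supports_type2" },
		{ CXL_ADDRSPACE_TYPE3, "supports_type3" },
	};
	size_t i;

	/* start and end have no flag check, so they are always shown */
	snprintf(buf, len, "start end");
	for (i = 0; i < sizeof(attrs) / sizeof(attrs[0]); i++) {
		if (!(flags & attrs[i].flag))
			continue; /* same effect as is_visible returning 0 */
		strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, attrs[i].name, len - strlen(buf) - 1);
	}
}
```

With this arrangement userspace can test for capability by checking whether the file exists, without reading it.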
CONFIG_CXL_BUS is default 'n' as expected for new functionality. When that is enabled do not make the end user hunt for all the expected sub-options to enable. For example CONFIG_CXL_BUS without CONFIG_CXL_MEM is an odd/expert configuration, so is CONFIG_CXL_MEM without CONFIG_CXL_ACPI (on ACPI capable platforms). Default CONFIG_CXL_MEM and CONFIG_CXL_ACPI to CONFIG_CXL_BUS. Acked-by: Ben Widawsky <ben.widawsky@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- drivers/cxl/Kconfig | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig index fb282af84afd..1da7970a5e55 100644 --- a/drivers/cxl/Kconfig +++ b/drivers/cxl/Kconfig @@ -15,6 +15,7 @@ if CXL_BUS config CXL_MEM tristate "CXL.mem: Memory Devices" + default CXL_BUS help The CXL.mem protocol allows a device to act as a provider of "System RAM" and/or "Persistent Memory" that is fully coherent @@ -54,6 +55,7 @@ config CXL_MEM_RAW_COMMANDS config CXL_ACPI tristate "CXL ACPI: Platform Support" depends on ACPI + default CXL_BUS help Enable support for host managed device memory (HDM) resources published by a platform's ACPI CXL memory layout description.
Once the cxl_root is established then other ports in the hierarchy can
be attached. The cxl_port object, unlike cxl_root that is associated
with host bridges, is associated with PCIE Root Ports or PCIE Switch
Ports. Add cxl_port instances for all PCIE Root Ports in an ACPI0016
host bridge. The cxl_port instances for PCIE Switch Ports are not
included here as those are to be modeled as another service device
registered on the pcie_port_bus_type.

A sample sysfs topology for a single-host-bridge with
single-PCIE/CXL-root-port:

/sys/bus/cxl/devices/root0
├── address_space0
│   ├── devtype
│   ├── end
│   ├── start
│   ├── supports_ram
│   ├── supports_type2
│   ├── supports_type3
│   └── uevent
├── address_space1
│   ├── devtype
│   ├── end
│   ├── start
│   ├── supports_pmem
│   ├── supports_type2
│   ├── supports_type3
│   └── uevent
├── devtype
├── port1
│   ├── devtype
│   ├── host -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00
│   ├── port2
│   │   ├── devtype
│   │   ├── host -> ../../../../../pci0000:34/0000:34:00.0
│   │   ├── subsystem -> ../../../../../../bus/cxl
│   │   ├── target_id
│   │   └── uevent
│   ├── subsystem -> ../../../../../bus/cxl
│   ├── target_id
│   └── uevent
├── subsystem -> ../../../../bus/cxl
├── target_id
└── uevent

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/acpi.c |   99 +++++++++++++++++++++++++++++++++++++++++++
 drivers/cxl/core.c |  121 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/cxl/cxl.h  |    5 ++
 3 files changed, 224 insertions(+), 1 deletion(-)

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index d54c2d5de730..bc2a35ae880b 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -5,18 +5,117 @@
 #include <linux/device.h>
 #include <linux/kernel.h>
 #include <linux/acpi.h>
+#include <linux/pci.h>
 #include "cxl.h"
 
+static int match_ACPI0016(struct device *dev, const void *host)
+{
+	struct acpi_device *adev = to_acpi_device(dev);
+	const char *hid = acpi_device_hid(adev);
+
+	return strcmp(hid, "ACPI0016") == 0;
+}
+
+struct
cxl_walk_context { + struct device *dev; + struct pci_bus *root; + struct cxl_port *port; + int error; + int count; +}; + +static int match_add_root_ports(struct pci_dev *pdev, void *data) +{ + struct cxl_walk_context *ctx = data; + struct pci_bus *root_bus = ctx->root; + struct cxl_port *port = ctx->port; + int type = pci_pcie_type(pdev); + struct device *dev = ctx->dev; + resource_size_t cxl_regs_phys; + int target_id = ctx->count; + + if (pdev->bus != root_bus) + return 0; + if (!pci_is_pcie(pdev)) + return 0; + if (type != PCI_EXP_TYPE_ROOT_PORT) + return 0; + + ctx->count++; + + /* TODO walk DVSEC to find component register base */ + cxl_regs_phys = -1; + + port = devm_cxl_add_port(dev, port, &pdev->dev, target_id, + cxl_regs_phys); + if (IS_ERR(port)) { + ctx->error = PTR_ERR(port); + return ctx->error; + } + + dev_dbg(dev, "%s: register: %s\n", dev_name(&pdev->dev), + dev_name(&port->dev)); + + return 0; +} + +/* + * A host bridge may contain one or more root ports. Register each port + * as a child of the cxl_root. 
+ */ +static int cxl_acpi_register_ports(struct device *dev, struct acpi_device *root, + struct cxl_port *port, int idx) +{ + struct acpi_pci_root *pci_root = acpi_pci_find_root(root->handle); + struct cxl_walk_context ctx; + + if (!pci_root) + return -ENXIO; + + /* TODO: fold in CEDT.CHBS retrieval */ + port = devm_cxl_add_port(dev, port, &root->dev, idx, ~0ULL); + if (IS_ERR(port)) + return PTR_ERR(port); + dev_dbg(dev, "%s: register: %s\n", dev_name(&root->dev), + dev_name(&port->dev)); + + ctx = (struct cxl_walk_context) { + .dev = dev, + .root = pci_root->bus, + .port = port, + }; + pci_walk_bus(pci_root->bus, match_add_root_ports, &ctx); + + if (ctx.count == 0) + return -ENODEV; + return ctx.error; +} + static int cxl_acpi_probe(struct platform_device *pdev) { struct device *dev = &pdev->dev; + struct acpi_device *adev = ACPI_COMPANION(dev); + struct device *bridge = NULL; struct cxl_root *cxl_root; + int rc, i = 0; cxl_root = devm_cxl_add_root(dev, NULL, 0); if (IS_ERR(cxl_root)) return PTR_ERR(cxl_root); dev_dbg(dev, "register: %s\n", dev_name(&cxl_root->port.dev)); + while (true) { + bridge = bus_find_device(adev->dev.bus, bridge, dev, + match_ACPI0016); + if (!bridge) + break; + + rc = cxl_acpi_register_ports(dev, to_acpi_device(bridge), + &cxl_root->port, i++); + if (rc) + return rc; + } + return 0; } diff --git a/drivers/cxl/core.c b/drivers/cxl/core.c index 46c3b2588d2f..65cd704581bc 100644 --- a/drivers/cxl/core.c +++ b/drivers/cxl/core.c @@ -148,6 +148,15 @@ static void cxl_root_release(struct device *dev) kfree(cxl_root); } +static void cxl_port_release(struct device *dev) +{ + struct cxl_port *port = to_cxl_port(dev); + + ida_free(&cxl_port_ida, port->id); + put_device(port->port_host); + kfree(port); +} + static ssize_t target_id_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -178,6 +187,12 @@ static const struct device_type cxl_root_type = { .groups = cxl_port_attribute_groups, }; +static const struct device_type 
cxl_port_type = { + .name = "cxl_port", + .release = cxl_port_release, + .groups = cxl_port_attribute_groups, +}; + struct cxl_root *to_cxl_root(struct device *dev) { if (dev_WARN_ONCE(dev, dev->type != &cxl_root_type, @@ -188,7 +203,9 @@ struct cxl_root *to_cxl_root(struct device *dev) struct cxl_port *to_cxl_port(struct device *dev) { - if (dev_WARN_ONCE(dev, dev->type != &cxl_root_type, + if (dev_WARN_ONCE(dev, + dev->type != &cxl_root_type && + dev->type != &cxl_port_type, "not a cxl_port device\n")) return NULL; return container_of(dev, struct cxl_port, dev); @@ -360,6 +377,108 @@ struct cxl_root *devm_cxl_add_root(struct device *host, } EXPORT_SYMBOL_GPL(devm_cxl_add_root); +static void cxl_unlink_port(void *_port) +{ + struct cxl_port *port = _port; + + sysfs_remove_link(&port->dev.kobj, "host"); +} + +static int devm_cxl_link_port(struct device *dev, struct cxl_port *port) +{ + int rc; + + rc = sysfs_create_link(&port->dev.kobj, &port->port_host->kobj, "host"); + if (rc) + return rc; + return devm_add_action_or_reset(dev, cxl_unlink_port, port); +} + +static struct cxl_port *cxl_port_alloc(struct cxl_port *parent_port, + struct device *port_dev, int target_id, + resource_size_t component_regs_phys) +{ + struct cxl_port *port; + struct device *dev; + int rc; + + if (!port_dev) + return ERR_PTR(-EINVAL); + + port = kzalloc(sizeof(*port), GFP_KERNEL); + if (!port) + return ERR_PTR(-ENOMEM); + + rc = ida_alloc(&cxl_port_ida, GFP_KERNEL); + if (rc < 0) + goto err; + + port->id = rc; + port->target_id = target_id; + port->port_host = get_device(port_dev); + port->component_regs_phys = component_regs_phys; + + dev = &port->dev; + device_initialize(dev); + device_set_pm_not_required(dev); + dev->parent = &parent_port->dev; + dev->bus = &cxl_bus_type; + dev->type = &cxl_port_type; + + return port; + +err: + kfree(port); + return ERR_PTR(rc); +} + +/** + * devm_cxl_add_port() - add a cxl_port to the topology + * @host: devm context / discovery agent + * @parent_port: 
immediate ancestor towards cxl_root + * @port_host: PCI or platform-firmware device hosting this port + * @target_id: ordinal id relative to other siblings under @parent_port + * @component_regs_phys: CXL component register base address + */ +struct cxl_port *devm_cxl_add_port(struct device *host, + struct cxl_port *parent_port, + struct device *port_host, int target_id, + resource_size_t component_regs_phys) +{ + struct cxl_port *port; + struct device *dev; + int rc; + + port = cxl_port_alloc(parent_port, port_host, target_id, + component_regs_phys); + if (IS_ERR(port)) + return port; + + dev = &port->dev; + rc = dev_set_name(dev, "port%d", port->id); + if (rc) + goto err; + + rc = device_add(dev); + if (rc) + goto err; + + rc = devm_add_action_or_reset(host, unregister_dev, dev); + if (rc) + return ERR_PTR(rc); + + rc = devm_cxl_link_port(host, port); + if (rc) + return ERR_PTR(rc); + + return port; + +err: + put_device(dev); + return ERR_PTR(rc); +} +EXPORT_SYMBOL_GPL(devm_cxl_add_port); + /* * cxl_setup_device_regs() - Detect CXL Device register blocks * @dev: Host device of the @base mapping diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 559f8343fee4..0211f44c95a2 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -134,5 +134,10 @@ struct cxl_address_space_dev *to_cxl_address_space(struct device *dev); struct cxl_root *devm_cxl_add_root(struct device *parent, struct cxl_address_space *cxl_space, int nr_spaces); +struct cxl_port *devm_cxl_add_port(struct device *host, + struct cxl_port *parent_port, + struct device *port_host, int target_id, + resource_size_t component_regs_phys); + extern struct bus_type cxl_bus_type; #endif /* __CXL_H__ */
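Since struct cxl_root embeds its struct cxl_port, the same struct device can be resolved either to the port or to the enclosing root, which is why to_cxl_port() now accepts both device types while to_cxl_root() stays strict. A userspace sketch of that relationship follows. The struct device and struct device_type here are stripped-down stand-ins that carry only what the type check needs, and the dev_WARN_ONCE() reporting is left out.

```c
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Minimal stand-ins for the driver-core types. */
struct device_type { const char *name; };
struct device { const struct device_type *type; };

struct cxl_port {
	struct device dev;
	int id;
	int target_id;
};

struct cxl_root {
	struct cxl_port port;
	int nr_spaces;
};

static const struct device_type cxl_root_type = { "cxl_root" };
static const struct device_type cxl_port_type = { "cxl_port" };

/* A root is also a port, so both device types are accepted here. */
static struct cxl_port *to_cxl_port(struct device *dev)
{
	if (dev->type != &cxl_root_type && dev->type != &cxl_port_type)
		return NULL;
	return container_of(dev, struct cxl_port, dev);
}

/* Only a cxl_root_type device sits inside a struct cxl_root. */
static struct cxl_root *to_cxl_root(struct device *dev)
{
	if (dev->type != &cxl_root_type)
		return NULL;
	return container_of(dev, struct cxl_root, port.dev);
}
```

The strict check in to_cxl_root() matters because a plain cxl_port is allocated with sizeof(struct cxl_port), so treating it as a root would read past the allocation.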
[debug / to-be-replaced / not-for-upstream]

Given ACPICA support is needed before drivers can integrate ACPI
functionality, add some module parameters as proxies.
---
 drivers/cxl/acpi.c |   81 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 79 insertions(+), 2 deletions(-)

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index bc2a35ae880b..2a48a728f3e0 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -4,10 +4,84 @@
 #include <linux/module.h>
 #include <linux/device.h>
 #include <linux/kernel.h>
+#include <linux/range.h>
 #include <linux/acpi.h>
 #include <linux/pci.h>
 #include "cxl.h"
 
+/*
+ * TODO: Replace all of the below module parameters with ACPI CXL
+ * resource descriptions once ACPICA makes them available.
+ */
+static unsigned long chbcr[4];
+module_param_named(chbcr0, chbcr[0], ulong, 0400);
+module_param_named(chbcr1, chbcr[1], ulong, 0400);
+module_param_named(chbcr2, chbcr[2], ulong, 0400);
+module_param_named(chbcr3, chbcr[3], ulong, 0400);
+
+/* TODO: cross-bridge interleave */
+static struct cxl_address_space cxl_space[] = {
+	[0] = { .range = { 0, -1 }, .targets = 0x1, },
+	[1] = { .range = { 0, -1 }, .targets = 0x1, },
+	[2] = { .range = { 0, -1 }, .targets = 0x1, },
+	[3] = { .range = { 0, -1 }, .targets = 0x1, },
+};
+
+static int set_range(const char *val, const struct kernel_param *kp)
+{
+	unsigned long long size, base;
+	struct cxl_address_space *space;
+	unsigned long flags;
+	char *p;
+	int rc;
+
+	size = memparse(val, &p);
+	if (*p != '@')
+		return -EINVAL;
+
+	base = memparse(p + 1, &p);
+	if (*p != ':')
+		return -EINVAL;
+
+	rc = kstrtoul(p + 1, 0, &flags);
+	if (rc)
+		return rc;
+	if (!flags || flags > CXL_ADDRSPACE_MASK)
+		return -EINVAL;
+
+	space = kp->arg;
+	*space = (struct cxl_address_space) {
+		.range = {
+			.start = base,
+			.end = base + size - 1,
+		},
+		.flags = flags,
+	};
+
+	return 0;
+}
+
+static int get_range(char *buf, const struct kernel_param *kp)
+{
+	struct cxl_address_space *space = kp->arg;
+
+	if (!range_len(&space->range))
+		return -EINVAL;
+
+	return sysfs_emit(buf, "%#llx@%#llx :%s%s%s%s\n",
+			  (unsigned long long)range_len(&space->range),
+			  (unsigned long long)space->range.start,
+			  space->flags & CXL_ADDRSPACE_RAM ? " ram" : "",
+			  space->flags & CXL_ADDRSPACE_PMEM ? " pmem" : "",
+			  space->flags & CXL_ADDRSPACE_TYPE2 ? " type2" : "",
+			  space->flags & CXL_ADDRSPACE_TYPE3 ? " type3" : "");
+}
+
+module_param_call(range0, set_range, get_range, &cxl_space[0], 0400);
+module_param_call(range1, set_range, get_range, &cxl_space[1], 0400);
+module_param_call(range2, set_range, get_range, &cxl_space[2], 0400);
+module_param_call(range3, set_range, get_range, &cxl_space[3], 0400);
+
 static int match_ACPI0016(struct device *dev, const void *host)
 {
 	struct acpi_device *adev = to_acpi_device(dev);
@@ -67,13 +141,16 @@ static int cxl_acpi_register_ports(struct device *dev, struct acpi_device *root,
 				   struct cxl_port *port, int idx)
 {
 	struct acpi_pci_root *pci_root = acpi_pci_find_root(root->handle);
+	resource_size_t chbcr_base = ~0ULL;
 	struct cxl_walk_context ctx;
 
 	if (!pci_root)
 		return -ENXIO;
 
 	/* TODO: fold in CEDT.CHBS retrieval */
-	port = devm_cxl_add_port(dev, port, &root->dev, idx, ~0ULL);
+	if (idx < ARRAY_SIZE(chbcr))
+		chbcr_base = chbcr[idx];
+	port = devm_cxl_add_port(dev, port, &root->dev, idx, chbcr_base);
 	if (IS_ERR(port))
 		return PTR_ERR(port);
 	dev_dbg(dev, "%s: register: %s\n", dev_name(&root->dev),
@@ -99,7 +176,7 @@ static int cxl_acpi_probe(struct platform_device *pdev)
 	struct cxl_root *cxl_root;
 	int rc, i = 0;
 
-	cxl_root = devm_cxl_add_root(dev, NULL, 0);
+	cxl_root = devm_cxl_add_root(dev, cxl_space, ARRAY_SIZE(cxl_space));
 	if (IS_ERR(cxl_root))
 		return PTR_ERR(cxl_root);
 	dev_dbg(dev, "register: %s\n", dev_name(&cxl_root->port.dev));
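Two details of the rangeN parameters are easy to miss: the accepted string is "size@base:flags", and the { 0, -1 } defaults have a range_len() of zero, which is what makes devm_cxl_add_root() skip slots that were never set. A userspace sketch of both follows. parse_size() is a simplified stand-in for memparse() that handles only K/M/G suffixes, parse_range() is a hypothetical name, and out-of-range flags are rejected with -EINVAL in this sketch.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define CXL_ADDRSPACE_MASK	0xfUL

struct range {
	uint64_t start, end;
};

/* end - start + 1, which wraps to 0 for the { 0, -1 } default */
static uint64_t range_len(const struct range *r)
{
	return r->end - r->start + 1;
}

/* Simplified stand-in for memparse(): number plus optional K/M/G. */
static uint64_t parse_size(const char *s, char **end)
{
	uint64_t v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10; /* fall through */
	case 'M': case 'm':
		v <<= 10; /* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/* Hypothetical helper mirroring the "size@base:flags" grammar. */
static int parse_range(const char *val, struct range *range,
		       unsigned long *flags)
{
	uint64_t size, base;
	unsigned long f;
	char *p;

	size = parse_size(val, &p);
	if (*p != '@')
		return -EINVAL;
	base = parse_size(p + 1, &p);
	if (*p != ':')
		return -EINVAL;
	f = strtoul(p + 1, &p, 0);
	if (*p != '\0' || !f || f > CXL_ADDRSPACE_MASK)
		return -EINVAL;

	range->start = base;
	range->end = base + size - 1;
	*flags = f;
	return 0;
}
```

Assuming the module is built as cxl_acpi, per the Makefile earlier in the series, an invocation along the lines of "modprobe cxl_acpi range0=256M@0x100000000:0x9" would describe a 256M window at 4G that supports RAM and type-3 devices.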
[-- Attachment #1: Type: text/plain, Size: 2471 bytes --] Hi Dan, I love your patch! Yet something to improve: [auto build test ERROR on a38fd8748464831584a19438cbb3082b5a2dab15] url: https://github.com/0day-ci/linux/commits/Dan-Williams/CXL-Port-Enumeration/20210402-025333 base: a38fd8748464831584a19438cbb3082b5a2dab15 config: ia64-randconfig-r016-20210401 (attached as .config) compiler: ia64-linux-gcc (GCC) 9.3.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/0day-ci/linux/commit/84c25f52c3c7f8f20988e98eb9947c4eace11927 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Dan-Williams/CXL-Port-Enumeration/20210402-025333 git checkout 84c25f52c3c7f8f20988e98eb9947c4eace11927 # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=ia64 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All errors (new ones prefixed by >>): In file included from drivers/cxl/core.c:7: >> drivers/cxl/cxl.h:84:15: error: field 'range' has incomplete type 84 | struct range range; | ^~~~~ drivers/cxl/core.c: In function 'devm_cxl_add_root': >> drivers/cxl/core.c:343:8: error: implicit declaration of function 'range_len' [-Werror=implicit-function-declaration] 343 | if (!range_len(&space->range)) | ^~~~~~~~~ drivers/cxl/core.c: In function 'end_show': drivers/cxl/core.c:56:1: error: control reaches end of non-void function [-Werror=return-type] 56 | } | ^ drivers/cxl/core.c: In function 'start_show': drivers/cxl/core.c:47:1: error: control reaches end of non-void function [-Werror=return-type] 47 | } | ^ cc1: some warnings being treated as errors -- In file included from drivers/cxl/acpi.c:8: >> drivers/cxl/cxl.h:84:15: error: field 'range' has incomplete type 84 | struct range range; | ^~~~~ vim +/range 
+84 drivers/cxl/cxl.h 82 83 struct cxl_address_space { > 84 struct range range; 85 int interleave_size; 86 unsigned long flags; 87 unsigned long targets; 88 }; 89 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org [-- Attachment #2: .config.gz --] [-- Type: application/gzip, Size: 26333 bytes --]
On Thu, 1 Apr 2021 07:30:47 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> In preparation for sharing cxl.h with other generic CXL consumers,
> move / consolidate some of the memory device specifics to mem.h.
>
> Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Hi Dan,

Would be good to see something in this patch description saying
why you chose to have mem.h rather than push the defines down
into mem.c (which from the current code + patch set looks like
the more logical thing to do).

As a side note, docs for struct cxl_mem need a fix as they cover
enabled_commands, which at some point got shortened to enabled_cmds.

Jonathan

> ---
>  drivers/cxl/cxl.h |   57 ------------------------------------
>  drivers/cxl/mem.c |   25 +---------------
>  drivers/cxl/mem.h |   85 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 86 insertions(+), 81 deletions(-)
>  create mode 100644 drivers/cxl/mem.h
>
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 6f14838c2d25..2e3bdacb32e7 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -34,62 +34,5 @@
>  #define CXLDEV_MBOX_BG_CMD_STATUS_OFFSET 0x18
>  #define CXLDEV_MBOX_PAYLOAD_OFFSET 0x20
>
> -/* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
> -#define CXLMDEV_STATUS_OFFSET 0x0
> -#define   CXLMDEV_DEV_FATAL BIT(0)
> -#define   CXLMDEV_FW_HALT BIT(1)
> -#define   CXLMDEV_STATUS_MEDIA_STATUS_MASK GENMASK(3, 2)
> -#define     CXLMDEV_MS_NOT_READY 0
> -#define     CXLMDEV_MS_READY 1
> -#define     CXLMDEV_MS_ERROR 2
> -#define     CXLMDEV_MS_DISABLED 3
> -#define CXLMDEV_READY(status)                                                  \
> -	(FIELD_GET(CXLMDEV_STATUS_MEDIA_STATUS_MASK, status) ==                \
> -	 CXLMDEV_MS_READY)
> -#define   CXLMDEV_MBOX_IF_READY BIT(4)
> -#define   CXLMDEV_RESET_NEEDED_MASK GENMASK(7, 5)
> -#define     CXLMDEV_RESET_NEEDED_NOT 0
> -#define     CXLMDEV_RESET_NEEDED_COLD 1
> -#define     CXLMDEV_RESET_NEEDED_WARM 2
> -#define     CXLMDEV_RESET_NEEDED_HOT 3
> -#define     CXLMDEV_RESET_NEEDED_CXL 4
> -#define CXLMDEV_RESET_NEEDED(status)                                           \
> -	(FIELD_GET(CXLMDEV_RESET_NEEDED_MASK, status) !=                       \
> -	 CXLMDEV_RESET_NEEDED_NOT)
> -
> -struct cxl_memdev;
> -/**
> - * struct cxl_mem - A CXL memory device
> - * @pdev: The PCI device associated with this CXL device.
> - * @regs: IO mappings to the device's MMIO
> - * @status_regs: CXL 2.0 8.2.8.3 Device Status Registers
> - * @mbox_regs: CXL 2.0 8.2.8.4 Mailbox Registers
> - * @memdev_regs: CXL 2.0 8.2.8.5 Memory Device Registers
> - * @payload_size: Size of space for payload
> - *                (CXL 2.0 8.2.8.4.3 Mailbox Capabilities Register)
> - * @mbox_mutex: Mutex to synchronize mailbox access.
> - * @firmware_version: Firmware version for the memory device.
> - * @enabled_commands: Hardware commands found enabled in CEL.
> - * @pmem_range: Persistent memory capacity information.
> - * @ram_range: Volatile memory capacity information.
> - */
> -struct cxl_mem {
> -	struct pci_dev *pdev;
> -	void __iomem *regs;
> -	struct cxl_memdev *cxlmd;
> -
> -	void __iomem *status_regs;
> -	void __iomem *mbox_regs;
> -	void __iomem *memdev_regs;
> -
> -	size_t payload_size;
> -	struct mutex mbox_mutex; /* Protects device mailbox and firmware */
> -	char firmware_version[0x10];
> -	unsigned long *enabled_cmds;
> -
> -	struct range pmem_range;
> -	struct range ram_range;
> -};
> -
>  extern struct bus_type cxl_bus_type;
>  #endif /* __CXL_H__ */
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index 244cb7d89678..45871ef65152 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -12,6 +12,7 @@
>  #include <linux/io-64-nonatomic-lo-hi.h>
>  #include "pci.h"
>  #include "cxl.h"
> +#include "mem.h"
>
>  /**
>   * DOC: cxl mem
> @@ -29,12 +30,6 @@
>   * - Handle and manage error conditions.
> */ > > -/* > - * An entire PCI topology full of devices should be enough for any > - * config > - */ > -#define CXL_MEM_MAX_DEVS 65536 > - > #define cxl_doorbell_busy(cxlm) \ > (readl((cxlm)->mbox_regs + CXLDEV_MBOX_CTRL_OFFSET) & \ > CXLDEV_MBOX_CTRL_DOORBELL) > @@ -91,24 +86,6 @@ struct mbox_cmd { > #define CXL_MBOX_SUCCESS 0 > }; > > -/** > - * struct cxl_memdev - CXL bus object representing a Type-3 Memory Device > - * @dev: driver core device object > - * @cdev: char dev core object for ioctl operations > - * @cxlm: pointer to the parent device driver data > - * @ops_active: active user of @cxlm in ops handlers > - * @ops_dead: completion when all @cxlm ops users have exited > - * @id: id number of this memdev instance. > - */ > -struct cxl_memdev { > - struct device dev; > - struct cdev cdev; > - struct cxl_mem *cxlm; > - struct percpu_ref ops_active; > - struct completion ops_dead; > - int id; > -}; > - Why move this stuff? As far as I could tell, at the end of this patch set this is still only used within mem.c. > static int cxl_mem_major; > static DEFINE_IDA(cxl_memdev_ida); > static struct dentry *cxl_debugfs; > diff --git a/drivers/cxl/mem.h b/drivers/cxl/mem.h > new file mode 100644 > index 000000000000..daa9aba0e218 > --- /dev/null > +++ b/drivers/cxl/mem.h > @@ -0,0 +1,85 @@ > +/* SPDX-License-Identifier: GPL-2.0-only */ > +/* Copyright(c) 2020-2021 Intel Corporation. 
*/ > +#ifndef __CXL_MEM_H__ > +#define __CXL_MEM_H__ > + > +/* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */ > +#define CXLMDEV_STATUS_OFFSET 0x0 > +#define CXLMDEV_DEV_FATAL BIT(0) > +#define CXLMDEV_FW_HALT BIT(1) > +#define CXLMDEV_STATUS_MEDIA_STATUS_MASK GENMASK(3, 2) > +#define CXLMDEV_MS_NOT_READY 0 > +#define CXLMDEV_MS_READY 1 > +#define CXLMDEV_MS_ERROR 2 > +#define CXLMDEV_MS_DISABLED 3 > +#define CXLMDEV_READY(status) \ > + (FIELD_GET(CXLMDEV_STATUS_MEDIA_STATUS_MASK, status) == \ > + CXLMDEV_MS_READY) > +#define CXLMDEV_MBOX_IF_READY BIT(4) > +#define CXLMDEV_RESET_NEEDED_MASK GENMASK(7, 5) > +#define CXLMDEV_RESET_NEEDED_NOT 0 > +#define CXLMDEV_RESET_NEEDED_COLD 1 > +#define CXLMDEV_RESET_NEEDED_WARM 2 > +#define CXLMDEV_RESET_NEEDED_HOT 3 > +#define CXLMDEV_RESET_NEEDED_CXL 4 > +#define CXLMDEV_RESET_NEEDED(status) \ > + (FIELD_GET(CXLMDEV_RESET_NEEDED_MASK, status) != \ > + CXLMDEV_RESET_NEEDED_NOT) > + > +/* > + * An entire PCI topology full of devices should be enough for any > + * config > + */ > +#define CXL_MEM_MAX_DEVS 65536 > + > +/** > + * struct cxl_memdev - CXL bus object representing a Type-3 Memory Device > + * @dev: driver core device object > + * @cdev: char dev core object for ioctl operations > + * @cxlm: pointer to the parent device driver data > + * @ops_active: active user of @cxlm in ops handlers > + * @ops_dead: completion when all @cxlm ops users have exited > + * @id: id number of this memdev instance. > + */ > +struct cxl_memdev { > + struct device dev; > + struct cdev cdev; > + struct cxl_mem *cxlm; > + struct percpu_ref ops_active; > + struct completion ops_dead; > + int id; > +}; > + > +/** > + * struct cxl_mem - A CXL memory device > + * @pdev: The PCI device associated with this CXL device. 
> + * @regs: IO mappings to the device's MMIO > + * @status_regs: CXL 2.0 8.2.8.3 Device Status Registers > + * @mbox_regs: CXL 2.0 8.2.8.4 Mailbox Registers > + * @memdev_regs: CXL 2.0 8.2.8.5 Memory Device Registers > + * @payload_size: Size of space for payload > + * (CXL 2.0 8.2.8.4.3 Mailbox Capabilities Register) > + * @mbox_mutex: Mutex to synchronize mailbox access. > + * @firmware_version: Firmware version for the memory device. > + * @enabled_commands: Hardware commands found enabled in CEL. @enabled_cmds: > + * @pmem_range: Persistent memory capacity information. > + * @ram_range: Volatile memory capacity information. > + */ > +struct cxl_mem { > + struct pci_dev *pdev; > + void __iomem *regs; > + struct cxl_memdev *cxlmd; > + > + void __iomem *status_regs; > + void __iomem *mbox_regs; > + void __iomem *memdev_regs; > + > + size_t payload_size; > + struct mutex mbox_mutex; /* Protects device mailbox and firmware */ > + char firmware_version[0x10]; > + unsigned long *enabled_cmds; > + > + struct range pmem_range; > + struct range ram_range; > +}; > +#endif /* __CXL_MEM_H__ */ >
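For anyone reading along without the tree handy, here is a minimal userspace sketch of how the status macros being moved into mem.h decode the Memory Device Status register. The field positions are the ones quoted in the patch above; the SK_/sk_ names and the stand-ins for the kernel's BIT(), GENMASK() and FIELD_GET() are hypothetical and exist only so the snippet builds outside the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's BIT(), GENMASK() and FIELD_GET(). */
#define SK_BIT(n)		(1ULL << (n))
#define SK_GENMASK(h, l)	(((~0ULL) >> (63 - (h))) & ((~0ULL) << (l)))
/* Dividing by the lowest set bit of the mask shifts the field down to bit 0. */
#define SK_FIELD_GET(mask, v)	(((v) & (mask)) / ((mask) & ~((mask) << 1)))

/* Field layout as quoted from the patch (CXL 2.0 8.2.8.5.1.1). */
#define SK_MEDIA_STATUS_MASK	SK_GENMASK(3, 2)
#define SK_MS_READY		1
#define SK_MBOX_IF_READY	SK_BIT(4)
#define SK_RESET_NEEDED_MASK	SK_GENMASK(7, 5)
#define SK_RESET_NEEDED_NOT	0

/* Mirrors CXLMDEV_READY(): the media status field reads "ready". */
static int sk_memdev_ready(uint64_t status)
{
	return SK_FIELD_GET(SK_MEDIA_STATUS_MASK, status) == SK_MS_READY;
}

/* Mirrors CXLMDEV_RESET_NEEDED(): any non-zero reset-needed encoding. */
static int sk_memdev_reset_needed(uint64_t status)
{
	return SK_FIELD_GET(SK_RESET_NEEDED_MASK, status) != SK_RESET_NEEDED_NOT;
}

/* Same condition cxl_mem_mbox_get() checks before using the mailbox. */
static int sk_memdev_mbox_usable(uint64_t status)
{
	return (status & SK_MBOX_IF_READY) && sk_memdev_ready(status);
}
```

A status value of 0x14 (mailbox interface ready, media ready) passes all the checks, while 0x28 (media error, cold reset needed) fails the ready test and reports that a reset is needed.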
On Thu, 1 Apr 2021 07:31:03 -0700 Dan Williams <dan.j.williams@intel.com> wrote: > While CXL Memory Device endpoints locate the CXL MMIO registers in a PCI > BAR, CXL root bridges have their MMIO base address described by platform > firmware. Refactor the existing register lookup into a generic facility > for endpoints and bridges to share. > > Reviewed-by: Ben Widawsky <ben.widawsky@intel.com> > Signed-off-by: Dan Williams <dan.j.williams@intel.com> It would be nice to make the docs kernel-doc, but otherwise this is simple and makes sense. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > --- > drivers/cxl/core.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++- > drivers/cxl/cxl.h | 3 +++ > drivers/cxl/mem.c | 50 +++++----------------------------------------- > 3 files changed, 65 insertions(+), 45 deletions(-) > > diff --git a/drivers/cxl/core.c b/drivers/cxl/core.c > index 7f8d2034038a..2ab467ef9909 100644 > --- a/drivers/cxl/core.c > +++ b/drivers/cxl/core.c > @@ -1,7 +1,8 @@ > // SPDX-License-Identifier: GPL-2.0-only > -/* Copyright(c) 2020 Intel Corporation. All rights reserved. */ > +/* Copyright(c) 2020-2021 Intel Corporation. All rights reserved. */ > #include <linux/device.h> > #include <linux/module.h> > +#include "cxl.h" > > /** > * DOC: cxl core > @@ -10,6 +11,60 @@ > * point for cross-device interleave coordination through cxl ports. > */ > > +/* > + * cxl_setup_device_regs() - Detect CXL Device register blocks > + * @dev: Host device of the @base mapping > + * @base: mapping of CXL 2.0 8.2.8 CXL Device Register Interface There is not much to add to make this kernel-doc: just document the one missing parameter (@regs) and open the comment with /**. Given it's exported, it would be nice to tidy that up. 
> + */ > +void cxl_setup_device_regs(struct device *dev, void __iomem *base, > + struct cxl_device_regs *regs) > +{ > + int cap, cap_count; > + u64 cap_array; > + > + *regs = (struct cxl_device_regs) { 0 }; > + > + cap_array = readq(base + CXLDEV_CAP_ARRAY_OFFSET); > + if (FIELD_GET(CXLDEV_CAP_ARRAY_ID_MASK, cap_array) != > + CXLDEV_CAP_ARRAY_CAP_ID) > + return; > + > + cap_count = FIELD_GET(CXLDEV_CAP_ARRAY_COUNT_MASK, cap_array); > + > + for (cap = 1; cap <= cap_count; cap++) { > + void __iomem *register_block; > + u32 offset; > + u16 cap_id; > + > + cap_id = FIELD_GET(CXLDEV_CAP_HDR_CAP_ID_MASK, > + readl(base + cap * 0x10)); > + offset = readl(base + cap * 0x10 + 0x4); > + register_block = base + offset; > + > + switch (cap_id) { > + case CXLDEV_CAP_CAP_ID_DEVICE_STATUS: > + dev_dbg(dev, "found Status capability (0x%x)\n", offset); > + regs->status = register_block; > + break; > + case CXLDEV_CAP_CAP_ID_PRIMARY_MAILBOX: > + dev_dbg(dev, "found Mailbox capability (0x%x)\n", offset); > + regs->mbox = register_block; > + break; > + case CXLDEV_CAP_CAP_ID_SECONDARY_MAILBOX: > + dev_dbg(dev, "found Secondary Mailbox capability (0x%x)\n", offset); > + break; > + case CXLDEV_CAP_CAP_ID_MEMDEV: > + dev_dbg(dev, "found Memory Device capability (0x%x)\n", offset); > + regs->memdev = register_block; > + break; > + default: > + dev_dbg(dev, "Unknown cap ID: %d (0x%x)\n", cap_id, offset); > + break; > + } > + } > +} > +EXPORT_SYMBOL_GPL(cxl_setup_device_regs); > + > struct bus_type cxl_bus_type = { > .name = "cxl", > }; > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h > index 37325e504fb7..cbd29650c4e2 100644 > --- a/drivers/cxl/cxl.h > +++ b/drivers/cxl/cxl.h > @@ -67,5 +67,8 @@ struct cxl_regs { > }; > }; > > +void cxl_setup_device_regs(struct device *dev, void __iomem *base, > + struct cxl_device_regs *regs); > + > extern struct bus_type cxl_bus_type; > #endif /* __CXL_H__ */ > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c > index 6951243d128e..ee55abfa147e 
100644 > --- a/drivers/cxl/mem.c > +++ b/drivers/cxl/mem.c > @@ -865,53 +865,15 @@ static int cxl_mem_mbox_send_cmd(struct cxl_mem *cxlm, u16 opcode, > static int cxl_mem_setup_regs(struct cxl_mem *cxlm) > { > struct device *dev = &cxlm->pdev->dev; > - int cap, cap_count; > - u64 cap_array; > + struct cxl_regs *regs = &cxlm->regs; > > - cap_array = readq(cxlm->base + CXLDEV_CAP_ARRAY_OFFSET); > - if (FIELD_GET(CXLDEV_CAP_ARRAY_ID_MASK, cap_array) != > - CXLDEV_CAP_ARRAY_CAP_ID) > - return -ENODEV; > - > - cap_count = FIELD_GET(CXLDEV_CAP_ARRAY_COUNT_MASK, cap_array); > - > - for (cap = 1; cap <= cap_count; cap++) { > - void __iomem *register_block; > - u32 offset; > - u16 cap_id; > - > - cap_id = FIELD_GET(CXLDEV_CAP_HDR_CAP_ID_MASK, > - readl(cxlm->base + cap * 0x10)); > - offset = readl(cxlm->base + cap * 0x10 + 0x4); > - register_block = cxlm->base + offset; > - > - switch (cap_id) { > - case CXLDEV_CAP_CAP_ID_DEVICE_STATUS: > - dev_dbg(dev, "found Status capability (0x%x)\n", offset); > - cxlm->regs.status = register_block; > - break; > - case CXLDEV_CAP_CAP_ID_PRIMARY_MAILBOX: > - dev_dbg(dev, "found Mailbox capability (0x%x)\n", offset); > - cxlm->regs.mbox = register_block; > - break; > - case CXLDEV_CAP_CAP_ID_SECONDARY_MAILBOX: > - dev_dbg(dev, "found Secondary Mailbox capability (0x%x)\n", offset); > - break; > - case CXLDEV_CAP_CAP_ID_MEMDEV: > - dev_dbg(dev, "found Memory Device capability (0x%x)\n", offset); > - cxlm->regs.memdev = register_block; > - break; > - default: > - dev_dbg(dev, "Unknown cap ID: %d (0x%x)\n", cap_id, offset); > - break; > - } > - } > + cxl_setup_device_regs(dev, cxlm->base, &regs->device_regs); > > - if (!cxlm->regs.status || !cxlm->regs.mbox || !cxlm->regs.memdev) { > + if (!regs->status || !regs->mbox || !regs->memdev) { > dev_err(dev, "registers not found: %s%s%s\n", > - !cxlm->regs.status ? "status " : "", > - !cxlm->regs.mbox ? "mbox " : "", > - !cxlm->regs.memdev ? "memdev" : ""); > + !regs->status ? 
"status " : "", > + !regs->mbox ? "mbox " : "", > + !regs->memdev ? "memdev" : ""); > return -ENXIO; > } > >
On Thu, 1 Apr 2021 07:30:53 -0700 Dan Williams <dan.j.williams@intel.com> wrote: > CXL MMIO register blocks are organized by device type and capabilities. > There are Component registers, Device registers (yes, an ambiguous > name), and Memory Device registers (a specific extension of Device > registers). > > It is possible for a given device instance (endpoint or port) to > implement register sets from multiple of the above categories. > > The driver code that enumerates and maps the registers is type specific > so it is useful to have a dedicated type and helpers for each block > type. > > At the same time, once the registers are mapped the origin type does not > matter. It is overly pedantic to reference the register block type in > code that is using the registers. > > In preparation for the endpoint driver to incorporate Component registers > into its MMIO operations reorganize the registers to allow typed > enumeration + mapping, but anonymous usage. With the end state of > 'struct cxl_regs' to be: > > struct cxl_regs { > union { > struct { > CXL_DEVICE_REGS(); > }; > struct cxl_device_regs device_regs; > }; > union { > struct { > CXL_COMPONENT_REGS(); > }; > struct cxl_component_regs component_regs; > }; > }; > > With this arrangement the driver can share component init code with > ports, but when using the registers it can directly reference the > component register block type by name without the 'component_regs' > prefix. 
> > So, map + enumerate can be shared across drivers of different CXL > classes e.g.: > > void cxl_setup_device_regs(struct device *dev, void __iomem *base, > struct cxl_device_regs *regs); > > void cxl_setup_component_regs(struct device *dev, void __iomem *base, > struct cxl_component_regs *regs); > > ...while inline usage in the driver need not indicate where the > registers came from: > > readl(cxlm->regs.mbox + MBOX_OFFSET); > readl(cxlm->regs.hdm + HDM_OFFSET); > > ...instead of: > > readl(cxlm->regs.device_regs.mbox + MBOX_OFFSET); > readl(cxlm->regs.component_regs.hdm + HDM_OFFSET); > > This complexity of the definition in .h yields improvement in code > readability in .c while maintaining type-safety for organization of > setup code. It prepares the implementation to maintain organization in > the face of CXL devices that compose register interfaces consisting of > multiple types. > > Reviewed-by: Ben Widawsky <ben.widawsky@intel.com> > Signed-off-by: Dan Williams <dan.j.williams@intel.com> A few minor things inline. 
> --- > drivers/cxl/cxl.h | 33 +++++++++++++++++++++++++++++++++ > drivers/cxl/mem.c | 44 ++++++++++++++++++++++++-------------------- > drivers/cxl/mem.h | 13 +++++-------- > 3 files changed, 62 insertions(+), 28 deletions(-) > > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h > index 2e3bdacb32e7..37325e504fb7 100644 > --- a/drivers/cxl/cxl.h > +++ b/drivers/cxl/cxl.h > @@ -34,5 +34,38 @@ > #define CXLDEV_MBOX_BG_CMD_STATUS_OFFSET 0x18 > #define CXLDEV_MBOX_PAYLOAD_OFFSET 0x20 > > +/* See note for 'struct cxl_regs' for the rationale of this organization */ > +#define CXL_DEVICE_REGS() \ > + void __iomem *status; \ > + void __iomem *mbox; \ > + void __iomem *memdev > + > +/** > + * struct cxl_device_regs - Common container of CXL Device register > + * block base pointers > + * @status: CXL 2.0 8.2.8.3 Device Status Registers > + * @mbox: CXL 2.0 8.2.8.4 Mailbox Registers > + * @memdev: CXL 2.0 8.2.8.5 Memory Device Registers kernel-doc script is not going to be happy with documenting fields it can't see + not documenting the CXL_DEVICE_REGS() field it can. I've no idea what the right way to handle this might be. > + */ > +struct cxl_device_regs { > + CXL_DEVICE_REGS(); > +}; > + > +/* > + * Note, the anonymous union organization allows for per > + * register-block-type helper routines, without requiring block-type > + * agnostic code to include the prefix. I.e. > + * cxl_setup_device_regs(&cxlm->regs.dev) vs readl(cxlm->regs.mbox). > + * The specificity reads naturally from left-to-right. 
> + */ > +struct cxl_regs { > + union { > + struct { > + CXL_DEVICE_REGS(); > + }; > + struct cxl_device_regs device_regs; > + }; > +}; > + > extern struct bus_type cxl_bus_type; > #endif /* __CXL_H__ */ > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c > index 45871ef65152..6951243d128e 100644 > --- a/drivers/cxl/mem.c > +++ b/drivers/cxl/mem.c > @@ -31,7 +31,7 @@ > */ > > #define cxl_doorbell_busy(cxlm) \ > - (readl((cxlm)->mbox_regs + CXLDEV_MBOX_CTRL_OFFSET) & \ > + (readl((cxlm)->regs.mbox + CXLDEV_MBOX_CTRL_OFFSET) & \ > CXLDEV_MBOX_CTRL_DOORBELL) > > /* CXL 2.0 - 8.2.8.4 */ > @@ -271,7 +271,7 @@ static void cxl_mem_mbox_timeout(struct cxl_mem *cxlm, > static int __cxl_mem_mbox_send_cmd(struct cxl_mem *cxlm, > struct mbox_cmd *mbox_cmd) > { > - void __iomem *payload = cxlm->mbox_regs + CXLDEV_MBOX_PAYLOAD_OFFSET; > + void __iomem *payload = cxlm->regs.mbox + CXLDEV_MBOX_PAYLOAD_OFFSET; > u64 cmd_reg, status_reg; > size_t out_len; > int rc; > @@ -314,12 +314,12 @@ static int __cxl_mem_mbox_send_cmd(struct cxl_mem *cxlm, > } > > /* #2, #3 */ > - writeq(cmd_reg, cxlm->mbox_regs + CXLDEV_MBOX_CMD_OFFSET); > + writeq(cmd_reg, cxlm->regs.mbox + CXLDEV_MBOX_CMD_OFFSET); > > /* #4 */ > dev_dbg(&cxlm->pdev->dev, "Sending command\n"); > writel(CXLDEV_MBOX_CTRL_DOORBELL, > - cxlm->mbox_regs + CXLDEV_MBOX_CTRL_OFFSET); > + cxlm->regs.mbox + CXLDEV_MBOX_CTRL_OFFSET); > > /* #5 */ > rc = cxl_mem_wait_for_doorbell(cxlm); > @@ -329,7 +329,7 @@ static int __cxl_mem_mbox_send_cmd(struct cxl_mem *cxlm, > } > > /* #6 */ > - status_reg = readq(cxlm->mbox_regs + CXLDEV_MBOX_STATUS_OFFSET); > + status_reg = readq(cxlm->regs.mbox + CXLDEV_MBOX_STATUS_OFFSET); > mbox_cmd->return_code = > FIELD_GET(CXLDEV_MBOX_STATUS_RET_CODE_MASK, status_reg); > > @@ -339,7 +339,7 @@ static int __cxl_mem_mbox_send_cmd(struct cxl_mem *cxlm, > } > > /* #7 */ > - cmd_reg = readq(cxlm->mbox_regs + CXLDEV_MBOX_CMD_OFFSET); > + cmd_reg = readq(cxlm->regs.mbox + CXLDEV_MBOX_CMD_OFFSET); > out_len = 
FIELD_GET(CXLDEV_MBOX_CMD_PAYLOAD_LENGTH_MASK, cmd_reg); > > /* #8 */ > @@ -400,7 +400,7 @@ static int cxl_mem_mbox_get(struct cxl_mem *cxlm) > goto out; > } > > - md_status = readq(cxlm->memdev_regs + CXLMDEV_STATUS_OFFSET); > + md_status = readq(cxlm->regs.memdev + CXLMDEV_STATUS_OFFSET); > if (!(md_status & CXLMDEV_MBOX_IF_READY && CXLMDEV_READY(md_status))) { > dev_err(dev, "mbox: reported doorbell ready, but not mbox ready\n"); > rc = -EBUSY; > @@ -868,7 +868,7 @@ static int cxl_mem_setup_regs(struct cxl_mem *cxlm) > int cap, cap_count; > u64 cap_array; > > - cap_array = readq(cxlm->regs + CXLDEV_CAP_ARRAY_OFFSET); > + cap_array = readq(cxlm->base + CXLDEV_CAP_ARRAY_OFFSET); > if (FIELD_GET(CXLDEV_CAP_ARRAY_ID_MASK, cap_array) != > CXLDEV_CAP_ARRAY_CAP_ID) > return -ENODEV; > @@ -881,25 +881,25 @@ static int cxl_mem_setup_regs(struct cxl_mem *cxlm) > u16 cap_id; > > cap_id = FIELD_GET(CXLDEV_CAP_HDR_CAP_ID_MASK, > - readl(cxlm->regs + cap * 0x10)); > - offset = readl(cxlm->regs + cap * 0x10 + 0x4); > - register_block = cxlm->regs + offset; > + readl(cxlm->base + cap * 0x10)); > + offset = readl(cxlm->base + cap * 0x10 + 0x4); > + register_block = cxlm->base + offset; > > switch (cap_id) { > case CXLDEV_CAP_CAP_ID_DEVICE_STATUS: > dev_dbg(dev, "found Status capability (0x%x)\n", offset); > - cxlm->status_regs = register_block; > + cxlm->regs.status = register_block; > break; > case CXLDEV_CAP_CAP_ID_PRIMARY_MAILBOX: > dev_dbg(dev, "found Mailbox capability (0x%x)\n", offset); > - cxlm->mbox_regs = register_block; > + cxlm->regs.mbox = register_block; > break; > case CXLDEV_CAP_CAP_ID_SECONDARY_MAILBOX: > dev_dbg(dev, "found Secondary Mailbox capability (0x%x)\n", offset); > break; > case CXLDEV_CAP_CAP_ID_MEMDEV: > dev_dbg(dev, "found Memory Device capability (0x%x)\n", offset); > - cxlm->memdev_regs = register_block; > + cxlm->regs.memdev = register_block; > break; > default: > dev_dbg(dev, "Unknown cap ID: %d (0x%x)\n", cap_id, offset); > @@ -907,11 +907,11 
@@ static int cxl_mem_setup_regs(struct cxl_mem *cxlm) > } > } > > - if (!cxlm->status_regs || !cxlm->mbox_regs || !cxlm->memdev_regs) { > + if (!cxlm->regs.status || !cxlm->regs.mbox || !cxlm->regs.memdev) { > dev_err(dev, "registers not found: %s%s%s\n", > - !cxlm->status_regs ? "status " : "", > - !cxlm->mbox_regs ? "mbox " : "", > - !cxlm->memdev_regs ? "memdev" : ""); > + !cxlm->regs.status ? "status " : "", > + !cxlm->regs.mbox ? "mbox " : "", > + !cxlm->regs.memdev ? "memdev" : ""); > return -ENXIO; > } > > @@ -920,7 +920,7 @@ static int cxl_mem_setup_regs(struct cxl_mem *cxlm) > > static int cxl_mem_setup_mailbox(struct cxl_mem *cxlm) > { > - const int cap = readl(cxlm->mbox_regs + CXLDEV_MBOX_CAPS_OFFSET); > + const int cap = readl(cxlm->regs.mbox + CXLDEV_MBOX_CAPS_OFFSET); > > cxlm->payload_size = > 1 << FIELD_GET(CXLDEV_MBOX_CAP_PAYLOAD_SIZE_MASK, cap); > @@ -980,7 +980,7 @@ static struct cxl_mem *cxl_mem_create(struct pci_dev *pdev, u32 reg_lo, > > mutex_init(&cxlm->mbox_mutex); > cxlm->pdev = pdev; > - cxlm->regs = regs + offset; > + cxlm->base = regs + offset; > cxlm->enabled_cmds = > devm_kmalloc_array(dev, BITS_TO_LONGS(cxl_cmd_count), > sizeof(unsigned long), > @@ -1495,6 +1495,10 @@ static __init int cxl_mem_init(void) > dev_t devt; > int rc; > > + /* Double check the anonymous union trickery in struct cxl_regs */ > + BUILD_BUG_ON(offsetof(struct cxl_regs, memdev) != > + offsetof(struct cxl_regs, device_regs.memdev)); > + > rc = alloc_chrdev_region(&devt, 0, CXL_MEM_MAX_DEVS, "cxl"); > if (rc) > return rc; > diff --git a/drivers/cxl/mem.h b/drivers/cxl/mem.h > index daa9aba0e218..c247cf9c71af 100644 > --- a/drivers/cxl/mem.h > +++ b/drivers/cxl/mem.h > @@ -53,10 +53,9 @@ struct cxl_memdev { > /** > * struct cxl_mem - A CXL memory device > * @pdev: The PCI device associated with this CXL device. 
> - * @regs: IO mappings to the device's MMIO > - * @status_regs: CXL 2.0 8.2.8.3 Device Status Registers > - * @mbox_regs: CXL 2.0 8.2.8.4 Mailbox Registers > - * @memdev_regs: CXL 2.0 8.2.8.5 Memory Device Registers > + * @base: IO mappings to the device's MMIO > + * @cxlmd: Logical memory device chardev / interface Unrelated missing docs fix? > + * @regs: Parsed register blocks > * @payload_size: Size of space for payload > * (CXL 2.0 8.2.8.4.3 Mailbox Capabilities Register) > * @mbox_mutex: Mutex to synchronize mailbox access. > @@ -67,12 +66,10 @@ struct cxl_memdev { > */ > struct cxl_mem { > struct pci_dev *pdev; > - void __iomem *regs; > + void __iomem *base; Whilst I have no problem with the rename, or with the fact that you want to free up the name for other uses, perhaps call it out in the patch description? > struct cxl_memdev *cxlmd; > > - void __iomem *status_regs; > - void __iomem *mbox_regs; > - void __iomem *memdev_regs; > + struct cxl_regs regs; > > size_t payload_size; > struct mutex mbox_mutex; /* Protects device mailbox and firmware */ >
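To make the "typed enumeration + mapping, but anonymous usage" point concrete, here is a minimal userspace sketch of the arrangement, assuming a plain byte buffer in place of an ioremap()ed BAR. All sk_/SK_ names are hypothetical. The capability ID values and the field positions in the capability array register are assumptions for illustration (they are not shown in this patch) and should be checked against cxl.h. The loop itself follows the walk in cxl_mem_setup_regs() above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same idea as CXL_DEVICE_REGS(): one field list, expanded in two places. */
#define SK_DEVICE_REGS() \
	uint8_t *status; \
	uint8_t *mbox; \
	uint8_t *memdev

/* Typed view: what the enumeration helper takes. */
struct sk_device_regs {
	SK_DEVICE_REGS();
};

/* Anonymous view for users, overlaid on the typed view. */
struct sk_regs {
	union {
		struct {
			SK_DEVICE_REGS();
		};
		struct sk_device_regs device_regs;
	};
};

/* Compile-time analogue of the BUILD_BUG_ON() added to cxl_mem_init(). */
_Static_assert(offsetof(struct sk_regs, memdev) ==
	       offsetof(struct sk_regs, device_regs.memdev),
	       "anonymous and named views must overlay exactly");

/* Assumed capability IDs, for illustration only. */
enum { SK_CAP_STATUS = 0x1, SK_CAP_MBOX = 0x2, SK_CAP_MEMDEV = 0x4000 };

/* Little-endian 32-bit accessors standing in for readl()/writel(). */
static uint32_t sk_read32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void sk_write32(uint8_t *p, uint32_t v)
{
	p[0] = v & 0xff;
	p[1] = (v >> 8) & 0xff;
	p[2] = (v >> 16) & 0xff;
	p[3] = (v >> 24) & 0xff;
}

/* Walk the capability array and record each known register block. */
static void sk_setup_device_regs(uint8_t *base, struct sk_device_regs *regs)
{
	uint32_t cap, cap_count;

	assert(base && regs);
	*regs = (struct sk_device_regs){ 0 };

	/* Assumed layout: array cap ID (0) in bits 15:0, count in bits 47:32. */
	if ((sk_read32(base) & 0xffff) != 0)
		return;
	cap_count = sk_read32(base + 4) & 0xffff;

	for (cap = 1; cap <= cap_count; cap++) {
		uint32_t cap_id = sk_read32(base + cap * 0x10) & 0xffff;
		uint32_t offset = sk_read32(base + cap * 0x10 + 0x4);

		switch (cap_id) {
		case SK_CAP_STATUS:
			regs->status = base + offset;
			break;
		case SK_CAP_MBOX:
			regs->mbox = base + offset;
			break;
		case SK_CAP_MEMDEV:
			regs->memdev = base + offset;
			break;
		default:
			break; /* unknown or unused capability */
		}
	}
}
```

A caller passes &regs.device_regs to the helper and afterwards reads regs.mbox directly. The static assertion is what keeps the two views from drifting apart if a field is ever added to only one of them.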
On Thu, 1 Apr 2021 07:31:09 -0700 Dan Williams <dan.j.williams@intel.com> wrote: > While CXL builds upon the PCI software model for dynamic enumeration and > control, a static platform component is required to bootstrap the CXL > memory layout. In addition to identifying the host bridges ACPI is > responsible for enumerating the CXL memory space that can be addressed > by decoders. This is similar to the requirement for ACPI to publish > resources reported by _CRS for PCI host bridges. > > Introduce the cxl_root object as an abstract "port" into the CXL.mem > address space described by HDM decoders identified by the ACPI > CEDT.CHBS. > > For now just establish the initial boilerplate and sysfs attributes, to > be followed by enumeration of the ports within the host bridge. > > Signed-off-by: Dan Williams <dan.j.williams@intel.com> A few minor comments inline. > --- > drivers/cxl/Kconfig | 14 ++ > drivers/cxl/Makefile | 2 > drivers/cxl/acpi.c | 39 ++++++ > drivers/cxl/core.c | 349 ++++++++++++++++++++++++++++++++++++++++++++++++++ > drivers/cxl/cxl.h | 64 +++++++++ > 5 files changed, 468 insertions(+) > create mode 100644 drivers/cxl/acpi.c > > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig > index 97dc4d751651..fb282af84afd 100644 > --- a/drivers/cxl/Kconfig > +++ b/drivers/cxl/Kconfig > @@ -50,4 +50,18 @@ config CXL_MEM_RAW_COMMANDS > potential impact to memory currently in use by the kernel. > > If developing CXL hardware or the driver say Y, otherwise say N. > + > +config CXL_ACPI > + tristate "CXL ACPI: Platform Support" > + depends on ACPI > + help > + Enable support for host managed device memory (HDM) resources > + published by a platform's ACPI CXL memory layout description. > + See Chapter 9.14.1 CXL Early Discovery Table (CEDT) in the CXL > + 2.0 specification. 
The CXL core consumes these resource to > + publish port and address_space objects used to map regions > + that represent System RAM, or Persistent Memory regions to be > + managed by LIBNVDIMM. > + > + If unsure say 'm'. > endif > diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile > index 3808e39dd31f..f429ca6b59d9 100644 > --- a/drivers/cxl/Makefile > +++ b/drivers/cxl/Makefile > @@ -1,7 +1,9 @@ > # SPDX-License-Identifier: GPL-2.0 > obj-$(CONFIG_CXL_BUS) += cxl_core.o > obj-$(CONFIG_CXL_MEM) += cxl_mem.o > +obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o > > ccflags-y += -DDEFAULT_SYMBOL_NAMESPACE=CXL > cxl_core-y := core.o > cxl_mem-y := mem.o > +cxl_acpi-y := acpi.o > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c > new file mode 100644 > index 000000000000..d54c2d5de730 > --- /dev/null > +++ b/drivers/cxl/acpi.c > @@ -0,0 +1,39 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +/* Copyright(c) 2021 Intel Corporation. All rights reserved. */ > +#include <linux/platform_device.h> > +#include <linux/module.h> > +#include <linux/device.h> > +#include <linux/kernel.h> > +#include <linux/acpi.h> Swap acpi.h for mod_devicetable.h, unless this is going to need acpi.h later for something else. 
> +#include "cxl.h" > + > +static int cxl_acpi_probe(struct platform_device *pdev) > +{ > + struct device *dev = &pdev->dev; > + struct cxl_root *cxl_root; > + > + cxl_root = devm_cxl_add_root(dev, NULL, 0); > + if (IS_ERR(cxl_root)) > + return PTR_ERR(cxl_root); > + dev_dbg(dev, "register: %s\n", dev_name(&cxl_root->port.dev)); > + > + return 0; > +} > + > +static const struct acpi_device_id cxl_acpi_ids[] = { > + { "ACPI0017", 0 }, > + { "", 0 }, > +}; > +MODULE_DEVICE_TABLE(acpi, cxl_acpi_ids); > + > +static struct platform_driver cxl_acpi_driver = { > + .probe = cxl_acpi_probe, > + .driver = { > + .name = KBUILD_MODNAME, > + .acpi_match_table = cxl_acpi_ids, > + }, > +}; > + > +module_platform_driver(cxl_acpi_driver); > +MODULE_LICENSE("GPL v2"); > +MODULE_IMPORT_NS(CXL); > diff --git a/drivers/cxl/core.c b/drivers/cxl/core.c > index 2ab467ef9909..46c3b2588d2f 100644 > --- a/drivers/cxl/core.c > +++ b/drivers/cxl/core.c > @@ -2,6 +2,8 @@ > /* Copyright(c) 2020-2021 Intel Corporation. All rights reserved. */ > #include <linux/device.h> > #include <linux/module.h> > +#include <linux/slab.h> > +#include <linux/idr.h> > #include "cxl.h" > > /** > @@ -11,6 +13,353 @@ > * point for cross-device interleave coordination through cxl ports. 
> */ > > +static DEFINE_IDA(cxl_port_ida); > + > +static ssize_t devtype_show(struct device *dev, struct device_attribute *attr, > + char *buf) > +{ > + return sysfs_emit(buf, "%s\n", dev->type->name); > +} > +static DEVICE_ATTR_RO(devtype); > + > +static struct attribute *cxl_base_attributes[] = { > + &dev_attr_devtype.attr, > + NULL, > +}; > + > +static struct attribute_group cxl_base_attribute_group = { > + .attrs = cxl_base_attributes, > +}; > + > +static struct cxl_address_space *dev_to_address_space(struct device *dev) > +{ > + struct cxl_address_space_dev *cxl_asd = to_cxl_address_space(dev); > + > + return cxl_asd->address_space; > +} > + > +static ssize_t start_show(struct device *dev, struct device_attribute *attr, > + char *buf) > +{ > + struct cxl_address_space *space = dev_to_address_space(dev); > + > + return sysfs_emit(buf, "%#llx\n", space->range.start); > +} > +static DEVICE_ATTR_RO(start); > + > +static ssize_t end_show(struct device *dev, struct device_attribute *attr, > + char *buf) > +{ > + struct cxl_address_space *space = dev_to_address_space(dev); > + > + return sysfs_emit(buf, "%#llx\n", space->range.end); > +} > +static DEVICE_ATTR_RO(end); > + > +#define CXL_ATTR_SUPPORTS(name, flag) \ > +static ssize_t supports_##name##_show( \ > + struct device *dev, struct device_attribute *attr, char *buf) \ > +{ \ > + struct cxl_address_space *space = dev_to_address_space(dev); \ > + \ > + return sysfs_emit(buf, "%s\n", \ > + (space->flags & (flag)) ? 
"1" : "0"); \ > +} \ > +static DEVICE_ATTR_RO(supports_##name) > + > +CXL_ATTR_SUPPORTS(pmem, CXL_ADDRSPACE_PMEM); > +CXL_ATTR_SUPPORTS(ram, CXL_ADDRSPACE_RAM); > +CXL_ATTR_SUPPORTS(type2, CXL_ADDRSPACE_TYPE2); > +CXL_ATTR_SUPPORTS(type3, CXL_ADDRSPACE_TYPE3); > + > +static struct attribute *cxl_address_space_attributes[] = { > + &dev_attr_start.attr, > + &dev_attr_end.attr, > + &dev_attr_supports_pmem.attr, > + &dev_attr_supports_ram.attr, > + &dev_attr_supports_type2.attr, > + &dev_attr_supports_type3.attr, > + NULL, > +}; > + > +static umode_t cxl_address_space_visible(struct kobject *kobj, > + struct attribute *a, int n) > +{ > + struct device *dev = container_of(kobj, struct device, kobj); > + struct cxl_address_space *space = dev_to_address_space(dev); > + > + if (a == &dev_attr_supports_pmem.attr && > + !(space->flags & CXL_ADDRSPACE_PMEM)) > + return 0; > + > + if (a == &dev_attr_supports_ram.attr && > + !(space->flags & CXL_ADDRSPACE_RAM)) > + return 0; > + > + if (a == &dev_attr_supports_type2.attr && > + !(space->flags & CXL_ADDRSPACE_TYPE2)) > + return 0; > + > + if (a == &dev_attr_supports_type3.attr && > + !(space->flags & CXL_ADDRSPACE_TYPE3)) > + return 0; > + > + return a->mode; > +} > + > +static struct attribute_group cxl_address_space_attribute_group = { > + .attrs = cxl_address_space_attributes, > + .is_visible = cxl_address_space_visible, > +}; > + > +static const struct attribute_group *cxl_address_space_attribute_groups[] = { > + &cxl_address_space_attribute_group, > + &cxl_base_attribute_group, > + NULL, > +}; > + > +static void cxl_address_space_release(struct device *dev) > +{ > + struct cxl_address_space_dev *cxl_asd = to_cxl_address_space(dev); > + > + remove_resource(&cxl_asd->res); > + kfree(cxl_asd); > +} > + > +static const struct device_type cxl_address_space_type = { > + .name = "cxl_address_space", > + .release = cxl_address_space_release, > + .groups = cxl_address_space_attribute_groups, > +}; > + > +struct cxl_address_space_dev 
*to_cxl_address_space(struct device *dev) > +{ > + if (dev_WARN_ONCE(dev, dev->type != &cxl_address_space_type, > + "not a cxl_address_space device\n")) > + return NULL; > + return container_of(dev, struct cxl_address_space_dev, dev); > +} > + > +static void cxl_root_release(struct device *dev) > +{ > + struct cxl_root *cxl_root = to_cxl_root(dev); > + > + ida_free(&cxl_port_ida, cxl_root->port.id); > + kfree(cxl_root); > +} > + > +static ssize_t target_id_show(struct device *dev, struct device_attribute *attr, > + char *buf) > +{ > + struct cxl_port *cxl_port = to_cxl_port(dev); > + > + return sysfs_emit(buf, "%d\n", cxl_port->target_id); > +} > +static DEVICE_ATTR_RO(target_id); > + > +static struct attribute *cxl_port_attributes[] = { > + &dev_attr_target_id.attr, > + NULL, > +}; > + > +static struct attribute_group cxl_port_attribute_group = { > + .attrs = cxl_port_attributes, > +}; > + > +static const struct attribute_group *cxl_port_attribute_groups[] = { > + &cxl_port_attribute_group, > + &cxl_base_attribute_group, > + NULL, > +}; > + > +static const struct device_type cxl_root_type = { > + .name = "cxl_root", > + .release = cxl_root_release, > + .groups = cxl_port_attribute_groups, > +}; > + > +struct cxl_root *to_cxl_root(struct device *dev) > +{ > + if (dev_WARN_ONCE(dev, dev->type != &cxl_root_type, > + "not a cxl_root device\n")) > + return NULL; > + return container_of(dev, struct cxl_root, port.dev); > +} > + > +struct cxl_port *to_cxl_port(struct device *dev) > +{ > + if (dev_WARN_ONCE(dev, dev->type != &cxl_root_type, > + "not a cxl_port device\n")) > + return NULL; > + return container_of(dev, struct cxl_port, dev); > +} > + > +static void unregister_dev(void *dev) > +{ > + device_unregister(dev); > +} > + > +static struct cxl_root *cxl_root_alloc(struct device *parent, > + struct cxl_address_space *cxl_space, > + int nr_spaces) > +{ > + struct cxl_root *cxl_root; > + struct cxl_port *port; > + struct device *dev; > + int rc; > + > + cxl_root = 
kzalloc(struct_size(cxl_root, address_space, nr_spaces), > + GFP_KERNEL); > + if (!cxl_root) > + return ERR_PTR(-ENOMEM); > + > + memcpy(cxl_root->address_space, cxl_space, > + flex_array_size(cxl_root, address_space, nr_spaces)); > + cxl_root->nr_spaces = nr_spaces; > + > + rc = ida_alloc(&cxl_port_ida, GFP_KERNEL); > + if (rc < 0) > + goto err; > + port = &cxl_root->port; > + port->id = rc; > + > + /* > + * Root does not have a cxl_port as its parent and it does not > + * have any corresponding component registers it is only a Missing punctuation: "have any corresponding component registers; it is only a", or you could use two sentences. > + * logical anchor to the first level of actual ports that decode > + * the root address spaces. > + */ > + port->port_host = parent; > + port->target_id = -1; > + port->component_regs_phys = -1; > + > + dev = &port->dev; > + device_initialize(dev); > + device_set_pm_not_required(dev); > + dev->parent = parent; > + dev->bus = &cxl_bus_type; > + dev->type = &cxl_root_type; > + > + return cxl_root; > + > +err: > + kfree(cxl_root); > + return ERR_PTR(rc); > +} > + > +static struct cxl_address_space_dev * > +cxl_address_space_dev_alloc(struct device *parent, > + struct cxl_address_space *space) > +{ > + struct cxl_address_space_dev *cxl_asd; > + struct resource *res; > + struct device *dev; > + int rc; > + > + cxl_asd = kzalloc(sizeof(*cxl_asd), GFP_KERNEL); > + if (!cxl_asd) > + return ERR_PTR(-ENOMEM); > + > + res = &cxl_asd->res; > + res->name = "CXL Address Space"; > + res->start = space->range.start; > + res->end = space->range.end; > + res->flags = IORESOURCE_MEM; > + > + rc = insert_resource(&iomem_resource, res); > + if (rc) > + goto err; > + > + cxl_asd->address_space = space; > + dev = &cxl_asd->dev; > + device_initialize(dev); > + device_set_pm_not_required(dev); > + dev->parent = parent; > + dev->type = &cxl_address_space_type; > + > + return cxl_asd; > + > +err: > + kfree(cxl_asd); > + return ERR_PTR(rc); > +} > + > +static int 
cxl_address_space_dev_add(struct device *host, > + struct cxl_address_space_dev *cxl_asd, > + int id) > +{ > + struct device *dev = &cxl_asd->dev; > + int rc; > + > + rc = dev_set_name(dev, "address_space%d", id); > + if (rc) > + goto err; > + > + rc = device_add(dev); > + if (rc) > + goto err; > + > + dev_dbg(host, "%s: register %s\n", dev_name(dev->parent), > + dev_name(dev)); > + > + return devm_add_action_or_reset(host, unregister_dev, dev); > + > +err: > + put_device(dev); This is unusual. The error handling is undoing something this function wasn't responsible for. See below for suggested resolution. > + return rc; > +} > + > +struct cxl_root *devm_cxl_add_root(struct device *host, > + struct cxl_address_space *cxl_space, > + int nr_spaces) > +{ > + struct cxl_root *cxl_root; > + struct cxl_port *port; > + struct device *dev; > + int i, rc; > + > + cxl_root = cxl_root_alloc(host, cxl_space, nr_spaces); > + if (IS_ERR(cxl_root)) > + return cxl_root; > + > + port = &cxl_root->port; > + dev = &port->dev; > + rc = dev_set_name(dev, "root%d", port->id); > + if (rc) > + goto err; > + > + rc = device_add(dev); > + if (rc) > + goto err; > + > + rc = devm_add_action_or_reset(host, unregister_dev, dev); > + if (rc) > + return ERR_PTR(rc); > + > + for (i = 0; i < nr_spaces; i++) { > + struct cxl_address_space *space = &cxl_root->address_space[i]; > + struct cxl_address_space_dev *cxl_asd; > + > + if (!range_len(&space->range)) > + continue; > + > + cxl_asd = cxl_address_space_dev_alloc(dev, space); > + if (IS_ERR(cxl_asd)) > + return ERR_CAST(cxl_asd); > + Nothing is done between the dev_alloc() and the dev_add() and this is currently in the odd position of doing put_device() in the error path of *dev_add() when it wasn't responsible for getting the reference it is putting, dev_alloc() did that. 
That suggests to me that we can clean up the oddity by just combining
cxl_address_space_dev_alloc() and cxl_address_space_dev_add() into one
alloc_and_add() function (with a better name).

> +		rc = cxl_address_space_dev_add(host, cxl_asd, i);

Lifetime management here seems overly complex. Why not use host for both
the alloc and add() devm calls? I guess there is a good reason, though, so
it would be good to have a comment here saying what it is.

> +		if (rc)
> +			return ERR_PTR(rc);
> +	}
> +
> +	return cxl_root;
> +
> +err:
> +	put_device(dev);
> +	return ERR_PTR(rc);
> +}
> +EXPORT_SYMBOL_GPL(devm_cxl_add_root);
> +
>  /*
>   * cxl_setup_device_regs() - Detect CXL Device register blocks
>   * @dev: Host device of the @base mapping
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index cbd29650c4e2..559f8343fee4 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -70,5 +70,69 @@ struct cxl_regs {
>  void cxl_setup_device_regs(struct device *dev, void __iomem *base,
>  			   struct cxl_device_regs *regs);
>
> +/*
> + * Address space properties derived from:
> + * CXL 2.0 8.2.5.12.7 CXL HDM Decoder 0 Control Register
> + */
> +#define CXL_ADDRSPACE_RAM   BIT(0)
> +#define CXL_ADDRSPACE_PMEM  BIT(1)
> +#define CXL_ADDRSPACE_TYPE2 BIT(2)
> +#define CXL_ADDRSPACE_TYPE3 BIT(3)
> +#define CXL_ADDRSPACE_MASK  GENMASK(3, 0)
> +
> +struct cxl_address_space {
> +	struct range range;
> +	int interleave_size;
> +	unsigned long flags;
> +	unsigned long targets;
> +};
> +
> +struct cxl_address_space_dev {
> +	struct device dev;
> +	struct resource res;
> +	struct cxl_address_space *address_space;
> +};
> +
> +/**
> + * struct cxl_port - object representing a root, upstream, or downstream port
> + * @dev: this port's device
> + * @port_host: PCI or platform device host of the CXL capability
> + * @id: id for port device-name
> + * @target_id: this port's HDM decoder id in the parent port
> + * @component_regs_phys: component register capability array base address
> + */
> +struct cxl_port {
> +	struct device dev;
> +	struct device *port_host;
> +	int id;
> +	int target_id;
> +	resource_size_t component_regs_phys;
> +};
> +
> +/*
> + * struct cxl_root - platform object parent of CXL host bridges
> + *
> + * A cxl_root object represents a set of address spaces that are
> + * interleaved across a set of child host bridges, but never interleaved
> + * to another cxl_root object. It contains a cxl_port that is a special
> + * case in that it does not have a parent port and related HDMs, instead
> + * its decode is derived from the root (platform firmware defined)
> + * address space description. Not to be confused with CXL Root Ports
> + * that are the PCIE Root Ports within PCIE Host Bridges that are
> + * flagged by platform firmware (ACPI0016 on ACPI platforms) as having
> + * CXL capabilities.
> + */
> +struct cxl_root {
> +	struct cxl_port port;
> +	int nr_spaces;
> +	struct cxl_address_space address_space[];
> +};
> +
> +struct cxl_root *to_cxl_root(struct device *dev);
> +struct cxl_port *to_cxl_port(struct device *dev);
> +struct cxl_address_space_dev *to_cxl_address_space(struct device *dev);
> +struct cxl_root *devm_cxl_add_root(struct device *parent,
> +				   struct cxl_address_space *cxl_space,
> +				   int nr_spaces);
>  extern struct bus_type cxl_bus_type;
>  #endif /* __CXL_H__ */
>
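The review suggestion above, folding allocation and registration into one function so that the code that takes the initial reference is also the code that drops it on failure, can be modeled in userspace C. This is only an illustrative sketch, not the patch author's code: struct fake_dev, fake_put(), fake_add() and fake_alloc_and_add() are hypothetical stand-ins for struct device, put_device(), device_add() and the proposed combined helper.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a refcounted 'struct device'. */
struct fake_dev {
	int refcount;
	int registered;
	char name[32];
};

/* Counts release events so a caller can observe that cleanup happened. */
static int fake_released;

/* Models put_device(): drop one reference, free on the last one. */
static void fake_put(struct fake_dev *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0) {
		fake_released++;
		free(dev);
	}
}

/* Models device_add(): fails on request, otherwise marks the device live. */
static int fake_add(struct fake_dev *dev, int fail)
{
	if (fail)
		return -1;
	dev->registered = 1;
	return 0;
}

/*
 * Allocate and register in one step. The initial reference is taken
 * here, so dropping it on the error path is this function's own job
 * rather than an undo of some other function's work.
 */
static struct fake_dev *fake_alloc_and_add(int id, int fail_add)
{
	struct fake_dev *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;
	dev->refcount = 1;	/* models device_initialize() */
	snprintf(dev->name, sizeof(dev->name), "address_space%d", id);

	if (fake_add(dev, fail_add)) {
		fake_put(dev);	/* drop the reference taken above */
		return NULL;
	}
	return dev;
}
```

With this shape the caller only ever sees a fully registered object or a failure, which is the simplification the comment is asking for. Whether the real code can be merged this way depends on the devm lifetime question raised in the same review.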
On Thu, 1 Apr 2021 07:31:20 -0700 Dan Williams <dan.j.williams@intel.com> wrote: > Once the cxl_root is established then other ports in the hierarchy can > be attached. The cxl_port object, unlike cxl_root that is associated > with host bridges, is associated with PCIE Root Ports or PCIE Switch > Ports. Add cxl_port instances for all PCIE Root Ports in an ACPI0016 > host bridge. The cxl_port instances for PCIE Switch Ports are not > included here as those are to be modeled as another service device > registered on the pcie_port_bus_type. Good to give a bit of description of what port2 represents vs port1. > > A sample sysfs topology for a single-host-bridge with > single-PCIE/CXL-root-port: > > /sys/bus/cxl/devices/root0 > ├── address_space0 > │ ├── devtype > │ ├── end > │ ├── start > │ ├── supports_ram > │ ├── supports_type2 > │ ├── supports_type3 > │ └── uevent > ├── address_space1 > │ ├── devtype > │ ├── end > │ ├── start > │ ├── supports_pmem > │ ├── supports_type2 > │ ├── supports_type3 > │ └── uevent > ├── devtype > ├── port1 > │ ├── devtype > │ ├── host -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 > │ ├── port2 > │ │ ├── devtype > │ │ ├── host -> ../../../../../pci0000:34/0000:34:00.0 > │ │ ├── subsystem -> ../../../../../../bus/cxl > │ │ ├── target_id > │ │ └── uevent > │ ├── subsystem -> ../../../../../bus/cxl > │ ├── target_id > │ └── uevent > ├── subsystem -> ../../../../bus/cxl > ├── target_id > └── uevent > > Signed-off-by: Dan Williams <dan.j.williams@intel.com> > --- > drivers/cxl/acpi.c | 99 +++++++++++++++++++++++++++++++++++++++++++ > drivers/cxl/core.c | 121 ++++++++++++++++++++++++++++++++++++++++++++++++++++ > drivers/cxl/cxl.h | 5 ++ > 3 files changed, 224 insertions(+), 1 deletion(-) > > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c > index d54c2d5de730..bc2a35ae880b 100644 > --- a/drivers/cxl/acpi.c > +++ b/drivers/cxl/acpi.c > @@ -5,18 +5,117 @@ > #include <linux/device.h> > #include <linux/kernel.h> > #include <linux/acpi.h> > 
+#include <linux/pci.h> > #include "cxl.h" > > +static int match_ACPI0016(struct device *dev, const void *host) > +{ > + struct acpi_device *adev = to_acpi_device(dev); > + const char *hid = acpi_device_hid(adev); > + > + return strcmp(hid, "ACPI0016") == 0; > +} > + > +struct cxl_walk_context { > + struct device *dev; > + struct pci_bus *root; > + struct cxl_port *port; > + int error; > + int count; > +}; > + > +static int match_add_root_ports(struct pci_dev *pdev, void *data) > +{ > + struct cxl_walk_context *ctx = data; > + struct pci_bus *root_bus = ctx->root; > + struct cxl_port *port = ctx->port; > + int type = pci_pcie_type(pdev); > + struct device *dev = ctx->dev; > + resource_size_t cxl_regs_phys; > + int target_id = ctx->count; > + > + if (pdev->bus != root_bus) > + return 0; > + if (!pci_is_pcie(pdev)) > + return 0; > + if (type != PCI_EXP_TYPE_ROOT_PORT) > + return 0; > + > + ctx->count++; > + > + /* TODO walk DVSEC to find component register base */ > + cxl_regs_phys = -1; > + > + port = devm_cxl_add_port(dev, port, &pdev->dev, target_id, > + cxl_regs_phys); > + if (IS_ERR(port)) { > + ctx->error = PTR_ERR(port); > + return ctx->error; > + } > + > + dev_dbg(dev, "%s: register: %s\n", dev_name(&pdev->dev), > + dev_name(&port->dev)); > + > + return 0; > +} > + > +/* > + * A host bridge may contain one or more root ports. Register each port > + * as a child of the cxl_root. 
> + */ > +static int cxl_acpi_register_ports(struct device *dev, struct acpi_device *root, > + struct cxl_port *port, int idx) > +{ > + struct acpi_pci_root *pci_root = acpi_pci_find_root(root->handle); > + struct cxl_walk_context ctx; > + > + if (!pci_root) > + return -ENXIO; > + > + /* TODO: fold in CEDT.CHBS retrieval */ > + port = devm_cxl_add_port(dev, port, &root->dev, idx, ~0ULL); > + if (IS_ERR(port)) > + return PTR_ERR(port); > + dev_dbg(dev, "%s: register: %s\n", dev_name(&root->dev), > + dev_name(&port->dev)); > + > + ctx = (struct cxl_walk_context) { > + .dev = dev, > + .root = pci_root->bus, > + .port = port, > + }; > + pci_walk_bus(pci_root->bus, match_add_root_ports, &ctx); > + > + if (ctx.count == 0) > + return -ENODEV; > + return ctx.error; > +} > + > static int cxl_acpi_probe(struct platform_device *pdev) > { > struct device *dev = &pdev->dev; > + struct acpi_device *adev = ACPI_COMPANION(dev); > + struct device *bridge = NULL; > struct cxl_root *cxl_root; > + int rc, i = 0; > > cxl_root = devm_cxl_add_root(dev, NULL, 0); > if (IS_ERR(cxl_root)) > return PTR_ERR(cxl_root); > dev_dbg(dev, "register: %s\n", dev_name(&cxl_root->port.dev)); > > + while (true) { > + bridge = bus_find_device(adev->dev.bus, bridge, dev, > + match_ACPI0016); > + if (!bridge) > + break; > + > + rc = cxl_acpi_register_ports(dev, to_acpi_device(bridge), > + &cxl_root->port, i++); > + if (rc) > + return rc; > + } > + > return 0; > } > > diff --git a/drivers/cxl/core.c b/drivers/cxl/core.c > index 46c3b2588d2f..65cd704581bc 100644 > --- a/drivers/cxl/core.c > +++ b/drivers/cxl/core.c > @@ -148,6 +148,15 @@ static void cxl_root_release(struct device *dev) > kfree(cxl_root); > } > > +static void cxl_port_release(struct device *dev) > +{ > + struct cxl_port *port = to_cxl_port(dev); > + > + ida_free(&cxl_port_ida, port->id); > + put_device(port->port_host); > + kfree(port); > +} > + > static ssize_t target_id_show(struct device *dev, struct device_attribute *attr, > char *buf) > 
{ > @@ -178,6 +187,12 @@ static const struct device_type cxl_root_type = { > .groups = cxl_port_attribute_groups, > }; > > +static const struct device_type cxl_port_type = { > + .name = "cxl_port", > + .release = cxl_port_release, > + .groups = cxl_port_attribute_groups, > +}; > + > struct cxl_root *to_cxl_root(struct device *dev) > { > if (dev_WARN_ONCE(dev, dev->type != &cxl_root_type, > @@ -188,7 +203,9 @@ struct cxl_root *to_cxl_root(struct device *dev) > > struct cxl_port *to_cxl_port(struct device *dev) > { > - if (dev_WARN_ONCE(dev, dev->type != &cxl_root_type, > + if (dev_WARN_ONCE(dev, > + dev->type != &cxl_root_type && > + dev->type != &cxl_port_type, > "not a cxl_port device\n")) > return NULL; > return container_of(dev, struct cxl_port, dev); > @@ -360,6 +377,108 @@ struct cxl_root *devm_cxl_add_root(struct device *host, > } > EXPORT_SYMBOL_GPL(devm_cxl_add_root); > > +static void cxl_unlink_port(void *_port) > +{ > + struct cxl_port *port = _port; > + > + sysfs_remove_link(&port->dev.kobj, "host"); > +} > + > +static int devm_cxl_link_port(struct device *dev, struct cxl_port *port) > +{ > + int rc; > + > + rc = sysfs_create_link(&port->dev.kobj, &port->port_host->kobj, "host"); > + if (rc) > + return rc; > + return devm_add_action_or_reset(dev, cxl_unlink_port, port); > +} > + > +static struct cxl_port *cxl_port_alloc(struct cxl_port *parent_port, > + struct device *port_dev, int target_id, > + resource_size_t component_regs_phys) > +{ > + struct cxl_port *port; > + struct device *dev; > + int rc; > + > + if (!port_dev) > + return ERR_PTR(-EINVAL); > + > + port = kzalloc(sizeof(*port), GFP_KERNEL); > + if (!port) > + return ERR_PTR(-ENOMEM); > + > + rc = ida_alloc(&cxl_port_ida, GFP_KERNEL); > + if (rc < 0) > + goto err; > + > + port->id = rc; > + port->target_id = target_id; > + port->port_host = get_device(port_dev); > + port->component_regs_phys = component_regs_phys; > + > + dev = &port->dev; > + device_initialize(dev); > + 
device_set_pm_not_required(dev); > + dev->parent = &parent_port->dev; > + dev->bus = &cxl_bus_type; > + dev->type = &cxl_port_type; > + > + return port; > + > +err: > + kfree(port); > + return ERR_PTR(rc); > +} > + > +/** > + * devm_cxl_add_port() - add a cxl_port to the topology > + * @host: devm context / discovery agent > + * @parent_port: immediate ancestor towards cxl_root > + * @port_host: PCI or platform-firmware device hosting this port > + * @target_id: ordinal id relative to other siblings under @parent_port > + * @component_regs_phys: CXL component register base address > + */ > +struct cxl_port *devm_cxl_add_port(struct device *host, > + struct cxl_port *parent_port, > + struct device *port_host, int target_id, > + resource_size_t component_regs_phys) > +{ > + struct cxl_port *port; > + struct device *dev; > + int rc; > + > + port = cxl_port_alloc(parent_port, port_host, target_id, > + component_regs_phys); > + if (IS_ERR(port)) > + return port; > + > + dev = &port->dev; > + rc = dev_set_name(dev, "port%d", port->id); > + if (rc) > + goto err; > + > + rc = device_add(dev); > + if (rc) > + goto err; > + > + rc = devm_add_action_or_reset(host, unregister_dev, dev); > + if (rc) > + return ERR_PTR(rc); > + > + rc = devm_cxl_link_port(host, port); > + if (rc) > + return ERR_PTR(rc); > + > + return port; > + > +err: > + put_device(dev); > + return ERR_PTR(rc); > +} > +EXPORT_SYMBOL_GPL(devm_cxl_add_port); > + > /* > * cxl_setup_device_regs() - Detect CXL Device register blocks > * @dev: Host device of the @base mapping > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h > index 559f8343fee4..0211f44c95a2 100644 > --- a/drivers/cxl/cxl.h > +++ b/drivers/cxl/cxl.h > @@ -134,5 +134,10 @@ struct cxl_address_space_dev *to_cxl_address_space(struct device *dev); > struct cxl_root *devm_cxl_add_root(struct device *parent, > struct cxl_address_space *cxl_space, > int nr_spaces); > +struct cxl_port *devm_cxl_add_port(struct device *host, > + struct cxl_port 
*parent_port, > + struct device *port_host, int target_id, > + resource_size_t component_regs_phys); > + > extern struct bus_type cxl_bus_type; > #endif /* __CXL_H__ */ >
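The to_cxl_port() hunk quoted above widens a type-checked container_of() downcast so that it accepts both the root and the port device types, while to_cxl_root() stays strict. A minimal userspace sketch of that pattern follows. The struct names, the three dev_type instances and the simplified container_of() macro are assumptions made for illustration and are not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace container_of(); the kernel macro also type-checks. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for struct device_type and struct device. */
struct dev_type { const char *name; };
struct dev { const struct dev_type *type; };

static const struct dev_type root_type = { .name = "cxl_root" };
static const struct dev_type port_type = { .name = "cxl_port" };
static const struct dev_type other_type = { .name = "other" };

/* A root embeds a port, and a port embeds the generic device. */
struct port { int id; struct dev dev; };
struct root { struct port port; int nr_spaces; };

/* Accept either type: a root is also usable as a port. */
static struct port *to_port(struct dev *dev)
{
	if (dev->type != &root_type && dev->type != &port_type)
		return NULL;
	return container_of(dev, struct port, dev);
}

/* Only a device tagged as a root may be converted to a root. */
static struct root *to_root(struct dev *dev)
{
	if (dev->type != &root_type)
		return NULL;
	return container_of(dev, struct root, port.dev);
}
```

The type tag is what makes the pointer arithmetic safe: without the check, converting a plain port to a root would silently read past the end of the allocation.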
[+cc Greg, Rafael, Matthew: device model questions] Hi Dan, On Thu, Apr 01, 2021 at 07:31:20AM -0700, Dan Williams wrote: > Once the cxl_root is established then other ports in the hierarchy can > be attached. The cxl_port object, unlike cxl_root that is associated > with host bridges, is associated with PCIE Root Ports or PCIE Switch > Ports. Add cxl_port instances for all PCIE Root Ports in an ACPI0016 > host bridge. I'm not a device model expert, but I'm not sure about adding a new /sys/bus/cxl/devices hierarchy. I'm under the impression that CXL devices will be enumerated by the PCI core as PCIe devices. Doesn't that mean we will have one struct device in the pci_dev, and another one in the cxl_port? That seems like an issue to me. More below. > The cxl_port instances for PCIE Switch Ports are not > included here as those are to be modeled as another service device > registered on the pcie_port_bus_type. I'm hesitant about the idea of adding more uses of pcie_port_bus_type. I really dislike portdrv because it makes a parallel hierarchy: /sys/bus/pci /sys/bus/pci_express for things that really should not be different. There's a struct device in pci_dev, and potentially several pcie_devices, each with another struct device. We make these pcie_device things for AER, DPC, hotplug, etc. E.g., /sys/bus/pci/devices/0000:00:1c.0 /sys/bus/pci_express/devices/0000:00:1c.0:pcie002 # AER /sys/bus/pci_express/devices/0000:00:1c.0:pcie010 # BW notification These are all the same PCI device. AER is a PCI capability. Bandwidth notification is just a feature of all Downstream Ports. I think it makes zero sense to have extra struct devices for them. From a device point of view (enumeration, power management, VM assignment), we can't manage them separately from the underlying PCI device. 
For example, we have three separate "power/" directories, but obviously there's only one point of control (00:1c.0): /sys/devices/pci0000:00/0000:00:1c.0/power/ /sys/devices/pci0000:00/0000:00:1c.0/0000:00:1c.0:pcie002/power/ /sys/devices/pci0000:00/0000:00:1c.0/0000:00:1c.0:pcie010/power/ > A sample sysfs topology for a single-host-bridge with > single-PCIE/CXL-root-port: > > /sys/bus/cxl/devices/root0 > ├── address_space0 > │ ├── devtype > │ ├── end > │ ├── start > │ ├── supports_ram > │ ├── supports_type2 > │ ├── supports_type3 > │ └── uevent > ├── address_space1 > │ ├── devtype > │ ├── end > │ ├── start > │ ├── supports_pmem > │ ├── supports_type2 > │ ├── supports_type3 > │ └── uevent > ├── devtype > ├── port1 > │ ├── devtype > │ ├── host -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 > │ ├── port2 > │ │ ├── devtype > │ │ ├── host -> ../../../../../pci0000:34/0000:34:00.0 > │ │ ├── subsystem -> ../../../../../../bus/cxl > │ │ ├── target_id > │ │ └── uevent > │ ├── subsystem -> ../../../../../bus/cxl > │ ├── target_id > │ └── uevent > ├── subsystem -> ../../../../bus/cxl > ├── target_id > └── uevent > > Signed-off-by: Dan Williams <dan.j.williams@intel.com> > --- > drivers/cxl/acpi.c | 99 +++++++++++++++++++++++++++++++++++++++++++ > drivers/cxl/core.c | 121 ++++++++++++++++++++++++++++++++++++++++++++++++++++ > drivers/cxl/cxl.h | 5 ++ > 3 files changed, 224 insertions(+), 1 deletion(-) > > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c > index d54c2d5de730..bc2a35ae880b 100644 > --- a/drivers/cxl/acpi.c > +++ b/drivers/cxl/acpi.c > @@ -5,18 +5,117 @@ > #include <linux/device.h> > #include <linux/kernel.h> > #include <linux/acpi.h> > +#include <linux/pci.h> > #include "cxl.h" > > +static int match_ACPI0016(struct device *dev, const void *host) > +{ > + struct acpi_device *adev = to_acpi_device(dev); > + const char *hid = acpi_device_hid(adev); > + > + return strcmp(hid, "ACPI0016") == 0; > +} > + > +struct cxl_walk_context { > + struct device *dev; > 
+ struct pci_bus *root; > + struct cxl_port *port; > + int error; > + int count; > +}; > + > +static int match_add_root_ports(struct pci_dev *pdev, void *data) > +{ > + struct cxl_walk_context *ctx = data; > + struct pci_bus *root_bus = ctx->root; > + struct cxl_port *port = ctx->port; > + int type = pci_pcie_type(pdev); > + struct device *dev = ctx->dev; > + resource_size_t cxl_regs_phys; > + int target_id = ctx->count; > + > + if (pdev->bus != root_bus) > + return 0; > + if (!pci_is_pcie(pdev)) > + return 0; > + if (type != PCI_EXP_TYPE_ROOT_PORT) > + return 0; > + > + ctx->count++; > + > + /* TODO walk DVSEC to find component register base */ > + cxl_regs_phys = -1; > + > + port = devm_cxl_add_port(dev, port, &pdev->dev, target_id, > + cxl_regs_phys); > + if (IS_ERR(port)) { > + ctx->error = PTR_ERR(port); > + return ctx->error; > + } > + > + dev_dbg(dev, "%s: register: %s\n", dev_name(&pdev->dev), > + dev_name(&port->dev)); > + > + return 0; > +} > + > +/* > + * A host bridge may contain one or more root ports. Register each port > + * as a child of the cxl_root. 
> + */ > +static int cxl_acpi_register_ports(struct device *dev, struct acpi_device *root, > + struct cxl_port *port, int idx) > +{ > + struct acpi_pci_root *pci_root = acpi_pci_find_root(root->handle); > + struct cxl_walk_context ctx; > + > + if (!pci_root) > + return -ENXIO; > + > + /* TODO: fold in CEDT.CHBS retrieval */ > + port = devm_cxl_add_port(dev, port, &root->dev, idx, ~0ULL); > + if (IS_ERR(port)) > + return PTR_ERR(port); > + dev_dbg(dev, "%s: register: %s\n", dev_name(&root->dev), > + dev_name(&port->dev)); > + > + ctx = (struct cxl_walk_context) { > + .dev = dev, > + .root = pci_root->bus, > + .port = port, > + }; > + pci_walk_bus(pci_root->bus, match_add_root_ports, &ctx); > + > + if (ctx.count == 0) > + return -ENODEV; > + return ctx.error; > +} > + > static int cxl_acpi_probe(struct platform_device *pdev) > { > struct device *dev = &pdev->dev; > + struct acpi_device *adev = ACPI_COMPANION(dev); > + struct device *bridge = NULL; > struct cxl_root *cxl_root; > + int rc, i = 0; > > cxl_root = devm_cxl_add_root(dev, NULL, 0); > if (IS_ERR(cxl_root)) > return PTR_ERR(cxl_root); > dev_dbg(dev, "register: %s\n", dev_name(&cxl_root->port.dev)); > > + while (true) { > + bridge = bus_find_device(adev->dev.bus, bridge, dev, > + match_ACPI0016); > + if (!bridge) > + break; > + > + rc = cxl_acpi_register_ports(dev, to_acpi_device(bridge), > + &cxl_root->port, i++); > + if (rc) > + return rc; > + } > + > return 0; > } > > diff --git a/drivers/cxl/core.c b/drivers/cxl/core.c > index 46c3b2588d2f..65cd704581bc 100644 > --- a/drivers/cxl/core.c > +++ b/drivers/cxl/core.c > @@ -148,6 +148,15 @@ static void cxl_root_release(struct device *dev) > kfree(cxl_root); > } > > +static void cxl_port_release(struct device *dev) > +{ > + struct cxl_port *port = to_cxl_port(dev); > + > + ida_free(&cxl_port_ida, port->id); > + put_device(port->port_host); > + kfree(port); > +} > + > static ssize_t target_id_show(struct device *dev, struct device_attribute *attr, > char *buf) > 
{ > @@ -178,6 +187,12 @@ static const struct device_type cxl_root_type = { > .groups = cxl_port_attribute_groups, > }; > > +static const struct device_type cxl_port_type = { > + .name = "cxl_port", > + .release = cxl_port_release, > + .groups = cxl_port_attribute_groups, > +}; > + > struct cxl_root *to_cxl_root(struct device *dev) > { > if (dev_WARN_ONCE(dev, dev->type != &cxl_root_type, > @@ -188,7 +203,9 @@ struct cxl_root *to_cxl_root(struct device *dev) > > struct cxl_port *to_cxl_port(struct device *dev) > { > - if (dev_WARN_ONCE(dev, dev->type != &cxl_root_type, > + if (dev_WARN_ONCE(dev, > + dev->type != &cxl_root_type && > + dev->type != &cxl_port_type, > "not a cxl_port device\n")) > return NULL; > return container_of(dev, struct cxl_port, dev); > @@ -360,6 +377,108 @@ struct cxl_root *devm_cxl_add_root(struct device *host, > } > EXPORT_SYMBOL_GPL(devm_cxl_add_root); > > +static void cxl_unlink_port(void *_port) > +{ > + struct cxl_port *port = _port; > + > + sysfs_remove_link(&port->dev.kobj, "host"); > +} > + > +static int devm_cxl_link_port(struct device *dev, struct cxl_port *port) > +{ > + int rc; > + > + rc = sysfs_create_link(&port->dev.kobj, &port->port_host->kobj, "host"); > + if (rc) > + return rc; > + return devm_add_action_or_reset(dev, cxl_unlink_port, port); > +} > + > +static struct cxl_port *cxl_port_alloc(struct cxl_port *parent_port, > + struct device *port_dev, int target_id, > + resource_size_t component_regs_phys) > +{ > + struct cxl_port *port; > + struct device *dev; > + int rc; > + > + if (!port_dev) > + return ERR_PTR(-EINVAL); > + > + port = kzalloc(sizeof(*port), GFP_KERNEL); > + if (!port) > + return ERR_PTR(-ENOMEM); > + > + rc = ida_alloc(&cxl_port_ida, GFP_KERNEL); > + if (rc < 0) > + goto err; > + > + port->id = rc; > + port->target_id = target_id; > + port->port_host = get_device(port_dev); > + port->component_regs_phys = component_regs_phys; > + > + dev = &port->dev; > + device_initialize(dev); > + 
device_set_pm_not_required(dev); > + dev->parent = &parent_port->dev; > + dev->bus = &cxl_bus_type; > + dev->type = &cxl_port_type; > + > + return port; > + > +err: > + kfree(port); > + return ERR_PTR(rc); > +} > + > +/** > + * devm_cxl_add_port() - add a cxl_port to the topology > + * @host: devm context / discovery agent > + * @parent_port: immediate ancestor towards cxl_root > + * @port_host: PCI or platform-firmware device hosting this port > + * @target_id: ordinal id relative to other siblings under @parent_port > + * @component_regs_phys: CXL component register base address > + */ > +struct cxl_port *devm_cxl_add_port(struct device *host, > + struct cxl_port *parent_port, > + struct device *port_host, int target_id, > + resource_size_t component_regs_phys) > +{ > + struct cxl_port *port; > + struct device *dev; > + int rc; > + > + port = cxl_port_alloc(parent_port, port_host, target_id, > + component_regs_phys); > + if (IS_ERR(port)) > + return port; > + > + dev = &port->dev; > + rc = dev_set_name(dev, "port%d", port->id); > + if (rc) > + goto err; > + > + rc = device_add(dev); > + if (rc) > + goto err; > + > + rc = devm_add_action_or_reset(host, unregister_dev, dev); > + if (rc) > + return ERR_PTR(rc); > + > + rc = devm_cxl_link_port(host, port); > + if (rc) > + return ERR_PTR(rc); > + > + return port; > + > +err: > + put_device(dev); > + return ERR_PTR(rc); > +} > +EXPORT_SYMBOL_GPL(devm_cxl_add_port); > + > /* > * cxl_setup_device_regs() - Detect CXL Device register blocks > * @dev: Host device of the @base mapping > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h > index 559f8343fee4..0211f44c95a2 100644 > --- a/drivers/cxl/cxl.h > +++ b/drivers/cxl/cxl.h > @@ -134,5 +134,10 @@ struct cxl_address_space_dev *to_cxl_address_space(struct device *dev); > struct cxl_root *devm_cxl_add_root(struct device *parent, > struct cxl_address_space *cxl_space, > int nr_spaces); > +struct cxl_port *devm_cxl_add_port(struct device *host, > + struct cxl_port 
*parent_port, > + struct device *port_host, int target_id, > + resource_size_t component_regs_phys); > + > extern struct bus_type cxl_bus_type; > #endif /* __CXL_H__ */ >
Hi Bjorn, thanks for taking a look.

On Thu, Apr 8, 2021 at 3:42 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+cc Greg, Rafael, Matthew: device model questions]
>
> Hi Dan,
>
> On Thu, Apr 01, 2021 at 07:31:20AM -0700, Dan Williams wrote:
> > Once the cxl_root is established then other ports in the hierarchy can
> > be attached. The cxl_port object, unlike cxl_root that is associated
> > with host bridges, is associated with PCIE Root Ports or PCIE Switch
> > Ports. Add cxl_port instances for all PCIE Root Ports in an ACPI0016
> > host bridge.
>
> I'm not a device model expert, but I'm not sure about adding a new
> /sys/bus/cxl/devices hierarchy. I'm under the impression that CXL
> devices will be enumerated by the PCI core as PCIe devices.

Yes, PCIe is involved, but mostly only for the CXL.io slow path
(configuration and provisioning via mailbox) when we're talking about
memory expander devices (CXL calls these Type-3). So-called "Type-3"
support is the primary driver of this infrastructure.

You might be thinking of CXL accelerator devices that will look like
plain PCIe devices that happen to participate in the CPU cache
hierarchy (CXL calls these Type-1). There will also be accelerator
devices that want to share coherent memory with the system (CXL calls
these Type-2).

The infrastructure being proposed here is primarily for the memory
expander (Type-3) device case where the PCI sysfs hierarchy is wholly
unsuited for modeling it. A single CXL memory region device may span
multiple endpoints, switches, and host bridges. It poses similar
stress to an OS device model as RAID where there is a driver for the
component contributors to an upper level device / driver that exposes
the RAID Volume (CXL memory region interleave set). The CXL memory
decode space (HDM: Host Managed Device Memory) is independent of the
PCIe MMIO BAR space.

That's where the /sys/bus/cxl hierarchy is needed, to manage the HDM
space across the CXL topology in a way that is foreign to PCIE (HDM
Decoder hierarchy).

> Doesn't
> that mean we will have one struct device in the pci_dev, and another
> one in the cxl_port?

Yes, that is the proposal.

> That seems like an issue to me. More below.

hmm...

>
> > The cxl_port instances for PCIE Switch Ports are not
> > included here as those are to be modeled as another service device
> > registered on the pcie_port_bus_type.
>
> I'm hesitant about the idea of adding more uses of pcie_port_bus_type.
> I really dislike portdrv because it makes a parallel hierarchy:
>
>   /sys/bus/pci
>   /sys/bus/pci_express
>
> for things that really should not be different. There's a struct
> device in pci_dev, and potentially several pcie_devices, each with
> another struct device. We make these pcie_device things for AER, DPC,
> hotplug, etc. E.g.,
>
>   /sys/bus/pci/devices/0000:00:1c.0
>   /sys/bus/pci_express/devices/0000:00:1c.0:pcie002  # AER
>   /sys/bus/pci_express/devices/0000:00:1c.0:pcie010  # BW notification
>
> These are all the same PCI device. AER is a PCI capability.
> Bandwidth notification is just a feature of all Downstream Ports. I
> think it makes zero sense to have extra struct devices for them. From
> a device point of view (enumeration, power management, VM assignment),
> we can't manage them separately from the underlying PCI device. For
> example, we have three separate "power/" directories, but obviously
> there's only one point of control (00:1c.0):
>
>   /sys/devices/pci0000:00/0000:00:1c.0/power/
>   /sys/devices/pci0000:00/0000:00:1c.0/0000:00:1c.0:pcie002/power/
>   /sys/devices/pci0000:00/0000:00:1c.0/0000:00:1c.0:pcie010/power/

The superfluous power/ issue can be cleaned up with
device_set_pm_not_required().

What are the other problems this poses, because in other areas this
ability to subdivide a device's functionality into sub-drivers is a
useful organization principle? So much so that several device writer
teams came together to create the auxiliary-bus for the purpose of
allowing sub-drivers to be carved off for independent functionality
similar to the portdrv organization.

That said, I'm open to CXL switch support *not* building on the
portdrv model, but I'm not yet on the same page with your concern.
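To make the RAID comparison in the reply above concrete, here is a deliberately simplified userspace sketch of how an interleaved region spreads consecutive chunks of host physical address space across a list of targets. It is an assumption-laden illustration only: struct region and region_target() are hypothetical names, and the plain divide-and-modulo arithmetic is a generic striping model, not the decode defined by the CXL HDM Decoder registers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model of an interleaved address range. */
struct region {
	uint64_t base;		/* first host physical address of the region */
	uint64_t size;		/* length of the region in bytes */
	uint64_t granularity;	/* bytes routed to one target before moving on */
	int ways;		/* number of targets in the interleave set */
};

/*
 * Return the position in the target list that would decode @hpa,
 * or -1 if @hpa lies outside the region.
 */
static int region_target(const struct region *r, uint64_t hpa)
{
	if (hpa < r->base || hpa - r->base >= r->size)
		return -1;
	assert(r->granularity != 0 && r->ways > 0);
	return (int)(((hpa - r->base) / r->granularity) % (uint64_t)r->ways);
}
```

In this model no single target owns a contiguous range, so the region can only be described by an object that sits above all of the contributing ports and endpoints. That is the property the reply argues the PCI sysfs hierarchy cannot express.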
On Thu, Apr 8, 2021 at 7:13 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> Hi Bjorn, thanks for taking a look.
>
>
> On Thu, Apr 8, 2021 at 3:42 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > [+cc Greg, Rafael, Matthew: device model questions]
> >
> > Hi Dan,
> >
> > On Thu, Apr 01, 2021 at 07:31:20AM -0700, Dan Williams wrote:
> > > Once the cxl_root is established then other ports in the hierarchy can
> > > be attached. The cxl_port object, unlike cxl_root that is associated
> > > with host bridges, is associated with PCIE Root Ports or PCIE Switch
> > > Ports. Add cxl_port instances for all PCIE Root Ports in an ACPI0016
> > > host bridge.
> >
> > I'm not a device model expert, but I'm not sure about adding a new
> > /sys/bus/cxl/devices hierarchy. I'm under the impression that CXL
> > devices will be enumerated by the PCI core as PCIe devices.
>
> Yes, PCIe is involved, but mostly only for the CXL.io slow path
> (configuration and provisioning via mailbox) when we're talking about
> memory expander devices (CXL calls these Type-3). So-called "Type-3"
> support is the primary driver of this infrastructure.
>
> You might be thinking of CXL accelerator devices that will look like
> plain PCIe devices that happen to participate in the CPU cache
> hierarchy (CXL calls these Type-1). There will also be accelerator
> devices that want to share coherent memory with the system (CXL calls
> these Type-2).
>
> The infrastructure being proposed here is primarily for the memory
> expander (Type-3) device case where the PCI sysfs hierarchy is wholly
> unsuited for modeling it. A single CXL memory region device may span
> multiple endpoints, switches, and host bridges. It poses similar
> stress to an OS device model as RAID where there is a driver for the
> component contributors to an upper level device / driver that exposes
> the RAID Volume (CXL memory region interleave set). The CXL memory
> decode space (HDM: Host Managed Device Memory) is independent of the
> PCIe MMIO BAR space.
>
> That's where the /sys/bus/cxl hierarchy is needed, to manage the HDM
> space across the CXL topology in a way that is foreign to PCIE (HDM
> Decoder hierarchy).
>
> > Doesn't
> > that mean we will have one struct device in the pci_dev, and another
> > one in the cxl_port?
>
> Yes, that is the proposal.
>
> > That seems like an issue to me. More below.
>
> hmm...
>
> >
> > > The cxl_port instances for PCIE Switch Ports are not
> > > included here as those are to be modeled as another service device
> > > registered on the pcie_port_bus_type.
> >
> > I'm hesitant about the idea of adding more uses of pcie_port_bus_type.
> > I really dislike portdrv because it makes a parallel hierarchy:
> >
> > /sys/bus/pci
> > /sys/bus/pci_express
> >
> > for things that really should not be different. There's a struct
> > device in pci_dev, and potentially several pcie_devices, each with
> > another struct device. We make these pcie_device things for AER, DPC,
> > hotplug, etc. E.g.,
> >
> > /sys/bus/pci/devices/0000:00:1c.0
> > /sys/bus/pci_express/devices/0000:00:1c.0:pcie002 # AER
> > /sys/bus/pci_express/devices/0000:00:1c.0:pcie010 # BW notification
> >
> > These are all the same PCI device. AER is a PCI capability.
> > Bandwidth notification is just a feature of all Downstream Ports. I
> > think it makes zero sense to have extra struct devices for them. From
> > a device point of view (enumeration, power management, VM assignment),
> > we can't manage them separately from the underlying PCI device. For
> > example, we have three separate "power/" directories, but obviously
> > there's only one point of control (00:1c.0):
> >
> > /sys/devices/pci0000:00/0000:00:1c.0/power/
> > /sys/devices/pci0000:00/0000:00:1c.0/0000:00:1c.0:pcie002/power/
> > /sys/devices/pci0000:00/0000:00:1c.0/0000:00:1c.0:pcie010/power/
>
> The superfluous power/ issue can be cleaned up with
> device_set_pm_not_required().
>
> What are the other problems this poses, because in other areas this
> ability to subdivide a device's functionality into sub-drivers is a
> useful organization principle? So much so that several device writer
> teams came together to create the auxiliary-bus for the purpose of
> allowing sub-drivers to be carved off for independent functionality
> similar to the portdrv organization.
>
Bjorn, any further thoughts on this?
This port architecture question is in the critical path for the next
phase of CXL development (targeting v5.14 not v5.13).
On Tue, Apr 6, 2021 at 10:47 AM Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
>
> On Thu, 1 Apr 2021 07:30:47 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
>
> > In preparation for sharing cxl.h with other generic CXL consumers,
> > move / consolidate some of the memory device specifics to mem.h.
> >
> > Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>
> Hi Dan,
>
> Would be good to see something in this patch description saying
> why you chose to have mem.h rather than push the defines down
> into mem.c (which from the current code + patch set looks like
> the more logical thing to do).

The main motivation was least privilege access to memory-device
details, so they had to move out of cxl.h. As to why move them in to a
new mem.h instead of piling more into mem.c that's just a personal
organizational style choice to aid review. I tend to go to headers
first and read data structure definitions before reading the
implementation, and having that all in one place is cleaner than
interspersed with implementation details in the C code. It's all still
private to drivers/cxl/ so I don't see any "least privilege" concerns
with moving it there.

Does that satisfy your concern?

If yes, I'll add the above to v3.

> As a side note, docs for struct cxl_mem need a fix as they cover
> enabled_commands which at some point got shortened to enabled_cmds

Thanks, will fix.
On Tue, Apr 6, 2021 at 10:47 AM Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
>
> On Thu, 1 Apr 2021 07:30:53 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
>
> > CXL MMIO register blocks are organized by device type and capabilities.
> > There are Component registers, Device registers (yes, an ambiguous
> > name), and Memory Device registers (a specific extension of Device
> > registers).
> >
> > It is possible for a given device instance (endpoint or port) to
> > implement register sets from multiple of the above categories.
> >
> > The driver code that enumerates and maps the registers is type specific
> > so it is useful to have a dedicated type and helpers for each block
> > type.
> >
> > At the same time, once the registers are mapped the origin type does not
> > matter. It is overly pedantic to reference the register block type in
> > code that is using the registers.
> >
> > In preparation for the endpoint driver to incorporate Component registers
> > into its MMIO operations reorganize the registers to allow typed
> > enumeration + mapping, but anonymous usage. With the end state of
> > 'struct cxl_regs' to be:
> >
> > struct cxl_regs {
> > 	union {
> > 		struct {
> > 			CXL_DEVICE_REGS();
> > 		};
> > 		struct cxl_device_regs device_regs;
> > 	};
> > 	union {
> > 		struct {
> > 			CXL_COMPONENT_REGS();
> > 		};
> > 		struct cxl_component_regs component_regs;
> > 	};
> > };
> >
> > With this arrangement the driver can share component init code with
> > ports, but when using the registers it can directly reference the
> > component register block type by name without the 'component_regs'
> > prefix.
> >
> > So, map + enumerate can be shared across drivers of different CXL
> > classes e.g.:
> >
> > void cxl_setup_device_regs(struct device *dev, void __iomem *base,
> > 			   struct cxl_device_regs *regs);
> >
> > void cxl_setup_component_regs(struct device *dev, void __iomem *base,
> > 			      struct cxl_component_regs *regs);
> >
> > ...while inline usage in the driver need not indicate where the
> > registers came from:
> >
> > readl(cxlm->regs.mbox + MBOX_OFFSET);
> > readl(cxlm->regs.hdm + HDM_OFFSET);
> >
> > ...instead of:
> >
> > readl(cxlm->regs.device_regs.mbox + MBOX_OFFSET);
> > readl(cxlm->regs.component_regs.hdm + HDM_OFFSET);
> >
> > This complexity of the definition in .h yields improvement in code
> > readability in .c while maintaining type-safety for organization of
> > setup code. It prepares the implementation to maintain organization in
> > the face of CXL devices that compose register interfaces consisting of
> > multiple types.
> >
> > Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>
> A few minor things inline.
>
> > ---
> >  drivers/cxl/cxl.h | 33 +++++++++++++++++++++++++++++++++
> >  drivers/cxl/mem.c | 44 ++++++++++++++++++++++++--------------------
> >  drivers/cxl/mem.h | 13 +++++--------
> >  3 files changed, 62 insertions(+), 28 deletions(-)
> >
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index 2e3bdacb32e7..37325e504fb7 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -34,5 +34,38 @@
> >  #define CXLDEV_MBOX_BG_CMD_STATUS_OFFSET 0x18
> >  #define CXLDEV_MBOX_PAYLOAD_OFFSET 0x20
> >
> > +/* See note for 'struct cxl_regs' for the rationale of this organization */
> > +#define CXL_DEVICE_REGS() \
> > +	void __iomem *status; \
> > +	void __iomem *mbox; \
> > +	void __iomem *memdev
> > +
> > +/**
> > + * struct cxl_device_regs - Common container of CXL Device register
> > + *			    block base pointers
> > + * @status: CXL 2.0 8.2.8.3 Device Status Registers
> > + * @mbox: CXL 2.0 8.2.8.4 Mailbox Registers
> > + * @memdev: CXL 2.0 8.2.8.5 Memory Device Registers
>
> kernel-doc script is not going to be happy with documenting fields it can't see
> + not documenting the CXL_DEVICE_REGS() field it can.
>
> I've no idea what the right way to handle this might be.

Sure, I'll at least check that the tool does not complain, I might
just make this not a kernel-doc and change the /** to plain /*.

[..]

> > diff --git a/drivers/cxl/mem.h b/drivers/cxl/mem.h
> > index daa9aba0e218..c247cf9c71af 100644
> > --- a/drivers/cxl/mem.h
> > +++ b/drivers/cxl/mem.h
> > @@ -53,10 +53,9 @@ struct cxl_memdev {
> >  /**
> >   * struct cxl_mem - A CXL memory device
> >   * @pdev: The PCI device associated with this CXL device.
> > - * @regs: IO mappings to the device's MMIO
> > - * @status_regs: CXL 2.0 8.2.8.3 Device Status Registers
> > - * @mbox_regs: CXL 2.0 8.2.8.4 Mailbox Registers
> > - * @memdev_regs: CXL 2.0 8.2.8.5 Memory Device Registers
> > + * @base: IO mappings to the device's MMIO
> > + * @cxlmd: Logical memory device chardev / interface
>
> Unrelated missing docs fix?

Yeah, I'll declare that in the changelog.

>
> > + * @regs: Parsed register blocks
> >   * @payload_size: Size of space for payload
> >   *                (CXL 2.0 8.2.8.4.3 Mailbox Capabilities Register)
> >   * @mbox_mutex: Mutex to synchronize mailbox access.
> > @@ -67,12 +66,10 @@ struct cxl_memdev {
> >   */
> >  struct cxl_mem {
> >  	struct pci_dev *pdev;
> > -	void __iomem *regs;
> > +	void __iomem *base;
>
> Whilst I have no problem with the rename and fact you want to free it
> up for other uses, perhaps call it out in the patch description?

Sure.
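
For readers following the 'struct cxl_regs' discussion in this message, here
is a minimal userspace sketch of the anonymous struct / union arrangement.
Everything in it is an assumption for illustration only: the demo_* names,
the plain 'void *' pointers standing in for 'void __iomem *', the single
'hdm' field in the component block, and the offsets are invented and are not
the driver's definitions. It only aims to show that typed setup helpers can
fill one block at a time while users reference the same storage without the
block-type prefix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical field lists; the real macros live in drivers/cxl/cxl.h. */
#define DEMO_DEVICE_REGS() \
	void *status; \
	void *mbox; \
	void *memdev

#define DEMO_COMPONENT_REGS() \
	void *hdm

/* Typed views, used only by the setup helpers below. */
struct demo_device_regs {
	DEMO_DEVICE_REGS();
};

struct demo_component_regs {
	DEMO_COMPONENT_REGS();
};

/*
 * Each union overlays an anonymous struct (fields visible directly in
 * 'struct demo_regs') with the named, typed view of the same fields.
 */
struct demo_regs {
	union {
		struct {
			DEMO_DEVICE_REGS();
		};
		struct demo_device_regs device_regs;
	};
	union {
		struct {
			DEMO_COMPONENT_REGS();
		};
		struct demo_component_regs component_regs;
	};
};

/* Setup code is type specific: it only sees the device block. */
static void demo_setup_device_regs(char *base, struct demo_device_regs *regs)
{
	assert(base && regs);
	regs->status = base + 0x00;	/* made-up offsets */
	regs->mbox = base + 0x10;
	regs->memdev = base + 0x20;
}

/* Likewise for the component block. */
static void demo_setup_component_regs(char *base,
				      struct demo_component_regs *regs)
{
	assert(base && regs);
	regs->hdm = base + 0x40;	/* made-up offset */
}
```

With these definitions, after demo_setup_device_regs(base, &regs.device_regs)
the expressions regs.mbox and regs.device_regs.mbox name the same pointer, so
code using the registers does not need to say which block they came from.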
On Tue, Apr 13, 2021 at 5:18 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Tue, Apr 6, 2021 at 10:47 AM Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >
> > On Thu, 1 Apr 2021 07:30:47 -0700
> > Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > > In preparation for sharing cxl.h with other generic CXL consumers,
> > > move / consolidate some of the memory device specifics to mem.h.
> > >
> > > Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
> > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> >
> > Hi Dan,
> >
> > Would be good to see something in this patch description saying
> > why you chose to have mem.h rather than push the defines down
> > into mem.c (which from the current code + patch set looks like
> > the more logical thing to do).
>
> The main motivation was least privilege access to memory-device
> details, so they had to move out of cxl.h. As to why move them in to a
> new mem.h instead of piling more into mem.c that's just a personal
> organizational style choice to aid review. I tend to go to headers
> first and read data structure definitions before reading the
> implementation, and having that all in one place is cleaner than
> interspersed with implementation details in the C code. It's all still
> private to drivers/cxl/ so I don't see any "least privilege" concerns
> with moving it there.
>
> Does that satisfy your concern?
>
> If yes, I'll add the above to v3.
Oh, another thing it helps is the information content of diffstats to
distinguish definition changes from implementation development.
On Thu, Apr 08, 2021 at 07:13:38PM -0700, Dan Williams wrote:
> Hi Bjorn, thanks for taking a look.
>
> On Thu, Apr 8, 2021 at 3:42 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > [+cc Greg, Rafael, Matthew: device model questions]
> >
> > Hi Dan,
> >
> > On Thu, Apr 01, 2021 at 07:31:20AM -0700, Dan Williams wrote:
> > > Once the cxl_root is established then other ports in the hierarchy can
> > > be attached. The cxl_port object, unlike cxl_root that is associated
> > > with host bridges, is associated with PCIE Root Ports or PCIE Switch
> > > Ports. Add cxl_port instances for all PCIE Root Ports in an ACPI0016
> > > host bridge.

Incidentally, "PCIe" is the abbreviation used in the PCIe specs, so I
try to use that instead of "PCIE" in drivers/pci/.

> > I'm not a device model expert, but I'm not sure about adding a new
> > /sys/bus/cxl/devices hierarchy. I'm under the impression that CXL
> > devices will be enumerated by the PCI core as PCIe devices.
>
> Yes, PCIe is involved, but mostly only for the CXL.io slow path
> (configuration and provisioning via mailbox) when we're talking about
> memory expander devices (CXL calls these Type-3). So-called "Type-3"
> support is the primary driver of this infrastructure.
>
> You might be thinking of CXL accelerator devices that will look like
> plain PCIe devices that happen to participate in the CPU cache
> hierarchy (CXL calls these Type-1). There will also be accelerator
> devices that want to share coherent memory with the system (CXL calls
> these Type-2).

IIUC all these CXL devices will be enumerated by the PCI core. They
seem to have regular PCI BARs (separate from the HDM stuff), so the
PCI core will presumably manage address allocation for them. It looks
like Function Level Reset and hotplug are supposed to use the regular
PCIe code. I guess this will all be visible via lspci just like
regular PCI devices, right?
> The infrastructure being proposed here is primarily for the memory
> expander (Type-3) device case where the PCI sysfs hierarchy is wholly
> unsuited for modeling it. A single CXL memory region device may span
> multiple endpoints, switches, and host bridges. It poses similar
> stress to an OS device model as RAID where there is a driver for the
> component contributors to an upper level device / driver that exposes
> the RAID Volume (CXL memory region interleave set). The CXL memory
> decode space (HDM: Host Managed Device Memory) is independent of the
> PCIe MMIO BAR space.

It looks like you add a cxl_port for each ACPI0016 device and every
PCIe Root Port below it. So I guess the upper level spanning is at a
higher level than cxl_port?

> That's where the /sys/bus/cxl hierarchy is needed, to manage the HDM
> space across the CXL topology in a way that is foreign to PCIE (HDM
> Decoder hierarchy).

When we do FLR on the PCIe device, what happens to these CXL clients?
Do they care? Are they notified? Do they need to do anything before
or after the FLR?

What about hotplug? Spec says it leverages PCIe hotplug, but it looks
like maybe this all requires ACPI hotplug (acpiphp) for adding
ACPI0017 devices and notifying of hot remove requests? If it uses
PCIe native hotplug (pciehp), what connects the CXL side to the PCI
side?

I guess the HDM address space management is entirely outside the scope
of PCI -- the address space is not described by the CXL host bridge
_CRS and not described by CXL endpoint BARs? Where *is* it described
and who manages and allocates it? I guess any transaction routing
through the CXL fabric for HDM space is also completely outside the
scope of PCI -- we don't need to worry about managing PCI-to-PCI
bridge windows, for instance?

Is there a cxl_register_driver() or something? I assume there will be
drivers that need to manage CXL devices? Or will they use
pci_register_driver() and search for a CXL capability?
> > Doesn't that mean we will have one struct device in the pci_dev,
> > and another one in the cxl_port?
>
> Yes, that is the proposal.

> The superfluous power/ issue can be cleaned up with
> device_set_pm_not_required().

Thanks, we might be able to use that for portdrv. I added it to my
list to investigate.

> What are the other problems this poses, because in other areas this
> ability to subdivide a device's functionality into sub-drivers is a
> useful organization principle?

Well, I'm thinking about things like enumeration, hotplug, reset,
resource management (BARs, bridge windows, etc), interrupts, power
management (suspend, resume, etc), and error reporting. These are all
things that PCIe defines on a per-Function basis and seem kind of hard
to cleanly subdivide.

> So much so that several device writer teams came together to create
> the auxiliary-bus for the purpose of allowing sub-drivers to be
> carved off for independent functionality similar to the portdrv
> organization.

Is "auxiliary-bus" a specific thing? I'm not familiar with it but
again I'd like to read up on it in case it has ideas we could
leverage.

Sub-drivers *is* an issue for PCI in general, although mostly I think
it tends to be historical devices where people made the design mistake
of putting several unrelated pieces of functionality in the same PCI
function, so I don't think PCI has good infrastructure for doing that.

Bjorn
On Tue, 13 Apr 2021 17:42:37 -0700
Dan Williams <dan.j.williams@intel.com> wrote:
> On Tue, Apr 13, 2021 at 5:18 PM Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > On Tue, Apr 6, 2021 at 10:47 AM Jonathan Cameron
> > <Jonathan.Cameron@huawei.com> wrote:
> > >
> > > On Thu, 1 Apr 2021 07:30:47 -0700
> > > Dan Williams <dan.j.williams@intel.com> wrote:
> > >
> > > > In preparation for sharing cxl.h with other generic CXL consumers,
> > > > move / consolidate some of the memory device specifics to mem.h.
> > > >
> > > > Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
> > > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > >
> > > Hi Dan,
> > >
> > > Would be good to see something in this patch description saying
> > > why you chose to have mem.h rather than push the defines down
> > > into mem.c (which from the current code + patch set looks like
> > > the more logical thing to do).
> >
> > The main motivation was least privilege access to memory-device
> > details, so they had to move out of cxl.h. As to why move them in to a
> > new mem.h instead of piling more into mem.c that's just a personal
> > organizational style choice to aid review. I tend to go to headers
> > first and read data structure definitions before reading the
> > implementation, and having that all in one place is cleaner than
> > interspersed with implementation details in the C code. It's all still
> > private to drivers/cxl/ so I don't see any "least privilege" concerns
> > with moving it there.
> >
> > Does that satisfy your concern?
> >
> > If yes, I'll add the above to v3.
>
> Oh, another thing it helps is the information content of diffstats to
> distinguish definition changes from implementation development.
I go the other way style-wise, but agree it doesn't really matter for
local headers included from few other files. Adding the above to a
comment will at least avoid anyone else (or forgetful me) raising the
question on v3.
Jonathan
On Tue, Apr 13, 2021 at 6:15 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Thu, Apr 08, 2021 at 07:13:38PM -0700, Dan Williams wrote:
> > Hi Bjorn, thanks for taking a look.
> >
> > On Thu, Apr 8, 2021 at 3:42 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > >
> > > [+cc Greg, Rafael, Matthew: device model questions]
> > >
> > > Hi Dan,
> > >
> > > On Thu, Apr 01, 2021 at 07:31:20AM -0700, Dan Williams wrote:
> > > > Once the cxl_root is established then other ports in the hierarchy can
> > > > be attached. The cxl_port object, unlike cxl_root that is associated
> > > > with host bridges, is associated with PCIE Root Ports or PCIE Switch
> > > > Ports. Add cxl_port instances for all PCIE Root Ports in an ACPI0016
> > > > host bridge.
>
> Incidentally, "PCIe" is the abbreviation used in the PCIe specs, so I
> try to use that instead of "PCIE" in drivers/pci/.

Noted.

>
> > > I'm not a device model expert, but I'm not sure about adding a new
> > > /sys/bus/cxl/devices hierarchy. I'm under the impression that CXL
> > > devices will be enumerated by the PCI core as PCIe devices.
> >
> > Yes, PCIe is involved, but mostly only for the CXL.io slow path
> > (configuration and provisioning via mailbox) when we're talking about
> > memory expander devices (CXL calls these Type-3). So-called "Type-3"
> > support is the primary driver of this infrastructure.
> >
> > You might be thinking of CXL accelerator devices that will look like
> > plain PCIe devices that happen to participate in the CPU cache
> > hierarchy (CXL calls these Type-1). There will also be accelerator
> > devices that want to share coherent memory with the system (CXL calls
> > these Type-2).
>
> IIUC all these CXL devices will be enumerated by the PCI core. They
> seem to have regular PCI BARs (separate from the HDM stuff), so the
> PCI core will presumably manage address allocation for them. It looks
> like Function Level Reset and hotplug are supposed to use the regular
> PCIe code.
> I guess this will all be visible via lspci just like
> regular PCI devices, right?

Yes, the CXL.io protocol is synonymous with PCIe. Hotplug is native
PCIe hotplug to negotiate getting the card online and offline.
Although, for offline an additional constraint is to deny removal
whenever the card has active pages in the page allocator. Similar to
what happens today for ACPI memory hotplug where the OS can say "nope,
there's still active pages in the range you asked to eject".

FLR has no effect on CXL.cache or CXL.mem state, only CXL.io.

> > The infrastructure being proposed here is primarily for the memory
> > expander (Type-3) device case where the PCI sysfs hierarchy is wholly
> > unsuited for modeling it. A single CXL memory region device may span
> > multiple endpoints, switches, and host bridges. It poses similar
> > stress to an OS device model as RAID where there is a driver for the
> > component contributors to an upper level device / driver that exposes
> > the RAID Volume (CXL memory region interleave set). The CXL memory
> > decode space (HDM: Host Managed Device Memory) is independent of the
> > PCIe MMIO BAR space.
>
> It looks like you add a cxl_port for each ACPI0016 device and every
> PCIe Root Port below it. So I guess the upper level spanning is at a
> higher level than cxl_port?

A memory interleave can span any level of the hierarchy. It can be
across host bridges at the top level, but also incorporate a leaf
device at the bottom of a CXL switch hierarchy. There will be a
cxl_port instance for each side of each link.

> > That's where the /sys/bus/cxl hierarchy is needed, to manage the HDM
> > space across the CXL topology in a way that is foreign to PCIE (HDM
> > Decoder hierarchy).
>
> When we do FLR on the PCIe device, what happens to these CXL clients?
> Do they care? Are they notified? Do they need to do anything before
> or after the FLR?

Per CXL Spec:
"FLR has no effect on the CXL.cache and CXL.mem protocol.
Any CXL.cache and CXL.mem related control registers including CXL
DVSEC structures and state held by the CXL device are not affected by
FLR. The memory controller hosting the HDM is not reset by FLR."

> What about hotplug? Spec says it leverages PCIe hotplug, but it looks
> like maybe this all requires ACPI hotplug (acpiphp) for adding
> ACPI0017 devices and notifying of hot remove requests? If it uses
> PCIe native hotplug (pciehp), what connects the CXL side to the PCI
> side?

No, ACPI hotplug is not involved. ACPI0017 is essentially just a dummy
anchor device to hang the interleave set coordination. The connection
from native hotplug to CXL is the cxl_mem driver. When it detects a
new device it walks the cxl_port hierarchy to see if one is a parent
of this endpoint. Then it registers its HDM decoders with the CXL core
and the CXL core can online it as a standalone interleave set or
consolidate it with others to make a wider set. For persistent memory
there is on-device metadata to recall whether this device was part of
a set previously. For volatile-only devices it would need to rely on
some policy to decide if devices are immediately onlined standalone,
or wait for an administrator to configure them.

> I guess the HDM address space management is entirely outside the scope
> of PCI -- the address space is not described by the CXL host bridge
> _CRS and not described by CXL endpoint BARs?

Correct.

> Where *is* it described
> and who manages and allocates it?

ACPI0017 will communicate a set of address spaces from which the CXL
core can allocate interleave sets.

> I guess any transaction routing
> through the CXL fabric for HDM space is also completely outside the
> scope of PCI -- we don't need to worry about managing PCI-to-PCI
> bridge windows, for instance?

Correct. For example a PCIe switch could disable all I/O space and
Memory (MMIO) space, but still decode Host-managed Device Memory (HDM)
space.

> Is there a cxl_register_driver() or something?
> I assume there will be
> drivers that need to manage CXL devices? Or will they use
> pci_register_driver() and search for a CXL capability?

A bit of both. The cxl_mem driver does pci_register_driver(), but for
ports there will be a driver on the CXL bus for that component
capability. Both endpoints and switches will produce cxl_port
instances to be connected / driven by a core driver and coordinated
with a root level driver for address space and interleave management.

> > > Doesn't that mean we will have one struct device in the pci_dev,
> > > and another one in the cxl_port?
> >
> > Yes, that is the proposal.
>
> > The superfluous power/ issue can be cleaned up with
> > device_set_pm_not_required().
>
> Thanks, we might be able to use that for portdrv. I added it to my
> list to investigate.
>
> > What are the other problems this poses, because in other areas this
> > ability to subdivide a device's functionality into sub-drivers is a
> > useful organization principle?
>
> Well, I'm thinking about things like enumeration, hotplug, reset,
> resource management (BARs, bridge windows, etc), interrupts, power
> management (suspend, resume, etc), and error reporting. These are all
> things that PCIe defines on a per-Function basis and seem kind of hard
> to cleanly subdivide.

Right, I'm hoping like FLR there is little need to coordinate PCI /
CXL.io operations with CXL.mem operations, or that once a PCI driver
registers some CXL capabilities it never needs to look back. The only
hook that violates this so far is NAKing device removal when CXL.mem
for that device is busy.

> > So much so that several device writer teams came together to create
> > the auxiliary-bus for the purpose of allowing sub-drivers to be
> > carved off for independent functionality similar to the portdrv
> > organization.
>
> Is "auxiliary-bus" a specific thing? I'm not familiar with it but
> again I'd like to read up on it in case it has ideas we could
> leverage.
auxiliary-bus is not a specific thing, it's a generic way for any
driver to register a custom device for a sub-driver to drive. One of
the primary examples is PCI Ethernet drivers exporting RDMA device
interfaces for common RDMA functionality. So you could have multiple
generations of Ethernet devices all producing a common RDMA interface
and rather than have an equivalent RDMA driver per generation just
create a shared common one that attaches to all the different baseline
Ethernet implementations.

See: Documentation/driver-api/auxiliary_bus.rst

That document is still a bit too generic, and I have an item on my
backlog to flesh it out with more practical guidelines.

> Sub-drivers *is* an issue for PCI in general, although mostly I think
> it tends to be historical devices where people made the design mistake
> of putting several unrelated pieces of functionality in the same PCI
> function, so I don't think PCI has good infrastructure for doing that.

Auxiliary-bus might help especially if those unrelated pieces have
been duplicated across multiple different device implementations.
Aux-bus might clean up the driver model for those pieces.
On Tue, Apr 6, 2021 at 10:47 AM Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
>
> On Thu, 1 Apr 2021 07:31:09 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
>
> > While CXL builds upon the PCI software model for dynamic enumeration and
> > control, a static platform component is required to bootstrap the CXL
> > memory layout. In addition to identifying the host bridges ACPI is
> > responsible for enumerating the CXL memory space that can be addressed
> > by decoders. This is similar to the requirement for ACPI to publish
> > resources reported by _CRS for PCI host bridges.
> >
> > Introduce the cxl_root object as an abstract "port" into the CXL.mem
> > address space described by HDM decoders identified by the ACPI
> > CEDT.CHBS.
> >
> > For now just establish the initial boilerplate and sysfs attributes, to
> > be followed by enumeration of the ports within the host bridge.
> >
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>
> A few minor comments inline.
>
> > ---
> >  drivers/cxl/Kconfig | 14 ++
> >  drivers/cxl/Makefile | 2
> >  drivers/cxl/acpi.c | 39 ++++++
> >  drivers/cxl/core.c | 349 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >  drivers/cxl/cxl.h | 64 +++++++++
> >  5 files changed, 468 insertions(+)
> >  create mode 100644 drivers/cxl/acpi.c
> >
> > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> > index 97dc4d751651..fb282af84afd 100644
> > --- a/drivers/cxl/Kconfig
> > +++ b/drivers/cxl/Kconfig
> > @@ -50,4 +50,18 @@ config CXL_MEM_RAW_COMMANDS
> >  	  potential impact to memory currently in use by the kernel.
> >
> >  	  If developing CXL hardware or the driver say Y, otherwise say N.
> > +
> > +config CXL_ACPI
> > +	tristate "CXL ACPI: Platform Support"
> > +	depends on ACPI
> > +	help
> > +	  Enable support for host managed device memory (HDM) resources
> > +	  published by a platform's ACPI CXL memory layout description.
> > +	  See Chapter 9.14.1 CXL Early Discovery Table (CEDT) in the CXL
> > +	  2.0 specification.
> > +	  The CXL core consumes these resources to
> > +	  publish port and address_space objects used to map regions
> > +	  that represent System RAM, or Persistent Memory regions to be
> > +	  managed by LIBNVDIMM.
> > +
> > +	  If unsure say 'm'.
> >  endif
> > diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile
> > index 3808e39dd31f..f429ca6b59d9 100644
> > --- a/drivers/cxl/Makefile
> > +++ b/drivers/cxl/Makefile
> > @@ -1,7 +1,9 @@
> >  # SPDX-License-Identifier: GPL-2.0
> >  obj-$(CONFIG_CXL_BUS) += cxl_core.o
> >  obj-$(CONFIG_CXL_MEM) += cxl_mem.o
> > +obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o
> >
> >  ccflags-y += -DDEFAULT_SYMBOL_NAMESPACE=CXL
> >  cxl_core-y := core.o
> >  cxl_mem-y := mem.o
> > +cxl_acpi-y := acpi.o
> > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> > new file mode 100644
> > index 000000000000..d54c2d5de730
> > --- /dev/null
> > +++ b/drivers/cxl/acpi.c
> > @@ -0,0 +1,39 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/* Copyright(c) 2021 Intel Corporation. All rights reserved. */
> > +#include <linux/platform_device.h>
> > +#include <linux/module.h>
> > +#include <linux/device.h>
> > +#include <linux/kernel.h>
> > +#include <linux/acpi.h>
>
> swap acpi.h for mod_devicetable.h unless this is going to
> need acpi.h later for something else.

It will need it after patch7, so I'll just leave it as is for now.

[..]
> > +static struct cxl_root *cxl_root_alloc(struct device *parent,
> > +				       struct cxl_address_space *cxl_space,
> > +				       int nr_spaces)
> > +{
> > +	struct cxl_root *cxl_root;
> > +	struct cxl_port *port;
> > +	struct device *dev;
> > +	int rc;
> > +
> > +	cxl_root = kzalloc(struct_size(cxl_root, address_space, nr_spaces),
> > +			   GFP_KERNEL);
> > +	if (!cxl_root)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	memcpy(cxl_root->address_space, cxl_space,
> > +	       flex_array_size(cxl_root, address_space, nr_spaces));
> > +	cxl_root->nr_spaces = nr_spaces;
> > +
> > +	rc = ida_alloc(&cxl_port_ida, GFP_KERNEL);
> > +	if (rc < 0)
> > +		goto err;
> > +	port = &cxl_root->port;
> > +	port->id = rc;
> > +
> > +	/*
> > +	 * Root does not have a cxl_port as its parent and it does not
> > +	 * have any corresponding component registers it is only a
>
> have any corresponding component registers; it is only a
> .. or you could use two sentences

Sure.

>
> > +	 * logical anchor to the first level of actual ports that decode
> > +	 * the root address spaces.
> > +	 */
> > +	port->port_host = parent;
> > +	port->target_id = -1;
> > +	port->component_regs_phys = -1;
> > +
> > +	dev = &port->dev;
> > +	device_initialize(dev);
> > +	device_set_pm_not_required(dev);
> > +	dev->parent = parent;
> > +	dev->bus = &cxl_bus_type;
> > +	dev->type = &cxl_root_type;
> > +
> > +	return cxl_root;
> > +
> > +err:
> > +	kfree(cxl_root);
> > +	return ERR_PTR(rc);
> > +}
> > +
> > +static struct cxl_address_space_dev *
> > +cxl_address_space_dev_alloc(struct device *parent,
> > +			    struct cxl_address_space *space)
> > +{
> > +	struct cxl_address_space_dev *cxl_asd;
> > +	struct resource *res;
> > +	struct device *dev;
> > +	int rc;
> > +
> > +	cxl_asd = kzalloc(sizeof(*cxl_asd), GFP_KERNEL);
> > +	if (!cxl_asd)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	res = &cxl_asd->res;
> > +	res->name = "CXL Address Space";
> > +	res->start = space->range.start;
> > +	res->end = space->range.end;
> > +	res->flags = IORESOURCE_MEM;
> > +
> > +	rc = insert_resource(&iomem_resource, res);
> > +	if (rc)
> > +		goto err;
> > +
> > +	cxl_asd->address_space = space;
> > +	dev = &cxl_asd->dev;
> > +	device_initialize(dev);
> > +	device_set_pm_not_required(dev);
> > +	dev->parent = parent;
> > +	dev->type = &cxl_address_space_type;
> > +
> > +	return cxl_asd;
> > +
> > +err:
> > +	kfree(cxl_asd);
> > +	return ERR_PTR(rc);
> > +}
> > +
> > +static int cxl_address_space_dev_add(struct device *host,
> > +				     struct cxl_address_space_dev *cxl_asd,
> > +				     int id)
> > +{
> > +	struct device *dev = &cxl_asd->dev;
> > +	int rc;
> > +
> > +	rc = dev_set_name(dev, "address_space%d", id);
> > +	if (rc)
> > +		goto err;
> > +
> > +	rc = device_add(dev);
> > +	if (rc)
> > +		goto err;
> > +
> > +	dev_dbg(host, "%s: register %s\n", dev_name(dev->parent),
> > +		dev_name(dev));
> > +
> > +	return devm_add_action_or_reset(host, unregister_dev, dev);
> > +
> > +err:
> > +	put_device(dev);
> This is unusual. The error handling is undoing something this function
> wasn't responsible for.
> See below for suggested resolution.
>
> > +	return rc;
> > +}
> > +
> > +struct cxl_root *devm_cxl_add_root(struct device *host,
> > +				   struct cxl_address_space *cxl_space,
> > +				   int nr_spaces)
> > +{
> > +	struct cxl_root *cxl_root;
> > +	struct cxl_port *port;
> > +	struct device *dev;
> > +	int i, rc;
> > +
> > +	cxl_root = cxl_root_alloc(host, cxl_space, nr_spaces);
> > +	if (IS_ERR(cxl_root))
> > +		return cxl_root;
> > +
> > +	port = &cxl_root->port;
> > +	dev = &port->dev;
> > +	rc = dev_set_name(dev, "root%d", port->id);
> > +	if (rc)
> > +		goto err;
> > +
> > +	rc = device_add(dev);
> > +	if (rc)
> > +		goto err;
> > +
> > +	rc = devm_add_action_or_reset(host, unregister_dev, dev);
> > +	if (rc)
> > +		return ERR_PTR(rc);
> > +
> > +	for (i = 0; i < nr_spaces; i++) {
> > +		struct cxl_address_space *space = &cxl_root->address_space[i];
> > +		struct cxl_address_space_dev *cxl_asd;
> > +
> > +		if (!range_len(&space->range))
> > +			continue;
> > +
> > +		cxl_asd = cxl_address_space_dev_alloc(dev, space);
> > +		if (IS_ERR(cxl_asd))
> > +			return ERR_CAST(cxl_asd);
> > +
>
> Nothing is done between the dev_alloc() and the dev_add()
> and this is currently in the odd position of doing put_device() in the
> error path of *dev_add() when it wasn't responsible for getting the
> reference it is putting, dev_alloc() did that.

No, you missed the back and forth that Jason and I had about proper
device initialization flows:

https://lore.kernel.org/linux-cxl/161714738634.2168142.10860201861152789544.stgit@dwillia2-desk3.amr.corp.intel.com/

The put_device() is not undoing the dev_alloc(), it is undoing the
dev_alloc() + follow on allocations. Specifically it undoes the
dev_set_name() allocation. That is why the alloc and the add are split
into explicit code paths where the recovery shifts from alloc-unwind
to put_device().
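To make the unwind argument concrete, here is a minimal userspace
sketch of the alloc/add split. It is not kernel code: every toy_*
name is a hypothetical stand-in (toy_device_initialize() for
device_initialize(), toy_put_device() for put_device(), the name
buffer for the dev_set_name() allocation), and the fail_early and
fail_add flags only simulate failures such as insert_resource() or
device_add() returning an error.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct device: refcount, name, release hook. */
struct toy_device {
	int refcount;
	char *name;	/* allocated in the "add" phase, freed by release() */
	void (*release)(struct toy_device *dev);
};

struct toy_asd {
	struct toy_device dev;	/* first member, so the cast below is valid */
	int id;
};

static int toy_released;	/* counts release() calls so callers can check */

static void toy_asd_release(struct toy_device *dev)
{
	struct toy_asd *asd = (struct toy_asd *)dev;

	free(dev->name);	/* undoes the follow-on name allocation */
	free(asd);
	toy_released++;
}

static void toy_device_initialize(struct toy_device *dev,
				  void (*release)(struct toy_device *))
{
	dev->refcount = 1;
	dev->release = release;
}

static void toy_put_device(struct toy_device *dev)
{
	if (--dev->refcount == 0)
		dev->release(dev);
}

/* alloc phase: failures before initialization unwind with a plain free() */
static struct toy_asd *toy_asd_alloc(int fail_early)
{
	struct toy_asd *asd = calloc(1, sizeof(*asd));

	if (!asd)
		return NULL;
	if (fail_early) {	/* simulated pre-initialize failure */
		free(asd);
		return NULL;
	}
	toy_device_initialize(&asd->dev, toy_asd_release);
	return asd;
}

/* add phase: the object is refcounted now, so every failure uses put */
static int toy_asd_add(struct toy_asd *asd, int id, int fail_add)
{
	struct toy_device *dev = &asd->dev;
	size_t len = (size_t)snprintf(NULL, 0, "address_space%d", id) + 1;

	dev->name = malloc(len);
	if (!dev->name)
		goto err;
	snprintf(dev->name, len, "address_space%d", id);

	if (fail_add)		/* simulated registration failure */
		goto err;

	asd->id = id;
	return 0;

err:
	toy_put_device(dev);	/* frees the name and the object together */
	return -1;
}
```

In this model, calling free() on the object from the add-phase error
path would leak the name buffer, which is the reason the recovery
switches to the put operation once initialization has happened.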
> That suggests to me that we can clean up the oddity by just combining
> cxl_address_space_dev_alloc() and cxl_address_space_dev_add() into one
> alloc_and_add() function (with a better name)

It's not an oddity and alloc_and_add() is an anti-pattern that leads to bugs.

> > +		rc = cxl_address_space_dev_add(host, cxl_asd, i);
>
> Lifetime management here seems overly complex. Why not use host for both
> the alloc and add() devm calls? I guess there is a good reason
> though so good to have a comment here saying what it is.

I'll add a kernel-doc to cxl_address_space_dev_add() to clarify what
is happening here.
On Tue, Apr 6, 2021 at 10:47 AM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Thu, 1 Apr 2021 07:31:03 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
>
> > While CXL Memory Device endpoints locate the CXL MMIO registers in a PCI
> > BAR, CXL root bridges have their MMIO base address described by platform
> > firmware. Refactor the existing register lookup into a generic facility
> > for endpoints and bridges to share.
> >
> > Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>
> Nice to make the docs kernel-doc, but otherwise this is simple and makes sense
>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>
> > ---
> > drivers/cxl/core.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> > drivers/cxl/cxl.h | 3 +++
> > drivers/cxl/mem.c | 50 +++++-----------------------------------------
> > 3 files changed, 65 insertions(+), 45 deletions(-)
> >
> > diff --git a/drivers/cxl/core.c b/drivers/cxl/core.c
> > index 7f8d2034038a..2ab467ef9909 100644
> > --- a/drivers/cxl/core.c
> > +++ b/drivers/cxl/core.c
> > @@ -1,7 +1,8 @@
> > // SPDX-License-Identifier: GPL-2.0-only
> > -/* Copyright(c) 2020 Intel Corporation. All rights reserved. */
> > +/* Copyright(c) 2020-2021 Intel Corporation. All rights reserved. */
> > #include <linux/device.h>
> > #include <linux/module.h>
> > +#include "cxl.h"
> >
> > /**
> > * DOC: cxl core
> > @@ -10,6 +11,60 @@
> > * point for cross-device interleave coordination through cxl ports.
> > */
> >
> > +/*
> > + * cxl_setup_device_regs() - Detect CXL Device register blocks
> > + * @dev: Host device of the @base mapping
> > + * @base: mapping of CXL 2.0 8.2.8 CXL Device Register Interface
>
> Not much to add to make this kernel-doc. Just the one missing parameter
> and mark it /**. Given it's exported, it would be nice to tidy that up.
Will do, thanks.