linux-cxl.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/4] cxl: introduce CXL Virtualization module
@ 2023-12-28  6:05 Dongsheng Yang
  2023-12-28  6:05 ` [RFC PATCH 1/4] cxl: move some function from acpi module to core module Dongsheng Yang
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Dongsheng Yang @ 2023-12-28  6:05 UTC (permalink / raw)
  To: dave, jonathan.cameron, ave.jiang, alison.schofield,
	vishal.l.verma, ira.weiny, dan.j.williams
  Cc: linux-cxl, Dongsheng Yang

Hi all:
	This patchset introduce cxlv module to allow user to
create virtual cxl device. it's based linux6.7-rc5, you can
get the code from https://github.com/DataTravelGuide/linux

	As the real CXL device is not widely available now, we need
some virtual cxl device to do uplayer software developing or
testing. Qemu is good for functional testing, but not good
for some performance testing.

	The new CXLV module allow user to use the reserved RAM[1], to
create virtual cxl device. When the cxlv module load, it will
create a directory named as "cxl_virt" under /sys/devices/virtual:

	"/sys/devices/virtual/cxl_virt/"

that's the top level device for all cxlv devices.
At the same time, cxlv module will create a debugfs directory:

/sys/kernel/debug/cxl/cxlv
├── create
└── remove

the create and remove debugfs file is the cxlv entry to create or remove
a cxlv device.

	Each cxlv device have its owned virtual pci related bridge and bus, cxlv
will create a new root_port for the new cxlv device, setup cxl ports for
dport and nvdimm-bridge. After that, we will add the virtual pci device,
that will go into the cxl_pci_probe to setup new memdev.

	Then we can see the cxl device with cxl list and use it as a real cxl
device.

 $ echo "memstart=$((8*1024*1024*1024)),cxltype=3,pmem=1,memsize=$((2*1024*1024*1024))" > /sys/kernel/debug/cxl/cxlv/create
 $ cxl list
[
  {
    "memdev":"mem0",
    "pmem_size":1879048192,
    "serial":0,
    "numa_node":0,
    "host":"0010:01:00.0"
  }
]
 $ cxl create-region -m mem0 -d decoder0.0 -t pmem
{
  "region":"region0",
  "resource":"0x210000000",
  "size":"1792.00 MiB (1879.05 MB)",
  "type":"pmem",
  "interleave_ways":1,
  "interleave_granularity":256,
  "decode_state":"commit",
  "mappings":[
    {
      "position":0,
      "memdev":"mem0",
      "decoder":"decoder2.0"
    }
  ]
}
cxl region: cmd_create_region: created 1 region

 $ ndctl create-namespace -r region0 -m fsdax --map dev -t pmem -b 0
{
  "dev":"namespace0.0",
  "mode":"fsdax",
  "map":"dev",
  "size":"1762.00 MiB (1847.59 MB)",
  "uuid":"686fd289-a252-42cf-a3a5-95a39ed5c9d5",
  "sector_size":512,
  "align":2097152,
  "blockdev":"pmem0"
}

 $ mkfs.xfs -f /dev/pmem0 
meta-data=/dev/pmem0             isize=512    agcount=4, agsize=112768
blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1,
rmapbt=0
         =                       reflink=1    bigtime=0 inobtcount=0
data     =                       bsize=4096   blocks=451072, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

Any comment is welcome!

TODO: implement cxlv command in ndctl to do cxlv device management.

[1]: Add argument in kernel command line: "memmap=nn[KMG]$ss[KMG]",
detail in Documentation/driver-api/cxl/memory-devices.rst

Thanx

Dongsheng Yang (4):
  cxl: move some function from acpi module to core module
  cxl/port: allow dport host to be driver-less device
  cxl/port: introduce cxl_disable_port() function
  cxl: introduce CXL Virtualization module

 MAINTAINERS                         |   6 +
 drivers/cxl/Kconfig                 |  11 +
 drivers/cxl/Makefile                |   1 +
 drivers/cxl/acpi.c                  | 143 +-----
 drivers/cxl/core/port.c             | 231 ++++++++-
 drivers/cxl/cxl.h                   |   6 +
 drivers/cxl/cxl_virt/Makefile       |   5 +
 drivers/cxl/cxl_virt/cxlv.h         |  87 ++++
 drivers/cxl/cxl_virt/cxlv_debugfs.c | 260 ++++++++++
 drivers/cxl/cxl_virt/cxlv_device.c  | 311 ++++++++++++
 drivers/cxl/cxl_virt/cxlv_main.c    |  67 +++
 drivers/cxl/cxl_virt/cxlv_pci.c     | 710 ++++++++++++++++++++++++++++
 drivers/cxl/cxl_virt/cxlv_pci.h     | 549 +++++++++++++++++++++
 drivers/cxl/cxl_virt/cxlv_port.c    | 149 ++++++
 14 files changed, 2388 insertions(+), 148 deletions(-)
 create mode 100644 drivers/cxl/cxl_virt/Makefile
 create mode 100644 drivers/cxl/cxl_virt/cxlv.h
 create mode 100644 drivers/cxl/cxl_virt/cxlv_debugfs.c
 create mode 100644 drivers/cxl/cxl_virt/cxlv_device.c
 create mode 100644 drivers/cxl/cxl_virt/cxlv_main.c
 create mode 100644 drivers/cxl/cxl_virt/cxlv_pci.c
 create mode 100644 drivers/cxl/cxl_virt/cxlv_pci.h
 create mode 100644 drivers/cxl/cxl_virt/cxlv_port.c

-- 
2.34.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC PATCH 1/4] cxl: move some function from acpi module to core module
  2023-12-28  6:05 [RFC PATCH 0/4] cxl: introduce CXL Virtualization module Dongsheng Yang
@ 2023-12-28  6:05 ` Dongsheng Yang
  2023-12-28  6:43   ` Dongsheng Yang
  2023-12-28  6:05 ` [RFC PATCH 3/4] cxl/port: introduce cxl_disable_port() function Dongsheng Yang
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Dongsheng Yang @ 2023-12-28  6:05 UTC (permalink / raw)
  To: dave, jonathan.cameron, ave.jiang, alison.schofield,
	vishal.l.verma, ira.weiny, dan.j.williams
  Cc: linux-cxl, Dongsheng Yang

cxl_virt module will create root_port without cxl_acpi_probe(),
export these symbol to allow cxl_virt to create it's own root_port.

Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
---
 drivers/cxl/acpi.c      | 143 +--------------------------------------
 drivers/cxl/core/port.c | 145 ++++++++++++++++++++++++++++++++++++++++
 drivers/cxl/cxl.h       |   5 ++
 3 files changed, 151 insertions(+), 142 deletions(-)

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 2034eb4ce83f..a60ed4156a5e 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -447,7 +447,7 @@ static int add_host_bridge_dport(struct device *match, void *arg)
  * A host bridge is a dport to a CFMWS decode and it is a uport to the
  * dport (PCIe Root Ports) in the host bridge.
  */
-static int add_host_bridge_uport(struct device *match, void *arg)
+int add_host_bridge_uport(struct device *match, void *arg)
 {
 	struct cxl_port *root_port = arg;
 	struct device *host = root_port->dev.parent;
@@ -504,30 +504,6 @@ static int add_host_bridge_uport(struct device *match, void *arg)
 	return 0;
 }
 
-static int add_root_nvdimm_bridge(struct device *match, void *data)
-{
-	struct cxl_decoder *cxld;
-	struct cxl_port *root_port = data;
-	struct cxl_nvdimm_bridge *cxl_nvb;
-	struct device *host = root_port->dev.parent;
-
-	if (!is_root_decoder(match))
-		return 0;
-
-	cxld = to_cxl_decoder(match);
-	if (!(cxld->flags & CXL_DECODER_F_PMEM))
-		return 0;
-
-	cxl_nvb = devm_cxl_add_nvdimm_bridge(host, root_port);
-	if (IS_ERR(cxl_nvb)) {
-		dev_dbg(host, "failed to register pmem\n");
-		return PTR_ERR(cxl_nvb);
-	}
-	dev_dbg(host, "%s: add: %s\n", dev_name(&root_port->dev),
-		dev_name(&cxl_nvb->dev));
-	return 1;
-}
-
 static struct lock_class_key cxl_root_key;
 
 static void cxl_acpi_lock_reset_class(void *dev)
@@ -535,123 +511,6 @@ static void cxl_acpi_lock_reset_class(void *dev)
 	device_lock_reset_class(dev);
 }
 
-static void del_cxl_resource(struct resource *res)
-{
-	kfree(res->name);
-	kfree(res);
-}
-
-static void cxl_set_public_resource(struct resource *priv, struct resource *pub)
-{
-	priv->desc = (unsigned long) pub;
-}
-
-static struct resource *cxl_get_public_resource(struct resource *priv)
-{
-	return (struct resource *) priv->desc;
-}
-
-static void remove_cxl_resources(void *data)
-{
-	struct resource *res, *next, *cxl = data;
-
-	for (res = cxl->child; res; res = next) {
-		struct resource *victim = cxl_get_public_resource(res);
-
-		next = res->sibling;
-		remove_resource(res);
-
-		if (victim) {
-			remove_resource(victim);
-			kfree(victim);
-		}
-
-		del_cxl_resource(res);
-	}
-}
-
-/**
- * add_cxl_resources() - reflect CXL fixed memory windows in iomem_resource
- * @cxl_res: A standalone resource tree where each CXL window is a sibling
- *
- * Walk each CXL window in @cxl_res and add it to iomem_resource potentially
- * expanding its boundaries to ensure that any conflicting resources become
- * children. If a window is expanded it may then conflict with a another window
- * entry and require the window to be truncated or trimmed. Consider this
- * situation:
- *
- * |-- "CXL Window 0" --||----- "CXL Window 1" -----|
- * |--------------- "System RAM" -------------|
- *
- * ...where platform firmware has established as System RAM resource across 2
- * windows, but has left some portion of window 1 for dynamic CXL region
- * provisioning. In this case "Window 0" will span the entirety of the "System
- * RAM" span, and "CXL Window 1" is truncated to the remaining tail past the end
- * of that "System RAM" resource.
- */
-static int add_cxl_resources(struct resource *cxl_res)
-{
-	struct resource *res, *new, *next;
-
-	for (res = cxl_res->child; res; res = next) {
-		new = kzalloc(sizeof(*new), GFP_KERNEL);
-		if (!new)
-			return -ENOMEM;
-		new->name = res->name;
-		new->start = res->start;
-		new->end = res->end;
-		new->flags = IORESOURCE_MEM;
-		new->desc = IORES_DESC_CXL;
-
-		/*
-		 * Record the public resource in the private cxl_res tree for
-		 * later removal.
-		 */
-		cxl_set_public_resource(res, new);
-
-		insert_resource_expand_to_fit(&iomem_resource, new);
-
-		next = res->sibling;
-		while (next && resource_overlaps(new, next)) {
-			if (resource_contains(new, next)) {
-				struct resource *_next = next->sibling;
-
-				remove_resource(next);
-				del_cxl_resource(next);
-				next = _next;
-			} else
-				next->start = new->end + 1;
-		}
-	}
-	return 0;
-}
-
-static int pair_cxl_resource(struct device *dev, void *data)
-{
-	struct resource *cxl_res = data;
-	struct resource *p;
-
-	if (!is_root_decoder(dev))
-		return 0;
-
-	for (p = cxl_res->child; p; p = p->sibling) {
-		struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
-		struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
-		struct resource res = {
-			.start = cxld->hpa_range.start,
-			.end = cxld->hpa_range.end,
-			.flags = IORESOURCE_MEM,
-		};
-
-		if (resource_contains(p, &res)) {
-			cxlrd->res = cxl_get_public_resource(p);
-			break;
-		}
-	}
-
-	return 0;
-}
-
 static int cxl_acpi_probe(struct platform_device *pdev)
 {
 	int rc;
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 38441634e4c6..d8dae028e8a4 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -989,6 +989,151 @@ static int add_dport(struct cxl_port *port, struct cxl_dport *dport)
 	return 0;
 }
 
+int add_root_nvdimm_bridge(struct device *match, void *data)
+{
+	struct cxl_decoder *cxld;
+	struct cxl_port *root_port = data;
+	struct cxl_nvdimm_bridge *cxl_nvb;
+	struct device *host = root_port->dev.parent;
+
+	if (!is_root_decoder(match))
+		return 0;
+
+	cxld = to_cxl_decoder(match);
+	if (!(cxld->flags & CXL_DECODER_F_PMEM))
+		return 0;
+
+	cxl_nvb = devm_cxl_add_nvdimm_bridge(host, root_port);
+	if (IS_ERR(cxl_nvb)) {
+		dev_dbg(host, "failed to register pmem\n");
+		return PTR_ERR(cxl_nvb);
+	}
+	dev_dbg(host, "%s: add: %s\n", dev_name(&root_port->dev),
+		dev_name(&cxl_nvb->dev));
+	return 1;
+}
+EXPORT_SYMBOL_NS_GPL(add_root_nvdimm_bridge, CXL);
+
+static void del_cxl_resource(struct resource *res)
+{
+	kfree(res->name);
+	kfree(res);
+}
+
+static void cxl_set_public_resource(struct resource *priv, struct resource *pub)
+{
+	priv->desc = (unsigned long) pub;
+}
+
+static struct resource *cxl_get_public_resource(struct resource *priv)
+{
+	return (struct resource *) priv->desc;
+}
+
+void remove_cxl_resources(void *data)
+{
+	struct resource *res, *next, *cxl = data;
+
+	for (res = cxl->child; res; res = next) {
+		struct resource *victim = cxl_get_public_resource(res);
+
+		next = res->sibling;
+		remove_resource(res);
+
+		if (victim) {
+			remove_resource(victim);
+			kfree(victim);
+		}
+
+		del_cxl_resource(res);
+	}
+}
+EXPORT_SYMBOL_NS_GPL(remove_cxl_resources, CXL);
+
+/**
+ * add_cxl_resources() - reflect CXL fixed memory windows in iomem_resource
+ * @cxl_res: A standalone resource tree where each CXL window is a sibling
+ *
+ * Walk each CXL window in @cxl_res and add it to iomem_resource potentially
+ * expanding its boundaries to ensure that any conflicting resources become
+ * children. If a window is expanded it may then conflict with a another window
+ * entry and require the window to be truncated or trimmed. Consider this
+ * situation:
+ *
+ * |-- "CXL Window 0" --||----- "CXL Window 1" -----|
+ * |--------------- "System RAM" -------------|
+ *
+ * ...where platform firmware has established as System RAM resource across 2
+ * windows, but has left some portion of window 1 for dynamic CXL region
+ * provisioning. In this case "Window 0" will span the entirety of the "System
+ * RAM" span, and "CXL Window 1" is truncated to the remaining tail past the end
+ * of that "System RAM" resource.
+ */
+int add_cxl_resources(struct resource *cxl_res)
+{
+	struct resource *res, *new, *next;
+
+	for (res = cxl_res->child; res; res = next) {
+		new = kzalloc(sizeof(*new), GFP_KERNEL);
+		if (!new)
+			return -ENOMEM;
+		new->name = res->name;
+		new->start = res->start;
+		new->end = res->end;
+		new->flags = IORESOURCE_MEM;
+		new->desc = IORES_DESC_CXL;
+
+		/*
+		 * Record the public resource in the private cxl_res tree for
+		 * later removal.
+		 */
+		cxl_set_public_resource(res, new);
+
+		insert_resource_expand_to_fit(&iomem_resource, new);
+
+		next = res->sibling;
+		while (next && resource_overlaps(new, next)) {
+			if (resource_contains(new, next)) {
+				struct resource *_next = next->sibling;
+
+				remove_resource(next);
+				del_cxl_resource(next);
+				next = _next;
+			} else
+				next->start = new->end + 1;
+		}
+	}
+	return 0;
+}
+EXPORT_SYMBOL_NS_GPL(add_cxl_resources, CXL);
+
+int pair_cxl_resource(struct device *dev, void *data)
+{
+	struct resource *cxl_res = data;
+	struct resource *p;
+
+	if (!is_root_decoder(dev))
+		return 0;
+
+	for (p = cxl_res->child; p; p = p->sibling) {
+		struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
+		struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
+		struct resource res = {
+			.start = cxld->hpa_range.start,
+			.end = cxld->hpa_range.end,
+			.flags = IORESOURCE_MEM,
+		};
+
+		if (resource_contains(p, &res)) {
+			cxlrd->res = cxl_get_public_resource(p);
+			break;
+		}
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_NS_GPL(pair_cxl_resource, CXL);
+
 /*
  * Since root-level CXL dports cannot be enumerated by PCI they are not
  * enumerated by the common port driver that acquires the port lock over
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 687043ece101..1397f66d943b 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -839,6 +839,11 @@ static inline struct cxl_dax_region *to_cxl_dax_region(struct device *dev)
 }
 #endif
 
+void remove_cxl_resources(void *data);
+int add_cxl_resources(struct resource *cxl_res);
+int pair_cxl_resource(struct device *dev, void *data);
+int add_root_nvdimm_bridge(struct device *match, void *data);
+
 /*
  * Unit test builds overrides this to __weak, find the 'strong' version
  * of these symbols in tools/testing/cxl/.
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC PATCH 3/4] cxl/port: introduce cxl_disable_port() function
  2023-12-28  6:05 [RFC PATCH 0/4] cxl: introduce CXL Virtualization module Dongsheng Yang
  2023-12-28  6:05 ` [RFC PATCH 1/4] cxl: move some function from acpi module to core module Dongsheng Yang
@ 2023-12-28  6:05 ` Dongsheng Yang
  2023-12-28  6:05 ` [RFC PATCH 4/4] cxl: introduce CXL Virtualization module Dongsheng Yang
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Dongsheng Yang @ 2023-12-28  6:05 UTC (permalink / raw)
  To: dave, jonathan.cameron, ave.jiang, alison.schofield,
	vishal.l.verma, ira.weiny, dan.j.williams
  Cc: linux-cxl, Dongsheng Yang

when we want to delete an port (e.g in cxlv), we want to
make sure there is no region attached to this port or
any child port. And more, we need to prevent region to
attach in port deleting.

cxl_disable_port() will return -EBUSY if there is any
region attached to this port or child port, otherwise it
will set any child endpoint decoder to CXL_DECODER_DEAD,
that means this port are going to be deleted, dont attach region
to it.

Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
---
 drivers/cxl/core/port.c | 80 +++++++++++++++++++++++++++++++++++++++++
 drivers/cxl/cxl.h       |  1 +
 2 files changed, 81 insertions(+)

diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 8d2d54da45e5..59ab8fe2cff2 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1508,6 +1508,86 @@ static void reap_dports(struct cxl_port *port)
 	}
 }
 
+/*
+ * Disable an endpoint decoder to prevent any more region attach.
+ */
+static int disable_decoder(struct device *device, void *data)
+{
+	struct cxl_endpoint_decoder *cxled;
+
+	if (!is_endpoint_decoder(device))
+		return 0;
+
+	cxled = to_cxl_endpoint_decoder(device);
+	cxled->mode = CXL_DECODER_DEAD;
+
+	return 0;
+}
+
+/*
+ * Disable a port, if it is an endpoint port, it will disable
+ * the related endpoint decoder, otherwise, disable all child ports.
+ */
+static int disable_port(struct device *device, void *data)
+{
+	struct cxl_port *port;
+	int ret;
+
+	if (!is_cxl_port(device))
+		return 0;
+
+	port = to_cxl_port(device);
+	if (is_cxl_endpoint(port)) {
+		ret = device_for_each_child(&port->dev, NULL, disable_decoder);
+	} else {
+		ret = device_for_each_child(&port->dev, NULL, disable_port);
+	}
+
+	return ret;
+}
+
+/*
+ * If there is any region attached to this port or child port, return -EBUSY.
+ */
+static int port_busy(struct device *device, void *data)
+{
+	struct cxl_port *port;
+
+	if (!is_cxl_port(device))
+		return 0;
+
+	port = to_cxl_port(device);
+	if (!xa_empty(&port->regions)) {
+		return -EBUSY;
+	}
+
+	return device_for_each_child(&port->dev, NULL, port_busy);
+}
+
+/*
+ * Disable any child endpoint decoder to prevent region attach,
+ * then we can delete this port safely.
+ *
+ * Returns -EBUSY if there is still region attached to this port
+ * or child port.
+ */
+int cxl_disable_port(struct cxl_port *port)
+{
+	int ret;
+
+	down_write(&cxl_region_rwsem);
+	if (port_busy(&port->dev, NULL)) {
+		ret = -EBUSY;
+		goto unlock;
+	}
+
+	ret = disable_port(&port->dev, NULL);
+unlock:
+	up_write(&cxl_region_rwsem);
+	return ret;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_disable_port, CXL);
+
 struct detach_ctx {
 	struct cxl_memdev *cxlmd;
 	int depth;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 1397f66d943b..a1343449f35c 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -716,6 +716,7 @@ struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port,
 struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
 					 struct device *dport_dev, int port_id,
 					 resource_size_t rcrb);
+int cxl_disable_port(struct cxl_port *port);
 
 #ifdef CONFIG_PCIEAER_CXL
 void cxl_setup_parent_dport(struct device *host, struct cxl_dport *dport);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC PATCH 4/4] cxl: introduce CXL Virtualization module
  2023-12-28  6:05 [RFC PATCH 0/4] cxl: introduce CXL Virtualization module Dongsheng Yang
  2023-12-28  6:05 ` [RFC PATCH 1/4] cxl: move some function from acpi module to core module Dongsheng Yang
  2023-12-28  6:05 ` [RFC PATCH 3/4] cxl/port: introduce cxl_disable_port() function Dongsheng Yang
@ 2023-12-28  6:05 ` Dongsheng Yang
  2024-01-03 17:22 ` [RFC PATCH 0/4] " Ira Weiny
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Dongsheng Yang @ 2023-12-28  6:05 UTC (permalink / raw)
  To: dave, jonathan.cameron, ave.jiang, alison.schofield,
	vishal.l.verma, ira.weiny, dan.j.williams
  Cc: linux-cxl, Dongsheng Yang

As the real CXL device is not widely available now, we need
some virtual cxl device to do uplayer software developing or
testing. Qemu is good for functional testing, but not good
for some performance testing.

The new CXLV module allow user to use the reserved RAM[1], to
create virtual cxl device. When the cxlv module load, it will
create a directory named as "cxl_virt" under /sys/devices/virtual:

	"/sys/devices/virtual/cxl_virt/"

that's the top level device for all cxlv devices.
At the same time, cxlv module will create a debugfs directory:

/sys/kernel/debug/cxl/cxlv
├── create
└── remove

the create and remove debugfs file is the cxlv entry to create or remove
a cxlv device.

Each cxlv device have its owned virtual pci related bridge and bus, cxlv
will create a new root_port for the new cxlv device, setup cxl ports for
dport and nvdimm-bridge. After that, we will add the virtual pci device,
that will go into the cxl_pci_probe to setup new memdev.

Then we can see the cxl device with cxl list and use it as a real cxl
device.

[1]: Add argument in kernel command line: "memmap=nn[KMG]$ss[KMG]",
detail in Documentation/driver-api/cxl/memory-devices.rst

Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
---
 MAINTAINERS                         |   6 +
 drivers/cxl/Kconfig                 |  11 +
 drivers/cxl/Makefile                |   1 +
 drivers/cxl/cxl_virt/Makefile       |   5 +
 drivers/cxl/cxl_virt/cxlv.h         |  87 ++++
 drivers/cxl/cxl_virt/cxlv_debugfs.c | 260 ++++++++++
 drivers/cxl/cxl_virt/cxlv_device.c  | 311 ++++++++++++
 drivers/cxl/cxl_virt/cxlv_main.c    |  67 +++
 drivers/cxl/cxl_virt/cxlv_pci.c     | 710 ++++++++++++++++++++++++++++
 drivers/cxl/cxl_virt/cxlv_pci.h     | 549 +++++++++++++++++++++
 drivers/cxl/cxl_virt/cxlv_port.c    | 149 ++++++
 11 files changed, 2156 insertions(+)
 create mode 100644 drivers/cxl/cxl_virt/Makefile
 create mode 100644 drivers/cxl/cxl_virt/cxlv.h
 create mode 100644 drivers/cxl/cxl_virt/cxlv_debugfs.c
 create mode 100644 drivers/cxl/cxl_virt/cxlv_device.c
 create mode 100644 drivers/cxl/cxl_virt/cxlv_main.c
 create mode 100644 drivers/cxl/cxl_virt/cxlv_pci.c
 create mode 100644 drivers/cxl/cxl_virt/cxlv_pci.h
 create mode 100644 drivers/cxl/cxl_virt/cxlv_port.c

diff --git a/MAINTAINERS b/MAINTAINERS
index e2c6187a3ac8..36fa8b6352b1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5255,6 +5255,12 @@ S:	Maintained
 F:	Documentation/admin-guide/perf/cxl.rst
 F:	drivers/perf/cxl_pmu.c
 
+COMPUTE EXPRESS LINK VIRTUALIZATION (CXLV)
+M:	Dongsheng Yang <dongsheng.yang@easystack.cn>
+L:	linux-cxl@vger.kernel.org
+S:	Maintained
+F:	drivers/cxl/cxl_virt/
+
 CONEXANT ACCESSRUNNER USB DRIVER
 L:	accessrunner-general@lists.sourceforge.net
 S:	Orphan
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 8ea1d340e438..065767ba4e47 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -154,4 +154,15 @@ config CXL_PMU
 	  monitoring units and provide standard perf based interfaces.
 
 	  If unsure say 'm'.
+
+config CXL_VIRT
+	tristate "CXL Vritualization"
+	depends on CXL_MEM && CXL_PMEM
+	help
+	  Enable virtualization of cxl device. It can create cxl devices
+	  by reserved memory. That would be helpful to get a fast cxl
+	  devices for performance tests.
+
+	  If unsure, or if this kernel is meant for production environments,
+	  say N.
 endif
diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile
index db321f48ba52..7732eff8241e 100644
--- a/drivers/cxl/Makefile
+++ b/drivers/cxl/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-y += core/
+obj-$(CONFIG_CXL_VIRT) += cxl_virt/
 obj-$(CONFIG_CXL_PCI) += cxl_pci.o
 obj-$(CONFIG_CXL_MEM) += cxl_mem.o
 obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o
diff --git a/drivers/cxl/cxl_virt/Makefile b/drivers/cxl/cxl_virt/Makefile
new file mode 100644
index 000000000000..0585435ce553
--- /dev/null
+++ b/drivers/cxl/cxl_virt/Makefile
@@ -0,0 +1,5 @@
+cxlv-y := cxlv_main.o cxlv_pci.o cxlv_debugfs.o cxlv_port.o cxlv_device.o
+
+ccflags-y += -I$(srctree)/drivers/cxl
+ccflags-y += -I$(srctree)/drivers/cxl/core
+obj-$(CONFIG_CXL_VIRT) += cxlv.o
diff --git a/drivers/cxl/cxl_virt/cxlv.h b/drivers/cxl/cxl_virt/cxlv.h
new file mode 100644
index 000000000000..33ed4ff81713
--- /dev/null
+++ b/drivers/cxl/cxl_virt/cxlv.h
@@ -0,0 +1,87 @@
+#ifndef __CXLV_H__
+#define __CXLV_H__
+#include <linux/pci.h>
+#include "cxlmem.h"
+#include "core.h"
+
+#define CXLV_FW_VERSION	"CXLV VERSION 00"
+
+#ifdef pr_fmt
+#undef pr_fmt
+#endif
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+struct cxlv_dev_options {
+	u8	cxltype;
+	u64	memstart;
+	u64	memsize;
+
+	bool	pmem;
+};
+
+struct cxlv_pci_cfg {
+	struct cxlv_pci_cfg_header	*pcihdr;
+	struct cxlv_pci_pm_cap		*pmcap;
+	struct cxlv_pci_msix_cap	*msixcap;
+	struct cxlv_pcie_cap		*pciecap;
+	struct cxlv_pci_ext_cap		*extcap;
+	u8 cfg_data[PCI_CFG_SPACE_EXP_SIZE];
+};
+
+struct cxlv_device {
+	struct device dev;
+	int cxlv_dev_id;
+
+	struct cxlv_dev_options *opts;
+
+	/* start and end should be CXLV_DEVICE_ALIGN aligned */
+	u64	aligned_start;
+	u64	aligned_end;
+
+	struct cxlv_pci_cfg dev_cfg;
+	struct cxlv_pci_cfg bridge_cfg;
+
+	struct pci_dev *bridge_pdev;
+	struct pci_dev *dev_pdev;
+
+	struct task_struct *cxlv_dev_handler;
+
+	struct cxl_port *root_port;
+	int domain_nr;
+	int host_bridge_busnr;
+	struct pci_host_bridge *host_bridge;
+};
+
+#define CXLV_DRV_NAME "CXLVirt"
+#define CXLV_VERSION 0x0110
+#define CXLV_DEVICE_ID	CXLV_VERSION
+#define CXLV_VENDOR_ID 0x7c73
+#define CXLV_SUBSYSTEM_ID	0x9a6c
+#define CXLV_SUBSYSTEM_VENDOR_ID CXLV_VENDOR_ID
+
+#define CXLV_DEVICE_RES_MIN		(1UL * CXL_CAPACITY_MULTIPLIER)
+#define CXLV_DEVICE_ALIGN		(SZ_256M)
+
+/* cxlv_main */
+extern struct bus_type cxlv_subsys;
+
+/* cxlv_pci */
+int cxlv_pci_init(struct cxlv_device *dev);
+void process_mbox(struct cxlv_device *dev);
+void process_decoder(struct cxlv_device *dev);
+
+/* cxlv_port */
+int cxlv_port_init(struct cxlv_device *cxlv_device);
+
+/* cxlv_device */
+int cxlv_create_dev(struct cxlv_dev_options *opts);
+int cxlv_remove_dev(u32 cxlv_dev_id);
+int cxlv_device_init(void);
+void cxlv_device_exit(void);
+struct cxlv_pci_cfg *find_pci_cfg(struct pci_bus *bus, unsigned int devfn);
+
+/* cxlv_debugfs */
+void cxlv_debugfs_cleanup(void);
+int cxlv_debugfs_init(void);
+#endif /*__CXLV_H__*/
diff --git a/drivers/cxl/cxl_virt/cxlv_debugfs.c b/drivers/cxl/cxl_virt/cxlv_debugfs.c
new file mode 100644
index 000000000000..084c36414900
--- /dev/null
+++ b/drivers/cxl/cxl_virt/cxlv_debugfs.c
@@ -0,0 +1,260 @@
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+#include <linux/parser.h>
+
+#include "cxlv.h"
+
+enum {
+	CXLV_CREATE_OPT_ERR		= 0,
+	CXLV_CREATE_OPT_CXLTYPE,
+	CXLV_CREATE_OPT_PMEM,
+	CXLV_CREATE_OPT_MEMSTART,
+	CXLV_CREATE_OPT_MEMSIZE,
+};
+
+static const match_table_t create_opt_tokens = {
+	{ CXLV_CREATE_OPT_CXLTYPE,	"cxltype=%u"	},
+	{ CXLV_CREATE_OPT_PMEM,		"pmem=%u"	},
+	{ CXLV_CREATE_OPT_MEMSTART,	"memstart=%s"	},
+	{ CXLV_CREATE_OPT_MEMSIZE,	"memsize=%s"	},
+	{ CXLV_CREATE_OPT_ERR,		NULL		}
+};
+
+static int parse_create_options(char *buf,
+		struct cxlv_dev_options *opts)
+{
+	substring_t args[MAX_OPT_ARGS];
+	char *o, *p;
+	int token, ret = 0;
+	u64 token64;
+
+	o = buf;
+
+	while ((p = strsep(&o, ",\n")) != NULL) {
+		if (!*p)
+			continue;
+
+		token = match_token(p, create_opt_tokens, args);
+		switch (token) {
+		case CXLV_CREATE_OPT_PMEM:
+			if (match_uint(args, &token)) {
+				ret = -EINVAL;
+				goto out;
+			}
+			opts->pmem = token;
+			break;
+		case CXLV_CREATE_OPT_CXLTYPE:
+			/* Only support type3 cxl device currently */
+			if (match_uint(args, &token) || token != 3) {
+				ret = -EINVAL;
+				goto out;
+			}
+
+			opts->cxltype = token;
+			break;
+		case CXLV_CREATE_OPT_MEMSTART:
+			if (match_u64(args, &token64)) {
+				ret = -EINVAL;
+				goto out;
+			}
+			opts->memstart = token64;
+			break;
+		case CXLV_CREATE_OPT_MEMSIZE:
+			if (match_u64(args, &token64)) {
+				ret = -EINVAL;
+				goto out;
+			}
+			opts->memsize = token64;;
+			break;
+		default:
+			pr_warn("unknown parameter or missing value '%s'\n", p);
+			ret = -EINVAL;
+			goto out;
+		}
+	}
+
+out:
+	return ret;
+}
+
+
+static struct dentry *cxlv_debugfs_root;
+static struct dentry *create_f;
+static struct dentry *remove_f;
+
+static void cxlv_debugfs_remove(struct dentry **dp)
+{
+	debugfs_remove(*dp);
+	*dp = NULL;
+}
+
+#define CXLV_DEBUGFS_WO_FILE(NAME)					\
+static const struct file_operations cxlv_ ## NAME ## _fops = {		\
+	.owner		= THIS_MODULE,					\
+	.open		= simple_open,					\
+	.write		= cxlv_ ## NAME ## _write,                      \
+	.llseek		= seq_lseek,					\
+};
+
+#define CXLV_DEBUGFS_FILE(NAME)						\
+static const struct file_operations cxlv_ ## NAME ## _fops = {		\
+	.owner		= THIS_MODULE,					\
+	.open		= simple_open,					\
+	.write		= cxlv_ ## NAME ## _write,			\
+	.read		= seq_read,					\
+	.llseek		= seq_lseek,					\
+};
+
+static ssize_t cxlv_debugfs_create_write(struct file *file, const char __user *ubuf,
+				size_t cnt, loff_t *ppos)
+{
+	int ret;
+	char *buf;
+	struct cxlv_dev_options *opts;
+
+	opts = kzalloc(sizeof(struct cxlv_dev_options), GFP_KERNEL);
+	if (!opts) {
+		pr_err("failed to alloc cxlv_dev_options.");
+		return -1;
+	}
+
+	buf = memdup_user(ubuf, cnt);
+	if (IS_ERR(buf)) {
+		pr_err("failed to dup buf: %d", (int)PTR_ERR(buf));
+		return PTR_ERR(buf);
+	}
+
+	ret = parse_create_options(buf, opts);
+	if (ret) {
+		kfree(buf);
+		return ret;
+	}
+	kfree(buf);
+
+	ret = cxlv_create_dev(opts);
+	if (ret) {
+		pr_err("failed to create device: %d", ret);
+		return -EINVAL;
+	}
+
+	return cnt;
+}
+
+CXLV_DEBUGFS_WO_FILE(debugfs_create);
+
+enum {
+	CXLV_REMOVE_OPT_ERR		= 0,
+	CXLV_REMOVE_OPT_CXLV_ID,
+};
+
+static const match_table_t remove_opt_tokens = {
+	{ CXLV_REMOVE_OPT_CXLV_ID,	"cxlv_dev_id=%u"	},
+	{ CXLV_REMOVE_OPT_ERR,		NULL		}
+};
+
+static int parse_remove_options(char *buf, u32 *cxlv_dev_id)
+{
+	substring_t args[MAX_OPT_ARGS];
+	char *o, *p;
+	int token, ret = 0;
+
+	o = buf;
+
+	while ((p = strsep(&o, ",\n")) != NULL) {
+		if (!*p)
+			continue;
+
+		token = match_token(p, remove_opt_tokens, args);
+		switch (token) {
+		case CXLV_REMOVE_OPT_CXLV_ID:
+			if (match_uint(args, &token)) {
+				ret = -EINVAL;
+				goto out;
+			}
+
+			*cxlv_dev_id = token;
+			break;
+		default:
+			pr_warn("unknown parameter or missing value '%s'\n", p);
+			ret = -EINVAL;
+			goto out;
+		}
+	}
+
+out:
+	return ret;
+}
+
+static ssize_t cxlv_debugfs_remove_write(struct file *file, const char __user *ubuf,
+				size_t cnt, loff_t *ppos)
+{
+	char *buf;
+	u32 cxlv_dev_id;
+	int ret;
+
+	buf = memdup_user(ubuf, cnt);
+	if (IS_ERR(buf)) {
+		pr_err("failed to dup buf: %d", (int)PTR_ERR(buf));
+		return PTR_ERR(buf);
+	}
+
+	ret = parse_remove_options(buf, &cxlv_dev_id);
+	if (ret) {
+		kfree(buf);
+		return ret;
+	}
+	kfree(buf);
+
+	ret = cxlv_remove_dev(cxlv_dev_id);
+	if (ret < 0) {
+		return ret;
+	}
+
+	return cnt;
+}
+
+CXLV_DEBUGFS_WO_FILE(debugfs_remove);
+
+void cxlv_debugfs_cleanup(void)
+{
+	cxlv_debugfs_remove(&remove_f);
+	cxlv_debugfs_remove(&create_f);
+	cxlv_debugfs_remove(&cxlv_debugfs_root);
+}
+
+int cxlv_debugfs_init(void)
+{
+	struct dentry *dentry;
+	int ret;
+
+	dentry = cxl_debugfs_create_dir("cxlv");
+	if (IS_ERR(dentry)) {
+		ret = PTR_ERR(dentry);
+		goto out;
+	}
+
+	cxlv_debugfs_root = dentry;
+
+	create_f = debugfs_create_file("create", 0600, dentry, NULL,
+			&cxlv_debugfs_create_fops);
+	if (IS_ERR(create_f)) {
+		ret = PTR_ERR(create_f);
+		goto remove_root;
+	}
+
+	remove_f = debugfs_create_file("remove", 0600, dentry, NULL,
+			&cxlv_debugfs_remove_fops);
+	if (IS_ERR(remove_f)) {
+		ret = PTR_ERR(remove_f);
+		goto remove_create_f;
+	}
+
+	return 0;
+
+remove_create_f:
+	cxlv_debugfs_remove(&create_f);
+remove_root:
+	cxlv_debugfs_remove(&cxlv_debugfs_root);
+out:
+	return ret;
+}
diff --git a/drivers/cxl/cxl_virt/cxlv_device.c b/drivers/cxl/cxl_virt/cxlv_device.c
new file mode 100644
index 000000000000..3a0da247513d
--- /dev/null
+++ b/drivers/cxl/cxl_virt/cxlv_device.c
@@ -0,0 +1,311 @@
+#include <linux/delay.h>
+#include <linux/kthread.h>
+
+#include "cxlpci.h"
+#include "cxlv.h"
+#include "cxlv_pci.h"
+
+/* TODO support more cxlv devices */
+#define CXLV_DEVICE_MAX_NUM	1
+static struct cxlv_device *cxlv_devices[CXLV_DEVICE_MAX_NUM];
+static struct mutex cxlv_devices_lock;
+
+/* TODO faster way to find pci cfg for more devices supporting, e.g: XARRAY */
+struct cxlv_pci_cfg *find_pci_cfg(struct pci_bus *bus, unsigned int devfn)
+{
+	int i;
+	struct cxlv_device *cxlv_device;
+
+	for (i = 0; i < CXLV_DEVICE_MAX_NUM; i++) {
+		cxlv_device = cxlv_devices[i];
+
+		if (!cxlv_device)
+			continue;
+
+		if (pci_find_host_bridge(bus)->bus->number != cxlv_device->host_bridge_busnr ||
+				pci_domain_nr(bus) != cxlv_device->domain_nr)
+			continue;
+
+		if (pci_is_root_bus(bus)) {
+			return &cxlv_device->bridge_cfg;
+		} else {
+			return &cxlv_device->dev_cfg;
+		}
+
+		continue;
+	}
+
+	return NULL;
+}
+
+static int cxlv_device_find_empty(void)
+{
+	int i;
+
+	for (i = 0; i < CXLV_DEVICE_MAX_NUM; i++) {
+		if (!cxlv_devices[i])
+			return i;
+	}
+
+	return -1;
+}
+
+static int cxlv_device_register(struct cxlv_device *cxlv_device)
+{
+	int cxlv_dev_id = cxlv_device->cxlv_dev_id;
+
+	if (cxlv_devices[cxlv_dev_id] != NULL) {
+		return -EEXIST;
+	}
+
+	cxlv_devices[cxlv_dev_id] = cxlv_device;
+
+	return 0;
+}
+
+static void cxlv_device_unregister(struct cxlv_device *cxlv_device)
+{
+	int cxlv_dev_id = cxlv_device->cxlv_dev_id;
+
+	BUG_ON(cxlv_devices[cxlv_dev_id] != cxlv_device);
+
+	cxlv_devices[cxlv_dev_id] = NULL;
+}
+
+int cxlv_device_init(void)
+{
+	int i;
+
+	for (i = 0; i < CXLV_DEVICE_MAX_NUM; i++) {
+		cxlv_devices[i] = NULL;
+	}
+
+	mutex_init(&cxlv_devices_lock);
+
+	return 0;
+}
+
+void cxlv_device_exit(void)
+{
+	return;
+}
+
+static void cxlv_dev_release(struct device *dev)
+{
+}
+
+static struct cxlv_device *cxlv_device_create(struct cxlv_dev_options *opts)
+{
+	struct device *cxlv_dev;
+	struct cxlv_device *cxlv_device = NULL;
+	int cxlv_dev_id;
+	int ret;
+
+	mutex_lock(&cxlv_devices_lock);
+	cxlv_dev_id = cxlv_device_find_empty();
+	if (cxlv_dev_id < 0) {
+		pr_err("There is no more cxlv device can be created.");
+		goto unlock;
+	}
+
+	cxlv_device = kzalloc(sizeof(struct cxlv_device), GFP_KERNEL);
+	if (!cxlv_device) {
+		pr_err("failed to alloc cxlv_device");
+		goto unlock;
+	}
+
+	cxlv_device->opts = opts;
+	cxlv_device->cxlv_dev_id = cxlv_dev_id;
+	cxlv_device->aligned_start = ALIGN(opts->memstart + CXLV_RESOURCE_OFF,
+					   CXLV_DEVICE_ALIGN);
+	cxlv_device->aligned_end = ALIGN_DOWN(opts->memstart + opts->memsize,
+					      CXLV_DEVICE_ALIGN) - 1;
+
+	ret = cxlv_device_register(cxlv_device);
+	if (ret) {
+		pr_err("failed to register cxlv_device");
+		goto release_device;
+	}
+	mutex_unlock(&cxlv_devices_lock);
+
+	cxlv_dev = &cxlv_device->dev;
+	cxlv_dev->release = cxlv_dev_release;
+	cxlv_dev->bus = &cxlv_subsys;
+	dev_set_name(cxlv_dev, "cxlv%d", cxlv_dev_id);
+	device_set_pm_not_required(cxlv_dev);
+
+	ret = device_register(cxlv_dev);
+	if (ret < 0) {
+		goto unregister;
+	}
+
+	return cxlv_device;
+
+unregister:
+        mutex_lock(&cxlv_devices_lock);
+        cxlv_device_unregister(cxlv_device);
+release_device:
+	kfree(cxlv_device);
+unlock:
+	mutex_unlock(&cxlv_devices_lock);
+	return NULL;
+}
+
+void cxlv_device_release(struct cxlv_device *cxlv_device)
+{
+	device_unregister(&cxlv_device->dev);
+
+	mutex_lock(&cxlv_devices_lock);
+	cxlv_device_unregister(cxlv_device);
+	mutex_unlock(&cxlv_devices_lock);
+
+	if (cxlv_device->opts)
+		kfree(cxlv_device->opts);
+
+	if (cxlv_device)
+		kfree(cxlv_device);
+}
+
+#define CXLV_HANDLER_SLEEP_US		1000
+static int cxlv_handle(void *data)
+{
+	while (!kthread_should_stop()) {
+		process_mbox(data);
+		process_decoder(data);
+
+		/* sleep 10us after each loop */
+		fsleep(CXLV_HANDLER_SLEEP_US);
+	}
+
+	return 0;
+}
+
+static void cxlv_dev_handler_init(struct cxlv_device *cxlv_device)
+{
+	cxlv_device->cxlv_dev_handler = kthread_create(cxlv_handle,
+						       cxlv_device,
+						       "cxlv_dev_handler");
+	wake_up_process(cxlv_device->cxlv_dev_handler);
+}
+
+static void cxlv_dev_handler_final(struct cxlv_device *cxlv_device)
+{
+	if (!IS_ERR_OR_NULL(cxlv_device->cxlv_dev_handler)) {
+		kthread_stop(cxlv_device->cxlv_dev_handler);
+		cxlv_device->cxlv_dev_handler = NULL;
+	}
+}
+
+static int not_reserved(struct resource *res, void *arg)
+{
+	pr_err("has System RAM: %pr\n", res);
+
+	return 1;
+}
+
+static int validate_configs(struct cxlv_dev_options *opts)
+{
+	u64 res_start;
+	u64 res_end;
+	int ret;
+
+	if (!IS_ENABLED(CONFIG_CXL_PMEM) && opts->pmem) {
+		pr_err("CONFIG_CXL_PMEM is not enabled");
+		return -EINVAL;
+	}
+
+	if (!opts->memstart || !opts->memsize) {
+		pr_err("[memstart] and [memsize] should be specified");
+		return -EINVAL;
+	}
+
+	/* check for memory reserved */
+	res_start = opts->memstart;
+	res_end = res_start + opts->memsize - 1;
+
+	ret = walk_iomem_res_desc(IORES_DESC_NONE,
+				IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
+				res_start,
+				res_end, NULL,
+				not_reserved);
+
+	if (ret > 0) {
+		pr_err("range [%llu, %llu] is not reserved.", res_start, res_end);
+		return ret;
+	}
+
+	/* check the aligned resource */
+	res_start = ALIGN(res_start + CXLV_RESOURCE_OFF, CXLV_DEVICE_ALIGN);
+	if ((res_end - res_start + 1) < CXLV_DEVICE_RES_MIN) {
+		pr_err("[%llu, %llu]: first %u is for metadata, \
+				the rest is too small as we need %lu aligned resource range.",
+				opts->memstart, res_end, CXLV_RESOURCE_OFF, CXLV_DEVICE_RES_MIN);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+int cxlv_create_dev(struct cxlv_dev_options *opts)
+{
+	int ret;
+	struct cxlv_device *cxlv_device;
+
+	if (validate_configs(opts)) {
+		return -EINVAL;
+	}
+
+	cxlv_device = cxlv_device_create(opts);
+	if (!cxlv_device) {
+		return -ENOMEM;
+	}
+
+	ret = cxlv_pci_init(cxlv_device);
+	if (ret) {
+		goto err;
+	}
+
+	ret = cxlv_port_init(cxlv_device);
+	if (ret)
+		goto err;
+
+	cxlv_dev_handler_init(cxlv_device);
+
+	pci_bus_add_devices(cxlv_device->host_bridge->bus);
+
+	__module_get(THIS_MODULE);
+	return 0;
+
+err:
+	cxlv_device_release(cxlv_device);
+	return -EIO;
+}
+
+int cxlv_remove_dev(u32 cxlv_dev_id)
+{
+	struct cxlv_device *cxlv_device;
+
+	if (cxlv_dev_id >= CXLV_DEVICE_MAX_NUM)
+		return -EINVAL;
+
+	if (cxlv_devices[cxlv_dev_id] == NULL)
+		return -EINVAL;
+
+	cxlv_device = cxlv_devices[cxlv_dev_id];
+	if (cxl_disable_port(cxlv_device->root_port))
+		return -EBUSY;
+
+	if (cxlv_device->host_bridge) {
+		pci_stop_root_bus(cxlv_device->host_bridge->bus);
+		pci_remove_root_bus(cxlv_device->host_bridge->bus);
+		put_device(&cxlv_device->host_bridge->dev);
+	}
+
+	cxlv_dev_handler_final(cxlv_device);
+
+	cxlv_device_release(cxlv_device);
+
+	module_put(THIS_MODULE);
+
+	return 0;
+}
diff --git a/drivers/cxl/cxl_virt/cxlv_main.c b/drivers/cxl/cxl_virt/cxlv_main.c
new file mode 100644
index 000000000000..3ac6f612b7ca
--- /dev/null
+++ b/drivers/cxl/cxl_virt/cxlv_main.c
@@ -0,0 +1,67 @@
+/*
+ * Copyright(C) 2024, Dongsheng Yang <dongsheng.yang@easystack.cn>
+ */
+
+#include "cxlv.h"
+
+struct bus_type cxlv_subsys = {
+	.name                           = "cxl_virt",
+};
+
+static int cxl_virt_dev_init(void)
+{
+	int ret;
+
+	ret = subsys_virtual_register(&cxlv_subsys, NULL);
+	if (ret) {
+		pr_err("failed to register cxlv subsys");
+		return ret;
+	}
+
+	return 0;
+}
+
+static void cxl_virt_dev_exit(void)
+{
+	bus_unregister(&cxlv_subsys);
+}
+
+static int __init cxlv_init(void)
+{
+	int ret;
+
+	ret = cxl_virt_dev_init();
+	if (ret)
+		goto out;
+
+	ret = cxlv_device_init();
+	if (ret)
+		goto cxl_virt_dev_exit;
+
+	ret = cxlv_debugfs_init();
+	if (ret)
+		goto device_exit;
+
+	return 0;
+
+device_exit:
+	cxlv_device_exit();
+cxl_virt_dev_exit:
+	cxl_virt_dev_exit();
+out:
+	return ret;
+}
+
+static void cxlv_exit(void)
+{
+	cxlv_debugfs_cleanup();
+	cxlv_device_exit();
+	cxl_virt_dev_exit();
+}
+
+MODULE_AUTHOR("Dongsheng Yang <dongsheng.yang@easystack.cn>");
+MODULE_DESCRIPTION("CXL(Compute Express Link) Virtualization");
+MODULE_LICENSE("GPL v2");
+MODULE_IMPORT_NS(CXL);
+module_init(cxlv_init);
+module_exit(cxlv_exit);
diff --git a/drivers/cxl/cxl_virt/cxlv_pci.c b/drivers/cxl/cxl_virt/cxlv_pci.c
new file mode 100644
index 000000000000..b3e73d4c5957
--- /dev/null
+++ b/drivers/cxl/cxl_virt/cxlv_pci.c
@@ -0,0 +1,710 @@
+#include "cxlv.h"
+#include "cxlv_pci.h"
+#include "cxlpci.h"
+#include "cxlmem.h"
+
+static struct cxl_cel_entry cel_logs[] = {
+	{ .opcode = CXL_MBOX_OP_GET_SUPPORTED_LOGS, .effect = 0 },
+	{ .opcode = CXL_MBOX_OP_GET_LOG, .effect = 0 },
+	{ .opcode = CXL_MBOX_OP_IDENTIFY, .effect = 0 },
+};
+
+#define CXLV_CEL_SUPPORTED_NUM		3
+
+void process_decoder(struct cxlv_device *dev)
+{
+	struct cxl_component *comp;
+	struct cxl_decoder_cap *decoder;
+
+	/* process device decoder */
+	comp = ioremap(pci_resource_start(dev->dev_pdev, 0) + CXLV_DEV_BAR_COMPONENT_OFF,
+					  CXLV_DEV_BAR_COMPONENT_LEN);
+
+	decoder = (struct cxl_decoder_cap *)((char *)comp + CXLV_COMP_CACHEMEM_OFF + CXLV_COMP_DECODER_OFF);
+	if (decoder->decoder[0].ctrl_regs & CXLV_DECODER_CTRL_COMMIT) {
+		decoder->decoder[0].ctrl_regs |= CXLV_DECODER_CTRL_COMMITTED;
+		decoder->decoder[0].ctrl_regs &= ~CXLV_DECODER_CTRL_COMMIT;
+		decoder->decoder[0].ctrl_regs &= ~CXLV_DECODER_CTRL_COMMIT_ERR;
+	}
+	iounmap(comp);
+
+	/* process bridge decoder */
+	comp = ioremap(pci_resource_start(dev->bridge_pdev, 0) + CXLV_BRIDGE_BAR_COMPONENT_OFF,
+					  CXLV_BRIDGE_BAR_COMPONENT_LEN);
+
+	decoder = (struct cxl_decoder_cap *)((char *)comp + CXLV_COMP_CACHEMEM_OFF + CXLV_COMP_DECODER_OFF);
+	if (decoder->decoder[0].ctrl_regs & CXLV_DECODER_CTRL_COMMIT) {
+		decoder->decoder[0].ctrl_regs |= CXLV_DECODER_CTRL_COMMITTED;
+		decoder->decoder[0].ctrl_regs &= ~CXLV_DECODER_CTRL_COMMIT;
+		decoder->decoder[0].ctrl_regs &= ~CXLV_DECODER_CTRL_COMMIT_ERR;
+	}
+	iounmap(comp);
+
+	return;
+}
+
+void process_mbox(struct cxlv_device *dev)
+{
+	struct pci_dev *pdev = dev->dev_pdev;
+	struct cxl_bar *bar;
+	struct cxlv_mbox *mbox;
+	int ret;
+
+	bar = ioremap(pci_resource_start(pdev, 0) + CXLV_DEV_BAR_DEV_REGS_OFF,
+		      CXLV_DEV_BAR_DEV_REGS_LEN);
+
+	mbox = ((void *)bar) + CXLV_DEV_CAP_MBOX_OFF;
+
+	if (cxlv_mbox_test_doorbell(mbox)) {
+		if (cxlv_mbox_get_cmd(mbox) == CXL_MBOX_OP_GET_SUPPORTED_LOGS) {
+			struct cxl_mbox_get_supported_logs *supported_log;
+			u32 payload_len;
+
+			payload_len = sizeof(*supported_log) + sizeof(supported_log->entry[0]);
+
+			supported_log = kzalloc(payload_len, GFP_KERNEL);
+			if (!supported_log) {
+				ret = CXL_MBOX_CMD_RC_INTERNAL;
+				goto out;
+			}
+
+			supported_log->entries = cpu_to_le16(1);
+			supported_log->entry[0].uuid = DEFINE_CXL_CEL_UUID;
+			supported_log->entry[0].size = cpu_to_le32(sizeof(struct cxl_cel_entry) * CXLV_CEL_SUPPORTED_NUM);
+
+			cxlv_mbox_copy_to_payload(mbox, 0, supported_log, payload_len);
+			cxlv_mbox_set_cmd_payload_len(mbox, payload_len);
+			ret = CXL_MBOX_CMD_RC_SUCCESS;
+			kfree(supported_log);
+		} else if (cxlv_mbox_get_cmd(mbox) == CXL_MBOX_OP_GET_LOG) {
+			struct cxl_mbox_get_log get_log;
+
+			cxlv_mbox_copy_from_payload(mbox, 0, &get_log, sizeof(struct cxl_mbox_get_log));
+
+			if (!uuid_equal(&get_log.uuid, &DEFINE_CXL_CEL_UUID)) {
+				ret = CXL_MBOX_CMD_RC_LOG;
+				goto out;
+			}
+
+			cxlv_mbox_copy_to_payload(mbox, le32_to_cpu(get_log.offset), cel_logs, le32_to_cpu(get_log.length));
+			cxlv_mbox_set_cmd_payload_len(mbox, le32_to_cpu(get_log.length));
+			ret = CXL_MBOX_CMD_RC_SUCCESS;
+		} else if (cxlv_mbox_get_cmd(mbox) == CXL_MBOX_OP_IDENTIFY) {
+			struct cxl_mbox_identify id = { 0 };
+			u64 capacity = (dev->aligned_end - dev->aligned_start + 1) / CXL_CAPACITY_MULTIPLIER;
+
+			strcpy(id.fw_revision, CXLV_FW_VERSION);
+
+			if (dev->opts->pmem) {
+				id.total_capacity = cpu_to_le64(capacity);
+				id.volatile_capacity = 0;
+				id.persistent_capacity = cpu_to_le64(capacity);
+				id.lsa_size = cpu_to_le64(CXLV_DEV_BAR_LSA_LEN);
+			} else {
+				id.total_capacity = cpu_to_le64(capacity);
+				id.volatile_capacity = cpu_to_le64(capacity);
+				id.persistent_capacity = 0;
+			}
+
+			cxlv_mbox_copy_to_payload(mbox, 0, &id, sizeof(id));
+			cxlv_mbox_set_cmd_payload_len(mbox, sizeof(id));
+			ret = CXL_MBOX_CMD_RC_SUCCESS;
+		} else if (cxlv_mbox_get_cmd(mbox) == CXL_MBOX_OP_GET_LSA) {
+			void *lsa;
+			struct cxl_mbox_get_lsa get_lsa = { 0 };
+
+			cxlv_mbox_copy_from_payload(mbox, 0, &get_lsa, sizeof(struct cxl_mbox_get_lsa));
+
+			u32 offset = le32_to_cpu(get_lsa.offset);
+			u32 len = le32_to_cpu(get_lsa.length);
+
+			if (len > CXLV_DEV_CAP_MBOX_PAYLOAD) {
+				ret = CXL_MBOX_CMD_RC_INPUT;
+				goto out;
+			}
+
+			/* read lsa from bar */
+			lsa = memremap(pci_resource_start(pdev, 0) + CXLV_DEV_BAR_LSA_OFF,
+					CXLV_DEV_BAR_LSA_LEN, MEMREMAP_WB);
+			cxlv_mbox_copy_to_payload(mbox, 0, lsa + offset, len);
+			memunmap(lsa);
+
+			cxlv_mbox_set_cmd_payload_len(mbox, len);
+			ret = CXL_MBOX_CMD_RC_SUCCESS;
+		} else if (cxlv_mbox_get_cmd(mbox) == CXL_MBOX_OP_SET_LSA) {
+			void *lsa;
+			struct cxl_mbox_set_lsa *set_lsa = (struct cxl_mbox_set_lsa *)mbox->payload;
+			u32 offset = le32_to_cpu(set_lsa->offset);
+			u32 len = FIELD_GET(CXLDEV_MBOX_CMD_PAYLOAD_LENGTH_MASK, mbox->cmd);
+
+			/* write lsa to bar */
+			lsa = memremap(pci_resource_start(pdev, 0) + CXLV_DEV_BAR_LSA_OFF,
+					CXLV_DEV_BAR_LSA_LEN, MEMREMAP_WB);
+			memcpy(lsa + offset, set_lsa->data, len);
+			memunmap(lsa);
+
+			ret = CXL_MBOX_CMD_RC_SUCCESS;
+		} else {
+			dev_err(&dev->dev, "unsupported cmd: 0x%x", cxlv_mbox_get_cmd(mbox));
+			ret = CXL_MBOX_CMD_RC_UNSUPPORTED;
+		}
+out:
+		cxlv_mbox_set_retcode(mbox, ret);
+		smp_mb();
+		cxlv_mbox_clear_doorbell(mbox);
+		iounmap(bar);
+	}
+
+	return;
+}
+
+static int cxlv_pci_read(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *val)
+{
+	struct cxlv_pci_cfg *pci_cfg;
+
+	if (devfn != 0)
+		return 1;
+
+	pci_cfg = find_pci_cfg(bus, devfn);
+	if (!pci_cfg)
+		return -ENXIO;
+
+	memcpy(val, pci_cfg->cfg_data + where, size);
+
+	pr_debug("[R] bus: %p, devfn: %u, 0x%x, size: %d, val: 0x%x\n", bus, devfn, where, size, *val);
+
+	return 0;
+};
+
+static int cxlv_pci_write(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 _val)
+{
+	struct cxlv_pci_cfg *pci_cfg;
+	u32 mask = ~(0U);
+	u32 val = 0x00;
+	int target = where;
+
+	WARN_ON(size > sizeof(_val));
+
+	pci_cfg = find_pci_cfg(bus, devfn);
+	if (!pci_cfg)
+		return -ENXIO;
+
+	memcpy(&val, pci_cfg->cfg_data + where, size);
+
+	if (where < CXLV_PCI_PM_CAP_OFFS) {
+		if (target == PCI_STATUS) {
+			mask = 0xF200;
+		} else if (target == PCI_BIST) {
+			mask = PCI_BIST_START;
+		} else if (target == PCI_BASE_ADDRESS_0) {
+			/* bar size is 1M */
+			mask = 0xFFE00000;
+		} else if (target == PCI_INTERRUPT_LINE) {
+			mask = 0xFF;
+		} else {
+			mask = 0x0;
+		}
+	}
+
+	val = (val & (~mask)) | (_val & mask);
+	memcpy(pci_cfg->cfg_data + where, &val, size);
+
+	pr_debug("[W] bridge 0x%x, mask: 0x%x, val: 0x%x -> 0x%x, size: %d, new: 0x%x\n", where, mask,
+	    val, _val, size, (val & (~mask)) | (_val & mask));
+
+	return 0;
+};
+
+static struct pci_ops cxlv_pci_ops = {
+	.read = cxlv_pci_read,
+	.write = cxlv_pci_write,
+};
+
+static struct pci_sysdata cxlv_pci_sysdata = {
+	.domain = CXLV_PCI_DOMAIN_NUM,
+	.node = 0,
+};
+
+static void cxlv_dev_reg_init(struct pci_dev *dev)
+{
+	struct cxl_bar *bar;
+	struct cap_array_header *array_header;
+	struct cap_header *cap_header;
+	struct cxlv_mbox *mbox;
+	struct cxl_dev_status *dev_status;
+	struct cxl_memdev_cap *memdev;
+	u16 val;
+	u64 status_val;
+
+	bar = ioremap(pci_resource_start(dev, 0) + CXLV_DEV_BAR_DEV_REGS_OFF, CXLV_DEV_BAR_DEV_REGS_LEN);
+
+	BUG_ON(!bar);
+
+	memset(bar, 0x0, CXLV_DEV_BAR_DEV_REGS_LEN);
+
+	/* Initialize device cap array header */
+	array_header = &bar->cap_array_header;
+	array_header->cap_id = cpu_to_le16(CXLDEV_CAP_ARRAY_CAP_ID);
+
+	val = CXLV_DEV_CAP_ARRAY_HEADER_VERS_DEFAULT;
+	val |= FIELD_PREP(CXLV_DEV_CAP_ARRAY_HEADER_TYPE_MASK, CXLV_DEV_CAP_ARRAY_HEADER_TYPE_MEMDEV);
+	array_header->vers_type = cpu_to_le16(val);
+
+	array_header->cap_count = cpu_to_le16(CXLV_DEV_CAP_ARRAY_SIZE);
+
+	/* Initialize device status cap */
+	cap_header = &bar->cap_headers[0];
+	cap_header->cap_id = cpu_to_le16(CXLDEV_CAP_CAP_ID_DEVICE_STATUS);
+	cap_header->version = 0;
+	cap_header->offset = cpu_to_le32(CXLV_DEV_CAP_STATUS_OFF);
+	cap_header->len = cpu_to_le32(CXLV_DEV_CAP_STATUS_LEN);
+
+	cap_header = &bar->cap_headers[1];
+	cap_header->cap_id = cpu_to_le16(CXLDEV_CAP_CAP_ID_PRIMARY_MAILBOX);
+	cap_header->version = 0;
+	cap_header->offset = cpu_to_le32(CXLV_DEV_CAP_MBOX_OFF);
+	cap_header->len = cpu_to_le32(CXLV_DEV_CAP_MBOX_LEN);
+
+	cap_header = &bar->cap_headers[2];
+	cap_header->cap_id = cpu_to_le16(CXLDEV_CAP_CAP_ID_MEMDEV);
+	cap_header->version = 0;
+	cap_header->offset = cpu_to_le32(CXLV_DEV_CAP_MEMDEV_OFF);
+	cap_header->len = cpu_to_le32(CXLV_DEV_CAP_MEMDEV_LEN);
+
+	dev_status = ((void *)bar) + CXLV_DEV_CAP_STATUS_OFF;
+	dev_status->status = 0;
+
+	mbox = ((void *)bar) + CXLV_DEV_CAP_MBOX_OFF;
+	mbox->cap = cpu_to_le32(CXLV_MBOX_CAP_PAYLOAD_SIZE_DEFAULT & CXLV_MBOX_CAP_PAYLOAD_SIZE_MASK);
+	mbox->control = 0;
+
+	memdev = ((void *)bar) + CXLV_DEV_CAP_MEMDEV_OFF;
+	status_val = CXLV_MEMDEV_CAP_MBXO_INTERFACE_READY;
+	status_val |= FIELD_PREP(CXLV_MEMDEV_CAP_MEDIA_STATUS_MASK, CXLV_MEMDEV_CAP_MEDIA_STATUS_DEFAULT);
+	status_val |= FIELD_PREP(CXLV_MEMDEV_CAP_MBOX_RESET_NEEDED_MASK, CXLV_MEMDEV_CAP_MBOX_RESET_NEEDED_DEFAULT);
+	memdev->status = cpu_to_le64(status_val);
+
+	iounmap(bar);
+}
+
+static int cxlv_component_reg_init(struct pci_dev *pdev, u32 off, u32 len)
+{
+	struct cxl_component *comp;
+	struct cxl_decoder_cap *decoder;
+	u32 val;
+
+	comp = ioremap(pci_resource_start(pdev, 0) + off, len);
+
+	val = CM_CAP_HDR_CAP_ID;
+	val |= FIELD_PREP(CXLV_COMP_CACHEMEM_HDR_CAP_VER_MASK, 1);
+	val |= FIELD_PREP(CXLV_COMP_CACHEMEM_HDR_CACHEMEM_VER_MASK, 1);
+	val |= FIELD_PREP(CXLV_COMP_CACHEMEM_HDR_ARRAY_SIZE_MASK, 1);
+	writel(val, &comp->cachemem_comp.header);
+
+	val = CXL_CM_CAP_CAP_ID_HDM;
+	val |= FIELD_PREP(CXLV_COMP_CACHEMEM_HDM_CAP_VER_MASK, 3);
+	val |= FIELD_PREP(CXLV_COMP_CACHEMEM_HDM_DECODER_POINTER_MASK, CXLV_COMP_DECODER_OFF);
+	writel(val, &comp->cachemem_comp.hdm_cap);
+
+	decoder = (struct cxl_decoder_cap *)((char *)comp + CXLV_COMP_CACHEMEM_OFF + CXLV_COMP_DECODER_OFF);
+	val = FIELD_PREP(CXLV_DECODER_CAP_DCOUNT_MASK, 0);
+	val |= FIELD_PREP(CXLV_DECODER_CAP_TCOUNT_MASK, 1);
+	writel(val, &decoder->cap_reg);
+
+	decoder->decoder[0].ctrl_regs &= ~CXLV_DECODER_CTRL_COMMITTED;
+
+	iounmap(comp);
+
+	return 0;
+}
+
+static void cxlv_msix_table_init(struct pci_dev *dev)
+{
+	void *msix_table;
+
+	msix_table = ioremap(pci_resource_start(dev, 0) + CXLV_BAR_PCI_MSIX_OFF,
+			CXLV_BAR_PCI_MSIX_LEN);
+	memset(msix_table, 0x00, CXLV_BAR_PCI_MSIX_LEN);
+	iounmap(msix_table);
+}
+
+static struct pci_bus *cxlv_pci_bus_init(struct cxlv_device *cxlv_device)
+{
+	struct pci_bus *bus = cxlv_device->host_bridge->bus;
+	struct pci_dev *dev, *t_dev;
+
+	pci_scan_child_bus(bus);
+
+	list_for_each_entry(t_dev, &bus->devices, bus_list) {
+		if (!t_dev->subordinate)
+			continue;
+
+		struct pci_bus *b_bus = t_dev->subordinate;
+		struct resource *res = &t_dev->resource[0];
+		int i;
+
+		cxlv_device->bridge_pdev = t_dev;
+
+		res->parent = &iomem_resource;
+
+		for (i = PCI_BRIDGE_RESOURCES; i <= PCI_BRIDGE_RESOURCE_END; i++) {
+			res = &t_dev->resource[i];
+			res->parent = &iomem_resource;
+		}
+
+		cxlv_component_reg_init(t_dev, CXLV_BRIDGE_BAR_COMPONENT_OFF, CXLV_BRIDGE_BAR_COMPONENT_LEN);
+		cxlv_msix_table_init(t_dev);
+
+		list_for_each_entry(dev, &b_bus->devices, bus_list) {
+			res = &dev->resource[0];
+			res->parent = &iomem_resource;
+
+			cxlv_device->dev_pdev = dev;
+			cxlv_dev_reg_init(dev);
+			cxlv_component_reg_init(dev, CXLV_DEV_BAR_COMPONENT_OFF, CXLV_DEV_BAR_COMPONENT_LEN);
+			cxlv_msix_table_init(dev);
+		}
+	}
+
+	return bus;
+};
+
+static void pci_dev_header_init(struct cxlv_pci_cfg_header *pcihdr, unsigned long base_pa)
+{
+	pcihdr->vid = CXLV_VENDOR_ID;
+	pcihdr->did = CXLV_DEVICE_ID;
+	u32 bar = 0;
+
+	pcihdr->status = cpu_to_le16(PCI_STATUS_CAP_LIST);
+
+	pcihdr->rid = 0x01;
+
+	pcihdr->class_code.bcc = PCI_BASE_CLASS_MEMORY;
+	pcihdr->class_code.scc = 0x02;
+	pcihdr->class_code.pi = 0x10;
+
+	pcihdr->header_type = PCI_HEADER_TYPE_NORMAL;
+
+	bar |= PCI_BASE_ADDRESS_MEM_TYPE_64;
+	bar |= PCI_BASE_ADDRESS_MEM_PREFETCH;
+	bar |= PCI_BASE_ADDRESS_SPACE_MEMORY;
+	bar |= base_pa & CXLV_PCI_BASE_ADDRESS_PA_MASK;
+	pcihdr->bar0 = cpu_to_le32(bar);
+
+	pcihdr->bar1 = cpu_to_le32(base_pa >> 32);
+
+	pcihdr->type0.subsystem_id = cpu_to_le16(CXLV_SUBSYSTEM_ID);
+	pcihdr->type0.subsystem_vendor_id = cpu_to_le16(CXLV_SUBSYSTEM_VENDOR_ID);
+
+	pcihdr->type0.expand_rom = cpu_to_le32(0);
+
+	pcihdr->type0.cap_pointer = CXLV_PCI_PM_CAP_OFFS;
+}
+
+static void pci_pmcap_init(struct cxlv_pci_pm_cap *pmcap)
+{
+	pmcap->cid = PCI_CAP_ID_PM;
+	pmcap->next = CXLV_PCI_MSIX_CAP_OFFS;
+
+	/* set version of power management cap to 0x11 */
+	pmcap->pm_cap = cpu_to_le16(PCI_PM_CAP_VER_MASK & 0x11);
+
+	pmcap->pm_ctrl_status = cpu_to_le16(PCI_D0 | PCI_PM_CTRL_NO_SOFT_RESET);
+}
+
+static void pci_msixcap_init(struct cxlv_pci_msix_cap *msixcap)
+{
+	u16 val;
+	u32 tab_val;
+
+	msixcap->cid = PCI_CAP_ID_MSIX;
+	msixcap->next = CXLV_PCIE_CAP_OFFS;
+
+	val = PCI_MSIX_FLAGS_ENABLE;
+	/* set msix table size decoded by (n + 1) */
+	val |= ((CXLV_BAR_PCI_MSIX_OFF - 1) & PCI_MSIX_FLAGS_QSIZE);
+	msixcap->msix_ctrl = cpu_to_le16(val);
+
+	/* msix table at the beginning of bar0 */
+	tab_val = (PCI_MSIX_TABLE_BIR & 0x0);
+	tab_val |= (PCI_MSIX_TABLE_OFFSET & CXLV_BAR_PCI_MSIX_OFF);
+	msixcap->msix_tab = cpu_to_le32(tab_val);
+}
+
+static void pci_pciecap_init(struct cxlv_pcie_cap *pciecap, u8 type)
+{
+	u32 val;
+	u16 cap_val;
+
+	pciecap->cid = PCI_CAP_ID_EXP;
+	pciecap->next = 0x0;
+
+	cap_val = CXLV_PCI_EXP_VERS_DEFAULT;
+	cap_val |= FIELD_PREP(CXLV_PCI_EXP_TYPE_MASK, type);
+	pciecap->pcie_cap = cpu_to_le16(cap_val);
+
+	val = CXLV_PCI_EXP_PAYLOAD_DEFAULT;
+	val |= FIELD_PREP(CXLV_PCI_EXP_DEVCAP_L0S_MASK, CXLV_PCI_EXP_DEVCAP_L0S_DEFAULT);
+	val |= FIELD_PREP(CXLV_PCI_EXP_DEVCAP_L1_MASK, CXLV_PCI_EXP_DEVCAP_L1_DEFAULT);
+	pciecap->pcie_dev_cap = cpu_to_le32(val);
+}
+
+static void init_pci_ext_cap(struct cxlv_pci_ext_cap *ext_cap, u16 next)
+{
+	u16 next_val;
+
+	ext_cap->cid = cpu_to_le16(PCI_EXT_CAP_ID_DVSEC);
+	next_val = CXLV_PCI_EXT_CAP_VERS_DEFAULT;
+	next_val |= FIELD_PREP(CXLV_PCI_EXT_CAP_NEXT_MASK, next);
+	ext_cap->next = cpu_to_le16(next_val);
+}
+
+static void init_cxl_dvsec_header1(__le32 *header1, u16 len)
+{
+	u32 header1_val;
+
+	header1_val = PCI_DVSEC_VENDOR_ID_CXL;
+	header1_val |= FIELD_PREP(CXLV_DVSEC_REVISION_MASK, CXLV_DVSEC_REVISION_DEFAULT);
+	header1_val |= FIELD_PREP(CXLV_DVSEC_LEN_MASK, len);
+
+	*header1 = cpu_to_le32(header1_val);
+}
+
+static void init_cxl_loc_low(__le32 *low, u8 bar, u8 type, u64 off)
+{
+	u32 val;
+	u32 off_val;
+
+	off_val = FIELD_GET(CXLV_DVSEC_LOC_LO_OFF_MASK, off);
+
+	val = bar;
+	val |= FIELD_PREP(CXLV_DVSEC_LOC_LO_TYPE_MASK, type);
+	val |= FIELD_PREP(CXLV_DVSEC_LOC_LO_OFF_MASK, off_val);
+
+	*low = cpu_to_le32(val);
+}
+
+static void init_cxl_loc_hi(__le32 *hi, u64 off)
+{
+	u32 off_val;
+
+	if (!FIELD_FIT(CXLV_DVSEC_LOC_HI_OFF_MASK, off)) {
+		*hi = cpu_to_le32(0);
+		return;
+	}
+
+	off_val = FIELD_GET(CXLV_DVSEC_LOC_HI_OFF_MASK, off);
+	*hi = cpu_to_le32(FIELD_PREP(CXLV_DVSEC_LOC_HI_OFF_MASK, off_val));
+}
+
+static void pci_dev_excap_init(struct cxlv_pci_ext_cap *ext_cap)
+{
+	void *ext_cap_base = ext_cap;
+	struct cxlv_pci_ext_cap_id_dvsec *cap_id;
+	struct cxlv_pci_ext_cap_locator *cap_loc;
+	struct reg_block_loc *loc;
+	u16 cap_val;
+
+	/* Initialize the CXL_DVSEC_PCIE_DEVICE */
+	cap_id = ext_cap_base;
+	init_pci_ext_cap(&cap_id->header.cap_header, PCI_CFG_SPACE_SIZE + 0x3c);
+
+	init_cxl_dvsec_header1(&cap_id->header.cxl_header1, 0x3c);
+
+	cap_id->header.cxl_header2 = cpu_to_le16(CXL_DVSEC_PCIE_DEVICE);
+
+	cap_val = CXLV_DVSEC_CAP_MEM;
+	cap_val |= FIELD_PREP(CXLV_DVSEC_CAP_HDM_COUNT_MASK, 1);
+	cap_id->cap = cpu_to_le16(cap_val);
+
+	cap_id->size_low_1 = cpu_to_le32(CXLV_DVSEC_CAP_VALID | CXLV_DVSEC_CAP_ACTIVE);
+
+	/* Initialize locator dvsec for memdev */
+	cap_loc = ext_cap_base + 0x3c;
+	init_pci_ext_cap(&cap_loc->header.cap_header, 0);
+
+	init_cxl_dvsec_header1(&cap_loc->header.cxl_header1, 0xC + sizeof(struct reg_block_loc) * 2);
+
+	cap_loc->header.cxl_header2 = cpu_to_le16(CXL_DVSEC_REG_LOCATOR);
+
+	loc = &cap_loc->loc1;
+	init_cxl_loc_low(&loc->reg_block_lo_off, 0, CXL_REGLOC_RBI_MEMDEV, CXLV_DEV_BAR_DEV_REGS_OFF);
+	init_cxl_loc_hi(&loc->reg_block_hi_off, CXLV_DEV_BAR_DEV_REGS_OFF);
+
+	loc = &cap_loc->loc2;
+	init_cxl_loc_low(&loc->reg_block_lo_off, 0, CXL_REGLOC_RBI_COMPONENT, CXLV_DEV_BAR_COMPONENT_OFF);
+	init_cxl_loc_hi(&loc->reg_block_hi_off, CXLV_DEV_BAR_COMPONENT_OFF);
+}
+
+static void pci_bridge_extcap_init(struct cxlv_pci_ext_cap *ext_cap)
+{
+	void *ext_cap_base = ext_cap;
+	struct cxlv_pci_ext_cap_id_dvsec *cap_id;
+	struct cxlv_pci_ext_cap_locator *cap_loc;
+	struct reg_block_loc *loc;
+	u16 cap_val;
+
+	/* Initialize the CXL_DVSEC_PCIE_DEVICE */
+	cap_id = ext_cap_base;
+	init_pci_ext_cap(&cap_id->header.cap_header, PCI_CFG_SPACE_SIZE + 0x3c);
+
+	init_cxl_dvsec_header1(&cap_id->header.cxl_header1, 0x3c);
+	cap_id->header.cxl_header2 = cpu_to_le16(CXL_DVSEC_PCIE_DEVICE);
+
+	cap_val = CXLV_DVSEC_CAP_MEM;
+	cap_val |= FIELD_PREP(CXLV_DVSEC_CAP_HDM_COUNT_MASK, 1);
+	cap_id->cap = cpu_to_le16(cap_val);
+
+	cap_id->size_low_1 = cpu_to_le32(CXLV_DVSEC_CAP_VALID | CXLV_DVSEC_CAP_ACTIVE);
+
+	/* Initialize locator dvsec for memdev */
+	cap_loc = ext_cap_base + 0x3c;
+	init_pci_ext_cap(&cap_loc->header.cap_header, 0);
+
+	init_cxl_dvsec_header1(&cap_loc->header.cxl_header1, 0xC + sizeof(struct reg_block_loc) * 3);
+	cap_loc->header.cxl_header2 = cpu_to_le16(CXL_DVSEC_REG_LOCATOR);
+
+	loc = &cap_loc->loc1;
+	init_cxl_loc_low(&loc->reg_block_lo_off, 0, CXL_REGLOC_RBI_COMPONENT, CXLV_BRIDGE_BAR_COMPONENT_OFF);
+	init_cxl_loc_hi(&loc->reg_block_hi_off, CXLV_BRIDGE_BAR_COMPONENT_OFF);
+}
+
+
+static void pci_bridge_header_init(struct cxlv_pci_cfg_header *pcihdr, unsigned long base_pa)
+{
+	u32 bar;
+
+	pcihdr->did = CXLV_DEVICE_ID;
+	pcihdr->vid = CXLV_VENDOR_ID;
+	pcihdr->status = cpu_to_le16(PCI_STATUS_CAP_LIST);
+
+	pcihdr->header_type = PCI_HEADER_TYPE_BRIDGE;
+
+	pcihdr->rid = 0x01;
+
+	pcihdr->class_code.bcc = PCI_BASE_CLASS_BRIDGE;
+	pcihdr->class_code.scc = 0x04;
+	pcihdr->class_code.pi = 0x00;
+
+	bar = PCI_BASE_ADDRESS_MEM_TYPE_64;
+	bar |= PCI_BASE_ADDRESS_MEM_PREFETCH;
+	bar |= PCI_BASE_ADDRESS_SPACE_MEMORY;
+	bar |= base_pa & CXLV_PCI_BASE_ADDRESS_PA_MASK;
+	pcihdr->bar0 = cpu_to_le32(bar);
+
+	pcihdr->bar1 = cpu_to_le32(base_pa >> 32);
+
+	pcihdr->type1.capabilities_pointer = CXLV_PCI_PM_CAP_OFFS;
+}
+
+static void pci_pointer_assign(struct cxlv_pci_cfg *cfg)
+{
+	cfg->pcihdr = (void *)cfg->cfg_data + CXLV_PCI_HDR_OFFS;
+	cfg->pmcap = (void *)cfg->cfg_data + CXLV_PCI_PM_CAP_OFFS;
+	cfg->msixcap = (void *)cfg->cfg_data + CXLV_PCI_MSIX_CAP_OFFS;
+	cfg->pciecap = (void *)cfg->cfg_data + CXLV_PCIE_CAP_OFFS;
+	cfg->extcap = (void *)cfg->cfg_data + CXLV_PCI_EXT_CAP_OFFS;
+}
+
+static int pci_bridge_init(struct cxlv_pci_cfg *bridge, u64 off)
+{
+	pci_pointer_assign(bridge);
+
+	pci_bridge_header_init(bridge->pcihdr, off);
+	pci_pmcap_init(bridge->pmcap);
+	pci_msixcap_init(bridge->msixcap);
+	pci_pciecap_init(bridge->pciecap, PCI_EXP_TYPE_ROOT_PORT);
+	pci_bridge_extcap_init(bridge->extcap);
+
+	return 0;
+}
+
+static void pci_dev_init(struct cxlv_pci_cfg *dev_cfg, u64 off)
+{
+	pci_pointer_assign(dev_cfg);
+
+	pci_dev_header_init((struct cxlv_pci_cfg_header *)dev_cfg->pcihdr, off);
+	pci_pmcap_init(dev_cfg->pmcap);
+	pci_msixcap_init(dev_cfg->msixcap);
+	pci_pciecap_init(dev_cfg->pciecap, PCI_EXP_TYPE_ENDPOINT);
+	pci_dev_excap_init(dev_cfg->extcap);
+}
+
+static int cxlv_pci_find_busnr(int domain_start, int *domain_ret, int *bus_ret)
+{
+	int domain = domain_start;
+	int busnr = 0;
+	struct pci_bus *bus;
+
+	for (; domain < 255; domain++) {
+		for (busnr = 0; busnr < 255; busnr++) {
+			bus = pci_find_bus(domain, busnr);
+			if (!bus) {
+				goto found;
+			}
+		}
+	}
+
+	pr_err("There is no available bus number found.");
+
+	return -1;
+found:
+	*domain_ret = domain;
+	*bus_ret = busnr;
+
+	return 0;
+}
+
+static int cxlv_pci_create_host_bridge(struct cxlv_device *cxlv_device)
+{
+	LIST_HEAD(resources);
+	struct pci_bus *bus;
+	int domain, busnr;
+	int ret;
+	static struct resource busn_res = {
+	        .start = 0,
+	        .end = 255,
+	        .flags = IORESOURCE_BUS,
+	};
+
+	ret = cxlv_pci_find_busnr(CXLV_PCI_DOMAIN_NUM, &domain, &busnr);
+	if (ret) {
+		return ret;
+	}
+
+	cxlv_device->domain_nr = domain;
+	cxlv_device->host_bridge_busnr = busnr;
+
+	cxlv_pci_sysdata.domain = domain;
+
+	pci_add_resource(&resources, &ioport_resource);
+	pci_add_resource(&resources, &iomem_resource);
+	pci_add_resource(&resources, &busn_res);
+
+	bus = pci_create_root_bus(NULL, busnr, &cxlv_pci_ops, &cxlv_pci_sysdata, &resources);
+	if (!bus) {
+		pci_free_resource_list(&resources);
+		pr_err("Unable to create PCI bus\n");
+		return -1;
+	}
+
+	cxlv_device->host_bridge = to_pci_host_bridge(bus->bridge);
+
+	/* TODO to support native cxl error */
+	cxlv_device->host_bridge->native_cxl_error = 0;
+
+	return 0;
+}
+
+int cxlv_pci_init(struct cxlv_device *cxlv_device)
+{
+	cxlv_pci_create_host_bridge(cxlv_device);
+
+	pci_bridge_init(&cxlv_device->bridge_cfg, cxlv_device->opts->memstart + CXLV_BRIDGE_REG_OFF);
+
+	pci_dev_init(&cxlv_device->dev_cfg, cxlv_device->opts->memstart + CXLV_DEV_REG_OFF);
+
+	cxlv_pci_bus_init(cxlv_device);
+
+	return 0;
+}
diff --git a/drivers/cxl/cxl_virt/cxlv_pci.h b/drivers/cxl/cxl_virt/cxlv_pci.h
new file mode 100644
index 000000000000..b39c27760859
--- /dev/null
+++ b/drivers/cxl/cxl_virt/cxlv_pci.h
@@ -0,0 +1,549 @@
+#ifndef __CXLV_PCI_H__
+#define __CXLV_PCI_H__
+#include <linux/pci.h>
+
+/* [PCIE 6.0] 7.5.1 PCI-Compatible Configuration Registers */
+#define CXLV_PCI_BASE_ADDRESS_PA_MASK	0xFFFF8000
+
+struct cxlv_pci_cfg_header {
+	__le16 vid; /* vendor ID */
+	__le16 did; /* device ID */
+
+	__le16 command;
+	__le16 status;
+
+	u8 rid;	/* revision ID */
+
+	struct {
+		u8 pi;
+		u8 scc;
+		u8 bcc;
+	} class_code;
+
+	u8 cache_line_size;
+	u8 latency_timer_reg;
+
+	u8 header_type;
+	u8 bist;
+
+	__le32 bar0;
+	__le32 bar1;
+
+	union {
+		struct {
+			__le32 bar[4];
+
+			__le32 cardbus_cis_pointer;
+
+			__le16 subsystem_vendor_id;
+			__le16 subsystem_id;
+
+			__le32 expand_rom;
+			u8 cap_pointer;
+
+			u8 rsvd[7];
+
+			u8 intr_line;
+			u8 intr_pin;
+
+			u8 min_gnt;
+			u8 max_lat;
+		} type0;
+		struct {
+			u8 primary_bus;
+			u8 secondary_bus;
+			u8 subordinate_bus;
+			u8 secondary_latency_timer;
+			u8 iobase;
+			u8 iolimit;
+			__le16 secondary_status;
+			__le16 membase;
+			__le16 memlimit;
+			__le16 pref_mem_base;
+			__le16 pref_mem_limit;
+			__le32 prefbaseupper;
+			__le32 preflimitupper;
+			__le16 iobaseupper;
+			__le16 iolimitupper;
+			u8 capabilities_pointer;
+			u8 reserve[3];
+			__le32 romaddr;
+			u8 intline;
+			u8 intpin;
+			__le16 bridgectrl;
+		} type1;
+	};
+};
+
+struct cxlv_pci_pm_cap {
+	u8 cid;
+	u8 next;
+
+	__le16 pm_cap; /* power management capability */
+	__le16 pm_ctrl_status; /* power management control status */
+
+	u8 resv;
+	u8 data;
+};
+
+struct cxlv_pci_msix_cap {
+	u8 cid;
+	u8 next;
+
+	__le16 msix_ctrl;
+	__le32 msix_tab;
+	__le32 msix_pba; /* pending bit array */
+};
+
+/*
+ * [PCIE 6.0] 7.5.3.2 PCI Express Capabilities Register
+ */
+
+/* version must be hardwired to 2h for Functions compliant */
+#define CXLV_PCI_EXP_VERS_DEFAULT	2
+
+#define CXLV_PCI_EXP_TYPE_MASK		GENMASK(7, 4)
+
+/*
+ *
+ * Max payload size defined encodings are:
+ *
+ * 000b		128 bytes max payload size
+ * 001b		256 bytes max payload size
+ * 010b		512 bytes max payload size
+ * 011b		1024 bytes max payload size
+ * 100b		2048 bytes max payload size
+ * 101b		4096 bytes max payload size
+ */
+
+/* set default max payload to 256 bytes */
+#define CXLV_PCI_EXP_PAYLOAD_DEFAULT	0b001
+
+/*
+ * Endpoint L0s Acceptable Latency
+ *
+ * 000b		Maximum of 64 ns
+ * 001b		Maximum of 128 ns
+ * 010b		Maximum of 256 ns
+ * 011b		Maximum of 512 ns
+ * 100b		Maximum of 1 μs
+ * 101b		Maximum of 2 μs
+ * 110b		Maximum of 4 μs
+ * 111b		No limit
+ */
+#define CXLV_PCI_EXP_DEVCAP_L0S_MASK	GENMASK(8, 6)
+#define CXLV_PCI_EXP_DEVCAP_L0S_DEFAULT 0b110
+
+/*
+ * Endpoint L1 Acceptable Latency
+ *
+ * 000b		Maximum of 1 μs
+ * 001b		Maximum of 2 μs
+ * 010b		Maximum of 4 μs
+ * 011b		Maximum of 8 μs
+ * 100b		Maximum of 16 μs
+ * 101b		Maximum of 32 μs
+ * 110b		Maximum of 64 μs
+ * 111b		No limit
+ */
+#define CXLV_PCI_EXP_DEVCAP_L1_MASK	GENMASK(11, 9)
+#define CXLV_PCI_EXP_DEVCAP_L1_DEFAULT	0b110
+
+struct cxlv_pcie_cap {
+	u8 cid;
+	u8 next;
+
+	__le16 pcie_cap;
+	__le32 pcie_dev_cap;
+	__le16 pxdc;
+	__le16 pxds;
+	__le32 pxlcap;
+	__le16 pxlc;
+	__le16 pxls;
+
+	/* not used in cxlv */
+	__le32 others[10];
+};
+
+/**
+ *
+ * [PCIE 6.0] 7.6.3 PCI Express Extended Capability Header
+ */
+#define CXLV_PCI_EXT_CAP_VERS_DEFAULT	1
+#define CXLV_PCI_EXT_CAP_NEXT_MASK	GENMASK(15, 4)
+
+struct cxlv_pci_ext_cap {
+	__le16 cid;
+	__le16 next;
+};
+
+/*
+ * cxlv memory layout
+ *
+ * |--dev regs (1M)---|--bridge regs (1M)---|---reserved(2M)---|----resource (rest)-----|
+ */
+#define CXLV_DEV_REG_OFF			0x0
+#define CXLV_DEV_REG_SIZE			0x100000
+#define CXLV_BRIDGE_REG_OFF			(CXLV_DEV_REG_OFF + CXLV_DEV_REG_SIZE)
+#define CXLV_BRIDGE_REG_SIZE			0x100000
+
+/* resource start from 4M offset */
+#define CXLV_RESOURCE_OFF		0x400000
+
+#define   CXLV_BAR_PCI_MSIX_OFF		0x0
+#define   CXLV_MSIX_ENTRY_NUM		128
+#define   CXLV_BAR_PCI_MSIX_LEN		(PCI_MSIX_ENTRY_SIZE * CXLV_MSIX_ENTRY_NUM)
+
+#define CXLV_DEV_BAR_PCI_OFF		0x0
+#define CXLV_DEV_BAR_PCI_LEN		0x10000
+#define CXLV_DEV_BAR_DEV_REGS_OFF	(CXLV_DEV_BAR_PCI_OFF + CXLV_DEV_BAR_PCI_LEN)
+#define CXLV_DEV_BAR_DEV_REGS_LEN	0x10000
+#define CXLV_DEV_BAR_COMPONENT_OFF	(CXLV_DEV_BAR_DEV_REGS_OFF + CXLV_DEV_BAR_DEV_REGS_LEN)
+#define CXLV_DEV_BAR_COMPONENT_LEN	0x10000
+#define CXLV_DEV_BAR_LSA_OFF		(CXLV_DEV_BAR_COMPONENT_OFF + CXLV_DEV_BAR_COMPONENT_LEN)
+#define CXLV_DEV_BAR_LSA_LEN		0x10000
+
+#define CXLV_BRIDGE_BAR_PCI_OFF		0x0
+#define CXLV_BRIDGE_BAR_PCI_LEN		0x10000
+#define CXLV_BRIDGE_BAR_COMPONENT_OFF	(CXLV_BRIDGE_BAR_PCI_OFF + CXLV_BRIDGE_BAR_PCI_LEN)
+#define CXLV_BRIDGE_BAR_COMPONENT_LEN	0x10000
+
+/*
+ * [CXL 3.0] 8.1.3 PCIe DVSEC for CXL Devices
+ */
+
+/*
+ * DVSEC Revision ID 2h represents the structure
+ * as defined in the CXL 3.0 specification.
+ * */
+
+#define CXLV_DVSEC_REVISION_MASK	GENMASK(19, 16)
+#define CXLV_DVSEC_LEN_MASK		GENMASK(31, 20)
+
+#define CXLV_DVSEC_REVISION_DEFAULT	3
+
+struct cxlx_dvsec_header {
+	struct cxlv_pci_ext_cap cap_header;
+	__le32 cxl_header1;
+	__le16 cxl_header2;
+} __packed;
+
+#define CXLV_DVSEC_CAP_MEM		0x4
+#define CXLV_DVSEC_CAP_HDM_COUNT_MASK	GENMASK(5, 4)
+
+#define CXLV_DVSEC_CAP_VALID		0x1
+#define CXLV_DVSEC_CAP_ACTIVE		0x2
+struct cxlv_pci_ext_cap_id_dvsec {
+	struct cxlx_dvsec_header header;
+	__le16 cap;
+
+	__le32	skip[3];
+	__le32	size_hi_1;
+	__le32	size_low_1;
+};
+
+/*
+ * [CXL 3.0] 8.1.9 Register Locator DVSEC
+ */
+
+#define CXLV_DVSEC_LOC_LO_TYPE_MASK	GENMASK(15, 8)
+#define CXLV_DVSEC_LOC_LO_OFF_MASK	GENMASK(31, 16)
+#define CXLV_DVSEC_LOC_HI_OFF_MASK	GENMASK(63, 32)
+struct reg_block_loc {
+	__le32 reg_block_lo_off;
+	__le32 reg_block_hi_off;
+};
+
+struct cxlv_pci_ext_cap_locator {
+	struct cxlx_dvsec_header header;
+	struct reg_block_loc loc1;
+	struct reg_block_loc loc2;
+	struct reg_block_loc loc3;
+};
+
+/*
+ * [CXL 3.0] 8.2.8 CXL Device Register Interface
+ */
+
+/*
+ * Version: Defines the version of the capability structure present. This field shall be
+ * set to 01h. Software shall check this version number during initialization to
+ * determine the layout of the device capabilities, treating an unknown version number
+ * as an error preventing any further access to the device by that software.
+ */
+#define CXLV_DEV_CAP_ARRAY_HEADER_VERS_DEFAULT	1
+/*
+ * Type: Identifies the type-specific capabilities in the CXL Device Capabilities Array.
+ *   0h = The type is inferred from the PCI Class code. If the PCI Class code is not
+ * associated with a type defined by this specification, no type-specific capabilities
+ * are present.
+ *   1h = Memory Device Capabilities (see Section 8.2.8.5).
+ *   2h = Switch Mailbox CCI Capabilities (see Section 8.2.8.6).
+ *   All other encodings are reserved.
+ */
+#define CXLV_DEV_CAP_ARRAY_HEADER_TYPE_MASK	GENMASK(12, 8)
+#define CXLV_DEV_CAP_ARRAY_HEADER_TYPE_MEMDEV	1
+#define CXLV_DEV_CAP_ARRAY_HEADER_TYPE_SWITCH	2
+
+struct cap_array_header {
+	__le16	cap_id;
+	__le16	vers_type;
+	__le16	cap_count;
+	__le16	res[5];
+} __packed;
+
+struct cap_header {
+	__le16	cap_id;
+	__le16	version;
+	__le32	offset;
+	__le32	len;
+	__le32	res2;
+};
+
+struct cxl_bar {
+	struct cap_array_header cap_array_header;
+	struct cap_header	cap_headers[];
+};
+
+/*
+ *
+ * [CXL 3.0] 8.2.8.3 Device Status Registers (Offset: Varies)
+ */
+struct cxl_dev_status {
+	__le32	status;
+	__le32	reserved;
+};
+
+/*
+ * [CXL 3.0] 8.2.8.4 Mailbox Registers (Offset: Varies)
+ */
+
+/*
+ * Payload Size: Size of the Command Payload registers in bytes, expressed as 2^n.
+ * The minimum size is 256 bytes (n=8) and the maximum size is 1 MB (n=20).
+ */
+#define CXLV_MBOX_CAP_PAYLOAD_SIZE_MASK		0x1f
+#define CXLV_MBOX_CAP_PAYLOAD_SIZE_DEFAULT	11	/* 2K */
+
+struct cxlv_mbox {
+	__le32 cap;
+	__le32 control;
+	__le64 cmd;
+	__le64 status;
+	__le64 bg_cmd_status;
+	u8	payload[];
+} __packed;
+
+static inline bool cxlv_mbox_test_doorbell(struct cxlv_mbox *mbox)
+{
+	return (readl(&mbox->control) & CXLDEV_MBOX_CTRL_DOORBELL);
+}
+
+static inline void cxlv_mbox_clear_doorbell(struct cxlv_mbox *mbox)
+{
+	u32 val;
+
+	val = readl(&mbox->control);
+	val &= ~CXLDEV_MBOX_CTRL_DOORBELL;
+
+	writel(val, &mbox->control);
+}
+
+static inline u16 cxlv_mbox_get_cmd(struct cxlv_mbox *mbox)
+{
+	return FIELD_GET(CXLDEV_MBOX_CMD_COMMAND_OPCODE_MASK, mbox->cmd);
+}
+
+static inline void cxlv_mbox_set_cmd_payload_len(struct cxlv_mbox *mbox, u16 len)
+{
+	u64 val;
+
+	val = readq(&mbox->cmd);
+	val |= FIELD_PREP(CXLDEV_MBOX_CMD_PAYLOAD_LENGTH_MASK, len);
+
+	writeq(val, &mbox->cmd);
+}
+
+static inline void cxlv_mbox_set_retcode(struct cxlv_mbox *mbox, int ret)
+{
+	u64 val;
+
+	val = readq(&mbox->control);
+	val |= FIELD_PREP(CXLDEV_MBOX_STATUS_RET_CODE_MASK, ret);
+
+	writeq(val, &mbox->control);
+}
+
+static inline void cxlv_mbox_copy_to_payload(struct cxlv_mbox *mbox, u32 off,
+		void *p, u32 len)
+{
+	memcpy_toio(mbox->payload + off, p, len);
+}
+
+static inline void cxlv_mbox_copy_from_payload(struct cxlv_mbox *mbox, u32 off,
+		void *p, u32 len)
+{
+	memcpy_fromio(p, mbox->payload + off, len);
+}
+
+/*
+ * [CXL 3.0] 8.2.8.5 Memory Device Capabilities
+ */
+
+/*
+ * Media Status: Describes the status of the device media.
+ *  00b = Not Ready - Media training is incomplete.
+ *  01b = Ready - The media trained successfully and is ready for use.
+ *  10b = Error - The media failed to train or encountered an error.
+ *  11b = Disabled - Access to the media is disabled.
+ */
+#define CXLV_MEMDEV_CAP_MEDIA_STATUS_MASK	GENMASK(3, 2)
+#define CXLV_MEMDEV_CAP_MEDIA_STATUS_DEFAULT	0b01
+
+#define CXLV_MEMDEV_CAP_MBXO_INTERFACE_READY	0x10
+
+/*
+ * Reset Needed: When nonzero, indicates the least impactful reset type needed to
+ * return the device to the operational state. A cold reset is considered more impactful
+ * than a warm reset. A warm reset is considered more impactful that a hot reset,
+ * which is more impactful than a CXL reset. This field returns nonzero value if FW Halt
+ * is set, Media Status is in the Error or Disabled state, or the Mailbox Interfaces Ready
+ * does not become set.
+ *  000b = Device is operational and a reset is not required
+ *  001b = Cold Reset
+ *  010b = Warm Reset
+ *  011b = Hot Reset
+ *  100b = CXL Reset (device must not report this value if it does not support CXL
+ * Reset)
+ * • All other encodings are reserved.
+ */
+#define CXLV_MEMDEV_CAP_MBOX_RESET_NEEDED_MASK		GENMASK(7, 5)
+#define CXLV_MEMDEV_CAP_MBOX_RESET_NEEDED_DEFAULT	0b0
+
+struct cxl_memdev_cap {
+	__le64 status;
+} __packed;
+
+#define CXLV_DEV_CAP_MBOX_PAYLOAD	2048
+#define CXLV_DEV_CAP_ARRAY_SIZE		4
+
+#define CXLV_DEV_CAP_STATUS_OFF		(0x10 * CXLV_DEV_CAP_ARRAY_SIZE)
+#define CXLV_DEV_CAP_STATUS_LEN		sizeof(struct cxl_dev_status)
+#define CXLV_DEV_CAP_MEMDEV_OFF		(CXLV_DEV_CAP_STATUS_OFF + CXLV_DEV_CAP_STATUS_LEN)
+#define CXLV_DEV_CAP_MEMDEV_LEN		sizeof(struct cxl_memdev_cap)
+#define CXLV_DEV_CAP_MBOX_OFF		(CXLV_DEV_CAP_MEMDEV_OFF + CXLV_DEV_CAP_MEMDEV_LEN)
+#define CXLV_DEV_CAP_MBOX_LEN		sizeof(struct cxlv_mbox) + CXLV_DEV_CAP_MBOX_PAYLOAD
+
+struct cxlv_pci_ext_cap_dsn {
+	struct cxlv_pci_ext_cap id;
+	__le64 serial;
+};
+
+/*
+ * [CXL 3.0] 8.2.3 Component Register Layout and Definition
+ */
+
+#define CXLV_COMP_CACHEMEM_OFF		4096
+#define   CXLV_COMP_DECODER_OFF		1024
+
+#define   CXLV_COMP_CACHEMEM_HDR_CAP_ID_MASK		GENMASK(15, 0)
+#define   CXLV_COMP_CACHEMEM_HDR_CAP_VER_MASK		GENMASK(19, 16)
+#define   CXLV_COMP_CACHEMEM_HDR_CACHEMEM_VER_MASK	GENMASK(23, 20)
+#define   CXLV_COMP_CACHEMEM_HDR_ARRAY_SIZE_MASK	GENMASK(31, 24)
+
+#define   CXLV_COMP_CACHEMEM_HDM_CAP_ID_MASK		GENMASK(15, 0)
+#define   CXLV_COMP_CACHEMEM_HDM_CAP_VER_MASK		GENMASK(19, 16)
+#define   CXLV_COMP_CACHEMEM_HDM_DECODER_POINTER_MASK	GENMASK(31, 20)
+
+struct cxl_cachemem_comp {
+	__le32 header;
+	__le32 hdm_cap;
+};
+
+struct cxl_component {
+	u8	resv1[4096];
+	struct cxl_cachemem_comp	cachemem_comp;
+	u8	impl_spec[49152];
+	u8	arb_mux[1024];
+	u8	resv2[7168];
+};
+
+/*
+ * Decoder Count: Reports the number of memory address decoders
+ * implemented by the component. CXL devices shall not advertise more than 10
+ * decoders. CXL switches and Host Bridges may advertise up to 32 decoders.
+ * 0h – 1 Decoder
+ * 1h – 2 Decoders
+ * 2h – 4 Decoders
+ * 3h – 6 Decoders
+ * 4h – 8 Decoders
+ * 5h – 10 Decoders
+ * 6h – 12 Decoders2
+ * 7h – 14 Decoders2
+ * 8h – 16 Decoders2
+ * 9h – 20 Decoders2
+ * Ah – 24 Decoders2
+ * Bh – 28 Decoders2
+ * Ch – 32 Decoders2
+ * All other values are reserved
+ */
+#define CXLV_DECODER_CAP_DCOUNT_MASK	GENMASK(3, 0)
+
+/*
+ * Target Count: The number of target ports each decoder supports (applicable
+ * only to Upstream Switch Port and CXL Host Bridge). Maximum of 8.
+ * 1h – 1 target port
+ * 2h – 2 target ports
+ * 4h – 4 target ports
+ * 8h – 8 target ports
+ * All other values are reserved.
+ */
+#define CXLV_DECODER_CAP_TCOUNT_MASK	GENMASK(7, 4)
+
+#define CXLV_DECODER_GLOBAL_CTRL_POISON		BIT(0)
+#define CXLV_DECODER_GLOBAL_CTRL_ENABLE		BIT(1)
+
+#define CXLV_DECODER_CTRL_IG_MASK		GENMASK(3, 0)
+#define CXLV_DECODER_CTRL_IW_MASK		GENMASK(7, 4)
+#define CXLV_DECODER_CTRL_COMMIT		BIT(9)
+#define CXLV_DECODER_CTRL_COMMITTED		BIT(10)
+#define CXLV_DECODER_CTRL_COMMIT_ERR	BIT(11)
+
+struct cxl_decoder_regs {
+	__le32	base_lo;
+	__le32	base_hi;
+	__le32	size_lo;
+	__le32	size_hi;
+
+	__le32 ctrl_regs;
+
+	union {
+		__le32	target_list_lo;
+		__le32	dpa_skip_lo;
+	};
+	union {
+		__le32	target_list_hi;
+		__le32	dpa_skip_hi;
+	};
+} __packed;
+
+struct cxl_decoder_cap {
+	__le32 cap_reg;
+	__le32 global_ctrl_reg;
+	__le32	resv[2];
+	struct cxl_decoder_regs decoder[];
+} __packed;
+
+
+/* use domain 0x10 instead of 0x0 to avoid race with real pci device */
+#define CXLV_PCI_DOMAIN_NUM	0x10
+#define CXLV_PCI_BUS_NUM	0x0
+
+/* offset in pci configureation space */
+#define CXLV_PCI_HDR_OFFS	0x0
+#define CXLV_PCI_PM_CAP_OFFS	0x40
+#define CXLV_PCI_MSIX_CAP_OFFS	0x50
+#define CXLV_PCIE_CAP_OFFS	0x60
+
+#define CXLV_PCI_EXT_CAP_OFFS (PCI_CFG_SPACE_SIZE)
+#endif /* __CXLV_PCI_H__ */
diff --git a/drivers/cxl/cxl_virt/cxlv_port.c b/drivers/cxl/cxl_virt/cxlv_port.c
new file mode 100644
index 000000000000..eb1a2de7f333
--- /dev/null
+++ b/drivers/cxl/cxl_virt/cxlv_port.c
@@ -0,0 +1,149 @@
+#include "cxlv.h"
+#include "cxlv_pci.h"
+
+static int cxlv_port_create_root_port(struct cxlv_device *cxlv_device)
+{
+	struct device *host = &cxlv_device->dev;
+	struct cxl_port *root_port;
+
+	root_port = devm_cxl_add_port(host, host, CXL_RESOURCE_NONE, NULL);
+	if (IS_ERR(root_port))
+		return PTR_ERR(root_port);
+
+	cxlv_device->root_port = root_port;
+
+	return 0;
+}
+
+static int cxlv_port_add_root_decoder(struct cxlv_device *cxlv_device, struct resource *cxlv_res)
+{
+	int ret;
+	struct resource *res;
+	struct cxl_root_decoder *cxlrd;
+	struct cxl_decoder *cxld;
+	int target_map[CXL_DECODER_MAX_INTERLEAVE];
+
+	res = kzalloc(sizeof(*res), GFP_KERNEL);
+	if (!res)
+		return -ENOMEM;
+
+	res->name = kasprintf(GFP_KERNEL, "CXLV Window %d", cxlv_device->cxlv_dev_id);
+	if (!res->name)
+		goto free_res;
+
+	res->start = cxlv_device->opts->memstart + CXLV_RESOURCE_OFF;
+	res->end = cxlv_device->opts->memstart + cxlv_device->opts->memsize - 1;
+	res->flags = IORESOURCE_MEM;
+
+	ret = insert_resource(cxlv_res, res);
+	if (ret)
+		goto free_name;
+
+	cxlrd = cxl_root_decoder_alloc(cxlv_device->root_port, 1, cxl_hb_modulo);
+	if (IS_ERR(cxlrd)) {
+		ret = PTR_ERR(cxlrd);
+		goto out;
+	}
+	cxlrd->qos_class = 0;
+
+	cxld = &cxlrd->cxlsd.cxld;
+	cxld->flags = CXL_DECODER_F_TYPE3 | CXL_DECODER_F_RAM | CXL_DECODER_F_PMEM;
+	cxld->target_type = CXL_DECODER_HOSTONLYMEM;
+
+	cxld->hpa_range = (struct range) {
+		.start = res->start,
+		.end = res->end,
+	};
+	cxld->interleave_ways = 1;
+	cxld->interleave_granularity = CXL_DECODER_MIN_GRANULARITY;
+
+	target_map[0] = 1;
+
+	ret = cxl_decoder_add(cxld, target_map);
+	if (ret) {
+		put_device(&cxld->dev);
+		goto out;
+	}
+
+	ret = cxl_decoder_autoremove(&cxlv_device->host_bridge->dev, cxld);
+	if (ret)
+		goto out;
+
+	return 0;
+
+free_name:
+	kfree(res->name);
+free_res:
+	kfree(res);
+out:
+	return ret;
+}
+
+int cxlv_port_init(struct cxlv_device *cxlv_device)
+{
+	int ret;
+	struct resource *cxl_res;
+	struct cxl_port *root_port, *port;
+	struct cxl_dport *dport;
+	u64 component_phy_addr;
+
+	ret = cxlv_port_create_root_port(cxlv_device);
+	if (ret)
+		return ret;;
+
+	root_port = cxlv_device->root_port;
+
+	dport = devm_cxl_add_dport(root_port, &cxlv_device->host_bridge->dev, 1, CXL_RESOURCE_NONE);
+	if (IS_ERR(dport)) {
+		pr_err("failed to add dport: %d", (int)PTR_ERR(dport));
+		return PTR_ERR(dport);
+	}
+
+	cxl_res = devm_kzalloc(&cxlv_device->host_bridge->dev, sizeof(*cxl_res), GFP_KERNEL);
+	if (!cxl_res) {
+		return -ENOMEM;
+	}
+
+	cxl_res->name = "CXL mem";
+	cxl_res->start = 0;
+	cxl_res->end = -1;
+	cxl_res->flags = IORESOURCE_MEM;
+
+	ret = devm_add_action_or_reset(&cxlv_device->host_bridge->dev, remove_cxl_resources, cxl_res);
+	if (ret)
+		return ret;
+
+	ret = cxlv_port_add_root_decoder(cxlv_device, cxl_res);
+	if (ret) {
+		return ret;
+	}
+
+	ret = add_cxl_resources(cxl_res);
+	if (ret) {
+		return ret;
+	}
+
+	device_for_each_child(&root_port->dev, cxl_res, pair_cxl_resource);
+
+	ret = devm_cxl_register_pci_bus(&root_port->dev, &cxlv_device->host_bridge->dev, cxlv_device->host_bridge->bus);
+	if (ret) {
+		pr_err("failed to register pci bus");
+		return ret;
+	}
+
+	component_phy_addr = cxlv_device->opts->memstart + CXLV_BRIDGE_REG_OFF + CXLV_BRIDGE_BAR_COMPONENT_OFF;
+	port = devm_cxl_add_port(&root_port->dev, &cxlv_device->host_bridge->dev, component_phy_addr, dport);
+	if (IS_ERR(port))
+		return PTR_ERR(port);
+
+	if (IS_ENABLED(CONFIG_CXL_PMEM)) {
+		ret = device_for_each_child(&root_port->dev, root_port,
+					   add_root_nvdimm_bridge);
+		if (ret < 0) {
+			pr_err("failed add nv bridge");
+			return ret;
+		}
+	}
+
+	return 0;
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 1/4] cxl: move some function from acpi module to core module
  2023-12-28  6:05 ` [RFC PATCH 1/4] cxl: move some function from acpi module to core module Dongsheng Yang
@ 2023-12-28  6:43   ` Dongsheng Yang
  0 siblings, 0 replies; 13+ messages in thread
From: Dongsheng Yang @ 2023-12-28  6:43 UTC (permalink / raw)
  To: dave, jonathan.cameron, ave.jiang, alison.schofield,
	vishal.l.verma, ira.weiny, dan.j.williams
  Cc: linux-cxl



在 2023/12/28 星期四 下午 2:05, Dongsheng Yang 写道:
> cxl_virt module will create root_port without cxl_acpi_probe(),
> export these symbol to allow cxl_virt to create it's own root_port.
> 
> Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
> ---
>   drivers/cxl/acpi.c      | 143 +--------------------------------------
>   drivers/cxl/core/port.c | 145 ++++++++++++++++++++++++++++++++++++++++
>   drivers/cxl/cxl.h       |   5 ++
>   3 files changed, 151 insertions(+), 142 deletions(-)
> 
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index 2034eb4ce83f..a60ed4156a5e 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -447,7 +447,7 @@ static int add_host_bridge_dport(struct device *match, void *arg)
>    * A host bridge is a dport to a CFMWS decode and it is a uport to the
>    * dport (PCIe Root Ports) in the host bridge.
>    */
> -static int add_host_bridge_uport(struct device *match, void *arg)
> +int add_host_bridge_uport(struct device *match, void *arg)

This change is included by mistake, it will be revert in next version

Thanx
>   {
>   	struct cxl_port *root_port = arg;
>   	struct device *host = root_port->dev.parent;
> @@ -504,30 +504,6 @@ static int add_host_bridge_uport(struct device *match, void *arg)
>   	return 0;
>   }
>   
> -static int add_root_nvdimm_bridge(struct device *match, void *data)
> -{
> -	struct cxl_decoder *cxld;
> -	struct cxl_port *root_port = data;
> -	struct cxl_nvdimm_bridge *cxl_nvb;
> -	struct device *host = root_port->dev.parent;
> -
> -	if (!is_root_decoder(match))
> -		return 0;
> -
> -	cxld = to_cxl_decoder(match);
> -	if (!(cxld->flags & CXL_DECODER_F_PMEM))
> -		return 0;
> -
> -	cxl_nvb = devm_cxl_add_nvdimm_bridge(host, root_port);
> -	if (IS_ERR(cxl_nvb)) {
> -		dev_dbg(host, "failed to register pmem\n");
> -		return PTR_ERR(cxl_nvb);
> -	}
> -	dev_dbg(host, "%s: add: %s\n", dev_name(&root_port->dev),
> -		dev_name(&cxl_nvb->dev));
> -	return 1;
> -}
> -
>   static struct lock_class_key cxl_root_key;
>   
>   static void cxl_acpi_lock_reset_class(void *dev)
> @@ -535,123 +511,6 @@ static void cxl_acpi_lock_reset_class(void *dev)
>   	device_lock_reset_class(dev);
>   }
>   
> -static void del_cxl_resource(struct resource *res)
> -{
> -	kfree(res->name);
> -	kfree(res);
> -}
> -
> -static void cxl_set_public_resource(struct resource *priv, struct resource *pub)
> -{
> -	priv->desc = (unsigned long) pub;
> -}
> -
> -static struct resource *cxl_get_public_resource(struct resource *priv)
> -{
> -	return (struct resource *) priv->desc;
> -}
> -
> -static void remove_cxl_resources(void *data)
> -{
> -	struct resource *res, *next, *cxl = data;
> -
> -	for (res = cxl->child; res; res = next) {
> -		struct resource *victim = cxl_get_public_resource(res);
> -
> -		next = res->sibling;
> -		remove_resource(res);
> -
> -		if (victim) {
> -			remove_resource(victim);
> -			kfree(victim);
> -		}
> -
> -		del_cxl_resource(res);
> -	}
> -}
> -
> -/**
> - * add_cxl_resources() - reflect CXL fixed memory windows in iomem_resource
> - * @cxl_res: A standalone resource tree where each CXL window is a sibling
> - *
> - * Walk each CXL window in @cxl_res and add it to iomem_resource potentially
> - * expanding its boundaries to ensure that any conflicting resources become
> - * children. If a window is expanded it may then conflict with a another window
> - * entry and require the window to be truncated or trimmed. Consider this
> - * situation:
> - *
> - * |-- "CXL Window 0" --||----- "CXL Window 1" -----|
> - * |--------------- "System RAM" -------------|
> - *
> - * ...where platform firmware has established as System RAM resource across 2
> - * windows, but has left some portion of window 1 for dynamic CXL region
> - * provisioning. In this case "Window 0" will span the entirety of the "System
> - * RAM" span, and "CXL Window 1" is truncated to the remaining tail past the end
> - * of that "System RAM" resource.
> - */
> -static int add_cxl_resources(struct resource *cxl_res)
> -{
> -	struct resource *res, *new, *next;
> -
> -	for (res = cxl_res->child; res; res = next) {
> -		new = kzalloc(sizeof(*new), GFP_KERNEL);
> -		if (!new)
> -			return -ENOMEM;
> -		new->name = res->name;
> -		new->start = res->start;
> -		new->end = res->end;
> -		new->flags = IORESOURCE_MEM;
> -		new->desc = IORES_DESC_CXL;
> -
> -		/*
> -		 * Record the public resource in the private cxl_res tree for
> -		 * later removal.
> -		 */
> -		cxl_set_public_resource(res, new);
> -
> -		insert_resource_expand_to_fit(&iomem_resource, new);
> -
> -		next = res->sibling;
> -		while (next && resource_overlaps(new, next)) {
> -			if (resource_contains(new, next)) {
> -				struct resource *_next = next->sibling;
> -
> -				remove_resource(next);
> -				del_cxl_resource(next);
> -				next = _next;
> -			} else
> -				next->start = new->end + 1;
> -		}
> -	}
> -	return 0;
> -}
> -
> -static int pair_cxl_resource(struct device *dev, void *data)
> -{
> -	struct resource *cxl_res = data;
> -	struct resource *p;
> -
> -	if (!is_root_decoder(dev))
> -		return 0;
> -
> -	for (p = cxl_res->child; p; p = p->sibling) {
> -		struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
> -		struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
> -		struct resource res = {
> -			.start = cxld->hpa_range.start,
> -			.end = cxld->hpa_range.end,
> -			.flags = IORESOURCE_MEM,
> -		};
> -
> -		if (resource_contains(p, &res)) {
> -			cxlrd->res = cxl_get_public_resource(p);
> -			break;
> -		}
> -	}
> -
> -	return 0;
> -}
> -
>   static int cxl_acpi_probe(struct platform_device *pdev)
>   {
>   	int rc;
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 38441634e4c6..d8dae028e8a4 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -989,6 +989,151 @@ static int add_dport(struct cxl_port *port, struct cxl_dport *dport)
>   	return 0;
>   }
>   
> +int add_root_nvdimm_bridge(struct device *match, void *data)
> +{
> +	struct cxl_decoder *cxld;
> +	struct cxl_port *root_port = data;
> +	struct cxl_nvdimm_bridge *cxl_nvb;
> +	struct device *host = root_port->dev.parent;
> +
> +	if (!is_root_decoder(match))
> +		return 0;
> +
> +	cxld = to_cxl_decoder(match);
> +	if (!(cxld->flags & CXL_DECODER_F_PMEM))
> +		return 0;
> +
> +	cxl_nvb = devm_cxl_add_nvdimm_bridge(host, root_port);
> +	if (IS_ERR(cxl_nvb)) {
> +		dev_dbg(host, "failed to register pmem\n");
> +		return PTR_ERR(cxl_nvb);
> +	}
> +	dev_dbg(host, "%s: add: %s\n", dev_name(&root_port->dev),
> +		dev_name(&cxl_nvb->dev));
> +	return 1;
> +}
> +EXPORT_SYMBOL_NS_GPL(add_root_nvdimm_bridge, CXL);
> +
> +static void del_cxl_resource(struct resource *res)
> +{
> +	kfree(res->name);
> +	kfree(res);
> +}
> +
> +static void cxl_set_public_resource(struct resource *priv, struct resource *pub)
> +{
> +	priv->desc = (unsigned long) pub;
> +}
> +
> +static struct resource *cxl_get_public_resource(struct resource *priv)
> +{
> +	return (struct resource *) priv->desc;
> +}
> +
> +void remove_cxl_resources(void *data)
> +{
> +	struct resource *res, *next, *cxl = data;
> +
> +	for (res = cxl->child; res; res = next) {
> +		struct resource *victim = cxl_get_public_resource(res);
> +
> +		next = res->sibling;
> +		remove_resource(res);
> +
> +		if (victim) {
> +			remove_resource(victim);
> +			kfree(victim);
> +		}
> +
> +		del_cxl_resource(res);
> +	}
> +}
> +EXPORT_SYMBOL_NS_GPL(remove_cxl_resources, CXL);
> +
> +/**
> + * add_cxl_resources() - reflect CXL fixed memory windows in iomem_resource
> + * @cxl_res: A standalone resource tree where each CXL window is a sibling
> + *
> + * Walk each CXL window in @cxl_res and add it to iomem_resource potentially
> + * expanding its boundaries to ensure that any conflicting resources become
> + * children. If a window is expanded it may then conflict with a another window
> + * entry and require the window to be truncated or trimmed. Consider this
> + * situation:
> + *
> + * |-- "CXL Window 0" --||----- "CXL Window 1" -----|
> + * |--------------- "System RAM" -------------|
> + *
> + * ...where platform firmware has established as System RAM resource across 2
> + * windows, but has left some portion of window 1 for dynamic CXL region
> + * provisioning. In this case "Window 0" will span the entirety of the "System
> + * RAM" span, and "CXL Window 1" is truncated to the remaining tail past the end
> + * of that "System RAM" resource.
> + */
> +int add_cxl_resources(struct resource *cxl_res)
> +{
> +	struct resource *res, *new, *next;
> +
> +	for (res = cxl_res->child; res; res = next) {
> +		new = kzalloc(sizeof(*new), GFP_KERNEL);
> +		if (!new)
> +			return -ENOMEM;
> +		new->name = res->name;
> +		new->start = res->start;
> +		new->end = res->end;
> +		new->flags = IORESOURCE_MEM;
> +		new->desc = IORES_DESC_CXL;
> +
> +		/*
> +		 * Record the public resource in the private cxl_res tree for
> +		 * later removal.
> +		 */
> +		cxl_set_public_resource(res, new);
> +
> +		insert_resource_expand_to_fit(&iomem_resource, new);
> +
> +		next = res->sibling;
> +		while (next && resource_overlaps(new, next)) {
> +			if (resource_contains(new, next)) {
> +				struct resource *_next = next->sibling;
> +
> +				remove_resource(next);
> +				del_cxl_resource(next);
> +				next = _next;
> +			} else
> +				next->start = new->end + 1;
> +		}
> +	}
> +	return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(add_cxl_resources, CXL);
> +
> +int pair_cxl_resource(struct device *dev, void *data)
> +{
> +	struct resource *cxl_res = data;
> +	struct resource *p;
> +
> +	if (!is_root_decoder(dev))
> +		return 0;
> +
> +	for (p = cxl_res->child; p; p = p->sibling) {
> +		struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
> +		struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
> +		struct resource res = {
> +			.start = cxld->hpa_range.start,
> +			.end = cxld->hpa_range.end,
> +			.flags = IORESOURCE_MEM,
> +		};
> +
> +		if (resource_contains(p, &res)) {
> +			cxlrd->res = cxl_get_public_resource(p);
> +			break;
> +		}
> +	}
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(pair_cxl_resource, CXL);
> +
>   /*
>    * Since root-level CXL dports cannot be enumerated by PCI they are not
>    * enumerated by the common port driver that acquires the port lock over
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 687043ece101..1397f66d943b 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -839,6 +839,11 @@ static inline struct cxl_dax_region *to_cxl_dax_region(struct device *dev)
>   }
>   #endif
>   
> +void remove_cxl_resources(void *data);
> +int add_cxl_resources(struct resource *cxl_res);
> +int pair_cxl_resource(struct device *dev, void *data);
> +int add_root_nvdimm_bridge(struct device *match, void *data);
> +
>   /*
>    * Unit test builds overrides this to __weak, find the 'strong' version
>    * of these symbols in tools/testing/cxl/.
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 0/4] cxl: introduce CXL Virtualization module
  2023-12-28  6:05 [RFC PATCH 0/4] cxl: introduce CXL Virtualization module Dongsheng Yang
                   ` (2 preceding siblings ...)
  2023-12-28  6:05 ` [RFC PATCH 4/4] cxl: introduce CXL Virtualization module Dongsheng Yang
@ 2024-01-03 17:22 ` Ira Weiny
  2024-01-08 12:28   ` Jonathan Cameron
  2024-01-10  2:07   ` Dongsheng Yang
  2024-01-03 20:48 ` Dan Williams
  2024-05-03  5:12 ` Hyeongtak Ji
  5 siblings, 2 replies; 13+ messages in thread
From: Ira Weiny @ 2024-01-03 17:22 UTC (permalink / raw)
  To: Dongsheng Yang, dave, jonathan.cameron, ave.jiang,
	alison.schofield, vishal.l.verma, ira.weiny, dan.j.williams
  Cc: linux-cxl, Dongsheng Yang

Dongsheng Yang wrote:
> Hi all:
> 	This patchset introduce cxlv module to allow user to
> create virtual cxl device. it's based linux6.7-rc5, you can
> get the code from https://github.com/DataTravelGuide/linux
> 
> 	As the real CXL device is not widely available now, we need
> some virtual cxl device to do uplayer software developing or
> testing. Qemu is good for functional testing, but not good
> for some performance testing.

Do you have more details on what performance is missing from Qemu and why
this solution is better than a solution to fix Qemu?

Long term it seems better to fix Qemu for this type of work.

Are their other advantages to having this additional test infrastructure
in the kernel?  We already have cxl_test.

Ira

> 
> 	The new CXLV module allow user to use the reserved RAM[1], to
> create virtual cxl device. When the cxlv module load, it will
> create a directory named as "cxl_virt" under /sys/devices/virtual:
> 
> 	"/sys/devices/virtual/cxl_virt/"
> 
> that's the top level device for all cxlv devices.
> At the same time, cxlv module will create a debugfs directory:
> 
> /sys/kernel/debug/cxl/cxlv
> ├── create
> └── remove
> 
> the create and remove debugfs file is the cxlv entry to create or remove
> a cxlv device.
> 
> 	Each cxlv device have its owned virtual pci related bridge and bus, cxlv
> will create a new root_port for the new cxlv device, setup cxl ports for
> dport and nvdimm-bridge. After that, we will add the virtual pci device,
> that will go into the cxl_pci_probe to setup new memdev.
> 
> 	Then we can see the cxl device with cxl list and use it as a real cxl
> device.
> 
>  $ echo "memstart=$((8*1024*1024*1024)),cxltype=3,pmem=1,memsize=$((2*1024*1024*1024))" > /sys/kernel/debug/cxl/cxlv/create
>  $ cxl list
> [
>   {
>     "memdev":"mem0",
>     "pmem_size":1879048192,
>     "serial":0,
>     "numa_node":0,
>     "host":"0010:01:00.0"
>   }
> ]
>  $ cxl create-region -m mem0 -d decoder0.0 -t pmem
> {
>   "region":"region0",
>   "resource":"0x210000000",
>   "size":"1792.00 MiB (1879.05 MB)",
>   "type":"pmem",
>   "interleave_ways":1,
>   "interleave_granularity":256,
>   "decode_state":"commit",
>   "mappings":[
>     {
>       "position":0,
>       "memdev":"mem0",
>       "decoder":"decoder2.0"
>     }
>   ]
> }
> cxl region: cmd_create_region: created 1 region
> 
>  $ ndctl create-namespace -r region0 -m fsdax --map dev -t pmem -b 0
> {
>   "dev":"namespace0.0",
>   "mode":"fsdax",
>   "map":"dev",
>   "size":"1762.00 MiB (1847.59 MB)",
>   "uuid":"686fd289-a252-42cf-a3a5-95a39ed5c9d5",
>   "sector_size":512,
>   "align":2097152,
>   "blockdev":"pmem0"
> }
> 
>  $ mkfs.xfs -f /dev/pmem0 
> meta-data=/dev/pmem0             isize=512    agcount=4, agsize=112768
> blks
>          =                       sectsz=4096  attr=2, projid32bit=1
>          =                       crc=1        finobt=1, sparse=1,
> rmapbt=0
>          =                       reflink=1    bigtime=0 inobtcount=0
> data     =                       bsize=4096   blocks=451072, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
> log      =internal log           bsize=4096   blocks=2560, version=2
>          =                       sectsz=4096  sunit=1 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> Any comment is welcome!
> 
> TODO: implement cxlv command in ndctl to do cxlv device management.
> 
> [1]: Add argument in kernel command line: "memmap=nn[KMG]$ss[KMG]",
> detail in Documentation/driver-api/cxl/memory-devices.rst
> 
> Thanx
> 
> Dongsheng Yang (4):
>   cxl: move some function from acpi module to core module
>   cxl/port: allow dport host to be driver-less device
>   cxl/port: introduce cxl_disable_port() function
>   cxl: introduce CXL Virtualization module
> 
>  MAINTAINERS                         |   6 +
>  drivers/cxl/Kconfig                 |  11 +
>  drivers/cxl/Makefile                |   1 +
>  drivers/cxl/acpi.c                  | 143 +-----
>  drivers/cxl/core/port.c             | 231 ++++++++-
>  drivers/cxl/cxl.h                   |   6 +
>  drivers/cxl/cxl_virt/Makefile       |   5 +
>  drivers/cxl/cxl_virt/cxlv.h         |  87 ++++
>  drivers/cxl/cxl_virt/cxlv_debugfs.c | 260 ++++++++++
>  drivers/cxl/cxl_virt/cxlv_device.c  | 311 ++++++++++++
>  drivers/cxl/cxl_virt/cxlv_main.c    |  67 +++
>  drivers/cxl/cxl_virt/cxlv_pci.c     | 710 ++++++++++++++++++++++++++++
>  drivers/cxl/cxl_virt/cxlv_pci.h     | 549 +++++++++++++++++++++
>  drivers/cxl/cxl_virt/cxlv_port.c    | 149 ++++++
>  14 files changed, 2388 insertions(+), 148 deletions(-)
>  create mode 100644 drivers/cxl/cxl_virt/Makefile
>  create mode 100644 drivers/cxl/cxl_virt/cxlv.h
>  create mode 100644 drivers/cxl/cxl_virt/cxlv_debugfs.c
>  create mode 100644 drivers/cxl/cxl_virt/cxlv_device.c
>  create mode 100644 drivers/cxl/cxl_virt/cxlv_main.c
>  create mode 100644 drivers/cxl/cxl_virt/cxlv_pci.c
>  create mode 100644 drivers/cxl/cxl_virt/cxlv_pci.h
>  create mode 100644 drivers/cxl/cxl_virt/cxlv_port.c
> 
> -- 
> 2.34.1
> 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [RFC PATCH 0/4] cxl: introduce CXL Virtualization module
  2023-12-28  6:05 [RFC PATCH 0/4] cxl: introduce CXL Virtualization module Dongsheng Yang
                   ` (3 preceding siblings ...)
  2024-01-03 17:22 ` [RFC PATCH 0/4] " Ira Weiny
@ 2024-01-03 20:48 ` Dan Williams
       [not found]   ` <a32d859f-054f-11ca-e8a3-dff7a5234d0a@easystack.cn>
  2024-05-03  5:12 ` Hyeongtak Ji
  5 siblings, 1 reply; 13+ messages in thread
From: Dan Williams @ 2024-01-03 20:48 UTC (permalink / raw)
  To: Dongsheng Yang, dave, jonathan.cameron, ave.jiang,
	alison.schofield, vishal.l.verma, ira.weiny, dan.j.williams
  Cc: linux-cxl, Dongsheng Yang

Dongsheng Yang wrote:
> Hi all:
> 	This patchset introduce cxlv module to allow user to
> create virtual cxl device. it's based linux6.7-rc5, you can
> get the code from https://github.com/DataTravelGuide/linux
> 
> 	As the real CXL device is not widely available now, we need
> some virtual cxl device to do uplayer software developing or
> testing. Qemu is good for functional testing, but not good
> for some performance testing.

How is it performance testing if it's just using host-DRAM? Is the use
case something like pinning the benchmark on Socket0 and target DRAM on
Socket1 as emulated CXL to approximate CXL bus latency?

> 
> 	The new CXLV module allow user to use the reserved RAM[1], to
> create virtual cxl device. When the cxlv module load, it will
> create a directory named as "cxl_virt" under /sys/devices/virtual:
> 
> 	"/sys/devices/virtual/cxl_virt/"
> 
> that's the top level device for all cxlv devices.
> At the same time, cxlv module will create a debugfs directory:
> 
> /sys/kernel/debug/cxl/cxlv
> ├── create
> └── remove
> 
> the create and remove debugfs file is the cxlv entry to create or remove
> a cxlv device.
> 
> 	Each cxlv device have its owned virtual pci related bridge and bus, cxlv
> will create a new root_port for the new cxlv device, setup cxl ports for
> dport and nvdimm-bridge. After that, we will add the virtual pci device,
> that will go into the cxl_pci_probe to setup new memdev.
> 
> 	Then we can see the cxl device with cxl list and use it as a real cxl
> device.
> 
>  $ echo "memstart=$((8*1024*1024*1024)),cxltype=3,pmem=1,memsize=$((2*1024*1024*1024))" > /sys/kernel/debug/cxl/cxlv/create

Are these ranges reserved out of the mmap at boot time? 

[..]
>  14 files changed, 2388 insertions(+), 148 deletions(-)

This seems like a lot of code for something that is mostly already
supported by tools/testing/cxl/ (cxl_test). That too creates virtual CXL
devices that support ABI flows that are difficult to support in QEMU.
The only thing missing for "performance / functional emulation" testing
today is backing the memory regions with accessible memory rather than
unusable address space.

It is also the case that the static nature of cxl_test topology
definition has already started to prove too limiting for some tests. So
an enhancement to make cxl_test more dynamic like your proposed command
interface is appealing.

One change to get cxl_test to get it to emulate with DRAM rather than
fake address space is to just fill cxl_mock_pool with addresses backed
by DRAM rather than addresses from an unused portion of the physical
address map.

Currently cxl_test defines address ranges that may be larger than what a
host or VM can support, so this would be a new cxl_test mode limited by
available / reserved memory capacity.

See cxl-create-region.sh for an example of the virtual CXL regions that
cxl_test creates today:

https://github.com/pmem/ndctl/blob/main/test/cxl-create-region.sh

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 0/4] cxl: introduce CXL Virtualization module
  2024-01-03 17:22 ` [RFC PATCH 0/4] " Ira Weiny
@ 2024-01-08 12:28   ` Jonathan Cameron
  2024-01-10  2:07   ` Dongsheng Yang
  1 sibling, 0 replies; 13+ messages in thread
From: Jonathan Cameron @ 2024-01-08 12:28 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Dongsheng Yang, dave, ave.jiang, alison.schofield,
	vishal.l.verma, dan.j.williams, linux-cxl

On Wed, 3 Jan 2024 09:22:36 -0800
Ira Weiny <ira.weiny@intel.com> wrote:

> Dongsheng Yang wrote:
> > Hi all:
> > 	This patchset introduce cxlv module to allow user to
> > create virtual cxl device. it's based linux6.7-rc5, you can
> > get the code from https://github.com/DataTravelGuide/linux
> > 
> > 	As the real CXL device is not widely available now, we need
> > some virtual cxl device to do uplayer software developing or
> > testing. Qemu is good for functional testing, but not good
> > for some performance testing.  
> 
> Do you have more details on what performance is missing from Qemu and why
> this solution is better than a solution to fix Qemu?
> 
> Long term it seems better to fix Qemu for this type of work.

I plan to look at this sometime soon, but note that the fix will be special
cases only (no interleave!) in the short term.  Emulating interleave
is always going to be costly - can probably be better than we have it today
for large granularity (pages) but I'm not sure we will ever care enough
to implement that.

For virtualization usecases, if we go with CXL emulation as the path for
DCD then we'll just emulate direct connected devices and patch up the
perf characteristics to cover interleave, switches etc.

With such limitations we can get QEMU to perform well. I'm not keen on
separating QEMU for functional testing from QEMU for workload testing
but meh, we can at least make it automatic to use a higher perf root
if the interleave config allows it.

> 
> Are their other advantages to having this additional test infrastructure
> in the kernel?  We already have cxl_test.
> 
> Ira
> 
> > 
> > 	The new CXLV module allow user to use the reserved RAM[1], to
> > create virtual cxl device. When the cxlv module load, it will
> > create a directory named as "cxl_virt" under /sys/devices/virtual:
> > 
> > 	"/sys/devices/virtual/cxl_virt/"
> > 
> > that's the top level device for all cxlv devices.
> > At the same time, cxlv module will create a debugfs directory:
> > 
> > /sys/kernel/debug/cxl/cxlv
> > ├── create
> > └── remove
> > 
> > the create and remove debugfs file is the cxlv entry to create or remove
> > a cxlv device.
> > 
> > 	Each cxlv device have its owned virtual pci related bridge and bus, cxlv
> > will create a new root_port for the new cxlv device, setup cxl ports for
> > dport and nvdimm-bridge. After that, we will add the virtual pci device,
> > that will go into the cxl_pci_probe to setup new memdev.
> > 
> > 	Then we can see the cxl device with cxl list and use it as a real cxl
> > device.
> > 
> >  $ echo "memstart=$((8*1024*1024*1024)),cxltype=3,pmem=1,memsize=$((2*1024*1024*1024))" > /sys/kernel/debug/cxl/cxlv/create
> >  $ cxl list
> > [
> >   {
> >     "memdev":"mem0",
> >     "pmem_size":1879048192,
> >     "serial":0,
> >     "numa_node":0,
> >     "host":"0010:01:00.0"
> >   }
> > ]
> >  $ cxl create-region -m mem0 -d decoder0.0 -t pmem
> > {
> >   "region":"region0",
> >   "resource":"0x210000000",
> >   "size":"1792.00 MiB (1879.05 MB)",
> >   "type":"pmem",
> >   "interleave_ways":1,
> >   "interleave_granularity":256,
> >   "decode_state":"commit",
> >   "mappings":[
> >     {
> >       "position":0,
> >       "memdev":"mem0",
> >       "decoder":"decoder2.0"
> >     }
> >   ]
> > }
> > cxl region: cmd_create_region: created 1 region
> > 
> >  $ ndctl create-namespace -r region0 -m fsdax --map dev -t pmem -b 0
> > {
> >   "dev":"namespace0.0",
> >   "mode":"fsdax",
> >   "map":"dev",
> >   "size":"1762.00 MiB (1847.59 MB)",
> >   "uuid":"686fd289-a252-42cf-a3a5-95a39ed5c9d5",
> >   "sector_size":512,
> >   "align":2097152,
> >   "blockdev":"pmem0"
> > }
> > 
> >  $ mkfs.xfs -f /dev/pmem0 
> > meta-data=/dev/pmem0             isize=512    agcount=4, agsize=112768
> > blks
> >          =                       sectsz=4096  attr=2, projid32bit=1
> >          =                       crc=1        finobt=1, sparse=1,
> > rmapbt=0
> >          =                       reflink=1    bigtime=0 inobtcount=0
> > data     =                       bsize=4096   blocks=451072, imaxpct=25
> >          =                       sunit=0      swidth=0 blks
> > naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
> > log      =internal log           bsize=4096   blocks=2560, version=2
> >          =                       sectsz=4096  sunit=1 blks, lazy-count=1
> > realtime =none                   extsz=4096   blocks=0, rtextents=0
> > 
> > Any comment is welcome!
> > 
> > TODO: implement cxlv command in ndctl to do cxlv device management.
> > 
> > [1]: Add argument in kernel command line: "memmap=nn[KMG]$ss[KMG]",
> > detail in Documentation/driver-api/cxl/memory-devices.rst
> > 
> > Thanx
> > 
> > Dongsheng Yang (4):
> >   cxl: move some function from acpi module to core module
> >   cxl/port: allow dport host to be driver-less device
> >   cxl/port: introduce cxl_disable_port() function
> >   cxl: introduce CXL Virtualization module
> > 
> >  MAINTAINERS                         |   6 +
> >  drivers/cxl/Kconfig                 |  11 +
> >  drivers/cxl/Makefile                |   1 +
> >  drivers/cxl/acpi.c                  | 143 +-----
> >  drivers/cxl/core/port.c             | 231 ++++++++-
> >  drivers/cxl/cxl.h                   |   6 +
> >  drivers/cxl/cxl_virt/Makefile       |   5 +
> >  drivers/cxl/cxl_virt/cxlv.h         |  87 ++++
> >  drivers/cxl/cxl_virt/cxlv_debugfs.c | 260 ++++++++++
> >  drivers/cxl/cxl_virt/cxlv_device.c  | 311 ++++++++++++
> >  drivers/cxl/cxl_virt/cxlv_main.c    |  67 +++
> >  drivers/cxl/cxl_virt/cxlv_pci.c     | 710 ++++++++++++++++++++++++++++
> >  drivers/cxl/cxl_virt/cxlv_pci.h     | 549 +++++++++++++++++++++
> >  drivers/cxl/cxl_virt/cxlv_port.c    | 149 ++++++
> >  14 files changed, 2388 insertions(+), 148 deletions(-)
> >  create mode 100644 drivers/cxl/cxl_virt/Makefile
> >  create mode 100644 drivers/cxl/cxl_virt/cxlv.h
> >  create mode 100644 drivers/cxl/cxl_virt/cxlv_debugfs.c
> >  create mode 100644 drivers/cxl/cxl_virt/cxlv_device.c
> >  create mode 100644 drivers/cxl/cxl_virt/cxlv_main.c
> >  create mode 100644 drivers/cxl/cxl_virt/cxlv_pci.c
> >  create mode 100644 drivers/cxl/cxl_virt/cxlv_pci.h
> >  create mode 100644 drivers/cxl/cxl_virt/cxlv_port.c
> > 
> > -- 
> > 2.34.1
> >   
> 
> 
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 0/4] cxl: introduce CXL Virtualization module
  2024-01-03 17:22 ` [RFC PATCH 0/4] " Ira Weiny
  2024-01-08 12:28   ` Jonathan Cameron
@ 2024-01-10  2:07   ` Dongsheng Yang
  1 sibling, 0 replies; 13+ messages in thread
From: Dongsheng Yang @ 2024-01-10  2:07 UTC (permalink / raw)
  To: Ira Weiny, dave, jonathan.cameron, ave.jiang, alison.schofield,
	vishal.l.verma, dan.j.williams
  Cc: linux-cxl



在 2024/1/4 星期四 上午 1:22, Ira Weiny 写道:
> Dongsheng Yang wrote:
>> Hi all:
>> 	This patchset introduce cxlv module to allow user to
>> create virtual cxl device. it's based linux6.7-rc5, you can
>> get the code from https://github.com/DataTravelGuide/linux
>>
>> 	As the real CXL device is not widely available now, we need
>> some virtual cxl device to do uplayer software developing or
>> testing. Qemu is good for functional testing, but not good
>> for some performance testing.
> 
> Do you have more details on what performance is missing from Qemu and why
> this solution is better than a solution to fix Qemu?
> 
> Long term it seems better to fix Qemu for this type of work.
> 
> Are their other advantages to having this additional test infrastructure
> in the kernel?  We already have cxl_test.

Hi Ira,
	Let me explain more about what I mean by "qemu is not good for some 
performance testing". cxlv is not designed to test cxl driver itself, it 
is used to do performance testing for upper level layer software. I can 
give an example about the performance data:

(1) fio to test /dev/dax0.0 in qemu
qemu with memory-backend-ram, and create region and namespace with mod 
of devdax. and then run fio with ioengine dev-dax, fio file as below[1].

fio in qemu result detail in [2], the average iops is: avg=1919.26.

(2) fio to test /dev/dax0.0 in native host with cxlv
use cxlv to create cxl device and create region and namespace with mod 
of devdax. then run fio with the same fio file in [1].

fio in host result detail in [3], the average iops is: avg=1510391.68.


Now you can see the resule iops is about 1500K vs 1.9K.

I can explain more about why this matters, I am doing another project in 
block device layer, named cbd(cxl block device). It uses cxl memdev as a 
cache and a backing with other block device. it works similar with 
bcache, but is newly designed for cxl memory device which is byte 
addressable and latency very small. So I need a fast cxl device to 
verify my design in uppper layer is working well, E,g, indexing. qemu is 
too slow to this kind of performance testing. I dont think we can "fix" 
it, that's not what qemu need to do.

So when I say qemu is not good for performance testing, it's not saying 
I want some performance improvement of cxl implement in qemu, but I want 
to say the whole qemu-way is not suitable for latency sensitive testing.

Thanx




[1]:
[global]
bs=1K
ioengine=dev-dax
norandommap
time_based
runtime=10
group_reporting
disable_lat=1
disable_slat=1
disable_clat=1
clat_percentiles=0
cpus_allowed_policy=split

# For the dev-dax engine:
#
#   IOs always complete immediately
#   IOs are always direct
#
iodepth=1
direct=0
thread
numjobs=1
#
# The dev-dax engine does IO to DAX device that are special character
# devices exported by the kernel (e.g. /dev/dax0.0). The device is
# opened normally and then the region is accessible via mmap. We do
# not use the O_DIRECT flag because the device is naturally direct
# access. The O_DIRECT flags will result in failure. The engine
# access the underlying NVDIMM directly once the mmapping is setup.
#
# Check the alignment requirement of your DAX device. Currently the default
# should be 2M. Blocksize (bs) should meet alignment requirement.
#
# An example of creating a dev dax device node from pmem:
# ndctl create-namespace --reconfig=namespace0.0 --mode=dax --force
#
filename=/dev/dax0.0

[dev-dax-write]
rw=randwrite
stonewall

[2]:
# fio ./dax.fio
dev-dax-write: (g=0): rw=randwrite, bs=(R) 1024B-1024B, (W) 1024B-1024B, 
(T) 1024B-1024B, ioengine=dev-dax, iodepth=1
fio-3.36
Starting 1 thread
Jobs: 1 (f=1): [w(1)][100.0%][w=1929KiB/s][w=1929 IOPS][eta 00m:00s] 

dev-dax-write: (groupid=0, jobs=1): err= 0: pid=1198: Tue Jan  9 
10:17:21 2024
   write: IOPS=1917, BW=1918KiB/s (1964kB/s)(18.7MiB/10001msec); 0 zone 
resets
    bw (  KiB/s): min= 1700, max= 1944, per=100.00%, avg=1919.26, 
stdev=54.14, samples=19
    iops        : min= 1700, max= 1944, avg=1919.26, stdev=54.14, 
samples=19
   cpu          : usr=99.97%, sys=0.00%, ctx=12, majf=0, minf=126 

   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 
 >=64=0.0%
      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
 >=64=0.0%
      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
 >=64=0.0%
      issued rwts: total=0,19181,0,0 short=0,0,0,0 dropped=0,0,0,0 

      latency   : target=0, window=0, percentile=100.00%, depth=1 


Run status group 0 (all jobs):
   WRITE: bw=1918KiB/s (1964kB/s), 1918KiB/s-1918KiB/s 
(1964kB/s-1964kB/s), io=18.7MiB (19.6MB), run=10001-10001msec

[3]:
# fio ./dax.fio 

dev-dax-write: (g=0): rw=randwrite, bs=(R) 1024B-1024B, (W) 1024B-1024B, 
(T) 1024B-1024B, ioengine=dev-dax, iodepth=1
fio-3.36 

Starting 1 thread
Jobs: 1 (f=1): [w(1)][100.0%][w=1480MiB/s][w=1515k IOPS][eta 00m:00s] 

dev-dax-write: (groupid=0, jobs=1): err= 0: pid=41999: Tue Jan  9 
18:11:18 2024
   write: IOPS=1510k, BW=1474MiB/s (1546MB/s)(14.4GiB/10000msec); 0 zone 
resets
    bw (  MiB/s): min= 1418, max= 1480, per=100.00%, avg=1474.99, 
stdev=13.83, samples=19
    iops        : min=1452406, max=1515908, avg=1510391.68, 
stdev=14156.58, samples=19 
 
                                                     cpu          : 
usr=99.82%, sys=0.00%, ctx=22, majf=0, minf=899 
 
 
                        IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 
16=0.0%, 32=0.0%, >=64=0.0% 
 
                                                                submit 
   : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 

      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
 >=64=0.0%
      issued rwts: total=0,15096228,0,0 short=0,0,0,0 dropped=0,0,0,0 
 
 

      latency   : target=0, window=0, percentile=100.00%, depth=1 

 

Run status group 0 (all jobs): 

   WRITE: bw=1474MiB/s (1546MB/s), 1474MiB/s-1474MiB/s 
(1546MB/s-1546MB/s), io=14.4GiB (15.5GB), run=10000-10000msec
> 
> Ira
> 
>>
>> 	The new CXLV module allow user to use the reserved RAM[1], to
>> create virtual cxl device. When the cxlv module load, it will
>> create a directory named as "cxl_virt" under /sys/devices/virtual:
>>
>> 	"/sys/devices/virtual/cxl_virt/"
>>
>> that's the top level device for all cxlv devices.
>> At the same time, cxlv module will create a debugfs directory:
>>
>> /sys/kernel/debug/cxl/cxlv
>> ├── create
>> └── remove
>>
>> the create and remove debugfs file is the cxlv entry to create or remove
>> a cxlv device.
>>
>> 	Each cxlv device have its owned virtual pci related bridge and bus, cxlv
>> will create a new root_port for the new cxlv device, setup cxl ports for
>> dport and nvdimm-bridge. After that, we will add the virtual pci device,
>> that will go into the cxl_pci_probe to setup new memdev.
>>
>> 	Then we can see the cxl device with cxl list and use it as a real cxl
>> device.
>>
>>   $ echo "memstart=$((8*1024*1024*1024)),cxltype=3,pmem=1,memsize=$((2*1024*1024*1024))" > /sys/kernel/debug/cxl/cxlv/create
>>   $ cxl list
>> [
>>    {
>>      "memdev":"mem0",
>>      "pmem_size":1879048192,
>>      "serial":0,
>>      "numa_node":0,
>>      "host":"0010:01:00.0"
>>    }
>> ]
>>   $ cxl create-region -m mem0 -d decoder0.0 -t pmem
>> {
>>    "region":"region0",
>>    "resource":"0x210000000",
>>    "size":"1792.00 MiB (1879.05 MB)",
>>    "type":"pmem",
>>    "interleave_ways":1,
>>    "interleave_granularity":256,
>>    "decode_state":"commit",
>>    "mappings":[
>>      {
>>        "position":0,
>>        "memdev":"mem0",
>>        "decoder":"decoder2.0"
>>      }
>>    ]
>> }
>> cxl region: cmd_create_region: created 1 region
>>
>>   $ ndctl create-namespace -r region0 -m fsdax --map dev -t pmem -b 0
>> {
>>    "dev":"namespace0.0",
>>    "mode":"fsdax",
>>    "map":"dev",
>>    "size":"1762.00 MiB (1847.59 MB)",
>>    "uuid":"686fd289-a252-42cf-a3a5-95a39ed5c9d5",
>>    "sector_size":512,
>>    "align":2097152,
>>    "blockdev":"pmem0"
>> }
>>
>>   $ mkfs.xfs -f /dev/pmem0
>> meta-data=/dev/pmem0             isize=512    agcount=4, agsize=112768
>> blks
>>           =                       sectsz=4096  attr=2, projid32bit=1
>>           =                       crc=1        finobt=1, sparse=1,
>> rmapbt=0
>>           =                       reflink=1    bigtime=0 inobtcount=0
>> data     =                       bsize=4096   blocks=451072, imaxpct=25
>>           =                       sunit=0      swidth=0 blks
>> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
>> log      =internal log           bsize=4096   blocks=2560, version=2
>>           =                       sectsz=4096  sunit=1 blks, lazy-count=1
>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>>
>> Any comment is welcome!
>>
>> TODO: implement cxlv command in ndctl to do cxlv device management.
>>
>> [1]: Add argument in kernel command line: "memmap=nn[KMG]$ss[KMG]",
>> detail in Documentation/driver-api/cxl/memory-devices.rst
>>
>> Thanx
>>
>> Dongsheng Yang (4):
>>    cxl: move some function from acpi module to core module
>>    cxl/port: allow dport host to be driver-less device
>>    cxl/port: introduce cxl_disable_port() function
>>    cxl: introduce CXL Virtualization module
>>
>>   MAINTAINERS                         |   6 +
>>   drivers/cxl/Kconfig                 |  11 +
>>   drivers/cxl/Makefile                |   1 +
>>   drivers/cxl/acpi.c                  | 143 +-----
>>   drivers/cxl/core/port.c             | 231 ++++++++-
>>   drivers/cxl/cxl.h                   |   6 +
>>   drivers/cxl/cxl_virt/Makefile       |   5 +
>>   drivers/cxl/cxl_virt/cxlv.h         |  87 ++++
>>   drivers/cxl/cxl_virt/cxlv_debugfs.c | 260 ++++++++++
>>   drivers/cxl/cxl_virt/cxlv_device.c  | 311 ++++++++++++
>>   drivers/cxl/cxl_virt/cxlv_main.c    |  67 +++
>>   drivers/cxl/cxl_virt/cxlv_pci.c     | 710 ++++++++++++++++++++++++++++
>>   drivers/cxl/cxl_virt/cxlv_pci.h     | 549 +++++++++++++++++++++
>>   drivers/cxl/cxl_virt/cxlv_port.c    | 149 ++++++
>>   14 files changed, 2388 insertions(+), 148 deletions(-)
>>   create mode 100644 drivers/cxl/cxl_virt/Makefile
>>   create mode 100644 drivers/cxl/cxl_virt/cxlv.h
>>   create mode 100644 drivers/cxl/cxl_virt/cxlv_debugfs.c
>>   create mode 100644 drivers/cxl/cxl_virt/cxlv_device.c
>>   create mode 100644 drivers/cxl/cxl_virt/cxlv_main.c
>>   create mode 100644 drivers/cxl/cxl_virt/cxlv_pci.c
>>   create mode 100644 drivers/cxl/cxl_virt/cxlv_pci.h
>>   create mode 100644 drivers/cxl/cxl_virt/cxlv_port.c
>>
>> -- 
>> 2.34.1
>>
> 
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 0/4] cxl: introduce CXL Virtualization module
       [not found]   ` <a32d859f-054f-11ca-e8a3-dff7a5234d0a@easystack.cn>
@ 2024-01-25  3:49     ` Dan Williams
  2024-01-25  6:49       ` Dongsheng Yang
  0 siblings, 1 reply; 13+ messages in thread
From: Dan Williams @ 2024-01-25  3:49 UTC (permalink / raw)
  To: Dongsheng Yang, Dan Williams, dave, jonathan.cameron, ave.jiang,
	alison.schofield, vishal.l.verma, ira.weiny
  Cc: linux-cxl

Dongsheng Yang wrote:
> 
> 
> 在 2024/1/4 星期四 上午 4:48, Dan Williams 写道:
> > Dongsheng Yang wrote:
> >> Hi all:
> >> 	This patchset introduce cxlv module to allow user to
> >> create virtual cxl device. it's based linux6.7-rc5, you can
> >> get the code from https://github.com/DataTravelGuide/linux
> >>
> >> 	As the real CXL device is not widely available now, we need
> >> some virtual cxl device to do uplayer software developing or
> >> testing. Qemu is good for functional testing, but not good
> >> for some performance testing.
> > 
> > How is it performance testing if it's just using host-DRAM? Is the use
> > case something like pinning the benchmark on Socket0 and target DRAM on
> > Socket1 as emulated CXL to approximate CXL bus latency?
> 
> Hi Dan,
> 	I give an example as below, please check it inline.
> > 
> >>
> >> 	The new CXLV module allow user to use the reserved RAM[1], to
> >> create virtual cxl device. When the cxlv module load, it will
> >> create a directory named as "cxl_virt" under /sys/devices/virtual:
> >>
> >> 	"/sys/devices/virtual/cxl_virt/"
> >>
> >> that's the top level device for all cxlv devices.
> >> At the same time, cxlv module will create a debugfs directory:
> >>
> >> /sys/kernel/debug/cxl/cxlv
> >> ├── create
> >> └── remove
> >>
> >> the create and remove debugfs file is the cxlv entry to create or remove
> >> a cxlv device.
> >>
> >> 	Each cxlv device have its owned virtual pci related bridge and bus, cxlv
> >> will create a new root_port for the new cxlv device, setup cxl ports for
> >> dport and nvdimm-bridge. After that, we will add the virtual pci device,
> >> that will go into the cxl_pci_probe to setup new memdev.
> >>
> >> 	Then we can see the cxl device with cxl list and use it as a real cxl
> >> device.
> >>
> >>   $ echo "memstart=$((8*1024*1024*1024)),cxltype=3,pmem=1,memsize=$((2*1024*1024*1024))" > /sys/kernel/debug/cxl/cxlv/create
> > 
> > Are these ranges reserved out of the mmap at boot time?
> 
> Yes, it is reserved by memmap option in boot cmdline. I use memmap=8G$8G.

A faster way to get to a device-dax interface fronting reserved memory
is to use the efi_fake_mem= command line option.

For example:

    efi_fake_mem=4G@13G:0x40000

...assigns 4GB of System-RAM starting at the 13G physical offset with
the EFI_MEMORY_SP attribute. By default the kernel creates device-dax
devices for that dedicated memory.

For dax mapping performance testing you don't need any of the CXL
driver infrastructure since the CXL driver has nothing to do with the
data path.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 0/4] cxl: introduce CXL Virtualization module
  2024-01-25  3:49     ` Dan Williams
@ 2024-01-25  6:49       ` Dongsheng Yang
  2024-01-25  7:46         ` Dan Williams
  0 siblings, 1 reply; 13+ messages in thread
From: Dongsheng Yang @ 2024-01-25  6:49 UTC (permalink / raw)
  To: Dan Williams, dave, jonathan.cameron, ave.jiang,
	alison.schofield, vishal.l.verma, ira.weiny
  Cc: linux-cxl



在 2024/1/25 星期四 上午 11:49, Dan Williams 写道:
> Dongsheng Yang wrote:
>>
>>
>> 在 2024/1/4 星期四 上午 4:48, Dan Williams 写道:
>>> Dongsheng Yang wrote:
>>>> Hi all:
>>>> 	This patchset introduce cxlv module to allow user to
>>>> create virtual cxl device. it's based linux6.7-rc5, you can
>>>> get the code from https://github.com/DataTravelGuide/linux
>>>>
>>>> 	As the real CXL device is not widely available now, we need
>>>> some virtual cxl device to do uplayer software developing or
>>>> testing. Qemu is good for functional testing, but not good
>>>> for some performance testing.
>>>
>>> How is it performance testing if it's just using host-DRAM? Is the use
>>> case something like pinning the benchmark on Socket0 and target DRAM on
>>> Socket1 as emulated CXL to approximate CXL bus latency?
>>
>> Hi Dan,
>> 	I give an example as below, please check it inline.
>>>
>>>>
>>>> 	The new CXLV module allow user to use the reserved RAM[1], to
>>>> create virtual cxl device. When the cxlv module load, it will
>>>> create a directory named as "cxl_virt" under /sys/devices/virtual:
>>>>
>>>> 	"/sys/devices/virtual/cxl_virt/"
>>>>
>>>> that's the top level device for all cxlv devices.
>>>> At the same time, cxlv module will create a debugfs directory:
>>>>
>>>> /sys/kernel/debug/cxl/cxlv
>>>> ├── create
>>>> └── remove
>>>>
>>>> the create and remove debugfs file is the cxlv entry to create or remove
>>>> a cxlv device.
>>>>
>>>> 	Each cxlv device have its owned virtual pci related bridge and bus, cxlv
>>>> will create a new root_port for the new cxlv device, setup cxl ports for
>>>> dport and nvdimm-bridge. After that, we will add the virtual pci device,
>>>> that will go into the cxl_pci_probe to setup new memdev.
>>>>
>>>> 	Then we can see the cxl device with cxl list and use it as a real cxl
>>>> device.
>>>>
>>>>    $ echo "memstart=$((8*1024*1024*1024)),cxltype=3,pmem=1,memsize=$((2*1024*1024*1024))" > /sys/kernel/debug/cxl/cxlv/create
>>>
>>> Are these ranges reserved out of the mmap at boot time?
>>
>> Yes, it is reserved by memmap option in boot cmdline. I use memmap=8G$8G.
> 
> A faster way to get to a device-dax interface fronting reserved memory
> is to use the efi_fake_mem= command line option.
> 
> For example:
> 
>      efi_fake_mem=4G@13G:0x40000
> 
> ...assigns 4GB of System-RAM starting at the 13G physical offset with
> the EFI_MEMORY_SP attribute. By default the kernel creates device-dax
> devices for that dedicated memory.
> 
> For dax mapping performance testing you don't need any of the CXL
> driver infrastructure since the CXL driver has nothing to do with the
> data path.

Thanx for your information, I create cxlv because I think there could be 
some other use cases other than device-dax to use cxl memdev. In that 
way, we need emulate cxl memdev in cxl driver level.

If we always use cxl memdev by creating a region and creating a 
device-dax, I agree we dont need to emulate it in cxl driver level.

Thanx
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 0/4] cxl: introduce CXL Virtualization module
  2024-01-25  6:49       ` Dongsheng Yang
@ 2024-01-25  7:46         ` Dan Williams
  0 siblings, 0 replies; 13+ messages in thread
From: Dan Williams @ 2024-01-25  7:46 UTC (permalink / raw)
  To: Dongsheng Yang, Dan Williams, dave, jonathan.cameron, ave.jiang,
	alison.schofield, vishal.l.verma, ira.weiny
  Cc: linux-cxl

Dongsheng Yang wrote:
[..]
> > A faster way to get to a device-dax interface fronting reserved memory
> > is to use the efi_fake_mem= command line option.
> > 
> > For example:
> > 
> >      efi_fake_mem=4G@13G:0x40000
> > 
> > ...assigns 4GB of System-RAM starting at the 13G physical offset with
> > the EFI_MEMORY_SP attribute. By default the kernel creates device-dax
> > devices for that dedicated memory.
> > 
> > For dax mapping performance testing you don't need any of the CXL
> > driver infrastructure since the CXL driver has nothing to do with the
> > data path.
> 
> Thanx for your information, I create cxlv because I think there could be 
> some other use cases other than device-dax to use cxl memdev. In that 
> way, we need emulate cxl memdev in cxl driver level.
> 
> If we always use cxl memdev by creating a region and creating a 
> device-dax, I agree we dont need to emulate it in cxl driver level.

Upstream Linux has no apparent demand for cxlv as cxl_test, QEMU CXL,
and/or EFI_MEMORY_SP enumeration of device-dax already covers the need.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 0/4] cxl: introduce CXL Virtualization module
  2023-12-28  6:05 [RFC PATCH 0/4] cxl: introduce CXL Virtualization module Dongsheng Yang
                   ` (4 preceding siblings ...)
  2024-01-03 20:48 ` Dan Williams
@ 2024-05-03  5:12 ` Hyeongtak Ji
  5 siblings, 0 replies; 13+ messages in thread
From: Hyeongtak Ji @ 2024-05-03  5:12 UTC (permalink / raw)
  To: Dongsheng Yang
  Cc: linux-cxl, dave, jonathan.cameron, ave.jiang, alison.schofield,
	vishal.l.verma, ira.weiny, dan.j.williams

Hello Dongsheng,

Thank you for sharing this work!  I might be a little late but it would be
helpful if you can answer a few questions below.

On Thu, 28 Dec 2023 06:05:06 +0000 Dongsheng Yang <dongsheng.yang@easystack.cn> wrote:
> Hi all:
> 	This patchset introduce cxlv module to allow user to
> create virtual cxl device. it's based linux6.7-rc5, you can
> get the code from https://github.com/DataTravelGuide/linux
> 
> 	As the real CXL device is not widely available now, we need
> some virtual cxl device to do uplayer software developing or
> testing. Qemu is good for functional testing, but not good
> for some performance testing.
> 
> 	The new CXLV module allow user to use the reserved RAM[1], to
> create virtual cxl device. When the cxlv module load, it will
> create a directory named as "cxl_virt" under /sys/devices/virtual:
> 
> 	"/sys/devices/virtual/cxl_virt/"
> 
> that's the top level device for all cxlv devices.
> At the same time, cxlv module will create a debugfs directory:
> 
> /sys/kernel/debug/cxl/cxlv
> ├── create
> └── remove
> 
> the create and remove debugfs file is the cxlv entry to create or remove
> a cxlv device.
> 
> 	Each cxlv device have its owned virtual pci related bridge and bus, cxlv
> will create a new root_port for the new cxlv device, setup cxl ports for
> dport and nvdimm-bridge. After that, we will add the virtual pci device,
> that will go into the cxl_pci_probe to setup new memdev.
> 
> 	Then we can see the cxl device with cxl list and use it as a real cxl
> device.
> 
>  $ echo "memstart=$((8*1024*1024*1024)),cxltype=3,pmem=1,memsize=$((2*1024*1024*1024))" > /sys/kernel/debug/cxl/cxlv/create

I tried following your usage (w/ "memmap=8G$8G") but it does not seem to work
well. After creation I got logs like below:

  [   35.484764] PCI host bridge to bus 0010:00
  [   35.485015] pci_bus 0010:00: root bus resource [io  0x0000-0xffff]
  [   35.485446] pci_bus 0010:00: root bus resource [mem 0x00000000-0x7fffffffff]
  [   35.485817] pci_bus 0010:00: root bus resource [bus 00-ff]
  [   35.486126] pci 0010:00:00.0: [7c73:9a6c] type 01 class 0x060400
  [   35.486436] pci 0010:00:00.0: reg 0x10: [mem 0x200100000-0x2001fffff 64bit pref]
  [   35.486875] pci 0010:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
  [   35.487300] pci 0010:01:00.0: [7c73:9a6c] type 00 class 0x050210
  [   35.487745] pci 0010:01:00.0: reg 0x10: [mem 0x200000000-0x2001fffff 64bit pref]
  [   35.488171] pci 0010:00:00.0: PCI bridge to [bus 01-ff]
  [   35.488438] pci 0010:00:00.0:   bridge window [io  0x0000-0x0fff]
  [   35.488756] pci 0010:00:00.0:   bridge window [mem 0x00000000-0x000fffff]
  [   35.489101] pci 0010:00:00.0:   bridge window [mem 0x00000000-0x000fffff pref]
  [   35.489462] pci_bus 0010:01: busn_res: [bus 01-ff] end is updated to 01
  [   35.511966] pcieport 0010:00:00.0: enabling device (0000 -> 0003)
  [   35.512403] pci 0010:00:00.0: enabling device (0000 -> 0003)
  [   35.512738] cxl_pci 0010:01:00.0: enabling device (0000 -> 0002)
  [   35.517755] cxl_virt cxlv0: unsupported cmd: 0x301
  [   35.542026] cxl_virt cxlv0: unsupported cmd: 0x4500
  [   35.543738] cxl_virt cxlv0: unsupported cmd: 0x4500

Is it normal to get "unsupported cmd" here?

>  $ cxl list
> [
>   {
>     "memdev":"mem0",
>     "pmem_size":1879048192,
>     "serial":0,
>     "numa_node":0,
>     "host":"0010:01:00.0"
>   }
> ]

I got the exact same result against `cxl list`.

>  $ cxl create-region -m mem0 -d decoder0.0 -t pmem
> {
>   "region":"region0",
>   "resource":"0x210000000",
>   "size":"1792.00 MiB (1879.05 MB)",
>   "type":"pmem",
>   "interleave_ways":1,
>   "interleave_granularity":256,
>   "decode_state":"commit",
>   "mappings":[
>     {
>       "position":0,
>       "memdev":"mem0",
>       "decoder":"decoder2.0"
>     }
>   ]
> }
> cxl region: cmd_create_region: created 1 region

Instead of successful region creation, what I could find was

  $ cxl create-region -m mem0 -d decoder0.0 -t pmem
  cxl region: create_region: region0: failed to commit decode: No such device or address
  cxl region: cmd_create_region: created 0 regions

How can I follow your usage in the letter?

...snip...

Kind regards,
Hyeongtak

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-05-03  5:13 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-28  6:05 [RFC PATCH 0/4] cxl: introduce CXL Virtualization module Dongsheng Yang
2023-12-28  6:05 ` [RFC PATCH 1/4] cxl: move some function from acpi module to core module Dongsheng Yang
2023-12-28  6:43   ` Dongsheng Yang
2023-12-28  6:05 ` [RFC PATCH 3/4] cxl/port: introduce cxl_disable_port() function Dongsheng Yang
2023-12-28  6:05 ` [RFC PATCH 4/4] cxl: introduce CXL Virtualization module Dongsheng Yang
2024-01-03 17:22 ` [RFC PATCH 0/4] " Ira Weiny
2024-01-08 12:28   ` Jonathan Cameron
2024-01-10  2:07   ` Dongsheng Yang
2024-01-03 20:48 ` Dan Williams
     [not found]   ` <a32d859f-054f-11ca-e8a3-dff7a5234d0a@easystack.cn>
2024-01-25  3:49     ` Dan Williams
2024-01-25  6:49       ` Dongsheng Yang
2024-01-25  7:46         ` Dan Williams
2024-05-03  5:12 ` Hyeongtak Ji

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).