linux-cxl.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/4] Region Creation
@ 2021-06-10 18:57 Ben Widawsky
  2021-06-10 18:57 ` [RFC PATCH 1/4] cxl/region: Add region creation ABI Ben Widawsky
                   ` (5 more replies)
  0 siblings, 6 replies; 22+ messages in thread
From: Ben Widawsky @ 2021-06-10 18:57 UTC (permalink / raw)
  To: linux-cxl
  Cc: Ben Widawsky, Alison Schofield, Dan Williams, Ira Weiny,
	Jonathan Cameron, Vishal Verma

CXL interleave sets and non-interleave sets are described via regions. A region
is specified in the CXL 2.0 specification and the purpose is to create a
standardized way to preserve the region across reboots.

Introduced here is the basic mechanism to create and configure and delete a CXL
region. Configuring a region simply means giving it a size, offset within the
CFMWS window, UUID, and a target list. Enabling/activating a region, which
ultimately means programming the HDM decoders in the chain, is left for later
work.

The patches are only minimally tested so far in QEMU emulation and so x1
interleave is all that's supported.

Here is a sample topology (also in patch #4)

    decoder1.0
    ├── create_region
    ├── delete_region
    ├── devtype
    ├── locked
    ├── region1.0:0
    │   ├── offset
    │   ├── size
    │   ├── subsystem -> ../../../../../../../bus/cxl
    │   ├── target0
    │   ├── uevent
    │   ├── uuid
    │   └── verify
    ├── size
    ├── start
    ├── subsystem -> ../../../../../../bus/cxl
    ├── target_list
    ├── target_type
    └── uevent

Ben Widawsky (4):
  cxl/region: Add region creation ABI
  cxl/region: Create attribute structure / verify
  cxl: Move cxl_memdev conversion helper to mem.h
  cxl/region: Introduce concept of region configuration

 Documentation/ABI/testing/sysfs-bus-cxl       |  59 +++
 .../driver-api/cxl/memory-devices.rst         |   8 +
 drivers/cxl/Makefile                          |   2 +-
 drivers/cxl/core.c                            |  71 ++++
 drivers/cxl/cxl.h                             |  11 +
 drivers/cxl/mem.h                             |  26 ++
 drivers/cxl/pci.c                             |   5 -
 drivers/cxl/region.c                          | 400 ++++++++++++++++++
 8 files changed, 576 insertions(+), 6 deletions(-)
 create mode 100644 drivers/cxl/region.c

-- 
2.32.0


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC PATCH 1/4] cxl/region: Add region creation ABI
  2021-06-10 18:57 [RFC PATCH 0/4] Region Creation Ben Widawsky
@ 2021-06-10 18:57 ` Ben Widawsky
  2021-06-11 13:31   ` Jonathan Cameron
  2021-06-10 18:57 ` [RFC PATCH 2/4] cxl/region: Create attribute structure / verify Ben Widawsky
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Ben Widawsky @ 2021-06-10 18:57 UTC (permalink / raw)
  To: linux-cxl
  Cc: Ben Widawsky, Alison Schofield, Dan Williams, Ira Weiny,
	Jonathan Cameron, Vishal Verma

Regions are created as a child of the decoder that encompasses an
address space with constraints. Regions only exist for persistent
capacities.

As an example, creating a region with:
echo 1 > /sys/bus/cxl/devices/decoder1.0/create_region

Will yield /sys/bus/cxl/devices/decoder1.0/region1.0:0

That region may then be deleted with:
echo "region1.0:0 > /sys/bus/cxl/devices/decoder1.0/delete_region

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
---
 Documentation/ABI/testing/sysfs-bus-cxl       |  19 +++
 .../driver-api/cxl/memory-devices.rst         |   8 +
 drivers/cxl/Makefile                          |   2 +-
 drivers/cxl/core.c                            |  71 ++++++++
 drivers/cxl/cxl.h                             |  11 ++
 drivers/cxl/mem.h                             |  19 +++
 drivers/cxl/region.c                          | 155 ++++++++++++++++++
 7 files changed, 284 insertions(+), 1 deletion(-)
 create mode 100644 drivers/cxl/region.c

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 0b6a2e6e8fbb..5bcbefd4ea38 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -127,3 +127,22 @@ Description:
 		memory (type-3). The 'target_type' attribute indicates the
 		current setting which may dynamically change based on what
 		memory regions are activated in this decode hierarchy.
+
+What:		/sys/bus/cxl/devices/decoderX.Y/create_region
+Date:		June, 2021
+KernelVersion:	v5.14
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		Writing a value of '1' will create a new uninitialized region
+		that will be mapped by the CXL decoderX.Y. Reading from this
+		node will return the last created region. Regions must be
+		subsequently configured and bound to a region driver before they
+		can be used.
+
+What:		/sys/bus/cxl/devices/decoderX.Y/delete_region
+Date:		June, 2021
+KernelVersion:	v5.14
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		Deletes the named region. A region must be unbound from the
+		region driver before being deleted.
diff --git a/Documentation/driver-api/cxl/memory-devices.rst b/Documentation/driver-api/cxl/memory-devices.rst
index 487ce4f41d77..3066638b087f 100644
--- a/Documentation/driver-api/cxl/memory-devices.rst
+++ b/Documentation/driver-api/cxl/memory-devices.rst
@@ -39,6 +39,14 @@ CXL Core
 .. kernel-doc:: drivers/cxl/core.c
    :doc: cxl core
 
+CXL Regions
+-----------
+.. kernel-doc:: drivers/cxl/region.c
+   :doc: cxl region
+
+.. kernel-doc:: drivers/cxl/region.c
+   :identifiers:
+
 External Interfaces
 ===================
 
diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile
index a29efb3e8ad2..2b9dd9187788 100644
--- a/drivers/cxl/Makefile
+++ b/drivers/cxl/Makefile
@@ -4,6 +4,6 @@ obj-$(CONFIG_CXL_MEM) += cxl_pci.o
 obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o
 
 ccflags-y += -DDEFAULT_SYMBOL_NAMESPACE=CXL
-cxl_core-y := core.o
+cxl_core-y := core.o region.o
 cxl_pci-y := pci.o
 cxl_acpi-y := acpi.o
diff --git a/drivers/cxl/core.c b/drivers/cxl/core.c
index 1b9ee0b08384..cda09a9cd98e 100644
--- a/drivers/cxl/core.c
+++ b/drivers/cxl/core.c
@@ -7,6 +7,7 @@
 #include <linux/slab.h>
 #include <linux/idr.h>
 #include "cxl.h"
+#include "mem.h"
 
 /**
  * DOC: cxl core
@@ -119,7 +120,68 @@ static ssize_t target_list_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(target_list);
 
+static ssize_t create_region_show(struct device *dev,
+				  struct device_attribute *attr, char *buf)
+{
+	struct cxl_decoder *cxld = to_cxl_decoder(dev);
+	int rc;
+
+	device_lock(dev);
+	rc = sprintf(buf, "%s\n",
+		     cxld->youngest ? dev_name(&cxld->youngest->dev) : "");
+	device_unlock(dev);
+
+	return rc;
+}
+
+static ssize_t create_region_store(struct device *dev,
+				   struct device_attribute *attr,
+				   const char *buf, size_t len)
+{
+	struct cxl_decoder *cxld = to_cxl_decoder(dev);
+	struct cxl_region *region;
+	ssize_t rc;
+	int val;
+
+	rc = kstrtoint(buf, 0, &val);
+	if (rc)
+		return rc;
+	if (val != 1)
+		return -EINVAL;
+
+	region = cxl_alloc_region(cxld);
+	if (IS_ERR(region))
+		return PTR_ERR(region);
+
+	rc = cxl_add_region(cxld, region);
+	if (rc) {
+		cxl_free_region(cxld, region);
+		return rc;
+	}
+
+	cxld->youngest = region;
+	return len;
+}
+static DEVICE_ATTR_RW(create_region);
+
+static ssize_t delete_region_store(struct device *dev,
+				   struct device_attribute *attr,
+				   const char *buf, size_t len)
+{
+	struct cxl_decoder *cxld = to_cxl_decoder(dev);
+	int rc;
+
+	rc = cxl_delete_region(cxld, buf);
+	if (rc)
+		return rc;
+
+	return len;
+}
+static DEVICE_ATTR_WO(delete_region);
+
 static struct attribute *cxl_decoder_base_attrs[] = {
+	&dev_attr_create_region.attr,
+	&dev_attr_delete_region.attr,
 	&dev_attr_start.attr,
 	&dev_attr_size.attr,
 	&dev_attr_locked.attr,
@@ -170,7 +232,13 @@ static void cxl_decoder_release(struct device *dev)
 {
 	struct cxl_decoder *cxld = to_cxl_decoder(dev);
 	struct cxl_port *port = to_cxl_port(dev->parent);
+	struct cxl_region *region;
 
+	list_for_each_entry(region, &cxld->regions, list) {
+		cxl_delete_region(cxld, dev_name(&region->dev));
+	}
+	dev_WARN_ONCE(dev, !ida_is_empty(&cxld->region_ida),
+		      "Lost track of a region");
 	ida_free(&port->decoder_ida, cxld->id);
 	kfree(cxld);
 }
@@ -475,8 +543,11 @@ cxl_decoder_alloc(struct cxl_port *port, int nr_targets, resource_size_t base,
 		.interleave_ways = interleave_ways,
 		.interleave_granularity = interleave_granularity,
 		.target_type = type,
+		.regions = LIST_HEAD_INIT(cxld->regions),
 	};
 
+	ida_init(&cxld->region_ida);
+
 	/* handle implied target_list */
 	if (interleave_ways == 1)
 		cxld->target[0] =
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index b988ea288f53..1ffc5e07e24d 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -7,6 +7,7 @@
 #include <linux/bitfield.h>
 #include <linux/bitops.h>
 #include <linux/io.h>
+#include <linux/uuid.h>
 
 /**
  * DOC: cxl objects
@@ -182,6 +183,9 @@ enum cxl_decoder_type {
  * @interleave_granularity: data stride per dport
  * @target_type: accelerator vs expander (type2 vs type3) selector
  * @flags: memory type capabilities and locking
+ * @region_ida: allocator for region ids.
+ * @regions: List of regions mapped (may be disabled) by this decoder.
+ * @youngest: Last region created for this decoder.
  * @target: active ordered target list in current decoder configuration
  */
 struct cxl_decoder {
@@ -192,6 +196,9 @@ struct cxl_decoder {
 	int interleave_granularity;
 	enum cxl_decoder_type target_type;
 	unsigned long flags;
+	struct ida region_ida;
+	struct list_head regions;
+	struct cxl_region *youngest;
 	struct cxl_dport *target[];
 };
 
@@ -231,6 +238,10 @@ struct cxl_dport {
 	struct list_head list;
 };
 
+struct cxl_region *cxl_alloc_region(struct cxl_decoder *cxld);
+void cxl_free_region(struct cxl_decoder *cxld, struct cxl_region *region);
+int cxl_add_region(struct cxl_decoder *cxld, struct cxl_region *region);
+int cxl_delete_region(struct cxl_decoder *cxld, const char *region);
 struct cxl_port *to_cxl_port(struct device *dev);
 struct cxl_port *devm_cxl_add_port(struct device *host, struct device *uport,
 				   resource_size_t component_reg_phys,
diff --git a/drivers/cxl/mem.h b/drivers/cxl/mem.h
index 13868ff7cadf..4c4eb06a966b 100644
--- a/drivers/cxl/mem.h
+++ b/drivers/cxl/mem.h
@@ -3,6 +3,8 @@
 #ifndef __CXL_MEM_H__
 #define __CXL_MEM_H__
 
+#include <linux/cdev.h>
+
 /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
 #define CXLMDEV_STATUS_OFFSET 0x0
 #define   CXLMDEV_DEV_FATAL BIT(0)
@@ -46,6 +48,23 @@ struct cxl_memdev {
 	int id;
 };
 
+/**
+ * struct cxl_region - CXL region
+ * @dev: This region's device.
+ * @id: This regions id. Id is globally unique across all regions.
+ * @res: Address space consumed by this region.
+ * @list: Node in decoders region list.
+ * @targets: The memory devices comprising the region.
+ */
+struct cxl_region {
+	struct device dev;
+	int id;
+	struct resource res;
+	struct list_head list;
+	struct cxl_memdev *targets[];
+};
+
+
 /**
  * struct cxl_mem - A CXL memory device
  * @pdev: The PCI device associated with this CXL device.
diff --git a/drivers/cxl/region.c b/drivers/cxl/region.c
new file mode 100644
index 000000000000..1f47bc17bd50
--- /dev/null
+++ b/drivers/cxl/region.c
@@ -0,0 +1,155 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2021 Intel Corporation. All rights reserved. */
+#include <linux/io-64-nonatomic-lo-hi.h>
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/idr.h>
+#include "cxl.h"
+#include "mem.h"
+
+/**
+ * DOC: cxl region
+ *
+ * A CXL region encompasses a chunk of host physical address space that may be
+ * consumed by a single device (x1 interleave aka linear) or across multiple
+ * devices (xN interleaved). A region is a child device of a &struct
+ * cxl_decoder. There may be multiple active regions under a single &struct
+ * cxl_decoder. The common case for multiple regions would be several linear,
+ * contiguous regions under a single decoder. Generally, there will be a 1:1
+ * relationship between decoder and region when the region is interleaved.
+ */
+
+static void cxl_region_release(struct device *dev);
+
+static const struct device_type cxl_region_type = {
+	.name = "cxl_region",
+	.release = cxl_region_release,
+};
+
+static struct cxl_region *to_cxl_region(struct device *dev)
+{
+	if (dev_WARN_ONCE(dev, dev->type != &cxl_region_type,
+			  "not a cxl_region device\n"))
+		return NULL;
+
+	return container_of(dev, struct cxl_region, dev);
+}
+
+void cxl_free_region(struct cxl_decoder *cxld, struct cxl_region *region)
+{
+	ida_free(&cxld->region_ida, region->id);
+	kfree(region->targets);
+	kfree(region);
+}
+
+static void cxl_region_release(struct device *dev)
+{
+	struct cxl_decoder *cxld = to_cxl_decoder(dev->parent);
+	struct cxl_region *region;
+
+	region = to_cxl_region(dev);
+	cxl_free_region(cxld, region);
+}
+
+struct cxl_region *cxl_alloc_region(struct cxl_decoder *cxld)
+{
+	struct cxl_region *region;
+	int rc;
+
+	region = kzalloc(struct_size(region, targets, cxld->interleave_ways),
+			 GFP_KERNEL);
+	if (!region) {
+		pr_err("No memory\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	rc = ida_alloc(&cxld->region_ida, GFP_KERNEL);
+	if (rc < 0) {
+		dev_err(&cxld->dev, "Couldn't get a new id\n");
+		kfree(region);
+		return ERR_PTR(rc);
+	}
+	region->id = rc;
+
+	return region;
+}
+
+/**
+ * cxl_add_region - Adds a region to a decoder
+ * @cxld: Parent decoder.
+ * @region: Region to be added to the decoder.
+ *
+ * This is the second step of region initialization. Regions exist within an
+ * address space which is mapped by a @cxld, and that @cxld enforces constraints
+ * upon the region as it is configured. Regions may be added to a @cxld but not
+ * activated and therefore it is possible to have more regions in a @cxld than
+ * there are interleave ways in the @cxld. Regions exist only for persistent
+ * capacities.
+ *
+ * Return: zero if the region was added to the @cxld, else returns negative
+ * error code.
+ */
+int cxl_add_region(struct cxl_decoder *cxld, struct cxl_region *region)
+{
+	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
+	struct device *dev = &region->dev;
+	struct resource *res;
+	int rc;
+
+	device_initialize(dev);
+	dev->parent = &cxld->dev;
+	dev->bus = &cxl_bus_type;
+	dev->type = &cxl_region_type;
+	rc = dev_set_name(dev, "region%d.%d:%d", port->id, cxld->id, region->id);
+	if (rc)
+		goto err;
+
+	rc = device_add(dev);
+	if (rc)
+		goto err;
+
+	res = &region->res;
+	res->name = "Persistent Memory";
+	res->start = 0;
+	res->end = -1;
+	res->flags = IORESOURCE_MEM;
+	res->desc = IORES_DESC_PERSISTENT_MEMORY;
+
+	dev_dbg(dev, "Added %s to %s\n", dev_name(dev), dev_name(&cxld->dev));
+
+	return 0;
+
+err:
+	put_device(dev);
+	return rc;
+}
+
+static struct cxl_region *
+cxl_find_region_by_name(struct cxl_decoder *cxld, const char *name)
+{
+	struct device *region_dev;
+
+	region_dev = device_find_child_by_name(&cxld->dev, name);
+	if (!region_dev)
+		return ERR_PTR(-ENOENT);
+
+	return to_cxl_region(region_dev);
+}
+
+int cxl_delete_region(struct cxl_decoder *cxld, const char *region_name)
+{
+	struct cxl_region *region;
+
+	region = cxl_find_region_by_name(cxld, region_name);
+	if (IS_ERR(region))
+		return PTR_ERR(region);
+
+	dev_dbg(&cxld->dev, "Requested removal of %s from %s\n",
+		dev_name(&region->dev), dev_name(&cxld->dev));
+
+	device_unregister(&region->dev);
+	put_device(&region->dev);
+
+	return 0;
+}
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH 2/4] cxl/region: Create attribute structure / verify
  2021-06-10 18:57 [RFC PATCH 0/4] Region Creation Ben Widawsky
  2021-06-10 18:57 ` [RFC PATCH 1/4] cxl/region: Add region creation ABI Ben Widawsky
@ 2021-06-10 18:57 ` Ben Widawsky
  2021-06-11 13:37   ` Jonathan Cameron
  2021-06-12  0:59   ` Dan Williams
  2021-06-10 18:57 ` [RFC PATCH 3/4] cxl: Move cxl_memdev conversion helper to mem.h Ben Widawsky
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 22+ messages in thread
From: Ben Widawsky @ 2021-06-10 18:57 UTC (permalink / raw)
  To: linux-cxl
  Cc: Ben Widawsky, Alison Schofield, Dan Williams, Ira Weiny,
	Jonathan Cameron, Vishal Verma

Introduce a verification mechanism for a region. Regions have complex
configuration requirements and it is beneficial to provide a way to
verify the constraints are met before trying to bind. Primarily it adds
ABI to inform userspace of more detailed information about what failed
rather than the limited choices of errno at bind time.

It's important to point out that a verified region can still fail to
bind, but the first step in binding will be to run the same verification
algorithm.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>

--

Functionally it might make sense to squash this patch in with other
patches adding attributes. From a discussion standpoint however, it's
nice to have this broken out as I suspect there might be some debate
about it.
---
 Documentation/ABI/testing/sysfs-bus-cxl | 13 +++++++++++++
 drivers/cxl/region.c                    | 22 +++++++++++++++++++++-
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 5bcbefd4ea38..699c8514fd7b 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -146,3 +146,16 @@ Contact:	linux-cxl@vger.kernel.org
 Description:
 		Deletes the named region. A region must be unbound from the
 		region driver before being deleted.
+
+What:		/sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/verify
+Date:		June, 2021
+KernelVersion:	v5.14
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		Instructs the kernel to verify that the regionX.Y:Z is properly
+		configured and provide more detailed information about
+		configuration errors. A value of 0 indicates the region is
+		properly configured and ready to bind, otherwise a negative
+		integer is returned describing the first error found in the
+		configuration. A verified region can still fail binding due to
+		lack of resources.
diff --git a/drivers/cxl/region.c b/drivers/cxl/region.c
index 1f47bc17bd50..ea1ac848c713 100644
--- a/drivers/cxl/region.c
+++ b/drivers/cxl/region.c
@@ -20,11 +20,31 @@
  * relationship between decoder and region when the region is interleaved.
  */
 
-static void cxl_region_release(struct device *dev);
+static ssize_t verify_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "0");
+}
+
+static DEVICE_ATTR_RO(verify);
+
+static struct attribute *region_attrs[] = {
+	&dev_attr_verify.attr,
+	NULL,
+};
 
+static const struct attribute_group region_group = {
+	.attrs = region_attrs,
+};
+
+static const struct attribute_group *region_groups[] = {
+	&region_group,
+};
+
+static void cxl_region_release(struct device *dev);
 static const struct device_type cxl_region_type = {
 	.name = "cxl_region",
 	.release = cxl_region_release,
+	.groups = region_groups,
 };
 
 static struct cxl_region *to_cxl_region(struct device *dev)
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH 3/4] cxl: Move cxl_memdev conversion helper to mem.h
  2021-06-10 18:57 [RFC PATCH 0/4] Region Creation Ben Widawsky
  2021-06-10 18:57 ` [RFC PATCH 1/4] cxl/region: Add region creation ABI Ben Widawsky
  2021-06-10 18:57 ` [RFC PATCH 2/4] cxl/region: Create attribute structure / verify Ben Widawsky
@ 2021-06-10 18:57 ` Ben Widawsky
  2021-06-10 18:57 ` [RFC PATCH 4/4] cxl/region: Introduce concept of region configuration Ben Widawsky
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Ben Widawsky @ 2021-06-10 18:57 UTC (permalink / raw)
  To: linux-cxl
  Cc: Ben Widawsky, Alison Schofield, Dan Williams, Ira Weiny,
	Jonathan Cameron, Vishal Verma

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
---
 drivers/cxl/mem.h | 5 +++++
 drivers/cxl/pci.c | 5 -----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/cxl/mem.h b/drivers/cxl/mem.h
index 4c4eb06a966b..9795aa924035 100644
--- a/drivers/cxl/mem.h
+++ b/drivers/cxl/mem.h
@@ -48,6 +48,11 @@ struct cxl_memdev {
 	int id;
 };
 
+static inline struct cxl_memdev *to_cxl_memdev(struct device *dev)
+{
+       return container_of(dev, struct cxl_memdev, dev);
+}
+
 /**
  * struct cxl_region - CXL region
  * @dev: This region's device.
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 5a1705b52278..be158f8f995e 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -1142,11 +1142,6 @@ static int cxl_mem_setup_regs(struct cxl_mem *cxlm)
 	return ret;
 }
 
-static struct cxl_memdev *to_cxl_memdev(struct device *dev)
-{
-	return container_of(dev, struct cxl_memdev, dev);
-}
-
 static void cxl_memdev_release(struct device *dev)
 {
 	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH 4/4] cxl/region: Introduce concept of region configuration
  2021-06-10 18:57 [RFC PATCH 0/4] Region Creation Ben Widawsky
                   ` (2 preceding siblings ...)
  2021-06-10 18:57 ` [RFC PATCH 3/4] cxl: Move cxl_memdev conversion helper to mem.h Ben Widawsky
@ 2021-06-10 18:57 ` Ben Widawsky
  2021-06-11 13:52   ` Jonathan Cameron
  2021-06-11 13:11 ` [RFC PATCH 0/4] Region Creation Jonathan Cameron
  2021-06-12  0:44 ` Dan Williams
  5 siblings, 1 reply; 22+ messages in thread
From: Ben Widawsky @ 2021-06-10 18:57 UTC (permalink / raw)
  To: linux-cxl
  Cc: Ben Widawsky, Alison Schofield, Dan Williams, Ira Weiny,
	Jonathan Cameron, Vishal Verma

The region creation APIs leave a region unconfigured. Configuring the
region will work in the same way as similar subsystems such as devdax.
Sysfs attrs will be provided to allow userspace to configure the region.
Finally once all configuration is complete, userspace may "commit" the
config. What the kernel decides to do after a config is committed is out
of scope at this point.

Introduced here are the most basic attributes needed to configure a
region.

A x1 interleave example is provided below:

decoder1.0
├── create_region
├── delete_region
├── devtype
├── locked
├── region1.0:0
│   ├── offset
│   ├── size
│   ├── subsystem -> ../../../../../../../bus/cxl
│   ├── target0
│   ├── uevent
│   ├── uuid
│   └── verify
├── size
├── start
├── subsystem -> ../../../../../../bus/cxl
├── target_list
├── target_type
└── uevent

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
---
 Documentation/ABI/testing/sysfs-bus-cxl |  27 +++
 drivers/cxl/mem.h                       |   2 +
 drivers/cxl/region.c                    | 227 +++++++++++++++++++++++-
 3 files changed, 255 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 699c8514fd7b..d7174a84f70d 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -159,3 +159,30 @@ Description:
 		integer is returned describing the first error found in the
 		configuration. A verified region can still fail binding due to
 		lack of resources.
+
+What:		/sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/{offset,size}
+Date:		June, 2021
+KernelVersion:	v5.14
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		A region resides within an address space that is claimed by a
+		decoder. The region will be of some size within the address
+		space and at some offset that must also reside within the
+		address space. The size and position of the region is specified
+		by these attributes.
+
+What:		/sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/uuid
+Date:		June, 2021
+KernelVersion:	v5.14
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		The unique identifier for the region.
+
+What:		/sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/target[0-15]
+Date:		June, 2021
+KernelVersion:	v5.14
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		Memory devices are the backing storage for a region. Each target
+		must be populated with a memdev in order for the region to be
+		eligible to be activated.
diff --git a/drivers/cxl/mem.h b/drivers/cxl/mem.h
index 9795aa924035..059fbf084fa1 100644
--- a/drivers/cxl/mem.h
+++ b/drivers/cxl/mem.h
@@ -58,6 +58,7 @@ static inline struct cxl_memdev *to_cxl_memdev(struct device *dev)
  * @dev: This region's device.
  * @id: This regions id. Id is globally unique across all regions.
  * @res: Address space consumed by this region.
+ * @uuid: The UUID for this region.
  * @list: Node in decoders region list.
  * @targets: The memory devices comprising the region.
  */
@@ -65,6 +66,7 @@ struct cxl_region {
 	struct device dev;
 	int id;
 	struct resource res;
+	uuid_t uuid;
 	struct list_head list;
 	struct cxl_memdev *targets[];
 };
diff --git a/drivers/cxl/region.c b/drivers/cxl/region.c
index ea1ac848c713..a69ee00514cb 100644
--- a/drivers/cxl/region.c
+++ b/drivers/cxl/region.c
@@ -3,7 +3,9 @@
 #include <linux/io-64-nonatomic-lo-hi.h>
 #include <linux/device.h>
 #include <linux/module.h>
+#include <linux/sizes.h>
 #include <linux/slab.h>
+#include <linux/uuid.h>
 #include <linux/idr.h>
 #include "cxl.h"
 #include "mem.h"
@@ -20,15 +22,130 @@
  * relationship between decoder and region when the region is interleaved.
  */
 
+static struct cxl_region *to_cxl_region(struct device *dev);
+
+#define cxl_region_ways(region)                                                \
+	to_cxl_decoder((region)->dev.parent)->interleave_ways
+
 static ssize_t verify_show(struct device *dev, struct device_attribute *attr, char *buf)
 {
+	struct cxl_decoder *cxld = to_cxl_decoder(dev->parent);
+	struct cxl_region *region = to_cxl_region(dev);
+	struct resource decode_res;
+	int i;
+
+	decode_res = (struct resource)DEFINE_RES_MEM(cxld->range.start,
+						     range_len(&cxld->range));
+
+	/* Invalid region size */
+	if (!resource_contains(&decode_res, &region->res))
+		return sysfs_emit(buf, "size");
+
+	if (resource_size(&region->res) % (SZ_256M * cxld->interleave_ways))
+		return sysfs_emit(buf, "alignment");
+
+	/* Missing target memory device */
+	for (i = 0; i < cxld->interleave_ways; i++)
+		if (!region->targets[i])
+			return sysfs_emit(buf, "memdev");
+
 	return sysfs_emit(buf, "0");
 }
 
 static DEVICE_ATTR_RO(verify);
 
+static bool is_region_active(struct cxl_region *region)
+{
+	/* TODO: Regions can't be activated yet. */
+	return false;
+}
+
+static ssize_t offset_show(struct device *dev, struct device_attribute *attr,
+			   char *buf)
+{
+	struct cxl_region *region = to_cxl_region(dev);
+
+	return sysfs_emit(buf, "%#llx\n", region->res.start);
+}
+
+static ssize_t offset_store(struct device *dev, struct device_attribute *attr,
+			    const char *buf, size_t len)
+{
+	struct cxl_region *region = to_cxl_region(dev);
+	unsigned long long val;
+	ssize_t rc;
+
+	rc = kstrtoull(buf, 0, &val);
+	if (rc)
+		return rc;
+
+	if (is_region_active(region)) {
+		/* TODO: */
+	} else {
+		region->res.start = val;
+	}
+
+	return len;
+}
+
+static DEVICE_ATTR_RW(offset);
+
+static ssize_t size_show(struct device *dev, struct device_attribute *attr,
+			 char *buf)
+{
+	struct cxl_region *region = to_cxl_region(dev);
+
+	return sysfs_emit(buf, "%llu\n", resource_size(&region->res));
+}
+
+static ssize_t size_store(struct device *dev, struct device_attribute *attr,
+			  const char *buf, size_t len)
+{
+	struct cxl_region *region = to_cxl_region(dev);
+	unsigned long long val;
+	ssize_t rc;
+
+	rc = kstrtoull(buf, 0, &val);
+	if (rc)
+		return rc;
+
+	if (is_region_active(region)) {
+		/* TODO: */
+	} else {
+		region->res.end = region->res.start + val - 1;
+	}
+
+	return len;
+}
+static DEVICE_ATTR_RW(size);
+
+static ssize_t uuid_show(struct device *dev, struct device_attribute *attr,
+			 char *buf)
+{
+	struct cxl_region *region = to_cxl_region(dev);
+
+	return sysfs_emit(buf, "%pUb\n", &region->uuid);
+}
+static ssize_t uuid_store(struct device *dev, struct device_attribute *attr,
+			  const char *buf, size_t len)
+{
+	struct cxl_region *region = to_cxl_region(dev);
+	ssize_t rc;
+
+	if (len != UUID_STRING_LEN + 1)
+		return -EINVAL;
+
+	rc = uuid_parse(buf, &region->uuid);
+
+	return rc ? rc : len;
+}
+static DEVICE_ATTR_RW(uuid);
+
 static struct attribute *region_attrs[] = {
 	&dev_attr_verify.attr,
+	&dev_attr_offset.attr,
+	&dev_attr_size.attr,
+	&dev_attr_uuid.attr,
 	NULL,
 };
 
@@ -36,8 +153,111 @@ static const struct attribute_group region_group = {
 	.attrs = region_attrs,
 };
 
+static size_t show_targetN(struct cxl_region *region, char *buf, int n)
+{
+	if (region->targets[n])
+		return sysfs_emit(buf, "%s\n", dev_name(&region->targets[n]->dev));
+	else
+		return sysfs_emit(buf, "nil\n");
+}
+
+static size_t set_targetN(struct cxl_region *region, const char *buf, int n, size_t len)
+{
+	struct device *memdev_dev;
+	struct cxl_memdev *cxlmd;
+	ssize_t rc;
+	int val;
+
+	rc = kstrtoint(buf, 0, &val);
+	if (!rc && val == 0) {
+		cxlmd = region->targets[n] = cxlmd;
+		if (cxlmd)
+			put_device(&cxlmd->dev);
+		region->targets[n] = NULL;
+		return len;
+	}
+
+	memdev_dev = bus_find_device_by_name(&cxl_bus_type, NULL, buf);
+	if (!memdev_dev)
+		return -ENOENT;
+
+	cxlmd = to_cxl_memdev(memdev_dev);
+	get_device(&cxlmd->dev);
+	region->targets[n] = cxlmd;
+
+	return len;
+}
+
+#define TARGET_ATTR_RW(n)                                                      \
+	static ssize_t target##n##_show(                                       \
+		struct device *dev, struct device_attribute *attr, char *buf)  \
+	{                                                                      \
+		return show_targetN(to_cxl_region(dev), buf, n);               \
+	}                                                                      \
+	static ssize_t target##n##_store(struct device *dev,                   \
+					 struct device_attribute *attr,        \
+					 const char *buf, size_t len)          \
+	{                                                                      \
+		return set_targetN(to_cxl_region(dev), buf, n, len);           \
+	}                                                                      \
+	static DEVICE_ATTR_RW(target##n)
+
+TARGET_ATTR_RW(0);
+TARGET_ATTR_RW(1);
+TARGET_ATTR_RW(2);
+TARGET_ATTR_RW(3);
+TARGET_ATTR_RW(4);
+TARGET_ATTR_RW(5);
+TARGET_ATTR_RW(6);
+TARGET_ATTR_RW(7);
+TARGET_ATTR_RW(8);
+TARGET_ATTR_RW(9);
+TARGET_ATTR_RW(10);
+TARGET_ATTR_RW(11);
+TARGET_ATTR_RW(12);
+TARGET_ATTR_RW(13);
+TARGET_ATTR_RW(14);
+TARGET_ATTR_RW(15);
+
+static struct attribute *interleave_attrs[] = {
+	&dev_attr_target0.attr,
+	&dev_attr_target1.attr,
+	&dev_attr_target2.attr,
+	&dev_attr_target3.attr,
+	&dev_attr_target4.attr,
+	&dev_attr_target5.attr,
+	&dev_attr_target6.attr,
+	&dev_attr_target7.attr,
+	&dev_attr_target8.attr,
+	&dev_attr_target9.attr,
+	&dev_attr_target10.attr,
+	&dev_attr_target11.attr,
+	&dev_attr_target12.attr,
+	&dev_attr_target13.attr,
+	&dev_attr_target14.attr,
+	&dev_attr_target15.attr,
+	NULL,
+};
+
+static umode_t visible_targets(struct kobject *kobj, struct attribute *a, int n)
+{
+	struct device *dev = container_of(kobj, struct device, kobj);
+	struct cxl_region *region = to_cxl_region(dev);
+
+	if (n < cxl_region_ways(region))
+		return a->mode;
+	return 0;
+}
+
+static const struct attribute_group region_interleave_group = {
+	.attrs = interleave_attrs,
+	.is_visible = visible_targets,
+};
+
 static const struct attribute_group *region_groups[] = {
 	&region_group,
+	&region_interleave_group,
+	NULL,
 };
 
 static void cxl_region_release(struct device *dev);
@@ -58,8 +278,13 @@ static struct cxl_region *to_cxl_region(struct device *dev)
 
 void cxl_free_region(struct cxl_decoder *cxld, struct cxl_region *region)
 {
+	int i;
+
 	ida_free(&cxld->region_ida, region->id);
-	kfree(region->targets);
+	for (i = 0; i < cxld->interleave_ways; i++) {
+		if (region->targets[i])
+			put_device(&region->targets[i]->dev);
+	}
 	kfree(region);
 }
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/4] Region Creation
  2021-06-10 18:57 [RFC PATCH 0/4] Region Creation Ben Widawsky
                   ` (3 preceding siblings ...)
  2021-06-10 18:57 ` [RFC PATCH 4/4] cxl/region: Introduce concept of region configuration Ben Widawsky
@ 2021-06-11 13:11 ` Jonathan Cameron
  2021-06-11 13:53   ` Jonathan Cameron
  2021-06-12  0:44 ` Dan Williams
  5 siblings, 1 reply; 22+ messages in thread
From: Jonathan Cameron @ 2021-06-11 13:11 UTC (permalink / raw)
  To: Ben Widawsky
  Cc: linux-cxl, Alison Schofield, Dan Williams, Ira Weiny, Vishal Verma

On Thu, 10 Jun 2021 11:57:21 -0700
Ben Widawsky <ben.widawsky@intel.com> wrote:

> CXL interleave sets and non-interleave sets are described via regions. A region
> is specified in the CXL 2.0 specification and the purpose is to create a
> standardized way to preserve the region across reboots.

A specific section reference would be helpful.

> 
> Introduced here is the basic mechanism to create and configure and delete a CXL
> region. Configuring a region simply means giving it a size, offset within the
> CFMWS window, UUID, and a target list. Enabling/activating a region, which
> ultimately means programming the HDM decoders in the chain, is left for later
> work.
> 
> The patches are only minimally tested so far in QEMU emulation and so x1
> interleave is all that's supported.

I'm guessing this is why it's an RFC rather than a final submission?

If you can call out the RFC reasons in a cover letter it is helpful
as saves people wondering what specifically you want comments on.

> 
> Here is a sample topology (also in patch #4)
> 
>     decoder1.0
>     ├── create_region
>     ├── delete_region
>     ├── devtype
>     ├── locked
>     ├── region1.0:0
>     │   ├── offset
>     │   ├── size
>     │   ├── subsystem -> ../../../../../../../bus/cxl
>     │   ├── target0
>     │   ├── uevent
>     │   ├── uuid
>     │   └── verify
>     ├── size
>     ├── start
>     ├── subsystem -> ../../../../../../bus/cxl
>     ├── target_list
>     ├── target_type
>     └── uevent
> 
> Ben Widawsky (4):
>   cxl/region: Add region creation ABI
>   cxl/region: Create attribute structure / verify
>   cxl: Move cxl_memdev conversion helper to mem.h
>   cxl/region: Introduce concept of region configuration
> 
>  Documentation/ABI/testing/sysfs-bus-cxl       |  59 +++
>  .../driver-api/cxl/memory-devices.rst         |   8 +
>  drivers/cxl/Makefile                          |   2 +-
>  drivers/cxl/core.c                            |  71 ++++
>  drivers/cxl/cxl.h                             |  11 +
>  drivers/cxl/mem.h                             |  26 ++
>  drivers/cxl/pci.c                             |   5 -
>  drivers/cxl/region.c                          | 400 ++++++++++++++++++
>  8 files changed, 576 insertions(+), 6 deletions(-)
>  create mode 100644 drivers/cxl/region.c
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/4] cxl/region: Add region creation ABI
  2021-06-10 18:57 ` [RFC PATCH 1/4] cxl/region: Add region creation ABI Ben Widawsky
@ 2021-06-11 13:31   ` Jonathan Cameron
  2021-06-16 17:38     ` Ben Widawsky
  0 siblings, 1 reply; 22+ messages in thread
From: Jonathan Cameron @ 2021-06-11 13:31 UTC (permalink / raw)
  To: Ben Widawsky
  Cc: linux-cxl, Alison Schofield, Dan Williams, Ira Weiny, Vishal Verma

On Thu, 10 Jun 2021 11:57:22 -0700
Ben Widawsky <ben.widawsky@intel.com> wrote:

> Regions are created as a child of the decoder that encompasses an
> address space with constraints. Regions only exist for persistent
> capacities.
> 
> As an example, creating a region with:
> echo 1 > /sys/bus/cxl/devices/decoder1.0/create_region
> 
> Will yield /sys/bus/cxl/devices/decoder1.0/region1.0:0
> 
> That region may then be deleted with:
> echo "region1.0:0 > /sys/bus/cxl/devices/decoder1.0/delete_region
> 
> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Minor stuff inline.

> ---
>  Documentation/ABI/testing/sysfs-bus-cxl       |  19 +++
>  .../driver-api/cxl/memory-devices.rst         |   8 +
>  drivers/cxl/Makefile                          |   2 +-
>  drivers/cxl/core.c                            |  71 ++++++++
>  drivers/cxl/cxl.h                             |  11 ++
>  drivers/cxl/mem.h                             |  19 +++
>  drivers/cxl/region.c                          | 155 ++++++++++++++++++
>  7 files changed, 284 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/cxl/region.c
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 0b6a2e6e8fbb..5bcbefd4ea38 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -127,3 +127,22 @@ Description:
>  		memory (type-3). The 'target_type' attribute indicates the
>  		current setting which may dynamically change based on what
>  		memory regions are activated in this decode hierarchy.
> +
> +What:		/sys/bus/cxl/devices/decoderX.Y/create_region
> +Date:		June, 2021
> +KernelVersion:	v5.14
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		Writing a value of '1' will create a new uninitialized region
> +		that will be mapped by the CXL decoderX.Y. Reading from this
> +		node will return the last created region. Regions must be
> +		subsequently configured and bound to a region driver before they
> +		can be used.
> +
> +What:		/sys/bus/cxl/devices/decoderX.Y/delete_region
> +Date:		June, 2021
> +KernelVersion:	v5.14
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		Deletes the named region. A region must be unbound from the
> +		region driver before being deleted.

Perhaps call out an example of what a name looks like?

> diff --git a/Documentation/driver-api/cxl/memory-devices.rst b/Documentation/driver-api/cxl/memory-devices.rst
> index 487ce4f41d77..3066638b087f 100644
> --- a/Documentation/driver-api/cxl/memory-devices.rst
> +++ b/Documentation/driver-api/cxl/memory-devices.rst
> @@ -39,6 +39,14 @@ CXL Core
>  .. kernel-doc:: drivers/cxl/core.c
>     :doc: cxl core
>  
> +CXL Regions
> +-----------
> +.. kernel-doc:: drivers/cxl/region.c
> +   :doc: cxl region
> +
> +.. kernel-doc:: drivers/cxl/region.c
> +   :identifiers:
> +
>  External Interfaces
>  ===================
>  
> diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile
> index a29efb3e8ad2..2b9dd9187788 100644
> --- a/drivers/cxl/Makefile
> +++ b/drivers/cxl/Makefile
> @@ -4,6 +4,6 @@ obj-$(CONFIG_CXL_MEM) += cxl_pci.o
>  obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o
>  
>  ccflags-y += -DDEFAULT_SYMBOL_NAMESPACE=CXL
> -cxl_core-y := core.o
> +cxl_core-y := core.o region.o
>  cxl_pci-y := pci.o
>  cxl_acpi-y := acpi.o
> diff --git a/drivers/cxl/core.c b/drivers/cxl/core.c
> index 1b9ee0b08384..cda09a9cd98e 100644
> --- a/drivers/cxl/core.c
> +++ b/drivers/cxl/core.c
> @@ -7,6 +7,7 @@
>  #include <linux/slab.h>
>  #include <linux/idr.h>
>  #include "cxl.h"
> +#include "mem.h"

This seems to be breaking the nice separation in cxl.
Perhaps add a region.h instead?

>  
>  /**
>   * DOC: cxl core
> @@ -119,7 +120,68 @@ static ssize_t target_list_show(struct device *dev,
>  }
>  static DEVICE_ATTR_RO(target_list);
>  
> +static ssize_t create_region_show(struct device *dev,
> +				  struct device_attribute *attr, char *buf)
> +{
> +	struct cxl_decoder *cxld = to_cxl_decoder(dev);
> +	int rc;
> +
> +	device_lock(dev);
> +	rc = sprintf(buf, "%s\n",
> +		     cxld->youngest ? dev_name(&cxld->youngest->dev) : "");
> +	device_unlock(dev);
> +
> +	return rc;
> +}
> +
> +static ssize_t create_region_store(struct device *dev,
> +				   struct device_attribute *attr,
> +				   const char *buf, size_t len)
> +{
> +	struct cxl_decoder *cxld = to_cxl_decoder(dev);
> +	struct cxl_region *region;
> +	ssize_t rc;
> +	int val;
> +
> +	rc = kstrtoint(buf, 0, &val);
> +	if (rc)
> +		return rc;
> +	if (val != 1)
> +		return -EINVAL;
> +
> +	region = cxl_alloc_region(cxld);
> +	if (IS_ERR(region))
> +		return PTR_ERR(region);
> +
> +	rc = cxl_add_region(cxld, region);
> +	if (rc) {
> +		cxl_free_region(cxld, region);
> +		return rc;
> +	}
> +
> +	cxld->youngest = region;

Care needed I think if a delete is called on whatever
is the youngest region then this file read.

> +	return len;
> +}
> +static DEVICE_ATTR_RW(create_region);
> +
> +static ssize_t delete_region_store(struct device *dev,
> +				   struct device_attribute *attr,
> +				   const char *buf, size_t len)
> +{
> +	struct cxl_decoder *cxld = to_cxl_decoder(dev);
> +	int rc;
> +
> +	rc = cxl_delete_region(cxld, buf);
> +	if (rc)
> +		return rc;
> +
> +	return len;
> +}
> +static DEVICE_ATTR_WO(delete_region);
> +
>  static struct attribute *cxl_decoder_base_attrs[] = {
> +	&dev_attr_create_region.attr,
> +	&dev_attr_delete_region.attr,
>  	&dev_attr_start.attr,
>  	&dev_attr_size.attr,
>  	&dev_attr_locked.attr,
> @@ -170,7 +232,13 @@ static void cxl_decoder_release(struct device *dev)
>  {
>  	struct cxl_decoder *cxld = to_cxl_decoder(dev);
>  	struct cxl_port *port = to_cxl_port(dev->parent);
> +	struct cxl_region *region;
>  
> +	list_for_each_entry(region, &cxld->regions, list) {
> +		cxl_delete_region(cxld, dev_name(&region->dev));
> +	}

Nitpick, brackets not needed.

> +	dev_WARN_ONCE(dev, !ida_is_empty(&cxld->region_ida),
> +		      "Lost track of a region");
>  	ida_free(&port->decoder_ida, cxld->id);
>  	kfree(cxld);
>  }
> @@ -475,8 +543,11 @@ cxl_decoder_alloc(struct cxl_port *port, int nr_targets, resource_size_t base,
>  		.interleave_ways = interleave_ways,
>  		.interleave_granularity = interleave_granularity,
>  		.target_type = type,
> +		.regions = LIST_HEAD_INIT(cxld->regions),
>  	};
>  
> +	ida_init(&cxld->region_ida);
> +
>  	/* handle implied target_list */
>  	if (interleave_ways == 1)
>  		cxld->target[0] =
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index b988ea288f53..1ffc5e07e24d 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -7,6 +7,7 @@
>  #include <linux/bitfield.h>
>  #include <linux/bitops.h>
>  #include <linux/io.h>
> +#include <linux/uuid.h>
>  
>  /**
>   * DOC: cxl objects
> @@ -182,6 +183,9 @@ enum cxl_decoder_type {
>   * @interleave_granularity: data stride per dport
>   * @target_type: accelerator vs expander (type2 vs type3) selector
>   * @flags: memory type capabilities and locking
> + * @region_ida: allocator for region ids.
> + * @regions: List of regions mapped (may be disabled) by this decoder.
> + * @youngest: Last region created for this decoder.
>   * @target: active ordered target list in current decoder configuration
>   */
>  struct cxl_decoder {
> @@ -192,6 +196,9 @@ struct cxl_decoder {
>  	int interleave_granularity;
>  	enum cxl_decoder_type target_type;
>  	unsigned long flags;
> +	struct ida region_ida;
> +	struct list_head regions;
> +	struct cxl_region *youngest;
>  	struct cxl_dport *target[];
>  };
>  
> @@ -231,6 +238,10 @@ struct cxl_dport {
>  	struct list_head list;
>  };
>  
> +struct cxl_region *cxl_alloc_region(struct cxl_decoder *cxld);
> +void cxl_free_region(struct cxl_decoder *cxld, struct cxl_region *region);
> +int cxl_add_region(struct cxl_decoder *cxld, struct cxl_region *region);
> +int cxl_delete_region(struct cxl_decoder *cxld, const char *region);
>  struct cxl_port *to_cxl_port(struct device *dev);
>  struct cxl_port *devm_cxl_add_port(struct device *host, struct device *uport,
>  				   resource_size_t component_reg_phys,
> diff --git a/drivers/cxl/mem.h b/drivers/cxl/mem.h
> index 13868ff7cadf..4c4eb06a966b 100644
> --- a/drivers/cxl/mem.h
> +++ b/drivers/cxl/mem.h
> @@ -3,6 +3,8 @@
>  #ifndef __CXL_MEM_H__
>  #define __CXL_MEM_H__
>  
> +#include <linux/cdev.h>

Unrelated?

> +
>  /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
>  #define CXLMDEV_STATUS_OFFSET 0x0
>  #define   CXLMDEV_DEV_FATAL BIT(0)
> @@ -46,6 +48,23 @@ struct cxl_memdev {
>  	int id;
>  };
>  
> +/**
> + * struct cxl_region - CXL region
> + * @dev: This region's device.
> + * @id: This regions id. Id is globally unique across all regions.
> + * @res: Address space consumed by this region.
> + * @list: Node in decoders region list.
> + * @targets: The memory devices comprising the region.
> + */
> +struct cxl_region {
> +	struct device dev;
> +	int id;
> +	struct resource res;
> +	struct list_head list;
> +	struct cxl_memdev *targets[];
> +};
> +
> +

One line only.

>  /**
>   * struct cxl_mem - A CXL memory device
>   * @pdev: The PCI device associated with this CXL device.
> diff --git a/drivers/cxl/region.c b/drivers/cxl/region.c
> new file mode 100644
> index 000000000000..1f47bc17bd50
> --- /dev/null
> +++ b/drivers/cxl/region.c
> @@ -0,0 +1,155 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright(c) 2021 Intel Corporation. All rights reserved. */
> +#include <linux/io-64-nonatomic-lo-hi.h>
> +#include <linux/device.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/idr.h>
> +#include "cxl.h"
> +#include "mem.h"
> +
> +/**
> + * DOC: cxl region
> + *
> + * A CXL region encompasses a chunk of host physical address space that may be
> + * consumed by a single device (x1 interleave aka linear) or across multiple
> + * devices (xN interleaved). A region is a child device of a &struct
> + * cxl_decoder. There may be multiple active regions under a single &struct
> + * cxl_decoder. The common case for multiple regions would be several linear,
> + * contiguous regions under a single decoder. Generally, there will be a 1:1
> + * relationship between decoder and region when the region is interleaved.
> + */
> +
> +static void cxl_region_release(struct device *dev);
> +
> +static const struct device_type cxl_region_type = {
> +	.name = "cxl_region",
> +	.release = cxl_region_release,
> +};
> +
> +static struct cxl_region *to_cxl_region(struct device *dev)
> +{
> +	if (dev_WARN_ONCE(dev, dev->type != &cxl_region_type,
> +			  "not a cxl_region device\n"))
> +		return NULL;
> +
> +	return container_of(dev, struct cxl_region, dev);
> +}
> +
> +void cxl_free_region(struct cxl_decoder *cxld, struct cxl_region *region)
> +{
> +	ida_free(&cxld->region_ida, region->id);
> +	kfree(region->targets);
> +	kfree(region);
> +}
> +
> +static void cxl_region_release(struct device *dev)
> +{
> +	struct cxl_decoder *cxld = to_cxl_decoder(dev->parent);
> +	struct cxl_region *region;
	struct cxl_region *region = to_cxl_region(dev);
?
Or just roll it into one line.

cxl_free_region(to_cxl_decoder(dev->parent), to_cxl_region(dev));

Of course may be affected by later patches so this doesn't make sense
but I'm too lazy to check!

> +
> +	region = to_cxl_region(dev);
> +	cxl_free_region(cxld, region);
> +}
> +
> +struct cxl_region *cxl_alloc_region(struct cxl_decoder *cxld)
> +{
> +	struct cxl_region *region;
> +	int rc;
> +
> +	region = kzalloc(struct_size(region, targets, cxld->interleave_ways),
> +			 GFP_KERNEL);
> +	if (!region) {
> +		pr_err("No memory\n");

Drop, or move to dev_err? 

> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	rc = ida_alloc(&cxld->region_ida, GFP_KERNEL);
> +	if (rc < 0) {
> +		dev_err(&cxld->dev, "Couldn't get a new id\n");
> +		kfree(region);
> +		return ERR_PTR(rc);
> +	}
> +	region->id = rc;
> +
> +	return region;
> +}
> +
> +/**
> + * cxl_add_region - Adds a region to a decoder
> + * @cxld: Parent decoder.
> + * @region: Region to be added to the decoder.
> + *
> + * This is the second step of region initialization. Regions exist within an
> + * address space which is mapped by a @cxld, and that @cxld enforces constraints
> + * upon the region as it is configured. Regions may be added to a @cxld but not
> + * activated and therefore it is possible to have more regions in a @cxld than
> + * there are interleave ways in the @cxld. Regions exist only for persistent
> + * capacities.
> + *
> + * Return: zero if the region was added to the @cxld, else returns negative
> + * error code.
> + */
> +int cxl_add_region(struct cxl_decoder *cxld, struct cxl_region *region)
> +{
> +	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
> +	struct device *dev = &region->dev;
> +	struct resource *res;
> +	int rc;
> +
> +	device_initialize(dev);
> +	dev->parent = &cxld->dev;
> +	dev->bus = &cxl_bus_type;
> +	dev->type = &cxl_region_type;

Probably doesn't want the pm controls.

> +	rc = dev_set_name(dev, "region%d.%d:%d", port->id, cxld->id, region->id);
> +	if (rc)
> +		goto err;
> +
> +	rc = device_add(dev);
> +	if (rc)
> +		goto err;
> +
> +	res = &region->res;
> +	res->name = "Persistent Memory";
> +	res->start = 0;
> +	res->end = -1;
> +	res->flags = IORESOURCE_MEM;
> +	res->desc = IORES_DESC_PERSISTENT_MEMORY;
> +
> +	dev_dbg(dev, "Added %s to %s\n", dev_name(dev), dev_name(&cxld->dev));
> +
> +	return 0;
> +
> +err:
> +	put_device(dev);
> +	return rc;
> +}
> +
> +static struct cxl_region *
> +cxl_find_region_by_name(struct cxl_decoder *cxld, const char *name)
> +{
> +	struct device *region_dev;
> +
> +	region_dev = device_find_child_by_name(&cxld->dev, name);
> +	if (!region_dev)
> +		return ERR_PTR(-ENOENT);
> +
> +	return to_cxl_region(region_dev);
> +}
> +
> +int cxl_delete_region(struct cxl_decoder *cxld, const char *region_name)
> +{
> +	struct cxl_region *region;
> +
> +	region = cxl_find_region_by_name(cxld, region_name);
> +	if (IS_ERR(region))
> +		return PTR_ERR(region);
> +
> +	dev_dbg(&cxld->dev, "Requested removal of %s from %s\n",
> +		dev_name(&region->dev), dev_name(&cxld->dev));
> +
> +	device_unregister(&region->dev);
> +	put_device(&region->dev);
> +
> +	return 0;
> +}


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 2/4] cxl/region: Create attribute structure / verify
  2021-06-10 18:57 ` [RFC PATCH 2/4] cxl/region: Create attribute structure / verify Ben Widawsky
@ 2021-06-11 13:37   ` Jonathan Cameron
  2021-06-12  0:59   ` Dan Williams
  1 sibling, 0 replies; 22+ messages in thread
From: Jonathan Cameron @ 2021-06-11 13:37 UTC (permalink / raw)
  To: Ben Widawsky
  Cc: linux-cxl, Alison Schofield, Dan Williams, Ira Weiny, Vishal Verma

On Thu, 10 Jun 2021 11:57:23 -0700
Ben Widawsky <ben.widawsky@intel.com> wrote:

> Introduce a verification mechanism for a region. Regions have complex
> configuration requirements and it is beneficial to provide a way to
> verify the constraints are met before trying to bind. Primarily it adds
> ABI to inform userspace of more detailed information about what failed
> rather than the limited choices of errno at bind time.
> 
> It's important to point out that a verified region can still fail to
> bind, but the first step in binding will be to run the same verification
> algorithm.
> 
> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> 
> --
> 
> Functionally it might make sense to squash this patch in with other
> patches adding attributes. From a discussion standpoint however, it's
> nice to have this broken out as I suspect there might be some debate
> about it.

Hmm. Definitely squash it in later, as this is downright odd at the moment!

Is there precedence elsewhere for this interface approach?
I can see the advantage of it as it lets us pass through invalid states
whilst configuring but it is somewhat unusual.

Probably one for linux-api@vger.kernel.org to get more exposure to people
who care about this stuff.  I suspect there aren't that many people
on linux-cxl yet ;)

Jonathan

> ---
>  Documentation/ABI/testing/sysfs-bus-cxl | 13 +++++++++++++
>  drivers/cxl/region.c                    | 22 +++++++++++++++++++++-
>  2 files changed, 34 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 5bcbefd4ea38..699c8514fd7b 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -146,3 +146,16 @@ Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		Deletes the named region. A region must be unbound from the
>  		region driver before being deleted.
> +
> +What:		/sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/verify
> +Date:		June, 2021
> +KernelVersion:	v5.14
> +Contact:	linux-cxl@vger.kernel.org
> +Description:

Doing 'what' to this file causes to this to happen?
You want to state that "Reading this file instructs..."

> +		Instructs the kernel to verify that the regionX.Y:Z is properly
> +		configured and provide more detailed information about
> +		configuration errors. A value of 0 indicates the region is
> +		properly configured and ready to bind, otherwise a negative
> +		integer is returned describing the first error found in the
> +		configuration. A verified region can still fail binding due to
> +		lack of resources.
> diff --git a/drivers/cxl/region.c b/drivers/cxl/region.c
> index 1f47bc17bd50..ea1ac848c713 100644
> --- a/drivers/cxl/region.c
> +++ b/drivers/cxl/region.c
> @@ -20,11 +20,31 @@
>   * relationship between decoder and region when the region is interleaved.
>   */
>  
> -static void cxl_region_release(struct device *dev);
> +static ssize_t verify_show(struct device *dev, struct device_attribute *attr, char *buf)
> +{
> +	return sysfs_emit(buf, "0");
> +}
> +
> +static DEVICE_ATTR_RO(verify);
> +
> +static struct attribute *region_attrs[] = {
> +	&dev_attr_verify.attr,
> +	NULL,
> +};
>  
> +static const struct attribute_group region_group = {
> +	.attrs = region_attrs,
> +};
> +
> +static const struct attribute_group *region_groups[] = {
> +	&region_group,
> +};
> +
> +static void cxl_region_release(struct device *dev);
>  static const struct device_type cxl_region_type = {
>  	.name = "cxl_region",
>  	.release = cxl_region_release,
> +	.groups = region_groups,
>  };
>  
>  static struct cxl_region *to_cxl_region(struct device *dev)


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 4/4] cxl/region: Introduce concept of region configuration
  2021-06-10 18:57 ` [RFC PATCH 4/4] cxl/region: Introduce concept of region configuration Ben Widawsky
@ 2021-06-11 13:52   ` Jonathan Cameron
  2021-06-14 16:18     ` Ben Widawsky
  0 siblings, 1 reply; 22+ messages in thread
From: Jonathan Cameron @ 2021-06-11 13:52 UTC (permalink / raw)
  To: Ben Widawsky
  Cc: linux-cxl, Alison Schofield, Dan Williams, Ira Weiny, Vishal Verma

On Thu, 10 Jun 2021 11:57:25 -0700
Ben Widawsky <ben.widawsky@intel.com> wrote:

> The region creation APIs leave a region unconfigured. Configuring the
> region will work in the same way as similar subsystems such as devdax.
> Sysfs attrs will be provided to allow userspace to configure the region.
> Finally once all configuration is complete, userspace may "commit" the
> config. What the kernel decides to do after a config is committed is out
> of scope at this point.
> 
> Introduced here are the most basic attributes needed to configure a
> region.
> 
> A x1 interleave example is provided below:
> 
> decoder1.0
> ├── create_region
> ├── delete_region
> ├── devtype
> ├── locked
> ├── region1.0:0
> │   ├── offset
> │   ├── size
> │   ├── subsystem -> ../../../../../../../bus/cxl
> │   ├── target0
> │   ├── uevent
> │   ├── uuid
> │   └── verify
> ├── size
> ├── start
> ├── subsystem -> ../../../../../../bus/cxl
> ├── target_list
> ├── target_type
> └── uevent
> 
> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> ---
>  Documentation/ABI/testing/sysfs-bus-cxl |  27 +++
>  drivers/cxl/mem.h                       |   2 +
>  drivers/cxl/region.c                    | 227 +++++++++++++++++++++++-
>  3 files changed, 255 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 699c8514fd7b..d7174a84f70d 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -159,3 +159,30 @@ Description:
>  		integer is returned describing the first error found in the
>  		configuration. A verified region can still fail binding due to
>  		lack of resources.
> +
> +What:		/sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/{offset,size}

I missed this before, but why do we need X.Y in the region naming given it's
always inside decoderX.Y.  Seems like RegionZ would work.

> +Date:		June, 2021
> +KernelVersion:	v5.14
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		A region resides within an address space that is claimed by a
> +		decoder. The region will be of some size within the address
> +		space and at some offset that must also reside within the
> +		address space. The size and position of the region is specified
> +		by these attributes.
Could perhaps reword this.  Something like.

		The region defined by size and offset must be fully contained
		within the address space.
> +
> +What:		/sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/uuid
> +Date:		June, 2021
> +KernelVersion:	v5.14
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		The unique identifier for the region.
> +
> +What:		/sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/target[0-15]
> +Date:		June, 2021
> +KernelVersion:	v5.14
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		Memory devices are the backing storage for a region. Each target
> +		must be populated with a memdev in order for the region to be
> +		eligible to be activated.

How do you do that?  What is written to this file?

> diff --git a/drivers/cxl/mem.h b/drivers/cxl/mem.h
> index 9795aa924035..059fbf084fa1 100644
> --- a/drivers/cxl/mem.h
> +++ b/drivers/cxl/mem.h
> @@ -58,6 +58,7 @@ static inline struct cxl_memdev *to_cxl_memdev(struct device *dev)
>   * @dev: This region's device.
>   * @id: This regions id. Id is globally unique across all regions.
>   * @res: Address space consumed by this region.
> + * @uuid: The UUID for this region.
>   * @list: Node in decoders region list.
>   * @targets: The memory devices comprising the region.
>   */
> @@ -65,6 +66,7 @@ struct cxl_region {
>  	struct device dev;
>  	int id;
>  	struct resource res;
> +	uuid_t uuid;
>  	struct list_head list;
>  	struct cxl_memdev *targets[];
>  };
> diff --git a/drivers/cxl/region.c b/drivers/cxl/region.c
> index ea1ac848c713..a69ee00514cb 100644
> --- a/drivers/cxl/region.c
> +++ b/drivers/cxl/region.c
> @@ -3,7 +3,9 @@
>  #include <linux/io-64-nonatomic-lo-hi.h>
>  #include <linux/device.h>
>  #include <linux/module.h>
> +#include <linux/sizes.h>
>  #include <linux/slab.h>
> +#include <linux/uuid.h>
>  #include <linux/idr.h>
>  #include "cxl.h"
>  #include "mem.h"
> @@ -20,15 +22,130 @@
>   * relationship between decoder and region when the region is interleaved.
>   */
>  
> +static struct cxl_region *to_cxl_region(struct device *dev);
> +
> +#define cxl_region_ways(region)                                                \
> +	to_cxl_decoder((region)->dev.parent)->interleave_ways
> +
>  static ssize_t verify_show(struct device *dev, struct device_attribute *attr, char *buf)
>  {
> +	struct cxl_decoder *cxld = to_cxl_decoder(dev->parent);
> +	struct cxl_region *region = to_cxl_region(dev);
> +	struct resource decode_res;
> +	int i;
> +
> +	decode_res = (struct resource)DEFINE_RES_MEM(cxld->range.start,
> +						     range_len(&cxld->range));
> +
> +	/* Invalid region size */
> +	if (!resource_contains(&decode_res, &region->res))
> +		return sysfs_emit(buf, "size");

Perhaps "outside region"?  Size might be fine, but not the offset.
Also, docs say a negative integer is returned for this attribute.
Those docs need to call out the full list of things that might be returned
so that userspace can know what to expect.

> +
> +	if (resource_size(&region->res) % (SZ_256M * cxld->interleave_ways))
> +		return sysfs_emit(buf, "alignment");
> +
> +	/* Missing target memory device */
> +	for (i = 0; i < cxld->interleave_ways; i++)
> +		if (!region->targets[i])
> +			return sysfs_emit(buf, "memdev");
> +
>  	return sysfs_emit(buf, "0");
>  }
>  
>  static DEVICE_ATTR_RO(verify);
>  
> +static bool is_region_active(struct cxl_region *region)
> +{
> +	/* TODO: Regions can't be activated yet. */
> +	return false;
> +}
> +
> +static ssize_t offset_show(struct device *dev, struct device_attribute *attr,
> +			   char *buf)
> +{
> +	struct cxl_region *region = to_cxl_region(dev);
> +
> +	return sysfs_emit(buf, "%#llx\n", region->res.start);
> +}
> +
> +static ssize_t offset_store(struct device *dev, struct device_attribute *attr,
> +			    const char *buf, size_t len)
> +{
> +	struct cxl_region *region = to_cxl_region(dev);
> +	unsigned long long val;
> +	ssize_t rc;
> +
> +	rc = kstrtoull(buf, 0, &val);
> +	if (rc)
> +		return rc;
> +
> +	if (is_region_active(region)) {
> +		/* TODO: */
> +	} else {
> +		region->res.start = val;
> +	}
> +
> +	return len;
> +}
> +
> +static DEVICE_ATTR_RW(offset);
> +
> +static ssize_t size_show(struct device *dev, struct device_attribute *attr,
> +			 char *buf)
> +{
> +	struct cxl_region *region = to_cxl_region(dev);
> +
> +	return sysfs_emit(buf, "%llu\n", resource_size(&region->res));
> +}
> +
> +static ssize_t size_store(struct device *dev, struct device_attribute *attr,
> +			  const char *buf, size_t len)
> +{
> +	struct cxl_region *region = to_cxl_region(dev);
> +	unsigned long long val;
> +	ssize_t rc;
> +
> +	rc = kstrtoull(buf, 0, &val);
> +	if (rc)
> +		return rc;
> +
> +	if (is_region_active(region)) {
> +		/* TODO: */
> +	} else {
> +		region->res.end = region->res.start + val - 1;
> +	}
> +
> +	return len;
> +}
> +static DEVICE_ATTR_RW(size);
> +
> +static ssize_t uuid_show(struct device *dev, struct device_attribute *attr,
> +			 char *buf)
> +{
> +	struct cxl_region *region = to_cxl_region(dev);
> +
> +	return sysfs_emit(buf, "%pUb\n", &region->uuid);
> +}
> +static ssize_t uuid_store(struct device *dev, struct device_attribute *attr,
> +			  const char *buf, size_t len)
> +{
> +	struct cxl_region *region = to_cxl_region(dev);
> +	ssize_t rc;
> +
> +	if (len != UUID_STRING_LEN + 1)
> +		return -EINVAL;
> +
> +	rc = uuid_parse(buf, &region->uuid);
> +
> +	return rc ? rc : len;
> +}
> +static DEVICE_ATTR_RW(uuid);
> +
>  static struct attribute *region_attrs[] = {
>  	&dev_attr_verify.attr,
> +	&dev_attr_offset.attr,
> +	&dev_attr_size.attr,
> +	&dev_attr_uuid.attr,
>  	NULL,
>  };
>  
> @@ -36,8 +153,111 @@ static const struct attribute_group region_group = {
>  	.attrs = region_attrs,
>  };
>  
> +static size_t show_targetN(struct cxl_region *region, char *buf, int n)
> +{
> +	if (region->targets[n])
> +		return sysfs_emit(buf, "%s\n", dev_name(&region->targets[n]->dev));
> +	else
> +		return sysfs_emit(buf, "nil\n");

This needs documenting in the ABI docs. I'd I guessed it would return an empty
string.

> +}
> +
> +static size_t set_targetN(struct cxl_region *region, const char *buf, int n, size_t len)
> +{
> +	struct device *memdev_dev;
> +	struct cxl_memdev *cxlmd;
> +	ssize_t rc;
> +	int val;
> +
> +	rc = kstrtoint(buf, 0, &val);
> +	if (!rc && val == 0) {
> +		cxlmd = region->targets[n] = cxlmd;
> +		if (cxlmd)
> +			put_device(&cxlmd->dev);
> +		region->targets[n] = NULL;
> +		return len;
> +	}
> +
> +	memdev_dev = bus_find_device_by_name(&cxl_bus_type, NULL, buf);
> +	if (!memdev_dev)
> +		return -ENOENT;
> +
> +	cxlmd = to_cxl_memdev(memdev_dev);
> +	get_device(&cxlmd->dev);
> +	region->targets[n] = cxlmd;
> +
> +	return len;
> +}
> +
> +#define TARGET_ATTR_RW(n)                                                      \
> +	static ssize_t target##n##_show(                                       \
> +		struct device *dev, struct device_attribute *attr, char *buf)  \
> +	{                                                                      \
> +		return show_targetN(to_cxl_region(dev), buf, n);               \
> +	}                                                                      \
> +	static ssize_t target##n##_store(struct device *dev,                   \
> +					 struct device_attribute *attr,        \
> +					 const char *buf, size_t len)          \
> +	{                                                                      \
> +		return set_targetN(to_cxl_region(dev), buf, n, len);           \
> +	}                                                                      \
> +	static DEVICE_ATTR_RW(target##n)
> +
> +TARGET_ATTR_RW(0);
> +TARGET_ATTR_RW(1);
> +TARGET_ATTR_RW(2);
> +TARGET_ATTR_RW(3);
> +TARGET_ATTR_RW(4);
> +TARGET_ATTR_RW(5);
> +TARGET_ATTR_RW(6);
> +TARGET_ATTR_RW(7);
> +TARGET_ATTR_RW(8);
> +TARGET_ATTR_RW(9);
> +TARGET_ATTR_RW(10);
> +TARGET_ATTR_RW(11);
> +TARGET_ATTR_RW(12);
> +TARGET_ATTR_RW(13);
> +TARGET_ATTR_RW(14);
> +TARGET_ATTR_RW(15);
> +
> +static struct attribute *interleave_attrs[] = {
> +	&dev_attr_target0.attr,
> +	&dev_attr_target1.attr,
> +	&dev_attr_target2.attr,
> +	&dev_attr_target3.attr,
> +	&dev_attr_target4.attr,
> +	&dev_attr_target5.attr,
> +	&dev_attr_target6.attr,
> +	&dev_attr_target7.attr,
> +	&dev_attr_target8.attr,
> +	&dev_attr_target9.attr,
> +	&dev_attr_target10.attr,
> +	&dev_attr_target11.attr,
> +	&dev_attr_target12.attr,
> +	&dev_attr_target13.attr,
> +	&dev_attr_target14.attr,
> +	&dev_attr_target15.attr,
> +	NULL,
> +};
> +
> +static umode_t visible_targets(struct kobject *kobj, struct attribute *a, int n)
> +{
> +	struct device *dev = container_of(kobj, struct device, kobj);
> +	struct cxl_region *region = to_cxl_region(dev);
> +
> +	if (n < cxl_region_ways(region))
> +		return a->mode;
> +	return 0;
> +}
> +
> +static const struct attribute_group region_interleave_group = {
> +	.attrs = interleave_attrs,
> +	.is_visible = visible_targets,
> +};
> +
>  static const struct attribute_group *region_groups[] = {
>  	&region_group,
> +	&region_interleave_group,
> +	NULL,
>  };
>  
>  static void cxl_region_release(struct device *dev);
> @@ -58,8 +278,13 @@ static struct cxl_region *to_cxl_region(struct device *dev)
>  
>  void cxl_free_region(struct cxl_decoder *cxld, struct cxl_region *region)
>  {
> +	int i;
> +
>  	ida_free(&cxld->region_ida, region->id);
> -	kfree(region->targets);

This line looks like a bug in earlier patch as targets is allocated
as part of the allocation of region.

> +	for (i = 0; i < cxld->interleave_ways; i++) {
> +		if (region->targets[i])
> +			put_device(&region->targets[i]->dev);
> +	}
>  	kfree(region);
>  }
>  


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/4] Region Creation
  2021-06-11 13:11 ` [RFC PATCH 0/4] Region Creation Jonathan Cameron
@ 2021-06-11 13:53   ` Jonathan Cameron
  2021-06-11 16:12     ` Ben Widawsky
  0 siblings, 1 reply; 22+ messages in thread
From: Jonathan Cameron @ 2021-06-11 13:53 UTC (permalink / raw)
  To: Ben Widawsky
  Cc: linux-cxl, Alison Schofield, Dan Williams, Ira Weiny, Vishal Verma

On Fri, 11 Jun 2021 14:11:36 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:

> On Thu, 10 Jun 2021 11:57:21 -0700
> Ben Widawsky <ben.widawsky@intel.com> wrote:
> 
> > CXL interleave sets and non-interleave sets are described via regions. A region
> > is specified in the CXL 2.0 specification and the purpose is to create a
> > standardized way to preserve the region across reboots.  
> 
> A specific section reference would be helpful.
> 
> > 
> > Introduced here is the basic mechanism to create and configure and delete a CXL
> > region. Configuring a region simply means giving it a size, offset within the
> > CFMWS window, UUID, and a target list. Enabling/activating a region, which
> > ultimately means programming the HDM decoders in the chain, is left for later
> > work.
> > 
> > The patches are only minimally tested so far in QEMU emulation and so x1
> > interleave is all that's supported.  
> 
> I'm guessing this is why it's an RFC rather than a final submission?
> 
> If you can call out the RFC reasons in a cover letter it is helpful
> as saves people wondering what specifically you want comments on.

Hi Ben,

Having read through them all, I think this needs more thought than
I feel up to on a Friday afternoon.  Will get back to you on v2
perhaps.

Jonathan

> 
> > 
> > Here is a sample topology (also in patch #4)
> > 
> >     decoder1.0
> >     ├── create_region
> >     ├── delete_region
> >     ├── devtype
> >     ├── locked
> >     ├── region1.0:0
> >     │   ├── offset
> >     │   ├── size
> >     │   ├── subsystem -> ../../../../../../../bus/cxl
> >     │   ├── target0
> >     │   ├── uevent
> >     │   ├── uuid
> >     │   └── verify
> >     ├── size
> >     ├── start
> >     ├── subsystem -> ../../../../../../bus/cxl
> >     ├── target_list
> >     ├── target_type
> >     └── uevent
> > 
> > Ben Widawsky (4):
> >   cxl/region: Add region creation ABI
> >   cxl/region: Create attribute structure / verify
> >   cxl: Move cxl_memdev conversion helper to mem.h
> >   cxl/region: Introduce concept of region configuration
> > 
> >  Documentation/ABI/testing/sysfs-bus-cxl       |  59 +++
> >  .../driver-api/cxl/memory-devices.rst         |   8 +
> >  drivers/cxl/Makefile                          |   2 +-
> >  drivers/cxl/core.c                            |  71 ++++
> >  drivers/cxl/cxl.h                             |  11 +
> >  drivers/cxl/mem.h                             |  26 ++
> >  drivers/cxl/pci.c                             |   5 -
> >  drivers/cxl/region.c                          | 400 ++++++++++++++++++
> >  8 files changed, 576 insertions(+), 6 deletions(-)
> >  create mode 100644 drivers/cxl/region.c
> >   
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/4] Region Creation
  2021-06-11 13:53   ` Jonathan Cameron
@ 2021-06-11 16:12     ` Ben Widawsky
  0 siblings, 0 replies; 22+ messages in thread
From: Ben Widawsky @ 2021-06-11 16:12 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-cxl, Alison Schofield, Dan Williams, Ira Weiny, Vishal Verma

On 21-06-11 14:53:31, Jonathan Cameron wrote:
> On Fri, 11 Jun 2021 14:11:36 +0100
> Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> 
> > On Thu, 10 Jun 2021 11:57:21 -0700
> > Ben Widawsky <ben.widawsky@intel.com> wrote:
> > 
> > > CXL interleave sets and non-interleave sets are described via regions. A region
> > > is specified in the CXL 2.0 specification and the purpose is to create a
> > > standardized way to preserve the region across reboots.  
> > 
> > A specific section reference would be helpful.
> > 
> > > 
> > > Introduced here is the basic mechanism to create and configure and delete a CXL
> > > region. Configuring a region simply means giving it a size, offset within the
> > > CFMWS window, UUID, and a target list. Enabling/activating a region, which
> > > ultimately means programming the HDM decoders in the chain, is left for later
> > > work.
> > > 
> > > The patches are only minimally tested so far in QEMU emulation and so x1
> > > interleave is all that's supported.  
> > 
> > I'm guessing this is why it's an RFC rather than a final submission?
> > 
> > If you can call out the RFC reasons in a cover letter it is helpful
> > as saves people wondering what specifically you want comments on.
> 
> Hi Ben,
> 
> Having read through them all, I think this needs more thought than
> I feel up to on a Friday afternoon.  Will get back to you on v2
> perhaps.
> 
> Jonathan
> 

Thanks for looking. Totally fair too :-)

I'm mainly looking for feedback on the region creation and configuration from
ABI perspective. Nitty gritty code review can happen with the v1 submission.

> > 
> > > 
> > > Here is a sample topology (also in patch #4)
> > > 
> > >     decoder1.0
> > >     ├── create_region
> > >     ├── delete_region
> > >     ├── devtype
> > >     ├── locked
> > >     ├── region1.0:0
> > >     │   ├── offset
> > >     │   ├── size
> > >     │   ├── subsystem -> ../../../../../../../bus/cxl
> > >     │   ├── target0
> > >     │   ├── uevent
> > >     │   ├── uuid
> > >     │   └── verify
> > >     ├── size
> > >     ├── start
> > >     ├── subsystem -> ../../../../../../bus/cxl
> > >     ├── target_list
> > >     ├── target_type
> > >     └── uevent
> > > 
> > > Ben Widawsky (4):
> > >   cxl/region: Add region creation ABI
> > >   cxl/region: Create attribute structure / verify
> > >   cxl: Move cxl_memdev conversion helper to mem.h
> > >   cxl/region: Introduce concept of region configuration
> > > 
> > >  Documentation/ABI/testing/sysfs-bus-cxl       |  59 +++
> > >  .../driver-api/cxl/memory-devices.rst         |   8 +
> > >  drivers/cxl/Makefile                          |   2 +-
> > >  drivers/cxl/core.c                            |  71 ++++
> > >  drivers/cxl/cxl.h                             |  11 +
> > >  drivers/cxl/mem.h                             |  26 ++
> > >  drivers/cxl/pci.c                             |   5 -
> > >  drivers/cxl/region.c                          | 400 ++++++++++++++++++
> > >  8 files changed, 576 insertions(+), 6 deletions(-)
> > >  create mode 100644 drivers/cxl/region.c
> > >   
> > 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/4] Region Creation
  2021-06-10 18:57 [RFC PATCH 0/4] Region Creation Ben Widawsky
                   ` (4 preceding siblings ...)
  2021-06-11 13:11 ` [RFC PATCH 0/4] Region Creation Jonathan Cameron
@ 2021-06-12  0:44 ` Dan Williams
  2021-06-14  8:20   ` Jonathan Cameron
  2021-06-14 16:12   ` Ben Widawsky
  5 siblings, 2 replies; 22+ messages in thread
From: Dan Williams @ 2021-06-12  0:44 UTC (permalink / raw)
  To: Ben Widawsky
  Cc: linux-cxl, Alison Schofield, Ira Weiny, Jonathan Cameron, Vishal Verma

On Thu, Jun 10, 2021 at 11:58 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
>
> CXL interleave sets and non-interleave sets are described via regions. A region
> is specified in the CXL 2.0 specification and the purpose is to create a
> standardized way to preserve the region across reboots.
>
> Introduced here is the basic mechanism to create and configure and delete a CXL
> region. Configuring a region simply means giving it a size, offset within the
> CFMWS window, UUID, and a target list. Enabling/activating a region, which
> ultimately means programming the HDM decoders in the chain, is left for later
> work.
>
> The patches are only minimally tested so far in QEMU emulation and so x1
> interleave is all that's supported.
>
> Here is a sample topology (also in patch #4)

I'm just going to react to the attributes before looking at the
implementation to make sure we're level set.

>
>     decoder1.0
>     ├── create_region
>     ├── delete_region
>     ├── devtype
>     ├── locked
>     ├── region1.0:0
>     │   ├── offset

Is this the region's offset relative to the next available free space
in the parent decoder range? If this is output only I think it's ok,
but I think the address space allocation decision belongs to the
region driver at activation time. I.e. userspace does not have much of
a chance at specifying this relative all the other dynamic operations
that can be happening in the decoder.

>     │   ├── size
>     │   ├── subsystem -> ../../../../../../../bus/cxl
>     │   ├── target0
>     │   ├── uevent
>     │   ├── uuid
>     │   └── verify

I don't understand the role of a standalone @verify attribute, there
is verification that can happen per attribute write, and there is
final verification that can happen at region bind time. Either way
anything verify would check is duplicated somewhere else, and the
verification per attribute update is more precise. For example writes
to @size can check for free space in parent decoder and fail if
unavailable. Writes to targetX can fail if the memdev is not connected
to this decoder's port topology, or the memdev is out of decoder
resources. The final region bind will fail if mid-level switches are
lacking decoder resources, or would require changing a decoder
configuration that is pinned active.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 2/4] cxl/region: Create attribute structure / verify
  2021-06-10 18:57 ` [RFC PATCH 2/4] cxl/region: Create attribute structure / verify Ben Widawsky
  2021-06-11 13:37   ` Jonathan Cameron
@ 2021-06-12  0:59   ` Dan Williams
  2021-06-14 16:12     ` Ben Widawsky
  1 sibling, 1 reply; 22+ messages in thread
From: Dan Williams @ 2021-06-12  0:59 UTC (permalink / raw)
  To: Ben Widawsky
  Cc: linux-cxl, Alison Schofield, Ira Weiny, Jonathan Cameron, Vishal Verma

On Thu, Jun 10, 2021 at 11:57 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
>
> Introduce a verification mechanism for a region. Regions have complex
> configuration requirements and it is beneficial to provide a way to
> verify the constraints are met before trying to bind. Primarily it adds
> ABI to inform userspace of more detailed information about what failed
> rather than the limited choices of errno at bind time.

If you want to give userspace more data about an internal process
that's a TRACE_EVENT().

> It's important to point out that a verified region can still fail to
> bind, but the first step in binding will be to run the same verification
> algorithm.

I don't see this point of giving userspace a racy / less accurate
answer than the bind result.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/4] Region Creation
  2021-06-12  0:44 ` Dan Williams
@ 2021-06-14  8:20   ` Jonathan Cameron
  2021-06-14 16:12   ` Ben Widawsky
  1 sibling, 0 replies; 22+ messages in thread
From: Jonathan Cameron @ 2021-06-14  8:20 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ben Widawsky, linux-cxl, Alison Schofield, Ira Weiny, Vishal Verma

On Fri, 11 Jun 2021 17:44:02 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> On Thu, Jun 10, 2021 at 11:58 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
> >
> > CXL interleave sets and non-interleave sets are described via regions. A region
> > is specified in the CXL 2.0 specification and the purpose is to create a
> > standardized way to preserve the region across reboots.
> >
> > Introduced here is the basic mechanism to create and configure and delete a CXL
> > region. Configuring a region simply means giving it a size, offset within the
> > CFMWS window, UUID, and a target list. Enabling/activating a region, which
> > ultimately means programming the HDM decoders in the chain, is left for later
> > work.
> >
> > The patches are only minimally tested so far in QEMU emulation and so x1
> > interleave is all that's supported.
> >
> > Here is a sample topology (also in patch #4)  
> 
> I'm just going to react to the attributes before looking at the
> implementation to make sure we're level set.
> 
> >
> >     decoder1.0
> >     ├── create_region
> >     ├── delete_region
> >     ├── devtype
> >     ├── locked
> >     ├── region1.0:0
> >     │   ├── offset  
> 
> Is this the region's offset relative to the next available free space
> in the parent decoder range? If this is output only I think it's ok,
> but I think the address space allocation decision belongs to the
> region driver at activation time. I.e. userspace does not have much of
> a chance at specifying this relative all the other dynamic operations
> that can be happening in the decoder.
> 
> >     │   ├── size
> >     │   ├── subsystem -> ../../../../../../../bus/cxl
> >     │   ├── target0
> >     │   ├── uevent
> >     │   ├── uuid
> >     │   └── verify  
> 
> I don't understand the role of a standalone @verify attribute, there
> is verification that can happen per attribute write, and there is
> final verification that can happen at region bind time. Either way
> anything verify would check is duplicated somewhere else, and the
> verification per attribute update is more precise. For example writes
> to @size can check for free space in parent decoder and fail if
> unavailable.

This comes back to your question above on whether offset is writable
and what it is with respect to.

If it is writeable, then you can't really verify size and offset
separately.

I'm not against just doing it on commit.

> Writes to targetX can fail if the memdev is not connected
> to this decoder's port topology, or the memdev is out of decoder
> resources. The final region bind will fail if mid-level switches are
> lacking decoder resources, or would require changing a decoder
> configuration that is pinned active.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/4] Region Creation
  2021-06-12  0:44 ` Dan Williams
  2021-06-14  8:20   ` Jonathan Cameron
@ 2021-06-14 16:12   ` Ben Widawsky
  2021-06-14 21:04     ` Dan Williams
  1 sibling, 1 reply; 22+ messages in thread
From: Ben Widawsky @ 2021-06-14 16:12 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, Alison Schofield, Ira Weiny, Jonathan Cameron, Vishal Verma

On 21-06-11 17:44:02, Dan Williams wrote:
> On Thu, Jun 10, 2021 at 11:58 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
> >
> > CXL interleave sets and non-interleave sets are described via regions. A region
> > is specified in the CXL 2.0 specification and the purpose is to create a
> > standardized way to preserve the region across reboots.
> >
> > Introduced here is the basic mechanism to create and configure and delete a CXL
> > region. Configuring a region simply means giving it a size, offset within the
> > CFMWS window, UUID, and a target list. Enabling/activating a region, which
> > ultimately means programming the HDM decoders in the chain, is left for later
> > work.
> >
> > The patches are only minimally tested so far in QEMU emulation and so x1
> > interleave is all that's supported.
> >
> > Here is a sample topology (also in patch #4)
> 
> I'm just going to react to the attributes before looking at the
> implementation to make sure we're level set.
> 
> >
> >     decoder1.0
> >     ├── create_region
> >     ├── delete_region
> >     ├── devtype
> >     ├── locked
> >     ├── region1.0:0
> >     │   ├── offset
> 
> Is this the region's offset relative to the next available free space
> in the parent decoder range? If this is output only I think it's ok,
> but I think the address space allocation decision belongs to the
> region driver at activation time. I.e. userspace does not have much of
> a chance at specifying this relative all the other dynamic operations
> that can be happening in the decoder.
> 

This was my mistake. Offset will be determined by the driver and I intend for
this to be read-only.

> >     │   ├── size
> >     │   ├── subsystem -> ../../../../../../../bus/cxl
> >     │   ├── target0
> >     │   ├── uevent
> >     │   ├── uuid
> >     │   └── verify
> 
> I don't understand the role of a standalone @verify attribute, there
> is verification that can happen per attribute write, and there is
> final verification that can happen at region bind time. Either way
> anything verify would check is duplicated somewhere else, and the
> verification per attribute update is more precise. For example writes
> to @size can check for free space in parent decoder and fail if
> unavailable. Writes to targetX can fail if the memdev is not connected
> to this decoder's port topology, or the memdev is out of decoder
> resources. The final region bind will fail if mid-level switches are
> lacking decoder resources, or would require changing a decoder
> configuration that is pinned active.

I strongly believe verification per attribute write will get too fragile. I'm
afraid it's going to require writing attributes in a specific order so that we
can do said verification in a sane way. We can skip that and just check it all
on bind. Most of that logic is what would be contained in verify(), so why not
expose it for userspace that may want to test out various configs without
actually trying to bind?

Also, I like having ABI that helps userspace get details on the configuration
failure reason. You mention in the other reply, TRACE_EVENT. I suppose userspace
could use tracepoints, or scrape dmesg for this same info. Maybe it's 6 one way,
a half dozen the other. I'd be interested to know if there are other examples of
tracepoints being used by userspace in a way like this and what the experience
is like.

To summarize, I think we need an atomic way to do verification (which obviously
happens at bind()), and I think we need UAPI to get the configuration error.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 2/4] cxl/region: Create attribute structure / verify
  2021-06-12  0:59   ` Dan Williams
@ 2021-06-14 16:12     ` Ben Widawsky
  0 siblings, 0 replies; 22+ messages in thread
From: Ben Widawsky @ 2021-06-14 16:12 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, Alison Schofield, Ira Weiny, Jonathan Cameron, Vishal Verma

On 21-06-11 17:59:15, Dan Williams wrote:
> On Thu, Jun 10, 2021 at 11:57 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
> >
> > Introduce a verification mechanism for a region. Regions have complex
> > configuration requirements and it is beneficial to provide a way to
> > verify the constraints are met before trying to bind. Primarily it adds
> > ABI to inform userspace of more detailed information about what failed
> > rather than the limited choices of errno at bind time.
> 
> If you want to give userspace more data about an internal process
> that's a TRACE_EVENT().

Addressed this in the other email...

> 
> > It's important to point out that a verified region can still fail to
> > bind, but the first step in binding will be to run the same verification
> > algorithm.
> 
> I don't see this point of giving userspace a racy / less accurate
> answer than the bind result.

Hopefully I didn't forget to mention that verify() would be an optional call
from userspace. Perhaps making it debugfs makes more sense?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 4/4] cxl/region: Introduce concept of region configuration
  2021-06-11 13:52   ` Jonathan Cameron
@ 2021-06-14 16:18     ` Ben Widawsky
  2021-06-14 16:20       ` Jonathan Cameron
  0 siblings, 1 reply; 22+ messages in thread
From: Ben Widawsky @ 2021-06-14 16:18 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-cxl, Alison Schofield, Dan Williams, Ira Weiny, Vishal Verma

On 21-06-11 14:52:06, Jonathan Cameron wrote:
> On Thu, 10 Jun 2021 11:57:25 -0700
> Ben Widawsky <ben.widawsky@intel.com> wrote:
> 
> > The region creation APIs leave a region unconfigured. Configuring the
> > region will work in the same way as similar subsystems such as devdax.
> > Sysfs attrs will be provided to allow userspace to configure the region.
> > Finally once all configuration is complete, userspace may "commit" the
> > config. What the kernel decides to do after a config is committed is out
> > of scope at this point.
> > 
> > Introduced here are the most basic attributes needed to configure a
> > region.
> > 
> > A x1 interleave example is provided below:
> > 
> > decoder1.0
> > ├── create_region
> > ├── delete_region
> > ├── devtype
> > ├── locked
> > ├── region1.0:0
> > │   ├── offset
> > │   ├── size
> > │   ├── subsystem -> ../../../../../../../bus/cxl
> > │   ├── target0
> > │   ├── uevent
> > │   ├── uuid
> > │   └── verify
> > ├── size
> > ├── start
> > ├── subsystem -> ../../../../../../bus/cxl
> > ├── target_list
> > ├── target_type
> > └── uevent
> > 
> > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > ---
> >  Documentation/ABI/testing/sysfs-bus-cxl |  27 +++
> >  drivers/cxl/mem.h                       |   2 +
> >  drivers/cxl/region.c                    | 227 +++++++++++++++++++++++-
> >  3 files changed, 255 insertions(+), 1 deletion(-)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> > index 699c8514fd7b..d7174a84f70d 100644
> > --- a/Documentation/ABI/testing/sysfs-bus-cxl
> > +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> > @@ -159,3 +159,30 @@ Description:
> >  		integer is returned describing the first error found in the
> >  		configuration. A verified region can still fail binding due to
> >  		lack of resources.
> > +
> > +What:		/sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/{offset,size}
> 
> I missed this before, but why do we need X.Y in the region naming given it's
> always inside decoderX.Y.  Seems like RegionZ would work.
> 

Yes. There are two options, either we do X.Y:Z, or Z where Z is globally unique
across all decoders. The reason for this is the devices are symlinked into sysfs
and so you can't have the same Z for multiple regions in different decoders.

My preference is to have regionX.y:[0-n] for each decoder, rather than the globally
unique. I'll entertain an argument the other way though.

> > +Date:		June, 2021
> > +KernelVersion:	v5.14
> > +Contact:	linux-cxl@vger.kernel.org
> > +Description:
> > +		A region resides within an address space that is claimed by a
> > +		decoder. The region will be of some size within the address
> > +		space and at some offset that must also reside within the
> > +		address space. The size and position of the region is specified
> > +		by these attributes.
> Could perhaps reword this.  Something like.
> 
> 		The region defined by size and offset must be fully contained
> 		within the address space.
> > +
> > +What:		/sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/uuid
> > +Date:		June, 2021
> > +KernelVersion:	v5.14
> > +Contact:	linux-cxl@vger.kernel.org
> > +Description:
> > +		The unique identifier for the region.
> > +
> > +What:		/sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/target[0-15]
> > +Date:		June, 2021
> > +KernelVersion:	v5.14
> > +Contact:	linux-cxl@vger.kernel.org
> > +Description:
> > +		Memory devices are the backing storage for a region. Each target
> > +		must be populated with a memdev in order for the region to be
> > +		eligible to be activated.
> 
> How do you do that?  What is written to this file?

A name of the dev is written here, ie "mem0". I can add this to the
documentation...

> 
> > diff --git a/drivers/cxl/mem.h b/drivers/cxl/mem.h
> > index 9795aa924035..059fbf084fa1 100644
> > --- a/drivers/cxl/mem.h
> > +++ b/drivers/cxl/mem.h
> > @@ -58,6 +58,7 @@ static inline struct cxl_memdev *to_cxl_memdev(struct device *dev)
> >   * @dev: This region's device.
> >   * @id: This regions id. Id is globally unique across all regions.
> >   * @res: Address space consumed by this region.
> > + * @uuid: The UUID for this region.
> >   * @list: Node in decoders region list.
> >   * @targets: The memory devices comprising the region.
> >   */
> > @@ -65,6 +66,7 @@ struct cxl_region {
> >  	struct device dev;
> >  	int id;
> >  	struct resource res;
> > +	uuid_t uuid;
> >  	struct list_head list;
> >  	struct cxl_memdev *targets[];
> >  };
> > diff --git a/drivers/cxl/region.c b/drivers/cxl/region.c
> > index ea1ac848c713..a69ee00514cb 100644
> > --- a/drivers/cxl/region.c
> > +++ b/drivers/cxl/region.c
> > @@ -3,7 +3,9 @@
> >  #include <linux/io-64-nonatomic-lo-hi.h>
> >  #include <linux/device.h>
> >  #include <linux/module.h>
> > +#include <linux/sizes.h>
> >  #include <linux/slab.h>
> > +#include <linux/uuid.h>
> >  #include <linux/idr.h>
> >  #include "cxl.h"
> >  #include "mem.h"
> > @@ -20,15 +22,130 @@
> >   * relationship between decoder and region when the region is interleaved.
> >   */
> >  
> > +static struct cxl_region *to_cxl_region(struct device *dev);
> > +
> > +#define cxl_region_ways(region)                                                \
> > +	to_cxl_decoder((region)->dev.parent)->interleave_ways
> > +
> >  static ssize_t verify_show(struct device *dev, struct device_attribute *attr, char *buf)
> >  {
> > +	struct cxl_decoder *cxld = to_cxl_decoder(dev->parent);
> > +	struct cxl_region *region = to_cxl_region(dev);
> > +	struct resource decode_res;
> > +	int i;
> > +
> > +	decode_res = (struct resource)DEFINE_RES_MEM(cxld->range.start,
> > +						     range_len(&cxld->range));
> > +
> > +	/* Invalid region size */
> > +	if (!resource_contains(&decode_res, &region->res))
> > +		return sysfs_emit(buf, "size");
> 
> Perhaps "outside region"?  Size might be fine, but not the offset.
> Also, docs say a negative integer is returned for this attribute.
> Those docs need to call out the full list of things that might be returned
> so that userspace can know what to expect.

Okay.

> 
> > +
> > +	if (resource_size(&region->res) % (SZ_256M * cxld->interleave_ways))
> > +		return sysfs_emit(buf, "alignment");
> > +
> > +	/* Missing target memory device */
> > +	for (i = 0; i < cxld->interleave_ways; i++)
> > +		if (!region->targets[i])
> > +			return sysfs_emit(buf, "memdev");
> > +
> >  	return sysfs_emit(buf, "0");
> >  }
> >  
> >  static DEVICE_ATTR_RO(verify);
> >  
> > +static bool is_region_active(struct cxl_region *region)
> > +{
> > +	/* TODO: Regions can't be activated yet. */
> > +	return false;
> > +}
> > +
> > +static ssize_t offset_show(struct device *dev, struct device_attribute *attr,
> > +			   char *buf)
> > +{
> > +	struct cxl_region *region = to_cxl_region(dev);
> > +
> > +	return sysfs_emit(buf, "%#llx\n", region->res.start);
> > +}
> > +
> > +static ssize_t offset_store(struct device *dev, struct device_attribute *attr,
> > +			    const char *buf, size_t len)
> > +{
> > +	struct cxl_region *region = to_cxl_region(dev);
> > +	unsigned long long val;
> > +	ssize_t rc;
> > +
> > +	rc = kstrtoull(buf, 0, &val);
> > +	if (rc)
> > +		return rc;
> > +
> > +	if (is_region_active(region)) {
> > +		/* TODO: */
> > +	} else {
> > +		region->res.start = val;
> > +	}
> > +
> > +	return len;
> > +}
> > +
> > +static DEVICE_ATTR_RW(offset);
> > +
> > +static ssize_t size_show(struct device *dev, struct device_attribute *attr,
> > +			 char *buf)
> > +{
> > +	struct cxl_region *region = to_cxl_region(dev);
> > +
> > +	return sysfs_emit(buf, "%llu\n", resource_size(&region->res));
> > +}
> > +
> > +static ssize_t size_store(struct device *dev, struct device_attribute *attr,
> > +			  const char *buf, size_t len)
> > +{
> > +	struct cxl_region *region = to_cxl_region(dev);
> > +	unsigned long long val;
> > +	ssize_t rc;
> > +
> > +	rc = kstrtoull(buf, 0, &val);
> > +	if (rc)
> > +		return rc;
> > +
> > +	if (is_region_active(region)) {
> > +		/* TODO: */
> > +	} else {
> > +		region->res.end = region->res.start + val - 1;
> > +	}
> > +
> > +	return len;
> > +}
> > +static DEVICE_ATTR_RW(size);
> > +
> > +static ssize_t uuid_show(struct device *dev, struct device_attribute *attr,
> > +			 char *buf)
> > +{
> > +	struct cxl_region *region = to_cxl_region(dev);
> > +
> > +	return sysfs_emit(buf, "%pUb\n", &region->uuid);
> > +}
> > +static ssize_t uuid_store(struct device *dev, struct device_attribute *attr,
> > +			  const char *buf, size_t len)
> > +{
> > +	struct cxl_region *region = to_cxl_region(dev);
> > +	ssize_t rc;
> > +
> > +	if (len != UUID_STRING_LEN + 1)
> > +		return -EINVAL;
> > +
> > +	rc = uuid_parse(buf, &region->uuid);
> > +
> > +	return rc ? rc : len;
> > +}
> > +static DEVICE_ATTR_RW(uuid);
> > +
> >  static struct attribute *region_attrs[] = {
> >  	&dev_attr_verify.attr,
> > +	&dev_attr_offset.attr,
> > +	&dev_attr_size.attr,
> > +	&dev_attr_uuid.attr,
> >  	NULL,
> >  };
> >  
> > @@ -36,8 +153,111 @@ static const struct attribute_group region_group = {
> >  	.attrs = region_attrs,
> >  };
> >  
> > +static size_t show_targetN(struct cxl_region *region, char *buf, int n)
> > +{
> > +	if (region->targets[n])
> > +		return sysfs_emit(buf, "%s\n", dev_name(&region->targets[n]->dev));
> > +	else
> > +		return sysfs_emit(buf, "nil\n");
> 
> This needs documenting in the ABI docs. I'd I guessed it would return an empty
> string.

Okay.

> 
> > +}
> > +
> > +static size_t set_targetN(struct cxl_region *region, const char *buf, int n, size_t len)
> > +{
> > +	struct device *memdev_dev;
> > +	struct cxl_memdev *cxlmd;
> > +	ssize_t rc;
> > +	int val;
> > +
> > +	rc = kstrtoint(buf, 0, &val);
> > +	if (!rc && val == 0) {
> > +		cxlmd = region->targets[n] = cxlmd;
> > +		if (cxlmd)
> > +			put_device(&cxlmd->dev);
> > +		region->targets[n] = NULL;
> > +		return len;
> > +	}
> > +
> > +	memdev_dev = bus_find_device_by_name(&cxl_bus_type, NULL, buf);
> > +	if (!memdev_dev)
> > +		return -ENOENT;
> > +
> > +	cxlmd = to_cxl_memdev(memdev_dev);
> > +	get_device(&cxlmd->dev);
> > +	region->targets[n] = cxlmd;
> > +
> > +	return len;
> > +}
> > +
> > +#define TARGET_ATTR_RW(n)                                                      \
> > +	static ssize_t target##n##_show(                                       \
G> > +		struct device *dev, struct device_attribute *attr, char *buf)  \
> > +	{                                                                      \
> > +		return show_targetN(to_cxl_region(dev), buf, n);               \
> > +	}                                                                      \
> > +	static ssize_t target##n##_store(struct device *dev,                   \
> > +					 struct device_attribute *attr,        \
> > +					 const char *buf, size_t len)          \
> > +	{                                                                      \
> > +		return set_targetN(to_cxl_region(dev), buf, n, len);           \
> > +	}                                                                      \
> > +	static DEVICE_ATTR_RW(target##n)
> > +
> > +TARGET_ATTR_RW(0);
> > +TARGET_ATTR_RW(1);
> > +TARGET_ATTR_RW(2);
> > +TARGET_ATTR_RW(3);
> > +TARGET_ATTR_RW(4);
> > +TARGET_ATTR_RW(5);
> > +TARGET_ATTR_RW(6);
> > +TARGET_ATTR_RW(7);
> > +TARGET_ATTR_RW(8);
> > +TARGET_ATTR_RW(9);
> > +TARGET_ATTR_RW(10);
> > +TARGET_ATTR_RW(11);
> > +TARGET_ATTR_RW(12);
> > +TARGET_ATTR_RW(13);
> > +TARGET_ATTR_RW(14);
> > +TARGET_ATTR_RW(15);
> > +
> > +static struct attribute *interleave_attrs[] = {
> > +	&dev_attr_target0.attr,
> > +	&dev_attr_target1.attr,
> > +	&dev_attr_target2.attr,
> > +	&dev_attr_target3.attr,
> > +	&dev_attr_target4.attr,
> > +	&dev_attr_target5.attr,
> > +	&dev_attr_target6.attr,
> > +	&dev_attr_target7.attr,
> > +	&dev_attr_target8.attr,
> > +	&dev_attr_target9.attr,
> > +	&dev_attr_target10.attr,
> > +	&dev_attr_target11.attr,
> > +	&dev_attr_target12.attr,
> > +	&dev_attr_target13.attr,
> > +	&dev_attr_target14.attr,
> > +	&dev_attr_target15.attr,
> > +	NULL,
> > +};
> > +
> > +static umode_t visible_targets(struct kobject *kobj, struct attribute *a, int n)
> > +{
> > +	struct device *dev = container_of(kobj, struct device, kobj);
> > +	struct cxl_region *region = to_cxl_region(dev);
> > +
> > +	if (n < cxl_region_ways(region))
> > +		return a->mode;
> > +	return 0;
> > +}
> > +
> > +static const struct attribute_group region_interleave_group = {
> > +	.attrs = interleave_attrs,
> > +	.is_visible = visible_targets,
> > +};
> > +
> >  static const struct attribute_group *region_groups[] = {
> >  	&region_group,
> > +	&region_interleave_group,
> > +	NULL,
> >  };
> >  
> >  static void cxl_region_release(struct device *dev);
> > @@ -58,8 +278,13 @@ static struct cxl_region *to_cxl_region(struct device *dev)
> >  
> >  void cxl_free_region(struct cxl_decoder *cxld, struct cxl_region *region)
> >  {
> > +	int i;
> > +
> >  	ida_free(&cxld->region_ida, region->id);
> > -	kfree(region->targets);
> 
> This line looks like a bug in earlier patch as targets is allocated
> as part of the allocation of region.

Yep, thanks.

> 
> > +	for (i = 0; i < cxld->interleave_ways; i++) {
> > +		if (region->targets[i])
> > +			put_device(&region->targets[i]->dev);
> > +	}
> >  	kfree(region);
> >  }
> >  
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 4/4] cxl/region: Introduce concept of region configuration
  2021-06-14 16:18     ` Ben Widawsky
@ 2021-06-14 16:20       ` Jonathan Cameron
  0 siblings, 0 replies; 22+ messages in thread
From: Jonathan Cameron @ 2021-06-14 16:20 UTC (permalink / raw)
  To: Ben Widawsky
  Cc: linux-cxl, Alison Schofield, Dan Williams, Ira Weiny, Vishal Verma

On Mon, 14 Jun 2021 09:18:51 -0700
Ben Widawsky <ben.widawsky@intel.com> wrote:

> On 21-06-11 14:52:06, Jonathan Cameron wrote:
> > On Thu, 10 Jun 2021 11:57:25 -0700
> > Ben Widawsky <ben.widawsky@intel.com> wrote:
> >   
> > > The region creation APIs leave a region unconfigured. Configuring the
> > > region will work in the same way as similar subsystems such as devdax.
> > > Sysfs attrs will be provided to allow userspace to configure the region.
> > > Finally once all configuration is complete, userspace may "commit" the
> > > config. What the kernel decides to do after a config is committed is out
> > > of scope at this point.
> > > 
> > > Introduced here are the most basic attributes needed to configure a
> > > region.
> > > 
> > > A x1 interleave example is provided below:
> > > 
> > > decoder1.0
> > > ├── create_region
> > > ├── delete_region
> > > ├── devtype
> > > ├── locked
> > > ├── region1.0:0
> > > │   ├── offset
> > > │   ├── size
> > > │   ├── subsystem -> ../../../../../../../bus/cxl
> > > │   ├── target0
> > > │   ├── uevent
> > > │   ├── uuid
> > > │   └── verify
> > > ├── size
> > > ├── start
> > > ├── subsystem -> ../../../../../../bus/cxl
> > > ├── target_list
> > > ├── target_type
> > > └── uevent
> > > 
> > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > > ---
> > >  Documentation/ABI/testing/sysfs-bus-cxl |  27 +++
> > >  drivers/cxl/mem.h                       |   2 +
> > >  drivers/cxl/region.c                    | 227 +++++++++++++++++++++++-
> > >  3 files changed, 255 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> > > index 699c8514fd7b..d7174a84f70d 100644
> > > --- a/Documentation/ABI/testing/sysfs-bus-cxl
> > > +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> > > @@ -159,3 +159,30 @@ Description:
> > >  		integer is returned describing the first error found in the
> > >  		configuration. A verified region can still fail binding due to
> > >  		lack of resources.
> > > +
> > > +What:		/sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/{offset,size}  
> > 
> > I missed this before, but why do we need X.Y in the region naming given it's
> > always inside decoderX.Y.  Seems like RegionZ would work.
> >   
> 
> Yes. There are two options, either we do X.Y:Z, or Z where Z is globally unique
> across all decoders. The reason for this is the devices are symlinked into sysfs
> and so you can't have the same Z for multiple regions in different decoders.
> 
> My preference is to have regionX.y:[0-n] for each decoder, rather than the globally
> unique. I'll entertain an argument the other way though.

I'm happy with either approach.

Thanks,

Jonathan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/4] Region Creation
  2021-06-14 16:12   ` Ben Widawsky
@ 2021-06-14 21:04     ` Dan Williams
  2021-06-14 21:54       ` Ben Widawsky
  0 siblings, 1 reply; 22+ messages in thread
From: Dan Williams @ 2021-06-14 21:04 UTC (permalink / raw)
  To: Ben Widawsky
  Cc: linux-cxl, Alison Schofield, Ira Weiny, Jonathan Cameron, Vishal Verma

On Mon, Jun 14, 2021 at 9:12 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
>
> On 21-06-11 17:44:02, Dan Williams wrote:
> > On Thu, Jun 10, 2021 at 11:58 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
> > >
> > > CXL interleave sets and non-interleave sets are described via regions. A region
> > > is specified in the CXL 2.0 specification and the purpose is to create a
> > > standardized way to preserve the region across reboots.
> > >
> > > Introduced here is the basic mechanism to create and configure and delete a CXL
> > > region. Configuring a region simply means giving it a size, offset within the
> > > CFMWS window, UUID, and a target list. Enabling/activating a region, which
> > > ultimately means programming the HDM decoders in the chain, is left for later
> > > work.
> > >
> > > The patches are only minimally tested so far in QEMU emulation and so x1
> > > interleave is all that's supported.
> > >
> > > Here is a sample topology (also in patch #4)
> >
> > I'm just going to react to the attributes before looking at the
> > implementation to make sure we're level set.
> >
> > >
> > >     decoder1.0
> > >     ├── create_region
> > >     ├── delete_region
> > >     ├── devtype
> > >     ├── locked
> > >     ├── region1.0:0
> > >     │   ├── offset
> >
> > Is this the region's offset relative to the next available free space
> > in the parent decoder range? If this is output only I think it's ok,
> > but I think the address space allocation decision belongs to the
> > region driver at activation time. I.e. userspace does not have much of
> > a chance at specifying this relative all the other dynamic operations
> > that can be happening in the decoder.
> >
>
> This was my mistake. Offset will be determined by the driver and I intend for
> this to be read-only.
>
> > >     │   ├── size
> > >     │   ├── subsystem -> ../../../../../../../bus/cxl
> > >     │   ├── target0
> > >     │   ├── uevent
> > >     │   ├── uuid
> > >     │   └── verify
> >
> > I don't understand the role of a standalone @verify attribute, there
> > is verification that can happen per attribute write, and there is
> > final verification that can happen at region bind time. Either way
> > anything verify would check is duplicated somewhere else, and the
> > verification per attribute update is more precise. For example writes
> > to @size can check for free space in parent decoder and fail if
> > unavailable. Writes to targetX can fail if the memdev is not connected
> > to this decoder's port topology, or the memdev is out of decoder
> > resources. The final region bind will fail if mid-level switches are
> > lacking decoder resources, or would require changing a decoder
> > configuration that is pinned active.
>
> I strongly believe verification per attribute write will get too fragile. I'm
> afraid it's going to require writing attributes in a specific order so that we
> can do said verification in a sane way. We can skip that and just check it all
> on bind. Most of that logic is what would be contained in verify(), so why not
> expose it for userspace that may want to test out various configs without
> actually trying to bind?

Because there's no harm in actually trying to bind. A verify attribute
is at best redundant, or I am otherwise not understanding the proposed
use case?

> Also, I like having ABI that helps userspace get details on the configuration
> failure reason. You mention in the other reply, TRACE_EVENT. I suppose userspace
> could use tracepoints, or scrape dmesg for this same info. Maybe it's 6 one way,
> a half dozen the other. I'd be interested to know if there are other examples of
> tracepoints being used by userspace in a way like this and what the experience
> is like.
>
> To summarize, I think we need an atomic way to do verification (which obviously
> happens at bind()), and I think we need UAPI to get the configuration error.

I expect higher order configuration error reporting and non-atomic
pre-verification to come from user tooling. As for what the kernel can
do at runtime in the absence of user tooling, or in the development of
more aware tooling has been debated in the past [1]. In this case the
entire decoder resource topology is visible in userspace, and while
userspace can't atomically predict what will happen, it also does not
need to because the admin should not be racing resource querying and
resource consumption if they want to get a reliable answer. The reason
I recommended TRACE_EVENT() rather than dev_dbg() is due to being able
to filter event messages by cpu, pid, tid, uid... etc. Another
approach I have seen upstream is to emit extra variables with a
KOBJ_CHANGE event, but that is more about event reporting than extra
information about provisioning failure.

[1]: https://lwn.net/Articles/657341/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/4] Region Creation
  2021-06-14 21:04     ` Dan Williams
@ 2021-06-14 21:54       ` Ben Widawsky
  2021-06-14 22:21         ` Dan Williams
  0 siblings, 1 reply; 22+ messages in thread
From: Ben Widawsky @ 2021-06-14 21:54 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, Alison Schofield, Ira Weiny, Jonathan Cameron, Vishal Verma

On 21-06-14 14:04:32, Dan Williams wrote:
> On Mon, Jun 14, 2021 at 9:12 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
> >
> > On 21-06-11 17:44:02, Dan Williams wrote:
> > > On Thu, Jun 10, 2021 at 11:58 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
> > > >
> > > > CXL interleave sets and non-interleave sets are described via regions. A region
> > > > is specified in the CXL 2.0 specification and the purpose is to create a
> > > > standardized way to preserve the region across reboots.
> > > >
> > > > Introduced here is the basic mechanism to create and configure and delete a CXL
> > > > region. Configuring a region simply means giving it a size, offset within the
> > > > CFMWS window, UUID, and a target list. Enabling/activating a region, which
> > > > ultimately means programming the HDM decoders in the chain, is left for later
> > > > work.
> > > >
> > > > The patches are only minimally tested so far in QEMU emulation and so x1
> > > > interleave is all that's supported.
> > > >
> > > > Here is a sample topology (also in patch #4)
> > >
> > > I'm just going to react to the attributes before looking at the
> > > implementation to make sure we're level set.
> > >
> > > >
> > > >     decoder1.0
> > > >     ├── create_region
> > > >     ├── delete_region
> > > >     ├── devtype
> > > >     ├── locked
> > > >     ├── region1.0:0
> > > >     │   ├── offset
> > >
> > > Is this the region's offset relative to the next available free space
> > > in the parent decoder range? If this is output only I think it's ok,
> > > but I think the address space allocation decision belongs to the
> > > region driver at activation time. I.e. userspace does not have much of
> > > a chance at specifying this relative all the other dynamic operations
> > > that can be happening in the decoder.
> > >
> >
> > This was my mistake. Offset will be determined by the driver and I intend for
> > this to be read-only.
> >
> > > >     │   ├── size
> > > >     │   ├── subsystem -> ../../../../../../../bus/cxl
> > > >     │   ├── target0
> > > >     │   ├── uevent
> > > >     │   ├── uuid
> > > >     │   └── verify
> > >
> > > I don't understand the role of a standalone @verify attribute, there
> > > is verification that can happen per attribute write, and there is
> > > final verification that can happen at region bind time. Either way
> > > anything verify would check is duplicated somewhere else, and the
> > > verification per attribute update is more precise. For example writes
> > > to @size can check for free space in parent decoder and fail if
> > > unavailable. Writes to targetX can fail if the memdev is not connected
> > > to this decoder's port topology, or the memdev is out of decoder
> > > resources. The final region bind will fail if mid-level switches are
> > > lacking decoder resources, or would require changing a decoder
> > > configuration that is pinned active.
> >
> > I strongly believe verification per attribute write will get too fragile. I'm
> > afraid it's going to require writing attributes in a specific order so that we
> > can do said verification in a sane way. We can skip that and just check it all
> > on bind. Most of that logic is what would be contained in verify(), so why not
> > expose it for userspace that may want to test out various configs without
> > actually trying to bind?
> 
> Because there's no harm in actually trying to bind. A verify attribute
> is at best redundant, or I am otherwise not understanding the proposed
> use case?
> 

That's the use case. Though I don't consider it redundant. All bind() can return
is errnos + what you mention below (and following LWN link).

> > Also, I like having ABI that helps userspace get details on the configuration
> > failure reason. You mention in the other reply, TRACE_EVENT. I suppose userspace
> > could use tracepoints, or scrape dmesg for this same info. Maybe it's 6 one way,
> > a half dozen the other. I'd be interested to know if there are other examples of
> > tracepoints being used by userspace in a way like this and what the experience
> > is like.
> >
> > To summarize, I think we need an atomic way to do verification (which obviously
> > happens at bind()), and I think we need UAPI to get the configuration error.
> 
> I expect higher order configuration error reporting and non-atomic
> pre-verification to come from user tooling.

But isn't that just duplicating code that we have to have in the kernel anyway?

> As for what the kernel can do at runtime in the absence of user tooling, or in
> the development of more aware tooling has been debated in the past [1]. In
> this case the entire decoder resource topology is visible in userspace, an3d
> while userspace can't atomically predict what will happen, it also does not
> need to because the admin should not be racing resource querying and resource
> consumption if they want to get a reliable answer. The reason I recommended
> TRACE_EVENT() rather than dev_dbg() is due to being able to filter event
> messages by cpu, pid, tid, uid... etc. Another approach I have seen upstream
> is to emit extra variables with a KOBJ_CHANGE event, but that is more about
> event reporting than extra information about provisioning failure.

Interesting. Thanks for the link, it looks like it never landed. I think trace
makes a good deal of sense considering all the options. I'm not convinced the
interface is "at best redundant". I'll just drop verify(). I have no further
arguments in favor and you don't sound convinced of the original ones.

> 

> [1]: https://lwn.net/Articles/657341/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/4] Region Creation
  2021-06-14 21:54       ` Ben Widawsky
@ 2021-06-14 22:21         ` Dan Williams
  0 siblings, 0 replies; 22+ messages in thread
From: Dan Williams @ 2021-06-14 22:21 UTC (permalink / raw)
  To: Ben Widawsky
  Cc: linux-cxl, Alison Schofield, Ira Weiny, Jonathan Cameron, Vishal Verma

On Mon, Jun 14, 2021 at 2:54 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
>
> On 21-06-14 14:04:32, Dan Williams wrote:
> > On Mon, Jun 14, 2021 at 9:12 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
> > >
> > > On 21-06-11 17:44:02, Dan Williams wrote:
> > > > On Thu, Jun 10, 2021 at 11:58 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
> > > > >
> > > > > CXL interleave sets and non-interleave sets are described via regions. A region
> > > > > is specified in the CXL 2.0 specification and the purpose is to create a
> > > > > standardized way to preserve the region across reboots.
> > > > >
> > > > > Introduced here is the basic mechanism to create and configure and delete a CXL
> > > > > region. Configuring a region simply means giving it a size, offset within the
> > > > > CFMWS window, UUID, and a target list. Enabling/activating a region, which
> > > > > ultimately means programming the HDM decoders in the chain, is left for later
> > > > > work.
> > > > >
> > > > > The patches are only minimally tested so far in QEMU emulation and so x1
> > > > > interleave is all that's supported.
> > > > >
> > > > > Here is a sample topology (also in patch #4)
> > > >
> > > > I'm just going to react to the attributes before looking at the
> > > > implementation to make sure we're level set.
> > > >
> > > > >
> > > > >     decoder1.0
> > > > >     ├── create_region
> > > > >     ├── delete_region
> > > > >     ├── devtype
> > > > >     ├── locked
> > > > >     ├── region1.0:0
> > > > >     │   ├── offset
> > > >
> > > > Is this the region's offset relative to the next available free space
> > > > in the parent decoder range? If this is output only I think it's ok,
> > > > but I think the address space allocation decision belongs to the
> > > > region driver at activation time. I.e. userspace does not have much of
> > > > a chance at specifying this relative all the other dynamic operations
> > > > that can be happening in the decoder.
> > > >
> > >
> > > This was my mistake. Offset will be determined by the driver and I intend for
> > > this to be read-only.
> > >
> > > > >     │   ├── size
> > > > >     │   ├── subsystem -> ../../../../../../../bus/cxl
> > > > >     │   ├── target0
> > > > >     │   ├── uevent
> > > > >     │   ├── uuid
> > > > >     │   └── verify
> > > >
> > > > I don't understand the role of a standalone @verify attribute, there
> > > > is verification that can happen per attribute write, and there is
> > > > final verification that can happen at region bind time. Either way
> > > > anything verify would check is duplicated somewhere else, and the
> > > > verification per attribute update is more precise. For example writes
> > > > to @size can check for free space in parent decoder and fail if
> > > > unavailable. Writes to targetX can fail if the memdev is not connected
> > > > to this decoder's port topology, or the memdev is out of decoder
> > > > resources. The final region bind will fail if mid-level switches are
> > > > lacking decoder resources, or would require changing a decoder
> > > > configuration that is pinned active.
> > >
> > > I strongly believe verification per attribute write will get too fragile. I'm
> > > afraid it's going to require writing attributes in a specific order so that we
> > > can do said verification in a sane way. We can skip that and just check it all
> > > on bind. Most of that logic is what would be contained in verify(), so why not
> > > expose it for userspace that may want to test out various configs without
> > > actually trying to bind?
> >
> > Because there's no harm in actually trying to bind. A verify attribute
> > is at best redundant, or I am otherwise not understanding the proposed
> > use case?
> >
>
> That's the use case. Though I don't consider it redundant. All bind() can return
> is errnos + what you mention below (and following LWN link).
>
> > > Also, I like having ABI that helps userspace get details on the configuration
> > > failure reason. You mention in the other reply, TRACE_EVENT. I suppose userspace
> > > could use tracepoints, or scrape dmesg for this same info. Maybe it's 6 one way,
> > > a half dozen the other. I'd be interested to know if there are other examples of
> > > tracepoints being used by userspace in a way like this and what the experience
> > > is like.
> > >
> > > To summarize, I think we need an atomic way to do verification (which obviously
> > > happens at bind()), and I think we need UAPI to get the configuration error.
> >
> > I expect higher order configuration error reporting and non-atomic
> > pre-verification to come from user tooling.
>
> But isn't that just duplicating code that we have to have in the kernel anyway?

It is, but it's less constrained. The kernel could only tell you yes
or no for a given region verification. The tooling can identify
tradeoffs and potential resource collisions across regions.

> > As for what the kernel can do at runtime in the absence of user tooling, or in
> > the development of more aware tooling has been debated in the past [1]. In
> > this case the entire decoder resource topology is visible in userspace, an3d
> > while userspace can't atomically predict what will happen, it also does not
> > need to because the admin should not be racing resource querying and resource
> > consumption if they want to get a reliable answer. The reason I recommended
> > TRACE_EVENT() rather than dev_dbg() is due to being able to filter event
> > messages by cpu, pid, tid, uid... etc. Another approach I have seen upstream
> > is to emit extra variables with a KOBJ_CHANGE event, but that is more about
> > event reporting than extra information about provisioning failure.
>
> Interesting. Thanks for the link, it looks like it never landed. I think trace
> makes a good deal of sense considering all the options. I'm not convinced the
> interface is "at best redundant". I'll just drop verify(). I have no further
> arguments in favor and you don't sound convinced of the original ones.

My main concern is the atomicity of the verify response when two
agents are racing through the resource tree. The user tooling can
either explicitly coordinate with other user tooling, or the admin
could implicitly avoid races. When the races happen they will likely
be cases where bind() fails and verify() said everything was ok. In
those cases we'll still need a mechanism to understand why bind()
failed and by that time it seems everything verify() might have been
able to offer has been reimplemented for tracing bind().

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/4] cxl/region: Add region creation ABI
  2021-06-11 13:31   ` Jonathan Cameron
@ 2021-06-16 17:38     ` Ben Widawsky
  0 siblings, 0 replies; 22+ messages in thread
From: Ben Widawsky @ 2021-06-16 17:38 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-cxl, Alison Schofield, Dan Williams, Ira Weiny, Vishal Verma

On 21-06-11 14:31:43, Jonathan Cameron wrote:
> On Thu, 10 Jun 2021 11:57:22 -0700
> Ben Widawsky <ben.widawsky@intel.com> wrote:
> 
> > Regions are created as a child of the decoder that encompasses an
> > address space with constraints. Regions only exist for persistent
> > capacities.
> > 
> > As an example, creating a region with:
> > echo 1 > /sys/bus/cxl/devices/decoder1.0/create_region
> > 
> > Will yield /sys/bus/cxl/devices/decoder1.0/region1.0:0
> > 
> > That region may then be deleted with:
> > echo "region1.0:0 > /sys/bus/cxl/devices/decoder1.0/delete_region
> > 
> > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> Minor stuff inline.
> 
> > ---
> >  Documentation/ABI/testing/sysfs-bus-cxl       |  19 +++
> >  .../driver-api/cxl/memory-devices.rst         |   8 +
> >  drivers/cxl/Makefile                          |   2 +-
> >  drivers/cxl/core.c                            |  71 ++++++++
> >  drivers/cxl/cxl.h                             |  11 ++
> >  drivers/cxl/mem.h                             |  19 +++
> >  drivers/cxl/region.c                          | 155 ++++++++++++++++++
> >  7 files changed, 284 insertions(+), 1 deletion(-)
> >  create mode 100644 drivers/cxl/region.c
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> > index 0b6a2e6e8fbb..5bcbefd4ea38 100644
> > --- a/Documentation/ABI/testing/sysfs-bus-cxl
> > +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> > @@ -127,3 +127,22 @@ Description:
> >  		memory (type-3). The 'target_type' attribute indicates the
> >  		current setting which may dynamically change based on what
> >  		memory regions are activated in this decode hierarchy.
> > +
> > +What:		/sys/bus/cxl/devices/decoderX.Y/create_region
> > +Date:		June, 2021
> > +KernelVersion:	v5.14
> > +Contact:	linux-cxl@vger.kernel.org
> > +Description:
> > +		Writing a value of '1' will create a new uninitialized region
> > +		that will be mapped by the CXL decoderX.Y. Reading from this
> > +		node will return the last created region. Regions must be
> > +		subsequently configured and bound to a region driver before they
> > +		can be used.
> > +
> > +What:		/sys/bus/cxl/devices/decoderX.Y/delete_region
> > +Date:		June, 2021
> > +KernelVersion:	v5.14
> > +Contact:	linux-cxl@vger.kernel.org
> > +Description:
> > +		Deletes the named region. A region must be unbound from the
> > +		region driver before being deleted.
> 
> Perhaps call out an example of what a name looks like?
> 
> > diff --git a/Documentation/driver-api/cxl/memory-devices.rst b/Documentation/driver-api/cxl/memory-devices.rst
> > index 487ce4f41d77..3066638b087f 100644
> > --- a/Documentation/driver-api/cxl/memory-devices.rst
> > +++ b/Documentation/driver-api/cxl/memory-devices.rst
> > @@ -39,6 +39,14 @@ CXL Core
> >  .. kernel-doc:: drivers/cxl/core.c
> >     :doc: cxl core
> >  
> > +CXL Regions
> > +-----------
> > +.. kernel-doc:: drivers/cxl/region.c
> > +   :doc: cxl region
> > +
> > +.. kernel-doc:: drivers/cxl/region.c
> > +   :identifiers:
> > +
> >  External Interfaces
> >  ===================
> >  
> > diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile
> > index a29efb3e8ad2..2b9dd9187788 100644
> > --- a/drivers/cxl/Makefile
> > +++ b/drivers/cxl/Makefile
> > @@ -4,6 +4,6 @@ obj-$(CONFIG_CXL_MEM) += cxl_pci.o
> >  obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o
> >  
> >  ccflags-y += -DDEFAULT_SYMBOL_NAMESPACE=CXL
> > -cxl_core-y := core.o
> > +cxl_core-y := core.o region.o
> >  cxl_pci-y := pci.o
> >  cxl_acpi-y := acpi.o
> > diff --git a/drivers/cxl/core.c b/drivers/cxl/core.c
> > index 1b9ee0b08384..cda09a9cd98e 100644
> > --- a/drivers/cxl/core.c
> > +++ b/drivers/cxl/core.c
> > @@ -7,6 +7,7 @@
> >  #include <linux/slab.h>
> >  #include <linux/idr.h>
> >  #include "cxl.h"
> > +#include "mem.h"
> 
> This seems to be breaking the nice separation in cxl.
> Perhaps add a region.h instead?
> 
> >  
> >  /**
> >   * DOC: cxl core
> > @@ -119,7 +120,68 @@ static ssize_t target_list_show(struct device *dev,
> >  }
> >  static DEVICE_ATTR_RO(target_list);
> >  
> > +static ssize_t create_region_show(struct device *dev,
> > +				  struct device_attribute *attr, char *buf)
> > +{
> > +	struct cxl_decoder *cxld = to_cxl_decoder(dev);
> > +	int rc;
> > +
> > +	device_lock(dev);
> > +	rc = sprintf(buf, "%s\n",
> > +		     cxld->youngest ? dev_name(&cxld->youngest->dev) : "");
> > +	device_unlock(dev);
> > +
> > +	return rc;
> > +}
> > +
> > +static ssize_t create_region_store(struct device *dev,
> > +				   struct device_attribute *attr,
> > +				   const char *buf, size_t len)
> > +{
> > +	struct cxl_decoder *cxld = to_cxl_decoder(dev);
> > +	struct cxl_region *region;
> > +	ssize_t rc;
> > +	int val;
> > +
> > +	rc = kstrtoint(buf, 0, &val);
> > +	if (rc)
> > +		return rc;
> > +	if (val != 1)
> > +		return -EINVAL;
> > +
> > +	region = cxl_alloc_region(cxld);
> > +	if (IS_ERR(region))
> > +		return PTR_ERR(region);
> > +
> > +	rc = cxl_add_region(cxld, region);
> > +	if (rc) {
> > +		cxl_free_region(cxld, region);
> > +		return rc;
> > +	}
> > +
> > +	cxld->youngest = region;
> 
> Care needed I think if a delete is called on whatever
> is the youngest region then this file read.
> 

I took all your other feedback, thanks. Good catch here. I *intended* youngest to
be racy (for simplicity sake, since it's informational only), but as it stands
here, it can oops.

> > +	return len;
> > +}
> > +static DEVICE_ATTR_RW(create_region);
> > +
> > +static ssize_t delete_region_store(struct device *dev,
> > +				   struct device_attribute *attr,
> > +				   const char *buf, size_t len)
> > +{
> > +	struct cxl_decoder *cxld = to_cxl_decoder(dev);
> > +	int rc;
> > +
> > +	rc = cxl_delete_region(cxld, buf);
> > +	if (rc)
> > +		return rc;
> > +
> > +	return len;
> > +}
> > +static DEVICE_ATTR_WO(delete_region);
> > +
> >  static struct attribute *cxl_decoder_base_attrs[] = {
> > +	&dev_attr_create_region.attr,
> > +	&dev_attr_delete_region.attr,
> >  	&dev_attr_start.attr,
> >  	&dev_attr_size.attr,
> >  	&dev_attr_locked.attr,
> > @@ -170,7 +232,13 @@ static void cxl_decoder_release(struct device *dev)
> >  {
> >  	struct cxl_decoder *cxld = to_cxl_decoder(dev);
> >  	struct cxl_port *port = to_cxl_port(dev->parent);
> > +	struct cxl_region *region;
> >  
> > +	list_for_each_entry(region, &cxld->regions, list) {
> > +		cxl_delete_region(cxld, dev_name(&region->dev));
> > +	}
> 
> Nitpick, brackets not needed.
> 
> > +	dev_WARN_ONCE(dev, !ida_is_empty(&cxld->region_ida),
> > +		      "Lost track of a region");
> >  	ida_free(&port->decoder_ida, cxld->id);
> >  	kfree(cxld);
> >  }
> > @@ -475,8 +543,11 @@ cxl_decoder_alloc(struct cxl_port *port, int nr_targets, resource_size_t base,
> >  		.interleave_ways = interleave_ways,
> >  		.interleave_granularity = interleave_granularity,
> >  		.target_type = type,
> > +		.regions = LIST_HEAD_INIT(cxld->regions),
> >  	};
> >  
> > +	ida_init(&cxld->region_ida);
> > +
> >  	/* handle implied target_list */
> >  	if (interleave_ways == 1)
> >  		cxld->target[0] =
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index b988ea288f53..1ffc5e07e24d 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -7,6 +7,7 @@
> >  #include <linux/bitfield.h>
> >  #include <linux/bitops.h>
> >  #include <linux/io.h>
> > +#include <linux/uuid.h>
> >  
> >  /**
> >   * DOC: cxl objects
> > @@ -182,6 +183,9 @@ enum cxl_decoder_type {
> >   * @interleave_granularity: data stride per dport
> >   * @target_type: accelerator vs expander (type2 vs type3) selector
> >   * @flags: memory type capabilities and locking
> > + * @region_ida: allocator for region ids.
> > + * @regions: List of regions mapped (may be disabled) by this decoder.
> > + * @youngest: Last region created for this decoder.
> >   * @target: active ordered target list in current decoder configuration
> >   */
> >  struct cxl_decoder {
> > @@ -192,6 +196,9 @@ struct cxl_decoder {
> >  	int interleave_granularity;
> >  	enum cxl_decoder_type target_type;
> >  	unsigned long flags;
> > +	struct ida region_ida;
> > +	struct list_head regions;
> > +	struct cxl_region *youngest;
> >  	struct cxl_dport *target[];
> >  };
> >  
> > @@ -231,6 +238,10 @@ struct cxl_dport {
> >  	struct list_head list;
> >  };
> >  
> > +struct cxl_region *cxl_alloc_region(struct cxl_decoder *cxld);
> > +void cxl_free_region(struct cxl_decoder *cxld, struct cxl_region *region);
> > +int cxl_add_region(struct cxl_decoder *cxld, struct cxl_region *region);
> > +int cxl_delete_region(struct cxl_decoder *cxld, const char *region);
> >  struct cxl_port *to_cxl_port(struct device *dev);
> >  struct cxl_port *devm_cxl_add_port(struct device *host, struct device *uport,
> >  				   resource_size_t component_reg_phys,
> > diff --git a/drivers/cxl/mem.h b/drivers/cxl/mem.h
> > index 13868ff7cadf..4c4eb06a966b 100644
> > --- a/drivers/cxl/mem.h
> > +++ b/drivers/cxl/mem.h
> > @@ -3,6 +3,8 @@
> >  #ifndef __CXL_MEM_H__
> >  #define __CXL_MEM_H__
> >  
> > +#include <linux/cdev.h>
> 
> Unrelated?
> 
> > +
> >  /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
> >  #define CXLMDEV_STATUS_OFFSET 0x0
> >  #define   CXLMDEV_DEV_FATAL BIT(0)
> > @@ -46,6 +48,23 @@ struct cxl_memdev {
> >  	int id;
> >  };
> >  
> > +/**
> > + * struct cxl_region - CXL region
> > + * @dev: This region's device.
> > + * @id: This regions id. Id is globally unique across all regions.
> > + * @res: Address space consumed by this region.
> > + * @list: Node in decoders region list.
> > + * @targets: The memory devices comprising the region.
> > + */
> > +struct cxl_region {
> > +	struct device dev;
> > +	int id;
> > +	struct resource res;
> > +	struct list_head list;
> > +	struct cxl_memdev *targets[];
> > +};
> > +
> > +
> 
> One line only.
> 
> >  /**
> >   * struct cxl_mem - A CXL memory device
> >   * @pdev: The PCI device associated with this CXL device.
> > diff --git a/drivers/cxl/region.c b/drivers/cxl/region.c
> > new file mode 100644
> > index 000000000000..1f47bc17bd50
> > --- /dev/null
> > +++ b/drivers/cxl/region.c
> > @@ -0,0 +1,155 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/* Copyright(c) 2021 Intel Corporation. All rights reserved. */
> > +#include <linux/io-64-nonatomic-lo-hi.h>
> > +#include <linux/device.h>
> > +#include <linux/module.h>
> > +#include <linux/slab.h>
> > +#include <linux/idr.h>
> > +#include "cxl.h"
> > +#include "mem.h"
> > +
> > +/**
> > + * DOC: cxl region
> > + *
> > + * A CXL region encompasses a chunk of host physical address space that may be
> > + * consumed by a single device (x1 interleave aka linear) or across multiple
> > + * devices (xN interleaved). A region is a child device of a &struct
> > + * cxl_decoder. There may be multiple active regions under a single &struct
> > + * cxl_decoder. The common case for multiple regions would be several linear,
> > + * contiguous regions under a single decoder. Generally, there will be a 1:1
> > + * relationship between decoder and region when the region is interleaved.
> > + */
> > +
> > +static void cxl_region_release(struct device *dev);
> > +
> > +static const struct device_type cxl_region_type = {
> > +	.name = "cxl_region",
> > +	.release = cxl_region_release,
> > +};
> > +
> > +static struct cxl_region *to_cxl_region(struct device *dev)
> > +{
> > +	if (dev_WARN_ONCE(dev, dev->type != &cxl_region_type,
> > +			  "not a cxl_region device\n"))
> > +		return NULL;
> > +
> > +	return container_of(dev, struct cxl_region, dev);
> > +}
> > +
> > +void cxl_free_region(struct cxl_decoder *cxld, struct cxl_region *region)
> > +{
> > +	ida_free(&cxld->region_ida, region->id);
> > +	kfree(region->targets);
> > +	kfree(region);
> > +}
> > +
> > +static void cxl_region_release(struct device *dev)
> > +{
> > +	struct cxl_decoder *cxld = to_cxl_decoder(dev->parent);
> > +	struct cxl_region *region;
> 	struct cxl_region *region = to_cxl_region(dev);
> ?
> Or just roll it into one line.
> 
> cxl_free_region(to_cxl_decoder(dev->parent), to_cxl_region(dev));
> 
> Of course may be affected by later patches so this doesn't make sense
> but I'm too lazy to check!
> 
> > +
> > +	region = to_cxl_region(dev);
> > +	cxl_free_region(cxld, region);
> > +}
> > +
> > +struct cxl_region *cxl_alloc_region(struct cxl_decoder *cxld)
> > +{
> > +	struct cxl_region *region;
> > +	int rc;
> > +
> > +	region = kzalloc(struct_size(region, targets, cxld->interleave_ways),
> > +			 GFP_KERNEL);
> > +	if (!region) {
> > +		pr_err("No memory\n");
> 
> Drop, or move to dev_err? 
> 
> > +		return ERR_PTR(-ENOMEM);
> > +	}
> > +
> > +	rc = ida_alloc(&cxld->region_ida, GFP_KERNEL);
> > +	if (rc < 0) {
> > +		dev_err(&cxld->dev, "Couldn't get a new id\n");
> > +		kfree(region);
> > +		return ERR_PTR(rc);
> > +	}
> > +	region->id = rc;
> > +
> > +	return region;
> > +}
> > +
> > +/**
> > + * cxl_add_region - Adds a region to a decoder
> > + * @cxld: Parent decoder.
> > + * @region: Region to be added to the decoder.
> > + *
> > + * This is the second step of region initialization. Regions exist within an
> > + * address space which is mapped by a @cxld, and that @cxld enforces constraints
> > + * upon the region as it is configured. Regions may be added to a @cxld but not
> > + * activated and therefore it is possible to have more regions in a @cxld than
> > + * there are interleave ways in the @cxld. Regions exist only for persistent
> > + * capacities.
> > + *
> > + * Return: zero if the region was added to the @cxld, else returns negative
> > + * error code.
> > + */
> > +int cxl_add_region(struct cxl_decoder *cxld, struct cxl_region *region)
> > +{
> > +	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
> > +	struct device *dev = &region->dev;
> > +	struct resource *res;
> > +	int rc;
> > +
> > +	device_initialize(dev);
> > +	dev->parent = &cxld->dev;
> > +	dev->bus = &cxl_bus_type;
> > +	dev->type = &cxl_region_type;
> 
> Probably doesn't want the pm controls.
> 
> > +	rc = dev_set_name(dev, "region%d.%d:%d", port->id, cxld->id, region->id);
> > +	if (rc)
> > +		goto err;
> > +
> > +	rc = device_add(dev);
> > +	if (rc)
> > +		goto err;
> > +
> > +	res = &region->res;
> > +	res->name = "Persistent Memory";
> > +	res->start = 0;
> > +	res->end = -1;
> > +	res->flags = IORESOURCE_MEM;
> > +	res->desc = IORES_DESC_PERSISTENT_MEMORY;
> > +
> > +	dev_dbg(dev, "Added %s to %s\n", dev_name(dev), dev_name(&cxld->dev));
> > +
> > +	return 0;
> > +
> > +err:
> > +	put_device(dev);
> > +	return rc;
> > +}
> > +
> > +static struct cxl_region *
> > +cxl_find_region_by_name(struct cxl_decoder *cxld, const char *name)
> > +{
> > +	struct device *region_dev;
> > +
> > +	region_dev = device_find_child_by_name(&cxld->dev, name);
> > +	if (!region_dev)
> > +		return ERR_PTR(-ENOENT);
> > +
> > +	return to_cxl_region(region_dev);
> > +}
> > +
> > +int cxl_delete_region(struct cxl_decoder *cxld, const char *region_name)
> > +{
> > +	struct cxl_region *region;
> > +
> > +	region = cxl_find_region_by_name(cxld, region_name);
> > +	if (IS_ERR(region))
> > +		return PTR_ERR(region);
> > +
> > +	dev_dbg(&cxld->dev, "Requested removal of %s from %s\n",
> > +		dev_name(&region->dev), dev_name(&cxld->dev));
> > +
> > +	device_unregister(&region->dev);
> > +	put_device(&region->dev);
> > +
> > +	return 0;
> > +}
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2021-06-16 17:38 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-10 18:57 [RFC PATCH 0/4] Region Creation Ben Widawsky
2021-06-10 18:57 ` [RFC PATCH 1/4] cxl/region: Add region creation ABI Ben Widawsky
2021-06-11 13:31   ` Jonathan Cameron
2021-06-16 17:38     ` Ben Widawsky
2021-06-10 18:57 ` [RFC PATCH 2/4] cxl/region: Create attribute structure / verify Ben Widawsky
2021-06-11 13:37   ` Jonathan Cameron
2021-06-12  0:59   ` Dan Williams
2021-06-14 16:12     ` Ben Widawsky
2021-06-10 18:57 ` [RFC PATCH 3/4] cxl: Move cxl_memdev conversion helper to mem.h Ben Widawsky
2021-06-10 18:57 ` [RFC PATCH 4/4] cxl/region: Introduce concept of region configuration Ben Widawsky
2021-06-11 13:52   ` Jonathan Cameron
2021-06-14 16:18     ` Ben Widawsky
2021-06-14 16:20       ` Jonathan Cameron
2021-06-11 13:11 ` [RFC PATCH 0/4] Region Creation Jonathan Cameron
2021-06-11 13:53   ` Jonathan Cameron
2021-06-11 16:12     ` Ben Widawsky
2021-06-12  0:44 ` Dan Williams
2021-06-14  8:20   ` Jonathan Cameron
2021-06-14 16:12   ` Ben Widawsky
2021-06-14 21:04     ` Dan Williams
2021-06-14 21:54       ` Ben Widawsky
2021-06-14 22:21         ` Dan Williams

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).