linux-cxl.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/2] Region creation/configuration ABI
@ 2022-02-25  6:00 Ben Widawsky
  2022-02-25  6:00 ` [RFC PATCH 1/2] cxl/region: Add region creation ABI Ben Widawsky
  2022-02-25  6:00 ` [RFC PATCH 2/2] cxl/region: Introduce concept of region configuration Ben Widawsky
  0 siblings, 2 replies; 10+ messages in thread
From: Ben Widawsky @ 2022-02-25  6:00 UTC (permalink / raw)
  To: linux-cxl
  Cc: patches, Ben Widawsky, Alison Schofield, Dan Williams, Ira Weiny,
	Jonathan Cameron, Vishal Verma

I'm looking for feedback on the next generation of the region configuration and
creation ABI. The primary changes from the v3 posting of the region driver [1]
are:
- Decoders instead of memdevs for targets
- Substantial rework around creation
- configuration sanitization at ABI boundary
- teardown region on memdev/decoder removal

With the new changes, it makes the most sense to rework the series so that
existing region enumeration (setup by boot firmware) be done prior to allowing
region creation. This is because the changes requested will require claiming
resources at configuration time rather than region->probe() time.

That said, these patches do reflect the proposed ABI and they can be reviewed in
parallel with that work. I'd appreciate feedback around whether the proposed ABI
is missing anything, and, whether there are inherent flaws in the
implementation.

[1]: https://lore.kernel.org/linux-cxl/20220128002707.391076-1-ben.widawsky@intel.com/

Ben Widawsky (2):
  cxl/region: Add region creation ABI
  cxl/region: Introduce concept of region configuration

 Documentation/ABI/testing/sysfs-bus-cxl       |  82 +++
 .../driver-api/cxl/memory-devices.rst         |  11 +
 drivers/cxl/core/Makefile                     |   1 +
 drivers/cxl/core/core.h                       |   3 +
 drivers/cxl/core/port.c                       |  26 +
 drivers/cxl/core/region.c                     | 568 ++++++++++++++++++
 drivers/cxl/cxl.h                             |  21 +-
 drivers/cxl/region.h                          |  93 +++
 tools/testing/cxl/Kbuild                      |   1 +
 9 files changed, 805 insertions(+), 1 deletion(-)
 create mode 100644 drivers/cxl/core/region.c
 create mode 100644 drivers/cxl/region.h


base-commit: 3bdf187d313e067de2a81109f9a1dd3da7f3dc2c
-- 
2.35.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [RFC PATCH 1/2] cxl/region: Add region creation ABI
  2022-02-25  6:00 [RFC PATCH 0/2] Region creation/configuration ABI Ben Widawsky
@ 2022-02-25  6:00 ` Ben Widawsky
  2022-02-28 23:48   ` Dan Williams
  2022-02-25  6:00 ` [RFC PATCH 2/2] cxl/region: Introduce concept of region configuration Ben Widawsky
  1 sibling, 1 reply; 10+ messages in thread
From: Ben Widawsky @ 2022-02-25  6:00 UTC (permalink / raw)
  To: linux-cxl
  Cc: patches, Ben Widawsky, Alison Schofield, Dan Williams, Ira Weiny,
	Jonathan Cameron, Vishal Verma

Regions are created as a child of the decoder that encompasses an
address space with constraints. Regions have a number of attributes that
must be configured before the region can be activated.

The ABI is not meant to be secure, but is meant to avoid accidental
races. As a result, a buggy process may create a region by name that was
allocated by a different process. However, multiple processes which are
trying not to race with each other shouldn't need special
synchronization to do so.

// Allocate a new region name
region=$(cat /sys/bus/cxl/devices/decoder0.0/create_region)

// Create a new region by name
while
region=$(cat /sys/bus/cxl/devices/decoder0.0/create_region)
! echo $region > /sys/bus/cxl/devices/decoder0.0/create_region
do true; done

// Region now exists in sysfs
stat -t /sys/bus/cxl/devices/decoder0.0/$region

// Delete the region, and name
echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>

---
Changes since v5 (all Dan):
- Fix erroneous return on create
- Fix ida leak on error
- forward declare to_cxl_region instead of cxl_region_release
- Use REGION_DEAD in the right place
- Allocate next id in region_alloc
---
 Documentation/ABI/testing/sysfs-bus-cxl       |  23 ++
 .../driver-api/cxl/memory-devices.rst         |  11 +
 drivers/cxl/core/Makefile                     |   1 +
 drivers/cxl/core/core.h                       |   3 +
 drivers/cxl/core/port.c                       |  18 ++
 drivers/cxl/core/region.c                     | 223 ++++++++++++++++++
 drivers/cxl/cxl.h                             |   5 +
 drivers/cxl/region.h                          |  28 +++
 tools/testing/cxl/Kbuild                      |   1 +
 9 files changed, 313 insertions(+)
 create mode 100644 drivers/cxl/core/region.c
 create mode 100644 drivers/cxl/region.h

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 7c2b846521f3..e5db45ea70ad 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -163,3 +163,26 @@ Description:
 		memory (type-3). The 'target_type' attribute indicates the
 		current setting which may dynamically change based on what
 		memory regions are activated in this decode hierarchy.
+
+What:		/sys/bus/cxl/devices/decoderX.Y/create_region
+Date:		January, 2022
+KernelVersion:	v5.18
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		Write a value of the form 'regionX.Y:Z' to instantiate a new
+		region within the decode range bounded by decoderX.Y. The value
+		written must match the current value returned from reading this
+		attribute. This behavior lets the kernel arbitrate racing
+		attempts to create a region. The thread that fails to write
+		loops and tries the next value. Regions must be created for root
+		decoders, and must subsequently configured and bound to a region
+		driver before they can be used.
+
+What:		/sys/bus/cxl/devices/decoderX.Y/delete_region
+Date:		January, 2022
+KernelVersion:	v5.18
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		Deletes the named region.  The attribute expects a region in the
+		form "regionX.Y:Z". The region's name, allocated by reading
+		create_region, will also be released.
diff --git a/Documentation/driver-api/cxl/memory-devices.rst b/Documentation/driver-api/cxl/memory-devices.rst
index db476bb170b6..66ddc58a21b1 100644
--- a/Documentation/driver-api/cxl/memory-devices.rst
+++ b/Documentation/driver-api/cxl/memory-devices.rst
@@ -362,6 +362,17 @@ CXL Core
 .. kernel-doc:: drivers/cxl/core/mbox.c
    :doc: cxl mbox
 
+CXL Regions
+-----------
+.. kernel-doc:: drivers/cxl/region.h
+   :identifiers:
+
+.. kernel-doc:: drivers/cxl/core/region.c
+   :doc: cxl core region
+
+.. kernel-doc:: drivers/cxl/core/region.c
+   :identifiers:
+
 External Interfaces
 ===================
 
diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
index 6d37cd78b151..39ce8f2f2373 100644
--- a/drivers/cxl/core/Makefile
+++ b/drivers/cxl/core/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_CXL_BUS) += cxl_core.o
 ccflags-y += -I$(srctree)/drivers/cxl
 cxl_core-y := port.o
 cxl_core-y += pmem.o
+cxl_core-y += region.o
 cxl_core-y += regs.o
 cxl_core-y += memdev.o
 cxl_core-y += mbox.o
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 1a50c0fc399c..adfd42370b28 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -9,6 +9,9 @@ extern const struct device_type cxl_nvdimm_type;
 
 extern struct attribute_group cxl_base_attribute_group;
 
+extern struct device_attribute dev_attr_create_region;
+extern struct device_attribute dev_attr_delete_region;
+
 struct cxl_send_command;
 struct cxl_mem_query_commands;
 int cxl_query_cmd(struct cxl_memdev *cxlmd,
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 1e785a3affaa..f3e1313217a8 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -9,6 +9,7 @@
 #include <linux/idr.h>
 #include <cxlmem.h>
 #include <cxlpci.h>
+#include <region.h>
 #include <cxl.h>
 #include "core.h"
 
@@ -213,6 +214,8 @@ static struct attribute_group cxl_decoder_base_attribute_group = {
 };
 
 static struct attribute *cxl_decoder_root_attrs[] = {
+	&dev_attr_create_region.attr,
+	&dev_attr_delete_region.attr,
 	&dev_attr_cap_pmem.attr,
 	&dev_attr_cap_ram.attr,
 	&dev_attr_cap_type2.attr,
@@ -270,6 +273,8 @@ static void cxl_decoder_release(struct device *dev)
 	struct cxl_decoder *cxld = to_cxl_decoder(dev);
 	struct cxl_port *port = to_cxl_port(dev->parent);
 
+	ida_free(&cxld->region_ida, cxld->next_region_id);
+	ida_destroy(&cxld->region_ida);
 	ida_free(&port->decoder_ida, cxld->id);
 	kfree(cxld);
 }
@@ -1244,6 +1249,13 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
 	cxld->target_type = CXL_DECODER_EXPANDER;
 	cxld->platform_res = (struct resource)DEFINE_RES_MEM(0, 0);
 
+	mutex_init(&cxld->id_lock);
+	ida_init(&cxld->region_ida);
+	rc = ida_alloc(&cxld->region_ida, GFP_KERNEL);
+	if (rc < 0)
+		goto err;
+
+	cxld->next_region_id = rc;
 	return cxld;
 err:
 	kfree(cxld);
@@ -1502,6 +1514,12 @@ bool schedule_cxl_memdev_detach(struct cxl_memdev *cxlmd)
 }
 EXPORT_SYMBOL_NS_GPL(schedule_cxl_memdev_detach, CXL);
 
+bool schedule_cxl_region_unregister(struct cxl_region *cxlr)
+{
+	return queue_work(cxl_bus_wq, &cxlr->unregister_work);
+}
+EXPORT_SYMBOL_NS_GPL(schedule_cxl_region_unregister, CXL);
+
 /* for user tooling to ensure port disable work has completed */
 static ssize_t flush_store(struct bus_type *bus, const char *buf, size_t count)
 {
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
new file mode 100644
index 000000000000..a934938f8630
--- /dev/null
+++ b/drivers/cxl/core/region.c
@@ -0,0 +1,223 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/idr.h>
+#include <region.h>
+#include <cxl.h>
+#include "core.h"
+
+/**
+ * DOC: cxl core region
+ *
+ * CXL Regions represent mapped memory capacity in system physical address
+ * space. Whereas the CXL Root Decoders identify the bounds of potential CXL
+ * Memory ranges, Regions represent the active mapped capacity by the HDM
+ * Decoder Capability structures throughout the Host Bridges, Switches, and
+ * Endpoints in the topology.
+ */
+
+
+static struct cxl_region *to_cxl_region(struct device *dev);
+
+static void cxl_region_release(struct device *dev)
+{
+	struct cxl_decoder *cxld = to_cxl_decoder(dev->parent);
+	struct cxl_region *cxlr = to_cxl_region(dev);
+
+	dev_dbg(&cxld->dev, "Releasing %s\n", dev_name(dev));
+	ida_free(&cxld->region_ida, cxlr->id);
+	kfree(cxlr);
+	put_device(&cxld->dev);
+}
+
+static const struct device_type cxl_region_type = {
+	.name = "cxl_region",
+	.release = cxl_region_release,
+};
+
+static struct cxl_region *to_cxl_region(struct device *dev)
+{
+	if (dev_WARN_ONCE(dev, dev->type != &cxl_region_type,
+			  "not a cxl_region device\n"))
+		return NULL;
+
+	return container_of(dev, struct cxl_region, dev);
+}
+
+static void unregister_region(struct work_struct *work)
+{
+	struct cxl_region *cxlr;
+
+	cxlr = container_of(work, typeof(*cxlr), unregister_work);
+	device_unregister(&cxlr->dev);
+}
+
+static void schedule_unregister(void *cxlr)
+{
+	schedule_cxl_region_unregister(cxlr);
+}
+
+static struct cxl_region *cxl_region_alloc(struct cxl_decoder *cxld)
+{
+	struct cxl_region *cxlr;
+	struct device *dev;
+	int rc;
+
+	lockdep_assert_held(&cxld->id_lock);
+
+	rc = ida_alloc(&cxld->region_ida, GFP_KERNEL);
+	if (rc < 0) {
+		dev_dbg(dev, "Failed to get next cached id (%d)\n", rc);
+		return ERR_PTR(rc);
+	}
+
+	cxlr = kzalloc(sizeof(*cxlr), GFP_KERNEL);
+	if (!cxlr) {
+		ida_free(&cxld->region_ida, rc);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	cxld->next_region_id = rc;
+
+	dev = &cxlr->dev;
+	device_initialize(dev);
+	dev->parent = &cxld->dev;
+	device_set_pm_not_required(dev);
+	dev->bus = &cxl_bus_type;
+	dev->type = &cxl_region_type;
+	INIT_WORK(&cxlr->unregister_work, unregister_region);
+
+	return cxlr;
+}
+
+/**
+ * devm_cxl_add_region - Adds a region to a decoder
+ * @cxld: Parent decoder.
+ * @cxlr: Region to be added to the decoder.
+ *
+ * This is the second step of region initialization. Regions exist within an
+ * address space which is mapped by a @cxld. That @cxld must be a root decoder,
+ * and it enforces constraints upon the region as it is configured.
+ *
+ * Return: 0 if the region was added to the @cxld, else returns negative error
+ * code. The region will be named "regionX.Y.Z" where X is the port, Y is the
+ * decoder id, and Z is the region number.
+ */
+static struct cxl_region *devm_cxl_add_region(struct cxl_decoder *cxld)
+{
+	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
+	struct cxl_region *cxlr;
+	struct device *dev;
+	int rc;
+
+	cxlr = cxl_region_alloc(cxld);
+	if (IS_ERR(cxlr))
+		return cxlr;
+
+	dev = &cxlr->dev;
+
+	cxlr->id = cxld->next_region_id;
+	rc = dev_set_name(dev, "region%d.%d:%d", port->id, cxld->id, cxlr->id);
+	if (rc)
+		goto err_out;
+
+	/* affirm that release will have access to the decoder's region ida  */
+	get_device(&cxld->dev);
+
+	rc = device_add(dev);
+	if (rc)
+		goto err_put;
+
+	rc = devm_add_action_or_reset(port->uport, schedule_unregister, cxlr);
+	if (rc)
+		goto err_put;
+
+	return cxlr;
+
+err_put:
+	put_device(&cxld->dev);
+
+err_out:
+	put_device(dev);
+	return ERR_PTR(rc);
+}
+
+static ssize_t create_region_show(struct device *dev,
+				  struct device_attribute *attr, char *buf)
+{
+	struct cxl_port *port = to_cxl_port(dev->parent);
+	struct cxl_decoder *cxld = to_cxl_decoder(dev);
+
+	return sysfs_emit(buf, "region%d.%d:%d\n", port->id, cxld->id,
+			  cxld->next_region_id);
+}
+
+static ssize_t create_region_store(struct device *dev,
+				   struct device_attribute *attr,
+				   const char *buf, size_t len)
+{
+	struct cxl_port *port = to_cxl_port(dev->parent);
+	struct cxl_decoder *cxld = to_cxl_decoder(dev);
+	struct cxl_region *cxlr;
+	int d, p, r, rc = 0;
+
+	if (sscanf(buf, "region%d.%d:%d", &p, &d, &r) != 3)
+		return -EINVAL;
+
+	if (port->id != p || cxld->id != d)
+		return -EINVAL;
+
+	rc = mutex_lock_interruptible(&cxld->id_lock);
+	if (rc)
+		return rc;
+
+	if (cxld->next_region_id != r) {
+		rc = -EINVAL;
+		goto out;
+	}
+
+	cxlr = devm_cxl_add_region(cxld);
+	rc = 0;
+	dev_dbg(dev, "Created %s\n", dev_name(&cxlr->dev));
+
+out:
+	mutex_unlock(&cxld->id_lock);
+	if (rc)
+		return rc;
+	return len;
+}
+DEVICE_ATTR_RW(create_region);
+
+static struct cxl_region *cxl_find_region_by_name(struct cxl_decoder *cxld,
+						  const char *name)
+{
+	struct device *region_dev;
+
+	region_dev = device_find_child_by_name(&cxld->dev, name);
+	if (!region_dev)
+		return ERR_PTR(-ENOENT);
+
+	return to_cxl_region(region_dev);
+}
+
+static ssize_t delete_region_store(struct device *dev,
+				   struct device_attribute *attr,
+				   const char *buf, size_t len)
+{
+	struct cxl_port *port = to_cxl_port(dev->parent);
+	struct cxl_decoder *cxld = to_cxl_decoder(dev);
+	struct cxl_region *cxlr;
+
+	cxlr = cxl_find_region_by_name(cxld, buf);
+	if (IS_ERR(cxlr))
+		return PTR_ERR(cxlr);
+
+	if (!test_and_set_bit(REGION_DEAD, &cxlr->flags))
+		devm_release_action(port->uport, schedule_unregister, cxlr);
+	put_device(&cxlr->dev);
+
+	return len;
+}
+DEVICE_ATTR_WO(delete_region);
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index b4047a310340..d5397f7dfcf4 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -221,6 +221,8 @@ enum cxl_decoder_type {
  * @target_type: accelerator vs expander (type2 vs type3) selector
  * @flags: memory type capabilities and locking
  * @target_lock: coordinate coherent reads of the target list
+ * @region_ida: allocator for region ids.
+ * @next_region_id: Cached region id for next region.
  * @nr_targets: number of elements in @target
  * @target: active ordered target list in current decoder configuration
  */
@@ -236,6 +238,9 @@ struct cxl_decoder {
 	enum cxl_decoder_type target_type;
 	unsigned long flags;
 	seqlock_t target_lock;
+	struct mutex id_lock;
+	struct ida region_ida;
+	int next_region_id;
 	int nr_targets;
 	struct cxl_dport *target[];
 };
diff --git a/drivers/cxl/region.h b/drivers/cxl/region.h
new file mode 100644
index 000000000000..7025f6785f83
--- /dev/null
+++ b/drivers/cxl/region.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2021 Intel Corporation. */
+#ifndef __CXL_REGION_H__
+#define __CXL_REGION_H__
+
+#include <linux/uuid.h>
+
+#include "cxl.h"
+
+/**
+ * struct cxl_region - CXL region
+ * @dev: This region's device.
+ * @id: This region's id. Id is globally unique across all regions.
+ * @flags: Flags representing the current state of the region.
+ * @unregister_work: Async unregister to allow attrs to take device_lock.
+ */
+struct cxl_region {
+	struct device dev;
+	int id;
+	unsigned long flags;
+#define REGION_DEAD 0
+	struct work_struct unregister_work;
+
+};
+
+bool schedule_cxl_region_unregister(struct cxl_region *cxlr);
+
+#endif
diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
index 82e49ab0937d..3fe6d34e6d59 100644
--- a/tools/testing/cxl/Kbuild
+++ b/tools/testing/cxl/Kbuild
@@ -46,6 +46,7 @@ cxl_core-y += $(CXL_CORE_SRC)/memdev.o
 cxl_core-y += $(CXL_CORE_SRC)/mbox.o
 cxl_core-y += $(CXL_CORE_SRC)/pci.o
 cxl_core-y += $(CXL_CORE_SRC)/hdm.o
+cxl_core-y += $(CXL_CORE_SRC)/region.o
 cxl_core-y += config_check.o
 
 obj-m += test/
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [RFC PATCH 2/2] cxl/region: Introduce concept of region configuration
  2022-02-25  6:00 [RFC PATCH 0/2] Region creation/configuration ABI Ben Widawsky
  2022-02-25  6:00 ` [RFC PATCH 1/2] cxl/region: Add region creation ABI Ben Widawsky
@ 2022-02-25  6:00 ` Ben Widawsky
  2022-03-01  1:16   ` Dan Williams
  1 sibling, 1 reply; 10+ messages in thread
From: Ben Widawsky @ 2022-02-25  6:00 UTC (permalink / raw)
  To: linux-cxl
  Cc: patches, Ben Widawsky, kernel test robot, Alison Schofield,
	Dan Williams, Ira Weiny, Jonathan Cameron, Vishal Verma

The region creation APIs create a vacant region. Configuring the region
works in the same way as similar subsystems such as devdax. Sysfs attrs
will be provided to allow userspace to configure the region.  Finally
once all configuration is complete, userspace may activate the region.

Introduced here are the most basic attributes needed to configure a
region. Details of these attribute are described in the ABI
Documentation.

A example is provided below:

/sys/bus/cxl/devices/region0.0:0
├── devtype
├── interleave_granularity
├── interleave_ways
├── modalias
├── offset
├── size
├── subsystem -> ../../../../../../bus/cxl
├── target0
├── uevent
└── uuid

Reported-by: kernel test robot <lkp@intel.com> (v2)
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
---
Changes since v3:
- Make target be a decoder
- Use device_lock for protecting config/probe race
- Teardown region on decoder removal

Size is still not handled.
---
 Documentation/ABI/testing/sysfs-bus-cxl |  59 ++++
 drivers/cxl/core/port.c                 |   8 +
 drivers/cxl/core/region.c               | 351 +++++++++++++++++++++++-
 drivers/cxl/cxl.h                       |  16 +-
 drivers/cxl/region.h                    |  65 +++++
 5 files changed, 495 insertions(+), 4 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index e5db45ea70ad..c447826e8286 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -186,3 +186,62 @@ Description:
 		Deletes the named region.  The attribute expects a region in the
 		form "regionX.Y:Z". The region's name, allocated by reading
 		create_region, will also be released.
+		Deletes the named region. A region must be unbound from the
+		region driver before being deleted. The attributes expects a
+		region in the form "regionX.Y:Z". The region's name, allocated
+		by reading create_region, will also be released.
+
+What:		/sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/resource
+Date:		January, 2022
+KernelVersion:	v5.18
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		A region is a contiguous partition of a CXL Root decoder address
+		space. Region capacity is allocated by writing to the size
+		attribute, the resulting physical address base determined by the
+		driver is reflected here.
+
+What:		/sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/size
+Date:		January, 2022
+KernelVersion:	v5.18
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		System physical address space to be consumed by the region.
+
+What:		/sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/interleave_ways
+Date:		January, 2022
+KernelVersion:	v5.18
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		Configures the number of devices participating in the region is
+		set by writing this value. Each device will provide
+		1/interleave_ways of storage for the region.
+
+What:		/sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/interleave_granularity
+Date:		January, 2022
+KernelVersion:	v5.18
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		Set the number of consecutive bytes each device in the
+		interleave set will claim. The possible interleave granularity
+		values are determined by the CXL spec and the participating
+		devices.
+
+What:		/sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/uuid
+Date:		January, 2022
+KernelVersion:	v5.18
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		Write a unique identifier for the region. This field must be set
+		for persistent regions and it must not conflict with the UUID of
+		another region. If this field is set for volatile regions, the
+		value is ignored.
+
+What: /sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/endpoint_decoder[0..interleave_ways]
+Date:		January, 2022
+KernelVersion:	v5.18
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		Write a decoder object that is unused and will participate in
+		decoding memory transactions for the interleave set, ie.
+		decoderX.Y. All attributes must be populated.
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index f3e1313217a8..0eff36f748c3 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1415,6 +1415,14 @@ EXPORT_SYMBOL_NS_GPL(cxl_decoder_add, CXL);
 
 static void cxld_unregister(void *dev)
 {
+	struct cxl_decoder *cxld = to_cxl_decoder(dev);
+
+	if (cxld->cxlr) {
+		mutex_lock(&cxld->cxlr->remove_lock);
+		device_release_driver(&cxld->cxlr->dev);
+		mutex_unlock(&cxld->cxlr->remove_lock);
+	}
+
 	device_unregister(dev);
 }
 
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index a934938f8630..2b17b0af48de 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2,9 +2,12 @@
 /* Copyright(c) 2022 Intel Corporation. All rights reserved. */
 #include <linux/device.h>
 #include <linux/module.h>
+#include <linux/sizes.h>
 #include <linux/slab.h>
+#include <linux/uuid.h>
 #include <linux/idr.h>
 #include <region.h>
+#include <cxlmem.h>
 #include <cxl.h>
 #include "core.h"
 
@@ -16,28 +19,367 @@
  * Memory ranges, Regions represent the active mapped capacity by the HDM
  * Decoder Capability structures throughout the Host Bridges, Switches, and
  * Endpoints in the topology.
+ *
+ * Region configuration has some ordering constraints:
+ * - Size: Must be set after all targets
+ * - Targets: Must be set after interleave ways
+ * - Interleave ways: Must be set after Interleave Granularity
+ *
+ * UUID may be set at any time before binding the driver to the region.
  */
 
 
-static struct cxl_region *to_cxl_region(struct device *dev);
+static const struct attribute_group region_interleave_group;
+
+#define _REGION_ATTR_RO(name)                                                  \
+	static ssize_t name##_show(struct device *dev,                         \
+				   struct device_attribute *attr, char *buf)   \
+	{                                                                      \
+		struct cxl_region *cxlr = to_cxl_region(dev);                  \
+		if (cxlr->flags & REGION_DEAD)                                 \
+			return -ENODEV;                                        \
+		return show_##name(to_cxl_region(dev), buf);                   \
+	}
+
+#define REGION_ATTR_RO(name)                                                   \
+	_REGION_ATTR_RO(name)                                                  \
+	static DEVICE_ATTR_RO(name)
+
+#define _REGION_ATTR_WO(name)                                                  \
+	static ssize_t name##_store(struct device *dev,                        \
+				    struct device_attribute *attr,             \
+				    const char *buf, size_t len)               \
+	{                                                                      \
+		int ret;                                                       \
+		if (device_lock_interruptible(dev) < 0)                        \
+			return -EINTR;                                         \
+		if (dev->driver) {                                             \
+			device_unlock(dev);                                    \
+			return -EBUSY;                                         \
+		}                                                              \
+		ret = store_##name(to_cxl_region(dev), buf, len);              \
+		device_unlock(dev);                                            \
+		return ret;                                                    \
+	}
+
+#define REGION_ATTR_RW(name)                                                   \
+	_REGION_ATTR_RO(name)                                                  \
+	_REGION_ATTR_WO(name)                                                  \
+	static DEVICE_ATTR_RW(name)
+
+#define TARGET_ATTR_RW(n)                                                      \
+	static ssize_t target##n##_show(                                       \
+		struct device *dev, struct device_attribute *attr, char *buf)  \
+	{                                                                      \
+		return show_targetN(to_cxl_region(dev), buf, (n));             \
+	}                                                                      \
+	static ssize_t target##n##_store(struct device *dev,                   \
+					 struct device_attribute *attr,        \
+					 const char *buf, size_t len)          \
+	{                                                                      \
+		int ret;                                                       \
+		if (device_lock_interruptible(dev) < 0)                        \
+			return -EINTR;                                         \
+		if (dev->driver) {                                             \
+			device_unlock(dev);                                    \
+			return -EBUSY;                                         \
+		}                                                              \
+		ret = store_targetN(to_cxl_region(dev), buf, (n), len);        \
+		device_unlock(dev);                                            \
+		return ret;                                                    \
+	}                                                                      \
+	static DEVICE_ATTR_RW(target##n)
+
+static void remove_target(struct cxl_region *cxlr, int target)
+{
+	struct cxl_decoder *cxld;
+
+	mutex_lock(&cxlr->remove_lock);
+	cxld = cxlr->targets[target];
+	if (cxld) {
+		cxld->cxlr = NULL;
+		put_device(&cxld->dev);
+	}
+	cxlr->targets[target] = NULL;
+	mutex_unlock(&cxlr->remove_lock);
+}
 
 static void cxl_region_release(struct device *dev)
 {
 	struct cxl_decoder *cxld = to_cxl_decoder(dev->parent);
 	struct cxl_region *cxlr = to_cxl_region(dev);
+	int i;
 
 	dev_dbg(&cxld->dev, "Releasing %s\n", dev_name(dev));
 	ida_free(&cxld->region_ida, cxlr->id);
+	for (i = 0; i < cxlr->interleave_ways; i++)
+		remove_target(cxlr, i);
 	kfree(cxlr);
 	put_device(&cxld->dev);
 }
 
+static ssize_t show_interleave_ways(struct cxl_region *cxlr, char *buf)
+{
+	return sysfs_emit(buf, "%d\n", cxlr->interleave_ways);
+}
+
+static ssize_t store_interleave_ways(struct cxl_region *cxlr, const char *buf,
+				     size_t len)
+{
+	struct cxl_decoder *rootd;
+	int ret, val;
+
+	ret = kstrtoint(buf, 0, &val);
+	if (ret)
+		return ret;
+	if (!cxlr->interleave_granularity) {
+		dev_dbg(&cxlr->dev, "IG must be set before IW\n");
+		return -ENXIO;
+	}
+	if (cxlr->interleave_ways)
+		return -EOPNOTSUPP;
+
+	rootd = to_cxl_decoder(cxlr->dev.parent);
+	if (!cxl_is_interleave_ways_valid(cxlr, rootd, val))
+		return -EINVAL;
+
+	cxlr->interleave_ways = val;
+
+	ret = sysfs_update_group(&cxlr->dev.kobj, &region_interleave_group);
+	if (ret < 0) {
+		cxlr->interleave_ways = 0;
+		return ret;
+	}
+
+	return len;
+}
+REGION_ATTR_RW(interleave_ways);
+
+static ssize_t show_interleave_granularity(struct cxl_region *cxlr, char *buf)
+{
+	return sysfs_emit(buf, "%d\n", cxlr->interleave_granularity);
+}
+
+static ssize_t store_interleave_granularity(struct cxl_region *cxlr,
+					    const char *buf, size_t len)
+{
+	struct cxl_decoder *rootd;
+	int val, ret;
+
+	ret = kstrtoint(buf, 0, &val);
+	if (ret)
+		return ret;
+	rootd = to_cxl_decoder(cxlr->dev.parent);
+	if (!cxl_is_interleave_granularity_valid(rootd, val))
+		return -EINVAL;
+
+	cxlr->interleave_granularity = val;
+
+	return len;
+}
+REGION_ATTR_RW(interleave_granularity);
+
+static ssize_t show_offset(struct cxl_region *cxlr, char *buf)
+{
+	if (!cxlr->res)
+		return sysfs_emit(buf, "\n");
+
+	return sysfs_emit(buf, "%pa\n", &cxlr->res->start);
+}
+REGION_ATTR_RO(offset);
+
+static ssize_t show_size(struct cxl_region *cxlr, char *buf)
+{
+	return sysfs_emit(buf, "%llu\n", cxlr->size);
+}
+
+static ssize_t store_size(struct cxl_region *cxlr, const char *buf, size_t len)
+{
+	unsigned long long val;
+	ssize_t rc;
+
+	rc = kstrtoull(buf, 0, &val);
+	if (rc)
+		return rc;
+
+	cxlr->size = val;
+	return len;
+}
+REGION_ATTR_RW(size);
+
+static ssize_t show_uuid(struct cxl_region *cxlr, char *buf)
+{
+	return sysfs_emit(buf, "%pUb\n", &cxlr->uuid);
+}
+
+static int is_dupe(struct device *match, void *_cxlr)
+{
+	struct cxl_region *c, *cxlr = _cxlr;
+
+	if (!is_cxl_region(match))
+		return 0;
+
+	if (&cxlr->dev == match)
+		return 0;
+
+	c = to_cxl_region(match);
+	if (uuid_equal(&c->uuid, &cxlr->uuid))
+		return -EEXIST;
+
+	return 0;
+}
+
+static ssize_t store_uuid(struct cxl_region *cxlr, const char *buf, size_t len)
+{
+	ssize_t rc;
+	uuid_t temp;
+
+	if (len != UUID_STRING_LEN + 1)
+		return -EINVAL;
+
+	rc = uuid_parse(buf, &temp);
+	if (rc)
+		return rc;
+
+	rc = bus_for_each_dev(&cxl_bus_type, NULL, cxlr, is_dupe);
+	if (rc < 0)
+		return false;
+
+	cxlr->uuid = temp;
+	return len;
+}
+REGION_ATTR_RW(uuid);
+
+static struct attribute *region_attrs[] = {
+	&dev_attr_interleave_ways.attr,
+	&dev_attr_interleave_granularity.attr,
+	&dev_attr_offset.attr,
+	&dev_attr_size.attr,
+	&dev_attr_uuid.attr,
+	NULL,
+};
+
+static const struct attribute_group region_group = {
+	.attrs = region_attrs,
+};
+
+static size_t show_targetN(struct cxl_region *cxlr, char *buf, int n)
+{
+	if (!cxlr->targets[n])
+		return sysfs_emit(buf, "\n");
+
+	return sysfs_emit(buf, "%s\n", dev_name(&cxlr->targets[n]->dev));
+}
+
+static size_t store_targetN(struct cxl_region *cxlr, const char *buf, int n,
+			    size_t len)
+{
+	struct cxl_decoder *cxld;
+	struct device *cxld_dev;
+
+	if (len == 1 || cxlr->targets[n])
+		remove_target(cxlr, n);
+
+	/* Remove target special case */
+	if (len == 1) {
+		device_unlock(&cxlr->dev);
+		return len;
+	}
+
+	cxld_dev = bus_find_device_by_name(&cxl_bus_type, NULL, buf);
+	if (!cxld_dev)
+		return -ENOENT;
+
+	if (!is_cxl_decoder(cxld_dev)) {
+		put_device(cxld_dev);
+		return -EPERM;
+	}
+
+	if (!is_cxl_endpoint(to_cxl_port(cxld_dev->parent))) {
+		put_device(cxld_dev);
+		return -EINVAL;
+	}
+
+	/* decoder reference is held until teardown */
+	cxld = to_cxl_decoder(cxld_dev);
+	cxlr->targets[n] = cxld;
+	cxld->cxlr = cxlr;
+
+	return len;
+}
+
+TARGET_ATTR_RW(0);
+TARGET_ATTR_RW(1);
+TARGET_ATTR_RW(2);
+TARGET_ATTR_RW(3);
+TARGET_ATTR_RW(4);
+TARGET_ATTR_RW(5);
+TARGET_ATTR_RW(6);
+TARGET_ATTR_RW(7);
+TARGET_ATTR_RW(8);
+TARGET_ATTR_RW(9);
+TARGET_ATTR_RW(10);
+TARGET_ATTR_RW(11);
+TARGET_ATTR_RW(12);
+TARGET_ATTR_RW(13);
+TARGET_ATTR_RW(14);
+TARGET_ATTR_RW(15);
+
+static struct attribute *interleave_attrs[] = {
+	&dev_attr_target0.attr,
+	&dev_attr_target1.attr,
+	&dev_attr_target2.attr,
+	&dev_attr_target3.attr,
+	&dev_attr_target4.attr,
+	&dev_attr_target5.attr,
+	&dev_attr_target6.attr,
+	&dev_attr_target7.attr,
+	&dev_attr_target8.attr,
+	&dev_attr_target9.attr,
+	&dev_attr_target10.attr,
+	&dev_attr_target11.attr,
+	&dev_attr_target12.attr,
+	&dev_attr_target13.attr,
+	&dev_attr_target14.attr,
+	&dev_attr_target15.attr,
+	NULL,
+};
+
+static umode_t visible_targets(struct kobject *kobj, struct attribute *a, int n)
+{
+	struct device *dev = container_of(kobj, struct device, kobj);
+	struct cxl_region *cxlr = to_cxl_region(dev);
+
+	if (n < cxlr->interleave_ways)
+		return a->mode;
+	return 0;
+}
+
+static const struct attribute_group region_interleave_group = {
+	.attrs = interleave_attrs,
+	.is_visible = visible_targets,
+};
+
+static const struct attribute_group *region_groups[] = {
+	&region_group,
+	&region_interleave_group,
+	&cxl_base_attribute_group,
+	NULL,
+};
+
 static const struct device_type cxl_region_type = {
 	.name = "cxl_region",
 	.release = cxl_region_release,
+	.groups = region_groups
 };
 
-static struct cxl_region *to_cxl_region(struct device *dev)
+bool is_cxl_region(struct device *dev)
+{
+	return dev->type == &cxl_region_type;
+}
+EXPORT_SYMBOL_NS_GPL(is_cxl_region, CXL);
+
+struct cxl_region *to_cxl_region(struct device *dev)
 {
 	if (dev_WARN_ONCE(dev, dev->type != &cxl_region_type,
 			  "not a cxl_region device\n"))
@@ -45,6 +387,7 @@ static struct cxl_region *to_cxl_region(struct device *dev)
 
 	return container_of(dev, struct cxl_region, dev);
 }
+EXPORT_SYMBOL_NS_GPL(to_cxl_region, CXL);
 
 static void unregister_region(struct work_struct *work)
 {
@@ -79,6 +422,8 @@ static struct cxl_region *cxl_region_alloc(struct cxl_decoder *cxld)
 		return ERR_PTR(-ENOMEM);
 	}
 
+	cxlr->id = cxld->next_region_id;
+
 	cxld->next_region_id = rc;
 
 	dev = &cxlr->dev;
@@ -88,6 +433,7 @@ static struct cxl_region *cxl_region_alloc(struct cxl_decoder *cxld)
 	dev->bus = &cxl_bus_type;
 	dev->type = &cxl_region_type;
 	INIT_WORK(&cxlr->unregister_work, unregister_region);
+	mutex_init(&cxlr->remove_lock);
 
 	return cxlr;
 }
@@ -118,7 +464,6 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_decoder *cxld)
 
 	dev = &cxlr->dev;
 
-	cxlr->id = cxld->next_region_id;
 	rc = dev_set_name(dev, "region%d.%d:%d", port->id, cxld->id, cxlr->id);
 	if (rc)
 		goto err_out;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index d5397f7dfcf4..26351ed0ba65 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -81,6 +81,19 @@ static inline int cxl_to_interleave_ways(u8 eniw)
 	}
 }
 
+static inline int cxl_from_ways(u8 ways)
+{
+	if (is_power_of_2(ways))
+		return ilog2(ways);
+
+	return ways / 3 + 8;
+}
+
+static inline int cxl_from_granularity(u16 g)
+{
+	return ilog2(g) - 8;
+}
+
 /* CXL 2.0 8.2.8.1 Device Capabilities Array Register */
 #define CXLDEV_CAP_ARRAY_OFFSET 0x0
 #define   CXLDEV_CAP_ARRAY_CAP_ID 0
@@ -223,6 +236,7 @@ enum cxl_decoder_type {
  * @target_lock: coordinate coherent reads of the target list
  * @region_ida: allocator for region ids.
  * @next_region_id: Cached region id for next region.
+ * @region: The region this decoder is associated with.
  * @nr_targets: number of elements in @target
  * @target: active ordered target list in current decoder configuration
  */
@@ -241,11 +255,11 @@ struct cxl_decoder {
 	struct mutex id_lock;
 	struct ida region_ida;
 	int next_region_id;
+	struct cxl_region *cxlr;
 	int nr_targets;
 	struct cxl_dport *target[];
 };
 
-
 /**
  * enum cxl_nvdimm_brige_state - state machine for managing bus rescans
  * @CXL_NVB_NEW: Set at bridge create and after cxl_pmem_wq is destroyed
diff --git a/drivers/cxl/region.h b/drivers/cxl/region.h
index 7025f6785f83..e78a049a5729 100644
--- a/drivers/cxl/region.h
+++ b/drivers/cxl/region.h
@@ -13,6 +13,14 @@
  * @id: This region's id. Id is globally unique across all regions.
  * @flags: Flags representing the current state of the region.
  * @unregister_work: Async unregister to allow attrs to take device_lock.
+ * @remove_lock: Coordinates region removal against decoder removal
+ * @list: Node in decoder's region list.
+ * @res: Resource this region carves out of the platform decode range.
+ * @size: Size of the region determined from LSA or userspace.
+ * @uuid: The UUID for this region.
+ * @interleave_ways: Number of interleave ways this region is configured for.
+ * @interleave_granularity: Interleave granularity of region
+ * @targets: The memory devices comprising the region.
  */
 struct cxl_region {
 	struct device dev;
@@ -20,9 +28,66 @@ struct cxl_region {
 	unsigned long flags;
 #define REGION_DEAD 0
 	struct work_struct unregister_work;
+	struct mutex remove_lock;
 
+	struct list_head list;
+	struct resource *res;
+
+	u64 size;
+	uuid_t uuid;
+	int interleave_ways;
+	int interleave_granularity;
+	struct cxl_decoder *targets[CXL_DECODER_MAX_INTERLEAVE];
 };
 
+bool is_cxl_region(struct device *dev);
+struct cxl_region *to_cxl_region(struct device *dev);
 bool schedule_cxl_region_unregister(struct cxl_region *cxlr);
 
+static inline bool cxl_is_interleave_ways_valid(const struct cxl_region *cxlr,
+						const struct cxl_decoder *rootd,
+						u8 ways)
+{
+	int root_ig, region_ig, root_eniw;
+
+	switch (ways) {
+	case 0 ... 4:
+	case 6:
+	case 8:
+	case 12:
+	case 16:
+		break;
+	default:
+		return false;
+	}
+
+	if (rootd->interleave_ways == 1)
+		return true;
+
+	root_ig = cxl_from_granularity(rootd->interleave_granularity);
+	region_ig = cxl_from_granularity(cxlr->interleave_granularity);
+	root_eniw = cxl_from_ways(rootd->interleave_ways);
+
+	return ((1 << (root_ig - region_ig)) * (1 << root_eniw)) <= ways;
+}
+
+static inline bool
+cxl_is_interleave_granularity_valid(const struct cxl_decoder *rootd, int ig)
+{
+	int rootd_hbig;
+
+	if (!is_power_of_2(ig))
+		return false;
+
+	/* 16K is the max */
+	if (ig >> 15)
+		return false;
+
+	rootd_hbig = cxl_from_granularity(rootd->interleave_granularity);
+	if (rootd_hbig < cxl_from_granularity(ig))
+		return false;
+
+	return true;
+}
+
 #endif
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 1/2] cxl/region: Add region creation ABI
  2022-02-25  6:00 ` [RFC PATCH 1/2] cxl/region: Add region creation ABI Ben Widawsky
@ 2022-02-28 23:48   ` Dan Williams
  2022-03-01 21:22     ` Ben Widawsky
  0 siblings, 1 reply; 10+ messages in thread
From: Dan Williams @ 2022-02-28 23:48 UTC (permalink / raw)
  To: Ben Widawsky
  Cc: linux-cxl, patches, Alison Schofield, Ira Weiny,
	Jonathan Cameron, Vishal Verma

On Thu, Feb 24, 2022 at 10:01 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
>
> Regions are created as a child of the decoder that encompasses an
> address space with constraints. Regions have a number of attributes that
> must be configured before the region can be activated.
>
> The ABI is not meant to be secure,

What does that mean? It's "secure" by virtue of only being root
writable and if root userspace process race themselves the kernel will
coherently handle the conflict and pick a winner.

> but is meant to avoid accidental
> races.

It also solves purposeful race.

>  As a result, a buggy process may create a region by name that was
> allocated by a different process.

That's not a bug, one process atomically claims the next region, the
other loops.

> However, multiple processes which are
> trying not to race with each other shouldn't need special
> synchronization to do so.
>
> // Allocate a new region name
> region=$(cat /sys/bus/cxl/devices/decoder0.0/create_region)
>
> // Create a new region by name
> while
> region=$(cat /sys/bus/cxl/devices/decoder0.0/create_region)
> ! echo $region > /sys/bus/cxl/devices/decoder0.0/create_region
> do true; done
>
> // Region now exists in sysfs
> stat -t /sys/bus/cxl/devices/decoder0.0/$region
>
> // Delete the region, and name
> echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region
>
> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
>
> ---
> Changes since v5 (all Dan):
> - Fix erroneous return on create
> - Fix ida leak on error
> - forward declare to_cxl_region instead of cxl_region_release
> - Use REGION_DEAD in the right place
> - Allocate next id in region_alloc
> ---
>  Documentation/ABI/testing/sysfs-bus-cxl       |  23 ++
>  .../driver-api/cxl/memory-devices.rst         |  11 +
>  drivers/cxl/core/Makefile                     |   1 +
>  drivers/cxl/core/core.h                       |   3 +
>  drivers/cxl/core/port.c                       |  18 ++
>  drivers/cxl/core/region.c                     | 223 ++++++++++++++++++
>  drivers/cxl/cxl.h                             |   5 +
>  drivers/cxl/region.h                          |  28 +++
>  tools/testing/cxl/Kbuild                      |   1 +
>  9 files changed, 313 insertions(+)
>  create mode 100644 drivers/cxl/core/region.c
>  create mode 100644 drivers/cxl/region.h
>
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 7c2b846521f3..e5db45ea70ad 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -163,3 +163,26 @@ Description:
>                 memory (type-3). The 'target_type' attribute indicates the
>                 current setting which may dynamically change based on what
>                 memory regions are activated in this decode hierarchy.
> +
> +What:          /sys/bus/cxl/devices/decoderX.Y/create_region
> +Date:          January, 2022
> +KernelVersion: v5.18
> +Contact:       linux-cxl@vger.kernel.org
> +Description:
> +               Write a value of the form 'regionX.Y:Z' to instantiate a new

Per internal conversation this will move to "write a value of the form
"%u", right?

> +               region within the decode range bounded by decoderX.Y. The value
> +               written must match the current value returned from reading this
> +               attribute. This behavior lets the kernel arbitrate racing
> +               attempts to create a region. The thread that fails to write
> +               loops and tries the next value. Regions must be created for root
> +               decoders,

Does this need to be stated? It's already implicit by the fact that
'create_region' does not appear on anything but root decoders.


>  and must subsequently configured and bound to a region
> +               driver before they can be used.
> +
> +What:          /sys/bus/cxl/devices/decoderX.Y/delete_region
> +Date:          January, 2022
> +KernelVersion: v5.18
> +Contact:       linux-cxl@vger.kernel.org
> +Description:
> +               Deletes the named region.  The attribute expects a region in the
> +               form "regionX.Y:Z". The region's name, allocated by reading
> +               create_region, will also be released.
> diff --git a/Documentation/driver-api/cxl/memory-devices.rst b/Documentation/driver-api/cxl/memory-devices.rst
> index db476bb170b6..66ddc58a21b1 100644
> --- a/Documentation/driver-api/cxl/memory-devices.rst
> +++ b/Documentation/driver-api/cxl/memory-devices.rst
> @@ -362,6 +362,17 @@ CXL Core
>  .. kernel-doc:: drivers/cxl/core/mbox.c
>     :doc: cxl mbox
>
> +CXL Regions
> +-----------
> +.. kernel-doc:: drivers/cxl/region.h
> +   :identifiers:
> +
> +.. kernel-doc:: drivers/cxl/core/region.c
> +   :doc: cxl core region
> +
> +.. kernel-doc:: drivers/cxl/core/region.c
> +   :identifiers:
> +
>  External Interfaces
>  ===================
>
> diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
> index 6d37cd78b151..39ce8f2f2373 100644
> --- a/drivers/cxl/core/Makefile
> +++ b/drivers/cxl/core/Makefile
> @@ -4,6 +4,7 @@ obj-$(CONFIG_CXL_BUS) += cxl_core.o
>  ccflags-y += -I$(srctree)/drivers/cxl
>  cxl_core-y := port.o
>  cxl_core-y += pmem.o
> +cxl_core-y += region.o
>  cxl_core-y += regs.o
>  cxl_core-y += memdev.o
>  cxl_core-y += mbox.o
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 1a50c0fc399c..adfd42370b28 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -9,6 +9,9 @@ extern const struct device_type cxl_nvdimm_type;
>
>  extern struct attribute_group cxl_base_attribute_group;
>
> +extern struct device_attribute dev_attr_create_region;
> +extern struct device_attribute dev_attr_delete_region;
> +
>  struct cxl_send_command;
>  struct cxl_mem_query_commands;
>  int cxl_query_cmd(struct cxl_memdev *cxlmd,
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 1e785a3affaa..f3e1313217a8 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -9,6 +9,7 @@
>  #include <linux/idr.h>
>  #include <cxlmem.h>
>  #include <cxlpci.h>
> +#include <region.h>
>  #include <cxl.h>
>  #include "core.h"
>
> @@ -213,6 +214,8 @@ static struct attribute_group cxl_decoder_base_attribute_group = {
>  };
>
>  static struct attribute *cxl_decoder_root_attrs[] = {
> +       &dev_attr_create_region.attr,
> +       &dev_attr_delete_region.attr,
>         &dev_attr_cap_pmem.attr,
>         &dev_attr_cap_ram.attr,
>         &dev_attr_cap_type2.attr,
> @@ -270,6 +273,8 @@ static void cxl_decoder_release(struct device *dev)
>         struct cxl_decoder *cxld = to_cxl_decoder(dev);
>         struct cxl_port *port = to_cxl_port(dev->parent);
>
> +       ida_free(&cxld->region_ida, cxld->next_region_id);
> +       ida_destroy(&cxld->region_ida);
>         ida_free(&port->decoder_ida, cxld->id);
>         kfree(cxld);
>  }
> @@ -1244,6 +1249,13 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
>         cxld->target_type = CXL_DECODER_EXPANDER;
>         cxld->platform_res = (struct resource)DEFINE_RES_MEM(0, 0);
>
> +       mutex_init(&cxld->id_lock);
> +       ida_init(&cxld->region_ida);
> +       rc = ida_alloc(&cxld->region_ida, GFP_KERNEL);
> +       if (rc < 0)
> +               goto err;
> +
> +       cxld->next_region_id = rc;
>         return cxld;
>  err:
>         kfree(cxld);
> @@ -1502,6 +1514,12 @@ bool schedule_cxl_memdev_detach(struct cxl_memdev *cxlmd)
>  }
>  EXPORT_SYMBOL_NS_GPL(schedule_cxl_memdev_detach, CXL);
>
> +bool schedule_cxl_region_unregister(struct cxl_region *cxlr)
> +{
> +       return queue_work(cxl_bus_wq, &cxlr->unregister_work);
> +}
> +EXPORT_SYMBOL_NS_GPL(schedule_cxl_region_unregister, CXL);
> +
>  /* for user tooling to ensure port disable work has completed */
>  static ssize_t flush_store(struct bus_type *bus, const char *buf, size_t count)
>  {
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> new file mode 100644
> index 000000000000..a934938f8630
> --- /dev/null
> +++ b/drivers/cxl/core/region.c
> @@ -0,0 +1,223 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
> +#include <linux/device.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/idr.h>
> +#include <region.h>
> +#include <cxl.h>
> +#include "core.h"
> +
> +/**
> + * DOC: cxl core region
> + *
> + * CXL Regions represent mapped memory capacity in system physical address
> + * space. Whereas the CXL Root Decoders identify the bounds of potential CXL
> + * Memory ranges, Regions represent the active mapped capacity by the HDM
> + * Decoder Capability structures throughout the Host Bridges, Switches, and
> + * Endpoints in the topology.
> + */
> +
> +
> +static struct cxl_region *to_cxl_region(struct device *dev);

cxl_lock_class() will want to know the lock_class for the cxl_region
device lock, so this can go straight to be an extern declaration
alongside is_cxl_decoder().

> +
> +static void cxl_region_release(struct device *dev)
> +{
> +       struct cxl_decoder *cxld = to_cxl_decoder(dev->parent);
> +       struct cxl_region *cxlr = to_cxl_region(dev);
> +
> +       dev_dbg(&cxld->dev, "Releasing %s\n", dev_name(dev));
> +       ida_free(&cxld->region_ida, cxlr->id);
> +       kfree(cxlr);
> +       put_device(&cxld->dev);
> +}
> +
> +static const struct device_type cxl_region_type = {
> +       .name = "cxl_region",
> +       .release = cxl_region_release,
> +};
> +
> +static struct cxl_region *to_cxl_region(struct device *dev)
> +{
> +       if (dev_WARN_ONCE(dev, dev->type != &cxl_region_type,
> +                         "not a cxl_region device\n"))
> +               return NULL;
> +
> +       return container_of(dev, struct cxl_region, dev);
> +}
> +
> +static void unregister_region(struct work_struct *work)
> +{
> +       struct cxl_region *cxlr;
> +
> +       cxlr = container_of(work, typeof(*cxlr), unregister_work);
> +       device_unregister(&cxlr->dev);
> +}
> +
> +static void schedule_unregister(void *cxlr)
> +{
> +       schedule_cxl_region_unregister(cxlr);
> +}
> +
> +static struct cxl_region *cxl_region_alloc(struct cxl_decoder *cxld)
> +{
> +       struct cxl_region *cxlr;
> +       struct device *dev;
> +       int rc;
> +
> +       lockdep_assert_held(&cxld->id_lock);
> +
> +       rc = ida_alloc(&cxld->region_ida, GFP_KERNEL);
> +       if (rc < 0) {
> +               dev_dbg(dev, "Failed to get next cached id (%d)\n", rc);
> +               return ERR_PTR(rc);
> +       }
> +
> +       cxlr = kzalloc(sizeof(*cxlr), GFP_KERNEL);
> +       if (!cxlr) {
> +               ida_free(&cxld->region_ida, rc);
> +               return ERR_PTR(-ENOMEM);
> +       }
> +
> +       cxld->next_region_id = rc;
> +
> +       dev = &cxlr->dev;
> +       device_initialize(dev);
> +       dev->parent = &cxld->dev;
> +       device_set_pm_not_required(dev);
> +       dev->bus = &cxl_bus_type;
> +       dev->type = &cxl_region_type;
> +       INIT_WORK(&cxlr->unregister_work, unregister_region);

Perhaps name it "detach_work" to match the same for memdevs, or second
choice, go rename the memdev one to "unregister_work". Keep the naming
consistent for data structures that fill the same role.

> +
> +       return cxlr;
> +}
> +
> +/**
> + * devm_cxl_add_region - Adds a region to a decoder
> + * @cxld: Parent decoder.
> + * @cxlr: Region to be added to the decoder.
> + *
> + * This is the second step of region initialization. Regions exist within an
> + * address space which is mapped by a @cxld. That @cxld must be a root decoder,
> + * and it enforces constraints upon the region as it is configured.
> + *
> + * Return: 0 if the region was added to the @cxld, else returns negative error
> + * code. The region will be named "regionX.Y.Z" where X is the port, Y is the
> + * decoder id, and Z is the region number.
> + */
> +static struct cxl_region *devm_cxl_add_region(struct cxl_decoder *cxld)
> +{
> +       struct cxl_port *port = to_cxl_port(cxld->dev.parent);
> +       struct cxl_region *cxlr;
> +       struct device *dev;
> +       int rc;
> +
> +       cxlr = cxl_region_alloc(cxld);
> +       if (IS_ERR(cxlr))
> +               return cxlr;
> +
> +       dev = &cxlr->dev;
> +
> +       cxlr->id = cxld->next_region_id;
> +       rc = dev_set_name(dev, "region%d.%d:%d", port->id, cxld->id, cxlr->id);
> +       if (rc)
> +               goto err_out;
> +
> +       /* affirm that release will have access to the decoder's region ida  */

s/affirm that release will/arrange for cxl_region_release to/

> +       get_device(&cxld->dev);
> +
> +       rc = device_add(dev);
> +       if (rc)
> +               goto err_put;
> +
> +       rc = devm_add_action_or_reset(port->uport, schedule_unregister, cxlr);
> +       if (rc)
> +               goto err_put;
> +
> +       return cxlr;
> +
> +err_put:
> +       put_device(&cxld->dev);
> +
> +err_out:
> +       put_device(dev);
> +       return ERR_PTR(rc);
> +}
> +
> +static ssize_t create_region_show(struct device *dev,
> +                                 struct device_attribute *attr, char *buf)
> +{
> +       struct cxl_port *port = to_cxl_port(dev->parent);
> +       struct cxl_decoder *cxld = to_cxl_decoder(dev);
> +
> +       return sysfs_emit(buf, "region%d.%d:%d\n", port->id, cxld->id,
> +                         cxld->next_region_id);

To cut down on userspace racing itself this should acquire the id_lock
to synchronize against the store side. i.e. no point in returning
known bad answers when the lock is held on the store side, even though
the answer given here may be immediately invalidated as soon as the
lock is dropped it's still useful to throttle readers in the presence
of writers.

> +}
> +
> +static ssize_t create_region_store(struct device *dev,
> +                                  struct device_attribute *attr,
> +                                  const char *buf, size_t len)
> +{
> +       struct cxl_port *port = to_cxl_port(dev->parent);
> +       struct cxl_decoder *cxld = to_cxl_decoder(dev);
> +       struct cxl_region *cxlr;
> +       int d, p, r, rc = 0;
> +
> +       if (sscanf(buf, "region%d.%d:%d", &p, &d, &r) != 3)
> +               return -EINVAL;
> +
> +       if (port->id != p || cxld->id != d)
> +               return -EINVAL;
> +
> +       rc = mutex_lock_interruptible(&cxld->id_lock);
> +       if (rc)
> +               return rc;
> +
> +       if (cxld->next_region_id != r) {
> +               rc = -EINVAL;
> +               goto out;
> +       }
> +
> +       cxlr = devm_cxl_add_region(cxld);
> +       rc = 0;
> +       dev_dbg(dev, "Created %s\n", dev_name(&cxlr->dev));
> +
> +out:
> +       mutex_unlock(&cxld->id_lock);
> +       if (rc)
> +               return rc;
> +       return len;
> +}
> +DEVICE_ATTR_RW(create_region);
> +
> +static struct cxl_region *cxl_find_region_by_name(struct cxl_decoder *cxld,
> +                                                 const char *name)
> +{
> +       struct device *region_dev;
> +
> +       region_dev = device_find_child_by_name(&cxld->dev, name);
> +       if (!region_dev)
> +               return ERR_PTR(-ENOENT);
> +
> +       return to_cxl_region(region_dev);
> +}
> +
> +static ssize_t delete_region_store(struct device *dev,
> +                                  struct device_attribute *attr,
> +                                  const char *buf, size_t len)
> +{
> +       struct cxl_port *port = to_cxl_port(dev->parent);
> +       struct cxl_decoder *cxld = to_cxl_decoder(dev);
> +       struct cxl_region *cxlr;
> +
> +       cxlr = cxl_find_region_by_name(cxld, buf);
> +       if (IS_ERR(cxlr))
> +               return PTR_ERR(cxlr);
> +
> +       if (!test_and_set_bit(REGION_DEAD, &cxlr->flags))
> +               devm_release_action(port->uport, schedule_unregister, cxlr);

Where is the synchronization against port ->remove()? That could have
started before this sysfs file was deleted and trigger double
device_unregister(). Also while any workqueue thread may want to
access the region a reference needs to be held otherwise you can get a
use after free. I expect that this should unconditionally schedule the
unregister work, then in the workqueue check that only one invocation
actually performs the unregistration.

Given that the work is only related to unregistration this can fail
requests to delete something that has already been deleted. If
userspace sees that error and wants to synchronize it can use the
bus/flush attribute for that. I.e.

if (work_pending(&cxlr->detach_work))
   return -EBUSY;

> +       put_device(&cxlr->dev);
> +
> +       return len;
> +}
> +DEVICE_ATTR_WO(delete_region);
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index b4047a310340..d5397f7dfcf4 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -221,6 +221,8 @@ enum cxl_decoder_type {
>   * @target_type: accelerator vs expander (type2 vs type3) selector
>   * @flags: memory type capabilities and locking
>   * @target_lock: coordinate coherent reads of the target list
> + * @region_ida: allocator for region ids.
> + * @next_region_id: Cached region id for next region.
>   * @nr_targets: number of elements in @target
>   * @target: active ordered target list in current decoder configuration
>   */
> @@ -236,6 +238,9 @@ struct cxl_decoder {
>         enum cxl_decoder_type target_type;
>         unsigned long flags;
>         seqlock_t target_lock;
> +       struct mutex id_lock;
> +       struct ida region_ida;
> +       int next_region_id;
>         int nr_targets;
>         struct cxl_dport *target[];
>  };
> diff --git a/drivers/cxl/region.h b/drivers/cxl/region.h
> new file mode 100644
> index 000000000000..7025f6785f83
> --- /dev/null
> +++ b/drivers/cxl/region.h
> @@ -0,0 +1,28 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright(c) 2021 Intel Corporation. */
> +#ifndef __CXL_REGION_H__
> +#define __CXL_REGION_H__
> +
> +#include <linux/uuid.h>
> +
> +#include "cxl.h"
> +
> +/**
> + * struct cxl_region - CXL region
> + * @dev: This region's device.
> + * @id: This region's id. Id is globally unique across all regions.
> + * @flags: Flags representing the current state of the region.
> + * @unregister_work: Async unregister to allow attrs to take device_lock.
> + */
> +struct cxl_region {
> +       struct device dev;
> +       int id;
> +       unsigned long flags;
> +#define REGION_DEAD 0
> +       struct work_struct unregister_work;
> +
> +};
> +
> +bool schedule_cxl_region_unregister(struct cxl_region *cxlr);
> +
> +#endif
> diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
> index 82e49ab0937d..3fe6d34e6d59 100644
> --- a/tools/testing/cxl/Kbuild
> +++ b/tools/testing/cxl/Kbuild
> @@ -46,6 +46,7 @@ cxl_core-y += $(CXL_CORE_SRC)/memdev.o
>  cxl_core-y += $(CXL_CORE_SRC)/mbox.o
>  cxl_core-y += $(CXL_CORE_SRC)/pci.o
>  cxl_core-y += $(CXL_CORE_SRC)/hdm.o
> +cxl_core-y += $(CXL_CORE_SRC)/region.o
>  cxl_core-y += config_check.o
>
>  obj-m += test/
> --
> 2.35.1
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 2/2] cxl/region: Introduce concept of region configuration
  2022-02-25  6:00 ` [RFC PATCH 2/2] cxl/region: Introduce concept of region configuration Ben Widawsky
@ 2022-03-01  1:16   ` Dan Williams
  2022-03-01 17:43     ` Ben Widawsky
  0 siblings, 1 reply; 10+ messages in thread
From: Dan Williams @ 2022-03-01  1:16 UTC (permalink / raw)
  To: Ben Widawsky
  Cc: linux-cxl, patches, kernel test robot, Alison Schofield,
	Ira Weiny, Jonathan Cameron, Vishal Verma

On Thu, Feb 24, 2022 at 10:01 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
>
> The region creation APIs create a vacant region. Configuring the region
> works in the same way as similar subsystems such as devdax. Sysfs attrs
> will be provided to allow userspace to configure the region.  Finally
> once all configuration is complete, userspace may activate the region.
>
> Introduced here are the most basic attributes needed to configure a
> region. Details of these attribute are described in the ABI
> Documentation.
>
> A example is provided below:
>
> /sys/bus/cxl/devices/region0.0:0
> ├── devtype
> ├── interleave_granularity
> ├── interleave_ways
> ├── modalias
> ├── offset
> ├── size
> ├── subsystem -> ../../../../../../bus/cxl
> ├── target0
> ├── uevent
> └── uuid
>
> Reported-by: kernel test robot <lkp@intel.com> (v2)
> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> ---
> Changes since v3:
> - Make target be a decoder
> - Use device_lock for protecting config/probe race
> - Teardown region on decoder removal
>
> Size is still not handled.
> ---
>  Documentation/ABI/testing/sysfs-bus-cxl |  59 ++++
>  drivers/cxl/core/port.c                 |   8 +
>  drivers/cxl/core/region.c               | 351 +++++++++++++++++++++++-
>  drivers/cxl/cxl.h                       |  16 +-
>  drivers/cxl/region.h                    |  65 +++++
>  5 files changed, 495 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index e5db45ea70ad..c447826e8286 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -186,3 +186,62 @@ Description:
>                 Deletes the named region.  The attribute expects a region in the
>                 form "regionX.Y:Z". The region's name, allocated by reading
>                 create_region, will also be released.
> +               Deletes the named region. A region must be unbound from the
> +               region driver before being deleted.

This is not enforced by patch1, and I don't see why this would be
required, the kernel will do that as a part of device_unregister,
right?

>   The attributes expects a
> +               region in the form "regionX.Y:Z". The region's name, allocated
> +               by reading create_region, will also be released.

This can also be dropped, userspace does not have any sensitivity to
when / how the region name memory allocation is managed.

> +
> +What:          /sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/resource
> +Date:          January, 2022
> +KernelVersion: v5.18
> +Contact:       linux-cxl@vger.kernel.org
> +Description:
> +               A region is a contiguous partition of a CXL Root decoder address
> +               space. Region capacity is allocated by writing to the size
> +               attribute, the resulting physical address base determined by the
> +               driver is reflected here.
> +
> +What:          /sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/size
> +Date:          January, 2022
> +KernelVersion: v5.18
> +Contact:       linux-cxl@vger.kernel.org
> +Description:
> +               System physical address space to be consumed by the region.

s/to be//

> +
> +What:          /sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/interleave_ways
> +Date:          January, 2022
> +KernelVersion: v5.18
> +Contact:       linux-cxl@vger.kernel.org
> +Description:
> +               Configures the number of devices participating in the region is
> +               set by writing this value. Each device will provide
> +               1/interleave_ways of storage for the region.
> +
> +What:          /sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/interleave_granularity
> +Date:          January, 2022
> +KernelVersion: v5.18
> +Contact:       linux-cxl@vger.kernel.org
> +Description:
> +               Set the number of consecutive bytes each device in the
> +               interleave set will claim. The possible interleave granularity
> +               values are determined by the CXL spec and the participating
> +               devices.
> +
> +What:          /sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/uuid
> +Date:          January, 2022
> +KernelVersion: v5.18
> +Contact:       linux-cxl@vger.kernel.org
> +Description:
> +               Write a unique identifier for the region. This field must be set
> +               for persistent regions and it must not conflict with the UUID of
> +               another region. If this field is set for volatile regions, the
> +               value is ignored.

Hmm, could this attribute just be hidden via is_visible() if the
region type is not persistent? Although that opens up new questions
like, what about root decoders that can simultaneously support
volatile and pmem? Encode the type in the create ABI? I.e. have
create_pmem_region and create_volatile_region I like the idea that the
region type is unambiguous at create time.

> +
> +What: /sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/endpoint_decoder[0..interleave_ways]
> +Date:          January, 2022
> +KernelVersion: v5.18
> +Contact:       linux-cxl@vger.kernel.org
> +Description:
> +               Write a decoder object that is unused and will participate in
> +               decoding memory transactions for the interleave set, ie.
> +               decoderX.Y. All attributes must be populated.

Feels like this wants a lead-in patch describing / implementing the
writable decoder attributes.

> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index f3e1313217a8..0eff36f748c3 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1415,6 +1415,14 @@ EXPORT_SYMBOL_NS_GPL(cxl_decoder_add, CXL);
>
>  static void cxld_unregister(void *dev)
>  {
> +       struct cxl_decoder *cxld = to_cxl_decoder(dev);
> +
> +       if (cxld->cxlr) {
> +               mutex_lock(&cxld->cxlr->remove_lock);

I don't understand what this lock is for? Perhaps if it was named
after the data it is locking rather than the code it would be more
obvious.

> +               device_release_driver(&cxld->cxlr->dev);

I would expect device_release_driver() to only be used in scenarios
where the region is being disabled, but if it is being unregistered
just let the device core detach the driver naturally.

> +               mutex_unlock(&cxld->cxlr->remove_lock);
> +       }
> +
>         device_unregister(dev);
>  }
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index a934938f8630..2b17b0af48de 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2,9 +2,12 @@
>  /* Copyright(c) 2022 Intel Corporation. All rights reserved. */
>  #include <linux/device.h>
>  #include <linux/module.h>
> +#include <linux/sizes.h>
>  #include <linux/slab.h>
> +#include <linux/uuid.h>
>  #include <linux/idr.h>
>  #include <region.h>
> +#include <cxlmem.h>
>  #include <cxl.h>
>  #include "core.h"
>
> @@ -16,28 +19,367 @@
>   * Memory ranges, Regions represent the active mapped capacity by the HDM
>   * Decoder Capability structures throughout the Host Bridges, Switches, and
>   * Endpoints in the topology.
> + *
> + * Region configuration has some ordering constraints:

Add some "why" commentary to supplement the what.

> + * - Size: Must be set after all targets

I would expect the other way around so that decoder capacity can be
validated as they are registered to the region?

> + * - Targets: Must be set after interleave ways
> + * - Interleave ways: Must be set after Interleave Granularity
> + *
> + * UUID may be set at any time before binding the driver to the region.

Did we also talk about making all properties write once? Perhaps only
if it simplifies some of the validation code.

>   */
>
>
> -static struct cxl_region *to_cxl_region(struct device *dev);
> +static const struct attribute_group region_interleave_group;
> +
> +#define _REGION_ATTR_RO(name)                                                  \
> +       static ssize_t name##_show(struct device *dev,                         \
> +                                  struct device_attribute *attr, char *buf)   \
> +       {                                                                      \
> +               struct cxl_region *cxlr = to_cxl_region(dev);                  \
> +               if (cxlr->flags & REGION_DEAD)                                 \

Per the feedback on patch1 this likely wants to be:

if (work_pending(&cxlr->detach_work))

> +                       return -ENODEV;                                        \
> +               return show_##name(to_cxl_region(dev), buf);                   \

I'd rather skip this macro indirection and just open code the 'dead'
check in the show handler...

> +       }
> +
> +#define REGION_ATTR_RO(name)                                                   \
> +       _REGION_ATTR_RO(name)                                                  \
> +       static DEVICE_ATTR_RO(name)

...because this looks exotic to me.

> +
> +#define _REGION_ATTR_WO(name)                                                  \

More macro that can just be C helpers / open-coded.

> +       static ssize_t name##_store(struct device *dev,                        \
> +                                   struct device_attribute *attr,             \
> +                                   const char *buf, size_t len)               \
> +       {                                                                      \
> +               int ret;                                                       \
> +               if (device_lock_interruptible(dev) < 0)                        \

lockdep gave the thumbs up on this? Useful lockdep reports is another
reason to have not this detail buried in macros.

> +                       return -EINTR;                                         \
> +               if (dev->driver) {                                             \
> +                       device_unlock(dev);                                    \
> +                       return -EBUSY;                                         \
> +               }                                                              \
> +               ret = store_##name(to_cxl_region(dev), buf, len);              \
> +               device_unlock(dev);                                            \
> +               return ret;                                                    \
> +       }
> +
> +#define REGION_ATTR_RW(name)                                                   \
> +       _REGION_ATTR_RO(name)                                                  \
> +       _REGION_ATTR_WO(name)                                                  \
> +       static DEVICE_ATTR_RW(name)
> +
> +#define TARGET_ATTR_RW(n)                                                      \
> +       static ssize_t target##n##_show(                                       \
> +               struct device *dev, struct device_attribute *attr, char *buf)  \
> +       {                                                                      \
> +               return show_targetN(to_cxl_region(dev), buf, (n));             \
> +       }                                                                      \
> +       static ssize_t target##n##_store(struct device *dev,                   \
> +                                        struct device_attribute *attr,        \
> +                                        const char *buf, size_t len)          \
> +       {                                                                      \
> +               int ret;                                                       \
> +               if (device_lock_interruptible(dev) < 0)                        \
> +                       return -EINTR;                                         \
> +               if (dev->driver) {                                             \
> +                       device_unlock(dev);                                    \
> +                       return -EBUSY;                                         \
> +               }                                                              \
> +               ret = store_targetN(to_cxl_region(dev), buf, (n), len);        \
> +               device_unlock(dev);                                            \
> +               return ret;                                                    \
> +       }                                                                      \
> +       static DEVICE_ATTR_RW(target##n)
> +
> +static void remove_target(struct cxl_region *cxlr, int target)
> +{
> +       struct cxl_decoder *cxld;
> +
> +       mutex_lock(&cxlr->remove_lock);
> +       cxld = cxlr->targets[target];
> +       if (cxld) {
> +               cxld->cxlr = NULL;
> +               put_device(&cxld->dev);
> +       }
> +       cxlr->targets[target] = NULL;
> +       mutex_unlock(&cxlr->remove_lock);

How does this synchronize with memdev ->remove() and decoders becoming
unregistered while still referenced by an active region? I think,
especially for error handling scenarios the region driver will want to
be able to assume that the region will go through ->remove() before
any targets successfully observed by ->probe() can be removed. To me
that means this needs a test for the memdev ->remove() flow and
whether memdev ->remove() could perhaps synchronously trigger region
remove for all associated regions. That would mean that the region
device lock would need to nest beneath endpoint decoder device_lock().
Other flows get easier if region driver flows can assume that until
region ->remove() the target decoder list is stable.

> +}
>
>  static void cxl_region_release(struct device *dev)
>  {
>         struct cxl_decoder *cxld = to_cxl_decoder(dev->parent);
>         struct cxl_region *cxlr = to_cxl_region(dev);
> +       int i;
>
>         dev_dbg(&cxld->dev, "Releasing %s\n", dev_name(dev));
>         ida_free(&cxld->region_ida, cxlr->id);
> +       for (i = 0; i < cxlr->interleave_ways; i++)
> +               remove_target(cxlr, i);
>         kfree(cxlr);
>         put_device(&cxld->dev);
>  }
>
> +static ssize_t show_interleave_ways(struct cxl_region *cxlr, char *buf)
> +{
> +       return sysfs_emit(buf, "%d\n", cxlr->interleave_ways);
> +}
> +
> +static ssize_t store_interleave_ways(struct cxl_region *cxlr, const char *buf,
> +                                    size_t len)
> +{
> +       struct cxl_decoder *rootd;
> +       int ret, val;
> +
> +       ret = kstrtoint(buf, 0, &val);
> +       if (ret)
> +               return ret;
> +       if (!cxlr->interleave_granularity) {
> +               dev_dbg(&cxlr->dev, "IG must be set before IW\n");
> +               return -ENXIO;
> +       }
> +       if (cxlr->interleave_ways)
> +               return -EOPNOTSUPP;
> +
> +       rootd = to_cxl_decoder(cxlr->dev.parent);
> +       if (!cxl_is_interleave_ways_valid(cxlr, rootd, val))
> +               return -EINVAL;
> +
> +       cxlr->interleave_ways = val;
> +
> +       ret = sysfs_update_group(&cxlr->dev.kobj, &region_interleave_group);
> +       if (ret < 0) {
> +               cxlr->interleave_ways = 0;
> +               return ret;
> +       }
> +
> +       return len;
> +}
> +REGION_ATTR_RW(interleave_ways);
> +
> +static ssize_t show_interleave_granularity(struct cxl_region *cxlr, char *buf)
> +{
> +       return sysfs_emit(buf, "%d\n", cxlr->interleave_granularity);
> +}
> +
> +static ssize_t store_interleave_granularity(struct cxl_region *cxlr,
> +                                           const char *buf, size_t len)
> +{
> +       struct cxl_decoder *rootd;
> +       int val, ret;
> +
> +       ret = kstrtoint(buf, 0, &val);
> +       if (ret)
> +               return ret;
> +       rootd = to_cxl_decoder(cxlr->dev.parent);
> +       if (!cxl_is_interleave_granularity_valid(rootd, val))
> +               return -EINVAL;
> +
> +       cxlr->interleave_granularity = val;
> +
> +       return len;
> +}
> +REGION_ATTR_RW(interleave_granularity);
> +
> +static ssize_t show_offset(struct cxl_region *cxlr, char *buf)
> +{
> +       if (!cxlr->res)
> +               return sysfs_emit(buf, "\n");
> +
> +       return sysfs_emit(buf, "%pa\n", &cxlr->res->start);
> +}
> +REGION_ATTR_RO(offset);
> +
> +static ssize_t show_size(struct cxl_region *cxlr, char *buf)
> +{
> +       return sysfs_emit(buf, "%llu\n", cxlr->size);
> +}
> +
> +static ssize_t store_size(struct cxl_region *cxlr, const char *buf, size_t len)
> +{
> +       unsigned long long val;
> +       ssize_t rc;
> +
> +       rc = kstrtoull(buf, 0, &val);
> +       if (rc)
> +               return rc;
> +
> +       cxlr->size = val;
> +       return len;
> +}
> +REGION_ATTR_RW(size);
> +
> +static ssize_t show_uuid(struct cxl_region *cxlr, char *buf)
> +{
> +       return sysfs_emit(buf, "%pUb\n", &cxlr->uuid);
> +}
> +
> +static int is_dupe(struct device *match, void *_cxlr)
> +{
> +       struct cxl_region *c, *cxlr = _cxlr;
> +
> +       if (!is_cxl_region(match))
> +               return 0;
> +
> +       if (&cxlr->dev == match)
> +               return 0;
> +
> +       c = to_cxl_region(match);
> +       if (uuid_equal(&c->uuid, &cxlr->uuid))
> +               return -EEXIST;
> +
> +       return 0;
> +}
> +
> +static ssize_t store_uuid(struct cxl_region *cxlr, const char *buf, size_t len)
> +{
> +       ssize_t rc;
> +       uuid_t temp;
> +
> +       if (len != UUID_STRING_LEN + 1)
> +               return -EINVAL;
> +
> +       rc = uuid_parse(buf, &temp);
> +       if (rc)
> +               return rc;
> +
> +       rc = bus_for_each_dev(&cxl_bus_type, NULL, cxlr, is_dupe);
> +       if (rc < 0)
> +               return false;
> +
> +       cxlr->uuid = temp;
> +       return len;
> +}
> +REGION_ATTR_RW(uuid);
> +
> +static struct attribute *region_attrs[] = {
> +       &dev_attr_interleave_ways.attr,
> +       &dev_attr_interleave_granularity.attr,
> +       &dev_attr_offset.attr,
> +       &dev_attr_size.attr,
> +       &dev_attr_uuid.attr,
> +       NULL,
> +};
> +
> +static const struct attribute_group region_group = {
> +       .attrs = region_attrs,
> +};
> +
> +static size_t show_targetN(struct cxl_region *cxlr, char *buf, int n)
> +{
> +       if (!cxlr->targets[n])
> +               return sysfs_emit(buf, "\n");
> +
> +       return sysfs_emit(buf, "%s\n", dev_name(&cxlr->targets[n]->dev));
> +}
> +
> +static size_t store_targetN(struct cxl_region *cxlr, const char *buf, int n,
> +                           size_t len)
> +{
> +       struct cxl_decoder *cxld;
> +       struct device *cxld_dev;
> +
> +       if (len == 1 || cxlr->targets[n])
> +               remove_target(cxlr, n);
> +
> +       /* Remove target special case */
> +       if (len == 1) {
> +               device_unlock(&cxlr->dev);
> +               return len;
> +       }
> +
> +       cxld_dev = bus_find_device_by_name(&cxl_bus_type, NULL, buf);
> +       if (!cxld_dev)
> +               return -ENOENT;
> +
> +       if (!is_cxl_decoder(cxld_dev)) {
> +               put_device(cxld_dev);
> +               return -EPERM;
> +       }
> +
> +       if (!is_cxl_endpoint(to_cxl_port(cxld_dev->parent))) {
> +               put_device(cxld_dev);
> +               return -EINVAL;
> +       }
> +
> +       /* decoder reference is held until teardown */
> +       cxld = to_cxl_decoder(cxld_dev);
> +       cxlr->targets[n] = cxld;
> +       cxld->cxlr = cxlr;
> +
> +       return len;
> +}
> +
> +TARGET_ATTR_RW(0);
> +TARGET_ATTR_RW(1);
> +TARGET_ATTR_RW(2);
> +TARGET_ATTR_RW(3);
> +TARGET_ATTR_RW(4);
> +TARGET_ATTR_RW(5);
> +TARGET_ATTR_RW(6);
> +TARGET_ATTR_RW(7);
> +TARGET_ATTR_RW(8);
> +TARGET_ATTR_RW(9);
> +TARGET_ATTR_RW(10);
> +TARGET_ATTR_RW(11);
> +TARGET_ATTR_RW(12);
> +TARGET_ATTR_RW(13);
> +TARGET_ATTR_RW(14);
> +TARGET_ATTR_RW(15);
> +
> +static struct attribute *interleave_attrs[] = {
> +       &dev_attr_target0.attr,
> +       &dev_attr_target1.attr,
> +       &dev_attr_target2.attr,
> +       &dev_attr_target3.attr,
> +       &dev_attr_target4.attr,
> +       &dev_attr_target5.attr,
> +       &dev_attr_target6.attr,
> +       &dev_attr_target7.attr,
> +       &dev_attr_target8.attr,
> +       &dev_attr_target9.attr,
> +       &dev_attr_target10.attr,
> +       &dev_attr_target11.attr,
> +       &dev_attr_target12.attr,
> +       &dev_attr_target13.attr,
> +       &dev_attr_target14.attr,
> +       &dev_attr_target15.attr,
> +       NULL,
> +};
> +
> +static umode_t visible_targets(struct kobject *kobj, struct attribute *a, int n)
> +{
> +       struct device *dev = container_of(kobj, struct device, kobj);
> +       struct cxl_region *cxlr = to_cxl_region(dev);
> +
> +       if (n < cxlr->interleave_ways)
> +               return a->mode;
> +       return 0;
> +}
> +
> +static const struct attribute_group region_interleave_group = {
> +       .attrs = interleave_attrs,
> +       .is_visible = visible_targets,
> +};
> +
> +static const struct attribute_group *region_groups[] = {
> +       &region_group,
> +       &region_interleave_group,
> +       &cxl_base_attribute_group,
> +       NULL,
> +};
> +
>  static const struct device_type cxl_region_type = {
>         .name = "cxl_region",
>         .release = cxl_region_release,
> +       .groups = region_groups
>  };
>
> -static struct cxl_region *to_cxl_region(struct device *dev)
> +bool is_cxl_region(struct device *dev)
> +{
> +       return dev->type == &cxl_region_type;
> +}
> +EXPORT_SYMBOL_NS_GPL(is_cxl_region, CXL);
> +
> +struct cxl_region *to_cxl_region(struct device *dev)
>  {
>         if (dev_WARN_ONCE(dev, dev->type != &cxl_region_type,
>                           "not a cxl_region device\n"))
> @@ -45,6 +387,7 @@ static struct cxl_region *to_cxl_region(struct device *dev)
>
>         return container_of(dev, struct cxl_region, dev);
>  }
> +EXPORT_SYMBOL_NS_GPL(to_cxl_region, CXL);
>
>  static void unregister_region(struct work_struct *work)
>  {
> @@ -79,6 +422,8 @@ static struct cxl_region *cxl_region_alloc(struct cxl_decoder *cxld)
>                 return ERR_PTR(-ENOMEM);
>         }
>
> +       cxlr->id = cxld->next_region_id;
> +
>         cxld->next_region_id = rc;
>
>         dev = &cxlr->dev;
> @@ -88,6 +433,7 @@ static struct cxl_region *cxl_region_alloc(struct cxl_decoder *cxld)
>         dev->bus = &cxl_bus_type;
>         dev->type = &cxl_region_type;
>         INIT_WORK(&cxlr->unregister_work, unregister_region);
> +       mutex_init(&cxlr->remove_lock);
>
>         return cxlr;
>  }
> @@ -118,7 +464,6 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_decoder *cxld)
>
>         dev = &cxlr->dev;
>
> -       cxlr->id = cxld->next_region_id;
>         rc = dev_set_name(dev, "region%d.%d:%d", port->id, cxld->id, cxlr->id);
>         if (rc)
>                 goto err_out;
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index d5397f7dfcf4..26351ed0ba65 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -81,6 +81,19 @@ static inline int cxl_to_interleave_ways(u8 eniw)
>         }
>  }
>
> +static inline int cxl_from_ways(u8 ways)
> +{
> +       if (is_power_of_2(ways))
> +               return ilog2(ways);
> +
> +       return ways / 3 + 8;
> +}
> +
> +static inline int cxl_from_granularity(u16 g)
> +{
> +       return ilog2(g) - 8;
> +}
> +
>  /* CXL 2.0 8.2.8.1 Device Capabilities Array Register */
>  #define CXLDEV_CAP_ARRAY_OFFSET 0x0
>  #define   CXLDEV_CAP_ARRAY_CAP_ID 0
> @@ -223,6 +236,7 @@ enum cxl_decoder_type {
>   * @target_lock: coordinate coherent reads of the target list
>   * @region_ida: allocator for region ids.
>   * @next_region_id: Cached region id for next region.
> + * @region: The region this decoder is associated with.
>   * @nr_targets: number of elements in @target
>   * @target: active ordered target list in current decoder configuration
>   */
> @@ -241,11 +255,11 @@ struct cxl_decoder {
>         struct mutex id_lock;
>         struct ida region_ida;
>         int next_region_id;
> +       struct cxl_region *cxlr;
>         int nr_targets;
>         struct cxl_dport *target[];
>  };
>
> -
>  /**
>   * enum cxl_nvdimm_brige_state - state machine for managing bus rescans
>   * @CXL_NVB_NEW: Set at bridge create and after cxl_pmem_wq is destroyed
> diff --git a/drivers/cxl/region.h b/drivers/cxl/region.h
> index 7025f6785f83..e78a049a5729 100644
> --- a/drivers/cxl/region.h
> +++ b/drivers/cxl/region.h
> @@ -13,6 +13,14 @@
>   * @id: This region's id. Id is globally unique across all regions.
>   * @flags: Flags representing the current state of the region.
>   * @unregister_work: Async unregister to allow attrs to take device_lock.
> + * @remove_lock: Coordinates region removal against decoder removal
> + * @list: Node in decoder's region list.
> + * @res: Resource this region carves out of the platform decode range.
> + * @size: Size of the region determined from LSA or userspace.
> + * @uuid: The UUID for this region.
> + * @interleave_ways: Number of interleave ways this region is configured for.
> + * @interleave_granularity: Interleave granularity of region
> + * @targets: The memory devices comprising the region.
>   */
>  struct cxl_region {
>         struct device dev;
> @@ -20,9 +28,66 @@ struct cxl_region {
>         unsigned long flags;
>  #define REGION_DEAD 0
>         struct work_struct unregister_work;
> +       struct mutex remove_lock;
>
> +       struct list_head list;
> +       struct resource *res;
> +
> +       u64 size;
> +       uuid_t uuid;
> +       int interleave_ways;
> +       int interleave_granularity;
> +       struct cxl_decoder *targets[CXL_DECODER_MAX_INTERLEAVE];
>  };
>
> +bool is_cxl_region(struct device *dev);
> +struct cxl_region *to_cxl_region(struct device *dev);
>  bool schedule_cxl_region_unregister(struct cxl_region *cxlr);
>
> +static inline bool cxl_is_interleave_ways_valid(const struct cxl_region *cxlr,
> +                                               const struct cxl_decoder *rootd,
> +                                               u8 ways)
> +{
> +       int root_ig, region_ig, root_eniw;
> +
> +       switch (ways) {
> +       case 0 ... 4:
> +       case 6:
> +       case 8:
> +       case 12:
> +       case 16:
> +               break;
> +       default:
> +               return false;
> +       }
> +
> +       if (rootd->interleave_ways == 1)
> +               return true;
> +
> +       root_ig = cxl_from_granularity(rootd->interleave_granularity);
> +       region_ig = cxl_from_granularity(cxlr->interleave_granularity);
> +       root_eniw = cxl_from_ways(rootd->interleave_ways);
> +
> +       return ((1 << (root_ig - region_ig)) * (1 << root_eniw)) <= ways;

Some comments for this math please.

> +}
> +
> +static inline bool
> +cxl_is_interleave_granularity_valid(const struct cxl_decoder *rootd, int ig)
> +{
> +       int rootd_hbig;
> +
> +       if (!is_power_of_2(ig))
> +               return false;
> +
> +       /* 16K is the max */
> +       if (ig >> 15)

Why the shift instead of the more straightforward:

if (ig > SZ_16K)

> +               return false;
> +
> +       rootd_hbig = cxl_from_granularity(rootd->interleave_granularity);
> +       if (rootd_hbig < cxl_from_granularity(ig))
> +               return false;

Why do the comparison in CXL encoding vs ordinal:

if (ig > rootd->interleave_granularity)

?

> +
> +       return true;
> +}
> +
>  #endif
> --
> 2.35.1
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 2/2] cxl/region: Introduce concept of region configuration
  2022-03-01  1:16   ` Dan Williams
@ 2022-03-01 17:43     ` Ben Widawsky
  2022-03-01 18:34       ` Dan Williams
  0 siblings, 1 reply; 10+ messages in thread
From: Ben Widawsky @ 2022-03-01 17:43 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, patches, kernel test robot, Alison Schofield,
	Ira Weiny, Jonathan Cameron, Vishal Verma

On 22-02-28 17:16:10, Dan Williams wrote:
> On Thu, Feb 24, 2022 at 10:01 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
> >
> > The region creation APIs create a vacant region. Configuring the region
> > works in the same way as similar subsystems such as devdax. Sysfs attrs
> > will be provided to allow userspace to configure the region.  Finally
> > once all configuration is complete, userspace may activate the region.
> >
> > Introduced here are the most basic attributes needed to configure a
> > region. Details of these attribute are described in the ABI
> > Documentation.
> >
> > A example is provided below:
> >
> > /sys/bus/cxl/devices/region0.0:0
> > ├── devtype
> > ├── interleave_granularity
> > ├── interleave_ways
> > ├── modalias
> > ├── offset
> > ├── size
> > ├── subsystem -> ../../../../../../bus/cxl
> > ├── target0
> > ├── uevent
> > └── uuid
> >
> > Reported-by: kernel test robot <lkp@intel.com> (v2)
> > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > ---
> > Changes since v3:
> > - Make target be a decoder
> > - Use device_lock for protecting config/probe race
> > - Teardown region on decoder removal
> >
> > Size is still not handled.
> > ---
> >  Documentation/ABI/testing/sysfs-bus-cxl |  59 ++++
> >  drivers/cxl/core/port.c                 |   8 +
> >  drivers/cxl/core/region.c               | 351 +++++++++++++++++++++++-
> >  drivers/cxl/cxl.h                       |  16 +-
> >  drivers/cxl/region.h                    |  65 +++++
> >  5 files changed, 495 insertions(+), 4 deletions(-)
> >
> > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> > index e5db45ea70ad..c447826e8286 100644
> > --- a/Documentation/ABI/testing/sysfs-bus-cxl
> > +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> > @@ -186,3 +186,62 @@ Description:
> >                 Deletes the named region.  The attribute expects a region in the
> >                 form "regionX.Y:Z". The region's name, allocated by reading
> >                 create_region, will also be released.
> > +               Deletes the named region. A region must be unbound from the
> > +               region driver before being deleted.
> 
> This is not enforced by patch1, and I don't see why this would be
> required, the kernel will do that as a part of device_unregister,
> right?
> 

This predates my understanding that unbind can't fail. I was planning to use
unbind to support managed hot remove. I will [try to remember] to change this.

> >   The attributes expects a
> > +               region in the form "regionX.Y:Z". The region's name, allocated
> > +               by reading create_region, will also be released.
> 
> This can also be dropped, userspace does not have any sensitivity to
> when / how the region name memory allocation is managed.
> 

Okay.

> > +
> > +What:          /sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/resource
> > +Date:          January, 2022
> > +KernelVersion: v5.18
> > +Contact:       linux-cxl@vger.kernel.org
> > +Description:
> > +               A region is a contiguous partition of a CXL Root decoder address
> > +               space. Region capacity is allocated by writing to the size
> > +               attribute, the resulting physical address base determined by the
> > +               driver is reflected here.
> > +
> > +What:          /sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/size
> > +Date:          January, 2022
> > +KernelVersion: v5.18
> > +Contact:       linux-cxl@vger.kernel.org
> > +Description:
> > +               System physical address space to be consumed by the region.
> 
> s/to be//
> 
> > +
> > +What:          /sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/interleave_ways
> > +Date:          January, 2022
> > +KernelVersion: v5.18
> > +Contact:       linux-cxl@vger.kernel.org
> > +Description:
> > +               Configures the number of devices participating in the region is
> > +               set by writing this value. Each device will provide
> > +               1/interleave_ways of storage for the region.
> > +
> > +What:          /sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/interleave_granularity
> > +Date:          January, 2022
> > +KernelVersion: v5.18
> > +Contact:       linux-cxl@vger.kernel.org
> > +Description:
> > +               Set the number of consecutive bytes each device in the
> > +               interleave set will claim. The possible interleave granularity
> > +               values are determined by the CXL spec and the participating
> > +               devices.
> > +
> > +What:          /sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/uuid
> > +Date:          January, 2022
> > +KernelVersion: v5.18
> > +Contact:       linux-cxl@vger.kernel.org
> > +Description:
> > +               Write a unique identifier for the region. This field must be set
> > +               for persistent regions and it must not conflict with the UUID of
> > +               another region. If this field is set for volatile regions, the
> > +               value is ignored.
> 
> Hmm, could this attribute just be hidden via is_visible() if the
> region type is not persistent? Although that opens up new questions
> like, what about root decoders that can simultaneously support
> volatile and pmem? Encode the type in the create ABI? I.e. have
> create_pmem_region and create_volatile_region I like the idea that the
> region type is unambiguous at create time.
> 

Right. Originally I was thinking the decoder implicitly determines the type,
however as you point out decoders can support both. I'm okay to create two
nodes. We could also utilize a string for single create: volatileX, or
persistentY. My preference is two nodes, I'm just offering an alternative.

You do end up with asymmetry because there will be only one delete node for 2
create nodes. Not sure if you care about that.

> > +
> > +What: /sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/endpoint_decoder[0..interleave_ways]
> > +Date:          January, 2022
> > +KernelVersion: v5.18
> > +Contact:       linux-cxl@vger.kernel.org
> > +Description:
> > +               Write a decoder object that is unused and will participate in
> > +               decoding memory transactions for the interleave set, ie.
> > +               decoderX.Y. All attributes must be populated.
> 
> Feels like this wants a lead-in patch describing / implementing the
> writable decoder attributes.

Agreed.

> 
> > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > index f3e1313217a8..0eff36f748c3 100644
> > --- a/drivers/cxl/core/port.c
> > +++ b/drivers/cxl/core/port.c
> > @@ -1415,6 +1415,14 @@ EXPORT_SYMBOL_NS_GPL(cxl_decoder_add, CXL);
> >
> >  static void cxld_unregister(void *dev)
> >  {
> > +       struct cxl_decoder *cxld = to_cxl_decoder(dev);
> > +
> > +       if (cxld->cxlr) {
> > +               mutex_lock(&cxld->cxlr->remove_lock);
> 
> I don't understand what this lock is for? Perhaps if it was named
> after the data it is locking rather than the code it would be more
> obvious.
> 

It was trying to prevent racing decoders removal with the region's removal.

> > +               device_release_driver(&cxld->cxlr->dev);
> 
> I would expect device_release_driver() to only be used in scenarios
> where the region is being disabled, but if it is being unregistered
> just let the device core detach the driver naturally.

What's the proposal then to make it work? I don't see how the region driver is
notified that a decoder went away without something like this.

> 
> > +               mutex_unlock(&cxld->cxlr->remove_lock);
> > +       }
> > +
> >         device_unregister(dev);
> >  }
> >
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > index a934938f8630..2b17b0af48de 100644
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -2,9 +2,12 @@
> >  /* Copyright(c) 2022 Intel Corporation. All rights reserved. */
> >  #include <linux/device.h>
> >  #include <linux/module.h>
> > +#include <linux/sizes.h>
> >  #include <linux/slab.h>
> > +#include <linux/uuid.h>
> >  #include <linux/idr.h>
> >  #include <region.h>
> > +#include <cxlmem.h>
> >  #include <cxl.h>
> >  #include "core.h"
> >
> > @@ -16,28 +19,367 @@
> >   * Memory ranges, Regions represent the active mapped capacity by the HDM
> >   * Decoder Capability structures throughout the Host Bridges, Switches, and
> >   * Endpoints in the topology.
> > + *
> > + * Region configuration has some ordering constraints:
> 
> Add some "why" commentary to supplement the what.
> 
> > + * - Size: Must be set after all targets
> 
> I would expect the other way around so that decoder capacity can be
> validated as they are registered to the region?

When size is set, I need to confirm both root decoder capacity exists as well as
individual device capacity. I could flip it around and not allow targets that do
not have enough capacity left. Either is fine really, but ordering is explicit
either way.

If we mandate that the decoders themselves are first configured via sysfs (which
I think we did say but I forgot to add), then this should work fine.

> 
> > + * - Targets: Must be set after interleave ways
> > + * - Interleave ways: Must be set after Interleave Granularity
> > + *
> > + * UUID may be set at any time before binding the driver to the region.
> 
> Did we also talk about making all properties write once? Perhaps only
> if it simplifies some of the validation code.

I never saw a response from you about that when I asked. Sorry if I missed it. I
think write once is fine and makes things easier.

> 
> >   */
> >
> >
> > -static struct cxl_region *to_cxl_region(struct device *dev);
> > +static const struct attribute_group region_interleave_group;
> > +
> > +#define _REGION_ATTR_RO(name)                                                  \
> > +       static ssize_t name##_show(struct device *dev,                         \
> > +                                  struct device_attribute *attr, char *buf)   \
> > +       {                                                                      \
> > +               struct cxl_region *cxlr = to_cxl_region(dev);                  \
> > +               if (cxlr->flags & REGION_DEAD)                                 \
> 
> Per the feedback on patch1 this likely wants to be:
> 
> if (work_pending(&cxlr->detach_work))

I've already forgotten patch1 check but I can go back and check later.

> 
> > +                       return -ENODEV;                                        \
> > +               return show_##name(to_cxl_region(dev), buf);                   \
> 
> I'd rather skip this macro indirection and just open code the 'dead'
> check in the show handler...
> 
> > +       }
> > +
> > +#define REGION_ATTR_RO(name)                                                   \
> > +       _REGION_ATTR_RO(name)                                                  \
> > +       static DEVICE_ATTR_RO(name)
> 
> ...because this looks exotic to me.
> 
> > +
> > +#define _REGION_ATTR_WO(name)                                                  \
> 
> More macro that can just be C helpers / open-coded.
> 

Every attribute has the same race prevention. It seemed nice to combine it in a
relatively simple macro. I can remove it however if you don't like it but I do
believe it helps guard against future errors if adding new attributes.

> > +       static ssize_t name##_store(struct device *dev,                        \
> > +                                   struct device_attribute *attr,             \
> > +                                   const char *buf, size_t len)               \
> > +       {                                                                      \
> > +               int ret;                                                       \
> > +               if (device_lock_interruptible(dev) < 0)                        \
> 
> lockdep gave the thumbs up on this? Useful lockdep reports is another
> reason to have not this detail buried in macros.
> 

I'm running your patches and I hadn't seen anything. I am confused though, isn't
this what we discussed to prevent the race with remove()? Looks like there is
more below.

> > +                       return -EINTR;                                         \
> > +               if (dev->driver) {                                             \
> > +                       device_unlock(dev);                                    \
> > +                       return -EBUSY;                                         \
> > +               }                                                              \
> > +               ret = store_##name(to_cxl_region(dev), buf, len);              \
> > +               device_unlock(dev);                                            \
> > +               return ret;                                                    \
> > +       }
> > +
> > +#define REGION_ATTR_RW(name)                                                   \
> > +       _REGION_ATTR_RO(name)                                                  \
> > +       _REGION_ATTR_WO(name)                                                  \
> > +       static DEVICE_ATTR_RW(name)
> > +
> > +#define TARGET_ATTR_RW(n)                                                      \
> > +       static ssize_t target##n##_show(                                       \
> > +               struct device *dev, struct device_attribute *attr, char *buf)  \
> > +       {                                                                      \
> > +               return show_targetN(to_cxl_region(dev), buf, (n));             \
> > +       }                                                                      \
> > +       static ssize_t target##n##_store(struct device *dev,                   \
> > +                                        struct device_attribute *attr,        \
> > +                                        const char *buf, size_t len)          \
> > +       {                                                                      \
> > +               int ret;                                                       \
> > +               if (device_lock_interruptible(dev) < 0)                        \
> > +                       return -EINTR;                                         \
> > +               if (dev->driver) {                                             \
> > +                       device_unlock(dev);                                    \
> > +                       return -EBUSY;                                         \
> > +               }                                                              \
> > +               ret = store_targetN(to_cxl_region(dev), buf, (n), len);        \
> > +               device_unlock(dev);                                            \
> > +               return ret;                                                    \
> > +       }                                                                      \
> > +       static DEVICE_ATTR_RW(target##n)
> > +
> > +static void remove_target(struct cxl_region *cxlr, int target)
> > +{
> > +       struct cxl_decoder *cxld;
> > +
> > +       mutex_lock(&cxlr->remove_lock);
> > +       cxld = cxlr->targets[target];
> > +       if (cxld) {
> > +               cxld->cxlr = NULL;
> > +               put_device(&cxld->dev);
> > +       }
> > +       cxlr->targets[target] = NULL;
> > +       mutex_unlock(&cxlr->remove_lock);
> 
> How does this synchronize with memdev ->remove() and decoders becoming
> unregistered while still referenced by an active region? I think,
> especially for error handling scenarios the region driver will want to
> be able to assume that the region will go through ->remove() before
> any targets successfully observed by ->probe() can be removed. To me
> that means this needs a test for the memdev ->remove() flow and
> whether memdev ->remove() could perhaps synchronously trigger region
> remove for all associated regions. That would mean that the region
> device lock would need to nest beneath endpoint decoder device_lock().
> Other flows get easier if region driver flows can assume that until
> region ->remove() the target decoder list is stable.
> 

How do I enforce it goes through remove() first?

> > +}
> >
> >  static void cxl_region_release(struct device *dev)
> >  {
> >         struct cxl_decoder *cxld = to_cxl_decoder(dev->parent);
> >         struct cxl_region *cxlr = to_cxl_region(dev);
> > +       int i;
> >
> >         dev_dbg(&cxld->dev, "Releasing %s\n", dev_name(dev));
> >         ida_free(&cxld->region_ida, cxlr->id);
> > +       for (i = 0; i < cxlr->interleave_ways; i++)
> > +               remove_target(cxlr, i);
> >         kfree(cxlr);
> >         put_device(&cxld->dev);
> >  }
> >
> > +static ssize_t show_interleave_ways(struct cxl_region *cxlr, char *buf)
> > +{
> > +       return sysfs_emit(buf, "%d\n", cxlr->interleave_ways);
> > +}
> > +
> > +static ssize_t store_interleave_ways(struct cxl_region *cxlr, const char *buf,
> > +                                    size_t len)
> > +{
> > +       struct cxl_decoder *rootd;
> > +       int ret, val;
> > +
> > +       ret = kstrtoint(buf, 0, &val);
> > +       if (ret)
> > +               return ret;
> > +       if (!cxlr->interleave_granularity) {
> > +               dev_dbg(&cxlr->dev, "IG must be set before IW\n");
> > +               return -ENXIO;
> > +       }
> > +       if (cxlr->interleave_ways)
> > +               return -EOPNOTSUPP;
> > +
> > +       rootd = to_cxl_decoder(cxlr->dev.parent);
> > +       if (!cxl_is_interleave_ways_valid(cxlr, rootd, val))
> > +               return -EINVAL;
> > +
> > +       cxlr->interleave_ways = val;
> > +
> > +       ret = sysfs_update_group(&cxlr->dev.kobj, &region_interleave_group);
> > +       if (ret < 0) {
> > +               cxlr->interleave_ways = 0;
> > +               return ret;
> > +       }
> > +
> > +       return len;
> > +}
> > +REGION_ATTR_RW(interleave_ways);
> > +
> > +static ssize_t show_interleave_granularity(struct cxl_region *cxlr, char *buf)
> > +{
> > +       return sysfs_emit(buf, "%d\n", cxlr->interleave_granularity);
> > +}
> > +
> > +static ssize_t store_interleave_granularity(struct cxl_region *cxlr,
> > +                                           const char *buf, size_t len)
> > +{
> > +       struct cxl_decoder *rootd;
> > +       int val, ret;
> > +
> > +       ret = kstrtoint(buf, 0, &val);
> > +       if (ret)
> > +               return ret;
> > +       rootd = to_cxl_decoder(cxlr->dev.parent);
> > +       if (!cxl_is_interleave_granularity_valid(rootd, val))
> > +               return -EINVAL;
> > +
> > +       cxlr->interleave_granularity = val;
> > +
> > +       return len;
> > +}
> > +REGION_ATTR_RW(interleave_granularity);
> > +
> > +static ssize_t show_offset(struct cxl_region *cxlr, char *buf)
> > +{
> > +       if (!cxlr->res)
> > +               return sysfs_emit(buf, "\n");
> > +
> > +       return sysfs_emit(buf, "%pa\n", &cxlr->res->start);
> > +}
> > +REGION_ATTR_RO(offset);
> > +
> > +static ssize_t show_size(struct cxl_region *cxlr, char *buf)
> > +{
> > +       return sysfs_emit(buf, "%llu\n", cxlr->size);
> > +}
> > +
> > +static ssize_t store_size(struct cxl_region *cxlr, const char *buf, size_t len)
> > +{
> > +       unsigned long long val;
> > +       ssize_t rc;
> > +
> > +       rc = kstrtoull(buf, 0, &val);
> > +       if (rc)
> > +               return rc;
> > +
> > +       cxlr->size = val;
> > +       return len;
> > +}
> > +REGION_ATTR_RW(size);
> > +
> > +static ssize_t show_uuid(struct cxl_region *cxlr, char *buf)
> > +{
> > +       return sysfs_emit(buf, "%pUb\n", &cxlr->uuid);
> > +}
> > +
> > +static int is_dupe(struct device *match, void *_cxlr)
> > +{
> > +       struct cxl_region *c, *cxlr = _cxlr;
> > +
> > +       if (!is_cxl_region(match))
> > +               return 0;
> > +
> > +       if (&cxlr->dev == match)
> > +               return 0;
> > +
> > +       c = to_cxl_region(match);
> > +       if (uuid_equal(&c->uuid, &cxlr->uuid))
> > +               return -EEXIST;
> > +
> > +       return 0;
> > +}
> > +
> > +static ssize_t store_uuid(struct cxl_region *cxlr, const char *buf, size_t len)
> > +{
> > +       ssize_t rc;
> > +       uuid_t temp;
> > +
> > +       if (len != UUID_STRING_LEN + 1)
> > +               return -EINVAL;
> > +
> > +       rc = uuid_parse(buf, &temp);
> > +       if (rc)
> > +               return rc;
> > +
> > +       rc = bus_for_each_dev(&cxl_bus_type, NULL, cxlr, is_dupe);
> > +       if (rc < 0)
> > +               return false;
> > +
> > +       cxlr->uuid = temp;
> > +       return len;
> > +}
> > +REGION_ATTR_RW(uuid);
> > +
> > +static struct attribute *region_attrs[] = {
> > +       &dev_attr_interleave_ways.attr,
> > +       &dev_attr_interleave_granularity.attr,
> > +       &dev_attr_offset.attr,
> > +       &dev_attr_size.attr,
> > +       &dev_attr_uuid.attr,
> > +       NULL,
> > +};
> > +
> > +static const struct attribute_group region_group = {
> > +       .attrs = region_attrs,
> > +};
> > +
> > +static size_t show_targetN(struct cxl_region *cxlr, char *buf, int n)
> > +{
> > +       if (!cxlr->targets[n])
> > +               return sysfs_emit(buf, "\n");
> > +
> > +       return sysfs_emit(buf, "%s\n", dev_name(&cxlr->targets[n]->dev));
> > +}
> > +
> > +static size_t store_targetN(struct cxl_region *cxlr, const char *buf, int n,
> > +                           size_t len)
> > +{
> > +       struct cxl_decoder *cxld;
> > +       struct device *cxld_dev;
> > +
> > +       if (len == 1 || cxlr->targets[n])
> > +               remove_target(cxlr, n);
> > +
> > +       /* Remove target special case */
> > +       if (len == 1) {
> > +               device_unlock(&cxlr->dev);
> > +               return len;
> > +       }
> > +
> > +       cxld_dev = bus_find_device_by_name(&cxl_bus_type, NULL, buf);
> > +       if (!cxld_dev)
> > +               return -ENOENT;
> > +
> > +       if (!is_cxl_decoder(cxld_dev)) {
> > +               put_device(cxld_dev);
> > +               return -EPERM;
> > +       }
> > +
> > +       if (!is_cxl_endpoint(to_cxl_port(cxld_dev->parent))) {
> > +               put_device(cxld_dev);
> > +               return -EINVAL;
> > +       }
> > +
> > +       /* decoder reference is held until teardown */
> > +       cxld = to_cxl_decoder(cxld_dev);
> > +       cxlr->targets[n] = cxld;
> > +       cxld->cxlr = cxlr;
> > +
> > +       return len;
> > +}
> > +
> > +TARGET_ATTR_RW(0);
> > +TARGET_ATTR_RW(1);
> > +TARGET_ATTR_RW(2);
> > +TARGET_ATTR_RW(3);
> > +TARGET_ATTR_RW(4);
> > +TARGET_ATTR_RW(5);
> > +TARGET_ATTR_RW(6);
> > +TARGET_ATTR_RW(7);
> > +TARGET_ATTR_RW(8);
> > +TARGET_ATTR_RW(9);
> > +TARGET_ATTR_RW(10);
> > +TARGET_ATTR_RW(11);
> > +TARGET_ATTR_RW(12);
> > +TARGET_ATTR_RW(13);
> > +TARGET_ATTR_RW(14);
> > +TARGET_ATTR_RW(15);
> > +
> > +static struct attribute *interleave_attrs[] = {
> > +       &dev_attr_target0.attr,
> > +       &dev_attr_target1.attr,
> > +       &dev_attr_target2.attr,
> > +       &dev_attr_target3.attr,
> > +       &dev_attr_target4.attr,
> > +       &dev_attr_target5.attr,
> > +       &dev_attr_target6.attr,
> > +       &dev_attr_target7.attr,
> > +       &dev_attr_target8.attr,
> > +       &dev_attr_target9.attr,
> > +       &dev_attr_target10.attr,
> > +       &dev_attr_target11.attr,
> > +       &dev_attr_target12.attr,
> > +       &dev_attr_target13.attr,
> > +       &dev_attr_target14.attr,
> > +       &dev_attr_target15.attr,
> > +       NULL,
> > +};
> > +
> > +static umode_t visible_targets(struct kobject *kobj, struct attribute *a, int n)
> > +{
> > +       struct device *dev = container_of(kobj, struct device, kobj);
> > +       struct cxl_region *cxlr = to_cxl_region(dev);
> > +
> > +       if (n < cxlr->interleave_ways)
> > +               return a->mode;
> > +       return 0;
> > +}
> > +
> > +static const struct attribute_group region_interleave_group = {
> > +       .attrs = interleave_attrs,
> > +       .is_visible = visible_targets,
> > +};
> > +
> > +static const struct attribute_group *region_groups[] = {
> > +       &region_group,
> > +       &region_interleave_group,
> > +       &cxl_base_attribute_group,
> > +       NULL,
> > +};
> > +
> >  static const struct device_type cxl_region_type = {
> >         .name = "cxl_region",
> >         .release = cxl_region_release,
> > +       .groups = region_groups
> >  };
> >
> > -static struct cxl_region *to_cxl_region(struct device *dev)
> > +bool is_cxl_region(struct device *dev)
> > +{
> > +       return dev->type == &cxl_region_type;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(is_cxl_region, CXL);
> > +
> > +struct cxl_region *to_cxl_region(struct device *dev)
> >  {
> >         if (dev_WARN_ONCE(dev, dev->type != &cxl_region_type,
> >                           "not a cxl_region device\n"))
> > @@ -45,6 +387,7 @@ static struct cxl_region *to_cxl_region(struct device *dev)
> >
> >         return container_of(dev, struct cxl_region, dev);
> >  }
> > +EXPORT_SYMBOL_NS_GPL(to_cxl_region, CXL);
> >
> >  static void unregister_region(struct work_struct *work)
> >  {
> > @@ -79,6 +422,8 @@ static struct cxl_region *cxl_region_alloc(struct cxl_decoder *cxld)
> >                 return ERR_PTR(-ENOMEM);
> >         }
> >
> > +       cxlr->id = cxld->next_region_id;
> > +
> >         cxld->next_region_id = rc;
> >
> >         dev = &cxlr->dev;
> > @@ -88,6 +433,7 @@ static struct cxl_region *cxl_region_alloc(struct cxl_decoder *cxld)
> >         dev->bus = &cxl_bus_type;
> >         dev->type = &cxl_region_type;
> >         INIT_WORK(&cxlr->unregister_work, unregister_region);
> > +       mutex_init(&cxlr->remove_lock);
> >
> >         return cxlr;
> >  }
> > @@ -118,7 +464,6 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_decoder *cxld)
> >
> >         dev = &cxlr->dev;
> >
> > -       cxlr->id = cxld->next_region_id;
> >         rc = dev_set_name(dev, "region%d.%d:%d", port->id, cxld->id, cxlr->id);
> >         if (rc)
> >                 goto err_out;
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index d5397f7dfcf4..26351ed0ba65 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -81,6 +81,19 @@ static inline int cxl_to_interleave_ways(u8 eniw)
> >         }
> >  }
> >
> > +static inline int cxl_from_ways(u8 ways)
> > +{
> > +       if (is_power_of_2(ways))
> > +               return ilog2(ways);
> > +
> > +       return ways / 3 + 8;
> > +}
> > +
> > +static inline int cxl_from_granularity(u16 g)
> > +{
> > +       return ilog2(g) - 8;
> > +}
> > +
> >  /* CXL 2.0 8.2.8.1 Device Capabilities Array Register */
> >  #define CXLDEV_CAP_ARRAY_OFFSET 0x0
> >  #define   CXLDEV_CAP_ARRAY_CAP_ID 0
> > @@ -223,6 +236,7 @@ enum cxl_decoder_type {
> >   * @target_lock: coordinate coherent reads of the target list
> >   * @region_ida: allocator for region ids.
> >   * @next_region_id: Cached region id for next region.
> > + * @region: The region this decoder is associated with.
> >   * @nr_targets: number of elements in @target
> >   * @target: active ordered target list in current decoder configuration
> >   */
> > @@ -241,11 +255,11 @@ struct cxl_decoder {
> >         struct mutex id_lock;
> >         struct ida region_ida;
> >         int next_region_id;
> > +       struct cxl_region *cxlr;
> >         int nr_targets;
> >         struct cxl_dport *target[];
> >  };
> >
> > -
> >  /**
> >   * enum cxl_nvdimm_brige_state - state machine for managing bus rescans
> >   * @CXL_NVB_NEW: Set at bridge create and after cxl_pmem_wq is destroyed
> > diff --git a/drivers/cxl/region.h b/drivers/cxl/region.h
> > index 7025f6785f83..e78a049a5729 100644
> > --- a/drivers/cxl/region.h
> > +++ b/drivers/cxl/region.h
> > @@ -13,6 +13,14 @@
> >   * @id: This region's id. Id is globally unique across all regions.
> >   * @flags: Flags representing the current state of the region.
> >   * @unregister_work: Async unregister to allow attrs to take device_lock.
> > + * @remove_lock: Coordinates region removal against decoder removal
> > + * @list: Node in decoder's region list.
> > + * @res: Resource this region carves out of the platform decode range.
> > + * @size: Size of the region determined from LSA or userspace.
> > + * @uuid: The UUID for this region.
> > + * @interleave_ways: Number of interleave ways this region is configured for.
> > + * @interleave_granularity: Interleave granularity of region
> > + * @targets: The memory devices comprising the region.
> >   */
> >  struct cxl_region {
> >         struct device dev;
> > @@ -20,9 +28,66 @@ struct cxl_region {
> >         unsigned long flags;
> >  #define REGION_DEAD 0
> >         struct work_struct unregister_work;
> > +       struct mutex remove_lock;
> >
> > +       struct list_head list;
> > +       struct resource *res;
> > +
> > +       u64 size;
> > +       uuid_t uuid;
> > +       int interleave_ways;
> > +       int interleave_granularity;
> > +       struct cxl_decoder *targets[CXL_DECODER_MAX_INTERLEAVE];
> >  };
> >
> > +bool is_cxl_region(struct device *dev);
> > +struct cxl_region *to_cxl_region(struct device *dev);
> >  bool schedule_cxl_region_unregister(struct cxl_region *cxlr);
> >
> > +static inline bool cxl_is_interleave_ways_valid(const struct cxl_region *cxlr,
> > +                                               const struct cxl_decoder *rootd,
> > +                                               u8 ways)
> > +{
> > +       int root_ig, region_ig, root_eniw;
> > +
> > +       switch (ways) {
> > +       case 0 ... 4:
> > +       case 6:
> > +       case 8:
> > +       case 12:
> > +       case 16:
> > +               break;
> > +       default:
> > +               return false;
> > +       }
> > +
> > +       if (rootd->interleave_ways == 1)
> > +               return true;
> > +
> > +       root_ig = cxl_from_granularity(rootd->interleave_granularity);
> > +       region_ig = cxl_from_granularity(cxlr->interleave_granularity);
> > +       root_eniw = cxl_from_ways(rootd->interleave_ways);
> > +
> > +       return ((1 << (root_ig - region_ig)) * (1 << root_eniw)) <= ways;
> 
> Some comments for this math please.
> 

Okay.

> > +}
> > +
> > +static inline bool
> > +cxl_is_interleave_granularity_valid(const struct cxl_decoder *rootd, int ig)
> > +{
> > +       int rootd_hbig;
> > +
> > +       if (!is_power_of_2(ig))
> > +               return false;
> > +
> > +       /* 16K is the max */
> > +       if (ig >> 15)
> 
> Why the shift instead of the more straightforward:
> 
> if (ig > SZ_16K)
> 

Okay.

> > +               return false;
> > +
> > +       rootd_hbig = cxl_from_granularity(rootd->interleave_granularity);
> > +       if (rootd_hbig < cxl_from_granularity(ig))
> > +               return false;
> 
> Why do the comparison in CXL encoding vs ordinal:
> 
> if (ig > rootd->interleave_granularity)
> 
> ?

It was meant to future proof in case interleave granularity becomes non-ordinal.
I had a hunch you would request this change. I'd prefer to leave this as-is
unless you really hate it.

> 
> > +
> > +       return true;
> > +}
> > +
> >  #endif
> > --
> > 2.35.1
> >

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 2/2] cxl/region: Introduce concept of region configuration
  2022-03-01 17:43     ` Ben Widawsky
@ 2022-03-01 18:34       ` Dan Williams
  0 siblings, 0 replies; 10+ messages in thread
From: Dan Williams @ 2022-03-01 18:34 UTC (permalink / raw)
  To: Ben Widawsky
  Cc: linux-cxl, patches, kernel test robot, Alison Schofield,
	Ira Weiny, Jonathan Cameron, Vishal Verma

On Tue, Mar 1, 2022 at 9:43 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
>
> On 22-02-28 17:16:10, Dan Williams wrote:
> > On Thu, Feb 24, 2022 at 10:01 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
> > >
> > > The region creation APIs create a vacant region. Configuring the region
> > > works in the same way as similar subsystems such as devdax. Sysfs attrs
> > > will be provided to allow userspace to configure the region.  Finally
> > > once all configuration is complete, userspace may activate the region.
> > >
> > > Introduced here are the most basic attributes needed to configure a
> > > region. Details of these attribute are described in the ABI
> > > Documentation.
> > >
> > > A example is provided below:
> > >
> > > /sys/bus/cxl/devices/region0.0:0
> > > ├── devtype
> > > ├── interleave_granularity
> > > ├── interleave_ways
> > > ├── modalias
> > > ├── offset
> > > ├── size
> > > ├── subsystem -> ../../../../../../bus/cxl
> > > ├── target0
> > > ├── uevent
> > > └── uuid
> > >
> > > Reported-by: kernel test robot <lkp@intel.com> (v2)
> > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > > ---
> > > Changes since v3:
> > > - Make target be a decoder
> > > - Use device_lock for protecting config/probe race
> > > - Teardown region on decoder removal
> > >
> > > Size is still not handled.
> > > ---
> > >  Documentation/ABI/testing/sysfs-bus-cxl |  59 ++++
> > >  drivers/cxl/core/port.c                 |   8 +
> > >  drivers/cxl/core/region.c               | 351 +++++++++++++++++++++++-
> > >  drivers/cxl/cxl.h                       |  16 +-
> > >  drivers/cxl/region.h                    |  65 +++++
> > >  5 files changed, 495 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> > > index e5db45ea70ad..c447826e8286 100644
> > > --- a/Documentation/ABI/testing/sysfs-bus-cxl
> > > +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> > > @@ -186,3 +186,62 @@ Description:
> > >                 Deletes the named region.  The attribute expects a region in the
> > >                 form "regionX.Y:Z". The region's name, allocated by reading
> > >                 create_region, will also be released.
> > > +               Deletes the named region. A region must be unbound from the
> > > +               region driver before being deleted.
> >
> > This is not enforced by patch1, and I don't see why this would be
> > required, the kernel will do that as a part of device_unregister,
> > right?
> >
>
> This predates my understanding that unbind can't fail. I was planning to use
> unbind to support managed hot remove. I will [try to remember] to change this.
>
> > >   The attributes expects a
> > > +               region in the form "regionX.Y:Z". The region's name, allocated
> > > +               by reading create_region, will also be released.
> >
> > This can also be dropped, userspace does not have any sensitivity to
> > when / how the region name memory allocation is managed.
> >
>
> Okay.
>
> > > +
> > > +What:          /sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/resource
> > > +Date:          January, 2022
> > > +KernelVersion: v5.18
> > > +Contact:       linux-cxl@vger.kernel.org
> > > +Description:
> > > +               A region is a contiguous partition of a CXL Root decoder address
> > > +               space. Region capacity is allocated by writing to the size
> > > +               attribute, the resulting physical address base determined by the
> > > +               driver is reflected here.
> > > +
> > > +What:          /sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/size
> > > +Date:          January, 2022
> > > +KernelVersion: v5.18
> > > +Contact:       linux-cxl@vger.kernel.org
> > > +Description:
> > > +               System physical address space to be consumed by the region.
> >
> > s/to be//
> >
> > > +
> > > +What:          /sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/interleave_ways
> > > +Date:          January, 2022
> > > +KernelVersion: v5.18
> > > +Contact:       linux-cxl@vger.kernel.org
> > > +Description:
> > > +               Configures the number of devices participating in the region is
> > > +               set by writing this value. Each device will provide
> > > +               1/interleave_ways of storage for the region.
> > > +
> > > +What:          /sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/interleave_granularity
> > > +Date:          January, 2022
> > > +KernelVersion: v5.18
> > > +Contact:       linux-cxl@vger.kernel.org
> > > +Description:
> > > +               Set the number of consecutive bytes each device in the
> > > +               interleave set will claim. The possible interleave granularity
> > > +               values are determined by the CXL spec and the participating
> > > +               devices.
> > > +
> > > +What:          /sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/uuid
> > > +Date:          January, 2022
> > > +KernelVersion: v5.18
> > > +Contact:       linux-cxl@vger.kernel.org
> > > +Description:
> > > +               Write a unique identifier for the region. This field must be set
> > > +               for persistent regions and it must not conflict with the UUID of
> > > +               another region. If this field is set for volatile regions, the
> > > +               value is ignored.
> >
> > Hmm, could this attribute just be hidden via is_visible() if the
> > region type is not persistent? Although that opens up new questions
> > like, what about root decoders that can simultaneously support
> > volatile and pmem? Encode the type in the create ABI? I.e. have
> > create_pmem_region and create_volatile_region I like the idea that the
> > region type is unambiguous at create time.
> >
>
> Right. Originally I was thinking the decoder implicitly determines the type,
> however as you point out decoders can support both. I'm okay to create two
> nodes. We could also utilize a string for single create: volatileX, or
> persistentY. My preference is two nodes, I'm just offering an alternative.

The right sysfs-answer is 2 nodes.

> You do end up with asymmetry because there will be only one delete node for 2
> create nodes. Not sure if you care about that.

Nope, don't care. sysfs interfaces are meant to be utilitarian, not
necessarily user-friendly.

>
> > > +
> > > +What: /sys/bus/cxl/devices/decoderX.Y/regionX.Y:Z/endpoint_decoder[0..interleave_ways]
> > > +Date:          January, 2022
> > > +KernelVersion: v5.18
> > > +Contact:       linux-cxl@vger.kernel.org
> > > +Description:
> > > +               Write a decoder object that is unused and will participate in
> > > +               decoding memory transactions for the interleave set, ie.
> > > +               decoderX.Y. All attributes must be populated.
> >
> > Feels like this wants a lead-in patch describing / implementing the
> > writable decoder attributes.
>
> Agreed.
>
> >
> > > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > > index f3e1313217a8..0eff36f748c3 100644
> > > --- a/drivers/cxl/core/port.c
> > > +++ b/drivers/cxl/core/port.c
> > > @@ -1415,6 +1415,14 @@ EXPORT_SYMBOL_NS_GPL(cxl_decoder_add, CXL);
> > >
> > >  static void cxld_unregister(void *dev)

Let's not use local variable shorthand names in function identifiers.
This should be called cxl_decoder_unregister(), I had misread this as
cxlr_unregister(), apologies.

> > >  {
> > > +       struct cxl_decoder *cxld = to_cxl_decoder(dev);
> > > +
> > > +       if (cxld->cxlr) {
> > > +               mutex_lock(&cxld->cxlr->remove_lock);
> >
> > I don't understand what this lock is for? Perhaps if it was named
> > after the data it is locking rather than the code it would be more
> > obvious.
> >
>
> It was trying to prevent racing decoders removal with the region's removal.

It's naming suggested to me that it was locking code rather than data,
let's call it targets_lock.

> > > +               device_release_driver(&cxld->cxlr->dev);
> >
> > I would expect device_release_driver() to only be used in scenarios
> > where the region is being disabled, but if it is being unregistered
> > just let the device core detach the driver naturally.
>
> What's the proposal then to make it work? I don't see how the region driver is
> notified that a decoder went away without something like this.

Perhaps with the naming fixed up I would not have confused this for
the cxl_region unregister path. I think this can work, but it needs a
paired check in the probe path to take the targets_lock to prevent the
region from going enabled again between dropping the targets_lock and
unregistering the decoder.

>
> >
> > > +               mutex_unlock(&cxld->cxlr->remove_lock);
> > > +       }
> > > +
> > >         device_unregister(dev);
> > >  }
> > >
> > > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > > index a934938f8630..2b17b0af48de 100644
> > > --- a/drivers/cxl/core/region.c
> > > +++ b/drivers/cxl/core/region.c
> > > @@ -2,9 +2,12 @@
> > >  /* Copyright(c) 2022 Intel Corporation. All rights reserved. */
> > >  #include <linux/device.h>
> > >  #include <linux/module.h>
> > > +#include <linux/sizes.h>
> > >  #include <linux/slab.h>
> > > +#include <linux/uuid.h>
> > >  #include <linux/idr.h>
> > >  #include <region.h>
> > > +#include <cxlmem.h>
> > >  #include <cxl.h>
> > >  #include "core.h"
> > >
> > > @@ -16,28 +19,367 @@
> > >   * Memory ranges, Regions represent the active mapped capacity by the HDM
> > >   * Decoder Capability structures throughout the Host Bridges, Switches, and
> > >   * Endpoints in the topology.
> > > + *
> > > + * Region configuration has some ordering constraints:
> >
> > Add some "why" commentary to supplement the what.
> >
> > > + * - Size: Must be set after all targets
> >
> > I would expect the other way around so that decoder capacity can be
> > validated as they are registered to the region?
>
> When size is set, I need to confirm both root decoder capacity exists as well as
> individual device capacity. I could flip it around and not allow targets that do
> not have enough capacity left. Either is fine really, but ordering is explicit
> either way.
>
> If we mandate that the decoders themselves are first configured via sysfs (which
> I think we did say but I forgot to add), then this should work fine.

Yeah, they are 2 separate steps. It should be fine to independently
allocate HPA-capacity and DPA-capacity and then validate that the
settings are compatible at decoder-to-region association time. So no
need to have a strict order, just check compatibility when they come
together.

> > > + * - Targets: Must be set after interleave ways
> > > + * - Interleave ways: Must be set after Interleave Granularity
> > > + *
> > > + * UUID may be set at any time before binding the driver to the region.
> >
> > Did we also talk about making all properties write once? Perhaps only
> > if it simplifies some of the validation code.
>
> I never saw a response from you about that when I asked. Sorry if I missed it. I
> think write once is fine and makes things easier.

Works for me, it can always be relaxed later if it proves too restrictive.

> > >   */
> > >
> > >
> > > -static struct cxl_region *to_cxl_region(struct device *dev);
> > > +static const struct attribute_group region_interleave_group;
> > > +
> > > +#define _REGION_ATTR_RO(name)                                                  \
> > > +       static ssize_t name##_show(struct device *dev,                         \
> > > +                                  struct device_attribute *attr, char *buf)   \
> > > +       {                                                                      \
> > > +               struct cxl_region *cxlr = to_cxl_region(dev);                  \
> > > +               if (cxlr->flags & REGION_DEAD)                                 \
> >
> > Per the feedback on patch1 this likely wants to be:
> >
> > if (work_pending(&cxlr->detach_work))
>
> I've already forgotten patch1 check but I can go back and check later.

It's a lot of feedback to be sure, but we need to get through it to
converge. Sometimes the feedback is based on a misunderstanding like
the clx[rd] confusion above, so I appreciate you taking the care to
reconcile it all.

> > > +                       return -ENODEV;                                        \
> > > +               return show_##name(to_cxl_region(dev), buf);                   \
> >
> > I'd rather skip this macro indirection and just open code the 'dead'
> > check in the show handler...
> >
> > > +       }
> > > +
> > > +#define REGION_ATTR_RO(name)                                                   \
> > > +       _REGION_ATTR_RO(name)                                                  \
> > > +       static DEVICE_ATTR_RO(name)
> >
> > ...because this looks exotic to me.
> >
> > > +
> > > +#define _REGION_ATTR_WO(name)                                                  \
> >
> > More macro that can just be C helpers / open-coded.
> >
>
> Every attribute has the same race prevention. It seemed nice to combine it in a
> relatively simple macro. I can remove it however if you don't like it but I do
> believe it helps guard against future errors if adding new attributes.

I don't like that it will make debugging more difficult for resolving
back traces to line numbers, and the common code can be handled with
common preamble / postamble helper functions.

>
> > > +       static ssize_t name##_store(struct device *dev,                        \
> > > +                                   struct device_attribute *attr,             \
> > > +                                   const char *buf, size_t len)               \
> > > +       {                                                                      \
> > > +               int ret;                                                       \
> > > +               if (device_lock_interruptible(dev) < 0)                        \
> >
> > lockdep gave the thumbs up on this? Useful lockdep reports is another
> > reason to have not this detail buried in macros.
> >
>
> I'm running your patches and I hadn't seen anything. I am confused though, isn't
> this what we discussed to prevent the race with remove()?

It's similar to what nvdimm is using for deadlock avoidance, but CXL
locking != NVDIMM locking so it needs reverification. You would need
to update cxl_regions to have a lock_class, see cxl_set_lock_class()
in the new set.

> Looks like there is more below.
>
> > > +                       return -EINTR;                                         \
> > > +               if (dev->driver) {                                             \
> > > +                       device_unlock(dev);                                    \
> > > +                       return -EBUSY;                                         \
> > > +               }                                                              \
> > > +               ret = store_##name(to_cxl_region(dev), buf, len);              \
> > > +               device_unlock(dev);                                            \
> > > +               return ret;                                                    \
> > > +       }
> > > +
> > > +#define REGION_ATTR_RW(name)                                                   \
> > > +       _REGION_ATTR_RO(name)                                                  \
> > > +       _REGION_ATTR_WO(name)                                                  \
> > > +       static DEVICE_ATTR_RW(name)
> > > +
> > > +#define TARGET_ATTR_RW(n)                                                      \
> > > +       static ssize_t target##n##_show(                                       \
> > > +               struct device *dev, struct device_attribute *attr, char *buf)  \
> > > +       {                                                                      \
> > > +               return show_targetN(to_cxl_region(dev), buf, (n));             \
> > > +       }                                                                      \
> > > +       static ssize_t target##n##_store(struct device *dev,                   \
> > > +                                        struct device_attribute *attr,        \
> > > +                                        const char *buf, size_t len)          \
> > > +       {                                                                      \
> > > +               int ret;                                                       \
> > > +               if (device_lock_interruptible(dev) < 0)                        \
> > > +                       return -EINTR;                                         \
> > > +               if (dev->driver) {                                             \
> > > +                       device_unlock(dev);                                    \
> > > +                       return -EBUSY;                                         \
> > > +               }                                                              \
> > > +               ret = store_targetN(to_cxl_region(dev), buf, (n), len);        \
> > > +               device_unlock(dev);                                            \
> > > +               return ret;                                                    \
> > > +       }                                                                      \
> > > +       static DEVICE_ATTR_RW(target##n)
> > > +
> > > +static void remove_target(struct cxl_region *cxlr, int target)
> > > +{
> > > +       struct cxl_decoder *cxld;
> > > +
> > > +       mutex_lock(&cxlr->remove_lock);
> > > +       cxld = cxlr->targets[target];
> > > +       if (cxld) {
> > > +               cxld->cxlr = NULL;
> > > +               put_device(&cxld->dev);
> > > +       }
> > > +       cxlr->targets[target] = NULL;
> > > +       mutex_unlock(&cxlr->remove_lock);
> >
> > How does this synchronize with memdev ->remove() and decoders becoming
> > unregistered while still referenced by an active region? I think,
> > especially for error handling scenarios the region driver will want to
> > be able to assume that the region will go through ->remove() before
> > any targets successfully observed by ->probe() can be removed. To me
> > that means this needs a test for the memdev ->remove() flow and
> > whether memdev ->remove() could perhaps synchronously trigger region
> > remove for all associated regions. That would mean that the region
> > device lock would need to nest beneath endpoint decoder device_lock().
> > Other flows get easier if region driver flows can assume that until
> > region ->remove() the target decoder list is stable.
> >
>
> How do I enforce it goes through remove() first?

If the equivalent of remove_target() happened inside the targets_lock
(formerly remove_lock) section of cxl_decoder_unregister() (formerly
cxld_unregister) after the region had been unbound, then I think that
might be sufficient. Otherwise a standalone remove_target() outside of
the decoder unregister path does not track for me. What am I missing?

[..]
>
> > > +}
> > > +
> > > +static inline bool
> > > +cxl_is_interleave_granularity_valid(const struct cxl_decoder *rootd, int ig)
> > > +{
> > > +       int rootd_hbig;
> > > +
> > > +       if (!is_power_of_2(ig))
> > > +               return false;
> > > +
> > > +       /* 16K is the max */
> > > +       if (ig >> 15)
> >
> > Why the shift instead of the more straightforward:
> >
> > if (ig > SZ_16K)
> >
>
> Okay.
>
> > > +               return false;
> > > +
> > > +       rootd_hbig = cxl_from_granularity(rootd->interleave_granularity);
> > > +       if (rootd_hbig < cxl_from_granularity(ig))
> > > +               return false;
> >
> > Why do the comparison in CXL encoding vs ordinal:
> >
> > if (ig > rootd->interleave_granularity)
> >
> > ?
>
> It was meant to future proof in case interleave granularity becomes non-ordinal.
> I had a hunch you would request this change. I'd prefer to leave this as-is
> unless you really hate it.

How could interleave granularity become non-ordinal in the future?
It's not about hate or taste, it does not have a logical reason to
exist that I can see.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 1/2] cxl/region: Add region creation ABI
  2022-02-28 23:48   ` Dan Williams
@ 2022-03-01 21:22     ` Ben Widawsky
  2022-03-01 21:36       ` Ben Widawsky
  2022-03-01 21:49       ` Dan Williams
  0 siblings, 2 replies; 10+ messages in thread
From: Ben Widawsky @ 2022-03-01 21:22 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, patches, Alison Schofield, Ira Weiny,
	Jonathan Cameron, Vishal Verma

On 22-02-28 15:48:43, Dan Williams wrote:
> On Thu, Feb 24, 2022 at 10:01 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
> >
> > Regions are created as a child of the decoder that encompasses an
> > address space with constraints. Regions have a number of attributes that
> > must be configured before the region can be activated.
> >
> > The ABI is not meant to be secure,
> 
> What does that mean? It's "secure" by virtue of only being root
> writable and if root userspace process race themselves the kernel will
> coherently handle the conflict and pick a winner.
> 
> > but is meant to avoid accidental
> > races.
> 
> It also solves purposeful race.
> 
> >  As a result, a buggy process may create a region by name that was
> > allocated by a different process.
> 
> That's not a bug, one process atomically claims the next region, the
> other loops.
> 
> > However, multiple processes which are
> > trying not to race with each other shouldn't need special
> > synchronization to do so.
> >
> > // Allocate a new region name
> > region=$(cat /sys/bus/cxl/devices/decoder0.0/create_region)
> >
> > // Create a new region by name
> > while
> > region=$(cat /sys/bus/cxl/devices/decoder0.0/create_region)
> > ! echo $region > /sys/bus/cxl/devices/decoder0.0/create_region
> > do true; done
> >
> > // Region now exists in sysfs
> > stat -t /sys/bus/cxl/devices/decoder0.0/$region
> >
> > // Delete the region, and name
> > echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region
> >
> > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> >
> > ---
> > Changes since v5 (all Dan):
> > - Fix erroneous return on create
> > - Fix ida leak on error
> > - forward declare to_cxl_region instead of cxl_region_release
> > - Use REGION_DEAD in the right place
> > - Allocate next id in region_alloc
> > ---
> >  Documentation/ABI/testing/sysfs-bus-cxl       |  23 ++
> >  .../driver-api/cxl/memory-devices.rst         |  11 +
> >  drivers/cxl/core/Makefile                     |   1 +
> >  drivers/cxl/core/core.h                       |   3 +
> >  drivers/cxl/core/port.c                       |  18 ++
> >  drivers/cxl/core/region.c                     | 223 ++++++++++++++++++
> >  drivers/cxl/cxl.h                             |   5 +
> >  drivers/cxl/region.h                          |  28 +++
> >  tools/testing/cxl/Kbuild                      |   1 +
> >  9 files changed, 313 insertions(+)
> >  create mode 100644 drivers/cxl/core/region.c
> >  create mode 100644 drivers/cxl/region.h
> >
> > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> > index 7c2b846521f3..e5db45ea70ad 100644
> > --- a/Documentation/ABI/testing/sysfs-bus-cxl
> > +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> > @@ -163,3 +163,26 @@ Description:
> >                 memory (type-3). The 'target_type' attribute indicates the
> >                 current setting which may dynamically change based on what
> >                 memory regions are activated in this decode hierarchy.
> > +
> > +What:          /sys/bus/cxl/devices/decoderX.Y/create_region
> > +Date:          January, 2022
> > +KernelVersion: v5.18
> > +Contact:       linux-cxl@vger.kernel.org
> > +Description:
> > +               Write a value of the form 'regionX.Y:Z' to instantiate a new
> 
> Per internal conversation this will move to "write a value of the form
> "%u", right?

Yep.

> 
> > +               region within the decode range bounded by decoderX.Y. The value
> > +               written must match the current value returned from reading this
> > +               attribute. This behavior lets the kernel arbitrate racing
> > +               attempts to create a region. The thread that fails to write
> > +               loops and tries the next value. Regions must be created for root
> > +               decoders,
> 
> Does this need to be stated? It's already implicit by the fact that
> 'create_region' does not appear on anything but root decoders.
> 

Yes. Historically I had added this attribute for all decoders...

> 
> >  and must subsequently configured and bound to a region
> > +               driver before they can be used.
> > +
> > +What:          /sys/bus/cxl/devices/decoderX.Y/delete_region
> > +Date:          January, 2022
> > +KernelVersion: v5.18
> > +Contact:       linux-cxl@vger.kernel.org
> > +Description:
> > +               Deletes the named region.  The attribute expects a region in the
> > +               form "regionX.Y:Z". The region's name, allocated by reading
> > +               create_region, will also be released.
> > diff --git a/Documentation/driver-api/cxl/memory-devices.rst b/Documentation/driver-api/cxl/memory-devices.rst
> > index db476bb170b6..66ddc58a21b1 100644
> > --- a/Documentation/driver-api/cxl/memory-devices.rst
> > +++ b/Documentation/driver-api/cxl/memory-devices.rst
> > @@ -362,6 +362,17 @@ CXL Core
> >  .. kernel-doc:: drivers/cxl/core/mbox.c
> >     :doc: cxl mbox
> >
> > +CXL Regions
> > +-----------
> > +.. kernel-doc:: drivers/cxl/region.h
> > +   :identifiers:
> > +
> > +.. kernel-doc:: drivers/cxl/core/region.c
> > +   :doc: cxl core region
> > +
> > +.. kernel-doc:: drivers/cxl/core/region.c
> > +   :identifiers:
> > +
> >  External Interfaces
> >  ===================
> >
> > diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
> > index 6d37cd78b151..39ce8f2f2373 100644
> > --- a/drivers/cxl/core/Makefile
> > +++ b/drivers/cxl/core/Makefile
> > @@ -4,6 +4,7 @@ obj-$(CONFIG_CXL_BUS) += cxl_core.o
> >  ccflags-y += -I$(srctree)/drivers/cxl
> >  cxl_core-y := port.o
> >  cxl_core-y += pmem.o
> > +cxl_core-y += region.o
> >  cxl_core-y += regs.o
> >  cxl_core-y += memdev.o
> >  cxl_core-y += mbox.o
> > diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> > index 1a50c0fc399c..adfd42370b28 100644
> > --- a/drivers/cxl/core/core.h
> > +++ b/drivers/cxl/core/core.h
> > @@ -9,6 +9,9 @@ extern const struct device_type cxl_nvdimm_type;
> >
> >  extern struct attribute_group cxl_base_attribute_group;
> >
> > +extern struct device_attribute dev_attr_create_region;
> > +extern struct device_attribute dev_attr_delete_region;
> > +
> >  struct cxl_send_command;
> >  struct cxl_mem_query_commands;
> >  int cxl_query_cmd(struct cxl_memdev *cxlmd,
> > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > index 1e785a3affaa..f3e1313217a8 100644
> > --- a/drivers/cxl/core/port.c
> > +++ b/drivers/cxl/core/port.c
> > @@ -9,6 +9,7 @@
> >  #include <linux/idr.h>
> >  #include <cxlmem.h>
> >  #include <cxlpci.h>
> > +#include <region.h>
> >  #include <cxl.h>
> >  #include "core.h"
> >
> > @@ -213,6 +214,8 @@ static struct attribute_group cxl_decoder_base_attribute_group = {
> >  };
> >
> >  static struct attribute *cxl_decoder_root_attrs[] = {
> > +       &dev_attr_create_region.attr,
> > +       &dev_attr_delete_region.attr,
> >         &dev_attr_cap_pmem.attr,
> >         &dev_attr_cap_ram.attr,
> >         &dev_attr_cap_type2.attr,
> > @@ -270,6 +273,8 @@ static void cxl_decoder_release(struct device *dev)
> >         struct cxl_decoder *cxld = to_cxl_decoder(dev);
> >         struct cxl_port *port = to_cxl_port(dev->parent);
> >
> > +       ida_free(&cxld->region_ida, cxld->next_region_id);
> > +       ida_destroy(&cxld->region_ida);
> >         ida_free(&port->decoder_ida, cxld->id);
> >         kfree(cxld);
> >  }
> > @@ -1244,6 +1249,13 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
> >         cxld->target_type = CXL_DECODER_EXPANDER;
> >         cxld->platform_res = (struct resource)DEFINE_RES_MEM(0, 0);
> >
> > +       mutex_init(&cxld->id_lock);
> > +       ida_init(&cxld->region_ida);
> > +       rc = ida_alloc(&cxld->region_ida, GFP_KERNEL);
> > +       if (rc < 0)
> > +               goto err;
> > +
> > +       cxld->next_region_id = rc;
> >         return cxld;
> >  err:
> >         kfree(cxld);
> > @@ -1502,6 +1514,12 @@ bool schedule_cxl_memdev_detach(struct cxl_memdev *cxlmd)
> >  }
> >  EXPORT_SYMBOL_NS_GPL(schedule_cxl_memdev_detach, CXL);
> >
> > +bool schedule_cxl_region_unregister(struct cxl_region *cxlr)
> > +{
> > +       return queue_work(cxl_bus_wq, &cxlr->unregister_work);
> > +}
> > +EXPORT_SYMBOL_NS_GPL(schedule_cxl_region_unregister, CXL);
> > +
> >  /* for user tooling to ensure port disable work has completed */
> >  static ssize_t flush_store(struct bus_type *bus, const char *buf, size_t count)
> >  {
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > new file mode 100644
> > index 000000000000..a934938f8630
> > --- /dev/null
> > +++ b/drivers/cxl/core/region.c
> > @@ -0,0 +1,223 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
> > +#include <linux/device.h>
> > +#include <linux/module.h>
> > +#include <linux/slab.h>
> > +#include <linux/idr.h>
> > +#include <region.h>
> > +#include <cxl.h>
> > +#include "core.h"
> > +
> > +/**
> > + * DOC: cxl core region
> > + *
> > + * CXL Regions represent mapped memory capacity in system physical address
> > + * space. Whereas the CXL Root Decoders identify the bounds of potential CXL
> > + * Memory ranges, Regions represent the active mapped capacity by the HDM
> > + * Decoder Capability structures throughout the Host Bridges, Switches, and
> > + * Endpoints in the topology.
> > + */
> > +
> > +
> > +static struct cxl_region *to_cxl_region(struct device *dev);
> 
> cxl_lock_class() will want to know the lock_class for the cxl_region
> device lock, so this can go straight to be an extern declaration
> alongside is_cxl_decoder().
> 

Yeah, this is stale already. I took what you have in the preview branch which
sets it up right before device_add().

> > +
> > +static void cxl_region_release(struct device *dev)
> > +{
> > +       struct cxl_decoder *cxld = to_cxl_decoder(dev->parent);
> > +       struct cxl_region *cxlr = to_cxl_region(dev);
> > +
> > +       dev_dbg(&cxld->dev, "Releasing %s\n", dev_name(dev));
> > +       ida_free(&cxld->region_ida, cxlr->id);
> > +       kfree(cxlr);
> > +       put_device(&cxld->dev);
> > +}
> > +
> > +static const struct device_type cxl_region_type = {
> > +       .name = "cxl_region",
> > +       .release = cxl_region_release,
> > +};
> > +
> > +static struct cxl_region *to_cxl_region(struct device *dev)
> > +{
> > +       if (dev_WARN_ONCE(dev, dev->type != &cxl_region_type,
> > +                         "not a cxl_region device\n"))
> > +               return NULL;
> > +
> > +       return container_of(dev, struct cxl_region, dev);
> > +}
> > +
> > +static void unregister_region(struct work_struct *work)
> > +{
> > +       struct cxl_region *cxlr;
> > +
> > +       cxlr = container_of(work, typeof(*cxlr), unregister_work);
> > +       device_unregister(&cxlr->dev);
> > +}
> > +
> > +static void schedule_unregister(void *cxlr)
> > +{
> > +       schedule_cxl_region_unregister(cxlr);
> > +}
> > +
> > +static struct cxl_region *cxl_region_alloc(struct cxl_decoder *cxld)
> > +{
> > +       struct cxl_region *cxlr;
> > +       struct device *dev;
> > +       int rc;
> > +
> > +       lockdep_assert_held(&cxld->id_lock);
> > +
> > +       rc = ida_alloc(&cxld->region_ida, GFP_KERNEL);
> > +       if (rc < 0) {
> > +               dev_dbg(dev, "Failed to get next cached id (%d)\n", rc);
> > +               return ERR_PTR(rc);
> > +       }
> > +
> > +       cxlr = kzalloc(sizeof(*cxlr), GFP_KERNEL);
> > +       if (!cxlr) {
> > +               ida_free(&cxld->region_ida, rc);
> > +               return ERR_PTR(-ENOMEM);
> > +       }
> > +
> > +       cxld->next_region_id = rc;
> > +
> > +       dev = &cxlr->dev;
> > +       device_initialize(dev);
> > +       dev->parent = &cxld->dev;
> > +       device_set_pm_not_required(dev);
> > +       dev->bus = &cxl_bus_type;
> > +       dev->type = &cxl_region_type;
> > +       INIT_WORK(&cxlr->unregister_work, unregister_region);
> 
> Perhaps name it "detach_work" to match the same for memdevs, or second
> choice, go rename the memdev one to "unregister_work". Keep the naming
> consistent for data structures that fill the same role.
> 

Okay. Do you want me to rename schedule_cxl_region_unregister() also? I assume
yes.

> > +
> > +       return cxlr;
> > +}
> > +
> > +/**
> > + * devm_cxl_add_region - Adds a region to a decoder
> > + * @cxld: Parent decoder.
> > + * @cxlr: Region to be added to the decoder.
> > + *
> > + * This is the second step of region initialization. Regions exist within an
> > + * address space which is mapped by a @cxld. That @cxld must be a root decoder,
> > + * and it enforces constraints upon the region as it is configured.
> > + *
> > + * Return: 0 if the region was added to the @cxld, else returns negative error
> > + * code. The region will be named "regionX.Y.Z" where X is the port, Y is the
> > + * decoder id, and Z is the region number.
> > + */
> > +static struct cxl_region *devm_cxl_add_region(struct cxl_decoder *cxld)
> > +{
> > +       struct cxl_port *port = to_cxl_port(cxld->dev.parent);
> > +       struct cxl_region *cxlr;
> > +       struct device *dev;
> > +       int rc;
> > +
> > +       cxlr = cxl_region_alloc(cxld);
> > +       if (IS_ERR(cxlr))
> > +               return cxlr;
> > +
> > +       dev = &cxlr->dev;
> > +
> > +       cxlr->id = cxld->next_region_id;
> > +       rc = dev_set_name(dev, "region%d.%d:%d", port->id, cxld->id, cxlr->id);
> > +       if (rc)
> > +               goto err_out;
> > +
> > +       /* affirm that release will have access to the decoder's region ida  */
> 
> s/affirm that release will/arrange for cxl_region_release to/
> 
> > +       get_device(&cxld->dev);
> > +
> > +       rc = device_add(dev);
> > +       if (rc)
> > +               goto err_put;
> > +
> > +       rc = devm_add_action_or_reset(port->uport, schedule_unregister, cxlr);
> > +       if (rc)
> > +               goto err_put;
> > +
> > +       return cxlr;
> > +
> > +err_put:
> > +       put_device(&cxld->dev);
> > +
> > +err_out:
> > +       put_device(dev);
> > +       return ERR_PTR(rc);
> > +}
> > +
> > +static ssize_t create_region_show(struct device *dev,
> > +                                 struct device_attribute *attr, char *buf)
> > +{
> > +       struct cxl_port *port = to_cxl_port(dev->parent);
> > +       struct cxl_decoder *cxld = to_cxl_decoder(dev);
> > +
> > +       return sysfs_emit(buf, "region%d.%d:%d\n", port->id, cxld->id,
> > +                         cxld->next_region_id);
> 
> To cut down on userspace racing itself this should acquire the id_lock
> to synchronize against the store side. i.e. no point in returning
> known bad answers when the lock is held on the store side, even though
> the answer given here may be immediately invalidated as soon as the
> lock is dropped it's still useful to throttle readers in the presence
> of writers.
>  
> > +}
> > +
> > +static ssize_t create_region_store(struct device *dev,
> > +                                  struct device_attribute *attr,
> > +                                  const char *buf, size_t len)
> > +{
> > +       struct cxl_port *port = to_cxl_port(dev->parent);
> > +       struct cxl_decoder *cxld = to_cxl_decoder(dev);
> > +       struct cxl_region *cxlr;
> > +       int d, p, r, rc = 0;
> > +
> > +       if (sscanf(buf, "region%d.%d:%d", &p, &d, &r) != 3)
> > +               return -EINVAL;
> > +
> > +       if (port->id != p || cxld->id != d)
> > +               return -EINVAL;
> > +
> > +       rc = mutex_lock_interruptible(&cxld->id_lock);
> > +       if (rc)
> > +               return rc;
> > +
> > +       if (cxld->next_region_id != r) {
> > +               rc = -EINVAL;
> > +               goto out;
> > +       }
> > +
> > +       cxlr = devm_cxl_add_region(cxld);
> > +       rc = 0;
> > +       dev_dbg(dev, "Created %s\n", dev_name(&cxlr->dev));
> > +
> > +out:
> > +       mutex_unlock(&cxld->id_lock);
> > +       if (rc)
> > +               return rc;
> > +       return len;
> > +}
> > +DEVICE_ATTR_RW(create_region);
> > +
> > +static struct cxl_region *cxl_find_region_by_name(struct cxl_decoder *cxld,
> > +                                                 const char *name)
> > +{
> > +       struct device *region_dev;
> > +
> > +       region_dev = device_find_child_by_name(&cxld->dev, name);
> > +       if (!region_dev)
> > +               return ERR_PTR(-ENOENT);
> > +
> > +       return to_cxl_region(region_dev);
> > +}
> > +
> > +static ssize_t delete_region_store(struct device *dev,
> > +                                  struct device_attribute *attr,
> > +                                  const char *buf, size_t len)
> > +{
> > +       struct cxl_port *port = to_cxl_port(dev->parent);
> > +       struct cxl_decoder *cxld = to_cxl_decoder(dev);
> > +       struct cxl_region *cxlr;
> > +
> > +       cxlr = cxl_find_region_by_name(cxld, buf);
> > +       if (IS_ERR(cxlr))
> > +               return PTR_ERR(cxlr);
> > +
> > +       if (!test_and_set_bit(REGION_DEAD, &cxlr->flags))
> > +               devm_release_action(port->uport, schedule_unregister, cxlr);
> 
> Where is the synchronization against port ->remove()? That could have
> started before this sysfs file was deleted and trigger double
> device_unregister().

Maybe I'm missing something obvious, but I don't see a way to do this without
adding another lock. I need the root port's device lock for this, but when the
port goes away, so does that device, right?

> Also while any workqueue thread may want to access the region a reference
> needs to be held otherwise you can get a use after free. I expect that this
> should unconditionally schedule the unregister work, then in the workqueue
> check that only one invocation actually performs the unregistration.

Okay.

> 
> Given that the work is only related to unregistration this can fail
> requests to delete something that has already been deleted. If
> userspace sees that error and wants to synchronize it can use the
> bus/flush attribute for that. I.e.
> 
> if (work_pending(&cxlr->detach_work))
>    return -EBUSY;
> 

I'm not following, you mean to put this in flush_store?

> > +       put_device(&cxlr->dev);
> > +
> > +       return len;
> > +}
> > +DEVICE_ATTR_WO(delete_region);
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index b4047a310340..d5397f7dfcf4 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -221,6 +221,8 @@ enum cxl_decoder_type {
> >   * @target_type: accelerator vs expander (type2 vs type3) selector
> >   * @flags: memory type capabilities and locking
> >   * @target_lock: coordinate coherent reads of the target list
> > + * @region_ida: allocator for region ids.
> > + * @next_region_id: Cached region id for next region.
> >   * @nr_targets: number of elements in @target
> >   * @target: active ordered target list in current decoder configuration
> >   */
> > @@ -236,6 +238,9 @@ struct cxl_decoder {
> >         enum cxl_decoder_type target_type;
> >         unsigned long flags;
> >         seqlock_t target_lock;
> > +       struct mutex id_lock;
> > +       struct ida region_ida;
> > +       int next_region_id;
> >         int nr_targets;
> >         struct cxl_dport *target[];
> >  };
> > diff --git a/drivers/cxl/region.h b/drivers/cxl/region.h
> > new file mode 100644
> > index 000000000000..7025f6785f83
> > --- /dev/null
> > +++ b/drivers/cxl/region.h
> > @@ -0,0 +1,28 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/* Copyright(c) 2021 Intel Corporation. */
> > +#ifndef __CXL_REGION_H__
> > +#define __CXL_REGION_H__
> > +
> > +#include <linux/uuid.h>
> > +
> > +#include "cxl.h"
> > +
> > +/**
> > + * struct cxl_region - CXL region
> > + * @dev: This region's device.
> > + * @id: This region's id. Id is globally unique across all regions.
> > + * @flags: Flags representing the current state of the region.
> > + * @unregister_work: Async unregister to allow attrs to take device_lock.
> > + */
> > +struct cxl_region {
> > +       struct device dev;
> > +       int id;
> > +       unsigned long flags;
> > +#define REGION_DEAD 0
> > +       struct work_struct unregister_work;
> > +
> > +};
> > +
> > +bool schedule_cxl_region_unregister(struct cxl_region *cxlr);
> > +
> > +#endif
> > diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
> > index 82e49ab0937d..3fe6d34e6d59 100644
> > --- a/tools/testing/cxl/Kbuild
> > +++ b/tools/testing/cxl/Kbuild
> > @@ -46,6 +46,7 @@ cxl_core-y += $(CXL_CORE_SRC)/memdev.o
> >  cxl_core-y += $(CXL_CORE_SRC)/mbox.o
> >  cxl_core-y += $(CXL_CORE_SRC)/pci.o
> >  cxl_core-y += $(CXL_CORE_SRC)/hdm.o
> > +cxl_core-y += $(CXL_CORE_SRC)/region.o
> >  cxl_core-y += config_check.o
> >
> >  obj-m += test/
> > --
> > 2.35.1
> >

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 1/2] cxl/region: Add region creation ABI
  2022-03-01 21:22     ` Ben Widawsky
@ 2022-03-01 21:36       ` Ben Widawsky
  2022-03-01 21:49       ` Dan Williams
  1 sibling, 0 replies; 10+ messages in thread
From: Ben Widawsky @ 2022-03-01 21:36 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, patches, Alison Schofield, Ira Weiny,
	Jonathan Cameron, Vishal Verma

On 22-03-01 13:22:51, Ben Widawsky wrote:
> On 22-02-28 15:48:43, Dan Williams wrote:
> > On Thu, Feb 24, 2022 at 10:01 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
> > >
> > > Regions are created as a child of the decoder that encompasses an
> > > address space with constraints. Regions have a number of attributes that
> > > must be configured before the region can be activated.
> > >
> > > The ABI is not meant to be secure,
> > 
> > What does that mean? It's "secure" by virtue of only being root
> > writable and if root userspace process race themselves the kernel will
> > coherently handle the conflict and pick a winner.
> > 
> > > but is meant to avoid accidental
> > > races.
> > 
> > It also solves purposeful race.
> > 
> > >  As a result, a buggy process may create a region by name that was
> > > allocated by a different process.
> > 
> > That's not a bug, one process atomically claims the next region, the
> > other loops.
> > 
> > > However, multiple processes which are
> > > trying not to race with each other shouldn't need special
> > > synchronization to do so.
> > >
> > > // Allocate a new region name
> > > region=$(cat /sys/bus/cxl/devices/decoder0.0/create_region)
> > >
> > > // Create a new region by name
> > > while
> > > region=$(cat /sys/bus/cxl/devices/decoder0.0/create_region)
> > > ! echo $region > /sys/bus/cxl/devices/decoder0.0/create_region
> > > do true; done
> > >
> > > // Region now exists in sysfs
> > > stat -t /sys/bus/cxl/devices/decoder0.0/$region
> > >
> > > // Delete the region, and name
> > > echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region
> > >
> > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > >
> > > ---
> > > Changes since v5 (all Dan):
> > > - Fix erroneous return on create
> > > - Fix ida leak on error
> > > - forward declare to_cxl_region instead of cxl_region_release
> > > - Use REGION_DEAD in the right place
> > > - Allocate next id in region_alloc
> > > ---
> > >  Documentation/ABI/testing/sysfs-bus-cxl       |  23 ++
> > >  .../driver-api/cxl/memory-devices.rst         |  11 +
> > >  drivers/cxl/core/Makefile                     |   1 +
> > >  drivers/cxl/core/core.h                       |   3 +
> > >  drivers/cxl/core/port.c                       |  18 ++
> > >  drivers/cxl/core/region.c                     | 223 ++++++++++++++++++
> > >  drivers/cxl/cxl.h                             |   5 +
> > >  drivers/cxl/region.h                          |  28 +++
> > >  tools/testing/cxl/Kbuild                      |   1 +
> > >  9 files changed, 313 insertions(+)
> > >  create mode 100644 drivers/cxl/core/region.c
> > >  create mode 100644 drivers/cxl/region.h
> > >
> > > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> > > index 7c2b846521f3..e5db45ea70ad 100644
> > > --- a/Documentation/ABI/testing/sysfs-bus-cxl
> > > +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> > > @@ -163,3 +163,26 @@ Description:
> > >                 memory (type-3). The 'target_type' attribute indicates the
> > >                 current setting which may dynamically change based on what
> > >                 memory regions are activated in this decode hierarchy.
> > > +
> > > +What:          /sys/bus/cxl/devices/decoderX.Y/create_region
> > > +Date:          January, 2022
> > > +KernelVersion: v5.18
> > > +Contact:       linux-cxl@vger.kernel.org
> > > +Description:
> > > +               Write a value of the form 'regionX.Y:Z' to instantiate a new
> > 
> > Per internal conversation this will move to "write a value of the form
> > "%u", right?
> 
> Yep.
> 
> > 
> > > +               region within the decode range bounded by decoderX.Y. The value
> > > +               written must match the current value returned from reading this
> > > +               attribute. This behavior lets the kernel arbitrate racing
> > > +               attempts to create a region. The thread that fails to write
> > > +               loops and tries the next value. Regions must be created for root
> > > +               decoders,
> > 
> > Does this need to be stated? It's already implicit by the fact that
> > 'create_region' does not appear on anything but root decoders.
> > 
> 
> Yes. Historically I had added this attribute for all decoders...
> 
> > 
> > >  and must subsequently configured and bound to a region
> > > +               driver before they can be used.
> > > +
> > > +What:          /sys/bus/cxl/devices/decoderX.Y/delete_region
> > > +Date:          January, 2022
> > > +KernelVersion: v5.18
> > > +Contact:       linux-cxl@vger.kernel.org
> > > +Description:
> > > +               Deletes the named region.  The attribute expects a region in the
> > > +               form "regionX.Y:Z". The region's name, allocated by reading
> > > +               create_region, will also be released.
> > > diff --git a/Documentation/driver-api/cxl/memory-devices.rst b/Documentation/driver-api/cxl/memory-devices.rst
> > > index db476bb170b6..66ddc58a21b1 100644
> > > --- a/Documentation/driver-api/cxl/memory-devices.rst
> > > +++ b/Documentation/driver-api/cxl/memory-devices.rst
> > > @@ -362,6 +362,17 @@ CXL Core
> > >  .. kernel-doc:: drivers/cxl/core/mbox.c
> > >     :doc: cxl mbox
> > >
> > > +CXL Regions
> > > +-----------
> > > +.. kernel-doc:: drivers/cxl/region.h
> > > +   :identifiers:
> > > +
> > > +.. kernel-doc:: drivers/cxl/core/region.c
> > > +   :doc: cxl core region
> > > +
> > > +.. kernel-doc:: drivers/cxl/core/region.c
> > > +   :identifiers:
> > > +
> > >  External Interfaces
> > >  ===================
> > >
> > > diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
> > > index 6d37cd78b151..39ce8f2f2373 100644
> > > --- a/drivers/cxl/core/Makefile
> > > +++ b/drivers/cxl/core/Makefile
> > > @@ -4,6 +4,7 @@ obj-$(CONFIG_CXL_BUS) += cxl_core.o
> > >  ccflags-y += -I$(srctree)/drivers/cxl
> > >  cxl_core-y := port.o
> > >  cxl_core-y += pmem.o
> > > +cxl_core-y += region.o
> > >  cxl_core-y += regs.o
> > >  cxl_core-y += memdev.o
> > >  cxl_core-y += mbox.o
> > > diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> > > index 1a50c0fc399c..adfd42370b28 100644
> > > --- a/drivers/cxl/core/core.h
> > > +++ b/drivers/cxl/core/core.h
> > > @@ -9,6 +9,9 @@ extern const struct device_type cxl_nvdimm_type;
> > >
> > >  extern struct attribute_group cxl_base_attribute_group;
> > >
> > > +extern struct device_attribute dev_attr_create_region;
> > > +extern struct device_attribute dev_attr_delete_region;
> > > +
> > >  struct cxl_send_command;
> > >  struct cxl_mem_query_commands;
> > >  int cxl_query_cmd(struct cxl_memdev *cxlmd,
> > > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > > index 1e785a3affaa..f3e1313217a8 100644
> > > --- a/drivers/cxl/core/port.c
> > > +++ b/drivers/cxl/core/port.c
> > > @@ -9,6 +9,7 @@
> > >  #include <linux/idr.h>
> > >  #include <cxlmem.h>
> > >  #include <cxlpci.h>
> > > +#include <region.h>
> > >  #include <cxl.h>
> > >  #include "core.h"
> > >
> > > @@ -213,6 +214,8 @@ static struct attribute_group cxl_decoder_base_attribute_group = {
> > >  };
> > >
> > >  static struct attribute *cxl_decoder_root_attrs[] = {
> > > +       &dev_attr_create_region.attr,
> > > +       &dev_attr_delete_region.attr,
> > >         &dev_attr_cap_pmem.attr,
> > >         &dev_attr_cap_ram.attr,
> > >         &dev_attr_cap_type2.attr,
> > > @@ -270,6 +273,8 @@ static void cxl_decoder_release(struct device *dev)
> > >         struct cxl_decoder *cxld = to_cxl_decoder(dev);
> > >         struct cxl_port *port = to_cxl_port(dev->parent);
> > >
> > > +       ida_free(&cxld->region_ida, cxld->next_region_id);
> > > +       ida_destroy(&cxld->region_ida);
> > >         ida_free(&port->decoder_ida, cxld->id);
> > >         kfree(cxld);
> > >  }
> > > @@ -1244,6 +1249,13 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
> > >         cxld->target_type = CXL_DECODER_EXPANDER;
> > >         cxld->platform_res = (struct resource)DEFINE_RES_MEM(0, 0);
> > >
> > > +       mutex_init(&cxld->id_lock);
> > > +       ida_init(&cxld->region_ida);
> > > +       rc = ida_alloc(&cxld->region_ida, GFP_KERNEL);
> > > +       if (rc < 0)
> > > +               goto err;
> > > +
> > > +       cxld->next_region_id = rc;
> > >         return cxld;
> > >  err:
> > >         kfree(cxld);
> > > @@ -1502,6 +1514,12 @@ bool schedule_cxl_memdev_detach(struct cxl_memdev *cxlmd)
> > >  }
> > >  EXPORT_SYMBOL_NS_GPL(schedule_cxl_memdev_detach, CXL);
> > >
> > > +bool schedule_cxl_region_unregister(struct cxl_region *cxlr)
> > > +{
> > > +       return queue_work(cxl_bus_wq, &cxlr->unregister_work);
> > > +}
> > > +EXPORT_SYMBOL_NS_GPL(schedule_cxl_region_unregister, CXL);
> > > +
> > >  /* for user tooling to ensure port disable work has completed */
> > >  static ssize_t flush_store(struct bus_type *bus, const char *buf, size_t count)
> > >  {
> > > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > > new file mode 100644
> > > index 000000000000..a934938f8630
> > > --- /dev/null
> > > +++ b/drivers/cxl/core/region.c
> > > @@ -0,0 +1,223 @@
> > > +// SPDX-License-Identifier: GPL-2.0-only
> > > +/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
> > > +#include <linux/device.h>
> > > +#include <linux/module.h>
> > > +#include <linux/slab.h>
> > > +#include <linux/idr.h>
> > > +#include <region.h>
> > > +#include <cxl.h>
> > > +#include "core.h"
> > > +
> > > +/**
> > > + * DOC: cxl core region
> > > + *
> > > + * CXL Regions represent mapped memory capacity in system physical address
> > > + * space. Whereas the CXL Root Decoders identify the bounds of potential CXL
> > > + * Memory ranges, Regions represent the active mapped capacity by the HDM
> > > + * Decoder Capability structures throughout the Host Bridges, Switches, and
> > > + * Endpoints in the topology.
> > > + */
> > > +
> > > +
> > > +static struct cxl_region *to_cxl_region(struct device *dev);
> > 
> > cxl_lock_class() will want to know the lock_class for the cxl_region
> > device lock, so this can go straight to be an extern declaration
> > alongside is_cxl_decoder().
> > 
> 
> Yeah, this is stale already. I took what you have in the preview branch which
> sets it up right before device_add().
> 
> > > +
> > > +static void cxl_region_release(struct device *dev)
> > > +{
> > > +       struct cxl_decoder *cxld = to_cxl_decoder(dev->parent);
> > > +       struct cxl_region *cxlr = to_cxl_region(dev);
> > > +
> > > +       dev_dbg(&cxld->dev, "Releasing %s\n", dev_name(dev));
> > > +       ida_free(&cxld->region_ida, cxlr->id);
> > > +       kfree(cxlr);
> > > +       put_device(&cxld->dev);
> > > +}
> > > +
> > > +static const struct device_type cxl_region_type = {
> > > +       .name = "cxl_region",
> > > +       .release = cxl_region_release,
> > > +};
> > > +
> > > +static struct cxl_region *to_cxl_region(struct device *dev)
> > > +{
> > > +       if (dev_WARN_ONCE(dev, dev->type != &cxl_region_type,
> > > +                         "not a cxl_region device\n"))
> > > +               return NULL;
> > > +
> > > +       return container_of(dev, struct cxl_region, dev);
> > > +}
> > > +
> > > +static void unregister_region(struct work_struct *work)
> > > +{
> > > +       struct cxl_region *cxlr;
> > > +
> > > +       cxlr = container_of(work, typeof(*cxlr), unregister_work);
> > > +       device_unregister(&cxlr->dev);
> > > +}
> > > +
> > > +static void schedule_unregister(void *cxlr)
> > > +{
> > > +       schedule_cxl_region_unregister(cxlr);
> > > +}
> > > +
> > > +static struct cxl_region *cxl_region_alloc(struct cxl_decoder *cxld)
> > > +{
> > > +       struct cxl_region *cxlr;
> > > +       struct device *dev;
> > > +       int rc;
> > > +
> > > +       lockdep_assert_held(&cxld->id_lock);
> > > +
> > > +       rc = ida_alloc(&cxld->region_ida, GFP_KERNEL);
> > > +       if (rc < 0) {
> > > +               dev_dbg(dev, "Failed to get next cached id (%d)\n", rc);
> > > +               return ERR_PTR(rc);
> > > +       }
> > > +
> > > +       cxlr = kzalloc(sizeof(*cxlr), GFP_KERNEL);
> > > +       if (!cxlr) {
> > > +               ida_free(&cxld->region_ida, rc);
> > > +               return ERR_PTR(-ENOMEM);
> > > +       }
> > > +
> > > +       cxld->next_region_id = rc;
> > > +
> > > +       dev = &cxlr->dev;
> > > +       device_initialize(dev);
> > > +       dev->parent = &cxld->dev;
> > > +       device_set_pm_not_required(dev);
> > > +       dev->bus = &cxl_bus_type;
> > > +       dev->type = &cxl_region_type;
> > > +       INIT_WORK(&cxlr->unregister_work, unregister_region);
> > 
> > Perhaps name it "detach_work" to match the same for memdevs, or second
> > choice, go rename the memdev one to "unregister_work". Keep the naming
> > consistent for data structures that fill the same role.
> > 
> 
> Okay. Do you want me to rename schedule_cxl_region_unregister() also? I assume
> yes.
> 
> > > +
> > > +       return cxlr;
> > > +}
> > > +
> > > +/**
> > > + * devm_cxl_add_region - Adds a region to a decoder
> > > + * @cxld: Parent decoder.
> > > + * @cxlr: Region to be added to the decoder.
> > > + *
> > > + * This is the second step of region initialization. Regions exist within an
> > > + * address space which is mapped by a @cxld. That @cxld must be a root decoder,
> > > + * and it enforces constraints upon the region as it is configured.
> > > + *
> > > + * Return: 0 if the region was added to the @cxld, else returns negative error
> > > + * code. The region will be named "regionX.Y.Z" where X is the port, Y is the
> > > + * decoder id, and Z is the region number.
> > > + */
> > > +static struct cxl_region *devm_cxl_add_region(struct cxl_decoder *cxld)
> > > +{
> > > +       struct cxl_port *port = to_cxl_port(cxld->dev.parent);
> > > +       struct cxl_region *cxlr;
> > > +       struct device *dev;
> > > +       int rc;
> > > +
> > > +       cxlr = cxl_region_alloc(cxld);
> > > +       if (IS_ERR(cxlr))
> > > +               return cxlr;
> > > +
> > > +       dev = &cxlr->dev;
> > > +
> > > +       cxlr->id = cxld->next_region_id;
> > > +       rc = dev_set_name(dev, "region%d.%d:%d", port->id, cxld->id, cxlr->id);
> > > +       if (rc)
> > > +               goto err_out;
> > > +
> > > +       /* affirm that release will have access to the decoder's region ida  */
> > 
> > s/affirm that release will/arrange for cxl_region_release to/
> > 
> > > +       get_device(&cxld->dev);
> > > +
> > > +       rc = device_add(dev);
> > > +       if (rc)
> > > +               goto err_put;
> > > +
> > > +       rc = devm_add_action_or_reset(port->uport, schedule_unregister, cxlr);
> > > +       if (rc)
> > > +               goto err_put;
> > > +
> > > +       return cxlr;
> > > +
> > > +err_put:
> > > +       put_device(&cxld->dev);
> > > +
> > > +err_out:
> > > +       put_device(dev);
> > > +       return ERR_PTR(rc);
> > > +}
> > > +
> > > +static ssize_t create_region_show(struct device *dev,
> > > +                                 struct device_attribute *attr, char *buf)
> > > +{
> > > +       struct cxl_port *port = to_cxl_port(dev->parent);
> > > +       struct cxl_decoder *cxld = to_cxl_decoder(dev);
> > > +
> > > +       return sysfs_emit(buf, "region%d.%d:%d\n", port->id, cxld->id,
> > > +                         cxld->next_region_id);
> > 
> > To cut down on userspace racing itself this should acquire the id_lock
> > to synchronize against the store side. i.e. no point in returning
> > known bad answers when the lock is held on the store side, even though
> > the answer given here may be immediately invalidated as soon as the
> > lock is dropped it's still useful to throttle readers in the presence
> > of writers.
> >  
> > > +}
> > > +
> > > +static ssize_t create_region_store(struct device *dev,
> > > +                                  struct device_attribute *attr,
> > > +                                  const char *buf, size_t len)
> > > +{
> > > +       struct cxl_port *port = to_cxl_port(dev->parent);
> > > +       struct cxl_decoder *cxld = to_cxl_decoder(dev);
> > > +       struct cxl_region *cxlr;
> > > +       int d, p, r, rc = 0;
> > > +
> > > +       if (sscanf(buf, "region%d.%d:%d", &p, &d, &r) != 3)
> > > +               return -EINVAL;
> > > +
> > > +       if (port->id != p || cxld->id != d)
> > > +               return -EINVAL;
> > > +
> > > +       rc = mutex_lock_interruptible(&cxld->id_lock);
> > > +       if (rc)
> > > +               return rc;
> > > +
> > > +       if (cxld->next_region_id != r) {
> > > +               rc = -EINVAL;
> > > +               goto out;
> > > +       }
> > > +
> > > +       cxlr = devm_cxl_add_region(cxld);
> > > +       rc = 0;
> > > +       dev_dbg(dev, "Created %s\n", dev_name(&cxlr->dev));
> > > +
> > > +out:
> > > +       mutex_unlock(&cxld->id_lock);
> > > +       if (rc)
> > > +               return rc;
> > > +       return len;
> > > +}
> > > +DEVICE_ATTR_RW(create_region);
> > > +
> > > +static struct cxl_region *cxl_find_region_by_name(struct cxl_decoder *cxld,
> > > +                                                 const char *name)
> > > +{
> > > +       struct device *region_dev;
> > > +
> > > +       region_dev = device_find_child_by_name(&cxld->dev, name);
> > > +       if (!region_dev)
> > > +               return ERR_PTR(-ENOENT);
> > > +
> > > +       return to_cxl_region(region_dev);
> > > +}
> > > +
> > > +static ssize_t delete_region_store(struct device *dev,
> > > +                                  struct device_attribute *attr,
> > > +                                  const char *buf, size_t len)
> > > +{
> > > +       struct cxl_port *port = to_cxl_port(dev->parent);
> > > +       struct cxl_decoder *cxld = to_cxl_decoder(dev);
> > > +       struct cxl_region *cxlr;
> > > +
> > > +       cxlr = cxl_find_region_by_name(cxld, buf);
> > > +       if (IS_ERR(cxlr))
> > > +               return PTR_ERR(cxlr);
> > > +
> > > +       if (!test_and_set_bit(REGION_DEAD, &cxlr->flags))
> > > +               devm_release_action(port->uport, schedule_unregister, cxlr);
> > 
> > Where is the synchronization against port ->remove()? That could have
> > started before this sysfs file was deleted and trigger double
> > device_unregister().
> 
> Maybe I'm missing something obvious, but I don't see a way to do this without
> adding another lock. I need the root port's device lock for this, but when the
> port goes away, so does that device, right?
> 
> > Also while any workqueue thread may want to access the region a reference
> > needs to be held otherwise you can get a use after free. I expect that this
> > should unconditionally schedule the unregister work, then in the workqueue
> > check that only one invocation actually performs the unregistration.
> 
> Okay.

If we have a lock in place that's fine. The current problem this solves is the
WARN on the delete/remove race when both try to unregister.

> 
> > 
> > Given that the work is only related to unregistration this can fail
> > requests to delete something that has already been deleted. If
> > userspace sees that error and wants to synchronize it can use the
> > bus/flush attribute for that. I.e.
> > 
> > if (work_pending(&cxlr->detach_work))
> >    return -EBUSY;
> > 
> 
> I'm not following, you mean to put this in flush_store?
> 
> > > +       put_device(&cxlr->dev);
> > > +
> > > +       return len;
> > > +}
> > > +DEVICE_ATTR_WO(delete_region);
> > > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > > index b4047a310340..d5397f7dfcf4 100644
> > > --- a/drivers/cxl/cxl.h
> > > +++ b/drivers/cxl/cxl.h
> > > @@ -221,6 +221,8 @@ enum cxl_decoder_type {
> > >   * @target_type: accelerator vs expander (type2 vs type3) selector
> > >   * @flags: memory type capabilities and locking
> > >   * @target_lock: coordinate coherent reads of the target list
> > > + * @region_ida: allocator for region ids.
> > > + * @next_region_id: Cached region id for next region.
> > >   * @nr_targets: number of elements in @target
> > >   * @target: active ordered target list in current decoder configuration
> > >   */
> > > @@ -236,6 +238,9 @@ struct cxl_decoder {
> > >         enum cxl_decoder_type target_type;
> > >         unsigned long flags;
> > >         seqlock_t target_lock;
> > > +       struct mutex id_lock;
> > > +       struct ida region_ida;
> > > +       int next_region_id;
> > >         int nr_targets;
> > >         struct cxl_dport *target[];
> > >  };
> > > diff --git a/drivers/cxl/region.h b/drivers/cxl/region.h
> > > new file mode 100644
> > > index 000000000000..7025f6785f83
> > > --- /dev/null
> > > +++ b/drivers/cxl/region.h
> > > @@ -0,0 +1,28 @@
> > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > +/* Copyright(c) 2021 Intel Corporation. */
> > > +#ifndef __CXL_REGION_H__
> > > +#define __CXL_REGION_H__
> > > +
> > > +#include <linux/uuid.h>
> > > +
> > > +#include "cxl.h"
> > > +
> > > +/**
> > > + * struct cxl_region - CXL region
> > > + * @dev: This region's device.
> > > + * @id: This region's id. Id is globally unique across all regions.
> > > + * @flags: Flags representing the current state of the region.
> > > + * @unregister_work: Async unregister to allow attrs to take device_lock.
> > > + */
> > > +struct cxl_region {
> > > +       struct device dev;
> > > +       int id;
> > > +       unsigned long flags;
> > > +#define REGION_DEAD 0
> > > +       struct work_struct unregister_work;
> > > +
> > > +};
> > > +
> > > +bool schedule_cxl_region_unregister(struct cxl_region *cxlr);
> > > +
> > > +#endif
> > > diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
> > > index 82e49ab0937d..3fe6d34e6d59 100644
> > > --- a/tools/testing/cxl/Kbuild
> > > +++ b/tools/testing/cxl/Kbuild
> > > @@ -46,6 +46,7 @@ cxl_core-y += $(CXL_CORE_SRC)/memdev.o
> > >  cxl_core-y += $(CXL_CORE_SRC)/mbox.o
> > >  cxl_core-y += $(CXL_CORE_SRC)/pci.o
> > >  cxl_core-y += $(CXL_CORE_SRC)/hdm.o
> > > +cxl_core-y += $(CXL_CORE_SRC)/region.o
> > >  cxl_core-y += config_check.o
> > >
> > >  obj-m += test/
> > > --
> > > 2.35.1
> > >

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 1/2] cxl/region: Add region creation ABI
  2022-03-01 21:22     ` Ben Widawsky
  2022-03-01 21:36       ` Ben Widawsky
@ 2022-03-01 21:49       ` Dan Williams
  1 sibling, 0 replies; 10+ messages in thread
From: Dan Williams @ 2022-03-01 21:49 UTC (permalink / raw)
  To: Ben Widawsky
  Cc: linux-cxl, patches, Alison Schofield, Ira Weiny,
	Jonathan Cameron, Vishal Verma

On Tue, Mar 1, 2022 at 1:23 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
[..]
> > > +       INIT_WORK(&cxlr->unregister_work, unregister_region);
> >
> > Perhaps name it "detach_work" to match the same for memdevs, or second
> > choice, go rename the memdev one to "unregister_work". Keep the naming
> > consistent for data structures that fill the same role.
> >
>
> Okay. Do you want me to rename schedule_cxl_region_unregister() also? I assume
> yes.

Of course.

>
> > > +
> > > +       return cxlr;
> > > +}
> > > +
> > > +/**
> > > + * devm_cxl_add_region - Adds a region to a decoder
> > > + * @cxld: Parent decoder.
> > > + * @cxlr: Region to be added to the decoder.
> > > + *
> > > + * This is the second step of region initialization. Regions exist within an
> > > + * address space which is mapped by a @cxld. That @cxld must be a root decoder,
> > > + * and it enforces constraints upon the region as it is configured.
> > > + *
> > > + * Return: 0 if the region was added to the @cxld, else returns negative error
> > > + * code. The region will be named "regionX.Y.Z" where X is the port, Y is the
> > > + * decoder id, and Z is the region number.
> > > + */
> > > +static struct cxl_region *devm_cxl_add_region(struct cxl_decoder *cxld)
> > > +{
> > > +       struct cxl_port *port = to_cxl_port(cxld->dev.parent);
> > > +       struct cxl_region *cxlr;
> > > +       struct device *dev;
> > > +       int rc;
> > > +
> > > +       cxlr = cxl_region_alloc(cxld);
> > > +       if (IS_ERR(cxlr))
> > > +               return cxlr;
> > > +
> > > +       dev = &cxlr->dev;
> > > +
> > > +       cxlr->id = cxld->next_region_id;
> > > +       rc = dev_set_name(dev, "region%d.%d:%d", port->id, cxld->id, cxlr->id);
> > > +       if (rc)
> > > +               goto err_out;
> > > +
> > > +       /* affirm that release will have access to the decoder's region ida  */
> >
> > s/affirm that release will/arrange for cxl_region_release to/
> >
> > > +       get_device(&cxld->dev);
> > > +
> > > +       rc = device_add(dev);
> > > +       if (rc)
> > > +               goto err_put;
> > > +
> > > +       rc = devm_add_action_or_reset(port->uport, schedule_unregister, cxlr);
> > > +       if (rc)
> > > +               goto err_put;
> > > +
> > > +       return cxlr;
> > > +
> > > +err_put:
> > > +       put_device(&cxld->dev);
> > > +
> > > +err_out:
> > > +       put_device(dev);
> > > +       return ERR_PTR(rc);
> > > +}
> > > +
> > > +static ssize_t create_region_show(struct device *dev,
> > > +                                 struct device_attribute *attr, char *buf)
> > > +{
> > > +       struct cxl_port *port = to_cxl_port(dev->parent);
> > > +       struct cxl_decoder *cxld = to_cxl_decoder(dev);
> > > +
> > > +       return sysfs_emit(buf, "region%d.%d:%d\n", port->id, cxld->id,
> > > +                         cxld->next_region_id);
> >
> > To cut down on userspace racing itself this should acquire the id_lock
> > to synchronize against the store side. i.e. no point in returning
> > known bad answers when the lock is held on the store side, even though
> > the answer given here may be immediately invalidated as soon as the
> > lock is dropped it's still useful to throttle readers in the presence
> > of writers.
> >
> > > +}
> > > +
> > > +static ssize_t create_region_store(struct device *dev,
> > > +                                  struct device_attribute *attr,
> > > +                                  const char *buf, size_t len)
> > > +{
> > > +       struct cxl_port *port = to_cxl_port(dev->parent);
> > > +       struct cxl_decoder *cxld = to_cxl_decoder(dev);
> > > +       struct cxl_region *cxlr;
> > > +       int d, p, r, rc = 0;
> > > +
> > > +       if (sscanf(buf, "region%d.%d:%d", &p, &d, &r) != 3)
> > > +               return -EINVAL;
> > > +
> > > +       if (port->id != p || cxld->id != d)
> > > +               return -EINVAL;
> > > +
> > > +       rc = mutex_lock_interruptible(&cxld->id_lock);
> > > +       if (rc)
> > > +               return rc;
> > > +
> > > +       if (cxld->next_region_id != r) {
> > > +               rc = -EINVAL;
> > > +               goto out;
> > > +       }
> > > +
> > > +       cxlr = devm_cxl_add_region(cxld);
> > > +       rc = 0;
> > > +       dev_dbg(dev, "Created %s\n", dev_name(&cxlr->dev));
> > > +
> > > +out:
> > > +       mutex_unlock(&cxld->id_lock);
> > > +       if (rc)
> > > +               return rc;
> > > +       return len;
> > > +}
> > > +DEVICE_ATTR_RW(create_region);
> > > +
> > > +static struct cxl_region *cxl_find_region_by_name(struct cxl_decoder *cxld,
> > > +                                                 const char *name)
> > > +{
> > > +       struct device *region_dev;
> > > +
> > > +       region_dev = device_find_child_by_name(&cxld->dev, name);
> > > +       if (!region_dev)
> > > +               return ERR_PTR(-ENOENT);
> > > +
> > > +       return to_cxl_region(region_dev);
> > > +}
> > > +
> > > +static ssize_t delete_region_store(struct device *dev,
> > > +                                  struct device_attribute *attr,
> > > +                                  const char *buf, size_t len)
> > > +{
> > > +       struct cxl_port *port = to_cxl_port(dev->parent);
> > > +       struct cxl_decoder *cxld = to_cxl_decoder(dev);
> > > +       struct cxl_region *cxlr;
> > > +
> > > +       cxlr = cxl_find_region_by_name(cxld, buf);
> > > +       if (IS_ERR(cxlr))
> > > +               return PTR_ERR(cxlr);
> > > +
> > > +       if (!test_and_set_bit(REGION_DEAD, &cxlr->flags))
> > > +               devm_release_action(port->uport, schedule_unregister, cxlr);
> >
> > Where is the synchronization against port ->remove()? That could have
> > started before this sysfs file was deleted and trigger double
> > device_unregister().
>
> Maybe I'm missing something obvious, but I don't see a way to do this without
> adding another lock. I need the root port's device lock for this, but when the
> port goes away, so does that device, right?

When you say "goes away" I can not tell if you are talking about
release or unregister?

Not sure it matters for this case. In this case, all that is needed is
synchronization, not necessarily locking. Synchronization can be
achieved the single threaded workqueue by just scheduling the
unregistration from each place it needs to be scheduled from and then
use the REGION_DEAD flag to make sure that even if multiple
unregistrations are scheduled only one will succeed in calling
device_unregister().

> > Given that the work is only related to unregistration this can fail
> > requests to delete something that has already been deleted. If
> > userspace sees that error and wants to synchronize it can use the
> > bus/flush attribute for that. I.e.
> >
> > if (work_pending(&cxlr->detach_work))
> >    return -EBUSY;
> >
>
> I'm not following, you mean to put this in flush_store?

No, if 2 threads race to delete a region, one will successfully
schedule and the other will see -EBUSY. If either thread wants to know
when the deletion has actually happened it can do "echo 1 >
/sys/bus/cxl/flush".

I forgot that the explicit work_pending() check is not needed. This
can simply use the return code from queue_work() to indicate if the
deletion is now in-flight.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-03-01 21:49 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-25  6:00 [RFC PATCH 0/2] Region creation/configuration ABI Ben Widawsky
2022-02-25  6:00 ` [RFC PATCH 1/2] cxl/region: Add region creation ABI Ben Widawsky
2022-02-28 23:48   ` Dan Williams
2022-03-01 21:22     ` Ben Widawsky
2022-03-01 21:36       ` Ben Widawsky
2022-03-01 21:49       ` Dan Williams
2022-02-25  6:00 ` [RFC PATCH 2/2] cxl/region: Introduce concept of region configuration Ben Widawsky
2022-03-01  1:16   ` Dan Williams
2022-03-01 17:43     ` Ben Widawsky
2022-03-01 18:34       ` Dan Williams

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).