All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default
@ 2023-02-06  1:02 Dan Williams
  2023-02-06  1:02 ` [PATCH 01/18] cxl/Documentation: Update references to attributes added in v6.0 Dan Williams
                   ` (21 more replies)
  0 siblings, 22 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-06  1:02 UTC (permalink / raw)
  To: linux-cxl
  Cc: David Hildenbrand, Kees Cook, stable, Dave Hansen, Michal Hocko,
	linux-mm, linux-acpi

Summary:
--------

CXL RAM support allows for the dynamic provisioning of new CXL RAM
regions, and more routinely, assembling a region from an existing
configuration established by platform-firmware. The latter is motivated
by CXL memory RAS (Reliability, Availability and Serviceability)
support, that requires associating device events with System Physical
Address ranges and vice versa.

The 'Soft Reserved' policy rework arranges for performance
differentiated memory like CXL attached DRAM, or high-bandwidth memory,
to be designated for 'System RAM' by default, rather than the device-dax
dedicated access mode. That current device-dax default is confusing and
surprising for the Pareto of users that do not expect memory to be
quarantined for dedicated access by default. Most users expect all
'System RAM'-capable memory to show up in FREE(1).


Details:
--------

Recall that the Linux 'Soft Reserved' designation for memory is a
reaction to platform-firmware, like EFI EDK2, delineating memory with
the EFI Specific Purpose Memory attribute (EFI_MEMORY_SP). An
alternative way to think of that attribute is that it specifies the
*not* general-purpose memory pool. It is memory that may be too precious
for general usage or not performant enough for some hot data structures.
However, in the absence of explicit policy it should just be 'System
RAM' by default.

Rather than require every distribution to ship a udev policy to assign
dax devices to dax_kmem (the device-memory hotplug driver) just make
that the kernel default. This is similar to the rationale in:

commit 8604d9e534a3 ("memory_hotplug: introduce CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE")

With this change the relatively niche use case of accessing this memory
via mapping a device-dax instance can be achieved by building with
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n, or specifying
memhp_default_state=offline at boot, and then use:

    daxctl reconfigure-device $device -m devdax --force

...to shift the corresponding address range to device-dax access.

The process of assembling a device-dax instance for a given CXL region
device configuration is similar to the process of assembling a
Device-Mapper or MDRAID storage-device array. Specifically, asynchronous
probing by the PCI and driver core enumerates all CXL endpoints and
their decoders. Then, once enough decoders have arrived to a describe a
given region, that region is passed to the device-dax subsystem where it
is subject to the above 'dax_kmem' policy. This assignment and policy
choice is only possible if memory is set aside by the 'Soft Reserved'
designation. Otherwise, CXL that is mapped as 'System RAM' becomes
immutable by CXL driver mechanisms, but is still enumerated for RAS
purposes.

This series is also available via:

https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.3/cxl-ram-region

...and has gone through some preview testing in various forms.

---

Dan Williams (18):
      cxl/Documentation: Update references to attributes added in v6.0
      cxl/region: Add a mode attribute for regions
      cxl/region: Support empty uuids for non-pmem regions
      cxl/region: Validate region mode vs decoder mode
      cxl/region: Add volatile region creation support
      cxl/region: Refactor attach_target() for autodiscovery
      cxl/region: Move region-position validation to a helper
      kernel/range: Uplevel the cxl subsystem's range_contains() helper
      cxl/region: Enable CONFIG_CXL_REGION to be toggled
      cxl/region: Fix passthrough-decoder detection
      cxl/region: Add region autodiscovery
      tools/testing/cxl: Define a fixed volatile configuration to parse
      dax/hmem: Move HMAT and Soft reservation probe initcall level
      dax/hmem: Drop unnecessary dax_hmem_remove()
      dax/hmem: Convey the dax range via memregion_info()
      dax/hmem: Move hmem device registration to dax_hmem.ko
      dax: Assign RAM regions to memory-hotplug by default
      cxl/dax: Create dax devices for CXL RAM regions


 Documentation/ABI/testing/sysfs-bus-cxl |   64 +-
 MAINTAINERS                             |    1 
 drivers/acpi/numa/hmat.c                |    4 
 drivers/cxl/Kconfig                     |   12 
 drivers/cxl/acpi.c                      |    3 
 drivers/cxl/core/core.h                 |    7 
 drivers/cxl/core/hdm.c                  |    8 
 drivers/cxl/core/pci.c                  |    5 
 drivers/cxl/core/port.c                 |   34 +
 drivers/cxl/core/region.c               |  848 ++++++++++++++++++++++++++++---
 drivers/cxl/cxl.h                       |   46 ++
 drivers/cxl/cxlmem.h                    |    3 
 drivers/cxl/port.c                      |   26 +
 drivers/dax/Kconfig                     |   17 +
 drivers/dax/Makefile                    |    2 
 drivers/dax/bus.c                       |   53 +-
 drivers/dax/bus.h                       |   12 
 drivers/dax/cxl.c                       |   53 ++
 drivers/dax/device.c                    |    3 
 drivers/dax/hmem/Makefile               |    3 
 drivers/dax/hmem/device.c               |  102 ++--
 drivers/dax/hmem/hmem.c                 |  148 +++++
 drivers/dax/kmem.c                      |    1 
 include/linux/dax.h                     |    7 
 include/linux/memregion.h               |    2 
 include/linux/range.h                   |    5 
 lib/stackinit_kunit.c                   |    6 
 tools/testing/cxl/test/cxl.c            |  146 +++++
 28 files changed, 1355 insertions(+), 266 deletions(-)
 create mode 100644 drivers/dax/cxl.c

base-commit: 172738bbccdb4ef76bdd72fc72a315c741c39161

^ permalink raw reply	[flat|nested] 112+ messages in thread

* [PATCH 01/18] cxl/Documentation: Update references to attributes added in v6.0
  2023-02-06  1:02 [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default Dan Williams
@ 2023-02-06  1:02 ` Dan Williams
  2023-02-06 15:17   ` Jonathan Cameron
                     ` (5 more replies)
  2023-02-06  1:02 ` [PATCH 02/18] cxl/region: Add a mode attribute for regions Dan Williams
                   ` (20 subsequent siblings)
  21 siblings, 6 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-06  1:02 UTC (permalink / raw)
  To: linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi

Prior to Linus deciding that the kernel that following v5.19 would be
v6.0, the CXL ABI documentation already referenced v5.20. In preparation
for updating these entries update the kernel version to v6.0.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Documentation/ABI/testing/sysfs-bus-cxl |   30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 329a7e46c805..5be032313e29 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -198,7 +198,7 @@ Description:
 
 What:		/sys/bus/cxl/devices/endpointX/CDAT
 Date:		July, 2022
-KernelVersion:	v5.20
+KernelVersion:	v6.0
 Contact:	linux-cxl@vger.kernel.org
 Description:
 		(RO) If this sysfs entry is not present no DOE mailbox was
@@ -209,7 +209,7 @@ Description:
 
 What:		/sys/bus/cxl/devices/decoderX.Y/mode
 Date:		May, 2022
-KernelVersion:	v5.20
+KernelVersion:	v6.0
 Contact:	linux-cxl@vger.kernel.org
 Description:
 		(RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it
@@ -229,7 +229,7 @@ Description:
 
 What:		/sys/bus/cxl/devices/decoderX.Y/dpa_resource
 Date:		May, 2022
-KernelVersion:	v5.20
+KernelVersion:	v6.0
 Contact:	linux-cxl@vger.kernel.org
 Description:
 		(RO) When a CXL decoder is of devtype "cxl_decoder_endpoint",
@@ -240,7 +240,7 @@ Description:
 
 What:		/sys/bus/cxl/devices/decoderX.Y/dpa_size
 Date:		May, 2022
-KernelVersion:	v5.20
+KernelVersion:	v6.0
 Contact:	linux-cxl@vger.kernel.org
 Description:
 		(RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it
@@ -260,7 +260,7 @@ Description:
 
 What:		/sys/bus/cxl/devices/decoderX.Y/interleave_ways
 Date:		May, 2022
-KernelVersion:	v5.20
+KernelVersion:	v6.0
 Contact:	linux-cxl@vger.kernel.org
 Description:
 		(RO) The number of targets across which this decoder's host
@@ -275,7 +275,7 @@ Description:
 
 What:		/sys/bus/cxl/devices/decoderX.Y/interleave_granularity
 Date:		May, 2022
-KernelVersion:	v5.20
+KernelVersion:	v6.0
 Contact:	linux-cxl@vger.kernel.org
 Description:
 		(RO) The number of consecutive bytes of host physical address
@@ -287,7 +287,7 @@ Description:
 
 What:		/sys/bus/cxl/devices/decoderX.Y/create_pmem_region
 Date:		May, 2022
-KernelVersion:	v5.20
+KernelVersion:	v6.0
 Contact:	linux-cxl@vger.kernel.org
 Description:
 		(RW) Write a string in the form 'regionZ' to start the process
@@ -303,7 +303,7 @@ Description:
 
 What:		/sys/bus/cxl/devices/decoderX.Y/delete_region
 Date:		May, 2022
-KernelVersion:	v5.20
+KernelVersion:	v6.0
 Contact:	linux-cxl@vger.kernel.org
 Description:
 		(WO) Write a string in the form 'regionZ' to delete that region,
@@ -312,7 +312,7 @@ Description:
 
 What:		/sys/bus/cxl/devices/regionZ/uuid
 Date:		May, 2022
-KernelVersion:	v5.20
+KernelVersion:	v6.0
 Contact:	linux-cxl@vger.kernel.org
 Description:
 		(RW) Write a unique identifier for the region. This field must
@@ -322,7 +322,7 @@ Description:
 
 What:		/sys/bus/cxl/devices/regionZ/interleave_granularity
 Date:		May, 2022
-KernelVersion:	v5.20
+KernelVersion:	v6.0
 Contact:	linux-cxl@vger.kernel.org
 Description:
 		(RW) Set the number of consecutive bytes each device in the
@@ -333,7 +333,7 @@ Description:
 
 What:		/sys/bus/cxl/devices/regionZ/interleave_ways
 Date:		May, 2022
-KernelVersion:	v5.20
+KernelVersion:	v6.0
 Contact:	linux-cxl@vger.kernel.org
 Description:
 		(RW) Configures the number of devices participating in the
@@ -343,7 +343,7 @@ Description:
 
 What:		/sys/bus/cxl/devices/regionZ/size
 Date:		May, 2022
-KernelVersion:	v5.20
+KernelVersion:	v6.0
 Contact:	linux-cxl@vger.kernel.org
 Description:
 		(RW) System physical address space to be consumed by the region.
@@ -360,7 +360,7 @@ Description:
 
 What:		/sys/bus/cxl/devices/regionZ/resource
 Date:		May, 2022
-KernelVersion:	v5.20
+KernelVersion:	v6.0
 Contact:	linux-cxl@vger.kernel.org
 Description:
 		(RO) A region is a contiguous partition of a CXL root decoder
@@ -372,7 +372,7 @@ Description:
 
 What:		/sys/bus/cxl/devices/regionZ/target[0..N]
 Date:		May, 2022
-KernelVersion:	v5.20
+KernelVersion:	v6.0
 Contact:	linux-cxl@vger.kernel.org
 Description:
 		(RW) Write an endpoint decoder object name to 'targetX' where X
@@ -391,7 +391,7 @@ Description:
 
 What:		/sys/bus/cxl/devices/regionZ/commit
 Date:		May, 2022
-KernelVersion:	v5.20
+KernelVersion:	v6.0
 Contact:	linux-cxl@vger.kernel.org
 Description:
 		(RW) Write a boolean 'true' string value to this attribute to


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH 02/18] cxl/region: Add a mode attribute for regions
  2023-02-06  1:02 [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default Dan Williams
  2023-02-06  1:02 ` [PATCH 01/18] cxl/Documentation: Update references to attributes added in v6.0 Dan Williams
@ 2023-02-06  1:02 ` Dan Williams
  2023-02-06 15:46   ` Jonathan Cameron
                     ` (4 more replies)
  2023-02-06  1:02 ` [PATCH 03/18] cxl/region: Support empty uuids for non-pmem regions Dan Williams
                   ` (19 subsequent siblings)
  21 siblings, 5 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-06  1:02 UTC (permalink / raw)
  To: linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi

In preparation for a new region type, "ram" regions, add a mode
attribute to clarify the mode of the decoders that can be added to a
region. Share the internals of mode_show() (for decoders) with the
region case.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Documentation/ABI/testing/sysfs-bus-cxl |   11 +++++++++++
 drivers/cxl/core/port.c                 |   12 +-----------
 drivers/cxl/core/region.c               |   10 ++++++++++
 drivers/cxl/cxl.h                       |   14 ++++++++++++++
 4 files changed, 36 insertions(+), 11 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 5be032313e29..058b0c45001f 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -358,6 +358,17 @@ Description:
 		results in the same address being allocated.
 
 
+What:		/sys/bus/cxl/devices/regionZ/mode
+Date:		January, 2023
+KernelVersion:	v6.3
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(RO) The mode of a region is established at region creation time
+		and dictates the mode of the endpoint decoder that comprise the
+		region. For more details on the possible modes see
+		/sys/bus/cxl/devices/decoderX.Y/mode
+
+
 What:		/sys/bus/cxl/devices/regionZ/resource
 Date:		May, 2022
 KernelVersion:	v6.0
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 410c036c09fa..8566451cb22f 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -180,17 +180,7 @@ static ssize_t mode_show(struct device *dev, struct device_attribute *attr,
 {
 	struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
 
-	switch (cxled->mode) {
-	case CXL_DECODER_RAM:
-		return sysfs_emit(buf, "ram\n");
-	case CXL_DECODER_PMEM:
-		return sysfs_emit(buf, "pmem\n");
-	case CXL_DECODER_NONE:
-		return sysfs_emit(buf, "none\n");
-	case CXL_DECODER_MIXED:
-	default:
-		return sysfs_emit(buf, "mixed\n");
-	}
+	return sysfs_emit(buf, "%s\n", cxl_decoder_mode_name(cxled->mode));
 }
 
 static ssize_t mode_store(struct device *dev, struct device_attribute *attr,
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 60828d01972a..17d2d0c12725 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -458,6 +458,15 @@ static ssize_t resource_show(struct device *dev, struct device_attribute *attr,
 }
 static DEVICE_ATTR_RO(resource);
 
+static ssize_t mode_show(struct device *dev, struct device_attribute *attr,
+			 char *buf)
+{
+	struct cxl_region *cxlr = to_cxl_region(dev);
+
+	return sysfs_emit(buf, "%s\n", cxl_decoder_mode_name(cxlr->mode));
+}
+static DEVICE_ATTR_RO(mode);
+
 static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size)
 {
 	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
@@ -585,6 +594,7 @@ static struct attribute *cxl_region_attrs[] = {
 	&dev_attr_interleave_granularity.attr,
 	&dev_attr_resource.attr,
 	&dev_attr_size.attr,
+	&dev_attr_mode.attr,
 	NULL,
 };
 
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index aa3af3bb73b2..ca76879af1de 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -320,6 +320,20 @@ enum cxl_decoder_mode {
 	CXL_DECODER_DEAD,
 };
 
+static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode)
+{
+	static const char * const names[] = {
+		[CXL_DECODER_NONE] = "none",
+		[CXL_DECODER_RAM] = "ram",
+		[CXL_DECODER_PMEM] = "pmem",
+		[CXL_DECODER_MIXED] = "mixed",
+	};
+
+	if (mode >= CXL_DECODER_NONE && mode <= CXL_DECODER_MIXED)
+		return names[mode];
+	return "mixed";
+}
+
 /**
  * struct cxl_endpoint_decoder - Endpoint  / SPA to DPA decoder
  * @cxld: base cxl_decoder_object


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH 03/18] cxl/region: Support empty uuids for non-pmem regions
  2023-02-06  1:02 [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default Dan Williams
  2023-02-06  1:02 ` [PATCH 01/18] cxl/Documentation: Update references to attributes added in v6.0 Dan Williams
  2023-02-06  1:02 ` [PATCH 02/18] cxl/region: Add a mode attribute for regions Dan Williams
@ 2023-02-06  1:02 ` Dan Williams
  2023-02-06 15:54   ` Jonathan Cameron
                     ` (2 more replies)
  2023-02-06  1:02 ` [PATCH 04/18] cxl/region: Validate region mode vs decoder mode Dan Williams
                   ` (18 subsequent siblings)
  21 siblings, 3 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-06  1:02 UTC (permalink / raw)
  To: linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi

Shipping versions of the cxl-cli utility expect all regions to have a
'uuid' attribute. In preparation for 'ram' regions, update the 'uuid'
attribute to return an empty string which satisfies the current
expectations of 'cxl list -R'. Otherwise, 'cxl list -R' fails in the
presence of regions with the 'uuid' attribute missing. Force the
attribute to be read-only as there is no facility or expectation for a
'ram' region to recall its uuid from one boot to the next.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Documentation/ABI/testing/sysfs-bus-cxl |    3 ++-
 drivers/cxl/core/region.c               |    7 +++++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 058b0c45001f..4c4e1cbb1169 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -317,7 +317,8 @@ Contact:	linux-cxl@vger.kernel.org
 Description:
 		(RW) Write a unique identifier for the region. This field must
 		be set for persistent regions and it must not conflict with the
-		UUID of another region.
+		UUID of another region. For volatile ram regions this
+		attribute is a read-only empty string.
 
 
 What:		/sys/bus/cxl/devices/regionZ/interleave_granularity
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 17d2d0c12725..c9e7f05caa0f 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -45,7 +45,10 @@ static ssize_t uuid_show(struct device *dev, struct device_attribute *attr,
 	rc = down_read_interruptible(&cxl_region_rwsem);
 	if (rc)
 		return rc;
-	rc = sysfs_emit(buf, "%pUb\n", &p->uuid);
+	if (cxlr->mode != CXL_DECODER_PMEM)
+		rc = sysfs_emit(buf, "\n");
+	else
+		rc = sysfs_emit(buf, "%pUb\n", &p->uuid);
 	up_read(&cxl_region_rwsem);
 
 	return rc;
@@ -301,7 +304,7 @@ static umode_t cxl_region_visible(struct kobject *kobj, struct attribute *a,
 	struct cxl_region *cxlr = to_cxl_region(dev);
 
 	if (a == &dev_attr_uuid.attr && cxlr->mode != CXL_DECODER_PMEM)
-		return 0;
+		return 0444;
 	return a->mode;
 }
 


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH 04/18] cxl/region: Validate region mode vs decoder mode
  2023-02-06  1:02 [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default Dan Williams
                   ` (2 preceding siblings ...)
  2023-02-06  1:02 ` [PATCH 03/18] cxl/region: Support empty uuids for non-pmem regions Dan Williams
@ 2023-02-06  1:02 ` Dan Williams
  2023-02-06 16:02   ` Jonathan Cameron
                     ` (4 more replies)
  2023-02-06  1:02 ` [PATCH 05/18] cxl/region: Add volatile region creation support Dan Williams
                   ` (17 subsequent siblings)
  21 siblings, 5 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-06  1:02 UTC (permalink / raw)
  To: linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi

In preparation for a new region mode, do not, for example, allow
'ram' decoders to be assigned to 'pmem' regions and vice versa.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/region.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index c9e7f05caa0f..53d6dbe4de6d 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1217,6 +1217,12 @@ static int cxl_region_attach(struct cxl_region *cxlr,
 	struct cxl_dport *dport;
 	int i, rc = -ENXIO;
 
+	if (cxled->mode != cxlr->mode) {
+		dev_dbg(&cxlr->dev, "%s region mode: %d mismatch: %d\n",
+			dev_name(&cxled->cxld.dev), cxlr->mode, cxled->mode);
+		return -EINVAL;
+	}
+
 	if (cxled->mode == CXL_DECODER_DEAD) {
 		dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev));
 		return -ENODEV;


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH 05/18] cxl/region: Add volatile region creation support
  2023-02-06  1:02 [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default Dan Williams
                   ` (3 preceding siblings ...)
  2023-02-06  1:02 ` [PATCH 04/18] cxl/region: Validate region mode vs decoder mode Dan Williams
@ 2023-02-06  1:02 ` Dan Williams
  2023-02-06 16:18   ` Jonathan Cameron
                     ` (4 more replies)
  2023-02-06  1:03 ` [PATCH 06/18] cxl/region: Refactor attach_target() for autodiscovery Dan Williams
                   ` (16 subsequent siblings)
  21 siblings, 5 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-06  1:02 UTC (permalink / raw)
  To: linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi

Expand the region creation infrastructure to enable 'ram'
(volatile-memory) regions. The internals of create_pmem_region_store()
and create_pmem_region_show() are factored out into helpers
__create_region() and __create_region_show() for the 'ram' case to
reuse.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Documentation/ABI/testing/sysfs-bus-cxl |   22 +++++-----
 drivers/cxl/core/core.h                 |    1 
 drivers/cxl/core/port.c                 |   14 ++++++
 drivers/cxl/core/region.c               |   71 +++++++++++++++++++++++++------
 4 files changed, 82 insertions(+), 26 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 4c4e1cbb1169..3acf2f17a73f 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -285,20 +285,20 @@ Description:
 		interleave_granularity).
 
 
-What:		/sys/bus/cxl/devices/decoderX.Y/create_pmem_region
-Date:		May, 2022
-KernelVersion:	v6.0
+What:		/sys/bus/cxl/devices/decoderX.Y/create_{pmem,ram}_region
+Date:		May, 2022, January, 2023
+KernelVersion:	v6.0 (pmem), v6.3 (ram)
 Contact:	linux-cxl@vger.kernel.org
 Description:
 		(RW) Write a string in the form 'regionZ' to start the process
-		of defining a new persistent memory region (interleave-set)
-		within the decode range bounded by root decoder 'decoderX.Y'.
-		The value written must match the current value returned from
-		reading this attribute. An atomic compare exchange operation is
-		done on write to assign the requested id to a region and
-		allocate the region-id for the next creation attempt. EBUSY is
-		returned if the region name written does not match the current
-		cached value.
+		of defining a new persistent, or volatile memory region
+		(interleave-set) within the decode range bounded by root decoder
+		'decoderX.Y'. The value written must match the current value
+		returned from reading this attribute. An atomic compare exchange
+		operation is done on write to assign the requested id to a
+		region and allocate the region-id for the next creation attempt.
+		EBUSY is returned if the region name written does not match the
+		current cached value.
 
 
 What:		/sys/bus/cxl/devices/decoderX.Y/delete_region
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 8c04672dca56..5eb873da5a30 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -11,6 +11,7 @@ extern struct attribute_group cxl_base_attribute_group;
 
 #ifdef CONFIG_CXL_REGION
 extern struct device_attribute dev_attr_create_pmem_region;
+extern struct device_attribute dev_attr_create_ram_region;
 extern struct device_attribute dev_attr_delete_region;
 extern struct device_attribute dev_attr_region;
 extern const struct device_type cxl_pmem_region_type;
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 8566451cb22f..47e450c3a5a9 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -294,6 +294,7 @@ static struct attribute *cxl_decoder_root_attrs[] = {
 	&dev_attr_cap_type3.attr,
 	&dev_attr_target_list.attr,
 	SET_CXL_REGION_ATTR(create_pmem_region)
+	SET_CXL_REGION_ATTR(create_ram_region)
 	SET_CXL_REGION_ATTR(delete_region)
 	NULL,
 };
@@ -305,6 +306,13 @@ static bool can_create_pmem(struct cxl_root_decoder *cxlrd)
 	return (cxlrd->cxlsd.cxld.flags & flags) == flags;
 }
 
+static bool can_create_ram(struct cxl_root_decoder *cxlrd)
+{
+	unsigned long flags = CXL_DECODER_F_TYPE3 | CXL_DECODER_F_RAM;
+
+	return (cxlrd->cxlsd.cxld.flags & flags) == flags;
+}
+
 static umode_t cxl_root_decoder_visible(struct kobject *kobj, struct attribute *a, int n)
 {
 	struct device *dev = kobj_to_dev(kobj);
@@ -313,7 +321,11 @@ static umode_t cxl_root_decoder_visible(struct kobject *kobj, struct attribute *
 	if (a == CXL_REGION_ATTR(create_pmem_region) && !can_create_pmem(cxlrd))
 		return 0;
 
-	if (a == CXL_REGION_ATTR(delete_region) && !can_create_pmem(cxlrd))
+	if (a == CXL_REGION_ATTR(create_ram_region) && !can_create_ram(cxlrd))
+		return 0;
+
+	if (a == CXL_REGION_ATTR(delete_region) &&
+	    !(can_create_pmem(cxlrd) || can_create_ram(cxlrd)))
 		return 0;
 
 	return a->mode;
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 53d6dbe4de6d..8dea49c021b8 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1685,6 +1685,15 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
 	struct device *dev;
 	int rc;
 
+	switch (mode) {
+	case CXL_DECODER_RAM:
+	case CXL_DECODER_PMEM:
+		break;
+	default:
+		dev_err(&cxlrd->cxlsd.cxld.dev, "unsupported mode %d\n", mode);
+		return ERR_PTR(-EINVAL);
+	}
+
 	cxlr = cxl_region_alloc(cxlrd, id);
 	if (IS_ERR(cxlr))
 		return cxlr;
@@ -1713,12 +1722,38 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
 	return ERR_PTR(rc);
 }
 
+static ssize_t __create_region_show(struct cxl_root_decoder *cxlrd, char *buf)
+{
+	return sysfs_emit(buf, "region%u\n", atomic_read(&cxlrd->region_id));
+}
+
 static ssize_t create_pmem_region_show(struct device *dev,
 				       struct device_attribute *attr, char *buf)
 {
-	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
+	return __create_region_show(to_cxl_root_decoder(dev), buf);
+}
 
-	return sysfs_emit(buf, "region%u\n", atomic_read(&cxlrd->region_id));
+static ssize_t create_ram_region_show(struct device *dev,
+				      struct device_attribute *attr, char *buf)
+{
+	return __create_region_show(to_cxl_root_decoder(dev), buf);
+}
+
+static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd,
+					  enum cxl_decoder_mode mode, int id)
+{
+	int rc;
+
+	rc = memregion_alloc(GFP_KERNEL);
+	if (rc < 0)
+		return ERR_PTR(rc);
+
+	if (atomic_cmpxchg(&cxlrd->region_id, id, rc) != id) {
+		memregion_free(rc);
+		return ERR_PTR(-EBUSY);
+	}
+
+	return devm_cxl_add_region(cxlrd, id, mode, CXL_DECODER_EXPANDER);
 }
 
 static ssize_t create_pmem_region_store(struct device *dev,
@@ -1727,29 +1762,37 @@ static ssize_t create_pmem_region_store(struct device *dev,
 {
 	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
 	struct cxl_region *cxlr;
-	int id, rc;
+	int rc, id;
 
 	rc = sscanf(buf, "region%d\n", &id);
 	if (rc != 1)
 		return -EINVAL;
 
-	rc = memregion_alloc(GFP_KERNEL);
-	if (rc < 0)
-		return rc;
+	cxlr = __create_region(cxlrd, CXL_DECODER_PMEM, id);
+	if (IS_ERR(cxlr))
+		return PTR_ERR(cxlr);
+	return len;
+}
+DEVICE_ATTR_RW(create_pmem_region);
 
-	if (atomic_cmpxchg(&cxlrd->region_id, id, rc) != id) {
-		memregion_free(rc);
-		return -EBUSY;
-	}
+static ssize_t create_ram_region_store(struct device *dev,
+				       struct device_attribute *attr,
+				       const char *buf, size_t len)
+{
+	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
+	struct cxl_region *cxlr;
+	int rc, id;
 
-	cxlr = devm_cxl_add_region(cxlrd, id, CXL_DECODER_PMEM,
-				   CXL_DECODER_EXPANDER);
+	rc = sscanf(buf, "region%d\n", &id);
+	if (rc != 1)
+		return -EINVAL;
+
+	cxlr = __create_region(cxlrd, CXL_DECODER_RAM, id);
 	if (IS_ERR(cxlr))
 		return PTR_ERR(cxlr);
-
 	return len;
 }
-DEVICE_ATTR_RW(create_pmem_region);
+DEVICE_ATTR_RW(create_ram_region);
 
 static ssize_t region_show(struct device *dev, struct device_attribute *attr,
 			   char *buf)


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH 06/18] cxl/region: Refactor attach_target() for autodiscovery
  2023-02-06  1:02 [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default Dan Williams
                   ` (4 preceding siblings ...)
  2023-02-06  1:02 ` [PATCH 05/18] cxl/region: Add volatile region creation support Dan Williams
@ 2023-02-06  1:03 ` Dan Williams
  2023-02-06 17:06   ` Jonathan Cameron
                     ` (3 more replies)
  2023-02-06  1:03 ` [PATCH 07/18] cxl/region: Move region-position validation to a helper Dan Williams
                   ` (15 subsequent siblings)
  21 siblings, 4 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-06  1:03 UTC (permalink / raw)
  To: linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi

Region autodiscovery is the process of kernel creating 'struct
cxl_region' object to represent active CXL memory ranges it finds
already active in hardware when the driver loads. Typically this happens
when platform firmware establishes CXL memory regions and then publishes
them in the memory map. However, this can also happen in the case of
kexec-reboot after the kernel has created regions.

In the autodiscovery case the region creation process starts with a
known endpoint decoder. Refactor attach_target() into a helper that is
suitable to be called from either sysfs, for runtime region creation, or
from cxl_port_probe() after it has enumerated all endpoint decoders.

The cxl_port_probe() context is an async device-core probing context, so
it is not appropriate to allow SIGTERM to interrupt the assembly
process. Refactor attach_target() to take @cxled and @state as arguments
where @state indicates whether waiting from the region rwsem is
interruptible or not.

No behavior change is intended.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/region.c |   47 +++++++++++++++++++++++++++------------------
 1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 8dea49c021b8..97eafdd75675 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1418,31 +1418,25 @@ void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled)
 	up_write(&cxl_region_rwsem);
 }
 
-static int attach_target(struct cxl_region *cxlr, const char *decoder, int pos)
+static int attach_target(struct cxl_region *cxlr,
+			 struct cxl_endpoint_decoder *cxled, int pos,
+			 unsigned int state)
 {
-	struct device *dev;
-	int rc;
-
-	dev = bus_find_device_by_name(&cxl_bus_type, NULL, decoder);
-	if (!dev)
-		return -ENODEV;
-
-	if (!is_endpoint_decoder(dev)) {
-		put_device(dev);
-		return -EINVAL;
-	}
+	int rc = 0;
 
-	rc = down_write_killable(&cxl_region_rwsem);
+	if (state == TASK_INTERRUPTIBLE)
+		rc = down_write_killable(&cxl_region_rwsem);
+	else
+		down_write(&cxl_region_rwsem);
 	if (rc)
-		goto out;
+		return rc;
+
 	down_read(&cxl_dpa_rwsem);
-	rc = cxl_region_attach(cxlr, to_cxl_endpoint_decoder(dev), pos);
+	rc = cxl_region_attach(cxlr, cxled, pos);
 	if (rc == 0)
 		set_bit(CXL_REGION_F_INCOHERENT, &cxlr->flags);
 	up_read(&cxl_dpa_rwsem);
 	up_write(&cxl_region_rwsem);
-out:
-	put_device(dev);
 	return rc;
 }
 
@@ -1480,8 +1474,23 @@ static size_t store_targetN(struct cxl_region *cxlr, const char *buf, int pos,
 
 	if (sysfs_streq(buf, "\n"))
 		rc = detach_target(cxlr, pos);
-	else
-		rc = attach_target(cxlr, buf, pos);
+	else {
+		struct device *dev;
+
+		dev = bus_find_device_by_name(&cxl_bus_type, NULL, buf);
+		if (!dev)
+			return -ENODEV;
+
+		if (!is_endpoint_decoder(dev)) {
+			rc = -EINVAL;
+			goto out;
+		}
+
+		rc = attach_target(cxlr, to_cxl_endpoint_decoder(dev), pos,
+				   TASK_INTERRUPTIBLE);
+out:
+		put_device(dev);
+	}
 
 	if (rc < 0)
 		return rc;


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH 07/18] cxl/region: Move region-position validation to a helper
  2023-02-06  1:02 [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default Dan Williams
                   ` (5 preceding siblings ...)
  2023-02-06  1:03 ` [PATCH 06/18] cxl/region: Refactor attach_target() for autodiscovery Dan Williams
@ 2023-02-06  1:03 ` Dan Williams
  2023-02-06 17:44   ` Ira Weiny
                     ` (2 more replies)
  2023-02-06  1:03 ` [PATCH 08/18] kernel/range: Uplevel the cxl subsystem's range_contains() helper Dan Williams
                   ` (14 subsequent siblings)
  21 siblings, 3 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-06  1:03 UTC (permalink / raw)
  To: linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi

In preparation for region autodiscovery, that needs all devices
discovered before their relative position in the region can be
determined, consolidate all position dependent validation in a helper.

Recall that in the on-demand region creation flow the end-user picks the
position of a given endpoint decoder in a region. In the autodiscovery
case the position of an endpoint decoder can only be determined after
all other endpoint decoders that claim to decode the region's address
range have been enumerated and attached. So, in the autodiscovery case
endpoint decoders may be attached before their relative position is
known. Once all decoders arrive, then positions can be determined and
validated with cxl_region_validate_position() the same as user initiated
on-demand creation.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/region.c |  119 +++++++++++++++++++++++++++++----------------
 1 file changed, 76 insertions(+), 43 deletions(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 97eafdd75675..c82d3b6f3d1f 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1207,35 +1207,13 @@ static int cxl_region_setup_targets(struct cxl_region *cxlr)
 	return 0;
 }
 
-static int cxl_region_attach(struct cxl_region *cxlr,
-			     struct cxl_endpoint_decoder *cxled, int pos)
+static int cxl_region_validate_position(struct cxl_region *cxlr,
+					struct cxl_endpoint_decoder *cxled,
+					int pos)
 {
-	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
 	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
-	struct cxl_port *ep_port, *root_port, *iter;
 	struct cxl_region_params *p = &cxlr->params;
-	struct cxl_dport *dport;
-	int i, rc = -ENXIO;
-
-	if (cxled->mode != cxlr->mode) {
-		dev_dbg(&cxlr->dev, "%s region mode: %d mismatch: %d\n",
-			dev_name(&cxled->cxld.dev), cxlr->mode, cxled->mode);
-		return -EINVAL;
-	}
-
-	if (cxled->mode == CXL_DECODER_DEAD) {
-		dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev));
-		return -ENODEV;
-	}
-
-	/* all full of members, or interleave config not established? */
-	if (p->state > CXL_CONFIG_INTERLEAVE_ACTIVE) {
-		dev_dbg(&cxlr->dev, "region already active\n");
-		return -EBUSY;
-	} else if (p->state < CXL_CONFIG_INTERLEAVE_ACTIVE) {
-		dev_dbg(&cxlr->dev, "interleave config missing\n");
-		return -ENXIO;
-	}
+	int i;
 
 	if (pos < 0 || pos >= p->interleave_ways) {
 		dev_dbg(&cxlr->dev, "position %d out of range %d\n", pos,
@@ -1274,6 +1252,71 @@ static int cxl_region_attach(struct cxl_region *cxlr,
 		}
 	}
 
+	return 0;
+}
+
+static int cxl_region_attach_position(struct cxl_region *cxlr,
+				      struct cxl_root_decoder *cxlrd,
+				      struct cxl_endpoint_decoder *cxled,
+				      const struct cxl_dport *dport, int pos)
+{
+	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+	struct cxl_port *iter;
+	int rc;
+
+	if (cxlrd->calc_hb(cxlrd, pos) != dport) {
+		dev_dbg(&cxlr->dev, "%s:%s invalid target position for %s\n",
+			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
+			dev_name(&cxlrd->cxlsd.cxld.dev));
+		return -ENXIO;
+	}
+
+	for (iter = cxled_to_port(cxled); !is_cxl_root(iter);
+	     iter = to_cxl_port(iter->dev.parent)) {
+		rc = cxl_port_attach_region(iter, cxlr, cxled, pos);
+		if (rc)
+			goto err;
+	}
+
+	return 0;
+
+err:
+	for (iter = cxled_to_port(cxled); !is_cxl_root(iter);
+	     iter = to_cxl_port(iter->dev.parent))
+		cxl_port_detach_region(iter, cxlr, cxled);
+	return rc;
+}
+
+static int cxl_region_attach(struct cxl_region *cxlr,
+			     struct cxl_endpoint_decoder *cxled, int pos)
+{
+	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
+	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+	struct cxl_region_params *p = &cxlr->params;
+	struct cxl_port *ep_port, *root_port;
+	struct cxl_dport *dport;
+	int rc = -ENXIO;
+
+	if (cxled->mode != cxlr->mode) {
+		dev_dbg(&cxlr->dev, "%s region mode: %d mismatch: %d\n",
+			dev_name(&cxled->cxld.dev), cxlr->mode, cxled->mode);
+		return -EINVAL;
+	}
+
+	if (cxled->mode == CXL_DECODER_DEAD) {
+		dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev));
+		return -ENODEV;
+	}
+
+	/* all full of members, or interleave config not established? */
+	if (p->state > CXL_CONFIG_INTERLEAVE_ACTIVE) {
+		dev_dbg(&cxlr->dev, "region already active\n");
+		return -EBUSY;
+	} else if (p->state < CXL_CONFIG_INTERLEAVE_ACTIVE) {
+		dev_dbg(&cxlr->dev, "interleave config missing\n");
+		return -ENXIO;
+	}
+
 	ep_port = cxled_to_port(cxled);
 	root_port = cxlrd_to_port(cxlrd);
 	dport = cxl_find_dport_by_dev(root_port, ep_port->host_bridge);
@@ -1284,13 +1327,6 @@ static int cxl_region_attach(struct cxl_region *cxlr,
 		return -ENXIO;
 	}
 
-	if (cxlrd->calc_hb(cxlrd, pos) != dport) {
-		dev_dbg(&cxlr->dev, "%s:%s invalid target position for %s\n",
-			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
-			dev_name(&cxlrd->cxlsd.cxld.dev));
-		return -ENXIO;
-	}
-
 	if (cxled->cxld.target_type != cxlr->type) {
 		dev_dbg(&cxlr->dev, "%s:%s type mismatch: %d vs %d\n",
 			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
@@ -1314,12 +1350,13 @@ static int cxl_region_attach(struct cxl_region *cxlr,
 		return -EINVAL;
 	}
 
-	for (iter = ep_port; !is_cxl_root(iter);
-	     iter = to_cxl_port(iter->dev.parent)) {
-		rc = cxl_port_attach_region(iter, cxlr, cxled, pos);
-		if (rc)
-			goto err;
-	}
+	rc = cxl_region_validate_position(cxlr, cxled, pos);
+	if (rc)
+		return rc;
+
+	rc = cxl_region_attach_position(cxlr, cxlrd, cxled, dport, pos);
+	if (rc)
+		return rc;
 
 	p->targets[pos] = cxled;
 	cxled->pos = pos;
@@ -1343,10 +1380,6 @@ static int cxl_region_attach(struct cxl_region *cxlr,
 
 err_decrement:
 	p->nr_targets--;
-err:
-	for (iter = ep_port; !is_cxl_root(iter);
-	     iter = to_cxl_port(iter->dev.parent))
-		cxl_port_detach_region(iter, cxlr, cxled);
 	return rc;
 }
 


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH 08/18] kernel/range: Uplevel the cxl subsystem's range_contains() helper
  2023-02-06  1:02 [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default Dan Williams
                   ` (6 preceding siblings ...)
  2023-02-06  1:03 ` [PATCH 07/18] cxl/region: Move region-position validation to a helper Dan Williams
@ 2023-02-06  1:03 ` Dan Williams
  2023-02-06 17:02   ` Gregory Price
                     ` (4 more replies)
  2023-02-06  1:03 ` [PATCH 09/18] cxl/region: Enable CONFIG_CXL_REGION to be toggled Dan Williams
                   ` (13 subsequent siblings)
  21 siblings, 5 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-06  1:03 UTC (permalink / raw)
  To: linux-cxl; +Cc: Kees Cook, dave.hansen, linux-mm, linux-acpi

In support of the CXL subsystem's use of 'struct range' to track decode
address ranges, add a common range_contains() implementation with
identical semantics as resource_contains();

The existing 'range_contains()' in lib/stackinit_kunit.c is namespaced
with a 'stackinit_' prefix.

Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/pci.c |    5 -----
 include/linux/range.h  |    5 +++++
 lib/stackinit_kunit.c  |    6 +++---
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 1d1492440287..9ed2120dbf8a 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -214,11 +214,6 @@ static int devm_cxl_enable_mem(struct device *host, struct cxl_dev_state *cxlds)
 	return devm_add_action_or_reset(host, clear_mem_enable, cxlds);
 }
 
-static bool range_contains(struct range *r1, struct range *r2)
-{
-	return r1->start <= r2->start && r1->end >= r2->end;
-}
-
 /* require dvsec ranges to be covered by a locked platform window */
 static int dvsec_range_allowed(struct device *dev, void *arg)
 {
diff --git a/include/linux/range.h b/include/linux/range.h
index 274681cc3154..7efb6a9b069b 100644
--- a/include/linux/range.h
+++ b/include/linux/range.h
@@ -13,6 +13,11 @@ static inline u64 range_len(const struct range *range)
 	return range->end - range->start + 1;
 }
 
+static inline bool range_contains(struct range *r1, struct range *r2)
+{
+	return r1->start <= r2->start && r1->end >= r2->end;
+}
+
 int add_range(struct range *range, int az, int nr_range,
 		u64 start, u64 end);
 
diff --git a/lib/stackinit_kunit.c b/lib/stackinit_kunit.c
index 4591d6cf5e01..05947a2feb93 100644
--- a/lib/stackinit_kunit.c
+++ b/lib/stackinit_kunit.c
@@ -31,8 +31,8 @@ static volatile u8 forced_mask = 0xff;
 static void *fill_start, *target_start;
 static size_t fill_size, target_size;
 
-static bool range_contains(char *haystack_start, size_t haystack_size,
-			   char *needle_start, size_t needle_size)
+static bool stackinit_range_contains(char *haystack_start, size_t haystack_size,
+				     char *needle_start, size_t needle_size)
 {
 	if (needle_start >= haystack_start &&
 	    needle_start + needle_size <= haystack_start + haystack_size)
@@ -175,7 +175,7 @@ static noinline void test_ ## name (struct kunit *test)		\
 								\
 	/* Validate that compiler lined up fill and target. */	\
 	KUNIT_ASSERT_TRUE_MSG(test,				\
-		range_contains(fill_start, fill_size,		\
+		stackinit_range_contains(fill_start, fill_size,	\
 			    target_start, target_size),		\
 		"stack fill missed target!? "			\
 		"(fill %zu wide, target offset by %d)\n",	\


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH 09/18] cxl/region: Enable CONFIG_CXL_REGION to be toggled
  2023-02-06  1:02 [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default Dan Williams
                   ` (7 preceding siblings ...)
  2023-02-06  1:03 ` [PATCH 08/18] kernel/range: Uplevel the cxl subsystem's range_contains() helper Dan Williams
@ 2023-02-06  1:03 ` Dan Williams
  2023-02-06 17:03   ` Gregory Price
                     ` (3 more replies)
  2023-02-06  1:03 ` [PATCH 10/18] cxl/region: Fix passthrough-decoder detection Dan Williams
                   ` (12 subsequent siblings)
  21 siblings, 4 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-06  1:03 UTC (permalink / raw)
  To: linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi

Add help text and a label so the CXL_REGION config option can be
toggled. This is mainly to enable compile testing without region
support.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/Kconfig |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 0ac53c422c31..163c094e67ae 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -104,12 +104,22 @@ config CXL_SUSPEND
 	depends on SUSPEND && CXL_MEM
 
 config CXL_REGION
-	bool
+	bool "CXL: Region Support"
 	default CXL_BUS
 	# For MAX_PHYSMEM_BITS
 	depends on SPARSEMEM
 	select MEMREGION
 	select GET_FREE_REGION
+	help
+	  Enable the CXL core to enumerate and provision CXL regions. A CXL
+	  region is defined by one or more CXL expanders that decode a given
+	  system-physical address range. For CXL regions established by
+	  platform-firmware this option enables memory error handling to
+	  identify the devices participating in a given interleaved memory
+	  range. Otherwise, platform-firmware managed CXL is enabled by being
+	  placed in the system address map and does not need a driver.
+
+	  If unsure say 'y'
 
 config CXL_REGION_INVALIDATION_TEST
 	bool "CXL: Region Cache Management Bypass (TEST)"


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH 10/18] cxl/region: Fix passthrough-decoder detection
  2023-02-06  1:02 [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default Dan Williams
                   ` (8 preceding siblings ...)
  2023-02-06  1:03 ` [PATCH 09/18] cxl/region: Enable CONFIG_CXL_REGION to be toggled Dan Williams
@ 2023-02-06  1:03 ` Dan Williams
  2023-02-06  5:38   ` Greg KH
                     ` (3 more replies)
  2023-02-06  1:03 ` [PATCH 11/18] cxl/region: Add region autodiscovery Dan Williams
                   ` (11 subsequent siblings)
  21 siblings, 4 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-06  1:03 UTC (permalink / raw)
  To: linux-cxl; +Cc: stable, dave.hansen, linux-mm, linux-acpi

A passthrough decoder is a decoder that maps only 1 target. It is a
special case because it does not impose any constraints on the
interleave-math as compared to a decoder with multiple targets. Extend
the passthrough case to multi-target-capable decoders that only have one
target selected. I.e. the current code was only considering passthrough
*ports* which are only a subset of the potential passthrough decoder
scenarios.

Fixes: e4f6dfa9ef75 ("cxl/region: Fix 'distance' calculation with passthrough ports")
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/region.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index c82d3b6f3d1f..34cf95217901 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1019,10 +1019,10 @@ static int cxl_port_setup_targets(struct cxl_port *port,
 		int i, distance;
 
 		/*
-		 * Passthrough ports impose no distance requirements between
+		 * Passthrough decoders impose no distance requirements between
 		 * peers
 		 */
-		if (port->nr_dports == 1)
+		if (cxl_rr->nr_targets == 1)
 			distance = 0;
 		else
 			distance = p->nr_targets / cxl_rr->nr_targets;


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH 11/18] cxl/region: Add region autodiscovery
  2023-02-06  1:02 [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default Dan Williams
                   ` (9 preceding siblings ...)
  2023-02-06  1:03 ` [PATCH 10/18] cxl/region: Fix passthrough-decoder detection Dan Williams
@ 2023-02-06  1:03 ` Dan Williams
  2023-02-06 19:02   ` Ira Weiny
                     ` (2 more replies)
  2023-02-06  1:03 ` [PATCH 12/18] tools/testing/cxl: Define a fixed volatile configuration to parse Dan Williams
                   ` (10 subsequent siblings)
  21 siblings, 3 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-06  1:03 UTC (permalink / raw)
  To: linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi

Region autodiscovery is an asynchrounous state machine advanced by
cxl_port_probe(). After the decoders on an endpoint port are enumerated
they are scanned for actively enabled instances. Each active decoder is
flagged for auto-assembly CXL_DECODER_F_AUTO and attached to a region.
If a region does not already exist for the address range setting of the
decoder one is created. That creation process may race with other
decoders of the same region being discovered since cxl_port_probe() is
asynchronous. A new 'struct cxl_root_decoder' lock, @range_lock, is
introduced to mitigate that race.

Once all decoders have arrived, "p->nr_targets == p->interleave_ways",
they are sorted by their relative decode position. The sort algorithm
involves finding the point in the cxl_port topology where one leg of the
decode leads to deviceA and the other deviceB. At that point in the
topology the target order in the 'struct cxl_switch_decoder' indicates
the relative position of those endpoint decoders in the region.

>From that point the region goes through the same setup and validation
steps as user-created regions, but instead of programming the decoders
it validates that driver would have written the same values to the
decoders as were already present.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/hdm.c    |    5 
 drivers/cxl/core/port.c   |    2 
 drivers/cxl/core/region.c |  496 ++++++++++++++++++++++++++++++++++++++++++++-
 drivers/cxl/cxl.h         |   16 +
 drivers/cxl/port.c        |   26 ++
 5 files changed, 531 insertions(+), 14 deletions(-)

diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index dcc16d7cb8f3..174cddfec6e8 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -674,7 +674,7 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld)
 	up_read(&cxl_dpa_rwsem);
 
 	port->commit_end--;
-	cxld->flags &= ~CXL_DECODER_F_ENABLE;
+	cxld->flags &= ~(CXL_DECODER_F_ENABLE | CXL_DECODER_F_AUTO);
 
 	return 0;
 }
@@ -719,7 +719,7 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
 
 	/* decoders are enabled if committed */
 	if (committed) {
-		cxld->flags |= CXL_DECODER_F_ENABLE;
+		cxld->flags |= CXL_DECODER_F_ENABLE | CXL_DECODER_F_AUTO;
 		if (ctrl & CXL_HDM_DECODER0_CTRL_LOCK)
 			cxld->flags |= CXL_DECODER_F_LOCK;
 		if (FIELD_GET(CXL_HDM_DECODER0_CTRL_TYPE, ctrl))
@@ -783,6 +783,7 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
 		return rc;
 	}
 	*dpa_base += dpa_size + skip;
+
 	return 0;
 }
 
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 47e450c3a5a9..8130430ffbcf 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -446,6 +446,7 @@ bool is_endpoint_decoder(struct device *dev)
 {
 	return dev->type == &cxl_decoder_endpoint_type;
 }
+EXPORT_SYMBOL_NS_GPL(is_endpoint_decoder, CXL);
 
 bool is_root_decoder(struct device *dev)
 {
@@ -1622,6 +1623,7 @@ struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
 	}
 
 	cxlrd->calc_hb = calc_hb;
+	mutex_init(&cxlrd->range_lock);
 
 	cxld = &cxlsd->cxld;
 	cxld->dev.type = &cxl_decoder_root_type;
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 34cf95217901..6fe8c70790df 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -6,6 +6,7 @@
 #include <linux/module.h>
 #include <linux/slab.h>
 #include <linux/uuid.h>
+#include <linux/sort.h>
 #include <linux/idr.h>
 #include <cxlmem.h>
 #include <cxl.h>
@@ -520,7 +521,12 @@ static void cxl_region_iomem_release(struct cxl_region *cxlr)
 	if (device_is_registered(&cxlr->dev))
 		lockdep_assert_held_write(&cxl_region_rwsem);
 	if (p->res) {
-		remove_resource(p->res);
+		/*
+		 * Autodiscovered regions may not have been able to insert their
+		 * resource.
+		 */
+		if (p->res->parent)
+			remove_resource(p->res);
 		kfree(p->res);
 		p->res = NULL;
 	}
@@ -1101,12 +1107,35 @@ static int cxl_port_setup_targets(struct cxl_port *port,
 		return rc;
 	}
 
-	cxld->interleave_ways = iw;
-	cxld->interleave_granularity = ig;
-	cxld->hpa_range = (struct range) {
-		.start = p->res->start,
-		.end = p->res->end,
-	};
+	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
+		if (cxld->interleave_ways != iw ||
+		    cxld->interleave_granularity != ig ||
+		    cxld->hpa_range.start != p->res->start ||
+		    cxld->hpa_range.end != p->res->end ||
+		    ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) {
+			dev_err(&cxlr->dev,
+				"%s:%s %s expected iw: %d ig: %d %pr\n",
+				dev_name(port->uport), dev_name(&port->dev),
+				__func__, iw, ig, p->res);
+			dev_err(&cxlr->dev,
+				"%s:%s %s got iw: %d ig: %d state: %s %#llx:%#llx\n",
+				dev_name(port->uport), dev_name(&port->dev),
+				__func__, cxld->interleave_ways,
+				cxld->interleave_granularity,
+				(cxld->flags & CXL_DECODER_F_ENABLE) ?
+					"enabled" :
+					"disabled",
+				cxld->hpa_range.start, cxld->hpa_range.end);
+			return -ENXIO;
+		}
+	} else {
+		cxld->interleave_ways = iw;
+		cxld->interleave_granularity = ig;
+		cxld->hpa_range = (struct range) {
+			.start = p->res->start,
+			.end = p->res->end,
+		};
+	}
 	dev_dbg(&cxlr->dev, "%s:%s iw: %d ig: %d\n", dev_name(port->uport),
 		dev_name(&port->dev), iw, ig);
 add_target:
@@ -1117,7 +1146,17 @@ static int cxl_port_setup_targets(struct cxl_port *port,
 			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), pos);
 		return -ENXIO;
 	}
-	cxlsd->target[cxl_rr->nr_targets_set] = ep->dport;
+	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
+		if (cxlsd->target[cxl_rr->nr_targets_set] != ep->dport) {
+			dev_dbg(&cxlr->dev, "%s:%s: %s expected %s at %d\n",
+				dev_name(port->uport), dev_name(&port->dev),
+				dev_name(&cxlsd->cxld.dev),
+				dev_name(ep->dport->dport),
+				cxl_rr->nr_targets_set);
+			return -ENXIO;
+		}
+	} else
+		cxlsd->target[cxl_rr->nr_targets_set] = ep->dport;
 	inc = 1;
 out_target_set:
 	cxl_rr->nr_targets_set += inc;
@@ -1159,6 +1198,13 @@ static void cxl_region_teardown_targets(struct cxl_region *cxlr)
 	struct cxl_ep *ep;
 	int i;
 
+	/*
+	 * In the auto-discovery case skip automatic teardown since the
+	 * address space is already active
+	 */
+	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags))
+		return;
+
 	for (i = 0; i < p->nr_targets; i++) {
 		cxled = p->targets[i];
 		cxlmd = cxled_to_memdev(cxled);
@@ -1191,8 +1237,8 @@ static int cxl_region_setup_targets(struct cxl_region *cxlr)
 			iter = to_cxl_port(iter->dev.parent);
 
 		/*
-		 * Descend the topology tree programming targets while
-		 * looking for conflicts.
+		 * Descend the topology tree programming / validating
+		 * targets while looking for conflicts.
 		 */
 		for (ep = cxl_ep_load(iter, cxlmd); iter;
 		     iter = ep->next, ep = cxl_ep_load(iter, cxlmd)) {
@@ -1287,6 +1333,182 @@ static int cxl_region_attach_position(struct cxl_region *cxlr,
 	return rc;
 }
 
+static int cxl_region_attach_auto(struct cxl_region *cxlr,
+				  struct cxl_endpoint_decoder *cxled, int pos)
+{
+	struct cxl_region_params *p = &cxlr->params;
+
+	if (!(cxled->cxld.flags & CXL_DECODER_F_AUTO)) {
+		dev_err(&cxlr->dev,
+			"%s: unable to add decoder to autodetected region\n",
+			dev_name(&cxled->cxld.dev));
+		return -EINVAL;
+	}
+
+	if (pos >= 0) {
+		dev_dbg(&cxlr->dev, "%s: expected auto position, not %d\n",
+			dev_name(&cxled->cxld.dev), pos);
+		return -EINVAL;
+	}
+
+	if (p->nr_targets >= p->interleave_ways) {
+		dev_err(&cxlr->dev, "%s: no more target slots available\n",
+			dev_name(&cxled->cxld.dev));
+		return -ENXIO;
+	}
+
+	/*
+	 * Temporarily record the endpoint decoder into the target array. Yes,
+	 * this means that userspace can view devices in the wrong position
+	 * before the region activates, and must be careful to understand when
+	 * it might be racing region autodiscovery.
+	 */
+	pos = p->nr_targets;
+	p->targets[pos] = cxled;
+	cxled->pos = pos;
+	p->nr_targets++;
+
+	return 0;
+}
+
+static struct cxl_port *next_port(struct cxl_port *port)
+{
+	if (!port->parent_dport)
+		return NULL;
+	return port->parent_dport->port;
+}
+
+static int decoder_match_range(struct device *dev, void *data)
+{
+	struct cxl_endpoint_decoder *cxled = data;
+	struct cxl_switch_decoder *cxlsd;
+
+	if (!is_switch_decoder(dev))
+		return 0;
+
+	cxlsd = to_cxl_switch_decoder(dev);
+	return range_contains(&cxlsd->cxld.hpa_range, &cxled->cxld.hpa_range);
+}
+
+static void find_positions(const struct cxl_switch_decoder *cxlsd,
+			   const struct cxl_port *iter_a,
+			   const struct cxl_port *iter_b, int *a_pos,
+			   int *b_pos)
+{
+	int i;
+
+	for (i = 0, *a_pos = -1, *b_pos = -1; i < cxlsd->nr_targets; i++) {
+		if (cxlsd->target[i] == iter_a->parent_dport)
+			*a_pos = i;
+		else if (cxlsd->target[i] == iter_b->parent_dport)
+			*b_pos = i;
+		if (*a_pos >= 0 && *b_pos >= 0)
+			break;
+	}
+}
+
+static int cmp_decode_pos(const void *a, const void *b)
+{
+	struct cxl_endpoint_decoder *cxled_a = *(typeof(cxled_a) *)a;
+	struct cxl_endpoint_decoder *cxled_b = *(typeof(cxled_b) *)b;
+	struct cxl_memdev *cxlmd_a = cxled_to_memdev(cxled_a);
+	struct cxl_memdev *cxlmd_b = cxled_to_memdev(cxled_b);
+	struct cxl_port *port_a = cxled_to_port(cxled_a);
+	struct cxl_port *port_b = cxled_to_port(cxled_b);
+	struct cxl_port *iter_a, *iter_b;
+
+	/* Exit early if any prior sorting failed */
+	if (cxled_a->pos < 0 || cxled_b->pos < 0)
+		return 0;
+
+	/*
+	 * Walk up the hierarchy to find a shared port, find the decoder
+	 * that maps the range, compare the relative position of those
+	 * dport mappings.
+	 */
+	for (iter_a = port_a; iter_a; iter_a = next_port(iter_a)) {
+		struct cxl_port *next_a, *next_b, *port;
+		struct cxl_switch_decoder *cxlsd;
+
+		next_a = next_port(iter_a);
+		for (iter_b = port_b; iter_b; iter_b = next_port(iter_b)) {
+			int a_pos, b_pos, result;
+			struct device *dev;
+			unsigned int seq;
+
+			next_b = next_port(iter_b);
+			if (next_a != next_b)
+				continue;
+			if (!next_a)
+				goto out;
+			port = next_a;
+			dev = device_find_child(&port->dev, cxled_a,
+						decoder_match_range);
+			if (!dev) {
+				struct range *range = &cxled_a->cxld.hpa_range;
+
+				dev_err(port->uport,
+					"failed to find decoder that maps %#llx-:%#llx\n",
+					range->start, range->end);
+				cxled_a->pos = -1;
+				return 0;
+			}
+
+			cxlsd = to_cxl_switch_decoder(dev);
+			do {
+				seq = read_seqbegin(&cxlsd->target_lock);
+				find_positions(cxlsd, iter_a, iter_b, &a_pos,
+					       &b_pos);
+			} while (read_seqretry(&cxlsd->target_lock, seq));
+
+			if (a_pos < 0 || b_pos < 0) {
+				dev_err(port->uport,
+					"failed to find shared decoder for %s and %s\n",
+					dev_name(cxlmd_a->dev.parent),
+					dev_name(cxlmd_b->dev.parent));
+				cxled_a->pos = -1;
+				result = 0;
+			} else {
+				result = a_pos - b_pos;
+				dev_dbg(port->uport, "%s: %s comes %s %s\n",
+					dev_name(&cxlsd->cxld.dev),
+					dev_name(cxlmd_a->dev.parent),
+					result < 0 ? "before" : "after",
+					dev_name(cxlmd_b->dev.parent));
+			}
+
+			put_device(dev);
+
+			return result;
+		}
+	}
+out:
+	dev_err(cxlmd_a->dev.parent, "failed to find shared port with %s\n",
+		dev_name(cxlmd_b->dev.parent));
+	cxled_a->pos = -1;
+	return 0;
+}
+
+static int cxl_region_sort_targets(struct cxl_region *cxlr)
+{
+	struct cxl_region_params *p = &cxlr->params;
+	int i, rc = 0;
+
+	sort(p->targets, p->nr_targets, sizeof(p->targets[0]), cmp_decode_pos,
+	     NULL);
+
+	for (i = 0; i < p->nr_targets; i++) {
+		struct cxl_endpoint_decoder *cxled = p->targets[i];
+
+		if (cxled->pos < 0)
+			rc = -ENXIO;
+		cxled->pos = i;
+	}
+
+	dev_dbg(&cxlr->dev, "region sort %s\n", rc ? "failed" : "successful");
+	return rc;
+}
+
 static int cxl_region_attach(struct cxl_region *cxlr,
 			     struct cxl_endpoint_decoder *cxled, int pos)
 {
@@ -1350,6 +1572,50 @@ static int cxl_region_attach(struct cxl_region *cxlr,
 		return -EINVAL;
 	}
 
+	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
+		int i;
+
+		rc = cxl_region_attach_auto(cxlr, cxled, pos);
+		if (rc)
+			return rc;
+
+		/* await more targets to arrive... */
+		if (p->nr_targets < p->interleave_ways)
+			return 0;
+
+		/*
+		 * All targets are here, which implies all PCI enumeration that
+		 * affects this region has been completed. Walk the topology to
+		 * sort the devices into their relative region decode position.
+		 */
+		rc = cxl_region_sort_targets(cxlr);
+		if (rc)
+			return rc;
+
+		for (i = 0; i < p->nr_targets; i++) {
+			cxled = p->targets[i];
+			ep_port = cxled_to_port(cxled);
+			dport = cxl_find_dport_by_dev(root_port,
+						      ep_port->host_bridge);
+			rc = cxl_region_attach_position(cxlr, cxlrd, cxled,
+							dport, i);
+			if (rc)
+				return rc;
+		}
+
+		rc = cxl_region_setup_targets(cxlr);
+		if (rc)
+			return rc;
+
+		/*
+		 * If target setup succeeds in the autodiscovery case
+		 * then the region is already committed.
+		 */
+		p->state = CXL_CONFIG_COMMIT;
+
+		return 0;
+	}
+
 	rc = cxl_region_validate_position(cxlr, cxled, pos);
 	if (rc)
 		return rc;
@@ -1500,8 +1766,8 @@ static int detach_target(struct cxl_region *cxlr, int pos)
 	return rc;
 }
 
-static size_t store_targetN(struct cxl_region *cxlr, const char *buf, int pos,
-			    size_t len)
+static ssize_t store_targetN(struct cxl_region *cxlr, const char *buf, int pos,
+			     size_t len)
 {
 	int rc;
 
@@ -2079,6 +2345,191 @@ static int devm_cxl_add_pmem_region(struct cxl_region *cxlr)
 	return rc;
 }
 
+static int match_decoder_by_range(struct device *dev, void *data)
+{
+	struct range *r1, *r2 = data;
+	struct cxl_root_decoder *cxlrd;
+
+	if (!is_root_decoder(dev))
+		return 0;
+
+	cxlrd = to_cxl_root_decoder(dev);
+	r1 = &cxlrd->cxlsd.cxld.hpa_range;
+	return range_contains(r1, r2);
+}
+
+static int match_region_by_range(struct device *dev, void *data)
+{
+	struct cxl_region_params *p;
+	struct cxl_region *cxlr;
+	struct range *r = data;
+	int rc = 0;
+
+	if (!is_cxl_region(dev))
+		return 0;
+
+	cxlr = to_cxl_region(dev);
+	p = &cxlr->params;
+
+	down_read(&cxl_region_rwsem);
+	if (p->res && p->res->start == r->start && p->res->end == r->end)
+		rc = 1;
+	up_read(&cxl_region_rwsem);
+
+	return rc;
+}
+
+/* Establish an empty region covering the given HPA range */
+static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
+					   struct cxl_endpoint_decoder *cxled)
+{
+	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+	struct cxl_port *port = cxlrd_to_port(cxlrd);
+	struct range *hpa = &cxled->cxld.hpa_range;
+	struct cxl_region_params *p;
+	struct cxl_region *cxlr;
+	struct resource *res;
+	int rc;
+
+	do {
+		cxlr = __create_region(cxlrd, cxled->mode,
+				       atomic_read(&cxlrd->region_id));
+	} while (IS_ERR(cxlr) && PTR_ERR(cxlr) == -EBUSY);
+
+	if (IS_ERR(cxlr)) {
+		dev_err(cxlmd->dev.parent,
+			"%s:%s: %s failed assign region: %ld\n",
+			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
+			__func__, PTR_ERR(cxlr));
+		return cxlr;
+	}
+
+	down_write(&cxl_region_rwsem);
+	p = &cxlr->params;
+	if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE) {
+		dev_err(cxlmd->dev.parent,
+			"%s:%s: %s autodiscovery interrupted\n",
+			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
+			__func__);
+		rc = -EBUSY;
+		goto err;
+	}
+
+	set_bit(CXL_REGION_F_AUTO, &cxlr->flags);
+
+	res = kmalloc(sizeof(*res), GFP_KERNEL);
+	if (!res) {
+		rc = -ENOMEM;
+		goto err;
+	}
+
+	*res = DEFINE_RES_MEM_NAMED(hpa->start, range_len(hpa),
+				    dev_name(&cxlr->dev));
+	rc = insert_resource(cxlrd->res, res);
+	if (rc) {
+		/*
+		 * Platform-firmware may not have split resources like "System
+		 * RAM" on CXL window boundaries see cxl_region_iomem_release()
+		 */
+		dev_warn(cxlmd->dev.parent,
+			 "%s:%s: %s %s cannot insert resource\n",
+			 dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
+			 __func__, dev_name(&cxlr->dev));
+	}
+
+	p->res = res;
+	p->interleave_ways = cxled->cxld.interleave_ways;
+	p->interleave_granularity = cxled->cxld.interleave_granularity;
+	p->state = CXL_CONFIG_INTERLEAVE_ACTIVE;
+
+	rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_target_group());
+	if (rc)
+		goto err;
+
+	dev_dbg(cxlmd->dev.parent, "%s:%s: %s %s res: %pr iw: %d ig: %d\n",
+		dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), __func__,
+		dev_name(&cxlr->dev), p->res, p->interleave_ways,
+		p->interleave_granularity);
+
+	/* ...to match put_device() in cxl_add_to_region() */
+	get_device(&cxlr->dev);
+	up_write(&cxl_region_rwsem);
+
+	return cxlr;
+
+err:
+	up_write(&cxl_region_rwsem);
+	devm_release_action(port->uport, unregister_region, cxlr);
+	return ERR_PTR(rc);
+}
+
+void cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
+{
+	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+	struct range *hpa = &cxled->cxld.hpa_range;
+	struct cxl_decoder *cxld = &cxled->cxld;
+	struct cxl_root_decoder *cxlrd;
+	struct cxl_region_params *p;
+	struct cxl_region *cxlr;
+	struct cxl_port *root;
+	bool attach = false;
+	struct device *dev;
+
+	root = find_cxl_root(&cxlmd->dev);
+	if (!root) {
+		dev_err(cxlmd->dev.parent, "%s: failed to map CXL root\n",
+			dev_name(&cxlmd->dev));
+		return;
+	}
+
+	dev = device_find_child(&root->dev, &cxld->hpa_range,
+				  match_decoder_by_range);
+	if (!dev) {
+		dev_err(cxlmd->dev.parent,
+			"%s:%s no CXL window for range %#llx:%#llx\n",
+			dev_name(&cxlmd->dev), dev_name(&cxld->dev),
+			cxld->hpa_range.start, cxld->hpa_range.end);
+		goto out;
+	}
+	cxlrd = to_cxl_root_decoder(dev);
+
+	mutex_lock(&cxlrd->range_lock);
+	dev = device_find_child(&cxlrd->cxlsd.cxld.dev, hpa,
+				match_region_by_range);
+	if (!dev)
+		cxlr = construct_region(cxlrd, cxled);
+	else
+		cxlr = to_cxl_region(dev);
+	mutex_unlock(&cxlrd->range_lock);
+
+	if (IS_ERR(cxlr))
+		goto out;
+
+	attach_target(cxlr, cxled, -1, TASK_UNINTERRUPTIBLE);
+
+	down_read(&cxl_region_rwsem);
+	p = &cxlr->params;
+	attach = p->state >= CXL_CONFIG_COMMIT;
+	up_read(&cxl_region_rwsem);
+
+	if (attach) {
+		int rc = device_attach(&cxlr->dev);
+
+		/*
+		 * If device_attach() fails the range may still be
+		 * active via the platform-firmware memory map
+		 */
+		if (rc < 0)
+			dev_err(&cxlr->dev, "failed to enable, range: %pr\n",
+				p->res);
+	}
+
+	put_device(&cxlr->dev);
+out:
+	put_device(&root->dev);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_add_to_region, CXL);
+
 static int cxl_region_invalidate_memregion(struct cxl_region *cxlr)
 {
 	if (!test_bit(CXL_REGION_F_INCOHERENT, &cxlr->flags))
@@ -2103,6 +2554,15 @@ static int cxl_region_invalidate_memregion(struct cxl_region *cxlr)
 	return 0;
 }
 
+static int is_system_ram(struct resource *res, void *arg)
+{
+	struct cxl_region *cxlr = arg;
+	struct cxl_region_params *p = &cxlr->params;
+
+	dev_dbg(&cxlr->dev, "%pr has System RAM: %pr\n", p->res, res);
+	return 1;
+}
+
 static int cxl_region_probe(struct device *dev)
 {
 	struct cxl_region *cxlr = to_cxl_region(dev);
@@ -2136,6 +2596,18 @@ static int cxl_region_probe(struct device *dev)
 	switch (cxlr->mode) {
 	case CXL_DECODER_PMEM:
 		return devm_cxl_add_pmem_region(cxlr);
+	case CXL_DECODER_RAM:
+		/*
+		 * The region can not be manged by CXL if any portion of
+		 * it is already online as 'System RAM'
+		 */
+		if (walk_iomem_res_desc(IORES_DESC_NONE,
+					IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
+					p->res->start, p->res->end, cxlr,
+					is_system_ram) > 0)
+			return 0;
+		dev_dbg(dev, "TODO: hookup devdax\n");
+		return 0;
 	default:
 		dev_dbg(&cxlr->dev, "unsupported region mode: %d\n",
 			cxlr->mode);
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index ca76879af1de..9b3765c5c81a 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -261,6 +261,8 @@ resource_size_t cxl_rcrb_to_component(struct device *dev,
  * cxl_decoder flags that define the type of memory / devices this
  * decoder supports as well as configuration lock status See "CXL 2.0
  * 8.2.5.12.7 CXL HDM Decoder 0 Control Register" for details.
+ * Additionally indicate whether decoder settings were autodetected,
+ * user customized.
  */
 #define CXL_DECODER_F_RAM   BIT(0)
 #define CXL_DECODER_F_PMEM  BIT(1)
@@ -268,6 +270,7 @@ resource_size_t cxl_rcrb_to_component(struct device *dev,
 #define CXL_DECODER_F_TYPE3 BIT(3)
 #define CXL_DECODER_F_LOCK  BIT(4)
 #define CXL_DECODER_F_ENABLE    BIT(5)
+#define CXL_DECODER_F_AUTO  BIT(6)
 #define CXL_DECODER_F_MASK  GENMASK(5, 0)
 
 enum cxl_decoder_type {
@@ -380,6 +383,7 @@ typedef struct cxl_dport *(*cxl_calc_hb_fn)(struct cxl_root_decoder *cxlrd,
  * @region_id: region id for next region provisioning event
  * @calc_hb: which host bridge covers the n'th position by granularity
  * @platform_data: platform specific configuration data
+ * @range_lock: sync region autodiscovery by address range
  * @cxlsd: base cxl switch decoder
  */
 struct cxl_root_decoder {
@@ -387,6 +391,7 @@ struct cxl_root_decoder {
 	atomic_t region_id;
 	cxl_calc_hb_fn calc_hb;
 	void *platform_data;
+	struct mutex range_lock;
 	struct cxl_switch_decoder cxlsd;
 };
 
@@ -436,6 +441,13 @@ struct cxl_region_params {
  */
 #define CXL_REGION_F_INCOHERENT 0
 
+/*
+ * Indicate whether this region has been assembled by autodetection or
+ * userspace assembly. Prevent endpoint decoders outside of automatic
+ * detection from being added to the region.
+ */
+#define CXL_REGION_F_AUTO 1
+
 /**
  * struct cxl_region - CXL region
  * @dev: This region's device
@@ -699,6 +711,7 @@ struct cxl_nvdimm_bridge *cxl_find_nvdimm_bridge(struct device *dev);
 #ifdef CONFIG_CXL_REGION
 bool is_cxl_pmem_region(struct device *dev);
 struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev);
+void cxl_add_to_region(struct cxl_endpoint_decoder *cxled);
 #else
 static inline bool is_cxl_pmem_region(struct device *dev)
 {
@@ -708,6 +721,9 @@ static inline struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev)
 {
 	return NULL;
 }
+static inline void cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
+{
+}
 #endif
 
 /*
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index 5453771bf330..012a0c6f8476 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -30,6 +30,23 @@ static void schedule_detach(void *cxlmd)
 	schedule_cxl_memdev_detach(cxlmd);
 }
 
+static int discover_region(struct device *dev, void *data)
+{
+	const unsigned long flags = CXL_DECODER_F_ENABLE | CXL_DECODER_F_AUTO;
+	struct cxl_endpoint_decoder *cxled;
+
+	if (!is_endpoint_decoder(dev))
+		return 0;
+
+	cxled = to_cxl_endpoint_decoder(dev);
+	if ((cxled->cxld.flags & flags) != flags)
+		return 0;
+
+	cxl_add_to_region(cxled);
+
+	return 0;
+}
+
 static int cxl_port_probe(struct device *dev)
 {
 	struct cxl_port *port = to_cxl_port(dev);
@@ -78,6 +95,15 @@ static int cxl_port_probe(struct device *dev)
 		return rc;
 	}
 
+	if (!is_cxl_endpoint(port))
+		return 0;
+
+	/*
+	 * Now that all endpoint decoders are successfully enumerated,
+	 * try to assemble regions from committed decoders
+	 */
+	device_for_each_child(dev, NULL, discover_region);
+
 	return 0;
 }
 


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH 12/18] tools/testing/cxl: Define a fixed volatile configuration to parse
  2023-02-06  1:02 [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default Dan Williams
                   ` (10 preceding siblings ...)
  2023-02-06  1:03 ` [PATCH 11/18] cxl/region: Add region autodiscovery Dan Williams
@ 2023-02-06  1:03 ` Dan Williams
  2023-02-08 17:31   ` Jonathan Cameron
  2023-02-06  1:03 ` [PATCH 13/18] dax/hmem: Move HMAT and Soft reservation probe initcall level Dan Williams
                   ` (9 subsequent siblings)
  21 siblings, 1 reply; 112+ messages in thread
From: Dan Williams @ 2023-02-06  1:03 UTC (permalink / raw)
  To: linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi

Take two endpoints attached to the first switch on the first host-bridge
in the cxl_test topology and define a pre-initialized region. This is a
x2 interleave underneath a x1 CXL Window.

$ modprobe cxl_test
$ # cxl list -Ru
{
  "region":"region3",
  "resource":"0xf010000000",
  "size":"512.00 MiB (536.87 MB)",
  "interleave_ways":2,
  "interleave_granularity":4096,
  "decode_state":"commit"
}

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/core.h      |    3 -
 drivers/cxl/core/hdm.c       |    3 +
 drivers/cxl/core/port.c      |    4 +
 drivers/cxl/cxl.h            |    4 +
 drivers/cxl/cxlmem.h         |    3 +
 tools/testing/cxl/test/cxl.c |  146 +++++++++++++++++++++++++++++++++++++++---
 6 files changed, 148 insertions(+), 15 deletions(-)

diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 5eb873da5a30..479f01da6d35 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -57,9 +57,6 @@ resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled);
 resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled);
 extern struct rw_semaphore cxl_dpa_rwsem;
 
-bool is_switch_decoder(struct device *dev);
-struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev);
-
 int cxl_memdev_init(void);
 void cxl_memdev_exit(void);
 void cxl_mbox_init(void);
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 174cddfec6e8..28fa4d835259 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -279,7 +279,7 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
 	return 0;
 }
 
-static int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
+int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
 				resource_size_t base, resource_size_t len,
 				resource_size_t skipped)
 {
@@ -295,6 +295,7 @@ static int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
 
 	return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled);
 }
+EXPORT_SYMBOL_NS_GPL(devm_cxl_dpa_reserve, CXL);
 
 resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled)
 {
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 8130430ffbcf..8d0895cbae93 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -458,6 +458,7 @@ bool is_switch_decoder(struct device *dev)
 {
 	return is_root_decoder(dev) || dev->type == &cxl_decoder_switch_type;
 }
+EXPORT_SYMBOL_NS_GPL(is_switch_decoder, CXL);
 
 struct cxl_decoder *to_cxl_decoder(struct device *dev)
 {
@@ -485,6 +486,7 @@ struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev)
 		return NULL;
 	return container_of(dev, struct cxl_switch_decoder, cxld.dev);
 }
+EXPORT_SYMBOL_NS_GPL(to_cxl_switch_decoder, CXL);
 
 static void cxl_ep_release(struct cxl_ep *ep)
 {
@@ -528,7 +530,7 @@ static const struct device_type cxl_port_type = {
 
 bool is_cxl_port(struct device *dev)
 {
-	return dev->type == &cxl_port_type;
+	return dev && dev->type == &cxl_port_type;
 }
 EXPORT_SYMBOL_NS_GPL(is_cxl_port, CXL);
 
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 9b3765c5c81a..4c6ee6c96f23 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -452,6 +452,7 @@ struct cxl_region_params {
  * struct cxl_region - CXL region
  * @dev: This region's device
  * @id: This region's id. Id is globally unique across all regions
+ * @fixed: At least one decoder in this region was locked down at init
  * @mode: Endpoint decoder allocation / access mode
  * @type: Endpoint decoder target type
  * @cxl_nvb: nvdimm bridge for coordinating @cxlr_pmem setup / shutdown
@@ -462,6 +463,7 @@ struct cxl_region_params {
 struct cxl_region {
 	struct device dev;
 	int id;
+	bool fixed;
 	enum cxl_decoder_mode mode;
 	enum cxl_decoder_type type;
 	struct cxl_nvdimm_bridge *cxl_nvb;
@@ -643,8 +645,10 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
 
 struct cxl_decoder *to_cxl_decoder(struct device *dev);
 struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev);
+struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev);
 struct cxl_endpoint_decoder *to_cxl_endpoint_decoder(struct device *dev);
 bool is_root_decoder(struct device *dev);
+bool is_switch_decoder(struct device *dev);
 bool is_endpoint_decoder(struct device *dev);
 struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
 						unsigned int nr_targets,
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index ab138004f644..82a430cfc3ff 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -79,6 +79,9 @@ static inline bool is_cxl_endpoint(struct cxl_port *port)
 }
 
 struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds);
+int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
+			 resource_size_t base, resource_size_t len,
+			 resource_size_t skipped);
 
 static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port,
 					 struct cxl_memdev *cxlmd)
diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index 920bd969c554..ea761afa446f 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -703,6 +703,141 @@ static int mock_decoder_reset(struct cxl_decoder *cxld)
 	return 0;
 }
 
+static void default_mock_decoder(struct cxl_decoder *cxld)
+{
+	cxld->hpa_range = (struct range){
+		.start = 0,
+		.end = -1,
+	};
+
+	cxld->interleave_ways = 1;
+	cxld->interleave_granularity = 256;
+	cxld->target_type = CXL_DECODER_EXPANDER;
+	cxld->commit = mock_decoder_commit;
+	cxld->reset = mock_decoder_reset;
+}
+
+static int first_decoder(struct device *dev, void *data)
+{
+	struct cxl_decoder *cxld;
+
+	if (!is_switch_decoder(dev))
+		return 0;
+	cxld = to_cxl_decoder(dev);
+	if (cxld->id == 0)
+		return 1;
+	return 0;
+}
+
+static void mock_init_hdm_decoder(struct cxl_decoder *cxld)
+{
+	struct acpi_cedt_cfmws *window = mock_cfmws[0];
+	struct platform_device *pdev = NULL;
+	struct cxl_endpoint_decoder *cxled;
+	struct cxl_switch_decoder *cxlsd;
+	struct cxl_port *port, *iter;
+	const int size = SZ_512M;
+	struct cxl_memdev *cxlmd;
+	struct cxl_dport *dport;
+	struct device *dev;
+	bool hb0 = false;
+	u64 base;
+	int i;
+
+	if (is_endpoint_decoder(&cxld->dev)) {
+		cxled = to_cxl_endpoint_decoder(&cxld->dev);
+		cxlmd = cxled_to_memdev(cxled);
+		WARN_ON(!dev_is_platform(cxlmd->dev.parent));
+		pdev = to_platform_device(cxlmd->dev.parent);
+
+		/* check is endpoint is attach to host-bridge0 */
+		port = cxled_to_port(cxled);
+		do {
+			if (port->uport == &cxl_host_bridge[0]->dev) {
+				hb0 = true;
+				break;
+			}
+			if (is_cxl_port(port->dev.parent))
+				port = to_cxl_port(port->dev.parent);
+			else
+				port = NULL;
+		} while (port);
+		port = cxled_to_port(cxled);
+	}
+
+	/*
+	 * The first decoder on the first 2 devices on the first switch
+	 * attached to host-bridge0 mock a fake / static RAM region. All
+	 * other decoders are default disabled. Given the round robin
+	 * assignment those devices are named cxl_mem.0, and cxl_mem.4.
+	 *
+	 * See 'cxl list -BMPu -m cxl_mem.0,cxl_mem.4'
+	 */
+	if (!hb0 || pdev->id % 4 || pdev->id > 4 || cxld->id > 0) {
+		default_mock_decoder(cxld);
+		return;
+	}
+
+	base = window->base_hpa;
+	cxld->hpa_range = (struct range) {
+		.start = base,
+		.end = base + size - 1,
+	};
+
+	cxld->interleave_ways = 2;
+	eig_to_granularity(window->granularity, &cxld->interleave_granularity);
+	cxld->target_type = CXL_DECODER_EXPANDER;
+	cxld->flags = CXL_DECODER_F_ENABLE | CXL_DECODER_F_AUTO;
+	port->commit_end = cxld->id;
+	devm_cxl_dpa_reserve(cxled, 0, size / cxld->interleave_ways, 0);
+	cxld->commit = mock_decoder_commit;
+	cxld->reset = mock_decoder_reset;
+
+	/*
+	 * Now that endpoint decoder is set up, walk up the hierarchy
+	 * and setup the switch and root port decoders targeting @cxlmd.
+	 */
+	iter = port;
+	for (i = 0; i < 2; i++) {
+		dport = iter->parent_dport;
+		iter = dport->port;
+		dev = device_find_child(&iter->dev, NULL, first_decoder);
+		/*
+		 * Ancestor ports are guaranteed to be enumerated before
+		 * @port, and all ports have at least one decoder.
+		 */
+		if (WARN_ON(!dev))
+			continue;
+		cxlsd = to_cxl_switch_decoder(dev);
+		if (i == 0) {
+			/* put cxl_mem.4 second in the decode order */
+			if (pdev->id == 4)
+				cxlsd->target[1] = dport;
+			else
+				cxlsd->target[0] = dport;
+		} else
+			cxlsd->target[0] = dport;
+		cxld = &cxlsd->cxld;
+		cxld->target_type = CXL_DECODER_EXPANDER;
+		cxld->flags = CXL_DECODER_F_ENABLE;
+		iter->commit_end = 0;
+		/*
+		 * Switch targets 2 endpoints, while host bridge targets
+		 * one root port
+		 */
+		if (i == 0)
+			cxld->interleave_ways = 2;
+		else
+			cxld->interleave_ways = 1;
+		cxld->interleave_granularity = 256;
+		cxld->hpa_range = (struct range) {
+			.start = base,
+			.end = base + size - 1,
+		};
+		put_device(dev);
+	}
+}
+
 static int mock_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
 {
 	struct cxl_port *port = cxlhdm->port;
@@ -748,16 +883,7 @@ static int mock_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
 			cxld = &cxled->cxld;
 		}
 
-		cxld->hpa_range = (struct range) {
-			.start = 0,
-			.end = -1,
-		};
-
-		cxld->interleave_ways = min_not_zero(target_count, 1);
-		cxld->interleave_granularity = SZ_4K;
-		cxld->target_type = CXL_DECODER_EXPANDER;
-		cxld->commit = mock_decoder_commit;
-		cxld->reset = mock_decoder_reset;
+		mock_init_hdm_decoder(cxld);
 
 		if (target_count) {
 			rc = device_for_each_child(port->uport, &ctx,


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH 13/18] dax/hmem: Move HMAT and Soft reservation probe initcall level
  2023-02-06  1:02 [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default Dan Williams
                   ` (11 preceding siblings ...)
  2023-02-06  1:03 ` [PATCH 12/18] tools/testing/cxl: Define a fixed volatile configuration to parse Dan Williams
@ 2023-02-06  1:03 ` Dan Williams
  2023-02-06  1:03 ` [PATCH 14/18] dax/hmem: Drop unnecessary dax_hmem_remove() Dan Williams
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-06  1:03 UTC (permalink / raw)
  To: linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi

In preparation for moving more filtering of "hmem" ranges into the
dax_hmem.ko module, update the initcall levels. HMAT range registration
moves to subsys_initcall() to be done before Soft Reservation probing,
and Soft Reservation probing is moved to device_initcall() to be done
before dax_hmem.ko initialization if it is built-in.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/acpi/numa/hmat.c  |    2 +-
 drivers/dax/hmem/Makefile |    3 ++-
 drivers/dax/hmem/device.c |    2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
index 605a0c7053be..ff24282301ab 100644
--- a/drivers/acpi/numa/hmat.c
+++ b/drivers/acpi/numa/hmat.c
@@ -869,4 +869,4 @@ static __init int hmat_init(void)
 	acpi_put_table(tbl);
 	return 0;
 }
-device_initcall(hmat_init);
+subsys_initcall(hmat_init);
diff --git a/drivers/dax/hmem/Makefile b/drivers/dax/hmem/Makefile
index 57377b4c3d47..d4c4cd6bccd7 100644
--- a/drivers/dax/hmem/Makefile
+++ b/drivers/dax/hmem/Makefile
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
-obj-$(CONFIG_DEV_DAX_HMEM) += dax_hmem.o
+# device_hmem.o deliberately precedes dax_hmem.o for initcall ordering
 obj-$(CONFIG_DEV_DAX_HMEM_DEVICES) += device_hmem.o
+obj-$(CONFIG_DEV_DAX_HMEM) += dax_hmem.o
 
 device_hmem-y := device.o
 dax_hmem-y := hmem.o
diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
index 903325aac991..20749c7fab81 100644
--- a/drivers/dax/hmem/device.c
+++ b/drivers/dax/hmem/device.c
@@ -104,4 +104,4 @@ static __init int hmem_init(void)
  * As this is a fallback for address ranges unclaimed by the ACPI HMAT
  * parsing it must be at an initcall level greater than hmat_init().
  */
-late_initcall(hmem_init);
+device_initcall(hmem_init);


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH 14/18] dax/hmem: Drop unnecessary dax_hmem_remove()
  2023-02-06  1:02 [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default Dan Williams
                   ` (12 preceding siblings ...)
  2023-02-06  1:03 ` [PATCH 13/18] dax/hmem: Move HMAT and Soft reservation probe initcall level Dan Williams
@ 2023-02-06  1:03 ` Dan Williams
  2023-02-06 17:15   ` Gregory Price
  2023-02-08 17:33   ` Jonathan Cameron
  2023-02-06  1:03 ` [PATCH 15/18] dax/hmem: Convey the dax range via memregion_info() Dan Williams
                   ` (7 subsequent siblings)
  21 siblings, 2 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-06  1:03 UTC (permalink / raw)
  To: linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi

Empty driver remove callbacks can just be elided.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/hmem/hmem.c |    7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index 1bf040dbc834..c7351e0dc8ff 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -44,15 +44,8 @@ static int dax_hmem_probe(struct platform_device *pdev)
 	return 0;
 }
 
-static int dax_hmem_remove(struct platform_device *pdev)
-{
-	/* devm handles teardown */
-	return 0;
-}
-
 static struct platform_driver dax_hmem_driver = {
 	.probe = dax_hmem_probe,
-	.remove = dax_hmem_remove,
 	.driver = {
 		.name = "hmem",
 	},


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH 15/18] dax/hmem: Convey the dax range via memregion_info()
  2023-02-06  1:02 [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default Dan Williams
                   ` (13 preceding siblings ...)
  2023-02-06  1:03 ` [PATCH 14/18] dax/hmem: Drop unnecessary dax_hmem_remove() Dan Williams
@ 2023-02-06  1:03 ` Dan Williams
  2023-02-08 17:35   ` Jonathan Cameron
  2023-02-06  1:03 ` [PATCH 16/18] dax/hmem: Move hmem device registration to dax_hmem.ko Dan Williams
                   ` (6 subsequent siblings)
  21 siblings, 1 reply; 112+ messages in thread
From: Dan Williams @ 2023-02-06  1:03 UTC (permalink / raw)
  To: linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi

In preparation for hmem platform devices to be unregistered, stop using
platform_device_add_resources() to convey the address range. The
platform_device_add_resources() API causes an existing "Soft Reserved"
iomem resource to be re-parented under an inserted platform device
resource. When that platform device is deleted it removes the platform
device resource and all children.

Instead, it is sufficient to convey just the address range and let
request_mem_region() insert resources to indicate the devices active in
the range. This allows the "Soft Reserved" resource to be re-enumerated
upon the next probe event.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/hmem/device.c |   37 ++++++++++++++-----------------------
 drivers/dax/hmem/hmem.c   |   14 +++-----------
 include/linux/memregion.h |    2 ++
 3 files changed, 19 insertions(+), 34 deletions(-)

diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
index 20749c7fab81..b1b339bccfe5 100644
--- a/drivers/dax/hmem/device.c
+++ b/drivers/dax/hmem/device.c
@@ -15,15 +15,8 @@ static struct resource hmem_active = {
 	.flags = IORESOURCE_MEM,
 };
 
-void hmem_register_device(int target_nid, struct resource *r)
+void hmem_register_device(int target_nid, struct resource *res)
 {
-	/* define a clean / non-busy resource for the platform device */
-	struct resource res = {
-		.start = r->start,
-		.end = r->end,
-		.flags = IORESOURCE_MEM,
-		.desc = IORES_DESC_SOFT_RESERVED,
-	};
 	struct platform_device *pdev;
 	struct memregion_info info;
 	int rc, id;
@@ -31,55 +24,53 @@ void hmem_register_device(int target_nid, struct resource *r)
 	if (nohmem)
 		return;
 
-	rc = region_intersects(res.start, resource_size(&res), IORESOURCE_MEM,
-			IORES_DESC_SOFT_RESERVED);
+	rc = region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
+			       IORES_DESC_SOFT_RESERVED);
 	if (rc != REGION_INTERSECTS)
 		return;
 
 	id = memregion_alloc(GFP_KERNEL);
 	if (id < 0) {
-		pr_err("memregion allocation failure for %pr\n", &res);
+		pr_err("memregion allocation failure for %pr\n", res);
 		return;
 	}
 
 	pdev = platform_device_alloc("hmem", id);
 	if (!pdev) {
-		pr_err("hmem device allocation failure for %pr\n", &res);
+		pr_err("hmem device allocation failure for %pr\n", res);
 		goto out_pdev;
 	}
 
-	if (!__request_region(&hmem_active, res.start, resource_size(&res),
+	if (!__request_region(&hmem_active, res->start, resource_size(res),
 			      dev_name(&pdev->dev), 0)) {
-		dev_dbg(&pdev->dev, "hmem range %pr already active\n", &res);
+		dev_dbg(&pdev->dev, "hmem range %pr already active\n", res);
 		goto out_active;
 	}
 
 	pdev->dev.numa_node = numa_map_to_online_node(target_nid);
 	info = (struct memregion_info) {
 		.target_node = target_nid,
+		.range = {
+			.start = res->start,
+			.end = res->end,
+		},
 	};
 	rc = platform_device_add_data(pdev, &info, sizeof(info));
 	if (rc < 0) {
-		pr_err("hmem memregion_info allocation failure for %pr\n", &res);
-		goto out_resource;
-	}
-
-	rc = platform_device_add_resources(pdev, &res, 1);
-	if (rc < 0) {
-		pr_err("hmem resource allocation failure for %pr\n", &res);
+		pr_err("hmem memregion_info allocation failure for %pr\n", res);
 		goto out_resource;
 	}
 
 	rc = platform_device_add(pdev);
 	if (rc < 0) {
-		dev_err(&pdev->dev, "device add failed for %pr\n", &res);
+		dev_err(&pdev->dev, "device add failed for %pr\n", res);
 		goto out_resource;
 	}
 
 	return;
 
 out_resource:
-	__release_region(&hmem_active, res.start, resource_size(&res));
+	__release_region(&hmem_active, res->start, resource_size(res));
 out_active:
 	platform_device_put(pdev);
 out_pdev:
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index c7351e0dc8ff..5025a8c9850b 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -15,25 +15,17 @@ static int dax_hmem_probe(struct platform_device *pdev)
 	struct memregion_info *mri;
 	struct dev_dax_data data;
 	struct dev_dax *dev_dax;
-	struct resource *res;
-	struct range range;
-
-	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-	if (!res)
-		return -ENOMEM;
 
 	mri = dev->platform_data;
-	range.start = res->start;
-	range.end = res->end;
-	dax_region = alloc_dax_region(dev, pdev->id, &range, mri->target_node,
-			PMD_SIZE, 0);
+	dax_region = alloc_dax_region(dev, pdev->id, &mri->range,
+				      mri->target_node, PMD_SIZE, 0);
 	if (!dax_region)
 		return -ENOMEM;
 
 	data = (struct dev_dax_data) {
 		.dax_region = dax_region,
 		.id = -1,
-		.size = region_idle ? 0 : resource_size(res),
+		.size = region_idle ? 0 : range_len(&mri->range),
 	};
 	dev_dax = devm_create_dev_dax(&data);
 	if (IS_ERR(dev_dax))
diff --git a/include/linux/memregion.h b/include/linux/memregion.h
index bf83363807ac..c01321467789 100644
--- a/include/linux/memregion.h
+++ b/include/linux/memregion.h
@@ -3,10 +3,12 @@
 #define _MEMREGION_H_
 #include <linux/types.h>
 #include <linux/errno.h>
+#include <linux/range.h>
 #include <linux/bug.h>
 
 struct memregion_info {
 	int target_node;
+	struct range range;
 };
 
 #ifdef CONFIG_MEMREGION


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH 16/18] dax/hmem: Move hmem device registration to dax_hmem.ko
  2023-02-06  1:02 [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default Dan Williams
                   ` (14 preceding siblings ...)
  2023-02-06  1:03 ` [PATCH 15/18] dax/hmem: Convey the dax range via memregion_info() Dan Williams
@ 2023-02-06  1:03 ` Dan Williams
  2023-02-06  1:04 ` [PATCH 17/18] dax: Assign RAM regions to memory-hotplug by default Dan Williams
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-06  1:03 UTC (permalink / raw)
  To: linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi

In preparation for the CXL region driver to take over the responsibility
of registering device-dax instances for CXL regions, move the
registration of "hmem" devices to dax_hmem.ko.

Previously the builtin component of this enabling
(drivers/dax/hmem/device.o) would register platform devices for each
address range and trigger the dax_hmem.ko module to load and attach
device-dax instances to those devices. Now, the ranges are collected
from the HMAT and EFI memory map walking, but the device creation is
deferred. A new "hmem_platform" device is created which triggers
dax_hmem.ko to load and register the platform devices.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/acpi/numa/hmat.c  |    2 -
 drivers/dax/Kconfig       |    2 -
 drivers/dax/hmem/device.c |   91 +++++++++++++++++++--------------------
 drivers/dax/hmem/hmem.c   |  105 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dax.h       |    7 ++-
 5 files changed, 155 insertions(+), 52 deletions(-)

diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
index ff24282301ab..bba268ecd802 100644
--- a/drivers/acpi/numa/hmat.c
+++ b/drivers/acpi/numa/hmat.c
@@ -718,7 +718,7 @@ static void hmat_register_target_devices(struct memory_target *target)
 	for (res = target->memregions.child; res; res = res->sibling) {
 		int target_nid = pxm_to_node(target->memory_pxm);
 
-		hmem_register_device(target_nid, res);
+		hmem_register_resource(target_nid, res);
 	}
 }
 
diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index 5fdf269a822e..d13c889c2a64 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -46,7 +46,7 @@ config DEV_DAX_HMEM
 	  Say M if unsure.
 
 config DEV_DAX_HMEM_DEVICES
-	depends on DEV_DAX_HMEM && DAX=y
+	depends on DEV_DAX_HMEM && DAX
 	def_bool y
 
 config DEV_DAX_KMEM
diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
index b1b339bccfe5..f9e1a76a04a9 100644
--- a/drivers/dax/hmem/device.c
+++ b/drivers/dax/hmem/device.c
@@ -8,6 +8,8 @@
 static bool nohmem;
 module_param_named(disable, nohmem, bool, 0444);
 
+static bool platform_initialized;
+static DEFINE_MUTEX(hmem_resource_lock);
 static struct resource hmem_active = {
 	.name = "HMEM devices",
 	.start = 0,
@@ -15,71 +17,66 @@ static struct resource hmem_active = {
 	.flags = IORESOURCE_MEM,
 };
 
-void hmem_register_device(int target_nid, struct resource *res)
+int walk_hmem_resources(struct device *host, walk_hmem_fn fn)
+{
+	struct resource *res;
+	int rc = 0;
+
+	mutex_lock(&hmem_resource_lock);
+	for (res = hmem_active.child; res; res = res->sibling) {
+		rc = fn(host, (int) res->desc, res);
+		if (rc)
+			break;
+	}
+	mutex_unlock(&hmem_resource_lock);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(walk_hmem_resources);
+
+static void __hmem_register_resource(int target_nid, struct resource *res)
 {
 	struct platform_device *pdev;
-	struct memregion_info info;
-	int rc, id;
+	struct resource *new;
+	int rc;
 
-	if (nohmem)
+	new = __request_region(&hmem_active, res->start, resource_size(res), "",
+			       0);
+	if (!new) {
+		pr_debug("hmem range %pr already active\n", res);
 		return;
+	}
 
-	rc = region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
-			       IORES_DESC_SOFT_RESERVED);
-	if (rc != REGION_INTERSECTS)
-		return;
+	new->desc = target_nid;
 
-	id = memregion_alloc(GFP_KERNEL);
-	if (id < 0) {
-		pr_err("memregion allocation failure for %pr\n", res);
+	if (platform_initialized)
 		return;
-	}
 
-	pdev = platform_device_alloc("hmem", id);
+	pdev = platform_device_alloc("hmem_platform", 0);
 	if (!pdev) {
-		pr_err("hmem device allocation failure for %pr\n", res);
-		goto out_pdev;
-	}
-
-	if (!__request_region(&hmem_active, res->start, resource_size(res),
-			      dev_name(&pdev->dev), 0)) {
-		dev_dbg(&pdev->dev, "hmem range %pr already active\n", res);
-		goto out_active;
-	}
-
-	pdev->dev.numa_node = numa_map_to_online_node(target_nid);
-	info = (struct memregion_info) {
-		.target_node = target_nid,
-		.range = {
-			.start = res->start,
-			.end = res->end,
-		},
-	};
-	rc = platform_device_add_data(pdev, &info, sizeof(info));
-	if (rc < 0) {
-		pr_err("hmem memregion_info allocation failure for %pr\n", res);
-		goto out_resource;
+		pr_err_once("failed to register device-dax hmem_platform device\n");
+		return;
 	}
 
 	rc = platform_device_add(pdev);
-	if (rc < 0) {
-		dev_err(&pdev->dev, "device add failed for %pr\n", res);
-		goto out_resource;
-	}
+	if (rc)
+		platform_device_put(pdev);
+	else
+		platform_initialized = true;
+}
 
-	return;
+void hmem_register_resource(int target_nid, struct resource *res)
+{
+	if (nohmem)
+		return;
 
-out_resource:
-	__release_region(&hmem_active, res->start, resource_size(res));
-out_active:
-	platform_device_put(pdev);
-out_pdev:
-	memregion_free(id);
+	mutex_lock(&hmem_resource_lock);
+	__hmem_register_resource(target_nid, res);
+	mutex_unlock(&hmem_resource_lock);
 }
 
 static __init int hmem_register_one(struct resource *res, void *data)
 {
-	hmem_register_device(phys_to_target_node(res->start), res);
+	hmem_register_resource(phys_to_target_node(res->start), res);
 
 	return 0;
 }
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index 5025a8c9850b..e7bdff3132fa 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -3,6 +3,7 @@
 #include <linux/memregion.h>
 #include <linux/module.h>
 #include <linux/pfn_t.h>
+#include <linux/dax.h>
 #include "../bus.h"
 
 static bool region_idle;
@@ -43,8 +44,110 @@ static struct platform_driver dax_hmem_driver = {
 	},
 };
 
-module_platform_driver(dax_hmem_driver);
+static void release_memregion(void *data)
+{
+	memregion_free((long) data);
+}
+
+static void release_hmem(void *pdev)
+{
+	platform_device_unregister(pdev);
+}
+
+static int hmem_register_device(struct device *host, int target_nid,
+				const struct resource *res)
+{
+	struct platform_device *pdev;
+	struct memregion_info info;
+	long id;
+	int rc;
+
+	rc = region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
+			       IORES_DESC_SOFT_RESERVED);
+	if (rc != REGION_INTERSECTS)
+		return 0;
+
+	id = memregion_alloc(GFP_KERNEL);
+	if (id < 0) {
+		dev_err(host, "memregion allocation failure for %pr\n", res);
+		return -ENOMEM;
+	}
+	rc = devm_add_action_or_reset(host, release_memregion, (void *) id);
+	if (rc)
+		return rc;
+
+	pdev = platform_device_alloc("hmem", id);
+	if (!pdev) {
+		dev_err(host, "device allocation failure for %pr\n", res);
+		return -ENOMEM;
+	}
+
+	pdev->dev.numa_node = numa_map_to_online_node(target_nid);
+	info = (struct memregion_info) {
+		.target_node = target_nid,
+		.range = {
+			.start = res->start,
+			.end = res->end,
+		},
+	};
+	rc = platform_device_add_data(pdev, &info, sizeof(info));
+	if (rc < 0) {
+		dev_err(host, "memregion_info allocation failure for %pr\n",
+		       res);
+		goto out_put;
+	}
+
+	rc = platform_device_add(pdev);
+	if (rc < 0) {
+		dev_err(host, "%s add failed for %pr\n", dev_name(&pdev->dev),
+			res);
+		goto out_put;
+	}
+
+	return devm_add_action_or_reset(host, release_hmem, pdev);
+
+out_put:
+	platform_device_put(pdev);
+	return rc;
+}
+
+static int dax_hmem_platform_probe(struct platform_device *pdev)
+{
+	return walk_hmem_resources(&pdev->dev, hmem_register_device);
+}
+
+static struct platform_driver dax_hmem_platform_driver = {
+	.probe = dax_hmem_platform_probe,
+	.driver = {
+		.name = "hmem_platform",
+	},
+};
+
+static __init int dax_hmem_init(void)
+{
+	int rc;
+
+	rc = platform_driver_register(&dax_hmem_platform_driver);
+	if (rc)
+		return rc;
+
+	rc = platform_driver_register(&dax_hmem_driver);
+	if (rc)
+		platform_driver_unregister(&dax_hmem_platform_driver);
+
+	return rc;
+}
+
+static __exit void dax_hmem_exit(void)
+{
+	platform_driver_unregister(&dax_hmem_driver);
+	platform_driver_unregister(&dax_hmem_platform_driver);
+}
+
+module_init(dax_hmem_init);
+module_exit(dax_hmem_exit);
 
 MODULE_ALIAS("platform:hmem*");
+MODULE_ALIAS("platform:hmem_platform*");
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR("Intel Corporation");
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 2b5ecb591059..bf6258472e49 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -262,11 +262,14 @@ static inline bool dax_mapping(struct address_space *mapping)
 }
 
 #ifdef CONFIG_DEV_DAX_HMEM_DEVICES
-void hmem_register_device(int target_nid, struct resource *r);
+void hmem_register_resource(int target_nid, struct resource *r);
 #else
-static inline void hmem_register_device(int target_nid, struct resource *r)
+static inline void hmem_register_resource(int target_nid, struct resource *r)
 {
 }
 #endif
 
+typedef int (*walk_hmem_fn)(struct device *dev, int target_nid,
+			    const struct resource *res);
+int walk_hmem_resources(struct device *dev, walk_hmem_fn fn);
 #endif


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH 17/18] dax: Assign RAM regions to memory-hotplug by default
  2023-02-06  1:02 [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default Dan Williams
                   ` (15 preceding siblings ...)
  2023-02-06  1:03 ` [PATCH 16/18] dax/hmem: Move hmem device registration to dax_hmem.ko Dan Williams
@ 2023-02-06  1:04 ` Dan Williams
  2023-02-06 17:26   ` Gregory Price
  2023-02-06  1:04 ` [PATCH 18/18] cxl/dax: Create dax devices for CXL RAM regions Dan Williams
                   ` (4 subsequent siblings)
  21 siblings, 1 reply; 112+ messages in thread
From: Dan Williams @ 2023-02-06  1:04 UTC (permalink / raw)
  To: linux-cxl
  Cc: Michal Hocko, David Hildenbrand, Dave Hansen, linux-mm, linux-acpi

The default mode for device-dax instances is backwards for RAM-regions
as evidenced by the fact that it tends to catch end users by surprise.
"Where is my memory?". Recall that platforms are increasingly shipping
with performance-differentiated memory pools beyond typical DRAM and
NUMA effects. This includes HBM (high-bandwidth-memory) and CXL (dynamic
interleave, varied media types, and future fabric attached
possibilities).

For this reason the EFI_MEMORY_SP (EFI Special Purpose Memory => Linux
'Soft Reserved') attribute is expected to be applied to all memory-pools
that are not the general purpose pool. This designation gives an
Operating System a chance to defer usage of a memory pool until later in
the boot process where its performance properties can be interrogated
and administrator policy can be applied.

'Soft Reserved' memory can be anything from too limited and precious to
be part of the general purpose pool (HBM), too slow to host hot kernel
data structures (some PMEM media), or anything in between. However, in
the absence of an explicit policy, the memory should at least be made
usable by default. The current device-dax default hides all
non-general-purpose memory behind a device interface.

The expectation is that the distribution of users that want the memory
online by default vs device-dedicated-access by default follows the
Pareto principle. A small number of enlightened users may want to do
userspace memory management through a device, but general users just
want the kernel to make the memory available with an option to get more
advanced later.

Arrange for all device-dax instances not backed by PMEM to default to
attaching to the dax_kmem driver. From there the baseline memory hotplug
policy (CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE / memhp_default_state=)
gates whether the memory comes online or stays offline. Where, if it
stays offline, it can be reliably converted back to device-mode where it
can be partitioned, or fronted by a userspace allocator.

So, if someone wants device-dax instances for their 'Soft Reserved'
memory:

1/ Build a kernel with CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n or boot
   with memhp_default_state=offline, or roll the dice and hope that the
   kernel has not pinned a page in that memory before step 2.

2/ Write a udev rule to convert the target dax device(s) from
   'system-ram' mode to 'devdax' mode:

   daxctl reconfigure-device $dax -m devdax -f

Cc: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/Kconfig     |    2 +-
 drivers/dax/bus.c       |   53 ++++++++++++++++++++---------------------------
 drivers/dax/bus.h       |   12 +++++++++--
 drivers/dax/device.c    |    3 +--
 drivers/dax/hmem/hmem.c |   12 ++++++++++-
 drivers/dax/kmem.c      |    1 +
 6 files changed, 46 insertions(+), 37 deletions(-)

diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index d13c889c2a64..1163eb62e5f6 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -50,7 +50,7 @@ config DEV_DAX_HMEM_DEVICES
 	def_bool y
 
 config DEV_DAX_KMEM
-	tristate "KMEM DAX: volatile-use of persistent memory"
+	tristate "KMEM DAX: map dax-devices as System-RAM"
 	default DEV_DAX
 	depends on DEV_DAX
 	depends on MEMORY_HOTPLUG # for add_memory() and friends
diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 1dad813ee4a6..012d576004e9 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -56,6 +56,25 @@ static int dax_match_id(struct dax_device_driver *dax_drv, struct device *dev)
 	return match;
 }
 
+static int dax_match_type(struct dax_device_driver *dax_drv, struct device *dev)
+{
+	enum dax_driver_type type = DAXDRV_DEVICE_TYPE;
+	struct dev_dax *dev_dax = to_dev_dax(dev);
+
+	if (dev_dax->region->res.flags & IORESOURCE_DAX_KMEM)
+		type = DAXDRV_KMEM_TYPE;
+
+	if (dax_drv->type == type)
+		return 1;
+
+	/* default to device mode if dax_kmem is disabled */
+	if (dax_drv->type == DAXDRV_DEVICE_TYPE &&
+	    !IS_ENABLED(CONFIG_DEV_DAX_KMEM))
+		return 1;
+
+	return 0;
+}
+
 enum id_action {
 	ID_REMOVE,
 	ID_ADD,
@@ -216,14 +235,9 @@ static int dax_bus_match(struct device *dev, struct device_driver *drv)
 {
 	struct dax_device_driver *dax_drv = to_dax_drv(drv);
 
-	/*
-	 * All but the 'device-dax' driver, which has 'match_always'
-	 * set, requires an exact id match.
-	 */
-	if (dax_drv->match_always)
+	if (dax_match_id(dax_drv, dev))
 		return 1;
-
-	return dax_match_id(dax_drv, dev);
+	return dax_match_type(dax_drv, dev);
 }
 
 /*
@@ -1413,13 +1427,10 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
 }
 EXPORT_SYMBOL_GPL(devm_create_dev_dax);
 
-static int match_always_count;
-
 int __dax_driver_register(struct dax_device_driver *dax_drv,
 		struct module *module, const char *mod_name)
 {
 	struct device_driver *drv = &dax_drv->drv;
-	int rc = 0;
 
 	/*
 	 * dax_bus_probe() calls dax_drv->probe() unconditionally.
@@ -1434,26 +1445,7 @@ int __dax_driver_register(struct dax_device_driver *dax_drv,
 	drv->mod_name = mod_name;
 	drv->bus = &dax_bus_type;
 
-	/* there can only be one default driver */
-	mutex_lock(&dax_bus_lock);
-	match_always_count += dax_drv->match_always;
-	if (match_always_count > 1) {
-		match_always_count--;
-		WARN_ON(1);
-		rc = -EINVAL;
-	}
-	mutex_unlock(&dax_bus_lock);
-	if (rc)
-		return rc;
-
-	rc = driver_register(drv);
-	if (rc && dax_drv->match_always) {
-		mutex_lock(&dax_bus_lock);
-		match_always_count -= dax_drv->match_always;
-		mutex_unlock(&dax_bus_lock);
-	}
-
-	return rc;
+	return driver_register(drv);
 }
 EXPORT_SYMBOL_GPL(__dax_driver_register);
 
@@ -1463,7 +1455,6 @@ void dax_driver_unregister(struct dax_device_driver *dax_drv)
 	struct dax_id *dax_id, *_id;
 
 	mutex_lock(&dax_bus_lock);
-	match_always_count -= dax_drv->match_always;
 	list_for_each_entry_safe(dax_id, _id, &dax_drv->ids, list) {
 		list_del(&dax_id->list);
 		kfree(dax_id);
diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
index fbb940293d6d..8cd79ab34292 100644
--- a/drivers/dax/bus.h
+++ b/drivers/dax/bus.h
@@ -11,7 +11,10 @@ struct dax_device;
 struct dax_region;
 void dax_region_put(struct dax_region *dax_region);
 
-#define IORESOURCE_DAX_STATIC (1UL << 0)
+/* dax bus specific ioresource flags */
+#define IORESOURCE_DAX_STATIC BIT(0)
+#define IORESOURCE_DAX_KMEM BIT(1)
+
 struct dax_region *alloc_dax_region(struct device *parent, int region_id,
 		struct range *range, int target_node, unsigned int align,
 		unsigned long flags);
@@ -25,10 +28,15 @@ struct dev_dax_data {
 
 struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data);
 
+enum dax_driver_type {
+	DAXDRV_KMEM_TYPE,
+	DAXDRV_DEVICE_TYPE,
+};
+
 struct dax_device_driver {
 	struct device_driver drv;
 	struct list_head ids;
-	int match_always;
+	enum dax_driver_type type;
 	int (*probe)(struct dev_dax *dev);
 	void (*remove)(struct dev_dax *dev);
 };
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 5494d745ced5..ecdff79e31f2 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -475,8 +475,7 @@ EXPORT_SYMBOL_GPL(dev_dax_probe);
 
 static struct dax_device_driver device_dax_driver = {
 	.probe = dev_dax_probe,
-	/* all probe actions are unwound by devm, so .remove isn't necessary */
-	.match_always = 1,
+	.type = DAXDRV_DEVICE_TYPE,
 };
 
 static int __init dax_init(void)
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index e7bdff3132fa..5ec08f9f8a57 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -11,15 +11,25 @@ module_param_named(region_idle, region_idle, bool, 0644);
 
 static int dax_hmem_probe(struct platform_device *pdev)
 {
+	unsigned long flags = IORESOURCE_DAX_KMEM;
 	struct device *dev = &pdev->dev;
 	struct dax_region *dax_region;
 	struct memregion_info *mri;
 	struct dev_dax_data data;
 	struct dev_dax *dev_dax;
 
+	/*
+	 * @region_idle == true indicates that an administrative agent
+	 * wants to manipulate the range partitioning before the devices
+	 * are created, so do not send them to the dax_kmem driver by
+	 * default.
+	 */
+	if (region_idle)
+		flags = 0;
+
 	mri = dev->platform_data;
 	dax_region = alloc_dax_region(dev, pdev->id, &mri->range,
-				      mri->target_node, PMD_SIZE, 0);
+				      mri->target_node, PMD_SIZE, flags);
 	if (!dax_region)
 		return -ENOMEM;
 
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 4852a2dbdb27..918d01d3fbaa 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -239,6 +239,7 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
 static struct dax_device_driver device_dax_kmem_driver = {
 	.probe = dev_dax_kmem_probe,
 	.remove = dev_dax_kmem_remove,
+	.type = DAXDRV_KMEM_TYPE,
 };
 
 static int __init dax_kmem_init(void)


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH 18/18] cxl/dax: Create dax devices for CXL RAM regions
  2023-02-06  1:02 [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default Dan Williams
                   ` (16 preceding siblings ...)
  2023-02-06  1:04 ` [PATCH 17/18] dax: Assign RAM regions to memory-hotplug by default Dan Williams
@ 2023-02-06  1:04 ` Dan Williams
  2023-02-06  5:36 ` [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default Gregory Price
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-06  1:04 UTC (permalink / raw)
  To: linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi

While platform firmware takes some responsibility for mapping the RAM
capacity of CXL devices present at boot, the OS is responsible for
mapping the remainder and hot-added devices. Platform firmware is also
responsible for identifying the platform general purpose memory pool,
typically DDR attached DRAM, and arranging for the remainder to be 'Soft
Reserved'. That reservation allows the CXL subsystem to route the memory
to core-mm via memory-hotplug (dax_kmem), or leave it for dedicated
access (device-dax).

The new 'struct cxl_dax_region' object allows for a CXL memory resource
(region) to be published, but also allow for udev and module policy to
act on that event. It also prevents cxl_core.ko from having a module
loading dependency on any drivers/dax/ modules.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 MAINTAINERS               |    1 
 drivers/cxl/acpi.c        |    3 +
 drivers/cxl/core/core.h   |    3 +
 drivers/cxl/core/port.c   |    2 +
 drivers/cxl/core/region.c |  108 ++++++++++++++++++++++++++++++++++++++++++++-
 drivers/cxl/cxl.h         |   12 +++++
 drivers/dax/Kconfig       |   13 +++++
 drivers/dax/Makefile      |    2 +
 drivers/dax/cxl.c         |   53 ++++++++++++++++++++++
 drivers/dax/hmem/hmem.c   |   14 ++++++
 10 files changed, 208 insertions(+), 3 deletions(-)
 create mode 100644 drivers/dax/cxl.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 7f86d02cb427..73a9f3401e0e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6035,6 +6035,7 @@ M:	Dan Williams <dan.j.williams@intel.com>
 M:	Vishal Verma <vishal.l.verma@intel.com>
 M:	Dave Jiang <dave.jiang@intel.com>
 L:	nvdimm@lists.linux.dev
+L:	linux-cxl@vger.kernel.org
 S:	Supported
 F:	drivers/dax/
 
diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index ad0849af42d7..8ebb9a74790d 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -731,7 +731,8 @@ static void __exit cxl_acpi_exit(void)
 	cxl_bus_drain();
 }
 
-module_init(cxl_acpi_init);
+/* load before dax_hmem sees 'Soft Reserved' CXL ranges */
+subsys_initcall(cxl_acpi_init);
 module_exit(cxl_acpi_exit);
 MODULE_LICENSE("GPL v2");
 MODULE_IMPORT_NS(CXL);
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 479f01da6d35..cde475e13216 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -15,12 +15,14 @@ extern struct device_attribute dev_attr_create_ram_region;
 extern struct device_attribute dev_attr_delete_region;
 extern struct device_attribute dev_attr_region;
 extern const struct device_type cxl_pmem_region_type;
+extern const struct device_type cxl_dax_region_type;
 extern const struct device_type cxl_region_type;
 void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled);
 #define CXL_REGION_ATTR(x) (&dev_attr_##x.attr)
 #define CXL_REGION_TYPE(x) (&cxl_region_type)
 #define SET_CXL_REGION_ATTR(x) (&dev_attr_##x.attr),
 #define CXL_PMEM_REGION_TYPE(x) (&cxl_pmem_region_type)
+#define CXL_DAX_REGION_TYPE(x) (&cxl_dax_region_type)
 int cxl_region_init(void);
 void cxl_region_exit(void);
 #else
@@ -38,6 +40,7 @@ static inline void cxl_region_exit(void)
 #define CXL_REGION_TYPE(x) NULL
 #define SET_CXL_REGION_ATTR(x)
 #define CXL_PMEM_REGION_TYPE(x) NULL
+#define CXL_DAX_REGION_TYPE(x) NULL
 #endif
 
 struct cxl_send_command;
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 8d0895cbae93..0faeb1ffc212 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -46,6 +46,8 @@ static int cxl_device_id(struct device *dev)
 		return CXL_DEVICE_NVDIMM;
 	if (dev->type == CXL_PMEM_REGION_TYPE())
 		return CXL_DEVICE_PMEM_REGION;
+	if (dev->type == CXL_DAX_REGION_TYPE())
+		return CXL_DEVICE_DAX_REGION;
 	if (is_cxl_port(dev)) {
 		if (is_cxl_root(to_cxl_port(dev)))
 			return CXL_DEVICE_ROOT;
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 6fe8c70790df..86fa54015a99 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2261,6 +2261,75 @@ static struct cxl_pmem_region *cxl_pmem_region_alloc(struct cxl_region *cxlr)
 	return cxlr_pmem;
 }
 
+static void cxl_dax_region_release(struct device *dev)
+{
+	struct cxl_dax_region *cxlr_dax = to_cxl_dax_region(dev);
+
+	kfree(cxlr_dax);
+}
+
+static const struct attribute_group *cxl_dax_region_attribute_groups[] = {
+	&cxl_base_attribute_group,
+	NULL,
+};
+
+const struct device_type cxl_dax_region_type = {
+	.name = "cxl_dax_region",
+	.release = cxl_dax_region_release,
+	.groups = cxl_dax_region_attribute_groups,
+};
+
+static bool is_cxl_dax_region(struct device *dev)
+{
+	return dev->type == &cxl_dax_region_type;
+}
+
+struct cxl_dax_region *to_cxl_dax_region(struct device *dev)
+{
+	if (dev_WARN_ONCE(dev, !is_cxl_dax_region(dev),
+			  "not a cxl_dax_region device\n"))
+		return NULL;
+	return container_of(dev, struct cxl_dax_region, dev);
+}
+EXPORT_SYMBOL_NS_GPL(to_cxl_dax_region, CXL);
+
+static struct lock_class_key cxl_dax_region_key;
+
+static struct cxl_dax_region *cxl_dax_region_alloc(struct cxl_region *cxlr)
+{
+	struct cxl_region_params *p = &cxlr->params;
+	struct cxl_dax_region *cxlr_dax;
+	struct device *dev;
+
+	down_read(&cxl_region_rwsem);
+	if (p->state != CXL_CONFIG_COMMIT) {
+		cxlr_dax = ERR_PTR(-ENXIO);
+		goto out;
+	}
+
+	cxlr_dax = kzalloc(sizeof(*cxlr_dax), GFP_KERNEL);
+	if (!cxlr_dax) {
+		cxlr_dax = ERR_PTR(-ENOMEM);
+		goto out;
+	}
+
+	cxlr_dax->hpa_range.start = p->res->start;
+	cxlr_dax->hpa_range.end = p->res->end;
+
+	dev = &cxlr_dax->dev;
+	cxlr_dax->cxlr = cxlr;
+	device_initialize(dev);
+	lockdep_set_class(&dev->mutex, &cxl_dax_region_key);
+	device_set_pm_not_required(dev);
+	dev->parent = &cxlr->dev;
+	dev->bus = &cxl_bus_type;
+	dev->type = &cxl_dax_region_type;
+out:
+	up_read(&cxl_region_rwsem);
+
+	return cxlr_dax;
+}
+
 static void cxlr_pmem_unregister(void *_cxlr_pmem)
 {
 	struct cxl_pmem_region *cxlr_pmem = _cxlr_pmem;
@@ -2345,6 +2414,42 @@ static int devm_cxl_add_pmem_region(struct cxl_region *cxlr)
 	return rc;
 }
 
+static void cxlr_dax_unregister(void *_cxlr_dax)
+{
+	struct cxl_dax_region *cxlr_dax = _cxlr_dax;
+
+	device_unregister(&cxlr_dax->dev);
+}
+
+static int devm_cxl_add_dax_region(struct cxl_region *cxlr)
+{
+	struct cxl_dax_region *cxlr_dax;
+	struct device *dev;
+	int rc;
+
+	cxlr_dax = cxl_dax_region_alloc(cxlr);
+	if (IS_ERR(cxlr_dax))
+		return PTR_ERR(cxlr_dax);
+
+	dev = &cxlr_dax->dev;
+	rc = dev_set_name(dev, "dax_region%d", cxlr->id);
+	if (rc)
+		goto err;
+
+	rc = device_add(dev);
+	if (rc)
+		goto err;
+
+	dev_dbg(&cxlr->dev, "%s: register %s\n", dev_name(dev->parent),
+		dev_name(dev));
+
+	return devm_add_action_or_reset(&cxlr->dev, cxlr_dax_unregister,
+					cxlr_dax);
+err:
+	put_device(dev);
+	return rc;
+}
+
 static int match_decoder_by_range(struct device *dev, void *data)
 {
 	struct range *r1, *r2 = data;
@@ -2606,8 +2711,7 @@ static int cxl_region_probe(struct device *dev)
 					p->res->start, p->res->end, cxlr,
 					is_system_ram) > 0)
 			return 0;
-		dev_dbg(dev, "TODO: hookup devdax\n");
-		return 0;
+		return devm_cxl_add_dax_region(cxlr);
 	default:
 		dev_dbg(&cxlr->dev, "unsupported region mode: %d\n",
 			cxlr->mode);
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 4c6ee6c96f23..336b65d333a2 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -505,6 +505,12 @@ struct cxl_pmem_region {
 	struct cxl_pmem_region_mapping mapping[];
 };
 
+struct cxl_dax_region {
+	struct device dev;
+	struct cxl_region *cxlr;
+	struct range hpa_range;
+};
+
 /**
  * struct cxl_port - logical collection of upstream port devices and
  *		     downstream port devices to construct a CXL memory
@@ -699,6 +705,7 @@ void cxl_driver_unregister(struct cxl_driver *cxl_drv);
 #define CXL_DEVICE_MEMORY_EXPANDER	5
 #define CXL_DEVICE_REGION		6
 #define CXL_DEVICE_PMEM_REGION		7
+#define CXL_DEVICE_DAX_REGION		8
 
 #define MODULE_ALIAS_CXL(type) MODULE_ALIAS("cxl:t" __stringify(type) "*")
 #define CXL_MODALIAS_FMT "cxl:t%d"
@@ -715,6 +722,7 @@ struct cxl_nvdimm_bridge *cxl_find_nvdimm_bridge(struct device *dev);
 #ifdef CONFIG_CXL_REGION
 bool is_cxl_pmem_region(struct device *dev);
 struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev);
+struct cxl_dax_region *to_cxl_dax_region(struct device *dev);
 void cxl_add_to_region(struct cxl_endpoint_decoder *cxled);
 #else
 static inline bool is_cxl_pmem_region(struct device *dev)
@@ -725,6 +733,10 @@ static inline struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev)
 {
 	return NULL;
 }
+static inline struct cxl_dax_region *to_cxl_dax_region(struct device *dev)
+{
+	return NULL;
+}
 static inline void cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
 {
 }
diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index 1163eb62e5f6..bd06e16c7ac8 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -45,6 +45,19 @@ config DEV_DAX_HMEM
 
 	  Say M if unsure.
 
+config DEV_DAX_CXL
+	tristate "CXL DAX: direct access to CXL RAM regions"
+	depends on CXL_REGION && DEV_DAX
+	default CXL_REGION && DEV_DAX
+	help
+	  CXL RAM regions are either mapped by platform-firmware
+	  and published in the initial system-memory map as "System RAM", mapped
+	  by platform-firmware as "Soft Reserved", or dynamically provisioned
+	  after boot by the CXL driver. In the latter two cases a device-dax
+	  instance is created to access that unmapped-by-default address range.
+	  Per usual it can remain as dedicated access via a device interface, or
+	  converted to "System RAM" via the dax_kmem facility.
+
 config DEV_DAX_HMEM_DEVICES
 	depends on DEV_DAX_HMEM && DAX
 	def_bool y
diff --git a/drivers/dax/Makefile b/drivers/dax/Makefile
index 90a56ca3b345..5ed5c39857c8 100644
--- a/drivers/dax/Makefile
+++ b/drivers/dax/Makefile
@@ -3,10 +3,12 @@ obj-$(CONFIG_DAX) += dax.o
 obj-$(CONFIG_DEV_DAX) += device_dax.o
 obj-$(CONFIG_DEV_DAX_KMEM) += kmem.o
 obj-$(CONFIG_DEV_DAX_PMEM) += dax_pmem.o
+obj-$(CONFIG_DEV_DAX_CXL) += dax_cxl.o
 
 dax-y := super.o
 dax-y += bus.o
 device_dax-y := device.o
 dax_pmem-y := pmem.o
+dax_cxl-y := cxl.o
 
 obj-y += hmem/
diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
new file mode 100644
index 000000000000..ccdf8de85bd5
--- /dev/null
+++ b/drivers/dax/cxl.c
@@ -0,0 +1,53 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2023 Intel Corporation. All rights reserved. */
+#include <linux/module.h>
+#include <linux/dax.h>
+
+#include "../cxl/cxl.h"
+#include "bus.h"
+
+static int cxl_dax_region_probe(struct device *dev)
+{
+	struct cxl_dax_region *cxlr_dax = to_cxl_dax_region(dev);
+	int nid = phys_to_target_node(cxlr_dax->hpa_range.start);
+	struct cxl_region *cxlr = cxlr_dax->cxlr;
+	struct dax_region *dax_region;
+	struct dev_dax_data data;
+	struct dev_dax *dev_dax;
+
+	if (nid == NUMA_NO_NODE)
+		nid = memory_add_physaddr_to_nid(cxlr_dax->hpa_range.start);
+
+	dax_region = alloc_dax_region(dev, cxlr->id, &cxlr_dax->hpa_range, nid,
+				      PMD_SIZE, IORESOURCE_DAX_KMEM);
+	if (!dax_region)
+		return -ENOMEM;
+
+	data = (struct dev_dax_data) {
+		.dax_region = dax_region,
+		.id = -1,
+		.size = range_len(&cxlr_dax->hpa_range),
+	};
+	dev_dax = devm_create_dev_dax(&data);
+	if (IS_ERR(dev_dax))
+		return PTR_ERR(dev_dax);
+
+	/* child dev_dax instances now own the lifetime of the dax_region */
+	dax_region_put(dax_region);
+	return 0;
+}
+
+static struct cxl_driver cxl_dax_region_driver = {
+	.name = "cxl_dax_region",
+	.probe = cxl_dax_region_probe,
+	.id = CXL_DEVICE_DAX_REGION,
+	.drv = {
+		.suppress_bind_attrs = true,
+	},
+};
+
+module_cxl_driver(cxl_dax_region_driver);
+MODULE_ALIAS_CXL(CXL_DEVICE_DAX_REGION);
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Intel Corporation");
+MODULE_IMPORT_NS(CXL);
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index 5ec08f9f8a57..e5fe8b39fb94 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -72,6 +72,13 @@ static int hmem_register_device(struct device *host, int target_nid,
 	long id;
 	int rc;
 
+	if (IS_ENABLED(CONFIG_CXL_REGION) &&
+	    region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
+			      IORES_DESC_CXL) != REGION_DISJOINT) {
+		dev_dbg(host, "deferring range to CXL: %pr\n", res);
+		return 0;
+	}
+
 	rc = region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
 			       IORES_DESC_SOFT_RESERVED);
 	if (rc != REGION_INTERSECTS)
@@ -157,6 +164,13 @@ static __exit void dax_hmem_exit(void)
 module_init(dax_hmem_init);
 module_exit(dax_hmem_exit);
 
+/* Allow for CXL to define its own dax regions */
+#if IS_ENABLED(CONFIG_CXL_REGION)
+#if IS_MODULE(CONFIG_CXL_ACPI)
+MODULE_SOFTDEP("pre: cxl_acpi");
+#endif
+#endif
+
 MODULE_ALIAS("platform:hmem*");
 MODULE_ALIAS("platform:hmem_platform*");
 MODULE_LICENSE("GPL v2");


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* Re: [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default
  2023-02-06  1:02 [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default Dan Williams
                   ` (17 preceding siblings ...)
  2023-02-06  1:04 ` [PATCH 18/18] cxl/dax: Create dax devices for CXL RAM regions Dan Williams
@ 2023-02-06  5:36 ` Gregory Price
  2023-02-06 16:40   ` Davidlohr Bueso
  2023-02-06 17:29   ` Dan Williams
       [not found] ` <CGME20230208173730uscas1p2af3a9eeb8946dfa607b190c079a49653@uscas1p2.samsung.com>
                   ` (2 subsequent siblings)
  21 siblings, 2 replies; 112+ messages in thread
From: Gregory Price @ 2023-02-06  5:36 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, David Hildenbrand, Kees Cook, stable, Dave Hansen,
	Michal Hocko, linux-mm, linux-acpi

On Sun, Feb 05, 2023 at 05:02:29PM -0800, Dan Williams wrote:
> Summary:
> --------
> 
> CXL RAM support allows for the dynamic provisioning of new CXL RAM
> regions, and more routinely, assembling a region from an existing
> configuration established by platform-firmware. The latter is motivated
> by CXL memory RAS (Reliability, Availability and Serviceability)
> support, that requires associating device events with System Physical
> Address ranges and vice versa.
> 
> The 'Soft Reserved' policy rework arranges for performance
> differentiated memory like CXL attached DRAM, or high-bandwidth memory,
> to be designated for 'System RAM' by default, rather than the device-dax
> dedicated access mode. That current device-dax default is confusing and
> surprising for the Pareto of users that do not expect memory to be
> quarantined for dedicated access by default. Most users expect all
> 'System RAM'-capable memory to show up in FREE(1).

Leverage the same QEMU branch, machine, and configuration as my prior
tests, i'm now experiencing a kernel panic on boot.  Will debug a bit
in the morning, but here is the stack trace i'm seeing

Saw this in both 1 and 2 root port configurations

(note: I also have the region reset issue previously discussed on top of
your branch).  

QEMU configuration:

sudo /opt/qemu-cxl/bin/qemu-system-x86_64 \
-drive file=/var/lib/libvirt/images/cxl.qcow2,format=qcow2,index=0,media=disk,id=hd \
-m 2G,slots=4,maxmem=4G \
-smp 4 \
-machine type=q35,accel=kvm,cxl=on \
-enable-kvm \
-nographic \
-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
-device cxl-rp,id=rp0,bus=cxl.0,chassis=0,port=0,slot=0 \
-object memory-backend-file,id=mem0,mem-path=/tmp/mem0,size=1G,share=true \
-device cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0 \
-M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=1G

[   13.936817] Call Trace:
[   13.970691]  <TASK>
[   13.990690]  device_add+0x39d/0x9a0
[   14.024690]  ? kobject_set_name_vargs+0x6d/0x90
[   14.066690]  ? dev_set_name+0x4b/0x60
[   14.090691]  devm_cxl_add_port+0x29a/0x4d0
[   14.135946]  cxl_acpi_probe+0xd9/0x2f0
[   14.167691]  ? device_pm_check_callbacks+0x36/0x100
[   14.203691]  platform_probe+0x44/0x90
[   14.247691]  really_probe+0xde/0x380
[   14.277690]  ? pm_runtime_barrier+0x50/0x90
[   14.324693]  __driver_probe_device+0x78/0x170
[   14.356694]  driver_probe_device+0x1f/0x90
[   14.396692]  __driver_attach+0xce/0x1c0
[   14.435691]  ? __pfx___driver_attach+0x10/0x10
[   14.471692]  bus_for_each_dev+0x73/0xa0
[   14.508693]  bus_add_driver+0x1ae/0x200
[   14.551691]  driver_register+0x89/0xe0
[   14.587691]  ? __pfx_cxl_acpi_init+0x10/0x10
[   14.625690]  do_one_initcall+0x59/0x230
[   14.814691]  kernel_init_freeable+0x204/0x24e
[   14.846710]  ? __pfx_kernel_init+0x10/0x10
[   14.899692]  kernel_init+0x16/0x140
[   14.954691]  ret_from_fork+0x2c/0x50
[   14.986692]  </TASK>
[   15.023689] Modules linked in:
[   15.057693] CR2: 0000000000000060
[   15.105691] ---[ end trace 0000000000000000 ]---
[   15.162837] RIP: 0010:bus_add_device+0x5b/0x150
[   15.217693] Code: 49 8b 74 24 20 48 89 df e8 92 88 ff ff 89 c5 85 c0 75 3b 48 8b 53 50 48 85 d2 75 03 48 8b 13 49 8b 84 24 a8 00 00 00 48 89 0
[   15.427859] RSP: 0000:ffffbd2a40013bc0 EFLAGS: 00010246
[   15.475692] RAX: 0000000000000000 RBX: ffff955c419e1800 RCX: 0000000000000000
[   15.591691] RDX: ffff955c41921778 RSI: ffff955c419e1800 RDI: ffff955c419e1800
[   15.703693] RBP: 0000000000000000 R08: 0000000000000228 R09: ffff955c4119e550
[   15.802711] R10: ffff955c414bedc8 R11: 0000000000000000 R12: ffffffff9e259d60
[   15.907692] R13: ffffffff9d3cee40 R14: ffff955c419e1800 R15: ffff955c41f06010
[   15.983693] FS:  0000000000000000(0000) GS:ffff955cbdd00000(0000) knlGS:0000000000000000
[   16.126698] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   16.206694] CR2: 0000000000000060 CR3: 0000000036010000 CR4: 00000000000006e0
[   16.347694] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[   16.348686] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 ]---

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 10/18] cxl/region: Fix passthrough-decoder detection
  2023-02-06  1:03 ` [PATCH 10/18] cxl/region: Fix passthrough-decoder detection Dan Williams
@ 2023-02-06  5:38   ` Greg KH
  2023-02-06 17:22     ` Dan Williams
  2023-02-07  0:00   ` Dave Jiang
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 112+ messages in thread
From: Greg KH @ 2023-02-06  5:38 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, stable, dave.hansen, linux-mm, linux-acpi

On Sun, Feb 05, 2023 at 05:03:24PM -0800, Dan Williams wrote:
> A passthrough decoder is a decoder that maps only 1 target. It is a
> special case because it does not impose any constraints on the
> interleave-math as compared to a decoder with multiple targets. Extend
> the passthrough case to multi-target-capable decoders that only have one
> target selected. I.e. the current code was only considering passthrough
> *ports* which are only a subset of the potential passthrough decoder
> scenarios.
> 
> Fixes: e4f6dfa9ef75 ("cxl/region: Fix 'distance' calculation with passthrough ports")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

If a patch really is a "fix" that needs to go to stable kernels, why is
it commit 10 out of 18?  Why isn't it going to Linus now for inclusion
in 6.2-final?  Does it depend on the 9 earlier patches in this series?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 01/18] cxl/Documentation: Update references to attributes added in v6.0
  2023-02-06  1:02 ` [PATCH 01/18] cxl/Documentation: Update references to attributes added in v6.0 Dan Williams
@ 2023-02-06 15:17   ` Jonathan Cameron
  2023-02-06 16:37   ` Gregory Price
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 112+ messages in thread
From: Jonathan Cameron @ 2023-02-06 15:17 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

On Sun, 05 Feb 2023 17:02:35 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> Prior to Linus deciding that the kernel that following v5.19 would be
> v6.0, the CXL ABI documentation already referenced v5.20. In preparation
> for updating these entries update the kernel version to v6.0.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Fair enough though that fun bit of history disappears :)

Might as well queue this up ahead of the rest if you want to.
FWIW
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
 
> ---
>  Documentation/ABI/testing/sysfs-bus-cxl |   30 +++++++++++++++---------------
>  1 file changed, 15 insertions(+), 15 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 329a7e46c805..5be032313e29 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -198,7 +198,7 @@ Description:
>  
>  What:		/sys/bus/cxl/devices/endpointX/CDAT
>  Date:		July, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RO) If this sysfs entry is not present no DOE mailbox was
> @@ -209,7 +209,7 @@ Description:
>  
>  What:		/sys/bus/cxl/devices/decoderX.Y/mode
>  Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it
> @@ -229,7 +229,7 @@ Description:
>  
>  What:		/sys/bus/cxl/devices/decoderX.Y/dpa_resource
>  Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RO) When a CXL decoder is of devtype "cxl_decoder_endpoint",
> @@ -240,7 +240,7 @@ Description:
>  
>  What:		/sys/bus/cxl/devices/decoderX.Y/dpa_size
>  Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it
> @@ -260,7 +260,7 @@ Description:
>  
>  What:		/sys/bus/cxl/devices/decoderX.Y/interleave_ways
>  Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RO) The number of targets across which this decoder's host
> @@ -275,7 +275,7 @@ Description:
>  
>  What:		/sys/bus/cxl/devices/decoderX.Y/interleave_granularity
>  Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RO) The number of consecutive bytes of host physical address
> @@ -287,7 +287,7 @@ Description:
>  
>  What:		/sys/bus/cxl/devices/decoderX.Y/create_pmem_region
>  Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RW) Write a string in the form 'regionZ' to start the process
> @@ -303,7 +303,7 @@ Description:
>  
>  What:		/sys/bus/cxl/devices/decoderX.Y/delete_region
>  Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(WO) Write a string in the form 'regionZ' to delete that region,
> @@ -312,7 +312,7 @@ Description:
>  
>  What:		/sys/bus/cxl/devices/regionZ/uuid
>  Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RW) Write a unique identifier for the region. This field must
> @@ -322,7 +322,7 @@ Description:
>  
>  What:		/sys/bus/cxl/devices/regionZ/interleave_granularity
>  Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RW) Set the number of consecutive bytes each device in the
> @@ -333,7 +333,7 @@ Description:
>  
>  What:		/sys/bus/cxl/devices/regionZ/interleave_ways
>  Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RW) Configures the number of devices participating in the
> @@ -343,7 +343,7 @@ Description:
>  
>  What:		/sys/bus/cxl/devices/regionZ/size
>  Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RW) System physical address space to be consumed by the region.
> @@ -360,7 +360,7 @@ Description:
>  
>  What:		/sys/bus/cxl/devices/regionZ/resource
>  Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RO) A region is a contiguous partition of a CXL root decoder
> @@ -372,7 +372,7 @@ Description:
>  
>  What:		/sys/bus/cxl/devices/regionZ/target[0..N]
>  Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RW) Write an endpoint decoder object name to 'targetX' where X
> @@ -391,7 +391,7 @@ Description:
>  
>  What:		/sys/bus/cxl/devices/regionZ/commit
>  Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RW) Write a boolean 'true' string value to this attribute to
> 
> 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 02/18] cxl/region: Add a mode attribute for regions
  2023-02-06  1:02 ` [PATCH 02/18] cxl/region: Add a mode attribute for regions Dan Williams
@ 2023-02-06 15:46   ` Jonathan Cameron
  2023-02-06 17:47     ` Dan Williams
  2023-02-06 16:39   ` Gregory Price
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 112+ messages in thread
From: Jonathan Cameron @ 2023-02-06 15:46 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

On Sun, 05 Feb 2023 17:02:40 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> In preparation for a new region type, "ram" regions, add a mode
> attribute to clarify the mode of the decoders that can be added to a
> region. Share the internals of mode_show() (for decoders) with the
> region case.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
I guess the DEAD decoder cases is an odd enough path that it is
fine to not worry that it is presented as mixed (none might be
more appropriate as you can't put any memory behind it).

Other than that this is straight forwards exposure of internal data
that already existed...

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  Documentation/ABI/testing/sysfs-bus-cxl |   11 +++++++++++
>  drivers/cxl/core/port.c                 |   12 +-----------
>  drivers/cxl/core/region.c               |   10 ++++++++++
>  drivers/cxl/cxl.h                       |   14 ++++++++++++++
>  4 files changed, 36 insertions(+), 11 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 5be032313e29..058b0c45001f 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -358,6 +358,17 @@ Description:
>  		results in the same address being allocated.
>  
>  
> +What:		/sys/bus/cxl/devices/regionZ/mode
> +Date:		January, 2023
> +KernelVersion:	v6.3
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(RO) The mode of a region is established at region creation time
> +		and dictates the mode of the endpoint decoder that comprise the
> +		region. For more details on the possible modes see
> +		/sys/bus/cxl/devices/decoderX.Y/mode
> +
> +
>  What:		/sys/bus/cxl/devices/regionZ/resource
>  Date:		May, 2022
>  KernelVersion:	v6.0
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 410c036c09fa..8566451cb22f 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -180,17 +180,7 @@ static ssize_t mode_show(struct device *dev, struct device_attribute *attr,
>  {
>  	struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
>  
> -	switch (cxled->mode) {
> -	case CXL_DECODER_RAM:
> -		return sysfs_emit(buf, "ram\n");
> -	case CXL_DECODER_PMEM:
> -		return sysfs_emit(buf, "pmem\n");
> -	case CXL_DECODER_NONE:
> -		return sysfs_emit(buf, "none\n");
> -	case CXL_DECODER_MIXED:
> -	default:
> -		return sysfs_emit(buf, "mixed\n");
> -	}
> +	return sysfs_emit(buf, "%s\n", cxl_decoder_mode_name(cxled->mode));
>  }
>  
>  static ssize_t mode_store(struct device *dev, struct device_attribute *attr,
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 60828d01972a..17d2d0c12725 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -458,6 +458,15 @@ static ssize_t resource_show(struct device *dev, struct device_attribute *attr,
>  }
>  static DEVICE_ATTR_RO(resource);
>  
> +static ssize_t mode_show(struct device *dev, struct device_attribute *attr,
> +			 char *buf)
> +{
> +	struct cxl_region *cxlr = to_cxl_region(dev);
> +
> +	return sysfs_emit(buf, "%s\n", cxl_decoder_mode_name(cxlr->mode));
> +}
> +static DEVICE_ATTR_RO(mode);
> +
>  static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size)
>  {
>  	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
> @@ -585,6 +594,7 @@ static struct attribute *cxl_region_attrs[] = {
>  	&dev_attr_interleave_granularity.attr,
>  	&dev_attr_resource.attr,
>  	&dev_attr_size.attr,
> +	&dev_attr_mode.attr,
>  	NULL,
>  };
>  
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index aa3af3bb73b2..ca76879af1de 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -320,6 +320,20 @@ enum cxl_decoder_mode {
>  	CXL_DECODER_DEAD,
>  };
>  
> +static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode)
> +{
> +	static const char * const names[] = {
> +		[CXL_DECODER_NONE] = "none",
> +		[CXL_DECODER_RAM] = "ram",
> +		[CXL_DECODER_PMEM] = "pmem",
> +		[CXL_DECODER_MIXED] = "mixed",
> +	};
> +
> +	if (mode >= CXL_DECODER_NONE && mode <= CXL_DECODER_MIXED)
> +		return names[mode];
> +	return "mixed";
> +}
> +
>  /**
>   * struct cxl_endpoint_decoder - Endpoint  / SPA to DPA decoder
>   * @cxld: base cxl_decoder_object
> 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 03/18] cxl/region: Support empty uuids for non-pmem regions
  2023-02-06  1:02 ` [PATCH 03/18] cxl/region: Support empty uuids for non-pmem regions Dan Williams
@ 2023-02-06 15:54   ` Jonathan Cameron
  2023-02-06 18:07     ` Dan Williams
  2023-02-06 19:22   ` Ira Weiny
  2023-02-09  0:24   ` Verma, Vishal L
  2 siblings, 1 reply; 112+ messages in thread
From: Jonathan Cameron @ 2023-02-06 15:54 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

On Sun, 05 Feb 2023 17:02:45 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> Shipping versions of the cxl-cli utility expect all regions to have a
> 'uuid' attribute. In preparation for 'ram' regions, update the 'uuid'
> attribute to return an empty string which satisfies the current
> expectations of 'cxl list -R'. Otherwise, 'cxl list -R' fails in the
> presence of regions with the 'uuid' attribute missing. Force the
> attribute to be read-only as there is no facility or expectation for a
> 'ram' region to recall its uuid from one boot to the next.

This is new functionality, so do we need the backwards compatibility?
Or does that cxl list -R not just skip these, but fail hard?
It's inelegant to have this visible at all (and it confused me when
I was running tests as I assumed I needed to poke something in it).

If nothing else, good to have a comment to remind us in the code
why there is the oddity of setting it to read only.

Jonathan

> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  Documentation/ABI/testing/sysfs-bus-cxl |    3 ++-
>  drivers/cxl/core/region.c               |    7 +++++--
>  2 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 058b0c45001f..4c4e1cbb1169 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -317,7 +317,8 @@ Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RW) Write a unique identifier for the region. This field must
>  		be set for persistent regions and it must not conflict with the
> -		UUID of another region.
> +		UUID of another region. For volatile ram regions this
> +		attribute is a read-only empty string.
>  
>  
>  What:		/sys/bus/cxl/devices/regionZ/interleave_granularity
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 17d2d0c12725..c9e7f05caa0f 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -45,7 +45,10 @@ static ssize_t uuid_show(struct device *dev, struct device_attribute *attr,
>  	rc = down_read_interruptible(&cxl_region_rwsem);
>  	if (rc)
>  		return rc;
> -	rc = sysfs_emit(buf, "%pUb\n", &p->uuid);
> +	if (cxlr->mode != CXL_DECODER_PMEM)
> +		rc = sysfs_emit(buf, "\n");
> +	else
> +		rc = sysfs_emit(buf, "%pUb\n", &p->uuid);
>  	up_read(&cxl_region_rwsem);
>  
>  	return rc;
> @@ -301,7 +304,7 @@ static umode_t cxl_region_visible(struct kobject *kobj, struct attribute *a,
>  	struct cxl_region *cxlr = to_cxl_region(dev);
>  
>  	if (a == &dev_attr_uuid.attr && cxlr->mode != CXL_DECODER_PMEM)
> -		return 0;
> +		return 0444;

Definitely a comment here as this is very odd and we may forget why
it is like this.

>  	return a->mode;
>  }
>  
> 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 04/18] cxl/region: Validate region mode vs decoder mode
  2023-02-06  1:02 ` [PATCH 04/18] cxl/region: Validate region mode vs decoder mode Dan Williams
@ 2023-02-06 16:02   ` Jonathan Cameron
  2023-02-06 18:14     ` Dan Williams
  2023-02-06 16:44   ` Gregory Price
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 112+ messages in thread
From: Jonathan Cameron @ 2023-02-06 16:02 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

On Sun, 05 Feb 2023 17:02:51 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> In preparation for a new region mode, do not, for example, allow
> 'ram' decoders to be assigned to 'pmem' regions and vice versa.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

I guess we aren't support mixed endpoint decoders for now (and
maybe never..) If we did I'd expect this to have to allow
mixed with either RAM or PMEM.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
>  drivers/cxl/core/region.c |    6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index c9e7f05caa0f..53d6dbe4de6d 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1217,6 +1217,12 @@ static int cxl_region_attach(struct cxl_region *cxlr,
>  	struct cxl_dport *dport;
>  	int i, rc = -ENXIO;
>  
> +	if (cxled->mode != cxlr->mode) {
> +		dev_dbg(&cxlr->dev, "%s region mode: %d mismatch: %d\n",
> +			dev_name(&cxled->cxld.dev), cxlr->mode, cxled->mode);
> +		return -EINVAL;
> +	}
> +
>  	if (cxled->mode == CXL_DECODER_DEAD) {
>  		dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev));
>  		return -ENODEV;
> 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 05/18] cxl/region: Add volatile region creation support
  2023-02-06  1:02 ` [PATCH 05/18] cxl/region: Add volatile region creation support Dan Williams
@ 2023-02-06 16:18   ` Jonathan Cameron
  2023-02-06 18:19     ` Dan Williams
  2023-02-06 16:55   ` Gregory Price
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 112+ messages in thread
From: Jonathan Cameron @ 2023-02-06 16:18 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

On Sun, 05 Feb 2023 17:02:56 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> Expand the region creation infrastructure to enable 'ram'
> (volatile-memory) regions. The internals of create_pmem_region_store()
> and create_pmem_region_show() are factored out into helpers
> __create_region() and __create_region_show() for the 'ram' case to
> reuse.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Entirely trivial comments inline.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  Documentation/ABI/testing/sysfs-bus-cxl |   22 +++++-----
>  drivers/cxl/core/core.h                 |    1 
>  drivers/cxl/core/port.c                 |   14 ++++++
>  drivers/cxl/core/region.c               |   71 +++++++++++++++++++++++++------
>  4 files changed, 82 insertions(+), 26 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 4c4e1cbb1169..3acf2f17a73f 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -285,20 +285,20 @@ Description:
>  		interleave_granularity).
>  
>  
> -What:		/sys/bus/cxl/devices/decoderX.Y/create_pmem_region
> -Date:		May, 2022
> -KernelVersion:	v6.0
> +What:		/sys/bus/cxl/devices/decoderX.Y/create_{pmem,ram}_region
> +Date:		May, 2022, January, 2023
> +KernelVersion:	v6.0 (pmem), v6.3 (ram)
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RW) Write a string in the form 'regionZ' to start the process
> -		of defining a new persistent memory region (interleave-set)
> -		within the decode range bounded by root decoder 'decoderX.Y'.
> -		The value written must match the current value returned from
> -		reading this attribute. An atomic compare exchange operation is
> -		done on write to assign the requested id to a region and
> -		allocate the region-id for the next creation attempt. EBUSY is
> -		returned if the region name written does not match the current
> -		cached value.
> +		of defining a new persistent, or volatile memory region
> +		(interleave-set) within the decode range bounded by root decoder
> +		'decoderX.Y'. The value written must match the current value
> +		returned from reading this attribute. An atomic compare exchange
> +		operation is done on write to assign the requested id to a
> +		region and allocate the region-id for the next creation attempt.
> +		EBUSY is returned if the region name written does not match the
> +		current cached value.
>  
>  
>  What:		/sys/bus/cxl/devices/decoderX.Y/delete_region
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 8c04672dca56..5eb873da5a30 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -11,6 +11,7 @@ extern struct attribute_group cxl_base_attribute_group;
>  
>  #ifdef CONFIG_CXL_REGION
>  extern struct device_attribute dev_attr_create_pmem_region;
> +extern struct device_attribute dev_attr_create_ram_region;
>  extern struct device_attribute dev_attr_delete_region;
>  extern struct device_attribute dev_attr_region;
>  extern const struct device_type cxl_pmem_region_type;
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 8566451cb22f..47e450c3a5a9 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -294,6 +294,7 @@ static struct attribute *cxl_decoder_root_attrs[] = {
>  	&dev_attr_cap_type3.attr,
>  	&dev_attr_target_list.attr,
>  	SET_CXL_REGION_ATTR(create_pmem_region)
> +	SET_CXL_REGION_ATTR(create_ram_region)
>  	SET_CXL_REGION_ATTR(delete_region)
>  	NULL,
>  };
> @@ -305,6 +306,13 @@ static bool can_create_pmem(struct cxl_root_decoder *cxlrd)
>  	return (cxlrd->cxlsd.cxld.flags & flags) == flags;
>  }
>  
> +static bool can_create_ram(struct cxl_root_decoder *cxlrd)
> +{
> +	unsigned long flags = CXL_DECODER_F_TYPE3 | CXL_DECODER_F_RAM;
> +
> +	return (cxlrd->cxlsd.cxld.flags & flags) == flags;
> +}
> +
>  static umode_t cxl_root_decoder_visible(struct kobject *kobj, struct attribute *a, int n)
>  {
>  	struct device *dev = kobj_to_dev(kobj);
> @@ -313,7 +321,11 @@ static umode_t cxl_root_decoder_visible(struct kobject *kobj, struct attribute *
>  	if (a == CXL_REGION_ATTR(create_pmem_region) && !can_create_pmem(cxlrd))
>  		return 0;
>  
> -	if (a == CXL_REGION_ATTR(delete_region) && !can_create_pmem(cxlrd))
> +	if (a == CXL_REGION_ATTR(create_ram_region) && !can_create_ram(cxlrd))
> +		return 0;
> +
> +	if (a == CXL_REGION_ATTR(delete_region) &&
> +	    !(can_create_pmem(cxlrd) || can_create_ram(cxlrd)))
>  		return 0;
>  
>  	return a->mode;
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 53d6dbe4de6d..8dea49c021b8 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1685,6 +1685,15 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
>  	struct device *dev;
>  	int rc;
>  
> +	switch (mode) {
> +	case CXL_DECODER_RAM:
> +	case CXL_DECODER_PMEM:
> +		break;
> +	default:
> +		dev_err(&cxlrd->cxlsd.cxld.dev, "unsupported mode %d\n", mode);
> +		return ERR_PTR(-EINVAL);
> +	}
> +
>  	cxlr = cxl_region_alloc(cxlrd, id);
>  	if (IS_ERR(cxlr))
>  		return cxlr;
> @@ -1713,12 +1722,38 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
>  	return ERR_PTR(rc);
>  }
>  
> +static ssize_t __create_region_show(struct cxl_root_decoder *cxlrd, char *buf)
> +{
> +	return sysfs_emit(buf, "region%u\n", atomic_read(&cxlrd->region_id));
> +}
> +
>  static ssize_t create_pmem_region_show(struct device *dev,
>  				       struct device_attribute *attr, char *buf)
>  {
> -	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
> +	return __create_region_show(to_cxl_root_decoder(dev), buf);
> +}
>  
> -	return sysfs_emit(buf, "region%u\n", atomic_read(&cxlrd->region_id));
> +static ssize_t create_ram_region_show(struct device *dev,
> +				      struct device_attribute *attr, char *buf)
> +{
> +	return __create_region_show(to_cxl_root_decoder(dev), buf);
> +}
> +
> +static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd,
> +					  enum cxl_decoder_mode mode, int id)
> +{
> +	int rc;
> +
> +	rc = memregion_alloc(GFP_KERNEL);
> +	if (rc < 0)
> +		return ERR_PTR(rc);
> +
> +	if (atomic_cmpxchg(&cxlrd->region_id, id, rc) != id) {
> +		memregion_free(rc);
> +		return ERR_PTR(-EBUSY);
> +	}
> +
> +	return devm_cxl_add_region(cxlrd, id, mode, CXL_DECODER_EXPANDER);
>  }
>  
>  static ssize_t create_pmem_region_store(struct device *dev,
> @@ -1727,29 +1762,37 @@ static ssize_t create_pmem_region_store(struct device *dev,
>  {
>  	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
>  	struct cxl_region *cxlr;
> -	int id, rc;
> +	int rc, id;
>  
>  	rc = sscanf(buf, "region%d\n", &id);
>  	if (rc != 1)
>  		return -EINVAL;
>  
> -	rc = memregion_alloc(GFP_KERNEL);
> -	if (rc < 0)
> -		return rc;
> +	cxlr = __create_region(cxlrd, CXL_DECODER_PMEM, id);
> +	if (IS_ERR(cxlr))
> +		return PTR_ERR(cxlr);

I'd have a blank line here - see below.

> +	return len;
> +}
> +DEVICE_ATTR_RW(create_pmem_region);
>  
> -	if (atomic_cmpxchg(&cxlrd->region_id, id, rc) != id) {
> -		memregion_free(rc);
> -		return -EBUSY;
> -	}
> +static ssize_t create_ram_region_store(struct device *dev,
> +				       struct device_attribute *attr,
> +				       const char *buf, size_t len)
> +{
> +	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
> +	struct cxl_region *cxlr;
> +	int rc, id;
>  
> -	cxlr = devm_cxl_add_region(cxlrd, id, CXL_DECODER_PMEM,
> -				   CXL_DECODER_EXPANDER);
> +	rc = sscanf(buf, "region%d\n", &id);
> +	if (rc != 1)
> +		return -EINVAL;
> +
> +	cxlr = __create_region(cxlrd, CXL_DECODER_RAM, id);
>  	if (IS_ERR(cxlr))
>  		return PTR_ERR(cxlr);
> -

Just so you know I read to the end!

Spurious unrelated white space change :)

>  	return len;
>  }
> -DEVICE_ATTR_RW(create_pmem_region);
> +DEVICE_ATTR_RW(create_ram_region);
>  
>  static ssize_t region_show(struct device *dev, struct device_attribute *attr,
>  			   char *buf)
> 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 01/18] cxl/Documentation: Update references to attributes added in v6.0
  2023-02-06  1:02 ` [PATCH 01/18] cxl/Documentation: Update references to attributes added in v6.0 Dan Williams
  2023-02-06 15:17   ` Jonathan Cameron
@ 2023-02-06 16:37   ` Gregory Price
  2023-02-06 17:27   ` [PATCH 1/18] " Davidlohr Bueso
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 112+ messages in thread
From: Gregory Price @ 2023-02-06 16:37 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

On Sun, Feb 05, 2023 at 05:02:35PM -0800, Dan Williams wrote:
> Prior to Linus deciding that the kernel that following v5.19 would be
> v6.0, the CXL ABI documentation already referenced v5.20. In preparation
> for updating these entries update the kernel version to v6.0.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  Documentation/ABI/testing/sysfs-bus-cxl |   30 +++++++++++++++---------------
>  1 file changed, 15 insertions(+), 15 deletions(-)
> 

Reviewed-by: Gregory Price <gregory.price@memverge.com>

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 02/18] cxl/region: Add a mode attribute for regions
  2023-02-06  1:02 ` [PATCH 02/18] cxl/region: Add a mode attribute for regions Dan Williams
  2023-02-06 15:46   ` Jonathan Cameron
@ 2023-02-06 16:39   ` Gregory Price
  2023-02-06 19:16   ` Ira Weiny
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 112+ messages in thread
From: Gregory Price @ 2023-02-06 16:39 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

On Sun, Feb 05, 2023 at 05:02:40PM -0800, Dan Williams wrote:
> In preparation for a new region type, "ram" regions, add a mode
> attribute to clarify the mode of the decoders that can be added to a
> region. Share the internals of mode_show() (for decoders) with the
> region case.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---

Reviewed-by: Gregory Price <gregory.price@memverge.com>

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default
  2023-02-06  5:36 ` [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default Gregory Price
@ 2023-02-06 16:40   ` Davidlohr Bueso
  2023-02-06 18:23     ` Dan Williams
  2023-02-06 17:29   ` Dan Williams
  1 sibling, 1 reply; 112+ messages in thread
From: Davidlohr Bueso @ 2023-02-06 16:40 UTC (permalink / raw)
  To: Gregory Price
  Cc: Dan Williams, linux-cxl, David Hildenbrand, Kees Cook, stable,
	Dave Hansen, Michal Hocko, linux-mm, linux-acpi

On Mon, 06 Feb 2023, Gregory Price wrote:

>On Sun, Feb 05, 2023 at 05:02:29PM -0800, Dan Williams wrote:
>> Summary:
>> --------
>>
>> CXL RAM support allows for the dynamic provisioning of new CXL RAM
>> regions, and more routinely, assembling a region from an existing
>> configuration established by platform-firmware. The latter is motivated
>> by CXL memory RAS (Reliability, Availability and Serviceability)
>> support, that requires associating device events with System Physical
>> Address ranges and vice versa.
>>
>> The 'Soft Reserved' policy rework arranges for performance
>> differentiated memory like CXL attached DRAM, or high-bandwidth memory,
>> to be designated for 'System RAM' by default, rather than the device-dax
>> dedicated access mode. That current device-dax default is confusing and
>> surprising for the Pareto of users that do not expect memory to be
>> quarantined for dedicated access by default. Most users expect all
>> 'System RAM'-capable memory to show up in FREE(1).
>
>Leverage the same QEMU branch, machine, and configuration as my prior
>tests, i'm now experiencing a kernel panic on boot.  Will debug a bit
>in the morning, but here is the stack trace i'm seeing
>
>Saw this in both 1 and 2 root port configurations

I also see it in "regular" pmem setups, and narrowed it down to this
change in the last patch:

-module_init(cxl_acpi_init);
+/* load before dax_hmem sees 'Soft Reserved' CXL ranges */
+subsys_initcall(cxl_acpi_init);

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 04/18] cxl/region: Validate region mode vs decoder mode
  2023-02-06  1:02 ` [PATCH 04/18] cxl/region: Validate region mode vs decoder mode Dan Williams
  2023-02-06 16:02   ` Jonathan Cameron
@ 2023-02-06 16:44   ` Gregory Price
  2023-02-06 21:51     ` Dan Williams
  2023-02-06 19:23   ` Ira Weiny
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 112+ messages in thread
From: Gregory Price @ 2023-02-06 16:44 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

On Sun, Feb 05, 2023 at 05:02:51PM -0800, Dan Williams wrote:
> In preparation for a new region mode, do not, for example, allow
> 'ram' decoders to be assigned to 'pmem' regions and vice versa.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/core/region.c |    6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index c9e7f05caa0f..53d6dbe4de6d 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1217,6 +1217,12 @@ static int cxl_region_attach(struct cxl_region *cxlr,
>  	struct cxl_dport *dport;
>  	int i, rc = -ENXIO;
>  
> +	if (cxled->mode != cxlr->mode) {
> +		dev_dbg(&cxlr->dev, "%s region mode: %d mismatch: %d\n",
> +			dev_name(&cxled->cxld.dev), cxlr->mode, cxled->mode);
> +		return -EINVAL;
> +	}
> +
>  	if (cxled->mode == CXL_DECODER_DEAD) {
>  		dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev));
>  		return -ENODEV;
> 


Maybe a stupid question.  It will be entirely possible to "convert" pmem
to "ram" (i.e. just online it as system memory).

Are type-3 devices with pmem expected to carry 2 decoders (one ram, 1
pmem), such that you can create a ram region from pmem, or is it expected
that the pmem decoder will be used for pmem, even if it gets onlined
as volatile?

Just asking because depending on how that flushes out, you might
expected a "ram region" to get assigned to a "pmem decoder", unless
you're expecting all pmem-carrying-devices to create 2 decoders per pmem
region depending on how the end user intends to use it.

~Gregory

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 05/18] cxl/region: Add volatile region creation support
  2023-02-06  1:02 ` [PATCH 05/18] cxl/region: Add volatile region creation support Dan Williams
  2023-02-06 16:18   ` Jonathan Cameron
@ 2023-02-06 16:55   ` Gregory Price
  2023-02-06 21:57     ` Dan Williams
  2023-02-06 19:25   ` Ira Weiny
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 112+ messages in thread
From: Gregory Price @ 2023-02-06 16:55 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

On Sun, Feb 05, 2023 at 05:02:56PM -0800, Dan Williams wrote:
> Expand the region creation infrastructure to enable 'ram'
> (volatile-memory) regions. The internals of create_pmem_region_store()
> and create_pmem_region_show() are factored out into helpers
> __create_region() and __create_region_show() for the 'ram' case to
> reuse.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  
[...]
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 8566451cb22f..47e450c3a5a9 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -294,6 +294,7 @@ static struct attribute *cxl_decoder_root_attrs[] = {
>  	&dev_attr_cap_type3.attr,
>  	&dev_attr_target_list.attr,
>  	SET_CXL_REGION_ATTR(create_pmem_region)
> +	SET_CXL_REGION_ATTR(create_ram_region)
>  	SET_CXL_REGION_ATTR(delete_region)
>  	NULL,
>  };
> @@ -305,6 +306,13 @@ static bool can_create_pmem(struct cxl_root_decoder *cxlrd)
>  	return (cxlrd->cxlsd.cxld.flags & flags) == flags;
>  }
>  
> +static bool can_create_ram(struct cxl_root_decoder *cxlrd)
> +{
> +	unsigned long flags = CXL_DECODER_F_TYPE3 | CXL_DECODER_F_RAM;
> +
> +	return (cxlrd->cxlsd.cxld.flags & flags) == flags;
> +}
> +

does this collide with either CXL_DECODE_F_ENABLE or CXL_DECODER_F_AUTO?

I think obviously if it's already enabled creating new regions in a
decoder doesn't make sense, but if F_AUTO is set, does that imply
the region settings cannot be changed?

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 08/18] kernel/range: Uplevel the cxl subsystem's range_contains() helper
  2023-02-06  1:03 ` [PATCH 08/18] kernel/range: Uplevel the cxl subsystem's range_contains() helper Dan Williams
@ 2023-02-06 17:02   ` Gregory Price
  2023-02-06 22:01     ` Dan Williams
  2023-02-06 19:28   ` Ira Weiny
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 112+ messages in thread
From: Gregory Price @ 2023-02-06 17:02 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, Kees Cook, dave.hansen, linux-mm, linux-acpi

On Sun, Feb 05, 2023 at 05:03:13PM -0800, Dan Williams wrote:
> In support of the CXL subsystem's use of 'struct range' to track decode
> address ranges, add a common range_contains() implementation with
> identical semantics as resource_contains();
> 
> The existing 'range_contains()' in lib/stackinit_kunit.c is namespaced
> with a 'stackinit_' prefix.
> 
> Cc: Kees Cook <keescook@chromium.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/core/pci.c |    5 -----
>  include/linux/range.h  |    5 +++++
>  lib/stackinit_kunit.c  |    6 +++---
>  3 files changed, 8 insertions(+), 8 deletions(-)
> 

Only question is whether this should be upstreamed separate/ahead since
it's only tacitly related to the commit line, and the stackinit naming
seems overly-broad enough to cause issues.

Either way:


Reviewed-by: Gregory Price <gregory.price@memverge.com>

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 09/18] cxl/region: Enable CONFIG_CXL_REGION to be toggled
  2023-02-06  1:03 ` [PATCH 09/18] cxl/region: Enable CONFIG_CXL_REGION to be toggled Dan Williams
@ 2023-02-06 17:03   ` Gregory Price
  2023-02-06 23:57   ` Dave Jiang
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 112+ messages in thread
From: Gregory Price @ 2023-02-06 17:03 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

On Sun, Feb 05, 2023 at 05:03:18PM -0800, Dan Williams wrote:
> Add help text and a label so the CXL_REGION config option can be
> toggled. This is mainly to enable compile testing without region
> support.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/Kconfig |   12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 

Reviewed-by: Gregory Price <gregory.price@memverge.com

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 06/18] cxl/region: Refactor attach_target() for autodiscovery
  2023-02-06  1:03 ` [PATCH 06/18] cxl/region: Refactor attach_target() for autodiscovery Dan Williams
@ 2023-02-06 17:06   ` Jonathan Cameron
  2023-02-06 18:48     ` Dan Williams
  2023-02-06 19:26   ` Ira Weiny
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 112+ messages in thread
From: Jonathan Cameron @ 2023-02-06 17:06 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

On Sun, 05 Feb 2023 17:03:02 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> Region autodiscovery is the process of kernel creating 'struct
> cxl_region' object to represent active CXL memory ranges it finds
> already active in hardware when the driver loads. Typically this happens
> when platform firmware establishes CXL memory regions and then publishes
> them in the memory map. However, this can also happen in the case of
> kexec-reboot after the kernel has created regions.
> 
> In the autodiscovery case the region creation process starts with a
> known endpoint decoder. Refactor attach_target() into a helper that is
> suitable to be called from either sysfs, for runtime region creation, or
> from cxl_port_probe() after it has enumerated all endpoint decoders.
> 
> The cxl_port_probe() context is an async device-core probing context, so
> it is not appropriate to allow SIGTERM to interrupt the assembly
> process. Refactor attach_target() to take @cxled and @state as arguments
> where @state indicates whether waiting from the region rwsem is
> interruptible or not.

As below, I'd have broken this change out as a follow up patch - that
way reviewer could check for strict noop refactor, then look in ioslation
at that case.

I don't care that much though as second patch would only have about 4 lines
of diff.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> 
> No behavior change is intended.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/core/region.c |   47 +++++++++++++++++++++++++++------------------
>  1 file changed, 28 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 8dea49c021b8..97eafdd75675 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1418,31 +1418,25 @@ void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled)
>  	up_write(&cxl_region_rwsem);
>  }
>  
> -static int attach_target(struct cxl_region *cxlr, const char *decoder, int pos)
> +static int attach_target(struct cxl_region *cxlr,
> +			 struct cxl_endpoint_decoder *cxled, int pos,
> +			 unsigned int state)
>  {
> -	struct device *dev;
> -	int rc;
> -
> -	dev = bus_find_device_by_name(&cxl_bus_type, NULL, decoder);
> -	if (!dev)
> -		return -ENODEV;
> -
> -	if (!is_endpoint_decoder(dev)) {
> -		put_device(dev);
> -		return -EINVAL;
> -	}
> +	int rc = 0;
>  
> -	rc = down_write_killable(&cxl_region_rwsem);
> +	if (state == TASK_INTERRUPTIBLE)
> +		rc = down_write_killable(&cxl_region_rwsem);
> +	else
> +		down_write(&cxl_region_rwsem);

I'd be tempted to do this in two hops for patch readability. First
make the code reorg then follow up with this bit before the use
of it in the next patch.

>  	if (rc)
> -		goto out;
> +		return rc;
> +
>  	down_read(&cxl_dpa_rwsem);
> -	rc = cxl_region_attach(cxlr, to_cxl_endpoint_decoder(dev), pos);
> +	rc = cxl_region_attach(cxlr, cxled, pos);
>  	if (rc == 0)
>  		set_bit(CXL_REGION_F_INCOHERENT, &cxlr->flags);
>  	up_read(&cxl_dpa_rwsem);
>  	up_write(&cxl_region_rwsem);
> -out:
> -	put_device(dev);
>  	return rc;
>  }
>  
> @@ -1480,8 +1474,23 @@ static size_t store_targetN(struct cxl_region *cxlr, const char *buf, int pos,
>  
>  	if (sysfs_streq(buf, "\n"))
>  		rc = detach_target(cxlr, pos);
> -	else
> -		rc = attach_target(cxlr, buf, pos);
> +	else {
> +		struct device *dev;
> +
> +		dev = bus_find_device_by_name(&cxl_bus_type, NULL, buf);
> +		if (!dev)
> +			return -ENODEV;
> +
> +		if (!is_endpoint_decoder(dev)) {
> +			rc = -EINVAL;
> +			goto out;
> +		}
> +
> +		rc = attach_target(cxlr, to_cxl_endpoint_decoder(dev), pos,
> +				   TASK_INTERRUPTIBLE);
> +out:
> +		put_device(dev);
> +	}
>  
>  	if (rc < 0)
>  		return rc;
> 
> 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 14/18] dax/hmem: Drop unnecessary dax_hmem_remove()
  2023-02-06  1:03 ` [PATCH 14/18] dax/hmem: Drop unnecessary dax_hmem_remove() Dan Williams
@ 2023-02-06 17:15   ` Gregory Price
  2023-02-08 17:33   ` Jonathan Cameron
  1 sibling, 0 replies; 112+ messages in thread
From: Gregory Price @ 2023-02-06 17:15 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

On Sun, Feb 05, 2023 at 05:03:46PM -0800, Dan Williams wrote:
> Empty driver remove callbacks can just be elided.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/dax/hmem/hmem.c |    7 -------
>  1 file changed, 7 deletions(-)
> 

Reviewed-by: Gregory Price <gregory.price@memverge.com>

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default
  2023-02-06 17:29   ` Dan Williams
@ 2023-02-06 17:18     ` Davidlohr Bueso
  0 siblings, 0 replies; 112+ messages in thread
From: Davidlohr Bueso @ 2023-02-06 17:18 UTC (permalink / raw)
  To: Dan Williams
  Cc: Gregory Price, linux-cxl, David Hildenbrand, Kees Cook, stable,
	Dave Hansen, Michal Hocko, linux-mm, linux-acpi

On Mon, 06 Feb 2023, Dan Williams wrote:

>Gregory Price wrote:
>[..]
>> Leverage the same QEMU branch, machine, and configuration as my prior
>> tests, i'm now experiencing a kernel panic on boot.  Will debug a bit
>> in the morning, but here is the stack trace i'm seeing
>>
>> Saw this in both 1 and 2 root port configurations
>>
>> (note: I also have the region reset issue previously discussed on top of
>> your branch).
>>
>> QEMU configuration:
>>
>> sudo /opt/qemu-cxl/bin/qemu-system-x86_64 \
>> -drive file=/var/lib/libvirt/images/cxl.qcow2,format=qcow2,index=0,media=disk,id=hd \
>> -m 2G,slots=4,maxmem=4G \
>> -smp 4 \
>> -machine type=q35,accel=kvm,cxl=on \
>> -enable-kvm \
>> -nographic \
>> -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
>> -device cxl-rp,id=rp0,bus=cxl.0,chassis=0,port=0,slot=0 \
>> -object memory-backend-file,id=mem0,mem-path=/tmp/mem0,size=1G,share=true \
>> -device cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0 \
>> -M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=1G
>>
>[..]
>> [   15.162837] RIP: 0010:bus_add_device+0x5b/0x150
>
>I suspect cxl_bus_type is not intialized yet. I think this should
>address it:

Yep, thanks.

>
>diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
>index 0faeb1ffc212..6fe00702327f 100644
>--- a/drivers/cxl/core/port.c
>+++ b/drivers/cxl/core/port.c
>@@ -2011,6 +2011,6 @@ static void cxl_core_exit(void)
>	debugfs_remove_recursive(cxl_debugfs);
> }
>
>-module_init(cxl_core_init);
>+subsys_initcall(cxl_core_init);
> module_exit(cxl_core_exit);
> MODULE_LICENSE("GPL v2");

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 10/18] cxl/region: Fix passthrough-decoder detection
  2023-02-06  5:38   ` Greg KH
@ 2023-02-06 17:22     ` Dan Williams
  0 siblings, 0 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-06 17:22 UTC (permalink / raw)
  To: Greg KH, Dan Williams
  Cc: linux-cxl, stable, dave.hansen, linux-mm, linux-acpi

Greg KH wrote:
> On Sun, Feb 05, 2023 at 05:03:24PM -0800, Dan Williams wrote:
> > A passthrough decoder is a decoder that maps only 1 target. It is a
> > special case because it does not impose any constraints on the
> > interleave-math as compared to a decoder with multiple targets. Extend
> > the passthrough case to multi-target-capable decoders that only have one
> > target selected. I.e. the current code was only considering passthrough
> > *ports* which are only a subset of the potential passthrough decoder
> > scenarios.
> > 
> > Fixes: e4f6dfa9ef75 ("cxl/region: Fix 'distance' calculation with passthrough ports")
> > Cc: <stable@vger.kernel.org>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> 
> If a patch really is a "fix" that needs to go to stable kernels, why is
> it commit 10 out of 18?  Why isn't it going to Linus now for inclusion
> in 6.2-final?  Does it depend on the 9 earlier patches in this series?

In this case the urgency is low since the only environment known to
produce this configuration is QEMU. However, there's now a more
important CXL fix pending for this week, so this one can tag along.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 17/18] dax: Assign RAM regions to memory-hotplug by default
  2023-02-06  1:04 ` [PATCH 17/18] dax: Assign RAM regions to memory-hotplug by default Dan Williams
@ 2023-02-06 17:26   ` Gregory Price
  2023-02-06 22:15     ` Dan Williams
  0 siblings, 1 reply; 112+ messages in thread
From: Gregory Price @ 2023-02-06 17:26 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, Michal Hocko, David Hildenbrand, Dave Hansen,
	linux-mm, linux-acpi

On Sun, Feb 05, 2023 at 05:04:05PM -0800, Dan Williams wrote:
> The default mode for device-dax instances is backwards for RAM-regions
> as evidenced by the fact that it tends to catch end users by surprise.
> "Where is my memory?". Recall that platforms are increasingly shipping
> with performance-differentiated memory pools beyond typical DRAM and
> NUMA effects. This includes HBM (high-bandwidth-memory) and CXL (dynamic
> interleave, varied media types, and future fabric attached
> possibilities).
> 
> For this reason the EFI_MEMORY_SP (EFI Special Purpose Memory => Linux
> 'Soft Reserved') attribute is expected to be applied to all memory-pools
> that are not the general purpose pool. This designation gives an
> Operating System a chance to defer usage of a memory pool until later in
> the boot process where its performance properties can be interrogated
> and administrator policy can be applied.
> 
> 'Soft Reserved' memory can be anything from too limited and precious to
> be part of the general purpose pool (HBM), too slow to host hot kernel
> data structures (some PMEM media), or anything in between. However, in
> the absence of an explicit policy, the memory should at least be made
> usable by default. The current device-dax default hides all
> non-general-purpose memory behind a device interface.
> 
> The expectation is that the distribution of users that want the memory
> online by default vs device-dedicated-access by default follows the
> Pareto principle. A small number of enlightened users may want to do
> userspace memory management through a device, but general users just
> want the kernel to make the memory available with an option to get more
> advanced later.
> 
> Arrange for all device-dax instances not backed by PMEM to default to
> attaching to the dax_kmem driver. From there the baseline memory hotplug
> policy (CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE / memhp_default_state=)
> gates whether the memory comes online or stays offline. Where, if it
> stays offline, it can be reliably converted back to device-mode where it
> can be partitioned, or fronted by a userspace allocator.
> 
> So, if someone wants device-dax instances for their 'Soft Reserved'
> memory:
> 
> 1/ Build a kernel with CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n or boot
>    with memhp_default_state=offline, or roll the dice and hope that the
>    kernel has not pinned a page in that memory before step 2.
> 
> 2/ Write a udev rule to convert the target dax device(s) from
>    'system-ram' mode to 'devdax' mode:
> 
>    daxctl reconfigure-device $dax -m devdax -f
> 
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Stupid question: when defaulting to online, do these devices get placed
into Zone Normal?  Is there a way for us, at a minimum, to online this
as Zone Moveable in an effort to assist the "hope the kernel has not
pinned a page" problem (and to try to keep kernel resources out of this
zone in general).

If this is covered by a different patch or already set up this way,
ignore me :]

~Gregory

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 1/18] cxl/Documentation: Update references to attributes added in v6.0
  2023-02-06  1:02 ` [PATCH 01/18] cxl/Documentation: Update references to attributes added in v6.0 Dan Williams
  2023-02-06 15:17   ` Jonathan Cameron
  2023-02-06 16:37   ` Gregory Price
@ 2023-02-06 17:27   ` Davidlohr Bueso
  2023-02-06 19:15   ` [PATCH 01/18] " Ira Weiny
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 112+ messages in thread
From: Davidlohr Bueso @ 2023-02-06 17:27 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

On Sun, 05 Feb 2023, Dan Williams wrote:

>Prior to Linus deciding that the kernel that following v5.19 would be
>v6.0, the CXL ABI documentation already referenced v5.20. In preparation
>for updating these entries update the kernel version to v6.0.
>
>Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

This one seems better for v6.2.

Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default
  2023-02-06  5:36 ` [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default Gregory Price
  2023-02-06 16:40   ` Davidlohr Bueso
@ 2023-02-06 17:29   ` Dan Williams
  2023-02-06 17:18     ` Davidlohr Bueso
  1 sibling, 1 reply; 112+ messages in thread
From: Dan Williams @ 2023-02-06 17:29 UTC (permalink / raw)
  To: Gregory Price, Dan Williams
  Cc: linux-cxl, David Hildenbrand, Kees Cook, stable, Dave Hansen,
	Michal Hocko, linux-mm, linux-acpi

Gregory Price wrote:
[..]
> Leverage the same QEMU branch, machine, and configuration as my prior
> tests, i'm now experiencing a kernel panic on boot.  Will debug a bit
> in the morning, but here is the stack trace i'm seeing
> 
> Saw this in both 1 and 2 root port configurations
> 
> (note: I also have the region reset issue previously discussed on top of
> your branch).  
> 
> QEMU configuration:
> 
> sudo /opt/qemu-cxl/bin/qemu-system-x86_64 \
> -drive file=/var/lib/libvirt/images/cxl.qcow2,format=qcow2,index=0,media=disk,id=hd \
> -m 2G,slots=4,maxmem=4G \
> -smp 4 \
> -machine type=q35,accel=kvm,cxl=on \
> -enable-kvm \
> -nographic \
> -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
> -device cxl-rp,id=rp0,bus=cxl.0,chassis=0,port=0,slot=0 \
> -object memory-backend-file,id=mem0,mem-path=/tmp/mem0,size=1G,share=true \
> -device cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0 \
> -M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=1G
> 
[..]
> [   15.162837] RIP: 0010:bus_add_device+0x5b/0x150

I suspect cxl_bus_type is not intialized yet. I think this should
address it:

diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 0faeb1ffc212..6fe00702327f 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -2011,6 +2011,6 @@ static void cxl_core_exit(void)
 	debugfs_remove_recursive(cxl_debugfs);
 }
 
-module_init(cxl_core_init);
+subsys_initcall(cxl_core_init);
 module_exit(cxl_core_exit);
 MODULE_LICENSE("GPL v2");

^ permalink raw reply related	[flat|nested] 112+ messages in thread

* Re: [PATCH 07/18] cxl/region: Move region-position validation to a helper
  2023-02-06  1:03 ` [PATCH 07/18] cxl/region: Move region-position validation to a helper Dan Williams
@ 2023-02-06 17:44   ` Ira Weiny
  2023-02-06 19:15     ` Dan Williams
  2023-02-08 12:30   ` Jonathan Cameron
  2023-02-09 19:45   ` Verma, Vishal L
  2 siblings, 1 reply; 112+ messages in thread
From: Ira Weiny @ 2023-02-06 17:44 UTC (permalink / raw)
  To: Dan Williams, linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi

Dan Williams wrote:
> In preparation for region autodiscovery, that needs all devices
> discovered before their relative position in the region can be
> determined, consolidate all position dependent validation in a helper.
> 
> Recall that in the on-demand region creation flow the end-user picks the
> position of a given endpoint decoder in a region. In the autodiscovery
> case the position of an endpoint decoder can only be determined after
> all other endpoint decoders that claim to decode the region's address
> range have been enumerated and attached. So, in the autodiscovery case
> endpoint decoders may be attached before their relative position is
> known. Once all decoders arrive, then positions can be determined and
> validated with cxl_region_validate_position() the same as user initiated
> on-demand creation.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/core/region.c |  119 +++++++++++++++++++++++++++++----------------
>  1 file changed, 76 insertions(+), 43 deletions(-)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 97eafdd75675..c82d3b6f3d1f 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1207,35 +1207,13 @@ static int cxl_region_setup_targets(struct cxl_region *cxlr)
>  	return 0;
>  }
>  

[snip]

> @@ -1274,6 +1252,71 @@ static int cxl_region_attach(struct cxl_region *cxlr,
>  		}
>  	}
>  
> +	return 0;
> +}
> +
> +static int cxl_region_attach_position(struct cxl_region *cxlr,
> +				      struct cxl_root_decoder *cxlrd,
> +				      struct cxl_endpoint_decoder *cxled,
> +				      const struct cxl_dport *dport, int pos)
> +{
> +	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> +	struct cxl_port *iter;
> +	int rc;
> +
> +	if (cxlrd->calc_hb(cxlrd, pos) != dport) {
> +		dev_dbg(&cxlr->dev, "%s:%s invalid target position for %s\n",
> +			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
> +			dev_name(&cxlrd->cxlsd.cxld.dev));
> +		return -ENXIO;
> +	}

I think I know the answer but I'm curious why this check is not part of
validating the position?  Is it because this is validating the position
relative to the hostbridge which is not strictly part of the region
validation?

Ira

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 02/18] cxl/region: Add a mode attribute for regions
  2023-02-06 15:46   ` Jonathan Cameron
@ 2023-02-06 17:47     ` Dan Williams
  0 siblings, 0 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-06 17:47 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

Jonathan Cameron wrote:
> On Sun, 05 Feb 2023 17:02:40 -0800
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > In preparation for a new region type, "ram" regions, add a mode
> > attribute to clarify the mode of the decoders that can be added to a
> > region. Share the internals of mode_show() (for decoders) with the
> > region case.
> > 
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> I guess the DEAD decoder cases is an odd enough path that it is
> fine to not worry that it is presented as mixed (none might be
> more appropriate as you can't put any memory behind it).

Yeah, I went back and forth on what's the best state to expose in the
small window between the transition to DEAD and the removal of the
attribute from sysfs. Settled on "mixed" as the best indicator of
"ambiguous". From an administrator point of the view the decoder is
still very much alive in hardware, but the kernel has lost track of its
setting.

> 
> Other than that this is straight forwards exposure of internal data
> that already existed...
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 03/18] cxl/region: Support empty uuids for non-pmem regions
  2023-02-06 15:54   ` Jonathan Cameron
@ 2023-02-06 18:07     ` Dan Williams
  0 siblings, 0 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-06 18:07 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

Jonathan Cameron wrote:
> On Sun, 05 Feb 2023 17:02:45 -0800
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > Shipping versions of the cxl-cli utility expect all regions to have a
> > 'uuid' attribute. In preparation for 'ram' regions, update the 'uuid'
> > attribute to return an empty string which satisfies the current
> > expectations of 'cxl list -R'. Otherwise, 'cxl list -R' fails in the
> > presence of regions with the 'uuid' attribute missing. Force the
> > attribute to be read-only as there is no facility or expectation for a
> > 'ram' region to recall its uuid from one boot to the next.
> 
> This is new functionality, so do we need the backwards compatibility?
> Or does that cxl list -R not just skip these, but fail hard?

It does this:

    # cxl list -R
    libcxl: __sysfs_device_parse: region4: add_dev() failed
      Warning: no matching devices found
    
    [
    ]

...which in hindsight is not a helpful message. However, as an end user
I hate forced upgrades, so a small kernel behavior change to extend the
useful life of an installed tool is worthwhile.

> It's inelegant to have this visible at all (and it confused me when
> I was running tests as I assumed I needed to poke something in it).

It is RO to indicate that, but I do notice I neglected to update
Documentation/ABI/testing/sysfs-bus-cxl with this behavior.

> 
> If nothing else, good to have a comment to remind us in the code
> why there is the oddity of setting it to read only.

Sure, I'll add that too.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 04/18] cxl/region: Validate region mode vs decoder mode
  2023-02-06 16:02   ` Jonathan Cameron
@ 2023-02-06 18:14     ` Dan Williams
  0 siblings, 0 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-06 18:14 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

Jonathan Cameron wrote:
> On Sun, 05 Feb 2023 17:02:51 -0800
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > In preparation for a new region mode, do not, for example, allow
> > 'ram' decoders to be assigned to 'pmem' regions and vice versa.
> > 
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> 
> I guess we aren't support mixed endpoint decoders for now (and
> maybe never..) If we did I'd expect this to have to allow
> mixed with either RAM or PMEM.

Mixed is a bug, or I otherwise can not think of a scenario where it is
valid. The kernel prevents it, but platform firmware can make
that mistake. I think if the kernel ever encounters that and works
around it that will come with a loud TAINT_FIRMWARE_WORKAROUND warning.

> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > ---
> >  drivers/cxl/core/region.c |    6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > index c9e7f05caa0f..53d6dbe4de6d 100644
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -1217,6 +1217,12 @@ static int cxl_region_attach(struct cxl_region *cxlr,
> >  	struct cxl_dport *dport;
> >  	int i, rc = -ENXIO;
> >  
> > +	if (cxled->mode != cxlr->mode) {
> > +		dev_dbg(&cxlr->dev, "%s region mode: %d mismatch: %d\n",
> > +			dev_name(&cxled->cxld.dev), cxlr->mode, cxled->mode);
> > +		return -EINVAL;
> > +	}
> > +
> >  	if (cxled->mode == CXL_DECODER_DEAD) {
> >  		dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev));
> >  		return -ENODEV;
> > 
> 



^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 05/18] cxl/region: Add volatile region creation support
  2023-02-06 16:18   ` Jonathan Cameron
@ 2023-02-06 18:19     ` Dan Williams
  0 siblings, 0 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-06 18:19 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

Jonathan Cameron wrote:
> On Sun, 05 Feb 2023 17:02:56 -0800
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > Expand the region creation infrastructure to enable 'ram'
> > (volatile-memory) regions. The internals of create_pmem_region_store()
> > and create_pmem_region_show() are factored out into helpers
> > __create_region() and __create_region_show() for the 'ram' case to
> > reuse.
> > 
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Entirely trivial comments inline.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
[..]
> > @@ -1727,29 +1762,37 @@ static ssize_t create_pmem_region_store(struct device *dev,
> >  {
> >  	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
> >  	struct cxl_region *cxlr;
> > -	int id, rc;
> > +	int rc, id;
> >  
> >  	rc = sscanf(buf, "region%d\n", &id);
> >  	if (rc != 1)
> >  		return -EINVAL;
> >  
> > -	rc = memregion_alloc(GFP_KERNEL);
> > -	if (rc < 0)
> > -		return rc;
> > +	cxlr = __create_region(cxlrd, CXL_DECODER_PMEM, id);
> > +	if (IS_ERR(cxlr))
> > +		return PTR_ERR(cxlr);
> 
> I'd have a blank line here - see below.
> 
[..]
> >  	if (IS_ERR(cxlr))
> >  		return PTR_ERR(cxlr);
> > -
> 
> Just so you know I read to the end!
> 
> Spurious unrelated white space change :)

...and I can't even blame that on clang-format! Will fix.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default
  2023-02-06 16:40   ` Davidlohr Bueso
@ 2023-02-06 18:23     ` Dan Williams
  0 siblings, 0 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-06 18:23 UTC (permalink / raw)
  To: Davidlohr Bueso, Gregory Price
  Cc: Dan Williams, linux-cxl, David Hildenbrand, Kees Cook, stable,
	Dave Hansen, Michal Hocko, linux-mm, linux-acpi

Davidlohr Bueso wrote:
> On Mon, 06 Feb 2023, Gregory Price wrote:
> 
> >On Sun, Feb 05, 2023 at 05:02:29PM -0800, Dan Williams wrote:
> >> Summary:
> >> --------
> >>
> >> CXL RAM support allows for the dynamic provisioning of new CXL RAM
> >> regions, and more routinely, assembling a region from an existing
> >> configuration established by platform-firmware. The latter is motivated
> >> by CXL memory RAS (Reliability, Availability and Serviceability)
> >> support, that requires associating device events with System Physical
> >> Address ranges and vice versa.
> >>
> >> The 'Soft Reserved' policy rework arranges for performance
> >> differentiated memory like CXL attached DRAM, or high-bandwidth memory,
> >> to be designated for 'System RAM' by default, rather than the device-dax
> >> dedicated access mode. That current device-dax default is confusing and
> >> surprising for the Pareto of users that do not expect memory to be
> >> quarantined for dedicated access by default. Most users expect all
> >> 'System RAM'-capable memory to show up in FREE(1).
> >
> >Leverage the same QEMU branch, machine, and configuration as my prior
> >tests, i'm now experiencing a kernel panic on boot.  Will debug a bit
> >in the morning, but here is the stack trace i'm seeing
> >
> >Saw this in both 1 and 2 root port configurations
> 
> I also see it in "regular" pmem setups, and narrowed it down to this
> change in the last patch:
> 
> -module_init(cxl_acpi_init);
> +/* load before dax_hmem sees 'Soft Reserved' CXL ranges */
> +subsys_initcall(cxl_acpi_init);

Yeah, I need to add some CONFIG_CXL_ACPI=y and CONFIG_CXL_BUS=y configs
to my tests. Typically I only run module builds.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 06/18] cxl/region: Refactor attach_target() for autodiscovery
  2023-02-06 17:06   ` Jonathan Cameron
@ 2023-02-06 18:48     ` Dan Williams
  0 siblings, 0 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-06 18:48 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

Jonathan Cameron wrote:
> On Sun, 05 Feb 2023 17:03:02 -0800
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > Region autodiscovery is the process of kernel creating 'struct
> > cxl_region' object to represent active CXL memory ranges it finds
> > already active in hardware when the driver loads. Typically this happens
> > when platform firmware establishes CXL memory regions and then publishes
> > them in the memory map. However, this can also happen in the case of
> > kexec-reboot after the kernel has created regions.
> > 
> > In the autodiscovery case the region creation process starts with a
> > known endpoint decoder. Refactor attach_target() into a helper that is
> > suitable to be called from either sysfs, for runtime region creation, or
> > from cxl_port_probe() after it has enumerated all endpoint decoders.
> > 
> > The cxl_port_probe() context is an async device-core probing context, so
> > it is not appropriate to allow SIGTERM to interrupt the assembly
> > process. Refactor attach_target() to take @cxled and @state as arguments
> > where @state indicates whether waiting from the region rwsem is
> > interruptible or not.
> 
[..]
> > @@ -1418,31 +1418,25 @@ void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled)
> >  	up_write(&cxl_region_rwsem);
> >  }
> >  
> > -static int attach_target(struct cxl_region *cxlr, const char *decoder, int pos)
> > +static int attach_target(struct cxl_region *cxlr,
> > +			 struct cxl_endpoint_decoder *cxled, int pos,
> > +			 unsigned int state)
> >  {
> > -	struct device *dev;
> > -	int rc;
> > -
> > -	dev = bus_find_device_by_name(&cxl_bus_type, NULL, decoder);
> > -	if (!dev)
> > -		return -ENODEV;
> > -
> > -	if (!is_endpoint_decoder(dev)) {
> > -		put_device(dev);
> > -		return -EINVAL;
> > -	}
> > +	int rc = 0;
> >  
> > -	rc = down_write_killable(&cxl_region_rwsem);
> > +	if (state == TASK_INTERRUPTIBLE)
> > +		rc = down_write_killable(&cxl_region_rwsem);
> > +	else
> > +		down_write(&cxl_region_rwsem);
> 
> I'd be tempted to do this in two hops for patch readability. First
> make the code reorg then follow up with this bit before the use
> of it in the next patch.
> 

Sure.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 11/18] cxl/region: Add region autodiscovery
  2023-02-06  1:03 ` [PATCH 11/18] cxl/region: Add region autodiscovery Dan Williams
@ 2023-02-06 19:02   ` Ira Weiny
  2023-02-07 23:54   ` Dave Jiang
  2023-02-08 17:07   ` Jonathan Cameron
  2 siblings, 0 replies; 112+ messages in thread
From: Ira Weiny @ 2023-02-06 19:02 UTC (permalink / raw)
  To: Dan Williams, linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi

Dan Williams wrote:

[snip]

> +
> +static int cmp_decode_pos(const void *a, const void *b)
> +{
> +	struct cxl_endpoint_decoder *cxled_a = *(typeof(cxled_a) *)a;
> +	struct cxl_endpoint_decoder *cxled_b = *(typeof(cxled_b) *)b;
> +	struct cxl_memdev *cxlmd_a = cxled_to_memdev(cxled_a);
> +	struct cxl_memdev *cxlmd_b = cxled_to_memdev(cxled_b);
> +	struct cxl_port *port_a = cxled_to_port(cxled_a);
> +	struct cxl_port *port_b = cxled_to_port(cxled_b);
> +	struct cxl_port *iter_a, *iter_b;
> +
> +	/* Exit early if any prior sorting failed */
> +	if (cxled_a->pos < 0 || cxled_b->pos < 0)
> +		return 0;
> +
> +	/*
> +	 * Walk up the hierarchy to find a shared port, find the decoder
> +	 * that maps the range, compare the relative position of those
> +	 * dport mappings.
> +	 */
> +	for (iter_a = port_a; iter_a; iter_a = next_port(iter_a)) {
> +		struct cxl_port *next_a, *next_b, *port;
> +		struct cxl_switch_decoder *cxlsd;
> +
> +		next_a = next_port(iter_a);
> +		for (iter_b = port_b; iter_b; iter_b = next_port(iter_b)) {
> +			int a_pos, b_pos, result;
> +			struct device *dev;
> +			unsigned int seq;
> +
> +			next_b = next_port(iter_b);
> +			if (next_a != next_b)
> +				continue;
> +			if (!next_a)
> +				goto out;

To me this check makes more sense before the inner loop.

> +			port = next_a;
> +			dev = device_find_child(&port->dev, cxled_a,
> +						decoder_match_range);
> +			if (!dev) {
> +				struct range *range = &cxled_a->cxld.hpa_range;
> +
> +				dev_err(port->uport,
> +					"failed to find decoder that maps %#llx-:%#llx\n",
> +					range->start, range->end);
> +				cxled_a->pos = -1;
> +				return 0;
> +			}
> +
> +			cxlsd = to_cxl_switch_decoder(dev);
> +			do {
> +				seq = read_seqbegin(&cxlsd->target_lock);
> +				find_positions(cxlsd, iter_a, iter_b, &a_pos,
> +					       &b_pos);
> +			} while (read_seqretry(&cxlsd->target_lock, seq));
> +
> +			if (a_pos < 0 || b_pos < 0) {
> +				dev_err(port->uport,
> +					"failed to find shared decoder for %s and %s\n",
> +					dev_name(cxlmd_a->dev.parent),
> +					dev_name(cxlmd_b->dev.parent));
> +				cxled_a->pos = -1;
> +				result = 0;
> +			} else {
> +				result = a_pos - b_pos;
> +				dev_dbg(port->uport, "%s: %s comes %s %s\n",
> +					dev_name(&cxlsd->cxld.dev),
> +					dev_name(cxlmd_a->dev.parent),
> +					result < 0 ? "before" : "after",
> +					dev_name(cxlmd_b->dev.parent));
> +			}
> +
> +			put_device(dev);
> +
> +			return result;
> +		}
> +	}
> +out:
> +	dev_err(cxlmd_a->dev.parent, "failed to find shared port with %s\n",
> +		dev_name(cxlmd_b->dev.parent));
> +	cxled_a->pos = -1;
> +	return 0;
> +}
> +

[snip]

> @@ -1500,8 +1766,8 @@ static int detach_target(struct cxl_region *cxlr, int pos)
>  	return rc;
>  }
>  
> -static size_t store_targetN(struct cxl_region *cxlr, const char *buf, int pos,
> -			    size_t len)
> +static ssize_t store_targetN(struct cxl_region *cxlr, const char *buf, int pos,
> +			     size_t len)

Is this a separate fix?

>  {
>  	int rc;
>  

[snip]

> +
> +/* Establish an empty region covering the given HPA range */
> +static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
> +					   struct cxl_endpoint_decoder *cxled)

Rather than a comment would this be better named construct_empty_region()?

Ira

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 17/18] dax: Assign RAM regions to memory-hotplug by default
  2023-02-06 22:15     ` Dan Williams
@ 2023-02-06 19:05       ` Gregory Price
  2023-02-06 23:20         ` Dan Williams
  0 siblings, 1 reply; 112+ messages in thread
From: Gregory Price @ 2023-02-06 19:05 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, Michal Hocko, David Hildenbrand, Dave Hansen,
	linux-mm, linux-acpi

On Mon, Feb 06, 2023 at 02:15:56PM -0800, Dan Williams wrote:
> Gregory Price wrote:
> > On Sun, Feb 05, 2023 at 05:04:05PM -0800, Dan Williams wrote:
> [..]
> > 
> > Stupid question: when defaulting to online, do these devices get placed
> > into Zone Normal?  Is there a way for us, at a minimum, to online this
> > as Zone Moveable in an effort to assist the "hope the kernel has not
> > pinned a page" problem (and to try to keep kernel resources out of this
> > zone in general).
> > 
> > If this is covered by a different patch or already set up this way,
> > ignore me :]
> 
> Have a look in Documentation/admin-guide/mm/memory-hotplug.rst, the
> 'daxctl recconfigure-device' man page and the the policy options of how
> hot-added memory is brought online.
> 
> The routing can be anything from fully offline device-dax, to fully
> online ZONE_NORMAL, or even a mix of device-dax subdivision,
> ZONE_NORMAL, and ZONE_MOVABLE memblocks all within one CXL ram region.

Hm.

I'm just thinking, for early expander devices the assumption that the
default behavior should be auto-online is good.  Just kinda working
through this in my head for multi-headed devices and early pools, and I
suppose the kernel paremeter covers that. Most of those will be used in
specialty, vertically integrated systems, so we're good to go.

In the future when we get fully featured DCD's, I imagine we will have a
memory region of size X, with N memory blocks, but only some of those
blocks are online.  I suppose it could also be done such that DCD capacity
add events drive the creation of new blocks under the region, as opposed
to having the region pre-create the blocks.

Either way I think the kernel parameter probably covers this case as
well, I'm just wondering if at some point the default will yet again
wish to be false, but i think that's a "CXL has become so successful all
memory is now CXL DCDs and computers look radically different" scenario.

Anyway, this seems good.

Reviewed-by: Gregory Price <gregory.price@memverge.com>

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 07/18] cxl/region: Move region-position validation to a helper
  2023-02-06 17:44   ` Ira Weiny
@ 2023-02-06 19:15     ` Dan Williams
  0 siblings, 0 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-06 19:15 UTC (permalink / raw)
  To: Ira Weiny, Dan Williams, linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi

Ira Weiny wrote:
> Dan Williams wrote:
> > In preparation for region autodiscovery, that needs all devices
> > discovered before their relative position in the region can be
> > determined, consolidate all position dependent validation in a helper.
> > 
> > Recall that in the on-demand region creation flow the end-user picks the
> > position of a given endpoint decoder in a region. In the autodiscovery
> > case the position of an endpoint decoder can only be determined after
> > all other endpoint decoders that claim to decode the region's address
> > range have been enumerated and attached. So, in the autodiscovery case
> > endpoint decoders may be attached before their relative position is
> > known. Once all decoders arrive, then positions can be determined and
> > validated with cxl_region_validate_position() the same as user initiated
> > on-demand creation.
> > 
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  drivers/cxl/core/region.c |  119 +++++++++++++++++++++++++++++----------------
> >  1 file changed, 76 insertions(+), 43 deletions(-)
> > 
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > index 97eafdd75675..c82d3b6f3d1f 100644
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -1207,35 +1207,13 @@ static int cxl_region_setup_targets(struct cxl_region *cxlr)
> >  	return 0;
> >  }
> >  
> 
> [snip]
> 
> > @@ -1274,6 +1252,71 @@ static int cxl_region_attach(struct cxl_region *cxlr,
> >  		}
> >  	}
> >  
> > +	return 0;
> > +}
> > +
> > +static int cxl_region_attach_position(struct cxl_region *cxlr,
> > +				      struct cxl_root_decoder *cxlrd,
> > +				      struct cxl_endpoint_decoder *cxled,
> > +				      const struct cxl_dport *dport, int pos)
> > +{
> > +	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> > +	struct cxl_port *iter;
> > +	int rc;
> > +
> > +	if (cxlrd->calc_hb(cxlrd, pos) != dport) {
> > +		dev_dbg(&cxlr->dev, "%s:%s invalid target position for %s\n",
> > +			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
> > +			dev_name(&cxlrd->cxlsd.cxld.dev));
> > +		return -ENXIO;
> > +	}
> 
> I think I know the answer but I'm curious why this check is not part of
> validating the position?  Is it because this is validating the position
> relative to the hostbridge which is not strictly part of the region
> validation?

cxl_region_validate_position() is just doing the basic checks like
preventing assigning 2 decoders to the same position, or assigning a
device twice to the same region.

The checks in cxl_region_attach_position() and cxl_port_attach_region()
are more about whether that device can be in that position relative to
the hardware topology.

I think this split makes more sense in the follow on patch where you see
that cxl_region_attach_position() is done after sorting while
cxl_region_attach_auto() replaces cxl_region_validate_position() since
the latter is validating user input and the former is reacting to what
platform-firmware already successfully programmed.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 01/18] cxl/Documentation: Update references to attributes added in v6.0
  2023-02-06  1:02 ` [PATCH 01/18] cxl/Documentation: Update references to attributes added in v6.0 Dan Williams
                     ` (2 preceding siblings ...)
  2023-02-06 17:27   ` [PATCH 1/18] " Davidlohr Bueso
@ 2023-02-06 19:15   ` Ira Weiny
  2023-02-06 21:04   ` Dave Jiang
  2023-02-09  0:20   ` Verma, Vishal L
  5 siblings, 0 replies; 112+ messages in thread
From: Ira Weiny @ 2023-02-06 19:15 UTC (permalink / raw)
  To: Dan Williams, linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi

Dan Williams wrote:
> Prior to Linus deciding that the kernel that following v5.19 would be
> v6.0, the CXL ABI documentation already referenced v5.20. In preparation
> for updating these entries update the kernel version to v6.0.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

> ---
>  Documentation/ABI/testing/sysfs-bus-cxl |   30 +++++++++++++++---------------
>  1 file changed, 15 insertions(+), 15 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 329a7e46c805..5be032313e29 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -198,7 +198,7 @@ Description:
>  
>  What:		/sys/bus/cxl/devices/endpointX/CDAT
>  Date:		July, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RO) If this sysfs entry is not present no DOE mailbox was
> @@ -209,7 +209,7 @@ Description:
>  
>  What:		/sys/bus/cxl/devices/decoderX.Y/mode
>  Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it
> @@ -229,7 +229,7 @@ Description:
>  
>  What:		/sys/bus/cxl/devices/decoderX.Y/dpa_resource
>  Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RO) When a CXL decoder is of devtype "cxl_decoder_endpoint",
> @@ -240,7 +240,7 @@ Description:
>  
>  What:		/sys/bus/cxl/devices/decoderX.Y/dpa_size
>  Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it
> @@ -260,7 +260,7 @@ Description:
>  
>  What:		/sys/bus/cxl/devices/decoderX.Y/interleave_ways
>  Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RO) The number of targets across which this decoder's host
> @@ -275,7 +275,7 @@ Description:
>  
>  What:		/sys/bus/cxl/devices/decoderX.Y/interleave_granularity
>  Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RO) The number of consecutive bytes of host physical address
> @@ -287,7 +287,7 @@ Description:
>  
>  What:		/sys/bus/cxl/devices/decoderX.Y/create_pmem_region
>  Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RW) Write a string in the form 'regionZ' to start the process
> @@ -303,7 +303,7 @@ Description:
>  
>  What:		/sys/bus/cxl/devices/decoderX.Y/delete_region
>  Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(WO) Write a string in the form 'regionZ' to delete that region,
> @@ -312,7 +312,7 @@ Description:
>  
>  What:		/sys/bus/cxl/devices/regionZ/uuid
>  Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RW) Write a unique identifier for the region. This field must
> @@ -322,7 +322,7 @@ Description:
>  
>  What:		/sys/bus/cxl/devices/regionZ/interleave_granularity
>  Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RW) Set the number of consecutive bytes each device in the
> @@ -333,7 +333,7 @@ Description:
>  
>  What:		/sys/bus/cxl/devices/regionZ/interleave_ways
>  Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RW) Configures the number of devices participating in the
> @@ -343,7 +343,7 @@ Description:
>  
>  What:		/sys/bus/cxl/devices/regionZ/size
>  Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RW) System physical address space to be consumed by the region.
> @@ -360,7 +360,7 @@ Description:
>  
>  What:		/sys/bus/cxl/devices/regionZ/resource
>  Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RO) A region is a contiguous partition of a CXL root decoder
> @@ -372,7 +372,7 @@ Description:
>  
>  What:		/sys/bus/cxl/devices/regionZ/target[0..N]
>  Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RW) Write an endpoint decoder object name to 'targetX' where X
> @@ -391,7 +391,7 @@ Description:
>  
>  What:		/sys/bus/cxl/devices/regionZ/commit
>  Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RW) Write a boolean 'true' string value to this attribute to
> 



^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 02/18] cxl/region: Add a mode attribute for regions
  2023-02-06  1:02 ` [PATCH 02/18] cxl/region: Add a mode attribute for regions Dan Williams
  2023-02-06 15:46   ` Jonathan Cameron
  2023-02-06 16:39   ` Gregory Price
@ 2023-02-06 19:16   ` Ira Weiny
  2023-02-06 21:05   ` Dave Jiang
  2023-02-09  0:22   ` Verma, Vishal L
  4 siblings, 0 replies; 112+ messages in thread
From: Ira Weiny @ 2023-02-06 19:16 UTC (permalink / raw)
  To: Dan Williams, linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi

Dan Williams wrote:
> In preparation for a new region type, "ram" regions, add a mode
> attribute to clarify the mode of the decoders that can be added to a
> region. Share the internals of mode_show() (for decoders) with the
> region case.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 03/18] cxl/region: Support empty uuids for non-pmem regions
  2023-02-06  1:02 ` [PATCH 03/18] cxl/region: Support empty uuids for non-pmem regions Dan Williams
  2023-02-06 15:54   ` Jonathan Cameron
@ 2023-02-06 19:22   ` Ira Weiny
  2023-02-06 19:35     ` Dan Williams
  2023-02-09  0:24   ` Verma, Vishal L
  2 siblings, 1 reply; 112+ messages in thread
From: Ira Weiny @ 2023-02-06 19:22 UTC (permalink / raw)
  To: Dan Williams, linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi

Dan Williams wrote:
> Shipping versions of the cxl-cli utility expect all regions to have a
> 'uuid' attribute. In preparation for 'ram' regions, update the 'uuid'
> attribute to return an empty string which satisfies the current
> expectations of 'cxl list -R'. Otherwise, 'cxl list -R' fails in the
> presence of regions with the 'uuid' attribute missing.

Would this be more appropriate as a change to cxl-cli?

> Force the
> attribute to be read-only as there is no facility or expectation for a
> 'ram' region to recall its uuid from one boot to the next.

This seems reasonable.

> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  Documentation/ABI/testing/sysfs-bus-cxl |    3 ++-
>  drivers/cxl/core/region.c               |    7 +++++--
>  2 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 058b0c45001f..4c4e1cbb1169 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -317,7 +317,8 @@ Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(RW) Write a unique identifier for the region. This field must
>  		be set for persistent regions and it must not conflict with the
> -		UUID of another region.
> +		UUID of another region. For volatile ram regions this
> +		attribute is a read-only empty string.
>  
>  
>  What:		/sys/bus/cxl/devices/regionZ/interleave_granularity
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 17d2d0c12725..c9e7f05caa0f 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -45,7 +45,10 @@ static ssize_t uuid_show(struct device *dev, struct device_attribute *attr,
>  	rc = down_read_interruptible(&cxl_region_rwsem);
>  	if (rc)
>  		return rc;
> -	rc = sysfs_emit(buf, "%pUb\n", &p->uuid);

I guess it all depends on what p->uuid is...  Shouldn't this be all 0's
for a ram region?  Does sysfs_emit() choke on that?

Ira

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 04/18] cxl/region: Validate region mode vs decoder mode
  2023-02-06  1:02 ` [PATCH 04/18] cxl/region: Validate region mode vs decoder mode Dan Williams
  2023-02-06 16:02   ` Jonathan Cameron
  2023-02-06 16:44   ` Gregory Price
@ 2023-02-06 19:23   ` Ira Weiny
  2023-02-06 22:16   ` Dave Jiang
  2023-02-09  0:25   ` Verma, Vishal L
  4 siblings, 0 replies; 112+ messages in thread
From: Ira Weiny @ 2023-02-06 19:23 UTC (permalink / raw)
  To: Dan Williams, linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi

Dan Williams wrote:
> In preparation for a new region mode, do not, for example, allow
> 'ram' decoders to be assigned to 'pmem' regions and vice versa.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 05/18] cxl/region: Add volatile region creation support
  2023-02-06  1:02 ` [PATCH 05/18] cxl/region: Add volatile region creation support Dan Williams
  2023-02-06 16:18   ` Jonathan Cameron
  2023-02-06 16:55   ` Gregory Price
@ 2023-02-06 19:25   ` Ira Weiny
  2023-02-06 22:31   ` Dave Jiang
  2023-02-09  1:02   ` Verma, Vishal L
  4 siblings, 0 replies; 112+ messages in thread
From: Ira Weiny @ 2023-02-06 19:25 UTC (permalink / raw)
  To: Dan Williams, linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi

Dan Williams wrote:
> Expand the region creation infrastructure to enable 'ram'
> (volatile-memory) regions. The internals of create_pmem_region_store()
> and create_pmem_region_show() are factored out into helpers
> __create_region() and __create_region_show() for the 'ram' case to
> reuse.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 06/18] cxl/region: Refactor attach_target() for autodiscovery
  2023-02-06  1:03 ` [PATCH 06/18] cxl/region: Refactor attach_target() for autodiscovery Dan Williams
  2023-02-06 17:06   ` Jonathan Cameron
@ 2023-02-06 19:26   ` Ira Weiny
  2023-02-06 22:41   ` Dave Jiang
  2023-02-09  1:09   ` Verma, Vishal L
  3 siblings, 0 replies; 112+ messages in thread
From: Ira Weiny @ 2023-02-06 19:26 UTC (permalink / raw)
  To: Dan Williams, linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi

Dan Williams wrote:
> Region autodiscovery is the process of kernel creating 'struct
> cxl_region' object to represent active CXL memory ranges it finds
> already active in hardware when the driver loads. Typically this happens
> when platform firmware establishes CXL memory regions and then publishes
> them in the memory map. However, this can also happen in the case of
> kexec-reboot after the kernel has created regions.
> 
> In the autodiscovery case the region creation process starts with a
> known endpoint decoder. Refactor attach_target() into a helper that is
> suitable to be called from either sysfs, for runtime region creation, or
> from cxl_port_probe() after it has enumerated all endpoint decoders.
> 
> The cxl_port_probe() context is an async device-core probing context, so
> it is not appropriate to allow SIGTERM to interrupt the assembly
> process. Refactor attach_target() to take @cxled and @state as arguments
> where @state indicates whether waiting from the region rwsem is
> interruptible or not.
> 
> No behavior change is intended.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 08/18] kernel/range: Uplevel the cxl subsystem's range_contains() helper
  2023-02-06  1:03 ` [PATCH 08/18] kernel/range: Uplevel the cxl subsystem's range_contains() helper Dan Williams
  2023-02-06 17:02   ` Gregory Price
@ 2023-02-06 19:28   ` Ira Weiny
  2023-02-06 23:41   ` Dave Jiang
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 112+ messages in thread
From: Ira Weiny @ 2023-02-06 19:28 UTC (permalink / raw)
  To: Dan Williams, linux-cxl; +Cc: Kees Cook, dave.hansen, linux-mm, linux-acpi

Dan Williams wrote:
> In support of the CXL subsystem's use of 'struct range' to track decode
> address ranges, add a common range_contains() implementation with
> identical semantics as resource_contains();
> 
> The existing 'range_contains()' in lib/stackinit_kunit.c is namespaced
> with a 'stackinit_' prefix.
> 
> Cc: Kees Cook <keescook@chromium.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 03/18] cxl/region: Support empty uuids for non-pmem regions
  2023-02-06 19:22   ` Ira Weiny
@ 2023-02-06 19:35     ` Dan Williams
  0 siblings, 0 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-06 19:35 UTC (permalink / raw)
  To: Ira Weiny, Dan Williams, linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi

Ira Weiny wrote:
> Dan Williams wrote:
> > Shipping versions of the cxl-cli utility expect all regions to have a
> > 'uuid' attribute. In preparation for 'ram' regions, update the 'uuid'
> > attribute to return an empty string which satisfies the current
> > expectations of 'cxl list -R'. Otherwise, 'cxl list -R' fails in the
> > presence of regions with the 'uuid' attribute missing.
> 
> Would this be more appropriate as a change to cxl-cli?

The point is already shipped cxl-cli can not be changed. So if the
kernel carries this workaround it will carry it forever even if
userspace updates.

Here is an illustration of the different update cadences of
distributions that ship ndctl / cxl-cli:

https://repology.org/project/ndctl/versions

> 
> > Force the
> > attribute to be read-only as there is no facility or expectation for a
> > 'ram' region to recall its uuid from one boot to the next.
> 
> This seems reasonable.
> 
> > 
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  Documentation/ABI/testing/sysfs-bus-cxl |    3 ++-
> >  drivers/cxl/core/region.c               |    7 +++++--
> >  2 files changed, 7 insertions(+), 3 deletions(-)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> > index 058b0c45001f..4c4e1cbb1169 100644
> > --- a/Documentation/ABI/testing/sysfs-bus-cxl
> > +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> > @@ -317,7 +317,8 @@ Contact:	linux-cxl@vger.kernel.org
> >  Description:
> >  		(RW) Write a unique identifier for the region. This field must
> >  		be set for persistent regions and it must not conflict with the
> > -		UUID of another region.
> > +		UUID of another region. For volatile ram regions this
> > +		attribute is a read-only empty string.
> >  
> >  
> >  What:		/sys/bus/cxl/devices/regionZ/interleave_granularity
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > index 17d2d0c12725..c9e7f05caa0f 100644
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -45,7 +45,10 @@ static ssize_t uuid_show(struct device *dev, struct device_attribute *attr,
> >  	rc = down_read_interruptible(&cxl_region_rwsem);
> >  	if (rc)
> >  		return rc;
> > -	rc = sysfs_emit(buf, "%pUb\n", &p->uuid);
> 
> I guess it all depends on what p->uuid is...  Shouldn't this be all 0's
> for a ram region?  Does sysfs_emit() choke on that?

...but the uuid isn't all zeros for ram-regions, it does not exist so an
empty string is more appropriate.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 04/18] cxl/region: Validate region mode vs decoder mode
  2023-02-06 21:51     ` Dan Williams
@ 2023-02-06 19:55       ` Gregory Price
  0 siblings, 0 replies; 112+ messages in thread
From: Gregory Price @ 2023-02-06 19:55 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

On Mon, Feb 06, 2023 at 01:51:18PM -0800, Dan Williams wrote:
> Gregory Price wrote:
> > On Sun, Feb 05, 2023 at 05:02:51PM -0800, Dan Williams wrote:
> > > In preparation for a new region mode, do not, for example, allow
> > > 'ram' decoders to be assigned to 'pmem' regions and vice versa.
> > > 
> > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > > ---
> > >  drivers/cxl/core/region.c |    6 ++++++
> > >  1 file changed, 6 insertions(+)
> > > 
[...]

sounds good, thanks for the explanation

Reviewed-by: Gregory Price <gregory.price@memverge.com>

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 05/18] cxl/region: Add volatile region creation support
  2023-02-06 21:57     ` Dan Williams
@ 2023-02-06 19:56       ` Gregory Price
  0 siblings, 0 replies; 112+ messages in thread
From: Gregory Price @ 2023-02-06 19:56 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

On Mon, Feb 06, 2023 at 01:57:05PM -0800, Dan Williams wrote:
> Gregory Price wrote:
> > On Sun, Feb 05, 2023 at 05:02:56PM -0800, Dan Williams wrote:
> > > Expand the region creation infrastructure to enable 'ram'
> > > (volatile-memory) regions. The internals of create_pmem_region_store()
> > > and create_pmem_region_show() are factored out into helpers
> > > __create_region() and __create_region_show() for the 'ram' case to
> > > reuse.
> > > 
> > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > > ---
> > >  
> [..]
> > > @@ -305,6 +306,13 @@ static bool can_create_pmem(struct cxl_root_decoder *cxlrd)
> > >  	return (cxlrd->cxlsd.cxld.flags & flags) == flags;
> > >  }
> > >  
> > > +static bool can_create_ram(struct cxl_root_decoder *cxlrd)
> > > +{
> > > +	unsigned long flags = CXL_DECODER_F_TYPE3 | CXL_DECODER_F_RAM;
> > > +
> > > +	return (cxlrd->cxlsd.cxld.flags & flags) == flags;
> > > +}
> > > +
> > 
> > does this collide with either CXL_DECODE_F_ENABLE or CXL_DECODER_F_AUTO?
> > 
> > I think obviously if it's already enabled creating new regions in a
> > decoder doesn't make sense, but if F_AUTO is set, does that imply
> > the region settings cannot be changed?
> 
> That just cares if the root decoder supports TYPE3 and RAM independent
> of ENABLE or AUTO. Root decoders are always enabled. The AUTO flag,
> which is not applicable to root decoders, is just there to hold off
> userspace racing the attachment of endpoint decoders to a region until
> the autodiscovery process has completed. Once that completes and the
> region has been enabled then it can be destroyed to clear AUTO.

Reviewed-by: Gregory Price <gregory.price@memverge.com>

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 01/18] cxl/Documentation: Update references to attributes added in v6.0
  2023-02-06  1:02 ` [PATCH 01/18] cxl/Documentation: Update references to attributes added in v6.0 Dan Williams
                     ` (3 preceding siblings ...)
  2023-02-06 19:15   ` [PATCH 01/18] " Ira Weiny
@ 2023-02-06 21:04   ` Dave Jiang
  2023-02-09  0:20   ` Verma, Vishal L
  5 siblings, 0 replies; 112+ messages in thread
From: Dave Jiang @ 2023-02-06 21:04 UTC (permalink / raw)
  To: Dan Williams, linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi



On 2/5/23 6:02 PM, Dan Williams wrote:
> Prior to Linus deciding that the kernel that following v5.19 would be
> v6.0, the CXL ABI documentation already referenced v5.20. In preparation
> for updating these entries update the kernel version to v6.0.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>   Documentation/ABI/testing/sysfs-bus-cxl |   30 +++++++++++++++---------------
>   1 file changed, 15 insertions(+), 15 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 329a7e46c805..5be032313e29 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -198,7 +198,7 @@ Description:
>   
>   What:		/sys/bus/cxl/devices/endpointX/CDAT
>   Date:		July, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>   Contact:	linux-cxl@vger.kernel.org
>   Description:
>   		(RO) If this sysfs entry is not present no DOE mailbox was
> @@ -209,7 +209,7 @@ Description:
>   
>   What:		/sys/bus/cxl/devices/decoderX.Y/mode
>   Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>   Contact:	linux-cxl@vger.kernel.org
>   Description:
>   		(RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it
> @@ -229,7 +229,7 @@ Description:
>   
>   What:		/sys/bus/cxl/devices/decoderX.Y/dpa_resource
>   Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>   Contact:	linux-cxl@vger.kernel.org
>   Description:
>   		(RO) When a CXL decoder is of devtype "cxl_decoder_endpoint",
> @@ -240,7 +240,7 @@ Description:
>   
>   What:		/sys/bus/cxl/devices/decoderX.Y/dpa_size
>   Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>   Contact:	linux-cxl@vger.kernel.org
>   Description:
>   		(RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it
> @@ -260,7 +260,7 @@ Description:
>   
>   What:		/sys/bus/cxl/devices/decoderX.Y/interleave_ways
>   Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>   Contact:	linux-cxl@vger.kernel.org
>   Description:
>   		(RO) The number of targets across which this decoder's host
> @@ -275,7 +275,7 @@ Description:
>   
>   What:		/sys/bus/cxl/devices/decoderX.Y/interleave_granularity
>   Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>   Contact:	linux-cxl@vger.kernel.org
>   Description:
>   		(RO) The number of consecutive bytes of host physical address
> @@ -287,7 +287,7 @@ Description:
>   
>   What:		/sys/bus/cxl/devices/decoderX.Y/create_pmem_region
>   Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>   Contact:	linux-cxl@vger.kernel.org
>   Description:
>   		(RW) Write a string in the form 'regionZ' to start the process
> @@ -303,7 +303,7 @@ Description:
>   
>   What:		/sys/bus/cxl/devices/decoderX.Y/delete_region
>   Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>   Contact:	linux-cxl@vger.kernel.org
>   Description:
>   		(WO) Write a string in the form 'regionZ' to delete that region,
> @@ -312,7 +312,7 @@ Description:
>   
>   What:		/sys/bus/cxl/devices/regionZ/uuid
>   Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>   Contact:	linux-cxl@vger.kernel.org
>   Description:
>   		(RW) Write a unique identifier for the region. This field must
> @@ -322,7 +322,7 @@ Description:
>   
>   What:		/sys/bus/cxl/devices/regionZ/interleave_granularity
>   Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>   Contact:	linux-cxl@vger.kernel.org
>   Description:
>   		(RW) Set the number of consecutive bytes each device in the
> @@ -333,7 +333,7 @@ Description:
>   
>   What:		/sys/bus/cxl/devices/regionZ/interleave_ways
>   Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>   Contact:	linux-cxl@vger.kernel.org
>   Description:
>   		(RW) Configures the number of devices participating in the
> @@ -343,7 +343,7 @@ Description:
>   
>   What:		/sys/bus/cxl/devices/regionZ/size
>   Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>   Contact:	linux-cxl@vger.kernel.org
>   Description:
>   		(RW) System physical address space to be consumed by the region.
> @@ -360,7 +360,7 @@ Description:
>   
>   What:		/sys/bus/cxl/devices/regionZ/resource
>   Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>   Contact:	linux-cxl@vger.kernel.org
>   Description:
>   		(RO) A region is a contiguous partition of a CXL root decoder
> @@ -372,7 +372,7 @@ Description:
>   
>   What:		/sys/bus/cxl/devices/regionZ/target[0..N]
>   Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>   Contact:	linux-cxl@vger.kernel.org
>   Description:
>   		(RW) Write an endpoint decoder object name to 'targetX' where X
> @@ -391,7 +391,7 @@ Description:
>   
>   What:		/sys/bus/cxl/devices/regionZ/commit
>   Date:		May, 2022
> -KernelVersion:	v5.20
> +KernelVersion:	v6.0
>   Contact:	linux-cxl@vger.kernel.org
>   Description:
>   		(RW) Write a boolean 'true' string value to this attribute to
> 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 02/18] cxl/region: Add a mode attribute for regions
  2023-02-06  1:02 ` [PATCH 02/18] cxl/region: Add a mode attribute for regions Dan Williams
                     ` (2 preceding siblings ...)
  2023-02-06 19:16   ` Ira Weiny
@ 2023-02-06 21:05   ` Dave Jiang
  2023-02-09  0:22   ` Verma, Vishal L
  4 siblings, 0 replies; 112+ messages in thread
From: Dave Jiang @ 2023-02-06 21:05 UTC (permalink / raw)
  To: Dan Williams, linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi



On 2/5/23 6:02 PM, Dan Williams wrote:
> In preparation for a new region type, "ram" regions, add a mode
> attribute to clarify the mode of the decoders that can be added to a
> region. Share the internals of mode_show() (for decoders) with the
> region case.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>   Documentation/ABI/testing/sysfs-bus-cxl |   11 +++++++++++
>   drivers/cxl/core/port.c                 |   12 +-----------
>   drivers/cxl/core/region.c               |   10 ++++++++++
>   drivers/cxl/cxl.h                       |   14 ++++++++++++++
>   4 files changed, 36 insertions(+), 11 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 5be032313e29..058b0c45001f 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -358,6 +358,17 @@ Description:
>   		results in the same address being allocated.
>   
>   
> +What:		/sys/bus/cxl/devices/regionZ/mode
> +Date:		January, 2023
> +KernelVersion:	v6.3
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(RO) The mode of a region is established at region creation time
> +		and dictates the mode of the endpoint decoder that comprise the
> +		region. For more details on the possible modes see
> +		/sys/bus/cxl/devices/decoderX.Y/mode
> +
> +
>   What:		/sys/bus/cxl/devices/regionZ/resource
>   Date:		May, 2022
>   KernelVersion:	v6.0
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 410c036c09fa..8566451cb22f 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -180,17 +180,7 @@ static ssize_t mode_show(struct device *dev, struct device_attribute *attr,
>   {
>   	struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
>   
> -	switch (cxled->mode) {
> -	case CXL_DECODER_RAM:
> -		return sysfs_emit(buf, "ram\n");
> -	case CXL_DECODER_PMEM:
> -		return sysfs_emit(buf, "pmem\n");
> -	case CXL_DECODER_NONE:
> -		return sysfs_emit(buf, "none\n");
> -	case CXL_DECODER_MIXED:
> -	default:
> -		return sysfs_emit(buf, "mixed\n");
> -	}
> +	return sysfs_emit(buf, "%s\n", cxl_decoder_mode_name(cxled->mode));
>   }
>   
>   static ssize_t mode_store(struct device *dev, struct device_attribute *attr,
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 60828d01972a..17d2d0c12725 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -458,6 +458,15 @@ static ssize_t resource_show(struct device *dev, struct device_attribute *attr,
>   }
>   static DEVICE_ATTR_RO(resource);
>   
> +static ssize_t mode_show(struct device *dev, struct device_attribute *attr,
> +			 char *buf)
> +{
> +	struct cxl_region *cxlr = to_cxl_region(dev);
> +
> +	return sysfs_emit(buf, "%s\n", cxl_decoder_mode_name(cxlr->mode));
> +}
> +static DEVICE_ATTR_RO(mode);
> +
>   static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size)
>   {
>   	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
> @@ -585,6 +594,7 @@ static struct attribute *cxl_region_attrs[] = {
>   	&dev_attr_interleave_granularity.attr,
>   	&dev_attr_resource.attr,
>   	&dev_attr_size.attr,
> +	&dev_attr_mode.attr,
>   	NULL,
>   };
>   
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index aa3af3bb73b2..ca76879af1de 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -320,6 +320,20 @@ enum cxl_decoder_mode {
>   	CXL_DECODER_DEAD,
>   };
>   
> +static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode)
> +{
> +	static const char * const names[] = {
> +		[CXL_DECODER_NONE] = "none",
> +		[CXL_DECODER_RAM] = "ram",
> +		[CXL_DECODER_PMEM] = "pmem",
> +		[CXL_DECODER_MIXED] = "mixed",
> +	};
> +
> +	if (mode >= CXL_DECODER_NONE && mode <= CXL_DECODER_MIXED)
> +		return names[mode];
> +	return "mixed";
> +}
> +
>   /**
>    * struct cxl_endpoint_decoder - Endpoint  / SPA to DPA decoder
>    * @cxld: base cxl_decoder_object
> 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 04/18] cxl/region: Validate region mode vs decoder mode
  2023-02-06 16:44   ` Gregory Price
@ 2023-02-06 21:51     ` Dan Williams
  2023-02-06 19:55       ` Gregory Price
  0 siblings, 1 reply; 112+ messages in thread
From: Dan Williams @ 2023-02-06 21:51 UTC (permalink / raw)
  To: Gregory Price, Dan Williams; +Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

Gregory Price wrote:
> On Sun, Feb 05, 2023 at 05:02:51PM -0800, Dan Williams wrote:
> > In preparation for a new region mode, do not, for example, allow
> > 'ram' decoders to be assigned to 'pmem' regions and vice versa.
> > 
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  drivers/cxl/core/region.c |    6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > index c9e7f05caa0f..53d6dbe4de6d 100644
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -1217,6 +1217,12 @@ static int cxl_region_attach(struct cxl_region *cxlr,
> >  	struct cxl_dport *dport;
> >  	int i, rc = -ENXIO;
> >  
> > +	if (cxled->mode != cxlr->mode) {
> > +		dev_dbg(&cxlr->dev, "%s region mode: %d mismatch: %d\n",
> > +			dev_name(&cxled->cxld.dev), cxlr->mode, cxled->mode);
> > +		return -EINVAL;
> > +	}
> > +
> >  	if (cxled->mode == CXL_DECODER_DEAD) {
> >  		dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev));
> >  		return -ENODEV;
> > 
> 
> 
> Maybe a stupid question.  It will be entirely possible to "convert" pmem
> to "ram" (i.e. just online it as system memory).

That's a good question, there are two mechanisms for this:

1/ Use ndctl to assign the resulting pmem namespace to devdax mode and
   then use dax_kmem to online that persistent memory.

2/ If the CXL device supports partitioning ram vs pmem capacity: tear down
   the pmem region ('cxl destroy-region ...'), repartition the pmem ('cxl
   set-partition -t ram ...'), create a new ram region ('cxl create-region
   ...').

> Are type-3 devices with pmem expected to carry 2 decoders (one ram, 1
> pmem), such that you can create a ram region from pmem, or is it expected
> that the pmem decoder will be used for pmem, even if it gets onlined
> as volatile?

Not sure there are any formal expectations when it comes to number of
decoders.

> Just asking because depending on how that flushes out, you might
> expected a "ram region" to get assigned to a "pmem decoder", unless
> you're expecting all pmem-carrying-devices to create 2 decoders per pmem
> region depending on how the end user intends to use it.

Per 2/ above even with a 1-decoder-device there is a spec-defined
mechanism to assign all of the device's capacity to ram.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 05/18] cxl/region: Add volatile region creation support
  2023-02-06 16:55   ` Gregory Price
@ 2023-02-06 21:57     ` Dan Williams
  2023-02-06 19:56       ` Gregory Price
  0 siblings, 1 reply; 112+ messages in thread
From: Dan Williams @ 2023-02-06 21:57 UTC (permalink / raw)
  To: Gregory Price, Dan Williams; +Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

Gregory Price wrote:
> On Sun, Feb 05, 2023 at 05:02:56PM -0800, Dan Williams wrote:
> > Expand the region creation infrastructure to enable 'ram'
> > (volatile-memory) regions. The internals of create_pmem_region_store()
> > and create_pmem_region_show() are factored out into helpers
> > __create_region() and __create_region_show() for the 'ram' case to
> > reuse.
> > 
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  
[..]
> > @@ -305,6 +306,13 @@ static bool can_create_pmem(struct cxl_root_decoder *cxlrd)
> >  	return (cxlrd->cxlsd.cxld.flags & flags) == flags;
> >  }
> >  
> > +static bool can_create_ram(struct cxl_root_decoder *cxlrd)
> > +{
> > +	unsigned long flags = CXL_DECODER_F_TYPE3 | CXL_DECODER_F_RAM;
> > +
> > +	return (cxlrd->cxlsd.cxld.flags & flags) == flags;
> > +}
> > +
> 
> does this collide with either CXL_DECODE_F_ENABLE or CXL_DECODER_F_AUTO?
> 
> I think obviously if it's already enabled creating new regions in a
> decoder doesn't make sense, but if F_AUTO is set, does that imply
> the region settings cannot be changed?

That just cares if the root decoder supports TYPE3 and RAM independent
of ENABLE or AUTO. Root decoders are always enabled. The AUTO flag,
which is not applicable to root decoders, is just there to hold off
userspace racing the attachment of endpoint decoders to a region until
the autodiscovery process has completed. Once that completes and the
region has been enabled then it can be destroyed to clear AUTO.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 08/18] kernel/range: Uplevel the cxl subsystem's range_contains() helper
  2023-02-06 17:02   ` Gregory Price
@ 2023-02-06 22:01     ` Dan Williams
  0 siblings, 0 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-06 22:01 UTC (permalink / raw)
  To: Gregory Price, Dan Williams
  Cc: linux-cxl, Kees Cook, dave.hansen, linux-mm, linux-acpi

Gregory Price wrote:
> On Sun, Feb 05, 2023 at 05:03:13PM -0800, Dan Williams wrote:
> > In support of the CXL subsystem's use of 'struct range' to track decode
> > address ranges, add a common range_contains() implementation with
> > identical semantics as resource_contains();
> > 
> > The existing 'range_contains()' in lib/stackinit_kunit.c is namespaced
> > with a 'stackinit_' prefix.
> > 
> > Cc: Kees Cook <keescook@chromium.org>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  drivers/cxl/core/pci.c |    5 -----
> >  include/linux/range.h  |    5 +++++
> >  lib/stackinit_kunit.c  |    6 +++---
> >  3 files changed, 8 insertions(+), 8 deletions(-)
> > 
> 
> Only question is whether this should be upstreamed separate/ahead since
> it's only tacitly related to the commit line, and the stackinit naming
> seems overly-broad enough to cause issues.

It's easier for me if it gets an ack and goes through the CXL tree.
Otherwise I need to delay the patch that introduces range_contains() as
a global symbol.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 17/18] dax: Assign RAM regions to memory-hotplug by default
  2023-02-06 17:26   ` Gregory Price
@ 2023-02-06 22:15     ` Dan Williams
  2023-02-06 19:05       ` Gregory Price
  0 siblings, 1 reply; 112+ messages in thread
From: Dan Williams @ 2023-02-06 22:15 UTC (permalink / raw)
  To: Gregory Price, Dan Williams
  Cc: linux-cxl, Michal Hocko, David Hildenbrand, Dave Hansen,
	linux-mm, linux-acpi

Gregory Price wrote:
> On Sun, Feb 05, 2023 at 05:04:05PM -0800, Dan Williams wrote:
[..]
> 
> Stupid question: when defaulting to online, do these devices get placed
> into Zone Normal?  Is there a way for us, at a minimum, to online this
> as Zone Moveable in an effort to assist the "hope the kernel has not
> pinned a page" problem (and to try to keep kernel resources out of this
> zone in general).
> 
> If this is covered by a different patch or already set up this way,
> ignore me :]

Have a look in Documentation/admin-guide/mm/memory-hotplug.rst, the
'daxctl recconfigure-device' man page and the the policy options of how
hot-added memory is brought online.

The routing can be anything from fully offline device-dax, to fully
online ZONE_NORMAL, or even a mix of device-dax subdivision,
ZONE_NORMAL, and ZONE_MOVABLE memblocks all within one CXL ram region.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 04/18] cxl/region: Validate region mode vs decoder mode
  2023-02-06  1:02 ` [PATCH 04/18] cxl/region: Validate region mode vs decoder mode Dan Williams
                     ` (2 preceding siblings ...)
  2023-02-06 19:23   ` Ira Weiny
@ 2023-02-06 22:16   ` Dave Jiang
  2023-02-09  0:25   ` Verma, Vishal L
  4 siblings, 0 replies; 112+ messages in thread
From: Dave Jiang @ 2023-02-06 22:16 UTC (permalink / raw)
  To: Dan Williams, linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi



On 2/5/23 6:02 PM, Dan Williams wrote:
> In preparation for a new region mode, do not, for example, allow
> 'ram' decoders to be assigned to 'pmem' regions and vice versa.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
>   drivers/cxl/core/region.c |    6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index c9e7f05caa0f..53d6dbe4de6d 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1217,6 +1217,12 @@ static int cxl_region_attach(struct cxl_region *cxlr,
>   	struct cxl_dport *dport;
>   	int i, rc = -ENXIO;
>   
> +	if (cxled->mode != cxlr->mode) {
> +		dev_dbg(&cxlr->dev, "%s region mode: %d mismatch: %d\n",
> +			dev_name(&cxled->cxld.dev), cxlr->mode, cxled->mode);
> +		return -EINVAL;
> +	}
> +
>   	if (cxled->mode == CXL_DECODER_DEAD) {
>   		dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev));
>   		return -ENODEV;
> 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 05/18] cxl/region: Add volatile region creation support
  2023-02-06  1:02 ` [PATCH 05/18] cxl/region: Add volatile region creation support Dan Williams
                     ` (2 preceding siblings ...)
  2023-02-06 19:25   ` Ira Weiny
@ 2023-02-06 22:31   ` Dave Jiang
  2023-02-06 22:37     ` Dan Williams
  2023-02-09  1:02   ` Verma, Vishal L
  4 siblings, 1 reply; 112+ messages in thread
From: Dave Jiang @ 2023-02-06 22:31 UTC (permalink / raw)
  To: Dan Williams, linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi



On 2/5/23 6:02 PM, Dan Williams wrote:
> Expand the region creation infrastructure to enable 'ram'
> (volatile-memory) regions. The internals of create_pmem_region_store()
> and create_pmem_region_show() are factored out into helpers
> __create_region() and __create_region_show() for the 'ram' case to
> reuse.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Minor comment below, otherwise:
Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>   Documentation/ABI/testing/sysfs-bus-cxl |   22 +++++-----
>   drivers/cxl/core/core.h                 |    1
>   drivers/cxl/core/port.c                 |   14 ++++++
>   drivers/cxl/core/region.c               |   71 +++++++++++++++++++++++++------
>   4 files changed, 82 insertions(+), 26 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 4c4e1cbb1169..3acf2f17a73f 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -285,20 +285,20 @@ Description:
>   		interleave_granularity).
>   
>   
> -What:		/sys/bus/cxl/devices/decoderX.Y/create_pmem_region
> -Date:		May, 2022
> -KernelVersion:	v6.0
> +What:		/sys/bus/cxl/devices/decoderX.Y/create_{pmem,ram}_region
> +Date:		May, 2022, January, 2023
> +KernelVersion:	v6.0 (pmem), v6.3 (ram)
>   Contact:	linux-cxl@vger.kernel.org
>   Description:
>   		(RW) Write a string in the form 'regionZ' to start the process
> -		of defining a new persistent memory region (interleave-set)
> -		within the decode range bounded by root decoder 'decoderX.Y'.
> -		The value written must match the current value returned from
> -		reading this attribute. An atomic compare exchange operation is
> -		done on write to assign the requested id to a region and
> -		allocate the region-id for the next creation attempt. EBUSY is
> -		returned if the region name written does not match the current
> -		cached value.
> +		of defining a new persistent, or volatile memory region
> +		(interleave-set) within the decode range bounded by root decoder
> +		'decoderX.Y'. The value written must match the current value
> +		returned from reading this attribute. An atomic compare exchange
> +		operation is done on write to assign the requested id to a
> +		region and allocate the region-id for the next creation attempt.
> +		EBUSY is returned if the region name written does not match the

-EBUSY?

DJ
> +		current cached value.
>   
>   
>   What:		/sys/bus/cxl/devices/decoderX.Y/delete_region
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 8c04672dca56..5eb873da5a30 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -11,6 +11,7 @@ extern struct attribute_group cxl_base_attribute_group;
>   
>   #ifdef CONFIG_CXL_REGION
>   extern struct device_attribute dev_attr_create_pmem_region;
> +extern struct device_attribute dev_attr_create_ram_region;
>   extern struct device_attribute dev_attr_delete_region;
>   extern struct device_attribute dev_attr_region;
>   extern const struct device_type cxl_pmem_region_type;
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 8566451cb22f..47e450c3a5a9 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -294,6 +294,7 @@ static struct attribute *cxl_decoder_root_attrs[] = {
>   	&dev_attr_cap_type3.attr,
>   	&dev_attr_target_list.attr,
>   	SET_CXL_REGION_ATTR(create_pmem_region)
> +	SET_CXL_REGION_ATTR(create_ram_region)
>   	SET_CXL_REGION_ATTR(delete_region)
>   	NULL,
>   };
> @@ -305,6 +306,13 @@ static bool can_create_pmem(struct cxl_root_decoder *cxlrd)
>   	return (cxlrd->cxlsd.cxld.flags & flags) == flags;
>   }
>   
> +static bool can_create_ram(struct cxl_root_decoder *cxlrd)
> +{
> +	unsigned long flags = CXL_DECODER_F_TYPE3 | CXL_DECODER_F_RAM;
> +
> +	return (cxlrd->cxlsd.cxld.flags & flags) == flags;
> +}
> +
>   static umode_t cxl_root_decoder_visible(struct kobject *kobj, struct attribute *a, int n)
>   {
>   	struct device *dev = kobj_to_dev(kobj);
> @@ -313,7 +321,11 @@ static umode_t cxl_root_decoder_visible(struct kobject *kobj, struct attribute *
>   	if (a == CXL_REGION_ATTR(create_pmem_region) && !can_create_pmem(cxlrd))
>   		return 0;
>   
> -	if (a == CXL_REGION_ATTR(delete_region) && !can_create_pmem(cxlrd))
> +	if (a == CXL_REGION_ATTR(create_ram_region) && !can_create_ram(cxlrd))
> +		return 0;
> +
> +	if (a == CXL_REGION_ATTR(delete_region) &&
> +	    !(can_create_pmem(cxlrd) || can_create_ram(cxlrd)))
>   		return 0;
>   
>   	return a->mode;
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 53d6dbe4de6d..8dea49c021b8 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1685,6 +1685,15 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
>   	struct device *dev;
>   	int rc;
>   
> +	switch (mode) {
> +	case CXL_DECODER_RAM:
> +	case CXL_DECODER_PMEM:
> +		break;
> +	default:
> +		dev_err(&cxlrd->cxlsd.cxld.dev, "unsupported mode %d\n", mode);
> +		return ERR_PTR(-EINVAL);
> +	}
> +
>   	cxlr = cxl_region_alloc(cxlrd, id);
>   	if (IS_ERR(cxlr))
>   		return cxlr;
> @@ -1713,12 +1722,38 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
>   	return ERR_PTR(rc);
>   }
>   
> +static ssize_t __create_region_show(struct cxl_root_decoder *cxlrd, char *buf)
> +{
> +	return sysfs_emit(buf, "region%u\n", atomic_read(&cxlrd->region_id));
> +}
> +
>   static ssize_t create_pmem_region_show(struct device *dev,
>   				       struct device_attribute *attr, char *buf)
>   {
> -	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
> +	return __create_region_show(to_cxl_root_decoder(dev), buf);
> +}
>   
> -	return sysfs_emit(buf, "region%u\n", atomic_read(&cxlrd->region_id));
> +static ssize_t create_ram_region_show(struct device *dev,
> +				      struct device_attribute *attr, char *buf)
> +{
> +	return __create_region_show(to_cxl_root_decoder(dev), buf);
> +}
> +
> +static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd,
> +					  enum cxl_decoder_mode mode, int id)
> +{
> +	int rc;
> +
> +	rc = memregion_alloc(GFP_KERNEL);
> +	if (rc < 0)
> +		return ERR_PTR(rc);
> +
> +	if (atomic_cmpxchg(&cxlrd->region_id, id, rc) != id) {
> +		memregion_free(rc);
> +		return ERR_PTR(-EBUSY);
> +	}
> +
> +	return devm_cxl_add_region(cxlrd, id, mode, CXL_DECODER_EXPANDER);
>   }
>   
>   static ssize_t create_pmem_region_store(struct device *dev,
> @@ -1727,29 +1762,37 @@ static ssize_t create_pmem_region_store(struct device *dev,
>   {
>   	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
>   	struct cxl_region *cxlr;
> -	int id, rc;
> +	int rc, id;
>   
>   	rc = sscanf(buf, "region%d\n", &id);
>   	if (rc != 1)
>   		return -EINVAL;
>   
> -	rc = memregion_alloc(GFP_KERNEL);
> -	if (rc < 0)
> -		return rc;
> +	cxlr = __create_region(cxlrd, CXL_DECODER_PMEM, id);
> +	if (IS_ERR(cxlr))
> +		return PTR_ERR(cxlr);
> +	return len;
> +}
> +DEVICE_ATTR_RW(create_pmem_region);
>   
> -	if (atomic_cmpxchg(&cxlrd->region_id, id, rc) != id) {
> -		memregion_free(rc);
> -		return -EBUSY;
> -	}
> +static ssize_t create_ram_region_store(struct device *dev,
> +				       struct device_attribute *attr,
> +				       const char *buf, size_t len)
> +{
> +	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
> +	struct cxl_region *cxlr;
> +	int rc, id;
>   
> -	cxlr = devm_cxl_add_region(cxlrd, id, CXL_DECODER_PMEM,
> -				   CXL_DECODER_EXPANDER);
> +	rc = sscanf(buf, "region%d\n", &id);
> +	if (rc != 1)
> +		return -EINVAL;
> +
> +	cxlr = __create_region(cxlrd, CXL_DECODER_RAM, id);
>   	if (IS_ERR(cxlr))
>   		return PTR_ERR(cxlr);
> -
>   	return len;
>   }
> -DEVICE_ATTR_RW(create_pmem_region);
> +DEVICE_ATTR_RW(create_ram_region);
>   
>   static ssize_t region_show(struct device *dev, struct device_attribute *attr,
>   			   char *buf)
> 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 05/18] cxl/region: Add volatile region creation support
  2023-02-06 22:31   ` Dave Jiang
@ 2023-02-06 22:37     ` Dan Williams
  0 siblings, 0 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-06 22:37 UTC (permalink / raw)
  To: Dave Jiang, Dan Williams, linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi

Dave Jiang wrote:
> 
> 
> On 2/5/23 6:02 PM, Dan Williams wrote:
> > Expand the region creation infrastructure to enable 'ram'
> > (volatile-memory) regions. The internals of create_pmem_region_store()
> > and create_pmem_region_show() are factored out into helpers
> > __create_region() and __create_region_show() for the 'ram' case to
> > reuse.
> > 
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> 
> Minor comment below, otherwise:
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> 
[..]
> > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> > index 4c4e1cbb1169..3acf2f17a73f 100644
> > --- a/Documentation/ABI/testing/sysfs-bus-cxl
> > +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> > @@ -285,20 +285,20 @@ Description:
> >   		interleave_granularity).
> >   
> >   
> > -What:		/sys/bus/cxl/devices/decoderX.Y/create_pmem_region
> > -Date:		May, 2022
> > -KernelVersion:	v6.0
> > +What:		/sys/bus/cxl/devices/decoderX.Y/create_{pmem,ram}_region
> > +Date:		May, 2022, January, 2023
> > +KernelVersion:	v6.0 (pmem), v6.3 (ram)
> >   Contact:	linux-cxl@vger.kernel.org
> >   Description:
> >   		(RW) Write a string in the form 'regionZ' to start the process
> > -		of defining a new persistent memory region (interleave-set)
> > -		within the decode range bounded by root decoder 'decoderX.Y'.
> > -		The value written must match the current value returned from
> > -		reading this attribute. An atomic compare exchange operation is
> > -		done on write to assign the requested id to a region and
> > -		allocate the region-id for the next creation attempt. EBUSY is
> > -		returned if the region name written does not match the current
> > -		cached value.
> > +		of defining a new persistent, or volatile memory region
> > +		(interleave-set) within the decode range bounded by root decoder
> > +		'decoderX.Y'. The value written must match the current value
> > +		returned from reading this attribute. An atomic compare exchange
> > +		operation is done on write to assign the requested id to a
> > +		region and allocate the region-id for the next creation attempt.
> > +		EBUSY is returned if the region name written does not match the
> 
> -EBUSY?

Userspace errno values are positive. So "$?" after "echo $region >
create_pmem_region" will be 16 not -16 in the busy case.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 06/18] cxl/region: Refactor attach_target() for autodiscovery
  2023-02-06  1:03 ` [PATCH 06/18] cxl/region: Refactor attach_target() for autodiscovery Dan Williams
  2023-02-06 17:06   ` Jonathan Cameron
  2023-02-06 19:26   ` Ira Weiny
@ 2023-02-06 22:41   ` Dave Jiang
  2023-02-09  1:09   ` Verma, Vishal L
  3 siblings, 0 replies; 112+ messages in thread
From: Dave Jiang @ 2023-02-06 22:41 UTC (permalink / raw)
  To: Dan Williams, linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi



On 2/5/23 6:03 PM, Dan Williams wrote:
> Region autodiscovery is the process of kernel creating 'struct
> cxl_region' object to represent active CXL memory ranges it finds
> already active in hardware when the driver loads. Typically this happens
> when platform firmware establishes CXL memory regions and then publishes
> them in the memory map. However, this can also happen in the case of
> kexec-reboot after the kernel has created regions.
> 
> In the autodiscovery case the region creation process starts with a
> known endpoint decoder. Refactor attach_target() into a helper that is
> suitable to be called from either sysfs, for runtime region creation, or
> from cxl_port_probe() after it has enumerated all endpoint decoders.
> 
> The cxl_port_probe() context is an async device-core probing context, so
> it is not appropriate to allow SIGTERM to interrupt the assembly
> process. Refactor attach_target() to take @cxled and @state as arguments
> where @state indicates whether waiting from the region rwsem is
> interruptible or not.
> 
> No behavior change is intended.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>   drivers/cxl/core/region.c |   47 +++++++++++++++++++++++++++------------------
>   1 file changed, 28 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 8dea49c021b8..97eafdd75675 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1418,31 +1418,25 @@ void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled)
>   	up_write(&cxl_region_rwsem);
>   }
>   
> -static int attach_target(struct cxl_region *cxlr, const char *decoder, int pos)
> +static int attach_target(struct cxl_region *cxlr,
> +			 struct cxl_endpoint_decoder *cxled, int pos,
> +			 unsigned int state)
>   {
> -	struct device *dev;
> -	int rc;
> -
> -	dev = bus_find_device_by_name(&cxl_bus_type, NULL, decoder);
> -	if (!dev)
> -		return -ENODEV;
> -
> -	if (!is_endpoint_decoder(dev)) {
> -		put_device(dev);
> -		return -EINVAL;
> -	}
> +	int rc = 0;
>   
> -	rc = down_write_killable(&cxl_region_rwsem);
> +	if (state == TASK_INTERRUPTIBLE)
> +		rc = down_write_killable(&cxl_region_rwsem);
> +	else
> +		down_write(&cxl_region_rwsem);
>   	if (rc)
> -		goto out;
> +		return rc;
> +
>   	down_read(&cxl_dpa_rwsem);
> -	rc = cxl_region_attach(cxlr, to_cxl_endpoint_decoder(dev), pos);
> +	rc = cxl_region_attach(cxlr, cxled, pos);
>   	if (rc == 0)
>   		set_bit(CXL_REGION_F_INCOHERENT, &cxlr->flags);
>   	up_read(&cxl_dpa_rwsem);
>   	up_write(&cxl_region_rwsem);
> -out:
> -	put_device(dev);
>   	return rc;
>   }
>   
> @@ -1480,8 +1474,23 @@ static size_t store_targetN(struct cxl_region *cxlr, const char *buf, int pos,
>   
>   	if (sysfs_streq(buf, "\n"))
>   		rc = detach_target(cxlr, pos);
> -	else
> -		rc = attach_target(cxlr, buf, pos);
> +	else {
> +		struct device *dev;
> +
> +		dev = bus_find_device_by_name(&cxl_bus_type, NULL, buf);
> +		if (!dev)
> +			return -ENODEV;
> +
> +		if (!is_endpoint_decoder(dev)) {
> +			rc = -EINVAL;
> +			goto out;
> +		}
> +
> +		rc = attach_target(cxlr, to_cxl_endpoint_decoder(dev), pos,
> +				   TASK_INTERRUPTIBLE);
> +out:
> +		put_device(dev);
> +	}
>   
>   	if (rc < 0)
>   		return rc;
> 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 17/18] dax: Assign RAM regions to memory-hotplug by default
  2023-02-06 19:05       ` Gregory Price
@ 2023-02-06 23:20         ` Dan Williams
  0 siblings, 0 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-06 23:20 UTC (permalink / raw)
  To: Gregory Price, Dan Williams
  Cc: linux-cxl, Michal Hocko, David Hildenbrand, Dave Hansen,
	linux-mm, linux-acpi

Gregory Price wrote:
> On Mon, Feb 06, 2023 at 02:15:56PM -0800, Dan Williams wrote:
> > Gregory Price wrote:
> > > On Sun, Feb 05, 2023 at 05:04:05PM -0800, Dan Williams wrote:
> > [..]
> > > 
> > > Stupid question: when defaulting to online, do these devices get placed
> > > into Zone Normal?  Is there a way for us, at a minimum, to online this
> > > as Zone Moveable in an effort to assist the "hope the kernel has not
> > > pinned a page" problem (and to try to keep kernel resources out of this
> > > zone in general).
> > > 
> > > If this is covered by a different patch or already set up this way,
> > > ignore me :]
> > 
> > Have a look in Documentation/admin-guide/mm/memory-hotplug.rst, the
> > 'daxctl recconfigure-device' man page and the the policy options of how
> > hot-added memory is brought online.
> > 
> > The routing can be anything from fully offline device-dax, to fully
> > online ZONE_NORMAL, or even a mix of device-dax subdivision,
> > ZONE_NORMAL, and ZONE_MOVABLE memblocks all within one CXL ram region.
> 
> Hm.
> 
> I'm just thinking, for early expander devices the assumption that the
> default behavior should be auto-online is good.  Just kinda working
> through this in my head for multi-headed devices and early pools, and I
> suppose the kernel paremeter covers that. Most of those will be used in
> specialty, vertically integrated systems, so we're good to go.
> 
> In the future when we get fully featured DCD's, I imagine we will have a
> memory region of size X, with N memory blocks, but only some of those
> blocks are online.  I suppose it could also be done such that DCD capacity
> add events drive the creation of new blocks under the region, as opposed
> to having the region pre-create the blocks.

In the case of DCD the proposal is "sparse" dax regions. So there will
be nothing to online by default until extents start arriving. I do
expect DCD systems will opt for memhp_default_state=offline so that
userspace policy can be applied.

> Either way I think the kernel parameter probably covers this case as
> well, I'm just wondering if at some point the default will yet again
> wish to be false, but i think that's a "CXL has become so successful all
> memory is now CXL DCDs and computers look radically different" scenario.

It's a distro policy. Fedora and RHEL, for example, pick different
defaults.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 08/18] kernel/range: Uplevel the cxl subsystem's range_contains() helper
  2023-02-06  1:03 ` [PATCH 08/18] kernel/range: Uplevel the cxl subsystem's range_contains() helper Dan Williams
  2023-02-06 17:02   ` Gregory Price
  2023-02-06 19:28   ` Ira Weiny
@ 2023-02-06 23:41   ` Dave Jiang
  2023-02-08 12:32   ` Jonathan Cameron
  2023-02-09 19:47   ` Verma, Vishal L
  4 siblings, 0 replies; 112+ messages in thread
From: Dave Jiang @ 2023-02-06 23:41 UTC (permalink / raw)
  To: Dan Williams, linux-cxl; +Cc: Kees Cook, dave.hansen, linux-mm, linux-acpi



On 2/5/23 6:03 PM, Dan Williams wrote:
> In support of the CXL subsystem's use of 'struct range' to track decode
> address ranges, add a common range_contains() implementation with
> identical semantics as resource_contains();
> 
> The existing 'range_contains()' in lib/stackinit_kunit.c is namespaced
> with a 'stackinit_' prefix.
> 
> Cc: Kees Cook <keescook@chromium.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

I can use this in my QTG series for DPA matching.
> ---
>   drivers/cxl/core/pci.c |    5 -----
>   include/linux/range.h  |    5 +++++
>   lib/stackinit_kunit.c  |    6 +++---
>   3 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 1d1492440287..9ed2120dbf8a 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -214,11 +214,6 @@ static int devm_cxl_enable_mem(struct device *host, struct cxl_dev_state *cxlds)
>   	return devm_add_action_or_reset(host, clear_mem_enable, cxlds);
>   }
>   
> -static bool range_contains(struct range *r1, struct range *r2)
> -{
> -	return r1->start <= r2->start && r1->end >= r2->end;
> -}
> -
>   /* require dvsec ranges to be covered by a locked platform window */
>   static int dvsec_range_allowed(struct device *dev, void *arg)
>   {
> diff --git a/include/linux/range.h b/include/linux/range.h
> index 274681cc3154..7efb6a9b069b 100644
> --- a/include/linux/range.h
> +++ b/include/linux/range.h
> @@ -13,6 +13,11 @@ static inline u64 range_len(const struct range *range)
>   	return range->end - range->start + 1;
>   }
>   
> +static inline bool range_contains(struct range *r1, struct range *r2)
> +{
> +	return r1->start <= r2->start && r1->end >= r2->end;
> +}
> +
>   int add_range(struct range *range, int az, int nr_range,
>   		u64 start, u64 end);
>   
> diff --git a/lib/stackinit_kunit.c b/lib/stackinit_kunit.c
> index 4591d6cf5e01..05947a2feb93 100644
> --- a/lib/stackinit_kunit.c
> +++ b/lib/stackinit_kunit.c
> @@ -31,8 +31,8 @@ static volatile u8 forced_mask = 0xff;
>   static void *fill_start, *target_start;
>   static size_t fill_size, target_size;
>   
> -static bool range_contains(char *haystack_start, size_t haystack_size,
> -			   char *needle_start, size_t needle_size)
> +static bool stackinit_range_contains(char *haystack_start, size_t haystack_size,
> +				     char *needle_start, size_t needle_size)
>   {
>   	if (needle_start >= haystack_start &&
>   	    needle_start + needle_size <= haystack_start + haystack_size)
> @@ -175,7 +175,7 @@ static noinline void test_ ## name (struct kunit *test)		\
>   								\
>   	/* Validate that compiler lined up fill and target. */	\
>   	KUNIT_ASSERT_TRUE_MSG(test,				\
> -		range_contains(fill_start, fill_size,		\
> +		stackinit_range_contains(fill_start, fill_size,	\
>   			    target_start, target_size),		\
>   		"stack fill missed target!? "			\
>   		"(fill %zu wide, target offset by %d)\n",	\
> 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 09/18] cxl/region: Enable CONFIG_CXL_REGION to be toggled
  2023-02-06  1:03 ` [PATCH 09/18] cxl/region: Enable CONFIG_CXL_REGION to be toggled Dan Williams
  2023-02-06 17:03   ` Gregory Price
@ 2023-02-06 23:57   ` Dave Jiang
  2023-02-08 12:36   ` Jonathan Cameron
  2023-02-09 20:17   ` Verma, Vishal L
  3 siblings, 0 replies; 112+ messages in thread
From: Dave Jiang @ 2023-02-06 23:57 UTC (permalink / raw)
  To: Dan Williams, linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi



On 2/5/23 6:03 PM, Dan Williams wrote:
> Add help text and a label so the CXL_REGION config option can be
> toggled. This is mainly to enable compile testing without region
> support.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>   drivers/cxl/Kconfig |   12 +++++++++++-
>   1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 0ac53c422c31..163c094e67ae 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -104,12 +104,22 @@ config CXL_SUSPEND
>   	depends on SUSPEND && CXL_MEM
>   
>   config CXL_REGION
> -	bool
> +	bool "CXL: Region Support"
>   	default CXL_BUS
>   	# For MAX_PHYSMEM_BITS
>   	depends on SPARSEMEM
>   	select MEMREGION
>   	select GET_FREE_REGION
> +	help
> +	  Enable the CXL core to enumerate and provision CXL regions. A CXL
> +	  region is defined by one or more CXL expanders that decode a given
> +	  system-physical address range. For CXL regions established by
> +	  platform-firmware this option enables memory error handling to
> +	  identify the devices participating in a given interleaved memory
> +	  range. Otherwise, platform-firmware managed CXL is enabled by being
> +	  placed in the system address map and does not need a driver.
> +
> +	  If unsure say 'y'
>   
>   config CXL_REGION_INVALIDATION_TEST
>   	bool "CXL: Region Cache Management Bypass (TEST)"
> 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 10/18] cxl/region: Fix passthrough-decoder detection
  2023-02-06  1:03 ` [PATCH 10/18] cxl/region: Fix passthrough-decoder detection Dan Williams
  2023-02-06  5:38   ` Greg KH
@ 2023-02-07  0:00   ` Dave Jiang
  2023-02-08 12:44   ` Jonathan Cameron
  2023-02-09 20:28   ` Verma, Vishal L
  3 siblings, 0 replies; 112+ messages in thread
From: Dave Jiang @ 2023-02-07  0:00 UTC (permalink / raw)
  To: Dan Williams, linux-cxl; +Cc: stable, dave.hansen, linux-mm, linux-acpi



On 2/5/23 6:03 PM, Dan Williams wrote:
> A passthrough decoder is a decoder that maps only 1 target. It is a
> special case because it does not impose any constraints on the
> interleave-math as compared to a decoder with multiple targets. Extend
> the passthrough case to multi-target-capable decoders that only have one
> target selected. I.e. the current code was only considering passthrough
> *ports* which are only a subset of the potential passthrough decoder
> scenarios.
> 
> Fixes: e4f6dfa9ef75 ("cxl/region: Fix 'distance' calculation with passthrough ports")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>   drivers/cxl/core/region.c |    4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index c82d3b6f3d1f..34cf95217901 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1019,10 +1019,10 @@ static int cxl_port_setup_targets(struct cxl_port *port,
>   		int i, distance;
>   
>   		/*
> -		 * Passthrough ports impose no distance requirements between
> +		 * Passthrough decoders impose no distance requirements between
>   		 * peers
>   		 */
> -		if (port->nr_dports == 1)
> +		if (cxl_rr->nr_targets == 1)
>   			distance = 0;
>   		else
>   			distance = p->nr_targets / cxl_rr->nr_targets;
> 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 11/18] cxl/region: Add region autodiscovery
  2023-02-06  1:03 ` [PATCH 11/18] cxl/region: Add region autodiscovery Dan Williams
  2023-02-06 19:02   ` Ira Weiny
@ 2023-02-07 23:54   ` Dave Jiang
  2023-02-08 17:07   ` Jonathan Cameron
  2 siblings, 0 replies; 112+ messages in thread
From: Dave Jiang @ 2023-02-07 23:54 UTC (permalink / raw)
  To: Dan Williams, linux-cxl; +Cc: dave.hansen, linux-mm, linux-acpi



On 2/5/23 6:03 PM, Dan Williams wrote:
> Region autodiscovery is an asynchrounous state machine advanced by
> cxl_port_probe(). After the decoders on an endpoint port are enumerated
> they are scanned for actively enabled instances. Each active decoder is
> flagged for auto-assembly CXL_DECODER_F_AUTO and attached to a region.
> If a region does not already exist for the address range setting of the
> decoder one is created. That creation process may race with other
> decoders of the same region being discovered since cxl_port_probe() is
> asynchronous. A new 'struct cxl_root_decoder' lock, @range_lock, is
> introduced to mitigate that race.
> 
> Once all decoders have arrived, "p->nr_targets == p->interleave_ways",
> they are sorted by their relative decode position. The sort algorithm
> involves finding the point in the cxl_port topology where one leg of the
> decode leads to deviceA and the other deviceB. At that point in the
> topology the target order in the 'struct cxl_switch_decoder' indicates
> the relative position of those endpoint decoders in the region.
> 
>>From that point the region goes through the same setup and validation
> steps as user-created regions, but instead of programming the decoders
> it validates that driver would have written the same values to the
> decoders as were already present.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

May want to run checkpatch on this one.

DJ

> ---
>   drivers/cxl/core/hdm.c    |    5
>   drivers/cxl/core/port.c   |    2
>   drivers/cxl/core/region.c |  496 ++++++++++++++++++++++++++++++++++++++++++++-
>   drivers/cxl/cxl.h         |   16 +
>   drivers/cxl/port.c        |   26 ++
>   5 files changed, 531 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index dcc16d7cb8f3..174cddfec6e8 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -674,7 +674,7 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld)
>   	up_read(&cxl_dpa_rwsem);
>   
>   	port->commit_end--;
> -	cxld->flags &= ~CXL_DECODER_F_ENABLE;
> +	cxld->flags &= ~(CXL_DECODER_F_ENABLE | CXL_DECODER_F_AUTO);
>   
>   	return 0;
>   }
> @@ -719,7 +719,7 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
>   
>   	/* decoders are enabled if committed */
>   	if (committed) {
> -		cxld->flags |= CXL_DECODER_F_ENABLE;
> +		cxld->flags |= CXL_DECODER_F_ENABLE | CXL_DECODER_F_AUTO;
>   		if (ctrl & CXL_HDM_DECODER0_CTRL_LOCK)
>   			cxld->flags |= CXL_DECODER_F_LOCK;
>   		if (FIELD_GET(CXL_HDM_DECODER0_CTRL_TYPE, ctrl))
> @@ -783,6 +783,7 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
>   		return rc;
>   	}
>   	*dpa_base += dpa_size + skip;
> +
>   	return 0;
>   }
>   
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 47e450c3a5a9..8130430ffbcf 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -446,6 +446,7 @@ bool is_endpoint_decoder(struct device *dev)
>   {
>   	return dev->type == &cxl_decoder_endpoint_type;
>   }
> +EXPORT_SYMBOL_NS_GPL(is_endpoint_decoder, CXL);
>   
>   bool is_root_decoder(struct device *dev)
>   {
> @@ -1622,6 +1623,7 @@ struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
>   	}
>   
>   	cxlrd->calc_hb = calc_hb;
> +	mutex_init(&cxlrd->range_lock);
>   
>   	cxld = &cxlsd->cxld;
>   	cxld->dev.type = &cxl_decoder_root_type;
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 34cf95217901..6fe8c70790df 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -6,6 +6,7 @@
>   #include <linux/module.h>
>   #include <linux/slab.h>
>   #include <linux/uuid.h>
> +#include <linux/sort.h>
>   #include <linux/idr.h>
>   #include <cxlmem.h>
>   #include <cxl.h>
> @@ -520,7 +521,12 @@ static void cxl_region_iomem_release(struct cxl_region *cxlr)
>   	if (device_is_registered(&cxlr->dev))
>   		lockdep_assert_held_write(&cxl_region_rwsem);
>   	if (p->res) {
> -		remove_resource(p->res);
> +		/*
> +		 * Autodiscovered regions may not have been able to insert their
> +		 * resource.
> +		 */
> +		if (p->res->parent)
> +			remove_resource(p->res);
>   		kfree(p->res);
>   		p->res = NULL;
>   	}
> @@ -1101,12 +1107,35 @@ static int cxl_port_setup_targets(struct cxl_port *port,
>   		return rc;
>   	}
>   
> -	cxld->interleave_ways = iw;
> -	cxld->interleave_granularity = ig;
> -	cxld->hpa_range = (struct range) {
> -		.start = p->res->start,
> -		.end = p->res->end,
> -	};
> +	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
> +		if (cxld->interleave_ways != iw ||
> +		    cxld->interleave_granularity != ig ||
> +		    cxld->hpa_range.start != p->res->start ||
> +		    cxld->hpa_range.end != p->res->end ||
> +		    ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) {
> +			dev_err(&cxlr->dev,
> +				"%s:%s %s expected iw: %d ig: %d %pr\n",
> +				dev_name(port->uport), dev_name(&port->dev),
> +				__func__, iw, ig, p->res);
> +			dev_err(&cxlr->dev,
> +				"%s:%s %s got iw: %d ig: %d state: %s %#llx:%#llx\n",
> +				dev_name(port->uport), dev_name(&port->dev),
> +				__func__, cxld->interleave_ways,
> +				cxld->interleave_granularity,
> +				(cxld->flags & CXL_DECODER_F_ENABLE) ?
> +					"enabled" :
> +					"disabled",
> +				cxld->hpa_range.start, cxld->hpa_range.end);
> +			return -ENXIO;
> +		}
> +	} else {
> +		cxld->interleave_ways = iw;
> +		cxld->interleave_granularity = ig;
> +		cxld->hpa_range = (struct range) {
> +			.start = p->res->start,
> +			.end = p->res->end,
> +		};
> +	}
>   	dev_dbg(&cxlr->dev, "%s:%s iw: %d ig: %d\n", dev_name(port->uport),
>   		dev_name(&port->dev), iw, ig);
>   add_target:
> @@ -1117,7 +1146,17 @@ static int cxl_port_setup_targets(struct cxl_port *port,
>   			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), pos);
>   		return -ENXIO;
>   	}
> -	cxlsd->target[cxl_rr->nr_targets_set] = ep->dport;
> +	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
> +		if (cxlsd->target[cxl_rr->nr_targets_set] != ep->dport) {
> +			dev_dbg(&cxlr->dev, "%s:%s: %s expected %s at %d\n",
> +				dev_name(port->uport), dev_name(&port->dev),
> +				dev_name(&cxlsd->cxld.dev),
> +				dev_name(ep->dport->dport),
> +				cxl_rr->nr_targets_set);
> +			return -ENXIO;
> +		}
> +	} else
> +		cxlsd->target[cxl_rr->nr_targets_set] = ep->dport;
>   	inc = 1;
>   out_target_set:
>   	cxl_rr->nr_targets_set += inc;
> @@ -1159,6 +1198,13 @@ static void cxl_region_teardown_targets(struct cxl_region *cxlr)
>   	struct cxl_ep *ep;
>   	int i;
>   
> +	/*
> +	 * In the auto-discovery case skip automatic teardown since the
> +	 * address space is already active
> +	 */
> +	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags))
> +		return;
> +
>   	for (i = 0; i < p->nr_targets; i++) {
>   		cxled = p->targets[i];
>   		cxlmd = cxled_to_memdev(cxled);
> @@ -1191,8 +1237,8 @@ static int cxl_region_setup_targets(struct cxl_region *cxlr)
>   			iter = to_cxl_port(iter->dev.parent);
>   
>   		/*
> -		 * Descend the topology tree programming targets while
> -		 * looking for conflicts.
> +		 * Descend the topology tree programming / validating
> +		 * targets while looking for conflicts.
>   		 */
>   		for (ep = cxl_ep_load(iter, cxlmd); iter;
>   		     iter = ep->next, ep = cxl_ep_load(iter, cxlmd)) {
> @@ -1287,6 +1333,182 @@ static int cxl_region_attach_position(struct cxl_region *cxlr,
>   	return rc;
>   }
>   
> +static int cxl_region_attach_auto(struct cxl_region *cxlr,
> +				  struct cxl_endpoint_decoder *cxled, int pos)
> +{
> +	struct cxl_region_params *p = &cxlr->params;
> +
> +	if (!(cxled->cxld.flags & CXL_DECODER_F_AUTO)) {
> +		dev_err(&cxlr->dev,
> +			"%s: unable to add decoder to autodetected region\n",
> +			dev_name(&cxled->cxld.dev));
> +		return -EINVAL;
> +	}
> +
> +	if (pos >= 0) {
> +		dev_dbg(&cxlr->dev, "%s: expected auto position, not %d\n",
> +			dev_name(&cxled->cxld.dev), pos);
> +		return -EINVAL;
> +	}
> +
> +	if (p->nr_targets >= p->interleave_ways) {
> +		dev_err(&cxlr->dev, "%s: no more target slots available\n",
> +			dev_name(&cxled->cxld.dev));
> +		return -ENXIO;
> +	}
> +
> +	/*
> +	 * Temporarily record the endpoint decoder into the target array. Yes,
> +	 * this means that userspace can view devices in the wrong position
> +	 * before the region activates, and must be careful to understand when
> +	 * it might be racing region autodiscovery.
> +	 */
> +	pos = p->nr_targets;
> +	p->targets[pos] = cxled;
> +	cxled->pos = pos;
> +	p->nr_targets++;
> +
> +	return 0;
> +}
> +
> +static struct cxl_port *next_port(struct cxl_port *port)
> +{
> +	if (!port->parent_dport)
> +		return NULL;
> +	return port->parent_dport->port;
> +}
> +
> +static int decoder_match_range(struct device *dev, void *data)
> +{
> +	struct cxl_endpoint_decoder *cxled = data;
> +	struct cxl_switch_decoder *cxlsd;
> +
> +	if (!is_switch_decoder(dev))
> +		return 0;
> +
> +	cxlsd = to_cxl_switch_decoder(dev);
> +	return range_contains(&cxlsd->cxld.hpa_range, &cxled->cxld.hpa_range);
> +}
> +
> +static void find_positions(const struct cxl_switch_decoder *cxlsd,
> +			   const struct cxl_port *iter_a,
> +			   const struct cxl_port *iter_b, int *a_pos,
> +			   int *b_pos)
> +{
> +	int i;
> +
> +	for (i = 0, *a_pos = -1, *b_pos = -1; i < cxlsd->nr_targets; i++) {
> +		if (cxlsd->target[i] == iter_a->parent_dport)
> +			*a_pos = i;
> +		else if (cxlsd->target[i] == iter_b->parent_dport)
> +			*b_pos = i;
> +		if (*a_pos >= 0 && *b_pos >= 0)
> +			break;
> +	}
> +}
> +
> +static int cmp_decode_pos(const void *a, const void *b)
> +{
> +	struct cxl_endpoint_decoder *cxled_a = *(typeof(cxled_a) *)a;
> +	struct cxl_endpoint_decoder *cxled_b = *(typeof(cxled_b) *)b;
> +	struct cxl_memdev *cxlmd_a = cxled_to_memdev(cxled_a);
> +	struct cxl_memdev *cxlmd_b = cxled_to_memdev(cxled_b);
> +	struct cxl_port *port_a = cxled_to_port(cxled_a);
> +	struct cxl_port *port_b = cxled_to_port(cxled_b);
> +	struct cxl_port *iter_a, *iter_b;
> +
> +	/* Exit early if any prior sorting failed */
> +	if (cxled_a->pos < 0 || cxled_b->pos < 0)
> +		return 0;
> +
> +	/*
> +	 * Walk up the hierarchy to find a shared port, find the decoder
> +	 * that maps the range, compare the relative position of those
> +	 * dport mappings.
> +	 */
> +	for (iter_a = port_a; iter_a; iter_a = next_port(iter_a)) {
> +		struct cxl_port *next_a, *next_b, *port;
> +		struct cxl_switch_decoder *cxlsd;
> +
> +		next_a = next_port(iter_a);
> +		for (iter_b = port_b; iter_b; iter_b = next_port(iter_b)) {
> +			int a_pos, b_pos, result;
> +			struct device *dev;
> +			unsigned int seq;
> +
> +			next_b = next_port(iter_b);
> +			if (next_a != next_b)
> +				continue;
> +			if (!next_a)
> +				goto out;
> +			port = next_a;
> +			dev = device_find_child(&port->dev, cxled_a,
> +						decoder_match_range);
> +			if (!dev) {
> +				struct range *range = &cxled_a->cxld.hpa_range;
> +
> +				dev_err(port->uport,
> +					"failed to find decoder that maps %#llx-:%#llx\n",
> +					range->start, range->end);
> +				cxled_a->pos = -1;
> +				return 0;
> +			}
> +
> +			cxlsd = to_cxl_switch_decoder(dev);
> +			do {
> +				seq = read_seqbegin(&cxlsd->target_lock);
> +				find_positions(cxlsd, iter_a, iter_b, &a_pos,
> +					       &b_pos);
> +			} while (read_seqretry(&cxlsd->target_lock, seq));
> +
> +			if (a_pos < 0 || b_pos < 0) {
> +				dev_err(port->uport,
> +					"failed to find shared decoder for %s and %s\n",
> +					dev_name(cxlmd_a->dev.parent),
> +					dev_name(cxlmd_b->dev.parent));
> +				cxled_a->pos = -1;
> +				result = 0;
> +			} else {
> +				result = a_pos - b_pos;
> +				dev_dbg(port->uport, "%s: %s comes %s %s\n",
> +					dev_name(&cxlsd->cxld.dev),
> +					dev_name(cxlmd_a->dev.parent),
> +					result < 0 ? "before" : "after",
> +					dev_name(cxlmd_b->dev.parent));
> +			}
> +
> +			put_device(dev);
> +
> +			return result;
> +		}
> +	}
> +out:
> +	dev_err(cxlmd_a->dev.parent, "failed to find shared port with %s\n",
> +		dev_name(cxlmd_b->dev.parent));
> +	cxled_a->pos = -1;
> +	return 0;
> +}
> +
> +static int cxl_region_sort_targets(struct cxl_region *cxlr)
> +{
> +	struct cxl_region_params *p = &cxlr->params;
> +	int i, rc = 0;
> +
> +	sort(p->targets, p->nr_targets, sizeof(p->targets[0]), cmp_decode_pos,
> +	     NULL);
> +
> +	for (i = 0; i < p->nr_targets; i++) {
> +		struct cxl_endpoint_decoder *cxled = p->targets[i];
> +
> +		if (cxled->pos < 0)
> +			rc = -ENXIO;
> +		cxled->pos = i;
> +	}
> +
> +	dev_dbg(&cxlr->dev, "region sort %s\n", rc ? "failed" : "successful");
> +	return rc;
> +}
> +
>   static int cxl_region_attach(struct cxl_region *cxlr,
>   			     struct cxl_endpoint_decoder *cxled, int pos)
>   {
> @@ -1350,6 +1572,50 @@ static int cxl_region_attach(struct cxl_region *cxlr,
>   		return -EINVAL;
>   	}
>   
> +	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
> +		int i;
> +
> +		rc = cxl_region_attach_auto(cxlr, cxled, pos);
> +		if (rc)
> +			return rc;
> +
> +		/* await more targets to arrive... */
> +		if (p->nr_targets < p->interleave_ways)
> +			return 0;
> +
> +		/*
> +		 * All targets are here, which implies all PCI enumeration that
> +		 * affects this region has been completed. Walk the topology to
> +		 * sort the devices into their relative region decode position.
> +		 */
> +		rc = cxl_region_sort_targets(cxlr);
> +		if (rc)
> +			return rc;
> +
> +		for (i = 0; i < p->nr_targets; i++) {
> +			cxled = p->targets[i];
> +			ep_port = cxled_to_port(cxled);
> +			dport = cxl_find_dport_by_dev(root_port,
> +						      ep_port->host_bridge);
> +			rc = cxl_region_attach_position(cxlr, cxlrd, cxled,
> +							dport, i);
> +			if (rc)
> +				return rc;
> +		}
> +
> +		rc = cxl_region_setup_targets(cxlr);
> +		if (rc)
> +			return rc;
> +
> +		/*
> +		 * If target setup succeeds in the autodiscovery case
> +		 * then the region is already committed.
> +		 */
> +		p->state = CXL_CONFIG_COMMIT;
> +
> +		return 0;
> +	}
> +
>   	rc = cxl_region_validate_position(cxlr, cxled, pos);
>   	if (rc)
>   		return rc;
> @@ -1500,8 +1766,8 @@ static int detach_target(struct cxl_region *cxlr, int pos)
>   	return rc;
>   }
>   
> -static size_t store_targetN(struct cxl_region *cxlr, const char *buf, int pos,
> -			    size_t len)
> +static ssize_t store_targetN(struct cxl_region *cxlr, const char *buf, int pos,
> +			     size_t len)
>   {
>   	int rc;
>   
> @@ -2079,6 +2345,191 @@ static int devm_cxl_add_pmem_region(struct cxl_region *cxlr)
>   	return rc;
>   }
>   
> +static int match_decoder_by_range(struct device *dev, void *data)
> +{
> +	struct range *r1, *r2 = data;
> +	struct cxl_root_decoder *cxlrd;
> +
> +	if (!is_root_decoder(dev))
> +		return 0;
> +
> +	cxlrd = to_cxl_root_decoder(dev);
> +	r1 = &cxlrd->cxlsd.cxld.hpa_range;
> +	return range_contains(r1, r2);
> +}
> +
> +static int match_region_by_range(struct device *dev, void *data)
> +{
> +	struct cxl_region_params *p;
> +	struct cxl_region *cxlr;
> +	struct range *r = data;
> +	int rc = 0;
> +
> +	if (!is_cxl_region(dev))
> +		return 0;
> +
> +	cxlr = to_cxl_region(dev);
> +	p = &cxlr->params;
> +
> +	down_read(&cxl_region_rwsem);
> +	if (p->res && p->res->start == r->start && p->res->end == r->end)
> +		rc = 1;
> +	up_read(&cxl_region_rwsem);
> +
> +	return rc;
> +}
> +
> +/* Establish an empty region covering the given HPA range */
> +static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
> +					   struct cxl_endpoint_decoder *cxled)
> +{
> +	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> +	struct cxl_port *port = cxlrd_to_port(cxlrd);
> +	struct range *hpa = &cxled->cxld.hpa_range;
> +	struct cxl_region_params *p;
> +	struct cxl_region *cxlr;
> +	struct resource *res;
> +	int rc;
> +
> +	do {
> +		cxlr = __create_region(cxlrd, cxled->mode,
> +				       atomic_read(&cxlrd->region_id));
> +	} while (IS_ERR(cxlr) && PTR_ERR(cxlr) == -EBUSY);
> +
> +	if (IS_ERR(cxlr)) {
> +		dev_err(cxlmd->dev.parent,
> +			"%s:%s: %s failed assign region: %ld\n",
> +			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
> +			__func__, PTR_ERR(cxlr));
> +		return cxlr;
> +	}
> +
> +	down_write(&cxl_region_rwsem);
> +	p = &cxlr->params;
> +	if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE) {
> +		dev_err(cxlmd->dev.parent,
> +			"%s:%s: %s autodiscovery interrupted\n",
> +			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
> +			__func__);
> +		rc = -EBUSY;
> +		goto err;
> +	}
> +
> +	set_bit(CXL_REGION_F_AUTO, &cxlr->flags);
> +
> +	res = kmalloc(sizeof(*res), GFP_KERNEL);
> +	if (!res) {
> +		rc = -ENOMEM;
> +		goto err;
> +	}
> +
> +	*res = DEFINE_RES_MEM_NAMED(hpa->start, range_len(hpa),
> +				    dev_name(&cxlr->dev));
> +	rc = insert_resource(cxlrd->res, res);
> +	if (rc) {
> +		/*
> +		 * Platform-firmware may not have split resources like "System
> +		 * RAM" on CXL window boundaries see cxl_region_iomem_release()
> +		 */
> +		dev_warn(cxlmd->dev.parent,
> +			 "%s:%s: %s %s cannot insert resource\n",
> +			 dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
> +			 __func__, dev_name(&cxlr->dev));
> +	}
> +
> +	p->res = res;
> +	p->interleave_ways = cxled->cxld.interleave_ways;
> +	p->interleave_granularity = cxled->cxld.interleave_granularity;
> +	p->state = CXL_CONFIG_INTERLEAVE_ACTIVE;
> +
> +	rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_target_group());
> +	if (rc)
> +		goto err;
> +
> +	dev_dbg(cxlmd->dev.parent, "%s:%s: %s %s res: %pr iw: %d ig: %d\n",
> +		dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), __func__,
> +		dev_name(&cxlr->dev), p->res, p->interleave_ways,
> +		p->interleave_granularity);
> +
> +	/* ...to match put_device() in cxl_add_to_region() */
> +	get_device(&cxlr->dev);
> +	up_write(&cxl_region_rwsem);
> +
> +	return cxlr;
> +
> +err:
> +	up_write(&cxl_region_rwsem);
> +	devm_release_action(port->uport, unregister_region, cxlr);
> +	return ERR_PTR(rc);
> +}
> +
> +void cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
> +{
> +	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> +	struct range *hpa = &cxled->cxld.hpa_range;
> +	struct cxl_decoder *cxld = &cxled->cxld;
> +	struct cxl_root_decoder *cxlrd;
> +	struct cxl_region_params *p;
> +	struct cxl_region *cxlr;
> +	struct cxl_port *root;
> +	bool attach = false;
> +	struct device *dev;
> +
> +	root = find_cxl_root(&cxlmd->dev);
> +	if (!root) {
> +		dev_err(cxlmd->dev.parent, "%s: failed to map CXL root\n",
> +			dev_name(&cxlmd->dev));
> +		return;
> +	}
> +
> +	dev = device_find_child(&root->dev, &cxld->hpa_range,
> +				  match_decoder_by_range);
> +	if (!dev) {
> +		dev_err(cxlmd->dev.parent,
> +			"%s:%s no CXL window for range %#llx:%#llx\n",
> +			dev_name(&cxlmd->dev), dev_name(&cxld->dev),
> +			cxld->hpa_range.start, cxld->hpa_range.end);
> +		goto out;
> +	}
> +	cxlrd = to_cxl_root_decoder(dev);
> +
> +	mutex_lock(&cxlrd->range_lock);
> +	dev = device_find_child(&cxlrd->cxlsd.cxld.dev, hpa,
> +				match_region_by_range);
> +	if (!dev)
> +		cxlr = construct_region(cxlrd, cxled);
> +	else
> +		cxlr = to_cxl_region(dev);
> +	mutex_unlock(&cxlrd->range_lock);
> +
> +	if (IS_ERR(cxlr))
> +		goto out;
> +
> +	attach_target(cxlr, cxled, -1, TASK_UNINTERRUPTIBLE);
> +
> +	down_read(&cxl_region_rwsem);
> +	p = &cxlr->params;
> +	attach = p->state >= CXL_CONFIG_COMMIT;
> +	up_read(&cxl_region_rwsem);
> +
> +	if (attach) {
> +		int rc = device_attach(&cxlr->dev);
> +
> +		/*
> +		 * If device_attach() fails the range may still be
> +		 * active via the platform-firmware memory map
> +		 */
> +		if (rc < 0)
> +			dev_err(&cxlr->dev, "failed to enable, range: %pr\n",
> +				p->res);
> +	}
> +
> +	put_device(&cxlr->dev);
> +out:
> +	put_device(&root->dev);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_add_to_region, CXL);
> +
>   static int cxl_region_invalidate_memregion(struct cxl_region *cxlr)
>   {
>   	if (!test_bit(CXL_REGION_F_INCOHERENT, &cxlr->flags))
> @@ -2103,6 +2554,15 @@ static int cxl_region_invalidate_memregion(struct cxl_region *cxlr)
>   	return 0;
>   }
>   
> +static int is_system_ram(struct resource *res, void *arg)
> +{
> +	struct cxl_region *cxlr = arg;
> +	struct cxl_region_params *p = &cxlr->params;
> +
> +	dev_dbg(&cxlr->dev, "%pr has System RAM: %pr\n", p->res, res);
> +	return 1;
> +}
> +
>   static int cxl_region_probe(struct device *dev)
>   {
>   	struct cxl_region *cxlr = to_cxl_region(dev);
> @@ -2136,6 +2596,18 @@ static int cxl_region_probe(struct device *dev)
>   	switch (cxlr->mode) {
>   	case CXL_DECODER_PMEM:
>   		return devm_cxl_add_pmem_region(cxlr);
> +	case CXL_DECODER_RAM:
> +		/*
> +		 * The region can not be manged by CXL if any portion of
> +		 * it is already online as 'System RAM'
> +		 */
> +		if (walk_iomem_res_desc(IORES_DESC_NONE,
> +					IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
> +					p->res->start, p->res->end, cxlr,
> +					is_system_ram) > 0)
> +			return 0;
> +		dev_dbg(dev, "TODO: hookup devdax\n");
> +		return 0;
>   	default:
>   		dev_dbg(&cxlr->dev, "unsupported region mode: %d\n",
>   			cxlr->mode);
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index ca76879af1de..9b3765c5c81a 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -261,6 +261,8 @@ resource_size_t cxl_rcrb_to_component(struct device *dev,
>    * cxl_decoder flags that define the type of memory / devices this
>    * decoder supports as well as configuration lock status See "CXL 2.0
>    * 8.2.5.12.7 CXL HDM Decoder 0 Control Register" for details.
> + * Additionally indicate whether decoder settings were autodetected,
> + * user customized.
>    */
>   #define CXL_DECODER_F_RAM   BIT(0)
>   #define CXL_DECODER_F_PMEM  BIT(1)
> @@ -268,6 +270,7 @@ resource_size_t cxl_rcrb_to_component(struct device *dev,
>   #define CXL_DECODER_F_TYPE3 BIT(3)
>   #define CXL_DECODER_F_LOCK  BIT(4)
>   #define CXL_DECODER_F_ENABLE    BIT(5)
> +#define CXL_DECODER_F_AUTO  BIT(6)
>   #define CXL_DECODER_F_MASK  GENMASK(5, 0)
>   
>   enum cxl_decoder_type {
> @@ -380,6 +383,7 @@ typedef struct cxl_dport *(*cxl_calc_hb_fn)(struct cxl_root_decoder *cxlrd,
>    * @region_id: region id for next region provisioning event
>    * @calc_hb: which host bridge covers the n'th position by granularity
>    * @platform_data: platform specific configuration data
> + * @range_lock: sync region autodiscovery by address range
>    * @cxlsd: base cxl switch decoder
>    */
>   struct cxl_root_decoder {
> @@ -387,6 +391,7 @@ struct cxl_root_decoder {
>   	atomic_t region_id;
>   	cxl_calc_hb_fn calc_hb;
>   	void *platform_data;
> +	struct mutex range_lock;
>   	struct cxl_switch_decoder cxlsd;
>   };
>   
> @@ -436,6 +441,13 @@ struct cxl_region_params {
>    */
>   #define CXL_REGION_F_INCOHERENT 0
>   
> +/*
> + * Indicate whether this region has been assembled by autodetection or
> + * userspace assembly. Prevent endpoint decoders outside of automatic
> + * detection from being added to the region.
> + */
> +#define CXL_REGION_F_AUTO 1
> +
>   /**
>    * struct cxl_region - CXL region
>    * @dev: This region's device
> @@ -699,6 +711,7 @@ struct cxl_nvdimm_bridge *cxl_find_nvdimm_bridge(struct device *dev);
>   #ifdef CONFIG_CXL_REGION
>   bool is_cxl_pmem_region(struct device *dev);
>   struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev);
> +void cxl_add_to_region(struct cxl_endpoint_decoder *cxled);
>   #else
>   static inline bool is_cxl_pmem_region(struct device *dev)
>   {
> @@ -708,6 +721,9 @@ static inline struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev)
>   {
>   	return NULL;
>   }
> +static inline void cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
> +{
> +}
>   #endif
>   
>   /*
> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> index 5453771bf330..012a0c6f8476 100644
> --- a/drivers/cxl/port.c
> +++ b/drivers/cxl/port.c
> @@ -30,6 +30,23 @@ static void schedule_detach(void *cxlmd)
>   	schedule_cxl_memdev_detach(cxlmd);
>   }
>   
> +static int discover_region(struct device *dev, void *data)
> +{
> +	const unsigned long flags = CXL_DECODER_F_ENABLE | CXL_DECODER_F_AUTO;
> +	struct cxl_endpoint_decoder *cxled;
> +
> +	if (!is_endpoint_decoder(dev))
> +		return 0;
> +
> +	cxled = to_cxl_endpoint_decoder(dev);
> +	if ((cxled->cxld.flags & flags) != flags)
> +		return 0;
> +
> +	cxl_add_to_region(cxled);
> +
> +	return 0;
> +}
> +
>   static int cxl_port_probe(struct device *dev)
>   {
>   	struct cxl_port *port = to_cxl_port(dev);
> @@ -78,6 +95,15 @@ static int cxl_port_probe(struct device *dev)
>   		return rc;
>   	}
>   
> +	if (!is_cxl_endpoint(port))
> +		return 0;
> +
> +	/*
> +	 * Now that all endpoint decoders are successfully enumerated,
> +	 * try to assemble regions from committed decoders
> +	 */
> +	device_for_each_child(dev, NULL, discover_region);
> +
>   	return 0;
>   }
>   
> 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 07/18] cxl/region: Move region-position validation to a helper
  2023-02-06  1:03 ` [PATCH 07/18] cxl/region: Move region-position validation to a helper Dan Williams
  2023-02-06 17:44   ` Ira Weiny
@ 2023-02-08 12:30   ` Jonathan Cameron
  2023-02-09  4:09     ` Dan Williams
  2023-02-09  4:26     ` Dan Williams
  2023-02-09 19:45   ` Verma, Vishal L
  2 siblings, 2 replies; 112+ messages in thread
From: Jonathan Cameron @ 2023-02-08 12:30 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

On Sun, 05 Feb 2023 17:03:07 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> In preparation for region autodiscovery, that needs all devices
> discovered before their relative position in the region can be
> determined, consolidate all position dependent validation in a helper.
> 
> Recall that in the on-demand region creation flow the end-user picks the
> position of a given endpoint decoder in a region. In the autodiscovery
> case the position of an endpoint decoder can only be determined after
> all other endpoint decoders that claim to decode the region's address
> range have been enumerated and attached. So, in the autodiscovery case
> endpoint decoders may be attached before their relative position is
> known. Once all decoders arrive, then positions can be determined and
> validated with cxl_region_validate_position() the same as user initiated
> on-demand creation.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Hi Dan,

A few comments inline, but mostly reflect the original code rather than
the refactoring you have done in this patch.

Jonathan


> +static int cxl_region_attach(struct cxl_region *cxlr,
> +			     struct cxl_endpoint_decoder *cxled, int pos)
> +{
> +	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
> +	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> +	struct cxl_region_params *p = &cxlr->params;
> +	struct cxl_port *ep_port, *root_port;
> +	struct cxl_dport *dport;
> +	int rc = -ENXIO;
> +
> +	if (cxled->mode != cxlr->mode) {
> +		dev_dbg(&cxlr->dev, "%s region mode: %d mismatch: %d\n",
> +			dev_name(&cxled->cxld.dev), cxlr->mode, cxled->mode);
> +		return -EINVAL;
> +	}
> +
> +	if (cxled->mode == CXL_DECODER_DEAD) {
> +		dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev));
> +		return -ENODEV;
> +	}
> +
> +	/* all full of members, or interleave config not established? */
> +	if (p->state > CXL_CONFIG_INTERLEAVE_ACTIVE) {
> +		dev_dbg(&cxlr->dev, "region already active\n");
> +		return -EBUSY;
> +	} else if (p->state < CXL_CONFIG_INTERLEAVE_ACTIVE) {
> +		dev_dbg(&cxlr->dev, "interleave config missing\n");
> +		return -ENXIO;
> +	}
> +
>  	ep_port = cxled_to_port(cxled);
>  	root_port = cxlrd_to_port(cxlrd);
>  	dport = cxl_find_dport_by_dev(root_port, ep_port->host_bridge);
> @@ -1284,13 +1327,6 @@ static int cxl_region_attach(struct cxl_region *cxlr,
>  		return -ENXIO;
>  	}
>  
> -	if (cxlrd->calc_hb(cxlrd, pos) != dport) {
> -		dev_dbg(&cxlr->dev, "%s:%s invalid target position for %s\n",
> -			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
> -			dev_name(&cxlrd->cxlsd.cxld.dev));
> -		return -ENXIO;
> -	}
> -

In an ideal world, this would have been nice as two patches.
One that reorders the various checks so that they are in the order
after you have factored things out (easy to review for correctness)
then one that factored it out.

>  	if (cxled->cxld.target_type != cxlr->type) {
>  		dev_dbg(&cxlr->dev, "%s:%s type mismatch: %d vs %d\n",
>  			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
> @@ -1314,12 +1350,13 @@ static int cxl_region_attach(struct cxl_region *cxlr,
>  		return -EINVAL;
>  	}
>  
> -	for (iter = ep_port; !is_cxl_root(iter);
> -	     iter = to_cxl_port(iter->dev.parent)) {
> -		rc = cxl_port_attach_region(iter, cxlr, cxled, pos);
> -		if (rc)
> -			goto err;
> -	}
> +	rc = cxl_region_validate_position(cxlr, cxled, pos);
> +	if (rc)
> +		return rc;
> +
> +	rc = cxl_region_attach_position(cxlr, cxlrd, cxled, dport, pos);
> +	if (rc)
> +		return rc;
>  
>  	p->targets[pos] = cxled;
>  	cxled->pos = pos;

More something about original code than the refactoring...

I'm not keen on the side effects that aren't unwound in the error paths.

p->targets[pos] and cxled->pos are left set.  Probably never matters
but not elegant or as easy to reason about as it would be if they
were cleared in error cases.  In particular there is a check on
whether p->targets[pos] is set that will result in a dev_dbg even
though setting it up actually failed.


> @@ -1343,10 +1380,6 @@ static int cxl_region_attach(struct cxl_region *cxlr,
>  
>  err_decrement:
>  	p->nr_targets--;
> -err:
> -	for (iter = ep_port; !is_cxl_root(iter);
> -	     iter = to_cxl_port(iter->dev.parent))
> -		cxl_port_detach_region(iter, cxlr, cxled);
>  	return rc;
>  }
>  
> 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 08/18] kernel/range: Uplevel the cxl subsystem's range_contains() helper
  2023-02-06  1:03 ` [PATCH 08/18] kernel/range: Uplevel the cxl subsystem's range_contains() helper Dan Williams
                     ` (2 preceding siblings ...)
  2023-02-06 23:41   ` Dave Jiang
@ 2023-02-08 12:32   ` Jonathan Cameron
  2023-02-09 19:47   ` Verma, Vishal L
  4 siblings, 0 replies; 112+ messages in thread
From: Jonathan Cameron @ 2023-02-08 12:32 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, Kees Cook, dave.hansen, linux-mm, linux-acpi

On Sun, 05 Feb 2023 17:03:13 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> In support of the CXL subsystem's use of 'struct range' to track decode
> address ranges, add a common range_contains() implementation with
> identical semantics as resource_contains();
> 
> The existing 'range_contains()' in lib/stackinit_kunit.c is namespaced
> with a 'stackinit_' prefix.
> 
> Cc: Kees Cook <keescook@chromium.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Seems reasonable.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  drivers/cxl/core/pci.c |    5 -----
>  include/linux/range.h  |    5 +++++
>  lib/stackinit_kunit.c  |    6 +++---
>  3 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 1d1492440287..9ed2120dbf8a 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -214,11 +214,6 @@ static int devm_cxl_enable_mem(struct device *host, struct cxl_dev_state *cxlds)
>  	return devm_add_action_or_reset(host, clear_mem_enable, cxlds);
>  }
>  
> -static bool range_contains(struct range *r1, struct range *r2)
> -{
> -	return r1->start <= r2->start && r1->end >= r2->end;
> -}
> -
>  /* require dvsec ranges to be covered by a locked platform window */
>  static int dvsec_range_allowed(struct device *dev, void *arg)
>  {
> diff --git a/include/linux/range.h b/include/linux/range.h
> index 274681cc3154..7efb6a9b069b 100644
> --- a/include/linux/range.h
> +++ b/include/linux/range.h
> @@ -13,6 +13,11 @@ static inline u64 range_len(const struct range *range)
>  	return range->end - range->start + 1;
>  }
>  
> +static inline bool range_contains(struct range *r1, struct range *r2)
> +{
> +	return r1->start <= r2->start && r1->end >= r2->end;
> +}
> +
>  int add_range(struct range *range, int az, int nr_range,
>  		u64 start, u64 end);
>  
> diff --git a/lib/stackinit_kunit.c b/lib/stackinit_kunit.c
> index 4591d6cf5e01..05947a2feb93 100644
> --- a/lib/stackinit_kunit.c
> +++ b/lib/stackinit_kunit.c
> @@ -31,8 +31,8 @@ static volatile u8 forced_mask = 0xff;
>  static void *fill_start, *target_start;
>  static size_t fill_size, target_size;
>  
> -static bool range_contains(char *haystack_start, size_t haystack_size,
> -			   char *needle_start, size_t needle_size)
> +static bool stackinit_range_contains(char *haystack_start, size_t haystack_size,
> +				     char *needle_start, size_t needle_size)
>  {
>  	if (needle_start >= haystack_start &&
>  	    needle_start + needle_size <= haystack_start + haystack_size)
> @@ -175,7 +175,7 @@ static noinline void test_ ## name (struct kunit *test)		\
>  								\
>  	/* Validate that compiler lined up fill and target. */	\
>  	KUNIT_ASSERT_TRUE_MSG(test,				\
> -		range_contains(fill_start, fill_size,		\
> +		stackinit_range_contains(fill_start, fill_size,	\
>  			    target_start, target_size),		\
>  		"stack fill missed target!? "			\
>  		"(fill %zu wide, target offset by %d)\n",	\
> 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 09/18] cxl/region: Enable CONFIG_CXL_REGION to be toggled
  2023-02-06  1:03 ` [PATCH 09/18] cxl/region: Enable CONFIG_CXL_REGION to be toggled Dan Williams
  2023-02-06 17:03   ` Gregory Price
  2023-02-06 23:57   ` Dave Jiang
@ 2023-02-08 12:36   ` Jonathan Cameron
  2023-02-09 20:17   ` Verma, Vishal L
  3 siblings, 0 replies; 112+ messages in thread
From: Jonathan Cameron @ 2023-02-08 12:36 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

On Sun, 05 Feb 2023 17:03:18 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> Add help text and a label so the CXL_REGION config option can be
> toggled. This is mainly to enable compile testing without region
> support.

Hmm. Possibly pull the reasoning up here for why this might
want to be configurable at all.  I'm not sure I fully follow
your reasoning as enumerating existing regions 'should' be harmless
gathering of information, not something that could do any damage
- so who would turn this off?

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>


> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/Kconfig |   12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 0ac53c422c31..163c094e67ae 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -104,12 +104,22 @@ config CXL_SUSPEND
>  	depends on SUSPEND && CXL_MEM
>  
>  config CXL_REGION
> -	bool
> +	bool "CXL: Region Support"
>  	default CXL_BUS
>  	# For MAX_PHYSMEM_BITS
>  	depends on SPARSEMEM
>  	select MEMREGION
>  	select GET_FREE_REGION
> +	help
> +	  Enable the CXL core to enumerate and provision CXL regions. A CXL
> +	  region is defined by one or more CXL expanders that decode a given
> +	  system-physical address range. For CXL regions established by
> +	  platform-firmware this option enables memory error handling to
> +	  identify the devices participating in a given interleaved memory
> +	  range. Otherwise, platform-firmware managed CXL is enabled by being
> +	  placed in the system address map and does not need a driver.
> +
> +	  If unsure say 'y'
>  
>  config CXL_REGION_INVALIDATION_TEST
>  	bool "CXL: Region Cache Management Bypass (TEST)"
> 
> 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 10/18] cxl/region: Fix passthrough-decoder detection
  2023-02-06  1:03 ` [PATCH 10/18] cxl/region: Fix passthrough-decoder detection Dan Williams
  2023-02-06  5:38   ` Greg KH
  2023-02-07  0:00   ` Dave Jiang
@ 2023-02-08 12:44   ` Jonathan Cameron
  2023-02-09 20:28   ` Verma, Vishal L
  3 siblings, 0 replies; 112+ messages in thread
From: Jonathan Cameron @ 2023-02-08 12:44 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, stable, dave.hansen, linux-mm, linux-acpi

On Sun, 05 Feb 2023 17:03:24 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> A passthrough decoder is a decoder that maps only 1 target. It is a
> special case because it does not impose any constraints on the
> interleave-math as compared to a decoder with multiple targets. Extend
> the passthrough case to multi-target-capable decoders that only have one
> target selected. I.e. the current code was only considering passthrough
> *ports* which are only a subset of the potential passthrough decoder
> scenarios.
> 
> Fixes: e4f6dfa9ef75 ("cxl/region: Fix 'distance' calculation with passthrough ports")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/core/region.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index c82d3b6f3d1f..34cf95217901 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1019,10 +1019,10 @@ static int cxl_port_setup_targets(struct cxl_port *port,
>  		int i, distance;
>  
>  		/*
> -		 * Passthrough ports impose no distance requirements between
> +		 * Passthrough decoders impose no distance requirements between
>  		 * peers

I think we have a terminology inconsistency.  My understanding was we were using
passthrough decoders for the special case where there is no programmable hardware.
In this case I think we are also considering the case where that hardware must
be programmed etc, it's just that we don't care about interleave.
I'd just explain what it is rather than trying to assign a term. 

Decoders that have a single target configured impose...



>  		 */
> -		if (port->nr_dports == 1)
> +		if (cxl_rr->nr_targets == 1)
>  			distance = 0;
>  		else
>  			distance = p->nr_targets / cxl_rr->nr_targets;
> 
> 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 11/18] cxl/region: Add region autodiscovery
  2023-02-06  1:03 ` [PATCH 11/18] cxl/region: Add region autodiscovery Dan Williams
  2023-02-06 19:02   ` Ira Weiny
  2023-02-07 23:54   ` Dave Jiang
@ 2023-02-08 17:07   ` Jonathan Cameron
  2023-02-09  4:07     ` Dan Williams
  2 siblings, 1 reply; 112+ messages in thread
From: Jonathan Cameron @ 2023-02-08 17:07 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

On Sun, 05 Feb 2023 17:03:29 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> Region autodiscovery is an asynchrounous state machine advanced by
> cxl_port_probe(). After the decoders on an endpoint port are enumerated
> they are scanned for actively enabled instances. Each active decoder is
> flagged for auto-assembly CXL_DECODER_F_AUTO and attached to a region.
> If a region does not already exist for the address range setting of the
> decoder one is created. That creation process may race with other
> decoders of the same region being discovered since cxl_port_probe() is
> asynchronous. A new 'struct cxl_root_decoder' lock, @range_lock, is
> introduced to mitigate that race.
> 
> Once all decoders have arrived, "p->nr_targets == p->interleave_ways",
> they are sorted by their relative decode position. The sort algorithm
> involves finding the point in the cxl_port topology where one leg of the
> decode leads to deviceA and the other deviceB. At that point in the
> topology the target order in the 'struct cxl_switch_decoder' indicates
> the relative position of those endpoint decoders in the region.
> 
> >From that point the region goes through the same setup and validation  
> steps as user-created regions, but instead of programming the decoders
> it validates that driver would have written the same values to the
> decoders as were already present.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
A few comments inline,

Thanks,

Jonathan

> ---
>  drivers/cxl/core/hdm.c    |    5 
>  drivers/cxl/core/port.c   |    2 
>  drivers/cxl/core/region.c |  496 ++++++++++++++++++++++++++++++++++++++++++++-
>  drivers/cxl/cxl.h         |   16 +
>  drivers/cxl/port.c        |   26 ++
>  5 files changed, 531 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index dcc16d7cb8f3..174cddfec6e8 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -674,7 +674,7 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld)
>  	up_read(&cxl_dpa_rwsem);
>  
>  	port->commit_end--;
> -	cxld->flags &= ~CXL_DECODER_F_ENABLE;
> +	cxld->flags &= ~(CXL_DECODER_F_ENABLE | CXL_DECODER_F_AUTO);
>  
>  	return 0;
>  }
> @@ -719,7 +719,7 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
>  
>  	/* decoders are enabled if committed */
>  	if (committed) {
> -		cxld->flags |= CXL_DECODER_F_ENABLE;
> +		cxld->flags |= CXL_DECODER_F_ENABLE | CXL_DECODER_F_AUTO;
>  		if (ctrl & CXL_HDM_DECODER0_CTRL_LOCK)
>  			cxld->flags |= CXL_DECODER_F_LOCK;
>  		if (FIELD_GET(CXL_HDM_DECODER0_CTRL_TYPE, ctrl))
> @@ -783,6 +783,7 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
>  		return rc;
>  	}
>  	*dpa_base += dpa_size + skip;
> +

:)  No comment.

>  	return 0;
>  }
>  


...


> +static int cmp_decode_pos(const void *a, const void *b)
> +{
> +	struct cxl_endpoint_decoder *cxled_a = *(typeof(cxled_a) *)a;
> +	struct cxl_endpoint_decoder *cxled_b = *(typeof(cxled_b) *)b;
> +	struct cxl_memdev *cxlmd_a = cxled_to_memdev(cxled_a);
> +	struct cxl_memdev *cxlmd_b = cxled_to_memdev(cxled_b);
> +	struct cxl_port *port_a = cxled_to_port(cxled_a);
> +	struct cxl_port *port_b = cxled_to_port(cxled_b);
> +	struct cxl_port *iter_a, *iter_b;
> +
> +	/* Exit early if any prior sorting failed */
> +	if (cxled_a->pos < 0 || cxled_b->pos < 0)
> +		return 0;
> +
> +	/*
> +	 * Walk up the hierarchy to find a shared port, find the decoder
> +	 * that maps the range, compare the relative position of those
> +	 * dport mappings.
> +	 */
> +	for (iter_a = port_a; iter_a; iter_a = next_port(iter_a)) {
> +		struct cxl_port *next_a, *next_b, *port;
> +		struct cxl_switch_decoder *cxlsd;
> +
> +		next_a = next_port(iter_a);
> +		for (iter_b = port_b; iter_b; iter_b = next_port(iter_b)) {
> +			int a_pos, b_pos, result;
> +			struct device *dev;
> +			unsigned int seq;
> +
> +			next_b = next_port(iter_b);
> +			if (next_a != next_b)
> +				continue;
> +			if (!next_a)
> +				goto out;
Perhaps treat this as a 'find the match' then carry on, so as to avoid having the
guts of the handling nested so deep.  So something like.

	struct cxl_switch_decoder *cxlsd;
	struct cxl_port *next_a, *next_b, *port;
	bool matched = false; // May be simpler test - I'm just lazy

	// possibly pull this out as a utility function so that
	// error handling can be tidied up by simply returning 
	// rather than gotos.
	for (iter_a = port_a; iter_a; iter_a = next_port(iter_a)) {
		next_a = next_port(iter_a);
		for (iter_b = port_b; iter_b; iter_b = next_port(iter_b)) {
			int a_pos, b_pos, result;
			struct device *dev;
			unsigned int seq;

			next_b = next_port(iter_b);
			if (next_a != next_b)
				continue;
			if (!next_a)
				goto out;
			port = next_a;
			found = true;
			break;
		}
		if (found)
			break;
	}
	if (!found)
		goto out;

	dev = device_find_child(&port->dev, cxled_a,
				decoder_match_range);
	if (!dev) {
		struct range *range = &cxled_a->cxld.hpa_range;
		dev_err(port->uport,
			"failed to find decoder that maps %#llx-:%#llx\n",
		range->start, range->end);
		cxled_a->pos = -1;
		return 0;
	}

	cxlsd = to_cxl_switch_decoder(dev);
	do {
		seq = read_seqbegin(&cxlsd->target_lock);
		find_positions(cxlsd, iter_a, iter_b, &a_pos, &b_pos);
	} while (read_seqretry(&cxlsd->target_lock, seq));

	if (a_pos < 0 || b_pos < 0) {
		dev_err(port->uport,
			"failed to find shared decoder for %s and %s\n",
			dev_name(cxlmd_a->dev.parent),
			dev_name(cxlmd_b->dev.parent));
		cxled_a->pos = -1;
		result = 0;
//if factored as suggested above then a goto here and drop
// the else would be neater again.

	} else {
		result = a_pos - b_pos;
		dev_dbg(port->uport, "%s: %s comes %s %s\n",
			dev_name(&cxlsd->cxld.dev),
			dev_name(cxlmd_a->dev.parent),
			result < 0 ? "before" : "after",
			dev_name(cxlmd_b->dev.parent));
		}
	put_device(dev);

	return result;
out:
	dev_err(cxlmd_a->dev.parent, "failed to find shared port with %s\n",
		dev_name(cxlmd_b->dev.parent));
	cxled_a->pos = -1;
	return 0;
}


> +			dev = device_find_child(&port->dev, cxled_a,
> +						decoder_match_range);
> +			if (!dev) {
> +				struct range *range = &cxled_a->cxld.hpa_range;
> +
> +				dev_err(port->uport,
> +					"failed to find decoder that maps %#llx-:%#llx\n",
> +					range->start, range->end);
> +				cxled_a->pos = -1;
> +				return 0;
> +			}
> +
> +			cxlsd = to_cxl_switch_decoder(dev);
> +			do {
> +				seq = read_seqbegin(&cxlsd->target_lock);
> +				find_positions(cxlsd, iter_a, iter_b, &a_pos,
> +					       &b_pos);
> +			} while (read_seqretry(&cxlsd->target_lock, seq));
> +
> +			if (a_pos < 0 || b_pos < 0) {
> +				dev_err(port->uport,
> +					"failed to find shared decoder for %s and %s\n",
> +					dev_name(cxlmd_a->dev.parent),
> +					dev_name(cxlmd_b->dev.parent));
> +				cxled_a->pos = -1;
> +				result = 0;
> +			} else {
> +				result = a_pos - b_pos;
> +				dev_dbg(port->uport, "%s: %s comes %s %s\n",
> +					dev_name(&cxlsd->cxld.dev),
> +					dev_name(cxlmd_a->dev.parent),
> +					result < 0 ? "before" : "after",
> +					dev_name(cxlmd_b->dev.parent));
> +			}
> +
> +			put_device(dev);
> +
> +			return result;
> +		}
> +	}
> +out:
> +	dev_err(cxlmd_a->dev.parent, "failed to find shared port with %s\n",
> +		dev_name(cxlmd_b->dev.parent));
> +	cxled_a->pos = -1;
> +	return 0;
> +}
> +
> +static int cxl_region_sort_targets(struct cxl_region *cxlr)
> +{
> +	struct cxl_region_params *p = &cxlr->params;
> +	int i, rc = 0;
> +
> +	sort(p->targets, p->nr_targets, sizeof(p->targets[0]), cmp_decode_pos,
> +	     NULL);
> +
> +	for (i = 0; i < p->nr_targets; i++) {
> +		struct cxl_endpoint_decoder *cxled = p->targets[i];
> +
> +		if (cxled->pos < 0)
> +			rc = -ENXIO;
> +		cxled->pos = i;
> +	}
> +
> +	dev_dbg(&cxlr->dev, "region sort %s\n", rc ? "failed" : "successful");
> +	return rc;
> +}
> +
>  static int cxl_region_attach(struct cxl_region *cxlr,
>  			     struct cxl_endpoint_decoder *cxled, int pos)
>  {
> @@ -1350,6 +1572,50 @@ static int cxl_region_attach(struct cxl_region *cxlr,
>  		return -EINVAL;
>  	}
>  
> +	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
> +		int i;
> +
> +		rc = cxl_region_attach_auto(cxlr, cxled, pos);
> +		if (rc)
> +			return rc;
> +
> +		/* await more targets to arrive... */
> +		if (p->nr_targets < p->interleave_ways)
> +			return 0;
> +
> +		/*
> +		 * All targets are here, which implies all PCI enumeration that
> +		 * affects this region has been completed. Walk the topology to
> +		 * sort the devices into their relative region decode position.
> +		 */
> +		rc = cxl_region_sort_targets(cxlr);
> +		if (rc)
> +			return rc;
> +
> +		for (i = 0; i < p->nr_targets; i++) {
> +			cxled = p->targets[i];
> +			ep_port = cxled_to_port(cxled);
> +			dport = cxl_find_dport_by_dev(root_port,
> +						      ep_port->host_bridge);
> +			rc = cxl_region_attach_position(cxlr, cxlrd, cxled,
> +							dport, i);
> +			if (rc)
> +				return rc;
> +		}
> +
> +		rc = cxl_region_setup_targets(cxlr);
> +		if (rc)
> +			return rc;
> +
> +		/*
> +		 * If target setup succeeds in the autodiscovery case
> +		 * then the region is already committed.
> +		 */
> +		p->state = CXL_CONFIG_COMMIT;
> +
> +		return 0;
> +	}
> +
>  	rc = cxl_region_validate_position(cxlr, cxled, pos);
>  	if (rc)
>  		return rc;
> @@ -1500,8 +1766,8 @@ static int detach_target(struct cxl_region *cxlr, int pos)
>  	return rc;
>  }
>  
> -static size_t store_targetN(struct cxl_region *cxlr, const char *buf, int pos,
> -			    size_t len)
> +static ssize_t store_targetN(struct cxl_region *cxlr, const char *buf, int pos,
> +			     size_t len)

?



...


> +
> +void cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
> +{
> +	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> +	struct range *hpa = &cxled->cxld.hpa_range;
> +	struct cxl_decoder *cxld = &cxled->cxld;
> +	struct cxl_root_decoder *cxlrd;
> +	struct cxl_region_params *p;
> +	struct cxl_region *cxlr;
> +	struct cxl_port *root;
> +	bool attach = false;
> +	struct device *dev;
> +
> +	root = find_cxl_root(&cxlmd->dev);
> +	if (!root) {
> +		dev_err(cxlmd->dev.parent, "%s: failed to map CXL root\n",
> +			dev_name(&cxlmd->dev));
> +		return;

I'm not a fan of muddling on from errors at lower layers of code.
I'd push that decision up to the caller - so return an error code from here.
If caller just wants to ignore it (with appropriate comment on why) then
that is fine by me.


> +	}
> +
> +	dev = device_find_child(&root->dev, &cxld->hpa_range,
> +				  match_decoder_by_range);

device_find_child is acquiring a reference to the child I think.

> +	if (!dev) {
> +		dev_err(cxlmd->dev.parent,
> +			"%s:%s no CXL window for range %#llx:%#llx\n",
> +			dev_name(&cxlmd->dev), dev_name(&cxld->dev),
> +			cxld->hpa_range.start, cxld->hpa_range.end);
> +		goto out;
> +	}
> +	cxlrd = to_cxl_root_decoder(dev);
Here we've stashed the dev that we have a reference to in cxlrd.
> +
> +	mutex_lock(&cxlrd->range_lock);
> +	dev = device_find_child(&cxlrd->cxlsd.cxld.dev, hpa,
> +				match_region_by_range);
> +	if (!dev)
> +		cxlr = construct_region(cxlrd, cxled);
> +	else
> +		cxlr = to_cxl_region(dev);
> +	mutex_unlock(&cxlrd->range_lock);
> +
> +	if (IS_ERR(cxlr))

In this path, do we drop the reference to cxlrd->dev?

> +		goto out;
> +
> +	attach_target(cxlr, cxled, -1, TASK_UNINTERRUPTIBLE);
> +
> +	down_read(&cxl_region_rwsem);
> +	p = &cxlr->params;
> +	attach = p->state >= CXL_CONFIG_COMMIT;

Test is a bit non obvious.  In the tree I'm looking at
CXL_CONFIG_COMMIT is last value in the enum.


> +	up_read(&cxl_region_rwsem);
> +
> +	if (attach) {
> +		int rc = device_attach(&cxlr->dev);
> +
> +		/*
> +		 * If device_attach() fails the range may still be
> +		 * active via the platform-firmware memory map
> +		 */
> +		if (rc < 0)

We also want to know if the no matching driver found case happened so perhaps
		if (rc != 1)

> +			dev_err(&cxlr->dev, "failed to enable, range: %pr\n",
> +				p->res);
> +	}
> +
> +	put_device(&cxlr->dev);
> +out:
> +	put_device(&root->dev);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_add_to_region, CXL);
> +
>  static int cxl_region_invalidate_memregion(struct cxl_region *cxlr)
>  {
>  	if (!test_bit(CXL_REGION_F_INCOHERENT, &cxlr->flags))
> @@ -2103,6 +2554,15 @@ static int cxl_region_invalidate_memregion(struct cxl_region *cxlr)
>  	return 0;
>  }

> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index ca76879af1de..9b3765c5c81a 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -261,6 +261,8 @@ resource_size_t cxl_rcrb_to_component(struct device *dev,
>   * cxl_decoder flags that define the type of memory / devices this
>   * decoder supports as well as configuration lock status See "CXL 2.0
>   * 8.2.5.12.7 CXL HDM Decoder 0 Control Register" for details.
> + * Additionally indicate whether decoder settings were autodetected,
> + * user customized.
>   */
>  #define CXL_DECODER_F_RAM   BIT(0)
>  #define CXL_DECODER_F_PMEM  BIT(1)
> @@ -268,6 +270,7 @@ resource_size_t cxl_rcrb_to_component(struct device *dev,
>  #define CXL_DECODER_F_TYPE3 BIT(3)
>  #define CXL_DECODER_F_LOCK  BIT(4)
>  #define CXL_DECODER_F_ENABLE    BIT(5)
> +#define CXL_DECODER_F_AUTO  BIT(6)

That's a bit nasty and unexpected - can we rename to make it really clear
it's not a hardware flag at the point of usage.
Or just put it in a a new structure field rather than pushing it in here.

>  #define CXL_DECODER_F_MASK  GENMASK(5, 0)

Hmm. This mask isn't currently used and now has
an odd meaning because of hiding a flag in bit(6).


...

>  
>  /*
> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> index 5453771bf330..012a0c6f8476 100644
> --- a/drivers/cxl/port.c
> +++ b/drivers/cxl/port.c
> @@ -30,6 +30,23 @@ static void schedule_detach(void *cxlmd)
>  	schedule_cxl_memdev_detach(cxlmd);
>  }
>  
> +static int discover_region(struct device *dev, void *data)
> +{
> +	const unsigned long flags = CXL_DECODER_F_ENABLE | CXL_DECODER_F_AUTO;
> +	struct cxl_endpoint_decoder *cxled;
> +
> +	if (!is_endpoint_decoder(dev))
> +		return 0;
> +
> +	cxled = to_cxl_endpoint_decoder(dev);
> +	if ((cxled->cxld.flags & flags) != flags)
> +		return 0;
> +
> +	cxl_add_to_region(cxled);
> +
> +	return 0;
> +}
> +
>  static int cxl_port_probe(struct device *dev)
>  {
>  	struct cxl_port *port = to_cxl_port(dev);
> @@ -78,6 +95,15 @@ static int cxl_port_probe(struct device *dev)
>  		return rc;
>  	}
>  
> +	if (!is_cxl_endpoint(port))
> +		return 0;
> +
> +	/*
> +	 * Now that all endpoint decoders are successfully enumerated,
> +	 * try to assemble regions from committed decoders
> +	 */
> +	device_for_each_child(dev, NULL, discover_region);
> +

There is a steady reduction in common code vs specific code for the two
cases in here.  Maybe it's time just to split the probe into

static int cxl_port_probe(struct device *dev)
{
	if (is_cxl_endpoint(port))
		cxl_port_probe_endpoint();
	else
		cxl_port_probe_not endpoint(); //with better naming.
}

Shared code is about 7 lines vs 40+ unshared.



>  	return 0;
>  }
>  
> 
> 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 12/18] tools/testing/cxl: Define a fixed volatile configuration to parse
  2023-02-06  1:03 ` [PATCH 12/18] tools/testing/cxl: Define a fixed volatile configuration to parse Dan Williams
@ 2023-02-08 17:31   ` Jonathan Cameron
  2023-02-09 20:50     ` Dan Williams
  0 siblings, 1 reply; 112+ messages in thread
From: Jonathan Cameron @ 2023-02-08 17:31 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

On Sun, 05 Feb 2023 17:03:35 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> Take two endpoints attached to the first switch on the first host-bridge
> in the cxl_test topology and define a pre-initialized region. This is a
> x2 interleave underneath a x1 CXL Window.
> 
> $ modprobe cxl_test
> $ # cxl list -Ru
> {
>   "region":"region3",
>   "resource":"0xf010000000",
>   "size":"512.00 MiB (536.87 MB)",
>   "interleave_ways":2,
>   "interleave_granularity":4096,
>   "decode_state":"commit"
> }
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/core/core.h      |    3 -
>  drivers/cxl/core/hdm.c       |    3 +
>  drivers/cxl/core/port.c      |    4 +
>  drivers/cxl/cxl.h            |    4 +
>  drivers/cxl/cxlmem.h         |    3 +
>  tools/testing/cxl/test/cxl.c |  146 +++++++++++++++++++++++++++++++++++++++---
>  6 files changed, 148 insertions(+), 15 deletions(-)
> 

> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 8130430ffbcf..8d0895cbae93 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -458,6 +458,7 @@ bool is_switch_decoder(struct device *dev)
>  {
>  	return is_root_decoder(dev) || dev->type == &cxl_decoder_switch_type;
>  }
> +EXPORT_SYMBOL_NS_GPL(is_switch_decoder, CXL);
>  
>  struct cxl_decoder *to_cxl_decoder(struct device *dev)
>  {
> @@ -485,6 +486,7 @@ struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev)
>  		return NULL;
>  	return container_of(dev, struct cxl_switch_decoder, cxld.dev);
>  }
> +EXPORT_SYMBOL_NS_GPL(to_cxl_switch_decoder, CXL);
>  
>  static void cxl_ep_release(struct cxl_ep *ep)
>  {
> @@ -528,7 +530,7 @@ static const struct device_type cxl_port_type = {
>  
>  bool is_cxl_port(struct device *dev)
>  {
> -	return dev->type == &cxl_port_type;
> +	return dev && dev->type == &cxl_port_type;

Adding that protection just for mocking seems a bit nasty.
Could you push the sanity check out to the caller or am I missing something?
Perhaps worth calling out the reason we'd call this on a NULL dev with a comment
or similar.


>  }
>  EXPORT_SYMBOL_NS_GPL(is_cxl_port, CXL);
>  
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 9b3765c5c81a..4c6ee6c96f23 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -452,6 +452,7 @@ struct cxl_region_params {
>   * struct cxl_region - CXL region
>   * @dev: This region's device
>   * @id: This region's id. Id is globally unique across all regions
> + * @fixed: At least one decoder in this region was locked down at init
>   * @mode: Endpoint decoder allocation / access mode
>   * @type: Endpoint decoder target type
>   * @cxl_nvb: nvdimm bridge for coordinating @cxlr_pmem setup / shutdown
> @@ -462,6 +463,7 @@ struct cxl_region_params {
>  struct cxl_region {
>  	struct device dev;
>  	int id;
> +	bool fixed;

I was wondering why a mocking patch was changing the region state structure...
It isn't - this isn't used that I can find.

>  	enum cxl_decoder_mode mode;
>  	enum cxl_decoder_type type;
>  	struct cxl_nvdimm_bridge *cxl_nvb;


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 14/18] dax/hmem: Drop unnecessary dax_hmem_remove()
  2023-02-06  1:03 ` [PATCH 14/18] dax/hmem: Drop unnecessary dax_hmem_remove() Dan Williams
  2023-02-06 17:15   ` Gregory Price
@ 2023-02-08 17:33   ` Jonathan Cameron
  1 sibling, 0 replies; 112+ messages in thread
From: Jonathan Cameron @ 2023-02-08 17:33 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

On Sun, 05 Feb 2023 17:03:46 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> Empty driver remove callbacks can just be elided.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  drivers/dax/hmem/hmem.c |    7 -------
>  1 file changed, 7 deletions(-)
> 
> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> index 1bf040dbc834..c7351e0dc8ff 100644
> --- a/drivers/dax/hmem/hmem.c
> +++ b/drivers/dax/hmem/hmem.c
> @@ -44,15 +44,8 @@ static int dax_hmem_probe(struct platform_device *pdev)
>  	return 0;
>  }
>  
> -static int dax_hmem_remove(struct platform_device *pdev)
> -{
> -	/* devm handles teardown */
> -	return 0;
> -}
> -
>  static struct platform_driver dax_hmem_driver = {
>  	.probe = dax_hmem_probe,
> -	.remove = dax_hmem_remove,
>  	.driver = {
>  		.name = "hmem",
>  	},
> 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 15/18] dax/hmem: Convey the dax range via memregion_info()
  2023-02-06  1:03 ` [PATCH 15/18] dax/hmem: Convey the dax range via memregion_info() Dan Williams
@ 2023-02-08 17:35   ` Jonathan Cameron
  0 siblings, 0 replies; 112+ messages in thread
From: Jonathan Cameron @ 2023-02-08 17:35 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

On Sun, 05 Feb 2023 17:03:53 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> In preparation for hmem platform devices to be unregistered, stop using
> platform_device_add_resources() to convey the address range. The
> platform_device_add_resources() API causes an existing "Soft Reserved"
> iomem resource to be re-parented under an inserted platform device
> resource. When that platform device is deleted it removes the platform
> device resource and all children.
> 
> Instead, it is sufficient to convey just the address range and let
> request_mem_region() insert resources to indicate the devices active in
> the range. This allows the "Soft Reserved" resource to be re-enumerated
> upon the next probe event.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Seems sensible to me.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  drivers/dax/hmem/device.c |   37 ++++++++++++++-----------------------
>  drivers/dax/hmem/hmem.c   |   14 +++-----------
>  include/linux/memregion.h |    2 ++
>  3 files changed, 19 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
> index 20749c7fab81..b1b339bccfe5 100644
> --- a/drivers/dax/hmem/device.c
> +++ b/drivers/dax/hmem/device.c
> @@ -15,15 +15,8 @@ static struct resource hmem_active = {
>  	.flags = IORESOURCE_MEM,
>  };
>  
> -void hmem_register_device(int target_nid, struct resource *r)
> +void hmem_register_device(int target_nid, struct resource *res)
>  {
> -	/* define a clean / non-busy resource for the platform device */
> -	struct resource res = {
> -		.start = r->start,
> -		.end = r->end,
> -		.flags = IORESOURCE_MEM,
> -		.desc = IORES_DESC_SOFT_RESERVED,
> -	};
>  	struct platform_device *pdev;
>  	struct memregion_info info;
>  	int rc, id;
> @@ -31,55 +24,53 @@ void hmem_register_device(int target_nid, struct resource *r)
>  	if (nohmem)
>  		return;
>  
> -	rc = region_intersects(res.start, resource_size(&res), IORESOURCE_MEM,
> -			IORES_DESC_SOFT_RESERVED);
> +	rc = region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
> +			       IORES_DESC_SOFT_RESERVED);
>  	if (rc != REGION_INTERSECTS)
>  		return;
>  
>  	id = memregion_alloc(GFP_KERNEL);
>  	if (id < 0) {
> -		pr_err("memregion allocation failure for %pr\n", &res);
> +		pr_err("memregion allocation failure for %pr\n", res);
>  		return;
>  	}
>  
>  	pdev = platform_device_alloc("hmem", id);
>  	if (!pdev) {
> -		pr_err("hmem device allocation failure for %pr\n", &res);
> +		pr_err("hmem device allocation failure for %pr\n", res);
>  		goto out_pdev;
>  	}
>  
> -	if (!__request_region(&hmem_active, res.start, resource_size(&res),
> +	if (!__request_region(&hmem_active, res->start, resource_size(res),
>  			      dev_name(&pdev->dev), 0)) {
> -		dev_dbg(&pdev->dev, "hmem range %pr already active\n", &res);
> +		dev_dbg(&pdev->dev, "hmem range %pr already active\n", res);
>  		goto out_active;
>  	}
>  
>  	pdev->dev.numa_node = numa_map_to_online_node(target_nid);
>  	info = (struct memregion_info) {
>  		.target_node = target_nid,
> +		.range = {
> +			.start = res->start,
> +			.end = res->end,
> +		},
>  	};
>  	rc = platform_device_add_data(pdev, &info, sizeof(info));
>  	if (rc < 0) {
> -		pr_err("hmem memregion_info allocation failure for %pr\n", &res);
> -		goto out_resource;
> -	}
> -
> -	rc = platform_device_add_resources(pdev, &res, 1);
> -	if (rc < 0) {
> -		pr_err("hmem resource allocation failure for %pr\n", &res);
> +		pr_err("hmem memregion_info allocation failure for %pr\n", res);
>  		goto out_resource;
>  	}
>  
>  	rc = platform_device_add(pdev);
>  	if (rc < 0) {
> -		dev_err(&pdev->dev, "device add failed for %pr\n", &res);
> +		dev_err(&pdev->dev, "device add failed for %pr\n", res);
>  		goto out_resource;
>  	}
>  
>  	return;
>  
>  out_resource:
> -	__release_region(&hmem_active, res.start, resource_size(&res));
> +	__release_region(&hmem_active, res->start, resource_size(res));
>  out_active:
>  	platform_device_put(pdev);
>  out_pdev:
> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> index c7351e0dc8ff..5025a8c9850b 100644
> --- a/drivers/dax/hmem/hmem.c
> +++ b/drivers/dax/hmem/hmem.c
> @@ -15,25 +15,17 @@ static int dax_hmem_probe(struct platform_device *pdev)
>  	struct memregion_info *mri;
>  	struct dev_dax_data data;
>  	struct dev_dax *dev_dax;
> -	struct resource *res;
> -	struct range range;
> -
> -	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> -	if (!res)
> -		return -ENOMEM;
>  
>  	mri = dev->platform_data;
> -	range.start = res->start;
> -	range.end = res->end;
> -	dax_region = alloc_dax_region(dev, pdev->id, &range, mri->target_node,
> -			PMD_SIZE, 0);
> +	dax_region = alloc_dax_region(dev, pdev->id, &mri->range,
> +				      mri->target_node, PMD_SIZE, 0);
>  	if (!dax_region)
>  		return -ENOMEM;
>  
>  	data = (struct dev_dax_data) {
>  		.dax_region = dax_region,
>  		.id = -1,
> -		.size = region_idle ? 0 : resource_size(res),
> +		.size = region_idle ? 0 : range_len(&mri->range),
>  	};
>  	dev_dax = devm_create_dev_dax(&data);
>  	if (IS_ERR(dev_dax))
> diff --git a/include/linux/memregion.h b/include/linux/memregion.h
> index bf83363807ac..c01321467789 100644
> --- a/include/linux/memregion.h
> +++ b/include/linux/memregion.h
> @@ -3,10 +3,12 @@
>  #define _MEMREGION_H_
>  #include <linux/types.h>
>  #include <linux/errno.h>
> +#include <linux/range.h>
>  #include <linux/bug.h>
>  
>  struct memregion_info {
>  	int target_node;
> +	struct range range;
>  };
>  
>  #ifdef CONFIG_MEMREGION
> 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default
       [not found] ` <CGME20230208173730uscas1p2af3a9eeb8946dfa607b190c079a49653@uscas1p2.samsung.com>
@ 2023-02-08 17:37   ` Fan Ni
  2023-02-09  4:56     ` Dan Williams
  0 siblings, 1 reply; 112+ messages in thread
From: Fan Ni @ 2023-02-08 17:37 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, David Hildenbrand, Kees Cook, stable, Dave Hansen,
	Michal Hocko, linux-mm, linux-acpi, Adam Manzanares, dave

On Sun, Feb 05, 2023 at 05:02:29PM -0800, Dan Williams wrote:


> Summary:
> --------
> 
> CXL RAM support allows for the dynamic provisioning of new CXL RAM
> regions, and more routinely, assembling a region from an existing
> configuration established by platform-firmware. The latter is motivated
> by CXL memory RAS (Reliability, Availability and Serviceability)
> support, that requires associating device events with System Physical
> Address ranges and vice versa.
> 
> The 'Soft Reserved' policy rework arranges for performance
> differentiated memory like CXL attached DRAM, or high-bandwidth memory,
> to be designated for 'System RAM' by default, rather than the device-dax
> dedicated access mode. That current device-dax default is confusing and
> surprising for the Pareto of users that do not expect memory to be
> quarantined for dedicated access by default. Most users expect all
> 'System RAM'-capable memory to show up in FREE(1).
> 
> 
> Details:
> --------
> 
> Recall that the Linux 'Soft Reserved' designation for memory is a
> reaction to platform-firmware, like EFI EDK2, delineating memory with
> the EFI Specific Purpose Memory attribute (EFI_MEMORY_SP). An
> alternative way to think of that attribute is that it specifies the
> *not* general-purpose memory pool. It is memory that may be too precious
> for general usage or not performant enough for some hot data structures.
> However, in the absence of explicit policy it should just be 'System
> RAM' by default.
> 
> Rather than require every distribution to ship a udev policy to assign
> dax devices to dax_kmem (the device-memory hotplug driver) just make
> that the kernel default. This is similar to the rationale in:
> 
> commit 8604d9e534a3 ("memory_hotplug: introduce CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE")
> 
> With this change the relatively niche use case of accessing this memory
> via mapping a device-dax instance can be achieved by building with
> CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n, or specifying
> memhp_default_state=offline at boot, and then use:
> 
>     daxctl reconfigure-device $device -m devdax --force
> 
> ...to shift the corresponding address range to device-dax access.
> 
> The process of assembling a device-dax instance for a given CXL region
> device configuration is similar to the process of assembling a
> Device-Mapper or MDRAID storage-device array. Specifically, asynchronous
> probing by the PCI and driver core enumerates all CXL endpoints and
> their decoders. Then, once enough decoders have arrived to a describe a
> given region, that region is passed to the device-dax subsystem where it
> is subject to the above 'dax_kmem' policy. This assignment and policy
> choice is only possible if memory is set aside by the 'Soft Reserved'
> designation. Otherwise, CXL that is mapped as 'System RAM' becomes
> immutable by CXL driver mechanisms, but is still enumerated for RAS
> purposes.
> 
> This series is also available via:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.3/cxl-ram-region
> 
> ...and has gone through some preview testing in various forms.
> 
> ---

Tested-by: Fan Ni <fan.ni@samsung.com>


Run the following tests with the patch (with the volatile support at qemu).
Note: cxl related code are compiled as modules and loaded before used.

For pmem setup, tried three topologies (1HB1RP1Mem, 1HB2RP2Mem, 1HB2RP4Mem with
a cxl switch). The memdev is either provided in the command line when launching
qemu or hot added to the guest with device_add command in qemu monitor.

The following operations are performed,
1. create-region with cxl cmd
2. create name-space with ndctl cmd
3. convert cxl mem to ram with daxctl cmd
4. online the memory with daxctl cmd
5. Let app use the memory (numactl --membind=1 htop)

Results: No regression.

For volatile memory (hot add with device_add command), mainly tested 1HB1RP1Mem
case (passthrough).
1. the device can be correctly discovered after hot add (cxl list, may need
cxl enable-memdev)
2. creating ram region (cxl create-region) succeeded, after creating the
region, a dax device under /dev/ is shown.
3. online the memory passes, and the memory is shown on another NUMA node.
4. Let app use the memory (numactl --membind=1 htop) passed.




> 
> Dan Williams (18):
>       cxl/Documentation: Update references to attributes added in v6.0
>       cxl/region: Add a mode attribute for regions
>       cxl/region: Support empty uuids for non-pmem regions
>       cxl/region: Validate region mode vs decoder mode
>       cxl/region: Add volatile region creation support
>       cxl/region: Refactor attach_target() for autodiscovery
>       cxl/region: Move region-position validation to a helper
>       kernel/range: Uplevel the cxl subsystem's range_contains() helper
>       cxl/region: Enable CONFIG_CXL_REGION to be toggled
>       cxl/region: Fix passthrough-decoder detection
>       cxl/region: Add region autodiscovery
>       tools/testing/cxl: Define a fixed volatile configuration to parse
>       dax/hmem: Move HMAT and Soft reservation probe initcall level
>       dax/hmem: Drop unnecessary dax_hmem_remove()
>       dax/hmem: Convey the dax range via memregion_info()
>       dax/hmem: Move hmem device registration to dax_hmem.ko
>       dax: Assign RAM regions to memory-hotplug by default
>       cxl/dax: Create dax devices for CXL RAM regions
> 
> 
>  Documentation/ABI/testing/sysfs-bus-cxl |   64 +-
>  MAINTAINERS                             |    1 
>  drivers/acpi/numa/hmat.c                |    4 
>  drivers/cxl/Kconfig                     |   12 
>  drivers/cxl/acpi.c                      |    3 
>  drivers/cxl/core/core.h                 |    7 
>  drivers/cxl/core/hdm.c                  |    8 
>  drivers/cxl/core/pci.c                  |    5 
>  drivers/cxl/core/port.c                 |   34 +
>  drivers/cxl/core/region.c               |  848 ++++++++++++++++++++++++++++---
>  drivers/cxl/cxl.h                       |   46 ++
>  drivers/cxl/cxlmem.h                    |    3 
>  drivers/cxl/port.c                      |   26 +
>  drivers/dax/Kconfig                     |   17 +
>  drivers/dax/Makefile                    |    2 
>  drivers/dax/bus.c                       |   53 +-
>  drivers/dax/bus.h                       |   12 
>  drivers/dax/cxl.c                       |   53 ++
>  drivers/dax/device.c                    |    3 
>  drivers/dax/hmem/Makefile               |    3 
>  drivers/dax/hmem/device.c               |  102 ++--
>  drivers/dax/hmem/hmem.c                 |  148 +++++
>  drivers/dax/kmem.c                      |    1 
>  include/linux/dax.h                     |    7 
>  include/linux/memregion.h               |    2 
>  include/linux/range.h                   |    5 
>  lib/stackinit_kunit.c                   |    6 
>  tools/testing/cxl/test/cxl.c            |  146 +++++
>  28 files changed, 1355 insertions(+), 266 deletions(-)
>  create mode 100644 drivers/dax/cxl.c
> 
> base-commit: 172738bbccdb4ef76bdd72fc72a315c741c39161
> 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 01/18] cxl/Documentation: Update references to attributes added in v6.0
  2023-02-06  1:02 ` [PATCH 01/18] cxl/Documentation: Update references to attributes added in v6.0 Dan Williams
                     ` (4 preceding siblings ...)
  2023-02-06 21:04   ` Dave Jiang
@ 2023-02-09  0:20   ` Verma, Vishal L
  5 siblings, 0 replies; 112+ messages in thread
From: Verma, Vishal L @ 2023-02-09  0:20 UTC (permalink / raw)
  To: Williams, Dan J, linux-cxl; +Cc: linux-mm, dave.hansen, linux-acpi

On Sun, 2023-02-05 at 17:02 -0800, Dan Williams wrote:
> Prior to Linus deciding that the kernel that following v5.19 would be
> v6.0, the CXL ABI documentation already referenced v5.20. In preparation
> for updating these entries update the kernel version to v6.0.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  Documentation/ABI/testing/sysfs-bus-cxl |   30 +++++++++++++++---------------
>  1 file changed, 15 insertions(+), 15 deletions(-)

Looks good,

Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>

> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 329a7e46c805..5be032313e29 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -198,7 +198,7 @@ Description:
>  
>  What:          /sys/bus/cxl/devices/endpointX/CDAT
>  Date:          July, 2022
> -KernelVersion: v5.20
> +KernelVersion: v6.0
>  Contact:       linux-cxl@vger.kernel.org
>  Description:
>                 (RO) If this sysfs entry is not present no DOE mailbox was
> @@ -209,7 +209,7 @@ Description:
>  
>  What:          /sys/bus/cxl/devices/decoderX.Y/mode
>  Date:          May, 2022
> -KernelVersion: v5.20
> +KernelVersion: v6.0
>  Contact:       linux-cxl@vger.kernel.org
>  Description:
>                 (RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it
> @@ -229,7 +229,7 @@ Description:
>  
>  What:          /sys/bus/cxl/devices/decoderX.Y/dpa_resource
>  Date:          May, 2022
> -KernelVersion: v5.20
> +KernelVersion: v6.0
>  Contact:       linux-cxl@vger.kernel.org
>  Description:
>                 (RO) When a CXL decoder is of devtype "cxl_decoder_endpoint",
> @@ -240,7 +240,7 @@ Description:
>  
>  What:          /sys/bus/cxl/devices/decoderX.Y/dpa_size
>  Date:          May, 2022
> -KernelVersion: v5.20
> +KernelVersion: v6.0
>  Contact:       linux-cxl@vger.kernel.org
>  Description:
>                 (RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it
> @@ -260,7 +260,7 @@ Description:
>  
>  What:          /sys/bus/cxl/devices/decoderX.Y/interleave_ways
>  Date:          May, 2022
> -KernelVersion: v5.20
> +KernelVersion: v6.0
>  Contact:       linux-cxl@vger.kernel.org
>  Description:
>                 (RO) The number of targets across which this decoder's host
> @@ -275,7 +275,7 @@ Description:
>  
>  What:          /sys/bus/cxl/devices/decoderX.Y/interleave_granularity
>  Date:          May, 2022
> -KernelVersion: v5.20
> +KernelVersion: v6.0
>  Contact:       linux-cxl@vger.kernel.org
>  Description:
>                 (RO) The number of consecutive bytes of host physical address
> @@ -287,7 +287,7 @@ Description:
>  
>  What:          /sys/bus/cxl/devices/decoderX.Y/create_pmem_region
>  Date:          May, 2022
> -KernelVersion: v5.20
> +KernelVersion: v6.0
>  Contact:       linux-cxl@vger.kernel.org
>  Description:
>                 (RW) Write a string in the form 'regionZ' to start the process
> @@ -303,7 +303,7 @@ Description:
>  
>  What:          /sys/bus/cxl/devices/decoderX.Y/delete_region
>  Date:          May, 2022
> -KernelVersion: v5.20
> +KernelVersion: v6.0
>  Contact:       linux-cxl@vger.kernel.org
>  Description:
>                 (WO) Write a string in the form 'regionZ' to delete that region,
> @@ -312,7 +312,7 @@ Description:
>  
>  What:          /sys/bus/cxl/devices/regionZ/uuid
>  Date:          May, 2022
> -KernelVersion: v5.20
> +KernelVersion: v6.0
>  Contact:       linux-cxl@vger.kernel.org
>  Description:
>                 (RW) Write a unique identifier for the region. This field must
> @@ -322,7 +322,7 @@ Description:
>  
>  What:          /sys/bus/cxl/devices/regionZ/interleave_granularity
>  Date:          May, 2022
> -KernelVersion: v5.20
> +KernelVersion: v6.0
>  Contact:       linux-cxl@vger.kernel.org
>  Description:
>                 (RW) Set the number of consecutive bytes each device in the
> @@ -333,7 +333,7 @@ Description:
>  
>  What:          /sys/bus/cxl/devices/regionZ/interleave_ways
>  Date:          May, 2022
> -KernelVersion: v5.20
> +KernelVersion: v6.0
>  Contact:       linux-cxl@vger.kernel.org
>  Description:
>                 (RW) Configures the number of devices participating in the
> @@ -343,7 +343,7 @@ Description:
>  
>  What:          /sys/bus/cxl/devices/regionZ/size
>  Date:          May, 2022
> -KernelVersion: v5.20
> +KernelVersion: v6.0
>  Contact:       linux-cxl@vger.kernel.org
>  Description:
>                 (RW) System physical address space to be consumed by the region.
> @@ -360,7 +360,7 @@ Description:
>  
>  What:          /sys/bus/cxl/devices/regionZ/resource
>  Date:          May, 2022
> -KernelVersion: v5.20
> +KernelVersion: v6.0
>  Contact:       linux-cxl@vger.kernel.org
>  Description:
>                 (RO) A region is a contiguous partition of a CXL root decoder
> @@ -372,7 +372,7 @@ Description:
>  
>  What:          /sys/bus/cxl/devices/regionZ/target[0..N]
>  Date:          May, 2022
> -KernelVersion: v5.20
> +KernelVersion: v6.0
>  Contact:       linux-cxl@vger.kernel.org
>  Description:
>                 (RW) Write an endpoint decoder object name to 'targetX' where X
> @@ -391,7 +391,7 @@ Description:
>  
>  What:          /sys/bus/cxl/devices/regionZ/commit
>  Date:          May, 2022
> -KernelVersion: v5.20
> +KernelVersion: v6.0
>  Contact:       linux-cxl@vger.kernel.org
>  Description:
>                 (RW) Write a boolean 'true' string value to this attribute to
> 
> 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 02/18] cxl/region: Add a mode attribute for regions
  2023-02-06  1:02 ` [PATCH 02/18] cxl/region: Add a mode attribute for regions Dan Williams
                     ` (3 preceding siblings ...)
  2023-02-06 21:05   ` Dave Jiang
@ 2023-02-09  0:22   ` Verma, Vishal L
  4 siblings, 0 replies; 112+ messages in thread
From: Verma, Vishal L @ 2023-02-09  0:22 UTC (permalink / raw)
  To: Williams, Dan J, linux-cxl; +Cc: linux-mm, dave.hansen, linux-acpi

On Sun, 2023-02-05 at 17:02 -0800, Dan Williams wrote:
> In preparation for a new region type, "ram" regions, add a mode
> attribute to clarify the mode of the decoders that can be added to a
> region. Share the internals of mode_show() (for decoders) with the
> region case.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  Documentation/ABI/testing/sysfs-bus-cxl |   11 +++++++++++
>  drivers/cxl/core/port.c                 |   12 +-----------
>  drivers/cxl/core/region.c               |   10 ++++++++++
>  drivers/cxl/cxl.h                       |   14 ++++++++++++++
>  4 files changed, 36 insertions(+), 11 deletions(-)

Looks good, thanks for adding this - makes libcxl's job much easier!

Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>

> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl
> b/Documentation/ABI/testing/sysfs-bus-cxl
> index 5be032313e29..058b0c45001f 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -358,6 +358,17 @@ Description:
>                 results in the same address being allocated.
>  
>  
> +What:          /sys/bus/cxl/devices/regionZ/mode
> +Date:          January, 2023
> +KernelVersion: v6.3
> +Contact:       linux-cxl@vger.kernel.org
> +Description:
> +               (RO) The mode of a region is established at region
> creation time
> +               and dictates the mode of the endpoint decoder that
> comprise the
> +               region. For more details on the possible modes see
> +               /sys/bus/cxl/devices/decoderX.Y/mode
> +
> +
>  What:          /sys/bus/cxl/devices/regionZ/resource
>  Date:          May, 2022
>  KernelVersion: v6.0
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 410c036c09fa..8566451cb22f 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -180,17 +180,7 @@ static ssize_t mode_show(struct device *dev,
> struct device_attribute *attr,
>  {
>         struct cxl_endpoint_decoder *cxled =
> to_cxl_endpoint_decoder(dev);
>  
> -       switch (cxled->mode) {
> -       case CXL_DECODER_RAM:
> -               return sysfs_emit(buf, "ram\n");
> -       case CXL_DECODER_PMEM:
> -               return sysfs_emit(buf, "pmem\n");
> -       case CXL_DECODER_NONE:
> -               return sysfs_emit(buf, "none\n");
> -       case CXL_DECODER_MIXED:
> -       default:
> -               return sysfs_emit(buf, "mixed\n");
> -       }
> +       return sysfs_emit(buf, "%s\n", cxl_decoder_mode_name(cxled-
> >mode));
>  }
>  
>  static ssize_t mode_store(struct device *dev, struct
> device_attribute *attr,
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 60828d01972a..17d2d0c12725 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -458,6 +458,15 @@ static ssize_t resource_show(struct device *dev,
> struct device_attribute *attr,
>  }
>  static DEVICE_ATTR_RO(resource);
>  
> +static ssize_t mode_show(struct device *dev, struct device_attribute
> *attr,
> +                        char *buf)
> +{
> +       struct cxl_region *cxlr = to_cxl_region(dev);
> +
> +       return sysfs_emit(buf, "%s\n", cxl_decoder_mode_name(cxlr-
> >mode));
> +}
> +static DEVICE_ATTR_RO(mode);
> +
>  static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size)
>  {
>         struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr-
> >dev.parent);
> @@ -585,6 +594,7 @@ static struct attribute *cxl_region_attrs[] = {
>         &dev_attr_interleave_granularity.attr,
>         &dev_attr_resource.attr,
>         &dev_attr_size.attr,
> +       &dev_attr_mode.attr,
>         NULL,
>  };
>  
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index aa3af3bb73b2..ca76879af1de 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -320,6 +320,20 @@ enum cxl_decoder_mode {
>         CXL_DECODER_DEAD,
>  };
>  
> +static inline const char *cxl_decoder_mode_name(enum
> cxl_decoder_mode mode)
> +{
> +       static const char * const names[] = {
> +               [CXL_DECODER_NONE] = "none",
> +               [CXL_DECODER_RAM] = "ram",
> +               [CXL_DECODER_PMEM] = "pmem",
> +               [CXL_DECODER_MIXED] = "mixed",
> +       };
> +
> +       if (mode >= CXL_DECODER_NONE && mode <= CXL_DECODER_MIXED)
> +               return names[mode];
> +       return "mixed";
> +}
> +
>  /**
>   * struct cxl_endpoint_decoder - Endpoint  / SPA to DPA decoder
>   * @cxld: base cxl_decoder_object
> 
> 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 03/18] cxl/region: Support empty uuids for non-pmem regions
  2023-02-06  1:02 ` [PATCH 03/18] cxl/region: Support empty uuids for non-pmem regions Dan Williams
  2023-02-06 15:54   ` Jonathan Cameron
  2023-02-06 19:22   ` Ira Weiny
@ 2023-02-09  0:24   ` Verma, Vishal L
  2 siblings, 0 replies; 112+ messages in thread
From: Verma, Vishal L @ 2023-02-09  0:24 UTC (permalink / raw)
  To: Williams, Dan J, linux-cxl; +Cc: linux-mm, dave.hansen, linux-acpi

On Sun, 2023-02-05 at 17:02 -0800, Dan Williams wrote:
> Shipping versions of the cxl-cli utility expect all regions to have a
> 'uuid' attribute. In preparation for 'ram' regions, update the 'uuid'
> attribute to return an empty string which satisfies the current
> expectations of 'cxl list -R'. Otherwise, 'cxl list -R' fails in the
> presence of regions with the 'uuid' attribute missing. Force the
> attribute to be read-only as there is no facility or expectation for a
> 'ram' region to recall its uuid from one boot to the next.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  Documentation/ABI/testing/sysfs-bus-cxl |    3 ++-
>  drivers/cxl/core/region.c               |    7 +++++--
>  2 files changed, 7 insertions(+), 3 deletions(-)

Looks good,

Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>

> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 058b0c45001f..4c4e1cbb1169 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -317,7 +317,8 @@ Contact:    linux-cxl@vger.kernel.org
>  Description:
>                 (RW) Write a unique identifier for the region. This field must
>                 be set for persistent regions and it must not conflict with the
> -               UUID of another region.
> +               UUID of another region. For volatile ram regions this
> +               attribute is a read-only empty string.
>  
>  
>  What:          /sys/bus/cxl/devices/regionZ/interleave_granularity
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 17d2d0c12725..c9e7f05caa0f 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -45,7 +45,10 @@ static ssize_t uuid_show(struct device *dev, struct device_attribute *attr,
>         rc = down_read_interruptible(&cxl_region_rwsem);
>         if (rc)
>                 return rc;
> -       rc = sysfs_emit(buf, "%pUb\n", &p->uuid);
> +       if (cxlr->mode != CXL_DECODER_PMEM)
> +               rc = sysfs_emit(buf, "\n");
> +       else
> +               rc = sysfs_emit(buf, "%pUb\n", &p->uuid);
>         up_read(&cxl_region_rwsem);
>  
>         return rc;
> @@ -301,7 +304,7 @@ static umode_t cxl_region_visible(struct kobject *kobj, struct attribute *a,
>         struct cxl_region *cxlr = to_cxl_region(dev);
>  
>         if (a == &dev_attr_uuid.attr && cxlr->mode != CXL_DECODER_PMEM)
> -               return 0;
> +               return 0444;
>         return a->mode;
>  }
>  
> 
> 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 04/18] cxl/region: Validate region mode vs decoder mode
  2023-02-06  1:02 ` [PATCH 04/18] cxl/region: Validate region mode vs decoder mode Dan Williams
                     ` (3 preceding siblings ...)
  2023-02-06 22:16   ` Dave Jiang
@ 2023-02-09  0:25   ` Verma, Vishal L
  4 siblings, 0 replies; 112+ messages in thread
From: Verma, Vishal L @ 2023-02-09  0:25 UTC (permalink / raw)
  To: Williams, Dan J, linux-cxl; +Cc: linux-mm, dave.hansen, linux-acpi

On Sun, 2023-02-05 at 17:02 -0800, Dan Williams wrote:
> In preparation for a new region mode, do not, for example, allow
> 'ram' decoders to be assigned to 'pmem' regions and vice versa.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/core/region.c |    6 ++++++
>  1 file changed, 6 insertions(+)

Looks good,

Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>

> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index c9e7f05caa0f..53d6dbe4de6d 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1217,6 +1217,12 @@ static int cxl_region_attach(struct cxl_region *cxlr,
>         struct cxl_dport *dport;
>         int i, rc = -ENXIO;
>  
> +       if (cxled->mode != cxlr->mode) {
> +               dev_dbg(&cxlr->dev, "%s region mode: %d mismatch: %d\n",
> +                       dev_name(&cxled->cxld.dev), cxlr->mode, cxled->mode);
> +               return -EINVAL;
> +       }
> +
>         if (cxled->mode == CXL_DECODER_DEAD) {
>                 dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev));
>                 return -ENODEV;
> 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 05/18] cxl/region: Add volatile region creation support
  2023-02-06  1:02 ` [PATCH 05/18] cxl/region: Add volatile region creation support Dan Williams
                     ` (3 preceding siblings ...)
  2023-02-06 22:31   ` Dave Jiang
@ 2023-02-09  1:02   ` Verma, Vishal L
  4 siblings, 0 replies; 112+ messages in thread
From: Verma, Vishal L @ 2023-02-09  1:02 UTC (permalink / raw)
  To: Williams, Dan J, linux-cxl; +Cc: linux-mm, dave.hansen, linux-acpi

On Sun, 2023-02-05 at 17:02 -0800, Dan Williams wrote:
> Expand the region creation infrastructure to enable 'ram'
> (volatile-memory) regions. The internals of create_pmem_region_store()
> and create_pmem_region_show() are factored out into helpers
> __create_region() and __create_region_show() for the 'ram' case to
> reuse.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  Documentation/ABI/testing/sysfs-bus-cxl |   22 +++++-----
>  drivers/cxl/core/core.h                 |    1 
>  drivers/cxl/core/port.c                 |   14 ++++++
>  drivers/cxl/core/region.c               |   71 +++++++++++++++++++++++++------
>  4 files changed, 82 insertions(+), 26 deletions(-)

Looks good,

Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>

> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 4c4e1cbb1169..3acf2f17a73f 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -285,20 +285,20 @@ Description:
>                 interleave_granularity).
>  
>  
> -What:          /sys/bus/cxl/devices/decoderX.Y/create_pmem_region
> -Date:          May, 2022
> -KernelVersion: v6.0
> +What:          /sys/bus/cxl/devices/decoderX.Y/create_{pmem,ram}_region
> +Date:          May, 2022, January, 2023
> +KernelVersion: v6.0 (pmem), v6.3 (ram)
>  Contact:       linux-cxl@vger.kernel.org
>  Description:
>                 (RW) Write a string in the form 'regionZ' to start the process
> -               of defining a new persistent memory region (interleave-set)
> -               within the decode range bounded by root decoder 'decoderX.Y'.
> -               The value written must match the current value returned from
> -               reading this attribute. An atomic compare exchange operation is
> -               done on write to assign the requested id to a region and
> -               allocate the region-id for the next creation attempt. EBUSY is
> -               returned if the region name written does not match the current
> -               cached value.
> +               of defining a new persistent, or volatile memory region
> +               (interleave-set) within the decode range bounded by root decoder
> +               'decoderX.Y'. The value written must match the current value
> +               returned from reading this attribute. An atomic compare exchange
> +               operation is done on write to assign the requested id to a
> +               region and allocate the region-id for the next creation attempt.
> +               EBUSY is returned if the region name written does not match the
> +               current cached value.
>  
>  
>  What:          /sys/bus/cxl/devices/decoderX.Y/delete_region
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 8c04672dca56..5eb873da5a30 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -11,6 +11,7 @@ extern struct attribute_group cxl_base_attribute_group;
>  
>  #ifdef CONFIG_CXL_REGION
>  extern struct device_attribute dev_attr_create_pmem_region;
> +extern struct device_attribute dev_attr_create_ram_region;
>  extern struct device_attribute dev_attr_delete_region;
>  extern struct device_attribute dev_attr_region;
>  extern const struct device_type cxl_pmem_region_type;
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 8566451cb22f..47e450c3a5a9 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -294,6 +294,7 @@ static struct attribute *cxl_decoder_root_attrs[] = {
>         &dev_attr_cap_type3.attr,
>         &dev_attr_target_list.attr,
>         SET_CXL_REGION_ATTR(create_pmem_region)
> +       SET_CXL_REGION_ATTR(create_ram_region)
>         SET_CXL_REGION_ATTR(delete_region)
>         NULL,
>  };
> @@ -305,6 +306,13 @@ static bool can_create_pmem(struct cxl_root_decoder *cxlrd)
>         return (cxlrd->cxlsd.cxld.flags & flags) == flags;
>  }
>  
> +static bool can_create_ram(struct cxl_root_decoder *cxlrd)
> +{
> +       unsigned long flags = CXL_DECODER_F_TYPE3 | CXL_DECODER_F_RAM;
> +
> +       return (cxlrd->cxlsd.cxld.flags & flags) == flags;
> +}
> +
>  static umode_t cxl_root_decoder_visible(struct kobject *kobj, struct attribute *a, int n)
>  {
>         struct device *dev = kobj_to_dev(kobj);
> @@ -313,7 +321,11 @@ static umode_t cxl_root_decoder_visible(struct kobject *kobj, struct attribute *
>         if (a == CXL_REGION_ATTR(create_pmem_region) && !can_create_pmem(cxlrd))
>                 return 0;
>  
> -       if (a == CXL_REGION_ATTR(delete_region) && !can_create_pmem(cxlrd))
> +       if (a == CXL_REGION_ATTR(create_ram_region) && !can_create_ram(cxlrd))
> +               return 0;
> +
> +       if (a == CXL_REGION_ATTR(delete_region) &&
> +           !(can_create_pmem(cxlrd) || can_create_ram(cxlrd)))
>                 return 0;
>  
>         return a->mode;
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 53d6dbe4de6d..8dea49c021b8 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1685,6 +1685,15 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
>         struct device *dev;
>         int rc;
>  
> +       switch (mode) {
> +       case CXL_DECODER_RAM:
> +       case CXL_DECODER_PMEM:
> +               break;
> +       default:
> +               dev_err(&cxlrd->cxlsd.cxld.dev, "unsupported mode %d\n", mode);
> +               return ERR_PTR(-EINVAL);
> +       }
> +
>         cxlr = cxl_region_alloc(cxlrd, id);
>         if (IS_ERR(cxlr))
>                 return cxlr;
> @@ -1713,12 +1722,38 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
>         return ERR_PTR(rc);
>  }
>  
> +static ssize_t __create_region_show(struct cxl_root_decoder *cxlrd, char *buf)
> +{
> +       return sysfs_emit(buf, "region%u\n", atomic_read(&cxlrd->region_id));
> +}
> +
>  static ssize_t create_pmem_region_show(struct device *dev,
>                                        struct device_attribute *attr, char *buf)
>  {
> -       struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
> +       return __create_region_show(to_cxl_root_decoder(dev), buf);
> +}
>  
> -       return sysfs_emit(buf, "region%u\n", atomic_read(&cxlrd->region_id));
> +static ssize_t create_ram_region_show(struct device *dev,
> +                                     struct device_attribute *attr, char *buf)
> +{
> +       return __create_region_show(to_cxl_root_decoder(dev), buf);
> +}
> +
> +static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd,
> +                                         enum cxl_decoder_mode mode, int id)
> +{
> +       int rc;
> +
> +       rc = memregion_alloc(GFP_KERNEL);
> +       if (rc < 0)
> +               return ERR_PTR(rc);
> +
> +       if (atomic_cmpxchg(&cxlrd->region_id, id, rc) != id) {
> +               memregion_free(rc);
> +               return ERR_PTR(-EBUSY);
> +       }
> +
> +       return devm_cxl_add_region(cxlrd, id, mode, CXL_DECODER_EXPANDER);
>  }
>  
>  static ssize_t create_pmem_region_store(struct device *dev,
> @@ -1727,29 +1762,37 @@ static ssize_t create_pmem_region_store(struct device *dev,
>  {
>         struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
>         struct cxl_region *cxlr;
> -       int id, rc;
> +       int rc, id;
>  
>         rc = sscanf(buf, "region%d\n", &id);
>         if (rc != 1)
>                 return -EINVAL;
>  
> -       rc = memregion_alloc(GFP_KERNEL);
> -       if (rc < 0)
> -               return rc;
> +       cxlr = __create_region(cxlrd, CXL_DECODER_PMEM, id);
> +       if (IS_ERR(cxlr))
> +               return PTR_ERR(cxlr);
> +       return len;
> +}
> +DEVICE_ATTR_RW(create_pmem_region);
>  
> -       if (atomic_cmpxchg(&cxlrd->region_id, id, rc) != id) {
> -               memregion_free(rc);
> -               return -EBUSY;
> -       }
> +static ssize_t create_ram_region_store(struct device *dev,
> +                                      struct device_attribute *attr,
> +                                      const char *buf, size_t len)
> +{
> +       struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
> +       struct cxl_region *cxlr;
> +       int rc, id;
>  
> -       cxlr = devm_cxl_add_region(cxlrd, id, CXL_DECODER_PMEM,
> -                                  CXL_DECODER_EXPANDER);
> +       rc = sscanf(buf, "region%d\n", &id);
> +       if (rc != 1)
> +               return -EINVAL;
> +
> +       cxlr = __create_region(cxlrd, CXL_DECODER_RAM, id);
>         if (IS_ERR(cxlr))
>                 return PTR_ERR(cxlr);
> -
>         return len;
>  }
> -DEVICE_ATTR_RW(create_pmem_region);
> +DEVICE_ATTR_RW(create_ram_region);
>  
>  static ssize_t region_show(struct device *dev, struct device_attribute *attr,
>                            char *buf)
> 
> 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 06/18] cxl/region: Refactor attach_target() for autodiscovery
  2023-02-06  1:03 ` [PATCH 06/18] cxl/region: Refactor attach_target() for autodiscovery Dan Williams
                     ` (2 preceding siblings ...)
  2023-02-06 22:41   ` Dave Jiang
@ 2023-02-09  1:09   ` Verma, Vishal L
  3 siblings, 0 replies; 112+ messages in thread
From: Verma, Vishal L @ 2023-02-09  1:09 UTC (permalink / raw)
  To: Williams, Dan J, linux-cxl; +Cc: linux-mm, dave.hansen, linux-acpi

On Sun, 2023-02-05 at 17:03 -0800, Dan Williams wrote:
> Region autodiscovery is the process of kernel creating 'struct
> cxl_region' object to represent active CXL memory ranges it finds
> already active in hardware when the driver loads. Typically this happens
> when platform firmware establishes CXL memory regions and then publishes
> them in the memory map. However, this can also happen in the case of
> kexec-reboot after the kernel has created regions.
> 
> In the autodiscovery case the region creation process starts with a
> known endpoint decoder. Refactor attach_target() into a helper that is
> suitable to be called from either sysfs, for runtime region creation, or
> from cxl_port_probe() after it has enumerated all endpoint decoders.
> 
> The cxl_port_probe() context is an async device-core probing context, so
> it is not appropriate to allow SIGTERM to interrupt the assembly
> process. Refactor attach_target() to take @cxled and @state as arguments
> where @state indicates whether waiting from the region rwsem is
> interruptible or not.
> 
> No behavior change is intended.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/core/region.c |   47 +++++++++++++++++++++++++++------------------
>  1 file changed, 28 insertions(+), 19 deletions(-)

Looks good,

Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>

> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 8dea49c021b8..97eafdd75675 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1418,31 +1418,25 @@ void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled)
>         up_write(&cxl_region_rwsem);
>  }
>  
> -static int attach_target(struct cxl_region *cxlr, const char *decoder, int pos)
> +static int attach_target(struct cxl_region *cxlr,
> +                        struct cxl_endpoint_decoder *cxled, int pos,
> +                        unsigned int state)
>  {
> -       struct device *dev;
> -       int rc;
> -
> -       dev = bus_find_device_by_name(&cxl_bus_type, NULL, decoder);
> -       if (!dev)
> -               return -ENODEV;
> -
> -       if (!is_endpoint_decoder(dev)) {
> -               put_device(dev);
> -               return -EINVAL;
> -       }
> +       int rc = 0;
>  
> -       rc = down_write_killable(&cxl_region_rwsem);
> +       if (state == TASK_INTERRUPTIBLE)
> +               rc = down_write_killable(&cxl_region_rwsem);
> +       else
> +               down_write(&cxl_region_rwsem);
>         if (rc)
> -               goto out;
> +               return rc;
> +
>         down_read(&cxl_dpa_rwsem);
> -       rc = cxl_region_attach(cxlr, to_cxl_endpoint_decoder(dev), pos);
> +       rc = cxl_region_attach(cxlr, cxled, pos);
>         if (rc == 0)
>                 set_bit(CXL_REGION_F_INCOHERENT, &cxlr->flags);
>         up_read(&cxl_dpa_rwsem);
>         up_write(&cxl_region_rwsem);
> -out:
> -       put_device(dev);
>         return rc;
>  }
>  
> @@ -1480,8 +1474,23 @@ static size_t store_targetN(struct cxl_region *cxlr, const char *buf, int pos,
>  
>         if (sysfs_streq(buf, "\n"))
>                 rc = detach_target(cxlr, pos);
> -       else
> -               rc = attach_target(cxlr, buf, pos);
> +       else {
> +               struct device *dev;
> +
> +               dev = bus_find_device_by_name(&cxl_bus_type, NULL, buf);
> +               if (!dev)
> +                       return -ENODEV;
> +
> +               if (!is_endpoint_decoder(dev)) {
> +                       rc = -EINVAL;
> +                       goto out;
> +               }
> +
> +               rc = attach_target(cxlr, to_cxl_endpoint_decoder(dev), pos,
> +                                  TASK_INTERRUPTIBLE);
> +out:
> +               put_device(dev);
> +       }
>  
>         if (rc < 0)
>                 return rc;
> 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 11/18] cxl/region: Add region autodiscovery
  2023-02-08 17:07   ` Jonathan Cameron
@ 2023-02-09  4:07     ` Dan Williams
  0 siblings, 0 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-09  4:07 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

Jonathan Cameron wrote:
> On Sun, 05 Feb 2023 17:03:29 -0800
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > Region autodiscovery is an asynchrounous state machine advanced by
> > cxl_port_probe(). After the decoders on an endpoint port are enumerated
> > they are scanned for actively enabled instances. Each active decoder is
> > flagged for auto-assembly CXL_DECODER_F_AUTO and attached to a region.
> > If a region does not already exist for the address range setting of the
> > decoder one is created. That creation process may race with other
> > decoders of the same region being discovered since cxl_port_probe() is
> > asynchronous. A new 'struct cxl_root_decoder' lock, @range_lock, is
> > introduced to mitigate that race.
> > 
> > Once all decoders have arrived, "p->nr_targets == p->interleave_ways",
> > they are sorted by their relative decode position. The sort algorithm
> > involves finding the point in the cxl_port topology where one leg of the
> > decode leads to deviceA and the other deviceB. At that point in the
> > topology the target order in the 'struct cxl_switch_decoder' indicates
> > the relative position of those endpoint decoders in the region.
> > 
> > >From that point the region goes through the same setup and validation  
> > steps as user-created regions, but instead of programming the decoders
> > it validates that driver would have written the same values to the
> > decoders as were already present.
> > 
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> A few comments inline,
> 
> Thanks,
> 
> Jonathan
> 
> > ---
> >  drivers/cxl/core/hdm.c    |    5 
> >  drivers/cxl/core/port.c   |    2 
> >  drivers/cxl/core/region.c |  496 ++++++++++++++++++++++++++++++++++++++++++++-
> >  drivers/cxl/cxl.h         |   16 +
> >  drivers/cxl/port.c        |   26 ++
> >  5 files changed, 531 insertions(+), 14 deletions(-)
> > 
> > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> > index dcc16d7cb8f3..174cddfec6e8 100644
> > --- a/drivers/cxl/core/hdm.c
> > +++ b/drivers/cxl/core/hdm.c
> > @@ -674,7 +674,7 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld)
> >  	up_read(&cxl_dpa_rwsem);
> >  
> >  	port->commit_end--;
> > -	cxld->flags &= ~CXL_DECODER_F_ENABLE;
> > +	cxld->flags &= ~(CXL_DECODER_F_ENABLE | CXL_DECODER_F_AUTO);
> >  
> >  	return 0;
> >  }
> > @@ -719,7 +719,7 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
> >  
> >  	/* decoders are enabled if committed */
> >  	if (committed) {
> > -		cxld->flags |= CXL_DECODER_F_ENABLE;
> > +		cxld->flags |= CXL_DECODER_F_ENABLE | CXL_DECODER_F_AUTO;
> >  		if (ctrl & CXL_HDM_DECODER0_CTRL_LOCK)
> >  			cxld->flags |= CXL_DECODER_F_LOCK;
> >  		if (FIELD_GET(CXL_HDM_DECODER0_CTRL_TYPE, ctrl))
> > @@ -783,6 +783,7 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
> >  		return rc;
> >  	}
> >  	*dpa_base += dpa_size + skip;
> > +
> 
> :)  No comment.

Spurious newlines... my old friend.

> 
> >  	return 0;
> >  }
> >  
> 
> 
> ...
> 
> 
> > +static int cmp_decode_pos(const void *a, const void *b)
> > +{
> > +	struct cxl_endpoint_decoder *cxled_a = *(typeof(cxled_a) *)a;
> > +	struct cxl_endpoint_decoder *cxled_b = *(typeof(cxled_b) *)b;
> > +	struct cxl_memdev *cxlmd_a = cxled_to_memdev(cxled_a);
> > +	struct cxl_memdev *cxlmd_b = cxled_to_memdev(cxled_b);
> > +	struct cxl_port *port_a = cxled_to_port(cxled_a);
> > +	struct cxl_port *port_b = cxled_to_port(cxled_b);
> > +	struct cxl_port *iter_a, *iter_b;
> > +
> > +	/* Exit early if any prior sorting failed */
> > +	if (cxled_a->pos < 0 || cxled_b->pos < 0)
> > +		return 0;
> > +
> > +	/*
> > +	 * Walk up the hierarchy to find a shared port, find the decoder
> > +	 * that maps the range, compare the relative position of those
> > +	 * dport mappings.
> > +	 */
> > +	for (iter_a = port_a; iter_a; iter_a = next_port(iter_a)) {
> > +		struct cxl_port *next_a, *next_b, *port;
> > +		struct cxl_switch_decoder *cxlsd;
> > +
> > +		next_a = next_port(iter_a);
> > +		for (iter_b = port_b; iter_b; iter_b = next_port(iter_b)) {
> > +			int a_pos, b_pos, result;
> > +			struct device *dev;
> > +			unsigned int seq;
> > +
> > +			next_b = next_port(iter_b);
> > +			if (next_a != next_b)
> > +				continue;
> > +			if (!next_a)
> > +				goto out;
> Perhaps treat this as a 'find the match' then carry on, so as to avoid having the
> guts of the handling nested so deep.  So something like.

I always like a good out-indent, will give it a look.

> 
> 	struct cxl_switch_decoder *cxlsd;
> 	struct cxl_port *next_a, *next_b, *port;
> 	bool matched = false; // May be simpler test - I'm just lazy
> 
> 	// possibly pull this out as a utility function so that
> 	// error handling can be tidied up by simply returning 
> 	// rather than gotos.
> 	for (iter_a = port_a; iter_a; iter_a = next_port(iter_a)) {
> 		next_a = next_port(iter_a);
> 		for (iter_b = port_b; iter_b; iter_b = next_port(iter_b)) {
> 			int a_pos, b_pos, result;
> 			struct device *dev;
> 			unsigned int seq;
> 
> 			next_b = next_port(iter_b);
> 			if (next_a != next_b)
> 				continue;
> 			if (!next_a)
> 				goto out;
> 			port = next_a;
> 			found = true;

@found can just be @port.

> 			break;
> 		}
> 		if (found)
> 			break;
> 	}
> 	if (!found)
> 		goto out;
> 
> 	dev = device_find_child(&port->dev, cxled_a,
> 				decoder_match_range);
> 	if (!dev) {
> 		struct range *range = &cxled_a->cxld.hpa_range;
> 		dev_err(port->uport,
> 			"failed to find decoder that maps %#llx-:%#llx\n",
> 		range->start, range->end);
> 		cxled_a->pos = -1;
> 		return 0;
> 	}
> 
> 	cxlsd = to_cxl_switch_decoder(dev);
> 	do {
> 		seq = read_seqbegin(&cxlsd->target_lock);
> 		find_positions(cxlsd, iter_a, iter_b, &a_pos, &b_pos);
> 	} while (read_seqretry(&cxlsd->target_lock, seq));
> 
> 	if (a_pos < 0 || b_pos < 0) {
> 		dev_err(port->uport,
> 			"failed to find shared decoder for %s and %s\n",
> 			dev_name(cxlmd_a->dev.parent),
> 			dev_name(cxlmd_b->dev.parent));
> 		cxled_a->pos = -1;
> 		result = 0;
> //if factored as suggested above then a goto here and drop
> // the else would be neater again.

Perhaps, I'll take a look.

> 
> 	} else {
> 		result = a_pos - b_pos;
> 		dev_dbg(port->uport, "%s: %s comes %s %s\n",
> 			dev_name(&cxlsd->cxld.dev),
> 			dev_name(cxlmd_a->dev.parent),
> 			result < 0 ? "before" : "after",
> 			dev_name(cxlmd_b->dev.parent));
> 		}
> 	put_device(dev);
> 
> 	return result;
> out:
> 	dev_err(cxlmd_a->dev.parent, "failed to find shared port with %s\n",
> 		dev_name(cxlmd_b->dev.parent));
> 	cxled_a->pos = -1;
> 	return 0;
> }
> 
> 
> > +			dev = device_find_child(&port->dev, cxled_a,
> > +						decoder_match_range);
> > +			if (!dev) {
> > +				struct range *range = &cxled_a->cxld.hpa_range;
> > +
> > +				dev_err(port->uport,
> > +					"failed to find decoder that maps %#llx-:%#llx\n",
> > +					range->start, range->end);
> > +				cxled_a->pos = -1;
> > +				return 0;
> > +			}
> > +
> > +			cxlsd = to_cxl_switch_decoder(dev);
> > +			do {
> > +				seq = read_seqbegin(&cxlsd->target_lock);
> > +				find_positions(cxlsd, iter_a, iter_b, &a_pos,
> > +					       &b_pos);
> > +			} while (read_seqretry(&cxlsd->target_lock, seq));
> > +
> > +			if (a_pos < 0 || b_pos < 0) {
> > +				dev_err(port->uport,
> > +					"failed to find shared decoder for %s and %s\n",
> > +					dev_name(cxlmd_a->dev.parent),
> > +					dev_name(cxlmd_b->dev.parent));
> > +				cxled_a->pos = -1;
> > +				result = 0;
> > +			} else {
> > +				result = a_pos - b_pos;
> > +				dev_dbg(port->uport, "%s: %s comes %s %s\n",
> > +					dev_name(&cxlsd->cxld.dev),
> > +					dev_name(cxlmd_a->dev.parent),
> > +					result < 0 ? "before" : "after",
> > +					dev_name(cxlmd_b->dev.parent));
> > +			}
> > +
> > +			put_device(dev);
> > +
> > +			return result;
> > +		}
> > +	}
> > +out:
> > +	dev_err(cxlmd_a->dev.parent, "failed to find shared port with %s\n",
> > +		dev_name(cxlmd_b->dev.parent));
> > +	cxled_a->pos = -1;
> > +	return 0;
> > +}
> > +
> > +static int cxl_region_sort_targets(struct cxl_region *cxlr)
> > +{
> > +	struct cxl_region_params *p = &cxlr->params;
> > +	int i, rc = 0;
> > +
> > +	sort(p->targets, p->nr_targets, sizeof(p->targets[0]), cmp_decode_pos,
> > +	     NULL);
> > +
> > +	for (i = 0; i < p->nr_targets; i++) {
> > +		struct cxl_endpoint_decoder *cxled = p->targets[i];
> > +
> > +		if (cxled->pos < 0)
> > +			rc = -ENXIO;
> > +		cxled->pos = i;
> > +	}
> > +
> > +	dev_dbg(&cxlr->dev, "region sort %s\n", rc ? "failed" : "successful");
> > +	return rc;
> > +}
> > +
> >  static int cxl_region_attach(struct cxl_region *cxlr,
> >  			     struct cxl_endpoint_decoder *cxled, int pos)
> >  {
> > @@ -1350,6 +1572,50 @@ static int cxl_region_attach(struct cxl_region *cxlr,
> >  		return -EINVAL;
> >  	}
> >  
> > +	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
> > +		int i;
> > +
> > +		rc = cxl_region_attach_auto(cxlr, cxled, pos);
> > +		if (rc)
> > +			return rc;
> > +
> > +		/* await more targets to arrive... */
> > +		if (p->nr_targets < p->interleave_ways)
> > +			return 0;
> > +
> > +		/*
> > +		 * All targets are here, which implies all PCI enumeration that
> > +		 * affects this region has been completed. Walk the topology to
> > +		 * sort the devices into their relative region decode position.
> > +		 */
> > +		rc = cxl_region_sort_targets(cxlr);
> > +		if (rc)
> > +			return rc;
> > +
> > +		for (i = 0; i < p->nr_targets; i++) {
> > +			cxled = p->targets[i];
> > +			ep_port = cxled_to_port(cxled);
> > +			dport = cxl_find_dport_by_dev(root_port,
> > +						      ep_port->host_bridge);
> > +			rc = cxl_region_attach_position(cxlr, cxlrd, cxled,
> > +							dport, i);
> > +			if (rc)
> > +				return rc;
> > +		}
> > +
> > +		rc = cxl_region_setup_targets(cxlr);
> > +		if (rc)
> > +			return rc;
> > +
> > +		/*
> > +		 * If target setup succeeds in the autodiscovery case
> > +		 * then the region is already committed.
> > +		 */
> > +		p->state = CXL_CONFIG_COMMIT;
> > +
> > +		return 0;
> > +	}
> > +
> >  	rc = cxl_region_validate_position(cxlr, cxled, pos);
> >  	if (rc)
> >  		return rc;
> > @@ -1500,8 +1766,8 @@ static int detach_target(struct cxl_region *cxlr, int pos)
> >  	return rc;
> >  }
> >  
> > -static size_t store_targetN(struct cxl_region *cxlr, const char *buf, int pos,
> > -			    size_t len)
> > +static ssize_t store_targetN(struct cxl_region *cxlr, const char *buf, int pos,
> > +			     size_t len)
> 
> ?
> 
> 
> 
> ...
> 
> 
> > +
> > +void cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
> > +{
> > +	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> > +	struct range *hpa = &cxled->cxld.hpa_range;
> > +	struct cxl_decoder *cxld = &cxled->cxld;
> > +	struct cxl_root_decoder *cxlrd;
> > +	struct cxl_region_params *p;
> > +	struct cxl_region *cxlr;
> > +	struct cxl_port *root;
> > +	bool attach = false;
> > +	struct device *dev;
> > +
> > +	root = find_cxl_root(&cxlmd->dev);
> > +	if (!root) {
> > +		dev_err(cxlmd->dev.parent, "%s: failed to map CXL root\n",
> > +			dev_name(&cxlmd->dev));
> > +		return;
> 
> I'm not a fan of muddling on from errors at lower layers of code.
> I'd push that decision up to the caller - so return an error code from here.
> If caller just wants to ignore it (with appropriate comment on why) then
> that is fine by me.

Yeah, in fact the caller should already know about the root at this
point since this is called from cxl_port_probe() that did that port
establishment. Perhaps this can just take @root as an argument.

Note though that all errors here are ignored by the caller because there
is simply no way to know ahead of time whether this region will succeed
in receiving all its members. So the failure scenario is missing
regions, not anything that would make the port driver fail to load.

> 
> 
> > +	}
> > +
> > +	dev = device_find_child(&root->dev, &cxld->hpa_range,
> > +				  match_decoder_by_range);
> 
> device_find_child is acquiring a reference to the child I think.
> 
> > +	if (!dev) {
> > +		dev_err(cxlmd->dev.parent,
> > +			"%s:%s no CXL window for range %#llx:%#llx\n",
> > +			dev_name(&cxlmd->dev), dev_name(&cxld->dev),
> > +			cxld->hpa_range.start, cxld->hpa_range.end);
> > +		goto out;
> > +	}
> > +	cxlrd = to_cxl_root_decoder(dev);
> Here we've stashed the dev that we have a reference to in cxlrd.
> > +
> > +	mutex_lock(&cxlrd->range_lock);
> > +	dev = device_find_child(&cxlrd->cxlsd.cxld.dev, hpa,
> > +				match_region_by_range);
> > +	if (!dev)
> > +		cxlr = construct_region(cxlrd, cxled);
> > +	else
> > +		cxlr = to_cxl_region(dev);
> > +	mutex_unlock(&cxlrd->range_lock);
> > +
> > +	if (IS_ERR(cxlr))
> 
> In this path, do we drop the reference to cxlrd->dev?

No, good eye. The reference can be dropped immediately since the root
decoder will not be unregistered until the root port is removed and by
this point this port has registered itself in the device_unregister()
chain when its ancestors are taken out.

> 
> > +		goto out;
> > +
> > +	attach_target(cxlr, cxled, -1, TASK_UNINTERRUPTIBLE);
> > +
> > +	down_read(&cxl_region_rwsem);
> > +	p = &cxlr->params;
> > +	attach = p->state >= CXL_CONFIG_COMMIT;
> 
> Test is a bit non obvious.  In the tree I'm looking at
> CXL_CONFIG_COMMIT is last value in the enum.

Can switch to "=="

> 
> 
> > +	up_read(&cxl_region_rwsem);
> > +
> > +	if (attach) {
> > +		int rc = device_attach(&cxlr->dev);
> > +
> > +		/*
> > +		 * If device_attach() fails the range may still be
> > +		 * active via the platform-firmware memory map
> > +		 */
> > +		if (rc < 0)
> 
> We also want to know if the no matching driver found case happened so perhaps
> 		if (rc != 1)

That can't happen as the only possible driver, cxl_region_driver, is in
this same file. I'll add a comment to that effect as its a small
layering violation to know that.

> 
> > +			dev_err(&cxlr->dev, "failed to enable, range: %pr\n",
> > +				p->res);
> > +	}
> > +
> > +	put_device(&cxlr->dev);
> > +out:
> > +	put_device(&root->dev);
> > +}
> > +EXPORT_SYMBOL_NS_GPL(cxl_add_to_region, CXL);
> > +
> >  static int cxl_region_invalidate_memregion(struct cxl_region *cxlr)
> >  {
> >  	if (!test_bit(CXL_REGION_F_INCOHERENT, &cxlr->flags))
> > @@ -2103,6 +2554,15 @@ static int cxl_region_invalidate_memregion(struct cxl_region *cxlr)
> >  	return 0;
> >  }
> 
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index ca76879af1de..9b3765c5c81a 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -261,6 +261,8 @@ resource_size_t cxl_rcrb_to_component(struct device *dev,
> >   * cxl_decoder flags that define the type of memory / devices this
> >   * decoder supports as well as configuration lock status See "CXL 2.0
> >   * 8.2.5.12.7 CXL HDM Decoder 0 Control Register" for details.
> > + * Additionally indicate whether decoder settings were autodetected,
> > + * user customized.
> >   */
> >  #define CXL_DECODER_F_RAM   BIT(0)
> >  #define CXL_DECODER_F_PMEM  BIT(1)
> > @@ -268,6 +270,7 @@ resource_size_t cxl_rcrb_to_component(struct device *dev,
> >  #define CXL_DECODER_F_TYPE3 BIT(3)
> >  #define CXL_DECODER_F_LOCK  BIT(4)
> >  #define CXL_DECODER_F_ENABLE    BIT(5)
> > +#define CXL_DECODER_F_AUTO  BIT(6)
> 
> That's a bit nasty and unexpected - can we rename to make it really clear
> it's not a hardware flag at the point of usage.
> Or just put it in a a new structure field rather than pushing it in here.

I'll move it to an endpoint-flag value since it is only relevant for
those.

> >  #define CXL_DECODER_F_MASK  GENMASK(5, 0)
> 
> Hmm. This mask isn't currently used and now has
> an odd meaning because of hiding a flag in bit(6).

True, it can be deleted by a separate cleanup.

> 
> 
> ...
> 
> >  
> >  /*
> > diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> > index 5453771bf330..012a0c6f8476 100644
> > --- a/drivers/cxl/port.c
> > +++ b/drivers/cxl/port.c
> > @@ -30,6 +30,23 @@ static void schedule_detach(void *cxlmd)
> >  	schedule_cxl_memdev_detach(cxlmd);
> >  }
> >  
> > +static int discover_region(struct device *dev, void *data)
> > +{
> > +	const unsigned long flags = CXL_DECODER_F_ENABLE | CXL_DECODER_F_AUTO;
> > +	struct cxl_endpoint_decoder *cxled;
> > +
> > +	if (!is_endpoint_decoder(dev))
> > +		return 0;
> > +
> > +	cxled = to_cxl_endpoint_decoder(dev);
> > +	if ((cxled->cxld.flags & flags) != flags)
> > +		return 0;
> > +
> > +	cxl_add_to_region(cxled);
> > +
> > +	return 0;
> > +}
> > +
> >  static int cxl_port_probe(struct device *dev)
> >  {
> >  	struct cxl_port *port = to_cxl_port(dev);
> > @@ -78,6 +95,15 @@ static int cxl_port_probe(struct device *dev)
> >  		return rc;
> >  	}
> >  
> > +	if (!is_cxl_endpoint(port))
> > +		return 0;
> > +
> > +	/*
> > +	 * Now that all endpoint decoders are successfully enumerated,
> > +	 * try to assemble regions from committed decoders
> > +	 */
> > +	device_for_each_child(dev, NULL, discover_region);
> > +
> 
> There is a steady reduction in common code vs specific code for the two
> cases in here.  Maybe it's time just to split the probe into
> 
> static int cxl_port_probe(struct device *dev)
> {
> 	if (is_cxl_endpoint(port))
> 		cxl_port_probe_endpoint();
> 	else
> 		cxl_port_probe_not endpoint(); //with better naming.
> }
> 
> Shared code is about 7 lines vs 40+ unshared.

Works for me.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 07/18] cxl/region: Move region-position validation to a helper
  2023-02-08 12:30   ` Jonathan Cameron
@ 2023-02-09  4:09     ` Dan Williams
  2023-02-09  4:26     ` Dan Williams
  1 sibling, 0 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-09  4:09 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

Jonathan Cameron wrote:
> On Sun, 05 Feb 2023 17:03:07 -0800
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > In preparation for region autodiscovery, that needs all devices
> > discovered before their relative position in the region can be
> > determined, consolidate all position dependent validation in a helper.
> > 
> > Recall that in the on-demand region creation flow the end-user picks the
> > position of a given endpoint decoder in a region. In the autodiscovery
> > case the position of an endpoint decoder can only be determined after
> > all other endpoint decoders that claim to decode the region's address
> > range have been enumerated and attached. So, in the autodiscovery case
> > endpoint decoders may be attached before their relative position is
> > known. Once all decoders arrive, then positions can be determined and
> > validated with cxl_region_validate_position() the same as user initiated
> > on-demand creation.
> > 
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Hi Dan,
> 
> A few comments inline, but mostly reflect the original code rather than
> the refactoring you have done in this patch.
> 
> Jonathan
> 
> 
> > +static int cxl_region_attach(struct cxl_region *cxlr,
> > +			     struct cxl_endpoint_decoder *cxled, int pos)
> > +{
> > +	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
> > +	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> > +	struct cxl_region_params *p = &cxlr->params;
> > +	struct cxl_port *ep_port, *root_port;
> > +	struct cxl_dport *dport;
> > +	int rc = -ENXIO;
> > +
> > +	if (cxled->mode != cxlr->mode) {
> > +		dev_dbg(&cxlr->dev, "%s region mode: %d mismatch: %d\n",
> > +			dev_name(&cxled->cxld.dev), cxlr->mode, cxled->mode);
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (cxled->mode == CXL_DECODER_DEAD) {
> > +		dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev));
> > +		return -ENODEV;
> > +	}
> > +
> > +	/* all full of members, or interleave config not established? */
> > +	if (p->state > CXL_CONFIG_INTERLEAVE_ACTIVE) {
> > +		dev_dbg(&cxlr->dev, "region already active\n");
> > +		return -EBUSY;
> > +	} else if (p->state < CXL_CONFIG_INTERLEAVE_ACTIVE) {
> > +		dev_dbg(&cxlr->dev, "interleave config missing\n");
> > +		return -ENXIO;
> > +	}
> > +
> >  	ep_port = cxled_to_port(cxled);
> >  	root_port = cxlrd_to_port(cxlrd);
> >  	dport = cxl_find_dport_by_dev(root_port, ep_port->host_bridge);
> > @@ -1284,13 +1327,6 @@ static int cxl_region_attach(struct cxl_region *cxlr,
> >  		return -ENXIO;
> >  	}
> >  
> > -	if (cxlrd->calc_hb(cxlrd, pos) != dport) {
> > -		dev_dbg(&cxlr->dev, "%s:%s invalid target position for %s\n",
> > -			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
> > -			dev_name(&cxlrd->cxlsd.cxld.dev));
> > -		return -ENXIO;
> > -	}
> > -
> 
> In an ideal world, this would have been nice as two patches.
> One that reorders the various checks so that they are in the order
> after you have factored things out (easy to review for correctness)
> then one that factored it out.
> 
> >  	if (cxled->cxld.target_type != cxlr->type) {
> >  		dev_dbg(&cxlr->dev, "%s:%s type mismatch: %d vs %d\n",
> >  			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
> > @@ -1314,12 +1350,13 @@ static int cxl_region_attach(struct cxl_region *cxlr,
> >  		return -EINVAL;
> >  	}
> >  
> > -	for (iter = ep_port; !is_cxl_root(iter);
> > -	     iter = to_cxl_port(iter->dev.parent)) {
> > -		rc = cxl_port_attach_region(iter, cxlr, cxled, pos);
> > -		if (rc)
> > -			goto err;
> > -	}
> > +	rc = cxl_region_validate_position(cxlr, cxled, pos);
> > +	if (rc)
> > +		return rc;
> > +
> > +	rc = cxl_region_attach_position(cxlr, cxlrd, cxled, dport, pos);
> > +	if (rc)
> > +		return rc;
> >  
> >  	p->targets[pos] = cxled;
> >  	cxled->pos = pos;
> 
> More something about original code than the refactoring...
> 
> I'm not keen on the side effects that aren't unwound in the error paths.
> 
> p->targets[pos] and cxled->pos are left set.  Probably never matters
> but not elegant or as easy to reason about as it would be if they
> were cleared in error cases.  In particular there is a check on
> whether p->targets[pos] is set that will result in a dev_dbg even
> though setting it up actually failed.

I'll clean that up with a lead-in fixup.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 07/18] cxl/region: Move region-position validation to a helper
  2023-02-08 12:30   ` Jonathan Cameron
  2023-02-09  4:09     ` Dan Williams
@ 2023-02-09  4:26     ` Dan Williams
  2023-02-09 11:07       ` Jonathan Cameron
  1 sibling, 1 reply; 112+ messages in thread
From: Dan Williams @ 2023-02-09  4:26 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

Jonathan Cameron wrote:
[..]
> > @@ -1284,13 +1327,6 @@ static int cxl_region_attach(struct cxl_region *cxlr,
> >  		return -ENXIO;
> >  	}
> >  
> > -	if (cxlrd->calc_hb(cxlrd, pos) != dport) {
> > -		dev_dbg(&cxlr->dev, "%s:%s invalid target position for %s\n",
> > -			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
> > -			dev_name(&cxlrd->cxlsd.cxld.dev));
> > -		return -ENXIO;
> > -	}
> > -
> 
> In an ideal world, this would have been nice as two patches.
> One that reorders the various checks so that they are in the order
> after you have factored things out (easy to review for correctness)
> then one that factored it out.

I played with this a bit and the only way I could see to make the diff
come out significantly nicer would be to use a forward declaration to
move the new helpers below cxl_region_attach().

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default
  2023-02-08 17:37   ` Fan Ni
@ 2023-02-09  4:56     ` Dan Williams
  0 siblings, 0 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-09  4:56 UTC (permalink / raw)
  To: Fan Ni, Dan Williams
  Cc: linux-cxl, David Hildenbrand, Kees Cook, stable, Dave Hansen,
	Michal Hocko, linux-mm, linux-acpi, Adam Manzanares, dave

Fan Ni wrote:
> On Sun, Feb 05, 2023 at 05:02:29PM -0800, Dan Williams wrote:
> 
> 
> > Summary:
> > --------
> > 
> > CXL RAM support allows for the dynamic provisioning of new CXL RAM
> > regions, and more routinely, assembling a region from an existing
> > configuration established by platform-firmware. The latter is motivated
> > by CXL memory RAS (Reliability, Availability and Serviceability)
> > support, that requires associating device events with System Physical
> > Address ranges and vice versa.
> > 
> > The 'Soft Reserved' policy rework arranges for performance
> > differentiated memory like CXL attached DRAM, or high-bandwidth memory,
> > to be designated for 'System RAM' by default, rather than the device-dax
> > dedicated access mode. That current device-dax default is confusing and
> > surprising for the Pareto of users that do not expect memory to be
> > quarantined for dedicated access by default. Most users expect all
> > 'System RAM'-capable memory to show up in FREE(1).
> > 
> > 
> > Details:
> > --------
> > 
> > Recall that the Linux 'Soft Reserved' designation for memory is a
> > reaction to platform-firmware, like EFI EDK2, delineating memory with
> > the EFI Specific Purpose Memory attribute (EFI_MEMORY_SP). An
> > alternative way to think of that attribute is that it specifies the
> > *not* general-purpose memory pool. It is memory that may be too precious
> > for general usage or not performant enough for some hot data structures.
> > However, in the absence of explicit policy it should just be 'System
> > RAM' by default.
> > 
> > Rather than require every distribution to ship a udev policy to assign
> > dax devices to dax_kmem (the device-memory hotplug driver) just make
> > that the kernel default. This is similar to the rationale in:
> > 
> > commit 8604d9e534a3 ("memory_hotplug: introduce CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE")
> > 
> > With this change the relatively niche use case of accessing this memory
> > via mapping a device-dax instance can be achieved by building with
> > CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n, or specifying
> > memhp_default_state=offline at boot, and then use:
> > 
> >     daxctl reconfigure-device $device -m devdax --force
> > 
> > ...to shift the corresponding address range to device-dax access.
> > 
> > The process of assembling a device-dax instance for a given CXL region
> > device configuration is similar to the process of assembling a
> > Device-Mapper or MDRAID storage-device array. Specifically, asynchronous
> > probing by the PCI and driver core enumerates all CXL endpoints and
> > their decoders. Then, once enough decoders have arrived to a describe a
> > given region, that region is passed to the device-dax subsystem where it
> > is subject to the above 'dax_kmem' policy. This assignment and policy
> > choice is only possible if memory is set aside by the 'Soft Reserved'
> > designation. Otherwise, CXL that is mapped as 'System RAM' becomes
> > immutable by CXL driver mechanisms, but is still enumerated for RAS
> > purposes.
> > 
> > This series is also available via:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.3/cxl-ram-region
> > 
> > ...and has gone through some preview testing in various forms.
> > 
> > ---
> 
> Tested-by: Fan Ni <fan.ni@samsung.com>
> 
> 
> Run the following tests with the patch (with the volatile support at qemu).
> Note: cxl related code are compiled as modules and loaded before used.
> 
> For pmem setup, tried three topologies (1HB1RP1Mem, 1HB2RP2Mem, 1HB2RP4Mem with
> a cxl switch). The memdev is either provided in the command line when launching
> qemu or hot added to the guest with device_add command in qemu monitor.
> 
> The following operations are performed,
> 1. create-region with cxl cmd
> 2. create name-space with ndctl cmd
> 3. convert cxl mem to ram with daxctl cmd
> 4. online the memory with daxctl cmd
> 5. Let app use the memory (numactl --membind=1 htop)
> 
> Results: No regression.
> 
> For volatile memory (hot add with device_add command), mainly tested 1HB1RP1Mem
> case (passthrough).
> 1. the device can be correctly discovered after hot add (cxl list, may need
> cxl enable-memdev)
> 2. creating ram region (cxl create-region) succeeded, after creating the
> region, a dax device under /dev/ is shown.
> 3. online the memory passes, and the memory is shown on another NUMA node.
> 4. Let app use the memory (numactl --membind=1 htop) passed.

Thank you, Fan!

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 07/18] cxl/region: Move region-position validation to a helper
  2023-02-09  4:26     ` Dan Williams
@ 2023-02-09 11:07       ` Jonathan Cameron
  2023-02-09 20:52         ` Dan Williams
  0 siblings, 1 reply; 112+ messages in thread
From: Jonathan Cameron @ 2023-02-09 11:07 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

On Wed, 8 Feb 2023 20:26:26 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> Jonathan Cameron wrote:
> [..]
> > > @@ -1284,13 +1327,6 @@ static int cxl_region_attach(struct cxl_region *cxlr,
> > >  		return -ENXIO;
> > >  	}
> > >  
> > > -	if (cxlrd->calc_hb(cxlrd, pos) != dport) {
> > > -		dev_dbg(&cxlr->dev, "%s:%s invalid target position for %s\n",
> > > -			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
> > > -			dev_name(&cxlrd->cxlsd.cxld.dev));
> > > -		return -ENXIO;
> > > -	}
> > > -  
> > 
> > In an ideal world, this would have been nice as two patches.
> > One that reorders the various checks so that they are in the order
> > after you have factored things out (easy to review for correctness)
> > then one that factored it out.  
> 
> I played with this a bit and the only way I could see to make the diff
> come out significantly nicer would be to use a forward declaration to
> move the new helpers below cxl_region_attach().

Don't bother then!  Unless you've already done it.

In the ideal world diff would magically present everything in the most
human readable form :)  What are all these AI folk working on that we
don't already have this!

Jonathan


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 07/18] cxl/region: Move region-position validation to a helper
  2023-02-06  1:03 ` [PATCH 07/18] cxl/region: Move region-position validation to a helper Dan Williams
  2023-02-06 17:44   ` Ira Weiny
  2023-02-08 12:30   ` Jonathan Cameron
@ 2023-02-09 19:45   ` Verma, Vishal L
  2 siblings, 0 replies; 112+ messages in thread
From: Verma, Vishal L @ 2023-02-09 19:45 UTC (permalink / raw)
  To: Williams, Dan J, linux-cxl; +Cc: linux-mm, dave.hansen, linux-acpi

On Sun, 2023-02-05 at 17:03 -0800, Dan Williams wrote:
> In preparation for region autodiscovery, that needs all devices
> discovered before their relative position in the region can be
> determined, consolidate all position dependent validation in a helper.
> 
> Recall that in the on-demand region creation flow the end-user picks the
> position of a given endpoint decoder in a region. In the autodiscovery
> case the position of an endpoint decoder can only be determined after
> all other endpoint decoders that claim to decode the region's address
> range have been enumerated and attached. So, in the autodiscovery case
> endpoint decoders may be attached before their relative position is
> known. Once all decoders arrive, then positions can be determined and
> validated with cxl_region_validate_position() the same as user initiated
> on-demand creation.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/core/region.c |  119 +++++++++++++++++++++++++++++----------------
>  1 file changed, 76 insertions(+), 43 deletions(-)

Looks good,

Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>

> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 97eafdd75675..c82d3b6f3d1f 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1207,35 +1207,13 @@ static int cxl_region_setup_targets(struct cxl_region *cxlr)
>         return 0;
>  }
>  
> -static int cxl_region_attach(struct cxl_region *cxlr,
> -                            struct cxl_endpoint_decoder *cxled, int pos)
> +static int cxl_region_validate_position(struct cxl_region *cxlr,
> +                                       struct cxl_endpoint_decoder *cxled,
> +                                       int pos)
>  {
> -       struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
>         struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> -       struct cxl_port *ep_port, *root_port, *iter;
>         struct cxl_region_params *p = &cxlr->params;
> -       struct cxl_dport *dport;
> -       int i, rc = -ENXIO;
> -
> -       if (cxled->mode != cxlr->mode) {
> -               dev_dbg(&cxlr->dev, "%s region mode: %d mismatch: %d\n",
> -                       dev_name(&cxled->cxld.dev), cxlr->mode, cxled->mode);
> -               return -EINVAL;
> -       }
> -
> -       if (cxled->mode == CXL_DECODER_DEAD) {
> -               dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev));
> -               return -ENODEV;
> -       }
> -
> -       /* all full of members, or interleave config not established? */
> -       if (p->state > CXL_CONFIG_INTERLEAVE_ACTIVE) {
> -               dev_dbg(&cxlr->dev, "region already active\n");
> -               return -EBUSY;
> -       } else if (p->state < CXL_CONFIG_INTERLEAVE_ACTIVE) {
> -               dev_dbg(&cxlr->dev, "interleave config missing\n");
> -               return -ENXIO;
> -       }
> +       int i;
>  
>         if (pos < 0 || pos >= p->interleave_ways) {
>                 dev_dbg(&cxlr->dev, "position %d out of range %d\n", pos,
> @@ -1274,6 +1252,71 @@ static int cxl_region_attach(struct cxl_region *cxlr,
>                 }
>         }
>  
> +       return 0;
> +}
> +
> +static int cxl_region_attach_position(struct cxl_region *cxlr,
> +                                     struct cxl_root_decoder *cxlrd,
> +                                     struct cxl_endpoint_decoder *cxled,
> +                                     const struct cxl_dport *dport, int pos)
> +{
> +       struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> +       struct cxl_port *iter;
> +       int rc;
> +
> +       if (cxlrd->calc_hb(cxlrd, pos) != dport) {
> +               dev_dbg(&cxlr->dev, "%s:%s invalid target position for %s\n",
> +                       dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
> +                       dev_name(&cxlrd->cxlsd.cxld.dev));
> +               return -ENXIO;
> +       }
> +
> +       for (iter = cxled_to_port(cxled); !is_cxl_root(iter);
> +            iter = to_cxl_port(iter->dev.parent)) {
> +               rc = cxl_port_attach_region(iter, cxlr, cxled, pos);
> +               if (rc)
> +                       goto err;
> +       }
> +
> +       return 0;
> +
> +err:
> +       for (iter = cxled_to_port(cxled); !is_cxl_root(iter);
> +            iter = to_cxl_port(iter->dev.parent))
> +               cxl_port_detach_region(iter, cxlr, cxled);
> +       return rc;
> +}
> +
> +static int cxl_region_attach(struct cxl_region *cxlr,
> +                            struct cxl_endpoint_decoder *cxled, int pos)
> +{
> +       struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
> +       struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> +       struct cxl_region_params *p = &cxlr->params;
> +       struct cxl_port *ep_port, *root_port;
> +       struct cxl_dport *dport;
> +       int rc = -ENXIO;
> +
> +       if (cxled->mode != cxlr->mode) {
> +               dev_dbg(&cxlr->dev, "%s region mode: %d mismatch: %d\n",
> +                       dev_name(&cxled->cxld.dev), cxlr->mode, cxled->mode);
> +               return -EINVAL;
> +       }
> +
> +       if (cxled->mode == CXL_DECODER_DEAD) {
> +               dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev));
> +               return -ENODEV;
> +       }
> +
> +       /* all full of members, or interleave config not established? */
> +       if (p->state > CXL_CONFIG_INTERLEAVE_ACTIVE) {
> +               dev_dbg(&cxlr->dev, "region already active\n");
> +               return -EBUSY;
> +       } else if (p->state < CXL_CONFIG_INTERLEAVE_ACTIVE) {
> +               dev_dbg(&cxlr->dev, "interleave config missing\n");
> +               return -ENXIO;
> +       }
> +
>         ep_port = cxled_to_port(cxled);
>         root_port = cxlrd_to_port(cxlrd);
>         dport = cxl_find_dport_by_dev(root_port, ep_port->host_bridge);
> @@ -1284,13 +1327,6 @@ static int cxl_region_attach(struct cxl_region *cxlr,
>                 return -ENXIO;
>         }
>  
> -       if (cxlrd->calc_hb(cxlrd, pos) != dport) {
> -               dev_dbg(&cxlr->dev, "%s:%s invalid target position for %s\n",
> -                       dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
> -                       dev_name(&cxlrd->cxlsd.cxld.dev));
> -               return -ENXIO;
> -       }
> -
>         if (cxled->cxld.target_type != cxlr->type) {
>                 dev_dbg(&cxlr->dev, "%s:%s type mismatch: %d vs %d\n",
>                         dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
> @@ -1314,12 +1350,13 @@ static int cxl_region_attach(struct cxl_region *cxlr,
>                 return -EINVAL;
>         }
>  
> -       for (iter = ep_port; !is_cxl_root(iter);
> -            iter = to_cxl_port(iter->dev.parent)) {
> -               rc = cxl_port_attach_region(iter, cxlr, cxled, pos);
> -               if (rc)
> -                       goto err;
> -       }
> +       rc = cxl_region_validate_position(cxlr, cxled, pos);
> +       if (rc)
> +               return rc;
> +
> +       rc = cxl_region_attach_position(cxlr, cxlrd, cxled, dport, pos);
> +       if (rc)
> +               return rc;
>  
>         p->targets[pos] = cxled;
>         cxled->pos = pos;
> @@ -1343,10 +1380,6 @@ static int cxl_region_attach(struct cxl_region *cxlr,
>  
>  err_decrement:
>         p->nr_targets--;
> -err:
> -       for (iter = ep_port; !is_cxl_root(iter);
> -            iter = to_cxl_port(iter->dev.parent))
> -               cxl_port_detach_region(iter, cxlr, cxled);
>         return rc;
>  }
>  
> 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 08/18] kernel/range: Uplevel the cxl subsystem's range_contains() helper
  2023-02-06  1:03 ` [PATCH 08/18] kernel/range: Uplevel the cxl subsystem's range_contains() helper Dan Williams
                     ` (3 preceding siblings ...)
  2023-02-08 12:32   ` Jonathan Cameron
@ 2023-02-09 19:47   ` Verma, Vishal L
  4 siblings, 0 replies; 112+ messages in thread
From: Verma, Vishal L @ 2023-02-09 19:47 UTC (permalink / raw)
  To: Williams, Dan J, linux-cxl; +Cc: keescook, linux-mm, dave.hansen, linux-acpi

On Sun, 2023-02-05 at 17:03 -0800, Dan Williams wrote:
> In support of the CXL subsystem's use of 'struct range' to track decode
> address ranges, add a common range_contains() implementation with
> identical semantics as resource_contains();
> 
> The existing 'range_contains()' in lib/stackinit_kunit.c is namespaced
> with a 'stackinit_' prefix.
> 
> Cc: Kees Cook <keescook@chromium.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/core/pci.c |    5 -----
>  include/linux/range.h  |    5 +++++
>  lib/stackinit_kunit.c  |    6 +++---
>  3 files changed, 8 insertions(+), 8 deletions(-)

Looks good,

Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>

> 
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 1d1492440287..9ed2120dbf8a 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -214,11 +214,6 @@ static int devm_cxl_enable_mem(struct device *host, struct cxl_dev_state *cxlds)
>         return devm_add_action_or_reset(host, clear_mem_enable, cxlds);
>  }
>  
> -static bool range_contains(struct range *r1, struct range *r2)
> -{
> -       return r1->start <= r2->start && r1->end >= r2->end;
> -}
> -
>  /* require dvsec ranges to be covered by a locked platform window */
>  static int dvsec_range_allowed(struct device *dev, void *arg)
>  {
> diff --git a/include/linux/range.h b/include/linux/range.h
> index 274681cc3154..7efb6a9b069b 100644
> --- a/include/linux/range.h
> +++ b/include/linux/range.h
> @@ -13,6 +13,11 @@ static inline u64 range_len(const struct range *range)
>         return range->end - range->start + 1;
>  }
>  
> +static inline bool range_contains(struct range *r1, struct range *r2)
> +{
> +       return r1->start <= r2->start && r1->end >= r2->end;
> +}
> +
>  int add_range(struct range *range, int az, int nr_range,
>                 u64 start, u64 end);
>  
> diff --git a/lib/stackinit_kunit.c b/lib/stackinit_kunit.c
> index 4591d6cf5e01..05947a2feb93 100644
> --- a/lib/stackinit_kunit.c
> +++ b/lib/stackinit_kunit.c
> @@ -31,8 +31,8 @@ static volatile u8 forced_mask = 0xff;
>  static void *fill_start, *target_start;
>  static size_t fill_size, target_size;
>  
> -static bool range_contains(char *haystack_start, size_t haystack_size,
> -                          char *needle_start, size_t needle_size)
> +static bool stackinit_range_contains(char *haystack_start, size_t haystack_size,
> +                                    char *needle_start, size_t needle_size)
>  {
>         if (needle_start >= haystack_start &&
>             needle_start + needle_size <= haystack_start + haystack_size)
> @@ -175,7 +175,7 @@ static noinline void test_ ## name (struct kunit *test)             \
>                                                                 \
>         /* Validate that compiler lined up fill and target. */  \
>         KUNIT_ASSERT_TRUE_MSG(test,                             \
> -               range_contains(fill_start, fill_size,           \
> +               stackinit_range_contains(fill_start, fill_size, \
>                             target_start, target_size),         \
>                 "stack fill missed target!? "                   \
>                 "(fill %zu wide, target offset by %d)\n",       \
> 
> 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 09/18] cxl/region: Enable CONFIG_CXL_REGION to be toggled
  2023-02-06  1:03 ` [PATCH 09/18] cxl/region: Enable CONFIG_CXL_REGION to be toggled Dan Williams
                     ` (2 preceding siblings ...)
  2023-02-08 12:36   ` Jonathan Cameron
@ 2023-02-09 20:17   ` Verma, Vishal L
  3 siblings, 0 replies; 112+ messages in thread
From: Verma, Vishal L @ 2023-02-09 20:17 UTC (permalink / raw)
  To: Williams, Dan J, linux-cxl; +Cc: linux-mm, dave.hansen, linux-acpi

On Sun, 2023-02-05 at 17:03 -0800, Dan Williams wrote:
> Add help text and a label so the CXL_REGION config option can be
> toggled. This is mainly to enable compile testing without region
> support.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/Kconfig |   12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 0ac53c422c31..163c094e67ae 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -104,12 +104,22 @@ config CXL_SUSPEND
>         depends on SUSPEND && CXL_MEM
>  
>  config CXL_REGION
> -       bool
> +       bool "CXL: Region Support"
>         default CXL_BUS
>         # For MAX_PHYSMEM_BITS
>         depends on SPARSEMEM
>         select MEMREGION
>         select GET_FREE_REGION
> +       help
> +         Enable the CXL core to enumerate and provision CXL regions. A CXL
> +         region is defined by one or more CXL expanders that decode a given
> +         system-physical address range. For CXL regions established by
> +         platform-firmware this option enables memory error handling to

unnecessary-hyphenation?

But regardless,

Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>


> +         identify the devices participating in a given interleaved memory
> +         range. Otherwise, platform-firmware managed CXL is enabled by being
> +         placed in the system address map and does not need a driver.
> +
> +         If unsure say 'y'
>  
>  config CXL_REGION_INVALIDATION_TEST
>         bool "CXL: Region Cache Management Bypass (TEST)"
> 
> 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 10/18] cxl/region: Fix passthrough-decoder detection
  2023-02-06  1:03 ` [PATCH 10/18] cxl/region: Fix passthrough-decoder detection Dan Williams
                     ` (2 preceding siblings ...)
  2023-02-08 12:44   ` Jonathan Cameron
@ 2023-02-09 20:28   ` Verma, Vishal L
  3 siblings, 0 replies; 112+ messages in thread
From: Verma, Vishal L @ 2023-02-09 20:28 UTC (permalink / raw)
  To: Williams, Dan J, linux-cxl; +Cc: linux-mm, stable, dave.hansen, linux-acpi

On Sun, 2023-02-05 at 17:03 -0800, Dan Williams wrote:
> A passthrough decoder is a decoder that maps only 1 target. It is a
> special case because it does not impose any constraints on the
> interleave-math as compared to a decoder with multiple targets. Extend
> the passthrough case to multi-target-capable decoders that only have one
> target selected. I.e. the current code was only considering passthrough
> *ports* which are only a subset of the potential passthrough decoder
> scenarios.
> 
> Fixes: e4f6dfa9ef75 ("cxl/region: Fix 'distance' calculation with passthrough ports")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/core/region.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Looks good,

Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>

> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index c82d3b6f3d1f..34cf95217901 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1019,10 +1019,10 @@ static int cxl_port_setup_targets(struct cxl_port *port,
>                 int i, distance;
>  
>                 /*
> -                * Passthrough ports impose no distance requirements between
> +                * Passthrough decoders impose no distance requirements between
>                  * peers
>                  */
> -               if (port->nr_dports == 1)
> +               if (cxl_rr->nr_targets == 1)
>                         distance = 0;
>                 else
>                         distance = p->nr_targets / cxl_rr->nr_targets;
> 
> 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 12/18] tools/testing/cxl: Define a fixed volatile configuration to parse
  2023-02-08 17:31   ` Jonathan Cameron
@ 2023-02-09 20:50     ` Dan Williams
  0 siblings, 0 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-09 20:50 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

Jonathan Cameron wrote:
> On Sun, 05 Feb 2023 17:03:35 -0800
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > Take two endpoints attached to the first switch on the first host-bridge
> > in the cxl_test topology and define a pre-initialized region. This is a
> > x2 interleave underneath a x1 CXL Window.
> > 
> > $ modprobe cxl_test
> > $ # cxl list -Ru
> > {
> >   "region":"region3",
> >   "resource":"0xf010000000",
> >   "size":"512.00 MiB (536.87 MB)",
> >   "interleave_ways":2,
> >   "interleave_granularity":4096,
> >   "decode_state":"commit"
> > }
> > 
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  drivers/cxl/core/core.h      |    3 -
> >  drivers/cxl/core/hdm.c       |    3 +
> >  drivers/cxl/core/port.c      |    4 +
> >  drivers/cxl/cxl.h            |    4 +
> >  drivers/cxl/cxlmem.h         |    3 +
> >  tools/testing/cxl/test/cxl.c |  146 +++++++++++++++++++++++++++++++++++++++---
> >  6 files changed, 148 insertions(+), 15 deletions(-)
> > 
> 
> > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > index 8130430ffbcf..8d0895cbae93 100644
> > --- a/drivers/cxl/core/port.c
> > +++ b/drivers/cxl/core/port.c
> > @@ -458,6 +458,7 @@ bool is_switch_decoder(struct device *dev)
> >  {
> >  	return is_root_decoder(dev) || dev->type == &cxl_decoder_switch_type;
> >  }
> > +EXPORT_SYMBOL_NS_GPL(is_switch_decoder, CXL);
> >  
> >  struct cxl_decoder *to_cxl_decoder(struct device *dev)
> >  {
> > @@ -485,6 +486,7 @@ struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev)
> >  		return NULL;
> >  	return container_of(dev, struct cxl_switch_decoder, cxld.dev);
> >  }
> > +EXPORT_SYMBOL_NS_GPL(to_cxl_switch_decoder, CXL);
> >  
> >  static void cxl_ep_release(struct cxl_ep *ep)
> >  {
> > @@ -528,7 +530,7 @@ static const struct device_type cxl_port_type = {
> >  
> >  bool is_cxl_port(struct device *dev)
> >  {
> > -	return dev->type == &cxl_port_type;
> > +	return dev && dev->type == &cxl_port_type;
> 
> Adding that protection just for mocking seems a bit nasty.
> Could you push the sanity check out to the caller or am I missing something?
> Perhaps worth calling out the reason we'd call this on a NULL dev with a comment
> or similar.

Oh, that was probably a bug fix for an interim version of cxl_test that
I neglected to come back and clean up. A "while (port)" loop should not
walk off the top of a port->dev.parent chain.

Confirmed that the self-test still passes with this extra check gone.

> >  }
> >  EXPORT_SYMBOL_NS_GPL(is_cxl_port, CXL);
> >  
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index 9b3765c5c81a..4c6ee6c96f23 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -452,6 +452,7 @@ struct cxl_region_params {
> >   * struct cxl_region - CXL region
> >   * @dev: This region's device
> >   * @id: This region's id. Id is globally unique across all regions
> > + * @fixed: At least one decoder in this region was locked down at init
> >   * @mode: Endpoint decoder allocation / access mode
> >   * @type: Endpoint decoder target type
> >   * @cxl_nvb: nvdimm bridge for coordinating @cxlr_pmem setup / shutdown
> > @@ -462,6 +463,7 @@ struct cxl_region_params {
> >  struct cxl_region {
> >  	struct device dev;
> >  	int id;
> > +	bool fixed;
> 
> I was wondering why a mocking patch was changing the region state structure...
> It isn't - this isn't used that I can find.

Another interim hack that I failed to cleanup before posting.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 07/18] cxl/region: Move region-position validation to a helper
  2023-02-09 11:07       ` Jonathan Cameron
@ 2023-02-09 20:52         ` Dan Williams
  0 siblings, 0 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-09 20:52 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, dave.hansen, linux-mm, linux-acpi

Jonathan Cameron wrote:
> On Wed, 8 Feb 2023 20:26:26 -0800
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > Jonathan Cameron wrote:
> > [..]
> > > > @@ -1284,13 +1327,6 @@ static int cxl_region_attach(struct cxl_region *cxlr,
> > > >  		return -ENXIO;
> > > >  	}
> > > >  
> > > > -	if (cxlrd->calc_hb(cxlrd, pos) != dport) {
> > > > -		dev_dbg(&cxlr->dev, "%s:%s invalid target position for %s\n",
> > > > -			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
> > > > -			dev_name(&cxlrd->cxlsd.cxld.dev));
> > > > -		return -ENXIO;
> > > > -	}
> > > > -  
> > > 
> > > In an ideal world, this would have been nice as two patches.
> > > One that reorders the various checks so that they are in the order
> > > after you have factored things out (easy to review for correctness)
> > > then one that factored it out.  
> > 
> > I played with this a bit and the only way I could see to make the diff
> > come out significantly nicer would be to use a forward declaration to
> > move the new helpers below cxl_region_attach().
> 
> Don't bother then!  Unless you've already done it.

No worries, abandoned.

> In the ideal world diff would magically present everything in the most
> human readable form :)  What are all these AI folk working on that we
> don't already have this!

+1 to this.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default
  2023-02-06  1:02 [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default Dan Williams
                   ` (19 preceding siblings ...)
       [not found] ` <CGME20230208173730uscas1p2af3a9eeb8946dfa607b190c079a49653@uscas1p2.samsung.com>
@ 2023-02-13 12:13 ` David Hildenbrand
  2023-02-14 18:45   ` Dan Williams
  2023-02-14 18:27 ` Gregory Price
  21 siblings, 1 reply; 112+ messages in thread
From: David Hildenbrand @ 2023-02-13 12:13 UTC (permalink / raw)
  To: Dan Williams, linux-cxl
  Cc: Kees Cook, stable, Dave Hansen, Michal Hocko, linux-mm, linux-acpi

On 06.02.23 02:02, Dan Williams wrote:
> Summary:
> --------
> 
> CXL RAM support allows for the dynamic provisioning of new CXL RAM
> regions, and more routinely, assembling a region from an existing
> configuration established by platform-firmware. The latter is motivated
> by CXL memory RAS (Reliability, Availability and Serviceability)
> support, that requires associating device events with System Physical
> Address ranges and vice versa.
> 
> The 'Soft Reserved' policy rework arranges for performance
> differentiated memory like CXL attached DRAM, or high-bandwidth memory,
> to be designated for 'System RAM' by default, rather than the device-dax
> dedicated access mode. That current device-dax default is confusing and
> surprising for the Pareto of users that do not expect memory to be
> quarantined for dedicated access by default. Most users expect all
> 'System RAM'-capable memory to show up in FREE(1).
> 
> 
> Details:
> --------
> 
> Recall that the Linux 'Soft Reserved' designation for memory is a
> reaction to platform-firmware, like EFI EDK2, delineating memory with
> the EFI Specific Purpose Memory attribute (EFI_MEMORY_SP). An
> alternative way to think of that attribute is that it specifies the
> *not* general-purpose memory pool. It is memory that may be too precious
> for general usage or not performant enough for some hot data structures.
> However, in the absence of explicit policy it should just be 'System
> RAM' by default.
> 
> Rather than require every distribution to ship a udev policy to assign
> dax devices to dax_kmem (the device-memory hotplug driver) just make
> that the kernel default. This is similar to the rationale in:
> 
> commit 8604d9e534a3 ("memory_hotplug: introduce CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE")
> 
> With this change the relatively niche use case of accessing this memory
> via mapping a device-dax instance can be achieved by building with
> CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n, or specifying
> memhp_default_state=offline at boot, and then use:
> 
>      daxctl reconfigure-device $device -m devdax --force
> 
> ...to shift the corresponding address range to device-dax access.
> 
> The process of assembling a device-dax instance for a given CXL region
> device configuration is similar to the process of assembling a
> Device-Mapper or MDRAID storage-device array. Specifically, asynchronous
> probing by the PCI and driver core enumerates all CXL endpoints and
> their decoders. Then, once enough decoders have arrived to a describe a
> given region, that region is passed to the device-dax subsystem where it
> is subject to the above 'dax_kmem' policy. This assignment and policy
> choice is only possible if memory is set aside by the 'Soft Reserved'
> designation. Otherwise, CXL that is mapped as 'System RAM' becomes
> immutable by CXL driver mechanisms, but is still enumerated for RAS
> purposes.
> 
> This series is also available via:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.3/cxl-ram-region
> 
> ...and has gone through some preview testing in various forms.
> 

My concern would be that in setups with a lot of CXL memory 
(soft-reserved), having that much offline memory during boot might make 
the kernel run out of memory. After all, offline memory consumes memory 
for the memmap.

Is the assumption that something like that cannot happen because we'll 
never ever have that much soft-reserved memory? :)

Note that this is a concern only applies when not using auto-onlining in 
the kernel during boot, which (IMHO) is or will be the default in the 
future.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default
  2023-02-06  1:02 [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default Dan Williams
                   ` (20 preceding siblings ...)
  2023-02-13 12:13 ` David Hildenbrand
@ 2023-02-14 18:27 ` Gregory Price
  2023-02-14 18:39   ` Dan Williams
  21 siblings, 1 reply; 112+ messages in thread
From: Gregory Price @ 2023-02-14 18:27 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, David Hildenbrand, Kees Cook, stable, Dave Hansen,
	Michal Hocko, linux-mm, linux-acpi

On Sun, Feb 05, 2023 at 05:02:29PM -0800, Dan Williams wrote:
> Summary:
> --------
> 
> CXL RAM support allows for the dynamic provisioning of new CXL RAM
> regions, and more routinely, assembling a region from an existing
> configuration established by platform-firmware. The latter is motivated
> by CXL memory RAS (Reliability, Availability and Serviceability)
> support, that requires associating device events with System Physical
> Address ranges and vice versa.
> 

Ok, I simplified down my tests and reverted a bunch of stuff, figured i
should report this before I dive further in.

Earlier i was carrying the DOE patches and others, I've dropped most of
that to make sure i could replicate on the base kernel and qemu images

QEMU branch: 
https://gitlab.com/jic23/qemu/-/tree/cxl-2023-01-26
this is a little out of date at this point i think? but it shouldn't
matter, the results are the same regardless of what else i pull in.

Kernel branch:
https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.3/cxl-ram-region
I had been carrying a bunch of other hotfixes and work like the DOE
patches, but but i wanted to know if the base branch saw the same
behavior.

Config:
sudo /opt/qemu-cxl/bin/qemu-system-x86_64 \
-drive file=/data/qemu/images/cxl.qcow2,format=qcow2,index=0,media=disk,id=hd \
-m 4G,slots=4,maxmem=8G \
-smp 4 \
-machine type=q35,accel=kvm,cxl=on \
-enable-kvm \
-nographic \
-netdev bridge,id=hn0,br=virbr0 \
-device virtio-net-pci,netdev=hn0,id=nic1 \
-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
-device cxl-rp,id=rp0,bus=cxl.0,chassis=0,port=0,slot=0 \
-object memory-backend-ram,id=mem0,size=4G \
-device cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0 \
-M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=4G


boot is fine
# ./ndctl/build/cxl/cxl create-region -m -t ram -d decoder0.0 \
  -w 1 -g 4096 mem0

[   28.183276] cxl_region region0: Bypassing cpu_cache_invalidate_memregion() for testing!
{
  "region":"region0",
  "resource":"0x390000000",
  "size":"4.00 GiB (4.29 GB)",
  "type":"ram",
  "interleave_ways":1,
  "interleave_granularity":4096,
  "decode_state":"commit",
  "mappings":[
    {
      "position":0,
      "memdev":"mem0",
      "decoder":"decoder2.0"
    }
  ]
}
cxl region: cmd_create_region: created 1 region
[   28.247144] Built 1 zonelists, mobility grouping on.  Total pages: 979754
[   28.247844] Policy zone: Normal
[   28.258449] Fallback order for Node 0: 0 1
[   28.258945] Fallback order for Node 1: 1 0
[   28.259422] Built 2 zonelists, mobility grouping on.  Total pages: 1012522
[   28.260087] Policy zone: Normal


top shows the memory has been onlined
MiB Mem :   8022.1 total,   7631.6 free,    200.9 used,    189.7 buff/cache
MiB Swap:   3926.0 total,   3926.0 free,      0.0 used.   7567.8 avail Mem


Lets attempt to use the memory
[root@fedora ~]# numactl --membind=1 python
KVM internal error. Suberror: 3
extra data[0]: 0x0000000080000b0e
extra data[1]: 0x0000000000000031
extra data[2]: 0x0000000000000d81
extra data[3]: 0x0000000390074ac0
extra data[4]: 0x0000000000000010
RAX=0000000080000000 RBX=0000000000000000 RCX=0000000000000000 RDX=0000000000000001
RSI=0000000000000000 RDI=0000000390074000 RBP=ffffac1c4067bca0 RSP=ffffac1c4067bc88
R8 =0000000000000000 R9 =0000000000000001 R10=0000000000000000 R11=0000000000000000
R12=0000000000000000 R13=ffff99eed0074000 R14=0000000000000000 R15=0000000000000000
RIP=ffffffff812b3d62 RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 ffffffff 00c00000
CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0000 0000000000000000 ffffffff 00c00000
FS =0000 0000000000000000 ffffffff 00c00000
GS =0000 ffff99ec3bc00000 ffffffff 00c00000
LDT=0000 0000000000000000 ffffffff 00c00000
TR =0040 fffffe1d13135000 00004087 00008b00 DPL=0 TSS64-busy
GDT=     fffffe1d13133000 0000007f
IDT=     fffffe0000000000 00000fff
CR0=80050033 CR2=ffffffff812b3d62 CR3=0000000390074000 CR4=000006f0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000fffe0ff0 DR7=0000000000000400
EFER=0000000000000d01
Code=5d 9c 01 0f b7 db 48 09 df 48 0f ba ef 3f 0f 22 df 0f 1f 00 <5b> 41 5c 41 5d 5d c3 cc cc cc cc 48 c7 c0 00 00 00 80 48 2b 05 cd 0d 76 01



I also tested lowering the ram sizes (2GB ram, 1GB "CXL") to see if
there's something going on with the PCI hole or something, but no, same
results.

Double checked if there was an issue using a single root port so i
registered a second one - same results.


In prior tests i accessed the memory directly via devmem2

This still works when mapping the memory manually

[root@fedora map] ./map_memory.sh
echo ram > /sys/bus/cxl/devices/decoder2.0/mode
echo 0x40000000 > /sys/bus/cxl/devices/decoder2.0/dpa_size
echo region0 > /sys/bus/cxl/devices/decoder0.0/create_ram_region
echo 4096 > /sys/bus/cxl/devices/region0/interleave_granularity
echo 1 > /sys/bus/cxl/devices/region0/interleave_ways
echo 0x40000000 > /sys/bus/cxl/devices/region0/size
echo decoder2.0 > /sys/bus/cxl/devices/region0/target0
echo 1 > /sys/bus/cxl/devices/region0/commit


[root@fedora devmem]# ./devmem2 0x290000000 w 0x12345678
/dev/mem opened.
Memory mapped at address 0x7fb4d4ed3000.
Value at address 0x290000000 (0x7fb4d4ed3000): 0x0
Written 0x12345678; readback 0x12345678



This kind of implies there's a disagreement about the state of memory
between linux and qemu.


but even just onlining a region produces memory usage:

[root@fedora ~]# cat /sys/bus/node/devices/node1/meminfo
Node 1 MemTotal:        1048576 kB
Node 1 MemFree:         1048112 kB
Node 1 MemUsed:             464 kB


Which I would expect to set off some fireworks.

Maybe an issue at the NUMA level? I just... i have no idea.


I will need to dig through the email chains to figure out what others
have been doing that i'm missing.  Everything *looks* nominal, but the
reactors are exploding so... ¯\_(ツ)_/¯

I'm not sure where to start here, but i'll bash my face on the keyboard
for a bit until i have some ideas.


~Gregory

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default
  2023-02-14 18:27 ` Gregory Price
@ 2023-02-14 18:39   ` Dan Williams
  2023-02-14 19:01     ` Gregory Price
  0 siblings, 1 reply; 112+ messages in thread
From: Dan Williams @ 2023-02-14 18:39 UTC (permalink / raw)
  To: Gregory Price, Dan Williams
  Cc: linux-cxl, David Hildenbrand, Kees Cook, stable, Dave Hansen,
	Michal Hocko, linux-mm, linux-acpi

Gregory Price wrote:
> On Sun, Feb 05, 2023 at 05:02:29PM -0800, Dan Williams wrote:
> > Summary:
> > --------
> > 
> > CXL RAM support allows for the dynamic provisioning of new CXL RAM
> > regions, and more routinely, assembling a region from an existing
> > configuration established by platform-firmware. The latter is motivated
> > by CXL memory RAS (Reliability, Availability and Serviceability)
> > support, that requires associating device events with System Physical
> > Address ranges and vice versa.
> > 
> 
> Ok, I simplified down my tests and reverted a bunch of stuff, figured i
> should report this before I dive further in.
> 
> Earlier i was carrying the DOE patches and others, I've dropped most of
> that to make sure i could replicate on the base kernel and qemu images
> 
> QEMU branch: 
> https://gitlab.com/jic23/qemu/-/tree/cxl-2023-01-26
> this is a little out of date at this point i think? but it shouldn't
> matter, the results are the same regardless of what else i pull in.
> 
> Kernel branch:
> https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.3/cxl-ram-region

Note that I acted on this feedback from Greg to break out a fix and
merge it for v6.2-final

http://lore.kernel.org/r/Y+CSOeHVLKudN0A6@kroah.com

...i.e. you are missing at least the passthrough decoder fix, but that
would show up as a region creation failure not a QEMU crash.

So I would move to testing cxl/next.

[..]
> Lets attempt to use the memory
> [root@fedora ~]# numactl --membind=1 python
> KVM internal error. Suberror: 3
> extra data[0]: 0x0000000080000b0e
> extra data[1]: 0x0000000000000031
> extra data[2]: 0x0000000000000d81
> extra data[3]: 0x0000000390074ac0
> extra data[4]: 0x0000000000000010
> RAX=0000000080000000 RBX=0000000000000000 RCX=0000000000000000 RDX=0000000000000001
> RSI=0000000000000000 RDI=0000000390074000 RBP=ffffac1c4067bca0 RSP=ffffac1c4067bc88
> R8 =0000000000000000 R9 =0000000000000001 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=ffff99eed0074000 R14=0000000000000000 R15=0000000000000000
> RIP=ffffffff812b3d62 RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0000 0000000000000000 ffffffff 00c00000
> CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0000 0000000000000000 ffffffff 00c00000
> FS =0000 0000000000000000 ffffffff 00c00000
> GS =0000 ffff99ec3bc00000 ffffffff 00c00000
> LDT=0000 0000000000000000 ffffffff 00c00000
> TR =0040 fffffe1d13135000 00004087 00008b00 DPL=0 TSS64-busy
> GDT=     fffffe1d13133000 0000007f
> IDT=     fffffe0000000000 00000fff
> CR0=80050033 CR2=ffffffff812b3d62 CR3=0000000390074000 CR4=000006f0
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000fffe0ff0 DR7=0000000000000400
> EFER=0000000000000d01
> Code=5d 9c 01 0f b7 db 48 09 df 48 0f ba ef 3f 0f 22 df 0f 1f 00 <5b> 41 5c 41 5d 5d c3 cc cc cc cc 48 c7 c0 00 00 00 80 48 2b 05 cd 0d 76 01
> 

At first glance that looks like a QEMU issue, but I would capture a cxl
list -vvv before attempting to use the memory just to verify the decoder
setup looks sane.

> 
> 
> I also tested lowering the ram sizes (2GB ram, 1GB "CXL") to see if
> there's something going on with the PCI hole or something, but no, same
> results.
> 
> Double checked if there was an issue using a single root port so i
> registered a second one - same results.
> 
> 
> In prior tests i accessed the memory directly via devmem2
> 
> This still works when mapping the memory manually
> 
> [root@fedora map] ./map_memory.sh
> echo ram > /sys/bus/cxl/devices/decoder2.0/mode
> echo 0x40000000 > /sys/bus/cxl/devices/decoder2.0/dpa_size
> echo region0 > /sys/bus/cxl/devices/decoder0.0/create_ram_region
> echo 4096 > /sys/bus/cxl/devices/region0/interleave_granularity
> echo 1 > /sys/bus/cxl/devices/region0/interleave_ways
> echo 0x40000000 > /sys/bus/cxl/devices/region0/size
> echo decoder2.0 > /sys/bus/cxl/devices/region0/target0
> echo 1 > /sys/bus/cxl/devices/region0/commit
> 
> 
> [root@fedora devmem]# ./devmem2 0x290000000 w 0x12345678
> /dev/mem opened.
> Memory mapped at address 0x7fb4d4ed3000.
> Value at address 0x290000000 (0x7fb4d4ed3000): 0x0
> Written 0x12345678; readback 0x12345678

Likely it is sensitive to crossing an interleave threshold.

> This kind of implies there's a disagreement about the state of memory
> between linux and qemu.
> 
> 
> but even just onlining a region produces memory usage:
> 
> [root@fedora ~]# cat /sys/bus/node/devices/node1/meminfo
> Node 1 MemTotal:        1048576 kB
> Node 1 MemFree:         1048112 kB
> Node 1 MemUsed:             464 kB
> 
> 
> Which I would expect to set off some fireworks.
> 
> Maybe an issue at the NUMA level? I just... i have no idea.
> 
> 
> I will need to dig through the email chains to figure out what others
> have been doing that i'm missing.  Everything *looks* nominal, but the
> reactors are exploding so... ¯\_(ツ)_/¯
> 
> I'm not sure where to start here, but i'll bash my face on the keyboard
> for a bit until i have some ideas.

Not ruling out the driver yet, but Fan's tests with hardware has me
leaning more towards QEMU.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default
  2023-02-13 12:13 ` David Hildenbrand
@ 2023-02-14 18:45   ` Dan Williams
  0 siblings, 0 replies; 112+ messages in thread
From: Dan Williams @ 2023-02-14 18:45 UTC (permalink / raw)
  To: David Hildenbrand, Dan Williams, linux-cxl
  Cc: Kees Cook, stable, Dave Hansen, Michal Hocko, linux-mm, linux-acpi

David Hildenbrand wrote:
[..]
> My concern would be that in setups with a lot of CXL memory 
> (soft-reserved), having that much offline memory during boot might make 
> the kernel run out of memory. After all, offline memory consumes memory 
> for the memmap.
> 
> Is the assumption that something like that cannot happen because we'll 
> never ever have that much soft-reserved memory? :)

Right, the assumption is that platform firmware is going to pick a
reasonable general purpose pool that lets an OS boot. In the near term
it is difficult to get that wrong since it will encompass all locally
attached DDR-DRAM. In the longer term, where it becomes possible that
all memory is CXL attached, then platform firmware will need to mark
some or all of CXL memory as general purpose (not Soft Reserved).

> Note that this is a concern only applies when not using auto-onlining in 
> the kernel during boot, which (IMHO) is or will be the default in the 
> future.

Yes, I expect that to be the default as well and let those small few
that have custom requirements figure out how to override that default.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default
  2023-02-14 18:39   ` Dan Williams
@ 2023-02-14 19:01     ` Gregory Price
  2023-02-14 21:18       ` Jonathan Cameron
  0 siblings, 1 reply; 112+ messages in thread
From: Gregory Price @ 2023-02-14 19:01 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, David Hildenbrand, Kees Cook, stable, Dave Hansen,
	Michal Hocko, linux-mm, linux-acpi

On Tue, Feb 14, 2023 at 10:39:42AM -0800, Dan Williams wrote:
> Gregory Price wrote:
> > On Sun, Feb 05, 2023 at 05:02:29PM -0800, Dan Williams wrote:
> > > Summary:
> > > --------
> > > 
> > > CXL RAM support allows for the dynamic provisioning of new CXL RAM
> > > regions, and more routinely, assembling a region from an existing
> > > configuration established by platform-firmware. The latter is motivated
> > > by CXL memory RAS (Reliability, Availability and Serviceability)
> > > support, that requires associating device events with System Physical
> > > Address ranges and vice versa.
> > > 
> > 
> > Ok, I simplified down my tests and reverted a bunch of stuff, figured i
> > should report this before I dive further in.
> > 
> > Earlier i was carrying the DOE patches and others, I've dropped most of
> > that to make sure i could replicate on the base kernel and qemu images
> > 
> > QEMU branch: 
> > https://gitlab.com/jic23/qemu/-/tree/cxl-2023-01-26
> > this is a little out of date at this point i think? but it shouldn't
> > matter, the results are the same regardless of what else i pull in.
> > 
> > Kernel branch:
> > https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.3/cxl-ram-region
> 
> Note that I acted on this feedback from Greg to break out a fix and
> merge it for v6.2-final
> 
> http://lore.kernel.org/r/Y+CSOeHVLKudN0A6@kroah.com
> 
> ...i.e. you are missing at least the passthrough decoder fix, but that
> would show up as a region creation failure not a QEMU crash.
> 
> So I would move to testing cxl/next.
> 

I just noticed this, already spinning a new kernel.  Will report back

> Not ruling out the driver yet, but Fan's tests with hardware has me
> leaning more towards QEMU.

Same, not much has changed and I haven't tested with hardware yet. Was
planning to install it on our local boxes sometime later this week.

Was just so close to setting up a virtual memory pool in the lab, was
getting antsy :]

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default
  2023-02-14 19:01     ` Gregory Price
@ 2023-02-14 21:18       ` Jonathan Cameron
  2023-02-14 21:51         ` Gregory Price
  0 siblings, 1 reply; 112+ messages in thread
From: Jonathan Cameron @ 2023-02-14 21:18 UTC (permalink / raw)
  To: Gregory Price
  Cc: Dan Williams, linux-cxl, David Hildenbrand, Kees Cook, stable,
	Dave Hansen, Michal Hocko, linux-mm, linux-acpi

On Tue, 14 Feb 2023 14:01:23 -0500
Gregory Price <gregory.price@memverge.com> wrote:

> On Tue, Feb 14, 2023 at 10:39:42AM -0800, Dan Williams wrote:
> > Gregory Price wrote:  
> > > On Sun, Feb 05, 2023 at 05:02:29PM -0800, Dan Williams wrote:  
> > > > Summary:
> > > > --------
> > > > 
> > > > CXL RAM support allows for the dynamic provisioning of new CXL RAM
> > > > regions, and more routinely, assembling a region from an existing
> > > > configuration established by platform-firmware. The latter is motivated
> > > > by CXL memory RAS (Reliability, Availability and Serviceability)
> > > > support, that requires associating device events with System Physical
> > > > Address ranges and vice versa.
> > > >   
> > > 
> > > Ok, I simplified down my tests and reverted a bunch of stuff, figured i
> > > should report this before I dive further in.
> > > 
> > > Earlier i was carrying the DOE patches and others, I've dropped most of
> > > that to make sure i could replicate on the base kernel and qemu images
> > > 
> > > QEMU branch: 
> > > https://gitlab.com/jic23/qemu/-/tree/cxl-2023-01-26
> > > this is a little out of date at this point i think? but it shouldn't
> > > matter, the results are the same regardless of what else i pull in.
> > > 
> > > Kernel branch:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.3/cxl-ram-region  
> > 
> > Note that I acted on this feedback from Greg to break out a fix and
> > merge it for v6.2-final
> > 
> > http://lore.kernel.org/r/Y+CSOeHVLKudN0A6@kroah.com
> > 
> > ...i.e. you are missing at least the passthrough decoder fix, but that
> > would show up as a region creation failure not a QEMU crash.
> > 
> > So I would move to testing cxl/next.
> >   
> 
> I just noticed this, already spinning a new kernel.  Will report back
> 
> > Not ruling out the driver yet, but Fan's tests with hardware has me
> > leaning more towards QEMU.  
> 
> Same, not much has changed and I haven't tested with hardware yet. Was
> planning to install it on our local boxes sometime later this week.
> 
> Was just so close to setting up a virtual memory pool in the lab, was
> getting antsy :]

Could you test it with TCG (just drop --enable-kvm)?  We have a known
limitation with x86 instructions running out of CXL emulated memory
(side effect of emulating the interleave).  You'll need a fix even on TCG
for the corner case of an instruction bridging from normal ram to cxl memory.
https://lore.kernel.org/qemu-devel/20230206193809.1153124-1-richard.henderson@linaro.org/

Performance will be bad, but so far this is only way we can do it correctly.

Jonathan


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default
  2023-02-14 21:18       ` Jonathan Cameron
@ 2023-02-14 21:51         ` Gregory Price
  2023-02-14 21:54           ` Gregory Price
  0 siblings, 1 reply; 112+ messages in thread
From: Gregory Price @ 2023-02-14 21:51 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Dan Williams, linux-cxl, David Hildenbrand, Kees Cook, stable,
	Dave Hansen, Michal Hocko, linux-mm, linux-acpi

On Tue, Feb 14, 2023 at 09:18:24PM +0000, Jonathan Cameron wrote:
> On Tue, 14 Feb 2023 14:01:23 -0500
> Gregory Price <gregory.price@memverge.com> wrote:
> 
> > On Tue, Feb 14, 2023 at 10:39:42AM -0800, Dan Williams wrote:
> > > Gregory Price wrote:  
> > > > On Sun, Feb 05, 2023 at 05:02:29PM -0800, Dan Williams wrote:  
> > > > > Summary:
> > > > > --------
> > > > > 
> > > > > CXL RAM support allows for the dynamic provisioning of new CXL RAM
> > > > > regions, and more routinely, assembling a region from an existing
> > > > > configuration established by platform-firmware. The latter is motivated
> > > > > by CXL memory RAS (Reliability, Availability and Serviceability)
> > > > > support, that requires associating device events with System Physical
> > > > > Address ranges and vice versa.
> > > > >   
> > > > 
> > > > Ok, I simplified down my tests and reverted a bunch of stuff, figured i
> > > > should report this before I dive further in.
> > > > 
> > > > Earlier i was carrying the DOE patches and others, I've dropped most of
> > > > that to make sure i could replicate on the base kernel and qemu images
> > > > 
> > > > QEMU branch: 
> > > > https://gitlab.com/jic23/qemu/-/tree/cxl-2023-01-26
> > > > this is a little out of date at this point i think? but it shouldn't
> > > > matter, the results are the same regardless of what else i pull in.
> > > > 
> > > > Kernel branch:
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.3/cxl-ram-region  
> > > 
> > > Note that I acted on this feedback from Greg to break out a fix and
> > > merge it for v6.2-final
> > > 
> > > http://lore.kernel.org/r/Y+CSOeHVLKudN0A6@kroah.com
> > > 
> > > ...i.e. you are missing at least the passthrough decoder fix, but that
> > > would show up as a region creation failure not a QEMU crash.
> > > 
> > > So I would move to testing cxl/next.
> > >   
> > 
> > I just noticed this, already spinning a new kernel.  Will report back
> > 
> > > Not ruling out the driver yet, but Fan's tests with hardware has me
> > > leaning more towards QEMU.  
> > 
> > Same, not much has changed and I haven't tested with hardware yet. Was
> > planning to install it on our local boxes sometime later this week.
> > 
> > Was just so close to setting up a virtual memory pool in the lab, was
> > getting antsy :]
> 
> Could you test it with TCG (just drop --enable-kvm)?  We have a known
> limitation with x86 instructions running out of CXL emulated memory
> (side effect of emulating the interleave).  You'll need a fix even on TCG
> for the corner case of an instruction bridging from normal ram to cxl memory.
> https://lore.kernel.org/qemu-devel/20230206193809.1153124-1-richard.henderson@linaro.org/
> 
> Performance will be bad, but so far this is only way we can do it correctly.
> 
> Jonathan
> 

Siiiggghh... i had this patch and dropped --enable-kvm, but forgot to
drop "accel=kvm" from the -machine line

This was the issue.

And let me tell you, if you numactl --membind=1 python, it is
IMPRESSIVELY slow.  I wonder if it's even hitting a few 100k
instructions a second.


This appears to be the issue.  When I get a bit more time, try to dive
into the deep dark depths of qemu memory regions to see how difficult
a non-mmio fork might be, unless someone else is already looking at it.

~Gregory

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default
  2023-02-14 21:51         ` Gregory Price
@ 2023-02-14 21:54           ` Gregory Price
  2023-02-15 10:03               ` Jonathan Cameron via
  0 siblings, 1 reply; 112+ messages in thread
From: Gregory Price @ 2023-02-14 21:54 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Dan Williams, linux-cxl, David Hildenbrand, Kees Cook, stable,
	Dave Hansen, Michal Hocko, linux-mm, linux-acpi

On Tue, Feb 14, 2023 at 04:51:53PM -0500, Gregory Price wrote:
> On Tue, Feb 14, 2023 at 09:18:24PM +0000, Jonathan Cameron wrote:
> > On Tue, 14 Feb 2023 14:01:23 -0500
> > Gregory Price <gregory.price@memverge.com> wrote:
> > 
> > Could you test it with TCG (just drop --enable-kvm)?  We have a known
> > limitation with x86 instructions running out of CXL emulated memory
> > (side effect of emulating the interleave).  You'll need a fix even on TCG
> > for the corner case of an instruction bridging from normal ram to cxl memory.
> > https://lore.kernel.org/qemu-devel/20230206193809.1153124-1-richard.henderson@linaro.org/
> > 
> > Performance will be bad, but so far this is only way we can do it correctly.
> > 
> > Jonathan
> > 
> 
> Siiiggghh... i had this patch and dropped --enable-kvm, but forgot to
> drop "accel=kvm" from the -machine line
> 
> This was the issue.
> 
> And let me tell you, if you numactl --membind=1 python, it is
> IMPRESSIVELY slow.  I wonder if it's even hitting a few 100k
> instructions a second.
> 
> 
> This appears to be the issue.  When I get a bit more time, try to dive
> into the deep dark depths of qemu memory regions to see how difficult
> a non-mmio fork might be, unless someone else is already looking at it.
> 
> ~Gregory

Just clarifying one thing:  Even with the patch, KVM blows up.
Disabling KVM fixes this entirely.  I haven't tested without KVM but
with the patch, i will do that now.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default
  2023-02-14 21:54           ` Gregory Price
@ 2023-02-15 10:03               ` Jonathan Cameron via
  0 siblings, 0 replies; 112+ messages in thread
From: Jonathan Cameron @ 2023-02-15 10:03 UTC (permalink / raw)
  To: Gregory Price
  Cc: Dan Williams, linux-cxl, David Hildenbrand, Kees Cook, stable,
	Dave Hansen, Michal Hocko, linux-mm, linux-acpi, qemu-devel

On Tue, 14 Feb 2023 16:54:02 -0500
Gregory Price <gregory.price@memverge.com> wrote:

> On Tue, Feb 14, 2023 at 04:51:53PM -0500, Gregory Price wrote:
> > On Tue, Feb 14, 2023 at 09:18:24PM +0000, Jonathan Cameron wrote:  
> > > On Tue, 14 Feb 2023 14:01:23 -0500
> > > Gregory Price <gregory.price@memverge.com> wrote:
> > > 
> > > Could you test it with TCG (just drop --enable-kvm)?  We have a known
> > > limitation with x86 instructions running out of CXL emulated memory
> > > (side effect of emulating the interleave).  You'll need a fix even on TCG
> > > for the corner case of an instruction bridging from normal ram to cxl memory.
> > > https://lore.kernel.org/qemu-devel/20230206193809.1153124-1-richard.henderson@linaro.org/
> > > 
> > > Performance will be bad, but so far this is only way we can do it correctly.
> > > 
> > > Jonathan
> > >   
> > 
> > Siiiggghh... i had this patch and dropped --enable-kvm, but forgot to
> > drop "accel=kvm" from the -machine line
> > 
> > This was the issue.
> > 
> > And let me tell you, if you numactl --membind=1 python, it is
> > IMPRESSIVELY slow.  I wonder if it's even hitting a few 100k
> > instructions a second.
> > 
> > 
> > This appears to be the issue.  When I get a bit more time, try to dive
> > into the deep dark depths of qemu memory regions to see how difficult
> > a non-mmio fork might be, unless someone else is already looking at it.
> > 
> > ~Gregory  
> 
> Just clarifying one thing:  Even with the patch, KVM blows up.
> Disabling KVM fixes this entirely.  I haven't tested without KVM but
> with the patch, i will do that now.

yup.  The patch only fixes TCG so that's expected behavior.

Fingers crossed on this 'working'.

I'm open to suggestions on how to work around the problem with KVM
or indeed allow TCG to cache the instructions (right not it has
to fetch and emulate each instruction on it's own).

I can envision how we might do it for KVM with userspace page fault handling
used to get a fault up to QEMU which can then stitch in a cache
of the underlying memory as a stage 2 translation to the page (a little
bit like how post migration copy works) though I've not prototyped
anything...

I think it would be complex code that would be little used
so we may just have to cope with the emulation being slow.

Intent is very much to be able to test the kernel code etc, not
test it quickly :)

Jonathan

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default
@ 2023-02-15 10:03               ` Jonathan Cameron via
  0 siblings, 0 replies; 112+ messages in thread
From: Jonathan Cameron via @ 2023-02-15 10:03 UTC (permalink / raw)
  To: Gregory Price
  Cc: Dan Williams, linux-cxl, David Hildenbrand, Kees Cook, stable,
	Dave Hansen, Michal Hocko, linux-mm, linux-acpi, qemu-devel

On Tue, 14 Feb 2023 16:54:02 -0500
Gregory Price <gregory.price@memverge.com> wrote:

> On Tue, Feb 14, 2023 at 04:51:53PM -0500, Gregory Price wrote:
> > On Tue, Feb 14, 2023 at 09:18:24PM +0000, Jonathan Cameron wrote:  
> > > On Tue, 14 Feb 2023 14:01:23 -0500
> > > Gregory Price <gregory.price@memverge.com> wrote:
> > > 
> > > Could you test it with TCG (just drop --enable-kvm)?  We have a known
> > > limitation with x86 instructions running out of CXL emulated memory
> > > (side effect of emulating the interleave).  You'll need a fix even on TCG
> > > for the corner case of an instruction bridging from normal ram to cxl memory.
> > > https://lore.kernel.org/qemu-devel/20230206193809.1153124-1-richard.henderson@linaro.org/
> > > 
> > > Performance will be bad, but so far this is only way we can do it correctly.
> > > 
> > > Jonathan
> > >   
> > 
> > Siiiggghh... i had this patch and dropped --enable-kvm, but forgot to
> > drop "accel=kvm" from the -machine line
> > 
> > This was the issue.
> > 
> > And let me tell you, if you numactl --membind=1 python, it is
> > IMPRESSIVELY slow.  I wonder if it's even hitting a few 100k
> > instructions a second.
> > 
> > 
> > This appears to be the issue.  When I get a bit more time, try to dive
> > into the deep dark depths of qemu memory regions to see how difficult
> > a non-mmio fork might be, unless someone else is already looking at it.
> > 
> > ~Gregory  
> 
> Just clarifying one thing:  Even with the patch, KVM blows up.
> Disabling KVM fixes this entirely.  I haven't tested without KVM but
> with the patch, i will do that now.

yup.  The patch only fixes TCG so that's expected behavior.

Fingers crossed on this 'working'.

I'm open to suggestions on how to work around the problem with KVM
or indeed allow TCG to cache the instructions (right not it has
to fetch and emulate each instruction on it's own).

I can envision how we might do it for KVM with userspace page fault handling
used to get a fault up to QEMU which can then stitch in a cache
of the underlying memory as a stage 2 translation to the page (a little
bit like how post migration copy works) though I've not prototyped
anything...

I think it would be complex code that would be little used
so we may just have to cope with the emulation being slow.

Intent is very much to be able to test the kernel code etc, not
test it quickly :)

Jonathan


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default
  2023-02-15 10:03               ` Jonathan Cameron via
  (?)
@ 2023-02-18  9:47               ` Gregory Price
  -1 siblings, 0 replies; 112+ messages in thread
From: Gregory Price @ 2023-02-18  9:47 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Dan Williams, linux-cxl, David Hildenbrand, Kees Cook, stable,
	Dave Hansen, Michal Hocko, linux-mm, linux-acpi, qemu-devel

On Wed, Feb 15, 2023 at 10:03:27AM +0000, Jonathan Cameron wrote:
> On Tue, 14 Feb 2023 16:54:02 -0500
> Gregory Price <gregory.price@memverge.com> wrote:
> 
> > Just clarifying one thing:  Even with the patch, KVM blows up.
> > Disabling KVM fixes this entirely.  I haven't tested without KVM but
> > with the patch, i will do that now.
> 
> yup.  The patch only fixes TCG so that's expected behavior.
> 
> Fingers crossed on this 'working'.
> 
> I'm open to suggestions on how to work around the problem with KVM
> or indeed allow TCG to cache the instructions (right not it has
> to fetch and emulate each instruction on it's own).
> 
> I can envision how we might do it for KVM with userspace page fault handling
> used to get a fault up to QEMU which can then stitch in a cache
> of the underlying memory as a stage 2 translation to the page (a little
> bit like how post migration copy works) though I've not prototyped
> anything...
> 

Just following up.  With the patch applied and KVM turned off, no crash.
I've been working with this for a while.

We should move the instruction alignment issue into a separate
discussion thread.

^ permalink raw reply	[flat|nested] 112+ messages in thread

end of thread, other threads:[~2023-02-21 15:11 UTC | newest]

Thread overview: 112+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-06  1:02 [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default Dan Williams
2023-02-06  1:02 ` [PATCH 01/18] cxl/Documentation: Update references to attributes added in v6.0 Dan Williams
2023-02-06 15:17   ` Jonathan Cameron
2023-02-06 16:37   ` Gregory Price
2023-02-06 17:27   ` [PATCH 1/18] " Davidlohr Bueso
2023-02-06 19:15   ` [PATCH 01/18] " Ira Weiny
2023-02-06 21:04   ` Dave Jiang
2023-02-09  0:20   ` Verma, Vishal L
2023-02-06  1:02 ` [PATCH 02/18] cxl/region: Add a mode attribute for regions Dan Williams
2023-02-06 15:46   ` Jonathan Cameron
2023-02-06 17:47     ` Dan Williams
2023-02-06 16:39   ` Gregory Price
2023-02-06 19:16   ` Ira Weiny
2023-02-06 21:05   ` Dave Jiang
2023-02-09  0:22   ` Verma, Vishal L
2023-02-06  1:02 ` [PATCH 03/18] cxl/region: Support empty uuids for non-pmem regions Dan Williams
2023-02-06 15:54   ` Jonathan Cameron
2023-02-06 18:07     ` Dan Williams
2023-02-06 19:22   ` Ira Weiny
2023-02-06 19:35     ` Dan Williams
2023-02-09  0:24   ` Verma, Vishal L
2023-02-06  1:02 ` [PATCH 04/18] cxl/region: Validate region mode vs decoder mode Dan Williams
2023-02-06 16:02   ` Jonathan Cameron
2023-02-06 18:14     ` Dan Williams
2023-02-06 16:44   ` Gregory Price
2023-02-06 21:51     ` Dan Williams
2023-02-06 19:55       ` Gregory Price
2023-02-06 19:23   ` Ira Weiny
2023-02-06 22:16   ` Dave Jiang
2023-02-09  0:25   ` Verma, Vishal L
2023-02-06  1:02 ` [PATCH 05/18] cxl/region: Add volatile region creation support Dan Williams
2023-02-06 16:18   ` Jonathan Cameron
2023-02-06 18:19     ` Dan Williams
2023-02-06 16:55   ` Gregory Price
2023-02-06 21:57     ` Dan Williams
2023-02-06 19:56       ` Gregory Price
2023-02-06 19:25   ` Ira Weiny
2023-02-06 22:31   ` Dave Jiang
2023-02-06 22:37     ` Dan Williams
2023-02-09  1:02   ` Verma, Vishal L
2023-02-06  1:03 ` [PATCH 06/18] cxl/region: Refactor attach_target() for autodiscovery Dan Williams
2023-02-06 17:06   ` Jonathan Cameron
2023-02-06 18:48     ` Dan Williams
2023-02-06 19:26   ` Ira Weiny
2023-02-06 22:41   ` Dave Jiang
2023-02-09  1:09   ` Verma, Vishal L
2023-02-06  1:03 ` [PATCH 07/18] cxl/region: Move region-position validation to a helper Dan Williams
2023-02-06 17:44   ` Ira Weiny
2023-02-06 19:15     ` Dan Williams
2023-02-08 12:30   ` Jonathan Cameron
2023-02-09  4:09     ` Dan Williams
2023-02-09  4:26     ` Dan Williams
2023-02-09 11:07       ` Jonathan Cameron
2023-02-09 20:52         ` Dan Williams
2023-02-09 19:45   ` Verma, Vishal L
2023-02-06  1:03 ` [PATCH 08/18] kernel/range: Uplevel the cxl subsystem's range_contains() helper Dan Williams
2023-02-06 17:02   ` Gregory Price
2023-02-06 22:01     ` Dan Williams
2023-02-06 19:28   ` Ira Weiny
2023-02-06 23:41   ` Dave Jiang
2023-02-08 12:32   ` Jonathan Cameron
2023-02-09 19:47   ` Verma, Vishal L
2023-02-06  1:03 ` [PATCH 09/18] cxl/region: Enable CONFIG_CXL_REGION to be toggled Dan Williams
2023-02-06 17:03   ` Gregory Price
2023-02-06 23:57   ` Dave Jiang
2023-02-08 12:36   ` Jonathan Cameron
2023-02-09 20:17   ` Verma, Vishal L
2023-02-06  1:03 ` [PATCH 10/18] cxl/region: Fix passthrough-decoder detection Dan Williams
2023-02-06  5:38   ` Greg KH
2023-02-06 17:22     ` Dan Williams
2023-02-07  0:00   ` Dave Jiang
2023-02-08 12:44   ` Jonathan Cameron
2023-02-09 20:28   ` Verma, Vishal L
2023-02-06  1:03 ` [PATCH 11/18] cxl/region: Add region autodiscovery Dan Williams
2023-02-06 19:02   ` Ira Weiny
2023-02-07 23:54   ` Dave Jiang
2023-02-08 17:07   ` Jonathan Cameron
2023-02-09  4:07     ` Dan Williams
2023-02-06  1:03 ` [PATCH 12/18] tools/testing/cxl: Define a fixed volatile configuration to parse Dan Williams
2023-02-08 17:31   ` Jonathan Cameron
2023-02-09 20:50     ` Dan Williams
2023-02-06  1:03 ` [PATCH 13/18] dax/hmem: Move HMAT and Soft reservation probe initcall level Dan Williams
2023-02-06  1:03 ` [PATCH 14/18] dax/hmem: Drop unnecessary dax_hmem_remove() Dan Williams
2023-02-06 17:15   ` Gregory Price
2023-02-08 17:33   ` Jonathan Cameron
2023-02-06  1:03 ` [PATCH 15/18] dax/hmem: Convey the dax range via memregion_info() Dan Williams
2023-02-08 17:35   ` Jonathan Cameron
2023-02-06  1:03 ` [PATCH 16/18] dax/hmem: Move hmem device registration to dax_hmem.ko Dan Williams
2023-02-06  1:04 ` [PATCH 17/18] dax: Assign RAM regions to memory-hotplug by default Dan Williams
2023-02-06 17:26   ` Gregory Price
2023-02-06 22:15     ` Dan Williams
2023-02-06 19:05       ` Gregory Price
2023-02-06 23:20         ` Dan Williams
2023-02-06  1:04 ` [PATCH 18/18] cxl/dax: Create dax devices for CXL RAM regions Dan Williams
2023-02-06  5:36 ` [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default Gregory Price
2023-02-06 16:40   ` Davidlohr Bueso
2023-02-06 18:23     ` Dan Williams
2023-02-06 17:29   ` Dan Williams
2023-02-06 17:18     ` Davidlohr Bueso
     [not found] ` <CGME20230208173730uscas1p2af3a9eeb8946dfa607b190c079a49653@uscas1p2.samsung.com>
2023-02-08 17:37   ` Fan Ni
2023-02-09  4:56     ` Dan Williams
2023-02-13 12:13 ` David Hildenbrand
2023-02-14 18:45   ` Dan Williams
2023-02-14 18:27 ` Gregory Price
2023-02-14 18:39   ` Dan Williams
2023-02-14 19:01     ` Gregory Price
2023-02-14 21:18       ` Jonathan Cameron
2023-02-14 21:51         ` Gregory Price
2023-02-14 21:54           ` Gregory Price
2023-02-15 10:03             ` Jonathan Cameron
2023-02-15 10:03               ` Jonathan Cameron via
2023-02-18  9:47               ` Gregory Price

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.