linux-pci.vger.kernel.org archive mirror
* [PATCH 00/46] CXL PMEM Region Provisioning
@ 2022-06-24  2:45 Dan Williams
  2022-06-24  2:45 ` [PATCH 01/46] tools/testing/cxl: Fix cxl_hdm_decode_init() calling convention Dan Williams
                   ` (48 more replies)
  0 siblings, 49 replies; 157+ messages in thread
From: Dan Williams @ 2022-06-24  2:45 UTC (permalink / raw)
  To: linux-cxl
  Cc: Ira Weiny, Christoph Hellwig, Jason Gunthorpe, Ben Widawsky,
	Alison Schofield, Matthew Wilcox, nvdimm, linux-pci, patches

tl;dr: 46 patches is way too many patches to review in one sitting. Jump
to the PATCH SUMMARY below to find a subset of interest to jump into.

The series is also posted on the 'preview' branch [1]. Note that the
branch rebases; the tip of that branch at the time of posting is:

7e5ad5cb1580 cxl/region: Introduce cxl_pmem_region objects

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=preview

---

Until the CXL 2.0 definition arrived there was little reason for OS
drivers to care about CXL memory expanders. Similar to DDR they just
implemented a physical address range that was described to the OS by
platform firmware (EFI Memory Map + ACPI SRAT/SLIT/HMAT etc). The CXL
2.0 definition adds support for PMEM, hotplug, switch topologies, and
device-interleaving which exceeds the limits of what can be reasonably
abstracted by EFI + ACPI mechanisms. As a result, Linux needs a native
capability to provision new CXL regions.

The term "region" is the same term that originated in the LIBNVDIMM
implementation to describe a host physical / system physical address
range. For PMEM a region is a persistent memory range that can be
further sub-divided into namespaces. For CXL there are three
classifications of regions:
- PMEM: set up by CXL native tooling and persisted in CXL region labels

- RAM: set up dynamically by CXL native tooling after hotplug events, or
  leftover capacity not mapped by platform firmware. Any persistent
  configuration would come from setup scripts / configuration files in
  userspace.

- System RAM: set up by platform firmware and described by EFI + ACPI
  metadata, these regions are static.

For now, these patches implement just PMEM regions without region label
support. Note though that the infrastructure routines like
cxl_region_attach() and cxl_region_setup_targets() are building blocks
for region-label support, provisioning RAM regions, and enumerating
System RAM regions.

The general flow for provisioning a CXL region is to:
- Find a device or set of devices with available device-physical-address
  (DPA) capacity

- Find a platform CXL window that has free capacity to map a new region
  and that is able to target the devices in the previous step.

- Allocate DPA according to the CXL specification rules of sequential
  enabling of decoders by id and when a device hosts multiple decoders
  make sure that lower-id decoders map lower HPA and higher-id decoders
  map higher HPA.

- Assign endpoint decoders to a region and validate that the switching
  topology supports the requested configuration. Recall that
  interleaving is governed by modulo or xormap math that constrains which
  device can support which positions in a given region interleave.

- Program all the decoders on all endpoints and participating switches
  to bring the new address range online.
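
As a rough illustration of the modulo case mentioned in the flow above,
the following sketch computes which interleave position decodes a given
offset into a region. The helper name and its parameters are
hypothetical and do not come from the patches; xormap interleaving is
not covered.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not from the series: with modulo interleave,
 * consecutive granularity-sized chunks of a region rotate through the
 * targets in order. Returns the 0-based position that decodes @offset.
 */
static unsigned int sketch_interleave_position(uint64_t offset,
					       unsigned int ways,
					       unsigned int granularity)
{
	assert(ways != 0 && granularity != 0);
	return (unsigned int)((offset / granularity) % ways);
}
```

For a x4 region with 256-byte granularity, offset 0 lands on position 0,
offset 256 on position 1, and offset 1024 wraps back to position 0. A
device can only occupy a position that the switching topology routes to
it under this math.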

Once the range is online then existing drivers like LIBNVDIMM or
device-dax can manage the memory range as if the ACPI BIOS had conveyed
its parameters at boot.

This patch kit is the result of significant amounts of path finding work
[2] and long discussions with Ben. Thank you Ben for all that work!
Where the patches in this kit go in a different design direction than
the RFC, the authorship is changed and a Co-developed-by is added mainly
so I get blamed for the bad decisions and not Ben. The major updates
from that last posting are:

- all CXL resources are reflected in full in iomem_resource

- host-physical-address (HPA) range allocation moves to a
  devm_request_free_mem_region() derivative

- locking moves to two global rwsems, one for DPA / endpoint decoders
  and one for HPA / regions.

- the existing port scanning path is augmented to cache more topology
  information rather than recreate it at region creation time

[2]: https://lore.kernel.org/r/20220413183720.2444089-1-ben.widawsky@intel.com

PATCH SUMMARY

If you want to jump straight to the meat of the new infrastructure start
reading at patch 34.

- Patches 34 through 42 are the bulk of the new infrastructure that is
  needed to stand up a new region regardless of whether it is PMEM, or
  RAM.

- Patch 33 is a new core facility for allocating physical address space.
  It is a straightforward extension of devm_request_free_mem_region().

- Patch 9 uses insert_resource_expand_to_fit() to inform the new
  allocator mentioned above about which address ranges are busy / free.

- Patch 46 is the support that takes a CXL PMEM region and turns it into
  a LIBNVDIMM region. Patches 43-45 are just prep work for patch 46.

- Patches 16 - 20 are the infrastructure to manage DPA capacity, including
  enumerating the DPA that platform firmware may have already allocated
  to a System RAM region. They also enable DPA allocations to be manipulated
  separate from the case when the decoder is assigned to a given region.
  This separation of allocation and region assignment is necessary for
  enumerating regions from region labels where labels within and across
  devices may disagree. Userspace in that situation may need to jump in
  and sort out the allocation conflicts.

- Patches 21 - 24 are updates to cxl_test to put this new implementation
  through its paces with a x8 device region creation test. Recall that
  cxl_test is a way to ship canned CXL configurations in the kernel
  alongside new CXL subsystem code to supplement testing that can be done
  with real devices or QEMU emulation. Note cxl_test just implements
  device topology and ABI, it does not test the PCI-related aspects of
  the implementation.

- Patches 25 - 29 are enhancements to the port enumeration code to cache
  and improve the lookup of topology metadata that is relevant for
  region provisioning.

- Patches 30 - 32 are some straightforward pre-work for exporting
  decoder settings via sysfs.

- Patches 1 - 8, 10 - 15 are some miscellaneous fixes and refactorings
  that should be straightforward to review.

[PATCH 01/46] tools/testing/cxl: Fix cxl_hdm_decode_init() calling convention
[PATCH 02/46] cxl/port: Keep port->uport valid for the entire life of a port
[PATCH 03/46] cxl/hdm: Use local hdm variable
[PATCH 04/46] cxl/core: Rename ->decoder_range ->hpa_range
[PATCH 05/46] cxl/core: Drop ->platform_res attribute for root decoders
[PATCH 06/46] cxl/core: Drop is_cxl_decoder()
[PATCH 07/46] cxl: Introduce cxl_to_{ways,granularity}
[PATCH 08/46] cxl/core: Define a 'struct cxl_switch_decoder'
[PATCH 09/46] cxl/acpi: Track CXL resources in iomem_resource
[PATCH 10/46] cxl/core: Define a 'struct cxl_root_decoder' for tracking CXL window resources
[PATCH 11/46] cxl/core: Define a 'struct cxl_endpoint_decoder' for tracking DPA resources
[PATCH 12/46] cxl/mem: Convert partition-info to resources
[PATCH 13/46] cxl/hdm: Require all decoders to be enumerated
[PATCH 14/46] cxl/hdm: Enumerate allocated DPA
[PATCH 15/46] cxl/Documentation: List attribute permissions
[PATCH 16/46] cxl/hdm: Add 'mode' attribute to decoder objects
[PATCH 17/46] cxl/hdm: Track next decoder to allocate
[PATCH 18/46] cxl/hdm: Add support for allocating DPA to an endpoint decoder
[PATCH 19/46] cxl/debug: Move debugfs init to cxl_core_init()
[PATCH 20/46] cxl/mem: Add a debugfs version of 'iomem' for DPA, 'dpamem'
[PATCH 21/46] tools/testing/cxl: Move cxl_test resources to the top of memory
[PATCH 22/46] tools/testing/cxl: Expand CFMWS windows
[PATCH 23/46] tools/testing/cxl: Add partition support
[PATCH 24/46] tools/testing/cxl: Fix decoder default state
[PATCH 25/46] cxl/port: Record dport in endpoint references
[PATCH 26/46] cxl/port: Record parent dport when adding ports
[PATCH 27/46] cxl/port: Move 'cxl_ep' references to an xarray per port
[PATCH 28/46] cxl/port: Move dport tracking to an xarray
[PATCH 29/46] cxl/port: Cache CXL host bridge data
[PATCH 30/46] cxl/hdm: Add sysfs attributes for interleave ways + granularity
[PATCH 31/46] cxl/hdm: Initialize decoder type for memory expander devices
[PATCH 32/46] cxl/mem: Enumerate port targets before adding endpoints
[PATCH 33/46] resource: Introduce alloc_free_mem_region()
[PATCH 34/46] cxl/region: Add region creation support
[PATCH 35/46] cxl/region: Add a 'uuid' attribute
[PATCH 36/46] cxl/region: Add interleave ways attribute
[PATCH 37/46] cxl/region: Allocate host physical address (HPA) capacity to new regions
[PATCH 38/46] cxl/region: Enable the assignment of endpoint decoders to regions
[PATCH 39/46] cxl/acpi: Add a host-bridge index lookup mechanism
[PATCH 40/46] cxl/region: Attach endpoint decoders
[PATCH 41/46] cxl/region: Program target lists
[PATCH 42/46] cxl/hdm: Commit decoder state to hardware
[PATCH 43/46] cxl/region: Add region driver boiler plate
[PATCH 44/46] cxl/pmem: Delete unused nvdimm attribute
[PATCH 45/46] cxl/pmem: Fix offline_nvdimm_bus() to offline by bridge
[PATCH 46/46] cxl/region: Introduce cxl_pmem_region objects

---

 Documentation/ABI/testing/sysfs-bus-cxl         |  271 +++
 Documentation/driver-api/cxl/memory-devices.rst |   11 
 drivers/cxl/Kconfig                             |    8 
 drivers/cxl/acpi.c                              |  198 ++-
 drivers/cxl/core/Makefile                       |    1 
 drivers/cxl/core/core.h                         |   52 +
 drivers/cxl/core/hdm.c                          |  663 ++++++++
 drivers/cxl/core/mbox.c                         |   95 +
 drivers/cxl/core/memdev.c                       |    4 
 drivers/cxl/core/pci.c                          |    8 
 drivers/cxl/core/pmem.c                         |    4 
 drivers/cxl/core/port.c                         |  678 ++++++---
 drivers/cxl/core/region.c                       | 1797 +++++++++++++++++++++++
 drivers/cxl/cxl.h                               |  294 +++-
 drivers/cxl/cxlmem.h                            |   39 
 drivers/cxl/mem.c                               |   49 -
 drivers/cxl/pci.c                               |    2 
 drivers/cxl/pmem.c                              |  256 +++
 drivers/nvdimm/region_devs.c                    |   28 
 include/linux/ioport.h                          |    2 
 include/linux/libnvdimm.h                       |    5 
 kernel/resource.c                               |  181 ++
 mm/Kconfig                                      |    5 
 tools/testing/cxl/Kbuild                        |    1 
 tools/testing/cxl/test/cxl.c                    |  123 +-
 tools/testing/cxl/test/mem.c                    |   53 -
 tools/testing/cxl/test/mock.c                   |    8 
 27 files changed, 4300 insertions(+), 536 deletions(-)
 create mode 100644 drivers/cxl/core/region.c

base-commit: f50974eee5c4a5de1e4f1a3d873099f170df25f8


* [PATCH 01/46] tools/testing/cxl: Fix cxl_hdm_decode_init() calling convention
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
@ 2022-06-24  2:45 ` Dan Williams
  2022-06-28 10:37   ` Jonathan Cameron
       [not found]   ` <CGME20220629174147uscas1p211384ae262e099484440ef285be26c75@uscas1p2.samsung.com>
  2022-06-24  2:45 ` [PATCH 02/46] cxl/port: Keep port->uport valid for the entire life of a port Dan Williams
                   ` (47 subsequent siblings)
  48 siblings, 2 replies; 157+ messages in thread
From: Dan Williams @ 2022-06-24  2:45 UTC (permalink / raw)
  To: linux-cxl; +Cc: hch, alison.schofield, nvdimm, linux-pci, patches

This failing signature:

[    8.392669] cxl_bus_probe: cxl_port endpoint2: probe: 970997760
[    8.392670] cxl_port: probe of endpoint2 failed with error 970997760
[    8.392719] create_endpoint: cxl_mem mem0: add: endpoint2
[    8.392721] cxl_mem mem0: endpoint2 failed probe
[    8.392725] cxl_bus_probe: cxl_mem mem0: probe: -6

...shows cxl_hdm_decode_init() resulting in a return code ("970997760")
that looks like stack corruption. The problem goes away if
cxl_hdm_decode_init() is not mocked via __wrap_cxl_hdm_decode_init().

The corruption results from the mismatch that the calling convention for
cxl_hdm_decode_init() is:

int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm)

...and __wrap_cxl_hdm_decode_init() is:

bool __wrap_cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm)

...i.e. an int is expected but __wrap_cxl_hdm_decode_init() returns bool.

Fix the convention and clean up the organization to match
__wrap_cxl_await_media_ready() as the difference was a red herring that
distracted from finding the bug.
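
The shape of the corrected wrapper can be sketched in isolation. This is
a userspace stand-in only: the sketch_* types and functions are
hypothetical, and the -6 return value is an arbitrary placeholder for
whatever the real implementation would report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the kernel types; everything here is illustrative. */
struct sketch_dev {
	bool is_mock;
};

struct sketch_mock_ops {
	bool (*is_mock_dev)(const struct sketch_dev *dev);
};

static bool sketch_is_mock_dev(const struct sketch_dev *dev)
{
	return dev->is_mock;
}

/* Pretend "real" implementation that fails with an errno-style code. */
static int sketch_real_decode_init(const struct sketch_dev *dev)
{
	(void)dev;
	return -6;
}

/*
 * The wrapper returns int, the same type as the function it wraps.
 * Mocked devices short-circuit to success; everything else is passed
 * through to the real implementation with its return code intact.
 */
static int sketch_wrap_decode_init(const struct sketch_mock_ops *ops,
				   const struct sketch_dev *dev)
{
	int rc;

	if (ops && ops->is_mock_dev(dev))
		rc = 0;
	else
		rc = sketch_real_decode_init(dev);

	return rc;
}
```

Because both sides agree on int, a negative error code from the
pass-through path reaches the caller unchanged rather than being read
back from a bool-sized return value.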

Fixes: 92804edb11f0 ("cxl/pci: Drop @info argument to cxl_hdm_decode_init()")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 tools/testing/cxl/test/mock.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/testing/cxl/test/mock.c b/tools/testing/cxl/test/mock.c
index f1f8c40948c5..bce6a21df0d5 100644
--- a/tools/testing/cxl/test/mock.c
+++ b/tools/testing/cxl/test/mock.c
@@ -208,13 +208,15 @@ int __wrap_cxl_await_media_ready(struct cxl_dev_state *cxlds)
 }
 EXPORT_SYMBOL_NS_GPL(__wrap_cxl_await_media_ready, CXL);
 
-bool __wrap_cxl_hdm_decode_init(struct cxl_dev_state *cxlds,
-				struct cxl_hdm *cxlhdm)
+int __wrap_cxl_hdm_decode_init(struct cxl_dev_state *cxlds,
+			       struct cxl_hdm *cxlhdm)
 {
 	int rc = 0, index;
 	struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
 
-	if (!ops || !ops->is_mock_dev(cxlds->dev))
+	if (ops && ops->is_mock_dev(cxlds->dev))
+		rc = 0;
+	else
 		rc = cxl_hdm_decode_init(cxlds, cxlhdm);
 	put_cxl_mock_ops(index);
 



* [PATCH 02/46] cxl/port: Keep port->uport valid for the entire life of a port
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
  2022-06-24  2:45 ` [PATCH 01/46] tools/testing/cxl: Fix cxl_hdm_decode_init() calling convention Dan Williams
@ 2022-06-24  2:45 ` Dan Williams
  2022-06-24  3:37   ` Alison Schofield
                     ` (2 more replies)
  2022-06-24  2:45 ` [PATCH 03/46] cxl/hdm: Use local hdm variable Dan Williams
                   ` (46 subsequent siblings)
  48 siblings, 3 replies; 157+ messages in thread
From: Dan Williams @ 2022-06-24  2:45 UTC (permalink / raw)
  To: linux-cxl; +Cc: hch, alison.schofield, nvdimm, linux-pci, patches

The upcoming region provisioning implementation has a need to
dereference port->uport during the port unregister flow. Specifically,
endpoint decoders need to be able to lookup their corresponding memdev
via port->uport.

The existing ->dead flag was added for cases where the core was
committed to tearing down the port, but needed to drop locks before
calling device_unregister(). Reuse that flag to indicate to
delete_endpoint() that it has no "release action" work to do as
unregister_port() will handle it.
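
A minimal sketch of the flag handoff, using hypothetical sketch_* names
and a counter in place of the devm release actions, assuming nothing
about the real structures beyond the two fields discussed here:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for the two port fields discussed above. */
struct sketch_port {
	void *uport;		/* stays valid for the life of the port */
	bool dead;		/* set once teardown is committed */
	int release_actions;	/* counts simulated release actions */
};

/* Mark the port dead instead of clearing ->uport. */
static void sketch_unregister_port(struct sketch_port *port)
{
	port->dead = true;
}

/* Only run the release actions if unregistration has not already begun. */
static void sketch_delete_endpoint(struct sketch_port *endpoint,
				   bool parent_has_driver)
{
	if (parent_has_driver && !endpoint->dead)
		endpoint->release_actions++;
}
```

After sketch_unregister_port() runs, ->uport is still usable by anything
that needs it during teardown, and a later sketch_delete_endpoint() call
sees ->dead and skips the release actions.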

Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/port.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index dbce99bdffab..7810d1a8369b 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -370,7 +370,7 @@ static void unregister_port(void *_port)
 		lock_dev = &parent->dev;
 
 	device_lock_assert(lock_dev);
-	port->uport = NULL;
+	port->dead = true;
 	device_unregister(&port->dev);
 }
 
@@ -857,7 +857,7 @@ static void delete_endpoint(void *data)
 	parent = &parent_port->dev;
 
 	device_lock(parent);
-	if (parent->driver && endpoint->uport) {
+	if (parent->driver && !endpoint->dead) {
 		devm_release_action(parent, cxl_unlink_uport, endpoint);
 		devm_release_action(parent, unregister_port, endpoint);
 	}



* [PATCH 03/46] cxl/hdm: Use local hdm variable
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
  2022-06-24  2:45 ` [PATCH 01/46] tools/testing/cxl: Fix cxl_hdm_decode_init() calling convention Dan Williams
  2022-06-24  2:45 ` [PATCH 02/46] cxl/port: Keep port->uport valid for the entire life of a port Dan Williams
@ 2022-06-24  2:45 ` Dan Williams
  2022-06-24  3:38   ` Alison Schofield
                     ` (2 more replies)
  2022-06-24  2:45 ` [PATCH 04/46] cxl/core: Rename ->decoder_range ->hpa_range Dan Williams
                   ` (45 subsequent siblings)
  48 siblings, 3 replies; 157+ messages in thread
From: Dan Williams @ 2022-06-24  2:45 UTC (permalink / raw)
  To: linux-cxl; +Cc: Ben Widawsky, hch, alison.schofield, nvdimm, linux-pci, patches

From: Ben Widawsky <bwidawsk@kernel.org>

Save a few characters and use the already initialized local variable.

Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/hdm.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index bfc8ee876278..ba3d2d959c71 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -251,8 +251,7 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
 			return PTR_ERR(cxld);
 		}
 
-		rc = init_hdm_decoder(port, cxld, target_map,
-				      cxlhdm->regs.hdm_decoder, i);
+		rc = init_hdm_decoder(port, cxld, target_map, hdm, i);
 		if (rc) {
 			put_device(&cxld->dev);
 			failed++;



* [PATCH 04/46] cxl/core: Rename ->decoder_range ->hpa_range
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (2 preceding siblings ...)
  2022-06-24  2:45 ` [PATCH 03/46] cxl/hdm: Use local hdm variable Dan Williams
@ 2022-06-24  2:45 ` Dan Williams
  2022-06-24  3:39   ` Alison Schofield
                     ` (2 more replies)
  2022-06-24  2:45 ` [PATCH 05/46] cxl/core: Drop ->platform_res attribute for root decoders Dan Williams
                   ` (44 subsequent siblings)
  48 siblings, 3 replies; 157+ messages in thread
From: Dan Williams @ 2022-06-24  2:45 UTC (permalink / raw)
  To: linux-cxl; +Cc: hch, alison.schofield, nvdimm, linux-pci, patches

In preparation for growing a ->dpa_range attribute for endpoint
decoders, rename the current ->decoder_range to the more descriptive
->hpa_range.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/hdm.c       |    2 +-
 drivers/cxl/core/port.c      |    4 ++--
 drivers/cxl/cxl.h            |    4 ++--
 tools/testing/cxl/test/cxl.c |    2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index ba3d2d959c71..5c070c93b07f 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -172,7 +172,7 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
 		return -ENXIO;
 	}
 
-	cxld->decoder_range = (struct range) {
+	cxld->hpa_range = (struct range) {
 		.start = base,
 		.end = base + size - 1,
 	};
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 7810d1a8369b..98bcbbd59a75 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -78,7 +78,7 @@ static ssize_t start_show(struct device *dev, struct device_attribute *attr,
 	if (is_root_decoder(dev))
 		start = cxld->platform_res.start;
 	else
-		start = cxld->decoder_range.start;
+		start = cxld->hpa_range.start;
 
 	return sysfs_emit(buf, "%#llx\n", start);
 }
@@ -93,7 +93,7 @@ static ssize_t size_show(struct device *dev, struct device_attribute *attr,
 	if (is_root_decoder(dev))
 		size = resource_size(&cxld->platform_res);
 	else
-		size = range_len(&cxld->decoder_range);
+		size = range_len(&cxld->hpa_range);
 
 	return sysfs_emit(buf, "%#llx\n", size);
 }
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 6799b27c7db2..8256728cea8d 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -198,7 +198,7 @@ enum cxl_decoder_type {
  * @dev: this decoder's device
  * @id: kernel device name id
  * @platform_res: address space resources considered by root decoder
- * @decoder_range: address space resources considered by midlevel decoder
+ * @hpa_range: Host physical address range mapped by this decoder
  * @interleave_ways: number of cxl_dports in this decode
  * @interleave_granularity: data stride per dport
  * @target_type: accelerator vs expander (type2 vs type3) selector
@@ -212,7 +212,7 @@ struct cxl_decoder {
 	int id;
 	union {
 		struct resource platform_res;
-		struct range decoder_range;
+		struct range hpa_range;
 	};
 	int interleave_ways;
 	int interleave_granularity;
diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index 431f2bddf6c8..7a08b025f2de 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -461,7 +461,7 @@ static int mock_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
 			return PTR_ERR(cxld);
 		}
 
-		cxld->decoder_range = (struct range) {
+		cxld->hpa_range = (struct range) {
 			.start = 0,
 			.end = -1,
 		};



* [PATCH 05/46] cxl/core: Drop ->platform_res attribute for root decoders
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (3 preceding siblings ...)
  2022-06-24  2:45 ` [PATCH 04/46] cxl/core: Rename ->decoder_range ->hpa_range Dan Williams
@ 2022-06-24  2:45 ` Dan Williams
  2022-06-28 15:24   ` Jonathan Cameron
       [not found]   ` <CGME20220629202117uscas1p2892fb68ae60c4754e2f7d26882a92ae5@uscas1p2.samsung.com>
  2022-06-24  2:45 ` [PATCH 06/46] cxl/core: Drop is_cxl_decoder() Dan Williams
                   ` (43 subsequent siblings)
  48 siblings, 2 replies; 157+ messages in thread
From: Dan Williams @ 2022-06-24  2:45 UTC (permalink / raw)
  To: linux-cxl; +Cc: hch, alison.schofield, nvdimm, linux-pci, patches

Root decoders are responsible for hosting the available host address
space for endpoints and regions to claim. The tracking of that available
capacity can be done in iomem_resource directly. As a result, root
decoders no longer need to host their own resource tree. The
current ->platform_res attribute was added prematurely.

Otherwise, ->hpa_range fills the role of conveying the current decode
range of the decoder.
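
For readers less familiar with the inclusive 'struct range' convention,
here is a userspace sketch of the arithmetic involved. The sketch_*
names are hypothetical stand-ins that mirror the behavior of the
kernel's range_len() and range_contains() helpers as understood here;
they are not the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for an inclusive [start, end] address range. */
struct sketch_range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/* Build a range from a window base and size, as a CFMWS entry provides. */
static struct sketch_range sketch_range_from_window(uint64_t base,
						    uint64_t size)
{
	return (struct sketch_range) {
		.start = base,
		.end = base + size - 1,
	};
}

static uint64_t sketch_range_len(const struct sketch_range *r)
{
	return r->end - r->start + 1;
}

/* True when @inner lies entirely within @outer. */
static bool sketch_range_contains(const struct sketch_range *outer,
				  const struct sketch_range *inner)
{
	return outer->start <= inner->start && outer->end >= inner->end;
}
```

One consequence of the inclusive end: a range initialized to
{ .start = 0, .end = -1 } has a length that wraps to 0 in 64-bit
arithmetic, so a decoder with no programmed range reports a size of
zero.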

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/acpi.c      |   17 ++++++++++-------
 drivers/cxl/core/pci.c  |    8 +-------
 drivers/cxl/core/port.c |   30 +++++++-----------------------
 drivers/cxl/cxl.h       |    6 +-----
 4 files changed, 19 insertions(+), 42 deletions(-)

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 40286f5df812..951695cdb455 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -108,8 +108,10 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
 
 	cxld->flags = cfmws_to_decoder_flags(cfmws->restrictions);
 	cxld->target_type = CXL_DECODER_EXPANDER;
-	cxld->platform_res = (struct resource)DEFINE_RES_MEM(cfmws->base_hpa,
-							     cfmws->window_size);
+	cxld->hpa_range = (struct range) {
+		.start = cfmws->base_hpa,
+		.end = cfmws->base_hpa + cfmws->window_size - 1,
+	};
 	cxld->interleave_ways = CFMWS_INTERLEAVE_WAYS(cfmws);
 	cxld->interleave_granularity = CFMWS_INTERLEAVE_GRANULARITY(cfmws);
 
@@ -119,13 +121,14 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
 	else
 		rc = cxl_decoder_autoremove(dev, cxld);
 	if (rc) {
-		dev_err(dev, "Failed to add decoder for %pr\n",
-			&cxld->platform_res);
+		dev_err(dev, "Failed to add decoder for [%#llx - %#llx]\n",
+			cxld->hpa_range.start, cxld->hpa_range.end);
 		return 0;
 	}
-	dev_dbg(dev, "add: %s node: %d range %pr\n", dev_name(&cxld->dev),
-		phys_to_target_node(cxld->platform_res.start),
-		&cxld->platform_res);
+	dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
+		dev_name(&cxld->dev),
+		phys_to_target_node(cxld->hpa_range.start),
+		cxld->hpa_range.start, cxld->hpa_range.end);
 
 	return 0;
 }
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index c4c99ff7b55e..7672789c3225 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -225,7 +225,6 @@ static int dvsec_range_allowed(struct device *dev, void *arg)
 {
 	struct range *dev_range = arg;
 	struct cxl_decoder *cxld;
-	struct range root_range;
 
 	if (!is_root_decoder(dev))
 		return 0;
@@ -237,12 +236,7 @@ static int dvsec_range_allowed(struct device *dev, void *arg)
 	if (!(cxld->flags & CXL_DECODER_F_RAM))
 		return 0;
 
-	root_range = (struct range) {
-		.start = cxld->platform_res.start,
-		.end = cxld->platform_res.end,
-	};
-
-	return range_contains(&root_range, dev_range);
+	return range_contains(&cxld->hpa_range, dev_range);
 }
 
 static void disable_hdm(void *_cxlhdm)
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 98bcbbd59a75..b51eb41aa839 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -73,29 +73,17 @@ static ssize_t start_show(struct device *dev, struct device_attribute *attr,
 			  char *buf)
 {
 	struct cxl_decoder *cxld = to_cxl_decoder(dev);
-	u64 start;
 
-	if (is_root_decoder(dev))
-		start = cxld->platform_res.start;
-	else
-		start = cxld->hpa_range.start;
-
-	return sysfs_emit(buf, "%#llx\n", start);
+	return sysfs_emit(buf, "%#llx\n", cxld->hpa_range.start);
 }
 static DEVICE_ATTR_ADMIN_RO(start);
 
 static ssize_t size_show(struct device *dev, struct device_attribute *attr,
-			char *buf)
+			 char *buf)
 {
 	struct cxl_decoder *cxld = to_cxl_decoder(dev);
-	u64 size;
-
-	if (is_root_decoder(dev))
-		size = resource_size(&cxld->platform_res);
-	else
-		size = range_len(&cxld->hpa_range);
 
-	return sysfs_emit(buf, "%#llx\n", size);
+	return sysfs_emit(buf, "%#llx\n", range_len(&cxld->hpa_range));
 }
 static DEVICE_ATTR_RO(size);
 
@@ -1233,7 +1221,10 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
 	cxld->interleave_ways = 1;
 	cxld->interleave_granularity = PAGE_SIZE;
 	cxld->target_type = CXL_DECODER_EXPANDER;
-	cxld->platform_res = (struct resource)DEFINE_RES_MEM(0, 0);
+	cxld->hpa_range = (struct range) {
+		.start = 0,
+		.end = -1,
+	};
 
 	return cxld;
 err:
@@ -1347,13 +1338,6 @@ int cxl_decoder_add_locked(struct cxl_decoder *cxld, int *target_map)
 	if (rc)
 		return rc;
 
-	/*
-	 * Platform decoder resources should show up with a reasonable name. All
-	 * other resources are just sub ranges within the main decoder resource.
-	 */
-	if (is_root_decoder(dev))
-		cxld->platform_res.name = dev_name(dev);
-
 	return device_add(dev);
 }
 EXPORT_SYMBOL_NS_GPL(cxl_decoder_add_locked, CXL);
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 8256728cea8d..35ce17872fc1 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -197,7 +197,6 @@ enum cxl_decoder_type {
  * struct cxl_decoder - CXL address range decode configuration
  * @dev: this decoder's device
  * @id: kernel device name id
- * @platform_res: address space resources considered by root decoder
  * @hpa_range: Host physical address range mapped by this decoder
  * @interleave_ways: number of cxl_dports in this decode
  * @interleave_granularity: data stride per dport
@@ -210,10 +209,7 @@ enum cxl_decoder_type {
 struct cxl_decoder {
 	struct device dev;
 	int id;
-	union {
-		struct resource platform_res;
-		struct range hpa_range;
-	};
+	struct range hpa_range;
 	int interleave_ways;
 	int interleave_granularity;
 	enum cxl_decoder_type target_type;



* [PATCH 06/46] cxl/core: Drop is_cxl_decoder()
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (4 preceding siblings ...)
  2022-06-24  2:45 ` [PATCH 05/46] cxl/core: Drop ->platform_res attribute for root decoders Dan Williams
@ 2022-06-24  2:45 ` Dan Williams
  2022-06-24  3:48   ` Alison Schofield
                     ` (2 more replies)
  2022-06-24  2:45 ` [PATCH 07/46] cxl: Introduce cxl_to_{ways,granularity} Dan Williams
                   ` (42 subsequent siblings)
  48 siblings, 3 replies; 157+ messages in thread
From: Dan Williams @ 2022-06-24  2:45 UTC (permalink / raw)
  To: linux-cxl; +Cc: hch, alison.schofield, nvdimm, linux-pci, patches

This helper was only used to identify the object type for lockdep
purposes. Now that lockdep support is done with explicit lock classes,
this helper can be dropped.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/port.c |    6 ------
 drivers/cxl/cxl.h       |    1 -
 2 files changed, 7 deletions(-)

diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index b51eb41aa839..13c321afe076 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -271,12 +271,6 @@ bool is_root_decoder(struct device *dev)
 }
 EXPORT_SYMBOL_NS_GPL(is_root_decoder, CXL);
 
-bool is_cxl_decoder(struct device *dev)
-{
-	return dev->type && dev->type->release == cxl_decoder_release;
-}
-EXPORT_SYMBOL_NS_GPL(is_cxl_decoder, CXL);
-
 struct cxl_decoder *to_cxl_decoder(struct device *dev)
 {
 	if (dev_WARN_ONCE(dev, dev->type->release != cxl_decoder_release,
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 35ce17872fc1..6e08fe8cc0fe 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -337,7 +337,6 @@ struct cxl_dport *cxl_find_dport_by_dev(struct cxl_port *port,
 struct cxl_decoder *to_cxl_decoder(struct device *dev);
 bool is_root_decoder(struct device *dev);
 bool is_endpoint_decoder(struct device *dev);
-bool is_cxl_decoder(struct device *dev);
 struct cxl_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
 					   unsigned int nr_targets);
 struct cxl_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,



* [PATCH 07/46] cxl: Introduce cxl_to_{ways,granularity}
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (5 preceding siblings ...)
  2022-06-24  2:45 ` [PATCH 06/46] cxl/core: Drop is_cxl_decoder() Dan Williams
@ 2022-06-24  2:45 ` Dan Williams
  2022-06-28 15:36   ` Jonathan Cameron
  2022-06-24  2:45 ` [PATCH 08/46] cxl/core: Define a 'struct cxl_switch_decoder' Dan Williams
                   ` (41 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  2:45 UTC (permalink / raw)
  To: linux-cxl; +Cc: Ben Widawsky, hch, alison.schofield, nvdimm, linux-pci, patches

Interleave granularity and ways have CXL specification defined encodings.
Promote the conversion helpers to a common header, and use them to
replace other open-coded instances.

Force callers to consider the error case of the conversion as well.
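
A minimal sketch of what such conversions can look like follows. It is
not the code from this patch. It assumes the granularity encoding is a
power-of-two multiplier on 256 bytes (consistent with the removed
"granularity + 8" macro) limited to values 0 through 6, and that the
ways encoding uses 0 through 4 for power-of-two ways and 8 through 10
for 3-, 6-, and 12-way interleave. The sketch_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Assumed encoding: granularity in bytes is 256 << ig, for ig in 0..6. */
static int sketch_to_granularity(unsigned int ig, unsigned int *val)
{
	if (ig > 6)
		return -EINVAL;
	*val = 256u << ig;
	return 0;
}

/*
 * Assumed encoding: 0..4 select 1, 2, 4, 8, 16 ways; 8..10 select
 * 3, 6, 12 ways. Anything else is rejected rather than guessed at.
 */
static int sketch_to_ways(unsigned int eniw, unsigned int *val)
{
	if (eniw <= 4)
		*val = 1u << eniw;
	else if (eniw >= 8 && eniw <= 10)
		*val = 3u << (eniw - 8);
	else
		return -EINVAL;
	return 0;
}
```

Returning the converted value through a pointer and an error code
through the return value is what forces each call site to handle an
invalid encoding, which the old open-coded macros could not express.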

Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/acpi.c     |   34 +++++++++++++++++++---------------
 drivers/cxl/core/hdm.c |   35 +++++++++--------------------------
 drivers/cxl/cxl.h      |   26 ++++++++++++++++++++++++++
 3 files changed, 54 insertions(+), 41 deletions(-)

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 951695cdb455..544cb10ce33e 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -9,10 +9,6 @@
 #include "cxlpci.h"
 #include "cxl.h"
 
-/* Encode defined in CXL 2.0 8.2.5.12.7 HDM Decoder Control Register */
-#define CFMWS_INTERLEAVE_WAYS(x)	(1 << (x)->interleave_ways)
-#define CFMWS_INTERLEAVE_GRANULARITY(x)	((x)->granularity + 8)
-
 static unsigned long cfmws_to_decoder_flags(int restrictions)
 {
 	unsigned long flags = CXL_DECODER_F_ENABLE;
@@ -34,7 +30,8 @@ static unsigned long cfmws_to_decoder_flags(int restrictions)
 static int cxl_acpi_cfmws_verify(struct device *dev,
 				 struct acpi_cedt_cfmws *cfmws)
 {
-	int expected_len;
+	unsigned int expected_len, ways;
+	int rc;
 
 	if (cfmws->interleave_arithmetic != ACPI_CEDT_CFMWS_ARITHMETIC_MODULO) {
 		dev_err(dev, "CFMWS Unsupported Interleave Arithmetic\n");
@@ -51,14 +48,14 @@ static int cxl_acpi_cfmws_verify(struct device *dev,
 		return -EINVAL;
 	}
 
-	if (CFMWS_INTERLEAVE_WAYS(cfmws) > CXL_DECODER_MAX_INTERLEAVE) {
-		dev_err(dev, "CFMWS Interleave Ways (%d) too large\n",
-			CFMWS_INTERLEAVE_WAYS(cfmws));
+	rc = cxl_to_ways(cfmws->interleave_ways, &ways);
+	if (rc) {
+		dev_err(dev, "CFMWS Interleave Ways (%d) invalid\n",
+			cfmws->interleave_ways);
 		return -EINVAL;
 	}
 
-	expected_len = struct_size((cfmws), interleave_targets,
-				   CFMWS_INTERLEAVE_WAYS(cfmws));
+	expected_len = struct_size(cfmws, interleave_targets, ways);
 
 	if (cfmws->header.length < expected_len) {
 		dev_err(dev, "CFMWS length %d less than expected %d\n",
@@ -87,7 +84,8 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
 	struct device *dev = ctx->dev;
 	struct acpi_cedt_cfmws *cfmws;
 	struct cxl_decoder *cxld;
-	int rc, i;
+	unsigned int ways, i, ig;
+	int rc;
 
 	cfmws = (struct acpi_cedt_cfmws *) header;
 
@@ -99,10 +97,16 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
 		return 0;
 	}
 
-	for (i = 0; i < CFMWS_INTERLEAVE_WAYS(cfmws); i++)
+	rc = cxl_to_ways(cfmws->interleave_ways, &ways);
+	if (rc)
+		return rc;
+	rc = cxl_to_granularity(cfmws->granularity, &ig);
+	if (rc)
+		return rc;
+	for (i = 0; i < ways; i++)
 		target_map[i] = cfmws->interleave_targets[i];
 
-	cxld = cxl_root_decoder_alloc(root_port, CFMWS_INTERLEAVE_WAYS(cfmws));
+	cxld = cxl_root_decoder_alloc(root_port, ways);
 	if (IS_ERR(cxld))
 		return 0;
 
@@ -112,8 +116,8 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
 		.start = cfmws->base_hpa,
 		.end = cfmws->base_hpa + cfmws->window_size - 1,
 	};
-	cxld->interleave_ways = CFMWS_INTERLEAVE_WAYS(cfmws);
-	cxld->interleave_granularity = CFMWS_INTERLEAVE_GRANULARITY(cfmws);
+	cxld->interleave_ways = ways;
+	cxld->interleave_granularity = ig;
 
 	rc = cxl_decoder_add(cxld, target_map);
 	if (rc)
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 5c070c93b07f..46635105a1f1 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -128,33 +128,12 @@ struct cxl_hdm *devm_cxl_setup_hdm(struct cxl_port *port)
 }
 EXPORT_SYMBOL_NS_GPL(devm_cxl_setup_hdm, CXL);
 
-static int to_interleave_granularity(u32 ctrl)
-{
-	int val = FIELD_GET(CXL_HDM_DECODER0_CTRL_IG_MASK, ctrl);
-
-	return 256 << val;
-}
-
-static int to_interleave_ways(u32 ctrl)
-{
-	int val = FIELD_GET(CXL_HDM_DECODER0_CTRL_IW_MASK, ctrl);
-
-	switch (val) {
-	case 0 ... 4:
-		return 1 << val;
-	case 8 ... 10:
-		return 3 << (val - 8);
-	default:
-		return 0;
-	}
-}
-
 static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
 			    int *target_map, void __iomem *hdm, int which)
 {
 	u64 size, base;
+	int i, rc;
 	u32 ctrl;
-	int i;
 	union {
 		u64 value;
 		unsigned char target_id[8];
@@ -183,14 +162,18 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
 		if (ctrl & CXL_HDM_DECODER0_CTRL_LOCK)
 			cxld->flags |= CXL_DECODER_F_LOCK;
 	}
-	cxld->interleave_ways = to_interleave_ways(ctrl);
-	if (!cxld->interleave_ways) {
+	rc = cxl_to_ways(FIELD_GET(CXL_HDM_DECODER0_CTRL_IW_MASK, ctrl),
+			 &cxld->interleave_ways);
+	if (rc) {
 		dev_warn(&port->dev,
 			 "decoder%d.%d: Invalid interleave ways (ctrl: %#x)\n",
 			 port->id, cxld->id, ctrl);
-		return -ENXIO;
+		return rc;
 	}
-	cxld->interleave_granularity = to_interleave_granularity(ctrl);
+	rc = cxl_to_granularity(FIELD_GET(CXL_HDM_DECODER0_CTRL_IG_MASK, ctrl),
+				&cxld->interleave_granularity);
+	if (rc)
+		return rc;
 
 	if (FIELD_GET(CXL_HDM_DECODER0_CTRL_TYPE, ctrl))
 		cxld->target_type = CXL_DECODER_EXPANDER;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 6e08fe8cc0fe..fd02f9e2a829 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -64,6 +64,32 @@ static inline int cxl_hdm_decoder_count(u32 cap_hdr)
 	return val ? val * 2 : 1;
 }
 
+/* Encode defined in CXL 2.0 8.2.5.12.7 HDM Decoder Control Register */
+static inline int cxl_to_granularity(u16 ig, unsigned int *val)
+{
+	if (ig > 6)
+		return -EINVAL;
+	*val = 256 << ig;
+	return 0;
+}
+
+/* Encode defined in CXL ECN "3, 6, 12 and 16-way memory Interleaving" */
+static inline int cxl_to_ways(u8 eniw, unsigned int *val)
+{
+	switch (eniw) {
+	case 0 ... 4:
+		*val = 1 << eniw;
+		break;
+	case 8 ... 10:
+		*val = 3 << (eniw - 8);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 /* CXL 2.0 8.2.8.1 Device Capabilities Array Register */
 #define CXLDEV_CAP_ARRAY_OFFSET 0x0
 #define   CXLDEV_CAP_ARRAY_CAP_ID 0



* [PATCH 08/46] cxl/core: Define a 'struct cxl_switch_decoder'
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (6 preceding siblings ...)
  2022-06-24  2:45 ` [PATCH 07/46] cxl: Introduce cxl_to_{ways,granularity} Dan Williams
@ 2022-06-24  2:45 ` Dan Williams
  2022-06-28 16:12   ` Jonathan Cameron
  2022-06-24  2:46 ` [PATCH 09/46] cxl/acpi: Track CXL resources in iomem_resource Dan Williams
                   ` (40 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  2:45 UTC (permalink / raw)
  To: linux-cxl; +Cc: Ben Widawsky, hch, alison.schofield, nvdimm, linux-pci, patches

Currently 'struct cxl_decoder' contains the superset of attributes
needed for all decoder types. Before more type-specific attributes are
added to the common definition, reorganize 'struct cxl_decoder' into type
specific objects.

This patch, the first of three, factors out a cxl_switch_decoder type.
The 'switch' decoder type represents the decoder instances of cxl_port's
that route from the root of a CXL memory decode topology to the
endpoints. They come in two flavors: root-level decoders, statically
defined by platform firmware, and mid-level decoders, where
interleave-granularity, interleave-width, and the target list are
mutable.
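
The embedding pattern used for the split can be sketched in plain C as
follows. This is only an illustration under simplifying assumptions:
struct base_decoder, struct switch_decoder and the helper names are
hypothetical stand-ins for 'struct cxl_decoder' and
'struct cxl_switch_decoder', and the container_of() below is a
simplified version of the kernel macro without its type checking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the common decoder attributes. */
struct base_decoder {
	int id;
	int interleave_ways;
};

/* Hypothetical stand-in for the switch-specific wrapper. */
struct switch_decoder {
	struct base_decoder base;	/* embedded object, not a pointer */
	int nr_targets;
	int target[];			/* flexible array, sized at allocation */
};

/* Simplified container_of(): step back from a member to its parent. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Allocate the wrapper, but hand generic code only the base object. */
static struct base_decoder *switch_decoder_alloc(int nr_targets)
{
	struct switch_decoder *sd;

	assert(nr_targets > 0);
	sd = calloc(1, sizeof(*sd) +
		       (size_t)nr_targets * sizeof(sd->target[0]));
	if (!sd)
		return NULL;
	sd->nr_targets = nr_targets;
	return &sd->base;
}

/* Recover the wrapper from a base pointer known to be embedded in one. */
static struct switch_decoder *to_switch_decoder(struct base_decoder *base)
{
	return container_of(base, struct switch_decoder, base);
}

/* Free through the wrapper, since that is what was allocated. */
static void switch_decoder_free(struct base_decoder *base)
{
	free(to_switch_decoder(base));
}
```

The release path has to free the outer object, which is why this patch
gives switch and root decoders their own release callback instead of
reusing the one that frees a bare 'struct cxl_decoder'.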

Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/acpi.c           |    4 +
 drivers/cxl/core/hdm.c       |   21 +++++---
 drivers/cxl/core/port.c      |  115 +++++++++++++++++++++++++++++++-----------
 drivers/cxl/cxl.h            |   27 ++++++----
 tools/testing/cxl/test/cxl.c |   12 +++-
 5 files changed, 128 insertions(+), 51 deletions(-)

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 544cb10ce33e..d1b914dfa36c 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -81,6 +81,7 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
 	int target_map[CXL_DECODER_MAX_INTERLEAVE];
 	struct cxl_cfmws_context *ctx = arg;
 	struct cxl_port *root_port = ctx->root_port;
+	struct cxl_switch_decoder *cxlsd;
 	struct device *dev = ctx->dev;
 	struct acpi_cedt_cfmws *cfmws;
 	struct cxl_decoder *cxld;
@@ -106,10 +107,11 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
 	for (i = 0; i < ways; i++)
 		target_map[i] = cfmws->interleave_targets[i];
 
-	cxld = cxl_root_decoder_alloc(root_port, ways);
+	cxlsd = cxl_root_decoder_alloc(root_port, ways);
 	if (IS_ERR(cxld))
 		return 0;
 
+	cxld = &cxlsd->cxld;
 	cxld->flags = cfmws_to_decoder_flags(cfmws->restrictions);
 	cxld->target_type = CXL_DECODER_EXPANDER;
 	cxld->hpa_range = (struct range) {
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 46635105a1f1..2d1f3e6eebea 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -46,20 +46,20 @@ static int add_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
  */
 int devm_cxl_add_passthrough_decoder(struct cxl_port *port)
 {
-	struct cxl_decoder *cxld;
+	struct cxl_switch_decoder *cxlsd;
 	struct cxl_dport *dport;
 	int single_port_map[1];
 
-	cxld = cxl_switch_decoder_alloc(port, 1);
-	if (IS_ERR(cxld))
-		return PTR_ERR(cxld);
+	cxlsd = cxl_switch_decoder_alloc(port, 1);
+	if (IS_ERR(cxlsd))
+		return PTR_ERR(cxlsd);
 
 	device_lock_assert(&port->dev);
 
 	dport = list_first_entry(&port->dports, typeof(*dport), list);
 	single_port_map[0] = dport->port_id;
 
-	return add_hdm_decoder(port, cxld, single_port_map);
+	return add_hdm_decoder(port, &cxlsd->cxld, single_port_map);
 }
 EXPORT_SYMBOL_NS_GPL(devm_cxl_add_passthrough_decoder, CXL);
 
@@ -226,8 +226,15 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
 
 		if (is_cxl_endpoint(port))
 			cxld = cxl_endpoint_decoder_alloc(port);
-		else
-			cxld = cxl_switch_decoder_alloc(port, target_count);
+		else {
+			struct cxl_switch_decoder *cxlsd;
+
+			cxlsd = cxl_switch_decoder_alloc(port, target_count);
+			if (IS_ERR(cxlsd))
+				cxld = ERR_CAST(cxlsd);
+			else
+				cxld = &cxlsd->cxld;
+		}
 		if (IS_ERR(cxld)) {
 			dev_warn(&port->dev,
 				 "Failed to allocate the decoder\n");
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 13c321afe076..fd1cac13cd2e 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -119,20 +119,21 @@ static ssize_t target_type_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(target_type);
 
-static ssize_t emit_target_list(struct cxl_decoder *cxld, char *buf)
+static ssize_t emit_target_list(struct cxl_switch_decoder *cxlsd, char *buf)
 {
+	struct cxl_decoder *cxld = &cxlsd->cxld;
 	ssize_t offset = 0;
 	int i, rc = 0;
 
 	for (i = 0; i < cxld->interleave_ways; i++) {
-		struct cxl_dport *dport = cxld->target[i];
+		struct cxl_dport *dport = cxlsd->target[i];
 		struct cxl_dport *next = NULL;
 
 		if (!dport)
 			break;
 
 		if (i + 1 < cxld->interleave_ways)
-			next = cxld->target[i + 1];
+			next = cxlsd->target[i + 1];
 		rc = sysfs_emit_at(buf, offset, "%d%s", dport->port_id,
 				   next ? "," : "");
 		if (rc < 0)
@@ -143,18 +144,20 @@ static ssize_t emit_target_list(struct cxl_decoder *cxld, char *buf)
 	return offset;
 }
 
+static struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev);
+
 static ssize_t target_list_show(struct device *dev,
 				struct device_attribute *attr, char *buf)
 {
-	struct cxl_decoder *cxld = to_cxl_decoder(dev);
+	struct cxl_switch_decoder *cxlsd = to_cxl_switch_decoder(dev);
 	ssize_t offset;
 	unsigned int seq;
 	int rc;
 
 	do {
-		seq = read_seqbegin(&cxld->target_lock);
-		rc = emit_target_list(cxld, buf);
-	} while (read_seqretry(&cxld->target_lock, seq));
+		seq = read_seqbegin(&cxlsd->target_lock);
+		rc = emit_target_list(cxlsd, buf);
+	} while (read_seqretry(&cxlsd->target_lock, seq));
 
 	if (rc < 0)
 		return rc;
@@ -232,14 +235,28 @@ static const struct attribute_group *cxl_decoder_endpoint_attribute_groups[] = {
 	NULL,
 };
 
+static void __cxl_decoder_release(struct cxl_decoder *cxld)
+{
+	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
+
+	ida_free(&port->decoder_ida, cxld->id);
+	put_device(&port->dev);
+}
+
 static void cxl_decoder_release(struct device *dev)
 {
 	struct cxl_decoder *cxld = to_cxl_decoder(dev);
-	struct cxl_port *port = to_cxl_port(dev->parent);
 
-	ida_free(&port->decoder_ida, cxld->id);
+	__cxl_decoder_release(cxld);
 	kfree(cxld);
-	put_device(&port->dev);
+}
+
+static void cxl_switch_decoder_release(struct device *dev)
+{
+	struct cxl_switch_decoder *cxlsd = to_cxl_switch_decoder(dev);
+
+	__cxl_decoder_release(&cxlsd->cxld);
+	kfree(cxlsd);
 }
 
 static const struct device_type cxl_decoder_endpoint_type = {
@@ -250,13 +267,13 @@ static const struct device_type cxl_decoder_endpoint_type = {
 
 static const struct device_type cxl_decoder_switch_type = {
 	.name = "cxl_decoder_switch",
-	.release = cxl_decoder_release,
+	.release = cxl_switch_decoder_release,
 	.groups = cxl_decoder_switch_attribute_groups,
 };
 
 static const struct device_type cxl_decoder_root_type = {
 	.name = "cxl_decoder_root",
-	.release = cxl_decoder_release,
+	.release = cxl_switch_decoder_release,
 	.groups = cxl_decoder_root_attribute_groups,
 };
 
@@ -271,15 +288,29 @@ bool is_root_decoder(struct device *dev)
 }
 EXPORT_SYMBOL_NS_GPL(is_root_decoder, CXL);
 
+static bool is_switch_decoder(struct device *dev)
+{
+	return is_root_decoder(dev) || dev->type == &cxl_decoder_switch_type;
+}
+
 struct cxl_decoder *to_cxl_decoder(struct device *dev)
 {
-	if (dev_WARN_ONCE(dev, dev->type->release != cxl_decoder_release,
+	if (dev_WARN_ONCE(dev,
+			  !is_switch_decoder(dev) && !is_endpoint_decoder(dev),
 			  "not a cxl_decoder device\n"))
 		return NULL;
 	return container_of(dev, struct cxl_decoder, dev);
 }
 EXPORT_SYMBOL_NS_GPL(to_cxl_decoder, CXL);
 
+static struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev)
+{
+	if (dev_WARN_ONCE(dev, !is_switch_decoder(dev),
+			  "not a cxl_switch_decoder device\n"))
+		return NULL;
+	return container_of(dev, struct cxl_switch_decoder, cxld.dev);
+}
+
 static void cxl_ep_release(struct cxl_ep *ep)
 {
 	if (!ep)
@@ -1129,7 +1160,7 @@ struct cxl_dport *cxl_find_dport_by_dev(struct cxl_port *port,
 }
 EXPORT_SYMBOL_NS_GPL(cxl_find_dport_by_dev, CXL);
 
-static int decoder_populate_targets(struct cxl_decoder *cxld,
+static int decoder_populate_targets(struct cxl_switch_decoder *cxlsd,
 				    struct cxl_port *port, int *target_map)
 {
 	int i, rc = 0;
@@ -1142,17 +1173,17 @@ static int decoder_populate_targets(struct cxl_decoder *cxld,
 	if (list_empty(&port->dports))
 		return -EINVAL;
 
-	write_seqlock(&cxld->target_lock);
-	for (i = 0; i < cxld->nr_targets; i++) {
+	write_seqlock(&cxlsd->target_lock);
+	for (i = 0; i < cxlsd->nr_targets; i++) {
 		struct cxl_dport *dport = find_dport(port, target_map[i]);
 
 		if (!dport) {
 			rc = -ENXIO;
 			break;
 		}
-		cxld->target[i] = dport;
+		cxlsd->target[i] = dport;
 	}
-	write_sequnlock(&cxld->target_lock);
+	write_sequnlock(&cxlsd->target_lock);
 
 	return rc;
 }
@@ -1179,13 +1210,27 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
 {
 	struct cxl_decoder *cxld;
 	struct device *dev;
+	void *alloc;
 	int rc = 0;
 
 	if (nr_targets > CXL_DECODER_MAX_INTERLEAVE)
 		return ERR_PTR(-EINVAL);
 
-	cxld = kzalloc(struct_size(cxld, target, nr_targets), GFP_KERNEL);
-	if (!cxld)
+	if (nr_targets) {
+		struct cxl_switch_decoder *cxlsd;
+
+		alloc = kzalloc(struct_size(cxlsd, target, nr_targets), GFP_KERNEL);
+		cxlsd = alloc;
+		if (cxlsd) {
+			cxlsd->nr_targets = nr_targets;
+			seqlock_init(&cxlsd->target_lock);
+			cxld = &cxlsd->cxld;
+		}
+	} else {
+		alloc = kzalloc(sizeof(*cxld), GFP_KERNEL);
+		cxld = alloc;
+	}
+	if (!alloc)
 		return ERR_PTR(-ENOMEM);
 
 	rc = ida_alloc(&port->decoder_ida, GFP_KERNEL);
@@ -1196,8 +1241,6 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
 	get_device(&port->dev);
 	cxld->id = rc;
 
-	cxld->nr_targets = nr_targets;
-	seqlock_init(&cxld->target_lock);
 	dev = &cxld->dev;
 	device_initialize(dev);
 	lockdep_set_class(&dev->mutex, &cxl_decoder_key);
@@ -1222,7 +1265,7 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
 
 	return cxld;
 err:
-	kfree(cxld);
+	kfree(alloc);
 	return ERR_PTR(rc);
 }
 
@@ -1236,13 +1279,18 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
  * firmware description of CXL resources into a CXL standard decode
  * topology.
  */
-struct cxl_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
-					   unsigned int nr_targets)
+struct cxl_switch_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
+						  unsigned int nr_targets)
 {
+	struct cxl_decoder *cxld;
+
 	if (!is_cxl_root(port))
 		return ERR_PTR(-EINVAL);
 
-	return cxl_decoder_alloc(port, nr_targets);
+	cxld = cxl_decoder_alloc(port, nr_targets);
+	if (IS_ERR(cxld))
+		return ERR_CAST(cxld);
+	return to_cxl_switch_decoder(&cxld->dev);
 }
 EXPORT_SYMBOL_NS_GPL(cxl_root_decoder_alloc, CXL);
 
@@ -1257,13 +1305,18 @@ EXPORT_SYMBOL_NS_GPL(cxl_root_decoder_alloc, CXL);
  * that sit between Switch Upstream Ports / Switch Downstream Ports and
  * Host Bridges / Root Ports.
  */
-struct cxl_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
-					     unsigned int nr_targets)
+struct cxl_switch_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
+						    unsigned int nr_targets)
 {
+	struct cxl_decoder *cxld;
+
 	if (is_cxl_root(port) || is_cxl_endpoint(port))
 		return ERR_PTR(-EINVAL);
 
-	return cxl_decoder_alloc(port, nr_targets);
+	cxld = cxl_decoder_alloc(port, nr_targets);
+	if (IS_ERR(cxld))
+		return ERR_CAST(cxld);
+	return to_cxl_switch_decoder(&cxld->dev);
 }
 EXPORT_SYMBOL_NS_GPL(cxl_switch_decoder_alloc, CXL);
 
@@ -1320,7 +1373,9 @@ int cxl_decoder_add_locked(struct cxl_decoder *cxld, int *target_map)
 
 	port = to_cxl_port(cxld->dev.parent);
 	if (!is_endpoint_decoder(dev)) {
-		rc = decoder_populate_targets(cxld, port, target_map);
+		struct cxl_switch_decoder *cxlsd = to_cxl_switch_decoder(dev);
+
+		rc = decoder_populate_targets(cxlsd, port, target_map);
 		if (rc && (cxld->flags & CXL_DECODER_F_ENABLE)) {
 			dev_err(&port->dev,
 				"Failed to populate active decoder targets\n");
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index fd02f9e2a829..7525b55b11bb 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -220,7 +220,7 @@ enum cxl_decoder_type {
 #define CXL_DECODER_MAX_INTERLEAVE 16
 
 /**
- * struct cxl_decoder - CXL address range decode configuration
+ * struct cxl_decoder - Common CXL HDM Decoder Attributes
  * @dev: this decoder's device
  * @id: kernel device name id
  * @hpa_range: Host physical address range mapped by this decoder
@@ -228,10 +228,7 @@ enum cxl_decoder_type {
  * @interleave_granularity: data stride per dport
  * @target_type: accelerator vs expander (type2 vs type3) selector
  * @flags: memory type capabilities and locking
- * @target_lock: coordinate coherent reads of the target list
- * @nr_targets: number of elements in @target
- * @target: active ordered target list in current decoder configuration
- */
+*/
 struct cxl_decoder {
 	struct device dev;
 	int id;
@@ -240,12 +237,22 @@ struct cxl_decoder {
 	int interleave_granularity;
 	enum cxl_decoder_type target_type;
 	unsigned long flags;
+};
+
+/**
+ * struct cxl_switch_decoder - Switch specific CXL HDM Decoder
+ * @cxld: base cxl_decoder object
+ * @target_lock: coordinate coherent reads of the target list
+ * @nr_targets: number of elements in @target
+ * @target: active ordered target list in current decoder configuration
+ */
+struct cxl_switch_decoder {
+	struct cxl_decoder cxld;
 	seqlock_t target_lock;
 	int nr_targets;
 	struct cxl_dport *target[];
 };
 
-
 /**
  * enum cxl_nvdimm_brige_state - state machine for managing bus rescans
  * @CXL_NVB_NEW: Set at bridge create and after cxl_pmem_wq is destroyed
@@ -363,10 +370,10 @@ struct cxl_dport *cxl_find_dport_by_dev(struct cxl_port *port,
 struct cxl_decoder *to_cxl_decoder(struct device *dev);
 bool is_root_decoder(struct device *dev);
 bool is_endpoint_decoder(struct device *dev);
-struct cxl_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
-					   unsigned int nr_targets);
-struct cxl_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
-					     unsigned int nr_targets);
+struct cxl_switch_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
+						  unsigned int nr_targets);
+struct cxl_switch_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
+						    unsigned int nr_targets);
 int cxl_decoder_add(struct cxl_decoder *cxld, int *target_map);
 struct cxl_decoder *cxl_endpoint_decoder_alloc(struct cxl_port *port);
 int cxl_decoder_add_locked(struct cxl_decoder *cxld, int *target_map);
diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index 7a08b025f2de..68288354b419 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -451,9 +451,15 @@ static int mock_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
 		struct cxl_decoder *cxld;
 		int rc;
 
-		if (target_count)
-			cxld = cxl_switch_decoder_alloc(port, target_count);
-		else
+		if (target_count) {
+			struct cxl_switch_decoder *cxlsd;
+
+			cxlsd = cxl_switch_decoder_alloc(port, target_count);
+			if (IS_ERR(cxlsd))
+				cxld = ERR_CAST(cxlsd);
+			else
+				cxld = &cxlsd->cxld;
+		} else
 			cxld = cxl_endpoint_decoder_alloc(port);
 		if (IS_ERR(cxld)) {
 			dev_warn(&port->dev,



* [PATCH 09/46] cxl/acpi: Track CXL resources in iomem_resource
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (7 preceding siblings ...)
  2022-06-24  2:45 ` [PATCH 08/46] cxl/core: Define a 'struct cxl_switch_decoder' Dan Williams
@ 2022-06-24  2:46 ` Dan Williams
  2022-06-28 16:43   ` Jonathan Cameron
  2022-06-24  2:46 ` [PATCH 10/46] cxl/core: Define a 'struct cxl_root_decoder' for tracking CXL window resources Dan Williams
                   ` (39 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  2:46 UTC (permalink / raw)
  To: linux-cxl; +Cc: hch, alison.schofield, nvdimm, linux-pci, patches

Recall that CXL capable address ranges, on ACPI platforms, are published
in the CEDT.CFMWS (CXL Early Discovery Table - CXL Fixed Memory Window
Structures). These windows represent both the actively mapped capacity
and the potential address space that can be dynamically assigned to a
new CXL decode configuration.

CXL endpoints, like DDR DIMMs, can be mapped at any physical address,
including 0 and legacy ranges.

There is an expectation and requirement that the /proc/iomem interface
and the iomem_resource in the kernel reflect the full set of platform
address ranges. I.e. that every address range that platform firmware and
bus drivers enumerate be reflected as an iomem_resource entry. The hard
requirement to do this for CXL arises from the fact that capabilities
like CONFIG_DEVICE_PRIVATE expect to be able to treat empty
iomem_resource ranges as free for software to use as proxy address
space. Without CXL publishing its potential address ranges in
iomem_resource, the CONFIG_DEVICE_PRIVATE mechanism may inadvertently
steal capacity reserved for runtime provisioning of new CXL regions.

The approach taken supports dynamically publishing the CXL window map on
demand when a CXL platform driver like cxl_acpi loads. The windows are
then forced into the first level of the iomem_resource tree via the
insert_resource_expand_to_fit() API. This forcing sacrifices some
resource boundary accuracy in order to better reflect the decode
hierarchy of a CXL window hosting "System RAM" and other resources.
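
To make the boundary adjustment concrete, here is a small userspace
model of what add_cxl_resources() in this patch does to the windows
that follow one that was grown by expand-to-fit: windows that are now
fully covered are dropped, and a window that is only partially covered
has its start moved past the grown range. The struct span type, the
array representation, and the function absorb_following() are
assumptions made for this sketch; the kernel operates on a sibling
list of 'struct resource' instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inclusive address range, loosely like struct resource. */
struct span {
	uint64_t start;
	uint64_t end;
};

static bool span_contains(const struct span *a, const struct span *b)
{
	return a->start <= b->start && a->end >= b->end;
}

static bool span_overlaps(const struct span *a, const struct span *b)
{
	return a->start <= b->end && b->start <= a->end;
}

/*
 * Assumes w[] is sorted by start and its entries are mutually disjoint.
 * Replace w[i] with its grown range, drop the following windows that
 * the grown range fully covers, trim the one it partially covers, and
 * return the new number of windows.
 */
static size_t absorb_following(struct span *w, size_t n, size_t i,
			       struct span grown)
{
	size_t next = i + 1, out = i + 1;

	assert(w && i < n);
	w[i] = grown;
	while (next < n && span_overlaps(&grown, &w[next])) {
		if (span_contains(&grown, &w[next])) {
			next++;		/* swallowed whole */
		} else {
			/* partially covered: starts after the grown range */
			w[next].start = grown.end + 1;
			break;
		}
	}
	while (next < n)		/* compact the survivors */
		w[out++] = w[next++];
	return out;
}
```

In this model a window that started at 0x3000 and is overlapped by a
grown range ending at 0x3fff ends up starting at 0x4000, which is the
loss of boundary accuracy the paragraph above refers to.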

Walkers of the iomem_resource tree will also need to have access to the
related 'struct cxl_decoder' instances to disambiguate which portions of
a CXL memory resource are present vs expanded to enforce the expected
resource topology.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/acpi.c |  110 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 kernel/resource.c  |    7 +++
 2 files changed, 114 insertions(+), 3 deletions(-)

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index d1b914dfa36c..003fa4fde357 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -73,6 +73,7 @@ static int cxl_acpi_cfmws_verify(struct device *dev,
 struct cxl_cfmws_context {
 	struct device *dev;
 	struct cxl_port *root_port;
+	int id;
 };
 
 static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
@@ -84,8 +85,10 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
 	struct cxl_switch_decoder *cxlsd;
 	struct device *dev = ctx->dev;
 	struct acpi_cedt_cfmws *cfmws;
+	struct resource *cxl_res;
 	struct cxl_decoder *cxld;
 	unsigned int ways, i, ig;
+	struct resource *res;
 	int rc;
 
 	cfmws = (struct acpi_cedt_cfmws *) header;
@@ -107,6 +110,24 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
 	for (i = 0; i < ways; i++)
 		target_map[i] = cfmws->interleave_targets[i];
 
+	res = kzalloc(sizeof(*res), GFP_KERNEL);
+	if (!res)
+		return -ENOMEM;
+
+	res->name = kasprintf(GFP_KERNEL, "CXL Window %d", ctx->id++);
+	if (!res->name)
+		goto err_name;
+
+	res->start = cfmws->base_hpa;
+	res->end = cfmws->base_hpa + cfmws->window_size - 1;
+	res->flags = IORESOURCE_MEM;
+
+	/* add to the local resource tracking to establish a sort order */
+	cxl_res = dev_get_drvdata(&root_port->dev);
+	rc = insert_resource(cxl_res, res);
+	if (rc)
+		goto err_insert;
+
 	cxlsd = cxl_root_decoder_alloc(root_port, ways);
 	if (IS_ERR(cxld))
 		return 0;
@@ -115,8 +136,8 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
 	cxld->flags = cfmws_to_decoder_flags(cfmws->restrictions);
 	cxld->target_type = CXL_DECODER_EXPANDER;
 	cxld->hpa_range = (struct range) {
-		.start = cfmws->base_hpa,
-		.end = cfmws->base_hpa + cfmws->window_size - 1,
+		.start = res->start,
+		.end = res->end,
 	};
 	cxld->interleave_ways = ways;
 	cxld->interleave_granularity = ig;
@@ -131,12 +152,19 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
 			cxld->hpa_range.start, cxld->hpa_range.end);
 		return 0;
 	}
+
 	dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
 		dev_name(&cxld->dev),
 		phys_to_target_node(cxld->hpa_range.start),
 		cxld->hpa_range.start, cxld->hpa_range.end);
 
 	return 0;
+
+err_insert:
+	kfree(res->name);
+err_name:
+	kfree(res);
+	return -ENOMEM;
 }
 
 __mock struct acpi_device *to_cxl_host_bridge(struct device *host,
@@ -291,9 +319,66 @@ static void cxl_acpi_lock_reset_class(void *dev)
 	device_lock_reset_class(dev);
 }
 
+static void del_cxl_resource(struct resource *res)
+{
+	kfree(res->name);
+	kfree(res);
+}
+
+static void remove_cxl_resources(void *data)
+{
+	struct resource *res, *next, *cxl = data;
+
+	for (res = cxl->child; res; res = next) {
+		struct resource *victim = (struct resource *) res->desc;
+
+		next = res->sibling;
+		remove_resource(res);
+
+		if (victim) {
+			remove_resource(victim);
+			kfree(victim);
+		}
+
+		del_cxl_resource(res);
+	}
+}
+
+static int add_cxl_resources(struct resource *cxl)
+{
+	struct resource *res, *new, *next;
+
+	for (res = cxl->child; res; res = next) {
+		new = kzalloc(sizeof(*new), GFP_KERNEL);
+		if (!new)
+			return -ENOMEM;
+		new->name = res->name;
+		new->start = res->start;
+		new->end = res->end;
+		new->flags = IORESOURCE_MEM;
+		res->desc = (unsigned long) new;
+
+		insert_resource_expand_to_fit(&iomem_resource, new);
+
+		next = res->sibling;
+		while (next && resource_overlaps(new, next)) {
+			if (resource_contains(new, next)) {
+				struct resource *_next = next->sibling;
+
+				remove_resource(next);
+				del_cxl_resource(next);
+				next = _next;
+			} else
+				next->start = new->end + 1;
+		}
+	}
+	return 0;
+}
+
 static int cxl_acpi_probe(struct platform_device *pdev)
 {
 	int rc;
+	struct resource *cxl_res;
 	struct cxl_port *root_port;
 	struct device *host = &pdev->dev;
 	struct acpi_device *adev = ACPI_COMPANION(host);
@@ -305,21 +390,40 @@ static int cxl_acpi_probe(struct platform_device *pdev)
 	if (rc)
 		return rc;
 
+	cxl_res = devm_kzalloc(host, sizeof(*cxl_res), GFP_KERNEL);
+	if (!cxl_res)
+		return -ENOMEM;
+	cxl_res->name = "CXL mem";
+	cxl_res->start = 0;
+	cxl_res->end = -1;
+	cxl_res->flags = IORESOURCE_MEM;
+
 	root_port = devm_cxl_add_port(host, host, CXL_RESOURCE_NONE, NULL);
 	if (IS_ERR(root_port))
 		return PTR_ERR(root_port);
 	dev_dbg(host, "add: %s\n", dev_name(&root_port->dev));
+	dev_set_drvdata(&root_port->dev, cxl_res);
 
 	rc = bus_for_each_dev(adev->dev.bus, NULL, root_port,
 			      add_host_bridge_dport);
 	if (rc < 0)
 		return rc;
 
+	rc = devm_add_action_or_reset(host, remove_cxl_resources, cxl_res);
+	if (rc)
+		return rc;
+
 	ctx = (struct cxl_cfmws_context) {
 		.dev = host,
 		.root_port = root_port,
 	};
-	acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, cxl_parse_cfmws, &ctx);
+	rc = acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, cxl_parse_cfmws, &ctx);
+	if (rc < 0)
+		return -ENXIO;
+
+	rc = add_cxl_resources(cxl_res);
+	if (rc)
+		return rc;
 
 	/*
 	 * Root level scanned with host-bridge as dports, now scan host-bridges
diff --git a/kernel/resource.c b/kernel/resource.c
index 34eaee179689..53a534db350e 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -891,6 +891,13 @@ void insert_resource_expand_to_fit(struct resource *root, struct resource *new)
 	}
 	write_unlock(&resource_lock);
 }
+/*
+ * Not for general consumption, only early boot memory map parsing, PCI
+ * resource discovery, and late discovery of CXL resources are expected
+ * to use this interface. The former are built-in and only the latter,
+ * CXL, is a module.
+ */
+EXPORT_SYMBOL_NS_GPL(insert_resource_expand_to_fit, CXL);
 
 /**
  * remove_resource - Remove a resource in the resource tree



* [PATCH 10/46] cxl/core: Define a 'struct cxl_root_decoder' for tracking CXL window resources
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (8 preceding siblings ...)
  2022-06-24  2:46 ` [PATCH 09/46] cxl/acpi: Track CXL resources in iomem_resource Dan Williams
@ 2022-06-24  2:46 ` Dan Williams
  2022-06-28 16:49   ` Jonathan Cameron
  2022-06-28 16:53   ` Jonathan Cameron
  2022-06-24  2:46 ` [PATCH 11/46] cxl/core: Define a 'struct cxl_endpoint_decoder' for tracking DPA resources Dan Williams
                   ` (38 subsequent siblings)
  48 siblings, 2 replies; 157+ messages in thread
From: Dan Williams @ 2022-06-24  2:46 UTC (permalink / raw)
  To: linux-cxl; +Cc: Ben Widawsky, hch, alison.schofield, nvdimm, linux-pci, patches

Previously the target routing specifics of switch decoders were factored
out of 'struct cxl_decoder' into 'struct cxl_switch_decoder'.

This patch, 2 of 3, adds a 'struct cxl_root_decoder' as a superset of a
switch decoder that also tracks the associated CXL window platform
resource.

Note that the resource for a given root decoder needs to be looked up
after the fact (i.e. after cxl_parse_cfmws() and add_cxl_resources())
because add_cxl_resources() may have merged CXL windows in order to keep
them at the top of the resource tree / decode hierarchy.
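
The after-the-fact lookup amounts to a containment search: for each
root decoder, find the (possibly merged) window whose range contains
the decoder's host physical address range. A minimal userspace sketch,
assuming a flat array of windows and the hypothetical names struct win
and pair_window() (the patch itself walks resource siblings in
pair_cxl_resource()):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical published window with an inclusive address range. */
struct win {
	uint64_t start;
	uint64_t end;
	const char *name;
};

/*
 * Return the first window that fully contains [start, end], or NULL
 * when no window does. A NULL result corresponds to a root decoder
 * that ends up without an associated resource.
 */
static const struct win *pair_window(const struct win *wins, size_t n,
				     uint64_t start, uint64_t end)
{
	assert(wins || n == 0);
	for (size_t i = 0; i < n; i++)
		if (wins[i].start <= start && wins[i].end >= end)
			return &wins[i];
	return NULL;
}
```

Because merging can leave one window covering what firmware described
as several, more than one decoder may legitimately pair with the same
window in this model.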

Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/acpi.c      |   40 ++++++++++++++++++++++++++++++++++++----
 drivers/cxl/core/port.c |   43 +++++++++++++++++++++++++++++++++++++------
 drivers/cxl/cxl.h       |   15 +++++++++++++--
 3 files changed, 86 insertions(+), 12 deletions(-)

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 003fa4fde357..5972f380cdf2 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -82,7 +82,7 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
 	int target_map[CXL_DECODER_MAX_INTERLEAVE];
 	struct cxl_cfmws_context *ctx = arg;
 	struct cxl_port *root_port = ctx->root_port;
-	struct cxl_switch_decoder *cxlsd;
+	struct cxl_root_decoder *cxlrd;
 	struct device *dev = ctx->dev;
 	struct acpi_cedt_cfmws *cfmws;
 	struct resource *cxl_res;
@@ -128,11 +128,11 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
 	if (rc)
 		goto err_insert;
 
-	cxlsd = cxl_root_decoder_alloc(root_port, ways);
-	if (IS_ERR(cxld))
+	cxlrd = cxl_root_decoder_alloc(root_port, ways);
+	if (IS_ERR(cxlrd))
 		return 0;
 
-	cxld = &cxlsd->cxld;
+	cxld = &cxlrd->cxlsd.cxld;
 	cxld->flags = cfmws_to_decoder_flags(cfmws->restrictions);
 	cxld->target_type = CXL_DECODER_EXPANDER;
 	cxld->hpa_range = (struct range) {
@@ -375,6 +375,32 @@ static int add_cxl_resources(struct resource *cxl)
 	return 0;
 }
 
+static int pair_cxl_resource(struct device *dev, void *data)
+{
+	struct resource *cxl_res = data;
+	struct resource *p;
+
+	if (!is_root_decoder(dev))
+		return 0;
+
+	for (p = cxl_res->child; p; p = p->sibling) {
+		struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
+		struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
+		struct resource res = {
+			.start = cxld->hpa_range.start,
+			.end = cxld->hpa_range.end,
+			.flags = IORESOURCE_MEM,
+		};
+
+		if (resource_contains(p, &res)) {
+			cxlrd->res = (struct resource *)p->desc;
+			break;
+		}
+	}
+
+	return 0;
+}
+
 static int cxl_acpi_probe(struct platform_device *pdev)
 {
 	int rc;
@@ -425,6 +451,12 @@ static int cxl_acpi_probe(struct platform_device *pdev)
 	if (rc)
 		return rc;
 
+	/*
+	 * Populate the root decoders with their related iomem resource,
+	 * if present
+	 */
+	device_for_each_child(&root_port->dev, cxl_res, pair_cxl_resource);
+
 	/*
 	 * Root level scanned with host-bridge as dports, now scan host-bridges
 	 * for their role as CXL uports to their CXL-capable PCIe Root Ports.
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index fd1cac13cd2e..abf3455c4eff 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -259,6 +259,23 @@ static void cxl_switch_decoder_release(struct device *dev)
 	kfree(cxlsd);
 }
 
+struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev)
+{
+	if (dev_WARN_ONCE(dev, !is_root_decoder(dev),
+			  "not a cxl_root_decoder device\n"))
+		return NULL;
+	return container_of(dev, struct cxl_root_decoder, cxlsd.cxld.dev);
+}
+EXPORT_SYMBOL_NS_GPL(to_cxl_root_decoder, CXL);
+
+static void cxl_root_decoder_release(struct device *dev)
+{
+	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
+
+	__cxl_decoder_release(&cxlrd->cxlsd.cxld);
+	kfree(cxlrd);
+}
+
 static const struct device_type cxl_decoder_endpoint_type = {
 	.name = "cxl_decoder_endpoint",
 	.release = cxl_decoder_release,
@@ -273,7 +290,7 @@ static const struct device_type cxl_decoder_switch_type = {
 
 static const struct device_type cxl_decoder_root_type = {
 	.name = "cxl_decoder_root",
-	.release = cxl_switch_decoder_release,
+	.release = cxl_root_decoder_release,
 	.groups = cxl_decoder_root_attribute_groups,
 };
 
@@ -1218,9 +1235,23 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
 
 	if (nr_targets) {
 		struct cxl_switch_decoder *cxlsd;
+		struct cxl_root_decoder *cxlrd;
+
+		if (is_cxl_root(port)) {
+			alloc = kzalloc(struct_size(cxlrd, cxlsd.target,
+						    nr_targets),
+					GFP_KERNEL);
+			cxlrd = alloc;
+			if (cxlrd)
+				cxlsd = &cxlrd->cxlsd;
+			else
+				cxlsd = NULL;
+		} else {
+			alloc = kzalloc(struct_size(cxlsd, target, nr_targets),
+					GFP_KERNEL);
+			cxlsd = alloc;
+		}
 
-		alloc = kzalloc(struct_size(cxlsd, target, nr_targets), GFP_KERNEL);
-		cxlsd = alloc;
 		if (cxlsd) {
 			cxlsd->nr_targets = nr_targets;
 			seqlock_init(&cxlsd->target_lock);
@@ -1279,8 +1310,8 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
  * firmware description of CXL resources into a CXL standard decode
  * topology.
  */
-struct cxl_switch_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
-						  unsigned int nr_targets)
+struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
+						unsigned int nr_targets)
 {
 	struct cxl_decoder *cxld;
 
@@ -1290,7 +1321,7 @@ struct cxl_switch_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
 	cxld = cxl_decoder_alloc(port, nr_targets);
 	if (IS_ERR(cxld))
 		return ERR_CAST(cxld);
-	return to_cxl_switch_decoder(&cxld->dev);
+	return to_cxl_root_decoder(&cxld->dev);
 }
 EXPORT_SYMBOL_NS_GPL(cxl_root_decoder_alloc, CXL);
 
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 7525b55b11bb..6dd1e4c57a67 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -253,6 +253,16 @@ struct cxl_switch_decoder {
 	struct cxl_dport *target[];
 };
 
+/**
+ * struct cxl_root_decoder - Static platform CXL address decoder
+ * @res: host / parent resource for region allocations
+ * @cxlsd: base cxl switch decoder
+ */
+struct cxl_root_decoder {
+	struct resource *res;
+	struct cxl_switch_decoder cxlsd;
+};
+
 /**
  * enum cxl_nvdimm_brige_state - state machine for managing bus rescans
  * @CXL_NVB_NEW: Set at bridge create and after cxl_pmem_wq is destroyed
@@ -368,10 +378,11 @@ struct cxl_dport *cxl_find_dport_by_dev(struct cxl_port *port,
 					const struct device *dev);
 
 struct cxl_decoder *to_cxl_decoder(struct device *dev);
+struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev);
 bool is_root_decoder(struct device *dev);
 bool is_endpoint_decoder(struct device *dev);
-struct cxl_switch_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
-						  unsigned int nr_targets);
+struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
+						unsigned int nr_targets);
 struct cxl_switch_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
 						    unsigned int nr_targets);
 int cxl_decoder_add(struct cxl_decoder *cxld, int *target_map);



* [PATCH 11/46] cxl/core: Define a 'struct cxl_endpoint_decoder' for tracking DPA resources
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (9 preceding siblings ...)
  2022-06-24  2:46 ` [PATCH 10/46] cxl/core: Define a 'struct cxl_root_decoder' for tracking CXL window resources Dan Williams
@ 2022-06-24  2:46 ` Dan Williams
  2022-06-28 16:55   ` Jonathan Cameron
  2022-06-24  2:46 ` [PATCH 12/46] cxl/mem: Convert partition-info to resources Dan Williams
                   ` (37 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  2:46 UTC (permalink / raw)
  To: linux-cxl; +Cc: Ben Widawsky, hch, alison.schofield, nvdimm, linux-pci, patches

Previously the target routing specifics of switch decoders and platform
CXL window resource tracking of root decoders were factored out of
'struct cxl_decoder'. While switch decoders translate from SPA to
downstream ports, endpoint decoders translate from SPA to DPA.

This patch, 3 of 3, adds a 'struct cxl_endpoint_decoder' that tracks an
endpoint-specific Device Physical Address (DPA) resource. For now this
just defines ->dpa_res; a follow-on patch will handle requesting DPA
resource ranges from a device-DPA resource tree.
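
The layering used across these three patches (a type-specific decoder
embedding the base decoder, which embeds the device) can be sketched in
userspace as follows. This is only an illustration under simplifying
assumptions: the struct names are hypothetical, the structs are reduced
to a couple of fields, and container_of() is re-implemented locally
without the kernel's type checking.

```c
#include <stddef.h>

/* Local, simplified version of the kernel's container_of() */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, stripped-down stand-ins for the kernel objects */
struct device {
	int id;
};

struct decoder {
	struct device dev;
};

struct endpoint_decoder {
	struct decoder cxld;		/* base decoder comes first */
	unsigned long long skip;	/* endpoint-specific state */
};

/* Recover the endpoint decoder from the device embedded two levels down */
static struct endpoint_decoder *to_endpoint_decoder(struct device *dev)
{
	return container_of(dev, struct endpoint_decoder, cxld.dev);
}
```

Because the release callback only receives a 'struct device *', each
decoder type needs its own conversion helper and its own ->release() so
that kfree() is handed the start of the outermost allocation.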

Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/hdm.c       |   12 +++++++++---
 drivers/cxl/core/port.c      |   36 +++++++++++++++++++++++++++---------
 drivers/cxl/cxl.h            |   15 ++++++++++++++-
 tools/testing/cxl/test/cxl.c |   11 +++++++++--
 4 files changed, 59 insertions(+), 15 deletions(-)

diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 2d1f3e6eebea..2223d151b61b 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -224,9 +224,15 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
 		int rc, target_count = cxlhdm->target_count;
 		struct cxl_decoder *cxld;
 
-		if (is_cxl_endpoint(port))
-			cxld = cxl_endpoint_decoder_alloc(port);
-		else {
+		if (is_cxl_endpoint(port)) {
+			struct cxl_endpoint_decoder *cxled;
+
+			cxled = cxl_endpoint_decoder_alloc(port);
+			if (IS_ERR(cxled))
+				cxld = ERR_CAST(cxled);
+			else
+				cxld = &cxled->cxld;
+		} else {
 			struct cxl_switch_decoder *cxlsd;
 
 			cxlsd = cxl_switch_decoder_alloc(port, target_count);
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index abf3455c4eff..b5f5fb9aa4b7 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -243,12 +243,12 @@ static void __cxl_decoder_release(struct cxl_decoder *cxld)
 	put_device(&port->dev);
 }
 
-static void cxl_decoder_release(struct device *dev)
+static void cxl_endpoint_decoder_release(struct device *dev)
 {
-	struct cxl_decoder *cxld = to_cxl_decoder(dev);
+	struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
 
-	__cxl_decoder_release(cxld);
-	kfree(cxld);
+	__cxl_decoder_release(&cxled->cxld);
+	kfree(cxled);
 }
 
 static void cxl_switch_decoder_release(struct device *dev)
@@ -278,7 +278,7 @@ static void cxl_root_decoder_release(struct device *dev)
 
 static const struct device_type cxl_decoder_endpoint_type = {
 	.name = "cxl_decoder_endpoint",
-	.release = cxl_decoder_release,
+	.release = cxl_endpoint_decoder_release,
 	.groups = cxl_decoder_endpoint_attribute_groups,
 };
 
@@ -320,6 +320,15 @@ struct cxl_decoder *to_cxl_decoder(struct device *dev)
 }
 EXPORT_SYMBOL_NS_GPL(to_cxl_decoder, CXL);
 
+struct cxl_endpoint_decoder *to_cxl_endpoint_decoder(struct device *dev)
+{
+	if (dev_WARN_ONCE(dev, !is_endpoint_decoder(dev),
+			  "not a cxl_endpoint_decoder device\n"))
+		return NULL;
+	return container_of(dev, struct cxl_endpoint_decoder, cxld.dev);
+}
+EXPORT_SYMBOL_NS_GPL(to_cxl_endpoint_decoder, CXL);
+
 static struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev)
 {
 	if (dev_WARN_ONCE(dev, !is_switch_decoder(dev),
@@ -1258,8 +1267,12 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
 			cxld = &cxlsd->cxld;
 		}
 	} else {
-		alloc = kzalloc(sizeof(*cxld), GFP_KERNEL);
-		cxld = alloc;
+		struct cxl_endpoint_decoder *cxled;
+
+		alloc = kzalloc(sizeof(*cxled), GFP_KERNEL);
+		cxled = alloc;
+		if (cxled)
+			cxld = &cxled->cxld;
 	}
 	if (!alloc)
 		return ERR_PTR(-ENOMEM);
@@ -1357,12 +1370,17 @@ EXPORT_SYMBOL_NS_GPL(cxl_switch_decoder_alloc, CXL);
  *
  * Return: A new cxl decoder to be registered by cxl_decoder_add()
  */
-struct cxl_decoder *cxl_endpoint_decoder_alloc(struct cxl_port *port)
+struct cxl_endpoint_decoder *cxl_endpoint_decoder_alloc(struct cxl_port *port)
 {
+	struct cxl_decoder *cxld;
+
 	if (!is_cxl_endpoint(port))
 		return ERR_PTR(-EINVAL);
 
-	return cxl_decoder_alloc(port, 0);
+	cxld = cxl_decoder_alloc(port, 0);
+	if (IS_ERR(cxld))
+		return ERR_CAST(cxld);
+	return to_cxl_endpoint_decoder(&cxld->dev);
 }
 EXPORT_SYMBOL_NS_GPL(cxl_endpoint_decoder_alloc, CXL);
 
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 6dd1e4c57a67..579f2d802396 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -239,6 +239,18 @@ struct cxl_decoder {
 	unsigned long flags;
 };
 
+/**
+ * struct cxl_endpoint_decoder - Endpoint / SPA to DPA decoder
+ * @cxld: base cxl_decoder object
+ * @dpa_res: actively claimed DPA span of this decoder
+ * @skip: offset into @dpa_res where @cxld.hpa_range maps
+ */
+struct cxl_endpoint_decoder {
+	struct cxl_decoder cxld;
+	struct resource *dpa_res;
+	resource_size_t skip;
+};
+
 /**
  * struct cxl_switch_decoder - Switch specific CXL HDM Decoder
  * @cxld: base cxl_decoder object
@@ -379,6 +391,7 @@ struct cxl_dport *cxl_find_dport_by_dev(struct cxl_port *port,
 
 struct cxl_decoder *to_cxl_decoder(struct device *dev);
 struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev);
+struct cxl_endpoint_decoder *to_cxl_endpoint_decoder(struct device *dev);
 bool is_root_decoder(struct device *dev);
 bool is_endpoint_decoder(struct device *dev);
 struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
@@ -386,7 +399,7 @@ struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
 struct cxl_switch_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
 						    unsigned int nr_targets);
 int cxl_decoder_add(struct cxl_decoder *cxld, int *target_map);
-struct cxl_decoder *cxl_endpoint_decoder_alloc(struct cxl_port *port);
+struct cxl_endpoint_decoder *cxl_endpoint_decoder_alloc(struct cxl_port *port);
 int cxl_decoder_add_locked(struct cxl_decoder *cxld, int *target_map);
 int cxl_decoder_autoremove(struct device *host, struct cxl_decoder *cxld);
 int cxl_endpoint_autoremove(struct cxl_memdev *cxlmd, struct cxl_port *endpoint);
diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index 68288354b419..f52a5dd69d36 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -459,8 +459,15 @@ static int mock_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
 				cxld = ERR_CAST(cxlsd);
 			else
 				cxld = &cxlsd->cxld;
-		} else
-			cxld = cxl_endpoint_decoder_alloc(port);
+		} else {
+			struct cxl_endpoint_decoder *cxled;
+
+			cxled = cxl_endpoint_decoder_alloc(port);
+			if (IS_ERR(cxled))
+				cxld = ERR_CAST(cxled);
+			else
+				cxld = &cxled->cxld;
+		}
 		if (IS_ERR(cxld)) {
 			dev_warn(&port->dev,
 				 "Failed to allocate the decoder\n");



* [PATCH 12/46] cxl/mem: Convert partition-info to resources
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (10 preceding siblings ...)
  2022-06-24  2:46 ` [PATCH 11/46] cxl/core: Define a 'struct cxl_endpoint_decoder' for tracking DPA resources Dan Williams
@ 2022-06-24  2:46 ` Dan Williams
  2022-06-28 17:02   ` Jonathan Cameron
  2022-06-24  2:46 ` [PATCH 13/46] cxl/hdm: Require all decoders to be enumerated Dan Williams
                   ` (36 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  2:46 UTC (permalink / raw)
  To: linux-cxl; +Cc: Ira Weiny, hch, alison.schofield, nvdimm, linux-pci, patches

To date the per-device-partition DPA range information has only been
used for enumeration purposes. In preparation for allocating regions
from available DPA capacity, convert those ranges into DPA-type resource
trees.

With resources and the new add_dpa_res() helper, some open-coded end
address calculations and debug prints can be cleaned up.

The 'cxlds->pmem_res' and 'cxlds->ram_res' resources are child resources
of the total-device DPA space and they in turn will host DPA allocations
from cxl_endpoint_decoder instances (tracked by cxled->dpa_res).
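
For reference, the partition layout that results (volatile capacity
first, persistent capacity immediately after it) can be modeled with a
small userspace sketch. 'struct span', make_span(), span_size() and
layout_partitions() are hypothetical names for illustration only; the
inclusive-end arithmetic is meant to mirror how resource_size() reports
zero for an empty partition.

```c
#include <stdint.h>

/* Hypothetical inclusive range, analogous to a resource's start/end */
struct span {
	uint64_t start;
	uint64_t end;
};

/* Size of an inclusive span; an empty span has end == start - 1 */
static uint64_t span_size(struct span s)
{
	return s.end - s.start + 1;
}

/* Build an inclusive span from a start address and a byte count */
static struct span make_span(uint64_t start, uint64_t size)
{
	struct span s = { .start = start, .end = start + size - 1 };

	return s;
}

/* ram occupies [0, ram_bytes), pmem follows directly after it */
static void layout_partitions(uint64_t ram_bytes, uint64_t pmem_bytes,
			      struct span *ram, struct span *pmem)
{
	*ram = make_span(0, ram_bytes);
	*pmem = make_span(ram_bytes, pmem_bytes);
}
```

With 0x1000 bytes of ram and 0x2000 bytes of pmem this yields ram at
[0x0, 0xfff] and pmem at [0x1000, 0x2fff]. A device with no volatile
capacity gets a zero-sized ram span and pmem starting at DPA 0.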

Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/mbox.c      |   78 ++++++++++++++++++++++++------------------
 drivers/cxl/core/memdev.c    |    4 +-
 drivers/cxl/cxlmem.h         |   10 +++--
 drivers/cxl/pci.c            |    2 +
 tools/testing/cxl/test/mem.c |    2 +
 5 files changed, 55 insertions(+), 41 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 54f434733b56..3fe113dd21ad 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -771,15 +771,6 @@ int cxl_dev_state_identify(struct cxl_dev_state *cxlds)
 	cxlds->partition_align_bytes =
 		le64_to_cpu(id.partition_align) * CXL_CAPACITY_MULTIPLIER;
 
-	dev_dbg(cxlds->dev,
-		"Identify Memory Device\n"
-		"     total_bytes = %#llx\n"
-		"     volatile_only_bytes = %#llx\n"
-		"     persistent_only_bytes = %#llx\n"
-		"     partition_align_bytes = %#llx\n",
-		cxlds->total_bytes, cxlds->volatile_only_bytes,
-		cxlds->persistent_only_bytes, cxlds->partition_align_bytes);
-
 	cxlds->lsa_size = le32_to_cpu(id.lsa_size);
 	memcpy(cxlds->firmware_version, id.fw_revision, sizeof(id.fw_revision));
 
@@ -787,42 +778,63 @@ int cxl_dev_state_identify(struct cxl_dev_state *cxlds)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_dev_state_identify, CXL);
 
-int cxl_mem_create_range_info(struct cxl_dev_state *cxlds)
+static int add_dpa_res(struct device *dev, struct resource *parent,
+		       struct resource *res, resource_size_t start,
+		       resource_size_t size, const char *type)
 {
 	int rc;
 
-	if (cxlds->partition_align_bytes == 0) {
-		cxlds->ram_range.start = 0;
-		cxlds->ram_range.end = cxlds->volatile_only_bytes - 1;
-		cxlds->pmem_range.start = cxlds->volatile_only_bytes;
-		cxlds->pmem_range.end = cxlds->volatile_only_bytes +
-				       cxlds->persistent_only_bytes - 1;
+	res->name = type;
+	res->start = start;
+	res->end = start + size - 1;
+	res->flags = IORESOURCE_MEM;
+	if (resource_size(res) == 0) {
+		dev_dbg(dev, "DPA(%s): no capacity\n", res->name);
 		return 0;
 	}
-
-	rc = cxl_mem_get_partition_info(cxlds);
+	rc = request_resource(parent, res);
 	if (rc) {
-		dev_err(cxlds->dev, "Failed to query partition information\n");
+		dev_err(dev, "DPA(%s): failed to track %pr (%d)\n", res->name,
+			res, rc);
 		return rc;
 	}
 
-	dev_dbg(cxlds->dev,
-		"Get Partition Info\n"
-		"     active_volatile_bytes = %#llx\n"
-		"     active_persistent_bytes = %#llx\n"
-		"     next_volatile_bytes = %#llx\n"
-		"     next_persistent_bytes = %#llx\n",
-		cxlds->active_volatile_bytes, cxlds->active_persistent_bytes,
-		cxlds->next_volatile_bytes, cxlds->next_persistent_bytes);
+	dev_dbg(dev, "DPA(%s): %pr\n", res->name, res);
 
-	cxlds->ram_range.start = 0;
-	cxlds->ram_range.end = cxlds->active_volatile_bytes - 1;
+	return 0;
+}
 
-	cxlds->pmem_range.start = cxlds->active_volatile_bytes;
-	cxlds->pmem_range.end =
-		cxlds->active_volatile_bytes + cxlds->active_persistent_bytes - 1;
+int cxl_mem_create_range_info(struct cxl_dev_state *cxlds)
+{
+	struct device *dev = cxlds->dev;
+	int rc;
 
-	return 0;
+	cxlds->dpa_res =
+		(struct resource)DEFINE_RES_MEM(0, cxlds->total_bytes);
+
+	if (cxlds->partition_align_bytes == 0) {
+		rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->ram_res, 0,
+				 cxlds->volatile_only_bytes, "ram");
+		if (rc)
+			return rc;
+		return add_dpa_res(dev, &cxlds->dpa_res, &cxlds->pmem_res,
+				   cxlds->volatile_only_bytes,
+				   cxlds->persistent_only_bytes, "pmem");
+	}
+
+	rc = cxl_mem_get_partition_info(cxlds);
+	if (rc) {
+		dev_err(dev, "Failed to query partition information\n");
+		return rc;
+	}
+
+	rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->ram_res, 0,
+			 cxlds->active_volatile_bytes, "ram");
+	if (rc)
+		return rc;
+	return add_dpa_res(dev, &cxlds->dpa_res, &cxlds->pmem_res,
+			   cxlds->active_volatile_bytes,
+			   cxlds->active_persistent_bytes, "pmem");
 }
 EXPORT_SYMBOL_NS_GPL(cxl_mem_create_range_info, CXL);
 
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index f7cdcd33504a..20ce488a7754 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -68,7 +68,7 @@ static ssize_t ram_size_show(struct device *dev, struct device_attribute *attr,
 {
 	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
 	struct cxl_dev_state *cxlds = cxlmd->cxlds;
-	unsigned long long len = range_len(&cxlds->ram_range);
+	unsigned long long len = resource_size(&cxlds->ram_res);
 
 	return sysfs_emit(buf, "%#llx\n", len);
 }
@@ -81,7 +81,7 @@ static ssize_t pmem_size_show(struct device *dev, struct device_attribute *attr,
 {
 	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
 	struct cxl_dev_state *cxlds = cxlmd->cxlds;
-	unsigned long long len = range_len(&cxlds->pmem_range);
+	unsigned long long len = resource_size(&cxlds->pmem_res);
 
 	return sysfs_emit(buf, "%#llx\n", len);
 }
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 7df0b053373a..a9609d40643f 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -178,8 +178,9 @@ struct cxl_endpoint_dvsec_info {
  * @firmware_version: Firmware version for the memory device.
  * @enabled_cmds: Hardware commands found enabled in CEL.
  * @exclusive_cmds: Commands that are kernel-internal only
- * @pmem_range: Active Persistent memory capacity configuration
- * @ram_range: Active Volatile memory capacity configuration
+ * @dpa_res: Overall DPA resource tree for the device
+ * @pmem_res: Active Persistent memory capacity configuration
+ * @ram_res: Active Volatile memory capacity configuration
  * @total_bytes: sum of all possible capacities
  * @volatile_only_bytes: hard volatile capacity
  * @persistent_only_bytes: hard persistent capacity
@@ -209,8 +210,9 @@ struct cxl_dev_state {
 	DECLARE_BITMAP(enabled_cmds, CXL_MEM_COMMAND_ID_MAX);
 	DECLARE_BITMAP(exclusive_cmds, CXL_MEM_COMMAND_ID_MAX);
 
-	struct range pmem_range;
-	struct range ram_range;
+	struct resource dpa_res;
+	struct resource pmem_res;
+	struct resource ram_res;
 	u64 total_bytes;
 	u64 volatile_only_bytes;
 	u64 persistent_only_bytes;
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 5a0ae46d4989..eeff9599acda 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -454,7 +454,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (IS_ERR(cxlmd))
 		return PTR_ERR(cxlmd);
 
-	if (range_len(&cxlds->pmem_range) && IS_ENABLED(CONFIG_CXL_PMEM))
+	if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
 		rc = devm_cxl_add_nvdimm(&pdev->dev, cxlmd);
 
 	return rc;
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 6b9239b2afd4..b81c90715fe8 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -282,7 +282,7 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
 	if (IS_ERR(cxlmd))
 		return PTR_ERR(cxlmd);
 
-	if (range_len(&cxlds->pmem_range) && IS_ENABLED(CONFIG_CXL_PMEM))
+	if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
 		rc = devm_cxl_add_nvdimm(dev, cxlmd);
 
 	return 0;



* [PATCH 13/46] cxl/hdm: Require all decoders to be enumerated
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (11 preceding siblings ...)
  2022-06-24  2:46 ` [PATCH 12/46] cxl/mem: Convert partition-info to resources Dan Williams
@ 2022-06-24  2:46 ` Dan Williams
  2022-06-28 17:04   ` Jonathan Cameron
  2022-06-24  2:46 ` [PATCH 14/46] cxl/hdm: Enumerate allocated DPA Dan Williams
                   ` (35 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  2:46 UTC (permalink / raw)
  To: linux-cxl; +Cc: Ben Widawsky, hch, alison.schofield, nvdimm, linux-pci, patches

From: Ben Widawsky <bwidawsk@kernel.org>

In preparation for region provisioning all device decoders need to be
enumerated since DPA allocations are calculated by summing the
capacities of all decoders in a set. I.e. the programming for decoder[N]
depends on the state of decoder[N-1], so skipping over decoders that
fail to initialize prevents accurate DPA accounting.

Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
[djbw: reword changelog]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/hdm.c |   12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 2223d151b61b..c940a4911fee 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -199,7 +199,7 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
 {
 	void __iomem *hdm = cxlhdm->regs.hdm_decoder;
 	struct cxl_port *port = cxlhdm->port;
-	int i, committed, failed;
+	int i, committed;
 	u32 ctrl;
 
 	/*
@@ -219,7 +219,7 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
 	if (committed != cxlhdm->decoder_count)
 		msleep(20);
 
-	for (i = 0, failed = 0; i < cxlhdm->decoder_count; i++) {
+	for (i = 0; i < cxlhdm->decoder_count; i++) {
 		int target_map[CXL_DECODER_MAX_INTERLEAVE] = { 0 };
 		int rc, target_count = cxlhdm->target_count;
 		struct cxl_decoder *cxld;
@@ -250,8 +250,7 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
 		rc = init_hdm_decoder(port, cxld, target_map, hdm, i);
 		if (rc) {
 			put_device(&cxld->dev);
-			failed++;
-			continue;
+			return rc;
 		}
 		rc = add_hdm_decoder(port, cxld, target_map);
 		if (rc) {
@@ -261,11 +260,6 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
 		}
 	}
 
-	if (failed == cxlhdm->decoder_count) {
-		dev_err(&port->dev, "No valid decoders found\n");
-		return -ENXIO;
-	}
-
 	return 0;
 }
 EXPORT_SYMBOL_NS_GPL(devm_cxl_enumerate_decoders, CXL);



* [PATCH 14/46] cxl/hdm: Enumerate allocated DPA
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (12 preceding siblings ...)
  2022-06-24  2:46 ` [PATCH 13/46] cxl/hdm: Require all decoders to be enumerated Dan Williams
@ 2022-06-24  2:46 ` Dan Williams
  2022-06-29 14:43   ` Jonathan Cameron
  2022-06-24  2:46 ` [PATCH 15/46] cxl/Documentation: List attribute permissions Dan Williams
                   ` (34 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  2:46 UTC (permalink / raw)
  To: linux-cxl; +Cc: Ben Widawsky, hch, alison.schofield, nvdimm, linux-pci, patches

In preparation for provisioning CXL regions, add accounting for the DPA
space consumed by existing regions / decoders. Recall, a CXL region is a
memory range comprised of one or more endpoint devices contributing a
mapping of their DPA into HPA space through a decoder.

Record the DPA ranges covered by committed decoders at initial probe of
endpoint ports relative to a per-device resource tree of the DPA type
(pmem or volatile-ram).

The cxl_dpa_rwsem semaphore is introduced to globally synchronize DPA
state across all endpoints and their decoders at once. The vast majority
of DPA operations are reads as region creation is expected to be as rare
as disk partitioning and volume creation. The device_lock() is
specifically avoided for this synchronization out of concern for
entangling with sysfs attribute removal.
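
The per-decoder arithmetic behind this accounting can be sketched in
userspace as below. This is a simplified model, not the driver code:
'struct hdm_state', 'struct dpa_span' and account_dpa() are hypothetical
names, locking and the resource tree are left out, and a plain -1 stands
in for the kernel error code. It assumes, as the patch does, that each
device contributes size / interleave_ways bytes of DPA and that decoders
are visited in index order with a running dpa_base.

```c
#include <stdint.h>

/* Hypothetical snapshot of one committed (or idle) endpoint decoder */
struct hdm_state {
	uint64_t size;		/* HPA size programmed in the decoder */
	uint64_t skip;		/* DPA skipped ahead of this decoder */
	unsigned int ways;	/* interleave ways */
	int committed;		/* non-zero when the decoder is committed */
};

/* Hypothetical result: the DPA span claimed by the decoder */
struct dpa_span {
	uint64_t base;
	uint64_t len;
};

/*
 * Compute the DPA span for decoder @d and advance the running
 * @dpa_base past it. Returns 0 on success, -1 when the committed size
 * does not divide evenly across the interleave ways.
 */
static int account_dpa(const struct hdm_state *d, uint64_t *dpa_base,
		       struct dpa_span *out)
{
	out->base = 0;
	out->len = 0;

	/* Uncommitted decoders consume no DPA */
	if (!d->committed)
		return 0;

	if (d->ways == 0 || d->size % d->ways)
		return -1;

	out->len = d->size / d->ways;
	out->base = *dpa_base + d->skip;
	*dpa_base += out->len + d->skip;

	return 0;
}
```

For example, a 2-way decoder of size 0x4000 followed by a 1-way decoder
of size 0x1000 with a skip of 0x1000 claims [0x0, 0x1fff] and then
[0x3000, 0x3fff], leaving the running base at 0x4000. This is also why
the previous patch requires every decoder to be enumerated: the base
for decoder[N] is only known once decoder[N-1] has been accounted.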

Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/hdm.c |  148 ++++++++++++++++++++++++++++++++++++++++++++----
 drivers/cxl/cxl.h      |    2 +
 drivers/cxl/cxlmem.h   |   13 ++++
 3 files changed, 152 insertions(+), 11 deletions(-)

diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index c940a4911fee..daae6e533146 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -7,6 +7,8 @@
 #include "cxlmem.h"
 #include "core.h"
 
+static DECLARE_RWSEM(cxl_dpa_rwsem);
+
 /**
  * DOC: cxl core hdm
  *
@@ -128,10 +130,108 @@ struct cxl_hdm *devm_cxl_setup_hdm(struct cxl_port *port)
 }
 EXPORT_SYMBOL_NS_GPL(devm_cxl_setup_hdm, CXL);
 
+/*
+ * Must be called in a context that synchronizes against this decoder's
+ * port ->remove() callback (like an endpoint decoder sysfs attribute)
+ */
+static void cxl_dpa_release(void *cxled);
+static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled, bool remove_action)
+{
+	struct cxl_port *port = cxled_to_port(cxled);
+	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+	struct resource *res = cxled->dpa_res;
+
+	lockdep_assert_held_write(&cxl_dpa_rwsem);
+
+	if (remove_action)
+		devm_remove_action(&port->dev, cxl_dpa_release, cxled);
+
+	if (cxled->skip)
+		__release_region(&cxlds->dpa_res, res->start - cxled->skip,
+				 cxled->skip);
+	cxled->skip = 0;
+	__release_region(&cxlds->dpa_res, res->start, resource_size(res));
+	cxled->dpa_res = NULL;
+}
+
+static void cxl_dpa_release(void *cxled)
+{
+	down_write(&cxl_dpa_rwsem);
+	__cxl_dpa_release(cxled, false);
+	up_write(&cxl_dpa_rwsem);
+}
+
+static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
+			     resource_size_t base, resource_size_t len,
+			     resource_size_t skip)
+{
+	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+	struct cxl_port *port = cxled_to_port(cxled);
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+	struct device *dev = &port->dev;
+	struct resource *res;
+
+	lockdep_assert_held_write(&cxl_dpa_rwsem);
+
+	if (!len)
+		return 0;
+
+	if (cxled->dpa_res) {
+		dev_dbg(dev, "decoder%d.%d: existing allocation %pr assigned\n",
+			port->id, cxled->cxld.id, cxled->dpa_res);
+		return -EBUSY;
+	}
+
+	if (skip) {
+		res = __request_region(&cxlds->dpa_res, base - skip, skip,
+				       dev_name(dev), 0);
+		if (!res) {
+			dev_dbg(dev,
+				"decoder%d.%d: failed to reserve skip space\n",
+				port->id, cxled->cxld.id);
+			return -EBUSY;
+		}
+	}
+	res = __request_region(&cxlds->dpa_res, base, len, dev_name(dev), 0);
+	if (!res) {
+		dev_dbg(dev, "decoder%d.%d: failed to reserve allocation\n",
+			port->id, cxled->cxld.id);
+		if (skip)
+			__release_region(&cxlds->dpa_res, base - skip, skip);
+		return -EBUSY;
+	}
+	cxled->dpa_res = res;
+	cxled->skip = skip;
+
+	return 0;
+}
+
+static int cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
+			   resource_size_t base, resource_size_t len,
+			   resource_size_t skip)
+{
+	struct cxl_port *port = cxled_to_port(cxled);
+	int rc;
+
+	down_write(&cxl_dpa_rwsem);
+	rc = __cxl_dpa_reserve(cxled, base, len, skip);
+	up_write(&cxl_dpa_rwsem);
+
+	if (rc)
+		return rc;
+
+	return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled);
+}
+
 static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
-			    int *target_map, void __iomem *hdm, int which)
+			    int *target_map, void __iomem *hdm, int which,
+			    u64 *dpa_base)
 {
-	u64 size, base;
+	struct cxl_endpoint_decoder *cxled = NULL;
+	u64 size, base, skip, dpa_size;
+	bool committed;
+	u32 remainder;
 	int i, rc;
 	u32 ctrl;
 	union {
@@ -139,11 +239,15 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
 		unsigned char target_id[8];
 	} target_list;
 
+	if (is_endpoint_decoder(&cxld->dev))
+		cxled = to_cxl_endpoint_decoder(&cxld->dev);
+
 	ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(which));
 	base = ioread64_hi_lo(hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(which));
 	size = ioread64_hi_lo(hdm + CXL_HDM_DECODER0_SIZE_LOW_OFFSET(which));
+	committed = !!(ctrl & CXL_HDM_DECODER0_CTRL_COMMITTED);
 
-	if (!(ctrl & CXL_HDM_DECODER0_CTRL_COMMITTED))
+	if (!committed)
 		size = 0;
 	if (base == U64_MAX || size == U64_MAX) {
 		dev_warn(&port->dev, "decoder%d.%d: Invalid resource range\n",
@@ -156,8 +260,8 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
 		.end = base + size - 1,
 	};
 
-	/* switch decoders are always enabled if committed */
-	if (ctrl & CXL_HDM_DECODER0_CTRL_COMMITTED) {
+	/* decoders are enabled if committed */
+	if (committed) {
 		cxld->flags |= CXL_DECODER_F_ENABLE;
 		if (ctrl & CXL_HDM_DECODER0_CTRL_LOCK)
 			cxld->flags |= CXL_DECODER_F_LOCK;
@@ -180,14 +284,35 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
 	else
 		cxld->target_type = CXL_DECODER_ACCELERATOR;
 
-	if (is_endpoint_decoder(&cxld->dev))
+	if (!cxled) {
+		target_list.value =
+			ioread64_hi_lo(hdm + CXL_HDM_DECODER0_TL_LOW(which));
+		for (i = 0; i < cxld->interleave_ways; i++)
+			target_map[i] = target_list.target_id[i];
+
 		return 0;
+	}
 
-	target_list.value =
-		ioread64_hi_lo(hdm + CXL_HDM_DECODER0_TL_LOW(which));
-	for (i = 0; i < cxld->interleave_ways; i++)
-		target_map[i] = target_list.target_id[i];
+	if (!committed)
+		return 0;
 
+	dpa_size = div_u64_rem(size, cxld->interleave_ways, &remainder);
+	if (remainder) {
+		dev_err(&port->dev,
+			"decoder%d.%d: invalid committed configuration size: %#llx ways: %d\n",
+			port->id, cxld->id, size, cxld->interleave_ways);
+		return -ENXIO;
+	}
+	skip = ioread64_hi_lo(hdm + CXL_HDM_DECODER0_SKIP_LOW(which));
+	rc = cxl_dpa_reserve(cxled, *dpa_base + skip, dpa_size, skip);
+	if (rc) {
+		dev_err(&port->dev,
+			"decoder%d.%d: Failed to reserve DPA range %#llx - %#llx (%d)\n",
+			port->id, cxld->id, *dpa_base,
+			*dpa_base + dpa_size + skip - 1, rc);
+		return rc;
+	}
+	*dpa_base += dpa_size + skip;
 	return 0;
 }
 
@@ -200,6 +325,7 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
 	void __iomem *hdm = cxlhdm->regs.hdm_decoder;
 	struct cxl_port *port = cxlhdm->port;
 	int i, committed;
+	u64 dpa_base = 0;
 	u32 ctrl;
 
 	/*
@@ -247,7 +373,7 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
 			return PTR_ERR(cxld);
 		}
 
-		rc = init_hdm_decoder(port, cxld, target_map, hdm, i);
+		rc = init_hdm_decoder(port, cxld, target_map, hdm, i, &dpa_base);
 		if (rc) {
 			put_device(&cxld->dev);
 			return rc;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 579f2d802396..6832d6d70548 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -56,6 +56,8 @@
 #define   CXL_HDM_DECODER0_CTRL_TYPE BIT(12)
 #define CXL_HDM_DECODER0_TL_LOW(i) (0x20 * (i) + 0x24)
 #define CXL_HDM_DECODER0_TL_HIGH(i) (0x20 * (i) + 0x28)
+#define CXL_HDM_DECODER0_SKIP_LOW(i) CXL_HDM_DECODER0_TL_LOW(i)
+#define CXL_HDM_DECODER0_SKIP_HIGH(i) CXL_HDM_DECODER0_TL_HIGH(i)
 
 static inline int cxl_hdm_decoder_count(u32 cap_hdr)
 {
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index a9609d40643f..b4e5ed9eabc9 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -50,6 +50,19 @@ static inline struct cxl_memdev *to_cxl_memdev(struct device *dev)
 	return container_of(dev, struct cxl_memdev, dev);
 }
 
+static inline struct cxl_port *cxled_to_port(struct cxl_endpoint_decoder *cxled)
+{
+	return to_cxl_port(cxled->cxld.dev.parent);
+}
+
+static inline struct cxl_memdev *
+cxled_to_memdev(struct cxl_endpoint_decoder *cxled)
+{
+	struct cxl_port *port = to_cxl_port(cxled->cxld.dev.parent);
+
+	return to_cxl_memdev(port->uport);
+}
+
 bool is_cxl_memdev(struct device *dev);
 static inline bool is_cxl_endpoint(struct cxl_port *port)
 {



* [PATCH 15/46] cxl/Documentation: List attribute permissions
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (13 preceding siblings ...)
  2022-06-24  2:46 ` [PATCH 14/46] cxl/hdm: Enumerate allocated DPA Dan Williams
@ 2022-06-24  2:46 ` Dan Williams
  2022-06-28  3:16   ` Alison Schofield
  2022-06-29 14:59   ` Jonathan Cameron
  2022-06-24  2:46 ` [PATCH 16/46] cxl/hdm: Add 'mode' attribute to decoder objects Dan Williams
                   ` (33 subsequent siblings)
  48 siblings, 2 replies; 157+ messages in thread
From: Dan Williams @ 2022-06-24  2:46 UTC (permalink / raw)
  To: linux-cxl; +Cc: Alison Schofield, hch, nvdimm, linux-pci, patches

Clarify the access permission of CXL sysfs attributes in the
documentation to help development of userspace tooling.

Reported-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Documentation/ABI/testing/sysfs-bus-cxl |   81 ++++++++++++++++---------------
 1 file changed, 41 insertions(+), 40 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 7c2b846521f3..1fd5984b6158 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -57,28 +57,28 @@ Date:		June, 2021
 KernelVersion:	v5.14
 Contact:	linux-cxl@vger.kernel.org
 Description:
-		CXL device objects export the devtype attribute which mirrors
-		the same value communicated in the DEVTYPE environment variable
-		for uevents for devices on the "cxl" bus.
+		(RO) CXL device objects export the devtype attribute which
+		mirrors the same value communicated in the DEVTYPE environment
+		variable for uevents for devices on the "cxl" bus.
 
 What:		/sys/bus/cxl/devices/*/modalias
 Date:		December, 2021
 KernelVersion:	v5.18
 Contact:	linux-cxl@vger.kernel.org
 Description:
-		CXL device objects export the modalias attribute which mirrors
-		the same value communicated in the MODALIAS environment variable
-		for uevents for devices on the "cxl" bus.
+		(RO) CXL device objects export the modalias attribute which
+		mirrors the same value communicated in the MODALIAS environment
+		variable for uevents for devices on the "cxl" bus.
 
 What:		/sys/bus/cxl/devices/portX/uport
 Date:		June, 2021
 KernelVersion:	v5.14
 Contact:	linux-cxl@vger.kernel.org
 Description:
-		CXL port objects are enumerated from either a platform firmware
-		device (ACPI0017 and ACPI0016) or PCIe switch upstream port with
-		CXL component registers. The 'uport' symlink connects the CXL
-		portX object to the device that published the CXL port
+		(RO) CXL port objects are enumerated from either a platform
+		firmware device (ACPI0017 and ACPI0016) or PCIe switch upstream
+		port with CXL component registers. The 'uport' symlink connects
+		the CXL portX object to the device that published the CXL port
 		capability.
 
 What:		/sys/bus/cxl/devices/portX/dportY
@@ -86,20 +86,20 @@ Date:		June, 2021
 KernelVersion:	v5.14
 Contact:	linux-cxl@vger.kernel.org
 Description:
-		CXL port objects are enumerated from either a platform firmware
-		device (ACPI0017 and ACPI0016) or PCIe switch upstream port with
-		CXL component registers. The 'dportY' symlink identifies one or
-		more downstream ports that the upstream port may target in its
-		decode of CXL memory resources.  The 'Y' integer reflects the
-		hardware port unique-id used in the hardware decoder target
-		list.
+		(RO) CXL port objects are enumerated from either a platform
+		firmware device (ACPI0017 and ACPI0016) or PCIe switch upstream
+		port with CXL component registers. The 'dportY' symlink
+		identifies one or more downstream ports that the upstream port
+		may target in its decode of CXL memory resources.  The 'Y'
+		integer reflects the hardware port unique-id used in the
+		hardware decoder target list.
 
 What:		/sys/bus/cxl/devices/decoderX.Y
 Date:		June, 2021
 KernelVersion:	v5.14
 Contact:	linux-cxl@vger.kernel.org
 Description:
-		CXL decoder objects are enumerated from either a platform
+		(RO) CXL decoder objects are enumerated from either a platform
 		firmware description, or a CXL HDM decoder register set in a
 		PCIe device (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder
 		Capability Structure). The 'X' in decoderX.Y represents the
@@ -111,42 +111,43 @@ Date:		June, 2021
 KernelVersion:	v5.14
 Contact:	linux-cxl@vger.kernel.org
 Description:
-		The 'start' and 'size' attributes together convey the physical
-		address base and number of bytes mapped in the decoder's decode
-		window. For decoders of devtype "cxl_decoder_root" the address
-		range is fixed. For decoders of devtype "cxl_decoder_switch" the
-		address is bounded by the decode range of the cxl_port ancestor
-		of the decoder's cxl_port, and dynamically updates based on the
-		active memory regions in that address space.
+		(RO) The 'start' and 'size' attributes together convey the
+		physical address base and number of bytes mapped in the
+		decoder's decode window. For decoders of devtype
+		"cxl_decoder_root" the address range is fixed. For decoders of
+		devtype "cxl_decoder_switch" the address is bounded by the
+		decode range of the cxl_port ancestor of the decoder's cxl_port,
+		and dynamically updates based on the active memory regions in
+		that address space.
 
 What:		/sys/bus/cxl/devices/decoderX.Y/locked
 Date:		June, 2021
 KernelVersion:	v5.14
 Contact:	linux-cxl@vger.kernel.org
 Description:
-		CXL HDM decoders have the capability to lock the configuration
-		until the next device reset. For decoders of devtype
-		"cxl_decoder_root" there is no standard facility to unlock them.
-		For decoders of devtype "cxl_decoder_switch" a secondary bus
-		reset, of the PCIe bridge that provides the bus for this
-		decoders uport, unlocks / resets the decoder.
+		(RO) CXL HDM decoders have the capability to lock the
+		configuration until the next device reset. For decoders of
+		devtype "cxl_decoder_root" there is no standard facility to
+		unlock them.  For decoders of devtype "cxl_decoder_switch" a
+		secondary bus reset, of the PCIe bridge that provides the bus
+		for this decoder's uport, unlocks / resets the decoder.
 
 What:		/sys/bus/cxl/devices/decoderX.Y/target_list
 Date:		June, 2021
 KernelVersion:	v5.14
 Contact:	linux-cxl@vger.kernel.org
 Description:
-		Display a comma separated list of the current decoder target
-		configuration. The list is ordered by the current configured
-		interleave order of the decoder's dport instances. Each entry in
-		the list is a dport id.
+		(RO) Display a comma separated list of the current decoder
+		target configuration. The list is ordered by the current
+		configured interleave order of the decoder's dport instances.
+		Each entry in the list is a dport id.
 
 What:		/sys/bus/cxl/devices/decoderX.Y/cap_{pmem,ram,type2,type3}
 Date:		June, 2021
 KernelVersion:	v5.14
 Contact:	linux-cxl@vger.kernel.org
 Description:
-		When a CXL decoder is of devtype "cxl_decoder_root", it
+		(RO) When a CXL decoder is of devtype "cxl_decoder_root", it
 		represents a fixed memory window identified by platform
 		firmware. A fixed window may only support a subset of memory
 		types. The 'cap_*' attributes indicate whether persistent
@@ -158,8 +159,8 @@ Date:		June, 2021
 KernelVersion:	v5.14
 Contact:	linux-cxl@vger.kernel.org
 Description:
-		When a CXL decoder is of devtype "cxl_decoder_switch", it can
-		optionally decode either accelerator memory (type-2) or expander
-		memory (type-3). The 'target_type' attribute indicates the
-		current setting which may dynamically change based on what
+		(RO) When a CXL decoder is of devtype "cxl_decoder_switch", it
+		can optionally decode either accelerator memory (type-2) or
+		expander memory (type-3). The 'target_type' attribute indicates
+		the current setting which may dynamically change based on what
 		memory regions are activated in this decode hierarchy.



* [PATCH 16/46] cxl/hdm: Add 'mode' attribute to decoder objects
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (14 preceding siblings ...)
  2022-06-24  2:46 ` [PATCH 15/46] cxl/Documentation: List attribute permissions Dan Williams
@ 2022-06-24  2:46 ` Dan Williams
  2022-06-29 15:28   ` Jonathan Cameron
  2022-06-24  2:47 ` [PATCH 17/46] cxl/hdm: Track next decoder to allocate Dan Williams
                   ` (32 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  2:46 UTC (permalink / raw)
  To: linux-cxl; +Cc: hch, alison.schofield, nvdimm, linux-pci, patches

Recall that the Device Physical Address (DPA) space of a CXL Memory
Expander is potentially partitioned into a volatile and a persistent
portion. A decoder maps a Host Physical Address (HPA) range to a DPA
range, and that translation depends on the settings of all
lower-instance-number decoders that precede it.

In preparation for allowing dynamic provisioning of regions, decoders
need an ABI to indicate which DPA partition a decoder targets. This ABI
needs to be prepared for the possibility that some other agent committed
and locked a decoder that spans the partition boundary.

Add 'decoderX.Y/mode' to endpoint decoders that indicates which
partition 'ram' / 'pmem' the decoder targets, or 'mixed' if the decoder
currently spans the partition boundary.
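
As an illustration only (not part of this patch), the classification
described above can be sketched in userspace C. The names dpa_range,
range_contains(), and classify() are hypothetical stand-ins for the
kernel's struct resource, resource_contains(), and the logic added to
__cxl_dpa_reserve(); ranges are assumed to be inclusive, as kernel
resources are.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct resource: an inclusive DPA range. */
struct dpa_range {
	uint64_t start;
	uint64_t end; /* inclusive */
};

enum decoder_mode { MODE_NONE, MODE_RAM, MODE_PMEM, MODE_MIXED };

/* True when @inner lies entirely within @outer. */
static bool range_contains(const struct dpa_range *outer,
			   const struct dpa_range *inner)
{
	return outer->start <= inner->start && outer->end >= inner->end;
}

/*
 * Classify an allocation against the two partitions. Anything that is
 * not fully inside one partition is reported as mixed.
 */
static enum decoder_mode classify(const struct dpa_range *ram,
				  const struct dpa_range *pmem,
				  const struct dpa_range *alloc)
{
	assert(alloc->start <= alloc->end);

	if (range_contains(pmem, alloc))
		return MODE_PMEM;
	if (range_contains(ram, alloc))
		return MODE_RAM;
	return MODE_MIXED;
}
```

With a ram partition of [0x0, 0x3fffffff] and a pmem partition of
[0x40000000, 0x7fffffff], an allocation of [0x30000000, 0x4fffffff]
straddles the boundary and classifies as mixed.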

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Documentation/ABI/testing/sysfs-bus-cxl |   16 ++++++++++++++++
 drivers/cxl/core/hdm.c                  |   10 ++++++++++
 drivers/cxl/core/port.c                 |   20 ++++++++++++++++++++
 drivers/cxl/cxl.h                       |    9 +++++++++
 4 files changed, 55 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 1fd5984b6158..091459216e11 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -164,3 +164,19 @@ Description:
 		expander memory (type-3). The 'target_type' attribute indicates
 		the current setting which may dynamically change based on what
 		memory regions are activated in this decode hierarchy.
+
+
+What:		/sys/bus/cxl/devices/decoderX.Y/mode
+Date:		May, 2022
+KernelVersion:	v5.20
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(RO) When a CXL decoder is of devtype "cxl_decoder_endpoint" it
+		translates from a host physical address range, to a device local
+		address range. Device-local address ranges are further split
+		into a 'ram' (volatile memory) range and 'pmem' (persistent
+		memory) range. The 'mode' attribute emits one of 'ram', 'pmem',
+		'mixed', or 'none'. The 'mixed' indication is for error cases
+		when a decoder straddles the volatile/persistent partition
+		boundary, and 'none' indicates the decoder is not actively
+		decoding, or no DPA allocation policy has been set.
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index daae6e533146..3f929231b822 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -204,6 +204,16 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
 	cxled->dpa_res = res;
 	cxled->skip = skip;
 
+	if (resource_contains(&cxlds->pmem_res, res))
+		cxled->mode = CXL_DECODER_PMEM;
+	else if (resource_contains(&cxlds->ram_res, res))
+		cxled->mode = CXL_DECODER_RAM;
+	else {
+		dev_dbg(dev, "decoder%d.%d: %pr mixed\n", port->id,
+			cxled->cxld.id, cxled->dpa_res);
+		cxled->mode = CXL_DECODER_MIXED;
+	}
+
 	return 0;
 }
 
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index b5f5fb9aa4b7..9d632c8c580b 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -171,6 +171,25 @@ static ssize_t target_list_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(target_list);
 
+static ssize_t mode_show(struct device *dev, struct device_attribute *attr,
+			 char *buf)
+{
+	struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
+
+	switch (cxled->mode) {
+	case CXL_DECODER_RAM:
+		return sysfs_emit(buf, "ram\n");
+	case CXL_DECODER_PMEM:
+		return sysfs_emit(buf, "pmem\n");
+	case CXL_DECODER_NONE:
+		return sysfs_emit(buf, "none\n");
+	case CXL_DECODER_MIXED:
+	default:
+		return sysfs_emit(buf, "mixed\n");
+	}
+}
+static DEVICE_ATTR_RO(mode);
+
 static struct attribute *cxl_decoder_base_attrs[] = {
 	&dev_attr_start.attr,
 	&dev_attr_size.attr,
@@ -221,6 +240,7 @@ static const struct attribute_group *cxl_decoder_switch_attribute_groups[] = {
 
 static struct attribute *cxl_decoder_endpoint_attrs[] = {
 	&dev_attr_target_type.attr,
+	&dev_attr_mode.attr,
 	NULL,
 };
 
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 6832d6d70548..aa223166f7ef 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -241,16 +241,25 @@ struct cxl_decoder {
 	unsigned long flags;
 };
 
+enum cxl_decoder_mode {
+	CXL_DECODER_NONE,
+	CXL_DECODER_RAM,
+	CXL_DECODER_PMEM,
+	CXL_DECODER_MIXED,
+};
+
 /**
  * struct cxl_endpoint_decoder - Endpoint  / SPA to DPA decoder
  * @cxld: base cxl_decoder_object
  * @dpa_res: actively claimed DPA span of this decoder
  * @skip: offset into @dpa_res where @cxld.hpa_range maps
+ * @mode: which memory type / access-mode-partition this decoder targets
  */
 struct cxl_endpoint_decoder {
 	struct cxl_decoder cxld;
 	struct resource *dpa_res;
 	resource_size_t skip;
+	enum cxl_decoder_mode mode;
 };
 
 /**



* [PATCH 17/46] cxl/hdm: Track next decoder to allocate
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (15 preceding siblings ...)
  2022-06-24  2:46 ` [PATCH 16/46] cxl/hdm: Add 'mode' attribute to decoder objects Dan Williams
@ 2022-06-24  2:47 ` Dan Williams
  2022-06-29 15:31   ` Jonathan Cameron
  2022-06-24  2:47 ` [PATCH 18/46] cxl/hdm: Add support for allocating DPA to an endpoint decoder Dan Williams
                   ` (31 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  2:47 UTC (permalink / raw)
  To: linux-cxl; +Cc: hch, alison.schofield, nvdimm, linux-pci, patches

The CXL specification enforces that endpoint decoders are committed in
hw instance id order. In preparation for adding dynamic DPA allocation,
record the hw instance id in endpoint decoders, and enforce allocations
to occur in hw instance id order.
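
As an illustration only (not part of this patch), the ordering rule can
be modeled in userspace C with a single cursor. The names port_cursor,
cursor_reserve(), and cursor_release() are hypothetical; the cursor
plays the role of port->dpa_end, which starts at -1. The release-side
check is an assumption of this sketch that mirrors the check
cxl_dpa_free() performs later in the series; in this patch
__cxl_dpa_release() decrements the cursor unconditionally.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of the per-port allocation cursor. */
struct port_cursor {
	int dpa_end; /* highest allocated decoder id, -1 when none */
};

static void cursor_init(struct port_cursor *p)
{
	p->dpa_end = -1;
}

/* Only the decoder immediately after the cursor may allocate. */
static int cursor_reserve(struct port_cursor *p, int decoder_id)
{
	assert(decoder_id >= 0);

	if (p->dpa_end + 1 != decoder_id)
		return -EBUSY;
	p->dpa_end++;
	return 0;
}

/* Only the decoder at the cursor may free (sketch assumption). */
static int cursor_release(struct port_cursor *p, int decoder_id)
{
	assert(decoder_id >= 0);

	if (p->dpa_end != decoder_id)
		return -EBUSY;
	p->dpa_end--;
	return 0;
}
```

In this model decoder 1 cannot reserve before decoder 0 has, and
decoder 0 cannot release while decoder 1 still holds an allocation.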

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/hdm.c  |   14 ++++++++++++++
 drivers/cxl/core/port.c |    1 +
 drivers/cxl/cxl.h       |    2 ++
 3 files changed, 17 insertions(+)

diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 3f929231b822..8805afe63ebf 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -153,6 +153,7 @@ static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled, bool remove_ac
 	cxled->skip = 0;
 	__release_region(&cxlds->dpa_res, res->start, resource_size(res));
 	cxled->dpa_res = NULL;
+	port->dpa_end--;
 }
 
 static void cxl_dpa_release(void *cxled)
@@ -183,6 +184,18 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
 		return -EBUSY;
 	}
 
+	if (port->dpa_end + 1 != cxled->cxld.id) {
+		/*
+		 * Assumes alloc and commit order is always in hardware instance
+		 * order per expectations from 8.2.5.12.20 Committing Decoder
+		 * Programming, which requires decoder[m] to be committed before
+		 * the decoder[m+1] commit starts.
+		 */
+		dev_dbg(dev, "decoder%d.%d: expected decoder%d.%d\n", port->id,
+			cxled->cxld.id, port->id, port->dpa_end + 1);
+		return -EBUSY;
+	}
+
 	if (skip) {
 		res = __request_region(&cxlds->dpa_res, base - skip, skip,
 				       dev_name(dev), 0);
@@ -213,6 +226,7 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
 			cxled->cxld.id, cxled->dpa_res);
 		cxled->mode = CXL_DECODER_MIXED;
 	}
+	port->dpa_end++;
 
 	return 0;
 }
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 9d632c8c580b..54bf032cbcb7 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -485,6 +485,7 @@ static struct cxl_port *cxl_port_alloc(struct device *uport,
 	port->uport = uport;
 	port->component_reg_phys = component_reg_phys;
 	ida_init(&port->decoder_ida);
+	port->dpa_end = -1;
 	INIT_LIST_HEAD(&port->dports);
 	INIT_LIST_HEAD(&port->endpoints);
 
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index aa223166f7ef..d8edbdaa6208 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -326,6 +326,7 @@ struct cxl_nvdimm {
  * @dports: cxl_dport instances referenced by decoders
  * @endpoints: cxl_ep instances, endpoints that are a descendant of this port
  * @decoder_ida: allocator for decoder ids
+ * @dpa_end: cursor to track highest allocated decoder for allocation ordering
  * @component_reg_phys: component register capability base address (optional)
  * @dead: last ep has been removed, force port re-creation
  * @depth: How deep this port is relative to the root. depth 0 is the root.
@@ -337,6 +338,7 @@ struct cxl_port {
 	struct list_head dports;
 	struct list_head endpoints;
 	struct ida decoder_ida;
+	int dpa_end;
 	resource_size_t component_reg_phys;
 	bool dead;
 	unsigned int depth;



* [PATCH 18/46] cxl/hdm: Add support for allocating DPA to an endpoint decoder
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (16 preceding siblings ...)
  2022-06-24  2:47 ` [PATCH 17/46] cxl/hdm: Track next decoder to allocate Dan Williams
@ 2022-06-24  2:47 ` Dan Williams
  2022-06-29 15:56   ` Jonathan Cameron
  2022-06-24  2:47 ` [PATCH 19/46] cxl/debug: Move debugfs init to cxl_core_init() Dan Williams
                   ` (30 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  2:47 UTC (permalink / raw)
  To: linux-cxl; +Cc: hch, alison.schofield, nvdimm, linux-pci, patches

The region provisioning flow will roughly follow a sequence of:

1/ Allocate DPA to a set of decoders

2/ Allocate HPA to a region

3/ Associate decoders with a region and validate that the DPA allocations
   and topologies match the parameters of the region.

For now, this change (step 1) arranges for DPA capacity to be allocated
and deleted from non-committed decoders based on the decoder's mode /
partition selection. Capacity is allocated from the lowest DPA in the
partition and any 'pmem' allocation blocks out all remaining ram
capacity in its 'skip' setting. DPA allocations are enforced in decoder
instance order. I.e. decoder N + 1 always starts at a higher DPA than
instance N, and deleting allocations must proceed from the
highest-instance allocated decoder to the lowest.
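
As an illustration only (not part of this patch), the start / avail /
skip arithmetic can be sketched in userspace C. The names partition,
dpa_alloc, and dpa_alloc_plan() are hypothetical; free_start stands for
the first DPA past the last child resource in a partition, which
cxl_dpa_alloc() derives by walking the resource tree. The sketch
mirrors the arithmetic as posted and only plans an allocation; it does
not update any state.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of one device partition (inclusive bounds). */
struct partition {
	uint64_t start;      /* first DPA of the partition */
	uint64_t end;        /* last DPA of the partition */
	uint64_t free_start; /* first DPA not yet allocated */
};

struct dpa_alloc {
	uint64_t start; /* base DPA of the new allocation */
	uint64_t size;  /* bytes allocated */
	uint64_t skip;  /* bytes skipped ahead of @start */
};

enum alloc_mode { ALLOC_RAM, ALLOC_PMEM };

/*
 * Plan an allocation from the lowest free DPA in the selected
 * partition. A pmem allocation skips all remaining ram capacity.
 */
static int dpa_alloc_plan(const struct partition *ram,
			  const struct partition *pmem,
			  enum alloc_mode mode, uint64_t size,
			  struct dpa_alloc *out)
{
	uint64_t start, avail, skip;

	assert(out);

	if (mode == ALLOC_RAM) {
		start = ram->free_start;
		avail = ram->end - start + 1;
		skip = 0;
	} else {
		start = pmem->free_start;
		avail = pmem->end - start + 1;
		skip = start - ram->free_start;
	}

	if (size > avail)
		return -ENOSPC;

	out->start = start;
	out->size = size;
	out->skip = skip;
	return 0;
}
```

For example, with 0x10000000 bytes of a 1GB ram partition already
allocated and an empty pmem partition starting at 0x40000000, a pmem
request is placed at 0x40000000 with a skip of 0x30000000.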

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Documentation/ABI/testing/sysfs-bus-cxl |   37 +++++++
 drivers/cxl/core/core.h                 |    7 +
 drivers/cxl/core/hdm.c                  |  160 +++++++++++++++++++++++++++++++
 drivers/cxl/core/port.c                 |   73 ++++++++++++++
 4 files changed, 275 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 091459216e11..85844f9bc00b 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -171,7 +171,7 @@ Date:		May, 2022
 KernelVersion:	v5.20
 Contact:	linux-cxl@vger.kernel.org
 Description:
-		(RO) When a CXL decoder is of devtype "cxl_decoder_endpoint" it
+		(RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it
 		translates from a host physical address range, to a device local
 		address range. Device-local address ranges are further split
 		into a 'ram' (volatile memory) range and 'pmem' (persistent
@@ -180,3 +180,38 @@ Description:
 		when a decoder straddles the volatile/persistent partition
 		boundary, and 'none' indicates the decoder is not actively
 		decoding, or no DPA allocation policy has been set.
+
+		'mode' can be written, when the decoder is in the 'disabled'
+		state, with either 'ram' or 'pmem' to set the boundaries for the
+		next allocation.
+
+
+What:		/sys/bus/cxl/devices/decoderX.Y/dpa_resource
+Date:		May, 2022
+KernelVersion:	v5.20
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(RO) When a CXL decoder is of devtype "cxl_decoder_endpoint",
+		and its 'dpa_size' attribute is non-zero, this attribute
+		indicates the device physical address (DPA) base address of the
+		allocation.
+
+
+What:		/sys/bus/cxl/devices/decoderX.Y/dpa_size
+Date:		May, 2022
+KernelVersion:	v5.20
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it
+		translates from a host physical address range, to a device local
+		address range. The range, base address plus length in bytes, of
+		DPA allocated to this decoder is conveyed in these 2 attributes.
+		Allocations can be mutated as long as the decoder is in the
+		disabled state. A write to 'dpa_size' releases the previous DPA
+		allocation and then attempts to allocate from the free capacity
+		in the device partition referred to by 'decoderX.Y/mode'.
+		Allocate and free requests can only be performed on the highest
+		instance number disabled decoder with non-zero size. I.e.
+		allocations are enforced to occur in increasing 'decoderX.Y/id'
+		order and frees are enforced to occur in decreasing
+		'decoderX.Y/id' order.
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 1a50c0fc399c..47cf0c286fc3 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -17,6 +17,13 @@ int cxl_send_cmd(struct cxl_memdev *cxlmd, struct cxl_send_command __user *s);
 void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
 				   resource_size_t length);
 
+int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
+		     enum cxl_decoder_mode mode);
+int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size);
+int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
+resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled);
+resource_size_t cxl_dpa_resource(struct cxl_endpoint_decoder *cxled);
+
 int cxl_memdev_init(void);
 void cxl_memdev_exit(void);
 void cxl_mbox_init(void);
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 8805afe63ebf..ceb4c28abc1b 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -248,6 +248,166 @@ static int cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
 	return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled);
 }
 
+resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled)
+{
+	resource_size_t size = 0;
+
+	down_read(&cxl_dpa_rwsem);
+	if (cxled->dpa_res)
+		size = resource_size(cxled->dpa_res);
+	up_read(&cxl_dpa_rwsem);
+
+	return size;
+}
+
+resource_size_t cxl_dpa_resource(struct cxl_endpoint_decoder *cxled)
+{
+	resource_size_t base = -1;
+
+	down_read(&cxl_dpa_rwsem);
+	if (cxled->dpa_res)
+		base = cxled->dpa_res->start;
+	up_read(&cxl_dpa_rwsem);
+
+	return base;
+}
+
+int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
+{
+	int rc = -EBUSY;
+	struct device *dev = &cxled->cxld.dev;
+	struct cxl_port *port = to_cxl_port(dev->parent);
+
+	down_write(&cxl_dpa_rwsem);
+	if (!cxled->dpa_res) {
+		rc = 0;
+		goto out;
+	}
+	if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) {
+		dev_dbg(dev, "decoder enabled\n");
+		goto out;
+	}
+	if (cxled->cxld.id != port->dpa_end) {
+		dev_dbg(dev, "expected decoder%d.%d\n", port->id,
+			port->dpa_end);
+		goto out;
+	}
+	__cxl_dpa_release(cxled, true);
+	rc = 0;
+out:
+	up_write(&cxl_dpa_rwsem);
+	return rc;
+}
+
+int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
+		     enum cxl_decoder_mode mode)
+{
+	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+	struct device *dev = &cxled->cxld.dev;
+	int rc = -EBUSY;
+
+	switch (mode) {
+	case CXL_DECODER_RAM:
+	case CXL_DECODER_PMEM:
+		break;
+	default:
+		dev_dbg(dev, "unsupported mode: %d\n", mode);
+		return -EINVAL;
+	}
+
+	down_write(&cxl_dpa_rwsem);
+	if (cxled->cxld.flags & CXL_DECODER_F_ENABLE)
+		goto out;
+	/*
+	 * Only allow modes that are supported by the current partition
+	 * configuration
+	 */
+	rc = -ENXIO;
+	if (mode == CXL_DECODER_PMEM && !resource_size(&cxlds->pmem_res)) {
+		dev_dbg(dev, "no available pmem capacity\n");
+		goto out;
+	}
+	if (mode == CXL_DECODER_RAM && !resource_size(&cxlds->ram_res)) {
+		dev_dbg(dev, "no available ram capacity\n");
+		goto out;
+	}
+
+	cxled->mode = mode;
+	rc = 0;
+out:
+	up_write(&cxl_dpa_rwsem);
+
+	return rc;
+}
+
+int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
+{
+	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+	resource_size_t free_ram_start, free_pmem_start;
+	struct cxl_port *port = cxled_to_port(cxled);
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+	struct device *dev = &cxled->cxld.dev;
+	resource_size_t start, avail, skip;
+	struct resource *p, *last;
+	int rc = -EBUSY;
+
+	down_write(&cxl_dpa_rwsem);
+	if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) {
+		dev_dbg(dev, "decoder enabled\n");
+		goto out;
+	}
+
+	for (p = cxlds->ram_res.child, last = NULL; p; p = p->sibling)
+		last = p;
+	if (last)
+		free_ram_start = last->end + 1;
+	else
+		free_ram_start = cxlds->ram_res.start;
+
+	for (p = cxlds->pmem_res.child, last = NULL; p; p = p->sibling)
+		last = p;
+	if (last)
+		free_pmem_start = last->end + 1;
+	else
+		free_pmem_start = cxlds->pmem_res.start;
+
+	if (cxled->mode == CXL_DECODER_RAM) {
+		start = free_ram_start;
+		avail = cxlds->ram_res.end - start + 1;
+		skip = 0;
+	} else if (cxled->mode == CXL_DECODER_PMEM) {
+		resource_size_t skip_start, skip_end;
+
+		start = free_pmem_start;
+		avail = cxlds->pmem_res.end - start + 1;
+		skip_start = free_ram_start;
+		skip_end = start - 1;
+		skip = skip_end - skip_start + 1;
+	} else {
+		dev_dbg(dev, "mode not set\n");
+		rc = -EINVAL;
+		goto out;
+	}
+
+	if (size > avail) {
+		dev_dbg(dev, "%pa exceeds available %s capacity: %pa\n", &size,
+			cxled->mode == CXL_DECODER_RAM ? "ram" : "pmem",
+			&avail);
+		rc = -ENOSPC;
+		goto out;
+	}
+
+	rc = __cxl_dpa_reserve(cxled, start, size, skip);
+out:
+	up_write(&cxl_dpa_rwsem);
+
+	if (rc)
+		return rc;
+
+	return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled);
+}
+
 static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
 			    int *target_map, void __iomem *hdm, int which,
 			    u64 *dpa_base)
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 54bf032cbcb7..08851357b364 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -188,7 +188,76 @@ static ssize_t mode_show(struct device *dev, struct device_attribute *attr,
 		return sysfs_emit(buf, "mixed\n");
 	}
 }
-static DEVICE_ATTR_RO(mode);
+
+static ssize_t mode_store(struct device *dev, struct device_attribute *attr,
+			  const char *buf, size_t len)
+{
+	struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
+	enum cxl_decoder_mode mode;
+	ssize_t rc;
+
+	if (sysfs_streq(buf, "pmem"))
+		mode = CXL_DECODER_PMEM;
+	else if (sysfs_streq(buf, "ram"))
+		mode = CXL_DECODER_RAM;
+	else
+		return -EINVAL;
+
+	rc = cxl_dpa_set_mode(cxled, mode);
+	if (rc)
+		return rc;
+
+	return len;
+}
+static DEVICE_ATTR_RW(mode);
+
+static ssize_t dpa_resource_show(struct device *dev, struct device_attribute *attr,
+			    char *buf)
+{
+	struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
+	u64 base = cxl_dpa_resource(cxled);
+
+	return sysfs_emit(buf, "%#llx\n", base);
+}
+static DEVICE_ATTR_RO(dpa_resource);
+
+static ssize_t dpa_size_show(struct device *dev, struct device_attribute *attr,
+			     char *buf)
+{
+	struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
+	resource_size_t size = cxl_dpa_size(cxled);
+
+	return sysfs_emit(buf, "%pa\n", &size);
+}
+
+static ssize_t dpa_size_store(struct device *dev, struct device_attribute *attr,
+			      const char *buf, size_t len)
+{
+	struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
+	unsigned long long size;
+	ssize_t rc;
+
+	rc = kstrtoull(buf, 0, &size);
+	if (rc)
+		return rc;
+
+	if (!IS_ALIGNED(size, SZ_256M))
+		return -EINVAL;
+
+	rc = cxl_dpa_free(cxled);
+	if (rc)
+		return rc;
+
+	if (size == 0)
+		return len;
+
+	rc = cxl_dpa_alloc(cxled, size);
+	if (rc)
+		return rc;
+
+	return len;
+}
+static DEVICE_ATTR_RW(dpa_size);
 
 static struct attribute *cxl_decoder_base_attrs[] = {
 	&dev_attr_start.attr,
@@ -241,6 +310,8 @@ static const struct attribute_group *cxl_decoder_switch_attribute_groups[] = {
 static struct attribute *cxl_decoder_endpoint_attrs[] = {
 	&dev_attr_target_type.attr,
 	&dev_attr_mode.attr,
+	&dev_attr_dpa_size.attr,
+	&dev_attr_dpa_resource.attr,
 	NULL,
 };
 



* [PATCH 19/46] cxl/debug: Move debugfs init to cxl_core_init()
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (17 preceding siblings ...)
  2022-06-24  2:47 ` [PATCH 18/46] cxl/hdm: Add support for allocating DPA to an endpoint decoder Dan Williams
@ 2022-06-24  2:47 ` Dan Williams
  2022-06-29 15:58   ` Jonathan Cameron
  2022-06-24  2:47 ` [PATCH 20/46] cxl/mem: Add a debugfs version of 'iomem' for DPA, 'dpamem' Dan Williams
                   ` (29 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  2:47 UTC (permalink / raw)
  To: linux-cxl; +Cc: hch, alison.schofield, nvdimm, linux-pci, patches

In preparation for a new cxl debugfs file, move 'cxl' directory
establishment and teardown to the core and let subsequent init routines
reference that setup.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/core.h |    2 +-
 drivers/cxl/core/mbox.c |   10 +---------
 drivers/cxl/core/port.c |   13 +++++++++++--
 3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 47cf0c286fc3..c242fa02d5e8 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -24,9 +24,9 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
 resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled);
 resource_size_t cxl_dpa_resource(struct cxl_endpoint_decoder *cxled);
 
+struct dentry *cxl_debugfs_create_dir(const char *dir);
 int cxl_memdev_init(void);
 void cxl_memdev_exit(void);
 void cxl_mbox_init(void);
-void cxl_mbox_exit(void);
 
 #endif /* __CXL_CORE_H__ */
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 3fe113dd21ad..dd438ca12dcd 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -855,19 +855,11 @@ struct cxl_dev_state *cxl_dev_state_create(struct device *dev)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_dev_state_create, CXL);
 
-static struct dentry *cxl_debugfs;
-
 void __init cxl_mbox_init(void)
 {
 	struct dentry *mbox_debugfs;
 
-	cxl_debugfs = debugfs_create_dir("cxl", NULL);
-	mbox_debugfs = debugfs_create_dir("mbox", cxl_debugfs);
+	mbox_debugfs = cxl_debugfs_create_dir("mbox");
 	debugfs_create_bool("raw_allow_all", 0600, mbox_debugfs,
 			    &cxl_raw_allow_all);
 }
-
-void cxl_mbox_exit(void)
-{
-	debugfs_remove_recursive(cxl_debugfs);
-}
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 08851357b364..f02b7470c20e 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -2,6 +2,7 @@
 /* Copyright(c) 2020 Intel Corporation. All rights reserved. */
 #include <linux/io-64-nonatomic-lo-hi.h>
 #include <linux/workqueue.h>
+#include <linux/debugfs.h>
 #include <linux/device.h>
 #include <linux/module.h>
 #include <linux/pci.h>
@@ -1695,10 +1696,19 @@ struct bus_type cxl_bus_type = {
 };
 EXPORT_SYMBOL_NS_GPL(cxl_bus_type, CXL);
 
+static struct dentry *cxl_debugfs;
+
+struct dentry *cxl_debugfs_create_dir(const char *dir)
+{
+	return debugfs_create_dir(dir, cxl_debugfs);
+}
+
 static __init int cxl_core_init(void)
 {
 	int rc;
 
+	cxl_debugfs = debugfs_create_dir("cxl", NULL);
+
 	cxl_mbox_init();
 
 	rc = cxl_memdev_init();
@@ -1721,7 +1731,6 @@ static __init int cxl_core_init(void)
 	destroy_workqueue(cxl_bus_wq);
 err_wq:
 	cxl_memdev_exit();
-	cxl_mbox_exit();
 	return rc;
 }
 
@@ -1730,7 +1739,7 @@ static void cxl_core_exit(void)
 	bus_unregister(&cxl_bus_type);
 	destroy_workqueue(cxl_bus_wq);
 	cxl_memdev_exit();
-	cxl_mbox_exit();
+	debugfs_remove_recursive(cxl_debugfs);
 }
 
 module_init(cxl_core_init);


^ permalink raw reply related	[flat|nested] 157+ messages in thread

* [PATCH 20/46] cxl/mem: Add a debugfs version of 'iomem' for DPA, 'dpamem'
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (18 preceding siblings ...)
  2022-06-24  2:47 ` [PATCH 19/46] cxl/debug: Move debugfs init to cxl_core_init() Dan Williams
@ 2022-06-24  2:47 ` Dan Williams
  2022-06-29 16:08   ` Jonathan Cameron
  2022-06-24  2:47 ` [PATCH 21/46] tools/testing/cxl: Move cxl_test resources to the top of memory Dan Williams
                   ` (28 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  2:47 UTC (permalink / raw)
  To: linux-cxl; +Cc: hch, alison.schofield, nvdimm, linux-pci, patches

Dump the device-physical-address map for a CXL expander in /proc/iomem
style format. E.g.:

  cat /sys/kernel/debug/cxl/mem1/dpamem
  00000000-0fffffff : ram
  10000000-1fffffff : pmem
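
The two-level walk behind that output can be sketched in plain userspace
C. This is an illustration only, not the kernel code: 'struct demo_res',
demo_format_res(), demo_emit(), and demo_dpa_dump() are hypothetical
stand-ins for 'struct resource' and the seq_file helpers, and the
locking that the real walk needs is omitted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-in for the kernel's struct resource. */
struct demo_res {
	unsigned long long start, end;
	const char *name;
	struct demo_res *child, *sibling;
};

/* Emit one "start-end : name" line, indented two spaces per depth level. */
static int demo_format_res(char *buf, size_t len, const struct demo_res *r,
			   int depth)
{
	assert(depth >= 0);
	return snprintf(buf, len, "%*s%08llx-%08llx : %s\n", depth * 2, "",
			r->start, r->end, r->name);
}

/* Append one line at *off; returns 0 on success, -1 if it does not fit. */
static int demo_emit(char *buf, size_t len, size_t *off,
		     const struct demo_res *r, int depth)
{
	int n = demo_format_res(buf + *off, len - *off, r, depth);

	if (n < 0 || (size_t)n >= len - *off) {
		buf[*off] = '\0';	/* drop the partial line */
		return -1;
	}
	*off += (size_t)n;
	return 0;
}

/* Walk partitions (depth 0) and the allocations inside them (depth 1). */
static size_t demo_dpa_dump(char *buf, size_t len, const struct demo_res *root)
{
	const struct demo_res *p1, *p2;
	size_t off = 0;

	assert(len > 0);
	buf[0] = '\0';
	for (p1 = root->child; p1; p1 = p1->sibling) {
		if (demo_emit(buf, len, &off, p1, 0))
			return off;
		for (p2 = p1->child; p2; p2 = p2->sibling)
			if (demo_emit(buf, len, &off, p2, 1))
				return off;
	}
	return off;
}
```

Only two levels are walked because the sketch assumes, as the patch
does, that allocations sit directly under the partition resources.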

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/core.h |    1 -
 drivers/cxl/core/hdm.c  |   23 +++++++++++++++++++++++
 drivers/cxl/core/port.c |    1 +
 drivers/cxl/cxlmem.h    |    4 ++++
 drivers/cxl/mem.c       |   23 +++++++++++++++++++++++
 5 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index c242fa02d5e8..472ec9cb1018 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -24,7 +24,6 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
 resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled);
 resource_size_t cxl_dpa_resource(struct cxl_endpoint_decoder *cxled);
 
-struct dentry *cxl_debugfs_create_dir(const char *dir);
 int cxl_memdev_init(void);
 void cxl_memdev_exit(void);
 void cxl_mbox_init(void);
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index ceb4c28abc1b..c0164f9b2195 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /* Copyright(c) 2022 Intel Corporation. All rights reserved. */
 #include <linux/io-64-nonatomic-hi-lo.h>
+#include <linux/seq_file.h>
 #include <linux/device.h>
 #include <linux/delay.h>
 
@@ -248,6 +249,28 @@ static int cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
 	return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled);
 }
 
+static void __cxl_dpa_debug(struct seq_file *file, struct resource *r, int depth)
+{
+	unsigned long long start = r->start, end = r->end;
+
+	seq_printf(file, "%*s%08llx-%08llx : %s\n", depth * 2, "", start, end,
+		   r->name);
+}
+
+void cxl_dpa_debug(struct seq_file *file, struct cxl_dev_state *cxlds)
+{
+	struct resource *p1, *p2;
+
+	down_read(&cxl_dpa_rwsem);
+	for (p1 = cxlds->dpa_res.child; p1; p1 = p1->sibling) {
+		__cxl_dpa_debug(file, p1, 0);
+		for (p2 = p1->child; p2; p2 = p2->sibling)
+			__cxl_dpa_debug(file, p2, 1);
+	}
+	up_read(&cxl_dpa_rwsem);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_dpa_debug, CXL);
+
 resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled)
 {
 	resource_size_t size = 0;
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index f02b7470c20e..4e4e26ca507c 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1702,6 +1702,7 @@ struct dentry *cxl_debugfs_create_dir(const char *dir)
 {
 	return debugfs_create_dir(dir, cxl_debugfs);
 }
+EXPORT_SYMBOL_NS_GPL(cxl_debugfs_create_dir, CXL);
 
 static __init int cxl_core_init(void)
 {
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index b4e5ed9eabc9..db9c889f42ab 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -385,4 +385,8 @@ struct cxl_hdm {
 	unsigned int interleave_mask;
 	struct cxl_port *port;
 };
+
+struct seq_file;
+struct dentry *cxl_debugfs_create_dir(const char *dir);
+void cxl_dpa_debug(struct seq_file *file, struct cxl_dev_state *cxlds);
 #endif /* __CXL_MEM_H__ */
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index a979d0b484d5..7513bea55145 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /* Copyright(c) 2022 Intel Corporation. All rights reserved. */
+#include <linux/debugfs.h>
 #include <linux/device.h>
 #include <linux/module.h>
 #include <linux/pci.h>
@@ -56,10 +57,26 @@ static void enable_suspend(void *data)
 	cxl_mem_active_dec();
 }
 
+static void remove_debugfs(void *dentry)
+{
+	debugfs_remove_recursive(dentry);
+}
+
+static int cxl_mem_dpa_show(struct seq_file *file, void *data)
+{
+	struct device *dev = file->private;
+	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+
+	cxl_dpa_debug(file, cxlmd->cxlds);
+
+	return 0;
+}
+
 static int cxl_mem_probe(struct device *dev)
 {
 	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
 	struct cxl_port *parent_port;
+	struct dentry *dentry;
 	int rc;
 
 	/*
@@ -73,6 +90,12 @@ static int cxl_mem_probe(struct device *dev)
 	if (work_pending(&cxlmd->detach_work))
 		return -EBUSY;
 
+	dentry = cxl_debugfs_create_dir(dev_name(dev));
+	debugfs_create_devm_seqfile(dev, "dpamem", dentry, cxl_mem_dpa_show);
+	rc = devm_add_action_or_reset(dev, remove_debugfs, dentry);
+	if (rc)
+		return rc;
+
 	rc = devm_cxl_enumerate_ports(cxlmd);
 	if (rc)
 		return rc;


^ permalink raw reply related	[flat|nested] 157+ messages in thread

* [PATCH 21/46] tools/testing/cxl: Move cxl_test resources to the top of memory
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (19 preceding siblings ...)
  2022-06-24  2:47 ` [PATCH 20/46] cxl/mem: Add a debugfs version of 'iomem' for DPA, 'dpamem' Dan Williams
@ 2022-06-24  2:47 ` Dan Williams
  2022-06-29 16:11   ` Jonathan Cameron
  2022-06-24  2:47 ` [PATCH 22/46] tools/testing/cxl: Expand CFMWS windows Dan Williams
                   ` (27 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  2:47 UTC (permalink / raw)
  To: linux-cxl; +Cc: hch, alison.schofield, nvdimm, linux-pci, patches

A recent QEMU upgrade resulted in collisions between QEMU's chosen
location for PCI MMIO and cxl_test's fake address location for emulated
CXL purposes. This was great for testing resource collisions, but not so
great for continuing to test the nominal cases. Move cxl_test to the
top-of-memory where it is less likely to collide with other resources.
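
As a rough userspace sketch of the placement arithmetic (not the test
code itself): demo_pool_base() is a hypothetical helper, and the 52-bit
limit in the usage note is an assumed value for 'iomem_resource.end',
which varies by platform.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SZ_64G (64ULL << 30)

/*
 * Base address of a pool of pool_size bytes whose last byte is
 * iomem_end, i.e. the pool occupies the very top of the address space.
 */
static uint64_t demo_pool_base(uint64_t iomem_end, uint64_t pool_size)
{
	/* The pool must fit below (and including) iomem_end. */
	assert(pool_size > 0 && pool_size - 1 <= iomem_end);
	/* Unsigned wrap-around keeps this correct even for UINT64_MAX. */
	return iomem_end + 1 - pool_size;
}
```

With an assumed limit of (1ULL << 52) - 1, demo_pool_base() places the
64G pool at 2^52 - 2^36, far above the previous fixed 512G base.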

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 tools/testing/cxl/test/cxl.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index f52a5dd69d36..b6e6bc02a507 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -632,7 +632,8 @@ static __init int cxl_test_init(void)
 		goto err_gen_pool_create;
 	}
 
-	rc = gen_pool_add(cxl_mock_pool, SZ_512G, SZ_64G, NUMA_NO_NODE);
+	rc = gen_pool_add(cxl_mock_pool, iomem_resource.end + 1 - SZ_64G,
+			  SZ_64G, NUMA_NO_NODE);
 	if (rc)
 		goto err_gen_pool_add;
 


^ permalink raw reply related	[flat|nested] 157+ messages in thread

* [PATCH 22/46] tools/testing/cxl: Expand CFMWS windows
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (20 preceding siblings ...)
  2022-06-24  2:47 ` [PATCH 21/46] tools/testing/cxl: Move cxl_test resources to the top of memory Dan Williams
@ 2022-06-24  2:47 ` Dan Williams
  2022-06-29 16:14   ` Jonathan Cameron
  2022-06-24  2:47 ` [PATCH 23/46] tools/testing/cxl: Add partition support Dan Williams
                   ` (26 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  2:47 UTC (permalink / raw)
  To: linux-cxl; +Cc: hch, alison.schofield, nvdimm, linux-pci, patches

For the x2 host-bridge interleave windows, allow for a
x8-endpoint-interleave configuration per memory-type with each device
contributing the minimum 256MB extent. Similarly, for the x1 host-bridge
interleave windows, allow for a x4-endpoint-interleave configuration per
memory-type.

Bump up the number of decoders per-port to support hosting 8 regions.
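
The window sizing above is plain multiplication. A minimal sketch,
assuming each endpoint contributes exactly one 256MB extent per window;
demo_window_size() is a hypothetical name, not something in cxl_test:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SZ_256M (256ULL << 20)

/* Window capacity needed for an N-way endpoint interleave of 256MB extents. */
static uint64_t demo_window_size(unsigned int endpoint_ways)
{
	assert(endpoint_ways > 0);
	return (uint64_t)endpoint_ways * DEMO_SZ_256M;
}
```

Under that assumption the x1 host-bridge windows come to 1G (4 ways)
and the x2 host-bridge windows to 2G (8 ways).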

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 tools/testing/cxl/test/cxl.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index b6e6bc02a507..599326796b83 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -14,7 +14,7 @@
 #define NR_CXL_HOST_BRIDGES 2
 #define NR_CXL_ROOT_PORTS 2
 #define NR_CXL_SWITCH_PORTS 2
-#define NR_CXL_PORT_DECODERS 2
+#define NR_CXL_PORT_DECODERS 8
 
 static struct platform_device *cxl_acpi;
 static struct platform_device *cxl_host_bridge[NR_CXL_HOST_BRIDGES];
@@ -118,7 +118,7 @@ static struct {
 			.restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
 					ACPI_CEDT_CFMWS_RESTRICT_VOLATILE,
 			.qtg_id = 0,
-			.window_size = SZ_256M,
+			.window_size = SZ_256M * 4UL,
 		},
 		.target = { 0 },
 	},
@@ -133,7 +133,7 @@ static struct {
 			.restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
 					ACPI_CEDT_CFMWS_RESTRICT_VOLATILE,
 			.qtg_id = 1,
-			.window_size = SZ_256M * 2,
+			.window_size = SZ_256M * 8UL,
 		},
 		.target = { 0, 1, },
 	},
@@ -148,7 +148,7 @@ static struct {
 			.restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
 					ACPI_CEDT_CFMWS_RESTRICT_PMEM,
 			.qtg_id = 2,
-			.window_size = SZ_256M,
+			.window_size = SZ_256M * 4UL,
 		},
 		.target = { 0 },
 	},
@@ -163,7 +163,7 @@ static struct {
 			.restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
 					ACPI_CEDT_CFMWS_RESTRICT_PMEM,
 			.qtg_id = 3,
-			.window_size = SZ_256M * 2,
+			.window_size = SZ_256M * 8UL,
 		},
 		.target = { 0, 1, },
 	},


^ permalink raw reply related	[flat|nested] 157+ messages in thread

* [PATCH 23/46] tools/testing/cxl: Add partition support
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (21 preceding siblings ...)
  2022-06-24  2:47 ` [PATCH 22/46] tools/testing/cxl: Expand CFMWS windows Dan Williams
@ 2022-06-24  2:47 ` Dan Williams
  2022-06-29 16:20   ` Jonathan Cameron
  2022-06-24  2:48 ` [PATCH 24/46] tools/testing/cxl: Fix decoder default state Dan Williams
                   ` (25 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  2:47 UTC (permalink / raw)
  To: linux-cxl; +Cc: hch, alison.schofield, nvdimm, linux-pci, patches

In support of testing DPA allocation mechanisms in the CXL core, the
cxl_test environment needs to support establishing and retrieving the
'pmem' partition boundary.

Replace the platform_device_add_resources() method for delineating DPA
within an endpoint with an emulated DEV_SIZE amount of partitionable
capacity. Set DEV_SIZE such that an endpoint has enough capacity to
simultaneously participate in 8 distinct regions.
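
The capacity accounting can be sketched in userspace C as follows. The
DEMO_* names and helpers are hypothetical. The sketch assumes the
mailbox capacity fields count multiples of 256MB, mirroring
CXL_CAPACITY_MULTIPLIER, and it leaves out the little-endian conversion
that the mock performs with cpu_to_le64().

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SZ_256M (256ULL << 20)
#define DEMO_SZ_2G (2ULL << 30)

/* Assumed to mirror CXL_CAPACITY_MULTIPLIER: capacities in 256MB units. */
#define DEMO_CAPACITY_MULTIPLIER DEMO_SZ_256M
/* Mirrors DEV_SIZE from the patch: 8 regions x 256MB each. */
#define DEMO_DEV_SIZE DEMO_SZ_2G

struct demo_partition_info {
	uint64_t active_volatile_units;
	uint64_t active_persistent_units;
};

/* Convert a byte count to capacity units; bytes must be unit-aligned. */
static uint64_t demo_bytes_to_units(uint64_t bytes)
{
	assert(bytes % DEMO_CAPACITY_MULTIPLIER == 0);
	return bytes / DEMO_CAPACITY_MULTIPLIER;
}

/* Split the device evenly into a volatile and a persistent partition. */
static struct demo_partition_info demo_partition(uint64_t dev_size)
{
	struct demo_partition_info pi = {
		.active_volatile_units = demo_bytes_to_units(dev_size / 2),
		.active_persistent_units = demo_bytes_to_units(dev_size / 2),
	};

	return pi;
}
```

For a 2G device this yields a total capacity of 8 units, split as 4
volatile and 4 persistent, with a partition alignment of 1 unit.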

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/mbox.c      |    7 +-----
 drivers/cxl/cxlmem.h         |    7 ++++++
 tools/testing/cxl/test/cxl.c |   40 +--------------------------------
 tools/testing/cxl/test/mem.c |   51 ++++++++++++++++++++++--------------------
 4 files changed, 36 insertions(+), 69 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index dd438ca12dcd..40e3ccb2bf3e 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -716,12 +716,7 @@ EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
  */
 static int cxl_mem_get_partition_info(struct cxl_dev_state *cxlds)
 {
-	struct cxl_mbox_get_partition_info {
-		__le64 active_volatile_cap;
-		__le64 active_persistent_cap;
-		__le64 next_volatile_cap;
-		__le64 next_persistent_cap;
-	} __packed pi;
+	struct cxl_mbox_get_partition_info pi;
 	int rc;
 
 	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_PARTITION_INFO, NULL, 0,
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index db9c889f42ab..eee96016c3c7 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -314,6 +314,13 @@ struct cxl_mbox_identify {
 	u8 qos_telemetry_caps;
 } __packed;
 
+struct cxl_mbox_get_partition_info {
+	__le64 active_volatile_cap;
+	__le64 active_persistent_cap;
+	__le64 next_volatile_cap;
+	__le64 next_persistent_cap;
+} __packed;
+
 struct cxl_mbox_get_lsa {
 	__le32 offset;
 	__le32 length;
diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index 599326796b83..c396f20a57dd 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -582,44 +582,6 @@ static void mock_companion(struct acpi_device *adev, struct device *dev)
 #define SZ_512G (SZ_64G * 8)
 #endif
 
-static struct platform_device *alloc_memdev(int id)
-{
-	struct resource res[] = {
-		[0] = {
-			.flags = IORESOURCE_MEM,
-		},
-		[1] = {
-			.flags = IORESOURCE_MEM,
-			.desc = IORES_DESC_PERSISTENT_MEMORY,
-		},
-	};
-	struct platform_device *pdev;
-	int i, rc;
-
-	for (i = 0; i < ARRAY_SIZE(res); i++) {
-		struct cxl_mock_res *r = alloc_mock_res(SZ_256M);
-
-		if (!r)
-			return NULL;
-		res[i].start = r->range.start;
-		res[i].end = r->range.end;
-	}
-
-	pdev = platform_device_alloc("cxl_mem", id);
-	if (!pdev)
-		return NULL;
-
-	rc = platform_device_add_resources(pdev, res, ARRAY_SIZE(res));
-	if (rc)
-		goto err;
-
-	return pdev;
-
-err:
-	platform_device_put(pdev);
-	return NULL;
-}
-
 static __init int cxl_test_init(void)
 {
 	int rc, i;
@@ -722,7 +684,7 @@ static __init int cxl_test_init(void)
 		struct platform_device *dport = cxl_switch_dport[i];
 		struct platform_device *pdev;
 
-		pdev = alloc_memdev(i);
+		pdev = platform_device_alloc("cxl_mem", i);
 		if (!pdev)
 			goto err_mem;
 		pdev->dev.parent = &dport->dev;
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index b81c90715fe8..aa2df3a15051 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -10,6 +10,7 @@
 #include <cxlmem.h>
 
 #define LSA_SIZE SZ_128K
+#define DEV_SIZE SZ_2G
 #define EFFECT(x) (1U << x)
 
 static struct cxl_cel_entry mock_cel[] = {
@@ -25,6 +26,10 @@ static struct cxl_cel_entry mock_cel[] = {
 		.opcode = cpu_to_le16(CXL_MBOX_OP_GET_LSA),
 		.effect = cpu_to_le16(0),
 	},
+	{
+		.opcode = cpu_to_le16(CXL_MBOX_OP_GET_PARTITION_INFO),
+		.effect = cpu_to_le16(0),
+	},
 	{
 		.opcode = cpu_to_le16(CXL_MBOX_OP_SET_LSA),
 		.effect = cpu_to_le16(EFFECT(1) | EFFECT(2)),
@@ -97,42 +102,37 @@ static int mock_get_log(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
 
 static int mock_id(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
 {
-	struct platform_device *pdev = to_platform_device(cxlds->dev);
 	struct cxl_mbox_identify id = {
 		.fw_revision = { "mock fw v1 " },
 		.lsa_size = cpu_to_le32(LSA_SIZE),
-		/* FIXME: Add partition support */
-		.partition_align = cpu_to_le64(0),
+		.partition_align =
+			cpu_to_le64(SZ_256M / CXL_CAPACITY_MULTIPLIER),
+		.total_capacity =
+			cpu_to_le64(DEV_SIZE / CXL_CAPACITY_MULTIPLIER),
 	};
-	u64 capacity = 0;
-	int i;
 
 	if (cmd->size_out < sizeof(id))
 		return -EINVAL;
 
-	for (i = 0; i < 2; i++) {
-		struct resource *res;
-
-		res = platform_get_resource(pdev, IORESOURCE_MEM, i);
-		if (!res)
-			break;
-
-		capacity += resource_size(res) / CXL_CAPACITY_MULTIPLIER;
+	memcpy(cmd->payload_out, &id, sizeof(id));
 
-		if (le64_to_cpu(id.partition_align))
-			continue;
+	return 0;
+}
 
-		if (res->desc == IORES_DESC_PERSISTENT_MEMORY)
-			id.persistent_capacity = cpu_to_le64(
-				resource_size(res) / CXL_CAPACITY_MULTIPLIER);
-		else
-			id.volatile_capacity = cpu_to_le64(
-				resource_size(res) / CXL_CAPACITY_MULTIPLIER);
-	}
+static int mock_partition_info(struct cxl_dev_state *cxlds,
+			       struct cxl_mbox_cmd *cmd)
+{
+	struct cxl_mbox_get_partition_info pi = {
+		.active_volatile_cap =
+			cpu_to_le64(DEV_SIZE / 2 / CXL_CAPACITY_MULTIPLIER),
+		.active_persistent_cap =
+			cpu_to_le64(DEV_SIZE / 2 / CXL_CAPACITY_MULTIPLIER),
+	};
 
-	id.total_capacity = cpu_to_le64(capacity);
+	if (cmd->size_out < sizeof(pi))
+		return -EINVAL;
 
-	memcpy(cmd->payload_out, &id, sizeof(id));
+	memcpy(cmd->payload_out, &pi, sizeof(pi));
 
 	return 0;
 }
@@ -221,6 +221,9 @@ static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *
 	case CXL_MBOX_OP_GET_LSA:
 		rc = mock_get_lsa(cxlds, cmd);
 		break;
+	case CXL_MBOX_OP_GET_PARTITION_INFO:
+		rc = mock_partition_info(cxlds, cmd);
+		break;
 	case CXL_MBOX_OP_SET_LSA:
 		rc = mock_set_lsa(cxlds, cmd);
 		break;


^ permalink raw reply related	[flat|nested] 157+ messages in thread

* [PATCH 24/46] tools/testing/cxl: Fix decoder default state
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (22 preceding siblings ...)
  2022-06-24  2:47 ` [PATCH 23/46] tools/testing/cxl: Add partition support Dan Williams
@ 2022-06-24  2:48 ` Dan Williams
  2022-06-29 16:22   ` Jonathan Cameron
  2022-06-24  2:48 ` [PATCH 25/46] cxl/port: Record dport in endpoint references Dan Williams
                   ` (24 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  2:48 UTC (permalink / raw)
  To: linux-cxl; +Cc: hch, alison.schofield, nvdimm, linux-pci, patches

The 'enabled' state is reserved for committed decoders. By default,
cxl_test decoders are uncommitted at init time.

Fixes: 7c7d68db0254 ("tools/testing/cxl: Enumerate mock decoders")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 tools/testing/cxl/test/cxl.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index c396f20a57dd..51d517fa62ee 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -479,7 +479,6 @@ static int mock_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
 			.end = -1,
 		};
 
-		cxld->flags = CXL_DECODER_F_ENABLE;
 		cxld->interleave_ways = min_not_zero(target_count, 1);
 		cxld->interleave_granularity = SZ_4K;
 		cxld->target_type = CXL_DECODER_EXPANDER;


^ permalink raw reply related	[flat|nested] 157+ messages in thread

* [PATCH 25/46] cxl/port: Record dport in endpoint references
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (23 preceding siblings ...)
  2022-06-24  2:48 ` [PATCH 24/46] tools/testing/cxl: Fix decoder default state Dan Williams
@ 2022-06-24  2:48 ` Dan Williams
  2022-06-29 16:49   ` Jonathan Cameron
  2022-06-24  4:19 ` [PATCH 26/46] cxl/port: Record parent dport when adding ports Dan Williams
                   ` (23 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  2:48 UTC (permalink / raw)
  To: linux-cxl; +Cc: hch, alison.schofield, nvdimm, linux-pci, patches

Recall that the primary role of the cxl_mem driver is to probe if the
given endpoint is connected to a CXL port topology. In that process it
walks its device ancestry to its PCI root port. If that root port is
also a CXL root port then the probe process adds cxl_port object
instances at each switch in the path between the root and the endpoint.
As those cxl_port instances are added, or if a previous enumeration
attempt already created the port, a 'struct cxl_ep' instance is
registered with that port to track the endpoints interested in that
port.

At the time the cxl_ep is registered the downstream egress path from the
port to the endpoint is known. Take the opportunity to record that
information as it will be needed for dynamic programming of decoder
targets during region provisioning.
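
A simplified userspace model of the idea, for illustration only: the
demo_* types and functions are hypothetical, use a fixed-size array in
place of the kernel list, and skip the device locking and reference
counting that the real code performs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_MAX_EPS 4

struct demo_port;

/* A downstream port that knows which port it belongs to. */
struct demo_dport {
	int port_id;
	struct demo_port *port;
};

/* An endpoint's interest in a port, plus the dport that routes to it. */
struct demo_ep {
	const char *name;
	struct demo_dport *dport;
};

struct demo_port {
	const char *name;
	struct demo_ep *eps[DEMO_MAX_EPS];
	size_t nr_eps;
};

/* Register ep with the port derived from ep->dport, rejecting duplicates. */
static int demo_add_ep(struct demo_ep *ep)
{
	struct demo_port *port;
	size_t i;

	assert(ep && ep->dport && ep->dport->port);
	port = ep->dport->port;
	for (i = 0; i < port->nr_eps; i++)
		if (port->eps[i] == ep)
			return -EEXIST;
	if (port->nr_eps == DEMO_MAX_EPS)
		return -ENOSPC;
	port->eps[port->nr_eps++] = ep;
	return 0;
}

/* Find the egress dport for ep on this port, or NULL if not registered. */
static struct demo_dport *demo_route(const struct demo_port *port,
				     const struct demo_ep *ep)
{
	size_t i;

	for (i = 0; i < port->nr_eps; i++)
		if (port->eps[i] == ep)
			return ep->dport;
	return NULL;
}
```

Because the dport already points back at its port, registration needs
only the endpoint record, which is the same simplification the patch
makes to add_ep().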

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/port.c |   52 ++++++++++++++++++++++++++++++++---------------
 drivers/cxl/cxl.h       |    2 ++
 2 files changed, 37 insertions(+), 17 deletions(-)

diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 4e4e26ca507c..c54e1dbf92cb 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -866,8 +866,9 @@ static struct cxl_ep *find_ep(struct cxl_port *port, struct device *ep_dev)
 	return NULL;
 }
 
-static int add_ep(struct cxl_port *port, struct cxl_ep *new)
+static int add_ep(struct cxl_ep *new)
 {
+	struct cxl_port *port = new->dport->port;
 	struct cxl_ep *dup;
 
 	device_lock(&port->dev);
@@ -885,14 +886,14 @@ static int add_ep(struct cxl_port *port, struct cxl_ep *new)
 
 /**
  * cxl_add_ep - register an endpoint's interest in a port
- * @port: a port in the endpoint's topology ancestry
+ * @dport: the dport that routes to @ep_dev
  * @ep_dev: device representing the endpoint
  *
  * Intermediate CXL ports are scanned based on the arrival of endpoints.
  * When those endpoints depart the port can be destroyed once all
  * endpoints that care about that port have been removed.
  */
-static int cxl_add_ep(struct cxl_port *port, struct device *ep_dev)
+static int cxl_add_ep(struct cxl_dport *dport, struct device *ep_dev)
 {
 	struct cxl_ep *ep;
 	int rc;
@@ -903,8 +904,9 @@ static int cxl_add_ep(struct cxl_port *port, struct device *ep_dev)
 
 	INIT_LIST_HEAD(&ep->list);
 	ep->ep = get_device(ep_dev);
+	ep->dport = dport;
 
-	rc = add_ep(port, ep);
+	rc = add_ep(ep);
 	if (rc)
 		cxl_ep_release(ep);
 	return rc;
@@ -913,11 +915,13 @@ static int cxl_add_ep(struct cxl_port *port, struct device *ep_dev)
 struct cxl_find_port_ctx {
 	const struct device *dport_dev;
 	const struct cxl_port *parent_port;
+	struct cxl_dport **dport;
 };
 
 static int match_port_by_dport(struct device *dev, const void *data)
 {
 	const struct cxl_find_port_ctx *ctx = data;
+	struct cxl_dport *dport;
 	struct cxl_port *port;
 
 	if (!is_cxl_port(dev))
@@ -926,7 +930,10 @@ static int match_port_by_dport(struct device *dev, const void *data)
 		return 0;
 
 	port = to_cxl_port(dev);
-	return cxl_find_dport_by_dev(port, ctx->dport_dev) != NULL;
+	dport = cxl_find_dport_by_dev(port, ctx->dport_dev);
+	if (ctx->dport)
+		*ctx->dport = dport;
+	return dport != NULL;
 }
 
 static struct cxl_port *__find_cxl_port(struct cxl_find_port_ctx *ctx)
@@ -942,24 +949,32 @@ static struct cxl_port *__find_cxl_port(struct cxl_find_port_ctx *ctx)
 	return NULL;
 }
 
-static struct cxl_port *find_cxl_port(struct device *dport_dev)
+static struct cxl_port *find_cxl_port(struct device *dport_dev,
+				      struct cxl_dport **dport)
 {
 	struct cxl_find_port_ctx ctx = {
 		.dport_dev = dport_dev,
+		.dport = dport,
 	};
+	struct cxl_port *port;
 
-	return __find_cxl_port(&ctx);
+	port = __find_cxl_port(&ctx);
+	return port;
 }
 
 static struct cxl_port *find_cxl_port_at(struct cxl_port *parent_port,
-					 struct device *dport_dev)
+					 struct device *dport_dev,
+					 struct cxl_dport **dport)
 {
 	struct cxl_find_port_ctx ctx = {
 		.dport_dev = dport_dev,
 		.parent_port = parent_port,
+		.dport = dport,
 	};
+	struct cxl_port *port;
 
-	return __find_cxl_port(&ctx);
+	port = __find_cxl_port(&ctx);
+	return port;
 }
 
 /*
@@ -1044,7 +1059,7 @@ static void cxl_detach_ep(void *data)
 		if (!dport_dev)
 			break;
 
-		port = find_cxl_port(dport_dev);
+		port = find_cxl_port(dport_dev, NULL);
 		if (!port)
 			continue;
 
@@ -1119,6 +1134,7 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd,
 	struct device *dparent = grandparent(dport_dev);
 	struct cxl_port *port, *parent_port = NULL;
 	resource_size_t component_reg_phys;
+	struct cxl_dport *dport;
 	int rc;
 
 	if (!dparent) {
@@ -1132,7 +1148,7 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd,
 		return -ENXIO;
 	}
 
-	parent_port = find_cxl_port(dparent);
+	parent_port = find_cxl_port(dparent, NULL);
 	if (!parent_port) {
 		/* iterate to create this parent_port */
 		return -EAGAIN;
@@ -1147,13 +1163,14 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd,
 		goto out;
 	}
 
-	port = find_cxl_port_at(parent_port, dport_dev);
+	port = find_cxl_port_at(parent_port, dport_dev, &dport);
 	if (!port) {
 		component_reg_phys = find_component_registers(uport_dev);
 		port = devm_cxl_add_port(&parent_port->dev, uport_dev,
 					 component_reg_phys, parent_port);
+		/* retry find to pick up the new dport information */
 		if (!IS_ERR(port))
-			get_device(&port->dev);
+			port = find_cxl_port_at(parent_port, dport_dev, &dport);
 	}
 out:
 	device_unlock(&parent_port->dev);
@@ -1163,7 +1180,7 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd,
 	else {
 		dev_dbg(&cxlmd->dev, "add to new port %s:%s\n",
 			dev_name(&port->dev), dev_name(port->uport));
-		rc = cxl_add_ep(port, &cxlmd->dev);
+		rc = cxl_add_ep(dport, &cxlmd->dev);
 		if (rc == -EEXIST) {
 			/*
 			 * "can't" happen, but this error code means
@@ -1197,6 +1214,7 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
 	for (iter = dev; iter; iter = grandparent(iter)) {
 		struct device *dport_dev = grandparent(iter);
 		struct device *uport_dev;
+		struct cxl_dport *dport;
 		struct cxl_port *port;
 
 		if (!dport_dev)
@@ -1212,12 +1230,12 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
 		dev_dbg(dev, "scan: iter: %s dport_dev: %s parent: %s\n",
 			dev_name(iter), dev_name(dport_dev),
 			dev_name(uport_dev));
-		port = find_cxl_port(dport_dev);
+		port = find_cxl_port(dport_dev, &dport);
 		if (port) {
 			dev_dbg(&cxlmd->dev,
 				"found already registered port %s:%s\n",
 				dev_name(&port->dev), dev_name(port->uport));
-			rc = cxl_add_ep(port, &cxlmd->dev);
+			rc = cxl_add_ep(dport, &cxlmd->dev);
 
 			/*
 			 * If the endpoint already exists in the port's list,
@@ -1258,7 +1276,7 @@ EXPORT_SYMBOL_NS_GPL(devm_cxl_enumerate_ports, CXL);
 
 struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd)
 {
-	return find_cxl_port(grandparent(&cxlmd->dev));
+	return find_cxl_port(grandparent(&cxlmd->dev), NULL);
 }
 EXPORT_SYMBOL_NS_GPL(cxl_mem_find_port, CXL);
 
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index d8edbdaa6208..e654251a54dd 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -363,10 +363,12 @@ struct cxl_dport {
 /**
  * struct cxl_ep - track an endpoint's interest in a port
  * @ep: device that hosts a generic CXL endpoint (expander or accelerator)
+ * @dport: which dport routes to this endpoint on this port
  * @list: node on port->endpoints list
  */
 struct cxl_ep {
 	struct device *ep;
+	struct cxl_dport *dport;
 	struct list_head list;
 };
 


^ permalink raw reply related	[flat|nested] 157+ messages in thread

* Re: [PATCH 02/46] cxl/port: Keep port->uport valid for the entire life of a port
  2022-06-24  2:45 ` [PATCH 02/46] cxl/port: Keep port->uport valid for the entire life of a port Dan Williams
@ 2022-06-24  3:37   ` Alison Schofield
  2022-06-28 11:47   ` Jonathan Cameron
       [not found]   ` <CGME20220629174622uscas1p2236a084ce25771a3ab57c6f006632f35@uscas1p2.samsung.com>
  2 siblings, 0 replies; 157+ messages in thread
From: Alison Schofield @ 2022-06-24  3:37 UTC (permalink / raw)
  To: Williams, Dan J; +Cc: linux-cxl, hch, nvdimm, linux-pci, patches

On Thu, Jun 23, 2022 at 07:45:14PM -0700, Dan Williams wrote:
> The upcoming region provisioning implementation has a need to
> dereference port->uport during the port unregister flow. Specifically,
> endpoint decoders need to be able to lookup their corresponding memdev
> via port->uport.
> 
> The existing ->dead flag was added for cases where the core was
> committed to tearing down the port, but needed to drop locks before
> calling device_unregister(). Reuse that flag to indicate to
> delete_endpoint() that it has no "release action" work to do as
> unregister_port() will handle it.
> 
> Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver")
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Alison Schofield <alison.schofield@intel.com>

> ---
>  drivers/cxl/core/port.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index dbce99bdffab..7810d1a8369b 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -370,7 +370,7 @@ static void unregister_port(void *_port)
>  		lock_dev = &parent->dev;
>  
>  	device_lock_assert(lock_dev);
> -	port->uport = NULL;
> +	port->dead = true;
>  	device_unregister(&port->dev);
>  }
>  
> @@ -857,7 +857,7 @@ static void delete_endpoint(void *data)
>  	parent = &parent_port->dev;
>  
>  	device_lock(parent);
> -	if (parent->driver && endpoint->uport) {
> +	if (parent->driver && !endpoint->dead) {
>  		devm_release_action(parent, cxl_unlink_uport, endpoint);
>  		devm_release_action(parent, unregister_port, endpoint);
>  	}
> 

^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH 03/46] cxl/hdm: Use local hdm variable
  2022-06-24  2:45 ` [PATCH 03/46] cxl/hdm: Use local hdm variable Dan Williams
@ 2022-06-24  3:38   ` Alison Schofield
  2022-06-28 15:16   ` Jonathan Cameron
       [not found]   ` <CGME20220629200312uscas1p292303b9325dcbfe59293f002dc9e6b03@uscas1p2.samsung.com>
  2 siblings, 0 replies; 157+ messages in thread
From: Alison Schofield @ 2022-06-24  3:38 UTC (permalink / raw)
  To: Williams, Dan J; +Cc: linux-cxl, Ben Widawsky, hch, nvdimm, linux-pci, patches

On Thu, Jun 23, 2022 at 07:45:21PM -0700, Dan Williams wrote:
> From: Ben Widawsky <bwidawsk@kernel.org>
> 
> Save a few characters and use the already initialized local variable.
> 
> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Alison Schofield <alison.schofield@intel.com>

> ---
>  drivers/cxl/core/hdm.c |    3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index bfc8ee876278..ba3d2d959c71 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -251,8 +251,7 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
>  			return PTR_ERR(cxld);
>  		}
>  
> -		rc = init_hdm_decoder(port, cxld, target_map,
> -				      cxlhdm->regs.hdm_decoder, i);
> +		rc = init_hdm_decoder(port, cxld, target_map, hdm, i);
>  		if (rc) {
>  			put_device(&cxld->dev);
>  			failed++;
> 

^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH 04/46] cxl/core: Rename ->decoder_range ->hpa_range
  2022-06-24  2:45 ` [PATCH 04/46] cxl/core: Rename ->decoder_range ->hpa_range Dan Williams
@ 2022-06-24  3:39   ` Alison Schofield
  2022-06-28 15:17   ` Jonathan Cameron
       [not found]   ` <CGME20220629200652uscas1p2c1da644ea63a5de69e14e046379779b1@uscas1p2.samsung.com>
  2 siblings, 0 replies; 157+ messages in thread
From: Alison Schofield @ 2022-06-24  3:39 UTC (permalink / raw)
  To: Williams, Dan J; +Cc: linux-cxl, hch, nvdimm, linux-pci, patches

On Thu, Jun 23, 2022 at 07:45:28PM -0700, Dan Williams wrote:
> In preparation for growing a ->dpa_range attribute for endpoint
> decoders, rename the current ->decoder_range to the more descriptive
> ->hpa_range.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Alison Schofield <alison.schofield@intel.com>

> ---
>  drivers/cxl/core/hdm.c       |    2 +-
>  drivers/cxl/core/port.c      |    4 ++--
>  drivers/cxl/cxl.h            |    4 ++--
>  tools/testing/cxl/test/cxl.c |    2 +-
>  4 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index ba3d2d959c71..5c070c93b07f 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -172,7 +172,7 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
>  		return -ENXIO;
>  	}
>  
> -	cxld->decoder_range = (struct range) {
> +	cxld->hpa_range = (struct range) {
>  		.start = base,
>  		.end = base + size - 1,
>  	};
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 7810d1a8369b..98bcbbd59a75 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -78,7 +78,7 @@ static ssize_t start_show(struct device *dev, struct device_attribute *attr,
>  	if (is_root_decoder(dev))
>  		start = cxld->platform_res.start;
>  	else
> -		start = cxld->decoder_range.start;
> +		start = cxld->hpa_range.start;
>  
>  	return sysfs_emit(buf, "%#llx\n", start);
>  }
> @@ -93,7 +93,7 @@ static ssize_t size_show(struct device *dev, struct device_attribute *attr,
>  	if (is_root_decoder(dev))
>  		size = resource_size(&cxld->platform_res);
>  	else
> -		size = range_len(&cxld->decoder_range);
> +		size = range_len(&cxld->hpa_range);
>  
>  	return sysfs_emit(buf, "%#llx\n", size);
>  }
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 6799b27c7db2..8256728cea8d 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -198,7 +198,7 @@ enum cxl_decoder_type {
>   * @dev: this decoder's device
>   * @id: kernel device name id
>   * @platform_res: address space resources considered by root decoder
> - * @decoder_range: address space resources considered by midlevel decoder
> + * @hpa_range: Host physical address range mapped by this decoder
>   * @interleave_ways: number of cxl_dports in this decode
>   * @interleave_granularity: data stride per dport
>   * @target_type: accelerator vs expander (type2 vs type3) selector
> @@ -212,7 +212,7 @@ struct cxl_decoder {
>  	int id;
>  	union {
>  		struct resource platform_res;
> -		struct range decoder_range;
> +		struct range hpa_range;
>  	};
>  	int interleave_ways;
>  	int interleave_granularity;
> diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
> index 431f2bddf6c8..7a08b025f2de 100644
> --- a/tools/testing/cxl/test/cxl.c
> +++ b/tools/testing/cxl/test/cxl.c
> @@ -461,7 +461,7 @@ static int mock_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
>  			return PTR_ERR(cxld);
>  		}
>  
> -		cxld->decoder_range = (struct range) {
> +		cxld->hpa_range = (struct range) {
>  			.start = 0,
>  			.end = -1,
>  		};
> 


* Re: [PATCH 06/46] cxl/core: Drop is_cxl_decoder()
  2022-06-24  2:45 ` [PATCH 06/46] cxl/core: Drop is_cxl_decoder() Dan Williams
@ 2022-06-24  3:48   ` Alison Schofield
  2022-06-28 15:25   ` Jonathan Cameron
       [not found]   ` <CGME20220629203448uscas1p264a7f79a1ed7f9257eefcb3064c7d943@uscas1p2.samsung.com>
  2 siblings, 0 replies; 157+ messages in thread
From: Alison Schofield @ 2022-06-24  3:48 UTC (permalink / raw)
  To: Williams, Dan J; +Cc: linux-cxl, hch, nvdimm, linux-pci, patches

On Thu, Jun 23, 2022 at 07:45:43PM -0700, Dan Williams wrote:
> This helper was only used to identify the object type for lockdep
> purposes. Now that lockdep support is done with explicit lock classes,
> this helper can be dropped.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Alison Schofield <alison.schofield@intel.com>

> ---
>  drivers/cxl/core/port.c |    6 ------
>  drivers/cxl/cxl.h       |    1 -
>  2 files changed, 7 deletions(-)
> 
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index b51eb41aa839..13c321afe076 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -271,12 +271,6 @@ bool is_root_decoder(struct device *dev)
>  }
>  EXPORT_SYMBOL_NS_GPL(is_root_decoder, CXL);
>  
> -bool is_cxl_decoder(struct device *dev)
> -{
> -	return dev->type && dev->type->release == cxl_decoder_release;
> -}
> -EXPORT_SYMBOL_NS_GPL(is_cxl_decoder, CXL);
> -
>  struct cxl_decoder *to_cxl_decoder(struct device *dev)
>  {
>  	if (dev_WARN_ONCE(dev, dev->type->release != cxl_decoder_release,
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 35ce17872fc1..6e08fe8cc0fe 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -337,7 +337,6 @@ struct cxl_dport *cxl_find_dport_by_dev(struct cxl_port *port,
>  struct cxl_decoder *to_cxl_decoder(struct device *dev);
>  bool is_root_decoder(struct device *dev);
>  bool is_endpoint_decoder(struct device *dev);
> -bool is_cxl_decoder(struct device *dev);
>  struct cxl_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
>  					   unsigned int nr_targets);
>  struct cxl_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
> 


* [PATCH 26/46] cxl/port: Record parent dport when adding ports
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (24 preceding siblings ...)
  2022-06-24  2:48 ` [PATCH 25/46] cxl/port: Record dport in endpoint references Dan Williams
@ 2022-06-24  4:19 ` Dan Williams
  2022-06-29 17:02   ` Jonathan Cameron
  2022-06-24  4:19 ` [PATCH 27/46] cxl/port: Move 'cxl_ep' references to an xarray per port Dan Williams
                   ` (22 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  4:19 UTC (permalink / raw)
  To: linux-cxl; +Cc: nvdimm, linux-pci, patches, hch, Dan Williams

At the time that cxl_port instances are being created, cache the dport
from the parent port that points to this new child port. This will be
useful for region provisioning when walking the tree to calculate
decoder targets, and saves rewalking the dport list after the fact to
build this information.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
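For illustration only, not part of the patch: a minimal userspace sketch
of why caching the parent dport helps. The names below (struct port,
struct dport, port_init(), collect_route()) are simplified, hypothetical
stand-ins and not the kernel definitions. Assuming each port records the
dport that routes to it, a walk toward the root can read the per-hop
target id directly instead of searching the parent's dport collection:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for cxl_port / cxl_dport (assumed, not the real ones) */
struct port;

struct dport {
	int port_id;		/* hardware id used in decoder target lists */
	struct port *port;	/* the port that contains this dport */
};

struct port {
	unsigned int depth;		/* 0 is the root */
	struct dport *parent_dport;	/* dport in the parent that routes here */
};

/* Loosely mirrors the cxl_port_alloc() hunk: set depth, cache the dport */
static void port_init(struct port *port, struct dport *parent_dport)
{
	port->depth = 0;
	port->parent_dport = parent_dport;
	if (parent_dport) {
		assert(parent_dport->port != NULL);
		port->depth = parent_dport->port->depth + 1;
	}
}

/*
 * Walk from @port toward the root, recording the id of the dport taken at
 * each hop. Returns the number of ids written to @ids (at most @max).
 */
static size_t collect_route(const struct port *port, int *ids, size_t max)
{
	size_t n = 0;

	while (port->parent_dport && n < max) {
		ids[n++] = port->parent_dport->port_id;
		port = port->parent_dport->port;
	}
	return n;
}
```

The walk costs one pointer dereference per level, which is the saving
the changelog describes as not having to rewalk the dport list.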
 drivers/cxl/acpi.c      |  3 +--
 drivers/cxl/core/port.c | 30 +++++++++++++++++-------------
 drivers/cxl/cxl.h       |  7 +++++--
 drivers/cxl/mem.c       | 10 ++++++----
 4 files changed, 29 insertions(+), 21 deletions(-)

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 5972f380cdf2..09fe92177d03 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -212,8 +212,7 @@ static int add_host_bridge_uport(struct device *match, void *arg)
 	if (rc)
 		return rc;
 
-	port = devm_cxl_add_port(host, match, dport->component_reg_phys,
-				 root_port);
+	port = devm_cxl_add_port(host, match, dport->component_reg_phys, dport);
 	if (IS_ERR(port))
 		return PTR_ERR(port);
 	dev_dbg(host, "%s: add: %s\n", dev_name(match), dev_name(&port->dev));
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index c54e1dbf92cb..8f53f59dd0fa 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -526,7 +526,7 @@ static struct lock_class_key cxl_port_key;
 
 static struct cxl_port *cxl_port_alloc(struct device *uport,
 				       resource_size_t component_reg_phys,
-				       struct cxl_port *parent_port)
+				       struct cxl_dport *parent_dport)
 {
 	struct cxl_port *port;
 	struct device *dev;
@@ -548,9 +548,12 @@ static struct cxl_port *cxl_port_alloc(struct device *uport,
 	 * description.
 	 */
 	dev = &port->dev;
-	if (parent_port) {
-		dev->parent = &parent_port->dev;
+	if (parent_dport) {
+		struct cxl_port *parent_port = parent_dport->port;
+
 		port->depth = parent_port->depth + 1;
+		port->parent_dport = parent_dport;
+		dev->parent = &parent_port->dev;
 	} else
 		dev->parent = uport;
 
@@ -579,24 +582,24 @@ static struct cxl_port *cxl_port_alloc(struct device *uport,
  * @host: host device for devm operations
  * @uport: "physical" device implementing this upstream port
  * @component_reg_phys: (optional) for configurable cxl_port instances
- * @parent_port: next hop up in the CXL memory decode hierarchy
+ * @parent_dport: next hop up in the CXL memory decode hierarchy
  */
 struct cxl_port *devm_cxl_add_port(struct device *host, struct device *uport,
 				   resource_size_t component_reg_phys,
-				   struct cxl_port *parent_port)
+				   struct cxl_dport *parent_dport)
 {
 	struct cxl_port *port;
 	struct device *dev;
 	int rc;
 
-	port = cxl_port_alloc(uport, component_reg_phys, parent_port);
+	port = cxl_port_alloc(uport, component_reg_phys, parent_dport);
 	if (IS_ERR(port))
 		return port;
 
 	dev = &port->dev;
 	if (is_cxl_memdev(uport))
 		rc = dev_set_name(dev, "endpoint%d", port->id);
-	else if (parent_port)
+	else if (parent_dport)
 		rc = dev_set_name(dev, "port%d", port->id);
 	else
 		rc = dev_set_name(dev, "root%d", port->id);
@@ -998,7 +1001,7 @@ static void delete_endpoint(void *data)
 	struct cxl_port *parent_port;
 	struct device *parent;
 
-	parent_port = cxl_mem_find_port(cxlmd);
+	parent_port = cxl_mem_find_port(cxlmd, NULL);
 	if (!parent_port)
 		goto out;
 	parent = &parent_port->dev;
@@ -1133,8 +1136,8 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd,
 {
 	struct device *dparent = grandparent(dport_dev);
 	struct cxl_port *port, *parent_port = NULL;
+	struct cxl_dport *dport, *parent_dport;
 	resource_size_t component_reg_phys;
-	struct cxl_dport *dport;
 	int rc;
 
 	if (!dparent) {
@@ -1148,7 +1151,7 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd,
 		return -ENXIO;
 	}
 
-	parent_port = find_cxl_port(dparent, NULL);
+	parent_port = find_cxl_port(dparent, &parent_dport);
 	if (!parent_port) {
 		/* iterate to create this parent_port */
 		return -EAGAIN;
@@ -1167,7 +1170,7 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd,
 	if (!port) {
 		component_reg_phys = find_component_registers(uport_dev);
 		port = devm_cxl_add_port(&parent_port->dev, uport_dev,
-					 component_reg_phys, parent_port);
+					 component_reg_phys, parent_dport);
 		/* retry find to pick up the new dport information */
 		if (!IS_ERR(port))
 			port = find_cxl_port_at(parent_port, dport_dev, &dport);
@@ -1274,9 +1277,10 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
 }
 EXPORT_SYMBOL_NS_GPL(devm_cxl_enumerate_ports, CXL);
 
-struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd)
+struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd,
+				   struct cxl_dport **dport)
 {
-	return find_cxl_port(grandparent(&cxlmd->dev), NULL);
+	return find_cxl_port(grandparent(&cxlmd->dev), dport);
 }
 EXPORT_SYMBOL_NS_GPL(cxl_mem_find_port, CXL);
 
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index e654251a54dd..55d34b1576f1 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -325,6 +325,7 @@ struct cxl_nvdimm {
  * @id: id for port device-name
  * @dports: cxl_dport instances referenced by decoders
  * @endpoints: cxl_ep instances, endpoints that are a descendant of this port
+ * @parent_dport: dport that points to this port in the parent
  * @decoder_ida: allocator for decoder ids
  * @dpa_end: cursor to track highest allocated decoder for allocation ordering
  * @component_reg_phys: component register capability base address (optional)
@@ -337,6 +338,7 @@ struct cxl_port {
 	int id;
 	struct list_head dports;
 	struct list_head endpoints;
+	struct cxl_dport *parent_dport;
 	struct ida decoder_ida;
 	int dpa_end;
 	resource_size_t component_reg_phys;
@@ -391,11 +393,12 @@ int devm_cxl_register_pci_bus(struct device *host, struct device *uport,
 struct pci_bus *cxl_port_to_pci_bus(struct cxl_port *port);
 struct cxl_port *devm_cxl_add_port(struct device *host, struct device *uport,
 				   resource_size_t component_reg_phys,
-				   struct cxl_port *parent_port);
+				   struct cxl_dport *parent_dport);
 struct cxl_port *find_cxl_root(struct device *dev);
 int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd);
 int cxl_bus_rescan(void);
-struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd);
+struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd,
+				   struct cxl_dport **dport);
 bool schedule_cxl_memdev_detach(struct cxl_memdev *cxlmd);
 
 struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port,
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 7513bea55145..2786d3402c9e 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -26,14 +26,15 @@
  */
 
 static int create_endpoint(struct cxl_memdev *cxlmd,
-			   struct cxl_port *parent_port)
+			   struct cxl_dport *parent_dport)
 {
+	struct cxl_port *parent_port = parent_dport->port;
 	struct cxl_dev_state *cxlds = cxlmd->cxlds;
 	struct cxl_port *endpoint;
 	int rc;
 
 	endpoint = devm_cxl_add_port(&parent_port->dev, &cxlmd->dev,
-				     cxlds->component_reg_phys, parent_port);
+				     cxlds->component_reg_phys, parent_dport);
 	if (IS_ERR(endpoint))
 		return PTR_ERR(endpoint);
 
@@ -76,6 +77,7 @@ static int cxl_mem_probe(struct device *dev)
 {
 	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
 	struct cxl_port *parent_port;
+	struct cxl_dport *dport;
 	struct dentry *dentry;
 	int rc;
 
@@ -100,7 +102,7 @@ static int cxl_mem_probe(struct device *dev)
 	if (rc)
 		return rc;
 
-	parent_port = cxl_mem_find_port(cxlmd);
+	parent_port = cxl_mem_find_port(cxlmd, &dport);
 	if (!parent_port) {
 		dev_err(dev, "CXL port topology not found\n");
 		return -ENXIO;
@@ -114,7 +116,7 @@ static int cxl_mem_probe(struct device *dev)
 		goto unlock;
 	}
 
-	rc = create_endpoint(cxlmd, parent_port);
+	rc = create_endpoint(cxlmd, dport);
 unlock:
 	device_unlock(&parent_port->dev);
 	put_device(&parent_port->dev);
-- 
2.36.1



* [PATCH 27/46] cxl/port: Move 'cxl_ep' references to an xarray per port
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (25 preceding siblings ...)
  2022-06-24  4:19 ` [PATCH 26/46] cxl/port: Record parent dport when adding ports Dan Williams
@ 2022-06-24  4:19 ` Dan Williams
  2022-06-29 17:19   ` Jonathan Cameron
  2022-06-24  4:19 ` [PATCH 28/46] cxl/port: Move dport tracking to an xarray Dan Williams
                   ` (21 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  4:19 UTC (permalink / raw)
  To: linux-cxl; +Cc: nvdimm, linux-pci, patches, hch, Dan Williams

In preparation for region provisioning that needs to walk the topology
by endpoints, use an xarray to record endpoint interest in a given port.
In addition to being more space and time efficient, the xarray reduces
the complexity of the implementation by handling locking internally. It
also allows for a single cxl_ep reference to be recorded in multiple
xarrays.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
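For illustration only, not part of the patch: a userspace sketch of the
"index by pointer value" pattern used with the xarray above. It is not
the xarray implementation; struct ptr_map and the ptr_map_*() helpers
are hypothetical stand-ins backed by a small fixed array. The one
behavior it copies on purpose is that a duplicate insert fails with
-EBUSY, as xa_insert() does, which is why the callers in this patch
switch from checking -EEXIST to checking -EBUSY:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTR_MAP_SLOTS 8

/* Hypothetical pointer-keyed map; a NULL value marks a free slot */
struct ptr_map {
	uintptr_t key[PTR_MAP_SLOTS];
	void *val[PTR_MAP_SLOTS];
};

/* Return the value stored for @key, or NULL if there is none */
static void *ptr_map_load(const struct ptr_map *map, const void *key)
{
	for (size_t i = 0; i < PTR_MAP_SLOTS; i++)
		if (map->val[i] && map->key[i] == (uintptr_t)key)
			return map->val[i];
	return NULL;
}

/* Store @val at @key; -EBUSY if @key is occupied, -ENOMEM if full */
static int ptr_map_insert(struct ptr_map *map, const void *key, void *val)
{
	assert(val != NULL);
	if (ptr_map_load(map, key))
		return -EBUSY;
	for (size_t i = 0; i < PTR_MAP_SLOTS; i++) {
		if (!map->val[i]) {
			map->key[i] = (uintptr_t)key;
			map->val[i] = val;
			return 0;
		}
	}
	return -ENOMEM;
}

/* Remove and return the value stored for @key, or NULL if there is none */
static void *ptr_map_erase(struct ptr_map *map, const void *key)
{
	for (size_t i = 0; i < PTR_MAP_SLOTS; i++) {
		if (map->val[i] && map->key[i] == (uintptr_t)key) {
			void *old = map->val[i];

			map->val[i] = NULL;
			return old;
		}
	}
	return NULL;
}

static bool ptr_map_empty(const struct ptr_map *map)
{
	for (size_t i = 0; i < PTR_MAP_SLOTS; i++)
		if (map->val[i])
			return false;
	return true;
}
```

With this shape the separate find-then-add step of the list version
collapses into the insert call, and the "last endpoint gone" check
becomes an emptiness test on the map.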
 drivers/cxl/core/port.c | 60 ++++++++++++++++++++---------------------
 drivers/cxl/cxl.h       |  4 +--
 2 files changed, 30 insertions(+), 34 deletions(-)

diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 8f53f59dd0fa..ea3ab9baf232 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -431,22 +431,27 @@ static struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev)
 
 static void cxl_ep_release(struct cxl_ep *ep)
 {
-	if (!ep)
-		return;
-	list_del(&ep->list);
 	put_device(ep->ep);
 	kfree(ep);
 }
 
+static void cxl_ep_remove(struct cxl_port *port, struct cxl_ep *ep)
+{
+	if (!ep)
+		return;
+	xa_erase(&port->endpoints, (unsigned long) ep->ep);
+	cxl_ep_release(ep);
+}
+
 static void cxl_port_release(struct device *dev)
 {
 	struct cxl_port *port = to_cxl_port(dev);
-	struct cxl_ep *ep, *_e;
+	unsigned long index;
+	struct cxl_ep *ep;
 
-	device_lock(dev);
-	list_for_each_entry_safe(ep, _e, &port->endpoints, list)
-		cxl_ep_release(ep);
-	device_unlock(dev);
+	xa_for_each(&port->endpoints, index, ep)
+		cxl_ep_remove(port, ep);
+	xa_destroy(&port->endpoints);
 	ida_free(&cxl_port_ida, port->id);
 	kfree(port);
 }
@@ -562,7 +567,7 @@ static struct cxl_port *cxl_port_alloc(struct device *uport,
 	ida_init(&port->decoder_ida);
 	port->dpa_end = -1;
 	INIT_LIST_HEAD(&port->dports);
-	INIT_LIST_HEAD(&port->endpoints);
+	xa_init(&port->endpoints);
 
 	device_initialize(dev);
 	lockdep_set_class_and_subclass(&dev->mutex, &cxl_port_key, port->depth);
@@ -858,33 +863,21 @@ struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port,
 }
 EXPORT_SYMBOL_NS_GPL(devm_cxl_add_dport, CXL);
 
-static struct cxl_ep *find_ep(struct cxl_port *port, struct device *ep_dev)
-{
-	struct cxl_ep *ep;
-
-	device_lock_assert(&port->dev);
-	list_for_each_entry(ep, &port->endpoints, list)
-		if (ep->ep == ep_dev)
-			return ep;
-	return NULL;
-}
-
 static int add_ep(struct cxl_ep *new)
 {
 	struct cxl_port *port = new->dport->port;
-	struct cxl_ep *dup;
+	int rc;
 
 	device_lock(&port->dev);
 	if (port->dead) {
 		device_unlock(&port->dev);
 		return -ENXIO;
 	}
-	dup = find_ep(port, new->ep);
-	if (!dup)
-		list_add_tail(&new->list, &port->endpoints);
+	rc = xa_insert(&port->endpoints, (unsigned long)new->ep, new,
+		       GFP_KERNEL);
 	device_unlock(&port->dev);
 
-	return dup ? -EEXIST : 0;
+	return rc;
 }
 
 /**
@@ -905,7 +898,6 @@ static int cxl_add_ep(struct cxl_dport *dport, struct device *ep_dev)
 	if (!ep)
 		return -ENOMEM;
 
-	INIT_LIST_HEAD(&ep->list);
 	ep->ep = get_device(ep_dev);
 	ep->dport = dport;
 
@@ -1048,6 +1040,12 @@ static void delete_switch_port(struct cxl_port *port, struct list_head *dports)
 	devm_release_action(port->dev.parent, unregister_port, port);
 }
 
+static struct cxl_ep *cxl_ep_load(struct cxl_port *port,
+				  struct cxl_memdev *cxlmd)
+{
+	return xa_load(&port->endpoints, (unsigned long)&cxlmd->dev);
+}
+
 static void cxl_detach_ep(void *data)
 {
 	struct cxl_memdev *cxlmd = data;
@@ -1086,11 +1084,11 @@ static void cxl_detach_ep(void *data)
 		}
 
 		device_lock(&port->dev);
-		ep = find_ep(port, &cxlmd->dev);
+		ep = cxl_ep_load(port, cxlmd);
 		dev_dbg(&cxlmd->dev, "disconnect %s from %s\n",
 			ep ? dev_name(ep->ep) : "", dev_name(&port->dev));
-		cxl_ep_release(ep);
-		if (ep && !port->dead && list_empty(&port->endpoints) &&
+		cxl_ep_remove(port, ep);
+		if (ep && !port->dead && xa_empty(&port->endpoints) &&
 		    !is_cxl_root(parent_port)) {
 			/*
 			 * This was the last ep attached to a dynamically
@@ -1184,7 +1182,7 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd,
 		dev_dbg(&cxlmd->dev, "add to new port %s:%s\n",
 			dev_name(&port->dev), dev_name(port->uport));
 		rc = cxl_add_ep(dport, &cxlmd->dev);
-		if (rc == -EEXIST) {
+		if (rc == -EBUSY) {
 			/*
 			 * "can't" happen, but this error code means
 			 * something to the caller, so translate it.
@@ -1247,7 +1245,7 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
 			 * the parent_port lock as the current port may be being
 			 * reaped.
 			 */
-			if (rc && rc != -EEXIST) {
+			if (rc && rc != -EBUSY) {
 				put_device(&port->dev);
 				return rc;
 			}
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 55d34b1576f1..3d149780d724 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -337,7 +337,7 @@ struct cxl_port {
 	struct device *uport;
 	int id;
 	struct list_head dports;
-	struct list_head endpoints;
+	struct xarray endpoints;
 	struct cxl_dport *parent_dport;
 	struct ida decoder_ida;
 	int dpa_end;
@@ -366,12 +366,10 @@ struct cxl_dport {
  * struct cxl_ep - track an endpoint's interest in a port
  * @ep: device that hosts a generic CXL endpoint (expander or accelerator)
  * @dport: which dport routes to this endpoint on this port
- * @list: node on port->endpoints list
  */
 struct cxl_ep {
 	struct device *ep;
 	struct cxl_dport *dport;
-	struct list_head list;
 };
 
 /*
-- 
2.36.1



* [PATCH 28/46] cxl/port: Move dport tracking to an xarray
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (26 preceding siblings ...)
  2022-06-24  4:19 ` [PATCH 27/46] cxl/port: Move 'cxl_ep' references to an xarray per port Dan Williams
@ 2022-06-24  4:19 ` Dan Williams
  2022-06-30  9:18   ` Jonathan Cameron
  2022-06-24  4:19 ` [PATCH 29/46] cxl/port: Cache CXL host bridge data Dan Williams
                   ` (20 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  4:19 UTC (permalink / raw)
  To: linux-cxl; +Cc: nvdimm, linux-pci, patches, hch, Dan Williams

Track dports in an xarray keyed by the dport device pointer rather than
in a linked list. This reduces the complexity and the overhead of
walking the topology to determine endpoint connectivity to root decoder
interleave configurations.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
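For illustration only, not part of the patch: a userspace sketch of the
loop inversion in match_root_child(). The list version iterated every
dport and, for each one, walked the ancestors of the device being
matched. With a keyed lookup the ancestors are walked once and each is
looked up directly. The types and helpers here (struct device with only
a parent pointer, dport_load(), find_ancestor_dport()) are hypothetical
stand-ins, and dport_load() is a linear scan standing in for the
xa_load() based cxl_dport_load():

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins; the real structures carry far more state */
struct device {
	struct device *parent;
};

struct dport {
	const struct device *dev;	/* device representing the downstream link */
	int port_id;
};

struct port {
	struct dport *dports;
	size_t nr_dports;
};

/* Stand-in for cxl_dport_load(): find the dport keyed by @dev, if any */
static struct dport *dport_load(const struct port *port,
				const struct device *dev)
{
	for (size_t i = 0; i < port->nr_dports; i++)
		if (port->dports[i].dev == dev)
			return &port->dports[i];
	return NULL;
}

/*
 * Walk @match and its ancestors once, returning the first dport of @port
 * that any of them corresponds to, or NULL if none does.
 */
static struct dport *find_ancestor_dport(const struct port *port,
					 const struct device *match)
{
	assert(port != NULL);
	for (const struct device *iter = match; iter; iter = iter->parent) {
		struct dport *dport = dport_load(port, iter);

		if (dport)
			return dport;
	}
	return NULL;
}
```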
 drivers/cxl/acpi.c      |  2 +-
 drivers/cxl/core/hdm.c  |  6 ++-
 drivers/cxl/core/port.c | 88 ++++++++++++++++++-----------------------
 drivers/cxl/cxl.h       | 12 +++---
 4 files changed, 51 insertions(+), 57 deletions(-)

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 09fe92177d03..92ad1f359faf 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -197,7 +197,7 @@ static int add_host_bridge_uport(struct device *match, void *arg)
 	if (!bridge)
 		return 0;
 
-	dport = cxl_find_dport_by_dev(root_port, match);
+	dport = cxl_dport_load(root_port, match);
 	if (!dport) {
 		dev_dbg(host, "host bridge expected and not found\n");
 		return 0;
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index c0164f9b2195..672bf3e97811 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -50,8 +50,9 @@ static int add_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
 int devm_cxl_add_passthrough_decoder(struct cxl_port *port)
 {
 	struct cxl_switch_decoder *cxlsd;
-	struct cxl_dport *dport;
+	struct cxl_dport *dport = NULL;
 	int single_port_map[1];
+	unsigned long index;
 
 	cxlsd = cxl_switch_decoder_alloc(port, 1);
 	if (IS_ERR(cxlsd))
@@ -59,7 +60,8 @@ int devm_cxl_add_passthrough_decoder(struct cxl_port *port)
 
 	device_lock_assert(&port->dev);
 
-	dport = list_first_entry(&port->dports, typeof(*dport), list);
+	xa_for_each(&port->dports, index, dport)
+		break;
 	single_port_map[0] = dport->port_id;
 
 	return add_hdm_decoder(port, &cxlsd->cxld, single_port_map);
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index ea3ab9baf232..d2f6898940fa 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -452,6 +452,7 @@ static void cxl_port_release(struct device *dev)
 	xa_for_each(&port->endpoints, index, ep)
 		cxl_ep_remove(port, ep);
 	xa_destroy(&port->endpoints);
+	xa_destroy(&port->dports);
 	ida_free(&cxl_port_ida, port->id);
 	kfree(port);
 }
@@ -566,7 +567,7 @@ static struct cxl_port *cxl_port_alloc(struct device *uport,
 	port->component_reg_phys = component_reg_phys;
 	ida_init(&port->decoder_ida);
 	port->dpa_end = -1;
-	INIT_LIST_HEAD(&port->dports);
+	xa_init(&port->dports);
 	xa_init(&port->endpoints);
 
 	device_initialize(dev);
@@ -696,17 +697,13 @@ static int match_root_child(struct device *dev, const void *match)
 		return 0;
 
 	port = to_cxl_port(dev);
-	device_lock(dev);
-	list_for_each_entry(dport, &port->dports, list) {
-		iter = match;
-		while (iter) {
-			if (iter == dport->dport)
-				goto out;
-			iter = iter->parent;
-		}
+	iter = match;
+	while (iter) {
+		dport = cxl_dport_load(port, iter);
+		if (dport)
+			break;
+		iter = iter->parent;
 	}
-out:
-	device_unlock(dev);
 
 	return !!iter;
 }
@@ -730,9 +727,10 @@ EXPORT_SYMBOL_NS_GPL(find_cxl_root, CXL);
 static struct cxl_dport *find_dport(struct cxl_port *port, int id)
 {
 	struct cxl_dport *dport;
+	unsigned long index;
 
 	device_lock_assert(&port->dev);
-	list_for_each_entry (dport, &port->dports, list)
+	xa_for_each(&port->dports, index, dport)
 		if (dport->port_id == id)
 			return dport;
 	return NULL;
@@ -741,18 +739,21 @@ static struct cxl_dport *find_dport(struct cxl_port *port, int id)
 static int add_dport(struct cxl_port *port, struct cxl_dport *new)
 {
 	struct cxl_dport *dup;
+	int rc;
 
 	device_lock_assert(&port->dev);
 	dup = find_dport(port, new->port_id);
-	if (dup)
+	if (dup) {
 		dev_err(&port->dev,
 			"unable to add dport%d-%s non-unique port id (%s)\n",
 			new->port_id, dev_name(new->dport),
 			dev_name(dup->dport));
-	else
-		list_add_tail(&new->list, &port->dports);
+		rc = -EBUSY;
+	} else
+		rc = xa_insert(&port->dports, (unsigned long)new->dport, new,
+			       GFP_KERNEL);
 
-	return dup ? -EEXIST : 0;
+	return rc;
 }
 
 /*
@@ -779,10 +780,8 @@ static void cxl_dport_remove(void *data)
 	struct cxl_dport *dport = data;
 	struct cxl_port *port = dport->port;
 
+	xa_erase(&port->dports, (unsigned long) dport->dport);
 	put_device(dport->dport);
-	cond_cxl_root_lock(port);
-	list_del(&dport->list);
-	cond_cxl_root_unlock(port);
 }
 
 static void cxl_dport_unlink(void *data)
@@ -834,7 +833,6 @@ struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port,
 	if (!dport)
 		return ERR_PTR(-ENOMEM);
 
-	INIT_LIST_HEAD(&dport->list);
 	dport->dport = dport_dev;
 	dport->port_id = port_id;
 	dport->component_reg_phys = component_reg_phys;
@@ -925,7 +923,7 @@ static int match_port_by_dport(struct device *dev, const void *data)
 		return 0;
 
 	port = to_cxl_port(dev);
-	dport = cxl_find_dport_by_dev(port, ctx->dport_dev);
+	dport = cxl_dport_load(port, ctx->dport_dev);
 	if (ctx->dport)
 		*ctx->dport = dport;
 	return dport != NULL;
@@ -1025,19 +1023,27 @@ EXPORT_SYMBOL_NS_GPL(cxl_endpoint_autoremove, CXL);
  * for a port to be unregistered is when all memdevs beneath that port have gone
  * through ->remove(). This "bottom-up" removal selectively removes individual
  * child ports manually. This depends on devm_cxl_add_port() to not change is
- * devm action registration order.
+ * devm action registration order, and for dports to have already been
+ * destroyed by reap_dports().
  */
-static void delete_switch_port(struct cxl_port *port, struct list_head *dports)
+static void delete_switch_port(struct cxl_port *port)
+{
+	devm_release_action(port->dev.parent, cxl_unlink_uport, port);
+	devm_release_action(port->dev.parent, unregister_port, port);
+}
+
+static void reap_dports(struct cxl_port *port)
 {
-	struct cxl_dport *dport, *_d;
+	struct cxl_dport *dport;
+	unsigned long index;
+
+	device_lock_assert(&port->dev);
 
-	list_for_each_entry_safe(dport, _d, dports, list) {
+	xa_for_each(&port->dports, index, dport) {
 		devm_release_action(&port->dev, cxl_dport_unlink, dport);
 		devm_release_action(&port->dev, cxl_dport_remove, dport);
 		devm_kfree(&port->dev, dport);
 	}
-	devm_release_action(port->dev.parent, cxl_unlink_uport, port);
-	devm_release_action(port->dev.parent, unregister_port, port);
 }
 
 static struct cxl_ep *cxl_ep_load(struct cxl_port *port,
@@ -1054,8 +1060,8 @@ static void cxl_detach_ep(void *data)
 	for (iter = &cxlmd->dev; iter; iter = grandparent(iter)) {
 		struct device *dport_dev = grandparent(iter);
 		struct cxl_port *port, *parent_port;
-		LIST_HEAD(reap_dports);
 		struct cxl_ep *ep;
+		bool died = false;
 
 		if (!dport_dev)
 			break;
@@ -1095,15 +1101,16 @@ static void cxl_detach_ep(void *data)
 			 * enumerated port. Block new cxl_add_ep() and garbage
 			 * collect the port.
 			 */
+			died = true;
 			port->dead = true;
-			list_splice_init(&port->dports, &reap_dports);
+			reap_dports(port);
 		}
 		device_unlock(&port->dev);
 
-		if (!list_empty(&reap_dports)) {
+		if (died) {
 			dev_dbg(&cxlmd->dev, "delete %s\n",
 				dev_name(&port->dev));
-			delete_switch_port(port, &reap_dports);
+			delete_switch_port(port);
 		}
 		put_device(&port->dev);
 		device_unlock(&parent_port->dev);
@@ -1282,23 +1289,6 @@ struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd,
 }
 EXPORT_SYMBOL_NS_GPL(cxl_mem_find_port, CXL);
 
-struct cxl_dport *cxl_find_dport_by_dev(struct cxl_port *port,
-					const struct device *dev)
-{
-	struct cxl_dport *dport;
-
-	device_lock(&port->dev);
-	list_for_each_entry(dport, &port->dports, list)
-		if (dport->dport == dev) {
-			device_unlock(&port->dev);
-			return dport;
-		}
-
-	device_unlock(&port->dev);
-	return NULL;
-}
-EXPORT_SYMBOL_NS_GPL(cxl_find_dport_by_dev, CXL);
-
 static int decoder_populate_targets(struct cxl_switch_decoder *cxlsd,
 				    struct cxl_port *port, int *target_map)
 {
@@ -1309,7 +1299,7 @@ static int decoder_populate_targets(struct cxl_switch_decoder *cxlsd,
 
 	device_lock_assert(&port->dev);
 
-	if (list_empty(&port->dports))
+	if (xa_empty(&port->dports))
 		return -EINVAL;
 
 	write_seqlock(&cxlsd->target_lock);
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 3d149780d724..8e2c1b393552 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -336,7 +336,7 @@ struct cxl_port {
 	struct device dev;
 	struct device *uport;
 	int id;
-	struct list_head dports;
+	struct xarray dports;
 	struct xarray endpoints;
 	struct cxl_dport *parent_dport;
 	struct ida decoder_ida;
@@ -346,20 +346,24 @@ struct cxl_port {
 	unsigned int depth;
 };
 
+static inline struct cxl_dport *cxl_dport_load(struct cxl_port *port,
+					       const struct device *dport_dev)
+{
+	return xa_load(&port->dports, (unsigned long)dport_dev);
+}
+
 /**
  * struct cxl_dport - CXL downstream port
  * @dport: PCI bridge or firmware device representing the downstream link
  * @port_id: unique hardware identifier for dport in decoder target list
  * @component_reg_phys: downstream port component registers
  * @port: reference to cxl_port that contains this downstream port
- * @list: node for a cxl_port's list of cxl_dport instances
  */
 struct cxl_dport {
 	struct device *dport;
 	int port_id;
 	resource_size_t component_reg_phys;
 	struct cxl_port *port;
-	struct list_head list;
 };
 
 /**
@@ -402,8 +406,6 @@ bool schedule_cxl_memdev_detach(struct cxl_memdev *cxlmd);
 struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port,
 				     struct device *dport, int port_id,
 				     resource_size_t component_reg_phys);
-struct cxl_dport *cxl_find_dport_by_dev(struct cxl_port *port,
-					const struct device *dev);
 
 struct cxl_decoder *to_cxl_decoder(struct device *dev);
 struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev);
-- 
2.36.1



* [PATCH 29/46] cxl/port: Cache CXL host bridge data
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (27 preceding siblings ...)
  2022-06-24  4:19 ` [PATCH 28/46] cxl/port: Move dport tracking to an xarray Dan Williams
@ 2022-06-24  4:19 ` Dan Williams
  2022-06-30  9:21   ` Jonathan Cameron
  2022-06-24  4:19 ` [PATCH 30/46] cxl/hdm: Add sysfs attributes for interleave ways + granularity Dan Williams
                   ` (19 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  4:19 UTC (permalink / raw)
  To: linux-cxl; +Cc: nvdimm, linux-pci, patches, hch, Dan Williams

Region creation needs to check host-bridge connectivity when adding
endpoints to regions. Record the host bridge at port creation time to
provide a shortcut from any location in the topology to the
most-significant ancestor.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
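For illustration only, not part of the patch: a userspace sketch of the
host-bridge walk added to cxl_port_alloc(). The types and names (struct
port with a plain parent pointer, is_root(), port_set_host_bridge()) are
simplified, hypothetical stand-ins. The idea it tries to show is that
the walk stops early at the first ancestor that already cached a host
bridge, and otherwise stops at the port directly below the root, whose
uport is taken to be the host bridge:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct device {
	const char *name;
};

/* Stand-in for cxl_port; the kernel derives the parent from dev.parent */
struct port {
	struct port *parent;		/* NULL for the root port */
	struct device *uport;		/* device implementing this port */
	struct device *host_bridge;	/* cached shortcut, NULL until set */
};

static bool is_root(const struct port *port)
{
	return port->parent == NULL;
}

/* Cache the host bridge for a non-root @port */
static void port_set_host_bridge(struct port *port)
{
	struct port *iter = port;

	/* The root has no host bridge above it, so only non-root ports apply */
	assert(port->parent != NULL);

	/*
	 * Walk to the host bridge port, or to the first ancestor that
	 * already knows its host bridge.
	 */
	while (!iter->host_bridge && !is_root(iter->parent))
		iter = iter->parent;

	port->host_bridge = iter->host_bridge ? iter->host_bridge : iter->uport;
}
```

Because ports are created top-down, the parent normally has the value
cached already and the loop exits after a single step.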
 drivers/cxl/core/port.c | 16 +++++++++++++++-
 drivers/cxl/cxl.h       |  2 ++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index d2f6898940fa..c48f217e689a 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -546,6 +546,7 @@ static struct cxl_port *cxl_port_alloc(struct device *uport,
 	if (rc < 0)
 		goto err;
 	port->id = rc;
+	port->uport = uport;
 
 	/*
 	 * The top-level cxl_port "cxl_root" does not have a cxl_port as
@@ -556,14 +557,27 @@ static struct cxl_port *cxl_port_alloc(struct device *uport,
 	dev = &port->dev;
 	if (parent_dport) {
 		struct cxl_port *parent_port = parent_dport->port;
+		struct cxl_port *iter;
 
 		port->depth = parent_port->depth + 1;
 		port->parent_dport = parent_dport;
 		dev->parent = &parent_port->dev;
+		/*
+		 * walk to the host bridge, or the first ancestor that knows
+		 * the host bridge
+		 */
+		iter = port;
+		while (!iter->host_bridge &&
+		       !is_cxl_root(to_cxl_port(iter->dev.parent)))
+			iter = to_cxl_port(iter->dev.parent);
+		if (iter->host_bridge)
+			port->host_bridge = iter->host_bridge;
+		else
+			port->host_bridge = iter->uport;
+		dev_dbg(uport, "host-bridge: %s\n", dev_name(port->host_bridge));
 	} else
 		dev->parent = uport;
 
-	port->uport = uport;
 	port->component_reg_phys = component_reg_phys;
 	ida_init(&port->decoder_ida);
 	port->dpa_end = -1;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 8e2c1b393552..0211cf0d3574 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -331,6 +331,7 @@ struct cxl_nvdimm {
  * @component_reg_phys: component register capability base address (optional)
  * @dead: last ep has been removed, force port re-creation
  * @depth: How deep this port is relative to the root. depth 0 is the root.
+ * @host_bridge: Shortcut to the platform attach point for this port
  */
 struct cxl_port {
 	struct device dev;
@@ -344,6 +345,7 @@ struct cxl_port {
 	resource_size_t component_reg_phys;
 	bool dead;
 	unsigned int depth;
+	struct device *host_bridge;
 };
 
 static inline struct cxl_dport *cxl_dport_load(struct cxl_port *port,
-- 
2.36.1



* [PATCH 30/46] cxl/hdm: Add sysfs attributes for interleave ways + granularity
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (28 preceding siblings ...)
  2022-06-24  4:19 ` [PATCH 29/46] cxl/port: Cache CXL host bridge data Dan Williams
@ 2022-06-24  4:19 ` Dan Williams
  2022-06-30  9:26   ` Jonathan Cameron
  2022-06-24  4:19 ` [PATCH 31/46] cxl/hdm: Initialize decoder type for memory expander devices Dan Williams
                   ` (18 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  4:19 UTC (permalink / raw)
  To: linux-cxl; +Cc: nvdimm, linux-pci, patches, hch, Ben Widawsky, Dan Williams

From: Ben Widawsky <bwidawsk@kernel.org>

The region provisioning flow involves selecting interleave ways +
granularity settings for a region, and then programming the decoder
topology to meet those constraints, if possible. For example, root
decoders set the minimum interleave ways + granularity for any hosted
regions.

Given that decoder programming is not atomic and collisions can occur
between multiple requesting regions, userspace will be responsible for
conflict resolution, and it needs these attributes to make those decisions.
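
To make the two attributes concrete, here is a small sketch of how
interleave ways and granularity carve up an address range. It assumes a
plain modulo interleave, which is a simplification of what the hardware
decode does; the function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Which position in the interleave set owns HPA offset @off (relative
 * to the start of the decoder's range), assuming simple modulo
 * interleave across @ways targets in blocks of @gran bytes.
 */
static unsigned int toy_interleave_pos(uint64_t off, unsigned int ways,
				       unsigned int gran)
{
	assert(ways > 0 && gran > 0);
	return (unsigned int)((off / gran) % ways);
}

/*
 * Offset seen by the owning target: one @gran block per full stripe of
 * (@gran * @ways) bytes, plus the offset within the current block.
 */
static uint64_t toy_target_offset(uint64_t off, unsigned int ways,
				  unsigned int gran)
{
	assert(ways > 0 && gran > 0);
	return (off / ((uint64_t)gran * ways)) * gran + off % gran;
}
```

With ways=4 and granularity=256, offset 0 and offset 1024 both land on
position 0, which matches the "N + interleave_granularity *
interleave_ways" wording in the ABI text.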

Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
[djbw: reword changelog, make read-only, add sysfs ABI documentation]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Documentation/ABI/testing/sysfs-bus-cxl | 23 +++++++++++++++++++++++
 drivers/cxl/core/port.c                 | 23 +++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 85844f9bc00b..2a4e4163879f 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -215,3 +215,26 @@ Description:
 		allocations are enforced to occur in increasing 'decoderX.Y/id'
 		order and frees are enforced to occur in decreasing
 		'decoderX.Y/id' order.
+
+
+What:		/sys/bus/cxl/devices/decoderX.Y/interleave_ways
+Date:		May, 2022
+KernelVersion:	v5.20
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(RO) The number of targets across which this decoder's host
+		physical address (HPA) memory range is interleaved. The device
+		maps every Nth block of HPA (of size ==
+		'interleave_granularity') to consecutive DPA addresses. The
+		decoder's position in the interleave is determined by the
+		device's (endpoint or switch) switch ancestry.
+
+
+What:		/sys/bus/cxl/devices/decoderX.Y/interleave_granularity
+Date:		May, 2022
+KernelVersion:	v5.20
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(RO) The number of consecutive bytes of host physical address
+		space this decoder claims at address N before awaint the next
+		address (N + interleave_granularity * intereleave_ways).
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index c48f217e689a..08a380d20cf1 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -260,10 +260,33 @@ static ssize_t dpa_size_store(struct device *dev, struct device_attribute *attr,
 }
 static DEVICE_ATTR_RW(dpa_size);
 
+static ssize_t interleave_granularity_show(struct device *dev,
+					   struct device_attribute *attr,
+					   char *buf)
+{
+	struct cxl_decoder *cxld = to_cxl_decoder(dev);
+
+	return sysfs_emit(buf, "%d\n", cxld->interleave_granularity);
+}
+
+static DEVICE_ATTR_RO(interleave_granularity);
+
+static ssize_t interleave_ways_show(struct device *dev,
+				    struct device_attribute *attr, char *buf)
+{
+	struct cxl_decoder *cxld = to_cxl_decoder(dev);
+
+	return sysfs_emit(buf, "%d\n", cxld->interleave_ways);
+}
+
+static DEVICE_ATTR_RO(interleave_ways);
+
 static struct attribute *cxl_decoder_base_attrs[] = {
 	&dev_attr_start.attr,
 	&dev_attr_size.attr,
 	&dev_attr_locked.attr,
+	&dev_attr_interleave_granularity.attr,
+	&dev_attr_interleave_ways.attr,
 	NULL,
 };
 
-- 
2.36.1



* [PATCH 31/46] cxl/hdm: Initialize decoder type for memory expander devices
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (29 preceding siblings ...)
  2022-06-24  4:19 ` [PATCH 30/46] cxl/hdm: Add sysfs attributes for interleave ways + granularity Dan Williams
@ 2022-06-24  4:19 ` Dan Williams
  2022-06-30  9:33   ` Jonathan Cameron
  2022-06-24  4:19 ` [PATCH 32/46] cxl/mem: Enumerate port targets before adding endpoints Dan Williams
                   ` (17 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  4:19 UTC (permalink / raw)
  To: linux-cxl; +Cc: nvdimm, linux-pci, patches, hch, Dan Williams

Unless and until accelerator (type-2) drivers start registering for
CXL.mem mapping services from the CXL subsystem core, initialize idle
HDM decoders to the "expander" type. I.e. the only CXL devices using the
CXL core presently are those implementing the CXL 2.0 Type-3 memory
expander device class code that the cxl_pci driver claims.
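
The policy can be summarised with a toy model. The bit position of
TOY_CTRL_TYPE and the enum values are assumptions made for this sketch
only; the real definitions live in drivers/cxl/cxl.h and the register
write is a writel() in the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_CTRL_TYPE (1u << 12)	/* assumed bit, illustration only */

enum toy_target_type {
	TOY_DECODER_ACCELERATOR,
	TOY_DECODER_EXPANDER,
};

/*
 * A committed decoder reports whatever type was programmed. An idle
 * decoder is forced to the expander type, and @ctrl is updated to
 * model the register write.
 */
static enum toy_target_type toy_init_type(uint32_t *ctrl, bool committed)
{
	assert(ctrl);

	if (committed)
		return (*ctrl & TOY_CTRL_TYPE) ? TOY_DECODER_EXPANDER
					       : TOY_DECODER_ACCELERATOR;

	*ctrl |= TOY_CTRL_TYPE;
	return TOY_DECODER_EXPANDER;
}
```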

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/hdm.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 672bf3e97811..7b58f6911523 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -474,6 +474,17 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
 		cxld->flags |= CXL_DECODER_F_ENABLE;
 		if (ctrl & CXL_HDM_DECODER0_CTRL_LOCK)
 			cxld->flags |= CXL_DECODER_F_LOCK;
+		if (FIELD_GET(CXL_HDM_DECODER0_CTRL_TYPE, ctrl))
+			cxld->target_type = CXL_DECODER_EXPANDER;
+		else
+			cxld->target_type = CXL_DECODER_ACCELERATOR;
+	} else {
+		/* unless / until type-2 drivers arrive, assume type-3 */
+		if (FIELD_GET(CXL_HDM_DECODER0_CTRL_TYPE, ctrl) == 0) {
+			ctrl |= CXL_HDM_DECODER0_CTRL_TYPE;
+			writel(ctrl, hdm + CXL_HDM_DECODER0_CTRL_OFFSET(which));
+		}
+		cxld->target_type = CXL_DECODER_EXPANDER;
 	}
 	rc = cxl_to_ways(FIELD_GET(CXL_HDM_DECODER0_CTRL_IW_MASK, ctrl),
 			 &cxld->interleave_ways);
@@ -488,11 +499,6 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
 	if (rc)
 		return rc;
 
-	if (FIELD_GET(CXL_HDM_DECODER0_CTRL_TYPE, ctrl))
-		cxld->target_type = CXL_DECODER_EXPANDER;
-	else
-		cxld->target_type = CXL_DECODER_ACCELERATOR;
-
 	if (!cxled) {
 		target_list.value =
 			ioread64_hi_lo(hdm + CXL_HDM_DECODER0_TL_LOW(which));
-- 
2.36.1



* [PATCH 32/46] cxl/mem: Enumerate port targets before adding endpoints
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (30 preceding siblings ...)
  2022-06-24  4:19 ` [PATCH 31/46] cxl/hdm: Initialize decoder type for memory expander devices Dan Williams
@ 2022-06-24  4:19 ` Dan Williams
  2022-06-30  9:48   ` Jonathan Cameron
  2022-06-24  4:19 ` [PATCH 33/46] resource: Introduce alloc_free_mem_region() Dan Williams
                   ` (16 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  4:19 UTC (permalink / raw)
  To: linux-cxl; +Cc: nvdimm, linux-pci, patches, hch, Dan Williams

The port scanning algorithm in devm_cxl_enumerate_ports() walks up the
topology and adds cxl_port objects starting from the root down to the
endpoint. When those ports are initially created they know all their
dports, but they do not know the downstream cxl_port instance that
represents the next descendant in the topology. Rework create_endpoint()
into devm_cxl_add_endpoint() that enumerates the downstream cxl_port
topology into each port's 'struct cxl_ep' record for each endpoint of
which the port is an ancestor.
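
The bottom-up walk that fills in the 'next' pointers can be modelled as
follows. This is a sketch with hypothetical types: each toy port carries
a single embedded record, whereas the kernel looks the record up per
endpoint with cxl_ep_load(), and a NULL parent stands in for
is_cxl_root().

```c
#include <assert.h>
#include <stddef.h>

struct toy_port;

/* Hypothetical stand-in for struct cxl_ep, tracking one endpoint. */
struct toy_ep {
	struct toy_port *next;	/* port one level down, NULL at the bottom */
};

struct toy_port {
	struct toy_port *parent;	/* NULL marks the root */
	struct toy_ep ep;
};

/*
 * Starting at the endpoint's parent port, walk towards the root and
 * record in each port which port sits below it on the path to the
 * endpoint. The root itself is not updated.
 */
static void toy_record_next(struct toy_port *parent_port)
{
	struct toy_port *iter, *down;

	assert(parent_port);

	for (iter = parent_port, down = NULL; iter->parent;
	     down = iter, iter = iter->parent)
		iter->ep.next = down;
}
```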

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/port.c | 41 +++++++++++++++++++++++++++++++++++++++++
 drivers/cxl/cxl.h       |  7 ++++++-
 drivers/cxl/mem.c       | 30 +-----------------------------
 3 files changed, 48 insertions(+), 30 deletions(-)

diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 08a380d20cf1..2e56903399c2 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1089,6 +1089,47 @@ static struct cxl_ep *cxl_ep_load(struct cxl_port *port,
 	return xa_load(&port->endpoints, (unsigned long)&cxlmd->dev);
 }
 
+int devm_cxl_add_endpoint(struct cxl_memdev *cxlmd,
+			  struct cxl_dport *parent_dport)
+{
+	struct cxl_port *parent_port = parent_dport->port;
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+	struct cxl_port *endpoint, *iter, *down;
+	int rc;
+
+	/*
+	 * Now that the path to the root is established record all the
+	 * intervening ports in the chain.
+	 */
+	for (iter = parent_port, down = NULL; !is_cxl_root(iter);
+	     down = iter, iter = to_cxl_port(iter->dev.parent)) {
+		struct cxl_ep *ep;
+
+		ep = cxl_ep_load(iter, cxlmd);
+		ep->next = down;
+	}
+
+	endpoint = devm_cxl_add_port(&parent_port->dev, &cxlmd->dev,
+				     cxlds->component_reg_phys, parent_dport);
+	if (IS_ERR(endpoint))
+		return PTR_ERR(endpoint);
+
+	dev_dbg(&cxlmd->dev, "add: %s\n", dev_name(&endpoint->dev));
+
+	rc = cxl_endpoint_autoremove(cxlmd, endpoint);
+	if (rc)
+		return rc;
+
+	if (!endpoint->dev.driver) {
+		dev_err(&cxlmd->dev, "%s failed probe\n",
+			dev_name(&endpoint->dev));
+		return -ENXIO;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_NS_GPL(devm_cxl_add_endpoint, CXL);
+
 static void cxl_detach_ep(void *data)
 {
 	struct cxl_memdev *cxlmd = data;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 0211cf0d3574..f761cf78cc05 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -371,11 +371,14 @@ struct cxl_dport {
 /**
  * struct cxl_ep - track an endpoint's interest in a port
  * @ep: device that hosts a generic CXL endpoint (expander or accelerator)
- * @dport: which dport routes to this endpoint on this port
+ * @dport: which dport routes to this endpoint on @port
+ * @next: cxl switch port across the link attached to @dport NULL if
+ *	  attached to an endpoint
  */
 struct cxl_ep {
 	struct device *ep;
 	struct cxl_dport *dport;
+	struct cxl_port *next;
 };
 
 /*
@@ -398,6 +401,8 @@ struct pci_bus *cxl_port_to_pci_bus(struct cxl_port *port);
 struct cxl_port *devm_cxl_add_port(struct device *host, struct device *uport,
 				   resource_size_t component_reg_phys,
 				   struct cxl_dport *parent_dport);
+int devm_cxl_add_endpoint(struct cxl_memdev *cxlmd,
+			  struct cxl_dport *parent_dport);
 struct cxl_port *find_cxl_root(struct device *dev);
 int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd);
 int cxl_bus_rescan(void);
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 2786d3402c9e..64ccf053d32c 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -25,34 +25,6 @@
  * in higher level operations.
  */
 
-static int create_endpoint(struct cxl_memdev *cxlmd,
-			   struct cxl_dport *parent_dport)
-{
-	struct cxl_port *parent_port = parent_dport->port;
-	struct cxl_dev_state *cxlds = cxlmd->cxlds;
-	struct cxl_port *endpoint;
-	int rc;
-
-	endpoint = devm_cxl_add_port(&parent_port->dev, &cxlmd->dev,
-				     cxlds->component_reg_phys, parent_dport);
-	if (IS_ERR(endpoint))
-		return PTR_ERR(endpoint);
-
-	dev_dbg(&cxlmd->dev, "add: %s\n", dev_name(&endpoint->dev));
-
-	rc = cxl_endpoint_autoremove(cxlmd, endpoint);
-	if (rc)
-		return rc;
-
-	if (!endpoint->dev.driver) {
-		dev_err(&cxlmd->dev, "%s failed probe\n",
-			dev_name(&endpoint->dev));
-		return -ENXIO;
-	}
-
-	return 0;
-}
-
 static void enable_suspend(void *data)
 {
 	cxl_mem_active_dec();
@@ -116,7 +88,7 @@ static int cxl_mem_probe(struct device *dev)
 		goto unlock;
 	}
 
-	rc = create_endpoint(cxlmd, dport);
+	rc = devm_cxl_add_endpoint(cxlmd, dport);
 unlock:
 	device_unlock(&parent_port->dev);
 	put_device(&parent_port->dev);
-- 
2.36.1



* [PATCH 33/46] resource: Introduce alloc_free_mem_region()
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (31 preceding siblings ...)
  2022-06-24  4:19 ` [PATCH 32/46] cxl/mem: Enumerate port targets before adding endpoints Dan Williams
@ 2022-06-24  4:19 ` Dan Williams
  2022-06-30 10:35   ` Jonathan Cameron
  2022-06-24  4:19 ` [PATCH 34/46] cxl/region: Add region creation support Dan Williams
                   ` (15 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  4:19 UTC (permalink / raw)
  To: linux-cxl
  Cc: nvdimm, linux-pci, patches, hch, Dan Williams, Jason Gunthorpe,
	Matthew Wilcox, Christoph Hellwig

The core of devm_request_free_mem_region() is a helper that searches for
free space in iomem_resource and performs __request_region_locked() on
the result of that search. The policy choices of the implementation
conform to what CONFIG_DEVICE_PRIVATE users want which is memory that is
immediately marked busy, and a preference to search for the first-fit
free range in descending order from the top of the physical address
space.

CXL has a need for a similar allocator, but with the following tweaks:

1/ Search for free space in ascending order

2/ Search for free space relative to a given CXL window

3/ 'insert' rather than 'request' the new resource given downstream
   drivers from the CXL Region driver (like the pmem or dax drivers) are
   responsible for request_mem_region() when they activate the memory
   range.

Rework __request_free_mem_region() into get_free_mem_region() which
takes a set of GFR_* (Get Free Region) flags to control the allocation
policy (ascending vs descending), and "busy" policy (insert_resource()
vs request_region()).
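
As an illustration of the search-direction policy only, the following
userspace sketch looks for the first free slot in either direction. It
is not the kernel allocator: TOY_DESCENDING merely mirrors the role of
GFR_DESCENDING, busy ranges are a flat array rather than a resource
tree, and the alignment and MAX_PHYSMEM_BITS handling are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_DESCENDING (1UL << 0)	/* mirrors the role of GFR_DESCENDING */

struct toy_range {
	uint64_t start, end;	/* inclusive bounds */
};

static int toy_overlaps(const struct toy_range *busy, size_t n,
			uint64_t start, uint64_t size)
{
	uint64_t end = start + size - 1;
	size_t i;

	for (i = 0; i < n; i++)
		if (start <= busy[i].end && busy[i].start <= end)
			return 1;
	return 0;
}

/* Return the first free @size slot in @base, or UINT64_MAX if none. */
static uint64_t toy_get_free(const struct toy_range *base,
			     const struct toy_range *busy, size_t n,
			     uint64_t size, unsigned long flags)
{
	uint64_t addr;

	assert(base->start <= base->end);
	if (size == 0 || size - 1 > base->end - base->start)
		return UINT64_MAX;

	if (flags & TOY_DESCENDING) {
		for (addr = base->end - size + 1;; addr -= size) {
			if (!toy_overlaps(busy, n, addr, size))
				return addr;
			if (addr - base->start < size)
				break;	/* next step would leave @base */
		}
	} else {
		for (addr = base->start;; addr += size) {
			if (!toy_overlaps(busy, n, addr, size))
				return addr;
			if (base->end - addr - (size - 1) < size)
				break;	/* no room for another slot */
		}
	}
	return UINT64_MAX;
}
```

Passing 0 for flags gives the ascending search CXL wants, while
TOY_DESCENDING models the top-down preference of the
CONFIG_DEVICE_PRIVATE users.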

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/linux-cxl/20220420143406.GY2120790@nvidia.com/
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/ioport.h |   2 +
 kernel/resource.c      | 174 ++++++++++++++++++++++++++++++++---------
 mm/Kconfig             |   5 ++
 3 files changed, 146 insertions(+), 35 deletions(-)

diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index ec5f71f7135b..ed03518347aa 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -329,6 +329,8 @@ struct resource *devm_request_free_mem_region(struct device *dev,
 		struct resource *base, unsigned long size);
 struct resource *request_free_mem_region(struct resource *base,
 		unsigned long size, const char *name);
+struct resource *alloc_free_mem_region(struct resource *base,
+		unsigned long size, unsigned long align, const char *name);
 
 static inline void irqresource_disabled(struct resource *res, u32 irq)
 {
diff --git a/kernel/resource.c b/kernel/resource.c
index 53a534db350e..9fc990274106 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -489,8 +489,9 @@ int __weak page_is_ram(unsigned long pfn)
 }
 EXPORT_SYMBOL_GPL(page_is_ram);
 
-static int __region_intersects(resource_size_t start, size_t size,
-			unsigned long flags, unsigned long desc)
+static int __region_intersects(struct resource *parent, resource_size_t start,
+			       size_t size, unsigned long flags,
+			       unsigned long desc)
 {
 	struct resource res;
 	int type = 0; int other = 0;
@@ -499,7 +500,7 @@ static int __region_intersects(resource_size_t start, size_t size,
 	res.start = start;
 	res.end = start + size - 1;
 
-	for (p = iomem_resource.child; p ; p = p->sibling) {
+	for (p = parent->child; p ; p = p->sibling) {
 		bool is_type = (((p->flags & flags) == flags) &&
 				((desc == IORES_DESC_NONE) ||
 				 (desc == p->desc)));
@@ -543,7 +544,7 @@ int region_intersects(resource_size_t start, size_t size, unsigned long flags,
 	int ret;
 
 	read_lock(&resource_lock);
-	ret = __region_intersects(start, size, flags, desc);
+	ret = __region_intersects(&iomem_resource, start, size, flags, desc);
 	read_unlock(&resource_lock);
 
 	return ret;
@@ -1780,62 +1781,135 @@ void resource_list_free(struct list_head *head)
 }
 EXPORT_SYMBOL(resource_list_free);
 
-#ifdef CONFIG_DEVICE_PRIVATE
-static struct resource *__request_free_mem_region(struct device *dev,
-		struct resource *base, unsigned long size, const char *name)
+#ifdef CONFIG_GET_FREE_REGION
+#define GFR_DESCENDING		(1UL << 0)
+#define GFR_REQUEST_REGION	(1UL << 1)
+#define GFR_DEFAULT_ALIGN (1UL << PA_SECTION_SHIFT)
+
+static resource_size_t gfr_start(struct resource *base, resource_size_t size,
+				 resource_size_t align, unsigned long flags)
+{
+	if (flags & GFR_DESCENDING) {
+		resource_size_t end;
+
+		end = min_t(resource_size_t, base->end,
+			    (1ULL << MAX_PHYSMEM_BITS) - 1);
+		return end - size + 1;
+	}
+
+	return ALIGN(base->start, align);
+}
+
+static bool gfr_continue(struct resource *base, resource_size_t addr,
+			 resource_size_t size, unsigned long flags)
+{
+	if (flags & GFR_DESCENDING)
+		return addr > size && addr >= base->start;
+	return addr > addr - size &&
+	       addr <= min_t(resource_size_t, base->end,
+			     (1ULL << MAX_PHYSMEM_BITS) - 1);
+}
+
+static resource_size_t gfr_next(resource_size_t addr, resource_size_t size,
+				unsigned long flags)
+{
+	if (flags & GFR_DESCENDING)
+		return addr - size;
+	return addr + size;
+}
+
+static void remove_free_mem_region(void *_res)
 {
-	resource_size_t end, addr;
+	struct resource *res = _res;
+
+	if (res->parent)
+		remove_resource(res);
+	free_resource(res);
+}
+
+static struct resource *
+get_free_mem_region(struct device *dev, struct resource *base,
+		    resource_size_t size, const unsigned long align,
+		    const char *name, const unsigned long desc,
+		    const unsigned long flags)
+{
+	resource_size_t addr;
 	struct resource *res;
 	struct region_devres *dr = NULL;
 
-	size = ALIGN(size, 1UL << PA_SECTION_SHIFT);
-	end = min_t(unsigned long, base->end, (1UL << MAX_PHYSMEM_BITS) - 1);
-	addr = end - size + 1UL;
+	size = ALIGN(size, align);
 
 	res = alloc_resource(GFP_KERNEL);
 	if (!res)
 		return ERR_PTR(-ENOMEM);
 
-	if (dev) {
+	if (dev && (flags & GFR_REQUEST_REGION)) {
 		dr = devres_alloc(devm_region_release,
 				sizeof(struct region_devres), GFP_KERNEL);
 		if (!dr) {
 			free_resource(res);
 			return ERR_PTR(-ENOMEM);
 		}
+	} else if (dev) {
+		if (devm_add_action_or_reset(dev, remove_free_mem_region, res))
+			return ERR_PTR(-ENOMEM);
 	}
 
 	write_lock(&resource_lock);
-	for (; addr > size && addr >= base->start; addr -= size) {
-		if (__region_intersects(addr, size, 0, IORES_DESC_NONE) !=
-				REGION_DISJOINT)
+	for (addr = gfr_start(base, size, align, flags);
+	     gfr_continue(base, addr, size, flags);
+	     addr = gfr_next(addr, size, flags)) {
+		if (__region_intersects(base, addr, size, 0, IORES_DESC_NONE) !=
+		    REGION_DISJOINT)
 			continue;
 
-		if (__request_region_locked(res, &iomem_resource, addr, size,
-						name, 0))
-			break;
+		if (flags & GFR_REQUEST_REGION) {
+			if (__request_region_locked(res, &iomem_resource, addr,
+						    size, name, 0))
+				break;
 
-		if (dev) {
-			dr->parent = &iomem_resource;
-			dr->start = addr;
-			dr->n = size;
-			devres_add(dev, dr);
-		}
+			if (dev) {
+				dr->parent = &iomem_resource;
+				dr->start = addr;
+				dr->n = size;
+				devres_add(dev, dr);
+			}
 
-		res->desc = IORES_DESC_DEVICE_PRIVATE_MEMORY;
-		write_unlock(&resource_lock);
+			res->desc = desc;
+			write_unlock(&resource_lock);
+
+
+			/*
+			 * A driver is claiming this region so revoke any
+			 * mappings.
+			 */
+			revoke_iomem(res);
+		} else {
+			res->start = addr;
+			res->end = addr + size - 1;
+			res->name = name;
+			res->desc = desc;
+			res->flags = IORESOURCE_MEM;
+
+			/*
+			 * Only succeed if the resource hosts an exclusive
+			 * range after the insert
+			 */
+			if (__insert_resource(base, res) || res->child)
+				break;
+
+			write_unlock(&resource_lock);
+		}
 
-		/*
-		 * A driver is claiming this region so revoke any mappings.
-		 */
-		revoke_iomem(res);
 		return res;
 	}
 	write_unlock(&resource_lock);
 
-	free_resource(res);
-	if (dr)
+	if (flags & GFR_REQUEST_REGION) {
+		free_resource(res);
 		devres_free(dr);
+	} else if (dev)
+		devm_release_action(dev, remove_free_mem_region, res);
 
 	return ERR_PTR(-ERANGE);
 }
@@ -1854,18 +1928,48 @@ static struct resource *__request_free_mem_region(struct device *dev,
 struct resource *devm_request_free_mem_region(struct device *dev,
 		struct resource *base, unsigned long size)
 {
-	return __request_free_mem_region(dev, base, size, dev_name(dev));
+	unsigned long flags = GFR_DESCENDING | GFR_REQUEST_REGION;
+
+	return get_free_mem_region(dev, base, size, GFR_DEFAULT_ALIGN,
+				   dev_name(dev),
+				   IORES_DESC_DEVICE_PRIVATE_MEMORY, flags);
 }
 EXPORT_SYMBOL_GPL(devm_request_free_mem_region);
 
 struct resource *request_free_mem_region(struct resource *base,
 		unsigned long size, const char *name)
 {
-	return __request_free_mem_region(NULL, base, size, name);
+	unsigned long flags = GFR_DESCENDING | GFR_REQUEST_REGION;
+
+	return get_free_mem_region(NULL, base, size, GFR_DEFAULT_ALIGN, name,
+				   IORES_DESC_DEVICE_PRIVATE_MEMORY, flags);
 }
 EXPORT_SYMBOL_GPL(request_free_mem_region);
 
-#endif /* CONFIG_DEVICE_PRIVATE */
+/**
+ * alloc_free_mem_region - find a free region relative to @base
+ * @base: resource that will parent the new resource
+ * @size: size in bytes of memory to allocate from @base
+ * @align: alignment requirements for the allocation
+ * @name: resource name
+ *
+ * Buses like CXL, that can dynamically instantiate new memory regions,
+ * need a method to allocate physical address space for those regions.
+ * Allocate and insert a new resource to cover a free, unclaimed by a
+ * descendant of @base, range in the span of @base.
+ */
+struct resource *alloc_free_mem_region(struct resource *base,
+				       unsigned long size, unsigned long align,
+				       const char *name)
+{
+	/* GFR_ASCENDING | GFR_INSERT_RESOURCE */
+	unsigned long flags = 0;
+
+	return get_free_mem_region(NULL, base, size, align, name,
+				   IORES_DESC_NONE, flags);
+}
+EXPORT_SYMBOL_NS_GPL(alloc_free_mem_region, CXL);
+#endif /* CONFIG_GET_FREE_REGION */
 
 static int __init strict_iomem(char *str)
 {
diff --git a/mm/Kconfig b/mm/Kconfig
index 169e64192e48..a5b4fee2e3fd 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -994,9 +994,14 @@ config HMM_MIRROR
 	bool
 	depends on MMU
 
+config GET_FREE_REGION
+	depends on SPARSEMEM
+	bool
+
 config DEVICE_PRIVATE
 	bool "Unaddressable device memory (GPU memory, ...)"
 	depends on ZONE_DEVICE
+	select GET_FREE_REGION
 
 	help
 	  Allows creation of struct pages to represent unaddressable device
-- 
2.36.1



* [PATCH 34/46] cxl/region: Add region creation support
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (32 preceding siblings ...)
  2022-06-24  4:19 ` [PATCH 33/46] resource: Introduce alloc_free_mem_region() Dan Williams
@ 2022-06-24  4:19 ` Dan Williams
  2022-06-30 13:17   ` Jonathan Cameron
  2022-06-24  4:19 ` [PATCH 35/46] cxl/region: Add a 'uuid' attribute Dan Williams
                   ` (14 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  4:19 UTC (permalink / raw)
  To: linux-cxl; +Cc: nvdimm, linux-pci, patches, hch, Ben Widawsky, Dan Williams

From: Ben Widawsky <bwidawsk@kernel.org>

CXL 2.0 allows for dynamic provisioning of new memory regions (system
physical address resources like "System RAM" and "Persistent Memory").
Whereas DDR and PMEM resources are conveyed statically at boot, CXL
allows for assembling and instantiating new regions from the available
capacity of CXL memory expanders in the system.

Sysfs with an "echo $region_name > $create_region_attribute" interface
is chosen as the mechanism to initiate the provisioning process. This
was chosen over ioctl() and netlink() to keep the configuration
interface entirely in a pseudo-fs interface, and it was chosen over
configfs since, aside from this one creation event, the interface is
read-mostly. I.e. configfs supports cases where an object is designed to
be provisioned each boot, like an iSCSI storage target, and CXL region
creation is mostly for PMEM regions which are created usually once
per-lifetime of a server instance.

Recall that the major change that CXL brings over previous
persistent memory architectures is the ability to dynamically define new
regions.  Compare that to drivers like 'nfit' where the region
configuration is statically defined by platform firmware.

Regions are created as a child of a root decoder that encompasses an
address space with constraints. When created through sysfs, the root
decoder is explicit. When created from an LSA's region structure a root
decoder will possibly need to be inferred by the driver.

Upon region creation through sysfs, a vacant region is created with a
unique name. Regions have a number of attributes that must be configured
before the region can be bound to the driver, at which point HDM decoder
programming is completed.

An example of creating a new region:

- Allocate a new region name:
region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region)

- Create a new region by name:
while
region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region)
! echo $region > /sys/bus/cxl/devices/decoder0.0/create_pmem_region
do true; done

- Region now exists in sysfs:
stat -t /sys/bus/cxl/devices/decoder0.0/$region

- Delete the region, and name:
echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region
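
The reason the shell example retries in a loop is the compare-exchange
on the cached region id described in the ABI text. A minimal C11 model
of that handshake is sketched below; 'struct toy_root_decoder' and its
'next_free' counter are hypothetical and only stand in for the root
decoder and a real id allocator.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical model of a root decoder's cached "next region id". */
struct toy_root_decoder {
	atomic_int region_id;	/* what reading create_pmem_region reports */
	int next_free;		/* toy id allocator, not thread-safe */
};

/*
 * Model of a write to create_pmem_region: succeed only if @requested
 * still matches the cached id, and in the same atomic step publish a
 * fresh id for the next creation attempt. A stale @requested yields
 * -EBUSY, and the caller is expected to re-read and retry.
 */
static int toy_create_region(struct toy_root_decoder *rd, int requested)
{
	int expected = requested;
	int fresh;

	assert(rd);
	fresh = rd->next_free;

	if (!atomic_compare_exchange_strong(&rd->region_id, &expected, fresh))
		return -EBUSY;

	rd->next_free = fresh + 1;
	return 0;
}
```

Two writers that both read the same name race on the exchange; exactly
one wins, and the loser sees -EBUSY and picks up the new name on its
next read.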

Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
[djbw: simplify locking, reword changelog]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Documentation/ABI/testing/sysfs-bus-cxl       |  25 +++
 .../driver-api/cxl/memory-devices.rst         |  11 +
 drivers/cxl/Kconfig                           |   5 +
 drivers/cxl/core/Makefile                     |   1 +
 drivers/cxl/core/core.h                       |  12 ++
 drivers/cxl/core/port.c                       |  39 +++-
 drivers/cxl/core/region.c                     | 199 ++++++++++++++++++
 drivers/cxl/cxl.h                             |  18 ++
 tools/testing/cxl/Kbuild                      |   1 +
 9 files changed, 308 insertions(+), 3 deletions(-)
 create mode 100644 drivers/cxl/core/region.c

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 2a4e4163879f..9a4856066631 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -238,3 +238,28 @@ Description:
 		(RO) The number of consecutive bytes of host physical address
 		space this decoder claims at address N before awaint the next
 		address (N + interleave_granularity * intereleave_ways).
+
+
+What:		/sys/bus/cxl/devices/decoderX.Y/create_pmem_region
+Date:		May, 2022
+KernelVersion:	v5.20
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(RW) Write a string in the form 'regionZ' to start the process
+		of defining a new persistent memory region (interleave-set)
+		within the decode range bounded by root decoder 'decoderX.Y'.
+		The value written must match the current value returned from
+		reading this attribute. An atomic compare exchange operation is
+		done on write to assign the requested id to a region and
+		allocate the region-id for the next creation attempt. EBUSY is
+		returned if the region name written does not match the current
+		cached value.
+
+
+What:		/sys/bus/cxl/devices/decoderX.Y/delete_region
+Date:		May, 2022
+KernelVersion:	v5.20
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(WO) Write a string in the form 'regionZ' to delete that region,
+		provided it is currently idle / not bound to a driver.
diff --git a/Documentation/driver-api/cxl/memory-devices.rst b/Documentation/driver-api/cxl/memory-devices.rst
index db476bb170b6..66ddc58a21b1 100644
--- a/Documentation/driver-api/cxl/memory-devices.rst
+++ b/Documentation/driver-api/cxl/memory-devices.rst
@@ -362,6 +362,17 @@ CXL Core
 .. kernel-doc:: drivers/cxl/core/mbox.c
    :doc: cxl mbox
 
+CXL Regions
+-----------
+.. kernel-doc:: drivers/cxl/region.h
+   :identifiers:
+
+.. kernel-doc:: drivers/cxl/core/region.c
+   :doc: cxl core region
+
+.. kernel-doc:: drivers/cxl/core/region.c
+   :identifiers:
+
 External Interfaces
 ===================
 
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index f64e3984689f..aa2728de419e 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -102,4 +102,9 @@ config CXL_SUSPEND
 	def_bool y
 	depends on SUSPEND && CXL_MEM
 
+config CXL_REGION
+	bool
+	default CXL_BUS
+	select MEMREGION
+
 endif
diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
index 9d35085d25af..79c7257f4107 100644
--- a/drivers/cxl/core/Makefile
+++ b/drivers/cxl/core/Makefile
@@ -10,3 +10,4 @@ cxl_core-y += memdev.o
 cxl_core-y += mbox.o
 cxl_core-y += pci.o
 cxl_core-y += hdm.o
+cxl_core-$(CONFIG_CXL_REGION) += region.o
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 472ec9cb1018..ebe6197fb9b8 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -9,6 +9,18 @@ extern const struct device_type cxl_nvdimm_type;
 
 extern struct attribute_group cxl_base_attribute_group;
 
+#ifdef CONFIG_CXL_REGION
+extern struct device_attribute dev_attr_create_pmem_region;
+extern struct device_attribute dev_attr_delete_region;
+/*
+ * Note must be used at the end of an attribute list, since it
+ * terminates the list in the CONFIG_CXL_REGION=n case.
+ */
+#define CXL_REGION_ATTR(x) (&dev_attr_##x.attr)
+#else
+#define CXL_REGION_ATTR(x) NULL
+#endif
+
 struct cxl_send_command;
 struct cxl_mem_query_commands;
 int cxl_query_cmd(struct cxl_memdev *cxlmd,
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 2e56903399c2..c9207ebc3f32 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /* Copyright(c) 2020 Intel Corporation. All rights reserved. */
 #include <linux/io-64-nonatomic-lo-hi.h>
+#include <linux/memregion.h>
 #include <linux/workqueue.h>
 #include <linux/debugfs.h>
 #include <linux/device.h>
@@ -300,11 +301,35 @@ static struct attribute *cxl_decoder_root_attrs[] = {
 	&dev_attr_cap_type2.attr,
 	&dev_attr_cap_type3.attr,
 	&dev_attr_target_list.attr,
+	CXL_REGION_ATTR(create_pmem_region),
+	CXL_REGION_ATTR(delete_region),
 	NULL,
 };
 
+static bool can_create_pmem(struct cxl_root_decoder *cxlrd)
+{
+	unsigned long flags = CXL_DECODER_F_TYPE3 | CXL_DECODER_F_PMEM;
+
+	return (cxlrd->cxlsd.cxld.flags & flags) == flags;
+}
+
+static umode_t cxl_root_decoder_visible(struct kobject *kobj, struct attribute *a, int n)
+{
+	struct device *dev = kobj_to_dev(kobj);
+	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
+
+	if (a == CXL_REGION_ATTR(create_pmem_region) && !can_create_pmem(cxlrd))
+		return 0;
+
+	if (a == CXL_REGION_ATTR(delete_region) && !can_create_pmem(cxlrd))
+		return 0;
+
+	return a->mode;
+}
+
 static struct attribute_group cxl_decoder_root_attribute_group = {
 	.attrs = cxl_decoder_root_attrs,
+	.is_visible = cxl_root_decoder_visible,
 };
 
 static const struct attribute_group *cxl_decoder_root_attribute_groups[] = {
@@ -387,6 +412,7 @@ static void cxl_root_decoder_release(struct device *dev)
 {
 	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
 
+	memregion_free(atomic_read(&cxlrd->region_id));
 	__cxl_decoder_release(&cxlrd->cxlsd.cxld);
 	kfree(cxlrd);
 }
@@ -1415,6 +1441,7 @@ static struct lock_class_key cxl_decoder_key;
 static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
 					     unsigned int nr_targets)
 {
+	struct cxl_root_decoder *cxlrd = NULL;
 	struct cxl_decoder *cxld;
 	struct device *dev;
 	void *alloc;
@@ -1425,16 +1452,20 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
 
 	if (nr_targets) {
 		struct cxl_switch_decoder *cxlsd;
-		struct cxl_root_decoder *cxlrd;
 
 		if (is_cxl_root(port)) {
 			alloc = kzalloc(struct_size(cxlrd, cxlsd.target,
 						    nr_targets),
 					GFP_KERNEL);
 			cxlrd = alloc;
-			if (cxlrd)
+			if (cxlrd) {
 				cxlsd = &cxlrd->cxlsd;
-			else
+				atomic_set(&cxlrd->region_id, -1);
+				rc = memregion_alloc(GFP_KERNEL);
+				if (rc < 0)
+					goto err;
+				atomic_set(&cxlrd->region_id, rc);
+			} else
 				cxlsd = NULL;
 		} else {
 			alloc = kzalloc(struct_size(cxlsd, target, nr_targets),
@@ -1490,6 +1521,8 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
 
 	return cxld;
 err:
+	if (cxlrd && atomic_read(&cxlrd->region_id) >= 0)
+		memregion_free(atomic_read(&cxlrd->region_id));
 	kfree(alloc);
 	return ERR_PTR(rc);
 }
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
new file mode 100644
index 000000000000..f2a0ead20ca7
--- /dev/null
+++ b/drivers/cxl/core/region.c
@@ -0,0 +1,199 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
+#include <linux/memregion.h>
+#include <linux/genalloc.h>
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/idr.h>
+#include <cxl.h>
+#include "core.h"
+
+/**
+ * DOC: cxl core region
+ *
+ * CXL Regions represent mapped memory capacity in system physical address
+ * space. Whereas the CXL Root Decoders identify the bounds of potential CXL
+ * Memory ranges, Regions represent the active mapped capacity by the HDM
+ * Decoder Capability structures throughout the Host Bridges, Switches, and
+ * Endpoints in the topology.
+ */
+
+static struct cxl_region *to_cxl_region(struct device *dev);
+
+static void cxl_region_release(struct device *dev)
+{
+	struct cxl_region *cxlr = to_cxl_region(dev);
+
+	memregion_free(cxlr->id);
+	kfree(cxlr);
+}
+
+static const struct device_type cxl_region_type = {
+	.name = "cxl_region",
+	.release = cxl_region_release,
+};
+
+bool is_cxl_region(struct device *dev)
+{
+	return dev->type == &cxl_region_type;
+}
+EXPORT_SYMBOL_NS_GPL(is_cxl_region, CXL);
+
+static struct cxl_region *to_cxl_region(struct device *dev)
+{
+	if (dev_WARN_ONCE(dev, dev->type != &cxl_region_type,
+			  "not a cxl_region device\n"))
+		return NULL;
+
+	return container_of(dev, struct cxl_region, dev);
+}
+
+static void unregister_region(void *dev)
+{
+	device_unregister(dev);
+}
+
+static struct lock_class_key cxl_region_key;
+
+static struct cxl_region *cxl_region_alloc(struct cxl_root_decoder *cxlrd, int id)
+{
+	struct cxl_region *cxlr;
+	struct device *dev;
+
+	cxlr = kzalloc(sizeof(*cxlr), GFP_KERNEL);
+	if (!cxlr) {
+		memregion_free(id);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	dev = &cxlr->dev;
+	device_initialize(dev);
+	lockdep_set_class(&dev->mutex, &cxl_region_key);
+	dev->parent = &cxlrd->cxlsd.cxld.dev;
+	device_set_pm_not_required(dev);
+	dev->bus = &cxl_bus_type;
+	dev->type = &cxl_region_type;
+	cxlr->id = id;
+
+	return cxlr;
+}
+
+/**
+ * devm_cxl_add_region - Adds a region to a decoder
+ * @cxlrd: root decoder
+ * @id: memregion id to create
+ * @mode: mode for the endpoint decoders of this region
+ *
+ * This is the second step of region initialization. Regions exist within an
+ * address space which is mapped by a @cxlrd.
+ *
+ * Return: the new region on success, else an ERR_PTR() encoded negative error
+ * code. The region will be named "regionZ" where Z is the unique region number.
+ */
+static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
+					      int id,
+					      enum cxl_decoder_mode mode,
+					      enum cxl_decoder_type type)
+{
+	struct cxl_port *port = to_cxl_port(cxlrd->cxlsd.cxld.dev.parent);
+	struct cxl_region *cxlr;
+	struct device *dev;
+	int rc;
+
+	cxlr = cxl_region_alloc(cxlrd, id);
+	if (IS_ERR(cxlr))
+		return cxlr;
+	cxlr->mode = mode;
+	cxlr->type = type;
+
+	dev = &cxlr->dev;
+	rc = dev_set_name(dev, "region%d", id);
+	if (rc)
+		goto err;
+
+	rc = device_add(dev);
+	if (rc)
+		goto err;
+
+	rc = devm_add_action_or_reset(port->uport, unregister_region, cxlr);
+	if (rc)
+		return ERR_PTR(rc);
+
+	dev_dbg(port->uport, "%s: created %s\n",
+		dev_name(&cxlrd->cxlsd.cxld.dev), dev_name(dev));
+	return cxlr;
+
+err:
+	put_device(dev);
+	return ERR_PTR(rc);
+}
+
+static ssize_t create_pmem_region_show(struct device *dev,
+				       struct device_attribute *attr, char *buf)
+{
+	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
+
+	return sysfs_emit(buf, "region%u\n", atomic_read(&cxlrd->region_id));
+}
+
+static ssize_t create_pmem_region_store(struct device *dev,
+					struct device_attribute *attr,
+					const char *buf, size_t len)
+{
+	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
+	struct cxl_region *cxlr;
+	int id, rc;
+
+	rc = sscanf(buf, "region%d\n", &id);
+	if (rc != 1)
+		return -EINVAL;
+
+	rc = memregion_alloc(GFP_KERNEL);
+	if (rc < 0)
+		return rc;
+
+	if (atomic_cmpxchg(&cxlrd->region_id, id, rc) != id) {
+		memregion_free(rc);
+		return -EBUSY;
+	}
+
+	cxlr = devm_cxl_add_region(cxlrd, id, CXL_DECODER_PMEM,
+				   CXL_DECODER_EXPANDER);
+	if (IS_ERR(cxlr))
+		return PTR_ERR(cxlr);
+
+	return len;
+}
+DEVICE_ATTR_RW(create_pmem_region);
+
+static struct cxl_region *cxl_find_region_by_name(struct cxl_decoder *cxld,
+						  const char *name)
+{
+	struct device *region_dev;
+
+	region_dev = device_find_child_by_name(&cxld->dev, name);
+	if (!region_dev)
+		return ERR_PTR(-ENODEV);
+
+	return to_cxl_region(region_dev);
+}
+
+static ssize_t delete_region_store(struct device *dev,
+				   struct device_attribute *attr,
+				   const char *buf, size_t len)
+{
+	struct cxl_port *port = to_cxl_port(dev->parent);
+	struct cxl_decoder *cxld = to_cxl_decoder(dev);
+	struct cxl_region *cxlr;
+
+	cxlr = cxl_find_region_by_name(cxld, buf);
+	if (IS_ERR(cxlr))
+		return PTR_ERR(cxlr);
+
+	devm_release_action(port->uport, unregister_region, cxlr);
+	put_device(&cxlr->dev);
+
+	return len;
+}
+DEVICE_ATTR_WO(delete_region);
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index f761cf78cc05..49b73b2e44a9 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -279,13 +279,29 @@ struct cxl_switch_decoder {
 /**
  * struct cxl_root_decoder - Static platform CXL address decoder
  * @res: host / parent resource for region allocations
+ * @region_id: region id for next region provisioning event
  * @cxlsd: base cxl switch decoder
  */
 struct cxl_root_decoder {
 	struct resource *res;
+	atomic_t region_id;
 	struct cxl_switch_decoder cxlsd;
 };
 
+/**
+ * struct cxl_region - CXL region
+ * @dev: This region's device
+ * @id: This region's id. Id is globally unique across all regions
+ * @mode: Endpoint decoder allocation / access mode
+ * @type: Endpoint decoder target type
+ */
+struct cxl_region {
+	struct device dev;
+	int id;
+	enum cxl_decoder_mode mode;
+	enum cxl_decoder_type type;
+};
+
 /**
  * enum cxl_nvdimm_brige_state - state machine for managing bus rescans
  * @CXL_NVB_NEW: Set at bridge create and after cxl_pmem_wq is destroyed
@@ -434,6 +450,8 @@ struct cxl_hdm *devm_cxl_setup_hdm(struct cxl_port *port);
 int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm);
 int devm_cxl_add_passthrough_decoder(struct cxl_port *port);
 
+bool is_cxl_region(struct device *dev);
+
 extern struct bus_type cxl_bus_type;
 
 struct cxl_driver {
diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
index 33543231d453..500be85729cc 100644
--- a/tools/testing/cxl/Kbuild
+++ b/tools/testing/cxl/Kbuild
@@ -47,6 +47,7 @@ cxl_core-y += $(CXL_CORE_SRC)/memdev.o
 cxl_core-y += $(CXL_CORE_SRC)/mbox.o
 cxl_core-y += $(CXL_CORE_SRC)/pci.o
 cxl_core-y += $(CXL_CORE_SRC)/hdm.o
+cxl_core-$(CONFIG_CXL_REGION) += $(CXL_CORE_SRC)/region.o
 cxl_core-y += config_check.o
 
 obj-m += test/
-- 
2.36.1
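
The create_pmem_region handshake in the patch above (read the advertised
name, write it back, lose with -EBUSY if someone else got there first) can
be modeled in userspace with a C11 compare-and-exchange. This is only a
sketch under stated assumptions: 'struct root_decoder_model' and
claim_region_id() are hypothetical names, the fresh id is passed in by the
caller rather than obtained from memregion_alloc(), and no device is created.

```c
#include <errno.h>
#include <stdatomic.h>

/*
 * Userspace model (not kernel code) of the create_pmem_region handshake.
 * next_id is the id currently advertised to userspace.
 */
struct root_decoder_model {
	atomic_int next_id;
};

/*
 * A write succeeds only if the caller echoes back the advertised id. On
 * success fresh_id becomes the next advertised id and the claimed id is
 * returned. If the advertised id has changed in the meantime the caller
 * lost the race and gets -EBUSY.
 */
static int claim_region_id(struct root_decoder_model *rd, int requested,
			   int fresh_id)
{
	int expected = requested;

	if (!atomic_compare_exchange_strong(&rd->next_id, &expected, fresh_id))
		return -EBUSY;
	return requested;
}
```

Two writers that both read "region0" would both call
claim_region_id(&rd, 0, ...); only the first succeeds, and the second is
expected to re-read the attribute and retry with the new name.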



* [PATCH 35/46] cxl/region: Add a 'uuid' attribute
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (33 preceding siblings ...)
  2022-06-24  4:19 ` [PATCH 34/46] cxl/region: Add region creation support Dan Williams
@ 2022-06-24  4:19 ` Dan Williams
  2022-06-28 10:29   ` Jonathan Cameron
  2022-06-24  4:19 ` [PATCH 36/46] cxl/region: Add interleave ways attribute Dan Williams
                   ` (13 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  4:19 UTC (permalink / raw)
  To: linux-cxl; +Cc: nvdimm, linux-pci, patches, hch, Ben Widawsky, Dan Williams

From: Ben Widawsky <bwidawsk@kernel.org>

The process of provisioning a region involves triggering the creation of
a new region object, pouring in the configuration, and then binding that
configured object to the region driver to start its operation. For
persistent memory regions the CXL specification mandates that each region
be identified by a uuid. Add an ABI for userspace to specify a region's
uuid.

Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
[djbw: simplify locking]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Documentation/ABI/testing/sysfs-bus-cxl |  10 +++
 drivers/cxl/core/region.c               | 115 ++++++++++++++++++++++++
 drivers/cxl/cxl.h                       |  25 ++++++
 3 files changed, 150 insertions(+)
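
For readers who want to see which inputs the new 'uuid' attribute accepts,
the input checks in uuid_store() can be approximated in userspace as
follows. This is a sketch, not the kernel implementation:
check_uuid_input() is a hypothetical helper, the format check stands in
for uuid_parse(), and the duplicate-UUID scan and the region state check
are not modeled.

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>

#define UUID_STRING_LEN 36

/* Hypothetical helper modeling the input validation in uuid_store(). */
static int check_uuid_input(const char *buf, size_t len)
{
	int nonzero = 0;
	size_t i;

	/* 36 characters of UUID plus one trailing byte (e.g. a newline) */
	if (len != UUID_STRING_LEN + 1)
		return -EINVAL;

	for (i = 0; i < UUID_STRING_LEN; i++) {
		if (i == 8 || i == 13 || i == 18 || i == 23) {
			/* hyphens separate the 8-4-4-4-12 groups */
			if (buf[i] != '-')
				return -EINVAL;
		} else if (!isxdigit((unsigned char)buf[i])) {
			return -EINVAL;
		} else if (buf[i] != '0') {
			nonzero = 1;
		}
	}

	/* the all-zero (null) UUID is rejected */
	return nonzero ? 0 : -EINVAL;
}
```

Note the length check: a write without the trailing byte, such as one
produced by 'echo -n', is rejected before any parsing happens.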

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 9a4856066631..d30c95a758a9 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -263,3 +263,13 @@ Contact:	linux-cxl@vger.kernel.org
 Description:
 		(WO) Write a string in the form 'regionZ' to delete that region,
 		provided it is currently idle / not bound to a driver.
+
+
+What:		/sys/bus/cxl/devices/regionZ/uuid
+Date:		May, 2022
+KernelVersion:	v5.20
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(RW) Write a unique identifier for the region. This field must
+		be set for persistent regions and it must not conflict with the
+		UUID of another region.
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index f2a0ead20ca7..f75978f846b9 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -5,6 +5,7 @@
 #include <linux/device.h>
 #include <linux/module.h>
 #include <linux/slab.h>
+#include <linux/uuid.h>
 #include <linux/idr.h>
 #include <cxl.h>
 #include "core.h"
@@ -17,10 +18,123 @@
  * Memory ranges, Regions represent the active mapped capacity by the HDM
  * Decoder Capability structures throughout the Host Bridges, Switches, and
  * Endpoints in the topology.
+ *
+ * Region configuration has ordering constraints. UUID may be set at any time
+ * but is only visible for persistent regions.
+ */
+
+/*
+ * All changes to the interleave configuration occur with this lock held
+ * for write.
  */
+static DECLARE_RWSEM(cxl_region_rwsem);
 
 static struct cxl_region *to_cxl_region(struct device *dev);
 
+static ssize_t uuid_show(struct device *dev, struct device_attribute *attr,
+			 char *buf)
+{
+	struct cxl_region *cxlr = to_cxl_region(dev);
+	struct cxl_region_params *p = &cxlr->params;
+	ssize_t rc;
+
+	rc = down_read_interruptible(&cxl_region_rwsem);
+	if (rc)
+		return rc;
+	rc = sysfs_emit(buf, "%pUb\n", &p->uuid);
+	up_read(&cxl_region_rwsem);
+
+	return rc;
+}
+
+static int is_dup(struct device *match, void *data)
+{
+	struct cxl_region_params *p;
+	struct cxl_region *cxlr;
+	uuid_t *uuid = data;
+
+	if (!is_cxl_region(match))
+		return 0;
+
+	lockdep_assert_held(&cxl_region_rwsem);
+	cxlr = to_cxl_region(match);
+	p = &cxlr->params;
+
+	if (uuid_equal(&p->uuid, uuid)) {
+		dev_dbg(match, "already has uuid: %pUb\n", uuid);
+		return -EBUSY;
+	}
+
+	return 0;
+}
+
+static ssize_t uuid_store(struct device *dev, struct device_attribute *attr,
+			  const char *buf, size_t len)
+{
+	struct cxl_region *cxlr = to_cxl_region(dev);
+	struct cxl_region_params *p = &cxlr->params;
+	uuid_t temp;
+	ssize_t rc;
+
+	if (len != UUID_STRING_LEN + 1)
+		return -EINVAL;
+
+	rc = uuid_parse(buf, &temp);
+	if (rc)
+		return rc;
+
+	if (uuid_is_null(&temp))
+		return -EINVAL;
+
+	rc = down_write_killable(&cxl_region_rwsem);
+	if (rc)
+		return rc;
+
+	rc = -EBUSY;
+	if (p->state >= CXL_CONFIG_ACTIVE)
+		goto out;
+
+	rc = bus_for_each_dev(&cxl_bus_type, NULL, &temp, is_dup);
+	if (rc < 0)
+		goto out;
+
+	uuid_copy(&p->uuid, &temp);
+out:
+	up_write(&cxl_region_rwsem);
+
+	if (rc)
+		return rc;
+	return len;
+}
+static DEVICE_ATTR_RW(uuid);
+
+static umode_t cxl_region_visible(struct kobject *kobj, struct attribute *a,
+				  int n)
+{
+	struct device *dev = kobj_to_dev(kobj);
+	struct cxl_region *cxlr = to_cxl_region(dev);
+
+	if (a == &dev_attr_uuid.attr && cxlr->mode != CXL_DECODER_PMEM)
+		return 0;
+	return a->mode;
+}
+
+static struct attribute *cxl_region_attrs[] = {
+	&dev_attr_uuid.attr,
+	NULL,
+};
+
+static const struct attribute_group cxl_region_group = {
+	.attrs = cxl_region_attrs,
+	.is_visible = cxl_region_visible,
+};
+
+static const struct attribute_group *region_groups[] = {
+	&cxl_base_attribute_group,
+	&cxl_region_group,
+	NULL,
+};
+
 static void cxl_region_release(struct device *dev)
 {
 	struct cxl_region *cxlr = to_cxl_region(dev);
@@ -32,6 +146,7 @@ static void cxl_region_release(struct device *dev)
 static const struct device_type cxl_region_type = {
 	.name = "cxl_region",
 	.release = cxl_region_release,
+	.groups = region_groups
 };
 
 bool is_cxl_region(struct device *dev)
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 49b73b2e44a9..46a9f8acc602 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -288,18 +288,43 @@ struct cxl_root_decoder {
 	struct cxl_switch_decoder cxlsd;
 };
 
+/*
+ * enum cxl_config_state - State machine for region configuration
+ * @CXL_CONFIG_IDLE: Any sysfs attribute can be written freely
+ * @CXL_CONFIG_ACTIVE: All targets have been added, the region is now
+ * active
+ */
+enum cxl_config_state {
+	CXL_CONFIG_IDLE,
+	CXL_CONFIG_ACTIVE,
+};
+
+/**
+ * struct cxl_region_params - region settings
+ * @state: allow the driver to lockdown further parameter changes
+ * @uuid: unique id for persistent regions
+ *
+ * State transitions are protected by the cxl_region_rwsem
+ */
+struct cxl_region_params {
+	enum cxl_config_state state;
+	uuid_t uuid;
+};
+
 /**
  * struct cxl_region - CXL region
  * @dev: This region's device
  * @id: This region's id. Id is globally unique across all regions
  * @mode: Endpoint decoder allocation / access mode
  * @type: Endpoint decoder target type
+ * @params: active + config params for the region
  */
 struct cxl_region {
 	struct device dev;
 	int id;
 	enum cxl_decoder_mode mode;
 	enum cxl_decoder_type type;
+	struct cxl_region_params params;
 };
 
 /**
-- 
2.36.1



* [PATCH 36/46] cxl/region: Add interleave ways attribute
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (34 preceding siblings ...)
  2022-06-24  4:19 ` [PATCH 35/46] cxl/region: Add a 'uuid' attribute Dan Williams
@ 2022-06-24  4:19 ` Dan Williams
  2022-06-30 13:44   ` Jonathan Cameron
  2022-06-30 13:45   ` Jonathan Cameron
  2022-06-24  4:19 ` [PATCH 37/46] cxl/region: Allocate host physical address (HPA) capacity to new regions Dan Williams
                   ` (12 subsequent siblings)
  48 siblings, 2 replies; 157+ messages in thread
From: Dan Williams @ 2022-06-24  4:19 UTC (permalink / raw)
  To: linux-cxl; +Cc: nvdimm, linux-pci, patches, hch, Ben Widawsky, Dan Williams

From: Ben Widawsky <bwidawsk@kernel.org>

Add an ABI to allow the number of devices that comprise a region to be
set.

Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
[djbw: reword changelog]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Documentation/ABI/testing/sysfs-bus-cxl |  21 ++++
 drivers/cxl/core/region.c               | 128 ++++++++++++++++++++++++
 drivers/cxl/cxl.h                       |  33 ++++++
 3 files changed, 182 insertions(+)
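
The ways and granularity encodings introduced by this patch are easy to
check by hand. The following is a userspace re-statement of
granularity_to_cxl() and ways_to_cxl(), plus the "power of 2 multiple of
the root decoder interleave" rule from interleave_ways_store(). It is a
sketch: is_pow2(), log2u() and region_ways_ok() are local stand-ins
(assumptions, not kernel API), and values below 1 are rejected up front.

```c
#include <errno.h>
#include <stdint.h>

static int is_pow2(unsigned int n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* floor(log2(n)) for n >= 1 */
static unsigned int log2u(unsigned int n)
{
	unsigned int l = 0;

	while (n >>= 1)
		l++;
	return l;
}

/* 256 bytes encodes as 0, each doubling adds 1, 16K is the maximum */
static int granularity_to_cxl(int g, uint16_t *ig)
{
	if (g > 16384 || g < 256 || !is_pow2((unsigned int)g))
		return -EINVAL;
	*ig = (uint16_t)(log2u((unsigned int)g) - 8);
	return 0;
}

/* 1, 2, 4, 8, 16 encode as 0..4; 3, 6, 12 encode as 8..10 */
static int ways_to_cxl(int ways, uint8_t *iw)
{
	if (ways < 1 || ways > 16)
		return -EINVAL;
	if (is_pow2((unsigned int)ways)) {
		*iw = (uint8_t)log2u((unsigned int)ways);
		return 0;
	}
	if (ways % 3)
		return -EINVAL;
	ways /= 3;
	if (!is_pow2((unsigned int)ways))
		return -EINVAL;
	*iw = (uint8_t)(log2u((unsigned int)ways) + 8);
	return 0;
}

/* region ways must be a power-of-2 multiple of the root decoder ways */
static int region_ways_ok(int region_ways, int root_ways)
{
	if (root_ways < 1 || region_ways < 1 || region_ways % root_ways)
		return 0;
	return is_pow2((unsigned int)(region_ways / root_ways));
}
```

For example, under a x3 root decoder a region may be x3, x6 or x12, while
x4 fails the multiple check.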

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index d30c95a758a9..46d5295c1149 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -273,3 +273,24 @@ Description:
 		(RW) Write a unique identifier for the region. This field must
 		be set for persistent regions and it must not conflict with the
 		UUID of another region.
+
+
+What:		/sys/bus/cxl/devices/regionZ/interleave_granularity
+Date:		May, 2022
+KernelVersion:	v5.20
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(RW) Set the number of consecutive bytes each device in the
+		interleave set will claim. The possible interleave granularity
+		values are determined by the CXL spec and the participating
+		devices.
+
+
+What:		/sys/bus/cxl/devices/regionZ/interleave_ways
+Date:		May, 2022
+KernelVersion:	v5.20
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(RW) Configures the number of devices participating in the
+		region. Each device will provide 1/interleave_ways of storage
+		for the region.
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index f75978f846b9..78af42454760 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -7,6 +7,7 @@
 #include <linux/slab.h>
 #include <linux/uuid.h>
 #include <linux/idr.h>
+#include <cxlmem.h>
 #include <cxl.h>
 #include "core.h"
 
@@ -21,6 +22,8 @@
  *
  * Region configuration has ordering constraints. UUID may be set at any time
  * but is only visible for persistent regions.
+ * 1. Interleave granularity
+ * 2. Interleave size
  */
 
 /*
@@ -119,8 +122,129 @@ static umode_t cxl_region_visible(struct kobject *kobj, struct attribute *a,
 	return a->mode;
 }
 
+static ssize_t interleave_ways_show(struct device *dev,
+				    struct device_attribute *attr, char *buf)
+{
+	struct cxl_region *cxlr = to_cxl_region(dev);
+	struct cxl_region_params *p = &cxlr->params;
+	ssize_t rc;
+
+	rc = down_read_interruptible(&cxl_region_rwsem);
+	if (rc)
+		return rc;
+	rc = sysfs_emit(buf, "%d\n", p->interleave_ways);
+	up_read(&cxl_region_rwsem);
+
+	return rc;
+}
+
+static ssize_t interleave_ways_store(struct device *dev,
+				     struct device_attribute *attr,
+				     const char *buf, size_t len)
+{
+	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent);
+	struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
+	struct cxl_region *cxlr = to_cxl_region(dev);
+	struct cxl_region_params *p = &cxlr->params;
+	int rc, val;
+	u8 iw;
+
+	rc = kstrtoint(buf, 0, &val);
+	if (rc)
+		return rc;
+
+	rc = ways_to_cxl(val, &iw);
+	if (rc)
+		return rc;
+
+	/*
+	 * Even for x3, x9, and x12 interleaves the region interleave must be a
+	 * power of 2 multiple of the host bridge interleave.
+	 */
+	if (!is_power_of_2(val / cxld->interleave_ways) ||
+	    (val % cxld->interleave_ways)) {
+		dev_dbg(&cxlr->dev, "invalid interleave: %d\n", val);
+		return -EINVAL;
+	}
+
+	rc = down_write_killable(&cxl_region_rwsem);
+	if (rc)
+		return rc;
+	if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE) {
+		rc = -EBUSY;
+		goto out;
+	}
+
+	p->interleave_ways = val;
+out:
+	up_write(&cxl_region_rwsem);
+	if (rc)
+		return rc;
+	return len;
+}
+static DEVICE_ATTR_RW(interleave_ways);
+
+static ssize_t interleave_granularity_show(struct device *dev,
+					   struct device_attribute *attr,
+					   char *buf)
+{
+	struct cxl_region *cxlr = to_cxl_region(dev);
+	struct cxl_region_params *p = &cxlr->params;
+	ssize_t rc;
+
+	rc = down_read_interruptible(&cxl_region_rwsem);
+	if (rc)
+		return rc;
+	rc = sysfs_emit(buf, "%d\n", p->interleave_granularity);
+	up_read(&cxl_region_rwsem);
+
+	return rc;
+}
+
+static ssize_t interleave_granularity_store(struct device *dev,
+					    struct device_attribute *attr,
+					    const char *buf, size_t len)
+{
+	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent);
+	struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
+	struct cxl_region *cxlr = to_cxl_region(dev);
+	struct cxl_region_params *p = &cxlr->params;
+	int rc, val;
+	u16 ig;
+
+	rc = kstrtoint(buf, 0, &val);
+	if (rc)
+		return rc;
+
+	rc = granularity_to_cxl(val, &ig);
+	if (rc)
+		return rc;
+
+	/* region granularity must be >= root granularity */
+	if (val < cxld->interleave_granularity)
+		return -EINVAL;
+
+	rc = down_write_killable(&cxl_region_rwsem);
+	if (rc)
+		return rc;
+	if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE) {
+		rc = -EBUSY;
+		goto out;
+	}
+
+	p->interleave_granularity = val;
+out:
+	up_write(&cxl_region_rwsem);
+	if (rc)
+		return rc;
+	return len;
+}
+static DEVICE_ATTR_RW(interleave_granularity);
+
 static struct attribute *cxl_region_attrs[] = {
 	&dev_attr_uuid.attr,
+	&dev_attr_interleave_ways.attr,
+	&dev_attr_interleave_granularity.attr,
 	NULL,
 };
 
@@ -212,6 +336,8 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
 					      enum cxl_decoder_type type)
 {
 	struct cxl_port *port = to_cxl_port(cxlrd->cxlsd.cxld.dev.parent);
+	struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
+	struct cxl_region_params *p;
 	struct cxl_region *cxlr;
 	struct device *dev;
 	int rc;
@@ -219,8 +345,10 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
 	cxlr = cxl_region_alloc(cxlrd, id);
 	if (IS_ERR(cxlr))
 		return cxlr;
+	p = &cxlr->params;
 	cxlr->mode = mode;
 	cxlr->type = type;
+	p->interleave_granularity = cxld->interleave_granularity;
 
 	dev = &cxlr->dev;
 	rc = dev_set_name(dev, "region%d", id);
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 46a9f8acc602..13ee04b00e0c 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -7,6 +7,7 @@
 #include <linux/libnvdimm.h>
 #include <linux/bitfield.h>
 #include <linux/bitops.h>
+#include <linux/log2.h>
 #include <linux/io.h>
 
 /**
@@ -92,6 +93,31 @@ static inline int cxl_to_ways(u8 eniw, unsigned int *val)
 	return 0;
 }
 
+static inline int granularity_to_cxl(int g, u16 *ig)
+{
+	if (g > SZ_16K || g < 256 || !is_power_of_2(g))
+		return -EINVAL;
+	*ig = ilog2(g) - 8;
+	return 0;
+}
+
+static inline int ways_to_cxl(int ways, u8 *iw)
+{
+	if (ways > 16)
+		return -EINVAL;
+	if (is_power_of_2(ways)) {
+		*iw = ilog2(ways);
+		return 0;
+	}
+	if (ways % 3)
+		return -EINVAL;
+	ways /= 3;
+	if (!is_power_of_2(ways))
+		return -EINVAL;
+	*iw = ilog2(ways) + 8;
+	return 0;
+}
+
 /* CXL 2.0 8.2.8.1 Device Capabilities Array Register */
 #define CXLDEV_CAP_ARRAY_OFFSET 0x0
 #define   CXLDEV_CAP_ARRAY_CAP_ID 0
@@ -291,11 +317,14 @@ struct cxl_root_decoder {
 /*
  * enum cxl_config_state - State machine for region configuration
  * @CXL_CONFIG_IDLE: Any sysfs attribute can be written freely
+ * @CXL_CONFIG_INTERLEAVE_ACTIVE: region size has been set, no more
+ * changes to interleave_ways or interleave_granularity
  * @CXL_CONFIG_ACTIVE: All targets have been added, the region is now
  * active
  */
 enum cxl_config_state {
 	CXL_CONFIG_IDLE,
+	CXL_CONFIG_INTERLEAVE_ACTIVE,
 	CXL_CONFIG_ACTIVE,
 };
 
@@ -303,12 +332,16 @@ enum cxl_config_state {
  * struct cxl_region_params - region settings
  * @state: allow the driver to lockdown further parameter changes
  * @uuid: unique id for persistent regions
+ * @interleave_ways: number of endpoints in the region
+ * @interleave_granularity: capacity each endpoint contributes to a stripe
  *
  * State transitions are protected by the cxl_region_rwsem
  */
 struct cxl_region_params {
 	enum cxl_config_state state;
 	uuid_t uuid;
+	int interleave_ways;
+	int interleave_granularity;
 };
 
 /**
-- 
2.36.1



* [PATCH 37/46] cxl/region: Allocate host physical address (HPA) capacity to new regions
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (35 preceding siblings ...)
  2022-06-24  4:19 ` [PATCH 36/46] cxl/region: Add interleave ways attribute Dan Williams
@ 2022-06-24  4:19 ` Dan Williams
  2022-06-30 13:56   ` Jonathan Cameron
  2022-06-24  4:19 ` [PATCH 38/46] cxl/region: Enable the assignment of endpoint decoders to regions Dan Williams
                   ` (11 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  4:19 UTC (permalink / raw)
  To: linux-cxl; +Cc: nvdimm, linux-pci, patches, hch, Dan Williams, Ben Widawsky

After a region's interleave parameters (ways and granularity) are set,
add a way for regions to allocate HPA from the free capacity in their
decoder. The allocator for this capacity reuses the 'struct resource'
based allocator used for CONFIG_DEVICE_PRIVATE.

Once the tuple of "ways, granularity, and size" is set the
region configuration transitions to the CXL_CONFIG_INTERLEAVE_ACTIVE
state which is a precursor to allowing endpoint decoders to be added to
a region.

Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Documentation/ABI/testing/sysfs-bus-cxl |  25 ++++
 drivers/cxl/Kconfig                     |   3 +
 drivers/cxl/core/region.c               | 148 +++++++++++++++++++++++-
 drivers/cxl/cxl.h                       |   2 +
 4 files changed, 177 insertions(+), 1 deletion(-)
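
The ordering rules described in the changelog (interleave parameters
first, then size, with size == 0 releasing the capacity) can be
summarized as a small state machine. The sketch below is a userspace model
only: 'struct region_model', model_alloc_hpa() and model_free_hpa() are
hypothetical names, the allocation itself is reduced to recording the
size, and the alignment check is done in 64-bit arithmetic as an
assumption of this model.

```c
#include <errno.h>
#include <stdint.h>

#define SZ_256M (256ULL << 20)

enum config_state {
	CONFIG_IDLE,
	CONFIG_INTERLEAVE_ACTIVE,
	CONFIG_ACTIVE,
};

struct region_model {
	enum config_state state;
	int interleave_ways;
	int interleave_granularity;
	uint64_t size;	/* 0 means no HPA capacity is allocated */
};

static int model_alloc_hpa(struct region_model *r, uint64_t size)
{
	/* writing the current size again is a no-op */
	if (r->size && r->size == size)
		return 0;
	/* to change the size, the old allocation must be freed first */
	if (r->size)
		return -EBUSY;
	if (r->state >= CONFIG_INTERLEAVE_ACTIVE)
		return -EBUSY;
	/* ways and granularity must be configured before size */
	if (!r->interleave_ways || !r->interleave_granularity)
		return -ENXIO;
	/* size must be a multiple of 256M * interleave_ways */
	if (size % (SZ_256M * (uint64_t)r->interleave_ways))
		return -EINVAL;

	r->size = size;
	r->state = CONFIG_INTERLEAVE_ACTIVE;
	return 0;
}

static int model_free_hpa(struct region_model *r)
{
	if (!r->size)
		return 0;
	if (r->state >= CONFIG_ACTIVE)
		return -EBUSY;
	r->size = 0;
	r->state = CONFIG_IDLE;
	return 0;
}
```

With interleave_ways == 2, a 256M request is rejected with -EINVAL while
a 512M request succeeds and locks the interleave settings until the size
is written back to 0.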

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 46d5295c1149..3658facc9944 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -294,3 +294,28 @@ Description:
 		(RW) Configures the number of devices participating in the
 		region. Each device will provide 1/interleave_ways of storage
 		for the region.
+
+
+What:		/sys/bus/cxl/devices/regionZ/size
+Date:		May, 2022
+KernelVersion:	v5.20
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(RW) System physical address space to be consumed by the region.
+		When written to, this attribute will allocate space out of the
+		CXL root decoder's address space. When read, the size of the
+		address space is reported and should match the span of the
+		region's resource attribute. Size shall be set after the
+		interleave configuration parameters.
+
+
+What:		/sys/bus/cxl/devices/regionZ/resource
+Date:		May, 2022
+KernelVersion:	v5.20
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(RO) A region is a contiguous partition of a CXL root decoder
+		address space. Region capacity is allocated by writing to the
+		size attribute; the resulting physical address space determined
+		by the driver is reflected here. It is therefore not useful to
+		read this before writing a value to the size attribute.
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index aa2728de419e..74c2cd069d9d 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -105,6 +105,9 @@ config CXL_SUSPEND
 config CXL_REGION
 	bool
 	default CXL_BUS
+	# For MAX_PHYSMEM_BITS
+	depends on SPARSEMEM
 	select MEMREGION
+	select GET_FREE_REGION
 
 endif
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 78af42454760..a604c24ff918 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -241,10 +241,150 @@ static ssize_t interleave_granularity_store(struct device *dev,
 }
 static DEVICE_ATTR_RW(interleave_granularity);
 
+static ssize_t resource_show(struct device *dev, struct device_attribute *attr,
+			     char *buf)
+{
+	struct cxl_region *cxlr = to_cxl_region(dev);
+	struct cxl_region_params *p = &cxlr->params;
+	u64 resource = -1ULL;
+	ssize_t rc;
+
+	rc = down_read_interruptible(&cxl_region_rwsem);
+	if (rc)
+		return rc;
+	if (p->res)
+		resource = p->res->start;
+	rc = sysfs_emit(buf, "%#llx\n", resource);
+	up_read(&cxl_region_rwsem);
+
+	return rc;
+}
+static DEVICE_ATTR_RO(resource);
+
+static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size)
+{
+	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
+	struct cxl_region_params *p = &cxlr->params;
+	struct resource *res;
+	u32 remainder = 0;
+
+	lockdep_assert_held_write(&cxl_region_rwsem);
+
+	/* Nothing to do... */
+	if (p->res && resource_size(p->res) == size)
+		return 0;
+
+	/* To change size the old size must be freed first */
+	if (p->res)
+		return -EBUSY;
+
+	if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE)
+		return -EBUSY;
+
+	if (!p->interleave_ways || !p->interleave_granularity)
+		return -ENXIO;
+
+	div_u64_rem(size, SZ_256M * p->interleave_ways, &remainder);
+	if (remainder)
+		return -EINVAL;
+
+	res = alloc_free_mem_region(cxlrd->res, size, SZ_256M,
+				    dev_name(&cxlr->dev));
+	if (IS_ERR(res)) {
+		dev_dbg(&cxlr->dev, "failed to allocate HPA: %ld\n",
+			PTR_ERR(res));
+		return PTR_ERR(res);
+	}
+
+	p->res = res;
+	p->state = CXL_CONFIG_INTERLEAVE_ACTIVE;
+
+	return 0;
+}
+
+static void cxl_region_iomem_release(struct cxl_region *cxlr)
+{
+	struct cxl_region_params *p = &cxlr->params;
+
+	if (device_is_registered(&cxlr->dev))
+		lockdep_assert_held_write(&cxl_region_rwsem);
+	if (p->res) {
+		remove_resource(p->res);
+		kfree(p->res);
+		p->res = NULL;
+	}
+}
+
+static int free_hpa(struct cxl_region *cxlr)
+{
+	struct cxl_region_params *p = &cxlr->params;
+
+	lockdep_assert_held_write(&cxl_region_rwsem);
+
+	if (!p->res)
+		return 0;
+
+	if (p->state >= CXL_CONFIG_ACTIVE)
+		return -EBUSY;
+
+	cxl_region_iomem_release(cxlr);
+	p->state = CXL_CONFIG_IDLE;
+	return 0;
+}
+
+static ssize_t size_store(struct device *dev, struct device_attribute *attr,
+			  const char *buf, size_t len)
+{
+	struct cxl_region *cxlr = to_cxl_region(dev);
+	u64 val;
+	int rc;
+
+	rc = kstrtou64(buf, 0, &val);
+	if (rc)
+		return rc;
+
+	rc = down_write_killable(&cxl_region_rwsem);
+	if (rc)
+		return rc;
+
+	if (val)
+		rc = alloc_hpa(cxlr, val);
+	else
+		rc = free_hpa(cxlr);
+	up_write(&cxl_region_rwsem);
+
+	if (rc)
+		return rc;
+
+	return len;
+}
+
+static ssize_t size_show(struct device *dev, struct device_attribute *attr,
+			 char *buf)
+{
+	struct cxl_region *cxlr = to_cxl_region(dev);
+	struct cxl_region_params *p = &cxlr->params;
+	u64 size = 0;
+	ssize_t rc;
+
+	rc = down_read_interruptible(&cxl_region_rwsem);
+	if (rc)
+		return rc;
+	if (p->res)
+		size = resource_size(p->res);
+	rc = sysfs_emit(buf, "%#llx\n", size);
+	up_read(&cxl_region_rwsem);
+
+	return rc;
+}
+static DEVICE_ATTR_RW(size);
+
 static struct attribute *cxl_region_attrs[] = {
 	&dev_attr_uuid.attr,
 	&dev_attr_interleave_ways.attr,
 	&dev_attr_interleave_granularity.attr,
+	&dev_attr_resource.attr,
+	&dev_attr_size.attr,
 	NULL,
 };
 
@@ -290,7 +430,11 @@ static struct cxl_region *to_cxl_region(struct device *dev)
 
 static void unregister_region(void *dev)
 {
-	device_unregister(dev);
+	struct cxl_region *cxlr = to_cxl_region(dev);
+
+	device_del(dev);
+	cxl_region_iomem_release(cxlr);
+	put_device(dev);
 }
 
 static struct lock_class_key cxl_region_key;
@@ -440,3 +584,5 @@ static ssize_t delete_region_store(struct device *dev,
 	return len;
 }
 DEVICE_ATTR_WO(delete_region);
+
+MODULE_IMPORT_NS(CXL);
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 13ee04b00e0c..25960c1e4ebd 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -334,6 +334,7 @@ enum cxl_config_state {
  * @uuid: unique id for persistent regions
  * @interleave_ways: number of endpoints in the region
  * @interleave_granularity: capacity each endpoint contributes to a stripe
+ * @res: allocated iomem capacity for this region
  *
  * State transitions are protected by the cxl_region_rwsem
  */
@@ -342,6 +343,7 @@ struct cxl_region_params {
 	uuid_t uuid;
 	int interleave_ways;
 	int interleave_granularity;
+	struct resource *res;
 };
 
 /**
-- 
2.36.1



* [PATCH 38/46] cxl/region: Enable the assignment of endpoint decoders to regions
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (36 preceding siblings ...)
  2022-06-24  4:19 ` [PATCH 37/46] cxl/region: Allocate host physical address (HPA) capacity to new regions Dan Williams
@ 2022-06-24  4:19 ` Dan Williams
  2022-06-30 14:31   ` Jonathan Cameron
  2022-06-24  4:19 ` [PATCH 39/46] cxl/acpi: Add a host-bridge index lookup mechanism Dan Williams
                   ` (10 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  4:19 UTC (permalink / raw)
  To: linux-cxl; +Cc: nvdimm, linux-pci, patches, hch, Dan Williams, Ben Widawsky

The region provisioning process involves allocating DPA to a set of
endpoint decoders, and HPA plus the region geometry to a region device.
Then the decoder is assigned to the region. At this point several
checks can be performed to confirm that the decoder is suitable to
participate in the region.
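
As a rough illustration of the slot-assignment rules described above, here
is a minimal userspace sketch. The struct and function names (region,
decoder, region_attach, MAX_WAYS) are hypothetical stand-ins and not the
kernel's types; only the error-code choices follow the ABI text in this
patch (ENXIO for an out-of-range position, EBUSY for an occupied one).

```c
#include <errno.h>
#include <stddef.h>

#define MAX_WAYS 16 /* assumed cap, matching the 16 targetN attributes */

/* Hypothetical stand-in for an endpoint decoder */
struct decoder {
	const char *name;
	int pos; /* -1 while unassigned */
};

/* Hypothetical stand-in for the region parameters */
struct region {
	int interleave_ways;
	int nr_targets;
	struct decoder *targets[MAX_WAYS];
};

/* Assign decoder @d to interleave position @pos of region @r */
static int region_attach(struct region *r, struct decoder *d, int pos)
{
	/* position must fall inside the configured interleave */
	if (pos < 0 || pos >= r->interleave_ways)
		return -ENXIO;

	/* re-writing the same decoder to the same slot is a no-op */
	if (r->targets[pos] == d)
		return 0;

	/* slot already claimed by a different decoder */
	if (r->targets[pos])
		return -EBUSY;

	r->targets[pos] = d;
	d->pos = pos;
	r->nr_targets++;
	return 0;
}
```

The sketch leaves out locking and the dead-decoder check; in the patch both
of those are handled around the equivalent of this function.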

Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Documentation/ABI/testing/sysfs-bus-cxl |  19 ++
 drivers/cxl/core/core.h                 |   6 +
 drivers/cxl/core/hdm.c                  |  13 +-
 drivers/cxl/core/port.c                 |  12 +-
 drivers/cxl/core/region.c               | 286 +++++++++++++++++++++++-
 drivers/cxl/cxl.h                       |  11 +
 6 files changed, 342 insertions(+), 5 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 3658facc9944..f1b74a71927d 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -319,3 +319,22 @@ Description:
 		size attribute, the resulting physical address space determined
 		by the driver is reflected here. It is therefore not useful to
 		read this before writing a value to the size attribute.
+
+
+What:		/sys/bus/cxl/devices/regionZ/target[0..N]
+Date:		May, 2022
+KernelVersion:	v5.20
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(RW) Write an endpoint decoder object name to 'targetX' where X
+		is the intended position of the endpoint device in the region
+		interleave and N is the 'interleave_ways' setting for the
+		region. ENXIO is returned if the write results in an impossible
+		to map decode scenario, like the endpoint is unreachable at that
+		position relative to the root decoder interleave. EBUSY is
+		returned if the position in the region is already occupied, or
+		if the region is not in a state to accept interleave
+		configuration changes. EINVAL is returned if the object name is
+		not an endpoint decoder. Once all positions have been
+		successfully written a final validation for decode conflicts is
+		performed before activating the region.
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index ebe6197fb9b8..36b6bd8dac2b 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -12,12 +12,17 @@ extern struct attribute_group cxl_base_attribute_group;
 #ifdef CONFIG_CXL_REGION
 extern struct device_attribute dev_attr_create_pmem_region;
 extern struct device_attribute dev_attr_delete_region;
+extern struct device_attribute dev_attr_region;
+void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled);
 /*
  * Note must be used at the end of an attribute list, since it
  * terminates the list in the CONFIG_CXL_REGION=n case.
  */
 #define CXL_REGION_ATTR(x) (&dev_attr_##x.attr)
 #else
+static inline void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled)
+{
+}
 #define CXL_REGION_ATTR(x) NULL
 #endif
 
@@ -35,6 +40,7 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size);
 int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
 resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled);
 resource_size_t cxl_dpa_resource(struct cxl_endpoint_decoder *cxled);
+extern struct rw_semaphore cxl_dpa_rwsem;
 
 int cxl_memdev_init(void);
 void cxl_memdev_exit(void);
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 7b58f6911523..2ee62dde8b23 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -8,7 +8,7 @@
 #include "cxlmem.h"
 #include "core.h"
 
-static DECLARE_RWSEM(cxl_dpa_rwsem);
+DECLARE_RWSEM(cxl_dpa_rwsem);
 
 /**
  * DOC: cxl core hdm
@@ -308,6 +308,11 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
 		rc = 0;
 		goto out;
 	}
+	if (cxled->cxld.region) {
+		dev_dbg(dev, "decoder assigned to: %s\n",
+			dev_name(&cxled->cxld.region->dev));
+		goto out;
+	}
 	if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) {
 		dev_dbg(dev, "decoder enabled\n");
 		goto out;
@@ -378,6 +383,12 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
 	int rc = -EBUSY;
 
 	down_write(&cxl_dpa_rwsem);
+	if (cxled->cxld.region) {
+		dev_dbg(dev, "decoder attached to %s\n",
+			dev_name(&cxled->cxld.region->dev));
+		goto out;
+	}
+
 	if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) {
 		dev_dbg(dev, "decoder enabled\n");
 		goto out;
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index c9207ebc3f32..562a6453249b 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -288,6 +288,7 @@ static struct attribute *cxl_decoder_base_attrs[] = {
 	&dev_attr_locked.attr,
 	&dev_attr_interleave_granularity.attr,
 	&dev_attr_interleave_ways.attr,
+	CXL_REGION_ATTR(region),
 	NULL,
 };
 
@@ -1483,8 +1484,10 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
 
 		alloc = kzalloc(sizeof(*cxled), GFP_KERNEL);
 		cxled = alloc;
-		if (cxled)
+		if (cxled) {
 			cxld = &cxled->cxld;
+			cxled->pos = -1;
+		}
 	}
 	if (!alloc)
 		return ERR_PTR(-ENOMEM);
@@ -1690,6 +1693,13 @@ EXPORT_SYMBOL_NS_GPL(cxl_decoder_add, CXL);
 
 static void cxld_unregister(void *dev)
 {
+	struct cxl_endpoint_decoder *cxled;
+
+	if (is_endpoint_decoder(dev)) {
+		cxled = to_cxl_endpoint_decoder(dev);
+		cxl_decoder_kill_region(cxled);
+	}
+
 	device_unregister(dev);
 }
 
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index a604c24ff918..4830365f3857 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -24,6 +24,7 @@
  * but is only visible for persistent regions.
  * 1. Interleave granularity
  * 2. Interleave size
+ * 3. Decoder targets
  */
 
 /*
@@ -138,6 +139,8 @@ static ssize_t interleave_ways_show(struct device *dev,
 	return rc;
 }
 
+static const struct attribute_group *get_cxl_region_target_group(void);
+
 static ssize_t interleave_ways_store(struct device *dev,
 				     struct device_attribute *attr,
 				     const char *buf, size_t len)
@@ -146,7 +149,7 @@ static ssize_t interleave_ways_store(struct device *dev,
 	struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
 	struct cxl_region *cxlr = to_cxl_region(dev);
 	struct cxl_region_params *p = &cxlr->params;
-	int rc, val;
+	int rc, val, save;
 	u8 iw;
 
 	rc = kstrtoint(buf, 0, &val);
@@ -175,9 +178,13 @@ static ssize_t interleave_ways_store(struct device *dev,
 		goto out;
 	}
 
+	save = p->interleave_ways;
 	p->interleave_ways = val;
+	rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_target_group());
+	if (rc)
+		p->interleave_ways = save;
 out:
-	up_read(&cxl_region_rwsem);
+	up_write(&cxl_region_rwsem);
 	if (rc)
 		return rc;
 	return len;
@@ -234,7 +241,7 @@ static ssize_t interleave_granularity_store(struct device *dev,
 
 	p->interleave_granularity = val;
 out:
-	up_read(&cxl_region_rwsem);
+	up_write(&cxl_region_rwsem);
 	if (rc)
 		return rc;
 	return len;
@@ -393,9 +400,262 @@ static const struct attribute_group cxl_region_group = {
 	.is_visible = cxl_region_visible,
 };
 
+static size_t show_targetN(struct cxl_region *cxlr, char *buf, int pos)
+{
+	struct cxl_region_params *p = &cxlr->params;
+	struct cxl_endpoint_decoder *cxled;
+	int rc;
+
+	rc = down_read_interruptible(&cxl_region_rwsem);
+	if (rc)
+		return rc;
+
+	if (pos >= p->interleave_ways) {
+		dev_dbg(&cxlr->dev, "position %d out of range %d\n", pos,
+			p->interleave_ways);
+		rc = -ENXIO;
+		goto out;
+	}
+
+	cxled = p->targets[pos];
+	if (!cxled)
+		rc = sysfs_emit(buf, "\n");
+	else
+		rc = sysfs_emit(buf, "%s\n", dev_name(&cxled->cxld.dev));
+out:
+	up_read(&cxl_region_rwsem);
+
+	return rc;
+}
+
+/*
+ * - Check that the given endpoint is attached to a host-bridge identified
+ *   in the root interleave.
+ */
+static int cxl_region_attach(struct cxl_region *cxlr,
+			     struct cxl_endpoint_decoder *cxled, int pos)
+{
+	struct cxl_region_params *p = &cxlr->params;
+
+	if (cxled->mode == CXL_DECODER_DEAD) {
+		dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev));
+		return -ENODEV;
+	}
+
+	if (pos >= p->interleave_ways) {
+		dev_dbg(&cxlr->dev, "position %d out of range %d\n", pos,
+			p->interleave_ways);
+		return -ENXIO;
+	}
+
+	if (p->targets[pos] == cxled)
+		return 0;
+
+	if (p->targets[pos]) {
+		struct cxl_endpoint_decoder *cxled_target = p->targets[pos];
+		struct cxl_memdev *cxlmd_target = cxled_to_memdev(cxled_target);
+
+		dev_dbg(&cxlr->dev, "position %d already assigned to %s:%s\n",
+			pos, dev_name(&cxlmd_target->dev),
+			dev_name(&cxled_target->cxld.dev));
+		return -EBUSY;
+	}
+
+	p->targets[pos] = cxled;
+	cxled->pos = pos;
+	p->nr_targets++;
+
+	return 0;
+}
+
+static void cxl_region_detach(struct cxl_endpoint_decoder *cxled)
+{
+	struct cxl_region *cxlr = cxled->cxld.region;
+	struct cxl_region_params *p;
+
+	lockdep_assert_held_write(&cxl_region_rwsem);
+
+	if (!cxlr)
+		return;
+
+	p = &cxlr->params;
+	get_device(&cxlr->dev);
+
+	if (cxled->pos < 0 || cxled->pos >= p->interleave_ways ||
+	    p->targets[cxled->pos] != cxled) {
+		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+
+		dev_WARN_ONCE(&cxlr->dev, 1, "expected %s:%s at position %d\n",
+			      dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
+			      cxled->pos);
+		goto out;
+	}
+
+	p->targets[cxled->pos] = NULL;
+	p->nr_targets--;
+
+	/* notify the region driver that one of its targets has departed */
+	up_write(&cxl_region_rwsem);
+	device_release_driver(&cxlr->dev);
+	down_write(&cxl_region_rwsem);
+out:
+	put_device(&cxlr->dev);
+}
+
+void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled)
+{
+	down_write(&cxl_region_rwsem);
+	cxled->mode = CXL_DECODER_DEAD;
+	cxl_region_detach(cxled);
+	up_write(&cxl_region_rwsem);
+}
+
+static int attach_target(struct cxl_region *cxlr, const char *decoder, int pos)
+{
+	struct device *dev;
+	int rc;
+
+	dev = bus_find_device_by_name(&cxl_bus_type, NULL, decoder);
+	if (!dev)
+		return -ENODEV;
+
+	if (!is_endpoint_decoder(dev)) {
+		put_device(dev);
+		return -EINVAL;
+	}
+
+	rc = down_write_killable(&cxl_region_rwsem);
+	if (rc)
+		goto out;
+	down_read(&cxl_dpa_rwsem);
+	rc = cxl_region_attach(cxlr, to_cxl_endpoint_decoder(dev), pos);
+	up_read(&cxl_dpa_rwsem);
+	up_write(&cxl_region_rwsem);
+out:
+	put_device(dev);
+	return rc;
+}
+
+static int detach_target(struct cxl_region *cxlr, int pos)
+{
+	struct cxl_region_params *p = &cxlr->params;
+	int rc;
+
+	rc = down_write_killable(&cxl_region_rwsem);
+	if (rc)
+		return rc;
+
+	if (pos >= p->interleave_ways) {
+		dev_dbg(&cxlr->dev, "position %d out of range %d\n", pos,
+			p->interleave_ways);
+		rc = -ENXIO;
+		goto out;
+	}
+
+	if (!p->targets[pos]) {
+		rc = 0;
+		goto out;
+	}
+
+	cxl_region_detach(p->targets[pos]);
+	rc = 0;
+out:
+	up_write(&cxl_region_rwsem);
+	return rc;
+}
+
+static size_t store_targetN(struct cxl_region *cxlr, const char *buf, int pos,
+			    size_t len)
+{
+	int rc;
+
+	if (sysfs_streq(buf, "\n"))
+		rc = detach_target(cxlr, pos);
+	else
+		rc = attach_target(cxlr, buf, pos);
+
+	if (rc < 0)
+		return rc;
+	return len;
+}
+
+#define TARGET_ATTR_RW(n)                                              \
+static ssize_t target##n##_show(                                       \
+	struct device *dev, struct device_attribute *attr, char *buf)  \
+{                                                                      \
+	return show_targetN(to_cxl_region(dev), buf, (n));             \
+}                                                                      \
+static ssize_t target##n##_store(struct device *dev,                   \
+				 struct device_attribute *attr,        \
+				 const char *buf, size_t len)          \
+{                                                                      \
+	return store_targetN(to_cxl_region(dev), buf, (n), len);       \
+}                                                                      \
+static DEVICE_ATTR_RW(target##n)
+
+TARGET_ATTR_RW(0);
+TARGET_ATTR_RW(1);
+TARGET_ATTR_RW(2);
+TARGET_ATTR_RW(3);
+TARGET_ATTR_RW(4);
+TARGET_ATTR_RW(5);
+TARGET_ATTR_RW(6);
+TARGET_ATTR_RW(7);
+TARGET_ATTR_RW(8);
+TARGET_ATTR_RW(9);
+TARGET_ATTR_RW(10);
+TARGET_ATTR_RW(11);
+TARGET_ATTR_RW(12);
+TARGET_ATTR_RW(13);
+TARGET_ATTR_RW(14);
+TARGET_ATTR_RW(15);
+
+static struct attribute *target_attrs[] = {
+	&dev_attr_target0.attr,
+	&dev_attr_target1.attr,
+	&dev_attr_target2.attr,
+	&dev_attr_target3.attr,
+	&dev_attr_target4.attr,
+	&dev_attr_target5.attr,
+	&dev_attr_target6.attr,
+	&dev_attr_target7.attr,
+	&dev_attr_target8.attr,
+	&dev_attr_target9.attr,
+	&dev_attr_target10.attr,
+	&dev_attr_target11.attr,
+	&dev_attr_target12.attr,
+	&dev_attr_target13.attr,
+	&dev_attr_target14.attr,
+	&dev_attr_target15.attr,
+	NULL,
+};
+
+static umode_t cxl_region_target_visible(struct kobject *kobj,
+					 struct attribute *a, int n)
+{
+	struct device *dev = kobj_to_dev(kobj);
+	struct cxl_region *cxlr = to_cxl_region(dev);
+	struct cxl_region_params *p = &cxlr->params;
+
+	if (n < p->interleave_ways)
+		return a->mode;
+	return 0;
+}
+
+static const struct attribute_group cxl_region_target_group = {
+	.attrs = target_attrs,
+	.is_visible = cxl_region_target_visible,
+};
+
+static const struct attribute_group *get_cxl_region_target_group(void)
+{
+	return &cxl_region_target_group;
+}
+
 static const struct attribute_group *region_groups[] = {
 	&cxl_base_attribute_group,
 	&cxl_region_group,
+	&cxl_region_target_group,
 	NULL,
 };
 
@@ -554,6 +814,26 @@ static ssize_t create_pmem_region_store(struct device *dev,
 }
 DEVICE_ATTR_RW(create_pmem_region);
 
+static ssize_t region_show(struct device *dev, struct device_attribute *attr,
+			   char *buf)
+{
+	struct cxl_decoder *cxld = to_cxl_decoder(dev);
+	ssize_t rc;
+
+	rc = down_read_interruptible(&cxl_region_rwsem);
+	if (rc)
+		return rc;
+
+	if (cxld->region)
+		rc = sysfs_emit(buf, "%s\n", dev_name(&cxld->region->dev));
+	else
+		rc = sysfs_emit(buf, "\n");
+	up_read(&cxl_region_rwsem);
+
+	return rc;
+}
+DEVICE_ATTR_RO(region);
+
 static struct cxl_region *cxl_find_region_by_name(struct cxl_decoder *cxld,
 						  const char *name)
 {
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 25960c1e4ebd..9340deccad4f 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -255,6 +255,7 @@ enum cxl_decoder_type {
  * @interleave_ways: number of cxl_dports in this decode
  * @interleave_granularity: data stride per dport
  * @target_type: accelerator vs expander (type2 vs type3) selector
+ * @region: currently assigned region for this decoder
  * @flags: memory type capabilities and locking
 */
 struct cxl_decoder {
@@ -264,14 +265,20 @@ struct cxl_decoder {
 	int interleave_ways;
 	int interleave_granularity;
 	enum cxl_decoder_type target_type;
+	struct cxl_region *region;
 	unsigned long flags;
 };
 
+/*
+ * CXL_DECODER_DEAD prevents endpoints from being reattached to regions
+ * while cxld_unregister() is running
+ */
 enum cxl_decoder_mode {
 	CXL_DECODER_NONE,
 	CXL_DECODER_RAM,
 	CXL_DECODER_PMEM,
 	CXL_DECODER_MIXED,
+	CXL_DECODER_DEAD,
 };
 
 /**
@@ -280,12 +287,14 @@ enum cxl_decoder_mode {
  * @dpa_res: actively claimed DPA span of this decoder
  * @skip: offset into @dpa_res where @cxld.hpa_range maps
  * @mode: which memory type / access-mode-partition this decoder targets
+ * @pos: interleave position in @cxld.region
  */
 struct cxl_endpoint_decoder {
 	struct cxl_decoder cxld;
 	struct resource *dpa_res;
 	resource_size_t skip;
 	enum cxl_decoder_mode mode;
+	int pos;
 };
 
 /**
@@ -344,6 +353,8 @@ struct cxl_region_params {
 	int interleave_ways;
 	int interleave_granularity;
 	struct resource *res;
+	struct cxl_endpoint_decoder *targets[CXL_DECODER_MAX_INTERLEAVE];
+	int nr_targets;
 };
 
 /**
-- 
2.36.1



* [PATCH 39/46] cxl/acpi: Add a host-bridge index lookup mechanism
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (37 preceding siblings ...)
  2022-06-24  4:19 ` [PATCH 38/46] cxl/region: Enable the assignment of endpoint decoders to regions Dan Williams
@ 2022-06-24  4:19 ` Dan Williams
  2022-06-30 15:48   ` Jonathan Cameron
  2022-06-24  4:19 ` [PATCH 40/46] cxl/region: Attach endpoint decoders Dan Williams
                   ` (9 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  4:19 UTC (permalink / raw)
  To: linux-cxl; +Cc: nvdimm, linux-pci, patches, hch, Dan Williams

The ACPI CXL Fixed Memory Window Structure (CFMWS) defines multiple
methods to determine which host bridge provides access to a given
endpoint relative to that device's position in the interleave. The
"Interleave Arithmetic" defines either a "standard modulo" /
round-random algorithm, or "xormap" based algorithm which can be defined
as a non-linear transform. Given that there are already more options
beyond "standard modulo" and that "xormap" may turn out to be ACPI CXL
specific, provide a callback for the region provisioning code to map
endpoint positions back to expected host bridge id (cxl_dport target).

For now just support the simple modulo math case and save the xormap for
a follow-on change.
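
To make the modulo case concrete, here is a small userspace sketch of the
lookup. The root_decoder struct and hb_modulo() below are hypothetical
simplifications (integer host bridge ids instead of cxl_dport pointers);
they only illustrate the arithmetic, not the driver's data structures.

```c
#include <stddef.h>

/* Hypothetical, simplified view of a root decoder's target list */
struct root_decoder {
	int interleave_ways;   /* ways declared by the window */
	int nr_targets;        /* entries populated in target_ids */
	const int *target_ids; /* host bridge ids in interleave order */
};

/*
 * Map a region position to the host bridge expected to serve it.
 * Returns the host bridge id, or -1 for a misconfigured decoder or a
 * negative position.
 */
static int hb_modulo(const struct root_decoder *rd, int pos)
{
	int iw = rd->interleave_ways;

	if (pos < 0 || iw <= 0 || iw != rd->nr_targets)
		return -1;

	/* positions wrap around the host bridge list */
	return rd->target_ids[pos % iw];
}
```

With a 2-way root interleave, positions 0 and 2 of a 4-way region land on
the first host bridge and positions 1 and 3 on the second.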

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/port.c | 15 +++++++++++++++
 drivers/cxl/cxl.h       |  2 ++
 2 files changed, 17 insertions(+)

diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 562a6453249b..7756409d0a58 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1422,6 +1422,20 @@ static int decoder_populate_targets(struct cxl_switch_decoder *cxlsd,
 	return rc;
 }
 
+static struct cxl_dport *cxl_hb_modulo(struct cxl_root_decoder *cxlrd, int pos)
+{
+	struct cxl_switch_decoder *cxlsd = &cxlrd->cxlsd;
+	struct cxl_decoder *cxld = &cxlsd->cxld;
+	int iw;
+
+	iw = cxld->interleave_ways;
+	if (dev_WARN_ONCE(&cxld->dev, iw != cxlsd->nr_targets,
+			  "misconfigured root decoder\n"))
+		return NULL;
+
+	return cxlrd->cxlsd.target[pos % iw];
+}
+
 static struct lock_class_key cxl_decoder_key;
 
 /**
@@ -1466,6 +1480,7 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
 				if (rc < 0)
 					goto err;
 				atomic_set(&cxlrd->region_id, rc);
+				cxlrd->calc_hb = cxl_hb_modulo;
 			} else
 				cxlsd = NULL;
 		} else {
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 9340deccad4f..30227348f768 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -315,11 +315,13 @@ struct cxl_switch_decoder {
  * struct cxl_root_decoder - Static platform CXL address decoder
  * @res: host / parent resource for region allocations
  * @region_id: region id for next region provisioning event
+ * @calc_hb: which host bridge covers the n'th position by granularity
  * @cxlsd: base cxl switch decoder
  */
 struct cxl_root_decoder {
 	struct resource *res;
 	atomic_t region_id;
+	struct cxl_dport *(*calc_hb)(struct cxl_root_decoder *cxlrd, int pos);
 	struct cxl_switch_decoder cxlsd;
 };
 
-- 
2.36.1



* [PATCH 40/46] cxl/region: Attach endpoint decoders
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (38 preceding siblings ...)
  2022-06-24  4:19 ` [PATCH 39/46] cxl/acpi: Add a host-bridge index lookup mechanism Dan Williams
@ 2022-06-24  4:19 ` Dan Williams
  2022-06-24 18:25   ` Jonathan Cameron
  2022-06-30 16:34   ` Jonathan Cameron
  2022-06-24  4:19 ` [PATCH 41/46] cxl/region: Program target lists Dan Williams
                   ` (8 subsequent siblings)
  48 siblings, 2 replies; 157+ messages in thread
From: Dan Williams @ 2022-06-24  4:19 UTC (permalink / raw)
  To: linux-cxl; +Cc: nvdimm, linux-pci, patches, hch, Dan Williams, Ben Widawsky

CXL regions (interleave sets) are made up of a set of memory devices
where each device maps a portion of the interleave with one of its
decoders (see CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure).
As endpoint decoders are identified by a provisioning tool they can be
added to a region provided the region interleave properties are set
(ways, granularity, HPA) and DPA has been assigned to the decoder.

The attach event triggers several validation checks, for example:
- is the DPA sized appropriately for the region
- is the decoder reachable via the host-bridges identified by the
  region's root decoder
- is the device already active in a different region position slot
- are there already regions with a higher HPA active on a given port
  (per CXL 2.0 8.2.5.12.20 Committing Decoder Programming)

...and the attach event affords an opportunity to collect data and
resources relevant to later programming the target lists in switch
decoders, for example:
- allocate a decoder at each cxl_port in the decode chain
- for a given switch port, how many of the region's endpoints are hosted
  through the port
- how many unique targets (next hops) does a port need to map to reach
  those endpoints

The act of reconciling this information and deploying it to the decoder
configuration is saved for a follow-on patch.
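
The HPA ordering check in the list above can be sketched in isolation as
follows. This is a userspace illustration under assumed names (hpa_range,
check_hpa_order); it follows the idea that a port rejects a new region when
a region with a higher HPA start is already attached there, and does not
reproduce the xarray walk the patch uses.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a region's allocated HPA resource */
struct hpa_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Return 0 if @candidate may be attached to a port that already hosts
 * the @n ranges in @attached, or -EBUSY if any attached range starts at
 * a higher address, since decoders are expected to be committed in
 * increasing HPA order.
 */
static int check_hpa_order(const struct hpa_range *attached, size_t n,
			   const struct hpa_range *candidate)
{
	for (size_t i = 0; i < n; i++) {
		if (attached[i].start > candidate->start)
			return -EBUSY;
	}
	return 0;
}
```

A port with no attached regions accepts any candidate, which matches the
empty-iteration case in the patch.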

Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/core.h   |   7 +
 drivers/cxl/core/port.c   |  10 +-
 drivers/cxl/core/region.c | 338 +++++++++++++++++++++++++++++++++++++-
 drivers/cxl/cxl.h         |  20 +++
 drivers/cxl/cxlmem.h      |   5 +
 5 files changed, 372 insertions(+), 8 deletions(-)

diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 36b6bd8dac2b..0e4e5c2d9452 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -42,6 +42,13 @@ resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled);
 resource_size_t cxl_dpa_resource(struct cxl_endpoint_decoder *cxled);
 extern struct rw_semaphore cxl_dpa_rwsem;
 
+bool is_switch_decoder(struct device *dev);
+static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port,
+					 struct cxl_memdev *cxlmd)
+{
+	return xa_load(&port->endpoints, (unsigned long)&cxlmd->dev);
+}
+
 int cxl_memdev_init(void);
 void cxl_memdev_exit(void);
 void cxl_mbox_init(void);
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 7756409d0a58..fde2a2e103d4 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -447,7 +447,7 @@ bool is_root_decoder(struct device *dev)
 }
 EXPORT_SYMBOL_NS_GPL(is_root_decoder, CXL);
 
-static bool is_switch_decoder(struct device *dev)
+bool is_switch_decoder(struct device *dev)
 {
 	return is_root_decoder(dev) || dev->type == &cxl_decoder_switch_type;
 }
@@ -503,6 +503,7 @@ static void cxl_port_release(struct device *dev)
 		cxl_ep_remove(port, ep);
 	xa_destroy(&port->endpoints);
 	xa_destroy(&port->dports);
+	xa_destroy(&port->regions);
 	ida_free(&cxl_port_ida, port->id);
 	kfree(port);
 }
@@ -633,6 +634,7 @@ static struct cxl_port *cxl_port_alloc(struct device *uport,
 	port->dpa_end = -1;
 	xa_init(&port->dports);
 	xa_init(&port->endpoints);
+	xa_init(&port->regions);
 
 	device_initialize(dev);
 	lockdep_set_class_and_subclass(&dev->mutex, &cxl_port_key, port->depth);
@@ -1110,12 +1112,6 @@ static void reap_dports(struct cxl_port *port)
 	}
 }
 
-static struct cxl_ep *cxl_ep_load(struct cxl_port *port,
-				  struct cxl_memdev *cxlmd)
-{
-	return xa_load(&port->endpoints, (unsigned long)&cxlmd->dev);
-}
-
 int devm_cxl_add_endpoint(struct cxl_memdev *cxlmd,
 			  struct cxl_dport *parent_dport)
 {
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 4830365f3857..65bf84abad57 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -428,6 +428,254 @@ static size_t show_targetN(struct cxl_region *cxlr, char *buf, int pos)
 	return rc;
 }
 
+static int match_free_decoder(struct device *dev, void *data)
+{
+	struct cxl_decoder *cxld;
+	int *id = data;
+
+	if (!is_switch_decoder(dev))
+		return 0;
+
+	cxld = to_cxl_decoder(dev);
+
+	/* enforce ordered allocation */
+	if (cxld->id != *id)
+		return 0;
+
+	if (!cxld->region)
+		return 1;
+
+	(*id)++;
+
+	return 0;
+}
+
+static struct cxl_decoder *cxl_region_find_decoder(struct cxl_port *port,
+						   struct cxl_region *cxlr)
+{
+	struct device *dev;
+	int id = 0;
+
+	dev = device_find_child(&port->dev, &id, match_free_decoder);
+	if (!dev)
+		return NULL;
+	/*
+	 * This decoder is pinned registered as long as the endpoint decoder is
+	 * registered, and endpoint decoder unregistration holds the
+	 * cxl_region_rwsem over unregister events, so no need to hold on to
+	 * this extra reference.
+	 */
+	put_device(dev);
+	return to_cxl_decoder(dev);
+}
+
+static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port,
+					       struct cxl_region *cxlr)
+{
+	struct cxl_region_ref *cxl_rr;
+
+	cxl_rr = kzalloc(sizeof(*cxl_rr), GFP_KERNEL);
+	if (!cxl_rr)
+		return NULL;
+	cxl_rr->port = port;
+	cxl_rr->region = cxlr;
+	xa_init(&cxl_rr->endpoints);
+	return cxl_rr;
+}
+
+static void free_region_ref(struct cxl_region_ref *cxl_rr)
+{
+	struct cxl_port *port = cxl_rr->port;
+	struct cxl_region *cxlr = cxl_rr->region;
+	struct cxl_decoder *cxld = cxl_rr->decoder;
+
+	dev_WARN_ONCE(&cxlr->dev, cxld->region != cxlr, "region mismatch\n");
+	if (cxld->region == cxlr) {
+		cxld->region = NULL;
+		put_device(&cxlr->dev);
+	}
+
+	xa_erase(&port->regions, (unsigned long)cxlr);
+	xa_destroy(&cxl_rr->endpoints);
+	kfree(cxl_rr);
+}
+
+static int cxl_rr_add(struct cxl_region_ref *cxl_rr)
+{
+	struct cxl_port *port = cxl_rr->port;
+	struct cxl_region *cxlr = cxl_rr->region;
+
+	return xa_insert(&port->regions, (unsigned long)cxlr, cxl_rr,
+			 GFP_KERNEL);
+}
+
+static int cxl_rr_ep_add(struct cxl_region_ref *cxl_rr,
+			 struct cxl_endpoint_decoder *cxled)
+{
+	int rc;
+	struct cxl_port *port = cxl_rr->port;
+	struct cxl_region *cxlr = cxl_rr->region;
+	struct cxl_decoder *cxld = cxl_rr->decoder;
+	struct cxl_ep *ep = cxl_ep_load(port, cxled_to_memdev(cxled));
+
+	rc = xa_insert(&cxl_rr->endpoints, (unsigned long)cxled, ep,
+			 GFP_KERNEL);
+	if (rc)
+		return rc;
+	cxl_rr->nr_eps++;
+
+	if (!cxld->region) {
+		cxld->region = cxlr;
+		get_device(&cxlr->dev);
+	}
+
+	return 0;
+}
+
+static int cxl_port_attach_region(struct cxl_port *port,
+				  struct cxl_region *cxlr,
+				  struct cxl_endpoint_decoder *cxled, int pos)
+{
+	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+	struct cxl_ep *ep = cxl_ep_load(port, cxlmd);
+	struct cxl_region_ref *cxl_rr = NULL, *iter;
+	struct cxl_region_params *p = &cxlr->params;
+	struct cxl_decoder *cxld = NULL;
+	unsigned long index;
+	int rc = -EBUSY;
+
+	lockdep_assert_held_write(&cxl_region_rwsem);
+
+	xa_for_each(&port->regions, index, iter) {
+		struct cxl_region_params *ip = &iter->region->params;
+
+		if (iter->region == cxlr)
+			cxl_rr = iter;
+		if (ip->res->start > p->res->start) {
+			dev_dbg(&cxlr->dev,
+				"%s: HPA order violation %s:%pr vs %pr\n",
+				dev_name(&port->dev),
+				dev_name(&iter->region->dev), ip->res, p->res);
+			return -EBUSY;
+		}
+	}
+
+	if (cxl_rr) {
+		struct cxl_ep *ep_iter;
+		int found = 0;
+
+		cxld = cxl_rr->decoder;
+		xa_for_each(&cxl_rr->endpoints, index, ep_iter) {
+			if (ep_iter == ep)
+				continue;
+			if (ep_iter->next == ep->next) {
+				found++;
+				break;
+			}
+		}
+
+		/*
+		 * If this is a new target or if this port is direct connected
+		 * to this endpoint then add to the target count.
+		 */
+		if (!found || !ep->next)
+			cxl_rr->nr_targets++;
+	} else {
+		cxl_rr = alloc_region_ref(port, cxlr);
+		if (!cxl_rr) {
+			dev_dbg(&cxlr->dev,
+				"%s: failed to allocate region reference\n",
+				dev_name(&port->dev));
+			return -ENOMEM;
+		}
+		rc = cxl_rr_add(cxl_rr);
+		if (rc) {
+			dev_dbg(&cxlr->dev,
+				"%s: failed to track region reference\n",
+				dev_name(&port->dev));
+			kfree(cxl_rr);
+			return rc;
+		}
+	}
+
+	if (!cxld) {
+		if (port == cxled_to_port(cxled))
+			cxld = &cxled->cxld;
+		else
+			cxld = cxl_region_find_decoder(port, cxlr);
+		if (!cxld) {
+			dev_dbg(&cxlr->dev, "%s: no decoder available\n",
+				dev_name(&port->dev));
+			goto out_erase;
+		}
+
+		if (cxld->region) {
+			dev_dbg(&cxlr->dev, "%s: %s already attached to %s\n",
+				dev_name(&port->dev), dev_name(&cxld->dev),
+				dev_name(&cxld->region->dev));
+			rc = -EBUSY;
+			goto out_erase;
+		}
+
+		cxl_rr->decoder = cxld;
+	}
+
+	rc = cxl_rr_ep_add(cxl_rr, cxled);
+	if (rc) {
+		dev_dbg(&cxlr->dev,
+			"%s: failed to track endpoint %s:%s reference\n",
+			dev_name(&port->dev), dev_name(&cxlmd->dev),
+			dev_name(&cxld->dev));
+		goto out_erase;
+	}
+
+	return 0;
+out_erase:
+	if (cxl_rr->nr_eps == 0)
+		free_region_ref(cxl_rr);
+	return rc;
+}
+
+static struct cxl_region_ref *cxl_rr_load(struct cxl_port *port,
+					  struct cxl_region *cxlr)
+{
+	return xa_load(&port->regions, (unsigned long)cxlr);
+}
+
+static void cxl_port_detach_region(struct cxl_port *port,
+				   struct cxl_region *cxlr,
+				   struct cxl_endpoint_decoder *cxled)
+{
+	struct cxl_region_ref *cxl_rr;
+	struct cxl_ep *ep;
+
+	lockdep_assert_held_write(&cxl_region_rwsem);
+
+	cxl_rr = cxl_rr_load(port, cxlr);
+	if (!cxl_rr)
+		return;
+
+	ep = xa_erase(&cxl_rr->endpoints, (unsigned long)cxled);
+	if (ep) {
+		struct cxl_ep *ep_iter;
+		unsigned long index;
+		int found = 0;
+
+		cxl_rr->nr_eps--;
+		xa_for_each(&cxl_rr->endpoints, index, ep_iter) {
+			if (ep_iter->next == ep->next) {
+				found++;
+				break;
+			}
+		}
+		if (!found)
+			cxl_rr->nr_targets--;
+	}
+
+	if (cxl_rr->nr_eps == 0)
+		free_region_ref(cxl_rr);
+}
+
 /*
  * - Check that the given endpoint is attached to a host-bridge identified
  *   in the root interleave.
@@ -435,14 +683,28 @@ static size_t show_targetN(struct cxl_region *cxlr, char *buf, int pos)
 static int cxl_region_attach(struct cxl_region *cxlr,
 			     struct cxl_endpoint_decoder *cxled, int pos)
 {
+	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
+	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+	struct cxl_port *ep_port, *root_port, *iter;
 	struct cxl_region_params *p = &cxlr->params;
+	struct cxl_dport *dport;
+	int i, rc = -ENXIO;
 
 	if (cxled->mode == CXL_DECODER_DEAD) {
 		dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev));
 		return -ENODEV;
 	}
 
-	if (pos >= p->interleave_ways) {
+	/* all full of members, or interleave config not established? */
+	if (p->state > CXL_CONFIG_INTERLEAVE_ACTIVE) {
+		dev_dbg(&cxlr->dev, "region already active\n");
+		return -EBUSY;
+	} else if (p->state < CXL_CONFIG_INTERLEAVE_ACTIVE) {
+		dev_dbg(&cxlr->dev, "interleave config missing\n");
+		return -ENXIO;
+	}
+
+	if (pos < 0 || pos >= p->interleave_ways) {
 		dev_dbg(&cxlr->dev, "position %d out of range %d\n", pos,
 			p->interleave_ways);
 		return -ENXIO;
@@ -461,15 +723,83 @@ static int cxl_region_attach(struct cxl_region *cxlr,
 		return -EBUSY;
 	}
 
+	for (i = 0; i < p->interleave_ways; i++) {
+		struct cxl_endpoint_decoder *cxled_target;
+		struct cxl_memdev *cxlmd_target;
+
+		cxled_target = p->targets[i];
+		if (!cxled_target)
+			continue;
+
+		cxlmd_target = cxled_to_memdev(cxled_target);
+		if (cxlmd_target == cxlmd) {
+			dev_dbg(&cxlr->dev,
+				"%s already specified at position %d via: %s\n",
+				dev_name(&cxlmd->dev), pos,
+				dev_name(&cxled_target->cxld.dev));
+			return -EBUSY;
+		}
+	}
+
+	ep_port = cxled_to_port(cxled);
+	root_port = cxlrd_to_port(cxlrd);
+	dport = cxl_dport_load(root_port, ep_port->host_bridge);
+	if (!dport) {
+		dev_dbg(&cxlr->dev, "%s:%s invalid target for %s\n",
+			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
+			dev_name(cxlr->dev.parent));
+		return -ENXIO;
+	}
+
+	if (cxlrd->calc_hb(cxlrd, pos) != dport) {
+		dev_dbg(&cxlr->dev, "%s:%s invalid target position for %s\n",
+			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
+			dev_name(&cxlrd->cxlsd.cxld.dev));
+		return -ENXIO;
+	}
+
+	if (cxled->cxld.target_type != cxlr->type) {
+		dev_dbg(&cxlr->dev, "%s:%s type mismatch: %d vs %d\n",
+			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
+			cxled->cxld.target_type, cxlr->type);
+		return -ENXIO;
+	}
+
+	if (resource_size(cxled->dpa_res) * p->interleave_ways !=
+	    resource_size(p->res)) {
+		dev_dbg(&cxlr->dev,
+			"decoder-size-%#llx * ways-%d != region-size-%#llx\n",
+			(u64)resource_size(cxled->dpa_res), p->interleave_ways,
+			(u64)resource_size(p->res));
+		return -EINVAL;
+	}
+
+	for (iter = ep_port; !is_cxl_root(iter);
+	     iter = to_cxl_port(iter->dev.parent)) {
+		rc = cxl_port_attach_region(iter, cxlr, cxled, pos);
+		if (rc)
+			goto err;
+	}
+
 	p->targets[pos] = cxled;
 	cxled->pos = pos;
 	p->nr_targets++;
 
+	if (p->nr_targets == p->interleave_ways)
+		p->state = CXL_CONFIG_ACTIVE;
+
 	return 0;
+
+err:
+	for (iter = ep_port; !is_cxl_root(iter);
+	     iter = to_cxl_port(iter->dev.parent))
+		cxl_port_detach_region(iter, cxlr, cxled);
+	return rc;
 }
 
 static void cxl_region_detach(struct cxl_endpoint_decoder *cxled)
 {
+	struct cxl_port *iter, *ep_port = cxled_to_port(cxled);
 	struct cxl_region *cxlr = cxled->cxld.region;
 	struct cxl_region_params *p;
 
@@ -481,6 +811,10 @@ static void cxl_region_detach(struct cxl_endpoint_decoder *cxled)
 	p = &cxlr->params;
 	get_device(&cxlr->dev);
 
+	for (iter = ep_port; !is_cxl_root(iter);
+	     iter = to_cxl_port(iter->dev.parent))
+		cxl_port_detach_region(iter, cxlr, cxled);
+
 	if (cxled->pos < 0 || cxled->pos >= p->interleave_ways ||
 	    p->targets[cxled->pos] != cxled) {
 		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
@@ -491,6 +825,8 @@ static void cxl_region_detach(struct cxl_endpoint_decoder *cxled)
 		goto out;
 	}
 
+	if (p->state == CXL_CONFIG_ACTIVE)
+		p->state = CXL_CONFIG_INTERLEAVE_ACTIVE;
 	p->targets[cxled->pos] = NULL;
 	p->nr_targets--;
 
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 30227348f768..09dbd46cc4c7 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -414,6 +414,7 @@ struct cxl_nvdimm {
  * @id: id for port device-name
  * @dports: cxl_dport instances referenced by decoders
  * @endpoints: cxl_ep instances, endpoints that are a descendant of this port
+ * @regions: cxl_region_ref instances, regions mapped by this port
  * @parent_dport: dport that points to this port in the parent
  * @decoder_ida: allocator for decoder ids
  * @dpa_end: cursor to track highest allocated decoder for allocation ordering
@@ -428,6 +429,7 @@ struct cxl_port {
 	int id;
 	struct xarray dports;
 	struct xarray endpoints;
+	struct xarray regions;
 	struct cxl_dport *parent_dport;
 	struct ida decoder_ida;
 	int dpa_end;
@@ -470,6 +472,24 @@ struct cxl_ep {
 	struct cxl_port *next;
 };
 
+/**
+ * struct cxl_region_ref - track a region's interest in a port
+ * @port: point in topology to install this reference
+ * @decoder: decoder assigned for @region in @port
+ * @region: region for this reference
+ * @endpoints: cxl_ep references for region members beneath @port
+ * @nr_eps: number of endpoints beneath @port
+ * @nr_targets: number of distinct targets needed to reach @nr_eps
+ */
+struct cxl_region_ref {
+	struct cxl_port *port;
+	struct cxl_decoder *decoder;
+	struct cxl_region *region;
+	struct xarray endpoints;
+	int nr_eps;
+	int nr_targets;
+};
+
 /*
  * The platform firmware device hosting the root is also the top of the
  * CXL port topology. All other CXL ports have another CXL port as their
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index eee96016c3c7..a83bb6782d23 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -55,6 +55,11 @@ static inline struct cxl_port *cxled_to_port(struct cxl_endpoint_decoder *cxled)
 	return to_cxl_port(cxled->cxld.dev.parent);
 }
 
+static inline struct cxl_port *cxlrd_to_port(struct cxl_root_decoder *cxlrd)
+{
+	return to_cxl_port(cxlrd->cxlsd.cxld.dev.parent);
+}
+
 static inline struct cxl_memdev *
 cxled_to_memdev(struct cxl_endpoint_decoder *cxled)
 {
-- 
2.36.1



* [PATCH 41/46] cxl/region: Program target lists
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (39 preceding siblings ...)
  2022-06-24  4:19 ` [PATCH 40/46] cxl/region: Attach endpoint decoders Dan Williams
@ 2022-06-24  4:19 ` Dan Williams
  2022-06-24  4:19 ` [PATCH 42/46] cxl/hdm: Commit decoder state to hardware Dan Williams
                   ` (7 subsequent siblings)
  48 siblings, 0 replies; 157+ messages in thread
From: Dan Williams @ 2022-06-24  4:19 UTC (permalink / raw)
  To: linux-cxl; +Cc: nvdimm, linux-pci, patches, hch, Dan Williams

Once the region's interleave geometry (ways, granularity, size) is
established and all the endpoint decoder targets are assigned, the next
phase is to program all the intermediate decoders. Specifically, each
CXL switch in the path between the endpoint and its CXL host-bridge
(including the logical switch internal to the host-bridge) needs to have
its decoders programmed and the target list order assigned.
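
As a rough illustration of the decoder programming step, the standalone
sketch below mirrors the granularity arithmetic that
cxl_port_setup_targets() performs in this posting. It is not kernel code:
ilog2_exact() and child_granularity() are hypothetical names, only
power-of-2 ways are modeled, and the 16 KiB cap is an assumption taken
from the CXL 2.0 granularity encoding.

```c
#include <assert.h>

/* Return log2(v) when v is a power of two, otherwise -1. */
static int ilog2_exact(unsigned int v)
{
	int n = 0;

	if (v == 0 || (v & (v - 1)))
		return -1;
	while (v >>= 1)
		n++;
	return n;
}

/*
 * Hypothetical helper: granularity in bytes for a switch-level decoder
 * with @nr_targets targets beneath a parent that decodes @parent_iw ways
 * at @parent_ig bytes. Returns -1 for inputs outside this simple model.
 */
static int child_granularity(unsigned int parent_ig, unsigned int parent_iw,
			     unsigned int nr_targets)
{
	int peig, peiw, eiw, eig, address_bit;

	if (parent_ig < 256)
		return -1;
	peig = ilog2_exact(parent_ig);
	peiw = ilog2_exact(parent_iw);
	eiw = ilog2_exact(nr_targets);
	if (peig < 0 || peiw < 0 || eiw < 0)
		return -1;
	peig -= 8;	/* encoded granularity: 256 << eig bytes */

	if (nr_targets > 1) {
		/* same max() as the posted code, written out by hand */
		address_bit = peig + peiw > eiw + peig ? peig + peiw :
							 eiw + peig;
		eig = address_bit - eiw + 1;
	} else {
		eig = peig;	/* single target: inherit the parent setting */
	}

	assert(eig >= 0);
	if (eig > 6)	/* assumed cap: 256 << 6 == 16 KiB */
		return -1;
	return 256 << eig;
}
```

With a x2 root interleave at 256 bytes, child_granularity(256, 2, 2)
yields 512 for the host-bridge decoder and child_granularity(512, 2, 2)
yields 1024 for the switch beneath it, so each level routes on its own
address bit.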

The difficulty in this implementation lies in determining which endpoint
decoder ordering combinations are valid. Consider the cxl_test case of 2
host bridges, each of those host-bridges attached to 2 switches, and
each of those switches attached to 2 endpoints for a potential 8-way
interleave. The x2 interleave at the host-bridge level requires that all
even numbered endpoint decoder positions be located on the "left" hand
side of the topology tree, and the odd numbered positions on the other.
The endpoints that are peers on the same switch need to have a position
that can be routed with a dedicated address bit per-endpoint. See
check_last_peer() for the details.
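
To make the ordering constraint concrete, here is a minimal standalone
sketch of the 2 host-bridge x 2 switch x 2 endpoint case. It assumes
plain modulo routing at every level, and struct route, route_position()
and shares_switch() are hypothetical names used only for illustration,
not part of the driver.

```c
#include <assert.h>

struct route {
	int host_bridge;	/* 0 == "left" side, 1 == the other side */
	int sw;			/* switch beneath the host bridge */
	int endpoint;		/* endpoint beneath the switch */
};

/* Hypothetical model: one address bit selects the target at each level. */
static struct route route_position(int pos)
{
	struct route r;

	assert(pos >= 0 && pos < 8);
	r.host_bridge = pos % 2;
	r.sw = (pos / 2) % 2;
	r.endpoint = (pos / 4) % 2;
	return r;
}

/*
 * At the host-bridge level there are 2 targets (switches) for 8
 * positions, so positions that share a switch are 8 / 2 == 4 apart.
 * That is the 'distance' handed to check_last_peer().
 */
static int shares_switch(int pos_a, int pos_b)
{
	struct route a = route_position(pos_a);
	struct route b = route_position(pos_b);

	return a.host_bridge == b.host_bridge && a.sw == b.sw;
}
```

Under this model positions 0 and 4 share a switch while positions 0 and
2 sit behind the same host bridge on different switches, and no odd
position ever lands on the "left" host bridge.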

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/core.h   |   4 +
 drivers/cxl/core/port.c   |   4 +-
 drivers/cxl/core/region.c | 262 ++++++++++++++++++++++++++++++++++++--
 drivers/cxl/cxl.h         |   2 +
 4 files changed, 260 insertions(+), 12 deletions(-)

diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 0e4e5c2d9452..6f5c4fb85879 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -43,9 +43,13 @@ resource_size_t cxl_dpa_resource(struct cxl_endpoint_decoder *cxled);
 extern struct rw_semaphore cxl_dpa_rwsem;
 
 bool is_switch_decoder(struct device *dev);
+struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev);
 static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port,
 					 struct cxl_memdev *cxlmd)
 {
+	if (!port)
+		return NULL;
+
 	return xa_load(&port->endpoints, (unsigned long)&cxlmd->dev);
 }
 
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index fde2a2e103d4..7034300e72b2 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -146,8 +146,6 @@ static ssize_t emit_target_list(struct cxl_switch_decoder *cxlsd, char *buf)
 	return offset;
 }
 
-static struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev);
-
 static ssize_t target_list_show(struct device *dev,
 				struct device_attribute *attr, char *buf)
 {
@@ -471,7 +469,7 @@ struct cxl_endpoint_decoder *to_cxl_endpoint_decoder(struct device *dev)
 }
 EXPORT_SYMBOL_NS_GPL(to_cxl_endpoint_decoder, CXL);
 
-static struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev)
+struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev)
 {
 	if (dev_WARN_ONCE(dev, !is_switch_decoder(dev),
 			  "not a cxl_switch_decoder device\n"))
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 65bf84abad57..071b8cafe2bb 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -479,6 +479,7 @@ static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port,
 		return NULL;
 	cxl_rr->port = port;
 	cxl_rr->region = cxlr;
+	cxl_rr->nr_targets = 1;
 	xa_init(&cxl_rr->endpoints);
 	return cxl_rr;
 }
@@ -518,10 +519,12 @@ static int cxl_rr_ep_add(struct cxl_region_ref *cxl_rr,
 	struct cxl_decoder *cxld = cxl_rr->decoder;
 	struct cxl_ep *ep = cxl_ep_load(port, cxled_to_memdev(cxled));
 
-	rc = xa_insert(&cxl_rr->endpoints, (unsigned long)cxled, ep,
-			 GFP_KERNEL);
-	if (rc)
-		return rc;
+	if (ep) {
+		rc = xa_insert(&cxl_rr->endpoints, (unsigned long)cxled, ep,
+			       GFP_KERNEL);
+		if (rc)
+			return rc;
+	}
 	cxl_rr->nr_eps++;
 
 	if (!cxld->region) {
@@ -537,7 +540,7 @@ static int cxl_port_attach_region(struct cxl_port *port,
 				  struct cxl_endpoint_decoder *cxled, int pos)
 {
 	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
-	struct cxl_ep *ep = cxl_ep_load(port, cxlmd);
+	const struct cxl_ep *ep = cxl_ep_load(port, cxlmd);
 	struct cxl_region_ref *cxl_rr = NULL, *iter;
 	struct cxl_region_params *p = &cxlr->params;
 	struct cxl_decoder *cxld = NULL;
@@ -629,6 +632,16 @@ static int cxl_port_attach_region(struct cxl_port *port,
 		goto out_erase;
 	}
 
+	dev_dbg(&cxlr->dev,
+		"%s:%s %s add: %s:%s @ %d next: %s nr_eps: %d nr_targets: %d\n",
+		dev_name(port->uport), dev_name(&port->dev),
+		dev_name(&cxld->dev), dev_name(&cxlmd->dev),
+		dev_name(&cxled->cxld.dev), pos,
+		ep ? ep->next ? dev_name(ep->next->uport) :
+				      dev_name(&cxlmd->dev) :
+			   "none",
+		cxl_rr->nr_eps, cxl_rr->nr_targets);
+
 	return 0;
 out_erase:
 	if (cxl_rr->nr_eps == 0)
@@ -647,7 +660,7 @@ static void cxl_port_detach_region(struct cxl_port *port,
 				   struct cxl_endpoint_decoder *cxled)
 {
 	struct cxl_region_ref *cxl_rr;
-	struct cxl_ep *ep;
+	struct cxl_ep *ep = NULL;
 
 	lockdep_assert_held_write(&cxl_region_rwsem);
 
@@ -655,7 +668,14 @@ static void cxl_port_detach_region(struct cxl_port *port,
 	if (!cxl_rr)
 		return;
 
-	ep = xa_erase(&cxl_rr->endpoints, (unsigned long)cxled);
+	/*
+	 * Endpoint ports do not carry cxl_ep references, and they
+	 * never target more than one endpoint by definition
+	 */
+	if (cxl_rr->decoder == &cxled->cxld)
+		cxl_rr->nr_eps--;
+	else
+		ep = xa_erase(&cxl_rr->endpoints, (unsigned long)cxled);
 	if (ep) {
 		struct cxl_ep *ep_iter;
 		unsigned long index;
@@ -676,6 +696,224 @@ static void cxl_port_detach_region(struct cxl_port *port,
 		free_region_ref(cxl_rr);
 }
 
+static int check_last_peer(struct cxl_endpoint_decoder *cxled,
+			   struct cxl_ep *ep, struct cxl_region_ref *cxl_rr,
+			   int distance)
+{
+	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+	struct cxl_region *cxlr = cxl_rr->region;
+	struct cxl_region_params *p = &cxlr->params;
+	struct cxl_endpoint_decoder *cxled_peer;
+	struct cxl_port *port = cxl_rr->port;
+	struct cxl_memdev *cxlmd_peer;
+	struct cxl_ep *ep_peer;
+	int pos = cxled->pos;
+
+	/*
+	 * If this position wants to share a dport with the last endpoint mapped
+	 * then that endpoint, at index 'position - distance', must also be
+	 * mapped by this dport.
+	 */
+	if (pos < distance) {
+		dev_dbg(&cxlr->dev, "%s:%s: cannot host %s:%s at %d\n",
+			dev_name(port->uport), dev_name(&port->dev),
+			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), pos);
+		return -ENXIO;
+	}
+	cxled_peer = p->targets[pos - distance];
+	cxlmd_peer = cxled_to_memdev(cxled_peer);
+	ep_peer = cxl_ep_load(port, cxlmd_peer);
+	if (ep->dport != ep_peer->dport) {
+		dev_dbg(&cxlr->dev,
+			"%s:%s: %s:%s pos %d mismatched peer %s:%s\n",
+			dev_name(port->uport), dev_name(&port->dev),
+			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), pos,
+			dev_name(&cxlmd_peer->dev),
+			dev_name(&cxled_peer->cxld.dev));
+		return -ENXIO;
+	}
+
+	return 0;
+}
+
+static int cxl_port_setup_targets(struct cxl_port *port,
+				  struct cxl_region *cxlr,
+				  struct cxl_endpoint_decoder *cxled)
+{
+	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
+	int parent_iw, parent_ig, ig, iw, rc, inc = 0, pos = cxled->pos;
+	struct cxl_port *parent_port = to_cxl_port(port->dev.parent);
+	struct cxl_region_ref *cxl_rr = cxl_rr_load(port, cxlr);
+	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+	struct cxl_ep *ep = cxl_ep_load(port, cxlmd);
+	struct cxl_region_params *p = &cxlr->params;
+	struct cxl_decoder *cxld = cxl_rr->decoder;
+	struct cxl_switch_decoder *cxlsd;
+	u16 eig, peig;
+	u8 eiw, peiw;
+
+	/*
+	 * While root level decoders support x3, x6, x12, switch level
+	 * decoders only support powers of 2 up to x16.
+	 */
+	if (!is_power_of_2(cxl_rr->nr_targets)) {
+		dev_dbg(&cxlr->dev, "%s:%s: invalid target count %d\n",
+			dev_name(port->uport), dev_name(&port->dev),
+			cxl_rr->nr_targets);
+		return -EINVAL;
+	}
+
+	cxlsd = to_cxl_switch_decoder(&cxld->dev);
+	if (cxl_rr->nr_targets_set) {
+		int i, distance;
+
+		distance = p->nr_targets / cxl_rr->nr_targets;
+		for (i = 0; i < cxl_rr->nr_targets_set; i++)
+			if (ep->dport == cxlsd->target[i]) {
+				rc = check_last_peer(cxled, ep, cxl_rr,
+						     distance);
+				if (rc)
+					return rc;
+				goto out_target_set;
+			}
+		goto add_target;
+	}
+
+	if (is_cxl_root(parent_port)) {
+		parent_ig = cxlrd->cxlsd.cxld.interleave_granularity;
+		parent_iw = cxlrd->cxlsd.cxld.interleave_ways;
+		/*
+		 * For purposes of address bit routing, use power-of-2 math for
+		 * switch ports.
+		 */
+		if (!is_power_of_2(parent_iw))
+			parent_iw /= 3;
+	} else {
+		struct cxl_region_ref *parent_rr;
+		struct cxl_decoder *parent_cxld;
+
+		parent_rr = cxl_rr_load(parent_port, cxlr);
+		parent_cxld = parent_rr->decoder;
+		parent_ig = parent_cxld->interleave_granularity;
+		parent_iw = parent_cxld->interleave_ways;
+	}
+
+	granularity_to_cxl(parent_ig, &peig);
+	ways_to_cxl(parent_iw, &peiw);
+
+	iw = cxl_rr->nr_targets;
+	ways_to_cxl(iw, &eiw);
+	if (cxl_rr->nr_targets > 1) {
+		u32 address_bit = max(peig + peiw, eiw + peig);
+
+		eig = address_bit - eiw + 1;
+	} else {
+		eiw = peiw;
+		eig = peig;
+	}
+
+	rc = cxl_to_granularity(eig, &ig);
+	if (rc) {
+		dev_dbg(&cxlr->dev, "%s:%s: invalid interleave: %d\n",
+			dev_name(port->uport), dev_name(&port->dev),
+			256 << eig);
+		return rc;
+	}
+
+	cxld->interleave_ways = iw;
+	cxld->interleave_granularity = ig;
+	dev_dbg(&cxlr->dev, "%s:%s iw: %d ig: %d\n", dev_name(port->uport),
+		dev_name(&port->dev), iw, ig);
+add_target:
+	if (cxl_rr->nr_targets_set == cxl_rr->nr_targets) {
+		dev_dbg(&cxlr->dev,
+			"%s:%s: targets full trying to add %s:%s at %d\n",
+			dev_name(port->uport), dev_name(&port->dev),
+			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), pos);
+		return -ENXIO;
+	}
+	cxlsd->target[cxl_rr->nr_targets_set] = ep->dport;
+	inc = 1;
+out_target_set:
+	cxl_rr->nr_targets_set += inc;
+	dev_dbg(&cxlr->dev, "%s:%s target[%d] = %s for %s:%s @ %d\n",
+		dev_name(port->uport), dev_name(&port->dev),
+		cxl_rr->nr_targets_set - 1, dev_name(ep->dport->dport),
+		dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), pos);
+
+	return 0;
+}
+
+static void cxl_port_reset_targets(struct cxl_port *port,
+				   struct cxl_region *cxlr)
+{
+	struct cxl_region_ref *cxl_rr = cxl_rr_load(port, cxlr);
+
+	/*
+	 * After the last endpoint has been detached the entire cxl_rr may now
+	 * be gone.
+	 */
+	if (cxl_rr)
+		cxl_rr->nr_targets_set = 0;
+}
+
+static void cxl_region_teardown_targets(struct cxl_region *cxlr)
+{
+	struct cxl_region_params *p = &cxlr->params;
+	struct cxl_endpoint_decoder *cxled;
+	struct cxl_memdev *cxlmd;
+	struct cxl_port *iter;
+	struct cxl_ep *ep;
+	int i;
+
+	for (i = 0; i < p->nr_targets; i++) {
+		cxled = p->targets[i];
+		cxlmd = cxled_to_memdev(cxled);
+
+		iter = cxled_to_port(cxled);
+		while (!is_cxl_root(to_cxl_port(iter->dev.parent)))
+			iter = to_cxl_port(iter->dev.parent);
+
+		for (ep = cxl_ep_load(iter, cxlmd); iter;
+		     iter = ep->next, ep = cxl_ep_load(iter, cxlmd))
+			cxl_port_reset_targets(iter, cxlr);
+	}
+}
+
+static int cxl_region_setup_targets(struct cxl_region *cxlr)
+{
+	struct cxl_region_params *p = &cxlr->params;
+	struct cxl_endpoint_decoder *cxled;
+	struct cxl_memdev *cxlmd;
+	struct cxl_port *iter;
+	struct cxl_ep *ep;
+	int i, rc;
+
+	for (i = 0; i < p->nr_targets; i++) {
+		cxled = p->targets[i];
+		cxlmd = cxled_to_memdev(cxled);
+
+		iter = cxled_to_port(cxled);
+		while (!is_cxl_root(to_cxl_port(iter->dev.parent)))
+			iter = to_cxl_port(iter->dev.parent);
+
+		/*
+		 * Descend the topology tree programming targets while
+		 * looking for conflicts.
+		 */
+		for (ep = cxl_ep_load(iter, cxlmd); iter;
+		     iter = ep->next, ep = cxl_ep_load(iter, cxlmd)) {
+			rc = cxl_port_setup_targets(iter, cxlr, cxled);
+			if (rc) {
+				cxl_region_teardown_targets(cxlr);
+				return rc;
+			}
+		}
+	}
+
+	return 0;
+}
+
 /*
  * - Check that the given endpoint is attached to a host-bridge identified
  *   in the root interleave.
@@ -785,8 +1023,12 @@ static int cxl_region_attach(struct cxl_region *cxlr,
 	cxled->pos = pos;
 	p->nr_targets++;
 
-	if (p->nr_targets == p->interleave_ways)
+	if (p->nr_targets == p->interleave_ways) {
+		rc = cxl_region_setup_targets(cxlr);
+		if (rc)
+			goto err;
 		p->state = CXL_CONFIG_ACTIVE;
+	}
 
 	return 0;
 
@@ -825,8 +1067,10 @@ static void cxl_region_detach(struct cxl_endpoint_decoder *cxled)
 		goto out;
 	}
 
-	if (p->state == CXL_CONFIG_ACTIVE)
+	if (p->state == CXL_CONFIG_ACTIVE) {
 		p->state = CXL_CONFIG_INTERLEAVE_ACTIVE;
+		cxl_region_teardown_targets(cxlr);
+	}
 	p->targets[cxled->pos] = NULL;
 	p->nr_targets--;
 
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 09dbd46cc4c7..a93d7c4efd1a 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -478,6 +478,7 @@ struct cxl_ep {
  * @decoder: decoder assigned for @region in @port
  * @region: region for this reference
  * @endpoints: cxl_ep references for region members beneath @port
+ * @nr_targets_set: track how many targets have been programmed during setup
  * @nr_eps: number of endpoints beneath @port
  * @nr_targets: number of distinct targets needed to reach @nr_eps
  */
@@ -486,6 +487,7 @@ struct cxl_region_ref {
 	struct cxl_decoder *decoder;
 	struct cxl_region *region;
 	struct xarray endpoints;
+	int nr_targets_set;
 	int nr_eps;
 	int nr_targets;
 };
-- 
2.36.1



* [PATCH 42/46] cxl/hdm: Commit decoder state to hardware
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (40 preceding siblings ...)
  2022-06-24  4:19 ` [PATCH 41/46] cxl/region: Program target lists Dan Williams
@ 2022-06-24  4:19 ` Dan Williams
  2022-06-30 17:05   ` Jonathan Cameron
  2022-06-24  4:19 ` [PATCH 43/46] cxl/region: Add region driver boiler plate Dan Williams
                   ` (6 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  4:19 UTC (permalink / raw)
  To: linux-cxl; +Cc: nvdimm, linux-pci, patches, hch, Dan Williams

After all the soft validation of the region has completed, convey the
region configuration to hardware while being careful to commit decoders
in specification mandated order. In addition to programming the endpoint
decoder base-address, interleave ways and granularity, the switch
decoder target lists are also established.
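
For illustration, the target list layout that cxlsd_set_targets() writes
can be modeled in a few lines of standalone C: one byte per interleave
position, position 0 in the least significant byte. pack_target_list()
is a hypothetical helper, not a driver function, and takes plain port
ids instead of dport pointers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack up to 8 downstream port ids into a 64-bit target list value.
 * Returns 0 on success, -1 when @ways does not fit the register pair.
 */
static int pack_target_list(const uint8_t *port_ids, int ways, uint64_t *tgt)
{
	int i;

	assert(port_ids && tgt);
	if (ways < 1 || ways > 8)
		return -1;

	*tgt = 0;
	for (i = 0; i < ways; i++)
		*tgt |= (uint64_t)port_ids[i] << (8 * i);
	return 0;
}
```

The upper and lower 32 bits of the result correspond to the values the
patch writes to the target list high and low registers.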

While the kernel can enforce spec-mandated commit order, it cannot
enforce spec-mandated reset order. For example, the kernel can't stop
someone from removing an endpoint device that is occupying decoderN in a
switch decoder where decoderN+1 is also committed. To reset decoderN,
decoderN+1 must be torn down first. That "tear down the world"
implementation is saved for a follow-on patch.
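
A minimal standalone model of the ordering rule, assuming only the
per-port 'commit_end' cursor that this patch adds: commit is accepted
for decoder 'commit_end + 1' and reset only for decoder 'commit_end'.
struct port_model, model_commit() and model_reset() are hypothetical
names, and the sketch leaves out the CXL_DECODER_F_ENABLE short-circuit
and all register access.

```c
#include <assert.h>
#include <errno.h>

struct port_model {
	int commit_end;		/* highest committed decoder id, -1 if none */
};

/* Accept a commit only for the next decoder id in sequence. */
static int model_commit(struct port_model *port, int id)
{
	assert(port);
	if (port->commit_end + 1 != id)
		return -EBUSY;
	port->commit_end++;
	return 0;
}

/* Accept a reset only for the most recently committed decoder. */
static int model_reset(struct port_model *port, int id)
{
	assert(port);
	if (port->commit_end != id)
		return -EBUSY;
	port->commit_end--;
	return 0;
}
```

Starting from commit_end == -1, committing decoder 0 then 1 succeeds, and
an attempt to reset decoder 0 while decoder 1 is still committed returns
-EBUSY, which is the case the follow-on patch has to handle.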

Callback operations are provided for the 'commit' and 'reset'
operations. While those callbacks may prove useful for CXL accelerators
(Type-2 devices with memory) the primary motivation is to enable a
simple way for cxl_test to intercept those operations.
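
As a sketch of why the indirection helps testing, the standalone model
below gives each decoder 'commit' and 'reset' function pointers so that
a test can substitute a failing commit without touching hardware. All
names here (struct decoder_model, the mock_*() helpers, commit_all())
are hypothetical, and the rollback policy is a simplification of what
the region code does.

```c
#include <assert.h>
#include <stddef.h>

struct decoder_model {
	int id;
	int enabled;
	int (*commit)(struct decoder_model *d);
	int (*reset)(struct decoder_model *d);
};

static int mock_commit(struct decoder_model *d)
{
	d->enabled = 1;
	return 0;
}

/* Stand-in for hardware that rejects the commit request. */
static int mock_commit_fail(struct decoder_model *d)
{
	(void)d;
	return -1;
}

static int mock_reset(struct decoder_model *d)
{
	d->enabled = 0;
	return 0;
}

/* Commit in array order; on failure reset the committed ones, newest first. */
static int commit_all(struct decoder_model *decoders, size_t n)
{
	size_t i;
	int rc;

	assert(decoders);
	for (i = 0; i < n; i++) {
		rc = decoders[i].commit(&decoders[i]);
		if (rc) {
			while (i-- > 0)
				decoders[i].reset(&decoders[i]);
			return rc;
		}
	}
	return 0;
}
```

Swapping mock_commit_fail() into one slot exercises the unwind path, in
the same spirit as cxl_test overriding the callbacks.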

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Documentation/ABI/testing/sysfs-bus-cxl |  16 ++
 drivers/cxl/core/hdm.c                  | 218 ++++++++++++++++++++++++
 drivers/cxl/core/port.c                 |   1 +
 drivers/cxl/core/region.c               | 189 ++++++++++++++++++--
 drivers/cxl/cxl.h                       |  11 ++
 tools/testing/cxl/test/cxl.c            |  46 +++++
 6 files changed, 471 insertions(+), 10 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index f1b74a71927d..0debe2955f34 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -338,3 +338,19 @@ Description:
 		not an endpoint decoder. Once all positions have been
 		successfully written a final validation for decode conflicts is
 		performed before activating the region.
+
+
+What:		/sys/bus/cxl/devices/regionZ/commit
+Date:		May, 2022
+KernelVersion:	v5.20
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(RW) Write a boolean 'true' string value to this attribute to
+		trigger the region to transition from the software programmed
+		state to the actively decoding in hardware state. In addition
+		to validating that the region is in a properly configured
+		state, the commit operation validates that the decoders are
+		being committed in spec mandated order (last committed decoder
+		id + 1), and checks that the hardware accepts the commit
+		request. Reading this value indicates whether the region is
+		committed or not.
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 2ee62dde8b23..72f98f1a782c 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -129,6 +129,8 @@ struct cxl_hdm *devm_cxl_setup_hdm(struct cxl_port *port)
 		return ERR_PTR(-ENXIO);
 	}
 
+	dev_set_drvdata(&port->dev, cxlhdm);
+
 	return cxlhdm;
 }
 EXPORT_SYMBOL_NS_GPL(devm_cxl_setup_hdm, CXL);
@@ -444,6 +446,213 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
 	return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled);
 }
 
+static void cxld_set_interleave(struct cxl_decoder *cxld, u32 *ctrl)
+{
+	u16 eig;
+	u8 eiw;
+
+	ways_to_cxl(cxld->interleave_ways, &eiw);
+	granularity_to_cxl(cxld->interleave_granularity, &eig);
+
+	u32p_replace_bits(ctrl, eig, CXL_HDM_DECODER0_CTRL_IG_MASK);
+	u32p_replace_bits(ctrl, eiw, CXL_HDM_DECODER0_CTRL_IW_MASK);
+	*ctrl |= CXL_HDM_DECODER0_CTRL_COMMIT;
+}
+
+static void cxld_set_type(struct cxl_decoder *cxld, u32 *ctrl)
+{
+	u32p_replace_bits(ctrl, !!(cxld->target_type == 3),
+			  CXL_HDM_DECODER0_CTRL_TYPE);
+}
+
+static void cxld_set_hpa(struct cxl_decoder *cxld, u64 *base, u64 *size)
+{
+	struct cxl_region *cxlr = cxld->region;
+	struct cxl_region_params *p = &cxlr->params;
+
+	cxld->hpa_range = (struct range) {
+		.start = p->res->start,
+		.end = p->res->end,
+	};
+
+	*base = p->res->start;
+	*size = resource_size(p->res);
+}
+
+static void cxld_clear_hpa(struct cxl_decoder *cxld)
+{
+	cxld->hpa_range = (struct range) {
+		.start = 0,
+		.end = -1,
+	};
+}
+
+static int cxlsd_set_targets(struct cxl_switch_decoder *cxlsd, u64 *tgt)
+{
+	struct cxl_dport **t = &cxlsd->target[0];
+	int ways = cxlsd->cxld.interleave_ways;
+
+	if (dev_WARN_ONCE(&cxlsd->cxld.dev,
+			  ways > 8 || ways > cxlsd->nr_targets,
+			  "ways: %d overflows targets: %d\n", ways,
+			  cxlsd->nr_targets))
+		return -ENXIO;
+
+	*tgt = FIELD_PREP(GENMASK(7, 0), t[0]->port_id);
+	if (ways > 1)
+		*tgt |= FIELD_PREP(GENMASK(15, 8), t[1]->port_id);
+	if (ways > 2)
+		*tgt |= FIELD_PREP(GENMASK(23, 16), t[2]->port_id);
+	if (ways > 3)
+		*tgt |= FIELD_PREP(GENMASK(31, 24), t[3]->port_id);
+	if (ways > 4)
+		*tgt |= FIELD_PREP(GENMASK_ULL(39, 32), t[4]->port_id);
+	if (ways > 5)
+		*tgt |= FIELD_PREP(GENMASK_ULL(47, 40), t[5]->port_id);
+	if (ways > 6)
+		*tgt |= FIELD_PREP(GENMASK_ULL(55, 48), t[6]->port_id);
+	if (ways > 7)
+		*tgt |= FIELD_PREP(GENMASK_ULL(63, 56), t[7]->port_id);
+
+	return 0;
+}
+
+/*
+ * Per CXL 2.0 8.2.5.12.20 Committing Decoder Programming, hardware must set
+ * committed or error within 10ms, but just be generous with 20ms to account for
+ * clock skew and other marginal behavior
+ */
+#define COMMIT_TIMEOUT_MS 20
+static int cxld_await_commit(void __iomem *hdm, int id)
+{
+	u32 ctrl;
+	int i;
+
+	for (i = 0; i < COMMIT_TIMEOUT_MS; i++) {
+		ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id));
+		if (FIELD_GET(CXL_HDM_DECODER0_CTRL_COMMIT_ERROR, ctrl)) {
+			ctrl &= ~CXL_HDM_DECODER0_CTRL_COMMIT;
+			writel(ctrl, hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id));
+			return -EIO;
+		}
+		if (FIELD_GET(CXL_HDM_DECODER0_CTRL_COMMITTED, ctrl))
+			return 0;
+		fsleep(1000);
+	}
+
+	return -ETIMEDOUT;
+}
+
+static int cxl_decoder_commit(struct cxl_decoder *cxld)
+{
+	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
+	struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev);
+	void __iomem *hdm = cxlhdm->regs.hdm_decoder;
+	int id = cxld->id, rc;
+	u64 base, size;
+	u32 ctrl;
+
+	if (cxld->flags & CXL_DECODER_F_ENABLE)
+		return 0;
+
+	if (port->commit_end + 1 != id) {
+		dev_dbg(&port->dev,
+			"%s: out of order commit, expected decoder%d.%d\n",
+			dev_name(&cxld->dev), port->id, port->commit_end + 1);
+		return -EBUSY;
+	}
+
+	down_read(&cxl_dpa_rwsem);
+	/* common decoder settings */
+	ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(cxld->id));
+	cxld_set_interleave(cxld, &ctrl);
+	cxld_set_type(cxld, &ctrl);
+	cxld_set_hpa(cxld, &base, &size);
+
+	writel(upper_32_bits(base), hdm + CXL_HDM_DECODER0_BASE_HIGH_OFFSET(id));
+	writel(lower_32_bits(base), hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(id));
+	writel(upper_32_bits(size), hdm + CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(id));
+	writel(lower_32_bits(size), hdm + CXL_HDM_DECODER0_SIZE_LOW_OFFSET(id));
+
+	if (is_switch_decoder(&cxld->dev)) {
+		struct cxl_switch_decoder *cxlsd =
+			to_cxl_switch_decoder(&cxld->dev);
+		void __iomem *tl_hi = hdm + CXL_HDM_DECODER0_TL_HIGH(id);
+		void __iomem *tl_lo = hdm + CXL_HDM_DECODER0_TL_LOW(id);
+		u64 targets;
+
+		rc = cxlsd_set_targets(cxlsd, &targets);
+		if (rc) {
+			dev_dbg(&port->dev, "%s: target configuration error\n",
+				dev_name(&cxld->dev));
+			goto err;
+		}
+
+		writel(upper_32_bits(targets), tl_hi);
+		writel(lower_32_bits(targets), tl_lo);
+	} else {
+		struct cxl_endpoint_decoder *cxled =
+			to_cxl_endpoint_decoder(&cxld->dev);
+		void __iomem *sk_hi = hdm + CXL_HDM_DECODER0_SKIP_HIGH(id);
+		void __iomem *sk_lo = hdm + CXL_HDM_DECODER0_SKIP_LOW(id);
+
+		writel(upper_32_bits(cxled->skip), sk_hi);
+		writel(lower_32_bits(cxled->skip), sk_lo);
+	}
+
+	writel(ctrl, hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id));
+	up_read(&cxl_dpa_rwsem);
+
+	port->commit_end++;
+	rc = cxld_await_commit(hdm, cxld->id);
+err:
+	if (rc) {
+		dev_dbg(&port->dev, "%s: error %d committing decoder\n",
+			dev_name(&cxld->dev), rc);
+		cxld->reset(cxld);
+		return rc;
+	}
+	cxld->flags |= CXL_DECODER_F_ENABLE;
+
+	return 0;
+}
+
+static int cxl_decoder_reset(struct cxl_decoder *cxld)
+{
+	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
+	struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev);
+	void __iomem *hdm = cxlhdm->regs.hdm_decoder;
+	int id = cxld->id;
+	u32 ctrl;
+
+	if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)
+		return 0;
+
+	if (port->commit_end != id) {
+		dev_dbg(&port->dev,
+			"%s: out of order reset, expected decoder%d.%d\n",
+			dev_name(&cxld->dev), port->id, port->commit_end);
+		return -EBUSY;
+	}
+
+	down_read(&cxl_dpa_rwsem);
+	ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id));
+	ctrl &= ~CXL_HDM_DECODER0_CTRL_COMMIT;
+	writel(ctrl, hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id));
+
+	cxld_clear_hpa(cxld);
+	writel(0, hdm + CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(id));
+	writel(0, hdm + CXL_HDM_DECODER0_SIZE_LOW_OFFSET(id));
+	writel(0, hdm + CXL_HDM_DECODER0_BASE_HIGH_OFFSET(id));
+	writel(0, hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(id));
+	up_read(&cxl_dpa_rwsem);
+
+	port->commit_end--;
+	cxld->flags &= ~CXL_DECODER_F_ENABLE;
+
+	return 0;
+}
+
 static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
 			    int *target_map, void __iomem *hdm, int which,
 			    u64 *dpa_base)
@@ -466,6 +675,8 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
 	base = ioread64_hi_lo(hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(which));
 	size = ioread64_hi_lo(hdm + CXL_HDM_DECODER0_SIZE_LOW_OFFSET(which));
 	committed = !!(ctrl & CXL_HDM_DECODER0_CTRL_COMMITTED);
+	cxld->commit = cxl_decoder_commit;
+	cxld->reset = cxl_decoder_reset;
 
 	if (!committed)
 		size = 0;
@@ -489,6 +700,13 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
 			cxld->target_type = CXL_DECODER_EXPANDER;
 		else
 			cxld->target_type = CXL_DECODER_ACCELERATOR;
+		if (cxld->id != port->commit_end + 1) {
+			dev_warn(&port->dev,
+				 "decoder%d.%d: Committed out of order\n",
+				 port->id, cxld->id);
+			return -ENXIO;
+		}
+		port->commit_end = cxld->id;
 	} else {
 		/* unless / until type-2 drivers arrive, assume type-3 */
 		if (FIELD_GET(CXL_HDM_DECODER0_CTRL_TYPE, ctrl) == 0) {
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 7034300e72b2..eee1615d2319 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -630,6 +630,7 @@ static struct cxl_port *cxl_port_alloc(struct device *uport,
 	port->component_reg_phys = component_reg_phys;
 	ida_init(&port->decoder_ida);
 	port->dpa_end = -1;
+	port->commit_end = -1;
 	xa_init(&port->dports);
 	xa_init(&port->endpoints);
 	xa_init(&port->regions);
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 071b8cafe2bb..b90160c4f975 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -112,6 +112,168 @@ static ssize_t uuid_store(struct device *dev, struct device_attribute *attr,
 }
 static DEVICE_ATTR_RW(uuid);
 
+static struct cxl_region_ref *cxl_rr_load(struct cxl_port *port,
+					  struct cxl_region *cxlr)
+{
+	return xa_load(&port->regions, (unsigned long)cxlr);
+}
+
+static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
+{
+	struct cxl_region_params *p = &cxlr->params;
+	int i;
+
+	for (i = count - 1; i >= 0; i--) {
+		struct cxl_endpoint_decoder *cxled = p->targets[i];
+		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+		struct cxl_port *iter = cxled_to_port(cxled);
+		struct cxl_ep *ep;
+		int rc;
+
+		while (!is_cxl_root(to_cxl_port(iter->dev.parent)))
+			iter = to_cxl_port(iter->dev.parent);
+
+		for (ep = cxl_ep_load(iter, cxlmd); iter;
+		     iter = ep->next, ep = cxl_ep_load(iter, cxlmd)) {
+			struct cxl_region_ref *cxl_rr;
+			struct cxl_decoder *cxld;
+
+			cxl_rr = cxl_rr_load(iter, cxlr);
+			cxld = cxl_rr->decoder;
+			rc = cxld->reset(cxld);
+			if (rc)
+				return rc;
+		}
+
+		rc = cxled->cxld.reset(&cxled->cxld);
+		if (rc)
+			return rc;
+	}
+
+	return 0;
+}
+
+static int cxl_region_decode_commit(struct cxl_region *cxlr)
+{
+	struct cxl_region_params *p = &cxlr->params;
+	int i, rc;
+
+	for (i = 0; i < p->nr_targets; i++) {
+		struct cxl_endpoint_decoder *cxled = p->targets[i];
+		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+		struct cxl_region_ref *cxl_rr;
+		struct cxl_decoder *cxld;
+		struct cxl_port *iter;
+		struct cxl_ep *ep;
+
+		/* commit bottom up */
+		for (iter = cxled_to_port(cxled); !is_cxl_root(iter);
+		     iter = to_cxl_port(iter->dev.parent)) {
+			cxl_rr = cxl_rr_load(iter, cxlr);
+			cxld = cxl_rr->decoder;
+			rc = cxld->commit(cxld);
+			if (rc)
+				break;
+		}
+
+		if (is_cxl_root(iter))
+			continue;
+
+		/* teardown top down */
+		for (ep = cxl_ep_load(iter, cxlmd); ep && iter;
+		     iter = ep->next, ep = cxl_ep_load(iter, cxlmd)) {
+			cxl_rr = cxl_rr_load(iter, cxlr);
+			cxld = cxl_rr->decoder;
+			cxld->reset(cxld);
+		}
+
+		cxled->cxld.reset(&cxled->cxld);
+		if (i == 0)
+			return rc;
+		break;
+	}
+
+	if (i >= p->nr_targets)
+		return 0;
+
+	/* undo the targets that were successfully committed */
+	cxl_region_decode_reset(cxlr, i);
+	return rc;
+}
+
+static ssize_t commit_store(struct device *dev, struct device_attribute *attr,
+			    const char *buf, size_t len)
+{
+	struct cxl_region *cxlr = to_cxl_region(dev);
+	struct cxl_region_params *p = &cxlr->params;
+	bool commit;
+	ssize_t rc;
+
+	rc = kstrtobool(buf, &commit);
+	if (rc)
+		return rc;
+
+	rc = down_write_killable(&cxl_region_rwsem);
+	if (rc)
+		return rc;
+
+	/* Already in the requested state? */
+	if (commit && p->state >= CXL_CONFIG_COMMIT)
+		goto out;
+	if (!commit && p->state < CXL_CONFIG_COMMIT)
+		goto out;
+
+	/* Not ready to commit? */
+	if (commit && p->state < CXL_CONFIG_ACTIVE) {
+		rc = -ENXIO;
+		goto out;
+	}
+
+	if (commit)
+		rc = cxl_region_decode_commit(cxlr);
+	else {
+		p->state = CXL_CONFIG_RESET_PENDING;
+		up_write(&cxl_region_rwsem);
+		device_release_driver(&cxlr->dev);
+		down_write(&cxl_region_rwsem);
+
+		if (p->state == CXL_CONFIG_RESET_PENDING)
+			rc = cxl_region_decode_reset(cxlr, p->interleave_ways);
+	}
+
+	if (rc)
+		goto out;
+
+	if (commit)
+		p->state = CXL_CONFIG_COMMIT;
+	else if (p->state == CXL_CONFIG_RESET_PENDING)
+		p->state = CXL_CONFIG_ACTIVE;
+
+out:
+	up_write(&cxl_region_rwsem);
+
+	if (rc)
+		return rc;
+	return len;
+}
+
+static ssize_t commit_show(struct device *dev, struct device_attribute *attr,
+			   char *buf)
+{
+	struct cxl_region *cxlr = to_cxl_region(dev);
+	struct cxl_region_params *p = &cxlr->params;
+	ssize_t rc;
+
+	rc = down_read_interruptible(&cxl_region_rwsem);
+	if (rc)
+		return rc;
+	rc = sysfs_emit(buf, "%d\n", p->state >= CXL_CONFIG_COMMIT);
+	up_read(&cxl_region_rwsem);
+
+	return rc;
+}
+static DEVICE_ATTR_RW(commit);
+
 static umode_t cxl_region_visible(struct kobject *kobj, struct attribute *a,
 				  int n)
 {
@@ -388,6 +550,7 @@ static DEVICE_ATTR_RW(size);
 
 static struct attribute *cxl_region_attrs[] = {
 	&dev_attr_uuid.attr,
+	&dev_attr_commit.attr,
 	&dev_attr_interleave_ways.attr,
 	&dev_attr_interleave_granularity.attr,
 	&dev_attr_resource.attr,
@@ -649,12 +812,6 @@ static int cxl_port_attach_region(struct cxl_port *port,
 	return rc;
 }
 
-static struct cxl_region_ref *cxl_rr_load(struct cxl_port *port,
-					  struct cxl_region *cxlr)
-{
-	return xa_load(&port->regions, (unsigned long)cxlr);
-}
-
 static void cxl_port_detach_region(struct cxl_port *port,
 				   struct cxl_region *cxlr,
 				   struct cxl_endpoint_decoder *cxled)
@@ -1039,20 +1196,32 @@ static int cxl_region_attach(struct cxl_region *cxlr,
 	return rc;
 }
 
-static void cxl_region_detach(struct cxl_endpoint_decoder *cxled)
+static int cxl_region_detach(struct cxl_endpoint_decoder *cxled)
 {
 	struct cxl_port *iter, *ep_port = cxled_to_port(cxled);
 	struct cxl_region *cxlr = cxled->cxld.region;
 	struct cxl_region_params *p;
+	int rc = 0;
 
 	lockdep_assert_held_write(&cxl_region_rwsem);
 
 	if (!cxlr)
-		return;
+		return 0;
 
 	p = &cxlr->params;
 	get_device(&cxlr->dev);
 
+	if (p->state > CXL_CONFIG_ACTIVE) {
+		/*
+		 * TODO: tear down all impacted regions if a device is
+		 * removed out of order
+		 */
+		rc = cxl_region_decode_reset(cxlr, p->interleave_ways);
+		if (rc)
+			goto out;
+		p->state = CXL_CONFIG_ACTIVE;
+	}
+
 	for (iter = ep_port; !is_cxl_root(iter);
 	     iter = to_cxl_port(iter->dev.parent))
 		cxl_port_detach_region(iter, cxlr, cxled);
@@ -1080,6 +1249,7 @@ static void cxl_region_detach(struct cxl_endpoint_decoder *cxled)
 	down_write(&cxl_region_rwsem);
 out:
 	put_device(&cxlr->dev);
+	return rc;
 }
 
 void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled)
@@ -1137,8 +1307,7 @@ static int detach_target(struct cxl_region *cxlr, int pos)
 		goto out;
 	}
 
-	cxl_region_detach(p->targets[pos]);
-	rc = 0;
+	rc = cxl_region_detach(p->targets[pos]);
 out:
 	up_write(&cxl_region_rwsem);
 	return rc;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index a93d7c4efd1a..fc14f6805f2c 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -54,6 +54,7 @@
 #define   CXL_HDM_DECODER0_CTRL_LOCK BIT(8)
 #define   CXL_HDM_DECODER0_CTRL_COMMIT BIT(9)
 #define   CXL_HDM_DECODER0_CTRL_COMMITTED BIT(10)
+#define   CXL_HDM_DECODER0_CTRL_COMMIT_ERROR BIT(11)
 #define   CXL_HDM_DECODER0_CTRL_TYPE BIT(12)
 #define CXL_HDM_DECODER0_TL_LOW(i) (0x20 * (i) + 0x24)
 #define CXL_HDM_DECODER0_TL_HIGH(i) (0x20 * (i) + 0x28)
@@ -257,6 +258,8 @@ enum cxl_decoder_type {
  * @target_type: accelerator vs expander (type2 vs type3) selector
  * @region: currently assigned region for this decoder
  * @flags: memory type capabilities and locking
+ * @commit: device/decoder-type specific callback to commit settings to hw
+ * @reset: device/decoder-type specific callback to reset hw settings
 */
 struct cxl_decoder {
 	struct device dev;
@@ -267,6 +270,8 @@ struct cxl_decoder {
 	enum cxl_decoder_type target_type;
 	struct cxl_region *region;
 	unsigned long flags;
+	int (*commit)(struct cxl_decoder *cxld);
+	int (*reset)(struct cxl_decoder *cxld);
 };
 
 /*
@@ -332,11 +337,15 @@ struct cxl_root_decoder {
  * changes to interleave_ways or interleave_granularity
  * @CXL_CONFIG_ACTIVE: All targets have been added the region is now
  * active
+ * @CXL_CONFIG_RESET_PENDING: see commit_store()
+ * @CXL_CONFIG_COMMIT: Soft-config has been committed to hardware
  */
 enum cxl_config_state {
 	CXL_CONFIG_IDLE,
 	CXL_CONFIG_INTERLEAVE_ACTIVE,
 	CXL_CONFIG_ACTIVE,
+	CXL_CONFIG_RESET_PENDING,
+	CXL_CONFIG_COMMIT,
 };
 
 /**
@@ -418,6 +427,7 @@ struct cxl_nvdimm {
  * @parent_dport: dport that points to this port in the parent
  * @decoder_ida: allocator for decoder ids
  * @dpa_end: cursor to track highest allocated decoder for allocation ordering
+ * @commit_end: cursor to track highest committed decoder for commit ordering
  * @component_reg_phys: component register capability base address (optional)
  * @dead: last ep has been removed, force port re-creation
  * @depth: How deep this port is relative to the root. depth 0 is the root.
@@ -433,6 +443,7 @@ struct cxl_port {
 	struct cxl_dport *parent_dport;
 	struct ida decoder_ida;
 	int dpa_end;
+	int commit_end;
 	resource_size_t component_reg_phys;
 	bool dead;
 	unsigned int depth;
diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index 51d517fa62ee..94653201631c 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -429,6 +429,50 @@ static int map_targets(struct device *dev, void *data)
 	return 0;
 }
 
+static int mock_decoder_commit(struct cxl_decoder *cxld)
+{
+	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
+	int id = cxld->id;
+
+	if (cxld->flags & CXL_DECODER_F_ENABLE)
+		return 0;
+
+	dev_dbg(&port->dev, "%s commit\n", dev_name(&cxld->dev));
+	if (port->commit_end + 1 != id) {
+		dev_dbg(&port->dev,
+			"%s: out of order commit, expected decoder%d.%d\n",
+			dev_name(&cxld->dev), port->id, port->commit_end + 1);
+		return -EBUSY;
+	}
+
+	port->commit_end++;
+	cxld->flags |= CXL_DECODER_F_ENABLE;
+
+	return 0;
+}
+
+static int mock_decoder_reset(struct cxl_decoder *cxld)
+{
+	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
+	int id = cxld->id;
+
+	if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)
+		return 0;
+
+	dev_dbg(&port->dev, "%s reset\n", dev_name(&cxld->dev));
+	if (port->commit_end != id) {
+		dev_dbg(&port->dev,
+			"%s: out of order reset, expected decoder%d.%d\n",
+			dev_name(&cxld->dev), port->id, port->commit_end);
+		return -EBUSY;
+	}
+
+	port->commit_end--;
+	cxld->flags &= ~CXL_DECODER_F_ENABLE;
+
+	return 0;
+}
+
 static int mock_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
 {
 	struct cxl_port *port = cxlhdm->port;
@@ -482,6 +526,8 @@ static int mock_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
 		cxld->interleave_ways = min_not_zero(target_count, 1);
 		cxld->interleave_granularity = SZ_4K;
 		cxld->target_type = CXL_DECODER_EXPANDER;
+		cxld->commit = mock_decoder_commit;
+		cxld->reset = mock_decoder_reset;
 
 		if (target_count) {
 			rc = device_for_each_child(port->uport, &ctx,
-- 
2.36.1
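
For readers following the ordering rule in the patch above: the
port->commit_end cursor only lets decoders be committed in increasing id
order and reset in decreasing id order. Below is a minimal userspace
sketch of that rule, modelled on mock_decoder_commit() and
mock_decoder_reset(). The toy_* types and helpers are hypothetical
stand-ins, not kernel API, and no hardware registers are touched.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the commit cursor in struct cxl_port */
struct toy_port {
	int commit_end;	/* id of the highest committed decoder, -1 if none */
};

/* Hypothetical stand-in for struct cxl_decoder */
struct toy_decoder {
	int id;
	bool enabled;
};

/* Commit succeeds only for the decoder immediately after the cursor */
static int toy_decoder_commit(struct toy_port *port, struct toy_decoder *d)
{
	assert(port->commit_end >= -1);

	if (d->enabled)
		return 0;	/* already committed, nothing to do */
	if (port->commit_end + 1 != d->id)
		return -EBUSY;	/* out of order commit */

	port->commit_end++;
	d->enabled = true;
	return 0;
}

/* Reset succeeds only for the decoder the cursor currently points at */
static int toy_decoder_reset(struct toy_port *port, struct toy_decoder *d)
{
	assert(port->commit_end >= -1);

	if (!d->enabled)
		return 0;	/* already reset, nothing to do */
	if (port->commit_end != d->id)
		return -EBUSY;	/* out of order reset */

	port->commit_end--;
	d->enabled = false;
	return 0;
}
```

With a port initialized to commit_end = -1, committing decoder 1 before
decoder 0 fails with -EBUSY, and once both are committed decoder 0
cannot be reset until decoder 1 has been.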



* [PATCH 43/46] cxl/region: Add region driver boiler plate
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (41 preceding siblings ...)
  2022-06-24  4:19 ` [PATCH 42/46] cxl/hdm: Commit decoder state to hardware Dan Williams
@ 2022-06-24  4:19 ` Dan Williams
  2022-06-30 17:09   ` Jonathan Cameron
  2022-06-24  4:19 ` [PATCH 44/46] cxl/pmem: Delete unused nvdimm attribute Dan Williams
                   ` (5 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  4:19 UTC (permalink / raw)
  To: linux-cxl; +Cc: nvdimm, linux-pci, patches, hch, Dan Williams, Ben Widawsky

The CXL region driver is responsible for routing fully formed CXL
regions to libnvdimm for persistent memory regions, to device-dax for
volatile memory regions, or for just acting as an enumeration
placeholder if the region was set up and configuration-locked by
platform firmware. In the platform-firmware-setup case the expectation
is that the region is already accounted for in the system memory map,
i.e. already enabled as "System RAM".

For now, just attach to CXL regions in the CXL_CONFIG_COMMIT state, and
take no further action.

Given this driver is just a small / simple router, include it in the
core rather than its own module.

Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
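
Note for reviewers: the probe gate described above relies on the
declaration order of enum cxl_config_state, so a plain "<" comparison
rejects every state that precedes CXL_CONFIG_COMMIT, including
CXL_CONFIG_RESET_PENDING. A minimal userspace sketch of that check
follows. The toy_* names are hypothetical and only mirror the enum
ordering introduced earlier in the series.

```c
#include <assert.h>
#include <errno.h>

/* Same ordering as enum cxl_config_state; the names are illustrative */
enum toy_config_state {
	TOY_CONFIG_IDLE,
	TOY_CONFIG_INTERLEAVE_ACTIVE,
	TOY_CONFIG_ACTIVE,
	TOY_CONFIG_RESET_PENDING,
	TOY_CONFIG_COMMIT,
};

/* Attach only to regions whose decoders have been committed to hardware */
static int toy_region_probe(enum toy_config_state state)
{
	assert(state <= TOY_CONFIG_COMMIT);

	if (state < TOY_CONFIG_COMMIT)
		return -ENXIO;
	return 0;
}
```

A consequence of the ordering is that a region in the RESET_PENDING
state is refused in the same way as an ACTIVE but uncommitted region.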
 drivers/cxl/core/core.h   | 12 +++++++++++
 drivers/cxl/core/port.c   |  9 ++++++++
 drivers/cxl/core/region.c | 45 ++++++++++++++++++++++++++++++++++++++-
 drivers/cxl/cxl.h         |  1 +
 4 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 6f5c4fb85879..be5198ab8f3b 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -13,17 +13,29 @@ extern struct attribute_group cxl_base_attribute_group;
 extern struct device_attribute dev_attr_create_pmem_region;
 extern struct device_attribute dev_attr_delete_region;
 extern struct device_attribute dev_attr_region;
+extern const struct device_type cxl_region_type;
 void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled);
+int cxl_region_init(void);
+void cxl_region_exit(void);
 /*
  * Note must be used at the end of an attribute list, since it
  * terminates the list in the CONFIG_CXL_REGION=n case.
  */
 #define CXL_REGION_ATTR(x) (&dev_attr_##x.attr)
+#define CXL_REGION_TYPE(x) (&cxl_region_type)
 #else
 static inline void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled)
 {
 }
+static inline int cxl_region_init(void)
+{
+	return 0;
+}
+static inline void cxl_region_exit(void)
+{
+}
 #define CXL_REGION_ATTR(x) NULL
+#define CXL_REGION_TYPE(x) NULL
 #endif
 
 struct cxl_send_command;
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index eee1615d2319..00add9e0b192 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -51,6 +51,8 @@ static int cxl_device_id(struct device *dev)
 	}
 	if (is_cxl_memdev(dev))
 		return CXL_DEVICE_MEMORY_EXPANDER;
+	if (dev->type == CXL_REGION_TYPE())
+		return CXL_DEVICE_REGION;
 	return 0;
 }
 
@@ -1867,8 +1869,14 @@ static __init int cxl_core_init(void)
 	if (rc)
 		goto err_bus;
 
+	rc = cxl_region_init();
+	if (rc)
+		goto err_region;
+
 	return 0;
 
+err_region:
+	bus_unregister(&cxl_bus_type);
 err_bus:
 	destroy_workqueue(cxl_bus_wq);
 err_wq:
@@ -1878,6 +1886,7 @@ static __init int cxl_core_init(void)
 
 static void cxl_core_exit(void)
 {
+	cxl_region_exit();
 	bus_unregister(&cxl_bus_type);
 	destroy_workqueue(cxl_bus_wq);
 	cxl_memdev_exit();
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index b90160c4f975..cd1848d4c8fe 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1416,7 +1416,7 @@ static void cxl_region_release(struct device *dev)
 	kfree(cxlr);
 }
 
-static const struct device_type cxl_region_type = {
+const struct device_type cxl_region_type = {
 	.name = "cxl_region",
 	.release = cxl_region_release,
 	.groups = region_groups
@@ -1614,4 +1614,47 @@ static ssize_t delete_region_store(struct device *dev,
 }
 DEVICE_ATTR_WO(delete_region);
 
+static int cxl_region_probe(struct device *dev)
+{
+	struct cxl_region *cxlr = to_cxl_region(dev);
+	struct cxl_region_params *p = &cxlr->params;
+	int rc;
+
+	rc = down_read_interruptible(&cxl_region_rwsem);
+	if (rc) {
+		dev_dbg(&cxlr->dev, "probe interrupted\n");
+		return rc;
+	}
+
+	if (p->state < CXL_CONFIG_COMMIT) {
+		dev_dbg(&cxlr->dev, "config state: %d\n", p->state);
+		rc = -ENXIO;
+	}
+
+	/*
+	 * From this point on any path that changes the region's state away from
+	 * CXL_CONFIG_COMMIT is also responsible for releasing the driver.
+	 */
+	up_read(&cxl_region_rwsem);
+
+	return rc;
+}
+
+static struct cxl_driver cxl_region_driver = {
+	.name = "cxl_region",
+	.probe = cxl_region_probe,
+	.id = CXL_DEVICE_REGION,
+};
+
+int cxl_region_init(void)
+{
+	return cxl_driver_register(&cxl_region_driver);
+}
+
+void cxl_region_exit(void)
+{
+	cxl_driver_unregister(&cxl_region_driver);
+}
+
 MODULE_IMPORT_NS(CXL);
+MODULE_ALIAS_CXL(CXL_DEVICE_REGION);
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index fc14f6805f2c..734b4479feb2 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -586,6 +586,7 @@ void cxl_driver_unregister(struct cxl_driver *cxl_drv);
 #define CXL_DEVICE_PORT			3
 #define CXL_DEVICE_ROOT			4
 #define CXL_DEVICE_MEMORY_EXPANDER	5
+#define CXL_DEVICE_REGION		6
 
 #define MODULE_ALIAS_CXL(type) MODULE_ALIAS("cxl:t" __stringify(type) "*")
 #define CXL_MODALIAS_FMT "cxl:t%d"
-- 
2.36.1



* [PATCH 44/46] cxl/pmem: Delete unused nvdimm attribute
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (42 preceding siblings ...)
  2022-06-24  4:19 ` [PATCH 43/46] cxl/region: Add region driver boiler plate Dan Williams
@ 2022-06-24  4:19 ` Dan Williams
  2022-06-30 17:10   ` Jonathan Cameron
  2022-06-24  4:19 ` [PATCH 45/46] cxl/pmem: Fix offline_nvdimm_bus() to offline by bridge Dan Williams
                   ` (4 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  4:19 UTC (permalink / raw)
  To: linux-cxl; +Cc: nvdimm, linux-pci, patches, hch, Dan Williams

While there is a need to go from a LIBNVDIMM 'struct nvdimm' to a CXL
'struct cxl_nvdimm', there is no use case to go the other direction.
Likely this is a leftover from an early version of the referenced commit
before it implemented devm for releasing the created nvdimm.

Fixes: 21083f51521f ("cxl/pmem: Register 'pmem' / cxl_nvdimm devices")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/cxl.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 734b4479feb2..d6ff6337aa49 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -411,7 +411,6 @@ struct cxl_nvdimm_bridge {
 struct cxl_nvdimm {
 	struct device dev;
 	struct cxl_memdev *cxlmd;
-	struct nvdimm *nvdimm;
 };
 
 /**
-- 
2.36.1



* [PATCH 45/46] cxl/pmem: Fix offline_nvdimm_bus() to offline by bridge
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (43 preceding siblings ...)
  2022-06-24  4:19 ` [PATCH 44/46] cxl/pmem: Delete unused nvdimm attribute Dan Williams
@ 2022-06-24  4:19 ` Dan Williams
  2022-06-30 17:14   ` Jonathan Cameron
  2022-06-24  4:19 ` [PATCH 46/46] cxl/region: Introduce cxl_pmem_region objects Dan Williams
                   ` (3 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  4:19 UTC (permalink / raw)
  To: linux-cxl; +Cc: nvdimm, linux-pci, patches, hch, Dan Williams

Be careful to only disable cxl_pmem objects related to a given
cxl_nvdimm_bridge. Otherwise, offline_nvdimm_bus() reaches across CXL
domains and disables more than is expected.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
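
Note for reviewers: the fix amounts to passing the bridge through the
iterator's opaque data pointer and having the callback skip devices
that belong to a different bridge. A minimal userspace sketch of that
pattern follows. The toy_* types and the array walk are hypothetical
stand-ins for struct cxl_nvdimm and bus_for_each_dev(), not the kernel
implementation.

```c
#include <assert.h>
#include <stddef.h>

struct toy_bridge {
	int id;
};

struct toy_nvdimm {
	struct toy_bridge *bridge;	/* bridge this device was probed against */
	int bound;			/* 1 while a driver is attached */
};

/* Iterator callback: return 0 to keep walking, as bus_for_each_dev() expects */
static int toy_release_driver(struct toy_nvdimm *nvd, void *data)
{
	struct toy_bridge *victim = data;

	assert(nvd);

	/* Leave devices that hang off other bridges alone */
	if (nvd->bridge != victim)
		return 0;

	nvd->bound = 0;
	return 0;
}

/* Walk every device, releasing only those that match @bridge */
static void toy_offline_bus(struct toy_nvdimm *devs, size_t n,
			    struct toy_bridge *bridge)
{
	for (size_t i = 0; i < n; i++)
		toy_release_driver(&devs[i], bridge);
}
```

Offlining one bridge in a set of devices spread across two bridges
leaves the devices of the other bridge bound.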
 drivers/cxl/cxl.h  |  1 +
 drivers/cxl/pmem.c | 21 +++++++++++++++++----
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index d6ff6337aa49..95f486bc1b41 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -411,6 +411,7 @@ struct cxl_nvdimm_bridge {
 struct cxl_nvdimm {
 	struct device dev;
 	struct cxl_memdev *cxlmd;
+	struct cxl_nvdimm_bridge *bridge;
 };
 
 /**
diff --git a/drivers/cxl/pmem.c b/drivers/cxl/pmem.c
index 0aaa70b4e0f7..b271f6e90b91 100644
--- a/drivers/cxl/pmem.c
+++ b/drivers/cxl/pmem.c
@@ -26,7 +26,10 @@ static void clear_exclusive(void *cxlds)
 
 static void unregister_nvdimm(void *nvdimm)
 {
+	struct cxl_nvdimm *cxl_nvd = nvdimm_provider_data(nvdimm);
+
 	nvdimm_delete(nvdimm);
+	cxl_nvd->bridge = NULL;
 }
 
 static int cxl_nvdimm_probe(struct device *dev)
@@ -66,6 +69,7 @@ static int cxl_nvdimm_probe(struct device *dev)
 	}
 
 	dev_set_drvdata(dev, nvdimm);
+	cxl_nvd->bridge = cxl_nvb;
 	rc = devm_add_action_or_reset(dev, unregister_nvdimm, nvdimm);
 out:
 	device_unlock(&cxl_nvb->dev);
@@ -204,15 +208,23 @@ static bool online_nvdimm_bus(struct cxl_nvdimm_bridge *cxl_nvb)
 	return cxl_nvb->nvdimm_bus != NULL;
 }
 
-static int cxl_nvdimm_release_driver(struct device *dev, void *data)
+static int cxl_nvdimm_release_driver(struct device *dev, void *cxl_nvb)
 {
+	struct cxl_nvdimm *cxl_nvd;
+
 	if (!is_cxl_nvdimm(dev))
 		return 0;
+
+	cxl_nvd = to_cxl_nvdimm(dev);
+	if (cxl_nvd->bridge != cxl_nvb)
+		return 0;
+
 	device_release_driver(dev);
 	return 0;
 }
 
-static void offline_nvdimm_bus(struct nvdimm_bus *nvdimm_bus)
+static void offline_nvdimm_bus(struct cxl_nvdimm_bridge *cxl_nvb,
+			       struct nvdimm_bus *nvdimm_bus)
 {
 	if (!nvdimm_bus)
 		return;
@@ -222,7 +234,8 @@ static void offline_nvdimm_bus(struct nvdimm_bus *nvdimm_bus)
 	 * nvdimm_bus_unregister() rips the nvdimm objects out from
 	 * underneath them.
 	 */
-	bus_for_each_dev(&cxl_bus_type, NULL, NULL, cxl_nvdimm_release_driver);
+	bus_for_each_dev(&cxl_bus_type, NULL, cxl_nvb,
+			 cxl_nvdimm_release_driver);
 	nvdimm_bus_unregister(nvdimm_bus);
 }
 
@@ -260,7 +273,7 @@ static void cxl_nvb_update_state(struct work_struct *work)
 
 		dev_dbg(&cxl_nvb->dev, "rescan: %d\n", rc);
 	}
-	offline_nvdimm_bus(victim_bus);
+	offline_nvdimm_bus(cxl_nvb, victim_bus);
 
 	put_device(&cxl_nvb->dev);
 }
-- 
2.36.1



* [PATCH 46/46] cxl/region: Introduce cxl_pmem_region objects
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (44 preceding siblings ...)
  2022-06-24  4:19 ` [PATCH 45/46] cxl/pmem: Fix offline_nvdimm_bus() to offline by bridge Dan Williams
@ 2022-06-24  4:19 ` Dan Williams
  2022-06-30 17:34   ` Jonathan Cameron
  2022-06-24 15:13 ` [PATCH 00/46] CXL PMEM Region Provisioning Jonathan Cameron
                   ` (2 subsequent siblings)
  48 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24  4:19 UTC (permalink / raw)
  To: linux-cxl; +Cc: nvdimm, linux-pci, patches, hch, Dan Williams, Ben Widawsky

The LIBNVDIMM subsystem is a platform agnostic representation of system
NVDIMM / persistent memory resources. To date, the CXL subsystem's
interaction with LIBNVDIMM has been to register an nvdimm-bridge device
and cxl_nvdimm objects to proxy CXL capabilities into existing LIBNVDIMM
subsystem mechanics.

With regions the approach is the same. Create a new cxl_pmem_region
object to proxy CXL region details into a LIBNVDIMM definition. With
this enabling, LIBNVDIMM can partition CXL persistent memory regions with
legacy namespace labels. A follow-on patch will add CXL region label and
CXL namespace label support to persist region configurations across
driver reload / system-reset events.

Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
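
Note for reviewers: cxl_pmem_region_alloc() snapshots the region's
targets into one allocation that ends in a flexible array member, so
the proxy object can outlive later changes to the region parameters.
Below is a minimal userspace sketch of that allocation pattern. The
toy_* names are hypothetical, and the open-coded size arithmetic stands
in for the kernel's struct_size() helper, which additionally guards
against overflow.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct toy_mapping {
	uint64_t start;		/* device physical address base */
	uint64_t size;
	int position;		/* position in the interleave set */
};

struct toy_pmem_region {
	int nr_mappings;
	struct toy_mapping mapping[];	/* flexible array member */
};

/*
 * Copy @n {start, size} pairs into a single zeroed allocation.
 * Returns NULL on failure; the caller frees the result with free().
 */
static struct toy_pmem_region *toy_snapshot(const uint64_t (*targets)[2], int n)
{
	struct toy_pmem_region *r;

	assert(targets || n == 0);

	if (n < 0)
		return NULL;

	r = calloc(1, sizeof(*r) + (size_t)n * sizeof(r->mapping[0]));
	if (!r)
		return NULL;

	r->nr_mappings = n;
	for (int i = 0; i < n; i++) {
		r->mapping[i].start = targets[i][0];
		r->mapping[i].size = targets[i][1];
		r->mapping[i].position = i;
	}

	return r;
}
```

The patch itself takes the snapshot under cxl_region_rwsem and pins
each memdev with get_device(); the sketch leaves locking and reference
counting out.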
 drivers/cxl/core/core.h      |   3 +
 drivers/cxl/core/pmem.c      |   4 +-
 drivers/cxl/core/port.c      |   2 +
 drivers/cxl/core/region.c    | 139 ++++++++++++++++++++-
 drivers/cxl/cxl.h            |  36 +++++-
 drivers/cxl/pmem.c           | 235 ++++++++++++++++++++++++++++++++++-
 drivers/nvdimm/region_devs.c |  28 +++--
 include/linux/libnvdimm.h    |   5 +
 8 files changed, 440 insertions(+), 12 deletions(-)

diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index be5198ab8f3b..f5c5b041e8a5 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -13,6 +13,7 @@ extern struct attribute_group cxl_base_attribute_group;
 extern struct device_attribute dev_attr_create_pmem_region;
 extern struct device_attribute dev_attr_delete_region;
 extern struct device_attribute dev_attr_region;
+extern const struct device_type cxl_pmem_region_type;
 extern const struct device_type cxl_region_type;
 void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled);
 int cxl_region_init(void);
@@ -23,6 +24,7 @@ void cxl_region_exit(void);
  */
 #define CXL_REGION_ATTR(x) (&dev_attr_##x.attr)
 #define CXL_REGION_TYPE(x) (&cxl_region_type)
+#define CXL_PMEM_REGION_TYPE(x) (&cxl_pmem_region_type)
 #else
 static inline void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled)
 {
@@ -36,6 +38,7 @@ static inline void cxl_region_exit(void)
 }
 #define CXL_REGION_ATTR(x) NULL
 #define CXL_REGION_TYPE(x) NULL
+#define CXL_PMEM_REGION_TYPE(x) NULL
 #endif
 
 struct cxl_send_command;
diff --git a/drivers/cxl/core/pmem.c b/drivers/cxl/core/pmem.c
index bec7cfb54ebf..1d12a8206444 100644
--- a/drivers/cxl/core/pmem.c
+++ b/drivers/cxl/core/pmem.c
@@ -62,9 +62,9 @@ static int match_nvdimm_bridge(struct device *dev, void *data)
 	return is_cxl_nvdimm_bridge(dev);
 }
 
-struct cxl_nvdimm_bridge *cxl_find_nvdimm_bridge(struct cxl_nvdimm *cxl_nvd)
+struct cxl_nvdimm_bridge *cxl_find_nvdimm_bridge(struct device *start)
 {
-	struct cxl_port *port = find_cxl_root(&cxl_nvd->dev);
+	struct cxl_port *port = find_cxl_root(start);
 	struct device *dev;
 
 	if (!port)
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 00add9e0b192..e13cd012ed22 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -44,6 +44,8 @@ static int cxl_device_id(struct device *dev)
 		return CXL_DEVICE_NVDIMM_BRIDGE;
 	if (dev->type == &cxl_nvdimm_type)
 		return CXL_DEVICE_NVDIMM;
+	if (dev->type == CXL_PMEM_REGION_TYPE())
+		return CXL_DEVICE_PMEM_REGION;
 	if (is_cxl_port(dev)) {
 		if (is_cxl_root(to_cxl_port(dev)))
 			return CXL_DEVICE_ROOT;
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index cd1848d4c8fe..70e9baef95f7 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1614,6 +1614,136 @@ static ssize_t delete_region_store(struct device *dev,
 }
 DEVICE_ATTR_WO(delete_region);
 
+static void cxl_pmem_region_release(struct device *dev)
+{
+	struct cxl_pmem_region *cxlr_pmem = to_cxl_pmem_region(dev);
+	int i;
+
+	for (i = 0; i < cxlr_pmem->nr_mappings; i++) {
+		struct cxl_memdev *cxlmd = cxlr_pmem->mapping[i].cxlmd;
+
+		put_device(&cxlmd->dev);
+	}
+
+	kfree(cxlr_pmem);
+}
+
+static const struct attribute_group *cxl_pmem_region_attribute_groups[] = {
+	&cxl_base_attribute_group,
+	NULL,
+};
+
+const struct device_type cxl_pmem_region_type = {
+	.name = "cxl_pmem_region",
+	.release = cxl_pmem_region_release,
+	.groups = cxl_pmem_region_attribute_groups,
+};
+
+bool is_cxl_pmem_region(struct device *dev)
+{
+	return dev->type == &cxl_pmem_region_type;
+}
+EXPORT_SYMBOL_NS_GPL(is_cxl_pmem_region, CXL);
+
+struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev)
+{
+	if (dev_WARN_ONCE(dev, !is_cxl_pmem_region(dev),
+			  "not a cxl_pmem_region device\n"))
+		return NULL;
+	return container_of(dev, struct cxl_pmem_region, dev);
+}
+EXPORT_SYMBOL_NS_GPL(to_cxl_pmem_region, CXL);
+
+static struct lock_class_key cxl_pmem_region_key;
+
+static struct cxl_pmem_region *cxl_pmem_region_alloc(struct cxl_region *cxlr)
+{
+	struct cxl_pmem_region *cxlr_pmem = ERR_PTR(-ENXIO);
+	struct cxl_region_params *p = &cxlr->params;
+	struct device *dev;
+	int i;
+
+	down_read(&cxl_region_rwsem);
+	if (p->state != CXL_CONFIG_COMMIT)
+		goto out;
+	cxlr_pmem = kzalloc(struct_size(cxlr_pmem, mapping, p->nr_targets),
+			    GFP_KERNEL);
+	if (!cxlr_pmem) {
+		cxlr_pmem = ERR_PTR(-ENOMEM);
+		goto out;
+	}
+
+	cxlr_pmem->hpa_range.start = p->res->start;
+	cxlr_pmem->hpa_range.end = p->res->end;
+
+	/* Snapshot the region configuration underneath the cxl_region_rwsem */
+	cxlr_pmem->nr_mappings = p->nr_targets;
+	for (i = 0; i < p->nr_targets; i++) {
+		struct cxl_endpoint_decoder *cxled = p->targets[i];
+		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+		struct cxl_pmem_region_mapping *m = &cxlr_pmem->mapping[i];
+
+		m->cxlmd = cxlmd;
+		get_device(&cxlmd->dev);
+		m->start = cxled->dpa_res->start;
+		m->size = resource_size(cxled->dpa_res);
+		m->position = i;
+	}
+
+	dev = &cxlr_pmem->dev;
+	cxlr_pmem->cxlr = cxlr;
+	device_initialize(dev);
+	lockdep_set_class(&dev->mutex, &cxl_pmem_region_key);
+	device_set_pm_not_required(dev);
+	dev->parent = &cxlr->dev;
+	dev->bus = &cxl_bus_type;
+	dev->type = &cxl_pmem_region_type;
+out:
+	up_read(&cxl_region_rwsem);
+
+	return cxlr_pmem;
+}
+
+static void cxlr_pmem_unregister(void *dev)
+{
+	device_unregister(dev);
+}
+
+/**
+ * devm_cxl_add_pmem_region() - add a cxl_region-to-nd_region bridge
+ * @cxlr: parent CXL region for this pmem region bridge device
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+static int devm_cxl_add_pmem_region(struct cxl_region *cxlr)
+{
+	struct cxl_pmem_region *cxlr_pmem;
+	struct device *dev;
+	int rc;
+
+	cxlr_pmem = cxl_pmem_region_alloc(cxlr);
+	if (IS_ERR(cxlr_pmem))
+		return PTR_ERR(cxlr_pmem);
+
+	dev = &cxlr_pmem->dev;
+	rc = dev_set_name(dev, "pmem_region%d", cxlr->id);
+	if (rc)
+		goto err;
+
+	rc = device_add(dev);
+	if (rc)
+		goto err;
+
+	dev_dbg(&cxlr->dev, "%s: register %s\n", dev_name(dev->parent),
+		dev_name(dev));
+
+	return devm_add_action_or_reset(&cxlr->dev, cxlr_pmem_unregister, dev);
+
+err:
+	put_device(dev);
+	return rc;
+}
+
 static int cxl_region_probe(struct device *dev)
 {
 	struct cxl_region *cxlr = to_cxl_region(dev);
@@ -1637,7 +1767,14 @@ static int cxl_region_probe(struct device *dev)
 	 */
 	up_read(&cxl_region_rwsem);
 
-	return rc;
+	switch (cxlr->mode) {
+	case CXL_DECODER_PMEM:
+		return devm_cxl_add_pmem_region(cxlr);
+	default:
+		dev_dbg(&cxlr->dev, "unsupported region mode: %d\n",
+			cxlr->mode);
+		return -ENXIO;
+	}
 }
 
 static struct cxl_driver cxl_region_driver = {
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 95f486bc1b41..bf878509bed4 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -412,6 +412,25 @@ struct cxl_nvdimm {
 	struct device dev;
 	struct cxl_memdev *cxlmd;
 	struct cxl_nvdimm_bridge *bridge;
+	struct cxl_pmem_region *region;
+};
+
+struct cxl_pmem_region_mapping {
+	struct cxl_memdev *cxlmd;
+	struct cxl_nvdimm *cxl_nvd;
+	u64 start;
+	u64 size;
+	int position;
+};
+
+struct cxl_pmem_region {
+	struct device dev;
+	struct cxl_region *cxlr;
+	struct nd_region *nd_region;
+	struct cxl_nvdimm_bridge *bridge;
+	struct range hpa_range;
+	int nr_mappings;
+	struct cxl_pmem_region_mapping mapping[];
 };
 
 /**
@@ -587,6 +606,7 @@ void cxl_driver_unregister(struct cxl_driver *cxl_drv);
 #define CXL_DEVICE_ROOT			4
 #define CXL_DEVICE_MEMORY_EXPANDER	5
 #define CXL_DEVICE_REGION		6
+#define CXL_DEVICE_PMEM_REGION		7
 
 #define MODULE_ALIAS_CXL(type) MODULE_ALIAS("cxl:t" __stringify(type) "*")
 #define CXL_MODALIAS_FMT "cxl:t%d"
@@ -598,7 +618,21 @@ struct cxl_nvdimm *to_cxl_nvdimm(struct device *dev);
 bool is_cxl_nvdimm(struct device *dev);
 bool is_cxl_nvdimm_bridge(struct device *dev);
 int devm_cxl_add_nvdimm(struct device *host, struct cxl_memdev *cxlmd);
-struct cxl_nvdimm_bridge *cxl_find_nvdimm_bridge(struct cxl_nvdimm *cxl_nvd);
+struct cxl_nvdimm_bridge *cxl_find_nvdimm_bridge(struct device *dev);
+
+#ifdef CONFIG_CXL_REGION
+bool is_cxl_pmem_region(struct device *dev);
+struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev);
+#else
+static inline bool is_cxl_pmem_region(struct device *dev)
+{
+	return false;
+}
+static inline struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev)
+{
+	return NULL;
+}
+#endif
 
 /*
  * Unit test builds overrides this to __weak, find the 'strong' version
diff --git a/drivers/cxl/pmem.c b/drivers/cxl/pmem.c
index b271f6e90b91..4ba7248275ac 100644
--- a/drivers/cxl/pmem.c
+++ b/drivers/cxl/pmem.c
@@ -7,6 +7,7 @@
 #include <linux/ndctl.h>
 #include <linux/async.h>
 #include <linux/slab.h>
+#include <linux/nd.h>
 #include "cxlmem.h"
 #include "cxl.h"
 
@@ -27,6 +28,19 @@ static void clear_exclusive(void *cxlds)
 static void unregister_nvdimm(void *nvdimm)
 {
 	struct cxl_nvdimm *cxl_nvd = nvdimm_provider_data(nvdimm);
+	struct cxl_nvdimm_bridge *cxl_nvb = cxl_nvd->bridge;
+	struct cxl_pmem_region *cxlr_pmem;
+
+	device_lock(&cxl_nvb->dev);
+	cxlr_pmem = cxl_nvd->region;
+	dev_set_drvdata(&cxl_nvd->dev, NULL);
+	cxl_nvd->region = NULL;
+	device_unlock(&cxl_nvb->dev);
+
+	if (cxlr_pmem) {
+		device_release_driver(&cxlr_pmem->dev);
+		put_device(&cxlr_pmem->dev);
+	}
 
 	nvdimm_delete(nvdimm);
 	cxl_nvd->bridge = NULL;
@@ -42,7 +56,7 @@ static int cxl_nvdimm_probe(struct device *dev)
 	struct nvdimm *nvdimm;
 	int rc;
 
-	cxl_nvb = cxl_find_nvdimm_bridge(cxl_nvd);
+	cxl_nvb = cxl_find_nvdimm_bridge(dev);
 	if (!cxl_nvb)
 		return -ENXIO;
 
@@ -223,6 +237,21 @@ static int cxl_nvdimm_release_driver(struct device *dev, void *cxl_nvb)
 	return 0;
 }
 
+static int cxl_pmem_region_release_driver(struct device *dev, void *cxl_nvb)
+{
+	struct cxl_pmem_region *cxlr_pmem;
+
+	if (!is_cxl_pmem_region(dev))
+		return 0;
+
+	cxlr_pmem = to_cxl_pmem_region(dev);
+	if (cxlr_pmem->bridge != cxl_nvb)
+		return 0;
+
+	device_release_driver(dev);
+	return 0;
+}
+
 static void offline_nvdimm_bus(struct cxl_nvdimm_bridge *cxl_nvb,
 			       struct nvdimm_bus *nvdimm_bus)
 {
@@ -234,6 +263,8 @@ static void offline_nvdimm_bus(struct cxl_nvdimm_bridge *cxl_nvb,
 	 * nvdimm_bus_unregister() rips the nvdimm objects out from
 	 * underneath them.
 	 */
+	bus_for_each_dev(&cxl_bus_type, NULL, cxl_nvb,
+			 cxl_pmem_region_release_driver);
 	bus_for_each_dev(&cxl_bus_type, NULL, cxl_nvb,
 			 cxl_nvdimm_release_driver);
 	nvdimm_bus_unregister(nvdimm_bus);
@@ -328,6 +359,200 @@ static struct cxl_driver cxl_nvdimm_bridge_driver = {
 	.id = CXL_DEVICE_NVDIMM_BRIDGE,
 };
 
+static int match_cxl_nvdimm(struct device *dev, void *data)
+{
+	return is_cxl_nvdimm(dev);
+}
+
+static void unregister_region(void *nd_region)
+{
+	struct cxl_nvdimm_bridge *cxl_nvb;
+	struct cxl_pmem_region *cxlr_pmem;
+	int i;
+
+	cxlr_pmem = nd_region_provider_data(nd_region);
+	cxl_nvb = cxlr_pmem->bridge;
+	device_lock(&cxl_nvb->dev);
+	for (i = 0; i < cxlr_pmem->nr_mappings; i++) {
+		struct cxl_pmem_region_mapping *m = &cxlr_pmem->mapping[i];
+		struct cxl_nvdimm *cxl_nvd = m->cxl_nvd;
+
+		if (cxl_nvd->region) {
+			put_device(&cxlr_pmem->dev);
+			cxl_nvd->region = NULL;
+		}
+	}
+	device_unlock(&cxl_nvb->dev);
+
+	nvdimm_region_delete(nd_region);
+}
+
+static void cxlr_pmem_remove_resource(void *res)
+{
+	remove_resource(res);
+}
+
+struct cxl_pmem_region_info {
+	u64 offset;
+	u64 serial;
+};
+
+static int cxl_pmem_region_probe(struct device *dev)
+{
+	struct nd_mapping_desc mappings[CXL_DECODER_MAX_INTERLEAVE];
+	struct cxl_pmem_region *cxlr_pmem = to_cxl_pmem_region(dev);
+	struct cxl_region *cxlr = cxlr_pmem->cxlr;
+	struct cxl_pmem_region_info *info = NULL;
+	struct cxl_nvdimm_bridge *cxl_nvb;
+	struct nd_interleave_set *nd_set;
+	struct nd_region_desc ndr_desc;
+	struct cxl_nvdimm *cxl_nvd;
+	struct nvdimm *nvdimm;
+	struct resource *res;
+	int rc = 0, i;
+
+	cxl_nvb = cxl_find_nvdimm_bridge(&cxlr_pmem->mapping[0].cxlmd->dev);
+	if (!cxl_nvb) {
+		dev_dbg(dev, "bridge not found\n");
+		return -ENXIO;
+	}
+	cxlr_pmem->bridge = cxl_nvb;
+
+	device_lock(&cxl_nvb->dev);
+	if (!cxl_nvb->nvdimm_bus) {
+		dev_dbg(dev, "nvdimm bus not found\n");
+		rc = -ENXIO;
+		goto out;
+	}
+
+	memset(&mappings, 0, sizeof(mappings));
+	memset(&ndr_desc, 0, sizeof(ndr_desc));
+
+	res = devm_kzalloc(dev, sizeof(*res), GFP_KERNEL);
+	if (!res) {
+		rc = -ENOMEM;
+		goto out;
+	}
+
+	res->name = "Persistent Memory";
+	res->start = cxlr_pmem->hpa_range.start;
+	res->end = cxlr_pmem->hpa_range.end;
+	res->flags = IORESOURCE_MEM;
+	res->desc = IORES_DESC_PERSISTENT_MEMORY;
+
+	rc = insert_resource(&iomem_resource, res);
+	if (rc)
+		goto out;
+
+	rc = devm_add_action_or_reset(dev, cxlr_pmem_remove_resource, res);
+	if (rc)
+		goto out;
+
+	ndr_desc.res = res;
+	ndr_desc.provider_data = cxlr_pmem;
+
+	ndr_desc.numa_node = memory_add_physaddr_to_nid(res->start);
+	ndr_desc.target_node = phys_to_target_node(res->start);
+	if (ndr_desc.target_node == NUMA_NO_NODE) {
+		ndr_desc.target_node = ndr_desc.numa_node;
+		dev_dbg(&cxlr->dev, "changing target node from %d to %d\n",
+			NUMA_NO_NODE, ndr_desc.target_node);
+	}
+
+	nd_set = devm_kzalloc(dev, sizeof(*nd_set), GFP_KERNEL);
+	if (!nd_set) {
+		rc = -ENOMEM;
+		goto out;
+	}
+
+	ndr_desc.memregion = cxlr->id;
+	set_bit(ND_REGION_CXL, &ndr_desc.flags);
+	set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags);
+
+	rc = -ENOMEM;
+	info = kmalloc_array(cxlr_pmem->nr_mappings, sizeof(*info), GFP_KERNEL);
+	if (!info)
+		goto out;
+	rc = -ENODEV;
+	for (i = 0; i < cxlr_pmem->nr_mappings; i++) {
+		struct cxl_pmem_region_mapping *m = &cxlr_pmem->mapping[i];
+		struct cxl_memdev *cxlmd = m->cxlmd;
+		struct cxl_dev_state *cxlds = cxlmd->cxlds;
+		struct device *d;
+
+		d = device_find_child(&cxlmd->dev, NULL, match_cxl_nvdimm);
+		if (!d) {
+			dev_dbg(dev, "[%d]: %s: no cxl_nvdimm found\n", i,
+				dev_name(&cxlmd->dev));
+			goto err;
+		}
+
+		/* safe to drop ref now with bridge lock held */
+		put_device(d);
+
+		cxl_nvd = to_cxl_nvdimm(d);
+		nvdimm = dev_get_drvdata(&cxl_nvd->dev);
+		if (!nvdimm) {
+			dev_dbg(dev, "[%d]: %s: no nvdimm found\n", i,
+				dev_name(&cxlmd->dev));
+			goto err;
+		}
+		cxl_nvd->region = cxlr_pmem;
+		get_device(&cxlr_pmem->dev);
+		m->cxl_nvd = cxl_nvd;
+		mappings[i] = (struct nd_mapping_desc) {
+			.nvdimm = nvdimm,
+			.start = m->start,
+			.size = m->size,
+			.position = i,
+		};
+		info[i].offset = m->start;
+		info[i].serial = cxlds->serial;
+	}
+	ndr_desc.num_mappings = cxlr_pmem->nr_mappings;
+	ndr_desc.mapping = mappings;
+
+	/*
+	 * TODO enable CXL labels which skip the need for 'interleave-set cookie'
+	 */
+	nd_set->cookie1 =
+		nd_fletcher64(info, sizeof(*info) * cxlr_pmem->nr_mappings, 0);
+	nd_set->cookie2 = nd_set->cookie1;
+	ndr_desc.nd_set = nd_set;
+
+	cxlr_pmem->nd_region =
+		nvdimm_pmem_region_create(cxl_nvb->nvdimm_bus, &ndr_desc);
+	if (IS_ERR(cxlr_pmem->nd_region)) {
+		rc = PTR_ERR(cxlr_pmem->nd_region);
+		goto err;
+	} else
+		rc = devm_add_action_or_reset(dev, unregister_region,
+					      cxlr_pmem->nd_region);
+out:
+	device_unlock(&cxl_nvb->dev);
+	put_device(&cxl_nvb->dev);
+	kfree(info);
+
+	if (rc)
+		dev_dbg(dev, "failed to create nvdimm region\n");
+	return rc;
+
+err:
+	for (i--; i >= 0; i--) {
+		nvdimm = mappings[i].nvdimm;
+		cxl_nvd = nvdimm_provider_data(nvdimm);
+		put_device(&cxl_nvd->region->dev);
+		cxl_nvd->region = NULL;
+	}
+	goto out;
+}
+
+static struct cxl_driver cxl_pmem_region_driver = {
+	.name = "cxl_pmem_region",
+	.probe = cxl_pmem_region_probe,
+	.id = CXL_DEVICE_PMEM_REGION,
+};
+
 /*
  * Return all bridges to the CXL_NVB_NEW state to invalidate any
  * ->state_work referring to the now destroyed cxl_pmem_wq.
@@ -372,8 +597,14 @@ static __init int cxl_pmem_init(void)
 	if (rc)
 		goto err_nvdimm;
 
+	rc = cxl_driver_register(&cxl_pmem_region_driver);
+	if (rc)
+		goto err_region;
+
 	return 0;
 
+err_region:
+	cxl_driver_unregister(&cxl_nvdimm_driver);
 err_nvdimm:
 	cxl_driver_unregister(&cxl_nvdimm_bridge_driver);
 err_bridge:
@@ -383,6 +614,7 @@ static __init int cxl_pmem_init(void)
 
 static __exit void cxl_pmem_exit(void)
 {
+	cxl_driver_unregister(&cxl_pmem_region_driver);
 	cxl_driver_unregister(&cxl_nvdimm_driver);
 	cxl_driver_unregister(&cxl_nvdimm_bridge_driver);
 	destroy_cxl_pmem_wq();
@@ -394,3 +626,4 @@ module_exit(cxl_pmem_exit);
 MODULE_IMPORT_NS(CXL);
 MODULE_ALIAS_CXL(CXL_DEVICE_NVDIMM_BRIDGE);
 MODULE_ALIAS_CXL(CXL_DEVICE_NVDIMM);
+MODULE_ALIAS_CXL(CXL_DEVICE_PMEM_REGION);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index d976260eca7a..473a71bbd9c9 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -133,7 +133,8 @@ static void nd_region_release(struct device *dev)
 		put_device(&nvdimm->dev);
 	}
 	free_percpu(nd_region->lane);
-	memregion_free(nd_region->id);
+	if (!test_bit(ND_REGION_CXL, &nd_region->flags))
+		memregion_free(nd_region->id);
 	kfree(nd_region);
 }
 
@@ -982,9 +983,14 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
 
 	if (!nd_region)
 		return NULL;
-	nd_region->id = memregion_alloc(GFP_KERNEL);
-	if (nd_region->id < 0)
-		goto err_id;
+	/* CXL pre-assigns memregion ids before creating nvdimm regions */
+	if (test_bit(ND_REGION_CXL, &ndr_desc->flags)) {
+		nd_region->id = ndr_desc->memregion;
+	} else {
+		nd_region->id = memregion_alloc(GFP_KERNEL);
+		if (nd_region->id < 0)
+			goto err_id;
+	}
 
 	nd_region->lane = alloc_percpu(struct nd_percpu_lane);
 	if (!nd_region->lane)
@@ -1043,9 +1049,10 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
 
 	return nd_region;
 
- err_percpu:
-	memregion_free(nd_region->id);
- err_id:
+err_percpu:
+	if (!test_bit(ND_REGION_CXL, &ndr_desc->flags))
+		memregion_free(nd_region->id);
+err_id:
 	kfree(nd_region);
 	return NULL;
 }
@@ -1068,6 +1075,13 @@ struct nd_region *nvdimm_volatile_region_create(struct nvdimm_bus *nvdimm_bus,
 }
 EXPORT_SYMBOL_GPL(nvdimm_volatile_region_create);
 
+void nvdimm_region_delete(struct nd_region *nd_region)
+{
+	if (nd_region)
+		nd_device_unregister(&nd_region->dev, ND_SYNC);
+}
+EXPORT_SYMBOL_GPL(nvdimm_region_delete);
+
 int nvdimm_flush(struct nd_region *nd_region, struct bio *bio)
 {
 	int rc = 0;
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 0d61e07b6827..c74acfa1a3fe 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -59,6 +59,9 @@ enum {
 	/* Platform provides asynchronous flush mechanism */
 	ND_REGION_ASYNC = 3,
 
+	/* Region was created by CXL subsystem */
+	ND_REGION_CXL = 4,
+
 	/* mark newly adjusted resources as requiring a label update */
 	DPA_RESOURCE_ADJUSTED = 1 << 0,
 };
@@ -122,6 +125,7 @@ struct nd_region_desc {
 	int numa_node;
 	int target_node;
 	unsigned long flags;
+	int memregion;
 	struct device_node *of_node;
 	int (*flush)(struct nd_region *nd_region, struct bio *bio);
 };
@@ -259,6 +263,7 @@ static inline struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus,
 			cmd_mask, num_flush, flush_wpq, NULL, NULL, NULL);
 }
 void nvdimm_delete(struct nvdimm *nvdimm);
+void nvdimm_region_delete(struct nd_region *nd_region);
 
 const struct nd_cmd_desc *nd_cmd_dimm_desc(int cmd);
 const struct nd_cmd_desc *nd_cmd_bus_desc(int cmd);
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 157+ messages in thread

* Re: [PATCH 00/46] CXL PMEM Region Provisioning
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (45 preceding siblings ...)
  2022-06-24  4:19 ` [PATCH 46/46] cxl/region: Introduce cxl_pmem_region objects Dan Williams
@ 2022-06-24 15:13 ` Jonathan Cameron
  2022-06-24 15:32   ` Dan Williams
  2022-06-28  3:12 ` Alison Schofield
  2022-07-02  2:26 ` Alison Schofield
  48 siblings, 1 reply; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-24 15:13 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, Ira Weiny, Christoph Hellwig, Jason Gunthorpe,
	Ben Widawsky, Alison Schofield, Matthew Wilcox, nvdimm,
	linux-pci, patches

On Thu, 23 Jun 2022 19:45:00 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> tl;dr: 46 patches is way too many patches to review in one sitting. Jump
> to the PATCH SUMMARY below to find a subset of interest to jump into.
> 
> The series is also posted on the 'preview' branch [1]. Note that branch
> rebases, the tip of that branch at time of posting is:
> 
> 7e5ad5cb1580 cxl/region: Introduce cxl_pmem_region objects
> 
> [1]: https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=preview

Via a W=1 build some docs are out of sync with parameter names.
I'm lazy so I'll leave finding the right patch to you ;)
drivers/cxl/core/region.c:1490: warning: Function parameter or member 'type' not described in 'devm_cxl_add_region'
drivers/cxl/core/region.c:1719: warning: Function parameter or member 'cxlr' not described in 'devm_cxl_add_pmem_region'
drivers/cxl/core/region.c:1719: warning: Excess function parameter 'host' description in 'devm_cxl_add_pmem_region'

whilst here, docs for generic_nvdimm_flush() need updating to reflect
generic getting added to the name in 2019...

Jonathan

^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH 00/46] CXL PMEM Region Provisioning
  2022-06-24 15:13 ` [PATCH 00/46] CXL PMEM Region Provisioning Jonathan Cameron
@ 2022-06-24 15:32   ` Dan Williams
  0 siblings, 0 replies; 157+ messages in thread
From: Dan Williams @ 2022-06-24 15:32 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, Ira Weiny, Christoph Hellwig, Jason Gunthorpe,
	Ben Widawsky, Alison Schofield, Matthew Wilcox, nvdimm,
	linux-pci, patches

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 19:45:00 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > tl;dr: 46 patches is way too many patches to review in one sitting. Jump
> > to the PATCH SUMMARY below to find a subset of interest to jump into.
> > 
> > The series is also posted on the 'preview' branch [1]. Note that branch
> > rebases, the tip of that branch at time of posting is:
> > 
> > 7e5ad5cb1580 cxl/region: Introduce cxl_pmem_region objects
> > 
> > [1]: https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=preview
> 
> Via a W=1 build some docs are out of sync with parameter names.
> I'm lazy so I'll leave finding the right patch to you ;)
> drivers/cxl/core/region.c:1490: warning: Function parameter or member 'type' not described in 'devm_cxl_add_region'

Added:

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index f2a0ead20ca7..f5ca4f811463 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -84,6 +84,7 @@ static struct cxl_region *cxl_region_alloc(struct cxl_root_decoder *cxlrd, int i
  * @cxlrd: root decoder
  * @id: memregion id to create
  * @mode: mode for the endpoint decoders of this region
+ * @type: select whether this is an expander or accelerator (type-2 or type-3)
  *
  * This is the second step of region initialization. Regions exist within an
  * address space which is mapped by a @cxlrd.

...to patch 34.

> drivers/cxl/core/region.c:1719: warning: Function parameter or member 'cxlr' not described in 'devm_cxl_add_pmem_region'
> drivers/cxl/core/region.c:1719: warning: Excess function parameter 'host' description in 'devm_cxl_add_pmem_region'

Added:

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 808148eef557..fa209fb649f7 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1711,8 +1711,8 @@ static void cxlr_pmem_unregister(void *dev)
 }
 
 /**
- * devm_cxl_add_pmem_region() - add a cxl_region to nd_region bridge
- * @host: same host as @cxlmd
+ * devm_cxl_add_pmem_region() - add a cxl_region-to-nd_region bridge
+ * @cxlr: parent CXL region for this pmem region bridge device
  *
  * Return: 0 on success negative error code on failure.
  */

...to patch 46.

> whilst here, docs for generic_nvdimm_flush() need updating to reflect
> generic getting added to the name in 2019...

Sure, but not in this series.
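
For reference, the W=1 warnings above come from kernel-doc comparing
the @name lines in a comment against the function's actual parameter
list: a parameter with no @name line is reported as "not described",
and an @name line with no matching parameter is reported as "Excess".
A small sketch of a comment that stays in sync follows; the function
example_region_size() and its parameters are hypothetical and exist
only to show the layout.

```c
#include <stdint.h>

/**
 * example_region_size() - compute the size of an interleaved region
 * @ways: number of interleave ways (hypothetical parameter)
 * @per_dev: bytes contributed by each device (hypothetical parameter)
 *
 * Every parameter in the prototype has exactly one matching @name line
 * above, and no @name line refers to a parameter that does not exist.
 *
 * Return: the total region size in bytes.
 */
static uint64_t example_region_size(unsigned int ways, uint64_t per_dev)
{
	return (uint64_t)ways * per_dev;
}
```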

^ permalink raw reply related	[flat|nested] 157+ messages in thread

* Re: [PATCH 40/46] cxl/region: Attach endpoint decoders
  2022-06-24  4:19 ` [PATCH 40/46] cxl/region: Attach endpoint decoders Dan Williams
@ 2022-06-24 18:25   ` Jonathan Cameron
  2022-06-24 18:49     ` Dan Williams
  2022-06-24 20:51     ` Dan Williams
  2022-06-30 16:34   ` Jonathan Cameron
  1 sibling, 2 replies; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-24 18:25 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, nvdimm, linux-pci, patches, hch, Ben Widawsky

On Thu, 23 Jun 2022 21:19:44 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> CXL regions (interleave sets) are made up of a set of memory devices
> where each device maps a portion of the interleave with one of its
> decoders (see CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure).
> As endpoint decoders are identified by a provisioning tool they can be
> added to a region provided the region interleave properties are set
> (way, granularity, HPA) and DPA has been assigned to the decoder.
> 
> The attach event triggers several validation checks, for example:
> - is the DPA sized appropriately for the region
> - is the decoder reachable via the host-bridges identified by the
>   region's root decoder
> - is the device already active in a different region position slot
> - are there already regions with a higher HPA active on a given port
>   (per CXL 2.0 8.2.5.12.20 Committing Decoder Programming)
> 
> ...and the attach event affords an opportunity to collect data and
> resources relevant to later programming the target lists in switch
> decoders, for example:
> - allocate a decoder at each cxl_port in the decode chain
> - for a given switch port, how many of the region's endpoints are hosted
>   through the port
> - how many unique targets (next hops) does a port need to map to reach
>   those endpoints
> 
> The act of reconciling this information and deploying it to the decoder
> configuration is saved for a follow-on patch.
Hi Dan,

Only managed to grab a few mins today to debug that crash, so I know
the immediate cause but not yet why we got to that state.

Test case (happened to be one I had open) is 2x HB, 2x RP on each,
direct connected type 3s on all ports.

Manual test script is:

insmod modules/5.19.0-rc3+/kernel/drivers/cxl/core/cxl_core.ko
insmod modules/5.19.0-rc3+/kernel/drivers/cxl/cxl_acpi.ko
insmod modules/5.19.0-rc3+/kernel/drivers/cxl/cxl_port.ko
insmod modules/5.19.0-rc3+/kernel/drivers/cxl/cxl_pci.ko
insmod modules/5.19.0-rc3+/kernel/drivers/cxl/cxl_mem.ko
insmod modules/5.19.0-rc3+/kernel/drivers/cxl/cxl_pmem.ko

cd /sys/bus/cxl/devices/decoder0.0/
cat create_pmem_region
echo region0 > create_pmem_region

cd region0/
echo 4 > interleave_ways
echo $((256 << 22)) > size
echo 6a6b9b22-e0d4-11ec-9d64-0242ac120002 > uuid
ls -lh /sys/bus/cxl/devices/endpoint?/upo*

# Then figure out the order and (hopefully) write the correct targets
echo decoder5.0 > target0
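
As a sanity check on the numbers in the script above: $((256 << 22)) is
1 GiB, so with interleave_ways set to 4 the size check in
cxl_region_attach() expects each endpoint decoder to contribute 256 MiB
of DPA. A small sketch of that arithmetic, using hypothetical helper
names rather than anything from the driver:

```c
#include <stdbool.h>
#include <stdint.h>

/* Region size written by the script: 256 << 22 bytes (2^30, i.e. 1 GiB) */
static uint64_t script_region_size(void)
{
	return (uint64_t)256 << 22;
}

/*
 * Same shape as the size check in cxl_region_attach(): the decoder's
 * DPA size multiplied by the interleave ways must equal the region size.
 */
static bool dpa_size_matches(uint64_t dpa_size, unsigned int ways,
			     uint64_t region_size)
{
	return dpa_size * ways == region_size;
}
```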

Location of crash below...
No idea if these breadcrumbs will be much use. I'll poke
it some more next week. Have a good weekend,

Jonathan


> 
> Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/core/core.h   |   7 +
>  drivers/cxl/core/port.c   |  10 +-
>  drivers/cxl/core/region.c | 338 +++++++++++++++++++++++++++++++++++++-
>  drivers/cxl/cxl.h         |  20 +++
>  drivers/cxl/cxlmem.h      |   5 +
>  5 files changed, 372 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 36b6bd8dac2b..0e4e5c2d9452 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -42,6 +42,13 @@ resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled);
>  resource_size_t cxl_dpa_resource(struct cxl_endpoint_decoder *cxled);
>  extern struct rw_semaphore cxl_dpa_rwsem;
>  
> +bool is_switch_decoder(struct device *dev);
> +static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port,
> +					 struct cxl_memdev *cxlmd)
> +{
> +	return xa_load(&port->endpoints, (unsigned long)&cxlmd->dev);
> +}
> +
>  int cxl_memdev_init(void);
>  void cxl_memdev_exit(void);
>  void cxl_mbox_init(void);
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 7756409d0a58..fde2a2e103d4 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -447,7 +447,7 @@ bool is_root_decoder(struct device *dev)
>  }
>  EXPORT_SYMBOL_NS_GPL(is_root_decoder, CXL);
>  
> -static bool is_switch_decoder(struct device *dev)
> +bool is_switch_decoder(struct device *dev)
>  {
>  	return is_root_decoder(dev) || dev->type == &cxl_decoder_switch_type;
>  }
> @@ -503,6 +503,7 @@ static void cxl_port_release(struct device *dev)
>  		cxl_ep_remove(port, ep);
>  	xa_destroy(&port->endpoints);
>  	xa_destroy(&port->dports);
> +	xa_destroy(&port->regions);
>  	ida_free(&cxl_port_ida, port->id);
>  	kfree(port);
>  }
> @@ -633,6 +634,7 @@ static struct cxl_port *cxl_port_alloc(struct device *uport,
>  	port->dpa_end = -1;
>  	xa_init(&port->dports);
>  	xa_init(&port->endpoints);
> +	xa_init(&port->regions);
>  
>  	device_initialize(dev);
>  	lockdep_set_class_and_subclass(&dev->mutex, &cxl_port_key, port->depth);
> @@ -1110,12 +1112,6 @@ static void reap_dports(struct cxl_port *port)
>  	}
>  }
>  
> -static struct cxl_ep *cxl_ep_load(struct cxl_port *port,
> -				  struct cxl_memdev *cxlmd)
> -{
> -	return xa_load(&port->endpoints, (unsigned long)&cxlmd->dev);
> -}
> -
>  int devm_cxl_add_endpoint(struct cxl_memdev *cxlmd,
>  			  struct cxl_dport *parent_dport)
>  {
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 4830365f3857..65bf84abad57 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -428,6 +428,254 @@ static size_t show_targetN(struct cxl_region *cxlr, char *buf, int pos)
>  	return rc;
>  }
>  
> +static int match_free_decoder(struct device *dev, void *data)
> +{
> +	struct cxl_decoder *cxld;
> +	int *id = data;
> +
> +	if (!is_switch_decoder(dev))
> +		return 0;
> +
> +	cxld = to_cxl_decoder(dev);
> +
> +	/* enforce ordered allocation */
> +	if (cxld->id != *id)
> +		return 0;
> +
> +	if (!cxld->region)
> +		return 1;
> +
> +	(*id)++;
> +
> +	return 0;
> +}
> +
> +static struct cxl_decoder *cxl_region_find_decoder(struct cxl_port *port,
> +						   struct cxl_region *cxlr)
> +{
> +	struct device *dev;
> +	int id = 0;
> +
> +	dev = device_find_child(&port->dev, &id, match_free_decoder);
> +	if (!dev)
> +		return NULL;
> +	/*
> +	 * This decoder is pinned registered as long as the endpoint decoder is
> +	 * registered, and endpoint decoder unregistration holds the
> +	 * cxl_region_rwsem over unregister events, so no need to hold on to
> +	 * this extra reference.
> +	 */
> +	put_device(dev);
> +	return to_cxl_decoder(dev);
> +}
> +
> +static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port,
> +					       struct cxl_region *cxlr)
> +{
> +	struct cxl_region_ref *cxl_rr;
> +
> +	cxl_rr = kzalloc(sizeof(*cxl_rr), GFP_KERNEL);
> +	if (!cxl_rr)
> +		return NULL;
> +	cxl_rr->port = port;
> +	cxl_rr->region = cxlr;
> +	xa_init(&cxl_rr->endpoints);
> +	return cxl_rr;
> +}
> +
> +static void free_region_ref(struct cxl_region_ref *cxl_rr)
> +{
> +	struct cxl_port *port = cxl_rr->port;
> +	struct cxl_region *cxlr = cxl_rr->region;
> +	struct cxl_decoder *cxld = cxl_rr->decoder;
> +
> +	dev_WARN_ONCE(&cxlr->dev, cxld->region != cxlr, "region mismatch\n");
> +	if (cxld->region == cxlr) {
> +		cxld->region = NULL;
> +		put_device(&cxlr->dev);
> +	}
> +
> +	xa_erase(&port->regions, (unsigned long)cxlr);
> +	xa_destroy(&cxl_rr->endpoints);
> +	kfree(cxl_rr);
> +}
> +
> +static int cxl_rr_add(struct cxl_region_ref *cxl_rr)
> +{
> +	struct cxl_port *port = cxl_rr->port;
> +	struct cxl_region *cxlr = cxl_rr->region;
> +
> +	return xa_insert(&port->regions, (unsigned long)cxlr, cxl_rr,
> +			 GFP_KERNEL);
> +}
> +
> +static int cxl_rr_ep_add(struct cxl_region_ref *cxl_rr,
> +			 struct cxl_endpoint_decoder *cxled)
> +{
> +	int rc;
> +	struct cxl_port *port = cxl_rr->port;
> +	struct cxl_region *cxlr = cxl_rr->region;
> +	struct cxl_decoder *cxld = cxl_rr->decoder;
> +	struct cxl_ep *ep = cxl_ep_load(port, cxled_to_memdev(cxled));
> +
> +	rc = xa_insert(&cxl_rr->endpoints, (unsigned long)cxled, ep,
> +			 GFP_KERNEL);
> +	if (rc)
> +		return rc;
> +	cxl_rr->nr_eps++;
> +
> +	if (!cxld->region) {
> +		cxld->region = cxlr;
> +		get_device(&cxlr->dev);
> +	}
> +
> +	return 0;
> +}
> +
> +static int cxl_port_attach_region(struct cxl_port *port,
> +				  struct cxl_region *cxlr,
> +				  struct cxl_endpoint_decoder *cxled, int pos)
> +{
> +	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> +	struct cxl_ep *ep = cxl_ep_load(port, cxlmd);
> +	struct cxl_region_ref *cxl_rr = NULL, *iter;
> +	struct cxl_region_params *p = &cxlr->params;
> +	struct cxl_decoder *cxld = NULL;
> +	unsigned long index;
> +	int rc = -EBUSY;
> +
> +	lockdep_assert_held_write(&cxl_region_rwsem);
> +
> +	xa_for_each(&port->regions, index, iter) {
> +		struct cxl_region_params *ip = &iter->region->params;
> +
> +		if (iter->region == cxlr)
> +			cxl_rr = iter;
> +		if (ip->res->start > p->res->start) {
> +			dev_dbg(&cxlr->dev,
> +				"%s: HPA order violation %s:%pr vs %pr\n",
> +				dev_name(&port->dev),
> +				dev_name(&iter->region->dev), ip->res, p->res);
> +			return -EBUSY;
> +		}
> +	}
> +
> +	if (cxl_rr) {
> +		struct cxl_ep *ep_iter;
> +		int found = 0;
> +
> +		cxld = cxl_rr->decoder;
> +		xa_for_each(&cxl_rr->endpoints, index, ep_iter) {
> +			if (ep_iter == ep)
> +				continue;
> +			if (ep_iter->next == ep->next) {
> +				found++;
> +				break;
> +			}
> +		}
> +
> +		/*
> +		 * If this is a new target or if this port is direct connected
> +		 * to this endpoint then add to the target count.
> +		 */
> +		if (!found || !ep->next)
> +			cxl_rr->nr_targets++;
> +	} else {
> +		cxl_rr = alloc_region_ref(port, cxlr);
> +		if (!cxl_rr) {
> +			dev_dbg(&cxlr->dev,
> +				"%s: failed to allocate region reference\n",
> +				dev_name(&port->dev));
> +			return -ENOMEM;
> +		}
> +		rc = cxl_rr_add(cxl_rr);
> +		if (rc) {
> +			dev_dbg(&cxlr->dev,
> +				"%s: failed to track region reference\n",
> +				dev_name(&port->dev));
> +			kfree(cxl_rr);
> +			return rc;
> +		}
> +	}
> +
> +	if (!cxld) {
> +		if (port == cxled_to_port(cxled))
> +			cxld = &cxled->cxld;
> +		else
> +			cxld = cxl_region_find_decoder(port, cxlr);
> +		if (!cxld) {
> +			dev_dbg(&cxlr->dev, "%s: no decoder available\n",
> +				dev_name(&port->dev));
> +			goto out_erase;
> +		}
> +
> +		if (cxld->region) {
> +			dev_dbg(&cxlr->dev, "%s: %s already attached to %s\n",
> +				dev_name(&port->dev), dev_name(&cxld->dev),
> +				dev_name(&cxld->region->dev));
> +			rc = -EBUSY;
> +			goto out_erase;
> +		}
> +
> +		cxl_rr->decoder = cxld;
> +	}
> +
> +	rc = cxl_rr_ep_add(cxl_rr, cxled);
> +	if (rc) {
> +		dev_dbg(&cxlr->dev,
> +			"%s: failed to track endpoint %s:%s reference\n",
> +			dev_name(&port->dev), dev_name(&cxlmd->dev),
> +			dev_name(&cxld->dev));
> +		goto out_erase;
> +	}
> +
> +	return 0;
> +out_erase:
> +	if (cxl_rr->nr_eps == 0)
> +		free_region_ref(cxl_rr);
> +	return rc;
> +}
> +
> +static struct cxl_region_ref *cxl_rr_load(struct cxl_port *port,
> +					  struct cxl_region *cxlr)
> +{
> +	return xa_load(&port->regions, (unsigned long)cxlr);
> +}
> +
> +static void cxl_port_detach_region(struct cxl_port *port,
> +				   struct cxl_region *cxlr,
> +				   struct cxl_endpoint_decoder *cxled)
> +{
> +	struct cxl_region_ref *cxl_rr;
> +	struct cxl_ep *ep;
> +
> +	lockdep_assert_held_write(&cxl_region_rwsem);
> +
> +	cxl_rr = cxl_rr_load(port, cxlr);
> +	if (!cxl_rr)
> +		return;
> +
> +	ep = xa_erase(&cxl_rr->endpoints, (unsigned long)cxled);
> +	if (ep) {
> +		struct cxl_ep *ep_iter;
> +		unsigned long index;
> +		int found = 0;
> +
> +		cxl_rr->nr_eps--;
> +		xa_for_each(&cxl_rr->endpoints, index, ep_iter) {
> +			if (ep_iter->next == ep->next) {
> +				found++;
> +				break;
> +			}
> +		}
> +		if (!found)
> +			cxl_rr->nr_targets--;
> +	}
> +
> +	if (cxl_rr->nr_eps == 0)
> +		free_region_ref(cxl_rr);
> +}
> +
>  /*
>   * - Check that the given endpoint is attached to a host-bridge identified
>   *   in the root interleave.
> @@ -435,14 +683,28 @@ static size_t show_targetN(struct cxl_region *cxlr, char *buf, int pos)
>  static int cxl_region_attach(struct cxl_region *cxlr,
>  			     struct cxl_endpoint_decoder *cxled, int pos)
>  {
> +	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
> +	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> +	struct cxl_port *ep_port, *root_port, *iter;
>  	struct cxl_region_params *p = &cxlr->params;
> +	struct cxl_dport *dport;
> +	int i, rc = -ENXIO;
>  
>  	if (cxled->mode == CXL_DECODER_DEAD) {
>  		dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev));
>  		return -ENODEV;
>  	}
>  
> -	if (pos >= p->interleave_ways) {
> +	/* all full of members, or interleave config not established? */
> +	if (p->state > CXL_CONFIG_INTERLEAVE_ACTIVE) {
> +		dev_dbg(&cxlr->dev, "region already active\n");
> +		return -EBUSY;
> +	} else if (p->state < CXL_CONFIG_INTERLEAVE_ACTIVE) {
> +		dev_dbg(&cxlr->dev, "interleave config missing\n");
> +		return -ENXIO;
> +	}
> +
> +	if (pos < 0 || pos >= p->interleave_ways) {
>  		dev_dbg(&cxlr->dev, "position %d out of range %d\n", pos,
>  			p->interleave_ways);
>  		return -ENXIO;
> @@ -461,15 +723,83 @@ static int cxl_region_attach(struct cxl_region *cxlr,
>  		return -EBUSY;
>  	}
>  
> +	for (i = 0; i < p->interleave_ways; i++) {
> +		struct cxl_endpoint_decoder *cxled_target;
> +		struct cxl_memdev *cxlmd_target;
> +
> +		cxled_target = p->targets[i];
> +		if (!cxled_target)
> +			continue;
> +
> +		cxlmd_target = cxled_to_memdev(cxled_target);
> +		if (cxlmd_target == cxlmd) {
> +			dev_dbg(&cxlr->dev,
> +				"%s already specified at position %d via: %s\n",
> +				dev_name(&cxlmd->dev), pos,
> +				dev_name(&cxled_target->cxld.dev));
> +			return -EBUSY;
> +		}
> +	}
> +
> +	ep_port = cxled_to_port(cxled);
> +	root_port = cxlrd_to_port(cxlrd);
> +	dport = cxl_dport_load(root_port, ep_port->host_bridge);
> +	if (!dport) {
> +		dev_dbg(&cxlr->dev, "%s:%s invalid target for %s\n",
> +			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
> +			dev_name(cxlr->dev.parent));
> +		return -ENXIO;
> +	}
> +
> +	if (cxlrd->calc_hb(cxlrd, pos) != dport) {
> +		dev_dbg(&cxlr->dev, "%s:%s invalid target position for %s\n",
> +			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
> +			dev_name(&cxlrd->cxlsd.cxld.dev));
> +		return -ENXIO;
> +	}
> +
> +	if (cxled->cxld.target_type != cxlr->type) {
> +		dev_dbg(&cxlr->dev, "%s:%s type mismatch: %d vs %d\n",
> +			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
> +			cxled->cxld.target_type, cxlr->type);
> +		return -ENXIO;
> +	}
> +
> +	if (resource_size(cxled->dpa_res) * p->interleave_ways !=

At this point cxled->dpa_res is NULL.

> +	    resource_size(p->res)) {
> +		dev_dbg(&cxlr->dev,
> +			"decoder-size-%#llx * ways-%d != region-size-%#llx\n",
> +			(u64)resource_size(cxled->dpa_res), p->interleave_ways,
> +			(u64)resource_size(p->res));
> +		return -EINVAL;
> +	}
> +
> +	for (iter = ep_port; !is_cxl_root(iter);
> +	     iter = to_cxl_port(iter->dev.parent)) {
> +		rc = cxl_port_attach_region(iter, cxlr, cxled, pos);
> +		if (rc)
> +			goto err;
> +	}
> +
>  	p->targets[pos] = cxled;
>  	cxled->pos = pos;
>  	p->nr_targets++;
>  
> +	if (p->nr_targets == p->interleave_ways)
> +		p->state = CXL_CONFIG_ACTIVE;
> +
>  	return 0;
> +
> +err:
> +	for (iter = ep_port; !is_cxl_root(iter);
> +	     iter = to_cxl_port(iter->dev.parent))
> +		cxl_port_detach_region(iter, cxlr, cxled);
> +	return rc;
>  }
>  
>  static void cxl_region_detach(struct cxl_endpoint_decoder *cxled)
>  {
> +	struct cxl_port *iter, *ep_port = cxled_to_port(cxled);
>  	struct cxl_region *cxlr = cxled->cxld.region;
>  	struct cxl_region_params *p;
>  
> @@ -481,6 +811,10 @@ static void cxl_region_detach(struct cxl_endpoint_decoder *cxled)
>  	p = &cxlr->params;
>  	get_device(&cxlr->dev);
>  
> +	for (iter = ep_port; !is_cxl_root(iter);
> +	     iter = to_cxl_port(iter->dev.parent))
> +		cxl_port_detach_region(iter, cxlr, cxled);
> +
>  	if (cxled->pos < 0 || cxled->pos >= p->interleave_ways ||
>  	    p->targets[cxled->pos] != cxled) {
>  		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> @@ -491,6 +825,8 @@ static void cxl_region_detach(struct cxl_endpoint_decoder *cxled)
>  		goto out;
>  	}
>  
> +	if (p->state == CXL_CONFIG_ACTIVE)
> +		p->state = CXL_CONFIG_INTERLEAVE_ACTIVE;
>  	p->targets[cxled->pos] = NULL;
>  	p->nr_targets--;
>  
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 30227348f768..09dbd46cc4c7 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -414,6 +414,7 @@ struct cxl_nvdimm {
>   * @id: id for port device-name
>   * @dports: cxl_dport instances referenced by decoders
>   * @endpoints: cxl_ep instances, endpoints that are a descendant of this port
> + * @regions: cxl_region_ref instances, regions mapped by this port
>   * @parent_dport: dport that points to this port in the parent
>   * @decoder_ida: allocator for decoder ids
>   * @dpa_end: cursor to track highest allocated decoder for allocation ordering
> @@ -428,6 +429,7 @@ struct cxl_port {
>  	int id;
>  	struct xarray dports;
>  	struct xarray endpoints;
> +	struct xarray regions;
>  	struct cxl_dport *parent_dport;
>  	struct ida decoder_ida;
>  	int dpa_end;
> @@ -470,6 +472,24 @@ struct cxl_ep {
>  	struct cxl_port *next;
>  };
>  
> +/**
> + * struct cxl_region_ref - track a region's interest in a port
> + * @port: point in topology to install this reference
> + * @decoder: decoder assigned for @region in @port
> + * @region: region for this reference
> + * @endpoints: cxl_ep references for region members beneath @port
> + * @nr_eps: number of endpoints beneath @port
> + * @nr_targets: number of distinct targets needed to reach @nr_eps
> + */
> +struct cxl_region_ref {
> +	struct cxl_port *port;
> +	struct cxl_decoder *decoder;
> +	struct cxl_region *region;
> +	struct xarray endpoints;
> +	int nr_eps;
> +	int nr_targets;
> +};
> +
>  /*
>   * The platform firmware device hosting the root is also the top of the
>   * CXL port topology. All other CXL ports have another CXL port as their
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index eee96016c3c7..a83bb6782d23 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -55,6 +55,11 @@ static inline struct cxl_port *cxled_to_port(struct cxl_endpoint_decoder *cxled)
>  	return to_cxl_port(cxled->cxld.dev.parent);
>  }
>  
> +static inline struct cxl_port *cxlrd_to_port(struct cxl_root_decoder *cxlrd)
> +{
> +	return to_cxl_port(cxlrd->cxlsd.cxld.dev.parent);
> +}
> +
>  static inline struct cxl_memdev *
>  cxled_to_memdev(struct cxl_endpoint_decoder *cxled)
>  {
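
The @nr_eps / @nr_targets accounting documented for struct cxl_region_ref
can be illustrated with a small userspace model. This is only a sketch of
the idea, not the kernel code: `struct ep_model` and `count_targets()` are
hypothetical names, and the model assumes that endpoints sharing a next
hop share one target while each directly attached endpoint (NULL next
hop) needs its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cxl_ep: only the next hop matters here. */
struct ep_model {
	const void *next;	/* next port toward the endpoint, NULL if direct attached */
};

/*
 * Count the distinct targets a port needs in order to reach all
 * @nr_eps endpoints: one per unique next hop, plus one per direct
 * attached endpoint.
 */
static int count_targets(const struct ep_model *eps, int nr_eps)
{
	int i, j, nr_targets = 0;

	assert(nr_eps >= 0);
	for (i = 0; i < nr_eps; i++) {
		int seen = 0;

		/* a direct attached endpoint is always its own target */
		if (eps[i].next) {
			for (j = 0; j < i; j++) {
				if (eps[j].next == eps[i].next) {
					seen = 1;
					break;
				}
			}
		}
		if (!seen)
			nr_targets++;
	}
	return nr_targets;
}
```

With two endpoints behind one switch, one behind another, and two direct
attached, the model reports four targets for five endpoints.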



* Re: [PATCH 40/46] cxl/region: Attach endpoint decoders
  2022-06-24 18:25   ` Jonathan Cameron
@ 2022-06-24 18:49     ` Dan Williams
  2022-06-24 20:51     ` Dan Williams
  1 sibling, 0 replies; 157+ messages in thread
From: Dan Williams @ 2022-06-24 18:49 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, nvdimm, linux-pci, patches, hch, Ben Widawsky

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 21:19:44 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > CXL regions (interleave sets) are made up of a set of memory devices
> > where each device maps a portion of the interleave with one of its
> > decoders (see CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure).
> > As endpoint decoders are identified by a provisioning tool they can be
> > added to a region provided the region interleave properties are set
> > (way, granularity, HPA) and DPA has been assigned to the decoder.
> > 
> > The attach event triggers several validation checks, for example:
> > - is the DPA sized appropriately for the region
> > - is the decoder reachable via the host-bridges identified by the
> >   region's root decoder
> > - is the device already active in a different region position slot
> > - are there already regions with a higher HPA active on a given port
> >   (per CXL 2.0 8.2.5.12.20 Committing Decoder Programming)
> > 
> > ...and the attach event affords an opportunity to collect data and
> > resources relevant to later programming the target lists in switch
> > decoders, for example:
> > - allocate a decoder at each cxl_port in the decode chain
> > - for a given switch port, how many of the region's endpoints are hosted
> >   through the port
> > - how many unique targets (next hops) does a port need to map to reach
> >   those endpoints
> > 
> > The act of reconciling this information and deploying it to the decoder
> > configuration is saved for a follow-on patch.
> Hi Dan,
> 
> Only managed to grab a few mins today to debug that crash.. So I know
> the immediate cause but not yet why we got to that state.
> 
> Test case (happened to be one I had open) is 2x HB, 2x RP on each,
> direct connected type 3s on all ports.

Can you send along the QEMU startup script for this config as well?

> Manual test script is:
> 
> insmod modules/5.19.0-rc3+/kernel/drivers/cxl/core/cxl_core.ko
> insmod modules/5.19.0-rc3+/kernel/drivers/cxl/cxl_acpi.ko
> insmod modules/5.19.0-rc3+/kernel/drivers/cxl/cxl_port.ko
> insmod modules/5.19.0-rc3+/kernel/drivers/cxl/cxl_pci.ko
> insmod modules/5.19.0-rc3+/kernel/drivers/cxl/cxl_mem.ko
> insmod modules/5.19.0-rc3+/kernel/drivers/cxl/cxl_pmem.ko

Yikes, nothing good ever came from insmod; modprobe with automatic module
dependency handling is the way to go.

> 
> cd /sys/bus/cxl/devices/decoder0.0/
> cat create_pmem_region
> echo region0 > create_pmem_region
> 
> cd region0/
> echo 4 > interleave_ways
> echo $((256 << 22)) > size
> echo 6a6b9b22-e0d4-11ec-9d64-0242ac120002 > uuid
> ls -lh /sys/bus/cxl/devices/endpoint?/upo*
> 
> # Then figure out the order and, hopefully, write the correct targets
> echo decoder5.0 > target0
> 
> Location of crash below...
> No idea if these breadcrumbs will be much use. I'll poke
> it some more next week. Have a good weekend,
> 
> Jonathan
> 
> 
> > 
> > Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
> > Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  drivers/cxl/core/core.h   |   7 +
> >  drivers/cxl/core/port.c   |  10 +-
> >  drivers/cxl/core/region.c | 338 +++++++++++++++++++++++++++++++++++++-
> >  drivers/cxl/cxl.h         |  20 +++
> >  drivers/cxl/cxlmem.h      |   5 +
> >  5 files changed, 372 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> > index 36b6bd8dac2b..0e4e5c2d9452 100644
> > --- a/drivers/cxl/core/core.h
> > +++ b/drivers/cxl/core/core.h
> > @@ -42,6 +42,13 @@ resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled);
> >  resource_size_t cxl_dpa_resource(struct cxl_endpoint_decoder *cxled);
> >  extern struct rw_semaphore cxl_dpa_rwsem;
> >  
> > +bool is_switch_decoder(struct device *dev);
> > +static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port,
> > +					 struct cxl_memdev *cxlmd)
> > +{
> > +	return xa_load(&port->endpoints, (unsigned long)&cxlmd->dev);
> > +}
> > +
> >  int cxl_memdev_init(void);
> >  void cxl_memdev_exit(void);
> >  void cxl_mbox_init(void);
> > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > index 7756409d0a58..fde2a2e103d4 100644
> > --- a/drivers/cxl/core/port.c
> > +++ b/drivers/cxl/core/port.c
> > @@ -447,7 +447,7 @@ bool is_root_decoder(struct device *dev)
> >  }
> >  EXPORT_SYMBOL_NS_GPL(is_root_decoder, CXL);
> >  
> > -static bool is_switch_decoder(struct device *dev)
> > +bool is_switch_decoder(struct device *dev)
> >  {
> >  	return is_root_decoder(dev) || dev->type == &cxl_decoder_switch_type;
> >  }
> > @@ -503,6 +503,7 @@ static void cxl_port_release(struct device *dev)
> >  		cxl_ep_remove(port, ep);
> >  	xa_destroy(&port->endpoints);
> >  	xa_destroy(&port->dports);
> > +	xa_destroy(&port->regions);
> >  	ida_free(&cxl_port_ida, port->id);
> >  	kfree(port);
> >  }
> > @@ -633,6 +634,7 @@ static struct cxl_port *cxl_port_alloc(struct device *uport,
> >  	port->dpa_end = -1;
> >  	xa_init(&port->dports);
> >  	xa_init(&port->endpoints);
> > +	xa_init(&port->regions);
> >  
> >  	device_initialize(dev);
> >  	lockdep_set_class_and_subclass(&dev->mutex, &cxl_port_key, port->depth);
> > @@ -1110,12 +1112,6 @@ static void reap_dports(struct cxl_port *port)
> >  	}
> >  }
> >  
> > -static struct cxl_ep *cxl_ep_load(struct cxl_port *port,
> > -				  struct cxl_memdev *cxlmd)
> > -{
> > -	return xa_load(&port->endpoints, (unsigned long)&cxlmd->dev);
> > -}
> > -
> >  int devm_cxl_add_endpoint(struct cxl_memdev *cxlmd,
> >  			  struct cxl_dport *parent_dport)
> >  {
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > index 4830365f3857..65bf84abad57 100644
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -428,6 +428,254 @@ static size_t show_targetN(struct cxl_region *cxlr, char *buf, int pos)
> >  	return rc;
> >  }
> >  
> > +static int match_free_decoder(struct device *dev, void *data)
> > +{
> > +	struct cxl_decoder *cxld;
> > +	int *id = data;
> > +
> > +	if (!is_switch_decoder(dev))
> > +		return 0;
> > +
> > +	cxld = to_cxl_decoder(dev);
> > +
> > +	/* enforce ordered allocation */
> > +	if (cxld->id != *id)
> > +		return 0;
> > +
> > +	if (!cxld->region)
> > +		return 1;
> > +
> > +	(*id)++;
> > +
> > +	return 0;
> > +}
> > +
> > +static struct cxl_decoder *cxl_region_find_decoder(struct cxl_port *port,
> > +						   struct cxl_region *cxlr)
> > +{
> > +	struct device *dev;
> > +	int id = 0;
> > +
> > +	dev = device_find_child(&port->dev, &id, match_free_decoder);
> > +	if (!dev)
> > +		return NULL;
> > +	/*
> > +	 * This decoder is pinned registered as long as the endpoint decoder is
> > +	 * registered, and endpoint decoder unregistration holds the
> > +	 * cxl_region_rwsem over unregister events, so no need to hold on to
> > +	 * this extra reference.
> > +	 */
> > +	put_device(dev);
> > +	return to_cxl_decoder(dev);
> > +}
> > +
> > +static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port,
> > +					       struct cxl_region *cxlr)
> > +{
> > +	struct cxl_region_ref *cxl_rr;
> > +
> > +	cxl_rr = kzalloc(sizeof(*cxl_rr), GFP_KERNEL);
> > +	if (!cxl_rr)
> > +		return NULL;
> > +	cxl_rr->port = port;
> > +	cxl_rr->region = cxlr;
> > +	xa_init(&cxl_rr->endpoints);
> > +	return cxl_rr;
> > +}
> > +
> > +static void free_region_ref(struct cxl_region_ref *cxl_rr)
> > +{
> > +	struct cxl_port *port = cxl_rr->port;
> > +	struct cxl_region *cxlr = cxl_rr->region;
> > +	struct cxl_decoder *cxld = cxl_rr->decoder;
> > +
> > +	dev_WARN_ONCE(&cxlr->dev, cxld->region != cxlr, "region mismatch\n");
> > +	if (cxld->region == cxlr) {
> > +		cxld->region = NULL;
> > +		put_device(&cxlr->dev);
> > +	}
> > +
> > +	xa_erase(&port->regions, (unsigned long)cxlr);
> > +	xa_destroy(&cxl_rr->endpoints);
> > +	kfree(cxl_rr);
> > +}
> > +
> > +static int cxl_rr_add(struct cxl_region_ref *cxl_rr)
> > +{
> > +	struct cxl_port *port = cxl_rr->port;
> > +	struct cxl_region *cxlr = cxl_rr->region;
> > +
> > +	return xa_insert(&port->regions, (unsigned long)cxlr, cxl_rr,
> > +			 GFP_KERNEL);
> > +}
> > +
> > +static int cxl_rr_ep_add(struct cxl_region_ref *cxl_rr,
> > +			 struct cxl_endpoint_decoder *cxled)
> > +{
> > +	int rc;
> > +	struct cxl_port *port = cxl_rr->port;
> > +	struct cxl_region *cxlr = cxl_rr->region;
> > +	struct cxl_decoder *cxld = cxl_rr->decoder;
> > +	struct cxl_ep *ep = cxl_ep_load(port, cxled_to_memdev(cxled));
> > +
> > +	rc = xa_insert(&cxl_rr->endpoints, (unsigned long)cxled, ep,
> > +			 GFP_KERNEL);
> > +	if (rc)
> > +		return rc;
> > +	cxl_rr->nr_eps++;
> > +
> > +	if (!cxld->region) {
> > +		cxld->region = cxlr;
> > +		get_device(&cxlr->dev);
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int cxl_port_attach_region(struct cxl_port *port,
> > +				  struct cxl_region *cxlr,
> > +				  struct cxl_endpoint_decoder *cxled, int pos)
> > +{
> > +	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> > +	struct cxl_ep *ep = cxl_ep_load(port, cxlmd);
> > +	struct cxl_region_ref *cxl_rr = NULL, *iter;
> > +	struct cxl_region_params *p = &cxlr->params;
> > +	struct cxl_decoder *cxld = NULL;
> > +	unsigned long index;
> > +	int rc = -EBUSY;
> > +
> > +	lockdep_assert_held_write(&cxl_region_rwsem);
> > +
> > +	xa_for_each(&port->regions, index, iter) {
> > +		struct cxl_region_params *ip = &iter->region->params;
> > +
> > +		if (iter->region == cxlr)
> > +			cxl_rr = iter;
> > +		if (ip->res->start > p->res->start) {
> > +			dev_dbg(&cxlr->dev,
> > +				"%s: HPA order violation %s:%pr vs %pr\n",
> > +				dev_name(&port->dev),
> > +				dev_name(&iter->region->dev), ip->res, p->res);
> > +			return -EBUSY;
> > +		}
> > +	}
> > +
> > +	if (cxl_rr) {
> > +		struct cxl_ep *ep_iter;
> > +		int found = 0;
> > +
> > +		cxld = cxl_rr->decoder;
> > +		xa_for_each(&cxl_rr->endpoints, index, ep_iter) {
> > +			if (ep_iter == ep)
> > +				continue;
> > +			if (ep_iter->next == ep->next) {
> > +				found++;
> > +				break;
> > +			}
> > +		}
> > +
> > +		/*
> > +		 * If this is a new target or if this port is direct connected
> > +		 * to this endpoint then add to the target count.
> > +		 */
> > +		if (!found || !ep->next)
> > +			cxl_rr->nr_targets++;
> > +	} else {
> > +		cxl_rr = alloc_region_ref(port, cxlr);
> > +		if (!cxl_rr) {
> > +			dev_dbg(&cxlr->dev,
> > +				"%s: failed to allocate region reference\n",
> > +				dev_name(&port->dev));
> > +			return -ENOMEM;
> > +		}
> > +		rc = cxl_rr_add(cxl_rr);
> > +		if (rc) {
> > +			dev_dbg(&cxlr->dev,
> > +				"%s: failed to track region reference\n",
> > +				dev_name(&port->dev));
> > +			kfree(cxl_rr);
> > +			return rc;
> > +		}
> > +	}
> > +
> > +	if (!cxld) {
> > +		if (port == cxled_to_port(cxled))
> > +			cxld = &cxled->cxld;
> > +		else
> > +			cxld = cxl_region_find_decoder(port, cxlr);
> > +		if (!cxld) {
> > +			dev_dbg(&cxlr->dev, "%s: no decoder available\n",
> > +				dev_name(&port->dev));
> > +			goto out_erase;
> > +		}
> > +
> > +		if (cxld->region) {
> > +			dev_dbg(&cxlr->dev, "%s: %s already attached to %s\n",
> > +				dev_name(&port->dev), dev_name(&cxld->dev),
> > +				dev_name(&cxld->region->dev));
> > +			rc = -EBUSY;
> > +			goto out_erase;
> > +		}
> > +
> > +		cxl_rr->decoder = cxld;
> > +	}
> > +
> > +	rc = cxl_rr_ep_add(cxl_rr, cxled);
> > +	if (rc) {
> > +		dev_dbg(&cxlr->dev,
> > +			"%s: failed to track endpoint %s:%s reference\n",
> > +			dev_name(&port->dev), dev_name(&cxlmd->dev),
> > +			dev_name(&cxld->dev));
> > +		goto out_erase;
> > +	}
> > +
> > +	return 0;
> > +out_erase:
> > +	if (cxl_rr->nr_eps == 0)
> > +		free_region_ref(cxl_rr);
> > +	return rc;
> > +}
> > +
> > +static struct cxl_region_ref *cxl_rr_load(struct cxl_port *port,
> > +					  struct cxl_region *cxlr)
> > +{
> > +	return xa_load(&port->regions, (unsigned long)cxlr);
> > +}
> > +
> > +static void cxl_port_detach_region(struct cxl_port *port,
> > +				   struct cxl_region *cxlr,
> > +				   struct cxl_endpoint_decoder *cxled)
> > +{
> > +	struct cxl_region_ref *cxl_rr;
> > +	struct cxl_ep *ep;
> > +
> > +	lockdep_assert_held_write(&cxl_region_rwsem);
> > +
> > +	cxl_rr = cxl_rr_load(port, cxlr);
> > +	if (!cxl_rr)
> > +		return;
> > +
> > +	ep = xa_erase(&cxl_rr->endpoints, (unsigned long)cxled);
> > +	if (ep) {
> > +		struct cxl_ep *ep_iter;
> > +		unsigned long index;
> > +		int found = 0;
> > +
> > +		cxl_rr->nr_eps--;
> > +		xa_for_each(&cxl_rr->endpoints, index, ep_iter) {
> > +			if (ep_iter->next == ep->next) {
> > +				found++;
> > +				break;
> > +			}
> > +		}
> > +		if (!found)
> > +			cxl_rr->nr_targets--;
> > +	}
> > +
> > +	if (cxl_rr->nr_eps == 0)
> > +		free_region_ref(cxl_rr);
> > +}
> > +
> >  /*
> >   * - Check that the given endpoint is attached to a host-bridge identified
> >   *   in the root interleave.
> > @@ -435,14 +683,28 @@ static size_t show_targetN(struct cxl_region *cxlr, char *buf, int pos)
> >  static int cxl_region_attach(struct cxl_region *cxlr,
> >  			     struct cxl_endpoint_decoder *cxled, int pos)
> >  {
> > +	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
> > +	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> > +	struct cxl_port *ep_port, *root_port, *iter;
> >  	struct cxl_region_params *p = &cxlr->params;
> > +	struct cxl_dport *dport;
> > +	int i, rc = -ENXIO;
> >  
> >  	if (cxled->mode == CXL_DECODER_DEAD) {
> >  		dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev));
> >  		return -ENODEV;
> >  	}
> >  
> > -	if (pos >= p->interleave_ways) {
> > +	/* all full of members, or interleave config not established? */
> > +	if (p->state > CXL_CONFIG_INTERLEAVE_ACTIVE) {
> > +		dev_dbg(&cxlr->dev, "region already active\n");
> > +		return -EBUSY;
> > +	} else if (p->state < CXL_CONFIG_INTERLEAVE_ACTIVE) {
> > +		dev_dbg(&cxlr->dev, "interleave config missing\n");
> > +		return -ENXIO;
> > +	}
> > +
> > +	if (pos < 0 || pos >= p->interleave_ways) {
> >  		dev_dbg(&cxlr->dev, "position %d out of range %d\n", pos,
> >  			p->interleave_ways);
> >  		return -ENXIO;
> > @@ -461,15 +723,83 @@ static int cxl_region_attach(struct cxl_region *cxlr,
> >  		return -EBUSY;
> >  	}
> >  
> > +	for (i = 0; i < p->interleave_ways; i++) {
> > +		struct cxl_endpoint_decoder *cxled_target;
> > +		struct cxl_memdev *cxlmd_target;
> > +
> > +		cxled_target = p->targets[i];
> > +		if (!cxled_target)
> > +			continue;
> > +
> > +		cxlmd_target = cxled_to_memdev(cxled_target);
> > +		if (cxlmd_target == cxlmd) {
> > +			dev_dbg(&cxlr->dev,
> > +				"%s already specified at position %d via: %s\n",
> > +				dev_name(&cxlmd->dev), pos,
> > +				dev_name(&cxled_target->cxld.dev));
> > +			return -EBUSY;
> > +		}
> > +	}
> > +
> > +	ep_port = cxled_to_port(cxled);
> > +	root_port = cxlrd_to_port(cxlrd);
> > +	dport = cxl_dport_load(root_port, ep_port->host_bridge);
> > +	if (!dport) {
> > +		dev_dbg(&cxlr->dev, "%s:%s invalid target for %s\n",
> > +			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
> > +			dev_name(cxlr->dev.parent));
> > +		return -ENXIO;
> > +	}
> > +
> > +	if (cxlrd->calc_hb(cxlrd, pos) != dport) {
> > +		dev_dbg(&cxlr->dev, "%s:%s invalid target position for %s\n",
> > +			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
> > +			dev_name(&cxlrd->cxlsd.cxld.dev));
> > +		return -ENXIO;
> > +	}
> > +
> > +	if (cxled->cxld.target_type != cxlr->type) {
> > +		dev_dbg(&cxlr->dev, "%s:%s type mismatch: %d vs %d\n",
> > +			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
> > +			cxled->cxld.target_type, cxlr->type);
> > +		return -ENXIO;
> > +	}
> > +
> > +	if (resource_size(cxled->dpa_res) * p->interleave_ways !=
> 
> At this point cxled->dpa_res is NULL.

Will take a look, thanks.
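
As an aside for readers, the "enforce ordered allocation" rule in
match_free_decoder() above can be modeled in userspace roughly as
follows. This is a sketch under assumptions, not the driver code:
`struct decoder_model` and `find_free_decoder()` are hypothetical names,
and the array is assumed to be sorted by ascending id starting at 0,
standing in for the order in which device_find_child() visits children.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a switch decoder: an id and whether a region owns it. */
struct decoder_model {
	int id;
	bool busy;
};

/*
 * Return the lowest-id free decoder, or NULL when every decoder is
 * busy. A decoder is only considered once all lower ids were seen
 * busy, which keeps allocation in id order.
 */
static struct decoder_model *find_free_decoder(struct decoder_model *decoders,
					       size_t nr)
{
	int id = 0;
	size_t i;

	assert(decoders != NULL || nr == 0);
	for (i = 0; i < nr; i++) {
		/* enforce ordered allocation */
		if (decoders[i].id != id)
			continue;
		if (!decoders[i].busy)
			return &decoders[i];
		id++;
	}
	return NULL;
}
```

With decoders 0 and 1 busy the model hands out decoder 2, and it returns
NULL once all of them are in use.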


* Re: [PATCH 40/46] cxl/region: Attach endpoint decoders
  2022-06-24 18:25   ` Jonathan Cameron
  2022-06-24 18:49     ` Dan Williams
@ 2022-06-24 20:51     ` Dan Williams
  2022-06-24 23:21       ` Dan Williams
  1 sibling, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-06-24 20:51 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, nvdimm, linux-pci, patches, hch, Ben Widawsky

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 21:19:44 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > CXL regions (interleave sets) are made up of a set of memory devices
> > where each device maps a portion of the interleave with one of its
> > decoders (see CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure).
> > As endpoint decoders are identified by a provisioning tool they can be
> > added to a region provided the region interleave properties are set
> > (way, granularity, HPA) and DPA has been assigned to the decoder.
> > 
> > The attach event triggers several validation checks, for example:
> > - is the DPA sized appropriately for the region
> > - is the decoder reachable via the host-bridges identified by the
> >   region's root decoder
> > - is the device already active in a different region position slot
> > - are there already regions with a higher HPA active on a given port
> >   (per CXL 2.0 8.2.5.12.20 Committing Decoder Programming)
> > 
> > ...and the attach event affords an opportunity to collect data and
> > resources relevant to later programming the target lists in switch
> > decoders, for example:
> > - allocate a decoder at each cxl_port in the decode chain
> > - for a given switch port, how many of the region's endpoints are hosted
> >   through the port
> > - how many unique targets (next hops) does a port need to map to reach
> >   those endpoints
> > 
> > The act of reconciling this information and deploying it to the decoder
> > configuration is saved for a follow-on patch.
> Hi Dan,
> 
> Only managed to grab a few mins today to debug that crash.. So I know
> the immediate cause but not yet why we got to that state.
> 
> Test case (happened to be one I had open) is 2x HB, 2x RP on each,
> direct connected type 3s on all ports.
> 
> Manual test script is:
> 
> insmod modules/5.19.0-rc3+/kernel/drivers/cxl/core/cxl_core.ko
> insmod modules/5.19.0-rc3+/kernel/drivers/cxl/cxl_acpi.ko
> insmod modules/5.19.0-rc3+/kernel/drivers/cxl/cxl_port.ko
> insmod modules/5.19.0-rc3+/kernel/drivers/cxl/cxl_pci.ko
> insmod modules/5.19.0-rc3+/kernel/drivers/cxl/cxl_mem.ko
> insmod modules/5.19.0-rc3+/kernel/drivers/cxl/cxl_pmem.ko
> 
> cd /sys/bus/cxl/devices/decoder0.0/
> cat create_pmem_region
> echo region0 > create_pmem_region
> 
> cd region0/
> echo 4 > interleave_ways
> echo $((256 << 22)) > size
> echo 6a6b9b22-e0d4-11ec-9d64-0242ac120002 > uuid
> ls -lh /sys/bus/cxl/devices/endpoint?/upo*
> 
> # Then figure out the order and, hopefully, write the correct targets
> echo decoder5.0 > target0

Oh, something simple in the end. Just need to check that DPA is assigned
before region attach. I folded the following into patch 40:

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 0b5acabcc541..d52c97e941fe 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -765,10 +765,17 @@ static int cxl_region_attach(struct cxl_region *cxlr,
                return -ENXIO;
        }
 
+       if (!cxled->dpa_res) {
+               dev_dbg(&cxlr->dev, "%s:%s: missing DPA allocation.\n",
+                       dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev));
+               return -ENXIO;
+       }
+
        if (resource_size(cxled->dpa_res) * p->interleave_ways !=
            resource_size(p->res)) {
                dev_dbg(&cxlr->dev,
-                       "decoder-size-%#llx * ways-%d != region-size-%#llx\n",
+                       "%s:%s: decoder-size-%#llx * ways-%d != region-size-%#llx\n",
+                       dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
                        (u64)resource_size(cxled->dpa_res), p->interleave_ways,
                        (u64)resource_size(p->res));
                return -EINVAL;
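
To make the arithmetic behind that check concrete, here is a minimal
userspace sketch of the two conditions, assuming the same error codes as
the hunk above. `check_dpa_size()` is a hypothetical helper, and a NULL
@dpa_size stands in for a decoder with no DPA allocation. For the test
script in this thread, size is 256 << 22 bytes (1 GiB) with 4 ways, so
each endpoint decoder needs a 256 MiB DPA allocation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Model of the attach-time size validation: a missing DPA allocation
 * is rejected first, then decoder size * ways must equal region size.
 */
static int check_dpa_size(const uint64_t *dpa_size, int ways,
			  uint64_t region_size)
{
	assert(ways > 0);

	/* no DPA allocated yet: reject instead of dereferencing */
	if (!dpa_size)
		return -ENXIO;

	if (*dpa_size * (uint64_t)ways != region_size)
		return -EINVAL;

	return 0;
}
```

In this model a 256 MiB allocation passes for the 1 GiB, 4-way region,
a 512 MiB allocation fails with -EINVAL, and a missing allocation fails
with -ENXIO.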


* Re: [PATCH 40/46] cxl/region: Attach endpoint decoders
  2022-06-24 20:51     ` Dan Williams
@ 2022-06-24 23:21       ` Dan Williams
  0 siblings, 0 replies; 157+ messages in thread
From: Dan Williams @ 2022-06-24 23:21 UTC (permalink / raw)
  To: Dan Williams, Jonathan Cameron
  Cc: linux-cxl, nvdimm, linux-pci, patches, hch, Ben Widawsky

Dan Williams wrote:
> Jonathan Cameron wrote:
> > On Thu, 23 Jun 2022 21:19:44 -0700
> > Dan Williams <dan.j.williams@intel.com> wrote:
> > 
> > > CXL regions (interleave sets) are made up of a set of memory devices
> > > where each device maps a portion of the interleave with one of its
> > > decoders (see CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure).
> > > As endpoint decoders are identified by a provisioning tool they can be
> > > added to a region provided the region interleave properties are set
> > > (way, granularity, HPA) and DPA has been assigned to the decoder.
> > > 
> > > The attach event triggers several validation checks, for example:
> > > - is the DPA sized appropriately for the region
> > > - is the decoder reachable via the host-bridges identified by the
> > >   region's root decoder
> > > - is the device already active in a different region position slot
> > > - are there already regions with a higher HPA active on a given port
> > >   (per CXL 2.0 8.2.5.12.20 Committing Decoder Programming)
> > > 
> > > ...and the attach event affords an opportunity to collect data and
> > > resources relevant to later programming the target lists in switch
> > > decoders, for example:
> > > - allocate a decoder at each cxl_port in the decode chain
> > > - for a given switch port, how many of the region's endpoints are hosted
> > >   through the port
> > > - how many unique targets (next hops) does a port need to map to reach
> > >   those endpoints
> > > 
> > > The act of reconciling this information and deploying it to the decoder
> > > configuration is saved for a follow-on patch.
> > Hi Dan,
> > 
> > Only managed to grab a few mins today to debug that crash.. So I know
> > the immediate cause but not yet why we got to that state.
> > 
> > Test case (happened to be one I had open) is 2x HB, 2x RP on each,
> > direct connected type 3s on all ports.
> > 
> > Manual test script is:
> > 
> > insmod modules/5.19.0-rc3+/kernel/drivers/cxl/core/cxl_core.ko
> > insmod modules/5.19.0-rc3+/kernel/drivers/cxl/cxl_acpi.ko
> > insmod modules/5.19.0-rc3+/kernel/drivers/cxl/cxl_port.ko
> > insmod modules/5.19.0-rc3+/kernel/drivers/cxl/cxl_pci.ko
> > insmod modules/5.19.0-rc3+/kernel/drivers/cxl/cxl_mem.ko
> > insmod modules/5.19.0-rc3+/kernel/drivers/cxl/cxl_pmem.ko
> > 
> > cd /sys/bus/cxl/devices/decoder0.0/
> > cat create_pmem_region
> > echo region0 > create_pmem_region
> > 
> > cd region0/
> > echo 4 > interleave_ways
> > echo $((256 << 22)) > size
> > echo 6a6b9b22-e0d4-11ec-9d64-0242ac120002 > uuid
> > ls -lh /sys/bus/cxl/devices/endpoint?/upo*
> > 
> > # Then figure out the order and, hopefully, write the correct targets
> > echo decoder5.0 > target0
> 
> Oh, something simple in the end. Just need to check that DPA is assigned
> before region attach. I folded the following into patch 40:

BTW, as I'm finding these things I'm force pushing the preview branch
with the updates, so this one is fixed as of current HEAD at:

1985cf588505 cxl/region: Introduce cxl_pmem_region objects


* Re: [PATCH 00/46] CXL PMEM Region Provisioning
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (46 preceding siblings ...)
  2022-06-24 15:13 ` [PATCH 00/46] CXL PMEM Region Provisioning Jonathan Cameron
@ 2022-06-28  3:12 ` Alison Schofield
  2022-06-28  3:34   ` Dan Williams
  2022-07-02  2:26 ` Alison Schofield
  48 siblings, 1 reply; 157+ messages in thread
From: Alison Schofield @ 2022-06-28  3:12 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, Weiny, Ira, Christoph Hellwig, Jason Gunthorpe,
	Ben Widawsky, Matthew Wilcox, nvdimm, linux-pci, patches


-snipped everything

These are commit message typos followed by one tidy-up request.

[PATCH 00/46] CXL PMEM Region Provisioning
s/usersapce/userspace
s/mangage/manage

[PATCH 09/46] cxl/acpi: Track CXL resources in iomem_resource
s/accurracy/accuracy

[PATCH 11/46] cxl/core: Define a 'struct cxl_endpoint_decoder' for tracking DPA resources
s/platfom/platform

[PATCH 14/46] cxl/hdm: Enumerate allocated DPA
s/provisioining/provisioning
s/comrpised/comprised
s/volaltile-ram/volatile-ram

[PATCH 23/46] tools/testing/cxl: Add partition support
s/mecahinisms/mechanisms

[PATCH 25/46] cxl/port: Record dport in endpoint references
s/endoint/endpoint

[PATCH 30/46] cxl/hdm: Add sysfs attributes for interleave ways + granularity
s/userpace/userspace
s/resonsible/responsible

[PATCH 35/46] cxl/region: Add a 'uuid' attribute
s/is operation/its operation

[PATCH 42/46] cxl/hdm: Commit decoder state to hardware
s/base-addres/base-address
s/intereleave/interleave

How about shortening the commit subject lines of patches 10 and 11? They
make my one-line git log output ugly.




* Re: [PATCH 15/46] cxl/Documentation: List attribute permissions
  2022-06-24  2:46 ` [PATCH 15/46] cxl/Documentation: List attribute permissions Dan Williams
@ 2022-06-28  3:16   ` Alison Schofield
  2022-06-29 14:59   ` Jonathan Cameron
  1 sibling, 0 replies; 157+ messages in thread
From: Alison Schofield @ 2022-06-28  3:16 UTC (permalink / raw)
  To: Williams, Dan J; +Cc: linux-cxl, hch, nvdimm, linux-pci, patches

On Thu, Jun 23, 2022 at 07:46:52PM -0700, Dan Williams wrote:
> Clarify the access permission of CXL sysfs attributes in the
> documentation to help development of userspace tooling.
> 
> Reported-by: Alison Schofield <alison.schofield@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---

Reviewed-by: Alison Schofield <alison.schofield@intel.com>


>  Documentation/ABI/testing/sysfs-bus-cxl |   81 ++++++++++++++++---------------
>  1 file changed, 41 insertions(+), 40 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 7c2b846521f3..1fd5984b6158 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -57,28 +57,28 @@ Date:		June, 2021
>  KernelVersion:	v5.14
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
> -		CXL device objects export the devtype attribute which mirrors
> -		the same value communicated in the DEVTYPE environment variable
> -		for uevents for devices on the "cxl" bus.
> +		(RO) CXL device objects export the devtype attribute which
> +		mirrors the same value communicated in the DEVTYPE environment
> +		variable for uevents for devices on the "cxl" bus.
>  
>  What:		/sys/bus/cxl/devices/*/modalias
>  Date:		December, 2021
>  KernelVersion:	v5.18
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
> -		CXL device objects export the modalias attribute which mirrors
> -		the same value communicated in the MODALIAS environment variable
> -		for uevents for devices on the "cxl" bus.
> +		(RO) CXL device objects export the modalias attribute which
> +		mirrors the same value communicated in the MODALIAS environment
> +		variable for uevents for devices on the "cxl" bus.
>  
>  What:		/sys/bus/cxl/devices/portX/uport
>  Date:		June, 2021
>  KernelVersion:	v5.14
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
> -		CXL port objects are enumerated from either a platform firmware
> -		device (ACPI0017 and ACPI0016) or PCIe switch upstream port with
> -		CXL component registers. The 'uport' symlink connects the CXL
> -		portX object to the device that published the CXL port
> +		(RO) CXL port objects are enumerated from either a platform
> +		firmware device (ACPI0017 and ACPI0016) or PCIe switch upstream
> +		port with CXL component registers. The 'uport' symlink connects
> +		the CXL portX object to the device that published the CXL port
>  		capability.
>  
>  What:		/sys/bus/cxl/devices/portX/dportY
> @@ -86,20 +86,20 @@ Date:		June, 2021
>  KernelVersion:	v5.14
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
> -		CXL port objects are enumerated from either a platform firmware
> -		device (ACPI0017 and ACPI0016) or PCIe switch upstream port with
> -		CXL component registers. The 'dportY' symlink identifies one or
> -		more downstream ports that the upstream port may target in its
> -		decode of CXL memory resources.  The 'Y' integer reflects the
> -		hardware port unique-id used in the hardware decoder target
> -		list.
> +		(RO) CXL port objects are enumerated from either a platform
> +		firmware device (ACPI0017 and ACPI0016) or PCIe switch upstream
> +		port with CXL component registers. The 'dportY' symlink
> +		identifies one or more downstream ports that the upstream port
> +		may target in its decode of CXL memory resources.  The 'Y'
> +		integer reflects the hardware port unique-id used in the
> +		hardware decoder target list.
>  
>  What:		/sys/bus/cxl/devices/decoderX.Y
>  Date:		June, 2021
>  KernelVersion:	v5.14
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
> -		CXL decoder objects are enumerated from either a platform
> +		(RO) CXL decoder objects are enumerated from either a platform
>  		firmware description, or a CXL HDM decoder register set in a
>  		PCIe device (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder
>  		Capability Structure). The 'X' in decoderX.Y represents the
> @@ -111,42 +111,43 @@ Date:		June, 2021
>  KernelVersion:	v5.14
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
> -		The 'start' and 'size' attributes together convey the physical
> -		address base and number of bytes mapped in the decoder's decode
> -		window. For decoders of devtype "cxl_decoder_root" the address
> -		range is fixed. For decoders of devtype "cxl_decoder_switch" the
> -		address is bounded by the decode range of the cxl_port ancestor
> -		of the decoder's cxl_port, and dynamically updates based on the
> -		active memory regions in that address space.
> +		(RO) The 'start' and 'size' attributes together convey the
> +		physical address base and number of bytes mapped in the
> +		decoder's decode window. For decoders of devtype
> +		"cxl_decoder_root" the address range is fixed. For decoders of
> +		devtype "cxl_decoder_switch" the address is bounded by the
> +		decode range of the cxl_port ancestor of the decoder's cxl_port,
> +		and dynamically updates based on the active memory regions in
> +		that address space.
>  
>  What:		/sys/bus/cxl/devices/decoderX.Y/locked
>  Date:		June, 2021
>  KernelVersion:	v5.14
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
> -		CXL HDM decoders have the capability to lock the configuration
> -		until the next device reset. For decoders of devtype
> -		"cxl_decoder_root" there is no standard facility to unlock them.
> -		For decoders of devtype "cxl_decoder_switch" a secondary bus
> -		reset, of the PCIe bridge that provides the bus for this
> -		decoders uport, unlocks / resets the decoder.
> +		(RO) CXL HDM decoders have the capability to lock the
> +		configuration until the next device reset. For decoders of
> +		devtype "cxl_decoder_root" there is no standard facility to
> +		unlock them.  For decoders of devtype "cxl_decoder_switch" a
> +		secondary bus reset, of the PCIe bridge that provides the bus
> +		for this decoders uport, unlocks / resets the decoder.
>  
>  What:		/sys/bus/cxl/devices/decoderX.Y/target_list
>  Date:		June, 2021
>  KernelVersion:	v5.14
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
> -		Display a comma separated list of the current decoder target
> -		configuration. The list is ordered by the current configured
> -		interleave order of the decoder's dport instances. Each entry in
> -		the list is a dport id.
> +		(RO) Display a comma separated list of the current decoder
> +		target configuration. The list is ordered by the current
> +		configured interleave order of the decoder's dport instances.
> +		Each entry in the list is a dport id.
>  
>  What:		/sys/bus/cxl/devices/decoderX.Y/cap_{pmem,ram,type2,type3}
>  Date:		June, 2021
>  KernelVersion:	v5.14
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
> -		When a CXL decoder is of devtype "cxl_decoder_root", it
> +		(RO) When a CXL decoder is of devtype "cxl_decoder_root", it
>  		represents a fixed memory window identified by platform
>  		firmware. A fixed window may only support a subset of memory
>  		types. The 'cap_*' attributes indicate whether persistent
> @@ -158,8 +159,8 @@ Date:		June, 2021
>  KernelVersion:	v5.14
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
> -		When a CXL decoder is of devtype "cxl_decoder_switch", it can
> -		optionally decode either accelerator memory (type-2) or expander
> -		memory (type-3). The 'target_type' attribute indicates the
> -		current setting which may dynamically change based on what
> +		(RO) When a CXL decoder is of devtype "cxl_decoder_switch", it
> +		can optionally decode either accelerator memory (type-2) or
> +		expander memory (type-3). The 'target_type' attribute indicates
> +		the current setting which may dynamically change based on what
>  		memory regions are activated in this decode hierarchy.
> 


* Re: [PATCH 00/46] CXL PMEM Region Provisioning
  2022-06-28  3:12 ` Alison Schofield
@ 2022-06-28  3:34   ` Dan Williams
  0 siblings, 0 replies; 157+ messages in thread
From: Dan Williams @ 2022-06-28  3:34 UTC (permalink / raw)
  To: Alison Schofield, Dan Williams
  Cc: linux-cxl, Weiny, Ira, Christoph Hellwig, Jason Gunthorpe,
	Ben Widawsky, Matthew Wilcox, nvdimm, linux-pci, patches

Alison Schofield wrote:
> 
> -snipped everything
> 
> These are commit message typos followed by one tidy-up request.
> 
> [PATCH 00/46] CXL PMEM Region Provisioning
> s/usersapce/userspace
> s/mangage/manage
> 
> [PATCH 09/46] cxl/acpi: Track CXL resources in iomem_resource
> s/accurracy/accuracy
> 
> [PATCH 11/46] cxl/core: Define a 'struct cxl_endpoint_decoder' for tracking DPA resources
> s/platfom/platform
> 
> [PATCH 14/46] cxl/hdm: Enumerate allocated DPA
> s/provisioining/provisioning
> s/comrpised/comprised
> s/volaltile-ram/volatile-ram
> 
> [PATCH 23/46] tools/testing/cxl: Add partition support
> s/mecahinisms/mechanisms
> 
> [PATCH 25/46] cxl/port: Record dport in endpoint references
> s/endoint/endpoint
> 
> [PATCH 30/46] cxl/hdm: Add sysfs attributes for interleave ways + granularity
> s/userpace/userspace
> s/resonsible/responsible
> 
> [PATCH 35/46] cxl/region: Add a 'uuid' attribute
> s/is operation/its operation
> 
> [PATCH 42/46] cxl/hdm: Commit decoder state to hardware
> s/base-addres/base-address
> s/intereleave/interleave

Thanks!

Wonder why my checkpatch run elided those.

> How about shortening the commit messages of Patch 10 & 11? They make my
> git pretty one liner output ugly.

I'll think about it if the whole series ends up needing a resend, but
changing subjects does confuse b4 version tracking.


* Re: [PATCH 35/46] cxl/region: Add a 'uuid' attribute
  2022-06-24  4:19 ` [PATCH 35/46] cxl/region: Add a 'uuid' attribute Dan Williams
@ 2022-06-28 10:29   ` Jonathan Cameron
  2022-06-28 14:24     ` Dan Williams
  0 siblings, 1 reply; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-28 10:29 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, nvdimm, linux-pci, patches, hch, Ben Widawsky

On Thu, 23 Jun 2022 21:19:39 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> From: Ben Widawsky <bwidawsk@kernel.org>
> 
> The process of provisioning a region involves triggering the creation of
> a new region object, pouring in the configuration, and then binding that
> configured object to the region driver to start its operation. For
> persistent memory regions the CXL specification mandates that each be
> identified by a uuid. Add an ABI for userspace to specify a region's
> uuid.
> 
> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> [djbw: simplify locking]
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

I think this needs to be a little less restrictive as it currently errors
out on trying to write the same UUID to the same region twice.

Short-circuit that case and just return 0 if the UUID is the same as the one already set.

Thanks,

Jonathan



> ---
>  Documentation/ABI/testing/sysfs-bus-cxl |  10 +++
>  drivers/cxl/core/region.c               | 115 ++++++++++++++++++++++++
>  drivers/cxl/cxl.h                       |  25 ++++++
>  3 files changed, 150 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 9a4856066631..d30c95a758a9 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -263,3 +263,13 @@ Contact:	linux-cxl@vger.kernel.org
>  Description:
>  		(WO) Write a string in the form 'regionZ' to delete that region,
>  		provided it is currently idle / not bound to a driver.
> +
> +
> +What:		/sys/bus/cxl/devices/regionZ/uuid
> +Date:		May, 2022
> +KernelVersion:	v5.20
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(RW) Write a unique identifier for the region. This field must
> +		be set for persistent regions and it must not conflict with the
> +		UUID of another region.
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index f2a0ead20ca7..f75978f846b9 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -5,6 +5,7 @@
>  #include <linux/device.h>
>  #include <linux/module.h>
>  #include <linux/slab.h>
> +#include <linux/uuid.h>
>  #include <linux/idr.h>
>  #include <cxl.h>
>  #include "core.h"
> @@ -17,10 +18,123 @@
>   * Memory ranges, Regions represent the active mapped capacity by the HDM
>   * Decoder Capability structures throughout the Host Bridges, Switches, and
>   * Endpoints in the topology.
> + *
> + * Region configuration has ordering constraints. UUID may be set at any time
> + * but is only visible for persistent regions.
> + */
> +
> +/*
> + * All changes to the interleave configuration occur with this lock held
> + * for write.
>   */
> +static DECLARE_RWSEM(cxl_region_rwsem);
>  
>  static struct cxl_region *to_cxl_region(struct device *dev);
>  
> +static ssize_t uuid_show(struct device *dev, struct device_attribute *attr,
> +			 char *buf)
> +{
> +	struct cxl_region *cxlr = to_cxl_region(dev);
> +	struct cxl_region_params *p = &cxlr->params;
> +	ssize_t rc;
> +
> +	rc = down_read_interruptible(&cxl_region_rwsem);
> +	if (rc)
> +		return rc;
> +	rc = sysfs_emit(buf, "%pUb\n", &p->uuid);
> +	up_read(&cxl_region_rwsem);
> +
> +	return rc;
> +}
> +
> +static int is_dup(struct device *match, void *data)
> +{
> +	struct cxl_region_params *p;
> +	struct cxl_region *cxlr;
> +	uuid_t *uuid = data;
> +
> +	if (!is_cxl_region(match))
> +		return 0;
> +
> +	lockdep_assert_held(&cxl_region_rwsem);
> +	cxlr = to_cxl_region(match);
> +	p = &cxlr->params;
> +
> +	if (uuid_equal(&p->uuid, uuid)) {
> +		dev_dbg(match, "already has uuid: %pUb\n", uuid);
> +		return -EBUSY;
> +	}
> +
> +	return 0;
> +}
> +
> +static ssize_t uuid_store(struct device *dev, struct device_attribute *attr,
> +			  const char *buf, size_t len)
> +{
> +	struct cxl_region *cxlr = to_cxl_region(dev);
> +	struct cxl_region_params *p = &cxlr->params;
> +	uuid_t temp;
> +	ssize_t rc;
> +
> +	if (len != UUID_STRING_LEN + 1)
> +		return -EINVAL;
> +
> +	rc = uuid_parse(buf, &temp);
> +	if (rc)
> +		return rc;
> +
> +	if (uuid_is_null(&temp))
> +		return -EINVAL;
> +
> +	rc = down_write_killable(&cxl_region_rwsem);
> +	if (rc)
> +		return rc;
> +
> +	rc = -EBUSY;
> +	if (p->state >= CXL_CONFIG_ACTIVE)
> +		goto out;
> +
> +	rc = bus_for_each_dev(&cxl_bus_type, NULL, &temp, is_dup);
> +	if (rc < 0)
> +		goto out;
> +
> +	uuid_copy(&p->uuid, &temp);
> +out:
> +	up_write(&cxl_region_rwsem);
> +
> +	if (rc)
> +		return rc;
> +	return len;
> +}
> +static DEVICE_ATTR_RW(uuid);
> +
> +static umode_t cxl_region_visible(struct kobject *kobj, struct attribute *a,
> +				  int n)
> +{
> +	struct device *dev = kobj_to_dev(kobj);
> +	struct cxl_region *cxlr = to_cxl_region(dev);
> +
> +	if (a == &dev_attr_uuid.attr && cxlr->mode != CXL_DECODER_PMEM)
> +		return 0;
> +	return a->mode;
> +}
> +
> +static struct attribute *cxl_region_attrs[] = {
> +	&dev_attr_uuid.attr,
> +	NULL,
> +};
> +
> +static const struct attribute_group cxl_region_group = {
> +	.attrs = cxl_region_attrs,
> +	.is_visible = cxl_region_visible,
> +};
> +
> +static const struct attribute_group *region_groups[] = {
> +	&cxl_base_attribute_group,
> +	&cxl_region_group,
> +	NULL,
> +};
> +
>  static void cxl_region_release(struct device *dev)
>  {
>  	struct cxl_region *cxlr = to_cxl_region(dev);
> @@ -32,6 +146,7 @@ static void cxl_region_release(struct device *dev)
>  static const struct device_type cxl_region_type = {
>  	.name = "cxl_region",
>  	.release = cxl_region_release,
> +	.groups = region_groups
>  };
>  
>  bool is_cxl_region(struct device *dev)
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 49b73b2e44a9..46a9f8acc602 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -288,18 +288,43 @@ struct cxl_root_decoder {
>  	struct cxl_switch_decoder cxlsd;
>  };
>  
> +/*
> + * enum cxl_config_state - State machine for region configuration
> + * @CXL_CONFIG_IDLE: Any sysfs attribute can be written freely
> + * @CXL_CONFIG_ACTIVE: All targets have been added the region is now
> + * active
> + */
> +enum cxl_config_state {
> +	CXL_CONFIG_IDLE,
> +	CXL_CONFIG_ACTIVE,
> +};
> +
> +/**
> + * struct cxl_region_params - region settings
> + * @state: allow the driver to lockdown further parameter changes
> + * @uuid: unique id for persistent regions
> + *
> + * State transitions are protected by the cxl_region_rwsem
> + */
> +struct cxl_region_params {
> +	enum cxl_config_state state;
> +	uuid_t uuid;
> +};
> +
>  /**
>   * struct cxl_region - CXL region
>   * @dev: This region's device
>   * @id: This region's id. Id is globally unique across all regions
>   * @mode: Endpoint decoder allocation / access mode
>   * @type: Endpoint decoder target type
> + * @params: active + config params for the region
>   */
>  struct cxl_region {
>  	struct device dev;
>  	int id;
>  	enum cxl_decoder_mode mode;
>  	enum cxl_decoder_type type;
> +	struct cxl_region_params params;
>  };
>  
>  /**



* Re: [PATCH 01/46] tools/testing/cxl: Fix cxl_hdm_decode_init() calling convention
  2022-06-24  2:45 ` [PATCH 01/46] tools/testing/cxl: Fix cxl_hdm_decode_init() calling convention Dan Williams
@ 2022-06-28 10:37   ` Jonathan Cameron
       [not found]   ` <CGME20220629174147uscas1p211384ae262e099484440ef285be26c75@uscas1p2.samsung.com>
  1 sibling, 0 replies; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-28 10:37 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

On Thu, 23 Jun 2022 19:45:07 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> This failing signature:
> 
> [    8.392669] cxl_bus_probe: cxl_port endpoint2: probe: 970997760
> [    8.392670] cxl_port: probe of endpoint2 failed with error 970997760
> [    8.392719] create_endpoint: cxl_mem mem0: add: endpoint2
> [    8.392721] cxl_mem mem0: endpoint2 failed probe
> [    8.392725] cxl_bus_probe: cxl_mem mem0: probe: -6
> 
> ...shows cxl_hdm_decode_init() resulting in a return code ("970997760")
> that looks like stack corruption. The problem goes away if
> cxl_hdm_decode_init() is not mocked via __wrap_cxl_hdm_decode_init().
> 
> The corruption results from the mismatch that the calling convention for
> cxl_hdm_decode_init() is:
> 
> int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm)
> 
> ...and __wrap_cxl_hdm_decode_init() is:
> 
> bool __wrap_cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm)
> 
> ...i.e. an int is expected but __wrap_cxl_hdm_decode_init() returns bool.
> 
> Fix the convention and cleanup the organization to match
> __wrap_cxl_await_media_ready() as the difference was a red herring that
> distracted from finding the bug.
> 
> Fixes: 92804edb11f0 ("cxl/pci: Drop @info argument to cxl_hdm_decode_init()")
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
LGTM

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
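
For anyone puzzling over the "970997760" in the log, here is a small userspace sketch of the arithmetic. It assumes the x86-64 behaviour where a bool-returning function is only relied on to set the low byte of the return register while an int-expecting caller reads all 32 bits. The helper names (low_byte, upper_bits, as_bool) are made up for illustration, and this only decomposes the logged value; it does not reproduce the bug.

```c
#include <stdbool.h>
#include <stdint.h>

/* The bogus probe return code from the log quoted above. */
#define BOGUS_RC 970997760u

/* Low 8 bits: the only part a bool-returning callee is relied on to set. */
static uint8_t low_byte(uint32_t rc)
{
	return (uint8_t)(rc & 0xffu);
}

/* Everything above the low byte: leftover register contents in this model. */
static uint32_t upper_bits(uint32_t rc)
{
	return rc & ~0xffu;
}

/* What the value would have meant had it been read back as a bool. */
static bool as_bool(uint32_t rc)
{
	return low_byte(rc) != 0;
}
```

With BOGUS_RC the low byte is zero, so the wrapper most likely did return "false" (success) and the rest of the value is whatever was left in the register, which would fit the commit message's description of a calling convention mismatch.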

> ---
>  tools/testing/cxl/test/mock.c |    8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/testing/cxl/test/mock.c b/tools/testing/cxl/test/mock.c
> index f1f8c40948c5..bce6a21df0d5 100644
> --- a/tools/testing/cxl/test/mock.c
> +++ b/tools/testing/cxl/test/mock.c
> @@ -208,13 +208,15 @@ int __wrap_cxl_await_media_ready(struct cxl_dev_state *cxlds)
>  }
>  EXPORT_SYMBOL_NS_GPL(__wrap_cxl_await_media_ready, CXL);
>  
> -bool __wrap_cxl_hdm_decode_init(struct cxl_dev_state *cxlds,
> -				struct cxl_hdm *cxlhdm)
> +int __wrap_cxl_hdm_decode_init(struct cxl_dev_state *cxlds,
> +			       struct cxl_hdm *cxlhdm)
>  {
>  	int rc = 0, index;
>  	struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
>  
> -	if (!ops || !ops->is_mock_dev(cxlds->dev))
> +	if (ops && ops->is_mock_dev(cxlds->dev))
> +		rc = 0;
> +	else
>  		rc = cxl_hdm_decode_init(cxlds, cxlhdm);
>  	put_cxl_mock_ops(index);
>  
> 



* Re: [PATCH 02/46] cxl/port: Keep port->uport valid for the entire life of a port
  2022-06-24  2:45 ` [PATCH 02/46] cxl/port: Keep port->uport valid for the entire life of a port Dan Williams
  2022-06-24  3:37   ` Alison Schofield
@ 2022-06-28 11:47   ` Jonathan Cameron
  2022-06-28 14:27     ` Dan Williams
       [not found]   ` <CGME20220629174622uscas1p2236a084ce25771a3ab57c6f006632f35@uscas1p2.samsung.com>
  2 siblings, 1 reply; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-28 11:47 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

On Thu, 23 Jun 2022 19:45:14 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> The upcoming region provisioning implementation has a need to
> dereference port->uport during the port unregister flow. Specifically,
> endpoint decoders need to be able to lookup their corresponding memdev
> via port->uport.
> 
> The existing ->dead flag was added for cases where the core was
> committed to tearing down the port, but needed to drop locks before
> calling device_unregister(). Reuse that flag to indicate to
> delete_endpoint() that it has no "release action" work to do as
> unregister_port() will handle it.
> 
> Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver")

From the explanation I'm not seeing why this has a fixes tag?

Otherwise seems fine...


> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/core/port.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index dbce99bdffab..7810d1a8369b 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -370,7 +370,7 @@ static void unregister_port(void *_port)
>  		lock_dev = &parent->dev;
>  
>  	device_lock_assert(lock_dev);
> -	port->uport = NULL;
> +	port->dead = true;
>  	device_unregister(&port->dev);
>  }
>  
> @@ -857,7 +857,7 @@ static void delete_endpoint(void *data)
>  	parent = &parent_port->dev;
>  
>  	device_lock(parent);
> -	if (parent->driver && endpoint->uport) {
> +	if (parent->driver && !endpoint->dead) {
>  		devm_release_action(parent, cxl_unlink_uport, endpoint);
>  		devm_release_action(parent, unregister_port, endpoint);
>  	}
> 



* Re: [PATCH 35/46] cxl/region: Add a 'uuid' attribute
  2022-06-28 10:29   ` Jonathan Cameron
@ 2022-06-28 14:24     ` Dan Williams
  0 siblings, 0 replies; 157+ messages in thread
From: Dan Williams @ 2022-06-28 14:24 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, nvdimm, linux-pci, patches, hch, Ben Widawsky

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 21:19:39 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > From: Ben Widawsky <bwidawsk@kernel.org>
> > 
> > The process of provisioning a region involves triggering the creation of
> > a new region object, pouring in the configuration, and then binding that
> > configured object to the region driver to start its operation. For
> > persistent memory regions the CXL specification mandates that each be
> > identified by a uuid. Add an ABI for userspace to specify a region's
> > uuid.
> > 
> > Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> > [djbw: simplify locking]
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> 
> I think this needs to be a little less restrictive as it currently errors
> out on trying to write the same UUID to the same region twice.
> 
> Short-circuit that case and just return 0 if the UUID is the same as the one already set.

Sure, fixed that up locally.
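
A rough userspace model of the requested behaviour, for illustration only. It is not the local fixup itself: struct model_region, model_uuid_equal() and model_set_uuid() are hypothetical stand-ins for the kernel's region object, uuid_equal() and the uuid_store() / is_dup() pair, and locking and the state check are left out.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-ins for the kernel's uuid_t and region objects. */
#define MODEL_UUID_SIZE 16

struct model_region {
	unsigned char uuid[MODEL_UUID_SIZE];
};

static int model_uuid_equal(const unsigned char *a, const unsigned char *b)
{
	return memcmp(a, b, MODEL_UUID_SIZE) == 0;
}

/*
 * Set @target's uuid to @uuid. Returns 0 on success, including the case
 * where @target already holds @uuid, and -EBUSY when a different region
 * in @regions already owns it.
 */
static int model_set_uuid(struct model_region *regions, size_t nr,
			  struct model_region *target,
			  const unsigned char *uuid)
{
	size_t i;

	/* Short-circuit: rewriting the same value is not a conflict. */
	if (model_uuid_equal(target->uuid, uuid))
		return 0;

	for (i = 0; i < nr; i++) {
		if (&regions[i] == target)
			continue;
		if (model_uuid_equal(regions[i].uuid, uuid))
			return -EBUSY;
	}

	memcpy(target->uuid, uuid, MODEL_UUID_SIZE);
	return 0;
}
```

The point of the sketch is the ordering: the "same value" check runs before the duplicate scan, so a repeated write to one region succeeds while a write of that UUID to any other region still gets -EBUSY.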


* Re: [PATCH 02/46] cxl/port: Keep port->uport valid for the entire life of a port
  2022-06-28 11:47   ` Jonathan Cameron
@ 2022-06-28 14:27     ` Dan Williams
  0 siblings, 0 replies; 157+ messages in thread
From: Dan Williams @ 2022-06-28 14:27 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 19:45:14 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > The upcoming region provisioning implementation has a need to
> > dereference port->uport during the port unregister flow. Specifically,
> > endpoint decoders need to be able to lookup their corresponding memdev
> > via port->uport.
> > 
> > The existing ->dead flag was added for cases where the core was
> > committed to tearing down the port, but needed to drop locks before
> > calling device_unregister(). Reuse that flag to indicate to
> > delete_endpoint() that it has no "release action" work to do as
> > unregister_port() will handle it.
> > 
> > Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver")
> 
> From the explanation I'm not seeing why this has a fixes tag?

True, that can be dropped as the crash scenario that found the need for
this was not relevant at that older baseline.
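
For readers following along, a minimal userspace sketch of the ->dead idea described in the commit message. The struct and helpers (model_port, model_unregister_port, model_needs_release) are simplified, hypothetical stand-ins for struct cxl_port, unregister_port() and the check in delete_endpoint(); device locking and devm actions are omitted.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_port {
	void *uport;	/* stays valid until the port is freed */
	bool dead;	/* set once teardown has been committed to */
};

/* Models unregister_port(): mark the port dead, keep uport intact. */
static void model_unregister_port(struct model_port *port)
{
	port->dead = true;
}

/* Models the check in delete_endpoint(): is release work still pending? */
static bool model_needs_release(bool parent_bound, const struct model_port *ep)
{
	return parent_bound && !ep->dead;
}
```

The design choice being illustrated is that teardown state moves from "uport is NULL" to a separate flag, so code running during unregister can still follow uport back to the memdev.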


* Re: [PATCH 03/46] cxl/hdm: Use local hdm variable
  2022-06-24  2:45 ` [PATCH 03/46] cxl/hdm: Use local hdm variable Dan Williams
  2022-06-24  3:38   ` Alison Schofield
@ 2022-06-28 15:16   ` Jonathan Cameron
       [not found]   ` <CGME20220629200312uscas1p292303b9325dcbfe59293f002dc9e6b03@uscas1p2.samsung.com>
  2 siblings, 0 replies; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-28 15:16 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, Ben Widawsky, hch, alison.schofield, nvdimm,
	linux-pci, patches

On Thu, 23 Jun 2022 19:45:21 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> From: Ben Widawsky <bwidawsk@kernel.org>
> 
> Save a few characters and use the already initialized local variable.
> 
> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  drivers/cxl/core/hdm.c |    3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index bfc8ee876278..ba3d2d959c71 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -251,8 +251,7 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
>  			return PTR_ERR(cxld);
>  		}
>  
> -		rc = init_hdm_decoder(port, cxld, target_map,
> -				      cxlhdm->regs.hdm_decoder, i);
> +		rc = init_hdm_decoder(port, cxld, target_map, hdm, i);
>  		if (rc) {
>  			put_device(&cxld->dev);
>  			failed++;
> 



* Re: [PATCH 04/46] cxl/core: Rename ->decoder_range ->hpa_range
  2022-06-24  2:45 ` [PATCH 04/46] cxl/core: Rename ->decoder_range ->hpa_range Dan Williams
  2022-06-24  3:39   ` Alison Schofield
@ 2022-06-28 15:17   ` Jonathan Cameron
       [not found]   ` <CGME20220629200652uscas1p2c1da644ea63a5de69e14e046379779b1@uscas1p2.samsung.com>
  2 siblings, 0 replies; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-28 15:17 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

On Thu, 23 Jun 2022 19:45:28 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> In preparation for growing a ->dpa_range attribute for endpoint
> decoders, rename the current ->decoder_range to the more descriptive
> ->hpa_range.  
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Makes sense
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  drivers/cxl/core/hdm.c       |    2 +-
>  drivers/cxl/core/port.c      |    4 ++--
>  drivers/cxl/cxl.h            |    4 ++--
>  tools/testing/cxl/test/cxl.c |    2 +-
>  4 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index ba3d2d959c71..5c070c93b07f 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -172,7 +172,7 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
>  		return -ENXIO;
>  	}
>  
> -	cxld->decoder_range = (struct range) {
> +	cxld->hpa_range = (struct range) {
>  		.start = base,
>  		.end = base + size - 1,
>  	};
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 7810d1a8369b..98bcbbd59a75 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -78,7 +78,7 @@ static ssize_t start_show(struct device *dev, struct device_attribute *attr,
>  	if (is_root_decoder(dev))
>  		start = cxld->platform_res.start;
>  	else
> -		start = cxld->decoder_range.start;
> +		start = cxld->hpa_range.start;
>  
>  	return sysfs_emit(buf, "%#llx\n", start);
>  }
> @@ -93,7 +93,7 @@ static ssize_t size_show(struct device *dev, struct device_attribute *attr,
>  	if (is_root_decoder(dev))
>  		size = resource_size(&cxld->platform_res);
>  	else
> -		size = range_len(&cxld->decoder_range);
> +		size = range_len(&cxld->hpa_range);
>  
>  	return sysfs_emit(buf, "%#llx\n", size);
>  }
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 6799b27c7db2..8256728cea8d 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -198,7 +198,7 @@ enum cxl_decoder_type {
>   * @dev: this decoder's device
>   * @id: kernel device name id
>   * @platform_res: address space resources considered by root decoder
> - * @decoder_range: address space resources considered by midlevel decoder
> + * @hpa_range: Host physical address range mapped by this decoder
>   * @interleave_ways: number of cxl_dports in this decode
>   * @interleave_granularity: data stride per dport
>   * @target_type: accelerator vs expander (type2 vs type3) selector
> @@ -212,7 +212,7 @@ struct cxl_decoder {
>  	int id;
>  	union {
>  		struct resource platform_res;
> -		struct range decoder_range;
> +		struct range hpa_range;
>  	};
>  	int interleave_ways;
>  	int interleave_granularity;
> diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
> index 431f2bddf6c8..7a08b025f2de 100644
> --- a/tools/testing/cxl/test/cxl.c
> +++ b/tools/testing/cxl/test/cxl.c
> @@ -461,7 +461,7 @@ static int mock_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
>  			return PTR_ERR(cxld);
>  		}
>  
> -		cxld->decoder_range = (struct range) {
> +		cxld->hpa_range = (struct range) {
>  			.start = 0,
>  			.end = -1,
>  		};
> 



* Re: [PATCH 05/46] cxl/core: Drop ->platform_res attribute for root decoders
  2022-06-24  2:45 ` [PATCH 05/46] cxl/core: Drop ->platform_res attribute for root decoders Dan Williams
@ 2022-06-28 15:24   ` Jonathan Cameron
  2022-07-09 23:33     ` Dan Williams
       [not found]   ` <CGME20220629202117uscas1p2892fb68ae60c4754e2f7d26882a92ae5@uscas1p2.samsung.com>
  1 sibling, 1 reply; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-28 15:24 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

On Thu, 23 Jun 2022 19:45:36 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> Root decoders are responsible for hosting the available host address
> space for endpoints and regions to claim. The tracking of that available
> capacity can be done in iomem_resource directly. As a result, root
> decoders no longer need to host their own resource tree. The
> current ->platform_res attribute was added prematurely.
> 
> Otherwise, ->hpa_range fills the role of conveying the current decode
> range of the decoder.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

One trivial moan inline about sneaky whitespace fixes, I'll cope if you really
don't want to move that to a separate patch though :)

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
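
As a side note on the resource-to-range conversion below, here is a small userspace sketch of the inclusive-range arithmetic the patch relies on. The model_range helpers are hypothetical names that, as far as I can tell, mirror what range_len() and range_contains() do in include/linux/range.h; the addresses in the usage note are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace copy of an inclusive [start, end] address range. */
struct model_range {
	uint64_t start;
	uint64_t end;
};

/* Build an inclusive range from a base and a size, as the CFMWS hunk does. */
static struct model_range model_range_from(uint64_t base, uint64_t size)
{
	struct model_range r = { .start = base, .end = base + size - 1 };

	return r;
}

/* Length of an inclusive range; wraps to 0 for { 0, UINT64_MAX }. */
static uint64_t model_range_len(const struct model_range *r)
{
	return r->end - r->start + 1;
}

/* True when @inner lies entirely within @outer. */
static bool model_range_contains(const struct model_range *outer,
				 const struct model_range *inner)
{
	return outer->start <= inner->start && outer->end >= inner->end;
}
```

Under these assumptions a 1 GiB window at 0x100000000 ends at 0x13fffffff, and the { .start = 0, .end = -1 } default set in cxl_decoder_alloc() has a computed length of 0 because the unsigned arithmetic wraps, so an unconfigured decoder would report a size of zero.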

> ---
>  drivers/cxl/acpi.c      |   17 ++++++++++-------
>  drivers/cxl/core/pci.c  |    8 +-------
>  drivers/cxl/core/port.c |   30 +++++++-----------------------
>  drivers/cxl/cxl.h       |    6 +-----
>  4 files changed, 19 insertions(+), 42 deletions(-)
> 
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index 40286f5df812..951695cdb455 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -108,8 +108,10 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
>  
>  	cxld->flags = cfmws_to_decoder_flags(cfmws->restrictions);
>  	cxld->target_type = CXL_DECODER_EXPANDER;
> -	cxld->platform_res = (struct resource)DEFINE_RES_MEM(cfmws->base_hpa,
> -							     cfmws->window_size);
> +	cxld->hpa_range = (struct range) {
> +		.start = cfmws->base_hpa,
> +		.end = cfmws->base_hpa + cfmws->window_size - 1,
> +	};
>  	cxld->interleave_ways = CFMWS_INTERLEAVE_WAYS(cfmws);
>  	cxld->interleave_granularity = CFMWS_INTERLEAVE_GRANULARITY(cfmws);
>  
> @@ -119,13 +121,14 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
>  	else
>  		rc = cxl_decoder_autoremove(dev, cxld);
>  	if (rc) {
> -		dev_err(dev, "Failed to add decoder for %pr\n",
> -			&cxld->platform_res);
> +		dev_err(dev, "Failed to add decoder for [%#llx - %#llx]\n",
> +			cxld->hpa_range.start, cxld->hpa_range.end);
>  		return 0;
>  	}
> -	dev_dbg(dev, "add: %s node: %d range %pr\n", dev_name(&cxld->dev),
> -		phys_to_target_node(cxld->platform_res.start),
> -		&cxld->platform_res);
> +	dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
> +		dev_name(&cxld->dev),
> +		phys_to_target_node(cxld->hpa_range.start),
> +		cxld->hpa_range.start, cxld->hpa_range.end);
>  
>  	return 0;
>  }
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index c4c99ff7b55e..7672789c3225 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -225,7 +225,6 @@ static int dvsec_range_allowed(struct device *dev, void *arg)
>  {
>  	struct range *dev_range = arg;
>  	struct cxl_decoder *cxld;
> -	struct range root_range;
>  
>  	if (!is_root_decoder(dev))
>  		return 0;
> @@ -237,12 +236,7 @@ static int dvsec_range_allowed(struct device *dev, void *arg)
>  	if (!(cxld->flags & CXL_DECODER_F_RAM))
>  		return 0;
>  
> -	root_range = (struct range) {
> -		.start = cxld->platform_res.start,
> -		.end = cxld->platform_res.end,
> -	};
> -
> -	return range_contains(&root_range, dev_range);
> +	return range_contains(&cxld->hpa_range, dev_range);
>  }
>  
>  static void disable_hdm(void *_cxlhdm)
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 98bcbbd59a75..b51eb41aa839 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -73,29 +73,17 @@ static ssize_t start_show(struct device *dev, struct device_attribute *attr,
>  			  char *buf)
>  {
>  	struct cxl_decoder *cxld = to_cxl_decoder(dev);
> -	u64 start;
>  
> -	if (is_root_decoder(dev))
> -		start = cxld->platform_res.start;
> -	else
> -		start = cxld->hpa_range.start;
> -
> -	return sysfs_emit(buf, "%#llx\n", start);
> +	return sysfs_emit(buf, "%#llx\n", cxld->hpa_range.start);
>  }
>  static DEVICE_ATTR_ADMIN_RO(start);
>  
>  static ssize_t size_show(struct device *dev, struct device_attribute *attr,
> -			char *buf)
> +			 char *buf)

nitpick: Unrelated change.  Ideally not in this patch.

>  {
>  	struct cxl_decoder *cxld = to_cxl_decoder(dev);
> -	u64 size;
> -
> -	if (is_root_decoder(dev))
> -		size = resource_size(&cxld->platform_res);
> -	else
> -		size = range_len(&cxld->hpa_range);
>  
> -	return sysfs_emit(buf, "%#llx\n", size);
> +	return sysfs_emit(buf, "%#llx\n", range_len(&cxld->hpa_range));
>  }
>  static DEVICE_ATTR_RO(size);
>  
> @@ -1233,7 +1221,10 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
>  	cxld->interleave_ways = 1;
>  	cxld->interleave_granularity = PAGE_SIZE;
>  	cxld->target_type = CXL_DECODER_EXPANDER;
> -	cxld->platform_res = (struct resource)DEFINE_RES_MEM(0, 0);
> +	cxld->hpa_range = (struct range) {
> +		.start = 0,
> +		.end = -1,
> +	};
>  
>  	return cxld;
>  err:
> @@ -1347,13 +1338,6 @@ int cxl_decoder_add_locked(struct cxl_decoder *cxld, int *target_map)
>  	if (rc)
>  		return rc;
>  
> -	/*
> -	 * Platform decoder resources should show up with a reasonable name. All
> -	 * other resources are just sub ranges within the main decoder resource.
> -	 */
> -	if (is_root_decoder(dev))
> -		cxld->platform_res.name = dev_name(dev);
> -
>  	return device_add(dev);
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_decoder_add_locked, CXL);
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 8256728cea8d..35ce17872fc1 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -197,7 +197,6 @@ enum cxl_decoder_type {
>   * struct cxl_decoder - CXL address range decode configuration
>   * @dev: this decoder's device
>   * @id: kernel device name id
> - * @platform_res: address space resources considered by root decoder
>   * @hpa_range: Host physical address range mapped by this decoder
>   * @interleave_ways: number of cxl_dports in this decode
>   * @interleave_granularity: data stride per dport
> @@ -210,10 +209,7 @@ enum cxl_decoder_type {
>  struct cxl_decoder {
>  	struct device dev;
>  	int id;
> -	union {
> -		struct resource platform_res;
> -		struct range hpa_range;
> -	};
> +	struct range hpa_range;
>  	int interleave_ways;
>  	int interleave_granularity;
>  	enum cxl_decoder_type target_type;
> 



* Re: [PATCH 06/46] cxl/core: Drop is_cxl_decoder()
  2022-06-24  2:45 ` [PATCH 06/46] cxl/core: Drop is_cxl_decoder() Dan Williams
  2022-06-24  3:48   ` Alison Schofield
@ 2022-06-28 15:25   ` Jonathan Cameron
       [not found]   ` <CGME20220629203448uscas1p264a7f79a1ed7f9257eefcb3064c7d943@uscas1p2.samsung.com>
  2 siblings, 0 replies; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-28 15:25 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

On Thu, 23 Jun 2022 19:45:43 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> This helper was only used to identify the object type for lockdep
> purposes. Now that lockdep support is done with explicit lock classes,
> this helper can be dropped.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
FWIW..

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
>  drivers/cxl/core/port.c |    6 ------
>  drivers/cxl/cxl.h       |    1 -
>  2 files changed, 7 deletions(-)
> 
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index b51eb41aa839..13c321afe076 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -271,12 +271,6 @@ bool is_root_decoder(struct device *dev)
>  }
>  EXPORT_SYMBOL_NS_GPL(is_root_decoder, CXL);
>  
> -bool is_cxl_decoder(struct device *dev)
> -{
> -	return dev->type && dev->type->release == cxl_decoder_release;
> -}
> -EXPORT_SYMBOL_NS_GPL(is_cxl_decoder, CXL);
> -
>  struct cxl_decoder *to_cxl_decoder(struct device *dev)
>  {
>  	if (dev_WARN_ONCE(dev, dev->type->release != cxl_decoder_release,
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 35ce17872fc1..6e08fe8cc0fe 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -337,7 +337,6 @@ struct cxl_dport *cxl_find_dport_by_dev(struct cxl_port *port,
>  struct cxl_decoder *to_cxl_decoder(struct device *dev);
>  bool is_root_decoder(struct device *dev);
>  bool is_endpoint_decoder(struct device *dev);
> -bool is_cxl_decoder(struct device *dev);
>  struct cxl_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
>  					   unsigned int nr_targets);
>  struct cxl_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
> 



* Re: [PATCH 07/46] cxl: Introduce cxl_to_{ways,granularity}
  2022-06-24  2:45 ` [PATCH 07/46] cxl: Introduce cxl_to_{ways,granularity} Dan Williams
@ 2022-06-28 15:36   ` Jonathan Cameron
  2022-07-09 23:52     ` Dan Williams
  0 siblings, 1 reply; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-28 15:36 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, Ben Widawsky, hch, alison.schofield, nvdimm,
	linux-pci, patches

On Thu, 23 Jun 2022 19:45:50 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> Interleave granularity and ways have CXL specification defined encodings.
> Promote the conversion helpers to a common header, and use them to
> replace other open-coded instances.
> 
> Force caller to consider the error case of the conversion as well.

What was the reasoning behind not just returning the value (rather
than the extra *val parameter)?  Negative values would be errors
still. Plenty of room to do that in an int.

I don't really mind, just feels a tiny bit uglier than it could be.

Also, there is one little unrelated type change in here.
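
For illustration, the return-value style asked about above could look roughly like the sketch below. This is a userspace approximation under stated assumptions, not code from the patch: the `_ret` function names are hypothetical, `uint8_t`/`uint16_t` stand in for the kernel's `u8`/`u16`, and the encodings are copied from the cxl_to_ways() / cxl_to_granularity() helpers quoted further down. Plain comparisons are used instead of GNU case ranges so the sketch builds with any C compiler.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical return-value variants of the helpers in this patch.
 * A negative errno means the encoding is invalid; any other value is
 * the decoded result.
 */
static inline int cxl_to_granularity_ret(uint16_t ig)
{
	if (ig > 6)
		return -EINVAL;
	return 256 << ig;		/* 256 bytes up to 16K */
}

static inline int cxl_to_ways_ret(uint8_t eniw)
{
	if (eniw <= 4)
		return 1 << eniw;	/* 1, 2, 4, 8, 16 ways */
	if (eniw >= 8 && eniw <= 10)
		return 3 << (eniw - 8);	/* 3, 6, 12 ways */
	return -EINVAL;
}
```

A caller would then write something like `ways = cxl_to_ways_ret(x); if (ways < 0) return ways;`. The trade-off is that a caller can more easily forget the sign check, whereas the commit message states that the `*val` form is meant to force callers to consider the error case.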

> 
> Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  drivers/cxl/acpi.c     |   34 +++++++++++++++++++---------------
>  drivers/cxl/core/hdm.c |   35 +++++++++--------------------------
>  drivers/cxl/cxl.h      |   26 ++++++++++++++++++++++++++
>  3 files changed, 54 insertions(+), 41 deletions(-)
> 
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index 951695cdb455..544cb10ce33e 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -9,10 +9,6 @@
>  #include "cxlpci.h"
>  #include "cxl.h"
>  
> -/* Encode defined in CXL 2.0 8.2.5.12.7 HDM Decoder Control Register */
> -#define CFMWS_INTERLEAVE_WAYS(x)	(1 << (x)->interleave_ways)
> -#define CFMWS_INTERLEAVE_GRANULARITY(x)	((x)->granularity + 8)
> -
>  static unsigned long cfmws_to_decoder_flags(int restrictions)
>  {
>  	unsigned long flags = CXL_DECODER_F_ENABLE;
> @@ -34,7 +30,8 @@ static unsigned long cfmws_to_decoder_flags(int restrictions)
>  static int cxl_acpi_cfmws_verify(struct device *dev,
>  				 struct acpi_cedt_cfmws *cfmws)
>  {
> -	int expected_len;
> +	unsigned int expected_len, ways;

Type change for expected_len seems fine but isn't mentioned in the patch description.

> +	int rc;
>  
>  	if (cfmws->interleave_arithmetic != ACPI_CEDT_CFMWS_ARITHMETIC_MODULO) {
>  		dev_err(dev, "CFMWS Unsupported Interleave Arithmetic\n");
> @@ -51,14 +48,14 @@ static int cxl_acpi_cfmws_verify(struct device *dev,
>  		return -EINVAL;
>  	}
>  
> -	if (CFMWS_INTERLEAVE_WAYS(cfmws) > CXL_DECODER_MAX_INTERLEAVE) {
> -		dev_err(dev, "CFMWS Interleave Ways (%d) too large\n",
> -			CFMWS_INTERLEAVE_WAYS(cfmws));
> +	rc = cxl_to_ways(cfmws->interleave_ways, &ways);
> +	if (rc) {
> +		dev_err(dev, "CFMWS Interleave Ways (%d) invalid\n",
> +			cfmws->interleave_ways);
>  		return -EINVAL;
>  	}
>  
> -	expected_len = struct_size((cfmws), interleave_targets,
> -				   CFMWS_INTERLEAVE_WAYS(cfmws));
> +	expected_len = struct_size(cfmws, interleave_targets, ways);
>  
>  	if (cfmws->header.length < expected_len) {
>  		dev_err(dev, "CFMWS length %d less than expected %d\n",
> @@ -87,7 +84,8 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
>  	struct device *dev = ctx->dev;
>  	struct acpi_cedt_cfmws *cfmws;
>  	struct cxl_decoder *cxld;
> -	int rc, i;
> +	unsigned int ways, i, ig;
> +	int rc;
>  
>  	cfmws = (struct acpi_cedt_cfmws *) header;
>  
> @@ -99,10 +97,16 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
>  		return 0;
>  	}
>  
> -	for (i = 0; i < CFMWS_INTERLEAVE_WAYS(cfmws); i++)
> +	rc = cxl_to_ways(cfmws->interleave_ways, &ways);
> +	if (rc)
> +		return rc;
> +	rc = cxl_to_granularity(cfmws->granularity, &ig);
> +	if (rc)
> +		return rc;
> +	for (i = 0; i < ways; i++)
>  		target_map[i] = cfmws->interleave_targets[i];
>  
> -	cxld = cxl_root_decoder_alloc(root_port, CFMWS_INTERLEAVE_WAYS(cfmws));
> +	cxld = cxl_root_decoder_alloc(root_port, ways);
>  	if (IS_ERR(cxld))
>  		return 0;
>  
> @@ -112,8 +116,8 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
>  		.start = cfmws->base_hpa,
>  		.end = cfmws->base_hpa + cfmws->window_size - 1,
>  	};
> -	cxld->interleave_ways = CFMWS_INTERLEAVE_WAYS(cfmws);
> -	cxld->interleave_granularity = CFMWS_INTERLEAVE_GRANULARITY(cfmws);
> +	cxld->interleave_ways = ways;
> +	cxld->interleave_granularity = ig;
>  
>  	rc = cxl_decoder_add(cxld, target_map);
>  	if (rc)
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 5c070c93b07f..46635105a1f1 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -128,33 +128,12 @@ struct cxl_hdm *devm_cxl_setup_hdm(struct cxl_port *port)
>  }
>  EXPORT_SYMBOL_NS_GPL(devm_cxl_setup_hdm, CXL);
>  
> -static int to_interleave_granularity(u32 ctrl)
> -{
> -	int val = FIELD_GET(CXL_HDM_DECODER0_CTRL_IG_MASK, ctrl);
> -
> -	return 256 << val;
> -}
> -
> -static int to_interleave_ways(u32 ctrl)
> -{
> -	int val = FIELD_GET(CXL_HDM_DECODER0_CTRL_IW_MASK, ctrl);
> -
> -	switch (val) {
> -	case 0 ... 4:
> -		return 1 << val;
> -	case 8 ... 10:
> -		return 3 << (val - 8);
> -	default:
> -		return 0;
> -	}
> -}
> -
>  static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
>  			    int *target_map, void __iomem *hdm, int which)
>  {
>  	u64 size, base;
> +	int i, rc;
>  	u32 ctrl;
> -	int i;
>  	union {
>  		u64 value;
>  		unsigned char target_id[8];
> @@ -183,14 +162,18 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
>  		if (ctrl & CXL_HDM_DECODER0_CTRL_LOCK)
>  			cxld->flags |= CXL_DECODER_F_LOCK;
>  	}
> -	cxld->interleave_ways = to_interleave_ways(ctrl);
> -	if (!cxld->interleave_ways) {
> +	rc = cxl_to_ways(FIELD_GET(CXL_HDM_DECODER0_CTRL_IW_MASK, ctrl),
> +			 &cxld->interleave_ways);
> +	if (rc) {
>  		dev_warn(&port->dev,
>  			 "decoder%d.%d: Invalid interleave ways (ctrl: %#x)\n",
>  			 port->id, cxld->id, ctrl);
> -		return -ENXIO;
> +		return rc;
>  	}
> -	cxld->interleave_granularity = to_interleave_granularity(ctrl);
> +	rc = cxl_to_granularity(FIELD_GET(CXL_HDM_DECODER0_CTRL_IG_MASK, ctrl),
> +				&cxld->interleave_granularity);
> +	if (rc)
> +		return rc;
>  
>  	if (FIELD_GET(CXL_HDM_DECODER0_CTRL_TYPE, ctrl))
>  		cxld->target_type = CXL_DECODER_EXPANDER;
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 6e08fe8cc0fe..fd02f9e2a829 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -64,6 +64,32 @@ static inline int cxl_hdm_decoder_count(u32 cap_hdr)
>  	return val ? val * 2 : 1;
>  }
>  
> +/* Encode defined in CXL 2.0 8.2.5.12.7 HDM Decoder Control Register */
> +static inline int cxl_to_granularity(u16 ig, unsigned int *val)
> +{
> +	if (ig > 6)
> +		return -EINVAL;
> +	*val = 256 << ig;
> +	return 0;
> +}
> +
> +/* Encode defined in CXL ECN "3, 6, 12 and 16-way memory Interleaving" */
> +static inline int cxl_to_ways(u8 eniw, unsigned int *val)
> +{
> +	switch (eniw) {
> +	case 0 ... 4:
> +		*val = 1 << eniw;
> +		break;
> +	case 8 ... 10:
> +		*val = 3 << (eniw - 8);
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
>  /* CXL 2.0 8.2.8.1 Device Capabilities Array Register */
>  #define CXLDEV_CAP_ARRAY_OFFSET 0x0
>  #define   CXLDEV_CAP_ARRAY_CAP_ID 0
> 



* Re: [PATCH 08/46] cxl/core: Define a 'struct cxl_switch_decoder'
  2022-06-24  2:45 ` [PATCH 08/46] cxl/core: Define a 'struct cxl_switch_decoder' Dan Williams
@ 2022-06-28 16:12   ` Jonathan Cameron
  2022-06-30 10:56     ` Jonathan Cameron
  2022-07-10  0:33     ` Dan Williams
  0 siblings, 2 replies; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-28 16:12 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, Ben Widawsky, hch, alison.schofield, nvdimm,
	linux-pci, patches

On Thu, 23 Jun 2022 19:45:57 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> Currently 'struct cxl_decoder' contains the superset of attributes
> needed for all decoder types. Before more type-specific attributes are
> added to the common definition, reorganize 'struct cxl_decoder' into type
> specific objects.
> 
> This patch, the first of three, factors out a cxl_switch_decoder type.
> The 'switch' decoder type represents the decoder instances of cxl_port's
> that route from the root of a CXL memory decode topology to the
> endpoints. They come in two flavors, root-level decoders, statically
> defined by platform firmware, and mid-level decoders, where
> interleave-granularity, interleave-width, and the target list are
> mutable.

I'd like to see this info on cxl_switch_decoder being used for
switches AND other stuff as docs next to the definition. It confused
me when I looked directly at the result of applying this series,
and only made more sense once I read this patch.

> 
> Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Basic idea is fine, but there are a few places where I think this is
'too clever' with error handling and it's worth duplicating a few
error messages to keep the flow simpler.

Also, nice to drop the white space tweaks that have snuck in here.
Particularly the wrong one ;)


> ---
>  drivers/cxl/acpi.c           |    4 +
>  drivers/cxl/core/hdm.c       |   21 +++++---
>  drivers/cxl/core/port.c      |  115 +++++++++++++++++++++++++++++++-----------
>  drivers/cxl/cxl.h            |   27 ++++++----
>  tools/testing/cxl/test/cxl.c |   12 +++-
>  5 files changed, 128 insertions(+), 51 deletions(-)
> 

> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 46635105a1f1..2d1f3e6eebea 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c


> @@ -226,8 +226,15 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
>  
>  		if (is_cxl_endpoint(port))
>  			cxld = cxl_endpoint_decoder_alloc(port);
> -		else
> -			cxld = cxl_switch_decoder_alloc(port, target_count);
> +		else {
> +			struct cxl_switch_decoder *cxlsd;
> +
> +			cxlsd = cxl_switch_decoder_alloc(port, target_count);
> +			if (IS_ERR(cxlsd))
> +				cxld = ERR_CAST(cxlsd);

As described later, I'd rather have local error handling in these branches,
as I think it will be more readable than this dance with error casting, for
the cost of maybe 2 lines.

> +			else
> +				cxld = &cxlsd->cxld;
> +		}
>  		if (IS_ERR(cxld)) {
>  			dev_warn(&port->dev,
>  				 "Failed to allocate the decoder\n");
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 13c321afe076..fd1cac13cd2e 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c

....

>  
> +static void __cxl_decoder_release(struct cxl_decoder *cxld)
> +{
> +	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
> +
> +	ida_free(&port->decoder_ida, cxld->id);
> +	put_device(&port->dev);
> +}
> +
>  static void cxl_decoder_release(struct device *dev)
>  {
>  	struct cxl_decoder *cxld = to_cxl_decoder(dev);
> -	struct cxl_port *port = to_cxl_port(dev->parent);
>  
> -	ida_free(&port->decoder_ida, cxld->id);
> +	__cxl_decoder_release(cxld);
>  	kfree(cxld);
> -	put_device(&port->dev);

I was going to moan about this reorder, but this is actually
the right order as we allocate then get_device() so
reverse should indeed do the put_device() first.
So good incidental clean up of ordering :)

> +}
> +
> +static void cxl_switch_decoder_release(struct device *dev)
> +{
> +	struct cxl_switch_decoder *cxlsd = to_cxl_switch_decoder(dev);
> +
> +	__cxl_decoder_release(&cxlsd->cxld);
> +	kfree(cxlsd);
>  }
>  
>  static const struct device_type cxl_decoder_endpoint_type = {
> @@ -250,13 +267,13 @@ static const struct device_type cxl_decoder_endpoint_type = {
>  
>  static const struct device_type cxl_decoder_switch_type = {
>  	.name = "cxl_decoder_switch",
> -	.release = cxl_decoder_release,
> +	.release = cxl_switch_decoder_release,
>  	.groups = cxl_decoder_switch_attribute_groups,
>  };
>  
>  static const struct device_type cxl_decoder_root_type = {
>  	.name = "cxl_decoder_root",
> -	.release = cxl_decoder_release,
> +	.release = cxl_switch_decoder_release,
>  	.groups = cxl_decoder_root_attribute_groups,
>  };
>  
> @@ -271,15 +288,29 @@ bool is_root_decoder(struct device *dev)
>  }
>  EXPORT_SYMBOL_NS_GPL(is_root_decoder, CXL);
>  
> +static bool is_switch_decoder(struct device *dev)
> +{
> +	return is_root_decoder(dev) || dev->type == &cxl_decoder_switch_type;
> +}
> +
>  struct cxl_decoder *to_cxl_decoder(struct device *dev)
>  {
> -	if (dev_WARN_ONCE(dev, dev->type->release != cxl_decoder_release,
> +	if (dev_WARN_ONCE(dev,
> +			  !is_switch_decoder(dev) && !is_endpoint_decoder(dev),
>  			  "not a cxl_decoder device\n"))
>  		return NULL;
>  	return container_of(dev, struct cxl_decoder, dev);
>  }
>  EXPORT_SYMBOL_NS_GPL(to_cxl_decoder, CXL);
>  
> +static struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev)
> +{
> +	if (dev_WARN_ONCE(dev, !is_switch_decoder(dev),
> +			  "not a cxl_switch_decoder device\n"))
> +		return NULL;
> +	return container_of(dev, struct cxl_switch_decoder, cxld.dev);
> +}
> +
>  static void cxl_ep_release(struct cxl_ep *ep)
>  {
>  	if (!ep)
> @@ -1129,7 +1160,7 @@ struct cxl_dport *cxl_find_dport_by_dev(struct cxl_port *port,
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_find_dport_by_dev, CXL);
>  
> -static int decoder_populate_targets(struct cxl_decoder *cxld,
> +static int decoder_populate_targets(struct cxl_switch_decoder *cxlsd,
>  				    struct cxl_port *port, int *target_map)
>  {
>  	int i, rc = 0;
> @@ -1142,17 +1173,17 @@ static int decoder_populate_targets(struct cxl_decoder *cxld,
>  	if (list_empty(&port->dports))
>  		return -EINVAL;
>  
> -	write_seqlock(&cxld->target_lock);
> -	for (i = 0; i < cxld->nr_targets; i++) {
> +	write_seqlock(&cxlsd->target_lock);
> +	for (i = 0; i < cxlsd->nr_targets; i++) {
>  		struct cxl_dport *dport = find_dport(port, target_map[i]);
>  
>  		if (!dport) {
>  			rc = -ENXIO;
>  			break;
>  		}
> -		cxld->target[i] = dport;
> +		cxlsd->target[i] = dport;
>  	}
> -	write_sequnlock(&cxld->target_lock);
> +	write_sequnlock(&cxlsd->target_lock);
>  
>  	return rc;
>  }
> @@ -1179,13 +1210,27 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
>  {
>  	struct cxl_decoder *cxld;
>  	struct device *dev;
> +	void *alloc;
>  	int rc = 0;
>  
>  	if (nr_targets > CXL_DECODER_MAX_INTERLEAVE)
>  		return ERR_PTR(-EINVAL);
>  
> -	cxld = kzalloc(struct_size(cxld, target, nr_targets), GFP_KERNEL);
> -	if (!cxld)
> +	if (nr_targets) {
> +		struct cxl_switch_decoder *cxlsd;
> +
> +		alloc = kzalloc(struct_size(cxlsd, target, nr_targets), GFP_KERNEL);

I'd rather see a local check on the allocation failure even if it adds a few lines
of duplicated code - which after you've dropped the local alloc variable won't be
much even after a later patch adds another path in here.  The eventual code
of this function is more than a little nasty when an early return in each
path would, as far as I can tell, give the same result without the at least
3 null checks prior to returning (to ensure nothing happens before reaching
the if (!alloc)).




		cxlsd = kzalloc()
		if (!cxlsd)
			return ERR_PTR(-ENOMEM);

		cxlsd->nr_targets = nr_targets;
		seqlock_init(...)

	} else {
		cxld = kzalloc(sizeof(*cxld), GFP_KERNEL);
		if (!cxld)
			return ERR_PTR(-ENOMEM);
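
The early-return shape suggested above can be tried out in userspace with the sketch below. It is only an approximation under assumptions: `struct decoder` and `struct switch_decoder` are cut-down stand-ins for the kernel structures, calloc() replaces kzalloc(), NULL replaces ERR_PTR(), and the `id = -1` assignment is a placeholder for the later ida_alloc() step.

```c
#include <stdlib.h>

#define MAX_INTERLEAVE 16	/* mirrors CXL_DECODER_MAX_INTERLEAVE */

struct decoder {		/* stand-in for struct cxl_decoder */
	int id;
};

struct switch_decoder {		/* stand-in for struct cxl_switch_decoder */
	struct decoder cxld;	/* first member, so free(cxld) is valid */
	int nr_targets;
	void *target[];		/* flexible array, one slot per target */
};

/*
 * Each branch checks its own allocation and returns early, so no
 * shared 'alloc' variable and no later NULL checks are needed.
 */
static struct decoder *decoder_alloc(unsigned int nr_targets)
{
	struct decoder *cxld;

	if (nr_targets > MAX_INTERLEAVE)
		return NULL;

	if (nr_targets) {
		struct switch_decoder *cxlsd;

		cxlsd = calloc(1, sizeof(*cxlsd) +
				  nr_targets * sizeof(cxlsd->target[0]));
		if (!cxlsd)
			return NULL;
		cxlsd->nr_targets = nr_targets;
		cxld = &cxlsd->cxld;
	} else {
		cxld = calloc(1, sizeof(*cxld));
		if (!cxld)
			return NULL;
	}

	cxld->id = -1;		/* placeholder for the ida_alloc() step */
	return cxld;
}
```

Because the base object is the first member here, the common error path can free `cxld` directly. If the layout ever changed, the error path would need to free the outer object instead.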

> +		cxlsd = alloc;
> +		if (cxlsd) {
> +			cxlsd->nr_targets = nr_targets;
> +			seqlock_init(&cxlsd->target_lock);
> +			cxld = &cxlsd->cxld;
> +		}
> +	} else {
> +		alloc = kzalloc(sizeof(*cxld), GFP_KERNEL);
> +		cxld = alloc;
> +	}
> +	if (!alloc)
>  		return ERR_PTR(-ENOMEM);
>  
>  	rc = ida_alloc(&port->decoder_ida, GFP_KERNEL);
> @@ -1196,8 +1241,6 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
>  	get_device(&port->dev);
>  	cxld->id = rc;
>  
> -	cxld->nr_targets = nr_targets;
> -	seqlock_init(&cxld->target_lock);
>  	dev = &cxld->dev;
>  	device_initialize(dev);
>  	lockdep_set_class(&dev->mutex, &cxl_decoder_key);
> @@ -1222,7 +1265,7 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
>  
>  	return cxld;
>  err:
> -	kfree(cxld);
> +	kfree(alloc);
>  	return ERR_PTR(rc);
>  }
>  
> @@ -1236,13 +1279,18 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
>   * firmware description of CXL resources into a CXL standard decode
>   * topology.
>   */
> -struct cxl_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
> -					   unsigned int nr_targets)
> +struct cxl_switch_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
> +						  unsigned int nr_targets)
>  {
> +	struct cxl_decoder *cxld;
> +
>  	if (!is_cxl_root(port))
>  		return ERR_PTR(-EINVAL);
>  
> -	return cxl_decoder_alloc(port, nr_targets);
> +	cxld = cxl_decoder_alloc(port, nr_targets);
> +	if (IS_ERR(cxld))
> +		return ERR_CAST(cxld);
> +	return to_cxl_switch_decoder(&cxld->dev);
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_root_decoder_alloc, CXL);
>  
> @@ -1257,13 +1305,18 @@ EXPORT_SYMBOL_NS_GPL(cxl_root_decoder_alloc, CXL);
>   * that sit between Switch Upstream Ports / Switch Downstream Ports and
>   * Host Bridges / Root Ports.
>   */
> -struct cxl_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
> -					     unsigned int nr_targets)
> +struct cxl_switch_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
> +						    unsigned int nr_targets)
>  {
> +	struct cxl_decoder *cxld;
> +
>  	if (is_cxl_root(port) || is_cxl_endpoint(port))
>  		return ERR_PTR(-EINVAL);
>  
> -	return cxl_decoder_alloc(port, nr_targets);
> +	cxld = cxl_decoder_alloc(port, nr_targets);
> +	if (IS_ERR(cxld))
> +		return ERR_CAST(cxld);
> +	return to_cxl_switch_decoder(&cxld->dev);
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_switch_decoder_alloc, CXL);
>  
> @@ -1320,7 +1373,9 @@ int cxl_decoder_add_locked(struct cxl_decoder *cxld, int *target_map)
>  
>  	port = to_cxl_port(cxld->dev.parent);
>  	if (!is_endpoint_decoder(dev)) {
> -		rc = decoder_populate_targets(cxld, port, target_map);
> +		struct cxl_switch_decoder *cxlsd = to_cxl_switch_decoder(dev);
> +
> +		rc = decoder_populate_targets(cxlsd, port, target_map);
>  		if (rc && (cxld->flags & CXL_DECODER_F_ENABLE)) {
>  			dev_err(&port->dev,
>  				"Failed to populate active decoder targets\n");
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index fd02f9e2a829..7525b55b11bb 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -220,7 +220,7 @@ enum cxl_decoder_type {
>  #define CXL_DECODER_MAX_INTERLEAVE 16
>  
>  /**
> - * struct cxl_decoder - CXL address range decode configuration
> + * struct cxl_decoder - Common CXL HDM Decoder Attributes
>   * @dev: this decoder's device
>   * @id: kernel device name id
>   * @hpa_range: Host physical address range mapped by this decoder
> @@ -228,10 +228,7 @@ enum cxl_decoder_type {
>   * @interleave_granularity: data stride per dport
>   * @target_type: accelerator vs expander (type2 vs type3) selector
>   * @flags: memory type capabilities and locking
> - * @target_lock: coordinate coherent reads of the target list
> - * @nr_targets: number of elements in @target
> - * @target: active ordered target list in current decoder configuration
> - */
> +*/

?

>  struct cxl_decoder {
>  	struct device dev;
>  	int id;
> @@ -240,12 +237,22 @@ struct cxl_decoder {
>  	int interleave_granularity;
>  	enum cxl_decoder_type target_type;
>  	unsigned long flags;
> +};
> +
> +/**
> + * struct cxl_switch_decoder - Switch specific CXL HDM Decoder

Whilst you define the broad use of switch in the patch description, I think
it is worth explaining here that it's CFMWS, HB and switch decoders
(if I understand correctly - this had me very confused when looking
at the overall code)
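
As a rough idea of what such a comment could say, here is a userspace sketch of the embedding together with a possible doc comment. The comment wording is an assumption based on my reading above (CFMWS, HB and switch decoders), the struct and function names are simplified stand-ins, and `container_of()` is re-implemented locally because the kernel macro is not available in userspace.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct decoder {		/* stand-in for struct cxl_decoder */
	int id;
};

/*
 * struct switch_decoder - decoder that routes to downstream ports
 *
 * Assumed wording: despite the name, this covers every decoder with a
 * target list, i.e. root (CFMWS) decoders, host bridge decoders and
 * decoders in actual switches. Endpoint decoders have no target list
 * and use the base object only.
 */
struct switch_decoder {
	int nr_targets;
	struct decoder cxld;	/* deliberately not first in this sketch */
};

/* Recover the outer object from a pointer to the embedded base */
static struct switch_decoder *to_switch_decoder(struct decoder *cxld)
{
	return container_of(cxld, struct switch_decoder, cxld);
}
```

The base member is placed second here only to show that the conversion does not rely on it being at offset zero.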

> + * @cxld: base cxl_decoder object
> + * @target_lock: coordinate coherent reads of the target list
> + * @nr_targets: number of elements in @target
> + * @target: active ordered target list in current decoder configuration
> + */
> +struct cxl_switch_decoder {
> +	struct cxl_decoder cxld;
>  	seqlock_t target_lock;
>  	int nr_targets;
>  	struct cxl_dport *target[];
>  };
>  
> -

*grumble grumble*  Unconnected white space fix.

>  /**
>   * enum cxl_nvdimm_brige_state - state machine for managing bus rescans
>   * @CXL_NVB_NEW: Set at bridge create and after cxl_pmem_wq is destroyed
> @@ -363,10 +370,10 @@ struct cxl_dport *cxl_find_dport_by_dev(struct cxl_port *port,
>  struct cxl_decoder *to_cxl_decoder(struct device *dev);
>  bool is_root_decoder(struct device *dev);
>  bool is_endpoint_decoder(struct device *dev);
> -struct cxl_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
> -					   unsigned int nr_targets);
> -struct cxl_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
> -					     unsigned int nr_targets);
> +struct cxl_switch_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
> +						  unsigned int nr_targets);
> +struct cxl_switch_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
> +						    unsigned int nr_targets);
>  int cxl_decoder_add(struct cxl_decoder *cxld, int *target_map);
>  struct cxl_decoder *cxl_endpoint_decoder_alloc(struct cxl_port *port);
>  int cxl_decoder_add_locked(struct cxl_decoder *cxld, int *target_map);
> diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
> index 7a08b025f2de..68288354b419 100644
> --- a/tools/testing/cxl/test/cxl.c
> +++ b/tools/testing/cxl/test/cxl.c
> @@ -451,9 +451,15 @@ static int mock_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
>  		struct cxl_decoder *cxld;
>  		int rc;
>  
> -		if (target_count)
> -			cxld = cxl_switch_decoder_alloc(port, target_count);
> -		else
> +		if (target_count) {
> +			struct cxl_switch_decoder *cxlsd;
> +
> +			cxlsd = cxl_switch_decoder_alloc(port, target_count);
> +			if (IS_ERR(cxlsd))
> +				cxld = ERR_CAST(cxlsd);

Looks cleaner to me to move error handling into the branches. You duplicate
an error print but avoid ERR_CAST mess just to cast it back to an error in the
error path a few lines later.


			if (IS_ERR(cxlsd)) {
				dev_warn(&port->dev,
					 "Failed to allocate switch decoder\n");
				return PTR_ERR(cxlsd);
			}
			cxld = &cxlsd->cxld;
		} else {
			cxld = cxl_endpoint_decoder_alloc(port);
			if (IS_ERR(cxld)) {
				dev_warn(&port->dev,
					 "Failed to allocate EP decoder\n");
				return PTR_ERR(cxld);
			}
		}
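
To make the comparison concrete, the branch-local handling sketched above can be exercised in userspace as below. Everything here is a stand-in under assumptions: ERR_PTR(), IS_ERR() and PTR_ERR() are simplified re-implementations of the include/linux/err.h helpers, the two allocators and their `fail` flag are hypothetical, and fprintf() replaces dev_warn().

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_ERRNO 4095

/* Simplified userspace versions of the kernel error-pointer helpers */
static inline void *ERR_PTR(intptr_t error) { return (void *)error; }
static inline intptr_t PTR_ERR(const void *ptr) { return (intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct decoder { int id; };
struct switch_decoder { struct decoder cxld; int nr_targets; };

static struct switch_decoder sw_storage;
static struct decoder ep_storage;

/* Hypothetical allocators; 'fail' forces the error path */
static struct switch_decoder *switch_alloc(int fail)
{
	return fail ? ERR_PTR(-ENOMEM) : &sw_storage;
}

static struct decoder *endpoint_alloc(int fail)
{
	return fail ? ERR_PTR(-ENOMEM) : &ep_storage;
}

/* Each branch reports and returns its own error; no ERR_CAST needed */
static int enumerate_one(int target_count, int fail, struct decoder **out)
{
	struct decoder *cxld;

	if (target_count) {
		struct switch_decoder *cxlsd = switch_alloc(fail);

		if (IS_ERR(cxlsd)) {
			fprintf(stderr, "Failed to allocate switch decoder\n");
			return (int)PTR_ERR(cxlsd);
		}
		cxlsd->nr_targets = target_count;
		cxld = &cxlsd->cxld;
	} else {
		cxld = endpoint_alloc(fail);
		if (IS_ERR(cxld)) {
			fprintf(stderr, "Failed to allocate EP decoder\n");
			return (int)PTR_ERR(cxld);
		}
	}

	*out = cxld;
	return 0;
}
```

The cost is one duplicated warning; the gain is that the error value never has to be cast to another pointer type and back.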


> +			else
> +				cxld = &cxlsd->cxld;
> +		} else
>  			cxld = cxl_endpoint_decoder_alloc(port);
>  		if (IS_ERR(cxld)) {
>  			dev_warn(&port->dev,
> 



* Re: [PATCH 09/46] cxl/acpi: Track CXL resources in iomem_resource
  2022-06-24  2:46 ` [PATCH 09/46] cxl/acpi: Track CXL resources in iomem_resource Dan Williams
@ 2022-06-28 16:43   ` Jonathan Cameron
  2022-07-10  2:12     ` Dan Williams
  0 siblings, 1 reply; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-28 16:43 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

On Thu, 23 Jun 2022 19:46:05 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> Recall that CXL capable address ranges, on ACPI platforms, are published
> in the CEDT.CFMWS (CXL Early Discovery Table - CXL Fixed Memory Window
> Structures). These windows represent both the actively mapped capacity
> and the potential address space that can be dynamically assigned to a
> new CXL decode configuration.
> 
> CXL endpoints like DDR DIMMs can be mapped at any physical address
> including 0 and legacy ranges.
> 
> There is an expectation and requirement that the /proc/iomem interface
> and the iomem_resource in the kernel reflect the full set of platform
> address ranges. I.e. that every address range that platform firmware and
> bus drivers enumerate be reflected as an iomem_resource entry. The hard
> requirement to do this for CXL arises from the fact that capabilities
> like CONFIG_DEVICE_PRIVATE expect to be able to treat empty
> iomem_resource ranges as free for software to use as proxy address
> space. Without CXL publishing its potential address ranges in
> iomem_resource, the CONFIG_DEVICE_PRIVATE mechanism may inadvertently
> steal capacity reserved for runtime provisioning of new CXL regions.
> 
> The approach taken supports dynamically publishing the CXL window map on
> demand when a CXL platform driver like cxl_acpi loads. The windows are
> then forced into the first level of iomem_resource tree via the
> insert_resource_expand_to_fit() API. This forcing sacrifices some
> resource boundary accurracy in order to better reflect the decode
> hierarchy of a CXL window hosting "System RAM" and other resources.

I don't fully understand this and in particular what assumptions it
is making.  How do we end up with overlapping resources via just parsing
the CFMWS, for instance?

I would shout a lot louder in this description about using the CXL NS
for that export.  That's liable to be controversial.

> 
> Walkers of the iomem_resource tree will also need to have access to the
> related 'struct cxl_decoder' instances to disambiguate which portions of
> a CXL memory resource are present vs expanded to enforce the expected
> resource topology.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/acpi.c |  110 +++++++++++++++++++++++++++++++++++++++++++++++++++-
>  kernel/resource.c  |    7 +++
>  2 files changed, 114 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index d1b914dfa36c..003fa4fde357 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -73,6 +73,7 @@ static int cxl_acpi_cfmws_verify(struct device *dev,
>  struct cxl_cfmws_context {
>  	struct device *dev;
>  	struct cxl_port *root_port;
> +	int id;
>  };
>  
>  static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
> @@ -84,8 +85,10 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
>  	struct cxl_switch_decoder *cxlsd;
>  	struct device *dev = ctx->dev;
>  	struct acpi_cedt_cfmws *cfmws;
> +	struct resource *cxl_res;
>  	struct cxl_decoder *cxld;
>  	unsigned int ways, i, ig;
> +	struct resource *res;
>  	int rc;
>  
>  	cfmws = (struct acpi_cedt_cfmws *) header;
> @@ -107,6 +110,24 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
>  	for (i = 0; i < ways; i++)
>  		target_map[i] = cfmws->interleave_targets[i];
>  
> +	res = kzalloc(sizeof(*res), GFP_KERNEL);
> +	if (!res)
> +		return -ENOMEM;
> +
> +	res->name = kasprintf(GFP_KERNEL, "CXL Window %d", ctx->id++);
> +	if (!res->name)
> +		goto err_name;
> +
> +	res->start = cfmws->base_hpa;
> +	res->end = cfmws->base_hpa + cfmws->window_size - 1;
> +	res->flags = IORESOURCE_MEM;
> +
> +	/* add to the local resource tracking to establish a sort order */
> +	cxl_res = dev_get_drvdata(&root_port->dev);

As mentioned below, why not add cxl_res to the ctx?

> +	rc = insert_resource(cxl_res, res);
> +	if (rc)
> +		goto err_insert;
> +
>  	cxlsd = cxl_root_decoder_alloc(root_port, ways);
>  	if (IS_ERR(cxld))
>  		return 0;
> @@ -115,8 +136,8 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
>  	cxld->flags = cfmws_to_decoder_flags(cfmws->restrictions);
>  	cxld->target_type = CXL_DECODER_EXPANDER;
>  	cxld->hpa_range = (struct range) {
> -		.start = cfmws->base_hpa,
> -		.end = cfmws->base_hpa + cfmws->window_size - 1,
> +		.start = res->start,
> +		.end = res->end,
>  	};
>  	cxld->interleave_ways = ways;
>  	cxld->interleave_granularity = ig;
> @@ -131,12 +152,19 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
>  			cxld->hpa_range.start, cxld->hpa_range.end);
>  		return 0;
>  	}
> +
Another whitespace tweak that shouldn't be in a patch like this...

>  	dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
>  		dev_name(&cxld->dev),
>  		phys_to_target_node(cxld->hpa_range.start),
>  		cxld->hpa_range.start, cxld->hpa_range.end);
>  
>  	return 0;
> +
> +err_insert:
> +	kfree(res->name);
> +err_name:
> +	kfree(res);
> +	return -ENOMEM;
>  }
>  
>  __mock struct acpi_device *to_cxl_host_bridge(struct device *host,
> @@ -291,9 +319,66 @@ static void cxl_acpi_lock_reset_class(void *dev)
>  	device_lock_reset_class(dev);
>  }
>  
> +static void del_cxl_resource(struct resource *res)
> +{
> +	kfree(res->name);
> +	kfree(res);
> +}
> +
> +static void remove_cxl_resources(void *data)
> +{
> +	struct resource *res, *next, *cxl = data;
> +
> +	for (res = cxl->child; res; res = next) {
> +		struct resource *victim = (struct resource *) res->desc;
> +
> +		next = res->sibling;
> +		remove_resource(res);
> +
> +		if (victim) {
> +			remove_resource(victim);
> +			kfree(victim);
> +		}
> +
> +		del_cxl_resource(res);
> +	}
> +}
> +
> +static int add_cxl_resources(struct resource *cxl)

I'd like to see some documentation of what this is doing...

> +{
> +	struct resource *res, *new, *next;
> +
> +	for (res = cxl->child; res; res = next) {
> +		new = kzalloc(sizeof(*new), GFP_KERNEL);
> +		if (!new)
> +			return -ENOMEM;
> +		new->name = res->name;
> +		new->start = res->start;
> +		new->end = res->end;
> +		new->flags = IORESOURCE_MEM;
> +		res->desc = (unsigned long) new;
> +
> +		insert_resource_expand_to_fit(&iomem_resource, new);

Given you've called out limitations of this call in the patch description
it would be good to have some of that info in the code.

> +
> +		next = res->sibling;
> +		while (next && resource_overlaps(new, next)) {

I'm struggling to grasp why we'd have overlaps, comments would probably help.

> +			if (resource_contains(new, next)) {
> +				struct resource *_next = next->sibling;
> +
> +				remove_resource(next);
> +				del_cxl_resource(next);
> +				next = _next;
> +			} else
> +				next->start = new->end + 1;
> +		}
> +	}
> +	return 0;
> +}
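
For anyone else puzzling over the overlap handling, here is a userspace
sketch of one possible reading. It assumes (this is not confirmed anywhere
in the thread) that the overlaps arise because
insert_resource_expand_to_fit() may grow 'new' to cover conflicting iomem
entries, so that it then reaches into later CXL windows. The names span,
span_overlaps, span_contains and trim_following are made up for
illustration; they only mirror the range checks done by
resource_overlaps() and resource_contains().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the start/end fields of struct resource. */
struct span {
	unsigned long long start;
	unsigned long long end;	/* inclusive */
};

/* Same range test as resource_overlaps(): the spans share an address. */
static bool span_overlaps(const struct span *a, const struct span *b)
{
	return a->start <= b->end && a->end >= b->start;
}

/* Same range test as resource_contains(): b lies entirely inside a. */
static bool span_contains(const struct span *a, const struct span *b)
{
	return a->start <= b->start && a->end >= b->end;
}

/*
 * Walk the windows that follow an (assumed already expanded) span.
 * @next is assumed sorted by address and to start at or after
 * expanded->start, as the sibling list would be. Windows fully covered
 * by the span are dropped, a partially covered window is trimmed to
 * begin right after it. Returns the number of windows kept, compacted
 * to the front of @next.
 */
static size_t trim_following(const struct span *expanded, struct span *next,
			     size_t n)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		if (span_overlaps(expanded, &next[i])) {
			if (span_contains(expanded, &next[i]))
				continue;	/* swallowed whole */
			next[i].start = expanded->end + 1;
		}
		next[kept++] = next[i];
	}
	return kept;
}
```

If that reading is right, a comment along these lines next to the while
loop would answer the question above.
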
> +
>  static int cxl_acpi_probe(struct platform_device *pdev)
>  {
>  	int rc;
> +	struct resource *cxl_res;
>  	struct cxl_port *root_port;
>  	struct device *host = &pdev->dev;
>  	struct acpi_device *adev = ACPI_COMPANION(host);
> @@ -305,21 +390,40 @@ static int cxl_acpi_probe(struct platform_device *pdev)
>  	if (rc)
>  		return rc;
>  
> +	cxl_res = devm_kzalloc(host, sizeof(*cxl_res), GFP_KERNEL);
> +	if (!cxl_res)
> +		return -ENOMEM;
> +	cxl_res->name = "CXL mem";
> +	cxl_res->start = 0;
> +	cxl_res->end = -1;
> +	cxl_res->flags = IORESOURCE_MEM;
> +
>  	root_port = devm_cxl_add_port(host, host, CXL_RESOURCE_NONE, NULL);
>  	if (IS_ERR(root_port))
>  		return PTR_ERR(root_port);
>  	dev_dbg(host, "add: %s\n", dev_name(&root_port->dev));
> +	dev_set_drvdata(&root_port->dev, cxl_res);

Rather ugly way of sneaking it into the callback. If that is the only
purpose, perhaps better to just add to the cxl_cfmws_context.

>  
>  	rc = bus_for_each_dev(adev->dev.bus, NULL, root_port,
>  			      add_host_bridge_dport);
>  	if (rc < 0)
>  		return rc;
>  
> +	rc = devm_add_action_or_reset(host, remove_cxl_resources, cxl_res);
> +	if (rc)
> +		return rc;
> +
>  	ctx = (struct cxl_cfmws_context) {
>  		.dev = host,
>  		.root_port = root_port,
>  	};
> -	acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, cxl_parse_cfmws, &ctx);
> +	rc = acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, cxl_parse_cfmws, &ctx);
> +	if (rc < 0)
> +		return -ENXIO;
> +
> +	rc = add_cxl_resources(cxl_res);
> +	if (rc)
> +		return rc;
>  
>  	/*
>  	 * Root level scanned with host-bridge as dports, now scan host-bridges


^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH 10/46] cxl/core: Define a 'struct cxl_root_decoder' for tracking CXL window resources
  2022-06-24  2:46 ` [PATCH 10/46] cxl/core: Define a 'struct cxl_root_decoder' for tracking CXL window resources Dan Williams
@ 2022-06-28 16:49   ` Jonathan Cameron
  2022-07-10  2:20     ` Dan Williams
  2022-06-28 16:53   ` Jonathan Cameron
  1 sibling, 1 reply; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-28 16:49 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, Ben Widawsky, hch, alison.schofield, nvdimm,
	linux-pci, patches

On Thu, 23 Jun 2022 19:46:13 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> Previously the target routing specifics of switch decoders were factored
> out of 'struct cxl_decoder' into 'struct cxl_switch_decoder'.
> 
> This patch, 2 of 3, adds a 'struct cxl_root_decoder' as a superset of a
> switch decoder that also tracks the associated CXL window platform
> resource.
> 
> Note that the reason the resource for a given root decoder needs to be
> looked up after the fact (i.e. after cxl_parse_cfmws() and
> add_cxl_resources()) is because add_cxl_resources() may have merged CXL
> windows in order to keep them at the top of the resource tree / decode
> hierarchy.

One trivial comment below that follows from earlier patch.

Otherwise, I'll look again at this when I understand what the constraints
of CXL windows are that you are dealing with.  I don't get why they might not
be at the top of the resource tree without the merging!

> 
> Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/acpi.c      |   40 ++++++++++++++++++++++++++++++++++++----
>  drivers/cxl/core/port.c |   43 +++++++++++++++++++++++++++++++++++++------
>  drivers/cxl/cxl.h       |   15 +++++++++++++--
>  3 files changed, 86 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index 003fa4fde357..5972f380cdf2 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -82,7 +82,7 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
>  	int target_map[CXL_DECODER_MAX_INTERLEAVE];
>  	struct cxl_cfmws_context *ctx = arg;
>  	struct cxl_port *root_port = ctx->root_port;
> -	struct cxl_switch_decoder *cxlsd;
> +	struct cxl_root_decoder *cxlrd;
>  	struct device *dev = ctx->dev;
>  	struct acpi_cedt_cfmws *cfmws;
>  	struct resource *cxl_res;
> @@ -128,11 +128,11 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
>  	if (rc)
>  		goto err_insert;
>  
> -	cxlsd = cxl_root_decoder_alloc(root_port, ways);
> -	if (IS_ERR(cxld))
> +	cxlrd = cxl_root_decoder_alloc(root_port, ways);
> +	if (IS_ERR(cxlrd))
>  		return 0;
>  
> -	cxld = &cxlsd->cxld;
> +	cxld = &cxlrd->cxlsd.cxld;
>  	cxld->flags = cfmws_to_decoder_flags(cfmws->restrictions);
>  	cxld->target_type = CXL_DECODER_EXPANDER;
>  	cxld->hpa_range = (struct range) {
> @@ -375,6 +375,32 @@ static int add_cxl_resources(struct resource *cxl)
>  	return 0;
>  }
>  
> +static int pair_cxl_resource(struct device *dev, void *data)
> +{
> +	struct resource *cxl_res = data;
> +	struct resource *p;
> +
> +	if (!is_root_decoder(dev))
> +		return 0;
> +
> +	for (p = cxl_res->child; p; p = p->sibling) {
> +		struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
> +		struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
> +		struct resource res = {
> +			.start = cxld->hpa_range.start,
> +			.end = cxld->hpa_range.end,
> +			.flags = IORESOURCE_MEM,
> +		};
> +
> +		if (resource_contains(p, &res)) {
> +			cxlrd->res = (struct resource *)p->desc;
> +			break;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>  static int cxl_acpi_probe(struct platform_device *pdev)
>  {
>  	int rc;
> @@ -425,6 +451,12 @@ static int cxl_acpi_probe(struct platform_device *pdev)
>  	if (rc)
>  		return rc;
>  
> +	/*
> +	 * Populate the root decoders with their related iomem resource,
> +	 * if present
> +	 */
> +	device_for_each_child(&root_port->dev, cxl_res, pair_cxl_resource);
> +
>  	/*
>  	 * Root level scanned with host-bridge as dports, now scan host-bridges
>  	 * for their role as CXL uports to their CXL-capable PCIe Root Ports.
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index fd1cac13cd2e..abf3455c4eff 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -259,6 +259,23 @@ static void cxl_switch_decoder_release(struct device *dev)
>  	kfree(cxlsd);
>  }
>  
> +struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev)
> +{
> +	if (dev_WARN_ONCE(dev, !is_root_decoder(dev),
> +			  "not a cxl_root_decoder device\n"))
> +		return NULL;
> +	return container_of(dev, struct cxl_root_decoder, cxlsd.cxld.dev);
> +}
> +EXPORT_SYMBOL_NS_GPL(to_cxl_root_decoder, CXL);
> +
> +static void cxl_root_decoder_release(struct device *dev)
> +{
> +	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
> +
> +	__cxl_decoder_release(&cxlrd->cxlsd.cxld);
> +	kfree(cxlrd);
> +}
> +
>  static const struct device_type cxl_decoder_endpoint_type = {
>  	.name = "cxl_decoder_endpoint",
>  	.release = cxl_decoder_release,
> @@ -273,7 +290,7 @@ static const struct device_type cxl_decoder_switch_type = {
>  
>  static const struct device_type cxl_decoder_root_type = {
>  	.name = "cxl_decoder_root",
> -	.release = cxl_switch_decoder_release,
> +	.release = cxl_root_decoder_release,
>  	.groups = cxl_decoder_root_attribute_groups,
>  };
>  
> @@ -1218,9 +1235,23 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
>  
>  	if (nr_targets) {
>  		struct cxl_switch_decoder *cxlsd;
> +		struct cxl_root_decoder *cxlrd;
> +
> +		if (is_cxl_root(port)) {
> +			alloc = kzalloc(struct_size(cxlrd, cxlsd.target,
> +						    nr_targets),
> +					GFP_KERNEL);
> +			cxlrd = alloc;
> +			if (cxlrd)
> +				cxlsd = &cxlrd->cxlsd;
> +			else
> +				cxlsd = NULL;
> +		} else {
> +			alloc = kzalloc(struct_size(cxlsd, target, nr_targets),
> +					GFP_KERNEL);
> +			cxlsd = alloc;

As earlier, I'd prefer you just handled errors when they happened rather than
dancing onwards...

> +		}
>  
> -		alloc = kzalloc(struct_size(cxlsd, target, nr_targets), GFP_KERNEL);
> -		cxlsd = alloc;
>  		if (cxlsd) {
>  			cxlsd->nr_targets = nr_targets;
>  			seqlock_init(&cxlsd->target_lock);
> @@ -1279,8 +1310,8 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
>   * firmware description of CXL resources into a CXL standard decode
>   * topology.
>   */
> -struct cxl_switch_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
> -						  unsigned int nr_targets)
> +struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
> +						unsigned int nr_targets)
>  {
>  	struct cxl_decoder *cxld;
>  
> @@ -1290,7 +1321,7 @@ struct cxl_switch_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
>  	cxld = cxl_decoder_alloc(port, nr_targets);
>  	if (IS_ERR(cxld))
>  		return ERR_CAST(cxld);
> -	return to_cxl_switch_decoder(&cxld->dev);
> +	return to_cxl_root_decoder(&cxld->dev);
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_root_decoder_alloc, CXL);
>  
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 7525b55b11bb..6dd1e4c57a67 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -253,6 +253,16 @@ struct cxl_switch_decoder {
>  	struct cxl_dport *target[];
>  };
>  
> +/**
> + * struct cxl_root_decoder - Static platform CXL address decoder
> + * @res: host / parent resource for region allocations
> + * @cxlsd: base cxl switch decoder
> + */
> +struct cxl_root_decoder {
> +	struct resource *res;
> +	struct cxl_switch_decoder cxlsd;

It would be kinder to the container_of() macros to just put the cxlsd first.

> +};
> +
>  /**
>   * enum cxl_nvdimm_brige_state - state machine for managing bus rescans
>   * @CXL_NVB_NEW: Set at bridge create and after cxl_pmem_wq is destroyed
> @@ -368,10 +378,11 @@ struct cxl_dport *cxl_find_dport_by_dev(struct cxl_port *port,
>  					const struct device *dev);
>  
>  struct cxl_decoder *to_cxl_decoder(struct device *dev);
> +struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev);
>  bool is_root_decoder(struct device *dev);
>  bool is_endpoint_decoder(struct device *dev);
> -struct cxl_switch_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
> -						  unsigned int nr_targets);
> +struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
> +						unsigned int nr_targets);
>  struct cxl_switch_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
>  						    unsigned int nr_targets);
>  int cxl_decoder_add(struct cxl_decoder *cxld, int *target_map);
> 


^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH 10/46] cxl/core: Define a 'struct cxl_root_decoder' for tracking CXL window resources
  2022-06-24  2:46 ` [PATCH 10/46] cxl/core: Define a 'struct cxl_root_decoder' for tracking CXL window resources Dan Williams
  2022-06-28 16:49   ` Jonathan Cameron
@ 2022-06-28 16:53   ` Jonathan Cameron
  1 sibling, 0 replies; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-28 16:53 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, Ben Widawsky, hch, alison.schofield, nvdimm,
	linux-pci, patches


>  
> +/**
> + * struct cxl_root_decoder - Static platform CXL address decoder
> + * @res: host / parent resource for region allocations
> + * @cxlsd: base cxl switch decoder
> + */
> +struct cxl_root_decoder {
> +	struct resource *res;
> +	struct cxl_switch_decoder cxlsd;

Ordering in these inheriting structures is inconsistent. I'd put
the cxlsd entry first here.  It doesn't matter hugely, but it seems a bit
odd when looking at the next patch.
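
To make the ordering point concrete, here is a small userspace sketch with
made-up, trimmed-down stand-ins for the structures (root_res_first,
root_cxlsd_first and the to_*() helpers are hypothetical names). It shows
that container_of() resolves correctly with either layout; the only
difference is whether the embedded device ends up at offset zero.

```c
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(), minus type checks. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, trimmed-down stand-ins for the structures in the patch. */
struct device { int id; };
struct decoder { struct device dev; };
struct switch_decoder { struct decoder cxld; int nr_targets; };

/* Layout as posted: the resource pointer precedes the embedded decoder. */
struct root_res_first {
	void *res;
	struct switch_decoder cxlsd;
};

/* Layout as suggested: the embedded decoder sits at offset zero. */
struct root_cxlsd_first {
	struct switch_decoder cxlsd;
	void *res;
};

static struct root_res_first *to_root_res_first(struct device *dev)
{
	return container_of(dev, struct root_res_first, cxlsd.cxld.dev);
}

static struct root_cxlsd_first *to_root_cxlsd_first(struct device *dev)
{
	return container_of(dev, struct root_cxlsd_first, cxlsd.cxld.dev);
}
```

One caveat, inferred from the hunk context rather than stated in this
thread: struct cxl_switch_decoder ends in the flexible array target[], and
a structure ending in a flexible array generally has to be the last member
of whatever embeds it, which may constrain the ordering here.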

> +};
> +
>  /**



^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH 11/46] cxl/core: Define a 'struct cxl_endpoint_decoder' for tracking DPA resources
  2022-06-24  2:46 ` [PATCH 11/46] cxl/core: Define a 'struct cxl_endpoint_decoder' for tracking DPA resources Dan Williams
@ 2022-06-28 16:55   ` Jonathan Cameron
  2022-07-10  2:40     ` Dan Williams
  0 siblings, 1 reply; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-28 16:55 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, Ben Widawsky, hch, alison.schofield, nvdimm,
	linux-pci, patches

On Thu, 23 Jun 2022 19:46:21 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> Previously the target routing specifics of switch decoders and platform
> CXL window resource tracking of root decoders were factored out of
> 'struct cxl_decoder'. While switch decoders translate from SPA to
> downstream ports, endpoint decoders translate from SPA to DPA.
> 
> This patch, 3 of 3, adds a 'struct cxl_endpoint_decoder' that tracks an
> endpoint-specific Device Physical Address (DPA) resource. For now this
> just defines ->dpa_res, a follow-on patch will handle requesting DPA
> resource ranges from a device-DPA resource tree.
> 
> Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/core/hdm.c       |   12 +++++++++---
>  drivers/cxl/core/port.c      |   36 +++++++++++++++++++++++++++---------
>  drivers/cxl/cxl.h            |   15 ++++++++++++++-
>  tools/testing/cxl/test/cxl.c |   11 +++++++++--
>  4 files changed, 59 insertions(+), 15 deletions(-)
> 



> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 6dd1e4c57a67..579f2d802396 100644


>  int cxl_decoder_add(struct cxl_decoder *cxld, int *target_map);
> -struct cxl_decoder *cxl_endpoint_decoder_alloc(struct cxl_port *port);
> +struct cxl_endpoint_decoder *cxl_endpoint_decoder_alloc(struct cxl_port *port);
>  int cxl_decoder_add_locked(struct cxl_decoder *cxld, int *target_map);
>  int cxl_decoder_autoremove(struct device *host, struct cxl_decoder *cxld);
>  int cxl_endpoint_autoremove(struct cxl_memdev *cxlmd, struct cxl_port *endpoint);
> diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
> index 68288354b419..f52a5dd69d36 100644
> --- a/tools/testing/cxl/test/cxl.c
> +++ b/tools/testing/cxl/test/cxl.c
> @@ -459,8 +459,15 @@ static int mock_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
>  				cxld = ERR_CAST(cxlsd);
>  			else
>  				cxld = &cxlsd->cxld;
> -		} else
> -			cxld = cxl_endpoint_decoder_alloc(port);
> +		} else {
> +			struct cxl_endpoint_decoder *cxled;
> +
> +			cxled = cxl_endpoint_decoder_alloc(port);
> +			if (IS_ERR(cxled))
> +				cxld = ERR_CAST(cxled);

It's my favourite code pattern to moan about today :)
Same thing: just handle the error here and it'll be easier to read, at the
cost of a few additional lines of code.  There are a few other cases of it
in here.
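
As a rough illustration of the shape meant here, the following userspace
sketch returns as soon as the allocation fails instead of converting the
error and testing it further down. ERR_PTR(), IS_ERR() and PTR_ERR() are
local stand-ins that mimic the kernel helpers, and endpoint_alloc() and
enumerate_one() are hypothetical names, not functions from the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins that mimic the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long err)
{
	return (void *)(intptr_t)err;
}

static inline long PTR_ERR(const void *p)
{
	return (long)(intptr_t)p;
}

static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, trimmed-down decoder types. */
struct decoder { int id; };
struct endpoint_decoder { struct decoder cxld; };

/* Hypothetical allocator: hands back @slot, or -ENOMEM when @fail is set. */
static struct endpoint_decoder *endpoint_alloc(struct endpoint_decoder *slot,
					       int fail)
{
	if (fail)
		return ERR_PTR(-ENOMEM);
	return slot;
}

/*
 * Handle the failure at the call site rather than carrying it forward
 * with ERR_CAST() and checking the base pointer later on.
 */
static int enumerate_one(struct endpoint_decoder *slot, int fail,
			 struct decoder **out)
{
	struct endpoint_decoder *cxled = endpoint_alloc(slot, fail);

	if (IS_ERR(cxled))
		return (int)PTR_ERR(cxled);

	*out = &cxled->cxld;
	return 0;
}
```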


> +			else
> +				cxld = &cxled->cxld;
> +		}
>  		if (IS_ERR(cxld)) {
>  			dev_warn(&port->dev,
>  				 "Failed to allocate the decoder\n");
> 


^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH 12/46] cxl/mem: Convert partition-info to resources
  2022-06-24  2:46 ` [PATCH 12/46] cxl/mem: Convert partition-info to resources Dan Williams
@ 2022-06-28 17:02   ` Jonathan Cameron
  0 siblings, 0 replies; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-28 17:02 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, Ira Weiny, hch, alison.schofield, nvdimm, linux-pci, patches

On Thu, 23 Jun 2022 19:46:29 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> To date the per-device-partition DPA range information has only been
> used for enumeration purposes. In preparation for allocating regions
> from available DPA capacity, convert those ranges into DPA-type resource
> trees.
> 
> With resources and the new add_dpa_res() helper some open coded end
> address calculations and debug prints can be cleaned.
> 
> The 'cxlds->pmem_res' and 'cxlds->ram_res' resources are child resources
> of the total-device DPA space and they in turn will host DPA allocations
> from cxl_endpoint_decoder instances (tracked by cxled->dpa_res).
> 
> Cc: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
LGTM

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH 13/46] cxl/hdm: Require all decoders to be enumerated
  2022-06-24  2:46 ` [PATCH 13/46] cxl/hdm: Require all decoders to be enumerated Dan Williams
@ 2022-06-28 17:04   ` Jonathan Cameron
  0 siblings, 0 replies; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-28 17:04 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, Ben Widawsky, hch, alison.schofield, nvdimm,
	linux-pci, patches

On Thu, 23 Jun 2022 19:46:36 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> From: Ben Widawsky <bwidawsk@kernel.org>
> 
> In preparation for region provisioning all device decoders need to be
> enumerated since DPA allocations are calculated by summing the
> capacities of all decoders in a set. I.e. the programming for decoder[N]
> depends on the state of decoder[N-1], so skipping over decoders that
> fail to initialize prevents accurate DPA accounting.
> 
> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> [djbw: reword changelog]
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Good to see this tidied up; the handling always felt a bit odd.
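
To spell out the dependency described in the changelog, here is a
deliberately simplified userspace sketch. It assumes each decoder's DPA
capacity is already known and ignores skips and partition boundaries;
struct dec_state and dpa_base_for() are made-up names for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-decoder state; 'valid' models a successful init. */
struct dec_state {
	int valid;
	uint64_t dpa_size;
};

/*
 * Compute the DPA base for decoder @n as the sum of the capacities of
 * the decoders before it. Returns 0 on success, or -1 if any earlier
 * decoder failed to initialize, since the running sum is then
 * meaningless.
 */
static int dpa_base_for(const struct dec_state *decs, size_t n,
			uint64_t *base)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++) {
		if (!decs[i].valid)
			return -1;
		sum += decs[i].dpa_size;
	}
	*base = sum;
	return 0;
}
```

With that model, skipping a failed decoder[N-1] leaves decoder[N] with no
trustworthy base, which is why failing the whole enumeration is the
simpler contract.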

> ---
>  drivers/cxl/core/hdm.c |   12 +++---------
>  1 file changed, 3 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 2223d151b61b..c940a4911fee 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -199,7 +199,7 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
>  {
>  	void __iomem *hdm = cxlhdm->regs.hdm_decoder;
>  	struct cxl_port *port = cxlhdm->port;
> -	int i, committed, failed;
> +	int i, committed;
>  	u32 ctrl;
>  
>  	/*
> @@ -219,7 +219,7 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
>  	if (committed != cxlhdm->decoder_count)
>  		msleep(20);
>  
> -	for (i = 0, failed = 0; i < cxlhdm->decoder_count; i++) {
> +	for (i = 0; i < cxlhdm->decoder_count; i++) {
>  		int target_map[CXL_DECODER_MAX_INTERLEAVE] = { 0 };
>  		int rc, target_count = cxlhdm->target_count;
>  		struct cxl_decoder *cxld;
> @@ -250,8 +250,7 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
>  		rc = init_hdm_decoder(port, cxld, target_map, hdm, i);
>  		if (rc) {
>  			put_device(&cxld->dev);
> -			failed++;
> -			continue;
> +			return rc;
>  		}
>  		rc = add_hdm_decoder(port, cxld, target_map);
>  		if (rc) {
> @@ -261,11 +260,6 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
>  		}
>  	}
>  
> -	if (failed == cxlhdm->decoder_count) {
> -		dev_err(&port->dev, "No valid decoders found\n");
> -		return -ENXIO;
> -	}
> -
>  	return 0;
>  }
>  EXPORT_SYMBOL_NS_GPL(devm_cxl_enumerate_decoders, CXL);
> 


^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH 14/46] cxl/hdm: Enumerate allocated DPA
  2022-06-24  2:46 ` [PATCH 14/46] cxl/hdm: Enumerate allocated DPA Dan Williams
@ 2022-06-29 14:43   ` Jonathan Cameron
  2022-07-10  3:03     ` Dan Williams
  0 siblings, 1 reply; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-29 14:43 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, Ben Widawsky, hch, alison.schofield, nvdimm,
	linux-pci, patches

On Thu, 23 Jun 2022 19:46:44 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> In preparation for provisioning CXL regions, add accounting for the DPA
> space consumed by existing regions / decoders. Recall, a CXL region is a
> memory range comprised of one or more endpoint devices contributing a
> mapping of their DPA into HPA space through a decoder.
> 
> Record the DPA ranges covered by committed decoders at initial probe of
> endpoint ports relative to a per-device resource tree of the DPA type
> (pmem or volatile-ram).
> 
> The cxl_dpa_rwsem semaphore is introduced to globally synchronize DPA
> state across all endpoints and their decoders at once. The vast majority
> of DPA operations are reads as region creation is expected to be as rare
> as disk partitioning and volume creation. The device_lock() for this
> synchronization is specifically avoided for concern of entangling with
> sysfs attribute removal.
> 
> Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/core/hdm.c |  148 ++++++++++++++++++++++++++++++++++++++++++++----
>  drivers/cxl/cxl.h      |    2 +
>  drivers/cxl/cxlmem.h   |   13 ++++
>  3 files changed, 152 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index c940a4911fee..daae6e533146 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -7,6 +7,8 @@
>  #include "cxlmem.h"
>  #include "core.h"
>  
> +static DECLARE_RWSEM(cxl_dpa_rwsem);

I've not checked many files, but pci.c has equivalent static defines after
the DOC: entry so for consistency move this below that?


> +
>  /**
>   * DOC: cxl core hdm
>   *
> @@ -128,10 +130,108 @@ struct cxl_hdm *devm_cxl_setup_hdm(struct cxl_port *port)
>  }
>  EXPORT_SYMBOL_NS_GPL(devm_cxl_setup_hdm, CXL);
>  
> +/*
> + * Must be called in a context that synchronizes against this decoder's
> + * port ->remove() callback (like an endpoint decoder sysfs attribute)
> + */
> +static void cxl_dpa_release(void *cxled);
> +static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled, bool remove_action)
> +{
> +	struct cxl_port *port = cxled_to_port(cxled);
> +	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +	struct resource *res = cxled->dpa_res;
> +
> +	lockdep_assert_held_write(&cxl_dpa_rwsem);
> +
> +	if (remove_action)
> +		devm_remove_action(&port->dev, cxl_dpa_release, cxled);

This code organization is more surprising than I'd like. Why not move this to
a wrapper, like devm_kfree() and similar, that does the free now and also
removes the action from the devm list?

static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
{
	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
	struct cxl_dev_state *cxlds = cxlmd->cxlds;
	struct resource *res = cxled->dpa_res;

	if (cxled->skip)
		__release_region(&cxlds->dpa_res, res->start - cxled->skip,
				 cxled->skip);
	cxled->skip = 0;
	__release_region(&cxlds->dpa_res, res->start, resource_size(res));
	cxled->dpa_res = NULL;
}

/*
 * Possibly add some underscores to this name to indicate that it is
 * special in terms of when it can safely be called.
 */
static void devm_cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
{
	struct cxl_port *port = cxled_to_port(cxled);

	lockdep_assert_held_write(&cxl_dpa_rwsem);
	devm_remove_action(&port->dev, cxl_dpa_release, cxled);
	__cxl_dpa_release(cxled);
}

static void cxl_dpa_release(void *cxled)
{
	down_write(&cxl_dpa_rwsem);
	__cxl_dpa_release(cxled);
	up_write(&cxl_dpa_rwsem);
}

> +
> +	if (cxled->skip)
> +		__release_region(&cxlds->dpa_res, res->start - cxled->skip,
> +				 cxled->skip);
> +	cxled->skip = 0;
> +	__release_region(&cxlds->dpa_res, res->start, resource_size(res));
> +	cxled->dpa_res = NULL;
> +}
> +
> +static void cxl_dpa_release(void *cxled)
> +{
> +	down_write(&cxl_dpa_rwsem);
> +	__cxl_dpa_release(cxled, false);
> +	up_write(&cxl_dpa_rwsem);
> +}
> +
> +static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> +			     resource_size_t base, resource_size_t len,
> +			     resource_size_t skip)
> +{
> +	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> +	struct cxl_port *port = cxled_to_port(cxled);
> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +	struct device *dev = &port->dev;
> +	struct resource *res;
> +
> +	lockdep_assert_held_write(&cxl_dpa_rwsem);
> +
> +	if (!len)
> +		return 0;
> +
> +	if (cxled->dpa_res) {
> +		dev_dbg(dev, "decoder%d.%d: existing allocation %pr assigned\n",
> +			port->id, cxled->cxld.id, cxled->dpa_res);
> +		return -EBUSY;
> +	}
> +
> +	if (skip) {
> +		res = __request_region(&cxlds->dpa_res, base - skip, skip,
> +				       dev_name(dev), 0);


An interface that defines skip backwards, as the amount to skip before the
base parameter, is a little odd. Can we rename the base parameter to
something like 'current_top' and then have base = current_top + skip?
The current_top naming is not great though...
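
For reference, the two conventions work out as follows. This is a userspace
sketch of the arithmetic only; struct dpa_range, struct dpa_layout and both
helper functions are made-up names, and it assumes len > 0 and
skip <= base.

```c
#include <stdint.h>

/* Hypothetical inclusive DPA range, for illustration only. */
struct dpa_range {
	uint64_t start;
	uint64_t end;
};

struct dpa_layout {
	struct dpa_range skip;	/* meaningful only when has_skip is set */
	struct dpa_range alloc;
	int has_skip;
};

/*
 * Convention as posted: @base is where the allocation starts and @skip
 * is the amount of DPA reserved immediately below it.
 */
static struct dpa_layout layout_from_base(uint64_t base, uint64_t len,
					  uint64_t skip)
{
	struct dpa_layout l = { 0 };

	if (skip) {
		l.has_skip = 1;
		l.skip.start = base - skip;
		l.skip.end = base - 1;
	}
	l.alloc.start = base;
	l.alloc.end = base + len - 1;
	return l;
}

/*
 * Alternative floated above: pass the current top of allocated DPA and
 * derive base from it, so that skip reads forwards.
 */
static struct dpa_layout layout_from_top(uint64_t current_top, uint64_t len,
					 uint64_t skip)
{
	return layout_from_base(current_top + skip, len, skip);
}
```

Both produce the same pair of ranges; the difference is only which value
the caller has to compute.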



> +		if (!res) {
> +			dev_dbg(dev,
> +				"decoder%d.%d: failed to reserve skip space\n",
> +				port->id, cxled->cxld.id);
> +			return -EBUSY;
> +		}
> +	}
> +	res = __request_region(&cxlds->dpa_res, base, len, dev_name(dev), 0);
> +	if (!res) {
> +		dev_dbg(dev, "decoder%d.%d: failed to reserve allocation\n",
> +			port->id, cxled->cxld.id);
> +		if (skip)
> +			__release_region(&cxlds->dpa_res, base - skip, skip);
> +		return -EBUSY;
> +	}
> +	cxled->dpa_res = res;
> +	cxled->skip = skip;
> +
> +	return 0;
> +}
> +

...


^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH 15/46] cxl/Documentation: List attribute permissions
  2022-06-24  2:46 ` [PATCH 15/46] cxl/Documentation: List attribute permissions Dan Williams
  2022-06-28  3:16   ` Alison Schofield
@ 2022-06-29 14:59   ` Jonathan Cameron
  1 sibling, 0 replies; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-29 14:59 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, Alison Schofield, hch, nvdimm, linux-pci, patches,
	Mauro Carvalho Chehab

On Thu, 23 Jun 2022 19:46:52 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> Clarify the access permission of CXL sysfs attributes in the
> documentation to help development of userspace tooling.
> 
> Reported-by: Alison Schofield <alison.schofield@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Makes sense, though it might be a good idea at some point to standardize
this in some fashion for the automated docs build, e.g.

https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html#abi-sys-bus-cxl-devices-devtype
+CC Mauro in case he thinks it's worth looking at doing for purposes of
his runtime verification scripts...

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>


> ---
>  Documentation/ABI/testing/sysfs-bus-cxl |   81 ++++++++++++++++---------------
>  1 file changed, 41 insertions(+), 40 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 7c2b846521f3..1fd5984b6158 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -57,28 +57,28 @@ Date:		June, 2021
>  KernelVersion:	v5.14
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
> -		CXL device objects export the devtype attribute which mirrors
> -		the same value communicated in the DEVTYPE environment variable
> -		for uevents for devices on the "cxl" bus.
> +		(RO) CXL device objects export the devtype attribute which
> +		mirrors the same value communicated in the DEVTYPE environment
> +		variable for uevents for devices on the "cxl" bus.
>  
>  What:		/sys/bus/cxl/devices/*/modalias
>  Date:		December, 2021
>  KernelVersion:	v5.18
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
> -		CXL device objects export the modalias attribute which mirrors
> -		the same value communicated in the MODALIAS environment variable
> -		for uevents for devices on the "cxl" bus.
> +		(RO) CXL device objects export the modalias attribute which
> +		mirrors the same value communicated in the MODALIAS environment
> +		variable for uevents for devices on the "cxl" bus.
>  
>  What:		/sys/bus/cxl/devices/portX/uport
>  Date:		June, 2021
>  KernelVersion:	v5.14
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
> -		CXL port objects are enumerated from either a platform firmware
> -		device (ACPI0017 and ACPI0016) or PCIe switch upstream port with
> -		CXL component registers. The 'uport' symlink connects the CXL
> -		portX object to the device that published the CXL port
> +		(RO) CXL port objects are enumerated from either a platform
> +		firmware device (ACPI0017 and ACPI0016) or PCIe switch upstream
> +		port with CXL component registers. The 'uport' symlink connects
> +		the CXL portX object to the device that published the CXL port
>  		capability.
>  
>  What:		/sys/bus/cxl/devices/portX/dportY
> @@ -86,20 +86,20 @@ Date:		June, 2021
>  KernelVersion:	v5.14
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
> -		CXL port objects are enumerated from either a platform firmware
> -		device (ACPI0017 and ACPI0016) or PCIe switch upstream port with
> -		CXL component registers. The 'dportY' symlink identifies one or
> -		more downstream ports that the upstream port may target in its
> -		decode of CXL memory resources.  The 'Y' integer reflects the
> -		hardware port unique-id used in the hardware decoder target
> -		list.
> +		(RO) CXL port objects are enumerated from either a platform
> +		firmware device (ACPI0017 and ACPI0016) or PCIe switch upstream
> +		port with CXL component registers. The 'dportY' symlink
> +		identifies one or more downstream ports that the upstream port
> +		may target in its decode of CXL memory resources.  The 'Y'
> +		integer reflects the hardware port unique-id used in the
> +		hardware decoder target list.
>  
>  What:		/sys/bus/cxl/devices/decoderX.Y
>  Date:		June, 2021
>  KernelVersion:	v5.14
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
> -		CXL decoder objects are enumerated from either a platform
> +		(RO) CXL decoder objects are enumerated from either a platform
>  		firmware description, or a CXL HDM decoder register set in a
>  		PCIe device (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder
>  		Capability Structure). The 'X' in decoderX.Y represents the
> @@ -111,42 +111,43 @@ Date:		June, 2021
>  KernelVersion:	v5.14
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
> -		The 'start' and 'size' attributes together convey the physical
> -		address base and number of bytes mapped in the decoder's decode
> -		window. For decoders of devtype "cxl_decoder_root" the address
> -		range is fixed. For decoders of devtype "cxl_decoder_switch" the
> -		address is bounded by the decode range of the cxl_port ancestor
> -		of the decoder's cxl_port, and dynamically updates based on the
> -		active memory regions in that address space.
> +		(RO) The 'start' and 'size' attributes together convey the
> +		physical address base and number of bytes mapped in the
> +		decoder's decode window. For decoders of devtype
> +		"cxl_decoder_root" the address range is fixed. For decoders of
> +		devtype "cxl_decoder_switch" the address is bounded by the
> +		decode range of the cxl_port ancestor of the decoder's cxl_port,
> +		and dynamically updates based on the active memory regions in
> +		that address space.
>  
>  What:		/sys/bus/cxl/devices/decoderX.Y/locked
>  Date:		June, 2021
>  KernelVersion:	v5.14
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
> -		CXL HDM decoders have the capability to lock the configuration
> -		until the next device reset. For decoders of devtype
> -		"cxl_decoder_root" there is no standard facility to unlock them.
> -		For decoders of devtype "cxl_decoder_switch" a secondary bus
> -		reset, of the PCIe bridge that provides the bus for this
> -		decoders uport, unlocks / resets the decoder.
> +		(RO) CXL HDM decoders have the capability to lock the
> +		configuration until the next device reset. For decoders of
> +		devtype "cxl_decoder_root" there is no standard facility to
> +		unlock them.  For decoders of devtype "cxl_decoder_switch" a
> +		secondary bus reset, of the PCIe bridge that provides the bus
> +		for this decoders uport, unlocks / resets the decoder.
>  
>  What:		/sys/bus/cxl/devices/decoderX.Y/target_list
>  Date:		June, 2021
>  KernelVersion:	v5.14
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
> -		Display a comma separated list of the current decoder target
> -		configuration. The list is ordered by the current configured
> -		interleave order of the decoder's dport instances. Each entry in
> -		the list is a dport id.
> +		(RO) Display a comma separated list of the current decoder
> +		target configuration. The list is ordered by the current
> +		configured interleave order of the decoder's dport instances.
> +		Each entry in the list is a dport id.
>  
>  What:		/sys/bus/cxl/devices/decoderX.Y/cap_{pmem,ram,type2,type3}
>  Date:		June, 2021
>  KernelVersion:	v5.14
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
> -		When a CXL decoder is of devtype "cxl_decoder_root", it
> +		(RO) When a CXL decoder is of devtype "cxl_decoder_root", it
>  		represents a fixed memory window identified by platform
>  		firmware. A fixed window may only support a subset of memory
>  		types. The 'cap_*' attributes indicate whether persistent
> @@ -158,8 +159,8 @@ Date:		June, 2021
>  KernelVersion:	v5.14
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
> -		When a CXL decoder is of devtype "cxl_decoder_switch", it can
> -		optionally decode either accelerator memory (type-2) or expander
> -		memory (type-3). The 'target_type' attribute indicates the
> -		current setting which may dynamically change based on what
> +		(RO) When a CXL decoder is of devtype "cxl_decoder_switch", it
> +		can optionally decode either accelerator memory (type-2) or
> +		expander memory (type-3). The 'target_type' attribute indicates
> +		the current setting which may dynamically change based on what
>  		memory regions are activated in this decode hierarchy.
> 



* Re: [PATCH 16/46] cxl/hdm: Add 'mode' attribute to decoder objects
  2022-06-24  2:46 ` [PATCH 16/46] cxl/hdm: Add 'mode' attribute to decoder objects Dan Williams
@ 2022-06-29 15:28   ` Jonathan Cameron
  2022-07-10  3:45     ` Dan Williams
  0 siblings, 1 reply; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-29 15:28 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

On Thu, 23 Jun 2022 19:46:59 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> Recall that the Device Physical Address (DPA) space of a CXL Memory
> Expander is potentially partitioned into a volatile and persistent
> portion. A decoder maps a Host Physical Address (HPA) range to a DPA
> range and that translation depends on the value of all previous (lower
> instance number) decoders before the current one.
> 
> In preparation for allowing dynamic provisioning of regions, decoders
> need an ABI to indicate which DPA partition a decoder targets. This ABI
> needs to be prepared for the possibility that some other agent committed
> and locked a decoder that spans the partition boundary.
> 
> Add 'decoderX.Y/mode' to endpoint decoders that indicates which
> partition 'ram' / 'pmem' the decoder targets, or 'mixed' if the decoder
> currently spans the partition boundary.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

A few trivial things inline though I'm not super keen on it being
introduced RO for just 2 patches...  You could pull forwards
the outline of the store() to avoid that slight oddity, but
I'm not that bothered if it is a pain to do.
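
For illustration only, an outline of what pulling the store() side forward
might look like, reduced to a user-space model of the string parsing. The
names parse_mode() and streq_nl() are hypothetical and not taken from the
series; the only grounded parts are that 'ram' and 'pmem' are the writable
values and that anything else is rejected with -EINVAL.

```c
#include <errno.h>
#include <string.h>

enum decoder_mode { MODE_NONE, MODE_RAM, MODE_PMEM, MODE_MIXED };

/*
 * Rough model of the kernel's sysfs_streq(): compare two strings while
 * ignoring a single trailing newline on either side.
 */
static int streq_nl(const char *a, const char *b)
{
	size_t la = strlen(a), lb = strlen(b);

	if (la && a[la - 1] == '\n')
		la--;
	if (lb && b[lb - 1] == '\n')
		lb--;
	return la == lb && memcmp(a, b, la) == 0;
}

/* Hypothetical helper: map the written string to a mode, or fail. */
static int parse_mode(const char *buf, enum decoder_mode *mode)
{
	if (streq_nl(buf, "ram"))
		*mode = MODE_RAM;
	else if (streq_nl(buf, "pmem"))
		*mode = MODE_PMEM;
	else
		return -EINVAL;	/* 'mixed' and 'none' are read-only states */
	return 0;
}
```

A real mode_store() would pass the parsed value on to the mode-setting
helper and return the number of bytes consumed on success.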

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  Documentation/ABI/testing/sysfs-bus-cxl |   16 ++++++++++++++++
>  drivers/cxl/core/hdm.c                  |   10 ++++++++++
>  drivers/cxl/core/port.c                 |   20 ++++++++++++++++++++
>  drivers/cxl/cxl.h                       |    9 +++++++++
>  4 files changed, 55 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 1fd5984b6158..091459216e11 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -164,3 +164,19 @@ Description:
>  		expander memory (type-3). The 'target_type' attribute indicates
>  		the current setting which may dynamically change based on what
>  		memory regions are activated in this decode hierarchy.
> +
> +

Single blank line used for previous entries. Note this carries to other
later patches.


> +What:		/sys/bus/cxl/devices/decoderX.Y/mode
> +Date:		May, 2022
> +KernelVersion:	v5.20
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(RO) When a CXL decoder is of devtype "cxl_decoder_endpoint" it
> +		translates from a host physical address range, to a device local
> +		address range. Device-local address ranges are further split
> +		into a 'ram' (volatile memory) range and 'pmem' (persistent
> +		memory) range. The 'mode' attribute emits one of 'ram', 'pmem',
> +		'mixed', or 'none'. The 'mixed' indication is for error cases
> +		when a decoder straddles the volatile/persistent partition
> +		boundary, and 'none' indicates the decoder is not actively
> +		decoding, or no DPA allocation policy has been set.
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index daae6e533146..3f929231b822 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -204,6 +204,16 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
>  	cxled->dpa_res = res;
>  	cxled->skip = skip;
>  
> +	if (resource_contains(&cxlds->pmem_res, res))
> +		cxled->mode = CXL_DECODER_PMEM;
> +	else if (resource_contains(&cxlds->ram_res, res))
> +		cxled->mode = CXL_DECODER_RAM;
> +	else {
> +		dev_dbg(dev, "decoder%d.%d: %pr mixed\n", port->id,
> +			cxled->cxld.id, cxled->dpa_res);

Why debug for one case and not the others?
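
As a sketch of the alternative, the classification could log the result
for every outcome. This is a user-space model, not kernel code: struct
span, span_contains() and classify() are hypothetical stand-ins, and
span_contains() only models the range check of resource_contains() (the
kernel helper also compares resource type flags).

```c
#include <stdio.h>

/* Hypothetical stand-in for struct resource; 'end' is inclusive. */
struct span {
	unsigned long long start;
	unsigned long long end;
};

enum decoder_mode { MODE_NONE, MODE_RAM, MODE_PMEM, MODE_MIXED };

/* Range-only model of resource_contains(): is r2 fully inside r1? */
static int span_contains(const struct span *r1, const struct span *r2)
{
	return r1->start <= r2->start && r1->end >= r2->end;
}

static const char *mode_name(enum decoder_mode mode)
{
	switch (mode) {
	case MODE_RAM:
		return "ram";
	case MODE_PMEM:
		return "pmem";
	case MODE_NONE:
		return "none";
	default:
		return "mixed";
	}
}

/* Classify an allocation and emit one debug line whatever the result. */
static enum decoder_mode classify(const struct span *ram,
				  const struct span *pmem,
				  const struct span *res)
{
	enum decoder_mode mode;

	if (span_contains(pmem, res))
		mode = MODE_PMEM;
	else if (span_contains(ram, res))
		mode = MODE_RAM;
	else
		mode = MODE_MIXED;

	fprintf(stderr, "[%#llx-%#llx] %s\n", res->start, res->end,
		mode_name(mode));
	return mode;
}
```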

> +		cxled->mode = CXL_DECODER_MIXED;
> +	}
> +
>  	return 0;
>  }
>  
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index b5f5fb9aa4b7..9d632c8c580b 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -171,6 +171,25 @@ static ssize_t target_list_show(struct device *dev,
>  }
>  static DEVICE_ATTR_RO(target_list);
>  
> +static ssize_t mode_show(struct device *dev, struct device_attribute *attr,
> +			 char *buf)
> +{
> +	struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
> +
> +	switch (cxled->mode) {
> +	case CXL_DECODER_RAM:
> +		return sysfs_emit(buf, "ram\n");
> +	case CXL_DECODER_PMEM:
> +		return sysfs_emit(buf, "pmem\n");
> +	case CXL_DECODER_NONE:
> +		return sysfs_emit(buf, "none\n");
> +	case CXL_DECODER_MIXED:
> +	default:
> +		return sysfs_emit(buf, "mixed\n");
> +	}
> +}
> +static DEVICE_ATTR_RO(mode);
> +
>  static struct attribute *cxl_decoder_base_attrs[] = {
>  	&dev_attr_start.attr,
>  	&dev_attr_size.attr,
> @@ -221,6 +240,7 @@ static const struct attribute_group *cxl_decoder_switch_attribute_groups[] = {
>  
>  static struct attribute *cxl_decoder_endpoint_attrs[] = {
>  	&dev_attr_target_type.attr,
> +	&dev_attr_mode.attr,
>  	NULL,
>  };
>  
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 6832d6d70548..aa223166f7ef 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -241,16 +241,25 @@ struct cxl_decoder {
>  	unsigned long flags;
>  };
>  
> +enum cxl_decoder_mode {
> +	CXL_DECODER_NONE,
> +	CXL_DECODER_RAM,
> +	CXL_DECODER_PMEM,
> +	CXL_DECODER_MIXED,
> +};
> +
>  /**
>   * struct cxl_endpoint_decoder - Endpoint  / SPA to DPA decoder
>   * @cxld: base cxl_decoder_object
>   * @dpa_res: actively claimed DPA span of this decoder
>   * @skip: offset into @dpa_res where @cxld.hpa_range maps
> + * @mode: which memory type / access-mode-partition this decoder targets
>   */
>  struct cxl_endpoint_decoder {
>  	struct cxl_decoder cxld;
>  	struct resource *dpa_res;
>  	resource_size_t skip;
> +	enum cxl_decoder_mode mode;
>  };
>  
>  /**
> 



* Re: [PATCH 17/46] cxl/hdm: Track next decoder to allocate
  2022-06-24  2:47 ` [PATCH 17/46] cxl/hdm: Track next decoder to allocate Dan Williams
@ 2022-06-29 15:31   ` Jonathan Cameron
  2022-07-10  3:55     ` Dan Williams
  2022-07-10 16:34     ` Dan Williams
  0 siblings, 2 replies; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-29 15:31 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

On Thu, 23 Jun 2022 19:47:07 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> The CXL specification enforces that endpoint decoders are committed in
> hw instance id order. In preparation for adding dynamic DPA allocation,
> record the hw instance id in endpoint decoders, and enforce allocations
> to occur in hw instance id order.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

dpa_end isn't a good name given the value isn't a Device Physical Address.
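
To make the point concrete, here is a small user-space model of the
cursor logic in this patch with the field renamed. The name
'last_alloc_id' is only a placeholder chosen for this sketch, as are
struct port_model and the helper names; the behaviour (start at -1,
require cursor + 1 on reserve, decrement on release) follows the diff
below.

```c
#include <errno.h>

/* Hypothetical model of the port-level cursor the patch calls dpa_end. */
struct port_model {
	int last_alloc_id;	/* highest decoder id with an allocation, -1 if none */
};

static void port_init(struct port_model *port)
{
	port->last_alloc_id = -1;
}

/* A reservation is only accepted for the decoder right after the cursor. */
static int reserve_in_order(struct port_model *port, int decoder_id)
{
	if (port->last_alloc_id + 1 != decoder_id)
		return -EBUSY;
	port->last_alloc_id++;
	return 0;
}

/* Releasing an allocation moves the cursor back by one decoder. */
static void release_last(struct port_model *port)
{
	port->last_alloc_id--;
}
```

Read this way, the field is a decoder instance id rather than an address,
which is why a name without 'dpa' in it seems a better fit.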

Otherwise looks fine,

Jonathan

> ---
>  drivers/cxl/core/hdm.c  |   14 ++++++++++++++
>  drivers/cxl/core/port.c |    1 +
>  drivers/cxl/cxl.h       |    2 ++
>  3 files changed, 17 insertions(+)
> 
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 3f929231b822..8805afe63ebf 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -153,6 +153,7 @@ static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled, bool remove_ac
>  	cxled->skip = 0;
>  	__release_region(&cxlds->dpa_res, res->start, resource_size(res));
>  	cxled->dpa_res = NULL;
> +	port->dpa_end--;
>  }
>  
>  static void cxl_dpa_release(void *cxled)
> @@ -183,6 +184,18 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
>  		return -EBUSY;
>  	}
>  
> +	if (port->dpa_end + 1 != cxled->cxld.id) {
> +		/*
> +		 * Assumes alloc and commit order is always in hardware instance
> +		 * order per expectations from 8.2.5.12.20 Committing Decoder
> +		 * Programming that enforce decoder[m] committed before
> +		 * decoder[m+1] commit start.
> +		 */
> +		dev_dbg(dev, "decoder%d.%d: expected decoder%d.%d\n", port->id,
> +			cxled->cxld.id, port->id, port->dpa_end + 1);
> +		return -EBUSY;
> +	}
> +
>  	if (skip) {
>  		res = __request_region(&cxlds->dpa_res, base - skip, skip,
>  				       dev_name(dev), 0);
> @@ -213,6 +226,7 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
>  			cxled->cxld.id, cxled->dpa_res);
>  		cxled->mode = CXL_DECODER_MIXED;
>  	}
> +	port->dpa_end++;
>  
>  	return 0;
>  }
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 9d632c8c580b..54bf032cbcb7 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -485,6 +485,7 @@ static struct cxl_port *cxl_port_alloc(struct device *uport,
>  	port->uport = uport;
>  	port->component_reg_phys = component_reg_phys;
>  	ida_init(&port->decoder_ida);
> +	port->dpa_end = -1;
>  	INIT_LIST_HEAD(&port->dports);
>  	INIT_LIST_HEAD(&port->endpoints);
>  
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index aa223166f7ef..d8edbdaa6208 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -326,6 +326,7 @@ struct cxl_nvdimm {
>   * @dports: cxl_dport instances referenced by decoders
>   * @endpoints: cxl_ep instances, endpoints that are a descendant of this port
>   * @decoder_ida: allocator for decoder ids
> + * @dpa_end: cursor to track highest allocated decoder for allocation ordering

dpa_end not a good name as this isn't a Device Physical Address.

>   * @component_reg_phys: component register capability base address (optional)
>   * @dead: last ep has been removed, force port re-creation
>   * @depth: How deep this port is relative to the root. depth 0 is the root.
> @@ -337,6 +338,7 @@ struct cxl_port {
>  	struct list_head dports;
>  	struct list_head endpoints;
>  	struct ida decoder_ida;
> +	int dpa_end;
>  	resource_size_t component_reg_phys;
>  	bool dead;
>  	unsigned int depth;
> 



* Re: [PATCH 18/46] cxl/hdm: Add support for allocating DPA to an endpoint decoder
  2022-06-24  2:47 ` [PATCH 18/46] cxl/hdm: Add support for allocating DPA to an endpoint decoder Dan Williams
@ 2022-06-29 15:56   ` Jonathan Cameron
  2022-07-10 16:53     ` Dan Williams
  0 siblings, 1 reply; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-29 15:56 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

On Thu, 23 Jun 2022 19:47:18 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> The region provisioning flow will roughly follow a sequence of:
> 
> 1/ Allocate DPA to a set of decoders
> 
> 2/ Allocate HPA to a region
> 
> 3/ Associate decoders with a region and validate that the DPA allocations
>    and topologies match the parameters of the region.
> 
> For now, this change (step 1) arranges for DPA capacity to be allocated
> and deleted from non-committed decoders based on the decoder's mode /
> partition selection. Capacity is allocated from the lowest DPA in the
> partition and any 'pmem' allocation blocks out all remaining ram
> capacity in its 'skip' setting. DPA allocations are enforced in decoder
> instance order. I.e. decoder N + 1 always starts at a higher DPA than
> instance N, and deleting allocations must proceed from the
> highest-instance allocated decoder to the lowest.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

The error value setting in here might save a few lines, but to me it
is less readable than setting rc in each error path.
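
A sketch of the style being suggested, using the mode-setting path as the
example. This is a user-space model under stated assumptions: struct
decoder_model and set_mode() are hypothetical, the locking is reduced to
comments, and the error codes are the ones the patch uses for the same
conditions.

```c
#include <errno.h>
#include <stdbool.h>

enum decoder_mode { MODE_NONE, MODE_RAM, MODE_PMEM, MODE_MIXED };

/* Hypothetical stand-in for the decoder plus its device partition sizes. */
struct decoder_model {
	bool enabled;
	unsigned long long ram_size;
	unsigned long long pmem_size;
	enum decoder_mode mode;
};

static int set_mode(struct decoder_model *d, enum decoder_mode mode)
{
	int rc;

	if (mode != MODE_RAM && mode != MODE_PMEM)
		return -EINVAL;

	/* the kernel version takes cxl_dpa_rwsem for write here */
	if (d->enabled) {
		rc = -EBUSY;	/* error code sits next to its condition */
		goto out;
	}
	if (mode == MODE_PMEM && !d->pmem_size) {
		rc = -ENXIO;
		goto out;
	}
	if (mode == MODE_RAM && !d->ram_size) {
		rc = -ENXIO;
		goto out;
	}

	d->mode = mode;
	rc = 0;
out:
	/* ...and drops the lock here */
	return rc;
}
```

It costs a few extra lines, but no path depends on what rc happened to be
set to earlier in the function.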

> ---
>  Documentation/ABI/testing/sysfs-bus-cxl |   37 +++++++
>  drivers/cxl/core/core.h                 |    7 +
>  drivers/cxl/core/hdm.c                  |  160 +++++++++++++++++++++++++++++++
>  drivers/cxl/core/port.c                 |   73 ++++++++++++++
>  4 files changed, 275 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 091459216e11..85844f9bc00b 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -171,7 +171,7 @@ Date:		May, 2022
>  KernelVersion:	v5.20
>  Contact:	linux-cxl@vger.kernel.org
>  Description:
> -		(RO) When a CXL decoder is of devtype "cxl_decoder_endpoint" it
> +		(RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it
>  		translates from a host physical address range, to a device local
>  		address range. Device-local address ranges are further split
>  		into a 'ram' (volatile memory) range and 'pmem' (persistent
> @@ -180,3 +180,38 @@ Description:
>  		when a decoder straddles the volatile/persistent partition
>  		boundary, and 'none' indicates the decoder is not actively
>  		decoding, or no DPA allocation policy has been set.
> +
> +		'mode' can be written, when the decoder is in the 'disabled'
> +		state, with either 'ram' or 'pmem' to set the boundaries for the
> +		next allocation.
> +

As before, documentation above this in the file only uses single line break between
entries.

> +
> +What:		/sys/bus/cxl/devices/decoderX.Y/dpa_resource
> +Date:		May, 2022
> +KernelVersion:	v5.20
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(RO) When a CXL decoder is of devtype "cxl_decoder_endpoint",
> +		and its 'dpa_size' attribute is non-zero, this attribute
> +		indicates the device physical address (DPA) base address of the
> +		allocation.

Why _resource rather than _base or _start?

> +
> +
> +What:		/sys/bus/cxl/devices/decoderX.Y/dpa_size
> +Date:		May, 2022
> +KernelVersion:	v5.20
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it
> +		translates from a host physical address range, to a device local
> +		address range. The range, base address plus length in bytes, of
> +		DPA allocated to this decoder is conveyed in these 2 attributes.
> +		Allocations can be mutated as long as the decoder is in the
> +		disabled state. A write to 'size' releases the previous DPA

'dpa_size' ?

> +		allocation and then attempts to allocate from the free capacity
> +		in the device partition referred to by 'decoderX.Y/mode'.
> +		Allocate and free requests can only be performed on the highest
> +		instance number disabled decoder with non-zero size. I.e.
> +		allocations are enforced to occur in increasing 'decoderX.Y/id'
> +		order and frees are enforced to occur in decreasing
> +		'decoderX.Y/id' order.
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 1a50c0fc399c..47cf0c286fc3 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -17,6 +17,13 @@ int cxl_send_cmd(struct cxl_memdev *cxlmd, struct cxl_send_command __user *s);
>  void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
>  				   resource_size_t length);
>  
> +int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
> +		     enum cxl_decoder_mode mode);
> +int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size);
> +int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
> +resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled);
> +resource_size_t cxl_dpa_resource(struct cxl_endpoint_decoder *cxled);
> +
>  int cxl_memdev_init(void);
>  void cxl_memdev_exit(void);
>  void cxl_mbox_init(void);
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 8805afe63ebf..ceb4c28abc1b 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -248,6 +248,166 @@ static int cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
>  	return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled);
>  }
>  
> +resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled)
> +{
> +	resource_size_t size = 0;
> +
> +	down_read(&cxl_dpa_rwsem);
> +	if (cxled->dpa_res)
> +		size = resource_size(cxled->dpa_res);
> +	up_read(&cxl_dpa_rwsem);
> +
> +	return size;
> +}
> +
> +resource_size_t cxl_dpa_resource(struct cxl_endpoint_decoder *cxled)

Instinct would be to expect this to return the resource, not the start.
Rename?


> +{
> +	resource_size_t base = -1;
> +
> +	down_read(&cxl_dpa_rwsem);
> +	if (cxled->dpa_res)
> +		base = cxled->dpa_res->start;
> +	up_read(&cxl_dpa_rwsem);
> +
> +	return base;
> +}
> +
> +int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
> +{
> +	int rc = -EBUSY;
> +	struct device *dev = &cxled->cxld.dev;
> +	struct cxl_port *port = to_cxl_port(dev->parent);
> +
> +	down_write(&cxl_dpa_rwsem);
> +	if (!cxled->dpa_res) {
> +		rc = 0;
> +		goto out;
> +	}
> +	if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) {
> +		dev_dbg(dev, "decoder enabled\n");

I'd prefer explicit setting of rc = -EBUSY in the two
'error' paths to make it really clear when looking at these
that they are treated as errors.

> +		goto out;
> +	}
> +	if (cxled->cxld.id != port->dpa_end) {
> +		dev_dbg(dev, "expected decoder%d.%d\n", port->id,
> +			port->dpa_end);
> +		goto out;
> +	}
> +	__cxl_dpa_release(cxled, true);
> +	rc = 0;
> +out:
> +	up_write(&cxl_dpa_rwsem);
> +	return rc;
> +}
> +
> +int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
> +		     enum cxl_decoder_mode mode)
> +{
> +	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +	struct device *dev = &cxled->cxld.dev;
> +	int rc = -EBUSY;

As above, I'd prefer seeing error set in each error path rather
than it being set in a few locations and having to go look
for which value it currently has.  To me having the
error code next to the condition is much easier to follow.

> +
> +	switch (mode) {
> +	case CXL_DECODER_RAM:
> +	case CXL_DECODER_PMEM:
> +		break;
> +	default:
> +		dev_dbg(dev, "unsupported mode: %d\n", mode);
> +		return -EINVAL;
> +	}
> +
> +	down_write(&cxl_dpa_rwsem);
> +	if (cxled->cxld.flags & CXL_DECODER_F_ENABLE)
> +		goto out;
> +	/*
> +	 * Only allow modes that are supported by the current partition
> +	 * configuration
> +	 */
> +	rc = -ENXIO;
> +	if (mode == CXL_DECODER_PMEM && !resource_size(&cxlds->pmem_res)) {
> +		dev_dbg(dev, "no available pmem capacity\n");
> +		goto out;
> +	}
> +	if (mode == CXL_DECODER_RAM && !resource_size(&cxlds->ram_res)) {
> +		dev_dbg(dev, "no available ram capacity\n");
> +		goto out;
> +	}
> +
> +	cxled->mode = mode;
> +	rc = 0;
> +out:
> +	up_write(&cxl_dpa_rwsem);
> +
> +	return rc;
> +}
> +
> +int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
> +{
> +	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> +	resource_size_t free_ram_start, free_pmem_start;
> +	struct cxl_port *port = cxled_to_port(cxled);
> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +	struct device *dev = &cxled->cxld.dev;
> +	resource_size_t start, avail, skip;
> +	struct resource *p, *last;
> +	int rc = -EBUSY;
> +
> +	down_write(&cxl_dpa_rwsem);
> +	if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) {
> +		dev_dbg(dev, "decoder enabled\n");
> +		goto out;


-EBUSY is only used in this path, so it would be clearer to me to push that
setting down into this error path.


> +	}
> +
> +	for (p = cxlds->ram_res.child, last = NULL; p; p = p->sibling)
> +		last = p;
> +	if (last)
> +		free_ram_start = last->end + 1;
> +	else
> +		free_ram_start = cxlds->ram_res.start;
> +
> +	for (p = cxlds->pmem_res.child, last = NULL; p; p = p->sibling)
> +		last = p;
> +	if (last)
> +		free_pmem_start = last->end + 1;
> +	else
> +		free_pmem_start = cxlds->pmem_res.start;
> +
> +	if (cxled->mode == CXL_DECODER_RAM) {
> +		start = free_ram_start;
> +		avail = cxlds->ram_res.end - start + 1;
> +		skip = 0;
> +	} else if (cxled->mode == CXL_DECODER_PMEM) {
> +		resource_size_t skip_start, skip_end;
> +
> +		start = free_pmem_start;
> +		avail = cxlds->pmem_res.end - start + 1;
> +		skip_start = free_ram_start;
> +		skip_end = start - 1;
> +		skip = skip_end - skip_start + 1;
> +	} else {
> +		dev_dbg(dev, "mode not set\n");
> +		rc = -EINVAL;
> +		goto out;
> +	}
> +
> +	if (size > avail) {
> +		dev_dbg(dev, "%pa exceeds available %s capacity: %pa\n", &size,
> +			cxled->mode == CXL_DECODER_RAM ? "ram" : "pmem",
> +			&avail);
> +		rc = -ENOSPC;
> +		goto out;
> +	}
> +
> +	rc = __cxl_dpa_reserve(cxled, start, size, skip);
> +out:
> +	up_write(&cxl_dpa_rwsem);
> +
> +	if (rc)
> +		return rc;
> +
> +	return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled);
> +}
> +
>  static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
>  			    int *target_map, void __iomem *hdm, int which,
>  			    u64 *dpa_base)

>  
> 
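
The start / avail / skip arithmetic in cxl_dpa_alloc() above can be
checked with a small user-space model. Everything here is a sketch:
struct dpa_layout, struct dpa_plan and plan_alloc() are hypothetical
names, and the free-start values are passed in directly instead of being
found by walking the child resources. The formulas mirror the patch as
posted, with skip simplified from (start - 1) - free_ram_start + 1.

```c
#include <errno.h>

enum decoder_mode { MODE_NONE, MODE_RAM, MODE_PMEM, MODE_MIXED };

/* Inclusive address range, standing in for struct resource. */
struct span {
	unsigned long long start;
	unsigned long long end;
};

/* Hypothetical snapshot of the device partitions and their free cursors. */
struct dpa_layout {
	struct span ram;
	struct span pmem;
	unsigned long long free_ram_start;	/* first unallocated ram DPA */
	unsigned long long free_pmem_start;	/* first unallocated pmem DPA */
};

struct dpa_plan {
	unsigned long long start;	/* base of the new allocation */
	unsigned long long avail;	/* bytes left in the partition */
	unsigned long long skip;	/* bytes blocked out ahead of start */
};

static int plan_alloc(const struct dpa_layout *l, enum decoder_mode mode,
		      unsigned long long size, struct dpa_plan *plan)
{
	if (mode == MODE_RAM) {
		plan->start = l->free_ram_start;
		plan->avail = l->ram.end - plan->start + 1;
		plan->skip = 0;
	} else if (mode == MODE_PMEM) {
		plan->start = l->free_pmem_start;
		plan->avail = l->pmem.end - plan->start + 1;
		/* skip spans [free_ram_start, start - 1] */
		plan->skip = plan->start - l->free_ram_start;
	} else {
		return -EINVAL;	/* mode not set */
	}

	if (size > plan->avail)
		return -ENOSPC;
	return 0;
}
```

For a device with 256MB of ram followed by 256MB of pmem and half of the
ram already allocated, a first pmem allocation starts at 0x10000000 with
a skip of 0x08000000, i.e. the remaining ram capacity.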



* Re: [PATCH 19/46] cxl/debug: Move debugfs init to cxl_core_init()
  2022-06-24  2:47 ` [PATCH 19/46] cxl/debug: Move debugfs init to cxl_core_init() Dan Williams
@ 2022-06-29 15:58   ` Jonathan Cameron
  0 siblings, 0 replies; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-29 15:58 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

On Thu, 23 Jun 2022 19:47:26 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> In preparation for a new cxl debugfs file, move 'cxl' directory
> establishment and teardown to the core and let subsequent init routines
> reference that setup.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

LGTM

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>



* Re: [PATCH 20/46] cxl/mem: Add a debugfs version of 'iomem' for DPA, 'dpamem'
  2022-06-24  2:47 ` [PATCH 20/46] cxl/mem: Add a debugfs version of 'iomem' for DPA, 'dpamem' Dan Williams
@ 2022-06-29 16:08   ` Jonathan Cameron
  2022-07-10 17:09     ` Dan Williams
  0 siblings, 1 reply; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-29 16:08 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

On Thu, 23 Jun 2022 19:47:33 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> Dump the device-physical-address map for a CXL expander in /proc/iomem
> style format. E.g.:
> 
>   cat /sys/kernel/debug/cxl/mem1/dpamem
>   00000000-0fffffff : ram
>   10000000-1fffffff : pmem

Nice in general, but...

When I checked what this looked like on my test setup, I saw:
00000000-0ffffff : pmem
  00000000-0fffff : endpoint3

Seems odd to see an endpoint nested below a pmem.  Wrong name somewhere
in a later patch. I'd expect that to be a decoder rather than the endpoint...
If I spot where that comes from whilst reviewing I'll call it out, but
didn't want to forget to raise it.
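
For reference, the line format used by __cxl_dpa_debug() in the diff
below can be reproduced in user space. This is only a sketch:
dpa_format_line() is a hypothetical wrapper that writes into a buffer
instead of a seq_file, but the format string is the one from the patch.
Each nesting level indents by two spaces and each address is padded to at
least eight hex digits.

```c
#include <stdio.h>

/*
 * Hypothetical user-space equivalent of one __cxl_dpa_debug() line.
 * Returns the number of characters that make up the full line.
 */
static int dpa_format_line(char *buf, size_t len, unsigned long long start,
			   unsigned long long end, const char *name,
			   int depth)
{
	return snprintf(buf, len, "%*s%08llx-%08llx : %s\n", depth * 2, "",
			start, end, name);
}
```

The name column is whatever the resource was registered with, so a child
labelled after the endpoint device would come from the name passed at
reservation time rather than from this printing code.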

This patch is fine.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/core/core.h |    1 -
>  drivers/cxl/core/hdm.c  |   23 +++++++++++++++++++++++
>  drivers/cxl/core/port.c |    1 +
>  drivers/cxl/cxlmem.h    |    4 ++++
>  drivers/cxl/mem.c       |   23 +++++++++++++++++++++++
>  5 files changed, 51 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index c242fa02d5e8..472ec9cb1018 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -24,7 +24,6 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
>  resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled);
>  resource_size_t cxl_dpa_resource(struct cxl_endpoint_decoder *cxled);
>  
> -struct dentry *cxl_debugfs_create_dir(const char *dir);
>  int cxl_memdev_init(void);
>  void cxl_memdev_exit(void);
>  void cxl_mbox_init(void);
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index ceb4c28abc1b..c0164f9b2195 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -1,6 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0-only
>  /* Copyright(c) 2022 Intel Corporation. All rights reserved. */
>  #include <linux/io-64-nonatomic-hi-lo.h>
> +#include <linux/seq_file.h>
>  #include <linux/device.h>
>  #include <linux/delay.h>
>  
> @@ -248,6 +249,28 @@ static int cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
>  	return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled);
>  }
>  
> +static void __cxl_dpa_debug(struct seq_file *file, struct resource *r, int depth)
> +{
> +	unsigned long long start = r->start, end = r->end;
> +
> +	seq_printf(file, "%*s%08llx-%08llx : %s\n", depth * 2, "", start, end,
> +		   r->name);
> +}
> +
> +void cxl_dpa_debug(struct seq_file *file, struct cxl_dev_state *cxlds)
> +{
> +	struct resource *p1, *p2;
> +
> +	down_read(&cxl_dpa_rwsem);
> +	for (p1 = cxlds->dpa_res.child; p1; p1 = p1->sibling) {
> +		__cxl_dpa_debug(file, p1, 0);
> +		for (p2 = p1->child; p2; p2 = p2->sibling)
> +			__cxl_dpa_debug(file, p2, 1);
> +	}
> +	up_read(&cxl_dpa_rwsem);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_dpa_debug, CXL);
> +
>  resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled)
>  {
>  	resource_size_t size = 0;
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index f02b7470c20e..4e4e26ca507c 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1702,6 +1702,7 @@ struct dentry *cxl_debugfs_create_dir(const char *dir)
>  {
>  	return debugfs_create_dir(dir, cxl_debugfs);
>  }
> +EXPORT_SYMBOL_NS_GPL(cxl_debugfs_create_dir, CXL);
>  
>  static __init int cxl_core_init(void)
>  {
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index b4e5ed9eabc9..db9c889f42ab 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -385,4 +385,8 @@ struct cxl_hdm {
>  	unsigned int interleave_mask;
>  	struct cxl_port *port;
>  };
> +
> +struct seq_file;
> +struct dentry *cxl_debugfs_create_dir(const char *dir);
> +void cxl_dpa_debug(struct seq_file *file, struct cxl_dev_state *cxlds);
>  #endif /* __CXL_MEM_H__ */
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index a979d0b484d5..7513bea55145 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -1,5 +1,6 @@
>  // SPDX-License-Identifier: GPL-2.0-only
>  /* Copyright(c) 2022 Intel Corporation. All rights reserved. */
> +#include <linux/debugfs.h>
>  #include <linux/device.h>
>  #include <linux/module.h>
>  #include <linux/pci.h>
> @@ -56,10 +57,26 @@ static void enable_suspend(void *data)
>  	cxl_mem_active_dec();
>  }
>  
> +static void remove_debugfs(void *dentry)
> +{
> +	debugfs_remove_recursive(dentry);
> +}
> +
> +static int cxl_mem_dpa_show(struct seq_file *file, void *data)
> +{
> +	struct device *dev = file->private;
> +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> +
> +	cxl_dpa_debug(file, cxlmd->cxlds);
> +
> +	return 0;
> +}
> +
>  static int cxl_mem_probe(struct device *dev)
>  {
>  	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
>  	struct cxl_port *parent_port;
> +	struct dentry *dentry;
>  	int rc;
>  
>  	/*
> @@ -73,6 +90,12 @@ static int cxl_mem_probe(struct device *dev)
>  	if (work_pending(&cxlmd->detach_work))
>  		return -EBUSY;
>  
> +	dentry = cxl_debugfs_create_dir(dev_name(dev));
> +	debugfs_create_devm_seqfile(dev, "dpamem", dentry, cxl_mem_dpa_show);
> +	rc = devm_add_action_or_reset(dev, remove_debugfs, dentry);
> +	if (rc)
> +		return rc;
> +
>  	rc = devm_cxl_enumerate_ports(cxlmd);
>  	if (rc)
>  		return rc;
> 



* Re: [PATCH 21/46] tools/testing/cxl: Move cxl_test resources to the top of memory
  2022-06-24  2:47 ` [PATCH 21/46] tools/testing/cxl: Move cxl_test resources to the top of memory Dan Williams
@ 2022-06-29 16:11   ` Jonathan Cameron
  2022-07-10 17:19     ` Dan Williams
  0 siblings, 1 reply; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-29 16:11 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

On Thu, 23 Jun 2022 19:47:40 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> A recent QEMU upgrade resulted in collisions between QEMU's chosen
> location for PCI MMIO and cxl_test's fake address location for emulated
> CXL purposes. This was great for testing resource collisions, but not so
> great for continuing to test the nominal cases. Move cxl_test to the
> top-of-memory where it is less likely to collide with other resources.
*snigger*

Seems reasonable, though I'm sure someone else will have the same
idea for some other usecase and we'll keep moving this around...
Ah well.
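
The new base is just "last 64G below the end of the iomem space". A tiny
user-space check of that arithmetic, assuming a hypothetical helper
mock_pool_base() and with iomem_end standing in for iomem_resource.end
(an inclusive last address):

```c
/* Same value as the kernel's SZ_64G, spelled out for user space. */
#define SZ_64G (64ULL << 30)

/* Base of a 64G window that ends exactly at iomem_end (inclusive). */
static unsigned long long mock_pool_base(unsigned long long iomem_end)
{
	return iomem_end + 1 - SZ_64G;
}
```

With a 46-bit physical address space the window starts at 2^46 - 64G.
Because the arithmetic is unsigned, it also holds if the end is the
all-ones value, where the "+ 1" wraps before the subtraction.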

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  tools/testing/cxl/test/cxl.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
> index f52a5dd69d36..b6e6bc02a507 100644
> --- a/tools/testing/cxl/test/cxl.c
> +++ b/tools/testing/cxl/test/cxl.c
> @@ -632,7 +632,8 @@ static __init int cxl_test_init(void)
>  		goto err_gen_pool_create;
>  	}
>  
> -	rc = gen_pool_add(cxl_mock_pool, SZ_512G, SZ_64G, NUMA_NO_NODE);
> +	rc = gen_pool_add(cxl_mock_pool, iomem_resource.end + 1 - SZ_64G,
> +			  SZ_64G, NUMA_NO_NODE);
>  	if (rc)
>  		goto err_gen_pool_add;
>  
> 



* Re: [PATCH 22/46] tools/testing/cxl: Expand CFMWS windows
  2022-06-24  2:47 ` [PATCH 22/46] tools/testing/cxl: Expand CFMWS windows Dan Williams
@ 2022-06-29 16:14   ` Jonathan Cameron
  0 siblings, 0 replies; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-29 16:14 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

On Thu, 23 Jun 2022 19:47:47 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> For the x2 host-bridge interleave windows, allow for a
> x8-endpoint-interleave configuration per memory-type with each device
> contributing the minimum 256MB extent. Similarly, for the x1 host-bridge
> interleave windows, allow for a x4-endpoint-interleave configuration per
> memory-type.
> 
> Bump up the number of decoders per-port to support hosting 8 regions.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Hmm. I should get around to adding multiple decoders to the programmable
bits of the QEMU emulation to give us more flexibility.  Mind you
volatile memory support would probably also be good ;)

Jonathan
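
As a sanity check on the numbers, the new window sizes follow from
(endpoint interleave ways) x (256MB minimum extent per device). A small
userspace sketch of that arithmetic is below. cfmws_window_size() is a
made-up name, and SZ_256M is defined here as a plain int constant, which
is my recollection of how linux/sizes.h defines it (treat that as an
assumption).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror linux/sizes.h: a plain int constant. */
#define SZ_256M 0x10000000

/*
 * Hypothetical helper: bytes of window needed so that each of
 * `endpoint_ways` devices can contribute one 256MB extent.
 */
static uint64_t cfmws_window_size(unsigned int endpoint_ways)
{
	assert(endpoint_ways > 0);

	/* Widen before multiplying: 0x10000000 * 8 overflows a 32-bit int. */
	return (uint64_t)SZ_256M * endpoint_ways;
}
```

The x4 case gives 1GB and the x8 case gives 2GB. The x8 product does not
fit in a 32-bit int, which is presumably why the patch multiplies by 8UL
rather than 8.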

> ---
>  tools/testing/cxl/test/cxl.c |   10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
> index b6e6bc02a507..599326796b83 100644
> --- a/tools/testing/cxl/test/cxl.c
> +++ b/tools/testing/cxl/test/cxl.c
> @@ -14,7 +14,7 @@
>  #define NR_CXL_HOST_BRIDGES 2
>  #define NR_CXL_ROOT_PORTS 2
>  #define NR_CXL_SWITCH_PORTS 2
> -#define NR_CXL_PORT_DECODERS 2
> +#define NR_CXL_PORT_DECODERS 8
>  
>  static struct platform_device *cxl_acpi;
>  static struct platform_device *cxl_host_bridge[NR_CXL_HOST_BRIDGES];
> @@ -118,7 +118,7 @@ static struct {
>  			.restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
>  					ACPI_CEDT_CFMWS_RESTRICT_VOLATILE,
>  			.qtg_id = 0,
> -			.window_size = SZ_256M,
> +			.window_size = SZ_256M * 4UL,
>  		},
>  		.target = { 0 },
>  	},
> @@ -133,7 +133,7 @@ static struct {
>  			.restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
>  					ACPI_CEDT_CFMWS_RESTRICT_VOLATILE,
>  			.qtg_id = 1,
> -			.window_size = SZ_256M * 2,
> +			.window_size = SZ_256M * 8UL,
>  		},
>  		.target = { 0, 1, },
>  	},
> @@ -148,7 +148,7 @@ static struct {
>  			.restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
>  					ACPI_CEDT_CFMWS_RESTRICT_PMEM,
>  			.qtg_id = 2,
> -			.window_size = SZ_256M,
> +			.window_size = SZ_256M * 4UL,
>  		},
>  		.target = { 0 },
>  	},
> @@ -163,7 +163,7 @@ static struct {
>  			.restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
>  					ACPI_CEDT_CFMWS_RESTRICT_PMEM,
>  			.qtg_id = 3,
> -			.window_size = SZ_256M * 2,
> +			.window_size = SZ_256M * 8UL,
>  		},
>  		.target = { 0, 1, },
>  	},
> 



* Re: [PATCH 23/46] tools/testing/cxl: Add partition support
  2022-06-24  2:47 ` [PATCH 23/46] tools/testing/cxl: Add partition support Dan Williams
@ 2022-06-29 16:20   ` Jonathan Cameron
  0 siblings, 0 replies; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-29 16:20 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

On Thu, 23 Jun 2022 19:47:54 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> In support of testing DPA allocation mechanisms in the CXL core, the
> cxl_test environment needs to support establishing and retrieving the
> pmem partition boundary.
> 
> Replace the platform_device_add_resources() method for delineating DPA
> within an endpoint with an emulated DEV_SIZE amount of partitionable
> capacity. Set DEV_SIZE such that an endpoint has enough capacity to
> simultaneously participate in 8 distinct regions.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
FWIW

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
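
To make the capacity numbers concrete, here is a userspace sketch of what
the mocked IDENTIFY and GET_PARTITION_INFO payloads work out to in
capacity units. DEV_SIZE and the even volatile/persistent split come from
the patch. CAPACITY_MULTIPLIER being 256MB is my assumption about
CXL_CAPACITY_MULTIPLIER, and struct mock_caps / mock_caps_in_units() are
hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_256M (256ULL << 20)
#define SZ_2G   (2ULL << 30)

#define DEV_SIZE            SZ_2G   /* from the patch */
#define CAPACITY_MULTIPLIER SZ_256M /* assumption: matches CXL_CAPACITY_MULTIPLIER */

/* Hypothetical container for the values, all in multiplier units. */
struct mock_caps {
	uint64_t total;
	uint64_t partition_align;
	uint64_t active_volatile;
	uint64_t active_persistent;
};

static struct mock_caps mock_caps_in_units(void)
{
	struct mock_caps caps = {
		.total             = DEV_SIZE / CAPACITY_MULTIPLIER,
		.partition_align   = SZ_256M / CAPACITY_MULTIPLIER,
		.active_volatile   = DEV_SIZE / 2 / CAPACITY_MULTIPLIER,
		.active_persistent = DEV_SIZE / 2 / CAPACITY_MULTIPLIER,
	};

	/* The two partitions should account for the whole device. */
	assert(caps.active_volatile + caps.active_persistent == caps.total);
	return caps;
}
```

Under that assumption the device reports 8 units in total, split 4/4, so
a device handing out one 256MB extent per region can sit in 8 regions at
once, matching the commit message.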

> ---
>  drivers/cxl/core/mbox.c      |    7 +-----
>  drivers/cxl/cxlmem.h         |    7 ++++++
>  tools/testing/cxl/test/cxl.c |   40 +--------------------------------
>  tools/testing/cxl/test/mem.c |   51 ++++++++++++++++++++++--------------------
>  4 files changed, 36 insertions(+), 69 deletions(-)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index dd438ca12dcd..40e3ccb2bf3e 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -716,12 +716,7 @@ EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
>   */
>  static int cxl_mem_get_partition_info(struct cxl_dev_state *cxlds)
>  {
> -	struct cxl_mbox_get_partition_info {
> -		__le64 active_volatile_cap;
> -		__le64 active_persistent_cap;
> -		__le64 next_volatile_cap;
> -		__le64 next_persistent_cap;
> -	} __packed pi;
> +	struct cxl_mbox_get_partition_info pi;
>  	int rc;
>  
>  	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_PARTITION_INFO, NULL, 0,
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index db9c889f42ab..eee96016c3c7 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -314,6 +314,13 @@ struct cxl_mbox_identify {
>  	u8 qos_telemetry_caps;
>  } __packed;
>  
> +struct cxl_mbox_get_partition_info {
> +	__le64 active_volatile_cap;
> +	__le64 active_persistent_cap;
> +	__le64 next_volatile_cap;
> +	__le64 next_persistent_cap;
> +} __packed;
> +
>  struct cxl_mbox_get_lsa {
>  	__le32 offset;
>  	__le32 length;
> diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
> index 599326796b83..c396f20a57dd 100644
> --- a/tools/testing/cxl/test/cxl.c
> +++ b/tools/testing/cxl/test/cxl.c
> @@ -582,44 +582,6 @@ static void mock_companion(struct acpi_device *adev, struct device *dev)
>  #define SZ_512G (SZ_64G * 8)
>  #endif
>  
> -static struct platform_device *alloc_memdev(int id)
> -{
> -	struct resource res[] = {
> -		[0] = {
> -			.flags = IORESOURCE_MEM,
> -		},
> -		[1] = {
> -			.flags = IORESOURCE_MEM,
> -			.desc = IORES_DESC_PERSISTENT_MEMORY,
> -		},
> -	};
> -	struct platform_device *pdev;
> -	int i, rc;
> -
> -	for (i = 0; i < ARRAY_SIZE(res); i++) {
> -		struct cxl_mock_res *r = alloc_mock_res(SZ_256M);
> -
> -		if (!r)
> -			return NULL;
> -		res[i].start = r->range.start;
> -		res[i].end = r->range.end;
> -	}
> -
> -	pdev = platform_device_alloc("cxl_mem", id);
> -	if (!pdev)
> -		return NULL;
> -
> -	rc = platform_device_add_resources(pdev, res, ARRAY_SIZE(res));
> -	if (rc)
> -		goto err;
> -
> -	return pdev;
> -
> -err:
> -	platform_device_put(pdev);
> -	return NULL;
> -}
> -
>  static __init int cxl_test_init(void)
>  {
>  	int rc, i;
> @@ -722,7 +684,7 @@ static __init int cxl_test_init(void)
>  		struct platform_device *dport = cxl_switch_dport[i];
>  		struct platform_device *pdev;
>  
> -		pdev = alloc_memdev(i);
> +		pdev = platform_device_alloc("cxl_mem", i);
>  		if (!pdev)
>  			goto err_mem;
>  		pdev->dev.parent = &dport->dev;
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index b81c90715fe8..aa2df3a15051 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -10,6 +10,7 @@
>  #include <cxlmem.h>
>  
>  #define LSA_SIZE SZ_128K
> +#define DEV_SIZE SZ_2G
>  #define EFFECT(x) (1U << x)
>  
>  static struct cxl_cel_entry mock_cel[] = {
> @@ -25,6 +26,10 @@ static struct cxl_cel_entry mock_cel[] = {
>  		.opcode = cpu_to_le16(CXL_MBOX_OP_GET_LSA),
>  		.effect = cpu_to_le16(0),
>  	},
> +	{
> +		.opcode = cpu_to_le16(CXL_MBOX_OP_GET_PARTITION_INFO),
> +		.effect = cpu_to_le16(0),
> +	},
>  	{
>  		.opcode = cpu_to_le16(CXL_MBOX_OP_SET_LSA),
>  		.effect = cpu_to_le16(EFFECT(1) | EFFECT(2)),
> @@ -97,42 +102,37 @@ static int mock_get_log(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
>  
>  static int mock_id(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
>  {
> -	struct platform_device *pdev = to_platform_device(cxlds->dev);
>  	struct cxl_mbox_identify id = {
>  		.fw_revision = { "mock fw v1 " },
>  		.lsa_size = cpu_to_le32(LSA_SIZE),
> -		/* FIXME: Add partition support */
> -		.partition_align = cpu_to_le64(0),
> +		.partition_align =
> +			cpu_to_le64(SZ_256M / CXL_CAPACITY_MULTIPLIER),
> +		.total_capacity =
> +			cpu_to_le64(DEV_SIZE / CXL_CAPACITY_MULTIPLIER),
>  	};
> -	u64 capacity = 0;
> -	int i;
>  
>  	if (cmd->size_out < sizeof(id))
>  		return -EINVAL;
>  
> -	for (i = 0; i < 2; i++) {
> -		struct resource *res;
> -
> -		res = platform_get_resource(pdev, IORESOURCE_MEM, i);
> -		if (!res)
> -			break;
> -
> -		capacity += resource_size(res) / CXL_CAPACITY_MULTIPLIER;
> +	memcpy(cmd->payload_out, &id, sizeof(id));
>  
> -		if (le64_to_cpu(id.partition_align))
> -			continue;
> +	return 0;
> +}
>  
> -		if (res->desc == IORES_DESC_PERSISTENT_MEMORY)
> -			id.persistent_capacity = cpu_to_le64(
> -				resource_size(res) / CXL_CAPACITY_MULTIPLIER);
> -		else
> -			id.volatile_capacity = cpu_to_le64(
> -				resource_size(res) / CXL_CAPACITY_MULTIPLIER);
> -	}
> +static int mock_partition_info(struct cxl_dev_state *cxlds,
> +			       struct cxl_mbox_cmd *cmd)
> +{
> +	struct cxl_mbox_get_partition_info pi = {
> +		.active_volatile_cap =
> +			cpu_to_le64(DEV_SIZE / 2 / CXL_CAPACITY_MULTIPLIER),
> +		.active_persistent_cap =
> +			cpu_to_le64(DEV_SIZE / 2 / CXL_CAPACITY_MULTIPLIER),
> +	};
>  
> -	id.total_capacity = cpu_to_le64(capacity);
> +	if (cmd->size_out < sizeof(pi))
> +		return -EINVAL;
>  
> -	memcpy(cmd->payload_out, &id, sizeof(id));
> +	memcpy(cmd->payload_out, &pi, sizeof(pi));
>  
>  	return 0;
>  }
> @@ -221,6 +221,9 @@ static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *
>  	case CXL_MBOX_OP_GET_LSA:
>  		rc = mock_get_lsa(cxlds, cmd);
>  		break;
> +	case CXL_MBOX_OP_GET_PARTITION_INFO:
> +		rc = mock_partition_info(cxlds, cmd);
> +		break;
>  	case CXL_MBOX_OP_SET_LSA:
>  		rc = mock_set_lsa(cxlds, cmd);
>  		break;
> 



* Re: [PATCH 24/46] tools/testing/cxl: Fix decoder default state
  2022-06-24  2:48 ` [PATCH 24/46] tools/testing/cxl: Fix decoder default state Dan Williams
@ 2022-06-29 16:22   ` Jonathan Cameron
  2022-07-10 17:33     ` Dan Williams
  0 siblings, 1 reply; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-29 16:22 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

On Thu, 23 Jun 2022 19:48:01 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> The 'enabled' state is reserved for committed decoders. By default,
> cxl_test decoders are uncommitted at init time.
> 
> Fixes: 7c7d68db0254 ("tools/testing/cxl: Enumerate mock decoders")
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Whilst sanity checking this I notice we have
CXL_DECODER_F_MASK but never use it. Might be worth dropping...

For this

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  tools/testing/cxl/test/cxl.c |    1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
> index c396f20a57dd..51d517fa62ee 100644
> --- a/tools/testing/cxl/test/cxl.c
> +++ b/tools/testing/cxl/test/cxl.c
> @@ -479,7 +479,6 @@ static int mock_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
>  			.end = -1,
>  		};
>  
> -		cxld->flags = CXL_DECODER_F_ENABLE;
>  		cxld->interleave_ways = min_not_zero(target_count, 1);
>  		cxld->interleave_granularity = SZ_4K;
>  		cxld->target_type = CXL_DECODER_EXPANDER;
> 



* Re: [PATCH 25/46] cxl/port: Record dport in endpoint references
  2022-06-24  2:48 ` [PATCH 25/46] cxl/port: Record dport in endpoint references Dan Williams
@ 2022-06-29 16:49   ` Jonathan Cameron
  2022-07-10 18:40     ` Dan Williams
  0 siblings, 1 reply; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-29 16:49 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

On Thu, 23 Jun 2022 19:48:07 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> Recall that the primary role of the cxl_mem driver is to probe if the
> given endpoint is connected to a CXL port topology. In that process it
> walks its device ancestry to its PCI root port. If that root port is
> also a CXL root port then the probe process adds cxl_port object
> instances at each switch in the path between the root and the endpoint. As
> those cxl_port instances are added, or if a previous enumeration
> attempt already created the port a 'struct cxl_ep' instance is
port, a 

would make this more readable.

> registered with that port to track the endpoints interested in that
> port.
> 
> At the time the cxl_ep is registered the downstream egress path from the
> port to the endpoint is known. Take the opportunity to record that
> information as it will be needed for dynamic programming of decoder
> targets during region provisioning.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Otherwise, one inline comment about a function name that doesn't reflect
what the function does.

Jonathan
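
For readers skimming the diff, the shape of the lookup change is a match
callback that also hands back the matching dport through an optional
out-parameter in its context. A stripped-down userspace sketch of that
pattern is below. Every toy_* name is hypothetical and the array walk
stands in for the bus iteration, so this is not the driver code, just the
idea.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for cxl_dport / cxl_port. */
struct toy_dport {
	int dev_id;
};

struct toy_port {
	struct toy_dport *dports;
	size_t nr_dports;
};

struct toy_find_ctx {
	int dport_dev_id;
	struct toy_dport **dport; /* optional out-parameter, may be NULL */
};

static struct toy_dport *toy_find_dport(struct toy_port *port, int dev_id)
{
	for (size_t i = 0; i < port->nr_dports; i++)
		if (port->dports[i].dev_id == dev_id)
			return &port->dports[i];
	return NULL;
}

/* Match callback: reports a hit and, if asked, which dport matched. */
static int toy_match_port_by_dport(struct toy_port *port,
				   const struct toy_find_ctx *ctx)
{
	struct toy_dport *dport = toy_find_dport(port, ctx->dport_dev_id);

	if (ctx->dport)
		*ctx->dport = dport;
	return dport != NULL;
}

static struct toy_port *toy_find_port(struct toy_port *ports, size_t nr_ports,
				      int dev_id, struct toy_dport **dport)
{
	struct toy_find_ctx ctx = {
		.dport_dev_id = dev_id,
		.dport = dport,
	};

	for (size_t i = 0; i < nr_ports; i++)
		if (toy_match_port_by_dport(&ports[i], &ctx))
			return &ports[i];
	return NULL;
}
```

Callers that only want the port pass NULL for the out-parameter. Note
that in this sketch a failed search over a non-empty array leaves *dport
set to NULL, because every non-matching port overwrites it.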

> ---
>  drivers/cxl/core/port.c |   52 ++++++++++++++++++++++++++++++++---------------
>  drivers/cxl/cxl.h       |    2 ++
>  2 files changed, 37 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 4e4e26ca507c..c54e1dbf92cb 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -866,8 +866,9 @@ static struct cxl_ep *find_ep(struct cxl_port *port, struct device *ep_dev)
>  	return NULL;
>  }
>  
> -static int add_ep(struct cxl_port *port, struct cxl_ep *new)
> +static int add_ep(struct cxl_ep *new)
>  {
> +	struct cxl_port *port = new->dport->port;
>  	struct cxl_ep *dup;
>  
>  	device_lock(&port->dev);
> @@ -885,14 +886,14 @@ static int add_ep(struct cxl_port *port, struct cxl_ep *new)
>  
>  /**
>   * cxl_add_ep - register an endpoint's interest in a port
> - * @port: a port in the endpoint's topology ancestry
> + * @dport: the dport that routes to @ep_dev
>   * @ep_dev: device representing the endpoint
>   *
>   * Intermediate CXL ports are scanned based on the arrival of endpoints.
>   * When those endpoints depart the port can be destroyed once all
>   * endpoints that care about that port have been removed.
>   */
> -static int cxl_add_ep(struct cxl_port *port, struct device *ep_dev)
> +static int cxl_add_ep(struct cxl_dport *dport, struct device *ep_dev)
>  {
>  	struct cxl_ep *ep;
>  	int rc;
> @@ -903,8 +904,9 @@ static int cxl_add_ep(struct cxl_port *port, struct device *ep_dev)
>  
>  	INIT_LIST_HEAD(&ep->list);
>  	ep->ep = get_device(ep_dev);
> +	ep->dport = dport;
>  
> -	rc = add_ep(port, ep);
> +	rc = add_ep(ep);
>  	if (rc)
>  		cxl_ep_release(ep);
>  	return rc;
> @@ -913,11 +915,13 @@ static int cxl_add_ep(struct cxl_port *port, struct device *ep_dev)
>  struct cxl_find_port_ctx {
>  	const struct device *dport_dev;
>  	const struct cxl_port *parent_port;
> +	struct cxl_dport **dport;
>  };
>  
>  static int match_port_by_dport(struct device *dev, const void *data)
>  {

This seems a little oddly named for a function that 'returns'
the dport via ctx when a match is found.


>  	const struct cxl_find_port_ctx *ctx = data;
> +	struct cxl_dport *dport;
>  	struct cxl_port *port;
>  
>  	if (!is_cxl_port(dev))
> @@ -926,7 +930,10 @@ static int match_port_by_dport(struct device *dev, const void *data)
>  		return 0;
>  
>  	port = to_cxl_port(dev);
> -	return cxl_find_dport_by_dev(port, ctx->dport_dev) != NULL;
> +	dport = cxl_find_dport_by_dev(port, ctx->dport_dev);
> +	if (ctx->dport)
> +		*ctx->dport = dport;
> +	return dport != NULL;
>  }
>  
>  static struct cxl_port *__find_cxl_port(struct cxl_find_port_ctx *ctx)
> @@ -942,24 +949,32 @@ static struct cxl_port *__find_cxl_port(struct cxl_find_port_ctx *ctx)
>  	return NULL;
>  }
>  
> -static struct cxl_port *find_cxl_port(struct device *dport_dev)
> +static struct cxl_port *find_cxl_port(struct device *dport_dev,
> +				      struct cxl_dport **dport)
>  {
>  	struct cxl_find_port_ctx ctx = {
>  		.dport_dev = dport_dev,
> +		.dport = dport,
>  	};
> +	struct cxl_port *port;
>  
> -	return __find_cxl_port(&ctx);
> +	port = __find_cxl_port(&ctx);
> +	return port;
>  }
>  
>  static struct cxl_port *find_cxl_port_at(struct cxl_port *parent_port,
> -					 struct device *dport_dev)
> +					 struct device *dport_dev,
> +					 struct cxl_dport **dport)
>  {
>  	struct cxl_find_port_ctx ctx = {
>  		.dport_dev = dport_dev,
>  		.parent_port = parent_port,
> +		.dport = dport,
>  	};
> +	struct cxl_port *port;
>  
> -	return __find_cxl_port(&ctx);
> +	port = __find_cxl_port(&ctx);
> +	return port;
>  }
>  
>  /*
> @@ -1044,7 +1059,7 @@ static void cxl_detach_ep(void *data)
>  		if (!dport_dev)
>  			break;
>  
> -		port = find_cxl_port(dport_dev);
> +		port = find_cxl_port(dport_dev, NULL);
>  		if (!port)
>  			continue;
>  
> @@ -1119,6 +1134,7 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd,
>  	struct device *dparent = grandparent(dport_dev);
>  	struct cxl_port *port, *parent_port = NULL;
>  	resource_size_t component_reg_phys;
> +	struct cxl_dport *dport;
>  	int rc;
>  
>  	if (!dparent) {
> @@ -1132,7 +1148,7 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd,
>  		return -ENXIO;
>  	}
>  
> -	parent_port = find_cxl_port(dparent);
> +	parent_port = find_cxl_port(dparent, NULL);
>  	if (!parent_port) {
>  		/* iterate to create this parent_port */
>  		return -EAGAIN;
> @@ -1147,13 +1163,14 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd,
>  		goto out;
>  	}
>  
> -	port = find_cxl_port_at(parent_port, dport_dev);
> +	port = find_cxl_port_at(parent_port, dport_dev, &dport);
>  	if (!port) {
>  		component_reg_phys = find_component_registers(uport_dev);
>  		port = devm_cxl_add_port(&parent_port->dev, uport_dev,
>  					 component_reg_phys, parent_port);
> +		/* retry find to pick up the new dport information */
>  		if (!IS_ERR(port))
> -			get_device(&port->dev);
> +			port = find_cxl_port_at(parent_port, dport_dev, &dport);
>  	}
>  out:
>  	device_unlock(&parent_port->dev);
> @@ -1163,7 +1180,7 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd,
>  	else {
>  		dev_dbg(&cxlmd->dev, "add to new port %s:%s\n",
>  			dev_name(&port->dev), dev_name(port->uport));
> -		rc = cxl_add_ep(port, &cxlmd->dev);
> +		rc = cxl_add_ep(dport, &cxlmd->dev);
>  		if (rc == -EEXIST) {
>  			/*
>  			 * "can't" happen, but this error code means
> @@ -1197,6 +1214,7 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
>  	for (iter = dev; iter; iter = grandparent(iter)) {
>  		struct device *dport_dev = grandparent(iter);
>  		struct device *uport_dev;
> +		struct cxl_dport *dport;
>  		struct cxl_port *port;
>  
>  		if (!dport_dev)
> @@ -1212,12 +1230,12 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
>  		dev_dbg(dev, "scan: iter: %s dport_dev: %s parent: %s\n",
>  			dev_name(iter), dev_name(dport_dev),
>  			dev_name(uport_dev));
> -		port = find_cxl_port(dport_dev);
> +		port = find_cxl_port(dport_dev, &dport);
>  		if (port) {
>  			dev_dbg(&cxlmd->dev,
>  				"found already registered port %s:%s\n",
>  				dev_name(&port->dev), dev_name(port->uport));
> -			rc = cxl_add_ep(port, &cxlmd->dev);
> +			rc = cxl_add_ep(dport, &cxlmd->dev);
>  
>  			/*
>  			 * If the endpoint already exists in the port's list,
> @@ -1258,7 +1276,7 @@ EXPORT_SYMBOL_NS_GPL(devm_cxl_enumerate_ports, CXL);
>  
>  struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd)
>  {
> -	return find_cxl_port(grandparent(&cxlmd->dev));
> +	return find_cxl_port(grandparent(&cxlmd->dev), NULL);
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_mem_find_port, CXL);
>  
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index d8edbdaa6208..e654251a54dd 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -363,10 +363,12 @@ struct cxl_dport {
>  /**
>   * struct cxl_ep - track an endpoint's interest in a port
>   * @ep: device that hosts a generic CXL endpoint (expander or accelerator)
> + * @dport: which dport routes to this endpoint on this port
>   * @list: node on port->endpoints list
>   */
>  struct cxl_ep {
>  	struct device *ep;
> +	struct cxl_dport *dport;
>  	struct list_head list;
>  };
>  
> 



* Re: [PATCH 26/46] cxl/port: Record parent dport when adding ports
  2022-06-24  4:19 ` [PATCH 26/46] cxl/port: Record parent dport when adding ports Dan Williams
@ 2022-06-29 17:02   ` Jonathan Cameron
  0 siblings, 0 replies; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-29 17:02 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, nvdimm, linux-pci, patches, hch

On Thu, 23 Jun 2022 21:19:30 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> At the time that cxl_port instances are being created, cache the dport
> from the parent port that points to this new child port. This will be
> useful for region provisioning when walking the tree to calculate
> decoder targets, and saves rewalking the dport list after the fact to
> build this information.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>





* Re: [PATCH 27/46] cxl/port: Move 'cxl_ep' references to an xarray per port
  2022-06-24  4:19 ` [PATCH 27/46] cxl/port: Move 'cxl_ep' references to an xarray per port Dan Williams
@ 2022-06-29 17:19   ` Jonathan Cameron
  0 siblings, 0 replies; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-29 17:19 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, nvdimm, linux-pci, patches, hch

On Thu, 23 Jun 2022 21:19:31 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> In preparation for region provisioning that needs to walk the topology
> by endpoints, use an xarray to record endpoint interest in a given port.
> In addition to being more space and time efficient it also reduces the
> complexity of the implementation by moving locking internal to the
> xarray implementation. It also allows for a single cxl_ep reference to
> be recorded in multiple xarrays.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>



* Re: [PATCH 01/46] tools/testing/cxl: Fix cxl_hdm_decode_init() calling convention
       [not found]   ` <CGME20220629174147uscas1p211384ae262e099484440ef285be26c75@uscas1p2.samsung.com>
@ 2022-06-29 17:41     ` Adam Manzanares
  2022-07-09 20:06       ` Dan Williams
  0 siblings, 1 reply; 157+ messages in thread
From: Adam Manzanares @ 2022-06-29 17:41 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

On Thu, Jun 23, 2022 at 07:45:07PM -0700, Dan Williams wrote:
> This failing signature:
> 
> [    8.392669] cxl_bus_probe: cxl_port endpoint2: probe: 970997760
> [    8.392670] cxl_port: probe of endpoint2 failed with error 970997760
> [    8.392719] create_endpoint: cxl_mem mem0: add: endpoint2
> [    8.392721] cxl_mem mem0: endpoint2 failed probe
> [    8.392725] cxl_bus_probe: cxl_mem mem0: probe: -6
> 
> ...shows cxl_hdm_decode_init() resulting in a return code ("970997760")
> that looks like stack corruption. The problem goes away if
> cxl_hdm_decode_init() is not mocked via __wrap_cxl_hdm_decode_init().
> 
> The corruption results from the mismatch that the calling convention for
> cxl_hdm_decode_init() is:
> 
> int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm)
> 
> ...and __wrap_cxl_hdm_decode_init() is:
> 
> bool __wrap_cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm)
> 
> ...i.e. an int is expected but __wrap_cxl_hdm_decode_init() returns bool.
> 
> Fix the convention and clean up the organization to match
> __wrap_cxl_await_media_ready() as the difference was a red herring that
> distracted from finding the bug.
> 
> Fixes: 92804edb11f0 ("cxl/pci: Drop @info argument to cxl_hdm_decode_init()")
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  tools/testing/cxl/test/mock.c |    8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/testing/cxl/test/mock.c b/tools/testing/cxl/test/mock.c
> index f1f8c40948c5..bce6a21df0d5 100644
> --- a/tools/testing/cxl/test/mock.c
> +++ b/tools/testing/cxl/test/mock.c
> @@ -208,13 +208,15 @@ int __wrap_cxl_await_media_ready(struct cxl_dev_state *cxlds)
>  }
>  EXPORT_SYMBOL_NS_GPL(__wrap_cxl_await_media_ready, CXL);
>  
> -bool __wrap_cxl_hdm_decode_init(struct cxl_dev_state *cxlds,
> -				struct cxl_hdm *cxlhdm)
> +int __wrap_cxl_hdm_decode_init(struct cxl_dev_state *cxlds,
> +			       struct cxl_hdm *cxlhdm)
>  {
>  	int rc = 0, index;
>  	struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
>  
> -	if (!ops || !ops->is_mock_dev(cxlds->dev))
> +	if (ops && ops->is_mock_dev(cxlds->dev))
> +		rc = 0;
> +	else
>  		rc = cxl_hdm_decode_init(cxlds, cxlhdm);
>  	put_cxl_mock_ops(index);
>  
> 


Looks good.

Reviewed-by: Adam Manzanares <a.manzanares@samsung.com>
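
One way to have the compiler catch this class of mismatch is to tie the
wrapper to the real function's type. The sketch below is a userspace
illustration under made-up names (decode_init_fn, real_decode_init(),
wrap_decode_init(), is_mock_dev()); none of them are the cxl_test symbols,
and the -6 simply echoes the error value in the log above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical signature shared by the real function and its wrapper. */
typedef int (*decode_init_fn)(int dev_id);

static int real_decode_init(int dev_id)
{
	(void)dev_id;
	return -6; /* pretend the real path fails */
}

static bool is_mock_dev(int dev_id)
{
	return dev_id >= 100; /* arbitrary rule for this sketch */
}

/* Returns int, exactly like real_decode_init(). */
static int wrap_decode_init(int dev_id)
{
	if (is_mock_dev(dev_id))
		return 0; /* mocked devices always succeed */
	return real_decode_init(dev_id);
}

/*
 * If wrap_decode_init() were declared to return bool, this
 * initialization would draw an incompatible-pointer-type diagnostic.
 */
static const decode_init_fn decode_init_check = wrap_decode_init;
```

The control flow mirrors the fixed wrapper: mocked devices short-circuit
to 0 and everything else goes through the real implementation.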
> 


* Re: [PATCH 02/46] cxl/port: Keep port->uport valid for the entire life of a port
       [not found]   ` <CGME20220629174622uscas1p2236a084ce25771a3ab57c6f006632f35@uscas1p2.samsung.com>
@ 2022-06-29 17:46     ` Adam Manzanares
  0 siblings, 0 replies; 157+ messages in thread
From: Adam Manzanares @ 2022-06-29 17:46 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

On Thu, Jun 23, 2022 at 07:45:14PM -0700, Dan Williams wrote:
> The upcoming region provisioning implementation has a need to
> dereference port->uport during the port unregister flow. Specifically,
> endpoint decoders need to be able to lookup their corresponding memdev
> via port->uport.
> 
> The existing ->dead flag was added for cases where the core was
> committed to tearing down the port, but needed to drop locks before
> calling device_unregister(). Reuse that flag to indicate to
> delete_endpoint() that it has no "release action" work to do as
> unregister_port() will handle it.
> 
> Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver")
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/core/port.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index dbce99bdffab..7810d1a8369b 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -370,7 +370,7 @@ static void unregister_port(void *_port)
>  		lock_dev = &parent->dev;
>  
>  	device_lock_assert(lock_dev);
> -	port->uport = NULL;
> +	port->dead = true;
>  	device_unregister(&port->dev);
>  }
>  
> @@ -857,7 +857,7 @@ static void delete_endpoint(void *data)
>  	parent = &parent_port->dev;
>  
>  	device_lock(parent);
> -	if (parent->driver && endpoint->uport) {
> +	if (parent->driver && !endpoint->dead) {
>  		devm_release_action(parent, cxl_unlink_uport, endpoint);
>  		devm_release_action(parent, unregister_port, endpoint);
>  	}
> 
>


Looks good.

Reviewed-by: Adam Manzanares <a.manzanares@samsung.com>


* Re: [PATCH 03/46] cxl/hdm: Use local hdm variable
       [not found]   ` <CGME20220629200312uscas1p292303b9325dcbfe59293f002dc9e6b03@uscas1p2.samsung.com>
@ 2022-06-29 20:03     ` Adam Manzanares
  0 siblings, 0 replies; 157+ messages in thread
From: Adam Manzanares @ 2022-06-29 20:03 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, Ben Widawsky, hch, alison.schofield, nvdimm,
	linux-pci, patches

On Thu, Jun 23, 2022 at 07:45:21PM -0700, Dan Williams wrote:
> From: Ben Widawsky <bwidawsk@kernel.org>
> 
> Save a few characters and use the already initialized local variable.
> 
> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/core/hdm.c |    3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index bfc8ee876278..ba3d2d959c71 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -251,8 +251,7 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
>  			return PTR_ERR(cxld);
>  		}
>  
> -		rc = init_hdm_decoder(port, cxld, target_map,
> -				      cxlhdm->regs.hdm_decoder, i);
> +		rc = init_hdm_decoder(port, cxld, target_map, hdm, i);
>  		if (rc) {
>  			put_device(&cxld->dev);
>  			failed++;
> 
> 


Looks good.

Reviewed-by: Adam Manzanares <a.manzanares@samsung.com>


* Re: [PATCH 04/46] cxl/core: Rename ->decoder_range ->hpa_range
       [not found]   ` <CGME20220629200652uscas1p2c1da644ea63a5de69e14e046379779b1@uscas1p2.samsung.com>
@ 2022-06-29 20:06     ` Adam Manzanares
  0 siblings, 0 replies; 157+ messages in thread
From: Adam Manzanares @ 2022-06-29 20:06 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

On Thu, Jun 23, 2022 at 07:45:28PM -0700, Dan Williams wrote:
> In preparation for growing a ->dpa_range attribute for endpoint
> decoders, rename the current ->decoder_range to the more descriptive
> ->hpa_range.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/core/hdm.c       |    2 +-
>  drivers/cxl/core/port.c      |    4 ++--
>  drivers/cxl/cxl.h            |    4 ++--
>  tools/testing/cxl/test/cxl.c |    2 +-
>  4 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index ba3d2d959c71..5c070c93b07f 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -172,7 +172,7 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
>  		return -ENXIO;
>  	}
>  
> -	cxld->decoder_range = (struct range) {
> +	cxld->hpa_range = (struct range) {
>  		.start = base,
>  		.end = base + size - 1,
>  	};
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 7810d1a8369b..98bcbbd59a75 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -78,7 +78,7 @@ static ssize_t start_show(struct device *dev, struct device_attribute *attr,
>  	if (is_root_decoder(dev))
>  		start = cxld->platform_res.start;
>  	else
> -		start = cxld->decoder_range.start;
> +		start = cxld->hpa_range.start;
>  
>  	return sysfs_emit(buf, "%#llx\n", start);
>  }
> @@ -93,7 +93,7 @@ static ssize_t size_show(struct device *dev, struct device_attribute *attr,
>  	if (is_root_decoder(dev))
>  		size = resource_size(&cxld->platform_res);
>  	else
> -		size = range_len(&cxld->decoder_range);
> +		size = range_len(&cxld->hpa_range);
>  
>  	return sysfs_emit(buf, "%#llx\n", size);
>  }
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 6799b27c7db2..8256728cea8d 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -198,7 +198,7 @@ enum cxl_decoder_type {
>   * @dev: this decoder's device
>   * @id: kernel device name id
>   * @platform_res: address space resources considered by root decoder
> - * @decoder_range: address space resources considered by midlevel decoder
> + * @hpa_range: Host physical address range mapped by this decoder
>   * @interleave_ways: number of cxl_dports in this decode
>   * @interleave_granularity: data stride per dport
>   * @target_type: accelerator vs expander (type2 vs type3) selector
> @@ -212,7 +212,7 @@ struct cxl_decoder {
>  	int id;
>  	union {
>  		struct resource platform_res;
> -		struct range decoder_range;
> +		struct range hpa_range;
>  	};
>  	int interleave_ways;
>  	int interleave_granularity;
> diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
> index 431f2bddf6c8..7a08b025f2de 100644
> --- a/tools/testing/cxl/test/cxl.c
> +++ b/tools/testing/cxl/test/cxl.c
> @@ -461,7 +461,7 @@ static int mock_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
>  			return PTR_ERR(cxld);
>  		}
>  
> -		cxld->decoder_range = (struct range) {
> +		cxld->hpa_range = (struct range) {
>  			.start = 0,
>  			.end = -1,
>  		};
> 
>

Looks good.

Reviewed-by: Adam Manzanares <a.manzanares@samsung.com>
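
Since ->hpa_range is an inclusive range, the base + size - 1 form used in
init_hdm_decoder() and the length calculation behind size_show() are
easy to sketch in userspace. struct toy_range and the toy_* helpers below
are hypothetical stand-ins, written on the assumption that the kernel's
range_len() is end - start + 1.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for an inclusive [start, end] address range. */
struct toy_range {
	uint64_t start;
	uint64_t end;
};

/* Build an inclusive range from a base address and a size in bytes. */
static struct toy_range toy_range_from(uint64_t base, uint64_t size)
{
	/* Reject empty ranges and ranges that would wrap past 2^64 - 1. */
	assert(size > 0 && base <= UINT64_MAX - (size - 1));

	return (struct toy_range){
		.start = base,
		.end = base + size - 1,
	};
}

/* Length in bytes, assuming the end - start + 1 convention. */
static uint64_t toy_range_len(const struct toy_range *r)
{
	return r->end - r->start + 1;
}
```

One corner worth knowing about: a range of { .start = 0, .end = -1 }, as
the cxl_test mock decoder uses, spans the whole 64-bit space, so its
length wraps to 0 in this sketch.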


* Re: [PATCH 05/46] cxl/core: Drop ->platform_res attribute for root decoders
       [not found]   ` <CGME20220629202117uscas1p2892fb68ae60c4754e2f7d26882a92ae5@uscas1p2.samsung.com>
@ 2022-06-29 20:21     ` Adam Manzanares
  2022-07-09 23:38       ` Dan Williams
  0 siblings, 1 reply; 157+ messages in thread
From: Adam Manzanares @ 2022-06-29 20:21 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

On Thu, Jun 23, 2022 at 07:45:36PM -0700, Dan Williams wrote:
> Root decoders are responsible for hosting the available host address
> space for endpoints and regions to claim. The tracking of that available
> capacity can be done in iomem_resource directly. As a result, root
> decoders no longer need to host their own resource tree. The
> current ->platform_res attribute was added prematurely.
> 
> Otherwise, ->hpa_range fills the role of conveying the current decode
> range of the decoder.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/acpi.c      |   17 ++++++++++-------
>  drivers/cxl/core/pci.c  |    8 +-------
>  drivers/cxl/core/port.c |   30 +++++++-----------------------
>  drivers/cxl/cxl.h       |    6 +-----
>  4 files changed, 19 insertions(+), 42 deletions(-)
> 
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index 40286f5df812..951695cdb455 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -108,8 +108,10 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
>  
>  	cxld->flags = cfmws_to_decoder_flags(cfmws->restrictions);
>  	cxld->target_type = CXL_DECODER_EXPANDER;
> -	cxld->platform_res = (struct resource)DEFINE_RES_MEM(cfmws->base_hpa,
> -							     cfmws->window_size);
> +	cxld->hpa_range = (struct range) {
> +		.start = cfmws->base_hpa,
> +		.end = cfmws->base_hpa + cfmws->window_size - 1,
> +	};
>  	cxld->interleave_ways = CFMWS_INTERLEAVE_WAYS(cfmws);
>  	cxld->interleave_granularity = CFMWS_INTERLEAVE_GRANULARITY(cfmws);
>  
> @@ -119,13 +121,14 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
>  	else
>  		rc = cxl_decoder_autoremove(dev, cxld);
>  	if (rc) {
> -		dev_err(dev, "Failed to add decoder for %pr\n",
> -			&cxld->platform_res);
> +		dev_err(dev, "Failed to add decoder for [%#llx - %#llx]\n",
> +			cxld->hpa_range.start, cxld->hpa_range.end);

Minor nit: should we add "range" to the error message, to match the
dev_dbg() below?

+		dev_err(dev, "Failed to add decoder for range [%#llx - %#llx]\n",

>  		return 0;
>  	}
> -	dev_dbg(dev, "add: %s node: %d range %pr\n", dev_name(&cxld->dev),
> -		phys_to_target_node(cxld->platform_res.start),
> -		&cxld->platform_res);
> +	dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
> +		dev_name(&cxld->dev),
> +		phys_to_target_node(cxld->hpa_range.start),
> +		cxld->hpa_range.start, cxld->hpa_range.end);
>  
>  	return 0;
>  }
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index c4c99ff7b55e..7672789c3225 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -225,7 +225,6 @@ static int dvsec_range_allowed(struct device *dev, void *arg)
>  {
>  	struct range *dev_range = arg;
>  	struct cxl_decoder *cxld;
> -	struct range root_range;
>  
>  	if (!is_root_decoder(dev))
>  		return 0;
> @@ -237,12 +236,7 @@ static int dvsec_range_allowed(struct device *dev, void *arg)
>  	if (!(cxld->flags & CXL_DECODER_F_RAM))
>  		return 0;
>  
> -	root_range = (struct range) {
> -		.start = cxld->platform_res.start,
> -		.end = cxld->platform_res.end,
> -	};
> -
> -	return range_contains(&root_range, dev_range);
> +	return range_contains(&cxld->hpa_range, dev_range);
>  }
>  
>  static void disable_hdm(void *_cxlhdm)
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 98bcbbd59a75..b51eb41aa839 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -73,29 +73,17 @@ static ssize_t start_show(struct device *dev, struct device_attribute *attr,
>  			  char *buf)
>  {
>  	struct cxl_decoder *cxld = to_cxl_decoder(dev);
> -	u64 start;
>  
> -	if (is_root_decoder(dev))
> -		start = cxld->platform_res.start;
> -	else
> -		start = cxld->hpa_range.start;
> -
> -	return sysfs_emit(buf, "%#llx\n", start);
> +	return sysfs_emit(buf, "%#llx\n", cxld->hpa_range.start);
>  }
>  static DEVICE_ATTR_ADMIN_RO(start);
>  
>  static ssize_t size_show(struct device *dev, struct device_attribute *attr,
> -			char *buf)
> +			 char *buf)
>  {
>  	struct cxl_decoder *cxld = to_cxl_decoder(dev);
> -	u64 size;
> -
> -	if (is_root_decoder(dev))
> -		size = resource_size(&cxld->platform_res);
> -	else
> -		size = range_len(&cxld->hpa_range);
>  
> -	return sysfs_emit(buf, "%#llx\n", size);
> +	return sysfs_emit(buf, "%#llx\n", range_len(&cxld->hpa_range));
>  }
>  static DEVICE_ATTR_RO(size);
>  
> @@ -1233,7 +1221,10 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
>  	cxld->interleave_ways = 1;
>  	cxld->interleave_granularity = PAGE_SIZE;
>  	cxld->target_type = CXL_DECODER_EXPANDER;
> -	cxld->platform_res = (struct resource)DEFINE_RES_MEM(0, 0);
> +	cxld->hpa_range = (struct range) {
> +		.start = 0,
> +		.end = -1,
> +	};
>  
>  	return cxld;
>  err:
> @@ -1347,13 +1338,6 @@ int cxl_decoder_add_locked(struct cxl_decoder *cxld, int *target_map)
>  	if (rc)
>  		return rc;
>  
> -	/*
> -	 * Platform decoder resources should show up with a reasonable name. All
> -	 * other resources are just sub ranges within the main decoder resource.
> -	 */
> -	if (is_root_decoder(dev))
> -		cxld->platform_res.name = dev_name(dev);
> -
>  	return device_add(dev);
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_decoder_add_locked, CXL);
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 8256728cea8d..35ce17872fc1 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -197,7 +197,6 @@ enum cxl_decoder_type {
>   * struct cxl_decoder - CXL address range decode configuration
>   * @dev: this decoder's device
>   * @id: kernel device name id
> - * @platform_res: address space resources considered by root decoder
>   * @hpa_range: Host physical address range mapped by this decoder
>   * @interleave_ways: number of cxl_dports in this decode
>   * @interleave_granularity: data stride per dport
> @@ -210,10 +209,7 @@ enum cxl_decoder_type {
>  struct cxl_decoder {
>  	struct device dev;
>  	int id;
> -	union {
> -		struct resource platform_res;
> -		struct range hpa_range;
> -	};
> +	struct range hpa_range;
>  	int interleave_ways;
>  	int interleave_granularity;
>  	enum cxl_decoder_type target_type;
> 
> 


Otherwise, looks good.

Reviewed-by: Adam Manzanares <a.manzanares@samsung.com>
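
For readers following along, here is a minimal userspace sketch of the
inclusive-range convention that ->hpa_range relies on in this patch. It is
not the kernel code: struct range, range_len() and range_contains() are
re-declared here as stand-ins, and range_from_base_size() is a hypothetical
helper that only mirrors the "base + size - 1" arithmetic from the
cxl_parse_cfmws() hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for a range whose start and end are both inclusive. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical helper: build an inclusive range from a base and a size. */
static struct range range_from_base_size(uint64_t base, uint64_t size)
{
	assert(size != 0);	/* a zero size has no inclusive end */
	return (struct range) {
		.start = base,
		.end = base + size - 1,
	};
}

/* Bytes covered by the range; note that {0, UINT64_MAX} wraps to 0. */
static uint64_t range_len(const struct range *r)
{
	return r->end - r->start + 1;
}

/* True when @inner lies entirely within @outer. */
static bool range_contains(const struct range *outer, const struct range *inner)
{
	return outer->start <= inner->start && outer->end >= inner->end;
}
```

With this convention the { .start = 0, .end = -1 } default set in
cxl_decoder_alloc() contains every other range, which is why a length
computed from it needs care: the subtraction wraps around to zero.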


* Re: [PATCH 06/46] cxl/core: Drop is_cxl_decoder()
       [not found]   ` <CGME20220629203448uscas1p264a7f79a1ed7f9257eefcb3064c7d943@uscas1p2.samsung.com>
@ 2022-06-29 20:34     ` Adam Manzanares
  0 siblings, 0 replies; 157+ messages in thread
From: Adam Manzanares @ 2022-06-29 20:34 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

On Thu, Jun 23, 2022 at 07:45:43PM -0700, Dan Williams wrote:
> This helper was only used to identify the object type for lockdep
> purposes. Now that lockdep support is done with explicit lock classes,
> this helper can be dropped.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/core/port.c |    6 ------
>  drivers/cxl/cxl.h       |    1 -
>  2 files changed, 7 deletions(-)
> 
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index b51eb41aa839..13c321afe076 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -271,12 +271,6 @@ bool is_root_decoder(struct device *dev)
>  }
>  EXPORT_SYMBOL_NS_GPL(is_root_decoder, CXL);
>  
> -bool is_cxl_decoder(struct device *dev)
> -{
> -	return dev->type && dev->type->release == cxl_decoder_release;
> -}
> -EXPORT_SYMBOL_NS_GPL(is_cxl_decoder, CXL);
> -
>  struct cxl_decoder *to_cxl_decoder(struct device *dev)
>  {
>  	if (dev_WARN_ONCE(dev, dev->type->release != cxl_decoder_release,
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 35ce17872fc1..6e08fe8cc0fe 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -337,7 +337,6 @@ struct cxl_dport *cxl_find_dport_by_dev(struct cxl_port *port,
>  struct cxl_decoder *to_cxl_decoder(struct device *dev);
>  bool is_root_decoder(struct device *dev);
>  bool is_endpoint_decoder(struct device *dev);
> -bool is_cxl_decoder(struct device *dev);
>  struct cxl_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
>  					   unsigned int nr_targets);
>  struct cxl_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
> 
>

Looks good.

Reviewed-by: Adam Manzanares <a.manzanares@samsung.com>


* Re: [PATCH 28/46] cxl/port: Move dport tracking to an xarray
  2022-06-24  4:19 ` [PATCH 28/46] cxl/port: Move dport tracking to an xarray Dan Williams
@ 2022-06-30  9:18   ` Jonathan Cameron
  2022-07-10 19:06     ` Dan Williams
  0 siblings, 1 reply; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-30  9:18 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, nvdimm, linux-pci, patches, hch

On Thu, 23 Jun 2022 21:19:32 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> Reduce the complexity and the overhead of walking the topology to
> determine endpoint connectivity to root decoder interleave
> configurations.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Hi Dan,

A few minor comments inline about naming, plus one query on why the
reap_dports() refactor is tied to the xarray change.

Thanks,

Jonathan

> ---
>  drivers/cxl/acpi.c      |  2 +-
>  drivers/cxl/core/hdm.c  |  6 ++-
>  drivers/cxl/core/port.c | 88 ++++++++++++++++++-----------------------
>  drivers/cxl/cxl.h       | 12 +++---
>  4 files changed, 51 insertions(+), 57 deletions(-)
> 
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index 09fe92177d03..92ad1f359faf 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -197,7 +197,7 @@ static int add_host_bridge_uport(struct device *match, void *arg)
>  	if (!bridge)
>  		return 0;
>  
> -	dport = cxl_find_dport_by_dev(root_port, match);
> +	dport = cxl_dport_load(root_port, match);

"Load" is rather specific to the xarray. I'd be tempted to keep the
original "find" naming.


>  	if (!dport) {
>  		dev_dbg(host, "host bridge expected and not found\n");
>  		return 0;
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index c0164f9b2195..672bf3e97811 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -50,8 +50,9 @@ static int add_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
>  int devm_cxl_add_passthrough_decoder(struct cxl_port *port)
>  {
>  	struct cxl_switch_decoder *cxlsd;
> -	struct cxl_dport *dport;
> +	struct cxl_dport *dport = NULL;
>  	int single_port_map[1];
> +	unsigned long index;
>  
>  	cxlsd = cxl_switch_decoder_alloc(port, 1);
>  	if (IS_ERR(cxlsd))
> @@ -59,7 +60,8 @@ int devm_cxl_add_passthrough_decoder(struct cxl_port *port)
>  
>  	device_lock_assert(&port->dev);
>  
> -	dport = list_first_entry(&port->dports, typeof(*dport), list);
> +	xa_for_each(&port->dports, index, dport)
> +		break;
>  	single_port_map[0] = dport->port_id;
>  
>  	return add_hdm_decoder(port, &cxlsd->cxld, single_port_map);
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index ea3ab9baf232..d2f6898940fa 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -452,6 +452,7 @@ static void cxl_port_release(struct device *dev)
>  	xa_for_each(&port->endpoints, index, ep)
>  		cxl_ep_remove(port, ep);
>  	xa_destroy(&port->endpoints);
> +	xa_destroy(&port->dports);
>  	ida_free(&cxl_port_ida, port->id);
>  	kfree(port);
>  }
> @@ -566,7 +567,7 @@ static struct cxl_port *cxl_port_alloc(struct device *uport,
>  	port->component_reg_phys = component_reg_phys;
>  	ida_init(&port->decoder_ida);
>  	port->dpa_end = -1;
> -	INIT_LIST_HEAD(&port->dports);
> +	xa_init(&port->dports);
>  	xa_init(&port->endpoints);
>  
>  	device_initialize(dev);
> @@ -696,17 +697,13 @@ static int match_root_child(struct device *dev, const void *match)
>  		return 0;
>  
>  	port = to_cxl_port(dev);
> -	device_lock(dev);
> -	list_for_each_entry(dport, &port->dports, list) {
> -		iter = match;
> -		while (iter) {
> -			if (iter == dport->dport)
> -				goto out;
> -			iter = iter->parent;
> -		}
> +	iter = match;
> +	while (iter) {
> +		dport = cxl_dport_load(port, iter);
> +		if (dport)
> +			break;
> +		iter = iter->parent;
>  	}
> -out:
> -	device_unlock(dev);
>  
>  	return !!iter;
>  }
> @@ -730,9 +727,10 @@ EXPORT_SYMBOL_NS_GPL(find_cxl_root, CXL);
>  static struct cxl_dport *find_dport(struct cxl_port *port, int id)
>  {
>  	struct cxl_dport *dport;
> +	unsigned long index;
>  
>  	device_lock_assert(&port->dev);
> -	list_for_each_entry (dport, &port->dports, list)
> +	xa_for_each(&port->dports, index, dport)
>  		if (dport->port_id == id)
>  			return dport;
>  	return NULL;
> @@ -741,18 +739,21 @@ static struct cxl_dport *find_dport(struct cxl_port *port, int id)
>  static int add_dport(struct cxl_port *port, struct cxl_dport *new)
>  {
>  	struct cxl_dport *dup;
> +	int rc;
>  
>  	device_lock_assert(&port->dev);
>  	dup = find_dport(port, new->port_id);
> -	if (dup)
> +	if (dup) {
>  		dev_err(&port->dev,
>  			"unable to add dport%d-%s non-unique port id (%s)\n",
>  			new->port_id, dev_name(new->dport),
>  			dev_name(dup->dport));
> -	else
> -		list_add_tail(&new->list, &port->dports);
> +		rc = -EBUSY;

A direct return here would be slightly simpler: it reduces the indent on
the next bit and, with only this branch indented, makes it more obviously
an error condition.

> +	} else
> +		rc = xa_insert(&port->dports, (unsigned long)new->dport, new,
> +			       GFP_KERNEL);
>  
> -	return dup ? -EEXIST : 0;
> +	return rc;
>  }
>  
>  /*
> @@ -779,10 +780,8 @@ static void cxl_dport_remove(void *data)
>  	struct cxl_dport *dport = data;
>  	struct cxl_port *port = dport->port;
>  
> +	xa_erase(&port->dports, (unsigned long) dport->dport);
>  	put_device(dport->dport);
> -	cond_cxl_root_lock(port);
> -	list_del(&dport->list);
> -	cond_cxl_root_unlock(port);
>  }
>  
>  static void cxl_dport_unlink(void *data)
> @@ -834,7 +833,6 @@ struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port,
>  	if (!dport)
>  		return ERR_PTR(-ENOMEM);
>  
> -	INIT_LIST_HEAD(&dport->list);
>  	dport->dport = dport_dev;
>  	dport->port_id = port_id;
>  	dport->component_reg_phys = component_reg_phys;
> @@ -925,7 +923,7 @@ static int match_port_by_dport(struct device *dev, const void *data)
>  		return 0;
>  
>  	port = to_cxl_port(dev);
> -	dport = cxl_find_dport_by_dev(port, ctx->dport_dev);
> +	dport = cxl_dport_load(port, ctx->dport_dev);
>  	if (ctx->dport)
>  		*ctx->dport = dport;
>  	return dport != NULL;
> @@ -1025,19 +1023,27 @@ EXPORT_SYMBOL_NS_GPL(cxl_endpoint_autoremove, CXL);
>   * for a port to be unregistered is when all memdevs beneath that port have gone
>   * through ->remove(). This "bottom-up" removal selectively removes individual
>   * child ports manually. This depends on devm_cxl_add_port() to not change is
> - * devm action registration order.
> + * devm action registration order, and for dports to have already been
> + * destroyed by reap_dports().
>   */
> -static void delete_switch_port(struct cxl_port *port, struct list_head *dports)
> +static void delete_switch_port(struct cxl_port *port)
> +{
> +	devm_release_action(port->dev.parent, cxl_unlink_uport, port);
> +	devm_release_action(port->dev.parent, unregister_port, port);
> +}
> +
> +static void reap_dports(struct cxl_port *port)
>  {
> -	struct cxl_dport *dport, *_d;
> +	struct cxl_dport *dport;
> +	unsigned long index;
> +
> +	device_lock_assert(&port->dev);
>  
> -	list_for_each_entry_safe(dport, _d, dports, list) {
> +	xa_for_each(&port->dports, index, dport) {
>  		devm_release_action(&port->dev, cxl_dport_unlink, dport);
>  		devm_release_action(&port->dev, cxl_dport_remove, dport);
>  		devm_kfree(&port->dev, dport);
>  	}
> -	devm_release_action(port->dev.parent, cxl_unlink_uport, port);
> -	devm_release_action(port->dev.parent, unregister_port, port);
>  }
>  
>  static struct cxl_ep *cxl_ep_load(struct cxl_port *port,
> @@ -1054,8 +1060,8 @@ static void cxl_detach_ep(void *data)
>  	for (iter = &cxlmd->dev; iter; iter = grandparent(iter)) {
>  		struct device *dport_dev = grandparent(iter);
>  		struct cxl_port *port, *parent_port;
> -		LIST_HEAD(reap_dports);
>  		struct cxl_ep *ep;
> +		bool died = false;
>  
>  		if (!dport_dev)
>  			break;
> @@ -1095,15 +1101,16 @@ static void cxl_detach_ep(void *data)
>  			 * enumerated port. Block new cxl_add_ep() and garbage
>  			 * collect the port.
>  			 */
> +			died = true;
>  			port->dead = true;
> -			list_splice_init(&port->dports, &reap_dports);
> +			reap_dports(port);

I'm not immediately clear on why this refactor is tied up with moving
to the xarray. Perhaps add a sentence to the commit message explaining
the connection?

>  		}
>  		device_unlock(&port->dev);
>  
> -		if (!list_empty(&reap_dports)) {
> +		if (died) {
>  			dev_dbg(&cxlmd->dev, "delete %s\n",
>  				dev_name(&port->dev));
> -			delete_switch_port(port, &reap_dports);
> +			delete_switch_port(port);
>  		}
>  		put_device(&port->dev);
>  		device_unlock(&parent_port->dev);
> @@ -1282,23 +1289,6 @@ struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd,
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_mem_find_port, CXL);
>  
> -struct cxl_dport *cxl_find_dport_by_dev(struct cxl_port *port,
> -					const struct device *dev)
> -{
> -	struct cxl_dport *dport;
> -
> -	device_lock(&port->dev);
> -	list_for_each_entry(dport, &port->dports, list)
> -		if (dport->dport == dev) {
> -			device_unlock(&port->dev);
> -			return dport;
> -		}
> -
> -	device_unlock(&port->dev);
> -	return NULL;
> -}
> -EXPORT_SYMBOL_NS_GPL(cxl_find_dport_by_dev, CXL);
> -
>  static int decoder_populate_targets(struct cxl_switch_decoder *cxlsd,
>  				    struct cxl_port *port, int *target_map)
>  {
> @@ -1309,7 +1299,7 @@ static int decoder_populate_targets(struct cxl_switch_decoder *cxlsd,
>  
>  	device_lock_assert(&port->dev);
>  
> -	if (list_empty(&port->dports))
> +	if (xa_empty(&port->dports))
>  		return -EINVAL;
>  
>  	write_seqlock(&cxlsd->target_lock);
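
The direct-return suggestion for add_dport() earlier in this reply could be
sketched roughly as below. This is a userspace toy, not the kernel code: the
fixed-size array and toy_insert() stand in for the xarray and its insert
call, the struct layouts are simplified, and the dev_err() from the patch is
left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_DPORTS 8

struct dport {
	const void *dev;	/* lookup key, like dport->dport in the patch */
	int port_id;
};

struct port {
	struct dport *dports[MAX_DPORTS];	/* toy stand-in for the xarray */
	size_t nr_dports;
};

/* Linear search by port id, as find_dport() does in the patch. */
static struct dport *find_dport(struct port *port, int id)
{
	for (size_t i = 0; i < port->nr_dports; i++)
		if (port->dports[i]->port_id == id)
			return port->dports[i];
	return NULL;
}

/* Hypothetical insert keyed by device pointer; -EBUSY on a duplicate key. */
static int toy_insert(struct port *port, struct dport *new)
{
	for (size_t i = 0; i < port->nr_dports; i++)
		if (port->dports[i]->dev == new->dev)
			return -EBUSY;
	if (port->nr_dports == MAX_DPORTS)
		return -ENOMEM;
	port->dports[port->nr_dports++] = new;
	return 0;
}

static int add_dport(struct port *port, struct dport *new)
{
	assert(port && new);

	/* Error path first: a duplicate port id returns immediately. */
	if (find_dport(port, new->port_id))
		return -EBUSY;

	/* Success path stays at the top indent level, no 'rc' needed. */
	return toy_insert(port, new);
}
```

The shape is the point here: the error case is the only indented branch, and
the local rc variable goes away.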



* Re: [PATCH 29/46] cxl/port: Cache CXL host bridge data
  2022-06-24  4:19 ` [PATCH 29/46] cxl/port: Cache CXL host bridge data Dan Williams
@ 2022-06-30  9:21   ` Jonathan Cameron
  2022-07-10 19:09     ` Dan Williams
  0 siblings, 1 reply; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-30  9:21 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, nvdimm, linux-pci, patches, hch

On Thu, 23 Jun 2022 21:19:33 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> Region creation has need for checking host-bridge connectivity when
> adding endpoints to regions. Record, at port creation time, the
> host-bridge to provide a useful shortcut from any location in the
> topology to the most-significant ancestor.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Trivial comment inline, but otherwise seems reasonable to me.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
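
As an aside, the "walk to the host bridge, or the first ancestor that knows
the host bridge" step described in the changelog can be modelled in userspace
along these lines. The struct and both function names are hypothetical
simplifications (device pointers are replaced by strings, and the root is
simply the port with no parent); the loop condition follows the one in the
cxl_port_alloc() hunk below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified port: strings stand in for struct device pointers. */
struct port {
	struct port *parent;		/* NULL only for the root */
	const char *uport;		/* device this port represents */
	const char *host_bridge;	/* cached shortcut, NULL until set */
};

static bool is_root(const struct port *p)
{
	return p->parent == NULL;
}

/*
 * Resolve and cache the host bridge for a non-root port. Stop at the first
 * ancestor that already has the answer cached, or at the port directly
 * below the root, whose uport is the host bridge itself.
 */
static void cache_host_bridge(struct port *port)
{
	struct port *iter = port;

	assert(port && port->parent);

	while (!iter->host_bridge && !is_root(iter->parent))
		iter = iter->parent;

	port->host_bridge = iter->host_bridge ? iter->host_bridge : iter->uport;
}
```

Because ports are created from the root downwards, the walk normally ends
after one step: the parent already carries the cached value.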

> ---
>  drivers/cxl/core/port.c | 16 +++++++++++++++-
>  drivers/cxl/cxl.h       |  2 ++
>  2 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index d2f6898940fa..c48f217e689a 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -546,6 +546,7 @@ static struct cxl_port *cxl_port_alloc(struct device *uport,
>  	if (rc < 0)
>  		goto err;
>  	port->id = rc;
> +	port->uport = uport;
>  
>  	/*
>  	 * The top-level cxl_port "cxl_root" does not have a cxl_port as
> @@ -556,14 +557,27 @@ static struct cxl_port *cxl_port_alloc(struct device *uport,
>  	dev = &port->dev;
>  	if (parent_dport) {
>  		struct cxl_port *parent_port = parent_dport->port;
> +		struct cxl_port *iter;
>  
>  		port->depth = parent_port->depth + 1;
>  		port->parent_dport = parent_dport;
>  		dev->parent = &parent_port->dev;
> +		/*
> +		 * walk to the host bridge, or the first ancestor that knows
> +		 * the host bridge
> +		 */
> +		iter = port;
> +		while (!iter->host_bridge &&
> +		       !is_cxl_root(to_cxl_port(iter->dev.parent)))
> +			iter = to_cxl_port(iter->dev.parent);
> +		if (iter->host_bridge)
> +			port->host_bridge = iter->host_bridge;
> +		else
> +			port->host_bridge = iter->uport;
> +		dev_dbg(uport, "host-bridge: %s\n", dev_name(port->host_bridge));
>  	} else
>  		dev->parent = uport;
>  
> -	port->uport = uport;
>  	port->component_reg_phys = component_reg_phys;
>  	ida_init(&port->decoder_ida);
>  	port->dpa_end = -1;
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 8e2c1b393552..0211cf0d3574 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -331,6 +331,7 @@ struct cxl_nvdimm {
>   * @component_reg_phys: component register capability base address (optional)
>   * @dead: last ep has been removed, force port re-creation
>   * @depth: How deep this port is relative to the root. depth 0 is the root.
> + * @host_bridge: Shortcut to the platform attach point for this port
>   */
>  struct cxl_port {
>  	struct device dev;
> @@ -344,6 +345,7 @@ struct cxl_port {
>  	resource_size_t component_reg_phys;
>  	bool dead;
>  	unsigned int depth;
> +	struct device *host_bridge;
Would feel more natural up next to the struct device *uport element of cxl_port.


>  };
>  
>  static inline struct cxl_dport *cxl_dport_load(struct cxl_port *port,



* Re: [PATCH 30/46] cxl/hdm: Add sysfs attributes for interleave ways + granularity
  2022-06-24  4:19 ` [PATCH 30/46] cxl/hdm: Add sysfs attributes for interleave ways + granularity Dan Williams
@ 2022-06-30  9:26   ` Jonathan Cameron
  2022-07-10 20:40     ` Dan Williams
  0 siblings, 1 reply; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-30  9:26 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, nvdimm, linux-pci, patches, hch, Ben Widawsky

On Thu, 23 Jun 2022 21:19:34 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> From: Ben Widawsky <bwidawsk@kernel.org>
> 
> The region provisioning flow involves selecting interleave ways +
> granularity settings for a region, and then programming the decoder
> topology to meet those constraints, if possible. For example, root
> decoders set the minimum interleave ways + granularity for any hosted
> regions.
> 
> Given that decoder programming is not atomic and collisions can occur
> between multiple requesting regions, userspace will be responsible for
> conflict resolution, and it needs these attributes to make those decisions.
> 
> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> [djbw: reword changelog, make read-only, add sysfs ABI documentation]
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Some comments on the docs.

> ---
>  Documentation/ABI/testing/sysfs-bus-cxl | 23 +++++++++++++++++++++++
>  drivers/cxl/core/port.c                 | 23 +++++++++++++++++++++++
>  2 files changed, 46 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 85844f9bc00b..2a4e4163879f 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -215,3 +215,26 @@ Description:
>  		allocations are enforced to occur in increasing 'decoderX.Y/id'
>  		order and frees are enforced to occur in decreasing
>  		'decoderX.Y/id' order.
> +
> +
> +What:		/sys/bus/cxl/devices/decoderX.Y/interleave_ways
> +Date:		May, 2022
> +KernelVersion:	v5.20
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(RO) The number of targets across which this decoder's host
> +		physical address (HPA) memory range is interleaved. The device
> +		maps every Nth block of HPA (of size ==
> +		'interleave_granularity') to consecutive DPA addresses. The
> +		decoder's position in the interleave is determined by the
> +		device's (endpoint or switch) switch ancestry.

Perhaps make it clear what happens for host bridges, i.e. the decoder
position in the interleave is defined by the fixed memory window.

> +
> +
> +What:		/sys/bus/cxl/devices/decoderX.Y/interleave_granularity
> +Date:		May, 2022
> +KernelVersion:	v5.20
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(RO) The number of consecutive bytes of host physical address
> +		space this decoder claims at address N before awaint the next

awaint?

> +		address (N + interleave_granularity * intereleave_ways).

interleave_ways

Even knowing exactly what this is, I don't understand the docs so
perhaps reword this :)
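
A worked example might help whoever rewords this. The sketch below assumes
plain modulo interleaving with no XOR or other address transforms, and treats
the HPA as an offset from the start of the decoded range. hpa_to_slot() and
struct interleave_slot are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

struct interleave_slot {
	unsigned int position;	/* which target in the interleave set */
	uint64_t dpa_offset;	/* offset into that target's contribution */
};

/*
 * Split the HPA offset into granularity-sized blocks. Blocks are dealt out
 * round-robin across 'ways' targets, and each target packs the blocks it
 * receives back to back.
 */
static struct interleave_slot hpa_to_slot(uint64_t hpa_offset,
					  unsigned int ways,
					  uint64_t granularity)
{
	assert(ways > 0 && granularity > 0);

	uint64_t block = hpa_offset / granularity;
	struct interleave_slot slot = {
		.position = (unsigned int)(block % ways),
		.dpa_offset = (block / ways) * granularity +
			      hpa_offset % granularity,
	};

	return slot;
}
```

With interleave_ways = 2 and interleave_granularity = 256, offsets 0..255 go
to position 0, offsets 256..511 go to position 1, and offset 512 returns to
position 0 at DPA offset 256. That is the "N, then N + interleave_granularity
* interleave_ways" stride the text is trying to describe.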

> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index c48f217e689a..08a380d20cf1 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -260,10 +260,33 @@ static ssize_t dpa_size_store(struct device *dev, struct device_attribute *attr,
>  }
>  static DEVICE_ATTR_RW(dpa_size);
>  
> +static ssize_t interleave_granularity_show(struct device *dev,
> +					   struct device_attribute *attr,
> +					   char *buf)
> +{
> +	struct cxl_decoder *cxld = to_cxl_decoder(dev);
> +
> +	return sysfs_emit(buf, "%d\n", cxld->interleave_granularity);
> +}
> +
> +static DEVICE_ATTR_RO(interleave_granularity);
> +
> +static ssize_t interleave_ways_show(struct device *dev,
> +				    struct device_attribute *attr, char *buf)
> +{
> +	struct cxl_decoder *cxld = to_cxl_decoder(dev);
> +
> +	return sysfs_emit(buf, "%d\n", cxld->interleave_ways);
> +}
> +
> +static DEVICE_ATTR_RO(interleave_ways);
> +
>  static struct attribute *cxl_decoder_base_attrs[] = {
>  	&dev_attr_start.attr,
>  	&dev_attr_size.attr,
>  	&dev_attr_locked.attr,
> +	&dev_attr_interleave_granularity.attr,
> +	&dev_attr_interleave_ways.attr,
>  	NULL,
>  };
>  



* Re: [PATCH 31/46] cxl/hdm: Initialize decoder type for memory expander devices
  2022-06-24  4:19 ` [PATCH 31/46] cxl/hdm: Initialize decoder type for memory expander devices Dan Williams
@ 2022-06-30  9:33   ` Jonathan Cameron
  0 siblings, 0 replies; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-30  9:33 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, nvdimm, linux-pci, patches, hch

On Thu, 23 Jun 2022 21:19:35 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> Unless and until accelerator (type-2) drivers start registering for
> CXL.mem mapping services from the CXL subsystem core, initialize idle
> HDM decoders to the "expander" type. I.e. the only CXL devices using the
> CXL core presently are those implementing the CXL 2.0 Type-3 memory
> expander device class code that the cxl_pci driver claims.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
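
For anyone skimming the thread, the policy in the changelog can be sketched
in userspace as follows. The bit positions and every name here are
assumptions made for illustration only, so check the register definitions in
cxl.h and the CXL specification before relying on them. A plain variable
stands in for the decoder control register, and the branch is keyed on a
"committed" bit as a guess at the condition that the hunk below does not
show.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define CTRL_COMMITTED	(UINT32_C(1) << 10)
#define CTRL_TYPE	(UINT32_C(1) << 12)

enum decoder_type {
	DECODER_ACCELERATOR,
	DECODER_EXPANDER,
};

/*
 * An active decoder reports whatever type it was programmed with. An idle
 * decoder is switched to the expander (type-3) setting, since no type-2
 * drivers use the core yet.
 */
static enum decoder_type init_decoder_type(uint32_t *ctrl)
{
	assert(ctrl);

	if (*ctrl & CTRL_COMMITTED)
		return (*ctrl & CTRL_TYPE) ? DECODER_EXPANDER
					   : DECODER_ACCELERATOR;

	/* Idle: set the type bit (a register write in the real driver). */
	*ctrl |= CTRL_TYPE;
	return DECODER_EXPANDER;
}
```

The notable behaviour is that only the idle path modifies the control value;
an already active decoder is left exactly as firmware programmed it.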

> ---
>  drivers/cxl/core/hdm.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 672bf3e97811..7b58f6911523 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -474,6 +474,17 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
>  		cxld->flags |= CXL_DECODER_F_ENABLE;
>  		if (ctrl & CXL_HDM_DECODER0_CTRL_LOCK)
>  			cxld->flags |= CXL_DECODER_F_LOCK;
> +		if (FIELD_GET(CXL_HDM_DECODER0_CTRL_TYPE, ctrl))
> +			cxld->target_type = CXL_DECODER_EXPANDER;
> +		else
> +			cxld->target_type = CXL_DECODER_ACCELERATOR;
> +	} else {
> +		/* unless / until type-2 drivers arrive, assume type-3 */
> +		if (FIELD_GET(CXL_HDM_DECODER0_CTRL_TYPE, ctrl) == 0) {
> +			ctrl |= CXL_HDM_DECODER0_CTRL_TYPE;
> +			writel(ctrl, hdm + CXL_HDM_DECODER0_CTRL_OFFSET(which));
> +		}
> +		cxld->target_type = CXL_DECODER_EXPANDER;
>  	}
>  	rc = cxl_to_ways(FIELD_GET(CXL_HDM_DECODER0_CTRL_IW_MASK, ctrl),
>  			 &cxld->interleave_ways);
> @@ -488,11 +499,6 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
>  	if (rc)
>  		return rc;
>  
> -	if (FIELD_GET(CXL_HDM_DECODER0_CTRL_TYPE, ctrl))
> -		cxld->target_type = CXL_DECODER_EXPANDER;
> -	else
> -		cxld->target_type = CXL_DECODER_ACCELERATOR;
> -
>  	if (!cxled) {
>  		target_list.value =
>  			ioread64_hi_lo(hdm + CXL_HDM_DECODER0_TL_LOW(which));



* Re: [PATCH 32/46] cxl/mem: Enumerate port targets before adding endpoints
  2022-06-24  4:19 ` [PATCH 32/46] cxl/mem: Enumerate port targets before adding endpoints Dan Williams
@ 2022-06-30  9:48   ` Jonathan Cameron
  2022-07-10 21:01     ` Dan Williams
  0 siblings, 1 reply; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-30  9:48 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, nvdimm, linux-pci, patches, hch

On Thu, 23 Jun 2022 21:19:36 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> The port scanning algorithm in devm_cxl_enumerate_ports() walks up the
> topology and adds cxl_port objects starting from the root down to the
> endpoint. When those ports are initially created they know all their
> dports, but they do not know the downstream cxl_port instance that
> represents the next descendant in the topology. Rework create_endpoint()
> into devm_cxl_add_endpoint() that enumerates the downstream cxl_port
> topology into each port's 'struct cxl_ep' record for each endpoint of
> which the port is an ancestor.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

My usual moan about tidying up in a patch that also makes a more serious
change. Ideally pull the tidy-ups out, but if that's a real pain I can live
with it as long as you call it out in the patch description.

With that,

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
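
To make the "record the downstream port" step concrete, here is a small
userspace model of the loop at the top of devm_cxl_add_endpoint(). It is a
simplification under stated assumptions: each port tracks a single endpoint,
so the per-endpoint 'struct cxl_ep' record collapses into one 'next' field,
and the root is simply the port with no parent. The function name is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct port {
	struct port *parent;	/* NULL for the root */
	struct port *next;	/* stand-in for ep->next on this port */
};

/*
 * Walk from the endpoint's parent port up towards the root. At each step
 * 'down' is the port visited just before, i.e. the next port on the way
 * down to the endpoint. The port nearest the endpoint gets NULL.
 */
static void record_downstream_ports(struct port *parent_port)
{
	struct port *iter, *down;

	assert(parent_port);

	for (iter = parent_port, down = NULL; iter->parent;
	     down = iter, iter = iter->parent)
		iter->next = down;
}
```

The root is never written, matching the !is_cxl_root(iter) loop condition in
the patch.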

> ---
>  drivers/cxl/core/port.c | 41 +++++++++++++++++++++++++++++++++++++++++
>  drivers/cxl/cxl.h       |  7 ++++++-
>  drivers/cxl/mem.c       | 30 +-----------------------------
>  3 files changed, 48 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 08a380d20cf1..2e56903399c2 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1089,6 +1089,47 @@ static struct cxl_ep *cxl_ep_load(struct cxl_port *port,
>  	return xa_load(&port->endpoints, (unsigned long)&cxlmd->dev);
>  }
>  
> +int devm_cxl_add_endpoint(struct cxl_memdev *cxlmd,
> +			  struct cxl_dport *parent_dport)
> +{
> +	struct cxl_port *parent_port = parent_dport->port;
> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +	struct cxl_port *endpoint, *iter, *down;
> +	int rc;
> +
> +	/*
> +	 * Now that the path to the root is established record all the
> +	 * intervening ports in the chain.
> +	 */
> +	for (iter = parent_port, down = NULL; !is_cxl_root(iter);
> +	     down = iter, iter = to_cxl_port(iter->dev.parent)) {
> +		struct cxl_ep *ep;
> +
> +		ep = cxl_ep_load(iter, cxlmd);
> +		ep->next = down;
> +	}
> +
> +	endpoint = devm_cxl_add_port(&parent_port->dev, &cxlmd->dev,
> +				     cxlds->component_reg_phys, parent_dport);
> +	if (IS_ERR(endpoint))
> +		return PTR_ERR(endpoint);
> +
> +	dev_dbg(&cxlmd->dev, "add: %s\n", dev_name(&endpoint->dev));
> +
> +	rc = cxl_endpoint_autoremove(cxlmd, endpoint);
> +	if (rc)
> +		return rc;
> +
> +	if (!endpoint->dev.driver) {
> +		dev_err(&cxlmd->dev, "%s failed probe\n",
> +			dev_name(&endpoint->dev));
> +		return -ENXIO;
> +	}
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(devm_cxl_add_endpoint, CXL);
> +
>  static void cxl_detach_ep(void *data)
>  {
>  	struct cxl_memdev *cxlmd = data;
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 0211cf0d3574..f761cf78cc05 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -371,11 +371,14 @@ struct cxl_dport {
>  /**
>   * struct cxl_ep - track an endpoint's interest in a port
>   * @ep: device that hosts a generic CXL endpoint (expander or accelerator)
> - * @dport: which dport routes to this endpoint on this port
> + * @dport: which dport routes to this endpoint on @port

The fix is good, but it shouldn't really be in this patch.

> + * @next: cxl switch port across the link attached to @dport NULL if
> + *	  attached to an endpoint
>   */
>  struct cxl_ep {
>  	struct device *ep;
>  	struct cxl_dport *dport;
> +	struct cxl_port *next;
>  };
>  
>  /*
> @@ -398,6 +401,8 @@ struct pci_bus *cxl_port_to_pci_bus(struct cxl_port *port);
>  struct cxl_port *devm_cxl_add_port(struct device *host, struct device *uport,
>  				   resource_size_t component_reg_phys,
>  				   struct cxl_dport *parent_dport);
> +int devm_cxl_add_endpoint(struct cxl_memdev *cxlmd,
> +			  struct cxl_dport *parent_dport);
>  struct cxl_port *find_cxl_root(struct device *dev);
>  int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd);
>  int cxl_bus_rescan(void);
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index 2786d3402c9e..64ccf053d32c 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -25,34 +25,6 @@
>   * in higher level operations.
>   */
>  
> -static int create_endpoint(struct cxl_memdev *cxlmd,
> -			   struct cxl_dport *parent_dport)
> -{
> -	struct cxl_port *parent_port = parent_dport->port;
> -	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> -	struct cxl_port *endpoint;
> -	int rc;
> -
> -	endpoint = devm_cxl_add_port(&parent_port->dev, &cxlmd->dev,
> -				     cxlds->component_reg_phys, parent_dport);
> -	if (IS_ERR(endpoint))
> -		return PTR_ERR(endpoint);
> -
> -	dev_dbg(&cxlmd->dev, "add: %s\n", dev_name(&endpoint->dev));
> -
> -	rc = cxl_endpoint_autoremove(cxlmd, endpoint);
> -	if (rc)
> -		return rc;
> -
> -	if (!endpoint->dev.driver) {
> -		dev_err(&cxlmd->dev, "%s failed probe\n",
> -			dev_name(&endpoint->dev));
> -		return -ENXIO;
> -	}
> -
> -	return 0;
> -}
> -
>  static void enable_suspend(void *data)
>  {
>  	cxl_mem_active_dec();
> @@ -116,7 +88,7 @@ static int cxl_mem_probe(struct device *dev)
>  		goto unlock;
>  	}
>  
> -	rc = create_endpoint(cxlmd, dport);
> +	rc = devm_cxl_add_endpoint(cxlmd, dport);
>  unlock:
>  	device_unlock(&parent_port->dev);
>  	put_device(&parent_port->dev);



* Re: [PATCH 33/46] resource: Introduce alloc_free_mem_region()
  2022-06-24  4:19 ` [PATCH 33/46] resource: Introduce alloc_free_mem_region() Dan Williams
@ 2022-06-30 10:35   ` Jonathan Cameron
  2022-07-10 21:58     ` Dan Williams
  0 siblings, 1 reply; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-30 10:35 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, nvdimm, linux-pci, patches, hch, Jason Gunthorpe,
	Matthew Wilcox, Christoph Hellwig

On Thu, 23 Jun 2022 21:19:37 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> The core of devm_request_free_mem_region() is a helper that searches for
> free space in iomem_resource and performs __request_region_locked() on
> the result of that search. The policy choices of the implementation
> conform to what CONFIG_DEVICE_PRIVATE users want which is memory that is
> immediately marked busy, and a preference to search for the first-fit
> free range in descending order from the top of the physical address
> space.
> 
> CXL has a need for a similar allocator, but with the following tweaks:
> 
> 1/ Search for free space in ascending order
> 
> 2/ Search for free space relative to a given CXL window
> 
> 3/ 'insert' rather than 'request' the new resource given downstream
>    drivers from the CXL Region driver (like the pmem or dax drivers) are
>    responsible for request_mem_region() when they activate the memory
>    range.
> 
> Rework __request_free_mem_region() into get_free_mem_region() which
> takes a set of GFR_* (Get Free Region) flags to control the allocation
> policy (ascending vs descending), and "busy" policy (insert_resource()
> vs request_region()).
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Link: https://lore.kernel.org/linux-cxl/20220420143406.GY2120790@nvidia.com/
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

A few things inline,

Thanks,

Jonathan

> ---
>  include/linux/ioport.h |   2 +
>  kernel/resource.c      | 174 ++++++++++++++++++++++++++++++++---------
>  mm/Kconfig             |   5 ++
>  3 files changed, 146 insertions(+), 35 deletions(-)
> 
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index ec5f71f7135b..ed03518347aa 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -329,6 +329,8 @@ struct resource *devm_request_free_mem_region(struct device *dev,
>  		struct resource *base, unsigned long size);
>  struct resource *request_free_mem_region(struct resource *base,
>  		unsigned long size, const char *name);
> +struct resource *alloc_free_mem_region(struct resource *base,
> +		unsigned long size, unsigned long align, const char *name);
>  
>  static inline void irqresource_disabled(struct resource *res, u32 irq)
>  {
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 53a534db350e..9fc990274106 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c


> +static bool gfr_continue(struct resource *base, resource_size_t addr,
> +			 resource_size_t size, unsigned long flags)
> +{
> +	if (flags & GFR_DESCENDING)
> +		return addr > size && addr >= base->start;
> +	return addr > addr - size &&

Is this checking for wrap around?  If so maybe a comment to call that out?

> +	       addr <= min_t(resource_size_t, base->end,
> +			     (1ULL << MAX_PHYSMEM_BITS) - 1);
> +}
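
A minimal userspace sketch of how the ascending-case test behaves, assuming
it is indeed meant as a wrap-around check on the previous "addr + size"
step. uint64_t stands in for resource_size_t, and the helper name is
hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * addr is assumed to be the result of "prev + size". With unsigned
 * arithmetic a wrapped addition leaves addr < size, so "addr - size"
 * underflows to a huge value and the comparison fails.
 */
static bool step_did_not_wrap(uint64_t addr, uint64_t size)
{
	return addr > addr - size;
}
```

Under this reading the expression is also false for size == 0, so a
zero-sized request would stop the ascending search immediately.
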
> +
> +static resource_size_t gfr_next(resource_size_t addr, resource_size_t size,
> +				unsigned long flags)
> +{
> +	if (flags & GFR_DESCENDING)
> +		return addr - size;
> +	return addr + size;
> +}
> +
> +static void remove_free_mem_region(void *_res)
>  {
> -	resource_size_t end, addr;
> +	struct resource *res = _res;
> +
> +	if (res->parent)
> +		remove_resource(res);
> +	free_resource(res);
> +}
> +
> +static struct resource *
> +get_free_mem_region(struct device *dev, struct resource *base,
> +		    resource_size_t size, const unsigned long align,
> +		    const char *name, const unsigned long desc,
> +		    const unsigned long flags)
> +{
> +	resource_size_t addr;
>  	struct resource *res;
>  	struct region_devres *dr = NULL;
>  
> -	size = ALIGN(size, 1UL << PA_SECTION_SHIFT);
> -	end = min_t(unsigned long, base->end, (1UL << MAX_PHYSMEM_BITS) - 1);
> -	addr = end - size + 1UL;
> +	size = ALIGN(size, align);
>  
>  	res = alloc_resource(GFP_KERNEL);
>  	if (!res)
>  		return ERR_PTR(-ENOMEM);
>  
> -	if (dev) {
> +	if (dev && (flags & GFR_REQUEST_REGION)) {
>  		dr = devres_alloc(devm_region_release,
>  				sizeof(struct region_devres), GFP_KERNEL);
>  		if (!dr) {
>  			free_resource(res);
>  			return ERR_PTR(-ENOMEM);
>  		}
> +	} else if (dev) {
> +		if (devm_add_action_or_reset(dev, remove_free_mem_region, res))
> +			return ERR_PTR(-ENOMEM);

slightly nicer to return whatever value you got back from devm_add_action_or_reset()

>  	}
>  
>  	write_lock(&resource_lock);
> -	for (; addr > size && addr >= base->start; addr -= size) {
> -		if (__region_intersects(addr, size, 0, IORES_DESC_NONE) !=
> -				REGION_DISJOINT)
> +	for (addr = gfr_start(base, size, align, flags);
> +	     gfr_continue(base, addr, size, flags);
> +	     addr = gfr_next(addr, size, flags)) {
> +		if (__region_intersects(base, addr, size, 0, IORES_DESC_NONE) !=
> +		    REGION_DISJOINT)
>  			continue;
>  
> -		if (__request_region_locked(res, &iomem_resource, addr, size,
> -						name, 0))
> -			break;
> +		if (flags & GFR_REQUEST_REGION) {
> +			if (__request_region_locked(res, &iomem_resource, addr,
> +						    size, name, 0))
> +				break;
>  
> -		if (dev) {
> -			dr->parent = &iomem_resource;
> -			dr->start = addr;
> -			dr->n = size;
> -			devres_add(dev, dr);
> -		}
> +			if (dev) {
> +				dr->parent = &iomem_resource;
> +				dr->start = addr;
> +				dr->n = size;
> +				devres_add(dev, dr);
> +			}
>  
> -		res->desc = IORES_DESC_DEVICE_PRIVATE_MEMORY;
> -		write_unlock(&resource_lock);
> +			res->desc = desc;
> +			write_unlock(&resource_lock);
> +
> +
> +			/*
> +			 * A driver is claiming this region so revoke any
> +			 * mappings.
> +			 */
> +			revoke_iomem(res);
> +		} else {
> +			res->start = addr;
> +			res->end = addr + size - 1;
> +			res->name = name;
> +			res->desc = desc;
> +			res->flags = IORESOURCE_MEM;
> +
> +			/*
> +			 * Only succeed if the resource hosts an exclusive
> +			 * range after the insert
> +			 */
> +			if (__insert_resource(base, res) || res->child)
> +				break;
> +
> +			write_unlock(&resource_lock);
> +		}
>  
> -		/*
> -		 * A driver is claiming this region so revoke any mappings.
> -		 */
> -		revoke_iomem(res);
>  		return res;
>  	}
>  	write_unlock(&resource_lock);
>  
> -	free_resource(res);
> -	if (dr)
> +	if (flags & GFR_REQUEST_REGION) {
> +		free_resource(res);
>  		devres_free(dr);

The original if (dr) was unnecessary as devres_free() checks.

Looking just at this patch it looks like you aren't covering the
corner case of dev == NULL and GFR_REQUEST_REGION.

Perhaps worth a tiny comment in patch description? (doesn't seem worth
pulling this change out as a precursor given it's so small).
Or add the extra if (dr) back in to 'document' that there is no change...


> +	} else if (dev)
> +		devm_release_action(dev, remove_free_mem_region, res);
>  
>  	return ERR_PTR(-ERANGE);
>  }
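
On the devm_add_action_or_reset() comment above: a small userspace sketch
of passing the callee's error code through an ERR_PTR-style return rather
than hard-coding -ENOMEM. The ERR_PTR()/PTR_ERR()/IS_ERR() helpers here are
simplified stand-ins for the kernel ones, and fake_add_action() and
alloc_thing() are hypothetical names used only for illustration:

```c
#include <errno.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical callee: returns 0 on success or a negative errno. */
static int fake_add_action(int fail_with)
{
	return fail_with;
}

static int dummy_resource;

static void *alloc_thing(int fail_with)
{
	int rc = fake_add_action(fail_with);

	if (rc)
		return ERR_PTR(rc);	/* propagate whatever the callee said */
	return &dummy_resource;
}
```

The caller then sees the original error (for example -EINVAL) instead of
an -ENOMEM that may not describe what actually failed.
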
> @@ -1854,18 +1928,48 @@ static struct resource *__request_free_mem_region(struct device *dev,
>  struct resource *devm_request_free_mem_region(struct device *dev,
>  		struct resource *base, unsigned long size)
>  {
> -	return __request_free_mem_region(dev, base, size, dev_name(dev));
> +	unsigned long flags = GFR_DESCENDING | GFR_REQUEST_REGION;
> +
> +	return get_free_mem_region(dev, base, size, GFR_DEFAULT_ALIGN,
> +				   dev_name(dev),
> +				   IORES_DESC_DEVICE_PRIVATE_MEMORY, flags);
>  }
>  EXPORT_SYMBOL_GPL(devm_request_free_mem_region);
>  
>  struct resource *request_free_mem_region(struct resource *base,
>  		unsigned long size, const char *name)
>  {
> -	return __request_free_mem_region(NULL, base, size, name);
> +	unsigned long flags = GFR_DESCENDING | GFR_REQUEST_REGION;
> +
> +	return get_free_mem_region(NULL, base, size, GFR_DEFAULT_ALIGN, name,
> +				   IORES_DESC_DEVICE_PRIVATE_MEMORY, flags);
>  }
>  EXPORT_SYMBOL_GPL(request_free_mem_region);
>  
> -#endif /* CONFIG_DEVICE_PRIVATE */
> +/**
> + * alloc_free_mem_region - find a free region relative to @base
> + * @base: resource that will parent the new resource
> + * @size: size in bytes of memory to allocate from @base
> + * @align: alignment requirements for the allocation
> + * @name: resource name
> + *
> + * Buses like CXL, that can dynamically instantiate new memory regions,
> + * need a method to allocate physical address space for those regions.
> + * Allocate and insert a new resource to cover a free, unclaimed by a
> + * descendant of @base, range in the span of @base.
> + */
> +struct resource *alloc_free_mem_region(struct resource *base,

Given the extra align parameter, does it make sense to give this a name
that highlights that vs the other two interfaces above?

alloc_free_mem_region_aligned()

> +				       unsigned long size, unsigned long align,
> +				       const char *name)
> +{
> +	/* GFR_ASCENDING | GFR_INSERT_RESOURCE */

Given those flags don't exist and some fool like me might grep for them
perhaps better to describe it in text

	/* Default of ascending direction and insert resource */

> +	unsigned long flags = 0;
> +
> +	return get_free_mem_region(NULL, base, size, align, name,
> +				   IORES_DESC_NONE, flags);
> +}
> +EXPORT_SYMBOL_NS_GPL(alloc_free_mem_region, CXL);
> +#endif /* CONFIG_GET_FREE_REGION */
>  
>  static int __init strict_iomem(char *str)
>  {
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 169e64192e48..a5b4fee2e3fd 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -994,9 +994,14 @@ config HMM_MIRROR
>  	bool
>  	depends on MMU
>  
> +config GET_FREE_REGION
> +	depends on SPARSEMEM
> +	bool
> +
>  config DEVICE_PRIVATE
>  	bool "Unaddressable device memory (GPU memory, ...)"
>  	depends on ZONE_DEVICE
> +	select GET_FREE_REGION
>  
>  	help
>  	  Allows creation of struct pages to represent unaddressable device



* Re: [PATCH 08/46] cxl/core: Define a 'struct cxl_switch_decoder'
  2022-06-28 16:12   ` Jonathan Cameron
@ 2022-06-30 10:56     ` Jonathan Cameron
  2022-07-10  0:49       ` Dan Williams
  2022-07-10  0:33     ` Dan Williams
  1 sibling, 1 reply; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-30 10:56 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, Ben Widawsky, hch, alison.schofield, nvdimm,
	linux-pci, patches

On Tue, 28 Jun 2022 17:12:04 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:

> On Thu, 23 Jun 2022 19:45:57 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > Currently 'struct cxl_decoder' contains the superset of attributes
> > needed for all decoder types. Before more type-specific attributes are
> > added to the common definition, reorganize 'struct cxl_decoder' into type
> > specific objects.
> > 
> > This patch, the first of three, factors out a cxl_switch_decoder type.
> > The 'switch' decoder type represents the decoder instances of cxl_port's
> > that route from the root of a CXL memory decode topology to the
> > endpoints. They come in two flavors, root-level decoders, statically
> > defined by platform firmware, and mid-level decoders, where
> > interleave-granularity, interleave-width, and the target list are
> > mutable.  
> 
> I'd like to see this info on cxl_switch_decoder being used for
> switches AND other stuff as docs next to the definition. It confused
> me when I looked directly at the result of applying this series
> and made more sense once I had read up to this patch.
> 
> > 
> > Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
> > Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>  
> 
> Basic idea is fine, but there are a few places where I think this is
> 'too clever' with error handling and it's worth duplicating a few
> error messages to keep the flow simpler.
> 

follow up on that. I'd missed the kfree(alloc) hiding in plain
sight at the end of the function.



> > @@ -1179,13 +1210,27 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
> >  {
> >  	struct cxl_decoder *cxld;
> >  	struct device *dev;
> > +	void *alloc;
> >  	int rc = 0;
> >  
> >  	if (nr_targets > CXL_DECODER_MAX_INTERLEAVE)
> >  		return ERR_PTR(-EINVAL);
> >  
> > -	cxld = kzalloc(struct_size(cxld, target, nr_targets), GFP_KERNEL);
> > -	if (!cxld)
> > +	if (nr_targets) {
> > +		struct cxl_switch_decoder *cxlsd;
> > +
> > +		alloc = kzalloc(struct_size(cxlsd, target, nr_targets), GFP_KERNEL);  
> 
> I'd rather see a local check on the allocation failure even if it adds a few lines
> of duplicated code - which after you've dropped the local alloc variable won't be
> much even after a later patch adds another path in here.  The eventual code
> of this function is more than a little nasty when an early return in each
> path would, as far as I can tell, give the same result without the at least
> 3 null checks prior to returning (to ensure nothing happens before reaching
> > the if (!alloc)).

clearly not enough caffeine that day as I'd missed the use for unifying
the frees at the end of the function... Just noticed that in a later patch
that touches the error path.

I still don't much like the complexity of the flow, but can see why you did it
this way now.



> 
> 
> 
> 
> 		cxlsd = kzalloc()
> 		if (!cxlsd)
> 			return ERR_PTR(-ENOMEM);
> 
> 		cxlsd->nr_targets = nr_targets;
> 		seqlock_init(...)
> 
> 	} else {
> 		cxld = kzalloc(sizeof(*cxld), GFP_KERNEL);
> 		if (!cxld)
> 			return ERR_PTR(-ENOMEM);
> 
> > +		cxlsd = alloc;
> > +		if (cxlsd) {
> > +			cxlsd->nr_targets = nr_targets;
> > +			seqlock_init(&cxlsd->target_lock);
> > +			cxld = &cxlsd->cxld;
> > +		}
> > +	} else {
> > +		alloc = kzalloc(sizeof(*cxld), GFP_KERNEL);
> > +		cxld = alloc;
> > +	}
> > +	if (!alloc)
> >  		return ERR_PTR(-ENOMEM);
> >  
> >  	rc = ida_alloc(&port->decoder_ida, GFP_KERNEL);
> > @@ -1196,8 +1241,6 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
> >  	get_device(&port->dev);
> >  	cxld->id = rc;
> >  
> > -	cxld->nr_targets = nr_targets;
> > -	seqlock_init(&cxld->target_lock);
> >  	dev = &cxld->dev;
> >  	device_initialize(dev);
> >  	lockdep_set_class(&dev->mutex, &cxl_decoder_key);
> > @@ -1222,7 +1265,7 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
> >  
> >  	return cxld;
> >  err:
> > -	kfree(cxld);
> > +	kfree(alloc);
> >  	return ERR_PTR(rc);
> >  }
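
For reference, a userspace sketch of the pattern being discussed: one
untyped "alloc" pointer records whatever was allocated so that a single
free at the end covers both the switch-decoder and plain-decoder paths.
The struct layouts and names are hypothetical simplifications, and
calloc()/free() stand in for kzalloc()/kfree():

```c
#include <stdlib.h>

struct decoder {
	int id;
};

struct switch_decoder {
	struct decoder cxld;
	int nr_targets;
	void *target[];		/* flexible array sized by nr_targets */
};

/* Returns the common decoder; *allocp is the pointer that must be freed. */
static struct decoder *decoder_alloc(int nr_targets, void **allocp)
{
	struct decoder *cxld = NULL;
	void *alloc;

	if (nr_targets) {
		struct switch_decoder *cxlsd;

		alloc = calloc(1, sizeof(*cxlsd) +
				  nr_targets * sizeof(cxlsd->target[0]));
		cxlsd = alloc;
		if (cxlsd) {
			cxlsd->nr_targets = nr_targets;
			cxld = &cxlsd->cxld;
		}
	} else {
		alloc = calloc(1, sizeof(*cxld));
		cxld = alloc;
	}

	*allocp = alloc;	/* one pointer to free, whichever path ran */
	return cxld;
}
```

The cost, as noted above, is that the allocation-failure check is deferred
until after both branches, which is what makes the flow harder to follow.
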


* Re: [PATCH 34/46] cxl/region: Add region creation support
  2022-06-24  4:19 ` [PATCH 34/46] cxl/region: Add region creation support Dan Williams
@ 2022-06-30 13:17   ` Jonathan Cameron
  2022-07-11  0:08     ` Dan Williams
  0 siblings, 1 reply; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-30 13:17 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, nvdimm, linux-pci, patches, hch, Ben Widawsky

On Thu, 23 Jun 2022 21:19:38 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> From: Ben Widawsky <bwidawsk@kernel.org>
> 
> CXL 2.0 allows for dynamic provisioning of new memory regions (system
> physical address resources like "System RAM" and "Persistent Memory").
> Whereas DDR and PMEM resources are conveyed statically at boot, CXL
> allows for assembling and instantiating new regions from the available
> capacity of CXL memory expanders in the system.
> 
> Sysfs with an "echo $region_name > $create_region_attribute" interface
> is chosen as the mechanism to initiate the provisioning process. This
> was chosen over ioctl() and netlink() to keep the configuration
> interface entirely in a pseudo-fs interface, and it was chosen over
> configfs since, aside from this one creation event, the interface is
> read-mostly. I.e. configfs supports cases where an object is designed to
> be provisioned each boot, like an iSCSI storage target, and CXL region
> creation is mostly for PMEM regions which are created usually once
> per-lifetime of a server instance.
> 
> Recall that the major change that CXL brings over previous
> persistent memory architectures is the ability to dynamically define new
> regions.  Compare that to drivers like 'nfit' where the region
> configuration is statically defined by platform firmware.
> 
> Regions are created as a child of a root decoder that encompasses an
> address space with constraints. When created through sysfs, the root
> decoder is explicit. When created from an LSA's region structure a root
> decoder will possibly need to be inferred by the driver.
> 
> Upon region creation through sysfs, a vacant region is created with a
> unique name. Regions have a number of attributes that must be configured
> before the region can be bound to the driver where HDM decoder program
> is completed.
> 
> An example of creating a new region:
> 
> - Allocate a new region name:
> region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region)
> 
> - Create a new region by name:
> while
> region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region)

Perhaps it is worth calling out that the region ID allocator is shared
with nvdimms and other use cases.  I'm not really sure what the advantage
of doing that is, but it doesn't do any real harm.

> ! echo $region > /sys/bus/cxl/devices/decoder0.0/create_pmem_region
> do true; done
> 
> - Region now exists in sysfs:
> stat -t /sys/bus/cxl/devices/decoder0.0/$region
> 
> - Delete the region, and name:
> echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region
> 
> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> [djbw: simplify locking, reword changelog]
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

> ---
>  Documentation/ABI/testing/sysfs-bus-cxl       |  25 +++
>  .../driver-api/cxl/memory-devices.rst         |  11 +
>  drivers/cxl/Kconfig                           |   5 +
>  drivers/cxl/core/Makefile                     |   1 +
>  drivers/cxl/core/core.h                       |  12 ++
>  drivers/cxl/core/port.c                       |  39 +++-
>  drivers/cxl/core/region.c                     | 199 ++++++++++++++++++
>  drivers/cxl/cxl.h                             |  18 ++
>  tools/testing/cxl/Kbuild                      |   1 +
>  9 files changed, 308 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/cxl/core/region.c
> 

...


> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 472ec9cb1018..ebe6197fb9b8 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -9,6 +9,18 @@ extern const struct device_type cxl_nvdimm_type;
>  
>  extern struct attribute_group cxl_base_attribute_group;
>  
> +#ifdef CONFIG_CXL_REGION
> +extern struct device_attribute dev_attr_create_pmem_region;
> +extern struct device_attribute dev_attr_delete_region;
> +/*
> + * Note must be used at the end of an attribute list, since it
> + * terminates the list in the CONFIG_CXL_REGION=n case.

That's rather ugly.  Maybe just push the ifdef down into the c file
where we will be shortening the list and it should be obvious what is
going on without needing the comment?  Much as I don't like ifdef
magic in the c files, it sometimes ends up cleaner.

> + */
> +#define CXL_REGION_ATTR(x) (&dev_attr_##x.attr)
> +#else
> +#define CXL_REGION_ATTR(x) NULL
> +#endif
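
To illustrate why the macro has to sit at the end of the list, here is a
userspace sketch of a NULL-terminated attribute walk. Strings stand in for
struct attribute pointers, and REGION_ATTR_DISABLED() is a hypothetical
stand-in for the CONFIG_CXL_REGION=n expansion of CXL_REGION_ATTR():

```c
#include <stddef.h>

/* Count entries the way a NULL-terminated list walker would. */
static size_t count_attrs(const char *const *attrs)
{
	size_t n = 0;

	while (attrs[n])
		n++;
	return n;
}

/* Hypothetical equivalent of CXL_REGION_ATTR(x) when regions are disabled. */
#define REGION_ATTR_DISABLED(x) NULL

/* Macro at the tail: everything before it stays visible. */
static const char *const attrs_tail[] = {
	"cap_type3", "target_list", REGION_ATTR_DISABLED(create), NULL,
};

/* Macro in the middle: "target_list" is silently cut off. */
static const char *const attrs_mid[] = {
	"cap_type3", REGION_ATTR_DISABLED(create), "target_list", NULL,
};
```

With these arrays, count_attrs(attrs_tail) sees both regular attributes
while count_attrs(attrs_mid) stops after the first one.
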
> +
>  struct cxl_send_command;
>  struct cxl_mem_query_commands;
>  int cxl_query_cmd(struct cxl_memdev *cxlmd,
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 2e56903399c2..c9207ebc3f32 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1,6 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0-only
>  /* Copyright(c) 2020 Intel Corporation. All rights reserved. */
>  #include <linux/io-64-nonatomic-lo-hi.h>
> +#include <linux/memregion.h>
>  #include <linux/workqueue.h>
>  #include <linux/debugfs.h>
>  #include <linux/device.h>
> @@ -300,11 +301,35 @@ static struct attribute *cxl_decoder_root_attrs[] = {
>  	&dev_attr_cap_type2.attr,
>  	&dev_attr_cap_type3.attr,
>  	&dev_attr_target_list.attr,
> +	CXL_REGION_ATTR(create_pmem_region),
> +	CXL_REGION_ATTR(delete_region),
>  	NULL,
>  };

>  
>  static const struct attribute_group *cxl_decoder_root_attribute_groups[] = {
> @@ -387,6 +412,7 @@ static void cxl_root_decoder_release(struct device *dev)
>  {
>  	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
>  
> +	memregion_free(atomic_read(&cxlrd->region_id));
>  	__cxl_decoder_release(&cxlrd->cxlsd.cxld);
>  	kfree(cxlrd);
>  }
> @@ -1415,6 +1441,7 @@ static struct lock_class_key cxl_decoder_key;
>  static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
>  					     unsigned int nr_targets)
>  {
> +	struct cxl_root_decoder *cxlrd = NULL;
>  	struct cxl_decoder *cxld;
>  	struct device *dev;
>  	void *alloc;
> @@ -1425,16 +1452,20 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
>  
>  	if (nr_targets) {
>  		struct cxl_switch_decoder *cxlsd;
> -		struct cxl_root_decoder *cxlrd;
>  
>  		if (is_cxl_root(port)) {
>  			alloc = kzalloc(struct_size(cxlrd, cxlsd.target,
>  						    nr_targets),
>  					GFP_KERNEL);
>  			cxlrd = alloc;
> -			if (cxlrd)
> +			if (cxlrd) {
>  				cxlsd = &cxlrd->cxlsd;
> -			else
> +				atomic_set(&cxlrd->region_id, -1);
> +				rc = memregion_alloc(GFP_KERNEL);
> +				if (rc < 0)
> +					goto err;

Leaving region_id set to -1 seems interesting when it comes to ever
recovering from this error.  Perhaps add a comment on how the magic
value is used.

> +				atomic_set(&cxlrd->region_id, rc);
> +			} else
>  				cxlsd = NULL;
>  		} else {
>  			alloc = kzalloc(struct_size(cxlsd, target, nr_targets),
> @@ -1490,6 +1521,8 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
>  
>  	return cxld;
>  err:
> +	if (cxlrd && atomic_read(&cxlrd->region_id) >= 0)
> +		memregion_free(atomic_read(&cxlrd->region_id));
>  	kfree(alloc);
>  	return ERR_PTR(rc);
>  }
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> new file mode 100644
> index 000000000000..f2a0ead20ca7
> --- /dev/null
> +++ b/drivers/cxl/core/region.c
> @@ -0,0 +1,199 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
> +#include <linux/memregion.h>
> +#include <linux/genalloc.h>
> +#include <linux/device.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/idr.h>
> +#include <cxl.h>
> +#include "core.h"
> +
> +/**
> + * DOC: cxl core region
> + *
> + * CXL Regions represent mapped memory capacity in system physical address
> + * space. Whereas the CXL Root Decoders identify the bounds of potential CXL
> + * Memory ranges, Regions represent the active mapped capacity by the HDM
> + * Decoder Capability structures throughout the Host Bridges, Switches, and
> + * Endpoints in the topology.
> + */
> +
> +static struct cxl_region *to_cxl_region(struct device *dev);
> +
> +static void cxl_region_release(struct device *dev)
> +{
> +	struct cxl_region *cxlr = to_cxl_region(dev);
> +
> +	memregion_free(cxlr->id);
> +	kfree(cxlr);
> +}
> +
> +static const struct device_type cxl_region_type = {
> +	.name = "cxl_region",
> +	.release = cxl_region_release,
> +};
> +
> +bool is_cxl_region(struct device *dev)
> +{
> +	return dev->type == &cxl_region_type;
> +}
> +EXPORT_SYMBOL_NS_GPL(is_cxl_region, CXL);
> +
> +static struct cxl_region *to_cxl_region(struct device *dev)
> +{
> +	if (dev_WARN_ONCE(dev, dev->type != &cxl_region_type,
> +			  "not a cxl_region device\n"))
> +		return NULL;
> +
> +	return container_of(dev, struct cxl_region, dev);
> +}
> +
> +static void unregister_region(void *dev)
> +{
> +	device_unregister(dev);
> +}
> +
> +static struct lock_class_key cxl_region_key;
> +
> +static struct cxl_region *cxl_region_alloc(struct cxl_root_decoder *cxlrd, int id)
> +{
> +	struct cxl_region *cxlr;
> +	struct device *dev;
> +
> +	cxlr = kzalloc(sizeof(*cxlr), GFP_KERNEL);
> +	if (!cxlr) {
> +		memregion_free(id);

That's a bit nasty as it gives the function side effects. Perhaps add some
comments in the callers of this to highlight that the memregion will either
be freed in here or handed over to the device.

> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	dev = &cxlr->dev;
> +	device_initialize(dev);
> +	lockdep_set_class(&dev->mutex, &cxl_region_key);
> +	dev->parent = &cxlrd->cxlsd.cxld.dev;
> +	device_set_pm_not_required(dev);
> +	dev->bus = &cxl_bus_type;
> +	dev->type = &cxl_region_type;
> +	cxlr->id = id;
> +
> +	return cxlr;
> +}
> +
> +/**
> + * devm_cxl_add_region - Adds a region to a decoder
> + * @cxlrd: root decoder
> + * @id: memregion id to create
> + * @mode: mode for the endpoint decoders of this region

Missing docs for @type

> + *
> + * This is the second step of region initialization. Regions exist within an
> + * address space which is mapped by a @cxlrd.
> + *
> + * Return: 0 if the region was added to the @cxlrd, else returns negative error
> + * code. The region will be named "regionZ" where Z is the unique region number.
> + */
> +static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
> +					      int id,
> +					      enum cxl_decoder_mode mode,
> +					      enum cxl_decoder_type type)
> +{
> +	struct cxl_port *port = to_cxl_port(cxlrd->cxlsd.cxld.dev.parent);
> +	struct cxl_region *cxlr;
> +	struct device *dev;
> +	int rc;
> +
> +	cxlr = cxl_region_alloc(cxlrd, id);
> +	if (IS_ERR(cxlr))
> +		return cxlr;
> +	cxlr->mode = mode;
> +	cxlr->type = type;
> +
> +	dev = &cxlr->dev;
> +	rc = dev_set_name(dev, "region%d", id);
> +	if (rc)
> +		goto err;
> +
> +	rc = device_add(dev);
> +	if (rc)
> +		goto err;
> +
> +	rc = devm_add_action_or_reset(port->uport, unregister_region, cxlr);
> +	if (rc)
> +		return ERR_PTR(rc);
> +
> +	dev_dbg(port->uport, "%s: created %s\n",
> +		dev_name(&cxlrd->cxlsd.cxld.dev), dev_name(dev));
> +	return cxlr;
> +
> +err:
> +	put_device(dev);
> +	return ERR_PTR(rc);
> +}
> +

> +static ssize_t create_pmem_region_store(struct device *dev,
> +					struct device_attribute *attr,
> +					const char *buf, size_t len)
> +{
> +	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
> +	struct cxl_region *cxlr;
> +	unsigned int id, rc;
> +
> +	rc = sscanf(buf, "region%u\n", &id);
> +	if (rc != 1)
> +		return -EINVAL;
> +
> +	rc = memregion_alloc(GFP_KERNEL);
> +	if (rc < 0)
> +		return rc;
> +
> +	if (atomic_cmpxchg(&cxlrd->region_id, id, rc) != id) {
> +		memregion_free(rc);
> +		return -EBUSY;
> +	}
> +
> +	cxlr = devm_cxl_add_region(cxlrd, id, CXL_DECODER_PMEM,
> +				   CXL_DECODER_EXPANDER);
> +	if (IS_ERR(cxlr))
> +		return PTR_ERR(cxlr);
> +
> +	return len;
> +}
> +DEVICE_ATTR_RW(create_pmem_region);
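
A userspace sketch of the claim protocol in the store handler above, using
C11 atomics in place of the kernel's atomic_cmpxchg(). The names are
hypothetical, and fake_memregion_alloc() is a stand-in counter rather than
the real memregion_alloc():

```c
#include <errno.h>
#include <stdatomic.h>

/* Id currently advertised to userspace, analogous to cxlrd->region_id. */
static atomic_int advertised_id;

/* Hypothetical stand-in allocator: hands out 1, 2, 3, ... */
static int next_free_id = 1;

static int fake_memregion_alloc(void)
{
	return next_free_id++;
}

/*
 * Claim @id only if it is still the advertised one, and advertise a
 * fresh id in the same atomic step.
 */
static int claim_region_id(int id)
{
	int fresh = fake_memregion_alloc();
	int expected = id;

	if (!atomic_compare_exchange_strong(&advertised_id, &expected, fresh))
		return -EBUSY;	/* lost the race; the patch frees "fresh" here */

	return 0;
}
```

This is why the changelog example wraps the read and the write in a retry
loop: a writer that loses the race gets -EBUSY and has to re-read the
attribute to learn the newly advertised name.
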
> +
> +static struct cxl_region *cxl_find_region_by_name(struct cxl_decoder *cxld,

Perhaps rename cxld here to make it clear it's a root decoder only.

> +						  const char *name)
> +{
> +	struct device *region_dev;
> +
> +	region_dev = device_find_child_by_name(&cxld->dev, name);
> +	if (!region_dev)
> +		return ERR_PTR(-ENODEV);
> +
> +	return to_cxl_region(region_dev);
> +}
> +
> +static ssize_t delete_region_store(struct device *dev,
> +				   struct device_attribute *attr,
> +				   const char *buf, size_t len)
> +{
> +	struct cxl_port *port = to_cxl_port(dev->parent);
> +	struct cxl_decoder *cxld = to_cxl_decoder(dev);
As above, given it's the root decoder can we name it to make that
obvious?

> +	struct cxl_region *cxlr;
> +
> +	cxlr = cxl_find_region_by_name(cxld, buf);
> +	if (IS_ERR(cxlr))
> +		return PTR_ERR(cxlr);
> +
> +	devm_release_action(port->uport, unregister_region, cxlr);
> +	put_device(&cxlr->dev);
> +
> +	return len;
> +}
> +DEVICE_ATTR_WO(delete_region);


* Re: [PATCH 36/46] cxl/region: Add interleave ways attribute
  2022-06-24  4:19 ` [PATCH 36/46] cxl/region: Add interleave ways attribute Dan Williams
@ 2022-06-30 13:44   ` Jonathan Cameron
  2022-07-11  0:32     ` Dan Williams
  2022-06-30 13:45   ` Jonathan Cameron
  1 sibling, 1 reply; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-30 13:44 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, nvdimm, linux-pci, patches, hch, Ben Widawsky

On Thu, 23 Jun 2022 21:19:40 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> From: Ben Widawsky <bwidawsk@kernel.org>
> 
> Add an ABI to allow the number of devices that comprise a region to be
> set.

Should at least mention interleave_granularity is being added as well!

> 
> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> [djbw: reword changelog]
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Random diversion inline...

> ---
>  Documentation/ABI/testing/sysfs-bus-cxl |  21 ++++
>  drivers/cxl/core/region.c               | 128 ++++++++++++++++++++++++
>  drivers/cxl/cxl.h                       |  33 ++++++
>  3 files changed, 182 insertions(+)

> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index f75978f846b9..78af42454760 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -7,6 +7,7 @@


> +static ssize_t interleave_granularity_store(struct device *dev,
> +					    struct device_attribute *attr,
> +					    const char *buf, size_t len)
> +{
> +	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent);
> +	struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
> +	struct cxl_region *cxlr = to_cxl_region(dev);
> +	struct cxl_region_params *p = &cxlr->params;
> +	int rc, val;
> +	u16 ig;
> +
> +	rc = kstrtoint(buf, 0, &val);
> +	if (rc)
> +		return rc;
> +
> +	rc = granularity_to_cxl(val, &ig);
> +	if (rc)
> +		return rc;
> +
> +	/* region granularity must be >= root granularity */

In general I think that's an implementation choice.  Sure, today
we only support it this way, but it's perfectly possible to build
setups where that's not the case.  Maybe the comment should say
that this code goes with an implementation choice in line with
the software guide (which argues you will always prefer a small
ig for interleaving at the host to make best use of bandwidth etc).

Interestingly, the code I was previously testing QEMU with
allowed that option (it might have been the only option that worked).
That code was a mixture of Ben's earlier version and my own hacks.
It probably doesn't make sense to support other ways of picking
the interleaving granularity unless / until we get a request
to do so.

I think it results in a different device ordering.

Ordering with this

    Host
     | 4k
    / \
   /   \  
  HB   HB  8k
  |     |
 / \   / \
0  2   1  3

Ordering with Larger granularity CFMWS over finer granularity HB

    Host
     | 8k
    / \
   /   \ 
  HB   HB 4k
  |     |
 / \   / \
0  1   2  3

Not clear why you'd do the second one though :)  So can ignore for now.
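
The two orderings drawn above can be reproduced with a small userspace
sketch. It assumes exactly 2 host bridges with 2 devices each and plain
modulo decode at both levels; hb_of(), dev_of() and region_pos() are
hypothetical helpers for illustration, not part of the patch.

```c
/*
 * Hypothetical model: 2 host bridges, 2 devices per host bridge, and
 * plain modulo decode at both levels. Not the kernel's decode path.
 */
#define NR_HB		2
#define DEVS_PER_HB	2

/* Host bridge that claims the byte at @hpa for host-level granularity @host_ig. */
static int hb_of(unsigned long hpa, unsigned long host_ig)
{
	return (hpa / host_ig) % NR_HB;
}

/* Device below that host bridge that claims @hpa for HB-level granularity @hb_ig. */
static int dev_of(unsigned long hpa, unsigned long hb_ig)
{
	return (hpa / hb_ig) % DEVS_PER_HB;
}

/*
 * Region position of device (@hb, @dev): index of the first chunk, in
 * units of the smaller of the two granularities, that decodes to it.
 * Returns -1 if no chunk in the first stripe lands on that device.
 */
static int region_pos(int hb, int dev, unsigned long host_ig,
		      unsigned long hb_ig)
{
	unsigned long chunk = host_ig < hb_ig ? host_ig : hb_ig;
	int n;

	for (n = 0; n < NR_HB * DEVS_PER_HB; n++) {
		unsigned long hpa = n * chunk;

		if (hb_of(hpa, host_ig) == hb && dev_of(hpa, hb_ig) == dev)
			return n;
	}
	return -1;
}
```

With host_ig = 4096 and hb_ig = 8192 the devices under the first host
bridge land at positions 0 and 2. Swapping the two granularities gives
0 and 1, matching the second diagram.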


> +	if (val < cxld->interleave_granularity)
> +		return -EINVAL;
> +
> +	rc = down_write_killable(&cxl_region_rwsem);
> +	if (rc)
> +		return rc;
> +	if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE) {
> +		rc = -EBUSY;
> +		goto out;
> +	}
> +
> +	p->interleave_granularity = val;
> +out:
> +	up_read(&cxl_region_rwsem);
> +	if (rc)
> +		return rc;
> +	return len;
> +}
> +static DEVICE_ATTR_RW(interleave_granularity);
> +
>  static struct attribute *cxl_region_attrs[] = {
>  	&dev_attr_uuid.attr,
> +	&dev_attr_interleave_ways.attr,
> +	&dev_attr_interleave_granularity.attr,
>  	NULL,
>  };
>  




* Re: [PATCH 36/46] cxl/region: Add interleave ways attribute
  2022-06-24  4:19 ` [PATCH 36/46] cxl/region: Add interleave ways attribute Dan Williams
  2022-06-30 13:44   ` Jonathan Cameron
@ 2022-06-30 13:45   ` Jonathan Cameron
  1 sibling, 0 replies; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-30 13:45 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, nvdimm, linux-pci, patches, hch, Ben Widawsky

On Thu, 23 Jun 2022 21:19:40 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> From: Ben Widawsky <bwidawsk@kernel.org>
> 
> Add an ABI to allow the number of devices that comprise a region to be
> set.
> 
> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> [djbw: reword changelog]
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Forgot to say that, with a mention of the granularity in the patch
description, I'm fine with the rest of this.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  Documentation/ABI/testing/sysfs-bus-cxl |  21 ++++
>  drivers/cxl/core/region.c               | 128 ++++++++++++++++++++++++
>  drivers/cxl/cxl.h                       |  33 ++++++
>  3 files changed, 182 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index d30c95a758a9..46d5295c1149 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -273,3 +273,24 @@ Description:
>  		(RW) Write a unique identifier for the region. This field must
>  		be set for persistent regions and it must not conflict with the
>  		UUID of another region.
> +
> +
> +What:		/sys/bus/cxl/devices/regionZ/interleave_granularity
> +Date:		May, 2022
> +KernelVersion:	v5.20
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(RW) Set the number of consecutive bytes each device in the
> +		interleave set will claim. The possible interleave granularity
> +		values are determined by the CXL spec and the participating
> +		devices.
> +
> +
> +What:		/sys/bus/cxl/devices/regionZ/interleave_ways
> +Date:		May, 2022
> +KernelVersion:	v5.20
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(RW) Configures the number of devices participating in the
> +		region is set by writing this value. Each device will provide
> +		1/interleave_ways of storage for the region.
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index f75978f846b9..78af42454760 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -7,6 +7,7 @@
>  #include <linux/slab.h>
>  #include <linux/uuid.h>
>  #include <linux/idr.h>
> +#include <cxlmem.h>
>  #include <cxl.h>
>  #include "core.h"
>  
> @@ -21,6 +22,8 @@
>   *
>   * Region configuration has ordering constraints. UUID may be set at any time
>   * but is only visible for persistent regions.
> + * 1. Interleave granularity
> + * 2. Interleave size
>   */
>  
>  /*
> @@ -119,8 +122,129 @@ static umode_t cxl_region_visible(struct kobject *kobj, struct attribute *a,
>  	return a->mode;
>  }
>  
> +static ssize_t interleave_ways_show(struct device *dev,
> +				    struct device_attribute *attr, char *buf)
> +{
> +	struct cxl_region *cxlr = to_cxl_region(dev);
> +	struct cxl_region_params *p = &cxlr->params;
> +	ssize_t rc;
> +
> +	rc = down_read_interruptible(&cxl_region_rwsem);
> +	if (rc)
> +		return rc;
> +	rc = sysfs_emit(buf, "%d\n", p->interleave_ways);
> +	up_read(&cxl_region_rwsem);
> +
> +	return rc;
> +}
> +
> +static ssize_t interleave_ways_store(struct device *dev,
> +				     struct device_attribute *attr,
> +				     const char *buf, size_t len)
> +{
> +	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent);
> +	struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
> +	struct cxl_region *cxlr = to_cxl_region(dev);
> +	struct cxl_region_params *p = &cxlr->params;
> +	int rc, val;
> +	u8 iw;
> +
> +	rc = kstrtoint(buf, 0, &val);
> +	if (rc)
> +		return rc;
> +
> +	rc = ways_to_cxl(val, &iw);
> +	if (rc)
> +		return rc;
> +
> +	/*
> +	 * Even for x3, x9, and x12 interleaves the region interleave must be a
> +	 * power of 2 multiple of the host bridge interleave.
> +	 */
> +	if (!is_power_of_2(val / cxld->interleave_ways) ||
> +	    (val % cxld->interleave_ways)) {
> +		dev_dbg(&cxlr->dev, "invalid interleave: %d\n", val);
> +		return -EINVAL;
> +	}
> +
> +	rc = down_write_killable(&cxl_region_rwsem);
> +	if (rc)
> +		return rc;
> +	if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE) {
> +		rc = -EBUSY;
> +		goto out;
> +	}
> +
> +	p->interleave_ways = val;
> +out:
> +	up_read(&cxl_region_rwsem);
> +	if (rc)
> +		return rc;
> +	return len;
> +}
> +static DEVICE_ATTR_RW(interleave_ways);
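
For reference, the "power of 2 multiple" rule enforced in
interleave_ways_store() above can be tried out in userspace. This is
only a sketch of that one check: is_pow2() and region_ways_ok() are
hypothetical stand-ins, and root_ways is assumed to be non-zero.

```c
#include <stdbool.h>

/* Userspace stand-in for the kernel's is_power_of_2(). */
static bool is_pow2(unsigned int n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Same condition as the quoted check: @region_ways must be an exact
 * multiple of @root_ways, and the multiplier must be a power of 2.
 * @root_ways is assumed to be non-zero.
 */
static bool region_ways_ok(unsigned int region_ways, unsigned int root_ways)
{
	if (region_ways % root_ways)
		return false;
	return is_pow2(region_ways / root_ways);
}
```

So a x6 region is accepted under a x3 root decoder (multiplier 2) but
rejected under a x2 root decoder (multiplier 3).
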
> +
> +static ssize_t interleave_granularity_show(struct device *dev,
> +					   struct device_attribute *attr,
> +					   char *buf)
> +{
> +	struct cxl_region *cxlr = to_cxl_region(dev);
> +	struct cxl_region_params *p = &cxlr->params;
> +	ssize_t rc;
> +
> +	rc = down_read_interruptible(&cxl_region_rwsem);
> +	if (rc)
> +		return rc;
> +	rc = sysfs_emit(buf, "%d\n", p->interleave_granularity);
> +	up_read(&cxl_region_rwsem);
> +
> +	return rc;
> +}
> +
> +static ssize_t interleave_granularity_store(struct device *dev,
> +					    struct device_attribute *attr,
> +					    const char *buf, size_t len)
> +{
> +	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent);
> +	struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
> +	struct cxl_region *cxlr = to_cxl_region(dev);
> +	struct cxl_region_params *p = &cxlr->params;
> +	int rc, val;
> +	u16 ig;
> +
> +	rc = kstrtoint(buf, 0, &val);
> +	if (rc)
> +		return rc;
> +
> +	rc = granularity_to_cxl(val, &ig);
> +	if (rc)
> +		return rc;
> +
> +	/* region granularity must be >= root granularity */
> +	if (val < cxld->interleave_granularity)
> +		return -EINVAL;
> +
> +	rc = down_write_killable(&cxl_region_rwsem);
> +	if (rc)
> +		return rc;
> +	if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE) {
> +		rc = -EBUSY;
> +		goto out;
> +	}
> +
> +	p->interleave_granularity = val;
> +out:
> +	up_read(&cxl_region_rwsem);
> +	if (rc)
> +		return rc;
> +	return len;
> +}
> +static DEVICE_ATTR_RW(interleave_granularity);
> +
>  static struct attribute *cxl_region_attrs[] = {
>  	&dev_attr_uuid.attr,
> +	&dev_attr_interleave_ways.attr,
> +	&dev_attr_interleave_granularity.attr,
>  	NULL,
>  };
>  
> @@ -212,6 +336,8 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
>  					      enum cxl_decoder_type type)
>  {
>  	struct cxl_port *port = to_cxl_port(cxlrd->cxlsd.cxld.dev.parent);
> +	struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
> +	struct cxl_region_params *p;
>  	struct cxl_region *cxlr;
>  	struct device *dev;
>  	int rc;
> @@ -219,8 +345,10 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
>  	cxlr = cxl_region_alloc(cxlrd, id);
>  	if (IS_ERR(cxlr))
>  		return cxlr;
> +	p = &cxlr->params;
>  	cxlr->mode = mode;
>  	cxlr->type = type;
> +	p->interleave_granularity = cxld->interleave_granularity;
>  
>  	dev = &cxlr->dev;
>  	rc = dev_set_name(dev, "region%d", id);
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 46a9f8acc602..13ee04b00e0c 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -7,6 +7,7 @@
>  #include <linux/libnvdimm.h>
>  #include <linux/bitfield.h>
>  #include <linux/bitops.h>
> +#include <linux/log2.h>
>  #include <linux/io.h>
>  
>  /**
> @@ -92,6 +93,31 @@ static inline int cxl_to_ways(u8 eniw, unsigned int *val)
>  	return 0;
>  }
>  
> +static inline int granularity_to_cxl(int g, u16 *ig)
> +{
> +	if (g > SZ_16K || g < 256 || !is_power_of_2(g))
> +		return -EINVAL;
> +	*ig = ilog2(g) - 8;
> +	return 0;
> +}
> +
> +static inline int ways_to_cxl(int ways, u8 *iw)
> +{
> +	if (ways > 16)
> +		return -EINVAL;
> +	if (is_power_of_2(ways)) {
> +		*iw = ilog2(ways);
> +		return 0;
> +	}
> +	if (ways % 3)
> +		return -EINVAL;
> +	ways /= 3;
> +	if (!is_power_of_2(ways))
> +		return -EINVAL;
> +	*iw = ilog2(ways) + 8;
> +	return 0;
> +}
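
To make the encodings concrete, here is a userspace copy of the two
helpers above. is_pow2() and log2i() are stand-ins for the kernel's
is_power_of_2() and ilog2(), and eig_of() / eiw_of() are hypothetical
wrappers added only so the results are easy to inspect.

```c
#include <errno.h>

/* Userspace stand-ins for the kernel's is_power_of_2() and ilog2(). */
static int is_pow2(int n)
{
	return n > 0 && (n & (n - 1)) == 0;
}

static int log2i(int n)
{
	int l = 0;

	while (n > 1) {
		n >>= 1;
		l++;
	}
	return l;
}

/* 256 bytes encodes as 0, each doubling adds 1, 16K is the largest value. */
static int granularity_to_cxl(int g, unsigned short *ig)
{
	if (g > 16384 || g < 256 || !is_pow2(g))
		return -EINVAL;
	*ig = log2i(g) - 8;
	return 0;
}

/* Powers of 2 encode as log2(ways); 3, 6 and 12 encode as 8, 9 and 10. */
static int ways_to_cxl(int ways, unsigned char *iw)
{
	if (ways > 16)
		return -EINVAL;
	if (is_pow2(ways)) {
		*iw = log2i(ways);
		return 0;
	}
	if (ways % 3)
		return -EINVAL;
	ways /= 3;
	if (!is_pow2(ways))
		return -EINVAL;
	*iw = log2i(ways) + 8;
	return 0;
}

/* Hypothetical wrappers: return the encoding, or a negative errno. */
static int eig_of(int g)
{
	unsigned short ig;
	int rc = granularity_to_cxl(g, &ig);

	return rc ? rc : ig;
}

static int eiw_of(int ways)
{
	unsigned char iw;
	int rc = ways_to_cxl(ways, &iw);

	return rc ? rc : iw;
}
```

Note that with this logic a x9 interleave is rejected (9 / 3 = 3 is not
a power of 2), so the accepted non-power-of-2 values are 3, 6 and 12.
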
> +
>  /* CXL 2.0 8.2.8.1 Device Capabilities Array Register */
>  #define CXLDEV_CAP_ARRAY_OFFSET 0x0
>  #define   CXLDEV_CAP_ARRAY_CAP_ID 0
> @@ -291,11 +317,14 @@ struct cxl_root_decoder {
>  /*
>   * enum cxl_config_state - State machine for region configuration
>   * @CXL_CONFIG_IDLE: Any sysfs attribute can be written freely
> + * @CXL_CONFIG_INTERLEAVE_ACTIVE: region size has been set, no more
> + * changes to interleave_ways or interleave_granularity
>   * @CXL_CONFIG_ACTIVE: All targets have been added the region is now
>   * active
>   */
>  enum cxl_config_state {
>  	CXL_CONFIG_IDLE,
> +	CXL_CONFIG_INTERLEAVE_ACTIVE,
>  	CXL_CONFIG_ACTIVE,
>  };
>  
> @@ -303,12 +332,16 @@ enum cxl_config_state {
>   * struct cxl_region_params - region settings
>   * @state: allow the driver to lockdown further parameter changes
>   * @uuid: unique id for persistent regions
> + * @interleave_ways: number of endpoints in the region
> + * @interleave_granularity: capacity each endpoint contributes to a stripe
>   *
>   * State transitions are protected by the cxl_region_rwsem
>   */
>  struct cxl_region_params {
>  	enum cxl_config_state state;
>  	uuid_t uuid;
> +	int interleave_ways;
> +	int interleave_granularity;
>  };
>  
>  /**



* Re: [PATCH 37/46] cxl/region: Allocate host physical address (HPA) capacity to new regions
  2022-06-24  4:19 ` [PATCH 37/46] cxl/region: Allocate host physical address (HPA) capacity to new regions Dan Williams
@ 2022-06-30 13:56   ` Jonathan Cameron
  2022-07-11  0:47     ` Dan Williams
  0 siblings, 1 reply; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-30 13:56 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, nvdimm, linux-pci, patches, hch, Ben Widawsky

On Thu, 23 Jun 2022 21:19:41 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> After a region's interleave parameters (ways and granularity) are set,
> add a way for regions to allocate HPA from the free capacity in their
> decoder. The allocator for this capacity reuses the 'struct resource'
> based allocator used for CONFIG_DEVICE_PRIVATE.
> 
> Once the tuple of "ways, granularity, and size" is set the
> region configuration transitions to the CXL_CONFIG_INTERLEAVE_ACTIVE
> state which is a precursor to allowing endpoint decoders to be added to
> a region.
> 
> Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

A few comments on the interface inline.

Thanks,

Jonathan

> ---
>  Documentation/ABI/testing/sysfs-bus-cxl |  25 ++++
>  drivers/cxl/Kconfig                     |   3 +
>  drivers/cxl/core/region.c               | 148 +++++++++++++++++++++++-
>  drivers/cxl/cxl.h                       |   2 +
>  4 files changed, 177 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 46d5295c1149..3658facc9944 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -294,3 +294,28 @@ Description:
>  		(RW) Configures the number of devices participating in the
>  		region is set by writing this value. Each device will provide
>  		1/interleave_ways of storage for the region.
> +
> +
> +What:		/sys/bus/cxl/devices/regionZ/size
> +Date:		May, 2022
> +KernelVersion:	v5.20
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(RW) System physical address space to be consumed by the region.
> +		When written to, this attribute will allocate space out of the
> +		CXL root decoder's address space. When read the size of the
> +		address space is reported and should match the span of the
> +		region's resource attribute. Size shall be set after the
> +		interleave configuration parameters.

There seem to be constraints that say you have to set this to 0 and then something
else later to force a resize. That should be mentioned here or gotten rid of.


> +
> +
> +What:		/sys/bus/cxl/devices/regionZ/resource
> +Date:		May, 2022
> +KernelVersion:	v5.20
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(RO) A region is a contiguous partition of a CXL root decoder
> +		address space. Region capacity is allocated by writing to the
> +		size attribute, the resulting physical address space determined
> +		by the driver is reflected here. It is therefore not useful to
> +		read this before writing a value to the size attribute.

I don't much like naming a "base address" resource.  I'd expect resource to contain
both base and size whereas this only has the base address of the region.




* Re: [PATCH 38/46] cxl/region: Enable the assignment of endpoint decoders to regions
  2022-06-24  4:19 ` [PATCH 38/46] cxl/region: Enable the assignment of endpoint decoders to regions Dan Williams
@ 2022-06-30 14:31   ` Jonathan Cameron
  2022-07-11  1:12     ` Dan Williams
  0 siblings, 1 reply; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-30 14:31 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, nvdimm, linux-pci, patches, hch, Ben Widawsky

On Thu, 23 Jun 2022 21:19:42 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> The region provisioning process involves allocating DPA to a set of
> endpoint decoders, and HPA plus the region geometry to a region device.
> Then the decoder is assigned to the region. At this point several
> validation steps can be performed to validate that the decoder is
> suitable to participate in the region.
> 
> Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  Documentation/ABI/testing/sysfs-bus-cxl |  19 ++
>  drivers/cxl/core/core.h                 |   6 +
>  drivers/cxl/core/hdm.c                  |  13 +-
>  drivers/cxl/core/port.c                 |  12 +-
>  drivers/cxl/core/region.c               | 286 +++++++++++++++++++++++-
>  drivers/cxl/cxl.h                       |  11 +
>  6 files changed, 342 insertions(+), 5 deletions(-)
> 

A few fixes seem to have ended up in the wrong patch.
Other trivial typos etc. inline, plus what looks to be an
item left over from a todo list...

...


> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index a604c24ff918..4830365f3857 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -24,6 +24,7 @@
>   * but is only visible for persistent regions.
>   * 1. Interleave granularity
>   * 2. Interleave size
> + * 3. Decoder targets
>   */
>  
>  /*
> @@ -138,6 +139,8 @@ static ssize_t interleave_ways_show(struct device *dev,
>  	return rc;
>  }
>  
> +static const struct attribute_group *get_cxl_region_target_group(void);
> +
>  static ssize_t interleave_ways_store(struct device *dev,
>  				     struct device_attribute *attr,
>  				     const char *buf, size_t len)
> @@ -146,7 +149,7 @@ static ssize_t interleave_ways_store(struct device *dev,
>  	struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
>  	struct cxl_region *cxlr = to_cxl_region(dev);
>  	struct cxl_region_params *p = &cxlr->params;
> -	int rc, val;
> +	int rc, val, save;
>  	u8 iw;
>  
>  	rc = kstrtoint(buf, 0, &val);
> @@ -175,9 +178,13 @@ static ssize_t interleave_ways_store(struct device *dev,
>  		goto out;
>  	}
>  
> +	save = p->interleave_ways;
>  	p->interleave_ways = val;
> +	rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_target_group());
> +	if (rc)
> +		p->interleave_ways = save;
>  out:
> -	up_read(&cxl_region_rwsem);
> +	up_write(&cxl_region_rwsem);

Bug in earlier patch?

>  	if (rc)
>  		return rc;
>  	return len;
> @@ -234,7 +241,7 @@ static ssize_t interleave_granularity_store(struct device *dev,
>  
>  	p->interleave_granularity = val;
>  out:
> -	up_read(&cxl_region_rwsem);
> +	up_write(&cxl_region_rwsem);

Bug in earlier patch? 

>  	if (rc)
>  		return rc;
>  	return len;
> @@ -393,9 +400,262 @@ static const struct attribute_group cxl_region_group = {
>  	.is_visible = cxl_region_visible,
>  };

...

> +/*
> + * - Check that the given endpoint is attached to a host-bridge identified
> + *   in the root interleave.

 Comment on something to fix?  Or stale comment that can be dropped?

> + */
> +static int cxl_region_attach(struct cxl_region *cxlr,
> +			     struct cxl_endpoint_decoder *cxled, int pos)
> +{
> +	struct cxl_region_params *p = &cxlr->params;
> +
> +	if (cxled->mode == CXL_DECODER_DEAD) {
> +		dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev));
> +		return -ENODEV;
> +	}
> +
> +	if (pos >= p->interleave_ways) {
> +		dev_dbg(&cxlr->dev, "position %d out of range %d\n", pos,
> +			p->interleave_ways);
> +		return -ENXIO;
> +	}
> +
> +	if (p->targets[pos] == cxled)
> +		return 0;
> +
> +	if (p->targets[pos]) {
> +		struct cxl_endpoint_decoder *cxled_target = p->targets[pos];
> +		struct cxl_memdev *cxlmd_target = cxled_to_memdev(cxled_target);
> +
> +		dev_dbg(&cxlr->dev, "position %d already assigned to %s:%s\n",
> +			pos, dev_name(&cxlmd_target->dev),
> +			dev_name(&cxled_target->cxld.dev));
> +		return -EBUSY;
> +	}
> +
> +	p->targets[pos] = cxled;
> +	cxled->pos = pos;
> +	p->nr_targets++;
> +
> +	return 0;
> +}
> +
> +static void cxl_region_detach(struct cxl_endpoint_decoder *cxled)
> +{
> +	struct cxl_region *cxlr = cxled->cxld.region;
> +	struct cxl_region_params *p;
> +
> +	lockdep_assert_held_write(&cxl_region_rwsem);
> +
> +	if (!cxlr)
> +		return;
> +
> +	p = &cxlr->params;
> +	get_device(&cxlr->dev);
> +
> +	if (cxled->pos < 0 || cxled->pos >= p->interleave_ways ||
> +	    p->targets[cxled->pos] != cxled) {
> +		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> +
> +		dev_WARN_ONCE(&cxlr->dev, 1, "expected %s:%s at position %d\n",
> +			      dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
> +			      cxled->pos);
> +		goto out;
> +	}
> +
> +	p->targets[cxled->pos] = NULL;
> +	p->nr_targets--;
> +
> +	/* notify the region driver that one of its targets has deparated */

departed?

> +	up_write(&cxl_region_rwsem);
> +	device_release_driver(&cxlr->dev);
> +	down_write(&cxl_region_rwsem);
> +out:
> +	put_device(&cxlr->dev);
> +}
> +




* Re: [PATCH 39/46] cxl/acpi: Add a host-bridge index lookup mechanism
  2022-06-24  4:19 ` [PATCH 39/46] cxl/acpi: Add a host-bridge index lookup mechanism Dan Williams
@ 2022-06-30 15:48   ` Jonathan Cameron
  0 siblings, 0 replies; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-30 15:48 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, nvdimm, linux-pci, patches, hch

On Thu, 23 Jun 2022 21:19:43 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> The ACPI CXL Fixed Memory Window Structure (CFMWS) defines multiple
> methods to determine which host bridge provides access to a given
> endpoint relative to that device's position in the interleave. The
> "Interleave Arithmetic" defines either a "standard modulo" /
> round-robin algorithm, or "xormap" based algorithm which can be defined
> as a non-linear transform. Given that there are already more options
> beyond "standard modulo" and that "xormap" may turn out to be ACPI CXL
> specific, provide a callback for the region provisioning code to map
> endpoint positions back to expected host bridge id (cxl_dport target).
> 
> For now just support the simple modulo math case and save the xormap for
> a follow-on change.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  drivers/cxl/core/port.c | 15 +++++++++++++++
>  drivers/cxl/cxl.h       |  2 ++
>  2 files changed, 17 insertions(+)
> 
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 562a6453249b..7756409d0a58 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1422,6 +1422,20 @@ static int decoder_populate_targets(struct cxl_switch_decoder *cxlsd,
>  	return rc;
>  }
>  
> +static struct cxl_dport *cxl_hb_modulo(struct cxl_root_decoder *cxlrd, int pos)
> +{
> +	struct cxl_switch_decoder *cxlsd = &cxlrd->cxlsd;
> +	struct cxl_decoder *cxld = &cxlsd->cxld;
> +	int iw;
> +
> +	iw = cxld->interleave_ways;
> +	if (dev_WARN_ONCE(&cxld->dev, iw != cxlsd->nr_targets,
> +			  "misconfigured root decoder\n"))
> +		return NULL;
> +
> +	return cxlrd->cxlsd.target[pos % iw];
> +}
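
The callback arrangement above can be modelled outside the kernel. The
sketch below uses pared-down, hypothetical stand-ins (struct dport,
struct root_decoder, MAX_TARGETS, hb_id_for_pos()) and adds a guard for
negative positions that the patch itself does not have.

```c
#include <stddef.h>

#define MAX_TARGETS 4

/* Hypothetical, pared-down stand-ins for the kernel structures. */
struct dport {
	int id;
};

struct root_decoder {
	int interleave_ways;
	int nr_targets;
	struct dport *target[MAX_TARGETS];
	/* position -> host bridge lookup, swappable per interleave arithmetic */
	struct dport *(*calc_hb)(struct root_decoder *rd, int pos);
};

/* "Standard modulo": position N is served by host bridge N % ways. */
static struct dport *hb_modulo(struct root_decoder *rd, int pos)
{
	int iw = rd->interleave_ways;

	/* misconfigured root decoder, or a position this sketch refuses */
	if (iw <= 0 || iw != rd->nr_targets || pos < 0)
		return NULL;
	return rd->target[pos % iw];
}

/* Id of the host bridge expected at @pos, or -1 if the lookup fails. */
static int hb_id_for_pos(struct root_decoder *rd, int pos)
{
	struct dport *dp = rd->calc_hb(rd, pos);

	return dp ? dp->id : -1;
}
```

A different interleave arithmetic, such as the xormap case, would only
need to supply another function for calc_hb; callers keep going through
the same pointer.
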
> +
>  static struct lock_class_key cxl_decoder_key;
>  
>  /**
> @@ -1466,6 +1480,7 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
>  				if (rc < 0)
>  					goto err;
>  				atomic_set(&cxlrd->region_id, rc);
> +				cxlrd->calc_hb = cxl_hb_modulo;
>  			} else
>  				cxlsd = NULL;
>  		} else {
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 9340deccad4f..30227348f768 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -315,11 +315,13 @@ struct cxl_switch_decoder {
>   * struct cxl_root_decoder - Static platform CXL address decoder
>   * @res: host / parent resource for region allocations
>   * @region_id: region id for next region provisioning event
> + * @calc_hb: which host bridge covers the n'th position by granularity
>   * @cxlsd: base cxl switch decoder
>   */
>  struct cxl_root_decoder {
>  	struct resource *res;
>  	atomic_t region_id;
> +	struct cxl_dport *(*calc_hb)(struct cxl_root_decoder *cxlrd, int pos);
>  	struct cxl_switch_decoder cxlsd;
>  };
>  



* Re: [PATCH 40/46] cxl/region: Attach endpoint decoders
  2022-06-24  4:19 ` [PATCH 40/46] cxl/region: Attach endpoint decoders Dan Williams
  2022-06-24 18:25   ` Jonathan Cameron
@ 2022-06-30 16:34   ` Jonathan Cameron
  2022-07-11  2:02     ` Dan Williams
  1 sibling, 1 reply; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-30 16:34 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, nvdimm, linux-pci, patches, hch, Ben Widawsky

On Thu, 23 Jun 2022 21:19:44 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> CXL regions (interleave sets) are made up of a set of memory devices
> where each device maps a portion of the interleave with one of its
> decoders (see CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure).
> As endpoint decoders are identified by a provisioning tool they can be
> added to a region provided the region interleave properties are set
> (way, granularity, HPA) and DPA has been assigned to the decoder.
> 
> The attach event triggers several validation checks, for example:
> - is the DPA sized appropriately for the region
> - is the decoder reachable via the host-bridges identified by the
>   region's root decoder
> - is the device already active in a different region position slot
> - are there already regions with a higher HPA active on a given port
>   (per CXL 2.0 8.2.5.12.20 Committing Decoder Programming)
> 
> ...and the attach event affords an opportunity to collect data and
> resources relevant to later programming the target lists in switch
> decoders, for example:
> - allocate a decoder at each cxl_port in the decode chain
> - for a given switch port, how many of the region's endpoints are hosted
>   through the port
> - how many unique targets (next hops) does a port need to map to reach
>   those endpoints
> 
> The act of reconciling this information and deploying it to the decoder
> configuration is saved for a follow-on patch.
> 
> Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/core/core.h   |   7 +
>  drivers/cxl/core/port.c   |  10 +-
>  drivers/cxl/core/region.c | 338 +++++++++++++++++++++++++++++++++++++-
>  drivers/cxl/cxl.h         |  20 +++
>  drivers/cxl/cxlmem.h      |   5 +
>  5 files changed, 372 insertions(+), 8 deletions(-)
> 


> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 4830365f3857..65bf84abad57 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -428,6 +428,254 @@ static size_t show_targetN(struct cxl_region *cxlr, char *buf, int pos)
>  	return rc;
>  }
>  

> +
> +static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port,
> +					       struct cxl_region *cxlr)
> +{
> +	struct cxl_region_ref *cxl_rr;
> +
> +	cxl_rr = kzalloc(sizeof(*cxl_rr), GFP_KERNEL);
> +	if (!cxl_rr)
> +		return NULL;
> +	cxl_rr->port = port;
> +	cxl_rr->region = cxlr;
> +	xa_init(&cxl_rr->endpoints);
> +	return cxl_rr;
> +}
> +
> +static void free_region_ref(struct cxl_region_ref *cxl_rr)
> +{
> +	struct cxl_port *port = cxl_rr->port;
> +	struct cxl_region *cxlr = cxl_rr->region;
> +	struct cxl_decoder *cxld = cxl_rr->decoder;
> +
> +	dev_WARN_ONCE(&cxlr->dev, cxld->region != cxlr, "region mismatch\n");
> +	if (cxld->region == cxlr) {
> +		cxld->region = NULL;
> +		put_device(&cxlr->dev);
> +	}
> +
> +	xa_erase(&port->regions, (unsigned long)cxlr);

Why do we have things in a free_ function that aren't simply removing things
created in the alloc()?  I'd kind of expect this to be in a cxl_rr_del() or similar.

> +	xa_destroy(&cxl_rr->endpoints);
> +	kfree(cxl_rr);
> +}
> +
> +static int cxl_rr_add(struct cxl_region_ref *cxl_rr)
> +{
> +	struct cxl_port *port = cxl_rr->port;
> +	struct cxl_region *cxlr = cxl_rr->region;
> +
> +	return xa_insert(&port->regions, (unsigned long)cxlr, cxl_rr,
> +			 GFP_KERNEL);
> +}
> +
> +static int cxl_rr_ep_add(struct cxl_region_ref *cxl_rr,
> +			 struct cxl_endpoint_decoder *cxled)
> +{
> +	int rc;
> +	struct cxl_port *port = cxl_rr->port;
> +	struct cxl_region *cxlr = cxl_rr->region;
> +	struct cxl_decoder *cxld = cxl_rr->decoder;
> +	struct cxl_ep *ep = cxl_ep_load(port, cxled_to_memdev(cxled));
> +
> +	rc = xa_insert(&cxl_rr->endpoints, (unsigned long)cxled, ep,
> +			 GFP_KERNEL);
> +	if (rc)
> +		return rc;
> +	cxl_rr->nr_eps++;
> +
> +	if (!cxld->region) {
> +		cxld->region = cxlr;
> +		get_device(&cxlr->dev);
> +	}
> +
> +	return 0;
> +}
> +
> +static int cxl_port_attach_region(struct cxl_port *port,
> +				  struct cxl_region *cxlr,
> +				  struct cxl_endpoint_decoder *cxled, int pos)
> +{
> +	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> +	struct cxl_ep *ep = cxl_ep_load(port, cxlmd);
> +	struct cxl_region_ref *cxl_rr = NULL, *iter;
> +	struct cxl_region_params *p = &cxlr->params;
> +	struct cxl_decoder *cxld = NULL;
> +	unsigned long index;
> +	int rc = -EBUSY;
> +
> +	lockdep_assert_held_write(&cxl_region_rwsem);

This function is complex enough that maybe it would benefit from
some comments saying what each part is doing.

> +
> +	xa_for_each(&port->regions, index, iter) {
> +		struct cxl_region_params *ip = &iter->region->params;
> +
> +		if (iter->region == cxlr)
> +			cxl_rr = iter;
> +		if (ip->res->start > p->res->start) {
> +			dev_dbg(&cxlr->dev,
> +				"%s: HPA order violation %s:%pr vs %pr\n",
> +				dev_name(&port->dev),
> +				dev_name(&iter->region->dev), ip->res, p->res);
> +			return -EBUSY;
> +		}
> +	}
> +
> +	if (cxl_rr) {
> +		struct cxl_ep *ep_iter;
> +		int found = 0;
> +
> +		cxld = cxl_rr->decoder;
> +		xa_for_each(&cxl_rr->endpoints, index, ep_iter) {
> +			if (ep_iter == ep)
> +				continue;
> +			if (ep_iter->next == ep->next) {
> +				found++;
> +				break;
> +			}
> +		}
> +
> +		/*
> +		 * If this is a new target or if this port is direct connected
> +		 * to this endpoint then add to the target count.
> +		 */
> +		if (!found || !ep->next)
> +			cxl_rr->nr_targets++;
> +	} else {
> +		cxl_rr = alloc_region_ref(port, cxlr);
> +		if (!cxl_rr) {
> +			dev_dbg(&cxlr->dev,
> +				"%s: failed to allocate region reference\n",
> +				dev_name(&port->dev));
> +			return -ENOMEM;
> +		}
> +		rc = cxl_rr_add(cxl_rr);
> +		if (rc) {
> +			dev_dbg(&cxlr->dev,
> +				"%s: failed to track region reference\n",
> +				dev_name(&port->dev));
> +			kfree(cxl_rr);
> +			return rc;
> +		}
> +	}
> +
> +	if (!cxld) {
> +		if (port == cxled_to_port(cxled))
> +			cxld = &cxled->cxld;
> +		else
> +			cxld = cxl_region_find_decoder(port, cxlr);
> +		if (!cxld) {
> +			dev_dbg(&cxlr->dev, "%s: no decoder available\n",
> +				dev_name(&port->dev));
> +			goto out_erase;
> +		}
> +
> +		if (cxld->region) {
> +			dev_dbg(&cxlr->dev, "%s: %s already attached to %s\n",
> +				dev_name(&port->dev), dev_name(&cxld->dev),
> +				dev_name(&cxld->region->dev));
> +			rc = -EBUSY;
> +			goto out_erase;
> +		}
> +
> +		cxl_rr->decoder = cxld;
> +	}
> +
> +	rc = cxl_rr_ep_add(cxl_rr, cxled);
> +	if (rc) {
> +		dev_dbg(&cxlr->dev,
> +			"%s: failed to track endpoint %s:%s reference\n",
> +			dev_name(&port->dev), dev_name(&cxlmd->dev),
> +			dev_name(&cxld->dev));
> +		goto out_erase;
> +	}
> +
> +	return 0;
> +out_erase:
> +	if (cxl_rr->nr_eps == 0)
> +		free_region_ref(cxl_rr);
> +	return rc;
> +}
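
As an illustration of the first step in the function above, the HPA
ordering check can be written as a standalone helper. struct span and
hpa_order_check() are hypothetical names; the sketch only mirrors the
"start > start" comparison in the quoted loop.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the HPA range a region occupies. */
struct span {
	unsigned long long start;
	unsigned long long end;
};

/*
 * Refuse to attach the candidate region if any region already tracked
 * on the port starts at a higher HPA than the candidate does.
 */
static int hpa_order_check(const struct span *attached, size_t nr,
			   const struct span *cand)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (attached[i].start > cand->start)
			return -EBUSY;
	return 0;
}
```

A region placed above everything already on the port passes, while one
placed below an existing region gets -EBUSY, which is the "HPA order
violation" debug message in the patch.
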
> +

>  
>  static void cxl_region_detach(struct cxl_endpoint_decoder *cxled)
>  {
> +	struct cxl_port *iter, *ep_port = cxled_to_port(cxled);
>  	struct cxl_region *cxlr = cxled->cxld.region;
>  	struct cxl_region_params *p;
>  
> @@ -481,6 +811,10 @@ static void cxl_region_detach(struct cxl_endpoint_decoder *cxled)
>  	p = &cxlr->params;
>  	get_device(&cxlr->dev);
>  
> +	for (iter = ep_port; !is_cxl_root(iter);
> +	     iter = to_cxl_port(iter->dev.parent))
> +		cxl_port_detach_region(iter, cxlr, cxled);
> +
>  	if (cxled->pos < 0 || cxled->pos >= p->interleave_ways ||
>  	    p->targets[cxled->pos] != cxled) {
>  		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> @@ -491,6 +825,8 @@ static void cxl_region_detach(struct cxl_endpoint_decoder *cxled)
>  		goto out;
>  	}
>  
> +	if (p->state == CXL_CONFIG_ACTIVE)

I 'think' the state is either CXL_CONFIG_ACTIVE or CXL_CONFIG_INTERLEAVE_ACTIVE,
so you could set this unconditionally.  A comment here on permissible
states would be useful for future reference.
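
The permissible states can be sketched as a small userspace model. This
is an illustration only: the names and error codes are hypothetical, and
the rule that the region becomes active once nr_targets reaches
interleave_ways is an assumption taken from the enum documentation, not
from code quoted here.

```c
#include <errno.h>

enum config_state {
	CONFIG_IDLE,
	CONFIG_INTERLEAVE_ACTIVE,
	CONFIG_ACTIVE,
};

struct region_params {
	enum config_state state;
	int interleave_ways;
	int nr_targets;
};

/* Interleave parameters may only change while the region is idle. */
static int set_ways(struct region_params *p, int ways)
{
	if (p->state >= CONFIG_INTERLEAVE_ACTIVE)
		return -EBUSY;
	p->interleave_ways = ways;
	return 0;
}

/* Setting a size locks down the interleave parameters. */
static void mark_sized(struct region_params *p)
{
	if (p->state == CONFIG_IDLE)
		p->state = CONFIG_INTERLEAVE_ACTIVE;
}

/* Assumption: targets are only accepted between "sized" and "active". */
static int attach_target(struct region_params *p)
{
	if (p->state != CONFIG_INTERLEAVE_ACTIVE)
		return -ENXIO;
	if (++p->nr_targets == p->interleave_ways)
		p->state = CONFIG_ACTIVE;
	return 0;
}

/*
 * A target can only exist in INTERLEAVE_ACTIVE or ACTIVE, so after a
 * detach the state is always INTERLEAVE_ACTIVE.
 */
static void detach_target(struct region_params *p)
{
	if (p->nr_targets == 0)
		return;
	p->state = CONFIG_INTERLEAVE_ACTIVE;
	p->nr_targets--;
}
```

In this model the unconditional assignment in detach_target() gives the
same result as the conditional one in the patch, because no target can
be attached while the region is still idle.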

> +		p->state = CXL_CONFIG_INTERLEAVE_ACTIVE;
>  	p->targets[cxled->pos] = NULL;
>  	p->nr_targets--;



* Re: [PATCH 42/46] cxl/hdm: Commit decoder state to hardware
  2022-06-24  4:19 ` [PATCH 42/46] cxl/hdm: Commit decoder state to hardware Dan Williams
@ 2022-06-30 17:05   ` Jonathan Cameron
  2022-07-11  3:02     ` Dan Williams
  0 siblings, 1 reply; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-30 17:05 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, nvdimm, linux-pci, patches, hch

On Thu, 23 Jun 2022 21:19:46 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> After all the soft validation of the region has completed, convey the
> region configuration to hardware while being careful to commit decoders
> in specification mandated order. In addition to programming the endpoint
> decoder base-address, interleave ways and granularity, the switch
> decoder target lists are also established.
> 
> While the kernel can enforce spec-mandated commit order, it can not
> enforce spec-mandated reset order. For example, the kernel can't stop
> someone from removing an endpoint device that is occupying decoderN in a
> switch decoder where decoderN+1 is also committed. To reset decoderN,
> decoderN+1 must be torn down first. That "tear down the world"
> implementation is saved for a follow-on patch.
> 
> Callback operations are provided for the 'commit' and 'reset'
> operations. While those callbacks may prove useful for CXL accelerators
> (Type-2 devices with memory) the primary motivation is to enable a
> simple way for cxl_test to intercept those operations.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Trivial comments only in this one.

Jonathan

> ---
>  Documentation/ABI/testing/sysfs-bus-cxl |  16 ++
>  drivers/cxl/core/hdm.c                  | 218 ++++++++++++++++++++++++
>  drivers/cxl/core/port.c                 |   1 +
>  drivers/cxl/core/region.c               | 189 ++++++++++++++++++--
>  drivers/cxl/cxl.h                       |  11 ++
>  tools/testing/cxl/test/cxl.c            |  46 +++++
>  6 files changed, 471 insertions(+), 10 deletions(-)
> 

> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 2ee62dde8b23..72f98f1a782c 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -129,6 +129,8 @@ struct cxl_hdm *devm_cxl_setup_hdm(struct cxl_port *port)
>  		return ERR_PTR(-ENXIO);
>  	}
>  
> +	dev_set_drvdata(&port->dev, cxlhdm);

Trivial, but dev == &port->dev I think so you might as well use dev.

This feels like a bit of a hack as it just so happens nothing else is
in the port drvdata.  Maybe it's better to add a pointer from
port to cxlhdm?

> +
>  	return cxlhdm;
>  }
>  EXPORT_SYMBOL_NS_GPL(devm_cxl_setup_hdm, CXL);
> @@ -444,6 +446,213 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
>  	return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled);
>  }
>  

> +static int cxl_decoder_commit(struct cxl_decoder *cxld)
> +{
> +	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
> +	struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev);
> +	void __iomem *hdm = cxlhdm->regs.hdm_decoder;
> +	int id = cxld->id, rc;
> +	u64 base, size;
> +	u32 ctrl;
> +
> +	if (cxld->flags & CXL_DECODER_F_ENABLE)
> +		return 0;
> +
> +	if (port->commit_end + 1 != id) {
> +		dev_dbg(&port->dev,
> +			"%s: out of order commit, expected decoder%d.%d\n",
> +			dev_name(&cxld->dev), port->id, port->commit_end + 1);
> +		return -EBUSY;
> +	}
> +
> +	down_read(&cxl_dpa_rwsem);
> +	/* common decoder settings */
> +	ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(cxld->id));
> +	cxld_set_interleave(cxld, &ctrl);
> +	cxld_set_type(cxld, &ctrl);
> +	cxld_set_hpa(cxld, &base, &size);
> +
> +	writel(upper_32_bits(base), hdm + CXL_HDM_DECODER0_BASE_HIGH_OFFSET(id));
> +	writel(lower_32_bits(base), hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(id));
> +	writel(upper_32_bits(size), hdm + CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(id));
> +	writel(lower_32_bits(size), hdm + CXL_HDM_DECODER0_SIZE_LOW_OFFSET(id));
> +
> +	if (is_switch_decoder(&cxld->dev)) {
> +		struct cxl_switch_decoder *cxlsd =
> +			to_cxl_switch_decoder(&cxld->dev);
> +		void __iomem *tl_hi = hdm + CXL_HDM_DECODER0_TL_HIGH(id);
> +		void __iomem *tl_lo = hdm + CXL_HDM_DECODER0_TL_LOW(id);
> +		u64 targets;
> +
> +		rc = cxlsd_set_targets(cxlsd, &targets);
> +		if (rc) {
> +			dev_dbg(&port->dev, "%s: target configuration error\n",
> +				dev_name(&cxld->dev));
> +			goto err;
> +		}
> +
> +		writel(upper_32_bits(targets), tl_hi);
> +		writel(lower_32_bits(targets), tl_lo);
> +	} else {
> +		struct cxl_endpoint_decoder *cxled =
> +			to_cxl_endpoint_decoder(&cxld->dev);
> +		void __iomem *sk_hi = hdm + CXL_HDM_DECODER0_SKIP_HIGH(id);
> +		void __iomem *sk_lo = hdm + CXL_HDM_DECODER0_SKIP_LOW(id);
> +
> +		writel(upper_32_bits(cxled->skip), sk_hi);
> +		writel(lower_32_bits(cxled->skip), sk_lo);
> +	}
> +
> +	writel(ctrl, hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id));
> +	up_read(&cxl_dpa_rwsem);
> +
> +	port->commit_end++;

Obviously doesn't matter as we reset on error, but it
feels like the increment of commit_end should only follow a
successful commit / await_commit();
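
To make the ordering point concrete, here is a minimal standalone sketch
of "advance commit_end only after the commit has landed". It is a toy
userspace model, not the kernel code: struct toy_port, struct toy_decoder,
toy_await_commit() and toy_commit() are hypothetical names invented for
illustration, and the hw_fails flag merely simulates the hardware
reporting a commit error.

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-ins; none of these are the kernel's types. */
struct toy_port {
	int commit_end;		/* id of the last committed decoder, -1 if none */
};

struct toy_decoder {
	int id;
	int enabled;
	int hw_fails;		/* simulate the hardware reporting a commit error */
};

/* Pretend to poll the COMMITTED / COMMIT_ERROR bits. */
static int toy_await_commit(const struct toy_decoder *d)
{
	return d->hw_fails ? -EIO : 0;
}

static int toy_commit(struct toy_port *port, struct toy_decoder *d)
{
	int rc;

	if (d->enabled)
		return 0;
	if (port->commit_end + 1 != d->id)
		return -EBUSY;	/* out of order commit */

	rc = toy_await_commit(d);
	if (rc)
		return rc;	/* commit_end untouched, nothing to unwind */

	/* only advance the high-water mark once the commit has landed */
	port->commit_end++;
	assert(port->commit_end == d->id);
	d->enabled = 1;
	return 0;
}
```

With this ordering a failed commit leaves the port's bookkeeping exactly
as it found it, so the error path does not depend on a later reset to
put commit_end back.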

> +	rc = cxld_await_commit(hdm, cxld->id);
> +err:
> +	if (rc) {
> +		dev_dbg(&port->dev, "%s: error %d committing decoder\n",
> +			dev_name(&cxld->dev), rc);
> +		cxld->reset(cxld);
> +		return rc;
> +	}
> +	cxld->flags |= CXL_DECODER_F_ENABLE;
> +
> +	return 0;
> +}
> +
> +static int cxl_decoder_reset(struct cxl_decoder *cxld)
> +{
> +	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
> +	struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev);
> +	void __iomem *hdm = cxlhdm->regs.hdm_decoder;
> +	int id = cxld->id;
> +	u32 ctrl;
> +
> +	if ((cxld->flags & CXL_DECODER_F_ENABLE) ==  0)

extra space after ==

> +		return 0;
> +

...


>  		
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 7034300e72b2..eee1615d2319 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -630,6 +630,7 @@ static struct cxl_port *cxl_port_alloc(struct device *uport,
>  	port->component_reg_phys = component_reg_phys;
>  	ida_init(&port->decoder_ida);
>  	port->dpa_end = -1;
> +	port->commit_end = -1;
>  	xa_init(&port->dports);
>  	xa_init(&port->endpoints);
>  	xa_init(&port->regions);
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 071b8cafe2bb..b90160c4f975 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -112,6 +112,168 @@ static ssize_t uuid_store(struct device *dev, struct device_attribute *attr,
>  }
>  static DEVICE_ATTR_RW(uuid);

...


> +static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
> +{
> +	struct cxl_region_params *p = &cxlr->params;
> +	int i;
> +
> +	for (i = count - 1; i >= 0; i--) {
> +		struct cxl_endpoint_decoder *cxled = p->targets[i];
> +		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> +		struct cxl_port *iter = cxled_to_port(cxled);
> +		struct cxl_ep *ep;
> +		int rc;
> +
> +		while (!is_cxl_root(to_cxl_port(iter->dev.parent)))
> +			iter = to_cxl_port(iter->dev.parent);
> +
> +		for (ep = cxl_ep_load(iter, cxlmd); iter;
> +		     iter = ep->next, ep = cxl_ep_load(iter, cxlmd)) {
> +			struct cxl_region_ref *cxl_rr;
> +			struct cxl_decoder *cxld;
> +
> +			cxl_rr = cxl_rr_load(iter, cxlr);
> +			cxld = cxl_rr->decoder;
> +			rc = cxld->reset(cxld);
> +			if (rc)
> +				return rc;
> +		}
> +
> +		rc = cxled->cxld.reset(&cxled->cxld);
> +		if (rc)
> +			return rc;
> +	}
> +
> +	return 0;
> +}
> +
> +static int cxl_region_decode_commit(struct cxl_region *cxlr)
> +{
> +	struct cxl_region_params *p = &cxlr->params;
> +	int i, rc;
> +
> +	for (i = 0; i < p->nr_targets; i++) {
> +		struct cxl_endpoint_decoder *cxled = p->targets[i];
> +		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> +		struct cxl_region_ref *cxl_rr;
> +		struct cxl_decoder *cxld;
> +		struct cxl_port *iter;
> +		struct cxl_ep *ep;
> +
> +		/* commit bottom up */
> +		for (iter = cxled_to_port(cxled); !is_cxl_root(iter);
> +		     iter = to_cxl_port(iter->dev.parent)) {
> +			cxl_rr = cxl_rr_load(iter, cxlr);
> +			cxld = cxl_rr->decoder;
> +			rc = cxld->commit(cxld);
> +			if (rc)
> +				break;
> +		}
> +
> +		if (is_cxl_root(iter))
> +			continue;
> +
> +		/* teardown top down */

A comment on why we are tearing down would help.  I guess it is because
the previous loop somehow didn't end up at the root?
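
For reference, the "commit bottom up, tear down top down" shape can be
sketched in isolation as below. This is a toy userspace model under
stated assumptions, not the kernel implementation: struct toy_path and
the toy_*() helpers are hypothetical, levels[0] stands for the endpoint
decoder and higher indices stand for ports closer to the root, and
fail_at simply injects a failure at one level.

```c
#include <assert.h>
#include <errno.h>

#define TOY_MAX_LEVELS 8

/* Toy model: index 0 is the endpoint, higher indices are closer to the root. */
struct toy_path {
	int nr_levels;
	int committed[TOY_MAX_LEVELS];
	int fail_at;	/* level whose commit fails, or -1 for none */
};

static int toy_commit_level(struct toy_path *p, int level)
{
	if (level == p->fail_at)
		return -ENXIO;
	p->committed[level] = 1;
	return 0;
}

static void toy_reset_level(struct toy_path *p, int level)
{
	p->committed[level] = 0;
}

static int toy_commit_path(struct toy_path *p)
{
	int level, rc = 0;	/* initialized so an empty path returns 0 */

	assert(p->nr_levels <= TOY_MAX_LEVELS);

	/* commit bottom up: endpoint first, then each port toward the root */
	for (level = 0; level < p->nr_levels; level++) {
		rc = toy_commit_level(p, level);
		if (rc)
			break;
	}
	if (!rc)
		return 0;

	/*
	 * The walk stopped short of the root, so the levels below the
	 * failing one are committed but unusable: tear them down top down.
	 */
	while (--level >= 0)
		toy_reset_level(p, level);
	return rc;
}
```

The comment above the teardown loop is the kind of note being asked
for: it states that the loop only runs when the commit walk did not
reach the root.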

> +		for (ep = cxl_ep_load(iter, cxlmd); ep && iter;
> +		     iter = ep->next, ep = cxl_ep_load(iter, cxlmd)) {
> +			cxl_rr = cxl_rr_load(iter, cxlr);
> +			cxld = cxl_rr->decoder;
> +			cxld->reset(cxld);
> +		}
> +
> +		cxled->cxld.reset(&cxled->cxld);
> +		if (i == 0)
> +			return rc;
> +		break;
> +	}
> +
> +	if (i >= p->nr_targets)
> +		return 0;
> +
> +	/* undo the targets that were successfully committed */
> +	cxl_region_decode_reset(cxlr, i);
> +	return rc;
> +}
> +
> +static ssize_t commit_store(struct device *dev, struct device_attribute *attr,
> +			    const char *buf, size_t len)
> +{
> +	struct cxl_region *cxlr = to_cxl_region(dev);
> +	struct cxl_region_params *p = &cxlr->params;
> +	bool commit;
> +	ssize_t rc;
> +
> +	rc = kstrtobool(buf, &commit);
> +	if (rc)
> +		return rc;
> +
> +	rc = down_write_killable(&cxl_region_rwsem);
> +	if (rc)
> +		return rc;
> +
> +	/* Already in the requested state? */
> +	if (commit && p->state >= CXL_CONFIG_COMMIT)
> +		goto out;
> +	if (!commit && p->state < CXL_CONFIG_COMMIT)
> +		goto out;
> +
> +	/* Not ready to commit? */
> +	if (commit && p->state < CXL_CONFIG_ACTIVE) {
> +		rc = -ENXIO;
> +		goto out;
> +	}
> +
> +	if (commit)
> +		rc = cxl_region_decode_commit(cxlr);
> +	else {
> +		p->state = CXL_CONFIG_RESET_PENDING;
> +		up_write(&cxl_region_rwsem);
> +		device_release_driver(&cxlr->dev);
> +		down_write(&cxl_region_rwsem);
> +
> +		if (p->state == CXL_CONFIG_RESET_PENDING)

What path results in that changing in last few lines?
Perhaps a comment if there is something we need to protect against?


> +			rc = cxl_region_decode_reset(cxlr, p->interleave_ways);
> +	}
> +
> +	if (rc)
> +		goto out;
> +
> +	if (commit)
> +		p->state = CXL_CONFIG_COMMIT;
> +	else if (p->state == CXL_CONFIG_RESET_PENDING)
> +		p->state = CXL_CONFIG_ACTIVE;
> +
> +out:
> +	up_write(&cxl_region_rwsem);
> +
> +	if (rc)
> +		return rc;
> +	return len;
> +}


...


> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index a93d7c4efd1a..fc14f6805f2c 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -54,6 +54,7 @@
>  #define   CXL_HDM_DECODER0_CTRL_LOCK BIT(8)
>  #define   CXL_HDM_DECODER0_CTRL_COMMIT BIT(9)
>  #define   CXL_HDM_DECODER0_CTRL_COMMITTED BIT(10)
> +#define   CXL_HDM_DECODER0_CTRL_COMMIT_ERROR BIT(11)
>  #define   CXL_HDM_DECODER0_CTRL_TYPE BIT(12)
>  #define CXL_HDM_DECODER0_TL_LOW(i) (0x20 * (i) + 0x24)
>  #define CXL_HDM_DECODER0_TL_HIGH(i) (0x20 * (i) + 0x28)
> @@ -257,6 +258,8 @@ enum cxl_decoder_type {
>   * @target_type: accelerator vs expander (type2 vs type3) selector
>   * @region: currently assigned region for this decoder
>   * @flags: memory type capabilities and locking
> + * @commit: device/decoder-type specific callback to commit settings to hw
> + * @commit: device/decoder-type specific callback to reset hw settings

@reset

>  */
>  struct cxl_decoder {
>  	struct device dev;
> @@ -267,6 +270,8 @@ struct cxl_decoder {
>  	enum cxl_decoder_type target_type;
>  	struct cxl_region *region;
>  	unsigned long flags;
> +	int (*commit)(struct cxl_decoder *cxld);
> +	int (*reset)(struct cxl_decoder *cxld);
>  };
>  


> diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
> index 51d517fa62ee..94653201631c 100644
> --- a/tools/testing/cxl/test/cxl.c
> +++ b/tools/testing/cxl/test/cxl.c
> @@ -429,6 +429,50 @@ static int map_targets(struct device *dev, void *data)
>  	return 0;
>  }
>  

...

> +static int mock_decoder_reset(struct cxl_decoder *cxld)
> +{
> +	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
> +	int id = cxld->id;
> +
> +	if ((cxld->flags & CXL_DECODER_F_ENABLE) ==  0)

bonus space after ==


> +		return 0;
> +
> +	dev_dbg(&port->dev, "%s reset\n", dev_name(&cxld->dev));
> +	if (port->commit_end != id) {
> +		dev_dbg(&port->dev,
> +			"%s: out of order reset, expected decoder%d.%d\n",
> +			dev_name(&cxld->dev), port->id, port->commit_end);
> +		return -EBUSY;
> +	}
> +
> +	port->commit_end--;
> +	cxld->flags &= ~CXL_DECODER_F_ENABLE;
> +
> +	return 0;
> +}
> 



* Re: [PATCH 43/46] cxl/region: Add region driver boiler plate
  2022-06-24  4:19 ` [PATCH 43/46] cxl/region: Add region driver boiler plate Dan Williams
@ 2022-06-30 17:09   ` Jonathan Cameron
  0 siblings, 0 replies; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-30 17:09 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, nvdimm, linux-pci, patches, hch, Ben Widawsky

On Thu, 23 Jun 2022 21:19:47 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> The CXL region driver is responsible for routing fully formed CXL
> regions to one of libnvdimm, for persistent memory regions, device-dax
> for volatile memory regions, or just act as an enumeration placeholder
> if the region was setup and configuration locked by platform firmware.
> In the platform-firmware-setup case the expectation is that region is
> already accounted in the system memory map, i.e. already enabled as
> "System RAM".
> 
> For now, just attach to CXL regions in the CXL_CONFIG_COMMIT state, and
> take no further action.
> 
> Given this driver is just a small / simple router, include it in the
> core rather than its own module.
Ah. I was wondering why that changed. Fair enough.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> 
> Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>




* Re: [PATCH 44/46] cxl/pmem: Delete unused nvdimm attribute
  2022-06-24  4:19 ` [PATCH 44/46] cxl/pmem: Delete unused nvdimm attribute Dan Williams
@ 2022-06-30 17:10   ` Jonathan Cameron
  0 siblings, 0 replies; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-30 17:10 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, nvdimm, linux-pci, patches, hch

On Thu, 23 Jun 2022 21:19:48 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> While there is a need to go from a LIBNVDIMM 'struct nvdimm' to a CXL
> 'struct cxl_nvdimm', there is no use case to go the other direction.
> Likely this is a leftover from an early version of the referenced commit
> before it implemented devm for releasing the created nvdimm.
> 
> Fixes: 21083f51521f ("cxl/pmem: Register 'pmem' / cxl_nvdimm devices")
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Must be right given it builds, so no one is using it ;)

FWIW:
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
>  drivers/cxl/cxl.h | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 734b4479feb2..d6ff6337aa49 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -411,7 +411,6 @@ struct cxl_nvdimm_bridge {
>  struct cxl_nvdimm {
>  	struct device dev;
>  	struct cxl_memdev *cxlmd;
> -	struct nvdimm *nvdimm;
>  };
>  
>  /**



* Re: [PATCH 45/46] cxl/pmem: Fix offline_nvdimm_bus() to offline by bridge
  2022-06-24  4:19 ` [PATCH 45/46] cxl/pmem: Fix offline_nvdimm_bus() to offline by bridge Dan Williams
@ 2022-06-30 17:14   ` Jonathan Cameron
  2022-07-11 19:49     ` Dan Williams
  0 siblings, 1 reply; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-30 17:14 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, nvdimm, linux-pci, patches, hch

On Thu, 23 Jun 2022 21:19:49 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> Be careful to only disable cxl_pmem objects related to a given
> cxl_nvdimm_bridge. Otherwise, offline_nvdimm_bus() reaches across CXL
> domains and disables more than is expected.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Fix, but no Fixes tag? Probably wants a comment (I'm guessing
it didn't matter until now?)

By Domains, what do you mean?  I don't think we have that
well defined as a term.


* Re: [PATCH 46/46] cxl/region: Introduce cxl_pmem_region objects
  2022-06-24  4:19 ` [PATCH 46/46] cxl/region: Introduce cxl_pmem_region objects Dan Williams
@ 2022-06-30 17:34   ` Jonathan Cameron
  2022-07-11 20:05     ` Dan Williams
  0 siblings, 1 reply; 157+ messages in thread
From: Jonathan Cameron @ 2022-06-30 17:34 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, nvdimm, linux-pci, patches, hch, Ben Widawsky

On Thu, 23 Jun 2022 21:19:50 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> The LIBNVDIMM subsystem is a platform agnostic representation of system
> NVDIMM / persistent memory resources. To date, the CXL subsystem's
> interaction with LIBNVDIMM has been to register an nvdimm-bridge device
> and cxl_nvdimm objects to proxy CXL capabilities into existing LIBNVDIMM
> subsystem mechanics.
> 
> With regions the approach is the same. Create a new cxl_pmem_region
> object to proxy CXL region details into a LIBNVDIMM definition. With
> this enabling LIBNVDIMM can partition CXL persistent memory regions with
> legacy namespace labels. A follow-on patch will add CXL region label and
> CXL namespace label support to persist region configurations across
> driver reload / system-reset events.
ah. Now I see why we share ID space with NVDIMMs. Fair enough, I should
have read to the end ;)

> 
> Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

End of day, so a fairly superficial review on this and I'll hopefully
take a second look at one or two of the earlier patches when time allows.

Jonathan

...

> +static struct cxl_pmem_region *cxl_pmem_region_alloc(struct cxl_region *cxlr)
> +{
> +	struct cxl_pmem_region *cxlr_pmem = ERR_PTR(-ENXIO);

Rarely used, so better to set it where it is used.

> +	struct cxl_region_params *p = &cxlr->params;
> +	struct device *dev;
> +	int i;
> +
> +	down_read(&cxl_region_rwsem);
> +	if (p->state != CXL_CONFIG_COMMIT)
> +		goto out;
> +	cxlr_pmem = kzalloc(struct_size(cxlr_pmem, mapping, p->nr_targets),
> +			    GFP_KERNEL);
> +	if (!cxlr_pmem) {
> +		cxlr_pmem = ERR_PTR(-ENOMEM);
> +		goto out;
> +	}
> +
> +	cxlr_pmem->hpa_range.start = p->res->start;
> +	cxlr_pmem->hpa_range.end = p->res->end;
> +
> +	/* Snapshot the region configuration underneath the cxl_region_rwsem */
> +	cxlr_pmem->nr_mappings = p->nr_targets;
> +	for (i = 0; i < p->nr_targets; i++) {
> +		struct cxl_endpoint_decoder *cxled = p->targets[i];
> +		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> +		struct cxl_pmem_region_mapping *m = &cxlr_pmem->mapping[i];
> +
> +		m->cxlmd = cxlmd;
> +		get_device(&cxlmd->dev);
> +		m->start = cxled->dpa_res->start;
> +		m->size = resource_size(cxled->dpa_res);
> +		m->position = i;
> +	}
> +
> +	dev = &cxlr_pmem->dev;
> +	cxlr_pmem->cxlr = cxlr;
> +	device_initialize(dev);
> +	lockdep_set_class(&dev->mutex, &cxl_pmem_region_key);
> +	device_set_pm_not_required(dev);
> +	dev->parent = &cxlr->dev;
> +	dev->bus = &cxl_bus_type;
> +	dev->type = &cxl_pmem_region_type;
> +out:
> +	up_read(&cxl_region_rwsem);
> +
> +	return cxlr_pmem;
> +}
> +
> +static void cxlr_pmem_unregister(void *dev)
> +{
> +	device_unregister(dev);
> +}
> +
> +/**
> + * devm_cxl_add_pmem_region() - add a cxl_region to nd_region bridge
> + * @host: same host as @cxlmd

Run kernel-doc over these and clean all the warnings up.
The parameter is cxlr, not host.


> + *
> + * Return: 0 on success negative error code on failure.
> + */


>  /*
>   * Unit test builds overrides this to __weak, find the 'strong' version
> diff --git a/drivers/cxl/pmem.c b/drivers/cxl/pmem.c
> index b271f6e90b91..4ba7248275ac 100644
> --- a/drivers/cxl/pmem.c
> +++ b/drivers/cxl/pmem.c
> @@ -7,6 +7,7 @@

>  


> +static int match_cxl_nvdimm(struct device *dev, void *data)
> +{
> +	return is_cxl_nvdimm(dev);
> +}
> +
> +static void unregister_region(void *nd_region)

Better to give this a more specific name as we have several
unregister_region() functions in CXL now.

> +{
> +	struct cxl_nvdimm_bridge *cxl_nvb;
> +	struct cxl_pmem_region *cxlr_pmem;
> +	int i;
> +
> +	cxlr_pmem = nd_region_provider_data(nd_region);
> +	cxl_nvb = cxlr_pmem->bridge;
> +	device_lock(&cxl_nvb->dev);
> +	for (i = 0; i < cxlr_pmem->nr_mappings; i++) {
> +		struct cxl_pmem_region_mapping *m = &cxlr_pmem->mapping[i];
> +		struct cxl_nvdimm *cxl_nvd = m->cxl_nvd;
> +
> +		if (cxl_nvd->region) {
> +			put_device(&cxlr_pmem->dev);
> +			cxl_nvd->region = NULL;
> +		}
> +	}
> +	device_unlock(&cxl_nvb->dev);
> +
> +	nvdimm_region_delete(nd_region);
> +}
> +

> +
> +static int cxl_pmem_region_probe(struct device *dev)
> +{
> +	struct nd_mapping_desc mappings[CXL_DECODER_MAX_INTERLEAVE];
> +	struct cxl_pmem_region *cxlr_pmem = to_cxl_pmem_region(dev);
> +	struct cxl_region *cxlr = cxlr_pmem->cxlr;
> +	struct cxl_pmem_region_info *info = NULL;
> +	struct cxl_nvdimm_bridge *cxl_nvb;
> +	struct nd_interleave_set *nd_set;
> +	struct nd_region_desc ndr_desc;
> +	struct cxl_nvdimm *cxl_nvd;
> +	struct nvdimm *nvdimm;
> +	struct resource *res;
> +	int rc = 0, i;
> +
> +	cxl_nvb = cxl_find_nvdimm_bridge(&cxlr_pmem->mapping[0].cxlmd->dev);
> +	if (!cxl_nvb) {
> +		dev_dbg(dev, "bridge not found\n");
> +		return -ENXIO;
> +	}
> +	cxlr_pmem->bridge = cxl_nvb;
> +
> +	device_lock(&cxl_nvb->dev);
> +	if (!cxl_nvb->nvdimm_bus) {
> +		dev_dbg(dev, "nvdimm bus not found\n");
> +		rc = -ENXIO;
> +		goto out;
> +	}
> +
> +	memset(&mappings, 0, sizeof(mappings));
> +	memset(&ndr_desc, 0, sizeof(ndr_desc));
> +
> +	res = devm_kzalloc(dev, sizeof(*res), GFP_KERNEL);
> +	if (!res) {
> +		rc = -ENOMEM;
> +		goto out;
> +	}
> +
> +	res->name = "Persistent Memory";
> +	res->start = cxlr_pmem->hpa_range.start;
> +	res->end = cxlr_pmem->hpa_range.end;
> +	res->flags = IORESOURCE_MEM;
> +	res->desc = IORES_DESC_PERSISTENT_MEMORY;
> +
> +	rc = insert_resource(&iomem_resource, res);
> +	if (rc)
> +		goto out;
> +
> +	rc = devm_add_action_or_reset(dev, cxlr_pmem_remove_resource, res);
> +	if (rc)
> +		goto out;
> +
> +	ndr_desc.res = res;
> +	ndr_desc.provider_data = cxlr_pmem;
> +
> +	ndr_desc.numa_node = memory_add_physaddr_to_nid(res->start);
> +	ndr_desc.target_node = phys_to_target_node(res->start);
> +	if (ndr_desc.target_node == NUMA_NO_NODE) {
> +		ndr_desc.target_node = ndr_desc.numa_node;
> +		dev_dbg(&cxlr->dev, "changing target node from %d to %d",
> +			NUMA_NO_NODE, ndr_desc.target_node);
> +	}
> +
> +	nd_set = devm_kzalloc(dev, sizeof(*nd_set), GFP_KERNEL);
> +	if (!nd_set) {
> +		rc = -ENOMEM;
> +		goto out;
> +	}
> +
> +	ndr_desc.memregion = cxlr->id;
> +	set_bit(ND_REGION_CXL, &ndr_desc.flags);
> +	set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags);
> +
> +	info = kmalloc_array(cxlr_pmem->nr_mappings, sizeof(*info), GFP_KERNEL);
> +	if (!info)
> +		goto out;
> +
> +	rc = -ENODEV;

Personal taste, but I'd much rather see that set in the error handlers
so I can quickly see where it applies.

> +	for (i = 0; i < cxlr_pmem->nr_mappings; i++) {
> +		struct cxl_pmem_region_mapping *m = &cxlr_pmem->mapping[i];
> +		struct cxl_memdev *cxlmd = m->cxlmd;
> +		struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +		struct device *d;
> +
> +		d = device_find_child(&cxlmd->dev, NULL, match_cxl_nvdimm);
> +		if (!d) {
> +			dev_dbg(dev, "[%d]: %s: no cxl_nvdimm found\n", i,
> +				dev_name(&cxlmd->dev));
> +			goto err;
> +		}
> +
> +		/* safe to drop ref now with bridge lock held */
> +		put_device(d);
> +
> +		cxl_nvd = to_cxl_nvdimm(d);
> +		nvdimm = dev_get_drvdata(&cxl_nvd->dev);
> +		if (!nvdimm) {
> +			dev_dbg(dev, "[%d]: %s: no nvdimm found\n", i,
> +				dev_name(&cxlmd->dev));
> +			goto err;
> +		}
> +		cxl_nvd->region = cxlr_pmem;
> +		get_device(&cxlr_pmem->dev);
> +		m->cxl_nvd = cxl_nvd;
> +		mappings[i] = (struct nd_mapping_desc) {
> +			.nvdimm = nvdimm,
> +			.start = m->start,
> +			.size = m->size,
> +			.position = i,
> +		};
> +		info[i].offset = m->start;
> +		info[i].serial = cxlds->serial;
> +	}
> +	ndr_desc.num_mappings = cxlr_pmem->nr_mappings;
> +	ndr_desc.mapping = mappings;
> +
> +	/*
> +	 * TODO enable CXL labels which skip the need for 'interleave-set cookie'
> +	 */
> +	nd_set->cookie1 =
> +		nd_fletcher64(info, sizeof(*info) * cxlr_pmem->nr_mappings, 0);
> +	nd_set->cookie2 = nd_set->cookie1;
> +	ndr_desc.nd_set = nd_set;
> +
> +	cxlr_pmem->nd_region =
> +		nvdimm_pmem_region_create(cxl_nvb->nvdimm_bus, &ndr_desc);
> +	if (IS_ERR(cxlr_pmem->nd_region)) {
> +		rc = PTR_ERR(cxlr_pmem->nd_region);
> +		goto err;
> +	} else

no need for else as other branch has gone flying off down to
err.

> +		rc = devm_add_action_or_reset(dev, unregister_region,
> +					      cxlr_pmem->nd_region);
> +out:

Having labels out: and err: where both are used for errors is pretty
confusing naming...  Perhaps you are better off just not sharing the
good exit path with any of the error paths.


> +	device_unlock(&cxl_nvb->dev);
> +	put_device(&cxl_nvb->dev);
> +	kfree(info);

Ok, so safe to do this here, but would be nice to do this
in reverse order of setup with multiple labels so we can avoid
paths that free things that were never created. Doesn't look
like it would hurt much to move kfree(info) above the device_unlock()
and only do that if we have allocated info.
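
A minimal sketch of the "unwind in reverse order of setup, with one
label per resource" pattern follows. It is plain userspace C under
stated assumptions, not a proposed replacement for the probe function:
struct toy_lock, toy_lock_acquire(), toy_lock_release() and toy_probe()
are hypothetical, and fail_step only exists to drive each exit path.
The success path returns on its own rather than sharing a label with
the error paths, and rc is set at each failure site.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct toy_lock {
	int held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = 1;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/* fail_step: 0 = succeed, 1 = fail before allocating, 2 = fail after */
static int toy_probe(struct toy_lock *lock, int fail_step)
{
	int *info;
	int rc;

	toy_lock_acquire(lock);
	if (fail_step == 1) {
		rc = -ENXIO;
		goto err_unlock;	/* info was never allocated */
	}

	info = malloc(4 * sizeof(*info));
	if (!info) {
		rc = -ENOMEM;
		goto err_unlock;
	}

	if (fail_step == 2) {
		rc = -ENODEV;
		goto err_free;
	}

	/* success path: release in reverse order of setup, no shared labels */
	free(info);
	toy_lock_release(lock);
	return 0;

err_free:
	free(info);
err_unlock:
	toy_lock_release(lock);
	return rc;
}
```

Because every failure site assigns rc just before its goto, no path can
reach the exit with a stale value left over from an earlier successful
call.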





> +
> +	if (rc)
> +		dev_dbg(dev, "failed to create nvdimm region\n");
> +	return rc;
> +
> +err:
> +	for (i--; i >= 0; i--) {
> +		nvdimm = mappings[i].nvdimm;
> +		cxl_nvd = nvdimm_provider_data(nvdimm);
> +		put_device(&cxl_nvd->region->dev);
> +		cxl_nvd->region = NULL;
> +	}
> +	goto out;
> +}
> +




* Re: [PATCH 00/46] CXL PMEM Region Provisioning
  2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
                   ` (47 preceding siblings ...)
  2022-06-28  3:12 ` Alison Schofield
@ 2022-07-02  2:26 ` Alison Schofield
  48 siblings, 0 replies; 157+ messages in thread
From: Alison Schofield @ 2022-07-02  2:26 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, Weiny, Ira, Christoph Hellwig, Jason Gunthorpe,
	Ben Widawsky, Matthew Wilcox, nvdimm, linux-pci, patches

On Thu, Jun 23, 2022 at 07:45:00PM -0700, Dan Williams wrote:
> tl;dr: 46 patches is way too many patches to review in one sitting. Jump
> to the PATCH SUMMARY below to find a subset of interest to jump into.
> 
> The series is also posted on the 'preview' branch [1]. Note that branch
> rebases, the tip of that branch at time of posting is:
> 
> 7e5ad5cb1580 cxl/region: Introduce cxl_pmem_region objects
> 
> [1]: https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=preview
>
Dan,

I'm seeing these smatch reports while working off of the preview branch.
Perhaps 0-day has already sent these reports aligned to patches. 

drivers/cxl/core/port.c:1482 cxl_decoder_alloc() warn: is 'alloc' large enough for 'struct cxl_root_decoder'? 0

drivers/cxl/core/port.c:1515 cxl_decoder_alloc() error: potentially dereferencing uninitialized 'cxld'.

drivers/cxl/core/hdm.c:457 cxld_set_interleave() error: uninitialized symbol 'eig'.
drivers/cxl/core/hdm.c:458 cxld_set_interleave() error: uninitialized symbol 'eiw'.
drivers/cxl/core/region.c:192 cxl_region_decode_commit() error: uninitialized symbol 'rc'.
drivers/cxl/core/region.c:201 cxl_region_decode_commit() error: uninitialized symbol 'rc'.
drivers/cxl/core/region.c:443 alloc_hpa() error: uninitialized symbol 'res'.
drivers/cxl/core/region.c:964 cxl_port_setup_targets() error: uninitialized symbol 'peig'.
drivers/cxl/core/region.c:964 cxl_port_setup_targets() error: uninitialized symbol 'peiw'.
drivers/cxl/core/region.c:964 cxl_port_setup_targets() error: uninitialized symbol 'eiw'.
drivers/cxl/core/region.c:968 cxl_port_setup_targets() error: uninitialized symbol 'peiw'.
drivers/cxl/core/region.c:969 cxl_port_setup_targets() error: uninitialized symbol 'peig'.
drivers/cxl/core/region.c:1557 create_pmem_region_store() warn: unsigned 'rc' is never less than zero.
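
For the last warning in that list, a small standalone illustration of
why an unsigned 'rc' defeats a negative-errno check may be useful. I
have not looked at which declaration smatch is pointing at, so this is
only a generic sketch: toy_parse(), toy_rc_as_unsigned() and toy_store()
are hypothetical names, and the handler merely mimics the usual
"bytes consumed or negative errno" shape of a sysfs store routine.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for a helper that returns 0 or a negative errno. */
static int toy_parse(const char *buf)
{
	return (buf[0] == '0' || buf[0] == '1') ? 0 : -EINVAL;
}

/* What an unsigned local ends up holding: the errno wraps to a huge value. */
static size_t toy_rc_as_unsigned(const char *buf)
{
	return (size_t)toy_parse(buf);
}

/* Store-handler shape: bytes consumed, or a negative errno. */
static ssize_t toy_store(const char *buf, size_t len)
{
	ssize_t rc = toy_parse(buf);	/* signed, so the sign survives */

	if (rc < 0)
		return rc;
	assert(len <= (size_t)1 << 20);	/* toy bound so the cast below is safe */
	return (ssize_t)len;
}
```

With a size_t local the error value is a large positive number, so a
"rc < 0" test can never fire; declaring the local as ssize_t (or int)
keeps the check meaningful.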

> ---
snip
> 


* Re: [PATCH 01/46] tools/testing/cxl: Fix cxl_hdm_decode_init() calling convention
  2022-06-29 17:41     ` Adam Manzanares
@ 2022-07-09 20:06       ` Dan Williams
  2022-07-12 22:11         ` Adam Manzanares
  0 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-07-09 20:06 UTC (permalink / raw)
  To: Adam Manzanares, Dan Williams
  Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

Adam Manzanares wrote:
> On Thu, Jun 23, 2022 at 07:45:07PM -0700, Dan Williams wrote:
> > This failing signature:
> > 
> > [    8.392669] cxl_bus_probe: cxl_port endpoint2: probe: 970997760
> > [    8.392670] cxl_port: probe of endpoint2 failed with error 970997760
> > [    8.392719] create_endpoint: cxl_mem mem0: add: endpoint2
> > [    8.392721] cxl_mem mem0: endpoint2 failed probe
> > [    8.392725] cxl_bus_probe: cxl_mem mem0: probe: -6
> > 
> > ...shows cxl_hdm_decode_init() resulting in a return code ("970997760")
> > that looks like stack corruption. The problem goes away if
> > cxl_hdm_decode_init() is not mocked via __wrap_cxl_hdm_decode_init().
> > 
> > The corruption results from the mismatch that the calling convention for
> > cxl_hdm_decode_init() is:
> > 
> > int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm)
> > 
> > ...and __wrap_cxl_hdm_decode_init() is:
> > 
> > bool __wrap_cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm)
> > 
> > ...i.e. an int is expected but __wrap_hdm_decode_init() returns bool.
> > 
> > Fix the convention and cleanup the organization to match
> > __wrap_cxl_await_media_ready() as the difference was a red herring that
> > distracted from finding the bug.
> > 
> > Fixes: 92804edb11f0 ("cxl/pci: Drop @info argument to cxl_hdm_decode_init()")
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  tools/testing/cxl/test/mock.c |    8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/tools/testing/cxl/test/mock.c b/tools/testing/cxl/test/mock.c
> > index f1f8c40948c5..bce6a21df0d5 100644
> > --- a/tools/testing/cxl/test/mock.c
> > +++ b/tools/testing/cxl/test/mock.c
> > @@ -208,13 +208,15 @@ int __wrap_cxl_await_media_ready(struct cxl_dev_state *cxlds)
> >  }
> >  EXPORT_SYMBOL_NS_GPL(__wrap_cxl_await_media_ready, CXL);
> >  
> > -bool __wrap_cxl_hdm_decode_init(struct cxl_dev_state *cxlds,
> > -				struct cxl_hdm *cxlhdm)
> > +int __wrap_cxl_hdm_decode_init(struct cxl_dev_state *cxlds,
> > +			       struct cxl_hdm *cxlhdm)
> >  {
> >  	int rc = 0, index;
> >  	struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
> >  
> > -	if (!ops || !ops->is_mock_dev(cxlds->dev))
> > +	if (ops && ops->is_mock_dev(cxlds->dev))
> > +		rc = 0;
> > +	else
> >  		rc = cxl_hdm_decode_init(cxlds, cxlhdm);
> >  	put_cxl_mock_ops(index);
> >  
> > 
> 
> 
> Looks good.
> 
> Reviewed by: Adam Manzanares <a.manzanares@samsung.com>

Just fyi, b4 did not auto-apply this tag due to the missing "-", caught
it manually.


* Re: [PATCH 05/46] cxl/core: Drop ->platform_res attribute for root decoders
  2022-06-28 15:24   ` Jonathan Cameron
@ 2022-07-09 23:33     ` Dan Williams
  0 siblings, 0 replies; 157+ messages in thread
From: Dan Williams @ 2022-07-09 23:33 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 19:45:36 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > Root decoders are responsible for hosting the available host address
> > space for endpoints and regions to claim. The tracking of that available
> > capacity can be done in iomem_resource directly. As a result, root
> > decoders no longer need to host their own resource tree. The
> > current ->platform_res attribute was added prematurely.
> > 
> > Otherwise, ->hpa_range fills the role of conveying the current decode
> > range of the decoder.
> > 
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> 
> One trivial moan inline about sneaky whitespace fixes, I'll cope if you really
> don't want to move that to a separate patch though :)
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> > ---
> >  drivers/cxl/acpi.c      |   17 ++++++++++-------
> >  drivers/cxl/core/pci.c  |    8 +-------
> >  drivers/cxl/core/port.c |   30 +++++++-----------------------
> >  drivers/cxl/cxl.h       |    6 +-----
> >  4 files changed, 19 insertions(+), 42 deletions(-)
> > 
> > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> > index 40286f5df812..951695cdb455 100644
> > --- a/drivers/cxl/acpi.c
> > +++ b/drivers/cxl/acpi.c
> > @@ -108,8 +108,10 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
> >  
> >  	cxld->flags = cfmws_to_decoder_flags(cfmws->restrictions);
> >  	cxld->target_type = CXL_DECODER_EXPANDER;
> > -	cxld->platform_res = (struct resource)DEFINE_RES_MEM(cfmws->base_hpa,
> > -							     cfmws->window_size);
> > +	cxld->hpa_range = (struct range) {
> > +		.start = cfmws->base_hpa,
> > +		.end = cfmws->base_hpa + cfmws->window_size - 1,
> > +	};
> >  	cxld->interleave_ways = CFMWS_INTERLEAVE_WAYS(cfmws);
> >  	cxld->interleave_granularity = CFMWS_INTERLEAVE_GRANULARITY(cfmws);
> >  
> > @@ -119,13 +121,14 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
> >  	else
> >  		rc = cxl_decoder_autoremove(dev, cxld);
> >  	if (rc) {
> > -		dev_err(dev, "Failed to add decoder for %pr\n",
> > -			&cxld->platform_res);
> > +		dev_err(dev, "Failed to add decoder for [%#llx - %#llx]\n",
> > +			cxld->hpa_range.start, cxld->hpa_range.end);
> >  		return 0;
> >  	}
> > -	dev_dbg(dev, "add: %s node: %d range %pr\n", dev_name(&cxld->dev),
> > -		phys_to_target_node(cxld->platform_res.start),
> > -		&cxld->platform_res);
> > +	dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
> > +		dev_name(&cxld->dev),
> > +		phys_to_target_node(cxld->hpa_range.start),
> > +		cxld->hpa_range.start, cxld->hpa_range.end);
> >  
> >  	return 0;
> >  }
> > diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> > index c4c99ff7b55e..7672789c3225 100644
> > --- a/drivers/cxl/core/pci.c
> > +++ b/drivers/cxl/core/pci.c
> > @@ -225,7 +225,6 @@ static int dvsec_range_allowed(struct device *dev, void *arg)
> >  {
> >  	struct range *dev_range = arg;
> >  	struct cxl_decoder *cxld;
> > -	struct range root_range;
> >  
> >  	if (!is_root_decoder(dev))
> >  		return 0;
> > @@ -237,12 +236,7 @@ static int dvsec_range_allowed(struct device *dev, void *arg)
> >  	if (!(cxld->flags & CXL_DECODER_F_RAM))
> >  		return 0;
> >  
> > -	root_range = (struct range) {
> > -		.start = cxld->platform_res.start,
> > -		.end = cxld->platform_res.end,
> > -	};
> > -
> > -	return range_contains(&root_range, dev_range);
> > +	return range_contains(&cxld->hpa_range, dev_range);
> >  }
> >  
> >  static void disable_hdm(void *_cxlhdm)
> > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > index 98bcbbd59a75..b51eb41aa839 100644
> > --- a/drivers/cxl/core/port.c
> > +++ b/drivers/cxl/core/port.c
> > @@ -73,29 +73,17 @@ static ssize_t start_show(struct device *dev, struct device_attribute *attr,
> >  			  char *buf)
> >  {
> >  	struct cxl_decoder *cxld = to_cxl_decoder(dev);
> > -	u64 start;
> >  
> > -	if (is_root_decoder(dev))
> > -		start = cxld->platform_res.start;
> > -	else
> > -		start = cxld->hpa_range.start;
> > -
> > -	return sysfs_emit(buf, "%#llx\n", start);
> > +	return sysfs_emit(buf, "%#llx\n", cxld->hpa_range.start);
> >  }
> >  static DEVICE_ATTR_ADMIN_RO(start);
> >  
> >  static ssize_t size_show(struct device *dev, struct device_attribute *attr,
> > -			char *buf)
> > +			 char *buf)
> 
> nitpick: Unrelated change.  Ideally not in this patch.

ok.

* Re: [PATCH 05/46] cxl/core: Drop ->platform_res attribute for root decoders
  2022-06-29 20:21     ` Adam Manzanares
@ 2022-07-09 23:38       ` Dan Williams
  0 siblings, 0 replies; 157+ messages in thread
From: Dan Williams @ 2022-07-09 23:38 UTC (permalink / raw)
  To: Adam Manzanares, Dan Williams
  Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

Adam Manzanares wrote:
> On Thu, Jun 23, 2022 at 07:45:36PM -0700, Dan Williams wrote:
> > Root decoders are responsible for hosting the available host address
> > space for endpoints and regions to claim. The tracking of that available
> > capacity can be done in iomem_resource directly. As a result, root
> > decoders no longer need to host their own resource tree. The
> > current ->platform_res attribute was added prematurely.
> > 
> > Otherwise, ->hpa_range fills the role of conveying the current decode
> > range of the decoder.
> > 
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  drivers/cxl/acpi.c      |   17 ++++++++++-------
> >  drivers/cxl/core/pci.c  |    8 +-------
> >  drivers/cxl/core/port.c |   30 +++++++-----------------------
> >  drivers/cxl/cxl.h       |    6 +-----
> >  4 files changed, 19 insertions(+), 42 deletions(-)
> > 
> > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> > index 40286f5df812..951695cdb455 100644
> > --- a/drivers/cxl/acpi.c
> > +++ b/drivers/cxl/acpi.c
> > @@ -108,8 +108,10 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
> >  
> >  	cxld->flags = cfmws_to_decoder_flags(cfmws->restrictions);
> >  	cxld->target_type = CXL_DECODER_EXPANDER;
> > -	cxld->platform_res = (struct resource)DEFINE_RES_MEM(cfmws->base_hpa,
> > -							     cfmws->window_size);
> > +	cxld->hpa_range = (struct range) {
> > +		.start = cfmws->base_hpa,
> > +		.end = cfmws->base_hpa + cfmws->window_size - 1,
> > +	};
> >  	cxld->interleave_ways = CFMWS_INTERLEAVE_WAYS(cfmws);
> >  	cxld->interleave_granularity = CFMWS_INTERLEAVE_GRANULARITY(cfmws);
> >  
> > @@ -119,13 +121,14 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
> >  	else
> >  		rc = cxl_decoder_autoremove(dev, cxld);
> >  	if (rc) {
> > -		dev_err(dev, "Failed to add decoder for %pr\n",
> > -			&cxld->platform_res);
> > +		dev_err(dev, "Failed to add decoder for [%#llx - %#llx]\n",
> > +			cxld->hpa_range.start, cxld->hpa_range.end);
> 
> Minor nit, should we add range in our debug message?
> 
> +		dev_err(dev, "Failed to add decoder for range [%#llx - %#llx]\n",

Sure, but I shortened it to:

"Failed to add decode range [%#llx - %#llx]\n", 

...just to keep it under 80 columns.

> Otherwise, looks good.
> 
> Reviewed by: Adam Manzanares <a.manzanares@samsung.com>

Thanks.
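
For anyone following the ->platform_res to ->hpa_range conversion quoted
above, here is a minimal userspace sketch of the inclusive-end convention
that the "base + size - 1" arithmetic relies on. The 'struct range' below
is a stand-in for the kernel type, window_to_range() is a hypothetical
helper name that is not part of the patch, and range_contains() is written
to mirror what the kernel helper of the same name is understood to do.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's 'struct range'; both ends inclusive. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical helper: build an inclusive range from a window base and size. */
static inline struct range window_to_range(uint64_t base_hpa,
					   uint64_t window_size)
{
	return (struct range) {
		.start = base_hpa,
		.end = base_hpa + window_size - 1,
	};
}

/* True when @inner lies entirely within @outer. */
static inline bool range_contains(const struct range *outer,
				  const struct range *inner)
{
	return outer->start <= inner->start && outer->end >= inner->end;
}
```

Because the end is inclusive, a 1GiB window at 4GiB ends at 0x13fffffff,
and a device range that runs even one byte past that is rejected.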

* Re: [PATCH 07/46] cxl: Introduce cxl_to_{ways,granularity}
  2022-06-28 15:36   ` Jonathan Cameron
@ 2022-07-09 23:52     ` Dan Williams
  0 siblings, 0 replies; 157+ messages in thread
From: Dan Williams @ 2022-07-09 23:52 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, Ben Widawsky, hch, alison.schofield, nvdimm,
	linux-pci, patches

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 19:45:50 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > Interleave granularity and ways have CXL specification defined encodings.
> > Promote the conversion helpers to a common header, and use them to
> > replace other open-coded instances.
> > 
> > Force caller to consider the error case of the conversion as well.
> 
> What was the reasoning behind not just returning the value (rather
> than the extra *val parameter)?  Negative values would be errors
> still. Plenty of room to do that in an int.
> 
> I don't really mind, just feels a tiny bit uglier than it could be.

The rationale was to make it symmetric with reverse translation to
encoded values where those encode helpers are used directly for sysfs
input validation like the kstrto*() helpers.

Added a note to that effect.
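
To illustrate the symmetry argument, a rough userspace sketch follows. It
is not the code from the patch: the function names are hypothetical, only
the power-of-2 encodings implied by the removed CFMWS_* macros are handled,
and the upper bounds (6 for granularity, 4 for ways to match
CXL_DECODER_MAX_INTERLEAVE of 16) are assumptions made for the example.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Encoded granularity -> bytes. The removed macro treats the encoding as
 * log2(bytes) - 8, so 0 means 256 bytes. The limit of 6 is an assumption.
 */
static inline int enc_to_granularity(uint16_t ig, unsigned int *val)
{
	if (ig > 6)
		return -EINVAL;
	*val = 256u << ig;
	return 0;
}

/*
 * Bytes -> encoded granularity. Same calling convention as the decode
 * direction, so it can validate user input in the style of kstrto*().
 */
static inline int granularity_to_enc(unsigned int bytes, uint16_t *ig)
{
	for (uint16_t i = 0; i <= 6; i++) {
		if (bytes == (256u << i)) {
			*ig = i;
			return 0;
		}
	}
	return -EINVAL;
}

/* Encoded ways -> ways, power-of-2 encodings only (assumption). */
static inline int enc_to_ways(uint8_t eniw, unsigned int *val)
{
	if (eniw > 4)
		return -EINVAL;
	*val = 1u << eniw;
	return 0;
}
```

With both directions returning 0 or -EINVAL and writing through the
pointer, a caller cannot use the converted value without first looking at
the return code, which is the "force caller to consider the error case"
point from the changelog.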

> 
> Also, there is one little unrelated type change in here.
> 
> > 
> > Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
> > Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> > ---
> >  drivers/cxl/acpi.c     |   34 +++++++++++++++++++---------------
> >  drivers/cxl/core/hdm.c |   35 +++++++++--------------------------
> >  drivers/cxl/cxl.h      |   26 ++++++++++++++++++++++++++
> >  3 files changed, 54 insertions(+), 41 deletions(-)
> > 
> > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> > index 951695cdb455..544cb10ce33e 100644
> > --- a/drivers/cxl/acpi.c
> > +++ b/drivers/cxl/acpi.c
> > @@ -9,10 +9,6 @@
> >  #include "cxlpci.h"
> >  #include "cxl.h"
> >  
> > -/* Encode defined in CXL 2.0 8.2.5.12.7 HDM Decoder Control Register */
> > -#define CFMWS_INTERLEAVE_WAYS(x)	(1 << (x)->interleave_ways)
> > -#define CFMWS_INTERLEAVE_GRANULARITY(x)	((x)->granularity + 8)
> > -
> >  static unsigned long cfmws_to_decoder_flags(int restrictions)
> >  {
> >  	unsigned long flags = CXL_DECODER_F_ENABLE;
> > @@ -34,7 +30,8 @@ static unsigned long cfmws_to_decoder_flags(int restrictions)
> >  static int cxl_acpi_cfmws_verify(struct device *dev,
> >  				 struct acpi_cedt_cfmws *cfmws)
> >  {
> > -	int expected_len;
> > +	unsigned int expected_len, ways;
> 
> Type change for expected_len seems fine but isn't mentioned in the patch description.

Yeah, that seems a thoughtless change to me since only @ways needs to be
an unsigned int. I fixed it up.

* Re: [PATCH 08/46] cxl/core: Define a 'struct cxl_switch_decoder'
  2022-06-28 16:12   ` Jonathan Cameron
  2022-06-30 10:56     ` Jonathan Cameron
@ 2022-07-10  0:33     ` Dan Williams
  1 sibling, 0 replies; 157+ messages in thread
From: Dan Williams @ 2022-07-10  0:33 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, Ben Widawsky, hch, alison.schofield, nvdimm,
	linux-pci, patches

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 19:45:57 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > Currently 'struct cxl_decoder' contains the superset of attributes
> > needed for all decoder types. Before more type-specific attributes are
> > added to the common definition, reorganize 'struct cxl_decoder' into type
> > specific objects.
> > 
> > This patch, the first of three, factors out a cxl_switch_decoder type.
> > The 'switch' decoder type represents the decoder instances of cxl_port's
> > that route from the root of a CXL memory decode topology to the
> > endpoints. They come in two flavors, root-level decoders, statically
> > defined by platform firmware, and mid-level decoders, where
> > interleave-granularity, interleave-width, and the target list are
> > mutable.
> 
> I'd like to see this info on cxl_switch_decoder being used for
> switches AND other stuff as docs next to the definition. It confused
> me when I looked directly at the result of applying this series
> and made more sense once I read to this patch.
> 
> > 
> > Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
> > Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> 
> Basic idea is fine, but there are a few places where I think this is
> 'too clever' with error handling and it's worth duplicating a few
> error messages to keep the flow simpler.
> 
> Also, nice to drop the white space tweaks that have snuck in here.
> Particularly the wrong one ;)
> 
> 
> > ---
> >  drivers/cxl/acpi.c           |    4 +
> >  drivers/cxl/core/hdm.c       |   21 +++++---
> >  drivers/cxl/core/port.c      |  115 +++++++++++++++++++++++++++++++-----------
> >  drivers/cxl/cxl.h            |   27 ++++++----
> >  tools/testing/cxl/test/cxl.c |   12 +++-
> >  5 files changed, 128 insertions(+), 51 deletions(-)
> > 
> 
> > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> > index 46635105a1f1..2d1f3e6eebea 100644
> > --- a/drivers/cxl/core/hdm.c
> > +++ b/drivers/cxl/core/hdm.c
> 
> 
> > @@ -226,8 +226,15 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
> >  
> >  		if (is_cxl_endpoint(port))
> >  			cxld = cxl_endpoint_decoder_alloc(port);
> > -		else
> > -			cxld = cxl_switch_decoder_alloc(port, target_count);
> > +		else {
> > +			struct cxl_switch_decoder *cxlsd;
> > +
> > +			cxlsd = cxl_switch_decoder_alloc(port, target_count);
> > +			if (IS_ERR(cxlsd))
> > +				cxld = ERR_CAST(cxlsd);
> 
> As described later, I'd rather local error handling in these branches
> as I think it will be more readable than this dance with error casting, for
> the cost of maybe 2 lines.

I am going to scrub one step deeper and just move all of the decoder
type specific code into the cxl_<type>_decoder_alloc() callers.

> 
> > +			else
> > +				cxld = &cxlsd->cxld;
> > +		}
> >  		if (IS_ERR(cxld)) {
> >  			dev_warn(&port->dev,
> >  				 "Failed to allocate the decoder\n");
> > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > index 13c321afe076..fd1cac13cd2e 100644
> > --- a/drivers/cxl/core/port.c
> > +++ b/drivers/cxl/core/port.c
> 
> ....
> 
> >  
> > +static void __cxl_decoder_release(struct cxl_decoder *cxld)
> > +{
> > +	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
> > +
> > +	ida_free(&port->decoder_ida, cxld->id);
> > +	put_device(&port->dev);
> > +}
> > +
> >  static void cxl_decoder_release(struct device *dev)
> >  {
> >  	struct cxl_decoder *cxld = to_cxl_decoder(dev);
> > -	struct cxl_port *port = to_cxl_port(dev->parent);
> >  
> > -	ida_free(&port->decoder_ida, cxld->id);
> > +	__cxl_decoder_release(cxld);
> >  	kfree(cxld);
> > -	put_device(&port->dev);
> 
> I was going to moan about this reorder, but this is actually
> the right order as we allocate then get_device() so
> reverse should indeed do the put _device first.
> So good incidental clean up of ordering :)
> 
> > +}
> > +
> > +static void cxl_switch_decoder_release(struct device *dev)
> > +{
> > +	struct cxl_switch_decoder *cxlsd = to_cxl_switch_decoder(dev);
> > +
> > +	__cxl_decoder_release(&cxlsd->cxld);
> > +	kfree(cxlsd);
> >  }
> >  
> >  static const struct device_type cxl_decoder_endpoint_type = {
> > @@ -250,13 +267,13 @@ static const struct device_type cxl_decoder_endpoint_type = {
> >  
> >  static const struct device_type cxl_decoder_switch_type = {
> >  	.name = "cxl_decoder_switch",
> > -	.release = cxl_decoder_release,
> > +	.release = cxl_switch_decoder_release,
> >  	.groups = cxl_decoder_switch_attribute_groups,
> >  };
> >  
> >  static const struct device_type cxl_decoder_root_type = {
> >  	.name = "cxl_decoder_root",
> > -	.release = cxl_decoder_release,
> > +	.release = cxl_switch_decoder_release,
> >  	.groups = cxl_decoder_root_attribute_groups,
> >  };
> >  
> > @@ -271,15 +288,29 @@ bool is_root_decoder(struct device *dev)
> >  }
> >  EXPORT_SYMBOL_NS_GPL(is_root_decoder, CXL);
> >  
> > +static bool is_switch_decoder(struct device *dev)
> > +{
> > +	return is_root_decoder(dev) || dev->type == &cxl_decoder_switch_type;
> > +}
> > +
> >  struct cxl_decoder *to_cxl_decoder(struct device *dev)
> >  {
> > -	if (dev_WARN_ONCE(dev, dev->type->release != cxl_decoder_release,
> > +	if (dev_WARN_ONCE(dev,
> > +			  !is_switch_decoder(dev) && !is_endpoint_decoder(dev),
> >  			  "not a cxl_decoder device\n"))
> >  		return NULL;
> >  	return container_of(dev, struct cxl_decoder, dev);
> >  }
> >  EXPORT_SYMBOL_NS_GPL(to_cxl_decoder, CXL);
> >  
> > +static struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev)
> > +{
> > +	if (dev_WARN_ONCE(dev, !is_switch_decoder(dev),
> > +			  "not a cxl_switch_decoder device\n"))
> > +		return NULL;
> > +	return container_of(dev, struct cxl_switch_decoder, cxld.dev);
> > +}
> > +
> >  static void cxl_ep_release(struct cxl_ep *ep)
> >  {
> >  	if (!ep)
> > @@ -1129,7 +1160,7 @@ struct cxl_dport *cxl_find_dport_by_dev(struct cxl_port *port,
> >  }
> >  EXPORT_SYMBOL_NS_GPL(cxl_find_dport_by_dev, CXL);
> >  
> > -static int decoder_populate_targets(struct cxl_decoder *cxld,
> > +static int decoder_populate_targets(struct cxl_switch_decoder *cxlsd,
> >  				    struct cxl_port *port, int *target_map)
> >  {
> >  	int i, rc = 0;
> > @@ -1142,17 +1173,17 @@ static int decoder_populate_targets(struct cxl_decoder *cxld,
> >  	if (list_empty(&port->dports))
> >  		return -EINVAL;
> >  
> > -	write_seqlock(&cxld->target_lock);
> > -	for (i = 0; i < cxld->nr_targets; i++) {
> > +	write_seqlock(&cxlsd->target_lock);
> > +	for (i = 0; i < cxlsd->nr_targets; i++) {
> >  		struct cxl_dport *dport = find_dport(port, target_map[i]);
> >  
> >  		if (!dport) {
> >  			rc = -ENXIO;
> >  			break;
> >  		}
> > -		cxld->target[i] = dport;
> > +		cxlsd->target[i] = dport;
> >  	}
> > -	write_sequnlock(&cxld->target_lock);
> > +	write_sequnlock(&cxlsd->target_lock);
> >  
> >  	return rc;
> >  }
> > @@ -1179,13 +1210,27 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
> >  {
> >  	struct cxl_decoder *cxld;
> >  	struct device *dev;
> > +	void *alloc;
> >  	int rc = 0;
> >  
> >  	if (nr_targets > CXL_DECODER_MAX_INTERLEAVE)
> >  		return ERR_PTR(-EINVAL);
> >  
> > -	cxld = kzalloc(struct_size(cxld, target, nr_targets), GFP_KERNEL);
> > -	if (!cxld)
> > +	if (nr_targets) {
> > +		struct cxl_switch_decoder *cxlsd;
> > +
> > +		alloc = kzalloc(struct_size(cxlsd, target, nr_targets), GFP_KERNEL);
> 
> I'd rather see a local check on the allocation failure even if it adds a few lines
> of duplicated code - which after you've dropped the local alloc variable won't be
> much even after a later patch adds another path in here.  The eventual code
> of this function is more than a little nasty when an early return in each
> path would, as far as I can tell, give the same result without the at least
> 3 null checks prior to returning (to ensure nothing happens before reaching
> the if (!alloc)
> 
> 
> 
> 
> 		cxlsd = kzalloc()
> 		if (!cxlsd)
> 			return ERR_PTR(-ENOMEM);
> 
> 		cxlsd->nr_targets = nr_targets;
> 		seqlock_init(...)
> 
> 	} else {
> 		cxld = kzalloc(sizeof(*cxld), GFP_KERNEL);
> 		if (!cxld)
> 			return ERR_PTR(-ENOMEM);

Point taken, and it's even cleaner without trying to recover the decoder
type in this function that is mostly just a base 'decoder init' helper.

> 
> > +		cxlsd = alloc;
> > +		if (cxlsd) {
> > +			cxlsd->nr_targets = nr_targets;
> > +			seqlock_init(&cxlsd->target_lock);
> > +			cxld = &cxlsd->cxld;
> > +		}
> > +	} else {
> > +		alloc = kzalloc(sizeof(*cxld), GFP_KERNEL);
> > +		cxld = alloc;
> > +	}
> > +	if (!alloc)
> >  		return ERR_PTR(-ENOMEM);
> >  
> >  	rc = ida_alloc(&port->decoder_ida, GFP_KERNEL);
> > @@ -1196,8 +1241,6 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
> >  	get_device(&port->dev);
> >  	cxld->id = rc;
> >  
> > -	cxld->nr_targets = nr_targets;
> > -	seqlock_init(&cxld->target_lock);
> >  	dev = &cxld->dev;
> >  	device_initialize(dev);
> >  	lockdep_set_class(&dev->mutex, &cxl_decoder_key);
> > @@ -1222,7 +1265,7 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
> >  
> >  	return cxld;
> >  err:
> > -	kfree(cxld);
> > +	kfree(alloc);
> >  	return ERR_PTR(rc);
> >  }
> >  
> > @@ -1236,13 +1279,18 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
> >   * firmware description of CXL resources into a CXL standard decode
> >   * topology.
> >   */
> > -struct cxl_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
> > -					   unsigned int nr_targets)
> > +struct cxl_switch_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
> > +						  unsigned int nr_targets)
> >  {
> > +	struct cxl_decoder *cxld;
> > +
> >  	if (!is_cxl_root(port))
> >  		return ERR_PTR(-EINVAL);
> >  
> > -	return cxl_decoder_alloc(port, nr_targets);
> > +	cxld = cxl_decoder_alloc(port, nr_targets);
> > +	if (IS_ERR(cxld))
> > +		return ERR_CAST(cxld);
> > +	return to_cxl_switch_decoder(&cxld->dev);
> >  }
> >  EXPORT_SYMBOL_NS_GPL(cxl_root_decoder_alloc, CXL);
> >  
> > @@ -1257,13 +1305,18 @@ EXPORT_SYMBOL_NS_GPL(cxl_root_decoder_alloc, CXL);
> >   * that sit between Switch Upstream Ports / Switch Downstream Ports and
> >   * Host Bridges / Root Ports.
> >   */
> > -struct cxl_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
> > -					     unsigned int nr_targets)
> > +struct cxl_switch_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
> > +						    unsigned int nr_targets)
> >  {
> > +	struct cxl_decoder *cxld;
> > +
> >  	if (is_cxl_root(port) || is_cxl_endpoint(port))
> >  		return ERR_PTR(-EINVAL);
> >  
> > -	return cxl_decoder_alloc(port, nr_targets);
> > +	cxld = cxl_decoder_alloc(port, nr_targets);
> > +	if (IS_ERR(cxld))
> > +		return ERR_CAST(cxld);
> > +	return to_cxl_switch_decoder(&cxld->dev);
> >  }
> >  EXPORT_SYMBOL_NS_GPL(cxl_switch_decoder_alloc, CXL);
> >  
> > @@ -1320,7 +1373,9 @@ int cxl_decoder_add_locked(struct cxl_decoder *cxld, int *target_map)
> >  
> >  	port = to_cxl_port(cxld->dev.parent);
> >  	if (!is_endpoint_decoder(dev)) {
> > -		rc = decoder_populate_targets(cxld, port, target_map);
> > +		struct cxl_switch_decoder *cxlsd = to_cxl_switch_decoder(dev);
> > +
> > +		rc = decoder_populate_targets(cxlsd, port, target_map);
> >  		if (rc && (cxld->flags & CXL_DECODER_F_ENABLE)) {
> >  			dev_err(&port->dev,
> >  				"Failed to populate active decoder targets\n");
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index fd02f9e2a829..7525b55b11bb 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -220,7 +220,7 @@ enum cxl_decoder_type {
> >  #define CXL_DECODER_MAX_INTERLEAVE 16
> >  
> >  /**
> > - * struct cxl_decoder - CXL address range decode configuration
> > + * struct cxl_decoder - Common CXL HDM Decoder Attributes
> >   * @dev: this decoder's device
> >   * @id: kernel device name id
> >   * @hpa_range: Host physical address range mapped by this decoder
> > @@ -228,10 +228,7 @@ enum cxl_decoder_type {
> >   * @interleave_granularity: data stride per dport
> >   * @target_type: accelerator vs expander (type2 vs type3) selector
> >   * @flags: memory type capabilities and locking
> > - * @target_lock: coordinate coherent reads of the target list
> > - * @nr_targets: number of elements in @target
> > - * @target: active ordered target list in current decoder configuration
> > - */
> > +*/
> 
> ?

Fixed.

> 
> >  struct cxl_decoder {
> >  	struct device dev;
> >  	int id;
> > @@ -240,12 +237,22 @@ struct cxl_decoder {
> >  	int interleave_granularity;
> >  	enum cxl_decoder_type target_type;
> >  	unsigned long flags;
> > +};
> > +
> > +/**
> > + * struct cxl_switch_decoder - Switch specific CXL HDM Decoder
> 
> Whilst you define the broad use of switch in the patch description, I think
> it is worth explaining here that it's CFMWS, HB and switch decoders
> (if I understand correctly - this had me very confused when looking
> at the overall code)
> 
> > + * @cxld: base cxl_decoder object
> > + * @target_lock: coordinate coherent reads of the target list
> > + * @nr_targets: number of elements in @target
> > + * @target: active ordered target list in current decoder configuration
> > + */
> > +struct cxl_switch_decoder {
> > +	struct cxl_decoder cxld;
> >  	seqlock_t target_lock;
> >  	int nr_targets;
> >  	struct cxl_dport *target[];
> >  };
> >  
> > -
> 
> *grumble grumble*  Unconnected white space fix.

Just checking if you're paying attention. Fixed.

> 
> >  /**
> >   * enum cxl_nvdimm_brige_state - state machine for managing bus rescans
> >   * @CXL_NVB_NEW: Set at bridge create and after cxl_pmem_wq is destroyed
> > @@ -363,10 +370,10 @@ struct cxl_dport *cxl_find_dport_by_dev(struct cxl_port *port,
> >  struct cxl_decoder *to_cxl_decoder(struct device *dev);
> >  bool is_root_decoder(struct device *dev);
> >  bool is_endpoint_decoder(struct device *dev);
> > -struct cxl_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
> > -					   unsigned int nr_targets);
> > -struct cxl_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
> > -					     unsigned int nr_targets);
> > +struct cxl_switch_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
> > +						  unsigned int nr_targets);
> > +struct cxl_switch_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
> > +						    unsigned int nr_targets);
> >  int cxl_decoder_add(struct cxl_decoder *cxld, int *target_map);
> >  struct cxl_decoder *cxl_endpoint_decoder_alloc(struct cxl_port *port);
> >  int cxl_decoder_add_locked(struct cxl_decoder *cxld, int *target_map);
> > diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
> > index 7a08b025f2de..68288354b419 100644
> > --- a/tools/testing/cxl/test/cxl.c
> > +++ b/tools/testing/cxl/test/cxl.c
> > @@ -451,9 +451,15 @@ static int mock_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
> >  		struct cxl_decoder *cxld;
> >  		int rc;
> >  
> > -		if (target_count)
> > -			cxld = cxl_switch_decoder_alloc(port, target_count);
> > -		else
> > +		if (target_count) {
> > +			struct cxl_switch_decoder *cxlsd;
> > +
> > +			cxlsd = cxl_switch_decoder_alloc(port, target_count);
> > +			if (IS_ERR(cxlsd))
> > +				cxld = ERR_CAST(cxlsd);
> 
> Looks cleaner to me to move error handling into the branches. You duplicate
> an error print but avoid ERR_CAST mess just to cast it back to an error in the
> error path a few lines later.

ok.

* Re: [PATCH 08/46] cxl/core: Define a 'struct cxl_switch_decoder'
  2022-06-30 10:56     ` Jonathan Cameron
@ 2022-07-10  0:49       ` Dan Williams
  0 siblings, 0 replies; 157+ messages in thread
From: Dan Williams @ 2022-07-10  0:49 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, Ben Widawsky, hch, alison.schofield, nvdimm,
	linux-pci, patches

Jonathan Cameron wrote:
> On Tue, 28 Jun 2022 17:12:04 +0100
> Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> 
> > On Thu, 23 Jun 2022 19:45:57 -0700
> > Dan Williams <dan.j.williams@intel.com> wrote:
> > 
> > > Currently 'struct cxl_decoder' contains the superset of attributes
> > > needed for all decoder types. Before more type-specific attributes are
> > > added to the common definition, reorganize 'struct cxl_decoder' into type
> > > specific objects.
> > > 
> > > This patch, the first of three, factors out a cxl_switch_decoder type.
> > > The 'switch' decoder type represents the decoder instances of cxl_port's
> > > that route from the root of a CXL memory decode topology to the
> > > endpoints. They come in two flavors, root-level decoders, statically
> > > defined by platform firmware, and mid-level decoders, where
> > > interleave-granularity, interleave-width, and the target list are
> > > mutable.  
> > 
> > I'd like to see this info on cxl_switch_decoder being used for
> > switches AND other stuff as docs next to the definition. It confused
> > me when I looked directly at the result of applying this series
> > and made more sense once I read to this patch.
> > 
> > > 
> > > Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
> > > Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>  
> > 
> > Basic idea is fine, but there are a few places where I think this is
> > 'too clever' with error handling and it's worth duplicating a few
> > error messages to keep the flow simpler.
> > 
> 
> follow up on that. I'd missed the kfree(alloc) hiding in plain
> sight at the end of the function.
> 
> 
> 
> > > @@ -1179,13 +1210,27 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
> > >  {
> > >  	struct cxl_decoder *cxld;
> > >  	struct device *dev;
> > > +	void *alloc;
> > >  	int rc = 0;
> > >  
> > >  	if (nr_targets > CXL_DECODER_MAX_INTERLEAVE)
> > >  		return ERR_PTR(-EINVAL);
> > >  
> > > -	cxld = kzalloc(struct_size(cxld, target, nr_targets), GFP_KERNEL);
> > > -	if (!cxld)
> > > +	if (nr_targets) {
> > > +		struct cxl_switch_decoder *cxlsd;
> > > +
> > > +		alloc = kzalloc(struct_size(cxlsd, target, nr_targets), GFP_KERNEL);  
> > 
> > I'd rather see a local check on the allocation failure even if it adds a few lines
> > of duplicated code - which after you've dropped the local alloc variable won't be
> > much even after a later patch adds another path in here.  The eventual code
> > of this function is more than a little nasty when an early return in each
> > path would, as far as I can tell, give the same result without the at least
> > 3 null checks prior to returning (to ensure nothing happens before reaching
> > the if (!alloc)
> 
> clearly not enough caffeine that day as I'd missed the use for unifying
> the frees at the end of the function... Just noticed that in a later patch
> that touches the error path.
> 
> I still don't much like the complexity of the flow, but can see why you did it
> this way now.

Appreciate it, it's much cleaner now with the nudge to take a second
look at reducing the complexity.

* Re: [PATCH 09/46] cxl/acpi: Track CXL resources in iomem_resource
  2022-06-28 16:43   ` Jonathan Cameron
@ 2022-07-10  2:12     ` Dan Williams
  2022-07-19 14:24       ` Jonathan Cameron
  0 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-07-10  2:12 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches,
	david, gregkh, jgg

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 19:46:05 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > Recall that CXL capable address ranges, on ACPI platforms, are published
> > in the CEDT.CFMWS (CXL Early Discovery Table - CXL Fixed Memory Window
> > Structures). These windows represent both the actively mapped capacity
> > and the potential address space that can be dynamically assigned to a
> > new CXL decode configuration.
> > 
> > CXL endpoints like DDR DIMMs can be mapped at any physical address
> > including 0 and legacy ranges.
> > 
> > There is an expectation and requirement that the /proc/iomem interface
> > and the iomem_resource in the kernel reflect the full set of platform
> > address ranges. I.e. that every address range that platform firmware and
> > bus drivers enumerate be reflected as an iomem_resource entry. The hard
> > requirement to do this for CXL arises from the fact that capabilities
> > like CONFIG_DEVICE_PRIVATE expect to be able to treat empty
> > iomem_resource ranges as free for software to use as proxy address
> > space. Without CXL publishing its potential address ranges in
> > iomem_resource, the CONFIG_DEVICE_PRIVATE mechanism may inadvertently
> > steal capacity reserved for runtime provisioning of new CXL regions.
> > 
> > The approach taken supports dynamically publishing the CXL window map on
> > demand when a CXL platform driver like cxl_acpi loads. The windows are
> > then forced into the first level of iomem_resource tree via the
> > insert_resource_expand_to_fit() API. This forcing sacrifices some
> > resource boundary accuracy in order to better reflect the decode
> > hierarchy of a CXL window hosting "System RAM" and other resources.
> 
> I don't fully understand this and in particular what assumptions it
> is making.  How do we end up with overlapping resources via just parsing
> the CFMWS for instance...

Consider the case of platform firmware placing CXL memory in the EFI
memory map. In that case the CXL address range will already exist in
iomem_resource as a "System RAM" resource. The goal of this patch is to
reflect the true hierarchy of the resource tree, but late in the boot
cycle when the CXL driver stack loads.

I will add a clarification along these lines to the changelog.


> I would shout a lot louder in this description about using the CXL NS
> for that export.  That's liable to be controversial.

Added some folks to this reply and will cc them on the resend (Greg,
David, Jason), but I will remind anyone following along that the proposed
solution here is the one discussed at LSF/MM:

https://lwn.net/Articles/894626/

...and suggested by Jason:

https://lore.kernel.org/all/20220420143406.GY2120790@nvidia.com/

This also builds on David's work to remove "top level resource" special
casing in various kernel paths.

Otherwise, if your concern is the export itself, I think this is a
straightforward example of why namespaces were created in the first
place to limit exports to a specific scope when there is no intent to
make the export available more generally.

> 
> > 
> > Walkers of the iomem_resource tree will also need to have access to the
> > related 'struct cxl_decoder' instances to disambiguate which portions of
> > a CXL memory resource are present vs expanded to enforce the expected
> > resource topology.
> > 
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  drivers/cxl/acpi.c |  110 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> >  kernel/resource.c  |    7 +++
> >  2 files changed, 114 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> > index d1b914dfa36c..003fa4fde357 100644
> > --- a/drivers/cxl/acpi.c
> > +++ b/drivers/cxl/acpi.c
> > @@ -73,6 +73,7 @@ static int cxl_acpi_cfmws_verify(struct device *dev,
> >  struct cxl_cfmws_context {
> >  	struct device *dev;
> >  	struct cxl_port *root_port;
> > +	int id;
> >  };
> >  
> >  static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
> > @@ -84,8 +85,10 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
> >  	struct cxl_switch_decoder *cxlsd;
> >  	struct device *dev = ctx->dev;
> >  	struct acpi_cedt_cfmws *cfmws;
> > +	struct resource *cxl_res;
> >  	struct cxl_decoder *cxld;
> >  	unsigned int ways, i, ig;
> > +	struct resource *res;
> >  	int rc;
> >  
> >  	cfmws = (struct acpi_cedt_cfmws *) header;
> > @@ -107,6 +110,24 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
> >  	for (i = 0; i < ways; i++)
> >  		target_map[i] = cfmws->interleave_targets[i];
> >  
> > +	res = kzalloc(sizeof(*res), GFP_KERNEL);
> > +	if (!res)
> > +		return -ENOMEM;
> > +
> > +	res->name = kasprintf(GFP_KERNEL, "CXL Window %d", ctx->id++);
> > +	if (!res->name)
> > +		goto err_name;
> > +
> > +	res->start = cfmws->base_hpa;
> > +	res->end = cfmws->base_hpa + cfmws->window_size - 1;
> > +	res->flags = IORESOURCE_MEM;
> > +
> > +	/* add to the local resource tracking to establish a sort order */
> > +	cxl_res = dev_get_drvdata(&root_port->dev);
> 
> As mentioned below, why not add cxl_res to the ctx?

Good idea.

> 
> > +	rc = insert_resource(cxl_res, res);
> > +	if (rc)
> > +		goto err_insert;
> > +
> >  	cxlsd = cxl_root_decoder_alloc(root_port, ways);
> >  	if (IS_ERR(cxld))
> >  		return 0;
> > @@ -115,8 +136,8 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
> >  	cxld->flags = cfmws_to_decoder_flags(cfmws->restrictions);
> >  	cxld->target_type = CXL_DECODER_EXPANDER;
> >  	cxld->hpa_range = (struct range) {
> > -		.start = cfmws->base_hpa,
> > -		.end = cfmws->base_hpa + cfmws->window_size - 1,
> > +		.start = res->start,
> > +		.end = res->end,
> >  	};
> >  	cxld->interleave_ways = ways;
> >  	cxld->interleave_granularity = ig;
> > @@ -131,12 +152,19 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
> >  			cxld->hpa_range.start, cxld->hpa_range.end);
> >  		return 0;
> >  	}
> > +
> Another whitespace tweak that shouldn't be in a patch like this...

sure.

> 
> >  	dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
> >  		dev_name(&cxld->dev),
> >  		phys_to_target_node(cxld->hpa_range.start),
> >  		cxld->hpa_range.start, cxld->hpa_range.end);
> >  
> >  	return 0;
> > +
> > +err_insert:
> > +	kfree(res->name);
> > +err_name:
> > +	kfree(res);
> > +	return -ENOMEM;
> >  }
> >  
> >  __mock struct acpi_device *to_cxl_host_bridge(struct device *host,
> > @@ -291,9 +319,66 @@ static void cxl_acpi_lock_reset_class(void *dev)
> >  	device_lock_reset_class(dev);
> >  }
> >  
> > +static void del_cxl_resource(struct resource *res)
> > +{
> > +	kfree(res->name);
> > +	kfree(res);
> > +}
> > +
> > +static void remove_cxl_resources(void *data)
> > +{
> > +	struct resource *res, *next, *cxl = data;
> > +
> > +	for (res = cxl->child; res; res = next) {
> > +		struct resource *victim = (struct resource *) res->desc;
> > +
> > +		next = res->sibling;
> > +		remove_resource(res);
> > +
> > +		if (victim) {
> > +			remove_resource(victim);
> > +			kfree(victim);
> > +		}
> > +
> > +		del_cxl_resource(res);
> > +	}
> > +}
> > +
> > +static int add_cxl_resources(struct resource *cxl)
> 
> I'd like to see some documentation of what this is doing...
> 
> > +{
> > +	struct resource *res, *new, *next;
> > +
> > +	for (res = cxl->child; res; res = next) {
> > +		new = kzalloc(sizeof(*new), GFP_KERNEL);
> > +		if (!new)
> > +			return -ENOMEM;
> > +		new->name = res->name;
> > +		new->start = res->start;
> > +		new->end = res->end;
> > +		new->flags = IORESOURCE_MEM;
> > +		res->desc = (unsigned long) new;
> > +
> > +		insert_resource_expand_to_fit(&iomem_resource, new);
> 
> Given you've called out limitations of this call in the patch description
> it would be good to have some of that info in the code.
> 
> > +
> > +		next = res->sibling;
> > +		while (next && resource_overlaps(new, next)) {
> 
> I'm struggling to grasp why we'd have overlaps, comments would probably help.

Added the following...

/**
 * add_cxl_resources() - reflect CXL fixed memory windows in iomem_resource
 * @cxl_res: A standalone resource tree where each CXL window is a sibling
 *
 * Walk each CXL window in @cxl_res and add it to iomem_resource potentially
 * expanding its boundaries to ensure that any conflicting resources become
 * children. If a window is expanded it may then conflict with another window
 * entry and require the window to be truncated or trimmed. Consider this
 * situation:
 *
 * |-- "CXL Window 0" --||----- "CXL Window 1" -----|
 * |--------------- "System RAM" -------------|
 *
 * ...where platform firmware has established a "System RAM" resource across 2
 * windows, but has left some portion of window 1 for dynamic CXL region
 * provisioning. In this case "CXL Window 0" will span the entirety of the
 * "System RAM" span, and "CXL Window 1" is truncated to the remaining tail past
 * the end of that "System RAM" resource.
 */


Also, if you're wondering, the mismatch of iomem_resource entries to the
CXL windows does not matter in practice as dynamic region provisioning
only cares about the portions of the CXL windows that do not intersect
with any other resource. All that matters is that all intersections are
accounted for when it comes time to scan for free address space.
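
For anyone following along, the expand-then-trim behavior described in
the comment above can be sketched in plain userspace C. This is only an
illustration under stated assumptions: 'struct win' and the helpers
expand_to_fit() and trim_next() are hypothetical stand-ins, not the
kernel's 'struct resource' API, and the windows are assumed to be sorted
by start address as they are in the local cxl_res tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for 'struct resource'; bounds are inclusive. */
struct win {
	uint64_t start;
	uint64_t end;
};

static bool win_overlaps(const struct win *a, const struct win *b)
{
	return a->start <= b->end && b->start <= a->end;
}

static bool win_contains(const struct win *outer, const struct win *inner)
{
	return outer->start <= inner->start && outer->end >= inner->end;
}

/* Grow @w until it fully covers a conflicting resource @busy. */
static void expand_to_fit(struct win *w, const struct win *busy)
{
	if (!win_overlaps(w, busy))
		return;
	if (busy->start < w->start)
		w->start = busy->start;
	if (busy->end > w->end)
		w->end = busy->end;
}

/*
 * After @w was expanded, fix up the following (higher address) window.
 * Returns false when @next was swallowed whole and should be dropped,
 * true when it survives, possibly with its start trimmed.
 */
static bool trim_next(const struct win *w, struct win *next)
{
	if (!win_overlaps(w, next))
		return true;
	if (win_contains(w, next))
		return false;
	next->start = w->end + 1;
	return true;
}
```

With "System RAM" spanning all of window 0 and the head of window 1,
window 0 grows to cover the RAM and window 1 keeps only its free tail,
which is the part dynamic region provisioning cares about.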

> 
> > +			if (resource_contains(new, next)) {
> > +				struct resource *_next = next->sibling;
> > +
> > +				remove_resource(next);
> > +				del_cxl_resource(next);
> > +				next = _next;
> > +			} else
> > +				next->start = new->end + 1;
> > +		}
> > +	}
> > +	return 0;
> > +}
> > +
> >  static int cxl_acpi_probe(struct platform_device *pdev)
> >  {
> >  	int rc;
> > +	struct resource *cxl_res;
> >  	struct cxl_port *root_port;
> >  	struct device *host = &pdev->dev;
> >  	struct acpi_device *adev = ACPI_COMPANION(host);
> > @@ -305,21 +390,40 @@ static int cxl_acpi_probe(struct platform_device *pdev)
> >  	if (rc)
> >  		return rc;
> >  
> > +	cxl_res = devm_kzalloc(host, sizeof(*cxl_res), GFP_KERNEL);
> > +	if (!cxl_res)
> > +		return -ENOMEM;
> > +	cxl_res->name = "CXL mem";
> > +	cxl_res->start = 0;
> > +	cxl_res->end = -1;
> > +	cxl_res->flags = IORESOURCE_MEM;
> > +
> >  	root_port = devm_cxl_add_port(host, host, CXL_RESOURCE_NONE, NULL);
> >  	if (IS_ERR(root_port))
> >  		return PTR_ERR(root_port);
> >  	dev_dbg(host, "add: %s\n", dev_name(&root_port->dev));
> > +	dev_set_drvdata(&root_port->dev, cxl_res);
> 
> Rather ugly way of sneaking it into the callback. If that is the only
> purpose, perhaps better to just add to the cxl_cfmws_context.

yup.

* Re: [PATCH 10/46] cxl/core: Define a 'struct cxl_root_decoder' for tracking CXL window resources
  2022-06-28 16:49   ` Jonathan Cameron
@ 2022-07-10  2:20     ` Dan Williams
  0 siblings, 0 replies; 157+ messages in thread
From: Dan Williams @ 2022-07-10  2:20 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, Ben Widawsky, hch, alison.schofield, nvdimm,
	linux-pci, patches

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 19:46:13 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > Previously the target routing specifics of switch decoders were factored
> > out of 'struct cxl_decoder' into 'struct cxl_switch_decoder'.
> > 
> > This patch, 2 of 3, adds a 'struct cxl_root_decoder' as a superset of a
> > switch decoder that also tracks the associated CXL window platform
> > resource.
> > 
> > Note that the reason the resource for a given root decoder needs to be
> > looked up after the fact (i.e. after cxl_parse_cfmws() and
> > add_cxl_resources()) is because add_cxl_resources() may have merged CXL
> > windows in order to keep them at the top of the resource tree / decode
> > hierarchy.
> 
> One trivial comment below that follows from earlier patch.
> 
> Otherwise, I'll look again at this when I understand what the constraints
> of CXL windows are that you are dealing with.  I don't get why they might not
> be at the top of the resource tree without the merging!
> 
> > 
> > Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
> > Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  drivers/cxl/acpi.c      |   40 ++++++++++++++++++++++++++++++++++++----
> >  drivers/cxl/core/port.c |   43 +++++++++++++++++++++++++++++++++++++------
> >  drivers/cxl/cxl.h       |   15 +++++++++++++--
> >  3 files changed, 86 insertions(+), 12 deletions(-)
> > 
> > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> > index 003fa4fde357..5972f380cdf2 100644
> > --- a/drivers/cxl/acpi.c
> > +++ b/drivers/cxl/acpi.c
> > @@ -82,7 +82,7 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
> >  	int target_map[CXL_DECODER_MAX_INTERLEAVE];
> >  	struct cxl_cfmws_context *ctx = arg;
> >  	struct cxl_port *root_port = ctx->root_port;
> > -	struct cxl_switch_decoder *cxlsd;
> > +	struct cxl_root_decoder *cxlrd;
> >  	struct device *dev = ctx->dev;
> >  	struct acpi_cedt_cfmws *cfmws;
> >  	struct resource *cxl_res;
> > @@ -128,11 +128,11 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
> >  	if (rc)
> >  		goto err_insert;
> >  
> > -	cxlsd = cxl_root_decoder_alloc(root_port, ways);
> > -	if (IS_ERR(cxld))
> > +	cxlrd = cxl_root_decoder_alloc(root_port, ways);
> > +	if (IS_ERR(cxlrd))
> >  		return 0;
> >  
> > -	cxld = &cxlsd->cxld;
> > +	cxld = &cxlrd->cxlsd.cxld;
> >  	cxld->flags = cfmws_to_decoder_flags(cfmws->restrictions);
> >  	cxld->target_type = CXL_DECODER_EXPANDER;
> >  	cxld->hpa_range = (struct range) {
> > @@ -375,6 +375,32 @@ static int add_cxl_resources(struct resource *cxl)
> >  	return 0;
> >  }
> >  
> > +static int pair_cxl_resource(struct device *dev, void *data)
> > +{
> > +	struct resource *cxl_res = data;
> > +	struct resource *p;
> > +
> > +	if (!is_root_decoder(dev))
> > +		return 0;
> > +
> > +	for (p = cxl_res->child; p; p = p->sibling) {
> > +		struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
> > +		struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
> > +		struct resource res = {
> > +			.start = cxld->hpa_range.start,
> > +			.end = cxld->hpa_range.end,
> > +			.flags = IORESOURCE_MEM,
> > +		};
> > +
> > +		if (resource_contains(p, &res)) {
> > +			cxlrd->res = (struct resource *)p->desc;
> > +			break;
> > +		}
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> >  static int cxl_acpi_probe(struct platform_device *pdev)
> >  {
> >  	int rc;
> > @@ -425,6 +451,12 @@ static int cxl_acpi_probe(struct platform_device *pdev)
> >  	if (rc)
> >  		return rc;
> >  
> > +	/*
> > +	 * Populate the root decoders with their related iomem resource,
> > +	 * if present
> > +	 */
> > +	device_for_each_child(&root_port->dev, cxl_res, pair_cxl_resource);
> > +
> >  	/*
> >  	 * Root level scanned with host-bridge as dports, now scan host-bridges
> >  	 * for their role as CXL uports to their CXL-capable PCIe Root Ports.
> > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > index fd1cac13cd2e..abf3455c4eff 100644
> > --- a/drivers/cxl/core/port.c
> > +++ b/drivers/cxl/core/port.c
> > @@ -259,6 +259,23 @@ static void cxl_switch_decoder_release(struct device *dev)
> >  	kfree(cxlsd);
> >  }
> >  
> > +struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev)
> > +{
> > +	if (dev_WARN_ONCE(dev, !is_root_decoder(dev),
> > +			  "not a cxl_root_decoder device\n"))
> > +		return NULL;
> > +	return container_of(dev, struct cxl_root_decoder, cxlsd.cxld.dev);
> > +}
> > +EXPORT_SYMBOL_NS_GPL(to_cxl_root_decoder, CXL);
> > +
> > +static void cxl_root_decoder_release(struct device *dev)
> > +{
> > +	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
> > +
> > +	__cxl_decoder_release(&cxlrd->cxlsd.cxld);
> > +	kfree(cxlrd);
> > +}
> > +
> >  static const struct device_type cxl_decoder_endpoint_type = {
> >  	.name = "cxl_decoder_endpoint",
> >  	.release = cxl_decoder_release,
> > @@ -273,7 +290,7 @@ static const struct device_type cxl_decoder_switch_type = {
> >  
> >  static const struct device_type cxl_decoder_root_type = {
> >  	.name = "cxl_decoder_root",
> > -	.release = cxl_switch_decoder_release,
> > +	.release = cxl_root_decoder_release,
> >  	.groups = cxl_decoder_root_attribute_groups,
> >  };
> >  
> > @@ -1218,9 +1235,23 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
> >  
> >  	if (nr_targets) {
> >  		struct cxl_switch_decoder *cxlsd;
> > +		struct cxl_root_decoder *cxlrd;
> > +
> > +		if (is_cxl_root(port)) {
> > +			alloc = kzalloc(struct_size(cxlrd, cxlsd.target,
> > +						    nr_targets),
> > +					GFP_KERNEL);
> > +			cxlrd = alloc;
> > +			if (cxlrd)
> > +				cxlsd = &cxlrd->cxlsd;
> > +			else
> > +				cxlsd = NULL;
> > +		} else {
> > +			alloc = kzalloc(struct_size(cxlsd, target, nr_targets),
> > +					GFP_KERNEL);
> > +			cxlsd = alloc;
> 
> As earlier, I'd prefer you just handled errors when they happened rather than
> dancing onwards...

Yes, this gets cleaned up with moving the allocation to
cxl_root_decoder_alloc() directly.

> 
> > +		}
> >  
> > -		alloc = kzalloc(struct_size(cxlsd, target, nr_targets), GFP_KERNEL);
> > -		cxlsd = alloc;
> >  		if (cxlsd) {
> >  			cxlsd->nr_targets = nr_targets;
> >  			seqlock_init(&cxlsd->target_lock);
> > @@ -1279,8 +1310,8 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
> >   * firmware description of CXL resources into a CXL standard decode
> >   * topology.
> >   */
> > -struct cxl_switch_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
> > -						  unsigned int nr_targets)
> > +struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
> > +						unsigned int nr_targets)
> >  {
> >  	struct cxl_decoder *cxld;
> >  
> > @@ -1290,7 +1321,7 @@ struct cxl_switch_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
> >  	cxld = cxl_decoder_alloc(port, nr_targets);
> >  	if (IS_ERR(cxld))
> >  		return ERR_CAST(cxld);
> > -	return to_cxl_switch_decoder(&cxld->dev);
> > +	return to_cxl_root_decoder(&cxld->dev);
> >  }
> >  EXPORT_SYMBOL_NS_GPL(cxl_root_decoder_alloc, CXL);
> >  
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index 7525b55b11bb..6dd1e4c57a67 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -253,6 +253,16 @@ struct cxl_switch_decoder {
> >  	struct cxl_dport *target[];
> >  };
> >  
> > +/**
> > + * struct cxl_root_decoder - Static platform CXL address decoder
> > + * @res: host / parent resource for region allocations
> > + * @cxlsd: base cxl switch decoder
> > + */
> > +struct cxl_root_decoder {
> > +	struct resource *res;
> > +	struct cxl_switch_decoder cxlsd;
> 
> It could be kinder to those container_of() macros to just put the cxlsd first.

Not possible. @cxlsd needs to be the last attribute because it has a
variably sized flex-array at its end.
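
To make the constraint concrete, here is a minimal userspace sketch. The
struct and function names are simplified, hypothetical mirrors of the
ones in the patch, and container_of() is re-implemented with offsetof()
rather than taken from the kernel headers. Note that embedding a struct
that ends in a flexible array inside another struct is a GNU C
extension, accepted by gcc and clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical mirror of a switch decoder: ends in a flexible array. */
struct sw_decoder {
	int nr_targets;
	void *target[];		/* flex-array: must be the last member */
};

/* Hypothetical mirror of a root decoder wrapping the switch decoder. */
struct root_decoder {
	void *res;
	struct sw_decoder cxlsd;	/* carries the flex-array, so it goes last */
};

/* Userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* One allocation covers the outer struct plus @n trailing target slots. */
static struct root_decoder *root_decoder_alloc(int n)
{
	struct root_decoder *rd;

	rd = calloc(1, sizeof(*rd) + (size_t)n * sizeof(rd->cxlsd.target[0]));
	if (!rd)
		return NULL;
	rd->cxlsd.nr_targets = n;
	return rd;
}

/* Works for a non-zero member offset; cxlsd need not come first. */
static struct root_decoder *to_root_decoder(struct sw_decoder *sd)
{
	return container_of(sd, struct root_decoder, cxlsd);
}
```

The trailing target[] slots live past the end of the outer struct, so
any member placed after cxlsd would alias them. The only cost of the
ordering is that container_of() subtracts a non-zero offset.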

* Re: [PATCH 11/46] cxl/core: Define a 'struct cxl_endpoint_decoder' for tracking DPA resources
  2022-06-28 16:55   ` Jonathan Cameron
@ 2022-07-10  2:40     ` Dan Williams
  0 siblings, 0 replies; 157+ messages in thread
From: Dan Williams @ 2022-07-10  2:40 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, Ben Widawsky, hch, alison.schofield, nvdimm,
	linux-pci, patches

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 19:46:21 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > Previously the target routing specifics of switch decoders and platform
> > CXL window resource tracking of root decoders were factored out of
> > 'struct cxl_decoder'. While switch decoders translate from SPA to
> > downstream ports, endpoint decoders translate from SPA to DPA.
> > 
> > This patch, 3 of 3, adds a 'struct cxl_endpoint_decoder' that tracks an
> > endpoint-specific Device Physical Address (DPA) resource. For now this
> > just defines ->dpa_res, a follow-on patch will handle requesting DPA
> > resource ranges from a device-DPA resource tree.
> > 
> > Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
> > Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  drivers/cxl/core/hdm.c       |   12 +++++++++---
> >  drivers/cxl/core/port.c      |   36 +++++++++++++++++++++++++++---------
> >  drivers/cxl/cxl.h            |   15 ++++++++++++++-
> >  tools/testing/cxl/test/cxl.c |   11 +++++++++--
> >  4 files changed, 59 insertions(+), 15 deletions(-)
> > 
> 
> 
> 
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index 6dd1e4c57a67..579f2d802396 100644
> 
> 
> >  int cxl_decoder_add(struct cxl_decoder *cxld, int *target_map);
> > -struct cxl_decoder *cxl_endpoint_decoder_alloc(struct cxl_port *port);
> > +struct cxl_endpoint_decoder *cxl_endpoint_decoder_alloc(struct cxl_port *port);
> >  int cxl_decoder_add_locked(struct cxl_decoder *cxld, int *target_map);
> >  int cxl_decoder_autoremove(struct device *host, struct cxl_decoder *cxld);
> >  int cxl_endpoint_autoremove(struct cxl_memdev *cxlmd, struct cxl_port *endpoint);
> > diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
> > index 68288354b419..f52a5dd69d36 100644
> > --- a/tools/testing/cxl/test/cxl.c
> > +++ b/tools/testing/cxl/test/cxl.c
> > @@ -459,8 +459,15 @@ static int mock_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
> >  				cxld = ERR_CAST(cxlsd);
> >  			else
> >  				cxld = &cxlsd->cxld;
> > -		} else
> > -			cxld = cxl_endpoint_decoder_alloc(port);
> > +		} else {
> > +			struct cxl_endpoint_decoder *cxled;
> > +
> > +			cxled = cxl_endpoint_decoder_alloc(port);
> > +			if (IS_ERR(cxled))
> > +				cxld = ERR_CAST(cxled);
> 
> It's my favourite code pattern to moan about today :)
> Same thing - just handle the error here and it'll be easier to read for the cost
> of a few lines of additional code.  A few other cases of it in here.

Done and done.

* Re: [PATCH 14/46] cxl/hdm: Enumerate allocated DPA
  2022-06-29 14:43   ` Jonathan Cameron
@ 2022-07-10  3:03     ` Dan Williams
  2022-07-19 14:25       ` Jonathan Cameron
  0 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-07-10  3:03 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, Ben Widawsky, hch, alison.schofield, nvdimm,
	linux-pci, patches

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 19:46:44 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > In preparation for provisioning CXL regions, add accounting for the DPA
> > space consumed by existing regions / decoders. Recall, a CXL region is a
> > memory range comprised of one or more endpoint devices contributing a
> > mapping of their DPA into HPA space through a decoder.
> > 
> > Record the DPA ranges covered by committed decoders at initial probe of
> > endpoint ports relative to a per-device resource tree of the DPA type
> > (pmem or volatile-ram).
> > 
> > The cxl_dpa_rwsem semaphore is introduced to globally synchronize DPA
> > state across all endpoints and their decoders at once. The vast majority
> > of DPA operations are reads as region creation is expected to be as rare
> > as disk partitioning and volume creation. The device_lock() for this
> > synchronization is specifically avoided for concern of entangling with
> > sysfs attribute removal.
> > 
> > Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
> > Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  drivers/cxl/core/hdm.c |  148 ++++++++++++++++++++++++++++++++++++++++++++----
> >  drivers/cxl/cxl.h      |    2 +
> >  drivers/cxl/cxlmem.h   |   13 ++++
> >  3 files changed, 152 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> > index c940a4911fee..daae6e533146 100644
> > --- a/drivers/cxl/core/hdm.c
> > +++ b/drivers/cxl/core/hdm.c
> > @@ -7,6 +7,8 @@
> >  #include "cxlmem.h"
> >  #include "core.h"
> >  
> > +static DECLARE_RWSEM(cxl_dpa_rwsem);
> 
> I've not checked many files, but pci.c has equivalent static defines after
> the DOC: entry so for consistency move this below that?

ok.

> 
> 
> > +
> >  /**
> >   * DOC: cxl core hdm
> >   *
> > @@ -128,10 +130,108 @@ struct cxl_hdm *devm_cxl_setup_hdm(struct cxl_port *port)
> >  }
> >  EXPORT_SYMBOL_NS_GPL(devm_cxl_setup_hdm, CXL);
> >  
> > +/*
> > + * Must be called in a context that synchronizes against this decoder's
> > + * port ->remove() callback (like an endpoint decoder sysfs attribute)
> > + */
> > +static void cxl_dpa_release(void *cxled);
> > +static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled, bool remove_action)
> > +{
> > +	struct cxl_port *port = cxled_to_port(cxled);
> > +	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> > +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > +	struct resource *res = cxled->dpa_res;
> > +
> > +	lockdep_assert_held_write(&cxl_dpa_rwsem);
> > +
> > +	if (remove_action)
> > +		devm_remove_action(&port->dev, cxl_dpa_release, cxled);
> 
> This code organization is more surprising than I'd like. Why not move this to
> a wrapper that is like devm_kfree() and similar which do the free now and
> remove from the devm list?

True. I see how this got here incrementally, but this end state can
definitely now be fixed up to be more devm idiomatic.

> 
> static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
> {
> 	struct cxl_port *port = cxled_to_port(cxled);
> 	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> 	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> 	struct resource *res = cxled->dpa_res;
> 
> 	if (cxled->skip)
> 		__release_region(&cxlds->dpa_res, res->start - cxled->skip,
> 				 cxled->skip);
> 	cxled->skip = 0;
> 	__release_region(&cxlds->dpa_res, res->start, resource_size(res));
> 	cxled->dpa_res = NULL;
> }
> 
> /* possibly add some underscores to this name to indicate it's special
>    in when you can safely call it */
> static void devm_cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
> {
> 	struct cxl_port *port = cxled_to_port(cxled);
> 	lockdep_assert_held_write(&cxl_dpa_rwsem);
> 	devm_remove_action(&port->dev, cxl_dpa_release, cxled);
> 	__cxl_dpa_release(cxled);
> }
> 
> static void cxl_dpa_release(void *cxled)
> {
> 	down_write(&cxl_dpa_rwsem);
> 	__cxl_dpa_release(cxled);
> 	up_write(&cxl_dpa_rwsem);
> }
> 
> > +
> > +	if (cxled->skip)
> > +		__release_region(&cxlds->dpa_res, res->start - cxled->skip,
> > +				 cxled->skip);
> > +	cxled->skip = 0;
> > +	__release_region(&cxlds->dpa_res, res->start, resource_size(res));
> > +	cxled->dpa_res = NULL;
> > +}
> > +
> > +static void cxl_dpa_release(void *cxled)
> > +{
> > +	down_write(&cxl_dpa_rwsem);
> > +	__cxl_dpa_release(cxled, false);
> > +	up_write(&cxl_dpa_rwsem);
> > +}
> > +
> > +static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> > +			     resource_size_t base, resource_size_t len,
> > +			     resource_size_t skip)
> > +{
> > +	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> > +	struct cxl_port *port = cxled_to_port(cxled);
> > +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > +	struct device *dev = &port->dev;
> > +	struct resource *res;
> > +
> > +	lockdep_assert_held_write(&cxl_dpa_rwsem);
> > +
> > +	if (!len)
> > +		return 0;
> > +
> > +	if (cxled->dpa_res) {
> > +		dev_dbg(dev, "decoder%d.%d: existing allocation %pr assigned\n",
> > +			port->id, cxled->cxld.id, cxled->dpa_res);
> > +		return -EBUSY;
> > +	}
> > +
> > +	if (skip) {
> > +		res = __request_region(&cxlds->dpa_res, base - skip, skip,
> > +				       dev_name(dev), 0);
> 
> 
> An interface that uses a backwards definition of skip, as what to skip
> before the base parameter, is a little odd. Can we rename the base
> parameter to something like 'current_top' and then have
> base = current_top + skip?  current_top naming not great though...

How about just naming it "skipped" instead of "skip"? The parameter is
how many bytes were skipped to allow a new allocation to start at base.
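
To illustrate the "skipped" reading, here is a small userspace sketch of
the two spans that end up reserved. 'struct dpa_span', 'struct dpa_plan'
and dpa_plan_init() are hypothetical names for illustration only; the
real code calls __request_region() against the device's DPA resource
tree. Unlike the patch, which returns 0 for a zero-length request, this
sketch simply rejects it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical half-open span [start, start + len) in DPA space. */
struct dpa_span {
	uint64_t start;
	uint64_t len;
};

/* The two reservations implied by one (base, len, skipped) request. */
struct dpa_plan {
	struct dpa_span skipped;	/* [base - skipped, base) */
	struct dpa_span alloc;		/* [base, base + len) */
};

/*
 * @skipped counts the bytes passed over *before* @base so that the new
 * allocation can start at @base. Returns 0 on success, -1 on an empty
 * request or when @skipped would reach below DPA 0.
 */
static int dpa_plan_init(struct dpa_plan *plan, uint64_t base, uint64_t len,
			 uint64_t skipped)
{
	if (!len || skipped > base)
		return -1;

	plan->skipped.start = base - skipped;
	plan->skipped.len = skipped;
	plan->alloc.start = base;
	plan->alloc.len = len;
	return 0;
}
```

For example, base = 0x4000 with skipped = 0x1000 reserves
[0x3000, 0x4000) as the skipped span and starts the allocation at
0x4000, which matches the 'base - skip' arithmetic in the patch.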

* Re: [PATCH 16/46] cxl/hdm: Add 'mode' attribute to decoder objects
  2022-06-29 15:28   ` Jonathan Cameron
@ 2022-07-10  3:45     ` Dan Williams
  0 siblings, 0 replies; 157+ messages in thread
From: Dan Williams @ 2022-07-10  3:45 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 19:46:59 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > Recall that the Device Physical Address (DPA) space of a CXL Memory
> > Expander is potentially partitioned into a volatile and persistent
> > portion. A decoder maps a Host Physical Address (HPA) range to a DPA
> > range and that translation depends on the value of all previous (lower
> > instance number) decoders before the current one.
> > 
> > In preparation for allowing dynamic provisioning of regions, decoders
> > need an ABI to indicate which DPA partition a decoder targets. This ABI
> > needs to be prepared for the possibility that some other agent committed
> > and locked a decoder that spans the partition boundary.
> > 
> > Add 'decoderX.Y/mode' to endpoint decoders that indicates which
> > partition 'ram' / 'pmem' the decoder targets, or 'mixed' if the decoder
> > currently spans the partition boundary.
> > 
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> 
> A few trivial things inline though I'm not super keen on it being
> introduced RO for just 2 patches...  You could pull forwards
> the outline of the store() to avoid that slight oddity, but
> I'm not that bothered if it is a pain to do.

It's either RO as a temporary state, or a pointless store() as a
temporary state; I think the former is more palatable for bisecting.

> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> > ---
> >  Documentation/ABI/testing/sysfs-bus-cxl |   16 ++++++++++++++++
> >  drivers/cxl/core/hdm.c                  |   10 ++++++++++
> >  drivers/cxl/core/port.c                 |   20 ++++++++++++++++++++
> >  drivers/cxl/cxl.h                       |    9 +++++++++
> >  4 files changed, 55 insertions(+)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> > index 1fd5984b6158..091459216e11 100644
> > --- a/Documentation/ABI/testing/sysfs-bus-cxl
> > +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> > @@ -164,3 +164,19 @@ Description:
> >  		expander memory (type-3). The 'target_type' attribute indicates
> >  		the current setting which may dynamically change based on what
> >  		memory regions are activated in this decode hierarchy.
> > +
> > +
> 
> Single blank line used for previous entries. Note this carries to other
> later patches.

It was deliberate, I should go back and fix up the previous ones to be
consistent.

> 
> 
> > +What:		/sys/bus/cxl/devices/decoderX.Y/mode
> > +Date:		May, 2022
> > +KernelVersion:	v5.20
> > +Contact:	linux-cxl@vger.kernel.org
> > +Description:
> > +		(RO) When a CXL decoder is of devtype "cxl_decoder_endpoint" it
> > +		translates from a host physical address range, to a device local
> > +		address range. Device-local address ranges are further split
> > +		into a 'ram' (volatile memory) range and 'pmem' (persistent
> > +		memory) range. The 'mode' attribute emits one of 'ram', 'pmem',
> > +		'mixed', or 'none'. The 'mixed' indication is for error cases
> > +		when a decoder straddles the volatile/persistent partition
> > +		boundary, and 'none' indicates the decoder is not actively
> > +		decoding, or no DPA allocation policy has been set.
> > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> > index daae6e533146..3f929231b822 100644
> > --- a/drivers/cxl/core/hdm.c
> > +++ b/drivers/cxl/core/hdm.c
> > @@ -204,6 +204,16 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> >  	cxled->dpa_res = res;
> >  	cxled->skip = skip;
> >  
> > +	if (resource_contains(&cxlds->pmem_res, res))
> > +		cxled->mode = CXL_DECODER_PMEM;
> > +	else if (resource_contains(&cxlds->ram_res, res))
> > +		cxled->mode = CXL_DECODER_RAM;
> > +	else {
> > +		dev_dbg(dev, "decoder%d.%d: %pr mixed\n", port->id,
> > +			cxled->cxld.id, cxled->dpa_res);
> 
> Why debug for one case and not the others?

It's an exceptional, "should never happen" case, but can be benign so
it does not rise to the level of dev_warn(). However, if someone is
reporting a bug and I ask for a debug log, this is something odd that
I'd like to see highlighted.
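
As a rough userspace sketch of the classification in that hunk: the enum
and helper names below are hypothetical, the partition ranges are made
up, and the 'none' case is modeled as "no allocation" for illustration.
The "mixed" result is simply what falls out when the allocation fits in
neither partition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the decoder modes named in the ABI text. */
enum decoder_mode { MODE_NONE, MODE_RAM, MODE_PMEM, MODE_MIXED };

/* Inclusive DPA range, standing in for 'struct resource'. */
struct dpa_range {
	uint64_t start;
	uint64_t end;
};

static bool range_contains(const struct dpa_range *outer,
			   const struct dpa_range *inner)
{
	return outer->start <= inner->start && outer->end >= inner->end;
}

/*
 * Classify an allocation @res against the device's @ram and @pmem
 * partitions. A NULL @res models a decoder with no DPA allocation.
 */
static enum decoder_mode classify(const struct dpa_range *ram,
				  const struct dpa_range *pmem,
				  const struct dpa_range *res)
{
	if (!res)
		return MODE_NONE;
	if (range_contains(pmem, res))
		return MODE_PMEM;
	if (range_contains(ram, res))
		return MODE_RAM;
	/* Straddles the partition boundary: the "should never happen" case. */
	return MODE_MIXED;
}
```

Only the final fall-through branch corresponds to the dev_dbg() in the
patch, since the other outcomes are the expected ones.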

* Re: [PATCH 17/46] cxl/hdm: Track next decoder to allocate
  2022-06-29 15:31   ` Jonathan Cameron
@ 2022-07-10  3:55     ` Dan Williams
  2022-07-19 14:27       ` Jonathan Cameron
  2022-07-10 16:34     ` Dan Williams
  1 sibling, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-07-10  3:55 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 19:47:07 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > The CXL specification enforces that endpoint decoders are committed in
> > hw instance id order. In preparation for adding dynamic DPA allocation,
> > record the hw instance id in endpoint decoders, and enforce allocations
> > to occur in hw instance id order.
> > 
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> 
> dpa_end isn't a good name given the value isn't a Device Physical Address.
> 
> Otherwise looks fine,
> 
> Jonathan
> 
> > ---
> >  drivers/cxl/core/hdm.c  |   14 ++++++++++++++
> >  drivers/cxl/core/port.c |    1 +
> >  drivers/cxl/cxl.h       |    2 ++
> >  3 files changed, 17 insertions(+)
> > 
> > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> > index 3f929231b822..8805afe63ebf 100644
> > --- a/drivers/cxl/core/hdm.c
> > +++ b/drivers/cxl/core/hdm.c
> > @@ -153,6 +153,7 @@ static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled, bool remove_ac
> >  	cxled->skip = 0;
> >  	__release_region(&cxlds->dpa_res, res->start, resource_size(res));
> >  	cxled->dpa_res = NULL;
> > +	port->dpa_end--;
> >  }
> >  
> >  static void cxl_dpa_release(void *cxled)
> > @@ -183,6 +184,18 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> >  		return -EBUSY;
> >  	}
> >  
> > +	if (port->dpa_end + 1 != cxled->cxld.id) {
> > +		/*
> > +		 * Assumes alloc and commit order is always in hardware instance
> > +		 * order per expectations from 8.2.5.12.20 Committing Decoder
> > +		 * Programming that enforce decoder[m] committed before
> > +		 * decoder[m+1] commit start.
> > +		 */
> > +		dev_dbg(dev, "decoder%d.%d: expected decoder%d.%d\n", port->id,
> > +			cxled->cxld.id, port->id, port->dpa_end + 1);
> > +		return -EBUSY;
> > +	}
> > +
> >  	if (skip) {
> >  		res = __request_region(&cxlds->dpa_res, base - skip, skip,
> >  				       dev_name(dev), 0);
> > @@ -213,6 +226,7 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> >  			cxled->cxld.id, cxled->dpa_res);
> >  		cxled->mode = CXL_DECODER_MIXED;
> >  	}
> > +	port->dpa_end++;
> >  
> >  	return 0;
> >  }
> > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > index 9d632c8c580b..54bf032cbcb7 100644
> > --- a/drivers/cxl/core/port.c
> > +++ b/drivers/cxl/core/port.c
> > @@ -485,6 +485,7 @@ static struct cxl_port *cxl_port_alloc(struct device *uport,
> >  	port->uport = uport;
> >  	port->component_reg_phys = component_reg_phys;
> >  	ida_init(&port->decoder_ida);
> > +	port->dpa_end = -1;
> >  	INIT_LIST_HEAD(&port->dports);
> >  	INIT_LIST_HEAD(&port->endpoints);
> >  
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index aa223166f7ef..d8edbdaa6208 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -326,6 +326,7 @@ struct cxl_nvdimm {
> >   * @dports: cxl_dport instances referenced by decoders
> >   * @endpoints: cxl_ep instances, endpoints that are a descendant of this port
> >   * @decoder_ida: allocator for decoder ids
> > + * @dpa_end: cursor to track highest allocated decoder for allocation ordering
> 
> dpa_end not a good name as this isn't a Device Physical Address.

Ok, renamed it to 'hdm_end'. Suitable to add "Reviewed-by" now?

^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH 17/46] cxl/hdm: Track next decoder to allocate
  2022-06-29 15:31   ` Jonathan Cameron
  2022-07-10  3:55     ` Dan Williams
@ 2022-07-10 16:34     ` Dan Williams
  1 sibling, 0 replies; 157+ messages in thread
From: Dan Williams @ 2022-07-10 16:34 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 19:47:07 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > The CXL specification enforces that endpoint decoders are committed in
> > hw instance id order. In preparation for adding dynamic DPA allocation,
> > record the hw instance id in endpoint decoders, and enforce allocations
> > to occur in hw instance id order.
> > 
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> 
> dpa_end isn't a good name given the value isn't a Device Physical Address.
> 
> Otherwise looks fine,
> 
> Jonathan
> 
> > ---
> >  drivers/cxl/core/hdm.c  |   14 ++++++++++++++
> >  drivers/cxl/core/port.c |    1 +
> >  drivers/cxl/cxl.h       |    2 ++
> >  3 files changed, 17 insertions(+)
> > 
> > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> > index 3f929231b822..8805afe63ebf 100644
> > --- a/drivers/cxl/core/hdm.c
> > +++ b/drivers/cxl/core/hdm.c
> > @@ -153,6 +153,7 @@ static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled, bool remove_ac
> >  	cxled->skip = 0;
> >  	__release_region(&cxlds->dpa_res, res->start, resource_size(res));
> >  	cxled->dpa_res = NULL;
> > +	port->dpa_end--;
> >  }
> >  
> >  static void cxl_dpa_release(void *cxled)
> > @@ -183,6 +184,18 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> >  		return -EBUSY;
> >  	}
> >  
> > +	if (port->dpa_end + 1 != cxled->cxld.id) {
> > +		/*
> > +		 * Assumes alloc and commit order is always in hardware instance
> > +		 * order per expectations from 8.2.5.12.20 Committing Decoder
> > +		 * Programming that enforce decoder[m] committed before
> > +		 * decoder[m+1] commit start.
> > +		 */
> > +		dev_dbg(dev, "decoder%d.%d: expected decoder%d.%d\n", port->id,
> > +			cxled->cxld.id, port->id, port->dpa_end + 1);
> > +		return -EBUSY;
> > +	}
> > +
> >  	if (skip) {
> >  		res = __request_region(&cxlds->dpa_res, base - skip, skip,
> >  				       dev_name(dev), 0);
> > @@ -213,6 +226,7 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> >  			cxled->cxld.id, cxled->dpa_res);
> >  		cxled->mode = CXL_DECODER_MIXED;
> >  	}
> > +	port->dpa_end++;
> >  
> >  	return 0;
> >  }
> > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > index 9d632c8c580b..54bf032cbcb7 100644
> > --- a/drivers/cxl/core/port.c
> > +++ b/drivers/cxl/core/port.c
> > @@ -485,6 +485,7 @@ static struct cxl_port *cxl_port_alloc(struct device *uport,
> >  	port->uport = uport;
> >  	port->component_reg_phys = component_reg_phys;
> >  	ida_init(&port->decoder_ida);
> > +	port->dpa_end = -1;
> >  	INIT_LIST_HEAD(&port->dports);
> >  	INIT_LIST_HEAD(&port->endpoints);
> >  
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index aa223166f7ef..d8edbdaa6208 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -326,6 +326,7 @@ struct cxl_nvdimm {
> >   * @dports: cxl_dport instances referenced by decoders
> >   * @endpoints: cxl_ep instances, endpoints that are a descendant of this port
> >   * @decoder_ida: allocator for decoder ids
> > + * @dpa_end: cursor to track highest allocated decoder for allocation ordering
> 
> dpa_end not a good name as this isn't a Device Physical Address.

Ok, renamed it like this:

diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 546a022ef17f..22b7fc8ed510 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -333,7 +333,7 @@ struct cxl_nvdimm {
  * @dports: cxl_dport instances referenced by decoders
  * @endpoints: cxl_ep instances, endpoints that are a descendant of this port
  * @decoder_ida: allocator for decoder ids
- * @dpa_end: cursor to track highest allocated decoder for allocation ordering
+ * @hdm_end: track last allocated HDM decoder instance for allocation ordering
  * @component_reg_phys: component register capability base address (optional)
  * @dead: last ep has been removed, force port re-creation
  * @depth: How deep this port is relative to the root. depth 0 is the root.
@@ -345,7 +345,7 @@ struct cxl_port {
        struct list_head dports;
        struct list_head endpoints;
        struct ida decoder_ida;
-       int dpa_end;
+       int hdm_end;
        resource_size_t component_reg_phys;
        bool dead;
        unsigned int depth;
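
A minimal userspace model of the ordering rule that the renamed 'hdm_end'
cursor enforces, for illustration only. The names port_model,
decoder_reserve(), and decoder_release() are hypothetical; in the patch
itself the release-side check lives in cxl_dpa_free() (patch 18), not in
__cxl_dpa_release().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of the per-port cursor; -1 means nothing allocated. */
struct port_model {
	int hdm_end;
};

void port_model_init(struct port_model *port)
{
	port->hdm_end = -1;
}

/* Allocation succeeds only for the decoder right after the cursor. */
int decoder_reserve(struct port_model *port, int decoder_id)
{
	assert(port->hdm_end >= -1);
	if (port->hdm_end + 1 != decoder_id)
		return -EBUSY;
	port->hdm_end++;
	return 0;
}

/* Release succeeds only for the decoder the cursor currently points at. */
int decoder_release(struct port_model *port, int decoder_id)
{
	assert(port->hdm_end >= -1);
	if (port->hdm_end < 0 || decoder_id != port->hdm_end)
		return -EBUSY;
	port->hdm_end--;
	return 0;
}
```

With this model, reserving decoder 1 before decoder 0 fails with -EBUSY, and
releases must unwind from the highest allocated instance downward.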

^ permalink raw reply related	[flat|nested] 157+ messages in thread

* Re: [PATCH 18/46] cxl/hdm: Add support for allocating DPA to an endpoint decoder
  2022-06-29 15:56   ` Jonathan Cameron
@ 2022-07-10 16:53     ` Dan Williams
  0 siblings, 0 replies; 157+ messages in thread
From: Dan Williams @ 2022-07-10 16:53 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 19:47:18 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > The region provisioning flow will roughly follow a sequence of:
> > 
> > 1/ Allocate DPA to a set of decoders
> > 
> > 2/ Allocate HPA to a region
> > 
> > 3/ Associate decoders with a region and validate that the DPA allocations
> >    and topologies match the parameters of the region.
> > 
> > For now, this change (step 1) arranges for DPA capacity to be allocated
> > and deleted from non-committed decoders based on the decoder's mode /
> > partition selection. Capacity is allocated from the lowest DPA in the
> > partition and any 'pmem' allocation blocks out all remaining ram
> > capacity in its 'skip' setting. DPA allocations are enforced in decoder
> > instance order. I.e. decoder N + 1 always starts at a higher DPA than
> > instance N, and deleting allocations must proceed from the
> > highest-instance allocated decoder to the lowest.
> > 
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> 
> The error value setting in here might save a few lines, but to me it
> is less readable than setting rc in each error path.
> 
> > ---
> >  Documentation/ABI/testing/sysfs-bus-cxl |   37 +++++++
> >  drivers/cxl/core/core.h                 |    7 +
> >  drivers/cxl/core/hdm.c                  |  160 +++++++++++++++++++++++++++++++
> >  drivers/cxl/core/port.c                 |   73 ++++++++++++++
> >  4 files changed, 275 insertions(+), 2 deletions(-)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> > index 091459216e11..85844f9bc00b 100644
> > --- a/Documentation/ABI/testing/sysfs-bus-cxl
> > +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> > @@ -171,7 +171,7 @@ Date:		May, 2022
> >  KernelVersion:	v5.20
> >  Contact:	linux-cxl@vger.kernel.org
> >  Description:
> > -		(RO) When a CXL decoder is of devtype "cxl_decoder_endpoint" it
> > +		(RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it
> >  		translates from a host physical address range, to a device local
> >  		address range. Device-local address ranges are further split
> >  		into a 'ram' (volatile memory) range and 'pmem' (persistent
> > @@ -180,3 +180,38 @@ Description:
> >  		when a decoder straddles the volatile/persistent partition
> >  		boundary, and 'none' indicates the decoder is not actively
> >  		decoding, or no DPA allocation policy has been set.
> > +
> > +		'mode' can be written, when the decoder is in the 'disabled'
> > +		state, with either 'ram' or 'pmem' to set the boundaries for the
> > +		next allocation.
> > +
> 
> As before, documentation above this in the file only uses single line break between
> entries.

Yeah, I'll go fix those up as a precursor patch.

> 
> > +
> > +What:		/sys/bus/cxl/devices/decoderX.Y/dpa_resource
> > +Date:		May, 2022
> > +KernelVersion:	v5.20
> > +Contact:	linux-cxl@vger.kernel.org
> > +Description:
> > +		(RO) When a CXL decoder is of devtype "cxl_decoder_endpoint",
> > +		and its 'dpa_size' attribute is non-zero, this attribute
> > +		indicates the device physical address (DPA) base address of the
> > +		allocation.
> 
> Why _resource rather than _base or _start?

To mimic PCI and NVDIMM sysfs that calls it a 'resource' address.

> 
> > +
> > +
> > +What:		/sys/bus/cxl/devices/decoderX.Y/dpa_size
> > +Date:		May, 2022
> > +KernelVersion:	v5.20
> > +Contact:	linux-cxl@vger.kernel.org
> > +Description:
> > +		(RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it
> > +		translates from a host physical address range, to a device local
> > +		address range. The range, base address plus length in bytes, of
> > +		DPA allocated to this decoder is conveyed in these 2 attributes.
> > +		Allocations can be mutated as long as the decoder is in the
> > +		disabled state. A write to 'size' releases the previous DPA
> 
> 'dpa_size' ?

Yes.

> 
> > +		allocation and then attempts to allocate from the free capacity
> > +		in the device partition referred to by 'decoderX.Y/mode'.
> > +		Allocate and free requests can only be performed on the highest
> > +		instance number disabled decoder with non-zero size. I.e.
> > +		allocations are enforced to occur in increasing 'decoderX.Y/id'
> > +		order and frees are enforced to occur in decreasing
> > +		'decoderX.Y/id' order.
> > diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> > index 1a50c0fc399c..47cf0c286fc3 100644
> > --- a/drivers/cxl/core/core.h
> > +++ b/drivers/cxl/core/core.h
> > @@ -17,6 +17,13 @@ int cxl_send_cmd(struct cxl_memdev *cxlmd, struct cxl_send_command __user *s);
> >  void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
> >  				   resource_size_t length);
> >  
> > +int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
> > +		     enum cxl_decoder_mode mode);
> > +int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size);
> > +int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
> > +resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled);
> > +resource_size_t cxl_dpa_resource(struct cxl_endpoint_decoder *cxled);
> > +
> >  int cxl_memdev_init(void);
> >  void cxl_memdev_exit(void);
> >  void cxl_mbox_init(void);
> > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> > index 8805afe63ebf..ceb4c28abc1b 100644
> > --- a/drivers/cxl/core/hdm.c
> > +++ b/drivers/cxl/core/hdm.c
> > @@ -248,6 +248,166 @@ static int cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> >  	return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled);
> >  }
> >  
> > +resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled)
> > +{
> > +	resource_size_t size = 0;
> > +
> > +	down_read(&cxl_dpa_rwsem);
> > +	if (cxled->dpa_res)
> > +		size = resource_size(cxled->dpa_res);
> > +	up_read(&cxl_dpa_rwsem);
> > +
> > +	return size;
> > +}
> > +
> > +resource_size_t cxl_dpa_resource(struct cxl_endpoint_decoder *cxled)
> 
> Instinct would be to expect this to return the resource, not the start.
> Rename?

I can add _start, but this is only servicing the dpa_resource_show()
sysfs operation, and there 'resource' as the base address has ABI
history.

> 
> 
> > +{
> > +	resource_size_t base = -1;
> > +
> > +	down_read(&cxl_dpa_rwsem);
> > +	if (cxled->dpa_res)
> > +		base = cxled->dpa_res->start;
> > +	up_read(&cxl_dpa_rwsem);
> > +
> > +	return base;
> > +}
> > +
> > +int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
> > +{
> > +	int rc = -EBUSY;
> > +	struct device *dev = &cxled->cxld.dev;
> > +	struct cxl_port *port = to_cxl_port(dev->parent);
> > +
> > +	down_write(&cxl_dpa_rwsem);
> > +	if (!cxled->dpa_res) {
> > +		rc = 0;
> > +		goto out;
> > +	}
> > +	if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) {
> > +		dev_dbg(dev, "decoder enabled\n");
> 
> I'd prefer explicit setting of rc = -EBUSY in the two
> 'error' paths to make it really clear when looking at these
> that they are treated as errors.

Ok.

> 
> > +		goto out;
> > +	}
> > +	if (cxled->cxld.id != port->dpa_end) {
> > +		dev_dbg(dev, "expected decoder%d.%d\n", port->id,
> > +			port->dpa_end);
> > +		goto out;
> > +	}
> > +	__cxl_dpa_release(cxled, true);
> > +	rc = 0;
> > +out:
> > +	up_write(&cxl_dpa_rwsem);
> > +	return rc;
> > +}
> > +
> > +int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
> > +		     enum cxl_decoder_mode mode)
> > +{
> > +	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> > +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > +	struct device *dev = &cxled->cxld.dev;
> > +	int rc = -EBUSY;
> 
> > As above, I'd prefer seeing error set in each error path rather
> than it being set in a few locations and having to go look
> for which value it currently has.  To me having the
> error code next to the condition is much easier to follow.

Fair enough.

> 
> > +
> > +	switch (mode) {
> > +	case CXL_DECODER_RAM:
> > +	case CXL_DECODER_PMEM:
> > +		break;
> > +	default:
> > +		dev_dbg(dev, "unsupported mode: %d\n", mode);
> > +		return -EINVAL;
> > +	}
> > +
> > +	down_write(&cxl_dpa_rwsem);
> > +	if (cxled->cxld.flags & CXL_DECODER_F_ENABLE)
> > +		goto out;
> > +	/*
> > +	 * Only allow modes that are supported by the current partition
> > +	 * configuration
> > +	 */
> > +	rc = -ENXIO;
> > +	if (mode == CXL_DECODER_PMEM && !resource_size(&cxlds->pmem_res)) {
> > +		dev_dbg(dev, "no available pmem capacity\n");
> > +		goto out;
> > +	}
> > +	if (mode == CXL_DECODER_RAM && !resource_size(&cxlds->ram_res)) {
> > +		dev_dbg(dev, "no available ram capacity\n");
> > +		goto out;
> > +	}
> > +
> > +	cxled->mode = mode;
> > +	rc = 0;
> > +out:
> > +	up_write(&cxl_dpa_rwsem);
> > +
> > +	return rc;
> > +}
> > +
> > +int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
> > +{
> > +	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> > +	resource_size_t free_ram_start, free_pmem_start;
> > +	struct cxl_port *port = cxled_to_port(cxled);
> > +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > +	struct device *dev = &cxled->cxld.dev;
> > +	resource_size_t start, avail, skip;
> > +	struct resource *p, *last;
> > +	int rc = -EBUSY;
> > +
> > +	down_write(&cxl_dpa_rwsem);
> > +	if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) {
> > +		dev_dbg(dev, "decoder enabled\n");
> > +		goto out;
> 
> 
> -EBUSY is only used in this path, so it is clearer to me to push that
> setting down into this error path.

Ok.

Folded in these changes:

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 85844f9bc00b..3fa6da73751e 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -207,7 +207,7 @@ Description:
 		address range. The range, base address plus length in bytes, of
 		DPA allocated to this decoder is conveyed in these 2 attributes.
 		Allocations can be mutated as long as the decoder is in the
-		disabled state. A write to 'size' releases the previous DPA
+		disabled state. A write to 'dpa_size' releases the previous DPA
 		allocation and then attempts to allocate from the free capacity
 		in the device partition referred to by 'decoderX.Y/mode'.
 		Allocate and free requests can only be performed on the highest
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 47cf0c286fc3..65bcaecec405 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -22,7 +22,7 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
 int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size);
 int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
 resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled);
-resource_size_t cxl_dpa_resource(struct cxl_endpoint_decoder *cxled);
+resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled);
 
 int cxl_memdev_init(void);
 void cxl_memdev_exit(void);
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 0c1ff3c0142f..e9281557781d 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -270,7 +270,7 @@ resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled)
 	return size;
 }
 
-resource_size_t cxl_dpa_resource(struct cxl_endpoint_decoder *cxled)
+resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled)
 {
 	resource_size_t base = -1;
 
@@ -284,9 +284,9 @@ resource_size_t cxl_dpa_resource(struct cxl_endpoint_decoder *cxled)
 
 int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
 {
-	int rc = -EBUSY;
+	struct cxl_port *port = cxled_to_port(cxled);
 	struct device *dev = &cxled->cxld.dev;
-	struct cxl_port *port = to_cxl_port(dev->parent);
+	int rc;
 
 	down_write(&cxl_dpa_rwsem);
 	if (!cxled->dpa_res) {
@@ -295,11 +295,13 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
 	}
 	if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) {
 		dev_dbg(dev, "decoder enabled\n");
+		rc = -EBUSY;
 		goto out;
 	}
-	if (cxled->cxld.id != port->dpa_end) {
+	if (cxled->cxld.id != port->hdm_end) {
 		dev_dbg(dev, "expected decoder%d.%d\n", port->id,
-			port->dpa_end);
+			port->hdm_end);
+		rc = -EBUSY;
 		goto out;
 	}
 	devm_cxl_dpa_release(cxled);
@@ -315,7 +317,7 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
 	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
 	struct cxl_dev_state *cxlds = cxlmd->cxlds;
 	struct device *dev = &cxled->cxld.dev;
-	int rc = -EBUSY;
+	int rc;
 
 	switch (mode) {
 	case CXL_DECODER_RAM:
@@ -327,19 +329,23 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
 	}
 
 	down_write(&cxl_dpa_rwsem);
-	if (cxled->cxld.flags & CXL_DECODER_F_ENABLE)
+	if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) {
+		rc = -EBUSY;
 		goto out;
+	}
+
 	/*
 	 * Only allow modes that are supported by the current partition
 	 * configuration
 	 */
-	rc = -ENXIO;
 	if (mode == CXL_DECODER_PMEM && !resource_size(&cxlds->pmem_res)) {
 		dev_dbg(dev, "no available pmem capacity\n");
+		rc = -ENXIO;
 		goto out;
 	}
 	if (mode == CXL_DECODER_RAM && !resource_size(&cxlds->ram_res)) {
 		dev_dbg(dev, "no available ram capacity\n");
+		rc = -ENXIO;
 		goto out;
 	}
 
@@ -360,11 +366,12 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
 	struct device *dev = &cxled->cxld.dev;
 	resource_size_t start, avail, skip;
 	struct resource *p, *last;
-	int rc = -EBUSY;
+	int rc;
 
 	down_write(&cxl_dpa_rwsem);
 	if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) {
 		dev_dbg(dev, "decoder enabled\n");
+		rc = -EBUSY;
 		goto out;
 	}
 
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 130989846cce..feed86737202 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -215,7 +215,7 @@ static ssize_t dpa_resource_show(struct device *dev, struct device_attribute *at
 			    char *buf)
 {
 	struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
-	u64 base = cxl_dpa_resource(cxled);
+	u64 base = cxl_dpa_resource_start(cxled);
 
 	return sysfs_emit(buf, "%#llx\n", base);
 }
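
The style Jonathan asked for, with the error code set next to the condition
that triggers it, can be shown in a standalone sketch of the set-mode path.
Everything here is a simplified model and not the kernel code: decoder_model
and its 'lock_held' flag (standing in for cxl_dpa_rwsem) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

enum decoder_mode { MODE_NONE, MODE_RAM, MODE_PMEM, MODE_MIXED };

/* Hypothetical stand-in for the decoder plus the device partition sizes. */
struct decoder_model {
	bool enabled;
	bool lock_held;		/* models holding cxl_dpa_rwsem for write */
	enum decoder_mode mode;
	uint64_t ram_size;
	uint64_t pmem_size;
};

int decoder_set_mode(struct decoder_model *d, enum decoder_mode mode)
{
	int rc;

	/* Only 'ram' and 'pmem' can be requested. */
	if (mode != MODE_RAM && mode != MODE_PMEM)
		return -EINVAL;

	assert(!d->lock_held);
	d->lock_held = true;	/* stands in for down_write() */
	if (d->enabled) {
		rc = -EBUSY;	/* each error path names its own code */
		goto out;
	}
	if (mode == MODE_PMEM && d->pmem_size == 0) {
		rc = -ENXIO;
		goto out;
	}
	if (mode == MODE_RAM && d->ram_size == 0) {
		rc = -ENXIO;
		goto out;
	}

	d->mode = mode;
	rc = 0;
out:
	d->lock_held = false;	/* stands in for up_write() */
	return rc;
}
```

The cost is a few extra lines; the benefit is that no reader has to scan
upward to learn which value 'rc' holds when a given 'goto out' is taken.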

^ permalink raw reply related	[flat|nested] 157+ messages in thread

* Re: [PATCH 20/46] cxl/mem: Add a debugfs version of 'iomem' for DPA, 'dpamem'
  2022-06-29 16:08   ` Jonathan Cameron
@ 2022-07-10 17:09     ` Dan Williams
  0 siblings, 0 replies; 157+ messages in thread
From: Dan Williams @ 2022-07-10 17:09 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 19:47:33 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > Dump the device-physical-address map for a CXL expander in /proc/iomem
> > style format. E.g.:
> > 
> >   cat /sys/kernel/debug/cxl/mem1/dpamem
> >   00000000-0fffffff : ram
> >   10000000-1fffffff : pmem
> 
> Nice in general, but...
> 
> When I just checked what this looked like on my test setup, I was
> seeing:
> 00000000-0ffffff : pmem
>   00000000-0fffff : endpoint3
> 
> Seems odd to see an endpoint nested below a pmem.  Wrong name somewhere
> in a later patch. I'd expect that to be a decoder rather than the endpoint...
> If I spot where that comes from whilst reviewing I'll call it out, but
> didn't want to forget to raise it.

Ah, yes, agreed, it should be the decoder name, not the port name, for the
allocation.

The bug was actually back in the introduction of cxl_dpa_reserve().
Folded in the following to "[PATCH 14/46] cxl/hdm: Enumerate allocated
DPA":

diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 1b902966db78..8a677f5f3942 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -180,7 +180,7 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
 
        if (skipped) {
                res = __request_region(&cxlds->dpa_res, base - skipped, skipped,
-                                      dev_name(dev), 0);
+                                      dev_name(&cxled->cxld.dev), 0);
                if (!res) {
                        dev_dbg(dev,
                                "decoder%d.%d: failed to reserve skipped space\n",
@@ -188,7 +188,8 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
                        return -EBUSY;
                }
        }
-       res = __request_region(&cxlds->dpa_res, base, len, dev_name(dev), 0);
+       res = __request_region(&cxlds->dpa_res, base, len,
+                              dev_name(&cxled->cxld.dev), 0);
        if (!res) {
                dev_dbg(dev, "decoder%d.%d: failed to reserve allocation\n",
                        port->id, cxled->cxld.id);
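
To make the intended 'dpamem' layout concrete, here is a small userspace
sketch that formats a DPA map in /proc/iomem style, with allocations nested
under their partition and labeled by decoder name. The dpa_entry type,
format_dpamem(), and the sample name "decoder3.0" are assumptions for
illustration; the real debugfs output is produced from the kernel resource
tree.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flattened view of one line of the DPA resource tree. */
struct dpa_entry {
	uint64_t start;
	uint64_t end;		/* inclusive */
	int depth;		/* 0 = partition, 1 = allocation within it */
	const char *name;
};

/*
 * Format entries in /proc/iomem style, two spaces of indent per depth.
 * Returns the number of bytes written, excluding the terminating NUL.
 * A line that does not fit is dropped rather than emitted partially.
 */
size_t format_dpamem(char *buf, size_t len, const struct dpa_entry *entries,
		     size_t n)
{
	size_t off = 0;

	assert(buf && len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		int w = snprintf(buf + off, len - off,
				 "%*s%08" PRIx64 "-%08" PRIx64 " : %s\n",
				 entries[i].depth * 2, "", entries[i].start,
				 entries[i].end, entries[i].name);
		if (w < 0 || (size_t)w >= len - off) {
			buf[off] = '\0';	/* drop a partial line */
			break;
		}
		off += (size_t)w;
	}
	return off;
}
```

With the fix above folded in, the nested line carries the decoder's device
name rather than the endpoint port's name.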

^ permalink raw reply related	[flat|nested] 157+ messages in thread

* Re: [PATCH 21/46] tools/testing/cxl: Move cxl_test resources to the top of memory
  2022-06-29 16:11   ` Jonathan Cameron
@ 2022-07-10 17:19     ` Dan Williams
  0 siblings, 0 replies; 157+ messages in thread
From: Dan Williams @ 2022-07-10 17:19 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 19:47:40 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > A recent QEMU upgrade resulted in collisions between QEMU's chosen
> > location for PCI MMIO and cxl_test's fake address location for emulated
> > CXL purposes. This was great for testing resource collisions, but not so
> > great for continuing to test the nominal cases. Move cxl_test to the
> > top-of-memory where it is less likely to collide with other resources.
> *snigger*
> 
> Seems reasonable, though I'm sure someone else will have the same
> idea for some other usecase and we'll keep moving this around...
> Ah well.

Indeed.

> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Thanks.

> 
> > 
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  tools/testing/cxl/test/cxl.c |    3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
> > index f52a5dd69d36..b6e6bc02a507 100644
> > --- a/tools/testing/cxl/test/cxl.c
> > +++ b/tools/testing/cxl/test/cxl.c
> > @@ -632,7 +632,8 @@ static __init int cxl_test_init(void)
> >  		goto err_gen_pool_create;
> >  	}
> >  
> > -	rc = gen_pool_add(cxl_mock_pool, SZ_512G, SZ_64G, NUMA_NO_NODE);
> > +	rc = gen_pool_add(cxl_mock_pool, iomem_resource.end + 1 - SZ_64G,
> > +			  SZ_64G, NUMA_NO_NODE);
> >  	if (rc)
> >  		goto err_gen_pool_add;
> >  
> > 
> 
> 
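
The base-address arithmetic in the hunk above, as a standalone sketch.
top_of_memory_base() is a hypothetical helper; 'iomem_end' plays the role of
iomem_resource.end, which is an inclusive last address and hence the "+ 1".

```c
#include <assert.h>
#include <stdint.h>

#define SZ_64G (64ULL << 30)

/*
 * Given the inclusive end of the addressable range, return the base of a
 * window of 'size' bytes that ends exactly at the top of that range.
 * Unsigned wraparound makes this correct even when iomem_end is UINT64_MAX.
 */
uint64_t top_of_memory_base(uint64_t iomem_end, uint64_t size)
{
	assert(size > 0 && size - 1 <= iomem_end);
	return iomem_end + 1 - size;
}
```

For example, assuming a 52-bit physical address space, the 64G mock window
would start at 0xFFFF000000000, well away from the fixed 512G offset that
collided with QEMU's PCI MMIO placement.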



^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH 24/46] tools/testing/cxl: Fix decoder default state
  2022-06-29 16:22   ` Jonathan Cameron
@ 2022-07-10 17:33     ` Dan Williams
  0 siblings, 0 replies; 157+ messages in thread
From: Dan Williams @ 2022-07-10 17:33 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 19:48:01 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > The 'enabled' state is reserved for committed decoders. By default,
> > cxl_test decoders are uncommitted at init time.
> > 
> > Fixes: 7c7d68db0254 ("tools/testing/cxl: Enumerate mock decoders")
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Whilst sanity checking this I notice we have
> CXL_DECODER_F_MASK but never use it. Might be worth dropping...

Yes, that definition looks useless.

^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH 25/46] cxl/port: Record dport in endpoint references
  2022-06-29 16:49   ` Jonathan Cameron
@ 2022-07-10 18:40     ` Dan Williams
  0 siblings, 0 replies; 157+ messages in thread
From: Dan Williams @ 2022-07-10 18:40 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 19:48:07 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > Recall that the primary role of the cxl_mem driver is to probe if the
> > given endpoint is connected to a CXL port topology. In that process it
> > walks its device ancestry to its PCI root port. If that root port is
> > also a CXL root port then the probe process adds cxl_port object
> > instances at each switch in the path between the root and the endpoint. As
> > those cxl_port instances are added, or if a previous enumeration
> > attempt already created the port a 'struct cxl_ep' instance is
> port, a 
> 
> would make this more readable.

Agree.

> 
> > registered with that port to track the endpoints interested in that
> > port.
> > 
> > At the time the cxl_ep is registered the downstream egress path from the
> > port to the endpoint is known. Take the opportunity to record that
> > information as it will be needed for dynamic programming of decoder
> > targets during region provisioning.
> > 
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> 
> Otherwise, one comment on function naming not reflecting what it does
> inline.
> 
> Jonathan
> 
> > ---
> >  drivers/cxl/core/port.c |   52 ++++++++++++++++++++++++++++++++---------------
> >  drivers/cxl/cxl.h       |    2 ++
> >  2 files changed, 37 insertions(+), 17 deletions(-)
> > 
> > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > index 4e4e26ca507c..c54e1dbf92cb 100644
> > --- a/drivers/cxl/core/port.c
> > +++ b/drivers/cxl/core/port.c
> > @@ -866,8 +866,9 @@ static struct cxl_ep *find_ep(struct cxl_port *port, struct device *ep_dev)
> >  	return NULL;
> >  }
> >  
> > -static int add_ep(struct cxl_port *port, struct cxl_ep *new)
> > +static int add_ep(struct cxl_ep *new)
> >  {
> > +	struct cxl_port *port = new->dport->port;
> >  	struct cxl_ep *dup;
> >  
> >  	device_lock(&port->dev);
> > @@ -885,14 +886,14 @@ static int add_ep(struct cxl_port *port, struct cxl_ep *new)
> >  
> >  /**
> >   * cxl_add_ep - register an endpoint's interest in a port
> > - * @port: a port in the endpoint's topology ancestry
> > + * @dport: the dport that routes to @ep_dev
> >   * @ep_dev: device representing the endpoint
> >   *
> >   * Intermediate CXL ports are scanned based on the arrival of endpoints.
> >   * When those endpoints depart the port can be destroyed once all
> >   * endpoints that care about that port have been removed.
> >   */
> > -static int cxl_add_ep(struct cxl_port *port, struct device *ep_dev)
> > +static int cxl_add_ep(struct cxl_dport *dport, struct device *ep_dev)
> >  {
> >  	struct cxl_ep *ep;
> >  	int rc;
> > @@ -903,8 +904,9 @@ static int cxl_add_ep(struct cxl_port *port, struct device *ep_dev)
> >  
> >  	INIT_LIST_HEAD(&ep->list);
> >  	ep->ep = get_device(ep_dev);
> > +	ep->dport = dport;
> >  
> > -	rc = add_ep(port, ep);
> > +	rc = add_ep(ep);
> >  	if (rc)
> >  		cxl_ep_release(ep);
> >  	return rc;
> > @@ -913,11 +915,13 @@ static int cxl_add_ep(struct cxl_port *port, struct device *ep_dev)
> >  struct cxl_find_port_ctx {
> >  	const struct device *dport_dev;
> >  	const struct cxl_port *parent_port;
> > +	struct cxl_dport **dport;
> >  };
> >  
> >  static int match_port_by_dport(struct device *dev, const void *data)
> >  {
> 
> This seems a little oddly named for a function that 'returns'
> the dport via ctx when a match is found.

...but it's called by __find_cxl_port(), so the dport returned by ctx is
just extra metadata ancillary to the first order port lookup.
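
The pattern under discussion, a match callback whose primary result is the
port while the matching dport rides along in the context, looks roughly like
this in a userspace sketch. The topo_port / topo_dport types and find_port()
are hypothetical simplifications of cxl_port, cxl_dport, and
__find_cxl_port(); the real code walks the device bus instead of an array.

```c
#include <assert.h>
#include <stddef.h>

struct topo_dport {
	int port_id;
	const void *dport_dev;	/* device this downstream port routes to */
};

struct topo_port {
	int id;
	const struct topo_dport *dports;
	size_t nr_dports;
};

struct find_port_ctx {
	const void *dport_dev;			/* lookup key */
	const struct topo_dport **dport;	/* optional extra result */
};

/* Returns 1 when 'port' owns a dport for ctx->dport_dev. */
static int match_port_by_dport(const struct topo_port *port,
			       struct find_port_ctx *ctx)
{
	for (size_t i = 0; i < port->nr_dports; i++) {
		if (port->dports[i].dport_dev != ctx->dport_dev)
			continue;
		if (ctx->dport)
			*ctx->dport = &port->dports[i];
		return 1;
	}
	return 0;
}

/* First-order result is the port; the dport is ancillary metadata. */
const struct topo_port *find_port(const struct topo_port *ports, size_t n,
				  struct find_port_ctx *ctx)
{
	assert(ports && ctx);
	for (size_t i = 0; i < n; i++)
		if (match_port_by_dport(&ports[i], ctx))
			return &ports[i];
	return NULL;
}
```

Callers that only care about the port can leave ctx->dport NULL, which is
why the name still describes the primary job of the match function.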

^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH 28/46] cxl/port: Move dport tracking to an xarray
  2022-06-30  9:18   ` Jonathan Cameron
@ 2022-07-10 19:06     ` Dan Williams
  0 siblings, 0 replies; 157+ messages in thread
From: Dan Williams @ 2022-07-10 19:06 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams; +Cc: linux-cxl, nvdimm, linux-pci, patches, hch

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 21:19:32 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > Reduce the complexity and the overhead of walking the topology to
> > determine endpoint connectivity to root decoder interleave
> > configurations.
> > 
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Hi Dan,
> 
> A few minor comments inline around naming and also one query on why
> the refactor of reap_ports is connected to the xarray change.
> 
> Thanks,
> 
> Jonathan
> 
> > ---
> >  drivers/cxl/acpi.c      |  2 +-
> >  drivers/cxl/core/hdm.c  |  6 ++-
> >  drivers/cxl/core/port.c | 88 ++++++++++++++++++-----------------------
> >  drivers/cxl/cxl.h       | 12 +++---
> >  4 files changed, 51 insertions(+), 57 deletions(-)
> > 
> > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> > index 09fe92177d03..92ad1f359faf 100644
> > --- a/drivers/cxl/acpi.c
> > +++ b/drivers/cxl/acpi.c
> > @@ -197,7 +197,7 @@ static int add_host_bridge_uport(struct device *match, void *arg)
> >  	if (!bridge)
> >  		return 0;
> >  
> > -	dport = cxl_find_dport_by_dev(root_port, match);
> > +	dport = cxl_dport_load(root_port, match);
> 
> Load is kind of specific to the xarray.  I'd be tempted to keep it to
> original find naming.

ok.

> 
> 
> >  	if (!dport) {
> >  		dev_dbg(host, "host bridge expected and not found\n");
> >  		return 0;
> > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> > index c0164f9b2195..672bf3e97811 100644
> > --- a/drivers/cxl/core/hdm.c
> > +++ b/drivers/cxl/core/hdm.c
> > @@ -50,8 +50,9 @@ static int add_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
> >  int devm_cxl_add_passthrough_decoder(struct cxl_port *port)
> >  {
> >  	struct cxl_switch_decoder *cxlsd;
> > -	struct cxl_dport *dport;
> > +	struct cxl_dport *dport = NULL;
> >  	int single_port_map[1];
> > +	unsigned long index;
> >  
> >  	cxlsd = cxl_switch_decoder_alloc(port, 1);
> >  	if (IS_ERR(cxlsd))
> > @@ -59,7 +60,8 @@ int devm_cxl_add_passthrough_decoder(struct cxl_port *port)
> >  
> >  	device_lock_assert(&port->dev);
> >  
> > -	dport = list_first_entry(&port->dports, typeof(*dport), list);
> > +	xa_for_each(&port->dports, index, dport)
> > +		break;
> >  	single_port_map[0] = dport->port_id;
> >  
> >  	return add_hdm_decoder(port, &cxlsd->cxld, single_port_map);
> > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > index ea3ab9baf232..d2f6898940fa 100644
> > --- a/drivers/cxl/core/port.c
> > +++ b/drivers/cxl/core/port.c
> > @@ -452,6 +452,7 @@ static void cxl_port_release(struct device *dev)
> >  	xa_for_each(&port->endpoints, index, ep)
> >  		cxl_ep_remove(port, ep);
> >  	xa_destroy(&port->endpoints);
> > +	xa_destroy(&port->dports);
> >  	ida_free(&cxl_port_ida, port->id);
> >  	kfree(port);
> >  }
> > @@ -566,7 +567,7 @@ static struct cxl_port *cxl_port_alloc(struct device *uport,
> >  	port->component_reg_phys = component_reg_phys;
> >  	ida_init(&port->decoder_ida);
> >  	port->dpa_end = -1;
> > -	INIT_LIST_HEAD(&port->dports);
> > +	xa_init(&port->dports);
> >  	xa_init(&port->endpoints);
> >  
> >  	device_initialize(dev);
> > @@ -696,17 +697,13 @@ static int match_root_child(struct device *dev, const void *match)
> >  		return 0;
> >  
> >  	port = to_cxl_port(dev);
> > -	device_lock(dev);
> > -	list_for_each_entry(dport, &port->dports, list) {
> > -		iter = match;
> > -		while (iter) {
> > -			if (iter == dport->dport)
> > -				goto out;
> > -			iter = iter->parent;
> > -		}
> > +	iter = match;
> > +	while (iter) {
> > +		dport = cxl_dport_load(port, iter);
> > +		if (dport)
> > +			break;
> > +		iter = iter->parent;
> >  	}
> > -out:
> > -	device_unlock(dev);
> >  
> >  	return !!iter;
> >  }
> > @@ -730,9 +727,10 @@ EXPORT_SYMBOL_NS_GPL(find_cxl_root, CXL);
> >  static struct cxl_dport *find_dport(struct cxl_port *port, int id)
> >  {
> >  	struct cxl_dport *dport;
> > +	unsigned long index;
> >  
> >  	device_lock_assert(&port->dev);
> > -	list_for_each_entry (dport, &port->dports, list)
> > +	xa_for_each(&port->dports, index, dport)
> >  		if (dport->port_id == id)
> >  			return dport;
> >  	return NULL;
> > @@ -741,18 +739,21 @@ static struct cxl_dport *find_dport(struct cxl_port *port, int id)
> >  static int add_dport(struct cxl_port *port, struct cxl_dport *new)
> >  {
> >  	struct cxl_dport *dup;
> > +	int rc;
> >  
> >  	device_lock_assert(&port->dev);
> >  	dup = find_dport(port, new->port_id);
> > -	if (dup)
> > +	if (dup) {
> >  		dev_err(&port->dev,
> >  			"unable to add dport%d-%s non-unique port id (%s)\n",
> >  			new->port_id, dev_name(new->dport),
> >  			dev_name(dup->dport));
> > -	else
> > -		list_add_tail(&new->list, &port->dports);
> > +		rc = -EBUSY;
> 
> A direct return is slightly simpler, reduces the indent on the next bit,
> and makes this more obviously an 'error condition' by indenting it.

Looks good, yes.

> 
> > +	} else
> > +		rc = xa_insert(&port->dports, (unsigned long)new->dport, new,
> > +			       GFP_KERNEL);
> >  
> > -	return dup ? -EEXIST : 0;
> > +	return rc;
> >  }
> >  
> >  /*
> > @@ -779,10 +780,8 @@ static void cxl_dport_remove(void *data)
> >  	struct cxl_dport *dport = data;
> >  	struct cxl_port *port = dport->port;
> >  
> > +	xa_erase(&port->dports, (unsigned long) dport->dport);
> >  	put_device(dport->dport);
> > -	cond_cxl_root_lock(port);
> > -	list_del(&dport->list);
> > -	cond_cxl_root_unlock(port);
> >  }
> >  
> >  static void cxl_dport_unlink(void *data)
> > @@ -834,7 +833,6 @@ struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port,
> >  	if (!dport)
> >  		return ERR_PTR(-ENOMEM);
> >  
> > -	INIT_LIST_HEAD(&dport->list);
> >  	dport->dport = dport_dev;
> >  	dport->port_id = port_id;
> >  	dport->component_reg_phys = component_reg_phys;
> > @@ -925,7 +923,7 @@ static int match_port_by_dport(struct device *dev, const void *data)
> >  		return 0;
> >  
> >  	port = to_cxl_port(dev);
> > -	dport = cxl_find_dport_by_dev(port, ctx->dport_dev);
> > +	dport = cxl_dport_load(port, ctx->dport_dev);
> >  	if (ctx->dport)
> >  		*ctx->dport = dport;
> >  	return dport != NULL;
> > @@ -1025,19 +1023,27 @@ EXPORT_SYMBOL_NS_GPL(cxl_endpoint_autoremove, CXL);
> >   * for a port to be unregistered is when all memdevs beneath that port have gone
> >   * through ->remove(). This "bottom-up" removal selectively removes individual
> >   * child ports manually. This depends on devm_cxl_add_port() to not change is
> > - * devm action registration order.
> > + * devm action registration order, and for dports to have already been
> > + * destroyed by reap_dports().
> >   */
> > -static void delete_switch_port(struct cxl_port *port, struct list_head *dports)
> > +static void delete_switch_port(struct cxl_port *port)
> > +{
> > +	devm_release_action(port->dev.parent, cxl_unlink_uport, port);
> > +	devm_release_action(port->dev.parent, unregister_port, port);
> > +}
> > +
> > +static void reap_dports(struct cxl_port *port)
> >  {
> > -	struct cxl_dport *dport, *_d;
> > +	struct cxl_dport *dport;
> > +	unsigned long index;
> > +
> > +	device_lock_assert(&port->dev);
> >  
> > -	list_for_each_entry_safe(dport, _d, dports, list) {
> > +	xa_for_each(&port->dports, index, dport) {
> >  		devm_release_action(&port->dev, cxl_dport_unlink, dport);
> >  		devm_release_action(&port->dev, cxl_dport_remove, dport);
> >  		devm_kfree(&port->dev, dport);
> >  	}
> > -	devm_release_action(port->dev.parent, cxl_unlink_uport, port);
> > -	devm_release_action(port->dev.parent, unregister_port, port);
> >  }
> >  
> >  static struct cxl_ep *cxl_ep_load(struct cxl_port *port,
> > @@ -1054,8 +1060,8 @@ static void cxl_detach_ep(void *data)
> >  	for (iter = &cxlmd->dev; iter; iter = grandparent(iter)) {
> >  		struct device *dport_dev = grandparent(iter);
> >  		struct cxl_port *port, *parent_port;
> > -		LIST_HEAD(reap_dports);
> >  		struct cxl_ep *ep;
> > +		bool died = false;
> >  
> >  		if (!dport_dev)
> >  			break;
> > @@ -1095,15 +1101,16 @@ static void cxl_detach_ep(void *data)
> >  			 * enumerated port. Block new cxl_add_ep() and garbage
> >  			 * collect the port.
> >  			 */
> > +			died = true;
> >  			port->dead = true;
> > -			list_splice_init(&port->dports, &reap_dports);
> > +			reap_dports(port);
> 
> I'm not immediately clear on why this refactor is tied up with moving
> to the xarray.  Perhaps a comment in the commit message to add
> more detail around this?

Sure, added the following:

   Note that cxl_detach_ep(), after it determines that the last @ep has
   departed and decides to delete the port, now needs to walk the dport
   array with the device_lock() held to remove entries. Previously
   list_splice_init() could be used to atomically delete all dport entries
   at once and then perform entry tear down outside the lock. There is no
   list_splice_init() equivalent for the xarray.
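
To illustrate the locking difference described in that note, here is a
minimal userspace sketch. It is not kernel code: 'struct port', the
'locked' flag (a stand-in for device_lock()), and every helper name are
hypothetical, and a plain singly-linked list stands in for both
containers. It only shows where tear down happens relative to the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct entry {
	struct entry *next;
	int id;
};

struct port {
	bool locked;		/* hypothetical stand-in for device_lock() */
	struct entry *head;	/* hypothetical stand-in for port->dports */
};

/* Push a new entry onto the port; returns 0 on success, -1 on failure. */
static int port_add(struct port *p, int id)
{
	struct entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->id = id;
	e->next = p->head;
	p->head = e;
	return 0;
}

/* Free a detached chain and return how many entries were freed. */
static int free_chain(struct entry *e)
{
	int n = 0;

	while (e) {
		struct entry *next = e->next;

		free(e);
		e = next;
		n++;
	}
	return n;
}

/*
 * Splice style: one pointer swap under the lock detaches everything,
 * so the caller can drop the lock before calling free_chain().
 */
static struct entry *detach_all(struct port *p)
{
	struct entry *all;

	assert(p->locked);
	all = p->head;
	p->head = NULL;
	return all;
}

/*
 * Walk style: with no bulk detach available, each entry is visited and
 * torn down while the lock is still held.
 */
static int reap_locked(struct port *p)
{
	int n;

	assert(p->locked);
	n = free_chain(p->head);
	p->head = NULL;
	return n;
}
```

With detach_all() the lock hold time is constant regardless of the number
of entries, while reap_locked() holds it for the whole walk.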


* Re: [PATCH 29/46] cxl/port: Cache CXL host bridge data
  2022-06-30  9:21   ` Jonathan Cameron
@ 2022-07-10 19:09     ` Dan Williams
  0 siblings, 0 replies; 157+ messages in thread
From: Dan Williams @ 2022-07-10 19:09 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams; +Cc: linux-cxl, nvdimm, linux-pci, patches, hch

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 21:19:33 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > Region creation has need for checking host-bridge connectivity when
> > adding endpoints to regions. Record, at port creation time, the
> > host-bridge to provide a useful shortcut from any location in the
> > topology to the most-significant ancestor.
> > 
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Trivial comment inline, but otherwise seems reasonable to me.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> > ---
> >  drivers/cxl/core/port.c | 16 +++++++++++++++-
> >  drivers/cxl/cxl.h       |  2 ++
> >  2 files changed, 17 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > index d2f6898940fa..c48f217e689a 100644
> > --- a/drivers/cxl/core/port.c
> > +++ b/drivers/cxl/core/port.c
> > @@ -546,6 +546,7 @@ static struct cxl_port *cxl_port_alloc(struct device *uport,
> >  	if (rc < 0)
> >  		goto err;
> >  	port->id = rc;
> > +	port->uport = uport;
> >  
> >  	/*
> >  	 * The top-level cxl_port "cxl_root" does not have a cxl_port as
> > @@ -556,14 +557,27 @@ static struct cxl_port *cxl_port_alloc(struct device *uport,
> >  	dev = &port->dev;
> >  	if (parent_dport) {
> >  		struct cxl_port *parent_port = parent_dport->port;
> > +		struct cxl_port *iter;
> >  
> >  		port->depth = parent_port->depth + 1;
> >  		port->parent_dport = parent_dport;
> >  		dev->parent = &parent_port->dev;
> > +		/*
> > +		 * walk to the host bridge, or the first ancestor that knows
> > +		 * the host bridge
> > +		 */
> > +		iter = port;
> > +		while (!iter->host_bridge &&
> > +		       !is_cxl_root(to_cxl_port(iter->dev.parent)))
> > +			iter = to_cxl_port(iter->dev.parent);
> > +		if (iter->host_bridge)
> > +			port->host_bridge = iter->host_bridge;
> > +		else
> > +			port->host_bridge = iter->uport;
> > +		dev_dbg(uport, "host-bridge: %s\n", dev_name(port->host_bridge));
> >  	} else
> >  		dev->parent = uport;
> >  
> > -	port->uport = uport;
> >  	port->component_reg_phys = component_reg_phys;
> >  	ida_init(&port->decoder_ida);
> >  	port->dpa_end = -1;
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index 8e2c1b393552..0211cf0d3574 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -331,6 +331,7 @@ struct cxl_nvdimm {
> >   * @component_reg_phys: component register capability base address (optional)
> >   * @dead: last ep has been removed, force port re-creation
> >   * @depth: How deep this port is relative to the root. depth 0 is the root.
> > + * @host_bridge: Shortcut to the platform attach point for this port
> >   */
> >  struct cxl_port {
> >  	struct device dev;
> > @@ -344,6 +345,7 @@ struct cxl_port {
> >  	resource_size_t component_reg_phys;
> >  	bool dead;
> >  	unsigned int depth;
> > +	struct device *host_bridge;
> Would feel more natural up next to the struct device *uport element of cxl_port.

Done.


* Re: [PATCH 30/46] cxl/hdm: Add sysfs attributes for interleave ways + granularity
  2022-06-30  9:26   ` Jonathan Cameron
@ 2022-07-10 20:40     ` Dan Williams
  2022-07-19 14:32       ` Jonathan Cameron
  0 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-07-10 20:40 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, nvdimm, linux-pci, patches, hch, Ben Widawsky

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 21:19:34 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > From: Ben Widawsky <bwidawsk@kernel.org>
> > 
> > The region provisioning flow involves selecting interleave ways +
> > granularity settings for a region, and then programming the decoder
> > topology to meet those constraints, if possible. For example, root
> > decoders set the minimum interleave ways + granularity for any hosted
> > regions.
> > 
> > Given decoder programming is not atomic and collisions can occur between
> > multiple requesting regions, userspace will be responsible for conflict
> > resolution and it needs these attributes to make those decisions.
> > 
> > Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> > [djbw: reword changelog, make read-only, add sysfs ABI documentation]
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> some comments on docs.
> 
> > ---
> >  Documentation/ABI/testing/sysfs-bus-cxl | 23 +++++++++++++++++++++++
> >  drivers/cxl/core/port.c                 | 23 +++++++++++++++++++++++
> >  2 files changed, 46 insertions(+)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> > index 85844f9bc00b..2a4e4163879f 100644
> > --- a/Documentation/ABI/testing/sysfs-bus-cxl
> > +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> > @@ -215,3 +215,26 @@ Description:
> >  		allocations are enforced to occur in increasing 'decoderX.Y/id'
> >  		order and frees are enforced to occur in decreasing
> >  		'decoderX.Y/id' order.
> > +
> > +
> > +What:		/sys/bus/cxl/devices/decoderX.Y/interleave_ways
> > +Date:		May, 2022
> > +KernelVersion:	v5.20
> > +Contact:	linux-cxl@vger.kernel.org
> > +Description:
> > +		(RO) The number of targets across which this decoder's host
> > +		physical address (HPA) memory range is interleaved. The device
> > +		maps every Nth block of HPA (of size ==
> > +		'interleave_granularity') to consecutive DPA addresses. The
> > +		decoder's position in the interleave is determined by the
> > +		device's (endpoint or switch) switch ancestry.
> 
> Perhaps make it clear what happens for host bridges (i.e. decoder position
> in interleave defined by fixed memory window).

Added: "For root decoders their interleave is specified by platform
firmware and they only specify a downstream target order for host
bridges".

> 
> > +
> > +
> > +What:		/sys/bus/cxl/devices/decoderX.Y/interleave_granularity
> > +Date:		May, 2022
> > +KernelVersion:	v5.20
> > +Contact:	linux-cxl@vger.kernel.org
> > +Description:
> > +		(RO) The number of consecutive bytes of host physical address
> > +		space this decoder claims at address N before awaint the next
> 
> awaint?

Surprised checkpatch did not flag this, or that I missed the checkpatch
flag.

> 
> > +		address (N + interleave_granularity * intereleave_ways).
> 
> interleave_ways
> 
> Even knowing exactly what this is, I don't understand the docs so
> perhaps reword this :)

Reworded to:

(RO) The number of consecutive bytes of host physical address space this
decoder claims at address N before the decode rotates to the next target
in the interleave at address N + interleave_granularity (assuming N is
aligned to interleave_granularity).
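
As a worked example of that rotation, the sketch below computes which
interleave position decodes a given offset. It assumes a plain modulo
rotation over 'ways' targets; interleave_position() is a hypothetical
helper for illustration, not a kernel or CXL API, and real decoders may
select targets differently (for example for non-power-of-2 ways).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the 0-based interleave position that
 * decodes the byte at @offset from the start of the decoder's HPA
 * range, assuming each target claims @granularity consecutive bytes
 * before the decode rotates to the next of @ways targets.
 */
static unsigned int interleave_position(uint64_t offset, uint32_t granularity,
					unsigned int ways)
{
	assert(granularity > 0 && ways > 0);
	return (unsigned int)((offset / granularity) % ways);
}
```

Under this model, with granularity 256 and 4 ways, offsets 0..255 land on
position 0, offset 256 starts position 1, and offset 1024 (256 * 4) wraps
back to position 0.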


* Re: [PATCH 32/46] cxl/mem: Enumerate port targets before adding endpoints
  2022-06-30  9:48   ` Jonathan Cameron
@ 2022-07-10 21:01     ` Dan Williams
  0 siblings, 0 replies; 157+ messages in thread
From: Dan Williams @ 2022-07-10 21:01 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams; +Cc: linux-cxl, nvdimm, linux-pci, patches, hch

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 21:19:36 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > The port scanning algorithm in devm_cxl_enumerate_ports() walks up the
> > topology and adds cxl_port objects starting from the root down to the
> > endpoint. When those ports are initially created they know all their
> > dports, but they do not know the downstream cxl_port instance that
> > represents the next descendant in the topology. Rework create_endpoint()
> > into devm_cxl_add_endpoint() that enumerates the downstream cxl_port
> > topology into each port's 'struct cxl_ep' record for each endpoint
> > that the port is an ancestor of.
> > 
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> 
> I'm doing my normal moaning about tidying up in a patch that makes
> a more serious change.  Ideally pull that out, but meh if it's a real pain
> I can live with it as long as you call it out in the patch description.

No worries, I would expect to be able to ask others to do the same and I
should be more careful to collect these unrelated fixups separately.

> 
> With that
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> > ---
> >  drivers/cxl/core/port.c | 41 +++++++++++++++++++++++++++++++++++++++++
> >  drivers/cxl/cxl.h       |  7 ++++++-
> >  drivers/cxl/mem.c       | 30 +-----------------------------
> >  3 files changed, 48 insertions(+), 30 deletions(-)
> > 
> > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > index 08a380d20cf1..2e56903399c2 100644
> > --- a/drivers/cxl/core/port.c
> > +++ b/drivers/cxl/core/port.c
> > @@ -1089,6 +1089,47 @@ static struct cxl_ep *cxl_ep_load(struct cxl_port *port,
> >  	return xa_load(&port->endpoints, (unsigned long)&cxlmd->dev);
> >  }
> >  
> > +int devm_cxl_add_endpoint(struct cxl_memdev *cxlmd,
> > +			  struct cxl_dport *parent_dport)
> > +{
> > +	struct cxl_port *parent_port = parent_dport->port;
> > +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > +	struct cxl_port *endpoint, *iter, *down;
> > +	int rc;
> > +
> > +	/*
> > +	 * Now that the path to the root is established record all the
> > +	 * intervening ports in the chain.
> > +	 */
> > +	for (iter = parent_port, down = NULL; !is_cxl_root(iter);
> > +	     down = iter, iter = to_cxl_port(iter->dev.parent)) {
> > +		struct cxl_ep *ep;
> > +
> > +		ep = cxl_ep_load(iter, cxlmd);
> > +		ep->next = down;
> > +	}
> > +
> > +	endpoint = devm_cxl_add_port(&parent_port->dev, &cxlmd->dev,
> > +				     cxlds->component_reg_phys, parent_dport);
> > +	if (IS_ERR(endpoint))
> > +		return PTR_ERR(endpoint);
> > +
> > +	dev_dbg(&cxlmd->dev, "add: %s\n", dev_name(&endpoint->dev));
> > +
> > +	rc = cxl_endpoint_autoremove(cxlmd, endpoint);
> > +	if (rc)
> > +		return rc;
> > +
> > +	if (!endpoint->dev.driver) {
> > +		dev_err(&cxlmd->dev, "%s failed probe\n",
> > +			dev_name(&endpoint->dev));
> > +		return -ENXIO;
> > +	}
> > +
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(devm_cxl_add_endpoint, CXL);
> > +
> >  static void cxl_detach_ep(void *data)
> >  {
> >  	struct cxl_memdev *cxlmd = data;
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index 0211cf0d3574..f761cf78cc05 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -371,11 +371,14 @@ struct cxl_dport {
> >  /**
> >   * struct cxl_ep - track an endpoint's interest in a port
> >   * @ep: device that hosts a generic CXL endpoint (expander or accelerator)
> > - * @dport: which dport routes to this endpoint on this port
> > + * @dport: which dport routes to this endpoint on @port
> 
> fix is good, but shouldn't be in this patch really..

Rolled this into "[PATCH 25/46] cxl/port: Record dport in endpoint
references" where @dport was introduced.


* Re: [PATCH 33/46] resource: Introduce alloc_free_mem_region()
  2022-06-30 10:35   ` Jonathan Cameron
@ 2022-07-10 21:58     ` Dan Williams
  0 siblings, 0 replies; 157+ messages in thread
From: Dan Williams @ 2022-07-10 21:58 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, nvdimm, linux-pci, patches, hch, Jason Gunthorpe,
	Matthew Wilcox, Christoph Hellwig

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 21:19:37 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > The core of devm_request_free_mem_region() is a helper that searches for
> > free space in iomem_resource and performs __request_region_locked() on
> > the result of that search. The policy choices of the implementation
> > conform to what CONFIG_DEVICE_PRIVATE users want which is memory that is
> > immediately marked busy, and a preference to search for the first-fit
> > free range in descending order from the top of the physical address
> > space.
> > 
> > CXL has a need for a similar allocator, but with the following tweaks:
> > 
> > 1/ Search for free space in ascending order
> > 
> > 2/ Search for free space relative to a given CXL window
> > 
> > 3/ 'insert' rather than 'request' the new resource given downstream
> >    drivers from the CXL Region driver (like the pmem or dax drivers) are
> >    responsible for request_mem_region() when they activate the memory
> >    range.
> > 
> > Rework __request_free_mem_region() into get_free_mem_region() which
> > takes a set of GFR_* (Get Free Region) flags to control the allocation
> > policy (ascending vs descending), and "busy" policy (insert_resource()
> > vs request_region()).
> > 
> > Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> > Link: https://lore.kernel.org/linux-cxl/20220420143406.GY2120790@nvidia.com/
> > Cc: Matthew Wilcox <willy@infradead.org>
> > Cc: Christoph Hellwig <hch@infradead.org>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> 
> A few things inline,
> 
> Thanks,
> 
> Jonathan
> 
> > ---
> >  include/linux/ioport.h |   2 +
> >  kernel/resource.c      | 174 ++++++++++++++++++++++++++++++++---------
> >  mm/Kconfig             |   5 ++
> >  3 files changed, 146 insertions(+), 35 deletions(-)
> > 
> > diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> > index ec5f71f7135b..ed03518347aa 100644
> > --- a/include/linux/ioport.h
> > +++ b/include/linux/ioport.h
> > @@ -329,6 +329,8 @@ struct resource *devm_request_free_mem_region(struct device *dev,
> >  		struct resource *base, unsigned long size);
> >  struct resource *request_free_mem_region(struct resource *base,
> >  		unsigned long size, const char *name);
> > +struct resource *alloc_free_mem_region(struct resource *base,
> > +		unsigned long size, unsigned long align, const char *name);
> >  
> >  static inline void irqresource_disabled(struct resource *res, u32 irq)
> >  {
> > diff --git a/kernel/resource.c b/kernel/resource.c
> > index 53a534db350e..9fc990274106 100644
> > --- a/kernel/resource.c
> > +++ b/kernel/resource.c
> 
> 
> > +static bool gfr_continue(struct resource *base, resource_size_t addr,
> > +			 resource_size_t size, unsigned long flags)
> > +{
> > +	if (flags & GFR_DESCENDING)
> > +		return addr > size && addr >= base->start;
> > +	return addr > addr - size &&
> 
> Is this checking for wrap around?  If so maybe a comment to call that out?

Yes, and ok.
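
For reference, a small userspace sketch of how that comparison behaves
with unsigned arithmetic. It assumes a 64-bit resource_size_t, modeled
here as uint64_t; sub_does_not_wrap() and wrap_check_examples() are
hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Same shape as the 'addr > addr - size' test: when size is larger
 * than addr the unsigned subtraction wraps to a huge value, so the
 * comparison is false. It is also false when size is zero.
 */
static bool sub_does_not_wrap(uint64_t addr, uint64_t size)
{
	return addr > addr - size;
}

/* A few concrete cases that show each outcome. */
static void wrap_check_examples(void)
{
	assert(sub_does_not_wrap(0x3000, 0x1000));	/* 0x3000 > 0x2000 */
	assert(!sub_does_not_wrap(0x1000, 0x2000));	/* subtraction wraps */
	assert(!sub_does_not_wrap(0x1000, 0));		/* addr > addr is false */
}
```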

> 
> > +	       addr <= min_t(resource_size_t, base->end,
> > +			     (1ULL << MAX_PHYSMEM_BITS) - 1);
> > +}
> > +
> > +static resource_size_t gfr_next(resource_size_t addr, resource_size_t size,
> > +				unsigned long flags)
> > +{
> > +	if (flags & GFR_DESCENDING)
> > +		return addr - size;
> > +	return addr + size;
> > +}
> > +
> > +static void remove_free_mem_region(void *_res)
> >  {
> > -	resource_size_t end, addr;
> > +	struct resource *res = _res;
> > +
> > +	if (res->parent)
> > +		remove_resource(res);
> > +	free_resource(res);
> > +}
> > +
> > +static struct resource *
> > +get_free_mem_region(struct device *dev, struct resource *base,
> > +		    resource_size_t size, const unsigned long align,
> > +		    const char *name, const unsigned long desc,
> > +		    const unsigned long flags)
> > +{
> > +	resource_size_t addr;
> >  	struct resource *res;
> >  	struct region_devres *dr = NULL;
> >  
> > -	size = ALIGN(size, 1UL << PA_SECTION_SHIFT);
> > -	end = min_t(unsigned long, base->end, (1UL << MAX_PHYSMEM_BITS) - 1);
> > -	addr = end - size + 1UL;
> > +	size = ALIGN(size, align);
> >  
> >  	res = alloc_resource(GFP_KERNEL);
> >  	if (!res)
> >  		return ERR_PTR(-ENOMEM);
> >  
> > -	if (dev) {
> > +	if (dev && (flags & GFR_REQUEST_REGION)) {
> >  		dr = devres_alloc(devm_region_release,
> >  				sizeof(struct region_devres), GFP_KERNEL);
> >  		if (!dr) {
> >  			free_resource(res);
> >  			return ERR_PTR(-ENOMEM);
> >  		}
> > +	} else if (dev) {
> > +		if (devm_add_action_or_reset(dev, remove_free_mem_region, res))
> > +			return ERR_PTR(-ENOMEM);
> 
> slightly nicer to return whatever value you got back from devm_add_action_or_reset()

Yes, but it is known to only return -ENOMEM on failure, and hardcoding
that value saves adding a local @rc variable.

> 
> >  	}
> >  
> >  	write_lock(&resource_lock);
> > -	for (; addr > size && addr >= base->start; addr -= size) {
> > -		if (__region_intersects(addr, size, 0, IORES_DESC_NONE) !=
> > -				REGION_DISJOINT)
> > +	for (addr = gfr_start(base, size, align, flags);
> > +	     gfr_continue(base, addr, size, flags);
> > +	     addr = gfr_next(addr, size, flags)) {
> > +		if (__region_intersects(base, addr, size, 0, IORES_DESC_NONE) !=
> > +		    REGION_DISJOINT)
> >  			continue;
> >  
> > -		if (__request_region_locked(res, &iomem_resource, addr, size,
> > -						name, 0))
> > -			break;
> > +		if (flags & GFR_REQUEST_REGION) {
> > +			if (__request_region_locked(res, &iomem_resource, addr,
> > +						    size, name, 0))
> > +				break;
> >  
> > -		if (dev) {
> > -			dr->parent = &iomem_resource;
> > -			dr->start = addr;
> > -			dr->n = size;
> > -			devres_add(dev, dr);
> > -		}
> > +			if (dev) {
> > +				dr->parent = &iomem_resource;
> > +				dr->start = addr;
> > +				dr->n = size;
> > +				devres_add(dev, dr);
> > +			}
> >  
> > -		res->desc = IORES_DESC_DEVICE_PRIVATE_MEMORY;
> > -		write_unlock(&resource_lock);
> > +			res->desc = desc;
> > +			write_unlock(&resource_lock);
> > +
> > +
> > +			/*
> > +			 * A driver is claiming this region so revoke any
> > +			 * mappings.
> > +			 */
> > +			revoke_iomem(res);
> > +		} else {
> > +			res->start = addr;
> > +			res->end = addr + size - 1;
> > +			res->name = name;
> > +			res->desc = desc;
> > +			res->flags = IORESOURCE_MEM;
> > +
> > +			/*
> > +			 * Only succeed if the resource hosts an exclusive
> > +			 * range after the insert
> > +			 */
> > +			if (__insert_resource(base, res) || res->child)
> > +				break;
> > +
> > +			write_unlock(&resource_lock);
> > +		}
> >  
> > -		/*
> > -		 * A driver is claiming this region so revoke any mappings.
> > -		 */
> > -		revoke_iomem(res);
> >  		return res;
> >  	}
> >  	write_unlock(&resource_lock);
> >  
> > -	free_resource(res);
> > -	if (dr)
> > +	if (flags & GFR_REQUEST_REGION) {
> > +		free_resource(res);
> >  		devres_free(dr);
> 
> The original if (dr) was unnecessary as devres_free() checks.
> 
> Looking just at this patch it looks like you aren't covering the
> corner case of dev == NULL and GFR_REQUEST_REGION.
> 
> Perhaps worth a tiny comment in patch description? (doesn't seem worth
> pulling this change out as a precursor given it's so small).
> Or add the extra if (dr) back in to 'document' that no change...

Added to the changelog:

As part of the consolidation of the legacy GFR_REQUEST_REGION case with
the new default of just inserting a new resource into the free space,
some minor cleanups, like not checking for NULL before calling
devres_free() (which does its own check), are included.

> 
> 
> > +	} else if (dev)
> > +		devm_release_action(dev, remove_free_mem_region, res);
> >  
> >  	return ERR_PTR(-ERANGE);
> >  }
> > @@ -1854,18 +1928,48 @@ static struct resource *__request_free_mem_region(struct device *dev,
> >  struct resource *devm_request_free_mem_region(struct device *dev,
> >  		struct resource *base, unsigned long size)
> >  {
> > -	return __request_free_mem_region(dev, base, size, dev_name(dev));
> > +	unsigned long flags = GFR_DESCENDING | GFR_REQUEST_REGION;
> > +
> > +	return get_free_mem_region(dev, base, size, GFR_DEFAULT_ALIGN,
> > +				   dev_name(dev),
> > +				   IORES_DESC_DEVICE_PRIVATE_MEMORY, flags);
> >  }
> >  EXPORT_SYMBOL_GPL(devm_request_free_mem_region);
> >  
> >  struct resource *request_free_mem_region(struct resource *base,
> >  		unsigned long size, const char *name)
> >  {
> > -	return __request_free_mem_region(NULL, base, size, name);
> > +	unsigned long flags = GFR_DESCENDING | GFR_REQUEST_REGION;
> > +
> > +	return get_free_mem_region(NULL, base, size, GFR_DEFAULT_ALIGN, name,
> > +				   IORES_DESC_DEVICE_PRIVATE_MEMORY, flags);
> >  }
> >  EXPORT_SYMBOL_GPL(request_free_mem_region);
> >  
> > -#endif /* CONFIG_DEVICE_PRIVATE */
> > +/**
> > + * alloc_free_mem_region - find a free region relative to @base
> > + * @base: resource that will parent the new resource
> > + * @size: size in bytes of memory to allocate from @base
> > + * @align: alignment requirements for the allocation
> > + * @name: resource name
> > + *
> > + * Buses like CXL, that can dynamically instantiate new memory regions,
> > + * need a method to allocate physical address space for those regions.
> > + * Allocate and insert a new resource to cover a free, unclaimed by a
> > + * descendant of @base, range in the span of @base.
> > + */
> > +struct resource *alloc_free_mem_region(struct resource *base,
> Given the extra align parameter, does it make sense to give this a naming
> that highlights that vs the other two interfaces above?
> 
> alloc_free_mem_region_aligned()

The other variants are also aligned, they just aren't variably aligned,
they are implicitly aligned to GFR_DEFAULT_ALIGN. So I think calling
this one _aligned() betrays what is happening in the other cases.
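
To make the "always aligned, just not variably aligned" point concrete,
here is a userspace sketch of the power-of-2 round-up that a statement
like 'size = ALIGN(size, align)' performs. round_up_pow2() is a
hypothetical stand-in for illustration, and the alignment values used
with it below are arbitrary examples, not the value of GFR_DEFAULT_ALIGN.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: round @size up to the next multiple of @align,
 * where @align must be a non-zero power of 2.
 */
static uint64_t round_up_pow2(uint64_t size, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (size + align - 1) & ~(align - 1);
}
```

A caller with a fixed policy passes the same @align every time, while a
caller like alloc_free_mem_region() lets the user choose it; in both
cases the size is rounded the same way.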

> > +				       unsigned long size, unsigned long align,
> > +				       const char *name)
> > +{
> > +	/* GFR_ASCENDING | GFR_INSERT_RESOURCE */
> 
> Given those flags don't exist and some fool like me might grep for them
> perhaps better to describe it in text
> 
> 	/* Default of ascending direction and insert resource */

Ok.


* Re: [PATCH 34/46] cxl/region: Add region creation support
  2022-06-30 13:17   ` Jonathan Cameron
@ 2022-07-11  0:08     ` Dan Williams
  2022-07-19 14:42       ` Jonathan Cameron
  0 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-07-11  0:08 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, nvdimm, linux-pci, patches, hch, Ben Widawsky

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 21:19:38 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > From: Ben Widawsky <bwidawsk@kernel.org>
> > 
> > CXL 2.0 allows for dynamic provisioning of new memory regions (system
> > physical address resources like "System RAM" and "Persistent Memory").
> > Whereas DDR and PMEM resources are conveyed statically at boot, CXL
> > allows for assembling and instantiating new regions from the available
> > capacity of CXL memory expanders in the system.
> > 
> > Sysfs with an "echo $region_name > $create_region_attribute" interface
> > is chosen as the mechanism to initiate the provisioning process. This
> > was chosen over ioctl() and netlink() to keep the configuration
> > interface entirely in a pseudo-fs interface, and it was chosen over
> > configfs since, aside from this one creation event, the interface is
> > read-mostly. I.e. configfs supports cases where an object is designed to
> > be provisioned each boot, like an iSCSI storage target, and CXL region
> > creation is mostly for PMEM regions which are created usually once
> > per-lifetime of a server instance.
> > 
> > Recall that the major change that CXL brings over previous
> > persistent memory architectures is the ability to dynamically define new
> > regions.  Compare that to drivers like 'nfit' where the region
> > configuration is statically defined by platform firmware.
> > 
> > Regions are created as a child of a root decoder that encompasses an
> > address space with constraints. When created through sysfs, the root
> > decoder is explicit. When created from an LSA's region structure a root
> > decoder will possibly need to be inferred by the driver.
> > 
> > Upon region creation through sysfs, a vacant region is created with a
> > unique name. Regions have a number of attributes that must be configured
> > before the region can be bound to the driver where HDM decoder program
> > is completed.
> > 
> > An example of creating a new region:
> > 
> > - Allocate a new region name:
> > region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region)
> > 
> > - Create a new region by name:
> > while
> > region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region)
> 
> Perhaps it is worth calling out the region ID allocator is shared
> with nvdimms and other usecases.  I'm not really sure what the advantage
> in doing that is, but it doesn't do any real harm.

The rationale is that there are several producers of memory regions:
nvdimm, device-dax (hmem), and now cxl. Of those cases cxl can pass
regions to nvdimm and nvdimm can pass regions to device-dax (pmem). If
each of those cases allocated their own region-id it would just
complicate debug for no benefit. I can add a note to remind why
memregion_alloc() was introduced in the first instance.

> 
> > ! echo $region > /sys/bus/cxl/devices/decoder0.0/create_pmem_region
> > do true; done

I recall you also asked to clarify the rationale of this complexity. It
is related to the potential proliferation of disparate region ids, but
also a lesson learned from nvdimm which itself learned lessons from
md-raid. The lesson from md-raid in short is do not use ioctl for object
creation. After "not ioctl" the choice is configfs or a small bit of
sysfs hackery. Configfs is overkill when there is already a sysfs
hierarchy that just needs one new object injected.

Namespace creation in nvdimm pre-created "seed" devices which let the
kernel control the naming, but confused end users that wondered about
vestigial devices. This "read to learn next object name" + "write to
atomically claim and instantiate that id" cleans up that vestigial
device problem while also constraining object naming to follow memregion
id expectations.
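
For illustration only, the read-then-claim flow described above can be
modeled in userspace C along these lines. This is a sketch, not the
driver code: model_memregion_alloc() and model_memregion_free() are
hypothetical stand-ins for memregion_alloc() / memregion_free(), and
struct model_root_decoder only mimics the region_id atomic in the root
decoder.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical stand-in for memregion_alloc(): hands out increasing ids. */
static atomic_int next_free_id;

static int model_memregion_alloc(void)
{
	return atomic_fetch_add(&next_free_id, 1);
}

/* Stand-in for memregion_free(); this model does not recycle ids. */
static void model_memregion_free(int id)
{
	(void)id;
}

/* Models cxlrd->region_id: the one name the decoder currently advertises. */
struct model_root_decoder {
	atomic_int region_id;
};

static void model_decoder_init(struct model_root_decoder *rd)
{
	atomic_init(&rd->region_id, model_memregion_alloc());
}

/* "read to learn next object name" */
static int model_create_show(struct model_root_decoder *rd, char *buf,
			     size_t len)
{
	return snprintf(buf, len, "region%d", atomic_load(&rd->region_id));
}

/* "write to atomically claim and instantiate that id" */
static int model_create_store(struct model_root_decoder *rd, const char *buf)
{
	int id, next;

	if (sscanf(buf, "region%d", &id) != 1 || id < 0)
		return -EINVAL;

	/* Reserve the name that will be advertised after this claim. */
	next = model_memregion_alloc();

	/* Only a writer whose name is still current wins the swap. */
	if (!atomic_compare_exchange_strong(&rd->region_id, &id, next)) {
		model_memregion_free(next);
		return -EBUSY;
	}

	return id; /* the caller would now instantiate "region<id>" */
}
```

In this model, writing back the name that was just read succeeds, a
second writer that raced with the same stale name gets -EBUSY, and the
next read reports a fresh name. No device exists until the claim
succeeds, which is the property that avoids the seed device.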

> > 
> > - Region now exists in sysfs:
> > stat -t /sys/bus/cxl/devices/decoder0.0/$region
> > 
> > - Delete the region, and name:
> > echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region
> > 
> > Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> > [djbw: simplify locking, reword changelog]
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> 
> > ---
> >  Documentation/ABI/testing/sysfs-bus-cxl       |  25 +++
> >  .../driver-api/cxl/memory-devices.rst         |  11 +
> >  drivers/cxl/Kconfig                           |   5 +
> >  drivers/cxl/core/Makefile                     |   1 +
> >  drivers/cxl/core/core.h                       |  12 ++
> >  drivers/cxl/core/port.c                       |  39 +++-
> >  drivers/cxl/core/region.c                     | 199 ++++++++++++++++++
> >  drivers/cxl/cxl.h                             |  18 ++
> >  tools/testing/cxl/Kbuild                      |   1 +
> >  9 files changed, 308 insertions(+), 3 deletions(-)
> >  create mode 100644 drivers/cxl/core/region.c
> > 
> 
> ...
> 
> 
> > diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> > index 472ec9cb1018..ebe6197fb9b8 100644
> > --- a/drivers/cxl/core/core.h
> > +++ b/drivers/cxl/core/core.h
> > @@ -9,6 +9,18 @@ extern const struct device_type cxl_nvdimm_type;
> >  
> >  extern struct attribute_group cxl_base_attribute_group;
> >  
> > +#ifdef CONFIG_CXL_REGION
> > +extern struct device_attribute dev_attr_create_pmem_region;
> > +extern struct device_attribute dev_attr_delete_region;
> > +/*
> > + * Note must be used at the end of an attribute list, since it
> > + * terminates the list in the CONFIG_CXL_REGION=n case.
> 
> That's rather ugly.  Maybe just push the ifdef down into the c file
> where we will be shortening the list and it should be obvious what is
> going on without needing the comment?  Much as I don't like ifdef
> magic in the c files, it sometimes ends up cleaner.

No, I think ifdef in C is definitely uglier, but I also notice that
helpers like SET_SYSTEM_SLEEP_PM_OPS() are defined to be used in any
place in the list. So, I'll just duplicate that approach.
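
As a rough sketch of what a position-independent helper could look
like, modeled on how SET_SYSTEM_SLEEP_PM_OPS() can sit anywhere in an
initializer: the macro carries its own trailing comma when the feature
is enabled and expands to nothing when it is disabled. MODEL_REGION,
MODEL_SET_REGION_ATTR(), struct model_attr and the attribute names are
all made up for this example and are not the names the patch will use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct attribute. */
struct model_attr {
	const char *name;
};

static struct model_attr attr_target_list = { "target_list" };
/* An always-present attribute that follows the optional ones. */
static struct model_attr attr_trailer = { "trailer" };

#ifdef MODEL_REGION
static struct model_attr attr_create = { "create_pmem_region" };
static struct model_attr attr_delete = { "delete_region" };
/* Expands to one list element plus its trailing comma. */
#define MODEL_SET_REGION_ATTR(x) (&attr_##x),
#else
/* Expands to nothing, so the list just gets shorter. */
#define MODEL_SET_REGION_ATTR(x)
#endif

static struct model_attr *model_attrs[] = {
	&attr_target_list,
	MODEL_SET_REGION_ATTR(create)
	MODEL_SET_REGION_ATTR(delete)
	&attr_trailer,
	NULL,
};

/* Count entries up to the NULL terminator. */
static size_t model_attr_count(struct model_attr *const *list)
{
	size_t n = 0;

	while (list[n])
		n++;
	return n;
}
```

Built without MODEL_REGION the list holds only the two unconditional
entries; built with -DMODEL_REGION it holds four. Either way nothing
after the macro is cut off, so no "must be last" comment is needed.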

> 
> > + */
> > +#define CXL_REGION_ATTR(x) (&dev_attr_##x.attr)
> > +#else
> > +#define CXL_REGION_ATTR(x) NULL
> > +#endif
> > +
> >  struct cxl_send_command;
> >  struct cxl_mem_query_commands;
> >  int cxl_query_cmd(struct cxl_memdev *cxlmd,
> > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > index 2e56903399c2..c9207ebc3f32 100644
> > --- a/drivers/cxl/core/port.c
> > +++ b/drivers/cxl/core/port.c
> > @@ -1,6 +1,7 @@
> >  // SPDX-License-Identifier: GPL-2.0-only
> >  /* Copyright(c) 2020 Intel Corporation. All rights reserved. */
> >  #include <linux/io-64-nonatomic-lo-hi.h>
> > +#include <linux/memregion.h>
> >  #include <linux/workqueue.h>
> >  #include <linux/debugfs.h>
> >  #include <linux/device.h>
> > @@ -300,11 +301,35 @@ static struct attribute *cxl_decoder_root_attrs[] = {
> >  	&dev_attr_cap_type2.attr,
> >  	&dev_attr_cap_type3.attr,
> >  	&dev_attr_target_list.attr,
> > +	CXL_REGION_ATTR(create_pmem_region),
> > +	CXL_REGION_ATTR(delete_region),
> >  	NULL,
> >  };
> 
> >  
> >  static const struct attribute_group *cxl_decoder_root_attribute_groups[] = {
> > @@ -387,6 +412,7 @@ static void cxl_root_decoder_release(struct device *dev)
> >  {
> >  	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
> >  
> > +	memregion_free(atomic_read(&cxlrd->region_id));
> >  	__cxl_decoder_release(&cxlrd->cxlsd.cxld);
> >  	kfree(cxlrd);
> >  }
> > @@ -1415,6 +1441,7 @@ static struct lock_class_key cxl_decoder_key;
> >  static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
> >  					     unsigned int nr_targets)
> >  {
> > +	struct cxl_root_decoder *cxlrd = NULL;
> >  	struct cxl_decoder *cxld;
> >  	struct device *dev;
> >  	void *alloc;
> > @@ -1425,16 +1452,20 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
> >  
> >  	if (nr_targets) {
> >  		struct cxl_switch_decoder *cxlsd;
> > -		struct cxl_root_decoder *cxlrd;
> >  
> >  		if (is_cxl_root(port)) {
> >  			alloc = kzalloc(struct_size(cxlrd, cxlsd.target,
> >  						    nr_targets),
> >  					GFP_KERNEL);
> >  			cxlrd = alloc;
> > -			if (cxlrd)
> > +			if (cxlrd) {
> >  				cxlsd = &cxlrd->cxlsd;
> > -			else
> > +				atomic_set(&cxlrd->region_id, -1);
> > +				rc = memregion_alloc(GFP_KERNEL);
> > +				if (rc < 0)
> > +					goto err;
> 
> Leaving region_id set to -1 seems interesting for ever
> recovering from this error.  Perhaps a comment on how the magic
> value is used.

Comment added.

> 
> > +				atomic_set(&cxlrd->region_id, rc);
> > +			} else
> >  				cxlsd = NULL;
> >  		} else {
> >  			alloc = kzalloc(struct_size(cxlsd, target, nr_targets),
> > @@ -1490,6 +1521,8 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port,
> >  
> >  	return cxld;
> >  err:
> > +	if (cxlrd && atomic_read(&cxlrd->region_id) >= 0)
> > +		memregion_free(atomic_read(&cxlrd->region_id));
> >  	kfree(alloc);
> >  	return ERR_PTR(rc);
> >  }
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > new file mode 100644
> > index 000000000000..f2a0ead20ca7
> > --- /dev/null
> > +++ b/drivers/cxl/core/region.c
> > @@ -0,0 +1,199 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
> > +#include <linux/memregion.h>
> > +#include <linux/genalloc.h>
> > +#include <linux/device.h>
> > +#include <linux/module.h>
> > +#include <linux/slab.h>
> > +#include <linux/idr.h>
> > +#include <cxl.h>
> > +#include "core.h"
> > +
> > +/**
> > + * DOC: cxl core region
> > + *
> > + * CXL Regions represent mapped memory capacity in system physical address
> > + * space. Whereas the CXL Root Decoders identify the bounds of potential CXL
> > + * Memory ranges, Regions represent the active mapped capacity by the HDM
> > + * Decoder Capability structures throughout the Host Bridges, Switches, and
> > + * Endpoints in the topology.
> > + */
> > +
> > +static struct cxl_region *to_cxl_region(struct device *dev);
> > +
> > +static void cxl_region_release(struct device *dev)
> > +{
> > +	struct cxl_region *cxlr = to_cxl_region(dev);
> > +
> > +	memregion_free(cxlr->id);
> > +	kfree(cxlr);
> > +}
> > +
> > +static const struct device_type cxl_region_type = {
> > +	.name = "cxl_region",
> > +	.release = cxl_region_release,
> > +};
> > +
> > +bool is_cxl_region(struct device *dev)
> > +{
> > +	return dev->type == &cxl_region_type;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(is_cxl_region, CXL);
> > +
> > +static struct cxl_region *to_cxl_region(struct device *dev)
> > +{
> > +	if (dev_WARN_ONCE(dev, dev->type != &cxl_region_type,
> > +			  "not a cxl_region device\n"))
> > +		return NULL;
> > +
> > +	return container_of(dev, struct cxl_region, dev);
> > +}
> > +
> > +static void unregister_region(void *dev)
> > +{
> > +	device_unregister(dev);
> > +}
> > +
> > +static struct lock_class_key cxl_region_key;
> > +
> > +static struct cxl_region *cxl_region_alloc(struct cxl_root_decoder *cxlrd, int id)
> > +{
> > +	struct cxl_region *cxlr;
> > +	struct device *dev;
> > +
> > +	cxlr = kzalloc(sizeof(*cxlr), GFP_KERNEL);
> > +	if (!cxlr) {
> > +		memregion_free(id);
> 
> That's a bit nasty as it gives the function side effects. Perhaps some
> comments in the callers of this to highlight that memregion will either be freed
> in here or handled over to the device.

Added to the devm_cxl_add_region() kdoc that the memregion id is freed
on failure.

> 
> > +		return ERR_PTR(-ENOMEM);
> > +	}
> > +
> > +	dev = &cxlr->dev;
> > +	device_initialize(dev);
> > +	lockdep_set_class(&dev->mutex, &cxl_region_key);
> > +	dev->parent = &cxlrd->cxlsd.cxld.dev;
> > +	device_set_pm_not_required(dev);
> > +	dev->bus = &cxl_bus_type;
> > +	dev->type = &cxl_region_type;
> > +	cxlr->id = id;
> > +
> > +	return cxlr;
> > +}
> > +
> > +/**
> > + * devm_cxl_add_region - Adds a region to a decoder
> > + * @cxlrd: root decoder
> > + * @id: memregion id to create
> > + * @mode: mode for the endpoint decoders of this region
> 
> Missing docs for type

Added.

> 
> > + *
> > + * This is the second step of region initialization. Regions exist within an
> > + * address space which is mapped by a @cxlrd.
> > + *
> > + * Return: 0 if the region was added to the @cxlrd, else returns negative error
> > + * code. The region will be named "regionZ" where Z is the unique region number.
> > + */
> > +static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
> > +					      int id,
> > +					      enum cxl_decoder_mode mode,
> > +					      enum cxl_decoder_type type)
> > +{
> > +	struct cxl_port *port = to_cxl_port(cxlrd->cxlsd.cxld.dev.parent);
> > +	struct cxl_region *cxlr;
> > +	struct device *dev;
> > +	int rc;
> > +
> > +	cxlr = cxl_region_alloc(cxlrd, id);
> > +	if (IS_ERR(cxlr))
> > +		return cxlr;
> > +	cxlr->mode = mode;
> > +	cxlr->type = type;
> > +
> > +	dev = &cxlr->dev;
> > +	rc = dev_set_name(dev, "region%d", id);
> > +	if (rc)
> > +		goto err;
> > +
> > +	rc = device_add(dev);
> > +	if (rc)
> > +		goto err;
> > +
> > +	rc = devm_add_action_or_reset(port->uport, unregister_region, cxlr);
> > +	if (rc)
> > +		return ERR_PTR(rc);
> > +
> > +	dev_dbg(port->uport, "%s: created %s\n",
> > +		dev_name(&cxlrd->cxlsd.cxld.dev), dev_name(dev));
> > +	return cxlr;
> > +
> > +err:
> > +	put_device(dev);
> > +	return ERR_PTR(rc);
> > +}
> > +
> 
> > +static ssize_t create_pmem_region_store(struct device *dev,
> > +					struct device_attribute *attr,
> > +					const char *buf, size_t len)
> > +{
> > +	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
> > +	struct cxl_region *cxlr;
> > +	unsigned int id, rc;
> > +
> > +	rc = sscanf(buf, "region%u\n", &id);
> > +	if (rc != 1)
> > +		return -EINVAL;
> > +
> > +	rc = memregion_alloc(GFP_KERNEL);
> > +	if (rc < 0)
> > +		return rc;
> > +
> > +	if (atomic_cmpxchg(&cxlrd->region_id, id, rc) != id) {
> > +		memregion_free(rc);
> > +		return -EBUSY;
> > +	}
> > +
> > +	cxlr = devm_cxl_add_region(cxlrd, id, CXL_DECODER_PMEM,
> > +				   CXL_DECODER_EXPANDER);
> > +	if (IS_ERR(cxlr))
> > +		return PTR_ERR(cxlr);
> > +
> > +	return len;
> > +}
> > +DEVICE_ATTR_RW(create_pmem_region);
> > +
> > +static struct cxl_region *cxl_find_region_by_name(struct cxl_decoder *cxld,
> 
> Perhaps rename cxld here to make it clear it's a root decoder only.

Yes, in fact it should just be a 'struct cxl_root_decoder' type
argument.

> 
> > +						  const char *name)
> > +{
> > +	struct device *region_dev;
> > +
> > +	region_dev = device_find_child_by_name(&cxld->dev, name);
> > +	if (!region_dev)
> > +		return ERR_PTR(-ENODEV);
> > +
> > +	return to_cxl_region(region_dev);
> > +}
> > +
> > +static ssize_t delete_region_store(struct device *dev,
> > +				   struct device_attribute *attr,
> > +				   const char *buf, size_t len)
> > +{
> > +	struct cxl_port *port = to_cxl_port(dev->parent);
> > +	struct cxl_decoder *cxld = to_cxl_decoder(dev);
> As above, given it's the root decoder can we name it to make that
> obvious?

Right, this is now:

struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);

^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH 36/46] cxl/region: Add interleave ways attribute
  2022-06-30 13:44   ` Jonathan Cameron
@ 2022-07-11  0:32     ` Dan Williams
  2022-07-19 14:47       ` Jonathan Cameron
  0 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-07-11  0:32 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, nvdimm, linux-pci, patches, hch, Ben Widawsky

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 21:19:40 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > From: Ben Widawsky <bwidawsk@kernel.org>
> > 
> > Add an ABI to allow the number of devices that comprise a region to be
> > set.
> 
> Should at least mention interleave_granularity is being added as well!

Added.

> 
> > 
> > Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> > [djbw: reword changelog]
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> 
> Random diversion inline...
> 
> > ---
> >  Documentation/ABI/testing/sysfs-bus-cxl |  21 ++++
> >  drivers/cxl/core/region.c               | 128 ++++++++++++++++++++++++
> >  drivers/cxl/cxl.h                       |  33 ++++++
> >  3 files changed, 182 insertions(+)
> 
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > index f75978f846b9..78af42454760 100644
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -7,6 +7,7 @@
> 
> 
> > +static ssize_t interleave_granularity_store(struct device *dev,
> > +					    struct device_attribute *attr,
> > +					    const char *buf, size_t len)
> > +{
> > +	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent);
> > +	struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
> > +	struct cxl_region *cxlr = to_cxl_region(dev);
> > +	struct cxl_region_params *p = &cxlr->params;
> > +	int rc, val;
> > +	u16 ig;
> > +
> > +	rc = kstrtoint(buf, 0, &val);
> > +	if (rc)
> > +		return rc;
> > +
> > +	rc = granularity_to_cxl(val, &ig);
> > +	if (rc)
> > +		return rc;
> > +
> > +	/* region granularity must be >= root granularity */
> 
> In general I think that's an implementation choice.  Sure today
> we only support it this way, but it's perfectly possible to build
> setups where that's not the case.

If the region granularity is smaller than the host bridge interleave
granularity it means that multiple devices per host bridge are needed to
satisfy a single "slot" in the interleave. Valid? Yes. Useful for Linux
to support? Not clear.

> Maybe the comment should say that this code goes with an
> implementation choice inline with the software guide (that argues you
> will always prefer small ig for interleaving at the host to make best
> use of bandwidth etc).

No, I would prefer that as far as the Linux implementation is concerned
the software-guide does not exist. In the sense that the Linux
implementation choices supersede and are otherwise a superset of what
the guide recommends.

Also, for the same reason that the code does not anticipate future
implementation possibilities, neither should the comments. It is
sufficient to just change this comment when / if the implementation stops
expecting region granularity >= root granularity.

> Interestingly the code I was previously testing QEMU with
> allowed that option (might have been only option that worked).
> That code was a mixture of Ben's earlier version and my own hacks.
> It probably doesn't make sense to support other ways of picking
> the interleaving granularity until / if we ever get a request
> to do so. 
> 
> I think it results in a different device ordering.
> 
> Ordering with this
> 
>     Host
>      | 4k
>     / \
>    /   \  
>   HB   HB  8k
>   |     |
>  / \   / \
> 0  2   1  3
> 
> Ordering with Larger granularity CFMWS over finer granularity HB
> 
>     Host
>      | 8k
>     / \
>    /   \ 
>   HB   HB 4k
>   |     |
>  / \   / \
> 0  1   2  3
> 
> Not clear why you'd do the second one though :)  So can ignore for now.

All I can think of is "ZOMG! My platform failed and the only one I have
to recover my data has HB interleaves with larger granularity than my
failed system!". Otherwise, I expect cross-platform CXL persistent
memory recovery to be so rare as to not need to spend too much time
worrying about it now. It seems a straightforward constraint to lift at
a later date without any risk of breaking the ABI.
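
To make the two orderings in the diagrams above concrete, here is a
small userspace model in C. It assumes a deliberately simplified decode
where each level selects its target with (offset / granularity) % ways;
struct model_topology and the model_*() helpers are invented for this
sketch and do not correspond to driver symbols.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified two-level decode: the root picks a host bridge and each
 * host bridge picks one of its devices, both by modulo arithmetic on
 * the HPA offset. All names here are illustrative.
 */
struct model_topology {
	uint64_t root_ig;		/* granularity across host bridges */
	uint64_t hb_ig;			/* granularity within a host bridge */
	unsigned int nr_hb;		/* host bridges under the root */
	unsigned int devs_per_hb;	/* devices under each host bridge */
};

/* Which host bridge decodes @offset? */
static unsigned int model_hb(const struct model_topology *t, uint64_t offset)
{
	return (offset / t->root_ig) % t->nr_hb;
}

/* Which device below that host bridge decodes @offset? */
static unsigned int model_dev(const struct model_topology *t, uint64_t offset)
{
	return (offset / t->hb_ig) % t->devs_per_hb;
}

/*
 * Position of device @dev under host bridge @hb: the index of the first
 * chunk (sized by the smaller granularity) that decodes to it, or -1.
 */
static int model_position(const struct model_topology *t, unsigned int hb,
			  unsigned int dev)
{
	uint64_t step = t->root_ig < t->hb_ig ? t->root_ig : t->hb_ig;
	unsigned int ways = t->nr_hb * t->devs_per_hb;

	for (unsigned int pos = 0; pos < ways; pos++) {
		uint64_t offset = (uint64_t)pos * step;

		if (model_hb(t, offset) == hb && model_dev(t, offset) == dev)
			return (int)pos;
	}
	return -1;
}
```

With root_ig = 4k and hb_ig = 8k the devices under the first host
bridge land at positions 0 and 2 and those under the second at 1 and 3.
Swapping the two granularities gives 0 and 1 under the first host
bridge and 2 and 3 under the second, matching the second diagram.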


* Re: [PATCH 37/46] cxl/region: Allocate host physical address (HPA) capacity to new regions
  2022-06-30 13:56   ` Jonathan Cameron
@ 2022-07-11  0:47     ` Dan Williams
  0 siblings, 0 replies; 157+ messages in thread
From: Dan Williams @ 2022-07-11  0:47 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, nvdimm, linux-pci, patches, hch, Ben Widawsky

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 21:19:41 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > After a region's interleave parameters (ways and granularity) are set,
> > add a way for regions to allocate HPA from the free capacity in their
> > decoder. The allocator for this capacity reuses the 'struct resource'
> > based allocator used for CONFIG_DEVICE_PRIVATE.
> > 
> > Once the tuple of "ways, granularity, and size" is set the
> > region configuration transitions to the CXL_CONFIG_INTERLEAVE_ACTIVE
> > state which is a precursor to allowing endpoint decoders to be added to
> > a region.
> > 
> > Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
> > Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> 
> A few comments on the interface inline.
> 
> Thanks,
> 
> Jonathan
> 
> > ---
> >  Documentation/ABI/testing/sysfs-bus-cxl |  25 ++++
> >  drivers/cxl/Kconfig                     |   3 +
> >  drivers/cxl/core/region.c               | 148 +++++++++++++++++++++++-
> >  drivers/cxl/cxl.h                       |   2 +
> >  4 files changed, 177 insertions(+), 1 deletion(-)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> > index 46d5295c1149..3658facc9944 100644
> > --- a/Documentation/ABI/testing/sysfs-bus-cxl
> > +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> > @@ -294,3 +294,28 @@ Description:
> >  		(RW) Configures the number of devices participating in the
> >  		region is set by writing this value. Each device will provide
> >  		1/interleave_ways of storage for the region.
> > +
> > +
> > +What:		/sys/bus/cxl/devices/regionZ/size
> > +Date:		May, 2022
> > +KernelVersion:	v5.20
> > +Contact:	linux-cxl@vger.kernel.org
> > +Description:
> > +		(RW) System physical address space to be consumed by the region.
> > +		When written to, this attribute will allocate space out of the
> > +		CXL root decoder's address space. When read the size of the
> > +		address space is reported and should match the span of the
> > +		region's resource attribute. Size shall be set after the
> > +		interleave configuration parameters.
> 
> There seem to be constraints that say you have to set this to 0 and then something
> else later to force a resize. That should be mentioned here or gotten rid of.

Yes, a constraint that precludes questions about what happens to
existing data during a resize. The forced trip through a zero allocation
is to help codify that the kernel makes no guarantees about the state of
data over a resize.

Updated the text to:

    (RW) System physical address space to be consumed by the region.
    When written, triggers the driver to allocate space out of the
    parent root decoder's address space. When read the size of the
    address space is reported and should match the span of the
    region's resource attribute. Size shall be set after the
    interleave configuration parameters. Once set it cannot be
    changed, only freed by writing 0. The kernel makes no guarantees
    that data is maintained over an address space freeing event, and
    there is no guarantee that a free followed by an allocate
    results in the same address being allocated.
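
The rules in that text can be summarized with a small state model in C.
This is only a sketch of the documented behavior: struct model_region,
model_size_store() and the specific error codes are assumptions made
for illustration, not the driver implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the state behind the "size" attribute. */
struct model_region {
	bool interleave_set;	/* ways and granularity already written */
	uint64_t size;		/* 0 means no HPA is allocated */
};

static int model_size_store(struct model_region *r, uint64_t val)
{
	if (val == 0) {
		/* Freeing is always allowed; contents are not preserved. */
		r->size = 0;
		return 0;
	}

	/* Size shall be set after the interleave parameters (assumed errno). */
	if (!r->interleave_set)
		return -ENXIO;

	/* No in-place resize: the caller must write 0 first (assumed errno). */
	if (r->size)
		return -EBUSY;

	r->size = val;
	return 0;
}
```

The model rejects every non-zero write while capacity is allocated,
including a rewrite of the same value; whether the driver treats that
case as a no-op is not something this sketch claims. A resize is then
always two writes, 0 followed by the new size, with no promise that the
second allocation lands at the same address.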

> 
> 
> > +
> > +
> > +What:		/sys/bus/cxl/devices/regionZ/resource
> > +Date:		May, 2022
> > +KernelVersion:	v5.20
> > +Contact:	linux-cxl@vger.kernel.org
> > +Description:
> > +		(RO) A region is a contiguous partition of a CXL root decoder
> > +		address space. Region capacity is allocated by writing to the
> > +		size attribute, the resulting physical address space determined
> > +		by the driver is reflected here. It is therefore not useful to
> > +		read this before writing a value to the size attribute.
> 
> I don't much like naming a "base address" resource.  I'd expect resource to contain
> both base and size whereas this only has the base address of the region.

I think there is too much precedent to rename at this point.


* Re: [PATCH 38/46] cxl/region: Enable the assignment of endpoint decoders to regions
  2022-06-30 14:31   ` Jonathan Cameron
@ 2022-07-11  1:12     ` Dan Williams
  0 siblings, 0 replies; 157+ messages in thread
From: Dan Williams @ 2022-07-11  1:12 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, nvdimm, linux-pci, patches, hch, Ben Widawsky

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 21:19:42 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > The region provisioning process involves allocating DPA to a set of
> > endpoint decoders, and HPA plus the region geometry to a region device.
> > Then the decoder is assigned to the region. At this point several
> > validation steps can be performed to validate that the decoder is
> > suitable to participate in the region.
> > 
> > Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
> > Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  Documentation/ABI/testing/sysfs-bus-cxl |  19 ++
> >  drivers/cxl/core/core.h                 |   6 +
> >  drivers/cxl/core/hdm.c                  |  13 +-
> >  drivers/cxl/core/port.c                 |  12 +-
> >  drivers/cxl/core/region.c               | 286 +++++++++++++++++++++++-
> >  drivers/cxl/cxl.h                       |  11 +
> >  6 files changed, 342 insertions(+), 5 deletions(-)
> > 
> 
> A few fixes seem to have ended up in the wrong patch.
> Other trivial typos etc inline plus what looks to be an
> item left from a todo list...
> 
> ...
> 
> 
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > index a604c24ff918..4830365f3857 100644
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -24,6 +24,7 @@
> >   * but is only visible for persistent regions.
> >   * 1. Interleave granularity
> >   * 2. Interleave size
> > + * 3. Decoder targets
> >   */
> >  
> >  /*
> > @@ -138,6 +139,8 @@ static ssize_t interleave_ways_show(struct device *dev,
> >  	return rc;
> >  }
> >  
> > +static const struct attribute_group *get_cxl_region_target_group(void);
> > +
> >  static ssize_t interleave_ways_store(struct device *dev,
> >  				     struct device_attribute *attr,
> >  				     const char *buf, size_t len)
> > @@ -146,7 +149,7 @@ static ssize_t interleave_ways_store(struct device *dev,
> >  	struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
> >  	struct cxl_region *cxlr = to_cxl_region(dev);
> >  	struct cxl_region_params *p = &cxlr->params;
> > -	int rc, val;
> > +	int rc, val, save;
> >  	u8 iw;
> >  
> >  	rc = kstrtoint(buf, 0, &val);
> > @@ -175,9 +178,13 @@ static ssize_t interleave_ways_store(struct device *dev,
> >  		goto out;
> >  	}
> >  
> > +	save = p->interleave_ways;
> >  	p->interleave_ways = val;
> > +	rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_target_group());
> > +	if (rc)
> > +		p->interleave_ways = save;
> >  out:
> > -	up_read(&cxl_region_rwsem);
> > +	up_write(&cxl_region_rwsem);
> 
> Bug in earlier patch?

yes, fix now folded earlier. Good spot.

> 
> >  	if (rc)
> >  		return rc;
> >  	return len;
> > @@ -234,7 +241,7 @@ static ssize_t interleave_granularity_store(struct device *dev,
> >  
> >  	p->interleave_granularity = val;
> >  out:
> > -	up_read(&cxl_region_rwsem);
> > +	up_write(&cxl_region_rwsem);
> 
> Bug in earlier patch? 

yup.

> 
> >  	if (rc)
> >  		return rc;
> >  	return len;
> > @@ -393,9 +400,262 @@ static const struct attribute_group cxl_region_group = {
> >  	.is_visible = cxl_region_visible,
> >  };
> 
> ...
> 
> > +/*
> > + * - Check that the given endpoint is attached to a host-bridge identified
> > + *   in the root interleave.
> 
>  Comment on something to fix?  Or stale comment that can be dropped?

Stale comment, now dropped.

> 
> > + */
> > +static int cxl_region_attach(struct cxl_region *cxlr,
> > +			     struct cxl_endpoint_decoder *cxled, int pos)
> > +{
> > +	struct cxl_region_params *p = &cxlr->params;
> > +
> > +	if (cxled->mode == CXL_DECODER_DEAD) {
> > +		dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev));
> > +		return -ENODEV;
> > +	}
> > +
> > +	if (pos >= p->interleave_ways) {
> > +		dev_dbg(&cxlr->dev, "position %d out of range %d\n", pos,
> > +			p->interleave_ways);
> > +		return -ENXIO;
> > +	}
> > +
> > +	if (p->targets[pos] == cxled)
> > +		return 0;
> > +
> > +	if (p->targets[pos]) {
> > +		struct cxl_endpoint_decoder *cxled_target = p->targets[pos];
> > +		struct cxl_memdev *cxlmd_target = cxled_to_memdev(cxled_target);
> > +
> > +		dev_dbg(&cxlr->dev, "position %d already assigned to %s:%s\n",
> > +			pos, dev_name(&cxlmd_target->dev),
> > +			dev_name(&cxled_target->cxld.dev));
> > +		return -EBUSY;
> > +	}
> > +
> > +	p->targets[pos] = cxled;
> > +	cxled->pos = pos;
> > +	p->nr_targets++;
> > +
> > +	return 0;
> > +}
> > +
> > +static void cxl_region_detach(struct cxl_endpoint_decoder *cxled)
> > +{
> > +	struct cxl_region *cxlr = cxled->cxld.region;
> > +	struct cxl_region_params *p;
> > +
> > +	lockdep_assert_held_write(&cxl_region_rwsem);
> > +
> > +	if (!cxlr)
> > +		return;
> > +
> > +	p = &cxlr->params;
> > +	get_device(&cxlr->dev);
> > +
> > +	if (cxled->pos < 0 || cxled->pos >= p->interleave_ways ||
> > +	    p->targets[cxled->pos] != cxled) {
> > +		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> > +
> > +		dev_WARN_ONCE(&cxlr->dev, 1, "expected %s:%s at position %d\n",
> > +			      dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
> > +			      cxled->pos);
> > +		goto out;
> > +	}
> > +
> > +	p->targets[cxled->pos] = NULL;
> > +	p->nr_targets--;
> > +
> > +	/* notify the region driver that one of its targets has deparated */
> 
> departed?

Yup, thanks.


* Re: [PATCH 40/46] cxl/region: Attach endpoint decoders
  2022-06-30 16:34   ` Jonathan Cameron
@ 2022-07-11  2:02     ` Dan Williams
  0 siblings, 0 replies; 157+ messages in thread
From: Dan Williams @ 2022-07-11  2:02 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, nvdimm, linux-pci, patches, hch, Ben Widawsky

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 21:19:44 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > CXL regions (interleave sets) are made up of a set of memory devices
> > where each device maps a portion of the interleave with one of its
> > decoders (see CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure).
> > As endpoint decoders are identified by a provisioning tool they can be
> > added to a region provided the region interleave properties are set
> > (way, granularity, HPA) and DPA has been assigned to the decoder.
> > 
> > The attach event triggers several validation checks, for example:
> > - is the DPA sized appropriately for the region
> > - is the decoder reachable via the host-bridges identified by the
> >   region's root decoder
> > - is the device already active in a different region position slot
> > - are there already regions with a higher HPA active on a given port
> >   (per CXL 2.0 8.2.5.12.20 Committing Decoder Programming)
> > 
> > ...and the attach event affords an opportunity to collect data and
> > resources relevant to later programming the target lists in switch
> > decoders, for example:
> > - allocate a decoder at each cxl_port in the decode chain
> > - for a given switch port, how many of the region's endpoints are hosted
> >   through the port
> > - how many unique targets (next hops) does a port need to map to reach
> >   those endpoints
> > 
> > The act of reconciling this information and deploying it to the decoder
> > configuration is saved for a follow-on patch.
> > 
> > Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
> > Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  drivers/cxl/core/core.h   |   7 +
> >  drivers/cxl/core/port.c   |  10 +-
> >  drivers/cxl/core/region.c | 338 +++++++++++++++++++++++++++++++++++++-
> >  drivers/cxl/cxl.h         |  20 +++
> >  drivers/cxl/cxlmem.h      |   5 +
> >  5 files changed, 372 insertions(+), 8 deletions(-)
> > 
> 
> 
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > index 4830365f3857..65bf84abad57 100644
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -428,6 +428,254 @@ static size_t show_targetN(struct cxl_region *cxlr, char *buf, int pos)
> >  	return rc;
> >  }
> >  
> 
> > +
> > +static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port,
> > +					       struct cxl_region *cxlr)
> > +{
> > +	struct cxl_region_ref *cxl_rr;
> > +
> > +	cxl_rr = kzalloc(sizeof(*cxl_rr), GFP_KERNEL);
> > +	if (!cxl_rr)
> > +		return NULL;
> > +	cxl_rr->port = port;
> > +	cxl_rr->region = cxlr;
> > +	xa_init(&cxl_rr->endpoints);
> > +	return cxl_rr;
> > +}
> > +
> > +static void free_region_ref(struct cxl_region_ref *cxl_rr)
> > +{
> > +	struct cxl_port *port = cxl_rr->port;
> > +	struct cxl_region *cxlr = cxl_rr->region;
> > +	struct cxl_decoder *cxld = cxl_rr->decoder;
> > +
> > +	dev_WARN_ONCE(&cxlr->dev, cxld->region != cxlr, "region mismatch\n");
> > +	if (cxld->region == cxlr) {
> > +		cxld->region = NULL;
> > +		put_device(&cxlr->dev);
> > +	}
> > +
> > +	xa_erase(&port->regions, (unsigned long)cxlr);
> 
> Why do we have things in a free_ function that aren't simply removing things
> created in the alloc()?  I'd kind of expect this to be in a cxl_rr_del() or similar.

Fixed it the other way by just open-coding cxl_rr_add() into
alloc_region_ref(). There was no good reason to have them as separate
steps.
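
A minimal userspace sketch of that shape, where allocation also
publishes the reference in the port and free undoes both. The
fixed-size array stands in for the port's xarray, and struct
model_port, struct model_region_ref and the model_*() helpers are
hypothetical names for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_MAX_REGIONS 4

struct model_region_ref;

/* Stand-in for the per-port collection of region references. */
struct model_port {
	struct model_region_ref *regions[MODEL_MAX_REGIONS];
};

struct model_region_ref {
	struct model_port *port;
	const void *region;	/* opaque key, like the cxlr pointer */
	int slot;
};

/* Allocate a reference and publish it in @port in one step. */
static struct model_region_ref *
model_alloc_region_ref(struct model_port *port, const void *region)
{
	struct model_region_ref *rr;
	int slot = -1;

	for (int i = 0; i < MODEL_MAX_REGIONS; i++) {
		if (port->regions[i] && port->regions[i]->region == region)
			return NULL;	/* already tracked: insert fails */
		if (!port->regions[i] && slot < 0)
			slot = i;
	}
	if (slot < 0)
		return NULL;		/* no room left in this model */

	rr = calloc(1, sizeof(*rr));
	if (!rr)
		return NULL;
	rr->port = port;
	rr->region = region;
	rr->slot = slot;
	port->regions[slot] = rr;
	return rr;
}

/* Exact inverse of the allocation: unpublish, then free. */
static void model_free_region_ref(struct model_region_ref *rr)
{
	assert(rr->port->regions[rr->slot] == rr);
	rr->port->regions[rr->slot] = NULL;
	free(rr);
}
```

Because the insert happens inside the allocator, every successfully
allocated reference is visible in the port, and the free routine only
ever removes what the alloc routine created.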

> 
> > +	xa_destroy(&cxl_rr->endpoints);
> > +	kfree(cxl_rr);
> > +}
> > +
> > +static int cxl_rr_add(struct cxl_region_ref *cxl_rr)
> > +{
> > +	struct cxl_port *port = cxl_rr->port;
> > +	struct cxl_region *cxlr = cxl_rr->region;
> > +
> > +	return xa_insert(&port->regions, (unsigned long)cxlr, cxl_rr,
> > +			 GFP_KERNEL);
> > +}
> > +
> > +static int cxl_rr_ep_add(struct cxl_region_ref *cxl_rr,
> > +			 struct cxl_endpoint_decoder *cxled)
> > +{
> > +	int rc;
> > +	struct cxl_port *port = cxl_rr->port;
> > +	struct cxl_region *cxlr = cxl_rr->region;
> > +	struct cxl_decoder *cxld = cxl_rr->decoder;
> > +	struct cxl_ep *ep = cxl_ep_load(port, cxled_to_memdev(cxled));
> > +
> > +	rc = xa_insert(&cxl_rr->endpoints, (unsigned long)cxled, ep,
> > +			 GFP_KERNEL);
> > +	if (rc)
> > +		return rc;
> > +	cxl_rr->nr_eps++;
> > +
> > +	if (!cxld->region) {
> > +		cxld->region = cxlr;
> > +		get_device(&cxlr->dev);
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int cxl_port_attach_region(struct cxl_port *port,
> > +				  struct cxl_region *cxlr,
> > +				  struct cxl_endpoint_decoder *cxled, int pos)
> > +{
> > +	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> > +	struct cxl_ep *ep = cxl_ep_load(port, cxlmd);
> > +	struct cxl_region_ref *cxl_rr = NULL, *iter;
> > +	struct cxl_region_params *p = &cxlr->params;
> > +	struct cxl_decoder *cxld = NULL;
> > +	unsigned long index;
> > +	int rc = -EBUSY;
> > +
> > +	lockdep_assert_held_write(&cxl_region_rwsem);
> 
> This function is complex enough that maybe it would benefit from
> some comments saying what each part is doing.

...added a kdoc block:

/**
 * cxl_port_attach_region() - track a region's interest in a port by endpoint
 * @port: port to add a new region reference 'struct cxl_region_ref'
 * @cxlr: region to attach to @port
 * @cxled: endpoint decoder used to create or further pin a region reference
 * @pos: interleave position of @cxled in @cxlr
 *
 * The attach event is an opportunity to validate CXL decode setup
 * constraints and record metadata needed for programming HDM decoders,
 * in particular decoder target lists.
 *
 * The steps are:
 * - validate that there are no other regions with a higher HPA already
 *   associated with @port
 * - establish a region reference if one is not already present
 *   - additionally allocate a decoder instance that will host @cxlr on
 *     @port
 * - pin the region reference by the endpoint
 * - account for how many entries in @port's target list are needed to
 *   cover all of the added endpoints.
 */
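
The first step in that list can be modeled in isolation. The following is a
userspace sketch under the assumption that only the start addresses matter
for the ordering check, as in the xa_for_each() loop of the patch; the
hpa_range_model type and check_hpa_order() name are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for the 'struct resource' describing a region's HPA range */
struct hpa_range_model {
	unsigned long long start;
	unsigned long long end;
};

/*
 * Return -EBUSY if any range already attached to the port starts at a
 * higher HPA than the range being attached, 0 otherwise.
 */
static int check_hpa_order(const struct hpa_range_model *attached, size_t nr,
			   const struct hpa_range_model *new_res)
{
	size_t i;

	assert(new_res->start <= new_res->end);
	for (i = 0; i < nr; i++)
		if (attached[i].start > new_res->start)
			return -EBUSY;
	return 0;
}
```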

> 
> > +
> > +	xa_for_each(&port->regions, index, iter) {
> > +		struct cxl_region_params *ip = &iter->region->params;
> > +
> > +		if (iter->region == cxlr)
> > +			cxl_rr = iter;
> > +		if (ip->res->start > p->res->start) {
> > +			dev_dbg(&cxlr->dev,
> > +				"%s: HPA order violation %s:%pr vs %pr\n",
> > +				dev_name(&port->dev),
> > +				dev_name(&iter->region->dev), ip->res, p->res);
> > +			return -EBUSY;
> > +		}
> > +	}
> > +
> > +	if (cxl_rr) {
> > +		struct cxl_ep *ep_iter;
> > +		int found = 0;
> > +
> > +		cxld = cxl_rr->decoder;
> > +		xa_for_each(&cxl_rr->endpoints, index, ep_iter) {
> > +			if (ep_iter == ep)
> > +				continue;
> > +			if (ep_iter->next == ep->next) {
> > +				found++;
> > +				break;
> > +			}
> > +		}
> > +
> > +		/*
> > +		 * If this is a new target or if this port is direct connected
> > +		 * to this endpoint then add to the target count.
> > +		 */
> > +		if (!found || !ep->next)
> > +			cxl_rr->nr_targets++;
> > +	} else {
> > +		cxl_rr = alloc_region_ref(port, cxlr);
> > +		if (!cxl_rr) {
> > +			dev_dbg(&cxlr->dev,
> > +				"%s: failed to allocate region reference\n",
> > +				dev_name(&port->dev));
> > +			return -ENOMEM;
> > +		}
> > +		rc = cxl_rr_add(cxl_rr);
> > +		if (rc) {
> > +			dev_dbg(&cxlr->dev,
> > +				"%s: failed to track region reference\n",
> > +				dev_name(&port->dev));
> > +			kfree(cxl_rr);
> > +			return rc;
> > +		}
> > +	}
> > +
> > +	if (!cxld) {
> > +		if (port == cxled_to_port(cxled))
> > +			cxld = &cxled->cxld;
> > +		else
> > +			cxld = cxl_region_find_decoder(port, cxlr);
> > +		if (!cxld) {
> > +			dev_dbg(&cxlr->dev, "%s: no decoder available\n",
> > +				dev_name(&port->dev));
> > +			goto out_erase;
> > +		}
> > +
> > +		if (cxld->region) {
> > +			dev_dbg(&cxlr->dev, "%s: %s already attached to %s\n",
> > +				dev_name(&port->dev), dev_name(&cxld->dev),
> > +				dev_name(&cxld->region->dev));
> > +			rc = -EBUSY;
> > +			goto out_erase;
> > +		}
> > +
> > +		cxl_rr->decoder = cxld;
> > +	}
> > +
> > +	rc = cxl_rr_ep_add(cxl_rr, cxled);
> > +	if (rc) {
> > +		dev_dbg(&cxlr->dev,
> > +			"%s: failed to track endpoint %s:%s reference\n",
> > +			dev_name(&port->dev), dev_name(&cxlmd->dev),
> > +			dev_name(&cxld->dev));
> > +		goto out_erase;
> > +	}
> > +
> > +	return 0;
> > +out_erase:
> > +	if (cxl_rr->nr_eps == 0)
> > +		free_region_ref(cxl_rr);
> > +	return rc;
> > +}
> > +
> 
> >  
> >  static void cxl_region_detach(struct cxl_endpoint_decoder *cxled)
> >  {
> > +	struct cxl_port *iter, *ep_port = cxled_to_port(cxled);
> >  	struct cxl_region *cxlr = cxled->cxld.region;
> >  	struct cxl_region_params *p;
> >  
> > @@ -481,6 +811,10 @@ static void cxl_region_detach(struct cxl_endpoint_decoder *cxled)
> >  	p = &cxlr->params;
> >  	get_device(&cxlr->dev);
> >  
> > +	for (iter = ep_port; !is_cxl_root(iter);
> > +	     iter = to_cxl_port(iter->dev.parent))
> > +		cxl_port_detach_region(iter, cxlr, cxled);
> > +
> >  	if (cxled->pos < 0 || cxled->pos >= p->interleave_ways ||
> >  	    p->targets[cxled->pos] != cxled) {
> >  		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> > @@ -491,6 +825,8 @@ static void cxl_region_detach(struct cxl_endpoint_decoder *cxled)
> >  		goto out;
> >  	}
> >  
> > +	if (p->state == CXL_CONFIG_ACTIVE)
> 
> I 'think' the state is either CXL_CONFIG_ACTIVE or CXL_CONFIG_INTERLEAVE_ACTIVE,
> so you could set this unconditionally.  A comment here on permissible
> states would be useful for future reference.

cxl_region_detach() should not care if the region state is idle. Not
that it will happen in the current code, but the only expectation is
that if the region is active and an endpoint departs, it must be
downgraded in config state. CXL_CONFIG_IDLE is permissible, although not
expected. I do not think a comment is needed if the "if (p->state ==
CXL_CONFIG_ACTIVE)" check stays.
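
To spell out the intended transitions, here is a minimal userspace sketch.
The enum is a reduced, hypothetical model of the region config states (it
only covers the three states discussed here), not the kernel's definition.

```c
#include <assert.h>

/* Reduced model of the config states relevant to detach */
enum config_state_model {
	MODEL_IDLE,
	MODEL_INTERLEAVE_ACTIVE,
	MODEL_ACTIVE,
};

/* State of the region after one endpoint decoder departs */
static enum config_state_model state_after_detach(enum config_state_model state)
{
	/* the model only knows the three states above */
	assert(state <= MODEL_ACTIVE);

	/* a fully populated region loses a target: downgrade it */
	if (state == MODEL_ACTIVE)
		return MODEL_INTERLEAVE_ACTIVE;

	/* idle or partially populated regions keep their state */
	return state;
}
```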

> 
> > +		p->state = CXL_CONFIG_INTERLEAVE_ACTIVE;
> >  	p->targets[cxled->pos] = NULL;
> >  	p->nr_targets--;
> 
> 




* Re: [PATCH 42/46] cxl/hdm: Commit decoder state to hardware
  2022-06-30 17:05   ` Jonathan Cameron
@ 2022-07-11  3:02     ` Dan Williams
  0 siblings, 0 replies; 157+ messages in thread
From: Dan Williams @ 2022-07-11  3:02 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams; +Cc: linux-cxl, nvdimm, linux-pci, patches, hch

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 21:19:46 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > After all the soft validation of the region has completed, convey the
> > region configuration to hardware while being careful to commit decoders
> > in specification mandated order. In addition to programming the endpoint
> > decoder base-address, interleave ways and granularity, the switch
> > decoder target lists are also established.
> > 
> > While the kernel can enforce spec-mandated commit order, it can not
> > enforce spec-mandated reset order. For example, the kernel can't stop
> > someone from removing an endpoint device that is occupying decoderN in a
> > switch decoder where decoderN+1 is also committed. To reset decoderN,
> > decoderN+1 must be torn down first. That "tear down the world"
> > implementation is saved for a follow-on patch.
> > 
> > Callback operations are provided for the 'commit' and 'reset'
> > operations. While those callbacks may prove useful for CXL accelerators
> > (Type-2 devices with memory) the primary motivation is to enable a
> > simple way for cxl_test to intercept those operations.
> > 
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> 
> Trivial comments only in this one.
> 
> Jonathan
> 
> > ---
> >  Documentation/ABI/testing/sysfs-bus-cxl |  16 ++
> >  drivers/cxl/core/hdm.c                  | 218 ++++++++++++++++++++++++
> >  drivers/cxl/core/port.c                 |   1 +
> >  drivers/cxl/core/region.c               | 189 ++++++++++++++++++--
> >  drivers/cxl/cxl.h                       |  11 ++
> >  tools/testing/cxl/test/cxl.c            |  46 +++++
> >  6 files changed, 471 insertions(+), 10 deletions(-)
> > 
> 
> > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> > index 2ee62dde8b23..72f98f1a782c 100644
> > --- a/drivers/cxl/core/hdm.c
> > +++ b/drivers/cxl/core/hdm.c
> > @@ -129,6 +129,8 @@ struct cxl_hdm *devm_cxl_setup_hdm(struct cxl_port *port)
> >  		return ERR_PTR(-ENXIO);
> >  	}
> >  
> > +	dev_set_drvdata(&port->dev, cxlhdm);
> 
> Trivial, but dev == &port->dev I think so you might as well use dev.

Sure.

> This feels like a bit of a hack as it just so happens nothing else is
> in the port drvdata.  Maybe it's better to add a pointer from
> port to cxlhdm?

It's only valid while the port is attached to the cxl_port driver, which
sets it apart from other port data.

> 
> > +
> >  	return cxlhdm;
> >  }
> >  EXPORT_SYMBOL_NS_GPL(devm_cxl_setup_hdm, CXL);
> > @@ -444,6 +446,213 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
> >  	return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled);
> >  }
> >  
> 
> > +static int cxl_decoder_commit(struct cxl_decoder *cxld)
> > +{
> > +	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
> > +	struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev);
> > +	void __iomem *hdm = cxlhdm->regs.hdm_decoder;
> > +	int id = cxld->id, rc;
> > +	u64 base, size;
> > +	u32 ctrl;
> > +
> > +	if (cxld->flags & CXL_DECODER_F_ENABLE)
> > +		return 0;
> > +
> > +	if (port->commit_end + 1 != id) {
> > +		dev_dbg(&port->dev,
> > +			"%s: out of order commit, expected decoder%d.%d\n",
> > +			dev_name(&cxld->dev), port->id, port->commit_end + 1);
> > +		return -EBUSY;
> > +	}
> > +
> > +	down_read(&cxl_dpa_rwsem);
> > +	/* common decoder settings */
> > +	ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(cxld->id));
> > +	cxld_set_interleave(cxld, &ctrl);
> > +	cxld_set_type(cxld, &ctrl);
> > +	cxld_set_hpa(cxld, &base, &size);
> > +
> > +	writel(upper_32_bits(base), hdm + CXL_HDM_DECODER0_BASE_HIGH_OFFSET(id));
> > +	writel(lower_32_bits(base), hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(id));
> > +	writel(upper_32_bits(size), hdm + CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(id));
> > +	writel(lower_32_bits(size), hdm + CXL_HDM_DECODER0_SIZE_LOW_OFFSET(id));
> > +
> > +	if (is_switch_decoder(&cxld->dev)) {
> > +		struct cxl_switch_decoder *cxlsd =
> > +			to_cxl_switch_decoder(&cxld->dev);
> > +		void __iomem *tl_hi = hdm + CXL_HDM_DECODER0_TL_HIGH(id);
> > +		void __iomem *tl_lo = hdm + CXL_HDM_DECODER0_TL_LOW(id);
> > +		u64 targets;
> > +
> > +		rc = cxlsd_set_targets(cxlsd, &targets);
> > +		if (rc) {
> > +			dev_dbg(&port->dev, "%s: target configuration error\n",
> > +				dev_name(&cxld->dev));
> > +			goto err;
> > +		}
> > +
> > +		writel(upper_32_bits(targets), tl_hi);
> > +		writel(lower_32_bits(targets), tl_lo);
> > +	} else {
> > +		struct cxl_endpoint_decoder *cxled =
> > +			to_cxl_endpoint_decoder(&cxld->dev);
> > +		void __iomem *sk_hi = hdm + CXL_HDM_DECODER0_SKIP_HIGH(id);
> > +		void __iomem *sk_lo = hdm + CXL_HDM_DECODER0_SKIP_LOW(id);
> > +
> > +		writel(upper_32_bits(cxled->skip), sk_hi);
> > +		writel(lower_32_bits(cxled->skip), sk_lo);
> > +	}
> > +
> > +	writel(ctrl, hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id));
> > +	up_read(&cxl_dpa_rwsem);
> > +
> > +	port->commit_end++;
> 
> Obviously doesn't matter as resetting on error, but
> feels like the increment of commit_end++ should only follow
> successful commit / await_commit();

Then it would need a special ->reset() flavor to do everything but the
commit_end management. As long as cxl_region_rwsem is held over the
combination, nothing can sneak in and observe the intermediate state.
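
A simplified userspace model of that argument: the commit_end bookkeeping
is bumped before the hardware result is known, so a single reset helper
serves both the failure path and a later teardown. The types, the hw_fails
knob, and the body of the reset helper are all hypothetical; in particular
the reset here does not mirror the elided cxl_decoder_reset() body.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct port_model {
	int commit_end;		/* id of the last committed decoder, -1 if none */
};

struct decoder_model {
	struct port_model *port;
	int id;
	bool enabled;
	bool hw_fails;		/* test knob: pretend the hardware commit times out */
};

static void decoder_reset_model(struct decoder_model *d)
{
	/* only the most recently committed decoder may be rewound */
	assert(d->port->commit_end == d->id);
	d->port->commit_end--;
	d->enabled = false;
}

static int decoder_commit_model(struct decoder_model *d)
{
	struct port_model *port = d->port;

	if (d->enabled)
		return 0;
	if (port->commit_end + 1 != d->id)
		return -EBUSY;	/* out of order commit */

	/* bookkeeping first, so one reset path covers success and failure */
	port->commit_end++;
	if (d->hw_fails) {
		decoder_reset_model(d);
		return -ETIMEDOUT;
	}
	d->enabled = true;
	return 0;
}
```

The intermediate commit_end value is only visible inside
decoder_commit_model(), which corresponds to holding cxl_region_rwsem
across the commit and reset pair.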

> > +	rc = cxld_await_commit(hdm, cxld->id);
> > +err:
> > +	if (rc) {
> > +		dev_dbg(&port->dev, "%s: error %d committing decoder\n",
> > +			dev_name(&cxld->dev), rc);
> > +		cxld->reset(cxld);
> > +		return rc;
> > +	}
> > +	cxld->flags |= CXL_DECODER_F_ENABLE;
> > +
> > +	return 0;
> > +}
> > +
> > +static int cxl_decoder_reset(struct cxl_decoder *cxld)
> > +{
> > +	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
> > +	struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev);
> > +	void __iomem *hdm = cxlhdm->regs.hdm_decoder;
> > +	int id = cxld->id;
> > +	u32 ctrl;
> > +
> > +	if ((cxld->flags & CXL_DECODER_F_ENABLE) ==  0)
> 
> extra space after ==

got it.

> 
> > +		return 0;
> > +
> 
> ...
> 
> 
> >  		
> > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > index 7034300e72b2..eee1615d2319 100644
> > --- a/drivers/cxl/core/port.c
> > +++ b/drivers/cxl/core/port.c
> > @@ -630,6 +630,7 @@ static struct cxl_port *cxl_port_alloc(struct device *uport,
> >  	port->component_reg_phys = component_reg_phys;
> >  	ida_init(&port->decoder_ida);
> >  	port->dpa_end = -1;
> > +	port->commit_end = -1;
> >  	xa_init(&port->dports);
> >  	xa_init(&port->endpoints);
> >  	xa_init(&port->regions);
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > index 071b8cafe2bb..b90160c4f975 100644
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -112,6 +112,168 @@ static ssize_t uuid_store(struct device *dev, struct device_attribute *attr,
> >  }
> >  static DEVICE_ATTR_RW(uuid);
> 
> ...
> 
> 
> > +static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
> > +{
> > +	struct cxl_region_params *p = &cxlr->params;
> > +	int i;
> > +
> > +	for (i = count - 1; i >= 0; i--) {
> > +		struct cxl_endpoint_decoder *cxled = p->targets[i];
> > +		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> > +		struct cxl_port *iter = cxled_to_port(cxled);
> > +		struct cxl_ep *ep;
> > +		int rc;
> > +
> > +		while (!is_cxl_root(to_cxl_port(iter->dev.parent)))
> > +			iter = to_cxl_port(iter->dev.parent);
> > +
> > +		for (ep = cxl_ep_load(iter, cxlmd); iter;
> > +		     iter = ep->next, ep = cxl_ep_load(iter, cxlmd)) {
> > +			struct cxl_region_ref *cxl_rr;
> > +			struct cxl_decoder *cxld;
> > +
> > +			cxl_rr = cxl_rr_load(iter, cxlr);
> > +			cxld = cxl_rr->decoder;
> > +			rc = cxld->reset(cxld);
> > +			if (rc)
> > +				return rc;
> > +		}
> > +
> > +		rc = cxled->cxld.reset(&cxled->cxld);
> > +		if (rc)
> > +			return rc;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int cxl_region_decode_commit(struct cxl_region *cxlr)
> > +{
> > +	struct cxl_region_params *p = &cxlr->params;
> > +	int i, rc;
> > +
> > +	for (i = 0; i < p->nr_targets; i++) {
> > +		struct cxl_endpoint_decoder *cxled = p->targets[i];
> > +		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> > +		struct cxl_region_ref *cxl_rr;
> > +		struct cxl_decoder *cxld;
> > +		struct cxl_port *iter;
> > +		struct cxl_ep *ep;
> > +
> > +		/* commit bottom up */
> > +		for (iter = cxled_to_port(cxled); !is_cxl_root(iter);
> > +		     iter = to_cxl_port(iter->dev.parent)) {
> > +			cxl_rr = cxl_rr_load(iter, cxlr);
> > +			cxld = cxl_rr->decoder;
> > +			rc = cxld->commit(cxld);
> > +			if (rc)
> > +				break;
> > +		}
> > +
> > +		if (is_cxl_root(iter))
> > +			continue;
> > +
> > +		/* teardown top down */
> 
> Comment on why we are tearing down.  I guess because previous
> somehow didn't end up at the root?

Correct, one of those commits in the loop above failed causing it to
break out. Added a comment.

> 
> > +		for (ep = cxl_ep_load(iter, cxlmd); ep && iter;
> > +		     iter = ep->next, ep = cxl_ep_load(iter, cxlmd)) {
> > +			cxl_rr = cxl_rr_load(iter, cxlr);
> > +			cxld = cxl_rr->decoder;
> > +			cxld->reset(cxld);
> > +		}
> > +
> > +		cxled->cxld.reset(&cxled->cxld);
> > +		if (i == 0)
> > +			return rc;
> > +		break;
> > +	}
> > +
> > +	if (i >= p->nr_targets)
> > +		return 0;
> > +
> > +	/* undo the targets that were successfully committed */
> > +	cxl_region_decode_reset(cxlr, i);
> > +	return rc;
> > +}
> > +
> > +static ssize_t commit_store(struct device *dev, struct device_attribute *attr,
> > +			    const char *buf, size_t len)
> > +{
> > +	struct cxl_region *cxlr = to_cxl_region(dev);
> > +	struct cxl_region_params *p = &cxlr->params;
> > +	bool commit;
> > +	ssize_t rc;
> > +
> > +	rc = kstrtobool(buf, &commit);
> > +	if (rc)
> > +		return rc;
> > +
> > +	rc = down_write_killable(&cxl_region_rwsem);
> > +	if (rc)
> > +		return rc;
> > +
> > +	/* Already in the requested state? */
> > +	if (commit && p->state >= CXL_CONFIG_COMMIT)
> > +		goto out;
> > +	if (!commit && p->state < CXL_CONFIG_COMMIT)
> > +		goto out;
> > +
> > +	/* Not ready to commit? */
> > +	if (commit && p->state < CXL_CONFIG_ACTIVE) {
> > +		rc = -ENXIO;
> > +		goto out;
> > +	}
> > +
> > +	if (commit)
> > +		rc = cxl_region_decode_commit(cxlr);
> > +	else {
> > +		p->state = CXL_CONFIG_RESET_PENDING;
> > +		up_write(&cxl_region_rwsem);
> > +		device_release_driver(&cxlr->dev);
> > +		down_write(&cxl_region_rwsem);
> > +
> > +		if (p->state == CXL_CONFIG_RESET_PENDING)
> 
> What path results in that changing in last few lines?
> Perhaps a comment if there is something we need to protect against?

The lock needs to be dropped before calling device_release_driver().
After reacquiring the lock, the code needs to revalidate that the reset
is still pending. Added a comment.
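
The pattern, reduced to a single-threaded userspace model: mark the reset
as pending, drop the lock around the call that cannot run under it, then
recheck the marker once the lock is held again. The 'locked' flag stands in
for cxl_region_rwsem, and the racing callback is a hypothetical stand-in
for anything that changes the region state while the lock is dropped.

```c
#include <assert.h>
#include <stdbool.h>

enum state_model {
	ST_INTERLEAVE_ACTIVE,
	ST_ACTIVE,
	ST_RESET_PENDING,
	ST_COMMIT,
};

struct region_model {
	bool locked;		/* models holding cxl_region_rwsem for write */
	enum state_model state;
	int nr_resets;		/* how many times the decode reset ran */
};

/* release callback where nothing else touches the region */
static void quiet_release(struct region_model *r)
{
	assert(!r->locked);
}

/* release callback where a hypothetical racer changes the state */
static void racing_release(struct region_model *r)
{
	assert(!r->locked);
	r->state = ST_INTERLEAVE_ACTIVE;
}

static void decommit_model(struct region_model *r,
			   void (*release)(struct region_model *))
{
	assert(r->locked);
	r->state = ST_RESET_PENDING;

	r->locked = false;	/* models up_write() */
	release(r);		/* models device_release_driver() */
	r->locked = true;	/* models down_write() */

	/* revalidate: only reset if nobody changed the state meanwhile */
	if (r->state == ST_RESET_PENDING) {
		r->nr_resets++;
		r->state = ST_ACTIVE;
	}
}
```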

> 
> 
> > +			rc = cxl_region_decode_reset(cxlr, p->interleave_ways);
> > +	}
> > +
> > +	if (rc)
> > +		goto out;
> > +
> > +	if (commit)
> > +		p->state = CXL_CONFIG_COMMIT;
> > +	else if (p->state == CXL_CONFIG_RESET_PENDING)
> > +		p->state = CXL_CONFIG_ACTIVE;
> > +
> > +out:
> > +	up_write(&cxl_region_rwsem);
> > +
> > +	if (rc)
> > +		return rc;
> > +	return len;
> > +}
> 
> 
> ...
> 
> 
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index a93d7c4efd1a..fc14f6805f2c 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -54,6 +54,7 @@
> >  #define   CXL_HDM_DECODER0_CTRL_LOCK BIT(8)
> >  #define   CXL_HDM_DECODER0_CTRL_COMMIT BIT(9)
> >  #define   CXL_HDM_DECODER0_CTRL_COMMITTED BIT(10)
> > +#define   CXL_HDM_DECODER0_CTRL_COMMIT_ERROR BIT(11)
> >  #define   CXL_HDM_DECODER0_CTRL_TYPE BIT(12)
> >  #define CXL_HDM_DECODER0_TL_LOW(i) (0x20 * (i) + 0x24)
> >  #define CXL_HDM_DECODER0_TL_HIGH(i) (0x20 * (i) + 0x28)
> > @@ -257,6 +258,8 @@ enum cxl_decoder_type {
> >   * @target_type: accelerator vs expander (type2 vs type3) selector
> >   * @region: currently assigned region for this decoder
> >   * @flags: memory type capabilities and locking
> > + * @commit: device/decoder-type specific callback to commit settings to hw
> > + * @commit: device/decoder-type specific callback to reset hw settings
> 
> @reset

Yup.

> 
> >  */
> >  struct cxl_decoder {
> >  	struct device dev;
> > @@ -267,6 +270,8 @@ struct cxl_decoder {
> >  	enum cxl_decoder_type target_type;
> >  	struct cxl_region *region;
> >  	unsigned long flags;
> > +	int (*commit)(struct cxl_decoder *cxld);
> > +	int (*reset)(struct cxl_decoder *cxld);
> >  };
> >  
> 
> 
> > diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
> > index 51d517fa62ee..94653201631c 100644
> > --- a/tools/testing/cxl/test/cxl.c
> > +++ b/tools/testing/cxl/test/cxl.c
> > @@ -429,6 +429,50 @@ static int map_targets(struct device *dev, void *data)
> >  	return 0;
> >  }
> >  
> 
> ...
> 
> > +static int mock_decoder_reset(struct cxl_decoder *cxld)
> > +{
> > +	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
> > +	int id = cxld->id;
> > +
> > +	if ((cxld->flags & CXL_DECODER_F_ENABLE) ==  0)
> 
> bonus space after ==

copy-pasta plus missed clang-format. Fixed.


* Re: [PATCH 45/46] cxl/pmem: Fix offline_nvdimm_bus() to offline by bridge
  2022-06-30 17:14   ` Jonathan Cameron
@ 2022-07-11 19:49     ` Dan Williams
  0 siblings, 0 replies; 157+ messages in thread
From: Dan Williams @ 2022-07-11 19:49 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams; +Cc: linux-cxl, nvdimm, linux-pci, patches, hch

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 21:19:49 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > Be careful to only disable cxl_pmem objects related to a given
> > cxl_nvdimm_bridge. Otherwise, offline_nvdimm_bus() reaches across CXL
> > domains and disables more than is expected.
> > 
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Fix, but no Fixes tag? Probably wants a comment (I'm guessing
> it didn't matter until now?)

I'll add:

Fixes: 21083f51521f ("cxl/pmem: Register 'pmem' / cxl_nvdimm devices")

To date this has been a benign side effect since it only affects
cxl_test, but as cxl_test gets wider deployment it needs to meet the
expectation that any cxl_test operations have no effect on the
production stack. It might also be important if Device Tree adds
incremental CXL platform topology support.

> By Domains, what do you mean?  I don't think we have that
> well defined as a term.

By "domain" I meant a CXL topology hierarchy that a given memdev
attaches to. In userspace cxl-cli terms this is a "bus":

# cxl list -M -b "ACPI.CXL" | jq .[0]
{
  "memdev": "mem0",
  "pmem_size": 536870912,
  "ram_size": 0,
  "serial": 0,
  "host": "0000:35:00.0"
}

# cxl list -M -b "cxl_test" | jq .[0]
{
  "memdev": "mem2",
  "pmem_size": 1073741824,
  "ram_size": 1073741824,
  "serial": 1,
  "numa_node": 1,
  "host": "cxl_mem.1"
}

...where "-b" filters by the "bus" provider. This shows that mem0
derives its CXL.mem connectivity from the typical ACPI hierarchy, and
mem2 is in the "cxl_test" domain. I did not use the "bus" term in the
changelog because "bus" means something different to the kernel as both
of those devices are registered on @cxl_bus_type.
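
As a rough userspace illustration of what "offline by bridge" means for
the two domains above: only objects whose bridge matches the one going
away are disabled. The bridge_model and pmem_dev_model types and the
helper name are hypothetical and do not reflect the actual
offline_nvdimm_bus() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a cxl_nvdimm_bridge, identified by its bus provider */
struct bridge_model {
	const char *provider;
};

/* Stand-in for a cxl_pmem object hanging off one bridge */
struct pmem_dev_model {
	const struct bridge_model *bridge;
	bool online;
};

/* Offline only the devices in @bridge's domain; return how many changed */
static int offline_by_bridge_model(struct pmem_dev_model *devs, size_t nr,
				   const struct bridge_model *bridge)
{
	int nr_offlined = 0;
	size_t i;

	assert(bridge);
	for (i = 0; i < nr; i++) {
		if (devs[i].bridge != bridge)
			continue;	/* different domain: leave it alone */
		if (devs[i].online) {
			devs[i].online = false;
			nr_offlined++;
		}
	}
	return nr_offlined;
}
```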


* Re: [PATCH 46/46] cxl/region: Introduce cxl_pmem_region objects
  2022-06-30 17:34   ` Jonathan Cameron
@ 2022-07-11 20:05     ` Dan Williams
  0 siblings, 0 replies; 157+ messages in thread
From: Dan Williams @ 2022-07-11 20:05 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, nvdimm, linux-pci, patches, hch, Ben Widawsky

Jonathan Cameron wrote:
> On Thu, 23 Jun 2022 21:19:50 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > The LIBNVDIMM subsystem is a platform agnostic representation of system
> > NVDIMM / persistent memory resources. To date, the CXL subsystem's
> > interaction with LIBNVDIMM has been to register an nvdimm-bridge device
> > and cxl_nvdimm objects to proxy CXL capabilities into existing LIBNVDIMM
> > subsystem mechanics.
> > 
> > With regions the approach is the same. Create a new cxl_pmem_region
> > object to proxy CXL region details into a LIBNVDIMM definition. With
> > this enabling LIBNVDIMM can partition CXL persistent memory regions with
> > legacy namespace labels. A follow-on patch will add CXL region label and
> > CXL namespace label support to persist region configurations across
> > driver reload / system-reset events.
> ah. Now I see why we share ID space with NVDIMMs. Fair enough, I should
> have read to the end ;)
> 
> > 
> > Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
> > Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> 
> End of day, so a fairly superficial review on this and I'll hopefully
> take a second look at one or two of the earlier patches when time allows.
> 
> Jonathan
> 
> ...
> 
> > +static struct cxl_pmem_region *cxl_pmem_region_alloc(struct cxl_region *cxlr)
> > +{
> > +	struct cxl_pmem_region *cxlr_pmem = ERR_PTR(-ENXIO);
> 
> Rarely used, so better to set it where it is.

Ok.

> 
> > +	struct cxl_region_params *p = &cxlr->params;
> > +	struct device *dev;
> > +	int i;
> > +
> > +	down_read(&cxl_region_rwsem);
> > +	if (p->state != CXL_CONFIG_COMMIT)
> > +		goto out;
> > +	cxlr_pmem = kzalloc(struct_size(cxlr_pmem, mapping, p->nr_targets),
> > +			    GFP_KERNEL);
> > +	if (!cxlr_pmem) {
> > +		cxlr_pmem = ERR_PTR(-ENOMEM);
> > +		goto out;
> > +	}
> > +
> > +	cxlr_pmem->hpa_range.start = p->res->start;
> > +	cxlr_pmem->hpa_range.end = p->res->end;
> > +
> > +	/* Snapshot the region configuration underneath the cxl_region_rwsem */
> > +	cxlr_pmem->nr_mappings = p->nr_targets;
> > +	for (i = 0; i < p->nr_targets; i++) {
> > +		struct cxl_endpoint_decoder *cxled = p->targets[i];
> > +		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> > +		struct cxl_pmem_region_mapping *m = &cxlr_pmem->mapping[i];
> > +
> > +		m->cxlmd = cxlmd;
> > +		get_device(&cxlmd->dev);
> > +		m->start = cxled->dpa_res->start;
> > +		m->size = resource_size(cxled->dpa_res);
> > +		m->position = i;
> > +	}
> > +
> > +	dev = &cxlr_pmem->dev;
> > +	cxlr_pmem->cxlr = cxlr;
> > +	device_initialize(dev);
> > +	lockdep_set_class(&dev->mutex, &cxl_pmem_region_key);
> > +	device_set_pm_not_required(dev);
> > +	dev->parent = &cxlr->dev;
> > +	dev->bus = &cxl_bus_type;
> > +	dev->type = &cxl_pmem_region_type;
> > +out:
> > +	up_read(&cxl_region_rwsem);
> > +
> > +	return cxlr_pmem;
> > +}
> > +
> > +static void cxlr_pmem_unregister(void *dev)
> > +{
> > +	device_unregister(dev);
> > +}
> > +
> > +/**
> > + * devm_cxl_add_pmem_region() - add a cxl_region to nd_region bridge
> > + * @host: same host as @cxlmd
> 
> Run kernel-doc over these and clean all the warnings up.
> Parameter is cxlr, not host.

Fixed.

> 
> 
> > + *
> > + * Return: 0 on success negative error code on failure.
> > + */
> 
> 
> >  /*
> >   * Unit test builds overrides this to __weak, find the 'strong' version
> > diff --git a/drivers/cxl/pmem.c b/drivers/cxl/pmem.c
> > index b271f6e90b91..4ba7248275ac 100644
> > --- a/drivers/cxl/pmem.c
> > +++ b/drivers/cxl/pmem.c
> > @@ -7,6 +7,7 @@
> 
> >  
> 
> 
> > +static int match_cxl_nvdimm(struct device *dev, void *data)
> > +{
> > +	return is_cxl_nvdimm(dev);
> > +}
> > +
> > +static void unregister_region(void *nd_region)
> 
> Better to give this a more specific name as we have several
> unregister_region() functions in CXL now.

Ok, unregister_nvdimm_region() it is.

> 
> > +{
> > +	struct cxl_nvdimm_bridge *cxl_nvb;
> > +	struct cxl_pmem_region *cxlr_pmem;
> > +	int i;
> > +
> > +	cxlr_pmem = nd_region_provider_data(nd_region);
> > +	cxl_nvb = cxlr_pmem->bridge;
> > +	device_lock(&cxl_nvb->dev);
> > +	for (i = 0; i < cxlr_pmem->nr_mappings; i++) {
> > +		struct cxl_pmem_region_mapping *m = &cxlr_pmem->mapping[i];
> > +		struct cxl_nvdimm *cxl_nvd = m->cxl_nvd;
> > +
> > +		if (cxl_nvd->region) {
> > +			put_device(&cxlr_pmem->dev);
> > +			cxl_nvd->region = NULL;
> > +		}
> > +	}
> > +	device_unlock(&cxl_nvb->dev);
> > +
> > +	nvdimm_region_delete(nd_region);
> > +}
> > +
> 
> > +
> > +static int cxl_pmem_region_probe(struct device *dev)
> > +{
> > +	struct nd_mapping_desc mappings[CXL_DECODER_MAX_INTERLEAVE];
> > +	struct cxl_pmem_region *cxlr_pmem = to_cxl_pmem_region(dev);
> > +	struct cxl_region *cxlr = cxlr_pmem->cxlr;
> > +	struct cxl_pmem_region_info *info = NULL;
> > +	struct cxl_nvdimm_bridge *cxl_nvb;
> > +	struct nd_interleave_set *nd_set;
> > +	struct nd_region_desc ndr_desc;
> > +	struct cxl_nvdimm *cxl_nvd;
> > +	struct nvdimm *nvdimm;
> > +	struct resource *res;
> > +	int rc = 0, i;
> > +
> > +	cxl_nvb = cxl_find_nvdimm_bridge(&cxlr_pmem->mapping[0].cxlmd->dev);
> > +	if (!cxl_nvb) {
> > +		dev_dbg(dev, "bridge not found\n");
> > +		return -ENXIO;
> > +	}
> > +	cxlr_pmem->bridge = cxl_nvb;
> > +
> > +	device_lock(&cxl_nvb->dev);
> > +	if (!cxl_nvb->nvdimm_bus) {
> > +		dev_dbg(dev, "nvdimm bus not found\n");
> > +		rc = -ENXIO;
> > +		goto out;
> > +	}
> > +
> > +	memset(&mappings, 0, sizeof(mappings));
> > +	memset(&ndr_desc, 0, sizeof(ndr_desc));
> > +
> > +	res = devm_kzalloc(dev, sizeof(*res), GFP_KERNEL);
> > +	if (!res) {
> > +		rc = -ENOMEM;
> > +		goto out;
> > +	}
> > +
> > +	res->name = "Persistent Memory";
> > +	res->start = cxlr_pmem->hpa_range.start;
> > +	res->end = cxlr_pmem->hpa_range.end;
> > +	res->flags = IORESOURCE_MEM;
> > +	res->desc = IORES_DESC_PERSISTENT_MEMORY;
> > +
> > +	rc = insert_resource(&iomem_resource, res);
> > +	if (rc)
> > +		goto out;
> > +
> > +	rc = devm_add_action_or_reset(dev, cxlr_pmem_remove_resource, res);
> > +	if (rc)
> > +		goto out;
> > +
> > +	ndr_desc.res = res;
> > +	ndr_desc.provider_data = cxlr_pmem;
> > +
> > +	ndr_desc.numa_node = memory_add_physaddr_to_nid(res->start);
> > +	ndr_desc.target_node = phys_to_target_node(res->start);
> > +	if (ndr_desc.target_node == NUMA_NO_NODE) {
> > +		ndr_desc.target_node = ndr_desc.numa_node;
> > +		dev_dbg(&cxlr->dev, "changing target node from %d to %d",
> > +			NUMA_NO_NODE, ndr_desc.target_node);
> > +	}
> > +
> > +	nd_set = devm_kzalloc(dev, sizeof(*nd_set), GFP_KERNEL);
> > +	if (!nd_set) {
> > +		rc = -ENOMEM;
> > +		goto out;
> > +	}
> > +
> > +	ndr_desc.memregion = cxlr->id;
> > +	set_bit(ND_REGION_CXL, &ndr_desc.flags);
> > +	set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags);
> > +
> > +	info = kmalloc_array(cxlr_pmem->nr_mappings, sizeof(*info), GFP_KERNEL);
> > +	if (!info)
> > +		goto out;
> > +
> > +	rc = -ENODEV;
> 
> Personal taste, but I'd much rather see that set in the error handlers
> so I can quickly see where it applies.

Ok.

> 
> > +	for (i = 0; i < cxlr_pmem->nr_mappings; i++) {
> > +		struct cxl_pmem_region_mapping *m = &cxlr_pmem->mapping[i];
> > +		struct cxl_memdev *cxlmd = m->cxlmd;
> > +		struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > +		struct device *d;
> > +
> > +		d = device_find_child(&cxlmd->dev, NULL, match_cxl_nvdimm);
> > +		if (!d) {
> > +			dev_dbg(dev, "[%d]: %s: no cxl_nvdimm found\n", i,
> > +				dev_name(&cxlmd->dev));
> > +			goto err;
> > +		}
> > +
> > +		/* safe to drop ref now with bridge lock held */
> > +		put_device(d);
> > +
> > +		cxl_nvd = to_cxl_nvdimm(d);
> > +		nvdimm = dev_get_drvdata(&cxl_nvd->dev);
> > +		if (!nvdimm) {
> > +			dev_dbg(dev, "[%d]: %s: no nvdimm found\n", i,
> > +				dev_name(&cxlmd->dev));
> > +			goto err;
> > +		}
> > +		cxl_nvd->region = cxlr_pmem;
> > +		get_device(&cxlr_pmem->dev);
> > +		m->cxl_nvd = cxl_nvd;
> > +		mappings[i] = (struct nd_mapping_desc) {
> > +			.nvdimm = nvdimm,
> > +			.start = m->start,
> > +			.size = m->size,
> > +			.position = i,
> > +		};
> > +		info[i].offset = m->start;
> > +		info[i].serial = cxlds->serial;
> > +	}
> > +	ndr_desc.num_mappings = cxlr_pmem->nr_mappings;
> > +	ndr_desc.mapping = mappings;
> > +
> > +	/*
> > +	 * TODO enable CXL labels which skip the need for 'interleave-set cookie'
> > +	 */
> > +	nd_set->cookie1 =
> > +		nd_fletcher64(info, sizeof(*info) * cxlr_pmem->nr_mappings, 0);
> > +	nd_set->cookie2 = nd_set->cookie1;
> > +	ndr_desc.nd_set = nd_set;
> > +
> > +	cxlr_pmem->nd_region =
> > +		nvdimm_pmem_region_create(cxl_nvb->nvdimm_bus, &ndr_desc);
> > +	if (IS_ERR(cxlr_pmem->nd_region)) {
> > +		rc = PTR_ERR(cxlr_pmem->nd_region);
> > +		goto err;
> > +	} else
> 
> no need for else as other branch has gone flying off down to
> err.

Yup.

> 
> > +		rc = devm_add_action_or_reset(dev, unregister_region,
> > +					      cxlr_pmem->nd_region);
> > +out:
> 
> Having labels out: and err: where both are used for errors is pretty
> confusing naming...  Perhaps you are better off just not sharing the
> good exit path with any of the error paths.
> 

Ok.

> 
> > +	device_unlock(&cxl_nvb->dev);
> > +	put_device(&cxl_nvb->dev);
> > +	kfree(info);
> 
> Ok, so safe to do this here, but would be nice to do this
> in reverse order of setup with multiple labels so we can avoid
> paths that free things that were never created. Doesn't look
> like it would hurt much to move kfree(info) above the device_unlock()
> and only do that if we have allocated info.

Ok, but no need for more labels: unconditionally freeing info and
unwinding the mapping references is safe if @info is initialized to NULL
and @i is initialized to 0.
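
A userspace sketch of that single-label unwind, assuming a reference
counter in place of get_device()/put_device() and a fail_at knob to inject
an error partway through the mapping loop. All names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Take one reference per mapping. On any failure, drop the references
 * taken so far and free the scratch buffer through a single error label.
 */
static int probe_model(int nr_mappings, int fail_at, int *refs)
{
	int *info = NULL;	/* free(NULL) is a no-op */
	int i = 0;		/* nothing to unwind until the loop runs */
	int rc;

	assert(refs);
	if (nr_mappings <= 0) {
		rc = -EINVAL;
		goto err;	/* early exit: info is NULL and i is 0 */
	}

	info = calloc(nr_mappings, sizeof(*info));
	if (!info) {
		rc = -ENOMEM;
		goto err;
	}

	for (i = 0; i < nr_mappings; i++) {
		if (i == fail_at) {
			rc = -ENODEV;
			goto err;
		}
		(*refs)++;	/* models get_device() */
		info[i] = i;
	}

	free(info);
	return 0;

err:
	/* drop only the references taken by completed iterations */
	while (--i >= 0)
		(*refs)--;	/* models put_device() */
	free(info);
	return rc;
}
```

Because @i counts completed iterations, the same loop handles a failure
before the allocation, in the middle of the mappings, or not at all.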


* Re: [PATCH 01/46] tools/testing/cxl: Fix cxl_hdm_decode_init() calling convention
  2022-07-09 20:06       ` Dan Williams
@ 2022-07-12 22:11         ` Adam Manzanares
  0 siblings, 0 replies; 157+ messages in thread
From: Adam Manzanares @ 2022-07-12 22:11 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

On Sat, Jul 09, 2022 at 01:06:36PM -0700, Dan Williams wrote:
> Adam Manzanares wrote:
> > On Thu, Jun 23, 2022 at 07:45:07PM -0700, Dan Williams wrote:
> > > This failing signature:
> > > 
> > > [    8.392669] cxl_bus_probe: cxl_port endpoint2: probe: 970997760
> > > [    8.392670] cxl_port: probe of endpoint2 failed with error 970997760
> > > [    8.392719] create_endpoint: cxl_mem mem0: add: endpoint2
> > > [    8.392721] cxl_mem mem0: endpoint2 failed probe
> > > [    8.392725] cxl_bus_probe: cxl_mem mem0: probe: -6
> > > 
> > > ...shows cxl_hdm_decode_init() resulting in a return code ("970997760")
> > > that looks like stack corruption. The problem goes away if
> > > cxl_hdm_decode_init() is not mocked via __wrap_cxl_hdm_decode_init().
> > > 
> > > The corruption results from the mismatch that the calling convention for
> > > cxl_hdm_decode_init() is:
> > > 
> > > int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm)
> > > 
> > > ...and __wrap_cxl_hdm_decode_init() is:
> > > 
> > > bool __wrap_cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm)
> > > 
> > > ...i.e. an int is expected but __wrap_cxl_hdm_decode_init() returns bool.
> > > 
> > > Fix the convention and clean up the organization to match
> > > __wrap_cxl_await_media_ready() as the difference was a red herring that
> > > distracted from finding the bug.
> > > 
> > > Fixes: 92804edb11f0 ("cxl/pci: Drop @info argument to cxl_hdm_decode_init()")
> > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > > ---
> > >  tools/testing/cxl/test/mock.c |    8 +++++---
> > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/tools/testing/cxl/test/mock.c b/tools/testing/cxl/test/mock.c
> > > index f1f8c40948c5..bce6a21df0d5 100644
> > > --- a/tools/testing/cxl/test/mock.c
> > > +++ b/tools/testing/cxl/test/mock.c
> > > @@ -208,13 +208,15 @@ int __wrap_cxl_await_media_ready(struct cxl_dev_state *cxlds)
> > >  }
> > >  EXPORT_SYMBOL_NS_GPL(__wrap_cxl_await_media_ready, CXL);
> > >  
> > > -bool __wrap_cxl_hdm_decode_init(struct cxl_dev_state *cxlds,
> > > -				struct cxl_hdm *cxlhdm)
> > > +int __wrap_cxl_hdm_decode_init(struct cxl_dev_state *cxlds,
> > > +			       struct cxl_hdm *cxlhdm)
> > >  {
> > >  	int rc = 0, index;
> > >  	struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
> > >  
> > > -	if (!ops || !ops->is_mock_dev(cxlds->dev))
> > > +	if (ops && ops->is_mock_dev(cxlds->dev))
> > > +		rc = 0;
> > > +	else
> > >  		rc = cxl_hdm_decode_init(cxlds, cxlhdm);
> > >  	put_cxl_mock_ops(index);
> > >  
> > > 
> > 
> > 
> > Looks good.
> > 
> > Reviewed by: Adam Manzanares <a.manzanares@samsung.com>
> 
> Just fyi, b4 did not auto-apply this tag due to the missing "-", caught
> it manually.

Ouch, thanks for pointing this out. Updated my template. 
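
As a side note on the quoted fix, a small sketch of why bool is the
wrong return type for an errno-style function. It does not reproduce
the corrupted value from the log (calling through a mismatched
prototype is undefined behavior); it only shows the well-defined part.
real_decode_init(), wrap_as_bool() and wrap_as_int() are hypothetical
names, not the mock.c functions.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the real function: always fails. */
static int real_decode_init(void)
{
	return -ENXIO;
}

/* Broken shape: conversion to bool collapses any nonzero errno to 1. */
bool wrap_as_bool(void)
{
	return real_decode_init();
}

/* Fixed shape: mock devices succeed, everything else passes rc through. */
int wrap_as_int(int is_mock)
{
	int rc;

	if (is_mock)
		rc = 0;
	else
		rc = real_decode_init();
	return rc;
}
```

With the int version a caller still sees -ENXIO, whereas the bool
version can only ever report 0 or 1.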

^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH 09/46] cxl/acpi: Track CXL resources in iomem_resource
  2022-07-10  2:12     ` Dan Williams
@ 2022-07-19 14:24       ` Jonathan Cameron
  0 siblings, 0 replies; 157+ messages in thread
From: Jonathan Cameron @ 2022-07-19 14:24 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches,
	david, gregkh, jgg


> Added the following...
> 
> /**
>  * add_cxl_resources() - reflect CXL fixed memory windows in iomem_resource
>  * @cxl_res: A standalone resource tree where each CXL window is a sibling
>  *
>  * Walk each CXL window in @cxl_res and add it to iomem_resource potentially
>  * expanding its boundaries to ensure that any conflicting resources become
>  * children. If a window is expanded it may then conflict with another window
>  * entry and require the window to be truncated or trimmed. Consider this
>  * situation:
>  *
>  * |-- "CXL Window 0" --||----- "CXL Window 1" -----|
>  * |--------------- "System RAM" -------------|
>  *
>  * ...where platform firmware has established a System RAM resource across 2
>  * windows, but has left some portion of window 1 for dynamic CXL region
>  * provisioning. In this case "Window 0" will span the entirety of the "System
>  * RAM" span, and "CXL Window 1" is truncated to the remaining tail past the end
>  * of that "System RAM" resource.
>  */

Very nice.  Thanks!

J
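
A worked sketch of the situation drawn in that comment, under the
assumption that only the pictured case matters: one System RAM resource
that starts in window 0 and ends inside window 1. This is not the
add_cxl_resources() implementation; struct span and fit_windows() are
made up for illustration, with inclusive end addresses.

```c
#include <stdint.h>

/* Inclusive [start, end] range (hypothetical type for this sketch). */
struct span {
	uint64_t start;
	uint64_t end;
};

/*
 * Assumes @w1 sits directly after @w0 in address order.
 * If @ram starts inside @w0 and runs past its end, grow @w0 to cover
 * it, then trim the head of @w1 so the two windows do not overlap.
 */
void fit_windows(struct span *w0, struct span *w1, const struct span *ram)
{
	/* Expand window 0 so the conflicting resource can be its child. */
	if (ram->start >= w0->start && ram->start <= w0->end &&
	    ram->end > w0->end)
		w0->end = ram->end;

	/* Truncate window 1 to the tail past the end of window 0. */
	if (w1->start <= w0->end)
		w1->start = w0->end + 1;
}
```

For example, windows [0x0, 0xfff] and [0x1000, 0x2fff] with System RAM
at [0x0, 0x1fff] end up as [0x0, 0x1fff] and [0x2000, 0x2fff].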

^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH 14/46] cxl/hdm: Enumerate allocated DPA
  2022-07-10  3:03     ` Dan Williams
@ 2022-07-19 14:25       ` Jonathan Cameron
  0 siblings, 0 replies; 157+ messages in thread
From: Jonathan Cameron @ 2022-07-19 14:25 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, Ben Widawsky, hch, alison.schofield, nvdimm,
	linux-pci, patches

...

> > > +static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> > > +			     resource_size_t base, resource_size_t len,
> > > +			     resource_size_t skip)
> > > +{
> > > +	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> > > +	struct cxl_port *port = cxled_to_port(cxled);
> > > +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > > +	struct device *dev = &port->dev;
> > > +	struct resource *res;
> > > +
> > > +	lockdep_assert_held_write(&cxl_dpa_rwsem);
> > > +
> > > +	if (!len)
> > > +		return 0;
> > > +
> > > +	if (cxled->dpa_res) {
> > > +		dev_dbg(dev, "decoder%d.%d: existing allocation %pr assigned\n",
> > > +			port->id, cxled->cxld.id, cxled->dpa_res);
> > > +		return -EBUSY;
> > > +	}
> > > +
> > > +	if (skip) {
> > > +		res = __request_region(&cxlds->dpa_res, base - skip, skip,
> > > +				       dev_name(dev), 0);  
> > 
> > 
> > > An interface that defines skip backwards, as what to skip before the
> > > base parameter, is a little odd. Can we rename the base parameter to something
> > > like 'current_top' and then have base = current_top + skip?  The current_top
> > > naming is not great though...  
> 
> How about just name it "skipped" instead of "skip"? As the parameter is
> how many bytes were skipped to allow a new allocation to start at base.

Works for me (guessing you long since went with this given how far behind I am!)

Thanks,

Jonathan
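
To spell out the "skipped" convention agreed above: the skipped bytes
sit immediately below base, matching the quoted
__request_region(..., base - skip, skip, ...) call. A tiny sketch, with
struct range and skipped_range() being hypothetical helpers:

```c
#include <stdint.h>

/* Hypothetical start/length pair for this sketch. */
struct range {
	uint64_t start;
	uint64_t len;
};

/*
 * Given the @base of a new allocation and the number of bytes that were
 * @skipped to reach it, return the range that was skipped over:
 * [base - skipped, base).
 */
struct range skipped_range(uint64_t base, uint64_t skipped)
{
	struct range r = { base - skipped, skipped };

	return r;
}
```

So an allocation at base 0x4000 that skipped 0x1000 bytes accounts for
the range starting at 0x3000.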

^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH 17/46] cxl/hdm: Track next decoder to allocate
  2022-07-10  3:55     ` Dan Williams
@ 2022-07-19 14:27       ` Jonathan Cameron
  0 siblings, 0 replies; 157+ messages in thread
From: Jonathan Cameron @ 2022-07-19 14:27 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, hch, alison.schofield, nvdimm, linux-pci, patches

On Sat, 9 Jul 2022 20:55:17 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> Jonathan Cameron wrote:
> > On Thu, 23 Jun 2022 19:47:07 -0700
> > Dan Williams <dan.j.williams@intel.com> wrote:
> >   
> > > The CXL specification enforces that endpoint decoders are committed in
> > > hw instance id order. In preparation for adding dynamic DPA allocation,
> > > record the hw instance id in endpoint decoders, and enforce allocations
> > > to occur in hw instance id order.
> > > 
> > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>  
> > 
> > dpa_end isn't a good name given the value isn't a Device Physical Address.
> > 
> > Otherwise looks fine,
> > 
> > Jonathan
> >   
> > > ---
> > >  drivers/cxl/core/hdm.c  |   14 ++++++++++++++
> > >  drivers/cxl/core/port.c |    1 +
> > >  drivers/cxl/cxl.h       |    2 ++
> > >  3 files changed, 17 insertions(+)
> > > 
> > > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> > > index 3f929231b822..8805afe63ebf 100644
> > > --- a/drivers/cxl/core/hdm.c
> > > +++ b/drivers/cxl/core/hdm.c
> > > @@ -153,6 +153,7 @@ static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled, bool remove_ac
> > >  	cxled->skip = 0;
> > >  	__release_region(&cxlds->dpa_res, res->start, resource_size(res));
> > >  	cxled->dpa_res = NULL;
> > > +	port->dpa_end--;
> > >  }
> > >  
> > >  static void cxl_dpa_release(void *cxled)
> > > @@ -183,6 +184,18 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> > >  		return -EBUSY;
> > >  	}
> > >  
> > > +	if (port->dpa_end + 1 != cxled->cxld.id) {
> > > +		/*
> > > +		 * Assumes alloc and commit order is always in hardware instance
> > > +		 * order per expectations from 8.2.5.12.20 Committing Decoder
> > > +		 * Programming that enforce decoder[m] committed before
> > > +		 * decoder[m+1] commit start.
> > > +		 */
> > > +		dev_dbg(dev, "decoder%d.%d: expected decoder%d.%d\n", port->id,
> > > +			cxled->cxld.id, port->id, port->dpa_end + 1);
> > > +		return -EBUSY;
> > > +	}
> > > +
> > >  	if (skip) {
> > >  		res = __request_region(&cxlds->dpa_res, base - skip, skip,
> > >  				       dev_name(dev), 0);
> > > @@ -213,6 +226,7 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> > >  			cxled->cxld.id, cxled->dpa_res);
> > >  		cxled->mode = CXL_DECODER_MIXED;
> > >  	}
> > > +	port->dpa_end++;
> > >  
> > >  	return 0;
> > >  }
> > > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > > index 9d632c8c580b..54bf032cbcb7 100644
> > > --- a/drivers/cxl/core/port.c
> > > +++ b/drivers/cxl/core/port.c
> > > @@ -485,6 +485,7 @@ static struct cxl_port *cxl_port_alloc(struct device *uport,
> > >  	port->uport = uport;
> > >  	port->component_reg_phys = component_reg_phys;
> > >  	ida_init(&port->decoder_ida);
> > > +	port->dpa_end = -1;
> > >  	INIT_LIST_HEAD(&port->dports);
> > >  	INIT_LIST_HEAD(&port->endpoints);
> > >  
> > > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > > index aa223166f7ef..d8edbdaa6208 100644
> > > --- a/drivers/cxl/cxl.h
> > > +++ b/drivers/cxl/cxl.h
> > > @@ -326,6 +326,7 @@ struct cxl_nvdimm {
> > >   * @dports: cxl_dport instances referenced by decoders
> > >   * @endpoints: cxl_ep instances, endpoints that are a descendant of this port
> > >   * @decoder_ida: allocator for decoder ids
> > > + * @dpa_end: cursor to track highest allocated decoder for allocation ordering  
> > 
> > dpa_end not a good name as this isn't a Device Physical Address.  
> 
> Ok, renamed it to 'hdm_end'. Suitable to add "Reviewed-by" now?

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Though I'll probably catch up with a later version when I've gotten through
more of my email and add it there as well so you don't have to add it
manually.

Thanks,

J
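
A standalone sketch of the cursor logic from the quoted patch, using
the renamed hdm_end field. struct port here is a toy with only that
field, and reserve()/release() are hypothetical names; the real code
lives in __cxl_dpa_reserve() and __cxl_dpa_release().

```c
#include <errno.h>

/* Toy port: id of the highest allocated decoder, -1 when none are. */
struct port {
	int hdm_end;
};

void port_init(struct port *p)
{
	p->hdm_end = -1;
}

/* Only the decoder right after the cursor may allocate next. */
int reserve(struct port *p, int decoder_id)
{
	if (p->hdm_end + 1 != decoder_id)
		return -EBUSY;
	p->hdm_end++;
	return 0;
}

/* Releasing moves the cursor back by one, as in the quoted patch. */
void release(struct port *p)
{
	p->hdm_end--;
}
```

Starting from a fresh port, reserving decoder 1 before decoder 0 fails
with -EBUSY, and reserving 0 then 1 succeeds.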

^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH 30/46] cxl/hdm: Add sysfs attributes for interleave ways + granularity
  2022-07-10 20:40     ` Dan Williams
@ 2022-07-19 14:32       ` Jonathan Cameron
  0 siblings, 0 replies; 157+ messages in thread
From: Jonathan Cameron @ 2022-07-19 14:32 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, nvdimm, linux-pci, patches, hch, Ben Widawsky


> >   
> > > +		address (N + interleave_granularity * intereleave_ways).  
> > 
> > interleave_ways
> > 
> > Even knowing exactly what this is, I don't understand the docs so
> > perhaps reword this :)  
> 
> Reworded to:
> 
> (RO) The number of consecutive bytes of host physical address space this
> decoder claims at address N before the decode rotates to the next target
> in the interleave at address N + interleave_granularity (assuming N is
> aligned to interleave_granularity).

LGTM
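
For anyone who wants the reworded text as arithmetic, a sketch assuming
plain modulo interleaving across a power-of-2 number of ways.
interleave_target() is a hypothetical helper, not a driver function,
and it ignores how hardware actually selects address bits.

```c
#include <stdint.h>

/*
 * Which interleave position decodes @hpa, for a decoder whose range
 * starts at @base. Each target claims @granularity consecutive bytes
 * before the decode rotates to the next of @ways targets.
 */
unsigned int interleave_target(uint64_t hpa, uint64_t base,
			       uint32_t granularity, uint32_t ways)
{
	return (unsigned int)(((hpa - base) / granularity) % ways);
}
```

With a granularity of 256 and 4 ways, offsets 0 through 255 land on
target 0, offset 256 on target 1, and offset 1024 wraps back to
target 0.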

^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH 34/46] cxl/region: Add region creation support
  2022-07-11  0:08     ` Dan Williams
@ 2022-07-19 14:42       ` Jonathan Cameron
  0 siblings, 0 replies; 157+ messages in thread
From: Jonathan Cameron @ 2022-07-19 14:42 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, nvdimm, linux-pci, patches, hch, Ben Widawsky


> > > An example of creating a new region:
> > > 
> > > - Allocate a new region name:
> > > region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region)
> > > 
> > > - Create a new region by name:
> > > while
> > > region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region)  
> > 
> > Perhaps it is worth calling out the region ID allocator is shared
> > with nvdimms and other usecases.  I'm not really sure what the advantage
> > in doing that is, but it doesn't do any real harm.  
> 
> The rationale is that there are several producers of memory regions:
> nvdimm, device-dax (hmem), and now cxl. Of those cases cxl can pass
> regions to nvdimm and nvdimm can pass regions to device-dax (pmem). If
> each of those cases allocated their own region-id it would just
> complicate debug for no benefit. I can add this as a note to remind why
> memregion_alloc() was introduced in the first instance.
> 
> >   
> > > ! echo $region > /sys/bus/cxl/devices/decoder0.0/create_pmem_region
> > > do true; done  
> 
> I recall you also asked to clarify the rationale of this complexity. It
> is related to the potential proliferation of disparate region ids, but
> also a lesson learned from nvdimm which itself learned lessons from
> md-raid. The lesson from md-raid in short is do not use ioctl for object
> creation. After "not ioctl" the choice is configfs or a small bit of
> sysfs hackery. Configfs is overkill when there is already a sysfs
> hierarchy that just needs one new object injected.
> 
> Namespace creation in nvdimm pre-created "seed" devices which let the
> kernel control the naming, but confused end users that wondered about
> vestigial devices. This "read to learn next object name" + "write to
> atomically claim and instantiate that id" cleans up that vestigial
> device problem while also constraining object naming to follow memregion
> id expectations.

Ok.  Makes sense to me now. Thanks!
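
A userspace model of that "read to learn the next name, write to claim
it" handshake, for illustration only. The function names, the single
global counter, and the -EBUSY return on a stale name are assumptions
of this sketch, not the driver's behavior.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: id the next successful create will consume. */
static int next_id;

/* "read": report the only name the next write will accept. */
void create_region_show(char *buf, size_t len)
{
	snprintf(buf, len, "region%d", next_id);
}

/*
 * "write": claim @name only if it still matches the advertised name,
 * otherwise tell the caller to re-read and retry (which is what the
 * shell loop in the quoted changelog does).
 */
int create_region_store(const char *name)
{
	char expect[32];

	create_region_show(expect, sizeof(expect));
	if (strcmp(name, expect) != 0)
		return -EBUSY;	/* assumed error code for a lost race */
	next_id++;
	return 0;
}
```

A second writer holding the same stale name loses the race and has to
read again, so no placeholder "seed" device is needed to hand out
names.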


> > > diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> > > index 472ec9cb1018..ebe6197fb9b8 100644
> > > --- a/drivers/cxl/core/core.h
> > > +++ b/drivers/cxl/core/core.h
> > > @@ -9,6 +9,18 @@ extern const struct device_type cxl_nvdimm_type;
> > >  
> > >  extern struct attribute_group cxl_base_attribute_group;
> > >  
> > > +#ifdef CONFIG_CXL_REGION
> > > +extern struct device_attribute dev_attr_create_pmem_region;
> > > +extern struct device_attribute dev_attr_delete_region;
> > > +/*
> > > + * Note must be used at the end of an attribute list, since it
> > > + * terminates the list in the CONFIG_CXL_REGION=n case.  
> > 
> > That's rather ugly.  Maybe just push the ifdef down into the c file
> > where we will be shortening the list and it should be obvious what is
> > going on without needing the comment?  Much as I don't like ifdef
> > magic in the c files, it sometimes ends up cleaner.  
> 
> No, I think ifdef in C is definitely uglier, but I also notice that
> helpers like SET_SYSTEM_SLEEP_PM_OPS() are defined to be used in any
> place in the list. So, I'll just duplicate that approach.

Ah. That's better, though has that odd quirk of no trailing comma where
the macro is called which always makes me look twice!

Guess looking twice is better than not looking at all though :)

Thanks,

Jonathan
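
A sketch of the macro shape being discussed, as I understand that style
of helper: it expands to entries that carry their own trailing comma
when the feature is enabled and to nothing when it is not, so it can
sit anywhere in a list. HAVE_REGION, REGION_ATTRS() and the attribute
names other than the two from the patch are hypothetical.

```c
#include <stddef.h>

#define HAVE_REGION 1	/* hypothetical stand-in for CONFIG_CXL_REGION */

#if HAVE_REGION
/* Entries bring their own trailing comma... */
#define REGION_ATTRS() "create_pmem_region", "delete_region",
#else
/* ...so the disabled case can expand to nothing at all. */
#define REGION_ATTRS()
#endif

static const char *attrs[] = {
	"first_attr",
	REGION_ATTRS()	/* note: no comma after the macro call */
	"last_attr",
	NULL,
};

/* Count entries up to the NULL terminator. */
size_t count_attrs(void)
{
	size_t n = 0;

	while (attrs[n])
		n++;
	return n;
}
```

The missing comma at the call site is the quirk mentioned above; it is
what lets the list stay well-formed in both configurations.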

^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH 36/46] cxl/region: Add interleave ways attribute
  2022-07-11  0:32     ` Dan Williams
@ 2022-07-19 14:47       ` Jonathan Cameron
  2022-07-19 22:15         ` Dan Williams
  0 siblings, 1 reply; 157+ messages in thread
From: Jonathan Cameron @ 2022-07-19 14:47 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, nvdimm, linux-pci, patches, hch, Ben Widawsky

On Sun, 10 Jul 2022 17:32:26 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> Jonathan Cameron wrote:
> > On Thu, 23 Jun 2022 21:19:40 -0700
> > Dan Williams <dan.j.williams@intel.com> wrote:
> >   
> > > From: Ben Widawsky <bwidawsk@kernel.org>
> > > 
> > > Add an ABI to allow the number of devices that comprise a region to be
> > > set.  
> > 
> > Should at least mention interleave_granularity is being added as well!  
> 
> Added.
> 
> >   
> > > 
> > > Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
> > > [djbw: reword changelog]
> > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>  
> > 
> > Random diversion inline...
> >   
> > > ---
> > >  Documentation/ABI/testing/sysfs-bus-cxl |  21 ++++
> > >  drivers/cxl/core/region.c               | 128 ++++++++++++++++++++++++
> > >  drivers/cxl/cxl.h                       |  33 ++++++
> > >  3 files changed, 182 insertions(+)  
> >   
> > > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > > index f75978f846b9..78af42454760 100644
> > > --- a/drivers/cxl/core/region.c
> > > +++ b/drivers/cxl/core/region.c
> > > @@ -7,6 +7,7 @@  
> > 
> >   
> > > +static ssize_t interleave_granularity_store(struct device *dev,
> > > +					    struct device_attribute *attr,
> > > +					    const char *buf, size_t len)
> > > +{
> > > +	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent);
> > > +	struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
> > > +	struct cxl_region *cxlr = to_cxl_region(dev);
> > > +	struct cxl_region_params *p = &cxlr->params;
> > > +	int rc, val;
> > > +	u16 ig;
> > > +
> > > +	rc = kstrtoint(buf, 0, &val);
> > > +	if (rc)
> > > +		return rc;
> > > +
> > > +	rc = granularity_to_cxl(val, &ig);
> > > +	if (rc)
> > > +		return rc;
> > > +
> > > +	/* region granularity must be >= root granularity */  
> > 
> > In general I think that's an implementation choice.  Sure today
> > we only support it this way, but it's perfectly possible to build
> > setups where that's not the case.  
> 
> If the region granularity is smaller than the host bridge interleave
> granularity it means that multiple devices per host bridge are needed to
> satisfy a single "slot" in the interleave. Valid? Yes. Useful for Linux
> to support, not clear.

True.  Wait and see on this one makes sense to me. I only noticed because
my older test scripts (against hacks on top of Ben's code) were broken as
I did it the silly way :)

> 
> > Maybe the comment should say that this code goes with an
> > implementation choice inline with the software guide (that argues you
> > will always prefer small ig for interleaving at the host to make best
> > use of bandwidth etc).  
> 
> No, I would prefer that as far as the Linux implementation is concerned
> the software-guide does not exist. In the sense that the Linux
> implementation choices supersede and are otherwise a superset of what
> the guide recommends.

ah. I phrased that badly. I just meant lift the argument as a comment rather
than a cross reference.

> 
> Also, for the same reason that the code does not anticipate future
> implementation possibilities, neither should the comments. It is
> sufficient to just change this comment when / if the implementation stops
> expecting region granularity >= root granularity.
> 
> > Interestingly the code I was previously testing QEMU with
> > allowed that option (might have been only option that worked).
> > That code was a mixture of Ben's earlier version and my own hacks.
> > It probably doesn't make sense to support other ways of picking
> > the interleaving granularity until / if we ever get a request
> > to do so. 
> > 
> > I think it results in a different device ordering.
> > 
> > Ordering with this
> > 
> >     Host
> >      | 4k
> >     / \
> >    /   \  
> >   HB   HB  8k
> >   |     |
> >  / \   / \
> > 0  2   1  3
> > 
> > Ordering with Larger granularity CFMWS over finer granularity HB
> > 
> >     Host
> >      | 8k
> >     / \
> >    /   \ 
> >   HB   HB 4k
> >   |     |
> >  / \   / \
> > 0  1   2  3
> > 
> > Not clear why you'd do the second one though :)  So can ignore for now.  
> 
> All I can think of is "ZOMG! My platform failed and the only one I have
> to recover my data has HB interleaves with larger granularity than my
> failed system!". Otherwise, I expect cross-platform CXL persistent
> memory recovery to be so rare as to not need to spend too much time
> worrying about it now. It seems a straightforward constraint to lift at
> a later date without any risk to breaking the ABI.

It was cross platform that I was thinking but you make a fair point that
it is unlikely to occur that often.  + If another OS wants to do it wrong
that's their problem :)

J
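
To check the two orderings in the diagrams above, a sketch that decodes
an address through a 2-way host-level interleave and then a 2-way
host-bridge interleave, assuming plain modulo arithmetic at each level.
struct hit and decode() are hypothetical names.

```c
#include <stdint.h>

/* Which host bridge and which port below it an address lands on. */
struct hit {
	unsigned int hb;
	unsigned int port;
};

/*
 * @host_ig: granularity of the interleave across the 2 host bridges
 * @hb_ig:   granularity of the interleave across the 2 ports per bridge
 */
struct hit decode(uint64_t hpa, uint64_t host_ig, uint64_t hb_ig)
{
	struct hit h = {
		(unsigned int)((hpa / host_ig) % 2),
		(unsigned int)((hpa / hb_ig) % 2),
	};

	return h;
}
```

Walking 4k chunks with host_ig = 4k and hb_ig = 8k visits
(HB0, port0), (HB1, port0), (HB0, port1), (HB1, port1), i.e. the
"0 2 / 1 3" layout. Swapping the granularities visits both ports of
HB0 first, giving "0 1 / 2 3".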

^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH 36/46] cxl/region: Add interleave ways attribute
  2022-07-19 14:47       ` Jonathan Cameron
@ 2022-07-19 22:15         ` Dan Williams
  2022-07-20  9:59           ` Jonathan Cameron
  0 siblings, 1 reply; 157+ messages in thread
From: Dan Williams @ 2022-07-19 22:15 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, nvdimm, linux-pci, patches, hch, Ben Widawsky

Jonathan Cameron wrote:
> > No, I would prefer that as far as the Linux implementation is concerned
> > the software-guide does not exist. In the sense that the Linux
> > implementation choices supersede and are otherwise a superset of what
> > the guide recommends.
> 
> ah. I phrased that badly. I just meant lift the argument as a comment rather
> than a cross reference.

Oh, you mean promote it to an actual rationale comment rather than just
parrot what the code is doing? Yeah, that's a good idea.

^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH 36/46] cxl/region: Add interleave ways attribute
  2022-07-19 22:15         ` Dan Williams
@ 2022-07-20  9:59           ` Jonathan Cameron
  0 siblings, 0 replies; 157+ messages in thread
From: Jonathan Cameron @ 2022-07-20  9:59 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, nvdimm, linux-pci, patches, hch, Ben Widawsky

On Tue, 19 Jul 2022 15:15:16 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> Jonathan Cameron wrote:
> > > No, I would prefer that as far as the Linux implementation is concerned
> > > the software-guide does not exist. In the sense that the Linux
> > > implementation choices supersede and are otherwise a superset of what
> > > the guide recommends.  
> > 
> > ah. I phrased that badly. I just meant lift the argument as a comment rather
> > than a cross reference.  
> 
> Oh, you mean promote it to an actual rationale comment rather than just
> parrot what the code is doing? Yeah, that's a good idea.

yup

^ permalink raw reply	[flat|nested] 157+ messages in thread

end of thread, other threads:[~2022-07-20 10:00 UTC | newest]

Thread overview: 157+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-24  2:45 [PATCH 00/46] CXL PMEM Region Provisioning Dan Williams
2022-06-24  2:45 ` [PATCH 01/46] tools/testing/cxl: Fix cxl_hdm_decode_init() calling convention Dan Williams
2022-06-28 10:37   ` Jonathan Cameron
     [not found]   ` <CGME20220629174147uscas1p211384ae262e099484440ef285be26c75@uscas1p2.samsung.com>
2022-06-29 17:41     ` Adam Manzanares
2022-07-09 20:06       ` Dan Williams
2022-07-12 22:11         ` Adam Manzanares
2022-06-24  2:45 ` [PATCH 02/46] cxl/port: Keep port->uport valid for the entire life of a port Dan Williams
2022-06-24  3:37   ` Alison Schofield
2022-06-28 11:47   ` Jonathan Cameron
2022-06-28 14:27     ` Dan Williams
     [not found]   ` <CGME20220629174622uscas1p2236a084ce25771a3ab57c6f006632f35@uscas1p2.samsung.com>
2022-06-29 17:46     ` Adam Manzanares
2022-06-24  2:45 ` [PATCH 03/46] cxl/hdm: Use local hdm variable Dan Williams
2022-06-24  3:38   ` Alison Schofield
2022-06-28 15:16   ` Jonathan Cameron
     [not found]   ` <CGME20220629200312uscas1p292303b9325dcbfe59293f002dc9e6b03@uscas1p2.samsung.com>
2022-06-29 20:03     ` Adam Manzanares
2022-06-24  2:45 ` [PATCH 04/46] cxl/core: Rename ->decoder_range ->hpa_range Dan Williams
2022-06-24  3:39   ` Alison Schofield
2022-06-28 15:17   ` Jonathan Cameron
     [not found]   ` <CGME20220629200652uscas1p2c1da644ea63a5de69e14e046379779b1@uscas1p2.samsung.com>
2022-06-29 20:06     ` Adam Manzanares
2022-06-24  2:45 ` [PATCH 05/46] cxl/core: Drop ->platform_res attribute for root decoders Dan Williams
2022-06-28 15:24   ` Jonathan Cameron
2022-07-09 23:33     ` Dan Williams
     [not found]   ` <CGME20220629202117uscas1p2892fb68ae60c4754e2f7d26882a92ae5@uscas1p2.samsung.com>
2022-06-29 20:21     ` Adam Manzanares
2022-07-09 23:38       ` Dan Williams
2022-06-24  2:45 ` [PATCH 06/46] cxl/core: Drop is_cxl_decoder() Dan Williams
2022-06-24  3:48   ` Alison Schofield
2022-06-28 15:25   ` Jonathan Cameron
     [not found]   ` <CGME20220629203448uscas1p264a7f79a1ed7f9257eefcb3064c7d943@uscas1p2.samsung.com>
2022-06-29 20:34     ` Adam Manzanares
2022-06-24  2:45 ` [PATCH 07/46] cxl: Introduce cxl_to_{ways,granularity} Dan Williams
2022-06-28 15:36   ` Jonathan Cameron
2022-07-09 23:52     ` Dan Williams
2022-06-24  2:45 ` [PATCH 08/46] cxl/core: Define a 'struct cxl_switch_decoder' Dan Williams
2022-06-28 16:12   ` Jonathan Cameron
2022-06-30 10:56     ` Jonathan Cameron
2022-07-10  0:49       ` Dan Williams
2022-07-10  0:33     ` Dan Williams
2022-06-24  2:46 ` [PATCH 09/46] cxl/acpi: Track CXL resources in iomem_resource Dan Williams
2022-06-28 16:43   ` Jonathan Cameron
2022-07-10  2:12     ` Dan Williams
2022-07-19 14:24       ` Jonathan Cameron
2022-06-24  2:46 ` [PATCH 10/46] cxl/core: Define a 'struct cxl_root_decoder' for tracking CXL window resources Dan Williams
2022-06-28 16:49   ` Jonathan Cameron
2022-07-10  2:20     ` Dan Williams
2022-06-28 16:53   ` Jonathan Cameron
2022-06-24  2:46 ` [PATCH 11/46] cxl/core: Define a 'struct cxl_endpoint_decoder' for tracking DPA resources Dan Williams
2022-06-28 16:55   ` Jonathan Cameron
2022-07-10  2:40     ` Dan Williams
2022-06-24  2:46 ` [PATCH 12/46] cxl/mem: Convert partition-info to resources Dan Williams
2022-06-28 17:02   ` Jonathan Cameron
2022-06-24  2:46 ` [PATCH 13/46] cxl/hdm: Require all decoders to be enumerated Dan Williams
2022-06-28 17:04   ` Jonathan Cameron
2022-06-24  2:46 ` [PATCH 14/46] cxl/hdm: Enumerate allocated DPA Dan Williams
2022-06-29 14:43   ` Jonathan Cameron
2022-07-10  3:03     ` Dan Williams
2022-07-19 14:25       ` Jonathan Cameron
2022-06-24  2:46 ` [PATCH 15/46] cxl/Documentation: List attribute permissions Dan Williams
2022-06-28  3:16   ` Alison Schofield
2022-06-29 14:59   ` Jonathan Cameron
2022-06-24  2:46 ` [PATCH 16/46] cxl/hdm: Add 'mode' attribute to decoder objects Dan Williams
2022-06-29 15:28   ` Jonathan Cameron
2022-07-10  3:45     ` Dan Williams
2022-06-24  2:47 ` [PATCH 17/46] cxl/hdm: Track next decoder to allocate Dan Williams
2022-06-29 15:31   ` Jonathan Cameron
2022-07-10  3:55     ` Dan Williams
2022-07-19 14:27       ` Jonathan Cameron
2022-07-10 16:34     ` Dan Williams
2022-06-24  2:47 ` [PATCH 18/46] cxl/hdm: Add support for allocating DPA to an endpoint decoder Dan Williams
2022-06-29 15:56   ` Jonathan Cameron
2022-07-10 16:53     ` Dan Williams
2022-06-24  2:47 ` [PATCH 19/46] cxl/debug: Move debugfs init to cxl_core_init() Dan Williams
2022-06-29 15:58   ` Jonathan Cameron
2022-06-24  2:47 ` [PATCH 20/46] cxl/mem: Add a debugfs version of 'iomem' for DPA, 'dpamem' Dan Williams
2022-06-29 16:08   ` Jonathan Cameron
2022-07-10 17:09     ` Dan Williams
2022-06-24  2:47 ` [PATCH 21/46] tools/testing/cxl: Move cxl_test resources to the top of memory Dan Williams
2022-06-29 16:11   ` Jonathan Cameron
2022-07-10 17:19     ` Dan Williams
2022-06-24  2:47 ` [PATCH 22/46] tools/testing/cxl: Expand CFMWS windows Dan Williams
2022-06-29 16:14   ` Jonathan Cameron
2022-06-24  2:47 ` [PATCH 23/46] tools/testing/cxl: Add partition support Dan Williams
2022-06-29 16:20   ` Jonathan Cameron
2022-06-24  2:48 ` [PATCH 24/46] tools/testing/cxl: Fix decoder default state Dan Williams
2022-06-29 16:22   ` Jonathan Cameron
2022-07-10 17:33     ` Dan Williams
2022-06-24  2:48 ` [PATCH 25/46] cxl/port: Record dport in endpoint references Dan Williams
2022-06-29 16:49   ` Jonathan Cameron
2022-07-10 18:40     ` Dan Williams
2022-06-24  4:19 ` [PATCH 26/46] cxl/port: Record parent dport when adding ports Dan Williams
2022-06-29 17:02   ` Jonathan Cameron
2022-06-24  4:19 ` [PATCH 27/46] cxl/port: Move 'cxl_ep' references to an xarray per port Dan Williams
2022-06-29 17:19   ` Jonathan Cameron
2022-06-24  4:19 ` [PATCH 28/46] cxl/port: Move dport tracking to an xarray Dan Williams
2022-06-30  9:18   ` Jonathan Cameron
2022-07-10 19:06     ` Dan Williams
2022-06-24  4:19 ` [PATCH 29/46] cxl/port: Cache CXL host bridge data Dan Williams
2022-06-30  9:21   ` Jonathan Cameron
2022-07-10 19:09     ` Dan Williams
2022-06-24  4:19 ` [PATCH 30/46] cxl/hdm: Add sysfs attributes for interleave ways + granularity Dan Williams
2022-06-30  9:26   ` Jonathan Cameron
2022-07-10 20:40     ` Dan Williams
2022-07-19 14:32       ` Jonathan Cameron
2022-06-24  4:19 ` [PATCH 31/46] cxl/hdm: Initialize decoder type for memory expander devices Dan Williams
2022-06-30  9:33   ` Jonathan Cameron
2022-06-24  4:19 ` [PATCH 32/46] cxl/mem: Enumerate port targets before adding endpoints Dan Williams
2022-06-30  9:48   ` Jonathan Cameron
2022-07-10 21:01     ` Dan Williams
2022-06-24  4:19 ` [PATCH 33/46] resource: Introduce alloc_free_mem_region() Dan Williams
2022-06-30 10:35   ` Jonathan Cameron
2022-07-10 21:58     ` Dan Williams
2022-06-24  4:19 ` [PATCH 34/46] cxl/region: Add region creation support Dan Williams
2022-06-30 13:17   ` Jonathan Cameron
2022-07-11  0:08     ` Dan Williams
2022-07-19 14:42       ` Jonathan Cameron
2022-06-24  4:19 ` [PATCH 35/46] cxl/region: Add a 'uuid' attribute Dan Williams
2022-06-28 10:29   ` Jonathan Cameron
2022-06-28 14:24     ` Dan Williams
2022-06-24  4:19 ` [PATCH 36/46] cxl/region: Add interleave ways attribute Dan Williams
2022-06-30 13:44   ` Jonathan Cameron
2022-07-11  0:32     ` Dan Williams
2022-07-19 14:47       ` Jonathan Cameron
2022-07-19 22:15         ` Dan Williams
2022-07-20  9:59           ` Jonathan Cameron
2022-06-30 13:45   ` Jonathan Cameron
2022-06-24  4:19 ` [PATCH 37/46] cxl/region: Allocate host physical address (HPA) capacity to new regions Dan Williams
2022-06-30 13:56   ` Jonathan Cameron
2022-07-11  0:47     ` Dan Williams
2022-06-24  4:19 ` [PATCH 38/46] cxl/region: Enable the assignment of endpoint decoders to regions Dan Williams
2022-06-30 14:31   ` Jonathan Cameron
2022-07-11  1:12     ` Dan Williams
2022-06-24  4:19 ` [PATCH 39/46] cxl/acpi: Add a host-bridge index lookup mechanism Dan Williams
2022-06-30 15:48   ` Jonathan Cameron
2022-06-24  4:19 ` [PATCH 40/46] cxl/region: Attach endpoint decoders Dan Williams
2022-06-24 18:25   ` Jonathan Cameron
2022-06-24 18:49     ` Dan Williams
2022-06-24 20:51     ` Dan Williams
2022-06-24 23:21       ` Dan Williams
2022-06-30 16:34   ` Jonathan Cameron
2022-07-11  2:02     ` Dan Williams
2022-06-24  4:19 ` [PATCH 41/46] cxl/region: Program target lists Dan Williams
2022-06-24  4:19 ` [PATCH 42/46] cxl/hdm: Commit decoder state to hardware Dan Williams
2022-06-30 17:05   ` Jonathan Cameron
2022-07-11  3:02     ` Dan Williams
2022-06-24  4:19 ` [PATCH 43/46] cxl/region: Add region driver boiler plate Dan Williams
2022-06-30 17:09   ` Jonathan Cameron
2022-06-24  4:19 ` [PATCH 44/46] cxl/pmem: Delete unused nvdimm attribute Dan Williams
2022-06-30 17:10   ` Jonathan Cameron
2022-06-24  4:19 ` [PATCH 45/46] cxl/pmem: Fix offline_nvdimm_bus() to offline by bridge Dan Williams
2022-06-30 17:14   ` Jonathan Cameron
2022-07-11 19:49     ` Dan Williams
2022-06-24  4:19 ` [PATCH 46/46] cxl/region: Introduce cxl_pmem_region objects Dan Williams
2022-06-30 17:34   ` Jonathan Cameron
2022-07-11 20:05     ` Dan Williams
2022-06-24 15:13 ` [PATCH 00/46] CXL PMEM Region Provisioning Jonathan Cameron
2022-06-24 15:32   ` Dan Williams
2022-06-28  3:12 ` Alison Schofield
2022-06-28  3:34   ` Dan Williams
2022-07-02  2:26 ` Alison Schofield
