* [PATCH 0/5] cxl, nvdimm: Move CPU cache management to region drivers
@ 2022-12-01 22:03 Dan Williams
  2022-12-01 22:03 ` [PATCH 1/5] cxl: add dimm_id support for __nvdimm_create() Dan Williams
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Dan Williams @ 2022-12-01 22:03 UTC (permalink / raw)
  To: linux-cxl
  Cc: Jonathan Cameron, Dave Jiang, stable, Jonathan.Cameron,
	dave.jiang, nvdimm, dave

This is incremental to Dave's recent "[PATCH v7 00/20] Introduce
security commands for CXL pmem device" [1], starting after patch 17 [2].
I.e. I want to drop patches 18, 19, and 20 from that series and replace
them with these.  It was prompted by Davidlohr's concerns about
cxl_invalidate_memregion().

The insight is that now that cpu_cache_invalidate_memregion() has a
default implementation for all architectures, the cache management can
move from the intel-pmem-specific security operations to the generic
NVDIMM core. This relieves the new CXL security ops from needing to
open-code their own cache flushing.
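
For reference, the shape of that generic interface as the region drivers
end up consuming it (a minimal illustrative sketch, not code from this
series):

    #include <linux/memregion.h>

    /* flush CPU caches around a persistent memory content change */
    static int example_flush_for_pmem(void)
    {
            /* no invalidation mechanism available, e.g. a VM guest */
            if (!cpu_cache_has_invalidate_memregion())
                    return -ENXIO;

            /* write back and invalidate caches fronting persistent memory */
            cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY);
            return 0;
    }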

Also prompted by Davidlohr's concerns is what to do about cache flushing
for scenarios outside of the PMEM security operations. For that, "[PATCH
5/5] cxl/region: Manage CPU caches relative to DPA invalidation events"
proposes to handle that management at region activation time. This does
mean that dynamic CXL region provisioning is limited to environments
where cpu_cache_has_invalidate_memregion() is true. A new
CONFIG_CXL_REGION_INVALIDATION_TEST is added to bypass that data
integrity enforcement.

Lastly, this includes some fixups: one for the fact that
cxl_region_probe() was ignoring some errors, another to enforce that
PMEM security operations originate through LIBNVDIMM, and a cleanup to
move a string-formatting failure condition from runtime to compile-time
in cxl_nvdimm_alloc().

[1]: http://lore.kernel.org/r/166983606451.2734609.4050644229630259452.stgit@djiang5-desk3.ch.intel.com
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.2/cxl-security

---

Dan Williams (4):
      cxl/region: Fix missing probe failure
      cxl/pmem: Enforce keyctl ABI for PMEM security
      nvdimm/region: Move cache management to the region driver
      cxl/region: Manage CPU caches relative to DPA invalidation events

Dave Jiang (1):
      cxl: add dimm_id support for __nvdimm_create()


 drivers/acpi/nfit/intel.c    |   25 ---------------------
 drivers/cxl/Kconfig          |   18 +++++++++++++++
 drivers/cxl/core/mbox.c      |   10 +++++++++
 drivers/cxl/core/pmem.c      |    7 ++++++
 drivers/cxl/core/region.c    |   34 +++++++++++++++++++++++++++++
 drivers/cxl/cxl.h            |   11 +++++++++
 drivers/cxl/pmem.c           |    3 ++-
 drivers/cxl/security.c       |   14 ------------
 drivers/nvdimm/region.c      |   11 +++++++++
 drivers/nvdimm/region_devs.c |   49 +++++++++++++++++++++++++++++++++++++++++-
 drivers/nvdimm/security.c    |    6 +++++
 include/linux/libnvdimm.h    |    5 ++++
 12 files changed, 152 insertions(+), 41 deletions(-)

base-commit: 15a8348707ffd2a37516db9bede88cc0bb467e0b


* [PATCH 1/5] cxl: add dimm_id support for __nvdimm_create()
  2022-12-01 22:03 [PATCH 0/5] cxl, nvdimm: Move CPU cache management to region drivers Dan Williams
@ 2022-12-01 22:03 ` Dan Williams
  2022-12-01 22:03 ` [PATCH 2/5] cxl/region: Fix missing probe failure Dan Williams
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Dan Williams @ 2022-12-01 22:03 UTC (permalink / raw)
  To: linux-cxl
  Cc: Jonathan Cameron, Dave Jiang, Jonathan.Cameron, dave.jiang, nvdimm, dave

From: Dave Jiang <dave.jiang@intel.com>

Set the cxlds->serial as the dimm_id to be fed to __nvdimm_create(). The
security code uses that as the key description for the security key of the
memory device. The nvdimm unlock code cannot find the respective key
without the dimm_id.
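
For context, a rough sketch of how that key description is derived from
dimm_id (the "nvdimm:" prefix is an assumption about the LIBNVDIMM
security code internals, shown for illustration only):

    /* hypothetical snippet: build the keyctl description for a dimm */
    char desc[NVDIMM_KEY_DESC_LEN + sizeof("nvdimm:")];

    /* e.g. "nvdimm:1f2e3d4c" for a device serial of 0x1f2e3d4c */
    sprintf(desc, "nvdimm:%s", nvdimm->dimm_id);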

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166863357043.80269.4337575149671383294.stgit@djiang5-desk3.ch.intel.com
Link: https://lore.kernel.org/r/166983620459.2734609.10175456773200251184.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/pmem.c |    7 +++++++
 drivers/cxl/cxl.h       |    3 +++
 drivers/cxl/pmem.c      |    3 ++-
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/cxl/core/pmem.c b/drivers/cxl/core/pmem.c
index 36aa5070d902..7b9a9573e6f2 100644
--- a/drivers/cxl/core/pmem.c
+++ b/drivers/cxl/core/pmem.c
@@ -238,6 +238,13 @@ static struct cxl_nvdimm *cxl_nvdimm_alloc(struct cxl_memdev *cxlmd)
 	dev->parent = &cxlmd->dev;
 	dev->bus = &cxl_bus_type;
 	dev->type = &cxl_nvdimm_type;
+	/*
+	 * A "%llx" string is 17 bytes, vs a dimm_id that is at most
+	 * NVDIMM_KEY_DESC_LEN bytes
+	 */
+	BUILD_BUG_ON(sizeof(cxl_nvd->dev_id) < 17 ||
+		     sizeof(cxl_nvd->dev_id) > NVDIMM_KEY_DESC_LEN);
+	sprintf(cxl_nvd->dev_id, "%llx", cxlmd->cxlds->serial);
 
 	return cxl_nvd;
 }
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 7d07127eade3..b433e541a054 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -420,11 +420,14 @@ struct cxl_nvdimm_bridge {
 	enum cxl_nvdimm_brige_state state;
 };
 
+#define CXL_DEV_ID_LEN 19
+
 struct cxl_nvdimm {
 	struct device dev;
 	struct cxl_memdev *cxlmd;
 	struct cxl_nvdimm_bridge *bridge;
 	struct xarray pmem_regions;
+	u8 dev_id[CXL_DEV_ID_LEN]; /* for nvdimm, string of 'serial' */
 };
 
 struct cxl_pmem_region_mapping {
diff --git a/drivers/cxl/pmem.c b/drivers/cxl/pmem.c
index 403e41bcbf2b..ab40c93c44e5 100644
--- a/drivers/cxl/pmem.c
+++ b/drivers/cxl/pmem.c
@@ -117,7 +117,8 @@ static int cxl_nvdimm_probe(struct device *dev)
 	set_bit(ND_CMD_SET_CONFIG_DATA, &cmd_mask);
 	nvdimm = __nvdimm_create(cxl_nvb->nvdimm_bus, cxl_nvd,
 				 cxl_dimm_attribute_groups, flags,
-				 cmd_mask, 0, NULL, NULL, cxl_security_ops, NULL);
+				 cmd_mask, 0, NULL, cxl_nvd->dev_id,
+				 cxl_security_ops, NULL);
 	if (!nvdimm) {
 		rc = -ENOMEM;
 		goto out;



* [PATCH 2/5] cxl/region: Fix missing probe failure
  2022-12-01 22:03 [PATCH 0/5] cxl, nvdimm: Move CPU cache management to region drivers Dan Williams
  2022-12-01 22:03 ` [PATCH 1/5] cxl: add dimm_id support for __nvdimm_create() Dan Williams
@ 2022-12-01 22:03 ` Dan Williams
  2022-12-01 22:30   ` Dave Jiang
                     ` (2 more replies)
  2022-12-01 22:03 ` [PATCH 3/5] cxl/pmem: Enforce keyctl ABI for PMEM security Dan Williams
                   ` (2 subsequent siblings)
  4 siblings, 3 replies; 22+ messages in thread
From: Dan Williams @ 2022-12-01 22:03 UTC (permalink / raw)
  To: linux-cxl; +Cc: stable, Jonathan.Cameron, dave.jiang, nvdimm, dave

cxl_region_probe() allows for regions not in the 'commit' state to be
enabled. Fail probe when the region is not committed; otherwise the
kernel may indicate that an address range is active when none of the
decoders are active.

Fixes: 8d48817df6ac ("cxl/region: Add region driver boiler plate")
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/region.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index f9ae5ad284ff..1bc2ebefa2a5 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1923,6 +1923,9 @@ static int cxl_region_probe(struct device *dev)
 	 */
 	up_read(&cxl_region_rwsem);
 
+	if (rc)
+		return rc;
+
 	switch (cxlr->mode) {
 	case CXL_DECODER_PMEM:
 		return devm_cxl_add_pmem_region(cxlr);



* [PATCH 3/5] cxl/pmem: Enforce keyctl ABI for PMEM security
  2022-12-01 22:03 [PATCH 0/5] cxl, nvdimm: Move CPU cache management to region drivers Dan Williams
  2022-12-01 22:03 ` [PATCH 1/5] cxl: add dimm_id support for __nvdimm_create() Dan Williams
  2022-12-01 22:03 ` [PATCH 2/5] cxl/region: Fix missing probe failure Dan Williams
@ 2022-12-01 22:03 ` Dan Williams
  2022-12-01 22:32   ` Dave Jiang
                     ` (2 more replies)
  2022-12-01 22:03 ` [PATCH 4/5] nvdimm/region: Move cache management to the region driver Dan Williams
  2022-12-01 22:03 ` [PATCH 5/5] cxl/region: Manage CPU caches relative to DPA invalidation events Dan Williams
  4 siblings, 3 replies; 22+ messages in thread
From: Dan Williams @ 2022-12-01 22:03 UTC (permalink / raw)
  To: linux-cxl; +Cc: Jonathan.Cameron, dave.jiang, nvdimm, dave

Preclude the possibility of user tooling sending device secrets in the
clear into the kernel by marking the security commands as exclusive.
This mandates the usage of the keyctl ABI for managing the device
passphrase.
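
Conceptually the enforcement point in the ioctl submission path amounts
to the following (a sketch of assumed behavior, not a quote of the code;
the exact names and error code are assumptions):

    /* user ioctl submission of a kernel-exclusive command is refused */
    if (test_bit(cmd->info.id, cxlds->exclusive_cmds))
            return -EBUSY; /* manage the passphrase via keyctl instead */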

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/mbox.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 8747db329087..35dd889f1d3a 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -704,6 +704,16 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
 		rc = 0;
 	}
 
+	/*
+	 * Set up permanently kernel-exclusive commands, i.e. the
+	 * mechanism is driven through sysfs, keyctl, etc.
+	 */
+	set_bit(CXL_MEM_COMMAND_ID_SET_PASSPHRASE, cxlds->exclusive_cmds);
+	set_bit(CXL_MEM_COMMAND_ID_DISABLE_PASSPHRASE, cxlds->exclusive_cmds);
+	set_bit(CXL_MEM_COMMAND_ID_UNLOCK, cxlds->exclusive_cmds);
+	set_bit(CXL_MEM_COMMAND_ID_PASSPHRASE_SECURE_ERASE,
+		cxlds->exclusive_cmds);
+
 out:
 	kvfree(gsl);
 	return rc;



* [PATCH 4/5] nvdimm/region: Move cache management to the region driver
  2022-12-01 22:03 [PATCH 0/5] cxl, nvdimm: Move CPU cache management to region drivers Dan Williams
                   ` (2 preceding siblings ...)
  2022-12-01 22:03 ` [PATCH 3/5] cxl/pmem: Enforce keyctl ABI for PMEM security Dan Williams
@ 2022-12-01 22:03 ` Dan Williams
  2022-12-01 23:00   ` Dave Jiang
  2022-12-02  3:21   ` Davidlohr Bueso
  2022-12-01 22:03 ` [PATCH 5/5] cxl/region: Manage CPU caches relative to DPA invalidation events Dan Williams
  4 siblings, 2 replies; 22+ messages in thread
From: Dan Williams @ 2022-12-01 22:03 UTC (permalink / raw)
  To: linux-cxl; +Cc: Jonathan.Cameron, dave.jiang, nvdimm, dave

Now that cpu_cache_invalidate_memregion() is generically available, use
it to centralize CPU cache management in the nvdimm region driver.

This trades off removing redundant per-dimm CPU cache flushing with an
opportunistic flush on every region disable event to cover the case of
sensitive dirty data in the cache being written back to media after a
secure erase / overwrite event.
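
The resulting flag lifecycle, roughly (a condensed sketch of the patch
below, with one pseudocode helper for brevity):

    /* security op succeeded: dimm contents changed behind CPU caches */
    set_bit(NDD_INCOHERENT, &nvdimm->flags);

    /* ...later, at nd_region_activate() time... */
    if (region_has_incoherent_dimm(nd_region)) /* pseudocode */
            cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY);
    clear_bit(NDD_INCOHERENT, &nvdimm->flags);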

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/acpi/nfit/intel.c    |   25 ---------------------
 drivers/nvdimm/region.c      |   11 +++++++++
 drivers/nvdimm/region_devs.c |   49 +++++++++++++++++++++++++++++++++++++++++-
 drivers/nvdimm/security.c    |    6 +++++
 include/linux/libnvdimm.h    |    5 ++++
 5 files changed, 70 insertions(+), 26 deletions(-)

diff --git a/drivers/acpi/nfit/intel.c b/drivers/acpi/nfit/intel.c
index fa0e57e35162..3902759abcba 100644
--- a/drivers/acpi/nfit/intel.c
+++ b/drivers/acpi/nfit/intel.c
@@ -212,9 +212,6 @@ static int __maybe_unused intel_security_unlock(struct nvdimm *nvdimm,
 	if (!test_bit(NVDIMM_INTEL_UNLOCK_UNIT, &nfit_mem->dsm_mask))
 		return -ENOTTY;
 
-	if (!cpu_cache_has_invalidate_memregion())
-		return -EINVAL;
-
 	memcpy(nd_cmd.cmd.passphrase, key_data->data,
 			sizeof(nd_cmd.cmd.passphrase));
 	rc = nvdimm_ctl(nvdimm, ND_CMD_CALL, &nd_cmd, sizeof(nd_cmd), NULL);
@@ -229,9 +226,6 @@ static int __maybe_unused intel_security_unlock(struct nvdimm *nvdimm,
 		return -EIO;
 	}
 
-	/* DIMM unlocked, invalidate all CPU caches before we read it */
-	cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY);
-
 	return 0;
 }
 
@@ -299,11 +293,6 @@ static int __maybe_unused intel_security_erase(struct nvdimm *nvdimm,
 	if (!test_bit(cmd, &nfit_mem->dsm_mask))
 		return -ENOTTY;
 
-	if (!cpu_cache_has_invalidate_memregion())
-		return -EINVAL;
-
-	/* flush all cache before we erase DIMM */
-	cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY);
 	memcpy(nd_cmd.cmd.passphrase, key->data,
 			sizeof(nd_cmd.cmd.passphrase));
 	rc = nvdimm_ctl(nvdimm, ND_CMD_CALL, &nd_cmd, sizeof(nd_cmd), NULL);
@@ -322,8 +311,6 @@ static int __maybe_unused intel_security_erase(struct nvdimm *nvdimm,
 		return -ENXIO;
 	}
 
-	/* DIMM erased, invalidate all CPU caches before we read it */
-	cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY);
 	return 0;
 }
 
@@ -346,9 +333,6 @@ static int __maybe_unused intel_security_query_overwrite(struct nvdimm *nvdimm)
 	if (!test_bit(NVDIMM_INTEL_QUERY_OVERWRITE, &nfit_mem->dsm_mask))
 		return -ENOTTY;
 
-	if (!cpu_cache_has_invalidate_memregion())
-		return -EINVAL;
-
 	rc = nvdimm_ctl(nvdimm, ND_CMD_CALL, &nd_cmd, sizeof(nd_cmd), NULL);
 	if (rc < 0)
 		return rc;
@@ -362,8 +346,6 @@ static int __maybe_unused intel_security_query_overwrite(struct nvdimm *nvdimm)
 		return -ENXIO;
 	}
 
-	/* flush all cache before we make the nvdimms available */
-	cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY);
 	return 0;
 }
 
@@ -388,11 +370,6 @@ static int __maybe_unused intel_security_overwrite(struct nvdimm *nvdimm,
 	if (!test_bit(NVDIMM_INTEL_OVERWRITE, &nfit_mem->dsm_mask))
 		return -ENOTTY;
 
-	if (!cpu_cache_has_invalidate_memregion())
-		return -EINVAL;
-
-	/* flush all cache before we erase DIMM */
-	cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY);
 	memcpy(nd_cmd.cmd.passphrase, nkey->data,
 			sizeof(nd_cmd.cmd.passphrase));
 	rc = nvdimm_ctl(nvdimm, ND_CMD_CALL, &nd_cmd, sizeof(nd_cmd), NULL);
@@ -770,5 +747,3 @@ static const struct nvdimm_fw_ops __intel_fw_ops = {
 };
 
 const struct nvdimm_fw_ops *intel_fw_ops = &__intel_fw_ops;
-
-MODULE_IMPORT_NS(DEVMEM);
diff --git a/drivers/nvdimm/region.c b/drivers/nvdimm/region.c
index 390123d293ea..88dc062af5f8 100644
--- a/drivers/nvdimm/region.c
+++ b/drivers/nvdimm/region.c
@@ -2,6 +2,7 @@
 /*
  * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
  */
+#include <linux/memregion.h>
 #include <linux/cpumask.h>
 #include <linux/module.h>
 #include <linux/device.h>
@@ -100,6 +101,16 @@ static void nd_region_remove(struct device *dev)
 	 */
 	sysfs_put(nd_region->bb_state);
 	nd_region->bb_state = NULL;
+
+	/*
+	 * Try to flush caches here since a disabled region may be subject to
+	 * secure erase while disabled, and previous dirty data should not be
+	 * written back to a new instance of the region. This only matters on
+	 * bare metal where security commands are available, so silent failure
+	 * here is ok.
+	 */
+	if (cpu_cache_has_invalidate_memregion())
+		cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY);
 }
 
 static int child_notify(struct device *dev, void *data)
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index e0875d369762..c73e3b1fd0a6 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -59,13 +59,57 @@ static int nvdimm_map_flush(struct device *dev, struct nvdimm *nvdimm, int dimm,
 	return 0;
 }
 
+static int nd_region_invalidate_memregion(struct nd_region *nd_region)
+{
+	int i, incoherent = 0;
+
+	for (i = 0; i < nd_region->ndr_mappings; i++) {
+		struct nd_mapping *nd_mapping = &nd_region->mapping[i];
+		struct nvdimm *nvdimm = nd_mapping->nvdimm;
+
+		if (test_bit(NDD_INCOHERENT, &nvdimm->flags))
+			incoherent++;
+	}
+
+	if (!incoherent)
+		return 0;
+
+	if (!cpu_cache_has_invalidate_memregion()) {
+		if (IS_ENABLED(CONFIG_NVDIMM_SECURITY_TEST)) {
+			dev_warn(
+				&nd_region->dev,
+				"Bypassing cpu_cache_invalidate_memregion() for testing!\n");
+			goto out;
+		} else {
+			dev_err(&nd_region->dev,
+				"Failed to synchronize CPU cache state\n");
+			return -ENXIO;
+		}
+	}
+
+	cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY);
+out:
+	for (i = 0; i < nd_region->ndr_mappings; i++) {
+		struct nd_mapping *nd_mapping = &nd_region->mapping[i];
+		struct nvdimm *nvdimm = nd_mapping->nvdimm;
+
+		clear_bit(NDD_INCOHERENT, &nvdimm->flags);
+	}
+
+	return 0;
+}
+
 int nd_region_activate(struct nd_region *nd_region)
 {
-	int i, j, num_flush = 0;
+	int i, j, rc, num_flush = 0;
 	struct nd_region_data *ndrd;
 	struct device *dev = &nd_region->dev;
 	size_t flush_data_size = sizeof(void *);
 
+	rc = nd_region_invalidate_memregion(nd_region);
+	if (rc)
+		return rc;
+
 	nvdimm_bus_lock(&nd_region->dev);
 	for (i = 0; i < nd_region->ndr_mappings; i++) {
 		struct nd_mapping *nd_mapping = &nd_region->mapping[i];
@@ -85,6 +129,7 @@ int nd_region_activate(struct nd_region *nd_region)
 	}
 	nvdimm_bus_unlock(&nd_region->dev);
 
+
 	ndrd = devm_kzalloc(dev, sizeof(*ndrd) + flush_data_size, GFP_KERNEL);
 	if (!ndrd)
 		return -ENOMEM;
@@ -1222,3 +1267,5 @@ int nd_region_conflict(struct nd_region *nd_region, resource_size_t start,
 
 	return device_for_each_child(&nvdimm_bus->dev, &ctx, region_conflict);
 }
+
+MODULE_IMPORT_NS(DEVMEM);
diff --git a/drivers/nvdimm/security.c b/drivers/nvdimm/security.c
index 6814339b3dab..a03e3c45f297 100644
--- a/drivers/nvdimm/security.c
+++ b/drivers/nvdimm/security.c
@@ -208,6 +208,8 @@ static int __nvdimm_security_unlock(struct nvdimm *nvdimm)
 	rc = nvdimm->sec.ops->unlock(nvdimm, data);
 	dev_dbg(dev, "key: %d unlock: %s\n", key_serial(key),
 			rc == 0 ? "success" : "fail");
+	if (rc == 0)
+		set_bit(NDD_INCOHERENT, &nvdimm->flags);
 
 	nvdimm_put_key(key);
 	nvdimm->sec.flags = nvdimm_security_flags(nvdimm, NVDIMM_USER);
@@ -374,6 +376,8 @@ static int security_erase(struct nvdimm *nvdimm, unsigned int keyid,
 		return -ENOKEY;
 
 	rc = nvdimm->sec.ops->erase(nvdimm, data, pass_type);
+	if (rc == 0)
+		set_bit(NDD_INCOHERENT, &nvdimm->flags);
 	dev_dbg(dev, "key: %d erase%s: %s\n", key_serial(key),
 			pass_type == NVDIMM_MASTER ? "(master)" : "(user)",
 			rc == 0 ? "success" : "fail");
@@ -408,6 +412,8 @@ static int security_overwrite(struct nvdimm *nvdimm, unsigned int keyid)
 		return -ENOKEY;
 
 	rc = nvdimm->sec.ops->overwrite(nvdimm, data);
+	if (rc == 0)
+		set_bit(NDD_INCOHERENT, &nvdimm->flags);
 	dev_dbg(dev, "key: %d overwrite submission: %s\n", key_serial(key),
 			rc == 0 ? "success" : "fail");
 
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 3bf658a74ccb..af38252ad704 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -35,6 +35,11 @@ enum {
 	NDD_WORK_PENDING = 4,
 	/* dimm supports namespace labels */
 	NDD_LABELING = 6,
+	/*
+	 * dimm contents have changed requiring invalidation of CPU caches prior
+	 * to activation of a region that includes this device
+	 */
+	NDD_INCOHERENT = 7,
 
 	/* need to set a limit somewhere, but yes, this is likely overkill */
 	ND_IOCTL_MAX_BUFLEN = SZ_4M,



* [PATCH 5/5] cxl/region: Manage CPU caches relative to DPA invalidation events
  2022-12-01 22:03 [PATCH 0/5] cxl, nvdimm: Move CPU cache management to region drivers Dan Williams
                   ` (3 preceding siblings ...)
  2022-12-01 22:03 ` [PATCH 4/5] nvdimm/region: Move cache management to the region driver Dan Williams
@ 2022-12-01 22:03 ` Dan Williams
  2022-12-01 23:04   ` Dave Jiang
  2022-12-05 19:20   ` Davidlohr Bueso
  4 siblings, 2 replies; 22+ messages in thread
From: Dan Williams @ 2022-12-01 22:03 UTC (permalink / raw)
  To: linux-cxl; +Cc: Jonathan.Cameron, dave.jiang, nvdimm, dave

A "DPA invalidation event" is any scenario where the contents of a DPA
(Device Physical Address) is modified in a way that is incoherent with
CPU caches, or if the HPA (Host Physical Address) to DPA association
changes due to a remapping event.

PMEM security events like Unlock and Passphrase Secure Erase already
manage caches through LIBNVDIMM, so that leaves HPA to DPA remap events
that need cache management by the CXL core. Those only happen when the
boot time CXL configuration has changed. That event occurs when
userspace attaches an endpoint decoder to a region configuration, and
that region is subsequently activated.

The implication of not invalidating caches between remap events is that
reads from the region at different points in time may return different
results due to stale cached data from the previous HPA to DPA mapping.
Without a guarantee that the region contents after cxl_region_probe()
are written before being read (a layering-violation assumption that
cxl_region_probe() can not make) the CXL subsystem needs to ensure that
reads that precede writes see consistent results.
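
In code terms the gate ends up looking like the following (a condensed
sketch of the cxl_region_invalidate_memregion() logic added below):

    /* refuse to enable a possibly-incoherent region if we cannot flush */
    if (test_bit(CXL_REGION_F_INCOHERENT, &cxlr->flags) &&
        !cpu_cache_has_invalidate_memregion())
            return -ENXIO;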

A CONFIG_CXL_REGION_INVALIDATION_TEST option is added to support debug
and unit testing of the CXL implementation in QEMU or other environments
where cpu_cache_has_invalidate_memregion() returns false. This may prove
too restrictive for QEMU where the HDM decoders are emulated, but in
that case the CXL subsystem needs some new mechanism / indication that
the HDM decoder is emulated and not a passthrough of real hardware.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/Kconfig       |   18 ++++++++++++++++++
 drivers/cxl/core/region.c |   31 +++++++++++++++++++++++++++++++
 drivers/cxl/cxl.h         |    8 ++++++++
 drivers/cxl/security.c    |   14 --------------
 4 files changed, 57 insertions(+), 14 deletions(-)

diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 768ced3d6fe8..0ac53c422c31 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -111,4 +111,22 @@ config CXL_REGION
 	select MEMREGION
 	select GET_FREE_REGION
 
+config CXL_REGION_INVALIDATION_TEST
+	bool "CXL: Region Cache Management Bypass (TEST)"
+	depends on CXL_REGION
+	help
+	  CXL Region management and security operations potentially invalidate
+	  the content of CPU caches without notifying those caches to
+	  invalidate the affected cachelines. The CXL Region driver attempts
+	  to invalidate caches when those events occur.  If that invalidation
+	  fails, the region will fail to enable.  Cache invalidation can fail
+	  because the CPU does not provide a cache invalidation mechanism;
+	  for example, usage of wbinvd is restricted to bare metal x86.
+	  However, for testing purposes toggling this option can disable that
+	  data integrity safety and proceed with enabling regions when there
+	  might be conflicting contents in the CPU cache.
+
+	  If unsure, or if this kernel is meant for production environments,
+	  say N.
+
 endif
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 1bc2ebefa2a5..3a6c3f84015f 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1403,6 +1403,8 @@ static int attach_target(struct cxl_region *cxlr, const char *decoder, int pos)
 		goto out;
 	down_read(&cxl_dpa_rwsem);
 	rc = cxl_region_attach(cxlr, to_cxl_endpoint_decoder(dev), pos);
+	if (rc == 0)
+		set_bit(CXL_REGION_F_INCOHERENT, &cxlr->flags);
 	up_read(&cxl_dpa_rwsem);
 	up_write(&cxl_region_rwsem);
 out:
@@ -1900,6 +1902,30 @@ static int devm_cxl_add_pmem_region(struct cxl_region *cxlr)
 	return rc;
 }
 
+static int cxl_region_invalidate_memregion(struct cxl_region *cxlr)
+{
+	if (!test_bit(CXL_REGION_F_INCOHERENT, &cxlr->flags))
+		return 0;
+
+	if (!cpu_cache_has_invalidate_memregion()) {
+		if (IS_ENABLED(CONFIG_CXL_REGION_INVALIDATION_TEST)) {
+			dev_warn(
+				&cxlr->dev,
+				"Bypassing cpu_cache_invalidate_memregion() for testing!\n");
+			clear_bit(CXL_REGION_F_INCOHERENT, &cxlr->flags);
+			return 0;
+		} else {
+			dev_err(&cxlr->dev,
+				"Failed to synchronize CPU cache state\n");
+			return -ENXIO;
+		}
+	}
+
+	cpu_cache_invalidate_memregion(IORES_DESC_CXL);
+	clear_bit(CXL_REGION_F_INCOHERENT, &cxlr->flags);
+	return 0;
+}
+
 static int cxl_region_probe(struct device *dev)
 {
 	struct cxl_region *cxlr = to_cxl_region(dev);
@@ -1915,12 +1941,16 @@ static int cxl_region_probe(struct device *dev)
 	if (p->state < CXL_CONFIG_COMMIT) {
 		dev_dbg(&cxlr->dev, "config state: %d\n", p->state);
 		rc = -ENXIO;
+		goto out;
 	}
 
+	rc = cxl_region_invalidate_memregion(cxlr);
+
 	/*
 	 * From this point on any path that changes the region's state away from
 	 * CXL_CONFIG_COMMIT is also responsible for releasing the driver.
 	 */
+out:
 	up_read(&cxl_region_rwsem);
 
 	if (rc)
@@ -1953,4 +1983,5 @@ void cxl_region_exit(void)
 }
 
 MODULE_IMPORT_NS(CXL);
+MODULE_IMPORT_NS(DEVMEM);
 MODULE_ALIAS_CXL(CXL_DEVICE_REGION);
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index b433e541a054..e5e1abceeca7 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -380,12 +380,19 @@ struct cxl_region_params {
 	int nr_targets;
 };
 
+/*
+ * Flag whether this region needs to have its HPA span synchronized with
+ * CPU cache state at region activation time.
+ */
+#define CXL_REGION_F_INCOHERENT 0
+
 /**
  * struct cxl_region - CXL region
  * @dev: This region's device
  * @id: This region's id. Id is globally unique across all regions
  * @mode: Endpoint decoder allocation / access mode
  * @type: Endpoint decoder target type
+ * @flags: Region state flags
  * @params: active + config params for the region
  */
 struct cxl_region {
@@ -393,6 +400,7 @@ struct cxl_region {
 	int id;
 	enum cxl_decoder_mode mode;
 	enum cxl_decoder_type type;
+	unsigned long flags;
 	struct cxl_region_params params;
 };
 
diff --git a/drivers/cxl/security.c b/drivers/cxl/security.c
index cbd005ceb091..5484d4eecfd1 100644
--- a/drivers/cxl/security.c
+++ b/drivers/cxl/security.c
@@ -120,17 +120,12 @@ static int cxl_pmem_security_unlock(struct nvdimm *nvdimm,
 	u8 pass[NVDIMM_PASSPHRASE_LEN];
 	int rc;
 
-	if (!cpu_cache_has_invalidate_memregion())
-		return -EINVAL;
-
 	memcpy(pass, key_data->data, NVDIMM_PASSPHRASE_LEN);
 	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_UNLOCK,
 			       pass, NVDIMM_PASSPHRASE_LEN, NULL, 0);
 	if (rc < 0)
 		return rc;
 
-	/* DIMM unlocked, invalidate all CPU caches before we read it */
-	cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY);
 	return 0;
 }
 
@@ -144,21 +139,14 @@ static int cxl_pmem_security_passphrase_erase(struct nvdimm *nvdimm,
 	struct cxl_pass_erase erase;
 	int rc;
 
-	if (!cpu_cache_has_invalidate_memregion())
-		return -EINVAL;
-
 	erase.type = ptype == NVDIMM_MASTER ?
 		CXL_PMEM_SEC_PASS_MASTER : CXL_PMEM_SEC_PASS_USER;
 	memcpy(erase.pass, key->data, NVDIMM_PASSPHRASE_LEN);
-	/* Flush all cache before we erase mem device */
-	cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY);
 	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_PASSPHRASE_SECURE_ERASE,
 			       &erase, sizeof(erase), NULL, 0);
 	if (rc < 0)
 		return rc;
 
-	/* mem device erased, invalidate all CPU caches before data is read */
-	cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY);
 	return 0;
 }
 
@@ -173,5 +161,3 @@ static const struct nvdimm_security_ops __cxl_security_ops = {
 };
 
 const struct nvdimm_security_ops *cxl_security_ops = &__cxl_security_ops;
-
-MODULE_IMPORT_NS(DEVMEM);



* Re: [PATCH 2/5] cxl/region: Fix missing probe failure
  2022-12-01 22:03 ` [PATCH 2/5] cxl/region: Fix missing probe failure Dan Williams
@ 2022-12-01 22:30   ` Dave Jiang
  2022-12-02  1:45   ` Davidlohr Bueso
  2022-12-02 14:23   ` Jonathan Cameron
  2 siblings, 0 replies; 22+ messages in thread
From: Dave Jiang @ 2022-12-01 22:30 UTC (permalink / raw)
  To: Dan Williams, linux-cxl; +Cc: stable, Jonathan.Cameron, nvdimm, dave



On 12/1/2022 3:03 PM, Dan Williams wrote:
> cxl_region_probe() allows for regions not in the 'commit' state to be
> enabled. Fail probe when the region is not committed; otherwise the
> kernel may indicate that an address range is active when none of the
> decoders are active.
> 
> Fixes: 8d48817df6ac ("cxl/region: Add region driver boiler plate")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>   drivers/cxl/core/region.c |    3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index f9ae5ad284ff..1bc2ebefa2a5 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1923,6 +1923,9 @@ static int cxl_region_probe(struct device *dev)
>   	 */
>   	up_read(&cxl_region_rwsem);
>   
> +	if (rc)
> +		return rc;
> +
>   	switch (cxlr->mode) {
>   	case CXL_DECODER_PMEM:
>   		return devm_cxl_add_pmem_region(cxlr);
> 


* Re: [PATCH 3/5] cxl/pmem: Enforce keyctl ABI for PMEM security
  2022-12-01 22:03 ` [PATCH 3/5] cxl/pmem: Enforce keyctl ABI for PMEM security Dan Williams
@ 2022-12-01 22:32   ` Dave Jiang
  2022-12-01 22:44     ` Dan Williams
  2022-12-02  1:49   ` Davidlohr Bueso
  2022-12-02 14:24   ` Jonathan Cameron
  2 siblings, 1 reply; 22+ messages in thread
From: Dave Jiang @ 2022-12-01 22:32 UTC (permalink / raw)
  To: Dan Williams, linux-cxl; +Cc: Jonathan.Cameron, nvdimm, dave



On 12/1/2022 3:03 PM, Dan Williams wrote:
> Preclude the possibility of user tooling sending device secrets in the
> clear into the kernel by marking the security commands as exclusive.
> This mandates the usage of the keyctl ABI for managing the device
> passphrase.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

No need for get security state command?

> ---
>   drivers/cxl/core/mbox.c |   10 ++++++++++
>   1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 8747db329087..35dd889f1d3a 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -704,6 +704,16 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
>   		rc = 0;
>   	}
>   
> +	/*
> +	 * Set up permanently kernel-exclusive commands, i.e. the
> +	 * mechanism is driven through sysfs, keyctl, etc.
> +	 */
> +	set_bit(CXL_MEM_COMMAND_ID_SET_PASSPHRASE, cxlds->exclusive_cmds);
> +	set_bit(CXL_MEM_COMMAND_ID_DISABLE_PASSPHRASE, cxlds->exclusive_cmds);
> +	set_bit(CXL_MEM_COMMAND_ID_UNLOCK, cxlds->exclusive_cmds);
> +	set_bit(CXL_MEM_COMMAND_ID_PASSPHRASE_SECURE_ERASE,
> +		cxlds->exclusive_cmds);
> +
>   out:
>   	kvfree(gsl);
>   	return rc;
> 


* Re: [PATCH 3/5] cxl/pmem: Enforce keyctl ABI for PMEM security
  2022-12-01 22:32   ` Dave Jiang
@ 2022-12-01 22:44     ` Dan Williams
  0 siblings, 0 replies; 22+ messages in thread
From: Dan Williams @ 2022-12-01 22:44 UTC (permalink / raw)
  To: Dave Jiang, Dan Williams, linux-cxl; +Cc: Jonathan.Cameron, nvdimm, dave

Dave Jiang wrote:
> 
> 
> On 12/1/2022 3:03 PM, Dan Williams wrote:
> > Preclude the possibility of user tooling sending device secrets in the
> > clear into the kernel by marking the security commands as exclusive.
> > This mandates the usage of the keyctl ABI for managing the device
> > passphrase.
> > 
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> 
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> 
> No need for get security state command?

That one is ok since it's just a read-only command with no side-effect
and no key material traversing the kernel-user boundary.


* Re: [PATCH 4/5] nvdimm/region: Move cache management to the region driver
  2022-12-01 22:03 ` [PATCH 4/5] nvdimm/region: Move cache management to the region driver Dan Williams
@ 2022-12-01 23:00   ` Dave Jiang
  2022-12-02  3:21   ` Davidlohr Bueso
  1 sibling, 0 replies; 22+ messages in thread
From: Dave Jiang @ 2022-12-01 23:00 UTC (permalink / raw)
  To: Dan Williams, linux-cxl; +Cc: Jonathan.Cameron, nvdimm, dave



On 12/1/2022 3:03 PM, Dan Williams wrote:
> Now that cpu_cache_invalidate_memregion() is generically available, use
> it to centralize CPU cache management in the nvdimm region driver.
> 
> This trades off removing redundant per-dimm CPU cache flushing with an
> opportunistic flush on every region disable event to cover the case of
> sensitive dirty data in the cache being written back to media after a
> secure erase / overwrite event.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

One minor bit below, otherwise
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
>   drivers/acpi/nfit/intel.c    |   25 ---------------------
>   drivers/nvdimm/region.c      |   11 +++++++++
>   drivers/nvdimm/region_devs.c |   49 +++++++++++++++++++++++++++++++++++++++++-
>   drivers/nvdimm/security.c    |    6 +++++
>   include/linux/libnvdimm.h    |    5 ++++
>   5 files changed, 70 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/acpi/nfit/intel.c b/drivers/acpi/nfit/intel.c
> index fa0e57e35162..3902759abcba 100644
> --- a/drivers/acpi/nfit/intel.c
> +++ b/drivers/acpi/nfit/intel.c
> @@ -212,9 +212,6 @@ static int __maybe_unused intel_security_unlock(struct nvdimm *nvdimm,
>   	if (!test_bit(NVDIMM_INTEL_UNLOCK_UNIT, &nfit_mem->dsm_mask))
>   		return -ENOTTY;
>   
> -	if (!cpu_cache_has_invalidate_memregion())
> -		return -EINVAL;
> -
>   	memcpy(nd_cmd.cmd.passphrase, key_data->data,
>   			sizeof(nd_cmd.cmd.passphrase));
>   	rc = nvdimm_ctl(nvdimm, ND_CMD_CALL, &nd_cmd, sizeof(nd_cmd), NULL);
> @@ -229,9 +226,6 @@ static int __maybe_unused intel_security_unlock(struct nvdimm *nvdimm,
>   		return -EIO;
>   	}
>   
> -	/* DIMM unlocked, invalidate all CPU caches before we read it */
> -	cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY);
> -
>   	return 0;
>   }
>   
> @@ -299,11 +293,6 @@ static int __maybe_unused intel_security_erase(struct nvdimm *nvdimm,
>   	if (!test_bit(cmd, &nfit_mem->dsm_mask))
>   		return -ENOTTY;
>   
> -	if (!cpu_cache_has_invalidate_memregion())
> -		return -EINVAL;
> -
> -	/* flush all cache before we erase DIMM */
> -	cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY);
>   	memcpy(nd_cmd.cmd.passphrase, key->data,
>   			sizeof(nd_cmd.cmd.passphrase));
>   	rc = nvdimm_ctl(nvdimm, ND_CMD_CALL, &nd_cmd, sizeof(nd_cmd), NULL);
> @@ -322,8 +311,6 @@ static int __maybe_unused intel_security_erase(struct nvdimm *nvdimm,
>   		return -ENXIO;
>   	}
>   
> -	/* DIMM erased, invalidate all CPU caches before we read it */
> -	cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY);
>   	return 0;
>   }
>   
> @@ -346,9 +333,6 @@ static int __maybe_unused intel_security_query_overwrite(struct nvdimm *nvdimm)
>   	if (!test_bit(NVDIMM_INTEL_QUERY_OVERWRITE, &nfit_mem->dsm_mask))
>   		return -ENOTTY;
>   
> -	if (!cpu_cache_has_invalidate_memregion())
> -		return -EINVAL;
> -
>   	rc = nvdimm_ctl(nvdimm, ND_CMD_CALL, &nd_cmd, sizeof(nd_cmd), NULL);
>   	if (rc < 0)
>   		return rc;
> @@ -362,8 +346,6 @@ static int __maybe_unused intel_security_query_overwrite(struct nvdimm *nvdimm)
>   		return -ENXIO;
>   	}
>   
> -	/* flush all cache before we make the nvdimms available */
> -	cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY);
>   	return 0;
>   }
>   
> @@ -388,11 +370,6 @@ static int __maybe_unused intel_security_overwrite(struct nvdimm *nvdimm,
>   	if (!test_bit(NVDIMM_INTEL_OVERWRITE, &nfit_mem->dsm_mask))
>   		return -ENOTTY;
>   
> -	if (!cpu_cache_has_invalidate_memregion())
> -		return -EINVAL;
> -
> -	/* flush all cache before we erase DIMM */
> -	cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY);
>   	memcpy(nd_cmd.cmd.passphrase, nkey->data,
>   			sizeof(nd_cmd.cmd.passphrase));
>   	rc = nvdimm_ctl(nvdimm, ND_CMD_CALL, &nd_cmd, sizeof(nd_cmd), NULL);
> @@ -770,5 +747,3 @@ static const struct nvdimm_fw_ops __intel_fw_ops = {
>   };
>   
>   const struct nvdimm_fw_ops *intel_fw_ops = &__intel_fw_ops;
> -
> -MODULE_IMPORT_NS(DEVMEM);
> diff --git a/drivers/nvdimm/region.c b/drivers/nvdimm/region.c
> index 390123d293ea..88dc062af5f8 100644
> --- a/drivers/nvdimm/region.c
> +++ b/drivers/nvdimm/region.c
> @@ -2,6 +2,7 @@
>   /*
>    * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
>    */
> +#include <linux/memregion.h>
>   #include <linux/cpumask.h>
>   #include <linux/module.h>
>   #include <linux/device.h>
> @@ -100,6 +101,16 @@ static void nd_region_remove(struct device *dev)
>   	 */
>   	sysfs_put(nd_region->bb_state);
>   	nd_region->bb_state = NULL;
> +
> +	/*
> +	 * Try to flush caches here since a disabled region may be subject to
> +	 * secure erase while disabled, and previous dirty data should not be
> +	 * written back to a new instance of the region. This only matters on
> +	 * bare metal where security commands are available, so silent failure
> +	 * here is ok.
> +	 */
> +	if (cpu_cache_has_invalidate_memregion())
> +		cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY);
>   }
>   
>   static int child_notify(struct device *dev, void *data)
> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index e0875d369762..c73e3b1fd0a6 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -59,13 +59,57 @@ static int nvdimm_map_flush(struct device *dev, struct nvdimm *nvdimm, int dimm,
>   	return 0;
>   }
>   
> +static int nd_region_invalidate_memregion(struct nd_region *nd_region)
> +{
> +	int i, incoherent = 0;
> +
> +	for (i = 0; i < nd_region->ndr_mappings; i++) {
> +		struct nd_mapping *nd_mapping = &nd_region->mapping[i];
> +		struct nvdimm *nvdimm = nd_mapping->nvdimm;
> +
> +		if (test_bit(NDD_INCOHERENT, &nvdimm->flags))
> +			incoherent++;
> +	}
> +
> +	if (!incoherent)
> +		return 0;
> +
> +	if (!cpu_cache_has_invalidate_memregion()) {
> +		if (IS_ENABLED(CONFIG_NVDIMM_SECURITY_TEST)) {
> +			dev_warn(
> +				&nd_region->dev,
> +				"Bypassing cpu_cache_invalidate_memergion() for testing!\n");
> +			goto out;
> +		} else {
> +			dev_err(&nd_region->dev,
> +				"Failed to synchronize CPU cache state\n");
> +			return -ENXIO;
> +		}
> +	}
> +
> +	cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY);
> +out:
> +	for (i = 0; i < nd_region->ndr_mappings; i++) {
> +		struct nd_mapping *nd_mapping = &nd_region->mapping[i];
> +		struct nvdimm *nvdimm = nd_mapping->nvdimm;
> +
> +		clear_bit(NDD_INCOHERENT, &nvdimm->flags);
> +	}
> +
> +	return 0;
> +}
> +
>   int nd_region_activate(struct nd_region *nd_region)
>   {
> -	int i, j, num_flush = 0;
> +	int i, j, rc, num_flush = 0;
>   	struct nd_region_data *ndrd;
>   	struct device *dev = &nd_region->dev;
>   	size_t flush_data_size = sizeof(void *);
>   
> +	rc = nd_region_invalidate_memregion(nd_region);
> +	if (rc)
> +		return rc;
> +
>   	nvdimm_bus_lock(&nd_region->dev);
>   	for (i = 0; i < nd_region->ndr_mappings; i++) {
>   		struct nd_mapping *nd_mapping = &nd_region->mapping[i];
> @@ -85,6 +129,7 @@ int nd_region_activate(struct nd_region *nd_region)
>   	}
>   	nvdimm_bus_unlock(&nd_region->dev);
>   
> +

Extraneous blankline

DJ

>   	ndrd = devm_kzalloc(dev, sizeof(*ndrd) + flush_data_size, GFP_KERNEL);
>   	if (!ndrd)
>   		return -ENOMEM;
> @@ -1222,3 +1267,5 @@ int nd_region_conflict(struct nd_region *nd_region, resource_size_t start,
>   
>   	return device_for_each_child(&nvdimm_bus->dev, &ctx, region_conflict);
>   }
> +
> +MODULE_IMPORT_NS(DEVMEM);
> diff --git a/drivers/nvdimm/security.c b/drivers/nvdimm/security.c
> index 6814339b3dab..a03e3c45f297 100644
> --- a/drivers/nvdimm/security.c
> +++ b/drivers/nvdimm/security.c
> @@ -208,6 +208,8 @@ static int __nvdimm_security_unlock(struct nvdimm *nvdimm)
>   	rc = nvdimm->sec.ops->unlock(nvdimm, data);
>   	dev_dbg(dev, "key: %d unlock: %s\n", key_serial(key),
>   			rc == 0 ? "success" : "fail");
> +	if (rc == 0)
> +		set_bit(NDD_INCOHERENT, &nvdimm->flags);
>   
>   	nvdimm_put_key(key);
>   	nvdimm->sec.flags = nvdimm_security_flags(nvdimm, NVDIMM_USER);
> @@ -374,6 +376,8 @@ static int security_erase(struct nvdimm *nvdimm, unsigned int keyid,
>   		return -ENOKEY;
>   
>   	rc = nvdimm->sec.ops->erase(nvdimm, data, pass_type);
> +	if (rc == 0)
> +		set_bit(NDD_INCOHERENT, &nvdimm->flags);
>   	dev_dbg(dev, "key: %d erase%s: %s\n", key_serial(key),
>   			pass_type == NVDIMM_MASTER ? "(master)" : "(user)",
>   			rc == 0 ? "success" : "fail");
> @@ -408,6 +412,8 @@ static int security_overwrite(struct nvdimm *nvdimm, unsigned int keyid)
>   		return -ENOKEY;
>   
>   	rc = nvdimm->sec.ops->overwrite(nvdimm, data);
> +	if (rc == 0)
> +		set_bit(NDD_INCOHERENT, &nvdimm->flags);
>   	dev_dbg(dev, "key: %d overwrite submission: %s\n", key_serial(key),
>   			rc == 0 ? "success" : "fail");
>   
> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
> index 3bf658a74ccb..af38252ad704 100644
> --- a/include/linux/libnvdimm.h
> +++ b/include/linux/libnvdimm.h
> @@ -35,6 +35,11 @@ enum {
>   	NDD_WORK_PENDING = 4,
>   	/* dimm supports namespace labels */
>   	NDD_LABELING = 6,
> +	/*
> +	 * dimm contents have changed requiring invalidation of CPU caches prior
> +	 * to activation of a region that includes this device
> +	 */
> +	NDD_INCOHERENT = 7,
>   
>   	/* need to set a limit somewhere, but yes, this is likely overkill */
>   	ND_IOCTL_MAX_BUFLEN = SZ_4M,
> 


* Re: [PATCH 5/5] cxl/region: Manage CPU caches relative to DPA invalidation events
  2022-12-01 22:03 ` [PATCH 5/5] cxl/region: Manage CPU caches relative to DPA invalidation events Dan Williams
@ 2022-12-01 23:04   ` Dave Jiang
  2022-12-05 19:20   ` Davidlohr Bueso
  1 sibling, 0 replies; 22+ messages in thread
From: Dave Jiang @ 2022-12-01 23:04 UTC (permalink / raw)
  To: Dan Williams, linux-cxl; +Cc: Jonathan.Cameron, nvdimm, dave



On 12/1/2022 3:03 PM, Dan Williams wrote:
> A "DPA invalidation event" is any scenario where the contents of a DPA
> (Device Physical Address) is modified in a way that is incoherent with
> CPU caches, or if the HPA (Host Physical Address) to DPA association
> changes due to a remapping event.
> 
> PMEM security events like Unlock and Passphrase Secure Erase already
> manage caches through LIBNVDIMM, so that leaves HPA to DPA remap events
> that need cache management by the CXL core. Those only happen when the
> boot time CXL configuration has changed. That event occurs when
> userspace attaches an endpoint decoder to a region configuration, and
> that region is subsequently activated.
> 
> The implication of not invalidating caches between remap events is that
> reads from the region at different points in time may return different
> results due to stale cached data from the previous HPA to DPA mapping.
> Without a guarantee that the region contents after cxl_region_probe()
> are written before being read (a layering-violation assumption that
> cxl_region_probe() can not make) the CXL subsystem needs to ensure that
> reads that precede writes see consistent results.
> 
> A CONFIG_CXL_REGION_INVALIDATION_TEST option is added to support debug
> and unit testing of the CXL implementation in QEMU or other environments
> where cpu_cache_has_invalidate_memregion() returns false. This may prove
> too restrictive for QEMU where the HDM decoders are emulated, but in
> that case the CXL subsystem needs some new mechanism / indication that
> the HDM decoder is emulated and not a passthrough of real hardware.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>   drivers/cxl/Kconfig       |   18 ++++++++++++++++++
>   drivers/cxl/core/region.c |   31 +++++++++++++++++++++++++++++++
>   drivers/cxl/cxl.h         |    8 ++++++++
>   drivers/cxl/security.c    |   14 --------------
>   4 files changed, 57 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 768ced3d6fe8..0ac53c422c31 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -111,4 +111,22 @@ config CXL_REGION
>   	select MEMREGION
>   	select GET_FREE_REGION
>   
> +config CXL_REGION_INVALIDATION_TEST
> +	bool "CXL: Region Cache Management Bypass (TEST)"
> +	depends on CXL_REGION
> +	help
> +	  CXL Region management and security operations potentially invalidate
> +	  the content of CPU caches without notifying those caches to
> +	  invalidate the affected cachelines. The CXL Region driver attempts
> +	  to invalidate caches when those events occur.  If that invalidation
> +	  fails, the region will fail to enable.  Cache invalidation can fail
> +	  because the CPU does not provide a cache invalidation mechanism;
> +	  for example, usage of wbinvd is restricted to bare metal x86.
> +	  However, for testing purposes toggling this option can disable that
> +	  data integrity safety and proceed with enabling regions when there
> +	  might be conflicting contents in the CPU cache.
> +
> +	  If unsure, or if this kernel is meant for production environments,
> +	  say N.
> +
>   endif
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 1bc2ebefa2a5..3a6c3f84015f 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1403,6 +1403,8 @@ static int attach_target(struct cxl_region *cxlr, const char *decoder, int pos)
>   		goto out;
>   	down_read(&cxl_dpa_rwsem);
>   	rc = cxl_region_attach(cxlr, to_cxl_endpoint_decoder(dev), pos);
> +	if (rc == 0)
> +		set_bit(CXL_REGION_F_INCOHERENT, &cxlr->flags);
>   	up_read(&cxl_dpa_rwsem);
>   	up_write(&cxl_region_rwsem);
>   out:
> @@ -1900,6 +1902,30 @@ static int devm_cxl_add_pmem_region(struct cxl_region *cxlr)
>   	return rc;
>   }
>   
> +static int cxl_region_invalidate_memregion(struct cxl_region *cxlr)
> +{
> +	if (!test_bit(CXL_REGION_F_INCOHERENT, &cxlr->flags))
> +		return 0;
> +
> +	if (!cpu_cache_has_invalidate_memregion()) {
> +		if (IS_ENABLED(CONFIG_CXL_REGION_INVALIDATION_TEST)) {
> +			dev_warn(
> +				&cxlr->dev,
> +				"Bypassing cpu_cache_invalidate_memergion() for testing!\n");
> +			clear_bit(CXL_REGION_F_INCOHERENT, &cxlr->flags);
> +			return 0;
> +		} else {
> +			dev_err(&cxlr->dev,
> +				"Failed to synchronize CPU cache state\n");
> +			return -ENXIO;
> +		}
> +	}
> +
> +	cpu_cache_invalidate_memregion(IORES_DESC_CXL);
> +	clear_bit(CXL_REGION_F_INCOHERENT, &cxlr->flags);
> +	return 0;
> +}
> +
>   static int cxl_region_probe(struct device *dev)
>   {
>   	struct cxl_region *cxlr = to_cxl_region(dev);
> @@ -1915,12 +1941,16 @@ static int cxl_region_probe(struct device *dev)
>   	if (p->state < CXL_CONFIG_COMMIT) {
>   		dev_dbg(&cxlr->dev, "config state: %d\n", p->state);
>   		rc = -ENXIO;
> +		goto out;
>   	}
>   
> +	rc = cxl_region_invalidate_memregion(cxlr);
> +
>   	/*
>   	 * From this point on any path that changes the region's state away from
>   	 * CXL_CONFIG_COMMIT is also responsible for releasing the driver.
>   	 */
> +out:
>   	up_read(&cxl_region_rwsem);
>   
>   	if (rc)
> @@ -1953,4 +1983,5 @@ void cxl_region_exit(void)
>   }
>   
>   MODULE_IMPORT_NS(CXL);
> +MODULE_IMPORT_NS(DEVMEM);
>   MODULE_ALIAS_CXL(CXL_DEVICE_REGION);
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index b433e541a054..e5e1abceeca7 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -380,12 +380,19 @@ struct cxl_region_params {
>   	int nr_targets;
>   };
>   
> +/*
> + * Flag whether this region needs to have its HPA span synchronized with
> + * CPU cache state at region activation time.
> + */
> +#define CXL_REGION_F_INCOHERENT 0
> +
>   /**
>    * struct cxl_region - CXL region
>    * @dev: This region's device
>    * @id: This region's id. Id is globally unique across all regions
>    * @mode: Endpoint decoder allocation / access mode
>    * @type: Endpoint decoder target type
> + * @flags: Region state flags
>    * @params: active + config params for the region
>    */
>   struct cxl_region {
> @@ -393,6 +400,7 @@ struct cxl_region {
>   	int id;
>   	enum cxl_decoder_mode mode;
>   	enum cxl_decoder_type type;
> +	unsigned long flags;
>   	struct cxl_region_params params;
>   };
>   
> diff --git a/drivers/cxl/security.c b/drivers/cxl/security.c
> index cbd005ceb091..5484d4eecfd1 100644
> --- a/drivers/cxl/security.c
> +++ b/drivers/cxl/security.c
> @@ -120,17 +120,12 @@ static int cxl_pmem_security_unlock(struct nvdimm *nvdimm,
>   	u8 pass[NVDIMM_PASSPHRASE_LEN];
>   	int rc;
>   
> -	if (!cpu_cache_has_invalidate_memregion())
> -		return -EINVAL;
> -
>   	memcpy(pass, key_data->data, NVDIMM_PASSPHRASE_LEN);
>   	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_UNLOCK,
>   			       pass, NVDIMM_PASSPHRASE_LEN, NULL, 0);
>   	if (rc < 0)
>   		return rc;
>   
> -	/* DIMM unlocked, invalidate all CPU caches before we read it */
> -	cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY);
>   	return 0;
>   }
>   
> @@ -144,21 +139,14 @@ static int cxl_pmem_security_passphrase_erase(struct nvdimm *nvdimm,
>   	struct cxl_pass_erase erase;
>   	int rc;
>   
> -	if (!cpu_cache_has_invalidate_memregion())
> -		return -EINVAL;
> -
>   	erase.type = ptype == NVDIMM_MASTER ?
>   		CXL_PMEM_SEC_PASS_MASTER : CXL_PMEM_SEC_PASS_USER;
>   	memcpy(erase.pass, key->data, NVDIMM_PASSPHRASE_LEN);
> -	/* Flush all cache before we erase mem device */
> -	cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY);
>   	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_PASSPHRASE_SECURE_ERASE,
>   			       &erase, sizeof(erase), NULL, 0);
>   	if (rc < 0)
>   		return rc;
>   
> -	/* mem device erased, invalidate all CPU caches before data is read */
> -	cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY);
>   	return 0;
>   }
>   
> @@ -173,5 +161,3 @@ static const struct nvdimm_security_ops __cxl_security_ops = {
>   };
>   
>   const struct nvdimm_security_ops *cxl_security_ops = &__cxl_security_ops;
> -
> -MODULE_IMPORT_NS(DEVMEM);
> 


* Re: [PATCH 2/5] cxl/region: Fix missing probe failure
  2022-12-01 22:03 ` [PATCH 2/5] cxl/region: Fix missing probe failure Dan Williams
  2022-12-01 22:30   ` Dave Jiang
@ 2022-12-02  1:45   ` Davidlohr Bueso
  2022-12-02 14:23   ` Jonathan Cameron
  2 siblings, 0 replies; 22+ messages in thread
From: Davidlohr Bueso @ 2022-12-02  1:45 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, stable, Jonathan.Cameron, dave.jiang, nvdimm

On Thu, 01 Dec 2022, Dan Williams wrote:

>cxl_region_probe() allows for regions not in the 'commit' state to be
>enabled. Fail probe when the region is not committed; otherwise the
>kernel may indicate that an address range is active when none of the
>decoders are active.
>
>Fixes: 8d48817df6ac ("cxl/region: Add region driver boiler plate")
>Cc: <stable@vger.kernel.org>
>Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>

>---
> drivers/cxl/core/region.c |    3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>index f9ae5ad284ff..1bc2ebefa2a5 100644
>--- a/drivers/cxl/core/region.c
>+++ b/drivers/cxl/core/region.c
>@@ -1923,6 +1923,9 @@ static int cxl_region_probe(struct device *dev)
>	 */
>	up_read(&cxl_region_rwsem);
>
>+	if (rc)
>+		return rc;
>+
>	switch (cxlr->mode) {
>	case CXL_DECODER_PMEM:
>		return devm_cxl_add_pmem_region(cxlr);
>


* Re: [PATCH 3/5] cxl/pmem: Enforce keyctl ABI for PMEM security
  2022-12-01 22:03 ` [PATCH 3/5] cxl/pmem: Enforce keyctl ABI for PMEM security Dan Williams
  2022-12-01 22:32   ` Dave Jiang
@ 2022-12-02  1:49   ` Davidlohr Bueso
  2022-12-02 14:24   ` Jonathan Cameron
  2 siblings, 0 replies; 22+ messages in thread
From: Davidlohr Bueso @ 2022-12-02  1:49 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, Jonathan.Cameron, dave.jiang, nvdimm

On Thu, 01 Dec 2022, Dan Williams wrote:

>Preclude the possibility of user tooling sending device secrets in the
>clear into the kernel by marking the security commands as exclusive.
>This mandates the usage of the keyctl ABI for managing the device
>passphrase.
>
>Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>

>---
> drivers/cxl/core/mbox.c |   10 ++++++++++
> 1 file changed, 10 insertions(+)
>
>diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
>index 8747db329087..35dd889f1d3a 100644
>--- a/drivers/cxl/core/mbox.c
>+++ b/drivers/cxl/core/mbox.c
>@@ -704,6 +704,16 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
> 		rc = 0;
> 	}
>
>+	/*
>+	 * Set up permanently kernel-exclusive commands, i.e. the
>+	 * mechanism is driven through sysfs, keyctl, etc.
>+	 */
>+	set_bit(CXL_MEM_COMMAND_ID_SET_PASSPHRASE, cxlds->exclusive_cmds);
>+	set_bit(CXL_MEM_COMMAND_ID_DISABLE_PASSPHRASE, cxlds->exclusive_cmds);
>+	set_bit(CXL_MEM_COMMAND_ID_UNLOCK, cxlds->exclusive_cmds);
>+	set_bit(CXL_MEM_COMMAND_ID_PASSPHRASE_SECURE_ERASE,
>+		cxlds->exclusive_cmds);
>+
> out:
> 	kvfree(gsl);
> 	return rc;
>


* Re: [PATCH 4/5] nvdimm/region: Move cache management to the region driver
  2022-12-01 22:03 ` [PATCH 4/5] nvdimm/region: Move cache management to the region driver Dan Williams
  2022-12-01 23:00   ` Dave Jiang
@ 2022-12-02  3:21   ` Davidlohr Bueso
  2022-12-03  8:01     ` Dan Williams
  1 sibling, 1 reply; 22+ messages in thread
From: Davidlohr Bueso @ 2022-12-02  3:21 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, Jonathan.Cameron, dave.jiang, nvdimm

On Thu, 01 Dec 2022, Dan Williams wrote:

>Now that cpu_cache_invalidate_memregion() is generically available, use
>it to centralize CPU cache management in the nvdimm region driver.
>
>This trades off removing redundant per-dimm CPU cache flushing with an
>opportunistic flush on every region disable event to cover the case of
>sensitive dirty data in the cache being written back to media after a
>secure erase / overwrite event.

Very nifty.

>Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>

with a few notes below.

>+static int nd_region_invalidate_memregion(struct nd_region *nd_region)
>+{
>+	int i, incoherent = 0;
>+
>+	for (i = 0; i < nd_region->ndr_mappings; i++) {
>+		struct nd_mapping *nd_mapping = &nd_region->mapping[i];
>+		struct nvdimm *nvdimm = nd_mapping->nvdimm;
>+
>+		if (test_bit(NDD_INCOHERENT, &nvdimm->flags))
>+			incoherent++;

No need to compute the rest, just break out here?
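
E.g. a sketch of what I mean:

    if (test_bit(NDD_INCOHERENT, &nvdimm->flags)) {
            incoherent++;
            break;
    }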

>+	}
>+
>+	if (!incoherent)
>+		return 0;
>+
>+	if (!cpu_cache_has_invalidate_memregion()) {
>+		if (IS_ENABLED(CONFIG_NVDIMM_SECURITY_TEST)) {
>+			dev_warn(
>+				&nd_region->dev,
>+				"Bypassing cpu_cache_invalidate_memregion() for testing!\n");
>+			goto out;
>+		} else {
>+			dev_err(&nd_region->dev,
>+				"Failed to synchronize CPU cache state\n");
>+			return -ENXIO;
>+		}
>+	}
>+
>+	cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY);
>+out:
>+	for (i = 0; i < nd_region->ndr_mappings; i++) {
>+		struct nd_mapping *nd_mapping = &nd_region->mapping[i];
>+		struct nvdimm *nvdimm = nd_mapping->nvdimm;
>+
>+		clear_bit(NDD_INCOHERENT, &nvdimm->flags);
>+	}
>+
>+	return 0;
>+}
>+
> int nd_region_activate(struct nd_region *nd_region)
> {
>-	int i, j, num_flush = 0;
>+	int i, j, rc, num_flush = 0;
>	struct nd_region_data *ndrd;
>	struct device *dev = &nd_region->dev;
>	size_t flush_data_size = sizeof(void *);
>
>+	rc = nd_region_invalidate_memregion(nd_region);
>+	if (rc)
>+		return rc;
>+
>	nvdimm_bus_lock(&nd_region->dev);
>	for (i = 0; i < nd_region->ndr_mappings; i++) {
>		struct nd_mapping *nd_mapping = &nd_region->mapping[i];
>@@ -85,6 +129,7 @@ int nd_region_activate(struct nd_region *nd_region)
>	}
>	nvdimm_bus_unlock(&nd_region->dev);
>
>+
>	ndrd = devm_kzalloc(dev, sizeof(*ndrd) + flush_data_size, GFP_KERNEL);
>	if (!ndrd)
>		return -ENOMEM;
>@@ -1222,3 +1267,5 @@ int nd_region_conflict(struct nd_region *nd_region, resource_size_t start,
>
>	return device_for_each_child(&nvdimm_bus->dev, &ctx, region_conflict);
> }
>+
>+MODULE_IMPORT_NS(DEVMEM);
>diff --git a/drivers/nvdimm/security.c b/drivers/nvdimm/security.c
>index 6814339b3dab..a03e3c45f297 100644
>--- a/drivers/nvdimm/security.c
>+++ b/drivers/nvdimm/security.c
>@@ -208,6 +208,8 @@ static int __nvdimm_security_unlock(struct nvdimm *nvdimm)
>	rc = nvdimm->sec.ops->unlock(nvdimm, data);
>	dev_dbg(dev, "key: %d unlock: %s\n", key_serial(key),
>			rc == 0 ? "success" : "fail");
>+	if (rc == 0)
>+		set_bit(NDD_INCOHERENT, &nvdimm->flags);
>
>	nvdimm_put_key(key);
>	nvdimm->sec.flags = nvdimm_security_flags(nvdimm, NVDIMM_USER);
>@@ -374,6 +376,8 @@ static int security_erase(struct nvdimm *nvdimm, unsigned int keyid,
>		return -ENOKEY;
>
>	rc = nvdimm->sec.ops->erase(nvdimm, data, pass_type);
>+	if (rc == 0)
>+		set_bit(NDD_INCOHERENT, &nvdimm->flags);
>	dev_dbg(dev, "key: %d erase%s: %s\n", key_serial(key),
>			pass_type == NVDIMM_MASTER ? "(master)" : "(user)",
>			rc == 0 ? "success" : "fail");
>@@ -408,6 +412,8 @@ static int security_overwrite(struct nvdimm *nvdimm, unsigned int keyid)
>		return -ENOKEY;
>
>	rc = nvdimm->sec.ops->overwrite(nvdimm, data);
>+	if (rc == 0)
>+		set_bit(NDD_INCOHERENT, &nvdimm->flags);

Are you relying on hw preventing an incoming region_activate() while the overwrite
operation is in progress to ensure that the flags are stable throughout the whole
op? Currently query-overwrite also provides the flushing guarantees for when the
command is actually complete (at least from a user pov).

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/5] cxl/region: Fix missing probe failure
  2022-12-01 22:03 ` [PATCH 2/5] cxl/region: Fix missing probe failure Dan Williams
  2022-12-01 22:30   ` Dave Jiang
  2022-12-02  1:45   ` Davidlohr Bueso
@ 2022-12-02 14:23   ` Jonathan Cameron
  2022-12-03  8:03     ` Dan Williams
  2 siblings, 1 reply; 22+ messages in thread
From: Jonathan Cameron @ 2022-12-02 14:23 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, stable, dave.jiang, nvdimm, dave

On Thu, 01 Dec 2022 14:03:24 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> cxl_region_probe() allows for regions not in the 'commit' state to be
> enabled. Fail probe when the region is not committed, otherwise the
> kernel may indicate that an address range is active when none of the
> decoders are active.
> 
> Fixes: 8d48817df6ac ("cxl/region: Add region driver boiler plate")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Huh. I wonder why this wasn't triggering a build warning given
rc is assigned but unused.

Ah well, this is clearly the original intent and makes sense.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  drivers/cxl/core/region.c |    3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index f9ae5ad284ff..1bc2ebefa2a5 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1923,6 +1923,9 @@ static int cxl_region_probe(struct device *dev)
>  	 */
>  	up_read(&cxl_region_rwsem);
>  
> +	if (rc)
> +		return rc;
> +
>  	switch (cxlr->mode) {
>  	case CXL_DECODER_PMEM:
>  		return devm_cxl_add_pmem_region(cxlr);
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/5] cxl/pmem: Enforce keyctl ABI for PMEM security
  2022-12-01 22:03 ` [PATCH 3/5] cxl/pmem: Enforce keyctl ABI for PMEM security Dan Williams
  2022-12-01 22:32   ` Dave Jiang
  2022-12-02  1:49   ` Davidlohr Bueso
@ 2022-12-02 14:24   ` Jonathan Cameron
  2 siblings, 0 replies; 22+ messages in thread
From: Jonathan Cameron @ 2022-12-02 14:24 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, dave.jiang, nvdimm, dave

On Thu, 01 Dec 2022 14:03:30 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> Preclude the possibility of user tooling sending device secrets in the
> clear into the kernel by marking the security commands as exclusive.
> This mandates the usage of the keyctl ABI for managing the device
> passphrase.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Seems reasonable.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
>  drivers/cxl/core/mbox.c |   10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 8747db329087..35dd889f1d3a 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -704,6 +704,16 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
>  		rc = 0;
>  	}
>  
> +	/*
> +	 * Setup permanently kernel exclusive commands, i.e. the
> +	 * mechanism is driven through sysfs, keyctl, etc...
> +	 */
> +	set_bit(CXL_MEM_COMMAND_ID_SET_PASSPHRASE, cxlds->exclusive_cmds);
> +	set_bit(CXL_MEM_COMMAND_ID_DISABLE_PASSPHRASE, cxlds->exclusive_cmds);
> +	set_bit(CXL_MEM_COMMAND_ID_UNLOCK, cxlds->exclusive_cmds);
> +	set_bit(CXL_MEM_COMMAND_ID_PASSPHRASE_SECURE_ERASE,
> +		cxlds->exclusive_cmds);
> +
>  out:
>  	kvfree(gsl);
>  	return rc;
> 
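
For anyone wondering where these bits bite: my understanding is that the
enforcement point is the user command validation path in the core, roughly
(paraphrased from memory, not a verbatim quote):

	/* in cxl_validate_cmd_from_user(), before anything reaches the mailbox */
	if (!test_bit(info->id, cxlds->enabled_cmds))
		return -ENOTTY;	/* command not enabled on this hardware */
	if (test_bit(info->id, cxlds->exclusive_cmds))
		return -EBUSY;	/* claimed for exclusive kernel use */

so a raw SET_PASSPHRASE from userspace gets -EBUSY instead of carrying a
clear-text secret to the device.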


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/5] nvdimm/region: Move cache management to the region driver
  2022-12-02  3:21   ` Davidlohr Bueso
@ 2022-12-03  8:01     ` Dan Williams
  0 siblings, 0 replies; 22+ messages in thread
From: Dan Williams @ 2022-12-03  8:01 UTC (permalink / raw)
  To: Davidlohr Bueso, Dan Williams
  Cc: linux-cxl, Jonathan.Cameron, dave.jiang, nvdimm

Davidlohr Bueso wrote:
> On Thu, 01 Dec 2022, Dan Williams wrote:
> 
> >Now that cpu_cache_invalidate_memregion() is generically available, use
> >it to centralize CPU cache management in the nvdimm region driver.
> >
> >This trades off removing redundant per-dimm CPU cache flushing with an
> >opportunistic flush on every region disable event to cover the case of
> >sensitive dirty data in the cache being written back to media after a
> >secure erase / overwrite event.
> 
> Very nifty.
> 
> >Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> 
> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
> 
> with a few notes below.
> 
> >+static int nd_region_invalidate_memregion(struct nd_region *nd_region)
> >+{
> >+	int i, incoherent = 0;
> >+
> >+	for (i = 0; i < nd_region->ndr_mappings; i++) {
> >+		struct nd_mapping *nd_mapping = &nd_region->mapping[i];
> >+		struct nvdimm *nvdimm = nd_mapping->nvdimm;
> >+
> >+		if (test_bit(NDD_INCOHERENT, &nvdimm->flags))
> >+			incoherent++;
> 
> No need to compute the rest, just break out here?

Sure, makes sense.

> 
> >+	}
> >+
> >+	if (!incoherent)
> >+		return 0;
> >+
> >+	if (!cpu_cache_has_invalidate_memregion()) {
> >+		if (IS_ENABLED(CONFIG_NVDIMM_SECURITY_TEST)) {
> >+			dev_warn(
> >+				&nd_region->dev,
> >+				"Bypassing cpu_cache_invalidate_memregion() for testing!\n");
> >+			goto out;
> >+		} else {
> >+			dev_err(&nd_region->dev,
> >+				"Failed to synchronize CPU cache state\n");
> >+			return -ENXIO;
> >+		}
> >+	}
> >+
> >+	cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY);
> >+out:
> >+	for (i = 0; i < nd_region->ndr_mappings; i++) {
> >+		struct nd_mapping *nd_mapping = &nd_region->mapping[i];
> >+		struct nvdimm *nvdimm = nd_mapping->nvdimm;
> >+
> >+		clear_bit(NDD_INCOHERENT, &nvdimm->flags);
> >+	}
> >+
> >+	return 0;
> >+}
> >+
> > int nd_region_activate(struct nd_region *nd_region)
> > {
> >-	int i, j, num_flush = 0;
> >+	int i, j, rc, num_flush = 0;
> >	struct nd_region_data *ndrd;
> >	struct device *dev = &nd_region->dev;
> >	size_t flush_data_size = sizeof(void *);
> >
> >+	rc = nd_region_invalidate_memregion(nd_region);
> >+	if (rc)
> >+		return rc;
> >+
> >	nvdimm_bus_lock(&nd_region->dev);
> >	for (i = 0; i < nd_region->ndr_mappings; i++) {
> >		struct nd_mapping *nd_mapping = &nd_region->mapping[i];
> >@@ -85,6 +129,7 @@ int nd_region_activate(struct nd_region *nd_region)
> >	}
> >	nvdimm_bus_unlock(&nd_region->dev);
> >
> >+
> >	ndrd = devm_kzalloc(dev, sizeof(*ndrd) + flush_data_size, GFP_KERNEL);
> >	if (!ndrd)
> >		return -ENOMEM;
> >@@ -1222,3 +1267,5 @@ int nd_region_conflict(struct nd_region *nd_region, resource_size_t start,
> >
> >	return device_for_each_child(&nvdimm_bus->dev, &ctx, region_conflict);
> > }
> >+
> >+MODULE_IMPORT_NS(DEVMEM);
> >diff --git a/drivers/nvdimm/security.c b/drivers/nvdimm/security.c
> >index 6814339b3dab..a03e3c45f297 100644
> >--- a/drivers/nvdimm/security.c
> >+++ b/drivers/nvdimm/security.c
> >@@ -208,6 +208,8 @@ static int __nvdimm_security_unlock(struct nvdimm *nvdimm)
> >	rc = nvdimm->sec.ops->unlock(nvdimm, data);
> >	dev_dbg(dev, "key: %d unlock: %s\n", key_serial(key),
> >			rc == 0 ? "success" : "fail");
> >+	if (rc == 0)
> >+		set_bit(NDD_INCOHERENT, &nvdimm->flags);
> >
> >	nvdimm_put_key(key);
> >	nvdimm->sec.flags = nvdimm_security_flags(nvdimm, NVDIMM_USER);
> >@@ -374,6 +376,8 @@ static int security_erase(struct nvdimm *nvdimm, unsigned int keyid,
> >		return -ENOKEY;
> >
> >	rc = nvdimm->sec.ops->erase(nvdimm, data, pass_type);
> >+	if (rc == 0)
> >+		set_bit(NDD_INCOHERENT, &nvdimm->flags);
> >	dev_dbg(dev, "key: %d erase%s: %s\n", key_serial(key),
> >			pass_type == NVDIMM_MASTER ? "(master)" : "(user)",
> >			rc == 0 ? "success" : "fail");
> >@@ -408,6 +412,8 @@ static int security_overwrite(struct nvdimm *nvdimm, unsigned int keyid)
> >		return -ENOKEY;
> >
> >	rc = nvdimm->sec.ops->overwrite(nvdimm, data);
> >+	if (rc == 0)
> >+		set_bit(NDD_INCOHERENT, &nvdimm->flags);
> 
> Are you relying on hw preventing an incoming region_activate() while the overwrite
> operation is in progress to ensure that the flags are stable throughout the whole
> op? Currently query-overwrite also provides the flushing guarantees for when the
> command is actually complete (at least from a user pov).

The driver handles this in nd_region_activate(), but hey, look at that,
nd_region_invalidate_memregion() is too early. I.e. it's before this check:

                if (test_bit(NDD_SECURITY_OVERWRITE, &nvdimm->flags)) {
                        nvdimm_bus_unlock(&nd_region->dev);
                        return -EBUSY;
                }

...which means that the cache could be invalidated too early while the
overwrite is still happening. Will move the cache invalidate below that
check. Thanks for poking at it!

Folded the following:

diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index c73e3b1fd0a6..83dbf398ea84 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -67,8 +67,10 @@ static int nd_region_invalidate_memregion(struct nd_region *nd_region)
                struct nd_mapping *nd_mapping = &nd_region->mapping[i];
                struct nvdimm *nvdimm = nd_mapping->nvdimm;
 
-               if (test_bit(NDD_INCOHERENT, &nvdimm->flags))
+               if (test_bit(NDD_INCOHERENT, &nvdimm->flags)) {
                        incoherent++;
+                       break;
+               }
        }
 
        if (!incoherent)
@@ -106,10 +108,6 @@ int nd_region_activate(struct nd_region *nd_region)
        struct device *dev = &nd_region->dev;
        size_t flush_data_size = sizeof(void *);
 
-       rc = nd_region_invalidate_memregion(nd_region);
-       if (rc)
-               return rc;
-
        nvdimm_bus_lock(&nd_region->dev);
        for (i = 0; i < nd_region->ndr_mappings; i++) {
                struct nd_mapping *nd_mapping = &nd_region->mapping[i];
@@ -129,6 +127,9 @@ int nd_region_activate(struct nd_region *nd_region)
        }
        nvdimm_bus_unlock(&nd_region->dev);
 
+       rc = nd_region_invalidate_memregion(nd_region);
+       if (rc)
+               return rc;
 
        ndrd = devm_kzalloc(dev, sizeof(*ndrd) + flush_data_size, GFP_KERNEL);
        if (!ndrd)

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/5] cxl/region: Fix missing probe failure
  2022-12-02 14:23   ` Jonathan Cameron
@ 2022-12-03  8:03     ` Dan Williams
  0 siblings, 0 replies; 22+ messages in thread
From: Dan Williams @ 2022-12-03  8:03 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, stable, dave.jiang, nvdimm, dave

Jonathan Cameron wrote:
> On Thu, 01 Dec 2022 14:03:24 -0800
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > cxl_region_probe() allows for regions not in the 'commit' state to be
> > enabled. Fail probe when the region is not committed, otherwise the
> > kernel may indicate that an address range is active when none of the
> > decoders are active.
> > 
> > Fixes: 8d48817df6ac ("cxl/region: Add region driver boiler plate")
> > Cc: <stable@vger.kernel.org>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Huh. I wonder why this wasn't triggering a build warning given
> rc is assigned but unused.

Yes, I thought that was curious too.

> 
> Ah well, this is clearly the original intent and makes sense.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> > ---
> >  drivers/cxl/core/region.c |    3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > index f9ae5ad284ff..1bc2ebefa2a5 100644
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -1923,6 +1923,9 @@ static int cxl_region_probe(struct device *dev)
> >  	 */
> >  	up_read(&cxl_region_rwsem);
> >  
> > +	if (rc)
> > +		return rc;
> > +
> >  	switch (cxlr->mode) {
> >  	case CXL_DECODER_PMEM:
> >  		return devm_cxl_add_pmem_region(cxlr);
> > 
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/5] cxl/region: Manage CPU caches relative to DPA invalidation events
  2022-12-01 22:03 ` [PATCH 5/5] cxl/region: Manage CPU caches relative to DPA invalidation events Dan Williams
  2022-12-01 23:04   ` Dave Jiang
@ 2022-12-05 19:20   ` Davidlohr Bueso
  2022-12-05 20:10     ` Dan Williams
  1 sibling, 1 reply; 22+ messages in thread
From: Davidlohr Bueso @ 2022-12-05 19:20 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, Jonathan.Cameron, dave.jiang, nvdimm

On Thu, 01 Dec 2022, Dan Williams wrote:

>A "DPA invalidation event" is any scenario where the contents of a DPA
>(Device Physical Address) are modified in a way that is incoherent with
>CPU caches, or if the HPA (Host Physical Address) to DPA association
>changes due to a remapping event.
>
>PMEM security events like Unlock and Passphrase Secure Erase already
>manage caches through LIBNVDIMM,

Just to be clear, is this why you get rid of the explicit flushing
for the respective commands in security.c?

>so that leaves HPA to DPA remap events
>that need cache management by the CXL core. Those only happen when the
>boot time CXL configuration has changed. That event occurs when
>userspace attaches an endpoint decoder to a region configuration, and
>that region is subsequently activated.
>
>The implications of not invalidating caches between remap events are that
>reads from the region at different points in time may return different
>results due to stale cached data from the previous HPA to DPA mapping.
>Without a guarantee that the region contents after cxl_region_probe()
>are written before being read (a layering-violation assumption that
>cxl_region_probe() can not make) the CXL subsystem needs to ensure that
>reads that precede writes see consistent results.

Hmm, where does this leave us remapping under arm64, which doesn't have
ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION?

Back when we were discussing this it was all related to the security stuff,
which under arm could just be easily discarded as an unavailable feature.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/5] cxl/region: Manage CPU caches relative to DPA invalidation events
  2022-12-05 19:20   ` Davidlohr Bueso
@ 2022-12-05 20:10     ` Dan Williams
  2022-12-06  9:47       ` Jonathan Cameron
  0 siblings, 1 reply; 22+ messages in thread
From: Dan Williams @ 2022-12-05 20:10 UTC (permalink / raw)
  To: Davidlohr Bueso, Dan Williams
  Cc: linux-cxl, Jonathan.Cameron, dave.jiang, nvdimm, linux-arm-kernel

[ add linux-arm-kernel@lists.infradead.org ]

Background for ARM folks: CXL can dynamically reconfigure the target
devices that back a given physical memory region. When that happens the
CPU cache can be holding cache data from a previous configuration. The
mitigation for that scenario on x86 is wbinvd; ARM does not have an
equivalent. As a result, dynamic region creation is disabled on ARM. In
the near term, most CXL is configured pre-boot, but going forward this
restriction is untenable.
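
For reference, the x86 arch hook amounts to a guarded wbinvd broadcast;
roughly (paraphrased from arch/x86/mm/pat/set_memory.c):

	bool cpu_cache_has_invalidate_memregion(void)
	{
		/* wbinvd is bare-metal only, a hypervisor may not honor it */
		return !cpu_feature_enabled(X86_FEATURE_HYPERVISOR);
	}

	int cpu_cache_invalidate_memregion(int res_desc)
	{
		if (WARN_ON_ONCE(!cpu_cache_has_invalidate_memregion()))
			return -ENXIO;
		wbinvd_on_all_cpus();	/* writeback + invalidate on every CPU */
		return 0;
	}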

Davidlohr Bueso wrote:
> On Thu, 01 Dec 2022, Dan Williams wrote:
> 
> >A "DPA invalidation event" is any scenario where the contents of a DPA
> >(Device Physical Address) are modified in a way that is incoherent with
> >CPU caches, or if the HPA (Host Physical Address) to DPA association
> >changes due to a remapping event.
> >
> >PMEM security events like Unlock and Passphrase Secure Erase already
> >manage caches through LIBNVDIMM,
> 
> Just to be clear, is this why you get rid of the explicit flushing
> for the respective commands in security.c?

Correct, because those commands can only be executed through libnvdimm.

> 
> >so that leaves HPA to DPA remap events
> >that need cache management by the CXL core. Those only happen when the
> >boot time CXL configuration has changed. That event occurs when
> >userspace attaches an endpoint decoder to a region configuration, and
> >that region is subsequently activated.
> >
> >The implications of not invalidating caches between remap events are that
> >reads from the region at different points in time may return different
> >results due to stale cached data from the previous HPA to DPA mapping.
> >Without a guarantee that the region contents after cxl_region_probe()
> >are written before being read (a layering-violation assumption that
> >cxl_region_probe() can not make) the CXL subsystem needs to ensure that
> >reads that precede writes see consistent results.
> 
> Hmm, where does this leave us remapping under arm64, which doesn't have
> ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION?
> 
> Back when we were discussing this it was all related to the security stuff,
> which under arm could just be easily discarded as an unavailable feature.

I can throw out a few strawman options, but really need help from ARM
folks to decide where to go next.

1/ Map and loop cache flushing line by line (rough sketch below). It works,
   but for terabytes of CXL the cost is tens of seconds of latency to
   reconfigure a region. That said, region configuration, outside of test
   scenarios, is typically a "once per bare metal provisioning" event.

2/ Set a configuration dependency that mandates that all CXL memory be
   routed through the page allocator where it is guaranteed that the memory
   will be written (zeroed) before use. This restricts some planned use
   cases for the "Dynamic Capacity Device" capability.

3/ Work with the CXL consortium to extend the back-invalidate concept
   for general purpose usage to make devices capable of invalidating caches
   for a new memory region they joined, and mandate it for ARM. This one
   has a long lead time and a gap for every device in flight currently.
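
For 1/, a rough, untested sketch of the shape this could take (assumes the
whole region can be mapped at once and that an arm64 by-VA helper like
dcache_clean_inval_poc() could be exported for this purpose):

	static int region_flush_by_va(resource_size_t start, resource_size_t len)
	{
		void *addr = memremap(start, len, MEMREMAP_WB);

		if (!addr)
			return -ENOMEM;
		/* clean + invalidate to the point of coherency, line by line */
		dcache_clean_inval_poc((unsigned long)addr,
				       (unsigned long)addr + len);
		memunmap(addr);
		return 0;
	}

The walk is linear in region size, hence the tens-of-seconds cost for
terabyte-scale regions.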

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/5] cxl/region: Manage CPU caches relative to DPA invalidation events
  2022-12-05 20:10     ` Dan Williams
@ 2022-12-06  9:47       ` Jonathan Cameron
  2022-12-06 15:17         ` James Morse
  0 siblings, 1 reply; 22+ messages in thread
From: Jonathan Cameron @ 2022-12-06  9:47 UTC (permalink / raw)
  To: Dan Williams
  Cc: Davidlohr Bueso, linux-cxl, dave.jiang, nvdimm, linux-arm-kernel,
	James Morse, Will Deacon, catalin.marinas, Anshuman Khandual,
	anthony.jebson, ardb

On Mon, 5 Dec 2022 12:10:22 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> [ add linux-arm-kernel@lists.infradead.org ]
> 
> Background for ARM folks: CXL can dynamically reconfigure the target
> devices that back a given physical memory region. When that happens the
> CPU cache can be holding cache data from a previous configuration. The
> mitigation for that scenario on x86 is wbinvd; ARM does not have an
> equivalent. As a result, dynamic region creation is disabled on ARM. In
> the near term, most CXL is configured pre-boot, but going forward this
> restriction is untenable.
> 
> Davidlohr Bueso wrote:
> > On Thu, 01 Dec 2022, Dan Williams wrote:
> >   
> > >A "DPA invalidation event" is any scenario where the contents of a DPA
> > >(Device Physical Address) are modified in a way that is incoherent with
> > >CPU caches, or if the HPA (Host Physical Address) to DPA association
> > >changes due to a remapping event.
> > >
> > >PMEM security events like Unlock and Passphrase Secure Erase already
> > >manage caches through LIBNVDIMM,  
> > 
> > Just to be clear, is this why you get rid of the explicit flushing
> > for the respective commands in security.c?  
> 
> Correct, because those commands can only be executed through libnvdimm.
> 
> >   
> > >so that leaves HPA to DPA remap events
> > >that need cache management by the CXL core. Those only happen when the
> > >boot time CXL configuration has changed. That event occurs when
> > >userspace attaches an endpoint decoder to a region configuration, and
> > >that region is subsequently activated.
> > >
> > >The implications of not invalidating caches between remap events are that
> > >reads from the region at different points in time may return different
> > >results due to stale cached data from the previous HPA to DPA mapping.
> > >Without a guarantee that the region contents after cxl_region_probe()
> > >are written before being read (a layering-violation assumption that
> > >cxl_region_probe() can not make) the CXL subsystem needs to ensure that
> > >reads that precede writes see consistent results.  
> > 
> > Hmm, where does this leave us remapping under arm64, which doesn't have
> > ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION?
> > 
> > Back when we were discussing this it was all related to the security stuff,
> > which under arm could just be easily discarded as an unavailable feature.
> 
> I can throw out a few strawman options, but really need help from ARM
> folks to decide where to go next.

+Cc a bunch of relevant people. There are discussions underway, but I'm not
sure anyone will want to give more details here yet.

> 
> 1/ Map and loop cache flushing line by line. It works, but for terabytes
>    of CXL the cost is tens of seconds of latency to reconfigure a region.
>    That said, region configuration, outside of test scenarios, is typically
>    a "once per bare metal provisioning" event.
> 
> 2/ Set a configuration dependency that mandates that all CXL memory be
>    routed through the page allocator where it is guaranteed that the memory
>    will be written (zeroed) before use. This restricts some planned use
>    cases for the "Dynamic Capacity Device" capability.

This is the only case that's really a problem (to my mind). I hope we will
have a more general solution before there is much hardware out there,
particularly where sharing is involved.

> 
> 3/ Work with the CXL consortium to extend the back-invalidate concept
>    for general purpose usage to make devices capable of invalidating caches
>    for a new memory region they joined, and mandate it for ARM. This one
>    has a long lead time and a gap for every device in flight currently.

There are significant disadvantages in doing this that I suspect will mean
this never happens for some classes of device, or is turned off for performance
reasons. For anyone curious, go look at the protocol requirements of back
invalidate in the CXL 3.0 spec.

Jonathan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/5] cxl/region: Manage CPU caches relative to DPA invalidation events
  2022-12-06  9:47       ` Jonathan Cameron
@ 2022-12-06 15:17         ` James Morse
  0 siblings, 0 replies; 22+ messages in thread
From: James Morse @ 2022-12-06 15:17 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: Davidlohr Bueso, linux-cxl, dave.jiang, nvdimm, linux-arm-kernel,
	Will Deacon, catalin.marinas, Anshuman Khandual, anthony.jebson,
	ardb

Hi guys,

On 06/12/2022 09:47, Jonathan Cameron wrote:
> On Mon, 5 Dec 2022 12:10:22 -0800
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
>> [ add linux-arm-kernel@lists.infradead.org ]
>>
>> Background for ARM folks: CXL can dynamically reconfigure the target
>> devices that back a given physical memory region. When that happens the
>> CPU cache can be holding cache data from a previous configuration. The
>> mitigation for that scenario on x86 is wbinvd; ARM does not have an
>> equivalent. As a result, dynamic region creation is disabled on ARM. In
>> the near term, most CXL is configured pre-boot, but going forward this
>> restriction is untenable.
>>
>> Davidlohr Bueso wrote:
>>> On Thu, 01 Dec 2022, Dan Williams wrote:
>>>   
>>>> A "DPA invalidation event" is any scenario where the contents of a DPA
>>>> (Device Physical Address) is modified in a way that is incoherent with
>>>> CPU caches, or if the HPA (Host Physical Address) to DPA association
>>>> changes due to a remapping event.
>>>>
>>>> PMEM security events like Unlock and Passphrase Secure Erase already
>>>> manage caches through LIBNVDIMM,  
>>>
>>> Just to be clear, is this why you get rid of the explicit flushing
>>> for the respective commands in security.c?  
>>
>> Correct, because those commands can only be executed through libnvdimm.
>>
>>>   
>>>> so that leaves HPA to DPA remap events
>>>> that need cache management by the CXL core. Those only happen when the
>>>> boot time CXL configuration has changed. That event occurs when
>>>> userspace attaches an endpoint decoder to a region configuration, and
>>>> that region is subsequently activated.
>>>>
>>>> The implications of not invalidating caches between remap events are that
>>>> reads from the region at different points in time may return different
>>>> results due to stale cached data from the previous HPA to DPA mapping.
>>>> Without a guarantee that the region contents after cxl_region_probe()
>>>> are written before being read (a layering-violation assumption that
>>>> cxl_region_probe() can not make) the CXL subsystem needs to ensure that
>>>> reads that precede writes see consistent results.  
>>>
>>> Hmm, where does this leave us remapping under arm64, which doesn't have
>>> ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION?

For those reading along at home, ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION is wbinvd.
https://lore.kernel.org/linux-cxl/20220919110605.3696-1-dave@stgolabs.net/

We don't have an instruction for arm64 that 'invalidates all caches'.


>>> Back when we were discussing this it was all related to the security stuff,
>>> which under arm could just be easily discarded as an unavailable feature.
>>
>> I can throw out a few strawman options, but really need help from ARM
>> folks to decide where to go next.

> +Cc a bunch of relevant people. There are discussions underway, but I'm not
> sure anyone will want to give more details here yet.

The best we can do today is to use the by-VA invalidate operations in the kernel.
This isn't guaranteed to invalidate 'invisible' system caches, which means
it's not enough for a one-size-fits-all kernel interface.
For the NVDIMM secure-erase users of this thing, if there were a system-cache
between the CPUs and the NVDIMM, there would be nothing the kernel could do to
invalidate it.

If it's CXL-specific this would be okay for testing in QEMU, but performance
would scale with the size of the region, which would hurt in real-world cases.

The plan is to add a firmware call so firmware can do things that don't scale with the
size of the mapping, and do something platform-specific to the 'invisible' system cache,
if there is one.


Ideally we wait for the PSCI spec update that describes the firmware call, and make
support dependent on that. It looks like the timeline will be March-ish, but there should
be an alpha of the spec available much sooner.


>> 1/ Map and loop cache flushing line by line. It works, but for terabytes
>>    of CXL the cost is tens of seconds of latency to reconfigure a region.
>>    That said, region configuration, outside of test scenarios, is typically
>>    a "once per bare metal provisioning" event.

It works for CXL because you'd never have a system-cache in front of the CXL window.
Those things don't necessarily receive cache-maintenance because they are supposed to be
invisible.

D7.4.11 of DDI0487I.a "System level caches" has this horror:
| System caches which lie beyond the point of coherency and so are invisible to the
| software. The management of such caches is outside the scope of the architecture.

(The PoP stuff reaches beyond the PoC, but there isn't a DC CIVAP instruction)

Detecting which regions we can't do this for is problematic.


>> 2/ Set a configuration dependency that mandates that all CXL memory be
>>    routed through the page allocator where it is guaranteed that the memory
>>    will be written (zeroed) before use. This restricts some planned use
>>    cases for the "Dynamic Capacity Device" capability.

> This is the only case that's really a problem (to my mind). I hope we will
> have a more general solution before there is much hardware out there,
> particularly where sharing is involved.


Thanks,

James


>> 3/ Work with the CXL consortium to extend the back-invalidate concept
>>    for general purpose usage to make devices capable of invalidating caches
>>    for a new memory region they joined, and mandate it for ARM. This one
>>    has a long lead time and a gap for every device in flight currently.
> 
> There are significant disadvantages in doing this that I suspect will mean
> this never happens for some classes of device, or is turned off for performance
> reasons. For anyone curious, go look at the protocol requirements of back
> invalidate in the CXL 3.0 spec.
> 
> Jonathan


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2022-12-06 15:20 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-01 22:03 [PATCH 0/5] cxl, nvdimm: Move CPU cache management to region drivers Dan Williams
2022-12-01 22:03 ` [PATCH 1/5] cxl: add dimm_id support for __nvdimm_create() Dan Williams
2022-12-01 22:03 ` [PATCH 2/5] cxl/region: Fix missing probe failure Dan Williams
2022-12-01 22:30   ` Dave Jiang
2022-12-02  1:45   ` Davidlohr Bueso
2022-12-02 14:23   ` Jonathan Cameron
2022-12-03  8:03     ` Dan Williams
2022-12-01 22:03 ` [PATCH 3/5] cxl/pmem: Enforce keyctl ABI for PMEM security Dan Williams
2022-12-01 22:32   ` Dave Jiang
2022-12-01 22:44     ` Dan Williams
2022-12-02  1:49   ` Davidlohr Bueso
2022-12-02 14:24   ` Jonathan Cameron
2022-12-01 22:03 ` [PATCH 4/5] nvdimm/region: Move cache management to the region driver Dan Williams
2022-12-01 23:00   ` Dave Jiang
2022-12-02  3:21   ` Davidlohr Bueso
2022-12-03  8:01     ` Dan Williams
2022-12-01 22:03 ` [PATCH 5/5] cxl/region: Manage CPU caches relative to DPA invalidation events Dan Williams
2022-12-01 23:04   ` Dave Jiang
2022-12-05 19:20   ` Davidlohr Bueso
2022-12-05 20:10     ` Dan Williams
2022-12-06  9:47       ` Jonathan Cameron
2022-12-06 15:17         ` James Morse

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).