All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/8] cxl: CXL Inject & Clear Poison
@ 2023-03-01 21:36 alison.schofield
  2023-03-01 21:36 ` [PATCH v3 1/8] cxl/memdev: Add support for the Inject Poison mailbox command alison.schofield
                   ` (7 more replies)
  0 siblings, 8 replies; 15+ messages in thread
From: alison.schofield @ 2023-03-01 21:36 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

Changes in v3: 

- Built on cxl/next plus the Get Poison list patchset here:
  https://lore.kernel.org/linux-cxl/cover.1676685180.git.alison.schofield@intel.com/

- Two new patches are appended to address Dan's deprecate ask:
  	cxl/memdev: Make inject and clear poison cmds kernel exclusive
  	cxl/mbox: Block inject and clear poison opcodes in raw mode
  I didn't deprecate the ioctls as Dan suggested. Instead, I followed
  on Ira's recent clean up wrt enabled and exclusive commands, deciding
  that the honest response to a cxl_query() would be 'Enabled' (if hardware
  supports) and kernel 'Exclusive' (always). Please take a look.

- Move IS_ENABLED(CONFIG_CXL_POISON_INJECT) test into each attr check (Dan)
- Fail updates to poison_inject_max when poison list is not empty (DaveJ)
- Make poison_inject_max a sysfs attr, rather than module param (Dan)
- Kconfig: add note regarding debug intent (Dan)
- Kept Jonathan's Reviewed-by Tag on Patches 1-6

Link to v2: https://lore.kernel.org/linux-cxl/cover.1674101475.git.alison.schofield@intel.com/


Introducing Inject and Clear Poison support for CXL Devices.

These are optional commands, meaning not all CXL devices must support
them. The sysfs attributes, inject_poison and clear_poison, are only
visible for devices reporting support of the capability and when the
kernel Kconfig option CONFIG_CXL_POISON_INJECT is on. (Default: off)

Example:
# echo 0x40000000 > /sys/bus/cxl/devices/mem1/inject_poison
# echo 1 > /sys/bus/cxl/devices/mem1/trigger_poison_list

cxl_poison: memdev=mem1 pcidev=cxl_mem.1 region= region_uuid=00000000-0000-0000-0000-000000000000 hpa=0xffffffffffffffff dpa=0x40000000 length=0x40 source=Injected flags= overflow_time=0


Alison Schofield (8):
  cxl/memdev: Add support for the Inject Poison mailbox command
  cxl/memdev: Add support for the Clear Poison mailbox command
  tools/testing/cxl: Mock the Inject Poison mailbox command
  tools/testing/cxl: Mock the Clear Poison mailbox command
  tools/testing/cxl: Use injected poison for get poison list
  tools/testing/cxl: Add a sysfs attr to test poison injection limits
  cxl/memdev: Make inject and clear poison cmds kernel exclusive
  cxl/mbox: Block inject and clear poison opcodes in raw mode

 Documentation/ABI/testing/sysfs-bus-cxl |  40 +++++
 drivers/cxl/Kconfig                     |  11 ++
 drivers/cxl/core/mbox.c                 |   6 +
 drivers/cxl/core/memdev.c               | 162 ++++++++++++++++++
 drivers/cxl/cxlmem.h                    |  11 ++
 include/uapi/linux/cxl_mem.h            |  21 ++-
 tools/testing/cxl/test/mem.c            | 216 +++++++++++++++++++++---
 7 files changed, 443 insertions(+), 24 deletions(-)

-- 
2.37.3


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v3 1/8] cxl/memdev: Add support for the Inject Poison mailbox command
  2023-03-01 21:36 [PATCH v3 0/8] cxl: CXL Inject & Clear Poison alison.schofield
@ 2023-03-01 21:36 ` alison.schofield
  2023-03-30 18:13   ` Jonathan Cameron
  2023-03-01 21:36 ` [PATCH v3 2/8] cxl/memdev: Add support for the Clear " alison.schofield
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 15+ messages in thread
From: alison.schofield @ 2023-03-01 21:36 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl, Jonathan Cameron

From: Alison Schofield <alison.schofield@intel.com>

CXL devices optionally support the INJECT POISON mailbox command. Add
a sysfs attribute and memdev driver support for injecting poison. The
attribute is only visible for devices supporting the capability when
the kernel is built with CONFIG_CXL_POISON_INJECT.

When a Device Physical Address (DPA) is written to the inject_poison
sysfs attribute, send an inject poison command to the device for the
specified address.

Per the CXL Specification (3.0 8.2.9.8.4.2), after receiving a valid
inject poison request, the device will return poison when the address
is accessed through the CXL.mem bus. Injecting poison adds the address
to the device's Poison List and the error source is set to Injected.
In addition, the device adds a poison creation event to its internal
Informational Event log, updates the Event Status register, and if
configured, interrupts the host.

Also, per the CXL Specification, it is not an error to inject poison
into an address that already has poison present and no error is
returned from the device.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 Documentation/ABI/testing/sysfs-bus-cxl | 22 ++++++
 drivers/cxl/Kconfig                     | 11 +++
 drivers/cxl/core/memdev.c               | 97 +++++++++++++++++++++++++
 drivers/cxl/cxlmem.h                    |  5 ++
 4 files changed, 135 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index d9421c965a3b..e19d1020f30a 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -429,3 +429,25 @@ Description:
 		attribute is only visible for devices supporting the
 		capability. The retrieved errors are logged as kernel
 		trace events with the label 'cxl_poison'.
+
+
+What:		/sys/bus/cxl/devices/memX/inject_poison
+Date:		January, 2023
+KernelVersion:	v6.3
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(WO) When a Device Physical Address (DPA) is written to this
+		attribute, the memdev driver sends an inject poison command to
+		the device for the specified address. The DPA must be 64-byte
+		aligned and the length of the injected poison is 64-bytes. If
+		successful, the device returns poison when the address is
+		accessed through the CXL.mem bus. Injecting poison adds the
+		address to the device's Poison List and the error source is set
+		to Injected. In addition, the device adds a poison creation
+		event to its internal Informational Event log, updates the
+		Event Status register, and if configured, interrupts the host.
+		It is not an error to inject poison into an address that
+		already has poison present and no error is returned. The
+		inject_poison attribute is only visible for devices supporting
+		the capability. Kconfig option CXL_POISON_INJECT must be on
+		to enable this option. The default is off.
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index ff4e78117b31..a7ca0bbb8475 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -139,4 +139,15 @@ config CXL_REGION_INVALIDATION_TEST
 	  If unsure, or if this kernel is meant for production environments,
 	  say N.
 
+config CXL_POISON_INJECT
+	bool "CXL: Support CXL Memory Device Poison Inject"
+	depends on CXL_MEM
+	help
+	  Selecting this option creates the sysfs attributes inject_poison
+	  and clear_poison for CXL memory devices supporting the capability.
+	  This option is intended for debug scenarios only and is disabled
+	  by default. See Documentation/ABI/testing/sysfs-bus-cxl.
+
+	  If unsure, say N.
+
 endif
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index c11b7bc253b4..82e09b81e9c6 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -165,6 +165,92 @@ static ssize_t trigger_poison_list_store(struct device *dev,
 }
 static DEVICE_ATTR_WO(trigger_poison_list);
 
+static int cxl_dpa_mapped(struct device *dev, void *data)
+{
+	struct cxl_endpoint_decoder *cxled;
+	u64 *dpa = data;
+
+	if (!is_endpoint_decoder(dev))
+		return 0;
+
+	cxled = to_cxl_endpoint_decoder(dev);
+	if (!cxled->dpa_res || !resource_size(cxled->dpa_res))
+		return 0;
+
+	if (*dpa <= cxled->dpa_res->end && *dpa >= cxled->dpa_res->start) {
+		dev_dbg(dev, "dpa:0x%llx mapped in region:%s\n",
+			*dpa, dev_name(&cxled->cxld.region->dev));
+		return 1;
+	}
+
+	return 0;
+}
+
+static int cxl_validate_poison_dpa(struct cxl_memdev *cxlmd, u64 dpa)
+{
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+	struct cxl_port *port;
+	int rc;
+
+	if (!resource_size(&cxlds->dpa_res)) {
+		dev_dbg(cxlds->dev, "device has no dpa resource\n");
+		return -EINVAL;
+	}
+	if (dpa < cxlds->dpa_res.start || dpa > cxlds->dpa_res.end) {
+		dev_dbg(cxlds->dev, "dpa:0x%llx not in resource:%pR\n",
+			dpa, &cxlds->dpa_res);
+		return -EINVAL;
+	}
+	if (!IS_ALIGNED(dpa, 64)) {
+		dev_dbg(cxlds->dev, "dpa:0x%llx is not 64-byte aligned\n", dpa);
+		return -EINVAL;
+	}
+	port = dev_get_drvdata(&cxlmd->dev);
+	if (port && is_cxl_endpoint(port) && port->commit_end != -1) {
+		rc = device_for_each_child(&port->dev, &dpa, cxl_dpa_mapped);
+		if (rc)
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+static ssize_t inject_poison_store(struct device *dev,
+				   struct device_attribute *attr,
+				   const char *buf, size_t len)
+{
+	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+	struct cxl_mbox_inject_poison inject;
+	struct cxl_mbox_cmd mbox_cmd;
+	u64 dpa;
+	int rc;
+
+	rc = kstrtou64(buf, 0, &dpa);
+	if (rc)
+		return rc;
+
+	down_read(&cxl_dpa_rwsem);
+	rc = cxl_validate_poison_dpa(cxlmd, dpa);
+	if (rc) {
+		up_read(&cxl_dpa_rwsem);
+		return rc;
+	}
+
+	inject = (struct cxl_mbox_inject_poison) {
+		.address = cpu_to_le64(dpa)
+	};
+	mbox_cmd = (struct cxl_mbox_cmd) {
+		.opcode = CXL_MBOX_OP_INJECT_POISON,
+		.size_in = sizeof(inject),
+		.payload_in = &inject,
+	};
+	rc = cxl_internal_send_cmd(cxlmd->cxlds, &mbox_cmd);
+
+	up_read(&cxl_dpa_rwsem);
+	return rc ? rc : len;
+}
+static DEVICE_ATTR_WO(inject_poison);
+
 static struct attribute *cxl_memdev_attributes[] = {
 	&dev_attr_serial.attr,
 	&dev_attr_firmware_version.attr,
@@ -172,6 +258,7 @@ static struct attribute *cxl_memdev_attributes[] = {
 	&dev_attr_label_storage_size.attr,
 	&dev_attr_numa_node.attr,
 	&dev_attr_trigger_poison_list.attr,
+	&dev_attr_inject_poison.attr,
 	NULL,
 };
 
@@ -198,6 +285,16 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
 			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
 			return 0;
 	}
+	if (a == &dev_attr_inject_poison.attr) {
+		struct device *dev = kobj_to_dev(kobj);
+
+		if (!IS_ENABLED(CONFIG_CXL_POISON_INJECT))
+			return 0;
+
+		if (!test_bit(CXL_MEM_COMMAND_ID_INJECT_POISON,
+			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
+			return 0;
+	}
 	return a->mode;
 }
 
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index a6eb1b42eb88..01d27f362cd6 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -602,6 +602,11 @@ struct cxl_mbox_poison_payload_out {
 #define CXL_POISON_SOURCE_INJECTED	3
 #define CXL_POISON_SOURCE_VENDOR	7
 
+/* Inject & Clear Poison  CXL 3.0 Spec 8.2.9.8.4.2/3 */
+struct cxl_mbox_inject_poison {
+	__le64 address;
+};
+
 /**
  * struct cxl_mem_command - Driver representation of a memory device command
  * @info: Command information as it exists for the UAPI
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 2/8] cxl/memdev: Add support for the Clear Poison mailbox command
  2023-03-01 21:36 [PATCH v3 0/8] cxl: CXL Inject & Clear Poison alison.schofield
  2023-03-01 21:36 ` [PATCH v3 1/8] cxl/memdev: Add support for the Inject Poison mailbox command alison.schofield
@ 2023-03-01 21:36 ` alison.schofield
  2023-03-30 18:16   ` Jonathan Cameron
  2023-03-01 21:36 ` [PATCH v3 3/8] tools/testing/cxl: Mock the Inject " alison.schofield
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 15+ messages in thread
From: alison.schofield @ 2023-03-01 21:36 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl, Jonathan Cameron

From: Alison Schofield <alison.schofield@intel.com>

CXL devices optionally support the CLEAR POISON mailbox command. Add
a sysfs attribute and memdev driver support for clearing poison. The
attribute is only visible for devices supporting the capability when
the kernel is built with CONFIG_CXL_POISON_INJECT.

When a Device Physical Address (DPA) is written to the clear_poison
sysfs attribute, send a clear poison command to the device for the
specified address.

Per the CXL Specification (3.0 8.2.9.8.4.3), after receiving a valid
clear poison request, the device removes the address from the device's
Poison List and writes 0 (zero) for 64 bytes starting at address. If
the device cannot clear poison from the address, it returns a permanent
media error and -ENXIO is returned to the user.

Additionally, and per the spec also, it is not an error to clear poison
of an address that is not poisoned. In this case, the device does not
overwrite the address and the device does not return an error.

*Implementation note: Although the CXL specification defines the clear
command to accept 64 bytes of 'write-data' to be used when clearing
the poisoned address, this implementation always uses 0 (zeros) for
the write-data.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 Documentation/ABI/testing/sysfs-bus-cxl | 18 ++++++++
 drivers/cxl/core/memdev.c               | 59 +++++++++++++++++++++++++
 drivers/cxl/cxlmem.h                    |  6 +++
 3 files changed, 83 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index e19d1020f30a..e2c77eda443e 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -451,3 +451,21 @@ Description:
 		inject_poison attribute is only visible for devices supporting
 		the capability. Kconfig option CXL_POISON_INJECT must be on
 		to enable this option. The default is off.
+
+
+What:		/sys/bus/cxl/devices/memX/clear_poison
+Date:		January, 2023
+KernelVersion:	v6.3
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(WO) When a Device Physical Address (DPA) is written to this
+		attribute, the memdev driver sends a clear poison command to
+		the device for the specified address. Clearing poison removes
+		the address from the device's Poison List and writes 0 (zero)
+		for 64 bytes starting at address. It is not an error to clear
+		poison from an address that does not have poison set, and if
+		poison was not set, the address is not overwritten. If the
+		device cannot clear poison from the address, -ENXIO is returned.
+		The clear_poison attribute is only visible for devices
+		supporting the capability. Kconfig option CXL_POISON_INJECT
+		must be on to enable this option. The default is off.
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 82e09b81e9c6..ed3e4517dc3a 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -251,6 +251,53 @@ static ssize_t inject_poison_store(struct device *dev,
 }
 static DEVICE_ATTR_WO(inject_poison);
 
+static ssize_t clear_poison_store(struct device *dev,
+				  struct device_attribute *attr,
+				  const char *buf, size_t len)
+{
+	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+	struct cxl_mbox_clear_poison clear;
+	struct cxl_mbox_cmd mbox_cmd;
+	u64 dpa;
+	int rc;
+
+	rc = kstrtou64(buf, 0, &dpa);
+	if (rc)
+		return rc;
+
+	down_read(&cxl_dpa_rwsem);
+	rc = cxl_validate_poison_dpa(cxlmd, dpa);
+	if (rc) {
+		up_read(&cxl_dpa_rwsem);
+		return rc;
+	}
+
+	/*
+	 * In CXL 3.0 Spec 8.2.9.8.4.3, the Clear Poison mailbox command
+	 * is defined to accept 64 bytes of 'write-data', along with the
+	 * address to clear. The device writes the data into the address
+	 * atomically, while clearing poison if the location is marked as
+	 * being poisoned.
+	 *
+	 * Always use '0' for the write-data.
+	 */
+	clear = (struct cxl_mbox_clear_poison) {
+		.address = cpu_to_le64(dpa)
+	};
+
+	mbox_cmd = (struct cxl_mbox_cmd) {
+		.opcode = CXL_MBOX_OP_CLEAR_POISON,
+		.size_in = sizeof(clear),
+		.payload_in = &clear,
+	};
+
+	rc = cxl_internal_send_cmd(cxlmd->cxlds, &mbox_cmd);
+
+	up_read(&cxl_dpa_rwsem);
+	return rc ? rc : len;
+}
+static DEVICE_ATTR_WO(clear_poison);
+
 static struct attribute *cxl_memdev_attributes[] = {
 	&dev_attr_serial.attr,
 	&dev_attr_firmware_version.attr,
@@ -259,6 +306,7 @@ static struct attribute *cxl_memdev_attributes[] = {
 	&dev_attr_numa_node.attr,
 	&dev_attr_trigger_poison_list.attr,
 	&dev_attr_inject_poison.attr,
+	&dev_attr_clear_poison.attr,
 	NULL,
 };
 
@@ -295,6 +343,17 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
 			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
 			return 0;
 	}
+	if (a == &dev_attr_clear_poison.attr) {
+		struct device *dev = kobj_to_dev(kobj);
+
+		if (!IS_ENABLED(CONFIG_CXL_POISON_INJECT))
+			return 0;
+
+		if (!test_bit(CXL_MEM_COMMAND_ID_CLEAR_POISON,
+			      to_cxl_memdev(dev)->cxlds->enabled_cmds)) {
+			return 0;
+		}
+	}
 	return a->mode;
 }
 
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 01d27f362cd6..8a15274789a6 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -607,6 +607,12 @@ struct cxl_mbox_inject_poison {
 	__le64 address;
 };
 
+/* Clear Poison  CXL 3.0 Spec 8.2.9.8.4.3 */
+struct cxl_mbox_clear_poison {
+	__le64 address;
+	u8 write_data[CXL_POISON_LEN_MULT];
+} __packed;
+
 /**
  * struct cxl_mem_command - Driver representation of a memory device command
  * @info: Command information as it exists for the UAPI
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 3/8] tools/testing/cxl: Mock the Inject Poison mailbox command
  2023-03-01 21:36 [PATCH v3 0/8] cxl: CXL Inject & Clear Poison alison.schofield
  2023-03-01 21:36 ` [PATCH v3 1/8] cxl/memdev: Add support for the Inject Poison mailbox command alison.schofield
  2023-03-01 21:36 ` [PATCH v3 2/8] cxl/memdev: Add support for the Clear " alison.schofield
@ 2023-03-01 21:36 ` alison.schofield
  2023-03-01 21:36 ` [PATCH v3 4/8] tools/testing/cxl: Mock the Clear " alison.schofield
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: alison.schofield @ 2023-03-01 21:36 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl, Jonathan Cameron

From: Alison Schofield <alison.schofield@intel.com>

Mock the injection of poison by storing the device:address entries in
mock_poison_list[]. Enforce a limit of 8 poison injections per memdev
device and 128 total entries for the cxl_test mock driver.

Introducing the mock_poison[] list here, makes it available for use in
the mock of Clear Poison, and the mock of Get Poison List.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 tools/testing/cxl/test/mem.c | 77 ++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 2fa9c18d4c2c..4fca886e2a7c 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -16,6 +16,9 @@
 #define DEV_SIZE SZ_2G
 #define EFFECT(x) (1U << x)
 
+#define MOCK_INJECT_DEV_MAX 8
+#define MOCK_INJECT_TEST_MAX 128
+
 static struct cxl_cel_entry mock_cel[] = {
 	{
 		.opcode = cpu_to_le16(CXL_MBOX_OP_GET_SUPPORTED_LOGS),
@@ -45,6 +48,10 @@ static struct cxl_cel_entry mock_cel[] = {
 		.opcode = cpu_to_le16(CXL_MBOX_OP_GET_POISON),
 		.effect = cpu_to_le16(0),
 	},
+	{
+		.opcode = cpu_to_le16(CXL_MBOX_OP_INJECT_POISON),
+		.effect = cpu_to_le16(0),
+	},
 };
 
 /* See CXL 2.0 Table 181 Get Health Info Output Payload */
@@ -474,6 +481,7 @@ static int mock_id(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
 			cpu_to_le64(SZ_256M / CXL_CAPACITY_MULTIPLIER),
 		.total_capacity =
 			cpu_to_le64(DEV_SIZE / CXL_CAPACITY_MULTIPLIER),
+		.inject_poison_limit = cpu_to_le16(MOCK_INJECT_DEV_MAX),
 	};
 
 	put_unaligned_le24(CXL_POISON_LIST_MAX, id.poison_list_max_mer);
@@ -895,6 +903,11 @@ static int mock_health_info(struct cxl_dev_state *cxlds,
 	return 0;
 }
 
+static struct mock_poison {
+	struct cxl_dev_state *cxlds;
+	u64 dpa;
+} mock_poison_list[MOCK_INJECT_TEST_MAX];
+
 static int mock_get_poison(struct cxl_dev_state *cxlds,
 			   struct cxl_mbox_cmd *cmd)
 {
@@ -923,6 +936,67 @@ static int mock_get_poison(struct cxl_dev_state *cxlds,
 	return 0;
 }
 
+static bool mock_poison_dev_max_injected(struct cxl_dev_state *cxlds)
+{
+	int count = 0;
+
+	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
+		if (mock_poison_list[i].cxlds == cxlds)
+			count++;
+	}
+	return (count >= MOCK_INJECT_DEV_MAX);
+}
+
+static bool mock_poison_add(struct cxl_dev_state *cxlds, u64 dpa)
+{
+	if (mock_poison_dev_max_injected(cxlds)) {
+		dev_dbg(cxlds->dev,
+			"Device poison injection limit has been reached: %d\n",
+			MOCK_INJECT_DEV_MAX);
+		return false;
+	}
+
+	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
+		if (!mock_poison_list[i].cxlds) {
+			mock_poison_list[i].cxlds = cxlds;
+			mock_poison_list[i].dpa = dpa;
+			return true;
+		}
+	}
+	dev_dbg(cxlds->dev,
+		"Mock test poison injection limit has been reached: %d\n",
+		MOCK_INJECT_TEST_MAX);
+
+	return false;
+}
+
+static bool mock_poison_found(struct cxl_dev_state *cxlds, u64 dpa)
+{
+	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
+		if (mock_poison_list[i].cxlds == cxlds &&
+		    mock_poison_list[i].dpa == dpa)
+			return true;
+	}
+	return false;
+}
+
+static int mock_inject_poison(struct cxl_dev_state *cxlds,
+			      struct cxl_mbox_cmd *cmd)
+{
+	struct cxl_mbox_inject_poison *pi = cmd->payload_in;
+	u64 dpa = le64_to_cpu(pi->address);
+
+	if (mock_poison_found(cxlds, dpa)) {
+		/* Not an error to inject poison if already poisoned */
+		dev_dbg(cxlds->dev, "DPA: 0x%llx already poisoned\n", dpa);
+		return 0;
+	}
+	if (!mock_poison_add(cxlds, dpa))
+		return -ENXIO;
+
+	return 0;
+}
+
 static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
 {
 	struct device *dev = cxlds->dev;
@@ -980,6 +1054,9 @@ static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *
 	case CXL_MBOX_OP_GET_POISON:
 		rc = mock_get_poison(cxlds, cmd);
 		break;
+	case CXL_MBOX_OP_INJECT_POISON:
+		rc = mock_inject_poison(cxlds, cmd);
+		break;
 	default:
 		break;
 	}
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 4/8] tools/testing/cxl: Mock the Clear Poison mailbox command
  2023-03-01 21:36 [PATCH v3 0/8] cxl: CXL Inject & Clear Poison alison.schofield
                   ` (2 preceding siblings ...)
  2023-03-01 21:36 ` [PATCH v3 3/8] tools/testing/cxl: Mock the Inject " alison.schofield
@ 2023-03-01 21:36 ` alison.schofield
  2023-03-01 21:36 ` [PATCH v3 5/8] tools/testing/cxl: Use injected poison for get poison list alison.schofield
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: alison.schofield @ 2023-03-01 21:36 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl, Jonathan Cameron

From: Alison Schofield <alison.schofield@intel.com>

Mock the clear of poison by deleting the device:address entry from
the mock_poison_list[]. Behave like a real CXL device and do not fail
if the address is not in the poison list, but offer a dev_dbg()
message.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 tools/testing/cxl/test/mem.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 4fca886e2a7c..720147c7cb2b 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -52,6 +52,10 @@ static struct cxl_cel_entry mock_cel[] = {
 		.opcode = cpu_to_le16(CXL_MBOX_OP_INJECT_POISON),
 		.effect = cpu_to_le16(0),
 	},
+	{
+		.opcode = cpu_to_le16(CXL_MBOX_OP_CLEAR_POISON),
+		.effect = cpu_to_le16(0),
+	},
 };
 
 /* See CXL 2.0 Table 181 Get Health Info Output Payload */
@@ -997,6 +1001,35 @@ static int mock_inject_poison(struct cxl_dev_state *cxlds,
 	return 0;
 }
 
+static bool mock_poison_del(struct cxl_dev_state *cxlds, u64 dpa)
+{
+	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
+		if (mock_poison_list[i].cxlds == cxlds &&
+		    mock_poison_list[i].dpa == dpa) {
+			mock_poison_list[i].cxlds = NULL;
+			return true;
+		}
+	}
+	return false;
+}
+
+static int mock_clear_poison(struct cxl_dev_state *cxlds,
+			     struct cxl_mbox_cmd *cmd)
+{
+	struct cxl_mbox_clear_poison *pi = cmd->payload_in;
+	u64 dpa = le64_to_cpu(pi->address);
+
+	/*
+	 * A real CXL device will write pi->write_data to the address
+	 * being cleared. In this mock, just delete this address from
+	 * the mock poison list.
+	 */
+	if (!mock_poison_del(cxlds, dpa))
+		dev_dbg(cxlds->dev, "DPA: 0x%llx not in poison list\n", dpa);
+
+	return 0;
+}
+
 static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
 {
 	struct device *dev = cxlds->dev;
@@ -1057,6 +1090,9 @@ static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *
 	case CXL_MBOX_OP_INJECT_POISON:
 		rc = mock_inject_poison(cxlds, cmd);
 		break;
+	case CXL_MBOX_OP_CLEAR_POISON:
+		rc = mock_clear_poison(cxlds, cmd);
+		break;
 	default:
 		break;
 	}
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 5/8] tools/testing/cxl: Use injected poison for get poison list
  2023-03-01 21:36 [PATCH v3 0/8] cxl: CXL Inject & Clear Poison alison.schofield
                   ` (3 preceding siblings ...)
  2023-03-01 21:36 ` [PATCH v3 4/8] tools/testing/cxl: Mock the Clear " alison.schofield
@ 2023-03-01 21:36 ` alison.schofield
  2023-03-01 21:36 ` [PATCH v3 6/8] tools/testing/cxl: Add a sysfs attr to test poison injection limits alison.schofield
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: alison.schofield @ 2023-03-01 21:36 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl, Jonathan Cameron

From: Alison Schofield <alison.schofield@intel.com>

Prior to poison inject support, the mock of 'Get Poison List'
returned a poison list containing a single mocked error record.

Following the addition of poison inject and clear support to the
mock driver, use the mock_poison_list[], rather than faking an
error record. Mock_poison_list[] list tracks the actual poison
inject and clear requests issued by userspace.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 tools/testing/cxl/test/mem.c | 59 ++++++++++++++++++++++++------------
 1 file changed, 40 insertions(+), 19 deletions(-)

diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 720147c7cb2b..9658d95236b7 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -912,31 +912,52 @@ static struct mock_poison {
 	u64 dpa;
 } mock_poison_list[MOCK_INJECT_TEST_MAX];
 
+static struct cxl_mbox_poison_payload_out
+*cxl_get_injected_po(struct cxl_dev_state *cxlds, u64 offset, u64 length)
+{
+	struct cxl_mbox_poison_payload_out *po;
+	int nr_records = 0;
+	u64 dpa;
+
+	po = kzalloc(struct_size(po, record, MOCK_INJECT_DEV_MAX), GFP_KERNEL);
+	if (!po)
+		return NULL;
+
+	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
+		if (mock_poison_list[i].cxlds != cxlds)
+			continue;
+		if (mock_poison_list[i].dpa < offset ||
+		    mock_poison_list[i].dpa > offset + length - 1)
+			continue;
+
+		dpa = mock_poison_list[i].dpa + CXL_POISON_SOURCE_INJECTED;
+		po->record[nr_records].address = cpu_to_le64(dpa);
+		po->record[nr_records].length = cpu_to_le32(1);
+		nr_records++;
+		if (nr_records == MOCK_INJECT_DEV_MAX)
+			break;
+	}
+
+	/* Always return count, even when zero */
+	po->count = cpu_to_le16(nr_records);
+
+	return po;
+}
+
 static int mock_get_poison(struct cxl_dev_state *cxlds,
 			   struct cxl_mbox_cmd *cmd)
 {
 	struct cxl_mbox_poison_payload_in *pi = cmd->payload_in;
+	struct cxl_mbox_poison_payload_out *po;
+	u64 offset = le64_to_cpu(pi->offset);
+	u64 length = le64_to_cpu(pi->length);
+	int nr_records;
 
-	/* Mock one poison record at pi.offset for 64 bytes */
-	struct {
-		struct cxl_mbox_poison_payload_out po;
-		struct cxl_poison_record record;
-	} __packed mock_plist = {
-		.po = {
-			.count = cpu_to_le16(1),
-		},
-		.record = {
-			.length = cpu_to_le32(1),
-			.address = cpu_to_le64(le64_to_cpu(pi->offset) +
-					       CXL_POISON_SOURCE_INJECTED),
-		},
-	};
+	po = cxl_get_injected_po(cxlds, offset, length);
+	nr_records = le16_to_cpu(po->count);
+	memcpy(cmd->payload_out, po, struct_size(po, record, nr_records));
+	cmd->size_out = struct_size(po, record, nr_records);
 
-	if (cmd->size_out < sizeof(mock_plist))
-		return -EINVAL;
-
-	memcpy(cmd->payload_out, &mock_plist, sizeof(mock_plist));
-	cmd->size_out = sizeof(mock_plist);
 	return 0;
 }
 
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 6/8] tools/testing/cxl: Add a sysfs attr to test poison injection limits
  2023-03-01 21:36 [PATCH v3 0/8] cxl: CXL Inject & Clear Poison alison.schofield
                   ` (4 preceding siblings ...)
  2023-03-01 21:36 ` [PATCH v3 5/8] tools/testing/cxl: Use injected poison for get poison list alison.schofield
@ 2023-03-01 21:36 ` alison.schofield
  2023-03-01 22:16   ` Alison Schofield
  2023-03-01 21:36 ` [PATCH v3 7/8] cxl/memdev: Make inject and clear poison cmds kernel exclusive alison.schofield
  2023-03-01 21:36 ` [PATCH v3 8/8] cxl/mbox: Block inject and clear poison opcodes in raw mode alison.schofield
  7 siblings, 1 reply; 15+ messages in thread
From: alison.schofield @ 2023-03-01 21:36 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl, Jonathan Cameron

From: Alison Schofield <alison.schofield@intel.com>

CXL devices may report a maximum number of addresses that a device
allows to be poisoned using poison injection. When cxl_test creates
mock CXL memory devices, it defaults to MOCK_INJECT_DEV_MAX==88 for
all mocked memdevs.

Add a sysfs attribute, poison_inject_max to module cxl_mock_mem so
that users can set a custom device injection limit. Fail, and return
-EBUSY, if the mock poison list is not empty.

/sys/bus/platform/drivers/cxl_mock_mem/poison_inject_max

A simple usage model is to set the attribute before running a test in
order to emulate a device's poison handling.

Suggested-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 tools/testing/cxl/test/mem.c | 52 +++++++++++++++++++++++++++++++++---
 1 file changed, 48 insertions(+), 4 deletions(-)

diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 9658d95236b7..5c3b3e5a3b4b 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -19,6 +19,8 @@
 #define MOCK_INJECT_DEV_MAX 8
 #define MOCK_INJECT_TEST_MAX 128
 
+static unsigned int poison_inject_dev_max = MOCK_INJECT_DEV_MAX;
+
 static struct cxl_cel_entry mock_cel[] = {
 	{
 		.opcode = cpu_to_le16(CXL_MBOX_OP_GET_SUPPORTED_LOGS),
@@ -485,7 +487,7 @@ static int mock_id(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
 			cpu_to_le64(SZ_256M / CXL_CAPACITY_MULTIPLIER),
 		.total_capacity =
 			cpu_to_le64(DEV_SIZE / CXL_CAPACITY_MULTIPLIER),
-		.inject_poison_limit = cpu_to_le16(MOCK_INJECT_DEV_MAX),
+		.inject_poison_limit = cpu_to_le16(MOCK_INJECT_TEST_MAX),
 	};
 
 	put_unaligned_le24(CXL_POISON_LIST_MAX, id.poison_list_max_mer);
@@ -919,7 +921,7 @@ static struct cxl_mbox_poison_payload_out
 	int nr_records = 0;
 	u64 dpa;
 
-	po = kzalloc(struct_size(po, record, MOCK_INJECT_DEV_MAX), GFP_KERNEL);
+	po = kzalloc(struct_size(po, record, poison_inject_dev_max), GFP_KERNEL);
 	if (!po)
 		return NULL;
 
@@ -934,7 +936,7 @@ static struct cxl_mbox_poison_payload_out
 		po->record[nr_records].address = cpu_to_le64(dpa);
 		po->record[nr_records].length = cpu_to_le32(1);
 		nr_records++;
-		if (nr_records == MOCK_INJECT_DEV_MAX)
+		if (nr_records == poison_inject_dev_max)
 			break;
 	}
 
@@ -969,7 +971,7 @@ static bool mock_poison_dev_max_injected(struct cxl_dev_state *cxlds)
 		if (mock_poison_list[i].cxlds == cxlds)
 			count++;
 	}
-	return (count >= MOCK_INJECT_DEV_MAX);
+	return (count >= poison_inject_dev_max);
 }
 
 static bool mock_poison_add(struct cxl_dev_state *cxlds, u64 dpa)
@@ -1051,6 +1053,47 @@ static int mock_clear_poison(struct cxl_dev_state *cxlds,
 	return 0;
 }
 
+static bool mock_poison_list_empty(void)
+{
+	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
+		if (mock_poison_list[i].cxlds)
+			return false;
+	}
+	return true;
+}
+
+static ssize_t poison_inject_max_show(struct device_driver *drv, char *buf)
+{
+	return sysfs_emit(buf, "%u\n", poison_inject_dev_max);
+}
+
+static ssize_t poison_inject_max_store(struct device_driver *drv,
+				       const char *buf, size_t len)
+{
+	int val;
+
+	if (kstrtoint(buf, 0, &val) < 0)
+		return -EINVAL;
+
+	if (!mock_poison_list_empty())
+		return -EBUSY;
+
+	if (val <= MOCK_INJECT_TEST_MAX)
+		poison_inject_dev_max = val;
+	else
+		return -EINVAL;
+
+	return len;
+}
+
+static DRIVER_ATTR_RW(poison_inject_max);
+
+static struct attribute *cxl_mock_mem_core_attrs[] = {
+	&driver_attr_poison_inject_max.attr,
+	NULL
+};
+ATTRIBUTE_GROUPS(cxl_mock_mem_core);
+
 static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
 {
 	struct device *dev = cxlds->dev;
@@ -1259,6 +1302,7 @@ static struct platform_driver cxl_mock_mem_driver = {
 	.driver = {
 		.name = KBUILD_MODNAME,
 		.dev_groups = cxl_mock_mem_groups,
+		.groups = cxl_mock_mem_core_groups,
 	},
 };
 
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 7/8] cxl/memdev: Make inject and clear poison cmds kernel exclusive
  2023-03-01 21:36 [PATCH v3 0/8] cxl: CXL Inject & Clear Poison alison.schofield
                   ` (5 preceding siblings ...)
  2023-03-01 21:36 ` [PATCH v3 6/8] tools/testing/cxl: Add a sysfs attr to test poison injection limits alison.schofield
@ 2023-03-01 21:36 ` alison.schofield
  2023-03-30 18:24   ` Jonathan Cameron
  2023-03-01 21:36 ` [PATCH v3 8/8] cxl/mbox: Block inject and clear poison opcodes in raw mode alison.schofield
  7 siblings, 1 reply; 15+ messages in thread
From: alison.schofield @ 2023-03-01 21:36 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

Inject and clear poison commands are intended to be used in debug
mode only, and if improperly used, can lead to data corruption. The
kernel provides a sysfs interface that provides the protection needed
to issue these commands.[1]

The CXL driver defines Enabled commands in its ABI.[2] Enabled means
that the device and the driver both support the command. If a device
supports inject and/or clear, those commands are flagged Enabled.

The ABI also defines another command flag: Exclusive. Exclusive
commands are reserved for kernel use. The exclusive flags can be
temporal, but for inject and clear, the status is permanent.

Document the exclusivity of Inject and Clear in the ABI kernel doc.
(Clean up a typo in kdoc too: 'CXL_MEM_COMMAND_FLAG_ENABLED')

Create an exclusive commands bitmap in the memdev driver, add the
inject and clear poison commands, and set it in the cxl_dev_state.

[1] Documentation/ABI/testing/sysfs-bus-cxl
[2] include/uapi/linux/cxl_mem.h

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 drivers/cxl/core/memdev.c    |  6 ++++++
 include/uapi/linux/cxl_mem.h | 21 ++++++++++++++++-----
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index ed3e4517dc3a..f746a0c222b6 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -10,6 +10,8 @@
 
 static DECLARE_RWSEM(cxl_memdev_rwsem);
 
+static __read_mostly DECLARE_BITMAP(exclusive_cmds, CXL_MEM_COMMAND_ID_MAX);
+
 /*
  * An entire PCI topology full of devices should be enough for any
  * config
@@ -571,6 +573,10 @@ struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds)
 	cxlmd->cxlds = cxlds;
 	cxlds->cxlmd = cxlmd;
 
+	set_bit(CXL_MEM_COMMAND_ID_INJECT_POISON, exclusive_cmds);
+	set_bit(CXL_MEM_COMMAND_ID_CLEAR_POISON, exclusive_cmds);
+	set_exclusive_cxl_commands(cxlds, exclusive_cmds);
+
 	cdev = &cxlmd->cdev;
 	rc = cdev_device_add(cdev, dev);
 	if (rc)
diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
index 86bbacf2a315..6f9ae244f7fd 100644
--- a/include/uapi/linux/cxl_mem.h
+++ b/include/uapi/linux/cxl_mem.h
@@ -74,17 +74,28 @@ static const struct {
  * @id: ID number for the command.
  * @flags: Flags that specify command behavior.
  *
- *         CXL_MEM_COMMAND_FLAG_USER_ENABLED
+ *         CXL_MEM_COMMAND_FLAG_ENABLED
  *
  *         The given command id is supported by the driver and is supported by
  *         a related opcode on the device.
  *
  *         CXL_MEM_COMMAND_FLAG_EXCLUSIVE
  *
- *         Requests with the given command id will terminate with EBUSY as the
- *         kernel actively owns management of the given resource. For example,
- *         the label-storage-area can not be written while the kernel is
- *         actively managing that space.
+ *	   The given command id is for kernel exclusive use and is not
+ *	   available to userspace. Requests will terminate with EBUSY.
+ *
+ *	   The exclusive flag may be temporal, and only set while the
+ *	   kernel actively owns management of the given resource. For
+ *	   example, the label-storage-area can not be written while the
+ *	   kernel is actively managing that space.
+ *
+ *	   The exclusive flag can be permanent, as in commands that can
+ *	   never be issued through the ioctl interface.
+ *
+ *	   INJECT_POISON and CLEAR_POISON are permanently kernel exclusive.
+ *	   They are supported through a sysfs interface that validates the
+ *	   safety of each command based on the state of the memdev.
+ *	   See: Documentation/ABI/testing/sysfs-bus-cxl
  *
  * @size_in: Expected input size, or ~0 if variable length.
  * @size_out: Expected output size, or ~0 if variable length.
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 8/8] cxl/mbox: Block inject and clear poison opcodes in raw mode
  2023-03-01 21:36 [PATCH v3 0/8] cxl: CXL Inject & Clear Poison alison.schofield
                   ` (6 preceding siblings ...)
  2023-03-01 21:36 ` [PATCH v3 7/8] cxl/memdev: Make inject and clear poison cmds kernel exclusive alison.schofield
@ 2023-03-01 21:36 ` alison.schofield
  2023-03-30 18:25   ` Jonathan Cameron
  7 siblings, 1 reply; 15+ messages in thread
From: alison.schofield @ 2023-03-01 21:36 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

Inject and clear poison are commands intended for debug environments,
and can cause data corruption if issued without validation. They are
kernel exclusive commands not available to userspace through ioctls,
but could be submitted via the raw mode ioctl.

Add inject and clear poison to the cxl_disabled_raw_commands[] list.
Attempts by userspace to issue either command via the RAW ioctl fail
with -EPERM.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 drivers/cxl/core/mbox.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 77fc811bdfed..4b5e65edbc71 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -89,6 +89,10 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
  *
  * CXL_MBOX_OP_[GET_]SCAN_MEDIA: The kernel provides a native error list that
  * is kept up to date with patrol notifications and error management.
+ *
+ * CXL_MBOX_OP_[INJECT|CLEAR]_POISON: The kernel provides a sysfs interface
+ * to these commands that ensures data protection of mapped resources.
+ * See: Documentation/ABI/testing/sysfs-bus-cxl
  */
 static u16 cxl_disabled_raw_commands[] = {
 	CXL_MBOX_OP_ACTIVATE_FW,
@@ -97,6 +101,8 @@ static u16 cxl_disabled_raw_commands[] = {
 	CXL_MBOX_OP_SET_SHUTDOWN_STATE,
 	CXL_MBOX_OP_SCAN_MEDIA,
 	CXL_MBOX_OP_GET_SCAN_MEDIA,
+	CXL_MBOX_OP_INJECT_POISON,
+	CXL_MBOX_OP_CLEAR_POISON,
 };
 
 /*
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 6/8] tools/testing/cxl: Add a sysfs attr to test poison injection limits
  2023-03-01 21:36 ` [PATCH v3 6/8] tools/testing/cxl: Add a sysfs attr to test poison injection limits alison.schofield
@ 2023-03-01 22:16   ` Alison Schofield
  0 siblings, 0 replies; 15+ messages in thread
From: Alison Schofield @ 2023-03-01 22:16 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: linux-cxl, Jonathan Cameron

On Wed, Mar 01, 2023 at 01:36:31PM -0800, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> CXL devices may report a maximum number of addresses that a device
> allows to be poisoned using poison injection. When cxl_test creates
> mock CXL memory devices, it defaults to MOCK_INJECT_DEV_MAX==88 for
 
oops...MOCK_INJECT_DEV_MAX==8  (not 88)

> all mocked memdevs.
> 
> Add a sysfs attribute, poison_inject_max to module cxl_mock_mem so
> that users can set a custom device injection limit. Fail, and return
> -EBUSY, if the mock poison list is not empty.
> 
> /sys/bus/platform/drivers/cxl_mock_mem/poison_inject_max
> 
> A simple usage model is to set the attribute before running a test in
> order to emulate a device's poison handling.
> 
> Suggested-by: Dave Jiang <dave.jiang@intel.com>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
>  tools/testing/cxl/test/mem.c | 52 +++++++++++++++++++++++++++++++++---
>  1 file changed, 48 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index 9658d95236b7..5c3b3e5a3b4b 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -19,6 +19,8 @@
>  #define MOCK_INJECT_DEV_MAX 8
>  #define MOCK_INJECT_TEST_MAX 128
>  
snip


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 1/8] cxl/memdev: Add support for the Inject Poison mailbox command
  2023-03-01 21:36 ` [PATCH v3 1/8] cxl/memdev: Add support for the Inject Poison mailbox command alison.schofield
@ 2023-03-30 18:13   ` Jonathan Cameron
  2023-03-30 18:44     ` Jonathan Cameron
  0 siblings, 1 reply; 15+ messages in thread
From: Jonathan Cameron @ 2023-03-30 18:13 UTC (permalink / raw)
  To: alison.schofield
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Wed,  1 Mar 2023 13:36:26 -0800
alison.schofield@intel.com wrote:

> From: Alison Schofield <alison.schofield@intel.com>
> 
> CXL devices optionally support the INJECT POISON mailbox command. Add
> a sysfs attribute and memdev driver support for injecting poison. The
> attribute is only visible for devices supporting the capability when
> the kernel is built with CONFIG_CXL_POISON_INJECT.
> 
> When a Device Physical Address (DPA) is written to the inject_poison
> sysfs attribute, send an inject poison command to the device for the
> specified address.
> 
> Per the CXL Specification (3.0 8.2.9.8.4.2), after receiving a valid
> inject poison request, the device will return poison when the address
> is accessed through the CXL.mem bus. Injecting poison adds the address
> to the device's Poison List and the error source is set to Injected.
> In addition, the device adds a poison creation event to its internal
> Informational Event log, updates the Event Status register, and if
> configured, interrupts the host.
> 
> Also, per the CXL Specification, it is not an error to inject poison
> into an address that already has poison present and no error is
> returned from the device.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

One query inline about the protection against DPAs that are mapped.
I'm not sure the checks won't rule out injection on a DPA that is
a configured but not yet committed HDM decoder range.
I think we should allow injection into such a memory address as it's
not in use.

> ---
>  Documentation/ABI/testing/sysfs-bus-cxl | 22 ++++++
>  drivers/cxl/Kconfig                     | 11 +++
>  drivers/cxl/core/memdev.c               | 97 +++++++++++++++++++++++++
>  drivers/cxl/cxlmem.h                    |  5 ++
>  4 files changed, 135 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index d9421c965a3b..e19d1020f30a 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -429,3 +429,25 @@ Description:
>  		attribute is only visible for devices supporting the
>  		capability. The retrieved errors are logged as kernel
>  		trace events with the label 'cxl_poison'.
> +
> +
> +What:		/sys/bus/cxl/devices/memX/inject_poison
> +Date:		January, 2023
> +KernelVersion:	v6.3
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(WO) When a Device Physical Address (DPA) is written to this
> +		attribute, the memdev driver sends an inject poison command to
> +		the device for the specified address. The DPA must be 64-byte
> +		aligned and the length of the injected poison is 64-bytes. If
> +		successful, the device returns poison when the address is
> +		accessed through the CXL.mem bus. Injecting poison adds the
> +		address to the device's Poison List and the error source is set
> +		to Injected. In addition, the device adds a poison creation
> +		event to its internal Informational Event log, updates the
> +		Event Status register, and if configured, interrupts the host.
> +		It is not an error to inject poison into an address that
> +		already has poison present and no error is returned. The
> +		inject_poison attribute is only visible for devices supporting
> +		the capability. Kconfig option CXL_POISON_INJECT must be on
> +		to enable this option. The default is off.
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index ff4e78117b31..a7ca0bbb8475 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -139,4 +139,15 @@ config CXL_REGION_INVALIDATION_TEST
>  	  If unsure, or if this kernel is meant for production environments,
>  	  say N.
>  
> +config CXL_POISON_INJECT
> +	bool "CXL: Support CXL Memory Device Poison Inject"
> +	depends on CXL_MEM
> +	help
> +	  Selecting this option creates the sysfs attributes inject_poison
> +	  and clear_poison for CXL memory devices supporting the capability.
> +	  This option is intended for debug scenarios only and is disabled
> +	  by default. See Documentation/ABI/testing/sysfs-bus-cxl.

Default for almost everything in Kconfig is off.  Is it useful to call that out
for this one?  The unsure bit below is enough for me.

> +
> +	  If unsure, say N.
> +
>  endif
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index c11b7bc253b4..82e09b81e9c6 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -165,6 +165,92 @@ static ssize_t trigger_poison_list_store(struct device *dev,
>  }
>  static DEVICE_ATTR_WO(trigger_poison_list);
>  
> +static int cxl_dpa_mapped(struct device *dev, void *data)
> +{
> +	struct cxl_endpoint_decoder *cxled;
> +	u64 *dpa = data;
> +
> +	if (!is_endpoint_decoder(dev))
> +		return 0;
> +
> +	cxled = to_cxl_endpoint_decoder(dev);
> +	if (!cxled->dpa_res || !resource_size(cxled->dpa_res))
> +		return 0;
> +
> +	if (*dpa <= cxled->dpa_res->end && *dpa >= cxled->dpa_res->start) {

I haven't chased it all the way through, but are we guaranteed that this
particular decoder is committed if we reach here? 
I think the resource is setup when we set size not at point of committing
so checking that may not be sufficient.

> +		dev_dbg(dev, "dpa:0x%llx mapped in region:%s\n",
> +			*dpa, dev_name(&cxled->cxld.region->dev));
> +		return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +static int cxl_validate_poison_dpa(struct cxl_memdev *cxlmd, u64 dpa)
> +{
> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +	struct cxl_port *port;
> +	int rc;
> +
> +	if (!resource_size(&cxlds->dpa_res)) {
> +		dev_dbg(cxlds->dev, "device has no dpa resource\n");
> +		return -EINVAL;
> +	}
> +	if (dpa < cxlds->dpa_res.start || dpa > cxlds->dpa_res.end) {
> +		dev_dbg(cxlds->dev, "dpa:0x%llx not in resource:%pR\n",
> +			dpa, &cxlds->dpa_res);
> +		return -EINVAL;
> +	}
> +	if (!IS_ALIGNED(dpa, 64)) {
> +		dev_dbg(cxlds->dev, "dpa:0x%llx is not 64-byte aligned\n", dpa);
> +		return -EINVAL;
> +	}
> +	port = dev_get_drvdata(&cxlmd->dev);
> +	if (port && is_cxl_endpoint(port) && port->commit_end != -1) {
> +		rc = device_for_each_child(&port->dev, &dpa, cxl_dpa_mapped);
> +		if (rc)
> +			return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static ssize_t inject_poison_store(struct device *dev,
> +				   struct device_attribute *attr,
> +				   const char *buf, size_t len)
> +{
> +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> +	struct cxl_mbox_inject_poison inject;
> +	struct cxl_mbox_cmd mbox_cmd;
> +	u64 dpa;
> +	int rc;
> +
> +	rc = kstrtou64(buf, 0, &dpa);
> +	if (rc)
> +		return rc;
> +
> +	down_read(&cxl_dpa_rwsem);
> +	rc = cxl_validate_poison_dpa(cxlmd, dpa);
> +	if (rc) {
> +		up_read(&cxl_dpa_rwsem);
> +		return rc;
> +	}
> +
> +	inject = (struct cxl_mbox_inject_poison) {
> +		.address = cpu_to_le64(dpa)
> +	};
> +	mbox_cmd = (struct cxl_mbox_cmd) {
> +		.opcode = CXL_MBOX_OP_INJECT_POISON,
> +		.size_in = sizeof(inject),
> +		.payload_in = &inject,
> +	};
> +	rc = cxl_internal_send_cmd(cxlmd->cxlds, &mbox_cmd);
> +
> +	up_read(&cxl_dpa_rwsem);
> +	return rc ? rc : len;
> +}
> +static DEVICE_ATTR_WO(inject_poison);
> +
>  static struct attribute *cxl_memdev_attributes[] = {
>  	&dev_attr_serial.attr,
>  	&dev_attr_firmware_version.attr,
> @@ -172,6 +258,7 @@ static struct attribute *cxl_memdev_attributes[] = {
>  	&dev_attr_label_storage_size.attr,
>  	&dev_attr_numa_node.attr,
>  	&dev_attr_trigger_poison_list.attr,
> +	&dev_attr_inject_poison.attr,
>  	NULL,
>  };
>  
> @@ -198,6 +285,16 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
>  			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
>  			return 0;
>  	}
> +	if (a == &dev_attr_inject_poison.attr) {
> +		struct device *dev = kobj_to_dev(kobj);
> +
> +		if (!IS_ENABLED(CONFIG_CXL_POISON_INJECT))
> +			return 0;
> +
> +		if (!test_bit(CXL_MEM_COMMAND_ID_INJECT_POISON,
> +			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
> +			return 0;
> +	}
>  	return a->mode;
>  }
>  
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index a6eb1b42eb88..01d27f362cd6 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -602,6 +602,11 @@ struct cxl_mbox_poison_payload_out {
>  #define CXL_POISON_SOURCE_INJECTED	3
>  #define CXL_POISON_SOURCE_VENDOR	7
>  
> +/* Inject & Clear Poison  CXL 3.0 Spec 8.2.9.8.4.2/3 */
> +struct cxl_mbox_inject_poison {
> +	__le64 address;
> +};
> +
>  /**
>   * struct cxl_mem_command - Driver representation of a memory device command
>   * @info: Command information as it exists for the UAPI


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 2/8] cxl/memdev: Add support for the Clear Poison mailbox command
  2023-03-01 21:36 ` [PATCH v3 2/8] cxl/memdev: Add support for the Clear " alison.schofield
@ 2023-03-30 18:16   ` Jonathan Cameron
  0 siblings, 0 replies; 15+ messages in thread
From: Jonathan Cameron @ 2023-03-30 18:16 UTC (permalink / raw)
  To: alison.schofield
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Wed,  1 Mar 2023 13:36:27 -0800
alison.schofield@intel.com wrote:

> From: Alison Schofield <alison.schofield@intel.com>
> 
> CXL devices optionally support the CLEAR POISON mailbox command. Add
> a sysfs attribute and memdev driver support for clearing poison. The
> attribute is only visible for devices supporting the capability when
> the kernel is built with CONFIG_CXL_POISON_INJECT.
> 
> When a Device Physical Address (DPA) is written to the clear_poison
> sysfs attribute, send a clear poison command to the device for the
> specified address.
> 
> Per the CXL Specification (3.0 8.2.9.8.4.3), after receiving a valid
> clear poison request, the device removes the address from the device's
> Poison List and writes 0 (zero) for 64 bytes starting at address. If
> the device cannot clear poison from the address, it returns a permanent
> media error and -ENXIO is returned to the user.
> 
> Additionally, and per the spec also, it is not an error to clear poison
> of an address that is not poisoned. In this case, the device does not
> overwrite the address and the device does not return an error.

That's not inline with the spec.

"Clear Poison Write Data: The data the device shall always write into the
requested physical address, atomically, while clearing poison if the location
is marked as being poisoned."

The overwrite always happens whether or not it's poisoned.

> 
> *Implementation note: Although the CXL specification defines the clear
> command to accept 64 bytes of 'write-data' to be used when clearing
> the poisoned address, this implementation always uses 0 (zeros) for
> the write-data.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
>  Documentation/ABI/testing/sysfs-bus-cxl | 18 ++++++++
>  drivers/cxl/core/memdev.c               | 59 +++++++++++++++++++++++++
>  drivers/cxl/cxlmem.h                    |  6 +++
>  3 files changed, 83 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index e19d1020f30a..e2c77eda443e 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -451,3 +451,21 @@ Description:
>  		inject_poison attribute is only visible for devices supporting
>  		the capability. Kconfig option CXL_POISON_INJECT must be on
>  		to enable this option. The default is off.
> +
> +
> +What:		/sys/bus/cxl/devices/memX/clear_poison
> +Date:		January, 2023
> +KernelVersion:	v6.3
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(WO) When a Device Physical Address (DPA) is written to this
> +		attribute, the memdev driver sends a clear poison command to
> +		the device for the specified address. Clearing poison removes
> +		the address from the device's Poison List and writes 0 (zero)
> +		for 64 bytes starting at address. It is not an error to clear
> +		poison from an address that does not have poison set, and if
> +		poison was not set, the address is not overwritten. If the

As above.

> +		device cannot clear poison from the address, -ENXIO is returned.
> +		The clear_poison attribute is only visible for devices
> +		supporting the capability. Kconfig option CXL_POISON_INJECT
> +		must be on to enable this option. The default is off.
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index 82e09b81e9c6..ed3e4517dc3a 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -251,6 +251,53 @@ static ssize_t inject_poison_store(struct device *dev,
>  }
>  static DEVICE_ATTR_WO(inject_poison);
>  
> +static ssize_t clear_poison_store(struct device *dev,
> +				  struct device_attribute *attr,
> +				  const char *buf, size_t len)
> +{
> +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> +	struct cxl_mbox_clear_poison clear;
> +	struct cxl_mbox_cmd mbox_cmd;
> +	u64 dpa;
> +	int rc;
> +
> +	rc = kstrtou64(buf, 0, &dpa);
> +	if (rc)
> +		return rc;
> +
> +	down_read(&cxl_dpa_rwsem);
> +	rc = cxl_validate_poison_dpa(cxlmd, dpa);
> +	if (rc) {
> +		up_read(&cxl_dpa_rwsem);
> +		return rc;
> +	}
> +
> +	/*
> +	 * In CXL 3.0 Spec 8.2.9.8.4.3, the Clear Poison mailbox command
> +	 * is defined to accept 64 bytes of 'write-data', along with the
> +	 * address to clear. The device writes the data into the address
> +	 * atomically, while clearing poison if the location is marked as
> +	 * being poisoned.

This statement is correct in that it says the data is always written
whether or not there is poison at the address.

> +	 *
> +	 * Always use '0' for the write-data.
> +	 */
> +	clear = (struct cxl_mbox_clear_poison) {
> +		.address = cpu_to_le64(dpa)
> +	};
> +
> +	mbox_cmd = (struct cxl_mbox_cmd) {
> +		.opcode = CXL_MBOX_OP_CLEAR_POISON,
> +		.size_in = sizeof(clear),
> +		.payload_in = &clear,
> +	};
> +
> +	rc = cxl_internal_send_cmd(cxlmd->cxlds, &mbox_cmd);
> +
> +	up_read(&cxl_dpa_rwsem);
> +	return rc ? rc : len;
> +}
> +static DEVICE_ATTR_WO(clear_poison);
> +
>  static struct attribute *cxl_memdev_attributes[] = {
>  	&dev_attr_serial.attr,
>  	&dev_attr_firmware_version.attr,
> @@ -259,6 +306,7 @@ static struct attribute *cxl_memdev_attributes[] = {
>  	&dev_attr_numa_node.attr,
>  	&dev_attr_trigger_poison_list.attr,
>  	&dev_attr_inject_poison.attr,
> +	&dev_attr_clear_poison.attr,
>  	NULL,
>  };
>  
> @@ -295,6 +343,17 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
>  			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
>  			return 0;
>  	}
> +	if (a == &dev_attr_clear_poison.attr) {
> +		struct device *dev = kobj_to_dev(kobj);
> +
> +		if (!IS_ENABLED(CONFIG_CXL_POISON_INJECT))
> +			return 0;
> +
> +		if (!test_bit(CXL_MEM_COMMAND_ID_CLEAR_POISON,
> +			      to_cxl_memdev(dev)->cxlds->enabled_cmds)) {
> +			return 0;
> +		}
> +	}
>  	return a->mode;
>  }
>  
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 01d27f362cd6..8a15274789a6 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -607,6 +607,12 @@ struct cxl_mbox_inject_poison {
>  	__le64 address;
>  };
>  
> +/* Clear Poison  CXL 3.0 Spec 8.2.9.8.4.3 */
> +struct cxl_mbox_clear_poison {
> +	__le64 address;
> +	u8 write_data[CXL_POISON_LEN_MULT];
> +} __packed;
> +
>  /**
>   * struct cxl_mem_command - Driver representation of a memory device command
>   * @info: Command information as it exists for the UAPI


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 7/8] cxl/memdev: Make inject and clear poison cmds kernel exclusive
  2023-03-01 21:36 ` [PATCH v3 7/8] cxl/memdev: Make inject and clear poison cmds kernel exclusive alison.schofield
@ 2023-03-30 18:24   ` Jonathan Cameron
  0 siblings, 0 replies; 15+ messages in thread
From: Jonathan Cameron @ 2023-03-30 18:24 UTC (permalink / raw)
  To: alison.schofield
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Wed,  1 Mar 2023 13:36:32 -0800
alison.schofield@intel.com wrote:

> From: Alison Schofield <alison.schofield@intel.com>
> 
> Inject and clear poison commands are intended to be used in debug
> mode only, and if improperly used, can lead to data corruption. The
> kernel provides a sysfs interface that provides the protection needed
> to issue these commands.[1]
> 
> The CXL driver defines Enabled commands in its ABI.[2] Enabled means
> that the device and the driver both support the command. If a device
> supports inject and/or clear, those commands are flagged Enabled.
> 
> The ABI also defines another command flag: Exclusive. Exclusive
> commands are reserved for kernel use. The exclusive flags can be
> temporal, but for inject and clear, the status is permanent.
> 
> Document the exclusivity of Inject and Clear in the ABI kernel doc.
> (Clean up a typo in kdoc too: 'CXL_MEM_COMMAND_FLAG_ENABLED')
> 
> Create an exclusive commands bitmap in the memdev driver, add the
> inject and clear poison commands, and set it in the cxl_dev_state.
> 
> [1] Documentation/ABI/testing/sysfs-bus-cxl
> [2] include/uapi/linux/cxl_mem.h
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
>  drivers/cxl/core/memdev.c    |  6 ++++++
>  include/uapi/linux/cxl_mem.h | 21 ++++++++++++++++-----
>  2 files changed, 22 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index ed3e4517dc3a..f746a0c222b6 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -10,6 +10,8 @@
>  
>  static DECLARE_RWSEM(cxl_memdev_rwsem);
>  
> +static __read_mostly DECLARE_BITMAP(exclusive_cmds, CXL_MEM_COMMAND_ID_MAX);

Seems like a macro to wrap up the DECLARE_BITMAP and the length
so that we don't have to be careful these are the correct size might be useful.



> +
>  /*
>   * An entire PCI topology full of devices should be enough for any
>   * config
> @@ -571,6 +573,10 @@ struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds)
>  	cxlmd->cxlds = cxlds;
>  	cxlds->cxlmd = cxlmd;
>  
> +	set_bit(CXL_MEM_COMMAND_ID_INJECT_POISON, exclusive_cmds);
> +	set_bit(CXL_MEM_COMMAND_ID_CLEAR_POISON, exclusive_cmds);
> +	set_exclusive_cxl_commands(cxlds, exclusive_cmds);
> +
>  	cdev = &cxlmd->cdev;
>  	rc = cdev_device_add(cdev, dev);
>  	if (rc)
> diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
> index 86bbacf2a315..6f9ae244f7fd 100644
> --- a/include/uapi/linux/cxl_mem.h
> +++ b/include/uapi/linux/cxl_mem.h
> @@ -74,17 +74,28 @@ static const struct {
>   * @id: ID number for the command.
>   * @flags: Flags that specify command behavior.
>   *
> - *         CXL_MEM_COMMAND_FLAG_USER_ENABLED
> + *         CXL_MEM_COMMAND_FLAG_ENABLED
>   *
>   *         The given command id is supported by the driver and is supported by
>   *         a related opcode on the device.
>   *
>   *         CXL_MEM_COMMAND_FLAG_EXCLUSIVE
>   *
> - *         Requests with the given command id will terminate with EBUSY as the
> - *         kernel actively owns management of the given resource. For example,
> - *         the label-storage-area can not be written while the kernel is
> - *         actively managing that space.
> + *	   The given command id is for kernel exclusive use and is not
I'm guessing tabs and space differences to the above?

Other than those little things LGTM

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> + *	   available to userspace. Requests will terminate with EBUSY.
> + *
> + *	   The exclusive flag may be temporal, and only set while the
> + *	   kernel actively owns management of the given resource. For
> + *	   example, the label-storage-area can not be written while the
> + *	   kernel is actively managing that space.
> + *
> + *	   The exclusive flag can be permanent, as in commands that can
> + *	   never be issued through the ioctl interface.
> + *
> + *	   INJECT_POISON and CLEAR_POISON are permanently kernel exclusive.
> + *	   They are supported through a sysfs interface that validates the
> + *	   safety of each command based on the state of the memdev.
> + *	   See: Documentation/ABI/testing/sysfs-bus-cxl
>   *
>   * @size_in: Expected input size, or ~0 if variable length.
>   * @size_out: Expected output size, or ~0 if variable length.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 8/8] cxl/mbox: Block inject and clear poison opcodes in raw mode
  2023-03-01 21:36 ` [PATCH v3 8/8] cxl/mbox: Block inject and clear poison opcodes in raw mode alison.schofield
@ 2023-03-30 18:25   ` Jonathan Cameron
  0 siblings, 0 replies; 15+ messages in thread
From: Jonathan Cameron @ 2023-03-30 18:25 UTC (permalink / raw)
  To: alison.schofield
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Wed,  1 Mar 2023 13:36:33 -0800
alison.schofield@intel.com wrote:

> From: Alison Schofield <alison.schofield@intel.com>
> 
> Inject and clear poison are commands intended for debug environments,
> and can cause data corruption if issued without validation. They are
> kernel exclusive commands not available to userspace through ioctls,
> but could be submitted via the raw mode ioctl.
> 
> Add inject and clear poison to the cxl_disabled_raw_commands[] list.
> Attempts by userspace to issue either command via the RAW ioctl fail
> with -EPERM.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  drivers/cxl/core/mbox.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 77fc811bdfed..4b5e65edbc71 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -89,6 +89,10 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
>   *
>   * CXL_MBOX_OP_[GET_]SCAN_MEDIA: The kernel provides a native error list that
>   * is kept up to date with patrol notifications and error management.
> + *
> + * CXL_MBOX_OP_[INJECT|CLEAR]_POISON: The kernel provides a sysfs interface
> + * to these commands that ensures data protection of mapped resources.
> + * See: Documentation/ABI/testing/sysfs-bus-cxl
>   */
>  static u16 cxl_disabled_raw_commands[] = {
>  	CXL_MBOX_OP_ACTIVATE_FW,
> @@ -97,6 +101,8 @@ static u16 cxl_disabled_raw_commands[] = {
>  	CXL_MBOX_OP_SET_SHUTDOWN_STATE,
>  	CXL_MBOX_OP_SCAN_MEDIA,
>  	CXL_MBOX_OP_GET_SCAN_MEDIA,
> +	CXL_MBOX_OP_INJECT_POISON,
> +	CXL_MBOX_OP_CLEAR_POISON,
>  };
>  
>  /*


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 1/8] cxl/memdev: Add support for the Inject Poison mailbox command
  2023-03-30 18:13   ` Jonathan Cameron
@ 2023-03-30 18:44     ` Jonathan Cameron
  0 siblings, 0 replies; 15+ messages in thread
From: Jonathan Cameron @ 2023-03-30 18:44 UTC (permalink / raw)
  To: alison.schofield
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Thu, 30 Mar 2023 19:13:35 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:

> On Wed,  1 Mar 2023 13:36:26 -0800
> alison.schofield@intel.com wrote:
> 
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > CXL devices optionally support the INJECT POISON mailbox command. Add
> > a sysfs attribute and memdev driver support for injecting poison. The
> > attribute is only visible for devices supporting the capability when
> > the kernel is built with CONFIG_CXL_POISON_INJECT.
> > 
> > When a Device Physical Address (DPA) is written to the inject_poison
> > sysfs attribute, send an inject poison command to the device for the
> > specified address.
> > 
> > Per the CXL Specification (3.0 8.2.9.8.4.2), after receiving a valid
> > inject poison request, the device will return poison when the address
> > is accessed through the CXL.mem bus. Injecting poison adds the address
> > to the device's Poison List and the error source is set to Injected.
> > In addition, the device adds a poison creation event to its internal
> > Informational Event log, updates the Event Status register, and if
> > configured, interrupts the host.
> > 
> > Also, per the CXL Specification, it is not an error to inject poison
> > into an address that already has poison present and no error is
> > returned from the device.
> > 
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>  
> 
> One query inline about the protection against DPAs that are mapped.
> I'm not sure the checks won't rule out injection on a DPA that is
> a configured but not yet committed HDM decoder range.
> I think we should allow injection into such a memory address as it's
> not in use.

Oops. I've managed to find an old version.  Ignore these comments, I'll
take a look at v5. 

Jonathan

> 
> > ---
> >  Documentation/ABI/testing/sysfs-bus-cxl | 22 ++++++
> >  drivers/cxl/Kconfig                     | 11 +++
> >  drivers/cxl/core/memdev.c               | 97 +++++++++++++++++++++++++
> >  drivers/cxl/cxlmem.h                    |  5 ++
> >  4 files changed, 135 insertions(+)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> > index d9421c965a3b..e19d1020f30a 100644
> > --- a/Documentation/ABI/testing/sysfs-bus-cxl
> > +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> > @@ -429,3 +429,25 @@ Description:
> >  		attribute is only visible for devices supporting the
> >  		capability. The retrieved errors are logged as kernel
> >  		trace events with the label 'cxl_poison'.
> > +
> > +
> > +What:		/sys/bus/cxl/devices/memX/inject_poison
> > +Date:		January, 2023
> > +KernelVersion:	v6.3
> > +Contact:	linux-cxl@vger.kernel.org
> > +Description:
> > +		(WO) When a Device Physical Address (DPA) is written to this
> > +		attribute, the memdev driver sends an inject poison command to
> > +		the device for the specified address. The DPA must be 64-byte
> > +		aligned and the length of the injected poison is 64-bytes. If
> > +		successful, the device returns poison when the address is
> > +		accessed through the CXL.mem bus. Injecting poison adds the
> > +		address to the device's Poison List and the error source is set
> > +		to Injected. In addition, the device adds a poison creation
> > +		event to its internal Informational Event log, updates the
> > +		Event Status register, and if configured, interrupts the host.
> > +		It is not an error to inject poison into an address that
> > +		already has poison present and no error is returned. The
> > +		inject_poison attribute is only visible for devices supporting
> > +		the capability. Kconfig option CXL_POISON_INJECT must be on
> > +		to enable this option. The default is off.
> > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> > index ff4e78117b31..a7ca0bbb8475 100644
> > --- a/drivers/cxl/Kconfig
> > +++ b/drivers/cxl/Kconfig
> > @@ -139,4 +139,15 @@ config CXL_REGION_INVALIDATION_TEST
> >  	  If unsure, or if this kernel is meant for production environments,
> >  	  say N.
> >  
> > +config CXL_POISON_INJECT
> > +	bool "CXL: Support CXL Memory Device Poison Inject"
> > +	depends on CXL_MEM
> > +	help
> > +	  Selecting this option creates the sysfs attributes inject_poison
> > +	  and clear_poison for CXL memory devices supporting the capability.
> > +	  This option is intended for debug scenarios only and is disabled
> > +	  by default. See Documentation/ABI/testing/sysfs-bus-cxl.  
> 
> Default for almost everything in Kconfig is off.  Is it useful to call that out
> for this one?  The unsure bit below is enough for me.
> 
> > +
> > +	  If unsure, say N.
> > +
> >  endif
> > diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> > index c11b7bc253b4..82e09b81e9c6 100644
> > --- a/drivers/cxl/core/memdev.c
> > +++ b/drivers/cxl/core/memdev.c
> > @@ -165,6 +165,92 @@ static ssize_t trigger_poison_list_store(struct device *dev,
> >  }
> >  static DEVICE_ATTR_WO(trigger_poison_list);
> >  
> > +static int cxl_dpa_mapped(struct device *dev, void *data)
> > +{
> > +	struct cxl_endpoint_decoder *cxled;
> > +	u64 *dpa = data;
> > +
> > +	if (!is_endpoint_decoder(dev))
> > +		return 0;
> > +
> > +	cxled = to_cxl_endpoint_decoder(dev);
> > +	if (!cxled->dpa_res || !resource_size(cxled->dpa_res))
> > +		return 0;
> > +
> > +	if (*dpa <= cxled->dpa_res->end && *dpa >= cxled->dpa_res->start) {  
> 
> I haven't chased it all the way through, but are we guaranteed that this
> particular decoder is committed if we reach here? 
> I think the resource is setup when we set size not at point of committing
> so checking that may not be sufficient.
> 
> > +		dev_dbg(dev, "dpa:0x%llx mapped in region:%s\n",
> > +			*dpa, dev_name(&cxled->cxld.region->dev));
> > +		return 1;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int cxl_validate_poison_dpa(struct cxl_memdev *cxlmd, u64 dpa)
> > +{
> > +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > +	struct cxl_port *port;
> > +	int rc;
> > +
> > +	if (!resource_size(&cxlds->dpa_res)) {
> > +		dev_dbg(cxlds->dev, "device has no dpa resource\n");
> > +		return -EINVAL;
> > +	}
> > +	if (dpa < cxlds->dpa_res.start || dpa > cxlds->dpa_res.end) {
> > +		dev_dbg(cxlds->dev, "dpa:0x%llx not in resource:%pR\n",
> > +			dpa, &cxlds->dpa_res);
> > +		return -EINVAL;
> > +	}
> > +	if (!IS_ALIGNED(dpa, 64)) {
> > +		dev_dbg(cxlds->dev, "dpa:0x%llx is not 64-byte aligned\n", dpa);
> > +		return -EINVAL;
> > +	}
> > +	port = dev_get_drvdata(&cxlmd->dev);
> > +	if (port && is_cxl_endpoint(port) && port->commit_end != -1) {
> > +		rc = device_for_each_child(&port->dev, &dpa, cxl_dpa_mapped);
> > +		if (rc)
> > +			return -EINVAL;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static ssize_t inject_poison_store(struct device *dev,
> > +				   struct device_attribute *attr,
> > +				   const char *buf, size_t len)
> > +{
> > +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> > +	struct cxl_mbox_inject_poison inject;
> > +	struct cxl_mbox_cmd mbox_cmd;
> > +	u64 dpa;
> > +	int rc;
> > +
> > +	rc = kstrtou64(buf, 0, &dpa);
> > +	if (rc)
> > +		return rc;
> > +
> > +	down_read(&cxl_dpa_rwsem);
> > +	rc = cxl_validate_poison_dpa(cxlmd, dpa);
> > +	if (rc) {
> > +		up_read(&cxl_dpa_rwsem);
> > +		return rc;
> > +	}
> > +
> > +	inject = (struct cxl_mbox_inject_poison) {
> > +		.address = cpu_to_le64(dpa)
> > +	};
> > +	mbox_cmd = (struct cxl_mbox_cmd) {
> > +		.opcode = CXL_MBOX_OP_INJECT_POISON,
> > +		.size_in = sizeof(inject),
> > +		.payload_in = &inject,
> > +	};
> > +	rc = cxl_internal_send_cmd(cxlmd->cxlds, &mbox_cmd);
> > +
> > +	up_read(&cxl_dpa_rwsem);
> > +	return rc ? rc : len;
> > +}
> > +static DEVICE_ATTR_WO(inject_poison);
> > +
> >  static struct attribute *cxl_memdev_attributes[] = {
> >  	&dev_attr_serial.attr,
> >  	&dev_attr_firmware_version.attr,
> > @@ -172,6 +258,7 @@ static struct attribute *cxl_memdev_attributes[] = {
> >  	&dev_attr_label_storage_size.attr,
> >  	&dev_attr_numa_node.attr,
> >  	&dev_attr_trigger_poison_list.attr,
> > +	&dev_attr_inject_poison.attr,
> >  	NULL,
> >  };
> >  
> > @@ -198,6 +285,16 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
> >  			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
> >  			return 0;
> >  	}
> > +	if (a == &dev_attr_inject_poison.attr) {
> > +		struct device *dev = kobj_to_dev(kobj);
> > +
> > +		if (!IS_ENABLED(CONFIG_CXL_POISON_INJECT))
> > +			return 0;
> > +
> > +		if (!test_bit(CXL_MEM_COMMAND_ID_INJECT_POISON,
> > +			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
> > +			return 0;
> > +	}
> >  	return a->mode;
> >  }
> >  
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index a6eb1b42eb88..01d27f362cd6 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -602,6 +602,11 @@ struct cxl_mbox_poison_payload_out {
> >  #define CXL_POISON_SOURCE_INJECTED	3
> >  #define CXL_POISON_SOURCE_VENDOR	7
> >  
> > +/* Inject & Clear Poison  CXL 3.0 Spec 8.2.9.8.4.2/3 */
> > +struct cxl_mbox_inject_poison {
> > +	__le64 address;
> > +};
> > +
> >  /**
> >   * struct cxl_mem_command - Driver representation of a memory device command
> >   * @info: Command information as it exists for the UAPI  
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2023-03-30 18:44 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-01 21:36 [PATCH v3 0/8] cxl: CXL Inject & Clear Poison alison.schofield
2023-03-01 21:36 ` [PATCH v3 1/8] cxl/memdev: Add support for the Inject Poison mailbox command alison.schofield
2023-03-30 18:13   ` Jonathan Cameron
2023-03-30 18:44     ` Jonathan Cameron
2023-03-01 21:36 ` [PATCH v3 2/8] cxl/memdev: Add support for the Clear " alison.schofield
2023-03-30 18:16   ` Jonathan Cameron
2023-03-01 21:36 ` [PATCH v3 3/8] tools/testing/cxl: Mock the Inject " alison.schofield
2023-03-01 21:36 ` [PATCH v3 4/8] tools/testing/cxl: Mock the Clear " alison.schofield
2023-03-01 21:36 ` [PATCH v3 5/8] tools/testing/cxl: Use injected poison for get poison list alison.schofield
2023-03-01 21:36 ` [PATCH v3 6/8] tools/testing/cxl: Add a sysfs attr to test poison injection limits alison.schofield
2023-03-01 22:16   ` Alison Schofield
2023-03-01 21:36 ` [PATCH v3 7/8] cxl/memdev: Make inject and clear poison cmds kernel exclusive alison.schofield
2023-03-30 18:24   ` Jonathan Cameron
2023-03-01 21:36 ` [PATCH v3 8/8] cxl/mbox: Block inject and clear poison opcodes in raw mode alison.schofield
2023-03-30 18:25   ` Jonathan Cameron

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.