All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/6] cxl: CXL Inject & Clear Poison
@ 2023-01-19  5:00 alison.schofield
  2023-01-19  5:00 ` [PATCH v2 1/6] cxl/memdev: Add support for the Inject Poison mailbox command alison.schofield
                   ` (6 more replies)
  0 siblings, 7 replies; 23+ messages in thread
From: alison.schofield @ 2023-01-19  5:00 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

Built on cxl/next plus Patchset: CXL Poison List Retrieval & Tracing:
https://lore.kernel.org/linux-cxl/de11785ff05844299b40b100f8e0f56c7eef7f08.1674070170.git.alison.schofield@intel.com/

Changes in v2:
- Add Jonathan Reviewed-by tags to Patches 1,2,4
- Clean up input payload structs for both inject and clear (Dan)
- Commit message cleanups, including spec references (Dave)
- Use CXL_POISON_LEN_MULT in define of clear write data
- Use IS_ALIGNED() for 64byte align check (Dan)
- Add Kconfig CXL_POISON_INJECT  (Dan)
- Trivial space cleanup (Jonathan)
- Doc/ABI cleanup (Dave, Dan)
- Mock: Only use injected errors for get poison list
- Mock: Use 'POISONLMT -ENXIO' text from CMD_CMD_RC_TABLE  (Jonathan)
- Mock: Add Patch 6/6: A module param to mock device inject limit

Link to v1: https://lore.kernel.org/linux-cxl/cover.1669781852.git.alison.schofield@intel.com/

Introducing Inject and Clear Poison support for CXL Devices.

These are optional commands, meaning not all CXL devices must support
them. The sysfs attributes, inject_poison and clear_poison, are only
visible for devices reporting support of the capability and when the
kernel Kconfig option CONFIG_CXL_POISON_INJECT is on. (Default: off)

Example:
# echo 0x40000000 > /sys/bus/cxl/devices/mem1/inject_poison
# echo 1 > /sys/bus/cxl/devices/mem1/trigger_poison_list

cxl_poison: memdev=mem1 pcidev=cxl_mem.1 region= region_uuid=00000000-0000-0000-0000-000000000000 hpa=0xffffffffffffffff dpa=0x40000000 length=0x40 source=Injected flags= overflow_time=0


Alison Schofield (6):
  cxl/memdev: Add support for the Inject Poison mailbox command
  cxl/memdev: Add support for the Clear Poison mailbox command
  tools/testing/cxl: Mock the Inject Poison mailbox command
  tools/testing/cxl: Mock the Clear Poison mailbox command
  tools/testing/cxl: Use injected poison for get poison list
  tools/testing/cxl: Add a param to test poison injection limits

 Documentation/ABI/testing/sysfs-bus-cxl |  40 ++++++
 drivers/cxl/Kconfig                     |  10 ++
 drivers/cxl/core/memdev.c               | 122 ++++++++++++++++
 drivers/cxl/cxlmem.h                    |  11 ++
 tools/testing/cxl/test/mem.c            | 178 +++++++++++++++++++++---
 5 files changed, 341 insertions(+), 20 deletions(-)

-- 
2.37.3


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2 1/6] cxl/memdev: Add support for the Inject Poison mailbox command
  2023-01-19  5:00 [PATCH v2 0/6] cxl: CXL Inject & Clear Poison alison.schofield
@ 2023-01-19  5:00 ` alison.schofield
  2023-01-27 23:06   ` Dan Williams
  2023-01-19  5:00 ` [PATCH v2 2/6] cxl/memdev: Add support for the Clear " alison.schofield
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: alison.schofield @ 2023-01-19  5:00 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl, Jonathan Cameron

From: Alison Schofield <alison.schofield@intel.com>

CXL devices optionally support the INJECT POISON mailbox command. Add
a sysfs attribute and memdev driver support for injecting poison.

When a Device Physical Address (DPA) is written to the inject_poison
sysfs attribute, send an inject poison command to the device for the
specified address.

Per the CXL Specification (3.0 8.2.9.8.4.2), after receiving a valid
inject poison request, the device will return poison when the address
is accessed through the CXL.mem bus. Injecting poison adds the address
to the device's Poison List and the error source is set to Injected.
In addition, the device adds a poison creation event to its internal
Informational Event log, updates the Event Status register, and if
configured, interrupts the host.

Also, per the CXL Specification, it is not an error to inject poison
into an address that already has poison present and no error is
returned from the device.

The inject_poison attribute is only visible for devices supporting
the capability when the kernel is built with CONFIG_CXL_POISON_INJECT.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 Documentation/ABI/testing/sysfs-bus-cxl | 22 ++++++++
 drivers/cxl/Kconfig                     | 10 ++++
 drivers/cxl/core/memdev.c               | 67 +++++++++++++++++++++++++
 drivers/cxl/cxlmem.h                    |  5 ++
 4 files changed, 104 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index b715a4609718..e9c6dd02bd09 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -416,3 +416,25 @@ Description:
 		if accessed, and the source of the poison. The retrieved
 		errors are logged as kernel trace events with the label
 		'cxl_poison'.
+
+
+What:		/sys/bus/cxl/devices/memX/inject_poison
+Date:		January, 2023
+KernelVersion:	v6.3
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(WO) When a Device Physical Address (DPA) is written to this
+		attribute, the memdev driver sends an inject poison command to
+		the device for the specified address. The DPA must be 64-byte
+		aligned and the length of the injected poison is 64-bytes. If
+		successful, the device returns poison when the address is
+		accessed through the CXL.mem bus. Injecting poison adds the
+		address to the device's Poison List and the error source is set
+		to Injected. In addition, the device adds a poison creation
+		event to its internal Informational Event log, updates the
+		Event Status register, and if configured, interrupts the host.
+		It is not an error to inject poison into an address that
+		already has poison present and no error is returned. The
+		inject_poison attribute is only visible for devices supporting
+		the capability. Kconfig option CXL_POISON_INJECT must be on
+		to enable this option. The default is off.
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 0ac53c422c31..6541f54725cd 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -129,4 +129,14 @@ config CXL_REGION_INVALIDATION_TEST
 	  If unsure, or if this kernel is meant for production environments,
 	  say N.
 
+config CXL_POISON_INJECT
+	bool "CXL: Support CXL Memory Device Poison Inject"
+	depends on CXL_MEM
+	help
+	  Selecting this option creates the sysfs attributes inject_poison
+	  and clear_poison for CXL memory devices supporting the capability.
+	  See Documentation/ABI/testing/sysfs-bus-cxl.
+
+	  If unsure, say N.
+
 endif
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index e0af7e9c9989..226662cf3331 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -142,6 +142,61 @@ static ssize_t trigger_poison_list_store(struct device *dev,
 }
 static DEVICE_ATTR_WO(trigger_poison_list);
 
+static int cxl_validate_poison_dpa(struct cxl_dev_state *cxlds, u64 dpa)
+{
+	if (!resource_size(&cxlds->dpa_res)) {
+		dev_dbg(cxlds->dev, "device has no dpa resource\n");
+		return -EINVAL;
+	}
+	if (dpa < cxlds->dpa_res.start || dpa > cxlds->dpa_res.end) {
+		dev_dbg(cxlds->dev, "dpa:0x%llx not in resource:%pR\n",
+			dpa, &cxlds->dpa_res);
+		return -EINVAL;
+	}
+	if (!IS_ALIGNED(dpa, 64)) {
+		dev_dbg(cxlds->dev, "dpa:0x%llx is not 64-byte aligned\n",
+			dpa);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static ssize_t inject_poison_store(struct device *dev,
+				   struct device_attribute *attr,
+				   const char *buf, size_t len)
+{
+	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+	struct cxl_mbox_inject_poison inject;
+	struct cxl_mbox_cmd mbox_cmd;
+	u64 dpa;
+	int rc;
+
+	rc = kstrtou64(buf, 0, &dpa);
+	if (rc)
+		return rc;
+
+	rc = cxl_validate_poison_dpa(cxlds, dpa);
+	if (rc)
+		return rc;
+
+	inject = (struct cxl_mbox_inject_poison) {
+		.address = cpu_to_le64(dpa)
+	};
+	mbox_cmd = (struct cxl_mbox_cmd) {
+		.opcode = CXL_MBOX_OP_INJECT_POISON,
+		.size_in = sizeof(inject),
+		.payload_in = &inject,
+	};
+
+	rc = cxl_internal_send_cmd(cxlds, &mbox_cmd);
+	if (rc)
+		return rc;
+
+	return len;
+}
+static DEVICE_ATTR_WO(inject_poison);
+
 static struct attribute *cxl_memdev_attributes[] = {
 	&dev_attr_serial.attr,
 	&dev_attr_firmware_version.attr,
@@ -149,6 +204,7 @@ static struct attribute *cxl_memdev_attributes[] = {
 	&dev_attr_label_storage_size.attr,
 	&dev_attr_numa_node.attr,
 	&dev_attr_trigger_poison_list.attr,
+	&dev_attr_inject_poison.attr,
 	NULL,
 };
 
@@ -168,6 +224,10 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
 	if (!IS_ENABLED(CONFIG_NUMA) && a == &dev_attr_numa_node.attr)
 		return 0;
 
+	if (!IS_ENABLED(CONFIG_CXL_POISON_INJECT) &&
+	    a == &dev_attr_inject_poison.attr)
+		return 0;
+
 	if (a == &dev_attr_trigger_poison_list.attr) {
 		struct device *dev = kobj_to_dev(kobj);
 
@@ -175,6 +235,13 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
 			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
 			return 0;
 	}
+	if (a == &dev_attr_inject_poison.attr) {
+		struct device *dev = kobj_to_dev(kobj);
+
+		if (!test_bit(CXL_MEM_COMMAND_ID_INJECT_POISON,
+			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
+			return 0;
+	}
 	return a->mode;
 }
 
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 28ba0cd8f2d3..862ca4f4cc06 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -436,6 +436,11 @@ struct cxl_mbox_poison_payload_out {
 #define CXL_POISON_SOURCE_INJECTED	3
 #define CXL_POISON_SOURCE_VENDOR	7
 
+/* Inject & Clear Poison  CXL 3.0 Spec 8.2.9.8.4.2/3 */
+struct cxl_mbox_inject_poison {
+	__le64 address;
+};
+
 /**
  * struct cxl_mem_command - Driver representation of a memory device command
  * @info: Command information as it exists for the UAPI
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 2/6] cxl/memdev: Add support for the Clear Poison mailbox command
  2023-01-19  5:00 [PATCH v2 0/6] cxl: CXL Inject & Clear Poison alison.schofield
  2023-01-19  5:00 ` [PATCH v2 1/6] cxl/memdev: Add support for the Inject Poison mailbox command alison.schofield
@ 2023-01-19  5:00 ` alison.schofield
  2023-01-27 23:56   ` Dan Williams
  2023-01-19  5:00 ` [PATCH v2 3/6] tools/testing/cxl: Mock the Inject " alison.schofield
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: alison.schofield @ 2023-01-19  5:00 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl, Jonathan Cameron

From: Alison Schofield <alison.schofield@intel.com>

CXL devices optionally support the CLEAR POISON mailbox command. Add
a sysfs attribute and memdev driver support for clearing poison.

When a Device Physical Address (DPA) is written to the clear_poison
sysfs attribute, send a clear poison command to the device for the
specified address.

Per the CXL Specification (3.0 8.2.9.8.4.3), after receiving a valid clear
poison request, the device removes the address from the device's Poison
List and writes 0 (zero) for 64 bytes starting at address. If the device
cannot clear poison from the address, it returns a permanent media error
and -ENXIO is returned to the user.

Additionally, and per the spec also, it is not an error to clear poison
of an address that is not poisoned. No error is returned from the device
and the address is not overwritten.

*Implementation note: Although the CXL specification defines the clear
command to accept 64 bytes of 'write-data' to be used when clearing
the poisoned address, this implementation always uses 0 (zeros) for
the write-data.

The clear_poison attribute is only visible for devices supporting the
capability when the kernel is built with CONFIG_CXL_POISON_INJECT.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 Documentation/ABI/testing/sysfs-bus-cxl | 18 ++++++++
 drivers/cxl/core/memdev.c               | 57 ++++++++++++++++++++++++-
 drivers/cxl/cxlmem.h                    |  6 +++
 3 files changed, 80 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index e9c6dd02bd09..7e4897e7bc05 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -438,3 +438,21 @@ Description:
 		inject_poison attribute is only visible for devices supporting
 		the capability. Kconfig option CXL_POISON_INJECT must be on
 		to enable this option. The default is off.
+
+
+What:		/sys/bus/cxl/devices/memX/clear_poison
+Date:		January, 2023
+KernelVersion:	v6.3
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(WO) When a Device Physical Address (DPA) is written to this
+		attribute, the memdev driver sends a clear poison command to
+		the device for the specified address. Clearing poison removes
+		the address from the device's Poison List and writes 0 (zero)
+		for 64 bytes starting at address. It is not an error to clear
+		poison from an address that does not have poison set, and if
+		poison was not set, the address is not overwritten. If the
+		device cannot clear poison from the address, -ENXIO is returned.
+		The clear_poison attribute is only visible for devices
+		supporting the capability. Kconfig option CXL_POISON_INJECT
+		must be on to enable this option. The default is off.
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 226662cf3331..4d86a4565c9e 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -197,6 +197,51 @@ static ssize_t inject_poison_store(struct device *dev,
 }
 static DEVICE_ATTR_WO(inject_poison);
 
+static ssize_t clear_poison_store(struct device *dev,
+				  struct device_attribute *attr,
+				  const char *buf, size_t len)
+{
+	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+	struct cxl_mbox_clear_poison clear;
+	struct cxl_mbox_cmd mbox_cmd;
+	u64 dpa;
+	int rc;
+
+	rc = kstrtou64(buf, 0, &dpa);
+	if (rc)
+		return rc;
+
+	rc = cxl_validate_poison_dpa(cxlds, dpa);
+	if (rc)
+		return rc;
+	/*
+	 * In CXL 3.0 Spec 8.2.9.8.4.3, the Clear Poison mailbox command
+	 * is defined to accept 64 bytes of 'write-data', along with the
+	 * address to clear. The device writes the data into the address
+	 * atomically, while clearing poison if the location is marked as
+	 * being poisoned.
+	 *
+	 * Always use '0' for the write-data.
+	 */
+	clear = (struct cxl_mbox_clear_poison) {
+		.address = cpu_to_le64(dpa)
+	};
+
+	mbox_cmd = (struct cxl_mbox_cmd) {
+		.opcode = CXL_MBOX_OP_CLEAR_POISON,
+		.size_in = sizeof(clear),
+		.payload_in = &clear,
+	};
+
+	rc = cxl_internal_send_cmd(cxlds, &mbox_cmd);
+	if (rc)
+		return rc;
+
+	return len;
+}
+static DEVICE_ATTR_WO(clear_poison);
+
 static struct attribute *cxl_memdev_attributes[] = {
 	&dev_attr_serial.attr,
 	&dev_attr_firmware_version.attr,
@@ -205,6 +250,7 @@ static struct attribute *cxl_memdev_attributes[] = {
 	&dev_attr_numa_node.attr,
 	&dev_attr_trigger_poison_list.attr,
 	&dev_attr_inject_poison.attr,
+	&dev_attr_clear_poison.attr,
 	NULL,
 };
 
@@ -225,7 +271,8 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
 		return 0;
 
 	if (!IS_ENABLED(CONFIG_CXL_POISON_INJECT) &&
-	    a == &dev_attr_inject_poison.attr)
+	    (a == &dev_attr_inject_poison.attr ||
+	     a == &dev_attr_clear_poison.attr))
 		return 0;
 
 	if (a == &dev_attr_trigger_poison_list.attr) {
@@ -242,6 +289,14 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
 			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
 			return 0;
 	}
+	if (a == &dev_attr_clear_poison.attr) {
+		struct device *dev = kobj_to_dev(kobj);
+
+		if (!test_bit(CXL_MEM_COMMAND_ID_CLEAR_POISON,
+			      to_cxl_memdev(dev)->cxlds->enabled_cmds)) {
+			return 0;
+		}
+	}
 	return a->mode;
 }
 
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 862ca4f4cc06..adcbd4a98819 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -441,6 +441,12 @@ struct cxl_mbox_inject_poison {
 	__le64 address;
 };
 
+/* Clear Poison  CXL 3.0 Spec 8.2.9.8.4.3 */
+struct cxl_mbox_clear_poison {
+	__le64 address;
+	u8 write_data[CXL_POISON_LEN_MULT];
+} __packed;
+
 /**
  * struct cxl_mem_command - Driver representation of a memory device command
  * @info: Command information as it exists for the UAPI
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 3/6] tools/testing/cxl: Mock the Inject Poison mailbox command
  2023-01-19  5:00 [PATCH v2 0/6] cxl: CXL Inject & Clear Poison alison.schofield
  2023-01-19  5:00 ` [PATCH v2 1/6] cxl/memdev: Add support for the Inject Poison mailbox command alison.schofield
  2023-01-19  5:00 ` [PATCH v2 2/6] cxl/memdev: Add support for the Clear " alison.schofield
@ 2023-01-19  5:00 ` alison.schofield
  2023-01-23 15:10   ` Jonathan Cameron
  2023-01-19  5:00 ` [PATCH v2 4/6] tools/testing/cxl: Mock the Clear " alison.schofield
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: alison.schofield @ 2023-01-19  5:00 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

Mock the injection of poison by storing the device:address entries in
mock_poison_list[]. Enforce a limit of 8 poison injections per memdev
device and 128 total entries for the cxl_test mock driver.

Introducing the mock_poison[] list here, makes it available for use in
the mock of Clear Poison, and the mock of Get Poison List.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 tools/testing/cxl/test/mem.c | 77 ++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 09dc358bb33b..b0ecdc4e7c87 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -14,6 +14,9 @@
 #define DEV_SIZE SZ_2G
 #define EFFECT(x) (1U << x)
 
+#define MOCK_INJECT_DEV_MAX 8
+#define MOCK_INJECT_TEST_MAX 128
+
 static struct cxl_cel_entry mock_cel[] = {
 	{
 		.opcode = cpu_to_le16(CXL_MBOX_OP_GET_SUPPORTED_LOGS),
@@ -43,6 +46,10 @@ static struct cxl_cel_entry mock_cel[] = {
 		.opcode = cpu_to_le16(CXL_MBOX_OP_GET_POISON),
 		.effect = cpu_to_le16(0),
 	},
+	{
+		.opcode = cpu_to_le16(CXL_MBOX_OP_INJECT_POISON),
+		.effect = cpu_to_le16(0),
+	},
 };
 
 /* See CXL 2.0 Table 181 Get Health Info Output Payload */
@@ -144,6 +151,7 @@ static int mock_id(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
 			cpu_to_le64(SZ_256M / CXL_CAPACITY_MULTIPLIER),
 		.total_capacity =
 			cpu_to_le64(DEV_SIZE / CXL_CAPACITY_MULTIPLIER),
+		.inject_poison_limit = cpu_to_le16(MOCK_INJECT_DEV_MAX),
 	};
 
 	put_unaligned_le24(CXL_POISON_LIST_MAX, id.poison_list_max_mer);
@@ -565,6 +573,11 @@ static int mock_health_info(struct cxl_dev_state *cxlds,
 	return 0;
 }
 
+static struct mock_poison {
+	struct cxl_dev_state *cxlds;
+	u64 dpa;
+} mock_poison_list[MOCK_INJECT_TEST_MAX];
+
 static int mock_get_poison(struct cxl_dev_state *cxlds,
 			   struct cxl_mbox_cmd *cmd)
 {
@@ -593,6 +606,67 @@ static int mock_get_poison(struct cxl_dev_state *cxlds,
 	return 0;
 }
 
+static bool mock_poison_dev_max_injected(struct cxl_dev_state *cxlds)
+{
+	int count = 0;
+
+	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
+		if (mock_poison_list[i].cxlds == cxlds)
+			count++;
+	}
+	return (count >= MOCK_INJECT_DEV_MAX);
+}
+
+static bool mock_poison_add(struct cxl_dev_state *cxlds, u64 dpa)
+{
+	if (mock_poison_dev_max_injected(cxlds)) {
+		dev_dbg(cxlds->dev,
+			"Device poison injection limit has been reached: %d\n",
+			MOCK_INJECT_DEV_MAX);
+		return false;
+	}
+
+	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
+		if (!mock_poison_list[i].cxlds) {
+			mock_poison_list[i].cxlds = cxlds;
+			mock_poison_list[i].dpa = dpa;
+			return true;
+		}
+	}
+	dev_dbg(cxlds->dev,
+		"Mock test poison injection limit has been reached: %d\n",
+		MOCK_INJECT_TEST_MAX);
+
+	return false;
+}
+
+static bool mock_poison_found(struct cxl_dev_state *cxlds, u64 dpa)
+{
+	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
+		if (mock_poison_list[i].cxlds == cxlds &&
+		    mock_poison_list[i].dpa == dpa)
+			return true;
+	}
+	return false;
+}
+
+static int mock_inject_poison(struct cxl_dev_state *cxlds,
+			      struct cxl_mbox_cmd *cmd)
+{
+	struct cxl_mbox_inject_poison *pi = cmd->payload_in;
+	u64 dpa = le64_to_cpu(pi->address);
+
+	if (mock_poison_found(cxlds, dpa)) {
+		/* Not an error to inject poison if already poisoned */
+		dev_dbg(cxlds->dev, "DPA: 0x%llx already poisoned\n", dpa);
+		return 0;
+	}
+	if (!mock_poison_add(cxlds, dpa))
+		return -ENXIO;
+
+	return 0;
+}
+
 static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
 {
 	struct device *dev = cxlds->dev;
@@ -644,6 +718,9 @@ static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *
 	case CXL_MBOX_OP_GET_POISON:
 		rc = mock_get_poison(cxlds, cmd);
 		break;
+	case CXL_MBOX_OP_INJECT_POISON:
+		rc = mock_inject_poison(cxlds, cmd);
+		break;
 	default:
 		break;
 	}
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 4/6] tools/testing/cxl: Mock the Clear Poison mailbox command
  2023-01-19  5:00 [PATCH v2 0/6] cxl: CXL Inject & Clear Poison alison.schofield
                   ` (2 preceding siblings ...)
  2023-01-19  5:00 ` [PATCH v2 3/6] tools/testing/cxl: Mock the Inject " alison.schofield
@ 2023-01-19  5:00 ` alison.schofield
  2023-01-19  5:00 ` [PATCH v2 5/6] tools/testing/cxl: Use injected poison for get poison list alison.schofield
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: alison.schofield @ 2023-01-19  5:00 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl, Jonathan Cameron

From: Alison Schofield <alison.schofield@intel.com>

Mock the clear of poison by deleting the device:address entry from
the mock_poison_list[]. Behave like a real CXL device and do not fail
if the address is not in the poison list, but offer a dev_dbg()
message.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 tools/testing/cxl/test/mem.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index b0ecdc4e7c87..bb0be9fe3fe9 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -50,6 +50,10 @@ static struct cxl_cel_entry mock_cel[] = {
 		.opcode = cpu_to_le16(CXL_MBOX_OP_INJECT_POISON),
 		.effect = cpu_to_le16(0),
 	},
+	{
+		.opcode = cpu_to_le16(CXL_MBOX_OP_CLEAR_POISON),
+		.effect = cpu_to_le16(0),
+	},
 };
 
 /* See CXL 2.0 Table 181 Get Health Info Output Payload */
@@ -667,6 +671,35 @@ static int mock_inject_poison(struct cxl_dev_state *cxlds,
 	return 0;
 }
 
+static bool mock_poison_del(struct cxl_dev_state *cxlds, u64 dpa)
+{
+	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
+		if (mock_poison_list[i].cxlds == cxlds &&
+		    mock_poison_list[i].dpa == dpa) {
+			mock_poison_list[i].cxlds = NULL;
+			return true;
+		}
+	}
+	return false;
+}
+
+static int mock_clear_poison(struct cxl_dev_state *cxlds,
+			     struct cxl_mbox_cmd *cmd)
+{
+	struct cxl_mbox_clear_poison *pi = cmd->payload_in;
+	u64 dpa = le64_to_cpu(pi->address);
+
+	/*
+	 * A real CXL device will write pi->write_data to the address
+	 * being cleared. In this mock, just delete this address from
+	 * the mock poison list.
+	 */
+	if (!mock_poison_del(cxlds, dpa))
+		dev_dbg(cxlds->dev, "DPA: 0x%llx not in poison list\n", dpa);
+
+	return 0;
+}
+
 static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
 {
 	struct device *dev = cxlds->dev;
@@ -721,6 +754,9 @@ static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *
 	case CXL_MBOX_OP_INJECT_POISON:
 		rc = mock_inject_poison(cxlds, cmd);
 		break;
+	case CXL_MBOX_OP_CLEAR_POISON:
+		rc = mock_clear_poison(cxlds, cmd);
+		break;
 	default:
 		break;
 	}
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 5/6] tools/testing/cxl: Use injected poison for get poison list
  2023-01-19  5:00 [PATCH v2 0/6] cxl: CXL Inject & Clear Poison alison.schofield
                   ` (3 preceding siblings ...)
  2023-01-19  5:00 ` [PATCH v2 4/6] tools/testing/cxl: Mock the Clear " alison.schofield
@ 2023-01-19  5:00 ` alison.schofield
  2023-01-23 15:16   ` Jonathan Cameron
  2023-01-19  5:00 ` [PATCH v2 6/6] tools/testing/cxl: Add a param to test poison injection limits alison.schofield
  2023-01-23 17:13 ` [PATCH v2 0/6] cxl: CXL Inject & Clear Poison Jonathan Cameron
  6 siblings, 1 reply; 23+ messages in thread
From: alison.schofield @ 2023-01-19  5:00 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

Prior to poison inject support, the mock of 'Get Poison List'
returned a poison list containing a single mocked error record.

Now, following the addition of poison inject and clear support,
use the mock_poison_list[] as the source of records for 'Get
Poison List' requests.

This supports confirmation of inject and clear poison commands.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 tools/testing/cxl/test/mem.c | 56 ++++++++++++++++++++++++------------
 1 file changed, 38 insertions(+), 18 deletions(-)

diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index bb0be9fe3fe9..14d74bfb3124 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -582,31 +582,51 @@ static struct mock_poison {
 	u64 dpa;
 } mock_poison_list[MOCK_INJECT_TEST_MAX];
 
+static struct cxl_mbox_poison_payload_out
+*cxl_get_injected_po(struct cxl_dev_state *cxlds, u64 offset, u64 length)
+{
+	struct cxl_mbox_poison_payload_out *po;
+	int nr_records = 0;
+	u64 dpa;
+
+	po = kzalloc(struct_size(po, record, MOCK_INJECT_DEV_MAX), GFP_KERNEL);
+	if (!po)
+		return NULL;
+
+	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
+		if (mock_poison_list[i].cxlds != cxlds)
+			continue;
+		if (mock_poison_list[i].dpa < offset ||
+		    mock_poison_list[i].dpa > offset + length - 1)
+			continue;
+
+		dpa = mock_poison_list[i].dpa + CXL_POISON_SOURCE_INJECTED;
+		po->record[nr_records].address = cpu_to_le64(dpa);
+		po->record[nr_records].length = cpu_to_le32(1);
+		nr_records++;
+		if (nr_records == MOCK_INJECT_DEV_MAX)
+			break;
+	}
+
+	/* Always return count, even when zero */
+	po->count = cpu_to_le16(nr_records);
+
+	return po;
+}
+
 static int mock_get_poison(struct cxl_dev_state *cxlds,
 			   struct cxl_mbox_cmd *cmd)
 {
 	struct cxl_mbox_poison_payload_in *pi = cmd->payload_in;
+	struct cxl_mbox_poison_payload_out *po;
+	u64 offset = le64_to_cpu(pi->offset);
+	u64 length = le64_to_cpu(pi->length);
 
-	/* Mock one poison record at pi.offset for 64 bytes */
-	struct {
-		struct cxl_mbox_poison_payload_out po;
-		struct cxl_poison_record record;
-	} mock_plist = {
-		.po = {
-			.count = cpu_to_le16(1),
-		},
-		.record = {
-			.length = cpu_to_le32(1),
-			.address = cpu_to_le64(pi->offset +
-					       CXL_POISON_SOURCE_INJECTED),
-		},
-	};
+	po = cxl_get_injected_po(cxlds, offset, length);
 
-	if (cmd->size_out < sizeof(mock_plist))
-		return -EINVAL;
+	memcpy(cmd->payload_out, po, struct_size(po, record, po->count));
+	cmd->size_out = struct_size(po, record, po->count);
 
-	memcpy(cmd->payload_out, &mock_plist, sizeof(mock_plist));
-	cmd->size_out = sizeof(mock_plist);
 	return 0;
 }
 
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 6/6] tools/testing/cxl: Add a param to test poison injection limits
  2023-01-19  5:00 [PATCH v2 0/6] cxl: CXL Inject & Clear Poison alison.schofield
                   ` (4 preceding siblings ...)
  2023-01-19  5:00 ` [PATCH v2 5/6] tools/testing/cxl: Use injected poison for get poison list alison.schofield
@ 2023-01-19  5:00 ` alison.schofield
  2023-01-23 15:28   ` Jonathan Cameron
  2023-01-23 17:13 ` [PATCH v2 0/6] cxl: CXL Inject & Clear Poison Jonathan Cameron
  6 siblings, 1 reply; 23+ messages in thread
From: alison.schofield @ 2023-01-19  5:00 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

CXL devices may report a maximum number of addresses that a device
allows to be poisoned using poison injection. When cxl_test creates
mock CXL memory devices, it uses MOCK_INJECT_DEV_MAX (8) for all
mocked memdevs.

Add a module parameter, param_inject_dev_max to module cxl_mock_mem
so that testers can set custom injection limits.

Example: Set MOCK_INJECT_DEV_MAX to 7
$ echo 7 > /sys/module/cxl_mock_mem/parameters/param_inject_dev_max

A simple usage model is to set it before running a test in order to
emulate a device's poison handling. Changing the max value in the
midst of inject, clear, and get poison flows, needs to be carefully
managed by the user.

For example, if the max is reduced after more than max errors are
injected, those errors will remain in the poison list and may need
to be cleared out even though a request to read the poison list will
not show more than max. The driver does not clear out the errors that
are over max when this parameter changes.

Suggested-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 tools/testing/cxl/test/mem.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 14d74bfb3124..5b938283c1d7 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -17,6 +17,8 @@
 #define MOCK_INJECT_DEV_MAX 8
 #define MOCK_INJECT_TEST_MAX 128
 
+int param_inject_dev_max = MOCK_INJECT_DEV_MAX;
+
 static struct cxl_cel_entry mock_cel[] = {
 	{
 		.opcode = cpu_to_le16(CXL_MBOX_OP_GET_SUPPORTED_LOGS),
@@ -155,7 +157,7 @@ static int mock_id(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
 			cpu_to_le64(SZ_256M / CXL_CAPACITY_MULTIPLIER),
 		.total_capacity =
 			cpu_to_le64(DEV_SIZE / CXL_CAPACITY_MULTIPLIER),
-		.inject_poison_limit = cpu_to_le16(MOCK_INJECT_DEV_MAX),
+		.inject_poison_limit = cpu_to_le16(param_inject_dev_max),
 	};
 
 	put_unaligned_le24(CXL_POISON_LIST_MAX, id.poison_list_max_mer);
@@ -589,7 +591,7 @@ static struct cxl_mbox_poison_payload_out
 	int nr_records = 0;
 	u64 dpa;
 
-	po = kzalloc(struct_size(po, record, MOCK_INJECT_DEV_MAX), GFP_KERNEL);
+	po = kzalloc(struct_size(po, record, param_inject_dev_max), GFP_KERNEL);
 	if (!po)
 		return NULL;
 
@@ -604,7 +606,7 @@ static struct cxl_mbox_poison_payload_out
 		po->record[nr_records].address = cpu_to_le64(dpa);
 		po->record[nr_records].length = cpu_to_le32(1);
 		nr_records++;
-		if (nr_records == MOCK_INJECT_DEV_MAX)
+		if (nr_records == param_inject_dev_max)
 			break;
 	}
 
@@ -638,7 +640,7 @@ static bool mock_poison_dev_max_injected(struct cxl_dev_state *cxlds)
 		if (mock_poison_list[i].cxlds == cxlds)
 			count++;
 	}
-	return (count >= MOCK_INJECT_DEV_MAX);
+	return (count >= param_inject_dev_max);
 }
 
 static bool mock_poison_add(struct cxl_dev_state *cxlds, u64 dpa)
@@ -909,6 +911,9 @@ static struct platform_driver cxl_mock_mem_driver = {
 	},
 };
 
+module_param(param_inject_dev_max, int, 0644);
+MODULE_PARM_DESC(param_inject_dev_max, "Maximum number of physical addresses that can be poisoned in the mock device. The default is 8. The cxl_test driver limit is 128 across all mock devices.");
+
 module_platform_driver(cxl_mock_mem_driver);
 MODULE_LICENSE("GPL v2");
 MODULE_IMPORT_NS(CXL);
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 3/6] tools/testing/cxl: Mock the Inject Poison mailbox command
  2023-01-19  5:00 ` [PATCH v2 3/6] tools/testing/cxl: Mock the Inject " alison.schofield
@ 2023-01-23 15:10   ` Jonathan Cameron
  2023-01-24  0:06     ` Alison Schofield
  0 siblings, 1 reply; 23+ messages in thread
From: Jonathan Cameron @ 2023-01-23 15:10 UTC (permalink / raw)
  To: alison.schofield
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Wed, 18 Jan 2023 21:00:18 -0800
alison.schofield@intel.com wrote:

> From: Alison Schofield <alison.schofield@intel.com>
> 
> Mock the injection of poison by storing the device:address entries in
> mock_poison_list[]. Enforce a limit of 8 poison injections per memdev
> device and 128 total entries for the cxl_test mock driver.
> 
> Introducing the mock_poison[] list here, makes it available for use in
> the mock of Clear Poison, and the mock of Get Poison List.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>

I'm not that keen on the global nature of the list, but maybe that's
enough for this test for now at least. Nothing stops us making it more
flexible later.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  tools/testing/cxl/test/mem.c | 77 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 77 insertions(+)
> 
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index 09dc358bb33b..b0ecdc4e7c87 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -14,6 +14,9 @@
>  #define DEV_SIZE SZ_2G
>  #define EFFECT(x) (1U << x)
>  
> +#define MOCK_INJECT_DEV_MAX 8
> +#define MOCK_INJECT_TEST_MAX 128
> +
>  static struct cxl_cel_entry mock_cel[] = {
>  	{
>  		.opcode = cpu_to_le16(CXL_MBOX_OP_GET_SUPPORTED_LOGS),
> @@ -43,6 +46,10 @@ static struct cxl_cel_entry mock_cel[] = {
>  		.opcode = cpu_to_le16(CXL_MBOX_OP_GET_POISON),
>  		.effect = cpu_to_le16(0),
>  	},
> +	{
> +		.opcode = cpu_to_le16(CXL_MBOX_OP_INJECT_POISON),
> +		.effect = cpu_to_le16(0),
> +	},
>  };
>  
>  /* See CXL 2.0 Table 181 Get Health Info Output Payload */
> @@ -144,6 +151,7 @@ static int mock_id(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
>  			cpu_to_le64(SZ_256M / CXL_CAPACITY_MULTIPLIER),
>  		.total_capacity =
>  			cpu_to_le64(DEV_SIZE / CXL_CAPACITY_MULTIPLIER),
> +		.inject_poison_limit = cpu_to_le16(MOCK_INJECT_DEV_MAX),
>  	};
>  
>  	put_unaligned_le24(CXL_POISON_LIST_MAX, id.poison_list_max_mer);
> @@ -565,6 +573,11 @@ static int mock_health_info(struct cxl_dev_state *cxlds,
>  	return 0;
>  }
>  
> +static struct mock_poison {
> +	struct cxl_dev_state *cxlds;
> +	u64 dpa;
> +} mock_poison_list[MOCK_INJECT_TEST_MAX];
> +
>  static int mock_get_poison(struct cxl_dev_state *cxlds,
>  			   struct cxl_mbox_cmd *cmd)
>  {
> @@ -593,6 +606,67 @@ static int mock_get_poison(struct cxl_dev_state *cxlds,
>  	return 0;
>  }
>  
> +static bool mock_poison_dev_max_injected(struct cxl_dev_state *cxlds)
> +{
> +	int count = 0;
> +
> +	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
> +		if (mock_poison_list[i].cxlds == cxlds)
> +			count++;
> +	}
> +	return (count >= MOCK_INJECT_DEV_MAX);
> +}
> +
> +static bool mock_poison_add(struct cxl_dev_state *cxlds, u64 dpa)
> +{
> +	if (mock_poison_dev_max_injected(cxlds)) {
> +		dev_dbg(cxlds->dev,
> +			"Device poison injection limit has been reached: %d\n",
> +			MOCK_INJECT_DEV_MAX);
> +		return false;
> +	}
> +
> +	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
> +		if (!mock_poison_list[i].cxlds) {
> +			mock_poison_list[i].cxlds = cxlds;
> +			mock_poison_list[i].dpa = dpa;
> +			return true;
> +		}
> +	}
> +	dev_dbg(cxlds->dev,
> +		"Mock test poison injection limit has been reached: %d\n",
> +		MOCK_INJECT_TEST_MAX);
> +
> +	return false;
> +}
> +
> +static bool mock_poison_found(struct cxl_dev_state *cxlds, u64 dpa)
> +{
> +	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
> +		if (mock_poison_list[i].cxlds == cxlds &&
> +		    mock_poison_list[i].dpa == dpa)
> +			return true;
> +	}
> +	return false;
> +}
> +
> +static int mock_inject_poison(struct cxl_dev_state *cxlds,
> +			      struct cxl_mbox_cmd *cmd)
> +{
> +	struct cxl_mbox_inject_poison *pi = cmd->payload_in;
> +	u64 dpa = le64_to_cpu(pi->address);
> +
> +	if (mock_poison_found(cxlds, dpa)) {
> +		/* Not an error to inject poison if already poisoned */
> +		dev_dbg(cxlds->dev, "DPA: 0x%llx already poisoned\n", dpa);
> +		return 0;
> +	}
> +	if (!mock_poison_add(cxlds, dpa))
> +		return -ENXIO;
> +
> +	return 0;
> +}
> +
>  static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
>  {
>  	struct device *dev = cxlds->dev;
> @@ -644,6 +718,9 @@ static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *
>  	case CXL_MBOX_OP_GET_POISON:
>  		rc = mock_get_poison(cxlds, cmd);
>  		break;
> +	case CXL_MBOX_OP_INJECT_POISON:
> +		rc = mock_inject_poison(cxlds, cmd);
> +		break;
>  	default:
>  		break;
>  	}


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 5/6] tools/testing/cxl: Use injected poison for get poison list
  2023-01-19  5:00 ` [PATCH v2 5/6] tools/testing/cxl: Use injected poison for get poison list alison.schofield
@ 2023-01-23 15:16   ` Jonathan Cameron
  2023-01-24  0:24     ` Alison Schofield
  0 siblings, 1 reply; 23+ messages in thread
From: Jonathan Cameron @ 2023-01-23 15:16 UTC (permalink / raw)
  To: alison.schofield
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Wed, 18 Jan 2023 21:00:20 -0800
alison.schofield@intel.com wrote:

> From: Alison Schofield <alison.schofield@intel.com>
> 
> Prior to poison inject support, the mock of 'Get Poison List'
> returned a poison list containing a single mocked error record.
> 
> Now, following the addition of poison inject and clear support,
> use the mock_poison_list[] as the source of records for 'Get
> Poison List' requests.
> 
> This supports confirmation of inject and clear poison commands.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
A comment inline.  Either way I'm fine with this.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  tools/testing/cxl/test/mem.c | 56 ++++++++++++++++++++++++------------
>  1 file changed, 38 insertions(+), 18 deletions(-)
> 
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index bb0be9fe3fe9..14d74bfb3124 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -582,31 +582,51 @@ static struct mock_poison {
>  	u64 dpa;
>  } mock_poison_list[MOCK_INJECT_TEST_MAX];
>  
> +static struct cxl_mbox_poison_payload_out
> +*cxl_get_injected_po(struct cxl_dev_state *cxlds, u64 offset, u64 length)
> +{
> +	struct cxl_mbox_poison_payload_out *po;
> +	int nr_records = 0;
> +	u64 dpa;
> +
> +	po = kzalloc(struct_size(po, record, MOCK_INJECT_DEV_MAX), GFP_KERNEL);
> +	if (!po)
> +		return NULL;
> +
> +	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
> +		if (mock_poison_list[i].cxlds != cxlds)
> +			continue;
> +		if (mock_poison_list[i].dpa < offset ||
> +		    mock_poison_list[i].dpa > offset + length - 1)

I was thinking that we could move this to the add side of things, but
perhaps it actually makes sense in both code paths?

> +			continue;
> +
> +		dpa = mock_poison_list[i].dpa + CXL_POISON_SOURCE_INJECTED;
> +		po->record[nr_records].address = cpu_to_le64(dpa);
> +		po->record[nr_records].length = cpu_to_le32(1);
> +		nr_records++;
> +		if (nr_records == MOCK_INJECT_DEV_MAX)
> +			break;
> +	}
> +
> +	/* Always return count, even when zero */
> +	po->count = cpu_to_le16(nr_records);
> +
> +	return po;
> +}
> +


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/6] tools/testing/cxl: Add a param to test poison injection limits
  2023-01-19  5:00 ` [PATCH v2 6/6] tools/testing/cxl: Add a param to test poison injection limits alison.schofield
@ 2023-01-23 15:28   ` Jonathan Cameron
  2023-01-23 23:57     ` Alison Schofield
  0 siblings, 1 reply; 23+ messages in thread
From: Jonathan Cameron @ 2023-01-23 15:28 UTC (permalink / raw)
  To: alison.schofield
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Wed, 18 Jan 2023 21:00:21 -0800
alison.schofield@intel.com wrote:

> From: Alison Schofield <alison.schofield@intel.com>
> 
> CXL devices may report a maximum number of addresses that a device
> allows to be poisoned using poison injection. When cxl_test creates
> mock CXL memory devices, it uses MOCK_INJECT_DEV_MAX (8) for all
> mocked memdevs.
> 
> Add a module parameter, param_inject_dev_max to module cxl_mock_mem
> so that testers can set custom injection limits.
> 
> Example: Set MOCK_INJECT_DEV_MAX to 7
> $ echo 7 > /sys/module/cxl_mock_mem/parameters/param_inject_dev_max
> 
> A simple usage model is to set it before running a test in order to
> emulate a device's poison handling. Changing the max value in the
> midst of inject, clear, and get poison flows, needs to be carefully
> managed by the user.
> 
> For example, if the max is reduced after more than max errors are
> injected, those errors will remain in the poison list and may need
> to be cleared out even though a request to read the poison list will
> not show more than max. The driver does not clear out the errors that
> are over max when this parameter changes.
> 
> Suggested-by: Dave Jiang <dave.jiang@intel.com>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>

Seems sensible. I was thinking of doing something similar in the QEMU
code as it's tedious injecting lots of errors just to make it overflow.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

I did some basic testing of the non mock parts of this against my
latest qemu branch.   Mostly works fine but I need to think a little
about how to handle clears of part of a larger poison entry and potential
to trigger list overflow on a clear.
You avoid that problem here by only dealing with 64 byte entries which
is sensible for the mocking driver.

> ---
>  tools/testing/cxl/test/mem.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index 14d74bfb3124..5b938283c1d7 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -17,6 +17,8 @@
>  #define MOCK_INJECT_DEV_MAX 8
>  #define MOCK_INJECT_TEST_MAX 128
>  
> +int param_inject_dev_max = MOCK_INJECT_DEV_MAX;
> +
>  static struct cxl_cel_entry mock_cel[] = {
>  	{
>  		.opcode = cpu_to_le16(CXL_MBOX_OP_GET_SUPPORTED_LOGS),
> @@ -155,7 +157,7 @@ static int mock_id(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
>  			cpu_to_le64(SZ_256M / CXL_CAPACITY_MULTIPLIER),
>  		.total_capacity =
>  			cpu_to_le64(DEV_SIZE / CXL_CAPACITY_MULTIPLIER),
> -		.inject_poison_limit = cpu_to_le16(MOCK_INJECT_DEV_MAX),
> +		.inject_poison_limit = cpu_to_le16(param_inject_dev_max),
>  	};
>  
>  	put_unaligned_le24(CXL_POISON_LIST_MAX, id.poison_list_max_mer);
> @@ -589,7 +591,7 @@ static struct cxl_mbox_poison_payload_out
>  	int nr_records = 0;
>  	u64 dpa;
>  
> -	po = kzalloc(struct_size(po, record, MOCK_INJECT_DEV_MAX), GFP_KERNEL);
> +	po = kzalloc(struct_size(po, record, param_inject_dev_max), GFP_KERNEL);
>  	if (!po)
>  		return NULL;
>  
> @@ -604,7 +606,7 @@ static struct cxl_mbox_poison_payload_out
>  		po->record[nr_records].address = cpu_to_le64(dpa);
>  		po->record[nr_records].length = cpu_to_le32(1);
>  		nr_records++;
> -		if (nr_records == MOCK_INJECT_DEV_MAX)
> +		if (nr_records == param_inject_dev_max)
>  			break;
>  	}
>  
> @@ -638,7 +640,7 @@ static bool mock_poison_dev_max_injected(struct cxl_dev_state *cxlds)
>  		if (mock_poison_list[i].cxlds == cxlds)
>  			count++;
>  	}
> -	return (count >= MOCK_INJECT_DEV_MAX);
> +	return (count >= param_inject_dev_max);
>  }
>  
>  static bool mock_poison_add(struct cxl_dev_state *cxlds, u64 dpa)
> @@ -909,6 +911,9 @@ static struct platform_driver cxl_mock_mem_driver = {
>  	},
>  };
>  
> +module_param(param_inject_dev_max, int, 0644);
> +MODULE_PARM_DESC(param_inject_dev_max, "Maximum number of physical addresses that can be poisoned in the mock device. The default is 8. The cxl_test driver limit is 128 across all mock devices.");
Perhaps: Maximum number of 64 byte blocks that can be poisoned...
> +
>  module_platform_driver(cxl_mock_mem_driver);
>  MODULE_LICENSE("GPL v2");
>  MODULE_IMPORT_NS(CXL);


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/6] cxl: CXL Inject & Clear Poison
  2023-01-19  5:00 [PATCH v2 0/6] cxl: CXL Inject & Clear Poison alison.schofield
                   ` (5 preceding siblings ...)
  2023-01-19  5:00 ` [PATCH v2 6/6] tools/testing/cxl: Add a param to test poison injection limits alison.schofield
@ 2023-01-23 17:13 ` Jonathan Cameron
  2023-01-23 23:42   ` Alison Schofield
  6 siblings, 1 reply; 23+ messages in thread
From: Jonathan Cameron @ 2023-01-23 17:13 UTC (permalink / raw)
  To: alison.schofield
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Wed, 18 Jan 2023 21:00:15 -0800
alison.schofield@intel.com wrote:

> From: Alison Schofield <alison.schofield@intel.com>
> 
> Built on cxl/next plus Patchset: CXL Poison List Retrieval & Tracing:
> https://lore.kernel.org/linux-cxl/de11785ff05844299b40b100f8e0f56c7eef7f08.1674070170.git.alison.schofield@intel.com/

Only tangentially relevant, but I've only just registered
as a result of getting a lot of 0 timestamps (which is what
you return if the timestamp base is unknown) that I don't
think we currently ever set the EP timestamp.

Recommendation in the spec (8.2.9.4.2) is:
"It is recommended that the host set hte timestamp
after ever Conventional or CXL Reset"

I'd go further and assume that if we are doing native error
handling then it's up to the OS to initialize the timestamp.

Also relevant to Ira's series as events are timestamped.
Currently Ira's QEMU code doesn't take this subtlety into
account (poison doesn't either - but I have patches).

Jonathan


> 
> Changes in v2:
> - Add Jonathan Reviewed-by tags to Patches 1,2,4
> - Clean up input payload structs for both inject and clear (Dan)
> - Commit message cleanups, including spec references (Dave)
> - Use CXL_POISON_LEN_MULT in define of clear write data
> - Use IS_ALIGNED() for 64byte align check (Dan)
> - Add Kconfig CXL_POISON_INJECT  (Dan)
> - Trivial space cleanup (Jonathan)
> - Doc/ABI cleanup (Dave, Dan)
> - Mock: Only use injected errors for get poison list
> - Mock: Use 'POISONLMT -ENXIO' text from CMD_CMD_RC_TABLE  (Jonathan)
> - Mock: Add Patch 6/6: A module param to mock device inject limit
> 
> Link to v1: https://lore.kernel.org/linux-cxl/cover.1669781852.git.alison.schofield@intel.com/
> 
> Introducing Inject and Clear Poison support for CXL Devices.
> 
> These are optional commands, meaning not all CXL devices must support
> them. The sysfs attributes, inject_poison and clear_poison, are only
> visible for devices reporting support of the capability and when the
> kernel Kconfig option CONFIG_CXL_POISON_INJECT is on. (Default: off)
> 
> Example:
> # echo 0x40000000 > /sys/bus/cxl/devices/mem1/inject_poison
> # echo 1 > /sys/bus/cxl/devices/mem1/trigger_poison_list
> 
> cxl_poison: memdev=mem1 pcidev=cxl_mem.1 region= region_uuid=00000000-0000-0000-0000-000000000000 hpa=0xffffffffffffffff dpa=0x40000000 length=0x40 source=Injected flags= overflow_time=0
> 
> 
> Alison Schofield (6):
>   cxl/memdev: Add support for the Inject Poison mailbox command
>   cxl/memdev: Add support for the Clear Poison mailbox command
>   tools/testing/cxl: Mock the Inject Poison mailbox command
>   tools/testing/cxl: Mock the Clear Poison mailbox command
>   tools/testing/cxl: Use injected poison for get poison list
>   tools/testing/cxl: Add a param to test poison injection limits
> 
>  Documentation/ABI/testing/sysfs-bus-cxl |  40 ++++++
>  drivers/cxl/Kconfig                     |  10 ++
>  drivers/cxl/core/memdev.c               | 122 ++++++++++++++++
>  drivers/cxl/cxlmem.h                    |  11 ++
>  tools/testing/cxl/test/mem.c            | 178 +++++++++++++++++++++---
>  5 files changed, 341 insertions(+), 20 deletions(-)
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/6] cxl: CXL Inject & Clear Poison
  2023-01-23 17:13 ` [PATCH v2 0/6] cxl: CXL Inject & Clear Poison Jonathan Cameron
@ 2023-01-23 23:42   ` Alison Schofield
  2023-01-24 10:21     ` Jonathan Cameron
  0 siblings, 1 reply; 23+ messages in thread
From: Alison Schofield @ 2023-01-23 23:42 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Mon, Jan 23, 2023 at 05:13:01PM +0000, Jonathan Cameron wrote:
> On Wed, 18 Jan 2023 21:00:15 -0800
> alison.schofield@intel.com wrote:
> 
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > Built on cxl/next plus Patchset: CXL Poison List Retrieval & Tracing:
> > https://lore.kernel.org/linux-cxl/de11785ff05844299b40b100f8e0f56c7eef7f08.1674070170.git.alison.schofield@intel.com/
> 
> Only tangentially relevant, but I've only just registered
> as a result of getting a lot of 0 timestamps (which is what
> you return if the timestamp base is unknown) that I don't
> think we currently ever set the EP timestamp.
> 
> Recommendation in the spec (8.2.9.4.2) is:
> "It is recommended that the host set hte timestamp
> after ever Conventional or CXL Reset"
> 
> I'd go further and assume that if we are doing native error
> handling then it's up to the OS to initialize the timestamp.
> 
> Also relevant to Ira's series as events are timestamped.
> Currently Ira's QEMU code doesn't take this subtlety into
> account (poison doesn't either - but I have patches).
> 
> Jonathan
> 

Jonathan,

I hadn't seen the Set Timestamp cmd, but I think we are OK with
Get Poison List and it's overflow_t reporting, it does not use
a relative timestamp, but absolute since Jan-1970.

Table 8-106 says: 
Overflow Timestamp: The time that the device determined the poison
list overflowed. This field is only valid if the overflow indicator is set. The
number of unsigned nanoseconds that have elapsed since midnight, 01-
Jan-1970, UTC. If the device does not have a valid timestamp, return 0.

Alison


> 
> > 
> > Changes in v2:
> > - Add Jonathan Reviewed-by tags to Patches 1,2,4
> > - Clean up input payload structs for both inject and clear (Dan)
> > - Commit message cleanups, including spec references (Dave)
> > - Use CXL_POISON_LEN_MULT in define of clear write data
> > - Use IS_ALIGNED() for 64byte align check (Dan)
> > - Add Kconfig CXL_POISON_INJECT  (Dan)
> > - Trivial space cleanup (Jonathan)
> > - Doc/ABI cleanup (Dave, Dan)
> > - Mock: Only use injected errors for get poison list
> > - Mock: Use 'POISONLMT -ENXIO' text from CMD_CMD_RC_TABLE  (Jonathan)
> > - Mock: Add Patch 6/6: A module param to mock device inject limit
> > 
> > Link to v1: https://lore.kernel.org/linux-cxl/cover.1669781852.git.alison.schofield@intel.com/
> > 
> > Introducing Inject and Clear Poison support for CXL Devices.
> > 
> > These are optional commands, meaning not all CXL devices must support
> > them. The sysfs attributes, inject_poison and clear_poison, are only
> > visible for devices reporting support of the capability and when the
> > kernel Kconfig option CONFIG_CXL_POISON_INJECT is on. (Default: off)
> > 
> > Example:
> > # echo 0x40000000 > /sys/bus/cxl/devices/mem1/inject_poison
> > # echo 1 > /sys/bus/cxl/devices/mem1/trigger_poison_list
> > 
> > cxl_poison: memdev=mem1 pcidev=cxl_mem.1 region= region_uuid=00000000-0000-0000-0000-000000000000 hpa=0xffffffffffffffff dpa=0x40000000 length=0x40 source=Injected flags= overflow_time=0
> > 
> > 
> > Alison Schofield (6):
> >   cxl/memdev: Add support for the Inject Poison mailbox command
> >   cxl/memdev: Add support for the Clear Poison mailbox command
> >   tools/testing/cxl: Mock the Inject Poison mailbox command
> >   tools/testing/cxl: Mock the Clear Poison mailbox command
> >   tools/testing/cxl: Use injected poison for get poison list
> >   tools/testing/cxl: Add a param to test poison injection limits
> > 
> >  Documentation/ABI/testing/sysfs-bus-cxl |  40 ++++++
> >  drivers/cxl/Kconfig                     |  10 ++
> >  drivers/cxl/core/memdev.c               | 122 ++++++++++++++++
> >  drivers/cxl/cxlmem.h                    |  11 ++
> >  tools/testing/cxl/test/mem.c            | 178 +++++++++++++++++++++---
> >  5 files changed, 341 insertions(+), 20 deletions(-)
> > 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/6] tools/testing/cxl: Add a param to test poison injection limits
  2023-01-23 15:28   ` Jonathan Cameron
@ 2023-01-23 23:57     ` Alison Schofield
  0 siblings, 0 replies; 23+ messages in thread
From: Alison Schofield @ 2023-01-23 23:57 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Mon, Jan 23, 2023 at 03:28:31PM +0000, Jonathan Cameron wrote:
> On Wed, 18 Jan 2023 21:00:21 -0800
> alison.schofield@intel.com wrote:
> 
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > CXL devices may report a maximum number of addresses that a device
> > allows to be poisoned using poison injection. When cxl_test creates
> > mock CXL memory devices, it uses MOCK_INJECT_DEV_MAX (8) for all
> > mocked memdevs.
> > 
> > Add a module parameter, param_inject_dev_max to module cxl_mock_mem
> > so that testers can set custom injection limits.
> > 
> > Example: Set MOCK_INJECT_DEV_MAX to 7
> > $ echo 7 > /sys/module/cxl_mock_mem/parameters/param_inject_dev_max
> > 
> > A simple usage model is to set it before running a test in order to
> > emulate a device's poison handling. Changing the max value in the
> > midst of inject, clear, and get poison flows, needs to be carefully
> > managed by the user.
> > 
> > For example, if the max is reduced after more than max errors are
> > injected, those errors will remain in the poison list and may need
> > to be cleared out even though a request to read the poison list will
> > not show more than max. The driver does not clear out the errors that
> > are over max when this parameter changes.
> > 
> > Suggested-by: Dave Jiang <dave.jiang@intel.com>
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> 
> Seems sensible. I was thinking of doing something similar in the QEMU
> code as it's tedious injecting lots of errors just to make it overflow.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> I did some basic testing of the non mock parts of this against my
> latest qemu branch.   Mostly works fine but I need to think a little
> about how to handle clears of part of a larger poison entry and potential
> to trigger list overflow on a clear.
> You avoid that problem here by only dealing with 64 byte entries which
> is sensible for the mocking driver.
> 

Thanks for the review Jonathan.
I did get feedback, in diff forum, to handle the changing of max_errors
more cleanly, ie. use module callback or make it a sysfs attribute.
I'll be revving for that.

Alison

> > ---
> >  tools/testing/cxl/test/mem.c | 13 +++++++++----
> >  1 file changed, 9 insertions(+), 4 deletions(-)
> > 
> > diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> > index 14d74bfb3124..5b938283c1d7 100644
> > --- a/tools/testing/cxl/test/mem.c
> > +++ b/tools/testing/cxl/test/mem.c
> > @@ -17,6 +17,8 @@
> >  #define MOCK_INJECT_DEV_MAX 8
> >  #define MOCK_INJECT_TEST_MAX 128
> >  
> > +int param_inject_dev_max = MOCK_INJECT_DEV_MAX;
> > +
> >  static struct cxl_cel_entry mock_cel[] = {
> >  	{
> >  		.opcode = cpu_to_le16(CXL_MBOX_OP_GET_SUPPORTED_LOGS),
> > @@ -155,7 +157,7 @@ static int mock_id(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
> >  			cpu_to_le64(SZ_256M / CXL_CAPACITY_MULTIPLIER),
> >  		.total_capacity =
> >  			cpu_to_le64(DEV_SIZE / CXL_CAPACITY_MULTIPLIER),
> > -		.inject_poison_limit = cpu_to_le16(MOCK_INJECT_DEV_MAX),
> > +		.inject_poison_limit = cpu_to_le16(param_inject_dev_max),
> >  	};
> >  
> >  	put_unaligned_le24(CXL_POISON_LIST_MAX, id.poison_list_max_mer);
> > @@ -589,7 +591,7 @@ static struct cxl_mbox_poison_payload_out
> >  	int nr_records = 0;
> >  	u64 dpa;
> >  
> > -	po = kzalloc(struct_size(po, record, MOCK_INJECT_DEV_MAX), GFP_KERNEL);
> > +	po = kzalloc(struct_size(po, record, param_inject_dev_max), GFP_KERNEL);
> >  	if (!po)
> >  		return NULL;
> >  
> > @@ -604,7 +606,7 @@ static struct cxl_mbox_poison_payload_out
> >  		po->record[nr_records].address = cpu_to_le64(dpa);
> >  		po->record[nr_records].length = cpu_to_le32(1);
> >  		nr_records++;
> > -		if (nr_records == MOCK_INJECT_DEV_MAX)
> > +		if (nr_records == param_inject_dev_max)
> >  			break;
> >  	}
> >  
> > @@ -638,7 +640,7 @@ static bool mock_poison_dev_max_injected(struct cxl_dev_state *cxlds)
> >  		if (mock_poison_list[i].cxlds == cxlds)
> >  			count++;
> >  	}
> > -	return (count >= MOCK_INJECT_DEV_MAX);
> > +	return (count >= param_inject_dev_max);
> >  }
> >  
> >  static bool mock_poison_add(struct cxl_dev_state *cxlds, u64 dpa)
> > @@ -909,6 +911,9 @@ static struct platform_driver cxl_mock_mem_driver = {
> >  	},
> >  };
> >  
> > +module_param(param_inject_dev_max, int, 0644);
> > +MODULE_PARM_DESC(param_inject_dev_max, "Maximum number of physical addresses that can be poisoned in the mock device. The default is 8. The cxl_test driver limit is 128 across all mock devices.");
> Perhaps: Maximum number of 64 byte blocks that can be poisoned...
> > +
> >  module_platform_driver(cxl_mock_mem_driver);
> >  MODULE_LICENSE("GPL v2");
> >  MODULE_IMPORT_NS(CXL);
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 3/6] tools/testing/cxl: Mock the Inject Poison mailbox command
  2023-01-23 15:10   ` Jonathan Cameron
@ 2023-01-24  0:06     ` Alison Schofield
  0 siblings, 0 replies; 23+ messages in thread
From: Alison Schofield @ 2023-01-24  0:06 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Mon, Jan 23, 2023 at 03:10:32PM +0000, Jonathan Cameron wrote:
> On Wed, 18 Jan 2023 21:00:18 -0800
> alison.schofield@intel.com wrote:
> 
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > Mock the injection of poison by storing the device:address entries in
> > mock_poison_list[]. Enforce a limit of 8 poison injections per memdev
> > device and 128 total entries for the cxl_test mock driver.
> > 
> > Introducing the mock_poison[] list here, makes it available for use in
> > the mock of Clear Poison, and the mock of Get Poison List.
> > 
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> 
> I'm not that keen on the global nature of the list, but maybe that's
> enough for this test for now at least. Nothing stops us making it more
> flexible later.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
Thanks for the review Jonathan.
The goal here was on enough capability to do a thorough unit test.
And, as you say, this can grow if a more elaborate usage is wanted.
Alison

> > ---
> >  tools/testing/cxl/test/mem.c | 77 ++++++++++++++++++++++++++++++++++++
> >  1 file changed, 77 insertions(+)
> > 
> > diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> > index 09dc358bb33b..b0ecdc4e7c87 100644
> > --- a/tools/testing/cxl/test/mem.c
> > +++ b/tools/testing/cxl/test/mem.c
> > @@ -14,6 +14,9 @@
> >  #define DEV_SIZE SZ_2G
> >  #define EFFECT(x) (1U << x)
> >  
> > +#define MOCK_INJECT_DEV_MAX 8
> > +#define MOCK_INJECT_TEST_MAX 128
> > +
> >  static struct cxl_cel_entry mock_cel[] = {
> >  	{
> >  		.opcode = cpu_to_le16(CXL_MBOX_OP_GET_SUPPORTED_LOGS),
> > @@ -43,6 +46,10 @@ static struct cxl_cel_entry mock_cel[] = {
> >  		.opcode = cpu_to_le16(CXL_MBOX_OP_GET_POISON),
> >  		.effect = cpu_to_le16(0),
> >  	},
> > +	{
> > +		.opcode = cpu_to_le16(CXL_MBOX_OP_INJECT_POISON),
> > +		.effect = cpu_to_le16(0),
> > +	},
> >  };
> >  
> >  /* See CXL 2.0 Table 181 Get Health Info Output Payload */
> > @@ -144,6 +151,7 @@ static int mock_id(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
> >  			cpu_to_le64(SZ_256M / CXL_CAPACITY_MULTIPLIER),
> >  		.total_capacity =
> >  			cpu_to_le64(DEV_SIZE / CXL_CAPACITY_MULTIPLIER),
> > +		.inject_poison_limit = cpu_to_le16(MOCK_INJECT_DEV_MAX),
> >  	};
> >  
> >  	put_unaligned_le24(CXL_POISON_LIST_MAX, id.poison_list_max_mer);
> > @@ -565,6 +573,11 @@ static int mock_health_info(struct cxl_dev_state *cxlds,
> >  	return 0;
> >  }
> >  
> > +static struct mock_poison {
> > +	struct cxl_dev_state *cxlds;
> > +	u64 dpa;
> > +} mock_poison_list[MOCK_INJECT_TEST_MAX];
> > +
> >  static int mock_get_poison(struct cxl_dev_state *cxlds,
> >  			   struct cxl_mbox_cmd *cmd)
> >  {
> > @@ -593,6 +606,67 @@ static int mock_get_poison(struct cxl_dev_state *cxlds,
> >  	return 0;
> >  }
> >  
> > +static bool mock_poison_dev_max_injected(struct cxl_dev_state *cxlds)
> > +{
> > +	int count = 0;
> > +
> > +	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
> > +		if (mock_poison_list[i].cxlds == cxlds)
> > +			count++;
> > +	}
> > +	return (count >= MOCK_INJECT_DEV_MAX);
> > +}
> > +
> > +static bool mock_poison_add(struct cxl_dev_state *cxlds, u64 dpa)
> > +{
> > +	if (mock_poison_dev_max_injected(cxlds)) {
> > +		dev_dbg(cxlds->dev,
> > +			"Device poison injection limit has been reached: %d\n",
> > +			MOCK_INJECT_DEV_MAX);
> > +		return false;
> > +	}
> > +
> > +	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
> > +		if (!mock_poison_list[i].cxlds) {
> > +			mock_poison_list[i].cxlds = cxlds;
> > +			mock_poison_list[i].dpa = dpa;
> > +			return true;
> > +		}
> > +	}
> > +	dev_dbg(cxlds->dev,
> > +		"Mock test poison injection limit has been reached: %d\n",
> > +		MOCK_INJECT_TEST_MAX);
> > +
> > +	return false;
> > +}
> > +
> > +static bool mock_poison_found(struct cxl_dev_state *cxlds, u64 dpa)
> > +{
> > +	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
> > +		if (mock_poison_list[i].cxlds == cxlds &&
> > +		    mock_poison_list[i].dpa == dpa)
> > +			return true;
> > +	}
> > +	return false;
> > +}
> > +
> > +static int mock_inject_poison(struct cxl_dev_state *cxlds,
> > +			      struct cxl_mbox_cmd *cmd)
> > +{
> > +	struct cxl_mbox_inject_poison *pi = cmd->payload_in;
> > +	u64 dpa = le64_to_cpu(pi->address);
> > +
> > +	if (mock_poison_found(cxlds, dpa)) {
> > +		/* Not an error to inject poison if already poisoned */
> > +		dev_dbg(cxlds->dev, "DPA: 0x%llx already poisoned\n", dpa);
> > +		return 0;
> > +	}
> > +	if (!mock_poison_add(cxlds, dpa))
> > +		return -ENXIO;
> > +
> > +	return 0;
> > +}
> > +
> >  static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
> >  {
> >  	struct device *dev = cxlds->dev;
> > @@ -644,6 +718,9 @@ static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *
> >  	case CXL_MBOX_OP_GET_POISON:
> >  		rc = mock_get_poison(cxlds, cmd);
> >  		break;
> > +	case CXL_MBOX_OP_INJECT_POISON:
> > +		rc = mock_inject_poison(cxlds, cmd);
> > +		break;
> >  	default:
> >  		break;
> >  	}
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 5/6] tools/testing/cxl: Use injected poison for get poison list
  2023-01-23 15:16   ` Jonathan Cameron
@ 2023-01-24  0:24     ` Alison Schofield
  2023-01-24 10:15       ` Jonathan Cameron
  0 siblings, 1 reply; 23+ messages in thread
From: Alison Schofield @ 2023-01-24  0:24 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Mon, Jan 23, 2023 at 03:16:53PM +0000, Jonathan Cameron wrote:
> On Wed, 18 Jan 2023 21:00:20 -0800
> alison.schofield@intel.com wrote:
> 
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > Prior to poison inject support, the mock of 'Get Poison List'
> > returned a poison list containing a single mocked error record.
> > 
> > Now, following the addition of poison inject and clear support,
> > use the mock_poison_list[] as the source of records for 'Get
> > Poison List' requests.
> > 
> > This supports confirmation of inject and clear poison commands.
> > 
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> A comment inline.  Either way I'm fine with this.

replied below...

> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> > ---
> >  tools/testing/cxl/test/mem.c | 56 ++++++++++++++++++++++++------------
> >  1 file changed, 38 insertions(+), 18 deletions(-)
> > 
> > diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> > index bb0be9fe3fe9..14d74bfb3124 100644
> > --- a/tools/testing/cxl/test/mem.c
> > +++ b/tools/testing/cxl/test/mem.c
> > @@ -582,31 +582,51 @@ static struct mock_poison {
> >  	u64 dpa;
> >  } mock_poison_list[MOCK_INJECT_TEST_MAX];
> >  
> > +static struct cxl_mbox_poison_payload_out
> > +*cxl_get_injected_po(struct cxl_dev_state *cxlds, u64 offset, u64 length)
> > +{
> > +	struct cxl_mbox_poison_payload_out *po;
> > +	int nr_records = 0;
> > +	u64 dpa;
> > +
> > +	po = kzalloc(struct_size(po, record, MOCK_INJECT_DEV_MAX), GFP_KERNEL);
> > +	if (!po)
> > +		return NULL;
> > +
> > +	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
> > +		if (mock_poison_list[i].cxlds != cxlds)
> > +			continue;
> > +		if (mock_poison_list[i].dpa < offset ||
> > +		    mock_poison_list[i].dpa > offset + length - 1)
> 
> I was thinking that we could move this to the add side of things, but
> perhaps it actually makes sense in both code paths?
> 

I'm not understanding what can be done in the add side wrt the above.

The 'add side' is the inject, and it can only inject one 64byte length.

Here, mocking get-poison-list, we look at each DPA stored for cxlds,
and return it if it's in the offset/lenght requested.

Am I missing something?
Alison

> > +			continue;
> > +
> > +		dpa = mock_poison_list[i].dpa + CXL_POISON_SOURCE_INJECTED;
> > +		po->record[nr_records].address = cpu_to_le64(dpa);
> > +		po->record[nr_records].length = cpu_to_le32(1);
> > +		nr_records++;
> > +		if (nr_records == MOCK_INJECT_DEV_MAX)
> > +			break;
> > +	}
> > +
> > +	/* Always return count, even when zero */
> > +	po->count = cpu_to_le16(nr_records);
> > +
> > +	return po;
> > +}
> > +
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 5/6] tools/testing/cxl: Use injected poison for get poison list
  2023-01-24  0:24     ` Alison Schofield
@ 2023-01-24 10:15       ` Jonathan Cameron
  0 siblings, 0 replies; 23+ messages in thread
From: Jonathan Cameron @ 2023-01-24 10:15 UTC (permalink / raw)
  To: Alison Schofield
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Mon, 23 Jan 2023 16:24:07 -0800
Alison Schofield <alison.schofield@intel.com> wrote:

> On Mon, Jan 23, 2023 at 03:16:53PM +0000, Jonathan Cameron wrote:
> > On Wed, 18 Jan 2023 21:00:20 -0800
> > alison.schofield@intel.com wrote:
> >   
> > > From: Alison Schofield <alison.schofield@intel.com>
> > > 
> > > Prior to poison inject support, the mock of 'Get Poison List'
> > > returned a poison list containing a single mocked error record.
> > > 
> > > Now, following the addition of poison inject and clear support,
> > > use the mock_poison_list[] as the source of records for 'Get
> > > Poison List' requests.
> > > 
> > > This supports confirmation of inject and clear poison commands.
> > > 
> > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>  
> > A comment inline.  Either way I'm fine with this.  
> 
> replied below...
> 
> > 
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >   
> > > ---
> > >  tools/testing/cxl/test/mem.c | 56 ++++++++++++++++++++++++------------
> > >  1 file changed, 38 insertions(+), 18 deletions(-)
> > > 
> > > diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> > > index bb0be9fe3fe9..14d74bfb3124 100644
> > > --- a/tools/testing/cxl/test/mem.c
> > > +++ b/tools/testing/cxl/test/mem.c
> > > @@ -582,31 +582,51 @@ static struct mock_poison {
> > >  	u64 dpa;
> > >  } mock_poison_list[MOCK_INJECT_TEST_MAX];
> > >  
> > > +static struct cxl_mbox_poison_payload_out
> > > +*cxl_get_injected_po(struct cxl_dev_state *cxlds, u64 offset, u64 length)
> > > +{
> > > +	struct cxl_mbox_poison_payload_out *po;
> > > +	int nr_records = 0;
> > > +	u64 dpa;
> > > +
> > > +	po = kzalloc(struct_size(po, record, MOCK_INJECT_DEV_MAX), GFP_KERNEL);
> > > +	if (!po)
> > > +		return NULL;
> > > +
> > > +	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
> > > +		if (mock_poison_list[i].cxlds != cxlds)
> > > +			continue;
> > > +		if (mock_poison_list[i].dpa < offset ||
> > > +		    mock_poison_list[i].dpa > offset + length - 1)  
> > 
> > I was thinking that we could move this to the add side of things, but
> > perhaps it actually makes sense in both code paths?
> >   
> 
> I'm not understanding what can be done in the add side wrt the above.
> 
> The 'add side' is the inject, and it can only inject one 64byte length.
> 
> Here, mocking get-poison-list, we look at each DPA stored for cxlds,
> and return it if it's in the offset/lenght requested.
> 
> Am I missing something?

Nope. I clearly hadn't had enough coffee.  My comment indeed makes
no sense.

Jonathan

> Alison
> 
> > > +			continue;
> > > +
> > > +		dpa = mock_poison_list[i].dpa + CXL_POISON_SOURCE_INJECTED;
> > > +		po->record[nr_records].address = cpu_to_le64(dpa);
> > > +		po->record[nr_records].length = cpu_to_le32(1);
> > > +		nr_records++;
> > > +		if (nr_records == MOCK_INJECT_DEV_MAX)
> > > +			break;
> > > +	}
> > > +
> > > +	/* Always return count, even when zero */
> > > +	po->count = cpu_to_le16(nr_records);
> > > +
> > > +	return po;
> > > +}
> > > +  
> >   


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/6] cxl: CXL Inject & Clear Poison
  2023-01-23 23:42   ` Alison Schofield
@ 2023-01-24 10:21     ` Jonathan Cameron
  0 siblings, 0 replies; 23+ messages in thread
From: Jonathan Cameron @ 2023-01-24 10:21 UTC (permalink / raw)
  To: Alison Schofield
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Mon, 23 Jan 2023 15:42:33 -0800
Alison Schofield <alison.schofield@intel.com> wrote:

> On Mon, Jan 23, 2023 at 05:13:01PM +0000, Jonathan Cameron wrote:
> > On Wed, 18 Jan 2023 21:00:15 -0800
> > alison.schofield@intel.com wrote:
> >   
> > > From: Alison Schofield <alison.schofield@intel.com>
> > > 
> > > Built on cxl/next plus Patchset: CXL Poison List Retrieval & Tracing:
> > > https://lore.kernel.org/linux-cxl/de11785ff05844299b40b100f8e0f56c7eef7f08.1674070170.git.alison.schofield@intel.com/  
> > 
> > Only tangentially relevant, but I've only just registered
> > as a result of getting a lot of 0 timestamps (which is what
> > you return if the timestamp base is unknown) that I don't
> > think we currently ever set the EP timestamp.
> > 
> > Recommendation in the spec (8.2.9.4.2) is:
> > "It is recommended that the host set hte timestamp
> > after ever Conventional or CXL Reset"
> > 
> > I'd go further and assume that if we are doing native error
> > handling then it's up to the OS to initialize the timestamp.
> > 
> > Also relevant to Ira's series as events are timestamped.
> > Currently Ira's QEMU code doesn't take this subtlety into
> > account (poison doesn't either - but I have patches).
> > 
> > Jonathan
> >   
> 
> Jonathan,
> 
> I hadn't seen the Set Timestamp cmd, but I think we are OK with
> Get Poison List and it's overflow_t reporting, it does not use
> a relative timestamp, but absolute since Jan-1970.

I'd assume most devices have no ability to get that timestamp without
a set timestamp command.  You might get some with an RTC that is factory
set, but I doubt it will be common.

> 
> Table 8-106 says: 
> Overflow Timestamp: The time that the device determined the poison
> list overflowed. This field is only valid if the overflow indicator is set. The
> number of unsigned nanoseconds that have elapsed since midnight, 01-
> Jan-1970, UTC. If the device does not have a valid timestamp, return 0.

Yup, but Table 8-60  Set Timestamp Input Payload has

"Timestamp: The number of unsigned nanoseconds that have elapsed since midnight,
01-Jan-1970, UTC."
above that it is the text that says the host should set it on Conventional
or CXL reset (which obviously includes boot).

That's where the device idea of time is coming from so that it can be returned
in places like the Overflow timestamp.  My reading of that bit about
"not have a valid timestamp, return 0" is that is targeting the case where
a timestamp has not been set since reset.

All in all, we are fine (in that it 'works') but with out set timestamp
the generated time stamps are going to be garbage on many devices.

I'll spin an RFC patch adding the call and we can work out what
conditions it should be called under in that thread.

Jonathan



> 
> Alison
> 
> 
> >   
> > > 
> > > Changes in v2:
> > > - Add Jonathan Reviewed-by tags to Patches 1,2,4
> > > - Clean up input payload structs for both inject and clear (Dan)
> > > - Commit message cleanups, including spec references (Dave)
> > > - Use CXL_POISON_LEN_MULT in define of clear write data
> > > - Use IS_ALIGNED() for 64byte align check (Dan)
> > > - Add Kconfig CXL_POISON_INJECT  (Dan)
> > > - Trivial space cleanup (Jonathan)
> > > - Doc/ABI cleanup (Dave, Dan)
> > > - Mock: Only use injected errors for get poison list
> > > - Mock: Use 'POISONLMT -ENXIO' text from CMD_CMD_RC_TABLE  (Jonathan)
> > > - Mock: Add Patch 6/6: A module param to mock device inject limit
> > > 
> > > Link to v1: https://lore.kernel.org/linux-cxl/cover.1669781852.git.alison.schofield@intel.com/
> > > 
> > > Introducing Inject and Clear Poison support for CXL Devices.
> > > 
> > > These are optional commands, meaning not all CXL devices must support
> > > them. The sysfs attributes, inject_poison and clear_poison, are only
> > > visible for devices reporting support of the capability and when the
> > > kernel Kconfig option CONFIG_CXL_POISON_INJECT is on. (Default: off)
> > > 
> > > Example:
> > > # echo 0x40000000 > /sys/bus/cxl/devices/mem1/inject_poison
> > > # echo 1 > /sys/bus/cxl/devices/mem1/trigger_poison_list
> > > 
> > > cxl_poison: memdev=mem1 pcidev=cxl_mem.1 region= region_uuid=00000000-0000-0000-0000-000000000000 hpa=0xffffffffffffffff dpa=0x40000000 length=0x40 source=Injected flags= overflow_time=0
> > > 
> > > 
> > > Alison Schofield (6):
> > >   cxl/memdev: Add support for the Inject Poison mailbox command
> > >   cxl/memdev: Add support for the Clear Poison mailbox command
> > >   tools/testing/cxl: Mock the Inject Poison mailbox command
> > >   tools/testing/cxl: Mock the Clear Poison mailbox command
> > >   tools/testing/cxl: Use injected poison for get poison list
> > >   tools/testing/cxl: Add a param to test poison injection limits
> > > 
> > >  Documentation/ABI/testing/sysfs-bus-cxl |  40 ++++++
> > >  drivers/cxl/Kconfig                     |  10 ++
> > >  drivers/cxl/core/memdev.c               | 122 ++++++++++++++++
> > >  drivers/cxl/cxlmem.h                    |  11 ++
> > >  tools/testing/cxl/test/mem.c            | 178 +++++++++++++++++++++---
> > >  5 files changed, 341 insertions(+), 20 deletions(-)
> > >   
> >   
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH v2 1/6] cxl/memdev: Add support for the Inject Poison mailbox command
  2023-01-19  5:00 ` [PATCH v2 1/6] cxl/memdev: Add support for the Inject Poison mailbox command alison.schofield
@ 2023-01-27 23:06   ` Dan Williams
  2023-01-28  2:47     ` Alison Schofield
  0 siblings, 1 reply; 23+ messages in thread
From: Dan Williams @ 2023-01-27 23:06 UTC (permalink / raw)
  To: alison.schofield, Dan Williams, Ira Weiny, Vishal Verma,
	Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl, Jonathan Cameron

alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> CXL devices optionally support the INJECT POISON mailbox command. Add
> a sysfs attribute and memdev driver support for injecting poison.
> 
> When a Device Physical Address (DPA) is written to the inject_poison
> sysfs attribute, send an inject poison command to the device for the
> specified address.
> 
> Per the CXL Specification (3.0 8.2.9.8.4.2), after receiving a valid
> inject poison request, the device will return poison when the address
> is accessed through the CXL.mem bus. Injecting poison adds the address
> to the device's Poison List and the error source is set to Injected.
> In addition, the device adds a poison creation event to its internal
> Informational Event log, updates the Event Status register, and if
> configured, interrupts the host.
> 
> Also, per the CXL Specification, it is not an error to inject poison
> into an address that already has poison present and no error is
> returned from the device.
> 
> The inject_poison attribute is only visible for devices supporting
> the capability when the kernel is built with CONFIG_CXL_POISON_INJECT.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
>  Documentation/ABI/testing/sysfs-bus-cxl | 22 ++++++++
>  drivers/cxl/Kconfig                     | 10 ++++
>  drivers/cxl/core/memdev.c               | 67 +++++++++++++++++++++++++
>  drivers/cxl/cxlmem.h                    |  5 ++
>  4 files changed, 104 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index b715a4609718..e9c6dd02bd09 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -416,3 +416,25 @@ Description:
>  		if accessed, and the source of the poison. The retrieved
>  		errors are logged as kernel trace events with the label
>  		'cxl_poison'.
> +
> +
> +What:		/sys/bus/cxl/devices/memX/inject_poison
> +Date:		January, 2023
> +KernelVersion:	v6.3
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(WO) When a Device Physical Address (DPA) is written to this
> +		attribute, the memdev driver sends an inject poison command to
> +		the device for the specified address. The DPA must be 64-byte
> +		aligned and the length of the injected poison is 64-bytes. If
> +		successful, the device returns poison when the address is
> +		accessed through the CXL.mem bus. Injecting poison adds the
> +		address to the device's Poison List and the error source is set
> +		to Injected. In addition, the device adds a poison creation
> +		event to its internal Informational Event log, updates the
> +		Event Status register, and if configured, interrupts the host.
> +		It is not an error to inject poison into an address that
> +		already has poison present and no error is returned. The
> +		inject_poison attribute is only visible for devices supporting
> +		the capability. Kconfig option CXL_POISON_INJECT must be on
> +		to enable this option. The default is off.
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 0ac53c422c31..6541f54725cd 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -129,4 +129,14 @@ config CXL_REGION_INVALIDATION_TEST
>  	  If unsure, or if this kernel is meant for production environments,
>  	  say N.
>  
> +config CXL_POISON_INJECT
> +	bool "CXL: Support CXL Memory Device Poison Inject"
> +	depends on CXL_MEM
> +	help
> +	  Selecting this option creates the sysfs attributes inject_poison
> +	  and clear_poison for CXL memory devices supporting the capability.
> +	  See Documentation/ABI/testing/sysfs-bus-cxl.

Could maybe clarify that this is meant for hardware debug scenarios so
that is the reason it is disabled by default.

> +
> +	  If unsure, say N.
> +
>  endif
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index e0af7e9c9989..226662cf3331 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -142,6 +142,61 @@ static ssize_t trigger_poison_list_store(struct device *dev,
>  }
>  static DEVICE_ATTR_WO(trigger_poison_list);
>  
> +static int cxl_validate_poison_dpa(struct cxl_dev_state *cxlds, u64 dpa)
> +{
> +	if (!resource_size(&cxlds->dpa_res)) {
> +		dev_dbg(cxlds->dev, "device has no dpa resource\n");
> +		return -EINVAL;
> +	}
> +	if (dpa < cxlds->dpa_res.start || dpa > cxlds->dpa_res.end) {
> +		dev_dbg(cxlds->dev, "dpa:0x%llx not in resource:%pR\n",
> +			dpa, &cxlds->dpa_res);
> +		return -EINVAL;
> +	}
> +	if (!IS_ALIGNED(dpa, 64)) {
> +		dev_dbg(cxlds->dev, "dpa:0x%llx is not 64-byte aligned\n",
> +			dpa);
> +		return -EINVAL;
> +	}
> +	return 0;
> +}
> +
> +static ssize_t inject_poison_store(struct device *dev,
> +				   struct device_attribute *attr,
> +				   const char *buf, size_t len)
> +{
> +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +	struct cxl_mbox_inject_poison inject;
> +	struct cxl_mbox_cmd mbox_cmd;
> +	u64 dpa;
> +	int rc;
> +
> +	rc = kstrtou64(buf, 0, &dpa);
> +	if (rc)
> +		return rc;
> +
> +	rc = cxl_validate_poison_dpa(cxlds, dpa);
> +	if (rc)
> +		return rc;
> +
> +	inject = (struct cxl_mbox_inject_poison) {
> +		.address = cpu_to_le64(dpa)
> +	};
> +	mbox_cmd = (struct cxl_mbox_cmd) {
> +		.opcode = CXL_MBOX_OP_INJECT_POISON,
> +		.size_in = sizeof(inject),
> +		.payload_in = &inject,
> +	};
> +
> +	rc = cxl_internal_send_cmd(cxlds, &mbox_cmd);
> +	if (rc)
> +		return rc;
> +
> +	return len;
> +}
> +static DEVICE_ATTR_WO(inject_poison);
> +
>  static struct attribute *cxl_memdev_attributes[] = {
>  	&dev_attr_serial.attr,
>  	&dev_attr_firmware_version.attr,
> @@ -149,6 +204,7 @@ static struct attribute *cxl_memdev_attributes[] = {
>  	&dev_attr_label_storage_size.attr,
>  	&dev_attr_numa_node.attr,
>  	&dev_attr_trigger_poison_list.attr,
> +	&dev_attr_inject_poison.attr,
>  	NULL,
>  };
>  
> @@ -168,6 +224,10 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
>  	if (!IS_ENABLED(CONFIG_NUMA) && a == &dev_attr_numa_node.attr)
>  		return 0;
>  
> +	if (!IS_ENABLED(CONFIG_CXL_POISON_INJECT) &&
> +	    a == &dev_attr_inject_poison.attr)
> +		return 0;
> +
>  	if (a == &dev_attr_trigger_poison_list.attr) {
>  		struct device *dev = kobj_to_dev(kobj);
>  
> @@ -175,6 +235,13 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
>  			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
>  			return 0;
>  	}
> +	if (a == &dev_attr_inject_poison.attr) {
> +		struct device *dev = kobj_to_dev(kobj);

I'd move the IS_ENABLED(CONFIG_CXL_POISON_INJECT) inside here so just
one spot in this function handles the poison attribute.

> +
> +		if (!test_bit(CXL_MEM_COMMAND_ID_INJECT_POISON,
> +			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
> +			return 0;

Ugh, this is a problem. So "inject poison" never should have been
enabled for the ioctl path way back in:

87815ee9d006 cxl/pci: Add media provisioning required commands

All the nice sysfs interface and compile option to turn it off in this
patch is moot since userspace can just send the ioctl if the sysfs
attribute is missing.

On the one hand this is already shipping ABI, but given cxl-cli has not
been enabled it chances are high that it can be deleted without anyone
caring (i.e. breaking deployed configurations). That would need to be a
lead in patch. As for how to detect that the inject poison opcode is
supported, that needs something like a custom "cxlds->has_poison_inject"
flag into cxl_walk_cel(). I.e.  cxlds->enabled_cmds is only about the
enabled ioctl wrapper commands for CXL opcodes not the availability of
opcodes for cxl_internal_send_cmd().


> +	}
>  	return a->mode;
>  }
>  
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 28ba0cd8f2d3..862ca4f4cc06 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -436,6 +436,11 @@ struct cxl_mbox_poison_payload_out {
>  #define CXL_POISON_SOURCE_INJECTED	3
>  #define CXL_POISON_SOURCE_VENDOR	7
>  
> +/* Inject & Clear Poison  CXL 3.0 Spec 8.2.9.8.4.2/3 */
> +struct cxl_mbox_inject_poison {
> +	__le64 address;
> +};
> +
>  /**
>   * struct cxl_mem_command - Driver representation of a memory device command
>   * @info: Command information as it exists for the UAPI
> -- 
> 2.37.3
> 



^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH v2 2/6] cxl/memdev: Add support for the Clear Poison mailbox command
  2023-01-19  5:00 ` [PATCH v2 2/6] cxl/memdev: Add support for the Clear " alison.schofield
@ 2023-01-27 23:56   ` Dan Williams
  2023-01-28  1:17     ` Alison Schofield
  0 siblings, 1 reply; 23+ messages in thread
From: Dan Williams @ 2023-01-27 23:56 UTC (permalink / raw)
  To: alison.schofield, Dan Williams, Ira Weiny, Vishal Verma,
	Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl, Jonathan Cameron

alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> CXL devices optionally support the CLEAR POISON mailbox command. Add
> a sysfs attribute and memdev driver support for clearing poison.
> 
> When a Device Physical Address (DPA) is written to the clear_poison
> sysfs attribute, send a clear poison command to the device for the
> specified address.
> 
> Per the CXL Specification (3.0 8.2.9.8.4.3), after receiving a valid clear
> poison request, the device removes the address from the device's Poison
> List and writes 0 (zero) for 64 bytes starting at address. If the device
> cannot clear poison from the address, it returns a permanent media error
> and -ENXIO is returned to the user.
> 
> Additionally, and per the spec also, it is not an error to clear poison
> of an address that is not poisoned. No error is returned from the device
> and the address is not overwritten.
> 
> *Implementation note: Although the CXL specification defines the clear
> command to accept 64 bytes of 'write-data' to be used when clearing
> the poisoned address, this implementation always uses 0 (zeros) for
> the write-data.
> 
> The clear_poison attribute is only visible for devices supporting the
> capability when the kernel is built with CONFIG_CXL_POISON_INJECT.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
>  Documentation/ABI/testing/sysfs-bus-cxl | 18 ++++++++
>  drivers/cxl/core/memdev.c               | 57 ++++++++++++++++++++++++-
>  drivers/cxl/cxlmem.h                    |  6 +++
>  3 files changed, 80 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index e9c6dd02bd09..7e4897e7bc05 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -438,3 +438,21 @@ Description:
>  		inject_poison attribute is only visible for devices supporting
>  		the capability. Kconfig option CXL_POISON_INJECT must be on
>  		to enable this option. The default is off.
> +
> +
> +What:		/sys/bus/cxl/devices/memX/clear_poison
> +Date:		January, 2023
> +KernelVersion:	v6.3
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(WO) When a Device Physical Address (DPA) is written to this
> +		attribute, the memdev driver sends a clear poison command to
> +		the device for the specified address. Clearing poison removes
> +		the address from the device's Poison List and writes 0 (zero)
> +		for 64 bytes starting at address. It is not an error to clear
> +		poison from an address that does not have poison set, and if
> +		poison was not set, the address is not overwritten. If the
> +		device cannot clear poison from the address, -ENXIO is returned.
> +		The clear_poison attribute is only visible for devices
> +		supporting the capability. Kconfig option CXL_POISON_INJECT
> +		must be on to enable this option. The default is off.

So unlike error inject, this interface leaves me cold because it is
changing the state of data without coordination.

You might say, "inject poison also changes the state of data without
coordination", while that is true it is expected that media can go bad
without warning. What software does not expect is that memory could be
put back into service without coordination. A memory error wants to be
cleared by the agent that currently owns the memory, like the page
allocator clearing PageHWPoison and putting the page back into service,
or a filesystem restoring a file that was previously quarantined.

The only way this interface can proceed is if it can assert that the
poison to be cleared is not mapped by any decoder which makes the owner
of the memory the administrator using the sysfs interface. That limits
its utility.

Inside the kernel the expectation is that the core-mm or filesystems are
using facilities like movdir64b to atomically clear poison without
needing to hassle with a CXL mailbox.

This sysfs interface can move forward but it needs the idle checks and
locking before it can issue the command. I would also have an eye
towards skipping the mailbox call on architectures that have poison
clearing instruction like x86's movdir64b, because as the spec says:
"This provides the same functionality as the host directly writing new
data to the device", so just try to do that by default. However that can
be a follow-on optimization.


> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index 226662cf3331..4d86a4565c9e 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -197,6 +197,51 @@ static ssize_t inject_poison_store(struct device *dev,
>  }
>  static DEVICE_ATTR_WO(inject_poison);
>  
> +static ssize_t clear_poison_store(struct device *dev,
> +				  struct device_attribute *attr,
> +				  const char *buf, size_t len)
> +{
> +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +	struct cxl_mbox_clear_poison clear;
> +	struct cxl_mbox_cmd mbox_cmd;
> +	u64 dpa;
> +	int rc;
> +
> +	rc = kstrtou64(buf, 0, &dpa);
> +	if (rc)
> +		return rc;
> +
> +	rc = cxl_validate_poison_dpa(cxlds, dpa);
> +	if (rc)
> +		return rc;
> +	/*
> +	 * In CXL 3.0 Spec 8.2.9.8.4.3, the Clear Poison mailbox command
> +	 * is defined to accept 64 bytes of 'write-data', along with the
> +	 * address to clear. The device writes the data into the address
> +	 * atomically, while clearing poison if the location is marked as
> +	 * being poisoned.
> +	 *
> +	 * Always use '0' for the write-data.
> +	 */
> +	clear = (struct cxl_mbox_clear_poison) {
> +		.address = cpu_to_le64(dpa)
> +	};
> +
> +	mbox_cmd = (struct cxl_mbox_cmd) {
> +		.opcode = CXL_MBOX_OP_CLEAR_POISON,
> +		.size_in = sizeof(clear),
> +		.payload_in = &clear,
> +	};
> +
> +	rc = cxl_internal_send_cmd(cxlds, &mbox_cmd);
> +	if (rc)
> +		return rc;
> +
> +	return len;
> +}
> +static DEVICE_ATTR_WO(clear_poison);
> +
>  static struct attribute *cxl_memdev_attributes[] = {
>  	&dev_attr_serial.attr,
>  	&dev_attr_firmware_version.attr,
> @@ -205,6 +250,7 @@ static struct attribute *cxl_memdev_attributes[] = {
>  	&dev_attr_numa_node.attr,
>  	&dev_attr_trigger_poison_list.attr,
>  	&dev_attr_inject_poison.attr,
> +	&dev_attr_clear_poison.attr,
>  	NULL,
>  };
>  
> @@ -225,7 +271,8 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
>  		return 0;
>  
>  	if (!IS_ENABLED(CONFIG_CXL_POISON_INJECT) &&
> -	    a == &dev_attr_inject_poison.attr)
> +	    (a == &dev_attr_inject_poison.attr ||
> +	     a == &dev_attr_clear_poison.attr))
>  		return 0;
>  
>  	if (a == &dev_attr_trigger_poison_list.attr) {
> @@ -242,6 +289,14 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
>  			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
>  			return 0;
>  	}
> +	if (a == &dev_attr_clear_poison.attr) {
> +		struct device *dev = kobj_to_dev(kobj);
> +
> +		if (!test_bit(CXL_MEM_COMMAND_ID_CLEAR_POISON,
> +			      to_cxl_memdev(dev)->cxlds->enabled_cmds)) {
> +			return 0;

Similar comment as the last patch with respect to the command enabling.

> +		}
> +	}
>  	return a->mode;
>  }
>  
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 862ca4f4cc06..adcbd4a98819 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -441,6 +441,12 @@ struct cxl_mbox_inject_poison {
>  	__le64 address;
>  };
>  
> +/* Clear Poison  CXL 3.0 Spec 8.2.9.8.4.3 */
> +struct cxl_mbox_clear_poison {
> +	__le64 address;
> +	u8 write_data[CXL_POISON_LEN_MULT];
> +} __packed;
> +
>  /**
>   * struct cxl_mem_command - Driver representation of a memory device command
>   * @info: Command information as it exists for the UAPI
> -- 
> 2.37.3
> 



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/6] cxl/memdev: Add support for the Clear Poison mailbox command
  2023-01-27 23:56   ` Dan Williams
@ 2023-01-28  1:17     ` Alison Schofield
  2023-01-28  2:19       ` Dan Williams
  0 siblings, 1 reply; 23+ messages in thread
From: Alison Schofield @ 2023-01-28  1:17 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang, linux-cxl,
	Jonathan Cameron

On Fri, Jan 27, 2023 at 03:56:39PM -0800, Dan Williams wrote:
> alison.schofield@ wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > CXL devices optionally support the CLEAR POISON mailbox command. Add
> > a sysfs attribute and memdev driver support for clearing poison.
> > 
> > When a Device Physical Address (DPA) is written to the clear_poison
> > sysfs attribute, send a clear poison command to the device for the
> > specified address.
> > 
> > Per the CXL Specification (3.0 8.2.9.8.4.3), after receiving a valid clear
> > poison request, the device removes the address from the device's Poison
> > List and writes 0 (zero) for 64 bytes starting at address. If the device
> > cannot clear poison from the address, it returns a permanent media error
> > and -ENXIO is returned to the user.
> > 
> > Additionally, and per the spec also, it is not an error to clear poison
> > of an address that is not poisoned. No error is returned from the device
> > and the address is not overwritten.
> > 
> > *Implementation note: Although the CXL specification defines the clear
> > command to accept 64 bytes of 'write-data' to be used when clearing
> > the poisoned address, this implementation always uses 0 (zeros) for
> > the write-data.
> > 
> > The clear_poison attribute is only visible for devices supporting the
> > capability when the kernel is built with CONFIG_CXL_POISON_INJECT.
> > 
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > ---
> >  Documentation/ABI/testing/sysfs-bus-cxl | 18 ++++++++
> >  drivers/cxl/core/memdev.c               | 57 ++++++++++++++++++++++++-
> >  drivers/cxl/cxlmem.h                    |  6 +++
> >  3 files changed, 80 insertions(+), 1 deletion(-)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> > index e9c6dd02bd09..7e4897e7bc05 100644
> > --- a/Documentation/ABI/testing/sysfs-bus-cxl
> > +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> > @@ -438,3 +438,21 @@ Description:
> >  		inject_poison attribute is only visible for devices supporting
> >  		the capability. Kconfig option CXL_POISON_INJECT must be on
> >  		to enable this option. The default is off.
> > +
> > +
> > +What:		/sys/bus/cxl/devices/memX/clear_poison
> > +Date:		January, 2023
> > +KernelVersion:	v6.3
> > +Contact:	linux-cxl@vger.kernel.org
> > +Description:
> > +		(WO) When a Device Physical Address (DPA) is written to this
> > +		attribute, the memdev driver sends a clear poison command to
> > +		the device for the specified address. Clearing poison removes
> > +		the address from the device's Poison List and writes 0 (zero)
> > +		for 64 bytes starting at address. It is not an error to clear
> > +		poison from an address that does not have poison set, and if
> > +		poison was not set, the address is not overwritten. If the
> > +		device cannot clear poison from the address, -ENXIO is returned.
> > +		The clear_poison attribute is only visible for devices
> > +		supporting the capability. Kconfig option CXL_POISON_INJECT
> > +		must be on to enable this option. The default is off.
> 
> So unlike error inject, this interface leaves me cold because it is
> changing the state of data without coordination.
> 
> You might say, "inject poison also changes the state of data without
> coordination", while that is true it is expected that media can go bad
> without warning. What software does not expect is that memory could be
> put back into service without coordination. A memory error wants to be
> cleared by the agent that currently owns the memory, like the page
> allocator clearing PageHWPoison and putting the page back into service,
> or a filesystem restoring a file that was previously quarantined.
> 
> The only way this interface can proceed is if it can assert that the
> poison to be cleared is not mapped by any decoder which makes the owner
> of the memory the administrator using the sysfs interface. That limits
> its utility.

It is a debug interface. 'Changing the state of data without the
coordination' yes, but isn't that coordination up to the user.
I get putting more protection in place, as in not allowing
inject or clear of a DPA that is mapped. 

Do you expect, or fear, that users will 'trigger poison list', and
then issue 'clears' of poisoned DPA's directly, instead of letting
core-mm and filesystems to handle poison?

> 
> Inside the kernel the expectation is that the core-mm or filesystems are
> using facilities like movdir64b to atomically clear poison without
> needing to hassle with a CXL mailbox.
> 
> This sysfs interface can move forward but it needs the idle checks and
> locking before it can issue the command. I would also have an eye
> towards skipping the mailbox call on architectures that have poison
> clearing instruction like x86's movdir64b, because as the spec says:
> "This provides the same functionality as the host directly writing new
> data to the device", so just try to do that by default. However that can
> be a follow-on optimization.

This (have an eye towards skipping ...) sounds like you do want
users to be able to clear real poison using the sysfs interface,
and for the the driver to be able to coordinate it upwards.

I'll go off and learn more about the upwards coordination, and
more thoughts here are welcome!

Thanks,
Alison

> 
> 
> > diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> > index 226662cf3331..4d86a4565c9e 100644
> > --- a/drivers/cxl/core/memdev.c
> > +++ b/drivers/cxl/core/memdev.c
> > @@ -197,6 +197,51 @@ static ssize_t inject_poison_store(struct device *dev,
> >  }
> >  static DEVICE_ATTR_WO(inject_poison);
> >  
> > +static ssize_t clear_poison_store(struct device *dev,
> > +				  struct device_attribute *attr,
> > +				  const char *buf, size_t len)
> > +{
> > +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> > +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > +	struct cxl_mbox_clear_poison clear;
> > +	struct cxl_mbox_cmd mbox_cmd;
> > +	u64 dpa;
> > +	int rc;
> > +
> > +	rc = kstrtou64(buf, 0, &dpa);
> > +	if (rc)
> > +		return rc;
> > +
> > +	rc = cxl_validate_poison_dpa(cxlds, dpa);
> > +	if (rc)
> > +		return rc;
> > +	/*
> > +	 * In CXL 3.0 Spec 8.2.9.8.4.3, the Clear Poison mailbox command
> > +	 * is defined to accept 64 bytes of 'write-data', along with the
> > +	 * address to clear. The device writes the data into the address
> > +	 * atomically, while clearing poison if the location is marked as
> > +	 * being poisoned.
> > +	 *
> > +	 * Always use '0' for the write-data.
> > +	 */
> > +	clear = (struct cxl_mbox_clear_poison) {
> > +		.address = cpu_to_le64(dpa)
> > +	};
> > +
> > +	mbox_cmd = (struct cxl_mbox_cmd) {
> > +		.opcode = CXL_MBOX_OP_CLEAR_POISON,
> > +		.size_in = sizeof(clear),
> > +		.payload_in = &clear,
> > +	};
> > +
> > +	rc = cxl_internal_send_cmd(cxlds, &mbox_cmd);
> > +	if (rc)
> > +		return rc;
> > +
> > +	return len;
> > +}
> > +static DEVICE_ATTR_WO(clear_poison);
> > +
> >  static struct attribute *cxl_memdev_attributes[] = {
> >  	&dev_attr_serial.attr,
> >  	&dev_attr_firmware_version.attr,
> > @@ -205,6 +250,7 @@ static struct attribute *cxl_memdev_attributes[] = {
> >  	&dev_attr_numa_node.attr,
> >  	&dev_attr_trigger_poison_list.attr,
> >  	&dev_attr_inject_poison.attr,
> > +	&dev_attr_clear_poison.attr,
> >  	NULL,
> >  };
> >  
> > @@ -225,7 +271,8 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
> >  		return 0;
> >  
> >  	if (!IS_ENABLED(CONFIG_CXL_POISON_INJECT) &&
> > -	    a == &dev_attr_inject_poison.attr)
> > +	    (a == &dev_attr_inject_poison.attr ||
> > +	     a == &dev_attr_clear_poison.attr))
> >  		return 0;
> >  
> >  	if (a == &dev_attr_trigger_poison_list.attr) {
> > @@ -242,6 +289,14 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
> >  			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
> >  			return 0;
> >  	}
> > +	if (a == &dev_attr_clear_poison.attr) {
> > +		struct device *dev = kobj_to_dev(kobj);
> > +
> > +		if (!test_bit(CXL_MEM_COMMAND_ID_CLEAR_POISON,
> > +			      to_cxl_memdev(dev)->cxlds->enabled_cmds)) {
> > +			return 0;
> 
> Similar comment as the last patch with respect to the command enabling.
> 
> > +		}
> > +	}
> >  	return a->mode;
> >  }
> >  
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index 862ca4f4cc06..adcbd4a98819 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -441,6 +441,12 @@ struct cxl_mbox_inject_poison {
> >  	__le64 address;
> >  };
> >  
> > +/* Clear Poison  CXL 3.0 Spec 8.2.9.8.4.3 */
> > +struct cxl_mbox_clear_poison {
> > +	__le64 address;
> > +	u8 write_data[CXL_POISON_LEN_MULT];
> > +} __packed;
> > +
> >  /**
> >   * struct cxl_mem_command - Driver representation of a memory device command
> >   * @info: Command information as it exists for the UAPI
> > -- 
> > 2.37.3
> > 
> 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/6] cxl/memdev: Add support for the Clear Poison mailbox command
  2023-01-28  1:17     ` Alison Schofield
@ 2023-01-28  2:19       ` Dan Williams
  0 siblings, 0 replies; 23+ messages in thread
From: Dan Williams @ 2023-01-28  2:19 UTC (permalink / raw)
  To: Alison Schofield, Dan Williams
  Cc: Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang, linux-cxl,
	Jonathan Cameron

Alison Schofield wrote:
> On Fri, Jan 27, 2023 at 03:56:39PM -0800, Dan Williams wrote:
> > alison.schofield@ wrote:
> > > From: Alison Schofield <alison.schofield@intel.com>
> > > 
> > > CXL devices optionally support the CLEAR POISON mailbox command. Add
> > > a sysfs attribute and memdev driver support for clearing poison.
> > > 
> > > When a Device Physical Address (DPA) is written to the clear_poison
> > > sysfs attribute, send a clear poison command to the device for the
> > > specified address.
> > > 
> > > Per the CXL Specification (3.0 8.2.9.8.4.3), after receiving a valid clear
> > > poison request, the device removes the address from the device's Poison
> > > List and writes 0 (zero) for 64 bytes starting at address. If the device
> > > cannot clear poison from the address, it returns a permanent media error
> > > and -ENXIO is returned to the user.
> > > 
> > > Additionally, and per the spec also, it is not an error to clear poison
> > > of an address that is not poisoned. No error is returned from the device
> > > and the address is not overwritten.
> > > 
> > > *Implementation note: Although the CXL specification defines the clear
> > > command to accept 64 bytes of 'write-data' to be used when clearing
> > > the poisoned address, this implementation always uses 0 (zeros) for
> > > the write-data.
> > > 
> > > The clear_poison attribute is only visible for devices supporting the
> > > capability when the kernel is built with CONFIG_CXL_POISON_INJECT.
> > > 
> > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > > ---
> > >  Documentation/ABI/testing/sysfs-bus-cxl | 18 ++++++++
> > >  drivers/cxl/core/memdev.c               | 57 ++++++++++++++++++++++++-
> > >  drivers/cxl/cxlmem.h                    |  6 +++
> > >  3 files changed, 80 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> > > index e9c6dd02bd09..7e4897e7bc05 100644
> > > --- a/Documentation/ABI/testing/sysfs-bus-cxl
> > > +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> > > @@ -438,3 +438,21 @@ Description:
> > >  		inject_poison attribute is only visible for devices supporting
> > >  		the capability. Kconfig option CXL_POISON_INJECT must be on
> > >  		to enable this option. The default is off.
> > > +
> > > +
> > > +What:		/sys/bus/cxl/devices/memX/clear_poison
> > > +Date:		January, 2023
> > > +KernelVersion:	v6.3
> > > +Contact:	linux-cxl@vger.kernel.org
> > > +Description:
> > > +		(WO) When a Device Physical Address (DPA) is written to this
> > > +		attribute, the memdev driver sends a clear poison command to
> > > +		the device for the specified address. Clearing poison removes
> > > +		the address from the device's Poison List and writes 0 (zero)
> > > +		for 64 bytes starting at address. It is not an error to clear
> > > +		poison from an address that does not have poison set, and if
> > > +		poison was not set, the address is not overwritten. If the
> > > +		device cannot clear poison from the address, -ENXIO is returned.
> > > +		The clear_poison attribute is only visible for devices
> > > +		supporting the capability. Kconfig option CXL_POISON_INJECT
> > > +		must be on to enable this option. The default is off.
> > 
> > So unlike error inject, this interface leaves me cold because it is
> > changing the state of data without coordination.
> > 
> > You might say, "inject poison also changes the state of data without
> > coordination", while that is true it is expected that media can go bad
> > without warning. What software does not expect is that memory could be
> > put back into service without coordination. A memory error wants to be
> > cleared by the agent that currently owns the memory, like the page
> > allocator clearing PageHWPoison and putting the page back into service,
> > or a filesystem restoring a file that was previously quarantined.
> > 
> > The only way this interface can proceed is if it can assert that the
> > poison to be cleared is not mapped by any decoder which makes the owner
> > of the memory the administrator using the sysfs interface. That limits
> > its utility.
> 
> It is a debug interface.

...but it isn't. It's an arbitrary write 64-bytes of zeros primitive.
Nothing gates that the location to be written was one that had poison
injected to it. 

> 'Changing the state of data without the
> coordination' yes, but isn't that coordination up to the user.

There's no way for the user to control that. Consider a block in a pmem
filesystem that is actively adding and removing blocks from files.
There's no mechanism for a user to coordinate that from the driver side.
It needs to be coordinated from the filesystem side.

> I get putting more protection in place, as in not allowing
> inject or clear of a DPA that is mapped. 

That solves most of the concern, and truly makes it a debug interface
since the user action cannot possibly confuse any other part of the
kernel when the DPA is not mapped.

> Do you expect, or fear, that users will 'trigger poison list', and
> then issue 'clears' of poisoned DPA's directly, instead of letting
> core-mm and filesystems to handle poison?

Yes. This policy is also carry over from pmem enabling where you see
that "ndctl clear-errors" is tearing down namespaces before issuing the
clear-error command.

> > Inside the kernel the expectation is that the core-mm or filesystems are
> > using facilities like movdir64b to atomically clear poison without
> > needing to hassle with a CXL mailbox.
> > 
> > This sysfs interface can move forward but it needs the idle checks and
> > locking before it can issue the command. I would also have an eye
> > towards skipping the mailbox call on architectures that have poison
> > clearing instruction like x86's movdir64b, because as the spec says:
> > "This provides the same functionality as the host directly writing new
> > data to the device", so just try to do that by default. However that can
> > be a follow-on optimization.
> 
> This (have an eye towards skipping ...) sounds like you do want
> users to be able to clear real poison using the sysfs interface,
> and for the the driver to be able to coordinate it upwards.

No, the reverse, that clearing poison from sysfs is a debug special case
because it cannot be done at runtime.

There are existing efforts to teach other parts of the kernel how to
better deal with poison memory in general (nothing specific to CXL) [1].
Correctable poison is a new concept for most of the kernel. Usually
memory_failure() is a one way trip to offline the page.

[1]: https://lore.kernel.org/all/20220603053738.1218681-7-ruansy.fnst@fujitsu.com/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 1/6] cxl/memdev: Add support for the Inject Poison mailbox command
  2023-01-27 23:06   ` Dan Williams
@ 2023-01-28  2:47     ` Alison Schofield
  2023-01-29  3:49       ` Dan Williams
  0 siblings, 1 reply; 23+ messages in thread
From: Alison Schofield @ 2023-01-28  2:47 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang, linux-cxl,
	Jonathan Cameron

On Fri, Jan 27, 2023 at 03:06:16PM -0800, Dan Williams wrote:
> alison.schofield@ wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > CXL devices optionally support the INJECT POISON mailbox command. Add
> > a sysfs attribute and memdev driver support for injecting poison.
> > 
> > When a Device Physical Address (DPA) is written to the inject_poison
> > sysfs attribute, send an inject poison command to the device for the
> > specified address.
> > 
> > Per the CXL Specification (3.0 8.2.9.8.4.2), after receiving a valid
> > inject poison request, the device will return poison when the address
> > is accessed through the CXL.mem bus. Injecting poison adds the address
> > to the device's Poison List and the error source is set to Injected.
> > In addition, the device adds a poison creation event to its internal
> > Informational Event log, updates the Event Status register, and if
> > configured, interrupts the host.
> > 
> > Also, per the CXL Specification, it is not an error to inject poison
> > into an address that already has poison present and no error is
> > returned from the device.
> > 
> > The inject_poison attribute is only visible for devices supporting
> > the capability when the kernel is built with CONFIG_CXL_POISON_INJECT.
> > 
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > ---
> >  Documentation/ABI/testing/sysfs-bus-cxl | 22 ++++++++
> >  drivers/cxl/Kconfig                     | 10 ++++
> >  drivers/cxl/core/memdev.c               | 67 +++++++++++++++++++++++++
> >  drivers/cxl/cxlmem.h                    |  5 ++
> >  4 files changed, 104 insertions(+)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> > index b715a4609718..e9c6dd02bd09 100644
> > --- a/Documentation/ABI/testing/sysfs-bus-cxl
> > +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> > @@ -416,3 +416,25 @@ Description:
> >  		if accessed, and the source of the poison. The retrieved
> >  		errors are logged as kernel trace events with the label
> >  		'cxl_poison'.
> > +
> > +
> > +What:		/sys/bus/cxl/devices/memX/inject_poison
> > +Date:		January, 2023
> > +KernelVersion:	v6.3
> > +Contact:	linux-cxl@vger.kernel.org
> > +Description:
> > +		(WO) When a Device Physical Address (DPA) is written to this
> > +		attribute, the memdev driver sends an inject poison command to
> > +		the device for the specified address. The DPA must be 64-byte
> > +		aligned and the length of the injected poison is 64-bytes. If
> > +		successful, the device returns poison when the address is
> > +		accessed through the CXL.mem bus. Injecting poison adds the
> > +		address to the device's Poison List and the error source is set
> > +		to Injected. In addition, the device adds a poison creation
> > +		event to its internal Informational Event log, updates the
> > +		Event Status register, and if configured, interrupts the host.
> > +		It is not an error to inject poison into an address that
> > +		already has poison present and no error is returned. The
> > +		inject_poison attribute is only visible for devices supporting
> > +		the capability. Kconfig option CXL_POISON_INJECT must be on
> > +		to enable this option. The default is off.
> > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> > index 0ac53c422c31..6541f54725cd 100644
> > --- a/drivers/cxl/Kconfig
> > +++ b/drivers/cxl/Kconfig
> > @@ -129,4 +129,14 @@ config CXL_REGION_INVALIDATION_TEST
> >  	  If unsure, or if this kernel is meant for production environments,
> >  	  say N.
> >  
> > +config CXL_POISON_INJECT
> > +	bool "CXL: Support CXL Memory Device Poison Inject"
> > +	depends on CXL_MEM
> > +	help
> > +	  Selecting this option creates the sysfs attributes inject_poison
> > +	  and clear_poison for CXL memory devices supporting the capability.
> > +	  See Documentation/ABI/testing/sysfs-bus-cxl.
> 
> Could maybe clarify that this is meant for hardware debug scenarios so
> that is the reason it is disabled by default.
>
Got it. Thanks!

> > +
> > +	  If unsure, say N.
> > +
> >  endif
> > diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> > index e0af7e9c9989..226662cf3331 100644
> > --- a/drivers/cxl/core/memdev.c
> > +++ b/drivers/cxl/core/memdev.c
> > @@ -142,6 +142,61 @@ static ssize_t trigger_poison_list_store(struct device *dev,
> >  }
> >  static DEVICE_ATTR_WO(trigger_poison_list);
> >  
> > +static int cxl_validate_poison_dpa(struct cxl_dev_state *cxlds, u64 dpa)
> > +{
> > +	if (!resource_size(&cxlds->dpa_res)) {
> > +		dev_dbg(cxlds->dev, "device has no dpa resource\n");
> > +		return -EINVAL;
> > +	}
> > +	if (dpa < cxlds->dpa_res.start || dpa > cxlds->dpa_res.end) {
> > +		dev_dbg(cxlds->dev, "dpa:0x%llx not in resource:%pR\n",
> > +			dpa, &cxlds->dpa_res);
> > +		return -EINVAL;
> > +	}
> > +	if (!IS_ALIGNED(dpa, 64)) {
> > +		dev_dbg(cxlds->dev, "dpa:0x%llx is not 64-byte aligned\n",
> > +			dpa);
> > +		return -EINVAL;
> > +	}
> > +	return 0;
> > +}
> > +
> > +static ssize_t inject_poison_store(struct device *dev,
> > +				   struct device_attribute *attr,
> > +				   const char *buf, size_t len)
> > +{
> > +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> > +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > +	struct cxl_mbox_inject_poison inject;
> > +	struct cxl_mbox_cmd mbox_cmd;
> > +	u64 dpa;
> > +	int rc;
> > +
> > +	rc = kstrtou64(buf, 0, &dpa);
> > +	if (rc)
> > +		return rc;
> > +
> > +	rc = cxl_validate_poison_dpa(cxlds, dpa);
> > +	if (rc)
> > +		return rc;
> > +
> > +	inject = (struct cxl_mbox_inject_poison) {
> > +		.address = cpu_to_le64(dpa)
> > +	};
> > +	mbox_cmd = (struct cxl_mbox_cmd) {
> > +		.opcode = CXL_MBOX_OP_INJECT_POISON,
> > +		.size_in = sizeof(inject),
> > +		.payload_in = &inject,
> > +	};
> > +
> > +	rc = cxl_internal_send_cmd(cxlds, &mbox_cmd);
> > +	if (rc)
> > +		return rc;
> > +
> > +	return len;
> > +}
> > +static DEVICE_ATTR_WO(inject_poison);
> > +
> >  static struct attribute *cxl_memdev_attributes[] = {
> >  	&dev_attr_serial.attr,
> >  	&dev_attr_firmware_version.attr,
> > @@ -149,6 +204,7 @@ static struct attribute *cxl_memdev_attributes[] = {
> >  	&dev_attr_label_storage_size.attr,
> >  	&dev_attr_numa_node.attr,
> >  	&dev_attr_trigger_poison_list.attr,
> > +	&dev_attr_inject_poison.attr,
> >  	NULL,
> >  };
> >  
> > @@ -168,6 +224,10 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
> >  	if (!IS_ENABLED(CONFIG_NUMA) && a == &dev_attr_numa_node.attr)
> >  		return 0;
> >  
> > +	if (!IS_ENABLED(CONFIG_CXL_POISON_INJECT) &&
> > +	    a == &dev_attr_inject_poison.attr)
> > +		return 0;
> > +
> >  	if (a == &dev_attr_trigger_poison_list.attr) {
> >  		struct device *dev = kobj_to_dev(kobj);
> >  
> > @@ -175,6 +235,13 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
> >  			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
> >  			return 0;
> >  	}
> > +	if (a == &dev_attr_inject_poison.attr) {
> > +		struct device *dev = kobj_to_dev(kobj);
> 
> I'd move the IS_ENABLED(CONFIG_CXL_POISON_INJECT) inside here so just
> one spot in this function handles the poison attribute.

OK

> 
> > +
> > +		if (!test_bit(CXL_MEM_COMMAND_ID_INJECT_POISON,
> > +			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
> > +			return 0;
> 
> Ugh, this is a problem. So "inject poison" never should have been
> enabled for the ioctl path way back in:
> 
> 87815ee9d006 cxl/pci: Add media provisioning required commands
> 
> All the nice sysfs interface and compile option to turn it off in this
> patch is moot since userspace can just send the ioctl if the sysfs
> attribute is missing.
> 
> On the one hand this is already shipping ABI, but given cxl-cli has not
> been enabled it chances are high that it can be deleted without anyone
> caring (i.e. breaking deployed configurations). That would need to be a
> lead in patch.

I'm confused on how to kill it.

I see it in the enum here: include/uapi/linux/cxl_mem.h, and I also
see Ira's patch reminding us not to break backward compatibility on
that enum. Do I replace with dummy entries?

I'll block in raw mode too.

> As for how to detect that the inject poison opcode is
> supported, that needs something like a custom "cxlds->has_poison_inject"
> flag into cxl_walk_cel(). I.e.  cxlds->enabled_cmds is only about the
> enabled ioctl wrapper commands for CXL opcodes not the availability of
> opcodes for cxl_internal_send_cmd().
Sounds like fun ;)

Thanks Dan,
Alison

> 
> 
> > +	}
> >  	return a->mode;
> >  }
> >  
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index 28ba0cd8f2d3..862ca4f4cc06 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -436,6 +436,11 @@ struct cxl_mbox_poison_payload_out {
> >  #define CXL_POISON_SOURCE_INJECTED	3
> >  #define CXL_POISON_SOURCE_VENDOR	7
> >  
> > +/* Inject & Clear Poison  CXL 3.0 Spec 8.2.9.8.4.2/3 */
> > +struct cxl_mbox_inject_poison {
> > +	__le64 address;
> > +};
> > +
> >  /**
> >   * struct cxl_mem_command - Driver representation of a memory device command
> >   * @info: Command information as it exists for the UAPI
> > -- 
> > 2.37.3
> > 
> 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 1/6] cxl/memdev: Add support for the Inject Poison mailbox command
  2023-01-28  2:47     ` Alison Schofield
@ 2023-01-29  3:49       ` Dan Williams
  0 siblings, 0 replies; 23+ messages in thread
From: Dan Williams @ 2023-01-29  3:49 UTC (permalink / raw)
  To: Alison Schofield, Dan Williams
  Cc: Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang, linux-cxl,
	Jonathan Cameron

Alison Schofield wrote:
> On Fri, Jan 27, 2023 at 03:06:16PM -0800, Dan Williams wrote:
> > alison.schofield@ wrote:
> > > From: Alison Schofield <alison.schofield@intel.com>
> > > 
> > > CXL devices optionally support the INJECT POISON mailbox command. Add
> > > a sysfs attribute and memdev driver support for injecting poison.
> > > 
> > > When a Device Physical Address (DPA) is written to the inject_poison
> > > sysfs attribute, send an inject poison command to the device for the
> > > specified address.
> > > 
> > > Per the CXL Specification (3.0 8.2.9.8.4.2), after receiving a valid
> > > inject poison request, the device will return poison when the address
> > > is accessed through the CXL.mem bus. Injecting poison adds the address
> > > to the device's Poison List and the error source is set to Injected.
> > > In addition, the device adds a poison creation event to its internal
> > > Informational Event log, updates the Event Status register, and if
> > > configured, interrupts the host.
> > > 
> > > Also, per the CXL Specification, it is not an error to inject poison
> > > into an address that already has poison present and no error is
> > > returned from the device.
> > > 
> > > The inject_poison attribute is only visible for devices supporting
> > > the capability when the kernel is built with CONFIG_CXL_POISON_INJECT.
> > > 
> > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > > ---
> > >  Documentation/ABI/testing/sysfs-bus-cxl | 22 ++++++++
> > >  drivers/cxl/Kconfig                     | 10 ++++
> > >  drivers/cxl/core/memdev.c               | 67 +++++++++++++++++++++++++
> > >  drivers/cxl/cxlmem.h                    |  5 ++
> > >  4 files changed, 104 insertions(+)
> > > 
> > > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> > > index b715a4609718..e9c6dd02bd09 100644
> > > --- a/Documentation/ABI/testing/sysfs-bus-cxl
> > > +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> > > @@ -416,3 +416,25 @@ Description:
> > >  		if accessed, and the source of the poison. The retrieved
> > >  		errors are logged as kernel trace events with the label
> > >  		'cxl_poison'.
> > > +
> > > +
> > > +What:		/sys/bus/cxl/devices/memX/inject_poison
> > > +Date:		January, 2023
> > > +KernelVersion:	v6.3
> > > +Contact:	linux-cxl@vger.kernel.org
> > > +Description:
> > > +		(WO) When a Device Physical Address (DPA) is written to this
> > > +		attribute, the memdev driver sends an inject poison command to
> > > +		the device for the specified address. The DPA must be 64-byte
> > > +		aligned and the length of the injected poison is 64-bytes. If
> > > +		successful, the device returns poison when the address is
> > > +		accessed through the CXL.mem bus. Injecting poison adds the
> > > +		address to the device's Poison List and the error source is set
> > > +		to Injected. In addition, the device adds a poison creation
> > > +		event to its internal Informational Event log, updates the
> > > +		Event Status register, and if configured, interrupts the host.
> > > +		It is not an error to inject poison into an address that
> > > +		already has poison present and no error is returned. The
> > > +		inject_poison attribute is only visible for devices supporting
> > > +		the capability. Kconfig option CXL_POISON_INJECT must be on
> > > +		to enable this option. The default is off.
> > > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> > > index 0ac53c422c31..6541f54725cd 100644
> > > --- a/drivers/cxl/Kconfig
> > > +++ b/drivers/cxl/Kconfig
> > > @@ -129,4 +129,14 @@ config CXL_REGION_INVALIDATION_TEST
> > >  	  If unsure, or if this kernel is meant for production environments,
> > >  	  say N.
> > >  
> > > +config CXL_POISON_INJECT
> > > +	bool "CXL: Support CXL Memory Device Poison Inject"
> > > +	depends on CXL_MEM
> > > +	help
> > > +	  Selecting this option creates the sysfs attributes inject_poison
> > > +	  and clear_poison for CXL memory devices supporting the capability.
> > > +	  See Documentation/ABI/testing/sysfs-bus-cxl.
> > 
> > Could maybe clarify that this is meant for hardware debug scenarios so
> > that is the reason it is disabled by default.
> >
> Got it. Thanks!
> 
> > > +
> > > +	  If unsure, say N.
> > > +
> > >  endif
> > > diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> > > index e0af7e9c9989..226662cf3331 100644
> > > --- a/drivers/cxl/core/memdev.c
> > > +++ b/drivers/cxl/core/memdev.c
> > > @@ -142,6 +142,61 @@ static ssize_t trigger_poison_list_store(struct device *dev,
> > >  }
> > >  static DEVICE_ATTR_WO(trigger_poison_list);
> > >  
> > > +static int cxl_validate_poison_dpa(struct cxl_dev_state *cxlds, u64 dpa)
> > > +{
> > > +	if (!resource_size(&cxlds->dpa_res)) {
> > > +		dev_dbg(cxlds->dev, "device has no dpa resource\n");
> > > +		return -EINVAL;
> > > +	}
> > > +	if (dpa < cxlds->dpa_res.start || dpa > cxlds->dpa_res.end) {
> > > +		dev_dbg(cxlds->dev, "dpa:0x%llx not in resource:%pR\n",
> > > +			dpa, &cxlds->dpa_res);
> > > +		return -EINVAL;
> > > +	}
> > > +	if (!IS_ALIGNED(dpa, 64)) {
> > > +		dev_dbg(cxlds->dev, "dpa:0x%llx is not 64-byte aligned\n",
> > > +			dpa);
> > > +		return -EINVAL;
> > > +	}
> > > +	return 0;
> > > +}
> > > +
> > > +static ssize_t inject_poison_store(struct device *dev,
> > > +				   struct device_attribute *attr,
> > > +				   const char *buf, size_t len)
> > > +{
> > > +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> > > +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > > +	struct cxl_mbox_inject_poison inject;
> > > +	struct cxl_mbox_cmd mbox_cmd;
> > > +	u64 dpa;
> > > +	int rc;
> > > +
> > > +	rc = kstrtou64(buf, 0, &dpa);
> > > +	if (rc)
> > > +		return rc;
> > > +
> > > +	rc = cxl_validate_poison_dpa(cxlds, dpa);
> > > +	if (rc)
> > > +		return rc;
> > > +
> > > +	inject = (struct cxl_mbox_inject_poison) {
> > > +		.address = cpu_to_le64(dpa)
> > > +	};
> > > +	mbox_cmd = (struct cxl_mbox_cmd) {
> > > +		.opcode = CXL_MBOX_OP_INJECT_POISON,
> > > +		.size_in = sizeof(inject),
> > > +		.payload_in = &inject,
> > > +	};
> > > +
> > > +	rc = cxl_internal_send_cmd(cxlds, &mbox_cmd);
> > > +	if (rc)
> > > +		return rc;
> > > +
> > > +	return len;
> > > +}
> > > +static DEVICE_ATTR_WO(inject_poison);
> > > +
> > >  static struct attribute *cxl_memdev_attributes[] = {
> > >  	&dev_attr_serial.attr,
> > >  	&dev_attr_firmware_version.attr,
> > > @@ -149,6 +204,7 @@ static struct attribute *cxl_memdev_attributes[] = {
> > >  	&dev_attr_label_storage_size.attr,
> > >  	&dev_attr_numa_node.attr,
> > >  	&dev_attr_trigger_poison_list.attr,
> > > +	&dev_attr_inject_poison.attr,
> > >  	NULL,
> > >  };
> > >  
> > > @@ -168,6 +224,10 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
> > >  	if (!IS_ENABLED(CONFIG_NUMA) && a == &dev_attr_numa_node.attr)
> > >  		return 0;
> > >  
> > > +	if (!IS_ENABLED(CONFIG_CXL_POISON_INJECT) &&
> > > +	    a == &dev_attr_inject_poison.attr)
> > > +		return 0;
> > > +
> > >  	if (a == &dev_attr_trigger_poison_list.attr) {
> > >  		struct device *dev = kobj_to_dev(kobj);
> > >  
> > > @@ -175,6 +235,13 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
> > >  			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
> > >  			return 0;
> > >  	}
> > > +	if (a == &dev_attr_inject_poison.attr) {
> > > +		struct device *dev = kobj_to_dev(kobj);
> > 
> > I'd move the IS_ENABLED(CONFIG_CXL_POISON_INJECT) inside here so just
> > one spot in this function handles the poison attribute.
> 
> OK
> 
> > 
> > > +
> > > +		if (!test_bit(CXL_MEM_COMMAND_ID_INJECT_POISON,
> > > +			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
> > > +			return 0;
> > 
> > Ugh, this is a problem. So "inject poison" never should have been
> > enabled for the ioctl path way back in:
> > 
> > 87815ee9d006 cxl/pci: Add media provisioning required commands
> > 
> > All the nice sysfs interface and compile option to turn it off in this
> > patch is moot since userspace can just send the ioctl if the sysfs
> > attribute is missing.
> > 
> > On the one hand this is already shipping ABI, but given cxl-cli has not
> > been enabled it chances are high that it can be deleted without anyone
> > caring (i.e. breaking deployed configurations). That would need to be a
> > lead in patch.
> 
> I'm confused on how to kill it.
> 
> I see it in the enum here: include/uapi/linux/cxl_mem.h, and I also
> see Ira's patch reminding us not to break backward compatibility on
> that enum. Do I replace with dummy entries?

Right, they need to be converted to "deprecated" entries that keep the
command id order intact.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2023-01-29  3:49 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-19  5:00 [PATCH v2 0/6] cxl: CXL Inject & Clear Poison alison.schofield
2023-01-19  5:00 ` [PATCH v2 1/6] cxl/memdev: Add support for the Inject Poison mailbox command alison.schofield
2023-01-27 23:06   ` Dan Williams
2023-01-28  2:47     ` Alison Schofield
2023-01-29  3:49       ` Dan Williams
2023-01-19  5:00 ` [PATCH v2 2/6] cxl/memdev: Add support for the Clear " alison.schofield
2023-01-27 23:56   ` Dan Williams
2023-01-28  1:17     ` Alison Schofield
2023-01-28  2:19       ` Dan Williams
2023-01-19  5:00 ` [PATCH v2 3/6] tools/testing/cxl: Mock the Inject " alison.schofield
2023-01-23 15:10   ` Jonathan Cameron
2023-01-24  0:06     ` Alison Schofield
2023-01-19  5:00 ` [PATCH v2 4/6] tools/testing/cxl: Mock the Clear " alison.schofield
2023-01-19  5:00 ` [PATCH v2 5/6] tools/testing/cxl: Use injected poison for get poison list alison.schofield
2023-01-23 15:16   ` Jonathan Cameron
2023-01-24  0:24     ` Alison Schofield
2023-01-24 10:15       ` Jonathan Cameron
2023-01-19  5:00 ` [PATCH v2 6/6] tools/testing/cxl: Add a param to test poison injection limits alison.schofield
2023-01-23 15:28   ` Jonathan Cameron
2023-01-23 23:57     ` Alison Schofield
2023-01-23 17:13 ` [PATCH v2 0/6] cxl: CXL Inject & Clear Poison Jonathan Cameron
2023-01-23 23:42   ` Alison Schofield
2023-01-24 10:21     ` Jonathan Cameron

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.