All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] cxl: CXL Inject & Clear Poison
@ 2022-11-30  4:34 alison.schofield
  2022-11-30  4:34 ` [PATCH 1/5] cxl/memdev: Add support for the Inject Poison mailbox command alison.schofield
                   ` (4 more replies)
  0 siblings, 5 replies; 25+ messages in thread
From: alison.schofield @ 2022-11-30  4:34 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

Introducing Inject and Clear Poison support for CXL Devices.

These are optional commands, meaning not all CXL devices must support
them. The sysfs attributes, inject_poison and clear_poison, are only
visible for devices reporting support of the capability.

AFAICS the spec does not say support has to come in pairs. A device
could support Inject and not Clear, or the reverse.

Patches 1 & 2 are the required driver implementation.
Patches 3 & 4 are required for testing with cxl_test.

Then there's patch 5. I built on top of the Get Poison Patchset[1]
because Get Poison is useful for checking on the success of Inject
and Clear Poison. Patch 5 teaches the mock of Get Poison how to use
the injected poison list.

That could all be flipped. Inject & Clear could merge first, then
Get Poison would have the capabilities of Patch 5 from the onset.
Or, they can all roll together into one happy patchset.

Example:
# cxl list -m mem1 --media-errors -u
Now you see the default mock media-errors:
{
  "memdev":"mem1",
  "pmem_size":"1024.00 MiB (1073.74 MB)",
  "ram_size":"1024.00 MiB (1073.74 MB)",
  "serial":"0x1",
  "host":"cxl_mem.1",
  "media_errors":{
    "nr_media_errors":2,
    "media_error_records":[
      {
        "dpa":"0x40000000",
        "length":64,
        "source":"Internal",
        "flags":"",
        "overflow_time":0
      },
      {
        "dpa":"0",
        "length":64,
        "source":"Internal",
        "flags":"",
        "overflow_time":0
      }
    ]
  }
}

# echo 0x41000000 > /sys/bus/cxl/devices/mem1/inject_poison
# echo 0x30000000 > /sys/bus/cxl/devices/mem1/inject_poison
Now you see the injected mock media-errors:
# cxl list -m mem1 --media-errors -u
{
  "memdev":"mem1",
  "pmem_size":"1024.00 MiB (1073.74 MB)",
  "ram_size":"1024.00 MiB (1073.74 MB)",
  "serial":"0x1",
  "host":"cxl_mem.1",
  "media_errors":{
    "nr_media_errors":2,
    "media_error_records":[
      {
        "dpa":"0x41000000",
        "length":64,
        "source":"Injected",
        "flags":"",
        "overflow_time":0
      },
      {
        "dpa":"0x30000000",
        "length":64,
        "source":"Injected",
        "flags":"",
        "overflow_time":0
      }
    ]
  }

Alison Schofield (5):
  cxl/memdev: Add support for the Inject Poison mailbox command
  cxl/memdev: Add support for the Clear Poison mailbox command
  tools/testing/cxl: Mock the Inject Poison mailbox command
  tools/testing/cxl: Mock the Clear Poison mailbox command
  tools/testing/cxl: Use injected poison for Get Poison List

 Documentation/ABI/testing/sysfs-bus-cxl |  36 +++++
 drivers/cxl/core/memdev.c               | 100 ++++++++++++++
 drivers/cxl/cxlmem.h                    |   9 ++
 tools/testing/cxl/test/mem.c            | 170 +++++++++++++++++++++---
 4 files changed, 295 insertions(+), 20 deletions(-)

-- 
2.37.3


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/5] cxl/memdev: Add support for the Inject Poison mailbox command
  2022-11-30  4:34 [PATCH 0/5] cxl: CXL Inject & Clear Poison alison.schofield
@ 2022-11-30  4:34 ` alison.schofield
  2022-11-30 14:31   ` Jonathan Cameron
                     ` (2 more replies)
  2022-11-30  4:34 ` [PATCH 2/5] cxl/memdev: Add support for the Clear " alison.schofield
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 25+ messages in thread
From: alison.schofield @ 2022-11-30  4:34 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

CXL devices optionally support the INJECT POISON mailbox command. Add
a sysfs attribute and memdev driver support for injecting poison.

When a Device Physical Address (DPA) is written to the inject_poison
sysfs attribute send an inject poison command to the device for the
specified address.

Per the CXL Specification (8.2.9.8.4.2), after receiving a valid
inject poison request, the device will return poison when the address
is accessed through the CXL.mem bus. Injecting poison adds the address
to the device's Poison List and the error source is set to injected
error. In addition, the device adds a poison creation event to its
internal Informational Event log, updates the Event Status register,
and if configured, interrupts the host.

Also, per the CXL Specification, it is not an error to inject poison
into an address that already has poison present and no error is returned
from the device. The memdev driver performs basic sanity checking on the
address, however, it does not go as far as reading the poison list to see
if the address is on the list. That discovery is left to the device.

The inject_poison attribute is only visible for devices supporting
the capability.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 Documentation/ABI/testing/sysfs-bus-cxl | 19 +++++++++
 drivers/cxl/core/memdev.c               | 53 +++++++++++++++++++++++++
 drivers/cxl/cxlmem.h                    |  3 ++
 3 files changed, 75 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index b715a4609718..20db97f7a1aa 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -416,3 +416,22 @@ Description:
 		if accessed, and the source of the poison. The retrieved
 		errors are logged as kernel trace events with the label
 		'cxl_poison'.
+
+
+What:		/sys/bus/cxl/devices/memX/inject_poison
+Date:		December, 2022
+KernelVersion:	v6.2
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(WO) When a Device Physical Address (DPA) is written to this
+		attribute the memdev driver sends an inject poison command to
+		the device for the specified address. If successful, the device
+		returns poison when the address is accessed through the CXL.mem
+		bus. Injecting poison adds the address to the device's Poison
+		List and the error source is set to injected error. In addition,
+		the device adds a poison creation event to its internal
+		Informational Event log, updates the Event Status register, and
+		if configured, interrupts the host. It is not an error to inject
+		poison into an address that already has poison present and no
+		error is returned. The inject_poison attribute is only visible
+                for devices supporting the capability.
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index d08b7295a01c..71130813030f 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -142,6 +142,51 @@ static ssize_t trigger_poison_list_store(struct device *dev,
 }
 static DEVICE_ATTR_WO(trigger_poison_list);
 
+static int cxl_validate_poison_dpa(struct cxl_dev_state *cxlds, u64 dpa)
+{
+	if (!resource_size(&cxlds->dpa_res)) {
+		dev_dbg(cxlds->dev, "device has no dpa resource\n");
+		return -EINVAL;
+	}
+	if (dpa < cxlds->dpa_res.start || dpa > cxlds->dpa_res.end) {
+		dev_dbg(cxlds->dev, "dpa:0x%llx not in resource:%pR\n",
+			dpa, &cxlds->dpa_res);
+		return -EINVAL;
+	}
+	if ((dpa & CXL_POISON_INJECT_RESERVED) != 0) {
+		dev_dbg(cxlds->dev, "dpa reserve bit(s) [5:0] set 0x%llx\n",
+			dpa);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static ssize_t inject_poison_store(struct device *dev,
+				   struct device_attribute *attr,
+				   const char *buf, size_t len)
+{
+	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+	u64 dpa;
+	int rc;
+
+	rc = kstrtou64(buf, 0, &dpa);
+	if (rc)
+		return rc;
+	rc = cxl_validate_poison_dpa(cxlds, dpa);
+	if (rc)
+		return rc;
+
+	dpa = cpu_to_le64(dpa);
+	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_INJECT_POISON, &dpa,
+			       sizeof(dpa), NULL, cxlds->payload_size);
+	if (rc)
+		return rc;
+
+	return len;
+}
+static DEVICE_ATTR_WO(inject_poison);
+
 static struct attribute *cxl_memdev_attributes[] = {
 	&dev_attr_serial.attr,
 	&dev_attr_firmware_version.attr,
@@ -149,6 +194,7 @@ static struct attribute *cxl_memdev_attributes[] = {
 	&dev_attr_label_storage_size.attr,
 	&dev_attr_numa_node.attr,
 	&dev_attr_trigger_poison_list.attr,
+	&dev_attr_inject_poison.attr,
 	NULL,
 };
 
@@ -175,6 +221,13 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
 			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
 			return 0;
 	}
+	if (a == &dev_attr_inject_poison.attr) {
+		struct device *dev = kobj_to_dev(kobj);
+
+		if (!test_bit(CXL_MEM_COMMAND_ID_INJECT_POISON,
+			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
+			return 0;
+	}
 	return a->mode;
 }
 
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 19a9e545ac19..0d4c34be7335 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -396,6 +396,9 @@ struct cxl_mbox_poison_payload_out {
 #define CXL_POISON_SOURCE_INJECTED	3
 #define CXL_POISON_SOURCE_VENDOR	7
 
+/* Inject & Clear Poison  CXL 3.0 Spec 8.2.9.8.4.2/3 */
+#define CXL_POISON_INJECT_RESERVED	GENMASK_ULL(5, 0)
+
 /**
  * struct cxl_mem_command - Driver representation of a memory device command
  * @info: Command information as it exists for the UAPI
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/5] cxl/memdev: Add support for the Clear Poison mailbox command
  2022-11-30  4:34 [PATCH 0/5] cxl: CXL Inject & Clear Poison alison.schofield
  2022-11-30  4:34 ` [PATCH 1/5] cxl/memdev: Add support for the Inject Poison mailbox command alison.schofield
@ 2022-11-30  4:34 ` alison.schofield
  2022-11-30 14:43   ` Jonathan Cameron
  2022-12-01 17:54   ` Dave Jiang
  2022-11-30  4:34 ` [PATCH 3/5] tools/testing/cxl: Mock the Inject " alison.schofield
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 25+ messages in thread
From: alison.schofield @ 2022-11-30  4:34 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

CXL devices optionally support the CLEAR POISON mailbox command. Add
a sysfs attribute and memdev driver support for clearing poison.

When a Device Physical Address (DPA) is written to the clear_poison
sysfs attribute send a clear poison command to the device for the
specified address.

Per the CXL Specification (8.2.9.8.4.3), after receiving a valid clear
poison request, the device removes the address from the device's Poison
List and writes 0 (zero) for 64 bytes starting at address. If the device
cannot clear poison from the address, it returns a permanent media error
and ENXIO is returned to the user.

Additionally, and per the spec also, it is not an error to clear poison
of an address that is not poisoned. No error is returned and the address
is not overwritten. The memdev driver performs basic sanity checking on
the address, however, it does not go as far as reading the poison list to
see if the address is poisoned before clearing. That discovery is left to
the device. The device safely handles that case.

Implementation note: Although the CXL specification defines the clear
command to accept 64 bytes of 'write-data' to be used when clearing
the poisoned address, this implementation always uses 0 (zeros) for
the write-data.

The clear_poison attribute is only visible for devices supporting the
capability.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 Documentation/ABI/testing/sysfs-bus-cxl | 17 +++++++++
 drivers/cxl/core/memdev.c               | 47 +++++++++++++++++++++++++
 drivers/cxl/cxlmem.h                    |  6 ++++
 3 files changed, 70 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 20db97f7a1aa..9d2b0fa07e17 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -435,3 +435,20 @@ Description:
 		poison into an address that already has poison present and no
 		error is returned. The inject_poison attribute is only visible
                 for devices supporting the capability.
+
+
+What:		/sys/bus/cxl/devices/memX/clear_poison
+Date:		December, 2022
+KernelVersion:	v6.2
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(WO) When a Device Physical Address (DPA) is written to this
+		attribute the memdev driver sends a clear poison command to the
+		device for the specified address. Clearing poison removes the
+		address from the device's Poison List and writes 0 (zero)
+		for 64 bytes starting at address. It is not an error to clear
+		poison from an address that does not have poison set, and if
+		poison was not set, the address is not overwritten. If the
+		device cannot clear poison from the address, ENXIO is returned.
+		The clear_poison attribute is only visible for devices
+		supporting the capability.
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 71130813030f..85caffd5a85c 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -187,6 +187,44 @@ static ssize_t inject_poison_store(struct device *dev,
 }
 static DEVICE_ATTR_WO(inject_poison);
 
+static ssize_t clear_poison_store(struct device *dev,
+				  struct device_attribute *attr,
+				  const char *buf, size_t len)
+{
+	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+	struct cxl_mbox_clear_poison *pi;
+	u64 dpa;
+	int rc;
+
+	rc = kstrtou64(buf, 0, &dpa);
+	if (rc)
+		return rc;
+	rc = cxl_validate_poison_dpa(cxlds, dpa);
+	if (rc)
+		return rc;
+	pi = kzalloc(sizeof(*pi), GFP_KERNEL);
+	if (!pi)
+		return -ENOMEM;
+	/*
+	 * In CXL 3.0 Spec 8.2.9.8.4.3, the Clear Poison mailbox command
+	 * is defined to accept 64 bytes of 'write-data', along with the
+	 * address to clear. The device writes 'write-data' into the DPA,
+	 * atomically, while clearing poison if the location is marked as
+	 * being poisoned.
+	 *
+	 * Always use '0' for the write-data.
+	 */
+	pi->address = cpu_to_le64(dpa);
+	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_CLEAR_POISON, pi,
+			       sizeof(*pi), NULL, cxlds->payload_size);
+	if (rc)
+		return rc;
+
+	return len;
+}
+static DEVICE_ATTR_WO(clear_poison);
+
 static struct attribute *cxl_memdev_attributes[] = {
 	&dev_attr_serial.attr,
 	&dev_attr_firmware_version.attr,
@@ -195,6 +233,7 @@ static struct attribute *cxl_memdev_attributes[] = {
 	&dev_attr_numa_node.attr,
 	&dev_attr_trigger_poison_list.attr,
 	&dev_attr_inject_poison.attr,
+	&dev_attr_clear_poison.attr,
 	NULL,
 };
 
@@ -228,6 +267,14 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
 			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
 			return 0;
 	}
+	if (a == &dev_attr_clear_poison.attr) {
+		struct device *dev = kobj_to_dev(kobj);
+
+		if (!test_bit(CXL_MEM_COMMAND_ID_CLEAR_POISON,
+			      to_cxl_memdev(dev)->cxlds->enabled_cmds)) {
+			return 0;
+		}
+	}
 	return a->mode;
 }
 
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 0d4c34be7335..532adf9c3afd 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -399,6 +399,12 @@ struct cxl_mbox_poison_payload_out {
 /* Inject & Clear Poison  CXL 3.0 Spec 8.2.9.8.4.2/3 */
 #define CXL_POISON_INJECT_RESERVED	GENMASK_ULL(5, 0)
 
+/* Clear Poison  CXL 3.0 Spec 8.2.9.8.4.3 */
+struct cxl_mbox_clear_poison {
+	__le64 address;
+	u8 write_data[64];
+} __packed;
+
 /**
  * struct cxl_mem_command - Driver representation of a memory device command
  * @info: Command information as it exists for the UAPI
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/5] tools/testing/cxl: Mock the Inject Poison mailbox command
  2022-11-30  4:34 [PATCH 0/5] cxl: CXL Inject & Clear Poison alison.schofield
  2022-11-30  4:34 ` [PATCH 1/5] cxl/memdev: Add support for the Inject Poison mailbox command alison.schofield
  2022-11-30  4:34 ` [PATCH 2/5] cxl/memdev: Add support for the Clear " alison.schofield
@ 2022-11-30  4:34 ` alison.schofield
  2022-11-30 14:58   ` Jonathan Cameron
  2022-11-30  4:34 ` [PATCH 4/5] tools/testing/cxl: Mock the Clear " alison.schofield
  2022-11-30  4:34 ` [PATCH 5/5] tools/testing/cxl: Use injected poison for Get Poison List alison.schofield
  4 siblings, 1 reply; 25+ messages in thread
From: alison.schofield @ 2022-11-30  4:34 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

Mock the injection of poison by storing the device:address entry in a
cxl_test array: mock_poison[]. Limit the array to 64 entries and fail
new inject requests when full.

Introducing the mock_poison[] list here, makes it available for use in
the mock of Clear Poison, and the mock of Get Poison List.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 tools/testing/cxl/test/mem.c | 53 ++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index a4f81915ec03..98acb9a644df 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -13,6 +13,7 @@
 #define LSA_SIZE SZ_128K
 #define DEV_SIZE SZ_2G
 #define EFFECT(x) (1U << x)
+#define MOCK_INJECT_POISON_MAX 64
 
 static struct cxl_cel_entry mock_cel[] = {
 	{
@@ -43,6 +44,10 @@ static struct cxl_cel_entry mock_cel[] = {
 		.opcode = cpu_to_le16(CXL_MBOX_OP_GET_POISON),
 		.effect = cpu_to_le16(0),
 	},
+	{
+		.opcode = cpu_to_le16(CXL_MBOX_OP_INJECT_POISON),
+		.effect = cpu_to_le16(0),
+	},
 };
 
 /* See CXL 2.0 Table 181 Get Health Info Output Payload */
@@ -210,6 +215,51 @@ static int mock_health_info(struct cxl_dev_state *cxlds,
 	return 0;
 }
 
+static struct mock_poison {
+	struct cxl_dev_state *cxlds;
+	u64 dpa;
+} mock_poison[MOCK_INJECT_POISON_MAX];
+
+static bool mock_poison_add(struct cxl_dev_state *cxlds, u64 dpa)
+{
+	for (int i = 0; i < MOCK_INJECT_POISON_MAX; i++) {
+		if (!mock_poison[i].cxlds) {
+			mock_poison[i].cxlds = cxlds;
+			mock_poison[i].dpa = dpa;
+			return true;
+		}
+	}
+	dev_dbg(cxlds->dev, "Mock poison list full: %d\n",
+		MOCK_INJECT_POISON_MAX);
+	return false;
+}
+
+static bool mock_poison_found(struct cxl_dev_state *cxlds, u64 dpa)
+{
+	for (int i = 0; i < MOCK_INJECT_POISON_MAX; i++) {
+		if (mock_poison[i].cxlds == cxlds &&
+		    mock_poison[i].dpa == dpa)
+			return true;
+	}
+	return false;
+}
+
+static int mock_inject_poison(struct cxl_dev_state *cxlds,
+			      struct cxl_mbox_cmd *cmd)
+{
+	u64 dpa = le64_to_cpu(*((u64 *)cmd->payload_in));
+
+	if (mock_poison_found(cxlds, dpa)) {
+		/* Not an error to inject poison if already poisoned */
+		dev_dbg(cxlds->dev, "DPA: 0x%llx already poisoned\n", dpa);
+		return 0;
+	}
+	if (!mock_poison_add(cxlds, dpa))
+		return -ENXIO;
+
+	return 0;
+}
+
 static int mock_get_poison(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
 {
 	struct cxl_mbox_poison_payload_in *pi = cmd->payload_in;
@@ -268,6 +318,9 @@ static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *
 	case CXL_MBOX_OP_GET_POISON:
 		rc = mock_get_poison(cxlds, cmd);
 		break;
+	case CXL_MBOX_OP_INJECT_POISON:
+		rc = mock_inject_poison(cxlds, cmd);
+		break;
 	default:
 		break;
 	}
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 4/5] tools/testing/cxl: Mock the Clear Poison mailbox command
  2022-11-30  4:34 [PATCH 0/5] cxl: CXL Inject & Clear Poison alison.schofield
                   ` (2 preceding siblings ...)
  2022-11-30  4:34 ` [PATCH 3/5] tools/testing/cxl: Mock the Inject " alison.schofield
@ 2022-11-30  4:34 ` alison.schofield
  2022-11-30 15:01   ` Jonathan Cameron
  2022-11-30  4:34 ` [PATCH 5/5] tools/testing/cxl: Use injected poison for Get Poison List alison.schofield
  4 siblings, 1 reply; 25+ messages in thread
From: alison.schofield @ 2022-11-30  4:34 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

Mock the clear of poison by deleting the device:address entry from
the cxl_test array: mock_poison[]. Behave like a real CXL device and do
not fail if the address is not in the poison list, but offer a dev_dbg()
message.

Unlike a real CXL device, no data is written to the address being cleared.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 tools/testing/cxl/test/mem.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 98acb9a644df..9d794fbe5ee1 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -48,6 +48,10 @@ static struct cxl_cel_entry mock_cel[] = {
 		.opcode = cpu_to_le16(CXL_MBOX_OP_INJECT_POISON),
 		.effect = cpu_to_le16(0),
 	},
+	{
+		.opcode = cpu_to_le16(CXL_MBOX_OP_CLEAR_POISON),
+		.effect = cpu_to_le16(0),
+	},
 };
 
 /* See CXL 2.0 Table 181 Get Health Info Output Payload */
@@ -244,6 +248,35 @@ static bool mock_poison_found(struct cxl_dev_state *cxlds, u64 dpa)
 	return false;
 }
 
+static bool mock_poison_del(struct cxl_dev_state *cxlds, u64 dpa)
+{
+	for (int i = 0; i < MOCK_INJECT_POISON_MAX; i++) {
+		if (mock_poison[i].cxlds == cxlds &&
+		    mock_poison[i].dpa == dpa) {
+			mock_poison[i].cxlds = NULL;
+			return true;
+		}
+	}
+	return false;
+}
+
+static int mock_clear_poison(struct cxl_dev_state *cxlds,
+			     struct cxl_mbox_cmd *cmd)
+{
+	struct cxl_mbox_clear_poison *pi = cmd->payload_in;
+
+	u64 dpa = le64_to_cpu(pi->address);
+	/*
+	 * A real CXL device will write pi->write_data to the address
+	 * being cleared. In this mock, just delete this address from
+	 * the mock poison list.
+	 */
+	if (!mock_poison_del(cxlds, dpa))
+		dev_dbg(cxlds->dev, "DPA: 0x%llx not in poison list\n", dpa);
+
+	return 0;
+}
+
 static int mock_inject_poison(struct cxl_dev_state *cxlds,
 			      struct cxl_mbox_cmd *cmd)
 {
@@ -321,6 +354,9 @@ static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *
 	case CXL_MBOX_OP_INJECT_POISON:
 		rc = mock_inject_poison(cxlds, cmd);
 		break;
+	case CXL_MBOX_OP_CLEAR_POISON:
+		rc = mock_clear_poison(cxlds, cmd);
+		break;
 	default:
 		break;
 	}
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 5/5] tools/testing/cxl: Use injected poison for Get Poison List
  2022-11-30  4:34 [PATCH 0/5] cxl: CXL Inject & Clear Poison alison.schofield
                   ` (3 preceding siblings ...)
  2022-11-30  4:34 ` [PATCH 4/5] tools/testing/cxl: Mock the Clear " alison.schofield
@ 2022-11-30  4:34 ` alison.schofield
  2022-11-30 15:15   ` Jonathan Cameron
  4 siblings, 1 reply; 25+ messages in thread
From: alison.schofield @ 2022-11-30  4:34 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

Prior to poison inject & clear support, the mock of 'Get Poison List'
returned a list containing a single mocked error record.

Now that the mock of Inject and Clear poison maintains a mock_poison[]
list, use that when available for the device. Otherwise, return the
existing default mocked error record.

This enables roundtrip userspace testing of the inject, clear, and
poison list support.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 tools/testing/cxl/test/mem.c | 97 +++++++++++++++++++++++++-----------
 1 file changed, 69 insertions(+), 28 deletions(-)

diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 9d794fbe5ee1..31c147cf2d5b 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -224,6 +224,75 @@ static struct mock_poison {
 	u64 dpa;
 } mock_poison[MOCK_INJECT_POISON_MAX];
 
+struct mock_poison_po {
+	struct cxl_mbox_poison_payload_out poison_out;
+	struct cxl_poison_record record[MOCK_INJECT_POISON_MAX];
+};
+
+/*
+ * Use the default poison payload when no injected poison is currently
+ * mocked for this device. Mock one poison record with length 64 bytes.
+ */
+static struct mock_poison_po default_poison_po = {
+	.poison_out.count = cpu_to_le16(1),
+	.record[0].length = cpu_to_le32(1),
+};
+
+static struct mock_poison_po *cxl_get_injected_po(struct cxl_dev_state *cxlds,
+						  u64 offset, u64 length)
+{
+	struct mock_poison_po *po;
+	int nr_records = 0;
+	u64 dpa;
+
+	po = kzalloc(sizeof(*po), GFP_KERNEL);
+	if (!po)
+		return NULL;
+
+	for (int i = 0; i < MOCK_INJECT_POISON_MAX; i++) {
+		if (mock_poison[i].cxlds != cxlds)
+			continue;
+		if (mock_poison[i].dpa < offset ||
+		    mock_poison[i].dpa > offset + length - 1)
+			continue;
+		dpa = cpu_to_le64(mock_poison[i].dpa |
+				  CXL_POISON_SOURCE_INJECTED);
+		po->record[nr_records].address = dpa;
+		po->record[nr_records].length = cpu_to_le32(1);
+		nr_records++;
+	}
+	if (!nr_records) {
+		kfree(po);
+		return NULL;
+	}
+	po->poison_out.count = cpu_to_le16(nr_records);
+	return po;
+}
+
+static int mock_get_poison(struct cxl_dev_state *cxlds,
+			   struct cxl_mbox_cmd *cmd)
+{
+	struct cxl_mbox_poison_payload_in *pi = cmd->payload_in;
+	struct mock_poison_po *po;
+
+	po = cxl_get_injected_po(cxlds, pi->offset, pi->length);
+	if (!po) {
+		po = &default_poison_po;
+		po->record[0].address = cpu_to_le64(pi->offset |
+						    CXL_POISON_SOURCE_INTERNAL);
+	}
+	if (cmd->size_out < sizeof(po))
+		return -EINVAL;
+
+	memcpy(cmd->payload_out, po, sizeof(*po));
+	cmd->size_out = sizeof(po);
+
+	if (po != &default_poison_po)
+		kfree(po);
+
+	return 0;
+}
+
 static bool mock_poison_add(struct cxl_dev_state *cxlds, u64 dpa)
 {
 	for (int i = 0; i < MOCK_INJECT_POISON_MAX; i++) {
@@ -293,34 +362,6 @@ static int mock_inject_poison(struct cxl_dev_state *cxlds,
 	return 0;
 }
 
-static int mock_get_poison(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
-{
-	struct cxl_mbox_poison_payload_in *pi = cmd->payload_in;
-
-	struct {
-		struct cxl_mbox_poison_payload_out poison_out;
-		struct cxl_poison_record record;
-	} mock_poison_list = {
-		.poison_out = {
-			.count = cpu_to_le16(1),
-		},
-		/* Mock one poison record at pi.offset for 64 bytes */
-		.record = {
-			/* .address encodes DPA and poison source bits */
-			.address = cpu_to_le64(pi->offset |
-					       CXL_POISON_SOURCE_INTERNAL),
-			.length = cpu_to_le32(1),
-		},
-	};
-
-	if (cmd->size_out < sizeof(mock_poison_list))
-		return -EINVAL;
-
-	memcpy(cmd->payload_out, &mock_poison_list, sizeof(mock_poison_list));
-	cmd->size_out = sizeof(mock_poison_list);
-	return 0;
-}
-
 static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
 {
 	struct device *dev = cxlds->dev;
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] cxl/memdev: Add support for the Inject Poison mailbox command
  2022-11-30  4:34 ` [PATCH 1/5] cxl/memdev: Add support for the Inject Poison mailbox command alison.schofield
@ 2022-11-30 14:31   ` Jonathan Cameron
  2022-11-30 14:40     ` Jonathan Cameron
  2022-12-01 17:26   ` Dave Jiang
  2022-12-04 22:04   ` Dan Williams
  2 siblings, 1 reply; 25+ messages in thread
From: Jonathan Cameron @ 2022-11-30 14:31 UTC (permalink / raw)
  To: alison.schofield
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Tue, 29 Nov 2022 20:34:33 -0800
alison.schofield@intel.com wrote:

> From: Alison Schofield <alison.schofield@intel.com>
> 
> CXL devices optionally support the INJECT POISON mailbox command. Add
> a sysfs attribute and memdev driver support for injecting poison.
> 
> When a Device Physical Address (DPA) is written to the inject_poison
> sysfs attribute send an inject poison command to the device for the
> specified address.
> 
> Per the CXL Specification (8.2.9.8.4.2), after receiving a valid
> inject poison request, the device will return poison when the address
> is accessed through the CXL.mem bus. Injecting poison adds the address
> to the device's Poison List and the error source is set to injected
> error. In addition, the device adds a poison creation event to its
> internal Informational Event log, updates the Event Status register,
> and if configured, interrupts the host.
> 
> Also, per the CXL Specification, it is not an error to inject poison
> into an address that already has poison present and no error is returned
> from the device. The memdev driver performs basic sanity checking on the
> address, however, it does not go as far as reading the poison list to see
> if the address is on the list. That discovery is left to the device.
> 
> The inject_poison attribute is only visible for devices supporting
> the capability.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
A few trivial things inline.  With those fixes LGTM

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  Documentation/ABI/testing/sysfs-bus-cxl | 19 +++++++++
>  drivers/cxl/core/memdev.c               | 53 +++++++++++++++++++++++++
>  drivers/cxl/cxlmem.h                    |  3 ++
>  3 files changed, 75 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index b715a4609718..20db97f7a1aa 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -416,3 +416,22 @@ Description:
>  		if accessed, and the source of the poison. The retrieved
>  		errors are logged as kernel trace events with the label
>  		'cxl_poison'.
> +
> +
> +What:		/sys/bus/cxl/devices/memX/inject_poison
> +Date:		December, 2022
> +KernelVersion:	v6.2
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(WO) When a Device Physical Address (DPA) is written to this
> +		attribute the memdev driver sends an inject poison command to
> +		the device for the specified address. If successful, the device
> +		returns poison when the address is accessed through the CXL.mem
> +		bus. Injecting poison adds the address to the device's Poison
> +		List and the error source is set to injected error. In addition,
"set to Injected."

perhaps to match spec naming in Media Error Record.

> +		the device adds a poison creation event to its internal
> +		Informational Event log, updates the Event Status register, and
> +		if configured, interrupts the host. It is not an error to inject
> +		poison into an address that already has poison present and no
> +		error is returned. The inject_poison attribute is only visible
> +                for devices supporting the capability.

White space issues (spaces instead of tabs?)

Add something about the masked bits / granularity of addresses that are accepted.

> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index d08b7295a01c..71130813030f 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -142,6 +142,51 @@ static ssize_t trigger_poison_list_store(struct device *dev,
>  }
>  static DEVICE_ATTR_WO(trigger_poison_list);
>  
> +static int cxl_validate_poison_dpa(struct cxl_dev_state *cxlds, u64 dpa)
> +{
> +	if (!resource_size(&cxlds->dpa_res)) {
> +		dev_dbg(cxlds->dev, "device has no dpa resource\n");
> +		return -EINVAL;
> +	}
> +	if (dpa < cxlds->dpa_res.start || dpa > cxlds->dpa_res.end) {
> +		dev_dbg(cxlds->dev, "dpa:0x%llx not in resource:%pR\n",
> +			dpa, &cxlds->dpa_res);
> +		return -EINVAL;
> +	}
> +	if ((dpa & CXL_POISON_INJECT_RESERVED) != 0) {
> +		dev_dbg(cxlds->dev, "dpa reserve bit(s) [5:0] set 0x%llx\n",
> +			dpa);
> +		return -EINVAL;
> +	}
> +	return 0;
> +}
> +
> +static ssize_t inject_poison_store(struct device *dev,
> +				   struct device_attribute *attr,
> +				   const char *buf, size_t len)
> +{
> +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +	u64 dpa;
> +	int rc;
> +
> +	rc = kstrtou64(buf, 0, &dpa);
> +	if (rc)
> +		return rc;
> +	rc = cxl_validate_poison_dpa(cxlds, dpa);
> +	if (rc)
> +		return rc;
> +
> +	dpa = cpu_to_le64(dpa);
> +	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_INJECT_POISON, &dpa,

Endianness?

> +			       sizeof(dpa), NULL, cxlds->payload_size);
> +	if (rc)
> +		return rc;
> +
> +	return len;
> +}
> +static DEVICE_ATTR_WO(inject_poison);
> +
>  static struct attribute *cxl_memdev_attributes[] = {
>  	&dev_attr_serial.attr,
>  	&dev_attr_firmware_version.attr,
> @@ -149,6 +194,7 @@ static struct attribute *cxl_memdev_attributes[] = {
>  	&dev_attr_label_storage_size.attr,
>  	&dev_attr_numa_node.attr,
>  	&dev_attr_trigger_poison_list.attr,
> +	&dev_attr_inject_poison.attr,
>  	NULL,
>  };
>  
> @@ -175,6 +221,13 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
>  			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
>  			return 0;
>  	}
> +	if (a == &dev_attr_inject_poison.attr) {
> +		struct device *dev = kobj_to_dev(kobj);
> +
> +		if (!test_bit(CXL_MEM_COMMAND_ID_INJECT_POISON,
> +			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
> +			return 0;
> +	}
>  	return a->mode;
>  }
>  
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 19a9e545ac19..0d4c34be7335 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -396,6 +396,9 @@ struct cxl_mbox_poison_payload_out {
>  #define CXL_POISON_SOURCE_INJECTED	3
>  #define CXL_POISON_SOURCE_VENDOR	7
>  
> +/* Inject & Clear Poison  CXL 3.0 Spec 8.2.9.8.4.2/3 */
> +#define CXL_POISON_INJECT_RESERVED	GENMASK_ULL(5, 0)
> +
>  /**
>   * struct cxl_mem_command - Driver representation of a memory device command
>   * @info: Command information as it exists for the UAPI


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] cxl/memdev: Add support for the Inject Poison mailbox command
  2022-11-30 14:31   ` Jonathan Cameron
@ 2022-11-30 14:40     ` Jonathan Cameron
  2022-12-01 16:42       ` Dave Jiang
  0 siblings, 1 reply; 25+ messages in thread
From: Jonathan Cameron @ 2022-11-30 14:40 UTC (permalink / raw)
  To: alison.schofield
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Wed, 30 Nov 2022 14:31:36 +0000
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:

> On Tue, 29 Nov 2022 20:34:33 -0800
> alison.schofield@intel.com wrote:
> 
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > CXL devices optionally support the INJECT POISON mailbox command. Add
> > a sysfs attribute and memdev driver support for injecting poison.
> > 
> > When a Device Physical Address (DPA) is written to the inject_poison
> > sysfs attribute send an inject poison command to the device for the
> > specified address.
> > 
> > Per the CXL Specification (8.2.9.8.4.2), after receiving a valid
> > inject poison request, the device will return poison when the address
> > is accessed through the CXL.mem bus. Injecting poison adds the address
> > to the device's Poison List and the error source is set to injected
> > error. In addition, the device adds a poison creation event to its
> > internal Informational Event log, updates the Event Status register,
> > and if configured, interrupts the host.
> > 
> > Also, per the CXL Specification, it is not an error to inject poison
> > into an address that already has poison present and no error is returned
> > from the device. The memdev driver performs basic sanity checking on the
> > address, however, it does not go as far as reading the poison list to see
> > if the address is on the list. That discovery is left to the device.
> > 
> > The inject_poison attribute is only visible for devices supporting
> > the capability.
> > 
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>  
> A few trivial things inline.  With those fixes LGTM
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> > ---
> >  Documentation/ABI/testing/sysfs-bus-cxl | 19 +++++++++
> >  drivers/cxl/core/memdev.c               | 53 +++++++++++++++++++++++++
> >  drivers/cxl/cxlmem.h                    |  3 ++
> >  3 files changed, 75 insertions(+)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> > index b715a4609718..20db97f7a1aa 100644
> > --- a/Documentation/ABI/testing/sysfs-bus-cxl
> > +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> > @@ -416,3 +416,22 @@ Description:
> >  		if accessed, and the source of the poison. The retrieved
> >  		errors are logged as kernel trace events with the label
> >  		'cxl_poison'.
> > +
> > +
> > +What:		/sys/bus/cxl/devices/memX/inject_poison
> > +Date:		December, 2022
> > +KernelVersion:	v6.2
> > +Contact:	linux-cxl@vger.kernel.org
> > +Description:
> > +		(WO) When a Device Physical Address (DPA) is written to this
> > +		attribute the memdev driver sends an inject poison command to
> > +		the device for the specified address. If successful, the device
> > +		returns poison when the address is accessed through the CXL.mem
> > +		bus. Injecting poison adds the address to the device's Poison
> > +		List and the error source is set to injected error. In addition,  
> "set to Injected."
> 
> perhaps to match spec naming in Media Error Record.
> 
> > +		the device adds a poison creation event to its internal
> > +		Informational Event log, updates the Event Status register, and
> > +		if configured, interrupts the host. It is not an error to inject
> > +		poison into an address that already has poison present and no
> > +		error is returned. The inject_poison attribute is only visible
> > +                for devices supporting the capability.  
> 
> White space issues (spaces instead of tabs?)
> 
> Add something about the masked bits / granularity of addresses that are accepted.
> 
> > diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> > index d08b7295a01c..71130813030f 100644
> > --- a/drivers/cxl/core/memdev.c
> > +++ b/drivers/cxl/core/memdev.c
> > @@ -142,6 +142,51 @@ static ssize_t trigger_poison_list_store(struct device *dev,
> >  }
> >  static DEVICE_ATTR_WO(trigger_poison_list);
> >  
> > +static int cxl_validate_poison_dpa(struct cxl_dev_state *cxlds, u64 dpa)
> > +{
> > +	if (!resource_size(&cxlds->dpa_res)) {
> > +		dev_dbg(cxlds->dev, "device has no dpa resource\n");
> > +		return -EINVAL;
> > +	}
> > +	if (dpa < cxlds->dpa_res.start || dpa > cxlds->dpa_res.end) {
> > +		dev_dbg(cxlds->dev, "dpa:0x%llx not in resource:%pR\n",
> > +			dpa, &cxlds->dpa_res);
> > +		return -EINVAL;
> > +	}
> > +	if ((dpa & CXL_POISON_INJECT_RESERVED) != 0) {
> > +		dev_dbg(cxlds->dev, "dpa reserve bit(s) [5:0] set 0x%llx\n",
> > +			dpa);
> > +		return -EINVAL;
> > +	}
> > +	return 0;
> > +}
> > +
> > +static ssize_t inject_poison_store(struct device *dev,
> > +				   struct device_attribute *attr,
> > +				   const char *buf, size_t len)
> > +{
> > +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> > +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > +	u64 dpa;
> > +	int rc;
> > +
> > +	rc = kstrtou64(buf, 0, &dpa);
> > +	if (rc)
> > +		return rc;
> > +	rc = cxl_validate_poison_dpa(cxlds, dpa);
> > +	if (rc)
> > +		return rc;
> > +
> > +	dpa = cpu_to_le64(dpa);
> > +	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_INJECT_POISON, &dpa,  
> 
> Endianness?

Got thrown by the type and missed the cpu_to_le64().  Use a local __le64 so it's all explicit.
One of the static analysis tools will correctly moan about storing it to a u64
(can't remember which).

> 
> > +			       sizeof(dpa), NULL, cxlds->payload_size);
> > +	if (rc)
> > +		return rc;
> > +
> > +	return len;
> > +}
> > +static DEVICE_ATTR_WO(inject_poison);
> > +
> >  static struct attribute *cxl_memdev_attributes[] = {
> >  	&dev_attr_serial.attr,
> >  	&dev_attr_firmware_version.attr,
> > @@ -149,6 +194,7 @@ static struct attribute *cxl_memdev_attributes[] = {
> >  	&dev_attr_label_storage_size.attr,
> >  	&dev_attr_numa_node.attr,
> >  	&dev_attr_trigger_poison_list.attr,
> > +	&dev_attr_inject_poison.attr,
> >  	NULL,
> >  };
> >  
> > @@ -175,6 +221,13 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
> >  			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
> >  			return 0;
> >  	}
> > +	if (a == &dev_attr_inject_poison.attr) {
> > +		struct device *dev = kobj_to_dev(kobj);
> > +
> > +		if (!test_bit(CXL_MEM_COMMAND_ID_INJECT_POISON,
> > +			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
> > +			return 0;
> > +	}
> >  	return a->mode;
> >  }
> >  
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index 19a9e545ac19..0d4c34be7335 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -396,6 +396,9 @@ struct cxl_mbox_poison_payload_out {
> >  #define CXL_POISON_SOURCE_INJECTED	3
> >  #define CXL_POISON_SOURCE_VENDOR	7
> >  
> > +/* Inject & Clear Poison  CXL 3.0 Spec 8.2.9.8.4.2/3 */
> > +#define CXL_POISON_INJECT_RESERVED	GENMASK_ULL(5, 0)
> > +
> >  /**
> >   * struct cxl_mem_command - Driver representation of a memory device command
> >   * @info: Command information as it exists for the UAPI  
> 
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5] cxl/memdev: Add support for the Clear Poison mailbox command
  2022-11-30  4:34 ` [PATCH 2/5] cxl/memdev: Add support for the Clear " alison.schofield
@ 2022-11-30 14:43   ` Jonathan Cameron
  2022-12-01 20:14     ` Alison Schofield
  2022-12-01 17:54   ` Dave Jiang
  1 sibling, 1 reply; 25+ messages in thread
From: Jonathan Cameron @ 2022-11-30 14:43 UTC (permalink / raw)
  To: alison.schofield
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Tue, 29 Nov 2022 20:34:34 -0800
alison.schofield@intel.com wrote:

> From: Alison Schofield <alison.schofield@intel.com>
> 
> CXL devices optionally support the CLEAR POISON mailbox command. Add
> a sysfs attribute and memdev driver support for clearing poison.
> 
> When a Device Physical Address (DPA) is written to the clear_poison
> sysfs attribute send a clear poison command to the device for the
> specified address.
> 
> Per the CXL Specification (8.2.9.8.4.3), after receiving a valid clear
> poison request, the device removes the address from the device's Poison
> List and writes 0 (zero) for 64 bytes starting at address. If the device
> cannot clear poison from the address, it returns a permanent media error
> and ENXIO is returned to the user.

-ENXIO

> 
> Additionally, and per the spec also, it is not an error to clear poison
> of an address that is not poisoned. No error is returned and the address
> is not overwritten. The memdev driver performs basic sanity checking on
> the address, however, it does not go as far as reading the poison list to
> see if the address is poisoned before clearing. That discovery is left to
> the device. The device safely handles that case.
> 
> Implementation note: Although the CXL specification defines the clear
> command to accept 64 bytes of 'write-data' to be used when clearing
> the poisoned address, this implementation always uses 0 (zeros) for
> the write-data.

Maybe put a * above to refer to this note given the spec is referenced
for stuff different from what you are doing with it.  Nice to flag
up to anyone reading this that they shouldn't write a 'no that's not
what it says' comment before reading on. (who would do something
silly like that? :)

> 
> The clear_poison attribute is only visible for devices supporting the
> capability.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Otherwise, a few really trivial things inline + it made me notice I'd missread
the code for patch 1, hence the reply to my reply.

With this stuff tweaked.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  Documentation/ABI/testing/sysfs-bus-cxl | 17 +++++++++
>  drivers/cxl/core/memdev.c               | 47 +++++++++++++++++++++++++
>  drivers/cxl/cxlmem.h                    |  6 ++++
>  3 files changed, 70 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 20db97f7a1aa..9d2b0fa07e17 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -435,3 +435,20 @@ Description:
>  		poison into an address that already has poison present and no
>  		error is returned. The inject_poison attribute is only visible
>                  for devices supporting the capability.
> +
> +
> +What:		/sys/bus/cxl/devices/memX/clear_poison
> +Date:		December, 2022
> +KernelVersion:	v6.2
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(WO) When a Device Physical Address (DPA) is written to this
> +		attribute the memdev driver sends a clear poison command to the
> +		device for the specified address. Clearing poison removes the
> +		address from the device's Poison List and writes 0 (zero)
> +		for 64 bytes starting at address. It is not an error to clear
> +		poison from an address that does not have poison set, and if
> +		poison was not set, the address is not overwritten. If the
> +		device cannot clear poison from the address, ENXIO is returned.

-ENXIO ?

> +		The clear_poison attribute is only visible for devices
> +		supporting the capability.
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index 71130813030f..85caffd5a85c 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -187,6 +187,44 @@ static ssize_t inject_poison_store(struct device *dev,
>  }
>  static DEVICE_ATTR_WO(inject_poison);
>  
> +static ssize_t clear_poison_store(struct device *dev,
> +				  struct device_attribute *attr,
> +				  const char *buf, size_t len)
> +{
> +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +	struct cxl_mbox_clear_poison *pi;
> +	u64 dpa;
> +	int rc;
> +
> +	rc = kstrtou64(buf, 0, &dpa);
> +	if (rc)
> +		return rc;
> +	rc = cxl_validate_poison_dpa(cxlds, dpa);
> +	if (rc)
> +		return rc;
Trivial:
blank line here.  Kind of make sense to keep the string parser and validation in
one block, but good to then separate that from the next bit of code.

> +	pi = kzalloc(sizeof(*pi), GFP_KERNEL);
> +	if (!pi)
> +		return -ENOMEM;
> +	/*
> +	 * In CXL 3.0 Spec 8.2.9.8.4.3, the Clear Poison mailbox command
> +	 * is defined to accept 64 bytes of 'write-data', along with the
> +	 * address to clear. The device writes 'write-data' into the DPA,
> +	 * atomically, while clearing poison if the location is marked as
> +	 * being poisoned.
> +	 *
> +	 * Always use '0' for the write-data.
> +	 */
> +	pi->address = cpu_to_le64(dpa);
> +	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_CLEAR_POISON, pi,
> +			       sizeof(*pi), NULL, cxlds->payload_size);
> +	if (rc)
> +		return rc;
> +
> +	return len;
> +}
> +static DEVICE_ATTR_WO(clear_poison);
...




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/5] tools/testing/cxl: Mock the Inject Poison mailbox command
  2022-11-30  4:34 ` [PATCH 3/5] tools/testing/cxl: Mock the Inject " alison.schofield
@ 2022-11-30 14:58   ` Jonathan Cameron
  2022-12-08  4:47     ` Alison Schofield
  0 siblings, 1 reply; 25+ messages in thread
From: Jonathan Cameron @ 2022-11-30 14:58 UTC (permalink / raw)
  To: alison.schofield
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Tue, 29 Nov 2022 20:34:35 -0800
alison.schofield@intel.com wrote:

> From: Alison Schofield <alison.schofield@intel.com>
> 
> Mock the injection of poison by storing the device:address entry in a
> cxl_test array: mock_poison[]. Limit the array to 64 entries and fail
> new inject requests when full.
> 
> Introducing the mock_poison[] list here, makes it available for use in
> the mock of Clear Poison, and the mock of Get Poison List.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>

Main question I have here is whether we want to mock per device injected poison
lists, or just one global.  I think we want per device so we can reflect
the limits as would be retrieved from Identify Memory Device Output Payload.
Whilst we don't do anything useful with it yet, we should also update
the mocked response to that command to reflect this.

Perhaps we should have a sysfs attribute to read how many entries we can
inject?  Seems like that would be useful for testing flows on real devices,
particularly if the device only supports a very small number!

> ---
>  tools/testing/cxl/test/mem.c | 53 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 53 insertions(+)
> 
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index a4f81915ec03..98acb9a644df 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -13,6 +13,7 @@
>  #define LSA_SIZE SZ_128K
>  #define DEV_SIZE SZ_2G
>  #define EFFECT(x) (1U << x)
> +#define MOCK_INJECT_POISON_MAX 64
>  
>  static struct cxl_cel_entry mock_cel[] = {
>  	{
> @@ -43,6 +44,10 @@ static struct cxl_cel_entry mock_cel[] = {
>  		.opcode = cpu_to_le16(CXL_MBOX_OP_GET_POISON),
>  		.effect = cpu_to_le16(0),
>  	},
> +	{
> +		.opcode = cpu_to_le16(CXL_MBOX_OP_INJECT_POISON),
> +		.effect = cpu_to_le16(0),
> +	},
>  };
>  
>  /* See CXL 2.0 Table 181 Get Health Info Output Payload */
> @@ -210,6 +215,51 @@ static int mock_health_info(struct cxl_dev_state *cxlds,
>  	return 0;
>  }
>  
> +static struct mock_poison {
> +	struct cxl_dev_state *cxlds;
> +	u64 dpa;
> +} mock_poison[MOCK_INJECT_POISON_MAX];

Don't we want one of these per device instance?

> +
> +static bool mock_poison_add(struct cxl_dev_state *cxlds, u64 dpa)
> +{
> +	for (int i = 0; i < MOCK_INJECT_POISON_MAX; i++) {
> +		if (!mock_poison[i].cxlds) {
> +			mock_poison[i].cxlds = cxlds;
> +			mock_poison[i].dpa = dpa;
> +			return true;
> +		}
> +	}
> +	dev_dbg(cxlds->dev, "Mock poison list full: %d\n",
> +		MOCK_INJECT_POISON_MAX);
Slightly nicer maybe to use the text from
https://elixir.bootlin.com/linux/latest/source/drivers/cxl/cxlmem.h#L128
and have: "Mock poison injection limit has been reached: %d\n" ..

> +	return false;
> +}


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/5] tools/testing/cxl: Mock the Clear Poison mailbox command
  2022-11-30  4:34 ` [PATCH 4/5] tools/testing/cxl: Mock the Clear " alison.schofield
@ 2022-11-30 15:01   ` Jonathan Cameron
  0 siblings, 0 replies; 25+ messages in thread
From: Jonathan Cameron @ 2022-11-30 15:01 UTC (permalink / raw)
  To: alison.schofield
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Tue, 29 Nov 2022 20:34:36 -0800
alison.schofield@intel.com wrote:

> From: Alison Schofield <alison.schofield@intel.com>
> 
> Mock the clear of poison by deleting the device:address entry from
> the cxl_test array: mock_poison[]. Behave like a real CXL device and do
> not fail if the address is not in the poison list, but offer a dev_dbg()
> message.
> 
> Unlike a real CXL device, no data is written to the address being cleared.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Trivial comment inline. Subject to comment on previous patch about
doing per device injected poison lists...

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  tools/testing/cxl/test/mem.c | 36 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 36 insertions(+)
> 
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index 98acb9a644df..9d794fbe5ee1 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -48,6 +48,10 @@ static struct cxl_cel_entry mock_cel[] = {
>  		.opcode = cpu_to_le16(CXL_MBOX_OP_INJECT_POISON),
>  		.effect = cpu_to_le16(0),
>  	},
> +	{
> +		.opcode = cpu_to_le16(CXL_MBOX_OP_CLEAR_POISON),
> +		.effect = cpu_to_le16(0),
> +	},
>  };
>  
>  /* See CXL 2.0 Table 181 Get Health Info Output Payload */
> @@ -244,6 +248,35 @@ static bool mock_poison_found(struct cxl_dev_state *cxlds, u64 dpa)
>  	return false;
>  }
>  
> +static bool mock_poison_del(struct cxl_dev_state *cxlds, u64 dpa)
> +{
> +	for (int i = 0; i < MOCK_INJECT_POISON_MAX; i++) {
> +		if (mock_poison[i].cxlds == cxlds &&
> +		    mock_poison[i].dpa == dpa) {
> +			mock_poison[i].cxlds = NULL;
> +			return true;
> +		}
> +	}
> +	return false;
> +}
> +
> +static int mock_clear_poison(struct cxl_dev_state *cxlds,
> +			     struct cxl_mbox_cmd *cmd)
> +{
> +	struct cxl_mbox_clear_poison *pi = cmd->payload_in;
> +

Odd blank line. I'd put it after the next line instead.

> +	u64 dpa = le64_to_cpu(pi->address);
> +	/*
> +	 * A real CXL device will write pi->write_data to the address
> +	 * being cleared. In this mock, just delete this address from
> +	 * the mock poison list.
> +	 */
> +	if (!mock_poison_del(cxlds, dpa))
> +		dev_dbg(cxlds->dev, "DPA: 0x%llx not in poison list\n", dpa);
> +
> +	return 0;
> +}
> +
>  static int mock_inject_poison(struct cxl_dev_state *cxlds,
>  			      struct cxl_mbox_cmd *cmd)
>  {
> @@ -321,6 +354,9 @@ static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *
>  	case CXL_MBOX_OP_INJECT_POISON:
>  		rc = mock_inject_poison(cxlds, cmd);
>  		break;
> +	case CXL_MBOX_OP_CLEAR_POISON:
> +		rc = mock_clear_poison(cxlds, cmd);
> +		break;
>  	default:
>  		break;
>  	}


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5] tools/testing/cxl: Use injected poison for Get Poison List
  2022-11-30  4:34 ` [PATCH 5/5] tools/testing/cxl: Use injected poison for Get Poison List alison.schofield
@ 2022-11-30 15:15   ` Jonathan Cameron
  2022-12-08  4:30     ` Alison Schofield
  0 siblings, 1 reply; 25+ messages in thread
From: Jonathan Cameron @ 2022-11-30 15:15 UTC (permalink / raw)
  To: alison.schofield
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Tue, 29 Nov 2022 20:34:37 -0800
alison.schofield@intel.com wrote:

> From: Alison Schofield <alison.schofield@intel.com>
> 
> Prior to poison inject & clear support, the mock of 'Get Poison List'
> returned a list containing a single mocked error record.
> 
> Now that the mock of Inject and Clear poison maintains a mock_poison[]
> list, use that when available for the device. Otherwise, return the
> existing default mocked error record.
> 
> This enables roundtrip userspace testing of the inject, clear, and
> poison list support.

I'm not that keen on having an unclearable record, unless you handle that
with the right error flow for clear poison.
Plus I think you should always return that as well as the injected poison list
so this is consistent.

Whether it matters to model it that well in the mocking code?  Up to you...

Jonathan



> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
>  tools/testing/cxl/test/mem.c | 97 +++++++++++++++++++++++++-----------
>  1 file changed, 69 insertions(+), 28 deletions(-)
> 
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index 9d794fbe5ee1..31c147cf2d5b 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -224,6 +224,75 @@ static struct mock_poison {
>  	u64 dpa;
>  } mock_poison[MOCK_INJECT_POISON_MAX];
>  
> +struct mock_poison_po {
> +	struct cxl_mbox_poison_payload_out poison_out;
> +	struct cxl_poison_record record[MOCK_INJECT_POISON_MAX];

Maybe use a variable length final element [] so that you can then
create a 1 record version on demand where it is used for
the default record.

> +};
> +
> +/*
> + * Use the default poison payload when no injected poison is currently
> + * mocked for this device. Mock one poison record with length 64 bytes.
> + */
> +static struct mock_poison_po default_poison_po = {
> +	.poison_out.count = cpu_to_le16(1),
> +	.record[0].length = cpu_to_le32(1),
> +};
> +
> +static struct mock_poison_po *cxl_get_injected_po(struct cxl_dev_state *cxlds,
> +						  u64 offset, u64 length)
> +{
> +	struct mock_poison_po *po;
> +	int nr_records = 0;
> +	u64 dpa;
> +
> +	po = kzalloc(sizeof(*po), GFP_KERNEL);
> +	if (!po)
> +		return NULL;
> +
> +	for (int i = 0; i < MOCK_INJECT_POISON_MAX; i++) {
> +		if (mock_poison[i].cxlds != cxlds)
> +			continue;
> +		if (mock_poison[i].dpa < offset ||
> +		    mock_poison[i].dpa > offset + length - 1)
> +			continue;
> +		dpa = cpu_to_le64(mock_poison[i].dpa |
> +				  CXL_POISON_SOURCE_INJECTED);
> +		po->record[nr_records].address = dpa;
> +		po->record[nr_records].length = cpu_to_le32(1);
> +		nr_records++;
> +	}
> +	if (!nr_records) {
> +		kfree(po);
> +		return NULL;
> +	}
> +	po->poison_out.count = cpu_to_le16(nr_records);
> +	return po;
> +}
> +
> +static int mock_get_poison(struct cxl_dev_state *cxlds,
> +			   struct cxl_mbox_cmd *cmd)
> +{

Why the function move?  If you want to do this, a trivial move only
patch first would have bad it slightly easier to review. But given it's small
not worth doing now, particularly as most of the function changes anyway.


> +	struct cxl_mbox_poison_payload_in *pi = cmd->payload_in;
> +	struct mock_poison_po *po;
> +
> +	po = cxl_get_injected_po(cxlds, pi->offset, pi->length);
> +	if (!po) {
> +		po = &default_poison_po;

Keep the comment from below on what this value is. It is weird enough
to be worth a comment.

I'd be tempted to make the default_poison_po const and use a copy
of it here to avoid fun races on reads from multiple devices.
Or just allocate a 1 element version and fill it here.  However,
I think we should really have both the default and injected elements.

> +		po->record[0].address = cpu_to_le64(pi->offset |
> +						    CXL_POISON_SOURCE_INTERNAL);
> +	}
> +	if (cmd->size_out < sizeof(po))
> +		return -EINVAL;

This isn't technically correct, though it's probably fine for how we currently
use it. Should really assume full mailbox size and fill on that basis (record
count etc) but only copy the size of the target buffer.
Not sure we care enough about this being 'right' as opposed to useful.

> +
> +	memcpy(cmd->payload_out, po, sizeof(*po));
> +	cmd->size_out = sizeof(po);
> +
> +	if (po != &default_poison_po)
> +		kfree(po);
> +
> +	return 0;
> +}
> +
>  static bool mock_poison_add(struct cxl_dev_state *cxlds, u64 dpa)
>  {
>  	for (int i = 0; i < MOCK_INJECT_POISON_MAX; i++) {
> @@ -293,34 +362,6 @@ static int mock_inject_poison(struct cxl_dev_state *cxlds,
>  	return 0;
>  }
>  
> -static int mock_get_poison(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
> -{
> -	struct cxl_mbox_poison_payload_in *pi = cmd->payload_in;
> -
> -	struct {
> -		struct cxl_mbox_poison_payload_out poison_out;
> -		struct cxl_poison_record record;
> -	} mock_poison_list = {
> -		.poison_out = {
> -			.count = cpu_to_le16(1),
> -		},
> -		/* Mock one poison record at pi.offset for 64 bytes */
> -		.record = {
> -			/* .address encodes DPA and poison source bits */
> -			.address = cpu_to_le64(pi->offset |
> -					       CXL_POISON_SOURCE_INTERNAL),
> -			.length = cpu_to_le32(1),
> -		},
> -	};
> -
> -	if (cmd->size_out < sizeof(mock_poison_list))
> -		return -EINVAL;
> -
> -	memcpy(cmd->payload_out, &mock_poison_list, sizeof(mock_poison_list));
> -	cmd->size_out = sizeof(mock_poison_list);
> -	return 0;
> -}
> -
>  static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
>  {
>  	struct device *dev = cxlds->dev;


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] cxl/memdev: Add support for the Inject Poison mailbox command
  2022-11-30 14:40     ` Jonathan Cameron
@ 2022-12-01 16:42       ` Dave Jiang
  2022-12-08  4:20         ` Alison Schofield
  0 siblings, 1 reply; 25+ messages in thread
From: Dave Jiang @ 2022-12-01 16:42 UTC (permalink / raw)
  To: Jonathan Cameron, alison.schofield
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, linux-cxl



On 11/30/2022 7:40 AM, Jonathan Cameron wrote:
> On Wed, 30 Nov 2022 14:31:36 +0000
> Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> 
>> On Tue, 29 Nov 2022 20:34:33 -0800
>> alison.schofield@intel.com wrote:
>>
>>> From: Alison Schofield <alison.schofield@intel.com>
>>>
>>> CXL devices optionally support the INJECT POISON mailbox command. Add
>>> a sysfs attribute and memdev driver support for injecting poison.
>>>
>>> When a Device Physical Address (DPA) is written to the inject_poison
>>> sysfs attribute send an inject poison command to the device for the
>>> specified address.
>>>
>>> Per the CXL Specification (8.2.9.8.4.2), after receiving a valid
>>> inject poison request, the device will return poison when the address
>>> is accessed through the CXL.mem bus. Injecting poison adds the address
>>> to the device's Poison List and the error source is set to injected
>>> error. In addition, the device adds a poison creation event to its
>>> internal Informational Event log, updates the Event Status register,
>>> and if configured, interrupts the host.
>>>
>>> Also, per the CXL Specification, it is not an error to inject poison
>>> into an address that already has poison present and no error is returned
>>> from the device. The memdev driver performs basic sanity checking on the
>>> address, however, it does not go as far as reading the poison list to see
>>> if the address is on the list. That discovery is left to the device.
>>>
>>> The inject_poison attribute is only visible for devices supporting
>>> the capability.
>>>
>>> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
>> A few trivial things inline.  With those fixes LGTM
>>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>
>>> ---
>>>   Documentation/ABI/testing/sysfs-bus-cxl | 19 +++++++++
>>>   drivers/cxl/core/memdev.c               | 53 +++++++++++++++++++++++++
>>>   drivers/cxl/cxlmem.h                    |  3 ++
>>>   3 files changed, 75 insertions(+)
>>>
>>> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
>>> index b715a4609718..20db97f7a1aa 100644
>>> --- a/Documentation/ABI/testing/sysfs-bus-cxl
>>> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
>>> @@ -416,3 +416,22 @@ Description:
>>>   		if accessed, and the source of the poison. The retrieved
>>>   		errors are logged as kernel trace events with the label
>>>   		'cxl_poison'.
>>> +
>>> +
>>> +What:		/sys/bus/cxl/devices/memX/inject_poison
>>> +Date:		December, 2022
>>> +KernelVersion:	v6.2
>>> +Contact:	linux-cxl@vger.kernel.org
>>> +Description:
>>> +		(WO) When a Device Physical Address (DPA) is written to this
>>> +		attribute the memdev driver sends an inject poison command to
>>> +		the device for the specified address. If successful, the device
>>> +		returns poison when the address is accessed through the CXL.mem
>>> +		bus. Injecting poison adds the address to the device's Poison
>>> +		List and the error source is set to injected error. In addition,
>> "set to Injected."
>>
>> perhaps to match spec naming in Media Error Record.
>>
>>> +		the device adds a poison creation event to its internal
>>> +		Informational Event log, updates the Event Status register, and
>>> +		if configured, interrupts the host. It is not an error to inject
>>> +		poison into an address that already has poison present and no
>>> +		error is returned. The inject_poison attribute is only visible
>>> +                for devices supporting the capability.
>>
>> White space issues (spaces instead of tabs?)
>>
>> Add something about the masked bits / granularity of addresses that are accepted.
>>
>>> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
>>> index d08b7295a01c..71130813030f 100644
>>> --- a/drivers/cxl/core/memdev.c
>>> +++ b/drivers/cxl/core/memdev.c
>>> @@ -142,6 +142,51 @@ static ssize_t trigger_poison_list_store(struct device *dev,
>>>   }
>>>   static DEVICE_ATTR_WO(trigger_poison_list);
>>>   
>>> +static int cxl_validate_poison_dpa(struct cxl_dev_state *cxlds, u64 dpa)
>>> +{
>>> +	if (!resource_size(&cxlds->dpa_res)) {
>>> +		dev_dbg(cxlds->dev, "device has no dpa resource\n");
>>> +		return -EINVAL;
>>> +	}
>>> +	if (dpa < cxlds->dpa_res.start || dpa > cxlds->dpa_res.end) {
>>> +		dev_dbg(cxlds->dev, "dpa:0x%llx not in resource:%pR\n",
>>> +			dpa, &cxlds->dpa_res);
>>> +		return -EINVAL;
>>> +	}
>>> +	if ((dpa & CXL_POISON_INJECT_RESERVED) != 0) {
>>> +		dev_dbg(cxlds->dev, "dpa reserve bit(s) [5:0] set 0x%llx\n",
>>> +			dpa);
>>> +		return -EINVAL;
>>> +	}
>>> +	return 0;
>>> +}
>>> +
>>> +static ssize_t inject_poison_store(struct device *dev,
>>> +				   struct device_attribute *attr,
>>> +				   const char *buf, size_t len)
>>> +{
>>> +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
>>> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
>>> +	u64 dpa;
>>> +	int rc;
>>> +
>>> +	rc = kstrtou64(buf, 0, &dpa);
>>> +	if (rc)
>>> +		return rc;
>>> +	rc = cxl_validate_poison_dpa(cxlds, dpa);
>>> +	if (rc)
>>> +		return rc;
>>> +
>>> +	dpa = cpu_to_le64(dpa);
>>> +	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_INJECT_POISON, &dpa,
>>
>> Endianness?
> 
> Got thrown by the type and missed the cpu_to_le64().  Use a local __le64 so it's all explicit.
> One of the static analysis tools will correctly moan about storing it to a u64
> (can't remember which).

Sparse. I just got yelled at by 0-day for something similar. :)

> 
>>
>>> +			       sizeof(dpa), NULL, cxlds->payload_size);
>>> +	if (rc)
>>> +		return rc;
>>> +
>>> +	return len;
>>> +}
>>> +static DEVICE_ATTR_WO(inject_poison);
>>> +
>>>   static struct attribute *cxl_memdev_attributes[] = {
>>>   	&dev_attr_serial.attr,
>>>   	&dev_attr_firmware_version.attr,
>>> @@ -149,6 +194,7 @@ static struct attribute *cxl_memdev_attributes[] = {
>>>   	&dev_attr_label_storage_size.attr,
>>>   	&dev_attr_numa_node.attr,
>>>   	&dev_attr_trigger_poison_list.attr,
>>> +	&dev_attr_inject_poison.attr,
>>>   	NULL,
>>>   };
>>>   
>>> @@ -175,6 +221,13 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
>>>   			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
>>>   			return 0;
>>>   	}
>>> +	if (a == &dev_attr_inject_poison.attr) {
>>> +		struct device *dev = kobj_to_dev(kobj);
>>> +
>>> +		if (!test_bit(CXL_MEM_COMMAND_ID_INJECT_POISON,
>>> +			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
>>> +			return 0;
>>> +	}
>>>   	return a->mode;
>>>   }
>>>   
>>> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
>>> index 19a9e545ac19..0d4c34be7335 100644
>>> --- a/drivers/cxl/cxlmem.h
>>> +++ b/drivers/cxl/cxlmem.h
>>> @@ -396,6 +396,9 @@ struct cxl_mbox_poison_payload_out {
>>>   #define CXL_POISON_SOURCE_INJECTED	3
>>>   #define CXL_POISON_SOURCE_VENDOR	7
>>>   
>>> +/* Inject & Clear Poison  CXL 3.0 Spec 8.2.9.8.4.2/3 */
>>> +#define CXL_POISON_INJECT_RESERVED	GENMASK_ULL(5, 0)
>>> +
>>>   /**
>>>    * struct cxl_mem_command - Driver representation of a memory device command
>>>    * @info: Command information as it exists for the UAPI
>>
>>
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] cxl/memdev: Add support for the Inject Poison mailbox command
  2022-11-30  4:34 ` [PATCH 1/5] cxl/memdev: Add support for the Inject Poison mailbox command alison.schofield
  2022-11-30 14:31   ` Jonathan Cameron
@ 2022-12-01 17:26   ` Dave Jiang
  2022-12-08  4:17     ` Alison Schofield
  2022-12-04 22:04   ` Dan Williams
  2 siblings, 1 reply; 25+ messages in thread
From: Dave Jiang @ 2022-12-01 17:26 UTC (permalink / raw)
  To: alison.schofield, Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky
  Cc: linux-cxl



On 11/29/2022 9:34 PM, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> CXL devices optionally support the INJECT POISON mailbox command. Add
> a sysfs attribute and memdev driver support for injecting poison.
> 
> When a Device Physical Address (DPA) is written to the inject_poison
> sysfs attribute send an inject poison command to the device for the

A comma between attribute and send would make this read better.

> specified address.
> 
> Per the CXL Specification (8.2.9.8.4.2), after receiving a valid

spec v3?

> inject poison request, the device will return poison when the address
> is accessed through the CXL.mem bus. Injecting poison adds the address
> to the device's Poison List and the error source is set to injected

comma before 'and'.
'the' before 'injected'

> error. In addition, the device adds a poison creation event to its
> internal Informational Event log, updates the Event Status register,
> and if configured, interrupts the host.
> 
> Also, per the CXL Specification, it is not an error to inject poison
> into an address that already has poison present and no error is returned

comma before 'and'.

DJ

> from the device. The memdev driver performs basic sanity checking on the
> address, however, it does not go as far as reading the poison list to see
> if the address is on the list. That discovery is left to the device.
> 
> The inject_poison attribute is only visible for devices supporting
> the capability.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
>   Documentation/ABI/testing/sysfs-bus-cxl | 19 +++++++++
>   drivers/cxl/core/memdev.c               | 53 +++++++++++++++++++++++++
>   drivers/cxl/cxlmem.h                    |  3 ++
>   3 files changed, 75 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index b715a4609718..20db97f7a1aa 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -416,3 +416,22 @@ Description:
>   		if accessed, and the source of the poison. The retrieved
>   		errors are logged as kernel trace events with the label
>   		'cxl_poison'.
> +
> +
> +What:		/sys/bus/cxl/devices/memX/inject_poison
> +Date:		December, 2022
> +KernelVersion:	v6.2
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(WO) When a Device Physical Address (DPA) is written to this
> +		attribute the memdev driver sends an inject poison command to
> +		the device for the specified address. If successful, the device
> +		returns poison when the address is accessed through the CXL.mem
> +		bus. Injecting poison adds the address to the device's Poison
> +		List and the error source is set to injected error. In addition,
> +		the device adds a poison creation event to its internal
> +		Informational Event log, updates the Event Status register, and
> +		if configured, interrupts the host. It is not an error to inject
> +		poison into an address that already has poison present and no
> +		error is returned. The inject_poison attribute is only visible
> +                for devices supporting the capability.
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index d08b7295a01c..71130813030f 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -142,6 +142,51 @@ static ssize_t trigger_poison_list_store(struct device *dev,
>   }
>   static DEVICE_ATTR_WO(trigger_poison_list);
>   
> +static int cxl_validate_poison_dpa(struct cxl_dev_state *cxlds, u64 dpa)
> +{
> +	if (!resource_size(&cxlds->dpa_res)) {
> +		dev_dbg(cxlds->dev, "device has no dpa resource\n");
> +		return -EINVAL;
> +	}
> +	if (dpa < cxlds->dpa_res.start || dpa > cxlds->dpa_res.end) {
> +		dev_dbg(cxlds->dev, "dpa:0x%llx not in resource:%pR\n",
> +			dpa, &cxlds->dpa_res);
> +		return -EINVAL;
> +	}
> +	if ((dpa & CXL_POISON_INJECT_RESERVED) != 0) {
> +		dev_dbg(cxlds->dev, "dpa reserve bit(s) [5:0] set 0x%llx\n",
> +			dpa);
> +		return -EINVAL;
> +	}
> +	return 0;
> +}
> +
> +static ssize_t inject_poison_store(struct device *dev,
> +				   struct device_attribute *attr,
> +				   const char *buf, size_t len)
> +{
> +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +	u64 dpa;
> +	int rc;
> +
> +	rc = kstrtou64(buf, 0, &dpa);
> +	if (rc)
> +		return rc;
> +	rc = cxl_validate_poison_dpa(cxlds, dpa);
> +	if (rc)
> +		return rc;
> +
> +	dpa = cpu_to_le64(dpa);
> +	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_INJECT_POISON, &dpa,
> +			       sizeof(dpa), NULL, cxlds->payload_size);
> +	if (rc)
> +		return rc;
> +
> +	return len;
> +}
> +static DEVICE_ATTR_WO(inject_poison);
> +
>   static struct attribute *cxl_memdev_attributes[] = {
>   	&dev_attr_serial.attr,
>   	&dev_attr_firmware_version.attr,
> @@ -149,6 +194,7 @@ static struct attribute *cxl_memdev_attributes[] = {
>   	&dev_attr_label_storage_size.attr,
>   	&dev_attr_numa_node.attr,
>   	&dev_attr_trigger_poison_list.attr,
> +	&dev_attr_inject_poison.attr,
>   	NULL,
>   };
>   
> @@ -175,6 +221,13 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
>   			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
>   			return 0;
>   	}
> +	if (a == &dev_attr_inject_poison.attr) {
> +		struct device *dev = kobj_to_dev(kobj);
> +
> +		if (!test_bit(CXL_MEM_COMMAND_ID_INJECT_POISON,
> +			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
> +			return 0;
> +	}
>   	return a->mode;
>   }
>   
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 19a9e545ac19..0d4c34be7335 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -396,6 +396,9 @@ struct cxl_mbox_poison_payload_out {
>   #define CXL_POISON_SOURCE_INJECTED	3
>   #define CXL_POISON_SOURCE_VENDOR	7
>   
> +/* Inject & Clear Poison  CXL 3.0 Spec 8.2.9.8.4.2/3 */
> +#define CXL_POISON_INJECT_RESERVED	GENMASK_ULL(5, 0)
> +
>   /**
>    * struct cxl_mem_command - Driver representation of a memory device command
>    * @info: Command information as it exists for the UAPI

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5] cxl/memdev: Add support for the Clear Poison mailbox command
  2022-11-30  4:34 ` [PATCH 2/5] cxl/memdev: Add support for the Clear " alison.schofield
  2022-11-30 14:43   ` Jonathan Cameron
@ 2022-12-01 17:54   ` Dave Jiang
  2022-12-01 20:09     ` Alison Schofield
  1 sibling, 1 reply; 25+ messages in thread
From: Dave Jiang @ 2022-12-01 17:54 UTC (permalink / raw)
  To: alison.schofield, Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky
  Cc: linux-cxl



On 11/29/2022 9:34 PM, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> CXL devices optionally support the CLEAR POISON mailbox command. Add
> a sysfs attribute and memdev driver support for clearing poison.
> 
> When a Device Physical Address (DPA) is written to the clear_poison
> sysfs attribute send a clear poison command to the device for the

comma between 'attribute' and 'send'

> specified address.
> 
> Per the CXL Specification (8.2.9.8.4.3), after receiving a valid clear

Please add spec version.

> poison request, the device removes the address from the device's Poison
> List and writes 0 (zero) for 64 bytes starting at address. If the device
> cannot clear poison from the address, it returns a permanent media error
> and ENXIO is returned to the user.
> 
> Additionally, and per the spec also, it is not an error to clear poison
> of an address that is not poisoned. No error is returned and the address
> is not overwritten. The memdev driver performs basic sanity checking on
> the address, however, it does not go as far as reading the poison list to
> see if the address is poisoned before clearing. That discovery is left to
> the device. The device safely handles that case.
> 
> Implementation note: Although the CXL specification defines the clear
> command to accept 64 bytes of 'write-data' to be used when clearing
> the poisoned address, this implementation always uses 0 (zeros) for
> the write-data.
> 
> The clear_poison attribute is only visible for devices supporting the
> capability.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
>   Documentation/ABI/testing/sysfs-bus-cxl | 17 +++++++++
>   drivers/cxl/core/memdev.c               | 47 +++++++++++++++++++++++++
>   drivers/cxl/cxlmem.h                    |  6 ++++
>   3 files changed, 70 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 20db97f7a1aa..9d2b0fa07e17 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -435,3 +435,20 @@ Description:
>   		poison into an address that already has poison present and no
>   		error is returned. The inject_poison attribute is only visible
>                   for devices supporting the capability.
> +
> +
> +What:		/sys/bus/cxl/devices/memX/clear_poison
> +Date:		December, 2022
> +KernelVersion:	v6.2
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(WO) When a Device Physical Address (DPA) is written to this
> +		attribute the memdev driver sends a clear poison command to the

comma between 'attribute' and 'the'.

DJ

> +		device for the specified address. Clearing poison removes the
> +		address from the device's Poison List and writes 0 (zero)
> +		for 64 bytes starting at address. It is not an error to clear
> +		poison from an address that does not have poison set, and if
> +		poison was not set, the address is not overwritten. If the
> +		device cannot clear poison from the address, ENXIO is returned.
> +		The clear_poison attribute is only visible for devices
> +		supporting the capability.
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index 71130813030f..85caffd5a85c 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -187,6 +187,44 @@ static ssize_t inject_poison_store(struct device *dev,
>   }
>   static DEVICE_ATTR_WO(inject_poison);
>   
> +static ssize_t clear_poison_store(struct device *dev,
> +				  struct device_attribute *attr,
> +				  const char *buf, size_t len)
> +{
> +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +	struct cxl_mbox_clear_poison *pi;
> +	u64 dpa;
> +	int rc;
> +
> +	rc = kstrtou64(buf, 0, &dpa);
> +	if (rc)
> +		return rc;
> +	rc = cxl_validate_poison_dpa(cxlds, dpa);
> +	if (rc)
> +		return rc;
> +	pi = kzalloc(sizeof(*pi), GFP_KERNEL);
> +	if (!pi)
> +		return -ENOMEM;
> +	/*
> +	 * In CXL 3.0 Spec 8.2.9.8.4.3, the Clear Poison mailbox command
> +	 * is defined to accept 64 bytes of 'write-data', along with the
> +	 * address to clear. The device writes 'write-data' into the DPA,
> +	 * atomically, while clearing poison if the location is marked as
> +	 * being poisoned.
> +	 *
> +	 * Always use '0' for the write-data.
> +	 */
> +	pi->address = cpu_to_le64(dpa);
> +	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_CLEAR_POISON, pi,
> +			       sizeof(*pi), NULL, cxlds->payload_size);
> +	if (rc)
> +		return rc;
> +
> +	return len;
> +}
> +static DEVICE_ATTR_WO(clear_poison);
> +
>   static struct attribute *cxl_memdev_attributes[] = {
>   	&dev_attr_serial.attr,
>   	&dev_attr_firmware_version.attr,
> @@ -195,6 +233,7 @@ static struct attribute *cxl_memdev_attributes[] = {
>   	&dev_attr_numa_node.attr,
>   	&dev_attr_trigger_poison_list.attr,
>   	&dev_attr_inject_poison.attr,
> +	&dev_attr_clear_poison.attr,
>   	NULL,
>   };
>   
> @@ -228,6 +267,14 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
>   			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
>   			return 0;
>   	}
> +	if (a == &dev_attr_clear_poison.attr) {
> +		struct device *dev = kobj_to_dev(kobj);
> +
> +		if (!test_bit(CXL_MEM_COMMAND_ID_CLEAR_POISON,
> +			      to_cxl_memdev(dev)->cxlds->enabled_cmds)) {
> +			return 0;
> +		}
> +	}
>   	return a->mode;
>   }
>   
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 0d4c34be7335..532adf9c3afd 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -399,6 +399,12 @@ struct cxl_mbox_poison_payload_out {
>   /* Inject & Clear Poison  CXL 3.0 Spec 8.2.9.8.4.2/3 */
>   #define CXL_POISON_INJECT_RESERVED	GENMASK_ULL(5, 0)
>   
> +/* Clear Poison  CXL 3.0 Spec 8.2.9.8.4.3 */
> +struct cxl_mbox_clear_poison {
> +	__le64 address;
> +	u8 write_data[64];
> +} __packed;
> +
>   /**
>    * struct cxl_mem_command - Driver representation of a memory device command
>    * @info: Command information as it exists for the UAPI

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5] cxl/memdev: Add support for the Clear Poison mailbox command
  2022-12-01 17:54   ` Dave Jiang
@ 2022-12-01 20:09     ` Alison Schofield
  0 siblings, 0 replies; 25+ messages in thread
From: Alison Schofield @ 2022-12-01 20:09 UTC (permalink / raw)
  To: Dave Jiang; +Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, linux-cxl

On Thu, Dec 01, 2022 at 10:54:47AM -0700, Dave Jiang wrote:
> 
> 
> On 11/29/2022 9:34 PM, alison.schofield@intel.com wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > CXL devices optionally support the CLEAR POISON mailbox command. Add
> > a sysfs attribute and memdev driver support for clearing poison.
> > 
> > When a Device Physical Address (DPA) is written to the clear_poison
> > sysfs attribute send a clear poison command to the device for the
> 
> comma between 'attribute' and 'send'

Thanks for the review Dave!  Addressed this and your suggestions
below also.

> 
> > specified address.
> > 
> > Per the CXL Specification (8.2.9.8.4.3), after receiving a valid clear
> 
> Please add spec version.
> 
> > poison request, the device removes the address from the device's Poison
> > List and writes 0 (zero) for 64 bytes starting at address. If the device
> > cannot clear poison from the address, it returns a permanent media error
> > and ENXIO is returned to the user.
> > 
> > Additionally, and per the spec also, it is not an error to clear poison
> > of an address that is not poisoned. No error is returned and the address
> > is not overwritten. The memdev driver performs basic sanity checking on
> > the address, however, it does not go as far as reading the poison list to
> > see if the address is poisoned before clearing. That discovery is left to
> > the device. The device safely handles that case.
> > 
> > Implementation note: Although the CXL specification defines the clear
> > command to accept 64 bytes of 'write-data' to be used when clearing
> > the poisoned address, this implementation always uses 0 (zeros) for
> > the write-data.
> > 
> > The clear_poison attribute is only visible for devices supporting the
> > capability.
> > 
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > ---
> >   Documentation/ABI/testing/sysfs-bus-cxl | 17 +++++++++
> >   drivers/cxl/core/memdev.c               | 47 +++++++++++++++++++++++++
> >   drivers/cxl/cxlmem.h                    |  6 ++++
> >   3 files changed, 70 insertions(+)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> > index 20db97f7a1aa..9d2b0fa07e17 100644
> > --- a/Documentation/ABI/testing/sysfs-bus-cxl
> > +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> > @@ -435,3 +435,20 @@ Description:
> >   		poison into an address that already has poison present and no
> >   		error is returned. The inject_poison attribute is only visible
> >                   for devices supporting the capability.
> > +
> > +
> > +What:		/sys/bus/cxl/devices/memX/clear_poison
> > +Date:		December, 2022
> > +KernelVersion:	v6.2
> > +Contact:	linux-cxl@vger.kernel.org
> > +Description:
> > +		(WO) When a Device Physical Address (DPA) is written to this
> > +		attribute the memdev driver sends a clear poison command to the
> 
> comma between 'attribute' and 'the'.
> 
> DJ
> 
> > +		device for the specified address. Clearing poison removes the
> > +		address from the device's Poison List and writes 0 (zero)
> > +		for 64 bytes starting at address. It is not an error to clear
> > +		poison from an address that does not have poison set, and if
> > +		poison was not set, the address is not overwritten. If the
> > +		device cannot clear poison from the address, ENXIO is returned.
> > +		The clear_poison attribute is only visible for devices
> > +		supporting the capability.
> > diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> > index 71130813030f..85caffd5a85c 100644
> > --- a/drivers/cxl/core/memdev.c
> > +++ b/drivers/cxl/core/memdev.c
> > @@ -187,6 +187,44 @@ static ssize_t inject_poison_store(struct device *dev,
> >   }
> >   static DEVICE_ATTR_WO(inject_poison);
> > +static ssize_t clear_poison_store(struct device *dev,
> > +				  struct device_attribute *attr,
> > +				  const char *buf, size_t len)
> > +{
> > +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> > +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > +	struct cxl_mbox_clear_poison *pi;
> > +	u64 dpa;
> > +	int rc;
> > +
> > +	rc = kstrtou64(buf, 0, &dpa);
> > +	if (rc)
> > +		return rc;
> > +	rc = cxl_validate_poison_dpa(cxlds, dpa);
> > +	if (rc)
> > +		return rc;
> > +	pi = kzalloc(sizeof(*pi), GFP_KERNEL);
> > +	if (!pi)
> > +		return -ENOMEM;
> > +	/*
> > +	 * In CXL 3.0 Spec 8.2.9.8.4.3, the Clear Poison mailbox command
> > +	 * is defined to accept 64 bytes of 'write-data', along with the
> > +	 * address to clear. The device writes 'write-data' into the DPA,
> > +	 * atomically, while clearing poison if the location is marked as
> > +	 * being poisoned.
> > +	 *
> > +	 * Always use '0' for the write-data.
> > +	 */
> > +	pi->address = cpu_to_le64(dpa);
> > +	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_CLEAR_POISON, pi,
> > +			       sizeof(*pi), NULL, cxlds->payload_size);
> > +	if (rc)
> > +		return rc;
> > +
> > +	return len;
> > +}
> > +static DEVICE_ATTR_WO(clear_poison);
> > +
> >   static struct attribute *cxl_memdev_attributes[] = {
> >   	&dev_attr_serial.attr,
> >   	&dev_attr_firmware_version.attr,
> > @@ -195,6 +233,7 @@ static struct attribute *cxl_memdev_attributes[] = {
> >   	&dev_attr_numa_node.attr,
> >   	&dev_attr_trigger_poison_list.attr,
> >   	&dev_attr_inject_poison.attr,
> > +	&dev_attr_clear_poison.attr,
> >   	NULL,
> >   };
> > @@ -228,6 +267,14 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
> >   			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
> >   			return 0;
> >   	}
> > +	if (a == &dev_attr_clear_poison.attr) {
> > +		struct device *dev = kobj_to_dev(kobj);
> > +
> > +		if (!test_bit(CXL_MEM_COMMAND_ID_CLEAR_POISON,
> > +			      to_cxl_memdev(dev)->cxlds->enabled_cmds)) {
> > +			return 0;
> > +		}
> > +	}
> >   	return a->mode;
> >   }
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index 0d4c34be7335..532adf9c3afd 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -399,6 +399,12 @@ struct cxl_mbox_poison_payload_out {
> >   /* Inject & Clear Poison  CXL 3.0 Spec 8.2.9.8.4.2/3 */
> >   #define CXL_POISON_INJECT_RESERVED	GENMASK_ULL(5, 0)
> > +/* Clear Poison  CXL 3.0 Spec 8.2.9.8.4.3 */
> > +struct cxl_mbox_clear_poison {
> > +	__le64 address;
> > +	u8 write_data[64];
> > +} __packed;
> > +
> >   /**
> >    * struct cxl_mem_command - Driver representation of a memory device command
> >    * @info: Command information as it exists for the UAPI

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5] cxl/memdev: Add support for the Clear Poison mailbox command
  2022-11-30 14:43   ` Jonathan Cameron
@ 2022-12-01 20:14     ` Alison Schofield
  0 siblings, 0 replies; 25+ messages in thread
From: Alison Schofield @ 2022-12-01 20:14 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Wed, Nov 30, 2022 at 02:43:30PM +0000, Jonathan Cameron wrote:
> On Tue, 29 Nov 2022 20:34:34 -0800
> alison.schofield@intel.com wrote:
> 
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > CXL devices optionally support the CLEAR POISON mailbox command. Add
> > a sysfs attribute and memdev driver support for clearing poison.
> > 
> > When a Device Physical Address (DPA) is written to the clear_poison
> > sysfs attribute send a clear poison command to the device for the
> > specified address.
> > 
> > Per the CXL Specification (8.2.9.8.4.3), after receiving a valid clear
> > poison request, the device removes the address from the device's Poison
> > List and writes 0 (zero) for 64 bytes starting at address. If the device
> > cannot clear poison from the address, it returns a permanent media error
> > and ENXIO is returned to the user.
> 
> -ENXIO
> 
> > 
> > Additionally, and per the spec also, it is not an error to clear poison
> > of an address that is not poisoned. No error is returned and the address
> > is not overwritten. The memdev driver performs basic sanity checking on
> > the address, however, it does not go as far as reading the poison list to
> > see if the address is poisoned before clearing. That discovery is left to
> > the device. The device safely handles that case.
> > 
> > Implementation note: Although the CXL specification defines the clear
> > command to accept 64 bytes of 'write-data' to be used when clearing
> > the poisoned address, this implementation always uses 0 (zeros) for
> > the write-data.
> 
> Maybe put a * above to refer to this note given the spec is referenced
> for stuff different from what you are doing with it.  Nice to flag
> up to anyone reading this that they shouldn't write a 'no that's not
> what it says' comment before reading on. (who would do something
> silly like that? :)
> 

Rereading the 2 paragraphs above, it's not flowing for me now either.
I hop between 'spec says' and 'driver does'. Let me give that another
pass.

> > 
> > The clear_poison attribute is only visible for devices supporting the
> > capability.
> > 
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Otherwise, a few really trivial things inline + it made me notice I'd missread
> the code for patch 1, hence the reply to my reply.
> 
> With this stuff tweaked.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

I'll pick up the stuff below too.
Thanks!

> 
> > ---
> >  Documentation/ABI/testing/sysfs-bus-cxl | 17 +++++++++
> >  drivers/cxl/core/memdev.c               | 47 +++++++++++++++++++++++++
> >  drivers/cxl/cxlmem.h                    |  6 ++++
> >  3 files changed, 70 insertions(+)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> > index 20db97f7a1aa..9d2b0fa07e17 100644
> > --- a/Documentation/ABI/testing/sysfs-bus-cxl
> > +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> > @@ -435,3 +435,20 @@ Description:
> >  		poison into an address that already has poison present and no
> >  		error is returned. The inject_poison attribute is only visible
> >                  for devices supporting the capability.
> > +
> > +
> > +What:		/sys/bus/cxl/devices/memX/clear_poison
> > +Date:		December, 2022
> > +KernelVersion:	v6.2
> > +Contact:	linux-cxl@vger.kernel.org
> > +Description:
> > +		(WO) When a Device Physical Address (DPA) is written to this
> > +		attribute the memdev driver sends a clear poison command to the
> > +		device for the specified address. Clearing poison removes the
> > +		address from the device's Poison List and writes 0 (zero)
> > +		for 64 bytes starting at address. It is not an error to clear
> > +		poison from an address that does not have poison set, and if
> > +		poison was not set, the address is not overwritten. If the
> > +		device cannot clear poison from the address, ENXIO is returned.
> 
> -ENXIO ?
> 
> > +		The clear_poison attribute is only visible for devices
> > +		supporting the capability.
> > diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> > index 71130813030f..85caffd5a85c 100644
> > --- a/drivers/cxl/core/memdev.c
> > +++ b/drivers/cxl/core/memdev.c
> > @@ -187,6 +187,44 @@ static ssize_t inject_poison_store(struct device *dev,
> >  }
> >  static DEVICE_ATTR_WO(inject_poison);
> >  
> > +static ssize_t clear_poison_store(struct device *dev,
> > +				  struct device_attribute *attr,
> > +				  const char *buf, size_t len)
> > +{
> > +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> > +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > +	struct cxl_mbox_clear_poison *pi;
> > +	u64 dpa;
> > +	int rc;
> > +
> > +	rc = kstrtou64(buf, 0, &dpa);
> > +	if (rc)
> > +		return rc;
> > +	rc = cxl_validate_poison_dpa(cxlds, dpa);
> > +	if (rc)
> > +		return rc;
> Trivial:
> blank line here.  Kind of make sense to keep the string parser and validation in
> one block, but good to then separate that from the next bit of code.
> 
> > +	pi = kzalloc(sizeof(*pi), GFP_KERNEL);
> > +	if (!pi)
> > +		return -ENOMEM;
> > +	/*
> > +	 * In CXL 3.0 Spec 8.2.9.8.4.3, the Clear Poison mailbox command
> > +	 * is defined to accept 64 bytes of 'write-data', along with the
> > +	 * address to clear. The device writes 'write-data' into the DPA,
> > +	 * atomically, while clearing poison if the location is marked as
> > +	 * being poisoned.
> > +	 *
> > +	 * Always use '0' for the write-data.
> > +	 */
> > +	pi->address = cpu_to_le64(dpa);
> > +	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_CLEAR_POISON, pi,
> > +			       sizeof(*pi), NULL, cxlds->payload_size);
> > +	if (rc)
> > +		return rc;
> > +
> > +	return len;
> > +}
> > +static DEVICE_ATTR_WO(clear_poison);
> ...
> 
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH 1/5] cxl/memdev: Add support for the Inject Poison mailbox command
  2022-11-30  4:34 ` [PATCH 1/5] cxl/memdev: Add support for the Inject Poison mailbox command alison.schofield
  2022-11-30 14:31   ` Jonathan Cameron
  2022-12-01 17:26   ` Dave Jiang
@ 2022-12-04 22:04   ` Dan Williams
  2022-12-08  4:16     ` Alison Schofield
  2 siblings, 1 reply; 25+ messages in thread
From: Dan Williams @ 2022-12-04 22:04 UTC (permalink / raw)
  To: alison.schofield, Dan Williams, Ira Weiny, Vishal Verma,
	Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl

alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> CXL devices optionally support the INJECT POISON mailbox command. Add
> a sysfs attribute and memdev driver support for injecting poison.
> 
> When a Device Physical Address (DPA) is written to the inject_poison
> sysfs attribute send an inject poison command to the device for the
> specified address.
> 
> Per the CXL Specification (8.2.9.8.4.2), after receiving a valid
> inject poison request, the device will return poison when the address
> is accessed through the CXL.mem bus. Injecting poison adds the address
> to the device's Poison List and the error source is set to injected
> error. In addition, the device adds a poison creation event to its
> internal Informational Event log, updates the Event Status register,
> and if configured, interrupts the host.
> 
> Also, per the CXL Specification, it is not an error to inject poison
> into an address that already has poison present and no error is returned
> from the device. The memdev driver performs basic sanity checking on the
> address, however, it does not go as far as reading the poison list to see
> if the address is on the list. That discovery is left to the device.
> 
> The inject_poison attribute is only visible for devices supporting
> the capability.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
>  Documentation/ABI/testing/sysfs-bus-cxl | 19 +++++++++
>  drivers/cxl/core/memdev.c               | 53 +++++++++++++++++++++++++
>  drivers/cxl/cxlmem.h                    |  3 ++
>  3 files changed, 75 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index b715a4609718..20db97f7a1aa 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -416,3 +416,22 @@ Description:
>  		if accessed, and the source of the poison. The retrieved
>  		errors are logged as kernel trace events with the label
>  		'cxl_poison'.
> +
> +
> +What:		/sys/bus/cxl/devices/memX/inject_poison
> +Date:		December, 2022
> +KernelVersion:	v6.2
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(WO) When a Device Physical Address (DPA) is written to this
> +		attribute the memdev driver sends an inject poison command to
> +		the device for the specified address. If successful, the device
> +		returns poison when the address is accessed through the CXL.mem
> +		bus. Injecting poison adds the address to the device's Poison
> +		List and the error source is set to injected error. In addition,
> +		the device adds a poison creation event to its internal
> +		Informational Event log, updates the Event Status register, and
> +		if configured, interrupts the host. It is not an error to inject
> +		poison into an address that already has poison present and no
> +		error is returned. The inject_poison attribute is only visible

This description needs to clarify how much poison is injected and that
the address needs to be 64-byte aligned, or what happens if the address
is not aligned.

I also think this whole facility needs to have its own Kconfig symbol so
that it can be disabled at build time, and it likely needs to disable
itself at run time if Linux fails obtain CXL error reporting control
from the BIOS.

> +                for devices supporting the capability.

Stray indenting...

> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index d08b7295a01c..71130813030f 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -142,6 +142,51 @@ static ssize_t trigger_poison_list_store(struct device *dev,
>  }
>  static DEVICE_ATTR_WO(trigger_poison_list);
>  
> +static int cxl_validate_poison_dpa(struct cxl_dev_state *cxlds, u64 dpa)
> +{
> +	if (!resource_size(&cxlds->dpa_res)) {
> +		dev_dbg(cxlds->dev, "device has no dpa resource\n");
> +		return -EINVAL;
> +	}
> +	if (dpa < cxlds->dpa_res.start || dpa > cxlds->dpa_res.end) {
> +		dev_dbg(cxlds->dev, "dpa:0x%llx not in resource:%pR\n",
> +			dpa, &cxlds->dpa_res);
> +		return -EINVAL;
> +	}
> +	if ((dpa & CXL_POISON_INJECT_RESERVED) != 0) {
> +		dev_dbg(cxlds->dev, "dpa reserve bit(s) [5:0] set 0x%llx\n",
> +			dpa);

I think it's less that those bits are reserved and more that poison is
tracked in terms of 64-byte aligned cache lines. So I would do:

    if (!IS_ALIGNED(dpa, 64))

> +		return -EINVAL;
> +	}
> +	return 0;
> +}
> +
> +static ssize_t inject_poison_store(struct device *dev,
> +				   struct device_attribute *attr,
> +				   const char *buf, size_t len)
> +{
> +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +	u64 dpa;
> +	int rc;
> +
> +	rc = kstrtou64(buf, 0, &dpa);
> +	if (rc)
> +		return rc;
> +	rc = cxl_validate_poison_dpa(cxlds, dpa);
> +	if (rc)
> +		return rc;
> +
> +	dpa = cpu_to_le64(dpa);
> +	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_INJECT_POISON, &dpa,
> +			       sizeof(dpa), NULL, cxlds->payload_size);

I would make it clear that the input payload is a single little endian
64-bit address with something like this:

struct cxl_mbox_poison_inject cmd;

cmd = (struct cxl_mbox_poison_inject) {
	.dpa = __le64_to_cpu(dpa),
};

...where cxl_mbox_poison_inject is:

struct cxl_mbox_poison_inject {
        __le64 dpa;
};

...otherwise its strange to assign the result of cpu_to_le64() to a u64,
I expect sparse would complain about that.

> +	if (rc)
> +		return rc;
> +
> +	return len;
> +}
> +static DEVICE_ATTR_WO(inject_poison);
> +
>  static struct attribute *cxl_memdev_attributes[] = {
>  	&dev_attr_serial.attr,
>  	&dev_attr_firmware_version.attr,
> @@ -149,6 +194,7 @@ static struct attribute *cxl_memdev_attributes[] = {
>  	&dev_attr_label_storage_size.attr,
>  	&dev_attr_numa_node.attr,
>  	&dev_attr_trigger_poison_list.attr,
> +	&dev_attr_inject_poison.attr,
>  	NULL,
>  };
>  
> @@ -175,6 +221,13 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
>  			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
>  			return 0;
>  	}
> +	if (a == &dev_attr_inject_poison.attr) {
> +		struct device *dev = kobj_to_dev(kobj);
> +
> +		if (!test_bit(CXL_MEM_COMMAND_ID_INJECT_POISON,
> +			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
> +			return 0;
> +	}
>  	return a->mode;
>  }
>  
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 19a9e545ac19..0d4c34be7335 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -396,6 +396,9 @@ struct cxl_mbox_poison_payload_out {
>  #define CXL_POISON_SOURCE_INJECTED	3
>  #define CXL_POISON_SOURCE_VENDOR	7
>  
> +/* Inject & Clear Poison  CXL 3.0 Spec 8.2.9.8.4.2/3 */
> +#define CXL_POISON_INJECT_RESERVED	GENMASK_ULL(5, 0)
> +
>  /**
>   * struct cxl_mem_command - Driver representation of a memory device command
>   * @info: Command information as it exists for the UAPI
> -- 
> 2.37.3
> 



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] cxl/memdev: Add support for the Inject Poison mailbox command
  2022-12-04 22:04   ` Dan Williams
@ 2022-12-08  4:16     ` Alison Schofield
  0 siblings, 0 replies; 25+ messages in thread
From: Alison Schofield @ 2022-12-08  4:16 UTC (permalink / raw)
  To: Dan Williams; +Cc: Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang, linux-cxl

On Sun, Dec 04, 2022 at 02:04:45PM -0800, Dan Williams wrote:
> alison.schofield@ wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > CXL devices optionally support the INJECT POISON mailbox command. Add
> > a sysfs attribute and memdev driver support for injecting poison.
> > 
> > When a Device Physical Address (DPA) is written to the inject_poison
> > sysfs attribute send an inject poison command to the device for the
> > specified address.
> > 
> > Per the CXL Specification (8.2.9.8.4.2), after receiving a valid
> > inject poison request, the device will return poison when the address
> > is accessed through the CXL.mem bus. Injecting poison adds the address
> > to the device's Poison List and the error source is set to injected
> > error. In addition, the device adds a poison creation event to its
> > internal Informational Event log, updates the Event Status register,
> > and if configured, interrupts the host.
> > 
> > Also, per the CXL Specification, it is not an error to inject poison
> > into an address that already has poison present and no error is returned
> > from the device. The memdev driver performs basic sanity checking on the
> > address, however, it does not go as far as reading the poison list to see
> > if the address is on the list. That discovery is left to the device.
> > 
> > The inject_poison attribute is only visible for devices supporting
> > the capability.
> > 
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > ---
> >  Documentation/ABI/testing/sysfs-bus-cxl | 19 +++++++++
> >  drivers/cxl/core/memdev.c               | 53 +++++++++++++++++++++++++
> >  drivers/cxl/cxlmem.h                    |  3 ++
> >  3 files changed, 75 insertions(+)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> > index b715a4609718..20db97f7a1aa 100644
> > --- a/Documentation/ABI/testing/sysfs-bus-cxl
> > +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> > @@ -416,3 +416,22 @@ Description:
> >  		if accessed, and the source of the poison. The retrieved
> >  		errors are logged as kernel trace events with the label
> >  		'cxl_poison'.
> > +
> > +
> > +What:		/sys/bus/cxl/devices/memX/inject_poison
> > +Date:		December, 2022
> > +KernelVersion:	v6.2
> > +Contact:	linux-cxl@vger.kernel.org
> > +Description:
> > +		(WO) When a Device Physical Address (DPA) is written to this
> > +		attribute the memdev driver sends an inject poison command to
> > +		the device for the specified address. If successful, the device
> > +		returns poison when the address is accessed through the CXL.mem
> > +		bus. Injecting poison adds the address to the device's Poison
> > +		List and the error source is set to injected error. In addition,
> > +		the device adds a poison creation event to its internal
> > +		Informational Event log, updates the Event Status register, and
> > +		if configured, interrupts the host. It is not an error to inject
> > +		poison into an address that already has poison present and no
> > +		error is returned. The inject_poison attribute is only visible
> 
> This description needs to clarify how much poison is injected and that
> the address needs to be 64-byte aligned, or what happens if the address
> is not aligned.

Will clarify!
Is 64 bytes and if it's not aligned, fail -EINVAL.

> 
> I also think this whole facility needs to have its own Kconfig symbol so
> that it can be disabled at build time, and it likely needs to disable
> itself at run time if Linux fails obtain CXL error reporting control
> from the BIOS.

wrt Kconfig and 'whole facility' I'm thinking the Kconfig control
is for Inject&Clear, and that you are not thinking Get Poison List and
Scan Media are part of the group that needs a Kconfig control.

I'm seeking clarification on the error reporting control - as in,
confirmation that the Poison commands need to acquire that control.

> 
> > +                for devices supporting the capability.
> 
> Stray indenting...
> 
> > diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> > index d08b7295a01c..71130813030f 100644
> > --- a/drivers/cxl/core/memdev.c
> > +++ b/drivers/cxl/core/memdev.c
> > @@ -142,6 +142,51 @@ static ssize_t trigger_poison_list_store(struct device *dev,
> >  }
> >  static DEVICE_ATTR_WO(trigger_poison_list);
> >  
> > +static int cxl_validate_poison_dpa(struct cxl_dev_state *cxlds, u64 dpa)
> > +{
> > +	if (!resource_size(&cxlds->dpa_res)) {
> > +		dev_dbg(cxlds->dev, "device has no dpa resource\n");
> > +		return -EINVAL;
> > +	}
> > +	if (dpa < cxlds->dpa_res.start || dpa > cxlds->dpa_res.end) {
> > +		dev_dbg(cxlds->dev, "dpa:0x%llx not in resource:%pR\n",
> > +			dpa, &cxlds->dpa_res);
> > +		return -EINVAL;
> > +	}
> > +	if ((dpa & CXL_POISON_INJECT_RESERVED) != 0) {
> > +		dev_dbg(cxlds->dev, "dpa reserve bit(s) [5:0] set 0x%llx\n",
> > +			dpa);
> 
> I think it's less that those bits are reserved and more that poison is
> tracked in terms of 64-byte aligned cache lines. So I would do:
> 
>     if (!IS_ALIGNED(dpa, 64))
> 

:) thanks!


> > +		return -EINVAL;
> > +	}
> > +	return 0;
> > +}
> > +
> > +static ssize_t inject_poison_store(struct device *dev,
> > +				   struct device_attribute *attr,
> > +				   const char *buf, size_t len)
> > +{
> > +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> > +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > +	u64 dpa;
> > +	int rc;
> > +
> > +	rc = kstrtou64(buf, 0, &dpa);
> > +	if (rc)
> > +		return rc;
> > +	rc = cxl_validate_poison_dpa(cxlds, dpa);
> > +	if (rc)
> > +		return rc;
> > +
> > +	dpa = cpu_to_le64(dpa);
> > +	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_INJECT_POISON, &dpa,
> > +			       sizeof(dpa), NULL, cxlds->payload_size);
> 
> I would make it clear that the input payload is a single little endian
> 64-bit address with something like this:
> 
> struct cxl_mbox_poison_inject cmd;
> 
> cmd = (struct cxl_mbox_poison_inject) {
> 	.dpa = __le64_to_cpu(dpa),
> };
> 
> ...where cxl_mbox_poison_inject is:
> 
> struct cxl_mbox_poison_inject {
>         __le64 dpa;
> };
> 
> ...otherwise its strange to assign the result of cpu_to_le64() to a u64,
> I expect sparse would complain about that.

I did fix up the sparse warning but not w a struct like you suggest.
I was treating this input payload unfairly and unclearly by not giving
it it's own struct definition. I will give it a struct!

> 
> > +	if (rc)
> > +		return rc;
> > +
> > +	return len;
> > +}
> > +static DEVICE_ATTR_WO(inject_poison);
> > +
> >  static struct attribute *cxl_memdev_attributes[] = {
> >  	&dev_attr_serial.attr,
> >  	&dev_attr_firmware_version.attr,
> > @@ -149,6 +194,7 @@ static struct attribute *cxl_memdev_attributes[] = {
> >  	&dev_attr_label_storage_size.attr,
> >  	&dev_attr_numa_node.attr,
> >  	&dev_attr_trigger_poison_list.attr,
> > +	&dev_attr_inject_poison.attr,
> >  	NULL,
> >  };
> >  
> > @@ -175,6 +221,13 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
> >  			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
> >  			return 0;
> >  	}
> > +	if (a == &dev_attr_inject_poison.attr) {
> > +		struct device *dev = kobj_to_dev(kobj);
> > +
> > +		if (!test_bit(CXL_MEM_COMMAND_ID_INJECT_POISON,
> > +			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
> > +			return 0;
> > +	}
> >  	return a->mode;
> >  }
> >  
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index 19a9e545ac19..0d4c34be7335 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -396,6 +396,9 @@ struct cxl_mbox_poison_payload_out {
> >  #define CXL_POISON_SOURCE_INJECTED	3
> >  #define CXL_POISON_SOURCE_VENDOR	7
> >  
> > +/* Inject & Clear Poison  CXL 3.0 Spec 8.2.9.8.4.2/3 */
> > +#define CXL_POISON_INJECT_RESERVED	GENMASK_ULL(5, 0)
> > +
> >  /**
> >   * struct cxl_mem_command - Driver representation of a memory device command
> >   * @info: Command information as it exists for the UAPI
> > -- 
> > 2.37.3
> > 
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] cxl/memdev: Add support for the Inject Poison mailbox command
  2022-12-01 17:26   ` Dave Jiang
@ 2022-12-08  4:17     ` Alison Schofield
  0 siblings, 0 replies; 25+ messages in thread
From: Alison Schofield @ 2022-12-08  4:17 UTC (permalink / raw)
  To: Dave Jiang; +Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, linux-cxl

On Thu, Dec 01, 2022 at 10:26:16AM -0700, Dave Jiang wrote:
> 
> 
> On 11/29/2022 9:34 PM, alison.schofield@intel.com wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > CXL devices optionally support the INJECT POISON mailbox command. Add
> > a sysfs attribute and memdev driver support for injecting poison.
> > 
> > When a Device Physical Address (DPA) is written to the inject_poison
> > sysfs attribute send an inject poison command to the device for the
> 
> A comma between attribute and send would make this read better.
> 
> > specified address.
> > 
> > Per the CXL Specification (8.2.9.8.4.2), after receiving a valid
> 
> spec v3?
> 
> > inject poison request, the device will return poison when the address
> > is accessed through the CXL.mem bus. Injecting poison adds the address
> > to the device's Poison List and the error source is set to injected
> 
> comma before 'and'.
> 'the' before 'injected'
> 
> > error. In addition, the device adds a poison creation event to its
> > internal Informational Event log, updates the Event Status register,
> > and if configured, interrupts the host.
> > 
> > Also, per the CXL Specification, it is not an error to inject poison
> > into an address that already has poison present and no error is returned
> 
> comma before 'and'.
> 
> DJ

Thanks Dave!
> 
snip
> >

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] cxl/memdev: Add support for the Inject Poison mailbox command
  2022-12-01 16:42       ` Dave Jiang
@ 2022-12-08  4:20         ` Alison Schofield
  0 siblings, 0 replies; 25+ messages in thread
From: Alison Schofield @ 2022-12-08  4:20 UTC (permalink / raw)
  To: Dave Jiang
  Cc: Jonathan Cameron, Dan Williams, Ira Weiny, Vishal Verma,
	Ben Widawsky, linux-cxl

On Thu, Dec 01, 2022 at 09:42:01AM -0700, Dave Jiang wrote:
> 
> 
> On 11/30/2022 7:40 AM, Jonathan Cameron wrote:
> > On Wed, 30 Nov 2022 14:31:36 +0000
> > Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> > 
> > > On Tue, 29 Nov 2022 20:34:33 -0800
> > > alison.schofield@intel.com wrote:
> > > 
> > > > From: Alison Schofield <alison.schofield@intel.com>
> > > > 
> > > > CXL devices optionally support the INJECT POISON mailbox command. Add
> > > > a sysfs attribute and memdev driver support for injecting poison.
> > > > 
> > > > When a Device Physical Address (DPA) is written to the inject_poison
> > > > sysfs attribute send an inject poison command to the device for the
> > > > specified address.
> > > > 
> > > > Per the CXL Specification (8.2.9.8.4.2), after receiving a valid
> > > > inject poison request, the device will return poison when the address
> > > > is accessed through the CXL.mem bus. Injecting poison adds the address
> > > > to the device's Poison List and the error source is set to injected
> > > > error. In addition, the device adds a poison creation event to its
> > > > internal Informational Event log, updates the Event Status register,
> > > > and if configured, interrupts the host.
> > > > 
> > > > Also, per the CXL Specification, it is not an error to inject poison
> > > > into an address that already has poison present and no error is returned
> > > > from the device. The memdev driver performs basic sanity checking on the
> > > > address, however, it does not go as far as reading the poison list to see
> > > > if the address is on the list. That discovery is left to the device.
> > > > 
> > > > The inject_poison attribute is only visible for devices supporting
> > > > the capability.
> > > > 
> > > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > > A few trivial things inline.  With those fixes LGTM
> > > 
> > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > 
> > > > ---
> > > >   Documentation/ABI/testing/sysfs-bus-cxl | 19 +++++++++
> > > >   drivers/cxl/core/memdev.c               | 53 +++++++++++++++++++++++++
> > > >   drivers/cxl/cxlmem.h                    |  3 ++
> > > >   3 files changed, 75 insertions(+)
> > > > 
> > > > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> > > > index b715a4609718..20db97f7a1aa 100644
> > > > --- a/Documentation/ABI/testing/sysfs-bus-cxl
> > > > +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> > > > @@ -416,3 +416,22 @@ Description:
> > > >   		if accessed, and the source of the poison. The retrieved
> > > >   		errors are logged as kernel trace events with the label
> > > >   		'cxl_poison'.
> > > > +
> > > > +
> > > > +What:		/sys/bus/cxl/devices/memX/inject_poison
> > > > +Date:		December, 2022
> > > > +KernelVersion:	v6.2
> > > > +Contact:	linux-cxl@vger.kernel.org
> > > > +Description:
> > > > +		(WO) When a Device Physical Address (DPA) is written to this
> > > > +		attribute the memdev driver sends an inject poison command to
> > > > +		the device for the specified address. If successful, the device
> > > > +		returns poison when the address is accessed through the CXL.mem
> > > > +		bus. Injecting poison adds the address to the device's Poison
> > > > +		List and the error source is set to injected error. In addition,
> > > "set to Injected."
> > > 
> > > perhaps to match spec naming in Media Error Record.
> > > 
> > > > +		the device adds a poison creation event to its internal
> > > > +		Informational Event log, updates the Event Status register, and
> > > > +		if configured, interrupts the host. It is not an error to inject
> > > > +		poison into an address that already has poison present and no
> > > > +		error is returned. The inject_poison attribute is only visible
> > > > +                for devices supporting the capability.
> > > 
> > > White space issues (spaces instead of tabs?)
> > > 
> > > Add something about the masked bits / granularity of addresses that are accepted.
> > > 
> > > > diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> > > > index d08b7295a01c..71130813030f 100644
> > > > --- a/drivers/cxl/core/memdev.c
> > > > +++ b/drivers/cxl/core/memdev.c
> > > > @@ -142,6 +142,51 @@ static ssize_t trigger_poison_list_store(struct device *dev,
> > > >   }
> > > >   static DEVICE_ATTR_WO(trigger_poison_list);
> > > > +static int cxl_validate_poison_dpa(struct cxl_dev_state *cxlds, u64 dpa)
> > > > +{
> > > > +	if (!resource_size(&cxlds->dpa_res)) {
> > > > +		dev_dbg(cxlds->dev, "device has no dpa resource\n");
> > > > +		return -EINVAL;
> > > > +	}
> > > > +	if (dpa < cxlds->dpa_res.start || dpa > cxlds->dpa_res.end) {
> > > > +		dev_dbg(cxlds->dev, "dpa:0x%llx not in resource:%pR\n",
> > > > +			dpa, &cxlds->dpa_res);
> > > > +		return -EINVAL;
> > > > +	}
> > > > +	if ((dpa & CXL_POISON_INJECT_RESERVED) != 0) {
> > > > +		dev_dbg(cxlds->dev, "dpa reserve bit(s) [5:0] set 0x%llx\n",
> > > > +			dpa);
> > > > +		return -EINVAL;
> > > > +	}
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +static ssize_t inject_poison_store(struct device *dev,
> > > > +				   struct device_attribute *attr,
> > > > +				   const char *buf, size_t len)
> > > > +{
> > > > +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> > > > +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > > > +	u64 dpa;
> > > > +	int rc;
> > > > +
> > > > +	rc = kstrtou64(buf, 0, &dpa);
> > > > +	if (rc)
> > > > +		return rc;
> > > > +	rc = cxl_validate_poison_dpa(cxlds, dpa);
> > > > +	if (rc)
> > > > +		return rc;
> > > > +
> > > > +	dpa = cpu_to_le64(dpa);
> > > > +	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_INJECT_POISON, &dpa,
> > > 
> > > Endianness?
> > 
> > Got thrown by the type and missed the cpu_to_le64().  Use a local __le64 so it's all explicit.
> > One of the static analysis tools will correctly moan about storing it to a u64
> > (can't remember which).
> 
> Sparse. I just got yelled at by 0-day for something similar. :)
> 

Thanks for all your endian-ness eyes.

> > 
> > > 
> > > > +			       sizeof(dpa), NULL, cxlds->payload_size);
> > > > +	if (rc)
> > > > +		return rc;
> > > > +
> > > > +	return len;
> > > > +}
> > > > +static DEVICE_ATTR_WO(inject_poison);
> > > > +
> > > >   static struct attribute *cxl_memdev_attributes[] = {
> > > >   	&dev_attr_serial.attr,
> > > >   	&dev_attr_firmware_version.attr,
> > > > @@ -149,6 +194,7 @@ static struct attribute *cxl_memdev_attributes[] = {
> > > >   	&dev_attr_label_storage_size.attr,
> > > >   	&dev_attr_numa_node.attr,
> > > >   	&dev_attr_trigger_poison_list.attr,
> > > > +	&dev_attr_inject_poison.attr,
> > > >   	NULL,
> > > >   };
> > > > @@ -175,6 +221,13 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
> > > >   			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
> > > >   			return 0;
> > > >   	}
> > > > +	if (a == &dev_attr_inject_poison.attr) {
> > > > +		struct device *dev = kobj_to_dev(kobj);
> > > > +
> > > > +		if (!test_bit(CXL_MEM_COMMAND_ID_INJECT_POISON,
> > > > +			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
> > > > +			return 0;
> > > > +	}
> > > >   	return a->mode;
> > > >   }
> > > > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > > > index 19a9e545ac19..0d4c34be7335 100644
> > > > --- a/drivers/cxl/cxlmem.h
> > > > +++ b/drivers/cxl/cxlmem.h
> > > > @@ -396,6 +396,9 @@ struct cxl_mbox_poison_payload_out {
> > > >   #define CXL_POISON_SOURCE_INJECTED	3
> > > >   #define CXL_POISON_SOURCE_VENDOR	7
> > > > +/* Inject & Clear Poison  CXL 3.0 Spec 8.2.9.8.4.2/3 */
> > > > +#define CXL_POISON_INJECT_RESERVED	GENMASK_ULL(5, 0)
> > > > +
> > > >   /**
> > > >    * struct cxl_mem_command - Driver representation of a memory device command
> > > >    * @info: Command information as it exists for the UAPI
> > > 
> > > 
> > 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5] tools/testing/cxl: Use injected poison for Get Poison List
  2022-11-30 15:15   ` Jonathan Cameron
@ 2022-12-08  4:30     ` Alison Schofield
  2022-12-08 14:54       ` Jonathan Cameron
  0 siblings, 1 reply; 25+ messages in thread
From: Alison Schofield @ 2022-12-08  4:30 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Wed, Nov 30, 2022 at 03:15:22PM +0000, Jonathan Cameron wrote:
> On Tue, 29 Nov 2022 20:34:37 -0800
> alison.schofield@intel.com wrote:
> 
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > Prior to poison inject & clear support, the mock of 'Get Poison List'
> > returned a list containing a single mocked error record.
> > 
> > Now that the mock of Inject and Clear poison maintains a mock_poison[]
> > list, use that when available for the device. Otherwise, return the
> > existing default mocked error record.
> > 
> > This enables roundtrip userspace testing of the inject, clear, and
> > poison list support.
> 
> I'm not that keen on having an unclearable record, unless you handle that
> with the right error flow for clear poison.
> Plus I think you should always return that as well as the injected poison list
> so this is consistent.
> 
> Whether it matters to model it that well in the mocking code?  Up to you...
> 
> Jonathan
> 
I think I went off the rails here ;)
With this mock of Inject and Clear, returning totally fake errors is
useless. So, I would change my above statement to say:

"Now that the mock of Inject and Clear poison maintains a mock_poison[]
ist, use that when available for the device."

Do you agree with that usage Jonathan?  If the user wants to get errors
when they read the poison list, they need to inject them.
I guess I also could flip the order of these patchsets and do away
with the fake poison list entirely.

a bit more below...

> 
> 
> > 
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > ---
> >  tools/testing/cxl/test/mem.c | 97 +++++++++++++++++++++++++-----------
> >  1 file changed, 69 insertions(+), 28 deletions(-)
> > 
> > diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> > index 9d794fbe5ee1..31c147cf2d5b 100644
> > --- a/tools/testing/cxl/test/mem.c
> > +++ b/tools/testing/cxl/test/mem.c
> > @@ -224,6 +224,75 @@ static struct mock_poison {
> >  	u64 dpa;
> >  } mock_poison[MOCK_INJECT_POISON_MAX];
> >  
> > +struct mock_poison_po {
> > +	struct cxl_mbox_poison_payload_out poison_out;
> > +	struct cxl_poison_record record[MOCK_INJECT_POISON_MAX];
> 
> Maybe use a variable length final element [] so that you can then
> create a 1 record version on demand where it is used for
> the default record.
> 
> > +};
> > +
> > +/*
> > + * Use the default poison payload when no injected poison is currently
> > + * mocked for this device. Mock one poison record with length 64 bytes.
> > + */
> > +static struct mock_poison_po default_poison_po = {
> > +	.poison_out.count = cpu_to_le16(1),
> > +	.record[0].length = cpu_to_le32(1),
> > +};
> > +
> > +static struct mock_poison_po *cxl_get_injected_po(struct cxl_dev_state *cxlds,
> > +						  u64 offset, u64 length)
> > +{
> > +	struct mock_poison_po *po;
> > +	int nr_records = 0;
> > +	u64 dpa;
> > +
> > +	po = kzalloc(sizeof(*po), GFP_KERNEL);
> > +	if (!po)
> > +		return NULL;
> > +
> > +	for (int i = 0; i < MOCK_INJECT_POISON_MAX; i++) {
> > +		if (mock_poison[i].cxlds != cxlds)
> > +			continue;
> > +		if (mock_poison[i].dpa < offset ||
> > +		    mock_poison[i].dpa > offset + length - 1)
> > +			continue;
> > +		dpa = cpu_to_le64(mock_poison[i].dpa |
> > +				  CXL_POISON_SOURCE_INJECTED);
> > +		po->record[nr_records].address = dpa;
> > +		po->record[nr_records].length = cpu_to_le32(1);
> > +		nr_records++;
> > +	}
> > +	if (!nr_records) {
> > +		kfree(po);
> > +		return NULL;
> > +	}
> > +	po->poison_out.count = cpu_to_le16(nr_records);
> > +	return po;
> > +}
> > +
> > +static int mock_get_poison(struct cxl_dev_state *cxlds,
> > +			   struct cxl_mbox_cmd *cmd)
> > +{
> 
> Why the function move?  If you want to do this, a trivial move only
> patch first would have bad it slightly easier to review. But given it's small
> not worth doing now, particularly as most of the function changes anyway.
> 

Let me see how this cleans up, including your feedback below, when 
I get rid of the either injected or default choice.

> 
> > +	struct cxl_mbox_poison_payload_in *pi = cmd->payload_in;
> > +	struct mock_poison_po *po;
> > +
> > +	po = cxl_get_injected_po(cxlds, pi->offset, pi->length);
> > +	if (!po) {
> > +		po = &default_poison_po;
> 
> Keep the comment from below on what this value is. It is weird enough
> to be worth a comment.
> 
> I'd be tempted to make the default_poison_po const and use a copy
> of it here to avoid fun races on reads from multiple devices.
> Or just allocate a 1 element version and fill it here.  However,
> I think we should really have both the default and injected elements.
> 
> > +		po->record[0].address = cpu_to_le64(pi->offset |
> > +						    CXL_POISON_SOURCE_INTERNAL);
> > +	}
> > +	if (cmd->size_out < sizeof(po))
> > +		return -EINVAL;
> 
> This isn't technically correct, though it's probably fine for how we currently
> use it. Should really assume full mailbox size and fill on that basis (record
> count etc) but only copy the size of the target buffer.
> Not sure we care enough about this being 'right' as opposed to useful.
> 
> > +
> > +	memcpy(cmd->payload_out, po, sizeof(*po));
> > +	cmd->size_out = sizeof(po);
> > +
> > +	if (po != &default_poison_po)
> > +		kfree(po);
> > +
> > +	return 0;
> > +}
> > +
> >  static bool mock_poison_add(struct cxl_dev_state *cxlds, u64 dpa)
> >  {
> >  	for (int i = 0; i < MOCK_INJECT_POISON_MAX; i++) {
> > @@ -293,34 +362,6 @@ static int mock_inject_poison(struct cxl_dev_state *cxlds,
> >  	return 0;
> >  }
> >  
> > -static int mock_get_poison(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
> > -{
> > -	struct cxl_mbox_poison_payload_in *pi = cmd->payload_in;
> > -
> > -	struct {
> > -		struct cxl_mbox_poison_payload_out poison_out;
> > -		struct cxl_poison_record record;
> > -	} mock_poison_list = {
> > -		.poison_out = {
> > -			.count = cpu_to_le16(1),
> > -		},
> > -		/* Mock one poison record at pi.offset for 64 bytes */
> > -		.record = {
> > -			/* .address encodes DPA and poison source bits */
> > -			.address = cpu_to_le64(pi->offset |
> > -					       CXL_POISON_SOURCE_INTERNAL),
> > -			.length = cpu_to_le32(1),
> > -		},
> > -	};
> > -
> > -	if (cmd->size_out < sizeof(mock_poison_list))
> > -		return -EINVAL;
> > -
> > -	memcpy(cmd->payload_out, &mock_poison_list, sizeof(mock_poison_list));
> > -	cmd->size_out = sizeof(mock_poison_list);
> > -	return 0;
> > -}
> > -
> >  static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
> >  {
> >  	struct device *dev = cxlds->dev;
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/5] tools/testing/cxl: Mock the Inject Poison mailbox command
  2022-11-30 14:58   ` Jonathan Cameron
@ 2022-12-08  4:47     ` Alison Schofield
  2022-12-08 14:53       ` Jonathan Cameron
  0 siblings, 1 reply; 25+ messages in thread
From: Alison Schofield @ 2022-12-08  4:47 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Wed, Nov 30, 2022 at 02:58:47PM +0000, Jonathan Cameron wrote:
> On Tue, 29 Nov 2022 20:34:35 -0800
> alison.schofield@intel.com wrote:
> 
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > Mock the injection of poison by storing the device:address entry in a
> > cxl_test array: mock_poison[]. Limit the array to 64 entries and fail
> > new inject requests when full.
> > 
> > Introducing the mock_poison[] list here, makes it available for use in
> > the mock of Clear Poison, and the mock of Get Poison List.
> > 
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> 
> Main question I have here is whether we want to mock per device injected poison
> lists, or just one global.  I think we want per device so we can reflect
> the limits as would be retrieved from Identify Memory Device Output Payload.
> Whilst we don't do anything useful with it yet, we should also update
> the mocked response to that command to reflect this.
> 
> Perhaps we should have a sysfs attribute to read how many entries we can
> inject?  Seems like that would be useful for testing flows on real devices,
> particularly if the device only supports a very small number!

Jonathan,

Thanks for the review and helping me see the use case. I see the usefulness of
the per device model but am not clear on the sysfs attr for number of entries.

If user wants to know how many entries can be injected, just inject til I tell
you no more!  Remember that count, run your test w it.

If user wants to dial down that number to something custom, then I can
see RW attr being useful. Mock driver sets it to a max, and user can
lower it.

Is that useful?

Alison

> 
> > ---
> >  tools/testing/cxl/test/mem.c | 53 ++++++++++++++++++++++++++++++++++++
> >  1 file changed, 53 insertions(+)
> > 
> > diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> > index a4f81915ec03..98acb9a644df 100644
> > --- a/tools/testing/cxl/test/mem.c
> > +++ b/tools/testing/cxl/test/mem.c
> > @@ -13,6 +13,7 @@
> >  #define LSA_SIZE SZ_128K
> >  #define DEV_SIZE SZ_2G
> >  #define EFFECT(x) (1U << x)
> > +#define MOCK_INJECT_POISON_MAX 64
> >  
> >  static struct cxl_cel_entry mock_cel[] = {
> >  	{
> > @@ -43,6 +44,10 @@ static struct cxl_cel_entry mock_cel[] = {
> >  		.opcode = cpu_to_le16(CXL_MBOX_OP_GET_POISON),
> >  		.effect = cpu_to_le16(0),
> >  	},
> > +	{
> > +		.opcode = cpu_to_le16(CXL_MBOX_OP_INJECT_POISON),
> > +		.effect = cpu_to_le16(0),
> > +	},
> >  };
> >  
> >  /* See CXL 2.0 Table 181 Get Health Info Output Payload */
> > @@ -210,6 +215,51 @@ static int mock_health_info(struct cxl_dev_state *cxlds,
> >  	return 0;
> >  }
> >  
> > +static struct mock_poison {
> > +	struct cxl_dev_state *cxlds;
> > +	u64 dpa;
> > +} mock_poison[MOCK_INJECT_POISON_MAX];
> 
> Don't we want one of these per device instance?
> 
> > +
> > +static bool mock_poison_add(struct cxl_dev_state *cxlds, u64 dpa)
> > +{
> > +	for (int i = 0; i < MOCK_INJECT_POISON_MAX; i++) {
> > +		if (!mock_poison[i].cxlds) {
> > +			mock_poison[i].cxlds = cxlds;
> > +			mock_poison[i].dpa = dpa;
> > +			return true;
> > +		}
> > +	}
> > +	dev_dbg(cxlds->dev, "Mock poison list full: %d\n",
> > +		MOCK_INJECT_POISON_MAX);
> Slightly nicer maybe to use the text from
> https://elixir.bootlin.com/linux/latest/source/drivers/cxl/cxlmem.h#L128
> and have: "Mock poison injection limit has been reached: %d\n" ..
> 
> > +	return false;
> > +}
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/5] tools/testing/cxl: Mock the Inject Poison mailbox command
  2022-12-08  4:47     ` Alison Schofield
@ 2022-12-08 14:53       ` Jonathan Cameron
  0 siblings, 0 replies; 25+ messages in thread
From: Jonathan Cameron @ 2022-12-08 14:53 UTC (permalink / raw)
  To: Alison Schofield
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Wed, 7 Dec 2022 20:47:27 -0800
Alison Schofield <alison.schofield@intel.com> wrote:

> On Wed, Nov 30, 2022 at 02:58:47PM +0000, Jonathan Cameron wrote:
> > On Tue, 29 Nov 2022 20:34:35 -0800
> > alison.schofield@intel.com wrote:
> >   
> > > From: Alison Schofield <alison.schofield@intel.com>
> > > 
> > > Mock the injection of poison by storing the device:address entry in a
> > > cxl_test array: mock_poison[]. Limit the array to 64 entries and fail
> > > new inject requests when full.
> > > 
> > > Introducing the mock_poison[] list here, makes it available for use in
> > > the mock of Clear Poison, and the mock of Get Poison List.
> > > 
> > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>  
> > 
> > Main question I have here is whether we want to mock per device injected poison
> > lists, or just one global.  I think we want per device so we can reflect
> > the limits as would be retrieved from Identify Memory Device Output Payload.
> > Whilst we don't do anything useful with it yet, we should also update
> > the mocked response to that command to reflect this.
> > 
> > Perhaps we should have a sysfs attribute to read how many entries we can
> > inject?  Seems like that would be useful for testing flows on real devices,
> > particularly if the device only supports a very small number!  
> 
> Jonathan,
> 
> Thanks for the review and helping me see the use case. I see the usefulness of
> the per device model but am not clear on the sysfs attr for number of entries.
> 
> If user wants to know how many entries can be injected, just inject til I tell
> you no more!  Remember that count, run your test w it.
> 
> If user wants to dial down that number to something custom, then I can
> see RW attr being useful. Mock driver sets it to a max, and user can
> lower it.
> 
> Is that useful?
At this stage - no idea ;)

Probably a wait and see question. If no one asks for it, don't bother
providing the support.

Jonathan

> 
> Alison
> 
> >   
> > > ---
> > >  tools/testing/cxl/test/mem.c | 53 ++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 53 insertions(+)
> > > 
> > > diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> > > index a4f81915ec03..98acb9a644df 100644
> > > --- a/tools/testing/cxl/test/mem.c
> > > +++ b/tools/testing/cxl/test/mem.c
> > > @@ -13,6 +13,7 @@
> > >  #define LSA_SIZE SZ_128K
> > >  #define DEV_SIZE SZ_2G
> > >  #define EFFECT(x) (1U << x)
> > > +#define MOCK_INJECT_POISON_MAX 64
> > >  
> > >  static struct cxl_cel_entry mock_cel[] = {
> > >  	{
> > > @@ -43,6 +44,10 @@ static struct cxl_cel_entry mock_cel[] = {
> > >  		.opcode = cpu_to_le16(CXL_MBOX_OP_GET_POISON),
> > >  		.effect = cpu_to_le16(0),
> > >  	},
> > > +	{
> > > +		.opcode = cpu_to_le16(CXL_MBOX_OP_INJECT_POISON),
> > > +		.effect = cpu_to_le16(0),
> > > +	},
> > >  };
> > >  
> > >  /* See CXL 2.0 Table 181 Get Health Info Output Payload */
> > > @@ -210,6 +215,51 @@ static int mock_health_info(struct cxl_dev_state *cxlds,
> > >  	return 0;
> > >  }
> > >  
> > > +static struct mock_poison {
> > > +	struct cxl_dev_state *cxlds;
> > > +	u64 dpa;
> > > +} mock_poison[MOCK_INJECT_POISON_MAX];  
> > 
> > Don't we want one of these per device instance?
> >   
> > > +
> > > +static bool mock_poison_add(struct cxl_dev_state *cxlds, u64 dpa)
> > > +{
> > > +	for (int i = 0; i < MOCK_INJECT_POISON_MAX; i++) {
> > > +		if (!mock_poison[i].cxlds) {
> > > +			mock_poison[i].cxlds = cxlds;
> > > +			mock_poison[i].dpa = dpa;
> > > +			return true;
> > > +		}
> > > +	}
> > > +	dev_dbg(cxlds->dev, "Mock poison list full: %d\n",
> > > +		MOCK_INJECT_POISON_MAX);  
> > Slightly nicer maybe to use the text from
> > https://elixir.bootlin.com/linux/latest/source/drivers/cxl/cxlmem.h#L128
> > and have: "Mock poison injection limit has been reached: %d\n" ..
> >   
> > > +	return false;
> > > +}  
> >   
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5] tools/testing/cxl: Use injected poison for Get Poison List
  2022-12-08  4:30     ` Alison Schofield
@ 2022-12-08 14:54       ` Jonathan Cameron
  0 siblings, 0 replies; 25+ messages in thread
From: Jonathan Cameron @ 2022-12-08 14:54 UTC (permalink / raw)
  To: Alison Schofield
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Wed, 7 Dec 2022 20:30:36 -0800
Alison Schofield <alison.schofield@intel.com> wrote:

> On Wed, Nov 30, 2022 at 03:15:22PM +0000, Jonathan Cameron wrote:
> > On Tue, 29 Nov 2022 20:34:37 -0800
> > alison.schofield@intel.com wrote:
> >   
> > > From: Alison Schofield <alison.schofield@intel.com>
> > > 
> > > Prior to poison inject & clear support, the mock of 'Get Poison List'
> > > returned a list containing a single mocked error record.
> > > 
> > > Now that the mock of Inject and Clear poison maintains a mock_poison[]
> > > list, use that when available for the device. Otherwise, return the
> > > existing default mocked error record.
> > > 
> > > This enables roundtrip userspace testing of the inject, clear, and
> > > poison list support.  
> > 
> > I'm not that keen on having an unclearable record, unless you handle that
> > with the right error flow for clear poison.
> > Plus I think you should always return that as well as the injected poison list
> > so this is consistent.
> > 
> > Whether it matters to model it that well in the mocking code?  Up to you...
> > 
> > Jonathan
> >   
> I think I went off the rails here ;)
> With this mock of Inject and Clear, returning totally fake errors is
> useless. So, I would change my above statement to say:
> 
> "Now that the mock of Inject and Clear poison maintains a mock_poison[]
> ist, use that when available for the device."
> 
> Do you agree with that usage Jonathan?  If the user wants to get errors
> when they read the poison list, they need to inject them.
> I guess I also could flip the order of these patchsets and do away
> with the fake poison list entirely.

Sounds good.

> 
> a bit more below...
> 
> > 
> >   
> > > 
> > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > > ---
> > >  tools/testing/cxl/test/mem.c | 97 +++++++++++++++++++++++++-----------
> > >  1 file changed, 69 insertions(+), 28 deletions(-)
> > > 
> > > diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> > > index 9d794fbe5ee1..31c147cf2d5b 100644
> > > --- a/tools/testing/cxl/test/mem.c
> > > +++ b/tools/testing/cxl/test/mem.c
> > > @@ -224,6 +224,75 @@ static struct mock_poison {
> > >  	u64 dpa;
> > >  } mock_poison[MOCK_INJECT_POISON_MAX];
> > >  
> > > +struct mock_poison_po {
> > > +	struct cxl_mbox_poison_payload_out poison_out;
> > > +	struct cxl_poison_record record[MOCK_INJECT_POISON_MAX];  
> > 
> > Maybe use a variable length final element [] so that you can then
> > create a 1 record version on demand where it is used for
> > the default record.
> >   
> > > +};
> > > +
> > > +/*
> > > + * Use the default poison payload when no injected poison is currently
> > > + * mocked for this device. Mock one poison record with length 64 bytes.
> > > + */
> > > +static struct mock_poison_po default_poison_po = {
> > > +	.poison_out.count = cpu_to_le16(1),
> > > +	.record[0].length = cpu_to_le32(1),
> > > +};
> > > +
> > > +static struct mock_poison_po *cxl_get_injected_po(struct cxl_dev_state *cxlds,
> > > +						  u64 offset, u64 length)
> > > +{
> > > +	struct mock_poison_po *po;
> > > +	int nr_records = 0;
> > > +	u64 dpa;
> > > +
> > > +	po = kzalloc(sizeof(*po), GFP_KERNEL);
> > > +	if (!po)
> > > +		return NULL;
> > > +
> > > +	for (int i = 0; i < MOCK_INJECT_POISON_MAX; i++) {
> > > +		if (mock_poison[i].cxlds != cxlds)
> > > +			continue;
> > > +		if (mock_poison[i].dpa < offset ||
> > > +		    mock_poison[i].dpa > offset + length - 1)
> > > +			continue;
> > > +		dpa = cpu_to_le64(mock_poison[i].dpa |
> > > +				  CXL_POISON_SOURCE_INJECTED);
> > > +		po->record[nr_records].address = dpa;
> > > +		po->record[nr_records].length = cpu_to_le32(1);
> > > +		nr_records++;
> > > +	}
> > > +	if (!nr_records) {
> > > +		kfree(po);
> > > +		return NULL;
> > > +	}
> > > +	po->poison_out.count = cpu_to_le16(nr_records);
> > > +	return po;
> > > +}
> > > +
> > > +static int mock_get_poison(struct cxl_dev_state *cxlds,
> > > +			   struct cxl_mbox_cmd *cmd)
> > > +{  
> > 
> > Why the function move?  If you want to do this, a trivial move only
> > patch first would have bad it slightly easier to review. But given it's small
> > not worth doing now, particularly as most of the function changes anyway.
> >   
> 
> Let me see how this cleans up, including your feedback below, when 
> I get rid of the either injected or default choice.
> 
> >   
> > > +	struct cxl_mbox_poison_payload_in *pi = cmd->payload_in;
> > > +	struct mock_poison_po *po;
> > > +
> > > +	po = cxl_get_injected_po(cxlds, pi->offset, pi->length);
> > > +	if (!po) {
> > > +		po = &default_poison_po;  
> > 
> > Keep the comment from below on what this value is. It is weird enough
> > to be worth a comment.
> > 
> > I'd be tempted to make the default_poison_po const and use a copy
> > of it here to avoid fun races on reads from multiple devices.
> > Or just allocate a 1 element version and fill it here.  However,
> > I think we should really have both the default and injected elements.
> >   
> > > +		po->record[0].address = cpu_to_le64(pi->offset |
> > > +						    CXL_POISON_SOURCE_INTERNAL);
> > > +	}
> > > +	if (cmd->size_out < sizeof(po))
> > > +		return -EINVAL;  
> > 
> > This isn't technically correct, though it's probably fine for how we currently
> > use it. Should really assume full mailbox size and fill on that basis (record
> > count etc) but only copy the size of the target buffer.
> > Not sure we care enough about this being 'right' as opposed to useful.
> >   
> > > +
> > > +	memcpy(cmd->payload_out, po, sizeof(*po));
> > > +	cmd->size_out = sizeof(po);
> > > +
> > > +	if (po != &default_poison_po)
> > > +		kfree(po);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > >  static bool mock_poison_add(struct cxl_dev_state *cxlds, u64 dpa)
> > >  {
> > >  	for (int i = 0; i < MOCK_INJECT_POISON_MAX; i++) {
> > > @@ -293,34 +362,6 @@ static int mock_inject_poison(struct cxl_dev_state *cxlds,
> > >  	return 0;
> > >  }
> > >  
> > > -static int mock_get_poison(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
> > > -{
> > > -	struct cxl_mbox_poison_payload_in *pi = cmd->payload_in;
> > > -
> > > -	struct {
> > > -		struct cxl_mbox_poison_payload_out poison_out;
> > > -		struct cxl_poison_record record;
> > > -	} mock_poison_list = {
> > > -		.poison_out = {
> > > -			.count = cpu_to_le16(1),
> > > -		},
> > > -		/* Mock one poison record at pi.offset for 64 bytes */
> > > -		.record = {
> > > -			/* .address encodes DPA and poison source bits */
> > > -			.address = cpu_to_le64(pi->offset |
> > > -					       CXL_POISON_SOURCE_INTERNAL),
> > > -			.length = cpu_to_le32(1),
> > > -		},
> > > -	};
> > > -
> > > -	if (cmd->size_out < sizeof(mock_poison_list))
> > > -		return -EINVAL;
> > > -
> > > -	memcpy(cmd->payload_out, &mock_poison_list, sizeof(mock_poison_list));
> > > -	cmd->size_out = sizeof(mock_poison_list);
> > > -	return 0;
> > > -}
> > > -
> > >  static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
> > >  {
> > >  	struct device *dev = cxlds->dev;  
> >   


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2022-12-08 14:55 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-30  4:34 [PATCH 0/5] cxl: CXL Inject & Clear Poison alison.schofield
2022-11-30  4:34 ` [PATCH 1/5] cxl/memdev: Add support for the Inject Poison mailbox command alison.schofield
2022-11-30 14:31   ` Jonathan Cameron
2022-11-30 14:40     ` Jonathan Cameron
2022-12-01 16:42       ` Dave Jiang
2022-12-08  4:20         ` Alison Schofield
2022-12-01 17:26   ` Dave Jiang
2022-12-08  4:17     ` Alison Schofield
2022-12-04 22:04   ` Dan Williams
2022-12-08  4:16     ` Alison Schofield
2022-11-30  4:34 ` [PATCH 2/5] cxl/memdev: Add support for the Clear " alison.schofield
2022-11-30 14:43   ` Jonathan Cameron
2022-12-01 20:14     ` Alison Schofield
2022-12-01 17:54   ` Dave Jiang
2022-12-01 20:09     ` Alison Schofield
2022-11-30  4:34 ` [PATCH 3/5] tools/testing/cxl: Mock the Inject " alison.schofield
2022-11-30 14:58   ` Jonathan Cameron
2022-12-08  4:47     ` Alison Schofield
2022-12-08 14:53       ` Jonathan Cameron
2022-11-30  4:34 ` [PATCH 4/5] tools/testing/cxl: Mock the Clear " alison.schofield
2022-11-30 15:01   ` Jonathan Cameron
2022-11-30  4:34 ` [PATCH 5/5] tools/testing/cxl: Use injected poison for Get Poison List alison.schofield
2022-11-30 15:15   ` Jonathan Cameron
2022-12-08  4:30     ` Alison Schofield
2022-12-08 14:54       ` Jonathan Cameron

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.