linux-cxl.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/12] cxl: CXL Inject & Clear Poison
@ 2023-03-27  5:03 alison.schofield
  2023-03-27  5:03 ` [PATCH v5 01/12] cxl/memdev: Add support for the Inject Poison mailbox command alison.schofield
                   ` (11 more replies)
  0 siblings, 12 replies; 35+ messages in thread
From: alison.schofield @ 2023-03-27  5:03 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

Built on cxl/next plus [PATCH v11 0/6] CXL Poison List Retrieval & Tracing

Changes in v5:
- Move the inject & clear sysfs attrs to debugfs attrs (Dan)
- Allow inject and clear to mapped DPAs and dev_warn_once() 
- Add cxl_poison trace events for inject and clear (CXL Community Forum)
- Add conditional compiles for CONFIG_DEBUG_FS
- Conform to spacing in uapi header cxlmem.h (Jonathan)

Tags: Kept Jonathans Reviewed-by on, what are now, patches 6-11
Link to v4:
https://lore.kernel.org/linux-cxl/cover.1677704994.git.alison.schofield@intel.com/

Introducing Inject and Clear Poison support for CXL Devices.

These are optional commands, meaning not all CXL devices must support
them. The debugfs attributes, inject_poison and clear_poison, are only
visible for devices reporting support of the capability.

Example:
# echo 0x40000000 > /sys/kernel/debug/cxl/mem1/inject_poison

cxl_poison: memdev=mem1 pcidev=cxl_mem.1 trace_type=Inject region= region_uuid=00000000-0000-0000-0000-000000000000 hpa=0xffffffffffffffff dpa=0x40000000 length=0x40 source=Unknown flags= overflow_time=0

Example in which the injected poison landed in region5:
# echo 0x40000000 > /sys/kernel/debug/cxl/mem0/inject_poison

cxl_poison: memdev=mem0 host=cxl_mem.0 serial=0 trace_type=Inject region=region5 region_uuid=cfcfc13a-5290-4983-aa74-4c8465c25f26 hpa=0xf110000000 dpa=0x40000000 dpa_length=0x40 source=Unknown flags= overflow_time=0

Alison Schofield (12):
  cxl/memdev: Add support for the Inject Poison mailbox command
  cxl/memdev: Add support for the Clear Poison mailbox command
  cxl/memdev: Warn of poison inject or clear to a mapped region
  cxl/memdev: Trace inject and clear poison as cxl_poison events
  cxl/mem: Add debugfs attributes for poison inject and clear
  cxl/memdev: Make inject and clear poison cmds kernel exclusive
  cxl/mbox: Block inject and clear poison opcodes in raw mode
  tools/testing/cxl: Mock the Inject Poison mailbox command
  tools/testing/cxl: Mock the Clear Poison mailbox command
  tools/testing/cxl: Use injected poison for get poison list
  tools/testing/cxl: Add a sysfs attr to test poison inject limits
  tools/testing/cxl: Require CONFIG_DEBUG_FS

 Documentation/ABI/testing/debugfs-cxl |  36 +++++
 drivers/cxl/core/core.h               |   2 +
 drivers/cxl/core/mbox.c               |   5 +
 drivers/cxl/core/memdev.c             | 179 +++++++++++++++++++++
 drivers/cxl/core/trace.h              |   8 +-
 drivers/cxl/cxlmem.h                  |  13 ++
 drivers/cxl/mem.c                     |  26 ++++
 include/uapi/linux/cxl_mem.h          |  20 ++-
 tools/testing/cxl/config_check.c      |   1 +
 tools/testing/cxl/test/mem.c          | 216 +++++++++++++++++++++++---
 10 files changed, 479 insertions(+), 27 deletions(-)
 create mode 100644 Documentation/ABI/testing/debugfs-cxl

-- 
2.37.3


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v5 01/12] cxl/memdev: Add support for the Inject Poison mailbox command
  2023-03-27  5:03 [PATCH v5 0/12] cxl: CXL Inject & Clear Poison alison.schofield
@ 2023-03-27  5:03 ` alison.schofield
  2023-03-30 18:47   ` Jonathan Cameron
  2023-03-31 18:11   ` Dave Jiang
  2023-03-27  5:03 ` [PATCH v5 02/12] cxl/memdev: Add support for the Clear " alison.schofield
                   ` (10 subsequent siblings)
  11 siblings, 2 replies; 35+ messages in thread
From: alison.schofield @ 2023-03-27  5:03 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

CXL devices optionally support the INJECT POISON mailbox command. Add
memdev driver support for the mailbox command.

Per the CXL Specification (3.0 8.2.9.8.4.2), after receiving a valid
inject poison request, the device will return poison when the address
is accessed through the CXL.mem bus. Injecting poison adds the address
to the device's Poison List and the error source is set to Injected.
In addition, the device adds a poison creation event to its internal
Informational Event log, updates the Event Status register, and if
configured, interrupts the host.

Also, per the CXL Specification, it is not an error to inject poison
into an address that already has poison present and no error is
returned from the device.

If the address is not contained in the device's dpa resource, or is
not 64 byte aligned, return -EINVAL without issuing the mbox command.

Poison injection is intended for debug only and will be exposed to
userspace through debugfs. Restrict compilation to CONFIG_DEBUG_FS.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 drivers/cxl/core/memdev.c | 55 +++++++++++++++++++++++++++++++++++++++
 drivers/cxl/cxlmem.h      |  6 +++++
 2 files changed, 61 insertions(+)

diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index f26b5b6cda10..3b3ac2868848 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -213,6 +213,61 @@ ssize_t cxl_trigger_poison_list(struct device *dev,
 }
 EXPORT_SYMBOL_NS_GPL(cxl_trigger_poison_list, CXL);
 
+static int cxl_validate_poison_dpa(struct cxl_memdev *cxlmd, u64 dpa)
+{
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+
+	if (!IS_ENABLED(CONFIG_DEBUG_FS))
+		return 0;
+
+	if (!resource_size(&cxlds->dpa_res)) {
+		dev_dbg(cxlds->dev, "device has no dpa resource\n");
+		return -EINVAL;
+	}
+	if (dpa < cxlds->dpa_res.start || dpa > cxlds->dpa_res.end) {
+		dev_dbg(cxlds->dev, "dpa:0x%llx not in resource:%pR\n",
+			dpa, &cxlds->dpa_res);
+		return -EINVAL;
+	}
+	if (!IS_ALIGNED(dpa, 64)) {
+		dev_dbg(cxlds->dev, "dpa:0x%llx is not 64-byte aligned\n", dpa);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+int cxl_inject_poison(struct device *dev, u64 dpa)
+{
+	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+	struct cxl_mbox_inject_poison inject;
+	struct cxl_mbox_cmd mbox_cmd;
+	int rc;
+
+	if (!IS_ENABLED(CONFIG_DEBUG_FS))
+		return 0;
+
+	down_read(&cxl_dpa_rwsem);
+	rc = cxl_validate_poison_dpa(cxlmd, dpa);
+	if (rc)
+		goto out;
+
+	inject = (struct cxl_mbox_inject_poison) {
+		.address = cpu_to_le64(dpa)
+	};
+	mbox_cmd = (struct cxl_mbox_cmd) {
+		.opcode = CXL_MBOX_OP_INJECT_POISON,
+		.size_in = sizeof(inject),
+		.payload_in = &inject,
+	};
+	rc = cxl_internal_send_cmd(cxlmd->cxlds, &mbox_cmd);
+out:
+	up_read(&cxl_dpa_rwsem);
+
+	return rc;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_inject_poison, CXL);
+
 static struct attribute *cxl_memdev_attributes[] = {
 	&dev_attr_serial.attr,
 	&dev_attr_firmware_version.attr,
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 5febaa3f9b04..527efef2d700 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -602,6 +602,11 @@ struct cxl_mbox_poison_payload_out {
 #define CXL_POISON_SOURCE_INJECTED	3
 #define CXL_POISON_SOURCE_VENDOR	7
 
+/* Inject & Clear Poison  CXL 3.0 Spec 8.2.9.8.4.2/3 */
+struct cxl_mbox_inject_poison {
+	__le64 address;
+};
+
 /**
  * struct cxl_mem_command - Driver representation of a memory device command
  * @info: Command information as it exists for the UAPI
@@ -678,6 +683,7 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
 ssize_t cxl_trigger_poison_list(struct device *dev,
 				struct device_attribute *attr, const char *buf,
 				size_t len);
+int cxl_inject_poison(struct device *dev, u64 dpa);
 
 #ifdef CONFIG_CXL_SUSPEND
 void cxl_mem_active_inc(void);
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 02/12] cxl/memdev: Add support for the Clear Poison mailbox command
  2023-03-27  5:03 [PATCH v5 0/12] cxl: CXL Inject & Clear Poison alison.schofield
  2023-03-27  5:03 ` [PATCH v5 01/12] cxl/memdev: Add support for the Inject Poison mailbox command alison.schofield
@ 2023-03-27  5:03 ` alison.schofield
  2023-03-30 18:50   ` Jonathan Cameron
  2023-03-31 18:40   ` Dave Jiang
  2023-03-27  5:03 ` [PATCH v5 03/12] cxl/memdev: Warn of poison inject or clear to a mapped region alison.schofield
                   ` (9 subsequent siblings)
  11 siblings, 2 replies; 35+ messages in thread
From: alison.schofield @ 2023-03-27  5:03 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

CXL devices optionally support the CLEAR POISON mailbox command. Add
memdev driver support for clearing poison.

Per the CXL Specification (3.0 8.2.9.8.4.3), after receiving a valid
clear poison request, the device removes the address from the device's
Poison List and writes 0 (zero) for 64 bytes starting at address. If
the device cannot clear poison from the address, it returns a permanent
media error and -ENXIO is returned to the user.

Additionally, and per the spec also, it is not an error to clear poison
of an address that is not poisoned. In this case, the device does not
overwrite the address and the device does not return an error.

If the address is not contained in the device's dpa resource, or is
not 64 byte aligned, return -EINVAL without issuing the mbox command.

Poison clearing is intended for debug only and will be exposed to
userspace through debugfs. Restrict compilation to CONFIG_DEBUG_FS.

Implementation note: Although the CXL specification defines the clear
command to accept 64 bytes of 'write-data' to be used when clearing
the poisoned address, this implementation always uses 0 (zeros) for
the write-data.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 drivers/cxl/core/memdev.c | 43 +++++++++++++++++++++++++++++++++++++++
 drivers/cxl/cxlmem.h      |  7 +++++++
 2 files changed, 50 insertions(+)

diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 3b3ac2868848..0e39c3c3fb09 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -268,6 +268,49 @@ int cxl_inject_poison(struct device *dev, u64 dpa)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_inject_poison, CXL);
 
+int cxl_clear_poison(struct device *dev, u64 dpa)
+{
+	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+	struct cxl_mbox_clear_poison clear;
+	struct cxl_mbox_cmd mbox_cmd;
+	int rc;
+
+	if (!IS_ENABLED(CONFIG_DEBUG_FS))
+		return 0;
+
+	down_read(&cxl_dpa_rwsem);
+	rc = cxl_validate_poison_dpa(cxlmd, dpa);
+	if (rc)
+		goto out;
+
+	/*
+	 * In CXL 3.0 Spec 8.2.9.8.4.3, the Clear Poison mailbox command
+	 * is defined to accept 64 bytes of 'write-data', along with the
+	 * address to clear. The device writes the data into the address
+	 * atomically, while clearing poison if the location is marked as
+	 * being poisoned.
+	 *
+	 * Always use '0' for the write-data.
+	 */
+	clear = (struct cxl_mbox_clear_poison) {
+		.address = cpu_to_le64(dpa)
+	};
+
+	mbox_cmd = (struct cxl_mbox_cmd) {
+		.opcode = CXL_MBOX_OP_CLEAR_POISON,
+		.size_in = sizeof(clear),
+		.payload_in = &clear,
+	};
+
+	rc = cxl_internal_send_cmd(cxlmd->cxlds, &mbox_cmd);
+
+out:
+	up_read(&cxl_dpa_rwsem);
+
+	return rc;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_clear_poison, CXL);
+
 static struct attribute *cxl_memdev_attributes[] = {
 	&dev_attr_serial.attr,
 	&dev_attr_firmware_version.attr,
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 527efef2d700..1d8677ab2306 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -607,6 +607,12 @@ struct cxl_mbox_inject_poison {
 	__le64 address;
 };
 
+/* Clear Poison  CXL 3.0 Spec 8.2.9.8.4.3 */
+struct cxl_mbox_clear_poison {
+	__le64 address;
+	u8 write_data[CXL_POISON_LEN_MULT];
+} __packed;
+
 /**
  * struct cxl_mem_command - Driver representation of a memory device command
  * @info: Command information as it exists for the UAPI
@@ -684,6 +690,7 @@ ssize_t cxl_trigger_poison_list(struct device *dev,
 				struct device_attribute *attr, const char *buf,
 				size_t len);
 int cxl_inject_poison(struct device *dev, u64 dpa);
+int cxl_clear_poison(struct device *dev, u64 dpa);
 
 #ifdef CONFIG_CXL_SUSPEND
 void cxl_mem_active_inc(void);
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 03/12] cxl/memdev: Warn of poison inject or clear to a mapped region
  2023-03-27  5:03 [PATCH v5 0/12] cxl: CXL Inject & Clear Poison alison.schofield
  2023-03-27  5:03 ` [PATCH v5 01/12] cxl/memdev: Add support for the Inject Poison mailbox command alison.schofield
  2023-03-27  5:03 ` [PATCH v5 02/12] cxl/memdev: Add support for the Clear " alison.schofield
@ 2023-03-27  5:03 ` alison.schofield
  2023-03-30 18:55   ` Jonathan Cameron
  2023-03-27  5:03 ` [PATCH v5 04/12] cxl/memdev: Trace inject and clear poison as cxl_poison events alison.schofield
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: alison.schofield @ 2023-03-27  5:03 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

Inject and clear poison capabilities and intended for debug usage only.
In order to be useful in debug environments, the driver needs to allow
inject and clear operations on DPAs mapped in regions.

dev_warn_once() when either operation occurs.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 drivers/cxl/core/memdev.c | 59 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 0e39c3c3fb09..a83619c31f61 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -213,6 +213,50 @@ ssize_t cxl_trigger_poison_list(struct device *dev,
 }
 EXPORT_SYMBOL_NS_GPL(cxl_trigger_poison_list, CXL);
 
+struct cxl_dpa_to_region_context {
+	struct cxl_region *cxlr;
+	u64 dpa;
+};
+
+static int __cxl_dpa_to_region(struct device *dev, void *arg)
+{
+	struct cxl_dpa_to_region_context *ctx = arg;
+	struct cxl_endpoint_decoder *cxled;
+	u64 dpa = ctx->dpa;
+
+	if (!is_endpoint_decoder(dev))
+		return 0;
+
+	cxled = to_cxl_endpoint_decoder(dev);
+	if (!cxled->dpa_res || !resource_size(cxled->dpa_res))
+		return 0;
+
+	if (dpa > cxled->dpa_res->end || dpa < cxled->dpa_res->start)
+		return 0;
+
+	dev_dbg(dev, "dpa:0x%llx mapped in region:%s\n", dpa,
+		dev_name(&cxled->cxld.region->dev));
+
+	ctx->cxlr = cxled->cxld.region;
+
+	return 0;
+}
+
+static struct cxl_region *cxl_dpa_to_region(struct cxl_memdev *cxlmd, u64 dpa)
+{
+	struct cxl_dpa_to_region_context ctx;
+	struct cxl_port *port;
+
+	ctx = (struct cxl_dpa_to_region_context) {
+		.dpa = dpa,
+	};
+	port = dev_get_drvdata(&cxlmd->dev);
+	if (port && is_cxl_endpoint(port) && port->commit_end != -1)
+		device_for_each_child(&port->dev, &ctx, __cxl_dpa_to_region);
+
+	return ctx.cxlr;
+}
+
 static int cxl_validate_poison_dpa(struct cxl_memdev *cxlmd, u64 dpa)
 {
 	struct cxl_dev_state *cxlds = cxlmd->cxlds;
@@ -242,6 +286,7 @@ int cxl_inject_poison(struct device *dev, u64 dpa)
 	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
 	struct cxl_mbox_inject_poison inject;
 	struct cxl_mbox_cmd mbox_cmd;
+	struct cxl_region *cxlr;
 	int rc;
 
 	if (!IS_ENABLED(CONFIG_DEBUG_FS))
@@ -261,6 +306,13 @@ int cxl_inject_poison(struct device *dev, u64 dpa)
 		.payload_in = &inject,
 	};
 	rc = cxl_internal_send_cmd(cxlmd->cxlds, &mbox_cmd);
+	if (rc)
+		goto out;
+
+	cxlr = cxl_dpa_to_region(cxlmd, dpa);
+	if (cxlr)
+		dev_warn_once(dev, "poison inject dpa:0x%llx region: %s\n",
+			      dpa, dev_name(&cxlr->dev));
 out:
 	up_read(&cxl_dpa_rwsem);
 
@@ -273,6 +325,7 @@ int cxl_clear_poison(struct device *dev, u64 dpa)
 	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
 	struct cxl_mbox_clear_poison clear;
 	struct cxl_mbox_cmd mbox_cmd;
+	struct cxl_region *cxlr;
 	int rc;
 
 	if (!IS_ENABLED(CONFIG_DEBUG_FS))
@@ -303,7 +356,13 @@ int cxl_clear_poison(struct device *dev, u64 dpa)
 	};
 
 	rc = cxl_internal_send_cmd(cxlmd->cxlds, &mbox_cmd);
+	if (rc)
+		goto out;
 
+	cxlr = cxl_dpa_to_region(cxlmd, dpa);
+	if (cxlr)
+		dev_warn_once(dev, "poison clear dpa:0x%llx region: %s\n",
+			      dpa, dev_name(&cxlr->dev));
 out:
 	up_read(&cxl_dpa_rwsem);
 
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 04/12] cxl/memdev: Trace inject and clear poison as cxl_poison events
  2023-03-27  5:03 [PATCH v5 0/12] cxl: CXL Inject & Clear Poison alison.schofield
                   ` (2 preceding siblings ...)
  2023-03-27  5:03 ` [PATCH v5 03/12] cxl/memdev: Warn of poison inject or clear to a mapped region alison.schofield
@ 2023-03-27  5:03 ` alison.schofield
  2023-03-30 19:03   ` Jonathan Cameron
  2023-03-31 18:53   ` Dave Jiang
  2023-03-27  5:03 ` [PATCH v5 05/12] cxl/mem: Add debugfs attributes for poison inject and clear alison.schofield
                   ` (7 subsequent siblings)
  11 siblings, 2 replies; 35+ messages in thread
From: alison.schofield @ 2023-03-27  5:03 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

The cxl_poison trace event allows users to view the history of poison
list reads. With the addition of inject and clear poison capabilities,
users will expect similar tracing.

Add trace types 'Inject' and 'Clear' to the cxl_poison trace_event and
trace successful operations only.

If the driver finds that the DPA being injected or cleared of poison
is mapped in a region, that region info is included in the cxl_poison
trace event. Region reconfigurations can make this extra info useless
if the debug operations are not carefully managed.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 drivers/cxl/core/core.h   |  2 ++
 drivers/cxl/core/memdev.c | 16 ++++++++++++++++
 drivers/cxl/core/trace.h  |  8 +++++---
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 57bd22e01a0b..5b673eca8f12 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -71,6 +71,8 @@ void cxl_mbox_init(void);
 
 enum cxl_poison_trace_type {
 	CXL_POISON_TRACE_LIST,
+	CXL_POISON_TRACE_INJECT,
+	CXL_POISON_TRACE_CLEAR,
 };
 
 struct cxl_trigger_poison_context {
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index a83619c31f61..71ebe3795616 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -6,6 +6,7 @@
 #include <linux/idr.h>
 #include <linux/pci.h>
 #include <cxlmem.h>
+#include "trace.h"
 #include "core.h"
 
 static DECLARE_RWSEM(cxl_memdev_rwsem);
@@ -285,6 +286,7 @@ int cxl_inject_poison(struct device *dev, u64 dpa)
 {
 	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
 	struct cxl_mbox_inject_poison inject;
+	struct cxl_poison_record record;
 	struct cxl_mbox_cmd mbox_cmd;
 	struct cxl_region *cxlr;
 	int rc;
@@ -313,6 +315,13 @@ int cxl_inject_poison(struct device *dev, u64 dpa)
 	if (cxlr)
 		dev_warn_once(dev, "poison inject dpa:0x%llx region: %s\n",
 			      dpa, dev_name(&cxlr->dev));
+
+	record = (struct cxl_poison_record) {
+		.address = cpu_to_le64(dpa),
+		.length = cpu_to_le32(1),
+	};
+	trace_cxl_poison(cxlmd, cxlr, &record, 0, 0, CXL_POISON_TRACE_INJECT);
+
 out:
 	up_read(&cxl_dpa_rwsem);
 
@@ -324,6 +333,7 @@ int cxl_clear_poison(struct device *dev, u64 dpa)
 {
 	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
 	struct cxl_mbox_clear_poison clear;
+	struct cxl_poison_record record;
 	struct cxl_mbox_cmd mbox_cmd;
 	struct cxl_region *cxlr;
 	int rc;
@@ -363,6 +373,12 @@ int cxl_clear_poison(struct device *dev, u64 dpa)
 	if (cxlr)
 		dev_warn_once(dev, "poison clear dpa:0x%llx region: %s\n",
 			      dpa, dev_name(&cxlr->dev));
+
+	record = (struct cxl_poison_record) {
+		.address = cpu_to_le64(dpa),
+		.length = cpu_to_le32(1),
+	};
+	trace_cxl_poison(cxlmd, cxlr, &record, 0, 0, CXL_POISON_TRACE_CLEAR);
 out:
 	up_read(&cxl_dpa_rwsem);
 
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index 65d81d27cb85..5e5e29995d3e 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -602,9 +602,11 @@ TRACE_EVENT(cxl_memory_module,
 	)
 );
 
-#define show_poison_trace_type(type)		   \
-	__print_symbolic(type,			   \
-	{ CXL_POISON_TRACE_LIST,	"List"	})
+#define show_poison_trace_type(type)			\
+	__print_symbolic(type,				\
+	{ CXL_POISON_TRACE_LIST,	"List"   },	\
+	{ CXL_POISON_TRACE_INJECT,	"Inject" },	\
+	{ CXL_POISON_TRACE_CLEAR,	"Clear"  })
 
 #define __show_poison_source(source)                          \
 	__print_symbolic(source,                              \
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 05/12] cxl/mem: Add debugfs attributes for poison inject and clear
  2023-03-27  5:03 [PATCH v5 0/12] cxl: CXL Inject & Clear Poison alison.schofield
                   ` (3 preceding siblings ...)
  2023-03-27  5:03 ` [PATCH v5 04/12] cxl/memdev: Trace inject and clear poison as cxl_poison events alison.schofield
@ 2023-03-27  5:03 ` alison.schofield
  2023-03-30 18:58   ` Jonathan Cameron
  2023-03-27  5:03 ` [PATCH v5 06/12] cxl/memdev: Make inject and clear poison cmds kernel exclusive alison.schofield
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: alison.schofield @ 2023-03-27  5:03 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

Inject and Clear Poison commands are optionally supported by CXL
memdev devices and are intended for use in debug environments only.
Add debugfs attributes for user access.

Documentation/ABI/testing/debugfs-cxl describes the usage.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 Documentation/ABI/testing/debugfs-cxl | 36 +++++++++++++++++++++++++++
 drivers/cxl/mem.c                     | 26 +++++++++++++++++++
 2 files changed, 62 insertions(+)
 create mode 100644 Documentation/ABI/testing/debugfs-cxl

diff --git a/Documentation/ABI/testing/debugfs-cxl b/Documentation/ABI/testing/debugfs-cxl
new file mode 100644
index 000000000000..3f1624f95f11
--- /dev/null
+++ b/Documentation/ABI/testing/debugfs-cxl
@@ -0,0 +1,36 @@
+What:		/sys/kernel/debug/cxl/memX/inject_poison
+Date:		April, 2023
+KernelVersion:	v6.4
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(WO) When a Device Physical Address (DPA) is written to this
+		attribute, the memdev driver sends an inject poison command to
+		the device for the specified address. The DPA must be 64-byte
+		aligned and the length of the injected poison is 64-bytes. If
+		successful, the device returns poison when the address is
+		accessed through the CXL.mem bus. Injecting poison adds the
+		address to the device's Poison List and the error source is set
+		to Injected. In addition, the device adds a poison creation
+		event to its internal Informational Event log, updates the
+		Event Status register, and if configured, interrupts the host.
+		It is not an error to inject poison into an address that
+		already has poison present and no error is returned. The
+		inject_poison attribute is only visible for devices supporting
+		the capability.
+
+
+What:		/sys/kernel/debug/memX/clear_poison
+Date:		April, 2023
+KernelVersion:	v6.4
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(WO) When a Device Physical Address (DPA) is written to this
+		attribute, the memdev driver sends a clear poison command to
+		the device for the specified address. Clearing poison removes
+		the address from the device's Poison List and writes 0 (zero)
+		for 64 bytes starting at address. It is not an error to clear
+		poison from an address that does not have poison set, and if
+		poison was not set, the address is not overwritten. If the
+		device cannot clear poison from the address, -ENXIO is returned.
+		The clear_poison attribute is only visible for devices
+		supporting the capability.
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 576f5b691589..ff67ecf07001 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -94,6 +94,22 @@ static int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd,
 	return 0;
 }
 
+static int cxl_debugfs_poison_inject(void *data, u64 dpa)
+{
+	return cxl_inject_poison(data, dpa);
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(cxl_poison_inject_fops, NULL,
+			 cxl_debugfs_poison_inject, "%llx\n");
+
+static int cxl_debugfs_poison_clear(void *data, u64 dpa)
+{
+	return cxl_clear_poison(data, dpa);
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(cxl_poison_clear_fops, NULL,
+			 cxl_debugfs_poison_clear, "%llx\n");
+
 static int cxl_mem_probe(struct device *dev)
 {
 	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
@@ -117,6 +133,16 @@ static int cxl_mem_probe(struct device *dev)
 
 	dentry = cxl_debugfs_create_dir(dev_name(dev));
 	debugfs_create_devm_seqfile(dev, "dpamem", dentry, cxl_mem_dpa_show);
+
+	if (test_bit(CXL_MEM_COMMAND_ID_INJECT_POISON,
+		     to_cxl_memdev(dev)->cxlds->enabled_cmds))
+		debugfs_create_file("inject_poison", 0200, dentry, dev,
+				    &cxl_poison_inject_fops);
+	if (test_bit(CXL_MEM_COMMAND_ID_CLEAR_POISON,
+		     to_cxl_memdev(dev)->cxlds->enabled_cmds))
+		debugfs_create_file("clear_poison", 0200, dentry, dev,
+				    &cxl_poison_clear_fops);
+
 	rc = devm_add_action_or_reset(dev, remove_debugfs, dentry);
 	if (rc)
 		return rc;
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 06/12] cxl/memdev: Make inject and clear poison cmds kernel exclusive
  2023-03-27  5:03 [PATCH v5 0/12] cxl: CXL Inject & Clear Poison alison.schofield
                   ` (4 preceding siblings ...)
  2023-03-27  5:03 ` [PATCH v5 05/12] cxl/mem: Add debugfs attributes for poison inject and clear alison.schofield
@ 2023-03-27  5:03 ` alison.schofield
  2023-03-31 19:10   ` Dave Jiang
  2023-03-27  5:03 ` [PATCH v5 07/12] cxl/mbox: Block inject and clear poison opcodes in raw mode alison.schofield
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: alison.schofield @ 2023-03-27  5:03 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl, Jonathan Cameron

From: Alison Schofield <alison.schofield@intel.com>

Inject and clear poison commands are intended to be used in debug
mode only, and if improperly used, can lead to data corruption. The
kernel provides a debugfs interface to issue these commands [1]

The CXL driver defines Enabled commands in its ABI.[2] Enabled means
that the device and the driver both support the command. If a device
supports inject and/or clear, those commands are flagged Enabled.

The ABI also defines another command flag: Exclusive. Exclusive
commands are reserved for kernel use. The exclusive flags can be
temporal, but for inject and clear, the status is permanent.

Document the exclusivity of Inject and Clear in the ABI kernel doc.
(Clean up a typo in kdoc too: 'CXL_MEM_COMMAND_FLAG_ENABLED')

Create an exclusive commands bitmap in the memdev driver, add the
inject and clear poison commands, and set it in the cxl_dev_state.

[1] Documentation/ABI/testing/debugfs-cxl
[2] include/uapi/linux/cxl_mem.h

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 drivers/cxl/core/memdev.c    |  6 ++++++
 include/uapi/linux/cxl_mem.h | 20 +++++++++++++++-----
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 71ebe3795616..617d8378ca9a 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -11,6 +11,8 @@
 
 static DECLARE_RWSEM(cxl_memdev_rwsem);
 
+static __read_mostly DECLARE_BITMAP(exclusive_cmds, CXL_MEM_COMMAND_ID_MAX);
+
 /*
  * An entire PCI topology full of devices should be enough for any
  * config
@@ -628,6 +630,10 @@ struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds)
 	cxlmd->cxlds = cxlds;
 	cxlds->cxlmd = cxlmd;
 
+	set_bit(CXL_MEM_COMMAND_ID_INJECT_POISON, exclusive_cmds);
+	set_bit(CXL_MEM_COMMAND_ID_CLEAR_POISON, exclusive_cmds);
+	set_exclusive_cxl_commands(cxlds, exclusive_cmds);
+
 	cdev = &cxlmd->cdev;
 	rc = cdev_device_add(cdev, dev);
 	if (rc)
diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
index 86bbacf2a315..6294278f9dcb 100644
--- a/include/uapi/linux/cxl_mem.h
+++ b/include/uapi/linux/cxl_mem.h
@@ -74,17 +74,27 @@ static const struct {
  * @id: ID number for the command.
  * @flags: Flags that specify command behavior.
  *
- *         CXL_MEM_COMMAND_FLAG_USER_ENABLED
+ *         CXL_MEM_COMMAND_FLAG_ENABLED
  *
  *         The given command id is supported by the driver and is supported by
  *         a related opcode on the device.
  *
  *         CXL_MEM_COMMAND_FLAG_EXCLUSIVE
  *
- *         Requests with the given command id will terminate with EBUSY as the
- *         kernel actively owns management of the given resource. For example,
- *         the label-storage-area can not be written while the kernel is
- *         actively managing that space.
+ *         The given command id is for kernel exclusive use and is not
+ *         available to userspace. Requests will terminate with EBUSY.
+ *
+ *         The exclusive flag may be temporal, and only set while the
+ *         kernel actively owns management of the given resource. For
+ *         example, the label-storage-area can not be written while the
+ *         kernel is actively managing that space.
+ *
+ *         The exclusive flag can be permanent, as in commands that can
+ *         never be issued through the ioctl interface.
+ *
+ *         INJECT_POISON and CLEAR_POISON are permanently kernel exclusive,
+ *         and are supported through a debugfs interface.
+ *         See: Documentation/ABI/testing/debugfs-cxl
  *
  * @size_in: Expected input size, or ~0 if variable length.
  * @size_out: Expected output size, or ~0 if variable length.
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 07/12] cxl/mbox: Block inject and clear poison opcodes in raw mode
  2023-03-27  5:03 [PATCH v5 0/12] cxl: CXL Inject & Clear Poison alison.schofield
                   ` (5 preceding siblings ...)
  2023-03-27  5:03 ` [PATCH v5 06/12] cxl/memdev: Make inject and clear poison cmds kernel exclusive alison.schofield
@ 2023-03-27  5:03 ` alison.schofield
  2023-03-31 19:10   ` Dave Jiang
  2023-03-27  5:03 ` [PATCH v5 08/12] tools/testing/cxl: Mock the Inject Poison mailbox command alison.schofield
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: alison.schofield @ 2023-03-27  5:03 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl, Jonathan Cameron

From: Alison Schofield <alison.schofield@intel.com>

Inject and clear poison are commands intended for debug environments,
and can cause data corruption if issued without validation. They are
kernel exclusive commands not available to userspace through ioctls,
but could be submitted via the raw mode ioctl.

Add inject and clear poison to the cxl_disabled_raw_commands[] list.
Attempts by userspace to issue either command via the RAW ioctl fail
with -EPERM.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 drivers/cxl/core/mbox.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index a8369ef56f61..b380208f85c2 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -89,6 +89,9 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
  *
  * CXL_MBOX_OP_[GET_]SCAN_MEDIA: The kernel provides a native error list that
  * is kept up to date with patrol notifications and error management.
+ *
+ * CXL_MBOX_OP_[INJECT|CLEAR]_POISON: The kernel provides a debugfs interface
+ * to these commands. See: Documentation/ABI/testing/debugfs-cxl
  */
 static u16 cxl_disabled_raw_commands[] = {
 	CXL_MBOX_OP_ACTIVATE_FW,
@@ -97,6 +100,8 @@ static u16 cxl_disabled_raw_commands[] = {
 	CXL_MBOX_OP_SET_SHUTDOWN_STATE,
 	CXL_MBOX_OP_SCAN_MEDIA,
 	CXL_MBOX_OP_GET_SCAN_MEDIA,
+	CXL_MBOX_OP_INJECT_POISON,
+	CXL_MBOX_OP_CLEAR_POISON,
 };
 
 /*
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 08/12] tools/testing/cxl: Mock the Inject Poison mailbox command
  2023-03-27  5:03 [PATCH v5 0/12] cxl: CXL Inject & Clear Poison alison.schofield
                   ` (6 preceding siblings ...)
  2023-03-27  5:03 ` [PATCH v5 07/12] cxl/mbox: Block inject and clear poison opcodes in raw mode alison.schofield
@ 2023-03-27  5:03 ` alison.schofield
  2023-03-31 19:13   ` Dave Jiang
  2023-03-27  5:03 ` [PATCH v5 09/12] tools/testing/cxl: Mock the Clear " alison.schofield
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: alison.schofield @ 2023-03-27  5:03 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl, Jonathan Cameron

From: Alison Schofield <alison.schofield@intel.com>

Mock the injection of poison by storing the device:address entries in
mock_poison_list[]. Enforce a limit of 8 poison injections per memdev
device and 128 total entries for the cxl_test mock driver.

Introducing the mock_poison[] list here, makes it available for use in
the mock of Clear Poison, and the mock of Get Poison List.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 tools/testing/cxl/test/mem.c | 77 ++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 2fa9c18d4c2c..4fca886e2a7c 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -16,6 +16,9 @@
 #define DEV_SIZE SZ_2G
 #define EFFECT(x) (1U << x)
 
+#define MOCK_INJECT_DEV_MAX 8
+#define MOCK_INJECT_TEST_MAX 128
+
 static struct cxl_cel_entry mock_cel[] = {
 	{
 		.opcode = cpu_to_le16(CXL_MBOX_OP_GET_SUPPORTED_LOGS),
@@ -45,6 +48,10 @@ static struct cxl_cel_entry mock_cel[] = {
 		.opcode = cpu_to_le16(CXL_MBOX_OP_GET_POISON),
 		.effect = cpu_to_le16(0),
 	},
+	{
+		.opcode = cpu_to_le16(CXL_MBOX_OP_INJECT_POISON),
+		.effect = cpu_to_le16(0),
+	},
 };
 
 /* See CXL 2.0 Table 181 Get Health Info Output Payload */
@@ -474,6 +481,7 @@ static int mock_id(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
 			cpu_to_le64(SZ_256M / CXL_CAPACITY_MULTIPLIER),
 		.total_capacity =
 			cpu_to_le64(DEV_SIZE / CXL_CAPACITY_MULTIPLIER),
+		.inject_poison_limit = cpu_to_le16(MOCK_INJECT_DEV_MAX),
 	};
 
 	put_unaligned_le24(CXL_POISON_LIST_MAX, id.poison_list_max_mer);
@@ -895,6 +903,11 @@ static int mock_health_info(struct cxl_dev_state *cxlds,
 	return 0;
 }
 
+static struct mock_poison {
+	struct cxl_dev_state *cxlds;
+	u64 dpa;
+} mock_poison_list[MOCK_INJECT_TEST_MAX];
+
 static int mock_get_poison(struct cxl_dev_state *cxlds,
 			   struct cxl_mbox_cmd *cmd)
 {
@@ -923,6 +936,67 @@ static int mock_get_poison(struct cxl_dev_state *cxlds,
 	return 0;
 }
 
+static bool mock_poison_dev_max_injected(struct cxl_dev_state *cxlds)
+{
+	int count = 0;
+
+	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
+		if (mock_poison_list[i].cxlds == cxlds)
+			count++;
+	}
+	return (count >= MOCK_INJECT_DEV_MAX);
+}
+
+static bool mock_poison_add(struct cxl_dev_state *cxlds, u64 dpa)
+{
+	if (mock_poison_dev_max_injected(cxlds)) {
+		dev_dbg(cxlds->dev,
+			"Device poison injection limit has been reached: %d\n",
+			MOCK_INJECT_DEV_MAX);
+		return false;
+	}
+
+	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
+		if (!mock_poison_list[i].cxlds) {
+			mock_poison_list[i].cxlds = cxlds;
+			mock_poison_list[i].dpa = dpa;
+			return true;
+		}
+	}
+	dev_dbg(cxlds->dev,
+		"Mock test poison injection limit has been reached: %d\n",
+		MOCK_INJECT_TEST_MAX);
+
+	return false;
+}
+
+static bool mock_poison_found(struct cxl_dev_state *cxlds, u64 dpa)
+{
+	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
+		if (mock_poison_list[i].cxlds == cxlds &&
+		    mock_poison_list[i].dpa == dpa)
+			return true;
+	}
+	return false;
+}
+
+static int mock_inject_poison(struct cxl_dev_state *cxlds,
+			      struct cxl_mbox_cmd *cmd)
+{
+	struct cxl_mbox_inject_poison *pi = cmd->payload_in;
+	u64 dpa = le64_to_cpu(pi->address);
+
+	if (mock_poison_found(cxlds, dpa)) {
+		/* Not an error to inject poison if already poisoned */
+		dev_dbg(cxlds->dev, "DPA: 0x%llx already poisoned\n", dpa);
+		return 0;
+	}
+	if (!mock_poison_add(cxlds, dpa))
+		return -ENXIO;
+
+	return 0;
+}
+
 static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
 {
 	struct device *dev = cxlds->dev;
@@ -980,6 +1054,9 @@ static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *
 	case CXL_MBOX_OP_GET_POISON:
 		rc = mock_get_poison(cxlds, cmd);
 		break;
+	case CXL_MBOX_OP_INJECT_POISON:
+		rc = mock_inject_poison(cxlds, cmd);
+		break;
 	default:
 		break;
 	}
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 09/12] tools/testing/cxl: Mock the Clear Poison mailbox command
  2023-03-27  5:03 [PATCH v5 0/12] cxl: CXL Inject & Clear Poison alison.schofield
                   ` (7 preceding siblings ...)
  2023-03-27  5:03 ` [PATCH v5 08/12] tools/testing/cxl: Mock the Inject Poison mailbox command alison.schofield
@ 2023-03-27  5:03 ` alison.schofield
  2023-03-31 19:15   ` Dave Jiang
  2023-03-27  5:03 ` [PATCH v5 10/12] tools/testing/cxl: Use injected poison for get poison list alison.schofield
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: alison.schofield @ 2023-03-27  5:03 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl, Jonathan Cameron

From: Alison Schofield <alison.schofield@intel.com>

Mock the clear of poison by deleting the device:address entry from
the mock_poison_list[]. Behave like a real CXL device and do not fail
if the address is not in the poison list, but offer a dev_dbg()
message.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 tools/testing/cxl/test/mem.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 4fca886e2a7c..720147c7cb2b 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -52,6 +52,10 @@ static struct cxl_cel_entry mock_cel[] = {
 		.opcode = cpu_to_le16(CXL_MBOX_OP_INJECT_POISON),
 		.effect = cpu_to_le16(0),
 	},
+	{
+		.opcode = cpu_to_le16(CXL_MBOX_OP_CLEAR_POISON),
+		.effect = cpu_to_le16(0),
+	},
 };
 
 /* See CXL 2.0 Table 181 Get Health Info Output Payload */
@@ -997,6 +1001,35 @@ static int mock_inject_poison(struct cxl_dev_state *cxlds,
 	return 0;
 }
 
+static bool mock_poison_del(struct cxl_dev_state *cxlds, u64 dpa)
+{
+	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
+		if (mock_poison_list[i].cxlds == cxlds &&
+		    mock_poison_list[i].dpa == dpa) {
+			mock_poison_list[i].cxlds = NULL;
+			return true;
+		}
+	}
+	return false;
+}
+
+static int mock_clear_poison(struct cxl_dev_state *cxlds,
+			     struct cxl_mbox_cmd *cmd)
+{
+	struct cxl_mbox_clear_poison *pi = cmd->payload_in;
+	u64 dpa = le64_to_cpu(pi->address);
+
+	/*
+	 * A real CXL device will write pi->write_data to the address
+	 * being cleared. In this mock, just delete this address from
+	 * the mock poison list.
+	 */
+	if (!mock_poison_del(cxlds, dpa))
+		dev_dbg(cxlds->dev, "DPA: 0x%llx not in poison list\n", dpa);
+
+	return 0;
+}
+
 static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
 {
 	struct device *dev = cxlds->dev;
@@ -1057,6 +1090,9 @@ static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *
 	case CXL_MBOX_OP_INJECT_POISON:
 		rc = mock_inject_poison(cxlds, cmd);
 		break;
+	case CXL_MBOX_OP_CLEAR_POISON:
+		rc = mock_clear_poison(cxlds, cmd);
+		break;
 	default:
 		break;
 	}
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 10/12] tools/testing/cxl: Use injected poison for get poison list
  2023-03-27  5:03 [PATCH v5 0/12] cxl: CXL Inject & Clear Poison alison.schofield
                   ` (8 preceding siblings ...)
  2023-03-27  5:03 ` [PATCH v5 09/12] tools/testing/cxl: Mock the Clear " alison.schofield
@ 2023-03-27  5:03 ` alison.schofield
  2023-03-31 19:16   ` Dave Jiang
  2023-03-27  5:03 ` [PATCH v5 11/12] tools/testing/cxl: Add a sysfs attr to test poison inject limits alison.schofield
  2023-03-27  5:03 ` [PATCH v5 12/12] tools/testing/cxl: Require CONFIG_DEBUG_FS alison.schofield
  11 siblings, 1 reply; 35+ messages in thread
From: alison.schofield @ 2023-03-27  5:03 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl, Jonathan Cameron

From: Alison Schofield <alison.schofield@intel.com>

Prior to poison inject support, the mock of 'Get Poison List'
returned a poison list containing a single mocked error record.

Following the addition of poison inject and clear support to the
mock driver, use the mock_poison_list[], rather than faking an
error record. Mock_poison_list[] list tracks the actual poison
inject and clear requests issued by userspace.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 tools/testing/cxl/test/mem.c | 59 ++++++++++++++++++++++++------------
 1 file changed, 40 insertions(+), 19 deletions(-)

diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 720147c7cb2b..9658d95236b7 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -912,31 +912,52 @@ static struct mock_poison {
 	u64 dpa;
 } mock_poison_list[MOCK_INJECT_TEST_MAX];
 
+static struct cxl_mbox_poison_payload_out
+*cxl_get_injected_po(struct cxl_dev_state *cxlds, u64 offset, u64 length)
+{
+	struct cxl_mbox_poison_payload_out *po;
+	int nr_records = 0;
+	u64 dpa;
+
+	po = kzalloc(struct_size(po, record, MOCK_INJECT_DEV_MAX), GFP_KERNEL);
+	if (!po)
+		return NULL;
+
+	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
+		if (mock_poison_list[i].cxlds != cxlds)
+			continue;
+		if (mock_poison_list[i].dpa < offset ||
+		    mock_poison_list[i].dpa > offset + length - 1)
+			continue;
+
+		dpa = mock_poison_list[i].dpa + CXL_POISON_SOURCE_INJECTED;
+		po->record[nr_records].address = cpu_to_le64(dpa);
+		po->record[nr_records].length = cpu_to_le32(1);
+		nr_records++;
+		if (nr_records == MOCK_INJECT_DEV_MAX)
+			break;
+	}
+
+	/* Always return count, even when zero */
+	po->count = cpu_to_le16(nr_records);
+
+	return po;
+}
+
 static int mock_get_poison(struct cxl_dev_state *cxlds,
 			   struct cxl_mbox_cmd *cmd)
 {
 	struct cxl_mbox_poison_payload_in *pi = cmd->payload_in;
+	struct cxl_mbox_poison_payload_out *po;
+	u64 offset = le64_to_cpu(pi->offset);
+	u64 length = le64_to_cpu(pi->length);
+	int nr_records;
 
-	/* Mock one poison record at pi.offset for 64 bytes */
-	struct {
-		struct cxl_mbox_poison_payload_out po;
-		struct cxl_poison_record record;
-	} __packed mock_plist = {
-		.po = {
-			.count = cpu_to_le16(1),
-		},
-		.record = {
-			.length = cpu_to_le32(1),
-			.address = cpu_to_le64(le64_to_cpu(pi->offset) +
-					       CXL_POISON_SOURCE_INJECTED),
-		},
-	};
+	po = cxl_get_injected_po(cxlds, offset, length);
+	nr_records = le16_to_cpu(po->count);
+	memcpy(cmd->payload_out, po, struct_size(po, record, nr_records));
+	cmd->size_out = struct_size(po, record, nr_records);
 
-	if (cmd->size_out < sizeof(mock_plist))
-		return -EINVAL;
-
-	memcpy(cmd->payload_out, &mock_plist, sizeof(mock_plist));
-	cmd->size_out = sizeof(mock_plist);
 	return 0;
 }
 
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 11/12] tools/testing/cxl: Add a sysfs attr to test poison inject limits
  2023-03-27  5:03 [PATCH v5 0/12] cxl: CXL Inject & Clear Poison alison.schofield
                   ` (9 preceding siblings ...)
  2023-03-27  5:03 ` [PATCH v5 10/12] tools/testing/cxl: Use injected poison for get poison list alison.schofield
@ 2023-03-27  5:03 ` alison.schofield
  2023-03-31 19:18   ` Dave Jiang
  2023-03-27  5:03 ` [PATCH v5 12/12] tools/testing/cxl: Require CONFIG_DEBUG_FS alison.schofield
  11 siblings, 1 reply; 35+ messages in thread
From: alison.schofield @ 2023-03-27  5:03 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl, Jonathan Cameron

From: Alison Schofield <alison.schofield@intel.com>

CXL devices may report a maximum number of addresses that a device
allows to be poisoned using poison injection. When cxl_test creates
mock CXL memory devices, it defaults to MOCK_INJECT_DEV_MAX==88 for
all mocked memdevs.

Add a sysfs attribute, poison_inject_max to module cxl_mock_mem so
that users can set a custom device injection limit. Fail, and return
-EBUSY, if the mock poison list is not empty.

/sys/bus/platform/drivers/cxl_mock_mem/poison_inject_max

A simple usage model is to set the attribute before running a test in
order to emulate a device's poison handling.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 tools/testing/cxl/test/mem.c | 52 +++++++++++++++++++++++++++++++++---
 1 file changed, 48 insertions(+), 4 deletions(-)

diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 9658d95236b7..5c3b3e5a3b4b 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -19,6 +19,8 @@
 #define MOCK_INJECT_DEV_MAX 8
 #define MOCK_INJECT_TEST_MAX 128
 
+static unsigned int poison_inject_dev_max = MOCK_INJECT_DEV_MAX;
+
 static struct cxl_cel_entry mock_cel[] = {
 	{
 		.opcode = cpu_to_le16(CXL_MBOX_OP_GET_SUPPORTED_LOGS),
@@ -485,7 +487,7 @@ static int mock_id(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
 			cpu_to_le64(SZ_256M / CXL_CAPACITY_MULTIPLIER),
 		.total_capacity =
 			cpu_to_le64(DEV_SIZE / CXL_CAPACITY_MULTIPLIER),
-		.inject_poison_limit = cpu_to_le16(MOCK_INJECT_DEV_MAX),
+		.inject_poison_limit = cpu_to_le16(MOCK_INJECT_TEST_MAX),
 	};
 
 	put_unaligned_le24(CXL_POISON_LIST_MAX, id.poison_list_max_mer);
@@ -919,7 +921,7 @@ static struct cxl_mbox_poison_payload_out
 	int nr_records = 0;
 	u64 dpa;
 
-	po = kzalloc(struct_size(po, record, MOCK_INJECT_DEV_MAX), GFP_KERNEL);
+	po = kzalloc(struct_size(po, record, poison_inject_dev_max), GFP_KERNEL);
 	if (!po)
 		return NULL;
 
@@ -934,7 +936,7 @@ static struct cxl_mbox_poison_payload_out
 		po->record[nr_records].address = cpu_to_le64(dpa);
 		po->record[nr_records].length = cpu_to_le32(1);
 		nr_records++;
-		if (nr_records == MOCK_INJECT_DEV_MAX)
+		if (nr_records == poison_inject_dev_max)
 			break;
 	}
 
@@ -969,7 +971,7 @@ static bool mock_poison_dev_max_injected(struct cxl_dev_state *cxlds)
 		if (mock_poison_list[i].cxlds == cxlds)
 			count++;
 	}
-	return (count >= MOCK_INJECT_DEV_MAX);
+	return (count >= poison_inject_dev_max);
 }
 
 static bool mock_poison_add(struct cxl_dev_state *cxlds, u64 dpa)
@@ -1051,6 +1053,47 @@ static int mock_clear_poison(struct cxl_dev_state *cxlds,
 	return 0;
 }
 
+static bool mock_poison_list_empty(void)
+{
+	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
+		if (mock_poison_list[i].cxlds)
+			return false;
+	}
+	return true;
+}
+
+static ssize_t poison_inject_max_show(struct device_driver *drv, char *buf)
+{
+	return sysfs_emit(buf, "%u\n", poison_inject_dev_max);
+}
+
+static ssize_t poison_inject_max_store(struct device_driver *drv,
+				       const char *buf, size_t len)
+{
+	int val;
+
+	if (kstrtoint(buf, 0, &val) < 0)
+		return -EINVAL;
+
+	if (!mock_poison_list_empty())
+		return -EBUSY;
+
+	if (val <= MOCK_INJECT_TEST_MAX)
+		poison_inject_dev_max = val;
+	else
+		return -EINVAL;
+
+	return len;
+}
+
+static DRIVER_ATTR_RW(poison_inject_max);
+
+static struct attribute *cxl_mock_mem_core_attrs[] = {
+	&driver_attr_poison_inject_max.attr,
+	NULL
+};
+ATTRIBUTE_GROUPS(cxl_mock_mem_core);
+
 static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
 {
 	struct device *dev = cxlds->dev;
@@ -1259,6 +1302,7 @@ static struct platform_driver cxl_mock_mem_driver = {
 	.driver = {
 		.name = KBUILD_MODNAME,
 		.dev_groups = cxl_mock_mem_groups,
+		.groups = cxl_mock_mem_core_groups,
 	},
 };
 
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 12/12] tools/testing/cxl: Require CONFIG_DEBUG_FS
  2023-03-27  5:03 [PATCH v5 0/12] cxl: CXL Inject & Clear Poison alison.schofield
                   ` (10 preceding siblings ...)
  2023-03-27  5:03 ` [PATCH v5 11/12] tools/testing/cxl: Add a sysfs attr to test poison inject limits alison.schofield
@ 2023-03-27  5:03 ` alison.schofield
  2023-03-31 19:20   ` Dave Jiang
  11 siblings, 1 reply; 35+ messages in thread
From: alison.schofield @ 2023-03-27  5:03 UTC (permalink / raw)
  To: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang
  Cc: Alison Schofield, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

The cxl_mem driver uses debugfs to support poison inject and clear.
Add debugfs to the list of required symbols so that cxl_test can
emulate those poison operations.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 tools/testing/cxl/config_check.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/cxl/config_check.c b/tools/testing/cxl/config_check.c
index 99b56b5f6edf..0902c5d6e410 100644
--- a/tools/testing/cxl/config_check.c
+++ b/tools/testing/cxl/config_check.c
@@ -13,4 +13,5 @@ void check(void)
 	BUILD_BUG_ON(!IS_MODULE(CONFIG_CXL_PMEM));
 	BUILD_BUG_ON(!IS_ENABLED(CONFIG_CXL_REGION_INVALIDATION_TEST));
 	BUILD_BUG_ON(!IS_ENABLED(CONFIG_NVDIMM_SECURITY_TEST));
+	BUILD_BUG_ON(!IS_ENABLED(CONFIG_DEBUG_FS));
 }
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 01/12] cxl/memdev: Add support for the Inject Poison mailbox command
  2023-03-27  5:03 ` [PATCH v5 01/12] cxl/memdev: Add support for the Inject Poison mailbox command alison.schofield
@ 2023-03-30 18:47   ` Jonathan Cameron
  2023-03-31 18:11   ` Dave Jiang
  1 sibling, 0 replies; 35+ messages in thread
From: Jonathan Cameron @ 2023-03-30 18:47 UTC (permalink / raw)
  To: alison.schofield
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Sun, 26 Mar 2023 22:03:07 -0700
alison.schofield@intel.com wrote:

> From: Alison Schofield <alison.schofield@intel.com>
> 
> CXL devices optionally support the INJECT POISON mailbox command. Add
> memdev driver support for the mailbox command.
> 
> Per the CXL Specification (3.0 8.2.9.8.4.2), after receiving a valid
> inject poison request, the device will return poison when the address
> is accessed through the CXL.mem bus. Injecting poison adds the address
> to the device's Poison List and the error source is set to Injected.
> In addition, the device adds a poison creation event to its internal
> Informational Event log, updates the Event Status register, and if
> configured, interrupts the host.
> 
> Also, per the CXL Specification, it is not an error to inject poison
> into an address that already has poison present and no error is
> returned from the device.
> 
> If the address is not contained in the device's dpa resource, or is
> not 64 byte aligned, return -EINVAL without issuing the mbox command.
> 
> Poison injection is intended for debug only and will be exposed to
> userspace through debugfs. Restrict compilation to CONFIG_DEBUG_FS.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
LGTM
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  drivers/cxl/core/memdev.c | 55 +++++++++++++++++++++++++++++++++++++++
>  drivers/cxl/cxlmem.h      |  6 +++++
>  2 files changed, 61 insertions(+)
> 
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index f26b5b6cda10..3b3ac2868848 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -213,6 +213,61 @@ ssize_t cxl_trigger_poison_list(struct device *dev,
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_trigger_poison_list, CXL);
>  
> +static int cxl_validate_poison_dpa(struct cxl_memdev *cxlmd, u64 dpa)
> +{
> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +
> +	if (!IS_ENABLED(CONFIG_DEBUG_FS))
> +		return 0;
> +
> +	if (!resource_size(&cxlds->dpa_res)) {
> +		dev_dbg(cxlds->dev, "device has no dpa resource\n");
> +		return -EINVAL;
> +	}
> +	if (dpa < cxlds->dpa_res.start || dpa > cxlds->dpa_res.end) {
> +		dev_dbg(cxlds->dev, "dpa:0x%llx not in resource:%pR\n",
> +			dpa, &cxlds->dpa_res);
> +		return -EINVAL;
> +	}
> +	if (!IS_ALIGNED(dpa, 64)) {
> +		dev_dbg(cxlds->dev, "dpa:0x%llx is not 64-byte aligned\n", dpa);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +int cxl_inject_poison(struct device *dev, u64 dpa)
> +{
> +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> +	struct cxl_mbox_inject_poison inject;
> +	struct cxl_mbox_cmd mbox_cmd;
> +	int rc;
> +
> +	if (!IS_ENABLED(CONFIG_DEBUG_FS))
> +		return 0;
> +
> +	down_read(&cxl_dpa_rwsem);
> +	rc = cxl_validate_poison_dpa(cxlmd, dpa);
> +	if (rc)
> +		goto out;
> +
> +	inject = (struct cxl_mbox_inject_poison) {
> +		.address = cpu_to_le64(dpa)
> +	};
> +	mbox_cmd = (struct cxl_mbox_cmd) {
> +		.opcode = CXL_MBOX_OP_INJECT_POISON,
> +		.size_in = sizeof(inject),
> +		.payload_in = &inject,
> +	};
> +	rc = cxl_internal_send_cmd(cxlmd->cxlds, &mbox_cmd);
> +out:
> +	up_read(&cxl_dpa_rwsem);
> +
> +	return rc;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_inject_poison, CXL);
> +
>  static struct attribute *cxl_memdev_attributes[] = {
>  	&dev_attr_serial.attr,
>  	&dev_attr_firmware_version.attr,
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 5febaa3f9b04..527efef2d700 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -602,6 +602,11 @@ struct cxl_mbox_poison_payload_out {
>  #define CXL_POISON_SOURCE_INJECTED	3
>  #define CXL_POISON_SOURCE_VENDOR	7
>  
> +/* Inject & Clear Poison  CXL 3.0 Spec 8.2.9.8.4.2/3 */
> +struct cxl_mbox_inject_poison {
> +	__le64 address;
> +};
> +
>  /**
>   * struct cxl_mem_command - Driver representation of a memory device command
>   * @info: Command information as it exists for the UAPI
> @@ -678,6 +683,7 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
>  ssize_t cxl_trigger_poison_list(struct device *dev,
>  				struct device_attribute *attr, const char *buf,
>  				size_t len);
> +int cxl_inject_poison(struct device *dev, u64 dpa);
>  
>  #ifdef CONFIG_CXL_SUSPEND
>  void cxl_mem_active_inc(void);


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 02/12] cxl/memdev: Add support for the Clear Poison mailbox command
  2023-03-27  5:03 ` [PATCH v5 02/12] cxl/memdev: Add support for the Clear " alison.schofield
@ 2023-03-30 18:50   ` Jonathan Cameron
  2023-03-30 20:12     ` Alison Schofield
  2023-03-31 18:40   ` Dave Jiang
  1 sibling, 1 reply; 35+ messages in thread
From: Jonathan Cameron @ 2023-03-30 18:50 UTC (permalink / raw)
  To: alison.schofield
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Sun, 26 Mar 2023 22:03:08 -0700
alison.schofield@intel.com wrote:

> From: Alison Schofield <alison.schofield@intel.com>
> 
> CXL devices optionally support the CLEAR POISON mailbox command. Add
> memdev driver support for clearing poison.
> 
> Per the CXL Specification (3.0 8.2.9.8.4.3), after receiving a valid
> clear poison request, the device removes the address from the device's
> Poison List and writes 0 (zero) for 64 bytes starting at address. If
> the device cannot clear poison from the address, it returns a permanent
> media error and -ENXIO is returned to the user.
> 
> Additionally, and per the spec also, it is not an error to clear poison
> of an address that is not poisoned. In this case, the device does not
> overwrite the address and the device does not return an error.

That's not inline with the spec.

"Clear Poison Write Data: The data the device shall always write into the
requested physical address, atomically, while clearing poison if the location
is marked as being poisoned."

The overwrite always happens whether or not it's poisoned.

Other than that
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> 
> If the address is not contained in the device's dpa resource, or is
> not 64 byte aligned, return -EINVAL without issuing the mbox command.
> 
> Poison clearing is intended for debug only and will be exposed to
> userspace through debugfs. Restrict compilation to CONFIG_DEBUG_FS.
> 
> Implementation note: Although the CXL specification defines the clear
> command to accept 64 bytes of 'write-data' to be used when clearing
> the poisoned address, this implementation always uses 0 (zeros) for
> the write-data.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
>  drivers/cxl/core/memdev.c | 43 +++++++++++++++++++++++++++++++++++++++
>  drivers/cxl/cxlmem.h      |  7 +++++++
>  2 files changed, 50 insertions(+)
> 
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index 3b3ac2868848..0e39c3c3fb09 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -268,6 +268,49 @@ int cxl_inject_poison(struct device *dev, u64 dpa)
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_inject_poison, CXL);
>  
> +int cxl_clear_poison(struct device *dev, u64 dpa)
> +{
> +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> +	struct cxl_mbox_clear_poison clear;
> +	struct cxl_mbox_cmd mbox_cmd;
> +	int rc;
> +
> +	if (!IS_ENABLED(CONFIG_DEBUG_FS))
> +		return 0;
> +
> +	down_read(&cxl_dpa_rwsem);
> +	rc = cxl_validate_poison_dpa(cxlmd, dpa);
> +	if (rc)
> +		goto out;
> +
> +	/*
> +	 * In CXL 3.0 Spec 8.2.9.8.4.3, the Clear Poison mailbox command
> +	 * is defined to accept 64 bytes of 'write-data', along with the
> +	 * address to clear. The device writes the data into the address
> +	 * atomically, while clearing poison if the location is marked as
> +	 * being poisoned.

This description is correct.

> +	 *
> +	 * Always use '0' for the write-data.
> +	 */
> +	clear = (struct cxl_mbox_clear_poison) {
> +		.address = cpu_to_le64(dpa)
> +	};
> +
> +	mbox_cmd = (struct cxl_mbox_cmd) {
> +		.opcode = CXL_MBOX_OP_CLEAR_POISON,
> +		.size_in = sizeof(clear),
> +		.payload_in = &clear,
> +	};
> +
> +	rc = cxl_internal_send_cmd(cxlmd->cxlds, &mbox_cmd);
> +
> +out:
> +	up_read(&cxl_dpa_rwsem);
> +
> +	return rc;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_clear_poison, CXL);
> +
>  static struct attribute *cxl_memdev_attributes[] = {
>  	&dev_attr_serial.attr,
>  	&dev_attr_firmware_version.attr,
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 527efef2d700..1d8677ab2306 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -607,6 +607,12 @@ struct cxl_mbox_inject_poison {
>  	__le64 address;
>  };
>  
> +/* Clear Poison  CXL 3.0 Spec 8.2.9.8.4.3 */
> +struct cxl_mbox_clear_poison {
> +	__le64 address;
> +	u8 write_data[CXL_POISON_LEN_MULT];
> +} __packed;
> +
>  /**
>   * struct cxl_mem_command - Driver representation of a memory device command
>   * @info: Command information as it exists for the UAPI
> @@ -684,6 +690,7 @@ ssize_t cxl_trigger_poison_list(struct device *dev,
>  				struct device_attribute *attr, const char *buf,
>  				size_t len);
>  int cxl_inject_poison(struct device *dev, u64 dpa);
> +int cxl_clear_poison(struct device *dev, u64 dpa);
>  
>  #ifdef CONFIG_CXL_SUSPEND
>  void cxl_mem_active_inc(void);


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 03/12] cxl/memdev: Warn of poison inject or clear to a mapped region
  2023-03-27  5:03 ` [PATCH v5 03/12] cxl/memdev: Warn of poison inject or clear to a mapped region alison.schofield
@ 2023-03-30 18:55   ` Jonathan Cameron
  2023-04-11 17:43     ` Alison Schofield
  0 siblings, 1 reply; 35+ messages in thread
From: Jonathan Cameron @ 2023-03-30 18:55 UTC (permalink / raw)
  To: alison.schofield
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Sun, 26 Mar 2023 22:03:09 -0700
alison.schofield@intel.com wrote:

> From: Alison Schofield <alison.schofield@intel.com>
> 
> Inject and clear poison capabilities and intended for debug usage only.
> In order to be useful in debug environments, the driver needs to allow
> inject and clear operations on DPAs mapped in regions.
> 
> dev_warn_once() when either operation occurs.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
>  drivers/cxl/core/memdev.c | 59 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 59 insertions(+)
> 
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index 0e39c3c3fb09..a83619c31f61 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -213,6 +213,50 @@ ssize_t cxl_trigger_poison_list(struct device *dev,
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_trigger_poison_list, CXL);
>  
> +struct cxl_dpa_to_region_context {
> +	struct cxl_region *cxlr;
> +	u64 dpa;
> +};
> +
> +static int __cxl_dpa_to_region(struct device *dev, void *arg)
> +{
> +	struct cxl_dpa_to_region_context *ctx = arg;
> +	struct cxl_endpoint_decoder *cxled;
> +	u64 dpa = ctx->dpa;
> +
> +	if (!is_endpoint_decoder(dev))
> +		return 0;
> +
> +	cxled = to_cxl_endpoint_decoder(dev);
> +	if (!cxled->dpa_res || !resource_size(cxled->dpa_res))
> +		return 0;
> +
> +	if (dpa > cxled->dpa_res->end || dpa < cxled->dpa_res->start)
> +		return 0;
> +
> +	dev_dbg(dev, "dpa:0x%llx mapped in region:%s\n", dpa,
> +		dev_name(&cxled->cxld.region->dev));
> +
> +	ctx->cxlr = cxled->cxld.region;
> +
If we have a match, little point in letting walk continue.

return 1;

Also, I "think" we just know that the association has been built.
Injecting poison is probably still fine if the region / decoder hasn't yet
been committed.

Jonathan


> +	return 0;
> +}
> +
> +static struct cxl_region *cxl_dpa_to_region(struct cxl_memdev *cxlmd, u64 dpa)
> +{
> +	struct cxl_dpa_to_region_context ctx;
> +	struct cxl_port *port;
> +
> +	ctx = (struct cxl_dpa_to_region_context) {
> +		.dpa = dpa,
> +	};
> +	port = dev_get_drvdata(&cxlmd->dev);
> +	if (port && is_cxl_endpoint(port) && port->commit_end != -1)
> +		device_for_each_child(&port->dev, &ctx, __cxl_dpa_to_region);
> +
> +	return ctx.cxlr;
> +}
> +
>  static int cxl_validate_poison_dpa(struct cxl_memdev *cxlmd, u64 dpa)
>  {
>  	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> @@ -242,6 +286,7 @@ int cxl_inject_poison(struct device *dev, u64 dpa)
>  	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
>  	struct cxl_mbox_inject_poison inject;
>  	struct cxl_mbox_cmd mbox_cmd;
> +	struct cxl_region *cxlr;
>  	int rc;
>  
>  	if (!IS_ENABLED(CONFIG_DEBUG_FS))
> @@ -261,6 +306,13 @@ int cxl_inject_poison(struct device *dev, u64 dpa)
>  		.payload_in = &inject,
>  	};
>  	rc = cxl_internal_send_cmd(cxlmd->cxlds, &mbox_cmd);
> +	if (rc)
> +		goto out;
> +
> +	cxlr = cxl_dpa_to_region(cxlmd, dpa);
> +	if (cxlr)
> +		dev_warn_once(dev, "poison inject dpa:0x%llx region: %s\n",
> +			      dpa, dev_name(&cxlr->dev));
>  out:
>  	up_read(&cxl_dpa_rwsem);
>  
> @@ -273,6 +325,7 @@ int cxl_clear_poison(struct device *dev, u64 dpa)
>  	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
>  	struct cxl_mbox_clear_poison clear;
>  	struct cxl_mbox_cmd mbox_cmd;
> +	struct cxl_region *cxlr;
>  	int rc;
>  
>  	if (!IS_ENABLED(CONFIG_DEBUG_FS))
> @@ -303,7 +356,13 @@ int cxl_clear_poison(struct device *dev, u64 dpa)
>  	};
>  
>  	rc = cxl_internal_send_cmd(cxlmd->cxlds, &mbox_cmd);
> +	if (rc)
> +		goto out;
>  
> +	cxlr = cxl_dpa_to_region(cxlmd, dpa);
> +	if (cxlr)
> +		dev_warn_once(dev, "poison clear dpa:0x%llx region: %s\n",
> +			      dpa, dev_name(&cxlr->dev));
>  out:
>  	up_read(&cxl_dpa_rwsem);
>  


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 05/12] cxl/mem: Add debugfs attributes for poison inject and clear
  2023-03-27  5:03 ` [PATCH v5 05/12] cxl/mem: Add debugfs attributes for poison inject and clear alison.schofield
@ 2023-03-30 18:58   ` Jonathan Cameron
  0 siblings, 0 replies; 35+ messages in thread
From: Jonathan Cameron @ 2023-03-30 18:58 UTC (permalink / raw)
  To: alison.schofield
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Sun, 26 Mar 2023 22:03:11 -0700
alison.schofield@intel.com wrote:

> From: Alison Schofield <alison.schofield@intel.com>
> 
> Inject and Clear Poison commands are optionally supported by CXL
> memdev devices and are intended for use in debug environments only.
> Add debugfs attributes for user access.
> 
> Documentation/ABI/testing/debugfs-cxl describes the usage.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
>  Documentation/ABI/testing/debugfs-cxl | 36 +++++++++++++++++++++++++++
>  drivers/cxl/mem.c                     | 26 +++++++++++++++++++
>  2 files changed, 62 insertions(+)
>  create mode 100644 Documentation/ABI/testing/debugfs-cxl
> 
> diff --git a/Documentation/ABI/testing/debugfs-cxl b/Documentation/ABI/testing/debugfs-cxl
> new file mode 100644
> index 000000000000..3f1624f95f11
> --- /dev/null
> +++ b/Documentation/ABI/testing/debugfs-cxl
> @@ -0,0 +1,36 @@
> +What:		/sys/kernel/debug/cxl/memX/inject_poison
> +Date:		April, 2023
> +KernelVersion:	v6.4
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(WO) When a Device Physical Address (DPA) is written to this
> +		attribute, the memdev driver sends an inject poison command to
> +		the device for the specified address. The DPA must be 64-byte
> +		aligned and the length of the injected poison is 64-bytes. If
> +		successful, the device returns poison when the address is
> +		accessed through the CXL.mem bus. Injecting poison adds the
> +		address to the device's Poison List and the error source is set
> +		to Injected. In addition, the device adds a poison creation
> +		event to its internal Informational Event log, updates the
> +		Event Status register, and if configured, interrupts the host.
> +		It is not an error to inject poison into an address that
> +		already has poison present and no error is returned. The
> +		inject_poison attribute is only visible for devices supporting
> +		the capability.
> +
> +
> +What:		/sys/kernel/debug/memX/clear_poison
> +Date:		April, 2023
> +KernelVersion:	v6.4
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(WO) When a Device Physical Address (DPA) is written to this
> +		attribute, the memdev driver sends a clear poison command to
> +		the device for the specified address. Clearing poison removes
> +		the address from the device's Poison List and writes 0 (zero)
> +		for 64 bytes starting at address. It is not an error to clear
> +		poison from an address that does not have poison set, and if
> +		poison was not set, the address is not overwritten. If the
As in earlier patch. I don't think this statement is true.  The 64 bytes
at the address are always overwritten, whether or not there is poison.

Otherwise makes sense to hide these in debugfs given they are definitely for debug.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> +		device cannot clear poison from the address, -ENXIO is returned.
> +		The clear_poison attribute is only visible for devices
> +		supporting the capability.
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index 576f5b691589..ff67ecf07001 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -94,6 +94,22 @@ static int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd,
>  	return 0;
>  }
>  
> +static int cxl_debugfs_poison_inject(void *data, u64 dpa)
> +{
> +	return cxl_inject_poison(data, dpa);
> +}
> +
> +DEFINE_DEBUGFS_ATTRIBUTE(cxl_poison_inject_fops, NULL,
> +			 cxl_debugfs_poison_inject, "%llx\n");
> +
> +static int cxl_debugfs_poison_clear(void *data, u64 dpa)
> +{
> +	return cxl_clear_poison(data, dpa);
> +}
> +
> +DEFINE_DEBUGFS_ATTRIBUTE(cxl_poison_clear_fops, NULL,
> +			 cxl_debugfs_poison_clear, "%llx\n");
> +
>  static int cxl_mem_probe(struct device *dev)
>  {
>  	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> @@ -117,6 +133,16 @@ static int cxl_mem_probe(struct device *dev)
>  
>  	dentry = cxl_debugfs_create_dir(dev_name(dev));
>  	debugfs_create_devm_seqfile(dev, "dpamem", dentry, cxl_mem_dpa_show);
> +
> +	if (test_bit(CXL_MEM_COMMAND_ID_INJECT_POISON,
> +		     to_cxl_memdev(dev)->cxlds->enabled_cmds))
> +		debugfs_create_file("inject_poison", 0200, dentry, dev,
> +				    &cxl_poison_inject_fops);
> +	if (test_bit(CXL_MEM_COMMAND_ID_CLEAR_POISON,
> +		     to_cxl_memdev(dev)->cxlds->enabled_cmds))
> +		debugfs_create_file("clear_poison", 0200, dentry, dev,
> +				    &cxl_poison_clear_fops);
> +
>  	rc = devm_add_action_or_reset(dev, remove_debugfs, dentry);
>  	if (rc)
>  		return rc;


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 04/12] cxl/memdev: Trace inject and clear poison as cxl_poison events
  2023-03-27  5:03 ` [PATCH v5 04/12] cxl/memdev: Trace inject and clear poison as cxl_poison events alison.schofield
@ 2023-03-30 19:03   ` Jonathan Cameron
  2023-03-31 18:53   ` Dave Jiang
  1 sibling, 0 replies; 35+ messages in thread
From: Jonathan Cameron @ 2023-03-30 19:03 UTC (permalink / raw)
  To: alison.schofield
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Sun, 26 Mar 2023 22:03:10 -0700
alison.schofield@intel.com wrote:

> From: Alison Schofield <alison.schofield@intel.com>
> 
> The cxl_poison trace event allows users to view the history of poison
> list reads. With the addition of inject and clear poison capabilities,
> users will expect similar tracing.
> 
> Add trace types 'Inject' and 'Clear' to the cxl_poison trace_event and
> trace successful operations only.
> 
> If the driver finds that the DPA being injected or cleared of poison
> is mapped in a region, that region info is included in the cxl_poison
> trace event. Region reconfigurations can make this extra info useless
> if the debug operations are not carefully managed.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>

Ok. I guess adding this info makes sense for debug logs.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  drivers/cxl/core/core.h   |  2 ++
>  drivers/cxl/core/memdev.c | 16 ++++++++++++++++
>  drivers/cxl/core/trace.h  |  8 +++++---
>  3 files changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 57bd22e01a0b..5b673eca8f12 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -71,6 +71,8 @@ void cxl_mbox_init(void);
>  
>  enum cxl_poison_trace_type {
>  	CXL_POISON_TRACE_LIST,
> +	CXL_POISON_TRACE_INJECT,
> +	CXL_POISON_TRACE_CLEAR,
>  };
>  
>  struct cxl_trigger_poison_context {
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index a83619c31f61..71ebe3795616 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -6,6 +6,7 @@
>  #include <linux/idr.h>
>  #include <linux/pci.h>
>  #include <cxlmem.h>
> +#include "trace.h"
>  #include "core.h"
>  
>  static DECLARE_RWSEM(cxl_memdev_rwsem);
> @@ -285,6 +286,7 @@ int cxl_inject_poison(struct device *dev, u64 dpa)
>  {
>  	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
>  	struct cxl_mbox_inject_poison inject;
> +	struct cxl_poison_record record;
>  	struct cxl_mbox_cmd mbox_cmd;
>  	struct cxl_region *cxlr;
>  	int rc;
> @@ -313,6 +315,13 @@ int cxl_inject_poison(struct device *dev, u64 dpa)
>  	if (cxlr)
>  		dev_warn_once(dev, "poison inject dpa:0x%llx region: %s\n",
>  			      dpa, dev_name(&cxlr->dev));
> +
> +	record = (struct cxl_poison_record) {
> +		.address = cpu_to_le64(dpa),
> +		.length = cpu_to_le32(1),
> +	};
> +	trace_cxl_poison(cxlmd, cxlr, &record, 0, 0, CXL_POISON_TRACE_INJECT);
> +
>  out:
>  	up_read(&cxl_dpa_rwsem);
>  
> @@ -324,6 +333,7 @@ int cxl_clear_poison(struct device *dev, u64 dpa)
>  {
>  	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
>  	struct cxl_mbox_clear_poison clear;
> +	struct cxl_poison_record record;
>  	struct cxl_mbox_cmd mbox_cmd;
>  	struct cxl_region *cxlr;
>  	int rc;
> @@ -363,6 +373,12 @@ int cxl_clear_poison(struct device *dev, u64 dpa)
>  	if (cxlr)
>  		dev_warn_once(dev, "poison clear dpa:0x%llx region: %s\n",
>  			      dpa, dev_name(&cxlr->dev));
> +
> +	record = (struct cxl_poison_record) {
> +		.address = cpu_to_le64(dpa),
> +		.length = cpu_to_le32(1),
> +	};
> +	trace_cxl_poison(cxlmd, cxlr, &record, 0, 0, CXL_POISON_TRACE_CLEAR);
>  out:
>  	up_read(&cxl_dpa_rwsem);
>  
> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
> index 65d81d27cb85..5e5e29995d3e 100644
> --- a/drivers/cxl/core/trace.h
> +++ b/drivers/cxl/core/trace.h
> @@ -602,9 +602,11 @@ TRACE_EVENT(cxl_memory_module,
>  	)
>  );
>  
> -#define show_poison_trace_type(type)		   \
> -	__print_symbolic(type,			   \
> -	{ CXL_POISON_TRACE_LIST,	"List"	})
> +#define show_poison_trace_type(type)			\
> +	__print_symbolic(type,				\
> +	{ CXL_POISON_TRACE_LIST,	"List"   },	\
> +	{ CXL_POISON_TRACE_INJECT,	"Inject" },	\
> +	{ CXL_POISON_TRACE_CLEAR,	"Clear"  })
>  
>  #define __show_poison_source(source)                          \
>  	__print_symbolic(source,                              \


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 02/12] cxl/memdev: Add support for the Clear Poison mailbox command
  2023-03-30 18:50   ` Jonathan Cameron
@ 2023-03-30 20:12     ` Alison Schofield
  2023-04-03 14:08       ` Jonathan Cameron
  0 siblings, 1 reply; 35+ messages in thread
From: Alison Schofield @ 2023-03-30 20:12 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Williams, Dan J, Weiny, Ira, Verma, Vishal L, Ben Widawsky,
	Jiang, Dave, linux-cxl

On Thu, Mar 30, 2023 at 11:50:18AM -0700, Jonathan Cameron wrote:
> On Sun, 26 Mar 2023 22:03:08 -0700
> alison.schofield@intel.com wrote:
> 
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > CXL devices optionally support the CLEAR POISON mailbox command. Add
> > memdev driver support for clearing poison.
> > 
> > Per the CXL Specification (3.0 8.2.9.8.4.3), after receiving a valid
> > clear poison request, the device removes the address from the device's
> > Poison List and writes 0 (zero) for 64 bytes starting at address. If
> > the device cannot clear poison from the address, it returns a permanent
> > media error and -ENXIO is returned to the user.
> > 
> > Additionally, and per the spec also, it is not an error to clear poison
> > of an address that is not poisoned. In this case, the device does not
> > overwrite the address and the device does not return an error.
> 
> That's not inline with the spec.
> 
> "Clear Poison Write Data: The data the device shall always write into the
> requested physical address, atomically, while clearing poison if the location
> is marked as being poisoned."
> 
> The overwrite always happens whether or not it's poisoned.

Jonathan,

I read that with an emphasis on that final 'if' clause:
"The data the device shall always write (...blah blah blah...) if the
location is marked as being poisoned.

So, if the location was not marked as being poisoned, the device won't
write anything.

Which means, the user cannot use the Clear command to randomly write stuff
wherever they please.

What do you think of that ? 

Alison


> Other than that
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> > 
> > If the address is not contained in the device's dpa resource, or is
> > not 64 byte aligned, return -EINVAL without issuing the mbox command.
> > 
> > Poison clearing is intended for debug only and will be exposed to
> > userspace through debugfs. Restrict compilation to CONFIG_DEBUG_FS.
> > 
> > Implementation note: Although the CXL specification defines the clear
> > command to accept 64 bytes of 'write-data' to be used when clearing
> > the poisoned address, this implementation always uses 0 (zeros) for
> > the write-data.
> > 
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > ---
> >  drivers/cxl/core/memdev.c | 43 +++++++++++++++++++++++++++++++++++++++
> >  drivers/cxl/cxlmem.h      |  7 +++++++
> >  2 files changed, 50 insertions(+)
> > 
> > diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> > index 3b3ac2868848..0e39c3c3fb09 100644
> > --- a/drivers/cxl/core/memdev.c
> > +++ b/drivers/cxl/core/memdev.c
> > @@ -268,6 +268,49 @@ int cxl_inject_poison(struct device *dev, u64 dpa)
> >  }
> >  EXPORT_SYMBOL_NS_GPL(cxl_inject_poison, CXL);
> >  
> > +int cxl_clear_poison(struct device *dev, u64 dpa)
> > +{
> > +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> > +	struct cxl_mbox_clear_poison clear;
> > +	struct cxl_mbox_cmd mbox_cmd;
> > +	int rc;
> > +
> > +	if (!IS_ENABLED(CONFIG_DEBUG_FS))
> > +		return 0;
> > +
> > +	down_read(&cxl_dpa_rwsem);
> > +	rc = cxl_validate_poison_dpa(cxlmd, dpa);
> > +	if (rc)
> > +		goto out;
> > +
> > +	/*
> > +	 * In CXL 3.0 Spec 8.2.9.8.4.3, the Clear Poison mailbox command
> > +	 * is defined to accept 64 bytes of 'write-data', along with the
> > +	 * address to clear. The device writes the data into the address
> > +	 * atomically, while clearing poison if the location is marked as
> > +	 * being poisoned.
> 
> This description is correct.
> 
> > +	 *
> > +	 * Always use '0' for the write-data.
> > +	 */
> > +	clear = (struct cxl_mbox_clear_poison) {
> > +		.address = cpu_to_le64(dpa)
> > +	};
> > +
> > +	mbox_cmd = (struct cxl_mbox_cmd) {
> > +		.opcode = CXL_MBOX_OP_CLEAR_POISON,
> > +		.size_in = sizeof(clear),
> > +		.payload_in = &clear,
> > +	};
> > +
> > +	rc = cxl_internal_send_cmd(cxlmd->cxlds, &mbox_cmd);
> > +
> > +out:
> > +	up_read(&cxl_dpa_rwsem);
> > +
> > +	return rc;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(cxl_clear_poison, CXL);
> > +
> >  static struct attribute *cxl_memdev_attributes[] = {
> >  	&dev_attr_serial.attr,
> >  	&dev_attr_firmware_version.attr,
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index 527efef2d700..1d8677ab2306 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -607,6 +607,12 @@ struct cxl_mbox_inject_poison {
> >  	__le64 address;
> >  };
> >  
> > +/* Clear Poison  CXL 3.0 Spec 8.2.9.8.4.3 */
> > +struct cxl_mbox_clear_poison {
> > +	__le64 address;
> > +	u8 write_data[CXL_POISON_LEN_MULT];
> > +} __packed;
> > +
> >  /**
> >   * struct cxl_mem_command - Driver representation of a memory device command
> >   * @info: Command information as it exists for the UAPI
> > @@ -684,6 +690,7 @@ ssize_t cxl_trigger_poison_list(struct device *dev,
> >  				struct device_attribute *attr, const char *buf,
> >  				size_t len);
> >  int cxl_inject_poison(struct device *dev, u64 dpa);
> > +int cxl_clear_poison(struct device *dev, u64 dpa);
> >  
> >  #ifdef CONFIG_CXL_SUSPEND
> >  void cxl_mem_active_inc(void);
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 01/12] cxl/memdev: Add support for the Inject Poison mailbox command
  2023-03-27  5:03 ` [PATCH v5 01/12] cxl/memdev: Add support for the Inject Poison mailbox command alison.schofield
  2023-03-30 18:47   ` Jonathan Cameron
@ 2023-03-31 18:11   ` Dave Jiang
  2023-03-31 18:52     ` Alison Schofield
  1 sibling, 1 reply; 35+ messages in thread
From: Dave Jiang @ 2023-03-31 18:11 UTC (permalink / raw)
  To: alison.schofield, Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky
  Cc: linux-cxl



On 3/26/23 10:03 PM, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> CXL devices optionally support the INJECT POISON mailbox command. Add
> memdev driver support for the mailbox command.
> 
> Per the CXL Specification (3.0 8.2.9.8.4.2), after receiving a valid
> inject poison request, the device will return poison when the address
> is accessed through the CXL.mem bus. Injecting poison adds the address

s/bus/protocol/?

> to the device's Poison List and the error source is set to Injected.
> In addition, the device adds a poison creation event to its internal
> Informational Event log, updates the Event Status register, and if
> configured, interrupts the host.
> 
> Also, per the CXL Specification, it is not an error to inject poison
> into an address that already has poison present and no error is
> returned from the device.
> 
> If the address is not contained in the device's dpa resource, or is
> not 64 byte aligned, return -EINVAL without issuing the mbox command.
> 
> Poison injection is intended for debug only and will be exposed to
> userspace through debugfs. Restrict compilation to CONFIG_DEBUG_FS.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>

Just a NIT below. Otherwise,
Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>   drivers/cxl/core/memdev.c | 55 +++++++++++++++++++++++++++++++++++++++
>   drivers/cxl/cxlmem.h      |  6 +++++
>   2 files changed, 61 insertions(+)
> 
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index f26b5b6cda10..3b3ac2868848 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -213,6 +213,61 @@ ssize_t cxl_trigger_poison_list(struct device *dev,
>   }
>   EXPORT_SYMBOL_NS_GPL(cxl_trigger_poison_list, CXL);
>   
> +static int cxl_validate_poison_dpa(struct cxl_memdev *cxlmd, u64 dpa)
> +{
> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +
> +	if (!IS_ENABLED(CONFIG_DEBUG_FS))
> +		return 0;
> +
> +	if (!resource_size(&cxlds->dpa_res)) {
> +		dev_dbg(cxlds->dev, "device has no dpa resource\n");
> +		return -EINVAL;
> +	}
> +	if (dpa < cxlds->dpa_res.start || dpa > cxlds->dpa_res.end) {
> +		dev_dbg(cxlds->dev, "dpa:0x%llx not in resource:%pR\n",
> +			dpa, &cxlds->dpa_res);
> +		return -EINVAL;
> +	}
> +	if (!IS_ALIGNED(dpa, 64)) {
> +		dev_dbg(cxlds->dev, "dpa:0x%llx is not 64-byte aligned\n", dpa);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +int cxl_inject_poison(struct device *dev, u64 dpa)
> +{
> +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> +	struct cxl_mbox_inject_poison inject;
> +	struct cxl_mbox_cmd mbox_cmd;
> +	int rc;
> +
> +	if (!IS_ENABLED(CONFIG_DEBUG_FS))
> +		return 0;
> +
> +	down_read(&cxl_dpa_rwsem);
> +	rc = cxl_validate_poison_dpa(cxlmd, dpa);
> +	if (rc)
> +		goto out;
> +
> +	inject = (struct cxl_mbox_inject_poison) {
> +		.address = cpu_to_le64(dpa)
> +	};

Why not inject.address = cpu_to_le64(dpa);? Uneless there are more 
assignments coming in later patches?

DJ

> +	mbox_cmd = (struct cxl_mbox_cmd) {
> +		.opcode = CXL_MBOX_OP_INJECT_POISON,
> +		.size_in = sizeof(inject),
> +		.payload_in = &inject,
> +	};
> +	rc = cxl_internal_send_cmd(cxlmd->cxlds, &mbox_cmd);
> +out:
> +	up_read(&cxl_dpa_rwsem);
> +
> +	return rc;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_inject_poison, CXL);
> +
>   static struct attribute *cxl_memdev_attributes[] = {
>   	&dev_attr_serial.attr,
>   	&dev_attr_firmware_version.attr,
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 5febaa3f9b04..527efef2d700 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -602,6 +602,11 @@ struct cxl_mbox_poison_payload_out {
>   #define CXL_POISON_SOURCE_INJECTED	3
>   #define CXL_POISON_SOURCE_VENDOR	7
>   
> +/* Inject & Clear Poison  CXL 3.0 Spec 8.2.9.8.4.2/3 */
> +struct cxl_mbox_inject_poison {
> +	__le64 address;
> +};
> +
>   /**
>    * struct cxl_mem_command - Driver representation of a memory device command
>    * @info: Command information as it exists for the UAPI
> @@ -678,6 +683,7 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
>   ssize_t cxl_trigger_poison_list(struct device *dev,
>   				struct device_attribute *attr, const char *buf,
>   				size_t len);
> +int cxl_inject_poison(struct device *dev, u64 dpa);
>   
>   #ifdef CONFIG_CXL_SUSPEND
>   void cxl_mem_active_inc(void);

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 02/12] cxl/memdev: Add support for the Clear Poison mailbox command
  2023-03-27  5:03 ` [PATCH v5 02/12] cxl/memdev: Add support for the Clear " alison.schofield
  2023-03-30 18:50   ` Jonathan Cameron
@ 2023-03-31 18:40   ` Dave Jiang
  2023-03-31 19:55     ` Alison Schofield
  1 sibling, 1 reply; 35+ messages in thread
From: Dave Jiang @ 2023-03-31 18:40 UTC (permalink / raw)
  To: alison.schofield, Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky
  Cc: linux-cxl



On 3/26/23 10:03 PM, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> CXL devices optionally support the CLEAR POISON mailbox command. Add
> memdev driver support for clearing poison.
> 
> Per the CXL Specification (3.0 8.2.9.8.4.3), after receiving a valid
> clear poison request, the device removes the address from the device's
> Poison List and writes 0 (zero) for 64 bytes starting at address. If
> the device cannot clear poison from the address, it returns a permanent
> media error and -ENXIO is returned to the user.
> 
> Additionally, and per the spec also, it is not an error to clear poison
> of an address that is not poisoned. In this case, the device does not
> overwrite the address and the device does not return an error.
> 
> If the address is not contained in the device's dpa resource, or is
> not 64 byte aligned, return -EINVAL without issuing the mbox command.
> 
> Poison clearing is intended for debug only and will be exposed to
> userspace through debugfs. Restrict compilation to CONFIG_DEBUG_FS.
> 
> Implementation note: Although the CXL specification defines the clear
> command to accept 64 bytes of 'write-data' to be used when clearing
> the poisoned address, this implementation always uses 0 (zeros) for
> the write-data.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
>   drivers/cxl/core/memdev.c | 43 +++++++++++++++++++++++++++++++++++++++
>   drivers/cxl/cxlmem.h      |  7 +++++++
>   2 files changed, 50 insertions(+)
> 
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index 3b3ac2868848..0e39c3c3fb09 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -268,6 +268,49 @@ int cxl_inject_poison(struct device *dev, u64 dpa)
>   }
>   EXPORT_SYMBOL_NS_GPL(cxl_inject_poison, CXL);
>   
> +int cxl_clear_poison(struct device *dev, u64 dpa)
> +{
> +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> +	struct cxl_mbox_clear_poison clear;
> +	struct cxl_mbox_cmd mbox_cmd;
> +	int rc;
> +
> +	if (!IS_ENABLED(CONFIG_DEBUG_FS))
> +		return 0;
> +
> +	down_read(&cxl_dpa_rwsem);
> +	rc = cxl_validate_poison_dpa(cxlmd, dpa);
> +	if (rc)
> +		goto out;
> +
> +	/*
> +	 * In CXL 3.0 Spec 8.2.9.8.4.3, the Clear Poison mailbox command
> +	 * is defined to accept 64 bytes of 'write-data', along with the
> +	 * address to clear. The device writes the data into the address
> +	 * atomically, while clearing poison if the location is marked as
> +	 * being poisoned.
> +	 *
> +	 * Always use '0' for the write-data.
> +	 */
> +	clear = (struct cxl_mbox_clear_poison) {
> +		.address = cpu_to_le64(dpa)
> +	};

The write_data[] should be 0s in order to clear the poison right? Since 
'clear' is allocated on the stack, if it's not initialized then it would 
be random garbage in the data. You could just init all 'clear' members 
when you declare the variable at top if you like.

DJ

> +
> +	mbox_cmd = (struct cxl_mbox_cmd) {
> +		.opcode = CXL_MBOX_OP_CLEAR_POISON,
> +		.size_in = sizeof(clear),
> +		.payload_in = &clear,
> +	};
> +
> +	rc = cxl_internal_send_cmd(cxlmd->cxlds, &mbox_cmd);
> +
> +out:
> +	up_read(&cxl_dpa_rwsem);
> +
> +	return rc;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_clear_poison, CXL);
> +
>   static struct attribute *cxl_memdev_attributes[] = {
>   	&dev_attr_serial.attr,
>   	&dev_attr_firmware_version.attr,
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 527efef2d700..1d8677ab2306 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -607,6 +607,12 @@ struct cxl_mbox_inject_poison {
>   	__le64 address;
>   };
>   
> +/* Clear Poison  CXL 3.0 Spec 8.2.9.8.4.3 */
> +struct cxl_mbox_clear_poison {
> +	__le64 address;
> +	u8 write_data[CXL_POISON_LEN_MULT];
> +} __packed;
> +
>   /**
>    * struct cxl_mem_command - Driver representation of a memory device command
>    * @info: Command information as it exists for the UAPI
> @@ -684,6 +690,7 @@ ssize_t cxl_trigger_poison_list(struct device *dev,
>   				struct device_attribute *attr, const char *buf,
>   				size_t len);
>   int cxl_inject_poison(struct device *dev, u64 dpa);
> +int cxl_clear_poison(struct device *dev, u64 dpa);
>   
>   #ifdef CONFIG_CXL_SUSPEND
>   void cxl_mem_active_inc(void);

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 01/12] cxl/memdev: Add support for the Inject Poison mailbox command
  2023-03-31 18:11   ` Dave Jiang
@ 2023-03-31 18:52     ` Alison Schofield
  0 siblings, 0 replies; 35+ messages in thread
From: Alison Schofield @ 2023-03-31 18:52 UTC (permalink / raw)
  To: Dave Jiang; +Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, linux-cxl

On Fri, Mar 31, 2023 at 11:11:11AM -0700, Dave Jiang wrote:
> 
> 
> On 3/26/23 10:03 PM, alison.schofield@intel.com wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > CXL devices optionally support the INJECT POISON mailbox command. Add
> > memdev driver support for the mailbox command.
> > 
> > Per the CXL Specification (3.0 8.2.9.8.4.2), after receiving a valid
> > inject poison request, the device will return poison when the address
> > is accessed through the CXL.mem bus. Injecting poison adds the address
> 
> s/bus/protocol/?

Quoting the spec there, but probably better to translate into kernel
driver language and say: ..is accessed through the CXL.mem driver.

> 
> > to the device's Poison List and the error source is set to Injected.
> > In addition, the device adds a poison creation event to its internal
> > Informational Event log, updates the Event Status register, and if
> > configured, interrupts the host.
> > 
> > Also, per the CXL Specification, it is not an error to inject poison
> > into an address that already has poison present and no error is
> > returned from the device.
> > 
> > If the address is not contained in the device's dpa resource, or is
> > not 64 byte aligned, return -EINVAL without issuing the mbox command.
> > 
> > Poison injection is intended for debug only and will be exposed to
> > userspace through debugfs. Restrict compilation to CONFIG_DEBUG_FS.
> > 
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> 
> Just a NIT below. Otherwise,
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> 
Thanks!
> > ---
snip
> > +
> > +	inject = (struct cxl_mbox_inject_poison) {
> > +		.address = cpu_to_le64(dpa)
> > +	};
> 
> Why not inject.address = cpu_to_le64(dpa);? Uneless there are more
> assignments coming in later patches?

Actually nothing else in that struct. It's just a pattern repeated needlessly.
I'll clean up.

Thanks,
Alison

> 
> DJ
> 

snip to end.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 04/12] cxl/memdev: Trace inject and clear poison as cxl_poison events
  2023-03-27  5:03 ` [PATCH v5 04/12] cxl/memdev: Trace inject and clear poison as cxl_poison events alison.schofield
  2023-03-30 19:03   ` Jonathan Cameron
@ 2023-03-31 18:53   ` Dave Jiang
  1 sibling, 0 replies; 35+ messages in thread
From: Dave Jiang @ 2023-03-31 18:53 UTC (permalink / raw)
  To: alison.schofield, Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky
  Cc: linux-cxl



On 3/26/23 10:03 PM, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> The cxl_poison trace event allows users to view the history of poison
> list reads. With the addition of inject and clear poison capabilities,
> users will expect similar tracing.
> 
> Add trace types 'Inject' and 'Clear' to the cxl_poison trace_event and
> trace successful operations only.
> 
> If the driver finds that the DPA being injected or cleared of poison
> is mapped in a region, that region info is included in the cxl_poison
> trace event. Region reconfigurations can make this extra info useless
> if the debug operations are not carefully managed.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>   drivers/cxl/core/core.h   |  2 ++
>   drivers/cxl/core/memdev.c | 16 ++++++++++++++++
>   drivers/cxl/core/trace.h  |  8 +++++---
>   3 files changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 57bd22e01a0b..5b673eca8f12 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -71,6 +71,8 @@ void cxl_mbox_init(void);
>   
>   enum cxl_poison_trace_type {
>   	CXL_POISON_TRACE_LIST,
> +	CXL_POISON_TRACE_INJECT,
> +	CXL_POISON_TRACE_CLEAR,
>   };
>   
>   struct cxl_trigger_poison_context {
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index a83619c31f61..71ebe3795616 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -6,6 +6,7 @@
>   #include <linux/idr.h>
>   #include <linux/pci.h>
>   #include <cxlmem.h>
> +#include "trace.h"
>   #include "core.h"
>   
>   static DECLARE_RWSEM(cxl_memdev_rwsem);
> @@ -285,6 +286,7 @@ int cxl_inject_poison(struct device *dev, u64 dpa)
>   {
>   	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
>   	struct cxl_mbox_inject_poison inject;
> +	struct cxl_poison_record record;
>   	struct cxl_mbox_cmd mbox_cmd;
>   	struct cxl_region *cxlr;
>   	int rc;
> @@ -313,6 +315,13 @@ int cxl_inject_poison(struct device *dev, u64 dpa)
>   	if (cxlr)
>   		dev_warn_once(dev, "poison inject dpa:0x%llx region: %s\n",
>   			      dpa, dev_name(&cxlr->dev));
> +
> +	record = (struct cxl_poison_record) {
> +		.address = cpu_to_le64(dpa),
> +		.length = cpu_to_le32(1),
> +	};
> +	trace_cxl_poison(cxlmd, cxlr, &record, 0, 0, CXL_POISON_TRACE_INJECT);
> +
>   out:
>   	up_read(&cxl_dpa_rwsem);
>   
> @@ -324,6 +333,7 @@ int cxl_clear_poison(struct device *dev, u64 dpa)
>   {
>   	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
>   	struct cxl_mbox_clear_poison clear;
> +	struct cxl_poison_record record;
>   	struct cxl_mbox_cmd mbox_cmd;
>   	struct cxl_region *cxlr;
>   	int rc;
> @@ -363,6 +373,12 @@ int cxl_clear_poison(struct device *dev, u64 dpa)
>   	if (cxlr)
>   		dev_warn_once(dev, "poison clear dpa:0x%llx region: %s\n",
>   			      dpa, dev_name(&cxlr->dev));
> +
> +	record = (struct cxl_poison_record) {
> +		.address = cpu_to_le64(dpa),
> +		.length = cpu_to_le32(1),
> +	};
> +	trace_cxl_poison(cxlmd, cxlr, &record, 0, 0, CXL_POISON_TRACE_CLEAR);
>   out:
>   	up_read(&cxl_dpa_rwsem);
>   
> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
> index 65d81d27cb85..5e5e29995d3e 100644
> --- a/drivers/cxl/core/trace.h
> +++ b/drivers/cxl/core/trace.h
> @@ -602,9 +602,11 @@ TRACE_EVENT(cxl_memory_module,
>   	)
>   );
>   
> -#define show_poison_trace_type(type)		   \
> -	__print_symbolic(type,			   \
> -	{ CXL_POISON_TRACE_LIST,	"List"	})
> +#define show_poison_trace_type(type)			\
> +	__print_symbolic(type,				\
> +	{ CXL_POISON_TRACE_LIST,	"List"   },	\
> +	{ CXL_POISON_TRACE_INJECT,	"Inject" },	\
> +	{ CXL_POISON_TRACE_CLEAR,	"Clear"  })
>   
>   #define __show_poison_source(source)                          \
>   	__print_symbolic(source,                              \

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 06/12] cxl/memdev: Make inject and clear poison cmds kernel exclusive
  2023-03-27  5:03 ` [PATCH v5 06/12] cxl/memdev: Make inject and clear poison cmds kernel exclusive alison.schofield
@ 2023-03-31 19:10   ` Dave Jiang
  0 siblings, 0 replies; 35+ messages in thread
From: Dave Jiang @ 2023-03-31 19:10 UTC (permalink / raw)
  To: alison.schofield, Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky
  Cc: linux-cxl, Jonathan Cameron



On 3/26/23 10:03 PM, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> Inject and clear poison commands are intended to be used in debug
> mode only, and if improperly used, can lead to data corruption. The
> kernel provides a debugfs interface to issue these commands [1]
> 
> The CXL driver defines Enabled commands in its ABI.[2] Enabled means
> that the device and the driver both support the command. If a device
> supports inject and/or clear, those commands are flagged Enabled.
> 
> The ABI also defines another command flag: Exclusive. Exclusive
> commands are reserved for kernel use. The exclusive flags can be
> temporal, but for inject and clear, the status is permanent.
> 
> Document the exclusivity of Inject and Clear in the ABI kernel doc.
> (Clean up a typo in kdoc too: 'CXL_MEM_COMMAND_FLAG_ENABLED')
> 
> Create an exclusive commands bitmap in the memdev driver, add the
> inject and clear poison commands, and set it in the cxl_dev_state.
> 
> [1] Documentation/ABI/testing/debugfs-cxl
> [2] include/uapi/linux/cxl_mem.h
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>   drivers/cxl/core/memdev.c    |  6 ++++++
>   include/uapi/linux/cxl_mem.h | 20 +++++++++++++++-----
>   2 files changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index 71ebe3795616..617d8378ca9a 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -11,6 +11,8 @@
>   
>   static DECLARE_RWSEM(cxl_memdev_rwsem);
>   
> +static __read_mostly DECLARE_BITMAP(exclusive_cmds, CXL_MEM_COMMAND_ID_MAX);
> +
>   /*
>    * An entire PCI topology full of devices should be enough for any
>    * config
> @@ -628,6 +630,10 @@ struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds)
>   	cxlmd->cxlds = cxlds;
>   	cxlds->cxlmd = cxlmd;
>   
> +	set_bit(CXL_MEM_COMMAND_ID_INJECT_POISON, exclusive_cmds);
> +	set_bit(CXL_MEM_COMMAND_ID_CLEAR_POISON, exclusive_cmds);
> +	set_exclusive_cxl_commands(cxlds, exclusive_cmds);
> +
>   	cdev = &cxlmd->cdev;
>   	rc = cdev_device_add(cdev, dev);
>   	if (rc)
> diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
> index 86bbacf2a315..6294278f9dcb 100644
> --- a/include/uapi/linux/cxl_mem.h
> +++ b/include/uapi/linux/cxl_mem.h
> @@ -74,17 +74,27 @@ static const struct {
>    * @id: ID number for the command.
>    * @flags: Flags that specify command behavior.
>    *
> - *         CXL_MEM_COMMAND_FLAG_USER_ENABLED
> + *         CXL_MEM_COMMAND_FLAG_ENABLED
>    *
>    *         The given command id is supported by the driver and is supported by
>    *         a related opcode on the device.
>    *
>    *         CXL_MEM_COMMAND_FLAG_EXCLUSIVE
>    *
> - *         Requests with the given command id will terminate with EBUSY as the
> - *         kernel actively owns management of the given resource. For example,
> - *         the label-storage-area can not be written while the kernel is
> - *         actively managing that space.
> + *         The given command id is for kernel exclusive use and is not
> + *         available to userspace. Requests will terminate with EBUSY.
> + *
> + *         The exclusive flag may be temporal, and only set while the
> + *         kernel actively owns management of the given resource. For
> + *         example, the label-storage-area can not be written while the
> + *         kernel is actively managing that space.
> + *
> + *         The exclusive flag can be permanent, as in commands that can
> + *         never be issued through the ioctl interface.
> + *
> + *         INJECT_POISON and CLEAR_POISON are permanently kernel exclusive,
> + *         and are supported through a debugfs interface.
> + *         See: Documentation/ABI/testing/debugfs-cxl
>    *
>    * @size_in: Expected input size, or ~0 if variable length.
>    * @size_out: Expected output size, or ~0 if variable length.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 07/12] cxl/mbox: Block inject and clear poison opcodes in raw mode
  2023-03-27  5:03 ` [PATCH v5 07/12] cxl/mbox: Block inject and clear poison opcodes in raw mode alison.schofield
@ 2023-03-31 19:10   ` Dave Jiang
  0 siblings, 0 replies; 35+ messages in thread
From: Dave Jiang @ 2023-03-31 19:10 UTC (permalink / raw)
  To: alison.schofield, Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky
  Cc: linux-cxl, Jonathan Cameron



On 3/26/23 10:03 PM, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> Inject and clear poison are commands intended for debug environments,
> and can cause data corruption if issued without validation. They are
> kernel exclusive commands not available to userspace through ioctls,
> but could be submitted via the raw mode ioctl.
> 
> Add inject and clear poison to the cxl_disabled_raw_commands[] list.
> Attempts by userspace to issue either command via the RAW ioctl fail
> with -EPERM.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>   drivers/cxl/core/mbox.c | 5 +++++
>   1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index a8369ef56f61..b380208f85c2 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -89,6 +89,9 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
>    *
>    * CXL_MBOX_OP_[GET_]SCAN_MEDIA: The kernel provides a native error list that
>    * is kept up to date with patrol notifications and error management.
> + *
> + * CXL_MBOX_OP_[INJECT|CLEAR]_POISON: The kernel provides a debugfs interface
> + * to these commands. See: Documentation/ABI/testing/debugfs-cxl
>    */
>   static u16 cxl_disabled_raw_commands[] = {
>   	CXL_MBOX_OP_ACTIVATE_FW,
> @@ -97,6 +100,8 @@ static u16 cxl_disabled_raw_commands[] = {
>   	CXL_MBOX_OP_SET_SHUTDOWN_STATE,
>   	CXL_MBOX_OP_SCAN_MEDIA,
>   	CXL_MBOX_OP_GET_SCAN_MEDIA,
> +	CXL_MBOX_OP_INJECT_POISON,
> +	CXL_MBOX_OP_CLEAR_POISON,
>   };
>   
>   /*

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 08/12] tools/testing/cxl: Mock the Inject Poison mailbox command
  2023-03-27  5:03 ` [PATCH v5 08/12] tools/testing/cxl: Mock the Inject Poison mailbox command alison.schofield
@ 2023-03-31 19:13   ` Dave Jiang
  0 siblings, 0 replies; 35+ messages in thread
From: Dave Jiang @ 2023-03-31 19:13 UTC (permalink / raw)
  To: alison.schofield, Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky
  Cc: linux-cxl, Jonathan Cameron



On 3/26/23 10:03 PM, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> Mock the injection of poison by storing the device:address entries in
> mock_poison_list[]. Enforce a limit of 8 poison injections per memdev
> device and 128 total entries for the cxl_test mock driver.
> 
> Introducing the mock_poison[] list here, makes it available for use in
> the mock of Clear Poison, and the mock of Get Poison List.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>   tools/testing/cxl/test/mem.c | 77 ++++++++++++++++++++++++++++++++++++
>   1 file changed, 77 insertions(+)
> 
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index 2fa9c18d4c2c..4fca886e2a7c 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -16,6 +16,9 @@
>   #define DEV_SIZE SZ_2G
>   #define EFFECT(x) (1U << x)
>   
> +#define MOCK_INJECT_DEV_MAX 8
> +#define MOCK_INJECT_TEST_MAX 128
> +
>   static struct cxl_cel_entry mock_cel[] = {
>   	{
>   		.opcode = cpu_to_le16(CXL_MBOX_OP_GET_SUPPORTED_LOGS),
> @@ -45,6 +48,10 @@ static struct cxl_cel_entry mock_cel[] = {
>   		.opcode = cpu_to_le16(CXL_MBOX_OP_GET_POISON),
>   		.effect = cpu_to_le16(0),
>   	},
> +	{
> +		.opcode = cpu_to_le16(CXL_MBOX_OP_INJECT_POISON),
> +		.effect = cpu_to_le16(0),
> +	},
>   };
>   
>   /* See CXL 2.0 Table 181 Get Health Info Output Payload */
> @@ -474,6 +481,7 @@ static int mock_id(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
>   			cpu_to_le64(SZ_256M / CXL_CAPACITY_MULTIPLIER),
>   		.total_capacity =
>   			cpu_to_le64(DEV_SIZE / CXL_CAPACITY_MULTIPLIER),
> +		.inject_poison_limit = cpu_to_le16(MOCK_INJECT_DEV_MAX),
>   	};
>   
>   	put_unaligned_le24(CXL_POISON_LIST_MAX, id.poison_list_max_mer);
> @@ -895,6 +903,11 @@ static int mock_health_info(struct cxl_dev_state *cxlds,
>   	return 0;
>   }
>   
> +static struct mock_poison {
> +	struct cxl_dev_state *cxlds;
> +	u64 dpa;
> +} mock_poison_list[MOCK_INJECT_TEST_MAX];
> +
>   static int mock_get_poison(struct cxl_dev_state *cxlds,
>   			   struct cxl_mbox_cmd *cmd)
>   {
> @@ -923,6 +936,67 @@ static int mock_get_poison(struct cxl_dev_state *cxlds,
>   	return 0;
>   }
>   
> +static bool mock_poison_dev_max_injected(struct cxl_dev_state *cxlds)
> +{
> +	int count = 0;
> +
> +	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
> +		if (mock_poison_list[i].cxlds == cxlds)
> +			count++;
> +	}
> +	return (count >= MOCK_INJECT_DEV_MAX);
> +}
> +
> +static bool mock_poison_add(struct cxl_dev_state *cxlds, u64 dpa)
> +{
> +	if (mock_poison_dev_max_injected(cxlds)) {
> +		dev_dbg(cxlds->dev,
> +			"Device poison injection limit has been reached: %d\n",
> +			MOCK_INJECT_DEV_MAX);
> +		return false;
> +	}
> +
> +	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
> +		if (!mock_poison_list[i].cxlds) {
> +			mock_poison_list[i].cxlds = cxlds;
> +			mock_poison_list[i].dpa = dpa;
> +			return true;
> +		}
> +	}
> +	dev_dbg(cxlds->dev,
> +		"Mock test poison injection limit has been reached: %d\n",
> +		MOCK_INJECT_TEST_MAX);
> +
> +	return false;
> +}
> +
> +static bool mock_poison_found(struct cxl_dev_state *cxlds, u64 dpa)
> +{
> +	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
> +		if (mock_poison_list[i].cxlds == cxlds &&
> +		    mock_poison_list[i].dpa == dpa)
> +			return true;
> +	}
> +	return false;
> +}
> +
> +static int mock_inject_poison(struct cxl_dev_state *cxlds,
> +			      struct cxl_mbox_cmd *cmd)
> +{
> +	struct cxl_mbox_inject_poison *pi = cmd->payload_in;
> +	u64 dpa = le64_to_cpu(pi->address);
> +
> +	if (mock_poison_found(cxlds, dpa)) {
> +		/* Not an error to inject poison if already poisoned */
> +		dev_dbg(cxlds->dev, "DPA: 0x%llx already poisoned\n", dpa);
> +		return 0;
> +	}
> +	if (!mock_poison_add(cxlds, dpa))
> +		return -ENXIO;
> +
> +	return 0;
> +}
> +
>   static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
>   {
>   	struct device *dev = cxlds->dev;
> @@ -980,6 +1054,9 @@ static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *
>   	case CXL_MBOX_OP_GET_POISON:
>   		rc = mock_get_poison(cxlds, cmd);
>   		break;
> +	case CXL_MBOX_OP_INJECT_POISON:
> +		rc = mock_inject_poison(cxlds, cmd);
> +		break;
>   	default:
>   		break;
>   	}

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 09/12] tools/testing/cxl: Mock the Clear Poison mailbox command
  2023-03-27  5:03 ` [PATCH v5 09/12] tools/testing/cxl: Mock the Clear " alison.schofield
@ 2023-03-31 19:15   ` Dave Jiang
  0 siblings, 0 replies; 35+ messages in thread
From: Dave Jiang @ 2023-03-31 19:15 UTC (permalink / raw)
  To: alison.schofield, Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky
  Cc: linux-cxl, Jonathan Cameron



On 3/26/23 10:03 PM, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> Mock the clear of poison by deleting the device:address entry from
> the mock_poison_list[]. Behave like a real CXL device and do not fail
> if the address is not in the poison list, but offer a dev_dbg()
> message.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>   tools/testing/cxl/test/mem.c | 36 ++++++++++++++++++++++++++++++++++++
>   1 file changed, 36 insertions(+)
> 
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index 4fca886e2a7c..720147c7cb2b 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -52,6 +52,10 @@ static struct cxl_cel_entry mock_cel[] = {
>   		.opcode = cpu_to_le16(CXL_MBOX_OP_INJECT_POISON),
>   		.effect = cpu_to_le16(0),
>   	},
> +	{
> +		.opcode = cpu_to_le16(CXL_MBOX_OP_CLEAR_POISON),
> +		.effect = cpu_to_le16(0),
> +	},
>   };
>   
>   /* See CXL 2.0 Table 181 Get Health Info Output Payload */
> @@ -997,6 +1001,35 @@ static int mock_inject_poison(struct cxl_dev_state *cxlds,
>   	return 0;
>   }
>   
> +static bool mock_poison_del(struct cxl_dev_state *cxlds, u64 dpa)
> +{
> +	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
> +		if (mock_poison_list[i].cxlds == cxlds &&
> +		    mock_poison_list[i].dpa == dpa) {
> +			mock_poison_list[i].cxlds = NULL;
> +			return true;
> +		}
> +	}
> +	return false;
> +}
> +
> +static int mock_clear_poison(struct cxl_dev_state *cxlds,
> +			     struct cxl_mbox_cmd *cmd)
> +{
> +	struct cxl_mbox_clear_poison *pi = cmd->payload_in;
> +	u64 dpa = le64_to_cpu(pi->address);
> +
> +	/*
> +	 * A real CXL device will write pi->write_data to the address
> +	 * being cleared. In this mock, just delete this address from
> +	 * the mock poison list.
> +	 */
> +	if (!mock_poison_del(cxlds, dpa))
> +		dev_dbg(cxlds->dev, "DPA: 0x%llx not in poison list\n", dpa);
> +
> +	return 0;
> +}
> +
>   static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
>   {
>   	struct device *dev = cxlds->dev;
> @@ -1057,6 +1090,9 @@ static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *
>   	case CXL_MBOX_OP_INJECT_POISON:
>   		rc = mock_inject_poison(cxlds, cmd);
>   		break;
> +	case CXL_MBOX_OP_CLEAR_POISON:
> +		rc = mock_clear_poison(cxlds, cmd);
> +		break;
>   	default:
>   		break;
>   	}

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 10/12] tools/testing/cxl: Use injected poison for get poison list
  2023-03-27  5:03 ` [PATCH v5 10/12] tools/testing/cxl: Use injected poison for get poison list alison.schofield
@ 2023-03-31 19:16   ` Dave Jiang
  0 siblings, 0 replies; 35+ messages in thread
From: Dave Jiang @ 2023-03-31 19:16 UTC (permalink / raw)
  To: alison.schofield, Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky
  Cc: linux-cxl, Jonathan Cameron



On 3/26/23 10:03 PM, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> Prior to poison inject support, the mock of 'Get Poison List'
> returned a poison list containing a single mocked error record.
> 
> Following the addition of poison inject and clear support to the
> mock driver, use the mock_poison_list[], rather than faking an
> error record. Mock_poison_list[] list tracks the actual poison
> inject and clear requests issued by userspace.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>   tools/testing/cxl/test/mem.c | 59 ++++++++++++++++++++++++------------
>   1 file changed, 40 insertions(+), 19 deletions(-)
> 
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index 720147c7cb2b..9658d95236b7 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -912,31 +912,52 @@ static struct mock_poison {
>   	u64 dpa;
>   } mock_poison_list[MOCK_INJECT_TEST_MAX];
>   
> +static struct cxl_mbox_poison_payload_out
> +*cxl_get_injected_po(struct cxl_dev_state *cxlds, u64 offset, u64 length)
> +{
> +	struct cxl_mbox_poison_payload_out *po;
> +	int nr_records = 0;
> +	u64 dpa;
> +
> +	po = kzalloc(struct_size(po, record, MOCK_INJECT_DEV_MAX), GFP_KERNEL);
> +	if (!po)
> +		return NULL;
> +
> +	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
> +		if (mock_poison_list[i].cxlds != cxlds)
> +			continue;
> +		if (mock_poison_list[i].dpa < offset ||
> +		    mock_poison_list[i].dpa > offset + length - 1)
> +			continue;
> +
> +		dpa = mock_poison_list[i].dpa + CXL_POISON_SOURCE_INJECTED;
> +		po->record[nr_records].address = cpu_to_le64(dpa);
> +		po->record[nr_records].length = cpu_to_le32(1);
> +		nr_records++;
> +		if (nr_records == MOCK_INJECT_DEV_MAX)
> +			break;
> +	}
> +
> +	/* Always return count, even when zero */
> +	po->count = cpu_to_le16(nr_records);
> +
> +	return po;
> +}
> +
>   static int mock_get_poison(struct cxl_dev_state *cxlds,
>   			   struct cxl_mbox_cmd *cmd)
>   {
>   	struct cxl_mbox_poison_payload_in *pi = cmd->payload_in;
> +	struct cxl_mbox_poison_payload_out *po;
> +	u64 offset = le64_to_cpu(pi->offset);
> +	u64 length = le64_to_cpu(pi->length);
> +	int nr_records;
>   
> -	/* Mock one poison record at pi.offset for 64 bytes */
> -	struct {
> -		struct cxl_mbox_poison_payload_out po;
> -		struct cxl_poison_record record;
> -	} __packed mock_plist = {
> -		.po = {
> -			.count = cpu_to_le16(1),
> -		},
> -		.record = {
> -			.length = cpu_to_le32(1),
> -			.address = cpu_to_le64(le64_to_cpu(pi->offset) +
> -					       CXL_POISON_SOURCE_INJECTED),
> -		},
> -	};
> +	po = cxl_get_injected_po(cxlds, offset, length);
> +	nr_records = le16_to_cpu(po->count);
> +	memcpy(cmd->payload_out, po, struct_size(po, record, nr_records));
> +	cmd->size_out = struct_size(po, record, nr_records);
>   
> -	if (cmd->size_out < sizeof(mock_plist))
> -		return -EINVAL;
> -
> -	memcpy(cmd->payload_out, &mock_plist, sizeof(mock_plist));
> -	cmd->size_out = sizeof(mock_plist);
>   	return 0;
>   }
>   

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 11/12] tools/testing/cxl: Add a sysfs attr to test poison inject limits
  2023-03-27  5:03 ` [PATCH v5 11/12] tools/testing/cxl: Add a sysfs attr to test poison inject limits alison.schofield
@ 2023-03-31 19:18   ` Dave Jiang
  0 siblings, 0 replies; 35+ messages in thread
From: Dave Jiang @ 2023-03-31 19:18 UTC (permalink / raw)
  To: alison.schofield, Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky
  Cc: linux-cxl, Jonathan Cameron



On 3/26/23 10:03 PM, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> CXL devices may report a maximum number of addresses that a device
> allows to be poisoned using poison injection. When cxl_test creates
> mock CXL memory devices, it defaults to MOCK_INJECT_DEV_MAX==88 for
> all mocked memdevs.
> 
> Add a sysfs attribute, poison_inject_max to module cxl_mock_mem so
> that users can set a custom device injection limit. Fail, and return
> -EBUSY, if the mock poison list is not empty.
> 
> /sys/bus/platform/drivers/cxl_mock_mem/poison_inject_max
> 
> A simple usage model is to set the attribute before running a test in
> order to emulate a device's poison handling.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>   tools/testing/cxl/test/mem.c | 52 +++++++++++++++++++++++++++++++++---
>   1 file changed, 48 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index 9658d95236b7..5c3b3e5a3b4b 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -19,6 +19,8 @@
>   #define MOCK_INJECT_DEV_MAX 8
>   #define MOCK_INJECT_TEST_MAX 128
>   
> +static unsigned int poison_inject_dev_max = MOCK_INJECT_DEV_MAX;
> +
>   static struct cxl_cel_entry mock_cel[] = {
>   	{
>   		.opcode = cpu_to_le16(CXL_MBOX_OP_GET_SUPPORTED_LOGS),
> @@ -485,7 +487,7 @@ static int mock_id(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
>   			cpu_to_le64(SZ_256M / CXL_CAPACITY_MULTIPLIER),
>   		.total_capacity =
>   			cpu_to_le64(DEV_SIZE / CXL_CAPACITY_MULTIPLIER),
> -		.inject_poison_limit = cpu_to_le16(MOCK_INJECT_DEV_MAX),
> +		.inject_poison_limit = cpu_to_le16(MOCK_INJECT_TEST_MAX),
>   	};
>   
>   	put_unaligned_le24(CXL_POISON_LIST_MAX, id.poison_list_max_mer);
> @@ -919,7 +921,7 @@ static struct cxl_mbox_poison_payload_out
>   	int nr_records = 0;
>   	u64 dpa;
>   
> -	po = kzalloc(struct_size(po, record, MOCK_INJECT_DEV_MAX), GFP_KERNEL);
> +	po = kzalloc(struct_size(po, record, poison_inject_dev_max), GFP_KERNEL);
>   	if (!po)
>   		return NULL;
>   
> @@ -934,7 +936,7 @@ static struct cxl_mbox_poison_payload_out
>   		po->record[nr_records].address = cpu_to_le64(dpa);
>   		po->record[nr_records].length = cpu_to_le32(1);
>   		nr_records++;
> -		if (nr_records == MOCK_INJECT_DEV_MAX)
> +		if (nr_records == poison_inject_dev_max)
>   			break;
>   	}
>   
> @@ -969,7 +971,7 @@ static bool mock_poison_dev_max_injected(struct cxl_dev_state *cxlds)
>   		if (mock_poison_list[i].cxlds == cxlds)
>   			count++;
>   	}
> -	return (count >= MOCK_INJECT_DEV_MAX);
> +	return (count >= poison_inject_dev_max);
>   }
>   
>   static bool mock_poison_add(struct cxl_dev_state *cxlds, u64 dpa)
> @@ -1051,6 +1053,47 @@ static int mock_clear_poison(struct cxl_dev_state *cxlds,
>   	return 0;
>   }
>   
> +static bool mock_poison_list_empty(void)
> +{
> +	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
> +		if (mock_poison_list[i].cxlds)
> +			return false;
> +	}
> +	return true;
> +}
> +
> +static ssize_t poison_inject_max_show(struct device_driver *drv, char *buf)
> +{
> +	return sysfs_emit(buf, "%u\n", poison_inject_dev_max);
> +}
> +
> +static ssize_t poison_inject_max_store(struct device_driver *drv,
> +				       const char *buf, size_t len)
> +{
> +	int val;
> +
> +	if (kstrtoint(buf, 0, &val) < 0)
> +		return -EINVAL;
> +
> +	if (!mock_poison_list_empty())
> +		return -EBUSY;
> +
> +	if (val <= MOCK_INJECT_TEST_MAX)
> +		poison_inject_dev_max = val;
> +	else
> +		return -EINVAL;
> +
> +	return len;
> +}
> +
> +static DRIVER_ATTR_RW(poison_inject_max);
> +
> +static struct attribute *cxl_mock_mem_core_attrs[] = {
> +	&driver_attr_poison_inject_max.attr,
> +	NULL
> +};
> +ATTRIBUTE_GROUPS(cxl_mock_mem_core);
> +
>   static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
>   {
>   	struct device *dev = cxlds->dev;
> @@ -1259,6 +1302,7 @@ static struct platform_driver cxl_mock_mem_driver = {
>   	.driver = {
>   		.name = KBUILD_MODNAME,
>   		.dev_groups = cxl_mock_mem_groups,
> +		.groups = cxl_mock_mem_core_groups,
>   	},
>   };
>   

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 12/12] tools/testing/cxl: Require CONFIG_DEBUG_FS
  2023-03-27  5:03 ` [PATCH v5 12/12] tools/testing/cxl: Require CONFIG_DEBUG_FS alison.schofield
@ 2023-03-31 19:20   ` Dave Jiang
  0 siblings, 0 replies; 35+ messages in thread
From: Dave Jiang @ 2023-03-31 19:20 UTC (permalink / raw)
  To: alison.schofield, Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky
  Cc: linux-cxl



On 3/26/23 10:03 PM, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> The cxl_mem driver uses debugfs to support poison inject and clear.
> Add debugfs to the list of required symbols so that cxl_test can
> emulate those poison operations.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>   tools/testing/cxl/config_check.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/tools/testing/cxl/config_check.c b/tools/testing/cxl/config_check.c
> index 99b56b5f6edf..0902c5d6e410 100644
> --- a/tools/testing/cxl/config_check.c
> +++ b/tools/testing/cxl/config_check.c
> @@ -13,4 +13,5 @@ void check(void)
>   	BUILD_BUG_ON(!IS_MODULE(CONFIG_CXL_PMEM));
>   	BUILD_BUG_ON(!IS_ENABLED(CONFIG_CXL_REGION_INVALIDATION_TEST));
>   	BUILD_BUG_ON(!IS_ENABLED(CONFIG_NVDIMM_SECURITY_TEST));
> +	BUILD_BUG_ON(!IS_ENABLED(CONFIG_DEBUG_FS));
>   }

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 02/12] cxl/memdev: Add support for the Clear Poison mailbox command
  2023-03-31 18:40   ` Dave Jiang
@ 2023-03-31 19:55     ` Alison Schofield
  2023-03-31 21:18       ` Dave Jiang
  0 siblings, 1 reply; 35+ messages in thread
From: Alison Schofield @ 2023-03-31 19:55 UTC (permalink / raw)
  To: Dave Jiang; +Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, linux-cxl

On Fri, Mar 31, 2023 at 11:40:01AM -0700, Dave Jiang wrote:
> 
> 
> On 3/26/23 10:03 PM, alison.schofield@intel.com wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > CXL devices optionally support the CLEAR POISON mailbox command. Add
> > memdev driver support for clearing poison.
> > 
> > Per the CXL Specification (3.0 8.2.9.8.4.3), after receiving a valid
> > clear poison request, the device removes the address from the device's
> > Poison List and writes 0 (zero) for 64 bytes starting at address. If
> > the device cannot clear poison from the address, it returns a permanent
> > media error and -ENXIO is returned to the user.
> > 
> > Additionally, and per the spec also, it is not an error to clear poison
> > of an address that is not poisoned. In this case, the device does not
> > overwrite the address and the device does not return an error.
> > 
> > If the address is not contained in the device's dpa resource, or is
> > not 64 byte aligned, return -EINVAL without issuing the mbox command.
> > 
> > Poison clearing is intended for debug only and will be exposed to
> > userspace through debugfs. Restrict compilation to CONFIG_DEBUG_FS.
> > 
> > Implementation note: Although the CXL specification defines the clear
> > command to accept 64 bytes of 'write-data' to be used when clearing
> > the poisoned address, this implementation always uses 0 (zeros) for
> > the write-data.
> > 
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > ---
> >   drivers/cxl/core/memdev.c | 43 +++++++++++++++++++++++++++++++++++++++
> >   drivers/cxl/cxlmem.h      |  7 +++++++
> >   2 files changed, 50 insertions(+)
> > 
> > diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> > index 3b3ac2868848..0e39c3c3fb09 100644
> > --- a/drivers/cxl/core/memdev.c
> > +++ b/drivers/cxl/core/memdev.c
> > @@ -268,6 +268,49 @@ int cxl_inject_poison(struct device *dev, u64 dpa)
> >   }
> >   EXPORT_SYMBOL_NS_GPL(cxl_inject_poison, CXL);
> > +int cxl_clear_poison(struct device *dev, u64 dpa)
> > +{
> > +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> > +	struct cxl_mbox_clear_poison clear;
> > +	struct cxl_mbox_cmd mbox_cmd;
> > +	int rc;
> > +
> > +	if (!IS_ENABLED(CONFIG_DEBUG_FS))
> > +		return 0;
> > +
> > +	down_read(&cxl_dpa_rwsem);
> > +	rc = cxl_validate_poison_dpa(cxlmd, dpa);
> > +	if (rc)
> > +		goto out;
> > +
> > +	/*
> > +	 * In CXL 3.0 Spec 8.2.9.8.4.3, the Clear Poison mailbox command
> > +	 * is defined to accept 64 bytes of 'write-data', along with the
> > +	 * address to clear. The device writes the data into the address
> > +	 * atomically, while clearing poison if the location is marked as
> > +	 * being poisoned.
> > +	 *
> > +	 * Always use '0' for the write-data.
> > +	 */
> > +	clear = (struct cxl_mbox_clear_poison) {
> > +		.address = cpu_to_le64(dpa)
> > +	};
> 
> The write_data[] should be 0s in order to clear the poison right? Since
> 'clear' is allocated on the stack, if it's not initialized then it would be
> random garbage in the data. You could just init all 'clear' members when you
> declare the variable at top if you like.

Declaring like this initializes any unspecified fields to zero.
This is the same initialization used across all the mbox_cmd setups
here and in core/mbox.c. 

Am I using that construct incorrectly?

> 
> DJ
> 
> > +
> > +	mbox_cmd = (struct cxl_mbox_cmd) {
> > +		.opcode = CXL_MBOX_OP_CLEAR_POISON,
> > +		.size_in = sizeof(clear),
> > +		.payload_in = &clear,
> > +	};
> > +
> > +	rc = cxl_internal_send_cmd(cxlmd->cxlds, &mbox_cmd);
> > +
> > +out:
> > +	up_read(&cxl_dpa_rwsem);
> > +
> > +	return rc;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(cxl_clear_poison, CXL);
> > +
> >   static struct attribute *cxl_memdev_attributes[] = {
> >   	&dev_attr_serial.attr,
> >   	&dev_attr_firmware_version.attr,
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index 527efef2d700..1d8677ab2306 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -607,6 +607,12 @@ struct cxl_mbox_inject_poison {
> >   	__le64 address;
> >   };
> > +/* Clear Poison  CXL 3.0 Spec 8.2.9.8.4.3 */
> > +struct cxl_mbox_clear_poison {
> > +	__le64 address;
> > +	u8 write_data[CXL_POISON_LEN_MULT];
> > +} __packed;
> > +
> >   /**
> >    * struct cxl_mem_command - Driver representation of a memory device command
> >    * @info: Command information as it exists for the UAPI
> > @@ -684,6 +690,7 @@ ssize_t cxl_trigger_poison_list(struct device *dev,
> >   				struct device_attribute *attr, const char *buf,
> >   				size_t len);
> >   int cxl_inject_poison(struct device *dev, u64 dpa);
> > +int cxl_clear_poison(struct device *dev, u64 dpa);
> >   #ifdef CONFIG_CXL_SUSPEND
> >   void cxl_mem_active_inc(void);

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 02/12] cxl/memdev: Add support for the Clear Poison mailbox command
  2023-03-31 19:55     ` Alison Schofield
@ 2023-03-31 21:18       ` Dave Jiang
  0 siblings, 0 replies; 35+ messages in thread
From: Dave Jiang @ 2023-03-31 21:18 UTC (permalink / raw)
  To: Alison Schofield
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, linux-cxl



On 3/31/23 12:55 PM, Alison Schofield wrote:
> On Fri, Mar 31, 2023 at 11:40:01AM -0700, Dave Jiang wrote:
>>
>>
>> On 3/26/23 10:03 PM, alison.schofield@intel.com wrote:
>>> From: Alison Schofield <alison.schofield@intel.com>
>>>
>>> CXL devices optionally support the CLEAR POISON mailbox command. Add
>>> memdev driver support for clearing poison.
>>>
>>> Per the CXL Specification (3.0 8.2.9.8.4.3), after receiving a valid
>>> clear poison request, the device removes the address from the device's
>>> Poison List and writes 0 (zero) for 64 bytes starting at address. If
>>> the device cannot clear poison from the address, it returns a permanent
>>> media error and -ENXIO is returned to the user.
>>>
>>> Additionally, and per the spec also, it is not an error to clear poison
>>> of an address that is not poisoned. In this case, the device does not
>>> overwrite the address and the device does not return an error.
>>>
>>> If the address is not contained in the device's dpa resource, or is
>>> not 64 byte aligned, return -EINVAL without issuing the mbox command.
>>>
>>> Poison clearing is intended for debug only and will be exposed to
>>> userspace through debugfs. Restrict compilation to CONFIG_DEBUG_FS.
>>>
>>> Implementation note: Although the CXL specification defines the clear
>>> command to accept 64 bytes of 'write-data' to be used when clearing
>>> the poisoned address, this implementation always uses 0 (zeros) for
>>> the write-data.
>>>
>>> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
>>> ---
>>>    drivers/cxl/core/memdev.c | 43 +++++++++++++++++++++++++++++++++++++++
>>>    drivers/cxl/cxlmem.h      |  7 +++++++
>>>    2 files changed, 50 insertions(+)
>>>
>>> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
>>> index 3b3ac2868848..0e39c3c3fb09 100644
>>> --- a/drivers/cxl/core/memdev.c
>>> +++ b/drivers/cxl/core/memdev.c
>>> @@ -268,6 +268,49 @@ int cxl_inject_poison(struct device *dev, u64 dpa)
>>>    }
>>>    EXPORT_SYMBOL_NS_GPL(cxl_inject_poison, CXL);
>>> +int cxl_clear_poison(struct device *dev, u64 dpa)
>>> +{
>>> +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
>>> +	struct cxl_mbox_clear_poison clear;
>>> +	struct cxl_mbox_cmd mbox_cmd;
>>> +	int rc;
>>> +
>>> +	if (!IS_ENABLED(CONFIG_DEBUG_FS))
>>> +		return 0;
>>> +
>>> +	down_read(&cxl_dpa_rwsem);
>>> +	rc = cxl_validate_poison_dpa(cxlmd, dpa);
>>> +	if (rc)
>>> +		goto out;
>>> +
>>> +	/*
>>> +	 * In CXL 3.0 Spec 8.2.9.8.4.3, the Clear Poison mailbox command
>>> +	 * is defined to accept 64 bytes of 'write-data', along with the
>>> +	 * address to clear. The device writes the data into the address
>>> +	 * atomically, while clearing poison if the location is marked as
>>> +	 * being poisoned.
>>> +	 *
>>> +	 * Always use '0' for the write-data.
>>> +	 */
>>> +	clear = (struct cxl_mbox_clear_poison) {
>>> +		.address = cpu_to_le64(dpa)
>>> +	};
>>
>> The write_data[] should be 0s in order to clear the poison right? Since
>> 'clear' is allocated on the stack, if it's not initialized then it would be
>> random garbage in the data. You could just init all 'clear' members when you
>> declare the variable at top if you like.
> 
> Declaring like this initializes any unspecified fields to zero.
> This is the same initialization used across all the mbox_cmd setups
> here and in core/mbox.c.

I thought you need to do:
	clear = (struct cxl_mbox_clear_poison) {
		.address = cpu_to_le64(dpa),
		.write_data = { 0 },
	};

I didn't think it would initialize the other members to 0 if you omit 
them? But my simple C code test seems to indicate otherwise. So sorry 
about the noise.

> 
> Am I using that construct incorrectly?
> 
>>
>> DJ
>>
>>> +
>>> +	mbox_cmd = (struct cxl_mbox_cmd) {
>>> +		.opcode = CXL_MBOX_OP_CLEAR_POISON,
>>> +		.size_in = sizeof(clear),
>>> +		.payload_in = &clear,
>>> +	};
>>> +
>>> +	rc = cxl_internal_send_cmd(cxlmd->cxlds, &mbox_cmd);
>>> +
>>> +out:
>>> +	up_read(&cxl_dpa_rwsem);
>>> +
>>> +	return rc;
>>> +}
>>> +EXPORT_SYMBOL_NS_GPL(cxl_clear_poison, CXL);
>>> +
>>>    static struct attribute *cxl_memdev_attributes[] = {
>>>    	&dev_attr_serial.attr,
>>>    	&dev_attr_firmware_version.attr,
>>> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
>>> index 527efef2d700..1d8677ab2306 100644
>>> --- a/drivers/cxl/cxlmem.h
>>> +++ b/drivers/cxl/cxlmem.h
>>> @@ -607,6 +607,12 @@ struct cxl_mbox_inject_poison {
>>>    	__le64 address;
>>>    };
>>> +/* Clear Poison  CXL 3.0 Spec 8.2.9.8.4.3 */
>>> +struct cxl_mbox_clear_poison {
>>> +	__le64 address;
>>> +	u8 write_data[CXL_POISON_LEN_MULT];
>>> +} __packed;
>>> +
>>>    /**
>>>     * struct cxl_mem_command - Driver representation of a memory device command
>>>     * @info: Command information as it exists for the UAPI
>>> @@ -684,6 +690,7 @@ ssize_t cxl_trigger_poison_list(struct device *dev,
>>>    				struct device_attribute *attr, const char *buf,
>>>    				size_t len);
>>>    int cxl_inject_poison(struct device *dev, u64 dpa);
>>> +int cxl_clear_poison(struct device *dev, u64 dpa);
>>>    #ifdef CONFIG_CXL_SUSPEND
>>>    void cxl_mem_active_inc(void);

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 02/12] cxl/memdev: Add support for the Clear Poison mailbox command
  2023-03-30 20:12     ` Alison Schofield
@ 2023-04-03 14:08       ` Jonathan Cameron
  0 siblings, 0 replies; 35+ messages in thread
From: Jonathan Cameron @ 2023-04-03 14:08 UTC (permalink / raw)
  To: Alison Schofield
  Cc: Williams, Dan J, Weiny, Ira, Verma, Vishal L, Ben Widawsky,
	Jiang, Dave, linux-cxl

On Thu, 30 Mar 2023 13:12:23 -0700
Alison Schofield <alison.schofield@intel.com> wrote:

> On Thu, Mar 30, 2023 at 11:50:18AM -0700, Jonathan Cameron wrote:
> > On Sun, 26 Mar 2023 22:03:08 -0700
> > alison.schofield@intel.com wrote:
> >   
> > > From: Alison Schofield <alison.schofield@intel.com>
> > > 
> > > CXL devices optionally support the CLEAR POISON mailbox command. Add
> > > memdev driver support for clearing poison.
> > > 
> > > Per the CXL Specification (3.0 8.2.9.8.4.3), after receiving a valid
> > > clear poison request, the device removes the address from the device's
> > > Poison List and writes 0 (zero) for 64 bytes starting at address. If
> > > the device cannot clear poison from the address, it returns a permanent
> > > media error and -ENXIO is returned to the user.
> > > 
> > > Additionally, and per the spec also, it is not an error to clear poison
> > > of an address that is not poisoned. In this case, the device does not
> > > overwrite the address and the device does not return an error.  
> > 
> > That's not inline with the spec.
> > 
> > "Clear Poison Write Data: The data the device shall always write into the
> > requested physical address, atomically, while clearing poison if the location
> > is marked as being poisoned."
> > 
> > The overwrite always happens whether or not it's poisoned.  
> 
> Jonathan,
> 
> I read that with an emphasis on that final 'if' clause:
> "The data the device shall always write (...blah blah blah...) if the
> location is marked as being poisoned.
> 
> So, if the location was not marked as being poisoned, the device won't
> write anything.
> 
> Which means, the user cannot use the Clear command to randomly write stuff
> wherever they please.
> 
> What do you think of that ? 

Clarification needed for the spec perhaps.  I'd argue the 'always' is there to make
it clear it does the write whether or not that condition is present. Otherwise
that word has no purpose in the sentence.  Hence the user can write random
data.  They can anyway if they have the ability to inject poison so I don't
see that mattering a lot.

Jonathan


> 
> Alison
> 
> 
> > Other than that
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >   
> > > 
> > > If the address is not contained in the device's dpa resource, or is
> > > not 64 byte aligned, return -EINVAL without issuing the mbox command.
> > > 
> > > Poison clearing is intended for debug only and will be exposed to
> > > userspace through debugfs. Restrict compilation to CONFIG_DEBUG_FS.
> > > 
> > > Implementation note: Although the CXL specification defines the clear
> > > command to accept 64 bytes of 'write-data' to be used when clearing
> > > the poisoned address, this implementation always uses 0 (zeros) for
> > > the write-data.
> > > 
> > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > > ---
> > >  drivers/cxl/core/memdev.c | 43 +++++++++++++++++++++++++++++++++++++++
> > >  drivers/cxl/cxlmem.h      |  7 +++++++
> > >  2 files changed, 50 insertions(+)
> > > 
> > > diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> > > index 3b3ac2868848..0e39c3c3fb09 100644
> > > --- a/drivers/cxl/core/memdev.c
> > > +++ b/drivers/cxl/core/memdev.c
> > > @@ -268,6 +268,49 @@ int cxl_inject_poison(struct device *dev, u64 dpa)
> > >  }
> > >  EXPORT_SYMBOL_NS_GPL(cxl_inject_poison, CXL);
> > >  
> > > +int cxl_clear_poison(struct device *dev, u64 dpa)
> > > +{
> > > +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> > > +	struct cxl_mbox_clear_poison clear;
> > > +	struct cxl_mbox_cmd mbox_cmd;
> > > +	int rc;
> > > +
> > > +	if (!IS_ENABLED(CONFIG_DEBUG_FS))
> > > +		return 0;
> > > +
> > > +	down_read(&cxl_dpa_rwsem);
> > > +	rc = cxl_validate_poison_dpa(cxlmd, dpa);
> > > +	if (rc)
> > > +		goto out;
> > > +
> > > +	/*
> > > +	 * In CXL 3.0 Spec 8.2.9.8.4.3, the Clear Poison mailbox command
> > > +	 * is defined to accept 64 bytes of 'write-data', along with the
> > > +	 * address to clear. The device writes the data into the address
> > > +	 * atomically, while clearing poison if the location is marked as
> > > +	 * being poisoned.  
> > 
> > This description is correct.
> >   
> > > +	 *
> > > +	 * Always use '0' for the write-data.
> > > +	 */
> > > +	clear = (struct cxl_mbox_clear_poison) {
> > > +		.address = cpu_to_le64(dpa)
> > > +	};
> > > +
> > > +	mbox_cmd = (struct cxl_mbox_cmd) {
> > > +		.opcode = CXL_MBOX_OP_CLEAR_POISON,
> > > +		.size_in = sizeof(clear),
> > > +		.payload_in = &clear,
> > > +	};
> > > +
> > > +	rc = cxl_internal_send_cmd(cxlmd->cxlds, &mbox_cmd);
> > > +
> > > +out:
> > > +	up_read(&cxl_dpa_rwsem);
> > > +
> > > +	return rc;
> > > +}
> > > +EXPORT_SYMBOL_NS_GPL(cxl_clear_poison, CXL);
> > > +
> > >  static struct attribute *cxl_memdev_attributes[] = {
> > >  	&dev_attr_serial.attr,
> > >  	&dev_attr_firmware_version.attr,
> > > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > > index 527efef2d700..1d8677ab2306 100644
> > > --- a/drivers/cxl/cxlmem.h
> > > +++ b/drivers/cxl/cxlmem.h
> > > @@ -607,6 +607,12 @@ struct cxl_mbox_inject_poison {
> > >  	__le64 address;
> > >  };
> > >  
> > > +/* Clear Poison  CXL 3.0 Spec 8.2.9.8.4.3 */
> > > +struct cxl_mbox_clear_poison {
> > > +	__le64 address;
> > > +	u8 write_data[CXL_POISON_LEN_MULT];
> > > +} __packed;
> > > +
> > >  /**
> > >   * struct cxl_mem_command - Driver representation of a memory device command
> > >   * @info: Command information as it exists for the UAPI
> > > @@ -684,6 +690,7 @@ ssize_t cxl_trigger_poison_list(struct device *dev,
> > >  				struct device_attribute *attr, const char *buf,
> > >  				size_t len);
> > >  int cxl_inject_poison(struct device *dev, u64 dpa);
> > > +int cxl_clear_poison(struct device *dev, u64 dpa);
> > >  
> > >  #ifdef CONFIG_CXL_SUSPEND
> > >  void cxl_mem_active_inc(void);  
> >   


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 03/12] cxl/memdev: Warn of poison inject or clear to a mapped region
  2023-03-30 18:55   ` Jonathan Cameron
@ 2023-04-11 17:43     ` Alison Schofield
  2023-04-13 17:07       ` Jonathan Cameron
  0 siblings, 1 reply; 35+ messages in thread
From: Alison Schofield @ 2023-04-11 17:43 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl

On Thu, Mar 30, 2023 at 07:55:46PM +0100, Jonathan Cameron wrote:
> On Sun, 26 Mar 2023 22:03:09 -0700
> alison.schofield@intel.com wrote:
> 
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > Inject and clear poison capabilities and intended for debug usage only.
> > In order to be useful in debug environments, the driver needs to allow
> > inject and clear operations on DPAs mapped in regions.
> > 
> > dev_warn_once() when either operation occurs.
> > 
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > ---
> >  drivers/cxl/core/memdev.c | 59 +++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 59 insertions(+)
> > 
> > diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> > index 0e39c3c3fb09..a83619c31f61 100644
> > --- a/drivers/cxl/core/memdev.c
> > +++ b/drivers/cxl/core/memdev.c

snip

> > +static int __cxl_dpa_to_region(struct device *dev, void *arg)
> > +{
> > +	struct cxl_dpa_to_region_context *ctx = arg;
> > +	struct cxl_endpoint_decoder *cxled;
> > +	u64 dpa = ctx->dpa;
> > +
> > +	if (!is_endpoint_decoder(dev))
> > +		return 0;
> > +
> > +	cxled = to_cxl_endpoint_decoder(dev);
> > +	if (!cxled->dpa_res || !resource_size(cxled->dpa_res))
> > +		return 0;
> > +
> > +	if (dpa > cxled->dpa_res->end || dpa < cxled->dpa_res->start)
> > +		return 0;
> > +
> > +	dev_dbg(dev, "dpa:0x%llx mapped in region:%s\n", dpa,
> > +		dev_name(&cxled->cxld.region->dev));
> > +
> > +	ctx->cxlr = cxled->cxld.region;
> > +
> If we have a match, little point in letting walk continue.
> 
> return 1;

Yes, thanks!  Returning 1 now to stop the walk.

> 
> Also, I "think" we just know that the association has been built.
> Injecting poison is probably still fine if the region / decoder hasn't yet
> been committed.

I think you are right. If we want to allow inject in the space between
mapping and commit, then this work needs to move to the region driver,
similar to how cxl_get_poison_by_endpoint() in the get poison list
series works.

I'm not seeing how injecting poison in that gap, would be an important
debug scenario. Is it?

Alison

> 
> Jonathan
> 
> 
> > +	return 0;
> > +}
> > +
> > +static struct cxl_region *cxl_dpa_to_region(struct cxl_memdev *cxlmd, u64 dpa)
> > +{
> > +	struct cxl_dpa_to_region_context ctx;
> > +	struct cxl_port *port;
> > +
> > +	ctx = (struct cxl_dpa_to_region_context) {
> > +		.dpa = dpa,
> > +	};
> > +	port = dev_get_drvdata(&cxlmd->dev);
> > +	if (port && is_cxl_endpoint(port) && port->commit_end != -1)
> > +		device_for_each_child(&port->dev, &ctx, __cxl_dpa_to_region);
> > +
> > +	return ctx.cxlr;
> > +}
> > +
> >  static int cxl_validate_poison_dpa(struct cxl_memdev *cxlmd, u64 dpa)
> >  {
> >  	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > @@ -242,6 +286,7 @@ int cxl_inject_poison(struct device *dev, u64 dpa)
> >  	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> >  	struct cxl_mbox_inject_poison inject;
> >  	struct cxl_mbox_cmd mbox_cmd;
> > +	struct cxl_region *cxlr;
> >  	int rc;
> >  
> >  	if (!IS_ENABLED(CONFIG_DEBUG_FS))
> > @@ -261,6 +306,13 @@ int cxl_inject_poison(struct device *dev, u64 dpa)
> >  		.payload_in = &inject,
> >  	};
> >  	rc = cxl_internal_send_cmd(cxlmd->cxlds, &mbox_cmd);
> > +	if (rc)
> > +		goto out;
> > +
> > +	cxlr = cxl_dpa_to_region(cxlmd, dpa);
> > +	if (cxlr)
> > +		dev_warn_once(dev, "poison inject dpa:0x%llx region: %s\n",
> > +			      dpa, dev_name(&cxlr->dev));
> >  out:
> >  	up_read(&cxl_dpa_rwsem);
> >  
> > @@ -273,6 +325,7 @@ int cxl_clear_poison(struct device *dev, u64 dpa)
> >  	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> >  	struct cxl_mbox_clear_poison clear;
> >  	struct cxl_mbox_cmd mbox_cmd;
> > +	struct cxl_region *cxlr;
> >  	int rc;
> >  
> >  	if (!IS_ENABLED(CONFIG_DEBUG_FS))
> > @@ -303,7 +356,13 @@ int cxl_clear_poison(struct device *dev, u64 dpa)
> >  	};
> >  
> >  	rc = cxl_internal_send_cmd(cxlmd->cxlds, &mbox_cmd);
> > +	if (rc)
> > +		goto out;
> >  
> > +	cxlr = cxl_dpa_to_region(cxlmd, dpa);
> > +	if (cxlr)
> > +		dev_warn_once(dev, "poison clear dpa:0x%llx region: %s\n",
> > +			      dpa, dev_name(&cxlr->dev));
> >  out:
> >  	up_read(&cxl_dpa_rwsem);
> >  
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 03/12] cxl/memdev: Warn of poison inject or clear to a mapped region
  2023-04-11 17:43     ` Alison Schofield
@ 2023-04-13 17:07       ` Jonathan Cameron
  0 siblings, 0 replies; 35+ messages in thread
From: Jonathan Cameron @ 2023-04-13 17:07 UTC (permalink / raw)
  To: Alison Schofield
  Cc: Dan Williams, Ira Weiny, Vishal Verma, Ben Widawsky, Dave Jiang,
	linux-cxl


> > Also, I "think" we just know that the association has been built.
> > Injecting poison is probably still fine if the region / decoder hasn't yet
> > been committed.  
> 
> I think you are right. If we want to allow inject in the space between
> mapping and commit, then this work needs to move to the region driver,
> similar to how cxl_get_poison_by_endpoint() in the get poison list
> series works.
> 
> I'm not seeing how injecting poison in that gap, would be an important
> debug scenario. Is it?
> 

Probably not ;) Maybe a comment to say that this is being conservative by
preventing it earlier than strictly necessary.

If this merged whilst I wasn't paying attention no need to add one.

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2023-04-13 17:08 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-27  5:03 [PATCH v5 0/12] cxl: CXL Inject & Clear Poison alison.schofield
2023-03-27  5:03 ` [PATCH v5 01/12] cxl/memdev: Add support for the Inject Poison mailbox command alison.schofield
2023-03-30 18:47   ` Jonathan Cameron
2023-03-31 18:11   ` Dave Jiang
2023-03-31 18:52     ` Alison Schofield
2023-03-27  5:03 ` [PATCH v5 02/12] cxl/memdev: Add support for the Clear " alison.schofield
2023-03-30 18:50   ` Jonathan Cameron
2023-03-30 20:12     ` Alison Schofield
2023-04-03 14:08       ` Jonathan Cameron
2023-03-31 18:40   ` Dave Jiang
2023-03-31 19:55     ` Alison Schofield
2023-03-31 21:18       ` Dave Jiang
2023-03-27  5:03 ` [PATCH v5 03/12] cxl/memdev: Warn of poison inject or clear to a mapped region alison.schofield
2023-03-30 18:55   ` Jonathan Cameron
2023-04-11 17:43     ` Alison Schofield
2023-04-13 17:07       ` Jonathan Cameron
2023-03-27  5:03 ` [PATCH v5 04/12] cxl/memdev: Trace inject and clear poison as cxl_poison events alison.schofield
2023-03-30 19:03   ` Jonathan Cameron
2023-03-31 18:53   ` Dave Jiang
2023-03-27  5:03 ` [PATCH v5 05/12] cxl/mem: Add debugfs attributes for poison inject and clear alison.schofield
2023-03-30 18:58   ` Jonathan Cameron
2023-03-27  5:03 ` [PATCH v5 06/12] cxl/memdev: Make inject and clear poison cmds kernel exclusive alison.schofield
2023-03-31 19:10   ` Dave Jiang
2023-03-27  5:03 ` [PATCH v5 07/12] cxl/mbox: Block inject and clear poison opcodes in raw mode alison.schofield
2023-03-31 19:10   ` Dave Jiang
2023-03-27  5:03 ` [PATCH v5 08/12] tools/testing/cxl: Mock the Inject Poison mailbox command alison.schofield
2023-03-31 19:13   ` Dave Jiang
2023-03-27  5:03 ` [PATCH v5 09/12] tools/testing/cxl: Mock the Clear " alison.schofield
2023-03-31 19:15   ` Dave Jiang
2023-03-27  5:03 ` [PATCH v5 10/12] tools/testing/cxl: Use injected poison for get poison list alison.schofield
2023-03-31 19:16   ` Dave Jiang
2023-03-27  5:03 ` [PATCH v5 11/12] tools/testing/cxl: Add a sysfs attr to test poison inject limits alison.schofield
2023-03-31 19:18   ` Dave Jiang
2023-03-27  5:03 ` [PATCH v5 12/12] tools/testing/cxl: Require CONFIG_DEBUG_FS alison.schofield
2023-03-31 19:20   ` Dave Jiang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).