nvdimm.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
* [ndctl PATCH v12 0/8] Support poison list retrieval
@ 2024-03-27 19:52 alison.schofield
  2024-03-27 19:52 ` [ndctl PATCH v12 1/8] util/trace: move trace helpers from ndctl/cxl/ to ndctl/util/ alison.schofield
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: alison.schofield @ 2024-03-27 19:52 UTC (permalink / raw)
  To: Vishal Verma; +Cc: Alison Schofield, nvdimm, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

Changes since v11:
- Remove needless rc init (DaveJ)
- Update man page examples (Wonjae, Dan)
- Replace fprintf() w err() in libcxl.c  (Fan)
- Update stale --poison comment in unit test (Wonjae)
- Move ndctl/cxl/event_trace.c/.h to ndctl/util/  (Dan)
- Constify the pointer in poison_source array declaration
- Move the json flags from the poison_ctx to event_ctx
- Move intro of poison_ctx to parsing patch, Patch 6
- Add unsupported feature err() message in libcxl.c 
- v11: https://lore.kernel.org/cover.1710386468.git.alison.schofield@intel.com/


Begin cover letter:
Add the option to add a memory devices poison list to the cxl-list
json output. Offer the option by memdev and by region. 

From the man page cxl-list:

       -L, --media-errors
           Include media-error information. The poison list is retrieved from
           the device(s) and media_error records are added to the listing.
           Apply this option to memdevs and regions where devices support the
           poison list capability. "offset:" is relative to the region
           resource when listing by region and is the absolute device DPA when
           listing by memdev. "source:" is one of: External, Internal,
           Injected, Vendor Specific, or Unknown, as defined in CXL
           Specification v3.1 Table 8-140.

           # cxl list -m mem9 --media-errors -u
           {
             "memdev":"mem9",
             "pmem_size":"1024.00 MiB (1073.74 MB)",
             "pmem_qos_class":42,
             "ram_size":"1024.00 MiB (1073.74 MB)",
             "ram_qos_class":42,
             "serial":"0x5",
             "numa_node":1,
             "host":"cxl_mem.5",
             "media_errors":[
               {
                 "offset":"0x40000000",
                 "length":64,
                 "source":"Injected"
               }
             ]
           }

           # cxl list -r region5 --media-errors -u
           {
             "region":"region5",
             "resource":"0xf110000000",
             "size":"2.00 GiB (2.15 GB)",
             "type":"pmem",
             "interleave_ways":2,
             "interleave_granularity":4096,
             "decode_state":"commit",
             "media_errors":[
               {
                 "offset":"0x1000",
                 "length":64,
                 "source":"Injected"
               },
               {
                 "offset":"0x2000",
                 "length":64,
                 "source":"Injected"
               }
             ]
           }

           More complex cxl list queries can be created by using cxl list object
           and filtering options. The first example below emits all the endpoint
           ports with their decoders and memdevs with media-errors. The second
           example filters that output further by a single memdev.

              # cxl list -DEM -p endpoint --media-errors
              # cxl list -DEM -p mem9 --media-errors


Alison Schofield (8):
  util/trace: move trace helpers from ndctl/cxl/ to ndctl/util/
  util/trace: add an optional pid check to event parsing
  util/trace: pass an event_ctx to its own parse_event method
  util/trace: add helpers to retrieve tep fields by type
  libcxl: add interfaces for GET_POISON_LIST mailbox commands
  cxl/list: collect and parse media_error records
  cxl/list: add --media-errors option to cxl list
  cxl/test: add cxl-poison.sh unit test

 Documentation/cxl/cxl-list.txt |  64 ++++++++++-
 cxl/event_trace.h              |  27 -----
 cxl/filter.h                   |   3 +
 cxl/json.c                     | 195 +++++++++++++++++++++++++++++++++
 cxl/lib/libcxl.c               |  53 +++++++++
 cxl/lib/libcxl.sym             |   2 +
 cxl/libcxl.h                   |   2 +
 cxl/list.c                     |   3 +
 cxl/meson.build                |   2 +-
 cxl/monitor.c                  |  11 +-
 test/cxl-poison.sh             | 137 +++++++++++++++++++++++
 test/meson.build               |   2 +
 {cxl => util}/event_trace.c    |  68 +++++++++---
 util/event_trace.h             |  42 +++++++
 14 files changed, 562 insertions(+), 49 deletions(-)
 delete mode 100644 cxl/event_trace.h
 create mode 100644 test/cxl-poison.sh
 rename {cxl => util}/event_trace.c (76%)
 create mode 100644 util/event_trace.h


base-commit: 5e9157d6721a878757f0fe8a3c51f06f9e94934a
-- 
2.37.3


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [ndctl PATCH v12 1/8] util/trace: move trace helpers from ndctl/cxl/ to ndctl/util/
  2024-03-27 19:52 [ndctl PATCH v12 0/8] Support poison list retrieval alison.schofield
@ 2024-03-27 19:52 ` alison.schofield
  2024-03-27 19:52 ` [ndctl PATCH v12 2/8] util/trace: add an optional pid check to event parsing alison.schofield
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: alison.schofield @ 2024-03-27 19:52 UTC (permalink / raw)
  To: Vishal Verma; +Cc: Alison Schofield, nvdimm, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

A set of helpers used to parse kernel trace events were introduced
in ndctl/cxl/ in support of the CXL monitor command. The work these
helpers perform may be useful beyond CXL.

Move them to the ndctl/util/ where other generic helpers reside.
Replace cxl-ish naming with generic names and update the single
user, cxl/monitor.c, to match.

This move is in preparation for extending the helpers in support
of cxl_poison trace events.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 cxl/meson.build             |  2 +-
 cxl/monitor.c               | 11 +++++------
 {cxl => util}/event_trace.c | 21 ++++++++++-----------
 {cxl => util}/event_trace.h | 12 ++++++------
 4 files changed, 22 insertions(+), 24 deletions(-)
 rename {cxl => util}/event_trace.c (88%)
 rename {cxl => util}/event_trace.h (61%)

diff --git a/cxl/meson.build b/cxl/meson.build
index 61b4d8762b42..e4d1683ce8c6 100644
--- a/cxl/meson.build
+++ b/cxl/meson.build
@@ -27,7 +27,7 @@ deps = [
 
 if get_option('libtracefs').enabled()
   cxl_src += [
-    'event_trace.c',
+    '../util/event_trace.c',
     'monitor.c',
   ]
   deps += [
diff --git a/cxl/monitor.c b/cxl/monitor.c
index a85452a4dc82..2066f984668d 100644
--- a/cxl/monitor.c
+++ b/cxl/monitor.c
@@ -28,8 +28,7 @@
 #define ENABLE_DEBUG
 #endif
 #include <util/log.h>
-
-#include "event_trace.h"
+#include <util/event_trace.h>
 
 static const char *cxl_system = "cxl";
 const char *default_log = "/var/log/cxl-monitor.log";
@@ -87,9 +86,9 @@ static int monitor_event(struct cxl_ctx *ctx)
 		goto epoll_ctl_err;
 	}
 
-	rc = cxl_event_tracing_enable(inst, cxl_system, NULL);
+	rc = trace_event_enable(inst, cxl_system, NULL);
 	if (rc < 0) {
-		err(&monitor, "cxl_trace_event_enable() failed: %d\n", rc);
+		err(&monitor, "trace_event_enable() failed: %d\n", rc);
 		goto event_en_err;
 	}
 
@@ -112,7 +111,7 @@ static int monitor_event(struct cxl_ctx *ctx)
 		}
 
 		list_head_init(&ectx.jlist_head);
-		rc = cxl_parse_events(inst, &ectx);
+		rc = trace_event_parse(inst, &ectx);
 		if (rc < 0)
 			goto parse_err;
 
@@ -129,7 +128,7 @@ static int monitor_event(struct cxl_ctx *ctx)
 	}
 
 parse_err:
-	if (cxl_event_tracing_disable(inst) < 0)
+	if (trace_event_disable(inst) < 0)
 		err(&monitor, "failed to disable tracing\n");
 event_en_err:
 epoll_ctl_err:
diff --git a/cxl/event_trace.c b/util/event_trace.c
similarity index 88%
rename from cxl/event_trace.c
rename to util/event_trace.c
index 1b5aa09de8b2..16013412bc06 100644
--- a/cxl/event_trace.c
+++ b/util/event_trace.c
@@ -59,8 +59,8 @@ static struct json_object *num_to_json(void *num, int elem_size, unsigned long f
 	return json_object_new_int64(val);
 }
 
-static int cxl_event_to_json(struct tep_event *event, struct tep_record *record,
-			     struct list_head *jlist_head)
+static int event_to_json(struct tep_event *event, struct tep_record *record,
+			 struct list_head *jlist_head)
 {
 	struct json_object *jevent, *jobj, *jarray;
 	struct tep_format_field **fields;
@@ -200,8 +200,8 @@ err_jnode:
 	return rc;
 }
 
-static int cxl_event_parse(struct tep_event *event, struct tep_record *record,
-			   int cpu, void *ctx)
+static int event_parse(struct tep_event *event, struct tep_record *record,
+		       int cpu, void *ctx)
 {
 	struct event_ctx *event_ctx = (struct event_ctx *)ctx;
 
@@ -218,10 +218,10 @@ static int cxl_event_parse(struct tep_event *event, struct tep_record *record,
 		return event_ctx->parse_event(event, record,
 					      &event_ctx->jlist_head);
 
-	return cxl_event_to_json(event, record, &event_ctx->jlist_head);
+	return event_to_json(event, record, &event_ctx->jlist_head);
 }
 
-int cxl_parse_events(struct tracefs_instance *inst, struct event_ctx *ectx)
+int trace_event_parse(struct tracefs_instance *inst, struct event_ctx *ectx)
 {
 	struct tep_handle *tep;
 	int rc;
@@ -230,14 +230,13 @@ int cxl_parse_events(struct tracefs_instance *inst, struct event_ctx *ectx)
 	if (!tep)
 		return -ENOMEM;
 
-	rc = tracefs_iterate_raw_events(tep, inst, NULL, 0, cxl_event_parse,
-					ectx);
+	rc = tracefs_iterate_raw_events(tep, inst, NULL, 0, event_parse, ectx);
 	tep_free(tep);
 	return rc;
 }
 
-int cxl_event_tracing_enable(struct tracefs_instance *inst, const char *system,
-		const char *event)
+int trace_event_enable(struct tracefs_instance *inst, const char *system,
+		       const char *event)
 {
 	int rc;
 
@@ -252,7 +251,7 @@ int cxl_event_tracing_enable(struct tracefs_instance *inst, const char *system,
 	return 0;
 }
 
-int cxl_event_tracing_disable(struct tracefs_instance *inst)
+int trace_event_disable(struct tracefs_instance *inst)
 {
 	return tracefs_trace_off(inst);
 }
diff --git a/cxl/event_trace.h b/util/event_trace.h
similarity index 61%
rename from cxl/event_trace.h
rename to util/event_trace.h
index ec6267202c8b..37c39aded871 100644
--- a/cxl/event_trace.h
+++ b/util/event_trace.h
@@ -1,7 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 /* Copyright (C) 2022 Intel Corporation. All rights reserved. */
-#ifndef __CXL_EVENT_TRACE_H__
-#define __CXL_EVENT_TRACE_H__
+#ifndef __UTIL_EVENT_TRACE_H__
+#define __UTIL_EVENT_TRACE_H__
 
 #include <json-c/json.h>
 #include <ccan/list/list.h>
@@ -19,9 +19,9 @@ struct event_ctx {
 			   struct list_head *jlist_head); /* optional */
 };
 
-int cxl_parse_events(struct tracefs_instance *inst, struct event_ctx *ectx);
-int cxl_event_tracing_enable(struct tracefs_instance *inst, const char *system,
-		const char *event);
-int cxl_event_tracing_disable(struct tracefs_instance *inst);
+int trace_event_parse(struct tracefs_instance *inst, struct event_ctx *ectx);
+int trace_event_enable(struct tracefs_instance *inst, const char *system,
+		       const char *event);
+int trace_event_disable(struct tracefs_instance *inst);
 
 #endif
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [ndctl PATCH v12 2/8] util/trace: add an optional pid check to event parsing
  2024-03-27 19:52 [ndctl PATCH v12 0/8] Support poison list retrieval alison.schofield
  2024-03-27 19:52 ` [ndctl PATCH v12 1/8] util/trace: move trace helpers from ndctl/cxl/ to ndctl/util/ alison.schofield
@ 2024-03-27 19:52 ` alison.schofield
  2024-03-27 19:52 ` [ndctl PATCH v12 3/8] util/trace: pass an event_ctx to its own parse_event method alison.schofield
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: alison.schofield @ 2024-03-27 19:52 UTC (permalink / raw)
  To: Vishal Verma
  Cc: Alison Schofield, nvdimm, linux-cxl, Jonathan Cameron, Dave Jiang

From: Alison Schofield <alison.schofield@intel.com>

When parsing events, callers may only be interested in events
that originate from the current process. Introduce an optional
argument to the event trace context: event_pid. When event_pid is
present, simply skip the parsing of events without a matching pid.
It is not a failure to see other, non matching events.

The initial use case for this is CXL device poison listings where
only the media-error records requested by this process are wanted.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
 util/event_trace.c | 5 +++++
 util/event_trace.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/util/event_trace.c b/util/event_trace.c
index 16013412bc06..57318e2adace 100644
--- a/util/event_trace.c
+++ b/util/event_trace.c
@@ -214,6 +214,11 @@ static int event_parse(struct tep_event *event, struct tep_record *record,
 			return 0;
 	}
 
+	if (event_ctx->event_pid) {
+		if (event_ctx->event_pid != tep_data_pid(event->tep, record))
+			return 0;
+	}
+
 	if (event_ctx->parse_event)
 		return event_ctx->parse_event(event, record,
 					      &event_ctx->jlist_head);
diff --git a/util/event_trace.h b/util/event_trace.h
index 37c39aded871..6586e1dc254d 100644
--- a/util/event_trace.h
+++ b/util/event_trace.h
@@ -15,6 +15,7 @@ struct event_ctx {
 	const char *system;
 	struct list_head jlist_head;
 	const char *event_name; /* optional */
+	int event_pid; /* optional */
 	int (*parse_event)(struct tep_event *event, struct tep_record *record,
 			   struct list_head *jlist_head); /* optional */
 };
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [ndctl PATCH v12 3/8] util/trace: pass an event_ctx to its own parse_event method
  2024-03-27 19:52 [ndctl PATCH v12 0/8] Support poison list retrieval alison.schofield
  2024-03-27 19:52 ` [ndctl PATCH v12 1/8] util/trace: move trace helpers from ndctl/cxl/ to ndctl/util/ alison.schofield
  2024-03-27 19:52 ` [ndctl PATCH v12 2/8] util/trace: add an optional pid check to event parsing alison.schofield
@ 2024-03-27 19:52 ` alison.schofield
  2024-03-27 19:52 ` [ndctl PATCH v12 4/8] util/trace: add helpers to retrieve tep fields by type alison.schofield
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: alison.schofield @ 2024-03-27 19:52 UTC (permalink / raw)
  To: Vishal Verma; +Cc: Alison Schofield, nvdimm, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

Tidy-up the calling convention used in trace event parsing by
passing the entire event_ctx to its parse_event method. This
makes it explicit that a parse_event operates on an event_ctx
object and it allows the parse_event function to access any
members of the event_ctx structure.

This is in preparation for adding a private parser requiring more
context for cxl_poison events.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 util/event_trace.c | 9 ++++-----
 util/event_trace.h | 2 +-
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/util/event_trace.c b/util/event_trace.c
index 57318e2adace..1f5c180a030b 100644
--- a/util/event_trace.c
+++ b/util/event_trace.c
@@ -60,7 +60,7 @@ static struct json_object *num_to_json(void *num, int elem_size, unsigned long f
 }
 
 static int event_to_json(struct tep_event *event, struct tep_record *record,
-			 struct list_head *jlist_head)
+			 struct event_ctx *ctx)
 {
 	struct json_object *jevent, *jobj, *jarray;
 	struct tep_format_field **fields;
@@ -190,7 +190,7 @@ static int event_to_json(struct tep_event *event, struct tep_record *record,
 		}
 	}
 
-	list_add_tail(jlist_head, &jnode->list);
+	list_add_tail(&ctx->jlist_head, &jnode->list);
 	return 0;
 
 err_jevent:
@@ -220,10 +220,9 @@ static int event_parse(struct tep_event *event, struct tep_record *record,
 	}
 
 	if (event_ctx->parse_event)
-		return event_ctx->parse_event(event, record,
-					      &event_ctx->jlist_head);
+		return event_ctx->parse_event(event, record, event_ctx);
 
-	return event_to_json(event, record, &event_ctx->jlist_head);
+	return event_to_json(event, record, event_ctx);
 }
 
 int trace_event_parse(struct tracefs_instance *inst, struct event_ctx *ectx)
diff --git a/util/event_trace.h b/util/event_trace.h
index 6586e1dc254d..9c53eba7533f 100644
--- a/util/event_trace.h
+++ b/util/event_trace.h
@@ -17,7 +17,7 @@ struct event_ctx {
 	const char *event_name; /* optional */
 	int event_pid; /* optional */
 	int (*parse_event)(struct tep_event *event, struct tep_record *record,
-			   struct list_head *jlist_head); /* optional */
+			   struct event_ctx *ctx);
 };
 
 int trace_event_parse(struct tracefs_instance *inst, struct event_ctx *ectx);
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [ndctl PATCH v12 4/8] util/trace: add helpers to retrieve tep fields by type
  2024-03-27 19:52 [ndctl PATCH v12 0/8] Support poison list retrieval alison.schofield
                   ` (2 preceding siblings ...)
  2024-03-27 19:52 ` [ndctl PATCH v12 3/8] util/trace: pass an event_ctx to its own parse_event method alison.schofield
@ 2024-03-27 19:52 ` alison.schofield
  2024-03-27 19:52 ` [ndctl PATCH v12 5/8] libcxl: add interfaces for GET_POISON_LIST mailbox commands alison.schofield
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: alison.schofield @ 2024-03-27 19:52 UTC (permalink / raw)
  To: Vishal Verma; +Cc: Alison Schofield, nvdimm, linux-cxl, Dave Jiang, Fan Ni

From: Alison Schofield <alison.schofield@intel.com>

Add helpers to extract the value of an event record field given the
field name. This is useful when the user knows the name and format
of the field and simply needs to get it. The helpers also return
the 'type'_MAX of the type when the field is

Since this is in preparation for adding a cxl_poison private parser
for 'cxl list --media-errors' support those specific required
types: u8, u32, u64.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
---
 util/event_trace.c | 37 +++++++++++++++++++++++++++++++++++++
 util/event_trace.h |  8 +++++++-
 2 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/util/event_trace.c b/util/event_trace.c
index 1f5c180a030b..bde3a76adfbf 100644
--- a/util/event_trace.c
+++ b/util/event_trace.c
@@ -15,6 +15,43 @@
 #define _GNU_SOURCE
 #include <string.h>
 
+u64 trace_get_field_u64(struct tep_event *event, struct tep_record *record,
+			const char *name)
+{
+	unsigned long long val;
+
+	if (tep_get_field_val(NULL, event, name, record, &val, 0))
+		return ULLONG_MAX;
+
+	return val;
+}
+
+u32 trace_get_field_u32(struct tep_event *event, struct tep_record *record,
+			const char *name)
+{
+	char *val;
+	int len;
+
+	val = tep_get_field_raw(NULL, event, name, record, &len, 0);
+	if (!val)
+		return UINT_MAX;
+
+	return *(u32 *)val;
+}
+
+u8 trace_get_field_u8(struct tep_event *event, struct tep_record *record,
+		      const char *name)
+{
+	char *val;
+	int len;
+
+	val = tep_get_field_raw(NULL, event, name, record, &len, 0);
+	if (!val)
+		return UCHAR_MAX;
+
+	return *(u8 *)val;
+}
+
 static struct json_object *num_to_json(void *num, int elem_size, unsigned long flags)
 {
 	bool sign = flags & TEP_FIELD_IS_SIGNED;
diff --git a/util/event_trace.h b/util/event_trace.h
index 9c53eba7533f..4d498577a00f 100644
--- a/util/event_trace.h
+++ b/util/event_trace.h
@@ -5,6 +5,7 @@
 
 #include <json-c/json.h>
 #include <ccan/list/list.h>
+#include <ccan/short_types/short_types.h>
 
 struct jlist_node {
 	struct json_object *jobj;
@@ -24,5 +25,10 @@ int trace_event_parse(struct tracefs_instance *inst, struct event_ctx *ectx);
 int trace_event_enable(struct tracefs_instance *inst, const char *system,
 		       const char *event);
 int trace_event_disable(struct tracefs_instance *inst);
-
+u8 trace_get_field_u8(struct tep_event *event, struct tep_record *record,
+		      const char *name);
+u32 trace_get_field_u32(struct tep_event *event, struct tep_record *record,
+			const char *name);
+u64 trace_get_field_u64(struct tep_event *event, struct tep_record *record,
+			const char *name);
 #endif
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [ndctl PATCH v12 5/8] libcxl: add interfaces for GET_POISON_LIST mailbox commands
  2024-03-27 19:52 [ndctl PATCH v12 0/8] Support poison list retrieval alison.schofield
                   ` (3 preceding siblings ...)
  2024-03-27 19:52 ` [ndctl PATCH v12 4/8] util/trace: add helpers to retrieve tep fields by type alison.schofield
@ 2024-03-27 19:52 ` alison.schofield
  2024-03-27 19:52 ` [ndctl PATCH v12 6/8] cxl/list: collect and parse media_error records alison.schofield
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: alison.schofield @ 2024-03-27 19:52 UTC (permalink / raw)
  To: Vishal Verma; +Cc: Alison Schofield, nvdimm, linux-cxl, Dave Jiang

From: Alison Schofield <alison.schofield@intel.com>

CXL devices maintain a list of locations that are poisoned or result
in poison if the addresses are accessed by the host.

Per the spec (CXL 3.1 8.2.9.9.4.1), the device returns the Poison
List as a set of  Media Error Records that include the source of the
error, the starting device physical address and length.

Trigger the retrieval of the poison list by writing to the memory
device sysfs attribute: trigger_poison_list. The CXL driver only
offers triggering per memdev, so the trigger by region interface
offered here is a convenience API that triggers a poison list
retrieval for each memdev contributing to a region.

int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev);
int cxl_region_trigger_poison_list(struct cxl_region *region);

The resulting poison records are logged as kernel trace events
named 'cxl_poison'.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
 cxl/lib/libcxl.c   | 53 ++++++++++++++++++++++++++++++++++++++++++++++
 cxl/lib/libcxl.sym |  2 ++
 cxl/libcxl.h       |  2 ++
 3 files changed, 57 insertions(+)

diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
index ff27cdf7c44a..a8ce521fdcf9 100644
--- a/cxl/lib/libcxl.c
+++ b/cxl/lib/libcxl.c
@@ -1761,6 +1761,59 @@ CXL_EXPORT int cxl_memdev_disable_invalidate(struct cxl_memdev *memdev)
 	return 0;
 }
 
+CXL_EXPORT int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev)
+{
+	struct cxl_ctx *ctx = cxl_memdev_get_ctx(memdev);
+	char *path = memdev->dev_buf;
+	int len = memdev->buf_len, rc;
+
+	if (snprintf(path, len, "%s/trigger_poison_list",
+		     memdev->dev_path) >= len) {
+		err(ctx, "%s: buffer too small\n",
+		    cxl_memdev_get_devname(memdev));
+		return -ENXIO;
+	}
+
+	if (access(path, F_OK) != 0) {
+		err(ctx, "%s: trigger_poison_list unsupported by device\n",
+		    cxl_memdev_get_devname(memdev));
+		return -ENXIO;
+	}
+
+	rc = sysfs_write_attr(ctx, path, "1\n");
+	if (rc < 0) {
+		err(ctx, "%s: Failed trigger_poison_list\n",
+		    cxl_memdev_get_devname(memdev));
+		return rc;
+	}
+	return 0;
+}
+
+CXL_EXPORT int cxl_region_trigger_poison_list(struct cxl_region *region)
+{
+	struct cxl_memdev_mapping *mapping;
+	int rc;
+
+	cxl_mapping_foreach(region, mapping) {
+		struct cxl_decoder *decoder;
+		struct cxl_memdev *memdev;
+
+		decoder = cxl_mapping_get_decoder(mapping);
+		if (!decoder)
+			continue;
+
+		memdev = cxl_decoder_get_memdev(decoder);
+		if (!memdev)
+			continue;
+
+		rc = cxl_memdev_trigger_poison_list(memdev);
+		if (rc)
+			return rc;
+	}
+
+	return 0;
+}
+
 CXL_EXPORT int cxl_memdev_enable(struct cxl_memdev *memdev)
 {
 	struct cxl_ctx *ctx = cxl_memdev_get_ctx(memdev);
diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym
index de2cd84b2960..3f709c60db3d 100644
--- a/cxl/lib/libcxl.sym
+++ b/cxl/lib/libcxl.sym
@@ -280,4 +280,6 @@ global:
 	cxl_memdev_get_pmem_qos_class;
 	cxl_memdev_get_ram_qos_class;
 	cxl_region_qos_class_mismatch;
+	cxl_memdev_trigger_poison_list;
+	cxl_region_trigger_poison_list;
 } LIBCXL_6;
diff --git a/cxl/libcxl.h b/cxl/libcxl.h
index a6af3fb04693..29165043ca3f 100644
--- a/cxl/libcxl.h
+++ b/cxl/libcxl.h
@@ -467,6 +467,8 @@ enum cxl_setpartition_mode {
 
 int cxl_cmd_partition_set_mode(struct cxl_cmd *cmd,
 		enum cxl_setpartition_mode mode);
+int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev);
+int cxl_region_trigger_poison_list(struct cxl_region *region);
 
 int cxl_cmd_alert_config_set_life_used_prog_warn_threshold(struct cxl_cmd *cmd,
 							   int threshold);
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [ndctl PATCH v12 6/8] cxl/list: collect and parse media_error records
  2024-03-27 19:52 [ndctl PATCH v12 0/8] Support poison list retrieval alison.schofield
                   ` (4 preceding siblings ...)
  2024-03-27 19:52 ` [ndctl PATCH v12 5/8] libcxl: add interfaces for GET_POISON_LIST mailbox commands alison.schofield
@ 2024-03-27 19:52 ` alison.schofield
  2024-03-27 19:52 ` [ndctl PATCH v12 7/8] cxl/list: add --media-errors option to cxl list alison.schofield
  2024-03-27 19:52 ` [ndctl PATCH v12 8/8] cxl/test: add cxl-poison.sh unit test alison.schofield
  7 siblings, 0 replies; 9+ messages in thread
From: alison.schofield @ 2024-03-27 19:52 UTC (permalink / raw)
  To: Vishal Verma; +Cc: Alison Schofield, nvdimm, linux-cxl, Dave Jiang

From: Alison Schofield <alison.schofield@intel.com>

Media_error records are logged as events in the kernel tracing
subsystem. To prepare the media_error records for cxl list, enable
tracing, trigger the poison list read, and parse the generated
cxl_poison events into a json representation.

Use the event_trace private parsing option to customize the json
representation based on cxl-list calling options and event field
settings.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
 cxl/json.c         | 195 +++++++++++++++++++++++++++++++++++++++++++++
 util/event_trace.h |   8 ++
 2 files changed, 203 insertions(+)

diff --git a/cxl/json.c b/cxl/json.c
index fbe41c78e82a..e3f54cc2568c 100644
--- a/cxl/json.c
+++ b/cxl/json.c
@@ -1,16 +1,20 @@
 // SPDX-License-Identifier: GPL-2.0
 // Copyright (C) 2015-2021 Intel Corporation. All rights reserved.
 #include <limits.h>
+#include <errno.h>
 #include <util/json.h>
+#include <util/bitmap.h>
 #include <uuid/uuid.h>
 #include <cxl/libcxl.h>
 #include <json-c/json.h>
 #include <json-c/printbuf.h>
 #include <ccan/short_types/short_types.h>
+#include <tracefs/tracefs.h>
 
 #include "filter.h"
 #include "json.h"
 #include "../daxctl/json.h"
+#include "../util/event_trace.h"
 
 #define CXL_FW_VERSION_STR_LEN	16
 #define CXL_FW_MAX_SLOTS	4
@@ -571,6 +575,185 @@ err_jobj:
 	return NULL;
 }
 
+/* CXL Spec 3.1 Table 8-140 Media Error Record */
+#define CXL_POISON_SOURCE_MAX 7
+static const char * const poison_source[] = { "Unknown", "External", "Internal",
+					     "Injected", "Reserved", "Reserved",
+					     "Reserved", "Vendor" };
+
+/* CXL Spec 3.1 Table 8-139 Get Poison List Output Payload */
+#define CXL_POISON_FLAG_MORE BIT(0)
+#define CXL_POISON_FLAG_OVERFLOW BIT(1)
+#define CXL_POISON_FLAG_SCANNING BIT(2)
+
+static int poison_event_to_json(struct tep_event *event,
+				struct tep_record *record,
+				struct event_ctx *e_ctx)
+{
+	struct cxl_poison_ctx *p_ctx = e_ctx->poison_ctx;
+	struct json_object *jp, *jobj, *jpoison = p_ctx->jpoison;
+	struct cxl_memdev *memdev = p_ctx->memdev;
+	struct cxl_region *region = p_ctx->region;
+	unsigned long flags = e_ctx->json_flags;
+	const char *region_name = NULL;
+	char flag_str[32] = { '\0' };
+	bool overflow = false;
+	u8 source, pflags;
+	u64 offset, ts;
+	u32 length;
+	char *str;
+	int len;
+
+	jp = json_object_new_object();
+	if (!jp)
+		return -ENOMEM;
+
+	/* Skip records not in this region when listing by region */
+	if (region)
+		region_name = cxl_region_get_devname(region);
+	if (region_name)
+		str = tep_get_field_raw(NULL, event, "region", record, &len, 0);
+	if ((region_name) && (strcmp(region_name, str) != 0)) {
+		json_object_put(jp);
+		return 0;
+	}
+	/* Include offset,length by region (hpa) or by memdev (dpa) */
+	if (region) {
+		offset = trace_get_field_u64(event, record, "hpa");
+		if (offset != ULLONG_MAX) {
+			offset = offset - cxl_region_get_resource(region);
+			jobj = util_json_object_hex(offset, flags);
+			if (jobj)
+				json_object_object_add(jp, "offset", jobj);
+		}
+	} else if (memdev) {
+		offset = trace_get_field_u64(event, record, "dpa");
+		if (offset != ULLONG_MAX) {
+			jobj = util_json_object_hex(offset, flags);
+			if (jobj)
+				json_object_object_add(jp, "offset", jobj);
+		}
+	}
+	length = trace_get_field_u32(event, record, "dpa_length");
+	jobj = util_json_object_size(length, flags);
+	if (jobj)
+		json_object_object_add(jp, "length", jobj);
+
+	/* Always include the poison source */
+	source = trace_get_field_u8(event, record, "source");
+	if (source <= CXL_POISON_SOURCE_MAX)
+		jobj = json_object_new_string(poison_source[source]);
+	else
+		jobj = json_object_new_string("Reserved");
+	if (jobj)
+		json_object_object_add(jp, "source", jobj);
+
+	/* Include flags and overflow time if present */
+	pflags = trace_get_field_u8(event, record, "flags");
+	if (pflags && pflags < UCHAR_MAX) {
+		if (pflags & CXL_POISON_FLAG_MORE)
+			strcat(flag_str, "More,");
+		if (pflags & CXL_POISON_FLAG_SCANNING)
+			strcat(flag_str, "Scanning,");
+		if (pflags & CXL_POISON_FLAG_OVERFLOW) {
+			strcat(flag_str, "Overflow,");
+			overflow = true;
+		}
+		jobj = json_object_new_string(flag_str);
+		if (jobj)
+			json_object_object_add(jp, "flags", jobj);
+	}
+	if (overflow) {
+		ts = trace_get_field_u64(event, record, "overflow_ts");
+		jobj = util_json_object_hex(ts, flags);
+		if (jobj)
+			json_object_object_add(jp, "overflow_t", jobj);
+	}
+	json_object_array_add(jpoison, jp);
+
+	return 0;
+}
+
+static struct json_object *
+util_cxl_poison_events_to_json(struct tracefs_instance *inst,
+			       struct cxl_poison_ctx *p_ctx,
+			       unsigned long flags)
+{
+	struct event_ctx ectx = {
+		.event_name = "cxl_poison",
+		.event_pid = getpid(),
+		.system = "cxl",
+		.poison_ctx = p_ctx,
+		.json_flags = flags,
+		.parse_event = poison_event_to_json,
+	};
+	int rc;
+
+	p_ctx->jpoison = json_object_new_array();
+	if (!p_ctx->jpoison)
+		return NULL;
+
+	rc = trace_event_parse(inst, &ectx);
+	if (rc < 0) {
+		fprintf(stderr, "Failed to parse events: %d\n", rc);
+		goto put_jobj;
+	}
+	if (json_object_array_length(p_ctx->jpoison) == 0)
+		goto put_jobj;
+
+	return p_ctx->jpoison;
+
+put_jobj:
+	json_object_put(p_ctx->jpoison);
+	return NULL;
+}
+
+static struct json_object *
+util_cxl_poison_list_to_json(struct cxl_region *region,
+			     struct cxl_memdev *memdev,
+			     unsigned long flags)
+{
+	struct json_object *jpoison = NULL;
+	struct cxl_poison_ctx p_ctx;
+	struct tracefs_instance *inst;
+	int rc;
+
+	inst = tracefs_instance_create("cxl list");
+	if (!inst) {
+		fprintf(stderr, "tracefs_instance_create() failed\n");
+		return NULL;
+	}
+
+	rc = trace_event_enable(inst, "cxl", "cxl_poison");
+	if (rc < 0) {
+		fprintf(stderr, "Failed to enable trace: %d\n", rc);
+		goto err_free;
+	}
+
+	if (region)
+		rc = cxl_region_trigger_poison_list(region);
+	else
+		rc = cxl_memdev_trigger_poison_list(memdev);
+	if (rc)
+		goto err_free;
+
+	rc = trace_event_disable(inst);
+	if (rc < 0) {
+		fprintf(stderr, "Failed to disable trace: %d\n", rc);
+		goto err_free;
+	}
+
+	p_ctx = (struct cxl_poison_ctx){
+		.region = region,
+		.memdev = memdev,
+	};
+	jpoison = util_cxl_poison_events_to_json(inst, &p_ctx, flags);
+
+err_free:
+	tracefs_instance_free(inst);
+	return jpoison;
+}
+
 struct json_object *util_cxl_memdev_to_json(struct cxl_memdev *memdev,
 		unsigned long flags)
 {
@@ -664,6 +847,12 @@ struct json_object *util_cxl_memdev_to_json(struct cxl_memdev *memdev,
 			json_object_object_add(jdev, "firmware", jobj);
 	}
 
+	if (flags & UTIL_JSON_MEDIA_ERRORS) {
+		jobj = util_cxl_poison_list_to_json(NULL, memdev, flags);
+		if (jobj)
+			json_object_object_add(jdev, "media_errors", jobj);
+	}
+
 	json_object_set_userdata(jdev, memdev, NULL);
 	return jdev;
 }
@@ -1012,6 +1201,12 @@ struct json_object *util_cxl_region_to_json(struct cxl_region *region,
 			json_object_object_add(jregion, "state", jobj);
 	}
 
+	if (flags & UTIL_JSON_MEDIA_ERRORS) {
+		jobj = util_cxl_poison_list_to_json(region, NULL, flags);
+		if (jobj)
+			json_object_object_add(jregion, "media_errors", jobj);
+	}
+
 	util_cxl_mappings_append_json(jregion, region, flags);
 
 	if (flags & UTIL_JSON_DAX) {
diff --git a/util/event_trace.h b/util/event_trace.h
index 4d498577a00f..a87407ff8296 100644
--- a/util/event_trace.h
+++ b/util/event_trace.h
@@ -12,11 +12,19 @@ struct jlist_node {
 	struct list_node list;
 };
 
+struct cxl_poison_ctx {
+	struct json_object *jpoison;
+	struct cxl_region *region;
+	struct cxl_memdev *memdev;
+};
+
 struct event_ctx {
 	const char *system;
 	struct list_head jlist_head;
 	const char *event_name; /* optional */
 	int event_pid; /* optional */
+	struct cxl_poison_ctx *poison_ctx; /* optional */
+	unsigned long json_flags;
 	int (*parse_event)(struct tep_event *event, struct tep_record *record,
 			   struct event_ctx *ctx);
 };
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [ndctl PATCH v12 7/8] cxl/list: add --media-errors option to cxl list
  2024-03-27 19:52 [ndctl PATCH v12 0/8] Support poison list retrieval alison.schofield
                   ` (5 preceding siblings ...)
  2024-03-27 19:52 ` [ndctl PATCH v12 6/8] cxl/list: collect and parse media_error records alison.schofield
@ 2024-03-27 19:52 ` alison.schofield
  2024-03-27 19:52 ` [ndctl PATCH v12 8/8] cxl/test: add cxl-poison.sh unit test alison.schofield
  7 siblings, 0 replies; 9+ messages in thread
From: alison.schofield @ 2024-03-27 19:52 UTC (permalink / raw)
  To: Vishal Verma; +Cc: Alison Schofield, nvdimm, linux-cxl, Dave Jiang

From: Alison Schofield <alison.schofield@intel.com>

The --media-errors option to 'cxl list' retrieves poison lists from
memory devices supporting the capability and displays the returned
media_error records in the cxl list json. This option can apply to
memdevs or regions.

Include media-errors in the -vvv verbose option.

Example usage in the Documentation/cxl/cxl-list.txt update.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
 Documentation/cxl/cxl-list.txt | 64 +++++++++++++++++++++++++++++++++-
 cxl/filter.h                   |  3 ++
 cxl/list.c                     |  3 ++
 3 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/Documentation/cxl/cxl-list.txt b/Documentation/cxl/cxl-list.txt
index 838de4086678..8a030ba7c3b1 100644
--- a/Documentation/cxl/cxl-list.txt
+++ b/Documentation/cxl/cxl-list.txt
@@ -415,6 +415,68 @@ OPTIONS
 --region::
 	Specify CXL region device name(s), or device id(s), to filter the listing.
 
+-L::
+--media-errors::
+	Include media-error information. The poison list is retrieved from the
+	device(s) and media_error records are added to the listing. Apply this
+	option to memdevs and regions where devices support the poison list
+	capability. "offset:" is relative to the region resource when listing
+	by region and is the absolute device DPA when listing by memdev.
+	"source:" is one of: External, Internal, Injected, Vendor Specific,
+	or Unknown, as defined in CXL Specification v3.1 Table 8-140.
+
+----
+# cxl list -m mem9 --media-errors -u
+{
+  "memdev":"mem9",
+  "pmem_size":"1024.00 MiB (1073.74 MB)",
+  "pmem_qos_class":42,
+  "ram_size":"1024.00 MiB (1073.74 MB)",
+  "ram_qos_class":42,
+  "serial":"0x5",
+  "numa_node":1,
+  "host":"cxl_mem.5",
+  "media_errors":[
+    {
+      "offset":"0x40000000",
+      "length":64,
+      "source":"Injected"
+    }
+  ]
+}
+
+# cxl list -r region5 --media-errors -u
+{
+  "region":"region5",
+  "resource":"0xf110000000",
+  "size":"2.00 GiB (2.15 GB)",
+  "type":"pmem",
+  "interleave_ways":2,
+  "interleave_granularity":4096,
+  "decode_state":"commit",
+  "media_errors":[
+    {
+      "offset":"0x1000",
+      "length":64,
+      "source":"Injected"
+    },
+    {
+      "offset":"0x2000",
+      "length":64,
+      "source":"Injected"
+    }
+  ]
+}
+----
+More complex cxl list queries can be created by using cxl list object and
+filtering options. The first example below emits all the endpoint ports
+with their decoders and memdevs with media-errors. The second example filters
+that output further by a single memdev.
+----
+# cxl list -DEM -p endpoint --media-errors
+# cxl list -DEM -p mem9 --media-errors
+----
+
 -v::
 --verbose::
 	Increase verbosity of the output. This can be specified
@@ -431,7 +493,7 @@ OPTIONS
 	  devices with --idle.
 	- *-vvv*
 	  Everything *-vv* provides, plus enable
-	  --health and --partition.
+	  --health, --partition, --media-errors.
 
 --debug::
 	If the cxl tool was built with debug enabled, turn on debug
diff --git a/cxl/filter.h b/cxl/filter.h
index 3f65990f835a..956a46e0c7a9 100644
--- a/cxl/filter.h
+++ b/cxl/filter.h
@@ -30,6 +30,7 @@ struct cxl_filter_params {
 	bool fw;
 	bool alert_config;
 	bool dax;
+	bool media_errors;
 	int verbose;
 	struct log_ctx ctx;
 };
@@ -88,6 +89,8 @@ static inline unsigned long cxl_filter_to_flags(struct cxl_filter_params *param)
 		flags |= UTIL_JSON_ALERT_CONFIG;
 	if (param->dax)
 		flags |= UTIL_JSON_DAX | UTIL_JSON_DAX_DEVS;
+	if (param->media_errors)
+		flags |= UTIL_JSON_MEDIA_ERRORS;
 	return flags;
 }
 
diff --git a/cxl/list.c b/cxl/list.c
index 93ba51ef895c..0b25d78248d5 100644
--- a/cxl/list.c
+++ b/cxl/list.c
@@ -57,6 +57,8 @@ static const struct option options[] = {
 		    "include memory device firmware information"),
 	OPT_BOOLEAN('A', "alert-config", &param.alert_config,
 		    "include alert configuration information"),
+	OPT_BOOLEAN('L', "media-errors", &param.media_errors,
+		    "include media-error information "),
 	OPT_INCR('v', "verbose", &param.verbose, "increase output detail"),
 #ifdef ENABLE_DEBUG
 	OPT_BOOLEAN(0, "debug", &debug, "debug list walk"),
@@ -121,6 +123,7 @@ int cmd_list(int argc, const char **argv, struct cxl_ctx *ctx)
 		param.fw = true;
 		param.alert_config = true;
 		param.dax = true;
+		param.media_errors = true;
 		/* fallthrough */
 	case 2:
 		param.idle = true;
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [ndctl PATCH v12 8/8] cxl/test: add cxl-poison.sh unit test
  2024-03-27 19:52 [ndctl PATCH v12 0/8] Support poison list retrieval alison.schofield
                   ` (6 preceding siblings ...)
  2024-03-27 19:52 ` [ndctl PATCH v12 7/8] cxl/list: add --media-errors option to cxl list alison.schofield
@ 2024-03-27 19:52 ` alison.schofield
  7 siblings, 0 replies; 9+ messages in thread
From: alison.schofield @ 2024-03-27 19:52 UTC (permalink / raw)
  To: Vishal Verma; +Cc: Alison Schofield, nvdimm, linux-cxl, Dave Jiang

From: Alison Schofield <alison.schofield@intel.com>

Exercise cxl list, libcxl, and driver pieces of the get poison list
pathway. Inject and clear poison using debugfs and use cxl-cli to
read the poison list by memdev and by region.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
 test/cxl-poison.sh | 137 +++++++++++++++++++++++++++++++++++++++++++++
 test/meson.build   |   2 +
 2 files changed, 139 insertions(+)
 create mode 100644 test/cxl-poison.sh

diff --git a/test/cxl-poison.sh b/test/cxl-poison.sh
new file mode 100644
index 000000000000..2caf092db460
--- /dev/null
+++ b/test/cxl-poison.sh
@@ -0,0 +1,137 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2023 Intel Corporation. All rights reserved.
+
+. "$(dirname "$0")"/common
+
+rc=77
+
+set -ex
+
+trap 'err $LINENO' ERR
+
+check_prereq "jq"
+
+modprobe -r cxl_test
+modprobe cxl_test
+
+rc=1
+
+# THEORY OF OPERATION: Exercise cxl-cli and cxl driver ability to
+# inject, clear, and get the poison list. Do it by memdev and by region.
+
+find_memdev()
+{
+	readarray -t capable_mems < <("$CXL" list -b "$CXL_TEST_BUS" -M |
+		jq -r ".[] | select(.pmem_size != null) |
+		select(.ram_size != null) | .memdev")
+
+	if [ ${#capable_mems[@]} == 0 ]; then
+		echo "no memdevs found for test"
+		err "$LINENO"
+	fi
+
+	memdev=${capable_mems[0]}
+}
+
+create_x2_region()
+{
+	# Find an x2 decoder
+	decoder="$($CXL list -b "$CXL_TEST_BUS" -D -d root | jq -r ".[] |
+		select(.pmem_capable == true) |
+		select(.nr_targets == 2) |
+		.decoder")"
+
+	# Find a memdev for each host-bridge interleave position
+	port_dev0="$($CXL list -T -d "$decoder" | jq -r ".[] |
+		.targets | .[] | select(.position == 0) | .target")"
+	port_dev1="$($CXL list -T -d "$decoder" | jq -r ".[] |
+		.targets | .[] | select(.position == 1) | .target")"
+	mem0="$($CXL list -M -p "$port_dev0" | jq -r ".[0].memdev")"
+	mem1="$($CXL list -M -p "$port_dev1" | jq -r ".[0].memdev")"
+
+	region="$($CXL create-region -d "$decoder" -m "$mem0" "$mem1" |
+		jq -r ".region")"
+	if [[ ! $region ]]; then
+		echo "create-region failed for $decoder"
+		err "$LINENO"
+	fi
+	echo "$region"
+}
+
+# When cxl-cli support for inject and clear arrives, replace
+# the writes to /sys/kernel/debug with the new cxl commands.
+
+inject_poison_sysfs()
+{
+	memdev="$1"
+	addr="$2"
+
+	echo "$addr" > /sys/kernel/debug/cxl/"$memdev"/inject_poison
+}
+
+clear_poison_sysfs()
+{
+	memdev="$1"
+	addr="$2"
+
+	echo "$addr" > /sys/kernel/debug/cxl/"$memdev"/clear_poison
+}
+
+validate_poison_found()
+{
+	list_by="$1"
+	nr_expect="$2"
+
+	poison_list="$($CXL list "$list_by" --media-errors |
+		jq -r '.[].media_errors')"
+	if [[ ! $poison_list ]]; then
+		nr_found=0
+	else
+		nr_found=$(jq "length" <<< "$poison_list")
+	fi
+	if [ "$nr_found" -ne "$nr_expect" ]; then
+		echo "$nr_expect poison records expected, $nr_found found"
+		err "$LINENO"
+	fi
+}
+
+test_poison_by_memdev()
+{
+	find_memdev
+	inject_poison_sysfs "$memdev" "0x40000000"
+	inject_poison_sysfs "$memdev" "0x40001000"
+	inject_poison_sysfs "$memdev" "0x600"
+	inject_poison_sysfs "$memdev" "0x0"
+	validate_poison_found "-m $memdev" 4
+
+	clear_poison_sysfs "$memdev" "0x40000000"
+	clear_poison_sysfs "$memdev" "0x40001000"
+	clear_poison_sysfs "$memdev" "0x600"
+	clear_poison_sysfs "$memdev" "0x0"
+	validate_poison_found "-m $memdev" 0
+}
+
+test_poison_by_region()
+{
+	create_x2_region
+	inject_poison_sysfs "$mem0" "0x40000000"
+	inject_poison_sysfs "$mem1" "0x40000000"
+	validate_poison_found "-r $region" 2
+
+	clear_poison_sysfs "$mem0" "0x40000000"
+	clear_poison_sysfs "$mem1" "0x40000000"
+	validate_poison_found "-r $region" 0
+}
+
+# Turn tracing on. Note that 'cxl list --media-errors' toggles the tracing.
+# Turning it on here allows the test user to also view inject and clear
+# trace events.
+echo 1 > /sys/kernel/tracing/events/cxl/cxl_poison/enable
+
+test_poison_by_memdev
+test_poison_by_region
+
+check_dmesg "$LINENO"
+
+modprobe -r cxl-test
diff --git a/test/meson.build b/test/meson.build
index a965a79fd6cb..d871e28e17ce 100644
--- a/test/meson.build
+++ b/test/meson.build
@@ -160,6 +160,7 @@ cxl_events = find_program('cxl-events.sh')
 cxl_sanitize = find_program('cxl-sanitize.sh')
 cxl_destroy_region = find_program('cxl-destroy-region.sh')
 cxl_qos_class = find_program('cxl-qos-class.sh')
+cxl_poison = find_program('cxl-poison.sh')
 
 tests = [
   [ 'libndctl',               libndctl,		  'ndctl' ],
@@ -192,6 +193,7 @@ tests = [
   [ 'cxl-sanitize.sh',        cxl_sanitize,       'cxl'   ],
   [ 'cxl-destroy-region.sh',  cxl_destroy_region, 'cxl'   ],
   [ 'cxl-qos-class.sh',       cxl_qos_class,      'cxl'   ],
+  [ 'cxl-poison.sh',          cxl_poison,         'cxl'   ],
 ]
 
 if get_option('destructive').enabled()
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-03-27 19:52 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-27 19:52 [ndctl PATCH v12 0/8] Support poison list retrieval alison.schofield
2024-03-27 19:52 ` [ndctl PATCH v12 1/8] util/trace: move trace helpers from ndctl/cxl/ to ndctl/util/ alison.schofield
2024-03-27 19:52 ` [ndctl PATCH v12 2/8] util/trace: add an optional pid check to event parsing alison.schofield
2024-03-27 19:52 ` [ndctl PATCH v12 3/8] util/trace: pass an event_ctx to its own parse_event method alison.schofield
2024-03-27 19:52 ` [ndctl PATCH v12 4/8] util/trace: add helpers to retrieve tep fields by type alison.schofield
2024-03-27 19:52 ` [ndctl PATCH v12 5/8] libcxl: add interfaces for GET_POISON_LIST mailbox commands alison.schofield
2024-03-27 19:52 ` [ndctl PATCH v12 6/8] cxl/list: collect and parse media_error records alison.schofield
2024-03-27 19:52 ` [ndctl PATCH v12 7/8] cxl/list: add --media-errors option to cxl list alison.schofield
2024-03-27 19:52 ` [ndctl PATCH v12 8/8] cxl/test: add cxl-poison.sh unit test alison.schofield

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).