nvdimm.lists.linux.dev archive mirror
* [ndctl PATCH v11 0/7] Support poison list retrieval
@ 2024-03-14  4:05 alison.schofield
  2024-03-14  4:05 ` [ndctl PATCH v11 1/7] libcxl: add interfaces for GET_POISON_LIST mailbox commands alison.schofield
                   ` (9 more replies)
  0 siblings, 10 replies; 29+ messages in thread
From: alison.schofield @ 2024-03-14  4:05 UTC (permalink / raw)
  To: Vishal Verma; +Cc: Alison Schofield, nvdimm, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

Changes since v10:
- Use offset, length notation in json output (Dan)
- Remove endpoint decoder from json output
- Man page updates to reflect above changes
- Remove open coded tep_find_field() (Dan)
- Use raw instead of custom string helper 
- Use get_field_val() in u8,32,64 helpers instead of _raw (Dan)
- Pass event_ctx to its own parsing method as a typical 'this' pointer (Dan)
- Replace private_ctx with poison_ctx in event_ctx. This addresses Dan's
  feedback to avoid a void* but stops short of his suggestion to wrap
  event_ctx in a private_ctx for this single use case.
- v10: https://lore.kernel.org/linux-cxl/cover.1709748564.git.alison.schofield@intel.com/


Begin cover letter:
Add the option to add a memory device's poison list to the cxl-list
json output. Offer the option by memdev and by region.

From the man page cxl-list:

       -L, --media-errors
           Include media-error information. The poison list is retrieved from
           the device(s) and media_error records are added to the listing.
           Apply this option to memdevs and regions where devices support the
           poison list capability. "offset:" is relative to the region
           resource when listing by region and is the absolute device DPA when
           listing by memdev. "source:" is one of: External, Internal,
           Injected, Vendor Specific, or Unknown, as defined in CXL
           Specification v3.1 Table 8-140.

           # cxl list -m mem9 --media-errors -u
           {
             "memdev":"mem9",
             "pmem_size":"1024.00 MiB (1073.74 MB)",
             "pmem_qos_class":42,
             "ram_size":"1024.00 MiB (1073.74 MB)",
             "ram_qos_class":42,
             "serial":"0x5",
             "numa_node":1,
             "host":"cxl_mem.5",
             "media_errors":[
               {
                 "offset":"0x40000000",
                 "length":64,
                 "source":"Injected"
               }
             ]
           }

       In the above example, region mappings can be found using: "cxl list -p
       mem9 --decoders"

           # cxl list -r region5 --media-errors -u
           {
             "region":"region5",
             "resource":"0xf110000000",
             "size":"2.00 GiB (2.15 GB)",
             "type":"pmem",
             "interleave_ways":2,
             "interleave_granularity":4096,
             "decode_state":"commit",
             "media_errors":[
               {
                 "offset":"0x1000",
                 "length":64,
                 "source":"Injected"
               },
               {
                 "offset":"0x2000",
                 "length":64,
                 "source":"Injected"
               }
             ]
           }

       In the above example, memdev mappings can be found using: "cxl list -r
       region5 --targets" and "cxl list -d <decoder_name>"
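
To make the two "offset" conventions concrete, here is a minimal sketch
in C, not code from this series. region_relative_offset() is a
hypothetical name. The subtraction mirrors what patch 5 does with the
value of cxl_region_get_resource(). When listing by memdev no
arithmetic is needed, since the DPA is reported as-is.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert the host physical address (HPA) of a
 * poisoned range into an offset relative to the start of its region.
 */
static uint64_t region_relative_offset(uint64_t hpa, uint64_t region_resource)
{
	/* A record that belongs to the region cannot start below it */
	assert(hpa >= region_resource);
	return hpa - region_resource;
}
```

With the region5 resource of 0xf110000000 shown above, an assumed HPA
of 0xf110001000 yields the listed offset 0x1000.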



Alison Schofield (7):
  libcxl: add interfaces for GET_POISON_LIST mailbox commands
  cxl/event_trace: add an optional pid check to event parsing
  cxl/event_trace: support poison context in event parsing
  cxl/event_trace: add helpers to retrieve tep fields by type
  cxl/list: collect and parse media_error records
  cxl/list: add --media-errors option to cxl list
  cxl/test: add cxl-poison.sh unit test

 Documentation/cxl/cxl-list.txt |  62 ++++++++++-
 cxl/event_trace.c              |  51 ++++++++-
 cxl/event_trace.h              |  19 +++-
 cxl/filter.h                   |   3 +
 cxl/json.c                     | 194 +++++++++++++++++++++++++++++++++
 cxl/lib/libcxl.c               |  47 ++++++++
 cxl/lib/libcxl.sym             |   2 +
 cxl/libcxl.h                   |   2 +
 cxl/list.c                     |   3 +
 test/cxl-poison.sh             | 137 +++++++++++++++++++++++
 test/meson.build               |   2 +
 11 files changed, 514 insertions(+), 8 deletions(-)
 create mode 100644 test/cxl-poison.sh


base-commit: e0d0680bd3e554bd5f211e989480c5a13a023b2d
-- 
2.37.3



* [ndctl PATCH v11 1/7] libcxl: add interfaces for GET_POISON_LIST mailbox commands
  2024-03-14  4:05 [ndctl PATCH v11 0/7] Support poison list retrieval alison.schofield
@ 2024-03-14  4:05 ` alison.schofield
  2024-03-18 17:51   ` fan
  2024-03-14  4:05 ` [ndctl PATCH v11 2/7] cxl/event_trace: add an optional pid check to event parsing alison.schofield
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: alison.schofield @ 2024-03-14  4:05 UTC (permalink / raw)
  To: Vishal Verma; +Cc: Alison Schofield, nvdimm, linux-cxl, Dave Jiang

From: Alison Schofield <alison.schofield@intel.com>

CXL devices maintain a list of locations that are poisoned or result
in poison if the addresses are accessed by the host.

Per the spec (CXL 3.1 8.2.9.9.4.1), the device returns the Poison
List as a set of Media Error Records that include the source of the
error, the starting device physical address, and the length.

Trigger the retrieval of the poison list by writing to the memory
device sysfs attribute: trigger_poison_list. The CXL driver only
offers triggering per memdev, so the trigger by region interface
offered here is a convenience API that triggers a poison list
retrieval for each memdev contributing to a region.

int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev);
int cxl_region_trigger_poison_list(struct cxl_region *region);

The resulting poison records are logged as kernel trace events
named 'cxl_poison'.
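
As an illustration of the trigger path handling only, a minimal
standalone sketch follows. build_trigger_path() is a hypothetical
helper, not part of libcxl. It assumes the attribute sits directly
under the memdev sysfs directory, as in the diff below, and reports
truncation the same way the patch does, with -ENXIO.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: build "<dev_path>/trigger_poison_list" in buf.
 * Returns 0 on success or -ENXIO if buf is too small for the path.
 */
static int build_trigger_path(char *buf, size_t len, const char *dev_path)
{
	int n;

	assert(buf && dev_path);
	n = snprintf(buf, len, "%s/trigger_poison_list", dev_path);
	/* snprintf() returns the untruncated length, so n >= len means truncation */
	if (n < 0 || (size_t)n >= len)
		return -ENXIO;
	return 0;
}
```

A caller would then write "1\n" to the resulting path, which is what
sysfs_write_attr() does in the patch.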

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
 cxl/lib/libcxl.c   | 47 ++++++++++++++++++++++++++++++++++++++++++++++
 cxl/lib/libcxl.sym |  2 ++
 cxl/libcxl.h       |  2 ++
 3 files changed, 51 insertions(+)

diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
index ff27cdf7c44a..73db8f15c704 100644
--- a/cxl/lib/libcxl.c
+++ b/cxl/lib/libcxl.c
@@ -1761,6 +1761,53 @@ CXL_EXPORT int cxl_memdev_disable_invalidate(struct cxl_memdev *memdev)
 	return 0;
 }
 
+CXL_EXPORT int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev)
+{
+	struct cxl_ctx *ctx = cxl_memdev_get_ctx(memdev);
+	char *path = memdev->dev_buf;
+	int len = memdev->buf_len, rc;
+
+	if (snprintf(path, len, "%s/trigger_poison_list",
+		     memdev->dev_path) >= len) {
+		err(ctx, "%s: buffer too small\n",
+		    cxl_memdev_get_devname(memdev));
+		return -ENXIO;
+	}
+	rc = sysfs_write_attr(ctx, path, "1\n");
+	if (rc < 0) {
+		fprintf(stderr,
+			"%s: Failed write sysfs attr trigger_poison_list\n",
+			cxl_memdev_get_devname(memdev));
+		return rc;
+	}
+	return 0;
+}
+
+CXL_EXPORT int cxl_region_trigger_poison_list(struct cxl_region *region)
+{
+	struct cxl_memdev_mapping *mapping;
+	int rc;
+
+	cxl_mapping_foreach(region, mapping) {
+		struct cxl_decoder *decoder;
+		struct cxl_memdev *memdev;
+
+		decoder = cxl_mapping_get_decoder(mapping);
+		if (!decoder)
+			continue;
+
+		memdev = cxl_decoder_get_memdev(decoder);
+		if (!memdev)
+			continue;
+
+		rc = cxl_memdev_trigger_poison_list(memdev);
+		if (rc)
+			return rc;
+	}
+
+	return 0;
+}
+
 CXL_EXPORT int cxl_memdev_enable(struct cxl_memdev *memdev)
 {
 	struct cxl_ctx *ctx = cxl_memdev_get_ctx(memdev);
diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym
index de2cd84b2960..3f709c60db3d 100644
--- a/cxl/lib/libcxl.sym
+++ b/cxl/lib/libcxl.sym
@@ -280,4 +280,6 @@ global:
 	cxl_memdev_get_pmem_qos_class;
 	cxl_memdev_get_ram_qos_class;
 	cxl_region_qos_class_mismatch;
+	cxl_memdev_trigger_poison_list;
+	cxl_region_trigger_poison_list;
 } LIBCXL_6;
diff --git a/cxl/libcxl.h b/cxl/libcxl.h
index a6af3fb04693..29165043ca3f 100644
--- a/cxl/libcxl.h
+++ b/cxl/libcxl.h
@@ -467,6 +467,8 @@ enum cxl_setpartition_mode {
 
 int cxl_cmd_partition_set_mode(struct cxl_cmd *cmd,
 		enum cxl_setpartition_mode mode);
+int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev);
+int cxl_region_trigger_poison_list(struct cxl_region *region);
 
 int cxl_cmd_alert_config_set_life_used_prog_warn_threshold(struct cxl_cmd *cmd,
 							   int threshold);
-- 
2.37.3



* [ndctl PATCH v11 2/7] cxl/event_trace: add an optional pid check to event parsing
  2024-03-14  4:05 [ndctl PATCH v11 0/7] Support poison list retrieval alison.schofield
  2024-03-14  4:05 ` [ndctl PATCH v11 1/7] libcxl: add interfaces for GET_POISON_LIST mailbox commands alison.schofield
@ 2024-03-14  4:05 ` alison.schofield
  2024-03-14  4:05 ` [ndctl PATCH v11 3/7] cxl/event_trace: support poison context in " alison.schofield
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 29+ messages in thread
From: alison.schofield @ 2024-03-14  4:05 UTC (permalink / raw)
  To: Vishal Verma
  Cc: Alison Schofield, nvdimm, linux-cxl, Jonathan Cameron, Dave Jiang

From: Alison Schofield <alison.schofield@intel.com>

When parsing CXL events, callers may only be interested in events
that originate from the current process. Introduce an optional
argument to the event trace context: event_pid. When event_pid is
present, simply skip the parsing of events without a matching pid.
It is not a failure to see other, non-matching events.

The initial use case for this is device poison listings where
only the media-error records requested by this process are wanted.
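
The filter rule can be sketched in isolation as below. struct
pid_filter and should_parse() are hypothetical names used only for
this illustration. In the patch the pid of a record comes from
tep_data_pid() and the check lives in cxl_event_parse().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the one field of event_ctx that matters here */
struct pid_filter {
	int event_pid; /* optional: 0 means "do not filter by pid" */
};

/* Return true when a record carrying record_pid should be parsed */
static bool should_parse(const struct pid_filter *f, int record_pid)
{
	assert(f);
	/* Only filter when a pid was supplied; a mismatch is a skip, not an error */
	if (f->event_pid && f->event_pid != record_pid)
		return false;
	return true;
}
```

Treating 0 as "unset" keeps existing callers, which zero-initialize
event_ctx, on the old behavior.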

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
 cxl/event_trace.c | 5 +++++
 cxl/event_trace.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/cxl/event_trace.c b/cxl/event_trace.c
index 1b5aa09de8b2..93a95f9729fd 100644
--- a/cxl/event_trace.c
+++ b/cxl/event_trace.c
@@ -214,6 +214,11 @@ static int cxl_event_parse(struct tep_event *event, struct tep_record *record,
 			return 0;
 	}
 
+	if (event_ctx->event_pid) {
+		if (event_ctx->event_pid != tep_data_pid(event->tep, record))
+			return 0;
+	}
+
 	if (event_ctx->parse_event)
 		return event_ctx->parse_event(event, record,
 					      &event_ctx->jlist_head);
diff --git a/cxl/event_trace.h b/cxl/event_trace.h
index ec6267202c8b..7f7773b2201f 100644
--- a/cxl/event_trace.h
+++ b/cxl/event_trace.h
@@ -15,6 +15,7 @@ struct event_ctx {
 	const char *system;
 	struct list_head jlist_head;
 	const char *event_name; /* optional */
+	int event_pid; /* optional */
 	int (*parse_event)(struct tep_event *event, struct tep_record *record,
 			   struct list_head *jlist_head); /* optional */
 };
-- 
2.37.3



* [ndctl PATCH v11 3/7] cxl/event_trace: support poison context in event parsing
  2024-03-14  4:05 [ndctl PATCH v11 0/7] Support poison list retrieval alison.schofield
  2024-03-14  4:05 ` [ndctl PATCH v11 1/7] libcxl: add interfaces for GET_POISON_LIST mailbox commands alison.schofield
  2024-03-14  4:05 ` [ndctl PATCH v11 2/7] cxl/event_trace: add an optional pid check to event parsing alison.schofield
@ 2024-03-14  4:05 ` alison.schofield
  2024-03-14  4:05 ` [ndctl PATCH v11 4/7] cxl/event_trace: add helpers to retrieve tep fields by type alison.schofield
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 29+ messages in thread
From: alison.schofield @ 2024-03-14  4:05 UTC (permalink / raw)
  To: Vishal Verma; +Cc: Alison Schofield, nvdimm, linux-cxl, Dave Jiang

From: Alison Schofield <alison.schofield@intel.com>

CXL event tracing provides helpers to iterate through a trace
buffer and extract events of interest. It offers two parsing
options: a default parser that adds every field of an event to
a json object, and a private parsing option where the caller can
parse each event as it wishes.

Although the private parser can do some conditional parsing based
on field values, it has no method to receive additional information
needed to make parsing decisions in the callback.

Provide additional information required by cxl_poison events by
adding a pointer to the poison_ctx directly to struct event_ctx.

Tidy up the calling convention by passing the entire event_ctx to
its own parse_event method rather than growing the param list.

This is in preparation for adding a private parser requiring the
additional context for cxl_poison events.
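
The calling convention can be sketched with stand-in types as below.
Every demo_* name is hypothetical and an int stands in for the trace
record. The point is only that the callback receives its owning
context, so optional extras travel with it instead of becoming new
parameters.

```c
#include <assert.h>
#include <stddef.h>

struct demo_ctx;
typedef int (*demo_parse_fn)(int value, struct demo_ctx *ctx);

/* Extra state needed by one private parser, analogous to poison_ctx */
struct demo_extra {
	int threshold;
	int matched;
};

struct demo_ctx {
	struct demo_extra *extra; /* optional */
	demo_parse_fn parse;      /* optional private parser */
	int default_count;        /* bumped by the default path */
};

/* Private parser: reaches its extra state through the context */
static int demo_private_parse(int value, struct demo_ctx *ctx)
{
	if (ctx->extra && value >= ctx->extra->threshold)
		ctx->extra->matched++;
	return 0;
}

/* Dispatcher: hand the whole context to the callback, like a 'this' pointer */
static int demo_dispatch(int value, struct demo_ctx *ctx)
{
	assert(ctx != NULL);
	if (ctx->parse)
		return ctx->parse(value, ctx);
	ctx->default_count++;
	return 0;
}
```

Adding another optional member later changes neither the callback
signature nor existing callers.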

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
 cxl/event_trace.c |  9 ++++-----
 cxl/event_trace.h | 10 +++++++++-
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/cxl/event_trace.c b/cxl/event_trace.c
index 93a95f9729fd..640abdab67bf 100644
--- a/cxl/event_trace.c
+++ b/cxl/event_trace.c
@@ -60,7 +60,7 @@ static struct json_object *num_to_json(void *num, int elem_size, unsigned long f
 }
 
 static int cxl_event_to_json(struct tep_event *event, struct tep_record *record,
-			     struct list_head *jlist_head)
+			     struct event_ctx *ctx)
 {
 	struct json_object *jevent, *jobj, *jarray;
 	struct tep_format_field **fields;
@@ -190,7 +190,7 @@ static int cxl_event_to_json(struct tep_event *event, struct tep_record *record,
 		}
 	}
 
-	list_add_tail(jlist_head, &jnode->list);
+	list_add_tail(&ctx->jlist_head, &jnode->list);
 	return 0;
 
 err_jevent:
@@ -220,10 +220,9 @@ static int cxl_event_parse(struct tep_event *event, struct tep_record *record,
 	}
 
 	if (event_ctx->parse_event)
-		return event_ctx->parse_event(event, record,
-					      &event_ctx->jlist_head);
+		return event_ctx->parse_event(event, record, event_ctx);
 
-	return cxl_event_to_json(event, record, &event_ctx->jlist_head);
+	return cxl_event_to_json(event, record, event_ctx);
 }
 
 int cxl_parse_events(struct tracefs_instance *inst, struct event_ctx *ectx)
diff --git a/cxl/event_trace.h b/cxl/event_trace.h
index 7f7773b2201f..b77cafb410c4 100644
--- a/cxl/event_trace.h
+++ b/cxl/event_trace.h
@@ -11,13 +11,21 @@ struct jlist_node {
 	struct list_node list;
 };
 
+struct poison_ctx {
+	struct json_object *jpoison;
+	struct cxl_region *region;
+	struct cxl_memdev *memdev;
+	unsigned long flags;
+};
+
 struct event_ctx {
 	const char *system;
 	struct list_head jlist_head;
 	const char *event_name; /* optional */
 	int event_pid; /* optional */
+	struct poison_ctx *poison_ctx; /* optional */
 	int (*parse_event)(struct tep_event *event, struct tep_record *record,
-			   struct list_head *jlist_head); /* optional */
+			   struct event_ctx *ctx);
 };
 
 int cxl_parse_events(struct tracefs_instance *inst, struct event_ctx *ectx);
-- 
2.37.3



* [ndctl PATCH v11 4/7] cxl/event_trace: add helpers to retrieve tep fields by type
  2024-03-14  4:05 [ndctl PATCH v11 0/7] Support poison list retrieval alison.schofield
                   ` (2 preceding siblings ...)
  2024-03-14  4:05 ` [ndctl PATCH v11 3/7] cxl/event_trace: support poison context in " alison.schofield
@ 2024-03-14  4:05 ` alison.schofield
  2024-03-15 15:44   ` Dave Jiang
                     ` (2 more replies)
  2024-03-14  4:05 ` [ndctl PATCH v11 5/7] cxl/list: collect and parse media_error records alison.schofield
                   ` (5 subsequent siblings)
  9 siblings, 3 replies; 29+ messages in thread
From: alison.schofield @ 2024-03-14  4:05 UTC (permalink / raw)
  To: Vishal Verma; +Cc: Alison Schofield, nvdimm, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

Add helpers to extract the value of an event record field given the
field name. This is useful when the user knows the name and format
of the field and simply needs to get it. When the field cannot be
retrieved, the helpers return the 'type'_MAX of the requested type.

Since this is in preparation for adding a cxl_poison private parser
for 'cxl list --media-errors', support only the types that parser
requires: u8, u32, u64.
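
The sentinel convention can be sketched without libtraceevent as
below. struct demo_field and demo_get_u64() are hypothetical, and a
plain array lookup stands in for tep_get_field_val(). UINT64_MAX is
used here where the patch uses ULLONG_MAX.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a parsed event field */
struct demo_field {
	const char *name;
	uint64_t val;
};

/* Return the named field's value, or UINT64_MAX if it is not present */
static uint64_t demo_get_u64(const struct demo_field *fields, size_t n,
			     const char *name)
{
	assert(name);
	for (size_t i = 0; i < n; i++)
		if (strcmp(fields[i].name, name) == 0)
			return fields[i].val;
	return UINT64_MAX;
}
```

A caller tests the result against the sentinel before using it. One
consequence of this convention is that a field legitimately holding
the maximum value cannot be told apart from a failed lookup.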

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 cxl/event_trace.c | 37 +++++++++++++++++++++++++++++++++++++
 cxl/event_trace.h |  8 +++++++-
 2 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/cxl/event_trace.c b/cxl/event_trace.c
index 640abdab67bf..324edb982888 100644
--- a/cxl/event_trace.c
+++ b/cxl/event_trace.c
@@ -15,6 +15,43 @@
 #define _GNU_SOURCE
 #include <string.h>
 
+u64 cxl_get_field_u64(struct tep_event *event, struct tep_record *record,
+		      const char *name)
+{
+	unsigned long long val;
+
+	if (tep_get_field_val(NULL, event, name, record, &val, 0))
+		return ULLONG_MAX;
+
+	return val;
+}
+
+u32 cxl_get_field_u32(struct tep_event *event, struct tep_record *record,
+		      const char *name)
+{
+	char *val;
+	int len;
+
+	val = tep_get_field_raw(NULL, event, name, record, &len, 0);
+	if (!val)
+		return UINT_MAX;
+
+	return *(u32 *)val;
+}
+
+u8 cxl_get_field_u8(struct tep_event *event, struct tep_record *record,
+		    const char *name)
+{
+	char *val;
+	int len;
+
+	val = tep_get_field_raw(NULL, event, name, record, &len, 0);
+	if (!val)
+		return UCHAR_MAX;
+
+	return *(u8 *)val;
+}
+
 static struct json_object *num_to_json(void *num, int elem_size, unsigned long flags)
 {
 	bool sign = flags & TEP_FIELD_IS_SIGNED;
diff --git a/cxl/event_trace.h b/cxl/event_trace.h
index b77cafb410c4..7b30c3922aef 100644
--- a/cxl/event_trace.h
+++ b/cxl/event_trace.h
@@ -5,6 +5,7 @@
 
 #include <json-c/json.h>
 #include <ccan/list/list.h>
+#include <ccan/short_types/short_types.h>
 
 struct jlist_node {
 	struct json_object *jobj;
@@ -32,5 +33,10 @@ int cxl_parse_events(struct tracefs_instance *inst, struct event_ctx *ectx);
 int cxl_event_tracing_enable(struct tracefs_instance *inst, const char *system,
 		const char *event);
 int cxl_event_tracing_disable(struct tracefs_instance *inst);
-
+u8 cxl_get_field_u8(struct tep_event *event, struct tep_record *record,
+		    const char *name);
+u32 cxl_get_field_u32(struct tep_event *event, struct tep_record *record,
+		      const char *name);
+u64 cxl_get_field_u64(struct tep_event *event, struct tep_record *record,
+		      const char *name);
 #endif
-- 
2.37.3



* [ndctl PATCH v11 5/7] cxl/list: collect and parse media_error records
  2024-03-14  4:05 [ndctl PATCH v11 0/7] Support poison list retrieval alison.schofield
                   ` (3 preceding siblings ...)
  2024-03-14  4:05 ` [ndctl PATCH v11 4/7] cxl/event_trace: add helpers to retrieve tep fields by type alison.schofield
@ 2024-03-14  4:05 ` alison.schofield
  2024-03-15 16:16   ` Dave Jiang
  2024-03-14  4:05 ` [ndctl PATCH v11 6/7] cxl/list: add --media-errors option to cxl list alison.schofield
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: alison.schofield @ 2024-03-14  4:05 UTC (permalink / raw)
  To: Vishal Verma; +Cc: Alison Schofield, nvdimm, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

Media_error records are logged as events in the kernel tracing
subsystem. To prepare the media_error records for cxl list, enable
tracing, trigger the poison list read, and parse the generated
cxl_poison events into a json representation.

Use the event_trace private parsing option to customize the json
representation based on cxl-list calling options and event field
settings.
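
The decoding of the "source" and "flags" fields can be sketched on its
own as below. The demo_* names are hypothetical. The string table and
bit positions are copied from the diff that follows, which cites CXL
3.1 Tables 8-139 and 8-140. The sketch uses a bounded snprintf() where
the patch uses strcat().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DEMO_POISON_SOURCE_MAX 7
#define DEMO_FLAG_MORE     (1u << 0)
#define DEMO_FLAG_OVERFLOW (1u << 1)
#define DEMO_FLAG_SCANNING (1u << 2)

/* Map a Media Error Record source value to a printable name */
static const char *demo_source_name(uint8_t source)
{
	static const char *const names[] = {
		"Unknown", "External", "Internal", "Injected",
		"Reserved", "Reserved", "Reserved", "Vendor",
	};

	if (source > DEMO_POISON_SOURCE_MAX)
		return "Reserved";
	return names[source];
}

/* Render the set flag bits as a comma-terminated list, in the patch's order */
static void demo_flags_to_str(uint8_t flags, char *buf, size_t len)
{
	assert(buf && len);
	snprintf(buf, len, "%s%s%s",
		 (flags & DEMO_FLAG_MORE) ? "More," : "",
		 (flags & DEMO_FLAG_SCANNING) ? "Scanning," : "",
		 (flags & DEMO_FLAG_OVERFLOW) ? "Overflow," : "");
}
```

The trailing comma is kept on purpose so the sketch produces the same
string as the patch.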

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 cxl/json.c | 194 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 194 insertions(+)

diff --git a/cxl/json.c b/cxl/json.c
index fbe41c78e82a..974e98f13cec 100644
--- a/cxl/json.c
+++ b/cxl/json.c
@@ -1,16 +1,20 @@
 // SPDX-License-Identifier: GPL-2.0
 // Copyright (C) 2015-2021 Intel Corporation. All rights reserved.
 #include <limits.h>
+#include <errno.h>
 #include <util/json.h>
+#include <util/bitmap.h>
 #include <uuid/uuid.h>
 #include <cxl/libcxl.h>
 #include <json-c/json.h>
 #include <json-c/printbuf.h>
 #include <ccan/short_types/short_types.h>
+#include <tracefs/tracefs.h>
 
 #include "filter.h"
 #include "json.h"
 #include "../daxctl/json.h"
+#include "event_trace.h"
 
 #define CXL_FW_VERSION_STR_LEN	16
 #define CXL_FW_MAX_SLOTS	4
@@ -571,6 +575,184 @@ err_jobj:
 	return NULL;
 }
 
+/* CXL Spec 3.1 Table 8-140 Media Error Record */
+#define CXL_POISON_SOURCE_MAX 7
+static const char *poison_source[] = { "Unknown",  "External", "Internal",
+				       "Injected", "Reserved", "Reserved",
+				       "Reserved", "Vendor" };
+
+/* CXL Spec 3.1 Table 8-139 Get Poison List Output Payload */
+#define CXL_POISON_FLAG_MORE BIT(0)
+#define CXL_POISON_FLAG_OVERFLOW BIT(1)
+#define CXL_POISON_FLAG_SCANNING BIT(2)
+
+static int poison_event_to_json(struct tep_event *event,
+				struct tep_record *record,
+				struct event_ctx *e_ctx)
+{
+	struct poison_ctx *p_ctx = e_ctx->poison_ctx;
+	struct json_object *jp, *jobj, *jpoison = p_ctx->jpoison;
+	struct cxl_memdev *memdev = p_ctx->memdev;
+	struct cxl_region *region = p_ctx->region;
+	unsigned long flags = p_ctx->flags;
+	const char *region_name = NULL;
+	char flag_str[32] = { '\0' };
+	bool overflow = false;
+	u8 source, pflags;
+	u64 offset, ts;
+	u32 length;
+	char *str;
+	int len;
+
+	jp = json_object_new_object();
+	if (!jp)
+		return -ENOMEM;
+
+	/* Skip records not in this region when listing by region */
+	if (region)
+		region_name = cxl_region_get_devname(region);
+	if (region_name)
+		str = tep_get_field_raw(NULL, event, "region", record, &len, 0);
+	if ((region_name) && (strcmp(region_name, str) != 0)) {
+		json_object_put(jp);
+		return 0;
+	}
+	/* Include offset,length by region (hpa) or by memdev (dpa) */
+	if (region) {
+		offset = cxl_get_field_u64(event, record, "hpa");
+		if (offset != ULLONG_MAX) {
+			offset = offset - cxl_region_get_resource(region);
+			jobj = util_json_object_hex(offset, flags);
+			if (jobj)
+				json_object_object_add(jp, "offset", jobj);
+		}
+	} else if (memdev) {
+		offset = cxl_get_field_u64(event, record, "dpa");
+		if (offset != ULLONG_MAX) {
+			jobj = util_json_object_hex(offset, flags);
+			if (jobj)
+				json_object_object_add(jp, "offset", jobj);
+		}
+	}
+	length = cxl_get_field_u32(event, record, "dpa_length");
+	jobj = util_json_object_size(length, flags);
+	if (jobj)
+		json_object_object_add(jp, "length", jobj);
+
+	/* Always include the poison source */
+	source = cxl_get_field_u8(event, record, "source");
+	if (source <= CXL_POISON_SOURCE_MAX)
+		jobj = json_object_new_string(poison_source[source]);
+	else
+		jobj = json_object_new_string("Reserved");
+	if (jobj)
+		json_object_object_add(jp, "source", jobj);
+
+	/* Include flags and overflow time if present */
+	pflags = cxl_get_field_u8(event, record, "flags");
+	if (pflags && pflags < UCHAR_MAX) {
+		if (pflags & CXL_POISON_FLAG_MORE)
+			strcat(flag_str, "More,");
+		if (pflags & CXL_POISON_FLAG_SCANNING)
+			strcat(flag_str, "Scanning,");
+		if (pflags & CXL_POISON_FLAG_OVERFLOW) {
+			strcat(flag_str, "Overflow,");
+			overflow = true;
+		}
+		jobj = json_object_new_string(flag_str);
+		if (jobj)
+			json_object_object_add(jp, "flags", jobj);
+	}
+	if (overflow) {
+		ts = cxl_get_field_u64(event, record, "overflow_ts");
+		jobj = util_json_object_hex(ts, flags);
+		if (jobj)
+			json_object_object_add(jp, "overflow_t", jobj);
+	}
+	json_object_array_add(jpoison, jp);
+
+	return 0;
+}
+
+static struct json_object *
+util_cxl_poison_events_to_json(struct tracefs_instance *inst,
+			       struct poison_ctx *p_ctx)
+{
+	struct event_ctx ectx = {
+		.event_name = "cxl_poison",
+		.event_pid = getpid(),
+		.system = "cxl",
+		.poison_ctx = p_ctx,
+		.parse_event = poison_event_to_json,
+	};
+	int rc = 0;
+
+	p_ctx->jpoison = json_object_new_array();
+	if (!p_ctx->jpoison)
+		return NULL;
+
+	rc = cxl_parse_events(inst, &ectx);
+	if (rc < 0) {
+		fprintf(stderr, "Failed to parse events: %d\n", rc);
+		goto put_jobj;
+	}
+	if (json_object_array_length(p_ctx->jpoison) == 0)
+		goto put_jobj;
+
+	return p_ctx->jpoison;
+
+put_jobj:
+	json_object_put(p_ctx->jpoison);
+	return NULL;
+}
+
+static struct json_object *
+util_cxl_poison_list_to_json(struct cxl_region *region,
+			     struct cxl_memdev *memdev,
+			     unsigned long flags)
+{
+	struct json_object *jpoison = NULL;
+	struct poison_ctx p_ctx;
+	struct tracefs_instance *inst;
+	int rc;
+
+	inst = tracefs_instance_create("cxl list");
+	if (!inst) {
+		fprintf(stderr, "tracefs_instance_create() failed\n");
+		return NULL;
+	}
+
+	rc = cxl_event_tracing_enable(inst, "cxl", "cxl_poison");
+	if (rc < 0) {
+		fprintf(stderr, "Failed to enable trace: %d\n", rc);
+		goto err_free;
+	}
+
+	if (region)
+		rc = cxl_region_trigger_poison_list(region);
+	else
+		rc = cxl_memdev_trigger_poison_list(memdev);
+	if (rc)
+		goto err_free;
+
+	rc = cxl_event_tracing_disable(inst);
+	if (rc < 0) {
+		fprintf(stderr, "Failed to disable trace: %d\n", rc);
+		goto err_free;
+	}
+
+	p_ctx = (struct poison_ctx) {
+		.region = region,
+		.memdev = memdev,
+		.flags = flags,
+	};
+	jpoison = util_cxl_poison_events_to_json(inst, &p_ctx);
+
+err_free:
+	tracefs_instance_free(inst);
+	return jpoison;
+}
+
 struct json_object *util_cxl_memdev_to_json(struct cxl_memdev *memdev,
 		unsigned long flags)
 {
@@ -664,6 +846,12 @@ struct json_object *util_cxl_memdev_to_json(struct cxl_memdev *memdev,
 			json_object_object_add(jdev, "firmware", jobj);
 	}
 
+	if (flags & UTIL_JSON_MEDIA_ERRORS) {
+		jobj = util_cxl_poison_list_to_json(NULL, memdev, flags);
+		if (jobj)
+			json_object_object_add(jdev, "media_errors", jobj);
+	}
+
 	json_object_set_userdata(jdev, memdev, NULL);
 	return jdev;
 }
@@ -1012,6 +1200,12 @@ struct json_object *util_cxl_region_to_json(struct cxl_region *region,
 			json_object_object_add(jregion, "state", jobj);
 	}
 
+	if (flags & UTIL_JSON_MEDIA_ERRORS) {
+		jobj = util_cxl_poison_list_to_json(region, NULL, flags);
+		if (jobj)
+			json_object_object_add(jregion, "media_errors", jobj);
+	}
+
 	util_cxl_mappings_append_json(jregion, region, flags);
 
 	if (flags & UTIL_JSON_DAX) {
-- 
2.37.3



* [ndctl PATCH v11 6/7] cxl/list: add --media-errors option to cxl list
  2024-03-14  4:05 [ndctl PATCH v11 0/7] Support poison list retrieval alison.schofield
                   ` (4 preceding siblings ...)
  2024-03-14  4:05 ` [ndctl PATCH v11 5/7] cxl/list: collect and parse media_error records alison.schofield
@ 2024-03-14  4:05 ` alison.schofield
  2024-03-15 16:41   ` Dave Jiang
  2024-03-14  4:05 ` [ndctl PATCH v11 7/7] cxl/test: add cxl-poison.sh unit test alison.schofield
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: alison.schofield @ 2024-03-14  4:05 UTC (permalink / raw)
  To: Vishal Verma; +Cc: Alison Schofield, nvdimm, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

The --media-errors option to 'cxl list' retrieves poison lists from
memory devices supporting the capability and displays the returned
media_error records in the cxl list json. This option can apply to
memdevs or regions.

Include media-errors in the -vvv verbose option.

Example usage is in the Documentation/cxl/cxl-list.txt update.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 Documentation/cxl/cxl-list.txt | 62 +++++++++++++++++++++++++++++++++-
 cxl/filter.h                   |  3 ++
 cxl/list.c                     |  3 ++
 3 files changed, 67 insertions(+), 1 deletion(-)

diff --git a/Documentation/cxl/cxl-list.txt b/Documentation/cxl/cxl-list.txt
index 838de4086678..6d3ef92c29e8 100644
--- a/Documentation/cxl/cxl-list.txt
+++ b/Documentation/cxl/cxl-list.txt
@@ -415,6 +415,66 @@ OPTIONS
 --region::
 	Specify CXL region device name(s), or device id(s), to filter the listing.
 
+-L::
+--media-errors::
+	Include media-error information. The poison list is retrieved from the
+	device(s) and media_error records are added to the listing. Apply this
+	option to memdevs and regions where devices support the poison list
+	capability. "offset:" is relative to the region resource when listing
+	by region and is the absolute device DPA when listing by memdev.
+	"source:" is one of: External, Internal, Injected, Vendor Specific,
+	or Unknown, as defined in CXL Specification v3.1 Table 8-140.
+
+----
+# cxl list -m mem9 --media-errors -u
+{
+  "memdev":"mem9",
+  "pmem_size":"1024.00 MiB (1073.74 MB)",
+  "pmem_qos_class":42,
+  "ram_size":"1024.00 MiB (1073.74 MB)",
+  "ram_qos_class":42,
+  "serial":"0x5",
+  "numa_node":1,
+  "host":"cxl_mem.5",
+  "media_errors":[
+    {
+      "offset":"0x40000000",
+      "length":64,
+      "source":"Injected"
+    }
+  ]
+}
+----
+In the above example, region mappings can be found using:
+"cxl list -p mem9 --decoders"
+----
+# cxl list -r region5 --media-errors -u
+{
+  "region":"region5",
+  "resource":"0xf110000000",
+  "size":"2.00 GiB (2.15 GB)",
+  "type":"pmem",
+  "interleave_ways":2,
+  "interleave_granularity":4096,
+  "decode_state":"commit",
+  "media_errors":[
+    {
+      "offset":"0x1000",
+      "length":64,
+      "source":"Injected"
+    },
+    {
+      "offset":"0x2000",
+      "length":64,
+      "source":"Injected"
+    }
+  ]
+}
+----
+In the above example, memdev mappings can be found using:
+"cxl list -r region5 --targets" and "cxl list -d <decoder_name>"
+
+
 -v::
 --verbose::
 	Increase verbosity of the output. This can be specified
@@ -431,7 +491,7 @@ OPTIONS
 	  devices with --idle.
 	- *-vvv*
 	  Everything *-vv* provides, plus enable
-	  --health and --partition.
+	  --health, --partition, --media-errors.
 
 --debug::
 	If the cxl tool was built with debug enabled, turn on debug
diff --git a/cxl/filter.h b/cxl/filter.h
index 3f65990f835a..956a46e0c7a9 100644
--- a/cxl/filter.h
+++ b/cxl/filter.h
@@ -30,6 +30,7 @@ struct cxl_filter_params {
 	bool fw;
 	bool alert_config;
 	bool dax;
+	bool media_errors;
 	int verbose;
 	struct log_ctx ctx;
 };
@@ -88,6 +89,8 @@ static inline unsigned long cxl_filter_to_flags(struct cxl_filter_params *param)
 		flags |= UTIL_JSON_ALERT_CONFIG;
 	if (param->dax)
 		flags |= UTIL_JSON_DAX | UTIL_JSON_DAX_DEVS;
+	if (param->media_errors)
+		flags |= UTIL_JSON_MEDIA_ERRORS;
 	return flags;
 }
 
diff --git a/cxl/list.c b/cxl/list.c
index 93ba51ef895c..0b25d78248d5 100644
--- a/cxl/list.c
+++ b/cxl/list.c
@@ -57,6 +57,8 @@ static const struct option options[] = {
 		    "include memory device firmware information"),
 	OPT_BOOLEAN('A', "alert-config", &param.alert_config,
 		    "include alert configuration information"),
+	OPT_BOOLEAN('L', "media-errors", &param.media_errors,
+		    "include media-error information "),
 	OPT_INCR('v', "verbose", &param.verbose, "increase output detail"),
 #ifdef ENABLE_DEBUG
 	OPT_BOOLEAN(0, "debug", &debug, "debug list walk"),
@@ -121,6 +123,7 @@ int cmd_list(int argc, const char **argv, struct cxl_ctx *ctx)
 		param.fw = true;
 		param.alert_config = true;
 		param.dax = true;
+		param.media_errors = true;
 		/* fallthrough */
 	case 2:
 		param.idle = true;
-- 
2.37.3



* [ndctl PATCH v11 7/7] cxl/test: add cxl-poison.sh unit test
  2024-03-14  4:05 [ndctl PATCH v11 0/7] Support poison list retrieval alison.schofield
                   ` (5 preceding siblings ...)
  2024-03-14  4:05 ` [ndctl PATCH v11 6/7] cxl/list: add --media-errors option to cxl list alison.schofield
@ 2024-03-14  4:05 ` alison.schofield
  2024-03-15 17:03   ` Dave Jiang
       [not found] ` <CGME20240314040548epcas2p3698bf9d1463a1d2255dc95ac506d3ae8@epcms2p4>
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: alison.schofield @ 2024-03-14  4:05 UTC (permalink / raw)
  To: Vishal Verma; +Cc: Alison Schofield, nvdimm, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

Exercise cxl list, libcxl, and driver pieces of the get poison list
pathway. Inject and clear poison using debugfs and use cxl-cli to
read the poison list by memdev and by region.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 test/cxl-poison.sh | 137 +++++++++++++++++++++++++++++++++++++++++++++
 test/meson.build   |   2 +
 2 files changed, 139 insertions(+)
 create mode 100644 test/cxl-poison.sh

diff --git a/test/cxl-poison.sh b/test/cxl-poison.sh
new file mode 100644
index 000000000000..af2e9dcd1a11
--- /dev/null
+++ b/test/cxl-poison.sh
@@ -0,0 +1,137 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2023 Intel Corporation. All rights reserved.
+
+. "$(dirname "$0")"/common
+
+rc=77
+
+set -ex
+
+trap 'err $LINENO' ERR
+
+check_prereq "jq"
+
+modprobe -r cxl_test
+modprobe cxl_test
+
+rc=1
+
+# THEORY OF OPERATION: Exercise cxl-cli and cxl driver ability to
+# inject, clear, and get the poison list. Do it by memdev and by region.
+
+find_memdev()
+{
+	readarray -t capable_mems < <("$CXL" list -b "$CXL_TEST_BUS" -M |
+		jq -r ".[] | select(.pmem_size != null) |
+		select(.ram_size != null) | .memdev")
+
+	if [ ${#capable_mems[@]} == 0 ]; then
+		echo "no memdevs found for test"
+		err "$LINENO"
+	fi
+
+	memdev=${capable_mems[0]}
+}
+
+create_x2_region()
+{
+	# Find an x2 decoder
+	decoder="$($CXL list -b "$CXL_TEST_BUS" -D -d root | jq -r ".[] |
+		select(.pmem_capable == true) |
+		select(.nr_targets == 2) |
+		.decoder")"
+
+	# Find a memdev for each host-bridge interleave position
+	port_dev0="$($CXL list -T -d "$decoder" | jq -r ".[] |
+		.targets | .[] | select(.position == 0) | .target")"
+	port_dev1="$($CXL list -T -d "$decoder" | jq -r ".[] |
+		.targets | .[] | select(.position == 1) | .target")"
+	mem0="$($CXL list -M -p "$port_dev0" | jq -r ".[0].memdev")"
+	mem1="$($CXL list -M -p "$port_dev1" | jq -r ".[0].memdev")"
+
+	region="$($CXL create-region -d "$decoder" -m "$mem0" "$mem1" |
+		jq -r ".region")"
+	if [[ ! $region ]]; then
+		echo "create-region failed for $decoder"
+		err "$LINENO"
+	fi
+	echo "$region"
+}
+
+# When cxl-cli support for inject and clear arrives, replace
+# the writes to /sys/kernel/debug with the new cxl commands.
+
+inject_poison_sysfs()
+{
+	memdev="$1"
+	addr="$2"
+
+	echo "$addr" > /sys/kernel/debug/cxl/"$memdev"/inject_poison
+}
+
+clear_poison_sysfs()
+{
+	memdev="$1"
+	addr="$2"
+
+	echo "$addr" > /sys/kernel/debug/cxl/"$memdev"/clear_poison
+}
+
+validate_poison_found()
+{
+	list_by="$1"
+	nr_expect="$2"
+
+	poison_list="$($CXL list "$list_by" --media-errors |
+		jq -r '.[].media_errors')"
+	if [[ ! $poison_list ]]; then
+		nr_found=0
+	else
+		nr_found=$(jq "length" <<< "$poison_list")
+	fi
+	if [ "$nr_found" -ne "$nr_expect" ]; then
+		echo "$nr_expect poison records expected, $nr_found found"
+		err "$LINENO"
+	fi
+}
+
+test_poison_by_memdev()
+{
+	find_memdev
+	inject_poison_sysfs "$memdev" "0x40000000"
+	inject_poison_sysfs "$memdev" "0x40001000"
+	inject_poison_sysfs "$memdev" "0x600"
+	inject_poison_sysfs "$memdev" "0x0"
+	validate_poison_found "-m $memdev" 4
+
+	clear_poison_sysfs "$memdev" "0x40000000"
+	clear_poison_sysfs "$memdev" "0x40001000"
+	clear_poison_sysfs "$memdev" "0x600"
+	clear_poison_sysfs "$memdev" "0x0"
+	validate_poison_found "-m $memdev" 0
+}
+
+test_poison_by_region()
+{
+	create_x2_region
+	inject_poison_sysfs "$mem0" "0x40000000"
+	inject_poison_sysfs "$mem1" "0x40000000"
+	validate_poison_found "-r $region" 2
+
+	clear_poison_sysfs "$mem0" "0x40000000"
+	clear_poison_sysfs "$mem1" "0x40000000"
+	validate_poison_found "-r $region" 0
+}
+
+# Turn tracing on. Note that 'cxl list --media-errors' does toggle the tracing.
+# Turning it on here allows the test user to also view inject and clear
+# trace events.
+echo 1 > /sys/kernel/tracing/events/cxl/cxl_poison/enable
+
+test_poison_by_memdev
+test_poison_by_region
+
+check_dmesg "$LINENO"
+
+modprobe -r cxl_test
diff --git a/test/meson.build b/test/meson.build
index a965a79fd6cb..d871e28e17ce 100644
--- a/test/meson.build
+++ b/test/meson.build
@@ -160,6 +160,7 @@ cxl_events = find_program('cxl-events.sh')
 cxl_sanitize = find_program('cxl-sanitize.sh')
 cxl_destroy_region = find_program('cxl-destroy-region.sh')
 cxl_qos_class = find_program('cxl-qos-class.sh')
+cxl_poison = find_program('cxl-poison.sh')
 
 tests = [
   [ 'libndctl',               libndctl,		  'ndctl' ],
@@ -192,6 +193,7 @@ tests = [
   [ 'cxl-sanitize.sh',        cxl_sanitize,       'cxl'   ],
   [ 'cxl-destroy-region.sh',  cxl_destroy_region, 'cxl'   ],
   [ 'cxl-qos-class.sh',       cxl_qos_class,      'cxl'   ],
+  [ 'cxl-poison.sh',          cxl_poison,         'cxl'   ],
 ]
 
 if get_option('destructive').enabled()
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* RE: [ndctl PATCH v11 6/7] cxl/list: add --media-errors option to cxl list
       [not found] ` <CGME20240314040548epcas2p3698bf9d1463a1d2255dc95ac506d3ae8@epcms2p4>
@ 2024-03-15  1:09   ` Wonjae Lee
  2024-03-15  2:36     ` Alison Schofield
  0 siblings, 1 reply; 29+ messages in thread
From: Wonjae Lee @ 2024-03-15  1:09 UTC (permalink / raw)
  To: alison.schofield, Vishal Verma; +Cc: Hojin Nam, nvdimm, linux-cxl

alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
>
> The --media-errors option to 'cxl list' retrieves poison lists from
> memory devices supporting the capability and displays the returned
> media_error records in the cxl list json. This option can apply to
> memdevs or regions.
>
> Include media-errors in the -vvv verbose option.
>
> Example usage in the Documentation/cxl/cxl-list.txt update.
>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
>  Documentation/cxl/cxl-list.txt | 62 +++++++++++++++++++++++++++++++++-
>  cxl/filter.h                   |  3 ++
>  cxl/list.c                     |  3 ++
>  3 files changed, 67 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/cxl/cxl-list.txt b/Documentation/cxl/cxl-list.txt
> index 838de4086678..6d3ef92c29e8 100644
> --- a/Documentation/cxl/cxl-list.txt
> +++ b/Documentation/cxl/cxl-list.txt

[snip]

+----
+In the above example, region mappings can be found using:
+"cxl list -p mem9 --decoders"
+----

Hi, shouldn't this be '-m mem9' instead of '-p mem9'? FYI, the same text
also appears in the cover letter.

Thanks,
Wonjae

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [ndctl PATCH v11 6/7] cxl/list: add --media-errors option to cxl list
  2024-03-15  1:09   ` [ndctl PATCH v11 6/7] cxl/list: add --media-errors option to cxl list Wonjae Lee
@ 2024-03-15  2:36     ` Alison Schofield
  2024-03-15  3:35       ` Dan Williams
  0 siblings, 1 reply; 29+ messages in thread
From: Alison Schofield @ 2024-03-15  2:36 UTC (permalink / raw)
  To: Wonjae Lee; +Cc: Vishal Verma, Hojin Nam, nvdimm, linux-cxl

On Fri, Mar 15, 2024 at 10:09:44AM +0900, Wonjae Lee wrote:
> alison.schofield@intel.com wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> >
> > The --media-errors option to 'cxl list' retrieves poison lists from
> > memory devices supporting the capability and displays the returned
> > media_error records in the cxl list json. This option can apply to
> > memdevs or regions.
> >
> > Include media-errors in the -vvv verbose option.
> >
> > Example usage in the Documentation/cxl/cxl-list.txt update.
> >
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > ---
> >  Documentation/cxl/cxl-list.txt | 62 +++++++++++++++++++++++++++++++++-
> >  cxl/filter.h                   |  3 ++
> >  cxl/list.c                     |  3 ++
> >  3 files changed, 67 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/cxl/cxl-list.txt b/Documentation/cxl/cxl-list.txt
> > index 838de4086678..6d3ef92c29e8 100644
> > --- a/Documentation/cxl/cxl-list.txt
> > +++ b/Documentation/cxl/cxl-list.txt
> 
> [snip]
> 
> +----
> +In the above example, region mappings can be found using:
> +"cxl list -p mem9 --decoders"
> +----
> 
> Hi, shouldn't this be '-m mem9' instead of '-p mem9'? FYI, the same text
> also appears in the cover letter.

Thanks for the review! I went with -p because it gives only
the endpoint decoder while -m gives all the decoders up to
the root - more than needed to discover the region.

Here are the 2 outputs - what do you think?

# cxl list -p mem9 --decoders -u
{
  "decoder":"decoder20.0",
  "resource":"0xf110000000",
  "size":"2.00 GiB (2.15 GB)",
  "interleave_ways":2,
  "interleave_granularity":4096,
  "region":"region5",
  "dpa_resource":"0x40000000",
  "dpa_size":"1024.00 MiB (1073.74 MB)",
  "mode":"pmem"
}

# cxl list -m mem9 --decoders -u
[
  {
    "root decoders":[
      {
        "decoder":"decoder7.1",
        "resource":"0xf050000000",
        "size":"2.00 GiB (2.15 GB)",
        "interleave_ways":2,
        "interleave_granularity":4096,
        "max_available_extent":"2.00 GiB (2.15 GB)",
        "volatile_capable":true,
        "qos_class":42,
        "nr_targets":2
      },
      {
        "decoder":"decoder7.3",
        "resource":"0xf110000000",
        "size":"2.00 GiB (2.15 GB)",
        "interleave_ways":2,
        "interleave_granularity":4096,
        "max_available_extent":0,
        "pmem_capable":true,
        "qos_class":42,
        "nr_targets":2
      }
    ]
  },
  {
    "port decoders":[
      {
        "decoder":"decoder9.0",
        "resource":"0xf110000000",
        "size":"2.00 GiB (2.15 GB)",
        "interleave_ways":1,
        "region":"region5",
        "nr_targets":1
      },
      {
        "decoder":"decoder13.0",
        "resource":"0xf110000000",
        "size":"2.00 GiB (2.15 GB)",
        "interleave_ways":1,
        "region":"region5",
        "nr_targets":1
      }
    ]
  },
  {
    "endpoint decoders":[
      {
        "decoder":"decoder20.0",
        "resource":"0xf110000000",
        "size":"2.00 GiB (2.15 GB)",
        "interleave_ways":2,
        "interleave_granularity":4096,
        "region":"region5",
        "dpa_resource":"0x40000000",
        "dpa_size":"1024.00 MiB (1073.74 MB)",
        "mode":"pmem"
      }
    ]
  }
]

> 
> Thanks,
> Wonjae

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [ndctl PATCH v11 6/7] cxl/list: add --media-errors option to cxl list
  2024-03-15  2:36     ` Alison Schofield
@ 2024-03-15  3:35       ` Dan Williams
  2024-03-20 20:40         ` Alison Schofield
  2024-03-27 19:48         ` Alison Schofield
  0 siblings, 2 replies; 29+ messages in thread
From: Dan Williams @ 2024-03-15  3:35 UTC (permalink / raw)
  To: Alison Schofield, Wonjae Lee; +Cc: Vishal Verma, Hojin Nam, nvdimm, linux-cxl

Alison Schofield wrote:
> On Fri, Mar 15, 2024 at 10:09:44AM +0900, Wonjae Lee wrote:
> > alison.schofield@intel.com wrote:
> > > From: Alison Schofield <alison.schofield@intel.com>
> > >
> > > The --media-errors option to 'cxl list' retrieves poison lists from
> > > memory devices supporting the capability and displays the returned
> > > media_error records in the cxl list json. This option can apply to
> > > memdevs or regions.
> > >
> > > Include media-errors in the -vvv verbose option.
> > >
> > > Example usage in the Documentation/cxl/cxl-list.txt update.
> > >
> > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > > ---
> > >  Documentation/cxl/cxl-list.txt | 62 +++++++++++++++++++++++++++++++++-
> > >  cxl/filter.h                   |  3 ++
> > >  cxl/list.c                     |  3 ++
> > >  3 files changed, 67 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/Documentation/cxl/cxl-list.txt b/Documentation/cxl/cxl-list.txt
> > > index 838de4086678..6d3ef92c29e8 100644
> > > --- a/Documentation/cxl/cxl-list.txt
> > > +++ b/Documentation/cxl/cxl-list.txt
> > 
> > [snip]
> > 
> > +----
> > +In the above example, region mappings can be found using:
> > +"cxl list -p mem9 --decoders"
> > +----
> > 
> > Hi, shouldn't this be '-m mem9' instead of '-p mem9'? FYI, the same text
> > also appears in the cover letter.
> 
> Thanks for the review! I went with -p because it gives only
> the endpoint decoder while -m gives all the decoders up to
> the root - more than needed to discover the region.

The first thing that comes to mind to list memory devices with their
decoders is:

    cxl list -MD -d endpoint

...however the problem is that endpoint ports connect memdevs to their
parent port, so the above results in:

  Warning: no matching devices found

I think I want to special case "-d endpoint" when both -M and -D are
specified to also imply -E, "endpoint ports". However that also seems to
have a bug at present:

# cxl list -EDM -d endpoint -iu
{
  "endpoint":"endpoint2",
  "host":"mem0",
  "parent_dport":"0000:34:00.0",
  "depth":2
}

That output needs to be fixed up to merge the results of:

# cxl list -ED -d endpoint -iu
{
  "endpoint":"endpoint2",
  "host":"mem0",
  "parent_dport":"0000:34:00.0",
  "depth":2,
  "decoders:endpoint2":[
    {
      "decoder":"decoder2.0",
      "interleave_ways":1,
      "state":"disabled"
    }
  ]
}

...and:

# cxl list -EMu
{
  "endpoint":"endpoint2",
  "host":"mem0",
  "parent_dport":"0000:34:00.0",
  "depth":2,
  "memdev":{
    "memdev":"mem0",
    "pmem_size":"512.00 MiB (536.87 MB)",
    "serial":"0",
    "host":"0000:35:00.0"
  }
}

...so that one can get a nice listing of just endpoint ports, their
decoders (with media errors) and their memdevs.

The reason that "cxl list -p mem9 -D" works is subtle: it filters the
endpoint decoders by an endpoint port filter. However, I think most users
would expect not to need to enable endpoint-port listings to see their
decoders; the natural key to filter endpoint decoders is by memdev.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [ndctl PATCH v11 4/7] cxl/event_trace: add helpers to retrieve tep fields by type
  2024-03-14  4:05 ` [ndctl PATCH v11 4/7] cxl/event_trace: add helpers to retrieve tep fields by type alison.schofield
@ 2024-03-15 15:44   ` Dave Jiang
  2024-03-15 17:39   ` Dan Williams
  2024-03-18 21:21   ` fan
  2 siblings, 0 replies; 29+ messages in thread
From: Dave Jiang @ 2024-03-15 15:44 UTC (permalink / raw)
  To: alison.schofield, Vishal Verma; +Cc: nvdimm, linux-cxl



On 3/13/24 9:05 PM, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> Add helpers to extract the value of an event record field given the
> field name. This is useful when the user knows the name and format
> of the field and simply needs to get it. The helpers also return
> the 'type'_MAX of the type when the field is not found.
> 
> Since this is in preparation for adding a cxl_poison private parser
> for 'cxl list --media-errors', support only the types that parser
> requires: u8, u32, u64.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
>  cxl/event_trace.c | 37 +++++++++++++++++++++++++++++++++++++
>  cxl/event_trace.h |  8 +++++++-
>  2 files changed, 44 insertions(+), 1 deletion(-)
> 
> diff --git a/cxl/event_trace.c b/cxl/event_trace.c
> index 640abdab67bf..324edb982888 100644
> --- a/cxl/event_trace.c
> +++ b/cxl/event_trace.c
> @@ -15,6 +15,43 @@
>  #define _GNU_SOURCE
>  #include <string.h>
>  
> +u64 cxl_get_field_u64(struct tep_event *event, struct tep_record *record,
> +		      const char *name)
> +{
> +	unsigned long long val;
> +
> +	if (tep_get_field_val(NULL, event, name, record, &val, 0))
> +		return ULLONG_MAX;
> +
> +	return val;
> +}
> +
> +u32 cxl_get_field_u32(struct tep_event *event, struct tep_record *record,
> +		      const char *name)
> +{
> +	char *val;
> +	int len;
> +
> +	val = tep_get_field_raw(NULL, event, name, record, &len, 0);
> +	if (!val)
> +		return UINT_MAX;
> +
> +	return *(u32 *)val;
> +}
> +
> +u8 cxl_get_field_u8(struct tep_event *event, struct tep_record *record,
> +		    const char *name)
> +{
> +	char *val;
> +	int len;
> +
> +	val = tep_get_field_raw(NULL, event, name, record, &len, 0);
> +	if (!val)
> +		return UCHAR_MAX;
> +
> +	return *(u8 *)val;
> +}
> +
>  static struct json_object *num_to_json(void *num, int elem_size, unsigned long flags)
>  {
>  	bool sign = flags & TEP_FIELD_IS_SIGNED;
> diff --git a/cxl/event_trace.h b/cxl/event_trace.h
> index b77cafb410c4..7b30c3922aef 100644
> --- a/cxl/event_trace.h
> +++ b/cxl/event_trace.h
> @@ -5,6 +5,7 @@
>  
>  #include <json-c/json.h>
>  #include <ccan/list/list.h>
> +#include <ccan/short_types/short_types.h>
>  
>  struct jlist_node {
>  	struct json_object *jobj;
> @@ -32,5 +33,10 @@ int cxl_parse_events(struct tracefs_instance *inst, struct event_ctx *ectx);
>  int cxl_event_tracing_enable(struct tracefs_instance *inst, const char *system,
>  		const char *event);
>  int cxl_event_tracing_disable(struct tracefs_instance *inst);
> -
> +u8 cxl_get_field_u8(struct tep_event *event, struct tep_record *record,
> +		    const char *name);
> +u32 cxl_get_field_u32(struct tep_event *event, struct tep_record *record,
> +		      const char *name);
> +u64 cxl_get_field_u64(struct tep_event *event, struct tep_record *record,
> +		      const char *name);
>  #endif
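
The helpers above follow a sentinel convention: when the named field
cannot be looked up, the getter returns the maximum value of its return
type. The stand-alone sketch below illustrates only that convention. It
does not use libtraceevent, and struct fake_field, find_field() and
get_field_u32() are hypothetical names invented for this illustration,
not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one named field of a parsed trace record. */
struct fake_field {
	const char *name;
	uint64_t val;
};

/* Hypothetical lookup: returns NULL when no field has this name. */
static const struct fake_field *find_field(const struct fake_field *fields,
					   size_t nr, const char *name)
{
	for (size_t i = 0; i < nr; i++)
		if (strcmp(fields[i].name, name) == 0)
			return &fields[i];
	return NULL;
}

/* Return the field value, or UINT32_MAX as the "not found" sentinel. */
static uint32_t get_field_u32(const struct fake_field *fields, size_t nr,
			      const char *name)
{
	const struct fake_field *f = find_field(fields, nr, name);

	if (!f)
		return UINT32_MAX;
	return (uint32_t)f->val;
}
```

One consequence of this convention is that a caller cannot tell a failed
lookup apart from a field that legitimately holds the maximum value,
which is most likely to matter for the u8 variant.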

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [ndctl PATCH v11 5/7] cxl/list: collect and parse media_error records
  2024-03-14  4:05 ` [ndctl PATCH v11 5/7] cxl/list: collect and parse media_error records alison.schofield
@ 2024-03-15 16:16   ` Dave Jiang
  2024-03-20 20:24     ` Alison Schofield
  0 siblings, 1 reply; 29+ messages in thread
From: Dave Jiang @ 2024-03-15 16:16 UTC (permalink / raw)
  To: alison.schofield, Vishal Verma; +Cc: nvdimm, linux-cxl



On 3/13/24 9:05 PM, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> Media_error records are logged as events in the kernel tracing
> subsystem. To prepare the media_error records for cxl list, enable
> tracing, trigger the poison list read, and parse the generated
> cxl_poison events into a json representation.
> 
> Use the event_trace private parsing option to customize the json
> representation based on cxl-list calling options and event field
> settings.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Minor nit below.
> ---
>  cxl/json.c | 194 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 194 insertions(+)
> 
> diff --git a/cxl/json.c b/cxl/json.c
> index fbe41c78e82a..974e98f13cec 100644
> --- a/cxl/json.c
> +++ b/cxl/json.c
> @@ -1,16 +1,20 @@
>  // SPDX-License-Identifier: GPL-2.0
>  // Copyright (C) 2015-2021 Intel Corporation. All rights reserved.
>  #include <limits.h>
> +#include <errno.h>
>  #include <util/json.h>
> +#include <util/bitmap.h>
>  #include <uuid/uuid.h>
>  #include <cxl/libcxl.h>
>  #include <json-c/json.h>
>  #include <json-c/printbuf.h>
>  #include <ccan/short_types/short_types.h>
> +#include <tracefs/tracefs.h>
>  
>  #include "filter.h"
>  #include "json.h"
>  #include "../daxctl/json.h"
> +#include "event_trace.h"
>  
>  #define CXL_FW_VERSION_STR_LEN	16
>  #define CXL_FW_MAX_SLOTS	4
> @@ -571,6 +575,184 @@ err_jobj:
>  	return NULL;
>  }
>  
> +/* CXL Spec 3.1 Table 8-140 Media Error Record */
> +#define CXL_POISON_SOURCE_MAX 7
> +static const char *poison_source[] = { "Unknown",  "External", "Internal",
> +				       "Injected", "Reserved", "Reserved",
> +				       "Reserved", "Vendor" };
> +
> +/* CXL Spec 3.1 Table 8-139 Get Poison List Output Payload */
> +#define CXL_POISON_FLAG_MORE BIT(0)
> +#define CXL_POISON_FLAG_OVERFLOW BIT(1)
> +#define CXL_POISON_FLAG_SCANNING BIT(2)
> +
> +static int poison_event_to_json(struct tep_event *event,
> +				struct tep_record *record,
> +				struct event_ctx *e_ctx)
> +{
> +	struct poison_ctx *p_ctx = e_ctx->poison_ctx;
> +	struct json_object *jp, *jobj, *jpoison = p_ctx->jpoison;
> +	struct cxl_memdev *memdev = p_ctx->memdev;
> +	struct cxl_region *region = p_ctx->region;
> +	unsigned long flags = p_ctx->flags;
> +	const char *region_name = NULL;
> +	char flag_str[32] = { '\0' };
> +	bool overflow = false;
> +	u8 source, pflags;
> +	u64 offset, ts;
> +	u32 length;
> +	char *str;
> +	int len;
> +
> +	jp = json_object_new_object();
> +	if (!jp)
> +		return -ENOMEM;
> +
> +	/* Skip records not in this region when listing by region */
> +	if (region)
> +		region_name = cxl_region_get_devname(region);
> +	if (region_name)
> +		str = tep_get_field_raw(NULL, event, "region", record, &len, 0);
> +	if ((region_name) && (strcmp(region_name, str) != 0)) {
> +		json_object_put(jp);
> +		return 0;
> +	}
> +	/* Include offset,length by region (hpa) or by memdev (dpa) */
> +	if (region) {
> +		offset = cxl_get_field_u64(event, record, "hpa");
> +		if (offset != ULLONG_MAX) {
> +			offset = offset - cxl_region_get_resource(region);
> +			jobj = util_json_object_hex(offset, flags);
> +			if (jobj)
> +				json_object_object_add(jp, "offset", jobj);
> +		}
> +	} else if (memdev) {
> +		offset = cxl_get_field_u64(event, record, "dpa");
> +		if (offset != ULLONG_MAX) {
> +			jobj = util_json_object_hex(offset, flags);
> +			if (jobj)
> +				json_object_object_add(jp, "offset", jobj);
> +		}
> +	}
> +	length = cxl_get_field_u32(event, record, "dpa_length");
> +	jobj = util_json_object_size(length, flags);
> +	if (jobj)
> +		json_object_object_add(jp, "length", jobj);
> +
> +	/* Always include the poison source */
> +	source = cxl_get_field_u8(event, record, "source");
> +	if (source <= CXL_POISON_SOURCE_MAX)
> +		jobj = json_object_new_string(poison_source[source]);
> +	else
> +		jobj = json_object_new_string("Reserved");
> +	if (jobj)
> +		json_object_object_add(jp, "source", jobj);
> +
> +	/* Include flags and overflow time if present */
> +	pflags = cxl_get_field_u8(event, record, "flags");
> +	if (pflags && pflags < UCHAR_MAX) {
> +		if (pflags & CXL_POISON_FLAG_MORE)
> +			strcat(flag_str, "More,");
> +		if (pflags & CXL_POISON_FLAG_SCANNING)
> +			strcat(flag_str, "Scanning,");
> +		if (pflags & CXL_POISON_FLAG_OVERFLOW) {
> +			strcat(flag_str, "Overflow,");
> +			overflow = true;
> +		}
> +		jobj = json_object_new_string(flag_str);
> +		if (jobj)
> +			json_object_object_add(jp, "flags", jobj);
> +	}
> +	if (overflow) {
> +		ts = cxl_get_field_u64(event, record, "overflow_ts");
> +		jobj = util_json_object_hex(ts, flags);
> +		if (jobj)
> +			json_object_object_add(jp, "overflow_t", jobj);
> +	}
> +	json_object_array_add(jpoison, jp);
> +
> +	return 0;
> +}
> +
> +static struct json_object *
> +util_cxl_poison_events_to_json(struct tracefs_instance *inst,
> +			       struct poison_ctx *p_ctx)
> +{
> +	struct event_ctx ectx = {
> +		.event_name = "cxl_poison",
> +		.event_pid = getpid(),
> +		.system = "cxl",
> +		.poison_ctx = p_ctx,
> +		.parse_event = poison_event_to_json,
> +	};
> +	int rc = 0;

No need to init rc here. 

DJ

> +
> +	p_ctx->jpoison = json_object_new_array();
> +	if (!p_ctx->jpoison)
> +		return NULL;
> +
> +	rc = cxl_parse_events(inst, &ectx);
> +	if (rc < 0) {
> +		fprintf(stderr, "Failed to parse events: %d\n", rc);
> +		goto put_jobj;
> +	}
> +	if (json_object_array_length(p_ctx->jpoison) == 0)
> +		goto put_jobj;
> +
> +	return p_ctx->jpoison;
> +
> +put_jobj:
> +	json_object_put(p_ctx->jpoison);
> +	return NULL;
> +}
> +
> +static struct json_object *
> +util_cxl_poison_list_to_json(struct cxl_region *region,
> +			     struct cxl_memdev *memdev,
> +			     unsigned long flags)
> +{
> +	struct json_object *jpoison = NULL;
> +	struct poison_ctx p_ctx;
> +	struct tracefs_instance *inst;
> +	int rc;
> +
> +	inst = tracefs_instance_create("cxl list");
> +	if (!inst) {
> +		fprintf(stderr, "tracefs_instance_create() failed\n");
> +		return NULL;
> +	}
> +
> +	rc = cxl_event_tracing_enable(inst, "cxl", "cxl_poison");
> +	if (rc < 0) {
> +		fprintf(stderr, "Failed to enable trace: %d\n", rc);
> +		goto err_free;
> +	}
> +
> +	if (region)
> +		rc = cxl_region_trigger_poison_list(region);
> +	else
> +		rc = cxl_memdev_trigger_poison_list(memdev);
> +	if (rc)
> +		goto err_free;
> +
> +	rc = cxl_event_tracing_disable(inst);
> +	if (rc < 0) {
> +		fprintf(stderr, "Failed to disable trace: %d\n", rc);
> +		goto err_free;
> +	}
> +
> +	p_ctx = (struct poison_ctx) {
> +		.region = region,
> +		.memdev = memdev,
> +		.flags = flags,
> +	};
> +	jpoison = util_cxl_poison_events_to_json(inst, &p_ctx);
> +
> +err_free:
> +	tracefs_instance_free(inst);
> +	return jpoison;
> +}
> +
>  struct json_object *util_cxl_memdev_to_json(struct cxl_memdev *memdev,
>  		unsigned long flags)
>  {
> @@ -664,6 +846,12 @@ struct json_object *util_cxl_memdev_to_json(struct cxl_memdev *memdev,
>  			json_object_object_add(jdev, "firmware", jobj);
>  	}
>  
> +	if (flags & UTIL_JSON_MEDIA_ERRORS) {
> +		jobj = util_cxl_poison_list_to_json(NULL, memdev, flags);
> +		if (jobj)
> +			json_object_object_add(jdev, "media_errors", jobj);
> +	}
> +
>  	json_object_set_userdata(jdev, memdev, NULL);
>  	return jdev;
>  }
> @@ -1012,6 +1200,12 @@ struct json_object *util_cxl_region_to_json(struct cxl_region *region,
>  			json_object_object_add(jregion, "state", jobj);
>  	}
>  
> +	if (flags & UTIL_JSON_MEDIA_ERRORS) {
> +		jobj = util_cxl_poison_list_to_json(region, NULL, flags);
> +		if (jobj)
> +			json_object_object_add(jregion, "media_errors", jobj);
> +	}
> +
>  	util_cxl_mappings_append_json(jregion, region, flags);
>  
>  	if (flags & UTIL_JSON_DAX) {
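
For reference, the flag decoding in poison_event_to_json() appends each
name with strcat(), so the resulting string ends in a comma (for example
"More,"). The sketch below is a possible variation, not the patch's
code: it joins the names with a separator only between entries and
bounds every write. The function name poison_flags_to_str() and the
POISON_FLAG_* macro names are assumptions made for this example; the bit
positions are the ones defined in the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bit positions as defined in the patch above (Get Poison List flags). */
#define POISON_FLAG_MORE	(1u << 0)
#define POISON_FLAG_OVERFLOW	(1u << 1)
#define POISON_FLAG_SCANNING	(1u << 2)

/*
 * Write a comma separated list of the set flag names into buf.
 * Returns buf; buf holds the empty string when no known flag is set.
 */
static char *poison_flags_to_str(uint8_t pflags, char *buf, size_t len)
{
	static const struct {
		uint8_t bit;
		const char *name;
	} names[] = {
		{ POISON_FLAG_MORE, "More" },
		{ POISON_FLAG_SCANNING, "Scanning" },
		{ POISON_FLAG_OVERFLOW, "Overflow" },
	};
	size_t used = 0;

	if (len == 0)
		return buf;
	buf[0] = '\0';

	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		int n;

		if (!(pflags & names[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "," : "", names[i].name);
		if (n < 0 || (size_t)n >= len - used) {
			/* Out of room: drop the partial name and stop. */
			buf[used] = '\0';
			break;
		}
		used += (size_t)n;
	}
	return buf;
}
```

With a 32 byte buffer, as used for flag_str in the patch, all three
names fit.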

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [ndctl PATCH v11 6/7] cxl/list: add --media-errors option to cxl list
  2024-03-14  4:05 ` [ndctl PATCH v11 6/7] cxl/list: add --media-errors option to cxl list alison.schofield
@ 2024-03-15 16:41   ` Dave Jiang
  0 siblings, 0 replies; 29+ messages in thread
From: Dave Jiang @ 2024-03-15 16:41 UTC (permalink / raw)
  To: alison.schofield, Vishal Verma; +Cc: nvdimm, linux-cxl



On 3/13/24 9:05 PM, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> The --media-errors option to 'cxl list' retrieves poison lists from
> memory devices supporting the capability and displays the returned
> media_error records in the cxl list json. This option can apply to
> memdevs or regions.
> 
> Include media-errors in the -vvv verbose option.
> 
> Example usage in the Documentation/cxl/cxl-list.txt update.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
>  Documentation/cxl/cxl-list.txt | 62 +++++++++++++++++++++++++++++++++-
>  cxl/filter.h                   |  3 ++
>  cxl/list.c                     |  3 ++
>  3 files changed, 67 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/cxl/cxl-list.txt b/Documentation/cxl/cxl-list.txt
> index 838de4086678..6d3ef92c29e8 100644
> --- a/Documentation/cxl/cxl-list.txt
> +++ b/Documentation/cxl/cxl-list.txt
> @@ -415,6 +415,66 @@ OPTIONS
>  --region::
>  	Specify CXL region device name(s), or device id(s), to filter the listing.
>  
> +-L::
> +--media-errors::
> +	Include media-error information. The poison list is retrieved from the
> +	device(s) and media_error records are added to the listing. Apply this
> +	option to memdevs and regions where devices support the poison list
> +	capability. "offset:" is relative to the region resource when listing
> +	by region and is the absolute device DPA when listing by memdev.
> +	"source:" is one of: External, Internal, Injected, Vendor Specific,
> +	or Unknown, as defined in CXL Specification v3.1 Table 8-140.
> +
> +----
> +# cxl list -m mem9 --media-errors -u
> +{
> +  "memdev":"mem9",
> +  "pmem_size":"1024.00 MiB (1073.74 MB)",
> +  "pmem_qos_class":42,
> +  "ram_size":"1024.00 MiB (1073.74 MB)",
> +  "ram_qos_class":42,
> +  "serial":"0x5",
> +  "numa_node":1,
> +  "host":"cxl_mem.5",
> +  "media_errors":[
> +    {
> +      "offset":"0x40000000",
> +      "length":64,
> +      "source":"Injected"
> +    }
> +  ]
> +}
> +----
> +In the above example, region mappings can be found using:
> +"cxl list -p mem9 --decoders"
> +----
> +# cxl list -r region5 --media-errors -u
> +{
> +  "region":"region5",
> +  "resource":"0xf110000000",
> +  "size":"2.00 GiB (2.15 GB)",
> +  "type":"pmem",
> +  "interleave_ways":2,
> +  "interleave_granularity":4096,
> +  "decode_state":"commit",
> +  "media_errors":[
> +    {
> +      "offset":"0x1000",
> +      "length":64,
> +      "source":"Injected"
> +    },
> +    {
> +      "offset":"0x2000",
> +      "length":64,
> +      "source":"Injected"
> +    }
> +  ]
> +}
> +----
> +In the above example, memdev mappings can be found using:
> +"cxl list -r region5 --targets" and "cxl list -d <decoder_name>"
> +
> +
>  -v::
>  --verbose::
>  	Increase verbosity of the output. This can be specified
> @@ -431,7 +491,7 @@ OPTIONS
>  	  devices with --idle.
>  	- *-vvv*
>  	  Everything *-vv* provides, plus enable
> -	  --health and --partition.
> +	  --health, --partition, --media-errors.
>  
>  --debug::
>  	If the cxl tool was built with debug enabled, turn on debug
> diff --git a/cxl/filter.h b/cxl/filter.h
> index 3f65990f835a..956a46e0c7a9 100644
> --- a/cxl/filter.h
> +++ b/cxl/filter.h
> @@ -30,6 +30,7 @@ struct cxl_filter_params {
>  	bool fw;
>  	bool alert_config;
>  	bool dax;
> +	bool media_errors;
>  	int verbose;
>  	struct log_ctx ctx;
>  };
> @@ -88,6 +89,8 @@ static inline unsigned long cxl_filter_to_flags(struct cxl_filter_params *param)
>  		flags |= UTIL_JSON_ALERT_CONFIG;
>  	if (param->dax)
>  		flags |= UTIL_JSON_DAX | UTIL_JSON_DAX_DEVS;
> +	if (param->media_errors)
> +		flags |= UTIL_JSON_MEDIA_ERRORS;
>  	return flags;
>  }
>  
> diff --git a/cxl/list.c b/cxl/list.c
> index 93ba51ef895c..0b25d78248d5 100644
> --- a/cxl/list.c
> +++ b/cxl/list.c
> @@ -57,6 +57,8 @@ static const struct option options[] = {
>  		    "include memory device firmware information"),
>  	OPT_BOOLEAN('A', "alert-config", &param.alert_config,
>  		    "include alert configuration information"),
> +	OPT_BOOLEAN('L', "media-errors", &param.media_errors,
> +		    "include media-error information "),
>  	OPT_INCR('v', "verbose", &param.verbose, "increase output detail"),
>  #ifdef ENABLE_DEBUG
>  	OPT_BOOLEAN(0, "debug", &debug, "debug list walk"),
> @@ -121,6 +123,7 @@ int cmd_list(int argc, const char **argv, struct cxl_ctx *ctx)
>  		param.fw = true;
>  		param.alert_config = true;
>  		param.dax = true;
> +		param.media_errors = true;
>  		/* fallthrough */
>  	case 2:
>  		param.idle = true;
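
The man page text above gives "offset" two meanings: relative to the
region resource when listing by region, and the absolute device DPA when
listing by memdev. A minimal sketch of that arithmetic follows.
media_error_offset() is a hypothetical helper written for this note and
is not a function in ndctl.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the "offset" reported for a media error.
 * By region: the host physical address minus the region's base resource.
 * By memdev: the device physical address, unchanged.
 */
static uint64_t media_error_offset(bool by_region, uint64_t hpa,
				   uint64_t dpa, uint64_t region_resource)
{
	return by_region ? hpa - region_resource : dpa;
}
```

Using the region5 example, whose resource is 0xf110000000, a record at
host physical address 0xf110001000 would be listed at offset 0x1000.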

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [ndctl PATCH v11 7/7] cxl/test: add cxl-poison.sh unit test
  2024-03-14  4:05 ` [ndctl PATCH v11 7/7] cxl/test: add cxl-poison.sh unit test alison.schofield
@ 2024-03-15 17:03   ` Dave Jiang
  0 siblings, 0 replies; 29+ messages in thread
From: Dave Jiang @ 2024-03-15 17:03 UTC (permalink / raw)
  To: alison.schofield, Vishal Verma; +Cc: nvdimm, linux-cxl



On 3/13/24 9:05 PM, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> Exercise cxl list, libcxl, and driver pieces of the get poison list
> pathway. Inject and clear poison using debugfs and use cxl-cli to
> read the poison list by memdev and by region.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
>  test/cxl-poison.sh | 137 +++++++++++++++++++++++++++++++++++++++++++++
>  test/meson.build   |   2 +
>  2 files changed, 139 insertions(+)
>  create mode 100644 test/cxl-poison.sh
> 
> diff --git a/test/cxl-poison.sh b/test/cxl-poison.sh
> new file mode 100644
> index 000000000000..af2e9dcd1a11
> --- /dev/null
> +++ b/test/cxl-poison.sh
> @@ -0,0 +1,137 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (C) 2023 Intel Corporation. All rights reserved.
> +
> +. "$(dirname "$0")"/common
> +
> +rc=77
> +
> +set -ex
> +
> +trap 'err $LINENO' ERR
> +
> +check_prereq "jq"
> +
> +modprobe -r cxl_test
> +modprobe cxl_test
> +
> +rc=1
> +
> +# THEORY OF OPERATION: Exercise cxl-cli and cxl driver ability to
> +# inject, clear, and get the poison list. Do it by memdev and by region.
> +
> +find_memdev()
> +{
> +	readarray -t capable_mems < <("$CXL" list -b "$CXL_TEST_BUS" -M |
> +		jq -r ".[] | select(.pmem_size != null) |
> +		select(.ram_size != null) | .memdev")
> +
> +	if [ ${#capable_mems[@]} == 0 ]; then
> +		echo "no memdevs found for test"
> +		err "$LINENO"
> +	fi
> +
> +	memdev=${capable_mems[0]}
> +}
> +
> +create_x2_region()
> +{
> +	# Find an x2 decoder
> +	decoder="$($CXL list -b "$CXL_TEST_BUS" -D -d root | jq -r ".[] |
> +		select(.pmem_capable == true) |
> +		select(.nr_targets == 2) |
> +		.decoder")"
> +
> +	# Find a memdev for each host-bridge interleave position
> +	port_dev0="$($CXL list -T -d "$decoder" | jq -r ".[] |
> +		.targets | .[] | select(.position == 0) | .target")"
> +	port_dev1="$($CXL list -T -d "$decoder" | jq -r ".[] |
> +		.targets | .[] | select(.position == 1) | .target")"
> +	mem0="$($CXL list -M -p "$port_dev0" | jq -r ".[0].memdev")"
> +	mem1="$($CXL list -M -p "$port_dev1" | jq -r ".[0].memdev")"
> +
> +	region="$($CXL create-region -d "$decoder" -m "$mem0" "$mem1" |
> +		jq -r ".region")"
> +	if [[ ! $region ]]; then
> +		echo "create-region failed for $decoder"
> +		err "$LINENO"
> +	fi
> +	echo "$region"
> +}
> +
> +# When cxl-cli support for inject and clear arrives, replace
> +# the writes to /sys/kernel/debug with the new cxl commands.
> +
> +inject_poison_sysfs()
> +{
> +	memdev="$1"
> +	addr="$2"
> +
> +	echo "$addr" > /sys/kernel/debug/cxl/"$memdev"/inject_poison
> +}
> +
> +clear_poison_sysfs()
> +{
> +	memdev="$1"
> +	addr="$2"
> +
> +	echo "$addr" > /sys/kernel/debug/cxl/"$memdev"/clear_poison
> +}
> +
> +validate_poison_found()
> +{
> +	list_by="$1"
> +	nr_expect="$2"
> +
> +	poison_list="$($CXL list "$list_by" --media-errors |
> +		jq -r '.[].media_errors')"
> +	if [[ ! $poison_list ]]; then
> +		nr_found=0
> +	else
> +		nr_found=$(jq "length" <<< "$poison_list")
> +	fi
> +	if [ "$nr_found" -ne "$nr_expect" ]; then
> +		echo "$nr_expect poison records expected, $nr_found found"
> +		err "$LINENO"
> +	fi
> +}
> +
> +test_poison_by_memdev()
> +{
> +	find_memdev
> +	inject_poison_sysfs "$memdev" "0x40000000"
> +	inject_poison_sysfs "$memdev" "0x40001000"
> +	inject_poison_sysfs "$memdev" "0x600"
> +	inject_poison_sysfs "$memdev" "0x0"
> +	validate_poison_found "-m $memdev" 4
> +
> +	clear_poison_sysfs "$memdev" "0x40000000"
> +	clear_poison_sysfs "$memdev" "0x40001000"
> +	clear_poison_sysfs "$memdev" "0x600"
> +	clear_poison_sysfs "$memdev" "0x0"
> +	validate_poison_found "-m $memdev" 0
> +}
> +
> +test_poison_by_region()
> +{
> +	create_x2_region
> +	inject_poison_sysfs "$mem0" "0x40000000"
> +	inject_poison_sysfs "$mem1" "0x40000000"
> +	validate_poison_found "-r $region" 2
> +
> +	clear_poison_sysfs "$mem0" "0x40000000"
> +	clear_poison_sysfs "$mem1" "0x40000000"
> +	validate_poison_found "-r $region" 0
> +}
> +
> +# Turn tracing on. Note that 'cxl list --poison' does toggle the tracing.
> +# Turning it on here allows the test user to also view inject and clear
> +# trace events.
> +echo 1 > /sys/kernel/tracing/events/cxl/cxl_poison/enable
> +
> +test_poison_by_memdev
> +test_poison_by_region
> +
> +check_dmesg "$LINENO"
> +
> +modprobe -r cxl-test
> diff --git a/test/meson.build b/test/meson.build
> index a965a79fd6cb..d871e28e17ce 100644
> --- a/test/meson.build
> +++ b/test/meson.build
> @@ -160,6 +160,7 @@ cxl_events = find_program('cxl-events.sh')
>  cxl_sanitize = find_program('cxl-sanitize.sh')
>  cxl_destroy_region = find_program('cxl-destroy-region.sh')
>  cxl_qos_class = find_program('cxl-qos-class.sh')
> +cxl_poison = find_program('cxl-poison.sh')
>  
>  tests = [
>    [ 'libndctl',               libndctl,		  'ndctl' ],
> @@ -192,6 +193,7 @@ tests = [
>    [ 'cxl-sanitize.sh',        cxl_sanitize,       'cxl'   ],
>    [ 'cxl-destroy-region.sh',  cxl_destroy_region, 'cxl'   ],
>    [ 'cxl-qos-class.sh',       cxl_qos_class,      'cxl'   ],
> +  [ 'cxl-poison.sh',          cxl_poison,         'cxl'   ],
>  ]
>  
>  if get_option('destructive').enabled()


* Re: [ndctl PATCH v11 4/7] cxl/event_trace: add helpers to retrieve tep fields by type
  2024-03-14  4:05 ` [ndctl PATCH v11 4/7] cxl/event_trace: add helpers to retrieve tep fields by type alison.schofield
  2024-03-15 15:44   ` Dave Jiang
@ 2024-03-15 17:39   ` Dan Williams
  2024-03-18 17:28     ` Alison Schofield
  2024-03-18 21:21   ` fan
  2 siblings, 1 reply; 29+ messages in thread
From: Dan Williams @ 2024-03-15 17:39 UTC (permalink / raw)
  To: alison.schofield, Vishal Verma; +Cc: Alison Schofield, nvdimm, linux-cxl

alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> Add helpers to extract the value of an event record field given the
> field name. This is useful when the user knows the name and format
> of the field and simply needs to get it. The helpers also return
> the 'type'_MAX of the type when the field is
> 
> Since this is in preparation for adding a cxl_poison private parser
> for 'cxl list --media-errors' support those specific required
> types: u8, u32, u64.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
>  cxl/event_trace.c | 37 +++++++++++++++++++++++++++++++++++++
>  cxl/event_trace.h |  8 +++++++-
>  2 files changed, 44 insertions(+), 1 deletion(-)
> 
> diff --git a/cxl/event_trace.c b/cxl/event_trace.c
> index 640abdab67bf..324edb982888 100644
> --- a/cxl/event_trace.c
> +++ b/cxl/event_trace.c
> @@ -15,6 +15,43 @@
>  #define _GNU_SOURCE
>  #include <string.h>
>  
> +u64 cxl_get_field_u64(struct tep_event *event, struct tep_record *record,
> +		      const char *name)
> +{
> +	unsigned long long val;
> +
> +	if (tep_get_field_val(NULL, event, name, record, &val, 0))
> +		return ULLONG_MAX;
> +
> +	return val;
> +}

Hm, why are these prefixed "cxl_"? There is nothing CXL-specific in the
internals. Maybe these event trace helpers grow non-CXL users in the
future. Could be "trace_" or "util_" like other generic helpers in the
codebase.


* RE: [ndctl PATCH v11 7/7] cxl/test: add cxl-poison.sh unit test
       [not found] ` <CGME20240314040551epcas2p40829b16b09f439519a692070fb460242@epcms2p1>
@ 2024-03-15 23:03   ` Wonjae Lee
  2024-03-18 17:17     ` Alison Schofield
  0 siblings, 1 reply; 29+ messages in thread
From: Wonjae Lee @ 2024-03-15 23:03 UTC (permalink / raw)
  To: alison.schofield, Vishal Verma; +Cc: Hojin Nam, nvdimm, linux-cxl

alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
>
> Exercise cxl list, libcxl, and driver pieces of the get poison list
> pathway. Inject and clear poison using debugfs and use cxl-cli to
> read the poison list by memdev and by region.
>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
>  test/cxl-poison.sh | 137 +++++++++++++++++++++++++++++++++++++++++++++
>  test/meson.build   |   2 +
>  2 files changed, 139 insertions(+)
>  create mode 100644 test/cxl-poison.sh
>
> diff --git a/test/cxl-poison.sh b/test/cxl-poison.sh
> new file mode 100644
> index 000000000000..af2e9dcd1a11
> --- /dev/null
> +++ b/test/cxl-poison.sh

[snip]

> +# Turn tracing on. Note that 'cxl list --poison' does toggle the tracing.

Hi,

I know it's trivial and not sure if I'm understanding the history of
the patch series correctly, but --poison seems to be an option that
was suggested before --media-errors. I'm wondering if it's okay to
leave this comment.

Thanks,
Wonjae


* Re: [ndctl PATCH v11 7/7] cxl/test: add cxl-poison.sh unit test
  2024-03-15 23:03   ` [ndctl PATCH v11 7/7] cxl/test: add cxl-poison.sh unit test Wonjae Lee
@ 2024-03-18 17:17     ` Alison Schofield
  0 siblings, 0 replies; 29+ messages in thread
From: Alison Schofield @ 2024-03-18 17:17 UTC (permalink / raw)
  To: Wonjae Lee; +Cc: Vishal Verma, Hojin Nam, nvdimm, linux-cxl

On Sat, Mar 16, 2024 at 08:03:34AM +0900, Wonjae Lee wrote:
> alison.schofield@intel.com wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> >
> > Exercise cxl list, libcxl, and driver pieces of the get poison list
> > pathway. Inject and clear poison using debugfs and use cxl-cli to
> > read the poison list by memdev and by region.
> >
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > ---
> >  test/cxl-poison.sh | 137 +++++++++++++++++++++++++++++++++++++++++++++
> >  test/meson.build   |   2 +
> >  2 files changed, 139 insertions(+)
> >  create mode 100644 test/cxl-poison.sh
> >
> > diff --git a/test/cxl-poison.sh b/test/cxl-poison.sh
> > new file mode 100644
> > index 000000000000..af2e9dcd1a11
> > --- /dev/null
> > +++ b/test/cxl-poison.sh
> 
> [snip]
> 
> > +# Turn tracing on. Note that 'cxl list --poison' does toggle the tracing.
> 
> Hi,
> 
> I know it's trivial and not sure if I'm understanding the history of
> the patch series correctly, but --poison seems to be an option that
> was suggested before --media-errors. I'm wondering if it's okay to
> leave this comment.

Thanks Wonjae - I appreciate your find. I'll fix it up.
Alison

> 
> Thanks,
> Wonjae


* Re: [ndctl PATCH v11 4/7] cxl/event_trace: add helpers to retrieve tep fields by type
  2024-03-15 17:39   ` Dan Williams
@ 2024-03-18 17:28     ` Alison Schofield
  0 siblings, 0 replies; 29+ messages in thread
From: Alison Schofield @ 2024-03-18 17:28 UTC (permalink / raw)
  To: Dan Williams, Dave Jiang; +Cc: Vishal Verma, nvdimm, linux-cxl

On Fri, Mar 15, 2024 at 10:39:53AM -0700, Dan Williams wrote:
> alison.schofield@ wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > Add helpers to extract the value of an event record field given the
> > field name. This is useful when the user knows the name and format
> > of the field and simply needs to get it. The helpers also return
> > the 'type'_MAX of the type when the field is
> > 
> > Since this is in preparation for adding a cxl_poison private parser
> > for 'cxl list --media-errors' support those specific required
> > types: u8, u32, u64.
> > 
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > ---
> >  cxl/event_trace.c | 37 +++++++++++++++++++++++++++++++++++++
> >  cxl/event_trace.h |  8 +++++++-
> >  2 files changed, 44 insertions(+), 1 deletion(-)
> > 
> > diff --git a/cxl/event_trace.c b/cxl/event_trace.c
> > index 640abdab67bf..324edb982888 100644
> > --- a/cxl/event_trace.c
> > +++ b/cxl/event_trace.c
> > @@ -15,6 +15,43 @@
> >  #define _GNU_SOURCE
> >  #include <string.h>
> >  
> > +u64 cxl_get_field_u64(struct tep_event *event, struct tep_record *record,
> > +		      const char *name)
> > +{
> > +	unsigned long long val;
> > +
> > +	if (tep_get_field_val(NULL, event, name, record, &val, 0))
> > +		return ULLONG_MAX;
> > +
> > +	return val;
> > +}
> 
> Hm, why are these prefixed "cxl_"? There is nothing CXL-specific in the
> internals. Maybe these event trace helpers grow non-CXL users in the
> future. Could be "trace_" or "util_" like other generic helpers in the
> codebase.

All the helpers in cxl/event_trace.c are prefixed "cxl_". The only
CXL-specific part is that ndctl/cxl is the sole user of trace events
in ndctl/: cxl/monitor.c and now cxl/json.c (this usage).

I can move ndctl/cxl/event_trace.{h,c} to ndctl/utils/event_trace.{h,c}
and update cxl/monitor.c to find them.

Yay?



* Re: [ndctl PATCH v11 1/7] libcxl: add interfaces for GET_POISON_LIST mailbox commands
  2024-03-14  4:05 ` [ndctl PATCH v11 1/7] libcxl: add interfaces for GET_POISON_LIST mailbox commands alison.schofield
@ 2024-03-18 17:51   ` fan
  2024-03-18 20:11     ` Alison Schofield
  0 siblings, 1 reply; 29+ messages in thread
From: fan @ 2024-03-18 17:51 UTC (permalink / raw)
  To: alison.schofield; +Cc: Vishal Verma, nvdimm, linux-cxl, Dave Jiang

On Wed, Mar 13, 2024 at 09:05:17PM -0700, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> CXL devices maintain a list of locations that are poisoned or result
> in poison if the addresses are accessed by the host.
> 
> Per the spec (CXL 3.1 8.2.9.9.4.1), the device returns the Poison
> List as a set of  Media Error Records that include the source of the
> error, the starting device physical address and length.
> 
> Trigger the retrieval of the poison list by writing to the memory
> device sysfs attribute: trigger_poison_list. The CXL driver only
> offers triggering per memdev, so the trigger by region interface
> offered here is a convenience API that triggers a poison list
> retrieval for each memdev contributing to a region.
> 
> int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev);
> int cxl_region_trigger_poison_list(struct cxl_region *region);
> 
> The resulting poison records are logged as kernel trace events
> named 'cxl_poison'.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
>  cxl/lib/libcxl.c   | 47 ++++++++++++++++++++++++++++++++++++++++++++++
>  cxl/lib/libcxl.sym |  2 ++
>  cxl/libcxl.h       |  2 ++
>  3 files changed, 51 insertions(+)
> 
> diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
> index ff27cdf7c44a..73db8f15c704 100644
> --- a/cxl/lib/libcxl.c
> +++ b/cxl/lib/libcxl.c
> @@ -1761,6 +1761,53 @@ CXL_EXPORT int cxl_memdev_disable_invalidate(struct cxl_memdev *memdev)
>  	return 0;
>  }
>  
> +CXL_EXPORT int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev)
> +{
> +	struct cxl_ctx *ctx = cxl_memdev_get_ctx(memdev);
> +	char *path = memdev->dev_buf;
> +	int len = memdev->buf_len, rc;
> +
> +	if (snprintf(path, len, "%s/trigger_poison_list",
> +		     memdev->dev_path) >= len) {
> +		err(ctx, "%s: buffer too small\n",
> +		    cxl_memdev_get_devname(memdev));
> +		return -ENXIO;
> +	}
> +	rc = sysfs_write_attr(ctx, path, "1\n");
> +	if (rc < 0) {
> +		fprintf(stderr,
> +			"%s: Failed write sysfs attr trigger_poison_list\n",
> +			cxl_memdev_get_devname(memdev));

Should we use err() instead of fprintf here? 

Fan

> +		return rc;
> +	}
> +	return 0;
> +}
> +
> +CXL_EXPORT int cxl_region_trigger_poison_list(struct cxl_region *region)
> +{
> +	struct cxl_memdev_mapping *mapping;
> +	int rc;
> +
> +	cxl_mapping_foreach(region, mapping) {
> +		struct cxl_decoder *decoder;
> +		struct cxl_memdev *memdev;
> +
> +		decoder = cxl_mapping_get_decoder(mapping);
> +		if (!decoder)
> +			continue;
> +
> +		memdev = cxl_decoder_get_memdev(decoder);
> +		if (!memdev)
> +			continue;
> +
> +		rc = cxl_memdev_trigger_poison_list(memdev);
> +		if (rc)
> +			return rc;
> +	}
> +
> +	return 0;
> +}
> +
>  CXL_EXPORT int cxl_memdev_enable(struct cxl_memdev *memdev)
>  {
>  	struct cxl_ctx *ctx = cxl_memdev_get_ctx(memdev);
> diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym
> index de2cd84b2960..3f709c60db3d 100644
> --- a/cxl/lib/libcxl.sym
> +++ b/cxl/lib/libcxl.sym
> @@ -280,4 +280,6 @@ global:
>  	cxl_memdev_get_pmem_qos_class;
>  	cxl_memdev_get_ram_qos_class;
>  	cxl_region_qos_class_mismatch;
> +	cxl_memdev_trigger_poison_list;
> +	cxl_region_trigger_poison_list;
>  } LIBCXL_6;
> diff --git a/cxl/libcxl.h b/cxl/libcxl.h
> index a6af3fb04693..29165043ca3f 100644
> --- a/cxl/libcxl.h
> +++ b/cxl/libcxl.h
> @@ -467,6 +467,8 @@ enum cxl_setpartition_mode {
>  
>  int cxl_cmd_partition_set_mode(struct cxl_cmd *cmd,
>  		enum cxl_setpartition_mode mode);
> +int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev);
> +int cxl_region_trigger_poison_list(struct cxl_region *region);
>  
>  int cxl_cmd_alert_config_set_life_used_prog_warn_threshold(struct cxl_cmd *cmd,
>  							   int threshold);
> -- 
> 2.37.3
> 
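
The trigger step described in the commit message above (build the
attribute path, reject a truncated path, write "1" to it) can be
sketched outside of libcxl as follows. This is a minimal sketch, not
the patch's code: trigger_poison_list() and its parameters are
hypothetical names, and plain open()/write() stand in for the
library's internal sysfs_write_attr() helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the trigger step: build
 * "<dev_path>/trigger_poison_list" in a caller-supplied buffer and
 * write "1\n" to it. Returns 0 on success or a negative errno.
 */
static int trigger_poison_list(const char *dev_path, char *buf, size_t len)
{
	static const char one[] = "1\n";
	ssize_t n;
	int fd, want, rc = 0;

	/* snprintf() returns the untruncated length; >= len means truncated */
	want = snprintf(buf, len, "%s/trigger_poison_list", dev_path);
	if (want < 0 || (size_t)want >= len)
		return -ENXIO;

	fd = open(buf, O_WRONLY);
	if (fd < 0)
		return -errno;	/* e.g. -ENOENT when the attribute is absent */

	n = write(fd, one, sizeof(one) - 1);
	if (n < 0)
		rc = -errno;
	else if ((size_t)n != sizeof(one) - 1)
		rc = -EIO;	/* short write */

	close(fd);
	return rc;
}
```

Returning the negative errno from open() lets a caller tell "attribute
not present" apart from a failed write, which is the distinction
discussed in the replies to this patch.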


* Re: [ndctl PATCH v11 1/7] libcxl: add interfaces for GET_POISON_LIST mailbox commands
  2024-03-18 17:51   ` fan
@ 2024-03-18 20:11     ` Alison Schofield
  2024-03-18 21:01       ` Dan Williams
  0 siblings, 1 reply; 29+ messages in thread
From: Alison Schofield @ 2024-03-18 20:11 UTC (permalink / raw)
  To: fan; +Cc: Vishal Verma, nvdimm, linux-cxl, Dave Jiang

On Mon, Mar 18, 2024 at 10:51:13AM -0700, fan wrote:
> On Wed, Mar 13, 2024 at 09:05:17PM -0700, alison.schofield@intel.com wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > CXL devices maintain a list of locations that are poisoned or result
> > in poison if the addresses are accessed by the host.
> > 
> > Per the spec (CXL 3.1 8.2.9.9.4.1), the device returns the Poison
> > List as a set of  Media Error Records that include the source of the
> > error, the starting device physical address and length.
> > 
> > Trigger the retrieval of the poison list by writing to the memory
> > device sysfs attribute: trigger_poison_list. The CXL driver only
> > offers triggering per memdev, so the trigger by region interface
> > offered here is a convenience API that triggers a poison list
> > retrieval for each memdev contributing to a region.
> > 
> > int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev);
> > int cxl_region_trigger_poison_list(struct cxl_region *region);
> > 
> > The resulting poison records are logged as kernel trace events
> > named 'cxl_poison'.
> > 
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> > ---
> >  cxl/lib/libcxl.c   | 47 ++++++++++++++++++++++++++++++++++++++++++++++
> >  cxl/lib/libcxl.sym |  2 ++
> >  cxl/libcxl.h       |  2 ++
> >  3 files changed, 51 insertions(+)
> > 
> > diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
> > index ff27cdf7c44a..73db8f15c704 100644
> > --- a/cxl/lib/libcxl.c
> > +++ b/cxl/lib/libcxl.c
> > @@ -1761,6 +1761,53 @@ CXL_EXPORT int cxl_memdev_disable_invalidate(struct cxl_memdev *memdev)
> >  	return 0;
> >  }
> >  
> > +CXL_EXPORT int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev)
> > +{
> > +	struct cxl_ctx *ctx = cxl_memdev_get_ctx(memdev);
> > +	char *path = memdev->dev_buf;
> > +	int len = memdev->buf_len, rc;
> > +
> > +	if (snprintf(path, len, "%s/trigger_poison_list",
> > +		     memdev->dev_path) >= len) {
> > +		err(ctx, "%s: buffer too small\n",
> > +		    cxl_memdev_get_devname(memdev));
> > +		return -ENXIO;
> > +	}
> > +	rc = sysfs_write_attr(ctx, path, "1\n");
> > +	if (rc < 0) {
> > +		fprintf(stderr,
> > +			"%s: Failed write sysfs attr trigger_poison_list\n",
> > +			cxl_memdev_get_devname(memdev));
> 
> Should we use err() instead of fprintf here? 

Thanks Fan,

How about this?

- use fprintf if access() fails, i.e. the device doesn't support poison list,
- use err() for failure to actually read the poison list on a device with
  support

Alison


> 
> Fan
> 
> > +		return rc;
> > +	}
> > +	return 0;
> > +}
> > +
> > +CXL_EXPORT int cxl_region_trigger_poison_list(struct cxl_region *region)
> > +{
> > +	struct cxl_memdev_mapping *mapping;
> > +	int rc;
> > +
> > +	cxl_mapping_foreach(region, mapping) {
> > +		struct cxl_decoder *decoder;
> > +		struct cxl_memdev *memdev;
> > +
> > +		decoder = cxl_mapping_get_decoder(mapping);
> > +		if (!decoder)
> > +			continue;
> > +
> > +		memdev = cxl_decoder_get_memdev(decoder);
> > +		if (!memdev)
> > +			continue;
> > +
> > +		rc = cxl_memdev_trigger_poison_list(memdev);
> > +		if (rc)
> > +			return rc;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> >  CXL_EXPORT int cxl_memdev_enable(struct cxl_memdev *memdev)
> >  {
> >  	struct cxl_ctx *ctx = cxl_memdev_get_ctx(memdev);
> > diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym
> > index de2cd84b2960..3f709c60db3d 100644
> > --- a/cxl/lib/libcxl.sym
> > +++ b/cxl/lib/libcxl.sym
> > @@ -280,4 +280,6 @@ global:
> >  	cxl_memdev_get_pmem_qos_class;
> >  	cxl_memdev_get_ram_qos_class;
> >  	cxl_region_qos_class_mismatch;
> > +	cxl_memdev_trigger_poison_list;
> > +	cxl_region_trigger_poison_list;
> >  } LIBCXL_6;
> > diff --git a/cxl/libcxl.h b/cxl/libcxl.h
> > index a6af3fb04693..29165043ca3f 100644
> > --- a/cxl/libcxl.h
> > +++ b/cxl/libcxl.h
> > @@ -467,6 +467,8 @@ enum cxl_setpartition_mode {
> >  
> >  int cxl_cmd_partition_set_mode(struct cxl_cmd *cmd,
> >  		enum cxl_setpartition_mode mode);
> > +int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev);
> > +int cxl_region_trigger_poison_list(struct cxl_region *region);
> >  
> >  int cxl_cmd_alert_config_set_life_used_prog_warn_threshold(struct cxl_cmd *cmd,
> >  							   int threshold);
> > -- 
> > 2.37.3
> > 


* Re: [ndctl PATCH v11 1/7] libcxl: add interfaces for GET_POISON_LIST mailbox commands
  2024-03-18 20:11     ` Alison Schofield
@ 2024-03-18 21:01       ` Dan Williams
  2024-03-19 16:43         ` Alison Schofield
  0 siblings, 1 reply; 29+ messages in thread
From: Dan Williams @ 2024-03-18 21:01 UTC (permalink / raw)
  To: Alison Schofield, fan; +Cc: Vishal Verma, nvdimm, linux-cxl, Dave Jiang

Alison Schofield wrote:
> On Mon, Mar 18, 2024 at 10:51:13AM -0700, fan wrote:
> > On Wed, Mar 13, 2024 at 09:05:17PM -0700, alison.schofield@intel.com wrote:
> > > From: Alison Schofield <alison.schofield@intel.com>
> > > 
> > > CXL devices maintain a list of locations that are poisoned or result
> > > in poison if the addresses are accessed by the host.
> > > 
> > > Per the spec (CXL 3.1 8.2.9.9.4.1), the device returns the Poison
> > > List as a set of  Media Error Records that include the source of the
> > > error, the starting device physical address and length.
> > > 
> > > Trigger the retrieval of the poison list by writing to the memory
> > > device sysfs attribute: trigger_poison_list. The CXL driver only
> > > offers triggering per memdev, so the trigger by region interface
> > > offered here is a convenience API that triggers a poison list
> > > retrieval for each memdev contributing to a region.
> > > 
> > > int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev);
> > > int cxl_region_trigger_poison_list(struct cxl_region *region);
> > > 
> > > The resulting poison records are logged as kernel trace events
> > > named 'cxl_poison'.
> > > 
> > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > > Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> > > ---
> > >  cxl/lib/libcxl.c   | 47 ++++++++++++++++++++++++++++++++++++++++++++++
> > >  cxl/lib/libcxl.sym |  2 ++
> > >  cxl/libcxl.h       |  2 ++
> > >  3 files changed, 51 insertions(+)
> > > 
> > > diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
> > > index ff27cdf7c44a..73db8f15c704 100644
> > > --- a/cxl/lib/libcxl.c
> > > +++ b/cxl/lib/libcxl.c
> > > @@ -1761,6 +1761,53 @@ CXL_EXPORT int cxl_memdev_disable_invalidate(struct cxl_memdev *memdev)
> > >  	return 0;
> > >  }
> > >  
> > > +CXL_EXPORT int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev)
> > > +{
> > > +	struct cxl_ctx *ctx = cxl_memdev_get_ctx(memdev);
> > > +	char *path = memdev->dev_buf;
> > > +	int len = memdev->buf_len, rc;
> > > +
> > > +	if (snprintf(path, len, "%s/trigger_poison_list",
> > > +		     memdev->dev_path) >= len) {
> > > +		err(ctx, "%s: buffer too small\n",
> > > +		    cxl_memdev_get_devname(memdev));
> > > +		return -ENXIO;
> > > +	}
> > > +	rc = sysfs_write_attr(ctx, path, "1\n");
> > > +	if (rc < 0) {
> > > +		fprintf(stderr,
> > > +			"%s: Failed write sysfs attr trigger_poison_list\n",
> > > +			cxl_memdev_get_devname(memdev));
> > 
> > Should we use err() instead of fprintf here? 
> 
> Thanks Fan,
> 
> How about this?
> 
> - use fprintf if access() fails, i.e. the device doesn't support poison list,
> - use err() for failure to actually read the poison list on a device with
>   support

Why? There is no raw usage of fprintf in any of the libraries (ndctl,
daxctl, cxl) to date. If someone builds the library without logging then
it should not chat on stderr at all, and if someone redirects logging to
syslog then it also should emit messages only there and not stderr.
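
The point above, that library messages should flow only through the
context's logger so they can be silenced or redirected, can be
illustrated with a small standalone sketch. Everything here is
hypothetical and is not libcxl's real layout: struct log_ctx,
log_emit(), the lib_err() macro (a stand-in for the library's err()),
and the buffer sink are assumptions made for illustration.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical miniature of a library log context */
struct log_ctx;
typedef void (*log_fn)(struct log_ctx *ctx, int prio, const char *fmt,
		       va_list args);

struct log_ctx {
	log_fn fn;	/* sink chosen by the application; NULL means silent */
	int max_prio;	/* messages with a higher number are dropped */
};

enum { PRIO_ERR = 3, PRIO_INFO = 6 };	/* same numbering as syslog levels */

static void log_emit(struct log_ctx *ctx, int prio, const char *fmt, ...)
{
	va_list args;

	if (!ctx->fn || prio > ctx->max_prio)
		return;
	va_start(args, fmt);
	ctx->fn(ctx, prio, fmt, args);
	va_end(args);
}

/* Library code calls this instead of fprintf(stderr, ...) */
#define lib_err(ctx, ...) log_emit(ctx, PRIO_ERR, __VA_ARGS__)

/* One possible sink: capture the message in a buffer instead of stderr */
static char captured[128];

static void log_to_buffer(struct log_ctx *ctx, int prio, const char *fmt,
			  va_list args)
{
	(void)ctx;
	(void)prio;
	vsnprintf(captured, sizeof(captured), fmt, args);
}
```

With this shape the application decides where messages go by choosing
the sink (stderr, syslog, a buffer, or nothing), and a raw fprintf()
inside the library would bypass that choice.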


* Re: [ndctl PATCH v11 4/7] cxl/event_trace: add helpers to retrieve tep fields by type
  2024-03-14  4:05 ` [ndctl PATCH v11 4/7] cxl/event_trace: add helpers to retrieve tep fields by type alison.schofield
  2024-03-15 15:44   ` Dave Jiang
  2024-03-15 17:39   ` Dan Williams
@ 2024-03-18 21:21   ` fan
  2 siblings, 0 replies; 29+ messages in thread
From: fan @ 2024-03-18 21:21 UTC (permalink / raw)
  To: alison.schofield; +Cc: Vishal Verma, nvdimm, linux-cxl

On Wed, Mar 13, 2024 at 09:05:20PM -0700, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> Add helpers to extract the value of an event record field given the
> field name. This is useful when the user knows the name and format
> of the field and simply needs to get it. The helpers also return
> the 'type'_MAX of the type when the field is
> 
> Since this is in preparation for adding a cxl_poison private parser
> for 'cxl list --media-errors' support those specific required
> types: u8, u32, u64.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---

Reviewed-by: Fan Ni <fan.ni@samsung.com>

>  cxl/event_trace.c | 37 +++++++++++++++++++++++++++++++++++++
>  cxl/event_trace.h |  8 +++++++-
>  2 files changed, 44 insertions(+), 1 deletion(-)
> 
> diff --git a/cxl/event_trace.c b/cxl/event_trace.c
> index 640abdab67bf..324edb982888 100644
> --- a/cxl/event_trace.c
> +++ b/cxl/event_trace.c
> @@ -15,6 +15,43 @@
>  #define _GNU_SOURCE
>  #include <string.h>
>  
> +u64 cxl_get_field_u64(struct tep_event *event, struct tep_record *record,
> +		      const char *name)
> +{
> +	unsigned long long val;
> +
> +	if (tep_get_field_val(NULL, event, name, record, &val, 0))
> +		return ULLONG_MAX;
> +
> +	return val;
> +}
> +
> +u32 cxl_get_field_u32(struct tep_event *event, struct tep_record *record,
> +		      const char *name)
> +{
> +	char *val;
> +	int len;
> +
> +	val = tep_get_field_raw(NULL, event, name, record, &len, 0);
> +	if (!val)
> +		return UINT_MAX;
> +
> +	return *(u32 *)val;
> +}
> +
> +u8 cxl_get_field_u8(struct tep_event *event, struct tep_record *record,
> +		    const char *name)
> +{
> +	char *val;
> +	int len;
> +
> +	val = tep_get_field_raw(NULL, event, name, record, &len, 0);
> +	if (!val)
> +		return UCHAR_MAX;
> +
> +	return *(u8 *)val;
> +}
> +
>  static struct json_object *num_to_json(void *num, int elem_size, unsigned long flags)
>  {
>  	bool sign = flags & TEP_FIELD_IS_SIGNED;
> diff --git a/cxl/event_trace.h b/cxl/event_trace.h
> index b77cafb410c4..7b30c3922aef 100644
> --- a/cxl/event_trace.h
> +++ b/cxl/event_trace.h
> @@ -5,6 +5,7 @@
>  
>  #include <json-c/json.h>
>  #include <ccan/list/list.h>
> +#include <ccan/short_types/short_types.h>
>  
>  struct jlist_node {
>  	struct json_object *jobj;
> @@ -32,5 +33,10 @@ int cxl_parse_events(struct tracefs_instance *inst, struct event_ctx *ectx);
>  int cxl_event_tracing_enable(struct tracefs_instance *inst, const char *system,
>  		const char *event);
>  int cxl_event_tracing_disable(struct tracefs_instance *inst);
> -
> +u8 cxl_get_field_u8(struct tep_event *event, struct tep_record *record,
> +		    const char *name);
> +u32 cxl_get_field_u32(struct tep_event *event, struct tep_record *record,
> +		      const char *name);
> +u64 cxl_get_field_u64(struct tep_event *event, struct tep_record *record,
> +		      const char *name);
>  #endif
> -- 
> 2.37.3
> 
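
The convention used by the helpers in this patch, returning the
type's maximum value when the named field cannot be read, can be
shown in isolation. This is a minimal sketch under stated assumptions:
struct mock_field and mock_get_field_val() are hypothetical stand-ins
for a parsed trace record and for the libtraceevent lookup, and the
unprefixed helper names are illustrative only.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a parsed trace record: a name/value table */
struct mock_field {
	const char *name;
	unsigned long long val;
};

/* Returns 0 and fills *val when @name exists, -1 otherwise */
static int mock_get_field_val(const struct mock_field *rec, size_t n,
			      const char *name, unsigned long long *val)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(rec[i].name, name) == 0) {
			*val = rec[i].val;
			return 0;
		}
	}
	return -1;
}

/* Each typed helper maps a failed lookup to that type's MAX sentinel */
static uint64_t get_field_u64(const struct mock_field *rec, size_t n,
			      const char *name)
{
	unsigned long long val;

	if (mock_get_field_val(rec, n, name, &val))
		return UINT64_MAX;
	return val;
}

static uint32_t get_field_u32(const struct mock_field *rec, size_t n,
			      const char *name)
{
	unsigned long long val;

	if (mock_get_field_val(rec, n, name, &val))
		return UINT32_MAX;
	return (uint32_t)val;
}

static uint8_t get_field_u8(const struct mock_field *rec, size_t n,
			    const char *name)
{
	unsigned long long val;

	if (mock_get_field_val(rec, n, name, &val))
		return UINT8_MAX;
	return (uint8_t)val;
}
```

One consequence of this convention is that a field whose real value
equals the type's maximum cannot be told apart from a failed lookup,
so callers need to know that value is not otherwise expected.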


* Re: [ndctl PATCH v11 1/7] libcxl: add interfaces for GET_POISON_LIST mailbox commands
  2024-03-18 21:01       ` Dan Williams
@ 2024-03-19 16:43         ` Alison Schofield
  0 siblings, 0 replies; 29+ messages in thread
From: Alison Schofield @ 2024-03-19 16:43 UTC (permalink / raw)
  To: Dan Williams; +Cc: fan, Vishal Verma, nvdimm, linux-cxl, Dave Jiang

On Mon, Mar 18, 2024 at 02:01:38PM -0700, Dan Williams wrote:
> Alison Schofield wrote:
> > On Mon, Mar 18, 2024 at 10:51:13AM -0700, fan wrote:
> > > On Wed, Mar 13, 2024 at 09:05:17PM -0700, alison.schofield@intel.com wrote:
> > > > From: Alison Schofield <alison.schofield@intel.com>
> > > > 
> > > > CXL devices maintain a list of locations that are poisoned or result
> > > > in poison if the addresses are accessed by the host.
> > > > 
> > > > Per the spec (CXL 3.1 8.2.9.9.4.1), the device returns the Poison
> > > > List as a set of  Media Error Records that include the source of the
> > > > error, the starting device physical address and length.
> > > > 
> > > > Trigger the retrieval of the poison list by writing to the memory
> > > > device sysfs attribute: trigger_poison_list. The CXL driver only
> > > > offers triggering per memdev, so the trigger by region interface
> > > > offered here is a convenience API that triggers a poison list
> > > > retrieval for each memdev contributing to a region.
> > > > 
> > > > int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev);
> > > > int cxl_region_trigger_poison_list(struct cxl_region *region);
> > > > 
> > > > The resulting poison records are logged as kernel trace events
> > > > named 'cxl_poison'.
> > > > 
> > > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > > > Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> > > > ---
> > > >  cxl/lib/libcxl.c   | 47 ++++++++++++++++++++++++++++++++++++++++++++++
> > > >  cxl/lib/libcxl.sym |  2 ++
> > > >  cxl/libcxl.h       |  2 ++
> > > >  3 files changed, 51 insertions(+)
> > > > 
> > > > diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
> > > > index ff27cdf7c44a..73db8f15c704 100644
> > > > --- a/cxl/lib/libcxl.c
> > > > +++ b/cxl/lib/libcxl.c
> > > > @@ -1761,6 +1761,53 @@ CXL_EXPORT int cxl_memdev_disable_invalidate(struct cxl_memdev *memdev)
> > > >  	return 0;
> > > >  }
> > > >  
> > > > +CXL_EXPORT int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev)
> > > > +{
> > > > +	struct cxl_ctx *ctx = cxl_memdev_get_ctx(memdev);
> > > > +	char *path = memdev->dev_buf;
> > > > +	int len = memdev->buf_len, rc;
> > > > +
> > > > +	if (snprintf(path, len, "%s/trigger_poison_list",
> > > > +		     memdev->dev_path) >= len) {
> > > > +		err(ctx, "%s: buffer too small\n",
> > > > +		    cxl_memdev_get_devname(memdev));
> > > > +		return -ENXIO;
> > > > +	}
> > > > +	rc = sysfs_write_attr(ctx, path, "1\n");
> > > > +	if (rc < 0) {
> > > > +		fprintf(stderr,
> > > > +			"%s: Failed write sysfs attr trigger_poison_list\n",
> > > > +			cxl_memdev_get_devname(memdev));
> > > 
> > > Should we use err() instead of fprintf here? 
> > 
> > Thanks Fan,
> > 
> > How about this?
> > 
> > - use fprintf if access() fails, i.e. the device doesn't support poison list,
> > - use err() for failure to actually read the poison list on a device with
> >   support
> 
> Why? There is no raw usage of fprintf in any of the libraries (ndctl,
> daxctl, cxl) to date. If someone builds the library without logging then
> it should not chat on stderr at all, and if someone redirects logging to
> syslog then it also should emit messages only there and not stderr.

Why indeed :(

I'll remove the fprintf() and only use err() for both cases: device
doesn't support feature, or failure to read list.



* Re: [ndctl PATCH v11 5/7] cxl/list: collect and parse media_error records
  2024-03-15 16:16   ` Dave Jiang
@ 2024-03-20 20:24     ` Alison Schofield
  0 siblings, 0 replies; 29+ messages in thread
From: Alison Schofield @ 2024-03-20 20:24 UTC (permalink / raw)
  To: Dave Jiang; +Cc: Vishal Verma, nvdimm, linux-cxl

On Fri, Mar 15, 2024 at 09:16:27AM -0700, Dave Jiang wrote:
> 
> 
> On 3/13/24 9:05 PM, alison.schofield@intel.com wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > Media_error records are logged as events in the kernel tracing
> > subsystem. To prepare the media_error records for cxl list, enable
> > tracing, trigger the poison list read, and parse the generated
> > cxl_poison events into a json representation.
> > 
> > Use the event_trace private parsing option to customize the json
> > representation based on cxl-list calling options and event field
> > settings.
> > 
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> 
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Minor nit below.

Nit removed. Thanks!

> > ---
> >  cxl/json.c | 194 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 194 insertions(+)
> > 
> > diff --git a/cxl/json.c b/cxl/json.c
> > index fbe41c78e82a..974e98f13cec 100644
> > --- a/cxl/json.c
> > +++ b/cxl/json.c
> > @@ -1,16 +1,20 @@
> >  // SPDX-License-Identifier: GPL-2.0
> >  // Copyright (C) 2015-2021 Intel Corporation. All rights reserved.
> >  #include <limits.h>
> > +#include <errno.h>
> >  #include <util/json.h>
> > +#include <util/bitmap.h>
> >  #include <uuid/uuid.h>
> >  #include <cxl/libcxl.h>
> >  #include <json-c/json.h>
> >  #include <json-c/printbuf.h>
> >  #include <ccan/short_types/short_types.h>
> > +#include <tracefs/tracefs.h>
> >  
> >  #include "filter.h"
> >  #include "json.h"
> >  #include "../daxctl/json.h"
> > +#include "event_trace.h"
> >  
> >  #define CXL_FW_VERSION_STR_LEN	16
> >  #define CXL_FW_MAX_SLOTS	4
> > @@ -571,6 +575,184 @@ err_jobj:
> >  	return NULL;
> >  }
> >  
> > +/* CXL Spec 3.1 Table 8-140 Media Error Record */
> > +#define CXL_POISON_SOURCE_MAX 7
> > +static const char *poison_source[] = { "Unknown",  "External", "Internal",
> > +				       "Injected", "Reserved", "Reserved",
> > +				       "Reserved", "Vendor" };
> > +
> > +/* CXL Spec 3.1 Table 8-139 Get Poison List Output Payload */
> > +#define CXL_POISON_FLAG_MORE BIT(0)
> > +#define CXL_POISON_FLAG_OVERFLOW BIT(1)
> > +#define CXL_POISON_FLAG_SCANNING BIT(2)
> > +
> > +static int poison_event_to_json(struct tep_event *event,
> > +				struct tep_record *record,
> > +				struct event_ctx *e_ctx)
> > +{
> > +	struct poison_ctx *p_ctx = e_ctx->poison_ctx;
> > +	struct json_object *jp, *jobj, *jpoison = p_ctx->jpoison;
> > +	struct cxl_memdev *memdev = p_ctx->memdev;
> > +	struct cxl_region *region = p_ctx->region;
> > +	unsigned long flags = p_ctx->flags;
> > +	const char *region_name = NULL;
> > +	char flag_str[32] = { '\0' };
> > +	bool overflow = false;
> > +	u8 source, pflags;
> > +	u64 offset, ts;
> > +	u32 length;
> > +	char *str;
> > +	int len;
> > +
> > +	jp = json_object_new_object();
> > +	if (!jp)
> > +		return -ENOMEM;
> > +
> > +	/* Skip records not in this region when listing by region */
> > +	if (region)
> > +		region_name = cxl_region_get_devname(region);
> > +	if (region_name)
> > +		str = tep_get_field_raw(NULL, event, "region", record, &len, 0);
> > +	if ((region_name) && (strcmp(region_name, str) != 0)) {
> > +		json_object_put(jp);
> > +		return 0;
> > +	}
> > +	/* Include offset,length by region (hpa) or by memdev (dpa) */
> > +	if (region) {
> > +		offset = cxl_get_field_u64(event, record, "hpa");
> > +		if (offset != ULLONG_MAX) {
> > +			offset = offset - cxl_region_get_resource(region);
> > +			jobj = util_json_object_hex(offset, flags);
> > +			if (jobj)
> > +				json_object_object_add(jp, "offset", jobj);
> > +		}
> > +	} else if (memdev) {
> > +		offset = cxl_get_field_u64(event, record, "dpa");
> > +		if (offset != ULLONG_MAX) {
> > +			jobj = util_json_object_hex(offset, flags);
> > +			if (jobj)
> > +				json_object_object_add(jp, "offset", jobj);
> > +		}
> > +	}
> > +	length = cxl_get_field_u32(event, record, "dpa_length");
> > +	jobj = util_json_object_size(length, flags);
> > +	if (jobj)
> > +		json_object_object_add(jp, "length", jobj);
> > +
> > +	/* Always include the poison source */
> > +	source = cxl_get_field_u8(event, record, "source");
> > +	if (source <= CXL_POISON_SOURCE_MAX)
> > +		jobj = json_object_new_string(poison_source[source]);
> > +	else
> > +		jobj = json_object_new_string("Reserved");
> > +	if (jobj)
> > +		json_object_object_add(jp, "source", jobj);
> > +
> > +	/* Include flags and overflow time if present */
> > +	pflags = cxl_get_field_u8(event, record, "flags");
> > +	if (pflags && pflags < UCHAR_MAX) {
> > +		if (pflags & CXL_POISON_FLAG_MORE)
> > +			strcat(flag_str, "More,");
> > +		if (pflags & CXL_POISON_FLAG_SCANNING)
> > +			strcat(flag_str, "Scanning,");
> > +		if (pflags & CXL_POISON_FLAG_OVERFLOW) {
> > +			strcat(flag_str, "Overflow,");
> > +			overflow = true;
> > +		}
> > +		jobj = json_object_new_string(flag_str);
> > +		if (jobj)
> > +			json_object_object_add(jp, "flags", jobj);
> > +	}
> > +	if (overflow) {
> > +		ts = cxl_get_field_u64(event, record, "overflow_ts");
> > +		jobj = util_json_object_hex(ts, flags);
> > +		if (jobj)
> > +			json_object_object_add(jp, "overflow_t", jobj);
> > +	}
> > +	json_object_array_add(jpoison, jp);
> > +
> > +	return 0;
> > +}
> > +
> > +static struct json_object *
> > +util_cxl_poison_events_to_json(struct tracefs_instance *inst,
> > +			       struct poison_ctx *p_ctx)
> > +{
> > +	struct event_ctx ectx = {
> > +		.event_name = "cxl_poison",
> > +		.event_pid = getpid(),
> > +		.system = "cxl",
> > +		.poison_ctx = p_ctx,
> > +		.parse_event = poison_event_to_json,
> > +	};
> > +	int rc = 0;
> 
> No need to init rc here. 
> 
> DJ
> 
> > +
> > +	p_ctx->jpoison = json_object_new_array();
> > +	if (!p_ctx->jpoison)
> > +		return NULL;
> > +
> > +	rc = cxl_parse_events(inst, &ectx);
> > +	if (rc < 0) {
> > +		fprintf(stderr, "Failed to parse events: %d\n", rc);
> > +		goto put_jobj;
> > +	}
> > +	if (json_object_array_length(p_ctx->jpoison) == 0)
> > +		goto put_jobj;
> > +
> > +	return p_ctx->jpoison;
> > +
> > +put_jobj:
> > +	json_object_put(p_ctx->jpoison);
> > +	return NULL;
> > +}
> > +
> > +static struct json_object *
> > +util_cxl_poison_list_to_json(struct cxl_region *region,
> > +			     struct cxl_memdev *memdev,
> > +			     unsigned long flags)
> > +{
> > +	struct json_object *jpoison = NULL;
> > +	struct poison_ctx p_ctx;
> > +	struct tracefs_instance *inst;
> > +	int rc;
> > +
> > +	inst = tracefs_instance_create("cxl list");
> > +	if (!inst) {
> > +		fprintf(stderr, "tracefs_instance_create() failed\n");
> > +		return NULL;
> > +	}
> > +
> > +	rc = cxl_event_tracing_enable(inst, "cxl", "cxl_poison");
> > +	if (rc < 0) {
> > +		fprintf(stderr, "Failed to enable trace: %d\n", rc);
> > +		goto err_free;
> > +	}
> > +
> > +	if (region)
> > +		rc = cxl_region_trigger_poison_list(region);
> > +	else
> > +		rc = cxl_memdev_trigger_poison_list(memdev);
> > +	if (rc)
> > +		goto err_free;
> > +
> > +	rc = cxl_event_tracing_disable(inst);
> > +	if (rc < 0) {
> > +		fprintf(stderr, "Failed to disable trace: %d\n", rc);
> > +		goto err_free;
> > +	}
> > +
> > +	p_ctx = (struct poison_ctx) {
> > +		.region = region,
> > +		.memdev = memdev,
> > +		.flags = flags,
> > +	};
> > +	jpoison = util_cxl_poison_events_to_json(inst, &p_ctx);
> > +
> > +err_free:
> > +	tracefs_instance_free(inst);
> > +	return jpoison;
> > +}
> > +
> >  struct json_object *util_cxl_memdev_to_json(struct cxl_memdev *memdev,
> >  		unsigned long flags)
> >  {
> > @@ -664,6 +846,12 @@ struct json_object *util_cxl_memdev_to_json(struct cxl_memdev *memdev,
> >  			json_object_object_add(jdev, "firmware", jobj);
> >  	}
> >  
> > +	if (flags & UTIL_JSON_MEDIA_ERRORS) {
> > +		jobj = util_cxl_poison_list_to_json(NULL, memdev, flags);
> > +		if (jobj)
> > +			json_object_object_add(jdev, "media_errors", jobj);
> > +	}
> > +
> >  	json_object_set_userdata(jdev, memdev, NULL);
> >  	return jdev;
> >  }
> > @@ -1012,6 +1200,12 @@ struct json_object *util_cxl_region_to_json(struct cxl_region *region,
> >  			json_object_object_add(jregion, "state", jobj);
> >  	}
> >  
> > +	if (flags & UTIL_JSON_MEDIA_ERRORS) {
> > +		jobj = util_cxl_poison_list_to_json(region, NULL, flags);
> > +		if (jobj)
> > +			json_object_object_add(jregion, "media_errors", jobj);
> > +	}
> > +
> >  	util_cxl_mappings_append_json(jregion, region, flags);
> >  
> >  	if (flags & UTIL_JSON_DAX) {


* Re: [ndctl PATCH v11 6/7] cxl/list: add --media-errors option to cxl list
  2024-03-15  3:35       ` Dan Williams
@ 2024-03-20 20:40         ` Alison Schofield
  2024-03-27 19:48         ` Alison Schofield
  1 sibling, 0 replies; 29+ messages in thread
From: Alison Schofield @ 2024-03-20 20:40 UTC (permalink / raw)
  To: Dan Williams; +Cc: Wonjae Lee, Vishal Verma, Hojin Nam, nvdimm, linux-cxl

On Thu, Mar 14, 2024 at 08:35:01PM -0700, Dan Williams wrote:
> Alison Schofield wrote:
> > On Fri, Mar 15, 2024 at 10:09:44AM +0900, Wonjae Lee wrote:
> > > alison.schofield@intel.com wrote:
> > > > From: Alison Schofield <alison.schofield@intel.com>
> > > >
> > > > The --media-errors option to 'cxl list' retrieves poison lists from
> > > > memory devices supporting the capability and displays the returned
> > > > media_error records in the cxl list json. This option can apply to
> > > > memdevs or regions.
> > > >
> > > > Include media-errors in the -vvv verbose option.
> > > >
> > > > Example usage in the Documentation/cxl/cxl-list.txt update.
> > > >
> > > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > > > ---
> > > > Documentation/cxl/cxl-list.txt 62 +++++++++++++++++++++++++++++++++-
> > > > cxl/filter.h                    3 ++
> > > > cxl/list.c                      3 ++
> > > > 3 files changed, 67 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/Documentation/cxl/cxl-list.txt b/Documentation/cxl/cxl-list.txt
> > > > index 838de4086678..6d3ef92c29e8 100644
> > > > --- a/Documentation/cxl/cxl-list.txt
> > > > +++ b/Documentation/cxl/cxl-list.txt
> > > 
> > > [snip]
> > > 
> > > +----
> > > +In the above example, region mappings can be found using:
> > > +"cxl list -p mem9 --decoders"
> > > +----
> > > 
> > > Hi, isn't it '-m mem9' instead of -p? FYI, it's also on patch's
> > > cover letter, too.
> > 
> > Thanks for the review! I went with -p because it gives only
> > the endpoint decoder while -m gives all the decoders up to
> > the root - more than needed to discover the region.
> 
> The first thing that comes to mind to list memory devices with their
> decoders is:
> 
>     cxl list -MD -d endpoint
> 
> ...however the problem is that endpoint ports connect memdevs to their
> parent port, so the above results in:
> 
>   Warning: no matching devices found
> 
> I think I want to special case "-d endpoint" when both -M and -D are
> specified to also imply -E, "endpoint ports". However that also seems to
> have a bug at present:
> 
> # cxl list -EDM -d endpoint -iu
> {
>   "endpoint":"endpoint2",
>   "host":"mem0",
>   "parent_dport":"0000:34:00.0",
>   "depth":2
> }
> 
> That needs to be fixed up to merge:
> 
> # cxl list -ED -d endpoint -iu
> {
>   "endpoint":"endpoint2",
>   "host":"mem0",
>   "parent_dport":"0000:34:00.0",
>   "depth":2,
>   "decoders:endpoint2":[
>     {
>       "decoder":"decoder2.0",
>       "interleave_ways":1,
>       "state":"disabled"
>     }
>   ]
> }
> 
> ...and:
> 
> # cxl list -EMu
> {
>   "endpoint":"endpoint2",
>   "host":"mem0",
>   "parent_dport":"0000:34:00.0",
>   "depth":2,
>   "memdev":{
>     "memdev":"mem0",
>     "pmem_size":"512.00 MiB (536.87 MB)",
>     "serial":"0",
>     "host":"0000:35:00.0"
>   }
> }
> 
> ...so that one can get a nice listing of just endpoint ports, their
> decoders (with media errors) and their memdevs.
> 
> The reason that "cxl list -p mem9 -D" works is subtle because it filters
> the endpoint decoders by an endpoint port filter, but I think most users
> would expect to not need to enable endpoint-port listings to see their
> decoders the natural key to filter endpoint decoders is by memdev.

Wonjae, Dan,

This feedback inspires me to seek more input from future users. This
tool should be adding a convenience and I don't want to proceed without
more user feedback confirming this implementation is more convenient
than the currently available method (trace & trigger). We also want to
avoid working with or around some awkward json output for eternity.

I'm following this response with a reply to the cover letter seeking
more inputs.

Thanks,
Alison




* Re: [ndctl PATCH v11 0/7] Support poison list retrieval
  2024-03-14  4:05 [ndctl PATCH v11 0/7] Support poison list retrieval alison.schofield
                   ` (8 preceding siblings ...)
       [not found] ` <CGME20240314040551epcas2p40829b16b09f439519a692070fb460242@epcms2p1>
@ 2024-03-20 20:42 ` Alison Schofield
  9 siblings, 0 replies; 29+ messages in thread
From: Alison Schofield @ 2024-03-20 20:42 UTC (permalink / raw)
  To: Vishal Verma; +Cc: nvdimm, linux-cxl

On Wed, Mar 13, 2024 at 09:05:16PM -0700, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>

Asking folks to share this with future users of the poison list
feature of ndctl, i.e. cxl list --media-errors.

I'd like to get additional 'user' input on the json output provided by
this --media-errors option to cxl-list. After a few iterations of what
should be included in the cxl-list output, I'm not so sure that we've
captured sufficient input from potential users. (Since they typically
won't use this until it's released in ndctl.)

To guide your thinking, recall that users can retrieve a device's poison
list now without any cxl-cli (ndctl) tool support. Users can trigger
the collection via sysfs and see the results in the trace logs like:

this:
- echo 1 > /sys/kernel/tracing/events/cxl/cxl_poison/enable
- echo 1 > /sys/bus/cxl/devices/memX/trigger_poison_list
- Examine the cxl_poison events in the trace file at 

or this:
- cxl monitor --daemon --log=<poison-log-path>
- echo 1 > /sys/bus/cxl/devices/memX/trigger_poison_list
- Examine the cxl_poison events in the monitor log

or this:
- enable tp_printk
- echo 1 > /sys/kernel/tracing/events/cxl/cxl_poison/enable
- echo 1 > /sys/bus/cxl/devices/memX/trigger_poison_list
- Examine the cxl_poison events in the dmesg log
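
The first sequence above could be driven from C roughly as follows. This
is a sketch under stated assumptions, not the ndctl implementation:
write_attr() and trigger_poison_list() are hypothetical helpers, and the
caller supplies the two paths quoted in the steps above with memX replaced
by a real memdev name.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a short string to a sysfs or tracefs
 * attribute. Returns 0 on success or a negative errno value.
 */
static int write_attr(const char *path, const char *val)
{
	size_t len = strlen(val);
	ssize_t n;
	int fd, rc = 0;

	assert(path != NULL);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, val, len);
	if (n < 0)
		rc = -errno;
	else if ((size_t)n != len)
		rc = -EIO;	/* short write */

	close(fd);
	return rc;
}

/*
 * Enable the cxl_poison trace event, then trigger a poison list read.
 * The caller passes the event enable path and the memdev-specific
 * trigger_poison_list path from the steps above.
 */
static int trigger_poison_list(const char *enable_path,
			       const char *trigger_path)
{
	int rc = write_attr(enable_path, "1\n");

	if (rc < 0)
		return rc;
	return write_attr(trigger_path, "1\n");
}
```

Calling trigger_poison_list() with
/sys/kernel/tracing/events/cxl/cxl_poison/enable and
/sys/bus/cxl/devices/memX/trigger_poison_list mirrors the two echo
commands; examining the resulting cxl_poison events is still a separate
step.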

So, a few ways to get at this cxl_poison trace data:
memdev=mem9
host=cxl_mem.5 
serial=5 
trace_type=List 
region=region5 
region_uuid=99352a43-44cb-405d-85c9-fdbd971455d8
hpa=0xf110001000
dpa=0x40000000
dpa_length=0x40
source=Injected
flags=
overflow_time=0

The tool should be providing a better experience than the sysfs/trace
methods. The tool does look up memdevs contributing to a region and
triggers the needed poison list reads, so that's a small convenience.
Its usefulness needs to extend to the json listing.
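
As one concrete piece of what the json listing adds over the raw trace:
when listing by region, the v11 patch reports "offset" as the event's hpa
minus the region resource. A minimal sketch of that arithmetic follows,
using the sample hpa above (0xf110001000) and the region5 resource from
the v11 sample output (0xf110000000). The helper names region_offset() and
format_offset() are hypothetical, not libcxl functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: convert a host physical address (hpa) from a
 * cxl_poison event into an offset relative to the region resource.
 * Returns UINT64_MAX if the hpa lies below the region start.
 */
static uint64_t region_offset(uint64_t hpa, uint64_t region_resource)
{
	if (hpa < region_resource)
		return UINT64_MAX;
	return hpa - region_resource;
}

/*
 * Format an offset as hex, the way the -u samples in this message
 * show it. Returns 0 on success, -1 if the buffer is too small.
 */
static int format_offset(char *buf, size_t len, uint64_t offset)
{
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len, "0x%llx", (unsigned long long)offset);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With the sample values, the region-relative offset works out to 0x1000,
which matches the first media_errors entry in the v11 region5 listing.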

Here's the history of json output pulled from the patch cover letters.
It's long, but I didn't want to omit any detail.

I've appended here the history of changes to the output.
Only including samples where the json output actually changed.
I'm including it to spur conversation, not as a guideline.

Subject: [ndctl PATCH v11 0/7] Support poison list retrieval

           # cxl list -m mem9 --media-errors -u
           {
             "memdev":"mem9",
             "pmem_size":"1024.00 MiB (1073.74 MB)",
             "pmem_qos_class":42,
             "ram_size":"1024.00 MiB (1073.74 MB)",
             "ram_qos_class":42,
             "serial":"0x5",
             "numa_node":1,
             "host":"cxl_mem.5",
             "media_errors":[
               {
                 "offset":"0x40000000",
                 "length":64,
                 "source":"Injected"
               }
             ]
           }


           # cxl list -r region5 --media-errors -u
           {
             "region":"region5",
             "resource":"0xf110000000",
             "size":"2.00 GiB (2.15 GB)",
             "type":"pmem",
             "interleave_ways":2,
             "interleave_granularity":4096,
             "decode_state":"commit",
             "media_errors":[
               {
                 "offset":"0x1000",
                 "length":64,
                 "source":"Injected"
               },
               {
                 "offset":"0x2000",
                 "length":64,
                 "source":"Injected"
               }
             ]
           }


Subject: [ndctl PATCH v7 0/7] Support poison list retrieval

# cxl list -m mem1 --media-errors
[
  {
    "memdev":"mem1",
    "pmem_size":1073741824,
    "ram_size":1073741824,
    "serial":1,
    "numa_node":1,
    "host":"cxl_mem.1",
    "media_errors":[
      {
        "dpa":0,
        "length":64,
        "source":"Internal"
      },
      {
        "decoder":"decoder10.0",
        "hpa":1035355557888,
        "dpa":1073741824,
        "length":64,
        "source":"External"
      },
      {
        "decoder":"decoder10.0",
        "hpa":1035355566080,
        "dpa":1073745920,
        "length":64,
        "source":"Injected"
      }
    ]
  }
]

# cxl list -r region5 --media-errors
[
  {
    "region":"region5",
    "resource":1035355553792,
    "size":2147483648,
    "type":"pmem",
    "interleave_ways":2,
    "interleave_granularity":4096,
    "decode_state":"commit",
    "media_errors":[
      {
        "decoder":"decoder10.0",
        "hpa":1035355557888,
        "dpa":1073741824,
        "length":64,
        "source":"External"
      },
      {
        "decoder":"decoder8.1",
        "hpa":1035355566080,
        "dpa":1073745920,
        "length":64,
        "source":"Internal"
      }
    ]
  }
]

Subject: [ndctl PATCH v6 0/7] Support poison list retrieval

# cxl list -m mem1 --media-errors
[
  {
    "memdev":"mem1",
    "pmem_size":1073741824,
    "ram_size":1073741824,
    "serial":1,
    "numa_node":1,
    "host":"cxl_mem.1",
    "media_errors":[
      {
        "dpa":0,
        "dpa_length":64,
        "source":"Injected"
      },
      {
        "region":"region5",
        "dpa":1073741824,
        "dpa_length":64,
        "hpa":1035355557888,
        "source":"Injected"
      },
      {
        "region":"region5",
        "dpa":1073745920,
        "dpa_length":64,
        "hpa":1035355566080,
        "source":"Injected"
      }
    ]
  }
]

# cxl list -r region5 --media-errors
[
  {
    "region":"region5",
    "resource":1035355553792,
    "size":2147483648,
    "type":"pmem",
    "interleave_ways":2,
    "interleave_granularity":4096,
    "decode_state":"commit",
    "media_errors":[
      {
        "memdev":"mem1",
        "dpa":1073741824,
        "dpa_length":64,
        "hpa":1035355557888,
        "source":"Injected"
      },
      {
        "memdev":"mem1",
        "dpa":1073745920,
        "dpa_length":64,
        "hpa":1035355566080,
        "source":"Injected"
      }
    ]
  }
]

Subject: [ndctl PATCH v2 0/3] Support poison list retrieval

Example: By memdev
cxl list -m mem1 --poison -u
{
  "memdev":"mem1",
  "pmem_size":"1024.00 MiB (1073.74 MB)",
  "ram_size":"1024.00 MiB (1073.74 MB)",
  "serial":"0x1",
  "numa_node":1,
  "host":"cxl_mem.1",
  "poison":{
    "nr_poison_records":4,
    "poison_records":[
      {
        "dpa":"0x40000000",
        "dpa_length":64,
        "source":"Injected",
        "flags":""
      },
      {
        "dpa":"0x40001000",
        "dpa_length":64,
        "source":"Injected",
        "flags":""
      },
      {
        "dpa":"0",
        "dpa_length":64,
        "source":"Injected",
        "flags":""
      },
      {
        "dpa":"0x600",
        "dpa_length":64,
        "source":"Injected",
        "flags":""
      }
    ]
  }
}

Example: By region
cxl list -r region5 --poison -u
{
  "region":"region5",
  "resource":"0xf110000000",
  "size":"2.00 GiB (2.15 GB)",
  "type":"pmem",
  "interleave_ways":2,
  "interleave_granularity":4096,
  "decode_state":"commit",
  "poison":{
    "nr_poison_records":2,
    "poison_records":[
      {
        "memdev":"mem1",
        "region":"region5",
        "hpa":"0xf110001000",
        "dpa":"0x40000000",
        "dpa_length":64,
        "source":"Injected",
        "flags":""
      },
      {
        "memdev":"mem0",
        "region":"region5",
        "hpa":"0xf110000000",
        "dpa":"0x40000000",
        "dpa_length":64,
        "source":"Injected",
        "flags":""
      }
    ]
  }
}


Example: By memdev and coincidentally in a region
# cxl list -m mem0 --poison -u
{
  "memdev":"mem0",
  "pmem_size":"1024.00 MiB (1073.74 MB)",
  "ram_size":"1024.00 MiB (1073.74 MB)",
  "serial":"0",
  "numa_node":0,
  "host":"cxl_mem.0",
  "poison":{
    "nr_poison_records":1,
    "poison_records":[
      {
        "region":"region5",
        "hpa":"0xf110000000",
        "dpa":"0x40000000",
        "dpa_length":64,
        "source":"Injected",
        "flags":""
      }
    ]
  }
}


Example: No poison found
cxl list -m mem9 --poison -u
{
  "memdev":"mem9",
  "pmem_size":"1024.00 MiB (1073.74 MB)",
  "ram_size":"1024.00 MiB (1073.74 MB)",
  "serial":"0x9",
  "numa_node":1,
  "host":"cxl_mem.9",
  "poison":{
    "nr_poison_records":0
  }
}



* Re: [ndctl PATCH v11 6/7] cxl/list: add --media-errors option to cxl list
  2024-03-15  3:35       ` Dan Williams
  2024-03-20 20:40         ` Alison Schofield
@ 2024-03-27 19:48         ` Alison Schofield
  2024-04-18 20:12           ` Alison Schofield
  1 sibling, 1 reply; 29+ messages in thread
From: Alison Schofield @ 2024-03-27 19:48 UTC (permalink / raw)
  To: Dan Williams; +Cc: Wonjae Lee, Vishal Verma, Hojin Nam, nvdimm, linux-cxl

On Thu, Mar 14, 2024 at 08:35:01PM -0700, Dan Williams wrote:
> Alison Schofield wrote:
> > On Fri, Mar 15, 2024 at 10:09:44AM +0900, Wonjae Lee wrote:
> > > alison.schofield@intel.com wrote:
> > > > From: Alison Schofield <alison.schofield@intel.com>
> > > >
> > > > The --media-errors option to 'cxl list' retrieves poison lists from
> > > > memory devices supporting the capability and displays the returned
> > > > media_error records in the cxl list json. This option can apply to
> > > > memdevs or regions.
> > > >
> > > > Include media-errors in the -vvv verbose option.
> > > >
> > > > Example usage in the Documentation/cxl/cxl-list.txt update.
> > > >
> > > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > > > ---
> > > > Documentation/cxl/cxl-list.txt 62 +++++++++++++++++++++++++++++++++-
> > > > cxl/filter.h                    3 ++
> > > > cxl/list.c                      3 ++
> > > > 3 files changed, 67 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/Documentation/cxl/cxl-list.txt b/Documentation/cxl/cxl-list.txt
> > > > index 838de4086678..6d3ef92c29e8 100644
> > > > --- a/Documentation/cxl/cxl-list.txt
> > > > +++ b/Documentation/cxl/cxl-list.txt
> > > 
> > > [snip]
> > > 
> > > +----
> > > +In the above example, region mappings can be found using:
> > > +"cxl list -p mem9 --decoders"
> > > +----
> > > 
> > > Hi, isn't it '-m mem9' instead of -p? FYI, it's also on patch's
> > > cover letter, too.
> > 
> > Thanks for the review! I went with -p because it gives only
> > the endpoint decoder while -m gives all the decoders up to
> > the root - more than needed to discover the region.
> 
> The first thing that comes to mind to list memory devices with their
> decoders is:
> 
>     cxl list -MD -d endpoint
> 
> ...however the problem is that endpoint ports connect memdevs to their
> parent port, so the above results in:
> 
>   Warning: no matching devices found
> 
> I think I want to special case "-d endpoint" when both -M and -D are
> specified to also imply -E, "endpoint ports". However that also seems to
> have a bug at present:
> 
> # cxl list -EDM -d endpoint -iu
> {
>   "endpoint":"endpoint2",
>   "host":"mem0",
>   "parent_dport":"0000:34:00.0",
>   "depth":2
> }
> 
> That needs to be fixed up to merge:

What's to fix up? Doesn't filtering by '-d endpoint' exclude the
objects you specified in -EDM?  It becomes the equivalent of
'cxl list -E'.

> 
> # cxl list -ED -d endpoint -iu
> {
>   "endpoint":"endpoint2",
>   "host":"mem0",
>   "parent_dport":"0000:34:00.0",
>   "depth":2,
>   "decoders:endpoint2":[
>     {
>       "decoder":"decoder2.0",
>       "interleave_ways":1,
>       "state":"disabled"
>     }
>   ]
> }
> 
> ...and:
> 
> # cxl list -EMu
> {
>   "endpoint":"endpoint2",
>   "host":"mem0",
>   "parent_dport":"0000:34:00.0",
>   "depth":2,
>   "memdev":{
>     "memdev":"mem0",
>     "pmem_size":"512.00 MiB (536.87 MB)",
>     "serial":"0",
>     "host":"0000:35:00.0"
>   }
> }
>

Some of the examples above use "-d endpoint", filtering on endpoint
decoders, and so are, by design, excluding memdev info.  Filtering on
endpoint ports, i.e. -p endpoint, supports a listing of the endpoint
memdevs and decoders.

> ...so that one can get a nice listing of just endpoint ports, their
> decoders (with media errors) and their memdevs.
> 

Dissecting the above sentence:
"of just endpoint ports"  --> -p endpoint
"their decoders" --> -DE
"their memdevs"  --> -M
"(with media errors)" --> --media-errors

Yields this query:
cxl list -p endpoint -DEM --media-errors

You wrote (with media errors) after 'decoders' and that is of concern,
but maybe just a typo?  ATM --media-errors applies to memdev or region
objects, not to decoder objects.

> The reason that "cxl list -p mem9 -D" works is subtle because it filters
> the endpoint decoders by an endpoint port filter, but I think most users
> would expect to not need to enable endpoint-port listings to see their
> decoders the natural key to filter endpoint decoders is by memdev.

Not following this subtle comment. I find it to be an exacting filter
targeting exactly a memdev that may be of interest and supplying
the decoder and region mappings. It would be best suggested in one
step, and that is an update in the v12 man page:
cxl list -p mem9 -DEM --media-errors

I don't understand the desire to use endpoint decoders as a filter when
using endpoint ports, which have memdevs and endpoint decoders as
children, works and flows with the whole top-down cxl list filtering
design. I also don't see a need to special case, and 'imply', endpoint
ports when users can explicitly add -p endpoint to their query.
(The special case seems like it would add confusion to the cxl list
usage.)

I'm following this with a v12 that does update the man page suggestions.
Let's continue this conversation there.

Thanks,
Alison






* Re: [ndctl PATCH v11 6/7] cxl/list: add --media-errors option to cxl list
  2024-03-27 19:48         ` Alison Schofield
@ 2024-04-18 20:12           ` Alison Schofield
  0 siblings, 0 replies; 29+ messages in thread
From: Alison Schofield @ 2024-04-18 20:12 UTC (permalink / raw)
  To: Dan Williams; +Cc: Wonjae Lee, Vishal Verma, Hojin Nam, nvdimm, linux-cxl

Hi Dan,

Here's where I believe we last left off.

I thought we had closure on the json format of the media error records,
and on the fact that those objects are appended to memdev or region
objects.

The open question is how to use 'cxl list' to view the poison records.

Can we pick up that discussion below in this v11 thread?

The v12 that I refer to below is here:
https://lore.kernel.org/cover.1711519822.git.alison.schofield@intel.com/

-- Alison


On Wed, Mar 27, 2024 at 12:48:12PM -0700, Alison Schofield wrote:
> On Thu, Mar 14, 2024 at 08:35:01PM -0700, Dan Williams wrote:
> > Alison Schofield wrote:
> > > On Fri, Mar 15, 2024 at 10:09:44AM +0900, Wonjae Lee wrote:
> > > > alison.schofield@intel.com wrote:
> > > > > From: Alison Schofield <alison.schofield@intel.com>
> > > > >
> > > > > The --media-errors option to 'cxl list' retrieves poison lists from
> > > > > memory devices supporting the capability and displays the returned
> > > > > media_error records in the cxl list json. This option can apply to
> > > > > memdevs or regions.
> > > > >
> > > > > Include media-errors in the -vvv verbose option.
> > > > >
> > > > > Example usage in the Documentation/cxl/cxl-list.txt update.
> > > > >
> > > > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > > > > ---
> > > > > Documentation/cxl/cxl-list.txt 62 +++++++++++++++++++++++++++++++++-
> > > > > cxl/filter.h                    3 ++
> > > > > cxl/list.c                      3 ++
> > > > > 3 files changed, 67 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/Documentation/cxl/cxl-list.txt b/Documentation/cxl/cxl-list.txt
> > > > > index 838de4086678..6d3ef92c29e8 100644
> > > > > --- a/Documentation/cxl/cxl-list.txt
> > > > > +++ b/Documentation/cxl/cxl-list.txt
> > > > 
> > > > [snip]
> > > > 
> > > > +----
> > > > +In the above example, region mappings can be found using:
> > > > +"cxl list -p mem9 --decoders"
> > > > +----
> > > > 
> > > > Hi, isn't it '-m mem9' instead of -p? FYI, it's also on patch's
> > > > cover letter, too.
> > > 
> > > Thanks for the review! I went with -p because it gives only
> > > the endpoint decoder while -m gives all the decoders up to
> > > the root - more than needed to discover the region.
> > 
> > The first thing that comes to mind to list memory devices with their
> > decoders is:
> > 
> >     cxl list -MD -d endpoint
> > 
> > ...however the problem is that endpoint ports connect memdevs to their
> > parent port, so the above results in:
> > 
> >   Warning: no matching devices found
> > 
> > I think I want to special case "-d endpoint" when both -M and -D are
> > specified to also imply -E, "endpoint ports". However that also seems to
> > have a bug at present:
> > 
> > # cxl list -EDM -d endpoint -iu
> > {
> >   "endpoint":"endpoint2",
> >   "host":"mem0",
> >   "parent_dport":"0000:34:00.0",
> >   "depth":2
> > }
> > 
> > That needs to be fixed up to merge:
> 
> What's to fix up? Doesn't filtering by '-d endpoint' exclude the
> objects you specified in -EDM.  It becomes the equivalent of
> of 'cxl list -E'
> 
> > 
> > # cxl list -ED -d endpoint -iu
> > {
> >   "endpoint":"endpoint2",
> >   "host":"mem0",
> >   "parent_dport":"0000:34:00.0",
> >   "depth":2,
> >   "decoders:endpoint2":[
> >     {
> >       "decoder":"decoder2.0",
> >       "interleave_ways":1,
> >       "state":"disabled"
> >     }
> >   ]
> > }
> > 
> > ...and:
> > 
> > # cxl list -EMu
> > {
> >   "endpoint":"endpoint2",
> >   "host":"mem0",
> >   "parent_dport":"0000:34:00.0",
> >   "depth":2,
> >   "memdev":{
> >     "memdev":"mem0",
> >     "pmem_size":"512.00 MiB (536.87 MB)",
> >     "serial":"0",
> >     "host":"0000:35:00.0"
> >   }
> > }
> >
> 
> Some of the examples above that use "-d endpoint", filtering on endpoint
> decoders, and so are, by design, excluding memdev info.  Filtering on
> endpoint ports, ie -p endpoint, supports a listing of the endpoint
> memdevs and decoders. 
> 
> > ...so that one can get a nice listing of just endpoint ports, their
> > decoders (with media errors) and their memdevs.
> > 
> 
> Dissecting the above sentence:
> "of just endpoint ports"  --> -p endpoint
> "their decoders" --> -DE
> "their memdevs"  --> -M
> "(with media errors)" --media-errors
> 
> Yields this query:
> cxl list -p endpoint -DEM --media-errors
> 
> You wrote (with media errors) after 'decoders' and that is of concern,
> but maybe just a typo?  ATM --media-errors applies to memdev or region
> objects, not to decoder objects.
> 
> > The reason that "cxl list -p mem9 -D" works is subtle because it filters
> > the endpoint decoders by an endpoint port filter, but I think most users
> > would expect to not need to enable endpoint-port listings to see their
> > decoders the natural key to filter endpoint decoders is by memdev.
> 
> Not following this subtle comment. I find it to be an exacting filter
> targeting exactly a memdev that may be of interest and supplying
> the decoder and region mappings. It would be best suggested in one
> step, and that's is an update in the v12 man page:
> cxl list -p mem9 -DEM --media-errors
> 
> I don't understand the desire to use endpoint decoders as a filter when
> using endpoint ports which have memdevs and endpoint decoders as
> children works, and flows with the whole top down cxl list filtering
> design. I also don't see a need to special case, and 'imply' endpoint
> ports, when users can explicitly add -p endpoint to their query.
> (The special case seems like it would add confusion to cxl list
> usage.)
> 
> I'm following this with a v12 that does update the man page suggestions.
> Let's continue this conversation there.
> 
> Thanks,
> Alison
> 
> 
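
The dissection in the message above can be sketched as a small shell snippet. This is an illustration only: the variable names are hypothetical, the flags are exactly those quoted in the message, and the command line is printed rather than run, so no cxl hardware or tooling is assumed.

```shell
#!/bin/sh
# Sketch only: assemble the query from the pieces named in the message.
# Variable names are hypothetical; nothing here invokes cxl itself.
port_filter="-p endpoint"   # "of just endpoint ports"
objects="-DEM"              # "their decoders" (-DE) plus "their memdevs" (-M)
extras="--media-errors"     # "(with media errors)"

query="cxl list $port_filter $objects $extras"
echo "$query"
```

Setting port_filter to "-p mem9" instead gives the single-memdev form that the message says will appear in the v12 man page.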


end of thread, other threads:[~2024-04-18 20:12 UTC | newest]

Thread overview: 29+ messages
2024-03-14  4:05 [ndctl PATCH v11 0/7] Support poison list retrieval alison.schofield
2024-03-14  4:05 ` [ndctl PATCH v11 1/7] libcxl: add interfaces for GET_POISON_LIST mailbox commands alison.schofield
2024-03-18 17:51   ` fan
2024-03-18 20:11     ` Alison Schofield
2024-03-18 21:01       ` Dan Williams
2024-03-19 16:43         ` Alison Schofield
2024-03-14  4:05 ` [ndctl PATCH v11 2/7] cxl/event_trace: add an optional pid check to event parsing alison.schofield
2024-03-14  4:05 ` [ndctl PATCH v11 3/7] cxl/event_trace: support poison context in " alison.schofield
2024-03-14  4:05 ` [ndctl PATCH v11 4/7] cxl/event_trace: add helpers to retrieve tep fields by type alison.schofield
2024-03-15 15:44   ` Dave Jiang
2024-03-15 17:39   ` Dan Williams
2024-03-18 17:28     ` Alison Schofield
2024-03-18 21:21   ` fan
2024-03-14  4:05 ` [ndctl PATCH v11 5/7] cxl/list: collect and parse media_error records alison.schofield
2024-03-15 16:16   ` Dave Jiang
2024-03-20 20:24     ` Alison Schofield
2024-03-14  4:05 ` [ndctl PATCH v11 6/7] cxl/list: add --media-errors option to cxl list alison.schofield
2024-03-15 16:41   ` Dave Jiang
2024-03-14  4:05 ` [ndctl PATCH v11 7/7] cxl/test: add cxl-poison.sh unit test alison.schofield
2024-03-15 17:03   ` Dave Jiang
     [not found] ` <CGME20240314040548epcas2p3698bf9d1463a1d2255dc95ac506d3ae8@epcms2p4>
2024-03-15  1:09   ` [ndctl PATCH v11 6/7] cxl/list: add --media-errors option to cxl list Wonjae Lee
2024-03-15  2:36     ` Alison Schofield
2024-03-15  3:35       ` Dan Williams
2024-03-20 20:40         ` Alison Schofield
2024-03-27 19:48         ` Alison Schofield
2024-04-18 20:12           ` Alison Schofield
     [not found] ` <CGME20240314040551epcas2p40829b16b09f439519a692070fb460242@epcms2p1>
2024-03-15 23:03   ` [ndctl PATCH v11 7/7] cxl/test: add cxl-poison.sh unit test Wonjae Lee
2024-03-18 17:17     ` Alison Schofield
2024-03-20 20:42 ` [ndctl PATCH v11 0/7] Support poison list retrieval Alison Schofield
