* [RFC PATCH 0/9] CXL: Read and clear event logs
@ 2022-08-13  5:32 ira.weiny
  2022-08-13  5:32 ` [RFC PATCH 1/9] cxl/mem: Implement Get Event Records command ira.weiny
                   ` (9 more replies)
  0 siblings, 10 replies; 51+ messages in thread
From: ira.weiny @ 2022-08-13  5:32 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ira Weiny, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Jonathan Cameron, Davidlohr Bueso, linux-kernel,
	linux-cxl

From: Ira Weiny <ira.weiny@intel.com>

Event records inform the OS of various device events.  Events are not needed
for any kernel operation but various user level software will want to track
events.

Add event reporting through the trace event mechanism.  On driver load, read
and clear all device events.

Normally interrupts will trigger new events to be reported as they occur.
Because the interrupt code is still being worked on, this series provides a
cxl-test mechanism to create a series of events and trigger the reporting of
those events.

This series is submitted as an RFC for a few reasons:

	1) Interrupt support is still missing
	2) I'd like to get comments on the format of the trace events
	3) Some of the event formats are badly aligned and I would like to see
	   if there is any clarification on how the data will be formatted
	   (See individual patches for details)

Ira Weiny (9):
  cxl/mem: Implement Get Event Records command
  cxl/mem: Implement Clear Event Records command
  cxl/mem: Clear events on driver load
  cxl/mem: Trace General Media Event Record
  cxl/mem: Trace DRAM Event Record
  cxl/mem: Trace Memory Module Event Record
  cxl/test: Add generic mock events
  cxl/test: Add specific events
  cxl/test: Simulate event log overflow

 MAINTAINERS                       |   1 +
 drivers/cxl/core/mbox.c           | 143 ++++++++
 drivers/cxl/cxlmem.h              | 149 +++++++++
 drivers/cxl/pci.c                 |   2 +
 include/trace/events/cxl-events.h | 521 ++++++++++++++++++++++++++++++
 include/uapi/linux/cxl_mem.h      |   2 +
 tools/testing/cxl/test/mem.c      | 399 +++++++++++++++++++++++
 7 files changed, 1217 insertions(+)
 create mode 100644 include/trace/events/cxl-events.h


base-commit: 1cd8a2537eb07751d405ab7e2223f20338a90506
-- 
2.35.3



* [RFC PATCH 1/9] cxl/mem: Implement Get Event Records command
  2022-08-13  5:32 [RFC PATCH 0/9] CXL: Read and clear event logs ira.weiny
@ 2022-08-13  5:32 ` ira.weiny
  2022-08-16 16:39   ` Steven Rostedt
                     ` (2 more replies)
  2022-08-13  5:32 ` [RFC PATCH 2/9] cxl/mem: Implement Clear " ira.weiny
                   ` (8 subsequent siblings)
  9 siblings, 3 replies; 51+ messages in thread
From: ira.weiny @ 2022-08-13  5:32 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ira Weiny, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Jonathan Cameron, Davidlohr Bueso, linux-kernel,
	linux-cxl

From: Ira Weiny <ira.weiny@intel.com>

Event records are defined for CXL devices.  Each record is reported in
one event log.  Devices are required to support the storage of at least
one event record in each event log type.

Devices track event log overflow by incrementing a counter and tracking
the time of the first and last overflow event seen.

Software queries events via the Get Event Records mailbox command (CXL
v3.0 section 8.2.9.2.2).

Issue the Get Event Records mailbox command on driver load.  Trace each
record found, as well as any overflow conditions.  Only 1 event is
requested for each query.  Optimization of multiple record queries is
deferred.
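
For readers who want to see the query loop outside the driver, here is a
minimal user-space sketch of the one-record-per-query pattern described
above.  Everything in it is an illustration, not the driver code: the
mock_log and mock_reply types and the mock_get_event_record() function
are hypothetical stand-ins for the device and the mailbox call, and only
the two flag bits mirror the definitions in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EVT_FLAG_OVERFLOW	(1u << 0)
#define EVT_FLAG_MORE_RECORDS	(1u << 1)

/* Hypothetical reply: flags plus at most one record handle. */
struct mock_reply {
	uint8_t flags;
	uint16_t record_count;
	uint16_t handle;
};

/* Hypothetical device log: a queue of pending record handles. */
struct mock_log {
	const uint16_t *handles;
	size_t nr;
	size_t next;
	int overflowed;
};

/* Stand-in for the mailbox command: hand back one record per call. */
int mock_get_event_record(struct mock_log *log, struct mock_reply *out)
{
	out->flags = log->overflowed ? EVT_FLAG_OVERFLOW : 0;
	out->record_count = 0;
	out->handle = 0;

	if (log->next < log->nr) {
		out->handle = log->handles[log->next++];
		out->record_count = 1;
	}
	if (log->next < log->nr)
		out->flags |= EVT_FLAG_MORE_RECORDS;
	return 0;
}

/* Drain the log one record at a time; return the number of records seen. */
int drain_log(struct mock_log *log, int *overflow_reports)
{
	struct mock_reply reply;
	int seen = 0;

	do {
		int rc = mock_get_event_record(log, &reply);

		if (rc)
			return rc;
		if (reply.record_count > 0)
			seen++;			/* the driver traces the record here */
		if (reply.flags & EVT_FLAG_OVERFLOW)
			(*overflow_reports)++;	/* and the overflow state here */
	} while (reply.flags & EVT_FLAG_MORE_RECORDS);

	return seen;
}
```

Note that in this sketch the overflow state is reported once per query
for as long as the flag stays set, which matches the loop structure in
the patch.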

This patch traces a raw event record only and leaves the specific event
record types to subsequent patches.

NOTE: checkpatch is not completely happy with the tracing part of this
patch but AFAICT it is correct.  I'm open to suggestions if I've done
something wrong.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 MAINTAINERS                       |   1 +
 drivers/cxl/core/mbox.c           |  60 ++++++++++++++
 drivers/cxl/cxlmem.h              |  66 ++++++++++++++++
 include/trace/events/cxl-events.h | 127 ++++++++++++++++++++++++++++++
 include/uapi/linux/cxl_mem.h      |   1 +
 5 files changed, 255 insertions(+)
 create mode 100644 include/trace/events/cxl-events.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 54fa6e2059de..1cb9cec31009 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5014,6 +5014,7 @@ M:	Dan Williams <dan.j.williams@intel.com>
 L:	linux-cxl@vger.kernel.org
 S:	Maintained
 F:	drivers/cxl/
+F:	include/trace/events/cxl*.h
 F:	include/uapi/linux/cxl_mem.h
 
 CONEXANT ACCESSRUNNER USB DRIVER
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 16176b9278b4..2cceed8608dc 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -7,6 +7,9 @@
 #include <cxlmem.h>
 #include <cxl.h>
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/cxl-events.h>
+
 #include "core.h"
 
 static bool cxl_raw_allow_all;
@@ -48,6 +51,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
 	CXL_CMD(RAW, CXL_VARIABLE_PAYLOAD, CXL_VARIABLE_PAYLOAD, 0),
 #endif
 	CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
+	CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
 	CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
 	CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
 	CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
@@ -704,6 +708,62 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
 
+static int cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
+				   enum cxl_event_log_type type)
+{
+	struct cxl_get_event_payload payload;
+
+	do {
+		u8 log_type = type;
+		u16 record_count;
+		int rc;
+
+		rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVENT_RECORD,
+				       &log_type, sizeof(log_type),
+				       &payload, sizeof(payload));
+		if (rc)
+			return rc;
+
+		record_count = le16_to_cpu(payload.record_count);
+		if (record_count > 0)
+			trace_cxl_event(dev_name(cxlds->dev), type,
+					&payload.record);
+
+		if (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW)
+			trace_cxl_event_overflow(dev_name(cxlds->dev), type,
+						 &payload);
+
+	} while (payload.flags & CXL_GET_EVENT_FLAG_MORE_RECORDS);
+
+	return 0;
+}
+
+/**
+ * cxl_mem_get_event_records - Get Event Records from the device
+ * @cxlds: The device data for the operation
+ *
+ * Retrieve all event records available on the device and report them as trace
+ * events.
+ *
+ * See CXL v3.0 @8.2.9.2.2 Get Event Records
+ */
+void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
+{
+	struct device *dev = cxlds->dev;
+	enum cxl_event_log_type log_type;
+
+	for (log_type = CXL_EVENT_TYPE_INFO;
+	     log_type < CXL_EVENT_TYPE_MAX; log_type++) {
+		int rc;
+
+		rc = cxl_mem_get_records_log(cxlds, log_type);
+		if (rc)
+		dev_err(dev, "Failed to query %s Event Logs : %d\n",
+				cxl_event_log_type_str(log_type), rc);
+	}
+}
+EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, CXL);
+
 /**
  * cxl_mem_get_partition_info - Get partition info
  * @cxlds: The device data for the operation
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 88e3a8e54b6a..f83634f3bc8d 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -4,6 +4,7 @@
 #define __CXL_MEM_H__
 #include <uapi/linux/cxl_mem.h>
 #include <linux/cdev.h>
+#include <linux/uuid.h>
 #include "cxl.h"
 
 /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
@@ -253,6 +254,7 @@ struct cxl_dev_state {
 enum cxl_opcode {
 	CXL_MBOX_OP_INVALID		= 0x0000,
 	CXL_MBOX_OP_RAW			= CXL_MBOX_OP_INVALID,
+	CXL_MBOX_OP_GET_EVENT_RECORD	= 0x0100,
 	CXL_MBOX_OP_GET_FW_INFO		= 0x0200,
 	CXL_MBOX_OP_ACTIVATE_FW		= 0x0202,
 	CXL_MBOX_OP_GET_SUPPORTED_LOGS	= 0x0400,
@@ -322,6 +324,69 @@ struct cxl_mbox_identify {
 	u8 qos_telemetry_caps;
 } __packed;
 
+/*
+ * Common Event Record Format
+ * CXL v3.0 section 8.2.9.2.1; Table 8-42
+ */
+struct cxl_event_record_hdr {
+	uuid_t id;
+	__le32 flags_length;
+	__le16 handle;
+	__le16 related_handle;
+	__le64 timestamp;
+	__le64 reserved1;
+	__le64 reserved2;
+} __packed;
+
+#define EVENT_RECORD_DATA_LENGTH 0x50
+struct cxl_event_record_raw {
+	struct cxl_event_record_hdr hdr;
+	u8 data[EVENT_RECORD_DATA_LENGTH];
+} __packed;
+
+/*
+ * Get Event Records output payload
+ * CXL v3.0 section 8.2.9.2.2; Table 8-50
+ *
+ * Space given for 1 record
+ */
+#define CXL_GET_EVENT_FLAG_OVERFLOW		BIT(0)
+#define CXL_GET_EVENT_FLAG_MORE_RECORDS	BIT(1)
+struct cxl_get_event_payload {
+	u8 flags;
+	u8 reserved1;
+	__le16 overflow_err_count;
+	__le64 first_overflow_timestamp;
+	__le64 last_overflow_timestamp;
+	__le16 record_count;
+	u8 reserved2[0xa];
+	struct cxl_event_record_raw record;
+} __packed;
+
+enum cxl_event_log_type {
+	CXL_EVENT_TYPE_INFO = 0x00,
+	CXL_EVENT_TYPE_WARN,
+	CXL_EVENT_TYPE_FAIL,
+	CXL_EVENT_TYPE_FATAL,
+	CXL_EVENT_TYPE_MAX
+};
+static inline char *cxl_event_log_type_str(enum cxl_event_log_type type)
+{
+	switch (type) {
+	case CXL_EVENT_TYPE_INFO:
+		return "Informational";
+	case CXL_EVENT_TYPE_WARN:
+		return "Warning";
+	case CXL_EVENT_TYPE_FAIL:
+		return "Failure";
+	case CXL_EVENT_TYPE_FATAL:
+		return "Fatal";
+	default:
+		break;
+	}
+	return "<unknown>";
+}
+
 struct cxl_mbox_get_partition_info {
 	__le64 active_volatile_cap;
 	__le64 active_persistent_cap;
@@ -381,6 +446,7 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds);
 struct cxl_dev_state *cxl_dev_state_create(struct device *dev);
 void set_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
 void clear_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
+void cxl_mem_get_event_records(struct cxl_dev_state *cxlds);
 #ifdef CONFIG_CXL_SUSPEND
 void cxl_mem_active_inc(void);
 void cxl_mem_active_dec(void);
diff --git a/include/trace/events/cxl-events.h b/include/trace/events/cxl-events.h
new file mode 100644
index 000000000000..f4baeae66cf3
--- /dev/null
+++ b/include/trace/events/cxl-events.h
@@ -0,0 +1,127 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM cxl_events
+
+#if !defined(_CXL_TRACE_EVENTS_H) ||  defined(TRACE_HEADER_MULTI_READ)
+#define _CXL_TRACE_EVENTS_H
+
+#include <linux/tracepoint.h>
+
+#define EVENT_LOGS					\
+	EM(CXL_EVENT_TYPE_INFO,		"Info")		\
+	EM(CXL_EVENT_TYPE_WARN,		"Warning")	\
+	EM(CXL_EVENT_TYPE_FAIL,		"Failure")	\
+	EM(CXL_EVENT_TYPE_FATAL,	"Fatal")	\
+	EMe(CXL_EVENT_TYPE_MAX,		"<undefined>")
+
+/*
+ * First define the enums in the above macros to be exported to userspace via
+ * TRACE_DEFINE_ENUM().
+ */
+#undef EM
+#undef EMe
+#define EM(a, b)	TRACE_DEFINE_ENUM(a);
+#define EMe(a, b)	TRACE_DEFINE_ENUM(a);
+
+EVENT_LOGS
+#define show_log_type(type) __print_symbolic(type, EVENT_LOGS)
+
+/*
+ * Now redefine the EM and EMe macros to map the enums to the strings that will
+ * be printed in the output
+ */
+#undef EM
+#undef EMe
+#define EM(a, b)        {a, b},
+#define EMe(a, b)       {a, b}
+
+TRACE_EVENT(cxl_event_overflow,
+
+	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
+		 struct cxl_get_event_payload *payload),
+
+	TP_ARGS(dev_name, log, payload),
+
+	TP_STRUCT__entry(
+		__string(dev_name, dev_name)
+		__field(int, log)
+		__field(u16, count)
+		__field(u64, first)
+		__field(u64, last)
+	),
+
+	TP_fast_assign(
+		__assign_str(dev_name, dev_name);
+		__entry->log = log;
+		__entry->count = le16_to_cpu(payload->overflow_err_count);
+		__entry->first = le64_to_cpu(payload->first_overflow_timestamp);
+		__entry->last = le64_to_cpu(payload->last_overflow_timestamp);
+	),
+
+	TP_printk("%s: EVENT LOG %s OVERFLOW %u records from %llu to %llu",
+		__get_str(dev_name), show_log_type(__entry->log),
+		__entry->count, __entry->first, __entry->last)
+
+);
+
+/*
+ * Common Event Record Format
+ * CXL v3.0 section 8.2.9.2.1; Table 8-42
+ */
+#define CXL_EVENT_RECORD_FLAG_PERMANENT		BIT(2)
+#define CXL_EVENT_RECORD_FLAG_MAINT_NEEDED	BIT(3)
+#define CXL_EVENT_RECORD_FLAG_PERF_DEGRADED	BIT(4)
+#define CXL_EVENT_RECORD_FLAG_HW_REPLACE	BIT(5)
+#define show_hdr_flags(flags)	__print_flags(flags, " | ",			   \
+	{ CXL_EVENT_RECORD_FLAG_PERMANENT,	"Permanent Condition"		}, \
+	{ CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,	"Maintenance Needed"		}, \
+	{ CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,	"Performance Degraded"		}, \
+	{ CXL_EVENT_RECORD_FLAG_HW_REPLACE,	"Hardware Replacement Needed"	}  \
+)
+
+TRACE_EVENT(cxl_event,
+
+	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
+		 struct cxl_event_record_raw *rec),
+
+	TP_ARGS(dev_name, log, rec),
+
+	TP_STRUCT__entry(
+		__string(dev_name, dev_name)
+		__field(int, log)
+		__array(u8, id, UUID_SIZE)
+		__field(u32, flags)
+		__field(u16, handle)
+		__field(u16, related_handle)
+		__field(u64, timestamp)
+		__array(u8, data, EVENT_RECORD_DATA_LENGTH)
+		__field(u8, length)
+	),
+
+	TP_fast_assign(
+		__assign_str(dev_name, dev_name);
+		memcpy(__entry->id, &rec->hdr.id, UUID_SIZE);
+		__entry->log = log;
+		__entry->flags = le32_to_cpu(rec->hdr.flags_length) >> 8;
+		__entry->length = le32_to_cpu(rec->hdr.flags_length) & 0xFF;
+		__entry->handle = le16_to_cpu(rec->hdr.handle);
+		__entry->related_handle = le16_to_cpu(rec->hdr.related_handle);
+		__entry->timestamp = le64_to_cpu(rec->hdr.timestamp);
+		memcpy(__entry->data, &rec->data, EVENT_RECORD_DATA_LENGTH);
+	),
+
+	TP_printk("%s: %s time=%llu id=%pUl handle=%x related_handle=%x hdr_flags='%s' " \
+		  ": %s",
+		__get_str(dev_name), show_log_type(__entry->log),
+		__entry->timestamp, __entry->id, __entry->handle,
+		__entry->related_handle, show_hdr_flags(__entry->flags),
+		__print_hex(__entry->data, EVENT_RECORD_DATA_LENGTH)
+		)
+);
+
+#endif /* _CXL_TRACE_EVENTS_H */
+
+/* This part must be outside protection */
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE cxl-events
+#include <trace/define_trace.h>
diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
index c71021a2a9ed..70459be5bdd4 100644
--- a/include/uapi/linux/cxl_mem.h
+++ b/include/uapi/linux/cxl_mem.h
@@ -24,6 +24,7 @@
 	___C(IDENTIFY, "Identify Command"),                               \
 	___C(RAW, "Raw device command"),                                  \
 	___C(GET_SUPPORTED_LOGS, "Get Supported Logs"),                   \
+	___C(GET_EVENT_RECORD, "Get Event Record"),                       \
 	___C(GET_FW_INFO, "Get FW Info"),                                 \
 	___C(GET_PARTITION_INFO, "Get Partition Information"),            \
 	___C(GET_LSA, "Get Label Storage Area"),                          \
-- 
2.35.3



* [RFC PATCH 2/9] cxl/mem: Implement Clear Event Records command
  2022-08-13  5:32 [RFC PATCH 0/9] CXL: Read and clear event logs ira.weiny
  2022-08-13  5:32 ` [RFC PATCH 1/9] cxl/mem: Implement Get Event Records command ira.weiny
@ 2022-08-13  5:32 ` ira.weiny
  2022-08-24 15:55   ` Jonathan Cameron
  2022-08-13  5:32 ` [RFC PATCH 3/9] cxl/mem: Clear events on driver load ira.weiny
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 51+ messages in thread
From: ira.weiny @ 2022-08-13  5:32 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ira Weiny, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Jonathan Cameron, Davidlohr Bueso, linux-kernel,
	linux-cxl

From: Ira Weiny <ira.weiny@intel.com>

CXL v3.0 section 8.2.9.2.3 defines the Clear Event Records mailbox
command.  After an event record is read it needs to be cleared from the
event log.

Implement cxl_clear_event_record() and call it for each record retrieved
from the device.

Each record is cleared individually.  A clear-all bit is specified, but
events could arrive between a get and the final clear-all operation and
would be lost unread.  Therefore each event is cleared by its handle.
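
The reasoning above can be illustrated with a small user-space model.
This is a sketch under stated assumptions: the mock_log type and the
mock_* helpers are hypothetical and model the log as a plain array of
handles.  They are not the mailbox interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_LOG_CAPACITY 8

/* Hypothetical in-memory event log holding record handles. */
struct mock_log {
	uint16_t handles[MOCK_LOG_CAPACITY];
	size_t nr;
};

/* Append a record; returns 0 on success, -1 if the log is full. */
int mock_log_add(struct mock_log *log, uint16_t handle)
{
	if (log->nr == MOCK_LOG_CAPACITY)
		return -1;
	log->handles[log->nr++] = handle;
	return 0;
}

/* Clear exactly one record, identified by its handle. */
int mock_clear_by_handle(struct mock_log *log, uint16_t handle)
{
	for (size_t i = 0; i < log->nr; i++) {
		if (log->handles[i] != handle)
			continue;
		/* Close the gap left by the cleared record. */
		for (size_t j = i + 1; j < log->nr; j++)
			log->handles[j - 1] = log->handles[j];
		log->nr--;
		return 0;
	}
	return -1;	/* no such record */
}

/* Clear everything currently in the log, whether it was read or not. */
void mock_clear_all(struct mock_log *log)
{
	log->nr = 0;
}
```

If a record with handle 0x11 arrives after the host has read 0x10 but
before it clears, mock_clear_by_handle(&log, 0x10) leaves 0x11 in place
for the next query, whereas mock_clear_all() would drop it unread.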

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 drivers/cxl/core/mbox.c      | 31 ++++++++++++++++++++++++++++---
 drivers/cxl/cxlmem.h         | 15 +++++++++++++++
 include/uapi/linux/cxl_mem.h |  1 +
 3 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 2cceed8608dc..493f5ceb5d1c 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -52,6 +52,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
 #endif
 	CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
 	CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
+	CXL_CMD(CLEAR_EVENT_RECORD, CXL_VARIABLE_PAYLOAD, 0, 0),
 	CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
 	CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
 	CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
@@ -708,6 +709,26 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
 
+static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
+				  enum cxl_event_log_type log,
+				  __le16 handle)
+{
+	struct cxl_mbox_clear_event_payload payload;
+	int rc;
+
+	memset(&payload, 0, sizeof(payload));
+	payload.event_log = log;
+	payload.nr_recs = 1;
+	payload.handle = handle;
+
+	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_CLEAR_EVENT_RECORD,
+			       &payload, sizeof(payload), NULL, 0);
+	if (rc)
+		return rc;
+
+	return 0;
+}
+
 static int cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
 				   enum cxl_event_log_type type)
 {
@@ -725,9 +746,12 @@ static int cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
 			return rc;
 
 		record_count = le16_to_cpu(payload.record_count);
-		if (record_count > 0)
+		if (record_count > 0) {
 			trace_cxl_event(dev_name(cxlds->dev), type,
 					&payload.record);
+			cxl_clear_event_record(cxlds, type,
+					       payload.record.hdr.handle);
+		}
 
 		if (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW)
 			trace_cxl_event_overflow(dev_name(cxlds->dev), type,
@@ -742,10 +766,11 @@ static int cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
  * cxl_mem_get_event_records - Get Event Records from the device
  * @cxlds: The device data for the operation
  *
- * Retrieve all event records available on the device and report them as trace
- * events.
+ * Retrieve all event records available on the device, report them as trace
+ * events, and clear them.
  *
  * See CXL v3.0 @8.2.9.2.2 Get Event Records
+ * See CXL v3.0 @8.2.9.2.3 Clear Event Records
  */
 void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
 {
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index f83634f3bc8d..5506e7210cf6 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -255,6 +255,7 @@ enum cxl_opcode {
 	CXL_MBOX_OP_INVALID		= 0x0000,
 	CXL_MBOX_OP_RAW			= CXL_MBOX_OP_INVALID,
 	CXL_MBOX_OP_GET_EVENT_RECORD	= 0x0100,
+	CXL_MBOX_OP_CLEAR_EVENT_RECORD	= 0x0101,
 	CXL_MBOX_OP_GET_FW_INFO		= 0x0200,
 	CXL_MBOX_OP_ACTIVATE_FW		= 0x0202,
 	CXL_MBOX_OP_GET_SUPPORTED_LOGS	= 0x0400,
@@ -387,6 +388,20 @@ static inline char *cxl_event_log_type_str(enum cxl_event_log_type type)
 	return "<unknown>";
 }
 
+/*
+ * Clear Event Records input payload
+ * CXL v3.0 section 8.2.9.2.3; Table 8-51
+ *
+ * Space given for 1 record
+ */
+struct cxl_mbox_clear_event_payload {
+	u8 event_log;		/* enum cxl_event_log_type */
+	u8 clear_flags;
+	u8 nr_recs;		/* 1 for this struct */
+	u8 reserved[3];
+	__le16 handle;
+};
+
 struct cxl_mbox_get_partition_info {
 	__le64 active_volatile_cap;
 	__le64 active_persistent_cap;
diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
index 70459be5bdd4..7c1ad8062792 100644
--- a/include/uapi/linux/cxl_mem.h
+++ b/include/uapi/linux/cxl_mem.h
@@ -25,6 +25,7 @@
 	___C(RAW, "Raw device command"),                                  \
 	___C(GET_SUPPORTED_LOGS, "Get Supported Logs"),                   \
 	___C(GET_EVENT_RECORD, "Get Event Record"),                       \
+	___C(CLEAR_EVENT_RECORD, "Clear Event Record"),                   \
 	___C(GET_FW_INFO, "Get FW Info"),                                 \
 	___C(GET_PARTITION_INFO, "Get Partition Information"),            \
 	___C(GET_LSA, "Get Label Storage Area"),                          \
-- 
2.35.3



* [RFC PATCH 3/9] cxl/mem: Clear events on driver load
  2022-08-13  5:32 [RFC PATCH 0/9] CXL: Read and clear event logs ira.weiny
  2022-08-13  5:32 ` [RFC PATCH 1/9] cxl/mem: Implement Get Event Records command ira.weiny
  2022-08-13  5:32 ` [RFC PATCH 2/9] cxl/mem: Implement Clear " ira.weiny
@ 2022-08-13  5:32 ` ira.weiny
  2022-08-24 15:57   ` Jonathan Cameron
  2022-08-13  5:32 ` [RFC PATCH 4/9] cxl/mem: Trace General Media Event Record ira.weiny
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 51+ messages in thread
From: ira.weiny @ 2022-08-13  5:32 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ira Weiny, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Jonathan Cameron, Davidlohr Bueso, linux-kernel,
	linux-cxl

From: Ira Weiny <ira.weiny@intel.com>

The information contained in the events prior to the driver loading can
be queried at any time through other mailbox commands.

Ensure a clean slate of events by reading and clearing the events.  The
events are sent to the trace buffer, but no one is expected to be
listening to it at driver load time.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 drivers/cxl/pci.c            | 2 ++
 tools/testing/cxl/test/mem.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index faeb5d9d7a7a..5f1b492bd388 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -498,6 +498,8 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (IS_ERR(cxlmd))
 		return PTR_ERR(cxlmd);
 
+	cxl_mem_get_event_records(cxlds);
+
 	if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
 		rc = devm_cxl_add_nvdimm(&pdev->dev, cxlmd);
 
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index aa2df3a15051..e2f5445d24ff 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -285,6 +285,8 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
 	if (IS_ERR(cxlmd))
 		return PTR_ERR(cxlmd);
 
+	cxl_mem_get_event_records(cxlds);
+
 	if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
 		rc = devm_cxl_add_nvdimm(dev, cxlmd);
 
-- 
2.35.3



* [RFC PATCH 4/9] cxl/mem: Trace General Media Event Record
  2022-08-13  5:32 [RFC PATCH 0/9] CXL: Read and clear event logs ira.weiny
                   ` (2 preceding siblings ...)
  2022-08-13  5:32 ` [RFC PATCH 3/9] cxl/mem: Clear events on driver load ira.weiny
@ 2022-08-13  5:32 ` ira.weiny
  2022-08-24 16:11   ` Jonathan Cameron
  2022-08-13  5:32 ` [RFC PATCH 5/9] cxl/mem: Trace DRAM " ira.weiny
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 51+ messages in thread
From: ira.weiny @ 2022-08-13  5:32 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ira Weiny, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Jonathan Cameron, Davidlohr Bueso, linux-kernel,
	linux-cxl

From: Ira Weiny <ira.weiny@intel.com>

CXL v3.0 section 8.2.9.2.1.1 defines the General Media Event Record.

Determine if the event read is a general media record and if so trace
the record.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
A couple of specification questions I've had.

1) The component id is not specified as a UUID or any particular
format.  It is therefore reported as a byte array.  Is this intentional?

2) This record has a very odd byte layout with a 16 bit field
   (validity_flags) landing on a 3 byte boundary and a 3 byte bit field
   (device) landing on a 7 byte boundary.

I've made my best guess as to how the endianness of these fields should
be resolved.  But I'm happy to hear from other folks if what I have done
is wrong.

struct cxl_evt_gen_media {
	struct cxl_event_record_hdr hdr;
	__le64 phys_addr;
	u8 descriptor;
	u8 type;
	u8 transaction_type;
	u16 validity_flags;			/* ??? */
	u8 channel;
	u8 rank;
	u8 device[CXL_EVT_GEN_MED_DEV_SIZE];	/* ??? */
	u8 component_id[CXL_EVT_GEN_MED_COMP_ID_SIZE];
} __packed;
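
To make the alignment concern concrete, the following user-space sketch
mirrors the layout above and computes where the two questioned fields
land relative to the start of the record-specific data.  The struct
names and the two helper functions are hypothetical, fixed-width stdint
types stand in for the kernel types, and __attribute__((packed))
assumes a GCC or Clang style compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GEN_MED_DEV_SIZE	3
#define GEN_MED_COMP_ID_SIZE	0x10

/* User-space mirror of the common header: 16 byte UUID plus 32 bytes. */
struct evt_hdr {
	uint8_t  id[16];
	uint32_t flags_length;
	uint16_t handle;
	uint16_t related_handle;
	uint64_t timestamp;
	uint64_t reserved1;
	uint64_t reserved2;
} __attribute__((packed));

/* User-space mirror of the general media record layout. */
struct evt_gen_media {
	struct evt_hdr hdr;
	uint64_t phys_addr;
	uint8_t  descriptor;
	uint8_t  type;
	uint8_t  transaction_type;
	uint16_t validity_flags;
	uint8_t  channel;
	uint8_t  rank;
	uint8_t  device[GEN_MED_DEV_SIZE];
	uint8_t  component_id[GEN_MED_COMP_ID_SIZE];
} __attribute__((packed));

/* Offset of validity_flags from the end of the common header. */
size_t gen_media_validity_flags_offset(void)
{
	return offsetof(struct evt_gen_media, validity_flags) -
	       sizeof(struct evt_hdr);
}

/* Offset of device from the end of the common header. */
size_t gen_media_device_offset(void)
{
	return offsetof(struct evt_gen_media, device) -
	       sizeof(struct evt_hdr);
}
```

With this layout validity_flags sits at offset 0x0b and device at 0x0f,
that is 3 and 7 bytes past an 8 byte boundary, so neither field can be
read with a naturally aligned 16 or 32 bit load.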
---
 drivers/cxl/core/mbox.c           |  30 ++++++-
 drivers/cxl/cxlmem.h              |  19 +++++
 include/trace/events/cxl-events.h | 125 ++++++++++++++++++++++++++++++
 3 files changed, 172 insertions(+), 2 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 493f5ceb5d1c..0e433f072163 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -709,6 +709,32 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
 
+/*
+ * General Media Event Record
+ * CXL v3.0 Section 8.2.9.2.1.1; Table 8-43
+ */
+static const uuid_t gen_media_event_uuid =
+	UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
+		  0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6);
+
+static void cxl_trace_event_record(const char *dev_name,
+				   enum cxl_event_log_type type,
+				   struct cxl_get_event_payload *payload)
+{
+	uuid_t *id = &payload->record.hdr.id;
+
+	if (uuid_equal(id, &gen_media_event_uuid)) {
+		struct cxl_evt_gen_media *rec =
+				(struct cxl_evt_gen_media *)&payload->record;
+
+		trace_cxl_gen_media_event(dev_name, type, rec);
+		return;
+	}
+
+	/* For unknown record types print just the header */
+	trace_cxl_event(dev_name, type, &payload->record);
+}
+
 static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
 				  enum cxl_event_log_type log,
 				  __le16 handle)
@@ -747,8 +773,8 @@ static int cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
 
 		record_count = le16_to_cpu(payload.record_count);
 		if (record_count > 0) {
-			trace_cxl_event(dev_name(cxlds->dev), type,
-					&payload.record);
+			cxl_trace_event_record(dev_name(cxlds->dev), type,
+					       &payload);
 			cxl_clear_event_record(cxlds, type,
 					       payload.record.hdr.handle);
 		}
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 5506e7210cf6..33669459ae4b 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -402,6 +402,25 @@ struct cxl_mbox_clear_event_payload {
 	__le16 handle;
 };
 
+/*
+ * General Media Event Record
+ * CXL v3.0 Section 8.2.9.2.1.1; Table 8-43
+ */
+#define CXL_EVT_GEN_MED_DEV_SIZE	3
+#define CXL_EVT_GEN_MED_COMP_ID_SIZE	0x10
+struct cxl_evt_gen_media {
+	struct cxl_event_record_hdr hdr;
+	__le64 phys_addr;
+	u8 descriptor;
+	u8 type;
+	u8 transaction_type;
+	u16 validity_flags;
+	u8 channel;
+	u8 rank;
+	u8 device[CXL_EVT_GEN_MED_DEV_SIZE];
+	u8 component_id[CXL_EVT_GEN_MED_COMP_ID_SIZE];
+} __packed;
+
 struct cxl_mbox_get_partition_info {
 	__le64 active_volatile_cap;
 	__le64 active_persistent_cap;
diff --git a/include/trace/events/cxl-events.h b/include/trace/events/cxl-events.h
index f4baeae66cf3..b51c51fd4e62 100644
--- a/include/trace/events/cxl-events.h
+++ b/include/trace/events/cxl-events.h
@@ -119,6 +119,131 @@ TRACE_EVENT(cxl_event,
 		)
 );
 
+/*
+ * General Media Event Record - GMER
+ * CXL v3.0 Section 8.2.9.2.1.1; Table 8-43
+ */
+#define CXL_GMER_PHYS_ADDR_VOLATILE			BIT(0)
+#define CXL_GMER_PHYS_ADDR_MASK				0x3f
+
+#define CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT		BIT(0)
+#define CXL_GMER_EVT_DESC_THRESHOLD_EVENT		BIT(1)
+#define CXL_GMER_EVT_DESC_POISON_LIST_OVERFLOW		BIT(2)
+#define show_event_desc_flags(flags)	__print_flags(flags, "|",		   \
+	{ CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT,		"Uncorrectable Event"	}, \
+	{ CXL_GMER_EVT_DESC_THRESHOLD_EVENT,		"Threshold event"	}, \
+	{ CXL_GMER_EVT_DESC_POISON_LIST_OVERFLOW,	"Poison List Overflow"	}  \
+)
+
+#define CXL_GMER_MEM_EVT_TYPE_ECC_ERROR			0x00
+#define CXL_GMER_MEM_EVT_TYPE_INV_ADDR			0x01
+#define CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR		0x02
+#define show_mem_event_type(type)	__print_symbolic(type,			\
+	{ CXL_GMER_MEM_EVT_TYPE_ECC_ERROR,		"ECC Error" },		\
+	{ CXL_GMER_MEM_EVT_TYPE_INV_ADDR,		"Invalid Address" },	\
+	{ CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,	"Data Path Error" }	\
+)
+
+#define CXL_GMER_TRANS_UNKNOWN				0x00
+#define CXL_GMER_TRANS_HOST_READ			0x01
+#define CXL_GMER_TRANS_HOST_WRITE			0x02
+#define CXL_GMER_TRANS_HOST_SCAN_MEDIA			0x03
+#define CXL_GMER_TRANS_HOST_INJECT_POISON		0x04
+#define CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB		0x05
+#define CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT	0x06
+#define show_trans_type(type)	__print_symbolic(type,					\
+	{ CXL_GMER_TRANS_UNKNOWN,			"Unknown" },			\
+	{ CXL_GMER_TRANS_HOST_READ,			"Host Read" },			\
+	{ CXL_GMER_TRANS_HOST_WRITE,			"Host Write" },			\
+	{ CXL_GMER_TRANS_HOST_SCAN_MEDIA,		"Host Scan Media" },		\
+	{ CXL_GMER_TRANS_HOST_INJECT_POISON,		"Host Inject Poison" },		\
+	{ CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB,		"Internal Media Scrub" },	\
+	{ CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT,	"Internal Media Management" }	\
+)
+
+#define CXL_GMER_VALID_CHANNEL				BIT(0)
+#define CXL_GMER_VALID_RANK				BIT(1)
+#define CXL_GMER_VALID_DEVICE				BIT(2)
+#define CXL_GMER_VALID_COMPONENT			BIT(3)
+#define show_valid_flags(flags)	__print_flags(flags, "|",		   \
+	{ CXL_GMER_VALID_CHANNEL,			"CHANNEL"	}, \
+	{ CXL_GMER_VALID_RANK,				"RANK"		}, \
+	{ CXL_GMER_VALID_DEVICE,			"DEVICE"	}, \
+	{ CXL_GMER_VALID_COMPONENT,			"COMPONENT"	}  \
+)
+
+TRACE_EVENT(cxl_gen_media_event,
+
+	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
+		 struct cxl_evt_gen_media *rec),
+
+	TP_ARGS(dev_name, log, rec),
+
+	TP_STRUCT__entry(
+		/* Common */
+		__string(dev_name, dev_name)
+		__field(int, log)
+		__array(u8, id, UUID_SIZE)
+		__field(u32, flags)
+		__field(u16, handle)
+		__field(u16, related_handle)
+		__field(u64, timestamp)
+
+		/* General Media */
+		__field(u64, phys_addr)
+		__field(u8, descriptor)
+		__field(u8, type)
+		__field(u8, transaction_type)
+		__field(u8, channel)
+		__field(u32, device)
+		__array(u8, comp_id, CXL_EVT_GEN_MED_COMP_ID_SIZE)
+		__field(u16, validity_flags)
+		__field(u8, rank) /* Out of order to pack trace record */
+	),
+
+	TP_fast_assign(
+		/* Common */
+		__assign_str(dev_name, dev_name);
+		memcpy(__entry->id, &rec->hdr.id, UUID_SIZE);
+		__entry->log = log;
+		__entry->flags = le32_to_cpu(rec->hdr.flags_length) >> 8;
+		__entry->handle = le16_to_cpu(rec->hdr.handle);
+		__entry->related_handle = le16_to_cpu(rec->hdr.related_handle);
+		__entry->timestamp = le64_to_cpu(rec->hdr.timestamp);
+
+		/* General Media */
+		__entry->phys_addr = le64_to_cpu(rec->phys_addr);
+		__entry->descriptor = rec->descriptor;
+		__entry->type = rec->type;
+		__entry->transaction_type = rec->transaction_type;
+		__entry->channel = rec->channel;
+		__entry->rank = rec->rank;
+		/* Assemble the 3 byte little-endian field in CPU order */
+		__entry->device = rec->device[0] |
+				  rec->device[1] << 8 |
+				  rec->device[2] << 16; /* 3 byte LE ? */
+		memcpy(__entry->comp_id, &rec->component_id,
+			CXL_EVT_GEN_MED_COMP_ID_SIZE);
+		__entry->validity_flags = le16_to_cpu(rec->validity_flags);
+	),
+
+	TP_printk("%s: %s time=%llu id=%pUl handle=%x related_handle=%x hdr_flags='%s': " \
+		  "phys_addr=%llx volatile=%s desc='%s' type='%s' trans_type='%s' channel=%u " \
+		  "rank=%u device=%x comp_id=%s valid_flags='%s'",
+		__get_str(dev_name), show_log_type(__entry->log),
+		__entry->timestamp, __entry->id, __entry->handle,
+		__entry->related_handle, show_hdr_flags(__entry->flags),
+		__entry->phys_addr & ~CXL_GMER_PHYS_ADDR_MASK,
+		(__entry->phys_addr & CXL_GMER_PHYS_ADDR_VOLATILE) ? "TRUE" : "FALSE",
+		show_event_desc_flags(__entry->descriptor),
+		show_mem_event_type(__entry->type),
+		show_trans_type(__entry->transaction_type),
+		__entry->channel, __entry->rank, __entry->device,
+		__print_hex(__entry->comp_id, CXL_EVT_GEN_MED_COMP_ID_SIZE),
+		show_valid_flags(__entry->validity_flags)
+		)
+);
+
 #endif /* _CXL_TRACE_EVENTS_H */
 
 /* This part must be outside protection */
-- 
2.35.3



* [RFC PATCH 5/9] cxl/mem: Trace DRAM Event Record
  2022-08-13  5:32 [RFC PATCH 0/9] CXL: Read and clear event logs ira.weiny
                   ` (3 preceding siblings ...)
  2022-08-13  5:32 ` [RFC PATCH 4/9] cxl/mem: Trace General Media Event Record ira.weiny
@ 2022-08-13  5:32 ` ira.weiny
  2022-08-25 10:46   ` Jonathan Cameron
  2022-08-13  5:32 ` [RFC PATCH 6/9] cxl/mem: Trace Memory Module " ira.weiny
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 51+ messages in thread
From: ira.weiny @ 2022-08-13  5:32 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ira Weiny, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Jonathan Cameron, Davidlohr Bueso, linux-kernel,
	linux-cxl

From: Ira Weiny <ira.weiny@intel.com>

CXL v3.0 section 8.2.9.2.1.2 defines the DRAM Event Record.

Determine if the event read is a DRAM event record and if so trace the
record.
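
As a rough illustration of the dispatch described above, the sketch below
classifies a record by comparing its 16-byte ID against a known UUID.  It is a
user-space approximation, not the driver code: REC_ID_SIZE, enum rec_kind and
classify_record() are hypothetical names, and the byte order of dram_id assumes
the canonical layout that UUID_INIT() produces for a uuid_t.

```c
#include <stdint.h>
#include <string.h>

#define REC_ID_SIZE 16	/* hypothetical name; a UUID is 16 bytes */

enum rec_kind { REC_UNKNOWN, REC_DRAM };

/* 601dcbb3-9c06-4eab-b8af-4e9bfb5c9624 in canonical byte order (assumed) */
static const uint8_t dram_id[REC_ID_SIZE] = {
	0x60, 0x1d, 0xcb, 0xb3, 0x9c, 0x06, 0x4e, 0xab,
	0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24,
};

/* Return the record kind for a raw ID; unknown IDs fall through */
static enum rec_kind classify_record(const uint8_t id[REC_ID_SIZE])
{
	if (memcmp(id, dram_id, REC_ID_SIZE) == 0)
		return REC_DRAM;
	return REC_UNKNOWN;
}
```

Unknown IDs deliberately fall through, mirroring how unrecognized records
are still traced with just the common header.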

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
This record has a very odd byte layout with two 16-bit fields
(validity_flags and column) aligned on an odd byte boundary.  In
addition, nibble_mask and row are oddly aligned.

I've made my best guess as to how the endianness of these fields should
be resolved, but I'm happy to hear from other folks if what I have is
wrong.

struct cxl_evt_dram_rec {
	struct cxl_event_record_hdr hdr;
	__le64 phys_addr;
	u8 descriptor;
	u8 type;
	u8 transaction_type;
	__le16 validity_flags;
	u8 channel;
	u8 rank;
	u8 nibble_mask[CXL_EVT_DER_NIBBLE_MASK_SIZE];
	u8 bank_group;
	u8 bank;
	u8 row[CXL_EVT_DER_ROW_SIZE];
	__le16 column;
	u8 correction_mask[CXL_EVT_DER_CORRECTION_MASK_SIZE];
} __packed;
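
For comparison, if the 3-byte fields (nibble_mask, row) turn out to be plain
little-endian, which is only an assumption here, a decode that is independent
of host endianness could look like the sketch below.  le24_to_host() is a
hypothetical user-space helper; in the kernel, get_unaligned_le24() should
cover the same case.

```c
#include <stdint.h>

/*
 * Illustrative helper (not part of the patch): assemble a 24-bit
 * little-endian value from three bytes.  Byte 0 is the least
 * significant, so the result is already in host order and needs no
 * further byte swap.
 */
static inline uint32_t le24_to_host(const uint8_t b[3])
{
	return (uint32_t)b[0] |
	       ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16);
}
```

Building the value with shifts instead of casting the byte array avoids both
unaligned access and any dependence on the CPU's byte order.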
---
 drivers/cxl/core/mbox.c           |  16 +++++
 drivers/cxl/cxlmem.h              |  24 +++++++
 include/trace/events/cxl-events.h | 114 ++++++++++++++++++++++++++++++
 3 files changed, 154 insertions(+)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 0e433f072163..6414588a3c7b 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -717,6 +717,14 @@ static const uuid_t gen_media_event_uuid =
 	UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
 		  0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6);
 
+/*
+ * DRAM Event Record
+ * CXL v3.0 section 8.2.9.2.1.2; Table 8-44
+ */
+static const uuid_t dram_event_uuid =
+	UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
+		  0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24);
+
 static void cxl_trace_event_record(const char *dev_name,
 				   enum cxl_event_log_type type,
 				   struct cxl_get_event_payload *payload)
@@ -731,6 +739,14 @@ static void cxl_trace_event_record(const char *dev_name,
 		return;
 	}
 
+	if (uuid_equal(id, &dram_event_uuid)) {
+		struct cxl_evt_dram_rec *rec =
+				(struct cxl_evt_dram_rec *)&payload->record;
+
+		trace_cxl_dram_event(dev_name, type, rec);
+		return;
+	}
+
 	/* For unknown record types print just the header */
 	trace_cxl_event(dev_name, type, &payload->record);
 }
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 33669459ae4b..50536c0a7850 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -421,6 +421,30 @@ struct cxl_evt_gen_media {
 	u8 component_id[CXL_EVT_GEN_MED_COMP_ID_SIZE];
 } __packed;
 
+/*
+ * DRAM Event Record - DER
+ * CXL v3.0 section 8.2.9.2.1.2; Table 8-44
+ */
+#define CXL_EVT_DER_NIBBLE_MASK_SIZE		3
+#define CXL_EVT_DER_ROW_SIZE			3
+#define CXL_EVT_DER_CORRECTION_MASK_SIZE	0x20
+struct cxl_evt_dram_rec {
+	struct cxl_event_record_hdr hdr;
+	__le64 phys_addr;
+	u8 descriptor;
+	u8 type;
+	u8 transaction_type;
+	__le16 validity_flags;
+	u8 channel;
+	u8 rank;
+	u8 nibble_mask[CXL_EVT_DER_NIBBLE_MASK_SIZE];
+	u8 bank_group;
+	u8 bank;
+	u8 row[CXL_EVT_DER_ROW_SIZE];
+	__le16 column;
+	u8 correction_mask[CXL_EVT_DER_CORRECTION_MASK_SIZE];
+} __packed;
+
 struct cxl_mbox_get_partition_info {
 	__le64 active_volatile_cap;
 	__le64 active_persistent_cap;
diff --git a/include/trace/events/cxl-events.h b/include/trace/events/cxl-events.h
index b51c51fd4e62..db9b34ddd240 100644
--- a/include/trace/events/cxl-events.h
+++ b/include/trace/events/cxl-events.h
@@ -244,6 +244,120 @@ TRACE_EVENT(cxl_gen_media_event,
 		)
 );
 
+/*
+ * DRAM Event Record - DER
+ *
+ * CXL v2.0 section 8.2.9.1.1.2; Table 155
+ */
+/*
+ * DRAM Event Record defines many fields the same as the General Media Event
+ * Record.  Reuse those definitions as appropriate.
+ */
+#define CXL_DER_VALID_CHANNEL				BIT(0)
+#define CXL_DER_VALID_RANK				BIT(1)
+#define CXL_DER_VALID_NIBBLE				BIT(2)
+#define CXL_DER_VALID_BANK_GROUP			BIT(3)
+#define CXL_DER_VALID_BANK				BIT(4)
+#define CXL_DER_VALID_ROW				BIT(5)
+#define CXL_DER_VALID_COLUMN				BIT(6)
+#define CXL_DER_VALID_CORRECTION_MASK			BIT(7)
+#define show_dram_valid_flags(flags)	__print_flags(flags, "|",			   \
+	{ CXL_DER_VALID_CHANNEL,			"CHANNEL"		}, \
+	{ CXL_DER_VALID_RANK,				"RANK"			}, \
+	{ CXL_DER_VALID_NIBBLE,				"NIBBLE"		}, \
+	{ CXL_DER_VALID_BANK_GROUP,			"BANK GROUP"		}, \
+	{ CXL_DER_VALID_BANK,				"BANK"			}, \
+	{ CXL_DER_VALID_ROW,				"ROW"			}, \
+	{ CXL_DER_VALID_COLUMN,				"COLUMN"		}, \
+	{ CXL_DER_VALID_CORRECTION_MASK,		"CORRECTION MASK"	}  \
+)
+
+TRACE_EVENT(cxl_dram_event,
+
+	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
+		 struct cxl_evt_dram_rec *rec),
+
+	TP_ARGS(dev_name, log, rec),
+
+	TP_STRUCT__entry(
+		/* Common */
+		__string(dev_name, dev_name)
+		__field(int, log)
+		__array(u8, id, UUID_SIZE)
+		__field(u32, flags)
+		__field(u16, handle)
+		__field(u16, related_handle)
+		__field(u64, timestamp)
+
+		/* DRAM */
+		__field(u64, phys_addr)
+		__field(u8, descriptor)
+		__field(u8, type)
+		__field(u8, transaction_type)
+		__field(u8, channel)
+		__field(u16, validity_flags)
+		__field(u16, column)	/* Out of order to pack trace record */
+		__field(u32, nibble_mask)
+		__field(u32, row)
+		__array(u8, cor_mask, CXL_EVT_DER_CORRECTION_MASK_SIZE)
+		__field(u8, rank)	/* Out of order to pack trace record */
+		__field(u8, bank_group)	/* Out of order to pack trace record */
+		__field(u8, bank)	/* Out of order to pack trace record */
+	),
+
+	TP_fast_assign(
+		/* Common */
+		__assign_str(dev_name, dev_name);
+		memcpy(__entry->id, &rec->hdr.id, UUID_SIZE);
+		__entry->log = log;
+		__entry->flags = le32_to_cpu(rec->hdr.flags_length) >> 8;
+		__entry->handle = le16_to_cpu(rec->hdr.handle);
+		__entry->related_handle = le16_to_cpu(rec->hdr.related_handle);
+		__entry->timestamp = le64_to_cpu(rec->hdr.timestamp);
+
+		/* DRAM */
+		__entry->phys_addr = le64_to_cpu(rec->phys_addr);
+		__entry->descriptor = rec->descriptor;
+		__entry->type = rec->type;
+		__entry->transaction_type = rec->transaction_type;
+		__entry->validity_flags = le16_to_cpu(rec->validity_flags);
+		__entry->channel = rec->channel;
+		__entry->rank = rec->rank;
+		__entry->nibble_mask = rec->nibble_mask[0] << 24 |
+				       rec->nibble_mask[1] << 16 |
+				       rec->nibble_mask[2] << 8; /* 3 byte LE ? */
+		__entry->nibble_mask = le32_to_cpu(__entry->nibble_mask);
+		__entry->bank_group = rec->bank_group;
+		__entry->bank = rec->bank;
+		__entry->row = rec->row[0] << 24 |
+			       rec->row[1] << 16 |
+			       rec->row[2] << 8; /* 3 byte LE ? */
+		__entry->row = le32_to_cpu(__entry->row);
+		__entry->column = le16_to_cpu(rec->column);
+		memcpy(__entry->cor_mask, &rec->correction_mask,
+			CXL_EVT_DER_CORRECTION_MASK_SIZE);
+	),
+
+	TP_printk("%s: %s time=%llu id=%pUl handle=%x related_handle=%x hdr_flags='%s': " \
+		  "phys_addr=%llx volatile=%s desc='%s' type='%s' trans_type='%s' channel=%u " \
+		  "rank=%u nibble_mask=%x bank_group=%u bank=%u row=%u column=%u " \
+		  "cor_mask=%s valid_flags='%s'",
+		__get_str(dev_name), show_log_type(__entry->log),
+		__entry->timestamp, __entry->id, __entry->handle,
+		__entry->related_handle, show_hdr_flags(__entry->flags),
+		__entry->phys_addr & ~CXL_GMER_PHYS_ADDR_MASK,
+		(__entry->phys_addr & CXL_GMER_PHYS_ADDR_VOLATILE) ? "TRUE" : "FALSE",
+		show_event_desc_flags(__entry->descriptor),
+		show_mem_event_type(__entry->type),
+		show_trans_type(__entry->transaction_type),
+		__entry->channel, __entry->rank, __entry->nibble_mask,
+		__entry->bank_group, __entry->bank,
+		__entry->row, __entry->column,
+		__print_hex(__entry->cor_mask, CXL_EVT_DER_CORRECTION_MASK_SIZE),
+		show_dram_valid_flags(__entry->validity_flags)
+		)
+);
+
 #endif /* _CXL_TRACE_EVENTS_H */
 
 /* This part must be outside protection */
-- 
2.35.3



* [RFC PATCH 6/9] cxl/mem: Trace Memory Module Event Record
  2022-08-13  5:32 [RFC PATCH 0/9] CXL: Read and clear event logs ira.weiny
                   ` (4 preceding siblings ...)
  2022-08-13  5:32 ` [RFC PATCH 5/9] cxl/mem: Trace DRAM " ira.weiny
@ 2022-08-13  5:32 ` ira.weiny
  2022-08-25 10:58   ` Jonathan Cameron
  2022-08-13  5:32 ` [RFC PATCH 7/9] cxl/test: Add generic mock events ira.weiny
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 51+ messages in thread
From: ira.weiny @ 2022-08-13  5:32 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ira Weiny, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Jonathan Cameron, Davidlohr Bueso, linux-kernel,
	linux-cxl

From: Ira Weiny <ira.weiny@intel.com>

CXL v3.0 section 8.2.9.2.1.3 defines the Memory Module Event Record.

Determine if the event read is a memory module record and if so trace the
record.
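
The Additional Status byte of the health info packs several small fields.  A
minimal sketch of the decode, assuming the bit positions used by the
CXL_DHI_AS_*() macros in this patch (bits 1:0 life used, bits 3:2 device
temperature, bit 4 and bit 5 the corrected error counts), could read as
follows.  struct add_status and decode_add_status() are hypothetical names
used only for illustration.

```c
#include <stdint.h>

/* Hypothetical container for the decoded Additional Status fields */
struct add_status {
	unsigned int life_used;		/* bits 1:0 */
	unsigned int dev_temp;		/* bits 3:2 */
	unsigned int cor_vol_err_cnt;	/* bit 4 */
	unsigned int cor_per_err_cnt;	/* bit 5 */
};

/* Split the raw byte into its fields; 0 = normal, 1 = warning, 2 = critical */
static struct add_status decode_add_status(uint8_t as)
{
	struct add_status s = {
		.life_used = as & 0x3,
		.dev_temp = (as & 0xC) >> 2,
		.cor_vol_err_cnt = (as & 0x10) >> 4,
		.cor_per_err_cnt = (as & 0x20) >> 5,
	};

	return s;
}
```

The two single-bit fields can only report normal or warning, which is why
they are printed through the same symbolic table as the 2-bit fields.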

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 drivers/cxl/core/mbox.c           |  16 +++
 drivers/cxl/cxlmem.h              |  25 +++++
 include/trace/events/cxl-events.h | 155 ++++++++++++++++++++++++++++++
 3 files changed, 196 insertions(+)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 6414588a3c7b..99b09bfeaff5 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -725,6 +725,14 @@ static const uuid_t dram_event_uuid =
 	UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
 		  0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24);
 
+/*
+ * Memory Module Event Record
+ * CXL v3.0 section 8.2.9.2.1.3; Table 8-45
+ */
+static const uuid_t mem_mod_event_uuid =
+	UUID_INIT(0xfe927475, 0xdd59, 0x4339,
+		  0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74);
+
 static void cxl_trace_event_record(const char *dev_name,
 				   enum cxl_event_log_type type,
 				   struct cxl_get_event_payload *payload)
@@ -747,6 +755,14 @@ static void cxl_trace_event_record(const char *dev_name,
 		return;
 	}
 
+	if (uuid_equal(id, &mem_mod_event_uuid)) {
+		struct cxl_evt_mem_mod_rec *rec =
+				(struct cxl_evt_mem_mod_rec *)&payload->record;
+
+		trace_cxl_mem_mod_event(dev_name, type, rec);
+		return;
+	}
+
 	/* For unknown record types print just the header */
 	trace_cxl_event(dev_name, type, &payload->record);
 }
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 50536c0a7850..a02a41dfd988 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -445,6 +445,31 @@ struct cxl_evt_dram_rec {
 	u8 correction_mask[CXL_EVT_DER_CORRECTION_MASK_SIZE];
 } __packed;
 
+/*
+ * Get Health Info Record
+ * CXL v3.0 section 8.2.9.8.3.1; Table 8-100
+ */
+struct cxl_get_health_info {
+	u8 health_status;
+	u8 media_status;
+	u8 add_status;
+	u8 life_used;
+	__le16 device_temp;
+	__le32 dirty_shutdown_cnt;
+	__le32 cor_vol_err_cnt;
+	__le32 cor_per_err_cnt;
+} __packed;
+
+/*
+ * Memory Module Event Record
+ * CXL v3.0 section 8.2.9.2.1.3; Table 8-45
+ */
+struct cxl_evt_mem_mod_rec {
+	struct cxl_event_record_hdr hdr;
+	u8 event_type;
+	struct cxl_get_health_info info;
+} __packed;
+
 struct cxl_mbox_get_partition_info {
 	__le64 active_volatile_cap;
 	__le64 active_persistent_cap;
diff --git a/include/trace/events/cxl-events.h b/include/trace/events/cxl-events.h
index db9b34ddd240..dbbe25fee25c 100644
--- a/include/trace/events/cxl-events.h
+++ b/include/trace/events/cxl-events.h
@@ -358,6 +358,161 @@ TRACE_EVENT(cxl_dram_event,
 		)
 );
 
+/*
+ * Memory Module Event Record - MMER
+ *
+ * CXL v2.0 section 8.2.9.1.1.3; Table 156, Table 181
+ *
+ * Device Health Information - DHI; Table 181
+ */
+#define CXL_MMER_HEALTH_STATUS_CHANGE		0x00
+#define CXL_MMER_MEDIA_STATUS_CHANGE		0x01
+#define CXL_MMER_LIFE_USED_CHANGE		0x02
+#define CXL_MMER_TEMP_CHANGE			0x03
+#define CXL_MMER_DATA_PATH_ERROR		0x04
+#define CXL_MMER_LSA_ERROR			0x05
+#define show_dev_evt_type(type)	__print_symbolic(type,			   \
+	{ CXL_MMER_HEALTH_STATUS_CHANGE,	"Health Status Change"	}, \
+	{ CXL_MMER_MEDIA_STATUS_CHANGE,		"Media Status Change"	}, \
+	{ CXL_MMER_LIFE_USED_CHANGE,		"Life Used Change"	}, \
+	{ CXL_MMER_TEMP_CHANGE,			"Temperature Change"	}, \
+	{ CXL_MMER_DATA_PATH_ERROR,		"Data Path Error"	}, \
+	{ CXL_MMER_LSA_ERROR,			"LSA Error"		}  \
+)
+
+#define CXL_DHI_HS_MAINTENANCE_NEEDED				BIT(0)
+#define CXL_DHI_HS_PERFORMANCE_DEGRADED				BIT(1)
+#define CXL_DHI_HS_HW_REPLACEMENT_NEEDED			BIT(2)
+#define show_health_status_flags(flags)	__print_flags(flags, "|",	   \
+	{ CXL_DHI_HS_MAINTENANCE_NEEDED,	"Maintenance Needed"	}, \
+	{ CXL_DHI_HS_PERFORMANCE_DEGRADED,	"Performance Degraded"	}, \
+	{ CXL_DHI_HS_HW_REPLACEMENT_NEEDED,	"Replacement Needed"	}  \
+)
+
+#define CXL_DHI_MS_NORMAL							0x00
+#define CXL_DHI_MS_NOT_READY							0x01
+#define CXL_DHI_MS_WRITE_PERSISTENCY_LOST					0x02
+#define CXL_DHI_MS_ALL_DATA_LOST						0x03
+#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_POWER_LOSS			0x04
+#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_SHUTDOWN			0x05
+#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_IMMINENT				0x06
+#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_POWER_LOSS				0x07
+#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_SHUTDOWN				0x08
+#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_IMMINENT					0x09
+#define show_media_status(ms)	__print_symbolic(ms,			   \
+	{ CXL_DHI_MS_NORMAL,						   \
+		"Normal"						}, \
+	{ CXL_DHI_MS_NOT_READY,						   \
+		"Not Ready"						}, \
+	{ CXL_DHI_MS_WRITE_PERSISTENCY_LOST,				   \
+		"Write Persistency Lost"				}, \
+	{ CXL_DHI_MS_ALL_DATA_LOST,					   \
+		"All Data Lost"						}, \
+	{ CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_POWER_LOSS,		   \
+		"Write Persistency Loss in the Event of Power Loss"	}, \
+	{ CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_SHUTDOWN,		   \
+		"Write Persistency Loss in Event of Shutdown"		}, \
+	{ CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_IMMINENT,			   \
+		"Write Persistency Loss Imminent"			}, \
+	{ CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_POWER_LOSS,		   \
+		"All Data Loss in Event of Power Loss"			}, \
+	{ CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_SHUTDOWN,		   \
+		"All Data Loss in the Event of Shutdown"		}, \
+	{ CXL_DHI_MS_WRITE_ALL_DATA_LOSS_IMMINENT,			   \
+		"All Data Loss Imminent"				}  \
+)
+
+#define CXL_DHI_AS_NORMAL		0x0
+#define CXL_DHI_AS_WARNING		0x1
+#define CXL_DHI_AS_CRITICAL		0x2
+#define show_add_status(as) __print_symbolic(as,	   \
+	{ CXL_DHI_AS_NORMAL,		"Normal"	}, \
+	{ CXL_DHI_AS_WARNING,		"Warning"	}, \
+	{ CXL_DHI_AS_CRITICAL,		"Critical"	}  \
+)
+
+#define CXL_DHI_AS_LIFE_USED(as)			((as) & 0x3)
+#define CXL_DHI_AS_DEV_TEMP(as)				(((as) & 0xC) >> 2)
+#define CXL_DHI_AS_COR_VOL_ERR_CNT(as)			(((as) & 0x10) >> 4)
+#define CXL_DHI_AS_COR_PER_ERR_CNT(as)			(((as) & 0x20) >> 5)
+
+TRACE_EVENT(cxl_mem_mod_event,
+
+	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
+		 struct cxl_evt_mem_mod_rec *rec),
+
+	TP_ARGS(dev_name, log, rec),
+
+	TP_STRUCT__entry(
+		/* Common */
+		__string(dev_name, dev_name)
+		__field(int, log)
+		__array(u8, id, UUID_SIZE)
+		__field(u32, flags)
+		__field(u16, handle)
+		__field(u16, related_handle)
+		__field(u64, timestamp)
+
+		/* Memory Module Event */
+		__field(u8, event_type)
+
+		/* Device Health Info */
+		__field(u8, health_status)
+		__field(u8, media_status)
+		__field(u8, life_used)
+		__field(u32, dirty_shutdown_cnt)
+		__field(u32, cor_vol_err_cnt)
+		__field(u32, cor_per_err_cnt)
+		__field(s16, device_temp)
+		__field(u8, add_status)
+	),
+
+	TP_fast_assign(
+		/* Common */
+		__assign_str(dev_name, dev_name);
+		memcpy(__entry->id, &rec->hdr.id, UUID_SIZE);
+		__entry->log = log;
+		__entry->flags = le32_to_cpu(rec->hdr.flags_length) >> 8;
+		__entry->handle = le16_to_cpu(rec->hdr.handle);
+		__entry->related_handle = le16_to_cpu(rec->hdr.related_handle);
+		__entry->timestamp = le64_to_cpu(rec->hdr.timestamp);
+
+		/* Memory Module Event */
+		__entry->event_type = rec->event_type;
+
+		/* Device Health Info */
+		__entry->health_status = rec->info.health_status;
+		__entry->media_status = rec->info.media_status;
+		__entry->life_used = rec->info.life_used;
+		__entry->dirty_shutdown_cnt = le32_to_cpu(rec->info.dirty_shutdown_cnt);
+		__entry->cor_vol_err_cnt = le32_to_cpu(rec->info.cor_vol_err_cnt);
+		__entry->cor_per_err_cnt = le32_to_cpu(rec->info.cor_per_err_cnt);
+		__entry->device_temp = le16_to_cpu(rec->info.device_temp);
+		__entry->add_status = rec->info.add_status;
+	),
+
+	TP_printk("%s: %s time=%llu id=%pUl handle=%x related_handle=%x hdr_flags='%s': " \
+		  "evt_type='%s' health_status='%s' media_status='%s' as_life_used=%s " \
+		  "as_dev_temp=%s as_cor_vol_err_cnt=%s as_cor_per_err_cnt=%s " \
+		  "life_used=%u dev_temp=%d dirty_shutdown_cnt=%u cor_vol_err_cnt=%u " \
+		  "cor_per_err_cnt=%u",
+		__get_str(dev_name), show_log_type(__entry->log),
+		__entry->timestamp, __entry->id, __entry->handle,
+		__entry->related_handle, show_hdr_flags(__entry->flags),
+
+		show_dev_evt_type(__entry->event_type),
+		show_health_status_flags(__entry->health_status),
+		show_media_status(__entry->media_status),
+		show_add_status(CXL_DHI_AS_LIFE_USED(__entry->add_status)),
+		show_add_status(CXL_DHI_AS_DEV_TEMP(__entry->add_status)),
+		show_add_status(CXL_DHI_AS_COR_VOL_ERR_CNT(__entry->add_status)),
+		show_add_status(CXL_DHI_AS_COR_PER_ERR_CNT(__entry->add_status)),
+		__entry->life_used, __entry->device_temp,
+		__entry->dirty_shutdown_cnt, __entry->cor_vol_err_cnt,
+		__entry->cor_per_err_cnt)
+);
+
+
 #endif /* _CXL_TRACE_EVENTS_H */
 
 /* This part must be outside protection */
-- 
2.35.3



* [RFC PATCH 7/9] cxl/test: Add generic mock events
  2022-08-13  5:32 [RFC PATCH 0/9] CXL: Read and clear event logs ira.weiny
                   ` (5 preceding siblings ...)
  2022-08-13  5:32 ` [RFC PATCH 6/9] cxl/mem: Trace Memory Module " ira.weiny
@ 2022-08-13  5:32 ` ira.weiny
  2022-08-25 11:31   ` Jonathan Cameron
  2022-08-13  5:32 ` [RFC PATCH 8/9] cxl/test: Add specific events ira.weiny
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 51+ messages in thread
From: ira.weiny @ 2022-08-13  5:32 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ira Weiny, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Jonathan Cameron, Davidlohr Bueso, linux-kernel,
	linux-cxl

From: Ira Weiny <ira.weiny@intel.com>

Facilitate testing basic Get/Clear Event functionality by creating
multiple logs and generic events with made-up UUIDs.

Data is completely made up with data patterns which should be easy to
spot in trace output.

Test traces are easy to obtain with a small script such as this:

	#!/bin/bash -x

	devices=`find /sys/devices/platform -name 'cxl_mem*'`

	# Generate fake events if reset is passed in
	if [ "$1" == "reset" ]; then
	        for device in $devices; do
	                echo 1 > $device/mem*/event_reset
	        done
	fi

	# Turn on tracing
	echo "" > /sys/kernel/tracing/trace
	echo 1 > /sys/kernel/tracing/events/cxl_events/enable
	echo 1 > /sys/kernel/tracing/tracing_on

	# Generate fake interrupt
	for device in $devices; do
	        echo 1 > $device/mem*/event_trigger
	        # just trigger 1
	        break;
	done

	# Turn off tracing and report events
	echo 0 > /sys/kernel/tracing/tracing_on
	cat /sys/kernel/tracing/trace
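
The event_reset and event_trigger files used by the script drive a simple
cursor over each mock log.  The following is a minimal user-space sketch of
that model, not the test code itself: struct mock_log and the log_*() helpers
are hypothetical stand-ins for the cur_event/nr_events bookkeeping in this
patch, and the empty-log check in log_clear() is an extra guard added for the
sketch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the cursor fields of struct mock_event_log */
struct mock_log {
	int cur;	/* index (and handle) of the next unread record */
	int nr;		/* total records stored in the log */
};

static bool log_is_empty(const struct mock_log *l)
{
	return l->cur == l->nr;
}

/* "More records" is reported while more than one record remains */
static bool log_has_more(const struct mock_log *l)
{
	return l->nr - l->cur > 1;
}

/* Clearing must name the record at the cursor; anything else is rejected */
static int log_clear(struct mock_log *l, int handle)
{
	if (log_is_empty(l) || handle != l->cur)
		return -1;	/* the empty check is an extra guard in this sketch */
	l->cur++;
	return 0;
}

/* What writing to event_reset amounts to: rewind so the records replay */
static void log_reset(struct mock_log *l)
{
	l->cur = 0;
}
```

Because a reset only rewinds the cursor, the same records can be replayed
on every run, which keeps the trace output reproducible.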

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 tools/testing/cxl/test/mem.c | 291 +++++++++++++++++++++++++++++++++++
 1 file changed, 291 insertions(+)

diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index e2f5445d24ff..87196d62acf5 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -9,6 +9,8 @@
 #include <linux/bits.h>
 #include <cxlmem.h>
 
+#include <trace/events/cxl-events.h>
+
 #define LSA_SIZE SZ_128K
 #define DEV_SIZE SZ_2G
 #define EFFECT(x) (1U << x)
@@ -137,6 +139,287 @@ static int mock_partition_info(struct cxl_dev_state *cxlds,
 	return 0;
 }
 
+/*
+ * Mock Events
+ */
+struct mock_event_log {
+	int cur_event;
+	int nr_events;
+	struct xarray events;
+};
+
+struct mock_event_store {
+	struct cxl_dev_state *cxlds;
+	struct mock_event_log *mock_logs[CXL_EVENT_TYPE_MAX];
+};
+
+DEFINE_XARRAY(mock_cxlds_event_store);
+
+void delete_event_store(void *ds)
+{
+	xa_store(&mock_cxlds_event_store, (unsigned long)ds, NULL, GFP_KERNEL);
+}
+
+void store_event_store(struct mock_event_store *es)
+{
+	struct cxl_dev_state *cxlds = es->cxlds;
+
+	if (xa_insert(&mock_cxlds_event_store, (unsigned long)cxlds, es,
+		      GFP_KERNEL)) {
+		dev_err(cxlds->dev, "Event store not available for %s\n",
+			dev_name(cxlds->dev));
+		return;
+	}
+
+	devm_add_action_or_reset(cxlds->dev, delete_event_store, cxlds);
+}
+
+struct mock_event_log *find_event_log(struct cxl_dev_state *cxlds, int log_type)
+{
+	struct mock_event_store *es = xa_load(&mock_cxlds_event_store,
+					      (unsigned long)cxlds);
+
+	if (!es || log_type >= CXL_EVENT_TYPE_MAX)
+		return NULL;
+	return es->mock_logs[log_type];
+}
+
+struct cxl_event_record_raw *get_cur_event(struct mock_event_log *log)
+{
+	return xa_load(&log->events, log->cur_event);
+}
+
+__le16 get_cur_event_handle(struct mock_event_log *log)
+{
+	return cpu_to_le16(log->cur_event);
+}
+
+static bool log_empty(struct mock_event_log *log)
+{
+	return log->cur_event == log->nr_events;
+}
+
+static int log_rec_left(struct mock_event_log *log)
+{
+	return log->nr_events - log->cur_event;
+}
+
+static void xa_events_destroy(void *l)
+{
+	struct mock_event_log *log = l;
+
+	xa_destroy(&log->events);
+}
+
+static void event_store_add_event(struct mock_event_store *es,
+				  enum cxl_event_log_type log_type,
+				  struct cxl_event_record_raw *event)
+{
+	struct mock_event_log *log;
+	struct device *dev = es->cxlds->dev;
+	int rc;
+
+	if (log_type >= CXL_EVENT_TYPE_MAX)
+		return;
+
+	log = es->mock_logs[log_type];
+	if (!log) {
+		log = devm_kzalloc(dev, sizeof(*log), GFP_KERNEL);
+		if (!log) {
+			dev_err(dev, "Failed to create %s log\n",
+				cxl_event_log_type_str(log_type));
+			return;
+		}
+		xa_init(&log->events);
+		devm_add_action(dev, xa_events_destroy, log);
+		es->mock_logs[log_type] = log;
+	}
+
+	rc = xa_insert(&log->events, log->nr_events, event, GFP_KERNEL);
+	if (rc) {
+		dev_err(dev, "Failed to store event %s log\n",
+			cxl_event_log_type_str(log_type));
+		return;
+	}
+	log->nr_events++;
+}
+
+/*
+ * Get Event and Clear Event handle only 1 record at a time, as this is what
+ * is currently implemented in the main code.
+ */
+static int mock_get_event(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
+{
+	struct cxl_get_event_payload *pl;
+	struct mock_event_log *log;
+	u8 log_type;
+
+	/* Valid request? */
+	if (cmd->size_in != 1)
+		return -EINVAL;
+
+	log_type = *((u8 *)cmd->payload_in);
+	if (log_type >= CXL_EVENT_TYPE_MAX)
+		return -EINVAL;
+
+	log = find_event_log(cxlds, log_type);
+	if (!log || log_empty(log))
+		goto no_data;
+
+	/* Don't handle more than 1 record at a time */
+	if (cmd->size_out < sizeof(*pl))
+		return -EINVAL;
+
+	pl = cmd->payload_out;
+	memset(pl, 0, sizeof(*pl));
+
+	pl->record_count = cpu_to_le16(1);
+
+	if (log_rec_left(log) > 1)
+		pl->flags |= CXL_GET_EVENT_FLAG_MORE_RECORDS;
+
+	memcpy(&pl->record, get_cur_event(log), sizeof(pl->record));
+	pl->record.hdr.handle = get_cur_event_handle(log);
+	return 0;
+
+no_data:
+	/* Room for header? */
+	if (cmd->size_out < (sizeof(*pl) - sizeof(pl->record)))
+		return -EINVAL;
+
+	memset(cmd->payload_out, 0, cmd->size_out);
+	return 0;
+}
+
+/*
+ * Get Event and Clear Event handle only 1 record at a time, as this is what
+ * is currently implemented in the main code.
+ */
+static int mock_clear_event(struct cxl_dev_state *cxlds,
+			    struct cxl_mbox_cmd *cmd)
+{
+	struct cxl_mbox_clear_event_payload *pl = cmd->payload_in;
+	struct mock_event_log *log;
+	u8 log_type = pl->event_log;
+
+	/* Don't handle more than 1 record at a time */
+	if (pl->nr_recs != 1)
+		return -EINVAL;
+
+	if (log_type >= CXL_EVENT_TYPE_MAX)
+		return -EINVAL;
+
+	log = find_event_log(cxlds, log_type);
+	if (!log)
+		return 0; /* No mock data in this log */
+
+	/*
+	 * The current code clears events as they are read.
+	 * Test that behavior; no clearing from the middle of the log.
+	 */
+	if (log->cur_event != le16_to_cpu(pl->handle)) {
+		dev_err(cxlds->dev, "Clearing events out of order\n");
+		return -EINVAL;
+	}
+
+	log->cur_event++;
+	return 0;
+}
+
+static ssize_t event_reset_store(struct device *dev,
+				 struct device_attribute *attr,
+				 const char *buf, size_t count)
+{
+	struct cxl_memdev *cxlmd = container_of(dev, struct cxl_memdev, dev);
+	int i;
+
+	for (i = CXL_EVENT_TYPE_INFO; i < CXL_EVENT_TYPE_MAX; i++) {
+		struct mock_event_log *log;
+
+		log = find_event_log(cxlmd->cxlds, i);
+		if (log)
+			log->cur_event = 0;
+	}
+
+	return count;
+}
+static DEVICE_ATTR_WO(event_reset);
+
+static ssize_t event_trigger_store(struct device *dev,
+				   struct device_attribute *attr,
+				   const char *buf, size_t count)
+{
+	struct cxl_memdev *cxlmd = container_of(dev, struct cxl_memdev, dev);
+
+	cxl_mem_get_event_records(cxlmd->cxlds);
+
+	return count;
+}
+static DEVICE_ATTR_WO(event_trigger);
+
+static struct attribute *cxl_mock_event_attrs[] = {
+	&dev_attr_event_reset.attr,
+	&dev_attr_event_trigger.attr,
+	NULL
+};
+ATTRIBUTE_GROUPS(cxl_mock_event);
+
+void remove_mock_event_groups(void *dev)
+{
+	device_remove_groups(dev, cxl_mock_event_groups);
+}
+
+struct cxl_event_record_raw maint_needed = {
+	.hdr = {
+		.id = UUID_INIT(0xDEADBEEF, 0xCAFE, 0xBABE, 0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
+		.flags_length = cpu_to_le32((CXL_EVENT_RECORD_FLAG_MAINT_NEEDED << 8) |
+					      sizeof(struct cxl_event_record_raw)),
+		/* .handle = Set dynamically */
+		.related_handle = cpu_to_le16(0xa5b6),
+	},
+	.data = { 0xDE, 0xAD, 0xBE, 0xEF },
+};
+
+struct cxl_event_record_raw hardware_replace = {
+	.hdr = {
+		.id = UUID_INIT(0xBABECAFE, 0xBEEF, 0xDEAD, 0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
+		.flags_length = cpu_to_le32((CXL_EVENT_RECORD_FLAG_HW_REPLACE << 8) |
+					     sizeof(struct cxl_event_record_raw)),
+		/* .handle = Set dynamically */
+		.related_handle = cpu_to_le16(0xb6a5),
+	},
+	.data = { 0xDE, 0xAD, 0xBE, 0xEF },
+};
+
+static void devm_cxl_mock_event_logs(struct cxl_memdev *cxlmd)
+{
+	struct device *dev = &cxlmd->dev;
+	struct mock_event_store *es;
+
+	/*
+	 * The memory device gets the sysfs attributes such that the cxlmd
+	 * pointer can be used to get to a cxlds pointer.
+	 */
+	if (device_add_groups(dev, cxl_mock_event_groups))
+		return;
+	if (devm_add_action_or_reset(dev, remove_mock_event_groups, dev))
+		return;
+
+	/*
+	 * All the mock event data hangs off the device itself.
+	 */
+	es = devm_kzalloc(cxlmd->cxlds->dev, sizeof(*es), GFP_KERNEL);
+	if (!es)
+		return;
+	es->cxlds = cxlmd->cxlds;
+
+	event_store_add_event(es, CXL_EVENT_TYPE_INFO, &maint_needed);
+
+	event_store_add_event(es, CXL_EVENT_TYPE_FATAL, &hardware_replace);
+
+	store_event_store(es);
+}
+
 static int mock_get_lsa(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
 {
 	struct cxl_mbox_get_lsa *get_lsa = cmd->payload_in;
@@ -224,6 +507,12 @@ static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *
 	case CXL_MBOX_OP_GET_PARTITION_INFO:
 		rc = mock_partition_info(cxlds, cmd);
 		break;
+	case CXL_MBOX_OP_GET_EVENT_RECORD:
+		rc = mock_get_event(cxlds, cmd);
+		break;
+	case CXL_MBOX_OP_CLEAR_EVENT_RECORD:
+		rc = mock_clear_event(cxlds, cmd);
+		break;
 	case CXL_MBOX_OP_SET_LSA:
 		rc = mock_set_lsa(cxlds, cmd);
 		break;
@@ -285,6 +574,8 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
 	if (IS_ERR(cxlmd))
 		return PTR_ERR(cxlmd);
 
+	devm_cxl_mock_event_logs(cxlmd);
+
 	cxl_mem_get_event_records(cxlds);
 
 	if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
-- 
2.35.3



* [RFC PATCH 8/9] cxl/test: Add specific events
  2022-08-13  5:32 [RFC PATCH 0/9] CXL: Read and clear event logs ira.weiny
                   ` (6 preceding siblings ...)
  2022-08-13  5:32 ` [RFC PATCH 7/9] cxl/test: Add generic mock events ira.weiny
@ 2022-08-13  5:32 ` ira.weiny
  2022-08-25 11:37   ` Jonathan Cameron
  2022-08-13  5:32 ` [RFC PATCH 9/9] cxl/test: Simulate event log overflow ira.weiny
  2022-08-22 16:18 ` [RFC PATCH 0/9] CXL: Read and clear event logs Davidlohr Bueso
  9 siblings, 1 reply; 51+ messages in thread
From: ira.weiny @ 2022-08-13  5:32 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ira Weiny, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Jonathan Cameron, Davidlohr Bueso, linux-kernel,
	linux-cxl

From: Ira Weiny <ira.weiny@intel.com>

Each type of event has different trace point outputs.

Add mock General Media Event, DRAM event, and Memory Module Event
records to the mock list of events returned.
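
Each mock record sets hdr.flags_length by placing the record length in the
low byte and the header flags in the upper three bytes.  A small sketch of
that packing, with the little-endian conversion left out and all values in
host order, might look like this.  pack_flags_length(), hdr_flags() and
hdr_length() are hypothetical helper names, not functions from the series.

```c
#include <stdint.h>

/* Combine 24 bits of flags with an 8-bit record length (host order) */
static inline uint32_t pack_flags_length(uint32_t flags, uint8_t length)
{
	return (flags << 8) | length;
}

/* Recover the flags: everything above the length byte */
static inline uint32_t hdr_flags(uint32_t flags_length)
{
	return flags_length >> 8;
}

/* Recover the record length from the low byte */
static inline uint8_t hdr_length(uint32_t flags_length)
{
	return flags_length & 0xff;
}
```

The right shift by 8 matches how the trace events in this series extract the
flags from the header.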

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 tools/testing/cxl/test/mem.c | 70 ++++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)

diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 87196d62acf5..c5d7857ae2e5 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -391,6 +391,70 @@ struct cxl_event_record_raw hardware_replace = {
 	.data = { 0xDE, 0xAD, 0xBE, 0xEF },
 };
 
+struct cxl_evt_gen_media gen_media = {
+	.hdr = {
+		.id = UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
+				0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6),
+		.flags_length = cpu_to_le32((CXL_EVENT_RECORD_FLAG_PERMANENT << 8) |
+					     sizeof(struct cxl_evt_gen_media)),
+		/* .handle = Set dynamically */
+		.related_handle = cpu_to_le16(0),
+	},
+	.phys_addr = cpu_to_le64(0x2000),
+	.descriptor = CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT,
+	.type = CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,
+	.transaction_type = CXL_GMER_TRANS_HOST_WRITE,
+	.validity_flags = cpu_to_le16(CXL_GMER_VALID_CHANNEL |
+				      CXL_GMER_VALID_RANK),
+	.channel = 1,
+	.rank = 30
+};
+
+struct cxl_evt_dram_rec dram_rec = {
+	.hdr = {
+		.id = UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
+				0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24),
+		.flags_length = cpu_to_le32((CXL_EVENT_RECORD_FLAG_PERF_DEGRADED << 8) |
+					     sizeof(struct cxl_evt_dram_rec)),
+		/* .handle = Set dynamically */
+		.related_handle = cpu_to_le16(0),
+	},
+	.phys_addr = cpu_to_le64(0x8000),
+	.descriptor = CXL_GMER_EVT_DESC_THRESHOLD_EVENT,
+	.type = CXL_GMER_MEM_EVT_TYPE_INV_ADDR,
+	.transaction_type = CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB,
+	.validity_flags = cpu_to_le16(CXL_DER_VALID_CHANNEL |
+				      CXL_DER_VALID_BANK_GROUP |
+				      CXL_DER_VALID_BANK |
+				      CXL_DER_VALID_COLUMN),
+	.channel = 1,
+	.bank_group = 5,
+	.bank = 2,
+	.column = cpu_to_le16(1024)
+};
+
+struct cxl_evt_mem_mod_rec mem_mod_rec = {
+	.hdr = {
+		.id = UUID_INIT(0xfe927475, 0xdd59, 0x4339,
+				0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74),
+		.flags_length = cpu_to_le32(sizeof(struct cxl_evt_mem_mod_rec)),
+		/* .handle = Set dynamically */
+		.related_handle = cpu_to_le16(0),
+	},
+	.event_type = CXL_MMER_TEMP_CHANGE,
+	.info = {
+		.health_status = CXL_DHI_HS_PERFORMANCE_DEGRADED,
+		.media_status = CXL_DHI_MS_ALL_DATA_LOST,
+		.add_status = (CXL_DHI_AS_CRITICAL << 2) |
+			      (CXL_DHI_AS_WARNING << 4) |
+			      (CXL_DHI_AS_WARNING << 5),
+		.device_temp = cpu_to_le16(1000),
+		.dirty_shutdown_cnt = cpu_to_le32(30000),
+		.cor_vol_err_cnt = cpu_to_le32(30100),
+		.cor_per_err_cnt = cpu_to_le32(40100),
+	}
+};
+
 static void devm_cxl_mock_event_logs(struct cxl_memdev *cxlmd)
 {
 	struct device *dev = &cxlmd->dev;
@@ -414,8 +478,14 @@ static void devm_cxl_mock_event_logs(struct cxl_memdev *cxlmd)
 	es->cxlds = cxlmd->cxlds;
 
 	event_store_add_event(es, CXL_EVENT_TYPE_INFO, &maint_needed);
+	event_store_add_event(es, CXL_EVENT_TYPE_INFO,
+			      (struct cxl_event_record_raw *)&gen_media);
+	event_store_add_event(es, CXL_EVENT_TYPE_INFO,
+			      (struct cxl_event_record_raw *)&mem_mod_rec);
 
 	event_store_add_event(es, CXL_EVENT_TYPE_FATAL, &hardware_replace);
+	event_store_add_event(es, CXL_EVENT_TYPE_FATAL,
+			      (struct cxl_event_record_raw *)&dram_rec);
 
 	store_event_store(es);
 }
-- 
2.35.3



* [RFC PATCH 9/9] cxl/test: Simulate event log overflow
  2022-08-13  5:32 [RFC PATCH 0/9] CXL: Read and clear event logs ira.weiny
                   ` (7 preceding siblings ...)
  2022-08-13  5:32 ` [RFC PATCH 8/9] cxl/test: Add specific events ira.weiny
@ 2022-08-13  5:32 ` ira.weiny
  2022-08-16 16:44   ` Steven Rostedt
  2022-08-22 16:18 ` [RFC PATCH 0/9] CXL: Read and clear event logs Davidlohr Bueso
  9 siblings, 1 reply; 51+ messages in thread
From: ira.weiny @ 2022-08-13  5:32 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ira Weiny, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Jonathan Cameron, Davidlohr Bueso, linux-kernel,
	linux-cxl

From: Ira Weiny <ira.weiny@intel.com>

Log overflow is marked by a separate trace message.

Simulate a log with lots of messages and flag overflow until it is
drained a bit.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 tools/testing/cxl/test/mem.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index c5d7857ae2e5..87e6b10896c9 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -244,6 +244,15 @@ static void event_store_add_event(struct mock_event_store *es,
 	log->nr_events++;
 }
 
+static u16 log_overflow(struct mock_event_log *log)
+{
+	int cnt = log_rec_left(log) - 5;
+
+	if (cnt < 0)
+		return 0;
+	return cnt;
+}
+
 /*
  * Get and clear event only handle 1 record at a time as this is what is
  * currently implemented in the main code.
@@ -253,6 +262,7 @@ static int mock_get_event(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
 	struct cxl_get_event_payload *pl;
 	struct mock_event_log *log;
 	u8 log_type;
+	u16 nr_overflow;
 
 	/* Valid request? */
 	if (cmd->size_in != 1)
@@ -278,6 +288,20 @@ static int mock_get_event(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
 	if (log_rec_left(log) > 1)
 		pl->flags |= CXL_GET_EVENT_FLAG_MORE_RECORDS;
 
+	nr_overflow = log_overflow(log);
+	if (nr_overflow) {
+		u64 ns;
+
+		pl->flags |= CXL_GET_EVENT_FLAG_OVERFLOW;
+		pl->overflow_err_count = cpu_to_le16(nr_overflow);
+		ns = ktime_get_real_ns();
+		ns -= 5000000000; /* 5s ago */
+		pl->first_overflow_timestamp = cpu_to_le64(ns);
+		ns = ktime_get_real_ns();
+		ns -= 1000000000; /* 1s ago */
+		pl->last_overflow_timestamp = cpu_to_le64(ns);
+	}
+
 	memcpy(&pl->record, get_cur_event(log), sizeof(pl->record));
 	pl->record.hdr.handle = get_cur_event_handle(log);
 	return 0;
@@ -483,6 +507,18 @@ static void devm_cxl_mock_event_logs(struct cxl_memdev *cxlmd)
 	event_store_add_event(es, CXL_EVENT_TYPE_INFO,
 			      (struct cxl_event_record_raw *)&mem_mod_rec);
 
+	event_store_add_event(es, CXL_EVENT_TYPE_FAIL, &maint_needed);
+	event_store_add_event(es, CXL_EVENT_TYPE_FAIL, &hardware_replace);
+	event_store_add_event(es, CXL_EVENT_TYPE_FAIL,
+			      (struct cxl_event_record_raw *)&dram_rec);
+	event_store_add_event(es, CXL_EVENT_TYPE_FAIL,
+			      (struct cxl_event_record_raw *)&gen_media);
+	event_store_add_event(es, CXL_EVENT_TYPE_FAIL,
+			      (struct cxl_event_record_raw *)&mem_mod_rec);
+	event_store_add_event(es, CXL_EVENT_TYPE_FAIL, &hardware_replace);
+	event_store_add_event(es, CXL_EVENT_TYPE_FAIL,
+			      (struct cxl_event_record_raw *)&dram_rec);
+
 	event_store_add_event(es, CXL_EVENT_TYPE_FATAL, &hardware_replace);
 	event_store_add_event(es, CXL_EVENT_TYPE_FATAL,
 			      (struct cxl_event_record_raw *)&dram_rec);
-- 
2.35.3



* Re: [RFC PATCH 1/9] cxl/mem: Implement Get Event Records command
  2022-08-13  5:32 ` [RFC PATCH 1/9] cxl/mem: Implement Get Event Records command ira.weiny
@ 2022-08-16 16:39   ` Steven Rostedt
  2022-08-16 16:41     ` Steven Rostedt
  2022-08-16 23:35     ` Ira Weiny
  2022-08-17 22:54   ` Dave Jiang
  2022-08-24 15:50   ` Jonathan Cameron
  2 siblings, 2 replies; 51+ messages in thread
From: Steven Rostedt @ 2022-08-16 16:39 UTC (permalink / raw)
  To: ira.weiny
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Jonathan Cameron, Davidlohr Bueso, linux-kernel, linux-cxl

On Fri, 12 Aug 2022 22:32:35 -0700
ira.weiny@intel.com wrote:

> From: Ira Weiny <ira.weiny@intel.com>
> 
> Event records are defined for CXL devices.  Each record is reported in
> one event log.  Devices are required to support the storage of at least
> one event record in each event log type.
> 
> Devices track event log overflow by incrementing a counter and tracking
> the time of the first and last overflow event seen.
> 
> Software queries events via the Get Event Record mailbox command; CXL
> v3.0 section 8.2.9.2.2.
> 
> Issue the Get Event Record mailbox command on driver load.  Trace each
> record found, as well as any overflow conditions.  Only 1 event is
> requested for each query.  Optimization of multiple record queries is
> deferred.
> 
> This patch traces a raw event record only and leaves the specific event
> record types to subsequent patches.
> 
> NOTE: checkpatch is not completely happy with the tracing part of this
> patch but AFAICT it is correct.  I'm open to suggestions if I've done
> something wrong.

The include/trace/events/*.h files are all broken according to
checkpatch.pl ;-) Don't worry about the formatting there. I need to update
that script to detect that it's looking at TRACE_EVENT() that has different
rules than normal macros.

> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> ---
>  MAINTAINERS                       |   1 +
>  drivers/cxl/core/mbox.c           |  60 ++++++++++++++
>  drivers/cxl/cxlmem.h              |  66 ++++++++++++++++
>  include/trace/events/cxl-events.h | 127 ++++++++++++++++++++++++++++++
>  include/uapi/linux/cxl_mem.h      |   1 +
>  5 files changed, 255 insertions(+)
>  create mode 100644 include/trace/events/cxl-events.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 54fa6e2059de..1cb9cec31009 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -5014,6 +5014,7 @@ M:	Dan Williams <dan.j.williams@intel.com>
>  L:	linux-cxl@vger.kernel.org
>  S:	Maintained
>  F:	drivers/cxl/
> +F:	include/trace/events/cxl*.h
>  F:	include/uapi/linux/cxl_mem.h
>  
>  CONEXANT ACCESSRUNNER USB DRIVER
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 16176b9278b4..2cceed8608dc 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -7,6 +7,9 @@
>  #include <cxlmem.h>
>  #include <cxl.h>
>  
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/cxl-events.h>
> +
>  #include "core.h"
>  
>  static bool cxl_raw_allow_all;
> @@ -48,6 +51,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
>  	CXL_CMD(RAW, CXL_VARIABLE_PAYLOAD, CXL_VARIABLE_PAYLOAD, 0),
>  #endif
>  	CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
> +	CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
>  	CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
>  	CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
>  	CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
> @@ -704,6 +708,62 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
>  
> +static int cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> +				   enum cxl_event_log_type type)
> +{
> +	struct cxl_get_event_payload payload;
> +
> +	do {
> +		u8 log_type = type;
> +		u16 record_count;
> +		int rc;
> +
> +		rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVENT_RECORD,
> +				       &log_type, sizeof(log_type),
> +				       &payload, sizeof(payload));
> +		if (rc)
> +			return rc;
> +
> +		record_count = le16_to_cpu(payload.record_count);
> +		if (record_count > 0)
> +			trace_cxl_event(dev_name(cxlds->dev), type,
> +					&payload.record);
> +
> +		if (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW)
> +			trace_cxl_event_overflow(dev_name(cxlds->dev), type,
> +						 &payload);

If you want to avoid the compare operations when the tracepoints are not
enabled, you can add:

		if (trace_cxl_event_enabled()) {
			if (record_count > 0)
				trace_cxl_event(dev_name(cxlds->dev), type,
						&payload.record);
		}

		if (trace_cxl_event_overflow_enabled()) {
			if (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW)
				trace_cxl_event_overflow(dev_name(cxlds->dev), type,
							 &payload);
		}

Those "<tracepoint>_enabled()" functions are static branches, which means
that when a tracepoint is not enabled its check is a nop that skips this code,
and when it is enabled the check turns into a jump to the contents of the if
block.


> +
> +	} while (payload.flags & CXL_GET_EVENT_FLAG_MORE_RECORDS);
> +
> +	return 0;
> +}
> +
> +/**
> + * cxl_mem_get_event_records - Get Event Records from the device
> + * @cxlds: The device data for the operation
> + *
> + * Retrieve all event records available on the device and report them as trace
> + * events.
> + *
> + * See CXL v3.0 @8.2.9.2.2 Get Event Records
> + */
> +void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
> +{
> +	struct device *dev = cxlds->dev;
> +	enum cxl_event_log_type log_type;
> +
> +	for (log_type = CXL_EVENT_TYPE_INFO;
> +	     log_type < CXL_EVENT_TYPE_MAX; log_type++) {
> +		int rc;
> +
> +		rc = cxl_mem_get_records_log(cxlds, log_type);
> +		if (rc)
> +			dev_err(dev, "Failed to query %s Event Logs : %d",
> +				cxl_event_log_type_str(log_type), rc);
> +	}
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, CXL);
> +
>  /**
>   * cxl_mem_get_partition_info - Get partition info
>   * @cxlds: The device data for the operation
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 88e3a8e54b6a..f83634f3bc8d 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -4,6 +4,7 @@
>  #define __CXL_MEM_H__
>  #include <uapi/linux/cxl_mem.h>
>  #include <linux/cdev.h>
> +#include <linux/uuid.h>
>  #include "cxl.h"
>  
>  /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
> @@ -253,6 +254,7 @@ struct cxl_dev_state {
>  enum cxl_opcode {
>  	CXL_MBOX_OP_INVALID		= 0x0000,
>  	CXL_MBOX_OP_RAW			= CXL_MBOX_OP_INVALID,
> +	CXL_MBOX_OP_GET_EVENT_RECORD	= 0x0100,
>  	CXL_MBOX_OP_GET_FW_INFO		= 0x0200,
>  	CXL_MBOX_OP_ACTIVATE_FW		= 0x0202,
>  	CXL_MBOX_OP_GET_SUPPORTED_LOGS	= 0x0400,
> @@ -322,6 +324,69 @@ struct cxl_mbox_identify {
>  	u8 qos_telemetry_caps;
>  } __packed;
>  
> +/*
> + * Common Event Record Format
> + * CXL v3.0 section 8.2.9.2.1; Table 8-42
> + */
> +struct cxl_event_record_hdr {
> +	uuid_t id;
> +	__le32 flags_length;
> +	__le16 handle;
> +	__le16 related_handle;
> +	__le64 timestamp;
> +	__le64 reserved1;
> +	__le64 reserved2;
> +} __packed;
> +
> +#define EVENT_RECORD_DATA_LENGTH 0x50
> +struct cxl_event_record_raw {
> +	struct cxl_event_record_hdr hdr;
> +	u8 data[EVENT_RECORD_DATA_LENGTH];
> +} __packed;
> +
> +/*
> + * Get Event Records output payload
> + * CXL v3.0 section 8.2.9.2.2; Table 8-50
> + *
> + * Space given for 1 record
> + */
> +#define CXL_GET_EVENT_FLAG_OVERFLOW		BIT(0)
> +#define CXL_GET_EVENT_FLAG_MORE_RECORDS	BIT(1)
> +struct cxl_get_event_payload {
> +	u8 flags;
> +	u8 reserved1;
> +	__le16 overflow_err_count;
> +	__le64 first_overflow_timestamp;
> +	__le64 last_overflow_timestamp;
> +	__le16 record_count;
> +	u8 reserved2[0xa];
> +	struct cxl_event_record_raw record;
> +} __packed;
> +
> +enum cxl_event_log_type {
> +	CXL_EVENT_TYPE_INFO = 0x00,
> +	CXL_EVENT_TYPE_WARN,
> +	CXL_EVENT_TYPE_FAIL,
> +	CXL_EVENT_TYPE_FATAL,
> +	CXL_EVENT_TYPE_MAX
> +};
> +static inline char *cxl_event_log_type_str(enum cxl_event_log_type type)
> +{
> +	switch (type) {
> +	case CXL_EVENT_TYPE_INFO:
> +		return "Informational";
> +	case CXL_EVENT_TYPE_WARN:
> +		return "Warning";
> +	case CXL_EVENT_TYPE_FAIL:
> +		return "Failure";
> +	case CXL_EVENT_TYPE_FATAL:
> +		return "Fatal";
> +	default:
> +		break;
> +	}
> +	return "<unknown>";
> +}
> +
>  struct cxl_mbox_get_partition_info {
>  	__le64 active_volatile_cap;
>  	__le64 active_persistent_cap;
> @@ -381,6 +446,7 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds);
>  struct cxl_dev_state *cxl_dev_state_create(struct device *dev);
>  void set_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
>  void clear_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
> +void cxl_mem_get_event_records(struct cxl_dev_state *cxlds);
>  #ifdef CONFIG_CXL_SUSPEND
>  void cxl_mem_active_inc(void);
>  void cxl_mem_active_dec(void);
> diff --git a/include/trace/events/cxl-events.h b/include/trace/events/cxl-events.h
> new file mode 100644
> index 000000000000..f4baeae66cf3
> --- /dev/null
> +++ b/include/trace/events/cxl-events.h
> @@ -0,0 +1,127 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM cxl_events
> +
> +#if !defined(_CXL_TRACE_EVENTS_H) ||  defined(TRACE_HEADER_MULTI_READ)
> +#define _CXL_TRACE_EVENTS_H
> +
> +#include <linux/tracepoint.h>
> +
> +#define EVENT_LOGS					\
> +	EM(CXL_EVENT_TYPE_INFO,		"Info")		\
> +	EM(CXL_EVENT_TYPE_WARN,		"Warning")	\
> +	EM(CXL_EVENT_TYPE_FAIL,		"Failure")	\
> +	EM(CXL_EVENT_TYPE_FATAL,	"Fatal")	\
> +	EMe(CXL_EVENT_TYPE_MAX,		"<undefined>")
> +
> +/*
> + * First define the enums in the above macros to be exported to userspace via
> + * TRACE_DEFINE_ENUM().
> + */
> +#undef EM
> +#undef EMe
> +#define EM(a, b)	TRACE_DEFINE_ENUM(a);
> +#define EMe(a, b)	TRACE_DEFINE_ENUM(a);
> +
> +EVENT_LOGS
> +#define show_log_type(type) __print_symbolic(type, EVENT_LOGS)
> +
> +/*
> + * Now redefine the EM and EMe macros to map the enums to the strings that will
> + * be printed in the output
> + */
> +#undef EM
> +#undef EMe
> +#define EM(a, b)        {a, b},
> +#define EMe(a, b)       {a, b}
> +
> +TRACE_EVENT(cxl_event_overflow,
> +
> +	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> +		 struct cxl_get_event_payload *payload),
> +
> +	TP_ARGS(dev_name, log, payload),
> +
> +	TP_STRUCT__entry(
> +		__string(dev_name, dev_name)
> +		__field(int, log)
> +		__field(u16, count)
> +		__field(u64, first)
> +		__field(u64, last)

Because you have a dynamic string, you will save some bytes in the ring
buffer if you have:

		__string(dev_name, dev_name)
		__field(int, log)
		__field(u64, first)
		__field(u64, last)
		__field(u16, count)


> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(dev_name, dev_name);
> +		__entry->log = log;
> +		__entry->count = le16_to_cpu(payload->overflow_err_count);
> +		__entry->first = le64_to_cpu(payload->first_overflow_timestamp);
> +		__entry->last = le64_to_cpu(payload->last_overflow_timestamp);
> +	),
> +
> +	TP_printk("%s: EVENT LOG %s OVERFLOW %u records from %llu to %llu",
> +		__get_str(dev_name), show_log_type(__entry->log),
> +		__entry->count, __entry->first, __entry->last)
> +
> +);
> +
> +/*
> + * Common Event Record Format
> + * CXL v2.0 section 8.2.9.1.1; Table 153
> + */
> +#define CXL_EVENT_RECORD_FLAG_PERMANENT		BIT(2)
> +#define CXL_EVENT_RECORD_FLAG_MAINT_NEEDED	BIT(3)
> +#define CXL_EVENT_RECORD_FLAG_PERF_DEGRADED	BIT(4)
> +#define CXL_EVENT_RECORD_FLAG_HW_REPLACE	BIT(5)
> +#define show_hdr_flags(flags)	__print_flags(flags, " | ",			   \
> +	{ CXL_EVENT_RECORD_FLAG_PERMANENT,	"Permanent Condition"		}, \
> +	{ CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,	"Maintenance Needed"		}, \
> +	{ CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,	"Performance Degraded"		}, \
> +	{ CXL_EVENT_RECORD_FLAG_HW_REPLACE,	"Hardware Replacement Needed"	}  \
> +)
> +
> +TRACE_EVENT(cxl_event,
> +
> +	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> +		 struct cxl_event_record_raw *rec),
> +
> +	TP_ARGS(dev_name, log, rec),
> +
> +	TP_STRUCT__entry(
> +		__string(dev_name, dev_name)
> +		__field(int, log)
> +		__array(u8, id, UUID_SIZE)
> +		__field(u32, flags)
> +		__field(u16, handle)
> +		__field(u16, related_handle)
> +		__field(u64, timestamp)
> +		__array(u8, data, EVENT_RECORD_DATA_LENGTH)
> +		__field(u8, length)

The above looks good.

> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(dev_name, dev_name);
> +		memcpy(__entry->id, &rec->hdr.id, UUID_SIZE);
> +		__entry->log = log;
> +		__entry->flags = le32_to_cpu(rec->hdr.flags_length) >> 8;
> +		__entry->length = le32_to_cpu(rec->hdr.flags_length) & 0xFF;
> +		__entry->handle = le16_to_cpu(rec->hdr.handle);
> +		__entry->related_handle = le16_to_cpu(rec->hdr.related_handle);
> +		__entry->timestamp = le64_to_cpu(rec->hdr.timestamp);

I wonder if I should add le64_to_cpu() and le32_to_cpu() to the functions
that libtraceevent can parse, and then we could move that logic to the
TP_printk(). That is, out of the fast path.

-- Steve

> +		memcpy(__entry->data, &rec->data, EVENT_RECORD_DATA_LENGTH);
> +	),
> +
> +	TP_printk("%s: %s time=%llu id=%pUl handle=%x related_handle=%x hdr_flags='%s' " \
> +		  ": %s",
> +		__get_str(dev_name), show_log_type(__entry->log),
> +		__entry->timestamp, __entry->id, __entry->handle,
> +		__entry->related_handle, show_hdr_flags(__entry->flags),
> +		__print_hex(__entry->data, EVENT_RECORD_DATA_LENGTH)
> +		)
> +);
> +
> +#endif /* _CXL_TRACE_EVENTS_H */
> +
> +/* This part must be outside protection */
> +#undef TRACE_INCLUDE_FILE
> +#define TRACE_INCLUDE_FILE cxl-events
> +#include <trace/define_trace.h>
> diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
> index c71021a2a9ed..70459be5bdd4 100644
> --- a/include/uapi/linux/cxl_mem.h
> +++ b/include/uapi/linux/cxl_mem.h
> @@ -24,6 +24,7 @@
>  	___C(IDENTIFY, "Identify Command"),                               \
>  	___C(RAW, "Raw device command"),                                  \
>  	___C(GET_SUPPORTED_LOGS, "Get Supported Logs"),                   \
> +	___C(GET_EVENT_RECORD, "Get Event Record"),                       \
>  	___C(GET_FW_INFO, "Get FW Info"),                                 \
>  	___C(GET_PARTITION_INFO, "Get Partition Information"),            \
>  	___C(GET_LSA, "Get Label Storage Area"),                          \



* Re: [RFC PATCH 1/9] cxl/mem: Implement Get Event Records command
  2022-08-16 16:39   ` Steven Rostedt
@ 2022-08-16 16:41     ` Steven Rostedt
  2022-08-16 23:11       ` Ira Weiny
  2022-08-16 23:35     ` Ira Weiny
  1 sibling, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2022-08-16 16:41 UTC (permalink / raw)
  To: ira.weiny
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Jonathan Cameron, Davidlohr Bueso, linux-kernel, linux-cxl

On Tue, 16 Aug 2022 12:39:58 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> > +		record_count = le16_to_cpu(payload.record_count);
> > +		if (record_count > 0)
> > +			trace_cxl_event(dev_name(cxlds->dev), type,
> > +					&payload.record);
> > +
> > +		if (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW)
> > +			trace_cxl_event_overflow(dev_name(cxlds->dev), type,
> > +						 &payload);  
> 
> If you want to avoid the compare operations when the tracepoints are not
> enabled, you can add:
> 
> 		if (trace_cxl_event_enabled()) {
> 			if (record_count > 0)
> 				trace_cxl_event(dev_name(cxlds->dev), type,
> 						&payload.record);
> 		}
> 
> 		if (trace_cxl_event_overflow_enabled()) {
> 			if (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW)
> 				trace_cxl_event_overflow(dev_name(cxlds->dev), type,
> 							 &payload);
> 		}
> 
> Those "<tracepoint>_enabled()" functions are static branches, which means
> that when a tracepoint is not enabled its check is a nop that skips this
> code, and when it is enabled the check turns into a jump to the contents of
> the if block.

Ignore this suggestion. I see in the second patch you add more logic to the
if condition. Only use this suggestion if the logic is only for when the
tracepoint is enabled.

-- Steve


* Re: [RFC PATCH 9/9] cxl/test: Simulate event log overflow
  2022-08-13  5:32 ` [RFC PATCH 9/9] cxl/test: Simulate event log overflow ira.weiny
@ 2022-08-16 16:44   ` Steven Rostedt
  0 siblings, 0 replies; 51+ messages in thread
From: Steven Rostedt @ 2022-08-16 16:44 UTC (permalink / raw)
  To: ira.weiny
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Jonathan Cameron, Davidlohr Bueso, linux-kernel, linux-cxl


I just skimmed through the rest of the patches, and they look fine to me.

-- Steve


* Re: [RFC PATCH 1/9] cxl/mem: Implement Get Event Records command
  2022-08-16 16:41     ` Steven Rostedt
@ 2022-08-16 23:11       ` Ira Weiny
  0 siblings, 0 replies; 51+ messages in thread
From: Ira Weiny @ 2022-08-16 23:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Jonathan Cameron, Davidlohr Bueso, linux-kernel, linux-cxl

On Tue, Aug 16, 2022 at 12:41:28PM -0400, Steven Rostedt wrote:
> On Tue, 16 Aug 2022 12:39:58 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > > +		record_count = le16_to_cpu(payload.record_count);
> > > +		if (record_count > 0)
> > > +			trace_cxl_event(dev_name(cxlds->dev), type,
> > > +					&payload.record);
> > > +
> > > +		if (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW)
> > > +			trace_cxl_event_overflow(dev_name(cxlds->dev), type,
> > > +						 &payload);  
> > 
> > If you want to avoid the compare operations when the tracepoints are not
> > enabled, you can add:
> > 
> > 		if (trace_cxl_event_enabled()) {
> > 			if (record_count > 0)
> > 				trace_cxl_event(dev_name(cxlds->dev), type,
> > 						&payload.record);
> > 		}
> > 
> > 		if (trace_cxl_event_overflow_enabled()) {
> > 			if (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW)
> > 				trace_cxl_event_overflow(dev_name(cxlds->dev), type,
> > 							 &payload);
> > 		}
> > 
> > Those "<tracepoint>_enabled()" functions are static branches, which means
> > that when a tracepoint is not enabled its check is a nop that skips this
> > code, and when it is enabled the check turns into a jump to the contents of
> > the if block.
> 
> Ignore this suggestion. I see in the second patch you add more logic to the
> if condition.

Correct.

> Only use this suggestion if the logic is only for when the
> tracepoint is enabled.

This could apply to the overflow trace.  I think it is more likely that either
all of these traces will be enabled or none.

So other than doing:

	if (trace_cxl_event_enabled() ||
	    trace_cxl_event_overflow_enabled() ||
	    trace_cxl_gen_media_event_enabled() ||
	    <every event defined>
	    ...) {

Is there a way to know if 

	/sys/kernel/tracing/events/cxl_events/enable

is != 0?

I feel like the real optimization will be to shut this entire functionality
down if no one is listening.

Ira


* Re: [RFC PATCH 1/9] cxl/mem: Implement Get Event Records command
  2022-08-16 16:39   ` Steven Rostedt
  2022-08-16 16:41     ` Steven Rostedt
@ 2022-08-16 23:35     ` Ira Weiny
  1 sibling, 0 replies; 51+ messages in thread
From: Ira Weiny @ 2022-08-16 23:35 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Jonathan Cameron, Davidlohr Bueso, linux-kernel, linux-cxl

On Tue, Aug 16, 2022 at 12:39:58PM -0400, Steven Rostedt wrote:
> On Fri, 12 Aug 2022 22:32:35 -0700
> ira.weiny@intel.com wrote:
> 
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > Event records are defined for CXL devices.  Each record is reported in
> > one event log.  Devices are required to support the storage of at least
> > one event record in each event log type.
> > 
> > Devices track event log overflow by incrementing a counter and tracking
> > the time of the first and last overflow event seen.
> > 
> > Software queries events via the Get Event Record mailbox command; CXL
> > v3.0 section 8.2.9.2.2.
> > 
> > Issue the Get Event Record mailbox command on driver load.  Trace each
> > record found, as well as any overflow conditions.  Only 1 event is
> > requested for each query.  Optimization of multiple record queries is
> > deferred.
> > 
> > This patch traces a raw event record only and leaves the specific event
> > record types to subsequent patches.
> > 
> > NOTE: checkpatch is not completely happy with the tracing part of this
> > patch but AFAICT it is correct.  I'm open to suggestions if I've done
> > something wrong.
> 
> The include/trace/events/*.h files are all broken according to
> checkpatch.pl ;-) Don't worry about the formatting there. I need to update
> that script to detect that it's looking at TRACE_EVENT() that has different
> rules than normal macros.

Thanks!

[snip]

> > +
> > +/*
> > + * Now redefine the EM and EMe macros to map the enums to the strings that will
> > + * be printed in the output
> > + */
> > +#undef EM
> > +#undef EMe
> > +#define EM(a, b)        {a, b},
> > +#define EMe(a, b)       {a, b}
> > +
> > +TRACE_EVENT(cxl_event_overflow,
> > +
> > +	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> > +		 struct cxl_get_event_payload *payload),
> > +
> > +	TP_ARGS(dev_name, log, payload),
> > +
> > +	TP_STRUCT__entry(
> > +		__string(dev_name, dev_name)
> > +		__field(int, log)
> > +		__field(u16, count)
> > +		__field(u64, first)
> > +		__field(u64, last)
> 
> Because you have a dynamic string, you will save some bytes in the ring
> buffer if you have:
> 
> 		__string(dev_name, dev_name)
> 		__field(int, log)
> 		__field(u64, first)
> 		__field(u64, last)
> 		__field(u16, count)

Thanks, I was trying to pack things better but missed this one.

> 
> 
> > +	),
> > +
> > +	TP_fast_assign(
> > +		__assign_str(dev_name, dev_name);
> > +		__entry->log = log;
> > +		__entry->count = le16_to_cpu(payload->overflow_err_count);
> > +		__entry->first = le64_to_cpu(payload->first_overflow_timestamp);
> > +		__entry->last = le64_to_cpu(payload->last_overflow_timestamp);
> > +	),
> > +
> > +	TP_printk("%s: EVENT LOG %s OVERFLOW %u records from %llu to %llu",
> > +		__get_str(dev_name), show_log_type(__entry->log),
> > +		__entry->count, __entry->first, __entry->last)
> > +
> > +);
> > +
> > +/*
> > + * Common Event Record Format
> > + * CXL v2.0 section 8.2.9.1.1; Table 153
> > + */
> > +#define CXL_EVENT_RECORD_FLAG_PERMANENT		BIT(2)
> > +#define CXL_EVENT_RECORD_FLAG_MAINT_NEEDED	BIT(3)
> > +#define CXL_EVENT_RECORD_FLAG_PERF_DEGRADED	BIT(4)
> > +#define CXL_EVENT_RECORD_FLAG_HW_REPLACE	BIT(5)
> > +#define show_hdr_flags(flags)	__print_flags(flags, " | ",			   \
> > +	{ CXL_EVENT_RECORD_FLAG_PERMANENT,	"Permanent Condition"		}, \
> > +	{ CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,	"Maintenance Needed"		}, \
> > +	{ CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,	"Performance Degraded"		}, \
> > +	{ CXL_EVENT_RECORD_FLAG_HW_REPLACE,	"Hardware Replacement Needed"	}  \
> > +)
> > +
> > +TRACE_EVENT(cxl_event,
> > +
> > +	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> > +		 struct cxl_event_record_raw *rec),
> > +
> > +	TP_ARGS(dev_name, log, rec),
> > +
> > +	TP_STRUCT__entry(
> > +		__string(dev_name, dev_name)
> > +		__field(int, log)
> > +		__array(u8, id, UUID_SIZE)
> > +		__field(u32, flags)
> > +		__field(u16, handle)
> > +		__field(u16, related_handle)
> > +		__field(u64, timestamp)
> > +		__array(u8, data, EVENT_RECORD_DATA_LENGTH)
> > +		__field(u8, length)
> 
> The above looks good.
> 
> > +	),
> > +
> > +	TP_fast_assign(
> > +		__assign_str(dev_name, dev_name);
> > +		memcpy(__entry->id, &rec->hdr.id, UUID_SIZE);
> > +		__entry->log = log;
> > +		__entry->flags = le32_to_cpu(rec->hdr.flags_length) >> 8;
> > +		__entry->length = le32_to_cpu(rec->hdr.flags_length) & 0xFF;
> > +		__entry->handle = le16_to_cpu(rec->hdr.handle);
> > +		__entry->related_handle = le16_to_cpu(rec->hdr.related_handle);
> > +		__entry->timestamp = le64_to_cpu(rec->hdr.timestamp);
> 
> I wonder if I should add le64_to_cpu() and le32_to_cpu() to the functions
> that libtraceevent can parse, and then we could move that logic to the
> TP_printk(). That is, out of the fast path.

I would not do it for this series.  I don't see the performance of these
traces being an issue.  What we are leveraging more is the logging that the
trace mechanism already provides (including its handling of space) and the
user API for getting at the data.

What I would really like (and was looking for the time to enhance) would be a
way to create 'sub-class' like events.

In this case each of the traces comes with a common header which is part of the
generic cxl_event trace point.

It would be nice if we could define something like:

DECLARE_EVENT_BASE_CLASS(event_header,

	TP_PROTO(const char *dev_name, enum cxl_event_log_type log_type,
		 struct cxl_event_record_hdr hdr),
	...

	TP_base_printk(<print header fields>),
);

Then somehow use that header in the 

TRACE_EVENT(cxl_event,

	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
		 struct cxl_event_record_raw *rec),

	TP_SUB_CLASS(dev_name, log, (struct cxl_event_record_hdr *)rec),

	TP_fast_assign(
		call_base_assign(rec),
		...
	),

	TP_printk(<automatically print header fields>
		  <print raw data>),
	...
);

TRACE_EVENT(cxl_gen_media_event,

	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
		 struct cxl_evt_gen_media *rec),

	TP_SUB_CLASS(dev_name, log, (struct cxl_event_record_hdr *)rec),

	TP_fast_assign(
		call_base_assign(rec),
		...
	),

	TP_printk(<automatically print header fields>
		  <print media event fields>),
	...
);

<etc>

Does that make sense?  I've no idea how this could be done.  But I think the
real work would be in the printk merging between the base class (header prints)
and the rest of the record.  I _think_ that the fast assign could just call the
assign defined in the base class somehow.

Am I way off base thinking this is possible?

Thanks for the review!
Ira


* Re: [RFC PATCH 1/9] cxl/mem: Implement Get Event Records command
  2022-08-13  5:32 ` [RFC PATCH 1/9] cxl/mem: Implement Get Event Records command ira.weiny
  2022-08-16 16:39   ` Steven Rostedt
@ 2022-08-17 22:54   ` Dave Jiang
  2022-09-07  4:53     ` Ira Weiny
  2022-08-24 15:50   ` Jonathan Cameron
  2 siblings, 1 reply; 51+ messages in thread
From: Dave Jiang @ 2022-08-17 22:54 UTC (permalink / raw)
  To: ira.weiny, Dan Williams
  Cc: Alison Schofield, Vishal Verma, Ben Widawsky, Steven Rostedt,
	Jonathan Cameron, Davidlohr Bueso, linux-kernel, linux-cxl


On 8/12/2022 10:32 PM, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
>
> Event records are defined for CXL devices.  Each record is reported in
> one event log.  Devices are required to support the storage of at least
> one event record in each event log type.
>
> Devices track event log overflow by incrementing a counter and tracking
> the time of the first and last overflow event seen.
>
> Software queries events via the Get Event Record mailbox command; CXL
> v3.0 section 8.2.9.2.2.
>
> Issue the Get Event Record mailbox command on driver load.  Trace each
> record found, as well as any overflow conditions.  Only 1 event is
> requested for each query.  Optimization of multiple record queries is
> deferred.
>
> This patch traces a raw event record only and leaves the specific event
> record types to subsequent patches.
>
> NOTE: checkpatch is not completely happy with the tracing part of this
> patch but AFAICT it is correct.  I'm open to suggestions if I've done
> something wrong.
>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> ---
>   MAINTAINERS                       |   1 +
>   drivers/cxl/core/mbox.c           |  60 ++++++++++++++
>   drivers/cxl/cxlmem.h              |  66 ++++++++++++++++
>   include/trace/events/cxl-events.h | 127 ++++++++++++++++++++++++++++++
>   include/uapi/linux/cxl_mem.h      |   1 +
>   5 files changed, 255 insertions(+)
>   create mode 100644 include/trace/events/cxl-events.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 54fa6e2059de..1cb9cec31009 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -5014,6 +5014,7 @@ M:	Dan Williams <dan.j.williams@intel.com>
>   L:	linux-cxl@vger.kernel.org
>   S:	Maintained
>   F:	drivers/cxl/
> +F:	include/trace/events/cxl*.h
>   F:	include/uapi/linux/cxl_mem.h
>   
>   CONEXANT ACCESSRUNNER USB DRIVER
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 16176b9278b4..2cceed8608dc 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -7,6 +7,9 @@
>   #include <cxlmem.h>
>   #include <cxl.h>
>   
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/cxl-events.h>
> +
>   #include "core.h"
>   
>   static bool cxl_raw_allow_all;
> @@ -48,6 +51,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
>   	CXL_CMD(RAW, CXL_VARIABLE_PAYLOAD, CXL_VARIABLE_PAYLOAD, 0),
>   #endif
>   	CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
> +	CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
>   	CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
>   	CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
>   	CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
> @@ -704,6 +708,62 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
>   }
>   EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
>   
> +static int cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> +				   enum cxl_event_log_type type)
> +{
> +	struct cxl_get_event_payload payload;
> +
> +	do {
> +		u8 log_type = type;
> +		u16 record_count;
> +		int rc;
> +
> +		rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVENT_RECORD,
> +				       &log_type, sizeof(log_type),
> +				       &payload, sizeof(payload));
> +		if (rc)
> +			return rc;
> +
> +		record_count = le16_to_cpu(payload.record_count);
> +		if (record_count > 0)
> +			trace_cxl_event(dev_name(cxlds->dev), type,
> +					&payload.record);
> +
> +		if (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW)
> +			trace_cxl_event_overflow(dev_name(cxlds->dev), type,
> +						 &payload);
> +
> +	} while (payload.flags & CXL_GET_EVENT_FLAG_MORE_RECORDS);
> +
> +	return 0;
> +}
> +
> +/**
> + * cxl_mem_get_event_records - Get Event Records from the device
> + * @cxlds: The device data for the operation
> + *
> + * Retrieve all event records available on the device and report them as trace
> + * events.
> + *
> + * See CXL v3.0 @8.2.9.2.2 Get Event Records
> + */
> +void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
> +{
> +	struct device *dev = cxlds->dev;
> +	enum cxl_event_log_type log_type;
> +
> +	for (log_type = CXL_EVENT_TYPE_INFO;
> +	     log_type < CXL_EVENT_TYPE_MAX; log_type++) {
> +		int rc;
> +
> +		rc = cxl_mem_get_records_log(cxlds, log_type);
> +		if (rc)
> +			dev_err(dev, "Failed to query %s Event Logs: %d\n",
> +				cxl_event_log_type_str(log_type), rc);
> +	}
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, CXL);
> +
>   /**
>    * cxl_mem_get_partition_info - Get partition info
>    * @cxlds: The device data for the operation
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 88e3a8e54b6a..f83634f3bc8d 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -4,6 +4,7 @@
>   #define __CXL_MEM_H__
>   #include <uapi/linux/cxl_mem.h>
>   #include <linux/cdev.h>
> +#include <linux/uuid.h>
>   #include "cxl.h"
>   
>   /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
> @@ -253,6 +254,7 @@ struct cxl_dev_state {
>   enum cxl_opcode {
>   	CXL_MBOX_OP_INVALID		= 0x0000,
>   	CXL_MBOX_OP_RAW			= CXL_MBOX_OP_INVALID,
> +	CXL_MBOX_OP_GET_EVENT_RECORD	= 0x0100,
>   	CXL_MBOX_OP_GET_FW_INFO		= 0x0200,
>   	CXL_MBOX_OP_ACTIVATE_FW		= 0x0202,
>   	CXL_MBOX_OP_GET_SUPPORTED_LOGS	= 0x0400,
> @@ -322,6 +324,69 @@ struct cxl_mbox_identify {
>   	u8 qos_telemetry_caps;
>   } __packed;
>   
> +/*
> + * Common Event Record Format
> + * CXL v3.0 section 8.2.9.2.1; Table 8-42
> + */
> +struct cxl_event_record_hdr {
> +	uuid_t id;
> +	__le32 flags_length;
> +	__le16 handle;
> +	__le16 related_handle;
> +	__le64 timestamp;
> +	__le64 reserved1;
> +	__le64 reserved2;
> +} __packed;
> +
> +#define EVENT_RECORD_DATA_LENGTH 0x50
> +struct cxl_event_record_raw {
> +	struct cxl_event_record_hdr hdr;
> +	u8 data[EVENT_RECORD_DATA_LENGTH];
> +} __packed;
> +
> +/*
> + * Get Event Records output payload
> + * CXL v3.0 section 8.2.9.2.2; Table 8-50
> + *
> + * Space given for 1 record
> + */
> +#define CXL_GET_EVENT_FLAG_OVERFLOW		BIT(0)
> +#define CXL_GET_EVENT_FLAG_MORE_RECORDS	BIT(1)
> +struct cxl_get_event_payload {
> +	u8 flags;
> +	u8 reserved1;
> +	__le16 overflow_err_count;
> +	__le64 first_overflow_timestamp;
> +	__le64 last_overflow_timestamp;
> +	__le16 record_count;
> +	u8 reserved2[0xa];
> +	struct cxl_event_record_raw record;
> +} __packed;
> +
> +enum cxl_event_log_type {
> +	CXL_EVENT_TYPE_INFO = 0x00,
> +	CXL_EVENT_TYPE_WARN,
> +	CXL_EVENT_TYPE_FAIL,
> +	CXL_EVENT_TYPE_FATAL,
> +	CXL_EVENT_TYPE_MAX
> +};
> +static inline char *cxl_event_log_type_str(enum cxl_event_log_type type)
> +{
> +	switch (type) {
> +	case CXL_EVENT_TYPE_INFO:
> +		return "Informational";
> +	case CXL_EVENT_TYPE_WARN:
> +		return "Warning";
> +	case CXL_EVENT_TYPE_FAIL:
> +		return "Failure";
> +	case CXL_EVENT_TYPE_FATAL:
> +		return "Fatal";
> +	default:
> +		break;
> +	}
> +	return "<unknown>";
> +}
> +
>   struct cxl_mbox_get_partition_info {
>   	__le64 active_volatile_cap;
>   	__le64 active_persistent_cap;
> @@ -381,6 +446,7 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds);
>   struct cxl_dev_state *cxl_dev_state_create(struct device *dev);
>   void set_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
>   void clear_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
> +void cxl_mem_get_event_records(struct cxl_dev_state *cxlds);
>   #ifdef CONFIG_CXL_SUSPEND
>   void cxl_mem_active_inc(void);
>   void cxl_mem_active_dec(void);
> diff --git a/include/trace/events/cxl-events.h b/include/trace/events/cxl-events.h
> new file mode 100644
> index 000000000000..f4baeae66cf3
> --- /dev/null
> +++ b/include/trace/events/cxl-events.h
> @@ -0,0 +1,127 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM cxl_events
> +
> +#if !defined(_CXL_TRACE_EVENTS_H) ||  defined(TRACE_HEADER_MULTI_READ)
> +#define _CXL_TRACE_EVENTS_H
> +
> +#include <linux/tracepoint.h>
> +
> +#define EVENT_LOGS					\
> +	EM(CXL_EVENT_TYPE_INFO,		"Info")		\
> +	EM(CXL_EVENT_TYPE_WARN,		"Warning")	\
> +	EM(CXL_EVENT_TYPE_FAIL,		"Failure")	\
> +	EM(CXL_EVENT_TYPE_FATAL,	"Fatal")	\
> +	EMe(CXL_EVENT_TYPE_MAX,		"<undefined>")
> +
> +/*
> + * First define the enums in the above macros to be exported to userspace via
> + * TRACE_DEFINE_ENUM().
> + */
> +#undef EM
> +#undef EMe
> +#define EM(a, b)	TRACE_DEFINE_ENUM(a);
> +#define EMe(a, b)	TRACE_DEFINE_ENUM(a);
> +
> +EVENT_LOGS
> +#define show_log_type(type) __print_symbolic(type, EVENT_LOGS)
> +
> +/*
> + * Now redefine the EM and EMe macros to map the enums to the strings that will
> + * be printed in the output
> + */
> +#undef EM
> +#undef EMe
> +#define EM(a, b)        {a, b},
> +#define EMe(a, b)       {a, b}
> +
> +TRACE_EVENT(cxl_event_overflow,

Kind of a general comment for the event names. Maybe just "overflow" 
instead of "cxl_event_overflow" since it shows up in sysfs under the 
cxl_events directory and becomes redundant?

- Dave

> +
> +	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> +		 struct cxl_get_event_payload *payload),
> +
> +	TP_ARGS(dev_name, log, payload),
> +
> +	TP_STRUCT__entry(
> +		__string(dev_name, dev_name)
> +		__field(int, log)
> +		__field(u16, count)
> +		__field(u64, first)
> +		__field(u64, last)
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(dev_name, dev_name);
> +		__entry->log = log;
> +		__entry->count = le16_to_cpu(payload->overflow_err_count);
> +		__entry->first = le64_to_cpu(payload->first_overflow_timestamp);
> +		__entry->last = le64_to_cpu(payload->last_overflow_timestamp);
> +	),
> +
> +	TP_printk("%s: EVENT LOG %s OVERFLOW %u records from %llu to %llu",
> +		__get_str(dev_name), show_log_type(__entry->log),
> +		__entry->count, __entry->first, __entry->last)
> +
> +);
> +
> +/*
> + * Common Event Record Format
> + * CXL v2.0 section 8.2.9.1.1; Table 153
> + */
> +#define CXL_EVENT_RECORD_FLAG_PERMANENT		BIT(2)
> +#define CXL_EVENT_RECORD_FLAG_MAINT_NEEDED	BIT(3)
> +#define CXL_EVENT_RECORD_FLAG_PERF_DEGRADED	BIT(4)
> +#define CXL_EVENT_RECORD_FLAG_HW_REPLACE	BIT(5)
> +#define show_hdr_flags(flags)	__print_flags(flags, " | ",			   \
> +	{ CXL_EVENT_RECORD_FLAG_PERMANENT,	"Permanent Condition"		}, \
> +	{ CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,	"Maintenance Needed"		}, \
> +	{ CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,	"Performance Degraded"		}, \
> +	{ CXL_EVENT_RECORD_FLAG_HW_REPLACE,	"Hardware Replacement Needed"	}  \
> +)
> +
> +TRACE_EVENT(cxl_event,
> +
> +	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> +		 struct cxl_event_record_raw *rec),
> +
> +	TP_ARGS(dev_name, log, rec),
> +
> +	TP_STRUCT__entry(
> +		__string(dev_name, dev_name)
> +		__field(int, log)
> +		__array(u8, id, UUID_SIZE)
> +		__field(u32, flags)
> +		__field(u16, handle)
> +		__field(u16, related_handle)
> +		__field(u64, timestamp)
> +		__array(u8, data, EVENT_RECORD_DATA_LENGTH)
> +		__field(u8, length)
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(dev_name, dev_name);
> +		memcpy(__entry->id, &rec->hdr.id, UUID_SIZE);
> +		__entry->log = log;
> +		__entry->flags = le32_to_cpu(rec->hdr.flags_length) >> 8;
> +		__entry->length = le32_to_cpu(rec->hdr.flags_length) & 0xFF;
> +		__entry->handle = le16_to_cpu(rec->hdr.handle);
> +		__entry->related_handle = le16_to_cpu(rec->hdr.related_handle);
> +		__entry->timestamp = le64_to_cpu(rec->hdr.timestamp);
> +		memcpy(__entry->data, &rec->data, EVENT_RECORD_DATA_LENGTH);
> +	),
> +
> +	TP_printk("%s: %s time=%llu id=%pUl handle=%x related_handle=%x hdr_flags='%s' " \
> +		  ": %s",
> +		__get_str(dev_name), show_log_type(__entry->log),
> +		__entry->timestamp, __entry->id, __entry->handle,
> +		__entry->related_handle, show_hdr_flags(__entry->flags),
> +		__print_hex(__entry->data, EVENT_RECORD_DATA_LENGTH)
> +		)
> +);
> +
> +#endif /* _CXL_TRACE_EVENTS_H */
> +
> +/* This part must be outside protection */
> +#undef TRACE_INCLUDE_FILE
> +#define TRACE_INCLUDE_FILE cxl-events
> +#include <trace/define_trace.h>
> diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
> index c71021a2a9ed..70459be5bdd4 100644
> --- a/include/uapi/linux/cxl_mem.h
> +++ b/include/uapi/linux/cxl_mem.h
> @@ -24,6 +24,7 @@
>   	___C(IDENTIFY, "Identify Command"),                               \
>   	___C(RAW, "Raw device command"),                                  \
>   	___C(GET_SUPPORTED_LOGS, "Get Supported Logs"),                   \
> +	___C(GET_EVENT_RECORD, "Get Event Record"),                       \
>   	___C(GET_FW_INFO, "Get FW Info"),                                 \
>   	___C(GET_PARTITION_INFO, "Get Partition Information"),            \
>   	___C(GET_LSA, "Get Label Storage Area"),                          \
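
For readers following the flags_length handling in the quoted cxl_event
tracepoint: the header packs the record length into the low byte and the
flags into the upper 24 bits of one little-endian dword.  Below is a minimal
userspace sketch of that split, assuming the same layout as the quoted
TP_fast_assign.  The names le32_from_bytes(), split_flags_length() and
struct hdr_fields are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Flag bits as defined in the quoted patch (applied after the >> 8 shift). */
#define CXL_EVENT_RECORD_FLAG_PERMANENT		(1u << 2)
#define CXL_EVENT_RECORD_FLAG_MAINT_NEEDED	(1u << 3)

/* Hypothetical container for the two decoded fields. */
struct hdr_fields {
	uint8_t length;		/* record length, low byte of the dword */
	uint32_t flags;		/* event record flags, upper 24 bits */
};

/* Build the dword from raw little-endian bytes, whatever the host order. */
static uint32_t le32_from_bytes(const uint8_t b[4])
{
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Same arithmetic as the quoted fast assign: "& 0xFF" and ">> 8". */
static struct hdr_fields split_flags_length(uint32_t flags_length)
{
	struct hdr_fields f;

	f.length = (uint8_t)(flags_length & 0xFF);
	f.flags = flags_length >> 8;
	return f;
}
```

With the bytes 80 0c 00 00, this yields a length of 0x80 (the 0x30-byte
header plus 0x50 bytes of data) and the Permanent and Maintenance Needed
flags set.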

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH 0/9] CXL: Read and clear event logs
  2022-08-13  5:32 [RFC PATCH 0/9] CXL: Read and clear event logs ira.weiny
                   ` (8 preceding siblings ...)
  2022-08-13  5:32 ` [RFC PATCH 9/9] cxl/test: Simulate event log overflow ira.weiny
@ 2022-08-22 16:18 ` Davidlohr Bueso
  2022-08-22 22:53   ` Ira Weiny
  9 siblings, 1 reply; 51+ messages in thread
From: Davidlohr Bueso @ 2022-08-22 16:18 UTC (permalink / raw)
  To: ira.weiny
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Jonathan Cameron, a.manzanares, linux-kernel,
	linux-cxl

On Fri, 12 Aug 2022, ira.weiny@intel.com wrote:

>From: Ira Weiny <ira.weiny@intel.com>
>
>Event records inform the OS of various device events.  Events are not needed
>for any kernel operation but various user level software will want to track
>events.
>
>Add event reporting through the trace event mechanism.  On driver load read and
>clear all device events.
>
>Normally interrupts will trigger new events to be reported as they occur.
>Because the interrupt code is still being worked on this series provides a
>cxl-test mechanism to create a series of events and trigger the reporting of
>those events.

Where is this irq code being worked on? I've asked about this for async mbox
commands, and Jonathan has also posted some code for the PMU implementation.

Could we not just start with initial MSI/MSI-X support? Then interested
users can gradually be added. So each "feature" would need to implement
its "get message number", and to install the isr just do the standard:

      irq = pci_irq_vector(pdev, num);
      irq_name = devm_kasprintf(dev, GFP_KERNEL, "%s_%s", dev_name(dev),
			       cxl_irq_cap_table[feature].name);
      rc = devm_request_irq(dev, irq, isr_fn, IRQF_SHARED, irq_name, info);

The only complexity I see for this is to know the number of vectors to request
a priori, for which we'd have to get the largest value of all CXL features that
can support interrupts. Something like the following? One thing I have not
considered in this is the DOE stuff.

Thanks,
Davidlohr

------
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 88e3a8e54b6a..b334d2f497c1 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -245,6 +245,8 @@ struct cxl_dev_state {
	resource_size_t component_reg_phys;
	u64 serial;

+	int irq_type; /* MSI-X, MSI */
+
	struct xarray doe_mbs;

	int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index eec597dbe763..95f4b91f43b1 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -53,15 +53,6 @@
  #define	    CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK			GENMASK(15, 8)
  #define     CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK		GENMASK(31, 16)

-/* Register Block Identifier (RBI) */
-enum cxl_regloc_type {
-	CXL_REGLOC_RBI_EMPTY = 0,
-	CXL_REGLOC_RBI_COMPONENT,
-	CXL_REGLOC_RBI_VIRT,
-	CXL_REGLOC_RBI_MEMDEV,
-	CXL_REGLOC_RBI_TYPES
-};
-
  static inline resource_size_t cxl_regmap_to_base(struct pci_dev *pdev,
						 struct cxl_register_map *map)
  {
@@ -75,4 +66,44 @@ int devm_cxl_port_enumerate_dports(struct cxl_port *port);
  struct cxl_dev_state;
  int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm);
  void read_cdat_data(struct cxl_port *port);
+
+#define CXL_IRQ_CAPABILITY_TABLE				\
+	C(ISOLATION, "isolation", NULL),			\
+	C(PMU, "pmu_overflow", NULL), /* per pmu instance */	\
+	C(MBOX, "mailbox", NULL), /* primary-only */		\
+	C(EVENT, "event", NULL),
+
+#undef C
+#define C(a, b, c) CXL_IRQ_CAPABILITY_##a
+enum  { CXL_IRQ_CAPABILITY_TABLE };
+#undef C
+#define C(a, b, c) { b, c }
+/**
+ * struct cxl_irq_cap - CXL feature that is capable of receiving MSI/MSI-X irqs.
+ *
+ * @name: Name of the device generating this interrupt.
+ * @get_max_msgnum: Get the feature's largest interrupt message number. In cases
+ *                  where there is only one instance it also indicates which
+ *                  MSI/MSI-X vector is used for the interrupt message generated
+ *                  in association with the feature. If the feature does not
+ *                  have the Interrupt Supported bit set, then return -1.
+ */
+struct cxl_irq_cap {
+	const char *name;
+	int (*get_max_msgnum)(struct cxl_dev_state *cxlds);
+};
+
+static const
+struct cxl_irq_cap cxl_irq_cap_table[] = { CXL_IRQ_CAPABILITY_TABLE };
+#undef C
+
+/* Register Block Identifier (RBI) */
+enum cxl_regloc_type {
+	CXL_REGLOC_RBI_EMPTY = 0,
+	CXL_REGLOC_RBI_COMPONENT,
+	CXL_REGLOC_RBI_VIRT,
+	CXL_REGLOC_RBI_MEMDEV,
+	CXL_REGLOC_RBI_TYPES
+};
+
  #endif /* __CXL_PCI_H__ */
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index faeb5d9d7a7a..c0fe78e0559b 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -387,6 +387,52 @@ static int cxl_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
	return rc;
  }

+static void cxl_pci_free_irq_vectors(void *data)
+{
+	pci_free_irq_vectors(data);
+}
+
+static int cxl_pci_alloc_irq_vectors(struct cxl_dev_state *cxlds)
+{
+	struct device *dev = cxlds->dev;
+	struct pci_dev *pdev = to_pci_dev(dev);
+	int rc, i, vectors = -1;
+
+	for (i = 0; i < ARRAY_SIZE(cxl_irq_cap_table); i++) {
+		int irq;
+
+		if (!cxl_irq_cap_table[i].get_max_msgnum)
+			continue;
+
+		irq = cxl_irq_cap_table[i].get_max_msgnum(cxlds);
+		vectors = max_t(int, irq, vectors);
+	}
+
+	if (vectors == -1)
+		return -EINVAL; /* no irq support whatsoever */
+
+	vectors++;
+	rc = pci_alloc_irq_vectors(pdev, vectors, vectors, PCI_IRQ_MSIX);
+	if (rc < 0) {
+		rc = pci_alloc_irq_vectors(pdev, vectors, vectors, PCI_IRQ_MSI);
+		if (rc < 0)
+			return rc;
+
+		cxlds->irq_type = PCI_IRQ_MSI;
+	} else {
+		cxlds->irq_type = PCI_IRQ_MSIX;
+	}
+
+	if (rc != vectors) {
+		pci_err(pdev, "Not enough interrupts; use polling where supported\n");
+		/* Some got allocated; clean them up */
+		cxl_pci_free_irq_vectors(pdev);
+		return -ENOSPC;
+	}
+
+	return devm_add_action_or_reset(dev, cxl_pci_free_irq_vectors, pdev);
+}
+
  static void cxl_pci_destroy_doe(void *mbs)
  {
	xa_destroy(mbs);
@@ -476,6 +522,9 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)

	cxlds->component_reg_phys = cxl_regmap_to_base(pdev, &map);

+	if (cxl_pci_alloc_irq_vectors(cxlds))
+		cxlds->irq_type = 0;
+
	devm_cxl_pci_create_doe(cxlds);

	rc = cxl_pci_setup_mailbox(cxlds);

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH 0/9] CXL: Read and clear event logs
  2022-08-22 16:18 ` [RFC PATCH 0/9] CXL: Read and clear event logs Davidlohr Bueso
@ 2022-08-22 22:53   ` Ira Weiny
  2022-08-23 16:12     ` Davidlohr Bueso
  2022-08-24 10:07     ` Jonathan Cameron
  0 siblings, 2 replies; 51+ messages in thread
From: Ira Weiny @ 2022-08-22 22:53 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Jonathan Cameron, a.manzanares, linux-kernel,
	linux-cxl

On Mon, Aug 22, 2022 at 09:18:02AM -0700, Davidlohr Bueso wrote:
> On Fri, 12 Aug 2022, ira.weiny@intel.com wrote:
> 
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > Event records inform the OS of various device events.  Events are not needed
> > for any kernel operation but various user level software will want to track
> > events.
> > 
> > Add event reporting through the trace event mechanism.  On driver load read and
> > clear all device events.
> > 
> > Normally interrupts will trigger new events to be reported as they occur.
> > Because the interrupt code is still being worked on this series provides a
> > cxl-test mechanism to create a series of events and trigger the reporting of
> > those events.
> 
> Where is this irq code being worked on? I've asked about this for async mbox
> commands, and Jonathan has also posted some code for the PMU implementation.

I'm still trying to work out how to share irqs between PCI and CXL, mainly
for DOE.

I thought that we could skip IRQ support for DOE completely and this would
support your proposal below.  But I just found that:

"A device may interrupt the host when CDAT content changes using the MSI
associated with this DOE Capability instance."

So I guess it needs to be supported at some point.

> 
> Could we not just start with initial MSI/MSI-X support? Then interested
> users can gradually be added. So each "feature" would need to implement
> its "get message number", and to install the isr just do the standard:
> 
>      irq = pci_irq_vector(pdev, num);
>      irq_name = devm_kasprintf(dev, GFP_KERNEL, "%s_%s", dev_name(dev),
> 			       cxl_irq_cap_table[feature].name);
>      rc = devm_request_irq(dev, irq, isr_fn, IRQF_SHARED, irq_name, info);
> 
> The only complexity I see for this is to know the number of vectors to request
> a priori, for which we'd have to get the largest value of all CXL features that
> can support interrupts. Something like the following?

Generally it seems ok but I have questions below.

> One thing I have not
> considered in this is the DOE stuff.

I think this is the harder thing to support because of needing to allow both
the PCI layer and the CXL layer to create irqs.  Potentially at different
times.

> 
> Thanks,
> Davidlohr
> 
> ------
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 88e3a8e54b6a..b334d2f497c1 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -245,6 +245,8 @@ struct cxl_dev_state {
> 	resource_size_t component_reg_phys;
> 	u64 serial;
> 
> +	int irq_type; /* MSI-X, MSI */
> +
> 	struct xarray doe_mbs;
> 
> 	int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> index eec597dbe763..95f4b91f43b1 100644
> --- a/drivers/cxl/cxlpci.h
> +++ b/drivers/cxl/cxlpci.h
> @@ -53,15 +53,6 @@
>  #define	    CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK			GENMASK(15, 8)
>  #define     CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK		GENMASK(31, 16)
> 
> -/* Register Block Identifier (RBI) */
> -enum cxl_regloc_type {
> -	CXL_REGLOC_RBI_EMPTY = 0,
> -	CXL_REGLOC_RBI_COMPONENT,
> -	CXL_REGLOC_RBI_VIRT,
> -	CXL_REGLOC_RBI_MEMDEV,
> -	CXL_REGLOC_RBI_TYPES
> -};

Why move this?

> -
>  static inline resource_size_t cxl_regmap_to_base(struct pci_dev *pdev,
> 						 struct cxl_register_map *map)
>  {
> @@ -75,4 +66,44 @@ int devm_cxl_port_enumerate_dports(struct cxl_port *port);
>  struct cxl_dev_state;
>  int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm);
>  void read_cdat_data(struct cxl_port *port);
> +
> +#define CXL_IRQ_CAPABILITY_TABLE				\
> +	C(ISOLATION, "isolation", NULL),			\
> +	C(PMU, "pmu_overflow", NULL), /* per pmu instance */	\
> +	C(MBOX, "mailbox", NULL), /* primary-only */		\
> +	C(EVENT, "event", NULL),

This is defining get_max_msgnum to NULL right?

> +
> +#undef C
> +#define C(a, b, c) CXL_IRQ_CAPABILITY_##a
> +enum  { CXL_IRQ_CAPABILITY_TABLE };
> +#undef C
> +#define C(a, b, c) { b, c }
> +/**
> + * struct cxl_irq_cap - CXL feature that is capable of receiving MSI/MSI-X irqs.
> + *
> + * @name: Name of the device generating this interrupt.
> + * @get_max_msgnum: Get the feature's largest interrupt message number. In cases
> + *                  where there is only one instance it also indicates which
> + *                  MSI/MSI-X vector is used for the interrupt message generated
> + *                  in association with the feature. If the feature does not
> + *                  have the Interrupt Supported bit set, then return -1.
> + */
> +struct cxl_irq_cap {
> +	const char *name;
> +	int (*get_max_msgnum)(struct cxl_dev_state *cxlds);
> +};
> +
> +static const
> +struct cxl_irq_cap cxl_irq_cap_table[] = { CXL_IRQ_CAPABILITY_TABLE };
> +#undef C

Why all this macro magic?

> +
> +/* Register Block Identifier (RBI) */
> +enum cxl_regloc_type {
> +	CXL_REGLOC_RBI_EMPTY = 0,
> +	CXL_REGLOC_RBI_COMPONENT,
> +	CXL_REGLOC_RBI_VIRT,
> +	CXL_REGLOC_RBI_MEMDEV,
> +	CXL_REGLOC_RBI_TYPES
> +};
> +
>  #endif /* __CXL_PCI_H__ */
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index faeb5d9d7a7a..c0fe78e0559b 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -387,6 +387,52 @@ static int cxl_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
> 	return rc;
>  }
> 
> +static void cxl_pci_free_irq_vectors(void *data)
> +{
> +	pci_free_irq_vectors(data);
> +}
> +
> +static int cxl_pci_alloc_irq_vectors(struct cxl_dev_state *cxlds)
> +{
> +	struct device *dev = cxlds->dev;
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +	int rc, i, vectors = -1;
> +
> +	for (i = 0; i < ARRAY_SIZE(cxl_irq_cap_table); i++) {
> +		int irq;
> +
> +		if (!cxl_irq_cap_table[i].get_max_msgnum)
> +			continue;
> +
> +		irq = cxl_irq_cap_table[i].get_max_msgnum(cxlds);
> +		vectors = max_t(int, irq, vectors);
> +	}
> +
> +	if (vectors == -1)
> +		return -EINVAL; /* no irq support whatsoever */
> +
> +	vectors++;

This is pretty much what earlier versions of the DOE code did, with the
exception of only having one get_max_msgnum() call defined (for DOE).  But
there was a lot of debate about how to share vectors with the PCI layer, and
eventually we got rid of it.  I'm still trying to figure it out.  Sorry for
being slow.

Perhaps we do this for this series.  However, won't we have an issue if we want
to support switch events?

Ira

> +	rc = pci_alloc_irq_vectors(pdev, vectors, vectors, PCI_IRQ_MSIX);
> +	if (rc < 0) {
> +		rc = pci_alloc_irq_vectors(pdev, vectors, vectors, PCI_IRQ_MSI);
> +		if (rc < 0)
> +			return rc;
> +
> +		cxlds->irq_type = PCI_IRQ_MSI;
> +	} else {
> +		cxlds->irq_type = PCI_IRQ_MSIX;
> +	}
> +
> +	if (rc != vectors) {
> +		pci_err(pdev, "Not enough interrupts; use polling where supported\n");
> +		/* Some got allocated; clean them up */
> +		cxl_pci_free_irq_vectors(pdev);
> +		return -ENOSPC;
> +	}
> +
> +	return devm_add_action_or_reset(dev, cxl_pci_free_irq_vectors, pdev);
> +}
> +
>  static void cxl_pci_destroy_doe(void *mbs)
>  {
> 	xa_destroy(mbs);
> @@ -476,6 +522,9 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> 
> 	cxlds->component_reg_phys = cxl_regmap_to_base(pdev, &map);
> 
> +	if (cxl_pci_alloc_irq_vectors(cxlds))
> +		cxlds->irq_type = 0;
> +
> 	devm_cxl_pci_create_doe(cxlds);
> 
> 	rc = cxl_pci_setup_mailbox(cxlds);

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH 0/9] CXL: Read and clear event logs
  2022-08-22 22:53   ` Ira Weiny
@ 2022-08-23 16:12     ` Davidlohr Bueso
  2022-08-24 10:07     ` Jonathan Cameron
  1 sibling, 0 replies; 51+ messages in thread
From: Davidlohr Bueso @ 2022-08-23 16:12 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Jonathan Cameron, a.manzanares, linux-kernel,
	linux-cxl

On Mon, 22 Aug 2022, Ira Weiny wrote:

>Generally it seems ok but I have questions below.
>
>> One thing I have not
>> considered in this is the DOE stuff.
>
>I think this is the harder thing to support because of needing to allow both
>the PCI layer and the CXL layer to create irqs.  Potentially at different
>times.

I agree.

>> -/* Register Block Identifier (RBI) */
>> -enum cxl_regloc_type {
>> -	CXL_REGLOC_RBI_EMPTY = 0,
>> -	CXL_REGLOC_RBI_COMPONENT,
>> -	CXL_REGLOC_RBI_VIRT,
>> -	CXL_REGLOC_RBI_MEMDEV,
>> -	CXL_REGLOC_RBI_TYPES
>> -};
>
>Why move this?

That was sloppy of me, sorry. I wanted to reuse the struct cxl_dev_state
forward declaration; no idea why the diff came out that way.

>> -
>>  static inline resource_size_t cxl_regmap_to_base(struct pci_dev *pdev,
>>						 struct cxl_register_map *map)
>>  {
>> @@ -75,4 +66,44 @@ int devm_cxl_port_enumerate_dports(struct cxl_port *port);
>>  struct cxl_dev_state;
>>  int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm);
>>  void read_cdat_data(struct cxl_port *port);
>> +
>> +#define CXL_IRQ_CAPABILITY_TABLE				\
>> +	C(ISOLATION, "isolation", NULL),			\
>> +	C(PMU, "pmu_overflow", NULL), /* per pmu instance */	\
>> +	C(MBOX, "mailbox", NULL), /* primary-only */		\
>> +	C(EVENT, "event", NULL),
>
>This is defining get_max_msgnum to NULL right?

Yes. So until there are any users, everything's a nop.
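
To make the "nop" concrete, here is a minimal userspace sketch of the scan
that the proposed cxl_pci_alloc_irq_vectors() performs over the table.  It
assumes the same struct cxl_irq_cap shape as the diff; the function
cxl_count_irq_vectors(), the fake_*() callbacks and the two sample tables are
hypothetical and only illustrate the logic.

```c
#include <stddef.h>

struct cxl_dev_state;	/* opaque here; only passed through to callbacks */

struct cxl_irq_cap {
	const char *name;
	int (*get_max_msgnum)(struct cxl_dev_state *cxlds);
};

/*
 * Return how many vectors to request (largest message number + 1), or -1
 * when no feature reports a usable message number.
 */
static int cxl_count_irq_vectors(const struct cxl_irq_cap *caps, size_t ncaps,
				 struct cxl_dev_state *cxlds)
{
	int max_msgnum = -1;
	size_t i;

	for (i = 0; i < ncaps; i++) {
		int msgnum;

		if (!caps[i].get_max_msgnum)
			continue;	/* feature has no user yet */

		msgnum = caps[i].get_max_msgnum(cxlds);
		if (msgnum > max_msgnum)
			max_msgnum = msgnum;
	}

	return max_msgnum < 0 ? -1 : max_msgnum + 1;
}

/* Hypothetical callbacks standing in for real register reads. */
static int fake_mbox_msgnum(struct cxl_dev_state *cxlds)
{
	(void)cxlds;
	return 0;
}

static int fake_event_msgnum(struct cxl_dev_state *cxlds)
{
	(void)cxlds;
	return 2;
}

static int fake_unsupported_msgnum(struct cxl_dev_state *cxlds)
{
	(void)cxlds;
	return -1;	/* Interrupt Supported bit not set */
}

/* The table as posted: every callback is NULL. */
static const struct cxl_irq_cap all_null_caps[] = {
	{ "isolation", NULL },
	{ "pmu_overflow", NULL },
	{ "mailbox", NULL },
	{ "event", NULL },
};

/* A table with some (made-up) users filled in. */
static const struct cxl_irq_cap mixed_caps[] = {
	{ "isolation", NULL },
	{ "pmu_overflow", fake_unsupported_msgnum },
	{ "mailbox", fake_mbox_msgnum },
	{ "event", fake_event_msgnum },
};
```

With the table as posted, the scan finds nothing and the allocation path
bails out before touching the PCI core; once a feature supplies a callback,
the count becomes its largest message number plus one.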

>> +
>> +#undef C
>> +#define C(a, b, c) CXL_IRQ_CAPABILITY_##a
>> +enum  { CXL_IRQ_CAPABILITY_TABLE };
>> +#undef C
>> +#define C(a, b, c) { b, c }
>> +/**
>> + * struct cxl_irq_cap - CXL feature that is capable of receiving MSI/MSI-X irqs.
>> + *
>> + * @name: Name of the device generating this interrupt.
>> + * @get_max_msgnum: Get the feature's largest interrupt message number. In cases
>> + *                  where there is only one instance it also indicates which
>> + *                  MSI/MSI-X vector is used for the interrupt message generated
>> + *                  in association with the feature. If the feature does not
>> + *                  have the Interrupt Supported bit set, then return -1.
>> + */
>> +struct cxl_irq_cap {
>> +	const char *name;
>> +	int (*get_max_msgnum)(struct cxl_dev_state *cxlds);
>> +};
>> +
>> +static const
>> +struct cxl_irq_cap cxl_irq_cap_table[] = { CXL_IRQ_CAPABILITY_TABLE };
>> +#undef C
>
>Why all this macro magic?

A nifty trick Dan likes; it avoids duplicating the fields (enums + the table).
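
For anyone unfamiliar with the pattern, a minimal standalone sketch: the
list is written once and expanded twice with different definitions of C(),
so the enum and the table cannot drift apart.  The list contents are copied
from the diff; the CXL_IRQ_CAPABILITY_COUNT terminator is an addition made
here for illustration and is not part of the proposal.

```c
#include <stddef.h>

struct cxl_dev_state;	/* opaque; only needed for the callback type */

/* Single source of truth: one C() entry per interrupt-capable feature. */
#define CXL_IRQ_CAPABILITY_TABLE				\
	C(ISOLATION, "isolation", NULL),			\
	C(PMU, "pmu_overflow", NULL),				\
	C(MBOX, "mailbox", NULL),				\
	C(EVENT, "event", NULL),

/* First expansion: keep only the identifier to build the enum. */
#define C(a, b, c) CXL_IRQ_CAPABILITY_##a
enum { CXL_IRQ_CAPABILITY_TABLE CXL_IRQ_CAPABILITY_COUNT /* illustration only */ };
#undef C

struct cxl_irq_cap {
	const char *name;
	int (*get_max_msgnum)(struct cxl_dev_state *cxlds);
};

/* Second expansion: keep the name and callback to build the table. */
#define C(a, b, c) { b, c }
static const struct cxl_irq_cap cxl_irq_cap_table[] = { CXL_IRQ_CAPABILITY_TABLE };
#undef C
```

Because both expansions walk the same list in the same order, the enum
value of a feature is also its index into cxl_irq_cap_table[].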

>> +
>> +/* Register Block Identifier (RBI) */
>> +enum cxl_regloc_type {
>> +	CXL_REGLOC_RBI_EMPTY = 0,
>> +	CXL_REGLOC_RBI_COMPONENT,
>> +	CXL_REGLOC_RBI_VIRT,
>> +	CXL_REGLOC_RBI_MEMDEV,
>> +	CXL_REGLOC_RBI_TYPES
>> +};
>> +
>>  #endif /* __CXL_PCI_H__ */
>> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
>> index faeb5d9d7a7a..c0fe78e0559b 100644
>> --- a/drivers/cxl/pci.c
>> +++ b/drivers/cxl/pci.c
>> @@ -387,6 +387,52 @@ static int cxl_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
>>	return rc;
>>  }
>>
>> +static void cxl_pci_free_irq_vectors(void *data)
>> +{
>> +	pci_free_irq_vectors(data);
>> +}
>> +
>> +static int cxl_pci_alloc_irq_vectors(struct cxl_dev_state *cxlds)
>> +{
>> +	struct device *dev = cxlds->dev;
>> +	struct pci_dev *pdev = to_pci_dev(dev);
>> +	int rc, i, vectors = -1;
>> +
>> +	for (i = 0; i < ARRAY_SIZE(cxl_irq_cap_table); i++) {
>> +		int irq;
>> +
>> +		if (!cxl_irq_cap_table[i].get_max_msgnum)
>> +			continue;
>> +
>> +		irq = cxl_irq_cap_table[i].get_max_msgnum(cxlds);
>> +		vectors = max_t(int, irq, vectors);
>> +	}
>> +
>> +	if (vectors == -1)
>> +		return -EINVAL; /* no irq support whatsoever */
>> +
>> +	vectors++;
>
>This is pretty much what earlier versions of the DOE code did, with the
>exception of only having one get_max_msgnum() call defined (for DOE).  But
>there was a lot of debate about how to share vectors with the PCI layer, and
>eventually we got rid of it.  I'm still trying to figure it out.  Sorry for
>being slow.

That makes sense, thanks for the explanation. And no, not slow; it is _I_
who needs to go re-read the DOE stuff with more attention. But while I
knew this was the hardest part, all I really wanted was basic irq
support to add to the bg cmd handling series.

>Perhaps we do this for this series.  However, won't we have an issue if we want
>to support switch events?

If possible, could you elaborate more on this?

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH 0/9] CXL: Read and clear event logs
  2022-08-22 22:53   ` Ira Weiny
  2022-08-23 16:12     ` Davidlohr Bueso
@ 2022-08-24 10:07     ` Jonathan Cameron
  2022-09-01 18:10       ` Dave Jiang
  1 sibling, 1 reply; 51+ messages in thread
From: Jonathan Cameron @ 2022-08-24 10:07 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Davidlohr Bueso, Dan Williams, Alison Schofield, Vishal Verma,
	Ben Widawsky, Steven Rostedt, a.manzanares, linux-kernel,
	linux-cxl

On Mon, 22 Aug 2022 15:53:54 -0700
Ira Weiny <ira.weiny@intel.com> wrote:

> On Mon, Aug 22, 2022 at 09:18:02AM -0700, Davidlohr Bueso wrote:
> > On Fri, 12 Aug 2022, ira.weiny@intel.com wrote:
> >   
> > > From: Ira Weiny <ira.weiny@intel.com>
> > > 
> > > Event records inform the OS of various device events.  Events are not needed
> > > for any kernel operation but various user level software will want to track
> > > events.
> > > 
> > > Add event reporting through the trace event mechanism.  On driver load read and
> > > clear all device events.
> > > 
> > > Normally interrupts will trigger new events to be reported as they occur.
> > > Because the interrupt code is still being worked on this series provides a
> > > cxl-test mechanism to create a series of events and trigger the reporting of
> > > those events.  
> > 
> > Where is this irq code being worked on? I've asked about this for async mbox
> > commands, and Jonathan has also posted some code for the PMU implementation.  
> 
> I'm still trying to work out how to share irq's between PCI and CXL.  Mainly
> for DOE.
> 
> I thought that we could skip IRQ support for DOE completely and this would
> support your proposal below.  But I just found that:
> 
> "A device may interrupt the host when CDAT content changes using the MSI
> associated with this DOE Capability instance."

As of today that doesn't work because there is no status flag anywhere to let
you know that was the interrupt source.

It's been raised in appropriate places, but I can't say any more on that
until stuff is published.

Hence I'd not worry about that corner for now.

> 
> So I guess it needs to be supported at some point.
> 
> > 
> > Could we not just start with an initial MSI/MSI-X support? Then gradually
> > interested users can be added? So each "feature" would need to do implement
> > it's "get message number" and to install the isr just do the standard:
> > 
> >      irq = pci_irq_vector(pdev, num);
> >      irq_name = devm_kasprintf(dev, GFP_KERNEL, "%s_%s\n", dev_name(dev),
> > 			       cxl_irq_cap_table[feature].name);
> >      rc = devm_request_irq(dev, irq, isr_fn, IRQF_SHARED, irq_name, info);
> > 
> > The only complexity I see for this is to know the number of vectors to request
> > apriori, for which we'd have to get the larges value of all CXL features that
> > can support interrupts. Something like the following?  
> 
> Generally it seems ok but I have questions below.
> 
> > One thing I have not
> > considered in this is the DOE stuff.  
> 
> I think this is the harder thing to support because of needing to allow both
> the PCI layer and the CXL layer to create irqs.  Potentially at different
> times.

My reasoning on this is that IRQ creation has to be done by
the PCI device driver.  That may result in some juggling and late starting
or indeed restarting of DOE mailboxes once we can know the list of vectors.
(e.g. query them by polling, then a later driver register can request enabling
the DOE with an irq).
Or it needs the ability to do dynamic increasing of the requested IRQ vectors.

> 
> > 
> > Thanks,
> > Davidlohr
> > 
> > ------
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index 88e3a8e54b6a..b334d2f497c1 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -245,6 +245,8 @@ struct cxl_dev_state {
> > 	resource_size_t component_reg_phys;
> > 	u64 serial;
> > 
> > +	int irq_type; /* MSI-X, MSI */
> > +
> > 	struct xarray doe_mbs;
> > 
> > 	int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
> > diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> > index eec597dbe763..95f4b91f43b1 100644
> > --- a/drivers/cxl/cxlpci.h
> > +++ b/drivers/cxl/cxlpci.h
> > @@ -53,15 +53,6 @@
> >  #define	    CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK			GENMASK(15, 8)
> >  #define     CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK		GENMASK(31, 16)
> > 
> > -/* Register Block Identifier (RBI) */
> > -enum cxl_regloc_type {
> > -	CXL_REGLOC_RBI_EMPTY = 0,
> > -	CXL_REGLOC_RBI_COMPONENT,
> > -	CXL_REGLOC_RBI_VIRT,
> > -	CXL_REGLOC_RBI_MEMDEV,
> > -	CXL_REGLOC_RBI_TYPES
> > -};  
> 
> Why move this?
> 
> > -
> >  static inline resource_size_t cxl_regmap_to_base(struct pci_dev *pdev,
> > 						 struct cxl_register_map *map)
> >  {
> > @@ -75,4 +66,44 @@ int devm_cxl_port_enumerate_dports(struct cxl_port *port);
> >  struct cxl_dev_state;
> >  int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm);
> >  void read_cdat_data(struct cxl_port *port);
> > +
> > +#define CXL_IRQ_CAPABILITY_TABLE				\
> > +	C(ISOLATION, "isolation", NULL),			\
> > +	C(PMU, "pmu_overflow", NULL), /* per pmu instance */	\
> > +	C(MBOX, "mailbox", NULL), /* primary-only */		\
> > +	C(EVENT, "event", NULL),  
> 
> This is defining get_max_msgnum to NULL right?
> 
> > +
> > +#undef C
> > +#define C(a, b, c) CXL_IRQ_CAPABILITY_##a
> > +enum  { CXL_IRQ_CAPABILITY_TABLE };
> > +#undef C
> > +#define C(a, b, c) { b, c }
> > +/**
> > + * struct cxl_irq_cap - CXL feature that is capable of receiving MSI/MSI-X irqs.
> > + *
> > + * @name: Name of the device generating this interrupt.
> > + * @get_max_msgnum: Get the feature's largest interrupt message number. In cases
> > + *                  where there is only one instance it also indicates which
> > + *                  MSI/MSI-X vector is used for the interrupt message generated
> > + *                  in association with the feature. If the feature does not
> > + *                  have the Interrupt Supported bit set, then return -1.
> > + */
> > +struct cxl_irq_cap {
> > +	const char *name;
> > +	int (*get_max_msgnum)(struct cxl_dev_state *cxlds);
> > +};
> > +
> > +static const
> > +struct cxl_irq_cap cxl_irq_cap_table[] = { CXL_IRQ_CAPABILITY_TABLE };
> > +#undef C  
> 
> Why all this macro magic?

Agreed. I'm rarely persuaded it's a good idea to do this sort of trickery
and it definitely isn't worth the readability problems unless there are a
large number of users.

> 
> > +
> > +/* Register Block Identifier (RBI) */
> > +enum cxl_regloc_type {
> > +	CXL_REGLOC_RBI_EMPTY = 0,
> > +	CXL_REGLOC_RBI_COMPONENT,
> > +	CXL_REGLOC_RBI_VIRT,
> > +	CXL_REGLOC_RBI_MEMDEV,
> > +	CXL_REGLOC_RBI_TYPES
> > +};
> > +
> >  #endif /* __CXL_PCI_H__ */
> > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > index faeb5d9d7a7a..c0fe78e0559b 100644
> > --- a/drivers/cxl/pci.c
> > +++ b/drivers/cxl/pci.c
> > @@ -387,6 +387,52 @@ static int cxl_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
> > 	return rc;
> >  }
> > 
> > +static void cxl_pci_free_irq_vectors(void *data)
> > +{
> > +	pci_free_irq_vectors(data);
> > +}
> > +
> > +static int cxl_pci_alloc_irq_vectors(struct cxl_dev_state *cxlds)
> > +{
> > +	struct device *dev = cxlds->dev;
> > +	struct pci_dev *pdev = to_pci_dev(dev);
> > +	int rc, i, vectors = -1;
> > +
> > +	for (i = 0; i < ARRAY_SIZE(cxl_irq_cap_table); i++) {
> > +		int irq;
> > +
> > +		if (!cxl_irq_cap_table[i].get_max_msgnum)
> > +			continue;
> > +
> > +		irq = cxl_irq_cap_table[i].get_max_msgnum(cxlds);
> > +		vectors = max_t(int, irq, vectors);
> > +	}
> > +
> > +	if (vectors == -1)
> > +		return -EINVAL; /* no irq support whatsoever */
> > +
> > +	vectors++;  
> 
> This is pretty much what earlier versions of the DOE code did with the
> exception of only have 1 get_max_msgnum() calls defined (for DOE).  But there
> was a lot of debate about how to share vectors with the PCI layer.  And
> eventually we got rid of it.  I'm still trying to figure it out.  Sorry for
> being slow.

I'm not yet seeing a huge advantage in wrapping this up. For now a set of
linear calls to establish the max irq vector is more readable.  Sure,
down the line moving to this may make sense.
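
To make "a set of linear calls" concrete, here is a minimal user-space sketch.
Everything in it is hypothetical: struct dev_state and the three
*_max_msgnum() helpers stand in for whatever the driver would really query,
and a return of -1 means the feature has no interrupt support (the same
convention as the get_max_msgnum() kerneldoc quoted above).

```c
#include <assert.h>

/* Hypothetical stand-in for the driver state; not the real struct cxl_dev_state. */
struct dev_state {
	int mbox_msgnum;	/* -1 if the mailbox cannot interrupt */
	int event_msgnum;	/* -1 if the event logs cannot interrupt */
	int pmu_msgnum;		/* -1 if there is no PMU interrupt */
};

/* Hypothetical per-feature queries; each returns a message number or -1. */
static int mbox_max_msgnum(const struct dev_state *ds)  { return ds->mbox_msgnum; }
static int event_max_msgnum(const struct dev_state *ds) { return ds->event_msgnum; }
static int pmu_max_msgnum(const struct dev_state *ds)   { return ds->pmu_msgnum; }

static int max_int(int a, int b)
{
	return a > b ? a : b;
}

/*
 * Number of vectors to request: highest message number plus one,
 * or 0 when no feature supports interrupts at all.
 */
static int nr_vectors_needed(const struct dev_state *ds)
{
	int max = -1;

	assert(ds);

	/* One plain call per feature; no table, no macros. */
	max = max_int(max, mbox_max_msgnum(ds));
	max = max_int(max, event_max_msgnum(ds));
	max = max_int(max, pmu_max_msgnum(ds));

	return max + 1;
}
```

With only a handful of features this stays short, and adding one more is a
single extra line; a table only starts to pay off once the list grows.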

> 
> Perhaps we do this for this series.  However, won't we have an issue if we want
> to support switch events?

We 'could' extend existing stuff in the portdrv code (which is ultimately
where this general approach was copied from ;) but I suspect doing that
for non generic PCI stuff is going to be controversial.

That whole infrastructure in PCI may need a rewrite.

> 
> Ira
> 
> > +	rc = pci_alloc_irq_vectors(pdev, vectors, vectors, PCI_IRQ_MSIX);
> > +	if (rc < 0) {
> > +		rc = pci_alloc_irq_vectors(pdev, vectors, vectors, PCI_IRQ_MSI);
> > +		if (rc < 0)
> > +			return rc;
> > +
> > +		cxlds->irq_type = PCI_IRQ_MSI;
> > +	} else {
> > +		cxlds->irq_type = PCI_IRQ_MSIX;
> > +	}
> > +
> > +	if (rc != vectors) {
> > +		pci_err(pdev, "Not enough interrupts; use polling where supported\n");
> > +		/* Some got allocated; clean them up */
> > +		cxl_pci_free_irq_vectors(pdev);
> > +		return -ENOSPC;
> > +	}
> > +
> > +	return devm_add_action_or_reset(dev, cxl_pci_free_irq_vectors, pdev);
> > +}
> > +
> >  static void cxl_pci_destroy_doe(void *mbs)
> >  {
> > 	xa_destroy(mbs);
> > @@ -476,6 +522,9 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > 
> > 	cxlds->component_reg_phys = cxl_regmap_to_base(pdev, &map);
> > 
> > +	if (cxl_pci_alloc_irq_vectors(cxlds))
> > +		cxlds->irq_type = 0;
> > +
> > 	devm_cxl_pci_create_doe(cxlds);
> > 
> > 	rc = cxl_pci_setup_mailbox(cxlds);  


* Re: [RFC PATCH 1/9] cxl/mem: Implement Get Event Records command
  2022-08-13  5:32 ` [RFC PATCH 1/9] cxl/mem: Implement Get Event Records command ira.weiny
  2022-08-16 16:39   ` Steven Rostedt
  2022-08-17 22:54   ` Dave Jiang
@ 2022-08-24 15:50   ` Jonathan Cameron
  2022-09-07  4:28     ` Ira Weiny
  2 siblings, 1 reply; 51+ messages in thread
From: Jonathan Cameron @ 2022-08-24 15:50 UTC (permalink / raw)
  To: ira.weiny
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Fri, 12 Aug 2022 22:32:35 -0700
ira.weiny@intel.com wrote:

> From: Ira Weiny <ira.weiny@intel.com>
> 
> Event records are defined for CXL devices.  Each record is reported in
> one event log.  Devices are required to support the storage of at least
> one event record in each event log type.
Hi Ira,

Someone went and slipped in a new field in CXL r3.0.  Might be easier
just to add it now?

A few other comments inline.

> 
> Devices track event log overflow by incrementing a counter and tracking
> the time of the first and last overflow event seen.
> 
> Software queries events via the Get Event Record mailbox command; CXL
> v3.0 section 8.2.9.2.2.
rev3.0

You reference 3.0 but use 2.0 definitions below (I'm guessing this crossed
with spec release).

> 
> Issue the Get Event Record mailbox command on driver load.  Trace each
> record found, as well as any overflow conditions.  Only 1 event is
> requested for each query.  Optimization of multiple record queries is
> deferred.
I'd be tempted to make it easier by using a variable sized tail element and
an allocation, but fair enough that can come later.

> 
> This patch traces a raw event record only and leaves the specific event
> record types to subsequent patches.
> 
> NOTE: checkpatch is not completely happy with the tracing part of this
> patch but AFAICT it is correct.  I'm open to suggestions if I've done
> something wrong.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>


> ---
>  MAINTAINERS                       |   1 +
>  drivers/cxl/core/mbox.c           |  60 ++++++++++++++
>  drivers/cxl/cxlmem.h              |  66 ++++++++++++++++
>  include/trace/events/cxl-events.h | 127 ++++++++++++++++++++++++++++++
>  include/uapi/linux/cxl_mem.h      |   1 +
>  5 files changed, 255 insertions(+)
>  create mode 100644 include/trace/events/cxl-events.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 54fa6e2059de..1cb9cec31009 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -5014,6 +5014,7 @@ M:	Dan Williams <dan.j.williams@intel.com>
>  L:	linux-cxl@vger.kernel.org
>  S:	Maintained
>  F:	drivers/cxl/
> +F:	include/trace/events/cxl*.h
>  F:	include/uapi/linux/cxl_mem.h
>  
>  CONEXANT ACCESSRUNNER USB DRIVER
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 16176b9278b4..2cceed8608dc 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -7,6 +7,9 @@
>  #include <cxlmem.h>
>  #include <cxl.h>
>  
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/cxl-events.h>
> +
>  #include "core.h"
>  
>  static bool cxl_raw_allow_all;
> @@ -48,6 +51,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
>  	CXL_CMD(RAW, CXL_VARIABLE_PAYLOAD, CXL_VARIABLE_PAYLOAD, 0),
>  #endif
>  	CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
> +	CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
>  	CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
>  	CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
>  	CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
> @@ -704,6 +708,62 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
>  
> +static int cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> +				   enum cxl_event_log_type type)
> +{
> +	struct cxl_get_event_payload payload;
> +
> +	do {
> +		u8 log_type = type;
> +		u16 record_count;
> +		int rc;
> +
> +		rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVENT_RECORD,
> +				       &log_type, sizeof(log_type),
> +				       &payload, sizeof(payload));
> +		if (rc)
> +			return rc;
> +
> +		record_count = le16_to_cpu(payload.record_count);
> +		if (record_count > 0)

If it is anything other than 1 you have a problem.  So for now
I would check for that.
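
A rough user-space sketch of the check being suggested, under the assumption
that the output buffer has room for exactly one record. The struct and enum
names are made up for illustration and record_count is taken to be in CPU
byte order already:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of the reply; not the patch's payload struct. */
struct event_reply {
	uint8_t flags;
	uint16_t record_count;	/* already converted to CPU byte order */
};

enum reply_status {
	REPLY_EMPTY,		/* nothing to trace */
	REPLY_ONE_RECORD,	/* the single record the buffer can hold */
	REPLY_UNEXPECTED,	/* device reports more than there was room for */
};

/* Classify a reply to a query that only provided space for one record. */
static enum reply_status classify_reply(const struct event_reply *reply)
{
	assert(reply);

	switch (reply->record_count) {
	case 0:
		return REPLY_EMPTY;
	case 1:
		return REPLY_ONE_RECORD;
	default:
		return REPLY_UNEXPECTED;
	}
}
```

In the driver the REPLY_UNEXPECTED case would presumably become a warning
and an early return rather than tracing a record that may be incomplete.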

> +			trace_cxl_event(dev_name(cxlds->dev), type,
> +					&payload.record);
> +
> +		if (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW)
> +			trace_cxl_event_overflow(dev_name(cxlds->dev), type,
> +						 &payload);
> +
> +	} while (payload.flags & CXL_GET_EVENT_FLAG_MORE_RECORDS);
> +
> +	return 0;
> +}
> +
> +/**
> + * cxl_mem_get_event_records - Get Event Records from the device
> + * @cxlds: The device data for the operation
> + *
> + * Retrieve all event records available on the device and report them as trace
> + * events.
> + *
> + * See CXL v3.0 @8.2.9.2.2 Get Event Records
> + */
> +void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
> +{
> +	struct device *dev = cxlds->dev;
> +	enum cxl_event_log_type log_type;
> +
> +	for (log_type = CXL_EVENT_TYPE_INFO;
> +	     log_type < CXL_EVENT_TYPE_MAX; log_type++) {
> +		int rc;
> +
> +		rc = cxl_mem_get_records_log(cxlds, log_type);
> +		if (rc)
> +			dev_err(dev, "Failed to query %s Event Logs : %d",
> +				cxl_event_log_type_str(log_type), rc);
> +	}
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, CXL);
> +
>  /**
>   * cxl_mem_get_partition_info - Get partition info
>   * @cxlds: The device data for the operation
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 88e3a8e54b6a..f83634f3bc8d 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -4,6 +4,7 @@
>  #define __CXL_MEM_H__
>  #include <uapi/linux/cxl_mem.h>
>  #include <linux/cdev.h>
> +#include <linux/uuid.h>
>  #include "cxl.h"
>  
>  /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
> @@ -253,6 +254,7 @@ struct cxl_dev_state {
>  enum cxl_opcode {
>  	CXL_MBOX_OP_INVALID		= 0x0000,
>  	CXL_MBOX_OP_RAW			= CXL_MBOX_OP_INVALID,
> +	CXL_MBOX_OP_GET_EVENT_RECORD	= 0x0100,
>  	CXL_MBOX_OP_GET_FW_INFO		= 0x0200,
>  	CXL_MBOX_OP_ACTIVATE_FW		= 0x0202,
>  	CXL_MBOX_OP_GET_SUPPORTED_LOGS	= 0x0400,
> @@ -322,6 +324,69 @@ struct cxl_mbox_identify {
>  	u8 qos_telemetry_caps;
>  } __packed;
>  
> +/*
> + * Common Event Record Format
> + * CXL v3.0 section 8.2.9.2.1; Table 8-42
> + */
> +struct cxl_event_record_hdr {
> +	uuid_t id;
> +	__le32 flags_length;

Can you split this into a u8 and
u8[3] then use the get_unaligned_le24 accessor
where appropriate? Oh for 24bit types ;)
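
Roughly what that split could look like, as a user-space sketch. The layout
assumes what the tracepoint in this patch already decodes (length in the low
byte, flags in the upper three); load_le24() is a local stand-in for the
kernel's get_unaligned_le24() and struct event_hdr_prefix is a made-up name
covering only the first fields of the header:

```c
#include <assert.h>
#include <stdint.h>

/* First fields of the common header, with the 32-bit word split in two. */
struct event_hdr_prefix {
	uint8_t id[16];		/* UUID bytes */
	uint8_t length;
	uint8_t flags[3];	/* 24-bit little-endian value */
} __attribute__((packed));

static_assert(sizeof(struct event_hdr_prefix) == 20, "unexpected padding");

/* User-space stand-in for get_unaligned_le24(): assemble 3 bytes, LSB first. */
static uint32_t load_le24(const uint8_t *p)
{
	assert(p);

	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
}

/* Read the 24-bit flags field without any shifting or masking by the caller. */
static uint32_t hdr_flags(const struct event_hdr_prefix *hdr)
{
	assert(hdr);

	return load_le24(hdr->flags);
}
```

The tracepoint would then assign length directly and take the flags from the
accessor, instead of shifting and masking a 32-bit word.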

> +	__le16 handle;
> +	__le16 related_handle;
> +	__le64 timestamp;
> +	__le64 reserved1;

As below. Maintenance op from CXL 3.0?  Seems easy
to add now rather than needing a change later.

> +	__le64 reserved2;
> +} __packed;
> +
> +#define EVENT_RECORD_DATA_LENGTH 0x50
> +struct cxl_event_record_raw {
> +	struct cxl_event_record_hdr hdr;
> +	u8 data[EVENT_RECORD_DATA_LENGTH];
> +} __packed;
> +
> +/*
> + * Get Event Records output payload
> + * CXL v3.0 section 8.2.9.2.2; Table 8-50

r3.0 :) (just drop the v and go with 3.0 would be my preference).

> + *
> + * Space given for 1 record
> + */
> +#define CXL_GET_EVENT_FLAG_OVERFLOW		BIT(0)
> +#define CXL_GET_EVENT_FLAG_MORE_RECORDS	BIT(1)
> +struct cxl_get_event_payload {
> +	u8 flags;
> +	u8 reserved1;
> +	__le16 overflow_err_count;
> +	__le64 first_overflow_timestamp;
> +	__le64 last_overflow_timestamp;
> +	__le16 record_count;
> +	u8 reserved2[0xa];
> +	struct cxl_event_record_raw record;
> +} __packed;
> +
> +enum cxl_event_log_type {
> +	CXL_EVENT_TYPE_INFO = 0x00,
> +	CXL_EVENT_TYPE_WARN,
> +	CXL_EVENT_TYPE_FAIL,
> +	CXL_EVENT_TYPE_FATAL,

Worth putting Dynamic capacity in now? Up to you.

> +	CXL_EVENT_TYPE_MAX
> +};

Blank line for readability.

> +static inline char *cxl_event_log_type_str(enum cxl_event_log_type type)
> +{
> +	switch (type) {
> +	case CXL_EVENT_TYPE_INFO:
> +		return "Informational";
> +	case CXL_EVENT_TYPE_WARN:
> +		return "Warning";
> +	case CXL_EVENT_TYPE_FAIL:
> +		return "Failure";
> +	case CXL_EVENT_TYPE_FATAL:
> +		return "Fatal";
> +	default:
> +		break;
> +	}
> +	return "<unknown>";
> +}
> +
>  struct cxl_mbox_get_partition_info {
>  	__le64 active_volatile_cap;
>  	__le64 active_persistent_cap;
> @@ -381,6 +446,7 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds);
>  struct cxl_dev_state *cxl_dev_state_create(struct device *dev);
>  void set_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
>  void clear_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
> +void cxl_mem_get_event_records(struct cxl_dev_state *cxlds);
>  #ifdef CONFIG_CXL_SUSPEND
>  void cxl_mem_active_inc(void);
>  void cxl_mem_active_dec(void);
> diff --git a/include/trace/events/cxl-events.h b/include/trace/events/cxl-events.h
> new file mode 100644
> index 000000000000..f4baeae66cf3
> --- /dev/null
> +++ b/include/trace/events/cxl-events.h
> @@ -0,0 +1,127 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM cxl_events
> +
> +#if !defined(_CXL_TRACE_EVENTS_H) ||  defined(TRACE_HEADER_MULTI_READ)
> +#define _CXL_TRACE_EVENTS_H
> +
> +#include <linux/tracepoint.h>
> +
> +#define EVENT_LOGS					\
> +	EM(CXL_EVENT_TYPE_INFO,		"Info")		\
> +	EM(CXL_EVENT_TYPE_WARN,		"Warning")	\
> +	EM(CXL_EVENT_TYPE_FAIL,		"Failure")	\
> +	EM(CXL_EVENT_TYPE_FATAL,	"Fatal")	\
> +	EMe(CXL_EVENT_TYPE_MAX,		"<undefined>")

Hmm. 4 is defined in CXL 3.0, but I'd assume we won't use tracepoints for
dynamic capacity events so I guess it doesn't matter.

> +
> +/*
> + * First define the enums in the above macros to be exported to userspace via
> + * TRACE_DEFINE_ENUM().
> + */
> +#undef EM
> +#undef EMe
> +#define EM(a, b)	TRACE_DEFINE_ENUM(a);
> +#define EMe(a, b)	TRACE_DEFINE_ENUM(a);
> +
> +EVENT_LOGS
> +#define show_log_type(type) __print_symbolic(type, EVENT_LOGS)
> +
> +/*
> + * Now redefine the EM and EMe macros to map the enums to the strings that will
> + * be printed in the output
> + */
> +#undef EM
> +#undef EMe
> +#define EM(a, b)        {a, b},
> +#define EMe(a, b)       {a, b}
> +
> +TRACE_EVENT(cxl_event_overflow,
> +
> +	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> +		 struct cxl_get_event_payload *payload),
> +
> +	TP_ARGS(dev_name, log, payload),
> +
> +	TP_STRUCT__entry(
> +		__string(dev_name, dev_name)
> +		__field(int, log)
> +		__field(u16, count)
> +		__field(u64, first)
> +		__field(u64, last)
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(dev_name, dev_name);
> +		__entry->log = log;
> +		__entry->count = le16_to_cpu(payload->overflow_err_count);
> +		__entry->first = le64_to_cpu(payload->first_overflow_timestamp);
> +		__entry->last = le64_to_cpu(payload->last_overflow_timestamp);
> +	),
> +
> +	TP_printk("%s: EVENT LOG %s OVERFLOW %u records from %llu to %llu",
> +		__get_str(dev_name), show_log_type(__entry->log),
> +		__entry->count, __entry->first, __entry->last)
> +
> +);
> +
> +/*
> + * Common Event Record Format
> + * CXL v2.0 section 8.2.9.1.1; Table 153
> + */
> +#define CXL_EVENT_RECORD_FLAG_PERMANENT		BIT(2)
> +#define CXL_EVENT_RECORD_FLAG_MAINT_NEEDED	BIT(3)
> +#define CXL_EVENT_RECORD_FLAG_PERF_DEGRADED	BIT(4)
> +#define CXL_EVENT_RECORD_FLAG_HW_REPLACE	BIT(5)
> +#define show_hdr_flags(flags)	__print_flags(flags, " | ",			   \
> +	{ CXL_EVENT_RECORD_FLAG_PERMANENT,	"Permanent Condition"		}, \
> +	{ CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,	"Maintanance Needed"		}, \

Maintenance

> +	{ CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,	"Performance Degraded"		}, \
> +	{ CXL_EVENT_RECORD_FLAG_HW_REPLACE,	"Hardware Replacement Needed"	}  \
> +)
> +
> +TRACE_EVENT(cxl_event,
> +
> +	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> +		 struct cxl_event_record_raw *rec),
> +
> +	TP_ARGS(dev_name, log, rec),
> +
> +	TP_STRUCT__entry(
> +		__string(dev_name, dev_name)
> +		__field(int, log)
> +		__array(u8, id, UUID_SIZE)
> +		__field(u32, flags)
> +		__field(u16, handle)
> +		__field(u16, related_handle)
> +		__field(u64, timestamp)
> +		__array(u8, data, EVENT_RECORD_DATA_LENGTH)
> +		__field(u8, length)

Do we want the maintenance operation class added in Table 8-42 from CXL 3.0?
(only noticed because I happen to have that spec revision open rather than 2.0).

> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(dev_name, dev_name);
> +		memcpy(__entry->id, &rec->hdr.id, UUID_SIZE);
> +		__entry->log = log;
> +		__entry->flags = le32_to_cpu(rec->hdr.flags_length) >> 8;
> +		__entry->length = le32_to_cpu(rec->hdr.flags_length) & 0xFF;
> +		__entry->handle = le16_to_cpu(rec->hdr.handle);
> +		__entry->related_handle = le16_to_cpu(rec->hdr.related_handle);
> +		__entry->timestamp = le64_to_cpu(rec->hdr.timestamp);
> +		memcpy(__entry->data, &rec->data, EVENT_RECORD_DATA_LENGTH);
> +	),
> +
> +	TP_printk("%s: %s time=%llu id=%pUl handle=%x related_handle=%x hdr_flags='%s' " \
> +		  ": %s",
> +		__get_str(dev_name), show_log_type(__entry->log),
> +		__entry->timestamp, __entry->id, __entry->handle,
> +		__entry->related_handle, show_hdr_flags(__entry->flags),
> +		__print_hex(__entry->data, EVENT_RECORD_DATA_LENGTH)
> +		)
> +);
> +
> +#endif /* _CXL_TRACE_EVENTS_H */
> +
> +/* This part must be outside protection */
> +#undef TRACE_INCLUDE_FILE
> +#define TRACE_INCLUDE_FILE cxl-events
> +#include <trace/define_trace.h>

* Re: [RFC PATCH 2/9] cxl/mem: Implement Clear Event Records command
  2022-08-13  5:32 ` [RFC PATCH 2/9] cxl/mem: Implement Clear " ira.weiny
@ 2022-08-24 15:55   ` Jonathan Cameron
  2022-09-09 21:35     ` Ira Weiny
  0 siblings, 1 reply; 51+ messages in thread
From: Jonathan Cameron @ 2022-08-24 15:55 UTC (permalink / raw)
  To: ira.weiny
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Fri, 12 Aug 2022 22:32:36 -0700
ira.weiny@intel.com wrote:

> From: Ira Weiny <ira.weiny@intel.com>
> 
> CXL v3.0 section 8.2.9.2.3 defines the Clear Event Records mailbox
> command.  After an event record is read it needs to be cleared from the
> event log.
> 
> Implement cxl_clear_event_record() and call it for each record retrieved
> from the device.
> 
> Each record is cleared individually.  A clear all bit is specified but
> events could arrive between a get and the final clear all operation.
> Therefore each event is cleared specifically.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Trivial suggestions inline, but other than that LGTM

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  drivers/cxl/core/mbox.c      | 31 ++++++++++++++++++++++++++++---
>  drivers/cxl/cxlmem.h         | 15 +++++++++++++++
>  include/uapi/linux/cxl_mem.h |  1 +
>  3 files changed, 44 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 2cceed8608dc..493f5ceb5d1c 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -52,6 +52,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
>  #endif
>  	CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
>  	CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
> +	CXL_CMD(CLEAR_EVENT_RECORD, CXL_VARIABLE_PAYLOAD, 0, 0),
>  	CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
>  	CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
>  	CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
> @@ -708,6 +709,26 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
>  
> +static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
> +				  enum cxl_event_log_type log,
> +				  __le16 handle)
> +{
> +	struct cxl_mbox_clear_event_payload payload;
> +	int rc;
> +
> +	memset(&payload, 0, sizeof(payload));

Could just do payload = {};

Though as you are setting stuff, why not just do

payload = {
	.event_log = log,
	.nr_recs = 1,
	.handle = handle,
};
and let the compiler zero anything else (I think there are no holes to complicate
things).
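
For reference, a stand-alone sketch of that initializer pattern. The struct
copies the field order from the patch but uses user-space types, and
make_clear_payload() is a hypothetical helper; the static_assert is one way
to confirm the "no holes" guess on the compiler in use:

```c
#include <assert.h>
#include <stdint.h>

/* Same field order as the patch's payload; types are user-space stand-ins. */
struct clear_event_payload {
	uint8_t event_log;
	uint8_t clear_flags;
	uint8_t nr_recs;
	uint8_t reserved[3];
	uint16_t handle;	/* little-endian on the wire */
};

/* 1 + 1 + 1 + 3 + 2 bytes: any padding would make this larger than 8. */
static_assert(sizeof(struct clear_event_payload) == 8, "layout has holes");

static struct clear_event_payload make_clear_payload(uint8_t log, uint16_t handle)
{
	/* Members not named here (clear_flags, reserved) are zero-initialized. */
	struct clear_event_payload payload = {
		.event_log = log,
		.nr_recs = 1,
		.handle = handle,
	};

	return payload;
}
```

C guarantees that members omitted from a designated initializer are zeroed,
so the explicit memset() is no longer needed for the named fields.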

> +	payload.event_log = log;
> +	payload.nr_recs = 1;
> +	payload.handle = handle;
> +
> +	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_CLEAR_EVENT_RECORD,
> +			       &payload, sizeof(payload), NULL, 0);

return cxl_mbox_send_cmd() and drop rc definition.


> +	if (rc)
> +		return rc;
> +
> +	return 0;
> +}
> +
>  static int cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
>  				   enum cxl_event_log_type type)
>  {
> @@ -725,9 +746,12 @@ static int cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
>  			return rc;
>  
>  		record_count = le16_to_cpu(payload.record_count);
> -		if (record_count > 0)
> +		if (record_count > 0) {
>  			trace_cxl_event(dev_name(cxlds->dev), type,
>  					&payload.record);
> +			cxl_clear_event_record(cxlds, type,
> +					       payload.record.hdr.handle);
> +		}
>  
>  		if (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW)
>  			trace_cxl_event_overflow(dev_name(cxlds->dev), type,
> @@ -742,10 +766,11 @@ static int cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
>   * cxl_mem_get_event_records - Get Event Records from the device
>   * @cxlds: The device data for the operation
>   *
> - * Retrieve all event records available on the device and report them as trace
> - * events.
> + * Retrieve all event records available on the device, report them as trace
> + * events, and clear them.
>   *
>   * See CXL v3.0 @8.2.9.2.2 Get Event Records
> + * See CXL v3.0 @8.2.9.2.3 Clear Event Records
>   */
>  void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
>  {
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index f83634f3bc8d..5506e7210cf6 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -255,6 +255,7 @@ enum cxl_opcode {
>  	CXL_MBOX_OP_INVALID		= 0x0000,
>  	CXL_MBOX_OP_RAW			= CXL_MBOX_OP_INVALID,
>  	CXL_MBOX_OP_GET_EVENT_RECORD	= 0x0100,
> +	CXL_MBOX_OP_CLEAR_EVENT_RECORD	= 0x0101,
>  	CXL_MBOX_OP_GET_FW_INFO		= 0x0200,
>  	CXL_MBOX_OP_ACTIVATE_FW		= 0x0202,
>  	CXL_MBOX_OP_GET_SUPPORTED_LOGS	= 0x0400,
> @@ -387,6 +388,20 @@ static inline char *cxl_event_log_type_str(enum cxl_event_log_type type)
>  	return "<unknown>";
>  }
>  
> +/*
> + * Clear Event Records input payload
> + * CXL v3.0 section 8.2.9.2.3; Table 8-51
> + *
> + * Space given for 1 record
> + */
> +struct cxl_mbox_clear_event_payload {
> +	u8 event_log;		/* enum cxl_event_log_type */
> +	u8 clear_flags;
> +	u8 nr_recs;		/* 1 for this struct */
> +	u8 reserved[3];
> +	__le16 handle;
> +};
> +
>  struct cxl_mbox_get_partition_info {
>  	__le64 active_volatile_cap;
>  	__le64 active_persistent_cap;
> diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
> index 70459be5bdd4..7c1ad8062792 100644
> --- a/include/uapi/linux/cxl_mem.h
> +++ b/include/uapi/linux/cxl_mem.h
> @@ -25,6 +25,7 @@
>  	___C(RAW, "Raw device command"),                                  \
>  	___C(GET_SUPPORTED_LOGS, "Get Supported Logs"),                   \
>  	___C(GET_EVENT_RECORD, "Get Event Record"),                       \
> +	___C(CLEAR_EVENT_RECORD, "Clear Event Record"),                   \
>  	___C(GET_FW_INFO, "Get FW Info"),                                 \
>  	___C(GET_PARTITION_INFO, "Get Partition Information"),            \
>  	___C(GET_LSA, "Get Label Storage Area"),                          \


* Re: [RFC PATCH 3/9] cxl/mem: Clear events on driver load
  2022-08-13  5:32 ` [RFC PATCH 3/9] cxl/mem: Clear events on driver load ira.weiny
@ 2022-08-24 15:57   ` Jonathan Cameron
  0 siblings, 0 replies; 51+ messages in thread
From: Jonathan Cameron @ 2022-08-24 15:57 UTC (permalink / raw)
  To: ira.weiny
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Fri, 12 Aug 2022 22:32:37 -0700
ira.weiny@intel.com wrote:

> From: Ira Weiny <ira.weiny@intel.com>
> 
> The information contained in the events prior to the driver loading can
> be queried at any time through other mailbox commands.
> 
> Ensure a clean slate of events by reading and clearing the events.  The
> events are sent to the trace buffer but it is not anticipated to have
> anyone listening to it at driver load time.

I'm not totally sold on it being a good idea to drop records on binding
the device.  Let's see what others think...

> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> ---
>  drivers/cxl/pci.c            | 2 ++
>  tools/testing/cxl/test/mem.c | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index faeb5d9d7a7a..5f1b492bd388 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -498,6 +498,8 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  	if (IS_ERR(cxlmd))
>  		return PTR_ERR(cxlmd);
>  
> +	cxl_mem_get_event_records(cxlds);
> +
>  	if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
>  		rc = devm_cxl_add_nvdimm(&pdev->dev, cxlmd);
>  
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index aa2df3a15051..e2f5445d24ff 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -285,6 +285,8 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
>  	if (IS_ERR(cxlmd))
>  		return PTR_ERR(cxlmd);
>  
> +	cxl_mem_get_event_records(cxlds);
> +
>  	if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
>  		rc = devm_cxl_add_nvdimm(dev, cxlmd);
>  


* Re: [RFC PATCH 4/9] cxl/mem: Trace General Media Event Record
  2022-08-13  5:32 ` [RFC PATCH 4/9] cxl/mem: Trace General Media Event Record ira.weiny
@ 2022-08-24 16:11   ` Jonathan Cameron
  2022-09-12 22:38     ` Ira Weiny
  0 siblings, 1 reply; 51+ messages in thread
From: Jonathan Cameron @ 2022-08-24 16:11 UTC (permalink / raw)
  To: ira.weiny
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Fri, 12 Aug 2022 22:32:38 -0700
ira.weiny@intel.com wrote:

> From: Ira Weiny <ira.weiny@intel.com>
> 
> CXL v3.0 section 8.2.9.2.1.1 defines the General Media Event Record.
> 
> Determine if the event read is a general media record and if so trace
> the record.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
A few trivial things inline...

> 
> ---
> A couple of specification questions I've had.
> 
> 1) The component id is not specified as a UUID or any particular
> format.  It is therefore reported as a byte array.  Is this intentional?
Given spec gives "device specific" I'm guessing it's intentional.

> 
> 2) This record has a very odd byte layout with a 16 bit field
>    (validity_flags) landing on a 3 byte boundary and a 3 byte bit field
>    (device) landing on a 7 byte boundary.

oops. Guess 'we' weren't paying attention.  Stuck with it now.
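
One way to live with the layout is to declare the misaligned fields as byte
arrays and assemble them explicitly. The sketch below is user-space only and
rests on assumptions: little-endian byte order (as for the other mailbox
fields), a 3-byte device field as described above, and made-up names
(struct gen_media_tail, load_le16(), gen_media_validity()). In the kernel,
get_unaligned_le16() would play the role of load_le16().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEV_SIZE 3	/* assumed size of the device field */

/* The fields that follow phys_addr, with multi-byte fields as byte arrays. */
struct gen_media_tail {
	uint8_t descriptor;
	uint8_t type;
	uint8_t transaction_type;
	uint8_t validity_flags[2];	/* 16-bit value at an odd offset */
	uint8_t channel;
	uint8_t rank;
	uint8_t device[DEV_SIZE];
};

/* The odd offsets described in the patch notes, relative to this struct. */
static_assert(offsetof(struct gen_media_tail, validity_flags) == 3, "offset");
static_assert(offsetof(struct gen_media_tail, device) == 7, "offset");

/* Assemble a little-endian 16-bit value one byte at a time. */
static uint16_t load_le16(const uint8_t *p)
{
	assert(p);

	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint16_t gen_media_validity(const struct gen_media_tail *t)
{
	assert(t);

	return load_le16(t->validity_flags);
}
```

Because every member is a byte or byte array, the struct needs no packing
attribute and no access is ever misaligned; the device field could be read
the same way with a 24-bit helper.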

> 
> I've made my best guess as to how the endianess of these fields should
> be resolved.  But I'm happy to hear from other folks if what I have done
> is wrong.

Mailing list probably isn't the place to look for clarification on this.

> 
> struct cxl_evt_gen_media {
> 	struct cxl_event_record_hdr hdr;
> 	__le64 phys_addr;
> 	u8 descriptor;
> 	u8 type;
> 	u8 transaction_type;
> 	u16 validity_flags;			/* ??? */
> 	u8 channel;
> 	u8 rank;
> 	u8 device[CXL_EVT_GEN_MED_DEV_SIZE];	/* ??? */
> 	u8 component_id[CXL_EVT_GEN_MED_COMP_ID_SIZE];
> } __packed;
> ---
>  drivers/cxl/core/mbox.c           |  30 ++++++-
>  drivers/cxl/cxlmem.h              |  19 +++++
>  include/trace/events/cxl-events.h | 125 ++++++++++++++++++++++++++++++
>  3 files changed, 172 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 493f5ceb5d1c..0e433f072163 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -709,6 +709,32 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
>  
> +/*
> + * General Media Event Record
> + * CXL v3.0 Section 8.2.9.2.1.1; Table 8-43
> + */
> +static const uuid_t gen_media_event_uuid =
> +	UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
> +		  0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6);
> +
> +static void cxl_trace_event_record(const char *dev_name,
> +				   enum cxl_event_log_type type,
> +				   struct cxl_get_event_payload *payload)
> +{
> +	uuid_t *id = &payload->record.hdr.id;
> +
> +	if (uuid_equal(id, &gen_media_event_uuid)) {
> +		struct cxl_evt_gen_media *rec =
> +				(struct cxl_evt_gen_media *)&payload->record;
> +
> +		trace_cxl_gen_media_event(dev_name, type, rec);
> +		return;
> +	}
> +
> +	/* For unknown record types print just the header */
> +	trace_cxl_event(dev_name, type, &payload->record);
> +}
> +
>  static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
>  				  enum cxl_event_log_type log,
>  				  __le16 handle)
> @@ -747,8 +773,8 @@ static int cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
>  
>  		record_count = le16_to_cpu(payload.record_count);
>  		if (record_count > 0) {
> -			trace_cxl_event(dev_name(cxlds->dev), type,
> -					&payload.record);
> +			cxl_trace_event_record(dev_name(cxlds->dev), type,
> +					       &payload);
>  			cxl_clear_event_record(cxlds, type,
>  					       payload.record.hdr.handle);
>  		}
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 5506e7210cf6..33669459ae4b 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -402,6 +402,25 @@ struct cxl_mbox_clear_event_payload {
>  	__le16 handle;
>  };
>  
> +/*
> + * General Media Event Record
> + * CXL v3.0 Section 8.2.9.2.1.1; Table 8-43
> + */
> +#define CXL_EVT_GEN_MED_DEV_SIZE	3
> +#define CXL_EVT_GEN_MED_COMP_ID_SIZE	0x10
> +struct cxl_evt_gen_media {
> +	struct cxl_event_record_hdr hdr;
> +	__le64 phys_addr;
> +	u8 descriptor;
> +	u8 type;
> +	u8 transaction_type;
> +	u16 validity_flags;
> +	u8 channel;
> +	u8 rank;
> +	u8 device[CXL_EVT_GEN_MED_DEV_SIZE];
> +	u8 component_id[CXL_EVT_GEN_MED_COMP_ID_SIZE];
> +} __packed;
> +
>  struct cxl_mbox_get_partition_info {
>  	__le64 active_volatile_cap;
>  	__le64 active_persistent_cap;
> diff --git a/include/trace/events/cxl-events.h b/include/trace/events/cxl-events.h
> index f4baeae66cf3..b51c51fd4e62 100644
> --- a/include/trace/events/cxl-events.h
> +++ b/include/trace/events/cxl-events.h
> @@ -119,6 +119,131 @@ TRACE_EVENT(cxl_event,
>  		)
>  );
>  
> +/*
> + * General Media Event Record - GMER
> + * CXL v2.0 Section 8.2.9.1.1.1; Table 154
> + */
> +#define CXL_GMER_PHYS_ADDR_VOLATILE			BIT(0)
> +#define CXL_GMER_PHYS_ADDR_MASK				0x3f

Inverse of mask is confusing. Just specify the full mask.
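
A minimal user-space sketch of that suggestion, assuming bits [5:0] of the
field are flags and bits [63:6] are the address, as the 0x3f in the hunk
implies.  The SKETCH_* names and helpers are hypothetical; in kernel code
GENMASK_ULL(63, 6) would be the usual way to spell the mask.

```c
#include <stdint.h>

/* Hypothetical names; the patch itself defines the mask as 0x3f. */
#define SKETCH_PHYS_ADDR_VOLATILE	(UINT64_C(1) << 0)
/* The mask selects the address bits directly, so callers need no '~'. */
#define SKETCH_PHYS_ADDR_MASK		(~UINT64_C(0x3f))

/* Return only the address bits of the raw 64-bit field. */
static inline uint64_t sketch_phys_addr(uint64_t raw)
{
	return raw & SKETCH_PHYS_ADDR_MASK;
}

/* Return 1 when the volatile flag (bit 0) is set, 0 otherwise. */
static inline int sketch_is_volatile(uint64_t raw)
{
	return (raw & SKETCH_PHYS_ADDR_VOLATILE) != 0;
}
```

With the mask defined this way the TP_printk() argument would read
"phys_addr & MASK" rather than "phys_addr & ~MASK".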

> +
> +#define CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT		BIT(0)
> +#define CXL_GMER_EVT_DESC_THRESHOLD_EVENT		BIT(1)
> +#define CXL_GMER_EVT_DESC_POISON_LIST_OVERFLOW		BIT(2)
> +#define show_event_desc_flags(flags)	__print_flags(flags, "|",		   \
> +	{ CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT,		"Uncorrectable Event"	}, \
> +	{ CXL_GMER_EVT_DESC_THRESHOLD_EVENT,		"Threshold event"	}, \
> +	{ CXL_GMER_EVT_DESC_POISON_LIST_OVERFLOW,	"Poison List Overflow"	}  \
> +)
> +
> +#define CXL_GMER_MEM_EVT_TYPE_ECC_ERROR			0x00
> +#define CXL_GMER_MEM_EVT_TYPE_INV_ADDR			0x01
> +#define CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR		0x02
> +#define show_mem_event_type(type)	__print_symbolic(type,			\
> +	{ CXL_GMER_MEM_EVT_TYPE_ECC_ERROR,		"ECC Error" },		\
> +	{ CXL_GMER_MEM_EVT_TYPE_INV_ADDR,		"Invalid Address" },	\
> +	{ CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,	"Data Path Error" }	\
> +)
> +
> +#define CXL_GMER_TRANS_UNKNOWN				0x00
> +#define CXL_GMER_TRANS_HOST_READ			0x01
> +#define CXL_GMER_TRANS_HOST_WRITE			0x02
> +#define CXL_GMER_TRANS_HOST_SCAN_MEDIA			0x03
> +#define CXL_GMER_TRANS_HOST_INJECT_POISON		0x04
> +#define CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB		0x05
> +#define CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT	0x06
> +#define show_trans_type(type)	__print_symbolic(type,					\
> +	{ CXL_GMER_TRANS_UNKNOWN,			"Unknown" },			\
> +	{ CXL_GMER_TRANS_HOST_READ,			"Host Read" },			\
> +	{ CXL_GMER_TRANS_HOST_WRITE,			"Host Write" },			\
> +	{ CXL_GMER_TRANS_HOST_SCAN_MEDIA,		"Host Scan Media" },		\
> +	{ CXL_GMER_TRANS_HOST_INJECT_POISON,		"Host Inject Poison" },		\
> +	{ CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB,		"Internal Media Scrub" },	\
> +	{ CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT,	"Internal Media Management" }	\
> +)
> +
> +#define CXL_GMER_VALID_CHANNEL				BIT(0)
> +#define CXL_GMER_VALID_RANK				BIT(1)
> +#define CXL_GMER_VALID_DEVICE				BIT(2)
> +#define CXL_GMER_VALID_COMPONENT			BIT(3)
> +#define show_valid_flags(flags)	__print_flags(flags, "|",		   \
> +	{ CXL_GMER_VALID_CHANNEL,			"CHANNEL"	}, \
> +	{ CXL_GMER_VALID_RANK,				"RANK"		}, \
> +	{ CXL_GMER_VALID_DEVICE,			"DEVICE"	}, \
> +	{ CXL_GMER_VALID_COMPONENT,			"COMPONENT"	}  \
> +)
> +
> +TRACE_EVENT(cxl_gen_media_event,
> +
> +	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> +		 struct cxl_evt_gen_media *rec),
> +
> +	TP_ARGS(dev_name, log, rec),
> +
> +	TP_STRUCT__entry(
> +		/* Common */
> +		__string(dev_name, dev_name)
> +		__field(int, log)
> +		__array(u8, id, UUID_SIZE)
> +		__field(u32, flags)
> +		__field(u16, handle)
> +		__field(u16, related_handle)
> +		__field(u64, timestamp)
> +
> +		/* General Media */
> +		__field(u64, phys_addr)
> +		__field(u8, descriptor)
> +		__field(u8, type)
> +		__field(u8, transaction_type)
> +		__field(u8, channel)
> +		__field(u32, device)
> +		__array(u8, comp_id, CXL_EVT_GEN_MED_COMP_ID_SIZE)
> +		__field(u16, validity_flags)
> +		__field(u8, rank) /* Out of order to pack trace record */
> +	),
> +
> +	TP_fast_assign(
> +		/* Common */
> +		__assign_str(dev_name, dev_name);
> +		memcpy(__entry->id, &rec->hdr.id, UUID_SIZE);
> +		__entry->log = log;
> +		__entry->flags = le32_to_cpu(rec->hdr.flags_length) >> 8;
> +		__entry->handle = le16_to_cpu(rec->hdr.handle);
> +		__entry->related_handle = le16_to_cpu(rec->hdr.related_handle);
> +		__entry->timestamp = le64_to_cpu(rec->hdr.timestamp);
> +
> +		/* General Media */
> +		__entry->phys_addr = le64_to_cpu(rec->phys_addr);
> +		__entry->descriptor = rec->descriptor;
> +		__entry->type = rec->type;
> +		__entry->transaction_type = rec->transaction_type;
> +		__entry->channel = rec->channel;
> +		__entry->rank = rec->rank;
> +		__entry->device = rec->device[0] << 24 |
> +				  rec->device[1] << 16 |
> +				  rec->device[2] << 8; /* 3 byte LE ? */
> +		__entry->device = le32_to_cpu(__entry->device);
> +		memcpy(__entry->comp_id, &rec->component_id,
> +			CXL_EVT_GEN_MED_COMP_ID_SIZE);
> +		__entry->validity_flags = le16_to_cpu(rec->validity_flags);
> +	),
> +
> +	TP_printk("%s: %s time=%llu id=%pUl handle=%x related_handle=%x hdr_flags='%s': " \
> +		  "phys_addr=%llx volatile=%s desc='%s' type='%s' trans_type='%s' channel=%u " \
> +		  "rank=%u device=%x comp_id=%s valid_flags='%s'",
> +		__get_str(dev_name), show_log_type(__entry->log),
> +		__entry->timestamp, __entry->id, __entry->handle,
> +		__entry->related_handle, show_hdr_flags(__entry->flags),
> +		__entry->phys_addr & ~CXL_GMER_PHYS_ADDR_MASK,
> +		(__entry->phys_addr & CXL_GMER_PHYS_ADDR_VOLATILE) ? "TRUE" : "FALSE",
> +		show_event_desc_flags(__entry->descriptor),
> +		show_mem_event_type(__entry->type),
> +		show_trans_type(__entry->transaction_type),
> +		__entry->channel, __entry->rank, __entry->device,
> +		__print_hex(__entry->comp_id, CXL_EVT_GEN_MED_COMP_ID_SIZE),
> +		show_valid_flags(__entry->validity_flags)

Can we make printing of the fields covered by the validity flags conditional
on those flags?  It's been a while since I wrote a tracepoint, but I think I
recall doing that.
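
To illustrate the idea outside of the tracepoint macros, here is a plain C
sketch that emits a field only when its validity bit is set.  It is not how
TP_printk() would be written; the SKETCH_* flags and both helpers are
hypothetical names for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_VALID_CHANNEL	(1u << 0)
#define SKETCH_VALID_RANK	(1u << 1)

/* Append "name=val " at buf + off; the caller guarantees off < len. */
static size_t sketch_append(char *buf, size_t len, size_t off,
			    const char *name, unsigned int val)
{
	int n = snprintf(buf + off, len - off, "%s=%u ", name, val);

	if (n < 0)
		return off;
	off += (size_t)n;
	/* On truncation, clamp to the terminating NUL. */
	return off < len ? off : len - 1;
}

/* Format only the fields whose validity bit is set; returns the length. */
static size_t sketch_format_valid(char *buf, size_t len, uint16_t valid,
				  uint8_t channel, uint8_t rank)
{
	size_t off = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';
	if (valid & SKETCH_VALID_CHANNEL)
		off = sketch_append(buf, len, off, "channel", channel);
	if (valid & SKETCH_VALID_RANK)
		off = sketch_append(buf, len, off, "rank", rank);
	return off;
}
```

A record with only the rank flag set would then print "rank=7 " and omit the
channel entirely instead of showing a meaningless value.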

> +		)
> +);
> +
>  #endif /* _CXL_TRACE_EVENTS_H */
>  
>  /* This part must be outside protection */



* Re: [RFC PATCH 5/9] cxl/mem: Trace DRAM Event Record
  2022-08-13  5:32 ` [RFC PATCH 5/9] cxl/mem: Trace DRAM " ira.weiny
@ 2022-08-25 10:46   ` Jonathan Cameron
  2022-09-12 23:04     ` Ira Weiny
  0 siblings, 1 reply; 51+ messages in thread
From: Jonathan Cameron @ 2022-08-25 10:46 UTC (permalink / raw)
  To: ira.weiny
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Fri, 12 Aug 2022 22:32:39 -0700
ira.weiny@intel.com wrote:

> From: Ira Weiny <ira.weiny@intel.com>
> 
> CXL v3.0 section 8.2.9.2.1.2 defines the DRAM Event Record.
> 
> Determine if the event read is a DRAM event record and if so trace the
> record.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> ---
> This record has a very odd byte layout with 2 - 16 bit fields
> (validity_flags and column) aligned on an odd byte boundary.  In
> addition nibble_mask and row are oddly aligned.
> 
> I've made my best guess as to how the endianess of these fields should
> be resolved.  But I'm happy to hear from other folks if what I have is
> wrong.
My assumption is the same as yours.  We should of course sanity check by
poking the relevant people.

Similar comments in here to the previous patch: use the get_unaligned_le24()
accessors and consider not printing invalid fields.
> 
> struct cxl_evt_dram_rec {
> 	struct cxl_event_record_hdr hdr;
> 	__le64 phys_addr;
> 	u8 descriptor;
> 	u8 type;
> 	u8 transaction_type;
> 	u16 validity_flags;
> 	u8 channel;
> 	u8 rank;
> 	u8 nibble_mask[CXL_EVT_DER_NIBBLE_MASK_SIZE];
> 	u8 bank_group;
> 	u8 bank;
> 	u8 row[CXL_EVT_DER_ROW_SIZE];
> 	u16 column;
> 	u8 correction_mask[CXL_EVT_DER_CORRECTION_MASK_SIZE];
> } __packed;
> ---
>  drivers/cxl/core/mbox.c           |  16 +++++
>  drivers/cxl/cxlmem.h              |  24 +++++++
>  include/trace/events/cxl-events.h | 114 ++++++++++++++++++++++++++++++
>  3 files changed, 154 insertions(+)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 0e433f072163..6414588a3c7b 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -717,6 +717,14 @@ static const uuid_t gen_media_event_uuid =
>  	UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
>  		  0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6);
>  
> +/*
> + * DRAM Event Record
> + * CXL v3.0 section 8.2.9.2.1.2; Table 8-44
Use rev3.0, r3.0 or just 3.0 here rather than v3.0.

> + */
> +static const uuid_t dram_event_uuid =
> +	UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
> +		  0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24);
> +
>  static void cxl_trace_event_record(const char *dev_name,
>  				   enum cxl_event_log_type type,
>  				   struct cxl_get_event_payload *payload)
> @@ -731,6 +739,14 @@ static void cxl_trace_event_record(const char *dev_name,
>  		return;
>  	}
>  
> +	if (uuid_equal(id, &dram_event_uuid)) {
Why not else if?  Should be obvious to compiler that multiple uuid_equal
conditions can't match, but even better to not make it try hard perhaps?
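
A small user-space sketch of the chained form.  The 16-byte IDs below are
placeholders, not the real record UUIDs, and memcmp() stands in for
uuid_equal(); all names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

enum sketch_rec_kind {
	SKETCH_REC_GEN_MEDIA,
	SKETCH_REC_DRAM,
	SKETCH_REC_UNKNOWN,
};

/* Placeholder IDs for illustration only. */
static const uint8_t sketch_gen_media_id[16] = { 0x01 };
static const uint8_t sketch_dram_id[16] = { 0x02 };

/* The else-if chain makes it explicit that at most one ID can match. */
static enum sketch_rec_kind sketch_classify(const uint8_t id[16])
{
	if (memcmp(id, sketch_gen_media_id, 16) == 0)
		return SKETCH_REC_GEN_MEDIA;
	else if (memcmp(id, sketch_dram_id, 16) == 0)
		return SKETCH_REC_DRAM;
	else
		return SKETCH_REC_UNKNOWN;
}
```

Since every branch in the posted function already returns, the change is
about stating the intent rather than changing behaviour.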

> +		struct cxl_evt_dram_rec *rec =
> +				(struct cxl_evt_dram_rec *)&payload->record;
> +
> +		trace_cxl_dram_event(dev_name, type, rec);
> +		return;
> +	}
> +
>  	/* For unknown record types print just the header */
>  	trace_cxl_event(dev_name, type, &payload->record);
>  }
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 33669459ae4b..50536c0a7850 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -421,6 +421,30 @@ struct cxl_evt_gen_media {
>  	u8 component_id[CXL_EVT_GEN_MED_COMP_ID_SIZE];
>  } __packed;
>  
> +/*
> + * DRAM Event Record - DER
> + * CXL v3.0 section 8.2.9.2.1.2; Table 3-44
> + */
> +#define CXL_EVT_DER_NIBBLE_MASK_SIZE		3
> +#define CXL_EVT_DER_ROW_SIZE			3
> +#define CXL_EVT_DER_CORRECTION_MASK_SIZE	0x20
> +struct cxl_evt_dram_rec {
> +	struct cxl_event_record_hdr hdr;
> +	__le64 phys_addr;
> +	u8 descriptor;
> +	u8 type;
> +	u8 transaction_type;
> +	u16 validity_flags;
I've not tried it, but can we just mark these as __le16 and use
the unaligned accessors?  get_unaligned_le16 etc
Also there is get_unaligned_le24() for the 3 byte ones.
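
For reference, a user-space sketch of what those accessors do.  The sketch_*
functions are stand-ins written for this example, not the kernel
implementations, and the struct is a trimmed, hypothetical slice of the
record that keeps only the oddly placed fields.  It assumes the 3 byte
fields really are little-endian, which still needs confirming.

```c
#include <stdint.h>

/* Byte-wise reads are safe at any alignment and on any host endianness. */
static inline uint16_t sketch_get_unaligned_le16(const void *p)
{
	const uint8_t *b = p;

	return (uint16_t)(b[0] | (b[1] << 8));
}

static inline uint32_t sketch_get_unaligned_le24(const void *p)
{
	const uint8_t *b = p;

	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) | ((uint32_t)b[2] << 16);
}

/* Hypothetical slice of the record; every member is byte sized. */
struct sketch_dram_slice {
	uint8_t transaction_type;
	uint8_t validity_flags[2];	/* 16-bit LE value on an odd offset */
	uint8_t channel;
	uint8_t rank;
	uint8_t nibble_mask[3];		/* 24-bit LE value */
};
```

In the driver the fields could then be typed __le16 in the packed struct and
read with get_unaligned_le16(&rec->validity_flags), if that turns out to
work as hoped.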

> +	u8 channel;
> +	u8 rank;
> +	u8 nibble_mask[CXL_EVT_DER_NIBBLE_MASK_SIZE];
> +	u8 bank_group;
> +	u8 bank;
> +	u8 row[CXL_EVT_DER_ROW_SIZE];
> +	u16 column;
> +	u8 correction_mask[CXL_EVT_DER_CORRECTION_MASK_SIZE];
> +} __packed;
> +
>  struct cxl_mbox_get_partition_info {
>  	__le64 active_volatile_cap;
>  	__le64 active_persistent_cap;
> diff --git a/include/trace/events/cxl-events.h b/include/trace/events/cxl-events.h
> index b51c51fd4e62..db9b34ddd240 100644
> --- a/include/trace/events/cxl-events.h
> +++ b/include/trace/events/cxl-events.h
> @@ -244,6 +244,120 @@ TRACE_EVENT(cxl_gen_media_event,
>  		)
>  );
>  
> +/*
> + * DRAM Event Record - DER
> + *
> + * CXL v2.0 section 8.2.9.1.1.2; Table 155
> + */
> +/*
> + * DRAM Event Record defines many fields the same as the General Media Event
> + * Record.  Reuse those definitions as appropriate.
> + */
> +#define CXL_DER_VALID_CHANNEL				BIT(0)
> +#define CXL_DER_VALID_RANK				BIT(1)
> +#define CXL_DER_VALID_NIBBLE				BIT(2)
> +#define CXL_DER_VALID_BANK_GROUP			BIT(3)
> +#define CXL_DER_VALID_BANK				BIT(4)
> +#define CXL_DER_VALID_ROW				BIT(5)
> +#define CXL_DER_VALID_COLUMN				BIT(6)
> +#define CXL_DER_VALID_CORRECTION_MASK			BIT(7)
> +#define show_dram_valid_flags(flags)	__print_flags(flags, "|",			   \
> +	{ CXL_DER_VALID_CHANNEL,			"CHANNEL"		}, \
> +	{ CXL_DER_VALID_RANK,				"RANK"			}, \
> +	{ CXL_DER_VALID_NIBBLE,				"NIBBLE"		}, \
> +	{ CXL_DER_VALID_BANK_GROUP,			"BANK GROUP"		}, \
> +	{ CXL_DER_VALID_BANK,				"BANK"			}, \
> +	{ CXL_DER_VALID_ROW,				"ROW"			}, \
> +	{ CXL_DER_VALID_COLUMN,				"COLUMN"		}, \
> +	{ CXL_DER_VALID_CORRECTION_MASK,		"CORRECTION MASK"	}  \
> +)
> +
> +TRACE_EVENT(cxl_dram_event,
> +
> +	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> +		 struct cxl_evt_dram_rec *rec),
> +
> +	TP_ARGS(dev_name, log, rec),
> +
> +	TP_STRUCT__entry(
> +		/* Common */
> +		__string(dev_name, dev_name)
> +		__field(int, log)
> +		__array(u8, id, UUID_SIZE)
> +		__field(u32, flags)
> +		__field(u16, handle)
> +		__field(u16, related_handle)
> +		__field(u64, timestamp)
> +
> +		/* DRAM */
> +		__field(u64, phys_addr)
> +		__field(u8, descriptor)
> +		__field(u8, type)
> +		__field(u8, transaction_type)
> +		__field(u8, channel)
> +		__field(u16, validity_flags)
> +		__field(u16, column)	/* Out of order to pack trace record */
> +		__field(u32, nibble_mask)
> +		__field(u32, row)
> +		__array(u8, cor_mask, CXL_EVT_DER_CORRECTION_MASK_SIZE)
> +		__field(u8, rank)	/* Out of order to pack trace record */
> +		__field(u8, bank_group)	/* Out of order to pack trace record */
> +		__field(u8, bank)	/* Out of order to pack trace record */
> +	),
> +
> +	TP_fast_assign(
> +		/* Common */
> +		__assign_str(dev_name, dev_name);
> +		memcpy(__entry->id, &rec->hdr.id, UUID_SIZE);
> +		__entry->log = log;
> +		__entry->flags = le32_to_cpu(rec->hdr.flags_length) >> 8;
> +		__entry->handle = le16_to_cpu(rec->hdr.handle);
> +		__entry->related_handle = le16_to_cpu(rec->hdr.related_handle);
> +		__entry->timestamp = le64_to_cpu(rec->hdr.timestamp);
> +
> +		/* DRAM */
> +		__entry->phys_addr = le64_to_cpu(rec->phys_addr);
> +		__entry->descriptor = rec->descriptor;
> +		__entry->type = rec->type;
> +		__entry->transaction_type = rec->transaction_type;
> +		__entry->validity_flags = le16_to_cpu(rec->validity_flags);
> +		__entry->channel = rec->channel;
> +		__entry->rank = rec->rank;
> +		__entry->nibble_mask = rec->nibble_mask[0] << 24 |
> +				       rec->nibble_mask[1] << 16 |
> +				       rec->nibble_mask[2] << 8; /* 3 byte LE ? */

Use get_unaligned_le24()?  I'd definitely expect these to be le24.


> +		__entry->nibble_mask = le32_to_cpu(__entry->nibble_mask);

That doesn't look right.  You will have unwound the endianness using
the shifts above. Don't convert it again (noop on le systems, so you
probably won't see a problem when testing).
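
A user-space sketch of why the second conversion hurts.  Assembling bytes
with shifts already yields a native integer on any host; sketch_swab32()
models what le32_to_cpu() would then do on a big-endian machine.  Both
function names are hypothetical, and the decode assumes the field is 24-bit
little-endian.

```c
#include <stdint.h>

/* Decode 3 little-endian bytes; the result is in host order everywhere. */
static uint32_t sketch_decode_le24(const uint8_t b[3])
{
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) | ((uint32_t)b[2] << 16);
}

/* Byte swap, i.e. the effect of le32_to_cpu() on a big-endian host. */
static uint32_t sketch_swab32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}
```

Bytes 56 34 12 decode to 0x123456; swapping that value again gives
0x56341200, which is the corruption a big-endian system would report.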

> +		__entry->bank_group = rec->bank_group;
> +		__entry->bank = rec->bank;
> +		__entry->row = rec->row[0] << 24 |
> +			       rec->row[1] << 16 |
> +			       rec->row[2] << 8; /* 3 byte LE ? */

get_unaligned_le24()

> +		__entry->row = le32_to_cpu(__entry->row);

> +		__entry->column = le16_to_cpu(rec->column);
> +		memcpy(__entry->cor_mask, &rec->correction_mask,
> +			CXL_EVT_DER_CORRECTION_MASK_SIZE);
> +	),
> +
> +	TP_printk("%s: %s time=%llu id=%pUl handle=%x related_handle=%x hdr_flags='%s': " \
> +		  "phys_addr=%llx volatile=%s desc='%s' type='%s' trans_type='%s' channel=%u " \
> +		  "rank=%u nibble_mask=%x bank_group=%u bank=%u row=%u column=%u " \
> +		  "cor_mask=%s valid_flags='%s'",
> +		__get_str(dev_name), show_log_type(__entry->log),
> +		__entry->timestamp, __entry->id, __entry->handle,
> +		__entry->related_handle, show_hdr_flags(__entry->flags),
> +		__entry->phys_addr & ~CXL_GMER_PHYS_ADDR_MASK,
> +		(__entry->phys_addr & CXL_GMER_PHYS_ADDR_VOLATILE) ? "TRUE" : "FALSE",
> +		show_event_desc_flags(__entry->descriptor),
As before, can we skip printing the invalid fields based on the validity flags?

It was a few years ago now, but I did something along those lines for the CCIX
equivalent of this stuff (honestly can't remember much about it now though!).
It was a bit fiddly but led to nicer prints in my opinion.

https://lore.kernel.org/all/20191114133919.32290-2-Jonathan.Cameron@huawei.com/


> +		show_mem_event_type(__entry->type),
> +		show_trans_type(__entry->transaction_type),
> +		__entry->channel, __entry->rank, __entry->nibble_mask,
> +		__entry->bank_group, __entry->bank,
> +		__entry->row, __entry->column,
> +		__print_hex(__entry->cor_mask, CXL_EVT_DER_CORRECTION_MASK_SIZE),
> +		show_dram_valid_flags(__entry->validity_flags)
> +		)
> +);
> +
>  #endif /* _CXL_TRACE_EVENTS_H */
>  
>  /* This part must be outside protection */



* Re: [RFC PATCH 6/9] cxl/mem: Trace Memory Module Event Record
  2022-08-13  5:32 ` [RFC PATCH 6/9] cxl/mem: Trace Memory Module " ira.weiny
@ 2022-08-25 10:58   ` Jonathan Cameron
  2022-09-14 21:17     ` Ira Weiny
  0 siblings, 1 reply; 51+ messages in thread
From: Jonathan Cameron @ 2022-08-25 10:58 UTC (permalink / raw)
  To: ira.weiny
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Fri, 12 Aug 2022 22:32:40 -0700
ira.weiny@intel.com wrote:

> From: Ira Weiny <ira.weiny@intel.com>
> 
> CXL v3.0 section 8.2.9.2.1.3 defines the Memory Module Event Record.
> 
> Determine if the event read is memory module record and if so trace the
> record.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Similar comments to the previous patches around using
get_unaligned_le*()

> ---
>  drivers/cxl/core/mbox.c           |  16 +++
>  drivers/cxl/cxlmem.h              |  25 +++++
>  include/trace/events/cxl-events.h | 155 ++++++++++++++++++++++++++++++
>  3 files changed, 196 insertions(+)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 6414588a3c7b..99b09bfeaff5 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -725,6 +725,14 @@ static const uuid_t dram_event_uuid =
>  	UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
>  		  0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24);
>  
> +/*
> + * Memory Module Event Record
> + * CXL v3.0 section 8.2.9.2.1.3; Table 8-45
> + */
> +static const uuid_t mem_mod_event_uuid =
> +	UUID_INIT(0xfe927475, 0xdd59, 0x4339,
> +		  0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74);
> +
>  static void cxl_trace_event_record(const char *dev_name,
>  				   enum cxl_event_log_type type,
>  				   struct cxl_get_event_payload *payload)
> @@ -747,6 +755,14 @@ static void cxl_trace_event_record(const char *dev_name,
>  		return;
>  	}
>  
> +	if (uuid_equal(id, &mem_mod_event_uuid)) {
> +		struct cxl_evt_mem_mod_rec *rec =
> +				(struct cxl_evt_mem_mod_rec *)&payload->record;
> +
> +		trace_cxl_mem_mod_event(dev_name, type, rec);
> +		return;
> +	}
> +
>  	/* For unknown record types print just the header */
>  	trace_cxl_event(dev_name, type, &payload->record);
>  }
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 50536c0a7850..a02a41dfd988 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -445,6 +445,31 @@ struct cxl_evt_dram_rec {
>  	u8 correction_mask[CXL_EVT_DER_CORRECTION_MASK_SIZE];
>  } __packed;
>  
> +/*
> + * Get Health Info Record
> + * CXL v3.0 section 8.2.9.8.3.1; Table 8-100
> + */
> +struct cxl_get_health_info {
> +	u8 health_status;
> +	u8 media_status;
> +	u8 add_status;
> +	u8 life_used;
> +	u16 device_temp;

As previous - even though they aren't aligned, I'd have thought
__le16 etc will still work.  The unaligned accessors are fine
taking __le16 * for example.

> +	u32 dirty_shutdown_cnt;
> +	u32 cor_vol_err_cnt;
> +	u32 cor_per_err_cnt;
> +} __packed;
> +
> +/*
> + * Memory Module Event Record
> + * CXL v3.0 section 8.2.9.2.1.3; Table 8-45
> + */
> +struct cxl_evt_mem_mod_rec {
> +	struct cxl_event_record_hdr hdr;
> +	u8 event_type;
> +	struct cxl_get_health_info info;
> +} __packed;
> +
>  struct cxl_mbox_get_partition_info {
>  	__le64 active_volatile_cap;
>  	__le64 active_persistent_cap;
> diff --git a/include/trace/events/cxl-events.h b/include/trace/events/cxl-events.h
> index db9b34ddd240..dbbe25fee25c 100644
> --- a/include/trace/events/cxl-events.h
> +++ b/include/trace/events/cxl-events.h
> @@ -358,6 +358,161 @@ TRACE_EVENT(cxl_dram_event,
>  		)
>  );
>  
> +/*
> + * Memory Module Event Record - MMER
> + *
> + * CXL v2.0 section 8.2.9.1.1.3; Table 156, Table 181
> + *
> + * Device Health Information - DHI; Table 181
> + */
> +#define CXL_MMER_HEALTH_STATUS_CHANGE		0x00
> +#define CXL_MMER_MEDIA_STATUS_CHANGE		0x01
> +#define CXL_MMER_LIFE_USED_CHANGE		0x02
> +#define CXL_MMER_TEMP_CHANGE			0x03
> +#define CXL_MMER_DATA_PATH_ERROR		0x04
> +#define CXL_MMER_LAS_ERROR			0x05
> +#define show_dev_evt_type(type)	__print_symbolic(type,			   \
> +	{ CXL_MMER_HEALTH_STATUS_CHANGE,	"Health Status Change"	}, \
> +	{ CXL_MMER_MEDIA_STATUS_CHANGE,		"Media Status Change"	}, \
> +	{ CXL_MMER_LIFE_USED_CHANGE,		"Life Used Change"	}, \
> +	{ CXL_MMER_TEMP_CHANGE,			"Temperature Change"	}, \
> +	{ CXL_MMER_DATA_PATH_ERROR,		"Data Path Error"	}, \
> +	{ CXL_MMER_LAS_ERROR,			"LSA Error"		}  \
> +)
> +
> +#define CXL_DHI_HS_MAINTENANCE_NEEDED				BIT(0)
> +#define CXL_DHI_HS_PERFORMANCE_DEGRADED				BIT(1)
> +#define CXL_DHI_HS_HW_REPLACEMENT_NEEDED			BIT(2)
> +#define show_health_status_flags(flags)	__print_flags(flags, "|",	   \
> +	{ CXL_DHI_HS_MAINTENANCE_NEEDED,	"Maintenance Needed"	}, \
> +	{ CXL_DHI_HS_PERFORMANCE_DEGRADED,	"Performance Degraded"	}, \
> +	{ CXL_DHI_HS_HW_REPLACEMENT_NEEDED,	"Replacement Needed"	}  \
> +)
> +
> +#define CXL_DHI_MS_NORMAL							0x00
> +#define CXL_DHI_MS_NOT_READY							0x01
> +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOST					0x02
> +#define CXL_DHI_MS_ALL_DATA_LOST						0x03
> +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_POWER_LOSS			0x04
> +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_SHUTDOWN			0x05
> +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_IMMINENT				0x06
> +#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_POWER_LOSS				0x07
> +#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_SHUTDOWN				0x08
> +#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_IMMINENT					0x09
> +#define show_media_status(ms)	__print_symbolic(ms,			   \
> +	{ CXL_DHI_MS_NORMAL,						   \
> +		"Normal"						}, \
> +	{ CXL_DHI_MS_NOT_READY,						   \
> +		"Not Ready"						}, \
> +	{ CXL_DHI_MS_WRITE_PERSISTENCY_LOST,				   \
> +		"Write Persistency Lost"				}, \
> +	{ CXL_DHI_MS_ALL_DATA_LOST,					   \
> +		"All Data Lost"						}, \
> +	{ CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_POWER_LOSS,		   \
> +		"Write Persistency Loss in the Event of Power Loss"	}, \
> +	{ CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_SHUTDOWN,		   \
> +		"Write Persistency Loss in Event of Shutdown"		}, \
> +	{ CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_IMMINENT,			   \
> +		"Write Persistency Loss Imminent"			}, \
> +	{ CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_POWER_LOSS,		   \
> +		"All Data Loss in Event of Power Loss"			}, \
> +	{ CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_SHUTDOWN,		   \
> +		"All Data loss in the Event of Shutdown"		}, \
> +	{ CXL_DHI_MS_WRITE_ALL_DATA_LOSS_IMMINENT,			   \
> +		"All Data Loss Imminent"				}  \
> +)
> +
> +#define CXL_DHI_AS_NORMAL		0x0
> +#define CXL_DHI_AS_WARNING		0x1
> +#define CXL_DHI_AS_CRITICAL		0x2
> +#define show_add_status(as) __print_symbolic(as,	   \
> +	{ CXL_DHI_AS_NORMAL,		"Normal"	}, \
> +	{ CXL_DHI_AS_WARNING,		"Warning"	}, \
> +	{ CXL_DHI_AS_CRITICAL,		"Critical"	}  \
> +)
> +
> +#define CXL_DHI_AS_LIFE_USED(as)			(as & 0x3)
> +#define CXL_DHI_AS_DEV_TEMP(as)				((as & 0xC) >> 2)
> +#define CXL_DHI_AS_COR_VOL_ERR_CNT(as)			((as & 0x10) >> 4)
> +#define CXL_DHI_AS_COR_PER_ERR_CNT(as)			((as & 0x20) >> 5)
> +
> +TRACE_EVENT(cxl_mem_mod_event,
> +
> +	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> +		 struct cxl_evt_mem_mod_rec *rec),
> +
> +	TP_ARGS(dev_name, log, rec),
> +
> +	TP_STRUCT__entry(
> +		/* Common */
> +		__string(dev_name, dev_name)
> +		__field(int, log)
> +		__array(u8, id, UUID_SIZE)
> +		__field(u32, flags)
> +		__field(u16, handle)
> +		__field(u16, related_handle)
> +		__field(u64, timestamp)
> +
> +		/* Memory Module Event */
> +		__field(u8, event_type)
> +
> +		/* Device Health Info */
> +		__field(u8, health_status)
> +		__field(u8, media_status)
> +		__field(u8, life_used)
> +		__field(u32, dirty_shutdown_cnt)
> +		__field(u32, cor_vol_err_cnt)
> +		__field(u32, cor_per_err_cnt)
> +		__field(s16, device_temp)
> +		__field(u8, add_status)
> +	),
> +
> +	TP_fast_assign(
> +		/* Common */
> +		__assign_str(dev_name, dev_name);
> +		memcpy(__entry->id, &rec->hdr.id, UUID_SIZE);
> +		__entry->log = log;
> +		__entry->flags = le32_to_cpu(rec->hdr.flags_length) >> 8;
> +		__entry->handle = le16_to_cpu(rec->hdr.handle);
> +		__entry->related_handle = le16_to_cpu(rec->hdr.related_handle);
> +		__entry->timestamp = le64_to_cpu(rec->hdr.timestamp);
> +
> +		/* Memory Module Event */
> +		__entry->event_type = rec->event_type;
> +
> +		/* Device Health Info */
> +		__entry->health_status = rec->info.health_status;
> +		__entry->media_status = rec->info.media_status;
> +		__entry->life_used = rec->info.life_used;
> +		__entry->dirty_shutdown_cnt = le32_to_cpu(rec->info.dirty_shutdown_cnt);
> +		__entry->cor_vol_err_cnt = le32_to_cpu(rec->info.cor_vol_err_cnt);

I've lost track, but my guess is some / all of these need get_unaligned_le32()
etc rather than the aligned form.  Maybe just be lazy and use the unaligned
versions even when things happen to be aligned - then we don't have to think
about it when reviewing :)
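
One way to stop guessing is to compute the offsets.  The sketch below mirrors
the fields after the common header in user space with hypothetical sketch_*
names; it relies on the GCC/Clang packed attribute and assumes the header
size is a multiple of 4, which this excerpt does not show.

```c
#include <stddef.h>
#include <stdint.h>

struct sketch_health_info {
	uint8_t health_status;
	uint8_t media_status;
	uint8_t add_status;
	uint8_t life_used;
	uint16_t device_temp;
	uint32_t dirty_shutdown_cnt;
	uint32_t cor_vol_err_cnt;
	uint32_t cor_per_err_cnt;
} __attribute__((packed));

/* Everything that follows the common header in the record. */
struct sketch_mem_mod_body {
	uint8_t event_type;
	struct sketch_health_info info;
} __attribute__((packed));

/* Offset of a health info field relative to the end of the header. */
#define SKETCH_BODY_OFF(field) \
	offsetof(struct sketch_mem_mod_body, info.field)
```

Under that assumption device_temp sits at offset 5 and the three counters at
7, 11 and 15, so none of the multi-byte fields is naturally aligned.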


> +		__entry->cor_per_err_cnt = le32_to_cpu(rec->info.cor_per_err_cnt);
> +		__entry->device_temp = le16_to_cpu(rec->info.device_temp);
> +		__entry->add_status = rec->info.add_status;
> +	),
> +
> +	TP_printk("%s: %s time=%llu id=%pUl handle=%x related_handle=%x hdr_flags='%s': " \
> +		  "evt_type='%s' health_status='%s' media_status='%s' as_life_used=%s " \
> +		  "as_dev_temp=%s as_cor_vol_err_cnt=%s as_cor_per_err_cnt=%s " \
> +		  "life_used=%u dev_temp=%d dirty_shutdown_cnt=%u cor_vol_err_cnt=%u " \
> +		  "cor_per_err_cnt=%u",
> +		__get_str(dev_name), show_log_type(__entry->log),
> +		__entry->timestamp, __entry->id, __entry->handle,
> +		__entry->related_handle, show_hdr_flags(__entry->flags),
> +
> +		show_dev_evt_type(__entry->event_type),
> +		show_health_status_flags(__entry->health_status),
> +		show_media_status(__entry->media_status),
> +		show_add_status(CXL_DHI_AS_LIFE_USED(__entry->add_status)),
> +		show_add_status(CXL_DHI_AS_DEV_TEMP(__entry->add_status)),
> +		show_add_status(CXL_DHI_AS_COR_VOL_ERR_CNT(__entry->add_status)),
> +		show_add_status(CXL_DHI_AS_COR_PER_ERR_CNT(__entry->add_status)),
> +		__entry->life_used, __entry->device_temp,
> +		__entry->dirty_shutdown_cnt, __entry->cor_vol_err_cnt,
> +		__entry->cor_per_err_cnt)
> +);
> +
> +
>  #endif /* _CXL_TRACE_EVENTS_H */
>  
>  /* This part must be outside protection */



* Re: [RFC PATCH 7/9] cxl/test: Add generic mock events
  2022-08-13  5:32 ` [RFC PATCH 7/9] cxl/test: Add generic mock events ira.weiny
@ 2022-08-25 11:31   ` Jonathan Cameron
  2022-09-15 18:53     ` Ira Weiny
  0 siblings, 1 reply; 51+ messages in thread
From: Jonathan Cameron @ 2022-08-25 11:31 UTC (permalink / raw)
  To: ira.weiny
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Fri, 12 Aug 2022 22:32:41 -0700
ira.weiny@intel.com wrote:

> From: Ira Weiny <ira.weiny@intel.com>
> 
> Facilitate testing basic Get/Clear Event functionality by creating
> multiple logs and generic events with made up UUID's.
> 
> Data is completely made up with data patterns which should be easy to
> spot in trace output.
Hi Ira,

I'm tempted to hack the QEMU emulation for this in with an appropriately
complex interface to inject all the record types...
Lots to do there though, so not sure where this fits in my priority list!

> 
> Test traces are easy to obtain with a small script such as this:
> 
> 	#!/bin/bash -x
> 
> 	devices=`find /sys/devices/platform -name cxl_mem*`
> 
> 	# Generate fake events if reset is passed in

reset is rather unintuitive naming.

fill_event_queue maybe or something more in that direction?

> 	if [ "$1" == "reset" ]; then
> 	        for device in $devices; do
> 	                echo 1 > $device/mem*/event_reset
> 	        done
> 	fi
> 
> 	# Turn on tracing
> 	echo "" > /sys/kernel/tracing/trace
> 	echo 1 > /sys/kernel/tracing/events/cxl_events/enable
> 	echo 1 > /sys/kernel/tracing/tracing_on
> 
> 	# Generate fake interrupt
> 	for device in $devices; do
> 	        echo 1 > $device/mem*/event_trigger
> 	        # just trigger 1
> 	        break;
> 	done
> 
> 	# Turn off tracing and report events
> 	echo 0 > /sys/kernel/tracing/tracing_on
> 	cat /sys/kernel/tracing/trace
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> ---
>  tools/testing/cxl/test/mem.c | 291 +++++++++++++++++++++++++++++++++++
>  1 file changed, 291 insertions(+)
> 
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index e2f5445d24ff..87196d62acf5 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -9,6 +9,8 @@
>  #include <linux/bits.h>
>  #include <cxlmem.h>
>  
> +#include <trace/events/cxl-events.h>
> +
>  #define LSA_SIZE SZ_128K
>  #define DEV_SIZE SZ_2G
>  #define EFFECT(x) (1U << x)
> @@ -137,6 +139,287 @@ static int mock_partition_info(struct cxl_dev_state *cxlds,
>  	return 0;
>  }
>  
> +/*
> + * Mock Events
> + */
> +struct mock_event_log {
> +	int cur_event;
> +	int nr_events;
> +	struct xarray events;

I'm not convinced an xarray is appropriate here (I'd have used
a fixed size array) but meh, I don't care that much and mocking
code doesn't have to be quick or elegant :)

> +};
> +
> +struct mock_event_store {
> +	struct cxl_dev_state *cxlds;
> +	struct mock_event_log *mock_logs[CXL_EVENT_TYPE_MAX];

Each entry isn't terribly big and there aren't that many of them.
Make the code simpler by just embedding the instances here?
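
Roughly what I have in mind, as a user-space sketch: the logs are embedded in
the store and each log holds a fixed-size array, so nothing is allocated on
demand.  SKETCH_LOG_TYPE_MAX and SKETCH_MAX_EVENTS are made-up limits and all
names are hypothetical.

```c
#include <stddef.h>

#define SKETCH_LOG_TYPE_MAX	4	/* stand-in for CXL_EVENT_TYPE_MAX */
#define SKETCH_MAX_EVENTS	8	/* arbitrary per-log capacity */

struct sketch_event_log {
	int cur_event;
	int nr_events;
	const void *events[SKETCH_MAX_EVENTS];
};

struct sketch_event_store {
	/* Embedded instances: zero-initialising the store sets up every log. */
	struct sketch_event_log logs[SKETCH_LOG_TYPE_MAX];
};

/* Append an event; returns 0 on success, -1 on a bad type or a full log. */
static int sketch_add_event(struct sketch_event_store *es, unsigned int type,
			    const void *event)
{
	struct sketch_event_log *log;

	if (type >= SKETCH_LOG_TYPE_MAX)
		return -1;
	log = &es->logs[type];
	if (log->nr_events >= SKETCH_MAX_EVENTS)
		return -1;
	log->events[log->nr_events++] = event;
	return 0;
}

/* Return the current event, or NULL when the log is empty or drained. */
static const void *sketch_cur_event(const struct sketch_event_store *es,
				    unsigned int type)
{
	const struct sketch_event_log *log;

	if (type >= SKETCH_LOG_TYPE_MAX)
		return NULL;
	log = &es->logs[type];
	return log->cur_event < log->nr_events ?
		log->events[log->cur_event] : NULL;
}
```

This drops the NULL checks on the per-type log pointer and the devm action
that destroys each xarray, at the cost of a fixed upper bound on events.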

> +};
> +
> +DEFINE_XARRAY(mock_cxlds_event_store);
> +
> +void delete_event_store(void *ds)
> +{
> +	xa_store(&mock_cxlds_event_store, (unsigned long)ds, NULL, GFP_KERNEL);
> +}
> +
> +void store_event_store(struct mock_event_store *es)
> +{
> +	struct cxl_dev_state *cxlds = es->cxlds;
> +
> +	if (xa_insert(&mock_cxlds_event_store, (unsigned long)cxlds, es,
> +		      GFP_KERNEL)) {
> +		dev_err(cxlds->dev, "Event store not available for %s\n",
> +			dev_name(cxlds->dev));
> +		return;
> +	}
> +
> +	devm_add_action_or_reset(cxlds->dev, delete_event_store, cxlds);
> +}
> +
> +struct mock_event_log *find_event_log(struct cxl_dev_state *cxlds, int log_type)
> +{
> +	struct mock_event_store *es = xa_load(&mock_cxlds_event_store,
> +					      (unsigned long)cxlds);
> +
> +	if (!es || log_type >= CXL_EVENT_TYPE_MAX)
> +		return NULL;
> +	return es->mock_logs[log_type];
> +}
> +
> +struct cxl_event_record_raw *get_cur_event(struct mock_event_log *log)
> +{
> +	return xa_load(&log->events, log->cur_event);
> +}
> +
> +__le16 get_cur_event_handle(struct mock_event_log *log)
> +{
> +	return cpu_to_le16(log->cur_event);
> +}
> +
> +static bool log_empty(struct mock_event_log *log)
> +{
> +	return log->cur_event == log->nr_events;
> +}
> +
> +static int log_rec_left(struct mock_event_log *log)
> +{
> +	return log->nr_events - log->cur_event;
> +}
> +
> +static void xa_events_destroy(void *l)
> +{
> +	struct mock_event_log *log = l;
> +
> +	xa_destroy(&log->events);
> +}
> +
> +static void event_store_add_event(struct mock_event_store *es,
> +				  enum cxl_event_log_type log_type,
> +				  struct cxl_event_record_raw *event)
> +{
> +	struct mock_event_log *log;
> +	struct device *dev = es->cxlds->dev;
> +	int rc;
> +
> +	if (log_type >= CXL_EVENT_TYPE_MAX)
> +		return;
> +
> +	log = es->mock_logs[log_type];
> +	if (!log) {
> +		log = devm_kzalloc(dev, sizeof(*log), GFP_KERNEL);

As above, I'd just embed the logs directly in the containing structure
rather than allocating on demand. init them all up front.
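
Roughly what I have in mind, as a userspace sketch rather than kernel code.
The names, the log count and the per-log capacity below are made up for
illustration; only the idea of embedding the logs and using a plain array
of pointers is the point:

```c
#include <stddef.h>

#define MOCK_LOG_TYPE_MAX   4	/* assumed: info, warn, fail, fatal */
#define MOCK_LOG_MAX_EVENTS 8	/* assumed capacity, plenty for a mock */

struct mock_event_log {
	int cur_event;
	int nr_events;
	const void *events[MOCK_LOG_MAX_EVENTS];
};

/* Logs are embedded, so zeroing the store initialises all of them. */
struct mock_event_store {
	struct mock_event_log logs[MOCK_LOG_TYPE_MAX];
};

/* Append an event pointer; returns 0 on success, -1 if invalid or full. */
static int mock_store_add_event(struct mock_event_store *es, int log_type,
				const void *event)
{
	struct mock_event_log *log;

	if (log_type < 0 || log_type >= MOCK_LOG_TYPE_MAX)
		return -1;

	log = &es->logs[log_type];
	if (log->nr_events >= MOCK_LOG_MAX_EVENTS)
		return -1;

	log->events[log->nr_events++] = event;
	return 0;
}

/* Return the current unread event, or NULL when the log is drained. */
static const void *mock_store_cur_event(const struct mock_event_store *es,
					int log_type)
{
	const struct mock_event_log *log;

	if (log_type < 0 || log_type >= MOCK_LOG_TYPE_MAX)
		return NULL;

	log = &es->logs[log_type];
	if (log->cur_event == log->nr_events)
		return NULL;

	return log->events[log->cur_event];
}
```

With this shape there is no on-demand allocation and no NULL log to check
for, only an empty one.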

> +		if (!log) {
> +			dev_err(dev, "Failed to create %s log\n",
> +				cxl_event_log_type_str(log_type));
> +			return;
> +		}
> +		xa_init(&log->events);
> +		devm_add_action(dev, xa_events_destroy, log);
> +		es->mock_logs[log_type] = log;
> +	}
> +
> +	rc = xa_insert(&log->events, log->nr_events, event, GFP_KERNEL);

Not sure using an xa for a list really makes that much sense, but it
doesn't matter hugely.

> +	if (rc) {
> +		dev_err(dev, "Failed to store event %s log\n",
> +			cxl_event_log_type_str(log_type));
> +		return;
> +	}
> +	log->nr_events++;

Having an index into a static set of events is more complex than it needs to be.
I'd either switch to a simple array of pointers, or actually add and
remove events (or pointers to them anyway).
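
For the "actually add and remove" option, something along these lines would
do. This is a userspace sketch only; the type, the function names and the
depth are hypothetical, and a real version would still need to decide how
event handles are assigned:

```c
#include <stddef.h>

#define MOCK_FIFO_DEPTH 8	/* assumed depth for illustration */

struct mock_event_fifo {
	unsigned int head;	/* index of the oldest queued event */
	unsigned int count;	/* number of queued events */
	const void *slots[MOCK_FIFO_DEPTH];
};

/* Queue an event pointer at the tail; -1 if the FIFO is full. */
static int mock_fifo_push(struct mock_event_fifo *f, const void *event)
{
	if (f->count == MOCK_FIFO_DEPTH)
		return -1;

	f->slots[(f->head + f->count) % MOCK_FIFO_DEPTH] = event;
	f->count++;
	return 0;
}

/* Look at the oldest event without removing it; NULL if empty. */
static const void *mock_fifo_peek(const struct mock_event_fifo *f)
{
	return f->count ? f->slots[f->head] : NULL;
}

/* Drop the oldest event, as a clear would; -1 if nothing to drop. */
static int mock_fifo_pop(struct mock_event_fifo *f)
{
	if (!f->count)
		return -1;

	f->head = (f->head + 1) % MOCK_FIFO_DEPTH;
	f->count--;
	return 0;
}
```

Get would map to peek and clear to pop. The trade-off is that a reset then
has to push the events again rather than just rewinding an index.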

> +}
> +
> +/*
> + * Get and clear event only handle 1 record at a time as this is what is
> + * currently implemented in the main code.
> + */
> +static int mock_get_event(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
> +{
> +	struct cxl_get_event_payload *pl;
> +	struct mock_event_log *log;
> +	u8 log_type;
> +
> +	/* Valid request? */
> +	if (cmd->size_in != 1)
> +		return -EINVAL;
> +
> +	log_type = *((u8 *)cmd->payload_in);
> +	if (log_type >= CXL_EVENT_TYPE_MAX)
> +		return -EINVAL;
> +
> +	log = find_event_log(cxlds, log_type);
> +	if (!log || log_empty(log))
> +		goto no_data;
> +
> +	/* Don't handle more than 1 record at a time */
> +	if (cmd->size_out < sizeof(*pl))
> +		return -EINVAL;
> +
> +	pl = cmd->payload_out;
> +	memset(pl, 0, sizeof(*pl));
> +
> +	pl->record_count = cpu_to_le16(1);
> +
> +	if (log_rec_left(log) > 1)
> +		pl->flags |= CXL_GET_EVENT_FLAG_MORE_RECORDS;
> +
> +	memcpy(&pl->record, get_cur_event(log), sizeof(pl->record));
> +	pl->record.hdr.handle = get_cur_event_handle(log);
> +	return 0;
> +
> +no_data:
> +	/* Room for header? */
> +	if (cmd->size_out < (sizeof(*pl) - sizeof(pl->record)))
> +		return -EINVAL;
> +
> +	memset(cmd->payload_out, 0, cmd->size_out);
> +	return 0;
> +}
> +
> +/*
> + * Get and clear event only handle 1 record at a time as this is what is
> + * currently implemented in the main code.

Duplicating this comment seems unnecessary.
 
> + */
> +static int mock_clear_event(struct cxl_dev_state *cxlds,
> +			    struct cxl_mbox_cmd *cmd)
> +{
> +	struct cxl_mbox_clear_event_payload *pl = cmd->payload_in;
> +	struct mock_event_log *log;
> +	u8 log_type = pl->event_log;
> +
> +	/* Don't handle more than 1 record at a time */
> +	if (pl->nr_recs != 1)
> +		return -EINVAL;
> +
> +	if (log_type >= CXL_EVENT_TYPE_MAX)
> +		return -EINVAL;
> +
> +	log = find_event_log(cxlds, log_type);
> +	if (!log)
> +		return 0; /* No mock data in this log */
> +
> +	/*
> +	 * The current code clears events as they are read
> 	 * Test that behavior; not clearing from the middle of the log
> +	 */
> +	if (log->cur_event != le16_to_cpu(pl->handle)) {
> +		dev_err(cxlds->dev, "Clearing events out of order\n");
> +		return -EINVAL;
> +	}
> +
> +	log->cur_event++;
> +	return 0;
> +}
> +
> +static ssize_t event_reset_store(struct device *dev,
> +				 struct device_attribute *attr,
> +				 const char *buf, size_t count)
> +{
> +	struct cxl_memdev *cxlmd = container_of(dev, struct cxl_memdev, dev);
> +	int i;
> +
> +	for (i = CXL_EVENT_TYPE_INFO; i < CXL_EVENT_TYPE_MAX; i++) {
> +		struct mock_event_log *log;
> +
> +		log = find_event_log(cxlmd->cxlds, i);
> +		if (log)
> +			log->cur_event = 0;
> +	}
> +
> +	return count;
> +}
> +static DEVICE_ATTR_WO(event_reset);
> +
> +static ssize_t event_trigger_store(struct device *dev,
> +				   struct device_attribute *attr,
> +				   const char *buf, size_t count)
> +{
> +	struct cxl_memdev *cxlmd = container_of(dev, struct cxl_memdev, dev);
> +
> +	cxl_mem_get_event_records(cxlmd->cxlds);
> +
> +	return count;
> +}
> +static DEVICE_ATTR_WO(event_trigger);
> +
> +static struct attribute *cxl_mock_event_attrs[] = {
> +	&dev_attr_event_reset.attr,
> +	&dev_attr_event_trigger.attr,
> +	NULL
> +};
> +ATTRIBUTE_GROUPS(cxl_mock_event);
> +
> +void remove_mock_event_groups(void *dev)

This should be static.

> +{
> +	device_remove_groups(dev, cxl_mock_event_groups);
> +}
> +
> +struct cxl_event_record_raw maint_needed = {
> +	.hdr = {
> +		.id = UUID_INIT(0xDEADBEEF, 0xCAFE, 0xBABE, 0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
> +		.flags_length = cpu_to_le32((CXL_EVENT_RECORD_FLAG_MAINT_NEEDED << 8) |
> +					      sizeof(struct cxl_event_record_raw)),
> +		/* .handle = Set dynamically */
> +		.related_handle = cpu_to_le16(0xa5b6),
> +	},
> +	.data = { 0xDE, 0xAD, 0xBE, 0xEF },
> +};
> +
> +struct cxl_event_record_raw hardware_replace = {
static const?
> +	.hdr = {
> +		.id = UUID_INIT(0xBABECAFE, 0xBEEF, 0xDEAD, 0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
> +		.flags_length = cpu_to_le32((CXL_EVENT_RECORD_FLAG_HW_REPLACE << 8) |
> +					     sizeof(struct cxl_event_record_raw)),
> +		/* .handle = Set dynamically */
> +		.related_handle = cpu_to_le16(0xb6a5),
> +	},
> +	.data = { 0xDE, 0xAD, 0xBE, 0xEF },
> +};
> +
> +static void devm_cxl_mock_event_logs(struct cxl_memdev *cxlmd)
> +{
> +	struct device *dev = &cxlmd->dev;
> +	struct mock_event_store *es;
> +
> +	/*
> +	 * The memory device gets the sysfs attributes such that the cxlmd
> +	 * pointer can be used to get to a cxlds pointer.
> +	 */
> +	if (device_add_groups(dev, cxl_mock_event_groups))

Whilst it might not matter in a mocking driver, it's normal to jump through
hoops to avoid doing this because it races with userspace notifications in
all sorts of hideous ways.  It makes the sysfs maintainers very grumpy ;)
To do it here, you would need to pass the group to devm_cxl_add_memdev()
and have that slip it in before the cdev_device_add() call I think.
That wouldn't be particularly invasive though.


> +		return;
> +	if (devm_add_action_or_reset(dev, remove_mock_event_groups, dev))
> +		return;
> +
> +	/*
> +	 * All the mock event data hangs off the device itself.

Nitpick of the day: Single line comment syntax ;)

> +	 */
> +	es = devm_kzalloc(cxlmd->cxlds->dev, sizeof(*es), GFP_KERNEL);
> +	if (!es)
> +		return;
> +	es->cxlds = cxlmd->cxlds;
> +
> +	event_store_add_event(es, CXL_EVENT_TYPE_INFO, &maint_needed);
> +
> +	event_store_add_event(es, CXL_EVENT_TYPE_FATAL, &hardware_replace);
> +
> +	store_event_store(es);
> +}
> +
>  static int mock_get_lsa(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
>  {
>  	struct cxl_mbox_get_lsa *get_lsa = cmd->payload_in;
> @@ -224,6 +507,12 @@ static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *
>  	case CXL_MBOX_OP_GET_PARTITION_INFO:
>  		rc = mock_partition_info(cxlds, cmd);
>  		break;
> +	case CXL_MBOX_OP_GET_EVENT_RECORD:
> +		rc = mock_get_event(cxlds, cmd);
> +		break;
> +	case CXL_MBOX_OP_CLEAR_EVENT_RECORD:
> +		rc = mock_clear_event(cxlds, cmd);
> +		break;
>  	case CXL_MBOX_OP_SET_LSA:
>  		rc = mock_set_lsa(cxlds, cmd);
>  		break;
> @@ -285,6 +574,8 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
>  	if (IS_ERR(cxlmd))
>  		return PTR_ERR(cxlmd);
>  
> +	devm_cxl_mock_event_logs(cxlmd);
> +
>  	cxl_mem_get_event_records(cxlds);
>  
>  	if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))



* Re: [RFC PATCH 8/9] cxl/test: Add specific events
  2022-08-13  5:32 ` [RFC PATCH 8/9] cxl/test: Add specific events ira.weiny
@ 2022-08-25 11:37   ` Jonathan Cameron
  0 siblings, 0 replies; 51+ messages in thread
From: Jonathan Cameron @ 2022-08-25 11:37 UTC (permalink / raw)
  To: ira.weiny
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Fri, 12 Aug 2022 22:32:42 -0700
ira.weiny@intel.com wrote:

> From: Ira Weiny <ira.weiny@intel.com>
> 
> Each type of event has different trace point outputs.
> 
> Add mock General Media Event, DRAM event, and Memory Module Event
> records to the mock list of events returned.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> ---
>  tools/testing/cxl/test/mem.c | 70 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 70 insertions(+)
> 
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index 87196d62acf5..c5d7857ae2e5 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -391,6 +391,70 @@ struct cxl_event_record_raw hardware_replace = {
>  	.data = { 0xDE, 0xAD, 0xBE, 0xEF },
>  };
>  
> +struct cxl_evt_gen_media gen_media = {
> +	.hdr = {
> +		.id = UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
> +				0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6),
> +		.flags_length = cpu_to_le32((CXL_EVENT_RECORD_FLAG_PERMANENT << 8) |
> +					     sizeof(struct cxl_evt_gen_media)),
> +		/* .handle = Set dynamically */
> +		.related_handle = cpu_to_le16(0),
> +	},
> +	.phys_addr = cpu_to_le64(0x2000),
> +	.descriptor = CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT,
> +	.type = CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,
> +	.transaction_type = CXL_GMER_TRANS_HOST_WRITE,
> +	.validity_flags = cpu_to_le16(CXL_GMER_VALID_CHANNEL |
> +				      CXL_GMER_VALID_RANK),

No actual effect (I think: __put_unaligned_t is basically
forcing a packed structure element), but put_unaligned_le16() would
make it clear this is unaligned?
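
To illustrate what the accessor buys you, here is a userspace stand-in for
the byte-wise store that put_unaligned_le16() amounts to. The helper names
are mine, not the kernel's, and this is a sketch rather than a drop-in:

```c
#include <stdint.h>

/*
 * Stand-in for put_unaligned_le16(val, p): store a 16-bit value as two
 * little-endian bytes, with no alignment requirement on p.
 */
static void mock_put_le16(uint16_t val, void *p)
{
	uint8_t *b = p;

	b[0] = (uint8_t)(val & 0xff);
	b[1] = (uint8_t)(val >> 8);
}

/* Matching byte-wise load, the equivalent of get_unaligned_le16(p). */
static uint16_t mock_get_le16(const void *p)
{
	const uint8_t *b = p;

	return (uint16_t)(b[0] | (b[1] << 8));
}
```

Because the store goes byte by byte it is safe at any offset, and the call
site documents that the field is not naturally aligned.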


> +	.channel = 1,
> +	.rank = 30
> +};
> +
> +struct cxl_evt_dram_rec dram_rec = {
> +	.hdr = {
> +		.id = UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
> +				0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24),
> +		.flags_length = cpu_to_le32((CXL_EVENT_RECORD_FLAG_PERF_DEGRADED << 8) |
> +					     sizeof(struct cxl_evt_dram_rec)),
> +		/* .handle = Set dynamically */
> +		.related_handle = cpu_to_le16(0),
> +	},
> +	.phys_addr = cpu_to_le64(0x8000),
> +	.descriptor = CXL_GMER_EVT_DESC_THRESHOLD_EVENT,
> +	.type = CXL_GMER_MEM_EVT_TYPE_INV_ADDR,
> +	.transaction_type = CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB,
> +	.validity_flags = cpu_to_le16(CXL_DER_VALID_CHANNEL |
> +				      CXL_DER_VALID_BANK_GROUP |
> +				      CXL_DER_VALID_BANK |
> +				      CXL_DER_VALID_COLUMN),
> +	.channel = 1,
> +	.bank_group = 5,
> +	.bank = 2,
> +	.column = cpu_to_le16(1024)
> +};
> +
> +struct cxl_evt_mem_mod_rec mem_mod_rec = {
> +	.hdr = {
> +		.id = UUID_INIT(0xfe927475, 0xdd59, 0x4339,
> +				0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74),
> +		.flags_length = cpu_to_le32(sizeof(struct cxl_evt_mem_mod_rec)),
> +		/* .handle = Set dynamically */
> +		.related_handle = cpu_to_le16(0),
> +	},
> +	.event_type = CXL_MMER_TEMP_CHANGE,
> +	.info = {
> +		.health_status = CXL_DHI_HS_PERFORMANCE_DEGRADED,
> +		.media_status = CXL_DHI_MS_ALL_DATA_LOST,
> +		.add_status = (CXL_DHI_AS_CRITICAL << 2) |

Can we use masks + FIELD_PREP() for these rather than
magic shifts here?
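
Something like the following, shown as a userspace sketch. The mask names
are hypothetical and the field positions are only inferred from the shifts
in the patch (2, 4 and 5); in the kernel you would use GENMASK(), BIT() and
FIELD_PREP() from linux/bits.h and linux/bitfield.h rather than the
stand-in helper here:

```c
#include <stdint.h>

/* Hypothetical masks; positions inferred from the shifts in the patch. */
#define MOCK_AS_TEMP_MASK	0x0cu	/* bits 3:2, assumed 2 bits wide */
#define MOCK_AS_COR_VOL_MASK	0x10u	/* bit 4 */
#define MOCK_AS_COR_PER_MASK	0x20u	/* bit 5 */

/* Stand-in for FIELD_PREP(): shift val to the mask's lowest set bit. */
static uint8_t mock_field_prep8(uint8_t mask, uint8_t val)
{
	unsigned int lsb = mask & (~(unsigned int)mask + 1u);

	return (uint8_t)((val * lsb) & mask);
}

/* Build the additional status byte from its three fields. */
static uint8_t mock_add_status(uint8_t temp, uint8_t cor_vol, uint8_t cor_per)
{
	return (uint8_t)(mock_field_prep8(MOCK_AS_TEMP_MASK, temp) |
			 mock_field_prep8(MOCK_AS_COR_VOL_MASK, cor_vol) |
			 mock_field_prep8(MOCK_AS_COR_PER_MASK, cor_per));
}
```

The masks then document the layout in one place and an over-wide value
gets truncated to its field instead of spilling into the neighbour.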

> +			      (CXL_DHI_AS_WARNING << 4) |
> +			      (CXL_DHI_AS_WARNING << 5),
> +		.device_temp = cpu_to_le16(1000),
> +		.dirty_shutdown_cnt = cpu_to_le32(30000),
> +		.cor_vol_err_cnt = cpu_to_le32(30100),
> +		.cor_per_err_cnt = cpu_to_le32(40100),
> +	}
> +};
> +
>  static void devm_cxl_mock_event_logs(struct cxl_memdev *cxlmd)
>  {
>  	struct device *dev = &cxlmd->dev;
> @@ -414,8 +478,14 @@ static void devm_cxl_mock_event_logs(struct cxl_memdev *cxlmd)
>  	es->cxlds = cxlmd->cxlds;
>  
>  	event_store_add_event(es, CXL_EVENT_TYPE_INFO, &maint_needed);
> +	event_store_add_event(es, CXL_EVENT_TYPE_INFO,
> +			      (struct cxl_event_record_raw *)&gen_media);
> +	event_store_add_event(es, CXL_EVENT_TYPE_INFO,
> +			      (struct cxl_event_record_raw *)&mem_mod_rec);
>  
>  	event_store_add_event(es, CXL_EVENT_TYPE_FATAL, &hardware_replace);
> +	event_store_add_event(es, CXL_EVENT_TYPE_FATAL,
> +			      (struct cxl_event_record_raw *)&dram_rec);
>  
>  	store_event_store(es);
>  }



* Re: [RFC PATCH 0/9] CXL: Read and clear event logs
  2022-08-24 10:07     ` Jonathan Cameron
@ 2022-09-01 18:10       ` Dave Jiang
  0 siblings, 0 replies; 51+ messages in thread
From: Dave Jiang @ 2022-09-01 18:10 UTC (permalink / raw)
  To: Jonathan Cameron, Ira Weiny
  Cc: Davidlohr Bueso, Dan Williams, Alison Schofield, Vishal Verma,
	Ben Widawsky, Steven Rostedt, a.manzanares, linux-kernel,
	linux-cxl


On 8/24/2022 3:07 AM, Jonathan Cameron wrote:
> On Mon, 22 Aug 2022 15:53:54 -0700
> Ira Weiny <ira.weiny@intel.com> wrote:
>
>> On Mon, Aug 22, 2022 at 09:18:02AM -0700, Davidlohr Bueso wrote:
>>> On Fri, 12 Aug 2022, ira.weiny@intel.com wrote:
>>>    
>>>> From: Ira Weiny <ira.weiny@intel.com>
>>>>
>>>> Event records inform the OS of various device events.  Events are not needed
>>>> for any kernel operation but various user level software will want to track
>>>> events.
>>>>
>>>> Add event reporting through the trace event mechanism.  On driver load read and
>>>> clear all device events.
>>>>
>>>> Normally interrupts will trigger new events to be reported as they occur.
>>>> Because the interrupt code is still being worked on this series provides a
>>>> cxl-test mechanism to create a series of events and trigger the reporting of
>>>> those events.
>>> Where is this irq code being worked on? I've asked about this for async mbox
>>> commands, and Jonathan has also posted some code for the PMU implementation.
>> I'm still trying to work out how to share irq's between PCI and CXL.  Mainly
>> for DOE.
>>
>> I thought that we could skip IRQ support for DOE completely and this would
>> support your proposal below.  But I just found that:
>>
>> "A device may interrupt the host when CDAT content changes using the MSI
>> associated with this DOE Capability instance."
> As of today that doesn't work because there is no status flag anywhere to let
> you know that was the interrupt source.
>
> It's been raised in appropriate places, but I can't say anymore on that
> until stuff is published.
>
> Hence I'd not worry about that corner for now.
>
>> So I guess it needs to be supported at some point.
>>
>>> Could we not just start with an initial MSI/MSI-X support? Then gradually
>>> interested users can be added? So each "feature" would need to implement
>>> its "get message number" and to install the isr just do the standard:
>>>
>>>       irq = pci_irq_vector(pdev, num);
>>>       irq_name = devm_kasprintf(dev, GFP_KERNEL, "%s_%s\n", dev_name(dev),
>>> 			       cxl_irq_cap_table[feature].name);
>>>       rc = devm_request_irq(dev, irq, isr_fn, IRQF_SHARED, irq_name, info);
>>>
>>> The only complexity I see for this is to know the number of vectors to request
>>> a priori, for which we'd have to get the largest value of all CXL features that
>>> can support interrupts. Something like the following?
>> Generally it seems ok but I have questions below.
>>
>>> One thing I have not
>>> considered in this is the DOE stuff.
>> I think this is the harder thing to support because of needing to allow both
>> the PCI layer and the CXL layer to create irqs.  Potentially at different
>> times.
> My reasoning on this is that IRQ creation has to be done by
> the PCI device driver.  That may result in some juggling and late starting
> or indeed restarting of DOE mailboxes once we can know the list of vectors.
> (e.g. query them by polling, then a later driver register can request enabling
> the DOE with an irq).
> Or it needs the ability to do dynamic increasing of the requested IRQ vectors.

tglx was working on dynamic MSI-X a while back. Not sure what the state
of that is now.

https://lore.kernel.org/lkml/87a6hof5sr.ffs@tglx/T/

DJ

>
>>> Thanks,
>>> Davidlohr
>>>
>>> ------
>>> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
>>> index 88e3a8e54b6a..b334d2f497c1 100644
>>> --- a/drivers/cxl/cxlmem.h
>>> +++ b/drivers/cxl/cxlmem.h
>>> @@ -245,6 +245,8 @@ struct cxl_dev_state {
>>> 	resource_size_t component_reg_phys;
>>> 	u64 serial;
>>>
>>> +	int irq_type; /* MSI-X, MSI */
>>> +
>>> 	struct xarray doe_mbs;
>>>
>>> 	int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
>>> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
>>> index eec597dbe763..95f4b91f43b1 100644
>>> --- a/drivers/cxl/cxlpci.h
>>> +++ b/drivers/cxl/cxlpci.h
>>> @@ -53,15 +53,6 @@
>>>   #define	    CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK			GENMASK(15, 8)
>>>   #define     CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK		GENMASK(31, 16)
>>>
>>> -/* Register Block Identifier (RBI) */
>>> -enum cxl_regloc_type {
>>> -	CXL_REGLOC_RBI_EMPTY = 0,
>>> -	CXL_REGLOC_RBI_COMPONENT,
>>> -	CXL_REGLOC_RBI_VIRT,
>>> -	CXL_REGLOC_RBI_MEMDEV,
>>> -	CXL_REGLOC_RBI_TYPES
>>> -};
>> Why move this?
>>
>>> -
>>>   static inline resource_size_t cxl_regmap_to_base(struct pci_dev *pdev,
>>> 						 struct cxl_register_map *map)
>>>   {
>>> @@ -75,4 +66,44 @@ int devm_cxl_port_enumerate_dports(struct cxl_port *port);
>>>   struct cxl_dev_state;
>>>   int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm);
>>>   void read_cdat_data(struct cxl_port *port);
>>> +
>>> +#define CXL_IRQ_CAPABILITY_TABLE				\
>>> +	C(ISOLATION, "isolation", NULL),			\
>>> +	C(PMU, "pmu_overflow", NULL), /* per pmu instance */	\
>>> +	C(MBOX, "mailbox", NULL), /* primary-only */		\
>>> +	C(EVENT, "event", NULL),
>> This is defining get_max_msgnum to NULL right?
>>
>>> +
>>> +#undef C
>>> +#define C(a, b, c) CXL_IRQ_CAPABILITY_##a
>>> +enum  { CXL_IRQ_CAPABILITY_TABLE };
>>> +#undef C
>>> +#define C(a, b, c) { b, c }
>>> +/**
>>> + * struct cxl_irq_cap - CXL feature that is capable of receiving MSI/MSI-X irqs.
>>> + *
>>> + * @name: Name of the device generating this interrupt.
>>> + * @get_max_msgnum: Get the feature's largest interrupt message number. In cases
>>> + *                  where there is only one instance it also indicates which
>>> + *                  MSI/MSI-X vector is used for the interrupt message generated
>>> + *                  in association with the feature. If the feature does not
>>> + *                  have the Interrupt Supported bit set, then return -1.
>>> + */
>>> +struct cxl_irq_cap {
>>> +	const char *name;
>>> +	int (*get_max_msgnum)(struct cxl_dev_state *cxlds);
>>> +};
>>> +
>>> +static const
>>> +struct cxl_irq_cap cxl_irq_cap_table[] = { CXL_IRQ_CAPABILITY_TABLE };
>>> +#undef C
>> Why all this macro magic?
> Agreed. I'm rarely persuaded it's a good idea to do this sort of trickery
> and it definitely isn't worth the readability problems unless there are a
> large number of users.
>
>>> +
>>> +/* Register Block Identifier (RBI) */
>>> +enum cxl_regloc_type {
>>> +	CXL_REGLOC_RBI_EMPTY = 0,
>>> +	CXL_REGLOC_RBI_COMPONENT,
>>> +	CXL_REGLOC_RBI_VIRT,
>>> +	CXL_REGLOC_RBI_MEMDEV,
>>> +	CXL_REGLOC_RBI_TYPES
>>> +};
>>> +
>>>   #endif /* __CXL_PCI_H__ */
>>> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
>>> index faeb5d9d7a7a..c0fe78e0559b 100644
>>> --- a/drivers/cxl/pci.c
>>> +++ b/drivers/cxl/pci.c
>>> @@ -387,6 +387,52 @@ static int cxl_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
>>> 	return rc;
>>>   }
>>>
>>> +static void cxl_pci_free_irq_vectors(void *data)
>>> +{
>>> +	pci_free_irq_vectors(data);
>>> +}
>>> +
>>> +static int cxl_pci_alloc_irq_vectors(struct cxl_dev_state *cxlds)
>>> +{
>>> +	struct device *dev = cxlds->dev;
>>> +	struct pci_dev *pdev = to_pci_dev(dev);
>>> +	int rc, i, vectors = -1;
>>> +
>>> +	for (i = 0; i < ARRAY_SIZE(cxl_irq_cap_table); i++) {
>>> +		int irq;
>>> +
>>> +		if (!cxl_irq_cap_table[i].get_max_msgnum)
>>> +			continue;
>>> +
>>> +		irq = cxl_irq_cap_table[i].get_max_msgnum(cxlds);
>>> +		vectors = max_t(int, irq, vectors);
>>> +	}
>>> +
>>> +	if (vectors == -1)
>>> +		return -EINVAL; /* no irq support whatsoever */
>>> +
>>> +	vectors++;
>> This is pretty much what earlier versions of the DOE code did with the
>> exception of only having 1 get_max_msgnum() call defined (for DOE).  But there
>> was a lot of debate about how to share vectors with the PCI layer.  And
>> eventually we got rid of it.  I'm still trying to figure it out.  Sorry for
>> being slow.
> I'm not yet seeing a huge advantage in wrapping this up. For now a set of
> linear calls to establish the max irq vector is more readable.  Sure
> down the line moving to this may make sense.
>
>> Perhaps we do this for this series.  However, won't we have an issue if we want
>> to support switch events?
> We 'could' extend existing stuff in the portdrv code (which is ultimately
> where this general approach was copied from ;) but I suspect doing that
> for non generic PCI stuff is going to be controversial.
>
> That whole infrastructure in PCI may need a rewrite.
>
>> Ira
>>
>>> +	rc = pci_alloc_irq_vectors(pdev, vectors, vectors, PCI_IRQ_MSIX);
>>> +	if (rc < 0) {
>>> +		rc = pci_alloc_irq_vectors(pdev, vectors, vectors, PCI_IRQ_MSI);
>>> +		if (rc < 0)
>>> +			return rc;
>>> +
>>> +		cxlds->irq_type = PCI_IRQ_MSI;
>>> +	} else {
>>> +		cxlds->irq_type = PCI_IRQ_MSIX;
>>> +	}
>>> +
>>> +	if (rc != vectors) {
>>> +		pci_err(pdev, "Not enough interrupts; use polling where supported\n");
>>> +		/* Some got allocated; clean them up */
>>> +		cxl_pci_free_irq_vectors(pdev);
>>> +		return -ENOSPC;
>>> +	}
>>> +
>>> +	return devm_add_action_or_reset(dev, cxl_pci_free_irq_vectors, pdev);
>>> +}
>>> +
>>>   static void cxl_pci_destroy_doe(void *mbs)
>>>   {
>>> 	xa_destroy(mbs);
>>> @@ -476,6 +522,9 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>>>
>>> 	cxlds->component_reg_phys = cxl_regmap_to_base(pdev, &map);
>>>
>>> +	if (cxl_pci_alloc_irq_vectors(cxlds))
>>> +		cxlds->irq_type = 0;
>>> +
>>> 	devm_cxl_pci_create_doe(cxlds);
>>>
>>> 	rc = cxl_pci_setup_mailbox(cxlds);


* Re: [RFC PATCH 1/9] cxl/mem: Implement Get Event Records command
  2022-08-24 15:50   ` Jonathan Cameron
@ 2022-09-07  4:28     ` Ira Weiny
  2022-09-08 12:52       ` Jonathan Cameron
  0 siblings, 1 reply; 51+ messages in thread
From: Ira Weiny @ 2022-09-07  4:28 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Wed, Aug 24, 2022 at 04:50:58PM +0100, Jonathan Cameron wrote:
> On Fri, 12 Aug 2022 22:32:35 -0700
> ira.weiny@intel.com wrote:
> 
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > Event records are defined for CXL devices.  Each record is reported in
> > one event log.  Devices are required to support the storage of at least
> > one event record in each event log type.
> Hi Ira,
> 
> Someone went and slipped in a new field in CXL r3.0.  Might be easier
> just to add it now?

Yea I did not notice the difference below.  I thought 3.0 only added records in
this area which I was going to leave to additional patches.  Thanks for
catching this.

> 
> A few other comments inline.
> 
> > 
> > Devices track event log overflow by incrementing a counter and tracking
> > the time of the first and last overflow event seen.
> > 
> > Software queries events via the Get Event Record mailbox command; CXL
> > v3.0 section 8.2.9.2.2.
> rev3.0
> 
> You reference 3.0 but use 2.0 definitions below (I'm guessing this crossed
> with spec release).

It did.  :-/

> 
> > 
> > Issue the Get Event Record mailbox command on driver load.  Trace each
> > record found, as well as any overflow conditions.  Only 1 event is
> > requested for each query.  Optimization of multiple record queries is
> > deferred.
> I'd be tempted to make it easier by using a variable sized fail element and
> an allocation, but fair enough that can come later.

Yea I don't see this as a performant path.  So I'm struggling to justify the
additional complexity of reading more than 1 event at a time.  It is not super
difficult of course but I think my time is better spent elsewhere.

> 
> > 
> > This patch traces a raw event record only and leaves the specific event
> > record types to subsequent patches.
> > 
> > NOTE: checkpatch is not completely happy with the tracing part of this
> > patch but AFAICT it is correct.  I'm open to suggestions if I've done
> > something wrong.
> > 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> 
> > ---
> >  MAINTAINERS                       |   1 +
> >  drivers/cxl/core/mbox.c           |  60 ++++++++++++++
> >  drivers/cxl/cxlmem.h              |  66 ++++++++++++++++
> >  include/trace/events/cxl-events.h | 127 ++++++++++++++++++++++++++++++
> >  include/uapi/linux/cxl_mem.h      |   1 +
> >  5 files changed, 255 insertions(+)
> >  create mode 100644 include/trace/events/cxl-events.h
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 54fa6e2059de..1cb9cec31009 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -5014,6 +5014,7 @@ M:	Dan Williams <dan.j.williams@intel.com>
> >  L:	linux-cxl@vger.kernel.org
> >  S:	Maintained
> >  F:	drivers/cxl/
> > +F:	include/trace/events/cxl*.h
> >  F:	include/uapi/linux/cxl_mem.h
> >  
> >  CONEXANT ACCESSRUNNER USB DRIVER
> > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> > index 16176b9278b4..2cceed8608dc 100644
> > --- a/drivers/cxl/core/mbox.c
> > +++ b/drivers/cxl/core/mbox.c
> > @@ -7,6 +7,9 @@
> >  #include <cxlmem.h>
> >  #include <cxl.h>
> >  
> > +#define CREATE_TRACE_POINTS
> > +#include <trace/events/cxl-events.h>
> > +
> >  #include "core.h"
> >  
> >  static bool cxl_raw_allow_all;
> > @@ -48,6 +51,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
> >  	CXL_CMD(RAW, CXL_VARIABLE_PAYLOAD, CXL_VARIABLE_PAYLOAD, 0),
> >  #endif
> >  	CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
> > +	CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
> >  	CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
> >  	CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
> >  	CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
> > @@ -704,6 +708,62 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
> >  }
> >  EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
> >  
> > +static int cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> > +				   enum cxl_event_log_type type)
> > +{
> > +	struct cxl_get_event_payload payload;
> > +
> > +	do {
> > +		u8 log_type = type;
> > +		u16 record_count;
> > +		int rc;
> > +
> > +		rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVENT_RECORD,
> > +				       &log_type, sizeof(log_type),
> > +				       &payload, sizeof(payload));
> > +		if (rc)
> > +			return rc;
> > +
> > +		record_count = le16_to_cpu(payload.record_count);
> > +		if (record_count > 0)
> 
> If it is anything other than 1 you have a problem.  So for now
> I would check for that.

I assume you mean if there are any records at all?  For the next version I've
checked for 1 here but 0 is also valid if there are no records to return.  So
!= 1 is not an error.

(Currently all logs are checked when the event records are queried and some may
be empty.  I don't plan on trying to distinguish the various interrupts.)
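
To make the intent concrete, the check I have in mind looks roughly like the
sketch below. This is standalone illustration code, not what will be in the
next version; the function name is made up, and treating a count above 1 as
-EINVAL is my assumption about how to handle a response the single-record
payload cannot hold:

```c
#include <errno.h>
#include <stdint.h>

/*
 * Classify the record count from a Get Event Records response when the
 * output payload only has room for one record.
 *
 * Returns 1 if there is a record to trace, 0 if the log was empty, and
 * -EINVAL (assumed policy) if the device claims more than fits.
 */
static int mock_check_record_count(uint16_t record_count)
{
	if (record_count == 0)
		return 0;	/* empty log, not an error */
	if (record_count == 1)
		return 1;	/* exactly one record to trace */

	return -EINVAL;		/* more than the payload can hold */
}
```

So an empty log falls through quietly and only a count above 1 is flagged.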

> 
> > +			trace_cxl_event(dev_name(cxlds->dev), type,
> > +					&payload.record);
> > +
> > +		if (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW)
> > +			trace_cxl_event_overflow(dev_name(cxlds->dev), type,
> > +						 &payload);
> > +
> > +	} while (payload.flags & CXL_GET_EVENT_FLAG_MORE_RECORDS);
> > +
> > +	return 0;
> > +}
> > +
> > +/**
> > + * cxl_mem_get_event_records - Get Event Records from the device
> > + * @cxlds: The device data for the operation
> > + *
> > + * Retrieve all event records available on the device and report them as trace
> > + * events.
> > + *
> > + * See CXL v3.0 @8.2.9.2.2 Get Event Records
> > + */
> > +void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
> > +{
> > +	struct device *dev = cxlds->dev;
> > +	enum cxl_event_log_type log_type;
> > +
> > +	for (log_type = CXL_EVENT_TYPE_INFO;
> > +	     log_type < CXL_EVENT_TYPE_MAX; log_type++) {
> > +		int rc;
> > +
> > +		rc = cxl_mem_get_records_log(cxlds, log_type);
> > +		if (rc)
> > +			dev_err(dev, "Failed to query %s Event Logs : %d",
> > +				cxl_event_log_type_str(log_type), rc);
> > +	}
> > +}
> > +EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, CXL);
> > +
> >  /**
> >   * cxl_mem_get_partition_info - Get partition info
> >   * @cxlds: The device data for the operation
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index 88e3a8e54b6a..f83634f3bc8d 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -4,6 +4,7 @@
> >  #define __CXL_MEM_H__
> >  #include <uapi/linux/cxl_mem.h>
> >  #include <linux/cdev.h>
> > +#include <linux/uuid.h>
> >  #include "cxl.h"
> >  
> >  /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
> > @@ -253,6 +254,7 @@ struct cxl_dev_state {
> >  enum cxl_opcode {
> >  	CXL_MBOX_OP_INVALID		= 0x0000,
> >  	CXL_MBOX_OP_RAW			= CXL_MBOX_OP_INVALID,
> > +	CXL_MBOX_OP_GET_EVENT_RECORD	= 0x0100,
> >  	CXL_MBOX_OP_GET_FW_INFO		= 0x0200,
> >  	CXL_MBOX_OP_ACTIVATE_FW		= 0x0202,
> >  	CXL_MBOX_OP_GET_SUPPORTED_LOGS	= 0x0400,
> > @@ -322,6 +324,69 @@ struct cxl_mbox_identify {
> >  	u8 qos_telemetry_caps;
> >  } __packed;
> >  
> > +/*
> > + * Common Event Record Format
> > + * CXL v3.0 section 8.2.9.2.1; Table 8-42
> > + */
> > +struct cxl_event_record_hdr {
> > +	uuid_t id;
> > +	__le32 flags_length;
> 
> Can you split this into a u8 and
> u8[3] then use the get_unaligned_le24 accessor
> where appropriate? Oh for 24bit types ;)

Sure!  Another function I did not know about.

So the following should be correct ordering?

...
	uuid_t id;
	u8 length;
	u8 flags[3];
	__le16 handle;
...

There are other records which may work better this way too.
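
A quick userspace check of that ordering, under the assumption that the
existing initializers, cpu_to_le32((flags << 8) | length), describe the wire
format correctly. The struct and helper names are made up for the test, and
the 24-bit read is a stand-in for get_unaligned_le24():

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical split of the current __le32 flags_length field. */
struct mock_hdr_split {
	uint8_t length;
	uint8_t flags[3];
};

/* Stand-in for get_unaligned_le24(): read 3 little-endian bytes. */
static uint32_t mock_get_le24(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
}

/*
 * Lay out (flags << 8) | length as little-endian bytes, which is what
 * cpu_to_le32() produces, then view those bytes through the split layout.
 */
static struct mock_hdr_split mock_split_flags_length(uint32_t flags,
						     uint8_t length)
{
	uint32_t v = (flags << 8) | length;
	uint8_t raw[4] = {
		(uint8_t)(v & 0xff),
		(uint8_t)((v >> 8) & 0xff),
		(uint8_t)((v >> 16) & 0xff),
		(uint8_t)((v >> 24) & 0xff),
	};
	struct mock_hdr_split h;

	memcpy(&h, raw, sizeof(h));
	return h;
}
```

If the length comes back in the first byte and the flags in the following
three, then length first, flags second is the right order.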

> 
> > +	__le16 handle;
> > +	__le16 related_handle;
> > +	__le64 timestamp;
> > +	__le64 reserved1;
> 
> As below. Maintenance op from CXL 3.0?  Seems easy
> to add now rather than needing a change later.

Yes I see it now.  Added.

> 
> > +	__le64 reserved2;
> > +} __packed;
> > +
> > +#define EVENT_RECORD_DATA_LENGTH 0x50
> > +struct cxl_event_record_raw {
> > +	struct cxl_event_record_hdr hdr;
> > +	u8 data[EVENT_RECORD_DATA_LENGTH];
> > +} __packed;
> > +
> > +/*
> > + * Get Event Records output payload
> > + * CXL v3.0 section 8.2.9.2.2; Table 8-50
> 
> r3.0 :) (just drop the v and go with 3.0 would be my preference).

Can do.

> 
> > + *
> > + * Space given for 1 record
> > + */
> > +#define CXL_GET_EVENT_FLAG_OVERFLOW		BIT(0)
> > +#define CXL_GET_EVENT_FLAG_MORE_RECORDS	BIT(1)
> > +struct cxl_get_event_payload {
> > +	u8 flags;
> > +	u8 reserved1;
> > +	__le16 overflow_err_count;
> > +	__le64 first_overflow_timestamp;
> > +	__le64 last_overflow_timestamp;
> > +	__le16 record_count;
> > +	u8 reserved2[0xa];
> > +	struct cxl_event_record_raw record;
> > +} __packed;
> > +
> > +enum cxl_event_log_type {
> > +	CXL_EVENT_TYPE_INFO = 0x00,
> > +	CXL_EVENT_TYPE_WARN,
> > +	CXL_EVENT_TYPE_FAIL,
> > +	CXL_EVENT_TYPE_FATAL,
> 
> Worth putting Dynamic capacity in now? Up to you.

Might as well.

> 
> > +	CXL_EVENT_TYPE_MAX
> > +};
> 
> Blank line for readability.

Done.

> 
> > +static inline char *cxl_event_log_type_str(enum cxl_event_log_type type)
> > +{
> > +	switch (type) {
> > +	case CXL_EVENT_TYPE_INFO:
> > +		return "Informational";
> > +	case CXL_EVENT_TYPE_WARN:
> > +		return "Warning";
> > +	case CXL_EVENT_TYPE_FAIL:
> > +		return "Failure";
> > +	case CXL_EVENT_TYPE_FATAL:
> > +		return "Fatal";
> > +	default:
> > +		break;
> > +	}
> > +	return "<unknown>";
> > +}
> > +
> >  struct cxl_mbox_get_partition_info {
> >  	__le64 active_volatile_cap;
> >  	__le64 active_persistent_cap;
> > @@ -381,6 +446,7 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds);
> >  struct cxl_dev_state *cxl_dev_state_create(struct device *dev);
> >  void set_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
> >  void clear_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
> > +void cxl_mem_get_event_records(struct cxl_dev_state *cxlds);
> >  #ifdef CONFIG_CXL_SUSPEND
> >  void cxl_mem_active_inc(void);
> >  void cxl_mem_active_dec(void);
> > diff --git a/include/trace/events/cxl-events.h b/include/trace/events/cxl-events.h
> > new file mode 100644
> > index 000000000000..f4baeae66cf3
> > --- /dev/null
> > +++ b/include/trace/events/cxl-events.h
> > @@ -0,0 +1,127 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#undef TRACE_SYSTEM
> > +#define TRACE_SYSTEM cxl_events
> > +
> > +#if !defined(_CXL_TRACE_EVENTS_H) ||  defined(TRACE_HEADER_MULTI_READ)
> > +#define _CXL_TRACE_EVENTS_H
> > +
> > +#include <linux/tracepoint.h>
> > +
> > +#define EVENT_LOGS					\
> > +	EM(CXL_EVENT_TYPE_INFO,		"Info")		\
> > +	EM(CXL_EVENT_TYPE_WARN,		"Warning")	\
> > +	EM(CXL_EVENT_TYPE_FAIL,		"Failure")	\
> > +	EM(CXL_EVENT_TYPE_FATAL,	"Fatal")	\
> > +	EMe(CXL_EVENT_TYPE_MAX,		"<undefined>")
> 
> Hmm. 4 is defined in CXL 3.0, but I'd assume we won't use tracepoints for
> dynamic capacity events so I guess it doesn't matter.

I'm not sure why you would say that.  I anticipate some user space daemon
requiring these events to set things up.

> 
> > +
> > +/*
> > + * First define the enums in the above macros to be exported to userspace via
> > + * TRACE_DEFINE_ENUM().
> > + */
> > +#undef EM
> > +#undef EMe
> > +#define EM(a, b)	TRACE_DEFINE_ENUM(a);
> > +#define EMe(a, b)	TRACE_DEFINE_ENUM(a);
> > +
> > +EVENT_LOGS
> > +#define show_log_type(type) __print_symbolic(type, EVENT_LOGS)
> > +
> > +/*
> > + * Now redefine the EM and EMe macros to map the enums to the strings that will
> > + * be printed in the output
> > + */
> > +#undef EM
> > +#undef EMe
> > +#define EM(a, b)        {a, b},
> > +#define EMe(a, b)       {a, b}
> > +
> > +TRACE_EVENT(cxl_event_overflow,
> > +
> > +	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> > +		 struct cxl_get_event_payload *payload),
> > +
> > +	TP_ARGS(dev_name, log, payload),
> > +
> > +	TP_STRUCT__entry(
> > +		__string(dev_name, dev_name)
> > +		__field(int, log)
> > +		__field(u16, count)
> > +		__field(u64, first)
> > +		__field(u64, last)
> > +	),
> > +
> > +	TP_fast_assign(
> > +		__assign_str(dev_name, dev_name);
> > +		__entry->log = log;
> > +		__entry->count = le16_to_cpu(payload->overflow_err_count);
> > +		__entry->first = le64_to_cpu(payload->first_overflow_timestamp);
> > +		__entry->last = le64_to_cpu(payload->last_overflow_timestamp);
> > +	),
> > +
> > +	TP_printk("%s: EVENT LOG %s OVERFLOW %u records from %llu to %llu",
> > +		__get_str(dev_name), show_log_type(__entry->log),
> > +		__entry->count, __entry->first, __entry->last)
> > +
> > +);
> > +
> > +/*
> > + * Common Event Record Format
> > + * CXL v2.0 section 8.2.9.1.1; Table 153
> > + */
> > +#define CXL_EVENT_RECORD_FLAG_PERMANENT		BIT(2)
> > +#define CXL_EVENT_RECORD_FLAG_MAINT_NEEDED	BIT(3)
> > +#define CXL_EVENT_RECORD_FLAG_PERF_DEGRADED	BIT(4)
> > +#define CXL_EVENT_RECORD_FLAG_HW_REPLACE	BIT(5)
> > +#define show_hdr_flags(flags)	__print_flags(flags, " | ",			   \
> > +	{ CXL_EVENT_RECORD_FLAG_PERMANENT,	"Permanent Condition"		}, \
> > +	{ CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,	"Maintanance Needed"		}, \
> 
> Maintenance

Thanks done.

> 
> > +	{ CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,	"Performance Degraded"		}, \
> > +	{ CXL_EVENT_RECORD_FLAG_HW_REPLACE,	"Hardware Replacement Needed"	}  \
> > +)
> > +
> > +TRACE_EVENT(cxl_event,
> > +
> > +	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> > +		 struct cxl_event_record_raw *rec),
> > +
> > +	TP_ARGS(dev_name, log, rec),
> > +
> > +	TP_STRUCT__entry(
> > +		__string(dev_name, dev_name)
> > +		__field(int, log)
> > +		__array(u8, id, UUID_SIZE)
> > +		__field(u32, flags)
> > +		__field(u16, handle)
> > +		__field(u16, related_handle)
> > +		__field(u64, timestamp)
> > +		__array(u8, data, EVENT_RECORD_DATA_LENGTH)
> > +		__field(u8, length)
> 
> Do we want the maintenance operation class added in Table 8-42 from CXL 3.0?
> (only noticed because I happen to have that spec revision open rather than 2.0).

Yes done.

There is some discussion with Dan regarding not decoding anything and letting
user space take care of it all.  I think this shows a valid reason Dan
suggested this.

But for now I have added the field.

Thanks for the review,
Ira

> 
> > +	),
> > +
> > +	TP_fast_assign(
> > +		__assign_str(dev_name, dev_name);
> > +		memcpy(__entry->id, &rec->hdr.id, UUID_SIZE);
> > +		__entry->log = log;
> > +		__entry->flags = le32_to_cpu(rec->hdr.flags_length) >> 8;
> > +		__entry->length = le32_to_cpu(rec->hdr.flags_length) & 0xFF;
> > +		__entry->handle = le16_to_cpu(rec->hdr.handle);
> > +		__entry->related_handle = le16_to_cpu(rec->hdr.related_handle);
> > +		__entry->timestamp = le64_to_cpu(rec->hdr.timestamp);
> > +		memcpy(__entry->data, &rec->data, EVENT_RECORD_DATA_LENGTH);
> > +	),
> > +
> > +	TP_printk("%s: %s time=%llu id=%pUl handle=%x related_handle=%x hdr_flags='%s' " \
> > +		  ": %s",
> > +		__get_str(dev_name), show_log_type(__entry->log),
> > +		__entry->timestamp, __entry->id, __entry->handle,
> > +		__entry->related_handle, show_hdr_flags(__entry->flags),
> > +		__print_hex(__entry->data, EVENT_RECORD_DATA_LENGTH)
> > +		)
> > +);
> > +
> > +#endif /* _CXL_TRACE_EVENTS_H */
> > +
> > +/* This part must be outside protection */
> > +#undef TRACE_INCLUDE_FILE
> > +#define TRACE_INCLUDE_FILE cxl-events
> > +#include <trace/define_trace.h>


* Re: [RFC PATCH 1/9] cxl/mem: Implement Get Event Records command
  2022-08-17 22:54   ` Dave Jiang
@ 2022-09-07  4:53     ` Ira Weiny
  0 siblings, 0 replies; 51+ messages in thread
From: Ira Weiny @ 2022-09-07  4:53 UTC (permalink / raw)
  To: Dave Jiang
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Jonathan Cameron, Davidlohr Bueso, linux-kernel,
	linux-cxl

On Wed, Aug 17, 2022 at 03:54:15PM -0700, Jiang, Dave wrote:
> 

[snip]

> > +
> > +/*
> > + * Now redefine the EM and EMe macros to map the enums to the strings that will
> > + * be printed in the output
> > + */
> > +#undef EM
> > +#undef EMe
> > +#define EM(a, b)        {a, b},
> > +#define EMe(a, b)       {a, b}
> > +
> > +TRACE_EVENT(cxl_event_overflow,
> 
> Kind of a general comment for the event names. Maybe just "overflow" instead
> of "cxl_event_overflow" since it shows up in sysfs under the cxl_events
> directory and becomes redundant?

It is redundant for sure...  The problem is that trace_overflow() is pretty
generic but I think it will work.

Ira

> 
> - Dave
> 


* Re: [RFC PATCH 1/9] cxl/mem: Implement Get Event Records command
  2022-09-07  4:28     ` Ira Weiny
@ 2022-09-08 12:52       ` Jonathan Cameron
  2022-09-09 20:53         ` Ira Weiny
  0 siblings, 1 reply; 51+ messages in thread
From: Jonathan Cameron @ 2022-09-08 12:52 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl


> > > +static int cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> > > +				   enum cxl_event_log_type type)
> > > +{
> > > +	struct cxl_get_event_payload payload;
> > > +
> > > +	do {
> > > +		u8 log_type = type;
> > > +		u16 record_count;
> > > +		int rc;
> > > +
> > > +		rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVENT_RECORD,
> > > +				       &log_type, sizeof(log_type),
> > > +				       &payload, sizeof(payload));
> > > +		if (rc)
> > > +			return rc;
> > > +
> > > +		record_count = le16_to_cpu(payload.record_count);
> > > +		if (record_count > 0)  
> > 
> > > If it is anything other than 1 you have a problem..  So for now
> > I would check for that.  
> 
> I assume you mean if there are any records at all?  For the next version I've
> checked for 1 here but 0 is also valid if there are no records to return.  So
> != 1 is not an error.

Yes, I meant if (record_count == 1)
for this case.
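
A rough userspace sketch of the loop with that check, in case it helps.
The mock device, the struct layouts and every sketch_ name are
hypothetical; the only things taken from the patch are the "more records"
flag, the single-record payload and the record_count test.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_FLAG_MORE_RECORDS	(1u << 1)

/* Simplified payload: room for exactly one record, as in the patch. */
struct sketch_payload {
	uint8_t flags;
	uint16_t record_count;
	uint16_t handle;
};

/* Mock device log holding some number of pending records. */
struct sketch_log {
	uint16_t pending;
	uint16_t next_handle;
};

/* Mock "Get Event Records": hands back at most one record per call. */
static int sketch_get_event(struct sketch_log *log, struct sketch_payload *out)
{
	out->flags = 0;
	out->record_count = 0;
	out->handle = 0;

	if (log->pending == 0)
		return 0;	/* empty log: valid, not an error */

	out->record_count = 1;
	out->handle = log->next_handle++;
	log->pending--;
	if (log->pending)
		out->flags |= SKETCH_FLAG_MORE_RECORDS;
	return 0;
}

/* Drain the log; returns how many records would have been traced. */
static unsigned int sketch_drain(struct sketch_log *log)
{
	struct sketch_payload payload;
	unsigned int traced = 0;

	do {
		if (sketch_get_event(log, &payload))
			break;

		/* The payload only has space for one record. */
		assert(payload.record_count <= 1);
		if (payload.record_count == 1)
			traced++;
	} while (payload.flags & SKETCH_FLAG_MORE_RECORDS);

	return traced;
}
```

An empty log goes through the loop body once and traces nothing, which
matches the "0 is also valid" case above.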

> 
> (Currently all logs are checked when the event records are queried and some may
> be empty.  I don't plan on trying to distinguish the various interrupts.)
> 
> >   
> > > +			trace_cxl_event(dev_name(cxlds->dev), type,
> > > +					&payload.record);
> > > +
> > > +		if (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW)
> > > +			trace_cxl_event_overflow(dev_name(cxlds->dev), type,
> > > +						 &payload);
> > > +
> > > +	} while (payload.flags & CXL_GET_EVENT_FLAG_MORE_RECORDS);
> > > +
> > > +	return 0;
> > > +}
> > > +

> > >   * cxl_mem_get_partition_info - Get partition info
> > >   * @cxlds: The device data for the operation
> > > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > > index 88e3a8e54b6a..f83634f3bc8d 100644
> > > --- a/drivers/cxl/cxlmem.h
> > > +++ b/drivers/cxl/cxlmem.h
> > > @@ -4,6 +4,7 @@
> > >  #define __CXL_MEM_H__
> > >  #include <uapi/linux/cxl_mem.h>
> > >  #include <linux/cdev.h>
> > > +#include <linux/uuid.h>
> > >  #include "cxl.h"
> > >  
> > >  /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
> > > @@ -253,6 +254,7 @@ struct cxl_dev_state {
> > >  enum cxl_opcode {
> > >  	CXL_MBOX_OP_INVALID		= 0x0000,
> > >  	CXL_MBOX_OP_RAW			= CXL_MBOX_OP_INVALID,
> > > +	CXL_MBOX_OP_GET_EVENT_RECORD	= 0x0100,
> > >  	CXL_MBOX_OP_GET_FW_INFO		= 0x0200,
> > >  	CXL_MBOX_OP_ACTIVATE_FW		= 0x0202,
> > >  	CXL_MBOX_OP_GET_SUPPORTED_LOGS	= 0x0400,
> > > @@ -322,6 +324,69 @@ struct cxl_mbox_identify {
> > >  	u8 qos_telemetry_caps;
> > >  } __packed;
> > >  
> > > +/*
> > > + * Common Event Record Format
> > > + * CXL v3.0 section 8.2.9.2.1; Table 8-42
> > > + */
> > > +struct cxl_event_record_hdr {
> > > +	uuid_t id;
> > > +	__le32 flags_length;  
> > 
> > Can you split this into a u8 and
> > u8[3] then use the get_unaligned_le24 accessor
> > where appropriate? Oh for 24bit types ;)  
> 
> Sure!  Another function I did not know about.
> 
> So the following should be correct ordering?
> 
> ...
> 	uuid_t id;
> 	u8 length;
> 	u8 flags[3];
> 	__le16 handle;
> ...
> 
Looks good.

> There are other records which may work better this way too.
> 
> >   
> > > +	__le16 handle;
> > > +	__le16 related_handle;
> > > +	__le64 timestamp;
> > > +	__le64 reserved1;  
> > 
> > As below. Maintenance op from CXL 3.0?  Seems easy
> > to add now rather than needing a change later.  
> 
> Yes I see it now.  Added.
> 
> >   
> > > +	__le64 reserved2;
> > > +} __packed;
> > > +
> > > +#define EVENT_RECORD_DATA_LENGTH 0x50
> > > +struct cxl_event_record_raw {
> > > +	struct cxl_event_record_hdr hdr;
> > > +	u8 data[EVENT_RECORD_DATA_LENGTH];
> > > +} __packed;
> > > +
> > > +/*
> > > + * Get Event Records output payload
> > > + * CXL v3.0 section 8.2.9.2.2; Table 8-50  
> > 
> > r3.0 :) (just drop the v and go with 3.0 would be my preference).  
> 
> Can do.
> 
> >   
> > > + *
> > > + * Space given for 1 record
> > > + */
> > > +#define CXL_GET_EVENT_FLAG_OVERFLOW		BIT(0)
> > > +#define CXL_GET_EVENT_FLAG_MORE_RECORDS	BIT(1)
> > > +struct cxl_get_event_payload {
> > > +	u8 flags;
> > > +	u8 reserved1;
> > > +	__le16 overflow_err_count;
> > > +	__le64 first_overflow_timestamp;
> > > +	__le64 last_overflow_timestamp;
> > > +	__le16 record_count;
> > > +	u8 reserved2[0xa];
> > > +	struct cxl_event_record_raw record;
> > > +} __packed;
> > > +
> > > +enum cxl_event_log_type {
> > > +	CXL_EVENT_TYPE_INFO = 0x00,
> > > +	CXL_EVENT_TYPE_WARN,
> > > +	CXL_EVENT_TYPE_FAIL,
> > > +	CXL_EVENT_TYPE_FATAL,  
> > 
> > Worth putting Dynamic capacity in now? Up to you.  
> 
> Might as well.
> 
> >   
> > > +	CXL_EVENT_TYPE_MAX
> > > +};  
> > 
> > Blank line for readability.  
> 
> Done.
> 
> >   
> > > +static inline char *cxl_event_log_type_str(enum cxl_event_log_type type)
> > > +{
> > > +	switch (type) {
> > > +	case CXL_EVENT_TYPE_INFO:
> > > +		return "Informational";
> > > +	case CXL_EVENT_TYPE_WARN:
> > > +		return "Warning";
> > > +	case CXL_EVENT_TYPE_FAIL:
> > > +		return "Failure";
> > > +	case CXL_EVENT_TYPE_FATAL:
> > > +		return "Fatal";
> > > +	default:
> > > +		break;
> > > +	}
> > > +	return "<unknown>";
> > > +}
> > > +
> > >  struct cxl_mbox_get_partition_info {
> > >  	__le64 active_volatile_cap;
> > >  	__le64 active_persistent_cap;
> > > @@ -381,6 +446,7 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds);
> > >  struct cxl_dev_state *cxl_dev_state_create(struct device *dev);
> > >  void set_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
> > >  void clear_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
> > > +void cxl_mem_get_event_records(struct cxl_dev_state *cxlds);
> > >  #ifdef CONFIG_CXL_SUSPEND
> > >  void cxl_mem_active_inc(void);
> > >  void cxl_mem_active_dec(void);
> > > diff --git a/include/trace/events/cxl-events.h b/include/trace/events/cxl-events.h
> > > new file mode 100644
> > > index 000000000000..f4baeae66cf3
> > > --- /dev/null
> > > +++ b/include/trace/events/cxl-events.h
> > > @@ -0,0 +1,127 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > +#undef TRACE_SYSTEM
> > > +#define TRACE_SYSTEM cxl_events
> > > +
> > > +#if !defined(_CXL_TRACE_EVENTS_H) ||  defined(TRACE_HEADER_MULTI_READ)
> > > +#define _CXL_TRACE_EVENTS_H
> > > +
> > > +#include <linux/tracepoint.h>
> > > +
> > > +#define EVENT_LOGS					\
> > > +	EM(CXL_EVENT_TYPE_INFO,		"Info")		\
> > > +	EM(CXL_EVENT_TYPE_WARN,		"Warning")	\
> > > +	EM(CXL_EVENT_TYPE_FAIL,		"Failure")	\
> > > +	EM(CXL_EVENT_TYPE_FATAL,	"Fatal")	\
> > > +	EMe(CXL_EVENT_TYPE_MAX,		"<undefined>")  
> > 
> > Hmm. 4 is defined in CXL 3.0, but I'd assume we won't use tracepoints for
> > dynamic capacity events so I guess it doesn't matter.  
> 
> I'm not sure why you would say that.  I anticipate some user space daemon
> requiring these events to set things up.

Certainly a possible solution. I'd kind of expect a more hand shake based approach
than a tracepoint.  Guess we'll see :)


> >   
> > > +	{ CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,	"Performance Degraded"		}, \
> > > +	{ CXL_EVENT_RECORD_FLAG_HW_REPLACE,	"Hardware Replacement Needed"	}  \
> > > +)
> > > +
> > > +TRACE_EVENT(cxl_event,
> > > +
> > > +	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> > > +		 struct cxl_event_record_raw *rec),
> > > +
> > > +	TP_ARGS(dev_name, log, rec),
> > > +
> > > +	TP_STRUCT__entry(
> > > +		__string(dev_name, dev_name)
> > > +		__field(int, log)
> > > +		__array(u8, id, UUID_SIZE)
> > > +		__field(u32, flags)
> > > +		__field(u16, handle)
> > > +		__field(u16, related_handle)
> > > +		__field(u64, timestamp)
> > > +		__array(u8, data, EVENT_RECORD_DATA_LENGTH)
> > > +		__field(u8, length)  
> > 
> > Do we want the maintenance operation class added in Table 8-42 from CXL 3.0?
> > (only noticed because I happen to have that spec revision open rather than 2.0).  
> 
> Yes done.
> 
> There is some discussion with Dan regarding not decoding anything and letting
> user space take care of it all.  I think this shows a valid reason Dan
> suggested this.

I like being able to print tracepoints without userspace tools.
This also enforces structure and stability of interface which I like.

Maybe a raw tracepoint or variable length trailing buffer to pass
on what we don't understand?

Jonathan




* Re: [RFC PATCH 1/9] cxl/mem: Implement Get Event Records command
  2022-09-08 12:52       ` Jonathan Cameron
@ 2022-09-09 20:53         ` Ira Weiny
  2022-09-20 15:49           ` Jonathan Cameron
  0 siblings, 1 reply; 51+ messages in thread
From: Ira Weiny @ 2022-09-09 20:53 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Thu, Sep 08, 2022 at 01:52:40PM +0100, Jonathan Cameron wrote:
> 

[snip]

> > > > diff --git a/include/trace/events/cxl-events.h b/include/trace/events/cxl-events.h
> > > > new file mode 100644
> > > > index 000000000000..f4baeae66cf3
> > > > --- /dev/null
> > > > +++ b/include/trace/events/cxl-events.h
> > > > @@ -0,0 +1,127 @@
> > > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > > +#undef TRACE_SYSTEM
> > > > +#define TRACE_SYSTEM cxl_events
> > > > +
> > > > +#if !defined(_CXL_TRACE_EVENTS_H) ||  defined(TRACE_HEADER_MULTI_READ)
> > > > +#define _CXL_TRACE_EVENTS_H
> > > > +
> > > > +#include <linux/tracepoint.h>
> > > > +
> > > > +#define EVENT_LOGS					\
> > > > +	EM(CXL_EVENT_TYPE_INFO,		"Info")		\
> > > > +	EM(CXL_EVENT_TYPE_WARN,		"Warning")	\
> > > > +	EM(CXL_EVENT_TYPE_FAIL,		"Failure")	\
> > > > +	EM(CXL_EVENT_TYPE_FATAL,	"Fatal")	\
> > > > +	EMe(CXL_EVENT_TYPE_MAX,		"<undefined>")  
> > > 
> > > Hmm. 4 is defined in CXL 3.0, but I'd assume we won't use tracepoints for
> > > dynamic capacity events so I guess it doesn't matter.  
> > 
> > I'm not sure why you would say that.  I anticipate some user space daemon
> > requiring these events to set things up.
> 
> Certainly a possible solution. I'd kind of expect a more hand shake based approach
> than a tracepoint.  Guess we'll see :)

Yea I think we should wait and see.

> 
> 
> > >   
> > > > +	{ CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,	"Performance Degraded"		}, \
> > > > +	{ CXL_EVENT_RECORD_FLAG_HW_REPLACE,	"Hardware Replacement Needed"	}  \
> > > > +)
> > > > +
> > > > +TRACE_EVENT(cxl_event,
> > > > +
> > > > +	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> > > > +		 struct cxl_event_record_raw *rec),
> > > > +
> > > > +	TP_ARGS(dev_name, log, rec),
> > > > +
> > > > +	TP_STRUCT__entry(
> > > > +		__string(dev_name, dev_name)
> > > > +		__field(int, log)
> > > > +		__array(u8, id, UUID_SIZE)
> > > > +		__field(u32, flags)
> > > > +		__field(u16, handle)
> > > > +		__field(u16, related_handle)
> > > > +		__field(u64, timestamp)
> > > > +		__array(u8, data, EVENT_RECORD_DATA_LENGTH)
> > > > +		__field(u8, length)  
> > > 
> > > Do we want the maintenance operation class added in Table 8-42 from CXL 3.0?
> > > (only noticed because I happen to have that spec revision open rather than 2.0).  
> > 
> > Yes done.
> > 
> > There is some discussion with Dan regarding not decoding anything and letting
> > user space take care of it all.  I think this shows a valid reason Dan
> > suggested this.
> 
> I like being able to print tracepoints without userspace tools.
> This also enforces structure and stability of interface which I like.

I tend to agree with you.

> 
> Maybe a raw tracepoint or variable length trailing buffer to pass
> on what we don't understand?

I've already realized that we need to print all reserved fields for this
reason.  If there is something the kernel does not understand user space can
just figure it out on its own.

Sound reasonable?

Ira

> 
> Jonathan
> 
> 


* Re: [RFC PATCH 2/9] cxl/mem: Implement Clear Event Records command
  2022-08-24 15:55   ` Jonathan Cameron
@ 2022-09-09 21:35     ` Ira Weiny
  0 siblings, 0 replies; 51+ messages in thread
From: Ira Weiny @ 2022-09-09 21:35 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Wed, Aug 24, 2022 at 04:55:13PM +0100, Jonathan Cameron wrote:
> On Fri, 12 Aug 2022 22:32:36 -0700
> ira.weiny@intel.com wrote:
> 
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > CXL v3.0 section 8.2.9.2.3 defines the Clear Event Records mailbox
> > command.  After an event record is read it needs to be cleared from the
> > event log.
> > 
> > Implement cxl_clear_event_record() and call it for each record retrieved
> > from the device.
> > 
> > Each record is cleared individually.  A clear all bit is specified but
> > events could arrive between a get and the final clear all operation.
> > Therefore each event is cleared specifically.
> > 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> Trivial suggestions inline, but other than that LGTM
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Thanks!

> 
> > ---
> >  drivers/cxl/core/mbox.c      | 31 ++++++++++++++++++++++++++++---
> >  drivers/cxl/cxlmem.h         | 15 +++++++++++++++
> >  include/uapi/linux/cxl_mem.h |  1 +
> >  3 files changed, 44 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> > index 2cceed8608dc..493f5ceb5d1c 100644
> > --- a/drivers/cxl/core/mbox.c
> > +++ b/drivers/cxl/core/mbox.c
> > @@ -52,6 +52,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
> >  #endif
> >  	CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
> >  	CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
> > +	CXL_CMD(CLEAR_EVENT_RECORD, CXL_VARIABLE_PAYLOAD, 0, 0),
> >  	CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
> >  	CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
> >  	CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
> > @@ -708,6 +709,26 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
> >  }
> >  EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
> >  
> > +static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
> > +				  enum cxl_event_log_type log,
> > +				  __le16 handle)
> > +{
> > +	struct cxl_mbox_clear_event_payload payload;
> > +	int rc;
> > +
> > +	memset(&payload, 0, sizeof(payload));
> 
> Could just do payload = {};
> 
> Though as you are setting stuff, why not just do
> 
> payload = {
> 	.event_log = log,
> 	.nr_recs = 1,
> 	.handle = handle,
> };
> and let the compiler zero anything else (I think there are no holes to complicate
> things).

Yea!  Done.
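
A small userspace sketch of the designated-initializer form, for
illustration.  The struct here is an assumption: only event_log, nr_recs
and handle appear in the patch hunk above, so clear_flags, reserved and
the sketch_ names are hypothetical, and handle is kept in host byte order
rather than __le16.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout; only event_log, nr_recs and handle come from the patch. */
struct sketch_clear_event_payload {
	uint8_t event_log;
	uint8_t clear_flags;	/* hypothetical */
	uint8_t nr_recs;
	uint8_t reserved[3];	/* hypothetical */
	uint16_t handle;	/* __le16 in the kernel */
};

/* No holes: the members add up to the full size of the struct. */
static_assert(sizeof(struct sketch_clear_event_payload) == 8,
	      "payload must not contain padding");

static inline struct sketch_clear_event_payload
sketch_make_clear_payload(uint8_t log, uint16_t handle)
{
	/* Members not named in the initializer are zeroed by the compiler. */
	struct sketch_clear_event_payload payload = {
		.event_log = log,
		.nr_recs = 1,
		.handle = handle,
	};

	return payload;
}
```

This replaces the memset() plus three assignments with one initializer,
and the size check covers the "no holes" assumption for this layout.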

> 
> > +	payload.event_log = log;
> > +	payload.nr_recs = 1;
> > +	payload.handle = handle;
> > +
> > +	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_CLEAR_EVENT_RECORD,
> > +			       &payload, sizeof(payload), NULL, 0);
> 
> return cxl_mbox_send_cmd() and drop rc definition.

And Done.  I've also used the return value now!  ;-)

Thanks again!
Ira


* Re: [RFC PATCH 4/9] cxl/mem: Trace General Media Event Record
  2022-08-24 16:11   ` Jonathan Cameron
@ 2022-09-12 22:38     ` Ira Weiny
  2022-09-20 15:52       ` Jonathan Cameron
  0 siblings, 1 reply; 51+ messages in thread
From: Ira Weiny @ 2022-09-12 22:38 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Wed, Aug 24, 2022 at 05:11:13PM +0100, Jonathan Cameron wrote:
> On Fri, 12 Aug 2022 22:32:38 -0700
> ira.weiny@intel.com wrote:
> 
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > CXL v3.0 section 8.2.9.2.1.1 defines the General Media Event Record.
> > 
> > Determine if the event read is a general media record and if so trace
> > the record.
> > 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> A few trivial things inline...
> 

[snip]

> > +/*
> > + * General Media Event Record - GMER
> > + * CXL v2.0 Section 8.2.9.1.1.1; Table 154
> > + */
> > +#define CXL_GMER_PHYS_ADDR_VOLATILE			BIT(0)
> > +#define CXL_GMER_PHYS_ADDR_MASK				0x3f
> 
> Inverse of mask is confusing. Just specify the full mask.

Fixed

[snip]

> > +	TP_printk("%s: %s time=%llu id=%pUl handle=%x related_handle=%x hdr_flags='%s': " \
> > +		  "phys_addr=%llx volatile=%s desc='%s' type='%s' trans_type='%s' channel=%u " \
> > +		  "rank=%u device=%x comp_id=%s valid_flags='%s'",
> > +		__get_str(dev_name), show_log_type(__entry->log),
> > +		__entry->timestamp, __entry->id, __entry->handle,
> > +		__entry->related_handle, show_hdr_flags(__entry->flags),
> > +		__entry->phys_addr & ~CXL_GMER_PHYS_ADDR_MASK,
> > +		(__entry->phys_addr & CXL_GMER_PHYS_ADDR_VOLATILE) ? "TRUE" : "FALSE",
> > +		show_event_desc_flags(__entry->descriptor),
> > +		show_mem_event_type(__entry->type),
> > +		show_trans_type(__entry->transaction_type),
> > +		__entry->channel, __entry->rank, __entry->device,
> > +		__print_hex(__entry->comp_id, CXL_EVT_GEN_MED_COMP_ID_SIZE),
> > +		show_valid_flags(__entry->validity_flags)
> 
> Can we make the printing of fields with valid flags conditional?
> Been a while since I wrote a Trace point, but I think I recall doing that..

I'm not seeing a way right off.  But I can't say it is impossible...

I'll keep an eye out as I clean the series up,
Ira

> 
> > +		)
> > +);
> > +
> >  #endif /* _CXL_TRACE_EVENTS_H */
> >  
> >  /* This part must be outside protection */
> 


* Re: [RFC PATCH 5/9] cxl/mem: Trace DRAM Event Record
  2022-08-25 10:46   ` Jonathan Cameron
@ 2022-09-12 23:04     ` Ira Weiny
  2022-09-20 16:02       ` Jonathan Cameron
  0 siblings, 1 reply; 51+ messages in thread
From: Ira Weiny @ 2022-09-12 23:04 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Thu, Aug 25, 2022 at 11:46:32AM +0100, Jonathan Cameron wrote:
> On Fri, 12 Aug 2022 22:32:39 -0700
> ira.weiny@intel.com wrote:
> 
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > CXL v3.0 section 8.2.9.2.1.2 defines the DRAM Event Record.
> > 
> > Determine if the event read is a DRAM event record and if so trace the
> > record.
> > 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > 
> > ---
> > This record has a very odd byte layout with 2 - 16 bit fields
> > (validity_flags and column) aligned on an odd byte boundary.  In
> > addition nibble_mask and row are oddly aligned.
> > 
> > I've made my best guess as to how the endianness of these fields should
> > be resolved.  But I'm happy to hear from other folks if what I have is
> > wrong.
> My assumption is same as you.  We should sanity check of course by
> poking relevant people.  
> 
> Similar comments in here to previous.  Use the get_unaligned_le24()
> accessors + consider not printing invalid fields.

Yea I've already converted the 3 byte fields to get_unaligned_le24()

> > 
> > struct cxl_evt_dram_rec {
> > 	struct cxl_event_record_hdr hdr;
> > 	__le64 phys_addr;
> > 	u8 descriptor;
> > 	u8 type;
> > 	u8 transaction_type;
> > 	u16 validity_flags;
> > 	u8 channel;
> > 	u8 rank;
> > 	u8 nibble_mask[CXL_EVT_DER_NIBBLE_MASK_SIZE];
> > 	u8 bank_group;
> > 	u8 bank;
> > 	u8 row[CXL_EVT_DER_ROW_SIZE];
> > 	u16 column;
> > 	u8 correction_mask[CXL_EVT_DER_CORRECTION_MASK_SIZE];
> > } __packed;
> > ---
> >  drivers/cxl/core/mbox.c           |  16 +++++
> >  drivers/cxl/cxlmem.h              |  24 +++++++
> >  include/trace/events/cxl-events.h | 114 ++++++++++++++++++++++++++++++
> >  3 files changed, 154 insertions(+)
> > 
> > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> > index 0e433f072163..6414588a3c7b 100644
> > --- a/drivers/cxl/core/mbox.c
> > +++ b/drivers/cxl/core/mbox.c
> > @@ -717,6 +717,14 @@ static const uuid_t gen_media_event_uuid =
> >  	UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
> >  		  0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6);
> >  
> > +/*
> > + * DRAM Event Record
> > + * CXL v3.0 section 8.2.9.2.1.2; Table 8-44
> rev3.0, r3.0 or just 3.0  

Already done.

> 
> > + */
> > +static const uuid_t dram_event_uuid =
> > +	UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
> > +		  0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24);
> > +
> >  static void cxl_trace_event_record(const char *dev_name,
> >  				   enum cxl_event_log_type type,
> >  				   struct cxl_get_event_payload *payload)
> > @@ -731,6 +739,14 @@ static void cxl_trace_event_record(const char *dev_name,
> >  		return;
> >  	}
> >  
> > +	if (uuid_equal(id, &dram_event_uuid)) {
> Why not else if?  Should be obvious to compiler that multiple uuid_equal
> conditions can't match, but even better to not make it try hard perhaps?

Sure else if can work.
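
Roughly what the chained form could look like, as a userspace sketch.
The enum, the sketch_ names and the use of memcmp() in place of
uuid_equal() are all hypothetical; the byte arrays are the two UUIDs from
the patch written out in the order UUID_INIT() stores them (each field
most-significant byte first).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum sketch_record_kind {
	SKETCH_REC_UNKNOWN,
	SKETCH_REC_GEN_MEDIA,
	SKETCH_REC_DRAM,
};

/* General Media Event Record UUID from the patch. */
static const uint8_t sketch_gen_media_uuid[16] = {
	0xfb, 0xcd, 0x0a, 0x77, 0xc2, 0x60, 0x41, 0x7f,
	0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6,
};

/* DRAM Event Record UUID from the patch. */
static const uint8_t sketch_dram_uuid[16] = {
	0x60, 0x1d, 0xcb, 0xb3, 0x9c, 0x06, 0x4e, 0xab,
	0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24,
};

static_assert(sizeof(sketch_dram_uuid) == sizeof(sketch_gen_media_uuid),
	      "both UUIDs are 16 bytes");

/* At most one comparison can match, so chain them with else if. */
static inline enum sketch_record_kind sketch_classify(const uint8_t id[16])
{
	enum sketch_record_kind kind = SKETCH_REC_UNKNOWN;

	if (memcmp(id, sketch_gen_media_uuid, 16) == 0)
		kind = SKETCH_REC_GEN_MEDIA;
	else if (memcmp(id, sketch_dram_uuid, 16) == 0)
		kind = SKETCH_REC_DRAM;

	return kind;
}
```

Unknown UUIDs fall through to SKETCH_REC_UNKNOWN, which corresponds to
tracing just the raw record in the patch.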

> 
> > +		struct cxl_evt_dram_rec *rec =
> > +				(struct cxl_evt_dram_rec *)&payload->record;
> > +
> > +		trace_cxl_dram_event(dev_name, type, rec);
> > +		return;
> > +	}
> > +
> >  	/* For unknown record types print just the header */
> >  	trace_cxl_event(dev_name, type, &payload->record);
> >  }
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index 33669459ae4b..50536c0a7850 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -421,6 +421,30 @@ struct cxl_evt_gen_media {
> >  	u8 component_id[CXL_EVT_GEN_MED_COMP_ID_SIZE];
> >  } __packed;
> >  
> > +/*
> > + * DRAM Event Record - DER
> > + * CXL v3.0 section 8.2.9.2.1.2; Table 8-44
> > + */
> > +#define CXL_EVT_DER_NIBBLE_MASK_SIZE		3
> > +#define CXL_EVT_DER_ROW_SIZE			3
> > +#define CXL_EVT_DER_CORRECTION_MASK_SIZE	0x20
> > +struct cxl_evt_dram_rec {
> > +	struct cxl_event_record_hdr hdr;
> > +	__le64 phys_addr;
> > +	u8 descriptor;
> > +	u8 type;
> > +	u8 transaction_type;
> > +	u16 validity_flags;
> I've not tried it, but can we just mark these as __le16 and use
> the unaligned accessors?  get_unaligned_le16 etc

get_unaligned_le16() requires a byte array...

So I think this needs to be:

	u8 validity_flags[2];

Now that I know about those calls I think this does make a lot more sense.  The
test code works but I knew that it would be sketchy with real devices.

I'll adjust this.
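
To make the odd alignment concrete, here is a userspace sketch of the
record body (header omitted) with every multi-byte field as a byte array.
The field order is taken from the struct quoted above; the sketch_ names
are hypothetical and sketch_get_unaligned_le16() is a local stand-in for
the kernel helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's get_unaligned_le16(). */
static inline uint16_t sketch_get_unaligned_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | p[1] << 8);
}

/* DRAM event record body up to the column field, header not included. */
struct sketch_dram_body {
	uint8_t phys_addr[8];
	uint8_t descriptor;
	uint8_t type;
	uint8_t transaction_type;
	uint8_t validity_flags[2];
	uint8_t channel;
	uint8_t rank;
	uint8_t nibble_mask[3];
	uint8_t bank_group;
	uint8_t bank;
	uint8_t row[3];
	uint8_t column[2];
};

/* Both 16-bit fields land on odd offsets within the body. */
static_assert(offsetof(struct sketch_dram_body, validity_flags) == 11,
	      "validity_flags is not 2-byte aligned");
static_assert(offsetof(struct sketch_dram_body, column) == 23,
	      "column is not 2-byte aligned");

static inline uint16_t sketch_dram_validity(const struct sketch_dram_body *b)
{
	return sketch_get_unaligned_le16(b->validity_flags);
}
```

Because every member is a byte array, the layout does not depend on
__packed and the reads never rely on an aligned 16-bit load.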

> Also there is get_unaligned_le24() for the 3 byte ones.

Yea done.

[snip]

> > +
> > +	TP_fast_assign(
> > +		/* Common */
> > +		__assign_str(dev_name, dev_name);
> > +		memcpy(__entry->id, &rec->hdr.id, UUID_SIZE);
> > +		__entry->log = log;
> > +		__entry->flags = le32_to_cpu(rec->hdr.flags_length) >> 8;
> > +		__entry->handle = le16_to_cpu(rec->hdr.handle);
> > +		__entry->related_handle = le16_to_cpu(rec->hdr.related_handle);
> > +		__entry->timestamp = le64_to_cpu(rec->hdr.timestamp);
> > +
> > +		/* DRAM */
> > +		__entry->phys_addr = le64_to_cpu(rec->phys_addr);
> > +		__entry->descriptor = rec->descriptor;
> > +		__entry->type = rec->type;
> > +		__entry->transaction_type = rec->transaction_type;
> > +		__entry->validity_flags = le16_to_cpu(rec->validity_flags);
> > +		__entry->channel = rec->channel;
> > +		__entry->rank = rec->rank;
> > +		__entry->nibble_mask = rec->nibble_mask[0] << 24 |
> > +				       rec->nibble_mask[1] << 16 |
> > +				       rec->nibble_mask[2] << 8; /* 3 byte LE ? */
> 
> Use get_unaligned_le24()? I'd definitely expect these to be le24.
> 
> 
> > +		__entry->nibble_mask = le32_to_cpu(__entry->nibble_mask);
> 
> That doesn't look right.  You will have unwound the endianness using
> the shifts above. Don't convert it again (noop on le systems, so you
> probably won't see a problem when testing).

I thought I did it right with 2 shifts.  But regardless using
get_unaligned_le24() is better and I've already changed it.
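
To make the difference concrete, here is a small userspace sketch that puts
a little-endian 3-byte read next to the shift pattern from the RFC.  Both
demo_* functions are hypothetical stand-ins written for this comparison, not
kernel code:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's get_unaligned_le24(). */
static uint32_t demo_get_unaligned_le24(const uint8_t *p)
{
	/* Byte 0 is the least significant byte of a little-endian value. */
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
}

/* The shift pattern from the RFC, kept only for comparison. */
static uint32_t demo_rfc_shifts(const uint8_t *p)
{
	/* Byte 0 lands in the most significant position here. */
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8);
}
```

Shifts operate on values rather than on memory, so either function returns
the same number on any host.  That is why a further le32_to_cpu() on the
result would only scramble it on big-endian machines.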

> 
> > +		__entry->bank_group = rec->bank_group;
> > +		__entry->bank = rec->bank;
> > +		__entry->row = rec->row[0] << 24 |
> > +			       rec->row[1] << 16 |
> > +			       rec->row[2] << 8; /* 3 byte LE ? */
> 
> get_unaligned_le24()

... and this one.

> 
> > +		__entry->row = le32_to_cpu(__entry->row);
> 
> > +		__entry->column = le16_to_cpu(rec->column);
> > +		memcpy(__entry->cor_mask, &rec->correction_mask,
> > +			CXL_EVT_DER_CORRECTION_MASK_SIZE);
> > +	),
> > +
> > +	TP_printk("%s: %s time=%llu id=%pUl handle=%x related_handle=%x hdr_flags='%s': " \
> > +		  "phys_addr=%llx volatile=%s desc='%s' type='%s' trans_type='%s' channel=%u " \
> > +		  "rank=%u nibble_mask=%x bank_group=%u bank=%u row=%u column=%u " \
> > +		  "cor_mask=%s valid_flags='%s'",
> > +		__get_str(dev_name), show_log_type(__entry->log),
> > +		__entry->timestamp, __entry->id, __entry->handle,
> > +		__entry->related_handle, show_hdr_flags(__entry->flags),
> > +		__entry->phys_addr & ~CXL_GMER_PHYS_ADDR_MASK,
> > +		(__entry->phys_addr & CXL_GMER_PHYS_ADDR_VOLATILE) ? "TRUE" : "FALSE",
> > +		show_event_desc_flags(__entry->descriptor),
> As before can we not print the invalid ones based on the validity flags?
> 
> Few years ago now, but I did something along those lines for the CCIX equivalent of
> this stuff.  (honestly can't remember much about it now though!)
> Was a bit fiddly but led to nicer prints in my opinion.
> 
> https://lore.kernel.org/all/20191114133919.32290-2-Jonathan.Cameron@huawei.com/

I'm still not seeing anything which alters the actual print in this patch or
ras_event.h

Perhaps I'm missing what you mean by selecting the valid fields.

Something will have to change the TP_printk() format itself from what I can see
and I don't see a way to do that within the trace infrastructure.

We _could_ do that within the C code where trace_dram() is called.  But I'd
like to keep all the info together and let user space decode more than what the
kernel may know.

Ira


* Re: [RFC PATCH 6/9] cxl/mem: Trace Memory Module Event Record
  2022-08-25 10:58   ` Jonathan Cameron
@ 2022-09-14 21:17     ` Ira Weiny
  2022-09-20 16:11       ` Jonathan Cameron
  0 siblings, 1 reply; 51+ messages in thread
From: Ira Weiny @ 2022-09-14 21:17 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Thu, Aug 25, 2022 at 11:58:42AM +0100, Jonathan Cameron wrote:
> On Fri, 12 Aug 2022 22:32:40 -0700
> ira.weiny@intel.com wrote:
> 
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > CXL v3.0 section 8.2.9.2.1.3 defines the Memory Module Event Record.
> > 
> > Determine if the event read is a memory module record and if so trace the
> > record.
> > 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> Similar comments to on previous patches around using
> get_unaligned_le*()

Yep...

[snip]

> >  
> > +/*
> > + * Get Health Info Record
> > + * CXL v3.0 section 8.2.9.8.3.1; Table 8-100
> > + */
> > +struct cxl_get_health_info {
> > +	u8 health_status;
> > +	u8 media_status;
> > +	u8 add_status;
> > +	u8 life_used;
> > +	u16 device_temp;
> 
> As previous - even though they aren't aligned, I'd have thought
> __le16 etc will still work.  The unaligned accessors are fine
> taking __le16 * for example.

Ok my bad on using u16 here and I will change it.  I 100% agree that these
should be __le16/__le32.  That said there is no need to use the unaligned
accessors for the 16/32 bit fields.

The unaligned accessors cast the pointer to a __le16/__le32 type and no
architecture redefines those.  So using le{16,32}_to_cpu() should work just
fine on all archs.

[snip]

> > +
> > +	TP_fast_assign(
> > +		/* Common */
> > +		__assign_str(dev_name, dev_name);
> > +		memcpy(__entry->id, &rec->hdr.id, UUID_SIZE);
> > +		__entry->log = log;
> > +		__entry->flags = le32_to_cpu(rec->hdr.flags_length) >> 8;
> > +		__entry->handle = le16_to_cpu(rec->hdr.handle);
> > +		__entry->related_handle = le16_to_cpu(rec->hdr.related_handle);
> > +		__entry->timestamp = le64_to_cpu(rec->hdr.timestamp);
> > +
> > +		/* Memory Module Event */
> > +		__entry->event_type = rec->event_type;
> > +
> > +		/* Device Health Info */
> > +		__entry->health_status = rec->info.health_status;
> > +		__entry->media_status = rec->info.media_status;
> > +		__entry->life_used = rec->info.life_used;
> > +		__entry->dirty_shutdown_cnt = le32_to_cpu(rec->info.dirty_shutdown_cnt);
> > +		__entry->cor_vol_err_cnt = le32_to_cpu(rec->info.cor_vol_err_cnt);
> 
> I've lost track, but my guess is some / all of these need the get_unaligned_le32()
> etc rather than aligned form.  Maybe just be lazy and use the unaligned versions
> even when things happen to be aligned - then we don't have to think about it
> when reviewing :)

See above.  I think the 16/32 bit fields work as intended except for my lack of
using the correct type.

Ira


* Re: [RFC PATCH 7/9] cxl/test: Add generic mock events
  2022-08-25 11:31   ` Jonathan Cameron
@ 2022-09-15 18:53     ` Ira Weiny
  2022-09-20 16:17       ` Jonathan Cameron
  0 siblings, 1 reply; 51+ messages in thread
From: Ira Weiny @ 2022-09-15 18:53 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Thu, Aug 25, 2022 at 12:31:19PM +0100, Jonathan Cameron wrote:
> On Fri, 12 Aug 2022 22:32:41 -0700
> ira.weiny@intel.com wrote:
> 
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > Facilitate testing basic Get/Clear Event functionality by creating
> > multiple logs and generic events with made up UUID's.
> > 
> > Data is completely made up with data patterns which should be easy to
> > spot in trace output.
> Hi Ira,
> 
> I'm tempted to hack the QEMU emulation for this in with appropriately
> complex interface to inject all the record types...

Every time I look at the QEMU code it makes my head spin.  :-(

I really thought about adding some support there.  And I think for irq's it may
work better?  But after your talk today I did a quick search to see what it
would take to do irqs in QEMU and got even more confused.  :-(

> Lots to do there though, so not sure where this fits in my priority list!

I bet it is higher on mine!  ;-)

> 
> > 
> > Test traces are easy to obtain with a small script such as this:
> > 
> > 	#!/bin/bash -x
> > 
> > 	devices=`find /sys/devices/platform -name cxl_mem*`
> > 
> > 	# Generate fake events if reset is passed in
> 
> reset is rather unintuitive naming.
> 
> fill_event_queue maybe or something more in that direction?

Fair enough...  Naming is hard and I'm one of the worst.

I've changed to

<sysfs>/.../event_fill_queue
<sysfs>/.../event_trigger

Thoughts?

[snip]

> >  
> > +/*
> > + * Mock Events
> > + */
> > +struct mock_event_log {
> > +	int cur_event;
> > +	int nr_events;
> > +	struct xarray events;
> 
> I'm not convinced an xarray is appropriate here (I'd have used
> a fixed size array) but meh, I don't care that much and mocking
> code doesn't have to be quick or elegant :)

I rather thought the xarray was more elegant than the fixed array.

> 
> > +};
> > +
> > +struct mock_event_store {
> > +	struct cxl_dev_state *cxlds;
> > +	struct mock_event_log *mock_logs[CXL_EVENT_TYPE_MAX];
> 
> Each entry isn't terribly big and there aren't that many of them.
> Make the code simpler by just embedding the instances here?

That is a good idea.  Not sure any more why I did it this way.

[snip]

> > +
> > +static void event_store_add_event(struct mock_event_store *es,
> > +				  enum cxl_event_log_type log_type,
> > +				  struct cxl_event_record_raw *event)
> > +{
> > +	struct mock_event_log *log;
> > +	struct device *dev = es->cxlds->dev;
> > +	int rc;
> > +
> > +	if (log_type >= CXL_EVENT_TYPE_MAX)
> > +		return;
> > +
> > +	log = es->mock_logs[log_type];
> > +	if (!log) {
> > +		log = devm_kzalloc(dev, sizeof(*log), GFP_KERNEL);
> 
> As above, I'd just embed the logs directly in the containing structure
> rather than allocating on demand. init them all up front.

yep.  Done.

> 
> > +		if (!log) {
> > +			dev_err(dev, "Failed to create %s log\n",
> > +				cxl_event_log_type_str(log_type));
> > +			return;
> > +		}
> > +		xa_init(&log->events);
> > +		devm_add_action(dev, xa_events_destroy, log);
> > +		es->mock_logs[log_type] = log;
> > +	}
> > +
> > +	rc = xa_insert(&log->events, log->nr_events, event, GFP_KERNEL);
> Not sure using an xa for a list really makes that much sense, but
> doesn't matter hugely. 

It is much easier than trying to manage pointers and allows the events to be
inserted more than once.

> > +	if (rc) {
> > +		dev_err(dev, "Failed to store event %s log\n",
> > +			cxl_event_log_type_str(log_type));
> > +		return;
> > +	}
> > +	log->nr_events++;
> 
> Having an index into a static set of events is more complex.
> I'd either switch to a simple array of pointers, or actually add and
> remove events (or pointers to them anyway).

xarray was much easier to deal with than an array of pointers.  Using a list
was hard because I wanted to reuse the static definitions of events rather than
have a bunch of them defined.
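
For comparison, the fixed-array alternative raised in the review might look
roughly like the userspace sketch below.  The demo_* types and the cap of 8
entries are assumptions made for illustration; the mock code in this series
uses an xarray instead:

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_EVENTS 8	/* arbitrary cap for the sketch */

struct demo_event {
	int id;
};

struct demo_event_log {
	int cur_event;		/* index of the next event to hand out */
	int nr_events;		/* number of queued entries */
	const struct demo_event *events[DEMO_MAX_EVENTS];
};

/* Queue a pointer to a static event; the same event may be queued twice. */
static int demo_log_add(struct demo_event_log *log, const struct demo_event *ev)
{
	if (log->nr_events >= DEMO_MAX_EVENTS)
		return -1;
	log->events[log->nr_events++] = ev;
	return 0;
}

/* Return the next unread event and advance, or NULL when drained. */
static const struct demo_event *demo_log_next(struct demo_event_log *log)
{
	if (log->cur_event >= log->nr_events)
		return NULL;
	return log->events[log->cur_event++];
}
```

Storing pointers rather than copies keeps the ability to reuse one static
event definition several times, at the cost of a hard limit on log depth.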

[snip]

> > +
> > +/*
> > + * Get and clear event only handle 1 record at a time as this is what is
> > + * currently implemented in the main code.
> 
> Duplicating this comment seems unnecessary.

I wanted to make it clear this test code could only test what was currently
implemented...

>  
> > + */
> > +static int mock_clear_event(struct cxl_dev_state *cxlds,
> > +			    struct cxl_mbox_cmd *cmd)
> > +{
> > +	struct cxl_mbox_clear_event_payload *pl = cmd->payload_in;
> > +	struct mock_event_log *log;
> > +	u8 log_type = pl->event_log;
> > +
> > +	/* Don't handle more than 1 record at a time */
> > +	if (pl->nr_recs != 1)
> > +		return -EINVAL;

... and this check ...

> > +
> > +	if (log_type >= CXL_EVENT_TYPE_MAX)
> > +		return -EINVAL;
> > +
> > +	log = find_event_log(cxlds, log_type);
> > +	if (!log)
> > +		return 0; /* No mock data in this log */
> > +
> > +	/*
> > +	 * The current code clears events as they are read
> > +	 * Test that behavior; not clearning from the middle of the log
> > +	 */

... and this one; prevents it from blowing up.

[snip]

> > +
> > +static void devm_cxl_mock_event_logs(struct cxl_memdev *cxlmd)
> > +{
> > +	struct device *dev = &cxlmd->dev;
> > +	struct mock_event_store *es;
> > +
> > +	/*
> > +	 * The memory device gets the sysfs attributes such that the cxlmd
> > +	 * pointer can be used to get to a cxlds pointer.
> > +	 */
> > +	if (device_add_groups(dev, cxl_mock_event_groups))
> 
> Whilst it might not matter in a mocking driver, it's normal to jump through
> hoops to avoid doing this because it races with userspace notifications in
> all sorts of hideous ways.  It makes the sysfs maintainers very grumpy ;)

<sigh> I know this is a hack...  I really wanted to hang this off of cxlds but
it did not make sense.

> To do it here, you would need to pass the group to devm_cxl_add_memdev()
> and have that slip it in before the cdev_device_add() call I think.
> That wouldn't be particularly invasive though.

I guess that would work and yea I guess it is not too invasive.

I'll throw it together for the next version and see how it looks/works.

> 
> 
> > +		return;
> > +	if (devm_add_action_or_reset(dev, remove_mock_event_groups, dev))
> > +		return;
> > +
> > +	/*
> > +	 * All the mock event data hangs off the device itself.
> 
> Nitpick of the day: Single line comment syntax ;)

:-D

Done.

Thanks again for the review!
Ira


* Re: [RFC PATCH 1/9] cxl/mem: Implement Get Event Records command
  2022-09-09 20:53         ` Ira Weiny
@ 2022-09-20 15:49           ` Jonathan Cameron
  2022-09-20 20:23             ` Dave Jiang
  0 siblings, 1 reply; 51+ messages in thread
From: Jonathan Cameron @ 2022-09-20 15:49 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Fri, 9 Sep 2022 13:53:55 -0700
Ira Weiny <ira.weiny@intel.com> wrote:

> On Thu, Sep 08, 2022 at 01:52:40PM +0100, Jonathan Cameron wrote:
> >   
> 
> [snip]
> 
> > > > > diff --git a/include/trace/events/cxl-events.h b/include/trace/events/cxl-events.h
> > > > > new file mode 100644
> > > > > index 000000000000..f4baeae66cf3
> > > > > --- /dev/null
> > > > > +++ b/include/trace/events/cxl-events.h
> > > > > @@ -0,0 +1,127 @@
> > > > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > > > +#undef TRACE_SYSTEM
> > > > > +#define TRACE_SYSTEM cxl_events
> > > > > +
> > > > > +#if !defined(_CXL_TRACE_EVENTS_H) ||  defined(TRACE_HEADER_MULTI_READ)
> > > > > +#define _CXL_TRACE_EVENTS_H
> > > > > +
> > > > > +#include <linux/tracepoint.h>
> > > > > +
> > > > > +#define EVENT_LOGS					\
> > > > > +	EM(CXL_EVENT_TYPE_INFO,		"Info")		\
> > > > > +	EM(CXL_EVENT_TYPE_WARN,		"Warning")	\
> > > > > +	EM(CXL_EVENT_TYPE_FAIL,		"Failure")	\
> > > > > +	EM(CXL_EVENT_TYPE_FATAL,	"Fatal")	\
> > > > > +	EMe(CXL_EVENT_TYPE_MAX,		"<undefined>")    
> > > > 
> > > > Hmm. 4 is defined in CXL 3.0, but I'd assume we won't use tracepoints for
> > > > dynamic capacity events so I guess it doesn't matter.    
> > > 
> > > I'm not sure why you would say that.  I anticipate some user space daemon
> > > requiring these events to set things up.  
> > 
> > Certainly a possible solution. I'd kind of expect a more hand shake based approach
> > than a tracepoint.  Guess we'll see :)  
> 
> Yea I think we should wait and see.
> 
> > 
> >   
> > > >     
> > > > > +	{ CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,	"Performance Degraded"		}, \
> > > > > +	{ CXL_EVENT_RECORD_FLAG_HW_REPLACE,	"Hardware Replacement Needed"	}  \
> > > > > +)
> > > > > +
> > > > > +TRACE_EVENT(cxl_event,
> > > > > +
> > > > > +	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> > > > > +		 struct cxl_event_record_raw *rec),
> > > > > +
> > > > > +	TP_ARGS(dev_name, log, rec),
> > > > > +
> > > > > +	TP_STRUCT__entry(
> > > > > +		__string(dev_name, dev_name)
> > > > > +		__field(int, log)
> > > > > +		__array(u8, id, UUID_SIZE)
> > > > > +		__field(u32, flags)
> > > > > +		__field(u16, handle)
> > > > > +		__field(u16, related_handle)
> > > > > +		__field(u64, timestamp)
> > > > > +		__array(u8, data, EVENT_RECORD_DATA_LENGTH)
> > > > > +		__field(u8, length)    
> > > > 
> > > > Do we want the maintenance operation class added in Table 8-42 from CXL 3.0?
> > > > (only noticed because I happen to have that spec revision open rather than 2.0).    
> > > 
> > > Yes done.
> > > 
> > > There is some discussion with Dan regarding not decoding anything and letting
> > > user space take care of it all.  I think this shows a valid reason Dan
> > > suggested this.  
> > 
> > I like being able to print tracepoints with out userspace tools.
> > This also enforces structure and stability of interface which I like.  
> 
> I tend to agree with you.
> 
> > 
> > Maybe a raw tracepoint or variable length trailing buffer to pass
> > on what we don't understand?  
> 
> I've already realized that we need to print all reserved fields for this
> reason.  If there is something the kernel does not understand user space can
> just figure it out on its own.
> 
> Sound reasonable?

Hmm. Printing reserved fields would be unusual.  Not sure what is done for similar
cases elsewhere, CPER records etc...

We could just print a raw array of the whole event as well as decode version, but
that means logging most of the fields twice...

Not nice either.

I'm a bit inclined to say we should maybe just ignore stuff we don't know about or
is there a version number we can use to decide between decoded vs decoded as much as
possible + raw log?

Jonathan

> 
> Ira
> 
> > 
> > Jonathan
> > 
> >   



* Re: [RFC PATCH 4/9] cxl/mem: Trace General Media Event Record
  2022-09-12 22:38     ` Ira Weiny
@ 2022-09-20 15:52       ` Jonathan Cameron
  0 siblings, 0 replies; 51+ messages in thread
From: Jonathan Cameron @ 2022-09-20 15:52 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Mon, 12 Sep 2022 15:38:21 -0700
Ira Weiny <ira.weiny@intel.com> wrote:

> On Wed, Aug 24, 2022 at 05:11:13PM +0100, Jonathan Cameron wrote:
> > On Fri, 12 Aug 2022 22:32:38 -0700
> > ira.weiny@intel.com wrote:
> >   
> > > From: Ira Weiny <ira.weiny@intel.com>
> > > 
> > > CXL v3.0 section 8.2.9.2.1.1 defines the General Media Event Record.
> > > 
> > > Determine if the event read is a general media record and if so trace
> > > the record.
> > > 
> > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>  
> > A few trivial things inline...
> >   
> 
> [snip]
> 
> > > +/*
> > > + * General Media Event Record - GMER
> > > + * CXL v2.0 Section 8.2.9.1.1.1; Table 154
> > > + */
> > > +#define CXL_GMER_PHYS_ADDR_VOLATILE			BIT(0)
> > > +#define CXL_GMER_PHYS_ADDR_MASK				0x3f  
> > 
> > Inverse of mask is confusing. Just specify the full mask.  
> 
> Fixed
> 
> [snip]
> 
> > > +	TP_printk("%s: %s time=%llu id=%pUl handle=%x related_handle=%x hdr_flags='%s': " \
> > > +		  "phys_addr=%llx volatile=%s desc='%s' type='%s' trans_type='%s' channel=%u " \
> > > +		  "rank=%u device=%x comp_id=%s valid_flags='%s'",
> > > +		__get_str(dev_name), show_log_type(__entry->log),
> > > +		__entry->timestamp, __entry->id, __entry->handle,
> > > +		__entry->related_handle, show_hdr_flags(__entry->flags),
> > > +		__entry->phys_addr & ~CXL_GMER_PHYS_ADDR_MASK,
> > > +		(__entry->phys_addr & CXL_GMER_PHYS_ADDR_VOLATILE) ? "TRUE" : "FALSE",
> > > +		show_event_desc_flags(__entry->descriptor),
> > > +		show_mem_event_type(__entry->type),
> > > +		show_trans_type(__entry->transaction_type),
> > > +		__entry->channel, __entry->rank, __entry->device,
> > > +		__print_hex(__entry->comp_id, CXL_EVT_GEN_MED_COMP_ID_SIZE),
> > > +		show_valid_flags(__entry->validity_flags)  
> > 
> > Can we make the printing of fields with valid flags conditional?
> > Been a while since I wrote a Trace point, but I think I recall doing that..  
> 
> I'm not seeing a way right off.  But I can't say it is impossible...

Needs some helper code... Here's one I made earlier (and had almost entirely
banished from my memory!)

https://lore.kernel.org/all/20191114133919.32290-2-Jonathan.Cameron@huawei.com/

> 
> I'll keep an eye out as I clean the series up,
> Ira
> 
> >   
> > > +		)
> > > +);
> > > +
> > >  #endif /* _CXL_TRACE_EVENTS_H */
> > >  
> > >  /* This part must be outside protection */  
> >   



* Re: [RFC PATCH 5/9] cxl/mem: Trace DRAM Event Record
  2022-09-12 23:04     ` Ira Weiny
@ 2022-09-20 16:02       ` Jonathan Cameron
  0 siblings, 0 replies; 51+ messages in thread
From: Jonathan Cameron @ 2022-09-20 16:02 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Mon, 12 Sep 2022 16:04:07 -0700
Ira Weiny <ira.weiny@intel.com> wrote:

> On Thu, Aug 25, 2022 at 11:46:32AM +0100, Jonathan Cameron wrote:
> > On Fri, 12 Aug 2022 22:32:39 -0700
> > ira.weiny@intel.com wrote:
> >   
> > > From: Ira Weiny <ira.weiny@intel.com>
> > > 
> > > CXL v3.0 section 8.2.9.2.1.2 defines the DRAM Event Record.
> > > 
> > > Determine if the event read is a DRAM event record and if so trace the
> > > record.
> > > 
> > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > > 
> > > ---
> > > This record has a very odd byte layout with 2 - 16 bit fields
> > > (validity_flags and column) aligned on an odd byte boundary.  In
> > > addition nibble_mask and row are oddly aligned.
> > > 
> > > I've made my best guess as to how the endianess of these fields should
> > > be resolved.  But I'm happy to hear from other folks if what I have is
> > > wrong.  
> > My assumption is same as you.  We should sanity check of course by
> > poking relevant people.  
> > 
> > Similar comments in here to previous.  Use the get_unaligned_le24()
> > accessors + consider not printing invalid fields.  
> 
> Yea I've already converted the 3 byte fields to get_unaligned_le24()
> 
> > > 
> > > struct cxl_evt_dram_rec {
> > > 	struct cxl_event_record_hdr hdr;
> > > 	__le64 phys_addr;
> > > 	u8 descriptor;
> > > 	u8 type;
> > > 	u8 transaction_type;
> > > 	u16 validity_flags;
> > > 	u8 channel;
> > > 	u8 rank;
> > > 	u8 nibble_mask[CXL_EVT_DER_NIBBLE_MASK_SIZE];
> > > 	u8 bank_group;
> > > 	u8 bank;
> > > 	u8 row[CXL_EVT_DER_ROW_SIZE];
> > > 	u16 column;
> > > 	u8 correction_mask[CXL_EVT_DER_CORRECTION_MASK_SIZE];
> > > } __packed;
> > > ---
> > >  drivers/cxl/core/mbox.c           |  16 +++++
> > >  drivers/cxl/cxlmem.h              |  24 +++++++
> > >  include/trace/events/cxl-events.h | 114 ++++++++++++++++++++++++++++++
> > >  3 files changed, 154 insertions(+)
> > > 
> > > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> > > index 0e433f072163..6414588a3c7b 100644
> > > --- a/drivers/cxl/core/mbox.c
> > > +++ b/drivers/cxl/core/mbox.c
> > > @@ -717,6 +717,14 @@ static const uuid_t gen_media_event_uuid =
> > >  	UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
> > >  		  0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6);
> > >  
> > > +/*
> > > + * DRAM Event Record
> > > + * CXL v3.0 section 8.2.9.2.1.2; Table 8-44  
> > rev3.0, r3.0 or just 3.0    
> 
> Already done.
> 
> >   
> > > + */
> > > +static const uuid_t dram_event_uuid =
> > > +	UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
> > > +		  0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24);
> > > +
> > >  static void cxl_trace_event_record(const char *dev_name,
> > >  				   enum cxl_event_log_type type,
> > >  				   struct cxl_get_event_payload *payload)
> > > @@ -731,6 +739,14 @@ static void cxl_trace_event_record(const char *dev_name,
> > >  		return;
> > >  	}
> > >  
> > > +	if (uuid_equal(id, &dram_event_uuid)) {  
> > Why not else if?  Should be obvious to compiler that multiple uuid_equal
> > conditions can't match, but even better to not make it try hard perhaps?  
> 
> Sure else if can work.
> 
> >   
> > > +		struct cxl_evt_dram_rec *rec =
> > > +				(struct cxl_evt_dram_rec *)&payload->record;
> > > +
> > > +		trace_cxl_dram_event(dev_name, type, rec);
> > > +		return;
> > > +	}
> > > +
> > >  	/* For unknown record types print just the header */
> > >  	trace_cxl_event(dev_name, type, &payload->record);
> > >  }
> > > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > > index 33669459ae4b..50536c0a7850 100644
> > > --- a/drivers/cxl/cxlmem.h
> > > +++ b/drivers/cxl/cxlmem.h
> > > @@ -421,6 +421,30 @@ struct cxl_evt_gen_media {
> > >  	u8 component_id[CXL_EVT_GEN_MED_COMP_ID_SIZE];
> > >  } __packed;
> > >  
> > > +/*
> > > + * DRAM Event Record - DER
> > > + * CXL v3.0 section 8.2.9.2.1.2; Table 8-44
> > > + */
> > > +#define CXL_EVT_DER_NIBBLE_MASK_SIZE		3
> > > +#define CXL_EVT_DER_ROW_SIZE			3
> > > +#define CXL_EVT_DER_CORRECTION_MASK_SIZE	0x20
> > > +struct cxl_evt_dram_rec {
> > > +	struct cxl_event_record_hdr hdr;
> > > +	__le64 phys_addr;
> > > +	u8 descriptor;
> > > +	u8 type;
> > > +	u8 transaction_type;
> > > +	u16 validity_flags;  
> > I've not tried it, but can we just mark these as __le16 and use
> > the unaligned accessors?  get_unaligned_le16 etc  
> 
> get_unaligned_le16() requires a byte array...
> 
> So I think this needs to be:
> 
> 	u8 validity_flags[2];
> 
> Now that I know about those calls I think this does make a lot more sense.  The
> test code works but I knew that it would be sketchy with real devices.
> 
> I'll adjust this.
> 
> > Also there is get_unaligned_le24() for the 3 byte ones.  
> 
> Yea done.
> 
> [snip]
> 
> > > +
> > > +	TP_fast_assign(
> > > +		/* Common */
> > > +		__assign_str(dev_name, dev_name);
> > > +		memcpy(__entry->id, &rec->hdr.id, UUID_SIZE);
> > > +		__entry->log = log;
> > > +		__entry->flags = le32_to_cpu(rec->hdr.flags_length) >> 8;
> > > +		__entry->handle = le16_to_cpu(rec->hdr.handle);
> > > +		__entry->related_handle = le16_to_cpu(rec->hdr.related_handle);
> > > +		__entry->timestamp = le64_to_cpu(rec->hdr.timestamp);
> > > +
> > > +		/* DRAM */
> > > +		__entry->phys_addr = le64_to_cpu(rec->phys_addr);
> > > +		__entry->descriptor = rec->descriptor;
> > > +		__entry->type = rec->type;
> > > +		__entry->transaction_type = rec->transaction_type;
> > > +		__entry->validity_flags = le16_to_cpu(rec->validity_flags);
> > > +		__entry->channel = rec->channel;
> > > +		__entry->rank = rec->rank;
> > > +		__entry->nibble_mask = rec->nibble_mask[0] << 24 |
> > > +				       rec->nibble_mask[1] << 16 |
> > > +				       rec->nibble_mask[2] << 8; /* 3 byte LE ? */  
> > 
> > Use get_unaligned_le24()? I'd definitely expect these to be le24.
> > 
> >   
> > > +		__entry->nibble_mask = le32_to_cpu(__entry->nibble_mask);  
> > 
> > That doesn't look right.  You will have unwound the endianness using
> > the shifts above. Don't convert it again (noop on le systems, so you
> > probably won't see a problem when testing).  
> 
> I thought I did it right with 2 shifts.  But regardless using
> get_unaligned_le24() is better and I've already changed it.
> 
> >   
> > > +		__entry->bank_group = rec->bank_group;
> > > +		__entry->bank = rec->bank;
> > > +		__entry->row = rec->row[0] << 24 |
> > > +			       rec->row[1] << 16 |
> > > +			       rec->row[2] << 8; /* 3 byte LE ? */  
> > 
> > get_unaligned_le24()  
> 
> ... and this one.
> 
> >   
> > > +		__entry->row = le32_to_cpu(__entry->row);  
> >   
> > > +		__entry->column = le16_to_cpu(rec->column);
> > > +		memcpy(__entry->cor_mask, &rec->correction_mask,
> > > +			CXL_EVT_DER_CORRECTION_MASK_SIZE);
> > > +	),
> > > +
> > > +	TP_printk("%s: %s time=%llu id=%pUl handle=%x related_handle=%x hdr_flags='%s': " \
> > > +		  "phys_addr=%llx volatile=%s desc='%s' type='%s' trans_type='%s' channel=%u " \
> > > +		  "rank=%u nibble_mask=%x bank_group=%u bank=%u row=%u column=%u " \
> > > +		  "cor_mask=%s valid_flags='%s'",
> > > +		__get_str(dev_name), show_log_type(__entry->log),
> > > +		__entry->timestamp, __entry->id, __entry->handle,
> > > +		__entry->related_handle, show_hdr_flags(__entry->flags),
> > > +		__entry->phys_addr & ~CXL_GMER_PHYS_ADDR_MASK,
> > > +		(__entry->phys_addr & CXL_GMER_PHYS_ADDR_VOLATILE) ? "TRUE" : "FALSE",
> > > +		show_event_desc_flags(__entry->descriptor),  
> > As before can we not print the invalid ones based on the validity flags?
> > 
> > Few years ago now, but I did something along those lines for the CCIX equivalent of
> > this stuff.  (honestly can't remember much about it now though!)
> > Was a bit fiddly but led to nicer prints in my opinion.
> > 
> > https://lore.kernel.org/all/20191114133919.32290-2-Jonathan.Cameron@huawei.com/  
> 

Ah. And I'd forgotten I shared it in this reply ;)

> I'm still not seeing anything which alters the actual print in this patch or
> ras_event.h
> 
> Perhaps I'm missing what you mean by selecting the valid fields.
> 
> Something will have to change the TP_printk() format itself from what I can see
> and I don't see a way to do that within the trace infrastructure.
> 
> We _could_ do that within the C code where trace_dram() is called.  But I'd
> like to keep all the info together and let user space decode more than what the
> kernel may know.


Take a look at cper_ccix_err_location() e.g. 

+	if (cmem_err->validation_bits & CCIX_MEM_ERR_GENERIC_MEM_VALID)
+		n = snprintf(msg, len, "Pool Generic Type: %s ",
+			     cper_ccix_mem_err_generic_type_str(cmem_err->pool_generic_type));

which is called from the TP_printk() via cper_ccix_mem_err_unpack()

You can call normal code in TP_printk() though indeed that code needs to then
be in a c file, not the tracepoint header.

Given the meaning of those valid fields won't change, I'd be keen not to print
the associated 'invalid' entries as those are kind of misleading.
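
A rough userspace sketch of that kind of helper follows.  It assumes
hypothetical DEMO_VALID_* bits and a cut-down field set; the real bit
assignments come from the CXL spec, and in the kernel such a helper would
live in a C file and be called from TP_printk():

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical validity bits; the real layout comes from the CXL spec. */
#define DEMO_VALID_CHANNEL	(1u << 0)
#define DEMO_VALID_RANK		(1u << 1)
#define DEMO_VALID_ROW		(1u << 2)

struct demo_dram_fields {
	uint16_t validity_flags;
	uint8_t channel;
	uint8_t rank;
	uint32_t row;
};

/*
 * Append "name=value " to buf only when the matching valid bit is set.
 * Assumes len > 0; stops appending once the buffer is full.
 */
static const char *demo_show_valid_fields(char *buf, size_t len,
					  const struct demo_dram_fields *f)
{
	size_t n = 0;

	buf[0] = '\0';
	if (f->validity_flags & DEMO_VALID_CHANNEL)
		n += snprintf(buf + n, len - n, "channel=%u ",
			      (unsigned int)f->channel);
	if (n < len && (f->validity_flags & DEMO_VALID_RANK))
		n += snprintf(buf + n, len - n, "rank=%u ",
			      (unsigned int)f->rank);
	if (n < len && (f->validity_flags & DEMO_VALID_ROW))
		n += snprintf(buf + n, len - n, "row=%u ",
			      (unsigned int)f->row);
	return buf;
}
```

With only the channel and row bits set, the rank field is left out of the
string entirely rather than printed with a meaningless value.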

Note that userspace code doesn't generally consume anything to do with TP_printk()
but rather does its own processing...
E.g. something like:
https://github.com/mchehab/rasdaemon/blob/master/non-standard-hisilicon.c#L210
which happens to be one of our more complex trace point handlers in
rasdaemon.  I think that particular handler decodes for print, but drops the
data in the DB in a fairly raw format.  Some others break it down further for
logging.  Here are the CCIX ones that never went upstream...
https://lore.kernel.org/all/20190827113010.50405-2-Jonathan.Cameron@huawei.com/


Jonathan


> 
> Ira



* Re: [RFC PATCH 6/9] cxl/mem: Trace Memory Module Event Record
  2022-09-14 21:17     ` Ira Weiny
@ 2022-09-20 16:11       ` Jonathan Cameron
  0 siblings, 0 replies; 51+ messages in thread
From: Jonathan Cameron @ 2022-09-20 16:11 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Wed, 14 Sep 2022 14:17:14 -0700
Ira Weiny <ira.weiny@intel.com> wrote:

> On Thu, Aug 25, 2022 at 11:58:42AM +0100, Jonathan Cameron wrote:
> > On Fri, 12 Aug 2022 22:32:40 -0700
> > ira.weiny@intel.com wrote:
> >   
> > > From: Ira Weiny <ira.weiny@intel.com>
> > > 
> > > CXL v3.0 section 8.2.9.2.1.3 defines the Memory Module Event Record.
> > > 
> > > Determine if the event read is a memory module record and if so trace the
> > > record.
> > > 
> > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>  
> > Similar comments to on previous patches around using
> > get_unaligned_le*()  
> 
> Yep...
> 
> [snip]
> 
> > >  
> > > +/*
> > > + * Get Health Info Record
> > > + * CXL v3.0 section 8.2.9.8.3.1; Table 8-100
> > > + */
> > > +struct cxl_get_health_info {
> > > +	u8 health_status;
> > > +	u8 media_status;
> > > +	u8 add_status;
> > > +	u8 life_used;
> > > +	u16 device_temp;  
> > 
> > As previous - even though they aren't aligned, I'd have thought
> > __le16 etc will still work.  The unaligned accessors are fine
> > taking __le16 * for example.  
> 
> Ok my bad on using u16 here and I will change it.  I 100% agree that these
> should be __le16/__le32.  That said there is no need to use the unaligned
> accessors for the 16/32 bit fields.
> 
> The unaligned accessors cast the pointer to a __le16/__le32 type and no
> architecture redefines those.  So using le{16,32}_to_cpu() should work just
> fine on all archs.

If they are unaligned, make sure to use the unaligned accessors.

Key is that it's not a simple cast, but rather a cast to a packed
structure.  The C spec guarantees that those will be handled correctly
even on platforms that don't do unaligned accesses - it will have to
use multiple instructions to construct the unaligned access from
a set of small aligned ones.
The C Spec doesn't guarantee the same for a simple cast to an __le16.
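
A minimal userspace sketch of the packed-structure technique, assuming a
GCC or Clang style compiler (the packed attribute and the __BYTE_ORDER__
macros are compiler extensions, and the demo_* names are made up here):

```c
#include <assert.h>
#include <stdint.h>

/* Wrapper whose only job is to tell the compiler "this may be misaligned". */
struct demo_una_u16 {
	uint16_t x;
} __attribute__((packed));

/* Read a host-endian 16-bit value from any address. */
static uint16_t demo_get_unaligned_u16(const void *p)
{
	const struct demo_una_u16 *ptr = p;

	/* The compiler picks byte loads here if the target needs them. */
	return ptr->x;
}

/* Read a little-endian 16-bit value from any address. */
static uint16_t demo_le16_at(const void *p)
{
	uint16_t v = demo_get_unaligned_u16(p);

#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	v = (uint16_t)((v >> 8) | (v << 8));	/* swap on big-endian hosts */
#endif
	return v;
}
```

The access goes through the packed wrapper rather than through a plain
uint16_t pointer, which is what lets the compiler assume nothing about the
alignment of the address.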

There are some hints on this in:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/asm-generic/unaligned.h?id=778aaefb8e864fc61f850539ea479554dd4caea1

I recall a full explanation of why this worked, but no idea where
to find that now - might be the thread referred to in that patch from
Arnd.

Jonathan


> 
> [snip]
> 
> > > +
> > > +	TP_fast_assign(
> > > +		/* Common */
> > > +		__assign_str(dev_name, dev_name);
> > > +		memcpy(__entry->id, &rec->hdr.id, UUID_SIZE);
> > > +		__entry->log = log;
> > > +		__entry->flags = le32_to_cpu(rec->hdr.flags_length) >> 8;
> > > +		__entry->handle = le16_to_cpu(rec->hdr.handle);
> > > +		__entry->related_handle = le16_to_cpu(rec->hdr.related_handle);
> > > +		__entry->timestamp = le64_to_cpu(rec->hdr.timestamp);
> > > +
> > > +		/* Memory Module Event */
> > > +		__entry->event_type = rec->event_type;
> > > +
> > > +		/* Device Health Info */
> > > +		__entry->health_status = rec->info.health_status;
> > > +		__entry->media_status = rec->info.media_status;
> > > +		__entry->life_used = rec->info.life_used;
> > > +		__entry->dirty_shutdown_cnt = le32_to_cpu(rec->info.dirty_shutdown_cnt);
> > > +		__entry->cor_vol_err_cnt = le32_to_cpu(rec->info.cor_vol_err_cnt);  
> > 
> > I've lost track, but my guess is some / all of these need the unaligned_get_le32()
> > etc rather than aligned form.  Maybe just be lazy and use the unaligned versions
> > even when things happen to be aligned - then we don't have to think about it
> > when reviewing :)  
> 
> See above.  I think the 16/32 bit fields work as intended except for my lack of
> using the correct type.
> 
> Ira


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH 7/9] cxl/test: Add generic mock events
  2022-09-15 18:53     ` Ira Weiny
@ 2022-09-20 16:17       ` Jonathan Cameron
  2022-09-26 21:39         ` Ira Weiny
  0 siblings, 1 reply; 51+ messages in thread
From: Jonathan Cameron @ 2022-09-20 16:17 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Thu, 15 Sep 2022 11:53:29 -0700
Ira Weiny <ira.weiny@intel.com> wrote:

> On Thu, Aug 25, 2022 at 12:31:19PM +0100, Jonathan Cameron wrote:
> > On Fri, 12 Aug 2022 22:32:41 -0700
> > ira.weiny@intel.com wrote:
> >   
> > > From: Ira Weiny <ira.weiny@intel.com>
> > > 
> > > Facilitate testing basic Get/Clear Event functionality by creating
> > > multiple logs and generic events with made up UUID's.
> > > 
> > > Data is completely made up with data patterns which should be easy to
> > > spot in trace output.  
> > Hi Ira,
> > 
> > I'm tempted to hack the QEMU emulation for this in with appropriately
> > complex interface to inject all the record types...  
> 
> Every time I look at the QEMU code it makes my head spin.  :-(

You get used to it ;)

> 
> I really thought about adding some support there.  And I think for irq's it may
> work better?  But after your talk today I did a quick search to see what it
> would take to do irqs in QEMU and got even more confused.  :-(

Copy an example - though we haven't upstreamed any yet...

Either...

https://gitlab.com/jic23/qemu/-/commit/958fec58582b5cc910d2da4e2b855e134bb2c0c3#3dfd54f69a5f2382ddf5a6c00a52546d8b57316e_0_169

Or the CPMU one. 

https://lore.kernel.org/all/20220831153336.16165-2-Jonathan.Cameron@huawei.com/
for the setup, then look for msix_notify in

https://lore.kernel.org/all/20220831153336.16165-4-Jonathan.Cameron@huawei.com/

> 
> > Lots to do there though, so not sure where this fits in my priority list!  
> 
> I bet it is higher on mine!  ;-)

:)

> 
> >   
> > > 
> > > Test traces are easy to obtain with a small script such as this:
> > > 
> > > 	#!/bin/bash -x
> > > 
> > > 	devices=`find /sys/devices/platform -name cxl_mem*`
> > > 
> > > 	# Generate fake events if reset is passed in  
> > 
> > reset is rather unintuitive naming.
> > 
> > fill_event_queue maybe or something more in that direction?  
> 
> Fair enough...  Naming is hard and I'm one of the worst.
> 
> I've changed to
> 
> <sysfs>/.../event_fill_queue
> <sysfs>/.../event_trigger
> 
> Thoughts?

Works for me.

..

J

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH 1/9] cxl/mem: Implement Get Event Records command
  2022-09-20 15:49           ` Jonathan Cameron
@ 2022-09-20 20:23             ` Dave Jiang
  2022-09-20 22:10               ` Ira Weiny
  0 siblings, 1 reply; 51+ messages in thread
From: Dave Jiang @ 2022-09-20 20:23 UTC (permalink / raw)
  To: Jonathan Cameron, Ira Weiny
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl


On 9/20/2022 8:49 AM, Jonathan Cameron wrote:
> On Fri, 9 Sep 2022 13:53:55 -0700
> Ira Weiny <ira.weiny@intel.com> wrote:
>
>> On Thu, Sep 08, 2022 at 01:52:40PM +0100, Jonathan Cameron wrote:
>>>    
>> [snip]
>>
>>>>>> diff --git a/include/trace/events/cxl-events.h b/include/trace/events/cxl-events.h
>>>>>> new file mode 100644
>>>>>> index 000000000000..f4baeae66cf3
>>>>>> --- /dev/null
>>>>>> +++ b/include/trace/events/cxl-events.h
>>>>>> @@ -0,0 +1,127 @@
>>>>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>>>>> +#undef TRACE_SYSTEM
>>>>>> +#define TRACE_SYSTEM cxl_events
>>>>>> +
>>>>>> +#if !defined(_CXL_TRACE_EVENTS_H) ||  defined(TRACE_HEADER_MULTI_READ)
>>>>>> +#define _CXL_TRACE_EVENTS_H
>>>>>> +
>>>>>> +#include <linux/tracepoint.h>
>>>>>> +
>>>>>> +#define EVENT_LOGS					\
>>>>>> +	EM(CXL_EVENT_TYPE_INFO,		"Info")		\
>>>>>> +	EM(CXL_EVENT_TYPE_WARN,		"Warning")	\
>>>>>> +	EM(CXL_EVENT_TYPE_FAIL,		"Failure")	\
>>>>>> +	EM(CXL_EVENT_TYPE_FATAL,	"Fatal")	\
>>>>>> +	EMe(CXL_EVENT_TYPE_MAX,		"<undefined>")
>>>>> Hmm. 4 is defined in CXL 3.0, but I'd assume we won't use tracepoints for
>>>>> dynamic capacity events so I guess it doesn't matter.
>>>> I'm not sure why you would say that.  I anticipate some user space daemon
>>>> requiring these events to set things up.
>>> Certainly a possible solution. I'd kind of expect a more hand shake based approach
>>> than a tracepoint.  Guess we'll see :)
>> Yea I think we should wait an see.
>>
>>>    
>>>>>      
>>>>>> +	{ CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,	"Performance Degraded"		}, \
>>>>>> +	{ CXL_EVENT_RECORD_FLAG_HW_REPLACE,	"Hardware Replacement Needed"	}  \
>>>>>> +)
>>>>>> +
>>>>>> +TRACE_EVENT(cxl_event,
>>>>>> +
>>>>>> +	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
>>>>>> +		 struct cxl_event_record_raw *rec),
>>>>>> +
>>>>>> +	TP_ARGS(dev_name, log, rec),
>>>>>> +
>>>>>> +	TP_STRUCT__entry(
>>>>>> +		__string(dev_name, dev_name)
>>>>>> +		__field(int, log)
>>>>>> +		__array(u8, id, UUID_SIZE)
>>>>>> +		__field(u32, flags)
>>>>>> +		__field(u16, handle)
>>>>>> +		__field(u16, related_handle)
>>>>>> +		__field(u64, timestamp)
>>>>>> +		__array(u8, data, EVENT_RECORD_DATA_LENGTH)
>>>>>> +		__field(u8, length)
>>>>> Do we want the maintenance operation class added in Table 8-42 from CXL 3.0?
>>>>> (only noticed because I happen to have that spec revision open rather than 2.0).
>>>> Yes done.
>>>>
>>>> There is some discussion with Dan regarding not decoding anything and letting
>>>> user space take care of it all.  I think this shows a valid reason Dan
>>>> suggested this.
>>> I like being able to print tracepoints with out userspace tools.
>>> This also enforces structure and stability of interface which I like.
>> I tend to agree with you.
>>
>>> Maybe a raw tracepoint or variable length trailing buffer to pass
>>> on what we don't understand?
>> I've already realized that we need to print all reserved fields for this
>> reason.  If there is something the kernel does not understand user space can
>> just figure it out on it's own.
>>
>> Sound reasonable?
> Hmm. Printing reserved fields would be unusual.  Not sure what is done for similar
> cases elsewhere, CPER records etc...
>
> We could just print a raw array of the whole event as well as decode version, but
> that means logging most of the fields twice...
>
> Not nice either.
>
> I'm a bit inclined to say we should maybe just ignore stuff we don't know about or
> is there a version number we can use to decide between decoded vs decoded as much as
> possible + raw log?

libtraceevent can pull the trace event data structure fields directly,
so the raw data can be read straight from the kernel.  What gets
printed to the trace buffer can then be decoded data constructed from those
fields by the kernel code.  That way you have access to both.

>
> Jonathan
>
>> Ira
>>
>>> Jonathan
>>>
>>>    

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH 1/9] cxl/mem: Implement Get Event Records command
  2022-09-20 20:23             ` Dave Jiang
@ 2022-09-20 22:10               ` Ira Weiny
  2022-09-21 16:36                 ` Jonathan Cameron
  0 siblings, 1 reply; 51+ messages in thread
From: Ira Weiny @ 2022-09-20 22:10 UTC (permalink / raw)
  To: Dave Jiang
  Cc: Jonathan Cameron, Dan Williams, Alison Schofield, Vishal Verma,
	Ben Widawsky, Steven Rostedt, Davidlohr Bueso, linux-kernel,
	linux-cxl

On Tue, Sep 20, 2022 at 01:23:29PM -0700, Jiang, Dave wrote:
> 
> On 9/20/2022 8:49 AM, Jonathan Cameron wrote:
> > On Fri, 9 Sep 2022 13:53:55 -0700
> > Ira Weiny <ira.weiny@intel.com> wrote:
> > 
> > > On Thu, Sep 08, 2022 at 01:52:40PM +0100, Jonathan Cameron wrote:
> > > [snip]
> > > 
> > > > > > > diff --git a/include/trace/events/cxl-events.h b/include/trace/events/cxl-events.h
> > > > > > > new file mode 100644
> > > > > > > index 000000000000..f4baeae66cf3
> > > > > > > --- /dev/null
> > > > > > > +++ b/include/trace/events/cxl-events.h
> > > > > > > @@ -0,0 +1,127 @@
> > > > > > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > > > > > +#undef TRACE_SYSTEM
> > > > > > > +#define TRACE_SYSTEM cxl_events
> > > > > > > +
> > > > > > > +#if !defined(_CXL_TRACE_EVENTS_H) ||  defined(TRACE_HEADER_MULTI_READ)
> > > > > > > +#define _CXL_TRACE_EVENTS_H
> > > > > > > +
> > > > > > > +#include <linux/tracepoint.h>
> > > > > > > +
> > > > > > > +#define EVENT_LOGS					\
> > > > > > > +	EM(CXL_EVENT_TYPE_INFO,		"Info")		\
> > > > > > > +	EM(CXL_EVENT_TYPE_WARN,		"Warning")	\
> > > > > > > +	EM(CXL_EVENT_TYPE_FAIL,		"Failure")	\
> > > > > > > +	EM(CXL_EVENT_TYPE_FATAL,	"Fatal")	\
> > > > > > > +	EMe(CXL_EVENT_TYPE_MAX,		"<undefined>")
> > > > > > Hmm. 4 is defined in CXL 3.0, but I'd assume we won't use tracepoints for
> > > > > > dynamic capacity events so I guess it doesn't matter.
> > > > > I'm not sure why you would say that.  I anticipate some user space daemon
> > > > > requiring these events to set things up.
> > > > Certainly a possible solution. I'd kind of expect a more hand shake based approach
> > > > than a tracepoint.  Guess we'll see :)
> > > Yea I think we should wait an see.
> > > 
> > > > > > > +	{ CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,	"Performance Degraded"		}, \
> > > > > > > +	{ CXL_EVENT_RECORD_FLAG_HW_REPLACE,	"Hardware Replacement Needed"	}  \
> > > > > > > +)
> > > > > > > +
> > > > > > > +TRACE_EVENT(cxl_event,
> > > > > > > +
> > > > > > > +	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> > > > > > > +		 struct cxl_event_record_raw *rec),
> > > > > > > +
> > > > > > > +	TP_ARGS(dev_name, log, rec),
> > > > > > > +
> > > > > > > +	TP_STRUCT__entry(
> > > > > > > +		__string(dev_name, dev_name)
> > > > > > > +		__field(int, log)
> > > > > > > +		__array(u8, id, UUID_SIZE)
> > > > > > > +		__field(u32, flags)
> > > > > > > +		__field(u16, handle)
> > > > > > > +		__field(u16, related_handle)
> > > > > > > +		__field(u64, timestamp)
> > > > > > > +		__array(u8, data, EVENT_RECORD_DATA_LENGTH)
> > > > > > > +		__field(u8, length)
> > > > > > Do we want the maintenance operation class added in Table 8-42 from CXL 3.0?
> > > > > > (only noticed because I happen to have that spec revision open rather than 2.0).
> > > > > Yes done.
> > > > > 
> > > > > There is some discussion with Dan regarding not decoding anything and letting
> > > > > user space take care of it all.  I think this shows a valid reason Dan
> > > > > suggested this.
> > > > I like being able to print tracepoints with out userspace tools.
> > > > This also enforces structure and stability of interface which I like.
> > > I tend to agree with you.
> > > 
> > > > Maybe a raw tracepoint or variable length trailing buffer to pass
> > > > on what we don't understand?
> > > I've already realized that we need to print all reserved fields for this
> > > reason.  If there is something the kernel does not understand user space can
> > > just figure it out on it's own.
> > > 
> > > Sound reasonable?
> > Hmm. Printing reserved fields would be unusual.  Not sure what is done for similar
> > cases elsewhere, CPER records etc...
> > 
> > We could just print a raw array of the whole event as well as decode version, but
> > that means logging most of the fields twice...
> > 
> > Not nice either.
> > 
> > I'm a bit inclined to say we should maybe just ignore stuff we don't know about or
> > is there a version number we can use to decide between decoded vs decoded as much as
> > possible + raw log?

I'm not a fan of logging the raw + decoded versions.

> 
> libtraceevent can pull the trace event data structure fields directly. So
> the raw data can be pulled directly from the kernel.

This raw data needs to be in a field though.  If the kernel does not save the
reserved fields in the TP_fast_assign() then the data won't be in a field to
access.

>
> And what gets printed
> to the trace buffer can be decoded data constructed from those fields by the
> kernel code. So with that you can have access both.
> 

Fast assigning the entire buffer + decoded versions will roughly double the
trace event size.

Thinking through this a bit more there is a sticking point.

The difficulty will be ensuring that any new field names are documented such
that when user space starts to look at them they can determine if that data
appears as a new field or as part of a reserved field.

For example if user space needs to access data in the reserved data now it can
simply decode it.  However, when that data becomes a field it no longer is part
of the reserved data.  So what user space would need to do is look for the
field first (ie know the field name) and then if it does not appear extract it
from the reserved data.
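
To make that concrete, here is a rough user-space sketch of the lookup
order.  Everything in it is hypothetical (struct sketch_event,
sketch_read_u16() and the field layout are invented for illustration and
are not a libtraceevent API); it only shows the "named field first,
reserved bytes second" fallback.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded field as user space might see it. */
struct sketch_field {
	const char *name;
	uint32_t value;
};

/* Hypothetical event: decoded fields plus the raw reserved bytes. */
struct sketch_event {
	const struct sketch_field *fields;
	size_t nr_fields;
	const uint8_t *reserved;
	size_t reserved_len;
};

/*
 * Look for a named field first; if the kernel does not decode it yet,
 * fall back to a little endian u16 at rsvd_off in the reserved data.
 * Returns 0 on success, -1 if the value cannot be found either way.
 */
static int sketch_read_u16(const struct sketch_event *ev, const char *name,
			   size_t rsvd_off, uint16_t *out)
{
	size_t i;

	for (i = 0; i < ev->nr_fields; i++) {
		if (strcmp(ev->fields[i].name, name) == 0) {
			*out = (uint16_t)ev->fields[i].value;
			return 0;
		}
	}

	/* Need two bytes at rsvd_off; written to avoid size_t overflow. */
	if (ev->reserved_len < 2 || rsvd_off > ev->reserved_len - 2)
		return -1;

	*out = (uint16_t)(ev->reserved[rsvd_off] |
			  (ev->reserved[rsvd_off + 1] << 8));
	return 0;
}
```

Every consumer would need this two step logic for every field that might
migrate out of the reserved area, which is the ugliness I am worried about.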

I'm now wondering if I've wasted my time decoding anything since the kernel
does not need to know anything about these fields.  Because the above scenario
means that user space may get ugly over time.

That said I don't think it will present any incompatibilities.  So perhaps we
are ok?

Ira

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH 1/9] cxl/mem: Implement Get Event Records command
  2022-09-20 22:10               ` Ira Weiny
@ 2022-09-21 16:36                 ` Jonathan Cameron
  2022-09-22  4:16                   ` Ira Weiny
  0 siblings, 1 reply; 51+ messages in thread
From: Jonathan Cameron @ 2022-09-21 16:36 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Dave Jiang, Dan Williams, Alison Schofield, Vishal Verma,
	Ben Widawsky, Steven Rostedt, Davidlohr Bueso, linux-kernel,
	linux-cxl

On Tue, 20 Sep 2022 15:10:26 -0700
Ira Weiny <ira.weiny@intel.com> wrote:

> On Tue, Sep 20, 2022 at 01:23:29PM -0700, Jiang, Dave wrote:
> > 
> > On 9/20/2022 8:49 AM, Jonathan Cameron wrote:  
> > > On Fri, 9 Sep 2022 13:53:55 -0700
> > > Ira Weiny <ira.weiny@intel.com> wrote:
> > >   
> > > > On Thu, Sep 08, 2022 at 01:52:40PM +0100, Jonathan Cameron wrote:
> > > > [snip]
> > > >   
> > > > > > > > diff --git a/include/trace/events/cxl-events.h b/include/trace/events/cxl-events.h
> > > > > > > > new file mode 100644
> > > > > > > > index 000000000000..f4baeae66cf3
> > > > > > > > --- /dev/null
> > > > > > > > +++ b/include/trace/events/cxl-events.h
> > > > > > > > @@ -0,0 +1,127 @@
> > > > > > > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > > > > > > +#undef TRACE_SYSTEM
> > > > > > > > +#define TRACE_SYSTEM cxl_events
> > > > > > > > +
> > > > > > > > +#if !defined(_CXL_TRACE_EVENTS_H) ||  defined(TRACE_HEADER_MULTI_READ)
> > > > > > > > +#define _CXL_TRACE_EVENTS_H
> > > > > > > > +
> > > > > > > > +#include <linux/tracepoint.h>
> > > > > > > > +
> > > > > > > > +#define EVENT_LOGS					\
> > > > > > > > +	EM(CXL_EVENT_TYPE_INFO,		"Info")		\
> > > > > > > > +	EM(CXL_EVENT_TYPE_WARN,		"Warning")	\
> > > > > > > > +	EM(CXL_EVENT_TYPE_FAIL,		"Failure")	\
> > > > > > > > +	EM(CXL_EVENT_TYPE_FATAL,	"Fatal")	\
> > > > > > > > +	EMe(CXL_EVENT_TYPE_MAX,		"<undefined>")  
> > > > > > > Hmm. 4 is defined in CXL 3.0, but I'd assume we won't use tracepoints for
> > > > > > > dynamic capacity events so I guess it doesn't matter.  
> > > > > > I'm not sure why you would say that.  I anticipate some user space daemon
> > > > > > requiring these events to set things up.  
> > > > > Certainly a possible solution. I'd kind of expect a more hand shake based approach
> > > > > than a tracepoint.  Guess we'll see :)  
> > > > Yea I think we should wait an see.
> > > >   
> > > > > > > > +	{ CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,	"Performance Degraded"		}, \
> > > > > > > > +	{ CXL_EVENT_RECORD_FLAG_HW_REPLACE,	"Hardware Replacement Needed"	}  \
> > > > > > > > +)
> > > > > > > > +
> > > > > > > > +TRACE_EVENT(cxl_event,
> > > > > > > > +
> > > > > > > > +	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> > > > > > > > +		 struct cxl_event_record_raw *rec),
> > > > > > > > +
> > > > > > > > +	TP_ARGS(dev_name, log, rec),
> > > > > > > > +
> > > > > > > > +	TP_STRUCT__entry(
> > > > > > > > +		__string(dev_name, dev_name)
> > > > > > > > +		__field(int, log)
> > > > > > > > +		__array(u8, id, UUID_SIZE)
> > > > > > > > +		__field(u32, flags)
> > > > > > > > +		__field(u16, handle)
> > > > > > > > +		__field(u16, related_handle)
> > > > > > > > +		__field(u64, timestamp)
> > > > > > > > +		__array(u8, data, EVENT_RECORD_DATA_LENGTH)
> > > > > > > > +		__field(u8, length)  
> > > > > > > Do we want the maintenance operation class added in Table 8-42 from CXL 3.0?
> > > > > > > (only noticed because I happen to have that spec revision open rather than 2.0).  
> > > > > > Yes done.
> > > > > > 
> > > > > > There is some discussion with Dan regarding not decoding anything and letting
> > > > > > user space take care of it all.  I think this shows a valid reason Dan
> > > > > > suggested this.  
> > > > > I like being able to print tracepoints with out userspace tools.
> > > > > This also enforces structure and stability of interface which I like.  
> > > > I tend to agree with you.
> > > >   
> > > > > Maybe a raw tracepoint or variable length trailing buffer to pass
> > > > > on what we don't understand?  
> > > > I've already realized that we need to print all reserved fields for this
> > > > reason.  If there is something the kernel does not understand user space can
> > > > just figure it out on it's own.
> > > > 
> > > > Sound reasonable?  
> > > Hmm. Printing reserved fields would be unusual.  Not sure what is done for similar
> > > cases elsewhere, CPER records etc...
> > > 
> > > We could just print a raw array of the whole event as well as decode version, but
> > > that means logging most of the fields twice...
> > > 
> > > Not nice either.
> > > 
> > > I'm a bit inclined to say we should maybe just ignore stuff we don't know about or
> > > is there a version number we can use to decide between decoded vs decoded as much as
> > > possible + raw log?  
> 
> I'm not a fan of loging the raw + decoded versions.
> 
> > 
> > libtraceevent can pull the trace event data structure fields directly. So
> > the raw data can be pulled directly from the kernel.  
> 
> This raw data needs to be in a field though.  If the kernel does not save the
> reserved fields in the TP_fast_assign() then the data won't be in a field to
> access.
> 
> >
> > And what gets printed
> > to the trace buffer can be decoded data constructed from those fields by the
> > kernel code. So with that you can have access both.
> >   
> 
> Fast assigning the entire buffer + decoded versions will roughly double the
> trace event size.
> 
> Thinking through this a bit more there is a sticking point.
> 
> The difficulty will be ensuring that any new field names are documented such
> that when user space starts to look at them they can determine if that data
> appears as a new field or as part of a reserved field.
> 
> For example if user space needs to access data in the reserved data now it can
> simply decode it.  However, when that data becomes a field it no longer is part
> of the reserved data.  So what user space would need to do is look for the
> field first (ie know the field name) and then if it does not appear extract it
> from the reserved data.
> 
> I'm now wondering if I've wasted my time decoding anything since the kernel
> does not need to know anything about these fields.  Because the above scenario
> means that user space may get ugly over time.
> 
> That said I don't think it will present any incompatibilities.  So perhaps we
> are ok?

I favor decoding current record in kernel and packing it appropriately.
If that means we don't provide some new data from a future version then such
is life - the kernel needs upgrading.  That information is unlikely to be
crucial - it's probably just more detail.

Jonathan

> 
> Ira


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH 1/9] cxl/mem: Implement Get Event Records command
  2022-09-21 16:36                 ` Jonathan Cameron
@ 2022-09-22  4:16                   ` Ira Weiny
  0 siblings, 0 replies; 51+ messages in thread
From: Ira Weiny @ 2022-09-22  4:16 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Dave Jiang, Dan Williams, Alison Schofield, Vishal Verma,
	Ben Widawsky, Steven Rostedt, Davidlohr Bueso, linux-kernel,
	linux-cxl

On Wed, Sep 21, 2022 at 05:36:42PM +0100, Jonathan Cameron wrote:
> On Tue, 20 Sep 2022 15:10:26 -0700
> Ira Weiny <ira.weiny@intel.com> wrote:
> 
> > On Tue, Sep 20, 2022 at 01:23:29PM -0700, Jiang, Dave wrote:

[snip]

> > >
> > > And what gets printed
> > > to the trace buffer can be decoded data constructed from those fields by the
> > > kernel code. So with that you can have access both.
> > >   
> > 
> > Fast assigning the entire buffer + decoded versions will roughly double the
> > trace event size.
> > 
> > Thinking through this a bit more there is a sticking point.
> > 
> > The difficulty will be ensuring that any new field names are documented such
> > that when user space starts to look at them they can determine if that data
> > appears as a new field or as part of a reserved field.
> > 
> > For example if user space needs to access data in the reserved data now it can
> > simply decode it.  However, when that data becomes a field it no longer is part
> > of the reserved data.  So what user space would need to do is look for the
> > field first (ie know the field name) and then if it does not appear extract it
> > from the reserved data.
> > 
> > I'm now wondering if I've wasted my time decoding anything since the kernel
> > does not need to know anything about these fields.  Because the above scenario
> > means that user space may get ugly over time.
> > 
> > That said I don't think it will present any incompatibilities.  So perhaps we
> > are ok?
> 
> I favor decoding current record in kernel and packing it appropriately.
> If that means we don't provide some new data from a future version then such
> is life - the kernel needs upgrading.  That information is unlikely to be
> crucial - it's probably just more detail.

Dave, Dan, and I discussed this further today.  Dan expressed the same opinion.
So I'm going to remove all the reserved fields from the next version.

Thanks,
Ira

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH 7/9] cxl/test: Add generic mock events
  2022-09-20 16:17       ` Jonathan Cameron
@ 2022-09-26 21:39         ` Ira Weiny
  2022-09-27 13:56           ` Jonathan Cameron
  0 siblings, 1 reply; 51+ messages in thread
From: Ira Weiny @ 2022-09-26 21:39 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Williams, Dan J, Schofield, Alison, Verma, Vishal L,
	Ben Widawsky, Steven Rostedt, Davidlohr Bueso, linux-kernel,
	linux-cxl

On Tue, Sep 20, 2022 at 09:17:48AM -0700, Jonathan Cameron wrote:
> On Thu, 15 Sep 2022 11:53:29 -0700
> Ira Weiny <ira.weiny@intel.com> wrote:
> 
> > On Thu, Aug 25, 2022 at 12:31:19PM +0100, Jonathan Cameron wrote:
> > > On Fri, 12 Aug 2022 22:32:41 -0700
> > > ira.weiny@intel.com wrote:
> > >   
> > > > From: Ira Weiny <ira.weiny@intel.com>
> > > > 
> > > > Facilitate testing basic Get/Clear Event functionality by creating
> > > > multiple logs and generic events with made up UUID's.
> > > > 
> > > > Data is completely made up with data patterns which should be easy to
> > > > spot in trace output.  
> > > Hi Ira,
> > > 
> > > I'm tempted to hack the QEMU emulation for this in with appropriately
> > > complex interface to inject all the record types...  
> > 
> > Every time I look at the QEMU code it makes my head spin.  :-(
> 
> You get used to it ;)`

I'm trying...  :-/

Question though:

Is there a call in qemu which is equivalent to cpu_to_leXX()?  The
exec/cpu-all.h is having compilation issues for me because the
TARGET_BIG_ENDIAN is not defined (it is defined in a meson generated header).

So I'm afraid that the tswapXX() calls are not what I'm supposed to use.  Is
this true?  Are those some sort of internal call?

Ira

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH 7/9] cxl/test: Add generic mock events
  2022-09-26 21:39         ` Ira Weiny
@ 2022-09-27 13:56           ` Jonathan Cameron
  2022-09-27 16:13             ` Ira Weiny
  0 siblings, 1 reply; 51+ messages in thread
From: Jonathan Cameron @ 2022-09-27 13:56 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Williams, Dan J, Schofield, Alison, Verma, Vishal L,
	Ben Widawsky, Steven Rostedt, Davidlohr Bueso, linux-kernel,
	linux-cxl

On Mon, 26 Sep 2022 14:39:52 -0700
Ira Weiny <ira.weiny@intel.com> wrote:

> On Tue, Sep 20, 2022 at 09:17:48AM -0700, Jonathan Cameron wrote:
> > On Thu, 15 Sep 2022 11:53:29 -0700
> > Ira Weiny <ira.weiny@intel.com> wrote:
> >   
> > > On Thu, Aug 25, 2022 at 12:31:19PM +0100, Jonathan Cameron wrote:  
> > > > On Fri, 12 Aug 2022 22:32:41 -0700
> > > > ira.weiny@intel.com wrote:
> > > >     
> > > > > From: Ira Weiny <ira.weiny@intel.com>
> > > > > 
> > > > > Facilitate testing basic Get/Clear Event functionality by creating
> > > > > multiple logs and generic events with made up UUID's.
> > > > > 
> > > > > Data is completely made up with data patterns which should be easy to
> > > > > spot in trace output.    
> > > > Hi Ira,
> > > > 
> > > > I'm tempted to hack the QEMU emulation for this in with appropriately
> > > > complex interface to inject all the record types...    
> > > 
> > > Every time I look at the QEMU code it makes my head spin.  :-(  
> > 
> > You get used to it ;)`  
> 
> I'm trying...  :-/
> 
> Question though:
> 
> Is there a call in qemu which is equivalent to cpu_to_leXX()?  The
> exec/cpu-all.h is having compilation issues for me because the
> TARGET_BIG_ENDIAN is not defined (it is defined in a meson generated header).
> 
> So I'm afraid that the tswapXX() calls are not what I'm supposed to use.  Is
> this true?  Are those some sort of internal call?

I'm confused.  There is cpu_to_le16() in "qemu/bswap.h".

I suspect we've played a bit fast and loose with endianness in a few places in
current qemu code and should probably check all that sometime.
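
For reference, a self-contained sketch of what a cpu_to_le16() style helper
has to do on any host.  sketch_cpu_to_le16() is an illustrative name only
and is not how QEMU implements it; in QEMU code just include
"qemu/bswap.h" and use cpu_to_le16().

```c
#include <stdint.h>
#include <string.h>

/*
 * Return a value whose in-memory byte order is little endian,
 * regardless of the endianness of the host running the emulation.
 */
static inline uint16_t sketch_cpu_to_le16(uint16_t v)
{
	/* Lay the bytes out explicitly: low byte first. */
	uint8_t b[2] = { (uint8_t)(v & 0xff), (uint8_t)(v >> 8) };
	uint16_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}
```

On a little endian host this is a no-op, which would explain why missing
conversions have gone unnoticed so far.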

Jonathan



> 
> Ira


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH 7/9] cxl/test: Add generic mock events
  2022-09-27 13:56           ` Jonathan Cameron
@ 2022-09-27 16:13             ` Ira Weiny
  2022-09-28  9:49               ` Jonathan Cameron
  0 siblings, 1 reply; 51+ messages in thread
From: Ira Weiny @ 2022-09-27 16:13 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Williams, Dan J, Schofield, Alison, Verma, Vishal L,
	Ben Widawsky, Steven Rostedt, Davidlohr Bueso, linux-kernel,
	linux-cxl

On Tue, Sep 27, 2022 at 02:56:23PM +0100, Jonathan Cameron wrote:
> On Mon, 26 Sep 2022 14:39:52 -0700
> Ira Weiny <ira.weiny@intel.com> wrote:
> 
> > On Tue, Sep 20, 2022 at 09:17:48AM -0700, Jonathan Cameron wrote:
> > > On Thu, 15 Sep 2022 11:53:29 -0700
> > > Ira Weiny <ira.weiny@intel.com> wrote:
> > >   
> > > > On Thu, Aug 25, 2022 at 12:31:19PM +0100, Jonathan Cameron wrote:  
> > > > > On Fri, 12 Aug 2022 22:32:41 -0700
> > > > > ira.weiny@intel.com wrote:
> > > > >     
> > > > > > From: Ira Weiny <ira.weiny@intel.com>
> > > > > > 
> > > > > > Facilitate testing basic Get/Clear Event functionality by creating
> > > > > > multiple logs and generic events with made up UUID's.
> > > > > > 
> > > > > > Data is completely made up with data patterns which should be easy to
> > > > > > spot in trace output.    
> > > > > Hi Ira,
> > > > > 
> > > > > I'm tempted to hack the QEMU emulation for this in with appropriately
> > > > > complex interface to inject all the record types...    
> > > > 
> > > > Every time I look at the QEMU code it makes my head spin.  :-(  
> > > 
> > > You get used to it ;)`  
> > 
> > I'm trying...  :-/
> > 
> > Question though:
> > 
> > Is there a call in qemu which is equivalent to cpu_to_leXX()?  The
> > exec/cpu-all.h is having compilation issues for me because the
> > TARGET_BIG_ENDIAN is not defined (it is defined in a meson generated header).
> > 
> > So I'm afraid that the tswapXX() calls are not what I'm supposed to use.  Is
> > this true?  Are those some sort of internal call?
> I'm confused.  There is cpu_to_le16 in "qemu/bswap.h"

<sigh> I don't know how I missed it.  Sorry.

> 
> I suspect we've played a bit fast and loose with endianness in a few places in
> current qemu code and should probably check all that sometime.
 
Yea nothing in hw/cxl seems to use any swapping.  I suppose only little endian
hosts have been used thus far?

I grepped for 'ENDIAN' and found the tswap* calls.  I guess I should have
grepped for 'cpu_to'!  That found it right away!  :-/  :-D

Sorry for the distraction,
Ira

> Jonathan
> 
> 
> 
> > 
> > Ira
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH 7/9] cxl/test: Add generic mock events
  2022-09-27 16:13             ` Ira Weiny
@ 2022-09-28  9:49               ` Jonathan Cameron
  0 siblings, 0 replies; 51+ messages in thread
From: Jonathan Cameron @ 2022-09-28  9:49 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Williams, Dan J, Schofield, Alison, Verma, Vishal L,
	Ben Widawsky, Steven Rostedt, Davidlohr Bueso, linux-kernel,
	linux-cxl

On Tue, 27 Sep 2022 09:13:58 -0700
Ira Weiny <ira.weiny@intel.com> wrote:

> On Tue, Sep 27, 2022 at 02:56:23PM +0100, Jonathan Cameron wrote:
> > On Mon, 26 Sep 2022 14:39:52 -0700
> > Ira Weiny <ira.weiny@intel.com> wrote:
> >   
> > > On Tue, Sep 20, 2022 at 09:17:48AM -0700, Jonathan Cameron wrote:  
> > > > On Thu, 15 Sep 2022 11:53:29 -0700
> > > > Ira Weiny <ira.weiny@intel.com> wrote:
> > > >     
> > > > > On Thu, Aug 25, 2022 at 12:31:19PM +0100, Jonathan Cameron wrote:    
> > > > > > On Fri, 12 Aug 2022 22:32:41 -0700
> > > > > > ira.weiny@intel.com wrote:
> > > > > >       
> > > > > > > From: Ira Weiny <ira.weiny@intel.com>
> > > > > > > 
> > > > > > > Facilitate testing basic Get/Clear Event functionality by creating
> > > > > > > multiple logs and generic events with made up UUID's.
> > > > > > > 
> > > > > > > Data is completely made up with data patterns which should be easy to
> > > > > > > spot in trace output.      
> > > > > > Hi Ira,
> > > > > > 
> > > > > > I'm tempted to hack the QEMU emulation for this in with appropriately
> > > > > > complex interface to inject all the record types...      
> > > > > 
> > > > > Every time I look at the QEMU code it makes my head spin.  :-(    
> > > > 
> > > > You get used to it ;)`    
> > > 
> > > I'm trying...  :-/
> > > 
> > > Question though:
> > > 
> > > Is there a call in qemu which is equivalent to cpu_to_leXX()?  The
> > > exec/cpu-all.h is having compilation issues for me because the
> > > TARGET_BIG_ENDIAN is not defined (it is defined in a meson generated header).
> > > 
> > > So I'm afraid that the tswapXX() calls are not what I'm supposed to use.  Is
> > > this true?  Are those some sort of internal call?  
> > I'm confused.  There is cpu_to_le16 in "qemu/bswap.h"  
> 
> <sigh> I don't know how I missed it.  Sorry.
> 
> > 
> > I suspect we've played a bit fast and loose with endianness in a few places in
> > current qemu code and should probably check all that sometime.  
>  
> Yea nothing in hw/cxl seems to use any swapping.  I suppose only little endian
> hosts have been used thus far?

Exactly, and I'm not sure when we'll see any big endian emulated hosts.  We
should fix that, but there are lots of other things on the todo list, so it's
not particularly high priority.  Getting a test environment up is also going
to be non-trivial.
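
For anyone following along, here is a rough sketch of what host-independent
little endian handling looks like.  This is not QEMU code: the names
sketch_put_le16() and sketch_get_le16() are made up for illustration.  In
QEMU itself the helper to use would be cpu_to_le16() from "qemu/bswap.h", as
noted earlier in the thread.  The sketch only shows the property such a
helper provides: the bytes in the buffer come out the same on little and big
endian hosts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helpers (not part of QEMU or the kernel).
 * Store a 16-bit value into a byte buffer in little endian order.
 * Shifts and masks operate on the value, not on its in-memory layout,
 * so the result does not depend on the host byte order.
 */
static inline void sketch_put_le16(uint8_t *buf, uint16_t val)
{
	assert(buf != NULL);
	buf[0] = (uint8_t)(val & 0xff);	/* least significant byte first */
	buf[1] = (uint8_t)(val >> 8);	/* most significant byte second */
}

/* Read a 16-bit little endian value back out of a byte buffer. */
static inline uint16_t sketch_get_le16(const uint8_t *buf)
{
	assert(buf != NULL);
	return (uint16_t)(buf[0] | ((uint16_t)buf[1] << 8));
}
```

A record field filled this way reads back correctly regardless of the host,
which is the part that assigning native integers straight into a
device-visible structure does not guarantee on a big endian host.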

J
> 
> I grepped for 'ENDIAN' and found the tswap* calls.  I guess I should have
> grepped for 'cpu_to'!  That found it right away!  :-/  :-D
> 
> Sorry for the distraction,
> Ira
> 
> > Jonathan
> > 
> > 
> >   
> > > 
> > > Ira  
> >   



end of thread, other threads:[~2022-09-28  9:50 UTC | newest]

Thread overview: 51+ messages
2022-08-13  5:32 [RFC PATCH 0/9] CXL: Read and clear event logs ira.weiny
2022-08-13  5:32 ` [RFC PATCH 1/9] cxl/mem: Implement Get Event Records command ira.weiny
2022-08-16 16:39   ` Steven Rostedt
2022-08-16 16:41     ` Steven Rostedt
2022-08-16 23:11       ` Ira Weiny
2022-08-16 23:35     ` Ira Weiny
2022-08-17 22:54   ` Dave Jiang
2022-09-07  4:53     ` Ira Weiny
2022-08-24 15:50   ` Jonathan Cameron
2022-09-07  4:28     ` Ira Weiny
2022-09-08 12:52       ` Jonathan Cameron
2022-09-09 20:53         ` Ira Weiny
2022-09-20 15:49           ` Jonathan Cameron
2022-09-20 20:23             ` Dave Jiang
2022-09-20 22:10               ` Ira Weiny
2022-09-21 16:36                 ` Jonathan Cameron
2022-09-22  4:16                   ` Ira Weiny
2022-08-13  5:32 ` [RFC PATCH 2/9] cxl/mem: Implement Clear " ira.weiny
2022-08-24 15:55   ` Jonathan Cameron
2022-09-09 21:35     ` Ira Weiny
2022-08-13  5:32 ` [RFC PATCH 3/9] cxl/mem: Clear events on driver load ira.weiny
2022-08-24 15:57   ` Jonathan Cameron
2022-08-13  5:32 ` [RFC PATCH 4/9] cxl/mem: Trace General Media Event Record ira.weiny
2022-08-24 16:11   ` Jonathan Cameron
2022-09-12 22:38     ` Ira Weiny
2022-09-20 15:52       ` Jonathan Cameron
2022-08-13  5:32 ` [RFC PATCH 5/9] cxl/mem: Trace DRAM " ira.weiny
2022-08-25 10:46   ` Jonathan Cameron
2022-09-12 23:04     ` Ira Weiny
2022-09-20 16:02       ` Jonathan Cameron
2022-08-13  5:32 ` [RFC PATCH 6/9] cxl/mem: Trace Memory Module " ira.weiny
2022-08-25 10:58   ` Jonathan Cameron
2022-09-14 21:17     ` Ira Weiny
2022-09-20 16:11       ` Jonathan Cameron
2022-08-13  5:32 ` [RFC PATCH 7/9] cxl/test: Add generic mock events ira.weiny
2022-08-25 11:31   ` Jonathan Cameron
2022-09-15 18:53     ` Ira Weiny
2022-09-20 16:17       ` Jonathan Cameron
2022-09-26 21:39         ` Ira Weiny
2022-09-27 13:56           ` Jonathan Cameron
2022-09-27 16:13             ` Ira Weiny
2022-09-28  9:49               ` Jonathan Cameron
2022-08-13  5:32 ` [RFC PATCH 8/9] cxl/test: Add specific events ira.weiny
2022-08-25 11:37   ` Jonathan Cameron
2022-08-13  5:32 ` [RFC PATCH 9/9] cxl/test: Simulate event log overflow ira.weiny
2022-08-16 16:44   ` Steven Rostedt
2022-08-22 16:18 ` [RFC PATCH 0/9] CXL: Read and clear event logs Davidlohr Bueso
2022-08-22 22:53   ` Ira Weiny
2022-08-23 16:12     ` Davidlohr Bueso
2022-08-24 10:07     ` Jonathan Cameron
2022-09-01 18:10       ` Dave Jiang
