linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC v4 0/6] efi/cxl-cper: Report CPER CXL component events through trace events
@ 2023-11-09 22:07 Ira Weiny
  2023-11-09 22:07 ` [PATCH RFC v4 1/6] cxl/trace: Pass uuid explicitly to event traces Ira Weiny
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Ira Weiny @ 2023-11-09 22:07 UTC (permalink / raw)
  To: Dan Williams, Jonathan Cameron, Smita Koralahalli, Shiju Jose
  Cc: Yazen Ghannam, Davidlohr Bueso, Dave Jiang, Alison Schofield,
	Vishal Verma, Ard Biesheuvel, linux-efi, linux-kernel, linux-cxl,
	Ira Weiny

Series status/background
========================

This RFC should be very close to done but I'm still working on getting a
system to test on.  Therefore, the CPER code remains compile tested
only.  The modified event log code continues to pass cxl-test.

Cover letter
============

CXL Component Events, as defined by EFI 2.10 Section N.2.14, wrap a
mostly CXL event payload in an EFI Common Platform Error Record (CPER)
record.  If a device is configured for firmware first CXL event records
are not sent directly to the host.

The CXL sub-system uniquely has DPA to HPA translation information.  It
also already has event format tracing.  Restructure the code to make
sharing the data between CPER/event logs most efficient.  Then send the
CXL CPER records to the CXL sub-system for processing.

With event logs the events interrupt the driver directly.  In the EFI
case events are wrapped with device information which needs to be
matched with memdev devices.  A number of alternatives were considered
to match the memdev with the CPER record.  The most robust was to find
the PCI device via Bus, Device, Function and match it to the memdev
driver data.

CPER records are identified with GUID's while CXL event logs contain
UUID's.  The UUID is reported for all events.  While the UUID is
redundant for the known events the UUID's are already used by rasdaemon.
To keep compatibility UUIDs are injected for CPER records based on the
record type.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
Changes in v4:
- Jonathan/Shiju: Put UUID back in trace points.
- Smita: Fix packed structure
- Smita: use PCI_DEVFN() properly
- iweiny: update cover letter/commit messages
- Link to v3: https://lore.kernel.org/r/20230601-cxl-cper-v3-0-0189d61f7956@intel.com

---
Ira Weiny (6):
      cxl/trace: Pass uuid explicitly to event traces
      cxl/events: Promote CXL event structures to a core header
      cxl/events: Separate UUID from event structures
      cxl/events: Create a CXL event union
      firmware/efi: Process CXL Component Events
      cxl/memdev: Register for and process CPER events

 drivers/cxl/core/mbox.c         |  65 ++++++++++------
 drivers/cxl/core/trace.h        |  34 ++++----
 drivers/cxl/cxlmem.h            |  96 ++---------------------
 drivers/cxl/pci.c               |  58 +++++++++++++-
 drivers/firmware/efi/cper.c     |  15 ++++
 drivers/firmware/efi/cper_cxl.c |  40 ++++++++++
 drivers/firmware/efi/cper_cxl.h |  29 +++++++
 include/linux/cxl-event.h       | 160 ++++++++++++++++++++++++++++++++++++++
 tools/testing/cxl/test/mem.c    | 166 +++++++++++++++++++++++-----------------
 9 files changed, 461 insertions(+), 202 deletions(-)
---
base-commit: 1c8b86a3799f7e5be903c3f49fcdaee29fd385b5
change-id: 20230601-cxl-cper-26ffc839c6c6

Best regards,
-- 
Ira Weiny <ira.weiny@intel.com>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH RFC v4 1/6] cxl/trace: Pass uuid explicitly to event traces
  2023-11-09 22:07 [PATCH RFC v4 0/6] efi/cxl-cper: Report CPER CXL component events through trace events Ira Weiny
@ 2023-11-09 22:07 ` Ira Weiny
  2023-11-09 22:07 ` [PATCH RFC v4 2/6] cxl/events: Promote CXL event structures to a core header Ira Weiny
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Ira Weiny @ 2023-11-09 22:07 UTC (permalink / raw)
  To: Dan Williams, Jonathan Cameron, Smita Koralahalli, Shiju Jose
  Cc: Yazen Ghannam, Davidlohr Bueso, Dave Jiang, Alison Schofield,
	Vishal Verma, Ard Biesheuvel, linux-efi, linux-kernel, linux-cxl,
	Ira Weiny

CPER CXL events do not have a UUID associated with them.  It is
desirable to share event structures between the CPER CXL event and the
CXL event log events.

Pass the UUID explicitly to each trace event to be able to remove the
UUID from the event structures.

Originally it was desirable to remove the UUID from the well known event
because the UUID value was redundant.  However, the trace API was
already in place.[1]

[1] https://lore.kernel.org/all/36f2d12934d64a278f2c0313cbd01abc@huawei.com/

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from RFC V3:
[iweiny: drop remove uuid patch; replaced it with this one]
---
 drivers/cxl/core/mbox.c  |  8 ++++----
 drivers/cxl/core/trace.h | 32 ++++++++++++++++----------------
 2 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 4df4f614f490..43aad71810e2 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -870,19 +870,19 @@ static void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
 		struct cxl_event_gen_media *rec =
 				(struct cxl_event_gen_media *)record;
 
-		trace_cxl_general_media(cxlmd, type, rec);
+		trace_cxl_general_media(cxlmd, type, id, rec);
 	} else if (uuid_equal(id, &dram_event_uuid)) {
 		struct cxl_event_dram *rec = (struct cxl_event_dram *)record;
 
-		trace_cxl_dram(cxlmd, type, rec);
+		trace_cxl_dram(cxlmd, type, id, rec);
 	} else if (uuid_equal(id, &mem_mod_event_uuid)) {
 		struct cxl_event_mem_module *rec =
 				(struct cxl_event_mem_module *)record;
 
-		trace_cxl_memory_module(cxlmd, type, rec);
+		trace_cxl_memory_module(cxlmd, type, id, rec);
 	} else {
 		/* For unknown record types print just the header */
-		trace_cxl_generic_event(cxlmd, type, record);
+		trace_cxl_generic_event(cxlmd, type, id, record);
 	}
 }
 
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index a0b5819bc70b..2aef185f4cd0 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -189,7 +189,7 @@ TRACE_EVENT(cxl_overflow,
 	__string(memdev, dev_name(&cxlmd->dev))			\
 	__string(host, dev_name(cxlmd->dev.parent))		\
 	__field(int, log)					\
-	__field_struct(uuid_t, hdr_uuid)			\
+	__field_struct(uuid_t, uuid)				\
 	__field(u64, serial)					\
 	__field(u32, hdr_flags)					\
 	__field(u16, hdr_handle)				\
@@ -198,12 +198,12 @@ TRACE_EVENT(cxl_overflow,
 	__field(u8, hdr_length)					\
 	__field(u8, hdr_maint_op_class)
 
-#define CXL_EVT_TP_fast_assign(cxlmd, l, hdr)					\
+#define CXL_EVT_TP_fast_assign(cxlmd, l, uuid, hdr)				\
 	__assign_str(memdev, dev_name(&(cxlmd)->dev));				\
 	__assign_str(host, dev_name((cxlmd)->dev.parent));			\
 	__entry->log = (l);							\
 	__entry->serial = (cxlmd)->cxlds->serial;				\
-	memcpy(&__entry->hdr_uuid, &(hdr).id, sizeof(uuid_t));			\
+	memcpy(&__entry->uuid, (uuid), sizeof(uuid_t));				\
 	__entry->hdr_length = (hdr).length;					\
 	__entry->hdr_flags = get_unaligned_le24((hdr).flags);			\
 	__entry->hdr_handle = le16_to_cpu((hdr).handle);			\
@@ -217,7 +217,7 @@ TRACE_EVENT(cxl_overflow,
 		"maint_op_class=%u : " fmt,					\
 		__get_str(memdev), __get_str(host), __entry->serial,		\
 		cxl_event_log_type_str(__entry->log),				\
-		__entry->hdr_timestamp, &__entry->hdr_uuid, __entry->hdr_length,\
+		__entry->hdr_timestamp, &__entry->uuid, __entry->hdr_length,	\
 		show_hdr_flags(__entry->hdr_flags), __entry->hdr_handle,	\
 		__entry->hdr_related_handle, __entry->hdr_maint_op_class,	\
 		##__VA_ARGS__)
@@ -225,9 +225,9 @@ TRACE_EVENT(cxl_overflow,
 TRACE_EVENT(cxl_generic_event,
 
 	TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
-		 struct cxl_event_record_raw *rec),
+		 const uuid_t *uuid, struct cxl_event_record_raw *rec),
 
-	TP_ARGS(cxlmd, log, rec),
+	TP_ARGS(cxlmd, log, uuid, rec),
 
 	TP_STRUCT__entry(
 		CXL_EVT_TP_entry
@@ -235,7 +235,7 @@ TRACE_EVENT(cxl_generic_event,
 	),
 
 	TP_fast_assign(
-		CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
+		CXL_EVT_TP_fast_assign(cxlmd, log, uuid, rec->hdr);
 		memcpy(__entry->data, &rec->data, CXL_EVENT_RECORD_DATA_LENGTH);
 	),
 
@@ -315,9 +315,9 @@ TRACE_EVENT(cxl_generic_event,
 TRACE_EVENT(cxl_general_media,
 
 	TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
-		 struct cxl_event_gen_media *rec),
+		 const uuid_t *uuid, struct cxl_event_gen_media *rec),
 
-	TP_ARGS(cxlmd, log, rec),
+	TP_ARGS(cxlmd, log, uuid, rec),
 
 	TP_STRUCT__entry(
 		CXL_EVT_TP_entry
@@ -336,7 +336,7 @@ TRACE_EVENT(cxl_general_media,
 	),
 
 	TP_fast_assign(
-		CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
+		CXL_EVT_TP_fast_assign(cxlmd, log, uuid, rec->hdr);
 
 		/* General Media */
 		__entry->dpa = le64_to_cpu(rec->phys_addr);
@@ -398,9 +398,9 @@ TRACE_EVENT(cxl_general_media,
 TRACE_EVENT(cxl_dram,
 
 	TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
-		 struct cxl_event_dram *rec),
+		 const uuid_t *uuid, struct cxl_event_dram *rec),
 
-	TP_ARGS(cxlmd, log, rec),
+	TP_ARGS(cxlmd, log, uuid, rec),
 
 	TP_STRUCT__entry(
 		CXL_EVT_TP_entry
@@ -422,7 +422,7 @@ TRACE_EVENT(cxl_dram,
 	),
 
 	TP_fast_assign(
-		CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
+		CXL_EVT_TP_fast_assign(cxlmd, log, uuid, rec->hdr);
 
 		/* DRAM */
 		__entry->dpa = le64_to_cpu(rec->phys_addr);
@@ -547,9 +547,9 @@ TRACE_EVENT(cxl_dram,
 TRACE_EVENT(cxl_memory_module,
 
 	TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
-		 struct cxl_event_mem_module *rec),
+		 const uuid_t *uuid, struct cxl_event_mem_module *rec),
 
-	TP_ARGS(cxlmd, log, rec),
+	TP_ARGS(cxlmd, log, uuid, rec),
 
 	TP_STRUCT__entry(
 		CXL_EVT_TP_entry
@@ -569,7 +569,7 @@ TRACE_EVENT(cxl_memory_module,
 	),
 
 	TP_fast_assign(
-		CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
+		CXL_EVT_TP_fast_assign(cxlmd, log, uuid, rec->hdr);
 
 		/* Memory Module Event */
 		__entry->event_type = rec->event_type;

-- 
2.41.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RFC v4 2/6] cxl/events: Promote CXL event structures to a core header
  2023-11-09 22:07 [PATCH RFC v4 0/6] efi/cxl-cper: Report CPER CXL component events through trace events Ira Weiny
  2023-11-09 22:07 ` [PATCH RFC v4 1/6] cxl/trace: Pass uuid explicitly to event traces Ira Weiny
@ 2023-11-09 22:07 ` Ira Weiny
  2023-11-09 22:07 ` [PATCH RFC v4 3/6] cxl/events: Separate UUID from event structures Ira Weiny
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Ira Weiny @ 2023-11-09 22:07 UTC (permalink / raw)
  To: Dan Williams, Jonathan Cameron, Smita Koralahalli, Shiju Jose
  Cc: Yazen Ghannam, Davidlohr Bueso, Dave Jiang, Alison Schofield,
	Vishal Verma, Ard Biesheuvel, linux-efi, linux-kernel, linux-cxl,
	Ira Weiny

UEFI code can process CXL events through CPER records.  Those records
use almost the same format as the CXL events.

Lift the CXL event structures to a core header to be shared.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from RFC v2:
[djbw: new patch]
---
 drivers/cxl/cxlmem.h      |  90 +----------------------------------------
 include/linux/cxl-event.h | 100 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 101 insertions(+), 89 deletions(-)

diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 706f8a6d1ef4..d694820ce8f5 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -6,6 +6,7 @@
 #include <linux/cdev.h>
 #include <linux/uuid.h>
 #include <linux/rcuwait.h>
+#include <linux/cxl-event.h>
 #include "cxl.h"
 
 /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
@@ -576,27 +577,6 @@ struct cxl_mbox_identify {
 	u8 qos_telemetry_caps;
 } __packed;
 
-/*
- * Common Event Record Format
- * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
- */
-struct cxl_event_record_hdr {
-	uuid_t id;
-	u8 length;
-	u8 flags[3];
-	__le16 handle;
-	__le16 related_handle;
-	__le64 timestamp;
-	u8 maint_op_class;
-	u8 reserved[15];
-} __packed;
-
-#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
-struct cxl_event_record_raw {
-	struct cxl_event_record_hdr hdr;
-	u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
-} __packed;
-
 /*
  * Get Event Records output payload
  * CXL rev 3.0 section 8.2.9.2.2; Table 8-50
@@ -638,74 +618,6 @@ struct cxl_mbox_clear_event_payload {
 } __packed;
 #define CXL_CLEAR_EVENT_MAX_HANDLES U8_MAX
 
-/*
- * General Media Event Record
- * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
- */
-#define CXL_EVENT_GEN_MED_COMP_ID_SIZE	0x10
-struct cxl_event_gen_media {
-	struct cxl_event_record_hdr hdr;
-	__le64 phys_addr;
-	u8 descriptor;
-	u8 type;
-	u8 transaction_type;
-	u8 validity_flags[2];
-	u8 channel;
-	u8 rank;
-	u8 device[3];
-	u8 component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
-	u8 reserved[46];
-} __packed;
-
-/*
- * DRAM Event Record - DER
- * CXL rev 3.0 section 8.2.9.2.1.2; Table 3-44
- */
-#define CXL_EVENT_DER_CORRECTION_MASK_SIZE	0x20
-struct cxl_event_dram {
-	struct cxl_event_record_hdr hdr;
-	__le64 phys_addr;
-	u8 descriptor;
-	u8 type;
-	u8 transaction_type;
-	u8 validity_flags[2];
-	u8 channel;
-	u8 rank;
-	u8 nibble_mask[3];
-	u8 bank_group;
-	u8 bank;
-	u8 row[3];
-	u8 column[2];
-	u8 correction_mask[CXL_EVENT_DER_CORRECTION_MASK_SIZE];
-	u8 reserved[0x17];
-} __packed;
-
-/*
- * Get Health Info Record
- * CXL rev 3.0 section 8.2.9.8.3.1; Table 8-100
- */
-struct cxl_get_health_info {
-	u8 health_status;
-	u8 media_status;
-	u8 add_status;
-	u8 life_used;
-	u8 device_temp[2];
-	u8 dirty_shutdown_cnt[4];
-	u8 cor_vol_err_cnt[4];
-	u8 cor_per_err_cnt[4];
-} __packed;
-
-/*
- * Memory Module Event Record
- * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
- */
-struct cxl_event_mem_module {
-	struct cxl_event_record_hdr hdr;
-	u8 event_type;
-	struct cxl_get_health_info info;
-	u8 reserved[0x3d];
-} __packed;
-
 struct cxl_mbox_get_partition_info {
 	__le64 active_volatile_cap;
 	__le64 active_persistent_cap;
diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
new file mode 100644
index 000000000000..1c94e8fdd227
--- /dev/null
+++ b/include/linux/cxl-event.h
@@ -0,0 +1,100 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_CXL_EVENT_H
+#define _LINUX_CXL_EVENT_H
+
+/*
+ * CXL event records; CXL rev 3.0
+ *
+ * Copyright(c) 2023 Intel Corporation.
+ */
+
+/*
+ * Common Event Record Format
+ * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
+ */
+struct cxl_event_record_hdr {
+	uuid_t id;
+	u8 length;
+	u8 flags[3];
+	__le16 handle;
+	__le16 related_handle;
+	__le64 timestamp;
+	u8 maint_op_class;
+	u8 reserved[15];
+} __packed;
+
+#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
+struct cxl_event_record_raw {
+	struct cxl_event_record_hdr hdr;
+	u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
+} __packed;
+
+/*
+ * General Media Event Record
+ * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
+ */
+#define CXL_EVENT_GEN_MED_COMP_ID_SIZE	0x10
+struct cxl_event_gen_media {
+	struct cxl_event_record_hdr hdr;
+	__le64 phys_addr;
+	u8 descriptor;
+	u8 type;
+	u8 transaction_type;
+	u8 validity_flags[2];
+	u8 channel;
+	u8 rank;
+	u8 device[3];
+	u8 component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
+	u8 reserved[46];
+} __packed;
+
+/*
+ * DRAM Event Record - DER
+ * CXL rev 3.0 section 8.2.9.2.1.2; Table 3-44
+ */
+#define CXL_EVENT_DER_CORRECTION_MASK_SIZE	0x20
+struct cxl_event_dram {
+	struct cxl_event_record_hdr hdr;
+	__le64 phys_addr;
+	u8 descriptor;
+	u8 type;
+	u8 transaction_type;
+	u8 validity_flags[2];
+	u8 channel;
+	u8 rank;
+	u8 nibble_mask[3];
+	u8 bank_group;
+	u8 bank;
+	u8 row[3];
+	u8 column[2];
+	u8 correction_mask[CXL_EVENT_DER_CORRECTION_MASK_SIZE];
+	u8 reserved[0x17];
+} __packed;
+
+/*
+ * Get Health Info Record
+ * CXL rev 3.0 section 8.2.9.8.3.1; Table 8-100
+ */
+struct cxl_get_health_info {
+	u8 health_status;
+	u8 media_status;
+	u8 add_status;
+	u8 life_used;
+	u8 device_temp[2];
+	u8 dirty_shutdown_cnt[4];
+	u8 cor_vol_err_cnt[4];
+	u8 cor_per_err_cnt[4];
+} __packed;
+
+/*
+ * Memory Module Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
+ */
+struct cxl_event_mem_module {
+	struct cxl_event_record_hdr hdr;
+	u8 event_type;
+	struct cxl_get_health_info info;
+	u8 reserved[0x3d];
+} __packed;
+
+#endif /* _LINUX_CXL_EVENT_H */

-- 
2.41.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RFC v4 3/6] cxl/events: Separate UUID from event structures
  2023-11-09 22:07 [PATCH RFC v4 0/6] efi/cxl-cper: Report CPER CXL component events through trace events Ira Weiny
  2023-11-09 22:07 ` [PATCH RFC v4 1/6] cxl/trace: Pass uuid explicitly to event traces Ira Weiny
  2023-11-09 22:07 ` [PATCH RFC v4 2/6] cxl/events: Promote CXL event structures to a core header Ira Weiny
@ 2023-11-09 22:07 ` Ira Weiny
  2023-11-09 22:07 ` [PATCH RFC v4 4/6] cxl/events: Create a CXL event union Ira Weiny
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Ira Weiny @ 2023-11-09 22:07 UTC (permalink / raw)
  To: Dan Williams, Jonathan Cameron, Smita Koralahalli, Shiju Jose
  Cc: Yazen Ghannam, Davidlohr Bueso, Dave Jiang, Alison Schofield,
	Vishal Verma, Ard Biesheuvel, linux-efi, linux-kernel, linux-cxl,
	Ira Weiny

The UEFI CXL CPER structure does not include the UUID.  Now that the
UUID is passed separately to the trace event there is no need to have
the UUID in those structures.

Move UUID from the event record header to the raw structures.  Adjust
cxl-test to Create dummy structures for creating test records.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from RFC v3:
[iweiny: reword the commit message]

Changes from RFC v2:
[iweiny: new patch]
---
 drivers/cxl/core/mbox.c      |   2 +-
 include/linux/cxl-event.h    |  10 ++--
 tools/testing/cxl/test/mem.c | 135 +++++++++++++++++++++++++------------------
 3 files changed, 84 insertions(+), 63 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 43aad71810e2..74446c6a990a 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -864,7 +864,7 @@ static void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
 				   enum cxl_event_log_type type,
 				   struct cxl_event_record_raw *record)
 {
-	uuid_t *id = &record->hdr.id;
+	uuid_t *id = &record->id;
 
 	if (uuid_equal(id, &gen_media_event_uuid)) {
 		struct cxl_event_gen_media *rec =
diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
index 1c94e8fdd227..ebb00ead1496 100644
--- a/include/linux/cxl-event.h
+++ b/include/linux/cxl-event.h
@@ -8,12 +8,7 @@
  * Copyright(c) 2023 Intel Corporation.
  */
 
-/*
- * Common Event Record Format
- * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
- */
 struct cxl_event_record_hdr {
-	uuid_t id;
 	u8 length;
 	u8 flags[3];
 	__le16 handle;
@@ -23,8 +18,13 @@ struct cxl_event_record_hdr {
 	u8 reserved[15];
 } __packed;
 
+/*
+ * Common Event Record Format
+ * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
+ */
 #define CXL_EVENT_RECORD_DATA_LENGTH 0x50
 struct cxl_event_record_raw {
+	uuid_t id;
 	struct cxl_event_record_hdr hdr;
 	u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
 } __packed;
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 464fc39ed277..85862be00c48 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -330,9 +330,9 @@ static void cxl_mock_event_trigger(struct device *dev)
 }
 
 struct cxl_event_record_raw maint_needed = {
+	.id = UUID_INIT(0xBA5EBA11, 0xABCD, 0xEFEB,
+			0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
 	.hdr = {
-		.id = UUID_INIT(0xBA5EBA11, 0xABCD, 0xEFEB,
-				0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
 		.length = sizeof(struct cxl_event_record_raw),
 		.flags[0] = CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,
 		/* .handle = Set dynamically */
@@ -342,9 +342,9 @@ struct cxl_event_record_raw maint_needed = {
 };
 
 struct cxl_event_record_raw hardware_replace = {
+	.id = UUID_INIT(0xABCDEFEB, 0xBA11, 0xBA5E,
+			0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
 	.hdr = {
-		.id = UUID_INIT(0xABCDEFEB, 0xBA11, 0xBA5E,
-				0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
 		.length = sizeof(struct cxl_event_record_raw),
 		.flags[0] = CXL_EVENT_RECORD_FLAG_HW_REPLACE,
 		/* .handle = Set dynamically */
@@ -353,64 +353,85 @@ struct cxl_event_record_raw hardware_replace = {
 	.data = { 0xDE, 0xAD, 0xBE, 0xEF },
 };
 
-struct cxl_event_gen_media gen_media = {
-	.hdr = {
-		.id = UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
-				0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6),
-		.length = sizeof(struct cxl_event_gen_media),
-		.flags[0] = CXL_EVENT_RECORD_FLAG_PERMANENT,
-		/* .handle = Set dynamically */
-		.related_handle = cpu_to_le16(0),
+struct cxl_test_gen_media {
+	uuid_t id;
+	struct cxl_event_gen_media rec;
+} __packed;
+
+struct cxl_test_gen_media gen_media = {
+	.id = UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
+			0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6),
+	.rec = {
+		.hdr = {
+			.length = sizeof(struct cxl_test_gen_media),
+			.flags[0] = CXL_EVENT_RECORD_FLAG_PERMANENT,
+			/* .handle = Set dynamically */
+			.related_handle = cpu_to_le16(0),
+		},
+		.phys_addr = cpu_to_le64(0x2000),
+		.descriptor = CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT,
+		.type = CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,
+		.transaction_type = CXL_GMER_TRANS_HOST_WRITE,
+		/* .validity_flags = <set below> */
+		.channel = 1,
+		.rank = 30
 	},
-	.phys_addr = cpu_to_le64(0x2000),
-	.descriptor = CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT,
-	.type = CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,
-	.transaction_type = CXL_GMER_TRANS_HOST_WRITE,
-	/* .validity_flags = <set below> */
-	.channel = 1,
-	.rank = 30
 };
 
-struct cxl_event_dram dram = {
-	.hdr = {
-		.id = UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
-				0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24),
-		.length = sizeof(struct cxl_event_dram),
-		.flags[0] = CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,
-		/* .handle = Set dynamically */
-		.related_handle = cpu_to_le16(0),
+struct cxl_test_dram {
+	uuid_t id;
+	struct cxl_event_dram rec;
+} __packed;
+
+struct cxl_test_dram dram = {
+	.id = UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
+			0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24),
+	.rec = {
+		.hdr = {
+			.length = sizeof(struct cxl_test_dram),
+			.flags[0] = CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,
+			/* .handle = Set dynamically */
+			.related_handle = cpu_to_le16(0),
+		},
+		.phys_addr = cpu_to_le64(0x8000),
+		.descriptor = CXL_GMER_EVT_DESC_THRESHOLD_EVENT,
+		.type = CXL_GMER_MEM_EVT_TYPE_INV_ADDR,
+		.transaction_type = CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB,
+		/* .validity_flags = <set below> */
+		.channel = 1,
+		.bank_group = 5,
+		.bank = 2,
+		.column = {0xDE, 0xAD},
 	},
-	.phys_addr = cpu_to_le64(0x8000),
-	.descriptor = CXL_GMER_EVT_DESC_THRESHOLD_EVENT,
-	.type = CXL_GMER_MEM_EVT_TYPE_INV_ADDR,
-	.transaction_type = CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB,
-	/* .validity_flags = <set below> */
-	.channel = 1,
-	.bank_group = 5,
-	.bank = 2,
-	.column = {0xDE, 0xAD},
 };
 
-struct cxl_event_mem_module mem_module = {
-	.hdr = {
-		.id = UUID_INIT(0xfe927475, 0xdd59, 0x4339,
-				0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74),
-		.length = sizeof(struct cxl_event_mem_module),
-		/* .handle = Set dynamically */
-		.related_handle = cpu_to_le16(0),
+struct cxl_test_mem_module {
+	uuid_t id;
+	struct cxl_event_mem_module rec;
+} __packed;
+
+struct cxl_test_mem_module mem_module = {
+	.id = UUID_INIT(0xfe927475, 0xdd59, 0x4339,
+			0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74),
+	.rec = {
+		.hdr = {
+			.length = sizeof(struct cxl_test_mem_module),
+			/* .handle = Set dynamically */
+			.related_handle = cpu_to_le16(0),
+		},
+		.event_type = CXL_MMER_TEMP_CHANGE,
+		.info = {
+			.health_status = CXL_DHI_HS_PERFORMANCE_DEGRADED,
+			.media_status = CXL_DHI_MS_ALL_DATA_LOST,
+			.add_status = (CXL_DHI_AS_CRITICAL << 2) |
+				      (CXL_DHI_AS_WARNING << 4) |
+				      (CXL_DHI_AS_WARNING << 5),
+			.device_temp = { 0xDE, 0xAD},
+			.dirty_shutdown_cnt = { 0xde, 0xad, 0xbe, 0xef },
+			.cor_vol_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
+			.cor_per_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
+		}
 	},
-	.event_type = CXL_MMER_TEMP_CHANGE,
-	.info = {
-		.health_status = CXL_DHI_HS_PERFORMANCE_DEGRADED,
-		.media_status = CXL_DHI_MS_ALL_DATA_LOST,
-		.add_status = (CXL_DHI_AS_CRITICAL << 2) |
-			      (CXL_DHI_AS_WARNING << 4) |
-			      (CXL_DHI_AS_WARNING << 5),
-		.device_temp = { 0xDE, 0xAD},
-		.dirty_shutdown_cnt = { 0xde, 0xad, 0xbe, 0xef },
-		.cor_vol_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
-		.cor_per_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
-	}
 };
 
 static int mock_set_timestamp(struct cxl_dev_state *cxlds,
@@ -432,11 +453,11 @@ static int mock_set_timestamp(struct cxl_dev_state *cxlds,
 static void cxl_mock_add_event_logs(struct mock_event_store *mes)
 {
 	put_unaligned_le16(CXL_GMER_VALID_CHANNEL | CXL_GMER_VALID_RANK,
-			   &gen_media.validity_flags);
+			   &gen_media.rec.validity_flags);
 
 	put_unaligned_le16(CXL_DER_VALID_CHANNEL | CXL_DER_VALID_BANK_GROUP |
 			   CXL_DER_VALID_BANK | CXL_DER_VALID_COLUMN,
-			   &dram.validity_flags);
+			   &dram.rec.validity_flags);
 
 	mes_add_event(mes, CXL_EVENT_TYPE_INFO, &maint_needed);
 	mes_add_event(mes, CXL_EVENT_TYPE_INFO,

-- 
2.41.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RFC v4 4/6] cxl/events: Create a CXL event union
  2023-11-09 22:07 [PATCH RFC v4 0/6] efi/cxl-cper: Report CPER CXL component events through trace events Ira Weiny
                   ` (2 preceding siblings ...)
  2023-11-09 22:07 ` [PATCH RFC v4 3/6] cxl/events: Separate UUID from event structures Ira Weiny
@ 2023-11-09 22:07 ` Ira Weiny
  2023-11-09 22:07 ` [PATCH RFC v4 5/6] firmware/efi: Process CXL Component Events Ira Weiny
  2023-11-09 22:07 ` [PATCH RFC v4 6/6] cxl/memdev: Register for and process CPER events Ira Weiny
  5 siblings, 0 replies; 15+ messages in thread
From: Ira Weiny @ 2023-11-09 22:07 UTC (permalink / raw)
  To: Dan Williams, Jonathan Cameron, Smita Koralahalli, Shiju Jose
  Cc: Yazen Ghannam, Davidlohr Bueso, Dave Jiang, Alison Schofield,
	Vishal Verma, Ard Biesheuvel, linux-efi, linux-kernel, linux-cxl,
	Ira Weiny

The CXL CPER and event log records share everything but a UUID/GUID in
their structures.

Define a cxl_event union without the UUID/GUID to be shared between the
CPER and event log record formats.  Adjust the code to use this union.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from RFC v3:
[iweiny: adjust to keeping uuid in trace events]

Changes from RFC v2:
[iweiny: new patch]
---
 drivers/cxl/core/mbox.c      | 32 +++++++++++++-------------------
 drivers/cxl/core/trace.h     |  8 ++++----
 include/linux/cxl-event.h    | 23 +++++++++++++++++------
 tools/testing/cxl/test/mem.c | 31 ++++++++++++++++++-------------
 4 files changed, 52 insertions(+), 42 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 74446c6a990a..ac44f3659d84 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -864,26 +864,17 @@ static void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
 				   enum cxl_event_log_type type,
 				   struct cxl_event_record_raw *record)
 {
+	union cxl_event *evt = &record->event;
 	uuid_t *id = &record->id;
 
-	if (uuid_equal(id, &gen_media_event_uuid)) {
-		struct cxl_event_gen_media *rec =
-				(struct cxl_event_gen_media *)record;
-
-		trace_cxl_general_media(cxlmd, type, id, rec);
-	} else if (uuid_equal(id, &dram_event_uuid)) {
-		struct cxl_event_dram *rec = (struct cxl_event_dram *)record;
-
-		trace_cxl_dram(cxlmd, type, id, rec);
-	} else if (uuid_equal(id, &mem_mod_event_uuid)) {
-		struct cxl_event_mem_module *rec =
-				(struct cxl_event_mem_module *)record;
-
-		trace_cxl_memory_module(cxlmd, type, id, rec);
-	} else {
-		/* For unknown record types print just the header */
-		trace_cxl_generic_event(cxlmd, type, id, record);
-	}
+	if (uuid_equal(id, &gen_media_event_uuid))
+		trace_cxl_general_media(cxlmd, type, id, &evt->gen_media);
+	else if (uuid_equal(id, &dram_event_uuid))
+		trace_cxl_dram(cxlmd, type, id, &evt->dram);
+	else if (uuid_equal(id, &mem_mod_event_uuid))
+		trace_cxl_memory_module(cxlmd, type, id, &evt->mem_module);
+	else
+		trace_cxl_generic_event(cxlmd, type, id, &evt->generic);
 }
 
 static int cxl_clear_event_record(struct cxl_memdev_state *mds,
@@ -926,7 +917,10 @@ static int cxl_clear_event_record(struct cxl_memdev_state *mds,
 	 */
 	i = 0;
 	for (cnt = 0; cnt < total; cnt++) {
-		payload->handles[i++] = get_pl->records[cnt].hdr.handle;
+		struct cxl_event_record_raw *raw = &get_pl->records[cnt];
+		struct cxl_event_generic *gen = &raw->event.generic;
+
+		payload->handles[i++] = gen->hdr.handle;
 		dev_dbg(mds->cxlds.dev, "Event log '%d': Clearing %u\n", log,
 			le16_to_cpu(payload->handles[i]));
 
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index 2aef185f4cd0..3d007f0472f2 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -225,9 +225,9 @@ TRACE_EVENT(cxl_overflow,
 TRACE_EVENT(cxl_generic_event,
 
 	TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
-		 const uuid_t *uuid, struct cxl_event_record_raw *rec),
+		 const uuid_t *uuid, struct cxl_event_generic *gen_rec),
 
-	TP_ARGS(cxlmd, log, uuid, rec),
+	TP_ARGS(cxlmd, log, uuid, gen_rec),
 
 	TP_STRUCT__entry(
 		CXL_EVT_TP_entry
@@ -235,8 +235,8 @@ TRACE_EVENT(cxl_generic_event,
 	),
 
 	TP_fast_assign(
-		CXL_EVT_TP_fast_assign(cxlmd, log, uuid, rec->hdr);
-		memcpy(__entry->data, &rec->data, CXL_EVENT_RECORD_DATA_LENGTH);
+		CXL_EVT_TP_fast_assign(cxlmd, log, uuid, gen_rec->hdr);
+		memcpy(__entry->data, gen_rec->data, CXL_EVENT_RECORD_DATA_LENGTH);
 	),
 
 	CXL_EVT_TP_printk("%s",
diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
index ebb00ead1496..6b689e1efc78 100644
--- a/include/linux/cxl-event.h
+++ b/include/linux/cxl-event.h
@@ -18,13 +18,8 @@ struct cxl_event_record_hdr {
 	u8 reserved[15];
 } __packed;
 
-/*
- * Common Event Record Format
- * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
- */
 #define CXL_EVENT_RECORD_DATA_LENGTH 0x50
-struct cxl_event_record_raw {
-	uuid_t id;
+struct cxl_event_generic {
 	struct cxl_event_record_hdr hdr;
 	u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
 } __packed;
@@ -97,4 +92,20 @@ struct cxl_event_mem_module {
 	u8 reserved[0x3d];
 } __packed;
 
+union cxl_event {
+	struct cxl_event_generic generic;
+	struct cxl_event_gen_media gen_media;
+	struct cxl_event_dram dram;
+	struct cxl_event_mem_module mem_module;
+};
+
+/*
+ * Common Event Record Format; in event logs
+ * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
+ */
+struct cxl_event_record_raw {
+	uuid_t id;
+	union cxl_event event;
+} __packed;
+
 #endif /* _LINUX_CXL_EVENT_H */
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 85862be00c48..2535eb182ece 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -244,7 +244,8 @@ static int mock_get_event(struct device *dev, struct cxl_mbox_cmd *cmd)
 	for (i = 0; i < CXL_TEST_EVENT_CNT && !event_log_empty(log); i++) {
 		memcpy(&pl->records[i], event_get_current(log),
 		       sizeof(pl->records[i]));
-		pl->records[i].hdr.handle = event_get_cur_event_handle(log);
+		pl->records[i].event.generic.hdr.handle =
+				event_get_cur_event_handle(log);
 		log->cur_idx++;
 	}
 
@@ -332,25 +333,29 @@ static void cxl_mock_event_trigger(struct device *dev)
 struct cxl_event_record_raw maint_needed = {
 	.id = UUID_INIT(0xBA5EBA11, 0xABCD, 0xEFEB,
 			0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
-	.hdr = {
-		.length = sizeof(struct cxl_event_record_raw),
-		.flags[0] = CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,
-		/* .handle = Set dynamically */
-		.related_handle = cpu_to_le16(0xa5b6),
+	.event.generic = {
+		.hdr = {
+			.length = sizeof(struct cxl_event_record_raw),
+			.flags[0] = CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,
+			/* .handle = Set dynamically */
+			.related_handle = cpu_to_le16(0xa5b6),
+		},
+		.data = { 0xDE, 0xAD, 0xBE, 0xEF },
 	},
-	.data = { 0xDE, 0xAD, 0xBE, 0xEF },
 };
 
 struct cxl_event_record_raw hardware_replace = {
 	.id = UUID_INIT(0xABCDEFEB, 0xBA11, 0xBA5E,
 			0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
-	.hdr = {
-		.length = sizeof(struct cxl_event_record_raw),
-		.flags[0] = CXL_EVENT_RECORD_FLAG_HW_REPLACE,
-		/* .handle = Set dynamically */
-		.related_handle = cpu_to_le16(0xb6a5),
+	.event.generic = {
+		.hdr = {
+			.length = sizeof(struct cxl_event_record_raw),
+			.flags[0] = CXL_EVENT_RECORD_FLAG_HW_REPLACE,
+			/* .handle = Set dynamically */
+			.related_handle = cpu_to_le16(0xb6a5),
+		},
+		.data = { 0xDE, 0xAD, 0xBE, 0xEF },
 	},
-	.data = { 0xDE, 0xAD, 0xBE, 0xEF },
 };
 
 struct cxl_test_gen_media {

-- 
2.41.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RFC v4 5/6] firmware/efi: Process CXL Component Events
  2023-11-09 22:07 [PATCH RFC v4 0/6] efi/cxl-cper: Report CPER CXL component events through trace events Ira Weiny
                   ` (3 preceding siblings ...)
  2023-11-09 22:07 ` [PATCH RFC v4 4/6] cxl/events: Create a CXL event union Ira Weiny
@ 2023-11-09 22:07 ` Ira Weiny
  2023-11-28 20:32   ` Smita Koralahalli
  2023-11-09 22:07 ` [PATCH RFC v4 6/6] cxl/memdev: Register for and process CPER events Ira Weiny
  5 siblings, 1 reply; 15+ messages in thread
From: Ira Weiny @ 2023-11-09 22:07 UTC (permalink / raw)
  To: Dan Williams, Jonathan Cameron, Smita Koralahalli, Shiju Jose
  Cc: Yazen Ghannam, Davidlohr Bueso, Dave Jiang, Alison Schofield,
	Vishal Verma, Ard Biesheuvel, linux-efi, linux-kernel, linux-cxl,
	Ira Weiny

BIOS can configure memory devices as firmware first.  This will send CXL
events to the firmware instead of the OS.  The firmware can then send
these events to the OS via UEFI.

UEFI v2.10 section N.2.14 defines a Common Platform Error Record (CPER)
format for CXL Component Events.  The format is mostly the same as the
CXL Common Event Record Format.  The difference is a GUID is used in
the Section Type to identify the event type.

Add EFI support to detect CXL CPER records and call a notifier chain
with the record data blobs to be processed by the CXL code.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from RFC v3
[Smita: ensure cper_cxl_event_rec is packed]

Changes from RFC v2
[djbw: use common event structures]
[djbw: remove print in core cper code]
[djbw: export register call as NS_GPL]
[iweiny: fix 0day issues]

Changes from RFC v1
[iweiny: use an enum for know record types and skip converting GUID to UUID]
[iweiny: commit to the UUID not being part of the event record data]
[iweiny: use defines for GUID definitions]
---
 drivers/firmware/efi/cper.c     | 15 +++++++++++++
 drivers/firmware/efi/cper_cxl.c | 40 +++++++++++++++++++++++++++++++++
 drivers/firmware/efi/cper_cxl.h | 29 ++++++++++++++++++++++++
 include/linux/cxl-event.h       | 49 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 133 insertions(+)

diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 35c37f667781..3d0b60144a07 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -22,6 +22,7 @@
 #include <linux/aer.h>
 #include <linux/printk.h>
 #include <linux/bcd.h>
+#include <linux/cxl-event.h>
 #include <acpi/ghes.h>
 #include <ras/ras_event.h>
 #include "cper_cxl.h"
@@ -607,6 +608,20 @@ cper_estatus_print_section(const char *pfx, struct acpi_hest_generic_data *gdata
 			cper_print_prot_err(newpfx, prot_err);
 		else
 			goto err_section_too_small;
+	} else if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA) ||
+		   guid_equal(sec_type, &CPER_SEC_CXL_DRAM) ||
+		   guid_equal(sec_type, &CPER_SEC_CXL_MEM_MODULE)) {
+		struct cper_cxl_event_rec *rec = acpi_hest_get_payload(gdata);
+
+		if (rec->hdr.length <= sizeof(rec->hdr))
+			goto err_section_too_small;
+
+		if (rec->hdr.length > sizeof(*rec)) {
+			pr_err(FW_WARN "error section length is too big\n");
+			return;
+		}
+
+		cper_post_cxl_event(newpfx, sec_type, rec);
 	} else {
 		const void *err = acpi_hest_get_payload(gdata);
 
diff --git a/drivers/firmware/efi/cper_cxl.c b/drivers/firmware/efi/cper_cxl.c
index a55771b99a97..bf642962a7ba 100644
--- a/drivers/firmware/efi/cper_cxl.c
+++ b/drivers/firmware/efi/cper_cxl.c
@@ -8,6 +8,7 @@
  */
 
 #include <linux/cper.h>
+#include <linux/cxl-event.h>
 #include "cper_cxl.h"
 
 #define PROT_ERR_VALID_AGENT_TYPE		BIT_ULL(0)
@@ -187,3 +188,42 @@ void cper_print_prot_err(const char *pfx, const struct cper_sec_prot_err *prot_e
 			       sizeof(cxl_ras->header_log), 0);
 	}
 }
+
+/* CXL CPER notifier chain */
+static BLOCKING_NOTIFIER_HEAD(cxl_cper_chain_head);
+
+void cper_post_cxl_event(const char *pfx, guid_t *sec_type,
+			 struct cper_cxl_event_rec *rec)
+{
+	struct cxl_cper_notifier_data nd = {
+		.rec = rec,
+	};
+
+	if (!(rec->hdr.validation_bits & CPER_CXL_COMP_EVENT_LOG_VALID)) {
+		pr_err(FW_WARN "cxl event no Component Event Log present\n");
+		return;
+	}
+
+	if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA))
+		nd.event_type = CXL_CPER_EVENT_GEN_MEDIA;
+	else if (guid_equal(sec_type, &CPER_SEC_CXL_DRAM))
+		nd.event_type = CXL_CPER_EVENT_DRAM;
+	else if (guid_equal(sec_type, &CPER_SEC_CXL_MEM_MODULE))
+		nd.event_type = CXL_CPER_EVENT_MEM_MODULE;
+
+	if (blocking_notifier_call_chain(&cxl_cper_chain_head, 0, (void *)&nd)
+			== NOTIFY_BAD)
+		pr_err(FW_WARN "cxl event notifier chain failed\n");
+}
+
+int register_cxl_cper_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&cxl_cper_chain_head, nb);
+}
+EXPORT_SYMBOL_NS_GPL(register_cxl_cper_notifier, CXL);
+
+void unregister_cxl_cper_notifier(struct notifier_block *nb)
+{
+	blocking_notifier_chain_unregister(&cxl_cper_chain_head, nb);
+}
+EXPORT_SYMBOL_NS_GPL(unregister_cxl_cper_notifier, CXL);
diff --git a/drivers/firmware/efi/cper_cxl.h b/drivers/firmware/efi/cper_cxl.h
index 86bfcf7909ec..aa3d36493586 100644
--- a/drivers/firmware/efi/cper_cxl.h
+++ b/drivers/firmware/efi/cper_cxl.h
@@ -10,11 +10,38 @@
 #ifndef LINUX_CPER_CXL_H
 #define LINUX_CPER_CXL_H
 
+#include <linux/cxl-event.h>
+
 /* CXL Protocol Error Section */
 #define CPER_SEC_CXL_PROT_ERR						\
 	GUID_INIT(0x80B9EFB4, 0x52B5, 0x4DE3, 0xA7, 0x77, 0x68, 0x78,	\
 		  0x4B, 0x77, 0x10, 0x48)
 
+/* CXL Event record UUIDs are formated at GUIDs and reported in section type */
+/*
+ * General Media Event Record
+ * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
+ */
+#define CPER_SEC_CXL_GEN_MEDIA						\
+	GUID_INIT(0xfbcd0a77, 0xc260, 0x417f,				\
+		  0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6)
+
+/*
+ * DRAM Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
+ */
+#define CPER_SEC_CXL_DRAM						\
+	GUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,				\
+		  0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24)
+
+/*
+ * Memory Module Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
+ */
+#define CPER_SEC_CXL_MEM_MODULE						\
+	GUID_INIT(0xfe927475, 0xdd59, 0x4339,				\
+		  0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74)
+
 #pragma pack(1)
 
 /* Compute Express Link Protocol Error Section, UEFI v2.10 sec N.2.13 */
@@ -62,5 +89,7 @@ struct cper_sec_prot_err {
 #pragma pack()
 
 void cper_print_prot_err(const char *pfx, const struct cper_sec_prot_err *prot_err);
+void cper_post_cxl_event(const char *pfx, guid_t *sec_type,
+			 struct cper_cxl_event_rec *rec);
 
 #endif //__CPER_CXL_
diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
index 6b689e1efc78..733ab2ab8639 100644
--- a/include/linux/cxl-event.h
+++ b/include/linux/cxl-event.h
@@ -108,4 +108,53 @@ struct cxl_event_record_raw {
 	union cxl_event event;
 } __packed;
 
+enum cxl_event_type {
+	CXL_CPER_EVENT_GEN_MEDIA,
+	CXL_CPER_EVENT_DRAM,
+	CXL_CPER_EVENT_MEM_MODULE,
+};
+
+#define CPER_CXL_DEVICE_ID_VALID		BIT(0)
+#define CPER_CXL_DEVICE_SN_VALID		BIT(1)
+#define CPER_CXL_COMP_EVENT_LOG_VALID		BIT(2)
+struct cper_cxl_event_rec {
+	struct {
+		u32 length;
+		u64 validation_bits;
+		struct cper_cxl_event_devid {
+			u16 vendor_id;
+			u16 device_id;
+			u8 func_num;
+			u8 device_num;
+			u8 bus_num;
+			u16 segment_num;
+			u16 slot_num; /* bits 2:0 reserved */
+			u8 reserved;
+		} device_id;
+		struct cper_cxl_event_sn {
+			u32 lower_dw;
+			u32 upper_dw;
+		} dev_serial_num;
+	} hdr;
+
+	union cxl_event event;
+} __packed;
+
+struct cxl_cper_notifier_data {
+	enum cxl_event_type event_type;
+	struct cper_cxl_event_rec *rec;
+};
+
+#ifdef CONFIG_UEFI_CPER
+int register_cxl_cper_notifier(struct notifier_block *nb);
+void unregister_cxl_cper_notifier(struct notifier_block *nb);
+#else
+static inline int register_cxl_cper_notifier(struct notifier_block *nb)
+{
+	return 0;
+}
+
+static inline void unregister_cxl_cper_notifier(struct notifier_block *nb) { }
+#endif
+
 #endif /* _LINUX_CXL_EVENT_H */

-- 
2.41.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RFC v4 6/6] cxl/memdev: Register for and process CPER events
  2023-11-09 22:07 [PATCH RFC v4 0/6] efi/cxl-cper: Report CPER CXL component events through trace events Ira Weiny
                   ` (4 preceding siblings ...)
  2023-11-09 22:07 ` [PATCH RFC v4 5/6] firmware/efi: Process CXL Component Events Ira Weiny
@ 2023-11-09 22:07 ` Ira Weiny
  5 siblings, 0 replies; 15+ messages in thread
From: Ira Weiny @ 2023-11-09 22:07 UTC (permalink / raw)
  To: Dan Williams, Jonathan Cameron, Smita Koralahalli, Shiju Jose
  Cc: Yazen Ghannam, Davidlohr Bueso, Dave Jiang, Alison Schofield,
	Vishal Verma, Ard Biesheuvel, linux-efi, linux-kernel, linux-cxl,
	Ira Weiny

If the firmware has configured CXL event support to be firmware first
the OS can process those events through CPER records.  The CXL layer has
unique DPA to HPA knowledge and standard event trace parsing in place.
Matching memory devices to the CPER records can be done via Bus, Device,
Function which is part of the CPER record header.

Detect firmware first, register a notifier callback for each memdev, and
trace events when they match the proper device.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from RFC v3:
[smita: use PCI_DEVFN()]
[iweiny: put uuid back in]

Changes from RFC v2:
[smita/djbw: use BDF not serial number for memdev ID]
[smita: eliminate memcpy]
[djbw: adjust to new structures]
[iweiny: fix 0day errors]

Changes from RFC v1:
[iweiny: adjust to cper_event enum instead of converting guids]
---
 drivers/cxl/core/mbox.c | 31 +++++++++++++++++++++-----
 drivers/cxl/cxlmem.h    |  6 +++++
 drivers/cxl/pci.c       | 58 ++++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 89 insertions(+), 6 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index ac44f3659d84..7c8691e392ae 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -860,9 +860,30 @@ static const uuid_t mem_mod_event_uuid =
 	UUID_INIT(0xfe927475, 0xdd59, 0x4339,
 		  0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74);
 
-static void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
-				   enum cxl_event_log_type type,
-				   struct cxl_event_record_raw *record)
+void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
+			    enum cxl_event_log_type type,
+			    enum cxl_event_type event_type,
+			    union cxl_event *event)
+{
+	switch (event_type) {
+	case CXL_CPER_EVENT_GEN_MEDIA:
+		trace_cxl_general_media(cxlmd, type, &gen_media_event_uuid,
+					&event->gen_media);
+		break;
+	case CXL_CPER_EVENT_DRAM:
+		trace_cxl_dram(cxlmd, type, &dram_event_uuid, &event->dram);
+		break;
+	case CXL_CPER_EVENT_MEM_MODULE:
+		trace_cxl_memory_module(cxlmd, type, &mem_mod_event_uuid,
+					&event->mem_module);
+		break;
+	}
+}
+EXPORT_SYMBOL_NS_GPL(cxl_event_trace_record, CXL);
+
+static void __cxl_event_trace_record(const struct cxl_memdev *cxlmd,
+				     enum cxl_event_log_type type,
+				     struct cxl_event_record_raw *record)
 {
 	union cxl_event *evt = &record->event;
 	uuid_t *id = &record->id;
@@ -985,8 +1006,8 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
 			break;
 
 		for (i = 0; i < nr_rec; i++)
-			cxl_event_trace_record(cxlmd, type,
-					       &payload->records[i]);
+			__cxl_event_trace_record(cxlmd, type,
+						 &payload->records[i]);
 
 		if (payload->flags & CXL_GET_EVENT_FLAG_OVERFLOW)
 			trace_cxl_overflow(cxlmd, type, payload);
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index d694820ce8f5..b85cbf421cf4 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -478,6 +478,8 @@ struct cxl_memdev_state {
 	struct cxl_security_state security;
 	struct cxl_fw_state fw;
 
+	struct notifier_block cxl_cper_nb;
+
 	struct rcuwait mbox_wait;
 	int (*mbox_send)(struct cxl_memdev_state *mds,
 			 struct cxl_mbox_cmd *cmd);
@@ -775,6 +777,10 @@ void set_exclusive_cxl_commands(struct cxl_memdev_state *mds,
 void clear_exclusive_cxl_commands(struct cxl_memdev_state *mds,
 				  unsigned long *cmds);
 void cxl_mem_get_event_records(struct cxl_memdev_state *mds, u32 status);
+void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
+			    enum cxl_event_log_type type,
+			    enum cxl_event_type event_type,
+			    union cxl_event *event);
 int cxl_set_timestamp(struct cxl_memdev_state *mds);
 int cxl_poison_state_init(struct cxl_memdev_state *mds);
 int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 44a21ab7add5..09d9e060f0f2 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /* Copyright(c) 2020 Intel Corporation. All rights reserved. */
+#include <asm-generic/unaligned.h>
 #include <linux/io-64-nonatomic-lo-hi.h>
 #include <linux/moduleparam.h>
 #include <linux/module.h>
@@ -748,6 +749,59 @@ static bool cxl_event_int_is_fw(u8 setting)
 	return mode == CXL_INT_FW;
 }
 
+#define CXL_EVENT_HDR_FLAGS_REC_SEVERITY GENMASK(1, 0)
+static int cxl_cper_event_call(struct notifier_block *nb, unsigned long action,
+			       void *data)
+{
+	struct cxl_cper_notifier_data *nd = data;
+	struct cper_cxl_event_devid *device_id = &nd->rec->hdr.device_id;
+	enum cxl_event_log_type log_type;
+	struct cxl_memdev_state *mds;
+	struct cxl_dev_state *cxlds;
+	struct pci_dev *pdev;
+	unsigned int devfn;
+	u32 hdr_flags;
+
+	mds = container_of(nb, struct cxl_memdev_state, cxl_cper_nb);
+
+	devfn = PCI_DEVFN(device_id->device_num, device_id->func_num);
+	pdev = pci_get_domain_bus_and_slot(device_id->segment_num,
+					   device_id->bus_num, devfn);
+	cxlds = pci_get_drvdata(pdev);
+	if (cxlds != &mds->cxlds) {
+		pci_dev_put(pdev);
+		return NOTIFY_DONE;
+	}
+
+	/* Fabricate a log type */
+	hdr_flags = get_unaligned_le24(nd->rec->event.generic.hdr.flags);
+	log_type = FIELD_GET(CXL_EVENT_HDR_FLAGS_REC_SEVERITY, hdr_flags);
+
+	cxl_event_trace_record(mds->cxlds.cxlmd, log_type, nd->event_type,
+			       &nd->rec->event);
+	pci_dev_put(pdev);
+	return NOTIFY_OK;
+}
+
+static void cxl_unregister_cper_events(void *_mds)
+{
+	struct cxl_memdev_state *mds = _mds;
+
+	unregister_cxl_cper_notifier(&mds->cxl_cper_nb);
+}
+
+static void register_cper_events(struct cxl_memdev_state *mds)
+{
+	mds->cxl_cper_nb.notifier_call = cxl_cper_event_call;
+
+	if (register_cxl_cper_notifier(&mds->cxl_cper_nb)) {
+		dev_err(mds->cxlds.dev, "CPER registration failed\n");
+		return;
+	}
+
+	devm_add_action_or_reset(mds->cxlds.dev, cxl_unregister_cper_events, mds);
+}
+
 static int cxl_event_config(struct pci_host_bridge *host_bridge,
 			    struct cxl_memdev_state *mds)
 {
@@ -758,8 +812,10 @@ static int cxl_event_config(struct pci_host_bridge *host_bridge,
 	 * When BIOS maintains CXL error reporting control, it will process
 	 * event records.  Only one agent can do so.
 	 */
-	if (!host_bridge->native_cxl_error)
+	if (!host_bridge->native_cxl_error) {
+		register_cper_events(mds);
 		return 0;
+	}
 
 	rc = cxl_mem_alloc_event_buf(mds);
 	if (rc)

-- 
2.41.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC v4 5/6] firmware/efi: Process CXL Component Events
  2023-11-09 22:07 ` [PATCH RFC v4 5/6] firmware/efi: Process CXL Component Events Ira Weiny
@ 2023-11-28 20:32   ` Smita Koralahalli
  2023-11-29 14:28     ` Ira Weiny
  0 siblings, 1 reply; 15+ messages in thread
From: Smita Koralahalli @ 2023-11-28 20:32 UTC (permalink / raw)
  To: Ira Weiny, Dan Williams, Jonathan Cameron, Shiju Jose
  Cc: Yazen Ghannam, Davidlohr Bueso, Dave Jiang, Alison Schofield,
	Vishal Verma, Ard Biesheuvel, linux-efi, linux-kernel, linux-cxl

Hi Ira,

I tested this out. Just one correction below to make it work.

On 11/9/2023 2:07 PM, Ira Weiny wrote:
> BIOS can configure memory devices as firmware first.  This will send CXL
> events to the firmware instead of the OS.  The firmware can then send
> these events to the OS via UEFI.
> 
> UEFI v2.10 section N.2.14 defines a Common Platform Error Record (CPER)
> format for CXL Component Events.  The format is mostly the same as the
> CXL Common Event Record Format.  The difference is a GUID is used in
> the Section Type to identify the event type.
> 
> Add EFI support to detect CXL CPER records and call a notifier chain
> with the record data blobs to be processed by the CXL code.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> ---
> Changes from RFC v3
> [Smita: ensure cper_cxl_event_rec is packed]
> 
> Changes from RFC v2
> [djbw: use common event structures]
> [djbw: remove print in core cper code]
> [djbw: export register call as NS_GPL]
> [iweiny: fix 0day issues]
> 
> Changes from RFC v1
> [iweiny: use an enum for know record types and skip converting GUID to UUID]
> [iweiny: commit to the UUID not being part of the event record data]
> [iweiny: use defines for GUID definitions]
> ---

[snip]

> diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
> index 6b689e1efc78..733ab2ab8639 100644
> --- a/include/linux/cxl-event.h
> +++ b/include/linux/cxl-event.h
> @@ -108,4 +108,53 @@ struct cxl_event_record_raw {
>   	union cxl_event event;
>   } __packed;
>   
> +enum cxl_event_type {
> +	CXL_CPER_EVENT_GEN_MEDIA,
> +	CXL_CPER_EVENT_DRAM,
> +	CXL_CPER_EVENT_MEM_MODULE,
> +};
> +
> +#define CPER_CXL_DEVICE_ID_VALID		BIT(0)
> +#define CPER_CXL_DEVICE_SN_VALID		BIT(1)
> +#define CPER_CXL_COMP_EVENT_LOG_VALID		BIT(2)
> +struct cper_cxl_event_rec {
> +	struct {
> +		u32 length;
> +		u64 validation_bits;
> +		struct cper_cxl_event_devid {
> +			u16 vendor_id;
> +			u16 device_id;
> +			u8 func_num;
> +			u8 device_num;
> +			u8 bus_num;
> +			u16 segment_num;
> +			u16 slot_num; /* bits 2:0 reserved */
> +			u8 reserved;
> +		} device_id;
> +		struct cper_cxl_event_sn {
> +			u32 lower_dw;
> +			u32 upper_dw;
> +		} dev_serial_num;
> +	} hdr;
> +
> +	union cxl_event event;
> +} __packed;

__packed attribute just for cper_cxl_event_rec still fails to properly 
align structure elements. Looks like, __packed attribute is needed for 
all structs (cper_cxl_event_devid and cper_cxl_event_sn) inside 
cper_cxl_event_rec.

Seems easier to use global pragma instead.. I could test and obtain the 
output as expected using pragma..

Thanks,
Smita

> +
> +struct cxl_cper_notifier_data {
> +	enum cxl_event_type event_type;
> +	struct cper_cxl_event_rec *rec;
> +};
> +
> +#ifdef CONFIG_UEFI_CPER
> +int register_cxl_cper_notifier(struct notifier_block *nb);
> +void unregister_cxl_cper_notifier(struct notifier_block *nb);
> +#else
> +static inline int register_cxl_cper_notifier(struct notifier_block *nb)
> +{
> +	return 0;
> +}
> +
> +static inline void unregister_cxl_cper_notifier(struct notifier_block *nb) { }
> +#endif
> +
>   #endif /* _LINUX_CXL_EVENT_H */
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC v4 5/6] firmware/efi: Process CXL Component Events
  2023-11-28 20:32   ` Smita Koralahalli
@ 2023-11-29 14:28     ` Ira Weiny
  2023-12-13 17:13       ` Jonathan Cameron
  0 siblings, 1 reply; 15+ messages in thread
From: Ira Weiny @ 2023-11-29 14:28 UTC (permalink / raw)
  To: Smita Koralahalli, Ira Weiny, Dan Williams, Jonathan Cameron, Shiju Jose
  Cc: Yazen Ghannam, Davidlohr Bueso, Dave Jiang, Alison Schofield,
	Vishal Verma, Ard Biesheuvel, linux-efi, linux-kernel, linux-cxl

Smita Koralahalli wrote:
> Hi Ira,
> 
> I tested this out. Just one correction below to make it work.
> 

[snip]

> > +
> > +#define CPER_CXL_DEVICE_ID_VALID		BIT(0)
> > +#define CPER_CXL_DEVICE_SN_VALID		BIT(1)
> > +#define CPER_CXL_COMP_EVENT_LOG_VALID		BIT(2)
> > +struct cper_cxl_event_rec {
> > +	struct {
> > +		u32 length;
> > +		u64 validation_bits;
> > +		struct cper_cxl_event_devid {
> > +			u16 vendor_id;
> > +			u16 device_id;
> > +			u8 func_num;
> > +			u8 device_num;
> > +			u8 bus_num;
> > +			u16 segment_num;
> > +			u16 slot_num; /* bits 2:0 reserved */
> > +			u8 reserved;
> > +		} device_id;
> > +		struct cper_cxl_event_sn {
> > +			u32 lower_dw;
> > +			u32 upper_dw;
> > +		} dev_serial_num;
> > +	} hdr;
> > +
> > +	union cxl_event event;
> > +} __packed;
> 
> __packed attribute just for cper_cxl_event_rec still fails to properly 
> align structure elements. Looks like, __packed attribute is needed for 
> all structs (cper_cxl_event_devid and cper_cxl_event_sn) inside 
> cper_cxl_event_rec.
> 
> Seems easier to use global pragma instead.. I could test and obtain the 
> output as expected using pragma..

I did not know that was acceptable in the kernel but I see you used it in
cper_cxl.h before...

Ok I'll do that and spin again.

Thanks so much for testing this!  I was out last week and still don't have
a test environment.

Ira

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC v4 5/6] firmware/efi: Process CXL Component Events
  2023-11-29 14:28     ` Ira Weiny
@ 2023-12-13 17:13       ` Jonathan Cameron
  2023-12-13 22:28         ` Ira Weiny
  0 siblings, 1 reply; 15+ messages in thread
From: Jonathan Cameron @ 2023-12-13 17:13 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Smita Koralahalli, Dan Williams, Shiju Jose, Yazen Ghannam,
	Davidlohr Bueso, Dave Jiang, Alison Schofield, Vishal Verma,
	Ard Biesheuvel, linux-efi, linux-kernel, linux-cxl

On Wed, 29 Nov 2023 06:28:01 -0800
Ira Weiny <ira.weiny@intel.com> wrote:

> Smita Koralahalli wrote:
> > Hi Ira,
> > 
> > I tested this out. Just one correction below to make it work.
> >   
> 
> [snip]
> 
> > > +
> > > +#define CPER_CXL_DEVICE_ID_VALID		BIT(0)
> > > +#define CPER_CXL_DEVICE_SN_VALID		BIT(1)
> > > +#define CPER_CXL_COMP_EVENT_LOG_VALID		BIT(2)
> > > +struct cper_cxl_event_rec {
> > > +	struct {
> > > +		u32 length;
> > > +		u64 validation_bits;
> > > +		struct cper_cxl_event_devid {
> > > +			u16 vendor_id;
> > > +			u16 device_id;
> > > +			u8 func_num;
> > > +			u8 device_num;
> > > +			u8 bus_num;
> > > +			u16 segment_num;
> > > +			u16 slot_num; /* bits 2:0 reserved */
> > > +			u8 reserved;
> > > +		} device_id;
> > > +		struct cper_cxl_event_sn {
> > > +			u32 lower_dw;
> > > +			u32 upper_dw;
> > > +		} dev_serial_num;
> > > +	} hdr;
> > > +
> > > +	union cxl_event event;
> > > +} __packed;  
> > 
> > __packed attribute just for cper_cxl_event_rec still fails to properly 
> > align structure elements. Looks like, __packed attribute is needed for 
> > all structs (cper_cxl_event_devid and cper_cxl_event_sn) inside 
> > cper_cxl_event_rec.
> > 
> > Seems easier to use global pragma instead.. I could test and obtain the 
> > output as expected using pragma..  
> 
> I did not know that was acceptable in the kernel but I see you used it in
> cper_cxl.h before...
> 
> Ok I'll do that and spin again.
> 
> Thanks so much for testing this!  I was out last week and still don't have
> a test environment.

Easy to hack into QEMU :)  Hmm. I have a CCIX patch set from years ago
somewhere that does similar. Would be easy to repurposed. Looks like
I never published them (just told people to ask if they wanted them :( ).

Anyhow, if useful I can dig them out.

> 
> Ira


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC v4 5/6] firmware/efi: Process CXL Component Events
  2023-12-13 17:13       ` Jonathan Cameron
@ 2023-12-13 22:28         ` Ira Weiny
  2023-12-19 17:12           ` Jonathan Cameron
  0 siblings, 1 reply; 15+ messages in thread
From: Ira Weiny @ 2023-12-13 22:28 UTC (permalink / raw)
  To: Jonathan Cameron, Ira Weiny
  Cc: Smita Koralahalli, Dan Williams, Shiju Jose, Yazen Ghannam,
	Davidlohr Bueso, Dave Jiang, Alison Schofield, Vishal Verma,
	Ard Biesheuvel, linux-efi, linux-kernel, linux-cxl

Jonathan Cameron wrote:
> On Wed, 29 Nov 2023 06:28:01 -0800
> Ira Weiny <ira.weiny@intel.com> wrote:
> 

[snip]

> > > __packed attribute just for cper_cxl_event_rec still fails to properly 
> > > align structure elements. Looks like, __packed attribute is needed for 
> > > all structs (cper_cxl_event_devid and cper_cxl_event_sn) inside 
> > > cper_cxl_event_rec.
> > > 
> > > Seems easier to use global pragma instead.. I could test and obtain the 
> > > output as expected using pragma..  
> > 
> > I did not know that was acceptable in the kernel but I see you used it in
> > cper_cxl.h before...
> > 
> > Ok I'll do that and spin again.
> > 
> > Thanks so much for testing this!  I was out last week and still don't have
> > a test environment.
> 
> Easy to hack into QEMU :)  Hmm. I have a CCIX patch set from years ago
> somewhere that does similar. Would be easy to repurposed. Looks like
> I never published them (just told people to ask if they wanted them :( ).
> 
> Anyhow, if useful I can dig them out.

If you have a branch with them with a somewhat latest qemu that could work
too.

Ira

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC v4 5/6] firmware/efi: Process CXL Component Events
  2023-12-13 22:28         ` Ira Weiny
@ 2023-12-19 17:12           ` Jonathan Cameron
  2023-12-20 23:48             ` Ira Weiny
  2024-01-03 17:50             ` Jonathan Cameron
  0 siblings, 2 replies; 15+ messages in thread
From: Jonathan Cameron @ 2023-12-19 17:12 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Smita Koralahalli, Dan Williams, Shiju Jose, Yazen Ghannam,
	Davidlohr Bueso, Dave Jiang, Alison Schofield, Vishal Verma,
	Ard Biesheuvel, linux-efi, linux-kernel, linux-cxl

On Wed, 13 Dec 2023 14:28:03 -0800
Ira Weiny <ira.weiny@intel.com> wrote:

> Jonathan Cameron wrote:
> > On Wed, 29 Nov 2023 06:28:01 -0800
> > Ira Weiny <ira.weiny@intel.com> wrote:
> >   
> 
> [snip]
> 
> > > > __packed attribute just for cper_cxl_event_rec still fails to properly 
> > > > align structure elements. Looks like, __packed attribute is needed for 
> > > > all structs (cper_cxl_event_devid and cper_cxl_event_sn) inside 
> > > > cper_cxl_event_rec.
> > > > 
> > > > Seems easier to use global pragma instead.. I could test and obtain the 
> > > > output as expected using pragma..    
> > > 
> > > I did not know that was acceptable in the kernel but I see you used it in
> > > cper_cxl.h before...
> > > 
> > > Ok I'll do that and spin again.
> > > 
> > > Thanks so much for testing this!  I was out last week and still don't have
> > > a test environment.  
> > 
> > Easy to hack into QEMU :)  Hmm. I have a CCIX patch set from years ago
> > somewhere that does similar. Would be easy to repurposed. Looks like
> > I never published them (just told people to ask if they wanted them :( ).
> > 
> > Anyhow, if useful I can dig them out.  
> 
> If you have a branch with them with a somewhat latest qemu that could work
> too.
They are ancient and based on GHES emulation that got reworked before being
merged. I had a quick go at a forwards port but this is a bigger job than
I expected. May be a little while :(

Jonathan

> 
> Ira


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC v4 5/6] firmware/efi: Process CXL Component Events
  2023-12-19 17:12           ` Jonathan Cameron
@ 2023-12-20 23:48             ` Ira Weiny
  2024-01-03 17:50             ` Jonathan Cameron
  1 sibling, 0 replies; 15+ messages in thread
From: Ira Weiny @ 2023-12-20 23:48 UTC (permalink / raw)
  To: Jonathan Cameron, Ira Weiny
  Cc: Smita Koralahalli, Dan Williams, Shiju Jose, Yazen Ghannam,
	Davidlohr Bueso, Dave Jiang, Alison Schofield, Vishal Verma,
	Ard Biesheuvel, linux-efi, linux-kernel, linux-cxl

Jonathan Cameron wrote:
> On Wed, 13 Dec 2023 14:28:03 -0800
> Ira Weiny <ira.weiny@intel.com> wrote:
> 
> > Jonathan Cameron wrote:
> > > On Wed, 29 Nov 2023 06:28:01 -0800
> > > Ira Weiny <ira.weiny@intel.com> wrote:
> > >   
> > 
> > [snip]
> > 
> > > > > __packed attribute just for cper_cxl_event_rec still fails to properly 
> > > > > align structure elements. Looks like, __packed attribute is needed for 
> > > > > all structs (cper_cxl_event_devid and cper_cxl_event_sn) inside 
> > > > > cper_cxl_event_rec.
> > > > > 
> > > > > Seems easier to use global pragma instead.. I could test and obtain the 
> > > > > output as expected using pragma..    
> > > > 
> > > > I did not know that was acceptable in the kernel but I see you used it in
> > > > cper_cxl.h before...
> > > > 
> > > > Ok I'll do that and spin again.
> > > > 
> > > > Thanks so much for testing this!  I was out last week and still don't have
> > > > a test environment.  
> > > 
> > > Easy to hack into QEMU :)  Hmm. I have a CCIX patch set from years ago
> > > somewhere that does similar. Would be easy to repurposed. Looks like
> > > I never published them (just told people to ask if they wanted them :( ).
> > > 
> > > Anyhow, if useful I can dig them out.  
> > 
> > If you have a branch with them with a somewhat latest qemu that could work
> > too.
> They are ancient and based on GHES emulation that got reworked before being
> merged. I had a quick go at a forwards port but this is a bigger job than
> I expected. May be a little while :(
> 

Let's not waste the time on it then.  Dan and I would like to get this
merged in 6.8 if possible.

Ira

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC v4 5/6] firmware/efi: Process CXL Component Events
  2023-12-19 17:12           ` Jonathan Cameron
  2023-12-20 23:48             ` Ira Weiny
@ 2024-01-03 17:50             ` Jonathan Cameron
  2024-01-03 20:40               ` Ira Weiny
  1 sibling, 1 reply; 15+ messages in thread
From: Jonathan Cameron @ 2024-01-03 17:50 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Smita Koralahalli, Dan Williams, Shiju Jose, Yazen Ghannam,
	Davidlohr Bueso, Dave Jiang, Alison Schofield, Vishal Verma,
	Ard Biesheuvel, linux-efi, linux-kernel, linux-cxl

On Tue, 19 Dec 2023 17:12:10 +0000
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:

> On Wed, 13 Dec 2023 14:28:03 -0800
> Ira Weiny <ira.weiny@intel.com> wrote:
> 
> > Jonathan Cameron wrote:  
> > > On Wed, 29 Nov 2023 06:28:01 -0800
> > > Ira Weiny <ira.weiny@intel.com> wrote:
> > >     
> > 
> > [snip]
> >   
> > > > > __packed attribute just for cper_cxl_event_rec still fails to properly 
> > > > > align structure elements. Looks like, __packed attribute is needed for 
> > > > > all structs (cper_cxl_event_devid and cper_cxl_event_sn) inside 
> > > > > cper_cxl_event_rec.
> > > > > 
> > > > > Seems easier to use global pragma instead.. I could test and obtain the 
> > > > > output as expected using pragma..      
> > > > 
> > > > I did not know that was acceptable in the kernel but I see you used it in
> > > > cper_cxl.h before...
> > > > 
> > > > Ok I'll do that and spin again.
> > > > 
> > > > Thanks so much for testing this!  I was out last week and still don't have
> > > > a test environment.    
> > > 
> > > Easy to hack into QEMU :)  Hmm. I have a CCIX patch set from years ago
> > > somewhere that does similar. Would be easy to repurposed. Looks like
> > > I never published them (just told people to ask if they wanted them :( ).
> > > 
> > > Anyhow, if useful I can dig them out.    
> > 
> > If you have a branch with them with a somewhat latest qemu that could work
> > too.  
> They are ancient and based on GHES emulation that got reworked before being
> merged. I had a quick go at a forwards port but this is a bigger job than
> I expected. May be a little while :(

Working again (embarrassingly I had the error source numbers reversed due
to a merge resolution that went wrong which took me a day to find). I'll flesh
out the injection but it will basically look like normal error injection
via qmp (json records) with a bonus parameter to stick them out as via
GHESv2 / CPER rather than AER internal error.  I've not figured out how
to wire HEST up for x86 emulation yet though so it's ARM virt only for now.
(HEST isn't created for x86 qemu machines whereas it is for arm virt with ras=on)
Obviously that emulation is wrong in all sorts of ways as I should be dealing
with firmware/OSPM negotiation and setting the messaging up etc but meh
- it works for exercising the code :)

On the plus side I get nice trace points using your series and Smita's one.
Quite a bit of data is 0s at the moment as I'm lazy and it's the end of the day
here - I'll fix that up later this week as I can see 'everything' in QEMU
and the register values etc are already handled via the native injection paths.

Jonathan

> 
> Jonathan
> 
> > 
> > Ira  
> 
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC v4 5/6] firmware/efi: Process CXL Component Events
  2024-01-03 17:50             ` Jonathan Cameron
@ 2024-01-03 20:40               ` Ira Weiny
  0 siblings, 0 replies; 15+ messages in thread
From: Ira Weiny @ 2024-01-03 20:40 UTC (permalink / raw)
  To: Jonathan Cameron, Ira Weiny
  Cc: Smita Koralahalli, Dan Williams, Shiju Jose, Yazen Ghannam,
	Davidlohr Bueso, Dave Jiang, Alison Schofield, Vishal Verma,
	Ard Biesheuvel, linux-efi, linux-kernel, linux-cxl

Jonathan Cameron wrote:
> On Tue, 19 Dec 2023 17:12:10 +0000
> Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> 
> > On Wed, 13 Dec 2023 14:28:03 -0800
> > Ira Weiny <ira.weiny@intel.com> wrote:
> > 
> > > Jonathan Cameron wrote:  
> > > > On Wed, 29 Nov 2023 06:28:01 -0800
> > > > Ira Weiny <ira.weiny@intel.com> wrote:
> > > >     
> > > 
> > > [snip]
> > >   
> > > > > > __packed attribute just for cper_cxl_event_rec still fails to properly 
> > > > > > align structure elements. Looks like, __packed attribute is needed for 
> > > > > > all structs (cper_cxl_event_devid and cper_cxl_event_sn) inside 
> > > > > > cper_cxl_event_rec.
> > > > > > 
> > > > > > Seems easier to use global pragma instead.. I could test and obtain the 
> > > > > > output as expected using pragma..      
> > > > > 
> > > > > I did not know that was acceptable in the kernel but I see you used it in
> > > > > cper_cxl.h before...
> > > > > 
> > > > > Ok I'll do that and spin again.
> > > > > 
> > > > > Thanks so much for testing this!  I was out last week and still don't have
> > > > > a test environment.    
> > > > 
> > > > Easy to hack into QEMU :)  Hmm. I have a CCIX patch set from years ago
> > > > somewhere that does similar. Would be easy to repurposed. Looks like
> > > > I never published them (just told people to ask if they wanted them :( ).
> > > > 
> > > > Anyhow, if useful I can dig them out.    
> > > 
> > > If you have a branch with them with a somewhat latest qemu that could work
> > > too.  
> > They are ancient and based on GHES emulation that got reworked before being
> > merged. I had a quick go at a forwards port but this is a bigger job than
> > I expected. May be a little while :(
> 
> Working again (embarrassingly I had the error source numbers reversed due
> to a merge resolution that went wrong which took me a day to find). I'll flesh
> out the injection but it will basically look like normal error injection
> via qmp (json records) with a bonus parameter to stick them out as via
> GHESv2 / CPER rather than AER internal error.  I've not figured out how
> to wire HEST up for x86 emulation yet though so it's ARM virt only for now.
> (HEST isn't created for x86 qemu machines whereas it is for arm virt with ras=on)
> Obviously that emulation is wrong in all sorts of ways as I should be dealing
> with firmware/OSPM negotiation and setting the messaging up etc but meh
> - it works for exercising the code :)
> 
> On the plus side I get nice trace points using your series and Smita's one.
> Quite a bit of data is 0s at the moment as I'm lazy and it's the end of the day
> here - I'll fix that up later this week as I can see 'everything' in QEMU
> and the register values etc are already handled via the native injection paths.

Thanks for the testing!
Ira

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2024-01-03 20:40 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-09 22:07 [PATCH RFC v4 0/6] efi/cxl-cper: Report CPER CXL component events through trace events Ira Weiny
2023-11-09 22:07 ` [PATCH RFC v4 1/6] cxl/trace: Pass uuid explicitly to event traces Ira Weiny
2023-11-09 22:07 ` [PATCH RFC v4 2/6] cxl/events: Promote CXL event structures to a core header Ira Weiny
2023-11-09 22:07 ` [PATCH RFC v4 3/6] cxl/events: Separate UUID from event structures Ira Weiny
2023-11-09 22:07 ` [PATCH RFC v4 4/6] cxl/events: Create a CXL event union Ira Weiny
2023-11-09 22:07 ` [PATCH RFC v4 5/6] firmware/efi: Process CXL Component Events Ira Weiny
2023-11-28 20:32   ` Smita Koralahalli
2023-11-29 14:28     ` Ira Weiny
2023-12-13 17:13       ` Jonathan Cameron
2023-12-13 22:28         ` Ira Weiny
2023-12-19 17:12           ` Jonathan Cameron
2023-12-20 23:48             ` Ira Weiny
2024-01-03 17:50             ` Jonathan Cameron
2024-01-03 20:40               ` Ira Weiny
2023-11-09 22:07 ` [PATCH RFC v4 6/6] cxl/memdev: Register for and process CPER events Ira Weiny

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).