* [PATCH 00/11] CXL: Process event logs
@ 2022-11-10 18:57 ira.weiny
  2022-11-10 18:57 ` [PATCH 01/11] cxl/pci: Add generic MSI-X/MSI irq support ira.weiny
                   ` (10 more replies)
  0 siblings, 11 replies; 50+ messages in thread
From: ira.weiny @ 2022-11-10 18:57 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ira Weiny, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Jonathan Cameron, Davidlohr Bueso, linux-kernel,
	linux-cxl

From: Ira Weiny <ira.weiny@intel.com>


This code is well tested with the changes to Qemu I've made (see below).

The series is in 5 parts:

	0) Davidlohr's irq patch modified for 16 vectors
	1) Base functionality
	2) Parsing specific events (Dynamic Capacity Event Record is deferred)
	3) Event interrupt support
	4) cxl-test infrastructure for basic tests

While I believe this entire series is ready to be merged, I realize that the
interrupt support may still have some discussion around it.  Therefore parts 1,
2, and 4 could be merged without irq support, as cxl-test provides testing for
those.  Interrupt testing requires Qemu, but it too is fully tested and ready
to go.


Changes from RFC v2
	Integrated Davidlohr's irq patch, allocate up to 16 vectors, and base
		my irq support on modifications to that patch.
	Smita
		Check event status before reading each log.
	Jonathan
		Process more than 1 record at a time
		Remove reserved fields
	Steven
		Prefix trace points with 'cxl_'
	Davidlohr
		Pull in his patch

Changes from RFC v1
	Add event irqs
	General simplification of the code.
	Resolve field alignment questions
	Update to rev 3.0 for comments and structures
	Add reserved fields and output them

Event records inform the OS of various device events.  Events are not needed
for any kernel operation, but various user-level software will want to track
them.

Add event reporting through the trace event mechanism.  On driver load, read
and clear all device events.

Enable all event logs for interrupts and process each log on interrupt.


TESTING:

Testing of this was performed with additions to QEMU in the following repo:

	https://github.com/weiny2/qemu/tree/ira-cxl-events-latest

Changes to this repo are not finalized yet, so I'm not posting those patches
right away.  But there is enough functionality added to further test this
series.

	1) event status register
	2) additional event injection capabilities
	3) Process more than 1 record at a time in Get/Clear mailbox commands

Davidlohr Bueso (1):
  cxl/pci: Add generic MSI-X/MSI irq support

Ira Weiny (10):
  cxl/mem: Implement Get Event Records command
  cxl/mem: Implement Clear Event Records command
  cxl/mem: Clear events on driver load
  cxl/mem: Trace General Media Event Record
  cxl/mem: Trace DRAM Event Record
  cxl/mem: Trace Memory Module Event Record
  cxl/mem: Wire up event interrupts
  cxl/test: Add generic mock events
  cxl/test: Add specific events
  cxl/test: Simulate event log overflow

 MAINTAINERS                     |   1 +
 drivers/cxl/core/mbox.c         | 219 ++++++++++++++
 drivers/cxl/cxl.h               |   8 +
 drivers/cxl/cxlmem.h            | 191 +++++++++++++
 drivers/cxl/cxlpci.h            |   6 +
 drivers/cxl/pci.c               | 167 +++++++++++
 include/trace/events/cxl.h      | 487 ++++++++++++++++++++++++++++++++
 include/uapi/linux/cxl_mem.h    |   4 +
 tools/testing/cxl/test/Kbuild   |   2 +-
 tools/testing/cxl/test/events.c | 329 +++++++++++++++++++++
 tools/testing/cxl/test/events.h |   9 +
 tools/testing/cxl/test/mem.c    |  35 +++
 12 files changed, 1457 insertions(+), 1 deletion(-)
 create mode 100644 include/trace/events/cxl.h
 create mode 100644 tools/testing/cxl/test/events.c
 create mode 100644 tools/testing/cxl/test/events.h


base-commit: aae703b02f92bde9264366c545e87cec451de471
-- 
2.37.2


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 01/11] cxl/pci: Add generic MSI-X/MSI irq support
  2022-11-10 18:57 [PATCH 00/11] CXL: Process event logs ira.weiny
@ 2022-11-10 18:57 ` ira.weiny
  2022-11-15 21:41   ` Dave Jiang
  2022-11-16 14:53   ` Jonathan Cameron
  2022-11-10 18:57 ` [PATCH 02/11] cxl/mem: Implement Get Event Records command ira.weiny
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 50+ messages in thread
From: ira.weiny @ 2022-11-10 18:57 UTC (permalink / raw)
  To: Dan Williams
  Cc: Davidlohr Bueso, Bjorn Helgaas, Jonathan Cameron, Ira Weiny,
	Alison Schofield, Vishal Verma, Ben Widawsky, Steven Rostedt,
	linux-kernel, linux-cxl

From: Davidlohr Bueso <dave@stgolabs.net>

Currently the only CXL features targeted for irq support require their
message numbers to be within the first 16 entries.  The device may,
however, support fewer than 16 entries depending on the support it
provides.

Attempt to allocate these 16 irq vectors.  If the device supports fewer,
the PCI infrastructure will allocate that number.  Store the number
of vectors actually allocated in the device state for later use
by individual functions.

Upon successful allocation, users can plug in their respective ISRs at
any point thereafter; for example, the irq setup need not be done in the
PCI driver, as is the case for the CXL PMU.

Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Co-developed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>

---
Changes from Ira
	Remove reviews
	Allocate up to a static 16 vectors.
	Change cover letter
---
 drivers/cxl/cxlmem.h |  3 +++
 drivers/cxl/cxlpci.h |  6 ++++++
 drivers/cxl/pci.c    | 32 ++++++++++++++++++++++++++++++++
 3 files changed, 41 insertions(+)

diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 88e3a8e54b6a..b7b955ded3ac 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -211,6 +211,7 @@ struct cxl_endpoint_dvsec_info {
  * @info: Cached DVSEC information about the device.
  * @serial: PCIe Device Serial Number
  * @doe_mbs: PCI DOE mailbox array
+ * @nr_irq_vecs: Number of MSI-X/MSI vectors available
  * @mbox_send: @dev specific transport for transmitting mailbox commands
  *
  * See section 8.2.9.5.2 Capacity Configuration and Label Storage for
@@ -247,6 +248,8 @@ struct cxl_dev_state {
 
 	struct xarray doe_mbs;
 
+	int nr_irq_vecs;
+
 	int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
 };
 
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index eec597dbe763..b7f4e2f417d3 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -53,6 +53,12 @@
 #define	    CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK			GENMASK(15, 8)
 #define     CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK		GENMASK(31, 16)
 
+/*
+ * NOTE: Currently all the functions which are enabled for CXL require their
+ * vectors to be in the first 16.  Use this as the max.
+ */
+#define CXL_PCI_REQUIRED_VECTORS 16
+
 /* Register Block Identifier (RBI) */
 enum cxl_regloc_type {
 	CXL_REGLOC_RBI_EMPTY = 0,
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index faeb5d9d7a7a..62e560063e50 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -428,6 +428,36 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
 	}
 }
 
+static void cxl_pci_free_irq_vectors(void *data)
+{
+	pci_free_irq_vectors(data);
+}
+
+static void cxl_pci_alloc_irq_vectors(struct cxl_dev_state *cxlds)
+{
+	struct device *dev = cxlds->dev;
+	struct pci_dev *pdev = to_pci_dev(dev);
+	int nvecs;
+	int rc;
+
+	nvecs = pci_alloc_irq_vectors(pdev, 1, CXL_PCI_REQUIRED_VECTORS,
+				   PCI_IRQ_MSIX | PCI_IRQ_MSI);
+	if (nvecs < 0) {
+		dev_dbg(dev, "Not enough interrupts; use polling instead.\n");
+		return;
+	}
+
+	rc = devm_add_action_or_reset(dev, cxl_pci_free_irq_vectors, pdev);
+	if (rc) {
+		dev_dbg(dev, "Device managed call failed; interrupts disabled.\n");
+		/* some got allocated, clean them up */
+		cxl_pci_free_irq_vectors(pdev);
+		return;
+	}
+
+	cxlds->nr_irq_vecs = nvecs;
+}
+
 static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	struct cxl_register_map map;
@@ -494,6 +524,8 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (rc)
 		return rc;
 
+	cxl_pci_alloc_irq_vectors(cxlds);
+
 	cxlmd = devm_cxl_add_memdev(cxlds);
 	if (IS_ERR(cxlmd))
 		return PTR_ERR(cxlmd);
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 02/11] cxl/mem: Implement Get Event Records command
  2022-11-10 18:57 [PATCH 00/11] CXL: Process event logs ira.weiny
  2022-11-10 18:57 ` [PATCH 01/11] cxl/pci: Add generic MSI-X/MSI irq support ira.weiny
@ 2022-11-10 18:57 ` ira.weiny
  2022-11-15 21:54   ` Dave Jiang
  2022-11-16 15:19   ` Jonathan Cameron
  2022-11-10 18:57 ` [PATCH 03/11] cxl/mem: Implement Clear " ira.weiny
                   ` (8 subsequent siblings)
  10 siblings, 2 replies; 50+ messages in thread
From: ira.weiny @ 2022-11-10 18:57 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ira Weiny, Steven Rostedt, Alison Schofield, Vishal Verma,
	Ben Widawsky, Jonathan Cameron, Davidlohr Bueso, linux-kernel,
	linux-cxl

From: Ira Weiny <ira.weiny@intel.com>

CXL devices have multiple event logs which can be queried for CXL event
records.  Devices are required to support the storage of at least one
event record in each event log type.

Devices track event log overflow by incrementing a counter and tracking
the time of the first and last overflow event seen.

Software queries events via the Get Event Record mailbox command; CXL
rev 3.0 section 8.2.9.2.2.

Issue the Get Event Record mailbox command on driver load.  Trace each
record found with a generic record trace.  Trace any overflow
conditions.

The device can return up to 1MB worth of event records per query.  This
presents complications with allocating a huge buffer to potentially
capture all the records.  It is not anticipated that these event logs
will be very deep, and reading them does not need to be performant.
Process only 3 records at a time.  3 records were chosen because they fit
comfortably on the stack, preventing dynamic allocation while still
cutting down on extra mailbox messages.

This patch traces a raw event record only and leaves the specific event
record types to subsequent patches.

Macros are created for tracing the common CXL Event header fields.

Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Change from RFC v2:
	Support reading 3 events at once.
	Reverse Jonathan's suggestion and check for positive number of
		records.  Because the record count may have been
		returned as something > 3 based on what the device
		thinks it can send back even though the core Linux mbox
		processing truncates the data.
	Alison and Dave Jiang
		Change header uuid type to uuid_t for better user space
		processing
	Smita
		Check status reg before reading log.
	Steven
		Prefix all trace points with 'cxl_'
		Use static branch <trace>_enabled() calls
	Jonathan
		s/CXL_EVENT_TYPE_INFO/0
		s/{first,last}/{first,last}_ts
		Remove Reserved field from header
		Fix header issue for cxl_event_log_type_str()

Change from RFC:
	Remove redundant error message in get event records loop
	s/EVENT_RECORD_DATA_LENGTH/CXL_EVENT_RECORD_DATA_LENGTH
	Use hdr_uuid for the header UUID field
	Use cxl_event_log_type_str() for the trace events
	Create macros for the header fields and common entries of each event
	Add reserved buffer output dump
	Report error if event query fails
	Remove unused record_cnt variable
	Steven - reorder overflow record
		Remove NOTE about checkpatch
	Jonathan
		check for exactly 1 record
		s/v3.0/rev 3.0
		Use 3 byte fields for 24bit fields
		Add 3.0 Maintenance Operation Class
		Add Dynamic Capacity log type
		Fix spelling
	Dave Jiang/Dan/Alison
		s/cxl-event/cxl
		trace/events/cxl-events => trace/events/cxl.h
		s/cxl_event_overflow/overflow
		s/cxl_event/generic_event
---
 MAINTAINERS                  |   1 +
 drivers/cxl/core/mbox.c      |  70 +++++++++++++++++++
 drivers/cxl/cxl.h            |   8 +++
 drivers/cxl/cxlmem.h         |  73 ++++++++++++++++++++
 include/trace/events/cxl.h   | 127 +++++++++++++++++++++++++++++++++++
 include/uapi/linux/cxl_mem.h |   1 +
 6 files changed, 280 insertions(+)
 create mode 100644 include/trace/events/cxl.h

diff --git a/MAINTAINERS b/MAINTAINERS
index ca063a504026..4b7c6e3055c6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5223,6 +5223,7 @@ M:	Dan Williams <dan.j.williams@intel.com>
 L:	linux-cxl@vger.kernel.org
 S:	Maintained
 F:	drivers/cxl/
+F:	include/trace/events/cxl.h
 F:	include/uapi/linux/cxl_mem.h
 
 CONEXANT ACCESSRUNNER USB DRIVER
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 16176b9278b4..a908b95a7de4 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -7,6 +7,9 @@
 #include <cxlmem.h>
 #include <cxl.h>
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/cxl.h>
+
 #include "core.h"
 
 static bool cxl_raw_allow_all;
@@ -48,6 +51,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
 	CXL_CMD(RAW, CXL_VARIABLE_PAYLOAD, CXL_VARIABLE_PAYLOAD, 0),
 #endif
 	CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
+	CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
 	CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
 	CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
 	CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
@@ -704,6 +708,72 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
 
+static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
+				    enum cxl_event_log_type type)
+{
+	struct cxl_get_event_payload payload;
+	u16 pl_nr;
+
+	do {
+		u8 log_type = type;
+		int rc;
+
+		rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVENT_RECORD,
+				       &log_type, sizeof(log_type),
+				       &payload, sizeof(payload));
+		if (rc) {
+			dev_err(cxlds->dev, "Event log '%s': Failed to query event records : %d",
+				cxl_event_log_type_str(type), rc);
+			return;
+		}
+
+		pl_nr = le16_to_cpu(payload.record_count);
+		if (trace_cxl_generic_event_enabled()) {
+			u16 nr_rec = min_t(u16, pl_nr, CXL_GET_EVENT_NR_RECORDS);
+			int i;
+
+			for (i = 0; i < nr_rec; i++)
+				trace_cxl_generic_event(dev_name(cxlds->dev),
+							type,
+							&payload.record[i]);
+		}
+
+		if (trace_cxl_overflow_enabled() &&
+		    (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW))
+			trace_cxl_overflow(dev_name(cxlds->dev), type, &payload);
+
+	} while (pl_nr > CXL_GET_EVENT_NR_RECORDS ||
+		 payload.flags & CXL_GET_EVENT_FLAG_MORE_RECORDS);
+}
+
+/**
+ * cxl_mem_get_event_records - Get Event Records from the device
+ * @cxlds: The device data for the operation
+ *
+ * Retrieve all event records available on the device and report them as trace
+ * events.
+ *
+ * See CXL rev 3.0 @8.2.9.2.2 Get Event Records
+ */
+void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
+{
+	u32 status = readl(cxlds->regs.status + CXLDEV_DEV_EVENT_STATUS_OFFSET);
+
+	dev_dbg(cxlds->dev, "Reading event logs: %x\n", status);
+
+	if (status & CXLDEV_EVENT_STATUS_INFO)
+		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_INFO);
+	if (status & CXLDEV_EVENT_STATUS_WARN)
+		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_WARN);
+	if (status & CXLDEV_EVENT_STATUS_FAIL)
+		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FAIL);
+	if (status & CXLDEV_EVENT_STATUS_FATAL)
+		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FATAL);
+	if (status & CXLDEV_EVENT_STATUS_DYNAMIC_CAP)
+		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_DYNAMIC_CAP);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, CXL);
+
 /**
  * cxl_mem_get_partition_info - Get partition info
  * @cxlds: The device data for the operation
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index f680450f0b16..492cff1bea6d 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -132,6 +132,14 @@ static inline int ways_to_cxl(unsigned int ways, u8 *iw)
 #define CXLDEV_CAP_CAP_ID_SECONDARY_MAILBOX 0x3
 #define CXLDEV_CAP_CAP_ID_MEMDEV 0x4000
 
+/* CXL 3.0 8.2.8.3.1 Event Status Register */
+#define CXLDEV_DEV_EVENT_STATUS_OFFSET		0x00
+#define CXLDEV_EVENT_STATUS_INFO		BIT(0)
+#define CXLDEV_EVENT_STATUS_WARN		BIT(1)
+#define CXLDEV_EVENT_STATUS_FAIL		BIT(2)
+#define CXLDEV_EVENT_STATUS_FATAL		BIT(3)
+#define CXLDEV_EVENT_STATUS_DYNAMIC_CAP		BIT(4)
+
 /* CXL 2.0 8.2.8.4 Mailbox Registers */
 #define CXLDEV_MBOX_CAPS_OFFSET 0x00
 #define   CXLDEV_MBOX_CAP_PAYLOAD_SIZE_MASK GENMASK(4, 0)
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index b7b955ded3ac..da64ba0f156b 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -4,6 +4,7 @@
 #define __CXL_MEM_H__
 #include <uapi/linux/cxl_mem.h>
 #include <linux/cdev.h>
+#include <linux/uuid.h>
 #include "cxl.h"
 
 /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
@@ -256,6 +257,7 @@ struct cxl_dev_state {
 enum cxl_opcode {
 	CXL_MBOX_OP_INVALID		= 0x0000,
 	CXL_MBOX_OP_RAW			= CXL_MBOX_OP_INVALID,
+	CXL_MBOX_OP_GET_EVENT_RECORD	= 0x0100,
 	CXL_MBOX_OP_GET_FW_INFO		= 0x0200,
 	CXL_MBOX_OP_ACTIVATE_FW		= 0x0202,
 	CXL_MBOX_OP_GET_SUPPORTED_LOGS	= 0x0400,
@@ -325,6 +327,76 @@ struct cxl_mbox_identify {
 	u8 qos_telemetry_caps;
 } __packed;
 
+/*
+ * Common Event Record Format
+ * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
+ */
+struct cxl_event_record_hdr {
+	uuid_t id;
+	u8 length;
+	u8 flags[3];
+	__le16 handle;
+	__le16 related_handle;
+	__le64 timestamp;
+	u8 maint_op_class;
+	u8 reserved[0xf];
+} __packed;
+
+#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
+struct cxl_event_record_raw {
+	struct cxl_event_record_hdr hdr;
+	u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
+} __packed;
+
+/*
+ * Get Event Records output payload
+ * CXL rev 3.0 section 8.2.9.2.2; Table 8-50
+ */
+#define CXL_GET_EVENT_FLAG_OVERFLOW		BIT(0)
+#define CXL_GET_EVENT_FLAG_MORE_RECORDS		BIT(1)
+#define CXL_GET_EVENT_NR_RECORDS		3
+struct cxl_get_event_payload {
+	u8 flags;
+	u8 reserved1;
+	__le16 overflow_err_count;
+	__le64 first_overflow_timestamp;
+	__le64 last_overflow_timestamp;
+	__le16 record_count;
+	u8 reserved2[0xa];
+	struct cxl_event_record_raw record[CXL_GET_EVENT_NR_RECORDS];
+} __packed;
+
+/*
+ * CXL rev 3.0 section 8.2.9.2.2; Table 8-49
+ */
+enum cxl_event_log_type {
+	CXL_EVENT_TYPE_INFO = 0x00,
+	CXL_EVENT_TYPE_WARN,
+	CXL_EVENT_TYPE_FAIL,
+	CXL_EVENT_TYPE_FATAL,
+	CXL_EVENT_TYPE_DYNAMIC_CAP,
+	CXL_EVENT_TYPE_MAX
+};
+
+static inline const char *cxl_event_log_type_str(enum cxl_event_log_type type)
+{
+	switch (type) {
+	case CXL_EVENT_TYPE_INFO:
+		return "Informational";
+	case CXL_EVENT_TYPE_WARN:
+		return "Warning";
+	case CXL_EVENT_TYPE_FAIL:
+		return "Failure";
+	case CXL_EVENT_TYPE_FATAL:
+		return "Fatal";
+	case CXL_EVENT_TYPE_DYNAMIC_CAP:
+		return "Dynamic Capacity";
+	default:
+		break;
+	}
+	return "<unknown>";
+}
+
 struct cxl_mbox_get_partition_info {
 	__le64 active_volatile_cap;
 	__le64 active_persistent_cap;
@@ -384,6 +456,7 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds);
 struct cxl_dev_state *cxl_dev_state_create(struct device *dev);
 void set_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
 void clear_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
+void cxl_mem_get_event_records(struct cxl_dev_state *cxlds);
 #ifdef CONFIG_CXL_SUSPEND
 void cxl_mem_active_inc(void);
 void cxl_mem_active_dec(void);
diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
new file mode 100644
index 000000000000..60dec9a84918
--- /dev/null
+++ b/include/trace/events/cxl.h
@@ -0,0 +1,127 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM cxl
+
+#if !defined(_CXL_TRACE_EVENTS_H) ||  defined(TRACE_HEADER_MULTI_READ)
+#define _CXL_TRACE_EVENTS_H
+
+#include <asm-generic/unaligned.h>
+#include <linux/tracepoint.h>
+#include <cxlmem.h>
+
+TRACE_EVENT(cxl_overflow,
+
+	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
+		 struct cxl_get_event_payload *payload),
+
+	TP_ARGS(dev_name, log, payload),
+
+	TP_STRUCT__entry(
+		__string(dev_name, dev_name)
+		__field(int, log)
+		__field(u64, first_ts)
+		__field(u64, last_ts)
+		__field(u16, count)
+	),
+
+	TP_fast_assign(
+		__assign_str(dev_name, dev_name);
+		__entry->log = log;
+		__entry->count = le16_to_cpu(payload->overflow_err_count);
+		__entry->first_ts = le64_to_cpu(payload->first_overflow_timestamp);
+		__entry->last_ts = le64_to_cpu(payload->last_overflow_timestamp);
+	),
+
+	TP_printk("%s: EVENT LOG OVERFLOW log=%s : %u records from %llu to %llu",
+		__get_str(dev_name), cxl_event_log_type_str(__entry->log),
+		__entry->count, __entry->first_ts, __entry->last_ts)
+
+);
+
+/*
+ * Common Event Record Format
+ * CXL 3.0 section 8.2.9.2.1; Table 8-42
+ */
+#define CXL_EVENT_RECORD_FLAG_PERMANENT		BIT(2)
+#define CXL_EVENT_RECORD_FLAG_MAINT_NEEDED	BIT(3)
+#define CXL_EVENT_RECORD_FLAG_PERF_DEGRADED	BIT(4)
+#define CXL_EVENT_RECORD_FLAG_HW_REPLACE	BIT(5)
+#define show_hdr_flags(flags)	__print_flags(flags, " | ",			   \
+	{ CXL_EVENT_RECORD_FLAG_PERMANENT,	"Permanent Condition"		}, \
+	{ CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,	"Maintenance Needed"		}, \
+	{ CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,	"Performance Degraded"		}, \
+	{ CXL_EVENT_RECORD_FLAG_HW_REPLACE,	"Hardware Replacement Needed"	}  \
+)
+
+/*
+ * Define macros for the common header of each CXL event.
+ *
+ * Tracepoints using these macros must do 3 things:
+ *
+ *	1) Add CXL_EVT_TP_entry to TP_STRUCT__entry
+ *	2) Use CXL_EVT_TP_fast_assign within TP_fast_assign;
+ *	   pass the dev_name, log, and CXL event header
+ *	3) Use CXL_EVT_TP_printk() instead of TP_printk()
+ *
+ * See the generic_event tracepoint as an example.
+ */
+#define CXL_EVT_TP_entry					\
+	__string(dev_name, dev_name)				\
+	__field(int, log)					\
+	__field_struct(uuid_t, hdr_uuid)			\
+	__field(u32, hdr_flags)					\
+	__field(u16, hdr_handle)				\
+	__field(u16, hdr_related_handle)			\
+	__field(u64, hdr_timestamp)				\
+	__field(u8, hdr_length)					\
+	__field(u8, hdr_maint_op_class)
+
+#define CXL_EVT_TP_fast_assign(dname, l, hdr)					\
+	__assign_str(dev_name, (dname));					\
+	__entry->log = (l);							\
+	memcpy(&__entry->hdr_uuid, &(hdr).id, sizeof(uuid_t));			\
+	__entry->hdr_length = (hdr).length;					\
+	__entry->hdr_flags = get_unaligned_le24((hdr).flags);			\
+	__entry->hdr_handle = le16_to_cpu((hdr).handle);			\
+	__entry->hdr_related_handle = le16_to_cpu((hdr).related_handle);	\
+	__entry->hdr_timestamp = le64_to_cpu((hdr).timestamp);			\
+	__entry->hdr_maint_op_class = (hdr).maint_op_class
+
+
+#define CXL_EVT_TP_printk(fmt, ...) \
+	TP_printk("%s log=%s : time=%llu uuid=%pUb len=%d flags='%s' "		\
+		"handle=%x related_handle=%x maint_op_class=%u"			\
+		" : " fmt,							\
+		__get_str(dev_name), cxl_event_log_type_str(__entry->log),	\
+		__entry->hdr_timestamp, &__entry->hdr_uuid, __entry->hdr_length,\
+		show_hdr_flags(__entry->hdr_flags), __entry->hdr_handle,	\
+		__entry->hdr_related_handle, __entry->hdr_maint_op_class,	\
+		##__VA_ARGS__)
+
+TRACE_EVENT(cxl_generic_event,
+
+	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
+		 struct cxl_event_record_raw *rec),
+
+	TP_ARGS(dev_name, log, rec),
+
+	TP_STRUCT__entry(
+		CXL_EVT_TP_entry
+		__array(u8, data, CXL_EVENT_RECORD_DATA_LENGTH)
+	),
+
+	TP_fast_assign(
+		CXL_EVT_TP_fast_assign(dev_name, log, rec->hdr);
+		memcpy(__entry->data, &rec->data, CXL_EVENT_RECORD_DATA_LENGTH);
+	),
+
+	CXL_EVT_TP_printk("%s",
+		__print_hex(__entry->data, CXL_EVENT_RECORD_DATA_LENGTH))
+);
+
+#endif /* _CXL_TRACE_EVENTS_H */
+
+/* This part must be outside protection */
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE cxl
+#include <trace/define_trace.h>
diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
index c71021a2a9ed..70459be5bdd4 100644
--- a/include/uapi/linux/cxl_mem.h
+++ b/include/uapi/linux/cxl_mem.h
@@ -24,6 +24,7 @@
 	___C(IDENTIFY, "Identify Command"),                               \
 	___C(RAW, "Raw device command"),                                  \
 	___C(GET_SUPPORTED_LOGS, "Get Supported Logs"),                   \
+	___C(GET_EVENT_RECORD, "Get Event Record"),                       \
 	___C(GET_FW_INFO, "Get FW Info"),                                 \
 	___C(GET_PARTITION_INFO, "Get Partition Information"),            \
 	___C(GET_LSA, "Get Label Storage Area"),                          \
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 03/11] cxl/mem: Implement Clear Event Records command
  2022-11-10 18:57 [PATCH 00/11] CXL: Process event logs ira.weiny
  2022-11-10 18:57 ` [PATCH 01/11] cxl/pci: Add generic MSI-X/MSI irq support ira.weiny
  2022-11-10 18:57 ` [PATCH 02/11] cxl/mem: Implement Get Event Records command ira.weiny
@ 2022-11-10 18:57 ` ira.weiny
  2022-11-15 22:09   ` Dave Jiang
  2022-11-16 15:24   ` Jonathan Cameron
  2022-11-10 18:57 ` [PATCH 04/11] cxl/mem: Clear events on driver load ira.weiny
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 50+ messages in thread
From: ira.weiny @ 2022-11-10 18:57 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ira Weiny, Jonathan Cameron, Alison Schofield, Vishal Verma,
	Ben Widawsky, Steven Rostedt, Davidlohr Bueso, linux-kernel,
	linux-cxl

From: Ira Weiny <ira.weiny@intel.com>

CXL rev 3.0 section 8.2.9.2.3 defines the Clear Event Records mailbox
command.  After an event record is read it needs to be cleared from the
event log.

Implement cxl_clear_event_record() and call it for each record retrieved
from the device.

Each record is cleared individually.  A 'clear all' bit is specified, but
events could arrive between a Get and the final clear-all operation and
would then be lost unread.  Therefore each event is cleared by its
specific handle.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from RFC:
	Jonathan
		Clean up init of payload and use return code.
		Also report any error to clear the event.
		s/v3.0/rev 3.0
---
 drivers/cxl/core/mbox.c      | 46 ++++++++++++++++++++++++++++++------
 drivers/cxl/cxlmem.h         | 15 ++++++++++++
 include/uapi/linux/cxl_mem.h |  1 +
 3 files changed, 55 insertions(+), 7 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index a908b95a7de4..f46558e09f08 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -52,6 +52,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
 #endif
 	CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
 	CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
+	CXL_CMD(CLEAR_EVENT_RECORD, CXL_VARIABLE_PAYLOAD, 0, 0),
 	CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
 	CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
 	CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
@@ -708,6 +709,27 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
 
+static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
+				  enum cxl_event_log_type log,
+				  struct cxl_get_event_payload *get_pl, u16 nr)
+{
+	struct cxl_mbox_clear_event_payload payload = {
+		.event_log = log,
+		.nr_recs = nr,
+	};
+	int i;
+
+	for (i = 0; i < nr; i++) {
+		payload.handle[i] = get_pl->record[i].hdr.handle;
+		dev_dbg(cxlds->dev, "Event log '%s': Clearing %u\n",
+			cxl_event_log_type_str(log),
+			le16_to_cpu(payload.handle[i]));
+	}
+
+	return cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_CLEAR_EVENT_RECORD,
+				 &payload, sizeof(payload), NULL, 0);
+}
+
 static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
 				    enum cxl_event_log_type type)
 {
@@ -728,14 +750,23 @@ static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
 		}
 
 		pl_nr = le16_to_cpu(payload.record_count);
-		if (trace_cxl_generic_event_enabled()) {
+		if (pl_nr > 0) {
 			u16 nr_rec = min_t(u16, pl_nr, CXL_GET_EVENT_NR_RECORDS);
 			int i;
 
-			for (i = 0; i < nr_rec; i++)
-				trace_cxl_generic_event(dev_name(cxlds->dev),
-							type,
-							&payload.record[i]);
+			if (trace_cxl_generic_event_enabled()) {
+				for (i = 0; i < nr_rec; i++)
+					trace_cxl_generic_event(dev_name(cxlds->dev),
+								type,
+								&payload.record[i]);
+			}
+
+			rc = cxl_clear_event_record(cxlds, type, &payload, nr_rec);
+			if (rc) {
+				dev_err(cxlds->dev, "Event log '%s': Failed to clear events : %d",
+					cxl_event_log_type_str(type), rc);
+				return;
+			}
 		}
 
 		if (trace_cxl_overflow_enabled() &&
@@ -750,10 +781,11 @@ static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
  * cxl_mem_get_event_records - Get Event Records from the device
  * @cxlds: The device data for the operation
  *
- * Retrieve all event records available on the device and report them as trace
- * events.
+ * Retrieve all event records available on the device, report them as trace
+ * events, and clear them.
  *
  * See CXL rev 3.0 @8.2.9.2.2 Get Event Records
+ * See CXL rev 3.0 @8.2.9.2.3 Clear Event Records
  */
 void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
 {
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index da64ba0f156b..28a114c7cf69 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -258,6 +258,7 @@ enum cxl_opcode {
 	CXL_MBOX_OP_INVALID		= 0x0000,
 	CXL_MBOX_OP_RAW			= CXL_MBOX_OP_INVALID,
 	CXL_MBOX_OP_GET_EVENT_RECORD	= 0x0100,
+	CXL_MBOX_OP_CLEAR_EVENT_RECORD	= 0x0101,
 	CXL_MBOX_OP_GET_FW_INFO		= 0x0200,
 	CXL_MBOX_OP_ACTIVATE_FW		= 0x0202,
 	CXL_MBOX_OP_GET_SUPPORTED_LOGS	= 0x0400,
@@ -397,6 +398,20 @@ static inline const char *cxl_event_log_type_str(enum cxl_event_log_type type)
 	return "<unknown>";
 }
 
+/*
+ * Clear Event Records input payload
+ * CXL rev 3.0 section 8.2.9.2.3; Table 8-51
+ *
+ * Space given for 1 record
+ */
+struct cxl_mbox_clear_event_payload {
+	u8 event_log;		/* enum cxl_event_log_type */
+	u8 clear_flags;
+	u8 nr_recs;		/* 1 for this struct */
+	u8 reserved[3];
+	__le16 handle[CXL_GET_EVENT_NR_RECORDS];
+};
+
 struct cxl_mbox_get_partition_info {
 	__le64 active_volatile_cap;
 	__le64 active_persistent_cap;
diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
index 70459be5bdd4..7c1ad8062792 100644
--- a/include/uapi/linux/cxl_mem.h
+++ b/include/uapi/linux/cxl_mem.h
@@ -25,6 +25,7 @@
 	___C(RAW, "Raw device command"),                                  \
 	___C(GET_SUPPORTED_LOGS, "Get Supported Logs"),                   \
 	___C(GET_EVENT_RECORD, "Get Event Record"),                       \
+	___C(CLEAR_EVENT_RECORD, "Clear Event Record"),                   \
 	___C(GET_FW_INFO, "Get FW Info"),                                 \
 	___C(GET_PARTITION_INFO, "Get Partition Information"),            \
 	___C(GET_LSA, "Get Label Storage Area"),                          \
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 04/11] cxl/mem: Clear events on driver load
  2022-11-10 18:57 [PATCH 00/11] CXL: Process event logs ira.weiny
                   ` (2 preceding siblings ...)
  2022-11-10 18:57 ` [PATCH 03/11] cxl/mem: Implement Clear " ira.weiny
@ 2022-11-10 18:57 ` ira.weiny
  2022-11-15 22:10   ` Dave Jiang
  2022-11-10 18:57 ` [PATCH 05/11] cxl/mem: Trace General Media Event Record ira.weiny
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 50+ messages in thread
From: ira.weiny @ 2022-11-10 18:57 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ira Weiny, Jonathan Cameron, Alison Schofield, Vishal Verma,
	Ben Widawsky, Steven Rostedt, Davidlohr Bueso, linux-kernel,
	linux-cxl

From: Ira Weiny <ira.weiny@intel.com>

The information contained in the events prior to the driver loading can
be queried at any time through other mailbox commands.

Ensure a clean slate of events by reading and clearing them.  The
events are sent to the trace buffer, but no one is anticipated to be
listening to it at driver load time.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 drivers/cxl/pci.c            | 2 ++
 tools/testing/cxl/test/mem.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 62e560063e50..e0d511575b45 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -530,6 +530,8 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (IS_ERR(cxlmd))
 		return PTR_ERR(cxlmd);
 
+	cxl_mem_get_event_records(cxlds);
+
 	if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
 		rc = devm_cxl_add_nvdimm(&pdev->dev, cxlmd);
 
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index aa2df3a15051..e2f5445d24ff 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -285,6 +285,8 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
 	if (IS_ERR(cxlmd))
 		return PTR_ERR(cxlmd);
 
+	cxl_mem_get_event_records(cxlds);
+
 	if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
 		rc = devm_cxl_add_nvdimm(dev, cxlmd);
 
-- 
2.37.2



* [PATCH 05/11] cxl/mem: Trace General Media Event Record
  2022-11-10 18:57 [PATCH 00/11] CXL: Process event logs ira.weiny
                   ` (3 preceding siblings ...)
  2022-11-10 18:57 ` [PATCH 04/11] cxl/mem: Clear events on driver load ira.weiny
@ 2022-11-10 18:57 ` ira.weiny
  2022-11-15 22:25   ` Dave Jiang
  2022-11-16 15:31   ` Jonathan Cameron
  2022-11-10 18:57 ` [PATCH 06/11] cxl/mem: Trace DRAM " ira.weiny
                   ` (5 subsequent siblings)
  10 siblings, 2 replies; 50+ messages in thread
From: ira.weiny @ 2022-11-10 18:57 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ira Weiny, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Jonathan Cameron, Davidlohr Bueso, linux-kernel,
	linux-cxl

From: Ira Weiny <ira.weiny@intel.com>

CXL rev 3.0 section 8.2.9.2.1.1 defines the General Media Event Record.

Determine if the event read is a general media record and, if so, trace
the record as a General Media Event Record.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from RFC v2:
	Output DPA flags as a single field
	Ensure names of fields match what TP_print outputs
	Steven
		prefix TRACE_EVENT with 'cxl_'
	Jonathan
		Remove Reserved field

Changes from RFC:
	Add reserved byte array
	Use common CXL event header record macros
	Jonathan
		Use unaligned_le{24,16} for unaligned fields
		Don't use the inverse of phy addr mask
	Dave Jiang
		s/cxl_gen_media_event/general_media
		s/cxl_evt_gen_media/cxl_event_gen_media
---
 drivers/cxl/core/mbox.c    |  40 ++++++++++--
 drivers/cxl/cxlmem.h       |  19 ++++++
 include/trace/events/cxl.h | 124 +++++++++++++++++++++++++++++++++++++
 3 files changed, 179 insertions(+), 4 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index f46558e09f08..6d48fdb07700 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -709,6 +709,38 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
 
+/*
+ * General Media Event Record
+ * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
+ */
+static const uuid_t gen_media_event_uuid =
+	UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
+		  0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6);
+
+static bool cxl_event_tracing_enabled(void)
+{
+	return trace_cxl_generic_event_enabled() ||
+	       trace_cxl_general_media_enabled();
+}
+
+static void cxl_trace_event_record(const char *dev_name,
+				   enum cxl_event_log_type type,
+				   struct cxl_event_record_raw *record)
+{
+	uuid_t *id = &record->hdr.id;
+
+	if (uuid_equal(id, &gen_media_event_uuid)) {
+		struct cxl_event_gen_media *rec =
+				(struct cxl_event_gen_media *)record;
+
+		trace_cxl_general_media(dev_name, type, rec);
+		return;
+	}
+
+	/* For unknown record types print just the header */
+	trace_cxl_generic_event(dev_name, type, record);
+}
+
 static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
 				  enum cxl_event_log_type log,
 				  struct cxl_get_event_payload *get_pl, u16 nr)
@@ -754,11 +786,11 @@ static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
 			u16 nr_rec = min_t(u16, pl_nr, CXL_GET_EVENT_NR_RECORDS);
 			int i;
 
-			if (trace_cxl_generic_event_enabled()) {
+			if (cxl_event_tracing_enabled()) {
 				for (i = 0; i < nr_rec; i++)
-					trace_cxl_generic_event(dev_name(cxlds->dev),
-								type,
-								&payload.record[i]);
+					cxl_trace_event_record(dev_name(cxlds->dev),
+							       type,
+							       &payload.record[i]);
 			}
 
 			rc = cxl_clear_event_record(cxlds, type, &payload, nr_rec);
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 28a114c7cf69..86197f3168c7 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -412,6 +412,25 @@ struct cxl_mbox_clear_event_payload {
 	__le16 handle[CXL_GET_EVENT_NR_RECORDS];
 };
 
+/*
+ * General Media Event Record
+ * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
+ */
+#define CXL_EVENT_GEN_MED_COMP_ID_SIZE	0x10
+struct cxl_event_gen_media {
+	struct cxl_event_record_hdr hdr;
+	__le64 phys_addr;
+	u8 descriptor;
+	u8 type;
+	u8 transaction_type;
+	u8 validity_flags[2];
+	u8 channel;
+	u8 rank;
+	u8 device[3];
+	u8 component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
+	u8 reserved[0x2e];
+} __packed;
+
 struct cxl_mbox_get_partition_info {
 	__le64 active_volatile_cap;
 	__le64 active_persistent_cap;
diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
index 60dec9a84918..a0c20e110708 100644
--- a/include/trace/events/cxl.h
+++ b/include/trace/events/cxl.h
@@ -119,6 +119,130 @@ TRACE_EVENT(cxl_generic_event,
 		__print_hex(__entry->data, CXL_EVENT_RECORD_DATA_LENGTH))
 );
 
+/*
+ * Physical Address field masks
+ *
+ * General Media Event Record
+ * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
+ *
+ * DRAM Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
+ */
+#define CXL_DPA_FLAGS_MASK			0x3F
+#define CXL_DPA_MASK				(~CXL_DPA_FLAGS_MASK)
+
+#define CXL_DPA_VOLATILE			BIT(0)
+#define CXL_DPA_NOT_REPAIRABLE			BIT(1)
+#define show_dpa_flags(flags)	__print_flags(flags, "|",		   \
+	{ CXL_DPA_VOLATILE,			"VOLATILE"		}, \
+	{ CXL_DPA_NOT_REPAIRABLE,		"NOT_REPAIRABLE"	}  \
+)
+
+/*
+ * General Media Event Record - GMER
+ * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
+ */
+#define CXL_GMER_EVT_DESC_UNCORRECTABLE_EVENT		BIT(0)
+#define CXL_GMER_EVT_DESC_THRESHOLD_EVENT		BIT(1)
+#define CXL_GMER_EVT_DESC_POISON_LIST_OVERFLOW		BIT(2)
+#define show_event_desc_flags(flags)	__print_flags(flags, "|",		   \
+	{ CXL_GMER_EVT_DESC_UNCORRECTABLE_EVENT,	"Uncorrectable Event"	}, \
+	{ CXL_GMER_EVT_DESC_THRESHOLD_EVENT,		"Threshold Event"	}, \
+	{ CXL_GMER_EVT_DESC_POISON_LIST_OVERFLOW,	"Poison List Overflow"	}  \
+)
+
+#define CXL_GMER_MEM_EVT_TYPE_ECC_ERROR			0x00
+#define CXL_GMER_MEM_EVT_TYPE_INV_ADDR			0x01
+#define CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR		0x02
+#define show_mem_event_type(type)	__print_symbolic(type,			\
+	{ CXL_GMER_MEM_EVT_TYPE_ECC_ERROR,		"ECC Error" },		\
+	{ CXL_GMER_MEM_EVT_TYPE_INV_ADDR,		"Invalid Address" },	\
+	{ CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,	"Data Path Error" }	\
+)
+
+#define CXL_GMER_TRANS_UNKNOWN				0x00
+#define CXL_GMER_TRANS_HOST_READ			0x01
+#define CXL_GMER_TRANS_HOST_WRITE			0x02
+#define CXL_GMER_TRANS_HOST_SCAN_MEDIA			0x03
+#define CXL_GMER_TRANS_HOST_INJECT_POISON		0x04
+#define CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB		0x05
+#define CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT	0x06
+#define show_trans_type(type)	__print_symbolic(type,					\
+	{ CXL_GMER_TRANS_UNKNOWN,			"Unknown" },			\
+	{ CXL_GMER_TRANS_HOST_READ,			"Host Read" },			\
+	{ CXL_GMER_TRANS_HOST_WRITE,			"Host Write" },			\
+	{ CXL_GMER_TRANS_HOST_SCAN_MEDIA,		"Host Scan Media" },		\
+	{ CXL_GMER_TRANS_HOST_INJECT_POISON,		"Host Inject Poison" },		\
+	{ CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB,		"Internal Media Scrub" },	\
+	{ CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT,	"Internal Media Management" }	\
+)
+
+#define CXL_GMER_VALID_CHANNEL				BIT(0)
+#define CXL_GMER_VALID_RANK				BIT(1)
+#define CXL_GMER_VALID_DEVICE				BIT(2)
+#define CXL_GMER_VALID_COMPONENT			BIT(3)
+#define show_valid_flags(flags)	__print_flags(flags, "|",		   \
+	{ CXL_GMER_VALID_CHANNEL,			"CHANNEL"	}, \
+	{ CXL_GMER_VALID_RANK,				"RANK"		}, \
+	{ CXL_GMER_VALID_DEVICE,			"DEVICE"	}, \
+	{ CXL_GMER_VALID_COMPONENT,			"COMPONENT"	}  \
+)
+
+TRACE_EVENT(cxl_general_media,
+
+	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
+		 struct cxl_event_gen_media *rec),
+
+	TP_ARGS(dev_name, log, rec),
+
+	TP_STRUCT__entry(
+		CXL_EVT_TP_entry
+		/* General Media */
+		__field(u64, dpa)
+		__field(u8, descriptor)
+		__field(u8, type)
+		__field(u8, transaction_type)
+		__field(u8, channel)
+		__field(u32, device)
+		__array(u8, comp_id, CXL_EVENT_GEN_MED_COMP_ID_SIZE)
+		__field(u16, validity_flags)
+		/* Following are out of order to pack trace record */
+		__field(u8, rank)
+		__field(u8, dpa_flags)
+	),
+
+	TP_fast_assign(
+		CXL_EVT_TP_fast_assign(dev_name, log, rec->hdr);
+
+		/* General Media */
+		__entry->dpa = le64_to_cpu(rec->phys_addr);
+		__entry->dpa_flags = __entry->dpa & CXL_DPA_FLAGS_MASK;
+		/* Mask after flags have been parsed */
+		__entry->dpa &= CXL_DPA_MASK;
+		__entry->descriptor = rec->descriptor;
+		__entry->type = rec->type;
+		__entry->transaction_type = rec->transaction_type;
+		__entry->channel = rec->channel;
+		__entry->rank = rec->rank;
+		__entry->device = get_unaligned_le24(rec->device);
+		memcpy(__entry->comp_id, &rec->component_id,
+			CXL_EVENT_GEN_MED_COMP_ID_SIZE);
+		__entry->validity_flags = get_unaligned_le16(&rec->validity_flags);
+	),
+
+	CXL_EVT_TP_printk("dpa=%llx dpa_flags='%s' " \
+		"descriptor='%s' type='%s' transaction_type='%s' channel=%u rank=%u " \
+		"device=%x comp_id=%s validity_flags='%s'",
+		__entry->dpa, show_dpa_flags(__entry->dpa_flags),
+		show_event_desc_flags(__entry->descriptor),
+		show_mem_event_type(__entry->type),
+		show_trans_type(__entry->transaction_type),
+		__entry->channel, __entry->rank, __entry->device,
+		__print_hex(__entry->comp_id, CXL_EVENT_GEN_MED_COMP_ID_SIZE),
+		show_valid_flags(__entry->validity_flags)
+	)
+);
+
 #endif /* _CXL_TRACE_EVENTS_H */
 
 /* This part must be outside protection */
-- 
2.37.2



* [PATCH 06/11] cxl/mem: Trace DRAM Event Record
  2022-11-10 18:57 [PATCH 00/11] CXL: Process event logs ira.weiny
                   ` (4 preceding siblings ...)
  2022-11-10 18:57 ` [PATCH 05/11] cxl/mem: Trace General Media Event Record ira.weiny
@ 2022-11-10 18:57 ` ira.weiny
  2022-11-15 22:26   ` Dave Jiang
  2022-11-10 18:57 ` [PATCH 07/11] cxl/mem: Trace Memory Module " ira.weiny
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 50+ messages in thread
From: ira.weiny @ 2022-11-10 18:57 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ira Weiny, Jonathan Cameron, Alison Schofield, Vishal Verma,
	Ben Widawsky, Steven Rostedt, Davidlohr Bueso, linux-kernel,
	linux-cxl

From: Ira Weiny <ira.weiny@intel.com>

CXL rev 3.0 section 8.2.9.2.1.2 defines the DRAM Event Record.

Determine if the event read is a DRAM event record and, if so, trace the
record.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from RFC v2:
	Output DPA flags as a separate field.
	Ensure field names match TP_print output
	Steven
		prefix TRACE_EVENT with 'cxl_'
	Jonathan
		Formatting fix
		Remove reserved field

Changes from RFC:
	Add reserved byte data
	Use new CXL header macros
	Jonathan
		Use get_unaligned_le{24,16}() for unaligned fields
		Use 'else if'
	Dave Jiang
		s/cxl_dram_event/dram
		s/cxl_evt_dram_rec/cxl_event_dram
	Adjust for new phys addr mask
---
 drivers/cxl/core/mbox.c    | 16 ++++++-
 drivers/cxl/cxlmem.h       | 23 ++++++++++
 include/trace/events/cxl.h | 92 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 130 insertions(+), 1 deletion(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 6d48fdb07700..b03d7b856f3d 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -717,10 +717,19 @@ static const uuid_t gen_media_event_uuid =
 	UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
 		  0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6);
 
+/*
+ * DRAM Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
+ */
+static const uuid_t dram_event_uuid =
+	UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
+		  0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24);
+
 static bool cxl_event_tracing_enabled(void)
 {
 	return trace_cxl_generic_event_enabled() ||
-	       trace_cxl_general_media_enabled();
+	       trace_cxl_general_media_enabled() ||
+	       trace_cxl_dram_enabled();
 }
 
 static void cxl_trace_event_record(const char *dev_name,
@@ -735,6 +744,11 @@ static void cxl_trace_event_record(const char *dev_name,
 
 		trace_cxl_general_media(dev_name, type, rec);
 		return;
+	} else if (uuid_equal(id, &dram_event_uuid)) {
+		struct cxl_event_dram *rec = (struct cxl_event_dram *)record;
+
+		trace_cxl_dram(dev_name, type, rec);
+		return;
 	}
 
 	/* For unknown record types print just the header */
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 86197f3168c7..87c877f0940d 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -431,6 +431,29 @@ struct cxl_event_gen_media {
 	u8 reserved[0x2e];
 } __packed;
 
+/*
+ * DRAM Event Record - DER
+ * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
+ */
+#define CXL_EVENT_DER_CORRECTION_MASK_SIZE	0x20
+struct cxl_event_dram {
+	struct cxl_event_record_hdr hdr;
+	__le64 phys_addr;
+	u8 descriptor;
+	u8 type;
+	u8 transaction_type;
+	u8 validity_flags[2];
+	u8 channel;
+	u8 rank;
+	u8 nibble_mask[3];
+	u8 bank_group;
+	u8 bank;
+	u8 row[3];
+	u8 column[2];
+	u8 correction_mask[CXL_EVENT_DER_CORRECTION_MASK_SIZE];
+	u8 reserved[0x17];
+} __packed;
+
 struct cxl_mbox_get_partition_info {
 	__le64 active_volatile_cap;
 	__le64 active_persistent_cap;
diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
index a0c20e110708..37bbe59905af 100644
--- a/include/trace/events/cxl.h
+++ b/include/trace/events/cxl.h
@@ -243,6 +243,98 @@ TRACE_EVENT(cxl_general_media,
 	)
 );
 
+/*
+ * DRAM Event Record - DER
+ *
+ * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
+ */
+/*
+ * DRAM Event Record defines many fields the same as the General Media Event
+ * Record.  Reuse those definitions as appropriate.
+ */
+#define CXL_DER_VALID_CHANNEL				BIT(0)
+#define CXL_DER_VALID_RANK				BIT(1)
+#define CXL_DER_VALID_NIBBLE				BIT(2)
+#define CXL_DER_VALID_BANK_GROUP			BIT(3)
+#define CXL_DER_VALID_BANK				BIT(4)
+#define CXL_DER_VALID_ROW				BIT(5)
+#define CXL_DER_VALID_COLUMN				BIT(6)
+#define CXL_DER_VALID_CORRECTION_MASK			BIT(7)
+#define show_dram_valid_flags(flags)	__print_flags(flags, "|",			   \
+	{ CXL_DER_VALID_CHANNEL,			"CHANNEL"		}, \
+	{ CXL_DER_VALID_RANK,				"RANK"			}, \
+	{ CXL_DER_VALID_NIBBLE,				"NIBBLE"		}, \
+	{ CXL_DER_VALID_BANK_GROUP,			"BANK GROUP"		}, \
+	{ CXL_DER_VALID_BANK,				"BANK"			}, \
+	{ CXL_DER_VALID_ROW,				"ROW"			}, \
+	{ CXL_DER_VALID_COLUMN,				"COLUMN"		}, \
+	{ CXL_DER_VALID_CORRECTION_MASK,		"CORRECTION MASK"	}  \
+)
+
+TRACE_EVENT(cxl_dram,
+
+	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
+		 struct cxl_event_dram *rec),
+
+	TP_ARGS(dev_name, log, rec),
+
+	TP_STRUCT__entry(
+		CXL_EVT_TP_entry
+		/* DRAM */
+		__field(u64, dpa)
+		__field(u8, descriptor)
+		__field(u8, type)
+		__field(u8, transaction_type)
+		__field(u8, channel)
+		__field(u16, validity_flags)
+		__field(u16, column)	/* Out of order to pack trace record */
+		__field(u32, nibble_mask)
+		__field(u32, row)
+		__array(u8, cor_mask, CXL_EVENT_DER_CORRECTION_MASK_SIZE)
+		__field(u8, rank)	/* Out of order to pack trace record */
+		__field(u8, bank_group)	/* Out of order to pack trace record */
+		__field(u8, bank)	/* Out of order to pack trace record */
+		__field(u8, dpa_flags)	/* Out of order to pack trace record */
+	),
+
+	TP_fast_assign(
+		CXL_EVT_TP_fast_assign(dev_name, log, rec->hdr);
+
+		/* DRAM */
+		__entry->dpa = le64_to_cpu(rec->phys_addr);
+		__entry->dpa_flags = __entry->dpa & CXL_DPA_FLAGS_MASK;
+		__entry->dpa &= CXL_DPA_MASK;
+		__entry->descriptor = rec->descriptor;
+		__entry->type = rec->type;
+		__entry->transaction_type = rec->transaction_type;
+		__entry->validity_flags = get_unaligned_le16(rec->validity_flags);
+		__entry->channel = rec->channel;
+		__entry->rank = rec->rank;
+		__entry->nibble_mask = get_unaligned_le24(rec->nibble_mask);
+		__entry->bank_group = rec->bank_group;
+		__entry->bank = rec->bank;
+		__entry->row = get_unaligned_le24(rec->row);
+		__entry->column = get_unaligned_le16(rec->column);
+		memcpy(__entry->cor_mask, &rec->correction_mask,
+			CXL_EVENT_DER_CORRECTION_MASK_SIZE);
+	),
+
+	CXL_EVT_TP_printk("dpa=%llx dpa_flags='%s' descriptor='%s' type='%s' " \
+		"transaction_type='%s' channel=%u rank=%u nibble_mask=%x " \
+		"bank_group=%u bank=%u row=%u column=%u cor_mask=%s " \
+		"validity_flags='%s'",
+		__entry->dpa, show_dpa_flags(__entry->dpa_flags),
+		show_event_desc_flags(__entry->descriptor),
+		show_mem_event_type(__entry->type),
+		show_trans_type(__entry->transaction_type),
+		__entry->channel, __entry->rank, __entry->nibble_mask,
+		__entry->bank_group, __entry->bank,
+		__entry->row, __entry->column,
+		__print_hex(__entry->cor_mask, CXL_EVENT_DER_CORRECTION_MASK_SIZE),
+		show_dram_valid_flags(__entry->validity_flags)
+	)
+);
+
 #endif /* _CXL_TRACE_EVENTS_H */
 
 /* This part must be outside protection */
-- 
2.37.2



* [PATCH 07/11] cxl/mem: Trace Memory Module Event Record
  2022-11-10 18:57 [PATCH 00/11] CXL: Process event logs ira.weiny
                   ` (5 preceding siblings ...)
  2022-11-10 18:57 ` [PATCH 06/11] cxl/mem: Trace DRAM " ira.weiny
@ 2022-11-10 18:57 ` ira.weiny
  2022-11-15 22:39   ` Dave Jiang
                     ` (2 more replies)
  2022-11-10 18:57 ` [PATCH 08/11] cxl/mem: Wire up event interrupts ira.weiny
                   ` (3 subsequent siblings)
  10 siblings, 3 replies; 50+ messages in thread
From: ira.weiny @ 2022-11-10 18:57 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ira Weiny, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Jonathan Cameron, Davidlohr Bueso, linux-kernel,
	linux-cxl

From: Ira Weiny <ira.weiny@intel.com>

CXL rev 3.0 section 8.2.9.2.1.3 defines the Memory Module Event Record.

Determine if the event read is a memory module record and, if so, trace
the record.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from RFC v2:
	Ensure field names match TP_print output
	Steven
		prefix TRACE_EVENT with 'cxl_'
	Jonathan
		Remove reserved field
		Define a 1bit and 2 bit status decoder
		Fix paren alignment

Changes from RFC:
	Clean up spec reference
	Add reserved data
	Use new CXL header macros
	Jonathan
		Use else if
		Use get_unaligned_le*() for unaligned fields
	Dave Jiang
		s/cxl_mem_mod_event/memory_module
		s/cxl_evt_mem_mod_rec/cxl_event_mem_module
---
 drivers/cxl/core/mbox.c    |  17 ++++-
 drivers/cxl/cxlmem.h       |  26 +++++++
 include/trace/events/cxl.h | 144 +++++++++++++++++++++++++++++++++++++
 3 files changed, 186 insertions(+), 1 deletion(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index b03d7b856f3d..879b228a98a0 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -725,11 +725,20 @@ static const uuid_t dram_event_uuid =
 	UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
 		  0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24);
 
+/*
+ * Memory Module Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
+ */
+static const uuid_t mem_mod_event_uuid =
+	UUID_INIT(0xfe927475, 0xdd59, 0x4339,
+		  0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74);
+
 static bool cxl_event_tracing_enabled(void)
 {
 	return trace_cxl_generic_event_enabled() ||
 	       trace_cxl_general_media_enabled() ||
-	       trace_cxl_dram_enabled();
+	       trace_cxl_dram_enabled() ||
+	       trace_cxl_memory_module_enabled();
 }
 
 static void cxl_trace_event_record(const char *dev_name,
@@ -749,6 +758,12 @@ static void cxl_trace_event_record(const char *dev_name,
 
 		trace_cxl_dram(dev_name, type, rec);
 		return;
+	} else if (uuid_equal(id, &mem_mod_event_uuid)) {
+		struct cxl_event_mem_module *rec =
+				(struct cxl_event_mem_module *)record;
+
+		trace_cxl_memory_module(dev_name, type, rec);
+		return;
 	}
 
 	/* For unknown record types print just the header */
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 87c877f0940d..03da4f8f74d3 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -454,6 +454,32 @@ struct cxl_event_dram {
 	u8 reserved[0x17];
 } __packed;
 
+/*
+ * Get Health Info Record
+ * CXL rev 3.0 section 8.2.9.8.3.1; Table 8-100
+ */
+struct cxl_get_health_info {
+	u8 health_status;
+	u8 media_status;
+	u8 add_status;
+	u8 life_used;
+	u8 device_temp[2];
+	u8 dirty_shutdown_cnt[4];
+	u8 cor_vol_err_cnt[4];
+	u8 cor_per_err_cnt[4];
+} __packed;
+
+/*
+ * Memory Module Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
+ */
+struct cxl_event_mem_module {
+	struct cxl_event_record_hdr hdr;
+	u8 event_type;
+	struct cxl_get_health_info info;
+	u8 reserved[0x3d];
+} __packed;
+
 struct cxl_mbox_get_partition_info {
 	__le64 active_volatile_cap;
 	__le64 active_persistent_cap;
diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
index 37bbe59905af..05437e13a882 100644
--- a/include/trace/events/cxl.h
+++ b/include/trace/events/cxl.h
@@ -335,6 +335,150 @@ TRACE_EVENT(cxl_dram,
 	)
 );
 
+/*
+ * Memory Module Event Record - MMER
+ *
+ * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
+ */
+#define CXL_MMER_HEALTH_STATUS_CHANGE		0x00
+#define CXL_MMER_MEDIA_STATUS_CHANGE		0x01
+#define CXL_MMER_LIFE_USED_CHANGE		0x02
+#define CXL_MMER_TEMP_CHANGE			0x03
+#define CXL_MMER_DATA_PATH_ERROR		0x04
+#define CXL_MMER_LSA_ERROR			0x05
+#define show_dev_evt_type(type)	__print_symbolic(type,			   \
+	{ CXL_MMER_HEALTH_STATUS_CHANGE,	"Health Status Change"	}, \
+	{ CXL_MMER_MEDIA_STATUS_CHANGE,		"Media Status Change"	}, \
+	{ CXL_MMER_LIFE_USED_CHANGE,		"Life Used Change"	}, \
+	{ CXL_MMER_TEMP_CHANGE,			"Temperature Change"	}, \
+	{ CXL_MMER_DATA_PATH_ERROR,		"Data Path Error"	}, \
+	{ CXL_MMER_LSA_ERROR,			"LSA Error"		}  \
+)
+
+/*
+ * Device Health Information - DHI
+ *
+ * CXL rev 3.0 section 8.2.9.8.3.1; Table 8-100
+ */
+#define CXL_DHI_HS_MAINTENANCE_NEEDED				BIT(0)
+#define CXL_DHI_HS_PERFORMANCE_DEGRADED				BIT(1)
+#define CXL_DHI_HS_HW_REPLACEMENT_NEEDED			BIT(2)
+#define show_health_status_flags(flags)	__print_flags(flags, "|",	   \
+	{ CXL_DHI_HS_MAINTENANCE_NEEDED,	"Maintenance Needed"	}, \
+	{ CXL_DHI_HS_PERFORMANCE_DEGRADED,	"Performance Degraded"	}, \
+	{ CXL_DHI_HS_HW_REPLACEMENT_NEEDED,	"Replacement Needed"	}  \
+)
+
+#define CXL_DHI_MS_NORMAL							0x00
+#define CXL_DHI_MS_NOT_READY							0x01
+#define CXL_DHI_MS_WRITE_PERSISTENCY_LOST					0x02
+#define CXL_DHI_MS_ALL_DATA_LOST						0x03
+#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_POWER_LOSS			0x04
+#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_SHUTDOWN			0x05
+#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_IMMINENT				0x06
+#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_POWER_LOSS				0x07
+#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_SHUTDOWN				0x08
+#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_IMMINENT					0x09
+#define show_media_status(ms)	__print_symbolic(ms,			   \
+	{ CXL_DHI_MS_NORMAL,						   \
+		"Normal"						}, \
+	{ CXL_DHI_MS_NOT_READY,						   \
+		"Not Ready"						}, \
+	{ CXL_DHI_MS_WRITE_PERSISTENCY_LOST,				   \
+		"Write Persistency Lost"				}, \
+	{ CXL_DHI_MS_ALL_DATA_LOST,					   \
+		"All Data Lost"						}, \
+	{ CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_POWER_LOSS,		   \
+		"Write Persistency Loss in the Event of Power Loss"	}, \
+	{ CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_SHUTDOWN,		   \
+		"Write Persistency Loss in Event of Shutdown"		}, \
+	{ CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_IMMINENT,			   \
+		"Write Persistency Loss Imminent"			}, \
+	{ CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_POWER_LOSS,		   \
+		"All Data Loss in Event of Power Loss"			}, \
+	{ CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_SHUTDOWN,		   \
+		"All Data loss in the Event of Shutdown"		}, \
+	{ CXL_DHI_MS_WRITE_ALL_DATA_LOSS_IMMINENT,			   \
+		"All Data Loss Imminent"				}  \
+)
+
+#define CXL_DHI_AS_NORMAL		0x0
+#define CXL_DHI_AS_WARNING		0x1
+#define CXL_DHI_AS_CRITICAL		0x2
+#define show_two_bit_status(as) __print_symbolic(as,	   \
+	{ CXL_DHI_AS_NORMAL,		"Normal"	}, \
+	{ CXL_DHI_AS_WARNING,		"Warning"	}, \
+	{ CXL_DHI_AS_CRITICAL,		"Critical"	}  \
+)
+#define show_one_bit_status(as) __print_symbolic(as,	   \
+	{ CXL_DHI_AS_NORMAL,		"Normal"	}, \
+	{ CXL_DHI_AS_WARNING,		"Warning"	}  \
+)
+
+#define CXL_DHI_AS_LIFE_USED(as)			(as & 0x3)
+#define CXL_DHI_AS_DEV_TEMP(as)				((as & 0xC) >> 2)
+#define CXL_DHI_AS_COR_VOL_ERR_CNT(as)			((as & 0x10) >> 4)
+#define CXL_DHI_AS_COR_PER_ERR_CNT(as)			((as & 0x20) >> 5)
+
+TRACE_EVENT(cxl_memory_module,
+
+	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
+		 struct cxl_event_mem_module *rec),
+
+	TP_ARGS(dev_name, log, rec),
+
+	TP_STRUCT__entry(
+		CXL_EVT_TP_entry
+
+		/* Memory Module Event */
+		__field(u8, event_type)
+
+		/* Device Health Info */
+		__field(u8, health_status)
+		__field(u8, media_status)
+		__field(u8, life_used)
+		__field(u32, dirty_shutdown_cnt)
+		__field(u32, cor_vol_err_cnt)
+		__field(u32, cor_per_err_cnt)
+		__field(s16, device_temp)
+		__field(u8, add_status)
+	),
+
+	TP_fast_assign(
+		CXL_EVT_TP_fast_assign(dev_name, log, rec->hdr);
+
+		/* Memory Module Event */
+		__entry->event_type = rec->event_type;
+
+		/* Device Health Info */
+		__entry->health_status = rec->info.health_status;
+		__entry->media_status = rec->info.media_status;
+		__entry->life_used = rec->info.life_used;
+		__entry->dirty_shutdown_cnt = get_unaligned_le32(rec->info.dirty_shutdown_cnt);
+		__entry->cor_vol_err_cnt = get_unaligned_le32(rec->info.cor_vol_err_cnt);
+		__entry->cor_per_err_cnt = get_unaligned_le32(rec->info.cor_per_err_cnt);
+		__entry->device_temp = get_unaligned_le16(rec->info.device_temp);
+		__entry->add_status = rec->info.add_status;
+	),
+
+	CXL_EVT_TP_printk("event_type='%s' health_status='%s' media_status='%s' " \
+		"as_life_used=%s as_dev_temp=%s as_cor_vol_err_cnt=%s " \
+		"as_cor_per_err_cnt=%s life_used=%u device_temp=%d " \
+		"dirty_shutdown_cnt=%u cor_vol_err_cnt=%u cor_per_err_cnt=%u",
+		show_dev_evt_type(__entry->event_type),
+		show_health_status_flags(__entry->health_status),
+		show_media_status(__entry->media_status),
+		show_two_bit_status(CXL_DHI_AS_LIFE_USED(__entry->add_status)),
+		show_two_bit_status(CXL_DHI_AS_DEV_TEMP(__entry->add_status)),
+		show_one_bit_status(CXL_DHI_AS_COR_VOL_ERR_CNT(__entry->add_status)),
+		show_one_bit_status(CXL_DHI_AS_COR_PER_ERR_CNT(__entry->add_status)),
+		__entry->life_used, __entry->device_temp,
+		__entry->dirty_shutdown_cnt, __entry->cor_vol_err_cnt,
+		__entry->cor_per_err_cnt
+	)
+);
+
+
 #endif /* _CXL_TRACE_EVENTS_H */
 
 /* This part must be outside protection */
-- 
2.37.2



* [PATCH 08/11] cxl/mem: Wire up event interrupts
  2022-11-10 18:57 [PATCH 00/11] CXL: Process event logs ira.weiny
                   ` (6 preceding siblings ...)
  2022-11-10 18:57 ` [PATCH 07/11] cxl/mem: Trace Memory Module " ira.weiny
@ 2022-11-10 18:57 ` ira.weiny
  2022-11-15 23:13   ` Dave Jiang
  2022-11-16 14:40   ` Jonathan Cameron
  2022-11-10 18:57 ` [PATCH 09/11] cxl/test: Add generic mock events ira.weiny
                   ` (2 subsequent siblings)
  10 siblings, 2 replies; 50+ messages in thread
From: ira.weiny @ 2022-11-10 18:57 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ira Weiny, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Jonathan Cameron, Davidlohr Bueso, linux-kernel,
	linux-cxl

From: Ira Weiny <ira.weiny@intel.com>

CXL device events are signaled via interrupts.  Each event log may have
a different interrupt message number.  These message numbers are
reported in the Get Event Interrupt Policy mailbox command.

Add interrupt support for event logs.  Interrupts are allocated as
shared interrupts.  Therefore, all or some event logs can share the same
message number.

The driver must deal with the possibility that dynamic capacity is not
yet supported by a device it sees.  Fall back and retry without dynamic
capacity if the first attempt fails.

Dynamic capacity event logs interrupt as part of the informational event
log.  Check the event status to see which log has data.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from RFC v2
	Adjust to new irq 16 vector allocation
	Jonathan
		Remove CXL_INT_RES
	Use irq threads to ensure mailbox commands are executed outside irq context
	Adjust for optional Dynamic Capacity log
---
 drivers/cxl/core/mbox.c      |  53 +++++++++++++-
 drivers/cxl/cxlmem.h         |  31 ++++++++
 drivers/cxl/pci.c            | 133 +++++++++++++++++++++++++++++++++++
 include/uapi/linux/cxl_mem.h |   2 +
 4 files changed, 217 insertions(+), 2 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 879b228a98a0..1e6762af2a00 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -53,6 +53,8 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
 	CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
 	CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
 	CXL_CMD(CLEAR_EVENT_RECORD, CXL_VARIABLE_PAYLOAD, 0, 0),
+	CXL_CMD(GET_EVT_INT_POLICY, 0, 0x5, 0),
+	CXL_CMD(SET_EVT_INT_POLICY, 0x5, 0, 0),
 	CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
 	CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
 	CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
@@ -791,8 +793,8 @@ static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
 				 &payload, sizeof(payload), NULL, 0);
 }
 
-static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
-				    enum cxl_event_log_type type)
+void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
+			     enum cxl_event_log_type type)
 {
 	struct cxl_get_event_payload payload;
 	u16 pl_nr;
@@ -837,6 +839,7 @@ static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
 	} while (pl_nr > CXL_GET_EVENT_NR_RECORDS ||
 		 payload.flags & CXL_GET_EVENT_FLAG_MORE_RECORDS);
 }
+EXPORT_SYMBOL_NS_GPL(cxl_mem_get_records_log, CXL);
 
 /**
  * cxl_mem_get_event_records - Get Event Records from the device
@@ -867,6 +870,52 @@ void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, CXL);
 
+int cxl_event_config_msgnums(struct cxl_dev_state *cxlds)
+{
+	struct cxl_event_interrupt_policy *policy = &cxlds->evt_int_policy;
+	size_t policy_size = sizeof(*policy);
+	bool retry = true;
+	int rc;
+
+	policy->info_settings = CXL_INT_MSI_MSIX;
+	policy->warn_settings = CXL_INT_MSI_MSIX;
+	policy->failure_settings = CXL_INT_MSI_MSIX;
+	policy->fatal_settings = CXL_INT_MSI_MSIX;
+	policy->dyn_cap_settings = CXL_INT_MSI_MSIX;
+
+again:
+	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_SET_EVT_INT_POLICY,
+			       policy, policy_size, NULL, 0);
+	if (rc < 0) {
+		/*
+		 * If the device does not support dynamic capacity it may fail
+		 * the command due to an invalid payload.  Retry without
+		 * dynamic capacity.
+		 */
+		if (retry) {
+			retry = false;
+			policy->dyn_cap_settings = 0;
+			policy_size = sizeof(*policy) - sizeof(policy->dyn_cap_settings);
+			goto again;
+		}
+		dev_err(cxlds->dev, "Failed to set event interrupt policy : %d",
+			rc);
+		memset(policy, CXL_INT_NONE, sizeof(*policy));
+		return rc;
+	}
+
+	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVT_INT_POLICY, NULL, 0,
+			       policy, policy_size);
+	if (rc < 0) {
+		dev_err(cxlds->dev, "Failed to get event interrupt policy: %d",
+			rc);
+		return rc;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_event_config_msgnums, CXL);
+
 /**
  * cxl_mem_get_partition_info - Get partition info
  * @cxlds: The device data for the operation
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 03da4f8f74d3..4d9c3ea30c24 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -179,6 +179,31 @@ struct cxl_endpoint_dvsec_info {
 	struct range dvsec_range[2];
 };
 
+/*
+ * Event Interrupt Policy
+ *
+ * CXL rev 3.0 section 8.2.9.2.4; Table 8-52
+ */
+enum cxl_event_int_mode {
+	CXL_INT_NONE		= 0x00,
+	CXL_INT_MSI_MSIX	= 0x01,
+	CXL_INT_FW		= 0x02
+};
+#define CXL_EVENT_INT_MODE_MASK 0x3
+#define CXL_EVENT_INT_MSGNUM(setting) (((setting) & 0xf0) >> 4)
+struct cxl_event_interrupt_policy {
+	u8 info_settings;
+	u8 warn_settings;
+	u8 failure_settings;
+	u8 fatal_settings;
+	u8 dyn_cap_settings;
+} __packed;
+
+static inline bool cxl_evt_int_is_msi(u8 setting)
+{
+	return CXL_INT_MSI_MSIX == (setting & CXL_EVENT_INT_MODE_MASK);
+}
+
 /**
  * struct cxl_dev_state - The driver device state
  *
@@ -246,6 +271,7 @@ struct cxl_dev_state {
 
 	resource_size_t component_reg_phys;
 	u64 serial;
+	struct cxl_event_interrupt_policy evt_int_policy;
 
 	struct xarray doe_mbs;
 
@@ -259,6 +285,8 @@ enum cxl_opcode {
 	CXL_MBOX_OP_RAW			= CXL_MBOX_OP_INVALID,
 	CXL_MBOX_OP_GET_EVENT_RECORD	= 0x0100,
 	CXL_MBOX_OP_CLEAR_EVENT_RECORD	= 0x0101,
+	CXL_MBOX_OP_GET_EVT_INT_POLICY	= 0x0102,
+	CXL_MBOX_OP_SET_EVT_INT_POLICY	= 0x0103,
 	CXL_MBOX_OP_GET_FW_INFO		= 0x0200,
 	CXL_MBOX_OP_ACTIVATE_FW		= 0x0202,
 	CXL_MBOX_OP_GET_SUPPORTED_LOGS	= 0x0400,
@@ -539,7 +567,10 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds);
 struct cxl_dev_state *cxl_dev_state_create(struct device *dev);
 void set_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
 void clear_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
+void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
+			     enum cxl_event_log_type type);
 void cxl_mem_get_event_records(struct cxl_dev_state *cxlds);
+int cxl_event_config_msgnums(struct cxl_dev_state *cxlds);
 #ifdef CONFIG_CXL_SUSPEND
 void cxl_mem_active_inc(void);
 void cxl_mem_active_dec(void);
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index e0d511575b45..64b2e2671043 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -458,6 +458,138 @@ static void cxl_pci_alloc_irq_vectors(struct cxl_dev_state *cxlds)
 	cxlds->nr_irq_vecs = nvecs;
 }
 
+struct cxl_event_irq_id {
+	struct cxl_dev_state *cxlds;
+	u32 status;
+	unsigned int msgnum;
+};
+
+static irqreturn_t cxl_event_int_thread(int irq, void *id)
+{
+	struct cxl_event_irq_id *cxlid = id;
+	struct cxl_dev_state *cxlds = cxlid->cxlds;
+
+	if (cxlid->status & CXLDEV_EVENT_STATUS_INFO)
+		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_INFO);
+	if (cxlid->status & CXLDEV_EVENT_STATUS_WARN)
+		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_WARN);
+	if (cxlid->status & CXLDEV_EVENT_STATUS_FAIL)
+		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FAIL);
+	if (cxlid->status & CXLDEV_EVENT_STATUS_FATAL)
+		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FATAL);
+	if (cxlid->status & CXLDEV_EVENT_STATUS_DYNAMIC_CAP)
+		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_DYNAMIC_CAP);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t cxl_event_int_handler(int irq, void *id)
+{
+	struct cxl_event_irq_id *cxlid = id;
+	struct cxl_dev_state *cxlds = cxlid->cxlds;
+	u32 status = readl(cxlds->regs.status + CXLDEV_DEV_EVENT_STATUS_OFFSET);
+
+	if (cxlid->status & status)
+		return IRQ_WAKE_THREAD;
+	return IRQ_HANDLED;
+}
+
+static void cxl_free_event_irq(void *id)
+{
+	struct cxl_event_irq_id *cxlid = id;
+	struct pci_dev *pdev = to_pci_dev(cxlid->cxlds->dev);
+
+	pci_free_irq(pdev, cxlid->msgnum, id);
+}
+
+static u32 log_type_to_status(enum cxl_event_log_type log_type)
+{
+	switch (log_type) {
+	case CXL_EVENT_TYPE_INFO:
+		return CXLDEV_EVENT_STATUS_INFO | CXLDEV_EVENT_STATUS_DYNAMIC_CAP;
+	case CXL_EVENT_TYPE_WARN:
+		return CXLDEV_EVENT_STATUS_WARN;
+	case CXL_EVENT_TYPE_FAIL:
+		return CXLDEV_EVENT_STATUS_FAIL;
+	case CXL_EVENT_TYPE_FATAL:
+		return CXLDEV_EVENT_STATUS_FATAL;
+	default:
+		break;
+	}
+	return 0;
+}
+
+static int cxl_request_event_irq(struct cxl_dev_state *cxlds,
+				 enum cxl_event_log_type log_type,
+				 u8 setting)
+{
+	struct device *dev = cxlds->dev;
+	struct pci_dev *pdev = to_pci_dev(dev);
+	struct cxl_event_irq_id *id;
+	unsigned int msgnum = CXL_EVENT_INT_MSGNUM(setting);
+	int irq;
+
+	/* Disabled irq is not an error */
+	if (!cxl_evt_int_is_msi(setting) || msgnum >= cxlds->nr_irq_vecs) {
+		dev_dbg(dev, "Event interrupt not enabled; %s %u %d\n",
+			cxl_event_log_type_str(log_type),
+			msgnum, cxlds->nr_irq_vecs);
+		return 0;
+	}
+
+	id = devm_kzalloc(dev, sizeof(*id), GFP_KERNEL);
+	if (!id)
+		return -ENOMEM;
+
+	id->cxlds = cxlds;
+	id->msgnum = msgnum;
+	id->status = log_type_to_status(log_type);
+
+	irq = pci_request_irq(pdev, id->msgnum, cxl_event_int_handler,
+			      cxl_event_int_thread, id,
+			      "%s:event-log-%s", dev_name(dev),
+			      cxl_event_log_type_str(log_type));
+	if (irq)
+		return irq;
+
+	return devm_add_action_or_reset(dev, cxl_free_event_irq, id);
+}
+
+static void cxl_event_irqsetup(struct cxl_dev_state *cxlds)
+{
+	struct device *dev = cxlds->dev;
+	u8 setting;
+
+	if (cxl_event_config_msgnums(cxlds))
+		return;
+
+	/*
+	 * Dynamic Capacity shares the Info log's message number, so there is
+	 * nothing to be done here except check the status bit in the
+	 * irq thread.
+	 */
+	setting = cxlds->evt_int_policy.info_settings;
+	if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_INFO, setting))
+		dev_err(dev, "Failed to get interrupt for %s event log\n",
+			cxl_event_log_type_str(CXL_EVENT_TYPE_INFO));
+
+	setting = cxlds->evt_int_policy.warn_settings;
+	if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_WARN, setting))
+		dev_err(dev, "Failed to get interrupt for %s event log\n",
+			cxl_event_log_type_str(CXL_EVENT_TYPE_WARN));
+
+	setting = cxlds->evt_int_policy.failure_settings;
+	if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_FAIL, setting))
+		dev_err(dev, "Failed to get interrupt for %s event log\n",
+			cxl_event_log_type_str(CXL_EVENT_TYPE_FAIL));
+
+	setting = cxlds->evt_int_policy.fatal_settings;
+	if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_FATAL, setting))
+		dev_err(dev, "Failed to get interrupt for %s event log\n",
+			cxl_event_log_type_str(CXL_EVENT_TYPE_FATAL));
+}
+
 static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	struct cxl_register_map map;
@@ -525,6 +657,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		return rc;
 
 	cxl_pci_alloc_irq_vectors(cxlds);
+	cxl_event_irqsetup(cxlds);
 
 	cxlmd = devm_cxl_add_memdev(cxlds);
 	if (IS_ERR(cxlmd))
diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
index 7c1ad8062792..a8204802fcca 100644
--- a/include/uapi/linux/cxl_mem.h
+++ b/include/uapi/linux/cxl_mem.h
@@ -26,6 +26,8 @@
 	___C(GET_SUPPORTED_LOGS, "Get Supported Logs"),                   \
 	___C(GET_EVENT_RECORD, "Get Event Record"),                       \
 	___C(CLEAR_EVENT_RECORD, "Clear Event Record"),                   \
+	___C(GET_EVT_INT_POLICY, "Get Event Interrupt Policy"),           \
+	___C(SET_EVT_INT_POLICY, "Set Event Interrupt Policy"),           \
 	___C(GET_FW_INFO, "Get FW Info"),                                 \
 	___C(GET_PARTITION_INFO, "Get Partition Information"),            \
 	___C(GET_LSA, "Get Label Storage Area"),                          \
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread
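The interrupt policy "settings" bytes in the patch above pack two fields per CXL 3.0 Table 8-52: bits [1:0] select the interrupt mode (none/MSI-MSI-X/FW) and bits [7:4] carry the MSI-X/MSI message number, which is what CXL_EVENT_INT_MODE_MASK and CXL_EVENT_INT_MSGNUM extract. A minimal stand-alone sketch of that decoding (user-space names, not the kernel code):

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors CXL_EVENT_INT_MODE_MASK / CXL_EVENT_INT_MSGNUM from the patch */
#define EVENT_INT_MODE_MASK	0x3
#define EVENT_INT_MSGNUM(s)	(((s) & 0xf0) >> 4)

enum event_int_mode {
	INT_NONE	= 0x00,
	INT_MSI_MSIX	= 0x01,
	INT_FW		= 0x02,
};

/* Interrupt mode encoded in a settings byte: bits [1:0] */
static inline enum event_int_mode setting_mode(uint8_t setting)
{
	return (enum event_int_mode)(setting & EVENT_INT_MODE_MASK);
}

/* MSI-X/MSI message number encoded in a settings byte: bits [7:4] */
static inline unsigned int setting_msgnum(uint8_t setting)
{
	return EVENT_INT_MSGNUM(setting);
}
```

So a settings byte of 0x21 requests MSI/MSI-X delivery on message number 2, which is why the driver compares the extracted msgnum against the number of vectors actually allocated.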

* [PATCH 09/11] cxl/test: Add generic mock events
  2022-11-10 18:57 [PATCH 00/11] CXL: Process event logs ira.weiny
                   ` (7 preceding siblings ...)
  2022-11-10 18:57 ` [PATCH 08/11] cxl/mem: Wire up event interrupts ira.weiny
@ 2022-11-10 18:57 ` ira.weiny
  2022-11-16 16:00   ` Jonathan Cameron
  2022-11-10 18:57 ` [PATCH 10/11] cxl/test: Add specific events ira.weiny
  2022-11-10 18:57 ` [PATCH 11/11] cxl/test: Simulate event log overflow ira.weiny
  10 siblings, 1 reply; 50+ messages in thread
From: ira.weiny @ 2022-11-10 18:57 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ira Weiny, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Jonathan Cameron, Davidlohr Bueso, linux-kernel,
	linux-cxl

From: Ira Weiny <ira.weiny@intel.com>

Facilitate testing basic Get/Clear Event functionality by creating
multiple logs and generic events with made-up UUIDs.

The event data is entirely fabricated, using data patterns that should
be easy to spot in trace output.

A single sysfs entry resets the event data and triggers collecting the
events for testing.

Test traces are easy to obtain with a small script such as this:

	#!/bin/bash -x

	devices=`find /sys/devices/platform -name cxl_mem*`

	# Turn on tracing
	echo "" > /sys/kernel/tracing/trace
	echo 1 > /sys/kernel/tracing/events/cxl/enable
	echo 1 > /sys/kernel/tracing/tracing_on

	# Generate fake interrupt
	for device in $devices; do
	        echo 1 > $device/event_trigger
	done

	# Turn off tracing and report events
	echo 0 > /sys/kernel/tracing/tracing_on
	cat /sys/kernel/tracing/trace

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from RFC v2:
	Adjust to simulate the event status register

Changes from RFC:
	Separate out the event code
	Adjust for struct changes.
	Clean up devm_cxl_mock_event_logs()
	Clean up naming and comments
	Jonathan
		Remove dynamic allocation of event logs
		Clean up comment
		Remove unneeded xarray
		Ensure event_trigger sysfs is valid prior to the driver
		going active.
	Dan
		Remove the fill/reset event sysfs as these operations
		can be done together
---
 drivers/cxl/core/mbox.c         |  31 +++--
 drivers/cxl/cxlmem.h            |   1 +
 tools/testing/cxl/test/Kbuild   |   2 +-
 tools/testing/cxl/test/events.c | 222 ++++++++++++++++++++++++++++++++
 tools/testing/cxl/test/events.h |   9 ++
 tools/testing/cxl/test/mem.c    |  35 ++++-
 6 files changed, 286 insertions(+), 14 deletions(-)
 create mode 100644 tools/testing/cxl/test/events.c
 create mode 100644 tools/testing/cxl/test/events.h

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 1e6762af2a00..2d74c0f2cbf7 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -841,6 +841,24 @@ void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
 }
 EXPORT_SYMBOL_NS_GPL(cxl_mem_get_records_log, CXL);
 
+/* Direct call for mock testing */
+void __cxl_mem_get_event_records(struct cxl_dev_state *cxlds, u32 status)
+{
+	dev_dbg(cxlds->dev, "Reading event logs: %x\n", status);
+
+	if (status & CXLDEV_EVENT_STATUS_INFO)
+		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_INFO);
+	if (status & CXLDEV_EVENT_STATUS_WARN)
+		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_WARN);
+	if (status & CXLDEV_EVENT_STATUS_FAIL)
+		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FAIL);
+	if (status & CXLDEV_EVENT_STATUS_FATAL)
+		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FATAL);
+	if (status & CXLDEV_EVENT_STATUS_DYNAMIC_CAP)
+		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_DYNAMIC_CAP);
+}
+EXPORT_SYMBOL_NS_GPL(__cxl_mem_get_event_records, CXL);
+
 /**
  * cxl_mem_get_event_records - Get Event Records from the device
  * @cxlds: The device data for the operation
@@ -855,18 +873,7 @@ void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
 {
 	u32 status = readl(cxlds->regs.status + CXLDEV_DEV_EVENT_STATUS_OFFSET);
 
-	dev_dbg(cxlds->dev, "Reading event logs: %x\n", status);
-
-	if (status & CXLDEV_EVENT_STATUS_INFO)
-		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_INFO);
-	if (status & CXLDEV_EVENT_STATUS_WARN)
-		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_WARN);
-	if (status & CXLDEV_EVENT_STATUS_FAIL)
-		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FAIL);
-	if (status & CXLDEV_EVENT_STATUS_FATAL)
-		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FATAL);
-	if (status & CXLDEV_EVENT_STATUS_DYNAMIC_CAP)
-		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_DYNAMIC_CAP);
+	__cxl_mem_get_event_records(cxlds, status);
 }
 EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, CXL);
 
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 4d9c3ea30c24..77bcbaa16dd3 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -569,6 +569,7 @@ void set_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds
 void clear_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
 void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
 			     enum cxl_event_log_type type);
+void __cxl_mem_get_event_records(struct cxl_dev_state *cxlds, u32 status);
 void cxl_mem_get_event_records(struct cxl_dev_state *cxlds);
 int cxl_event_config_msgnums(struct cxl_dev_state *cxlds);
 #ifdef CONFIG_CXL_SUSPEND
diff --git a/tools/testing/cxl/test/Kbuild b/tools/testing/cxl/test/Kbuild
index 4e59e2c911f6..64b14b83d8d9 100644
--- a/tools/testing/cxl/test/Kbuild
+++ b/tools/testing/cxl/test/Kbuild
@@ -7,4 +7,4 @@ obj-m += cxl_mock_mem.o
 
 cxl_test-y := cxl.o
 cxl_mock-y := mock.o
-cxl_mock_mem-y := mem.o
+cxl_mock_mem-y := mem.o events.o
diff --git a/tools/testing/cxl/test/events.c b/tools/testing/cxl/test/events.c
new file mode 100644
index 000000000000..a4816f230bb5
--- /dev/null
+++ b/tools/testing/cxl/test/events.c
@@ -0,0 +1,222 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright(c) 2022 Intel Corporation. All rights reserved.
+
+#include <cxlmem.h>
+#include <trace/events/cxl.h>
+
+#include "events.h"
+
+#define CXL_TEST_EVENT_CNT_MAX 15
+
+struct mock_event_log {
+	int cur_event;
+	int nr_events;
+	struct cxl_event_record_raw *events[CXL_TEST_EVENT_CNT_MAX];
+};
+
+struct mock_event_store {
+	struct cxl_dev_state *cxlds;
+	struct mock_event_log mock_logs[CXL_EVENT_TYPE_MAX];
+	u32 ev_status;
+};
+
+static DEFINE_XARRAY(mock_dev_event_store);
+
+static struct mock_event_log *find_event_log(struct device *dev, int log_type)
+{
+	struct mock_event_store *mes = xa_load(&mock_dev_event_store,
+					       (unsigned long)dev);
+
+	if (!mes || log_type >= CXL_EVENT_TYPE_MAX)
+		return NULL;
+	return &mes->mock_logs[log_type];
+}
+
+static struct cxl_event_record_raw *get_cur_event(struct mock_event_log *log)
+{
+	return log->events[log->cur_event];
+}
+
+static __le16 get_cur_event_handle(struct mock_event_log *log)
+{
+	return cpu_to_le16(log->cur_event);
+}
+
+static bool log_empty(struct mock_event_log *log)
+{
+	return log->cur_event == log->nr_events;
+}
+
+static int log_rec_left(struct mock_event_log *log)
+{
+	return log->nr_events - log->cur_event;
+}
+
+static void event_store_add_event(struct mock_event_store *mes,
+				  enum cxl_event_log_type log_type,
+				  struct cxl_event_record_raw *event)
+{
+	struct mock_event_log *log;
+
+	if (WARN_ON(log_type >= CXL_EVENT_TYPE_MAX))
+		return;
+
+	log = &mes->mock_logs[log_type];
+	if (WARN_ON(log->nr_events >= CXL_TEST_EVENT_CNT_MAX))
+		return;
+
+	log->events[log->nr_events] = event;
+	log->nr_events++;
+}
+
+int mock_get_event(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
+{
+	struct cxl_get_event_payload *pl;
+	struct mock_event_log *log;
+	u8 log_type;
+
+	/* Valid request? */
+	if (cmd->size_in != sizeof(log_type))
+		return -EINVAL;
+
+	log_type = *((u8 *)cmd->payload_in);
+	if (log_type >= CXL_EVENT_TYPE_MAX)
+		return -EINVAL;
+
+	log = find_event_log(cxlds->dev, log_type);
+	if (!log || log_empty(log))
+		goto no_data;
+
+	pl = cmd->payload_out;
+	memset(pl, 0, sizeof(*pl));
+
+	pl->record_count = cpu_to_le16(1);
+
+	if (log_rec_left(log) > 1)
+		pl->flags |= CXL_GET_EVENT_FLAG_MORE_RECORDS;
+
+	memcpy(&pl->record[0], get_cur_event(log), sizeof(pl->record[0]));
+	pl->record[0].hdr.handle = get_cur_event_handle(log);
+	return 0;
+
+no_data:
+	/* Room for header? */
+	if (cmd->size_out < (sizeof(*pl) - sizeof(pl->record[0])))
+		return -EINVAL;
+
+	memset(cmd->payload_out, 0, cmd->size_out);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(mock_get_event);
+
+/*
+ * Get and Clear Event handle only 1 record at a time, matching what is
+ * currently implemented in the main code.
+ */
+int mock_clear_event(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
+{
+	struct cxl_mbox_clear_event_payload *pl = cmd->payload_in;
+	struct mock_event_log *log;
+	u8 log_type = pl->event_log;
+
+	/* Don't handle more than 1 record at a time */
+	if (pl->nr_recs != 1)
+		return -EINVAL;
+
+	if (log_type >= CXL_EVENT_TYPE_MAX)
+		return -EINVAL;
+
+	log = find_event_log(cxlds->dev, log_type);
+	if (!log)
+		return 0; /* No mock data in this log */
+
+	/*
+	 * The test code only reports 1 event at a time, so only support
+	 * clearing 1 event at a time.
+	 */
+	if (log->cur_event != le16_to_cpu(pl->handle[0])) {
+		dev_err(cxlds->dev, "Clearing events out of order\n");
+		return -EINVAL;
+	}
+
+	log->cur_event++;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(mock_clear_event);
+
+void cxl_mock_event_trigger(struct device *dev)
+{
+	struct mock_event_store *mes = xa_load(&mock_dev_event_store,
+					       (unsigned long)dev);
+	int i;
+
+	for (i = CXL_EVENT_TYPE_INFO; i < CXL_EVENT_TYPE_MAX; i++) {
+		struct mock_event_log *log;
+
+		log = find_event_log(dev, i);
+		if (log)
+			log->cur_event = 0;
+	}
+
+	__cxl_mem_get_event_records(mes->cxlds, mes->ev_status);
+}
+EXPORT_SYMBOL_GPL(cxl_mock_event_trigger);
+
+static struct cxl_event_record_raw maint_needed = {
+	.hdr = {
+		.id = UUID_INIT(0xDEADBEEF, 0xCAFE, 0xBABE,
+				0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
+		.length = sizeof(struct cxl_event_record_raw),
+		.flags[0] = CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,
+		/* .handle = Set dynamically */
+		.related_handle = cpu_to_le16(0xa5b6),
+	},
+	.data = { 0xDE, 0xAD, 0xBE, 0xEF },
+};
+
+static struct cxl_event_record_raw hardware_replace = {
+	.hdr = {
+		.id = UUID_INIT(0xBABECAFE, 0xBEEF, 0xDEAD,
+				0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
+		.length = sizeof(struct cxl_event_record_raw),
+		.flags[0] = CXL_EVENT_RECORD_FLAG_HW_REPLACE,
+		/* .handle = Set dynamically */
+		.related_handle = cpu_to_le16(0xb6a5),
+	},
+	.data = { 0xDE, 0xAD, 0xBE, 0xEF },
+};
+
+u32 cxl_mock_add_event_logs(struct cxl_dev_state *cxlds)
+{
+	struct device *dev = cxlds->dev;
+	struct mock_event_store *mes;
+
+	mes = devm_kzalloc(dev, sizeof(*mes), GFP_KERNEL);
+	if (WARN_ON(!mes))
+		return 0;
+	mes->cxlds = cxlds;
+
+	if (xa_insert(&mock_dev_event_store, (unsigned long)dev, mes,
+		      GFP_KERNEL)) {
+		dev_err(dev, "Event store not available for %s\n",
+			dev_name(dev));
+		return 0;
+	}
+
+	event_store_add_event(mes, CXL_EVENT_TYPE_INFO, &maint_needed);
+	mes->ev_status |= CXLDEV_EVENT_STATUS_INFO;
+
+	event_store_add_event(mes, CXL_EVENT_TYPE_FATAL, &hardware_replace);
+	mes->ev_status |= CXLDEV_EVENT_STATUS_FATAL;
+
+	return mes->ev_status;
+}
+EXPORT_SYMBOL_GPL(cxl_mock_add_event_logs);
+
+void cxl_mock_remove_event_logs(struct device *dev)
+{
+	xa_erase(&mock_dev_event_store, (unsigned long)dev);
+}
+EXPORT_SYMBOL_GPL(cxl_mock_remove_event_logs);
diff --git a/tools/testing/cxl/test/events.h b/tools/testing/cxl/test/events.h
new file mode 100644
index 000000000000..5bebc6a0a01b
--- /dev/null
+++ b/tools/testing/cxl/test/events.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include <cxlmem.h>
+
+int mock_get_event(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
+int mock_clear_event(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
+u32 cxl_mock_add_event_logs(struct cxl_dev_state *cxlds);
+void cxl_mock_remove_event_logs(struct device *dev);
+void cxl_mock_event_trigger(struct device *dev);
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index e2f5445d24ff..333fa8527a07 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -8,6 +8,7 @@
 #include <linux/sizes.h>
 #include <linux/bits.h>
 #include <cxlmem.h>
+#include "events.h"
 
 #define LSA_SIZE SZ_128K
 #define DEV_SIZE SZ_2G
@@ -224,6 +225,12 @@ static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *
 	case CXL_MBOX_OP_GET_PARTITION_INFO:
 		rc = mock_partition_info(cxlds, cmd);
 		break;
+	case CXL_MBOX_OP_GET_EVENT_RECORD:
+		rc = mock_get_event(cxlds, cmd);
+		break;
+	case CXL_MBOX_OP_CLEAR_EVENT_RECORD:
+		rc = mock_clear_event(cxlds, cmd);
+		break;
 	case CXL_MBOX_OP_SET_LSA:
 		rc = mock_set_lsa(cxlds, cmd);
 		break;
@@ -245,11 +252,27 @@ static void label_area_release(void *lsa)
 	vfree(lsa);
 }
 
+static ssize_t event_trigger_store(struct device *dev,
+				   struct device_attribute *attr,
+				   const char *buf, size_t count)
+{
+	cxl_mock_event_trigger(dev);
+	return count;
+}
+static DEVICE_ATTR_WO(event_trigger);
+
+static struct attribute *cxl_mock_event_attrs[] = {
+	&dev_attr_event_trigger.attr,
+	NULL
+};
+ATTRIBUTE_GROUPS(cxl_mock_event);
+
 static int cxl_mock_mem_probe(struct platform_device *pdev)
 {
 	struct device *dev = &pdev->dev;
 	struct cxl_memdev *cxlmd;
 	struct cxl_dev_state *cxlds;
+	u32 ev_status;
 	void *lsa;
 	int rc;
 
@@ -281,11 +304,13 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
 	if (rc)
 		return rc;
 
+	ev_status = cxl_mock_add_event_logs(cxlds);
+
 	cxlmd = devm_cxl_add_memdev(cxlds);
 	if (IS_ERR(cxlmd))
 		return PTR_ERR(cxlmd);
 
-	cxl_mem_get_event_records(cxlds);
+	__cxl_mem_get_event_records(cxlds, ev_status);
 
 	if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
 		rc = devm_cxl_add_nvdimm(dev, cxlmd);
@@ -293,6 +318,12 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
 	return 0;
 }
 
+static int cxl_mock_mem_remove(struct platform_device *pdev)
+{
+	cxl_mock_remove_event_logs(&pdev->dev);
+	return 0;
+}
+
 static const struct platform_device_id cxl_mock_mem_ids[] = {
 	{ .name = "cxl_mem", },
 	{ },
@@ -301,9 +332,11 @@ MODULE_DEVICE_TABLE(platform, cxl_mock_mem_ids);
 
 static struct platform_driver cxl_mock_mem_driver = {
 	.probe = cxl_mock_mem_probe,
+	.remove = cxl_mock_mem_remove,
 	.id_table = cxl_mock_mem_ids,
 	.driver = {
 		.name = KBUILD_MODNAME,
+		.dev_groups = cxl_mock_event_groups,
 	},
 };
 
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread
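The mock log in the patch above behaves as a simple cursor over a fixed array: Get Event hands out the record at the cursor with the handle set to the cursor index, and Clear Event only succeeds when the handle being cleared matches the cursor, then advances it. A small stand-alone model of that bookkeeping (hypothetical names, not the kernel code):

```c
#include <assert.h>
#include <stdbool.h>

struct mock_log {
	int cur_event;	/* next record to hand out / expect cleared */
	int nr_events;	/* records queued in the log */
};

static bool mock_log_empty(const struct mock_log *log)
{
	return log->cur_event == log->nr_events;
}

/* Get Event: return the current handle, or -1 when the log is empty */
static int mock_log_get(const struct mock_log *log)
{
	return mock_log_empty(log) ? -1 : log->cur_event;
}

/* Clear Event: only the current handle may be cleared, in order */
static int mock_log_clear(struct mock_log *log, int handle)
{
	if (handle != log->cur_event)
		return -1;	/* out-of-order clear */
	log->cur_event++;
	return 0;
}
```

Resetting cur_event to 0, as cxl_mock_event_trigger() does, makes the same canned records replayable for every test run.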

* [PATCH 10/11] cxl/test: Add specific events
  2022-11-10 18:57 [PATCH 00/11] CXL: Process event logs ira.weiny
                   ` (8 preceding siblings ...)
  2022-11-10 18:57 ` [PATCH 09/11] cxl/test: Add generic mock events ira.weiny
@ 2022-11-10 18:57 ` ira.weiny
  2022-11-16 16:08   ` Jonathan Cameron
  2022-11-10 18:57 ` [PATCH 11/11] cxl/test: Simulate event log overflow ira.weiny
  10 siblings, 1 reply; 50+ messages in thread
From: ira.weiny @ 2022-11-10 18:57 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ira Weiny, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Jonathan Cameron, Davidlohr Bueso, linux-kernel,
	linux-cxl

From: Ira Weiny <ira.weiny@intel.com>

Each event type has a different trace point output format.

Add mock General Media Event, DRAM event, and Memory Module Event
records to the mock list of events returned.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from RFC:
	Adjust for struct changes
	adjust for unaligned fields
---
 tools/testing/cxl/test/events.c | 70 +++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)

diff --git a/tools/testing/cxl/test/events.c b/tools/testing/cxl/test/events.c
index a4816f230bb5..8693f3fb9cbb 100644
--- a/tools/testing/cxl/test/events.c
+++ b/tools/testing/cxl/test/events.c
@@ -186,6 +186,70 @@ struct cxl_event_record_raw hardware_replace = {
 	.data = { 0xDE, 0xAD, 0xBE, 0xEF },
 };
 
+static struct cxl_event_gen_media gen_media = {
+	.hdr = {
+		.id = UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
+				0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6),
+		.length = sizeof(struct cxl_event_gen_media),
+		.flags[0] = CXL_EVENT_RECORD_FLAG_PERMANENT,
+		/* .handle = Set dynamically */
+		.related_handle = cpu_to_le16(0),
+	},
+	.phys_addr = cpu_to_le64(0x2000),
+	.descriptor = CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT,
+	.type = CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,
+	.transaction_type = CXL_GMER_TRANS_HOST_WRITE,
+	.validity_flags = { CXL_GMER_VALID_CHANNEL |
+			    CXL_GMER_VALID_RANK, 0 },
+	.channel = 1,
+	.rank = 30
+};
+
+static struct cxl_event_dram dram = {
+	.hdr = {
+		.id = UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
+				0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24),
+		.length = sizeof(struct cxl_event_dram),
+		.flags[0] = CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,
+		/* .handle = Set dynamically */
+		.related_handle = cpu_to_le16(0),
+	},
+	.phys_addr = cpu_to_le64(0x8000),
+	.descriptor = CXL_GMER_EVT_DESC_THRESHOLD_EVENT,
+	.type = CXL_GMER_MEM_EVT_TYPE_INV_ADDR,
+	.transaction_type = CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB,
+	.validity_flags = { CXL_DER_VALID_CHANNEL |
+			    CXL_DER_VALID_BANK_GROUP |
+			    CXL_DER_VALID_BANK |
+			    CXL_DER_VALID_COLUMN, 0 },
+	.channel = 1,
+	.bank_group = 5,
+	.bank = 2,
+	.column = { 0xDE, 0xAD},
+};
+
+static struct cxl_event_mem_module mem_module = {
+	.hdr = {
+		.id = UUID_INIT(0xfe927475, 0xdd59, 0x4339,
+				0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74),
+		.length = sizeof(struct cxl_event_mem_module),
+		/* .handle = Set dynamically */
+		.related_handle = cpu_to_le16(0),
+	},
+	.event_type = CXL_MMER_TEMP_CHANGE,
+	.info = {
+		.health_status = CXL_DHI_HS_PERFORMANCE_DEGRADED,
+		.media_status = CXL_DHI_MS_ALL_DATA_LOST,
+		.add_status = (CXL_DHI_AS_CRITICAL << 2) |
+			      (CXL_DHI_AS_WARNING << 4) |
+			      (CXL_DHI_AS_WARNING << 5),
+		.device_temp = { 0xDE, 0xAD},
+		.dirty_shutdown_cnt = { 0xde, 0xad, 0xbe, 0xef },
+		.cor_vol_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
+		.cor_per_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
+	}
+};
+
 u32 cxl_mock_add_event_logs(struct cxl_dev_state *cxlds)
 {
 	struct device *dev = cxlds->dev;
@@ -204,9 +268,15 @@ u32 cxl_mock_add_event_logs(struct cxl_dev_state *cxlds)
 	}
 
 	event_store_add_event(mes, CXL_EVENT_TYPE_INFO, &maint_needed);
+	event_store_add_event(mes, CXL_EVENT_TYPE_INFO,
+			      (struct cxl_event_record_raw *)&gen_media);
+	event_store_add_event(mes, CXL_EVENT_TYPE_INFO,
+			      (struct cxl_event_record_raw *)&mem_module);
 	mes->ev_status |= CXLDEV_EVENT_STATUS_INFO;
 
 	event_store_add_event(mes, CXL_EVENT_TYPE_FATAL, &hardware_replace);
+	event_store_add_event(mes, CXL_EVENT_TYPE_FATAL,
+			      (struct cxl_event_record_raw *)&dram);
 	mes->ev_status |= CXLDEV_EVENT_STATUS_FATAL;
 
 	return mes->ev_status;
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 11/11] cxl/test: Simulate event log overflow
  2022-11-10 18:57 [PATCH 00/11] CXL: Process event logs ira.weiny
                   ` (9 preceding siblings ...)
  2022-11-10 18:57 ` [PATCH 10/11] cxl/test: Add specific events ira.weiny
@ 2022-11-10 18:57 ` ira.weiny
  2022-11-16 16:10   ` Jonathan Cameron
  10 siblings, 1 reply; 50+ messages in thread
From: ira.weiny @ 2022-11-10 18:57 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ira Weiny, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Jonathan Cameron, Davidlohr Bueso, linux-kernel,
	linux-cxl

From: Ira Weiny <ira.weiny@intel.com>

Log overflow is marked by a separate trace message.

Simulate a log with many records and flag overflow until the log has
been partially drained.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from RFC
	Adjust for new struct changes
---
 tools/testing/cxl/test/events.c | 37 +++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/tools/testing/cxl/test/events.c b/tools/testing/cxl/test/events.c
index 8693f3fb9cbb..5ce257114f4e 100644
--- a/tools/testing/cxl/test/events.c
+++ b/tools/testing/cxl/test/events.c
@@ -69,11 +69,21 @@ static void event_store_add_event(struct mock_event_store *mes,
 	log->nr_events++;
 }
 
+static u16 log_overflow(struct mock_event_log *log)
+{
+	int cnt = log_rec_left(log) - 5;
+
+	if (cnt < 0)
+		return 0;
+	return cnt;
+}
+
 int mock_get_event(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
 {
 	struct cxl_get_event_payload *pl;
 	struct mock_event_log *log;
 	u8 log_type;
+	u16 nr_overflow;
 
 	/* Valid request? */
 	if (cmd->size_in != sizeof(log_type))
@@ -95,6 +105,20 @@ int mock_get_event(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
 	if (log_rec_left(log) > 1)
 		pl->flags |= CXL_GET_EVENT_FLAG_MORE_RECORDS;
 
+	nr_overflow = log_overflow(log);
+	if (nr_overflow) {
+		u64 ns;
+
+		pl->flags |= CXL_GET_EVENT_FLAG_OVERFLOW;
+		pl->overflow_err_count = cpu_to_le16(nr_overflow);
+		ns = ktime_get_real_ns();
+		ns -= 5000000000; /* 5s ago */
+		pl->first_overflow_timestamp = cpu_to_le64(ns);
+		ns = ktime_get_real_ns();
+		ns -= 1000000000; /* 1s ago */
+		pl->last_overflow_timestamp = cpu_to_le64(ns);
+	}
+
 	memcpy(&pl->record[0], get_cur_event(log), sizeof(pl->record[0]));
 	pl->record[0].hdr.handle = get_cur_event_handle(log);
 	return 0;
@@ -274,6 +298,19 @@ u32 cxl_mock_add_event_logs(struct cxl_dev_state *cxlds)
 			      (struct cxl_event_record_raw *)&mem_module);
 	mes->ev_status |= CXLDEV_EVENT_STATUS_INFO;
 
+	event_store_add_event(mes, CXL_EVENT_TYPE_FAIL, &maint_needed);
+	event_store_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace);
+	event_store_add_event(mes, CXL_EVENT_TYPE_FAIL,
+			      (struct cxl_event_record_raw *)&dram);
+	event_store_add_event(mes, CXL_EVENT_TYPE_FAIL,
+			      (struct cxl_event_record_raw *)&gen_media);
+	event_store_add_event(mes, CXL_EVENT_TYPE_FAIL,
+			      (struct cxl_event_record_raw *)&mem_module);
+	event_store_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace);
+	event_store_add_event(mes, CXL_EVENT_TYPE_FAIL,
+			      (struct cxl_event_record_raw *)&dram);
+	mes->ev_status |= CXLDEV_EVENT_STATUS_FAIL;
+
 	event_store_add_event(mes, CXL_EVENT_TYPE_FATAL, &hardware_replace);
 	event_store_add_event(mes, CXL_EVENT_TYPE_FATAL,
 			      (struct cxl_event_record_raw *)&dram);
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread
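The overflow simulation above reports an overflow count only while more than five records remain unread: with N records left it claims N - 5 overflowed, and the count drops back to zero once the log is drained below that threshold. The arithmetic, as a small sketch mirroring log_overflow() (not the kernel code):

```c
#include <assert.h>

/* Records count as "overflowed" while more than this many remain unread */
#define OVERFLOW_SLACK	5

static unsigned int mock_overflow_count(int records_left)
{
	int cnt = records_left - OVERFLOW_SLACK;

	return cnt < 0 ? 0 : (unsigned int)cnt;
}
```

With the seven records queued in the FAIL log above, the first Get Event reports two overflowed records plus the overflow flag and timestamps; after two records are cleared the overflow indication disappears, which is the drain behavior the test exercises.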

* Re: [PATCH 01/11] cxl/pci: Add generic MSI-X/MSI irq support
  2022-11-10 18:57 ` [PATCH 01/11] cxl/pci: Add generic MSI-X/MSI irq support ira.weiny
@ 2022-11-15 21:41   ` Dave Jiang
  2022-11-16 14:53   ` Jonathan Cameron
  1 sibling, 0 replies; 50+ messages in thread
From: Dave Jiang @ 2022-11-15 21:41 UTC (permalink / raw)
  To: ira.weiny, Dan Williams
  Cc: Davidlohr Bueso, Bjorn Helgaas, Jonathan Cameron,
	Alison Schofield, Vishal Verma, Ben Widawsky, Steven Rostedt,
	linux-kernel, linux-cxl



On 11/10/2022 10:57 AM, ira.weiny@intel.com wrote:
> From: Davidlohr Bueso <dave@stgolabs.net>
> 
> Currently the only CXL features targeted for irq support require their
> message numbers to be within the first 16 entries.  The device may
> however support less than 16 entries depending on the support it
> provides.
> 
> Attempt to allocate these 16 irq vectors.  If the device supports fewer,
> the PCI infrastructure will allocate that number.  Store the number
> of vectors actually allocated in the device state for later use
> by individual functions.
> 
> Upon successful allocation, users can plug in their respective ISRs at
> any point thereafter; for example, when the irq setup is not done in the
> PCI driver, as is the case for the CXL-PMU.
> 
> Cc: Bjorn Helgaas <helgaas@kernel.org>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Co-developed-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> 
> ---
> Changes from Ira
> 	Remove reviews
> 	Allocate up to a static 16 vectors.
> 	Change cover letter
> ---
>   drivers/cxl/cxlmem.h |  3 +++
>   drivers/cxl/cxlpci.h |  6 ++++++
>   drivers/cxl/pci.c    | 32 ++++++++++++++++++++++++++++++++
>   3 files changed, 41 insertions(+)
> 
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 88e3a8e54b6a..b7b955ded3ac 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -211,6 +211,7 @@ struct cxl_endpoint_dvsec_info {
>    * @info: Cached DVSEC information about the device.
>    * @serial: PCIe Device Serial Number
>    * @doe_mbs: PCI DOE mailbox array
> + * @nr_irq_vecs: Number of MSI-X/MSI vectors available
>    * @mbox_send: @dev specific transport for transmitting mailbox commands
>    *
>    * See section 8.2.9.5.2 Capacity Configuration and Label Storage for
> @@ -247,6 +248,8 @@ struct cxl_dev_state {
>   
>   	struct xarray doe_mbs;
>   
> +	int nr_irq_vecs;
> +
>   	int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
>   };
>   
> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> index eec597dbe763..b7f4e2f417d3 100644
> --- a/drivers/cxl/cxlpci.h
> +++ b/drivers/cxl/cxlpci.h
> @@ -53,6 +53,12 @@
>   #define	    CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK			GENMASK(15, 8)
>   #define     CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK		GENMASK(31, 16)
>   
> +/*
> + * NOTE: Currently all the functions which are enabled for CXL require their
> + * vectors to be in the first 16.  Use this as the max.
> + */
> +#define CXL_PCI_REQUIRED_VECTORS 16
> +
>   /* Register Block Identifier (RBI) */
>   enum cxl_regloc_type {
>   	CXL_REGLOC_RBI_EMPTY = 0,
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index faeb5d9d7a7a..62e560063e50 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -428,6 +428,36 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
>   	}
>   }
>   
> +static void cxl_pci_free_irq_vectors(void *data)
> +{
> +	pci_free_irq_vectors(data);
> +}
> +
> +static void cxl_pci_alloc_irq_vectors(struct cxl_dev_state *cxlds)
> +{
> +	struct device *dev = cxlds->dev;
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +	int nvecs;
> +	int rc;
> +
> +	nvecs = pci_alloc_irq_vectors(pdev, 1, CXL_PCI_REQUIRED_VECTORS,
> +				   PCI_IRQ_MSIX | PCI_IRQ_MSI);
> +	if (nvecs < 0) {
> +		dev_dbg(dev, "Not enough interrupts; use polling instead.\n");
> +		return;
> +	}
> +
> +	rc = devm_add_action_or_reset(dev, cxl_pci_free_irq_vectors, pdev);
> +	if (rc) {
> +		dev_dbg(dev, "Device managed call failed; interrupts disabled.\n");
> +		/* on failure devm_add_action_or_reset() has already
> +		 * called the action and freed the vectors */
> +		return;
> +	}
> +
> +	cxlds->nr_irq_vecs = nvecs;
> +}
> +
>   static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>   {
>   	struct cxl_register_map map;
> @@ -494,6 +524,8 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>   	if (rc)
>   		return rc;
>   
> +	cxl_pci_alloc_irq_vectors(cxlds);
> +
>   	cxlmd = devm_cxl_add_memdev(cxlds);
>   	if (IS_ERR(cxlmd))
>   		return PTR_ERR(cxlmd);


* Re: [PATCH 02/11] cxl/mem: Implement Get Event Records command
  2022-11-10 18:57 ` [PATCH 02/11] cxl/mem: Implement Get Event Records command ira.weiny
@ 2022-11-15 21:54   ` Dave Jiang
  2022-11-16 15:19   ` Jonathan Cameron
  1 sibling, 0 replies; 50+ messages in thread
From: Dave Jiang @ 2022-11-15 21:54 UTC (permalink / raw)
  To: ira.weiny, Dan Williams
  Cc: Steven Rostedt, Alison Schofield, Vishal Verma, Ben Widawsky,
	Jonathan Cameron, Davidlohr Bueso, linux-kernel, linux-cxl



On 11/10/2022 10:57 AM, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> CXL devices have multiple event logs which can be queried for CXL event
> records.  Devices are required to support the storage of at least one
> event record in each event log type.
> 
> Devices track event log overflow by incrementing a counter and tracking
> the time of the first and last overflow event seen.
> 
> Software queries events via the Get Event Record mailbox command; CXL
> rev 3.0 section 8.2.9.2.2.
> 
> Issue the Get Event Record mailbox command on driver load.  Trace each
> record found with a generic record trace.  Trace any overflow
> conditions.
> 
> The device can return up to 1MB worth of event records per query.  This
> complicates allocating a buffer large enough to capture all the
> records.  These event logs are not anticipated to be very deep, and
> reading them does not need to be performant.  Process only 3 records
> at a time.  3 records was chosen because it fits comfortably on the
> stack, avoiding dynamic allocation while still cutting down on extra
> mailbox messages.
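The bounded 3-records-per-query loop can be modeled in userspace C. The `mock_*` names below are illustrative stand-ins for the Get Event Records mailbox command and a device-side log, not driver code:

```c
#include <assert.h>

#define CXL_GET_EVENT_NR_RECORDS	3
#define CXL_GET_EVENT_FLAG_MORE_RECORDS	(1 << 1)

/* Toy device model: number of records pending in one event log. */
struct mock_log { int remaining; };

/*
 * Hypothetical stand-in for the Get Event Records mailbox command: hands
 * back at most 3 records and sets the MORE flag while any remain.
 */
static int mock_get_event_records(struct mock_log *log, int *flags)
{
	int nr = log->remaining < CXL_GET_EVENT_NR_RECORDS ?
		 log->remaining : CXL_GET_EVENT_NR_RECORDS;

	log->remaining -= nr;
	*flags = log->remaining ? CXL_GET_EVENT_FLAG_MORE_RECORDS : 0;
	return nr;
}

/*
 * Drain the log the way the patch does: keep issuing the command while
 * the device reports more records pending.
 */
static int drain_log(struct mock_log *log)
{
	int total = 0, flags, nr;

	do {
		nr = mock_get_event_records(log, &flags);
		total += nr;	/* the driver would trace each record here */
	} while (flags & CXL_GET_EVENT_FLAG_MORE_RECORDS);

	return total;
}
```

With 7 pending records this takes three round trips (3 + 3 + 1), never needing more than a 3-record stack buffer.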
> 
> This patch traces a raw event record only and leaves the specific event
> record types to subsequent patches.
> 
> Macros are created to use for tracing the common CXL Event header
> fields.
> 
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Would it be cleaner to split the include/trace/events/cxl.h changes out
into its own patch?

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> 
> ---
> Change from RFC v2:
> 	Support reading 3 events at once.
> 	Reverse Jonathan's suggestion and check for positive number of
> 		records.  Because the record count may have been
> 		returned as something > 3 based on what the device
> 		thinks it can send back even though the core Linux mbox
> 		processing truncates the data.
> 	Alison and Dave Jiang
> 		Change header uuid type to uuid_t for better user space
> 		processing
> 	Smita
> 		Check status reg before reading log.
> 	Steven
> 		Prefix all trace points with 'cxl_'
> 		Use static branch <trace>_enabled() calls
> 	Jonathan
> 		s/CXL_EVENT_TYPE_INFO/0
> 		s/{first,last}/{first,last}_ts
> 		Remove Reserved field from header
> 		Fix header issue for cxl_event_log_type_str()
> 
> Change from RFC:
> 	Remove redundant error message in get event records loop
> 	s/EVENT_RECORD_DATA_LENGTH/CXL_EVENT_RECORD_DATA_LENGTH
> 	Use hdr_uuid for the header UUID field
> 	Use cxl_event_log_type_str() for the trace events
> 	Create macros for the header fields and common entries of each event
> 	Add reserved buffer output dump
> 	Report error if event query fails
> 	Remove unused record_cnt variable
> 	Steven - reorder overflow record
> 		Remove NOTE about checkpatch
> 	Jonathan
> 		check for exactly 1 record
> 		s/v3.0/rev 3.0
> 		Use 3 byte fields for 24bit fields
> 		Add 3.0 Maintenance Operation Class
> 		Add Dynamic Capacity log type
> 		Fix spelling
> 	Dave Jiang/Dan/Alison
> 		s/cxl-event/cxl
> 		trace/events/cxl-events => trace/events/cxl.h
> 		s/cxl_event_overflow/overflow
> 		s/cxl_event/generic_event
> ---
>   MAINTAINERS                  |   1 +
>   drivers/cxl/core/mbox.c      |  70 +++++++++++++++++++
>   drivers/cxl/cxl.h            |   8 +++
>   drivers/cxl/cxlmem.h         |  73 ++++++++++++++++++++
>   include/trace/events/cxl.h   | 127 +++++++++++++++++++++++++++++++++++
>   include/uapi/linux/cxl_mem.h |   1 +
>   6 files changed, 280 insertions(+)
>   create mode 100644 include/trace/events/cxl.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index ca063a504026..4b7c6e3055c6 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -5223,6 +5223,7 @@ M:	Dan Williams <dan.j.williams@intel.com>
>   L:	linux-cxl@vger.kernel.org
>   S:	Maintained
>   F:	drivers/cxl/
> +F:	include/trace/events/cxl.h
>   F:	include/uapi/linux/cxl_mem.h
>   
>   CONEXANT ACCESSRUNNER USB DRIVER
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 16176b9278b4..a908b95a7de4 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -7,6 +7,9 @@
>   #include <cxlmem.h>
>   #include <cxl.h>
>   
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/cxl.h>
> +
>   #include "core.h"
>   
>   static bool cxl_raw_allow_all;
> @@ -48,6 +51,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
>   	CXL_CMD(RAW, CXL_VARIABLE_PAYLOAD, CXL_VARIABLE_PAYLOAD, 0),
>   #endif
>   	CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
> +	CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
>   	CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
>   	CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
>   	CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
> @@ -704,6 +708,72 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
>   }
>   EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
>   
> +static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> +				    enum cxl_event_log_type type)
> +{
> +	struct cxl_get_event_payload payload;
> +	u16 pl_nr;
> +
> +	do {
> +		u8 log_type = type;
> +		int rc;
> +
> +		rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVENT_RECORD,
> +				       &log_type, sizeof(log_type),
> +				       &payload, sizeof(payload));
> +		if (rc) {
> +			dev_err(cxlds->dev, "Event log '%s': Failed to query event records : %d",
> +				cxl_event_log_type_str(type), rc);
> +			return;
> +		}
> +
> +		pl_nr = le16_to_cpu(payload.record_count);
> +		if (trace_cxl_generic_event_enabled()) {
> +			u16 nr_rec = min_t(u16, pl_nr, CXL_GET_EVENT_NR_RECORDS);
> +			int i;
> +
> +			for (i = 0; i < nr_rec; i++)
> +				trace_cxl_generic_event(dev_name(cxlds->dev),
> +							type,
> +							&payload.record[i]);
> +		}
> +
> +		if (trace_cxl_overflow_enabled() &&
> +		    (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW))
> +			trace_cxl_overflow(dev_name(cxlds->dev), type, &payload);
> +
> +	} while (pl_nr > CXL_GET_EVENT_NR_RECORDS ||
> +		 payload.flags & CXL_GET_EVENT_FLAG_MORE_RECORDS);
> +}
> +
> +/**
> + * cxl_mem_get_event_records - Get Event Records from the device
> + * @cxlds: The device data for the operation
> + *
> + * Retrieve all event records available on the device and report them as trace
> + * events.
> + *
> + * See CXL rev 3.0 @8.2.9.2.2 Get Event Records
> + */
> +void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
> +{
> +	u32 status = readl(cxlds->regs.status + CXLDEV_DEV_EVENT_STATUS_OFFSET);
> +
> +	dev_dbg(cxlds->dev, "Reading event logs: %x\n", status);
> +
> +	if (status & CXLDEV_EVENT_STATUS_INFO)
> +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_INFO);
> +	if (status & CXLDEV_EVENT_STATUS_WARN)
> +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_WARN);
> +	if (status & CXLDEV_EVENT_STATUS_FAIL)
> +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FAIL);
> +	if (status & CXLDEV_EVENT_STATUS_FATAL)
> +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FATAL);
> +	if (status & CXLDEV_EVENT_STATUS_DYNAMIC_CAP)
> +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_DYNAMIC_CAP);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, CXL);
> +
>   /**
>    * cxl_mem_get_partition_info - Get partition info
>    * @cxlds: The device data for the operation
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index f680450f0b16..492cff1bea6d 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -132,6 +132,14 @@ static inline int ways_to_cxl(unsigned int ways, u8 *iw)
>   #define CXLDEV_CAP_CAP_ID_SECONDARY_MAILBOX 0x3
>   #define CXLDEV_CAP_CAP_ID_MEMDEV 0x4000
>   
> +/* CXL 3.0 8.2.8.3.1 Event Status Register */
> +#define CXLDEV_DEV_EVENT_STATUS_OFFSET		0x00
> +#define CXLDEV_EVENT_STATUS_INFO		BIT(0)
> +#define CXLDEV_EVENT_STATUS_WARN		BIT(1)
> +#define CXLDEV_EVENT_STATUS_FAIL		BIT(2)
> +#define CXLDEV_EVENT_STATUS_FATAL		BIT(3)
> +#define CXLDEV_EVENT_STATUS_DYNAMIC_CAP		BIT(4)
> +
>   /* CXL 2.0 8.2.8.4 Mailbox Registers */
>   #define CXLDEV_MBOX_CAPS_OFFSET 0x00
>   #define   CXLDEV_MBOX_CAP_PAYLOAD_SIZE_MASK GENMASK(4, 0)
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index b7b955ded3ac..da64ba0f156b 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -4,6 +4,7 @@
>   #define __CXL_MEM_H__
>   #include <uapi/linux/cxl_mem.h>
>   #include <linux/cdev.h>
> +#include <linux/uuid.h>
>   #include "cxl.h"
>   
>   /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
> @@ -256,6 +257,7 @@ struct cxl_dev_state {
>   enum cxl_opcode {
>   	CXL_MBOX_OP_INVALID		= 0x0000,
>   	CXL_MBOX_OP_RAW			= CXL_MBOX_OP_INVALID,
> +	CXL_MBOX_OP_GET_EVENT_RECORD	= 0x0100,
>   	CXL_MBOX_OP_GET_FW_INFO		= 0x0200,
>   	CXL_MBOX_OP_ACTIVATE_FW		= 0x0202,
>   	CXL_MBOX_OP_GET_SUPPORTED_LOGS	= 0x0400,
> @@ -325,6 +327,76 @@ struct cxl_mbox_identify {
>   	u8 qos_telemetry_caps;
>   } __packed;
>   
> +/*
> + * Common Event Record Format
> + * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
> + */
> +struct cxl_event_record_hdr {
> +	uuid_t id;
> +	u8 length;
> +	u8 flags[3];
> +	__le16 handle;
> +	__le16 related_handle;
> +	__le64 timestamp;
> +	u8 maint_op_class;
> +	u8 reserved[0xf];
> +} __packed;
> +
> +#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
> +struct cxl_event_record_raw {
> +	struct cxl_event_record_hdr hdr;
> +	u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
> +} __packed;
> +
> +/*
> + * Get Event Records output payload
> + * CXL rev 3.0 section 8.2.9.2.2; Table 8-50
> + */
> +#define CXL_GET_EVENT_FLAG_OVERFLOW		BIT(0)
> +#define CXL_GET_EVENT_FLAG_MORE_RECORDS		BIT(1)
> +#define CXL_GET_EVENT_NR_RECORDS		3
> +struct cxl_get_event_payload {
> +	u8 flags;
> +	u8 reserved1;
> +	__le16 overflow_err_count;
> +	__le64 first_overflow_timestamp;
> +	__le64 last_overflow_timestamp;
> +	__le16 record_count;
> +	u8 reserved2[0xa];
> +	struct cxl_event_record_raw record[CXL_GET_EVENT_NR_RECORDS];
> +} __packed;
> +
> +/*
> + * CXL rev 3.0 section 8.2.9.2.2; Table 8-49
> + */
> +enum cxl_event_log_type {
> +	CXL_EVENT_TYPE_INFO = 0x00,
> +	CXL_EVENT_TYPE_WARN,
> +	CXL_EVENT_TYPE_FAIL,
> +	CXL_EVENT_TYPE_FATAL,
> +	CXL_EVENT_TYPE_DYNAMIC_CAP,
> +	CXL_EVENT_TYPE_MAX
> +};
> +
> +static inline const char *cxl_event_log_type_str(enum cxl_event_log_type type)
> +{
> +	switch (type) {
> +	case CXL_EVENT_TYPE_INFO:
> +		return "Informational";
> +	case CXL_EVENT_TYPE_WARN:
> +		return "Warning";
> +	case CXL_EVENT_TYPE_FAIL:
> +		return "Failure";
> +	case CXL_EVENT_TYPE_FATAL:
> +		return "Fatal";
> +	case CXL_EVENT_TYPE_DYNAMIC_CAP:
> +		return "Dynamic Capacity";
> +	default:
> +		break;
> +	}
> +	return "<unknown>";
> +}
> +
>   struct cxl_mbox_get_partition_info {
>   	__le64 active_volatile_cap;
>   	__le64 active_persistent_cap;
> @@ -384,6 +456,7 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds);
>   struct cxl_dev_state *cxl_dev_state_create(struct device *dev);
>   void set_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
>   void clear_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
> +void cxl_mem_get_event_records(struct cxl_dev_state *cxlds);
>   #ifdef CONFIG_CXL_SUSPEND
>   void cxl_mem_active_inc(void);
>   void cxl_mem_active_dec(void);
> diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
> new file mode 100644
> index 000000000000..60dec9a84918
> --- /dev/null
> +++ b/include/trace/events/cxl.h
> @@ -0,0 +1,127 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM cxl
> +
> +#if !defined(_CXL_TRACE_EVENTS_H) ||  defined(TRACE_HEADER_MULTI_READ)
> +#define _CXL_TRACE_EVENTS_H
> +
> +#include <asm-generic/unaligned.h>
> +#include <linux/tracepoint.h>
> +#include <cxlmem.h>
> +
> +TRACE_EVENT(cxl_overflow,
> +
> +	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> +		 struct cxl_get_event_payload *payload),
> +
> +	TP_ARGS(dev_name, log, payload),
> +
> +	TP_STRUCT__entry(
> +		__string(dev_name, dev_name)
> +		__field(int, log)
> +		__field(u64, first_ts)
> +		__field(u64, last_ts)
> +		__field(u16, count)
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(dev_name, dev_name);
> +		__entry->log = log;
> +		__entry->count = le16_to_cpu(payload->overflow_err_count);
> +		__entry->first_ts = le64_to_cpu(payload->first_overflow_timestamp);
> +		__entry->last_ts = le64_to_cpu(payload->last_overflow_timestamp);
> +	),
> +
> +	TP_printk("%s: EVENT LOG OVERFLOW log=%s : %u records from %llu to %llu",
> +		__get_str(dev_name), cxl_event_log_type_str(__entry->log),
> +		__entry->count, __entry->first_ts, __entry->last_ts)
> +
> +);
> +
> +/*
> + * Common Event Record Format
> + * CXL 3.0 section 8.2.9.2.1; Table 8-42
> + */
> +#define CXL_EVENT_RECORD_FLAG_PERMANENT		BIT(2)
> +#define CXL_EVENT_RECORD_FLAG_MAINT_NEEDED	BIT(3)
> +#define CXL_EVENT_RECORD_FLAG_PERF_DEGRADED	BIT(4)
> +#define CXL_EVENT_RECORD_FLAG_HW_REPLACE	BIT(5)
> +#define show_hdr_flags(flags)	__print_flags(flags, " | ",			   \
> +	{ CXL_EVENT_RECORD_FLAG_PERMANENT,	"Permanent Condition"		}, \
> +	{ CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,	"Maintenance Needed"		}, \
> +	{ CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,	"Performance Degraded"		}, \
> +	{ CXL_EVENT_RECORD_FLAG_HW_REPLACE,	"Hardware Replacement Needed"	}  \
> +)
> +
> +/*
> + * Define macros for the common header of each CXL event.
> + *
> + * Tracepoints using these macros must do 3 things:
> + *
> + *	1) Add CXL_EVT_TP_entry to TP_STRUCT__entry
> + *	2) Use CXL_EVT_TP_fast_assign within TP_fast_assign;
> + *	   pass the dev_name, log, and CXL event header
> + *	3) Use CXL_EVT_TP_printk() instead of TP_printk()
> + *
> + * See the generic_event tracepoint as an example.
> + */
> +#define CXL_EVT_TP_entry					\
> +	__string(dev_name, dev_name)				\
> +	__field(int, log)					\
> +	__field_struct(uuid_t, hdr_uuid)			\
> +	__field(u32, hdr_flags)					\
> +	__field(u16, hdr_handle)				\
> +	__field(u16, hdr_related_handle)			\
> +	__field(u64, hdr_timestamp)				\
> +	__field(u8, hdr_length)					\
> +	__field(u8, hdr_maint_op_class)
> +
> +#define CXL_EVT_TP_fast_assign(dname, l, hdr)					\
> +	__assign_str(dev_name, (dname));					\
> +	__entry->log = (l);							\
> +	memcpy(&__entry->hdr_uuid, &(hdr).id, sizeof(uuid_t));			\
> +	__entry->hdr_length = (hdr).length;					\
> +	__entry->hdr_flags = get_unaligned_le24((hdr).flags);			\
> +	__entry->hdr_handle = le16_to_cpu((hdr).handle);			\
> +	__entry->hdr_related_handle = le16_to_cpu((hdr).related_handle);	\
> +	__entry->hdr_timestamp = le64_to_cpu((hdr).timestamp);			\
> +	__entry->hdr_maint_op_class = (hdr).maint_op_class
> +
> +
> +#define CXL_EVT_TP_printk(fmt, ...) \
> +	TP_printk("%s log=%s : time=%llu uuid=%pUb len=%d flags='%s' "		\
> +		"handle=%x related_handle=%x maint_op_class=%u"			\
> +		" : " fmt,							\
> +		__get_str(dev_name), cxl_event_log_type_str(__entry->log),	\
> +		__entry->hdr_timestamp, &__entry->hdr_uuid, __entry->hdr_length,\
> +		show_hdr_flags(__entry->hdr_flags), __entry->hdr_handle,	\
> +		__entry->hdr_related_handle, __entry->hdr_maint_op_class,	\
> +		##__VA_ARGS__)
> +
> +TRACE_EVENT(cxl_generic_event,
> +
> +	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> +		 struct cxl_event_record_raw *rec),
> +
> +	TP_ARGS(dev_name, log, rec),
> +
> +	TP_STRUCT__entry(
> +		CXL_EVT_TP_entry
> +		__array(u8, data, CXL_EVENT_RECORD_DATA_LENGTH)
> +	),
> +
> +	TP_fast_assign(
> +		CXL_EVT_TP_fast_assign(dev_name, log, rec->hdr);
> +		memcpy(__entry->data, &rec->data, CXL_EVENT_RECORD_DATA_LENGTH);
> +	),
> +
> +	CXL_EVT_TP_printk("%s",
> +		__print_hex(__entry->data, CXL_EVENT_RECORD_DATA_LENGTH))
> +);
> +
> +#endif /* _CXL_TRACE_EVENTS_H */
> +
> +/* This part must be outside protection */
> +#undef TRACE_INCLUDE_FILE
> +#define TRACE_INCLUDE_FILE cxl
> +#include <trace/define_trace.h>
> diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
> index c71021a2a9ed..70459be5bdd4 100644
> --- a/include/uapi/linux/cxl_mem.h
> +++ b/include/uapi/linux/cxl_mem.h
> @@ -24,6 +24,7 @@
>   	___C(IDENTIFY, "Identify Command"),                               \
>   	___C(RAW, "Raw device command"),                                  \
>   	___C(GET_SUPPORTED_LOGS, "Get Supported Logs"),                   \
> +	___C(GET_EVENT_RECORD, "Get Event Record"),                       \
>   	___C(GET_FW_INFO, "Get FW Info"),                                 \
>   	___C(GET_PARTITION_INFO, "Get Partition Information"),            \
>   	___C(GET_LSA, "Get Label Storage Area"),                          \


* Re: [PATCH 03/11] cxl/mem: Implement Clear Event Records command
  2022-11-10 18:57 ` [PATCH 03/11] cxl/mem: Implement Clear " ira.weiny
@ 2022-11-15 22:09   ` Dave Jiang
  2022-11-16 15:24   ` Jonathan Cameron
  1 sibling, 0 replies; 50+ messages in thread
From: Dave Jiang @ 2022-11-15 22:09 UTC (permalink / raw)
  To: ira.weiny, Dan Williams
  Cc: Jonathan Cameron, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl



On 11/10/2022 10:57 AM, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> CXL rev 3.0 section 8.2.9.2.3 defines the Clear Event Records mailbox
> command.  After an event record is read it needs to be cleared from the
> event log.
> 
> Implement cxl_clear_event_record() and call it for each record retrieved
> from the device.
> 
> Each record is cleared individually by handle.  A 'clear all' bit is
> specified, but events could arrive between a get and a final 'clear
> all' operation and would then be lost.  Therefore each event is
> cleared explicitly.
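A rough host-endian sketch of building the per-handle clear payload follows. `mock_clear_payload` is a made-up struct mirroring the patch's `cxl_mbox_clear_event_payload` layout; the real driver uses `__le16` handles and endian conversion, elided here:

```c
#include <assert.h>
#include <string.h>

#define CXL_GET_EVENT_NR_RECORDS 3

/* Host-endian mirror of the patch's clear payload layout. */
struct mock_clear_payload {
	unsigned char event_log;	/* enum cxl_event_log_type */
	unsigned char clear_flags;
	unsigned char nr_recs;
	unsigned char reserved[3];
	unsigned short handle[CXL_GET_EVENT_NR_RECORDS];
};

/*
 * Clear by explicit handle rather than the 'clear all' flag, so records
 * that arrive between the Get and the Clear are never silently dropped.
 */
static void build_clear_payload(struct mock_clear_payload *pl,
				unsigned char log,
				const unsigned short *handles, int nr)
{
	memset(pl, 0, sizeof(*pl));
	pl->event_log = log;
	pl->nr_recs = nr;
	for (int i = 0; i < nr; i++)
		pl->handle[i] = handles[i];
}
```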
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> 
> ---
> Changes from RFC:
> 	Jonathan
> 		Clean up init of payload and use return code.
> 		Also report any error to clear the event.
> 		s/v3.0/rev 3.0
> ---
>   drivers/cxl/core/mbox.c      | 46 ++++++++++++++++++++++++++++++------
>   drivers/cxl/cxlmem.h         | 15 ++++++++++++
>   include/uapi/linux/cxl_mem.h |  1 +
>   3 files changed, 55 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index a908b95a7de4..f46558e09f08 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -52,6 +52,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
>   #endif
>   	CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
>   	CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
> +	CXL_CMD(CLEAR_EVENT_RECORD, CXL_VARIABLE_PAYLOAD, 0, 0),
>   	CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
>   	CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
>   	CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
> @@ -708,6 +709,27 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
>   }
>   EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
>   
> +static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
> +				  enum cxl_event_log_type log,
> +				  struct cxl_get_event_payload *get_pl, u16 nr)
> +{
> +	struct cxl_mbox_clear_event_payload payload = {
> +		.event_log = log,
> +		.nr_recs = nr,
> +	};
> +	int i;
> +
> +	for (i = 0; i < nr; i++) {
> +		payload.handle[i] = get_pl->record[i].hdr.handle;
> +		dev_dbg(cxlds->dev, "Event log '%s': Clearing %u\n",
> +			cxl_event_log_type_str(log),
> +			le16_to_cpu(payload.handle[i]));
> +	}
> +
> +	return cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_CLEAR_EVENT_RECORD,
> +				 &payload, sizeof(payload), NULL, 0);
> +}
> +
>   static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
>   				    enum cxl_event_log_type type)
>   {
> @@ -728,14 +750,23 @@ static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
>   		}
>   
>   		pl_nr = le16_to_cpu(payload.record_count);
> -		if (trace_cxl_generic_event_enabled()) {
> +		if (pl_nr > 0) {
>   			u16 nr_rec = min_t(u16, pl_nr, CXL_GET_EVENT_NR_RECORDS);
>   			int i;
>   
> -			for (i = 0; i < nr_rec; i++)
> -				trace_cxl_generic_event(dev_name(cxlds->dev),
> -							type,
> -							&payload.record[i]);
> +			if (trace_cxl_generic_event_enabled()) {
> +				for (i = 0; i < nr_rec; i++)
> +					trace_cxl_generic_event(dev_name(cxlds->dev),
> +								type,
> +								&payload.record[i]);
> +			}
> +
> +			rc = cxl_clear_event_record(cxlds, type, &payload, nr_rec);
> +			if (rc) {
> +				dev_err(cxlds->dev, "Event log '%s': Failed to clear events : %d",
> +					cxl_event_log_type_str(type), rc);
> +				return;
> +			}
>   		}
>   
>   		if (trace_cxl_overflow_enabled() &&
> @@ -750,10 +781,11 @@ static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
>    * cxl_mem_get_event_records - Get Event Records from the device
>    * @cxlds: The device data for the operation
>    *
> - * Retrieve all event records available on the device and report them as trace
> - * events.
> + * Retrieve all event records available on the device, report them as trace
> + * events, and clear them.
>    *
>    * See CXL rev 3.0 @8.2.9.2.2 Get Event Records
> + * See CXL rev 3.0 @8.2.9.2.3 Clear Event Records
>    */
>   void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
>   {
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index da64ba0f156b..28a114c7cf69 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -258,6 +258,7 @@ enum cxl_opcode {
>   	CXL_MBOX_OP_INVALID		= 0x0000,
>   	CXL_MBOX_OP_RAW			= CXL_MBOX_OP_INVALID,
>   	CXL_MBOX_OP_GET_EVENT_RECORD	= 0x0100,
> +	CXL_MBOX_OP_CLEAR_EVENT_RECORD	= 0x0101,
>   	CXL_MBOX_OP_GET_FW_INFO		= 0x0200,
>   	CXL_MBOX_OP_ACTIVATE_FW		= 0x0202,
>   	CXL_MBOX_OP_GET_SUPPORTED_LOGS	= 0x0400,
> @@ -397,6 +398,20 @@ static inline const char *cxl_event_log_type_str(enum cxl_event_log_type type)
>   	return "<unknown>";
>   }
>   
> +/*
> + * Clear Event Records input payload
> + * CXL rev 3.0 section 8.2.9.2.3; Table 8-51
> + *
> + * Space given for 1 record
> + */
> +struct cxl_mbox_clear_event_payload {
> +	u8 event_log;		/* enum cxl_event_log_type */
> +	u8 clear_flags;
> +	u8 nr_recs;		/* 1 for this struct */
> +	u8 reserved[3];
> +	__le16 handle[CXL_GET_EVENT_NR_RECORDS];
> +};
> +
>   struct cxl_mbox_get_partition_info {
>   	__le64 active_volatile_cap;
>   	__le64 active_persistent_cap;
> diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
> index 70459be5bdd4..7c1ad8062792 100644
> --- a/include/uapi/linux/cxl_mem.h
> +++ b/include/uapi/linux/cxl_mem.h
> @@ -25,6 +25,7 @@
>   	___C(RAW, "Raw device command"),                                  \
>   	___C(GET_SUPPORTED_LOGS, "Get Supported Logs"),                   \
>   	___C(GET_EVENT_RECORD, "Get Event Record"),                       \
> +	___C(CLEAR_EVENT_RECORD, "Clear Event Record"),                   \
>   	___C(GET_FW_INFO, "Get FW Info"),                                 \
>   	___C(GET_PARTITION_INFO, "Get Partition Information"),            \
>   	___C(GET_LSA, "Get Label Storage Area"),                          \


* Re: [PATCH 04/11] cxl/mem: Clear events on driver load
  2022-11-10 18:57 ` [PATCH 04/11] cxl/mem: Clear events on driver load ira.weiny
@ 2022-11-15 22:10   ` Dave Jiang
  0 siblings, 0 replies; 50+ messages in thread
From: Dave Jiang @ 2022-11-15 22:10 UTC (permalink / raw)
  To: ira.weiny, Dan Williams
  Cc: Jonathan Cameron, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl



On 11/10/2022 10:57 AM, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> The information contained in the events prior to the driver loading can
> be queried at any time through other mailbox commands.
> 
> Ensure a clean slate of events by reading and clearing them.  The
> events are sent to the trace buffer, but no consumers are expected to
> be listening at driver load time.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>   drivers/cxl/pci.c            | 2 ++
>   tools/testing/cxl/test/mem.c | 2 ++
>   2 files changed, 4 insertions(+)
> 
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 62e560063e50..e0d511575b45 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -530,6 +530,8 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>   	if (IS_ERR(cxlmd))
>   		return PTR_ERR(cxlmd);
>   
> +	cxl_mem_get_event_records(cxlds);
> +
>   	if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
>   		rc = devm_cxl_add_nvdimm(&pdev->dev, cxlmd);
>   
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index aa2df3a15051..e2f5445d24ff 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -285,6 +285,8 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
>   	if (IS_ERR(cxlmd))
>   		return PTR_ERR(cxlmd);
>   
> +	cxl_mem_get_event_records(cxlds);
> +
>   	if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
>   		rc = devm_cxl_add_nvdimm(dev, cxlmd);
>   


* Re: [PATCH 05/11] cxl/mem: Trace General Media Event Record
  2022-11-10 18:57 ` [PATCH 05/11] cxl/mem: Trace General Media Event Record ira.weiny
@ 2022-11-15 22:25   ` Dave Jiang
  2022-11-16 15:31   ` Jonathan Cameron
  1 sibling, 0 replies; 50+ messages in thread
From: Dave Jiang @ 2022-11-15 22:25 UTC (permalink / raw)
  To: ira.weiny, Dan Williams
  Cc: Alison Schofield, Vishal Verma, Ben Widawsky, Steven Rostedt,
	Jonathan Cameron, Davidlohr Bueso, linux-kernel, linux-cxl



On 11/10/2022 10:57 AM, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> CXL rev 3.0 section 8.2.9.2.1.1 defines the General Media Event Record.
> 
> Determine if the event read is a general media record and if so trace
> the record as a General Media Event Record.
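The UUID-based dispatch can be sketched in standalone C. `mock_uuid_t` and `classify_record()` are illustrative names, not driver API; the byte array spells out the same Table 8-43 UUID the patch encodes with `UUID_INIT()`:

```c
#include <assert.h>
#include <string.h>

/* 16-byte record UUID, as in the Common Event Record header. */
typedef struct { unsigned char b[16]; } mock_uuid_t;

/* General Media Event Record UUID, CXL rev 3.0 Table 8-43. */
static const mock_uuid_t gen_media_uuid = { {
	0xfb, 0xcd, 0x0a, 0x77, 0xc2, 0x60, 0x41, 0x7f,
	0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6 } };

enum mock_trace { TRACE_GENERIC, TRACE_GEN_MEDIA };

/*
 * Dispatch on the header UUID, falling back to the generic trace for
 * record types the driver does not (yet) recognize.
 */
static enum mock_trace classify_record(const mock_uuid_t *id)
{
	if (!memcmp(id, &gen_media_uuid, sizeof(*id)))
		return TRACE_GEN_MEDIA;
	return TRACE_GENERIC;
}
```

Unknown record types thus still reach the trace buffer via the generic event, with only the common header decoded.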
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> 
> ---
> Changes from RFC v2:
> 	Output DPA flags as a single field
> 	Ensure names of fields match what TP_print outputs
> 	Steven
> 		prefix TRACE_EVENT with 'cxl_'
> 	Jonathan
> 		Remove Reserved field
> 
> Changes from RFC:
> 	Add reserved byte array
> 	Use common CXL event header record macros
> 	Jonathan
> 		Use unaligned_le{24,16} for unaligned fields
> 		Don't use the inverse of phy addr mask
> 	Dave Jiang
> 		s/cxl_gen_media_event/general_media
> 		s/cxl_evt_gen_media/cxl_event_gen_media
> ---
>   drivers/cxl/core/mbox.c    |  40 ++++++++++--
>   drivers/cxl/cxlmem.h       |  19 ++++++
>   include/trace/events/cxl.h | 124 +++++++++++++++++++++++++++++++++++++
>   3 files changed, 179 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index f46558e09f08..6d48fdb07700 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -709,6 +709,38 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
>   }
>   EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
>   
> +/*
> + * General Media Event Record
> + * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
> + */
> +static const uuid_t gen_media_event_uuid =
> +	UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
> +		  0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6);
> +
> +static bool cxl_event_tracing_enabled(void)
> +{
> +	return trace_cxl_generic_event_enabled() ||
> +	       trace_cxl_general_media_enabled();
> +}
> +
> +static void cxl_trace_event_record(const char *dev_name,
> +				   enum cxl_event_log_type type,
> +				   struct cxl_event_record_raw *record)
> +{
> +	uuid_t *id = &record->hdr.id;
> +
> +	if (uuid_equal(id, &gen_media_event_uuid)) {
> +		struct cxl_event_gen_media *rec =
> +				(struct cxl_event_gen_media *)record;
> +
> +		trace_cxl_general_media(dev_name, type, rec);
> +		return;
> +	}
> +
> +	/* For unknown record types print just the header */
> +	trace_cxl_generic_event(dev_name, type, record);
> +}
> +
>   static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
>   				  enum cxl_event_log_type log,
>   				  struct cxl_get_event_payload *get_pl, u16 nr)
> @@ -754,11 +786,11 @@ static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
>   			u16 nr_rec = min_t(u16, pl_nr, CXL_GET_EVENT_NR_RECORDS);
>   			int i;
>   
> -			if (trace_cxl_generic_event_enabled()) {
> +			if (cxl_event_tracing_enabled()) {
>   				for (i = 0; i < nr_rec; i++)
> -					trace_cxl_generic_event(dev_name(cxlds->dev),
> -								type,
> -								&payload.record[i]);
> +					cxl_trace_event_record(dev_name(cxlds->dev),
> +							       type,
> +							       &payload.record[i]);
>   			}
>   
>   			rc = cxl_clear_event_record(cxlds, type, &payload, nr_rec);
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 28a114c7cf69..86197f3168c7 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -412,6 +412,25 @@ struct cxl_mbox_clear_event_payload {
>   	__le16 handle[CXL_GET_EVENT_NR_RECORDS];
>   };
>   
> +/*
> + * General Media Event Record
> + * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
> + */
> +#define CXL_EVENT_GEN_MED_COMP_ID_SIZE	0x10
> +struct cxl_event_gen_media {
> +	struct cxl_event_record_hdr hdr;
> +	__le64 phys_addr;
> +	u8 descriptor;
> +	u8 type;
> +	u8 transaction_type;
> +	u8 validity_flags[2];
> +	u8 channel;
> +	u8 rank;
> +	u8 device[3];
> +	u8 component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
> +	u8 reserved[0x2e];
> +} __packed;
> +
>   struct cxl_mbox_get_partition_info {
>   	__le64 active_volatile_cap;
>   	__le64 active_persistent_cap;
> diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
> index 60dec9a84918..a0c20e110708 100644
> --- a/include/trace/events/cxl.h
> +++ b/include/trace/events/cxl.h
> @@ -119,6 +119,130 @@ TRACE_EVENT(cxl_generic_event,
>   		__print_hex(__entry->data, CXL_EVENT_RECORD_DATA_LENGTH))
>   );
>   
> +/*
> + * Physical Address field masks
> + *
> + * General Media Event Record
> + * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
> + *
> + * DRAM Event Record
> + * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
> + */
> +#define CXL_DPA_FLAGS_MASK			0x3F
> +#define CXL_DPA_MASK				(~CXL_DPA_FLAGS_MASK)
> +
> +#define CXL_DPA_VOLATILE			BIT(0)
> +#define CXL_DPA_NOT_REPAIRABLE			BIT(1)
> +#define show_dpa_flags(flags)	__print_flags(flags, "|",		   \
> +	{ CXL_DPA_VOLATILE,			"VOLATILE"		}, \
> +	{ CXL_DPA_NOT_REPAIRABLE,		"NOT_REPAIRABLE"	}  \
> +)
> +
> +/*
> + * General Media Event Record - GMER
> + * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
> + */
> +#define CXL_GMER_EVT_DESC_UNCORRECTABLE_EVENT		BIT(0)
> +#define CXL_GMER_EVT_DESC_THRESHOLD_EVENT		BIT(1)
> +#define CXL_GMER_EVT_DESC_POISON_LIST_OVERFLOW		BIT(2)
> +#define show_event_desc_flags(flags)	__print_flags(flags, "|",		   \
> +	{ CXL_GMER_EVT_DESC_UNCORRECTABLE_EVENT,	"Uncorrectable Event"	}, \
> +	{ CXL_GMER_EVT_DESC_THRESHOLD_EVENT,		"Threshold event"	}, \
> +	{ CXL_GMER_EVT_DESC_POISON_LIST_OVERFLOW,	"Poison List Overflow"	}  \
> +)
> +
> +#define CXL_GMER_MEM_EVT_TYPE_ECC_ERROR			0x00
> +#define CXL_GMER_MEM_EVT_TYPE_INV_ADDR			0x01
> +#define CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR		0x02
> +#define show_mem_event_type(type)	__print_symbolic(type,			\
> +	{ CXL_GMER_MEM_EVT_TYPE_ECC_ERROR,		"ECC Error" },		\
> +	{ CXL_GMER_MEM_EVT_TYPE_INV_ADDR,		"Invalid Address" },	\
> +	{ CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,	"Data Path Error" }	\
> +)
> +
> +#define CXL_GMER_TRANS_UNKNOWN				0x00
> +#define CXL_GMER_TRANS_HOST_READ			0x01
> +#define CXL_GMER_TRANS_HOST_WRITE			0x02
> +#define CXL_GMER_TRANS_HOST_SCAN_MEDIA			0x03
> +#define CXL_GMER_TRANS_HOST_INJECT_POISON		0x04
> +#define CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB		0x05
> +#define CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT	0x06
> +#define show_trans_type(type)	__print_symbolic(type,					\
> +	{ CXL_GMER_TRANS_UNKNOWN,			"Unknown" },			\
> +	{ CXL_GMER_TRANS_HOST_READ,			"Host Read" },			\
> +	{ CXL_GMER_TRANS_HOST_WRITE,			"Host Write" },			\
> +	{ CXL_GMER_TRANS_HOST_SCAN_MEDIA,		"Host Scan Media" },		\
> +	{ CXL_GMER_TRANS_HOST_INJECT_POISON,		"Host Inject Poison" },		\
> +	{ CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB,		"Internal Media Scrub" },	\
> +	{ CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT,	"Internal Media Management" }	\
> +)
> +
> +#define CXL_GMER_VALID_CHANNEL				BIT(0)
> +#define CXL_GMER_VALID_RANK				BIT(1)
> +#define CXL_GMER_VALID_DEVICE				BIT(2)
> +#define CXL_GMER_VALID_COMPONENT			BIT(3)
> +#define show_valid_flags(flags)	__print_flags(flags, "|",		   \
> +	{ CXL_GMER_VALID_CHANNEL,			"CHANNEL"	}, \
> +	{ CXL_GMER_VALID_RANK,				"RANK"		}, \
> +	{ CXL_GMER_VALID_DEVICE,			"DEVICE"	}, \
> +	{ CXL_GMER_VALID_COMPONENT,			"COMPONENT"	}  \
> +)
> +
> +TRACE_EVENT(cxl_general_media,
> +
> +	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> +		 struct cxl_event_gen_media *rec),
> +
> +	TP_ARGS(dev_name, log, rec),
> +
> +	TP_STRUCT__entry(
> +		CXL_EVT_TP_entry
> +		/* General Media */
> +		__field(u64, dpa)
> +		__field(u8, descriptor)
> +		__field(u8, type)
> +		__field(u8, transaction_type)
> +		__field(u8, channel)
> +		__field(u32, device)
> +		__array(u8, comp_id, CXL_EVENT_GEN_MED_COMP_ID_SIZE)
> +		__field(u16, validity_flags)
> +		/* Following are out of order to pack trace record */
> +		__field(u8, rank)
> +		__field(u8, dpa_flags)
> +	),
> +
> +	TP_fast_assign(
> +		CXL_EVT_TP_fast_assign(dev_name, log, rec->hdr);
> +
> +		/* General Media */
> +		__entry->dpa = le64_to_cpu(rec->phys_addr);
> +		__entry->dpa_flags = __entry->dpa & CXL_DPA_FLAGS_MASK;
> +		/* Mask after flags have been parsed */
> +		__entry->dpa &= CXL_DPA_MASK;
> +		__entry->descriptor = rec->descriptor;
> +		__entry->type = rec->type;
> +		__entry->transaction_type = rec->transaction_type;
> +		__entry->channel = rec->channel;
> +		__entry->rank = rec->rank;
> +		__entry->device = get_unaligned_le24(rec->device);
> +		memcpy(__entry->comp_id, &rec->component_id,
> +			CXL_EVENT_GEN_MED_COMP_ID_SIZE);
> +		__entry->validity_flags = get_unaligned_le16(&rec->validity_flags);
> +	),
> +
> +	CXL_EVT_TP_printk("dpa=%llx dpa_flags='%s' " \
> +		"descriptor='%s' type='%s' transaction_type='%s' channel=%u rank=%u " \
> +		"device=%x comp_id=%s validity_flags='%s'",
> +		__entry->dpa, show_dpa_flags(__entry->dpa_flags),
> +		show_event_desc_flags(__entry->descriptor),
> +		show_mem_event_type(__entry->type),
> +		show_trans_type(__entry->transaction_type),
> +		__entry->channel, __entry->rank, __entry->device,
> +		__print_hex(__entry->comp_id, CXL_EVENT_GEN_MED_COMP_ID_SIZE),
> +		show_valid_flags(__entry->validity_flags)
> +	)
> +);
> +
>   #endif /* _CXL_TRACE_EVENTS_H */
>   
>   /* This part must be outside protection */

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/11] cxl/mem: Trace DRAM Event Record
  2022-11-10 18:57 ` [PATCH 06/11] cxl/mem: Trace DRAM " ira.weiny
@ 2022-11-15 22:26   ` Dave Jiang
  0 siblings, 0 replies; 50+ messages in thread
From: Dave Jiang @ 2022-11-15 22:26 UTC (permalink / raw)
  To: ira.weiny, Dan Williams
  Cc: Jonathan Cameron, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl



On 11/10/2022 10:57 AM, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> CXL rev 3.0 section 8.2.9.2.1.2 defines the DRAM Event Record.
> 
> Determine if the event read is a DRAM event record and if so trace the
> record.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> 
> ---
> Changes from RFC v2:
> 	Output DPA flags as a separate field.
> 	Ensure field names match TP_print output
> 	Steven
> 		prefix TRACE_EVENT with 'cxl_'
> 	Jonathan
> 		Formatting fix
> 		Remove reserved field
> 
> Changes from RFC:
> 	Add reserved byte data
> 	Use new CXL header macros
> 	Jonathan
> 		Use get_unaligned_le{24,16}() for unaligned fields
> 		Use 'else if'
> 	Dave Jiang
> 		s/cxl_dram_event/dram
> 		s/cxl_evt_dram_rec/cxl_event_dram
> 	Adjust for new phys addr mask
> ---
>   drivers/cxl/core/mbox.c    | 16 ++++++-
>   drivers/cxl/cxlmem.h       | 23 ++++++++++
>   include/trace/events/cxl.h | 92 ++++++++++++++++++++++++++++++++++++++
>   3 files changed, 130 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 6d48fdb07700..b03d7b856f3d 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -717,10 +717,19 @@ static const uuid_t gen_media_event_uuid =
>   	UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
>   		  0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6);
>   
> +/*
> + * DRAM Event Record
> + * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
> + */
> +static const uuid_t dram_event_uuid =
> +	UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
> +		  0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24);
> +
>   static bool cxl_event_tracing_enabled(void)
>   {
>   	return trace_cxl_generic_event_enabled() ||
> -	       trace_cxl_general_media_enabled();
> +	       trace_cxl_general_media_enabled() ||
> +	       trace_cxl_dram_enabled();
>   }
>   
>   static void cxl_trace_event_record(const char *dev_name,
> @@ -735,6 +744,11 @@ static void cxl_trace_event_record(const char *dev_name,
>   
>   		trace_cxl_general_media(dev_name, type, rec);
>   		return;
> +	} else if (uuid_equal(id, &dram_event_uuid)) {
> +		struct cxl_event_dram *rec = (struct cxl_event_dram *)record;
> +
> +		trace_cxl_dram(dev_name, type, rec);
> +		return;
>   	}
>   
>   	/* For unknown record types print just the header */
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 86197f3168c7..87c877f0940d 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -431,6 +431,29 @@ struct cxl_event_gen_media {
>   	u8 reserved[0x2e];
>   } __packed;
>   
> +/*
> + * DRAM Event Record - DER
> + * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
> + */
> +#define CXL_EVENT_DER_CORRECTION_MASK_SIZE	0x20
> +struct cxl_event_dram {
> +	struct cxl_event_record_hdr hdr;
> +	__le64 phys_addr;
> +	u8 descriptor;
> +	u8 type;
> +	u8 transaction_type;
> +	u8 validity_flags[2];
> +	u8 channel;
> +	u8 rank;
> +	u8 nibble_mask[3];
> +	u8 bank_group;
> +	u8 bank;
> +	u8 row[3];
> +	u8 column[2];
> +	u8 correction_mask[CXL_EVENT_DER_CORRECTION_MASK_SIZE];
> +	u8 reserved[0x17];
> +} __packed;
> +
>   struct cxl_mbox_get_partition_info {
>   	__le64 active_volatile_cap;
>   	__le64 active_persistent_cap;
> diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
> index a0c20e110708..37bbe59905af 100644
> --- a/include/trace/events/cxl.h
> +++ b/include/trace/events/cxl.h
> @@ -243,6 +243,98 @@ TRACE_EVENT(cxl_general_media,
>   	)
>   );
>   
> +/*
> + * DRAM Event Record - DER
> + *
> + * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
> + */
> +/*
> + * DRAM Event Record defines many fields the same as the General Media Event
> + * Record.  Reuse those definitions as appropriate.
> + */
> +#define CXL_DER_VALID_CHANNEL				BIT(0)
> +#define CXL_DER_VALID_RANK				BIT(1)
> +#define CXL_DER_VALID_NIBBLE				BIT(2)
> +#define CXL_DER_VALID_BANK_GROUP			BIT(3)
> +#define CXL_DER_VALID_BANK				BIT(4)
> +#define CXL_DER_VALID_ROW				BIT(5)
> +#define CXL_DER_VALID_COLUMN				BIT(6)
> +#define CXL_DER_VALID_CORRECTION_MASK			BIT(7)
> +#define show_dram_valid_flags(flags)	__print_flags(flags, "|",			   \
> +	{ CXL_DER_VALID_CHANNEL,			"CHANNEL"		}, \
> +	{ CXL_DER_VALID_RANK,				"RANK"			}, \
> +	{ CXL_DER_VALID_NIBBLE,				"NIBBLE"		}, \
> +	{ CXL_DER_VALID_BANK_GROUP,			"BANK GROUP"		}, \
> +	{ CXL_DER_VALID_BANK,				"BANK"			}, \
> +	{ CXL_DER_VALID_ROW,				"ROW"			}, \
> +	{ CXL_DER_VALID_COLUMN,				"COLUMN"		}, \
> +	{ CXL_DER_VALID_CORRECTION_MASK,		"CORRECTION MASK"	}  \
> +)
> +
> +TRACE_EVENT(cxl_dram,
> +
> +	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> +		 struct cxl_event_dram *rec),
> +
> +	TP_ARGS(dev_name, log, rec),
> +
> +	TP_STRUCT__entry(
> +		CXL_EVT_TP_entry
> +		/* DRAM */
> +		__field(u64, dpa)
> +		__field(u8, descriptor)
> +		__field(u8, type)
> +		__field(u8, transaction_type)
> +		__field(u8, channel)
> +		__field(u16, validity_flags)
> +		__field(u16, column)	/* Out of order to pack trace record */
> +		__field(u32, nibble_mask)
> +		__field(u32, row)
> +		__array(u8, cor_mask, CXL_EVENT_DER_CORRECTION_MASK_SIZE)
> +		__field(u8, rank)	/* Out of order to pack trace record */
> +		__field(u8, bank_group)	/* Out of order to pack trace record */
> +		__field(u8, bank)	/* Out of order to pack trace record */
> +		__field(u8, dpa_flags)	/* Out of order to pack trace record */
> +	),
> +
> +	TP_fast_assign(
> +		CXL_EVT_TP_fast_assign(dev_name, log, rec->hdr);
> +
> +		/* DRAM */
> +		__entry->dpa = le64_to_cpu(rec->phys_addr);
> +		__entry->dpa_flags = __entry->dpa & CXL_DPA_FLAGS_MASK;
> +		__entry->dpa &= CXL_DPA_MASK;
> +		__entry->descriptor = rec->descriptor;
> +		__entry->type = rec->type;
> +		__entry->transaction_type = rec->transaction_type;
> +		__entry->validity_flags = get_unaligned_le16(rec->validity_flags);
> +		__entry->channel = rec->channel;
> +		__entry->rank = rec->rank;
> +		__entry->nibble_mask = get_unaligned_le24(rec->nibble_mask);
> +		__entry->bank_group = rec->bank_group;
> +		__entry->bank = rec->bank;
> +		__entry->row = get_unaligned_le24(rec->row);
> +		__entry->column = get_unaligned_le16(rec->column);
> +		memcpy(__entry->cor_mask, &rec->correction_mask,
> +			CXL_EVENT_DER_CORRECTION_MASK_SIZE);
> +	),
> +
> +	CXL_EVT_TP_printk("dpa=%llx dpa_flags='%s' descriptor='%s' type='%s' " \
> +		"transaction_type='%s' channel=%u rank=%u nibble_mask=%x " \
> +		"bank_group=%u bank=%u row=%u column=%u cor_mask=%s " \
> +		"validity_flags='%s'",
> +		__entry->dpa, show_dpa_flags(__entry->dpa_flags),
> +		show_event_desc_flags(__entry->descriptor),
> +		show_mem_event_type(__entry->type),
> +		show_trans_type(__entry->transaction_type),
> +		__entry->channel, __entry->rank, __entry->nibble_mask,
> +		__entry->bank_group, __entry->bank,
> +		__entry->row, __entry->column,
> +		__print_hex(__entry->cor_mask, CXL_EVENT_DER_CORRECTION_MASK_SIZE),
> +		show_dram_valid_flags(__entry->validity_flags)
> +	)
> +);
> +
>   #endif /* _CXL_TRACE_EVENTS_H */
>   
>   /* This part must be outside protection */


* Re: [PATCH 07/11] cxl/mem: Trace Memory Module Event Record
  2022-11-10 18:57 ` [PATCH 07/11] cxl/mem: Trace Memory Module " ira.weiny
@ 2022-11-15 22:39   ` Dave Jiang
  2022-11-16 15:35   ` Jonathan Cameron
  2022-11-22 22:36   ` Steven Rostedt
  2 siblings, 0 replies; 50+ messages in thread
From: Dave Jiang @ 2022-11-15 22:39 UTC (permalink / raw)
  To: ira.weiny, Dan Williams
  Cc: Alison Schofield, Vishal Verma, Ben Widawsky, Steven Rostedt,
	Jonathan Cameron, Davidlohr Bueso, linux-kernel, linux-cxl



On 11/10/2022 10:57 AM, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> CXL rev 3.0 section 8.2.9.2.1.3 defines the Memory Module Event Record.
> 
> Determine if the event read is memory module record and if so trace the
> record.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> 
> ---
> Changes from RFC v2:
> 	Ensure field names match TP_print output
> 	Steven
> 		prefix TRACE_EVENT with 'cxl_'
> 	Jonathan
> 		Remove reserved field
> 		Define a 1bit and 2 bit status decoder
> 		Fix paren alignment
> 
> Changes from RFC:
> 	Clean up spec reference
> 	Add reserved data
> 	Use new CXL header macros
> 	Jonathan
> 		Use else if
> 		Use get_unaligned_le*() for unaligned fields
> 	Dave Jiang
> 		s/cxl_mem_mod_event/memory_module
> 		s/cxl_evt_mem_mod_rec/cxl_event_mem_module
> ---
>   drivers/cxl/core/mbox.c    |  17 ++++-
>   drivers/cxl/cxlmem.h       |  26 +++++++
>   include/trace/events/cxl.h | 144 +++++++++++++++++++++++++++++++++++++
>   3 files changed, 186 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index b03d7b856f3d..879b228a98a0 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -725,11 +725,20 @@ static const uuid_t dram_event_uuid =
>   	UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
>   		  0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24);
>   
> +/*
> + * Memory Module Event Record
> + * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
> + */
> +static const uuid_t mem_mod_event_uuid =
> +	UUID_INIT(0xfe927475, 0xdd59, 0x4339,
> +		  0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74);
> +
>   static bool cxl_event_tracing_enabled(void)
>   {
>   	return trace_cxl_generic_event_enabled() ||
>   	       trace_cxl_general_media_enabled() ||
> -	       trace_cxl_dram_enabled();
> +	       trace_cxl_dram_enabled() ||
> +	       trace_cxl_memory_module_enabled();
>   }
>   
>   static void cxl_trace_event_record(const char *dev_name,
> @@ -749,6 +758,12 @@ static void cxl_trace_event_record(const char *dev_name,
>   
>   		trace_cxl_dram(dev_name, type, rec);
>   		return;
> +	} else if (uuid_equal(id, &mem_mod_event_uuid)) {
> +		struct cxl_event_mem_module *rec =
> +				(struct cxl_event_mem_module *)record;
> +
> +		trace_cxl_memory_module(dev_name, type, rec);
> +		return;
>   	}
>   
>   	/* For unknown record types print just the header */
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 87c877f0940d..03da4f8f74d3 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -454,6 +454,32 @@ struct cxl_event_dram {
>   	u8 reserved[0x17];
>   } __packed;
>   
> +/*
> + * Get Health Info Record
> + * CXL rev 3.0 section 8.2.9.8.3.1; Table 8-100
> + */
> +struct cxl_get_health_info {
> +	u8 health_status;
> +	u8 media_status;
> +	u8 add_status;
> +	u8 life_used;
> +	u8 device_temp[2];
> +	u8 dirty_shutdown_cnt[4];
> +	u8 cor_vol_err_cnt[4];
> +	u8 cor_per_err_cnt[4];
> +} __packed;
> +
> +/*
> + * Memory Module Event Record
> + * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
> + */
> +struct cxl_event_mem_module {
> +	struct cxl_event_record_hdr hdr;
> +	u8 event_type;
> +	struct cxl_get_health_info info;
> +	u8 reserved[0x3d];
> +} __packed;
> +
>   struct cxl_mbox_get_partition_info {
>   	__le64 active_volatile_cap;
>   	__le64 active_persistent_cap;
> diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
> index 37bbe59905af..05437e13a882 100644
> --- a/include/trace/events/cxl.h
> +++ b/include/trace/events/cxl.h
> @@ -335,6 +335,150 @@ TRACE_EVENT(cxl_dram,
>   	)
>   );
>   
> +/*
> + * Memory Module Event Record - MMER
> + *
> + * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
> + */
> +#define CXL_MMER_HEALTH_STATUS_CHANGE		0x00
> +#define CXL_MMER_MEDIA_STATUS_CHANGE		0x01
> +#define CXL_MMER_LIFE_USED_CHANGE		0x02
> +#define CXL_MMER_TEMP_CHANGE			0x03
> +#define CXL_MMER_DATA_PATH_ERROR		0x04
> +#define CXL_MMER_LSA_ERROR			0x05
> +#define show_dev_evt_type(type)	__print_symbolic(type,			   \
> +	{ CXL_MMER_HEALTH_STATUS_CHANGE,	"Health Status Change"	}, \
> +	{ CXL_MMER_MEDIA_STATUS_CHANGE,		"Media Status Change"	}, \
> +	{ CXL_MMER_LIFE_USED_CHANGE,		"Life Used Change"	}, \
> +	{ CXL_MMER_TEMP_CHANGE,			"Temperature Change"	}, \
> +	{ CXL_MMER_DATA_PATH_ERROR,		"Data Path Error"	}, \
> +	{ CXL_MMER_LSA_ERROR,			"LSA Error"		}  \
> +)
> +
> +/*
> + * Device Health Information - DHI
> + *
> + * CXL rev 3.0 section 8.2.9.8.3.1; Table 8-100
> + */
> +#define CXL_DHI_HS_MAINTENANCE_NEEDED				BIT(0)
> +#define CXL_DHI_HS_PERFORMANCE_DEGRADED				BIT(1)
> +#define CXL_DHI_HS_HW_REPLACEMENT_NEEDED			BIT(2)
> +#define show_health_status_flags(flags)	__print_flags(flags, "|",	   \
> +	{ CXL_DHI_HS_MAINTENANCE_NEEDED,	"Maintenance Needed"	}, \
> +	{ CXL_DHI_HS_PERFORMANCE_DEGRADED,	"Performance Degraded"	}, \
> +	{ CXL_DHI_HS_HW_REPLACEMENT_NEEDED,	"Replacement Needed"	}  \
> +)
> +
> +#define CXL_DHI_MS_NORMAL							0x00
> +#define CXL_DHI_MS_NOT_READY							0x01
> +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOST					0x02
> +#define CXL_DHI_MS_ALL_DATA_LOST						0x03
> +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_POWER_LOSS			0x04
> +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_SHUTDOWN			0x05
> +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_IMMINENT				0x06
> +#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_POWER_LOSS				0x07
> +#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_SHUTDOWN				0x08
> +#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_IMMINENT					0x09
> +#define show_media_status(ms)	__print_symbolic(ms,			   \
> +	{ CXL_DHI_MS_NORMAL,						   \
> +		"Normal"						}, \
> +	{ CXL_DHI_MS_NOT_READY,						   \
> +		"Not Ready"						}, \
> +	{ CXL_DHI_MS_WRITE_PERSISTENCY_LOST,				   \
> +		"Write Persistency Lost"				}, \
> +	{ CXL_DHI_MS_ALL_DATA_LOST,					   \
> +		"All Data Lost"						}, \
> +	{ CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_POWER_LOSS,		   \
> +		"Write Persistency Loss in the Event of Power Loss"	}, \
> +	{ CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_SHUTDOWN,		   \
> +		"Write Persistency Loss in Event of Shutdown"		}, \
> +	{ CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_IMMINENT,			   \
> +		"Write Persistency Loss Imminent"			}, \
> +	{ CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_POWER_LOSS,		   \
> +		"All Data Loss in Event of Power Loss"			}, \
> +	{ CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_SHUTDOWN,		   \
> +		"All Data loss in the Event of Shutdown"		}, \
> +	{ CXL_DHI_MS_WRITE_ALL_DATA_LOSS_IMMINENT,			   \
> +		"All Data Loss Imminent"				}  \
> +)
> +
> +#define CXL_DHI_AS_NORMAL		0x0
> +#define CXL_DHI_AS_WARNING		0x1
> +#define CXL_DHI_AS_CRITICAL		0x2
> +#define show_two_bit_status(as) __print_symbolic(as,	   \
> +	{ CXL_DHI_AS_NORMAL,		"Normal"	}, \
> +	{ CXL_DHI_AS_WARNING,		"Warning"	}, \
> +	{ CXL_DHI_AS_CRITICAL,		"Critical"	}  \
> +)
> +#define show_one_bit_status(as) __print_symbolic(as,	   \
> +	{ CXL_DHI_AS_NORMAL,		"Normal"	}, \
> +	{ CXL_DHI_AS_WARNING,		"Warning"	}  \
> +)
> +
> +#define CXL_DHI_AS_LIFE_USED(as)			(as & 0x3)
> +#define CXL_DHI_AS_DEV_TEMP(as)				((as & 0xC) >> 2)
> +#define CXL_DHI_AS_COR_VOL_ERR_CNT(as)			((as & 0x10) >> 4)
> +#define CXL_DHI_AS_COR_PER_ERR_CNT(as)			((as & 0x20) >> 5)
> +
> +TRACE_EVENT(cxl_memory_module,
> +
> +	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> +		 struct cxl_event_mem_module *rec),
> +
> +	TP_ARGS(dev_name, log, rec),
> +
> +	TP_STRUCT__entry(
> +		CXL_EVT_TP_entry
> +
> +		/* Memory Module Event */
> +		__field(u8, event_type)
> +
> +		/* Device Health Info */
> +		__field(u8, health_status)
> +		__field(u8, media_status)
> +		__field(u8, life_used)
> +		__field(u32, dirty_shutdown_cnt)
> +		__field(u32, cor_vol_err_cnt)
> +		__field(u32, cor_per_err_cnt)
> +		__field(s16, device_temp)
> +		__field(u8, add_status)
> +	),
> +
> +	TP_fast_assign(
> +		CXL_EVT_TP_fast_assign(dev_name, log, rec->hdr);
> +
> +		/* Memory Module Event */
> +		__entry->event_type = rec->event_type;
> +
> +		/* Device Health Info */
> +		__entry->health_status = rec->info.health_status;
> +		__entry->media_status = rec->info.media_status;
> +		__entry->life_used = rec->info.life_used;
> +		__entry->dirty_shutdown_cnt = get_unaligned_le32(rec->info.dirty_shutdown_cnt);
> +		__entry->cor_vol_err_cnt = get_unaligned_le32(rec->info.cor_vol_err_cnt);
> +		__entry->cor_per_err_cnt = get_unaligned_le32(rec->info.cor_per_err_cnt);
> +		__entry->device_temp = get_unaligned_le16(rec->info.device_temp);
> +		__entry->add_status = rec->info.add_status;
> +	),
> +
> +	CXL_EVT_TP_printk("event_type='%s' health_status='%s' media_status='%s' " \
> +		"as_life_used=%s as_dev_temp=%s as_cor_vol_err_cnt=%s " \
> +		"as_cor_per_err_cnt=%s life_used=%u device_temp=%d " \
> +		"dirty_shutdown_cnt=%u cor_vol_err_cnt=%u cor_per_err_cnt=%u",
> +		show_dev_evt_type(__entry->event_type),
> +		show_health_status_flags(__entry->health_status),
> +		show_media_status(__entry->media_status),
> +		show_two_bit_status(CXL_DHI_AS_LIFE_USED(__entry->add_status)),
> +		show_two_bit_status(CXL_DHI_AS_DEV_TEMP(__entry->add_status)),
> +		show_one_bit_status(CXL_DHI_AS_COR_VOL_ERR_CNT(__entry->add_status)),
> +		show_one_bit_status(CXL_DHI_AS_COR_PER_ERR_CNT(__entry->add_status)),
> +		__entry->life_used, __entry->device_temp,
> +		__entry->dirty_shutdown_cnt, __entry->cor_vol_err_cnt,
> +		__entry->cor_per_err_cnt
> +	)
> +);
> +
> +
>   #endif /* _CXL_TRACE_EVENTS_H */
>   
>   /* This part must be outside protection */


* Re: [PATCH 08/11] cxl/mem: Wire up event interrupts
  2022-11-10 18:57 ` [PATCH 08/11] cxl/mem: Wire up event interrupts ira.weiny
@ 2022-11-15 23:13   ` Dave Jiang
  2022-11-17  1:38     ` Ira Weiny
  2022-11-16 14:40   ` Jonathan Cameron
  1 sibling, 1 reply; 50+ messages in thread
From: Dave Jiang @ 2022-11-15 23:13 UTC (permalink / raw)
  To: ira.weiny, Dan Williams
  Cc: Alison Schofield, Vishal Verma, Ben Widawsky, Steven Rostedt,
	Jonathan Cameron, Davidlohr Bueso, linux-kernel, linux-cxl



On 11/10/2022 10:57 AM, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> CXL device events are signaled via interrupts.  Each event log may have
> a different interrupt message number.  These message numbers are
> reported in the Get Event Interrupt Policy mailbox command.
> 
> Add interrupt support for event logs.  Interrupts are allocated as
> shared interrupts.  Therefore, all or some event logs can share the same
> message number.
> 
> The driver must deal with the possibility that dynamic capacity is not
> yet supported by a device it sees.  Fallback and retry without dynamic
> capacity if the first attempt fails.
> 
> Device capacity event logs interrupt as part of the informational event
> log.  Check the event status to see which log has data.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> ---
> Changes from RFC v2
> 	Adjust to new irq 16 vector allocation
> 	Jonathan
> 		Remove CXL_INT_RES
> 	Use irq threads to ensure mailbox commands are executed outside irq context
> 	Adjust for optional Dynamic Capacity log
> ---
>   drivers/cxl/core/mbox.c      |  53 +++++++++++++-
>   drivers/cxl/cxlmem.h         |  31 ++++++++
>   drivers/cxl/pci.c            | 133 +++++++++++++++++++++++++++++++++++
>   include/uapi/linux/cxl_mem.h |   2 +
>   4 files changed, 217 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 879b228a98a0..1e6762af2a00 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -53,6 +53,8 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
>   	CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
>   	CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
>   	CXL_CMD(CLEAR_EVENT_RECORD, CXL_VARIABLE_PAYLOAD, 0, 0),
> +	CXL_CMD(GET_EVT_INT_POLICY, 0, 0x5, 0),
> +	CXL_CMD(SET_EVT_INT_POLICY, 0x5, 0, 0),
>   	CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
>   	CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
>   	CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
> @@ -791,8 +793,8 @@ static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
>   				 &payload, sizeof(payload), NULL, 0);
>   }
>   
> -static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> -				    enum cxl_event_log_type type)
> +void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> +			     enum cxl_event_log_type type)
>   {
>   	struct cxl_get_event_payload payload;
>   	u16 pl_nr;
> @@ -837,6 +839,7 @@ static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
>   	} while (pl_nr > CXL_GET_EVENT_NR_RECORDS ||
>   		 payload.flags & CXL_GET_EVENT_FLAG_MORE_RECORDS);
>   }
> +EXPORT_SYMBOL_NS_GPL(cxl_mem_get_records_log, CXL);
>   
>   /**
>    * cxl_mem_get_event_records - Get Event Records from the device
> @@ -867,6 +870,52 @@ void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
>   }
>   EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, CXL);
>   
> +int cxl_event_config_msgnums(struct cxl_dev_state *cxlds)
> +{
> +	struct cxl_event_interrupt_policy *policy = &cxlds->evt_int_policy;
> +	size_t policy_size = sizeof(*policy);
> +	bool retry = true;
> +	int rc;
> +
> +	policy->info_settings = CXL_INT_MSI_MSIX;
> +	policy->warn_settings = CXL_INT_MSI_MSIX;
> +	policy->failure_settings = CXL_INT_MSI_MSIX;
> +	policy->fatal_settings = CXL_INT_MSI_MSIX;
> +	policy->dyn_cap_settings = CXL_INT_MSI_MSIX;
> +
> +again:
> +	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_SET_EVT_INT_POLICY,
> +			       policy, policy_size, NULL, 0);
> +	if (rc < 0) {
> +		/*
> +		 * If the device does not support dynamic capacity it may fail
> +		 * the command due to an invalid payload.  Retry without
> +		 * dynamic capacity.
> +		 */
> +		if (retry) {
> +			retry = false;
> +			policy->dyn_cap_settings = 0;
> +			policy_size = sizeof(*policy) - sizeof(policy->dyn_cap_settings);
> +			goto again;
> +		}
> +		dev_err(cxlds->dev, "Failed to set event interrupt policy : %d",
> +			rc);
> +		memset(policy, CXL_INT_NONE, sizeof(*policy));
> +		return rc;
> +	}

Up to you, but I think you can avoid the goto:

	int retry = 2;
	do {
		rc = cxl_mbox_send_cmd(...);
		if (rc == 0 || retry == 1)
			break;
		policy->dyn_cap_settings = 0;
		policy_size = sizeof(*policy) - sizeof(policy->dyn_cap_settings);
		retry--;
	} while (retry);

	if (rc < 0) {
		dev_err(...);
		memset(policy, ...);
		return rc;
	}

> +
> +	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVT_INT_POLICY, NULL, 0,
> +			       policy, policy_size);
> +	if (rc < 0) {
> +		dev_err(cxlds->dev, "Failed to get event interrupt policy : %d",
> +			rc);
> +		return rc;
> +	}
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_event_config_msgnums, CXL);
> +
>   /**
>    * cxl_mem_get_partition_info - Get partition info
>    * @cxlds: The device data for the operation
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 03da4f8f74d3..4d9c3ea30c24 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -179,6 +179,31 @@ struct cxl_endpoint_dvsec_info {
>   	struct range dvsec_range[2];
>   };
>   
> +/**
> + * Event Interrupt Policy
> + *
> + * CXL rev 3.0 section 8.2.9.2.4; Table 8-52
> + */
> +enum cxl_event_int_mode {
> +	CXL_INT_NONE		= 0x00,
> +	CXL_INT_MSI_MSIX	= 0x01,
> +	CXL_INT_FW		= 0x02
> +};
> +#define CXL_EVENT_INT_MODE_MASK 0x3
> +#define CXL_EVENT_INT_MSGNUM(setting) (((setting) & 0xf0) >> 4)
> +struct cxl_event_interrupt_policy {
> +	u8 info_settings;
> +	u8 warn_settings;
> +	u8 failure_settings;
> +	u8 fatal_settings;
> +	u8 dyn_cap_settings;
> +} __packed;
> +
> +static inline bool cxl_evt_int_is_msi(u8 setting)
> +{
> +	return CXL_INT_MSI_MSIX == (setting & CXL_EVENT_INT_MODE_MASK);
> +}
> +
>   /**
>    * struct cxl_dev_state - The driver device state
>    *
> @@ -246,6 +271,7 @@ struct cxl_dev_state {
>   
>   	resource_size_t component_reg_phys;
>   	u64 serial;
> +	struct cxl_event_interrupt_policy evt_int_policy;
>   
>   	struct xarray doe_mbs;
>   
> @@ -259,6 +285,8 @@ enum cxl_opcode {
>   	CXL_MBOX_OP_RAW			= CXL_MBOX_OP_INVALID,
>   	CXL_MBOX_OP_GET_EVENT_RECORD	= 0x0100,
>   	CXL_MBOX_OP_CLEAR_EVENT_RECORD	= 0x0101,
> +	CXL_MBOX_OP_GET_EVT_INT_POLICY	= 0x0102,
> +	CXL_MBOX_OP_SET_EVT_INT_POLICY	= 0x0103,
>   	CXL_MBOX_OP_GET_FW_INFO		= 0x0200,
>   	CXL_MBOX_OP_ACTIVATE_FW		= 0x0202,
>   	CXL_MBOX_OP_GET_SUPPORTED_LOGS	= 0x0400,
> @@ -539,7 +567,10 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds);
>   struct cxl_dev_state *cxl_dev_state_create(struct device *dev);
>   void set_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
>   void clear_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
> +void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> +			     enum cxl_event_log_type type);
>   void cxl_mem_get_event_records(struct cxl_dev_state *cxlds);
> +int cxl_event_config_msgnums(struct cxl_dev_state *cxlds);
>   #ifdef CONFIG_CXL_SUSPEND
>   void cxl_mem_active_inc(void);
>   void cxl_mem_active_dec(void);
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index e0d511575b45..64b2e2671043 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -458,6 +458,138 @@ static void cxl_pci_alloc_irq_vectors(struct cxl_dev_state *cxlds)
>   	cxlds->nr_irq_vecs = nvecs;
>   }
>   
> +struct cxl_event_irq_id {
> +	struct cxl_dev_state *cxlds;
> +	u32 status;
> +	unsigned int msgnum;
> +};
> +
> +static irqreturn_t cxl_event_int_thread(int irq, void *id)
> +{
> +	struct cxl_event_irq_id *cxlid = id;
> +	struct cxl_dev_state *cxlds = cxlid->cxlds;
> +
> +	if (cxlid->status & CXLDEV_EVENT_STATUS_INFO)
> +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_INFO);
> +	if (cxlid->status & CXLDEV_EVENT_STATUS_WARN)
> +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_WARN);
> +	if (cxlid->status & CXLDEV_EVENT_STATUS_FAIL)
> +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FAIL);
> +	if (cxlid->status & CXLDEV_EVENT_STATUS_FATAL)
> +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FATAL);
> +	if (cxlid->status & CXLDEV_EVENT_STATUS_DYNAMIC_CAP)
> +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_DYNAMIC_CAP);
> +
> +	return IRQ_HANDLED;
> +}
> +
> +static irqreturn_t cxl_event_int_handler(int irq, void *id)
> +{
> +	struct cxl_event_irq_id *cxlid = id;
> +	struct cxl_dev_state *cxlds = cxlid->cxlds;
> +	u32 status = readl(cxlds->regs.status + CXLDEV_DEV_EVENT_STATUS_OFFSET);
> +
> +	if (cxlid->status & status)
> +		return IRQ_WAKE_THREAD;
> +	return IRQ_HANDLED;

IRQ_NONE since your handler did not handle anything and this is a shared 
interrupt?

> +}
> +
> +static void cxl_free_event_irq(void *id)
> +{
> +	struct cxl_event_irq_id *cxlid = id;
> +	struct pci_dev *pdev = to_pci_dev(cxlid->cxlds->dev);
> +
> +	pci_free_irq(pdev, cxlid->msgnum, id);
> +}
> +
> +static u32 log_type_to_status(enum cxl_event_log_type log_type)
> +{
> +	switch (log_type) {
> +	case CXL_EVENT_TYPE_INFO:
> +		return CXLDEV_EVENT_STATUS_INFO | CXLDEV_EVENT_STATUS_DYNAMIC_CAP;
> +	case CXL_EVENT_TYPE_WARN:
> +		return CXLDEV_EVENT_STATUS_WARN;
> +	case CXL_EVENT_TYPE_FAIL:
> +		return CXLDEV_EVENT_STATUS_FAIL;
> +	case CXL_EVENT_TYPE_FATAL:
> +		return CXLDEV_EVENT_STATUS_FATAL;
> +	default:
> +		break;
> +	}
> +	return 0;
> +}
> +
> +static int cxl_request_event_irq(struct cxl_dev_state *cxlds,
> +				 enum cxl_event_log_type log_type,
> +				 u8 setting)
> +{
> +	struct device *dev = cxlds->dev;
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +	struct cxl_event_irq_id *id;
> +	unsigned int msgnum = CXL_EVENT_INT_MSGNUM(setting);
> +	int irq;

int rc? pci_request_irq() returns an errno or 0, not the irq number. 
The variable naming is a bit confusing.

DJ

> +
> +	/* Disabled irq is not an error */
> +	if (!cxl_evt_int_is_msi(setting) || msgnum > cxlds->nr_irq_vecs) {
> +		dev_dbg(dev, "Event interrupt not enabled; %s %u %d\n",
> +			cxl_event_log_type_str(CXL_EVENT_TYPE_INFO),
> +			msgnum, cxlds->nr_irq_vecs);
> +		return 0;
> +	}
> +
> +	id = devm_kzalloc(dev, sizeof(*id), GFP_KERNEL);
> +	if (!id)
> +		return -ENOMEM;
> +
> +	id->cxlds = cxlds;
> +	id->msgnum = msgnum;
> +	id->status = log_type_to_status(log_type);
> +
> +	irq = pci_request_irq(pdev, id->msgnum, cxl_event_int_handler,
> +			      cxl_event_int_thread, id,
> +			      "%s:event-log-%s", dev_name(dev),
> +			      cxl_event_log_type_str(log_type));
> +	if (irq)
> +		return irq;
> +
> +	devm_add_action_or_reset(dev, cxl_free_event_irq, id);
> +	return 0;
> +}
> +
> +static void cxl_event_irqsetup(struct cxl_dev_state *cxlds)
> +{
> +	struct device *dev = cxlds->dev;
> +	u8 setting;
> +
> +	if (cxl_event_config_msgnums(cxlds))
> +		return;
> +
> +	/*
> +	 * Dynamic Capacity shares the info message number
> +	 * Nothing to be done except check the status bit in the
> +	 * irq thread.
> +	 */
> +	setting = cxlds->evt_int_policy.info_settings;
> +	if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_INFO, setting))
> +		dev_err(dev, "Failed to get interrupt for %s event log\n",
> +			cxl_event_log_type_str(CXL_EVENT_TYPE_INFO));
> +
> +	setting = cxlds->evt_int_policy.warn_settings;
> +	if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_WARN, setting))
> +		dev_err(dev, "Failed to get interrupt for %s event log\n",
> +			cxl_event_log_type_str(CXL_EVENT_TYPE_WARN));
> +
> +	setting = cxlds->evt_int_policy.failure_settings;
> +	if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_FAIL, setting))
> +		dev_err(dev, "Failed to get interrupt for %s event log\n",
> +			cxl_event_log_type_str(CXL_EVENT_TYPE_FAIL));
> +
> +	setting = cxlds->evt_int_policy.fatal_settings;
> +	if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_FATAL, setting))
> +		dev_err(dev, "Failed to get interrupt for %s event log\n",
> +			cxl_event_log_type_str(CXL_EVENT_TYPE_FATAL));
> +}
> +
>   static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>   {
>   	struct cxl_register_map map;
> @@ -525,6 +657,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>   		return rc;
>   
>   	cxl_pci_alloc_irq_vectors(cxlds);
> +	cxl_event_irqsetup(cxlds);
>   
>   	cxlmd = devm_cxl_add_memdev(cxlds);
>   	if (IS_ERR(cxlmd))
> diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
> index 7c1ad8062792..a8204802fcca 100644
> --- a/include/uapi/linux/cxl_mem.h
> +++ b/include/uapi/linux/cxl_mem.h
> @@ -26,6 +26,8 @@
>   	___C(GET_SUPPORTED_LOGS, "Get Supported Logs"),                   \
>   	___C(GET_EVENT_RECORD, "Get Event Record"),                       \
>   	___C(CLEAR_EVENT_RECORD, "Clear Event Record"),                   \
> +	___C(GET_EVT_INT_POLICY, "Get Event Interrupt Policy"),           \
> +	___C(SET_EVT_INT_POLICY, "Set Event Interrupt Policy"),           \
>   	___C(GET_FW_INFO, "Get FW Info"),                                 \
>   	___C(GET_PARTITION_INFO, "Get Partition Information"),            \
>   	___C(GET_LSA, "Get Label Storage Area"),                          \

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 08/11] cxl/mem: Wire up event interrupts
  2022-11-10 18:57 ` [PATCH 08/11] cxl/mem: Wire up event interrupts ira.weiny
  2022-11-15 23:13   ` Dave Jiang
@ 2022-11-16 14:40   ` Jonathan Cameron
  2022-11-30  9:11     ` Ira Weiny
  1 sibling, 1 reply; 50+ messages in thread
From: Jonathan Cameron @ 2022-11-16 14:40 UTC (permalink / raw)
  To: ira.weiny
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Thu, 10 Nov 2022 10:57:55 -0800
ira.weiny@intel.com wrote:

> From: Ira Weiny <ira.weiny@intel.com>
> 
> CXL device events are signaled via interrupts.  Each event log may have
> a different interrupt message number.  These message numbers are
> reported in the Get Event Interrupt Policy mailbox command.
> 
> Add interrupt support for event logs.  Interrupts are allocated as
> shared interrupts.  Therefore, all or some event logs can share the same
> message number.
> 
> The driver must deal with the possibility that dynamic capacity is not
> yet supported by a device it sees.  Fallback and retry without dynamic
> capacity if the first attempt fails.
> 
> Device capacity event logs interrupt as part of the informational event
> log.  Check the event status to see which log has data.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
Hi Ira,

A few comments inline.

Thanks,

Jonathan

> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 879b228a98a0..1e6762af2a00 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c

>  /**
>   * cxl_mem_get_event_records - Get Event Records from the device
> @@ -867,6 +870,52 @@ void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, CXL);
>  
> +int cxl_event_config_msgnums(struct cxl_dev_state *cxlds)
> +{
> +	struct cxl_event_interrupt_policy *policy = &cxlds->evt_int_policy;
> +	size_t policy_size = sizeof(*policy);
> +	bool retry = true;
> +	int rc;
> +
> +	policy->info_settings = CXL_INT_MSI_MSIX;
> +	policy->warn_settings = CXL_INT_MSI_MSIX;
> +	policy->failure_settings = CXL_INT_MSI_MSIX;
> +	policy->fatal_settings = CXL_INT_MSI_MSIX;
> +	policy->dyn_cap_settings = CXL_INT_MSI_MSIX;
> +
> +again:
> +	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_SET_EVT_INT_POLICY,
> +			       policy, policy_size, NULL, 0);
> +	if (rc < 0) {
> +		/*
> +		 * If the device does not support dynamic capacity it may fail
> +		 * the command due to an invalid payload.  Retry without
> +		 * dynamic capacity.
> +		 */

There are a number of ways to discover if DCD is supported that aren't based
on try and retry like this. 9.13.3 has "basic sequence to utilize Dynamic Capacity"
That calls out:
Verify the necessary Dynamic Capacity commands are returned in the CEL.

First I'm not sure we should set the interrupt on for DCD until we have a lot
more of the flow handled, secondly even then we should figure out if it is supported
at a higher level than this command and pass that info down here.


> +		if (retry) {
> +			retry = false;
> +			policy->dyn_cap_settings = 0;
> +			policy_size = sizeof(*policy) - sizeof(policy->dyn_cap_settings);
> +			goto again;
> +		}
> +		dev_err(cxlds->dev, "Failed to set event interrupt policy : %d",
> +			rc);
> +		memset(policy, CXL_INT_NONE, sizeof(*policy));

Relying on all the fields being 1 byte is a bit error prone. I'd just set them all
individually in the interests of more readable code.

> +		return rc;
> +	}
> +
> +	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVT_INT_POLICY, NULL, 0,
> +			       policy, policy_size);

Add a comment on why you are reading this back (to get the msgnums in the upper
bits) as it's not obvious to a casual reader.

> +	if (rc < 0) {
> +		dev_err(cxlds->dev, "Failed to get event interrupt policy : %d",
> +			rc);
> +		return rc;
> +	}
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_event_config_msgnums, CXL);
> +

...

> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index e0d511575b45..64b2e2671043 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -458,6 +458,138 @@ static void cxl_pci_alloc_irq_vectors(struct cxl_dev_state *cxlds)
>  	cxlds->nr_irq_vecs = nvecs;
>  }
>  
> +struct cxl_event_irq_id {
> +	struct cxl_dev_state *cxlds;
> +	u32 status;
> +	unsigned int msgnum;
msgnum is only here for freeing the interrupt - I'd rather we fixed
that by using standard infrastructure (or adding some - see below).

status is an indirect way of allowing us to share an interrupt handler.
You could do that by registering a trivial wrapper for each instead.
Then all you have left is the cxl_dev_state which could be passed
in directly as the callback parameter removing need to have this
structure at all.  I think that might be neater.

> +};
> +
> +static irqreturn_t cxl_event_int_thread(int irq, void *id)
> +{
> +	struct cxl_event_irq_id *cxlid = id;
> +	struct cxl_dev_state *cxlds = cxlid->cxlds;
> +
> +	if (cxlid->status & CXLDEV_EVENT_STATUS_INFO)
> +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_INFO);
> +	if (cxlid->status & CXLDEV_EVENT_STATUS_WARN)
> +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_WARN);
> +	if (cxlid->status & CXLDEV_EVENT_STATUS_FAIL)
> +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FAIL);
> +	if (cxlid->status & CXLDEV_EVENT_STATUS_FATAL)
> +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FATAL);
> +	if (cxlid->status & CXLDEV_EVENT_STATUS_DYNAMIC_CAP)
> +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_DYNAMIC_CAP);
> +
> +	return IRQ_HANDLED;
> +}
> +
> +static irqreturn_t cxl_event_int_handler(int irq, void *id)
> +{
> +	struct cxl_event_irq_id *cxlid = id;
> +	struct cxl_dev_state *cxlds = cxlid->cxlds;
> +	u32 status = readl(cxlds->regs.status + CXLDEV_DEV_EVENT_STATUS_OFFSET);
> +
> +	if (cxlid->status & status)
> +		return IRQ_WAKE_THREAD;
> +	return IRQ_HANDLED;

If status not set IRQ_NONE.
Ah. I see Dave raised this as well.

> +}

...

> +static int cxl_request_event_irq(struct cxl_dev_state *cxlds,
> +				 enum cxl_event_log_type log_type,
> +				 u8 setting)
> +{
> +	struct device *dev = cxlds->dev;
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +	struct cxl_event_irq_id *id;
> +	unsigned int msgnum = CXL_EVENT_INT_MSGNUM(setting);
> +	int irq;
> +
> +	/* Disabled irq is not an error */
> +	if (!cxl_evt_int_is_msi(setting) || msgnum > cxlds->nr_irq_vecs) {

I don't think that second condition can occur.  The language under table 8-52
(I think) means that it will move around if there aren't enough vectors
(for MSI - MSI-X is more complex, but the result is the same).

> +		dev_dbg(dev, "Event interrupt not enabled; %s %u %d\n",
> +			cxl_event_log_type_str(CXL_EVENT_TYPE_INFO),
> +			msgnum, cxlds->nr_irq_vecs);
> +		return 0;
> +	}
> +
> +	id = devm_kzalloc(dev, sizeof(*id), GFP_KERNEL);
> +	if (!id)
> +		return -ENOMEM;
> +
> +	id->cxlds = cxlds;
> +	id->msgnum = msgnum;
> +	id->status = log_type_to_status(log_type);
> +
> +	irq = pci_request_irq(pdev, id->msgnum, cxl_event_int_handler,
> +			      cxl_event_int_thread, id,
> +			      "%s:event-log-%s", dev_name(dev),
> +			      cxl_event_log_type_str(log_type));
> +	if (irq)
> +		return irq;
> +
> +	devm_add_action_or_reset(dev, cxl_free_event_irq, id);

Hmm, there is no pcim_request_irq(); maybe this is the time to propose one
(separate from this patch so we don't get delayed by that!)

We discussed this way back in DOE series (I'd forgotten but lore found
it for me).  There I suggested just calling
devm_request_threaded_irq() directly as a work around.

> +	return 0;
> +}
> +
> +static void cxl_event_irqsetup(struct cxl_dev_state *cxlds)
> +{
> +	struct device *dev = cxlds->dev;
> +	u8 setting;
> +
> +	if (cxl_event_config_msgnums(cxlds))
> +		return;
> +
> +	/*
> +	 * Dynamic Capacity shares the info message number
> +	 * Nothing to be done except check the status bit in the
> +	 * irq thread.
> +	 */
> +	setting = cxlds->evt_int_policy.info_settings;
> +	if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_INFO, setting))
> +		dev_err(dev, "Failed to get interrupt for %s event log\n",
> +			cxl_event_log_type_str(CXL_EVENT_TYPE_INFO));
> +
> +	setting = cxlds->evt_int_policy.warn_settings;
> +	if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_WARN, setting))
> +		dev_err(dev, "Failed to get interrupt for %s event log\n",
> +			cxl_event_log_type_str(CXL_EVENT_TYPE_WARN));
> +
> +	setting = cxlds->evt_int_policy.failure_settings;
> +	if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_FAIL, setting))
> +		dev_err(dev, "Failed to get interrupt for %s event log\n",
> +			cxl_event_log_type_str(CXL_EVENT_TYPE_FAIL));
> +
> +	setting = cxlds->evt_int_policy.fatal_settings;
> +	if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_FATAL, setting))
> +		dev_err(dev, "Failed to get interrupt for %s event log\n",
> +			cxl_event_log_type_str(CXL_EVENT_TYPE_FATAL));
> +}



* Re: [PATCH 01/11] cxl/pci: Add generic MSI-X/MSI irq support
  2022-11-10 18:57 ` [PATCH 01/11] cxl/pci: Add generic MSI-X/MSI irq support ira.weiny
  2022-11-15 21:41   ` Dave Jiang
@ 2022-11-16 14:53   ` Jonathan Cameron
  2022-11-16 23:48     ` Ira Weiny
  1 sibling, 1 reply; 50+ messages in thread
From: Jonathan Cameron @ 2022-11-16 14:53 UTC (permalink / raw)
  To: ira.weiny
  Cc: Dan Williams, Davidlohr Bueso, Bjorn Helgaas, Alison Schofield,
	Vishal Verma, Ben Widawsky, Steven Rostedt, linux-kernel,
	linux-cxl

On Thu, 10 Nov 2022 10:57:48 -0800
ira.weiny@intel.com wrote:

> From: Davidlohr Bueso <dave@stgolabs.net>
> 
> Currently the only CXL features targeted for irq support require their
> message numbers to be within the first 16 entries.  The device may
> however support less than 16 entries depending on the support it
> provides.
> 
> Attempt to allocate these 16 irq vectors.  If the device supports less
> then the PCI infrastructure will allocate that number.  Store the number
> of vectors actually allocated in the device state for later use
> by individual functions.
See later patch review, but I don't think we need to store the number
allocated because any vector is guaranteed to be below that point
(the QEMU code is wrong on this at the moment, but there are very few vectors
 so it hasn't mattered yet).

Otherwise, the pcim machinery deals with some of the cleanup you are doing
again here for us, so this can be simplified somewhat. See inline.

Jonathan



> 
> Upon successful allocation, users can plug in their respective isr at
> any point thereafter, for example, if the irq setup is not done in the
> PCI driver, such as the case of the CXL-PMU.
> 
> Cc: Bjorn Helgaas <helgaas@kernel.org>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Co-developed-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> 
> ---
> Changes from Ira
> 	Remove reviews
> 	Allocate up to a static 16 vectors.
> 	Change cover letter
> ---
>  drivers/cxl/cxlmem.h |  3 +++
>  drivers/cxl/cxlpci.h |  6 ++++++
>  drivers/cxl/pci.c    | 32 ++++++++++++++++++++++++++++++++
>  3 files changed, 41 insertions(+)
> 
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 88e3a8e54b6a..b7b955ded3ac 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -211,6 +211,7 @@ struct cxl_endpoint_dvsec_info {
>   * @info: Cached DVSEC information about the device.
>   * @serial: PCIe Device Serial Number
>   * @doe_mbs: PCI DOE mailbox array
> + * @nr_irq_vecs: Number of MSI-X/MSI vectors available
>   * @mbox_send: @dev specific transport for transmitting mailbox commands
>   *
>   * See section 8.2.9.5.2 Capacity Configuration and Label Storage for
> @@ -247,6 +248,8 @@ struct cxl_dev_state {
>  
>  	struct xarray doe_mbs;
>  
> +	int nr_irq_vecs;
> +
>  	int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
>  };
>  
> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> index eec597dbe763..b7f4e2f417d3 100644
> --- a/drivers/cxl/cxlpci.h
> +++ b/drivers/cxl/cxlpci.h
> @@ -53,6 +53,12 @@
>  #define	    CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK			GENMASK(15, 8)
>  #define     CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK		GENMASK(31, 16)
>  
> +/*
> + * NOTE: Currently all the functions which are enabled for CXL require their
> + * vectors to be in the first 16.  Use this as the max.
> + */
> +#define CXL_PCI_REQUIRED_VECTORS 16
> +
>  /* Register Block Identifier (RBI) */
>  enum cxl_regloc_type {
>  	CXL_REGLOC_RBI_EMPTY = 0,
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index faeb5d9d7a7a..62e560063e50 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -428,6 +428,36 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
>  	}
>  }
>  
> +static void cxl_pci_free_irq_vectors(void *data)
> +{
> +	pci_free_irq_vectors(data);
> +}
> +
> +static void cxl_pci_alloc_irq_vectors(struct cxl_dev_state *cxlds)
> +{
> +	struct device *dev = cxlds->dev;
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +	int nvecs;
> +	int rc;
> +
> +	nvecs = pci_alloc_irq_vectors(pdev, 1, CXL_PCI_REQUIRED_VECTORS,
> +				   PCI_IRQ_MSIX | PCI_IRQ_MSI);
> +	if (nvecs < 0) {
> +		dev_dbg(dev, "Not enough interrupts; use polling instead.\n");
> +		return;
> +	}
> +
> +	rc = devm_add_action_or_reset(dev, cxl_pci_free_irq_vectors, pdev);
The pci managed code always gives me a headache because there is a lot of magic
under the hood if you ever called pcim_enable_device(), which we did.

Chasing through

pci_alloc_irq_vectors_affinity()->
either
	__pci_enable_msix_range()
or
	__pci_enable_msi_range()

they are similar
	pci_setup_msi_context()
		pci_setup_msi_release()
			adds pcmi_msi_release devm action.
and that frees the vectors for us.
So we don't need to do it here.


> +	if (rc) {
> +		dev_dbg(dev, "Device managed call failed; interrupts disabled.\n");
> +		/* some got allocated, clean them up */
> +		cxl_pci_free_irq_vectors(pdev);
We could just leave them lying around for devm cleanup to sweep up eventually
or free them as you have done here.

> +		return;
> +	}
> +
> +	cxlds->nr_irq_vecs = nvecs;
> +}
> +
>  static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  {
>  	struct cxl_register_map map;
> @@ -494,6 +524,8 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  	if (rc)
>  		return rc;
>  
> +	cxl_pci_alloc_irq_vectors(cxlds);
> +
>  	cxlmd = devm_cxl_add_memdev(cxlds);
>  	if (IS_ERR(cxlmd))
>  		return PTR_ERR(cxlmd);



* Re: [PATCH 02/11] cxl/mem: Implement Get Event Records command
  2022-11-10 18:57 ` [PATCH 02/11] cxl/mem: Implement Get Event Records command ira.weiny
  2022-11-15 21:54   ` Dave Jiang
@ 2022-11-16 15:19   ` Jonathan Cameron
  2022-11-17  0:47     ` Ira Weiny
  1 sibling, 1 reply; 50+ messages in thread
From: Jonathan Cameron @ 2022-11-16 15:19 UTC (permalink / raw)
  To: ira.weiny
  Cc: Dan Williams, Steven Rostedt, Alison Schofield, Vishal Verma,
	Ben Widawsky, Davidlohr Bueso, linux-kernel, linux-cxl

On Thu, 10 Nov 2022 10:57:49 -0800
ira.weiny@intel.com wrote:

> From: Ira Weiny <ira.weiny@intel.com>
> 
> CXL devices have multiple event logs which can be queried for CXL event
> records.  Devices are required to support the storage of at least one
> event record in each event log type.
> 
> Devices track event log overflow by incrementing a counter and tracking
> the time of the first and last overflow event seen.
> 
> Software queries events via the Get Event Record mailbox command; CXL
> rev 3.0 section 8.2.9.2.2.
> 
> Issue the Get Event Record mailbox command on driver load.  Trace each
> record found with a generic record trace.  Trace any overflow
> conditions.
> 
> The device can return up to 1MB worth of event records per query.  This
> presents complications with allocating a huge buffers to potentially
> capture all the records.  It is not anticipated that these event logs
> will be very deep and reading them does not need to be performant.
> Process only 3 records at a time.  3 records was chosen as it fits
> comfortably on the stack to prevent dynamic allocation while still
> cutting down on extra mailbox messages.
> 
> This patch traces a raw event record only and leaves the specific event
> record types to subsequent patches.
> 
> Macros are created to use for tracing the common CXL Event header
> fields.
> 
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Hi Ira,

A question inline about whether some of the conditions you are checking
for can actually happen. Otherwise looks good to me.

Jonathan

> 
> ---
> Change from RFC v2:
> 	Support reading 3 events at once.
> 	Reverse Jonathan's suggestion and check for positive number of
> 		records.  Because the record count may have been
> 		returned as something > 3 based on what the device
> 		thinks it can send back even though the core Linux mbox
> 		processing truncates the data.
> 	Alison and Dave Jiang
> 		Change header uuid type to uuid_t for better user space
> 		processing
> 	Smita
> 		Check status reg before reading log.
> 	Steven
> 		Prefix all trace points with 'cxl_'
> 		Use static branch <trace>_enabled() calls
> 	Jonathan
> 		s/CXL_EVENT_TYPE_INFO/0
> 		s/{first,last}/{first,last}_ts
> 		Remove Reserved field from header
> 		Fix header issue for cxl_event_log_type_str()
> 
> Change from RFC:
> 	Remove redundant error message in get event records loop
> 	s/EVENT_RECORD_DATA_LENGTH/CXL_EVENT_RECORD_DATA_LENGTH
> 	Use hdr_uuid for the header UUID field
> 	Use cxl_event_log_type_str() for the trace events
> 	Create macros for the header fields and common entries of each event
> 	Add reserved buffer output dump
> 	Report error if event query fails
> 	Remove unused record_cnt variable
> 	Steven - reorder overflow record
> 		Remove NOTE about checkpatch
> 	Jonathan
> 		check for exactly 1 record
> 		s/v3.0/rev 3.0
> 		Use 3 byte fields for 24bit fields
> 		Add 3.0 Maintenance Operation Class
> 		Add Dynamic Capacity log type
> 		Fix spelling
> 	Dave Jiang/Dan/Alison
> 		s/cxl-event/cxl
> 		trace/events/cxl-events => trace/events/cxl.h
> 		s/cxl_event_overflow/overflow
> 		s/cxl_event/generic_event
> ---
>  MAINTAINERS                  |   1 +
>  drivers/cxl/core/mbox.c      |  70 +++++++++++++++++++
>  drivers/cxl/cxl.h            |   8 +++
>  drivers/cxl/cxlmem.h         |  73 ++++++++++++++++++++
>  include/trace/events/cxl.h   | 127 +++++++++++++++++++++++++++++++++++
>  include/uapi/linux/cxl_mem.h |   1 +
>  6 files changed, 280 insertions(+)
>  create mode 100644 include/trace/events/cxl.h

> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 16176b9278b4..a908b95a7de4 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c

> +static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> +				    enum cxl_event_log_type type)
> +{
> +	struct cxl_get_event_payload payload;
> +	u16 pl_nr;
> +
> +	do {
> +		u8 log_type = type;
> +		int rc;
> +
> +		rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVENT_RECORD,
> +				       &log_type, sizeof(log_type),
> +				       &payload, sizeof(payload));
> +		if (rc) {
> +			dev_err(cxlds->dev, "Event log '%s': Failed to query event records : %d",
> +				cxl_event_log_type_str(type), rc);
> +			return;
> +		}
> +
> +		pl_nr = le16_to_cpu(payload.record_count);
> +		if (trace_cxl_generic_event_enabled()) {
> +			u16 nr_rec = min_t(u16, pl_nr, CXL_GET_EVENT_NR_RECORDS);

Either I'm misreading the spec, or it can't be greater than NR_RECORDS.
"The number of event records in the Event Records list...."
Event Records is the field inside this payload, which is not big enough to
take more than CXL_GET_EVENT_NR_RECORDS, and the intro to Get Event Records
says the number is restricted by the mailbox output payload provided.

I'm in favor of defense against broken hardware, but don't paper over any
such error - scream about it.

> +			int i;
> +
> +			for (i = 0; i < nr_rec; i++)
> +				trace_cxl_generic_event(dev_name(cxlds->dev),
> +							type,
> +							&payload.record[i]);
> +		}
> +
> +		if (trace_cxl_overflow_enabled() &&
> +		    (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW))
> +			trace_cxl_overflow(dev_name(cxlds->dev), type, &payload);
> +
> +	} while (pl_nr > CXL_GET_EVENT_NR_RECORDS ||

Isn't pl_nr > CXL_GET_EVENT_NR_RECORDS a hardware bug? It's the number in the
returned payload, not the total number.

> +		 payload.flags & CXL_GET_EVENT_FLAG_MORE_RECORDS);
> +}


> diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
> new file mode 100644
> index 000000000000..60dec9a84918
> --- /dev/null
> +++ b/include/trace/events/cxl.h
> @@ -0,0 +1,127 @@


> +#define CXL_EVT_TP_fast_assign(dname, l, hdr)					\
> +	__assign_str(dev_name, (dname));					\
> +	__entry->log = (l);							\
> +	memcpy(&__entry->hdr_uuid, &(hdr).id, sizeof(uuid_t));			\
> +	__entry->hdr_length = (hdr).length;					\
> +	__entry->hdr_flags = get_unaligned_le24((hdr).flags);			\
> +	__entry->hdr_handle = le16_to_cpu((hdr).handle);			\
> +	__entry->hdr_related_handle = le16_to_cpu((hdr).related_handle);	\
> +	__entry->hdr_timestamp = le64_to_cpu((hdr).timestamp);			\
> +	__entry->hdr_maint_op_class = (hdr).maint_op_class
> +
Trivial: Maybe one blank line is enough?
> +
> +#define CXL_EVT_TP_printk(fmt, ...) \
> +	TP_printk("%s log=%s : time=%llu uuid=%pUb len=%d flags='%s' "		\
> +		"handle=%x related_handle=%x maint_op_class=%u"			\
> +		" : " fmt,							\
> +		__get_str(dev_name), cxl_event_log_type_str(__entry->log),	\
> +		__entry->hdr_timestamp, &__entry->hdr_uuid, __entry->hdr_length,\
> +		show_hdr_flags(__entry->hdr_flags), __entry->hdr_handle,	\
> +		__entry->hdr_related_handle, __entry->hdr_maint_op_class,	\
> +		##__VA_ARGS__)



* Re: [PATCH 03/11] cxl/mem: Implement Clear Event Records command
  2022-11-10 18:57 ` [PATCH 03/11] cxl/mem: Implement Clear " ira.weiny
  2022-11-15 22:09   ` Dave Jiang
@ 2022-11-16 15:24   ` Jonathan Cameron
  2022-11-16 15:45     ` Jonathan Cameron
  2022-11-17  1:07     ` Ira Weiny
  1 sibling, 2 replies; 50+ messages in thread
From: Jonathan Cameron @ 2022-11-16 15:24 UTC (permalink / raw)
  To: ira.weiny
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Thu, 10 Nov 2022 10:57:50 -0800
ira.weiny@intel.com wrote:

> From: Ira Weiny <ira.weiny@intel.com>
> 
> CXL rev 3.0 section 8.2.9.2.3 defines the Clear Event Records mailbox
> command.  After an event record is read it needs to be cleared from the
> event log.
> 
> Implement cxl_clear_event_record() and call it for each record retrieved
> from the device.
> 
> Each record is cleared individually.  A clear all bit is specified but
> events could arrive between a get and the final clear all operation.
> Therefore each event is cleared specifically.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
Some follow through comment updates needed from changes in earlier patches +
one comment you can ignore if you prefer to keep it as is.

>  static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
>  				    enum cxl_event_log_type type)
>  {
> @@ -728,14 +750,23 @@ static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
>  		}
>  
>  		pl_nr = le16_to_cpu(payload.record_count);
> -		if (trace_cxl_generic_event_enabled()) {

To simplify this patch, maybe push this check down into the previous patch so
this one doesn't move code around?  It'll look a tiny bit odd there, of course.

> +		if (pl_nr > 0) {
>  			u16 nr_rec = min_t(u16, pl_nr, CXL_GET_EVENT_NR_RECORDS);
>  			int i;
>  
> -			for (i = 0; i < nr_rec; i++)
> -				trace_cxl_generic_event(dev_name(cxlds->dev),
> -							type,
> -							&payload.record[i]);
> +			if (trace_cxl_generic_event_enabled()) {
> +				for (i = 0; i < nr_rec; i++)
> +					trace_cxl_generic_event(dev_name(cxlds->dev),
> +								type,
> +								&payload.record[i]);
> +			}
> +
> +			rc = cxl_clear_event_record(cxlds, type, &payload, nr_rec);
> +			if (rc) {
> +				dev_err(cxlds->dev, "Event log '%s': Failed to clear events : %d",
> +					cxl_event_log_type_str(type), rc);
> +				return;
> +			}
>  		}
>  

> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index da64ba0f156b..28a114c7cf69 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h

>  
> +/*
> + * Clear Event Records input payload
> + * CXL rev 3.0 section 8.2.9.2.3; Table 8-51
> + *
> + * Space given for 1 record

Nope...


> + */
> +struct cxl_mbox_clear_event_payload {
> +	u8 event_log;		/* enum cxl_event_log_type */
> +	u8 clear_flags;
> +	u8 nr_recs;		/* 1 for this struct */
Nope :)  Delete the comments so they can't be wrong if this changes in future!

> +	u8 reserved[3];
> +	__le16 handle[CXL_GET_EVENT_NR_RECORDS];
> +};
> +


* Re: [PATCH 05/11] cxl/mem: Trace General Media Event Record
  2022-11-10 18:57 ` [PATCH 05/11] cxl/mem: Trace General Media Event Record ira.weiny
  2022-11-15 22:25   ` Dave Jiang
@ 2022-11-16 15:31   ` Jonathan Cameron
  2022-11-17  1:18     ` Ira Weiny
  1 sibling, 1 reply; 50+ messages in thread
From: Jonathan Cameron @ 2022-11-16 15:31 UTC (permalink / raw)
  To: ira.weiny
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Thu, 10 Nov 2022 10:57:52 -0800
ira.weiny@intel.com wrote:

> From: Ira Weiny <ira.weiny@intel.com>
> 
> CXL rev 3.0 section 8.2.9.2.1.1 defines the General Media Event Record.
> 
> Determine if the event read is a general media record and if so trace
> the record as a General Media Event Record.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
A few v2.0 references left in here that should be updated given it's new code.

With those tidied up
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>


> diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
> index 60dec9a84918..a0c20e110708 100644
> --- a/include/trace/events/cxl.h
> +++ b/include/trace/events/cxl.h
> @@ -119,6 +119,130 @@ TRACE_EVENT(cxl_generic_event,
>  		__print_hex(__entry->data, CXL_EVENT_RECORD_DATA_LENGTH))
>  );
>  
> +/*
> + * Physical Address field masks
> + *
> + * General Media Event Record
> + * CXL v2.0 Section 8.2.9.1.1.1; Table 154

Update to CXL rev 3.0 as I think we are preferring latest
spec references on any new code.

> + *
> + * DRAM Event Record
> + * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
> + */

> +
> +/*
> + * General Media Event Record - GMER
> + * CXL v2.0 Section 8.2.9.1.1.1; Table 154
Update ref to r3.0
Never 'v', or the spec folk will get irritable :)

> + */





* Re: [PATCH 07/11] cxl/mem: Trace Memory Module Event Record
  2022-11-10 18:57 ` [PATCH 07/11] cxl/mem: Trace Memory Module " ira.weiny
  2022-11-15 22:39   ` Dave Jiang
@ 2022-11-16 15:35   ` Jonathan Cameron
  2022-11-17  1:23     ` Ira Weiny
  2022-11-22 22:36   ` Steven Rostedt
  2 siblings, 1 reply; 50+ messages in thread
From: Jonathan Cameron @ 2022-11-16 15:35 UTC (permalink / raw)
  To: ira.weiny
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Thu, 10 Nov 2022 10:57:54 -0800
ira.weiny@intel.com wrote:

> From: Ira Weiny <ira.weiny@intel.com>
> 
> CXL rev 3.0 section 8.2.9.2.1.3 defines the Memory Module Event Record.
> 
> Determine if the event read is memory module record and if so trace the
> record.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
Noticed that we have a mixture of fully capitalized and non-capitalized names for flags.
With that either explained or tidied up:

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> +/*
> + * Device Health Information - DHI
> + *
> + * CXL res 3.0 section 8.2.9.8.3.1; Table 8-100
> + */
> +#define CXL_DHI_HS_MAINTENANCE_NEEDED				BIT(0)
> +#define CXL_DHI_HS_PERFORMANCE_DEGRADED				BIT(1)
> +#define CXL_DHI_HS_HW_REPLACEMENT_NEEDED			BIT(2)
> +#define show_health_status_flags(flags)	__print_flags(flags, "|",	   \
> +	{ CXL_DHI_HS_MAINTENANCE_NEEDED,	"Maintenance Needed"	}, \
> +	{ CXL_DHI_HS_PERFORMANCE_DEGRADED,	"Performance Degraded"	}, \
> +	{ CXL_DHI_HS_HW_REPLACEMENT_NEEDED,	"Replacement Needed"	}  \

Why are we sometime using capitals for flags (e.g patch 5) and not other times?

> +)



* Re: [PATCH 03/11] cxl/mem: Implement Clear Event Records command
  2022-11-16 15:24   ` Jonathan Cameron
@ 2022-11-16 15:45     ` Jonathan Cameron
  2022-11-17  1:12       ` Ira Weiny
  2022-11-17  1:07     ` Ira Weiny
  1 sibling, 1 reply; 50+ messages in thread
From: Jonathan Cameron @ 2022-11-16 15:45 UTC (permalink / raw)
  To: ira.weiny
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Wed, 16 Nov 2022 15:24:26 +0000
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:

> On Thu, 10 Nov 2022 10:57:50 -0800
> ira.weiny@intel.com wrote:
> 
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > CXL rev 3.0 section 8.2.9.2.3 defines the Clear Event Records mailbox
> > command.  After an event record is read it needs to be cleared from the
> > event log.
> > 
> > Implement cxl_clear_event_record() and call it for each record retrieved
> > from the device.
> > 
> > Each record is cleared individually.  A clear all bit is specified but
> > events could arrive between a get and the final clear all operation.
> > Therefore each event is cleared specifically.
> > 
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> >   
> Some follow through comment updates needed from changes in earlier patches +
> one comment you can ignore if you prefer to keep it as is.
> 
> >  static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> >  				    enum cxl_event_log_type type)
> >  {
> > @@ -728,14 +750,23 @@ static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> >  		}
> >  
> >  		pl_nr = le16_to_cpu(payload.record_count);
> > -		if (trace_cxl_generic_event_enabled()) {  
> 
> To simplify this patch, maybe push this check down in the previous patch so this
> one doesn't move code around?  It'll look a tiny bit odd there of course..
> 
> > +		if (pl_nr > 0) {
> >  			u16 nr_rec = min_t(u16, pl_nr, CXL_GET_EVENT_NR_RECORDS);
> >  			int i;
> >  
> > -			for (i = 0; i < nr_rec; i++)
> > -				trace_cxl_generic_event(dev_name(cxlds->dev),
> > -							type,
> > -							&payload.record[i]);
> > +			if (trace_cxl_generic_event_enabled()) {
> > +				for (i = 0; i < nr_rec; i++)
> > +					trace_cxl_generic_event(dev_name(cxlds->dev),
> > +								type,
> > +								&payload.record[i]);
> > +			}
> > +
> > +			rc = cxl_clear_event_record(cxlds, type, &payload, nr_rec);
> > +			if (rc) {
> > +				dev_err(cxlds->dev, "Event log '%s': Failed to clear events : %d",
> > +					cxl_event_log_type_str(type), rc);
> > +				return;
> > +			}
> >  		}
> >    
> 
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index da64ba0f156b..28a114c7cf69 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h  
> 
> >  
> > +/*
> > + * Clear Event Records input payload
> > + * CXL rev 3.0 section 8.2.9.2.3; Table 8-51
> > + *
> > + * Space given for 1 record  
> 
> Nope...
> 
> 
> > + */
> > +struct cxl_mbox_clear_event_payload {
> > +	u8 event_log;		/* enum cxl_event_log_type */
> > +	u8 clear_flags;
> > +	u8 nr_recs;		/* 1 for this struct */  
> Nope :)  Delete the comments so they can't be wrong if this changes in future!
Ah, you only use one.  So you should hard-code that in the array size below.

> 
> > +	u8 reserved[3];
> > +	__le16 handle[CXL_GET_EVENT_NR_RECORDS];
> > +};
> > +  
> 



* Re: [PATCH 09/11] cxl/test: Add generic mock events
  2022-11-10 18:57 ` [PATCH 09/11] cxl/test: Add generic mock events ira.weiny
@ 2022-11-16 16:00   ` Jonathan Cameron
  2022-11-29 18:29     ` Ira Weiny
  0 siblings, 1 reply; 50+ messages in thread
From: Jonathan Cameron @ 2022-11-16 16:00 UTC (permalink / raw)
  To: ira.weiny
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Thu, 10 Nov 2022 10:57:56 -0800
ira.weiny@intel.com wrote:

> From: Ira Weiny <ira.weiny@intel.com>
> 
> Facilitate testing basic Get/Clear Event functionality by creating
> multiple logs and generic events with made up UUID's.
> 
> Data is completely made up with data patterns which should be easy to
> spot in trace output.
> 
> A single sysfs entry resets the event data and triggers collecting the
> events for testing.
> 
> Test traces are easy to obtain with a small script such as this:
> 
> 	#!/bin/bash -x
> 
> 	devices=`find /sys/devices/platform -name cxl_mem*`
> 
> 	# Turn on tracing
> 	echo "" > /sys/kernel/tracing/trace
> 	echo 1 > /sys/kernel/tracing/events/cxl/enable
> 	echo 1 > /sys/kernel/tracing/tracing_on
> 
> 	# Generate fake interrupt
> 	for device in $devices; do
> 	        echo 1 > $device/event_trigger
> 	done
> 
> 	# Turn off tracing and report events
> 	echo 0 > /sys/kernel/tracing/tracing_on
> 	cat /sys/kernel/tracing/trace
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Hi Ira,

I don't think your mocked device is now obeying the spec
after changes in the core code that mean it gets a larger
request than previously.
If it has more than 1 record and the read is for 3, it
must return more than 1, and only set MORE_RECORDS if there
are more than 3.

Gah. The more event records approach also suffers the
same problem that poison list does. You have no way
to be sure that "previous software" (which may have crashed)
hasn't already read some.  So in the core code we probably
need to do one more read on initial boot to be sure we have
all the records.  Not sure how I spotted that for poison
but never noticed it for these.  At least for these records
the expectation is that there won't be a huge number of them
so reading one more time is fine - particularly as you clear
them on that initial read so the list will get shorter.

Jonathan

> 
> ---
> Changes from RFC v2:
> 	Adjust to simulate the event status register
> 
> Changes from RFC:
> 	Separate out the event code
> 	Adjust for struct changes.
> 	Clean up devm_cxl_mock_event_logs()
> 	Clean up naming and comments
> 	Jonathan
> 		Remove dynamic allocation of event logs
> 		Clean up comment
> 		Remove unneeded xarray
> 		Ensure event_trigger sysfs is valid prior to the driver
> 		going active.
> 	Dan
> 		Remove the fill/reset event sysfs as these operations
> 		can be done together
> ---
>  drivers/cxl/core/mbox.c         |  31 +++--
>  drivers/cxl/cxlmem.h            |   1 +
>  tools/testing/cxl/test/Kbuild   |   2 +-
>  tools/testing/cxl/test/events.c | 222 ++++++++++++++++++++++++++++++++
>  tools/testing/cxl/test/events.h |   9 ++
>  tools/testing/cxl/test/mem.c    |  35 ++++-
>  6 files changed, 286 insertions(+), 14 deletions(-)
>  create mode 100644 tools/testing/cxl/test/events.c
>  create mode 100644 tools/testing/cxl/test/events.h


> diff --git a/tools/testing/cxl/test/events.c b/tools/testing/cxl/test/events.c
> new file mode 100644
> index 000000000000..a4816f230bb5
> --- /dev/null
> +++ b/tools/testing/cxl/test/events.c


> +	struct cxl_event_record_raw *events[CXL_TEST_EVENT_CNT_MAX];
> +};
> +
> +struct mock_event_store {
> +	struct cxl_dev_state *cxlds;
> +	struct mock_event_log mock_logs[CXL_EVENT_TYPE_MAX];
> +	u32 ev_status;
> +};
> +
> +DEFINE_XARRAY(mock_dev_event_store);

Perhaps add a comment on what this xarray is for.
I think it's all to allow associating some extra data with the devices
without bloating structures outside of tests?

> +
> +struct mock_event_log *find_event_log(struct device *dev, int log_type)
> +{
> +	struct mock_event_store *mes = xa_load(&mock_dev_event_store,
> +					       (unsigned long)dev);
> +
> +	if (!mes || log_type >= CXL_EVENT_TYPE_MAX)
> +		return NULL;
> +	return &mes->mock_logs[log_type];
> +}
> +

> +
> +int mock_get_event(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
> +{
> +	struct cxl_get_event_payload *pl;
> +	struct mock_event_log *log;
> +	u8 log_type;
> +
> +	/* Valid request? */
> +	if (cmd->size_in != sizeof(log_type))
> +		return -EINVAL;
> +
> +	log_type = *((u8 *)cmd->payload_in);
> +	if (log_type >= CXL_EVENT_TYPE_MAX)
> +		return -EINVAL;
> +
> +	log = find_event_log(cxlds->dev, log_type);
> +	if (!log || log_empty(log))
> +		goto no_data;
> +
> +	pl = cmd->payload_out;
> +	memset(pl, 0, sizeof(*pl));
> +
> +	pl->record_count = cpu_to_le16(1);

Not valid.  Kernel now requests 3 and as I read the spec we have
to return 3 if we have 3 or more to return. Can't send 1 and set
MORE_RECORDS as done here.

> +
> +	if (log_rec_left(log) > 1)
> +		pl->flags |= CXL_GET_EVENT_FLAG_MORE_RECORDS;
> +
> +	memcpy(&pl->record[0], get_cur_event(log), sizeof(pl->record[0]));
> +	pl->record[0].hdr.handle = get_cur_event_handle(log);
> +	return 0;
> +
> +no_data:
> +	/* Room for header? */

Why check for space here, but not when setting records above?

> +	if (cmd->size_out < (sizeof(*pl) - sizeof(pl->record[0])))
> +		return -EINVAL;
> +
> +	memset(cmd->payload_out, 0, cmd->size_out);
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(mock_get_event);
> +
> +/*
> + * Get and clear event only handle 1 record at a time as this is what is
> + * currently implemented in the main code.
> + */
> +int mock_clear_event(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
> +{
> +	struct cxl_mbox_clear_event_payload *pl = cmd->payload_in;
> +	struct mock_event_log *log;
> +	u8 log_type = pl->event_log;
> +
> +	/* Don't handle more than 1 record at a time */
> +	if (pl->nr_recs != 1)
> +		return -EINVAL;
> +
> +	if (log_type >= CXL_EVENT_TYPE_MAX)
> +		return -EINVAL;
> +
> +	log = find_event_log(cxlds->dev, log_type);
> +	if (!log)
> +		return 0; /* No mock data in this log */
> +
> +	/*
> +	 * Test code only reported 1 event at a time.  So only support 1 event
> +	 * being cleared.
> +	 */
> +	if (log->cur_event != le16_to_cpu(pl->handle[0])) {
> +		dev_err(cxlds->dev, "Clearing events out of order\n");
> +		return -EINVAL;
> +	}
> +
> +	log->cur_event++;
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(mock_clear_event);

...

> +
> +struct cxl_event_record_raw maint_needed = {
> +	.hdr = {
> +		.id = UUID_INIT(0xDEADBEEF, 0xCAFE, 0xBABE,
> +				0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
> +		.length = sizeof(struct cxl_event_record_raw),
> +		.flags[0] = CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,
> +		/* .handle = Set dynamically */

Multiple devices... So this should be const and a copy made for each one
to avoid races.

> +		.related_handle = cpu_to_le16(0xa5b6),
> +	},
> +	.data = { 0xDE, 0xAD, 0xBE, 0xEF },
> +};
> +
> +struct cxl_event_record_raw hardware_replace = {
> +	.hdr = {
> +		.id = UUID_INIT(0xBABECAFE, 0xBEEF, 0xDEAD,
> +				0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
> +		.length = sizeof(struct cxl_event_record_raw),
> +		.flags[0] = CXL_EVENT_RECORD_FLAG_HW_REPLACE,
> +		/* .handle = Set dynamically */
> +		.related_handle = cpu_to_le16(0xb6a5),
> +	},
> +	.data = { 0xDE, 0xAD, 0xBE, 0xEF },
> +};
> +
> +u32 cxl_mock_add_event_logs(struct cxl_dev_state *cxlds)
> +{
> +	struct device *dev = cxlds->dev;
> +	struct mock_event_store *mes;
> +
> +	mes = devm_kzalloc(dev, sizeof(*mes), GFP_KERNEL);
> +	if (WARN_ON(!mes))
> +		return 0;
> +	mes->cxlds = cxlds;
> +
> +	if (xa_insert(&mock_dev_event_store, (unsigned long)dev, mes,
> +		      GFP_KERNEL)) {
> +		dev_err(dev, "Event store not available for %s\n",
> +			dev_name(dev));
> +		return 0;
> +	}
> +
> +	event_store_add_event(mes, CXL_EVENT_TYPE_INFO, &maint_needed);
> +	mes->ev_status |= CXLDEV_EVENT_STATUS_INFO;
> +
> +	event_store_add_event(mes, CXL_EVENT_TYPE_FATAL, &hardware_replace);
> +	mes->ev_status |= CXLDEV_EVENT_STATUS_FATAL;
> +
> +	return mes->ev_status;
> +}
> +EXPORT_SYMBOL_GPL(cxl_mock_add_event_logs);
> +
> +void cxl_mock_remove_event_logs(struct device *dev)
> +{
> +	struct mock_event_store *mes;
> +
> +	mes = xa_erase(&mock_dev_event_store, (unsigned long)dev);
> +}
> +EXPORT_SYMBOL_GPL(cxl_mock_remove_event_logs);

> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index e2f5445d24ff..333fa8527a07 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
...


>  static int cxl_mock_mem_probe(struct platform_device *pdev)
>  {
>  	struct device *dev = &pdev->dev;
>  	struct cxl_memdev *cxlmd;
>  	struct cxl_dev_state *cxlds;
> +	u32 ev_status;
>  	void *lsa;
>  	int rc;
>  
> @@ -281,11 +304,13 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
>  	if (rc)
>  		return rc;
>  
> +	ev_status = cxl_mock_add_event_logs(cxlds);
As per the comment below, add a devm_add_action_or_reset() here to
undo this.  If nothing else, without one you should at least have error
handling...

> +
>  	cxlmd = devm_cxl_add_memdev(cxlds);
>  	if (IS_ERR(cxlmd))
>  		return PTR_ERR(cxlmd);
>  
> -	cxl_mem_get_event_records(cxlds);
> +	__cxl_mem_get_event_records(cxlds, ev_status);
>  
>  	if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
>  		rc = devm_cxl_add_nvdimm(dev, cxlmd);
> @@ -293,6 +318,12 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
>  	return 0;
>  }
>  
> +static int cxl_mock_mem_remove(struct platform_device *pdev)
> +{
> +	cxl_mock_remove_event_logs(&pdev->dev);

Given you have a bunch of devm above, probably better to just use
a devm_add_action_or_reset() to clean this up.
Saves on introducing remove for just this one call + any potential
ordering issues (I'm too lazy to check if there are any ;)


> +	return 0;
> +}
> +
>  static const struct platform_device_id cxl_mock_mem_ids[] = {
>  	{ .name = "cxl_mem", },
>  	{ },
> @@ -301,9 +332,11 @@ MODULE_DEVICE_TABLE(platform, cxl_mock_mem_ids);
>  
>  static struct platform_driver cxl_mock_mem_driver = {
>  	.probe = cxl_mock_mem_probe,
> +	.remove = cxl_mock_mem_remove,
>  	.id_table = cxl_mock_mem_ids,
>  	.driver = {
>  		.name = KBUILD_MODNAME,
> +		.dev_groups = cxl_mock_event_groups,
>  	},
>  };
>  



* Re: [PATCH 10/11] cxl/test: Add specific events
  2022-11-10 18:57 ` [PATCH 10/11] cxl/test: Add specific events ira.weiny
@ 2022-11-16 16:08   ` Jonathan Cameron
  0 siblings, 0 replies; 50+ messages in thread
From: Jonathan Cameron @ 2022-11-16 16:08 UTC (permalink / raw)
  To: ira.weiny
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Thu, 10 Nov 2022 10:57:57 -0800
ira.weiny@intel.com wrote:

> From: Ira Weiny <ira.weiny@intel.com>
> 
> Each type of event has different trace point outputs.
> 
> Add mock General Media Event, DRAM event, and Memory Module Event
> records to the mock list of events returned.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
A few trivial things inline. Otherwise

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> 
> ---
> Changes from RFC:
> 	Adjust for struct changes
> 	adjust for unaligned fields
> ---
>  tools/testing/cxl/test/events.c | 70 +++++++++++++++++++++++++++++++++
>  1 file changed, 70 insertions(+)
> 
> diff --git a/tools/testing/cxl/test/events.c b/tools/testing/cxl/test/events.c
> index a4816f230bb5..8693f3fb9cbb 100644
> --- a/tools/testing/cxl/test/events.c
> +++ b/tools/testing/cxl/test/events.c
> @@ -186,6 +186,70 @@ struct cxl_event_record_raw hardware_replace = {
>  	.data = { 0xDE, 0xAD, 0xBE, 0xEF },
>  };
>  
> +struct cxl_event_gen_media gen_media = {
> +	.hdr = {
> +		.id = UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
> +				0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6),
> +		.length = sizeof(struct cxl_event_gen_media),
> +		.flags[0] = CXL_EVENT_RECORD_FLAG_PERMANENT,
> +		/* .handle = Set dynamically */
> +		.related_handle = cpu_to_le16(0),
> +	},
> +	.phys_addr = cpu_to_le64(0x2000),
> +	.descriptor = CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT,
> +	.type = CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,
> +	.transaction_type = CXL_GMER_TRANS_HOST_WRITE,
> +	.validity_flags = { CXL_GMER_VALID_CHANNEL |
> +			    CXL_GMER_VALID_RANK, 0 },
put_unaligned_le16()

> +	.channel = 1,
> +	.rank = 30
> +};
> +
> +struct cxl_event_dram dram = {
> +	.hdr = {
> +		.id = UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
> +				0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24),
> +		.length = sizeof(struct cxl_event_dram),
> +		.flags[0] = CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,
> +		/* .handle = Set dynamically */
> +		.related_handle = cpu_to_le16(0),
> +	},
> +	.phys_addr = cpu_to_le64(0x8000),
> +	.descriptor = CXL_GMER_EVT_DESC_THRESHOLD_EVENT,
> +	.type = CXL_GMER_MEM_EVT_TYPE_INV_ADDR,
> +	.transaction_type = CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB,
> +	.validity_flags = { CXL_DER_VALID_CHANNEL |
> +			    CXL_DER_VALID_BANK_GROUP |
> +			    CXL_DER_VALID_BANK |
> +			    CXL_DER_VALID_COLUMN, 0 },

put_unaligned_le16() etc

> +	.channel = 1,
> +	.bank_group = 5,
> +	.bank = 2,
> +	.column = { 0xDE, 0xAD},
spacing

> +};
> +
> +struct cxl_event_mem_module mem_module = {
> +	.hdr = {
> +		.id = UUID_INIT(0xfe927475, 0xdd59, 0x4339,
> +				0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74),
> +		.length = sizeof(struct cxl_event_mem_module),
> +		/* .handle = Set dynamically */
> +		.related_handle = cpu_to_le16(0),
> +	},
> +	.event_type = CXL_MMER_TEMP_CHANGE,
> +	.info = {
> +		.health_status = CXL_DHI_HS_PERFORMANCE_DEGRADED,
> +		.media_status = CXL_DHI_MS_ALL_DATA_LOST,
> +		.add_status = (CXL_DHI_AS_CRITICAL << 2) |
> +			      (CXL_DHI_AS_WARNING << 4) |
> +			      (CXL_DHI_AS_WARNING << 5),
> +		.device_temp = { 0xDE, 0xAD},
> +		.dirty_shutdown_cnt = { 0xde, 0xad, 0xbe, 0xef },
> +		.cor_vol_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
> +		.cor_per_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
> +	}
> +};
> +
>  u32 cxl_mock_add_event_logs(struct cxl_dev_state *cxlds)
>  {
>  	struct device *dev = cxlds->dev;
> @@ -204,9 +268,15 @@ u32 cxl_mock_add_event_logs(struct cxl_dev_state *cxlds)
>  	}
>  
>  	event_store_add_event(mes, CXL_EVENT_TYPE_INFO, &maint_needed);
> +	event_store_add_event(mes, CXL_EVENT_TYPE_INFO,
> +			      (struct cxl_event_record_raw *)&gen_media);
> +	event_store_add_event(mes, CXL_EVENT_TYPE_INFO,
> +			      (struct cxl_event_record_raw *)&mem_module);
>  	mes->ev_status |= CXLDEV_EVENT_STATUS_INFO;
>  
>  	event_store_add_event(mes, CXL_EVENT_TYPE_FATAL, &hardware_replace);
> +	event_store_add_event(mes, CXL_EVENT_TYPE_FATAL,
> +			      (struct cxl_event_record_raw *)&dram);
>  	mes->ev_status |= CXLDEV_EVENT_STATUS_FATAL;
>  
>  	return mes->ev_status;



* Re: [PATCH 11/11] cxl/test: Simulate event log overflow
  2022-11-10 18:57 ` [PATCH 11/11] cxl/test: Simulate event log overflow ira.weiny
@ 2022-11-16 16:10   ` Jonathan Cameron
  0 siblings, 0 replies; 50+ messages in thread
From: Jonathan Cameron @ 2022-11-16 16:10 UTC (permalink / raw)
  To: ira.weiny
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Thu, 10 Nov 2022 10:57:58 -0800
ira.weiny@intel.com wrote:

> From: Ira Weiny <ira.weiny@intel.com>
> 
> Log overflow is marked by a separate trace message.
> 
> Simulate a log with lots of messages and flag overflow until it is
> drained a bit.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Looks fine to me

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> ---
> Changes from RFC
> 	Adjust for new struct changes
> ---
>  tools/testing/cxl/test/events.c | 37 +++++++++++++++++++++++++++++++++
>  1 file changed, 37 insertions(+)
> 
> diff --git a/tools/testing/cxl/test/events.c b/tools/testing/cxl/test/events.c
> index 8693f3fb9cbb..5ce257114f4e 100644
> --- a/tools/testing/cxl/test/events.c
> +++ b/tools/testing/cxl/test/events.c
> @@ -69,11 +69,21 @@ static void event_store_add_event(struct mock_event_store *mes,
>  	log->nr_events++;
>  }
>  
> +static u16 log_overflow(struct mock_event_log *log)
> +{
> +	int cnt = log_rec_left(log) - 5;
> +
> +	if (cnt < 0)
> +		return 0;
> +	return cnt;
> +}
> +
>  int mock_get_event(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
>  {
>  	struct cxl_get_event_payload *pl;
>  	struct mock_event_log *log;
>  	u8 log_type;
> +	u16 nr_overflow;
>  
>  	/* Valid request? */
>  	if (cmd->size_in != sizeof(log_type))
> @@ -95,6 +105,20 @@ int mock_get_event(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
>  	if (log_rec_left(log) > 1)
>  		pl->flags |= CXL_GET_EVENT_FLAG_MORE_RECORDS;
>  
> +	nr_overflow = log_overflow(log);
> +	if (nr_overflow) {
> +		u64 ns;
> +
> +		pl->flags |= CXL_GET_EVENT_FLAG_OVERFLOW;
> +		pl->overflow_err_count = cpu_to_le16(nr_overflow);
> +		ns = ktime_get_real_ns();
> +		ns -= 5000000000; /* 5s ago */
> +		pl->first_overflow_timestamp = cpu_to_le64(ns);
> +		ns = ktime_get_real_ns();
> +		ns -= 1000000000; /* 1s ago */
> +		pl->last_overflow_timestamp = cpu_to_le64(ns);
> +	}
> +
>  	memcpy(&pl->record[0], get_cur_event(log), sizeof(pl->record[0]));
>  	pl->record[0].hdr.handle = get_cur_event_handle(log);
>  	return 0;
> @@ -274,6 +298,19 @@ u32 cxl_mock_add_event_logs(struct cxl_dev_state *cxlds)
>  			      (struct cxl_event_record_raw *)&mem_module);
>  	mes->ev_status |= CXLDEV_EVENT_STATUS_INFO;
>  
> +	event_store_add_event(mes, CXL_EVENT_TYPE_FAIL, &maint_needed);
> +	event_store_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace);
> +	event_store_add_event(mes, CXL_EVENT_TYPE_FAIL,
> +			      (struct cxl_event_record_raw *)&dram);
> +	event_store_add_event(mes, CXL_EVENT_TYPE_FAIL,
> +			      (struct cxl_event_record_raw *)&gen_media);
> +	event_store_add_event(mes, CXL_EVENT_TYPE_FAIL,
> +			      (struct cxl_event_record_raw *)&mem_module);
> +	event_store_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace);
> +	event_store_add_event(mes, CXL_EVENT_TYPE_FAIL,
> +			      (struct cxl_event_record_raw *)&dram);
> +	mes->ev_status |= CXLDEV_EVENT_STATUS_FAIL;
> +
>  	event_store_add_event(mes, CXL_EVENT_TYPE_FATAL, &hardware_replace);
>  	event_store_add_event(mes, CXL_EVENT_TYPE_FATAL,
>  			      (struct cxl_event_record_raw *)&dram);



* Re: [PATCH 01/11] cxl/pci: Add generic MSI-X/MSI irq support
  2022-11-16 14:53   ` Jonathan Cameron
@ 2022-11-16 23:48     ` Ira Weiny
  2022-11-17 11:20       ` Jonathan Cameron
  0 siblings, 1 reply; 50+ messages in thread
From: Ira Weiny @ 2022-11-16 23:48 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Dan Williams, Davidlohr Bueso, Bjorn Helgaas, Alison Schofield,
	Vishal Verma, Ben Widawsky, Steven Rostedt, linux-kernel,
	linux-cxl

On Wed, Nov 16, 2022 at 02:53:41PM +0000, Jonathan Cameron wrote:
> On Thu, 10 Nov 2022 10:57:48 -0800
> ira.weiny@intel.com wrote:
> 
> > From: Davidlohr Bueso <dave@stgolabs.net>
> > 
> > Currently the only CXL features targeted for irq support require their
> > message numbers to be within the first 16 entries.  The device may
> > however support less than 16 entries depending on the support it
> > provides.
> > 
> > Attempt to allocate these 16 irq vectors.  If the device supports less
> > then the PCI infrastructure will allocate that number.  Store the number
> > of vectors actually allocated in the device state for later use
> > by individual functions.
> See later patch review, but I don't think we need to store the number
> allocated because any vector is guaranteed to be below that point

Only as long as we stick to those functions which are guaranteed to be under
16.  If a device supports more than 16 and code is added to try and enable that
irq, this base support will not cover that.

> (QEMU code is wrong on this at the moment, but there are very few vectors
>  so it hasn't mattered yet).

How so?  Does the spec state that a device must report at least 16 vectors?

> 
> Otherwise, pcim fun deals with some of the cleanup you are doing again
> here for us so can simplify this somewhat. See inline.

Yea it is broken.

> 
> Jonathan
> 
> 
> 
> > 
> > Upon successful allocation, users can plug in their respective isr at
> > any point thereafter, for example, if the irq setup is not done in the
> > PCI driver, such as the case of the CXL-PMU.
> > 
> > Cc: Bjorn Helgaas <helgaas@kernel.org>
> > Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Co-developed-by: Ira Weiny <ira.weiny@intel.com>
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> > 
> > ---
> > Changes from Ira
> > 	Remove reviews
> > 	Allocate up to a static 16 vectors.
> > 	Change cover letter
> > ---
> >  drivers/cxl/cxlmem.h |  3 +++
> >  drivers/cxl/cxlpci.h |  6 ++++++
> >  drivers/cxl/pci.c    | 32 ++++++++++++++++++++++++++++++++
> >  3 files changed, 41 insertions(+)
> > 
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index 88e3a8e54b6a..b7b955ded3ac 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -211,6 +211,7 @@ struct cxl_endpoint_dvsec_info {
> >   * @info: Cached DVSEC information about the device.
> >   * @serial: PCIe Device Serial Number
> >   * @doe_mbs: PCI DOE mailbox array
> > + * @nr_irq_vecs: Number of MSI-X/MSI vectors available
> >   * @mbox_send: @dev specific transport for transmitting mailbox commands
> >   *
> >   * See section 8.2.9.5.2 Capacity Configuration and Label Storage for
> > @@ -247,6 +248,8 @@ struct cxl_dev_state {
> >  
> >  	struct xarray doe_mbs;
> >  
> > +	int nr_irq_vecs;
> > +
> >  	int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
> >  };
> >  
> > diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> > index eec597dbe763..b7f4e2f417d3 100644
> > --- a/drivers/cxl/cxlpci.h
> > +++ b/drivers/cxl/cxlpci.h
> > @@ -53,6 +53,12 @@
> >  #define	    CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK			GENMASK(15, 8)
> >  #define     CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK		GENMASK(31, 16)
> >  
> > +/*
> > + * NOTE: Currently all the functions which are enabled for CXL require their
> > + * vectors to be in the first 16.  Use this as the max.
> > + */
> > +#define CXL_PCI_REQUIRED_VECTORS 16
> > +
> >  /* Register Block Identifier (RBI) */
> >  enum cxl_regloc_type {
> >  	CXL_REGLOC_RBI_EMPTY = 0,
> > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > index faeb5d9d7a7a..62e560063e50 100644
> > --- a/drivers/cxl/pci.c
> > +++ b/drivers/cxl/pci.c
> > @@ -428,6 +428,36 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
> >  	}
> >  }
> >  
> > +static void cxl_pci_free_irq_vectors(void *data)
> > +{
> > +	pci_free_irq_vectors(data);
> > +}
> > +
> > +static void cxl_pci_alloc_irq_vectors(struct cxl_dev_state *cxlds)
> > +{
> > +	struct device *dev = cxlds->dev;
> > +	struct pci_dev *pdev = to_pci_dev(dev);
> > +	int nvecs;
> > +	int rc;
> > +
> > +	nvecs = pci_alloc_irq_vectors(pdev, 1, CXL_PCI_REQUIRED_VECTORS,
> > +				   PCI_IRQ_MSIX | PCI_IRQ_MSI);
> > +	if (nvecs < 0) {
> > +		dev_dbg(dev, "Not enough interrupts; use polling instead.\n");
> > +		return;
> > +	}
> > +
> > +	rc = devm_add_action_or_reset(dev, cxl_pci_free_irq_vectors, pdev);
> The pci managed code always gives me a headache because there is a lot of magic
> under the hood if you ever called pcim_enable_device() which we did.
> 
> Chasing through
> 
> pci_alloc_irq_vectors_affinity()->
> either
> 	__pci_enable_msix_range()
> or
> 	__pci_enable_msi_range()
> 
> they are similar
> 	pci_setup_msi_context()
> 		pci_setup_msi_release()
> 			adds pcmi_msi_release devm action.
> and that frees the vectors for us.
> So we don't need to do it here.

:-/

So what is the point of pci_free_irq_vectors()?  It is very confusing to have
a function not named pcim_* [pci_alloc_irq_vectors()] doing 'pcim stuff'.

Ok I'll drop this extra because I see it now.

> 
> 
> > +	if (rc) {
> > +		dev_dbg(dev, "Device managed call failed; interrupts disabled.\n");
> > +		/* some got allocated, clean them up */
> > +		cxl_pci_free_irq_vectors(pdev);
> We could just leave them lying around for devm cleanup to sweep up eventually
> or free them as you have done here.

And besides, this extra call is flat out broken.  cxl_pci_free_irq_vectors() is
already called at this point if devm_add_action_or_reset() failed...  But I see
this is not required.

I do plan to add a big ol' comment as to why we don't need to mirror the call
with the corresponding 'free'.

I'll respin,
Ira

> 
> > +		return;
> > +	}
> > +
> > +	cxlds->nr_irq_vecs = nvecs;
> > +}
> > +
> >  static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> >  {
> >  	struct cxl_register_map map;
> > @@ -494,6 +524,8 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> >  	if (rc)
> >  		return rc;
> >  
> > +	cxl_pci_alloc_irq_vectors(cxlds);
> > +
> >  	cxlmd = devm_cxl_add_memdev(cxlds);
> >  	if (IS_ERR(cxlmd))
> >  		return PTR_ERR(cxlmd);
> 


* Re: [PATCH 02/11] cxl/mem: Implement Get Event Records command
  2022-11-16 15:19   ` Jonathan Cameron
@ 2022-11-17  0:47     ` Ira Weiny
  2022-11-17 10:43       ` Jonathan Cameron
  0 siblings, 1 reply; 50+ messages in thread
From: Ira Weiny @ 2022-11-17  0:47 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Dan Williams, Steven Rostedt, Alison Schofield, Vishal Verma,
	Ben Widawsky, Davidlohr Bueso, linux-kernel, linux-cxl

On Wed, Nov 16, 2022 at 03:19:36PM +0000, Jonathan Cameron wrote:
> On Thu, 10 Nov 2022 10:57:49 -0800
> ira.weiny@intel.com wrote:
> 
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > CXL devices have multiple event logs which can be queried for CXL event
> > records.  Devices are required to support the storage of at least one
> > event record in each event log type.
> > 
> > Devices track event log overflow by incrementing a counter and tracking
> > the time of the first and last overflow event seen.
> > 
> > Software queries events via the Get Event Record mailbox command; CXL
> > rev 3.0 section 8.2.9.2.2.
> > 
> > Issue the Get Event Record mailbox command on driver load.  Trace each
> > record found with a generic record trace.  Trace any overflow
> > conditions.
> > 
> > The device can return up to 1MB worth of event records per query.  This
> > presents complications with allocating huge buffers to potentially
> > capture all the records.  It is not anticipated that these event logs
> > will be very deep and reading them does not need to be performant.
> > Process only 3 records at a time.  3 records was chosen as it fits
> > comfortably on the stack to prevent dynamic allocation while still
> > cutting down on extra mailbox messages.
> > 
> > This patch traces a raw event record only and leaves the specific event
> > record types to subsequent patches.
> > 
> > Macros are created to use for tracing the common CXL Event header
> > fields.
> > 
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> Hi Ira,
> 
> A question inline about whether some of the conditions you are checking
> for can actually happen. Otherwise looks good to me.
> 
> Jonathan
> 

[snip]

> > +static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> > +				    enum cxl_event_log_type type)
> > +{
> > +	struct cxl_get_event_payload payload;
> > +	u16 pl_nr;
> > +
> > +	do {
> > +		u8 log_type = type;
> > +		int rc;
> > +
> > +		rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVENT_RECORD,
> > +				       &log_type, sizeof(log_type),
> > +				       &payload, sizeof(payload));
> > +		if (rc) {
> > +			dev_err(cxlds->dev, "Event log '%s': Failed to query event records : %d",
> > +				cxl_event_log_type_str(type), rc);
> > +			return;
> > +		}
> > +
> > +		pl_nr = le16_to_cpu(payload.record_count);
> > +		if (trace_cxl_generic_event_enabled()) {
> > +			u16 nr_rec = min_t(u16, pl_nr, CXL_GET_EVENT_NR_RECORDS);
> 
> Either I'm misreading the spec, or it can't be greater than NR_RECORDS.

Well...  I could have read the spec wrong as well.  But after reading very
carefully I think this is actually correct.

> "The number of event records in the Event Records list...."

Where is this quote from?  I don't see that in the spec.

> Event Records being the field inside this payload which is not big enough to
> take more than CXL_GET_EVENT_NR_RECORDS and the intro to Get Event Records
> refers to the number being restricted by the mailbox output payload provided.

My understanding is that the output payload is only limited by the Payload Size
reported in the Mailbox Capability Register.Payload Size.  (Section 8.2.8.4.3)

This can be up to 1MB.  So the device could fill up to 1MB's worth of Event
Records while still being in compliance.  The generic mailbox code in the
driver caps the data based on the size passed into cxl_mbox_send_cmd() however,
the number of records reported is not changed.

> 
> I'm in favor of defense against broken hardware, but don't paper over any
> such error - scream about it.

I don't think this is out of spec unless the device is trying to write more
than 1MB and I think the core mailbox code will scream about that.

> 
> > +			int i;
> > +
> > +			for (i = 0; i < nr_rec; i++)
> > +				trace_cxl_generic_event(dev_name(cxlds->dev),
> > +							type,
> > +							&payload.record[i]);
> > +		}
> > +
> > +		if (trace_cxl_overflow_enabled() &&
> > +		    (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW))
> > +			trace_cxl_overflow(dev_name(cxlds->dev), type, &payload);
> > +
> > +	} while (pl_nr > CXL_GET_EVENT_NR_RECORDS ||
> 
> Isn't pl_nr > CXL_GET_EVENT_NR_RECORDS a hardware bug? It's the number in returned
> payload not the total number.

I don't think so.  The only value passed to the device is the _input_ payload
size.  The output payload size is not passed to the device and is not included
in the Get Event Records Input Payload.  (Table 8-49)

So my previous code was wrong.  Here is an example I think which is within the
spec but would result in the more records flag not being set.

	Device log depth == 10
	nr log entries == 7
	nr log entries in 1MB ~= (1M - hdr size) / 128 ~= 8000

Device sets Output Payload.Event Record Count == 7 (which is < 8000).  Common
mailbox code truncates that to 3.  More Event Records == 0 because it sent all
7 that it had.

This code will clear 3 and read again 2 more times.

Am I reading that wrong?

> 
> > +		 payload.flags & CXL_GET_EVENT_FLAG_MORE_RECORDS);
> > +}
> 
> 
> > diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
> > new file mode 100644
> > index 000000000000..60dec9a84918
> > --- /dev/null
> > +++ b/include/trace/events/cxl.h
> > @@ -0,0 +1,127 @@
> 
> 
> > +#define CXL_EVT_TP_fast_assign(dname, l, hdr)					\
> > +	__assign_str(dev_name, (dname));					\
> > +	__entry->log = (l);							\
> > +	memcpy(&__entry->hdr_uuid, &(hdr).id, sizeof(uuid_t));			\
> > +	__entry->hdr_length = (hdr).length;					\
> > +	__entry->hdr_flags = get_unaligned_le24((hdr).flags);			\
> > +	__entry->hdr_handle = le16_to_cpu((hdr).handle);			\
> > +	__entry->hdr_related_handle = le16_to_cpu((hdr).related_handle);	\
> > +	__entry->hdr_timestamp = le64_to_cpu((hdr).timestamp);			\
> > +	__entry->hdr_maint_op_class = (hdr).maint_op_class
> > +
> Trivial: Maybe one blank line is enough?

Yea I'll adjust,
Ira

> > +
> > +#define CXL_EVT_TP_printk(fmt, ...) \
> > +	TP_printk("%s log=%s : time=%llu uuid=%pUb len=%d flags='%s' "		\
> > +		"handle=%x related_handle=%x maint_op_class=%u"			\
> > +		" : " fmt,							\
> > +		__get_str(dev_name), cxl_event_log_type_str(__entry->log),	\
> > +		__entry->hdr_timestamp, &__entry->hdr_uuid, __entry->hdr_length,\
> > +		show_hdr_flags(__entry->hdr_flags), __entry->hdr_handle,	\
> > +		__entry->hdr_related_handle, __entry->hdr_maint_op_class,	\
> > +		##__VA_ARGS__)
> 


* Re: [PATCH 03/11] cxl/mem: Implement Clear Event Records command
  2022-11-16 15:24   ` Jonathan Cameron
  2022-11-16 15:45     ` Jonathan Cameron
@ 2022-11-17  1:07     ` Ira Weiny
  1 sibling, 0 replies; 50+ messages in thread
From: Ira Weiny @ 2022-11-17  1:07 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Wed, Nov 16, 2022 at 03:24:26PM +0000, Jonathan Cameron wrote:
> On Thu, 10 Nov 2022 10:57:50 -0800
> ira.weiny@intel.com wrote:
> 
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > CXL rev 3.0 section 8.2.9.2.3 defines the Clear Event Records mailbox
> > command.  After an event record is read it needs to be cleared from the
> > event log.
> > 
> > Implement cxl_clear_event_record() and call it for each record retrieved
> > from the device.
> > 
> > Each record is cleared individually.  A clear all bit is specified but
> > events could arrive between a get and the final clear all operation.
> > Therefore each event is cleared specifically.
> > 
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > 
> Some follow through comment updates needed from changes in earlier patches +
> one comment you can ignore if you prefer to keep it as is.
> 
> >  static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> >  				    enum cxl_event_log_type type)
> >  {
> > @@ -728,14 +750,23 @@ static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> >  		}
> >  
> >  		pl_nr = le16_to_cpu(payload.record_count);
> > -		if (trace_cxl_generic_event_enabled()) {
> 
> To simplify this patch, maybe push this check down in the previous patch so this
> one doesn't move code around?  It'll look a tiny bit odd there of course..

That is the issue.  I think the oddness is easier to defend here vs having it in
the previous patch.

> 
> > +		if (pl_nr > 0) {
> >  			u16 nr_rec = min_t(u16, pl_nr, CXL_GET_EVENT_NR_RECORDS);
> >  			int i;
> >  
> > -			for (i = 0; i < nr_rec; i++)
> > -				trace_cxl_generic_event(dev_name(cxlds->dev),
> > -							type,
> > -							&payload.record[i]);
> > +			if (trace_cxl_generic_event_enabled()) {
> > +				for (i = 0; i < nr_rec; i++)
> > +					trace_cxl_generic_event(dev_name(cxlds->dev),
> > +								type,
> > +								&payload.record[i]);
> > +			}
> > +
> > +			rc = cxl_clear_event_record(cxlds, type, &payload, nr_rec);
> > +			if (rc) {
> > +				dev_err(cxlds->dev, "Event log '%s': Failed to clear events : %d",
> > +					cxl_event_log_type_str(type), rc);
> > +				return;
> > +			}
> >  		}
> >  
> 
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index da64ba0f156b..28a114c7cf69 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> 
> >  
> > +/*
> > + * Clear Event Records input payload
> > + * CXL rev 3.0 section 8.2.9.2.3; Table 8-51
> > + *
> > + * Space given for 1 record
> 
> Nope...

<sigh> yep...  ;-)

> 
> 
> > + */
> > +struct cxl_mbox_clear_event_payload {
> > +	u8 event_log;		/* enum cxl_event_log_type */
> > +	u8 clear_flags;
> > +	u8 nr_recs;		/* 1 for this struct */
> Nope :)  Delete the comments so they can't be wrong if this changes in future!

Yep.  :-/

Ira


* Re: [PATCH 03/11] cxl/mem: Implement Clear Event Records command
  2022-11-16 15:45     ` Jonathan Cameron
@ 2022-11-17  1:12       ` Ira Weiny
  0 siblings, 0 replies; 50+ messages in thread
From: Ira Weiny @ 2022-11-17  1:12 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Wed, Nov 16, 2022 at 03:45:43PM +0000, Jonathan Cameron wrote:
> On Wed, 16 Nov 2022 15:24:26 +0000
> Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> 

[snip]

> > 
> > 
> > > + */
> > > +struct cxl_mbox_clear_event_payload {
> > > +	u8 event_log;		/* enum cxl_event_log_type */
> > > +	u8 clear_flags;
> > > +	u8 nr_recs;		/* 1 for this struct */  
> > Nope :)  Delete the comments so they can't be wrong if this changes in future!
> Ah. You only use one. So should hard code that in the array size below.

No, it can send up to CXL_GET_EVENT_NR_RECORDS at a time: 'nr_rec'.


                        rc = cxl_clear_event_record(cxlds, type, &payload, nr_rec);


static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
                                  enum cxl_event_log_type log,
                                  struct cxl_get_event_payload *get_pl, u16 nr)
{
        struct cxl_mbox_clear_event_payload payload = {
                .event_log = log,
                .nr_recs = nr,
                ^^^^^^^^^^^^^^
                Here...

        };
        int i;

        for (i = 0; i < nr; i++) {
                payload.handle[i] = get_pl->record[i].hdr.handle;
                dev_dbg(cxlds->dev, "Event log '%s': Clearing %u\n",
                        cxl_event_log_type_str(log),
                        le16_to_cpu(payload.handle[i]));
        }
...

Ira


* Re: [PATCH 05/11] cxl/mem: Trace General Media Event Record
  2022-11-16 15:31   ` Jonathan Cameron
@ 2022-11-17  1:18     ` Ira Weiny
  0 siblings, 0 replies; 50+ messages in thread
From: Ira Weiny @ 2022-11-17  1:18 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Wed, Nov 16, 2022 at 03:31:06PM +0000, Jonathan Cameron wrote:
> On Thu, 10 Nov 2022 10:57:52 -0800
> ira.weiny@intel.com wrote:
> 
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > CXL rev 3.0 section 8.2.9.2.1.1 defines the General Media Event Record.
> > 
> > Determine if the event read is a general media record and if so trace
> > the record as a General Media Event Record.
> > 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > 
> A few v2.0 references left in here that should be updated given it's new code.
> 
> With those tidied up

Fixed.

> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

And thanks!
Ira

> 
> 
> > diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
> > index 60dec9a84918..a0c20e110708 100644
> > --- a/include/trace/events/cxl.h
> > +++ b/include/trace/events/cxl.h
> > @@ -119,6 +119,130 @@ TRACE_EVENT(cxl_generic_event,
> >  		__print_hex(__entry->data, CXL_EVENT_RECORD_DATA_LENGTH))
> >  );
> >  
> > +/*
> > + * Physical Address field masks
> > + *
> > + * General Media Event Record
> > + * CXL v2.0 Section 8.2.9.1.1.1; Table 154
> 
> Update to CXL rev 3.0 as I think we are preferring latest
> spec references on any new code.
> 
> > + *
> > + * DRAM Event Record
> > + * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
> > + */
> 
> > +
> > +/*
> > + * General Media Event Record - GMER
> > + * CXL v2.0 Section 8.2.9.1.1.1; Table 154
> Update ref to r3.0
> Never v or the spec folk will get irritable :)
> 
> > + */
> 
> 
> 


* Re: [PATCH 07/11] cxl/mem: Trace Memory Module Event Record
  2022-11-16 15:35   ` Jonathan Cameron
@ 2022-11-17  1:23     ` Ira Weiny
  2022-11-17 11:22       ` Jonathan Cameron
  0 siblings, 1 reply; 50+ messages in thread
From: Ira Weiny @ 2022-11-17  1:23 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Wed, Nov 16, 2022 at 03:35:28PM +0000, Jonathan Cameron wrote:
> On Thu, 10 Nov 2022 10:57:54 -0800
> ira.weiny@intel.com wrote:
> 
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > CXL rev 3.0 section 8.2.9.2.1.3 defines the Memory Module Event Record.
> > 
> > Determine if the event read is memory module record and if so trace the
> > record.
> > 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > 
> Noticed that we have a mixture of fully capitalized and not for flags.
> With that either explained or tidied up:
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> > +/*
> > + * Device Health Information - DHI
> > + *
> > + * CXL res 3.0 section 8.2.9.8.3.1; Table 8-100
> > + */
> > +#define CXL_DHI_HS_MAINTENANCE_NEEDED				BIT(0)
> > +#define CXL_DHI_HS_PERFORMANCE_DEGRADED				BIT(1)
> > +#define CXL_DHI_HS_HW_REPLACEMENT_NEEDED			BIT(2)
> > +#define show_health_status_flags(flags)	__print_flags(flags, "|",	   \
> > +	{ CXL_DHI_HS_MAINTENANCE_NEEDED,	"Maintenance Needed"	}, \
> > +	{ CXL_DHI_HS_PERFORMANCE_DEGRADED,	"Performance Degraded"	}, \
> > +	{ CXL_DHI_HS_HW_REPLACEMENT_NEEDED,	"Replacement Needed"	}  \
> 
> Why are we sometime using capitals for flags (e.g patch 5) and not other times?

Not sure what you mean.  Do you mean this from patch 5?

...
        { CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT,         "Uncorrectable Event"   }, \
        { CXL_GMER_EVT_DESC_THRESHOLD_EVENT,            "Threshold event"       }, \
        { CXL_GMER_EVT_DESC_POISON_LIST_OVERFLOW,       "Poison List Overflow"  }  \
...

Threshold event was a mistake.  This is the capitalization the spec uses.

Bit[0]: Uncorrectable Event: When set, indicates the reported event is
        ^^^^^^^^^^^^^^^^^^^
uncorrectable by the device. When cleared, indicates the reported
event was corrected by the device.

Bit[1]: Threshold Event: When set, the event is the result of a
        ^^^^^^^^^^^^^^^
threshold on the device having been reached. When cleared, the event
is not the result of a threshold limit.

Bit[2]: Poison List Overflow Event: When set, the Poison List has
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
overflowed, and this event is not in the Poison List. When cleared, the
Poison List has not overflowed.


I'll update this 'Event' in patch 5.  Probably need to add 'Event' to the
Poison List...

Ira


* Re: [PATCH 08/11] cxl/mem: Wire up event interrupts
  2022-11-15 23:13   ` Dave Jiang
@ 2022-11-17  1:38     ` Ira Weiny
  0 siblings, 0 replies; 50+ messages in thread
From: Ira Weiny @ 2022-11-17  1:38 UTC (permalink / raw)
  To: Dave Jiang
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Jonathan Cameron, Davidlohr Bueso, linux-kernel,
	linux-cxl

On Tue, Nov 15, 2022 at 04:13:24PM -0700, Jiang, Dave wrote:
> 
> 
> On 11/10/2022 10:57 AM, ira.weiny@intel.com wrote:

[snip]

> > +int cxl_event_config_msgnums(struct cxl_dev_state *cxlds)
> > +{
> > +	struct cxl_event_interrupt_policy *policy = &cxlds->evt_int_policy;
> > +	size_t policy_size = sizeof(*policy);
> > +	bool retry = true;
> > +	int rc;
> > +
> > +	policy->info_settings = CXL_INT_MSI_MSIX;
> > +	policy->warn_settings = CXL_INT_MSI_MSIX;
> > +	policy->failure_settings = CXL_INT_MSI_MSIX;
> > +	policy->fatal_settings = CXL_INT_MSI_MSIX;
> > +	policy->dyn_cap_settings = CXL_INT_MSI_MSIX;
> > +
> > +again:
> > +	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_SET_EVT_INT_POLICY,
> > +			       policy, policy_size, NULL, 0);
> > +	if (rc < 0) {
> > +		/*
> > +		 * If the device does not support dynamic capacity it may fail
> > +		 * the command due to an invalid payload.  Retry without
> > +		 * dynamic capacity.
> > +		 */
> > +		if (retry) {
> > +			retry = false;
> > +			policy->dyn_cap_settings = 0;
> > +			policy_size = sizeof(*policy) - sizeof(policy->dyn_cap_settings);
> > +			goto again;
> > +		}
> > +		dev_err(cxlds->dev, "Failed to set event interrupt policy : %d",
> > +			rc);
> > +		memset(policy, CXL_INT_NONE, sizeof(*policy));
> > +		return rc;
> > +	}
> 
> Up to you, but I think you can avoid the goto:

I think this is a bit more confusing because we are not really retrying 2
times.

> 
> 	int retry = 2;
> 	do {
> 		rc = cxl_mbox_send_cmd(...);
> 		if (rc == 0 || retry == 1)

Specifically this looks confusing to me.  Why break on retry == 1?

> 			break;
> 		policy->dyn_cap_settings = 0;
> 		policy_size = sizeof(*policy) - sizeof(policy->dyn_cap_settings);
> 		retry--;
> 	} while (retry);
> 
> 	if (rc < 0) {
> 		dev_err(...);
> 		memset(policy, ...);
> 		return rc;
> 	}

That said, perhaps the retry should be based on policy_size...  :-/  I'm not
sure that adds much.  I'm going to leave it as is.

[snip]

> > +
> > +static irqreturn_t cxl_event_int_handler(int irq, void *id)
> > +{
> > +	struct cxl_event_irq_id *cxlid = id;
> > +	struct cxl_dev_state *cxlds = cxlid->cxlds;
> > +	u32 status = readl(cxlds->regs.status + CXLDEV_DEV_EVENT_STATUS_OFFSET);
> > +
> > +	if (cxlid->status & status)
> > +		return IRQ_WAKE_THREAD;
> > +	return IRQ_HANDLED;
> 
> IRQ_NONE since your handler did not handle anything and this is a shared
> interrupt?

Yes.  Good catch thanks!

> 
> > +}
> > +
> > +static void cxl_free_event_irq(void *id)
> > +{
> > +	struct cxl_event_irq_id *cxlid = id;
> > +	struct pci_dev *pdev = to_pci_dev(cxlid->cxlds->dev);
> > +
> > +	pci_free_irq(pdev, cxlid->msgnum, id);
> > +}
> > +
> > +static u32 log_type_to_status(enum cxl_event_log_type log_type)
> > +{
> > +	switch (log_type) {
> > +	case CXL_EVENT_TYPE_INFO:
> > +		return CXLDEV_EVENT_STATUS_INFO | CXLDEV_EVENT_STATUS_DYNAMIC_CAP;
> > +	case CXL_EVENT_TYPE_WARN:
> > +		return CXLDEV_EVENT_STATUS_WARN;
> > +	case CXL_EVENT_TYPE_FAIL:
> > +		return CXLDEV_EVENT_STATUS_FAIL;
> > +	case CXL_EVENT_TYPE_FATAL:
> > +		return CXLDEV_EVENT_STATUS_FATAL;
> > +	default:
> > +		break;
> > +	}
> > +	return 0;
> > +}
> > +
> > +static int cxl_request_event_irq(struct cxl_dev_state *cxlds,
> > +				 enum cxl_event_log_type log_type,
> > +				 u8 setting)
> > +{
> > +	struct device *dev = cxlds->dev;
> > +	struct pci_dev *pdev = to_pci_dev(dev);
> > +	struct cxl_event_irq_id *id;
> > +	unsigned int msgnum = CXL_EVENT_INT_MSGNUM(setting);
> > +	int irq;
> 
> int rc? pci_request_irq() returns an errno or 0, not the number of irq. The
> variable naming is a bit confusing.

Indeed.  Changed, and thanks,
Ira

> 
> DJ
> 


* Re: [PATCH 02/11] cxl/mem: Implement Get Event Records command
  2022-11-17  0:47     ` Ira Weiny
@ 2022-11-17 10:43       ` Jonathan Cameron
  2022-11-18 23:26         ` Ira Weiny
  0 siblings, 1 reply; 50+ messages in thread
From: Jonathan Cameron @ 2022-11-17 10:43 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Dan Williams, Steven Rostedt, Alison Schofield, Vishal Verma,
	Ben Widawsky, Davidlohr Bueso, linux-kernel, linux-cxl

On Wed, 16 Nov 2022 16:47:20 -0800
Ira Weiny <ira.weiny@intel.com> wrote:

> On Wed, Nov 16, 2022 at 03:19:36PM +0000, Jonathan Cameron wrote:
> > On Thu, 10 Nov 2022 10:57:49 -0800
> > ira.weiny@intel.com wrote:
> >   
> > > From: Ira Weiny <ira.weiny@intel.com>
> > > 
> > > CXL devices have multiple event logs which can be queried for CXL event
> > > records.  Devices are required to support the storage of at least one
> > > event record in each event log type.
> > > 
> > > Devices track event log overflow by incrementing a counter and tracking
> > > the time of the first and last overflow event seen.
> > > 
> > > Software queries events via the Get Event Record mailbox command; CXL
> > > rev 3.0 section 8.2.9.2.2.
> > > 
> > > Issue the Get Event Record mailbox command on driver load.  Trace each
> > > record found with a generic record trace.  Trace any overflow
> > > conditions.
> > > 
> > > The device can return up to 1MB worth of event records per query.  This
> > > presents complications with allocating huge buffers to potentially
> > > capture all the records.  It is not anticipated that these event logs
> > > will be very deep and reading them does not need to be performant.
> > > Process only 3 records at a time.  3 records was chosen as it fits
> > > comfortably on the stack to prevent dynamic allocation while still
> > > cutting down on extra mailbox messages.
> > > 
> > > This patch traces a raw event record only and leaves the specific event
> > > record types to subsequent patches.
> > > 
> > > Macros are created to use for tracing the common CXL Event header
> > > fields.
> > > 
> > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>  
> > 
> > Hi Ira,
> > 
> > A question inline about whether some of the conditions you are checking
> > for can actually happen. Otherwise looks good to me.
> > 
> > Jonathan
> >   
> 
> [snip]
> 
> > > +static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> > > +				    enum cxl_event_log_type type)
> > > +{
> > > +	struct cxl_get_event_payload payload;
> > > +	u16 pl_nr;
> > > +
> > > +	do {
> > > +		u8 log_type = type;
> > > +		int rc;
> > > +
> > > +		rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVENT_RECORD,
> > > +				       &log_type, sizeof(log_type),
> > > +				       &payload, sizeof(payload));
> > > +		if (rc) {
> > > +			dev_err(cxlds->dev, "Event log '%s': Failed to query event records : %d",
> > > +				cxl_event_log_type_str(type), rc);
> > > +			return;
> > > +		}
> > > +
> > > +		pl_nr = le16_to_cpu(payload.record_count);
> > > +		if (trace_cxl_generic_event_enabled()) {
> > > +			u16 nr_rec = min_t(u16, pl_nr, CXL_GET_EVENT_NR_RECORDS);  
> > 
> > Either I'm misreading the spec, or it can't be greater than NR_RECORDS.  
> 
> Well...  I could have read the spec wrong as well.  But after reading very
> carefully I think this is actually correct.
> 
> > "The number of event records in the Event Records list...."  
> 
> Where is this quote from?  I don't see that in the spec.

Table 8-50 Event Record Count (the field we are reading here).

> 
> > Event Records being the field inside this payload which is not big enough to
> > take more than CXL_GET_EVENT_NR_RECORDS and the intro to Get Event Records
> > refers to the number being restricted by the mailbox output payload provided.  
> 
> My understanding is that the output payload is only limited by the Payload Size
> reported in the Mailbox Capability Register.Payload Size.  (Section 8.2.8.4.3)
> 
> This can be up to 1MB.  So the device could fill up to 1MB's worth of Event
> Records while still being in compliance.  The generic mailbox code in the
> driver caps the data based on the size passed into cxl_mbox_send_cmd() however,
> the number of records reported is not changed.

Indeed I had that wrong.  I thought we passed in an output payload length, whereas
we only provide "payload length", which is defined as being the input length in 8.2.8.4.5.

> 
> > 
> > I'm in favor of defense against broken hardware, but don't paper over any
> > such error - scream about it.  
> 
> I don't think this is out of spec unless the device is trying to write more
> than 1MB and I think the core mailbox code will scream about that.
> 
> >   
> > > +			int i;
> > > +
> > > +			for (i = 0; i < nr_rec; i++)
> > > +				trace_cxl_generic_event(dev_name(cxlds->dev),
> > > +							type,
> > > +							&payload.record[i]);
> > > +		}
> > > +
> > > +		if (trace_cxl_overflow_enabled() &&
> > > +		    (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW))
> > > +			trace_cxl_overflow(dev_name(cxlds->dev), type, &payload);
> > > +
> > > +	} while (pl_nr > CXL_GET_EVENT_NR_RECORDS ||  
> > 
> > Isn't pl_nr > CXL_GET_EVENT_NR_RECORDS a hardware bug? It's the number in returned
> > payload not the total number.  
> 
> I don't think so.  The only value passed to the device is the _input_ payload
> size.  The output payload size is not passed to the device and is not included
> in the Get Event Records Input Payload.  (Table 8-49)
> 
> So my previous code was wrong.  Here is an example I think which is within the
> spec but would result in the more records flag not being set.
> 
> 	Device log depth == 10
> 	nr log entries == 7
> 	nr log entries in 1MB ~= (1M - hdr size) / 128 ~= 8000
> 
> Device sets Output Payload.Event Record Count == 7 (which is < 8000).  Common
> mailbox code truncates that to 3.  More Event Records == 0 because it sent all
> 7 that it had.
> 
> This code will clear 3 and read again 2 more times.
> 
> Am I reading that wrong?

I think this is still wrong, but for a different reason. :)
If we don't clear the records and more records is set, that means they didn't
fit in the mailbox payload (potentially 1MB), and the next read
will return the next set of records from there.

Taking this patch only, let's say the mailbox takes 4 records.
Read 1: Records 0, 1, 2, 3 More set.
   We handle 0, 1, 2
Read 2: Records 4, 5, 6 More not set.
   We handle 4, 5, 6

Record 3 is never handled.

If we add in clearing as happens later in the series, the current
assumption is that if we clear some records a subsequent read will
start again.  I'm not sure that is true; if it is, a spec reference is needed.

So assumption is
Read 1: Records 0, 1, 2, 3 More set
  Clear 0, 1, 2
Read 2: Records 3, 4, 5, 6
  Clear 3, 4, 5 More not set, but catch it with the condition above.
Read 3: 6 only
  Clear 6

However, I think a valid implementation could do the following
(imagine a ring buffer with a pointer to the 'next' record to read out and
 each record has a 'valid' flag to deal with corner cases around
 sequences such as read log once, start reading again and some
 clears occur using handles obtained from first read - note that
 case isn't ruled out by the spec as far as I can see).

Read 1: Records 0, 1, 2, 3 More set.  'next' pointer points to record 4.
  Clear 0, 1, 2
Read 2: Records 4, 5, 6 More not set. 'next' pointer points to record 7.
  Clear 4, 5, 6

Skipping record 3.

So I think we have to absorb the full mailbox payload each time to guarantee
we don't skip events or process them out of order (which is what would happen
if we relied on a retry loop - we aren't allowed to clear them out of
order anyway 8.2.9.2.3 "Events shall be cleared in temporal order. The device
shall verify the event record handles specified in the input payload are in
temporal order. ... "). 
Obviously that temporal order thing is only relevant if we get my second
example occurring on real hardware.  I think the spec is vague enough
to allow that implementation.  Would have been easy to specify this originally
but it probably won't go in as errata so we need to cope with all the
flexibility that is present.

What fun and oh for a parameter to control how many records are returned!

Jonathan


> 
> >   
> > > +		 payload.flags & CXL_GET_EVENT_FLAG_MORE_RECORDS);
> > > +}  
> > 

> 



* Re: [PATCH 01/11] cxl/pci: Add generic MSI-X/MSI irq support
  2022-11-16 23:48     ` Ira Weiny
@ 2022-11-17 11:20       ` Jonathan Cameron
  0 siblings, 0 replies; 50+ messages in thread
From: Jonathan Cameron @ 2022-11-17 11:20 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Dan Williams, Davidlohr Bueso, Bjorn Helgaas, Alison Schofield,
	Vishal Verma, Ben Widawsky, Steven Rostedt, linux-kernel,
	linux-cxl

On Wed, 16 Nov 2022 15:48:48 -0800
Ira Weiny <ira.weiny@intel.com> wrote:

> On Wed, Nov 16, 2022 at 02:53:41PM +0000, Jonathan Cameron wrote:
> > On Thu, 10 Nov 2022 10:57:48 -0800
> > ira.weiny@intel.com wrote:
> >   
> > > From: Davidlohr Bueso <dave@stgolabs.net>
> > > 
> > > Currently the only CXL features targeted for irq support require their
> > > message numbers to be within the first 16 entries.  The device may
> > > however support less than 16 entries depending on the support it
> > > provides.
> > > 
> > > Attempt to allocate these 16 irq vectors.  If the device supports less
> > > then the PCI infrastructure will allocate that number.  Store the number
> > > of vectors actually allocated in the device state for later use
> > > by individual functions.  
> > See later patch review, but I don't think we need to store the number
> > allocated because any vector is guaranteed to be below that point  
> 
> Only as long as we stick to those functions which are guaranteed to be under
> 16.  If a device supports more than 16 and code is added to try and enable that
> irq this base support will not cover that.

It matters only if this enable is changed.  If this remains at 16, the vectors
are guaranteed to be under 16.

> 
> > (QEMU code is wrong on this at the momemt, but there are very few vectors
> >  so it hasn't mattered yet).  
> 
> How so?  Does the spec state that a device must report at least 16 vectors?

Technically QEMU upstream today uses 1 vector I think, so failure to get that
is the same as no irqs.

As we expand to more possible vectors, QEMU should adjust the reported msgnums
to fit in whatever vectors are enabled (if using msi, logic is handled
elsewhere for msix as there is an indirection in the way and I think it
is down to the OS to program that indirection correctly).  See spec language
referred to in review of the patch using the irqs.  We may never implement
that magic, but it is done correctly for other devices.

https://elixir.bootlin.com/qemu/latest/source/hw/pci-bridge/ioh3420.c#L47
is an example where aer is on vector 1 unless there is only one vector in which
case it falls back to vector 0.
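That fallback reduces to a one-line policy; a minimal userspace model of it
(function name hypothetical, not the actual QEMU code):

```c
#include <assert.h>

/*
 * Hypothetical model of the ioh3420-style fallback referenced above: AER
 * sits on message number 1 unless the device exposes only a single
 * vector, in which case it falls back to 0.
 */
int aer_msg_num(int nr_vectors)
{
	return nr_vectors > 1 ? 1 : 0;
}
```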

Jonathan


> 
> > 
> > Otherwise, the pcim functions deal with some of the cleanup you are doing again
> > here for us, so we can simplify this somewhat. See inline.  
> 
> Yea it is broken.
> 
> > 
> > Jonathan
> > 
> > 
> >   
> > > 
> > > Upon successful allocation, users can plug in their respective isr at
> > > any point thereafter, for example, if the irq setup is not done in the
> > > PCI driver, such as the case of the CXL-PMU.
> > > 
> > > Cc: Bjorn Helgaas <helgaas@kernel.org>
> > > Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > Co-developed-by: Ira Weiny <ira.weiny@intel.com>
> > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > > Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> > > 
> > > ---
> > > Changes from Ira
> > > 	Remove reviews
> > > 	Allocate up to a static 16 vectors.
> > > 	Change cover letter
> > > ---
> > >  drivers/cxl/cxlmem.h |  3 +++
> > >  drivers/cxl/cxlpci.h |  6 ++++++
> > >  drivers/cxl/pci.c    | 32 ++++++++++++++++++++++++++++++++
> > >  3 files changed, 41 insertions(+)
> > > 
> > > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > > index 88e3a8e54b6a..b7b955ded3ac 100644
> > > --- a/drivers/cxl/cxlmem.h
> > > +++ b/drivers/cxl/cxlmem.h
> > > @@ -211,6 +211,7 @@ struct cxl_endpoint_dvsec_info {
> > >   * @info: Cached DVSEC information about the device.
> > >   * @serial: PCIe Device Serial Number
> > >   * @doe_mbs: PCI DOE mailbox array
> > > + * @nr_irq_vecs: Number of MSI-X/MSI vectors available
> > >   * @mbox_send: @dev specific transport for transmitting mailbox commands
> > >   *
> > >   * See section 8.2.9.5.2 Capacity Configuration and Label Storage for
> > > @@ -247,6 +248,8 @@ struct cxl_dev_state {
> > >  
> > >  	struct xarray doe_mbs;
> > >  
> > > +	int nr_irq_vecs;
> > > +
> > >  	int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
> > >  };
> > >  
> > > diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> > > index eec597dbe763..b7f4e2f417d3 100644
> > > --- a/drivers/cxl/cxlpci.h
> > > +++ b/drivers/cxl/cxlpci.h
> > > @@ -53,6 +53,12 @@
> > >  #define	    CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK			GENMASK(15, 8)
> > >  #define     CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK		GENMASK(31, 16)
> > >  
> > > +/*
> > > + * NOTE: Currently all the functions which are enabled for CXL require their
> > > + * vectors to be in the first 16.  Use this as the max.
> > > + */
> > > +#define CXL_PCI_REQUIRED_VECTORS 16
> > > +
> > >  /* Register Block Identifier (RBI) */
> > >  enum cxl_regloc_type {
> > >  	CXL_REGLOC_RBI_EMPTY = 0,
> > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > > index faeb5d9d7a7a..62e560063e50 100644
> > > --- a/drivers/cxl/pci.c
> > > +++ b/drivers/cxl/pci.c
> > > @@ -428,6 +428,36 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
> > >  	}
> > >  }
> > >  
> > > +static void cxl_pci_free_irq_vectors(void *data)
> > > +{
> > > +	pci_free_irq_vectors(data);
> > > +}
> > > +
> > > +static void cxl_pci_alloc_irq_vectors(struct cxl_dev_state *cxlds)
> > > +{
> > > +	struct device *dev = cxlds->dev;
> > > +	struct pci_dev *pdev = to_pci_dev(dev);
> > > +	int nvecs;
> > > +	int rc;
> > > +
> > > +	nvecs = pci_alloc_irq_vectors(pdev, 1, CXL_PCI_REQUIRED_VECTORS,
> > > +				   PCI_IRQ_MSIX | PCI_IRQ_MSI);
> > > +	if (nvecs < 0) {
> > > +		dev_dbg(dev, "Not enough interrupts; use polling instead.\n");
> > > +		return;
> > > +	}
> > > +
> > > +	rc = devm_add_action_or_reset(dev, cxl_pci_free_irq_vectors, pdev);  
> > The pci managed code always gives me a headache because there is a lot of magic
> > under the hood if you ever called pcim_enable_device() which we did.
> > 
> > Chasing through
> > 
> > pci_alloc_irq_vectors_affinity()->
> > either
> > 	__pci_enable_msix_range()
> > or
> > 	__pci_enable_msi_range()
> > 
> > they are similar
> > 	pci_setup_msi_context()
> > 		pci_setup_msi_release()
> > 			adds pcmi_msi_release devm action.
> > and that frees the vectors for us.
> > So we don't need to do it here.  
> 
> :-/
> 
> So what is the point of pci_free_irq_vectors()?  It is very confusing to have
> a function not called pcim_* [pci_alloc_irq_vectors()] do 'pcim stuff'.
> 
> Ok I'll drop this extra because I see it now.
> 
> > 
> >   
> > > +	if (rc) {
> > > +		dev_dbg(dev, "Device managed call failed; interrupts disabled.\n");
> > > +		/* some got allocated, clean them up */
> > > +		cxl_pci_free_irq_vectors(pdev);  
> > We could just leave them lying around for devm cleanup to sweep up eventually
> > or free them as you have done here.  
> 
> And besides, this extra call is flat-out broken.  cxl_pci_free_irq_vectors() has
> already been called at this point if devm_add_action_or_reset() failed...  But I see
> this is not required.
> 
> I do plan to add a big ol' comment as to why we don't need to mirror the call
> with the corresponding 'free'.
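The hazard being discussed comes from the "_or_reset" semantics:
devm_add_action_or_reset() runs the action itself when registration fails, so
a second explicit free is a double free.  A small userspace model of that
contract (all names hypothetical stand-ins):

```c
#include <assert.h>
#include <stdbool.h>

static int free_calls;

/* Stand-in for pci_free_irq_vectors(); just counts invocations. */
static void fake_free_irq_vectors(void *data)
{
	(void)data;
	free_calls++;
}

/*
 * Model of devm_add_action_or_reset(): when registration fails, the
 * helper runs the action itself before returning the error, so the
 * caller must not run the cleanup again.
 */
static int fake_devm_add_action_or_reset(bool fail,
					 void (*action)(void *), void *data)
{
	if (fail) {
		action(data);	/* the "or_reset" part */
		return -12;	/* -ENOMEM-ish */
	}
	return 0;
}
```

Calling the free again after a failed registration bumps the count to two,
which is the double free pointed out above.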
> 
> I'll respin,
> Ira
> 
> >   
> > > +		return;
> > > +	}
> > > +
> > > +	cxlds->nr_irq_vecs = nvecs;
> > > +}
> > > +
> > >  static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > >  {
> > >  	struct cxl_register_map map;
> > > @@ -494,6 +524,8 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > >  	if (rc)
> > >  		return rc;
> > >  
> > > +	cxl_pci_alloc_irq_vectors(cxlds);
> > > +
> > >  	cxlmd = devm_cxl_add_memdev(cxlds);
> > >  	if (IS_ERR(cxlmd))
> > >  		return PTR_ERR(cxlmd);  
> >   


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 07/11] cxl/mem: Trace Memory Module Event Record
  2022-11-17  1:23     ` Ira Weiny
@ 2022-11-17 11:22       ` Jonathan Cameron
  2022-11-30  9:30         ` Ira Weiny
  0 siblings, 1 reply; 50+ messages in thread
From: Jonathan Cameron @ 2022-11-17 11:22 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Wed, 16 Nov 2022 17:23:58 -0800
Ira Weiny <ira.weiny@intel.com> wrote:

> On Wed, Nov 16, 2022 at 03:35:28PM +0000, Jonathan Cameron wrote:
> > On Thu, 10 Nov 2022 10:57:54 -0800
> > ira.weiny@intel.com wrote:
> >   
> > > From: Ira Weiny <ira.weiny@intel.com>
> > > 
> > > CXL rev 3.0 section 8.2.9.2.1.3 defines the Memory Module Event Record.
> > > 
> > > Determine if the event read is memory module record and if so trace the
> > > record.
> > > 
> > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > >   
> > Noticed that we have a mixture of fully capitalized and not for flags.
> > With that either explained or tidied up:
> > 
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >   
> > > +/*
> > > + * Device Health Information - DHI
> > > + *
> > > + * CXL res 3.0 section 8.2.9.8.3.1; Table 8-100
> > > + */
> > > +#define CXL_DHI_HS_MAINTENANCE_NEEDED				BIT(0)
> > > +#define CXL_DHI_HS_PERFORMANCE_DEGRADED				BIT(1)
> > > +#define CXL_DHI_HS_HW_REPLACEMENT_NEEDED			BIT(2)
> > > +#define show_health_status_flags(flags)	__print_flags(flags, "|",	   \
> > > +	{ CXL_DHI_HS_MAINTENANCE_NEEDED,	"Maintenance Needed"	}, \
> > > +	{ CXL_DHI_HS_PERFORMANCE_DEGRADED,	"Performance Degraded"	}, \
> > > +	{ CXL_DHI_HS_HW_REPLACEMENT_NEEDED,	"Replacement Needed"	}  \  
> > 
> > Why are we sometime using capitals for flags (e.g patch 5) and not other times?  
> 
> Not sure what you mean.  Do you mean this from patch 5?
Nope

+#define CXL_DPA_VOLATILE			BIT(0)
+#define CXL_DPA_NOT_REPAIRABLE			BIT(1)
+#define show_dpa_flags(flags)	__print_flags(flags, "|",		   \
+	{ CXL_DPA_VOLATILE,			"VOLATILE"		}, \
+	{ CXL_DPA_NOT_REPAIRABLE,		"NOT_REPAIRABLE"	}  \
+)
+

Where they are all capitals.  I thought that was maybe a flags-vs-other-fields
thing, but it doesn't seem to be.


> 
> ...
>         { CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT,         "Uncorrectable Event"   }, \
>         { CXL_GMER_EVT_DESC_THRESHOLD_EVENT,            "Threshold event"       }, \
>         { CXL_GMER_EVT_DESC_POISON_LIST_OVERFLOW,       "Poison List Overflow"  }  \
> ...
> 
> Threshold event was a mistake.  This is the capitalization the spec uses.
> 
> Bit[0]: Uncorrectable Event: When set, indicates the reported event is
>         ^^^^^^^^^^^^^^^^^^^
> uncorrectable by the device. When cleared, indicates the reported
> event was corrected by the device.
> 
> Bit[1]: Threshold Event: When set, the event is the result of a
>         ^^^^^^^^^^^^^^^
> threshold on the device having been reached. When cleared, the event
> is not the result of a threshold limit.
> 
> Bit[2]: Poison List Overflow Event: When set, the Poison List has
>         ^^^^^^^^^^^^^^^^^^^^^^^^^^
> overflowed, and this event is not in the Poison List. When cleared, the
> Poison List has not overflowed.
> 
> 
> I'll update this 'Event' in patch 5.  Probably need to add 'Event' to the
> Poison List...
> 
> Ira



* Re: [PATCH 02/11] cxl/mem: Implement Get Event Records command
  2022-11-17 10:43       ` Jonathan Cameron
@ 2022-11-18 23:26         ` Ira Weiny
  2022-11-21 10:47           ` Jonathan Cameron
  0 siblings, 1 reply; 50+ messages in thread
From: Ira Weiny @ 2022-11-18 23:26 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Dan Williams, Steven Rostedt, Alison Schofield, Vishal Verma,
	Ben Widawsky, Davidlohr Bueso, linux-kernel, linux-cxl

On Thu, Nov 17, 2022 at 10:43:37AM +0000, Jonathan Cameron wrote:
> On Wed, 16 Nov 2022 16:47:20 -0800
> Ira Weiny <ira.weiny@intel.com> wrote:
> 
> 

[snip]

> > 
> > >   
> > > > +			int i;
> > > > +
> > > > +			for (i = 0; i < nr_rec; i++)
> > > > +				trace_cxl_generic_event(dev_name(cxlds->dev),
> > > > +							type,
> > > > +							&payload.record[i]);
> > > > +		}
> > > > +
> > > > +		if (trace_cxl_overflow_enabled() &&
> > > > +		    (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW))
> > > > +			trace_cxl_overflow(dev_name(cxlds->dev), type, &payload);
> > > > +
> > > > +	} while (pl_nr > CXL_GET_EVENT_NR_RECORDS ||  
> > > 
> > > Isn't pl_nr > CXL_GET_EVENT_NR_RECORDS a hardware bug? It's the number in returned
> > > payload not the total number.  
> > 
> > I don't think so.  The only value passed to the device is the _input_ payload
> > size.  The output payload size is not passed to the device and is not included
> > in the Get Event Records Input Payload.  (Table 8-49)
> > 
> > So my previous code was wrong.  Here is an example I think which is within the
> > spec but would result in the more records flag not being set.
> > 
> > 	Device log depth == 10
> > 	nr log entries == 7
> > 	nr log entries in 1MB ~= (1M - hdr size) / 128 ~= 8000
> > 
> > Device sets Output Payload.Event Record Count == 7 (which is < 8000).  Common
> > mailbox code truncates that to 3.  More Event Records == 0 because it sent all
> > 7 that it had.
> > 
> > This code will clear 3 and read again 2 more times.
> > 
> > Am I reading that wrong?
> 
> I think this is still wrong, but for a different reason. :)

I hope not...  :-/

> If we don't clear the records and more records is set, that means it didn't
> fit in the mailbox payload (potentially 1MB)  then the next read
> will return the next set of records from there.

That is not how I read the Get Event Records command:

From 8.2.9.2.2 Get Event Records

... "Devices shall return event records to the host in the temporal order the
device detected the events in. The event occurring the earliest in time, in the
specific event log, shall be returned first."

If item 3 below is earlier than 4 then it must be returned if we have not
cleared it.  At least that is how I read the above.  :-/

> 
> Taking this patch only, let's say the mailbox takes 4 records.
> Read 1: Records 0, 1, 2, 3 More set.
>    We handle 0, 1, 2
> Read 2: Records 4, 5, 6 More not set.
>    We handle 4, 5, 6
> 
> Record 3 is never handled.
> 
> If we add in clearing as happens later in the series,

I suppose I should squash the patches as this may not work without the
clearing.  :-/

> the current
> assumption is that if we clear some records a subsequent read will
> start again.  I'm not sure that is true. If it is spec reference needed.
> 
> So assumption is
> Read 1: Records 0, 1, 2, 3 More set
>   Clear 0, 1, 2
> Read 2: Records 3, 4, 5, 6
>   Clear 3, 4, 5 More not set, but catch it with the condition above.
> Read 3: 6 only
>   Clear 6
> 
> However, I think a valid implementation could do the following
> (imagine a ring buffer with a pointer to the 'next' record to read out and
>  each record has a 'valid' flag to deal with corner cases around
>  sequences such as read log once, start reading again and some
>  clears occur using handles obtained from first read - not that
>  case isn't ruled out by the spec as far as I can see).

I believe this is a violation because the next pointer can't be advanced until
the record is cleared.  Otherwise the device is not returning items in temporal
order based on what is in the log.

> 
> Read 1: Records 0, 1, 2, 3 More set.  'next' pointer points to record 4.
>   Clear 0, 1, 2
> Read 2: Records 4, 5, 6 More not set. 'next' pointer points to record 7.
>   Clear 4, 5, 6
> 
> Skipping record 3.
> 
> So I think we have to absorb the full mailbox payload each time to guarantee
> we don't skip events or process them out of order (which is what would happen
> if we relied on a retry loop - we aren't allowed to clear them out of
> order anyway 8.2.9.2.3 "Events shall be cleared in temporal order. The device
> shall verify the event record handles specified in the input payload are in
> temporal order. ... "). 
> Obviously that temporal order thing is only relevant if we get my second
> example occurring on real hardware.  I think the spec is vague enough
> to allow that implementation.  Would have been easy to specify this originally
> but it probably won't go in as errata so we need to cope with all the
> flexibility that is present.

:-(  Yea coulda, woulda, shoulda...  ;-)

> 
> What fun and oh for a parameter to control how many records are returned!

Yea.  But I really don't think there is a problem unless someone really takes
liberties with the spec.  I think it boils down to how one interprets _when_ a
record is removed from the log.

If the record is removed when it is returned (as in your 'next' pointer
example) then why have a clear at all?  If my interpretation is correct then
the next available entry is the one which has not been cleared.  Therefore in
your example 'next' is not incremented until clear has been called.  I think
that implementation is also supported by the idea that records must be cleared
in temporal order.  Otherwise I think devices would get confused.

FWIW the qemu implementation is based on my interpretation ATM.
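That interpretation - the device's notion of "first" advances only when
records are cleared - can be simulated in a few lines (a sketch of the model,
not the QEMU code; records are plain integer indices):

```c
#include <assert.h>

/*
 * Sketch of a device event log under the reading that 'first' means the
 * oldest *uncleared* record: Get always starts at @clear, and only
 * Clear advances it.
 */
struct ev_log {
	int clear;	/* oldest uncleared record */
	int tail;	/* one past the newest record */
};

/* Return up to @max record indices, oldest uncleared first. */
int ev_get(const struct ev_log *log, int *out, int max)
{
	int n = 0, i;

	for (i = log->clear; i < log->tail && n < max; i++)
		out[n++] = i;
	return n;
}

/* Clearing @n records advances the log; clears are in temporal order. */
void ev_clear(struct ev_log *log, int n)
{
	log->clear += n;
}
```

With 7 records and room for 3 per read, a read-clear-repeat loop sees 0
through 6 exactly once, which is the behaviour the example up-thread assumes.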

Ira

> 
> Jonathan
> 
> 
> > 
> > >   
> > > > +		 payload.flags & CXL_GET_EVENT_FLAG_MORE_RECORDS);
> > > > +}  
> > > 
> 
> > 
> 


* Re: [PATCH 02/11] cxl/mem: Implement Get Event Records command
  2022-11-18 23:26         ` Ira Weiny
@ 2022-11-21 10:47           ` Jonathan Cameron
  2022-11-28 23:30             ` Ira Weiny
  0 siblings, 1 reply; 50+ messages in thread
From: Jonathan Cameron @ 2022-11-21 10:47 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Dan Williams, Steven Rostedt, Alison Schofield, Vishal Verma,
	Ben Widawsky, Davidlohr Bueso, linux-kernel, linux-cxl

On Fri, 18 Nov 2022 15:26:17 -0800
Ira Weiny <ira.weiny@intel.com> wrote:

> On Thu, Nov 17, 2022 at 10:43:37AM +0000, Jonathan Cameron wrote:
> > On Wed, 16 Nov 2022 16:47:20 -0800
> > Ira Weiny <ira.weiny@intel.com> wrote:
> > 
> >   
> 
> [snip]
> 
> > >   
> > > >     
> > > > > +			int i;
> > > > > +
> > > > > +			for (i = 0; i < nr_rec; i++)
> > > > > +				trace_cxl_generic_event(dev_name(cxlds->dev),
> > > > > +							type,
> > > > > +							&payload.record[i]);
> > > > > +		}
> > > > > +
> > > > > +		if (trace_cxl_overflow_enabled() &&
> > > > > +		    (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW))
> > > > > +			trace_cxl_overflow(dev_name(cxlds->dev), type, &payload);
> > > > > +
> > > > > +	} while (pl_nr > CXL_GET_EVENT_NR_RECORDS ||    
> > > > 
> > > > Isn't pl_nr > CXL_GET_EVENT_NR_RECORDS a hardware bug? It's the number in returned
> > > > payload not the total number.    
> > > 
> > > I don't think so.  The only value passed to the device is the _input_ payload
> > > size.  The output payload size is not passed to the device and is not included
> > > in the Get Event Records Input Payload.  (Table 8-49)
> > > 
> > > So my previous code was wrong.  Here is an example I think which is within the
> > > spec but would result in the more records flag not being set.
> > > 
> > > 	Device log depth == 10
> > > 	nr log entries == 7
> > > 	nr log entries in 1MB ~= (1M - hdr size) / 128 ~= 8000
> > > 
> > > Device sets Output Payload.Event Record Count == 7 (which is < 8000).  Common
> > > mailbox code truncates that to 3.  More Event Records == 0 because it sent all
> > > 7 that it had.
> > > 
> > > This code will clear 3 and read again 2 more times.
> > > 
> > > Am I reading that wrong?  
> > 
> > I think this is still wrong, but for a different reason. :)  
> 
> I hope not...  :-/
> 
> > If we don't clear the records and more records is set, that means it didn't
> > fit in the mailbox payload (potentially 1MB)  then the next read
> > will return the next set of records from there.  
> 
> That is not how I read the Get Event Records command:
> 
> From 8.2.9.2.2 Get Event Records
> 
> ... "Devices shall return event records to the host in the temporal order the
> device detected the events in. The event occurring the earliest in time, in the
> specific event log, shall be returned first."
> 
> If item 3 below is earlier than 4 then it must be returned if we have not
> cleared it.  At least that is how I read the above.  :-/

In general that doesn't work.  Imagine we cleared no records.
In that case we'd return 4 despite there being earlier records.
There is no language to cover this particular case of clearing
part of what was returned.  The device did return the records
in temporal order, we just didn't notice some of them.

The wonders of slightly loose spec wording.  As far as I can tell
we are stuck with having to cope with all the things that could be
read as being valid implementations.

> 
> > 
> > Taking this patch only, let's say the mailbox takes 4 records.
> > Read 1: Records 0, 1, 2, 3 More set.
> >    We handle 0, 1, 2
> > Read 2: Records 4, 5, 6 More not set.
> >    We handle 4, 5, 6
> > 
> > Record 3 is never handled.
> > 
> > If we add in clearing as happens later in the series,  
> 
> I suppose I should squash the patches as this may not work without the
> clearing.  :-/
> 
> > the current
> > assumption is that if we clear some records a subsequent read will
> > start again.  I'm not sure that is true. If it is spec reference needed.
> > 
> > So assumption is
> > Read 1: Records 0, 1, 2, 3 More set
> >   Clear 0, 1, 2
> > Read 2: Records 3, 4, 5, 6
> >   Clear 3, 4, 5 More not set, but catch it with the condition above.
> > Read 3: 6 only
> >   Clear 6
> > 
> > However, I think a valid implementation could do the following
> > (imagine a ring buffer with a pointer to the 'next' record to read out and
> >  each record has a 'valid' flag to deal with corner cases around
> >  sequences such as read log once, start reading again and some
> >  clears occur using handles obtained from first read - not that
> >  case isn't ruled out by the spec as far as I can see).  
> 
> I believe this is a violation because the next pointer can't be advanced until
> the record is cleared.  Otherwise the device is not returning items in temporal
> order based on what is in the log.

Ah. This is where we disagree.  The temporal order is (potentially?) unconnected
from the clearing.  The device did return them in temporal order, we just didn't
take any notice of record 3 being returned.
A valid reading of that temporal order comment is actually the other way around:
that the device must not reset its idea of temporal order until all records
have been read (reading 3 twice is not in temporal order - imagine we had
read 5 each time and it becomes more obvious, as the read order becomes
0,1,2,3,4,3,4,5,6,7 etc., which is clearly not in temporal order by any normal
reading of the term).  The more I read this, the more I think the current
implementation is not compliant with the specification at all.

I'm not seeing a spec mention of 'resetting' the ordering on clearing records
(which might have been a good thing in the first place but too late now).

> 
> > 
> > Read 1: Records 0, 1, 2, 3 More set.  'next' pointer points to record 4.
> >   Clear 0, 1, 2
> > Read 2: Records 4, 5, 6 More not set. 'next' pointer points to record 7.
> >   Clear 4, 5, 6
> > 
> > Skipping record 3.
> > 
> > So I think we have to absorb the full mailbox payload each time to guarantee
> > we don't skip events or process them out of order (which is what would happen
> > if we relied on a retry loop - we aren't allowed to clear them out of
> > order anyway 8.2.9.2.3 "Events shall be cleared in temporal order. The device
> > shall verify the event record handles specified in the input payload are in
> > temporal order. ... "). 
> > Obviously that temporal order thing is only relevant if we get my second
> > example occurring on real hardware.  I think the spec is vague enough
> > to allow that implementation.  Would have been easy to specify this originally
> > but it probably won't go in as errata so we need to cope with all the
> > flexibility that is present.  
> 
> :-(  Yea coulda, woulda, shoulda...  ;-)
> 
> > 
> > What fun and oh for a parameter to control how many records are returned!  
> 
> Yea.  But I really don't think there is a problem unless someone really take
> liberty with the spec.  I think it boils down to how one interprets _when_ a
> record is removed from the log.

This is nothing to do with removal. The wording we have is just about reading
and I think a strict reading of the spec would say your assumption of a reset of the
read pointer on clear is NOT a valid implementation.  There is separate wording
about clears being in temporal order, but that doesn't affect the Get Event
Records handling.

> 
> If the record is removed when it is returned (as in your 'next' pointer
> example) then why have a clear at all?

Because if your software crashes, you don't have a handshake to reestablish
state.  If that happens you read the whole log until MORE is not set and
then read it again to get a clean list.  It's a messy situation that has
been discussed before for GET POISON LIST, which has the same nasty handling
of MORE.  (Look in the appropriate forum for the resolution to that one, which we
can't yet discuss here!)

Also, it allows for non-destructive readback (debugging tools might take a look
having paused the normal handling).

> If my interpretation is correct then
> the next available entry is the one which has not been cleared.

If that is the case the language in "More Event Records" doesn't work
"The host should continue to retrieve records using this command, until
this indicator is no longer set by the device"

With your reading of the spec, if we clear nothing, we'd keep getting the
first set of records and only be able to read more by clearing them...


>  Therefore in
> your example 'next' is not incremented until clear has been called.  I think
> that implementation is also supported by the idea that records must be cleared
> in temporal order.  Otherwise I think devices would get confused.

It is not hard for a device to do this (as I now read the spec) properly.

Two pointers:
1) Next to clear: CLEAR
2) Next to read:  READ

Advance the READ pointer on Get Event Records.
For CLEAR, check that the requested clears are handled in order and that
they are before the READ pointer.
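That two-pointer model, as compilable userspace C (a sketch of one
spec-permitted device implementation, not of any real hardware):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of the two-pointer device model described above: Get advances
 * READ unconditionally; Clear only validates order and advances CLEAR.
 */
struct dev_log {
	int clear;	/* next record the host may clear */
	int read;	/* next record Get will return */
	int tail;	/* one past the newest record */
};

int dev_get(struct dev_log *log, int *out, int max)
{
	int n = 0;

	while (log->read < log->tail && n < max)
		out[n++] = log->read++;		/* READ moves on Get */
	return n;
}

/* Clears must be in temporal order and behind the READ pointer. */
bool dev_clear(struct dev_log *log, int handle)
{
	if (handle != log->clear || handle >= log->read)
		return false;
	log->clear++;
	return true;
}
```

Against such a device the partial-clear loop really does lose a record: after
reading 0-3 and clearing only 0-2, the next Get starts at 4 and record 3 is
never returned again.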

Maybe we should just take it to the appropriate spec forum to seek a clarification?

Jonathan

> 
> FWIW the qemu implementation is based on my interpretation ATM.
> 
> Ira
> 
> > 
> > Jonathan
> > 
> >   
> > >   
> > > >     
> > > > > +		 payload.flags & CXL_GET_EVENT_FLAG_MORE_RECORDS);
> > > > > +}    
> > > >   
> >   
> > >   
> >   



* Re: [PATCH 07/11] cxl/mem: Trace Memory Module Event Record
  2022-11-10 18:57 ` [PATCH 07/11] cxl/mem: Trace Memory Module " ira.weiny
  2022-11-15 22:39   ` Dave Jiang
  2022-11-16 15:35   ` Jonathan Cameron
@ 2022-11-22 22:36   ` Steven Rostedt
  2 siblings, 0 replies; 50+ messages in thread
From: Steven Rostedt @ 2022-11-22 22:36 UTC (permalink / raw)
  To: ira.weiny
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Jonathan Cameron, Davidlohr Bueso, linux-kernel, linux-cxl

On Thu, 10 Nov 2022 10:57:54 -0800
ira.weiny@intel.com wrote:

>  static bool cxl_event_tracing_enabled(void)
>  {
>  	return trace_cxl_generic_event_enabled() ||
>  	       trace_cxl_general_media_enabled() ||
> -	       trace_cxl_dram_enabled();
> +	       trace_cxl_dram_enabled() ||
> +	       trace_cxl_memory_module_enabled();
>  }
>  

My only concern with this patch set is that gcc may decide to not inline
this function and you will lose the performance of the static branches
provided by the trace_cxl_*enabled() functions.
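One defensive option is to force inlining of the aggregate helper so each
static-branch check stays at the call site; modeled here in plain userspace C
(the enabled() helpers are hypothetical stand-ins for the
trace_cxl_*_enabled() static-key tests):

```c
#include <assert.h>

/*
 * Userspace stand-in for the kernel's __always_inline from
 * <linux/compiler_types.h>.
 */
#ifndef __always_inline
#define __always_inline inline __attribute__((__always_inline__))
#endif

/* Hypothetical stand-ins for the trace_cxl_*_enabled() tests. */
static inline int trace_foo_enabled(void) { return 0; }
static inline int trace_bar_enabled(void) { return 1; }

/* Forcing inlining keeps each static-branch test at the call site. */
static __always_inline int cxl_event_tracing_enabled(void)
{
	return trace_foo_enabled() || trace_bar_enabled();
}
```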

Other than that, for patches 5-7 from a tracing perspective:

Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>

-- Steve


* Re: [PATCH 02/11] cxl/mem: Implement Get Event Records command
  2022-11-21 10:47           ` Jonathan Cameron
@ 2022-11-28 23:30             ` Ira Weiny
  2022-11-29 12:26               ` Jonathan Cameron
  0 siblings, 1 reply; 50+ messages in thread
From: Ira Weiny @ 2022-11-28 23:30 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Dan Williams, Steven Rostedt, Alison Schofield, Vishal Verma,
	Ben Widawsky, Davidlohr Bueso, linux-kernel, linux-cxl

On Mon, Nov 21, 2022 at 10:47:14AM +0000, Jonathan Cameron wrote:
> On Fri, 18 Nov 2022 15:26:17 -0800
> Ira Weiny <ira.weiny@intel.com> wrote:
> 
> > On Thu, Nov 17, 2022 at 10:43:37AM +0000, Jonathan Cameron wrote:
> > > On Wed, 16 Nov 2022 16:47:20 -0800
> > > Ira Weiny <ira.weiny@intel.com> wrote:
> > > 
> > >   
> > 
> > [snip]
> > 
> > > >   
> > > > >     
> > > > > > +			int i;
> > > > > > +
> > > > > > +			for (i = 0; i < nr_rec; i++)
> > > > > > +				trace_cxl_generic_event(dev_name(cxlds->dev),
> > > > > > +							type,
> > > > > > +							&payload.record[i]);
> > > > > > +		}
> > > > > > +
> > > > > > +		if (trace_cxl_overflow_enabled() &&
> > > > > > +		    (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW))
> > > > > > +			trace_cxl_overflow(dev_name(cxlds->dev), type, &payload);
> > > > > > +
> > > > > > +	} while (pl_nr > CXL_GET_EVENT_NR_RECORDS ||    
> > > > > 
> > > > > Isn't pl_nr > CXL_GET_EVENT_NR_RECORDS a hardware bug? It's the number in returned
> > > > > payload not the total number.    
> > > > 
> > > > I don't think so.  The only value passed to the device is the _input_ payload
> > > > size.  The output payload size is not passed to the device and is not included
> > > > in the Get Event Records Input Payload.  (Table 8-49)
> > > > 
> > > > So my previous code was wrong.  Here is an example I think which is within the
> > > > spec but would result in the more records flag not being set.
> > > > 
> > > > 	Device log depth == 10
> > > > 	nr log entries == 7
> > > > 	nr log entries in 1MB ~= (1M - hdr size) / 128 ~= 8000
> > > > 
> > > > Device sets Output Payload.Event Record Count == 7 (which is < 8000).  Common
> > > > mailbox code truncates that to 3.  More Event Records == 0 because it sent all
> > > > 7 that it had.
> > > > 
> > > > This code will clear 3 and read again 2 more times.
> > > > 
> > > > Am I reading that wrong?  
> > > 
> > > I think this is still wrong, but for a different reason. :)  
> > 
> > I hope not...  :-/
> > 
> > > If we don't clear the records and more records is set, that means it didn't
> > > fit in the mailbox payload (potentially 1MB)  then the next read
> > > will return the next set of records from there.  
> > 
> > That is not how I read the Get Event Records command:
> > 
> > From 8.2.9.2.2 Get Event Records
> > 
> > ... "Devices shall return event records to the host in the temporal order the
> > device detected the events in. The event occurring the earliest in time, in the
> > specific event log, shall be returned first."
> > 
> > If item 3 below is earlier than 4 then it must be returned if we have not
> > cleared it.  At least that is how I read the above.  :-/
> 
> In general that doesn't work.  Imagine we cleared no records.
> In that case we'd return 4 despite there being earlier records.
> There is no language to cover this particular case of clearing
> part of what was returned.  The device did return the records
> in temporal order, we just didn't notice some of them.
> 
> The wonders of slightly loose spec wording.  Far as I can tell
> we are stuck with having to come with all things that could be
> read as being valid implementations.

So I've been thinking about this for a while.

Lets take this example:

> > > 
> > > Taking this patch only, let's say the mailbox takes 4 records.
> > > Read 1: Records 0, 1, 2, 3 More set.
> > >    We handle 0, 1, 2
> > > Read 2: Records 4, 5, 6 More not set.
> > >    We handle 4, 5, 6
> > > 

In this case what happens if you do a 3rd read?  Does the device return
nothing?  Or does it return 0, 1, 2, 3 again?

It must start from the beginning, right?  But that is no longer in temporal
order by your definition either.

And if it returns nothing then there is no way to recover them except on device
reset?

FWIW I'm altering the patch set to do what you say and allocate a buffer large
enough to get all the records, because I am thinking you are correct.

However, considering the buffer may be large, I fear we may run afoul of memory
allocation failures.  And that will require some more tricky error recovery to
continue reading the log because the irq settings state:

"... Settings: Specifies the settings for the interrupt when the <event> event
log transitions from having no entries to having one or more entries."
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This means that no more interrupts will happen until the log is empty and
additional events occur.  So if an allocation failure happens I'll have to put
a task on a work queue to wake up and continue to try.  Otherwise the log will
stall.  Or we could just put a WARN_ON_ONCE() in and hope this never happens...
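The retry described above - keep attempting the large allocation, since no
further interrupt will arrive while the log is non-empty - can be modeled
with a fake allocator (the workqueue re-queue is modeled as a plain loop; all
names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static int failures_left = 2;	/* pretend the first two attempts fail */

/* Stand-in allocator that models transient allocation failure. */
static void *flaky_alloc(size_t sz)
{
	if (failures_left > 0) {
		failures_left--;
		return NULL;
	}
	return malloc(sz);
}

/*
 * Sketch: because the interrupt only fires on the empty-to-non-empty
 * transition, a failed buffer allocation must be retried (in a driver,
 * by re-queueing a work item; here, by looping) or the log stalls.
 */
int drain_event_log(size_t payload_sz, int *attempts)
{
	void *buf = NULL;

	while (!buf) {
		(*attempts)++;
		buf = flaky_alloc(payload_sz);
		/* real code would return and let the workqueue re-run us */
	}
	free(buf);
	return 0;
}
```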

I still believe that with a clear operation defined my method makes more sense.
But I agree with you that the language is not strong.

:-(

> > > Record 3 is never handled.
> > > 
> > > If we add in clearing as happens later in the series,  
> > 
> > I suppose I should squash the patches as this may not work without the
> > clearing.  :-/
> > 
> > > the current
> > > assumption is that if we clear some records a subsequent read will
> > > start again.  I'm not sure that is true. If it is spec reference needed.
> > > 
> > > So assumption is
> > > Read 1: Records 0, 1, 2, 3 More set
> > >   Clear 0, 1, 2
> > > Read 2: Records 3, 4, 5, 6
> > >   Clear 3, 4, 5 More not set, but catch it with the condition above.
> > > Read 3: 6 only
> > >   Clear 6
> > > 
> > > However, I think a valid implementation could do the following
> > > (imagine a ring buffer with a pointer to the 'next' record to read out and
> > >  each record has a 'valid' flag to deal with corner cases around
> > >  sequences such as read log once, start reading again and some
> > >  clears occur using handles obtained from first read - not that
> > >  case isn't ruled out by the spec as far as I can see).  
> > 
> > I believe this is a violation because the next pointer can't be advanced until
> > the record is cleared.  Otherwise the device is not returning items in temporal
> > order based on what is in the log.
> 
> Ah. This is where we disagree.  The temporal order is (potentially?) unconnected
> from the clearing.  The device did return them in temporal order, we just didn't
> take any notice of record 3 being returned.

:-/

> A valid reading of that temporal order comment is actually the other way around
> that the device must not reset its idea of temporal order until all records
> have been read (reading 3 twice is not in temporal order - imagine we had
> read 5 each time and it becomes more obvious as the read order becomes
> 0,1,2,3,4,3,4,5,6,7 etc which is clearly not in temporal order by any normal
> reading of the term.

Well I guess.  My reading was that it must return the first element temporally
within the list at the time of the Get operation.

So in this example since 3 is still in the list it must return it first.  Each
read is considered atomic from the others.  Yes as long as 0 is in the queue it
will be returned.

But I can see it your way too...

>
> The more I read this, the more I think the current implementation
> is not compliant with the specification at all.
> 
> I'm not seeing a spec mention of 'resetting' the ordering on clearing records
> (which might have been a good thing in the first place but too late now).

There is no resetting of order.  Only that the device does not consider the
previous reads on determining which events to return on any individual Get
call.

> 
> > 
> > > 
> > > Read 1: Records 0, 1, 2, 3 More set.  'next' pointer points to record 4.
> > >   Clear 0, 1, 2
> > > Read 2: Records 4, 5, 6 More not set. 'next' pointer points to record 7.
> > >   Clear 4, 5, 6
> > > 
> > > Skipping record 3.
> > > 
> > > So I think we have to absorb the full mailbox payload each time to guarantee
> > > we don't skip events or process them out of order (which is what would happen
> > > if we relied on a retry loop - we aren't allowed to clear them out of
> > > order anyway 8.2.9.2.3 "Events shall be cleared in temporal order. The device
> > > shall verify the event record handles specified in the input payload are in
> > > temporal order. ... "). 
> > > Obviously that temporal order thing is only relevant if we get my second
> > > example occurring on real hardware.  I think the spec is vague enough
> > > to allow that implementation.  Would have been easy to specify this originally
> > > but it probably won't go in as errata so we need to cope with all the
> > > flexibility that is present.  
> > 
> > :-(  Yea coulda, woulda, shoulda...  ;-)
> > 
> > > 
> > > What fun and oh for a parameter to control how many records are returned!  
> > 
> > Yea.  But I really don't think there is a problem unless someone really takes
> > liberty with the spec.  I think it boils down to how one interprets _when_ a
> > record is removed from the log.
> 
> This is nothing to do with removal. The wording we have is just about reading
> and I think a strict reading of the spec would say your assumption of a reset of the
> read pointer on clear is NOT a valid implementation.  There is separate wording
> about clears being in temporal order, but that doesn't affect the Get Event
> Records handling.
> 
> > 
> > If the record is removed when it is returned (as in your 'next' pointer
> > example) then why have a clear at all?
> 
> Because if your software crashes, you don't have a handshake to reestablish
> state.  If that happens you read the whole log until MORE is not set and
> then read it again to get a clean list.  It's messy situation that has
> been discussed before for GET POISON LIST which has the same nasty handling
> of MORE.  (look in appropriate forum for resolution to that one that we can't
> yet discuss here!)

I can see the similarities but I think events are a more ephemeral item which
makes sense to clear once they are consumed.  The idea that they should be left
for others to consume does not make sense to me.  Whereas poison is something
which could be a permanent marker which should be left in a list.

> 
> Also, allows for non destructive readback (debugging tools might take a look
> having paused the normal handling).

That is true.

> 
> > If my interpretation is correct then
> > the next available entry is the one which has not been cleared.
> 
> If that is the case the language in "More Event Records" doesn't work
> "The host should continue to retrieve records using this command, until
> this indicator is no longer set by the device"
> 
> With your reading of the spec, if we clear nothing, we'd keep getting the
> first set of records and only be able to read more by clearing them...
> 

Yea.

> 
> >  Therefore in
> > your example 'next' is not incremented until clear has been called.  I think
> > that implementation is also supported by the idea that records must be cleared
> > in temporal order.  Otherwise I think devices would get confused.
> 
> Not hard for device to do this (how I now read the spec) properly.
> 
> Two pointers:
> 1) Next to clear: CLEAR
> 2) Next to read:  READ
> 
> Advance the READ pointer on Get Event Records

And loop back to the start on a further read...  I'm looking at changing the
code for this, but I think making it fully robust under a memory allocation
failure is going to be more tedious, unless we punt.

> For CLEAR, check that the requested clears are handled in order and that
> they are before the READ pointer.
> 
> Maybe we should just take it to appropriate spec forum to seek a clarification?

Probably.  I've not paid attention lately.

I've sent a separate email with you cc'ed.  Perhaps we can get some
clarification before I completely rework this.

Ira

> 
> Jonathan
> 
> > 
> > FWIW the qemu implementation is based on my interpretation ATM.
> > 
> > Ira
> > 
> > > 
> > > Jonathan
> > > 
> > >   
> > > >   
> > > > >     
> > > > > > +		 payload.flags & CXL_GET_EVENT_FLAG_MORE_RECORDS);
> > > > > > +}    
> > > > >   
> > >   
> > > >   
> > >   
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 02/11] cxl/mem: Implement Get Event Records command
  2022-11-28 23:30             ` Ira Weiny
@ 2022-11-29 12:26               ` Jonathan Cameron
  2022-11-30  5:09                 ` Ira Weiny
  0 siblings, 1 reply; 50+ messages in thread
From: Jonathan Cameron @ 2022-11-29 12:26 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Dan Williams, Steven Rostedt, Alison Schofield, Vishal Verma,
	Ben Widawsky, Davidlohr Bueso, linux-kernel, linux-cxl

On Mon, 28 Nov 2022 15:30:12 -0800
Ira Weiny <ira.weiny@intel.com> wrote:

> On Mon, Nov 21, 2022 at 10:47:14AM +0000, Jonathan Cameron wrote:
> > On Fri, 18 Nov 2022 15:26:17 -0800
> > Ira Weiny <ira.weiny@intel.com> wrote:
> >   
> > > On Thu, Nov 17, 2022 at 10:43:37AM +0000, Jonathan Cameron wrote:  
> > > > On Wed, 16 Nov 2022 16:47:20 -0800
> > > > Ira Weiny <ira.weiny@intel.com> wrote:
> > > > 
> > > >     
> > > 
> > > [snip]
> > >   
> > > > >     
> > > > > >       
> > > > > > > +			int i;
> > > > > > > +
> > > > > > > +			for (i = 0; i < nr_rec; i++)
> > > > > > > +				trace_cxl_generic_event(dev_name(cxlds->dev),
> > > > > > > +							type,
> > > > > > > +							&payload.record[i]);
> > > > > > > +		}
> > > > > > > +
> > > > > > > +		if (trace_cxl_overflow_enabled() &&
> > > > > > > +		    (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW))
> > > > > > > +			trace_cxl_overflow(dev_name(cxlds->dev), type, &payload);
> > > > > > > +
> > > > > > > +	} while (pl_nr > CXL_GET_EVENT_NR_RECORDS ||      
> > > > > > 
> > > > > > Isn't pl_nr > CXL_GET_EVENT_NR_RECORDS a hardware bug? It's the number in returned
> > > > > > payload not the total number.      
> > > > > 
> > > > > I don't think so.  The only value passed to the device is the _input_ payload
> > > > > size.  The output payload size is not passed to the device and is not included
> > > > > in the Get Event Records Input Payload.  (Table 8-49)
> > > > > 
> > > > > So my previous code was wrong.  Here is an example I think which is within the
> > > > > spec but would result in the more records flag not being set.
> > > > > 
> > > > > 	Device log depth == 10
> > > > > 	nr log entries == 7
> > > > > 	nr log entries in 1MB ~= (1M - hdr size) / 128 ~= 8000
> > > > > 
> > > > > Device sets Output Payload.Event Record Count == 7 (which is < 8000).  Common
> > > > > mailbox code truncates that to 3.  More Event Records == 0 because it sent all
> > > > > 7 that it had.
> > > > > 
> > > > > This code will clear 3 and read again 2 more times.
> > > > > 
> > > > > Am I reading that wrong?    
> > > > 
> > > > I think this is still wrong, but for a different reason. :)    
> > > 
> > > I hope not...  :-/
> > >   
> > > > If we don't clear the records and more records is set, that means it didn't
> > > > fit in the mailbox payload (potentially 1MB)  then the next read
> > > > will return the next set of records from there.    
> > > 
> > > That is not how I read the Get Event Records command:
> > > 
> > > From 8.2.9.2.2 Get Event Records
> > > 
> > > ... "Devices shall return event records to the host in the temporal order the
> > > device detected the events in. The event occurring the earliest in time, in the
> > > specific event log, shall be returned first."
> > > 
> > > If item 3 below is earlier than 4 then it must be returned if we have not
> > > cleared it.  At least that is how I read the above.  :-/  
> > 
> > In general that doesn't work.  Imagine we cleared no records.
> > In that case we'd return 4 despite there being earlier records.
> > There is no language to cover this particular case of clearing
> > part of what was returned.  The device did return the records
> > in temporal order, we just didn't notice some of them.
> > 
> > The wonders of slightly loose spec wording.  Far as I can tell
> > we are stuck with having to come with all things that could be
> > read as being valid implementations.  
> 
> So I've been thinking about this for a while.
> 
> Lets take this example:
> 
> > > > 
> > > > Taking this patch only, let's say the mailbox takes 4 records.
> > > > Read 1: Records 0, 1, 2, 3 More set.
> > > >    We handle 0, 1, 2
> > > > Read 2: Records 4, 5, 6 More not set.
> > > >    We handle 4, 5, 6
> > > >   
> 
> In this case what happens if you do a 3rd read?  Does the device return
> nothing?  Or does it return 0, 1, 2, 3 again?
> 
> It must start from the beginning right?  But that is no longer in temporal
> order by your definition either.

Agreed, that is not clearly specified either.  I assume it works the same
way as poison, where we raised the question and the conclusion was that it
starts again at the beginning.  In fact we have to loop twice to guarantee
that we have all the records (as other software may have crashed half way
through reading the poison list, so we don't know if we have the first
record or not).

> 
> And if it returns nothing then there is no way to recover them except on device
> reset?
> 
> FWIW I'm altering the patch set to do what you say and allocate a buffer large
> enough to get all the records.  Because I am thinking you are correct.

Horrible, but maybe the best we can do (subject to suggested hack below ;)

> 
> However, considering the buffer may be large, I fear we may run afoul of memory
> allocation failures.  And that will require some more tricky error recovery to
> continue reading the log because the irq settings state:
> 

We could implement cleverer mailbox handling to avoid the large allocation requirement.
Would be messy though as we'd effectively have to lock the mailbox whilst we did
multiple reads of the content into a smaller buffer.

> "... Settings: Specifies the settings for the interrupt when the <event> event
> log transitions from having no entries to having one or more entries."
>                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> This means that no more interrupts will happen until the log is empty and
> additional events occur.  So if an allocation failure happens I'll have to put
> a task on a work queue to wake up and continue to try.  Otherwise the log will
> stall.  Or we could just put a WARN_ON_ONCE() in and hope this never happens...

I think the WARN_ON_ONCE() is probably fine.  If we are paranoid, vmalloc the
buffer when we initially connect the device, as allocation failure is less
likely at that point...

As a side note, it seems like we should take a request to the SSWG for devices
to optionally be told to use a smaller mailbox than they support, in order to
allow for corners like this.  There is such a command (Set Response Message
Limit), but it's prohibited on primary and secondary mailboxes.  It is allowed
on switch CCIs, I guess because it is assumed they may be connected to a BMC
without much memory.

> 
> I still believe that with a clear operation defined my method makes more sense.
> But I agree with you that the language is not strong.

Absolutely agree!  Your method would be the one I'd push for if we were starting
from scratch (or another similar method looking like what I can't talk about
for a similar case...)

> 
> :-(
> 
> > > > Record 3 is never handled.
> > > > 
> > > > If we add in clearing as happens later in the series,    
> > > 
> > > I suppose I should squash the patches as this may not work without the
> > > clearing.  :-/
> > >   
> > > > the current
> > > > assumption is that if we clear some records a subsequent read will
> > > > start again.  I'm not sure that is true. If it is spec reference needed.
> > > > 
> > > > So assumption is
> > > > Read 1: Records 0, 1, 2, 3 More set
> > > >   Clear 0, 1, 2
> > > > Read 2: Records 3, 4, 5, 6
> > > >   Clear 3, 4, 5 More not set, but catch it with the condition above.
> > > > Read 3: 6 only
> > > >   Clear 6
> > > > 
> > > > However, I think a valid implementation could do the following
> > > > (imagine a ring buffer with a pointer to the 'next' record to read out and
> > > >  each record has a 'valid' flag to deal with corner cases around
> > > >  sequences such as read log once, start reading again and some
> > > >  clears occur using handles obtained from first read - not that
> > > >  case isn't ruled out by the spec as far as I can see).    
> > > 
> > > I believe this is a violation because the next pointer can't be advanced until
> > > the record is cleared.  Otherwise the device is not returning items in temporal
> > > order based on what is in the log.  
> > 
> > Ah. This is where we disagree.  The temporal order is (potentially?) unconnected
> > from the clearing.  The device did return them in temporal order, we just didn't
> > take any notice of record 3 being returned.  
> 
> :-/
> 
> > A valid reading of that temporal order comment is actually the other way around
> > that the device must not reset its idea of temporal order until all records
> > have been read (reading 3 twice is not in temporal order - imagine we had
> > read 5 each time and it becomes more obvious as the read order becomes
> > 0,1,2,3,4,3,4,5,6,7 etc which is clearly not in temporal order by any normal
> > reading of the term.  
> 
> Well I guess.  My reading was that it must return the first element temporally
> within the list at the time of the Get operation.
> 
> So in this example since 3 is still in the list it must return it first.  Each
> read is considered atomic from the others.  Yes as long as 0 is in the queue it
> will be returned.
> 
> But I can see it your way too...

That pesky text under the More Event Records flag doesn't mention clearing
when it says "The host should continue to retrieve records using this
command, until this indicator is no longer set by the device."

I wish it did :(

> 
> >
> > The more I read this, the more I think the current implementation
> > is not compliant with the specification at all.
> > 
> > I'm not seeing a spec mention of 'resetting' the ordering on clearing records
> > (which might have been a good thing in the first place but too late now).  
> 
> There is no resetting of order.  Only that the device does not consider the
> previous reads on determining which events to return on any individual Get
> call.

Sure, see above quote though. 

> 
> >   
> > >   
> > > > 
> > > > Read 1: Records 0, 1, 2, 3 More set.  'next' pointer points to record 4.
> > > >   Clear 0, 1, 2
> > > > Read 2: Records 4, 5, 6 More not set. 'next' pointer points to record 7.
> > > >   Clear 4, 5, 6
> > > > 
> > > > Skipping record 3.
> > > > 
> > > > So I think we have to absorb the full mailbox payload each time to guarantee
> > > > we don't skip events or process them out of order (which is what would happen
> > > > if we relied on a retry loop - we aren't allowed to clear them out of
> > > > order anyway 8.2.9.2.3 "Events shall be cleared in temporal order. The device
> > > > shall verify the event record handles specified in the input payload are in
> > > > temporal order. ... "). 
> > > > Obviously that temporal order thing is only relevant if we get my second
> > > > example occurring on real hardware.  I think the spec is vague enough
> > > > to allow that implementation.  Would have been easy to specify this originally
> > > > but it probably won't go in as errata so we need to cope with all the
> > > > flexibility that is present.    
> > > 
> > > :-(  Yea coulda, woulda, shoulda...  ;-)
> > >   
> > > > 
> > > > What fun and oh for a parameter to control how many records are returned!    
> > > 
> > > Yea.  But I really don't think there is a problem unless someone really takes
> > > liberty with the spec.  I think it boils down to how one interprets _when_ a
> > > record is removed from the log.  
> > 
> > This is nothing to do with removal. The wording we have is just about reading
> > and I think a strict reading of the spec would say your assumption of a reset of the
> > read pointer on clear is NOT a valid implementation.  There is separate wording
> > about clears being in temporal order, but that doesn't affect the Get Event
> > Records handling.
> >   
> > > 
> > > If the record is removed when it is returned (as in your 'next' pointer
> > > example) then why have a clear at all?  
> > 
> > Because if your software crashes, you don't have a handshake to reestablish
> > state.  If that happens you read the whole log until MORE is not set and
> > then read it again to get a clean list.  It's messy situation that has
> > been discussed before for GET POISON LIST which has the same nasty handling
> > of MORE.  (look in appropriate forum for resolution to that one that we can't
> > yet discuss here!)  
> 
> I can see the similarities but I think events are a more ephemeral item which
> makes sense to clear once they are consumed.  The idea that they should be left
> for others to consume does not make sense to me.  Whereas poison is something
> which could be a permanent marker which should be left in a list.

Agreed - but the sections use the same wording for the More flag, so we need
to interpret it the same way.

> 
> > 
> > Also, allows for non destructive readback (debugging tools might take a look
> > having paused the normal handling).  
> 
> That is true.
> 
> >   
> > > If my interpretation is correct then
> > > the next available entry is the one which has not been cleared.  
> > 
> > If that is the case the language in "More Event Records" doesn't work
> > "The host should continue to retrieve records using this command, until
> > this indicator is no longer set by the device"
> > 
> > With your reading of the spec, if we clear nothing, we'd keep getting the
> > first set of records and only be able to read more by clearing them...
> >   
> 
> Yea.
> 
> >   
> > >  Therefore in
> > > your example 'next' is not incremented until clear has been called.  I think
> > > that implementation is also supported by the idea that records must be cleared
> > > in temporal order.  Otherwise I think devices would get confused.  
> > 
> > Not hard for device to do this (how I now read the spec) properly.
> > 
> > Two pointers:
> > 1) Next to clear: CLEAR
> > 2) Next to read:  READ
> > 
> > Advance the READ pointer on Get Event Records  
> 
> And loop back to the start on a further read...  I'm looking at changing the
> code for this, but I think making it fully robust under a memory allocation
> failure is going to be more tedious, unless we punt.

If we get a memory allocation failure, perhaps we could do the following
horrible hack.

1 Allocate a small buffer.
2 Read once.
3 Hopefully we get the full record - in which case success.
4 Clear those records.
5 If we have not dealt with all records - read again until More Event Records
  is not set (it may already be clear if everything fitted in the buffer)
6 Go back to 2.

If we think a valid implementation might reset the read pointer on clear, then
there is a variant where we make use of the fact that the handles are constant
 - read 3 records, clear 2, and then use the handle of the remaining one to
   identify whether we have the next 3 to clear or not...

> 
> > For CLEAR, check that the requested clears are handled in order and that
> > they are before the READ pointer.
> > 
> > Maybe we should just take it to appropriate spec forum to seek a clarification?  
> 
> Probably.  I've not paid attention lately.
> 
> I've sent a separate email with you cc'ed.  Perhaps we can get some
> clarification before I completely rework this.

Fingers crossed.

Thanks,

Jonathan

> 
> Ira
> 
> > 
> > Jonathan
> >   
> > > 
> > > FWIW the qemu implementation is based on my interpretation ATM.
> > > 
> > > Ira
> > >   
> > > > 
> > > > Jonathan
> > > > 
> > > >     
> > > > >     
> > > > > >       
> > > > > > > +		 payload.flags & CXL_GET_EVENT_FLAG_MORE_RECORDS);
> > > > > > > +}      
> > > > > >     
> > > >     
> > > > >     
> > > >     
> >   


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 09/11] cxl/test: Add generic mock events
  2022-11-16 16:00   ` Jonathan Cameron
@ 2022-11-29 18:29     ` Ira Weiny
  0 siblings, 0 replies; 50+ messages in thread
From: Ira Weiny @ 2022-11-29 18:29 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Wed, Nov 16, 2022 at 04:00:53PM +0000, Jonathan Cameron wrote:
> On Thu, 10 Nov 2022 10:57:56 -0800
> ira.weiny@intel.com wrote:
> 
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > Facilitate testing basic Get/Clear Event functionality by creating
> > multiple logs and generic events with made up UUID's.
> > 
> > Data is completely made up with data patterns which should be easy to
> > spot in trace output.
> > 
> > A single sysfs entry resets the event data and triggers collecting the
> > events for testing.
> > 
> > Test traces are easy to obtain with a small script such as this:
> > 
> > 	#!/bin/bash -x
> > 
> > 	devices=`find /sys/devices/platform -name cxl_mem*`
> > 
> > 	# Turn on tracing
> > 	echo "" > /sys/kernel/tracing/trace
> > 	echo 1 > /sys/kernel/tracing/events/cxl/enable
> > 	echo 1 > /sys/kernel/tracing/tracing_on
> > 
> > 	# Generate fake interrupt
> > 	for device in $devices; do
> > 	        echo 1 > $device/event_trigger
> > 	done
> > 
> > 	# Turn off tracing and report events
> > 	echo 0 > /sys/kernel/tracing/tracing_on
> > 	cat /sys/kernel/tracing/trace
> > 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> Hi Ira,
> 
> I don't think your mocked device is now obeying the spec
> after changes in the core code that mean it gets a larger
> request than previously.
> If it has more than 1 record and the read is for 3 it
> must return more than 1 and only set MORE_RECORDS if there
> are more than 3.

Based on our other conversations I think it is fine if this test only returns 1
record at a time.  As long as it sets Event Record Count to 1 and More Event
Records appropriately.  The Clear can loop through the number of handles
specified (which will be 1 in this case but it is not hard to do this).

> 
> Gah. The more event records approach also suffers the
> same problem that poison list does. You have no way
> to be sure that "previous software" (which may have crashed)
> hasn't already read some.  So in the core code we probably
> need to do one more read on initial boot to be sure we have
> all the records.  Not sure how I spotted that for poison
> but never noticed it for these.  At least for these records
> the expectation is that there won't be a huge number of them
> so reading one more time is fine - particularly as you clear
> them on that initial read so the list will get shorter.

I can throw in another round of reads on start up.  Actually, to be 100% sure
we get all the events cleared, the initial read of events needs to be _after_
the irq setup, because irqs are only triggered when the event logs go from
zero to non-zero events.  :-(

Thanks for making me think through this more.
Ira

> 
> Jonathan
> 
> > 
> > ---
> > Changes from RFC v2:
> > 	Adjust to simulate the event status register
> > 
> > Changes from RFC:
> > 	Separate out the event code
> > 	Adjust for struct changes.
> > 	Clean up devm_cxl_mock_event_logs()
> > 	Clean up naming and comments
> > 	Jonathan
> > 		Remove dynamic allocation of event logs
> > 		Clean up comment
> > 		Remove unneeded xarray
> > 		Ensure event_trigger sysfs is valid prior to the driver
> > 		going active.
> > 	Dan
> > 		Remove the fill/reset event sysfs as these operations
> > 		can be done together
> > ---
> >  drivers/cxl/core/mbox.c         |  31 +++--
> >  drivers/cxl/cxlmem.h            |   1 +
> >  tools/testing/cxl/test/Kbuild   |   2 +-
> >  tools/testing/cxl/test/events.c | 222 ++++++++++++++++++++++++++++++++
> >  tools/testing/cxl/test/events.h |   9 ++
> >  tools/testing/cxl/test/mem.c    |  35 ++++-
> >  6 files changed, 286 insertions(+), 14 deletions(-)
> >  create mode 100644 tools/testing/cxl/test/events.c
> >  create mode 100644 tools/testing/cxl/test/events.h
> 
> 
> > diff --git a/tools/testing/cxl/test/events.c b/tools/testing/cxl/test/events.c
> > new file mode 100644
> > index 000000000000..a4816f230bb5
> > --- /dev/null
> > +++ b/tools/testing/cxl/test/events.c
> 
> 
> xl_event_record_raw *events[CXL_TEST_EVENT_CNT_MAX];
> > +};
> > +
> > +struct mock_event_store {
> > +	struct cxl_dev_state *cxlds;
> > +	struct mock_event_log mock_logs[CXL_EVENT_TYPE_MAX];
> > +	u32 ev_status;
> > +};
> > +
> > +DEFINE_XARRAY(mock_dev_event_store);
> 
> Perhaps add a comment on what this xarray is for.
> I think it's all to allow associating some extra data with the devices
> without bloating structures outside of tests?
> 
> > +
> > +struct mock_event_log *find_event_log(struct device *dev, int log_type)
> > +{
> > +	struct mock_event_store *mes = xa_load(&mock_dev_event_store,
> > +					       (unsigned long)dev);
> > +
> > +	if (!mes || log_type >= CXL_EVENT_TYPE_MAX)
> > +		return NULL;
> > +	return &mes->mock_logs[log_type];
> > +}
> > +
> 
> > +
> > +int mock_get_event(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
> > +{
> > +	struct cxl_get_event_payload *pl;
> > +	struct mock_event_log *log;
> > +	u8 log_type;
> > +
> > +	/* Valid request? */
> > +	if (cmd->size_in != sizeof(log_type))
> > +		return -EINVAL;
> > +
> > +	log_type = *((u8 *)cmd->payload_in);
> > +	if (log_type >= CXL_EVENT_TYPE_MAX)
> > +		return -EINVAL;
> > +
> > +	log = find_event_log(cxlds->dev, log_type);
> > +	if (!log || log_empty(log))
> > +		goto no_data;
> > +
> > +	pl = cmd->payload_out;
> > +	memset(pl, 0, sizeof(*pl));
> > +
> > +	pl->record_count = cpu_to_le16(1);
> 
> Not valid.  Kernel now requests 3 and as I read the spec we have
> to return 3 if we have 3 or more to return. Can't send 1 and set
> MORE_RECORDS as done here.
> 
> > +
> > +	if (log_rec_left(log) > 1)
> > +		pl->flags |= CXL_GET_EVENT_FLAG_MORE_RECORDS;
> > +
> > +	memcpy(&pl->record[0], get_cur_event(log), sizeof(pl->record[0]));
> > +	pl->record[0].hdr.handle = get_cur_event_handle(log);
> > +	return 0;
> > +
> > +no_data:
> > +	/* Room for header? */
> 
> Why check for space here, but not when setting records above?
> 
> > +	if (cmd->size_out < (sizeof(*pl) - sizeof(pl->record[0])))
> > +		return -EINVAL;
> > +
> > +	memset(cmd->payload_out, 0, cmd->size_out);
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(mock_get_event);
> > +
> > +/*
> > + * Get and clear event only handle 1 record at a time as this is what is
> > + * currently implemented in the main code.
> > + */
> > +int mock_clear_event(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
> > +{
> > +	struct cxl_mbox_clear_event_payload *pl = cmd->payload_in;
> > +	struct mock_event_log *log;
> > +	u8 log_type = pl->event_log;
> > +
> > +	/* Don't handle more than 1 record at a time */
> > +	if (pl->nr_recs != 1)
> > +		return -EINVAL;
> > +
> > +	if (log_type >= CXL_EVENT_TYPE_MAX)
> > +		return -EINVAL;
> > +
> > +	log = find_event_log(cxlds->dev, log_type);
> > +	if (!log)
> > +		return 0; /* No mock data in this log */
> > +
> > +	/*
> > +	 * Test code only reported 1 event at a time.  So only support 1 event
> > +	 * being cleared.
> > +	 */
> > +	if (log->cur_event != le16_to_cpu(pl->handle[0])) {
> > +		dev_err(cxlds->dev, "Clearing events out of order\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	log->cur_event++;
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(mock_clear_event);
> 
> ...
> 
> > +
> > +struct cxl_event_record_raw maint_needed = {
> > +	.hdr = {
> > +		.id = UUID_INIT(0xDEADBEEF, 0xCAFE, 0xBABE,
> > +				0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
> > +		.length = sizeof(struct cxl_event_record_raw),
> > +		.flags[0] = CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,
> > +		/* .handle = Set dynamically */
> 
> Multiple devices... So this should be const and a copy made for each one
> to avoid races.
> 
> > +		.related_handle = cpu_to_le16(0xa5b6),
> > +	},
> > +	.data = { 0xDE, 0xAD, 0xBE, 0xEF },
> > +};
> > +
> > +struct cxl_event_record_raw hardware_replace = {
> > +	.hdr = {
> > +		.id = UUID_INIT(0xBABECAFE, 0xBEEF, 0xDEAD,
> > +				0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
> > +		.length = sizeof(struct cxl_event_record_raw),
> > +		.flags[0] = CXL_EVENT_RECORD_FLAG_HW_REPLACE,
> > +		/* .handle = Set dynamically */
> > +		.related_handle = cpu_to_le16(0xb6a5),
> > +	},
> > +	.data = { 0xDE, 0xAD, 0xBE, 0xEF },
> > +};
> > +
> > +u32 cxl_mock_add_event_logs(struct cxl_dev_state *cxlds)
> > +{
> > +	struct device *dev = cxlds->dev;
> > +	struct mock_event_store *mes;
> > +
> > +	mes = devm_kzalloc(dev, sizeof(*mes), GFP_KERNEL);
> > +	if (WARN_ON(!mes))
> > +		return 0;
> > +	mes->cxlds = cxlds;
> > +
> > +	if (xa_insert(&mock_dev_event_store, (unsigned long)dev, mes,
> > +		      GFP_KERNEL)) {
> > +		dev_err(dev, "Event store not available for %s\n",
> > +			dev_name(dev));
> > +		return 0;
> > +	}
> > +
> > +	event_store_add_event(mes, CXL_EVENT_TYPE_INFO, &maint_needed);
> > +	mes->ev_status |= CXLDEV_EVENT_STATUS_INFO;
> > +
> > +	event_store_add_event(mes, CXL_EVENT_TYPE_FATAL, &hardware_replace);
> > +	mes->ev_status |= CXLDEV_EVENT_STATUS_FATAL;
> > +
> > +	return mes->ev_status;
> > +}
> > +EXPORT_SYMBOL_GPL(cxl_mock_add_event_logs);
> > +
> > +void cxl_mock_remove_event_logs(struct device *dev)
> > +{
> > +	struct mock_event_store *mes;
> > +
> > +	mes = xa_erase(&mock_dev_event_store, (unsigned long)dev);
> > +}
> > +EXPORT_SYMBOL_GPL(cxl_mock_remove_event_logs);
> 
> > diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> > index e2f5445d24ff..333fa8527a07 100644
> > --- a/tools/testing/cxl/test/mem.c
> > +++ b/tools/testing/cxl/test/mem.c
> ...
> 
> 
> >  static int cxl_mock_mem_probe(struct platform_device *pdev)
> >  {
> >  	struct device *dev = &pdev->dev;
> >  	struct cxl_memdev *cxlmd;
> >  	struct cxl_dev_state *cxlds;
> > +	u32 ev_status;
> >  	void *lsa;
> >  	int rc;
> >  
> > @@ -281,11 +304,13 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
> >  	if (rc)
> >  		return rc;
> >  
> > +	ev_status = cxl_mock_add_event_logs(cxlds);
> For below comment, add a devm_add_action_or_reset() here to
> undo this.  If nothing else, without one you should have error
> handling...
> 
> > +
> >  	cxlmd = devm_cxl_add_memdev(cxlds);
> >  	if (IS_ERR(cxlmd))
> >  		return PTR_ERR(cxlmd);
> >  
> > -	cxl_mem_get_event_records(cxlds);
> > +	__cxl_mem_get_event_records(cxlds, ev_status);
> >  
> >  	if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
> >  		rc = devm_cxl_add_nvdimm(dev, cxlmd);
> > @@ -293,6 +318,12 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
> >  	return 0;
> >  }
> >  
> > +static int cxl_mock_mem_remove(struct platform_device *pdev)
> > +{
> > +	cxl_mock_remove_event_logs(&pdev->dev);
> 
> Given you have a bunch of devm above, probably better to just use
> a devm_add_action_or_reset() to clean this up.
> Saves on introducing remove for just this one call + any potential
> ordering issues (I'm too lazy to check if there are any ;)
> 
> 
> > +	return 0;
> > +}
> > +
> >  static const struct platform_device_id cxl_mock_mem_ids[] = {
> >  	{ .name = "cxl_mem", },
> >  	{ },
> > @@ -301,9 +332,11 @@ MODULE_DEVICE_TABLE(platform, cxl_mock_mem_ids);
> >  
> >  static struct platform_driver cxl_mock_mem_driver = {
> >  	.probe = cxl_mock_mem_probe,
> > +	.remove = cxl_mock_mem_remove,
> >  	.id_table = cxl_mock_mem_ids,
> >  	.driver = {
> >  		.name = KBUILD_MODNAME,
> > +		.dev_groups = cxl_mock_event_groups,
> >  	},
> >  };
> >  
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 02/11] cxl/mem: Implement Get Event Records command
  2022-11-29 12:26               ` Jonathan Cameron
@ 2022-11-30  5:09                 ` Ira Weiny
  2022-11-30 14:05                   ` Jonathan Cameron
  0 siblings, 1 reply; 50+ messages in thread
From: Ira Weiny @ 2022-11-30  5:09 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Dan Williams, Steven Rostedt, Alison Schofield, Vishal Verma,
	Ben Widawsky, Davidlohr Bueso, linux-kernel, linux-cxl

On Tue, Nov 29, 2022 at 12:26:20PM +0000, Jonathan Cameron wrote:
> On Mon, 28 Nov 2022 15:30:12 -0800
> Ira Weiny <ira.weiny@intel.com> wrote:
> 

[snip]

> > > A valid reading of that temporal order comment is actually the other way around
> > > that the device must not reset its idea of temporal order until all records
> > > have been read (reading 3 twice is not in temporal order - imagine we had
> > > read 5 each time and it becomes more obvious as the read order becomes
> > > 0,1,2,3,4,3,4,5,6,7 etc which is clearly not in temporal order by any normal
> > > reading of the term.  
> > 
> > Well I guess.  My reading was that it must return the first element temporally
> > within the list at the time of the Get operation.
> > 
> > So in this example since 3 is still in the list it must return it first.  Each
> > read is considered atomic from the others.  Yes as long as 0 is in the queue it
> > will be returned.
> > 
> > But I can see it your way too...
> 
> That pesky text under More Event Records flag doesn't mention clearing when it
> says "The host should continue to retrieve 
> records using this command, until this indicator is no longer set by the 
> device."
> 
> I wish it did :(
> 

As I have reviewed these in my head again, I have come to the conclusion that
the More Event Records flag is useless.  Let me explain:

The Clear all Records flag is useless because an event which occurs between the
Get and Clear all operations will be dropped without the host ever having seen it.

However, while the host is clearing records based on the handles it read,
additional events could come in.  Because of the way the interrupts are
specified, the host can't be sure those new events will cause a
zero-to-non-zero transition, since there is no way to guarantee all the
events had been cleared at the moment the new events arrived.

I believe this is what you mentioned in another email about needing an 'extra
read' at the end to ensure there was nothing more to be read.  But based on
that logic the only thing that matters is the Record Count returned by Get
Event Records.  If it is not 0, keep reading, because while the host is
clearing records another event could come in.

In other words, the only way to be sure that all records are seen is to do a
Get and see the number of records equal to 0.  Thus any further events will
trigger an interrupt and we can safely exit the loop.

Ira

Basically the loop looks like:

	int nr_rec;

	do {
		... <Get Events> ...

		nr_rec = le16_to_cpu(payload->record_count);

		... <for each record trace> ...
		... <for each record clear> ...

	} while (nr_rec);



* Re: [PATCH 08/11] cxl/mem: Wire up event interrupts
  2022-11-16 14:40   ` Jonathan Cameron
@ 2022-11-30  9:11     ` Ira Weiny
  0 siblings, 0 replies; 50+ messages in thread
From: Ira Weiny @ 2022-11-30  9:11 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Wed, Nov 16, 2022 at 02:40:21PM +0000, Jonathan Cameron wrote:
> On Thu, 10 Nov 2022 10:57:55 -0800
> ira.weiny@intel.com wrote:
> 
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > CXL device events are signaled via interrupts.  Each event log may have
> > a different interrupt message number.  These message numbers are
> > reported in the Get Event Interrupt Policy mailbox command.
> > 
> > Add interrupt support for event logs.  Interrupts are allocated as
> > shared interrupts.  Therefore, all or some event logs can share the same
> > message number.
> > 
> > The driver must deal with the possibility that dynamic capacity is not
> > yet supported by a device it sees.  Fallback and retry without dynamic
> > capacity if the first attempt fails.
> > 
> > Device capacity event logs interrupt as part of the informational event
> > log.  Check the event status to see which log has data.
> > 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > 
> Hi Ira,
> 
> A few comments inline.

Thanks for the review!

> 
> Thanks,
> 
> Jonathan
> 
> > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> > index 879b228a98a0..1e6762af2a00 100644
> > --- a/drivers/cxl/core/mbox.c
> > +++ b/drivers/cxl/core/mbox.c
> 
> >  /**
> >   * cxl_mem_get_event_records - Get Event Records from the device
> > @@ -867,6 +870,52 @@ void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
> >  }
> >  EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, CXL);
> >  
> > +int cxl_event_config_msgnums(struct cxl_dev_state *cxlds)
> > +{
> > +	struct cxl_event_interrupt_policy *policy = &cxlds->evt_int_policy;
> > +	size_t policy_size = sizeof(*policy);
> > +	bool retry = true;
> > +	int rc;
> > +
> > +	policy->info_settings = CXL_INT_MSI_MSIX;
> > +	policy->warn_settings = CXL_INT_MSI_MSIX;
> > +	policy->failure_settings = CXL_INT_MSI_MSIX;
> > +	policy->fatal_settings = CXL_INT_MSI_MSIX;
> > +	policy->dyn_cap_settings = CXL_INT_MSI_MSIX;
> > +
> > +again:
> > +	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_SET_EVT_INT_POLICY,
> > +			       policy, policy_size, NULL, 0);
> > +	if (rc < 0) {
> > +		/*
> > +		 * If the device does not support dynamic capacity it may fail
> > +		 * the command due to an invalid payload.  Retry without
> > +		 * dynamic capacity.
> > +		 */
> 
> There are a number of ways to discover if DCD is supported that aren't based
> on try and retry like this. 9.13.3 has "basic sequence to utilize Dynamic Capacity"
> That calls out:
> Verify the necessary Dynamic Capacity commands are returned in the CEL.
> 
> First I'm not sure we should set the interrupt on for DCD until we have a lot
> more of the flow handled, secondly even then we should figure out if it is supported
> at a higher level than this command and pass that info down here.

I'm not sure I really agree.  The events are just traced.  I think this
functionality is really orthogonal to whether any other DCD support is there.

Regardless, as I said on the call, I think deferring this is the right way to
go for now.

> 
> 
> > +		if (retry) {
> > +			retry = false;
> > +			policy->dyn_cap_settings = 0;
> > +			policy_size = sizeof(*policy) - sizeof(policy->dyn_cap_settings);
> > +			goto again;
> > +		}
> > +		dev_err(cxlds->dev, "Failed to set event interrupt policy : %d",
> > +			rc);
> > +		memset(policy, CXL_INT_NONE, sizeof(*policy));
> 
> Relying on all the fields being 1 byte is a bit error prone. I'd just set them all
> individually in the interests of more readable code.

Done.

> 
> > +		return rc;
> > +	}
> > +
> > +	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVT_INT_POLICY, NULL, 0,
> > +			       policy, policy_size);
> 
> Add a comment on why you are reading this back (to get the msgnums in the upper
> bits) as it's not obvious to a casual reader.

Done.

> 
> > +	if (rc < 0) {
> > +		dev_err(cxlds->dev, "Failed to get event interrupt policy : %d",
> > +			rc);
> > +		return rc;
> > +	}
> > +
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(cxl_event_config_msgnums, CXL);
> > +
> 
> ...
> 
> > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > index e0d511575b45..64b2e2671043 100644
> > --- a/drivers/cxl/pci.c
> > +++ b/drivers/cxl/pci.c
> > @@ -458,6 +458,138 @@ static void cxl_pci_alloc_irq_vectors(struct cxl_dev_state *cxlds)
> >  	cxlds->nr_irq_vecs = nvecs;
> >  }
> >  
> > +struct cxl_event_irq_id {
> > +	struct cxl_dev_state *cxlds;
> > +	u32 status;
> > +	unsigned int msgnum;
> msgnum is only here for freeing the interrupt - I'd rather we fixed
> that by using standard infrastructure (or adding some - see below).
> 
> status is an indirect way of allowing us to share an interrupt handler.
> You could do that by registering a trivial wrapper for each instead.
> Then all you have left is the cxl_dev_state which could be passed
> in directly as the callback parameter removing need to have this
> structure at all.  I think that might be neater.

It does prevent the alloc of this structure which I like.

I've made the change.

> 
> > +};
> > +
> > +static irqreturn_t cxl_event_int_thread(int irq, void *id)
> > +{
> > +	struct cxl_event_irq_id *cxlid = id;
> > +	struct cxl_dev_state *cxlds = cxlid->cxlds;
> > +
> > +	if (cxlid->status & CXLDEV_EVENT_STATUS_INFO)
> > +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_INFO);
> > +	if (cxlid->status & CXLDEV_EVENT_STATUS_WARN)
> > +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_WARN);
> > +	if (cxlid->status & CXLDEV_EVENT_STATUS_FAIL)
> > +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FAIL);
> > +	if (cxlid->status & CXLDEV_EVENT_STATUS_FATAL)
> > +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FATAL);
> > +	if (cxlid->status & CXLDEV_EVENT_STATUS_DYNAMIC_CAP)
> > +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_DYNAMIC_CAP);
> > +
> > +	return IRQ_HANDLED;
> > +}
> > +
> > +static irqreturn_t cxl_event_int_handler(int irq, void *id)
> > +{
> > +	struct cxl_event_irq_id *cxlid = id;
> > +	struct cxl_dev_state *cxlds = cxlid->cxlds;
> > +	u32 status = readl(cxlds->regs.status + CXLDEV_DEV_EVENT_STATUS_OFFSET);
> > +
> > +	if (cxlid->status & status)
> > +		return IRQ_WAKE_THREAD;
> > +	return IRQ_HANDLED;
> 
> If status not set IRQ_NONE.
> Ah. I see Dave raised this as well.

Yep done.

> 
> > +}
> 
> ...
> 
> > +static int cxl_request_event_irq(struct cxl_dev_state *cxlds,
> > +				 enum cxl_event_log_type log_type,
> > +				 u8 setting)
> > +{
> > +	struct device *dev = cxlds->dev;
> > +	struct pci_dev *pdev = to_pci_dev(dev);
> > +	struct cxl_event_irq_id *id;
> > +	unsigned int msgnum = CXL_EVENT_INT_MSGNUM(setting);
> > +	int irq;
> > +
> > +	/* Disabled irq is not an error */
> > +	if (!cxl_evt_int_is_msi(setting) || msgnum > cxlds->nr_irq_vecs) {
> 
> I don't think that second condition can occur.  The language under table 8-52
> (I think) means that it will move around if there aren't enough vectors
> (for MSI - MSI-X is more complex, but result the same).

Based on the other review this is just a bool msi_enabled which is used to
determine if this should be set up at all.

> 
> > +		dev_dbg(dev, "Event interrupt not enabled; %s %u %d\n",
> > +			cxl_event_log_type_str(CXL_EVENT_TYPE_INFO),
> > +			msgnum, cxlds->nr_irq_vecs);
> > +		return 0;
> > +	}
> > +
> > +	id = devm_kzalloc(dev, sizeof(*id), GFP_KERNEL);
> > +	if (!id)
> > +		return -ENOMEM;
> > +
> > +	id->cxlds = cxlds;
> > +	id->msgnum = msgnum;
> > +	id->status = log_type_to_status(log_type);
> > +
> > +	irq = pci_request_irq(pdev, id->msgnum, cxl_event_int_handler,
> > +			      cxl_event_int_thread, id,
> > +			      "%s:event-log-%s", dev_name(dev),
> > +			      cxl_event_log_type_str(log_type));
> > +	if (irq)
> > +		return irq;
> > +
> > +	devm_add_action_or_reset(dev, cxl_free_event_irq, id);
> 
> Hmm. no pcim_request_irq()  maybe this is the time to propose one
> (separate from this patch so we don't get delayed by that!)

Perhaps.  But not tonight...  ;-)

> 
> We discussed this way back in DOE series (I'd forgotten but lore found
> it for me).  There I suggested just calling
> devm_request_threaded_irq() directly as a work around.

Yea that works fine.  One issue is we lose the format printing of the irq name:

...
 29:  ...  PCI-MSI 100663300-edge      0000:c0:00.0:event-log-Fatal
 30:  ...  PCI-MSI 100663301-edge      0000:c0:00.0:event-log-Failure
 31:  ...  PCI-MSI 100663302-edge      0000:c0:00.0:event-log-Warning
 32:  ...  PCI-MSI 100663303-edge      0000:c0:00.0:event-log-Informational
...

Thanks,
Ira

> 
> > +	return 0;
> > +}
> > +
> > +static void cxl_event_irqsetup(struct cxl_dev_state *cxlds)
> > +{
> > +	struct device *dev = cxlds->dev;
> > +	u8 setting;
> > +
> > +	if (cxl_event_config_msgnums(cxlds))
> > +		return;
> > +
> > +	/*
> > +	 * Dynamic Capacity shares the info message number
> > +	 * Nothing to be done except check the status bit in the
> > +	 * irq thread.
> > +	 */
> > +	setting = cxlds->evt_int_policy.info_settings;
> > +	if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_INFO, setting))
> > +		dev_err(dev, "Failed to get interrupt for %s event log\n",
> > +			cxl_event_log_type_str(CXL_EVENT_TYPE_INFO));
> > +
> > +	setting = cxlds->evt_int_policy.warn_settings;
> > +	if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_WARN, setting))
> > +		dev_err(dev, "Failed to get interrupt for %s event log\n",
> > +			cxl_event_log_type_str(CXL_EVENT_TYPE_WARN));
> > +
> > +	setting = cxlds->evt_int_policy.failure_settings;
> > +	if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_FAIL, setting))
> > +		dev_err(dev, "Failed to get interrupt for %s event log\n",
> > +			cxl_event_log_type_str(CXL_EVENT_TYPE_FAIL));
> > +
> > +	setting = cxlds->evt_int_policy.fatal_settings;
> > +	if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_FATAL, setting))
> > +		dev_err(dev, "Failed to get interrupt for %s event log\n",
> > +			cxl_event_log_type_str(CXL_EVENT_TYPE_FATAL));
> > +}
> 


* Re: [PATCH 07/11] cxl/mem: Trace Memory Module Event Record
  2022-11-17 11:22       ` Jonathan Cameron
@ 2022-11-30  9:30         ` Ira Weiny
  0 siblings, 0 replies; 50+ messages in thread
From: Ira Weiny @ 2022-11-30  9:30 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Dan Williams, Alison Schofield, Vishal Verma, Ben Widawsky,
	Steven Rostedt, Davidlohr Bueso, linux-kernel, linux-cxl

On Thu, Nov 17, 2022 at 11:22:35AM +0000, Jonathan Cameron wrote:
> On Wed, 16 Nov 2022 17:23:58 -0800
> Ira Weiny <ira.weiny@intel.com> wrote:
> 
> > On Wed, Nov 16, 2022 at 03:35:28PM +0000, Jonathan Cameron wrote:
> > > On Thu, 10 Nov 2022 10:57:54 -0800
> > > ira.weiny@intel.com wrote:
> > >   
> > > > From: Ira Weiny <ira.weiny@intel.com>
> > > > 
> > > > CXL rev 3.0 section 8.2.9.2.1.3 defines the Memory Module Event Record.
> > > > 
> > > > Determine if the event read is memory module record and if so trace the
> > > > record.
> > > > 
> > > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > > >   
> > > Noticed that we have a mixture of fully capitalized and not for flags.
> > > With that either explained or tidied up:
> > > 
> > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > >   
> > > > +/*
> > > > + * Device Health Information - DHI
> > > > + *
> > > > + * CXL res 3.0 section 8.2.9.8.3.1; Table 8-100
> > > > + */
> > > > +#define CXL_DHI_HS_MAINTENANCE_NEEDED				BIT(0)
> > > > +#define CXL_DHI_HS_PERFORMANCE_DEGRADED				BIT(1)
> > > > +#define CXL_DHI_HS_HW_REPLACEMENT_NEEDED			BIT(2)
> > > > +#define show_health_status_flags(flags)	__print_flags(flags, "|",	   \
> > > > +	{ CXL_DHI_HS_MAINTENANCE_NEEDED,	"Maintenance Needed"	}, \
> > > > +	{ CXL_DHI_HS_PERFORMANCE_DEGRADED,	"Performance Degraded"	}, \
> > > > +	{ CXL_DHI_HS_HW_REPLACEMENT_NEEDED,	"Replacement Needed"	}  \  
> > > 
> > > Why are we sometime using capitals for flags (e.g patch 5) and not other times?  
> > 
> > Not sure what you mean.  Do you mean this from patch 5?
> Nope
> 
> +#define CXL_DPA_VOLATILE			BIT(0)
> +#define CXL_DPA_NOT_REPAIRABLE			BIT(1)
> +#define show_dpa_flags(flags)	__print_flags(flags, "|",		   \
> +	{ CXL_DPA_VOLATILE,			"VOLATILE"		}, \
> +	{ CXL_DPA_NOT_REPAIRABLE,		"NOT_REPAIRABLE"	}  \
> +)
> +
> 
> Where they are all capitals.  I thought that was maybe a flags vs other fields
> thing but it doesn't seem to be.

I've now made all flags capital on this and the other patches.

Ira

> 
> 
> > 
> > ...
> >         { CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT,         "Uncorrectable Event"   }, \
> >         { CXL_GMER_EVT_DESC_THRESHOLD_EVENT,            "Threshold event"       }, \
> >         { CXL_GMER_EVT_DESC_POISON_LIST_OVERFLOW,       "Poison List Overflow"  }  \
> > ...
> > 
> > Threshold event was a mistake.  This is the capitalization the spec uses.
> > 
> > Bit[0]: Uncorrectable Event: When set, indicates the reported event is
> >         ^^^^^^^^^^^^^^^^^^^
> > uncorrectable by the device. When cleared, indicates the reported
> > event was corrected by the device.
> > 
> > Bit[1]: Threshold Event: When set, the event is the result of a
> >         ^^^^^^^^^^^^^^^
> > threshold on the device having been reached. When cleared, the event
> > is not the result of a threshold limit.
> > 
> > Bit[2]: Poison List Overflow Event: When set, the Poison List has
> >         ^^^^^^^^^^^^^^^^^^^^^^^^^^
> > overflowed, and this event is not in the Poison List. When cleared, the
> > Poison List has not overflowed.
> > 
> > 
> > I'll update this 'Event' in patch 5.  Probably need to add 'Event' to the
> > Poison List...
> > 
> > Ira
> 


* Re: [PATCH 02/11] cxl/mem: Implement Get Event Records command
  2022-11-30  5:09                 ` Ira Weiny
@ 2022-11-30 14:05                   ` Jonathan Cameron
  0 siblings, 0 replies; 50+ messages in thread
From: Jonathan Cameron @ 2022-11-30 14:05 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Dan Williams, Steven Rostedt, Alison Schofield, Vishal Verma,
	Ben Widawsky, Davidlohr Bueso, linux-kernel, linux-cxl

On Tue, 29 Nov 2022 21:09:58 -0800
Ira Weiny <ira.weiny@intel.com> wrote:

> On Tue, Nov 29, 2022 at 12:26:20PM +0000, Jonathan Cameron wrote:
> > On Mon, 28 Nov 2022 15:30:12 -0800
> > Ira Weiny <ira.weiny@intel.com> wrote:
> >   
> 
> [snip]
> 
> > > > A valid reading of that temporal order comment is actually the other way around
> > > > that the device must not reset it's idea of temporal order until all records
> > > > have been read (reading 3 twice is not in temporal order - imagine we had
> > > > read 5 each time and it becomes more obvious as the read order becomes
> > > > 0,1,2,3,4,3,4,5,6,7 etc which is clearly not in temporal order by any normal
> > > > reading of the term.    
> > > 
> > > Well I guess.  My reading was that it must return the first element temporally
> > > within the list at the time of the Get operation.
> > > 
> > > So in this example since 3 is still in the list it must return it first.  Each
> > > read is considered atomic from the others.  Yes as long as 0 is in the queue it
> > > will be returned.
> > > 
> > > But I can see it your way too...  
> > 
> > That pesky text under More Event Records flag doesn't mention clearing when it
> > says "The host should continue to retrieve 
> > records using this command, until this indicator is no longer set by the 
> > device."
> > 
> > I wish it did :(
> >   
> 
> As I have reviewed these in my head again, I have come to the conclusion that
> the More Event Records flag is useless.  Let me explain:
> 
> The Clear all Records flag is useless because an event which occurs between the
> Get and Clear all operations will be dropped without the host ever having seen it.

It can still be used to get a known clean sheet if you don't care about a bunch
of records at initial boot, because there's no data in flight yet, etc.
Agreed it is of no use if you care about the content of the records.

Make sure interrupts are enabled before re-checking if there are new records
to close that race.

> 
> However, while clearing records based on the handles read, additional events
> could come in.  Because of the way the interrupts are specified the host can't
> be sure that those new events will cause a zero to non-zero transition.  This
> is because there is no way to guarantee all the events were cleared at the
> moment the events came in.
> 
> I believe this is what you mentioned in another email about needing an 'extra
> read' at the end to ensure there was nothing more to be read.  But based on
> that logic the only thing that matters is the Get Event.Record
> Count.  If it is not 0 keep on reading because while the host is clearing the
> records another event could come in.
> 
> In other words, the only way to be sure that all records are seen is to do a
> Get and see the number of records equal to 0.  Thus any further events will
> trigger an interrupt and we can safely exit the loop.

Agreed - a standard race to close whenever we have a FIFO with edge interrupts
on how full it is.

The More Event Records flag is useful for a different potential pattern of
non-destructive read and later clear, or for a debug non-destructive read.


	int nr_rec;
	<list>

round_we_go:
	do {
		... <for each record trace and add to list...> ...	
		... 
	} while (!MORE);

	for_each_list_entry() {
		clear records one at a time.
	}

	nr_rec = le16_to_cpu(payload->record_count);
	if (nr_rec)
		goto round_we_go;

	...

> 
> Ira
> 
> Basically the loop looks like:
> 
> 	int nr_rec;
> 
> 	do {
> 		... <Get Events> ...
> 
> 		nr_rec = le16_to_cpu(payload->record_count);
> 
> 		... <for each record trace> ...
> 		... <for each record clear> ...
> 
> 	} while (nr_rec);
> 



end of thread, other threads:[~2022-11-30 14:05 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-10 18:57 [PATCH 00/11] CXL: Process event logs ira.weiny
2022-11-10 18:57 ` [PATCH 01/11] cxl/pci: Add generic MSI-X/MSI irq support ira.weiny
2022-11-15 21:41   ` Dave Jiang
2022-11-16 14:53   ` Jonathan Cameron
2022-11-16 23:48     ` Ira Weiny
2022-11-17 11:20       ` Jonathan Cameron
2022-11-10 18:57 ` [PATCH 02/11] cxl/mem: Implement Get Event Records command ira.weiny
2022-11-15 21:54   ` Dave Jiang
2022-11-16 15:19   ` Jonathan Cameron
2022-11-17  0:47     ` Ira Weiny
2022-11-17 10:43       ` Jonathan Cameron
2022-11-18 23:26         ` Ira Weiny
2022-11-21 10:47           ` Jonathan Cameron
2022-11-28 23:30             ` Ira Weiny
2022-11-29 12:26               ` Jonathan Cameron
2022-11-30  5:09                 ` Ira Weiny
2022-11-30 14:05                   ` Jonathan Cameron
2022-11-10 18:57 ` [PATCH 03/11] cxl/mem: Implement Clear " ira.weiny
2022-11-15 22:09   ` Dave Jiang
2022-11-16 15:24   ` Jonathan Cameron
2022-11-16 15:45     ` Jonathan Cameron
2022-11-17  1:12       ` Ira Weiny
2022-11-17  1:07     ` Ira Weiny
2022-11-10 18:57 ` [PATCH 04/11] cxl/mem: Clear events on driver load ira.weiny
2022-11-15 22:10   ` Dave Jiang
2022-11-10 18:57 ` [PATCH 05/11] cxl/mem: Trace General Media Event Record ira.weiny
2022-11-15 22:25   ` Dave Jiang
2022-11-16 15:31   ` Jonathan Cameron
2022-11-17  1:18     ` Ira Weiny
2022-11-10 18:57 ` [PATCH 06/11] cxl/mem: Trace DRAM " ira.weiny
2022-11-15 22:26   ` Dave Jiang
2022-11-10 18:57 ` [PATCH 07/11] cxl/mem: Trace Memory Module " ira.weiny
2022-11-15 22:39   ` Dave Jiang
2022-11-16 15:35   ` Jonathan Cameron
2022-11-17  1:23     ` Ira Weiny
2022-11-17 11:22       ` Jonathan Cameron
2022-11-30  9:30         ` Ira Weiny
2022-11-22 22:36   ` Steven Rostedt
2022-11-10 18:57 ` [PATCH 08/11] cxl/mem: Wire up event interrupts ira.weiny
2022-11-15 23:13   ` Dave Jiang
2022-11-17  1:38     ` Ira Weiny
2022-11-16 14:40   ` Jonathan Cameron
2022-11-30  9:11     ` Ira Weiny
2022-11-10 18:57 ` [PATCH 09/11] cxl/test: Add generic mock events ira.weiny
2022-11-16 16:00   ` Jonathan Cameron
2022-11-29 18:29     ` Ira Weiny
2022-11-10 18:57 ` [PATCH 10/11] cxl/test: Add specific events ira.weiny
2022-11-16 16:08   ` Jonathan Cameron
2022-11-10 18:57 ` [PATCH 11/11] cxl/test: Simulate event log overflow ira.weiny
2022-11-16 16:10   ` Jonathan Cameron
