* [RFC PATCH 0/6] QEMU CXL Provide mock CXL events and irq support
@ 2022-10-10 22:29 ira.weiny
  2022-10-10 22:29 ` [RFC PATCH 1/6] qemu/bswap: Add const_le64() ira.weiny
                   ` (7 more replies)
  0 siblings, 8 replies; 35+ messages in thread
From: ira.weiny @ 2022-10-10 22:29 UTC (permalink / raw)
  To: Michael Tsirkin, Ben Widawsky, Jonathan Cameron
  Cc: Ira Weiny, qemu-devel, linux-cxl

From: Ira Weiny <ira.weiny@intel.com>

CXL event records inform the OS of various CXL device events.  CXL memory
devices in QEMU are emulated, so no events occur on them naturally.

Add mock events and an HMP trigger mechanism to facilitate guest OS testing of
event support.

This support requires a follow-on version of the event patch set.  The RFC was
submitted and discussed here:

	https://lore.kernel.org/linux-cxl/20220813053243.757363-1-ira.weiny@intel.com/

I'll post the lore link to the new version shortly.

Instructions for running this test:

Add the QMP option to QEMU:

	<host> $ qemu-system-x86_64 ... -qmp unix:/tmp/run_qemu_qmp_0,server,nowait ...

	OR

	<host> $ run_qemu.sh ... --qmp ...

Enable tracing of events within the guest:

	<guest> $ echo "" > /sys/kernel/tracing/trace
	<guest> $ echo 1 > /sys/kernel/tracing/events/cxl/enable
	<guest> $ echo 1 > /sys/kernel/tracing/tracing_on

Trigger event generation and interrupts from the host:

	<host> $ echo "cxl_event_inject cxl-devX" | qmp-shell -H /tmp/run_qemu_qmp_0

	Where X == one of the memory devices; cxl-dev0 should work.

View events on the guest:

	<guest> $ cat /sys/kernel/tracing/trace


Ira Weiny (6):
  qemu/bswap: Add const_le64()
  qemu/uuid: Add UUID static initializer
  hw/cxl/cxl-events: Add CXL mock events
  hw/cxl/mailbox: Wire up get/clear event mailbox commands
  hw/cxl/cxl-events: Add event interrupt support
  hw/cxl/mailbox: Wire up Get/Set Event Interrupt policy

 hmp-commands.hx             |  14 ++
 hw/cxl/cxl-device-utils.c   |   1 +
 hw/cxl/cxl-events.c         | 330 ++++++++++++++++++++++++++++++++++++
 hw/cxl/cxl-host-stubs.c     |   5 +
 hw/cxl/cxl-mailbox-utils.c  | 224 +++++++++++++++++++++---
 hw/cxl/meson.build          |   1 +
 hw/mem/cxl_type3.c          |   7 +-
 include/hw/cxl/cxl_device.h |  22 +++
 include/hw/cxl/cxl_events.h | 194 +++++++++++++++++++++
 include/qemu/bswap.h        |  10 ++
 include/qemu/uuid.h         |  12 ++
 include/sysemu/sysemu.h     |   3 +
 12 files changed, 802 insertions(+), 21 deletions(-)
 create mode 100644 hw/cxl/cxl-events.c
 create mode 100644 include/hw/cxl/cxl_events.h


base-commit: 6f7f81898e4437ea544ee4ca24bef7ec543b1f06
-- 
2.37.2



* [RFC PATCH 1/6] qemu/bswap: Add const_le64()
  2022-10-10 22:29 [RFC PATCH 0/6] QEMU CXL Provide mock CXL events and irq support ira.weiny
@ 2022-10-10 22:29 ` ira.weiny
  2022-10-11  9:03     ` Jonathan Cameron via
  2022-10-11  9:48   ` Peter Maydell
  2022-10-10 22:29 ` [RFC PATCH 2/6] qemu/uuid: Add UUID static initializer ira.weiny
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 35+ messages in thread
From: ira.weiny @ 2022-10-10 22:29 UTC (permalink / raw)
  To: Michael Tsirkin, Ben Widawsky, Jonathan Cameron
  Cc: Ira Weiny, qemu-devel, linux-cxl

From: Ira Weiny <ira.weiny@intel.com>

GCC requires constant-expression versions of the cpu_to_le* calls for use
in static initializers.

Add a 64-bit version.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
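For illustration only (not part of this patch), a minimal sketch of why a
constant-expression swap is needed.  All SKETCH_/sketch_ names are
hypothetical; the macro mirrors the big-endian-host branch of const_le64()
and uses ULL suffixes so that every mask and shift is 64 bits wide.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for const_le64() on a big-endian host: a pure
 * constant expression, so it can appear in a static initializer.
 */
#define SKETCH_CONST_SWAP64(x)                        \
    ((((x) & 0x00000000000000ffULL) << 56) |          \
     (((x) & 0x000000000000ff00ULL) << 40) |          \
     (((x) & 0x0000000000ff0000ULL) << 24) |          \
     (((x) & 0x00000000ff000000ULL) <<  8) |          \
     (((x) & 0x000000ff00000000ULL) >>  8) |          \
     (((x) & 0x0000ff0000000000ULL) >> 24) |          \
     (((x) & 0x00ff000000000000ULL) >> 40) |          \
     (((x) & 0xff00000000000000ULL) >> 56))

/* Illustrative record type; not a QEMU structure. */
struct sketch_record {
    uint64_t phys_addr;
};

/* A function call is not a constant expression here; the macro is. */
static const struct sketch_record sketch_rec = {
    .phys_addr = SKETCH_CONST_SWAP64(0x2000),
};

/* Runtime reference implementation, used only to cross-check the macro. */
static uint64_t sketch_swap64_runtime(uint64_t v)
{
    uint64_t out = 0;

    for (int i = 0; i < 8; i++) {
        out = (out << 8) | (v & 0xff);
        v >>= 8;
    }
    return out;
}
```

On a little-endian host const_le64() is the identity, so only the
big-endian branch needs the explicit byte shuffling sketched above.
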
 include/qemu/bswap.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/qemu/bswap.h b/include/qemu/bswap.h
index 346d05f2aab3..08e607821102 100644
--- a/include/qemu/bswap.h
+++ b/include/qemu/bswap.h
@@ -192,10 +192,20 @@ CPU_CONVERT(le, 64, uint64_t)
      (((_x) & 0x0000ff00U) <<  8) |              \
      (((_x) & 0x00ff0000U) >>  8) |              \
      (((_x) & 0xff000000U) >> 24))
+# define const_le64(_x)                          \
+    ((((_x) & 0x00000000000000ffULL) << 56) |    \
+     (((_x) & 0x000000000000ff00ULL) << 40) |    \
+     (((_x) & 0x0000000000ff0000ULL) << 24) |    \
+     (((_x) & 0x00000000ff000000ULL) <<  8) |    \
+     (((_x) & 0x000000ff00000000ULL) >>  8) |    \
+     (((_x) & 0x0000ff0000000000ULL) >> 24) |    \
+     (((_x) & 0x00ff000000000000ULL) >> 40) |    \
+     (((_x) & 0xff00000000000000ULL) >> 56))
 # define const_le16(_x)                          \
     ((((_x) & 0x00ff) << 8) |                    \
      (((_x) & 0xff00) >> 8))
 #else
+# define const_le64(_x) (_x)
 # define const_le32(_x) (_x)
 # define const_le16(_x) (_x)
 #endif
-- 
2.37.2



* [RFC PATCH 2/6] qemu/uuid: Add UUID static initializer
  2022-10-10 22:29 [RFC PATCH 0/6] QEMU CXL Provide mock CXL events and irq support ira.weiny
  2022-10-10 22:29 ` [RFC PATCH 1/6] qemu/bswap: Add const_le64() ira.weiny
@ 2022-10-10 22:29 ` ira.weiny
  2022-10-11  9:13     ` Jonathan Cameron via
  2022-10-10 22:29 ` [RFC PATCH 3/6] hw/cxl/cxl-events: Add CXL mock events ira.weiny
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 35+ messages in thread
From: ira.weiny @ 2022-10-10 22:29 UTC (permalink / raw)
  To: Michael Tsirkin, Ben Widawsky, Jonathan Cameron
  Cc: Ira Weiny, qemu-devel, linux-cxl

From: Ira Weiny <ira.weiny@intel.com>

UUIDs are defined as fields in network byte order, but no static
initializer was available for UUIDs in this standard big-endian format.

Define a big-endian initializer for UUIDs.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
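For illustration only (not part of this patch), a small sketch of what the
initializer produces.  struct sketch_uuid, SKETCH_UUID_BE and
sketch_uuid_to_string() are hypothetical names; the macro body copies the
byte layout of the UUID() macro added below, and the sample value is the
General Media Event Record UUID used later in this series.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative 16-byte container; QEMU's own type is QemuUUID. */
struct sketch_uuid {
    uint8_t data[16];
};

/* Most significant byte of each multi-byte field is stored first. */
#define SKETCH_UUID_BE(time_low, time_mid, time_hi_and_version,          \
                       clock_seq_hi, clock_seq_low,                      \
                       n0, n1, n2, n3, n4, n5)                           \
    { ((time_low) >> 24) & 0xff, ((time_low) >> 16) & 0xff,              \
      ((time_low) >> 8) & 0xff, (time_low) & 0xff,                       \
      ((time_mid) >> 8) & 0xff, (time_mid) & 0xff,                       \
      ((time_hi_and_version) >> 8) & 0xff, (time_hi_and_version) & 0xff, \
      (clock_seq_hi), (clock_seq_low),                                   \
      (n0), (n1), (n2), (n3), (n4), (n5) }

static const struct sketch_uuid sketch_gen_media_uuid = {
    .data = SKETCH_UUID_BE(0xfbcd0a77, 0xc260, 0x417f,
                           0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6),
};

/* Format into the canonical 8-4-4-4-12 text form; buf needs 37 bytes. */
static void sketch_uuid_to_string(const struct sketch_uuid *u, char buf[37])
{
    const uint8_t *d = u->data;

    snprintf(buf, 37,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
             "%02x%02x%02x%02x%02x%02x",
             d[0], d[1], d[2], d[3], d[4], d[5], d[6], d[7],
             d[8], d[9], d[10], d[11], d[12], d[13], d[14], d[15]);
}
```

Because the bytes are stored in network order, printing them front to back
reproduces the textual UUID without any per-field swapping.
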
 include/qemu/uuid.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/qemu/uuid.h b/include/qemu/uuid.h
index 9925febfa54d..dc40ee1fc998 100644
--- a/include/qemu/uuid.h
+++ b/include/qemu/uuid.h
@@ -61,6 +61,18 @@ typedef struct {
     (clock_seq_hi_and_reserved), (clock_seq_low), (node0), (node1), (node2),\
     (node3), (node4), (node5) }
 
+/* Normal (network byte order) UUID */
+#define UUID(time_low, time_mid, time_hi_and_version,                    \
+  clock_seq_hi_and_reserved, clock_seq_low, node0, node1, node2,         \
+  node3, node4, node5)                                                   \
+  { ((time_low) >> 24) & 0xff, ((time_low) >> 16) & 0xff,                \
+    ((time_low) >> 8) & 0xff, (time_low) & 0xff,                         \
+    ((time_mid) >> 8) & 0xff, (time_mid) & 0xff,                         \
+    ((time_hi_and_version) >> 8) & 0xff, (time_hi_and_version) & 0xff,   \
+    (clock_seq_hi_and_reserved), (clock_seq_low),                        \
+    (node0), (node1), (node2), (node3), (node4), (node5)                 \
+  }
+
 #define UUID_FMT "%02hhx%02hhx%02hhx%02hhx-" \
                  "%02hhx%02hhx-%02hhx%02hhx-" \
                  "%02hhx%02hhx-" \
-- 
2.37.2



* [RFC PATCH 3/6] hw/cxl/cxl-events: Add CXL mock events
  2022-10-10 22:29 [RFC PATCH 0/6] QEMU CXL Provide mock CXL events and irq support ira.weiny
  2022-10-10 22:29 ` [RFC PATCH 1/6] qemu/bswap: Add const_le64() ira.weiny
  2022-10-10 22:29 ` [RFC PATCH 2/6] qemu/uuid: Add UUID static initializer ira.weiny
@ 2022-10-10 22:29 ` ira.weiny
  2022-10-11 10:07     ` Jonathan Cameron via
  2022-12-19 10:07     ` Jonathan Cameron via
  2022-10-10 22:29 ` [RFC PATCH 4/6] hw/cxl/mailbox: Wire up get/clear event mailbox commands ira.weiny
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 35+ messages in thread
From: ira.weiny @ 2022-10-10 22:29 UTC (permalink / raw)
  To: Michael Tsirkin, Ben Widawsky, Jonathan Cameron
  Cc: Ira Weiny, qemu-devel, linux-cxl

From: Ira Weiny <ira.weiny@intel.com>

To facilitate testing of guest software, add mock events and code to
support iterating through the event logs.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
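For illustration only (not part of this patch), a stripped-down model of
the log cursor.  The sketch_ names are hypothetical and records are opaque
pointers; the arithmetic follows log_rec_left(), log_empty() and
log_overflow() below, including the hard-coded limit of five unread
records.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_EVENT_CNT_MAX  15  /* mirrors CXL_TEST_EVENT_CNT_MAX */
#define SKETCH_OVERFLOW_LIMIT 5   /* the constant used by log_overflow() */

/* Stand-in for struct cxl_event_log. */
struct sketch_log {
    int cur_event;   /* next record to hand to the guest */
    int nr_events;   /* number of records stored */
    const void *events[SKETCH_EVENT_CNT_MAX];
};

static bool sketch_log_add(struct sketch_log *log, const void *rec)
{
    if (log->nr_events >= SKETCH_EVENT_CNT_MAX) {
        return false;   /* the patch asserts instead of returning */
    }
    log->events[log->nr_events++] = rec;
    return true;
}

static int sketch_log_left(const struct sketch_log *log)
{
    return log->nr_events - log->cur_event;
}

static bool sketch_log_empty(const struct sketch_log *log)
{
    return log->cur_event == log->nr_events;
}

/* Unread records beyond the first five are reported as overflow. */
static int sketch_log_overflow(const struct sketch_log *log)
{
    int cnt = sketch_log_left(log) - SKETCH_OVERFLOW_LIMIT;

    return cnt < 0 ? 0 : cnt;
}
```

With the seven records this patch adds to the Failure log, the model
reports an overflow count of two until the guest has consumed two records.
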
 hw/cxl/cxl-events.c         | 248 ++++++++++++++++++++++++++++++++++++
 hw/cxl/meson.build          |   1 +
 include/hw/cxl/cxl_device.h |  19 +++
 include/hw/cxl/cxl_events.h | 173 +++++++++++++++++++++++++
 4 files changed, 441 insertions(+)
 create mode 100644 hw/cxl/cxl-events.c
 create mode 100644 include/hw/cxl/cxl_events.h

diff --git a/hw/cxl/cxl-events.c b/hw/cxl/cxl-events.c
new file mode 100644
index 000000000000..c275280bcb64
--- /dev/null
+++ b/hw/cxl/cxl-events.c
@@ -0,0 +1,248 @@
+/*
+ * CXL Event processing
+ *
+ * Copyright(C) 2022 Intel Corporation.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * COPYING file in the top-level directory.
+ */
+
+#include <stdint.h>
+
+#include "qemu/osdep.h"
+#include "qemu/bswap.h"
+#include "qemu/typedefs.h"
+#include "hw/cxl/cxl.h"
+#include "hw/cxl/cxl_events.h"
+
+struct cxl_event_log *find_event_log(CXLDeviceState *cxlds, int log_type)
+{
+    if (log_type >= CXL_EVENT_TYPE_MAX) {
+        return NULL;
+    }
+    return &cxlds->event_logs[log_type];
+}
+
+struct cxl_event_record_raw *get_cur_event(struct cxl_event_log *log)
+{
+    return log->events[log->cur_event];
+}
+
+uint16_t get_cur_event_handle(struct cxl_event_log *log)
+{
+    return cpu_to_le16(log->cur_event);
+}
+
+bool log_empty(struct cxl_event_log *log)
+{
+    return log->cur_event == log->nr_events;
+}
+
+int log_rec_left(struct cxl_event_log *log)
+{
+    return log->nr_events - log->cur_event;
+}
+
+static void event_store_add_event(CXLDeviceState *cxlds,
+                                  enum cxl_event_log_type log_type,
+                                  struct cxl_event_record_raw *event)
+{
+    struct cxl_event_log *log;
+
+    assert(log_type < CXL_EVENT_TYPE_MAX);
+
+    log = &cxlds->event_logs[log_type];
+    assert(log->nr_events < CXL_TEST_EVENT_CNT_MAX);
+
+    log->events[log->nr_events] = event;
+    log->nr_events++;
+}
+
+uint16_t log_overflow(struct cxl_event_log *log)
+{
+    int cnt = log_rec_left(log) - 5;
+
+    if (cnt < 0) {
+        return 0;
+    }
+    return cnt;
+}
+
+#define CXL_EVENT_RECORD_FLAG_PERMANENT         BIT(2)
+#define CXL_EVENT_RECORD_FLAG_MAINT_NEEDED      BIT(3)
+#define CXL_EVENT_RECORD_FLAG_PERF_DEGRADED     BIT(4)
+#define CXL_EVENT_RECORD_FLAG_HW_REPLACE        BIT(5)
+
+struct cxl_event_record_raw maint_needed = {
+    .hdr = {
+        .id.data = UUID(0xDEADBEEF, 0xCAFE, 0xBABE,
+                        0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
+        .length = sizeof(struct cxl_event_record_raw),
+        .flags[0] = CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,
+        /* .handle = Set dynamically */
+        .related_handle = const_le16(0xa5b6),
+    },
+    .data = { 0xDE, 0xAD, 0xBE, 0xEF },
+};
+
+struct cxl_event_record_raw hardware_replace = {
+    .hdr = {
+        .id.data = UUID(0xBABECAFE, 0xBEEF, 0xDEAD,
+                        0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
+        .length = sizeof(struct cxl_event_record_raw),
+        .flags[0] = CXL_EVENT_RECORD_FLAG_HW_REPLACE,
+        /* .handle = Set dynamically */
+        .related_handle = const_le16(0xb6a5),
+    },
+    .data = { 0xDE, 0xAD, 0xBE, 0xEF },
+};
+
+#define CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT            BIT(0)
+#define CXL_GMER_EVT_DESC_THRESHOLD_EVENT               BIT(1)
+#define CXL_GMER_EVT_DESC_POISON_LIST_OVERFLOW          BIT(2)
+
+#define CXL_GMER_MEM_EVT_TYPE_ECC_ERROR                 0x00
+#define CXL_GMER_MEM_EVT_TYPE_INV_ADDR                  0x01
+#define CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR           0x02
+
+#define CXL_GMER_TRANS_UNKNOWN                          0x00
+#define CXL_GMER_TRANS_HOST_READ                        0x01
+#define CXL_GMER_TRANS_HOST_WRITE                       0x02
+#define CXL_GMER_TRANS_HOST_SCAN_MEDIA                  0x03
+#define CXL_GMER_TRANS_HOST_INJECT_POISON               0x04
+#define CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB             0x05
+#define CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT        0x06
+
+#define CXL_GMER_VALID_CHANNEL                          BIT(0)
+#define CXL_GMER_VALID_RANK                             BIT(1)
+#define CXL_GMER_VALID_DEVICE                           BIT(2)
+#define CXL_GMER_VALID_COMPONENT                        BIT(3)
+
+struct cxl_event_gen_media gen_media = {
+    .hdr = {
+        .id.data = UUID(0xfbcd0a77, 0xc260, 0x417f,
+                        0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6),
+        .length = sizeof(struct cxl_event_gen_media),
+        .flags[0] = CXL_EVENT_RECORD_FLAG_PERMANENT,
+        /* .handle = Set dynamically */
+        .related_handle = const_le16(0),
+    },
+    .phys_addr = const_le64(0x2000),
+    .descriptor = CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT,
+    .type = CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,
+    .transaction_type = CXL_GMER_TRANS_HOST_WRITE,
+    .validity_flags = { CXL_GMER_VALID_CHANNEL |
+                        CXL_GMER_VALID_RANK, 0 },
+    .channel = 1,
+    .rank = 30
+};
+
+#define CXL_DER_VALID_CHANNEL                           BIT(0)
+#define CXL_DER_VALID_RANK                              BIT(1)
+#define CXL_DER_VALID_NIBBLE                            BIT(2)
+#define CXL_DER_VALID_BANK_GROUP                        BIT(3)
+#define CXL_DER_VALID_BANK                              BIT(4)
+#define CXL_DER_VALID_ROW                               BIT(5)
+#define CXL_DER_VALID_COLUMN                            BIT(6)
+#define CXL_DER_VALID_CORRECTION_MASK                   BIT(7)
+
+struct cxl_event_dram dram = {
+    .hdr = {
+        .id.data = UUID(0x601dcbb3, 0x9c06, 0x4eab,
+                        0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24),
+        .length = sizeof(struct cxl_event_dram),
+        .flags[0] = CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,
+        /* .handle = Set dynamically */
+        .related_handle = const_le16(0),
+    },
+    .phys_addr = const_le64(0x8000),
+    .descriptor = CXL_GMER_EVT_DESC_THRESHOLD_EVENT,
+    .type = CXL_GMER_MEM_EVT_TYPE_INV_ADDR,
+    .transaction_type = CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB,
+    .validity_flags = { CXL_DER_VALID_CHANNEL |
+                        CXL_DER_VALID_BANK_GROUP |
+                        CXL_DER_VALID_BANK |
+                        CXL_DER_VALID_COLUMN, 0 },
+    .channel = 1,
+    .bank_group = 5,
+    .bank = 2,
+    .column = { 0xDE, 0xAD},
+};
+
+#define CXL_MMER_HEALTH_STATUS_CHANGE           0x00
+#define CXL_MMER_MEDIA_STATUS_CHANGE            0x01
+#define CXL_MMER_LIFE_USED_CHANGE               0x02
+#define CXL_MMER_TEMP_CHANGE                    0x03
+#define CXL_MMER_DATA_PATH_ERROR                0x04
+#define CXL_MMER_LAS_ERROR                      0x05
+
+#define CXL_DHI_HS_MAINTENANCE_NEEDED           BIT(0)
+#define CXL_DHI_HS_PERFORMANCE_DEGRADED         BIT(1)
+#define CXL_DHI_HS_HW_REPLACEMENT_NEEDED        BIT(2)
+
+#define CXL_DHI_MS_NORMAL                                    0x00
+#define CXL_DHI_MS_NOT_READY                                 0x01
+#define CXL_DHI_MS_WRITE_PERSISTENCY_LOST                    0x02
+#define CXL_DHI_MS_ALL_DATA_LOST                             0x03
+#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_POWER_LOSS   0x04
+#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_SHUTDOWN     0x05
+#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_IMMINENT           0x06
+#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_POWER_LOSS      0x07
+#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_SHUTDOWN        0x08
+#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_IMMINENT              0x09
+
+#define CXL_DHI_AS_NORMAL               0x0
+#define CXL_DHI_AS_WARNING              0x1
+#define CXL_DHI_AS_CRITICAL             0x2
+
+#define CXL_DHI_AS_LIFE_USED(as)        (as & 0x3)
+#define CXL_DHI_AS_DEV_TEMP(as)         ((as & 0xC) >> 2)
+#define CXL_DHI_AS_COR_VOL_ERR_CNT(as)  ((as & 0x10) >> 4)
+#define CXL_DHI_AS_COR_PER_ERR_CNT(as)  ((as & 0x20) >> 5)
+
+struct cxl_event_mem_module mem_module = {
+    .hdr = {
+        .id.data = UUID(0xfe927475, 0xdd59, 0x4339,
+                        0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74),
+        .length = sizeof(struct cxl_event_mem_module),
+        /* .handle = Set dynamically */
+        .related_handle = const_le16(0),
+    },
+    .event_type = CXL_MMER_TEMP_CHANGE,
+    .info = {
+        .health_status = CXL_DHI_HS_PERFORMANCE_DEGRADED,
+        .media_status = CXL_DHI_MS_ALL_DATA_LOST,
+        .add_status = (CXL_DHI_AS_CRITICAL << 2) |
+                       (CXL_DHI_AS_WARNING << 4) |
+                       (CXL_DHI_AS_WARNING << 5),
+        .device_temp = { 0xDE, 0xAD},
+        .dirty_shutdown_cnt = { 0xde, 0xad, 0xbe, 0xef },
+        .cor_vol_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
+        .cor_per_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
+    }
+};
+
+void cxl_mock_add_event_logs(CXLDeviceState *cxlds)
+{
+    event_store_add_event(cxlds, CXL_EVENT_TYPE_INFO, &maint_needed);
+    event_store_add_event(cxlds, CXL_EVENT_TYPE_INFO,
+                          (struct cxl_event_record_raw *)&gen_media);
+    event_store_add_event(cxlds, CXL_EVENT_TYPE_INFO,
+                          (struct cxl_event_record_raw *)&mem_module);
+
+    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL, &maint_needed);
+    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL, &hardware_replace);
+    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL,
+                          (struct cxl_event_record_raw *)&dram);
+    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL,
+                          (struct cxl_event_record_raw *)&gen_media);
+    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL,
+                          (struct cxl_event_record_raw *)&mem_module);
+    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL, &hardware_replace);
+    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL,
+                          (struct cxl_event_record_raw *)&dram);
+
+    event_store_add_event(cxlds, CXL_EVENT_TYPE_FATAL, &hardware_replace);
+    event_store_add_event(cxlds, CXL_EVENT_TYPE_FATAL,
+                          (struct cxl_event_record_raw *)&dram);
+}
diff --git a/hw/cxl/meson.build b/hw/cxl/meson.build
index cfa95ffd40b7..71059972d435 100644
--- a/hw/cxl/meson.build
+++ b/hw/cxl/meson.build
@@ -5,6 +5,7 @@ softmmu_ss.add(when: 'CONFIG_CXL',
                    'cxl-mailbox-utils.c',
                    'cxl-host.c',
                    'cxl-cdat.c',
+                   'cxl-events.c',
                ),
                if_false: files(
                    'cxl-host-stubs.c',
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 7b4cff569347..46c50c1c13a6 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -11,6 +11,7 @@
 #define CXL_DEVICE_H
 
 #include "hw/register.h"
+#include "hw/cxl/cxl_events.h"
 
 /*
  * The following is how a CXL device's Memory Device registers are laid out.
@@ -80,6 +81,14 @@
     (CXL_DEVICE_CAP_REG_SIZE + CXL_DEVICE_STATUS_REGISTERS_LENGTH +     \
      CXL_MAILBOX_REGISTERS_LENGTH + CXL_MEMORY_DEVICE_REGISTERS_LENGTH)
 
+#define CXL_TEST_EVENT_CNT_MAX 15
+
+struct cxl_event_log {
+    int cur_event;
+    int nr_events;
+    struct cxl_event_record_raw *events[CXL_TEST_EVENT_CNT_MAX];
+};
+
 typedef struct cxl_device_state {
     MemoryRegion device_registers;
 
@@ -119,6 +128,8 @@ typedef struct cxl_device_state {
 
     /* memory region for persistent memory, HDM */
     uint64_t pmem_size;
+
+    struct cxl_event_log event_logs[CXL_EVENT_TYPE_MAX];
 } CXLDeviceState;
 
 /* Initialize the register block for a device */
@@ -272,4 +283,12 @@ MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, uint64_t *data,
 MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
                             unsigned size, MemTxAttrs attrs);
 
+void cxl_mock_add_event_logs(CXLDeviceState *cxlds);
+struct cxl_event_log *find_event_log(CXLDeviceState *cxlds, int log_type);
+struct cxl_event_record_raw *get_cur_event(struct cxl_event_log *log);
+uint16_t get_cur_event_handle(struct cxl_event_log *log);
+bool log_empty(struct cxl_event_log *log);
+int log_rec_left(struct cxl_event_log *log);
+uint16_t log_overflow(struct cxl_event_log *log);
+
 #endif
diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
new file mode 100644
index 000000000000..255111f3dcfb
--- /dev/null
+++ b/include/hw/cxl/cxl_events.h
@@ -0,0 +1,173 @@
+/*
+ * QEMU CXL Events
+ *
+ * Copyright (c) 2022 Intel
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * COPYING file in the top-level directory.
+ */
+
+#ifndef CXL_EVENTS_H
+#define CXL_EVENTS_H
+
+#include "qemu/uuid.h"
+#include "hw/cxl/cxl.h"
+
+/*
+ * Common Event Record Format
+ * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
+ */
+#define CXL_EVENT_REC_HDR_RES_LEN 0xf
+struct cxl_event_record_hdr {
+    QemuUUID id;
+    uint8_t length;
+    uint8_t flags[3];
+    uint16_t handle;
+    uint16_t related_handle;
+    uint64_t timestamp;
+    uint8_t maint_op_class;
+    uint8_t reserved[CXL_EVENT_REC_HDR_RES_LEN];
+} QEMU_PACKED;
+
+#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
+struct cxl_event_record_raw {
+    struct cxl_event_record_hdr hdr;
+    uint8_t data[CXL_EVENT_RECORD_DATA_LENGTH];
+} QEMU_PACKED;
+
+/*
+ * Get Event Records output payload
+ * CXL rev 3.0 section 8.2.9.2.2; Table 8-50
+ *
+ * Space given for 1 record
+ */
+#define CXL_GET_EVENT_FLAG_OVERFLOW     BIT(0)
+#define CXL_GET_EVENT_FLAG_MORE_RECORDS BIT(1)
+struct cxl_get_event_payload {
+    uint8_t flags;
+    uint8_t reserved1;
+    uint16_t overflow_err_count;
+    uint64_t first_overflow_timestamp;
+    uint64_t last_overflow_timestamp;
+    uint16_t record_count;
+    uint8_t reserved2[0xa];
+    struct cxl_event_record_raw record;
+} QEMU_PACKED;
+
+/*
+ * CXL rev 3.0 section 8.2.9.2.2; Table 8-49
+ */
+enum cxl_event_log_type {
+    CXL_EVENT_TYPE_INFO = 0x00,
+    CXL_EVENT_TYPE_WARN,
+    CXL_EVENT_TYPE_FAIL,
+    CXL_EVENT_TYPE_FATAL,
+    CXL_EVENT_TYPE_DYNAMIC_CAP,
+    CXL_EVENT_TYPE_MAX
+};
+
+static inline const char *cxl_event_log_type_str(enum cxl_event_log_type type)
+{
+    switch (type) {
+    case CXL_EVENT_TYPE_INFO:
+        return "Informational";
+    case CXL_EVENT_TYPE_WARN:
+        return "Warning";
+    case CXL_EVENT_TYPE_FAIL:
+        return "Failure";
+    case CXL_EVENT_TYPE_FATAL:
+        return "Fatal";
+    case CXL_EVENT_TYPE_DYNAMIC_CAP:
+        return "Dynamic Capacity";
+    default:
+        break;
+    }
+    return "<unknown>";
+}
+
+/*
+ * Clear Event Records input payload
+ * CXL rev 3.0 section 8.2.9.2.3; Table 8-51
+ *
+ * Space given for 1 record
+ */
+struct cxl_mbox_clear_event_payload {
+    uint8_t event_log;      /* enum cxl_event_log_type */
+    uint8_t clear_flags;
+    uint8_t nr_recs;        /* 1 for this struct */
+    uint8_t reserved[3];
+    uint16_t handle;
+};
+
+/*
+ * General Media Event Record
+ * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
+ */
+#define CXL_EVENT_GEN_MED_COMP_ID_SIZE  0x10
+#define CXL_EVENT_GEN_MED_RES_SIZE      0x2e
+struct cxl_event_gen_media {
+    struct cxl_event_record_hdr hdr;
+    uint64_t phys_addr;
+    uint8_t descriptor;
+    uint8_t type;
+    uint8_t transaction_type;
+    uint8_t validity_flags[2];
+    uint8_t channel;
+    uint8_t rank;
+    uint8_t device[3];
+    uint8_t component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
+    uint8_t reserved[CXL_EVENT_GEN_MED_RES_SIZE];
+} QEMU_PACKED;
+
+/*
+ * DRAM Event Record - DER
+ * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
+ */
+#define CXL_EVENT_DER_CORRECTION_MASK_SIZE   0x20
+#define CXL_EVENT_DER_RES_SIZE               0x17
+struct cxl_event_dram {
+    struct cxl_event_record_hdr hdr;
+    uint64_t phys_addr;
+    uint8_t descriptor;
+    uint8_t type;
+    uint8_t transaction_type;
+    uint8_t validity_flags[2];
+    uint8_t channel;
+    uint8_t rank;
+    uint8_t nibble_mask[3];
+    uint8_t bank_group;
+    uint8_t bank;
+    uint8_t row[3];
+    uint8_t column[2];
+    uint8_t correction_mask[CXL_EVENT_DER_CORRECTION_MASK_SIZE];
+    uint8_t reserved[CXL_EVENT_DER_RES_SIZE];
+} QEMU_PACKED;
+
+/*
+ * Get Health Info Record
+ * CXL rev 3.0 section 8.2.9.8.3.1; Table 8-100
+ */
+struct cxl_get_health_info {
+    uint8_t health_status;
+    uint8_t media_status;
+    uint8_t add_status;
+    uint8_t life_used;
+    uint8_t device_temp[2];
+    uint8_t dirty_shutdown_cnt[4];
+    uint8_t cor_vol_err_cnt[4];
+    uint8_t cor_per_err_cnt[4];
+} QEMU_PACKED;
+
+/*
+ * Memory Module Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
+ */
+#define CXL_EVENT_MEM_MOD_RES_SIZE  0x3d
+struct cxl_event_mem_module {
+    struct cxl_event_record_hdr hdr;
+    uint8_t event_type;
+    struct cxl_get_health_info info;
+    uint8_t reserved[CXL_EVENT_MEM_MOD_RES_SIZE];
+} QEMU_PACKED;
+
+#endif /* CXL_EVENTS_H */
-- 
2.37.2



* [RFC PATCH 4/6] hw/cxl/mailbox: Wire up get/clear event mailbox commands
  2022-10-10 22:29 [RFC PATCH 0/6] QEMU CXL Provide mock CXL events and irq support ira.weiny
                   ` (2 preceding siblings ...)
  2022-10-10 22:29 ` [RFC PATCH 3/6] hw/cxl/cxl-events: Add CXL mock events ira.weiny
@ 2022-10-10 22:29 ` ira.weiny
  2022-10-11 10:26     ` Jonathan Cameron via
  2022-10-10 22:29 ` [RFC PATCH 5/6] hw/cxl/cxl-events: Add event interrupt support ira.weiny
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 35+ messages in thread
From: ira.weiny @ 2022-10-10 22:29 UTC (permalink / raw)
  To: Michael Tsirkin, Ben Widawsky, Jonathan Cameron
  Cc: Ira Weiny, qemu-devel, linux-cxl

From: Ira Weiny <ira.weiny@intel.com>

Replace the stubbed-out CXL Get/Clear Event mailbox commands with
implementations that return the mock event information.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
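For illustration only (not part of this patch), a simplified model of the
get/clear handshake these handlers expect: one record per Get, cleared by
handle before the next Get.  All sketch_ names and the reduced reply
structure are hypothetical, and the empty-log check in
sketch_clear_record() is an addition of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors CXL_GET_EVENT_FLAG_MORE_RECORDS (bit 1). */
#define SKETCH_FLAG_MORE_RECORDS (1u << 1)

struct sketch_cursor {
    int cur_event;
    int nr_events;
};

struct sketch_get_reply {
    uint8_t flags;
    uint16_t record_count;
    uint16_t handle;
};

/* Device side: report the record at the cursor without consuming it. */
static void sketch_get_record(const struct sketch_cursor *log,
                              struct sketch_get_reply *reply)
{
    reply->flags = 0;
    reply->record_count = 0;
    reply->handle = 0;
    if (log->cur_event == log->nr_events) {
        return;
    }
    reply->record_count = 1;
    reply->handle = (uint16_t)log->cur_event;
    if (log->nr_events - log->cur_event > 1) {
        reply->flags |= SKETCH_FLAG_MORE_RECORDS;
    }
}

/* Device side: only the record at the cursor may be cleared. */
static bool sketch_clear_record(struct sketch_cursor *log, uint16_t handle)
{
    if (log->cur_event == log->nr_events || log->cur_event != handle) {
        return false;
    }
    log->cur_event++;
    return true;
}

/* Guest side: read and clear until the device reports no record. */
static int sketch_drain(struct sketch_cursor *log)
{
    struct sketch_get_reply reply;
    int seen = 0;

    for (;;) {
        sketch_get_record(log, &reply);
        if (reply.record_count == 0) {
            break;
        }
        seen++;
        if (!sketch_clear_record(log, reply.handle)) {
            break;
        }
    }
    return seen;
}
```

A guest that clears a handle other than the one it was just given is
rejected, matching the "no clearing from the middle of the log"
restriction in cmd_events_clear_records().
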
 hw/cxl/cxl-device-utils.c  |   1 +
 hw/cxl/cxl-mailbox-utils.c | 103 +++++++++++++++++++++++++++++++++++--
 2 files changed, 101 insertions(+), 3 deletions(-)

diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
index 687759b3017b..4bb41101882e 100644
--- a/hw/cxl/cxl-device-utils.c
+++ b/hw/cxl/cxl-device-utils.c
@@ -262,4 +262,5 @@ void cxl_device_register_init_common(CXLDeviceState *cxl_dstate)
     memdev_reg_init_common(cxl_dstate);
 
     assert(cxl_initialize_mailbox(cxl_dstate) == 0);
+    cxl_mock_add_event_logs(cxl_dstate);
 }
diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index bb66c765a538..df345f23a30c 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -9,6 +9,7 @@
 
 #include "qemu/osdep.h"
 #include "hw/cxl/cxl.h"
+#include "hw/cxl/cxl_events.h"
 #include "hw/pci/pci.h"
 #include "qemu/cutils.h"
 #include "qemu/log.h"
@@ -116,11 +117,107 @@ struct cxl_cmd {
         return CXL_MBOX_SUCCESS;                                          \
     }
 
-DEFINE_MAILBOX_HANDLER_ZEROED(events_get_records, 0x20);
-DEFINE_MAILBOX_HANDLER_NOP(events_clear_records);
 DEFINE_MAILBOX_HANDLER_ZEROED(events_get_interrupt_policy, 4);
 DEFINE_MAILBOX_HANDLER_NOP(events_set_interrupt_policy);
 
+static ret_code cmd_events_get_records(struct cxl_cmd *cmd,
+                                       CXLDeviceState *cxlds,
+                                       uint16_t *len)
+{
+    struct cxl_get_event_payload *pl;
+    struct cxl_event_log *log;
+    uint8_t log_type;
+    uint16_t nr_overflow;
+
+    if (cmd->in < sizeof(log_type)) {
+        return CXL_MBOX_INVALID_INPUT;
+    }
+
+    log_type = *((uint8_t *)cmd->payload);
+    if (log_type >= CXL_EVENT_TYPE_MAX) {
+        return CXL_MBOX_INVALID_INPUT;
+    }
+
+    pl = (struct cxl_get_event_payload *)cmd->payload;
+
+    log = find_event_log(cxlds, log_type);
+    if (!log || log_empty(log)) {
+        goto no_data;
+    }
+
+    memset(pl, 0, sizeof(*pl));
+    pl->record_count = const_le16(1);
+
+    if (log_rec_left(log) > 1) {
+        pl->flags |= CXL_GET_EVENT_FLAG_MORE_RECORDS;
+    }
+
+    nr_overflow = log_overflow(log);
+    if (nr_overflow) {
+        struct timespec ts;
+        uint64_t ns;
+
+        clock_gettime(CLOCK_REALTIME, &ts);
+
+        ns = ((uint64_t)ts.tv_sec * 1000000000) + (uint64_t)ts.tv_nsec;
+
+        pl->flags |= CXL_GET_EVENT_FLAG_OVERFLOW;
+        pl->overflow_err_count = cpu_to_le16(nr_overflow);
+        ns -= 5000000000; /* 5s ago */
+        pl->first_overflow_timestamp = cpu_to_le64(ns);
+        ns += 4000000000; /* 1s ago */
+        pl->last_overflow_timestamp = cpu_to_le64(ns);
+    }
+
+    memcpy(&pl->record, get_cur_event(log), sizeof(pl->record));
+    pl->record.hdr.handle = get_cur_event_handle(log);
+    *len = sizeof(*pl);
+    return CXL_MBOX_SUCCESS;
+
+no_data:
+    *len = sizeof(*pl) - sizeof(pl->record);
+    memset(pl, 0, *len);
+    return CXL_MBOX_SUCCESS;
+}
+
+static ret_code cmd_events_clear_records(struct cxl_cmd *cmd,
+                                         CXLDeviceState *cxlds,
+                                         uint16_t *len)
+{
+    struct cxl_mbox_clear_event_payload *pl;
+    struct cxl_event_log *log;
+    uint8_t log_type;
+
+    pl = (struct cxl_mbox_clear_event_payload *)cmd->payload;
+    log_type = pl->event_log;
+
+    /* Don't handle more than 1 record at a time */
+    if (pl->nr_recs != 1) {
+        return CXL_MBOX_INVALID_INPUT;
+    }
+
+    if (log_type >= CXL_EVENT_TYPE_MAX) {
+        return CXL_MBOX_INVALID_INPUT;
+    }
+
+    log = find_event_log(cxlds, log_type);
+    if (!log) {
+        return CXL_MBOX_SUCCESS;
+    }
+
+    /*
+     * The current code clears events as they are read.  Test that behavior
+     * only; don't support clearing from the middle of the log
+     */
+    if (log->cur_event != le16_to_cpu(pl->handle)) {
+        return CXL_MBOX_INVALID_INPUT;
+    }
+
+    log->cur_event++;
+    *len = 0;
+    return CXL_MBOX_SUCCESS;
+}
+
 /* 8.2.9.2.1 */
 static ret_code cmd_firmware_update_get_info(struct cxl_cmd *cmd,
                                              CXLDeviceState *cxl_dstate,
@@ -391,7 +488,7 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
     [EVENTS][GET_RECORDS] = { "EVENTS_GET_RECORDS",
         cmd_events_get_records, 1, 0 },
     [EVENTS][CLEAR_RECORDS] = { "EVENTS_CLEAR_RECORDS",
-        cmd_events_clear_records, ~0, IMMEDIATE_LOG_CHANGE },
+        cmd_events_clear_records, 8, IMMEDIATE_LOG_CHANGE },
     [EVENTS][GET_INTERRUPT_POLICY] = { "EVENTS_GET_INTERRUPT_POLICY",
         cmd_events_get_interrupt_policy, 0, 0 },
     [EVENTS][SET_INTERRUPT_POLICY] = { "EVENTS_SET_INTERRUPT_POLICY",
-- 
2.37.2



* [RFC PATCH 5/6] hw/cxl/cxl-events: Add event interrupt support
  2022-10-10 22:29 [RFC PATCH 0/6] QEMU CXL Provide mock CXL events and irq support ira.weiny
                   ` (3 preceding siblings ...)
  2022-10-10 22:29 ` [RFC PATCH 4/6] hw/cxl/mailbox: Wire up get/clear event mailbox commands ira.weiny
@ 2022-10-10 22:29 ` ira.weiny
  2022-10-11 10:30     ` Jonathan Cameron via
  2022-10-10 22:29 ` [RFC PATCH 6/6] hw/cxl/mailbox: Wire up Get/Set Event Interrupt policy ira.weiny
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 35+ messages in thread
From: ira.weiny @ 2022-10-10 22:29 UTC (permalink / raw)
  To: Michael Tsirkin, Ben Widawsky, Jonathan Cameron
  Cc: Ira Weiny, qemu-devel, linux-cxl

From: Ira Weiny <ira.weiny@intel.com>

To facilitate testing of event interrupt support, add an HMP command
(usable over QMP) to reset the event logs and issue interrupts when the
guest has enabled those interrupts.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
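For illustration only (not part of this patch), a rough model of what an
injection does: rewind every log, then decide which logs would signal an
interrupt.  All sketch_ names are hypothetical, including the per-log
irq_enabled flag, and skipping empty logs is an assumption of this sketch
rather than something taken from the code below.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_LOG_TYPES 5   /* mirrors CXL_EVENT_TYPE_MAX */

struct sketch_dev_log {
    int cur_event;
    int nr_events;
    bool irq_enabled;   /* hypothetical: set once the guest enables IRQs */
};

struct sketch_dev {
    struct sketch_dev_log logs[SKETCH_LOG_TYPES];
};

/*
 * Rewind every log so its mock records can be read again, then return a
 * bitmask (bit i = log type i) of the logs that would raise an interrupt.
 */
static uint32_t sketch_event_inject(struct sketch_dev *dev)
{
    uint32_t pending = 0;

    for (int i = 0; i < SKETCH_LOG_TYPES; i++) {
        struct sketch_dev_log *log = &dev->logs[i];

        log->cur_event = 0;
        if (log->irq_enabled && log->nr_events > 0) {
            pending |= 1u << i;
        }
    }
    return pending;
}
```

Resetting only the cursor, as reset_log() does, leaves the stored mock
records in place, so each injection replays the same set of events.
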
 hmp-commands.hx             | 14 +++++++
 hw/cxl/cxl-events.c         | 82 +++++++++++++++++++++++++++++++++++++
 hw/cxl/cxl-host-stubs.c     |  5 +++
 hw/mem/cxl_type3.c          |  7 +++-
 include/hw/cxl/cxl_device.h |  3 ++
 include/sysemu/sysemu.h     |  3 ++
 6 files changed, 113 insertions(+), 1 deletion(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 564f1de364df..c59a98097317 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1266,6 +1266,20 @@ SRST
   Inject PCIe AER error
 ERST
 
+    {
+        .name       = "cxl_event_inject",
+        .args_type  = "id:s",
+        .params     = "id",
+        .help       = "inject cxl events and interrupt\n\t\t\t"
+                      "<id> = qdev device id\n\t\t\t",
+        .cmd        = hmp_cxl_event_inject,
+    },
+
+SRST
+``cxl_event_inject``
+  Inject CXL Events
+ERST
+
     {
         .name       = "netdev_add",
         .args_type  = "netdev:O",
diff --git a/hw/cxl/cxl-events.c b/hw/cxl/cxl-events.c
index c275280bcb64..6ece6f252462 100644
--- a/hw/cxl/cxl-events.c
+++ b/hw/cxl/cxl-events.c
@@ -10,8 +10,14 @@
 #include <stdint.h>
 
 #include "qemu/osdep.h"
+#include "sysemu/sysemu.h"
+#include "monitor/monitor.h"
 #include "qemu/bswap.h"
 #include "qemu/typedefs.h"
+#include "qapi/qmp/qdict.h"
+#include "hw/pci/pci.h"
+#include "hw/pci/msi.h"
+#include "hw/pci/msix.h"
 #include "hw/cxl/cxl.h"
 #include "hw/cxl/cxl_events.h"
 
@@ -68,6 +74,11 @@ uint16_t log_overflow(struct cxl_event_log *log)
     return cnt;
 }
 
+static void reset_log(struct cxl_event_log *log)
+{
+    log->cur_event = 0;
+}
+
 #define CXL_EVENT_RECORD_FLAG_PERMANENT         BIT(2)
 #define CXL_EVENT_RECORD_FLAG_MAINT_NEEDED      BIT(3)
 #define CXL_EVENT_RECORD_FLAG_PERF_DEGRADED     BIT(4)
@@ -246,3 +257,74 @@ void cxl_mock_add_event_logs(CXLDeviceState *cxlds)
     event_store_add_event(cxlds, CXL_EVENT_TYPE_FATAL,
                           (struct cxl_event_record_raw *)&dram);
 }
+
+static void cxl_reset_all_logs(CXLDeviceState *cxlds)
+{
+    int i;
+
+    for (i = 0; i < CXL_EVENT_TYPE_MAX; i++) {
+        struct cxl_event_log *log = find_event_log(cxlds, i);
+
+        if (!log) {
+            continue;
+        }
+
+        reset_log(log);
+    }
+}
+
+static void cxl_event_irq_assert(PCIDevice *pdev)
+{
+    CXLType3Dev *ct3d = container_of(pdev, struct CXLType3Dev, parent_obj);
+    CXLDeviceState *cxlds = &ct3d->cxl_dstate;
+    int i;
+
+    for (i = 0; i < CXL_EVENT_TYPE_MAX; i++) {
+        struct cxl_event_log *log;
+
+        log = find_event_log(cxlds, i);
+        if (!log || !log->irq_enabled || log_empty(log)) {
+            continue;
+        }
+
+        /* Notify via MSI-X or MSI; legacy INTx is not supported */
+        if (msix_enabled(pdev)) {
+            msix_notify(pdev, log->irq_vec);
+        } else if (msi_enabled(pdev)) {
+            msi_notify(pdev, log->irq_vec);
+        }
+    }
+}
+
+static int do_cxl_event_inject(Monitor *mon, const QDict *qdict)
+{
+    const char *id = qdict_get_str(qdict, "id");
+    CXLType3Dev *ct3d;
+    PCIDevice *pdev;
+    int ret;
+
+    ret = pci_qdev_find_device(id, &pdev);
+    if (ret < 0) {
+        monitor_printf(mon,
+                       "id or cxl device path is invalid or device not "
+                       "found. %s\n", id);
+        return ret;
+    }
+
+    ct3d = container_of(pdev, struct CXLType3Dev, parent_obj);
+    cxl_reset_all_logs(&ct3d->cxl_dstate);
+
+    cxl_event_irq_assert(pdev);
+    return 0;
+}
+
+void hmp_cxl_event_inject(Monitor *mon, const QDict *qdict)
+{
+    const char *id = qdict_get_str(qdict, "id");
+
+    if (do_cxl_event_inject(mon, qdict) < 0) {
+        return;
+    }
+
+    monitor_printf(mon, "OK id: %s\n", id);
+}
diff --git a/hw/cxl/cxl-host-stubs.c b/hw/cxl/cxl-host-stubs.c
index cae4afcdde26..61039263f25a 100644
--- a/hw/cxl/cxl-host-stubs.c
+++ b/hw/cxl/cxl-host-stubs.c
@@ -12,4 +12,9 @@ void cxl_fmws_link_targets(CXLState *stat, Error **errp) {};
 void cxl_machine_init(Object *obj, CXLState *state) {};
 void cxl_hook_up_pxb_registers(PCIBus *bus, CXLState *state, Error **errp) {};
 
+void hmp_cxl_event_inject(Monitor *mon, const QDict *qdict)
+{
+    monitor_printf(mon, "CXL devices not supported\n");
+}
+
 const MemoryRegionOps cfmws_ops;
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 2b13179d116d..b4a90136d190 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -459,7 +459,7 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
     ComponentRegisters *regs = &cxl_cstate->crb;
     MemoryRegion *mr = &regs->component_registers;
     uint8_t *pci_conf = pci_dev->config;
-    unsigned short msix_num = 3;
+    unsigned short msix_num = 7;
     int i;
 
     if (!cxl_setup_memory(ct3d, errp)) {
@@ -502,6 +502,11 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
         msix_vector_use(pci_dev, i);
     }
 
+    ct3d->cxl_dstate.event_vector[CXL_EVENT_TYPE_INFO] = 6;
+    ct3d->cxl_dstate.event_vector[CXL_EVENT_TYPE_WARN] = 5;
+    ct3d->cxl_dstate.event_vector[CXL_EVENT_TYPE_FAIL] = 4;
+    ct3d->cxl_dstate.event_vector[CXL_EVENT_TYPE_FATAL] = 3;
+
     /* DOE Initailization */
     if (ct3d->spdm_port) {
         pcie_doe_init(pci_dev, &ct3d->doe_spdm, 0x160, doe_spdm_prot, true, 2);
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 46c50c1c13a6..41232d3b3476 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -84,6 +84,8 @@
 #define CXL_TEST_EVENT_CNT_MAX 15
 
 struct cxl_event_log {
+    bool irq_enabled;
+    int irq_vec;
     int cur_event;
     int nr_events;
     struct cxl_event_record_raw *events[CXL_TEST_EVENT_CNT_MAX];
@@ -129,6 +131,7 @@ typedef struct cxl_device_state {
     /* memory region for persistent memory, HDM */
     uint64_t pmem_size;
 
+    uint16_t event_vector[CXL_EVENT_TYPE_MAX];
     struct cxl_event_log event_logs[CXL_EVENT_TYPE_MAX];
 } CXLDeviceState;
 
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 812f66a31a90..39476cc50190 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -64,6 +64,9 @@ extern unsigned int nb_prom_envs;
 /* pcie aer error injection */
 void hmp_pcie_aer_inject_error(Monitor *mon, const QDict *qdict);
 
+/* CXL */
+void hmp_cxl_event_inject(Monitor *mon, const QDict *qdict);
+
 /* serial ports */
 
 /* Return the Chardev for serial port i, or NULL if none */
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread
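
The inject path in the patch above (rewind every log, then raise one
interrupt per log that is enabled and non-empty) can be sketched outside
QEMU as follows.  This is a minimal model under stated assumptions, not the
patch's code: every ex_* name is hypothetical, ex_notify() stands in for
msix_notify()/msi_notify(), and the "empty" test (cursor equal to the record
count) is a guess, since log_empty() is defined in an earlier patch that is
not shown here.

```c
#include <assert.h>
#include <stdbool.h>

#define EX_MAX_VECTORS 16

/* Hypothetical, pared-down version of struct cxl_event_log. */
struct ex_event_log {
    bool irq_enabled;   /* guest enabled interrupts for this log */
    int irq_vec;        /* vector to raise */
    int cur_event;      /* read cursor */
    int nr_events;      /* number of mock records in the log */
};

/* Counts notifications per vector; stands in for msix_notify()/msi_notify(). */
static int ex_notified[EX_MAX_VECTORS];

static void ex_notify(int vec)
{
    assert(vec >= 0 && vec < EX_MAX_VECTORS);
    ex_notified[vec]++;
}

/* Assumption: a log is empty once the cursor has reached the record count. */
static bool ex_log_empty(const struct ex_event_log *log)
{
    return log->cur_event == log->nr_events;
}

/* Rewind all logs, then notify each enabled, non-empty one. */
static int ex_event_inject(struct ex_event_log *logs, int nr_logs)
{
    int raised = 0;

    for (int i = 0; i < nr_logs; i++) {
        logs[i].cur_event = 0;      /* same effect as reset_log() */
    }

    for (int i = 0; i < nr_logs; i++) {
        struct ex_event_log *log = &logs[i];

        if (!log->irq_enabled || ex_log_empty(log)) {
            continue;
        }
        ex_notify(log->irq_vec);
        raised++;
    }

    return raised;
}
```

Because the rewind happens before the check, a log the guest has already
drained becomes readable again and is signalled once more, which is what
lets the same mock records be replayed on every injection.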

* [RFC PATCH 6/6] hw/cxl/mailbox: Wire up Get/Set Event Interrupt policy
  2022-10-10 22:29 [RFC PATCH 0/6] QEMU CXL Provide mock CXL events and irq support ira.weiny
                   ` (4 preceding siblings ...)
  2022-10-10 22:29 ` [RFC PATCH 5/6] hw/cxl/cxl-events: Add event interrupt support ira.weiny
@ 2022-10-10 22:29 ` ira.weiny
  2022-10-11 10:40     ` Jonathan Cameron via
  2022-10-10 22:45 ` [RFC PATCH 0/6] QEMU CXL Provide mock CXL events and irq support Ira Weiny
  2022-10-11  9:40   ` Jonathan Cameron via
  7 siblings, 1 reply; 35+ messages in thread
From: ira.weiny @ 2022-10-10 22:29 UTC (permalink / raw)
  To: Michael Tsirkin, Ben Widawsky, Jonathan Cameron
  Cc: Ira Weiny, qemu-devel, linux-cxl

From: Ira Weiny <ira.weiny@intel.com>

Replace the stubbed out CXL Get/Set Event interrupt policy mailbox
commands.  Enable those commands to control interrupts for each of the
event log types.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 hw/cxl/cxl-mailbox-utils.c  | 129 ++++++++++++++++++++++++++++++------
 include/hw/cxl/cxl_events.h |  21 ++++++
 2 files changed, 129 insertions(+), 21 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index df345f23a30c..52e8804c24ed 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -101,25 +101,6 @@ struct cxl_cmd {
     uint8_t *payload;
 };
 
-#define DEFINE_MAILBOX_HANDLER_ZEROED(name, size)                         \
-    uint16_t __zero##name = size;                                         \
-    static ret_code cmd_##name(struct cxl_cmd *cmd,                       \
-                               CXLDeviceState *cxl_dstate, uint16_t *len) \
-    {                                                                     \
-        *len = __zero##name;                                              \
-        memset(cmd->payload, 0, *len);                                    \
-        return CXL_MBOX_SUCCESS;                                          \
-    }
-#define DEFINE_MAILBOX_HANDLER_NOP(name)                                  \
-    static ret_code cmd_##name(struct cxl_cmd *cmd,                       \
-                               CXLDeviceState *cxl_dstate, uint16_t *len) \
-    {                                                                     \
-        return CXL_MBOX_SUCCESS;                                          \
-    }
-
-DEFINE_MAILBOX_HANDLER_ZEROED(events_get_interrupt_policy, 4);
-DEFINE_MAILBOX_HANDLER_NOP(events_set_interrupt_policy);
-
 static ret_code cmd_events_get_records(struct cxl_cmd *cmd,
                                        CXLDeviceState *cxlds,
                                        uint16_t *len)
@@ -218,6 +199,110 @@ static ret_code cmd_events_clear_records(struct cxl_cmd *cmd,
     return CXL_MBOX_SUCCESS;
 }
 
+static ret_code cmd_events_get_interrupt_policy(struct cxl_cmd *cmd,
+                                                CXLDeviceState *cxl_dstate,
+                                                uint16_t *len)
+{
+    struct cxl_event_interrupt_policy *policy;
+    struct cxl_event_log *log;
+
+    policy = (struct cxl_event_interrupt_policy *)cmd->payload;
+    memset(policy, 0, sizeof(*policy));
+
+    log = find_event_log(cxl_dstate, CXL_EVENT_TYPE_INFO);
+    if (log->irq_enabled) {
+        policy->info_settings = CXL_EVENT_INT_SETTING(log->irq_vec);
+    }
+
+    log = find_event_log(cxl_dstate, CXL_EVENT_TYPE_WARN);
+    if (log->irq_enabled) {
+        policy->warn_settings = CXL_EVENT_INT_SETTING(log->irq_vec);
+    }
+
+    log = find_event_log(cxl_dstate, CXL_EVENT_TYPE_FAIL);
+    if (log->irq_enabled) {
+        policy->failure_settings = CXL_EVENT_INT_SETTING(log->irq_vec);
+    }
+
+    log = find_event_log(cxl_dstate, CXL_EVENT_TYPE_FATAL);
+    if (log->irq_enabled) {
+        policy->fatal_settings = CXL_EVENT_INT_SETTING(log->irq_vec);
+    }
+
+    log = find_event_log(cxl_dstate, CXL_EVENT_TYPE_DYNAMIC_CAP);
+    if (log->irq_enabled) {
+        /* Dynamic Capacity borrows the same vector as info */
+        policy->dyn_cap_settings = CXL_INT_MSI_MSIX;
+    }
+
+    *len = sizeof(*policy);
+    return CXL_MBOX_SUCCESS;
+}
+
+static ret_code cmd_events_set_interrupt_policy(struct cxl_cmd *cmd,
+                                                CXLDeviceState *cxl_dstate,
+                                                uint16_t *len)
+{
+    struct cxl_event_interrupt_policy *policy;
+    struct cxl_event_log *log;
+
+    policy = (struct cxl_event_interrupt_policy *)cmd->payload;
+
+    log = find_event_log(cxl_dstate, CXL_EVENT_TYPE_INFO);
+    if ((policy->info_settings & CXL_EVENT_INT_MODE_MASK) ==
+                                                    CXL_INT_MSI_MSIX) {
+        log->irq_enabled = true;
+        log->irq_vec = cxl_dstate->event_vector[CXL_EVENT_TYPE_INFO];
+    } else {
+        log->irq_enabled = false;
+        log->irq_vec = 0;
+    }
+
+    log = find_event_log(cxl_dstate, CXL_EVENT_TYPE_WARN);
+    if ((policy->warn_settings & CXL_EVENT_INT_MODE_MASK) ==
+                                                    CXL_INT_MSI_MSIX) {
+        log->irq_enabled = true;
+        log->irq_vec = cxl_dstate->event_vector[CXL_EVENT_TYPE_WARN];
+    } else {
+        log->irq_enabled = false;
+        log->irq_vec = 0;
+    }
+
+    log = find_event_log(cxl_dstate, CXL_EVENT_TYPE_FAIL);
+    if ((policy->failure_settings & CXL_EVENT_INT_MODE_MASK) ==
+                                                    CXL_INT_MSI_MSIX) {
+        log->irq_enabled = true;
+        log->irq_vec = cxl_dstate->event_vector[CXL_EVENT_TYPE_FAIL];
+    } else {
+        log->irq_enabled = false;
+        log->irq_vec = 0;
+    }
+
+    log = find_event_log(cxl_dstate, CXL_EVENT_TYPE_FATAL);
+    if ((policy->fatal_settings & CXL_EVENT_INT_MODE_MASK) ==
+                                                    CXL_INT_MSI_MSIX) {
+        log->irq_enabled = true;
+        log->irq_vec = cxl_dstate->event_vector[CXL_EVENT_TYPE_FATAL];
+    } else {
+        log->irq_enabled = false;
+        log->irq_vec = 0;
+    }
+
+    log = find_event_log(cxl_dstate, CXL_EVENT_TYPE_DYNAMIC_CAP);
+    if ((policy->dyn_cap_settings & CXL_EVENT_INT_MODE_MASK) ==
+                                                    CXL_INT_MSI_MSIX) {
+        log->irq_enabled = true;
+        /* Dynamic Capacity borrows the same vector as info */
+        log->irq_vec = cxl_dstate->event_vector[CXL_EVENT_TYPE_INFO];
+    } else {
+        log->irq_enabled = false;
+        log->irq_vec = 0;
+    }
+
+    *len = sizeof(*policy);
+    return CXL_MBOX_SUCCESS;
+}
+
 /* 8.2.9.2.1 */
 static ret_code cmd_firmware_update_get_info(struct cxl_cmd *cmd,
                                              CXLDeviceState *cxl_dstate,
@@ -490,9 +575,11 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
     [EVENTS][CLEAR_RECORDS] = { "EVENTS_CLEAR_RECORDS",
         cmd_events_clear_records, 8, IMMEDIATE_LOG_CHANGE },
     [EVENTS][GET_INTERRUPT_POLICY] = { "EVENTS_GET_INTERRUPT_POLICY",
-        cmd_events_get_interrupt_policy, 0, 0 },
+                                      cmd_events_get_interrupt_policy, 0, 0 },
     [EVENTS][SET_INTERRUPT_POLICY] = { "EVENTS_SET_INTERRUPT_POLICY",
-        cmd_events_set_interrupt_policy, 4, IMMEDIATE_CONFIG_CHANGE },
+                                      cmd_events_set_interrupt_policy,
+                                      sizeof(struct cxl_event_interrupt_policy),
+                                      IMMEDIATE_CONFIG_CHANGE },
     [FIRMWARE_UPDATE][GET_INFO] = { "FIRMWARE_UPDATE_GET_INFO",
         cmd_firmware_update_get_info, 0, 0 },
     [TIMESTAMP][GET] = { "TIMESTAMP_GET", cmd_timestamp_get, 0, 0 },
diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
index 255111f3dcfb..c121e504a6db 100644
--- a/include/hw/cxl/cxl_events.h
+++ b/include/hw/cxl/cxl_events.h
@@ -170,4 +170,25 @@ struct cxl_event_mem_module {
     uint8_t reserved[CXL_EVENT_MEM_MOD_RES_SIZE];
 } QEMU_PACKED;
 
+/**
+ * Event Interrupt Policy
+ *
+ * CXL rev 3.0 section 8.2.9.2.4; Table 8-52
+ */
+enum cxl_event_int_mode {
+    CXL_INT_NONE     = 0x00,
+    CXL_INT_MSI_MSIX = 0x01,
+    CXL_INT_FW       = 0x02,
+    CXL_INT_RES      = 0x03,
+};
+#define CXL_EVENT_INT_MODE_MASK 0x3
+#define CXL_EVENT_INT_SETTING(vector) ((((uint8_t)(vector) & 0xf) << 4) | CXL_INT_MSI_MSIX)
+struct cxl_event_interrupt_policy {
+    uint8_t info_settings;
+    uint8_t warn_settings;
+    uint8_t failure_settings;
+    uint8_t fatal_settings;
+    uint8_t dyn_cap_settings;
+} QEMU_PACKED;
+
 #endif /* CXL_EVENTS_H */
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread
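
For reference, the per-log settings byte used by the patch above packs the
interrupt mode into bits 1:0 and the message number into bits 7:4, as
CXL_EVENT_INT_SETTING() and CXL_EVENT_INT_MODE_MASK show.  A small
standalone sketch of the encode and decode steps follows; the ex_* helpers
are hypothetical names for illustration only, and the vector numbers in the
usage note are the ones patch 5/6 assigns in ct3_realize().

```c
#include <assert.h>
#include <stdint.h>

/* Values mirror the enum and mask the patch adds to cxl_events.h. */
#define EX_INT_NONE      0x00
#define EX_INT_MSI_MSIX  0x01
#define EX_INT_MODE_MASK 0x3

/* Encode: vector in bits 7:4, MSI/MSI-X mode in bits 1:0. */
static uint8_t ex_int_setting(unsigned int vector)
{
    return (uint8_t)(((vector & 0xf) << 4) | EX_INT_MSI_MSIX);
}

/* Decode the interrupt mode, as the Set Event Interrupt Policy handler does. */
static unsigned int ex_setting_mode(uint8_t setting)
{
    return setting & EX_INT_MODE_MASK;
}

/* Decode the message number reported back to the guest. */
static unsigned int ex_setting_vector(uint8_t setting)
{
    return (setting >> 4) & 0xf;
}
```

With the vectors from patch 5/6 (info = 6, fatal = 3), ex_int_setting(6)
yields 0x61 and ex_int_setting(3) yields 0x31.  Only four bits are
available for the vector, so values above 15 are truncated.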

* Re: [RFC PATCH 0/6] QEMU CXL Provide mock CXL events and irq support
  2022-10-10 22:29 [RFC PATCH 0/6] QEMU CXL Provide mock CXL events and irq support ira.weiny
                   ` (5 preceding siblings ...)
  2022-10-10 22:29 ` [RFC PATCH 6/6] hw/cxl/mailbox: Wire up Get/Set Event Interrupt policy ira.weiny
@ 2022-10-10 22:45 ` Ira Weiny
  2022-10-11  9:40   ` Jonathan Cameron via
  7 siblings, 0 replies; 35+ messages in thread
From: Ira Weiny @ 2022-10-10 22:45 UTC (permalink / raw)
  To: Michael Tsirkin, Ben Widawsky, Jonathan Cameron; +Cc: qemu-devel, linux-cxl

On Mon, Oct 10, 2022 at 03:29:38PM -0700, Ira wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> CXL Event records inform the OS of various CXL device events.  Thus far CXL
> memory devices are emulated and therefore don't naturally have events which
> will occur.
> 
> Add mock events and a HMP trigger mechanism to facilitate guest OS testing of
> event support.
> 
> This support requires a follow on version of the event patch set.  The RFC was
> submitted and discussed here:
> 
> 	https://lore.kernel.org/linux-cxl/20220813053243.757363-1-ira.weiny@intel.com/
> 
> I'll post the lore link to the new version shortly.

Kernel support now posted here:

	https://lore.kernel.org/all/20221010224131.1866246-1-ira.weiny@intel.com/

Ira

> 
> Instructions for running this test.
> 
> Add qmp option to qemu:
> 
> 	<host> $ qemu-system-x86_64 ... -qmp unix:/tmp/run_qemu_qmp_0,server,nowait ...
> 
> 	OR
> 
> 	<host> $ run_qemu.sh ... --qmp ...
> 
> Enable tracing of events within the guest:
> 
> 	<guest> $ echo "" > /sys/kernel/tracing/trace
> 	<guest> $ echo 1 > /sys/kernel/tracing/events/cxl/enable
> 	<guest> $ echo 1 > /sys/kernel/tracing/tracing_on
> 
> Trigger event generation and interrupts in the host:
> 
> 	<host> $ echo "cxl_event_inject cxl-devX" | qmp-shell -H /tmp/run_qemu_qmp_0
> 
> 	Where X == one of the memory devices; cxl-dev0 should work.
> 
> View events on the guest:
> 
> 	<guest> $ cat /sys/kernel/tracing/trace
> 
> 
> Ira Weiny (6):
>   qemu/bswap: Add const_le64()
>   qemu/uuid: Add UUID static initializer
>   hw/cxl/cxl-events: Add CXL mock events
>   hw/cxl/mailbox: Wire up get/clear event mailbox commands
>   hw/cxl/cxl-events: Add event interrupt support
>   hw/cxl/mailbox: Wire up Get/Set Event Interrupt policy
> 
>  hmp-commands.hx             |  14 ++
>  hw/cxl/cxl-device-utils.c   |   1 +
>  hw/cxl/cxl-events.c         | 330 ++++++++++++++++++++++++++++++++++++
>  hw/cxl/cxl-host-stubs.c     |   5 +
>  hw/cxl/cxl-mailbox-utils.c  | 224 +++++++++++++++++++++---
>  hw/cxl/meson.build          |   1 +
>  hw/mem/cxl_type3.c          |   7 +-
>  include/hw/cxl/cxl_device.h |  22 +++
>  include/hw/cxl/cxl_events.h | 194 +++++++++++++++++++++
>  include/qemu/bswap.h        |  10 ++
>  include/qemu/uuid.h         |  12 ++
>  include/sysemu/sysemu.h     |   3 +
>  12 files changed, 802 insertions(+), 21 deletions(-)
>  create mode 100644 hw/cxl/cxl-events.c
>  create mode 100644 include/hw/cxl/cxl_events.h
> 
> 
> base-commit: 6f7f81898e4437ea544ee4ca24bef7ec543b1f06
> -- 
> 2.37.2
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 1/6] qemu/bswap: Add const_le64()
  2022-10-10 22:29 ` [RFC PATCH 1/6] qemu/bswap: Add const_le64() ira.weiny
@ 2022-10-11  9:03     ` Jonathan Cameron via
  2022-10-11  9:48   ` Peter Maydell
  1 sibling, 0 replies; 35+ messages in thread
From: Jonathan Cameron @ 2022-10-11  9:03 UTC (permalink / raw)
  To: ira.weiny; +Cc: Michael Tsirkin, Ben Widawsky, qemu-devel, linux-cxl

On Mon, 10 Oct 2022 15:29:39 -0700
ira.weiny@intel.com wrote:

> From: Ira Weiny <ira.weiny@intel.com>
> 
> Gcc requires constant versions of cpu_to_le* calls.
> 
> Add a 64 bit version.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Seems reasonable to me but I'm not an expert in this stuff.
FWIW

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

There are probably a lot of places in the CXL emulation where
our endian handling isn't correct but so far it hasn't mattered
as all the supported architectures are little endian.

Good to not introduce more cases however!

Jonathan


> ---
>  include/qemu/bswap.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/include/qemu/bswap.h b/include/qemu/bswap.h
> index 346d05f2aab3..08e607821102 100644
> --- a/include/qemu/bswap.h
> +++ b/include/qemu/bswap.h
> @@ -192,10 +192,20 @@ CPU_CONVERT(le, 64, uint64_t)
>       (((_x) & 0x0000ff00U) <<  8) |              \
>       (((_x) & 0x00ff0000U) >>  8) |              \
>       (((_x) & 0xff000000U) >> 24))
> +# define const_le64(_x)                          \
> +    ((((_x) & 0x00000000000000ffU) << 56) |      \
> +     (((_x) & 0x000000000000ff00U) << 40) |      \
> +     (((_x) & 0x0000000000ff0000U) << 24) |      \
> +     (((_x) & 0x00000000ff000000U) <<  8) |      \
> +     (((_x) & 0x000000ff00000000U) >>  8) |      \
> +     (((_x) & 0x0000ff0000000000U) >> 24) |      \
> +     (((_x) & 0x00ff000000000000U) >> 40) |      \
> +     (((_x) & 0xff00000000000000U) >> 56))
>  # define const_le16(_x)                          \
>      ((((_x) & 0x00ff) << 8) |                    \
>       (((_x) & 0xff00) >> 8))
>  #else
> +# define const_le64(_x) (_x)
>  # define const_le32(_x) (_x)
>  # define const_le16(_x) (_x)
>  #endif


^ permalink raw reply	[flat|nested] 35+ messages in thread
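
What the big-endian branch of const_le64() computes can be checked with a
standalone sketch.  CONST_BSWAP64 and bswap64_loop are hypothetical names,
not QEMU API.  Unlike the quoted macro, this variant casts the argument to
uint64_t and uses ULL suffixes, so every shift is done in 64 bits whatever
the type of the argument; that is an illustrative choice here, not something
the posted patch does.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the big-endian branch of const_le64():
 * a byte swap built only from constant expressions, so it can appear
 * in a static initializer where a function call cannot.
 */
#define CONST_BSWAP64(x)                                      \
    ((((uint64_t)(x) & 0x00000000000000ffULL) << 56) |        \
     (((uint64_t)(x) & 0x000000000000ff00ULL) << 40) |        \
     (((uint64_t)(x) & 0x0000000000ff0000ULL) << 24) |        \
     (((uint64_t)(x) & 0x00000000ff000000ULL) <<  8) |        \
     (((uint64_t)(x) & 0x000000ff00000000ULL) >>  8) |        \
     (((uint64_t)(x) & 0x0000ff0000000000ULL) >> 24) |        \
     (((uint64_t)(x) & 0x00ff000000000000ULL) >> 40) |        \
     (((uint64_t)(x) & 0xff00000000000000ULL) >> 56))

/* File-scope initializer: the compiler must evaluate the macro itself. */
static const uint64_t swapped_example = CONST_BSWAP64(0x0102030405060708ULL);

/* Run-time reference implementation used to cross-check the macro. */
static uint64_t bswap64_loop(uint64_t v)
{
    uint64_t out = 0;

    for (int i = 0; i < 8; i++) {
        out = (out << 8) | (v & 0xff);  /* move the lowest byte to the top */
        v >>= 8;
    }
    return out;
}
```

On a little-endian host no swap is needed, which is why the #else branch of
the patch defines const_le64(_x) as (_x).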

* Re: [RFC PATCH 2/6] qemu/uuid: Add UUID static initializer
  2022-10-10 22:29 ` [RFC PATCH 2/6] qemu/uuid: Add UUID static initializer ira.weiny
@ 2022-10-11  9:13     ` Jonathan Cameron via
  0 siblings, 0 replies; 35+ messages in thread
From: Jonathan Cameron @ 2022-10-11  9:13 UTC (permalink / raw)
  To: ira.weiny; +Cc: Michael Tsirkin, Ben Widawsky, qemu-devel, linux-cxl

On Mon, 10 Oct 2022 15:29:40 -0700
ira.weiny@intel.com wrote:

> From: Ira Weiny <ira.weiny@intel.com>
> 
> UUID's are defined as network byte order fields.  No static initializer
> was available for UUID's in their standard big endian format.
> 
> Define a big endian initializer for UUIDs.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Seems sensible.  Would allow a cleanup in the existing cel_uuid handling
in the CXL code where we use a static for this and end up filling it
with the same value multiple times which is less than ideal...
A quick grep for qemu_uuid_parse() suggests there are other cases
where it's passed a constant string.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  include/qemu/uuid.h | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/include/qemu/uuid.h b/include/qemu/uuid.h
> index 9925febfa54d..dc40ee1fc998 100644
> --- a/include/qemu/uuid.h
> +++ b/include/qemu/uuid.h
> @@ -61,6 +61,18 @@ typedef struct {
>      (clock_seq_hi_and_reserved), (clock_seq_low), (node0), (node1), (node2),\
>      (node3), (node4), (node5) }
>  
> +/* Normal (network byte order) UUID */
> +#define UUID(time_low, time_mid, time_hi_and_version,                    \
> +  clock_seq_hi_and_reserved, clock_seq_low, node0, node1, node2,         \
> +  node3, node4, node5)                                                   \
> +  { ((time_low) >> 24) & 0xff, ((time_low) >> 16) & 0xff,                \
> +    ((time_low) >> 8) & 0xff, (time_low) & 0xff,                         \
> +    ((time_mid) >> 8) & 0xff, (time_mid) & 0xff,                         \
> +    ((time_hi_and_version) >> 8) & 0xff, (time_hi_and_version) & 0xff,   \
> +    (clock_seq_hi_and_reserved), (clock_seq_low),                        \
> +    (node0), (node1), (node2), (node3), (node4), (node5)                 \
> +  }
> +
>  #define UUID_FMT "%02hhx%02hhx%02hhx%02hhx-" \
>                   "%02hhx%02hhx-%02hhx%02hhx-" \
>                   "%02hhx%02hhx-" \


^ permalink raw reply	[flat|nested] 35+ messages in thread
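
The byte layout produced by the UUID() initializer above can be illustrated
with a standalone sketch.  struct ex_uuid and EX_UUID_BE are hypothetical
stand-ins (the real macro initializes a QemuUUID), and the UUID value is an
arbitrary example, not one taken from the CXL specification.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-byte container standing in for QemuUUID. */
struct ex_uuid {
    uint8_t data[16];
};

/*
 * Same field-to-byte mapping as the UUID() macro in the patch: the
 * multi-byte fields are stored most significant byte first, so the
 * bytes in memory match the order of the textual form.
 */
#define EX_UUID_BE(time_low, time_mid, time_hi_and_version,                \
                   clock_seq_hi, clock_seq_low,                            \
                   n0, n1, n2, n3, n4, n5)                                 \
    { { ((time_low) >> 24) & 0xff, ((time_low) >> 16) & 0xff,              \
        ((time_low) >> 8) & 0xff, (time_low) & 0xff,                       \
        ((time_mid) >> 8) & 0xff, (time_mid) & 0xff,                       \
        ((time_hi_and_version) >> 8) & 0xff, (time_hi_and_version) & 0xff, \
        (clock_seq_hi), (clock_seq_low),                                   \
        (n0), (n1), (n2), (n3), (n4), (n5) } }

/* Textual form: 12345678-9abc-def0-1122-334455667788 (example value). */
static const struct ex_uuid ex_uuid_value =
    EX_UUID_BE(0x12345678, 0x9abc, 0xdef0, 0x11, 0x22,
               0x33, 0x44, 0x55, 0x66, 0x77, 0x88);
```

Reading ex_uuid_value.data from index 0 to 15 gives 12 34 56 78 9a bc de f0
11 22 33 44 55 66 77 88, the same sequence as the hex digits of the string
form, independent of host endianness.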

* Re: [RFC PATCH 0/6] QEMU CXL Provide mock CXL events and irq support
  2022-10-10 22:29 [RFC PATCH 0/6] QEMU CXL Provide mock CXL events and irq support ira.weiny
@ 2022-10-11  9:40   ` Jonathan Cameron via
  2022-10-10 22:29 ` [RFC PATCH 2/6] qemu/uuid: Add UUID static initializer ira.weiny
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 35+ messages in thread
From: Jonathan Cameron @ 2022-10-11  9:40 UTC (permalink / raw)
  To: ira.weiny; +Cc: Michael Tsirkin, Ben Widawsky, qemu-devel, linux-cxl

On Mon, 10 Oct 2022 15:29:38 -0700
ira.weiny@intel.com wrote:

> From: Ira Weiny <ira.weiny@intel.com>
> 
> CXL Event records inform the OS of various CXL device events.  Thus far CXL
> memory devices are emulated and therefore don't naturally have events which
> will occur.
> 
> Add mock events and a HMP trigger mechanism to facilitate guest OS testing of
> event support.
> 
> This support requires a follow on version of the event patch set.  The RFC was
> submitted and discussed here:
> 
> 	https://lore.kernel.org/linux-cxl/20220813053243.757363-1-ira.weiny@intel.com/
> 
> I'll post the lore link to the new version shortly.
> 
> Instructions for running this test.
> 
> Add qmp option to qemu:
> 
> 	<host> $ qemu-system-x86_64 ... -qmp unix:/tmp/run_qemu_qmp_0,server,nowait ...
> 
> 	OR
> 
> 	<host> $ run_qemu.sh ... --qmp ...
> 
> Enable tracing of events within the guest:
> 
> 	<guest> $ echo "" > /sys/kernel/tracing/trace
> 	<guest> $ echo 1 > /sys/kernel/tracing/events/cxl/enable
> 	<guest> $ echo 1 > /sys/kernel/tracing/tracing_on
> 
> Trigger event generation and interrupts in the host:
> 
> 	<host> $ echo "cxl_event_inject cxl-devX" | qmp-shell -H /tmp/run_qemu_qmp_0
> 
> 	Where X == one of the memory devices; cxl-dev0 should work.
> 
> View events on the guest:
> 
> 	<guest> $ cat /sys/kernel/tracing/trace

Hi Ira,

Why is this an RFC rather than a patch set to apply?

It's useful to have that in the cover letter so we can focus on what
you want comments on (rather than simply review).

Thanks,

Jonathan

> 
> 
> Ira Weiny (6):
>   qemu/bswap: Add const_le64()
>   qemu/uuid: Add UUID static initializer
>   hw/cxl/cxl-events: Add CXL mock events
>   hw/cxl/mailbox: Wire up get/clear event mailbox commands
>   hw/cxl/cxl-events: Add event interrupt support
>   hw/cxl/mailbox: Wire up Get/Set Event Interrupt policy
> 
>  hmp-commands.hx             |  14 ++
>  hw/cxl/cxl-device-utils.c   |   1 +
>  hw/cxl/cxl-events.c         | 330 ++++++++++++++++++++++++++++++++++++
>  hw/cxl/cxl-host-stubs.c     |   5 +
>  hw/cxl/cxl-mailbox-utils.c  | 224 +++++++++++++++++++++---
>  hw/cxl/meson.build          |   1 +
>  hw/mem/cxl_type3.c          |   7 +-
>  include/hw/cxl/cxl_device.h |  22 +++
>  include/hw/cxl/cxl_events.h | 194 +++++++++++++++++++++
>  include/qemu/bswap.h        |  10 ++
>  include/qemu/uuid.h         |  12 ++
>  include/sysemu/sysemu.h     |   3 +
>  12 files changed, 802 insertions(+), 21 deletions(-)
>  create mode 100644 hw/cxl/cxl-events.c
>  create mode 100644 include/hw/cxl/cxl_events.h
> 
> 
> base-commit: 6f7f81898e4437ea544ee4ca24bef7ec543b1f06


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 1/6] qemu/bswap: Add const_le64()
  2022-10-10 22:29 ` [RFC PATCH 1/6] qemu/bswap: Add const_le64() ira.weiny
  2022-10-11  9:03     ` Jonathan Cameron via
@ 2022-10-11  9:48   ` Peter Maydell
  2022-10-11 15:22     ` Richard Henderson
  1 sibling, 1 reply; 35+ messages in thread
From: Peter Maydell @ 2022-10-11  9:48 UTC (permalink / raw)
  To: ira.weiny
  Cc: Michael Tsirkin, Ben Widawsky, Jonathan Cameron, qemu-devel,
	linux-cxl, Richard Henderson

On Mon, 10 Oct 2022 at 23:48, <ira.weiny@intel.com> wrote:
>
> From: Ira Weiny <ira.weiny@intel.com>
>
> Gcc requires constant versions of cpu_to_le* calls.
>
> Add a 64 bit version.
>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> ---
>  include/qemu/bswap.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/include/qemu/bswap.h b/include/qemu/bswap.h
> index 346d05f2aab3..08e607821102 100644
> --- a/include/qemu/bswap.h
> +++ b/include/qemu/bswap.h
> @@ -192,10 +192,20 @@ CPU_CONVERT(le, 64, uint64_t)
>       (((_x) & 0x0000ff00U) <<  8) |              \
>       (((_x) & 0x00ff0000U) >>  8) |              \
>       (((_x) & 0xff000000U) >> 24))
> +# define const_le64(_x)                          \
> +    ((((_x) & 0x00000000000000ffU) << 56) |      \
> +     (((_x) & 0x000000000000ff00U) << 40) |      \
> +     (((_x) & 0x0000000000ff0000U) << 24) |      \
> +     (((_x) & 0x00000000ff000000U) <<  8) |      \
> +     (((_x) & 0x000000ff00000000U) >>  8) |      \
> +     (((_x) & 0x0000ff0000000000U) >> 24) |      \
> +     (((_x) & 0x00ff000000000000U) >> 40) |      \
> +     (((_x) & 0xff00000000000000U) >> 56))

Can you add this in the right place, ie above the const_le32()
definition, please ?

>  # define const_le16(_x)                          \
>      ((((_x) & 0x00ff) << 8) |                    \
>       (((_x) & 0xff00) >> 8))
>  #else
> +# define const_le64(_x) (_x)
>  # define const_le32(_x) (_x)
>  # define const_le16(_x) (_x)
>  #endif

This is kind of a weird API, because:
 * it only exists for little-endian, not big-endian
 * we use it in exactly two files (linux-user/elfload.c and
   hw/input/virtio-input-hid.c)

which leaves me wondering if there's a better way of doing
it that I'm missing. But maybe it's just that we never filled
out the missing bits of the API surface because we haven't
needed them yet. Richard ?
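
For reference, a minimal sketch of a constant 64-bit byte swap in the
style of the existing const_le32(). The name SKETCH_CONST_BSWAP64 is
hypothetical and is not part of bswap.h. The sketch assumes that casting
the argument to uint64_t and using ULL masks is acceptable, so that every
shift is carried out in 64 bits whatever the type of the argument:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical name, for illustration only: a byte swap that is an
 * integer constant expression, so it can be used in static initializers.
 * The cast and the ULL masks keep all shifts 64 bits wide.
 */
#define SKETCH_CONST_BSWAP64(_x)                                \
    (((((uint64_t)(_x)) & 0x00000000000000ffULL) << 56) |       \
     ((((uint64_t)(_x)) & 0x000000000000ff00ULL) << 40) |       \
     ((((uint64_t)(_x)) & 0x0000000000ff0000ULL) << 24) |       \
     ((((uint64_t)(_x)) & 0x00000000ff000000ULL) <<  8) |       \
     ((((uint64_t)(_x)) & 0x000000ff00000000ULL) >>  8) |       \
     ((((uint64_t)(_x)) & 0x0000ff0000000000ULL) >> 24) |       \
     ((((uint64_t)(_x)) & 0x00ff000000000000ULL) >> 40) |       \
     ((((uint64_t)(_x)) & 0xff00000000000000ULL) >> 56))

/* Checked at build time: the bytes come out in reverse order. */
static_assert(SKETCH_CONST_BSWAP64(0x0102030405060708ULL) ==
              0x0807060504030201ULL, "byte order reversed");
```

On a big-endian host const_le64() would expand to a swap like this one,
and on a little-endian host to the identity, as in the hunk above.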

thanks
-- PMM

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 3/6] hw/cxl/cxl-events: Add CXL mock events
  2022-10-10 22:29 ` [RFC PATCH 3/6] hw/cxl/cxl-events: Add CXL mock events ira.weiny
@ 2022-10-11 10:07     ` Jonathan Cameron via
  2022-12-19 10:07     ` Jonathan Cameron via
  1 sibling, 0 replies; 35+ messages in thread
From: Jonathan Cameron @ 2022-10-11 10:07 UTC (permalink / raw)
  To: ira.weiny; +Cc: Michael Tsirkin, Ben Widawsky, qemu-devel, linux-cxl

On Mon, 10 Oct 2022 15:29:41 -0700
ira.weiny@intel.com wrote:

> From: Ira Weiny <ira.weiny@intel.com>
> 
> To facilitate testing of guest software add mock events and code to
> support iterating through the event logs.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Various comments inline, but biggest one is I'd like to see
a much more flexible injection interface.  Happy to help code one
up if that is useful.

Jonathan


> ---
>  hw/cxl/cxl-events.c         | 248 ++++++++++++++++++++++++++++++++++++
>  hw/cxl/meson.build          |   1 +
>  include/hw/cxl/cxl_device.h |  19 +++
>  include/hw/cxl/cxl_events.h | 173 +++++++++++++++++++++++++
>  4 files changed, 441 insertions(+)
>  create mode 100644 hw/cxl/cxl-events.c
>  create mode 100644 include/hw/cxl/cxl_events.h
> 
> diff --git a/hw/cxl/cxl-events.c b/hw/cxl/cxl-events.c
> new file mode 100644
> index 000000000000..c275280bcb64
> --- /dev/null
> +++ b/hw/cxl/cxl-events.c
> @@ -0,0 +1,248 @@
> +/*
> + * CXL Event processing
> + *
> + * Copyright(C) 2022 Intel Corporation.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2. See the
> + * COPYING file in the top-level directory.
> + */
> +
> +#include <stdint.h>
> +
> +#include "qemu/osdep.h"
> +#include "qemu/bswap.h"
> +#include "qemu/typedefs.h"
> +#include "hw/cxl/cxl.h"
> +#include "hw/cxl/cxl_events.h"
> +
> +struct cxl_event_log *find_event_log(CXLDeviceState *cxlds, int log_type)
> +{
> +    if (log_type >= CXL_EVENT_TYPE_MAX) {
> +        return NULL;
> +    }
> +    return &cxlds->event_logs[log_type];
> +}
> +
> +struct cxl_event_record_raw *get_cur_event(struct cxl_event_log *log)
> +{
> +    return log->events[log->cur_event];
> +}
> +
> +uint16_t get_cur_event_handle(struct cxl_event_log *log)
> +{
> +    return cpu_to_le16(log->cur_event);
> +}
> +
> +bool log_empty(struct cxl_event_log *log)
> +{
> +    return log->cur_event == log->nr_events;
> +}
> +
> +int log_rec_left(struct cxl_event_log *log)
> +{
> +    return log->nr_events - log->cur_event;
> +}
> +
> +static void event_store_add_event(CXLDeviceState *cxlds,
> +                                  enum cxl_event_log_type log_type,
> +                                  struct cxl_event_record_raw *event)
> +{
> +    struct cxl_event_log *log;
> +
> +    assert(log_type < CXL_EVENT_TYPE_MAX);
> +
> +    log = &cxlds->event_logs[log_type];
> +    assert(log->nr_events < CXL_TEST_EVENT_CNT_MAX);
> +
> +    log->events[log->nr_events] = event;
> +    log->nr_events++;
> +}
> +
> +uint16_t log_overflow(struct cxl_event_log *log)
> +{
> +    int cnt = log_rec_left(log) - 5;

Why -5?  Can't we make it actually overflow and drop records
if that happens?
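
A minimal sketch of what "actually overflow" could look like, assuming a
fixed-capacity log that drops new records once full and counts the
drops. All names and the capacity of 4 are hypothetical, and plain
handles stand in for the event records:

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_LOG_CAPACITY 4   /* arbitrary size chosen for the sketch */

struct sketch_event_log {
    uint16_t handles[SKETCH_LOG_CAPACITY]; /* stand-in for event records */
    unsigned int nr_events;                /* records currently stored */
    uint16_t overflow_err_count;           /* records dropped so far */
};

/* Returns true if the record was stored, false if it was dropped. */
static bool sketch_log_add(struct sketch_event_log *log, uint16_t handle)
{
    if (log->nr_events == SKETCH_LOG_CAPACITY) {
        /* Log is full: drop the record and saturate the counter. */
        if (log->overflow_err_count < UINT16_MAX) {
            log->overflow_err_count++;
        }
        return false;
    }
    log->handles[log->nr_events++] = handle;
    return true;
}
```

With something along these lines the overflow count reported to the
guest would come from real dropped records rather than from a fixed
offset.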

> +
> +    if (cnt < 0) {
> +        return 0;
> +    }
> +    return cnt;
> +}
> +
> +#define CXL_EVENT_RECORD_FLAG_PERMANENT         BIT(2)
> +#define CXL_EVENT_RECORD_FLAG_MAINT_NEEDED      BIT(3)
> +#define CXL_EVENT_RECORD_FLAG_PERF_DEGRADED     BIT(4)
> +#define CXL_EVENT_RECORD_FLAG_HW_REPLACE        BIT(5)
> +
> +struct cxl_event_record_raw maint_needed = {
> +    .hdr = {
> +        .id.data = UUID(0xDEADBEEF, 0xCAFE, 0xBABE,
> +                        0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
> +        .length = sizeof(struct cxl_event_record_raw),
> +        .flags[0] = CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,
> +        /* .handle = Set dynamically */
> +        .related_handle = const_le16(0xa5b6),
> +    },
> +    .data = { 0xDE, 0xAD, 0xBE, 0xEF },
> +};
> +
> +struct cxl_event_record_raw hardware_replace = {
> +    .hdr = {
> +        .id.data = UUID(0xBABECAFE, 0xBEEF, 0xDEAD,
> +                        0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
> +        .length = sizeof(struct cxl_event_record_raw),
> +        .flags[0] = CXL_EVENT_RECORD_FLAG_HW_REPLACE,
> +        /* .handle = Set dynamically */
> +        .related_handle = const_le16(0xb6a5),
> +    },
> +    .data = { 0xDE, 0xAD, 0xBE, 0xEF },
> +};
> +
> +#define CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT            BIT(0)
> +#define CXL_GMER_EVT_DESC_THRESHOLD_EVENT               BIT(1)
> +#define CXL_GMER_EVT_DESC_POISON_LIST_OVERFLOW          BIT(2)
> +
> +#define CXL_GMER_MEM_EVT_TYPE_ECC_ERROR                 0x00
> +#define CXL_GMER_MEM_EVT_TYPE_INV_ADDR                  0x01
> +#define CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR           0x02
> +
> +#define CXL_GMER_TRANS_UNKNOWN                          0x00
> +#define CXL_GMER_TRANS_HOST_READ                        0x01
> +#define CXL_GMER_TRANS_HOST_WRITE                       0x02
> +#define CXL_GMER_TRANS_HOST_SCAN_MEDIA                  0x03
> +#define CXL_GMER_TRANS_HOST_INJECT_POISON               0x04
> +#define CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB             0x05
> +#define CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT        0x06
> +
> +#define CXL_GMER_VALID_CHANNEL                          BIT(0)
> +#define CXL_GMER_VALID_RANK                             BIT(1)
> +#define CXL_GMER_VALID_DEVICE                           BIT(2)
> +#define CXL_GMER_VALID_COMPONENT                        BIT(3)
> +
> +struct cxl_event_gen_media gen_media = {
> +    .hdr = {
> +        .id.data = UUID(0xfbcd0a77, 0xc260, 0x417f,
> +                        0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6),
> +        .length = sizeof(struct cxl_event_gen_media),
> +        .flags[0] = CXL_EVENT_RECORD_FLAG_PERMANENT,
> +        /* .handle = Set dynamically */
> +        .related_handle = const_le16(0),
> +    },
> +    .phys_addr = const_le64(0x2000),
> +    .descriptor = CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT,
> +    .type = CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,
> +    .transaction_type = CXL_GMER_TRANS_HOST_WRITE,
> +    .validity_flags = { CXL_GMER_VALID_CHANNEL |
> +                        CXL_GMER_VALID_RANK, 0 },
> +    .channel = 1,
> +    .rank = 30
> +};
> +
> +#define CXL_DER_VALID_CHANNEL                           BIT(0)
> +#define CXL_DER_VALID_RANK                              BIT(1)
> +#define CXL_DER_VALID_NIBBLE                            BIT(2)
> +#define CXL_DER_VALID_BANK_GROUP                        BIT(3)
> +#define CXL_DER_VALID_BANK                              BIT(4)
> +#define CXL_DER_VALID_ROW                               BIT(5)
> +#define CXL_DER_VALID_COLUMN                            BIT(6)
> +#define CXL_DER_VALID_CORRECTION_MASK                   BIT(7)
> +
> +struct cxl_event_dram dram = {
> +    .hdr = {
> +        .id.data = UUID(0x601dcbb3, 0x9c06, 0x4eab,
> +                        0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24),
> +        .length = sizeof(struct cxl_event_dram),
> +        .flags[0] = CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,
> +        /* .handle = Set dynamically */
> +        .related_handle = const_le16(0),
> +    },
> +    .phys_addr = const_le64(0x8000),
> +    .descriptor = CXL_GMER_EVT_DESC_THRESHOLD_EVENT,
> +    .type = CXL_GMER_MEM_EVT_TYPE_INV_ADDR,
> +    .transaction_type = CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB,
> +    .validity_flags = { CXL_DER_VALID_CHANNEL |
> +                        CXL_DER_VALID_BANK_GROUP |
> +                        CXL_DER_VALID_BANK |
> +                        CXL_DER_VALID_COLUMN, 0 },
> +    .channel = 1,
> +    .bank_group = 5,
> +    .bank = 2,
> +    .column = { 0xDE, 0xAD},
> +};
> +
> +#define CXL_MMER_HEALTH_STATUS_CHANGE           0x00
> +#define CXL_MMER_MEDIA_STATUS_CHANGE            0x01
> +#define CXL_MMER_LIFE_USED_CHANGE               0x02
> +#define CXL_MMER_TEMP_CHANGE                    0x03
> +#define CXL_MMER_DATA_PATH_ERROR                0x04
> +#define CXL_MMER_LAS_ERROR                      0x05

Ah this explains why I didn't find these alongside the structures.
I'd keep them together.  If we need to put the structures in a header
then put the defines there as well.  Puts all the spec related
stuff in one place.

> +
> +#define CXL_DHI_HS_MAINTENANCE_NEEDED           BIT(0)
> +#define CXL_DHI_HS_PERFORMANCE_DEGRADED         BIT(1)
> +#define CXL_DHI_HS_HW_REPLACEMENT_NEEDED        BIT(2)
> +
> +#define CXL_DHI_MS_NORMAL                                    0x00
> +#define CXL_DHI_MS_NOT_READY                                 0x01
> +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOST                    0x02
> +#define CXL_DHI_MS_ALL_DATA_LOST                             0x03
> +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_POWER_LOSS   0x04
> +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_SHUTDOWN     0x05
> +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_IMMINENT           0x06
> +#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_POWER_LOSS      0x07
> +#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_SHUTDOWN        0x08
> +#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_IMMINENT              0x09
> +
> +#define CXL_DHI_AS_NORMAL               0x0
> +#define CXL_DHI_AS_WARNING              0x1
> +#define CXL_DHI_AS_CRITICAL             0x2
> +
> +#define CXL_DHI_AS_LIFE_USED(as)        (as & 0x3)
> +#define CXL_DHI_AS_DEV_TEMP(as)         ((as & 0xC) >> 2)
> +#define CXL_DHI_AS_COR_VOL_ERR_CNT(as)  ((as & 0x10) >> 4)
> +#define CXL_DHI_AS_COR_PER_ERR_CNT(as)  ((as & 0x20) >> 5)
> +
> +struct cxl_event_mem_module mem_module = {
> +    .hdr = {
> +        .id.data = UUID(0xfe927475, 0xdd59, 0x4339,
> +                        0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74),

As mentioned below, a UUID define for each type in the header
probably makes more sense than having this huge thing inline.

> +        .length = sizeof(struct cxl_event_mem_module),
> +        /* .handle = Set dynamically */
> +        .related_handle = const_le16(0),
> +    },
> +    .event_type = CXL_MMER_TEMP_CHANGE,
> +    .info = {
> +        .health_status = CXL_DHI_HS_PERFORMANCE_DEGRADED,
> +        .media_status = CXL_DHI_MS_ALL_DATA_LOST,
> +        .add_status = (CXL_DHI_AS_CRITICAL << 2) |
> +                       (CXL_DHI_AS_WARNING << 4) |
> +                       (CXL_DHI_AS_WARNING << 5),
> +        .device_temp = { 0xDE, 0xAD},

odd spacing

> +        .dirty_shutdown_cnt = { 0xde, 0xad, 0xbe, 0xef },
> +        .cor_vol_err_cnt = { 0xde, 0xad, 0xbe, 0xef },

Could make a reasonable number up rather than deadbeef ;)

> +        .cor_per_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
> +    }
> +};
> +
> +void cxl_mock_add_event_logs(CXLDeviceState *cxlds)
> +{

This is fine for initial testing, but I think we want to be more
sophisticated with the injection interface and allow injecting
individual events, so we can move the requirement for 'coverage'
testing from having a representative list here to an external script
that hits all the corners.

I can build something on top of this that lets us do that if you like.
I have ancient code doing the equivalent for CCIX devices that I never
upstreamed. I would probably do it a bit differently today, but the
principle is the same. Using QMP directly rather than qmp-shell lets you
do it as JSON, which ends up more readable than complex command lines for
this sort of structured command.



> +    event_store_add_event(cxlds, CXL_EVENT_TYPE_INFO, &maint_needed);
> +    event_store_add_event(cxlds, CXL_EVENT_TYPE_INFO,
> +                          (struct cxl_event_record_raw *)&gen_media);
> +    event_store_add_event(cxlds, CXL_EVENT_TYPE_INFO,
> +                          (struct cxl_event_record_raw *)&mem_module);
> +
> +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL, &maint_needed);
> +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL, &hardware_replace);
> +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL,
> +                          (struct cxl_event_record_raw *)&dram);
> +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL,
> +                          (struct cxl_event_record_raw *)&gen_media);
> +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL,
> +                          (struct cxl_event_record_raw *)&mem_module);
> +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL, &hardware_replace);
> +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL,
> +                          (struct cxl_event_record_raw *)&dram);
> +
> +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FATAL, &hardware_replace);
> +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FATAL,
> +                          (struct cxl_event_record_raw *)&dram);
> +}


> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index 7b4cff569347..46c50c1c13a6 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -11,6 +11,7 @@
>  #define CXL_DEVICE_H
>  
>  #include "hw/register.h"
> +#include "hw/cxl/cxl_events.h"
>  
>  /*
>   * The following is how a CXL device's Memory Device registers are laid out.
> @@ -80,6 +81,14 @@
>      (CXL_DEVICE_CAP_REG_SIZE + CXL_DEVICE_STATUS_REGISTERS_LENGTH +     \
>       CXL_MAILBOX_REGISTERS_LENGTH + CXL_MEMORY_DEVICE_REGISTERS_LENGTH)
>  
> +#define CXL_TEST_EVENT_CNT_MAX 15

Where did 15 come from?

> +
> +struct cxl_event_log {
> +    int cur_event;
> +    int nr_events;
> +    struct cxl_event_record_raw *events[CXL_TEST_EVENT_CNT_MAX];
> +};
> +
>  typedef struct cxl_device_state {
>      MemoryRegion device_registers;
>  
> @@ -119,6 +128,8 @@ typedef struct cxl_device_state {
>  
>      /* memory region for persistent memory, HDM */
>      uint64_t pmem_size;
> +
> +    struct cxl_event_log event_logs[CXL_EVENT_TYPE_MAX];
>  } CXLDeviceState;
>  
>  /* Initialize the register block for a device */
> @@ -272,4 +283,12 @@ MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, uint64_t *data,
>  MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
>                              unsigned size, MemTxAttrs attrs);
>  
> +void cxl_mock_add_event_logs(CXLDeviceState *cxlds);
> +struct cxl_event_log *find_event_log(CXLDeviceState *cxlds, int log_type);
> +struct cxl_event_record_raw *get_cur_event(struct cxl_event_log *log);
> +uint16_t get_cur_event_handle(struct cxl_event_log *log);
> +bool log_empty(struct cxl_event_log *log);
> +int log_rec_left(struct cxl_event_log *log);
> +uint16_t log_overflow(struct cxl_event_log *log);
> +
>  #endif
> diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
> new file mode 100644
> index 000000000000..255111f3dcfb
> --- /dev/null
> +++ b/include/hw/cxl/cxl_events.h
> @@ -0,0 +1,173 @@
> +/*
> + * QEMU CXL Events
> + *
> + * Copyright (c) 2022 Intel
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2. See the
> + * COPYING file in the top-level directory.
> + */
> +
> +#ifndef CXL_EVENTS_H
> +#define CXL_EVENTS_H
> +
> +#include "qemu/uuid.h"
> +#include "hw/cxl/cxl.h"
> +
> +/*
> + * Common Event Record Format
> + * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
> + */
> +#define CXL_EVENT_REC_HDR_RES_LEN 0xf

I don't see an advantage in this define vs just
putting the value in directly below.
Same with similar cases - the define just makes them
a tiny bit harder to compare with the specification when
reviewing.

> +struct cxl_event_record_hdr {
> +    QemuUUID id;
> +    uint8_t length;
> +    uint8_t flags[3];
> +    uint16_t handle;
> +    uint16_t related_handle;
> +    uint64_t timestamp;
> +    uint8_t maint_op_class;
> +    uint8_t reserved[CXL_EVENT_REC_HDR_RES_LEN];
> +} QEMU_PACKED;
> +
> +#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
> +struct cxl_event_record_raw {
> +    struct cxl_event_record_hdr hdr;
> +    uint8_t data[CXL_EVENT_RECORD_DATA_LENGTH];
> +} QEMU_PACKED;

Hmm. I wonder if we should instead define this as a union of
the known event types?  I haven't checked if it would work
everywhere yet though.
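
A rough sketch of the union idea, assuming a GCC-style packed attribute
and simplified stand-in record layouts. All names are hypothetical, and
only the sizes (0x30 header, 0x50 data) are taken from the patch:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PACKED __attribute__((packed))

/* Opaque stand-in for the 0x30 byte common header. */
struct sketch_hdr {
    uint8_t bytes[0x30];
} SKETCH_PACKED;

struct sketch_raw {
    struct sketch_hdr hdr;
    uint8_t data[0x50];
} SKETCH_PACKED;

/* Simplified: only the first field of each record type is spelled out. */
struct sketch_gen_media {
    struct sketch_hdr hdr;
    uint64_t phys_addr;
    uint8_t rest[0x48];
} SKETCH_PACKED;

struct sketch_mem_module {
    struct sketch_hdr hdr;
    uint8_t event_type;
    uint8_t rest[0x4f];
} SKETCH_PACKED;

/* One storage type for every known record, plus the raw view. */
union sketch_event_record {
    struct sketch_raw raw;
    struct sketch_gen_media gen_media;
    struct sketch_mem_module mem_module;
};

static_assert(sizeof(union sketch_event_record) == sizeof(struct sketch_raw),
              "typed views must not be larger than the raw record");
```

The log could then store the union directly and the casts to
struct cxl_event_record_raw * in cxl_mock_add_event_logs() would not be
needed, at least in this simplified form.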

> +
> +/*
> + * Get Event Records output payload
> + * CXL rev 3.0 section 8.2.9.2.2; Table 8-50
> + *
> + * Space given for 1 record
> + */
> +#define CXL_GET_EVENT_FLAG_OVERFLOW     BIT(0)
> +#define CXL_GET_EVENT_FLAG_MORE_RECORDS BIT(1)
> +struct cxl_get_event_payload {
> +    uint8_t flags;
> +    uint8_t reserved1;
> +    uint16_t overflow_err_count;
> +    uint64_t first_overflow_timestamp;
> +    uint64_t last_overflow_timestamp;
> +    uint16_t record_count;
> +    uint8_t reserved2[0xa];
> +    struct cxl_event_record_raw record;

This last element should be a [] array and then move
the handling of different record counts to the places it
is used.

Spec unfortunately says that we should return as many
as we can fit, so we can't rely on the users of this interface
only sending a request for one record (as I think your Linux
kernel code currently does). See below for more on this...
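
As a sketch of the flexible array form, assuming a GCC-style packed
attribute: the struct and helper names are hypothetical, the field
layout is copied from the hunk above, and an opaque 0x80 byte type
stands in for the record. The helper computes how many records to
return for a given payload capacity:

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PACKED __attribute__((packed))

/* Opaque stand-in for one 0x80 byte event record. */
struct sketch_event_record {
    uint8_t bytes[0x80];
} SKETCH_PACKED;

struct sketch_get_event_payload {
    uint8_t flags;
    uint8_t reserved1;
    uint16_t overflow_err_count;
    uint64_t first_overflow_timestamp;
    uint64_t last_overflow_timestamp;
    uint16_t record_count;
    uint8_t reserved2[0xa];
    struct sketch_event_record records[];   /* as many as fit */
} SKETCH_PACKED;

/*
 * Number of records to return: limited both by what is pending in the
 * log and by how many whole records fit after the fixed header.
 */
static size_t sketch_records_to_return(size_t payload_max, size_t pending)
{
    size_t fixed = sizeof(struct sketch_get_event_payload);
    size_t fit;

    if (payload_max < fixed) {
        return 0;
    }
    fit = (payload_max - fixed) / sizeof(struct sketch_event_record);
    return pending < fit ? pending : fit;
}
```

The caller would set record_count to the returned value and flag more
records pending whenever the log still holds more than that.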


> +} QEMU_PACKED;
> +
> +/*
> + * CXL rev 3.0 section 8.2.9.2.2; Table 8-49
> + */
> +enum cxl_event_log_type {
> +    CXL_EVENT_TYPE_INFO = 0x00,
> +    CXL_EVENT_TYPE_WARN,
> +    CXL_EVENT_TYPE_FAIL,
> +    CXL_EVENT_TYPE_FATAL,
> +    CXL_EVENT_TYPE_DYNAMIC_CAP,
> +    CXL_EVENT_TYPE_MAX
> +};
> +
> +static inline const char *cxl_event_log_type_str(enum cxl_event_log_type type)
> +{
> +    switch (type) {
> +    case CXL_EVENT_TYPE_INFO:
> +        return "Informational";
> +    case CXL_EVENT_TYPE_WARN:
> +        return "Warning";
> +    case CXL_EVENT_TYPE_FAIL:
> +        return "Failure";
> +    case CXL_EVENT_TYPE_FATAL:
> +        return "Fatal";
> +    case CXL_EVENT_TYPE_DYNAMIC_CAP:
> +        return "Dynamic Capacity";
> +    default:
> +        break;
> +    }
> +    return "<unknown>";
> +}
> +
> +/*
> + * Clear Event Records input payload
> + * CXL rev 3.0 section 8.2.9.2.3; Table 8-51
> + *
> + * Space given for 1 record

I'd rather this was defined to have a trailing variable length
array of handles and allocations and then wherever it was used
we deal with the length.

I'm also nervous about limiting the qemu emulation to handling only
one record. Spec-wise, I don't think you are allowed to say
no to larger clears.  I understand that we can't test this today
with the kernel code, but maybe we can hack together enough to
verify the emulation of larger gets and clears...
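
A minimal sketch of the variable length form, assuming a GCC-style
packed attribute. The names are hypothetical and the fields are copied
from the struct below, with the single handle replaced by a trailing
array:

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PACKED __attribute__((packed))

struct sketch_clear_event_payload {
    uint8_t event_log;      /* enum cxl_event_log_type */
    uint8_t clear_flags;
    uint8_t nr_recs;        /* number of entries in handle[] */
    uint8_t reserved[3];
    uint16_t handle[];      /* nr_recs little-endian handles */
} SKETCH_PACKED;

/* Input payload length expected for a clear of nr_recs records. */
static size_t sketch_clear_payload_len(uint8_t nr_recs)
{
    return sizeof(struct sketch_clear_event_payload) +
           (size_t)nr_recs * sizeof(uint16_t);
}
```

The mailbox handler could compare the received input length against
this value before walking handle[], which covers clears of any size up
to what nr_recs can express.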


> + */
> +struct cxl_mbox_clear_event_payload {
> +    uint8_t event_log;      /* enum cxl_event_log_type */
> +    uint8_t clear_flags;
> +    uint8_t nr_recs;        /* 1 for this struct */
> +    uint8_t reserved[3];
> +    uint16_t handle;
> +};
> +
> +/*
> + * General Media Event Record
> + * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
> + */

In interests of keeping everything that needs checking against
a chunk of the spec together, perhaps it's worth adding appropriate
defines for the UUIDs?
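
Something along these lines, as a sketch only. SKETCH_UUID is a
hypothetical stand-in for the UUID() initializer added in patch 2, and
its byte order (fields stored most significant byte first) is an
assumption. The define names are also made up; the UUID value is the one
used by the gen_media mock record in this patch:

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the UUID() initializer from patch 2.
 * Assumption: each field is stored most significant byte first.
 */
#define SKETCH_UUID(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
    { ((a) >> 24) & 0xff, ((a) >> 16) & 0xff,                       \
      ((a) >> 8) & 0xff, (a) & 0xff,                                \
      ((b) >> 8) & 0xff, (b) & 0xff,                                \
      ((c) >> 8) & 0xff, (c) & 0xff,                                \
      (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) }

/* Kept next to the record definition it identifies. */
#define SKETCH_GEN_MEDIA_UUID                                       \
    SKETCH_UUID(0xfbcd0a77, 0xc260, 0x417f,                         \
                0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6)

/* The mock record initializer then shrinks to a single name. */
static const uint8_t sketch_gen_media_uuid[16] = SKETCH_GEN_MEDIA_UUID;
```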

> +#define CXL_EVENT_GEN_MED_COMP_ID_SIZE  0x10
> +#define CXL_EVENT_GEN_MED_RES_SIZE      0x2e

As above, I'd rather see these inline.

> +struct cxl_event_gen_media {
> +    struct cxl_event_record_hdr hdr;
> +    uint64_t phys_addr;
Defines for the mask and for the few things we have hiding in
the bottom bits?

> +    uint8_t descriptor;
Defines for the various fields in here?

> +    uint8_t type;
Same for the others that follow.

> +    uint8_t transaction_type;

> +    uint8_t validity_flags[2];

uint16_t probably makes sense as we can do that for this one (unlike the
helpful le24 flags fields in other structures).
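
To illustrate the difference: a two byte field maps onto uint16_t and
needs a single conversion, while a three byte field has no native type
and has to be handled byte by byte. The helper names below are
hypothetical and exist only for this sketch:

```c
#include <stdint.h>

/* Store a 16-bit value in little-endian byte order. */
static inline void sketch_st16_le(uint8_t *p, uint16_t v)
{
    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
}

/* Store the low 24 bits of v in little-endian byte order. */
static inline void sketch_st24_le(uint8_t *p, uint32_t v)
{
    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
    p[2] = (v >> 16) & 0xff;
}

/* Load a 24-bit little-endian value. */
static inline uint32_t sketch_ld24_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
}
```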

> +    uint8_t channel;
> +    uint8_t rank;
> +    uint8_t device[3];
> +    uint8_t component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
> +    uint8_t reserved[CXL_EVENT_GEN_MED_RES_SIZE];
> +} QEMU_PACKED;
Would be nice to add a build time check that these structures have the correct
overall size. Ben did a bunch of these in the other CXL emulation and they are
a handy way to reassure reviewers that it adds up right!
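
A sketch of such a check for the general media record, assuming a
GCC-style packed attribute and C11 static_assert (in tree this would
presumably be QEMU_BUILD_BUG_ON). The struct names are hypothetical, the
fields are copied from this patch with the reserved sizes written
inline, and the expected sizes are the 0x30 header plus 0x50 bytes of
data:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PACKED __attribute__((packed))

struct sketch_event_record_hdr {
    uint8_t id[16];             /* UUID */
    uint8_t length;
    uint8_t flags[3];
    uint16_t handle;
    uint16_t related_handle;
    uint64_t timestamp;
    uint8_t maint_op_class;
    uint8_t reserved[0xf];
} SKETCH_PACKED;

struct sketch_event_gen_media {
    struct sketch_event_record_hdr hdr;
    uint64_t phys_addr;
    uint8_t descriptor;
    uint8_t type;
    uint8_t transaction_type;
    uint16_t validity_flags;
    uint8_t channel;
    uint8_t rank;
    uint8_t device[3];
    uint8_t component_id[0x10];
    uint8_t reserved[0x2e];
} SKETCH_PACKED;

/* The build fails if a field is added, dropped or mis-sized. */
static_assert(sizeof(struct sketch_event_record_hdr) == 0x30,
              "common event record header size");
static_assert(sizeof(struct sketch_event_gen_media) == 0x80,
              "general media event record size");
```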

> +
> +/*
> + * DRAM Event Record - DER
> + * CXL rev 3.0 section 8.2.9.2.1.2; Table 3-44
> + */
> +#define CXL_EVENT_DER_CORRECTION_MASK_SIZE   0x20
> +#define CXL_EVENT_DER_RES_SIZE               0x17
Same as above.

> +struct cxl_event_dram {
> +    struct cxl_event_record_hdr hdr;
> +    uint64_t phys_addr;
As before I'd like defines for the sub fields and masks.

> +    uint8_t descriptor;
> +    uint8_t type;
> +    uint8_t transaction_type;
> +    uint8_t validity_flags[2];
uint16_t and same in similar cases.

> +    uint8_t channel;
> +    uint8_t rank;
> +    uint8_t nibble_mask[3];
> +    uint8_t bank_group;
> +    uint8_t bank;
> +    uint8_t row[3];
> +    uint8_t column[2];
> +    uint8_t correction_mask[CXL_EVENT_DER_CORRECTION_MASK_SIZE];
> +    uint8_t reserved[CXL_EVENT_DER_RES_SIZE];
> +} QEMU_PACKED;
> +
> +/*
> + * Get Health Info Record
> + * CXL rev 3.0 section 8.2.9.8.3.1; Table 8-100
> + */
> +struct cxl_get_health_info {
Same stuff as for earlier structures.

> +    uint8_t health_status;
> +    uint8_t media_status;
> +    uint8_t add_status;
> +    uint8_t life_used;
> +    uint8_t device_temp[2];
> +    uint8_t dirty_shutdown_cnt[4];
> +    uint8_t cor_vol_err_cnt[4];
> +    uint8_t cor_per_err_cnt[4];
> +} QEMU_PACKED;
> +
> +/*
> + * Memory Module Event Record
> + * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
> + */
> +#define CXL_EVENT_MEM_MOD_RES_SIZE  0x3d
> +struct cxl_event_mem_module {
> +    struct cxl_event_record_hdr hdr;
> +    uint8_t event_type;
> +    struct cxl_get_health_info info;
> +    uint8_t reserved[CXL_EVENT_MEM_MOD_RES_SIZE];
> +} QEMU_PACKED;
> +
> +#endif /* CXL_EVENTS_H */


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 3/6] hw/cxl/cxl-events: Add CXL mock events
@ 2022-10-11 10:07     ` Jonathan Cameron via
  0 siblings, 0 replies; 35+ messages in thread
From: Jonathan Cameron via @ 2022-10-11 10:07 UTC (permalink / raw)
  To: ira.weiny; +Cc: Michael Tsirkin, Ben Widawsky, qemu-devel, linux-cxl

On Mon, 10 Oct 2022 15:29:41 -0700
ira.weiny@intel.com wrote:

> From: Ira Weiny <ira.weiny@intel.com>
> 
> To facilitate testing of guest software add mock events and code to
> support iterating through the event logs.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Various comments inline, but biggest one is I'd like to see
a much more flexible injection interface.  Happy to help code one
up if that is useful.

Jonathan


> ---
>  hw/cxl/cxl-events.c         | 248 ++++++++++++++++++++++++++++++++++++
>  hw/cxl/meson.build          |   1 +
>  include/hw/cxl/cxl_device.h |  19 +++
>  include/hw/cxl/cxl_events.h | 173 +++++++++++++++++++++++++
>  4 files changed, 441 insertions(+)
>  create mode 100644 hw/cxl/cxl-events.c
>  create mode 100644 include/hw/cxl/cxl_events.h
> 
> diff --git a/hw/cxl/cxl-events.c b/hw/cxl/cxl-events.c
> new file mode 100644
> index 000000000000..c275280bcb64
> --- /dev/null
> +++ b/hw/cxl/cxl-events.c
> @@ -0,0 +1,248 @@
> +/*
> + * CXL Event processing
> + *
> + * Copyright(C) 2022 Intel Corporation.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2. See the
> + * COPYING file in the top-level directory.
> + */
> +
> +#include <stdint.h>
> +
> +#include "qemu/osdep.h"
> +#include "qemu/bswap.h"
> +#include "qemu/typedefs.h"
> +#include "hw/cxl/cxl.h"
> +#include "hw/cxl/cxl_events.h"
> +
> +struct cxl_event_log *find_event_log(CXLDeviceState *cxlds, int log_type)
> +{
> +    if (log_type >= CXL_EVENT_TYPE_MAX) {
> +        return NULL;
> +    }
> +    return &cxlds->event_logs[log_type];
> +}
> +
> +struct cxl_event_record_raw *get_cur_event(struct cxl_event_log *log)
> +{
> +    return log->events[log->cur_event];
> +}
> +
> +uint16_t get_cur_event_handle(struct cxl_event_log *log)
> +{
> +    return cpu_to_le16(log->cur_event);
> +}
> +
> +bool log_empty(struct cxl_event_log *log)
> +{
> +    return log->cur_event == log->nr_events;
> +}
> +
> +int log_rec_left(struct cxl_event_log *log)
> +{
> +    return log->nr_events - log->cur_event;
> +}
> +
> +static void event_store_add_event(CXLDeviceState *cxlds,
> +                                  enum cxl_event_log_type log_type,
> +                                  struct cxl_event_record_raw *event)
> +{
> +    struct cxl_event_log *log;
> +
> +    assert(log_type < CXL_EVENT_TYPE_MAX);
> +
> +    log = &cxlds->event_logs[log_type];
> +    assert(log->nr_events < CXL_TEST_EVENT_CNT_MAX);
> +
> +    log->events[log->nr_events] = event;
> +    log->nr_events++;
> +}
> +
> +uint16_t log_overflow(struct cxl_event_log *log)
> +{
> +    int cnt = log_rec_left(log) - 5;

Why -5?  Can't we make it actually overflow and drop records
if that happens?

> +
> +    if (cnt < 0) {
> +        return 0;
> +    }
> +    return cnt;
> +}
> +
> +#define CXL_EVENT_RECORD_FLAG_PERMANENT         BIT(2)
> +#define CXL_EVENT_RECORD_FLAG_MAINT_NEEDED      BIT(3)
> +#define CXL_EVENT_RECORD_FLAG_PERF_DEGRADED     BIT(4)
> +#define CXL_EVENT_RECORD_FLAG_HW_REPLACE        BIT(5)
> +
> +struct cxl_event_record_raw maint_needed = {
> +    .hdr = {
> +        .id.data = UUID(0xDEADBEEF, 0xCAFE, 0xBABE,
> +                        0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
> +        .length = sizeof(struct cxl_event_record_raw),
> +        .flags[0] = CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,
> +        /* .handle = Set dynamically */
> +        .related_handle = const_le16(0xa5b6),
> +    },
> +    .data = { 0xDE, 0xAD, 0xBE, 0xEF },
> +};
> +
> +struct cxl_event_record_raw hardware_replace = {
> +    .hdr = {
> +        .id.data = UUID(0xBABECAFE, 0xBEEF, 0xDEAD,
> +                        0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
> +        .length = sizeof(struct cxl_event_record_raw),
> +        .flags[0] = CXL_EVENT_RECORD_FLAG_HW_REPLACE,
> +        /* .handle = Set dynamically */
> +        .related_handle = const_le16(0xb6a5),
> +    },
> +    .data = { 0xDE, 0xAD, 0xBE, 0xEF },
> +};
> +
> +#define CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT            BIT(0)
> +#define CXL_GMER_EVT_DESC_THRESHOLD_EVENT               BIT(1)
> +#define CXL_GMER_EVT_DESC_POISON_LIST_OVERFLOW          BIT(2)
> +
> +#define CXL_GMER_MEM_EVT_TYPE_ECC_ERROR                 0x00
> +#define CXL_GMER_MEM_EVT_TYPE_INV_ADDR                  0x01
> +#define CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR           0x02
> +
> +#define CXL_GMER_TRANS_UNKNOWN                          0x00
> +#define CXL_GMER_TRANS_HOST_READ                        0x01
> +#define CXL_GMER_TRANS_HOST_WRITE                       0x02
> +#define CXL_GMER_TRANS_HOST_SCAN_MEDIA                  0x03
> +#define CXL_GMER_TRANS_HOST_INJECT_POISON               0x04
> +#define CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB             0x05
> +#define CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT        0x06
> +
> +#define CXL_GMER_VALID_CHANNEL                          BIT(0)
> +#define CXL_GMER_VALID_RANK                             BIT(1)
> +#define CXL_GMER_VALID_DEVICE                           BIT(2)
> +#define CXL_GMER_VALID_COMPONENT                        BIT(3)
> +
> +struct cxl_event_gen_media gen_media = {
> +    .hdr = {
> +        .id.data = UUID(0xfbcd0a77, 0xc260, 0x417f,
> +                        0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6),
> +        .length = sizeof(struct cxl_event_gen_media),
> +        .flags[0] = CXL_EVENT_RECORD_FLAG_PERMANENT,
> +        /* .handle = Set dynamically */
> +        .related_handle = const_le16(0),
> +    },
> +    .phys_addr = const_le64(0x2000),
> +    .descriptor = CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT,
> +    .type = CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,
> +    .transaction_type = CXL_GMER_TRANS_HOST_WRITE,
> +    .validity_flags = { CXL_GMER_VALID_CHANNEL |
> +                        CXL_GMER_VALID_RANK, 0 },
> +    .channel = 1,
> +    .rank = 30
> +};
> +
> +#define CXL_DER_VALID_CHANNEL                           BIT(0)
> +#define CXL_DER_VALID_RANK                              BIT(1)
> +#define CXL_DER_VALID_NIBBLE                            BIT(2)
> +#define CXL_DER_VALID_BANK_GROUP                        BIT(3)
> +#define CXL_DER_VALID_BANK                              BIT(4)
> +#define CXL_DER_VALID_ROW                               BIT(5)
> +#define CXL_DER_VALID_COLUMN                            BIT(6)
> +#define CXL_DER_VALID_CORRECTION_MASK                   BIT(7)
> +
> +struct cxl_event_dram dram = {
> +    .hdr = {
> +        .id.data = UUID(0x601dcbb3, 0x9c06, 0x4eab,
> +                        0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24),
> +        .length = sizeof(struct cxl_event_dram),
> +        .flags[0] = CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,
> +        /* .handle = Set dynamically */
> +        .related_handle = const_le16(0),
> +    },
> +    .phys_addr = const_le64(0x8000),
> +    .descriptor = CXL_GMER_EVT_DESC_THRESHOLD_EVENT,
> +    .type = CXL_GMER_MEM_EVT_TYPE_INV_ADDR,
> +    .transaction_type = CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB,
> +    .validity_flags = { CXL_DER_VALID_CHANNEL |
> +                        CXL_DER_VALID_BANK_GROUP |
> +                        CXL_DER_VALID_BANK |
> +                        CXL_DER_VALID_COLUMN, 0 },
> +    .channel = 1,
> +    .bank_group = 5,
> +    .bank = 2,
> +    .column = { 0xDE, 0xAD},
> +};
> +
> +#define CXL_MMER_HEALTH_STATUS_CHANGE           0x00
> +#define CXL_MMER_MEDIA_STATUS_CHANGE            0x01
> +#define CXL_MMER_LIFE_USED_CHANGE               0x02
> +#define CXL_MMER_TEMP_CHANGE                    0x03
> +#define CXL_MMER_DATA_PATH_ERROR                0x04
> +#define CXL_MMER_LAS_ERROR                      0x05

Ah this explains why I didn't find these alongside the structures.
I'd keep them together.  If we need to put the structures in a header
then put the defines there as well.  Puts all the spec related
stuff in one place.

> +
> +#define CXL_DHI_HS_MAINTENANCE_NEEDED           BIT(0)
> +#define CXL_DHI_HS_PERFORMANCE_DEGRADED         BIT(1)
> +#define CXL_DHI_HS_HW_REPLACEMENT_NEEDED        BIT(2)
> +
> +#define CXL_DHI_MS_NORMAL                                    0x00
> +#define CXL_DHI_MS_NOT_READY                                 0x01
> +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOST                    0x02
> +#define CXL_DHI_MS_ALL_DATA_LOST                             0x03
> +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_POWER_LOSS   0x04
> +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_SHUTDOWN     0x05
> +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_IMMINENT           0x06
> +#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_POWER_LOSS      0x07
> +#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_SHUTDOWN        0x08
> +#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_IMMINENT              0x09
> +
> +#define CXL_DHI_AS_NORMAL               0x0
> +#define CXL_DHI_AS_WARNING              0x1
> +#define CXL_DHI_AS_CRITICAL             0x2
> +
> +#define CXL_DHI_AS_LIFE_USED(as)        (as & 0x3)
> +#define CXL_DHI_AS_DEV_TEMP(as)         ((as & 0xC) >> 2)
> +#define CXL_DHI_AS_COR_VOL_ERR_CNT(as)  ((as & 0x10) >> 4)
> +#define CXL_DHI_AS_COR_PER_ERR_CNT(as)  ((as & 0x20) >> 5)
> +
> +struct cxl_event_mem_module mem_module = {
> +    .hdr = {
> +        .id.data = UUID(0xfe927475, 0xdd59, 0x4339,
> +                        0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74),

As mentioned below, a UUID define for each type in the header
probably makes more sense than having this huge thing inline.

> +        .length = sizeof(struct cxl_event_mem_module),
> +        /* .handle = Set dynamically */
> +        .related_handle = const_le16(0),
> +    },
> +    .event_type = CXL_MMER_TEMP_CHANGE,
> +    .info = {
> +        .health_status = CXL_DHI_HS_PERFORMANCE_DEGRADED,
> +        .media_status = CXL_DHI_MS_ALL_DATA_LOST,
> +        .add_status = (CXL_DHI_AS_CRITICAL << 2) |
> +                       (CXL_DHI_AS_WARNING << 4) |
> +                       (CXL_DHI_AS_WARNING << 5),
> +        .device_temp = { 0xDE, 0xAD},

odd spacing

> +        .dirty_shutdown_cnt = { 0xde, 0xad, 0xbe, 0xef },
> +        .cor_vol_err_cnt = { 0xde, 0xad, 0xbe, 0xef },

Could make a reasonable number up rather than deadbeef ;)

> +        .cor_per_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
> +    }
> +};
> +
> +void cxl_mock_add_event_logs(CXLDeviceState *cxlds)
> +{

This is fine for initial testing, but I think we want to be more
sophisticated with the injection interface and allow injecting
individual events so we can move the requirement for 'coverage'
testing from having a representative list here to an external script
that hits all the corners.

I can build something on top of this that lets us do that if you like.
I have ancient code doing the equivalent for CCIX devices that I never
upstreamed. Would probably do it a bit differently today but the principle
is the same. Using QMP directly rather than qmp-shell lets you do it
as JSON, which ends up more readable than complex command lines for this
sort of structured command.



> +    event_store_add_event(cxlds, CXL_EVENT_TYPE_INFO, &maint_needed);
> +    event_store_add_event(cxlds, CXL_EVENT_TYPE_INFO,
> +                          (struct cxl_event_record_raw *)&gen_media);
> +    event_store_add_event(cxlds, CXL_EVENT_TYPE_INFO,
> +                          (struct cxl_event_record_raw *)&mem_module);
> +
> +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL, &maint_needed);
> +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL, &hardware_replace);
> +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL,
> +                          (struct cxl_event_record_raw *)&dram);
> +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL,
> +                          (struct cxl_event_record_raw *)&gen_media);
> +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL,
> +                          (struct cxl_event_record_raw *)&mem_module);
> +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL, &hardware_replace);
> +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL,
> +                          (struct cxl_event_record_raw *)&dram);
> +
> +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FATAL, &hardware_replace);
> +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FATAL,
> +                          (struct cxl_event_record_raw *)&dram);
> +}


> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index 7b4cff569347..46c50c1c13a6 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -11,6 +11,7 @@
>  #define CXL_DEVICE_H
>  
>  #include "hw/register.h"
> +#include "hw/cxl/cxl_events.h"
>  
>  /*
>   * The following is how a CXL device's Memory Device registers are laid out.
> @@ -80,6 +81,14 @@
>      (CXL_DEVICE_CAP_REG_SIZE + CXL_DEVICE_STATUS_REGISTERS_LENGTH +     \
>       CXL_MAILBOX_REGISTERS_LENGTH + CXL_MEMORY_DEVICE_REGISTERS_LENGTH)
>  
> +#define CXL_TEST_EVENT_CNT_MAX 15

Where did 15 come from?

> +
> +struct cxl_event_log {
> +    int cur_event;
> +    int nr_events;
> +    struct cxl_event_record_raw *events[CXL_TEST_EVENT_CNT_MAX];
> +};
> +
>  typedef struct cxl_device_state {
>      MemoryRegion device_registers;
>  
> @@ -119,6 +128,8 @@ typedef struct cxl_device_state {
>  
>      /* memory region for persistent memory, HDM */
>      uint64_t pmem_size;
> +
> +    struct cxl_event_log event_logs[CXL_EVENT_TYPE_MAX];
>  } CXLDeviceState;
>  
>  /* Initialize the register block for a device */
> @@ -272,4 +283,12 @@ MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, uint64_t *data,
>  MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
>                              unsigned size, MemTxAttrs attrs);
>  
> +void cxl_mock_add_event_logs(CXLDeviceState *cxlds);
> +struct cxl_event_log *find_event_log(CXLDeviceState *cxlds, int log_type);
> +struct cxl_event_record_raw *get_cur_event(struct cxl_event_log *log);
> +uint16_t get_cur_event_handle(struct cxl_event_log *log);
> +bool log_empty(struct cxl_event_log *log);
> +int log_rec_left(struct cxl_event_log *log);
> +uint16_t log_overflow(struct cxl_event_log *log);
> +
>  #endif
> diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
> new file mode 100644
> index 000000000000..255111f3dcfb
> --- /dev/null
> +++ b/include/hw/cxl/cxl_events.h
> @@ -0,0 +1,173 @@
> +/*
> + * QEMU CXL Events
> + *
> + * Copyright (c) 2022 Intel
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2. See the
> + * COPYING file in the top-level directory.
> + */
> +
> +#ifndef CXL_EVENTS_H
> +#define CXL_EVENTS_H
> +
> +#include "qemu/uuid.h"
> +#include "hw/cxl/cxl.h"
> +
> +/*
> + * Common Event Record Format
> + * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
> + */
> +#define CXL_EVENT_REC_HDR_RES_LEN 0xf

I don't see an advantage in this define vs just
putting the value in directly below.
Same with similar cases - the define just makes them
a tiny bit harder to compare with the specification when
reviewing.

> +struct cxl_event_record_hdr {
> +    QemuUUID id;
> +    uint8_t length;
> +    uint8_t flags[3];
> +    uint16_t handle;
> +    uint16_t related_handle;
> +    uint64_t timestamp;
> +    uint8_t maint_op_class;
> +    uint8_t reserved[CXL_EVENT_REC_HDR_RES_LEN];
> +} QEMU_PACKED;
> +
> +#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
> +struct cxl_event_record_raw {
> +    struct cxl_event_record_hdr hdr;
> +    uint8_t data[CXL_EVENT_RECORD_DATA_LENGTH];
> +} QEMU_PACKED;

Hmm. I wonder if we should instead define this as a union of
the known event types?  I haven't checked if it would work
everywhere yet though.

> +
> +/*
> + * Get Event Records output payload
> + * CXL rev 3.0 section 8.2.9.2.2; Table 8-50
> + *
> + * Space given for 1 record
> + */
> +#define CXL_GET_EVENT_FLAG_OVERFLOW     BIT(0)
> +#define CXL_GET_EVENT_FLAG_MORE_RECORDS BIT(1)
> +struct cxl_get_event_payload {
> +    uint8_t flags;
> +    uint8_t reserved1;
> +    uint16_t overflow_err_count;
> +    uint64_t first_overflow_timestamp;
> +    uint64_t last_overflow_timestamp;
> +    uint16_t record_count;
> +    uint8_t reserved2[0xa];
> +    struct cxl_event_record_raw record;

This last element should be a [] array and then move
the handling of different record counts to the places it
is used.

Spec unfortunately says that we should return as many
as we can fit, so we can't rely on the users of this interface
only sending a request for one record (as I think your Linux
kernel code currently does). See below for more on this...
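
A rough sketch of the layout suggested above, not QEMU code: the struct
and helper names (event_record_raw, get_event_payload, records_to_return)
are made up for illustration, and the record is reduced to an opaque
0x80-byte blob.  It only shows the trailing [] array and how the record
count could be derived from the mailbox payload capacity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct cxl_event_record_raw: 0x30 header + 0x50 data bytes. */
struct event_record_raw {
    uint8_t bytes[0x80];
} __attribute__((packed));

/* Get Event Records output payload with a trailing flexible array. */
struct get_event_payload {
    uint8_t flags;
    uint8_t reserved1;
    uint16_t overflow_err_count;
    uint64_t first_overflow_timestamp;
    uint64_t last_overflow_timestamp;
    uint16_t record_count;
    uint8_t reserved2[0xa];
    struct event_record_raw records[];
} __attribute__((packed));

static_assert(sizeof(struct get_event_payload) == 0x20,
              "fixed part of the payload is 0x20 bytes");

/*
 * Hypothetical helper: number of records to return, given the mailbox
 * payload capacity in bytes and the number of records pending in the log.
 */
static inline uint16_t records_to_return(size_t payload_capacity,
                                         size_t pending)
{
    size_t fixed = offsetof(struct get_event_payload, records);
    size_t fit;

    if (payload_capacity < fixed) {
        return 0;
    }
    fit = (payload_capacity - fixed) / sizeof(struct event_record_raw);
    if (fit > UINT16_MAX) {
        fit = UINT16_MAX;
    }
    return (uint16_t)(pending < fit ? pending : fit);
}
```

The More Event Records flag would then be set whenever pending exceeds
the returned count.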


> +} QEMU_PACKED;
> +
> +/*
> + * CXL rev 3.0 section 8.2.9.2.2; Table 8-49
> + */
> +enum cxl_event_log_type {
> +    CXL_EVENT_TYPE_INFO = 0x00,
> +    CXL_EVENT_TYPE_WARN,
> +    CXL_EVENT_TYPE_FAIL,
> +    CXL_EVENT_TYPE_FATAL,
> +    CXL_EVENT_TYPE_DYNAMIC_CAP,
> +    CXL_EVENT_TYPE_MAX
> +};
> +
> +static inline const char *cxl_event_log_type_str(enum cxl_event_log_type type)
> +{
> +    switch (type) {
> +    case CXL_EVENT_TYPE_INFO:
> +        return "Informational";
> +    case CXL_EVENT_TYPE_WARN:
> +        return "Warning";
> +    case CXL_EVENT_TYPE_FAIL:
> +        return "Failure";
> +    case CXL_EVENT_TYPE_FATAL:
> +        return "Fatal";
> +    case CXL_EVENT_TYPE_DYNAMIC_CAP:
> +        return "Dynamic Capacity";
> +    default:
> +        break;
> +    }
> +    return "<unknown>";
> +}
> +
> +/*
> + * Clear Event Records input payload
> + * CXL rev 3.0 section 8.2.9.2.3; Table 8-51
> + *
> + * Space given for 1 record

I'd rather this was defined to have a trailing variable length
array of handles and allocations and then wherever it was used
we deal with the length.

I'm also nervous about limiting the qemu emulation to handling only
one record.  Spec-wise, I don't think you are allowed to say
no to larger clears.  I understand that we can't test this today
with the kernel code, but maybe we can hack together enough to
verify the emulation of larger gets and clears...
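
For illustration only, a sketch of a clear payload with a trailing
array of handles and a length check done at the point of use.  The names
(clear_event_payload, clear_payload_len_ok) are hypothetical, and
requiring an exact length match is an assumption; a real handler might
only require "at least".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Clear Event Records input payload with a variable number of handles. */
struct clear_event_payload {
    uint8_t event_log;
    uint8_t clear_flags;
    uint8_t nr_recs;
    uint8_t reserved[3];
    uint16_t handles[];     /* nr_recs little-endian handles */
} __attribute__((packed));

/*
 * Hypothetical check: does the input length match the fixed part plus
 * nr_recs handles?  Works on the raw byte buffer to avoid alignment
 * assumptions about the mailbox payload.
 */
static inline bool clear_payload_len_ok(const uint8_t *buf, size_t in_len)
{
    size_t fixed = offsetof(struct clear_event_payload, handles);
    uint8_t nr_recs;

    assert(buf != NULL);
    if (in_len < fixed) {
        return false;
    }
    nr_recs = buf[offsetof(struct clear_event_payload, nr_recs)];
    return in_len == fixed + (size_t)nr_recs * sizeof(uint16_t);
}
```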


> + */
> +struct cxl_mbox_clear_event_payload {
> +    uint8_t event_log;      /* enum cxl_event_log_type */
> +    uint8_t clear_flags;
> +    uint8_t nr_recs;        /* 1 for this struct */
> +    uint8_t reserved[3];
> +    uint16_t handle;
> +};
> +
> +/*
> + * General Media Event Record
> + * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
> + */

In interests of keeping everything that needs checking against
a chunk of the spec together, perhaps it's worth adding appropriate
defines for the UUIDs?

> +#define CXL_EVENT_GEN_MED_COMP_ID_SIZE  0x10
> +#define CXL_EVENT_GEN_MED_RES_SIZE      0x2e

As above, I'd rather see these inline.

> +struct cxl_event_gen_media {
> +    struct cxl_event_record_hdr hdr;
> +    uint64_t phys_addr;
Defines for the mask + that we have a few things hiding in
the bottom bits?
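
Something along these lines, as a sketch only.  The macro and helper
names are invented here, and the bit assignments (bit 0 volatile, bit 1
DPA not repairable, bits 5:2 reserved, address in bits 63:6) are my
reading of the r3.0 table and should be checked against the spec.

```c
#include <assert.h>
#include <stdint.h>

/* Low bits of the Physical Address field carry flags, not address bits. */
#define DPA_FLAG_VOLATILE        (UINT64_C(1) << 0)
#define DPA_FLAG_NOT_REPAIRABLE  (UINT64_C(1) << 1)
#define DPA_FLAGS_MASK           UINT64_C(0x3f)
#define DPA_ADDR_MASK            (~DPA_FLAGS_MASK)

/* Combine a 64-byte aligned DPA with flag bits (host byte order). */
static inline uint64_t dpa_pack(uint64_t addr, uint64_t flags)
{
    assert((flags & ~DPA_FLAGS_MASK) == 0);
    return (addr & DPA_ADDR_MASK) | flags;
}

/* Address part of the field. */
static inline uint64_t dpa_addr(uint64_t phys_addr)
{
    return phys_addr & DPA_ADDR_MASK;
}

/* Flag part of the field. */
static inline uint64_t dpa_flags(uint64_t phys_addr)
{
    return phys_addr & DPA_FLAGS_MASK;
}
```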

> +    uint8_t descriptor;
Defines for the various fields in here?

> +    uint8_t type;
Same for the others that follow.

> +    uint8_t transaction_type;

> +    uint8_t validity_flags[2];

uint16_t probably makes sense as we can do that for this one (unlike the helpful le24 flags fields
in other structures).

> +    uint8_t channel;
> +    uint8_t rank;
> +    uint8_t device[3];
> +    uint8_t component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
> +    uint8_t reserved[CXL_EVENT_GEN_MED_RES_SIZE];
> +} QEMU_PACKED;
Would be nice to add a build time check that these structures have the correct
overall size. Ben did a bunch of these in the other CXL emulation and they are
a handy way to reassure reviewers that it adds up right!
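
A sketch of such a check, written with plain C11 static_assert so it
stands alone; in QEMU itself QEMU_BUILD_BUG_ON() would be the usual
spelling.  The structs mirror the patch with the CXL_ prefixes dropped,
the UUID replaced by a 16-byte array, and the array sizes written
inline; EVENT_RECORD_SIZE is a name made up for this example.

```c
#include <assert.h>
#include <stdint.h>

struct event_record_hdr {
    uint8_t id[16];             /* stand-in for QemuUUID */
    uint8_t length;
    uint8_t flags[3];
    uint16_t handle;
    uint16_t related_handle;
    uint64_t timestamp;
    uint8_t maint_op_class;
    uint8_t reserved[0xf];
} __attribute__((packed));

struct event_gen_media {
    struct event_record_hdr hdr;
    uint64_t phys_addr;
    uint8_t descriptor;
    uint8_t type;
    uint8_t transaction_type;
    uint8_t validity_flags[2];
    uint8_t channel;
    uint8_t rank;
    uint8_t device[3];
    uint8_t component_id[0x10];
    uint8_t reserved[0x2e];
} __attribute__((packed));

/* Every event record is 0x30 bytes of header plus 0x50 bytes of data. */
#define EVENT_RECORD_SIZE 0x80

static_assert(sizeof(struct event_record_hdr) == 0x30,
              "common event record header must be 0x30 bytes");
static_assert(sizeof(struct event_gen_media) == EVENT_RECORD_SIZE,
              "general media event record must be 0x80 bytes");
```

The same pair of checks applies to the DRAM and memory module records.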

> +
> +/*
> + * DRAM Event Record - DER
> + * CXL rev 3.0 section 8.2.9.2.1.2; Table 3-44
> + */
> +#define CXL_EVENT_DER_CORRECTION_MASK_SIZE   0x20
> +#define CXL_EVENT_DER_RES_SIZE               0x17
Same as above.

> +struct cxl_event_dram {
> +    struct cxl_event_record_hdr hdr;
> +    uint64_t phys_addr;
As before I'd like defines for the sub fields and masks.

> +    uint8_t descriptor;
> +    uint8_t type;
> +    uint8_t transaction_type;
> +    uint8_t validity_flags[2];
uint16_t and same in similar cases.

> +    uint8_t channel;
> +    uint8_t rank;
> +    uint8_t nibble_mask[3];
> +    uint8_t bank_group;
> +    uint8_t bank;
> +    uint8_t row[3];
> +    uint8_t column[2];
> +    uint8_t correction_mask[CXL_EVENT_DER_CORRECTION_MASK_SIZE];
> +    uint8_t reserved[CXL_EVENT_DER_RES_SIZE];
> +} QEMU_PACKED;
> +
> +/*
> + * Get Health Info Record
> + * CXL rev 3.0 section 8.2.9.8.3.1; Table 8-100
> + */
> +struct cxl_get_health_info {
Same stuff as for earlier structures.

> +    uint8_t health_status;
> +    uint8_t media_status;
> +    uint8_t add_status;
> +    uint8_t life_used;
> +    uint8_t device_temp[2];
> +    uint8_t dirty_shutdown_cnt[4];
> +    uint8_t cor_vol_err_cnt[4];
> +    uint8_t cor_per_err_cnt[4];
> +} QEMU_PACKED;
> +
> +/*
> + * Memory Module Event Record
> + * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
> + */
> +#define CXL_EVENT_MEM_MOD_RES_SIZE  0x3d
> +struct cxl_event_mem_module {
> +    struct cxl_event_record_hdr hdr;
> +    uint8_t event_type;
> +    struct cxl_get_health_info info;
> +    uint8_t reserved[CXL_EVENT_MEM_MOD_RES_SIZE];
> +} QEMU_PACKED;
> +
> +#endif /* CXL_EVENTS_H */



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 4/6] hw/cxl/mailbox: Wire up get/clear event mailbox commands
  2022-10-10 22:29 ` [RFC PATCH 4/6] hw/cxl/mailbox: Wire up get/clear event mailbox commands ira.weiny
@ 2022-10-11 10:26     ` Jonathan Cameron via
  0 siblings, 0 replies; 35+ messages in thread
From: Jonathan Cameron @ 2022-10-11 10:26 UTC (permalink / raw)
  To: ira.weiny; +Cc: Michael Tsirkin, Ben Widawsky, qemu-devel, linux-cxl

On Mon, 10 Oct 2022 15:29:42 -0700
ira.weiny@intel.com wrote:

> From: Ira Weiny <ira.weiny@intel.com>
> 
> Replace the stubbed out CXL Get/Clear Event mailbox commands with
> commands which return the mock event information.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> ---
>  hw/cxl/cxl-device-utils.c  |   1 +
>  hw/cxl/cxl-mailbox-utils.c | 103 +++++++++++++++++++++++++++++++++++--
>  2 files changed, 101 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
> index 687759b3017b..4bb41101882e 100644
> --- a/hw/cxl/cxl-device-utils.c
> +++ b/hw/cxl/cxl-device-utils.c
> @@ -262,4 +262,5 @@ void cxl_device_register_init_common(CXLDeviceState *cxl_dstate)
>      memdev_reg_init_common(cxl_dstate);
>  
>      assert(cxl_initialize_mailbox(cxl_dstate) == 0);
> +    cxl_mock_add_event_logs(cxl_dstate);

Given you add support for injection later, why start with some records?
If we do want to do this for testing detection of events before driver
is loaded, then add a parameter to the command line to turn this on.

>  }
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index bb66c765a538..df345f23a30c 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -9,6 +9,7 @@
>  
>  #include "qemu/osdep.h"
>  #include "hw/cxl/cxl.h"
> +#include "hw/cxl/cxl_events.h"
>  #include "hw/pci/pci.h"
>  #include "qemu/cutils.h"
>  #include "qemu/log.h"
> @@ -116,11 +117,107 @@ struct cxl_cmd {
>          return CXL_MBOX_SUCCESS;                                          \
>      }
>  
> -DEFINE_MAILBOX_HANDLER_ZEROED(events_get_records, 0x20);
> -DEFINE_MAILBOX_HANDLER_NOP(events_clear_records);
>  DEFINE_MAILBOX_HANDLER_ZEROED(events_get_interrupt_policy, 4);
>  DEFINE_MAILBOX_HANDLER_NOP(events_set_interrupt_policy);
>  
> +static ret_code cmd_events_get_records(struct cxl_cmd *cmd,
> +                                       CXLDeviceState *cxlds,
> +                                       uint16_t *len)
> +{
> +    struct cxl_get_event_payload *pl;
> +    struct cxl_event_log *log;
> +    uint8_t log_type;
> +    uint16_t nr_overflow;
> +
> +    if (cmd->in < sizeof(log_type)) {
> +        return CXL_MBOX_INVALID_INPUT;
> +    }
> +
> +    log_type = *((uint8_t *)cmd->payload);
> +    if (log_type >= CXL_EVENT_TYPE_MAX) {
> +        return CXL_MBOX_INVALID_INPUT;
> +    }
> +
> +    pl = (struct cxl_get_event_payload *)cmd->payload;
> +
> +    log = find_event_log(cxlds, log_type);
> +    if (!log || log_empty(log)) {
> +        goto no_data;
> +    }
> +
> +    memset(pl, 0, sizeof(*pl));
> +    pl->record_count = const_le16(1);
> +
> +    if (log_rec_left(log) > 1) {

As below, we need to handle a request that can take more than
one record, otherwise we aren't compliant with the spec.

> +        pl->flags |= CXL_GET_EVENT_FLAG_MORE_RECORDS;
> +    }
> +
> +    nr_overflow = log_overflow(log);
> +    if (nr_overflow) {
> +        struct timespec ts;
> +        uint64_t ns;
> +
> +        clock_gettime(CLOCK_REALTIME, &ts);
> +
> +        ns = ((uint64_t)ts.tv_sec * 1000000000) + (uint64_t)ts.tv_nsec;
> +
> +        pl->flags |= CXL_GET_EVENT_FLAG_OVERFLOW;
> +        pl->overflow_err_count = cpu_to_le16(nr_overflow);
> +        ns -= 5000000000; /* 5s ago */
> +        pl->first_overflow_timestamp = cpu_to_le64(ns);
> +        ns -= 1000000000; /* 1s ago */
> +        pl->last_overflow_timestamp = cpu_to_le64(ns);
> +    }
> +
> +    memcpy(&pl->record, get_cur_event(log), sizeof(pl->record));
> +    pl->record.hdr.handle = get_cur_event_handle(log);
> +    *len = sizeof(pl->record);
> +    return CXL_MBOX_SUCCESS;
> +
> +no_data:
> +    *len = sizeof(*pl) - sizeof(pl->record);
> +    memset(pl, 0, *len);
> +    return CXL_MBOX_SUCCESS;
> +}
> +
> +static ret_code cmd_events_clear_records(struct cxl_cmd *cmd,
> +                                         CXLDeviceState *cxlds,
> +                                         uint16_t *len)
> +{
> +    struct cxl_mbox_clear_event_payload *pl;
> +    struct cxl_event_log *log;
> +    uint8_t log_type;
> +
> +    pl = (struct cxl_mbox_clear_event_payload *)cmd->payload;
> +    log_type = pl->event_log;
> +
> +    /* Don't handle more than 1 record at a time */
> +    if (pl->nr_recs != 1) {

I think we need to fix this so it will handle multiple clears + hack in
just enough on the kernel side to verify it.

I don't recall seeing that invalid input is something we can return if
we simply don't support as many clear entries as the command provides.

> +        return CXL_MBOX_INVALID_INPUT;
> +    }
> +
> +    if (log_type >= CXL_EVENT_TYPE_MAX) {
> +        return CXL_MBOX_INVALID_INPUT;
> +    }
> +
> +    log = find_event_log(cxlds, log_type);
> +    if (!log) {
> +        return CXL_MBOX_SUCCESS;
> +    }
> +
> +    /*
> +     * The current code clears events as they are read.  Test that behavior
> +     * only; don't support clearning from the middle of the log

This comment had me worried that we were looking at needing
to request an erratum.
Thankfully there is a statement in the r3.0 spec under 8.2.9.2.3
"Events shall be cleared in temporal order.  The device shall
verify the event record handles specified in the input payload are
in temporal order.  If the device detects an older event record
that will not be cleared when Clear Event Records is executed,
the device shall return Invalid Handle return code and shall not
clear any of the specified event codes"

Hence, wrong return value and the comment needs updating to reflect
that such a mid log clear isn't allowed by the spec.
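
To make the quoted rule concrete, a minimal sketch of an all-or-nothing,
in-order clear.  It assumes, as the patch does, that a handle is simply
the record's index in the log; mock_log, clear_result and clear_in_order
are hypothetical names, and handles are taken in host byte order for
brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum clear_result { CLEAR_OK, CLEAR_INVALID_HANDLE };

struct mock_log {
    uint16_t cur_event;     /* index (and handle) of the oldest record */
    uint16_t nr_events;     /* total records in the log */
};

/*
 * Clear nr records only if the handles name the oldest records, in
 * temporal order.  On any mismatch nothing is cleared.
 */
static inline enum clear_result clear_in_order(struct mock_log *log,
                                               const uint16_t *handles,
                                               size_t nr)
{
    size_t i;

    assert(log != NULL && (handles != NULL || nr == 0));
    assert(log->cur_event <= log->nr_events);

    if (nr > (size_t)(log->nr_events - log->cur_event)) {
        return CLEAR_INVALID_HANDLE;
    }
    /* Verify everything first so a failure leaves the log untouched. */
    for (i = 0; i < nr; i++) {
        if (handles[i] != log->cur_event + i) {
            return CLEAR_INVALID_HANDLE;
        }
    }
    log->cur_event += (uint16_t)nr;
    return CLEAR_OK;
}
```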


> +     */
> +    if (log->cur_event != le16_to_cpu(pl->handle)) {
> +        return CXL_MBOX_INVALID_INPUT;
> +    }
> +
> +    log->cur_event++;
> +    *len = 0;
> +    return CXL_MBOX_SUCCESS;
> +}
> +
>  /* 8.2.9.2.1 */
>  static ret_code cmd_firmware_update_get_info(struct cxl_cmd *cmd,
>                                               CXLDeviceState *cxl_dstate,
> @@ -391,7 +488,7 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
>      [EVENTS][GET_RECORDS] = { "EVENTS_GET_RECORDS",
>          cmd_events_get_records, 1, 0 },
>      [EVENTS][CLEAR_RECORDS] = { "EVENTS_CLEAR_RECORDS",
> -        cmd_events_clear_records, ~0, IMMEDIATE_LOG_CHANGE },
> +        cmd_events_clear_records, 8, IMMEDIATE_LOG_CHANGE },

>      [EVENTS][GET_INTERRUPT_POLICY] = { "EVENTS_GET_INTERRUPT_POLICY",
>          cmd_events_get_interrupt_policy, 0, 0 },
>      [EVENTS][SET_INTERRUPT_POLICY] = { "EVENTS_SET_INTERRUPT_POLICY",


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 5/6] hw/cxl/cxl-events: Add event interrupt support
  2022-10-10 22:29 ` [RFC PATCH 5/6] hw/cxl/cxl-events: Add event interrupt support ira.weiny
@ 2022-10-11 10:30     ` Jonathan Cameron via
  0 siblings, 0 replies; 35+ messages in thread
From: Jonathan Cameron @ 2022-10-11 10:30 UTC (permalink / raw)
  To: ira.weiny; +Cc: Michael Tsirkin, Ben Widawsky, qemu-devel, linux-cxl

On Mon, 10 Oct 2022 15:29:43 -0700
ira.weiny@intel.com wrote:

> From: Ira Weiny <ira.weiny@intel.com>
> 
> To facilitate testing of event interrupt support add a QMP HMP command
> to reset the event logs and issue interrupts when the guest has enabled
> those interrupts.
Two things in here, so probably wants breaking into two patches:
1) Add the injection command
2) Add the interrupt support.

As on earlier patches, I think we need a more sophisticated
injection interface so we can inject individual errors (or better yet sets of
errors, so we can trigger both the single-error case and multiple errors per
interrupt).

Jonathan


> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> ---
>  hmp-commands.hx             | 14 +++++++
>  hw/cxl/cxl-events.c         | 82 +++++++++++++++++++++++++++++++++++++
>  hw/cxl/cxl-host-stubs.c     |  5 +++
>  hw/mem/cxl_type3.c          |  7 +++-
>  include/hw/cxl/cxl_device.h |  3 ++
>  include/sysemu/sysemu.h     |  3 ++
>  6 files changed, 113 insertions(+), 1 deletion(-)
> 
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index 564f1de364df..c59a98097317 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -1266,6 +1266,20 @@ SRST
>    Inject PCIe AER error
>  ERST
>  
> +    {
> +        .name       = "cxl_event_inject",
> +        .args_type  = "id:s",
> +        .params     = "id <error_status>",
> +        .help       = "inject cxl events and interrupt\n\t\t\t"
> +                      "<id> = qdev device id\n\t\t\t",
> +        .cmd        = hmp_cxl_event_inject,
> +    },
> +
> +SRST
> +``cxl_event_inject``
> +  Inject CXL Events
> +ERST
> +
>      {
>          .name       = "netdev_add",
>          .args_type  = "netdev:O",


>  const MemoryRegionOps cfmws_ops;
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 2b13179d116d..b4a90136d190 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -459,7 +459,7 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
>      ComponentRegisters *regs = &cxl_cstate->crb;
>      MemoryRegion *mr = &regs->component_registers;
>      uint8_t *pci_conf = pci_dev->config;
> -    unsigned short msix_num = 3;
> +    unsigned short msix_num = 7;
>      int i;
>  
>      if (!cxl_setup_memory(ct3d, errp)) {
> @@ -502,6 +502,11 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
>          msix_vector_use(pci_dev, i);
>      }
>  
> +    ct3d->cxl_dstate.event_vector[CXL_EVENT_TYPE_INFO] = 6;
> +    ct3d->cxl_dstate.event_vector[CXL_EVENT_TYPE_WARN] = 5;
> +    ct3d->cxl_dstate.event_vector[CXL_EVENT_TYPE_FAIL] = 4;
> +    ct3d->cxl_dstate.event_vector[CXL_EVENT_TYPE_FATAL] = 3;

For testing purposes, maybe put 2 of them on same interrupt vector?
That way we'll verify the kernel code deals fine with either separate
interrupts or shared vectors.
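
As a sketch of what that could look like (vector numbers and all names
here are made up for illustration, not taken from the patch): two log
types map to the same vector, and the set of vectors to raise is
computed from the set of logs with pending events, so a shared vector is
raised once.

```c
#include <assert.h>
#include <stdint.h>

enum log_type { LOG_INFO, LOG_WARN, LOG_FAIL, LOG_FATAL, LOG_MAX };

/* Info and Warn deliberately share vector 5. */
static const uint8_t event_vector[LOG_MAX] = {
    [LOG_INFO]  = 5,
    [LOG_WARN]  = 5,
    [LOG_FAIL]  = 4,
    [LOG_FATAL] = 3,
};

/*
 * Given a bitmask of log types with pending events, return a bitmask of
 * MSI-X vectors to raise.
 */
static inline uint32_t vectors_to_raise(uint32_t pending_logs)
{
    uint32_t vectors = 0;
    int i;

    assert((pending_logs >> LOG_MAX) == 0);
    for (i = 0; i < LOG_MAX; i++) {
        if (pending_logs & (UINT32_C(1) << i)) {
            vectors |= UINT32_C(1) << event_vector[i];
        }
    }
    return vectors;
}
```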


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 6/6] hw/cxl/mailbox: Wire up Get/Set Event Interrupt policy
  2022-10-10 22:29 ` [RFC PATCH 6/6] hw/cxl/mailbox: Wire up Get/Set Event Interrupt policy ira.weiny
@ 2022-10-11 10:40     ` Jonathan Cameron via
  0 siblings, 0 replies; 35+ messages in thread
From: Jonathan Cameron @ 2022-10-11 10:40 UTC (permalink / raw)
  To: ira.weiny; +Cc: Michael Tsirkin, Ben Widawsky, qemu-devel, linux-cxl

On Mon, 10 Oct 2022 15:29:44 -0700
ira.weiny@intel.com wrote:

> From: Ira Weiny <ira.weiny@intel.com>
> 
> Replace the stubbed out CXL Get/Set Event interrupt policy mailbox
> commands.  Enable those commands to control interrupts for each of the
> event log types.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
A few trivial comments inline.

Thanks,

Jonathan

> ---
>  hw/cxl/cxl-mailbox-utils.c  | 129 ++++++++++++++++++++++++++++++------
>  include/hw/cxl/cxl_events.h |  21 ++++++
>  2 files changed, 129 insertions(+), 21 deletions(-)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index df345f23a30c..52e8804c24ed 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -101,25 +101,6 @@ struct cxl_cmd {
>      uint8_t *payload;
>  };
>  
> -#define DEFINE_MAILBOX_HANDLER_ZEROED(name, size)                         \
> -    uint16_t __zero##name = size;                                         \
> -    static ret_code cmd_##name(struct cxl_cmd *cmd,                       \
> -                               CXLDeviceState *cxl_dstate, uint16_t *len) \
> -    {                                                                     \
> -        *len = __zero##name;                                              \
> -        memset(cmd->payload, 0, *len);                                    \
> -        return CXL_MBOX_SUCCESS;                                          \
> -    }
> -#define DEFINE_MAILBOX_HANDLER_NOP(name)                                  \
> -    static ret_code cmd_##name(struct cxl_cmd *cmd,                       \
> -                               CXLDeviceState *cxl_dstate, uint16_t *len) \
> -    {                                                                     \
> -        return CXL_MBOX_SUCCESS;                                          \
> -    }
> -
> -DEFINE_MAILBOX_HANDLER_ZEROED(events_get_interrupt_policy, 4);
> -DEFINE_MAILBOX_HANDLER_NOP(events_set_interrupt_policy);
> -
>  static ret_code cmd_events_get_records(struct cxl_cmd *cmd,
>                                         CXLDeviceState *cxlds,
>                                         uint16_t *len)
> @@ -218,6 +199,110 @@ static ret_code cmd_events_clear_records(struct cxl_cmd *cmd,
>      return CXL_MBOX_SUCCESS;
>  }
>  
> +static ret_code cmd_events_get_interrupt_policy(struct cxl_cmd *cmd,
> +                                                CXLDeviceState *cxl_dstate,
> +                                                uint16_t *len)
> +{
> +    struct cxl_event_interrupt_policy *policy;
> +    struct cxl_event_log *log;
> +
> +    policy = (struct cxl_event_interrupt_policy *)cmd->payload;
> +    memset(policy, 0, sizeof(*policy));
> +
> +    log = find_event_log(cxl_dstate, CXL_EVENT_TYPE_INFO);

Less obvious than below case, but again, perhaps a little utility function
to cut down on duplication.

> +    if (log->irq_enabled) {
> +        policy->info_settings = CXL_EVENT_INT_SETTING(log->irq_vec);
> +    }
> +
> +    log = find_event_log(cxl_dstate, CXL_EVENT_TYPE_WARN);
> +    if (log->irq_enabled) {
> +        policy->warn_settings = CXL_EVENT_INT_SETTING(log->irq_vec);
> +    }
> +
> +    log = find_event_log(cxl_dstate, CXL_EVENT_TYPE_FAIL);
> +    if (log->irq_enabled) {
> +        policy->failure_settings = CXL_EVENT_INT_SETTING(log->irq_vec);
> +    }
> +
> +    log = find_event_log(cxl_dstate, CXL_EVENT_TYPE_FATAL);
> +    if (log->irq_enabled) {
> +        policy->fatal_settings = CXL_EVENT_INT_SETTING(log->irq_vec);
> +    }
> +
> +    log = find_event_log(cxl_dstate, CXL_EVENT_TYPE_DYNAMIC_CAP);
> +    if (log->irq_enabled) {
> +        /* Dynamic Capacity borrows the same vector as info */
> +        policy->dyn_cap_settings = CXL_INT_MSI_MSIX;
> +    }
> +
> +    *len = sizeof(*policy);
> +    return CXL_MBOX_SUCCESS;
> +}
> +
> +static ret_code cmd_events_set_interrupt_policy(struct cxl_cmd *cmd,
> +                                                CXLDeviceState *cxl_dstate,
> +                                                uint16_t *len)
> +{
> +    struct cxl_event_interrupt_policy *policy;
> +    struct cxl_event_log *log;
> +
> +    policy = (struct cxl_event_interrupt_policy *)cmd->payload;
> +
> +    log = find_event_log(cxl_dstate, CXL_EVENT_TYPE_INFO);
Maybe a utility function?

	set_int_policy(cxl_dstate, CXL_EVENT_TYPE_INFO,
		       policy->info_settings);
	set_int_policy(cxl_dstate, CXL_EVENT_TYPE_WARN,
		       policy->warn_settings);
etc
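
One possible shape for such a helper, as a self-contained sketch. The mock_/MOCK_ types and names are hypothetical stand-ins for the QEMU ones, and the sketch assumes event_vector[] also carries an entry for the dynamic capacity log (set to the info vector at realize time), which the posted patch does not do.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins mirroring the values used in the patch. */
#define MOCK_INT_MODE_MASK 0x3
#define MOCK_INT_MSI_MSIX  0x01

enum mock_log_type {
    MOCK_LOG_INFO,
    MOCK_LOG_WARN,
    MOCK_LOG_FAIL,
    MOCK_LOG_FATAL,
    MOCK_LOG_DYN_CAP,
    MOCK_LOG_MAX,
};

struct mock_event_log {
    bool irq_enabled;
    uint8_t irq_vec;
};

struct mock_dstate {
    struct mock_event_log logs[MOCK_LOG_MAX];
    uint8_t event_vector[MOCK_LOG_MAX];
};

/*
 * Apply one policy byte to one log: enable the interrupt and latch the
 * device-assigned vector only when MSI/MSI-X mode is requested; any
 * other mode disables the interrupt and clears the vector.
 */
static void mock_set_int_policy(struct mock_dstate *ds,
                                enum mock_log_type type, uint8_t setting)
{
    struct mock_event_log *log = &ds->logs[type];

    if ((setting & MOCK_INT_MODE_MASK) == MOCK_INT_MSI_MSIX) {
        log->irq_enabled = true;
        log->irq_vec = ds->event_vector[type];
    } else {
        log->irq_enabled = false;
        log->irq_vec = 0;
    }
}
```

The five near-identical if/else blocks in the handler would then collapse to five calls, one per log type.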


> +    if ((policy->info_settings & CXL_EVENT_INT_MODE_MASK) ==
> +                                                    CXL_INT_MSI_MSIX) {
> +        log->irq_enabled = true;
> +        log->irq_vec = cxl_dstate->event_vector[CXL_EVENT_TYPE_INFO];
> +    } else {
> +        log->irq_enabled = false;
> +        log->irq_vec = 0;
> +    }
> +
> +    log = find_event_log(cxl_dstate, CXL_EVENT_TYPE_WARN);
> +    if ((policy->warn_settings & CXL_EVENT_INT_MODE_MASK) ==
> +                                                    CXL_INT_MSI_MSIX) {
> +        log->irq_enabled = true;
> +        log->irq_vec = cxl_dstate->event_vector[CXL_EVENT_TYPE_WARN];
> +    } else {
> +        log->irq_enabled = false;
> +        log->irq_vec = 0;
> +    }
> +
> +    log = find_event_log(cxl_dstate, CXL_EVENT_TYPE_FAIL);
> +    if ((policy->failure_settings & CXL_EVENT_INT_MODE_MASK) ==
> +                                                    CXL_INT_MSI_MSIX) {
> +        log->irq_enabled = true;
> +        log->irq_vec = cxl_dstate->event_vector[CXL_EVENT_TYPE_FAIL];
> +    } else {
> +        log->irq_enabled = false;
> +        log->irq_vec = 0;
> +    }
> +
> +    log = find_event_log(cxl_dstate, CXL_EVENT_TYPE_FATAL);
> +    if ((policy->fatal_settings & CXL_EVENT_INT_MODE_MASK) ==
> +                                                    CXL_INT_MSI_MSIX) {
> +        log->irq_enabled = true;
> +        log->irq_vec = cxl_dstate->event_vector[CXL_EVENT_TYPE_FATAL];
> +    } else {
> +        log->irq_enabled = false;
> +        log->irq_vec = 0;
> +    }
> +
> +    log = find_event_log(cxl_dstate, CXL_EVENT_TYPE_DYNAMIC_CAP);
> +    if ((policy->dyn_cap_settings & CXL_EVENT_INT_MODE_MASK) ==
> +                                                    CXL_INT_MSI_MSIX) {
> +        log->irq_enabled = true;
> +        /* Dynamic Capacity borrows the same vector as info */
> +        log->irq_vec = cxl_dstate->event_vector[CXL_EVENT_TYPE_INFO];
> +    } else {
> +        log->irq_enabled = false;
> +        log->irq_vec = 0;
> +    }
> +
> +    *len = sizeof(*policy);
> +    return CXL_MBOX_SUCCESS;
> +}
> +

...

> diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
> index 255111f3dcfb..c121e504a6db 100644
> --- a/include/hw/cxl/cxl_events.h
> +++ b/include/hw/cxl/cxl_events.h
> @@ -170,4 +170,25 @@ struct cxl_event_mem_module {
>      uint8_t reserved[CXL_EVENT_MEM_MOD_RES_SIZE];
>  } QEMU_PACKED;
>  
> +/**
> + * Event Interrupt Policy
> + *
> + * CXL rev 3.0 section 8.2.9.2.4; Table 8-52
> + */
> +enum cxl_event_int_mode {
> +    CXL_INT_NONE     = 0x00,
> +    CXL_INT_MSI_MSIX = 0x01,
> +    CXL_INT_FW       = 0x02,

I guess at some point we'll probably want to wire up the INT_FW path.
Job for another day though!

> +    CXL_INT_RES      = 0x03,

Why define the reserved value here?  By definition we won't use it.

> +};
> +#define CXL_EVENT_INT_MODE_MASK 0x3
> +#define CXL_EVENT_INT_SETTING(vector) ((((uint8_t)vector & 0xf) << 4) | CXL_INT_MSI_MSIX)

I probably haven't had enough caffeine yet today, but why the cast given
you are masking to a smaller range?
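
For illustration, a small sketch comparing the macro as posted with a variant that drops the cast and parenthesizes the argument instead. The MOCK_ names are hypothetical; for a simple vector value, both forms should yield the same byte.

```c
#include <stdint.h>

#define MOCK_INT_MSI_MSIX 0x01

/* As posted (renamed): the cast applies to the unparenthesized argument. */
#define MOCK_INT_SETTING_POSTED(vector) \
    ((((uint8_t)vector & 0xf) << 4) | MOCK_INT_MSI_MSIX)

/*
 * Variant without the cast: the 0xf mask already limits the value to
 * four bits, and parenthesizing the argument keeps expression
 * arguments from regrouping under operator precedence.
 */
#define MOCK_INT_SETTING(vector) \
    ((((vector) & 0xf) << 4) | MOCK_INT_MSI_MSIX)
```

Vector 6 encodes to 0x61 in either form: the vector goes in the upper nibble and the MSI/MSI-X mode in the low bits.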

> +struct cxl_event_interrupt_policy {
> +    uint8_t info_settings;

Can we shorten these to just info, warn, failure, fatal, dyn_cap?
No real loss I think and will help with some of the long lines above.

> +    uint8_t warn_settings;
> +    uint8_t failure_settings;
> +    uint8_t fatal_settings;
> +    uint8_t dyn_cap_settings;
> +} QEMU_PACKED;
> +
>  #endif /* CXL_EVENTS_H */


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 1/6] qemu/bswap: Add const_le64()
  2022-10-11  9:48   ` Peter Maydell
@ 2022-10-11 15:22     ` Richard Henderson
  2022-10-11 15:45       ` Peter Maydell
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Henderson @ 2022-10-11 15:22 UTC (permalink / raw)
  To: Peter Maydell, ira.weiny
  Cc: Michael Tsirkin, Ben Widawsky, Jonathan Cameron, qemu-devel, linux-cxl

On 10/11/22 02:48, Peter Maydell wrote:
>> +# define const_le64(_x) (_x)
>>   # define const_le32(_x) (_x)
>>   # define const_le16(_x) (_x)
>>   #endif
> 
> This is kind of a weird API, because:
>   * it only exists for little-endian, not big-endian
>   * we use it in exactly two files (linux-user/elfload.c and
>     hw/input/virtio-input-hid.c)
> 
> which leaves me wondering if there's a better way of doing
> it that I'm missing. But maybe it's just that we never filled
> out the missing bits of the API surface because we haven't
> needed them yet. Richard ?

It's piecemeal because, as you note, very few places require a version of byte swapping 
that must be applicable to static data.  I certainly don't want to completely fill this 
out and have most of it remain unused.
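
To illustrate the "static data" point: a file-scope initializer must be a constant expression, so a function-style swap cannot be used there, while a macro built only from masks and shifts can. The sketch below uses hypothetical MOCK_/mock_ names and swaps unconditionally to stand in for the big-endian-host case; the ULL suffixes and the uint64_t cast are a choice of this sketch so that a narrow argument is widened before shifting.

```c
#include <stdint.h>

/* Hypothetical constant-expression 64-bit byte swap. */
#define MOCK_CONST_BSWAP64(x)                                   \
    ((((uint64_t)(x) & 0x00000000000000ffULL) << 56) |          \
     (((uint64_t)(x) & 0x000000000000ff00ULL) << 40) |          \
     (((uint64_t)(x) & 0x0000000000ff0000ULL) << 24) |          \
     (((uint64_t)(x) & 0x00000000ff000000ULL) <<  8) |          \
     (((uint64_t)(x) & 0x000000ff00000000ULL) >>  8) |          \
     (((uint64_t)(x) & 0x0000ff0000000000ULL) >> 24) |          \
     (((uint64_t)(x) & 0x00ff000000000000ULL) >> 40) |          \
     (((uint64_t)(x) & 0xff00000000000000ULL) >> 56))

/* Static data: the initializer is evaluated at compile time. */
static const uint64_t mock_static_table[] = {
    MOCK_CONST_BSWAP64(0x0102030405060708ULL),
    MOCK_CONST_BSWAP64(1),
};
```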


r~


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 1/6] qemu/bswap: Add const_le64()
  2022-10-11 15:22     ` Richard Henderson
@ 2022-10-11 15:45       ` Peter Maydell
  2022-10-13 22:47         ` Ira Weiny
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Maydell @ 2022-10-11 15:45 UTC (permalink / raw)
  To: Richard Henderson
  Cc: ira.weiny, Michael Tsirkin, Ben Widawsky, Jonathan Cameron,
	qemu-devel, linux-cxl

On Tue, 11 Oct 2022 at 16:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> On 10/11/22 02:48, Peter Maydell wrote:
> > This is kind of a weird API, because:
> >   * it only exists for little-endian, not big-endian
> >   * we use it in exactly two files (linux-user/elfload.c and
> >     hw/input/virtio-input-hid.c)
> >
> > which leaves me wondering if there's a better way of doing
> > it that I'm missing. But maybe it's just that we never filled
> > out the missing bits of the API surface because we haven't
> > needed them yet. Richard ?
>
> It's piecemeal because, as you note, very few places require a version of byte swapping
> that must be applicable to static data.  I certainly don't want to completely fill this
> out and have most of it remain unused.

Makes sense. In that case, other than ordering the definitions
64-32-16,

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 0/6] QEMU CXL Provide mock CXL events and irq support
  2022-10-11  9:40   ` Jonathan Cameron via
  (?)
@ 2022-10-11 17:03   ` Ira Weiny
  -1 siblings, 0 replies; 35+ messages in thread
From: Ira Weiny @ 2022-10-11 17:03 UTC (permalink / raw)
  To: Jonathan Cameron; +Cc: Michael Tsirkin, Ben Widawsky, qemu-devel, linux-cxl

On Tue, Oct 11, 2022 at 10:40:06AM +0100, Jonathan Cameron wrote:
> On Mon, 10 Oct 2022 15:29:38 -0700
> ira.weiny@intel.com wrote:
> 
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > CXL Event records inform the OS of various CXL device events.  Thus far CXL
> > memory devices are emulated and therefore don't naturally have events which
> > will occur.
> > 
> > Add mock events and a HMP trigger mechanism to facilitate guest OS testing of
> > event support.
> > 
> > This support requires a follow on version of the event patch set.  The RFC was
> > submitted and discussed here:
> > 
> > 	https://lore.kernel.org/linux-cxl/20220813053243.757363-1-ira.weiny@intel.com/
> > 
> > I'll post the lore link to the new version shortly.
> > 
> > Instructions for running this test.
> > 
> > Add qmp option to qemu:
> > 
> > 	<host> $ qemu-system-x86_64 ... -qmp unix:/tmp/run_qemu_qmp_0,server,nowait ...
> > 
> > 	OR
> > 
> > 	<host> $ run_qemu.sh ... --qmp ...
> > 
> > Enable tracing of events within the guest:
> > 
> > 	<guest> $ echo "" > /sys/kernel/tracing/trace
> > 	<guest> $ echo 1 > /sys/kernel/tracing/events/cxl/enable
> > 	<guest> $ echo 1 > /sys/kernel/tracing/tracing_on
> > 
> > Trigger event generation and interrupts in the host:
> > 
> > 	<host> $ echo "cxl_event_inject cxl-devX" | qmp-shell -H /tmp/run_qemu_qmp_0
> > 
> > 	Where X == one of the memory devices; cxl-dev0 should work.
> > 
> > View events on the guest:
> > 
> > 	<guest> $ cat /sys/kernel/tracing/trace
> 
> Hi Ira,
> 
> Why is this an RFC rather than a patch set to apply?

I really just wanted to see what people think of the overall idea.  The
patches themselves stand on their own if the QEMU community is ok using QEMU as
a test vehicle like this.

> 
> It's useful to have that in the cover letter so we can focus on what
> you want comments on (rather than simply review).

Yes sorry,
Ira

> 
> Thanks,
> 
> Jonathan
> 
> > 
> > 
> > Ira Weiny (6):
> >   qemu/bswap: Add const_le64()
> >   qemu/uuid: Add UUID static initializer
> >   hw/cxl/cxl-events: Add CXL mock events
> >   hw/cxl/mailbox: Wire up get/clear event mailbox commands
> >   hw/cxl/cxl-events: Add event interrupt support
> >   hw/cxl/mailbox: Wire up Get/Set Event Interrupt policy
> > 
> >  hmp-commands.hx             |  14 ++
> >  hw/cxl/cxl-device-utils.c   |   1 +
> >  hw/cxl/cxl-events.c         | 330 ++++++++++++++++++++++++++++++++++++
> >  hw/cxl/cxl-host-stubs.c     |   5 +
> >  hw/cxl/cxl-mailbox-utils.c  | 224 +++++++++++++++++++++---
> >  hw/cxl/meson.build          |   1 +
> >  hw/mem/cxl_type3.c          |   7 +-
> >  include/hw/cxl/cxl_device.h |  22 +++
> >  include/hw/cxl/cxl_events.h | 194 +++++++++++++++++++++
> >  include/qemu/bswap.h        |  10 ++
> >  include/qemu/uuid.h         |  12 ++
> >  include/sysemu/sysemu.h     |   3 +
> >  12 files changed, 802 insertions(+), 21 deletions(-)
> >  create mode 100644 hw/cxl/cxl-events.c
> >  create mode 100644 include/hw/cxl/cxl_events.h
> > 
> > 
> > base-commit: 6f7f81898e4437ea544ee4ca24bef7ec543b1f06
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 1/6] qemu/bswap: Add const_le64()
  2022-10-11 15:45       ` Peter Maydell
@ 2022-10-13 22:47         ` Ira Weiny
  0 siblings, 0 replies; 35+ messages in thread
From: Ira Weiny @ 2022-10-13 22:47 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Richard Henderson, Michael Tsirkin, Ben Widawsky,
	Jonathan Cameron, qemu-devel, linux-cxl

On Tue, Oct 11, 2022 at 04:45:57PM +0100, Peter Maydell wrote:
> On Tue, 11 Oct 2022 at 16:22, Richard Henderson
> <richard.henderson@linaro.org> wrote:
> > On 10/11/22 02:48, Peter Maydell wrote:
> > > This is kind of a weird API, because:
> > >   * it only exists for little-endian, not big-endian
> > >   * we use it in exactly two files (linux-user/elfload.c and
> > >     hw/input/virtio-input-hid.c)
> > >
> > > which leaves me wondering if there's a better way of doing
> > > it that I'm missing. But maybe it's just that we never filled
> > > out the missing bits of the API surface because we haven't
> > > needed them yet. Richard ?
> >
> > It's piecemeal because, as you note, very few places require a version of byte swapping
> > that must be applicable to static data.  I certainly don't want to completely fill this
> > out and have most of it remain unused.
> 
> Makes sense. In that case, other than ordering the definitions
> 64-32-16,

Done.

> 
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

Thanks!
Ira

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 1/6] qemu/bswap: Add const_le64()
  2022-10-11  9:03     ` Jonathan Cameron via
  (?)
@ 2022-10-13 22:52     ` Ira Weiny
  -1 siblings, 0 replies; 35+ messages in thread
From: Ira Weiny @ 2022-10-13 22:52 UTC (permalink / raw)
  To: Jonathan Cameron; +Cc: Michael Tsirkin, Ben Widawsky, qemu-devel, linux-cxl

On Tue, Oct 11, 2022 at 10:03:00AM +0100, Jonathan Cameron wrote:
> On Mon, 10 Oct 2022 15:29:39 -0700
> ira.weiny@intel.com wrote:
> 
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > Gcc requires constant versions of cpu_to_le* calls.
> > 
> > Add a 64 bit version.
> > 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> Seems reasonable to me but I'm not an expert in this stuff.
> FWIW
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> There are probably a lot of places in the CXL emulation where
> our endian handling isn't correct but so far it hasn't mattered
> as all the supported architectures are little endian.
> 
> Good to not introduce more cases however!

Agreed. Thanks!
Ira

> 
> Jonathan
> 
> 
> > ---
> >  include/qemu/bswap.h | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/include/qemu/bswap.h b/include/qemu/bswap.h
> > index 346d05f2aab3..08e607821102 100644
> > --- a/include/qemu/bswap.h
> > +++ b/include/qemu/bswap.h
> > @@ -192,10 +192,20 @@ CPU_CONVERT(le, 64, uint64_t)
> >       (((_x) & 0x0000ff00U) <<  8) |              \
> >       (((_x) & 0x00ff0000U) >>  8) |              \
> >       (((_x) & 0xff000000U) >> 24))
> > +# define const_le64(_x)                          \
> > +    ((((_x) & 0x00000000000000ffU) << 56) |      \
> > +     (((_x) & 0x000000000000ff00U) << 40) |      \
> > +     (((_x) & 0x0000000000ff0000U) << 24) |      \
> > +     (((_x) & 0x00000000ff000000U) <<  8) |      \
> > +     (((_x) & 0x000000ff00000000U) >>  8) |      \
> > +     (((_x) & 0x0000ff0000000000U) >> 24) |      \
> > +     (((_x) & 0x00ff000000000000U) >> 40) |      \
> > +     (((_x) & 0xff00000000000000U) >> 56))
> >  # define const_le16(_x)                          \
> >      ((((_x) & 0x00ff) << 8) |                    \
> >       (((_x) & 0xff00) >> 8))
> >  #else
> > +# define const_le64(_x) (_x)
> >  # define const_le32(_x) (_x)
> >  # define const_le16(_x) (_x)
> >  #endif
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 2/6] qemu/uuid: Add UUID static initializer
  2022-10-11  9:13     ` Jonathan Cameron via
  (?)
@ 2022-10-13 23:11     ` Ira Weiny
  -1 siblings, 0 replies; 35+ messages in thread
From: Ira Weiny @ 2022-10-13 23:11 UTC (permalink / raw)
  To: Jonathan Cameron; +Cc: Michael Tsirkin, Ben Widawsky, qemu-devel, linux-cxl

On Tue, Oct 11, 2022 at 10:13:17AM +0100, Jonathan Cameron wrote:
> On Mon, 10 Oct 2022 15:29:40 -0700
> ira.weiny@intel.com wrote:
> 
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > UUID's are defined as network byte order fields.  No static initializer
> > was available for UUID's in their standard big endian format.
> > 
> > Define a big endian initializer for UUIDs.
> > 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> Seems sensible.  Would allow a cleanup in the existing cel_uuid handling
> in the CXL code where we use a static for this and end up filling it
> with the same value multiple times which is less than ideal...
> A quick grep and for qemu_uuid_parse() suggests there are other cases
> where it's passed a constant string.

I'll see if I can find time to clean that up.
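
As a rough sketch of what that cleanup could look like: a const object initialized at compile time instead of a static that is filled by parsing a string at runtime. MockUuid, MOCK_UUID and the UUID value below are hypothetical and chosen only for illustration; the byte layout follows the big endian initializer in the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 16-byte UUID container. */
typedef struct {
    uint8_t data[16];
} MockUuid;

/* Big endian (network byte order) initializer, modelled on the patch. */
#define MOCK_UUID(time_low, time_mid, time_hi_and_version,                  \
                  clock_seq_hi_and_reserved, clock_seq_low,                 \
                  node0, node1, node2, node3, node4, node5)                 \
    { {                                                                     \
        ((time_low) >> 24) & 0xff, ((time_low) >> 16) & 0xff,               \
        ((time_low) >> 8) & 0xff, (time_low) & 0xff,                        \
        ((time_mid) >> 8) & 0xff, (time_mid) & 0xff,                        \
        ((time_hi_and_version) >> 8) & 0xff, (time_hi_and_version) & 0xff,  \
        (clock_seq_hi_and_reserved), (clock_seq_low),                       \
        (node0), (node1), (node2), (node3), (node4), (node5)                \
    } }

/* Illustrative value only; built once at compile time, no parsing. */
static const MockUuid mock_example_uuid =
    MOCK_UUID(0x12345678, 0x9abc, 0xdef0, 0x01, 0x23,
              0x45, 0x67, 0x89, 0xab, 0xcd, 0xef);
```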

> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Thanks,
Ira

> 
> > ---
> >  include/qemu/uuid.h | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/include/qemu/uuid.h b/include/qemu/uuid.h
> > index 9925febfa54d..dc40ee1fc998 100644
> > --- a/include/qemu/uuid.h
> > +++ b/include/qemu/uuid.h
> > @@ -61,6 +61,18 @@ typedef struct {
> >      (clock_seq_hi_and_reserved), (clock_seq_low), (node0), (node1), (node2),\
> >      (node3), (node4), (node5) }
> >  
> > +/* Normal (network byte order) UUID */
> > +#define UUID(time_low, time_mid, time_hi_and_version,                    \
> > +  clock_seq_hi_and_reserved, clock_seq_low, node0, node1, node2,         \
> > +  node3, node4, node5)                                                   \
> > +  { ((time_low) >> 24) & 0xff, ((time_low) >> 16) & 0xff,                \
> > +    ((time_low) >> 8) & 0xff, (time_low) & 0xff,                         \
> > +    ((time_mid) >> 8) & 0xff, (time_mid) & 0xff,                         \
> > +    ((time_hi_and_version) >> 8) & 0xff, (time_hi_and_version) & 0xff,   \
> > +    (clock_seq_hi_and_reserved), (clock_seq_low),                        \
> > +    (node0), (node1), (node2), (node3), (node4), (node5)                 \
> > +  }
> > +
> >  #define UUID_FMT "%02hhx%02hhx%02hhx%02hhx-" \
> >                   "%02hhx%02hhx-%02hhx%02hhx-" \
> >                   "%02hhx%02hhx-" \
> 


* Re: [RFC PATCH 3/6] hw/cxl/cxl-events: Add CXL mock events
  2022-10-11 10:07     ` Jonathan Cameron via
  (?)
@ 2022-10-14  0:21     ` Ira Weiny
  2022-10-17 15:57         ` Jonathan Cameron via
  -1 siblings, 1 reply; 35+ messages in thread
From: Ira Weiny @ 2022-10-14  0:21 UTC (permalink / raw)
  To: Jonathan Cameron; +Cc: Michael Tsirkin, Ben Widawsky, qemu-devel, linux-cxl

On Tue, Oct 11, 2022 at 11:07:59AM +0100, Jonathan Cameron wrote:
> On Mon, 10 Oct 2022 15:29:41 -0700
> ira.weiny@intel.com wrote:
> 
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > To facilitate testing of guest software add mock events and code to
> > support iterating through the event logs.
> > 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> Various comments inline, but biggest one is I'd like to see
> a much more flexible injection interface.  Happy to help code one
> up if that is useful.

Quick response to this.

I thought about holding off and doing something like that but this got the irq
testing in the kernel off the ground.

I think it would be cool to use QMP to submit events as json.  That would be
much more flexible.  But would have taken a lot more time.

What I did below duplicated the test code cxl-test has.  It was pretty quick to
do that.

The biggest issue with that is parsing the various events from the json to binary blobs.

I'll clean up this patch and see what I can do.  But I think having a set of
statically defined blobs which can be injected would make testing a bit easier.
Less framework to format json input to QMP.

More to come...

Ira

> 
> Jonathan
> 
> 
> > ---
> >  hw/cxl/cxl-events.c         | 248 ++++++++++++++++++++++++++++++++++++
> >  hw/cxl/meson.build          |   1 +
> >  include/hw/cxl/cxl_device.h |  19 +++
> >  include/hw/cxl/cxl_events.h | 173 +++++++++++++++++++++++++
> >  4 files changed, 441 insertions(+)
> >  create mode 100644 hw/cxl/cxl-events.c
> >  create mode 100644 include/hw/cxl/cxl_events.h
> > 
> > diff --git a/hw/cxl/cxl-events.c b/hw/cxl/cxl-events.c
> > new file mode 100644
> > index 000000000000..c275280bcb64
> > --- /dev/null
> > +++ b/hw/cxl/cxl-events.c
> > @@ -0,0 +1,248 @@
> > +/*
> > + * CXL Event processing
> > + *
> > + * Copyright(C) 2022 Intel Corporation.
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2. See the
> > + * COPYING file in the top-level directory.
> > + */
> > +
> > +#include <stdint.h>
> > +
> > +#include "qemu/osdep.h"
> > +#include "qemu/bswap.h"
> > +#include "qemu/typedefs.h"
> > +#include "hw/cxl/cxl.h"
> > +#include "hw/cxl/cxl_events.h"
> > +
> > +struct cxl_event_log *find_event_log(CXLDeviceState *cxlds, int log_type)
> > +{
> > +    if (log_type >= CXL_EVENT_TYPE_MAX) {
> > +        return NULL;
> > +    }
> > +    return &cxlds->event_logs[log_type];
> > +}
> > +
> > +struct cxl_event_record_raw *get_cur_event(struct cxl_event_log *log)
> > +{
> > +    return log->events[log->cur_event];
> > +}
> > +
> > +uint16_t get_cur_event_handle(struct cxl_event_log *log)
> > +{
> > +    return cpu_to_le16(log->cur_event);
> > +}
> > +
> > +bool log_empty(struct cxl_event_log *log)
> > +{
> > +    return log->cur_event == log->nr_events;
> > +}
> > +
> > +int log_rec_left(struct cxl_event_log *log)
> > +{
> > +    return log->nr_events - log->cur_event;
> > +}
> > +
> > +static void event_store_add_event(CXLDeviceState *cxlds,
> > +                                  enum cxl_event_log_type log_type,
> > +                                  struct cxl_event_record_raw *event)
> > +{
> > +    struct cxl_event_log *log;
> > +
> > +    assert(log_type < CXL_EVENT_TYPE_MAX);
> > +
> > +    log = &cxlds->event_logs[log_type];
> > +    assert(log->nr_events < CXL_TEST_EVENT_CNT_MAX);
> > +
> > +    log->events[log->nr_events] = event;
> > +    log->nr_events++;
> > +}
> > +
> > +uint16_t log_overflow(struct cxl_event_log *log)
> > +{
> > +    int cnt = log_rec_left(log) - 5;
> 
> Why -5?  Can't we make it actually overflow and drop records
> if that happens?
> 
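
To make the question concrete, here is a minimal sketch of a log that really
does overflow: once it is full, new records are dropped and counted.  All names
(demo_event_log, demo_log_add, DEMO_LOG_CAPACITY) are hypothetical and the
capacity of 4 is arbitrary; this is not what the patch currently does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_LOG_CAPACITY 4    /* arbitrary size for the sketch */

struct demo_event_log {
    int nr_events;                 /* records currently stored */
    uint16_t overflow_err_count;   /* records dropped because the log was full */
    const void *events[DEMO_LOG_CAPACITY];
};

/* Returns true if the record was stored, false if it was dropped. */
static bool demo_log_add(struct demo_event_log *log, const void *event)
{
    if (log->nr_events >= DEMO_LOG_CAPACITY) {
        /* Saturate rather than wrap the 16 bit counter. */
        if (log->overflow_err_count < UINT16_MAX) {
            log->overflow_err_count++;
        }
        return false;
    }
    log->events[log->nr_events++] = event;
    return true;
}
```

With something along these lines the overflow count reported to the guest would
be the number of records actually dropped rather than a value derived from a
fixed offset.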
> > +
> > +    if (cnt < 0) {
> > +        return 0;
> > +    }
> > +    return cnt;
> > +}
> > +
> > +#define CXL_EVENT_RECORD_FLAG_PERMANENT         BIT(2)
> > +#define CXL_EVENT_RECORD_FLAG_MAINT_NEEDED      BIT(3)
> > +#define CXL_EVENT_RECORD_FLAG_PERF_DEGRADED     BIT(4)
> > +#define CXL_EVENT_RECORD_FLAG_HW_REPLACE        BIT(5)
> > +
> > +struct cxl_event_record_raw maint_needed = {
> > +    .hdr = {
> > +        .id.data = UUID(0xDEADBEEF, 0xCAFE, 0xBABE,
> > +                        0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
> > +        .length = sizeof(struct cxl_event_record_raw),
> > +        .flags[0] = CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,
> > +        /* .handle = Set dynamically */
> > +        .related_handle = const_le16(0xa5b6),
> > +    },
> > +    .data = { 0xDE, 0xAD, 0xBE, 0xEF },
> > +};
> > +
> > +struct cxl_event_record_raw hardware_replace = {
> > +    .hdr = {
> > +        .id.data = UUID(0xBABECAFE, 0xBEEF, 0xDEAD,
> > +                        0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
> > +        .length = sizeof(struct cxl_event_record_raw),
> > +        .flags[0] = CXL_EVENT_RECORD_FLAG_HW_REPLACE,
> > +        /* .handle = Set dynamically */
> > +        .related_handle = const_le16(0xb6a5),
> > +    },
> > +    .data = { 0xDE, 0xAD, 0xBE, 0xEF },
> > +};
> > +
> > +#define CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT            BIT(0)
> > +#define CXL_GMER_EVT_DESC_THRESHOLD_EVENT               BIT(1)
> > +#define CXL_GMER_EVT_DESC_POISON_LIST_OVERFLOW          BIT(2)
> > +
> > +#define CXL_GMER_MEM_EVT_TYPE_ECC_ERROR                 0x00
> > +#define CXL_GMER_MEM_EVT_TYPE_INV_ADDR                  0x01
> > +#define CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR           0x02
> > +
> > +#define CXL_GMER_TRANS_UNKNOWN                          0x00
> > +#define CXL_GMER_TRANS_HOST_READ                        0x01
> > +#define CXL_GMER_TRANS_HOST_WRITE                       0x02
> > +#define CXL_GMER_TRANS_HOST_SCAN_MEDIA                  0x03
> > +#define CXL_GMER_TRANS_HOST_INJECT_POISON               0x04
> > +#define CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB             0x05
> > +#define CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT        0x06
> > +
> > +#define CXL_GMER_VALID_CHANNEL                          BIT(0)
> > +#define CXL_GMER_VALID_RANK                             BIT(1)
> > +#define CXL_GMER_VALID_DEVICE                           BIT(2)
> > +#define CXL_GMER_VALID_COMPONENT                        BIT(3)
> > +
> > +struct cxl_event_gen_media gen_media = {
> > +    .hdr = {
> > +        .id.data = UUID(0xfbcd0a77, 0xc260, 0x417f,
> > +                        0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6),
> > +        .length = sizeof(struct cxl_event_gen_media),
> > +        .flags[0] = CXL_EVENT_RECORD_FLAG_PERMANENT,
> > +        /* .handle = Set dynamically */
> > +        .related_handle = const_le16(0),
> > +    },
> > +    .phys_addr = const_le64(0x2000),
> > +    .descriptor = CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT,
> > +    .type = CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,
> > +    .transaction_type = CXL_GMER_TRANS_HOST_WRITE,
> > +    .validity_flags = { CXL_GMER_VALID_CHANNEL |
> > +                        CXL_GMER_VALID_RANK, 0 },
> > +    .channel = 1,
> > +    .rank = 30
> > +};
> > +
> > +#define CXL_DER_VALID_CHANNEL                           BIT(0)
> > +#define CXL_DER_VALID_RANK                              BIT(1)
> > +#define CXL_DER_VALID_NIBBLE                            BIT(2)
> > +#define CXL_DER_VALID_BANK_GROUP                        BIT(3)
> > +#define CXL_DER_VALID_BANK                              BIT(4)
> > +#define CXL_DER_VALID_ROW                               BIT(5)
> > +#define CXL_DER_VALID_COLUMN                            BIT(6)
> > +#define CXL_DER_VALID_CORRECTION_MASK                   BIT(7)
> > +
> > +struct cxl_event_dram dram = {
> > +    .hdr = {
> > +        .id.data = UUID(0x601dcbb3, 0x9c06, 0x4eab,
> > +                        0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24),
> > +        .length = sizeof(struct cxl_event_dram),
> > +        .flags[0] = CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,
> > +        /* .handle = Set dynamically */
> > +        .related_handle = const_le16(0),
> > +    },
> > +    .phys_addr = const_le64(0x8000),
> > +    .descriptor = CXL_GMER_EVT_DESC_THRESHOLD_EVENT,
> > +    .type = CXL_GMER_MEM_EVT_TYPE_INV_ADDR,
> > +    .transaction_type = CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB,
> > +    .validity_flags = { CXL_DER_VALID_CHANNEL |
> > +                        CXL_DER_VALID_BANK_GROUP |
> > +                        CXL_DER_VALID_BANK |
> > +                        CXL_DER_VALID_COLUMN, 0 },
> > +    .channel = 1,
> > +    .bank_group = 5,
> > +    .bank = 2,
> > +    .column = { 0xDE, 0xAD},
> > +};
> > +
> > +#define CXL_MMER_HEALTH_STATUS_CHANGE           0x00
> > +#define CXL_MMER_MEDIA_STATUS_CHANGE            0x01
> > +#define CXL_MMER_LIFE_USED_CHANGE               0x02
> > +#define CXL_MMER_TEMP_CHANGE                    0x03
> > +#define CXL_MMER_DATA_PATH_ERROR                0x04
> > +#define CXL_MMER_LAS_ERROR                      0x05
> 
> Ah this explains why I didn't find these alongside the structures.
> I'd keep them together.  If we need to put the structures in a header
> then put the defines there as well.  Puts all the spec related
> stuff in one place.
> 
> > +
> > +#define CXL_DHI_HS_MAINTENANCE_NEEDED           BIT(0)
> > +#define CXL_DHI_HS_PERFORMANCE_DEGRADED         BIT(1)
> > +#define CXL_DHI_HS_HW_REPLACEMENT_NEEDED        BIT(2)
> > +
> > +#define CXL_DHI_MS_NORMAL                                    0x00
> > +#define CXL_DHI_MS_NOT_READY                                 0x01
> > +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOST                    0x02
> > +#define CXL_DHI_MS_ALL_DATA_LOST                             0x03
> > +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_POWER_LOSS   0x04
> > +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_SHUTDOWN     0x05
> > +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_IMMINENT           0x06
> > +#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_POWER_LOSS      0x07
> > +#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_SHUTDOWN        0x08
> > +#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_IMMINENT              0x09
> > +
> > +#define CXL_DHI_AS_NORMAL               0x0
> > +#define CXL_DHI_AS_WARNING              0x1
> > +#define CXL_DHI_AS_CRITICAL             0x2
> > +
> > +#define CXL_DHI_AS_LIFE_USED(as)        (as & 0x3)
> > +#define CXL_DHI_AS_DEV_TEMP(as)         ((as & 0xC) >> 2)
> > +#define CXL_DHI_AS_COR_VOL_ERR_CNT(as)  ((as & 0x10) >> 4)
> > +#define CXL_DHI_AS_COR_PER_ERR_CNT(as)  ((as & 0x20) >> 5)
> > +
> > +struct cxl_event_mem_module mem_module = {
> > +    .hdr = {
> > +        .id.data = UUID(0xfe927475, 0xdd59, 0x4339,
> > +                        0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74),
> 
> As mentioned below, a UUID define for each type in the header
> probably makes more sense than having this huge thing inline.
> 
> > +        .length = sizeof(struct cxl_event_mem_module),
> > +        /* .handle = Set dynamically */
> > +        .related_handle = const_le16(0),
> > +    },
> > +    .event_type = CXL_MMER_TEMP_CHANGE,
> > +    .info = {
> > +        .health_status = CXL_DHI_HS_PERFORMANCE_DEGRADED,
> > +        .media_status = CXL_DHI_MS_ALL_DATA_LOST,
> > +        .add_status = (CXL_DHI_AS_CRITICAL << 2) |
> > +                       (CXL_DHI_AS_WARNING << 4) |
> > +                       (CXL_DHI_AS_WARNING << 5),
> > +        .device_temp = { 0xDE, 0xAD},
> 
> odd spacing
> 
> > +        .dirty_shutdown_cnt = { 0xde, 0xad, 0xbe, 0xef },
> > +        .cor_vol_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
> 
> Could make a reasonable number up rather than deadbeef ;)
> 
> > +        .cor_per_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
> > +    }
> > +};
> > +
> > +void cxl_mock_add_event_logs(CXLDeviceState *cxlds)
> > +{
> 
> This is fine for initial testing, but I think we want to be more
> sophisticated with the injection interface and allow injecting
> individual events so we can move the requirement for 'coverage'
> testing from having a representative list here to an external script
> that hits all the corners.
> 
> I can build something on top of this that lets us do that if you like.
> I have ancient code doing the equivalent for CCIX devices that I never
> upstreamed. Would probably do it a bit differently today but principle
> is the same. Using QMP directly rather than qmp-shell lets you do it
> as json which ends up more readable than complex command lines for this
> sort of structured command.
> 
> 
> 
> > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_INFO, &maint_needed);
> > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_INFO,
> > +                          (struct cxl_event_record_raw *)&gen_media);
> > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_INFO,
> > +                          (struct cxl_event_record_raw *)&mem_module);
> > +
> > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL, &maint_needed);
> > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL, &hardware_replace);
> > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL,
> > +                          (struct cxl_event_record_raw *)&dram);
> > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL,
> > +                          (struct cxl_event_record_raw *)&gen_media);
> > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL,
> > +                          (struct cxl_event_record_raw *)&mem_module);
> > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL, &hardware_replace);
> > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL,
> > +                          (struct cxl_event_record_raw *)&dram);
> > +
> > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FATAL, &hardware_replace);
> > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FATAL,
> > +                          (struct cxl_event_record_raw *)&dram);
> > +}
> 
> 
> > diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> > index 7b4cff569347..46c50c1c13a6 100644
> > --- a/include/hw/cxl/cxl_device.h
> > +++ b/include/hw/cxl/cxl_device.h
> > @@ -11,6 +11,7 @@
> >  #define CXL_DEVICE_H
> >  
> >  #include "hw/register.h"
> > +#include "hw/cxl/cxl_events.h"
> >  
> >  /*
> >   * The following is how a CXL device's Memory Device registers are laid out.
> > @@ -80,6 +81,14 @@
> >      (CXL_DEVICE_CAP_REG_SIZE + CXL_DEVICE_STATUS_REGISTERS_LENGTH +     \
> >       CXL_MAILBOX_REGISTERS_LENGTH + CXL_MEMORY_DEVICE_REGISTERS_LENGTH)
> >  
> > +#define CXL_TEST_EVENT_CNT_MAX 15
> 
> Where did 15 come from?
> 
> > +
> > +struct cxl_event_log {
> > +    int cur_event;
> > +    int nr_events;
> > +    struct cxl_event_record_raw *events[CXL_TEST_EVENT_CNT_MAX];
> > +};
> > +
> >  typedef struct cxl_device_state {
> >      MemoryRegion device_registers;
> >  
> > @@ -119,6 +128,8 @@ typedef struct cxl_device_state {
> >  
> >      /* memory region for persistent memory, HDM */
> >      uint64_t pmem_size;
> > +
> > +    struct cxl_event_log event_logs[CXL_EVENT_TYPE_MAX];
> >  } CXLDeviceState;
> >  
> >  /* Initialize the register block for a device */
> > @@ -272,4 +283,12 @@ MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, uint64_t *data,
> >  MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
> >                              unsigned size, MemTxAttrs attrs);
> >  
> > +void cxl_mock_add_event_logs(CXLDeviceState *cxlds);
> > +struct cxl_event_log *find_event_log(CXLDeviceState *cxlds, int log_type);
> > +struct cxl_event_record_raw *get_cur_event(struct cxl_event_log *log);
> > +uint16_t get_cur_event_handle(struct cxl_event_log *log);
> > +bool log_empty(struct cxl_event_log *log);
> > +int log_rec_left(struct cxl_event_log *log);
> > +uint16_t log_overflow(struct cxl_event_log *log);
> > +
> >  #endif
> > diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
> > new file mode 100644
> > index 000000000000..255111f3dcfb
> > --- /dev/null
> > +++ b/include/hw/cxl/cxl_events.h
> > @@ -0,0 +1,173 @@
> > +/*
> > + * QEMU CXL Events
> > + *
> > + * Copyright (c) 2022 Intel
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2. See the
> > + * COPYING file in the top-level directory.
> > + */
> > +
> > +#ifndef CXL_EVENTS_H
> > +#define CXL_EVENTS_H
> > +
> > +#include "qemu/uuid.h"
> > +#include "hw/cxl/cxl.h"
> > +
> > +/*
> > + * Common Event Record Format
> > + * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
> > + */
> > +#define CXL_EVENT_REC_HDR_RES_LEN 0xf
> 
> I don't see an advantage in this define vs just
> putting the value in directly below.
> Same with similar cases - the define just makes them
> a tiny bit harder to compare with the specification when
> reviewing.
> 
> > +struct cxl_event_record_hdr {
> > +    QemuUUID id;
> > +    uint8_t length;
> > +    uint8_t flags[3];
> > +    uint16_t handle;
> > +    uint16_t related_handle;
> > +    uint64_t timestamp;
> > +    uint8_t maint_op_class;
> > +    uint8_t reserved[CXL_EVENT_REC_HDR_RES_LEN];
> > +} QEMU_PACKED;
> > +
> > +#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
> > +struct cxl_event_record_raw {
> > +    struct cxl_event_record_hdr hdr;
> > +    uint8_t data[CXL_EVENT_RECORD_DATA_LENGTH];
> > +} QEMU_PACKED;
> 
> Hmm. I wonder if we should instead define this as a union of
> the known event types?  I haven't checked if it would work
> everywhere yet though.
> 
> > +
> > +/*
> > + * Get Event Records output payload
> > + * CXL rev 3.0 section 8.2.9.2.2; Table 8-50
> > + *
> > + * Space given for 1 record
> > + */
> > +#define CXL_GET_EVENT_FLAG_OVERFLOW     BIT(0)
> > +#define CXL_GET_EVENT_FLAG_MORE_RECORDS BIT(1)
> > +struct cxl_get_event_payload {
> > +    uint8_t flags;
> > +    uint8_t reserved1;
> > +    uint16_t overflow_err_count;
> > +    uint64_t first_overflow_timestamp;
> > +    uint64_t last_overflow_timestamp;
> > +    uint16_t record_count;
> > +    uint8_t reserved2[0xa];
> > +    struct cxl_event_record_raw record;
> 
> This last element should be a [] array and then move
> the handling of different record counts to the places it
> is used.
> 
> Spec unfortunately says that we should return as many
> as we can fit, so we can't rely on the users of this interface
> only sending a request for one record (as I think your Linux
> kernel code currently does). See below for more on this...
> 
> 
> > +} QEMU_PACKED;
> > +
> > +/*
> > + * CXL rev 3.0 section 8.2.9.2.2; Table 8-49
> > + */
> > +enum cxl_event_log_type {
> > +    CXL_EVENT_TYPE_INFO = 0x00,
> > +    CXL_EVENT_TYPE_WARN,
> > +    CXL_EVENT_TYPE_FAIL,
> > +    CXL_EVENT_TYPE_FATAL,
> > +    CXL_EVENT_TYPE_DYNAMIC_CAP,
> > +    CXL_EVENT_TYPE_MAX
> > +};
> > +
> > +static inline const char *cxl_event_log_type_str(enum cxl_event_log_type type)
> > +{
> > +    switch (type) {
> > +    case CXL_EVENT_TYPE_INFO:
> > +        return "Informational";
> > +    case CXL_EVENT_TYPE_WARN:
> > +        return "Warning";
> > +    case CXL_EVENT_TYPE_FAIL:
> > +        return "Failure";
> > +    case CXL_EVENT_TYPE_FATAL:
> > +        return "Fatal";
> > +    case CXL_EVENT_TYPE_DYNAMIC_CAP:
> > +        return "Dynamic Capacity";
> > +    default:
> > +        break;
> > +    }
> > +    return "<unknown>";
> > +}
> > +
> > +/*
> > + * Clear Event Records input payload
> > + * CXL rev 3.0 section 8.2.9.2.3; Table 8-51
> > + *
> > + * Space given for 1 record
> 
> I'd rather this was defined to have a trailing variable length
> array of handles and allocations and then wherever it was used
> we deal with the length.
> 
> I'm also nervous about limiting the qemu emulation to handling only
> one record.. Spec wise I don't think you are allowed to say
> no to larger clears.  I understand the fact we can't test this today
> with the kernel code but maybe we can hack together enough to
> verify the emulation of larger gets and clears...
> 
> 
> > + */
> > +struct cxl_mbox_clear_event_payload {
> > +    uint8_t event_log;      /* enum cxl_event_log_type */
> > +    uint8_t clear_flags;
> > +    uint8_t nr_recs;        /* 1 for this struct */
> > +    uint8_t reserved[3];
> > +    uint16_t handle;
> > +};
> > +
> > +/*
> > + * General Media Event Record
> > + * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
> > + */
> 
> In interests of keeping everything that needs checking against
> a chunk of the spec together, perhaps it's worth adding appropriate
> defines for the UUIDs?
> 
> > +#define CXL_EVENT_GEN_MED_COMP_ID_SIZE  0x10
> > +#define CXL_EVENT_GEN_MED_RES_SIZE      0x2e
> 
> As above, I'd rather see these inline.
> 
> > +struct cxl_event_gen_media {
> > +    struct cxl_event_record_hdr hdr;
> > +    uint64_t phys_addr;
> Defines for the mask + that we have a few things hiding in
> the bottom bits?
> 
> > +    uint8_t descriptor;
> Defines for the various fields in here?
> 
> > +    uint8_t type;
> Same for the others that follow.
> 
> > +    uint8_t transaction_type;
> 
> > +    uint8_t validity_flags[2];
> 
> uint16_t probably makes sense as we can do that for this one (unlike the helpful le24 flags fields
> in other structures).
> 
> > +    uint8_t channel;
> > +    uint8_t rank;
> > +    uint8_t device[3];
> > +    uint8_t component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
> > +    uint8_t reserved[CXL_EVENT_GEN_MED_RES_SIZE];
> > +} QEMU_PACKED;
> Would be nice to add a build time check that these structures have the correct
> overall size. Ben did a bunch of these in the other CXL emulation and they are
> a handy way to reassure reviewers that it adds up right!
> 
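
A minimal sketch of such a check, assuming GCC/Clang style packing and using
local copies of two of the structures (the demo_ names are stand-ins; in QEMU
proper QEMU_BUILD_BUG_ON would be the usual spelling).  The expected sizes are
simply what the fields in this patch add up to, 0x30 for the header and 0x80
for a full record, and still need to be compared against the spec tables:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PACKED __attribute__((packed))

/* Local copy of the common event record header from this patch. */
struct demo_event_record_hdr {
    uint8_t id[16];            /* QemuUUID in the patch */
    uint8_t length;
    uint8_t flags[3];
    uint16_t handle;
    uint16_t related_handle;
    uint64_t timestamp;
    uint8_t maint_op_class;
    uint8_t reserved[0xf];
} DEMO_PACKED;

/* Local copy of the General Media Event Record from this patch. */
struct demo_event_gen_media {
    struct demo_event_record_hdr hdr;
    uint64_t phys_addr;
    uint8_t descriptor;
    uint8_t type;
    uint8_t transaction_type;
    uint8_t validity_flags[2];
    uint8_t channel;
    uint8_t rank;
    uint8_t device[3];
    uint8_t component_id[0x10];
    uint8_t reserved[0x2e];
} DEMO_PACKED;

/* The build fails if a field is added, dropped or mis-sized. */
static_assert(sizeof(struct demo_event_record_hdr) == 0x30,
              "event record header must be 0x30 bytes");
static_assert(sizeof(struct demo_event_gen_media) == 0x80,
              "general media event record must be 0x80 bytes");
```

The same pattern would apply to the DRAM and Memory Module records, which also
add up to 0x80 bytes as defined here.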
> > +
> > +/*
> > + * DRAM Event Record - DER
> > + * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
> > + */
> > +#define CXL_EVENT_DER_CORRECTION_MASK_SIZE   0x20
> > +#define CXL_EVENT_DER_RES_SIZE               0x17
> Same as above.
> 
> > +struct cxl_event_dram {
> > +    struct cxl_event_record_hdr hdr;
> > +    uint64_t phys_addr;
> As before I'd like defines for the sub fields and masks.
> 
> > +    uint8_t descriptor;
> > +    uint8_t type;
> > +    uint8_t transaction_type;
> > +    uint8_t validity_flags[2];
> uint16_t and same in similar cases.
> 
> > +    uint8_t channel;
> > +    uint8_t rank;
> > +    uint8_t nibble_mask[3];
> > +    uint8_t bank_group;
> > +    uint8_t bank;
> > +    uint8_t row[3];
> > +    uint8_t column[2];
> > +    uint8_t correction_mask[CXL_EVENT_DER_CORRECTION_MASK_SIZE];
> > +    uint8_t reserved[CXL_EVENT_DER_RES_SIZE];
> > +} QEMU_PACKED;
> > +
> > +/*
> > + * Get Health Info Record
> > + * CXL rev 3.0 section 8.2.9.8.3.1; Table 8-100
> > + */
> > +struct cxl_get_health_info {
> Same stuff as for earlier structures.
> 
> > +    uint8_t health_status;
> > +    uint8_t media_status;
> > +    uint8_t add_status;
> > +    uint8_t life_used;
> > +    uint8_t device_temp[2];
> > +    uint8_t dirty_shutdown_cnt[4];
> > +    uint8_t cor_vol_err_cnt[4];
> > +    uint8_t cor_per_err_cnt[4];
> > +} QEMU_PACKED;
> > +
> > +/*
> > + * Memory Module Event Record
> > + * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
> > + */
> > +#define CXL_EVENT_MEM_MOD_RES_SIZE  0x3d
> > +struct cxl_event_mem_module {
> > +    struct cxl_event_record_hdr hdr;
> > +    uint8_t event_type;
> > +    struct cxl_get_health_info info;
> > +    uint8_t reserved[CXL_EVENT_MEM_MOD_RES_SIZE];
> > +} QEMU_PACKED;
> > +
> > +#endif /* CXL_EVENTS_H */
> 


* Re: [RFC PATCH 3/6] hw/cxl/cxl-events: Add CXL mock events
  2022-10-14  0:21     ` Ira Weiny
@ 2022-10-17 15:57         ` Jonathan Cameron via
  0 siblings, 0 replies; 35+ messages in thread
From: Jonathan Cameron @ 2022-10-17 15:57 UTC (permalink / raw)
  To: Ira Weiny; +Cc: Michael Tsirkin, Ben Widawsky, qemu-devel, linux-cxl

On Thu, 13 Oct 2022 17:21:56 -0700
Ira Weiny <ira.weiny@intel.com> wrote:

> On Tue, Oct 11, 2022 at 11:07:59AM +0100, Jonathan Cameron wrote:
> > On Mon, 10 Oct 2022 15:29:41 -0700
> > ira.weiny@intel.com wrote:
> >   
> > > From: Ira Weiny <ira.weiny@intel.com>
> > > 
> > > To facilitate testing of guest software add mock events and code to
> > > support iterating through the event logs.
> > > 
> > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>  
> > 
> > Various comments inline, but biggest one is I'd like to see
> > a much more flexible injection interface.  Happy to help code one
> > up if that is useful.  
> 
> Quick response to this.
> 
> I thought about holding off and doing something like that but this got the irq
> testing in the kernel off the ground.
> 
> I think it would be cool to use QMP to submit events as json.  That would be
> much more flexible.  But would have taken a lot more time.

The qapi code gen infrastructure makes this fairly simple (subject to fighting
with meson - with hindsight the same fight I had with it for other stubs...)

I've moved the poison injection patches over to this and will hopefully send
an RFC v2 of those out tomorrow.

For reference injection of poison now looks like

{ "execute": "cxl-inject-poison",
    "arguments": {
        "path": "/machine/peripheral/cxl-pmem0",
        "start": 2048,
        "length": 256
    }
}

defined via a new cxl.json that other than comments just contains
{ 'command': 'cxl-inject-poison',
  'data': { 'path': 'str', 'start': 'uint64', 'length': 'uint64' }}

from that all the json parsing infrastructure is generated and you just
need to provide an implementation of

void qmp_cxl_inject_poison(const char *path, uint64_t start, uint64_t length,
                           Error **errp)

Something similar for these events will be very straightforward.
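
Purely as a sketch of the shape an event handler might take (every name here
is hypothetical, including the argument list; DemoError stands in for QEMU's
Error, where real code would call error_setg()), the handler mostly needs to
validate its arguments before building a record:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for QEMU's Error object (assumption for this sketch). */
typedef struct DemoError {
    const char *msg;
} DemoError;

/* Mirrors the log types from the patch under review. */
enum demo_event_log_type {
    DEMO_EVENT_TYPE_INFO = 0x00,
    DEMO_EVENT_TYPE_WARN,
    DEMO_EVENT_TYPE_FAIL,
    DEMO_EVENT_TYPE_FATAL,
    DEMO_EVENT_TYPE_DYNAMIC_CAP,
    DEMO_EVENT_TYPE_MAX
};

static DemoError demo_err_no_path = { "device path is required" };
static DemoError demo_err_bad_log = { "invalid event log type" };

/* Hypothetical handler: on failure *errp is set, on success it is NULL. */
static void demo_qmp_cxl_inject_event(const char *path, uint8_t log_type,
                                      uint8_t flags, DemoError **errp)
{
    (void)flags;    /* would be copied into the record header */

    if (path == NULL || path[0] == '\0') {
        *errp = &demo_err_no_path;
        return;
    }
    if (log_type >= DEMO_EVENT_TYPE_MAX) {
        *errp = &demo_err_bad_log;
        return;
    }
    /*
     * Real code would resolve the path to the device, fill in the record
     * from the remaining arguments, append it to the log and raise the
     * interrupt.
     */
    *errp = NULL;
}
```

The per-record fields would become further arguments in the json schema, which
is where the flexibility over a fixed list of blobs comes from.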

Jonathan

> 
> What I did below duplicated the test code cxl-test has.  It was pretty quick to
> do that.
> 
> The biggest issue with that is parsing the various events from the json to binary blobs.
> 
> I'll clean up this patch and see what I can do.  But I think having a set of
> statically defined blobs which can be injected would make testing a bit easier.
> Less framework to format json input to QMP.
> 
> More to come...
> 
> Ira
> 
> > 
> > Jonathan
> > 
> >   
> > > ---
> > >  hw/cxl/cxl-events.c         | 248 ++++++++++++++++++++++++++++++++++++
> > >  hw/cxl/meson.build          |   1 +
> > >  include/hw/cxl/cxl_device.h |  19 +++
> > >  include/hw/cxl/cxl_events.h | 173 +++++++++++++++++++++++++
> > >  4 files changed, 441 insertions(+)
> > >  create mode 100644 hw/cxl/cxl-events.c
> > >  create mode 100644 include/hw/cxl/cxl_events.h
> > > 
> > > diff --git a/hw/cxl/cxl-events.c b/hw/cxl/cxl-events.c
> > > new file mode 100644
> > > index 000000000000..c275280bcb64
> > > --- /dev/null
> > > +++ b/hw/cxl/cxl-events.c
> > > @@ -0,0 +1,248 @@
> > > +/*
> > > + * CXL Event processing
> > > + *
> > > + * Copyright(C) 2022 Intel Corporation.
> > > + *
> > > + * This work is licensed under the terms of the GNU GPL, version 2. See the
> > > + * COPYING file in the top-level directory.
> > > + */
> > > +
> > > +#include <stdint.h>
> > > +
> > > +#include "qemu/osdep.h"
> > > +#include "qemu/bswap.h"
> > > +#include "qemu/typedefs.h"
> > > +#include "hw/cxl/cxl.h"
> > > +#include "hw/cxl/cxl_events.h"
> > > +
> > > +struct cxl_event_log *find_event_log(CXLDeviceState *cxlds, int log_type)
> > > +{
> > > +    if (log_type >= CXL_EVENT_TYPE_MAX) {
> > > +        return NULL;
> > > +    }
> > > +    return &cxlds->event_logs[log_type];
> > > +}
> > > +
> > > +struct cxl_event_record_raw *get_cur_event(struct cxl_event_log *log)
> > > +{
> > > +    return log->events[log->cur_event];
> > > +}
> > > +
> > > +uint16_t get_cur_event_handle(struct cxl_event_log *log)
> > > +{
> > > +    return cpu_to_le16(log->cur_event);
> > > +}
> > > +
> > > +bool log_empty(struct cxl_event_log *log)
> > > +{
> > > +    return log->cur_event == log->nr_events;
> > > +}
> > > +
> > > +int log_rec_left(struct cxl_event_log *log)
> > > +{
> > > +    return log->nr_events - log->cur_event;
> > > +}
> > > +
> > > +static void event_store_add_event(CXLDeviceState *cxlds,
> > > +                                  enum cxl_event_log_type log_type,
> > > +                                  struct cxl_event_record_raw *event)
> > > +{
> > > +    struct cxl_event_log *log;
> > > +
> > > +    assert(log_type < CXL_EVENT_TYPE_MAX);
> > > +
> > > +    log = &cxlds->event_logs[log_type];
> > > +    assert(log->nr_events < CXL_TEST_EVENT_CNT_MAX);
> > > +
> > > +    log->events[log->nr_events] = event;
> > > +    log->nr_events++;
> > > +}
> > > +
> > > +uint16_t log_overflow(struct cxl_event_log *log)
> > > +{
> > > +    int cnt = log_rec_left(log) - 5;  
> > 
> > Why -5?  Can't we make it actually overflow and drop records
> > if that happens?
> >   
> > > +
> > > +    if (cnt < 0) {
> > > +        return 0;
> > > +    }
> > > +    return cnt;
> > > +}
> > > +
> > > +#define CXL_EVENT_RECORD_FLAG_PERMANENT         BIT(2)
> > > +#define CXL_EVENT_RECORD_FLAG_MAINT_NEEDED      BIT(3)
> > > +#define CXL_EVENT_RECORD_FLAG_PERF_DEGRADED     BIT(4)
> > > +#define CXL_EVENT_RECORD_FLAG_HW_REPLACE        BIT(5)
> > > +
> > > +struct cxl_event_record_raw maint_needed = {
> > > +    .hdr = {
> > > +        .id.data = UUID(0xDEADBEEF, 0xCAFE, 0xBABE,
> > > +                        0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
> > > +        .length = sizeof(struct cxl_event_record_raw),
> > > +        .flags[0] = CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,
> > > +        /* .handle = Set dynamically */
> > > +        .related_handle = const_le16(0xa5b6),
> > > +    },
> > > +    .data = { 0xDE, 0xAD, 0xBE, 0xEF },
> > > +};
> > > +
> > > +struct cxl_event_record_raw hardware_replace = {
> > > +    .hdr = {
> > > +        .id.data = UUID(0xBABECAFE, 0xBEEF, 0xDEAD,
> > > +                        0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
> > > +        .length = sizeof(struct cxl_event_record_raw),
> > > +        .flags[0] = CXL_EVENT_RECORD_FLAG_HW_REPLACE,
> > > +        /* .handle = Set dynamically */
> > > +        .related_handle = const_le16(0xb6a5),
> > > +    },
> > > +    .data = { 0xDE, 0xAD, 0xBE, 0xEF },
> > > +};
> > > +
> > > +#define CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT            BIT(0)
> > > +#define CXL_GMER_EVT_DESC_THRESHOLD_EVENT               BIT(1)
> > > +#define CXL_GMER_EVT_DESC_POISON_LIST_OVERFLOW          BIT(2)
> > > +
> > > +#define CXL_GMER_MEM_EVT_TYPE_ECC_ERROR                 0x00
> > > +#define CXL_GMER_MEM_EVT_TYPE_INV_ADDR                  0x01
> > > +#define CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR           0x02
> > > +
> > > +#define CXL_GMER_TRANS_UNKNOWN                          0x00
> > > +#define CXL_GMER_TRANS_HOST_READ                        0x01
> > > +#define CXL_GMER_TRANS_HOST_WRITE                       0x02
> > > +#define CXL_GMER_TRANS_HOST_SCAN_MEDIA                  0x03
> > > +#define CXL_GMER_TRANS_HOST_INJECT_POISON               0x04
> > > +#define CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB             0x05
> > > +#define CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT        0x06
> > > +
> > > +#define CXL_GMER_VALID_CHANNEL                          BIT(0)
> > > +#define CXL_GMER_VALID_RANK                             BIT(1)
> > > +#define CXL_GMER_VALID_DEVICE                           BIT(2)
> > > +#define CXL_GMER_VALID_COMPONENT                        BIT(3)
> > > +
> > > +struct cxl_event_gen_media gen_media = {
> > > +    .hdr = {
> > > +        .id.data = UUID(0xfbcd0a77, 0xc260, 0x417f,
> > > +                        0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6),
> > > +        .length = sizeof(struct cxl_event_gen_media),
> > > +        .flags[0] = CXL_EVENT_RECORD_FLAG_PERMANENT,
> > > +        /* .handle = Set dynamically */
> > > +        .related_handle = const_le16(0),
> > > +    },
> > > +    .phys_addr = const_le64(0x2000),
> > > +    .descriptor = CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT,
> > > +    .type = CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,
> > > +    .transaction_type = CXL_GMER_TRANS_HOST_WRITE,
> > > +    .validity_flags = { CXL_GMER_VALID_CHANNEL |
> > > +                        CXL_GMER_VALID_RANK, 0 },
> > > +    .channel = 1,
> > > +    .rank = 30
> > > +};
> > > +
> > > +#define CXL_DER_VALID_CHANNEL                           BIT(0)
> > > +#define CXL_DER_VALID_RANK                              BIT(1)
> > > +#define CXL_DER_VALID_NIBBLE                            BIT(2)
> > > +#define CXL_DER_VALID_BANK_GROUP                        BIT(3)
> > > +#define CXL_DER_VALID_BANK                              BIT(4)
> > > +#define CXL_DER_VALID_ROW                               BIT(5)
> > > +#define CXL_DER_VALID_COLUMN                            BIT(6)
> > > +#define CXL_DER_VALID_CORRECTION_MASK                   BIT(7)
> > > +
> > > +struct cxl_event_dram dram = {
> > > +    .hdr = {
> > > +        .id.data = UUID(0x601dcbb3, 0x9c06, 0x4eab,
> > > +                        0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24),
> > > +        .length = sizeof(struct cxl_event_dram),
> > > +        .flags[0] = CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,
> > > +        /* .handle = Set dynamically */
> > > +        .related_handle = const_le16(0),
> > > +    },
> > > +    .phys_addr = const_le64(0x8000),
> > > +    .descriptor = CXL_GMER_EVT_DESC_THRESHOLD_EVENT,
> > > +    .type = CXL_GMER_MEM_EVT_TYPE_INV_ADDR,
> > > +    .transaction_type = CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB,
> > > +    .validity_flags = { CXL_DER_VALID_CHANNEL |
> > > +                        CXL_DER_VALID_BANK_GROUP |
> > > +                        CXL_DER_VALID_BANK |
> > > +                        CXL_DER_VALID_COLUMN, 0 },
> > > +    .channel = 1,
> > > +    .bank_group = 5,
> > > +    .bank = 2,
> > > +    .column = { 0xDE, 0xAD},
> > > +};
> > > +
> > > +#define CXL_MMER_HEALTH_STATUS_CHANGE           0x00
> > > +#define CXL_MMER_MEDIA_STATUS_CHANGE            0x01
> > > +#define CXL_MMER_LIFE_USED_CHANGE               0x02
> > > +#define CXL_MMER_TEMP_CHANGE                    0x03
> > > +#define CXL_MMER_DATA_PATH_ERROR                0x04
> > > +#define CXL_MMER_LAS_ERROR                      0x05  
> > 
> > Ah this explains why I didn't find these alongside the structures.
> > I'd keep them together.  If we need to put the structures in a header
> > then put the defines there as well.  Puts all the spec related
> > stuff in one place.
> >   
> > > +
> > > +#define CXL_DHI_HS_MAINTENANCE_NEEDED           BIT(0)
> > > +#define CXL_DHI_HS_PERFORMANCE_DEGRADED         BIT(1)
> > > +#define CXL_DHI_HS_HW_REPLACEMENT_NEEDED        BIT(2)
> > > +
> > > +#define CXL_DHI_MS_NORMAL                                    0x00
> > > +#define CXL_DHI_MS_NOT_READY                                 0x01
> > > +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOST                    0x02
> > > +#define CXL_DHI_MS_ALL_DATA_LOST                             0x03
> > > +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_POWER_LOSS   0x04
> > > +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_SHUTDOWN     0x05
> > > +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_IMMINENT           0x06
> > > +#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_POWER_LOSS      0x07
> > > +#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_SHUTDOWN        0x08
> > > +#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_IMMINENT              0x09
> > > +
> > > +#define CXL_DHI_AS_NORMAL               0x0
> > > +#define CXL_DHI_AS_WARNING              0x1
> > > +#define CXL_DHI_AS_CRITICAL             0x2
> > > +
> > > +#define CXL_DHI_AS_LIFE_USED(as)        (as & 0x3)
> > > +#define CXL_DHI_AS_DEV_TEMP(as)         ((as & 0xC) >> 2)
> > > +#define CXL_DHI_AS_COR_VOL_ERR_CNT(as)  ((as & 0x10) >> 4)
> > > +#define CXL_DHI_AS_COR_PER_ERR_CNT(as)  ((as & 0x20) >> 5)
> > > +
> > > +struct cxl_event_mem_module mem_module = {
> > > +    .hdr = {
> > > +        .id.data = UUID(0xfe927475, 0xdd59, 0x4339,
> > > +                        0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74),  
> > 
> > As mentioned below, a UUID define for each type in the header
> > probably makes more sense than having this huge thing inline.
> >   
> > > +        .length = sizeof(struct cxl_event_mem_module),
> > > +        /* .handle = Set dynamically */
> > > +        .related_handle = const_le16(0),
> > > +    },
> > > +    .event_type = CXL_MMER_TEMP_CHANGE,
> > > +    .info = {
> > > +        .health_status = CXL_DHI_HS_PERFORMANCE_DEGRADED,
> > > +        .media_status = CXL_DHI_MS_ALL_DATA_LOST,
> > > +        .add_status = (CXL_DHI_AS_CRITICAL << 2) |
> > > +                       (CXL_DHI_AS_WARNING << 4) |
> > > +                       (CXL_DHI_AS_WARNING << 5),
> > > +        .device_temp = { 0xDE, 0xAD},  
> > 
> > odd spacing
> >   
> > > +        .dirty_shutdown_cnt = { 0xde, 0xad, 0xbe, 0xef },
> > > +        .cor_vol_err_cnt = { 0xde, 0xad, 0xbe, 0xef },  
> > 
> > Could make a reasonable number up rather than deadbeef ;)
> >   
> > > +        .cor_per_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
> > > +    }
> > > +};
> > > +
> > > +void cxl_mock_add_event_logs(CXLDeviceState *cxlds)
> > > +{  
> > 
> > This is fine for initial testing, but I think we want to be more
> > sophisticated with the injection interface and allow injecting
> > individual events so we can move the requirement for 'coverage'
> > testing from having a representative list here to an external script
> > that hits all the corners.
> > 
> > I can build something on top of this that lets us do that if you like.
> > I have ancient code doing the equivalent for CCIX devices that I never
> > upstreamed. Would probably do it a bit differently today but the principle
> > is the same. Using QMP directly rather than qmp-shell lets you do it
> > as JSON which ends up more readable than complex command lines for this
> > sort of structured command.
> > 
> > 
> >   
> > > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_INFO, &maint_needed);
> > > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_INFO,
> > > +                          (struct cxl_event_record_raw *)&gen_media);
> > > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_INFO,
> > > +                          (struct cxl_event_record_raw *)&mem_module);
> > > +
> > > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL, &maint_needed);
> > > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL, &hardware_replace);
> > > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL,
> > > +                          (struct cxl_event_record_raw *)&dram);
> > > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL,
> > > +                          (struct cxl_event_record_raw *)&gen_media);
> > > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL,
> > > +                          (struct cxl_event_record_raw *)&mem_module);
> > > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL, &hardware_replace);
> > > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL,
> > > +                          (struct cxl_event_record_raw *)&dram);
> > > +
> > > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FATAL, &hardware_replace);
> > > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FATAL,
> > > +                          (struct cxl_event_record_raw *)&dram);
> > > +}  
> > 
> >   
> > > diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> > > index 7b4cff569347..46c50c1c13a6 100644
> > > --- a/include/hw/cxl/cxl_device.h
> > > +++ b/include/hw/cxl/cxl_device.h
> > > @@ -11,6 +11,7 @@
> > >  #define CXL_DEVICE_H
> > >  
> > >  #include "hw/register.h"
> > > +#include "hw/cxl/cxl_events.h"
> > >  
> > >  /*
> > >   * The following is how a CXL device's Memory Device registers are laid out.
> > > @@ -80,6 +81,14 @@
> > >      (CXL_DEVICE_CAP_REG_SIZE + CXL_DEVICE_STATUS_REGISTERS_LENGTH +     \
> > >       CXL_MAILBOX_REGISTERS_LENGTH + CXL_MEMORY_DEVICE_REGISTERS_LENGTH)
> > >  
> > > +#define CXL_TEST_EVENT_CNT_MAX 15  
> > 
> > Where did 15 come from?
> >   
> > > +
> > > +struct cxl_event_log {
> > > +    int cur_event;
> > > +    int nr_events;
> > > +    struct cxl_event_record_raw *events[CXL_TEST_EVENT_CNT_MAX];
> > > +};
> > > +
> > >  typedef struct cxl_device_state {
> > >      MemoryRegion device_registers;
> > >  
> > > @@ -119,6 +128,8 @@ typedef struct cxl_device_state {
> > >  
> > >      /* memory region for persistent memory, HDM */
> > >      uint64_t pmem_size;
> > > +
> > > +    struct cxl_event_log event_logs[CXL_EVENT_TYPE_MAX];
> > >  } CXLDeviceState;
> > >  
> > >  /* Initialize the register block for a device */
> > > @@ -272,4 +283,12 @@ MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, uint64_t *data,
> > >  MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
> > >                              unsigned size, MemTxAttrs attrs);
> > >  
> > > +void cxl_mock_add_event_logs(CXLDeviceState *cxlds);
> > > +struct cxl_event_log *find_event_log(CXLDeviceState *cxlds, int log_type);
> > > +struct cxl_event_record_raw *get_cur_event(struct cxl_event_log *log);
> > > +uint16_t get_cur_event_handle(struct cxl_event_log *log);
> > > +bool log_empty(struct cxl_event_log *log);
> > > +int log_rec_left(struct cxl_event_log *log);
> > > +uint16_t log_overflow(struct cxl_event_log *log);
> > > +
> > >  #endif
> > > diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
> > > new file mode 100644
> > > index 000000000000..255111f3dcfb
> > > --- /dev/null
> > > +++ b/include/hw/cxl/cxl_events.h
> > > @@ -0,0 +1,173 @@
> > > +/*
> > > + * QEMU CXL Events
> > > + *
> > > + * Copyright (c) 2022 Intel
> > > + *
> > > + * This work is licensed under the terms of the GNU GPL, version 2. See the
> > > + * COPYING file in the top-level directory.
> > > + */
> > > +
> > > +#ifndef CXL_EVENTS_H
> > > +#define CXL_EVENTS_H
> > > +
> > > +#include "qemu/uuid.h"
> > > +#include "hw/cxl/cxl.h"
> > > +
> > > +/*
> > > + * Common Event Record Format
> > > + * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
> > > + */
> > > +#define CXL_EVENT_REC_HDR_RES_LEN 0xf  
> > 
> > I don't see an advantage in this define vs just
> > putting the value in directly below.
> > Same with similar cases - the define just makes them
> > a tiny bit harder to compare with the specification when
> > reviewing.
> >   
> > > +struct cxl_event_record_hdr {
> > > +    QemuUUID id;
> > > +    uint8_t length;
> > > +    uint8_t flags[3];
> > > +    uint16_t handle;
> > > +    uint16_t related_handle;
> > > +    uint64_t timestamp;
> > > +    uint8_t maint_op_class;
> > > +    uint8_t reserved[CXL_EVENT_REC_HDR_RES_LEN];
> > > +} QEMU_PACKED;
> > > +
> > > +#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
> > > +struct cxl_event_record_raw {
> > > +    struct cxl_event_record_hdr hdr;
> > > +    uint8_t data[CXL_EVENT_RECORD_DATA_LENGTH];
> > > +} QEMU_PACKED;  
> > 
> > Hmm. I wonder if we should instead define this as a union of
> > the known event types?  I haven't checked if it would work
> > everywhere yet though.
> >   
> > > +
> > > +/*
> > > + * Get Event Records output payload
> > > + * CXL rev 3.0 section 8.2.9.2.2; Table 8-50
> > > + *
> > > + * Space given for 1 record
> > > + */
> > > +#define CXL_GET_EVENT_FLAG_OVERFLOW     BIT(0)
> > > +#define CXL_GET_EVENT_FLAG_MORE_RECORDS BIT(1)
> > > +struct cxl_get_event_payload {
> > > +    uint8_t flags;
> > > +    uint8_t reserved1;
> > > +    uint16_t overflow_err_count;
> > > +    uint64_t first_overflow_timestamp;
> > > +    uint64_t last_overflow_timestamp;
> > > +    uint16_t record_count;
> > > +    uint8_t reserved2[0xa];
> > > +    struct cxl_event_record_raw record;  
> > 
> > This last element should be a [] array and then move
> > the handling of different record counts to the places it
> > is used.
> > 
> > Spec unfortunately says that we should return as many
> > as we can fit, so we can't rely on the users of this interface
> > only sending a request for one record (as I think your Linux
> > kernel code currently does). See below for more on this...
> > 
> >   
> > > +} QEMU_PACKED;
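
For illustration, a minimal sketch of the trailing-array layout suggested in the review comment above, with the record count worked out where the payload is filled in. The names (`get_event_payload`, `event_record`, `fill_payload`, `RECORD_SIZE`) are hypothetical stand-ins, not the patch's or QEMU's API; the 128-byte record size assumes the 0x30-byte header plus 0x50 bytes of data defined in this patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RECORD_SIZE 128            /* 0x30-byte header + 0x50 bytes of data */
#define FLAG_MORE_RECORDS (1u << 1)

/* Stand-in for struct cxl_event_record_raw. */
struct event_record {
    uint8_t bytes[RECORD_SIZE];
};

/* Hypothetical variant of the Get Event Records output payload. */
struct get_event_payload {
    uint8_t flags;
    uint8_t reserved1;
    uint16_t overflow_err_count;
    uint64_t first_overflow_timestamp;
    uint64_t last_overflow_timestamp;
    uint16_t record_count;
    uint8_t reserved2[0xa];
    struct event_record records[]; /* as many records as fit */
} __attribute__((packed));

/*
 * Copy as many pending records as fit into a buffer of buf_len bytes.
 * Returns the number of payload bytes used, or 0 if even the fixed
 * part of the payload does not fit.
 */
static size_t fill_payload(void *buf, size_t buf_len,
                           const struct event_record *pending,
                           size_t nr_pending)
{
    struct get_event_payload *pl = buf;
    size_t fixed = offsetof(struct get_event_payload, records);
    size_t max_recs, n;

    if (buf_len < fixed) {
        return 0;
    }
    max_recs = (buf_len - fixed) / sizeof(struct event_record);
    n = nr_pending < max_recs ? nr_pending : max_recs;

    memset(pl, 0, fixed);
    memcpy(pl->records, pending, n * sizeof(struct event_record));
    /* Host byte order here; a device model would store this little-endian. */
    pl->record_count = (uint16_t)n;
    if (n < nr_pending) {
        pl->flags |= FLAG_MORE_RECORDS;
    }
    return fixed + n * sizeof(struct event_record);
}
```

The caller passes whatever mailbox payload space it has, so the same structure serves requests for one record or for many.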
> > > +
> > > +/*
> > > + * CXL rev 3.0 section 8.2.9.2.2; Table 8-49
> > > + */
> > > +enum cxl_event_log_type {
> > > +    CXL_EVENT_TYPE_INFO = 0x00,
> > > +    CXL_EVENT_TYPE_WARN,
> > > +    CXL_EVENT_TYPE_FAIL,
> > > +    CXL_EVENT_TYPE_FATAL,
> > > +    CXL_EVENT_TYPE_DYNAMIC_CAP,
> > > +    CXL_EVENT_TYPE_MAX
> > > +};
> > > +
> > > +static inline const char *cxl_event_log_type_str(enum cxl_event_log_type type)
> > > +{
> > > +    switch (type) {
> > > +    case CXL_EVENT_TYPE_INFO:
> > > +        return "Informational";
> > > +    case CXL_EVENT_TYPE_WARN:
> > > +        return "Warning";
> > > +    case CXL_EVENT_TYPE_FAIL:
> > > +        return "Failure";
> > > +    case CXL_EVENT_TYPE_FATAL:
> > > +        return "Fatal";
> > > +    case CXL_EVENT_TYPE_DYNAMIC_CAP:
> > > +        return "Dynamic Capacity";
> > > +    default:
> > > +        break;
> > > +    }
> > > +    return "<unknown>";
> > > +}
> > > +
> > > +/*
> > > + * Clear Event Records input payload
> > > + * CXL rev 3.0 section 8.2.9.2.3; Table 8-51
> > > + *
> > > + * Space given for 1 record  
> > 
> > I'd rather this was defined to have a trailing variable length
> > array of handles and allocations and then wherever it was used
> > we deal with the length.
> > 
> > I'm also nervous about limiting the qemu emulation to handling only
> > one record.. Spec wise I don't think you are allowed to say
> > no to larger clears.  I understand the fact we can't test this today
> > with the kernel code but maybe we can hack together enough to
> > verify the emulation of larger gets and clears...
> > 
> >   
> > > + */
> > > +struct cxl_mbox_clear_event_payload {
> > > +    uint8_t event_log;      /* enum cxl_event_log_type */
> > > +    uint8_t clear_flags;
> > > +    uint8_t nr_recs;        /* 1 for this struct */
> > > +    uint8_t reserved[3];
> > > +    uint16_t handle;
> > > +};
> > > +
> > > +/*
> > > + * General Media Event Record
> > > + * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
> > > + */  
> > 
> > In interests of keeping everything that needs checking against
> > a chunk of the spec together, perhaps it's worth adding appropriate
> > defines for the UUIDs?
> >   
> > > +#define CXL_EVENT_GEN_MED_COMP_ID_SIZE  0x10
> > > +#define CXL_EVENT_GEN_MED_RES_SIZE      0x2e  
> > 
> > As above, I'd rather see these inline.
> >   
> > > +struct cxl_event_gen_media {
> > > +    struct cxl_event_record_hdr hdr;
> > > +    uint64_t phys_addr;  
> > Defines for the mask + that we have a few things hiding in
> > the bottom bits?
> >   
> > > +    uint8_t descriptor;  
> > Defines for the various fields in here?
> >   
> > > +    uint8_t type;  
> > Same for the others that follow.
> >   
> > > +    uint8_t transaction_type;  
> >   
> > > +    uint8_t validity_flags[2];  
> > 
> > uint16_t probably makes sense as we can do that for this one (unlike the helpful le24 flags fields
> > in other structures).
> >   
> > > +    uint8_t channel;
> > > +    uint8_t rank;
> > > +    uint8_t device[3];
> > > +    uint8_t component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
> > > +    uint8_t reserved[CXL_EVENT_GEN_MED_RES_SIZE];
> > > +} QEMU_PACKED;  
> > Would be nice to add a build time check that these structures have the correct
> > overall size. Ben did a bunch of these in the other CXL emulation and they are
> > a handy way to reassure reviewers that it adds up right!
> >   
> > > +
> > > +/*
> > > + * DRAM Event Record - DER
> > > + * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
> > > + */
> > > +#define CXL_EVENT_DER_CORRECTION_MASK_SIZE   0x20
> > > +#define CXL_EVENT_DER_RES_SIZE               0x17  
> > Same as above.
> >   
> > > +struct cxl_event_dram {
> > > +    struct cxl_event_record_hdr hdr;
> > > +    uint64_t phys_addr;  
> > As before I'd like defines for the sub fields and masks.
> >   
> > > +    uint8_t descriptor;
> > > +    uint8_t type;
> > > +    uint8_t transaction_type;
> > > +    uint8_t validity_flags[2];  
> > uint16_t and same in similar cases.
> >   
> > > +    uint8_t channel;
> > > +    uint8_t rank;
> > > +    uint8_t nibble_mask[3];
> > > +    uint8_t bank_group;
> > > +    uint8_t bank;
> > > +    uint8_t row[3];
> > > +    uint8_t column[2];
> > > +    uint8_t correction_mask[CXL_EVENT_DER_CORRECTION_MASK_SIZE];
> > > +    uint8_t reserved[CXL_EVENT_DER_RES_SIZE];
> > > +} QEMU_PACKED;
> > > +
> > > +/*
> > > + * Get Health Info Record
> > > + * CXL rev 3.0 section 8.2.9.8.3.1; Table 8-100
> > > + */
> > > +struct cxl_get_health_info {  
> > Same stuff as for earlier structures.
> >   
> > > +    uint8_t health_status;
> > > +    uint8_t media_status;
> > > +    uint8_t add_status;
> > > +    uint8_t life_used;
> > > +    uint8_t device_temp[2];
> > > +    uint8_t dirty_shutdown_cnt[4];
> > > +    uint8_t cor_vol_err_cnt[4];
> > > +    uint8_t cor_per_err_cnt[4];
> > > +} QEMU_PACKED;
> > > +
> > > +/*
> > > + * Memory Module Event Record
> > > + * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
> > > + */
> > > +#define CXL_EVENT_MEM_MOD_RES_SIZE  0x3d
> > > +struct cxl_event_mem_module {
> > > +    struct cxl_event_record_hdr hdr;
> > > +    uint8_t event_type;
> > > +    struct cxl_get_health_info info;
> > > +    uint8_t reserved[CXL_EVENT_MEM_MOD_RES_SIZE];
> > > +} QEMU_PACKED;
> > > +
> > > +#endif /* CXL_EVENTS_H */  
> >   
> 
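
The build-time size check asked for in the review of the event record structures could look roughly like the sketch below. It uses plain C11 `_Static_assert` and local stand-in structures that mirror the field layout quoted above (with `uint8_t id[16]` standing in for `QemuUUID`); in QEMU itself the `QEMU_BUILD_BUG_ON()` macro serves the same purpose. The 0x80 total assumes every record is a 0x30-byte header plus 0x50 bytes of data.

```c
#include <stdint.h>

#define PACKED __attribute__((packed)) /* GCC/Clang; QEMU uses QEMU_PACKED */
#define EVENT_RECORD_SIZE 0x80         /* assumed: 0x30 header + 0x50 data */

struct event_record_hdr {
    uint8_t id[16];                    /* stand-in for QemuUUID */
    uint8_t length;
    uint8_t flags[3];
    uint16_t handle;
    uint16_t related_handle;
    uint64_t timestamp;
    uint8_t maint_op_class;
    uint8_t reserved[0xf];
} PACKED;

struct event_gen_media {
    struct event_record_hdr hdr;
    uint64_t phys_addr;
    uint8_t descriptor;
    uint8_t type;
    uint8_t transaction_type;
    uint8_t validity_flags[2];
    uint8_t channel;
    uint8_t rank;
    uint8_t device[3];
    uint8_t component_id[0x10];
    uint8_t reserved[0x2e];
} PACKED;

struct event_dram {
    struct event_record_hdr hdr;
    uint64_t phys_addr;
    uint8_t descriptor;
    uint8_t type;
    uint8_t transaction_type;
    uint8_t validity_flags[2];
    uint8_t channel;
    uint8_t rank;
    uint8_t nibble_mask[3];
    uint8_t bank_group;
    uint8_t bank;
    uint8_t row[3];
    uint8_t column[2];
    uint8_t correction_mask[0x20];
    uint8_t reserved[0x17];
} PACKED;

struct health_info {
    uint8_t health_status;
    uint8_t media_status;
    uint8_t add_status;
    uint8_t life_used;
    uint8_t device_temp[2];
    uint8_t dirty_shutdown_cnt[4];
    uint8_t cor_vol_err_cnt[4];
    uint8_t cor_per_err_cnt[4];
} PACKED;

struct event_mem_module {
    struct event_record_hdr hdr;
    uint8_t event_type;
    struct health_info info;
    uint8_t reserved[0x3d];
} PACKED;

/* Compilation fails if any layout drifts from the expected size. */
_Static_assert(sizeof(struct event_record_hdr) == 0x30, "header size");
_Static_assert(sizeof(struct event_gen_media) == EVENT_RECORD_SIZE,
               "general media record size");
_Static_assert(sizeof(struct event_dram) == EVENT_RECORD_SIZE,
               "DRAM record size");
_Static_assert(sizeof(struct event_mem_module) == EVENT_RECORD_SIZE,
               "memory module record size");
```

With the field sizes quoted in the patch, all four assertions hold, which is a quick way to confirm the reserved-array lengths add up.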



* Re: [RFC PATCH 3/6] hw/cxl/cxl-events: Add CXL mock events
@ 2022-10-17 15:57         ` Jonathan Cameron via
  0 siblings, 0 replies; 35+ messages in thread
From: Jonathan Cameron via @ 2022-10-17 15:57 UTC (permalink / raw)
  To: Ira Weiny; +Cc: Michael Tsirkin, Ben Widawsky, qemu-devel, linux-cxl

On Thu, 13 Oct 2022 17:21:56 -0700
Ira Weiny <ira.weiny@intel.com> wrote:

> On Tue, Oct 11, 2022 at 11:07:59AM +0100, Jonathan Cameron wrote:
> > On Mon, 10 Oct 2022 15:29:41 -0700
> > ira.weiny@intel.com wrote:
> >   
> > > From: Ira Weiny <ira.weiny@intel.com>
> > > 
> > > To facilitate testing of guest software add mock events and code to
> > > support iterating through the event logs.
> > > 
> > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>  
> > 
> > Various comments inline, but biggest one is I'd like to see
> > a much more flexible injection interface.  Happy to help code one
> > up if that is useful.  
> 
> Quick response to this.
> 
> I thought about holding off and doing something like that but this got the irq
> testing in the kernel off the ground.
> 
> I think it would be cool to use QMP to submit events as json.  That would be
> much more flexible.  But would have taken a lot more time.

The qapi code gen infrastructure makes this fairly simple (subject to fighting
with meson - with hindsight the same fight I had with it for other stubs...)

I've moved the poison injection patches over to this and will hopefully send
an RFC v2 of those out tomorrow.

For reference, injection of poison now looks like:

{ "execute": "cxl-inject-poison",
    "arguments": {
        "path": "/machine/peripheral/cxl-pmem0",
        "start": 2048,
        "length": 256
    }
}

defined via a new cxl.json that, other than comments, just contains
{ 'command': 'cxl-inject-poison',
  'data': { 'path': 'str', 'start': 'uint64', 'length': 'uint64' }}

From that, all the JSON parsing infrastructure is generated and you just
need to provide an implementation of

void qmp_cxl_inject_poison(const char *path, uint64_t start, uint64_t length,
                           Error **errp)

Something similar for these events will be very straightforward.
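
To give a feel for how small the hand-written part is, here is a standalone sketch of such a handler. It is not the actual patch: the `Error` type and `set_error()` are local stand-ins for QEMU's error API, the poison list is a plain static array, the path check is a placeholder for resolving the device object, and the 64-byte granularity check is an assumption of the sketch.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for QEMU's Error object; hypothetical, it only carries a message. */
typedef struct Error {
    char msg[128];
} Error;

/* Stand-in for QEMU's error reporting helper; sets *errp once. */
static void set_error(Error **errp, const char *msg)
{
    if (errp && !*errp) {
        Error *err = calloc(1, sizeof(*err));
        if (err) {
            snprintf(err->msg, sizeof(err->msg), "%s", msg);
            *errp = err;
        }
    }
}

#define POISON_GRANULE  64 /* assumed tracking granularity in bytes */
#define POISON_LIST_MAX 16 /* arbitrary limit for the sketch */

struct poison_range {
    uint64_t start;
    uint64_t length;
};

static struct poison_range poison_list[POISON_LIST_MAX];
static unsigned int poison_count;

/* Same shape as the generated prototype: scalar arguments plus Error **. */
void qmp_cxl_inject_poison(const char *path, uint64_t start, uint64_t length,
                           Error **errp)
{
    /* Placeholder for looking up the device object behind the path. */
    if (!path || strncmp(path, "/machine/", 9) != 0) {
        set_error(errp, "path is not under /machine");
        return;
    }
    if (length == 0 || start % POISON_GRANULE || length % POISON_GRANULE) {
        set_error(errp, "start must be 64-byte aligned and length a "
                        "non-zero multiple of 64");
        return;
    }
    if (poison_count == POISON_LIST_MAX) {
        set_error(errp, "poison list is full");
        return;
    }
    poison_list[poison_count].start = start;
    poison_list[poison_count].length = length;
    poison_count++;
}
```

An event injection command could follow the same pattern: validate the arguments, report problems through `errp`, then append to the per-device state.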

Jonathan

> 
> What I did below duplicated the test code cxl-test has.  It was pretty quick to
> do that.
> 
> The biggest issue is parsing the various events from the JSON into binary blobs.
> 
> I'll clean up this patch and see what I can do.  But I think having a set of
> statically defined blobs which can be injected would make testing a bit easier.
> Less framework to format json input to QMP.
> 
> More to come...
> 
> Ira
> 
> > 
> > Jonathan
> > 
> >   
> > > ---
> > >  hw/cxl/cxl-events.c         | 248 ++++++++++++++++++++++++++++++++++++
> > >  hw/cxl/meson.build          |   1 +
> > >  include/hw/cxl/cxl_device.h |  19 +++
> > >  include/hw/cxl/cxl_events.h | 173 +++++++++++++++++++++++++
> > >  4 files changed, 441 insertions(+)
> > >  create mode 100644 hw/cxl/cxl-events.c
> > >  create mode 100644 include/hw/cxl/cxl_events.h
> > > 
> > > diff --git a/hw/cxl/cxl-events.c b/hw/cxl/cxl-events.c
> > > new file mode 100644
> > > index 000000000000..c275280bcb64
> > > --- /dev/null
> > > +++ b/hw/cxl/cxl-events.c
> > > @@ -0,0 +1,248 @@
> > > +/*
> > > + * CXL Event processing
> > > + *
> > > + * Copyright(C) 2022 Intel Corporation.
> > > + *
> > > + * This work is licensed under the terms of the GNU GPL, version 2. See the
> > > + * COPYING file in the top-level directory.
> > > + */
> > > +
> > > +#include <stdint.h>
> > > +
> > > +#include "qemu/osdep.h"
> > > +#include "qemu/bswap.h"
> > > +#include "qemu/typedefs.h"
> > > +#include "hw/cxl/cxl.h"
> > > +#include "hw/cxl/cxl_events.h"
> > > +
> > > +struct cxl_event_log *find_event_log(CXLDeviceState *cxlds, int log_type)
> > > +{
> > > +    if (log_type >= CXL_EVENT_TYPE_MAX) {
> > > +        return NULL;
> > > +    }
> > > +    return &cxlds->event_logs[log_type];
> > > +}
> > > +
> > > +struct cxl_event_record_raw *get_cur_event(struct cxl_event_log *log)
> > > +{
> > > +    return log->events[log->cur_event];
> > > +}
> > > +
> > > +uint16_t get_cur_event_handle(struct cxl_event_log *log)
> > > +{
> > > +    return cpu_to_le16(log->cur_event);
> > > +}
> > > +
> > > +bool log_empty(struct cxl_event_log *log)
> > > +{
> > > +    return log->cur_event == log->nr_events;
> > > +}
> > > +
> > > +int log_rec_left(struct cxl_event_log *log)
> > > +{
> > > +    return log->nr_events - log->cur_event;
> > > +}
> > > +
> > > +static void event_store_add_event(CXLDeviceState *cxlds,
> > > +                                  enum cxl_event_log_type log_type,
> > > +                                  struct cxl_event_record_raw *event)
> > > +{
> > > +    struct cxl_event_log *log;
> > > +
> > > +    assert(log_type < CXL_EVENT_TYPE_MAX);
> > > +
> > > +    log = &cxlds->event_logs[log_type];
> > > +    assert(log->nr_events < CXL_TEST_EVENT_CNT_MAX);
> > > +
> > > +    log->events[log->nr_events] = event;
> > > +    log->nr_events++;
> > > +}
> > > +
> > > +uint16_t log_overflow(struct cxl_event_log *log)
> > > +{
> > > +    int cnt = log_rec_left(log) - 5;  
> > 
> > Why -5?  Can't we make it actually overflow and drop records
> > if that happens?
> >   
> > > +
> > > +    if (cnt < 0) {
> > > +        return 0;
> > > +    }
> > > +    return cnt;
> > > +}
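
One way to make the log genuinely overflow and drop records, as asked in the review comment above, is sketched here. This is a standalone illustration, not the patch's code: `event_log`, `log_add_event()` and `LOG_CAPACITY` are hypothetical names, the timestamp is passed in by the caller, and resetting the overflow state when records are cleared is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

#define LOG_CAPACITY 4 /* arbitrary small capacity for the sketch */

struct event_log {
    const void *events[LOG_CAPACITY];
    unsigned int nr_events;
    uint16_t overflow_err_count;
    uint64_t first_overflow_timestamp;
    uint64_t last_overflow_timestamp;
};

/*
 * Append an event if there is room. Otherwise drop it and account for
 * the drop. Returns true if the event was stored.
 */
static bool log_add_event(struct event_log *log, const void *event,
                          uint64_t now)
{
    if (log->nr_events < LOG_CAPACITY) {
        log->events[log->nr_events++] = event;
        return true;
    }
    if (log->overflow_err_count == 0) {
        log->first_overflow_timestamp = now;
    }
    log->last_overflow_timestamp = now;
    if (log->overflow_err_count < UINT16_MAX) {
        log->overflow_err_count++; /* saturate rather than wrap */
    }
    return false;
}

/* Number of records dropped so far, with no magic offset involved. */
static uint16_t log_overflow(const struct event_log *log)
{
    return log->overflow_err_count;
}
```

The overflow count and the two timestamps then map directly onto the corresponding fields of the Get Event Records output payload.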
> > > +
> > > +#define CXL_EVENT_RECORD_FLAG_PERMANENT         BIT(2)
> > > +#define CXL_EVENT_RECORD_FLAG_MAINT_NEEDED      BIT(3)
> > > +#define CXL_EVENT_RECORD_FLAG_PERF_DEGRADED     BIT(4)
> > > +#define CXL_EVENT_RECORD_FLAG_HW_REPLACE        BIT(5)
> > > +
> > > +struct cxl_event_record_raw maint_needed = {
> > > +    .hdr = {
> > > +        .id.data = UUID(0xDEADBEEF, 0xCAFE, 0xBABE,
> > > +                        0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
> > > +        .length = sizeof(struct cxl_event_record_raw),
> > > +        .flags[0] = CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,
> > > +        /* .handle = Set dynamically */
> > > +        .related_handle = const_le16(0xa5b6),
> > > +    },
> > > +    .data = { 0xDE, 0xAD, 0xBE, 0xEF },
> > > +};
> > > +
> > > +struct cxl_event_record_raw hardware_replace = {
> > > +    .hdr = {
> > > +        .id.data = UUID(0xBABECAFE, 0xBEEF, 0xDEAD,
> > > +                        0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
> > > +        .length = sizeof(struct cxl_event_record_raw),
> > > +        .flags[0] = CXL_EVENT_RECORD_FLAG_HW_REPLACE,
> > > +        /* .handle = Set dynamically */
> > > +        .related_handle = const_le16(0xb6a5),
> > > +    },
> > > +    .data = { 0xDE, 0xAD, 0xBE, 0xEF },
> > > +};
> > > +
> > > +#define CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT            BIT(0)
> > > +#define CXL_GMER_EVT_DESC_THRESHOLD_EVENT               BIT(1)
> > > +#define CXL_GMER_EVT_DESC_POISON_LIST_OVERFLOW          BIT(2)
> > > +
> > > +#define CXL_GMER_MEM_EVT_TYPE_ECC_ERROR                 0x00
> > > +#define CXL_GMER_MEM_EVT_TYPE_INV_ADDR                  0x01
> > > +#define CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR           0x02
> > > +
> > > +#define CXL_GMER_TRANS_UNKNOWN                          0x00
> > > +#define CXL_GMER_TRANS_HOST_READ                        0x01
> > > +#define CXL_GMER_TRANS_HOST_WRITE                       0x02
> > > +#define CXL_GMER_TRANS_HOST_SCAN_MEDIA                  0x03
> > > +#define CXL_GMER_TRANS_HOST_INJECT_POISON               0x04
> > > +#define CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB             0x05
> > > +#define CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT        0x06
> > > +
> > > +#define CXL_GMER_VALID_CHANNEL                          BIT(0)
> > > +#define CXL_GMER_VALID_RANK                             BIT(1)
> > > +#define CXL_GMER_VALID_DEVICE                           BIT(2)
> > > +#define CXL_GMER_VALID_COMPONENT                        BIT(3)
> > > +
> > > +struct cxl_event_gen_media gen_media = {
> > > +    .hdr = {
> > > +        .id.data = UUID(0xfbcd0a77, 0xc260, 0x417f,
> > > +                        0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6),
> > > +        .length = sizeof(struct cxl_event_gen_media),
> > > +        .flags[0] = CXL_EVENT_RECORD_FLAG_PERMANENT,
> > > +        /* .handle = Set dynamically */
> > > +        .related_handle = const_le16(0),
> > > +    },
> > > +    .phys_addr = const_le64(0x2000),
> > > +    .descriptor = CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT,
> > > +    .type = CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,
> > > +    .transaction_type = CXL_GMER_TRANS_HOST_WRITE,
> > > +    .validity_flags = { CXL_GMER_VALID_CHANNEL |
> > > +                        CXL_GMER_VALID_RANK, 0 },
> > > +    .channel = 1,
> > > +    .rank = 30
> > > +};
> > > +
> > > +#define CXL_DER_VALID_CHANNEL                           BIT(0)
> > > +#define CXL_DER_VALID_RANK                              BIT(1)
> > > +#define CXL_DER_VALID_NIBBLE                            BIT(2)
> > > +#define CXL_DER_VALID_BANK_GROUP                        BIT(3)
> > > +#define CXL_DER_VALID_BANK                              BIT(4)
> > > +#define CXL_DER_VALID_ROW                               BIT(5)
> > > +#define CXL_DER_VALID_COLUMN                            BIT(6)
> > > +#define CXL_DER_VALID_CORRECTION_MASK                   BIT(7)
> > > +
> > > +struct cxl_event_dram dram = {
> > > +    .hdr = {
> > > +        .id.data = UUID(0x601dcbb3, 0x9c06, 0x4eab,
> > > +                        0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24),
> > > +        .length = sizeof(struct cxl_event_dram),
> > > +        .flags[0] = CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,
> > > +        /* .handle = Set dynamically */
> > > +        .related_handle = const_le16(0),
> > > +    },
> > > +    .phys_addr = const_le64(0x8000),
> > > +    .descriptor = CXL_GMER_EVT_DESC_THRESHOLD_EVENT,
> > > +    .type = CXL_GMER_MEM_EVT_TYPE_INV_ADDR,
> > > +    .transaction_type = CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB,
> > > +    .validity_flags = { CXL_DER_VALID_CHANNEL |
> > > +                        CXL_DER_VALID_BANK_GROUP |
> > > +                        CXL_DER_VALID_BANK |
> > > +                        CXL_DER_VALID_COLUMN, 0 },
> > > +    .channel = 1,
> > > +    .bank_group = 5,
> > > +    .bank = 2,
> > > +    .column = { 0xDE, 0xAD},
> > > +};
> > > +
> > > +#define CXL_MMER_HEALTH_STATUS_CHANGE           0x00
> > > +#define CXL_MMER_MEDIA_STATUS_CHANGE            0x01
> > > +#define CXL_MMER_LIFE_USED_CHANGE               0x02
> > > +#define CXL_MMER_TEMP_CHANGE                    0x03
> > > +#define CXL_MMER_DATA_PATH_ERROR                0x04
> > > +#define CXL_MMER_LAS_ERROR                      0x05  
> > 
> > Ah this explains why I didn't find these alongside the structures.
> > I'd keep them together.  If we need to put the structures in a header
> > then put the defines there as well.  Puts all the spec related
> > stuff in one place.
> >   
> > > +
> > > +#define CXL_DHI_HS_MAINTENANCE_NEEDED           BIT(0)
> > > +#define CXL_DHI_HS_PERFORMANCE_DEGRADED         BIT(1)
> > > +#define CXL_DHI_HS_HW_REPLACEMENT_NEEDED        BIT(2)
> > > +
> > > +#define CXL_DHI_MS_NORMAL                                    0x00
> > > +#define CXL_DHI_MS_NOT_READY                                 0x01
> > > +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOST                    0x02
> > > +#define CXL_DHI_MS_ALL_DATA_LOST                             0x03
> > > +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_POWER_LOSS   0x04
> > > +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_SHUTDOWN     0x05
> > > +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_IMMINENT           0x06
> > > +#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_POWER_LOSS      0x07
> > > +#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_SHUTDOWN        0x08
> > > +#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_IMMINENT              0x09
> > > +
> > > +#define CXL_DHI_AS_NORMAL               0x0
> > > +#define CXL_DHI_AS_WARNING              0x1
> > > +#define CXL_DHI_AS_CRITICAL             0x2
> > > +
> > > +#define CXL_DHI_AS_LIFE_USED(as)        (as & 0x3)
> > > +#define CXL_DHI_AS_DEV_TEMP(as)         ((as & 0xC) >> 2)
> > > +#define CXL_DHI_AS_COR_VOL_ERR_CNT(as)  ((as & 0x10) >> 4)
> > > +#define CXL_DHI_AS_COR_PER_ERR_CNT(as)  ((as & 0x20) >> 5)
> > > +
> > > +struct cxl_event_mem_module mem_module = {
> > > +    .hdr = {
> > > +        .id.data = UUID(0xfe927475, 0xdd59, 0x4339,
> > > +                        0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74),  
> > 
> > > As mentioned below, a UUID define for each type in the header
> > > probably makes more sense than having this huge thing inline.
> >   
> > > +        .length = sizeof(struct cxl_event_mem_module),
> > > +        /* .handle = Set dynamically */
> > > +        .related_handle = const_le16(0),
> > > +    },
> > > +    .event_type = CXL_MMER_TEMP_CHANGE,
> > > +    .info = {
> > > +        .health_status = CXL_DHI_HS_PERFORMANCE_DEGRADED,
> > > +        .media_status = CXL_DHI_MS_ALL_DATA_LOST,
> > > +        .add_status = (CXL_DHI_AS_CRITICAL << 2) |
> > > +                       (CXL_DHI_AS_WARNING << 4) |
> > > +                       (CXL_DHI_AS_WARNING << 5),
> > > +        .device_temp = { 0xDE, 0xAD},  
> > 
> > odd spacing
> >   
> > > +        .dirty_shutdown_cnt = { 0xde, 0xad, 0xbe, 0xef },
> > > +        .cor_vol_err_cnt = { 0xde, 0xad, 0xbe, 0xef },  
> > 
> > Could make a reasonable number up rather than deadbeef ;)
> >   
> > > +        .cor_per_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
> > > +    }
> > > +};
> > > +
> > > +void cxl_mock_add_event_logs(CXLDeviceState *cxlds)
> > > +{  
> > 
> > > This is fine for initial testing, but I think we want to be more
> > > sophisticated with the injection interface and allow injecting
> > > individual events so we can move the requirement for 'coverage'
> > > testing from having a representative list here to an external script
> > > that hits all the corners.
> > > 
> > > I can build something on top of this that lets us do that if you like.
> > > I have ancient code doing the equivalent for CCIX devices that I never
> > > upstreamed. Would probably do it a bit differently today but the principle
> > > is the same. Using QMP directly rather than qmp-shell lets you do it
> > > as JSON which ends up more readable than complex command lines for this
> > > sort of structured command.
> > 
> > 
> >   
> > > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_INFO, &maint_needed);
> > > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_INFO,
> > > +                          (struct cxl_event_record_raw *)&gen_media);
> > > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_INFO,
> > > +                          (struct cxl_event_record_raw *)&mem_module);
> > > +
> > > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL, &maint_needed);
> > > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL, &hardware_replace);
> > > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL,
> > > +                          (struct cxl_event_record_raw *)&dram);
> > > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL,
> > > +                          (struct cxl_event_record_raw *)&gen_media);
> > > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL,
> > > +                          (struct cxl_event_record_raw *)&mem_module);
> > > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL, &hardware_replace);
> > > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL,
> > > +                          (struct cxl_event_record_raw *)&dram);
> > > +
> > > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FATAL, &hardware_replace);
> > > +    event_store_add_event(cxlds, CXL_EVENT_TYPE_FATAL,
> > > +                          (struct cxl_event_record_raw *)&dram);
> > > +}  
> > 
> >   
> > > diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> > > index 7b4cff569347..46c50c1c13a6 100644
> > > --- a/include/hw/cxl/cxl_device.h
> > > +++ b/include/hw/cxl/cxl_device.h
> > > @@ -11,6 +11,7 @@
> > >  #define CXL_DEVICE_H
> > >  
> > >  #include "hw/register.h"
> > > +#include "hw/cxl/cxl_events.h"
> > >  
> > >  /*
> > >   * The following is how a CXL device's Memory Device registers are laid out.
> > > @@ -80,6 +81,14 @@
> > >      (CXL_DEVICE_CAP_REG_SIZE + CXL_DEVICE_STATUS_REGISTERS_LENGTH +     \
> > >       CXL_MAILBOX_REGISTERS_LENGTH + CXL_MEMORY_DEVICE_REGISTERS_LENGTH)
> > >  
> > > +#define CXL_TEST_EVENT_CNT_MAX 15  
> > 
> > Where did 15 come from?
> >   
> > > +
> > > +struct cxl_event_log {
> > > +    int cur_event;
> > > +    int nr_events;
> > > +    struct cxl_event_record_raw *events[CXL_TEST_EVENT_CNT_MAX];
> > > +};
> > > +
> > >  typedef struct cxl_device_state {
> > >      MemoryRegion device_registers;
> > >  
> > > @@ -119,6 +128,8 @@ typedef struct cxl_device_state {
> > >  
> > >      /* memory region for persistent memory, HDM */
> > >      uint64_t pmem_size;
> > > +
> > > +    struct cxl_event_log event_logs[CXL_EVENT_TYPE_MAX];
> > >  } CXLDeviceState;
> > >  
> > >  /* Initialize the register block for a device */
> > > @@ -272,4 +283,12 @@ MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, uint64_t *data,
> > >  MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
> > >                              unsigned size, MemTxAttrs attrs);
> > >  
> > > +void cxl_mock_add_event_logs(CXLDeviceState *cxlds);
> > > +struct cxl_event_log *find_event_log(CXLDeviceState *cxlds, int log_type);
> > > +struct cxl_event_record_raw *get_cur_event(struct cxl_event_log *log);
> > > +uint16_t get_cur_event_handle(struct cxl_event_log *log);
> > > +bool log_empty(struct cxl_event_log *log);
> > > +int log_rec_left(struct cxl_event_log *log);
> > > +uint16_t log_overflow(struct cxl_event_log *log);
> > > +
> > >  #endif
> > > diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
> > > new file mode 100644
> > > index 000000000000..255111f3dcfb
> > > --- /dev/null
> > > +++ b/include/hw/cxl/cxl_events.h
> > > @@ -0,0 +1,173 @@
> > > +/*
> > > + * QEMU CXL Events
> > > + *
> > > + * Copyright (c) 2022 Intel
> > > + *
> > > + * This work is licensed under the terms of the GNU GPL, version 2. See the
> > > + * COPYING file in the top-level directory.
> > > + */
> > > +
> > > +#ifndef CXL_EVENTS_H
> > > +#define CXL_EVENTS_H
> > > +
> > > +#include "qemu/uuid.h"
> > > +#include "hw/cxl/cxl.h"
> > > +
> > > +/*
> > > + * Common Event Record Format
> > > + * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
> > > + */
> > > +#define CXL_EVENT_REC_HDR_RES_LEN 0xf  
> > 
> > I don't see an advantage in this define vs just
> > putting the value in directly below.
> > Same with similar cases - the define just makes them
> > a tiny bit harder to compare with the specification when
> > reviewing.
> >   
> > > +struct cxl_event_record_hdr {
> > > +    QemuUUID id;
> > > +    uint8_t length;
> > > +    uint8_t flags[3];
> > > +    uint16_t handle;
> > > +    uint16_t related_handle;
> > > +    uint64_t timestamp;
> > > +    uint8_t maint_op_class;
> > > +    uint8_t reserved[CXL_EVENT_REC_HDR_RES_LEN];
> > > +} QEMU_PACKED;
> > > +
> > > +#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
> > > +struct cxl_event_record_raw {
> > > +    struct cxl_event_record_hdr hdr;
> > > +    uint8_t data[CXL_EVENT_RECORD_DATA_LENGTH];
> > > +} QEMU_PACKED;  
> > 
> > Hmm. I wonder if we should instead define this as a union of
> > the known event types?  I haven't checked if it would work
> > everywhere yet though.
> >   
> > > +
> > > +/*
> > > + * Get Event Records output payload
> > > + * CXL rev 3.0 section 8.2.9.2.2; Table 8-50
> > > + *
> > > + * Space given for 1 record
> > > + */
> > > +#define CXL_GET_EVENT_FLAG_OVERFLOW     BIT(0)
> > > +#define CXL_GET_EVENT_FLAG_MORE_RECORDS BIT(1)
> > > +struct cxl_get_event_payload {
> > > +    uint8_t flags;
> > > +    uint8_t reserved1;
> > > +    uint16_t overflow_err_count;
> > > +    uint64_t first_overflow_timestamp;
> > > +    uint64_t last_overflow_timestamp;
> > > +    uint16_t record_count;
> > > +    uint8_t reserved2[0xa];
> > > +    struct cxl_event_record_raw record;  
> > 
> > This last element should be a [] array and then move
> > the handling of different record counts to the places it
> > is used.
> > 
> > Spec unfortunately says that we should return as many
> > as we can fit, so we can't rely on the users of this interface
> > only sending a request for one record (as I think your Linux
> > kernel code currently does). See below for more on this...
> > 
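
A minimal sketch of the trailing-array layout suggested above, in plain C
rather than QEMU code. The fixed fields mirror the quoted struct; the names
get_event_payload, event_record and records_that_fit() are hypothetical, and
the 0x80-byte record is just the 0x30-byte header plus the 0x50-byte data area
from the patch. The mailbox payload size is a parameter here because the
review does not state what it is.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct cxl_event_record_raw: 0x30 header + 0x50 data. */
struct event_record {
    uint8_t bytes[0x80];
} __attribute__((packed));

/* Same fixed fields as the quoted payload, ending in a flexible array. */
struct get_event_payload {
    uint8_t flags;
    uint8_t reserved1;
    uint16_t overflow_err_count;
    uint64_t first_overflow_timestamp;
    uint64_t last_overflow_timestamp;
    uint16_t record_count;
    uint8_t reserved2[0xa];
    struct event_record records[];   /* as many as the caller has room for */
} __attribute__((packed));

/* The fixed part of the payload is 0x20 bytes. */
static_assert(offsetof(struct get_event_payload, records) == 0x20,
              "fixed part of the payload must be 0x20 bytes");

/* Number of whole records that fit in a payload of payload_size bytes. */
static size_t records_that_fit(size_t payload_size)
{
    size_t fixed = offsetof(struct get_event_payload, records);

    if (payload_size < fixed) {
        return 0;
    }
    return (payload_size - fixed) / sizeof(struct event_record);
}
```

With this shape the handler would compute the count from the payload size it
is given and fill record_count accordingly, rather than assuming one record.
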
> >   
> > > +} QEMU_PACKED;
> > > +
> > > +/*
> > > + * CXL rev 3.0 section 8.2.9.2.2; Table 8-49
> > > + */
> > > +enum cxl_event_log_type {
> > > +    CXL_EVENT_TYPE_INFO = 0x00,
> > > +    CXL_EVENT_TYPE_WARN,
> > > +    CXL_EVENT_TYPE_FAIL,
> > > +    CXL_EVENT_TYPE_FATAL,
> > > +    CXL_EVENT_TYPE_DYNAMIC_CAP,
> > > +    CXL_EVENT_TYPE_MAX
> > > +};
> > > +
> > > +static inline const char *cxl_event_log_type_str(enum cxl_event_log_type type)
> > > +{
> > > +    switch (type) {
> > > +    case CXL_EVENT_TYPE_INFO:
> > > +        return "Informational";
> > > +    case CXL_EVENT_TYPE_WARN:
> > > +        return "Warning";
> > > +    case CXL_EVENT_TYPE_FAIL:
> > > +        return "Failure";
> > > +    case CXL_EVENT_TYPE_FATAL:
> > > +        return "Fatal";
> > > +    case CXL_EVENT_TYPE_DYNAMIC_CAP:
> > > +        return "Dynamic Capacity";
> > > +    default:
> > > +        break;
> > > +    }
> > > +    return "<unknown>";
> > > +}
> > > +
> > > +/*
> > > + * Clear Event Records input payload
> > > + * CXL rev 3.0 section 8.2.9.2.3; Table 8-51
> > > + *
> > > + * Space given for 1 record  
> > 
> > I'd rather this was defined to have a trailing variable length
> > array of handles and allocations and then wherever it was used
> > we deal with the length.
> > 
> > I'm also nervous about limiting the qemu emulation to handling only
> > one record. Spec-wise I don't think you are allowed to say
> > no to larger clears.  I understand the fact we can't test this today
> > with the kernel code but maybe we can hack together enough to
> > verify the emulation of larger gets and clears...
> > 
> >   
> > > + */
> > > +struct cxl_mbox_clear_event_payload {
> > > +    uint8_t event_log;      /* enum cxl_event_log_type */
> > > +    uint8_t clear_flags;
> > > +    uint8_t nr_recs;        /* 1 for this struct */
> > > +    uint8_t reserved[3];
> > > +    uint16_t handle;
> > > +};
> > > +
> > > +/*
> > > + * General Media Event Record
> > > + * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
> > > + */  
> > 
> > In interests of keeping everything that needs checking against
> > a chunk of the spec together, perhaps it's worth adding appropriate
> > defines for the UUIDs?
> >   
> > > +#define CXL_EVENT_GEN_MED_COMP_ID_SIZE  0x10
> > > +#define CXL_EVENT_GEN_MED_RES_SIZE      0x2e  
> > 
> > As above, I'd rather see these inline.
> >   
> > > +struct cxl_event_gen_media {
> > > +    struct cxl_event_record_hdr hdr;
> > > +    uint64_t phys_addr;  
> > Defines for the mask + that we have a few things hiding in
> > the bottom bits?
> >   
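
A rough sketch of what such defines could look like. This assumes the low six
bits of the physical address field carry flags (bit 0 volatile, bit 1 not
repairable) and the remaining bits are the address; that layout should be
checked against Table 8-43 before use. All EV_DPA_* names and the helpers are
hypothetical, not taken from the patch.

```c
#include <assert.h>   /* static_assert */
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: bits [5:0] are flags, bits [63:6] are the address. */
#define EV_DPA_FLAGS_MASK     UINT64_C(0x3f)
#define EV_DPA_MASK           (~EV_DPA_FLAGS_MASK)
#define EV_DPA_VOLATILE       (UINT64_C(1) << 0)
#define EV_DPA_NOT_REPAIRABLE (UINT64_C(1) << 1)

/* Both flag bits must lie inside the flags mask. */
static_assert((EV_DPA_FLAGS_MASK & (EV_DPA_VOLATILE | EV_DPA_NOT_REPAIRABLE))
              == (EV_DPA_VOLATILE | EV_DPA_NOT_REPAIRABLE),
              "flag bits must be covered by the flags mask");

/* Address part of the field, with the flag bits cleared. */
static uint64_t ev_dpa(uint64_t phys_addr)
{
    return phys_addr & EV_DPA_MASK;
}

static bool ev_dpa_is_volatile(uint64_t phys_addr)
{
    return (phys_addr & EV_DPA_VOLATILE) != 0;
}

static bool ev_dpa_not_repairable(uint64_t phys_addr)
{
    return (phys_addr & EV_DPA_NOT_REPAIRABLE) != 0;
}
```
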
> > > +    uint8_t descriptor;  
> > Defines for the various fields in here?
> >   
> > > +    uint8_t type;  
> > Same for the others that follow.
> >   
> > > +    uint8_t transaction_type;  
> >   
> > > +    uint8_t validity_flags[2];  
> > 
> > uint16_t probably makes sense as we can do that for this one (unlike the helpful le24 flags fields
> > in other structures).
> >   
> > > +    uint8_t channel;
> > > +    uint8_t rank;
> > > +    uint8_t device[3];
> > > +    uint8_t component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
> > > +    uint8_t reserved[CXL_EVENT_GEN_MED_RES_SIZE];
> > > +} QEMU_PACKED;  
> > Would be nice to add a build time check that these structures have the correct
> > overall size. Ben did a bunch of these in the other CXL emulation and they are
> > a handy way to reassure reviewers that it adds up right!
> >   
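
For illustration, a build-time size check along these lines, written as
standalone C11 rather than QEMU code. The structs copy the field layout from
the quoted patch, with a 16-byte array standing in for QemuUUID; rec_hdr,
rec_gen_media and PACKED are hypothetical names. In the QEMU tree the check
would more likely use QEMU_BUILD_BUG_ON than static_assert.

```c
#include <assert.h>   /* static_assert */
#include <stdint.h>

#define PACKED __attribute__((packed))

/* Common event record header, same fields as the quoted struct. */
struct rec_hdr {
    uint8_t id[16];            /* stand-in for QemuUUID */
    uint8_t length;
    uint8_t flags[3];
    uint16_t handle;
    uint16_t related_handle;
    uint64_t timestamp;
    uint8_t maint_op_class;
    uint8_t reserved[0xf];
} PACKED;

/* General media event record, same fields as the quoted struct. */
struct rec_gen_media {
    struct rec_hdr hdr;
    uint64_t phys_addr;
    uint8_t descriptor;
    uint8_t type;
    uint8_t transaction_type;
    uint8_t validity_flags[2];
    uint8_t channel;
    uint8_t rank;
    uint8_t device[3];
    uint8_t component_id[0x10];
    uint8_t reserved[0x2e];
} PACKED;

/* Header is 0x30 bytes; header plus 0x50 bytes of data is 0x80. */
static_assert(sizeof(struct rec_hdr) == 0x30,
              "event record header must be 0x30 bytes");
static_assert(sizeof(struct rec_gen_media) == 0x80,
              "general media record must be 0x80 bytes");
```

If a field is added or a reserved length is mistyped, the build fails instead
of the guest seeing a misaligned record.
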
> > > +
> > > +/*
> > > + * DRAM Event Record - DER
> > > + * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
> > > + */
> > > +#define CXL_EVENT_DER_CORRECTION_MASK_SIZE   0x20
> > > +#define CXL_EVENT_DER_RES_SIZE               0x17  
> > Same as above.
> >   
> > > +struct cxl_event_dram {
> > > +    struct cxl_event_record_hdr hdr;
> > > +    uint64_t phys_addr;  
> > As before I'd like defines for the sub fields and masks.
> >   
> > > +    uint8_t descriptor;
> > > +    uint8_t type;
> > > +    uint8_t transaction_type;
> > > +    uint8_t validity_flags[2];  
> > uint16_t and same in similar cases.
> >   
> > > +    uint8_t channel;
> > > +    uint8_t rank;
> > > +    uint8_t nibble_mask[3];
> > > +    uint8_t bank_group;
> > > +    uint8_t bank;
> > > +    uint8_t row[3];
> > > +    uint8_t column[2];
> > > +    uint8_t correction_mask[CXL_EVENT_DER_CORRECTION_MASK_SIZE];
> > > +    uint8_t reserved[CXL_EVENT_DER_RES_SIZE];
> > > +} QEMU_PACKED;
> > > +
> > > +/*
> > > + * Get Health Info Record
> > > + * CXL rev 3.0 section 8.2.9.8.3.1; Table 8-100
> > > + */
> > > +struct cxl_get_health_info {  
> > Same stuff as for earlier structures.
> >   
> > > +    uint8_t health_status;
> > > +    uint8_t media_status;
> > > +    uint8_t add_status;
> > > +    uint8_t life_used;
> > > +    uint8_t device_temp[2];
> > > +    uint8_t dirty_shutdown_cnt[4];
> > > +    uint8_t cor_vol_err_cnt[4];
> > > +    uint8_t cor_per_err_cnt[4];
> > > +} QEMU_PACKED;
> > > +
> > > +/*
> > > + * Memory Module Event Record
> > > + * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
> > > + */
> > > +#define CXL_EVENT_MEM_MOD_RES_SIZE  0x3d
> > > +struct cxl_event_mem_module {
> > > +    struct cxl_event_record_hdr hdr;
> > > +    uint8_t event_type;
> > > +    struct cxl_get_health_info info;
> > > +    uint8_t reserved[CXL_EVENT_MEM_MOD_RES_SIZE];
> > > +} QEMU_PACKED;
> > > +
> > > +#endif /* CXL_EVENTS_H */  
> >   
> 




* Re: [RFC PATCH 3/6] hw/cxl/cxl-events: Add CXL mock events
  2022-10-10 22:29 ` [RFC PATCH 3/6] hw/cxl/cxl-events: Add CXL mock events ira.weiny
@ 2022-12-19 10:07     ` Jonathan Cameron via
  2022-12-19 10:07     ` Jonathan Cameron via
  1 sibling, 0 replies; 35+ messages in thread
From: Jonathan Cameron @ 2022-12-19 10:07 UTC (permalink / raw)
  To: ira.weiny; +Cc: Michael Tsirkin, Ben Widawsky, qemu-devel, linux-cxl

On Mon, 10 Oct 2022 15:29:41 -0700
ira.weiny@intel.com wrote:

> From: Ira Weiny <ira.weiny@intel.com>
> 
> To facilitate testing of guest software add mock events and code to
> support iterating through the event logs.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

An FYI for the next version as I hit an issue with this when
testing resets (there are lots of other issues with resets
but this one crashed QEMU :)

> ---

> +static void event_store_add_event(CXLDeviceState *cxlds,
> +                                  enum cxl_event_log_type log_type,
> +                                  struct cxl_event_record_raw *event)
> +{
> +    struct cxl_event_log *log;
> +
> +    assert(log_type < CXL_EVENT_TYPE_MAX);
> +
> +    log = &cxlds->event_logs[log_type];
> +    assert(log->nr_events < CXL_TEST_EVENT_CNT_MAX);

This assert triggers on a reset as the function is called without
clearing the buffer first.

I'd suggest moving the setup of any dummy events over to a code
path that isn't run on reset.
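
To illustrate the failure mode with a standalone sketch (not the QEMU code):
if the populate step runs again without clearing the log, each call appends on
top of what is already there until the limit is hit. The version below takes
the other route from the one suggested above and clears the log before
repopulating, so it is safe to call repeatedly. event_log, log_add_event() and
log_populate() are hypothetical names; only the 15-entry limit comes from the
patch.

```c
#include <assert.h>
#include <string.h>

#define EVENT_CNT_MAX 15

struct event_log {
    int cur_event;
    int nr_events;
    const void *events[EVENT_CNT_MAX];
};

/* Returns 0 on success, -1 when the log is full (instead of asserting). */
static int log_add_event(struct event_log *log, const void *event)
{
    if (log->nr_events >= EVENT_CNT_MAX) {
        return -1;
    }
    log->events[log->nr_events++] = event;
    return 0;
}

/* Safe to run again on reset: clear the log first, then add the events. */
static void log_populate(struct event_log *log, const void *const *mock, int n)
{
    memset(log, 0, sizeof(*log));
    for (int i = 0; i < n; i++) {
        int rc = log_add_event(log, mock[i]);

        assert(rc == 0);
        (void)rc;   /* keep builds with NDEBUG warning-free */
    }
}
```
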


> +
> +    log->events[log->nr_events] = event;
> +    log->nr_events++;
> +}




* Re: [RFC PATCH 3/6] hw/cxl/cxl-events: Add CXL mock events
  2022-12-19 10:07     ` Jonathan Cameron via
  (?)
@ 2022-12-21 18:56     ` Ira Weiny
  -1 siblings, 0 replies; 35+ messages in thread
From: Ira Weiny @ 2022-12-21 18:56 UTC (permalink / raw)
  To: Jonathan Cameron; +Cc: Michael Tsirkin, Ben Widawsky, qemu-devel, linux-cxl

On Mon, Dec 19, 2022 at 10:07:23AM +0000, Jonathan Cameron wrote:
> On Mon, 10 Oct 2022 15:29:41 -0700
> ira.weiny@intel.com wrote:
> 
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > To facilitate testing of guest software add mock events and code to
> > support iterating through the event logs.
> > 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> An FYI for the next version as I hit an issue with this when
> testing resets (there are lots of other issues with resets
> but this one crashed QEMU :)
> 
> > ---
> 
> > +static void event_store_add_event(CXLDeviceState *cxlds,
> > +                                  enum cxl_event_log_type log_type,
> > +                                  struct cxl_event_record_raw *event)
> > +{
> > +    struct cxl_event_log *log;
> > +
> > +    assert(log_type < CXL_EVENT_TYPE_MAX);
> > +
> > +    log = &cxlds->event_logs[log_type];
> > +    assert(log->nr_events < CXL_TEST_EVENT_CNT_MAX);
> 
> This assert triggers on a reset as the function is called without
> clearing the buffer first.

Not quite sure what happened there. But this code is completely gone in the new
version.  As is the mass insertion of events at start up.  I've jettisoned all
that in favor of the QMP injection of individual events.

I should be sending out a new version today.  It is based on cxl-2022-11-17.  I
believe it is much cleaner.  But I'm only supporting general media events
right now.  The others can be added pretty easily but I want to get the
infrastructure settled before working on those.

Ira

> 
> I'd suggest moving the setup of any dummy events over to a code
> path that isn't run on reset.
> 
> 
> > +
> > +    log->events[log->nr_events] = event;
> > +    log->nr_events++;
> > +}
> 


Thread overview: 35+ messages
2022-10-10 22:29 [RFC PATCH 0/6] QEMU CXL Provide mock CXL events and irq support ira.weiny
2022-10-10 22:29 ` [RFC PATCH 1/6] qemu/bswap: Add const_le64() ira.weiny
2022-10-11  9:03   ` Jonathan Cameron
2022-10-11  9:03     ` Jonathan Cameron via
2022-10-13 22:52     ` Ira Weiny
2022-10-11  9:48   ` Peter Maydell
2022-10-11 15:22     ` Richard Henderson
2022-10-11 15:45       ` Peter Maydell
2022-10-13 22:47         ` Ira Weiny
2022-10-10 22:29 ` [RFC PATCH 2/6] qemu/uuid: Add UUID static initializer ira.weiny
2022-10-11  9:13   ` Jonathan Cameron
2022-10-11  9:13     ` Jonathan Cameron via
2022-10-13 23:11     ` Ira Weiny
2022-10-10 22:29 ` [RFC PATCH 3/6] hw/cxl/cxl-events: Add CXL mock events ira.weiny
2022-10-11 10:07   ` Jonathan Cameron
2022-10-11 10:07     ` Jonathan Cameron via
2022-10-14  0:21     ` Ira Weiny
2022-10-17 15:57       ` Jonathan Cameron
2022-10-17 15:57         ` Jonathan Cameron via
2022-12-19 10:07   ` Jonathan Cameron
2022-12-19 10:07     ` Jonathan Cameron via
2022-12-21 18:56     ` Ira Weiny
2022-10-10 22:29 ` [RFC PATCH 4/6] hw/cxl/mailbox: Wire up get/clear event mailbox commands ira.weiny
2022-10-11 10:26   ` Jonathan Cameron
2022-10-11 10:26     ` Jonathan Cameron via
2022-10-10 22:29 ` [RFC PATCH 5/6] hw/cxl/cxl-events: Add event interrupt support ira.weiny
2022-10-11 10:30   ` Jonathan Cameron
2022-10-11 10:30     ` Jonathan Cameron via
2022-10-10 22:29 ` [RFC PATCH 6/6] hw/cxl/mailbox: Wire up Get/Set Event Interrupt policy ira.weiny
2022-10-11 10:40   ` Jonathan Cameron
2022-10-11 10:40     ` Jonathan Cameron via
2022-10-10 22:45 ` [RFC PATCH 0/6] QEMU CXL Provide mock CXL events and irq support Ira Weiny
2022-10-11  9:40 ` Jonathan Cameron
2022-10-11  9:40   ` Jonathan Cameron via
2022-10-11 17:03   ` Ira Weiny
