linux-cxl.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/6] hw/cxl: Poison get, inject, clear
@ 2023-03-02 10:17 Jonathan Cameron
  2023-03-02 10:17 ` [PATCH v3 1/6] hw/cxl: rename mailbox return code type from ret_code to CXLRetCode Jonathan Cameron
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Jonathan Cameron @ 2023-03-02 10:17 UTC (permalink / raw)
  To: qemu-devel, Michael Tsirkin, Fan Ni
  Cc: linux-cxl, linuxarm, Ira Weiny, Alison Schofield, Michael Roth,
	Philippe Mathieu-Daudé,
	Dave Jiang, Markus Armbruster

v3:
- Gather tags. Thanks Fan Ni
- CC qapi maintainers. Thanks Michael Tsirkin

Note Alison has stated the kernel series will be post 6.3 material
so this one isn't quite as urgent as the patches it is based on.
However I think this series in a good state (plus I have lots more queued
behind it) hence promoting it from RFC.

Changes since RFC v2: Thanks to Markus for review.
 - Improve documentation for QMP interface
 - Add better description of baseline series
 - Include precursor refactors around ret_code / CXLRetCode as this is now
   the first series in suggeste merge order to rely on those.
 - Include Ira's cxl_device_get_timestamp() function as it was better than
   the equivalent in the RFC.

Based on following series (in order)
1. [PATCH v4 00/10] hw/cxl: CXL emulation cleanups and minor fixes for upstream
2. [PATCH v6 0/8] hw/cxl: RAS error emulation and injection
3. [PATCH v2 0/2] hw/cxl: Passthrough HDM decoder emulation
4. [PATCH v4 0/2] hw/mem: CXL Type-3 Volatile Memory Support

Based on: Message-Id: 20230206172816.8201-1-Jonathan.Cameron@huawei.com
Based-on: Message-id: 20230227112751.6101-1-Jonathan.Cameron@huawei.com
Based-on: Message-id: 20230227153128.8164-1-Jonathan.Cameron@huawei.com
Based-on: Message-id: 20230227163157.6621-1-Jonathan.Cameron@huawei.com

The series supports:
1) Injection of variable length poison regions via QMP (to fake real
   memory corruption and ensure we deal with odd overflow corner cases
   such as clearing the middle of a large region making the list overflow
   as we go from one long entry to two smaller entries.
2) Read of poison list via the CXL mailbox.
3) Injection via the poison injection mailbox command (limited to 64 byte
   entries)
4) Clearing of poison injected via either method.

The implementation is meant to be a valid combination of impdef choices
based on what the spec allowed. There are a number of places where it could
be made more sophisticated that we might consider in future:
* Fusing adjacent poison entries if the types match.
* Separate injection list and main poison list, to test out limits on
  injected poison list being smaller than the main list.
* Poison list overflow event (needs event log support in general)
* Connecting up to the poison list error record generation (rather complex
  and not needed for currently kernel handling testing).

As the kernel code is currently fairly simple, it is likely that the above
does not yet matter but who knows what will turn up in future!

Kernel patches:
 [PATCH v7 0/6] CXL Poison List Retrieval & Tracing
 cover.1676685180.git.alison.schofield@intel.com
 [PATCH v2 0/6] cxl: CXL Inject & Clear Poison
 cover.1674101475.git.alison.schofield@intel.com

Ira Weiny (2):
  hw/cxl: Introduce cxl_device_get_timestamp() utility function
  bswap: Add the ability to store to an unaligned 24 bit field

Jonathan Cameron (4):
  hw/cxl: rename mailbox return code type from ret_code to CXLRetCode
  hw/cxl: QMP based poison injection support
  hw/cxl: Add poison injection via the mailbox.
  hw/cxl: Add clear poison mailbox command support.

 hw/cxl/cxl-device-utils.c   |  15 ++
 hw/cxl/cxl-mailbox-utils.c  | 285 ++++++++++++++++++++++++++++++------
 hw/mem/cxl_type3.c          |  92 ++++++++++++
 hw/mem/cxl_type3_stubs.c    |   6 +
 include/hw/cxl/cxl_device.h |  23 +++
 include/qemu/bswap.h        |  23 +++
 qapi/cxl.json               |  18 +++
 7 files changed, 420 insertions(+), 42 deletions(-)

-- 
2.37.2


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v3 1/6] hw/cxl: rename mailbox return code type from ret_code to CXLRetCode
  2023-03-02 10:17 [PATCH v3 0/6] hw/cxl: Poison get, inject, clear Jonathan Cameron
@ 2023-03-02 10:17 ` Jonathan Cameron
  2023-03-02 10:17 ` [PATCH v3 2/6] hw/cxl: Introduce cxl_device_get_timestamp() utility function Jonathan Cameron
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Jonathan Cameron @ 2023-03-02 10:17 UTC (permalink / raw)
  To: qemu-devel, Michael Tsirkin, Fan Ni
  Cc: linux-cxl, linuxarm, Ira Weiny, Alison Schofield, Michael Roth,
	Philippe Mathieu-Daudé,
	Dave Jiang, Markus Armbruster

Given the increasing usage of this mailbox return code type, now
is a good time to switch to QEMU style naming.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

---
v8: Picked up tag from Fan Ni
---
 hw/cxl/cxl-mailbox-utils.c | 64 +++++++++++++++++++-------------------
 1 file changed, 32 insertions(+), 32 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index ed663cc04a..7b2aef0d67 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -23,7 +23,7 @@
  *     FOO    = 0x7f,
  *          #define BAR 0
  *  2. Implement the handler
- *    static ret_code cmd_foo_bar(struct cxl_cmd *cmd,
+ *    static CXLRetCode cmd_foo_bar(struct cxl_cmd *cmd,
  *                                  CXLDeviceState *cxl_dstate, uint16_t *len)
  *  3. Add the command to the cxl_cmd_set[][]
  *    [FOO][BAR] = { "FOO_BAR", cmd_foo_bar, x, y },
@@ -90,10 +90,10 @@ typedef enum {
     CXL_MBOX_UNSUPPORTED_MAILBOX = 0x15,
     CXL_MBOX_INVALID_PAYLOAD_LENGTH = 0x16,
     CXL_MBOX_MAX = 0x17
-} ret_code;
+} CXLRetCode;
 
 struct cxl_cmd;
-typedef ret_code (*opcode_handler)(struct cxl_cmd *cmd,
+typedef CXLRetCode (*opcode_handler)(struct cxl_cmd *cmd,
                                    CXLDeviceState *cxl_dstate, uint16_t *len);
 struct cxl_cmd {
     const char *name;
@@ -105,16 +105,16 @@ struct cxl_cmd {
 
 #define DEFINE_MAILBOX_HANDLER_ZEROED(name, size)                         \
     uint16_t __zero##name = size;                                         \
-    static ret_code cmd_##name(struct cxl_cmd *cmd,                       \
-                               CXLDeviceState *cxl_dstate, uint16_t *len) \
+    static CXLRetCode cmd_##name(struct cxl_cmd *cmd,                       \
+                                 CXLDeviceState *cxl_dstate, uint16_t *len) \
     {                                                                     \
         *len = __zero##name;                                              \
         memset(cmd->payload, 0, *len);                                    \
         return CXL_MBOX_SUCCESS;                                          \
     }
 #define DEFINE_MAILBOX_HANDLER_NOP(name)                                  \
-    static ret_code cmd_##name(struct cxl_cmd *cmd,                       \
-                               CXLDeviceState *cxl_dstate, uint16_t *len) \
+    static CXLRetCode cmd_##name(struct cxl_cmd *cmd,                       \
+                                 CXLDeviceState *cxl_dstate, uint16_t *len) \
     {                                                                     \
         return CXL_MBOX_SUCCESS;                                          \
     }
@@ -125,9 +125,9 @@ DEFINE_MAILBOX_HANDLER_ZEROED(events_get_interrupt_policy, 4);
 DEFINE_MAILBOX_HANDLER_NOP(events_set_interrupt_policy);
 
 /* 8.2.9.2.1 */
-static ret_code cmd_firmware_update_get_info(struct cxl_cmd *cmd,
-                                             CXLDeviceState *cxl_dstate,
-                                             uint16_t *len)
+static CXLRetCode cmd_firmware_update_get_info(struct cxl_cmd *cmd,
+                                               CXLDeviceState *cxl_dstate,
+                                               uint16_t *len)
 {
     struct {
         uint8_t slots_supported;
@@ -159,9 +159,9 @@ static ret_code cmd_firmware_update_get_info(struct cxl_cmd *cmd,
 }
 
 /* 8.2.9.3.1 */
-static ret_code cmd_timestamp_get(struct cxl_cmd *cmd,
-                                  CXLDeviceState *cxl_dstate,
-                                  uint16_t *len)
+static CXLRetCode cmd_timestamp_get(struct cxl_cmd *cmd,
+                                    CXLDeviceState *cxl_dstate,
+                                    uint16_t *len)
 {
     uint64_t time, delta;
     uint64_t final_time = 0;
@@ -181,7 +181,7 @@ static ret_code cmd_timestamp_get(struct cxl_cmd *cmd,
 }
 
 /* 8.2.9.3.2 */
-static ret_code cmd_timestamp_set(struct cxl_cmd *cmd,
+static CXLRetCode cmd_timestamp_set(struct cxl_cmd *cmd,
                                   CXLDeviceState *cxl_dstate,
                                   uint16_t *len)
 {
@@ -201,9 +201,9 @@ static const QemuUUID cel_uuid = {
 };
 
 /* 8.2.9.4.1 */
-static ret_code cmd_logs_get_supported(struct cxl_cmd *cmd,
-                                       CXLDeviceState *cxl_dstate,
-                                       uint16_t *len)
+static CXLRetCode cmd_logs_get_supported(struct cxl_cmd *cmd,
+                                         CXLDeviceState *cxl_dstate,
+                                         uint16_t *len)
 {
     struct {
         uint16_t entries;
@@ -224,9 +224,9 @@ static ret_code cmd_logs_get_supported(struct cxl_cmd *cmd,
 }
 
 /* 8.2.9.4.2 */
-static ret_code cmd_logs_get_log(struct cxl_cmd *cmd,
-                                 CXLDeviceState *cxl_dstate,
-                                 uint16_t *len)
+static CXLRetCode cmd_logs_get_log(struct cxl_cmd *cmd,
+                                   CXLDeviceState *cxl_dstate,
+                                   uint16_t *len)
 {
     struct {
         QemuUUID uuid;
@@ -265,9 +265,9 @@ static ret_code cmd_logs_get_log(struct cxl_cmd *cmd,
 }
 
 /* 8.2.9.5.1.1 */
-static ret_code cmd_identify_memory_device(struct cxl_cmd *cmd,
-                                           CXLDeviceState *cxl_dstate,
-                                           uint16_t *len)
+static CXLRetCode cmd_identify_memory_device(struct cxl_cmd *cmd,
+                                             CXLDeviceState *cxl_dstate,
+                                             uint16_t *len)
 {
     struct {
         char fw_revision[0x10];
@@ -309,9 +309,9 @@ static ret_code cmd_identify_memory_device(struct cxl_cmd *cmd,
     return CXL_MBOX_SUCCESS;
 }
 
-static ret_code cmd_ccls_get_partition_info(struct cxl_cmd *cmd,
-                                           CXLDeviceState *cxl_dstate,
-                                           uint16_t *len)
+static CXLRetCode cmd_ccls_get_partition_info(struct cxl_cmd *cmd,
+                                              CXLDeviceState *cxl_dstate,
+                                              uint16_t *len)
 {
     struct {
         uint64_t active_vmem;
@@ -339,9 +339,9 @@ static ret_code cmd_ccls_get_partition_info(struct cxl_cmd *cmd,
     return CXL_MBOX_SUCCESS;
 }
 
-static ret_code cmd_ccls_get_lsa(struct cxl_cmd *cmd,
-                                 CXLDeviceState *cxl_dstate,
-                                 uint16_t *len)
+static CXLRetCode cmd_ccls_get_lsa(struct cxl_cmd *cmd,
+                                   CXLDeviceState *cxl_dstate,
+                                   uint16_t *len)
 {
     struct {
         uint32_t offset;
@@ -364,9 +364,9 @@ static ret_code cmd_ccls_get_lsa(struct cxl_cmd *cmd,
     return CXL_MBOX_SUCCESS;
 }
 
-static ret_code cmd_ccls_set_lsa(struct cxl_cmd *cmd,
-                                 CXLDeviceState *cxl_dstate,
-                                 uint16_t *len)
+static CXLRetCode cmd_ccls_set_lsa(struct cxl_cmd *cmd,
+                                   CXLDeviceState *cxl_dstate,
+                                   uint16_t *len)
 {
     struct set_lsa_pl {
         uint32_t offset;
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v3 2/6] hw/cxl: Introduce cxl_device_get_timestamp() utility function
  2023-03-02 10:17 [PATCH v3 0/6] hw/cxl: Poison get, inject, clear Jonathan Cameron
  2023-03-02 10:17 ` [PATCH v3 1/6] hw/cxl: rename mailbox return code type from ret_code to CXLRetCode Jonathan Cameron
@ 2023-03-02 10:17 ` Jonathan Cameron
  2023-03-02 10:17 ` [PATCH v3 3/6] bswap: Add the ability to store to an unaligned 24 bit field Jonathan Cameron
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Jonathan Cameron @ 2023-03-02 10:17 UTC (permalink / raw)
  To: qemu-devel, Michael Tsirkin, Fan Ni
  Cc: linux-cxl, linuxarm, Ira Weiny, Alison Schofield, Michael Roth,
	Philippe Mathieu-Daudé,
	Dave Jiang, Markus Armbruster

From: Ira Weiny <ira.weiny@intel.com>

There are new users of this functionality coming shortly so factor
it out from the GET_TIMESTAMP mailbox command handling.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

---
v8:
Picked up tag from Fan Ni
---
 hw/cxl/cxl-device-utils.c   | 15 +++++++++++++++
 hw/cxl/cxl-mailbox-utils.c  | 11 +----------
 include/hw/cxl/cxl_device.h |  2 ++
 3 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
index 4c5e88aaf5..86e1cea8ce 100644
--- a/hw/cxl/cxl-device-utils.c
+++ b/hw/cxl/cxl-device-utils.c
@@ -269,3 +269,18 @@ void cxl_device_register_init_common(CXLDeviceState *cxl_dstate)
 
     cxl_initialize_mailbox(cxl_dstate);
 }
+
+uint64_t cxl_device_get_timestamp(CXLDeviceState *cxl_dstate)
+{
+    uint64_t time, delta;
+    uint64_t final_time = 0;
+
+    if (cxl_dstate->timestamp.set) {
+        /* Find the delta from the last time the host set the time. */
+        time = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+        delta = time - cxl_dstate->timestamp.last_set;
+        final_time = cxl_dstate->timestamp.host_set + delta;
+    }
+
+    return final_time;
+}
diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 7b2aef0d67..702e16ca20 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -163,17 +163,8 @@ static CXLRetCode cmd_timestamp_get(struct cxl_cmd *cmd,
                                     CXLDeviceState *cxl_dstate,
                                     uint16_t *len)
 {
-    uint64_t time, delta;
-    uint64_t final_time = 0;
-
-    if (cxl_dstate->timestamp.set) {
-        /* First find the delta from the last time the host set the time. */
-        time = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
-        delta = time - cxl_dstate->timestamp.last_set;
-        final_time = cxl_dstate->timestamp.host_set + delta;
-    }
+    uint64_t final_time = cxl_device_get_timestamp(cxl_dstate);
 
-    /* Then adjust the actual time */
     stq_le_p(cmd->payload, final_time);
     *len = 8;
 
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index edb9791bab..02befda0f6 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -287,4 +287,6 @@ MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, uint64_t *data,
 MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
                             unsigned size, MemTxAttrs attrs);
 
+uint64_t cxl_device_get_timestamp(CXLDeviceState *cxlds);
+
 #endif
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v3 3/6] bswap: Add the ability to store to an unaligned 24 bit field
  2023-03-02 10:17 [PATCH v3 0/6] hw/cxl: Poison get, inject, clear Jonathan Cameron
  2023-03-02 10:17 ` [PATCH v3 1/6] hw/cxl: rename mailbox return code type from ret_code to CXLRetCode Jonathan Cameron
  2023-03-02 10:17 ` [PATCH v3 2/6] hw/cxl: Introduce cxl_device_get_timestamp() utility function Jonathan Cameron
@ 2023-03-02 10:17 ` Jonathan Cameron
  2023-03-02 10:17 ` [PATCH v3 4/6] hw/cxl: QMP based poison injection support Jonathan Cameron
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Jonathan Cameron @ 2023-03-02 10:17 UTC (permalink / raw)
  To: qemu-devel, Michael Tsirkin, Fan Ni
  Cc: linux-cxl, linuxarm, Ira Weiny, Alison Schofield, Michael Roth,
	Philippe Mathieu-Daudé,
	Dave Jiang, Markus Armbruster

From: Ira Weiny <ira.weiny@intel.com>

CXL has 24 bit unaligned fields which need to be stored to.  CXL is
specified as little endian.

Define st24_le_p() and the supporting functions to store such a field
from a 32 bit host native value.

The use of b, w, l, q as the size specifier is limiting.  So "24" was
used for the size part of the function name.

Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

---
v8:
  - Picked up tag from Fan Ni.
---
 include/qemu/bswap.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/include/qemu/bswap.h b/include/qemu/bswap.h
index 15a78c0db5..ee71cbeaaa 100644
--- a/include/qemu/bswap.h
+++ b/include/qemu/bswap.h
@@ -8,11 +8,23 @@
 #undef  bswap64
 #define bswap64(_x) __builtin_bswap64(_x)
 
+static inline uint32_t bswap24(uint32_t x)
+{
+    return (((x & 0x000000ffU) << 16) |
+            ((x & 0x0000ff00U) <<  0) |
+            ((x & 0x00ff0000U) >> 16));
+}
+
 static inline void bswap16s(uint16_t *s)
 {
     *s = __builtin_bswap16(*s);
 }
 
+static inline void bswap24s(uint32_t *s)
+{
+    *s = bswap24(*s);
+}
+
 static inline void bswap32s(uint32_t *s)
 {
     *s = __builtin_bswap32(*s);
@@ -176,6 +188,7 @@ CPU_CONVERT(le, 64, uint64_t)
  * size is:
  *   b: 8 bits
  *   w: 16 bits
+ *   24: 24 bits
  *   l: 32 bits
  *   q: 64 bits
  *
@@ -248,6 +261,11 @@ static inline void stw_he_p(void *ptr, uint16_t v)
     __builtin_memcpy(ptr, &v, sizeof(v));
 }
 
+static inline void st24_he_p(void *ptr, uint32_t v)
+{
+    __builtin_memcpy(ptr, &v, 3);
+}
+
 static inline int ldl_he_p(const void *ptr)
 {
     int32_t r;
@@ -297,6 +315,11 @@ static inline void stw_le_p(void *ptr, uint16_t v)
     stw_he_p(ptr, le_bswap(v, 16));
 }
 
+static inline void st24_le_p(void *ptr, uint32_t v)
+{
+    st24_he_p(ptr, le_bswap(v, 24));
+}
+
 static inline void stl_le_p(void *ptr, uint32_t v)
 {
     stl_he_p(ptr, le_bswap(v, 32));
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v3 4/6] hw/cxl: QMP based poison injection support
  2023-03-02 10:17 [PATCH v3 0/6] hw/cxl: Poison get, inject, clear Jonathan Cameron
                   ` (2 preceding siblings ...)
  2023-03-02 10:17 ` [PATCH v3 3/6] bswap: Add the ability to store to an unaligned 24 bit field Jonathan Cameron
@ 2023-03-02 10:17 ` Jonathan Cameron
  2023-03-03  0:50   ` Ira Weiny
  2023-03-02 10:17 ` [PATCH v3 5/6] hw/cxl: Add poison injection via the mailbox Jonathan Cameron
  2023-03-02 10:17 ` [PATCH v3 6/6] hw/cxl: Add clear poison mailbox command support Jonathan Cameron
  5 siblings, 1 reply; 12+ messages in thread
From: Jonathan Cameron @ 2023-03-02 10:17 UTC (permalink / raw)
  To: qemu-devel, Michael Tsirkin, Fan Ni
  Cc: linux-cxl, linuxarm, Ira Weiny, Alison Schofield, Michael Roth,
	Philippe Mathieu-Daudé,
	Dave Jiang, Markus Armbruster

Inject poison using qmp command cxl-inject-poison to add an entry to the
poison list.

For now, the poison is not returned CXL.mem reads, but only via the
mailbox command Get Poison List.

See CXL rev 3.0, sec 8.2.9.8.4.1 Get Poison list (Opcode 4300h)

Kernel patches to use this interface here:
https://lore.kernel.org/linux-cxl/cover.1665606782.git.alison.schofield@intel.com/

To inject poison using qmp (telnet to the qmp port)
{ "execute": "qmp_capabilities" }

{ "execute": "cxl-inject-poison",
    "arguments": {
         "path": "/machine/peripheral/cxl-pmem0",
         "start": 2048,
         "length": 256
    }
}

Adjusted to select a device on your machine.

Note that the poison list supported is kept short enough to avoid the
complexity of state machine that is needed to handle the MORE flag.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

---
v2:
Improve QMP documentation.
Fix up some endian issues
---
 hw/cxl/cxl-mailbox-utils.c  | 90 +++++++++++++++++++++++++++++++++++++
 hw/mem/cxl_type3.c          | 56 +++++++++++++++++++++++
 hw/mem/cxl_type3_stubs.c    |  6 +++
 include/hw/cxl/cxl_device.h | 20 +++++++++
 qapi/cxl.json               | 18 ++++++++
 5 files changed, 190 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 702e16ca20..792d3ee5aa 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -62,6 +62,8 @@ enum {
         #define GET_PARTITION_INFO     0x0
         #define GET_LSA       0x2
         #define SET_LSA       0x3
+    MEDIA_AND_POISON = 0x43,
+        #define GET_POISON_LIST        0x0
 };
 
 /* 8.2.8.4.5.1 Command Return Codes */
@@ -295,6 +297,10 @@ static CXLRetCode cmd_identify_memory_device(struct cxl_cmd *cmd,
     stq_le_p(&id->persistent_capacity, cxl_dstate->pmem_size / CXL_CAPACITY_MULTIPLIER);
     stq_le_p(&id->volatile_capacity, cxl_dstate->vmem_size / CXL_CAPACITY_MULTIPLIER);
     stl_le_p(&id->lsa_size, cvc->get_lsa_size(ct3d));
+    /* 256 poison records */
+    st24_le_p(id->poison_list_max_mer, 256);
+    /* No limit - so limited by main poison record limit */
+    stw_le_p(&id->inject_poison_limit, 0);
 
     *len = sizeof(*id);
     return CXL_MBOX_SUCCESS;
@@ -384,6 +390,88 @@ static CXLRetCode cmd_ccls_set_lsa(struct cxl_cmd *cmd,
     return CXL_MBOX_SUCCESS;
 }
 
+/*
+ * This is very inefficient, but good enough for now!
+ * Also the payload will always fit, so no need to handle the MORE flag and
+ * make this stateful. We may want to allow longer poison lists to aid
+ * testing that kernel functionality.
+ */
+static CXLRetCode cmd_media_get_poison_list(struct cxl_cmd *cmd,
+                                            CXLDeviceState *cxl_dstate,
+                                            uint16_t *len)
+{
+    struct get_poison_list_pl {
+        uint64_t pa;
+        uint64_t length;
+    } QEMU_PACKED;
+
+    struct get_poison_list_out_pl {
+        uint8_t flags;
+        uint8_t rsvd1;
+        uint64_t overflow_timestamp;
+        uint16_t count;
+        uint8_t rsvd2[0x14];
+        struct {
+            uint64_t addr;
+            uint32_t length;
+            uint32_t resv;
+        } QEMU_PACKED records[];
+    } QEMU_PACKED;
+
+    struct get_poison_list_pl *in = (void *)cmd->payload;
+    struct get_poison_list_out_pl *out = (void *)cmd->payload;
+    CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
+    uint16_t record_count = 0, i = 0;
+    uint64_t query_start, query_length;
+    CXLPoisonList *poison_list = &ct3d->poison_list;
+    CXLPoison *ent;
+    uint16_t out_pl_len;
+
+    query_start = ldq_le_p(&in->pa);
+    /* 64 byte alignemnt required */
+    if (query_start & 0x3f) {
+        return CXL_MBOX_INVALID_INPUT;
+    }
+    query_length = ldq_le_p(&in->length) * 64;
+
+    QLIST_FOREACH(ent, poison_list, node) {
+        /* Check for no overlap */
+        if (ent->start >= query_start + query_length ||
+            ent->start + ent->length <= query_start) {
+            continue;
+        }
+        record_count++;
+    }
+    out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);
+    assert(out_pl_len <= CXL_MAILBOX_MAX_PAYLOAD_SIZE);
+
+    memset(out, 0, out_pl_len);
+    QLIST_FOREACH(ent, poison_list, node) {
+        uint64_t start, stop;
+
+        /* Check for no overlap */
+        if (ent->start >= query_start + query_length ||
+            ent->start + ent->length <= query_start) {
+            continue;
+        }
+
+        /* Deal with overlap */
+        start = MAX(ent->start & 0xffffffffffffffc0, query_start);
+        stop = MIN((ent->start & 0xffffffffffffffc0) + ent->length,
+                   query_start + query_length);
+        stq_le_p(&out->records[i].addr, start | (ent->type & 0x3));
+        stl_le_p(&out->records[i].length, (stop - start) / 64);
+        i++;
+    }
+    if (ct3d->poison_list_overflowed) {
+        out->flags = (1 << 1);
+        stq_le_p(&out->overflow_timestamp, ct3d->poison_list_overflow_ts);
+    }
+    stw_le_p(&out->count, record_count);
+    *len = out_pl_len;
+    return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_DATA_CHANGE (1 << 2)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -411,6 +499,8 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
     [CCLS][GET_LSA] = { "CCLS_GET_LSA", cmd_ccls_get_lsa, 8, 0 },
     [CCLS][SET_LSA] = { "CCLS_SET_LSA", cmd_ccls_set_lsa,
         ~0, IMMEDIATE_CONFIG_CHANGE | IMMEDIATE_DATA_CHANGE },
+    [MEDIA_AND_POISON][GET_POISON_LIST] = { "MEDIA_AND_POISON_GET_POISON_LIST",
+        cmd_media_get_poison_list, 16, 0 },
 };
 
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 572ad47fa2..21e3a84785 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -919,6 +919,62 @@ static void set_lsa(CXLType3Dev *ct3d, const void *buf, uint64_t size,
      */
 }
 
+void cxl_set_poison_list_overflowed(CXLType3Dev *ct3d)
+{
+        ct3d->poison_list_overflowed = true;
+        ct3d->poison_list_overflow_ts =
+            cxl_device_get_timestamp(&ct3d->cxl_dstate);
+}
+
+void qmp_cxl_inject_poison(const char *path, uint64_t start, uint64_t length,
+                           Error **errp)
+{
+    Object *obj = object_resolve_path(path, NULL);
+    CXLType3Dev *ct3d;
+    CXLPoison *p;
+
+    if (length % 64) {
+        error_setg(errp, "Poison injection must be in multiples of 64 bytes");
+        return;
+    }
+    if (start % 64) {
+        error_setg(errp, "Poison start address must be 64 byte aligned");
+        return;
+    }
+    if (!obj) {
+        error_setg(errp, "Unable to resolve path");
+        return;
+    }
+    if (!object_dynamic_cast(obj, TYPE_CXL_TYPE3)) {
+        error_setg(errp, "Path does not point to a CXL type 3 device");
+        return;
+    }
+
+    ct3d = CXL_TYPE3(obj);
+
+    QLIST_FOREACH(p, &ct3d->poison_list, node) {
+        if (((start >= p->start) && (start < p->start + p->length)) ||
+            ((start + length > p->start) &&
+             (start + length <= p->start + p->length))) {
+            error_setg(errp, "Overlap with existing poisoned region not supported");
+            return;
+        }
+    }
+
+    if (ct3d->poison_list_cnt == CXL_POISON_LIST_LIMIT) {
+        cxl_set_poison_list_overflowed(ct3d);
+        return;
+    }
+
+    p = g_new0(CXLPoison, 1);
+    p->length = length;
+    p->start = start;
+    p->type = CXL_POISON_TYPE_INTERNAL; /* Different from injected via the mbox */
+
+    QLIST_INSERT_HEAD(&ct3d->poison_list, p, node);
+    ct3d->poison_list_cnt++;
+}
+
 /* For uncorrectable errors include support for multiple header recording */
 void qmp_cxl_inject_uncorrectable_errors(const char *path,
                                          CXLUncorErrorRecordList *errors,
diff --git a/hw/mem/cxl_type3_stubs.c b/hw/mem/cxl_type3_stubs.c
index d574c58f9a..fd1166a610 100644
--- a/hw/mem/cxl_type3_stubs.c
+++ b/hw/mem/cxl_type3_stubs.c
@@ -3,6 +3,12 @@
 #include "qapi/error.h"
 #include "qapi/qapi-commands-cxl.h"
 
+void qmp_cxl_inject_poison(const char *path, uint64_t start, uint64_t length,
+                           Error **errp)
+{
+    error_setg(errp, "CXL Type 3 support is not compiled in");
+}
+
 void qmp_cxl_inject_uncorrectable_errors(const char *path,
                                          CXLUncorErrorRecordList *errors,
                                          Error **errp)
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 02befda0f6..32c234ea91 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -242,6 +242,18 @@ typedef struct CXLError {
 
 typedef QTAILQ_HEAD(, CXLError) CXLErrorList;
 
+typedef struct CXLPoison {
+    uint64_t start, length;
+    uint8_t type;
+#define CXL_POISON_TYPE_EXTERNAL 0x1
+#define CXL_POISON_TYPE_INTERNAL 0x2
+#define CXL_POISON_TYPE_INJECTED 0x3
+    QLIST_ENTRY(CXLPoison) node;
+} CXLPoison;
+
+typedef QLIST_HEAD(, CXLPoison) CXLPoisonList;
+#define CXL_POISON_LIST_LIMIT 256
+
 struct CXLType3Dev {
     /* Private */
     PCIDevice parent_obj;
@@ -264,6 +276,12 @@ struct CXLType3Dev {
 
     /* Error injection */
     CXLErrorList error_list;
+
+    /* Poison Injection - cache */
+    CXLPoisonList poison_list;
+    unsigned int poison_list_cnt;
+    bool poison_list_overflowed;
+    uint64_t poison_list_overflow_ts;
 };
 
 #define TYPE_CXL_TYPE3 "cxl-type3"
@@ -289,4 +307,6 @@ MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
 
 uint64_t cxl_device_get_timestamp(CXLDeviceState *cxlds);
 
+void cxl_set_poison_list_overflowed(CXLType3Dev *ct3d);
+
 #endif
diff --git a/qapi/cxl.json b/qapi/cxl.json
index 4be7d46041..9ebd680dfe 100644
--- a/qapi/cxl.json
+++ b/qapi/cxl.json
@@ -5,6 +5,24 @@
 # = CXL devices
 ##
 
+##
+# @cxl-inject-poison:
+#
+# Poison records indicate that a CXL memory device knows that a particular
+# memory region may be corrupted. This may be because of locally detected
+# errors (e.g. ECC failure) or poisoned writes received from other components
+# in the system. This injection mechanism enables testing of the OS handling
+# of poison records which may be queried via the CXL mailbox.
+#
+# @path: CXL type 3 device canonical QOM path
+# @start: Start address - must be 64 byte aligned.
+# @length: Length of poison to inject - must be a multiple of 64 bytes.
+#
+# Since: 8.0
+##
+{ 'command': 'cxl-inject-poison',
+  'data': { 'path': 'str', 'start': 'uint64', 'length': 'uint64' }}
+
 ##
 # @CxlUncorErrorType:
 #
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v3 5/6] hw/cxl: Add poison injection via the mailbox.
  2023-03-02 10:17 [PATCH v3 0/6] hw/cxl: Poison get, inject, clear Jonathan Cameron
                   ` (3 preceding siblings ...)
  2023-03-02 10:17 ` [PATCH v3 4/6] hw/cxl: QMP based poison injection support Jonathan Cameron
@ 2023-03-02 10:17 ` Jonathan Cameron
  2023-03-03  0:52   ` Ira Weiny
  2023-03-02 10:17 ` [PATCH v3 6/6] hw/cxl: Add clear poison mailbox command support Jonathan Cameron
  5 siblings, 1 reply; 12+ messages in thread
From: Jonathan Cameron @ 2023-03-02 10:17 UTC (permalink / raw)
  To: qemu-devel, Michael Tsirkin, Fan Ni
  Cc: linux-cxl, linuxarm, Ira Weiny, Alison Schofield, Michael Roth,
	Philippe Mathieu-Daudé,
	Dave Jiang, Markus Armbruster

Very simple implementation to allow testing of corresponding
kernel code. Note that for now we track each 64 byte section
independently.  Whilst a valid implementation choice, it may
make sense to fuse entries so as to prove out more complex
corners of the kernel code.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
v2:
- Match against inclusion of range of QMP injected poison.
- Endian fixes
---
 hw/cxl/cxl-mailbox-utils.c | 41 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 792d3ee5aa..da8732a547 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -64,6 +64,7 @@ enum {
         #define SET_LSA       0x3
     MEDIA_AND_POISON = 0x43,
         #define GET_POISON_LIST        0x0
+        #define INJECT_POISON          0x1
 };
 
 /* 8.2.8.4.5.1 Command Return Codes */
@@ -472,6 +473,44 @@ static CXLRetCode cmd_media_get_poison_list(struct cxl_cmd *cmd,
     return CXL_MBOX_SUCCESS;
 }
 
+static CXLRetCode cmd_media_inject_poison(struct cxl_cmd *cmd,
+                                          CXLDeviceState *cxl_dstate,
+                                          uint16_t *len)
+{
+    CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
+    CXLPoisonList *poison_list = &ct3d->poison_list;
+    CXLPoison *ent;
+    struct inject_poison_pl {
+        uint64_t dpa;
+    };
+    struct inject_poison_pl *in = (void *)cmd->payload;
+    uint64_t dpa = ldq_le_p(&in->dpa);
+    CXLPoison *p;
+
+    QLIST_FOREACH(ent, poison_list, node) {
+        if (dpa >= ent->start && dpa + 64 <= ent->start + ent->length) {
+            return CXL_MBOX_SUCCESS;
+        }
+    }
+
+    if (ct3d->poison_list_cnt == CXL_POISON_LIST_LIMIT) {
+        return CXL_MBOX_INJECT_POISON_LIMIT;
+    }
+    p = g_new0(CXLPoison, 1);
+
+    p->length = 64;
+    p->start = dpa;
+    p->type = CXL_POISON_TYPE_INJECTED;
+
+    /*
+     * Possible todo: Merge with existing entry if next to it and if same type
+     */
+    QLIST_INSERT_HEAD(poison_list, p, node);
+    ct3d->poison_list_cnt++;
+
+    return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_DATA_CHANGE (1 << 2)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -501,6 +540,8 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
         ~0, IMMEDIATE_CONFIG_CHANGE | IMMEDIATE_DATA_CHANGE },
     [MEDIA_AND_POISON][GET_POISON_LIST] = { "MEDIA_AND_POISON_GET_POISON_LIST",
         cmd_media_get_poison_list, 16, 0 },
+    [MEDIA_AND_POISON][INJECT_POISON] = { "MEDIA_AND_POISON_INJECT_POISON",
+        cmd_media_inject_poison, 8, 0 },
 };
 
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v3 6/6] hw/cxl: Add clear poison mailbox command support.
  2023-03-02 10:17 [PATCH v3 0/6] hw/cxl: Poison get, inject, clear Jonathan Cameron
                   ` (4 preceding siblings ...)
  2023-03-02 10:17 ` [PATCH v3 5/6] hw/cxl: Add poison injection via the mailbox Jonathan Cameron
@ 2023-03-02 10:17 ` Jonathan Cameron
  2023-03-03  1:05   ` Ira Weiny
  5 siblings, 1 reply; 12+ messages in thread
From: Jonathan Cameron @ 2023-03-02 10:17 UTC (permalink / raw)
  To: qemu-devel, Michael Tsirkin, Fan Ni
  Cc: linux-cxl, linuxarm, Ira Weiny, Alison Schofield, Michael Roth,
	Philippe Mathieu-Daudé,
	Dave Jiang, Markus Armbruster

Current implementation is very simple so many of the corner
cases do not exist (e.g. fragmenting larger poison list entries)

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
v2:
- Endian fix
---
 hw/cxl/cxl-mailbox-utils.c  | 79 +++++++++++++++++++++++++++++++++++++
 hw/mem/cxl_type3.c          | 36 +++++++++++++++++
 include/hw/cxl/cxl_device.h |  1 +
 3 files changed, 116 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index da8732a547..f2a339bedc 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -65,6 +65,7 @@ enum {
     MEDIA_AND_POISON = 0x43,
         #define GET_POISON_LIST        0x0
         #define INJECT_POISON          0x1
+        #define CLEAR_POISON           0x2
 };
 
 /* 8.2.8.4.5.1 Command Return Codes */
@@ -511,6 +512,82 @@ static CXLRetCode cmd_media_inject_poison(struct cxl_cmd *cmd,
     return CXL_MBOX_SUCCESS;
 }
 
+static CXLRetCode cmd_media_clear_poison(struct cxl_cmd *cmd,
+                                         CXLDeviceState *cxl_dstate,
+                                         uint16_t *len)
+{
+    CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
+    CXLPoisonList *poison_list = &ct3d->poison_list;
+    CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
+    struct clear_poison_pl {
+        uint64_t dpa;
+        uint8_t data[64];
+    };
+    CXLPoison *ent;
+    uint64_t dpa;
+
+    struct clear_poison_pl *in = (void *)cmd->payload;
+
+    dpa = ldq_le_p(&in->dpa);
+    if (dpa + 64 > cxl_dstate->mem_size) {
+        return CXL_MBOX_INVALID_PA;
+    }
+
+    QLIST_FOREACH(ent, poison_list, node) {
+        /*
+         * Test for contained in entry. Simpler than general case
+         * as clearing 64 bytes and entries 64 byte aligned
+         */
+        if ((dpa < ent->start) || (dpa >= ent->start + ent->length)) {
+            continue;
+        }
+        /* Do accounting early as we know one will go away */
+        ct3d->poison_list_cnt--;
+        if (dpa > ent->start) {
+            CXLPoison *frag;
+            if (ct3d->poison_list_cnt == CXL_POISON_LIST_LIMIT) {
+                cxl_set_poison_list_overflowed(ct3d);
+                break;
+            }
+            frag = g_new0(CXLPoison, 1);
+
+            frag->start = ent->start;
+            frag->length = dpa - ent->start;
+            frag->type = ent->type;
+
+            QLIST_INSERT_HEAD(poison_list, frag, node);
+            ct3d->poison_list_cnt++;
+        }
+        if (dpa + 64 < ent->start + ent->length) {
+            CXLPoison *frag;
+
+            if (ct3d->poison_list_cnt == CXL_POISON_LIST_LIMIT) {
+                cxl_set_poison_list_overflowed(ct3d);
+                break;
+            }
+
+            frag = g_new0(CXLPoison, 1);
+
+            frag->start = dpa + 64;
+            frag->length = ent->start + ent->length - frag->start;
+            frag->type = ent->type;
+            QLIST_INSERT_HEAD(poison_list, frag, node);
+            ct3d->poison_list_cnt++;
+        }
+        /* Any fragments have been added, free original entry */
+        QLIST_REMOVE(ent, node);
+        g_free(ent);
+        break;
+    }
+    /* Clearing a region with no poison is not an error so always do so */
+    if (cvc->set_cacheline)
+        if (!cvc->set_cacheline(ct3d, dpa, in->data)) {
+            return CXL_MBOX_INTERNAL_ERROR;
+        }
+
+    return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_DATA_CHANGE (1 << 2)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -542,6 +619,8 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
         cmd_media_get_poison_list, 16, 0 },
     [MEDIA_AND_POISON][INJECT_POISON] = { "MEDIA_AND_POISON_INJECT_POISON",
         cmd_media_inject_poison, 8, 0 },
+    [MEDIA_AND_POISON][CLEAR_POISON] = { "MEDIA_AND_POISON_CLEAR_POISON",
+        cmd_media_clear_poison, 72, 0 },
 };
 
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 21e3a84785..44ffc7d9b0 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -919,6 +919,41 @@ static void set_lsa(CXLType3Dev *ct3d, const void *buf, uint64_t size,
      */
 }
 
+static bool set_cacheline(CXLType3Dev *ct3d, uint64_t dpa_offset, uint8_t *data)
+{
+    MemoryRegion *vmr = NULL, *pmr = NULL;
+    AddressSpace *as;
+
+    if (ct3d->hostvmem) {
+        vmr = host_memory_backend_get_memory(ct3d->hostvmem);
+    }
+    if (ct3d->hostpmem) {
+        pmr = host_memory_backend_get_memory(ct3d->hostpmem);
+    }
+
+    if (!vmr && !pmr) {
+        return false;
+    }
+
+    if (dpa_offset + 64 > int128_get64(ct3d->cxl_dstate.mem_size)) {
+        return false;
+    }
+
+    if (vmr) {
+        if (dpa_offset <= int128_get64(vmr->size)) {
+            as = &ct3d->hostvmem_as;
+        } else {
+            as = &ct3d->hostpmem_as;
+            dpa_offset -= vmr->size;
+        }
+    } else {
+        as = &ct3d->hostpmem_as;
+    }
+
+    address_space_write(as, dpa_offset, MEMTXATTRS_UNSPECIFIED, &data, 64);
+    return true;
+}
+
 void cxl_set_poison_list_overflowed(CXLType3Dev *ct3d)
 {
         ct3d->poison_list_overflowed = true;
@@ -1140,6 +1175,7 @@ static void ct3_class_init(ObjectClass *oc, void *data)
     cvc->get_lsa_size = get_lsa_size;
     cvc->get_lsa = get_lsa;
     cvc->set_lsa = set_lsa;
+    cvc->set_cacheline = set_cacheline;
 }
 
 static const TypeInfo ct3d_info = {
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 32c234ea91..73328a52cf 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -298,6 +298,7 @@ struct CXLType3Class {
                         uint64_t offset);
     void (*set_lsa)(CXLType3Dev *ct3d, const void *buf, uint64_t size,
                     uint64_t offset);
+    bool (*set_cacheline)(CXLType3Dev *ct3d, uint64_t dpa_offset, uint8_t *data);
 };
 
 MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, uint64_t *data,
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 4/6] hw/cxl: QMP based poison injection support
  2023-03-02 10:17 ` [PATCH v3 4/6] hw/cxl: QMP based poison injection support Jonathan Cameron
@ 2023-03-03  0:50   ` Ira Weiny
  2023-03-03 10:03     ` Jonathan Cameron
  0 siblings, 1 reply; 12+ messages in thread
From: Ira Weiny @ 2023-03-03  0:50 UTC (permalink / raw)
  To: Jonathan Cameron, qemu-devel, Michael Tsirkin, Fan Ni
  Cc: linux-cxl, linuxarm, Ira Weiny, Alison Schofield, Michael Roth,
	Philippe Mathieu-Daudé,
	Dave Jiang, Markus Armbruster

Jonathan Cameron wrote:
> Inject poison using qmp command cxl-inject-poison to add an entry to the
> poison list.
> 
> For now, the poison is not returned CXL.mem reads, but only via the
> mailbox command Get Poison List.
> 
> See CXL rev 3.0, sec 8.2.9.8.4.1 Get Poison list (Opcode 4300h)
> 
> Kernel patches to use this interface here:
> https://lore.kernel.org/linux-cxl/cover.1665606782.git.alison.schofield@intel.com/
> 
> To inject poison using qmp (telnet to the qmp port)
> { "execute": "qmp_capabilities" }
> 
> { "execute": "cxl-inject-poison",
>     "arguments": {
>          "path": "/machine/peripheral/cxl-pmem0",
>          "start": 2048,
>          "length": 256
>     }
> }
> 
> Adjusted to select a device on your machine.
> 
> Note that the poison list supported is kept short enough to avoid the
> complexity of state machine that is needed to handle the MORE flag.
> 
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> ---
> v2:
> Improve QMP documentation.
> Fix up some endian issues

[...]

> +/*
> + * This is very inefficient, but good enough for now!
> + * Also the payload will always fit, so no need to handle the MORE flag and
> + * make this stateful. We may want to allow longer poison lists to aid
> + * testing that kernel functionality.
> + */
> +static CXLRetCode cmd_media_get_poison_list(struct cxl_cmd *cmd,
> +                                            CXLDeviceState *cxl_dstate,
> +                                            uint16_t *len)
> +{
> +    struct get_poison_list_pl {
> +        uint64_t pa;
> +        uint64_t length;
> +    } QEMU_PACKED;
> +
> +    struct get_poison_list_out_pl {
> +        uint8_t flags;
> +        uint8_t rsvd1;
> +        uint64_t overflow_timestamp;
> +        uint16_t count;
> +        uint8_t rsvd2[0x14];
> +        struct {
> +            uint64_t addr;
> +            uint32_t length;
> +            uint32_t resv;
> +        } QEMU_PACKED records[];
> +    } QEMU_PACKED;
> +
> +    struct get_poison_list_pl *in = (void *)cmd->payload;
> +    struct get_poison_list_out_pl *out = (void *)cmd->payload;
> +    CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
> +    uint16_t record_count = 0, i = 0;
> +    uint64_t query_start, query_length;
> +    CXLPoisonList *poison_list = &ct3d->poison_list;
> +    CXLPoison *ent;
> +    uint16_t out_pl_len;
> +
> +    query_start = ldq_le_p(&in->pa);
> +    /* 64 byte alignemnt required */
> +    if (query_start & 0x3f) {
> +        return CXL_MBOX_INVALID_INPUT;
> +    }
> +    query_length = ldq_le_p(&in->length) * 64;
> +
> +    QLIST_FOREACH(ent, poison_list, node) {
> +        /* Check for no overlap */
> +        if (ent->start >= query_start + query_length ||
> +            ent->start + ent->length <= query_start) {
> +            continue;
> +        }
> +        record_count++;
> +    }
> +    out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);
> +    assert(out_pl_len <= CXL_MAILBOX_MAX_PAYLOAD_SIZE);
> +
> +    memset(out, 0, out_pl_len);
> +    QLIST_FOREACH(ent, poison_list, node) {
> +        uint64_t start, stop;
> +
> +        /* Check for no overlap */
> +        if (ent->start >= query_start + query_length ||
> +            ent->start + ent->length <= query_start) {
> +            continue;
> +        }
> +
> +        /* Deal with overlap */
> +        start = MAX(ent->start & 0xffffffffffffffc0, query_start);
> +        stop = MIN((ent->start & 0xffffffffffffffc0) + ent->length,
> +                   query_start + query_length);
> +        stq_le_p(&out->records[i].addr, start | (ent->type & 0x3));

Shouldn't the mask here be 0x7?  I see we have not define Vendor Specific
which I think is good but maybe better to allow it here?  I'm just not
sure what is going to happen if someone comes along later and wants to
use that value.

Ira

> +        stl_le_p(&out->records[i].length, (stop - start) / 64);
> +        i++;
> +    }
> +    if (ct3d->poison_list_overflowed) {
> +        out->flags = (1 << 1);
> +        stq_le_p(&out->overflow_timestamp, ct3d->poison_list_overflow_ts);
> +    }
> +    stw_le_p(&out->count, record_count);
> +    *len = out_pl_len;
> +    return CXL_MBOX_SUCCESS;
> +}
> +
>  #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
>  #define IMMEDIATE_DATA_CHANGE (1 << 2)
>  #define IMMEDIATE_POLICY_CHANGE (1 << 3)
> @@ -411,6 +499,8 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
>      [CCLS][GET_LSA] = { "CCLS_GET_LSA", cmd_ccls_get_lsa, 8, 0 },
>      [CCLS][SET_LSA] = { "CCLS_SET_LSA", cmd_ccls_set_lsa,
>          ~0, IMMEDIATE_CONFIG_CHANGE | IMMEDIATE_DATA_CHANGE },
> +    [MEDIA_AND_POISON][GET_POISON_LIST] = { "MEDIA_AND_POISON_GET_POISON_LIST",
> +        cmd_media_get_poison_list, 16, 0 },
>  };
>  
>  void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 572ad47fa2..21e3a84785 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -919,6 +919,62 @@ static void set_lsa(CXLType3Dev *ct3d, const void *buf, uint64_t size,
>       */
>  }
>  
> +void cxl_set_poison_list_overflowed(CXLType3Dev *ct3d)
> +{
> +        ct3d->poison_list_overflowed = true;
> +        ct3d->poison_list_overflow_ts =
> +            cxl_device_get_timestamp(&ct3d->cxl_dstate);
> +}
> +
> +void qmp_cxl_inject_poison(const char *path, uint64_t start, uint64_t length,
> +                           Error **errp)
> +{
> +    Object *obj = object_resolve_path(path, NULL);
> +    CXLType3Dev *ct3d;
> +    CXLPoison *p;
> +
> +    if (length % 64) {
> +        error_setg(errp, "Poison injection must be in multiples of 64 bytes");
> +        return;
> +    }
> +    if (start % 64) {
> +        error_setg(errp, "Poison start address must be 64 byte aligned");
> +        return;
> +    }
> +    if (!obj) {
> +        error_setg(errp, "Unable to resolve path");
> +        return;
> +    }
> +    if (!object_dynamic_cast(obj, TYPE_CXL_TYPE3)) {
> +        error_setg(errp, "Path does not point to a CXL type 3 device");
> +        return;
> +    }
> +
> +    ct3d = CXL_TYPE3(obj);
> +
> +    QLIST_FOREACH(p, &ct3d->poison_list, node) {
> +        if (((start >= p->start) && (start < p->start + p->length)) ||
> +            ((start + length > p->start) &&
> +             (start + length <= p->start + p->length))) {
> +            error_setg(errp, "Overlap with existing poisoned region not supported");
> +            return;
> +        }
> +    }
> +
> +    if (ct3d->poison_list_cnt == CXL_POISON_LIST_LIMIT) {
> +        cxl_set_poison_list_overflowed(ct3d);
> +        return;
> +    }
> +
> +    p = g_new0(CXLPoison, 1);
> +    p->length = length;
> +    p->start = start;
> +    p->type = CXL_POISON_TYPE_INTERNAL; /* Different from injected via the mbox */
> +
> +    QLIST_INSERT_HEAD(&ct3d->poison_list, p, node);
> +    ct3d->poison_list_cnt++;
> +}
> +
>  /* For uncorrectable errors include support for multiple header recording */
>  void qmp_cxl_inject_uncorrectable_errors(const char *path,
>                                           CXLUncorErrorRecordList *errors,
> diff --git a/hw/mem/cxl_type3_stubs.c b/hw/mem/cxl_type3_stubs.c
> index d574c58f9a..fd1166a610 100644
> --- a/hw/mem/cxl_type3_stubs.c
> +++ b/hw/mem/cxl_type3_stubs.c
> @@ -3,6 +3,12 @@
>  #include "qapi/error.h"
>  #include "qapi/qapi-commands-cxl.h"
>  
> +void qmp_cxl_inject_poison(const char *path, uint64_t start, uint64_t length,
> +                           Error **errp)
> +{
> +    error_setg(errp, "CXL Type 3 support is not compiled in");
> +}
> +
>  void qmp_cxl_inject_uncorrectable_errors(const char *path,
>                                           CXLUncorErrorRecordList *errors,
>                                           Error **errp)
> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index 02befda0f6..32c234ea91 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -242,6 +242,18 @@ typedef struct CXLError {
>  
>  typedef QTAILQ_HEAD(, CXLError) CXLErrorList;
>  
> +typedef struct CXLPoison {
> +    uint64_t start, length;
> +    uint8_t type;
> +#define CXL_POISON_TYPE_EXTERNAL 0x1
> +#define CXL_POISON_TYPE_INTERNAL 0x2
> +#define CXL_POISON_TYPE_INJECTED 0x3
> +    QLIST_ENTRY(CXLPoison) node;
> +} CXLPoison;
> +
> +typedef QLIST_HEAD(, CXLPoison) CXLPoisonList;
> +#define CXL_POISON_LIST_LIMIT 256
> +
>  struct CXLType3Dev {
>      /* Private */
>      PCIDevice parent_obj;
> @@ -264,6 +276,12 @@ struct CXLType3Dev {
>  
>      /* Error injection */
>      CXLErrorList error_list;
> +
> +    /* Poison Injection - cache */
> +    CXLPoisonList poison_list;
> +    unsigned int poison_list_cnt;
> +    bool poison_list_overflowed;
> +    uint64_t poison_list_overflow_ts;
>  };
>  
>  #define TYPE_CXL_TYPE3 "cxl-type3"
> @@ -289,4 +307,6 @@ MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
>  
>  uint64_t cxl_device_get_timestamp(CXLDeviceState *cxlds);
>  
> +void cxl_set_poison_list_overflowed(CXLType3Dev *ct3d);
> +
>  #endif
> diff --git a/qapi/cxl.json b/qapi/cxl.json
> index 4be7d46041..9ebd680dfe 100644
> --- a/qapi/cxl.json
> +++ b/qapi/cxl.json
> @@ -5,6 +5,24 @@
>  # = CXL devices
>  ##
>  
> +##
> +# @cxl-inject-poison:
> +#
> +# Poison records indicate that a CXL memory device knows that a particular
> +# memory region may be corrupted. This may be because of locally detected
> +# errors (e.g. ECC failure) or poisoned writes received from other components
> +# in the system. This injection mechanism enables testing of the OS handling
> +# of poison records which may be queried via the CXL mailbox.
> +#
> +# @path: CXL type 3 device canonical QOM path
> +# @start: Start address - must be 64 byte aligned.
> +# @length: Length of poison to inject - must be a multiple of 64 bytes.
> +#
> +# Since: 8.0
> +##
> +{ 'command': 'cxl-inject-poison',
> +  'data': { 'path': 'str', 'start': 'uint64', 'length': 'uint64' }}
> +
>  ##
>  # @CxlUncorErrorType:
>  #
> -- 
> 2.37.2
> 



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 5/6] hw/cxl: Add poison injection via the mailbox.
  2023-03-02 10:17 ` [PATCH v3 5/6] hw/cxl: Add poison injection via the mailbox Jonathan Cameron
@ 2023-03-03  0:52   ` Ira Weiny
  0 siblings, 0 replies; 12+ messages in thread
From: Ira Weiny @ 2023-03-03  0:52 UTC (permalink / raw)
  To: Jonathan Cameron, qemu-devel, Michael Tsirkin, Fan Ni
  Cc: linux-cxl, linuxarm, Ira Weiny, Alison Schofield, Michael Roth,
	Philippe Mathieu-Daudé,
	Dave Jiang, Markus Armbruster

Jonathan Cameron wrote:
> Very simple implementation to allow testing of corresponding
> kernel code. Note that for now we track each 64 byte section
> independently.  Whilst a valid implementation choice, it may
> make sense to fuse entries so as to prove out more complex
> corners of the kernel code.
> 
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 6/6] hw/cxl: Add clear poison mailbox command support.
  2023-03-02 10:17 ` [PATCH v3 6/6] hw/cxl: Add clear poison mailbox command support Jonathan Cameron
@ 2023-03-03  1:05   ` Ira Weiny
  2023-03-03  9:57     ` Jonathan Cameron
  0 siblings, 1 reply; 12+ messages in thread
From: Ira Weiny @ 2023-03-03  1:05 UTC (permalink / raw)
  To: Jonathan Cameron, qemu-devel, Michael Tsirkin, Fan Ni
  Cc: linux-cxl, linuxarm, Ira Weiny, Alison Schofield, Michael Roth,
	Philippe Mathieu-Daudé,
	Dave Jiang, Markus Armbruster

Jonathan Cameron wrote:
> Current implementation is very simple so many of the corner
> cases do not exist (e.g. fragmenting larger poison list entries)
> 
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> v2:
> - Endian fix
> ---
>  hw/cxl/cxl-mailbox-utils.c  | 79 +++++++++++++++++++++++++++++++++++++
>  hw/mem/cxl_type3.c          | 36 +++++++++++++++++
>  include/hw/cxl/cxl_device.h |  1 +
>  3 files changed, 116 insertions(+)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index da8732a547..f2a339bedc 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -65,6 +65,7 @@ enum {
>      MEDIA_AND_POISON = 0x43,
>          #define GET_POISON_LIST        0x0
>          #define INJECT_POISON          0x1
> +        #define CLEAR_POISON           0x2
>  };
>  
>  /* 8.2.8.4.5.1 Command Return Codes */
> @@ -511,6 +512,82 @@ static CXLRetCode cmd_media_inject_poison(struct cxl_cmd *cmd,
>      return CXL_MBOX_SUCCESS;
>  }
>  
> +static CXLRetCode cmd_media_clear_poison(struct cxl_cmd *cmd,
> +                                         CXLDeviceState *cxl_dstate,
> +                                         uint16_t *len)
> +{
> +    CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
> +    CXLPoisonList *poison_list = &ct3d->poison_list;
> +    CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
> +    struct clear_poison_pl {
> +        uint64_t dpa;
> +        uint8_t data[64];
> +    };
> +    CXLPoison *ent;
> +    uint64_t dpa;
> +
> +    struct clear_poison_pl *in = (void *)cmd->payload;
> +
> +    dpa = ldq_le_p(&in->dpa);
> +    if (dpa + 64 > cxl_dstate->mem_size) {
> +        return CXL_MBOX_INVALID_PA;
> +    }
> +
> +    QLIST_FOREACH(ent, poison_list, node) {

Because you are removing from the list I think this needs to be
QLIST_FOREACH_SAFE()? 

> +        /*
> +         * Test for contained in entry. Simpler than general case
> +         * as clearing 64 bytes and entries 64 byte aligned
> +         */
> +        if ((dpa < ent->start) || (dpa >= ent->start + ent->length)) {
> +            continue;
> +        }
> +        /* Do accounting early as we know one will go away */
> +        ct3d->poison_list_cnt--;
> +        if (dpa > ent->start) {
> +            CXLPoison *frag;
> +            if (ct3d->poison_list_cnt == CXL_POISON_LIST_LIMIT) {

I'm still not seeing how this is ever going to be true with the above
decrement?

> +                cxl_set_poison_list_overflowed(ct3d);
> +                break;
> +            }
> +            frag = g_new0(CXLPoison, 1);
> +
> +            frag->start = ent->start;
> +            frag->length = dpa - ent->start;
> +            frag->type = ent->type;
> +
> +            QLIST_INSERT_HEAD(poison_list, frag, node);
> +            ct3d->poison_list_cnt++;
> +        }
> +        if (dpa + 64 < ent->start + ent->length) {
> +            CXLPoison *frag;
> +
> +            if (ct3d->poison_list_cnt == CXL_POISON_LIST_LIMIT) {

Or this one.

> +                cxl_set_poison_list_overflowed(ct3d);
> +                break;
> +            }
> +
> +            frag = g_new0(CXLPoison, 1);
> +
> +            frag->start = dpa + 64;
> +            frag->length = ent->start + ent->length - frag->start;
> +            frag->type = ent->type;
> +            QLIST_INSERT_HEAD(poison_list, frag, node);
> +            ct3d->poison_list_cnt++;
> +        }
> +        /* Any fragments have been added, free original entry */
> +        QLIST_REMOVE(ent, node);

I think the decrement needs to happen here.

> +        g_free(ent);
> +        break;
> +    }
> +    /* Clearing a region with no poison is not an error so always do so */
> +    if (cvc->set_cacheline)
> +        if (!cvc->set_cacheline(ct3d, dpa, in->data)) {
> +            return CXL_MBOX_INTERNAL_ERROR;
> +        }
> +
> +    return CXL_MBOX_SUCCESS;
> +}
> +
>  #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
>  #define IMMEDIATE_DATA_CHANGE (1 << 2)
>  #define IMMEDIATE_POLICY_CHANGE (1 << 3)
> @@ -542,6 +619,8 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
>          cmd_media_get_poison_list, 16, 0 },
>      [MEDIA_AND_POISON][INJECT_POISON] = { "MEDIA_AND_POISON_INJECT_POISON",
>          cmd_media_inject_poison, 8, 0 },
> +    [MEDIA_AND_POISON][CLEAR_POISON] = { "MEDIA_AND_POISON_CLEAR_POISON",
> +        cmd_media_clear_poison, 72, 0 },
>  };
>  
>  void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 21e3a84785..44ffc7d9b0 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -919,6 +919,41 @@ static void set_lsa(CXLType3Dev *ct3d, const void *buf, uint64_t size,
>       */
>  }
>  
> +static bool set_cacheline(CXLType3Dev *ct3d, uint64_t dpa_offset, uint8_t *data)
> +{
> +    MemoryRegion *vmr = NULL, *pmr = NULL;
> +    AddressSpace *as;
> +
> +    if (ct3d->hostvmem) {
> +        vmr = host_memory_backend_get_memory(ct3d->hostvmem);
> +    }
> +    if (ct3d->hostpmem) {
> +        pmr = host_memory_backend_get_memory(ct3d->hostpmem);
> +    }
> +
> +    if (!vmr && !pmr) {
> +        return false;
> +    }
> +
> +    if (dpa_offset + 64 > int128_get64(ct3d->cxl_dstate.mem_size)) {
> +        return false;
> +    }
> +
> +    if (vmr) {
> +        if (dpa_offset <= int128_get64(vmr->size)) {

vmr->size - 64?

Ira

> +            as = &ct3d->hostvmem_as;
> +        } else {
> +            as = &ct3d->hostpmem_as;
> +            dpa_offset -= vmr->size;
> +        }
> +    } else {
> +        as = &ct3d->hostpmem_as;
> +    }
> +
> +    address_space_write(as, dpa_offset, MEMTXATTRS_UNSPECIFIED, &data, 64);
> +    return true;
> +}
> +
>  void cxl_set_poison_list_overflowed(CXLType3Dev *ct3d)
>  {
>          ct3d->poison_list_overflowed = true;
> @@ -1140,6 +1175,7 @@ static void ct3_class_init(ObjectClass *oc, void *data)
>      cvc->get_lsa_size = get_lsa_size;
>      cvc->get_lsa = get_lsa;
>      cvc->set_lsa = set_lsa;
> +    cvc->set_cacheline = set_cacheline;
>  }
>  
>  static const TypeInfo ct3d_info = {
> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index 32c234ea91..73328a52cf 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -298,6 +298,7 @@ struct CXLType3Class {
>                          uint64_t offset);
>      void (*set_lsa)(CXLType3Dev *ct3d, const void *buf, uint64_t size,
>                      uint64_t offset);
> +    bool (*set_cacheline)(CXLType3Dev *ct3d, uint64_t dpa_offset, uint8_t *data);
>  };
>  
>  MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, uint64_t *data,
> -- 
> 2.37.2
> 



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 6/6] hw/cxl: Add clear poison mailbox command support.
  2023-03-03  1:05   ` Ira Weiny
@ 2023-03-03  9:57     ` Jonathan Cameron
  0 siblings, 0 replies; 12+ messages in thread
From: Jonathan Cameron @ 2023-03-03  9:57 UTC (permalink / raw)
  To: Ira Weiny
  Cc: qemu-devel, Michael Tsirkin, Fan Ni, linux-cxl, linuxarm,
	Alison Schofield, Michael Roth, Philippe Mathieu-Daudé,
	Dave Jiang, Markus Armbruster

On Thu, 2 Mar 2023 17:05:22 -0800
Ira Weiny <ira.weiny@intel.com> wrote:

> Jonathan Cameron wrote:
> > Current implementation is very simple so many of the corner
> > cases do not exist (e.g. fragmenting larger poison list entries)
> > 
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > ---
> > v2:
> > - Endian fix
> > ---
> >  hw/cxl/cxl-mailbox-utils.c  | 79 +++++++++++++++++++++++++++++++++++++
> >  hw/mem/cxl_type3.c          | 36 +++++++++++++++++
> >  include/hw/cxl/cxl_device.h |  1 +
> >  3 files changed, 116 insertions(+)
> > 
> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > index da8732a547..f2a339bedc 100644
> > --- a/hw/cxl/cxl-mailbox-utils.c
> > +++ b/hw/cxl/cxl-mailbox-utils.c
> > @@ -65,6 +65,7 @@ enum {
> >      MEDIA_AND_POISON = 0x43,
> >          #define GET_POISON_LIST        0x0
> >          #define INJECT_POISON          0x1
> > +        #define CLEAR_POISON           0x2
> >  };
> >  
> >  /* 8.2.8.4.5.1 Command Return Codes */
> > @@ -511,6 +512,82 @@ static CXLRetCode cmd_media_inject_poison(struct cxl_cmd *cmd,
> >      return CXL_MBOX_SUCCESS;
> >  }
> >  
> > +static CXLRetCode cmd_media_clear_poison(struct cxl_cmd *cmd,
> > +                                         CXLDeviceState *cxl_dstate,
> > +                                         uint16_t *len)
> > +{
> > +    CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
> > +    CXLPoisonList *poison_list = &ct3d->poison_list;
> > +    CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
> > +    struct clear_poison_pl {
> > +        uint64_t dpa;
> > +        uint8_t data[64];
> > +    };
> > +    CXLPoison *ent;
> > +    uint64_t dpa;
> > +
> > +    struct clear_poison_pl *in = (void *)cmd->payload;
> > +
> > +    dpa = ldq_le_p(&in->dpa);
> > +    if (dpa + 64 > cxl_dstate->mem_size) {
> > +        return CXL_MBOX_INVALID_PA;
> > +    }
> > +
> > +    QLIST_FOREACH(ent, poison_list, node) {  
> 
> Because you are removing from the list I think this needs to be
> QLIST_FOREACH_SAFE()? 
> 

I don't think we need it because we break immediately after the remove so the
list pointers are touched any more.

> > +        /*
> > +         * Test for contained in entry. Simpler than general case
> > +         * as clearing 64 bytes and entries 64 byte aligned
> > +         */
> > +        if ((dpa < ent->start) || (dpa >= ent->start + ent->length)) {
> > +            continue;
> > +        }
> > +        /* Do accounting early as we know one will go away */
> > +        ct3d->poison_list_cnt--;
> > +        if (dpa > ent->start) {
> > +            CXLPoison *frag;
> > +            if (ct3d->poison_list_cnt == CXL_POISON_LIST_LIMIT) {  
> 
> I'm still not seeing how this is ever going to be true with the above
> decrement?

Ah I'd missed the comment previously.
Agreed on this one.

> 
> > +                cxl_set_poison_list_overflowed(ct3d);
> > +                break;
> > +            }
> > +            frag = g_new0(CXLPoison, 1);
> > +
> > +            frag->start = ent->start;
> > +            frag->length = dpa - ent->start;
> > +            frag->type = ent->type;
> > +
> > +            QLIST_INSERT_HEAD(poison_list, frag, node);
> > +            ct3d->poison_list_cnt++;
> > +        }
> > +        if (dpa + 64 < ent->start + ent->length) {
> > +            CXLPoison *frag;
> > +
> > +            if (ct3d->poison_list_cnt == CXL_POISON_LIST_LIMIT) {  
> 
> Or this one.

This one however is needed but has another bug (sigh)
This is the hole punching case.  We are removing the middle
of an existing long entry.  As such we take one away and gain
2.  At this stage we've already gained one.

Whilst we know we are going to remove one element later, we will
still end up overflowing.   Not overflow is > CXL_POISON_LIST_LIMIT

The bug that is hiding here is that we have added an entry, but not
removed any yet. So if we break in this condition we are over by one.

The spec is silent on what should happen in this case (I think)
so best we can do is add the first new entry in place of the old
entry and just overflow for the second new entry.  


> 
> > +                cxl_set_poison_list_overflowed(ct3d);
> > +                break;

So no break here, and following code in an else.

> > +            }
> > +
> > +            frag = g_new0(CXLPoison, 1);
> > +
> > +            frag->start = dpa + 64;
> > +            frag->length = ent->start + ent->length - frag->start;
> > +            frag->type = ent->type;
> > +            QLIST_INSERT_HEAD(poison_list, frag, node);
> > +            ct3d->poison_list_cnt++;
> > +        }
> > +        /* Any fragments have been added, free original entry */
> > +        QLIST_REMOVE(ent, node);  
> 
> I think the decrement needs to happen here.

For the implementation this was meant to be (without the bug above)
we are cheating a little.  We 'could' take the entry off the list first
before working out if we need to add the entries in.  Thus ensuring we
remain below the list limit.  Instead we leave it on the list, add new
elements and drop it later.  To do that we need to briefly allow the
list to be one larger than it will be at the end of the code.
That's done by doing the decrement early so that the rest of the code
runs with a list that is one bigger than it appears to be.

> 
> > +        g_free(ent);
> > +        break;
> > +    }
> > +    /* Clearing a region with no poison is not an error so always do so */
> > +    if (cvc->set_cacheline)
> > +        if (!cvc->set_cacheline(ct3d, dpa, in->data)) {
> > +            return CXL_MBOX_INTERNAL_ERROR;
> > +        }
> > +
> > +    return CXL_MBOX_SUCCESS;
> > +}
> > +
> >  #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
> >  #define IMMEDIATE_DATA_CHANGE (1 << 2)
> >  #define IMMEDIATE_POLICY_CHANGE (1 << 3)
> > @@ -542,6 +619,8 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
> >          cmd_media_get_poison_list, 16, 0 },
> >      [MEDIA_AND_POISON][INJECT_POISON] = { "MEDIA_AND_POISON_INJECT_POISON",
> >          cmd_media_inject_poison, 8, 0 },
> > +    [MEDIA_AND_POISON][CLEAR_POISON] = { "MEDIA_AND_POISON_CLEAR_POISON",
> > +        cmd_media_clear_poison, 72, 0 },
> >  };
> >  
> >  void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index 21e3a84785..44ffc7d9b0 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> > @@ -919,6 +919,41 @@ static void set_lsa(CXLType3Dev *ct3d, const void *buf, uint64_t size,
> >       */
> >  }
> >  
> > +static bool set_cacheline(CXLType3Dev *ct3d, uint64_t dpa_offset, uint8_t *data)
> > +{
> > +    MemoryRegion *vmr = NULL, *pmr = NULL;
> > +    AddressSpace *as;
> > +
> > +    if (ct3d->hostvmem) {
> > +        vmr = host_memory_backend_get_memory(ct3d->hostvmem);
> > +    }
> > +    if (ct3d->hostpmem) {
> > +        pmr = host_memory_backend_get_memory(ct3d->hostpmem);
> > +    }
> > +
> > +    if (!vmr && !pmr) {
> > +        return false;
> > +    }
> > +
> > +    if (dpa_offset + 64 > int128_get64(ct3d->cxl_dstate.mem_size)) {
> > +        return false;
> > +    }
> > +
> > +    if (vmr) {
> > +        if (dpa_offset <= int128_get64(vmr->size)) {  
> 
> vmr->size - 64?
Copy and paste error. Until very recently this was broken in volatile patch
series and I'd missed code was duplicated here. Given they are 64 byte
aligned we can make this simply < int128_get64(vmr->size)

Thanks,

Jonathan


> Ira

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 4/6] hw/cxl: QMP based poison injection support
  2023-03-03  0:50   ` Ira Weiny
@ 2023-03-03 10:03     ` Jonathan Cameron
  0 siblings, 0 replies; 12+ messages in thread
From: Jonathan Cameron @ 2023-03-03 10:03 UTC (permalink / raw)
  To: Ira Weiny
  Cc: qemu-devel, Michael Tsirkin, Fan Ni, linux-cxl, linuxarm,
	Alison Schofield, Michael Roth, Philippe Mathieu-Daudé,
	Dave Jiang, Markus Armbruster


> > +    memset(out, 0, out_pl_len);
> > +    QLIST_FOREACH(ent, poison_list, node) {
> > +        uint64_t start, stop;
> > +
> > +        /* Check for no overlap */
> > +        if (ent->start >= query_start + query_length ||
> > +            ent->start + ent->length <= query_start) {
> > +            continue;
> > +        }
> > +
> > +        /* Deal with overlap */
> > +        start = MAX(ent->start & 0xffffffffffffffc0, query_start);
> > +        stop = MIN((ent->start & 0xffffffffffffffc0) + ent->length,
> > +                   query_start + query_length);
> > +        stq_le_p(&out->records[i].addr, start | (ent->type & 0x3));  
> 
> Shouldn't the mask here be 0x7?  I see we have not define Vendor Specific
> which I think is good but maybe better to allow it here?  I'm just not
> sure what is going to happen if someone comes along later and wants to
> use that value.

As things stand there is no way to inject a vendor defined poison entry
so expanding the mask has no affect.  However, it'll be one less change needed
if we ever add that support to the qmp interface so I'll make the change
as I'm respinning for your other feedback.

Given we've not supported anything vendor defined yet, I'm not that bothered
about adding the qmp support any time soon ;)

+ where else can poison come from?  It's either in the device or it's not
or it doesn't exist and was injected.  I'll be interested to see the arguement
for another source unless vendor defined really means 'I don't know'.

Thanks for the review and eagle eyed register checking :)

Jonathan




^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-03-03 10:04 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-02 10:17 [PATCH v3 0/6] hw/cxl: Poison get, inject, clear Jonathan Cameron
2023-03-02 10:17 ` [PATCH v3 1/6] hw/cxl: rename mailbox return code type from ret_code to CXLRetCode Jonathan Cameron
2023-03-02 10:17 ` [PATCH v3 2/6] hw/cxl: Introduce cxl_device_get_timestamp() utility function Jonathan Cameron
2023-03-02 10:17 ` [PATCH v3 3/6] bswap: Add the ability to store to an unaligned 24 bit field Jonathan Cameron
2023-03-02 10:17 ` [PATCH v3 4/6] hw/cxl: QMP based poison injection support Jonathan Cameron
2023-03-03  0:50   ` Ira Weiny
2023-03-03 10:03     ` Jonathan Cameron
2023-03-02 10:17 ` [PATCH v3 5/6] hw/cxl: Add poison injection via the mailbox Jonathan Cameron
2023-03-03  0:52   ` Ira Weiny
2023-03-02 10:17 ` [PATCH v3 6/6] hw/cxl: Add clear poison mailbox command support Jonathan Cameron
2023-03-03  1:05   ` Ira Weiny
2023-03-03  9:57     ` Jonathan Cameron

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).