* [PATCH v7 00/12] Enabling DCD emulation support in Qemu
@ 2024-04-18 23:10 nifan.cxl
  2024-04-18 23:10 ` [PATCH v7 01/12] hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output payload of identify memory device command nifan.cxl
                   ` (13 more replies)
  0 siblings, 14 replies; 65+ messages in thread
From: nifan.cxl @ 2024-04-18 23:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
	jim.harris, Jorgen.Hansen, wj28.lee, fan.ni

A git tree of this series can be found here (with one extra commit on top
for printing out accepted/pending extent list): 
https://github.com/moking/qemu/tree/dcd-v7

v6->v7:

1. Fixed the DVSEC range register issue mentioned in the cover letter in v6.
   Only relevant bits are set to mark the device ready (Patch 6). (Jonathan)
2. Moved the if statement in cxl_setup_memory from Patch 6 to Patch 4. (Jonathan)
3. Used MIN instead of if statement to get record_count in Patch 7. (Jonathan)
4. Added "Reviewed-by" tag to Patch 7.
5. Modified cxl_dc_extent_release_dry_run so the updated extent list can be
   reused in cmd_dcd_release_dyn_cap to simplify the process in Patch 8. (Jørgen) 
6. Added comments to indicate further "TODO" items in cmd_dcd_add_dyn_cap_rsp.
   (Jonathan)
7. Avoided irrelevant code reformat in Patch 8. (Jonathan)
8. Modified QMP interfaces for adding/releasing DC extents to allow passing
   tags, selection policy, flags in the interface. (Jonathan, Gregory)
9. Redesigned the pending list so extents in the same requests are grouped
    together. A new data structure is introduced to represent "extent group"
    in pending list.  (Jonathan)
10. Added support in QMP interface for "More" flag. 
11. Added a check for the "Forced removal" flag in release requests so that it
    is not passed through.
12. Removed the dynamic capacity log type from CxlEventLog definition in cxl.json
   to avoid the side effect it may introduce to inject error to DC event log.
   (Jonathan)
13. Hard coded the event log type to dynamic capacity event log in QMP
    interfaces. (Jonathan)
14. Added a space in between "-1]". (Jonathan)
15. Some minor comment fixes.

The code was tested with a setup similar to the ones listed in the cover
letters of v5[1] and v6[2], and it passed similar tests.
The code was also tested with the latest DCD kernel patchset[3].

[1] Qemu DCD patchset v5: https://lore.kernel.org/linux-cxl/20240304194331.1586191-1-nifan.cxl@gmail.com/T/#t
[2] Qemu DCD patchset v6: https://lore.kernel.org/linux-cxl/20240325190339.696686-1-nifan.cxl@gmail.com/T/#t
[3] DCD kernel patches: https://lore.kernel.org/linux-cxl/20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com/T/#m11c571e21c4fe17c7d04ec5c2c7bc7cbf2cd07e3


Fan Ni (12):
  hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output
    payload of identify memory device command
  hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative
    and mailbox command support
  include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for
    type3 memory devices
  hw/mem/cxl_type3: Add support to create DC regions to type3 memory
    devices
  hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr
    size instead of mr as argument
  hw/mem/cxl_type3: Add host backend and address space handling for DC
    regions
  hw/mem/cxl_type3: Add DC extent list representative and get DC extent
    list mailbox support
  hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release
    dynamic capacity response
  hw/cxl/events: Add qmp interfaces to add/release dynamic capacity
    extents
  hw/mem/cxl_type3: Add DPA range validation for accesses to DC regions
  hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support
  hw/mem/cxl_type3: Allow to release extent superset in QMP interface

 hw/cxl/cxl-mailbox-utils.c  | 620 ++++++++++++++++++++++++++++++++++-
 hw/mem/cxl_type3.c          | 633 +++++++++++++++++++++++++++++++++---
 hw/mem/cxl_type3_stubs.c    |  20 ++
 include/hw/cxl/cxl_device.h |  81 ++++-
 include/hw/cxl/cxl_events.h |  18 +
 qapi/cxl.json               |  69 ++++
 6 files changed, 1396 insertions(+), 45 deletions(-)

-- 
2.43.0



* [PATCH v7 01/12] hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output payload of identify memory device command
  2024-04-18 23:10 [PATCH v7 00/12] Enabling DCD emulation support in Qemu nifan.cxl
@ 2024-04-18 23:10 ` nifan.cxl
  2024-04-19 16:40   ` Gregory Price
  2024-04-18 23:10 ` [PATCH v7 02/12] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support nifan.cxl
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 65+ messages in thread
From: nifan.cxl @ 2024-04-18 23:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni, Jonathan Cameron

From: Fan Ni <fan.ni@samsung.com>

Based on CXL spec r3.1 Table 8-127 (Identify Memory Device Output
Payload), dynamic capacity event log size should be part of
output of the Identify command.
Add dc_event_log_size to the output payload for the host to get the info.
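
As an illustration of the size change, the sketch below models only the tail
of the output payload. The struct name id_tail and the helpers store_le16 and
load_le16 are hypothetical stand-ins (QEMU itself uses QEMU_PACKED and
stw_le_p), and the packed attribute assumes GCC or Clang. It shows that one
trailing packed uint16_t grows the payload by two bytes, consistent with the
build-time check moving from 0x43 to 0x45.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, shortened stand-in for the end of the Identify payload. */
struct id_tail {
    uint16_t inject_poison_limit;
    uint8_t poison_caps;
    uint8_t qos_telemetry_caps;
    uint16_t dc_event_log_size; /* new trailing field: adds exactly 2 bytes */
} __attribute__((packed));

/* Compile-time layout checks, in the spirit of QEMU_BUILD_BUG_ON. */
_Static_assert(sizeof(struct id_tail) == 6, "unexpected payload tail size");
_Static_assert(offsetof(struct id_tail, dc_event_log_size) == 4,
               "new field must directly follow qos_telemetry_caps");

/* Store a 16-bit value in little-endian order, independent of host order. */
static void store_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

/* Load a 16-bit little-endian value, independent of host order. */
static uint16_t load_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}
```

Writing through a byte buffer at the field offset, rather than through a
pointer to the packed member, avoids unaligned-access concerns on strict
hosts.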

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 4bcd727f4c..ba1d9901df 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -21,6 +21,7 @@
 #include "sysemu/hostmem.h"
 
 #define CXL_CAPACITY_MULTIPLIER   (256 * MiB)
+#define CXL_DC_EVENT_LOG_SIZE 8
 
 /*
  * How to add a new command, example. The command set FOO, with cmd BAR.
@@ -780,8 +781,9 @@ static CXLRetCode cmd_identify_memory_device(const struct cxl_cmd *cmd,
         uint16_t inject_poison_limit;
         uint8_t poison_caps;
         uint8_t qos_telemetry_caps;
+        uint16_t dc_event_log_size;
     } QEMU_PACKED *id;
-    QEMU_BUILD_BUG_ON(sizeof(*id) != 0x43);
+    QEMU_BUILD_BUG_ON(sizeof(*id) != 0x45);
     CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
     CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
     CXLDeviceState *cxl_dstate = &ct3d->cxl_dstate;
@@ -807,6 +809,7 @@ static CXLRetCode cmd_identify_memory_device(const struct cxl_cmd *cmd,
     st24_le_p(id->poison_list_max_mer, 256);
     /* No limit - so limited by main poison record limit */
     stw_le_p(&id->inject_poison_limit, 0);
+    stw_le_p(&id->dc_event_log_size, CXL_DC_EVENT_LOG_SIZE);
 
     *len_out = sizeof(*id);
     return CXL_MBOX_SUCCESS;
-- 
2.43.0



* [PATCH v7 02/12] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support
  2024-04-18 23:10 [PATCH v7 00/12] Enabling DCD emulation support in Qemu nifan.cxl
  2024-04-18 23:10 ` [PATCH v7 01/12] hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output payload of identify memory device command nifan.cxl
@ 2024-04-18 23:10 ` nifan.cxl
  2024-04-19 16:44   ` Gregory Price
  2024-04-18 23:10 ` [PATCH v7 03/12] include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for type3 memory devices nifan.cxl
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 65+ messages in thread
From: nifan.cxl @ 2024-04-18 23:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni, Jonathan Cameron

From: Fan Ni <fan.ni@samsung.com>

Per CXL spec r3.1, add a dynamic capacity region representation based on
Table 8-165 and extend the CXL type3 device definition to include DC region
information. Also, based on section 8.2.9.9.9.1, add 'Get Dynamic Capacity
Configuration' mailbox support.
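
A minimal sketch of how the response is sized, assuming the layout used in
this patch: an 8-byte header, one 40-byte record per returned region, and a
16-byte trailer with the extent and tag counts. The function and macro names
are hypothetical; a return value of -1 stands in for CXL_MBOX_INVALID_INPUT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sizes derived from the packed structs in the patch (assumed unchanged). */
#define DC_CFG_HDR_LEN  8u  /* num_regions + regions_returned + 6 reserved */
#define DC_CFG_REC_LEN  40u /* 4 x u64 + u32 handle + u8 flags + 3 reserved */
#define DC_CFG_TAIL_LEN 16u /* 4 x u32 extent/tag counters */

/*
 * Number of region records to return: the smaller of what the host asked
 * for and what remains from start_rid. Returns -1 for an out-of-range
 * starting region index.
 */
static int dc_cfg_record_count(uint8_t num_regions, uint8_t start_rid,
                               uint8_t region_cnt)
{
    uint8_t remaining;

    if (start_rid >= num_regions) {
        return -1;
    }
    remaining = (uint8_t)(num_regions - start_rid);
    return remaining < region_cnt ? remaining : region_cnt;
}

/* Total output payload length for a given number of returned records. */
static size_t dc_cfg_payload_len(unsigned record_count)
{
    return DC_CFG_HDR_LEN + (size_t)record_count * DC_CFG_REC_LEN +
           DC_CFG_TAIL_LEN;
}
```

Because the trailer follows a variable number of records, its offset has to
be computed at run time rather than expressed as a fixed struct member.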

Note: the device stores the region decode length in bytes. Per the
specification, it must be divided by 256 * MiB before being returned to the
host by the "Get Dynamic Capacity Configuration" mailbox command.
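
The unit conversion can be sketched as follows. The helper names are
hypothetical; only the 256 MiB multiplier comes from the patch
(CXL_CAPACITY_MULTIPLIER).

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)
#define GiB (1024ULL * MiB)
#define CAPACITY_MULTIPLIER (256ULL * MiB)

/* True if a byte length can be expressed exactly in 256 MiB units. */
static int is_capacity_aligned(uint64_t bytes)
{
    return bytes % CAPACITY_MULTIPLIER == 0;
}

/* Convert a byte-wise decode length into the 256 MiB units sent to the host. */
static uint64_t decode_len_to_units(uint64_t bytes)
{
    return bytes / CAPACITY_MULTIPLIER;
}
```

With the 2 GiB decode length used later in this series, the host would see a
value of 8.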

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  | 96 +++++++++++++++++++++++++++++++++++++
 include/hw/cxl/cxl_device.h | 16 +++++++
 2 files changed, 112 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index ba1d9901df..49c7944d93 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -22,6 +22,8 @@
 
 #define CXL_CAPACITY_MULTIPLIER   (256 * MiB)
 #define CXL_DC_EVENT_LOG_SIZE 8
+#define CXL_NUM_EXTENTS_SUPPORTED 512
+#define CXL_NUM_TAGS_SUPPORTED 0
 
 /*
  * How to add a new command, example. The command set FOO, with cmd BAR.
@@ -80,6 +82,8 @@ enum {
         #define GET_POISON_LIST        0x0
         #define INJECT_POISON          0x1
         #define CLEAR_POISON           0x2
+    DCD_CONFIG  = 0x48,
+        #define GET_DC_CONFIG          0x0
     PHYSICAL_SWITCH = 0x51,
         #define IDENTIFY_SWITCH_DEVICE      0x0
         #define GET_PHYSICAL_PORT_STATE     0x1
@@ -1238,6 +1242,88 @@ static CXLRetCode cmd_media_clear_poison(const struct cxl_cmd *cmd,
     return CXL_MBOX_SUCCESS;
 }
 
+/*
+ * CXL r3.1 section 8.2.9.9.9.1: Get Dynamic Capacity Configuration
+ * (Opcode: 4800h)
+ */
+static CXLRetCode cmd_dcd_get_dyn_cap_config(const struct cxl_cmd *cmd,
+                                             uint8_t *payload_in,
+                                             size_t len_in,
+                                             uint8_t *payload_out,
+                                             size_t *len_out,
+                                             CXLCCI *cci)
+{
+    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
+    struct {
+        uint8_t region_cnt;
+        uint8_t start_rid;
+    } QEMU_PACKED *in = (void *)payload_in;
+    struct {
+        uint8_t num_regions;
+        uint8_t regions_returned;
+        uint8_t rsvd1[6];
+        struct {
+            uint64_t base;
+            uint64_t decode_len;
+            uint64_t region_len;
+            uint64_t block_size;
+            uint32_t dsmadhandle;
+            uint8_t flags;
+            uint8_t rsvd2[3];
+        } QEMU_PACKED records[];
+    } QEMU_PACKED *out = (void *)payload_out;
+    struct {
+        uint32_t num_extents_supported;
+        uint32_t num_extents_available;
+        uint32_t num_tags_supported;
+        uint32_t num_tags_available;
+    } QEMU_PACKED *extra_out;
+    uint16_t record_count;
+    uint16_t i;
+    uint16_t out_pl_len;
+    uint8_t start_rid;
+
+    start_rid = in->start_rid;
+    if (start_rid >= ct3d->dc.num_regions) {
+        return CXL_MBOX_INVALID_INPUT;
+    }
+
+    record_count = MIN(ct3d->dc.num_regions - in->start_rid, in->region_cnt);
+
+    out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);
+    extra_out = (void *)(payload_out + out_pl_len);
+    out_pl_len += sizeof(*extra_out);
+    assert(out_pl_len <= CXL_MAILBOX_MAX_PAYLOAD_SIZE);
+
+    out->num_regions = ct3d->dc.num_regions;
+    out->regions_returned = record_count;
+    for (i = 0; i < record_count; i++) {
+        stq_le_p(&out->records[i].base,
+                 ct3d->dc.regions[start_rid + i].base);
+        stq_le_p(&out->records[i].decode_len,
+                 ct3d->dc.regions[start_rid + i].decode_len /
+                 CXL_CAPACITY_MULTIPLIER);
+        stq_le_p(&out->records[i].region_len,
+                 ct3d->dc.regions[start_rid + i].len);
+        stq_le_p(&out->records[i].block_size,
+                 ct3d->dc.regions[start_rid + i].block_size);
+        stl_le_p(&out->records[i].dsmadhandle,
+                 ct3d->dc.regions[start_rid + i].dsmadhandle);
+        out->records[i].flags = ct3d->dc.regions[start_rid + i].flags;
+    }
+    /*
+     * TODO: Assign values once extents and tags are introduced
+     * to use.
+     */
+    stl_le_p(&extra_out->num_extents_supported, CXL_NUM_EXTENTS_SUPPORTED);
+    stl_le_p(&extra_out->num_extents_available, CXL_NUM_EXTENTS_SUPPORTED);
+    stl_le_p(&extra_out->num_tags_supported, CXL_NUM_TAGS_SUPPORTED);
+    stl_le_p(&extra_out->num_tags_available, CXL_NUM_TAGS_SUPPORTED);
+
+    *len_out = out_pl_len;
+    return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_DATA_CHANGE (1 << 2)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -1282,6 +1368,11 @@ static const struct cxl_cmd cxl_cmd_set[256][256] = {
         cmd_media_clear_poison, 72, 0 },
 };
 
+static const struct cxl_cmd cxl_cmd_set_dcd[256][256] = {
+    [DCD_CONFIG][GET_DC_CONFIG] = { "DCD_GET_DC_CONFIG",
+        cmd_dcd_get_dyn_cap_config, 2, 0 },
+};
+
 static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
     [INFOSTAT][IS_IDENTIFY] = { "IDENTIFY", cmd_infostat_identify, 0, 0 },
     [INFOSTAT][BACKGROUND_OPERATION_STATUS] = { "BACKGROUND_OPERATION_STATUS",
@@ -1487,7 +1578,12 @@ void cxl_initialize_mailbox_swcci(CXLCCI *cci, DeviceState *intf,
 
 void cxl_initialize_mailbox_t3(CXLCCI *cci, DeviceState *d, size_t payload_max)
 {
+    CXLType3Dev *ct3d = CXL_TYPE3(d);
+
     cxl_copy_cci_commands(cci, cxl_cmd_set);
+    if (ct3d->dc.num_regions) {
+        cxl_copy_cci_commands(cci, cxl_cmd_set_dcd);
+    }
     cci->d = d;
 
     /* No separation for PCI MB as protocol handled in PCI device */
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index a5f8e25020..e839370266 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -422,6 +422,17 @@ typedef struct CXLPoison {
 typedef QLIST_HEAD(, CXLPoison) CXLPoisonList;
 #define CXL_POISON_LIST_LIMIT 256
 
+#define DCD_MAX_NUM_REGION 8
+
+typedef struct CXLDCRegion {
+    uint64_t base;       /* aligned to 256*MiB */
+    uint64_t decode_len; /* aligned to 256*MiB */
+    uint64_t len;
+    uint64_t block_size;
+    uint32_t dsmadhandle;
+    uint8_t flags;
+} CXLDCRegion;
+
 struct CXLType3Dev {
     /* Private */
     PCIDevice parent_obj;
@@ -454,6 +465,11 @@ struct CXLType3Dev {
     unsigned int poison_list_cnt;
     bool poison_list_overflowed;
     uint64_t poison_list_overflow_ts;
+
+    struct dynamic_capacity {
+        uint8_t num_regions; /* 0-8 regions */
+        CXLDCRegion regions[DCD_MAX_NUM_REGION];
+    } dc;
 };
 
 #define TYPE_CXL_TYPE3 "cxl-type3"
-- 
2.43.0



* [PATCH v7 03/12] include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for type3 memory devices
  2024-04-18 23:10 [PATCH v7 00/12] Enabling DCD emulation support in Qemu nifan.cxl
  2024-04-18 23:10 ` [PATCH v7 01/12] hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output payload of identify memory device command nifan.cxl
  2024-04-18 23:10 ` [PATCH v7 02/12] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support nifan.cxl
@ 2024-04-18 23:10 ` nifan.cxl
  2024-04-19 16:45   ` Gregory Price
  2024-04-18 23:10 ` [PATCH v7 04/12] hw/mem/cxl_type3: Add support to create DC regions to " nifan.cxl
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 65+ messages in thread
From: nifan.cxl @ 2024-04-18 23:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni, Jonathan Cameron

From: Fan Ni <fan.ni@samsung.com>

Rename mem_size as static_mem_size for type3 memdev to cover static RAM and
pmem capacity, preparing for the introduction of dynamic capacity to support
dynamic capacity devices.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  | 4 ++--
 hw/mem/cxl_type3.c          | 8 ++++----
 include/hw/cxl/cxl_device.h | 2 +-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 49c7944d93..0f2ad58a14 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -803,7 +803,7 @@ static CXLRetCode cmd_identify_memory_device(const struct cxl_cmd *cmd,
     snprintf(id->fw_revision, 0x10, "BWFW VERSION %02d", 0);
 
     stq_le_p(&id->total_capacity,
-             cxl_dstate->mem_size / CXL_CAPACITY_MULTIPLIER);
+             cxl_dstate->static_mem_size / CXL_CAPACITY_MULTIPLIER);
     stq_le_p(&id->persistent_capacity,
              cxl_dstate->pmem_size / CXL_CAPACITY_MULTIPLIER);
     stq_le_p(&id->volatile_capacity,
@@ -1179,7 +1179,7 @@ static CXLRetCode cmd_media_clear_poison(const struct cxl_cmd *cmd,
     struct clear_poison_pl *in = (void *)payload_in;
 
     dpa = ldq_le_p(&in->dpa);
-    if (dpa + CXL_CACHE_LINE_SIZE > cxl_dstate->mem_size) {
+    if (dpa + CXL_CACHE_LINE_SIZE > cxl_dstate->static_mem_size) {
         return CXL_MBOX_INVALID_PA;
     }
 
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index b0a7e9f11b..5d6d3ab87d 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -608,7 +608,7 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
         }
         address_space_init(&ct3d->hostvmem_as, vmr, v_name);
         ct3d->cxl_dstate.vmem_size = memory_region_size(vmr);
-        ct3d->cxl_dstate.mem_size += memory_region_size(vmr);
+        ct3d->cxl_dstate.static_mem_size += memory_region_size(vmr);
         g_free(v_name);
     }
 
@@ -631,7 +631,7 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
         }
         address_space_init(&ct3d->hostpmem_as, pmr, p_name);
         ct3d->cxl_dstate.pmem_size = memory_region_size(pmr);
-        ct3d->cxl_dstate.mem_size += memory_region_size(pmr);
+        ct3d->cxl_dstate.static_mem_size += memory_region_size(pmr);
         g_free(p_name);
     }
 
@@ -838,7 +838,7 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
         return -EINVAL;
     }
 
-    if (*dpa_offset > ct3d->cxl_dstate.mem_size) {
+    if (*dpa_offset > ct3d->cxl_dstate.static_mem_size) {
         return -EINVAL;
     }
 
@@ -1011,7 +1011,7 @@ static bool set_cacheline(CXLType3Dev *ct3d, uint64_t dpa_offset, uint8_t *data)
         return false;
     }
 
-    if (dpa_offset + CXL_CACHE_LINE_SIZE > ct3d->cxl_dstate.mem_size) {
+    if (dpa_offset + CXL_CACHE_LINE_SIZE > ct3d->cxl_dstate.static_mem_size) {
         return false;
     }
 
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index e839370266..f7f56b44e3 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -234,7 +234,7 @@ typedef struct cxl_device_state {
     } timestamp;
 
     /* memory region size, HDM */
-    uint64_t mem_size;
+    uint64_t static_mem_size;
     uint64_t pmem_size;
     uint64_t vmem_size;
 
-- 
2.43.0



* [PATCH v7 04/12] hw/mem/cxl_type3: Add support to create DC regions to type3 memory devices
  2024-04-18 23:10 [PATCH v7 00/12] Enabling DCD emulation support in Qemu nifan.cxl
                   ` (2 preceding siblings ...)
  2024-04-18 23:10 ` [PATCH v7 03/12] include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for type3 memory devices nifan.cxl
@ 2024-04-18 23:10 ` nifan.cxl
  2024-04-19 16:47   ` Gregory Price
  2024-05-14  8:14     ` Zhijian Li (Fujitsu)
  2024-04-18 23:10 ` [PATCH v7 05/12] hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr size instead of mr as argument nifan.cxl
                   ` (9 subsequent siblings)
  13 siblings, 2 replies; 65+ messages in thread
From: nifan.cxl @ 2024-04-18 23:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni, Jonathan Cameron

From: Fan Ni <fan.ni@samsung.com>

With this change, DC regions can be created when setting up memory for a
type3 memory device.
A property 'num-dc-regions' is added to ct3_props to let users pass the
number of DC regions to create. For simplicity, the other region parameters
(region base, length, and block size) are hard-coded. If needed, they can be
made configurable later.
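
The hard-coded layout can be sketched as below, assuming the values used in
this patch: regions are placed back to back after the static (volatile plus
persistent) capacity, each 2 GiB long with 2 MiB blocks. The struct and
function names are hypothetical, and the sketch returns -1 where the patch
uses assert() for a misaligned base.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)
#define GiB (1024ULL * MiB)
#define CAPACITY_MULTIPLIER (256ULL * MiB)
#define MAX_DC_REGIONS 8

/* Hypothetical, reduced version of CXLDCRegion. */
struct dc_region {
    uint64_t base;       /* DPA base, aligned to 256 MiB */
    uint64_t decode_len; /* bytes, aligned to 256 MiB */
    uint64_t len;        /* bytes */
    uint64_t block_size; /* bytes */
};

/*
 * Fill in n regions starting right after static_bytes of static capacity.
 * Returns 0 on success, -1 if n is too large or the base is misaligned.
 */
static int layout_dc_regions(struct dc_region *regions, unsigned n,
                             uint64_t static_bytes)
{
    uint64_t base = static_bytes;
    unsigned i;

    if (n > MAX_DC_REGIONS || static_bytes % CAPACITY_MULTIPLIER != 0) {
        return -1;
    }
    for (i = 0; i < n; i++, base += 2 * GiB) {
        regions[i] = (struct dc_region) {
            .base = base,
            .decode_len = 2 * GiB,
            .len = 2 * GiB,
            .block_size = 2 * MiB,
        };
    }
    return 0;
}
```

Keeping every region the same size makes the base of region i simply the
static capacity plus i times 2 GiB.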

With this change and proper kernel-side support, DC regions can be created
as follows:

region=$(cat /sys/bus/cxl/devices/decoder0.0/create_dc_region)
echo $region > /sys/bus/cxl/devices/decoder0.0/create_dc_region
echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity
echo 1 > /sys/bus/cxl/devices/$region/interleave_ways

echo "dc0" >/sys/bus/cxl/devices/decoder2.0/mode
echo 0x40000000 >/sys/bus/cxl/devices/decoder2.0/dpa_size

echo 0x40000000 > /sys/bus/cxl/devices/$region/size
echo  "decoder2.0" > /sys/bus/cxl/devices/$region/target0
echo 1 > /sys/bus/cxl/devices/$region/commit
echo $region > /sys/bus/cxl/drivers/cxl_region/bind

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/mem/cxl_type3.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 5d6d3ab87d..5ceed0ab4c 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -30,6 +30,7 @@
 #include "hw/pci/msix.h"
 
 #define DWORD_BYTE 4
+#define CXL_CAPACITY_MULTIPLIER   (256 * MiB)
 
 /* Default CDAT entries for a memory region */
 enum {
@@ -567,6 +568,46 @@ static void ct3d_reg_write(void *opaque, hwaddr offset, uint64_t value,
     }
 }
 
+/*
+ * TODO: dc region configuration will be updated once host backend and address
+ * space support is added for DCD.
+ */
+static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
+{
+    int i;
+    uint64_t region_base = 0;
+    uint64_t region_len =  2 * GiB;
+    uint64_t decode_len = 2 * GiB;
+    uint64_t blk_size = 2 * MiB;
+    CXLDCRegion *region;
+    MemoryRegion *mr;
+
+    if (ct3d->hostvmem) {
+        mr = host_memory_backend_get_memory(ct3d->hostvmem);
+        region_base += memory_region_size(mr);
+    }
+    if (ct3d->hostpmem) {
+        mr = host_memory_backend_get_memory(ct3d->hostpmem);
+        region_base += memory_region_size(mr);
+    }
+    assert(region_base % CXL_CAPACITY_MULTIPLIER == 0);
+
+    for (i = 0, region = &ct3d->dc.regions[0];
+         i < ct3d->dc.num_regions;
+         i++, region++, region_base += region_len) {
+        *region = (CXLDCRegion) {
+            .base = region_base,
+            .decode_len = decode_len,
+            .len = region_len,
+            .block_size = blk_size,
+            /* dsmad_handle set when creating CDAT table entries */
+            .flags = 0,
+        };
+    }
+
+    return true;
+}
+
 static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
 {
     DeviceState *ds = DEVICE(ct3d);
@@ -635,6 +676,13 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
         g_free(p_name);
     }
 
+    if (ct3d->dc.num_regions > 0) {
+        if (!cxl_create_dc_regions(ct3d, errp)) {
+            error_setg(errp, "setup DC regions failed");
+            return false;
+        }
+    }
+
     return true;
 }
 
@@ -931,6 +979,7 @@ static Property ct3_props[] = {
                      HostMemoryBackend *),
     DEFINE_PROP_UINT64("sn", CXLType3Dev, sn, UI64_NULL),
     DEFINE_PROP_STRING("cdat", CXLType3Dev, cxl_cstate.cdat.filename),
+    DEFINE_PROP_UINT8("num-dc-regions", CXLType3Dev, dc.num_regions, 0),
     DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.43.0



* [PATCH v7 05/12] hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr size instead of mr as argument
  2024-04-18 23:10 [PATCH v7 00/12] Enabling DCD emulation support in Qemu nifan.cxl
                   ` (3 preceding siblings ...)
  2024-04-18 23:10 ` [PATCH v7 04/12] hw/mem/cxl_type3: Add support to create DC regions to " nifan.cxl
@ 2024-04-18 23:10 ` nifan.cxl
  2024-04-19 16:39   ` Gregory Price
  2024-04-18 23:10 ` [PATCH v7 06/12] hw/mem/cxl_type3: Add host backend and address space handling for DC regions nifan.cxl
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 65+ messages in thread
From: nifan.cxl @ 2024-04-18 23:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni, Jonathan Cameron

From: Fan Ni <fan.ni@samsung.com>

The function ct3_build_cdat_entries_for_mr only uses the size of the memory
region passed to it. Refactor the function definition to take the size
directly, which makes the arguments more specific.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/mem/cxl_type3.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 5ceed0ab4c..a1fe268560 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -44,7 +44,7 @@ enum {
 };
 
 static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
-                                          int dsmad_handle, MemoryRegion *mr,
+                                          int dsmad_handle, uint64_t size,
                                           bool is_pmem, uint64_t dpa_base)
 {
     CDATDsmas *dsmas;
@@ -63,7 +63,7 @@ static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
         .DSMADhandle = dsmad_handle,
         .flags = is_pmem ? CDAT_DSMAS_FLAG_NV : 0,
         .DPA_base = dpa_base,
-        .DPA_length = memory_region_size(mr),
+        .DPA_length = size,
     };
 
     /* For now, no memory side cache, plausiblish numbers */
@@ -132,7 +132,7 @@ static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
          */
         .EFI_memory_type_attr = is_pmem ? 2 : 1,
         .DPA_offset = 0,
-        .DPA_length = memory_region_size(mr),
+        .DPA_length = size,
     };
 
     /* Header always at start of structure */
@@ -149,6 +149,7 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
     g_autofree CDATSubHeader **table = NULL;
     CXLType3Dev *ct3d = priv;
     MemoryRegion *volatile_mr = NULL, *nonvolatile_mr = NULL;
+    uint64_t vmr_size = 0, pmr_size = 0;
     int dsmad_handle = 0;
     int cur_ent = 0;
     int len = 0;
@@ -163,6 +164,7 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
             return -EINVAL;
         }
         len += CT3_CDAT_NUM_ENTRIES;
+        vmr_size = memory_region_size(volatile_mr);
     }
 
     if (ct3d->hostpmem) {
@@ -171,21 +173,22 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
             return -EINVAL;
         }
         len += CT3_CDAT_NUM_ENTRIES;
+        pmr_size = memory_region_size(nonvolatile_mr);
     }
 
     table = g_malloc0(len * sizeof(*table));
 
     /* Now fill them in */
     if (volatile_mr) {
-        ct3_build_cdat_entries_for_mr(table, dsmad_handle++, volatile_mr,
+        ct3_build_cdat_entries_for_mr(table, dsmad_handle++, vmr_size,
                                       false, 0);
         cur_ent = CT3_CDAT_NUM_ENTRIES;
     }
 
     if (nonvolatile_mr) {
-        uint64_t base = volatile_mr ? memory_region_size(volatile_mr) : 0;
+        uint64_t base = vmr_size;
         ct3_build_cdat_entries_for_mr(&(table[cur_ent]), dsmad_handle++,
-                                      nonvolatile_mr, true, base);
+                                      pmr_size, true, base);
         cur_ent += CT3_CDAT_NUM_ENTRIES;
     }
     assert(len == cur_ent);
-- 
2.43.0



* [PATCH v7 06/12] hw/mem/cxl_type3: Add host backend and address space handling for DC regions
  2024-04-18 23:10 [PATCH v7 00/12] Enabling DCD emulation support in Qemu nifan.cxl
                   ` (4 preceding siblings ...)
  2024-04-18 23:10 ` [PATCH v7 05/12] hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr size instead of mr as argument nifan.cxl
@ 2024-04-18 23:10 ` nifan.cxl
  2024-04-19 17:27   ` Gregory Price
                     ` (2 more replies)
  2024-04-18 23:10 ` [PATCH v7 07/12] hw/mem/cxl_type3: Add DC extent list representative and get DC extent list mailbox support nifan.cxl
                   ` (7 subsequent siblings)
  13 siblings, 3 replies; 65+ messages in thread
From: nifan.cxl @ 2024-04-18 23:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni

From: Fan Ni <fan.ni@samsung.com>

Add (file/memory backed) host backend for DCD. All the dynamic capacity
regions will share a single, large enough host backend. Set up address
space for DC regions to support read/write operations to dynamic capacity
for DCD.

With the change, the following support is added:
1. Add a new property to type3 device "volatile-dc-memdev" to point to host
   memory backend for dynamic capacity. Currently, all DC regions share one
   host backend;
2. Add an address space for dynamic capacity for read/write support;
3. Create cdat entries for each dynamic capacity region.
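
The routing of an access to the shared backend can be sketched as below. The
relevant hunks are not fully shown here, so this is only an illustration under
two assumptions: that DC capacity starts directly after the static (volatile
plus persistent) capacity in DPA space, as in cxl_create_dc_regions, and that
the offset into the single DC backend is the DPA minus the static size. All
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)
#define GiB (1024ULL * MiB)

enum dpa_target { DPA_VMEM, DPA_PMEM, DPA_DC, DPA_INVALID };

/* Hypothetical summary of the device's capacities, all in bytes. */
struct dpa_map {
    uint64_t vmem_size;
    uint64_t pmem_size;
    uint64_t dc_size; /* total capacity of all DC regions */
};

/*
 * Pick the backing store for a DPA and compute the offset within it.
 * *offset is only written when the DPA is valid.
 */
static enum dpa_target resolve_dpa(const struct dpa_map *m, uint64_t dpa,
                                   uint64_t *offset)
{
    uint64_t static_size = m->vmem_size + m->pmem_size;

    if (dpa < m->vmem_size) {
        *offset = dpa;
        return DPA_VMEM;
    }
    if (dpa < static_size) {
        *offset = dpa - m->vmem_size;
        return DPA_PMEM;
    }
    if (dpa < static_size + m->dc_size) {
        /* All DC regions share one backend, so one subtraction suffices. */
        *offset = dpa - static_size;
        return DPA_DC;
    }
    return DPA_INVALID;
}
```

A single shared backend keeps the lookup independent of the number of DC
regions, at the cost of requiring the backend to be at least as large as the
total DC capacity.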

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  |  16 ++--
 hw/mem/cxl_type3.c          | 172 +++++++++++++++++++++++++++++-------
 include/hw/cxl/cxl_device.h |   8 ++
 3 files changed, 160 insertions(+), 36 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 0f2ad58a14..831cef0567 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -622,7 +622,8 @@ static CXLRetCode cmd_firmware_update_get_info(const struct cxl_cmd *cmd,
                                                size_t *len_out,
                                                CXLCCI *cci)
 {
-    CXLDeviceState *cxl_dstate = &CXL_TYPE3(cci->d)->cxl_dstate;
+    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
+    CXLDeviceState *cxl_dstate = &ct3d->cxl_dstate;
     struct {
         uint8_t slots_supported;
         uint8_t slot_info;
@@ -636,7 +637,8 @@ static CXLRetCode cmd_firmware_update_get_info(const struct cxl_cmd *cmd,
     QEMU_BUILD_BUG_ON(sizeof(*fw_info) != 0x50);
 
     if ((cxl_dstate->vmem_size < CXL_CAPACITY_MULTIPLIER) ||
-        (cxl_dstate->pmem_size < CXL_CAPACITY_MULTIPLIER)) {
+        (cxl_dstate->pmem_size < CXL_CAPACITY_MULTIPLIER) ||
+        (ct3d->dc.total_capacity < CXL_CAPACITY_MULTIPLIER)) {
         return CXL_MBOX_INTERNAL_ERROR;
     }
 
@@ -793,7 +795,8 @@ static CXLRetCode cmd_identify_memory_device(const struct cxl_cmd *cmd,
     CXLDeviceState *cxl_dstate = &ct3d->cxl_dstate;
 
     if ((!QEMU_IS_ALIGNED(cxl_dstate->vmem_size, CXL_CAPACITY_MULTIPLIER)) ||
-        (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER))) {
+        (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER)) ||
+        (!QEMU_IS_ALIGNED(ct3d->dc.total_capacity, CXL_CAPACITY_MULTIPLIER))) {
         return CXL_MBOX_INTERNAL_ERROR;
     }
 
@@ -835,9 +838,11 @@ static CXLRetCode cmd_ccls_get_partition_info(const struct cxl_cmd *cmd,
         uint64_t next_pmem;
     } QEMU_PACKED *part_info = (void *)payload_out;
     QEMU_BUILD_BUG_ON(sizeof(*part_info) != 0x20);
+    CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
 
     if ((!QEMU_IS_ALIGNED(cxl_dstate->vmem_size, CXL_CAPACITY_MULTIPLIER)) ||
-        (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER))) {
+        (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER)) ||
+        (!QEMU_IS_ALIGNED(ct3d->dc.total_capacity, CXL_CAPACITY_MULTIPLIER))) {
         return CXL_MBOX_INTERNAL_ERROR;
     }
 
@@ -1179,7 +1184,8 @@ static CXLRetCode cmd_media_clear_poison(const struct cxl_cmd *cmd,
     struct clear_poison_pl *in = (void *)payload_in;
 
     dpa = ldq_le_p(&in->dpa);
-    if (dpa + CXL_CACHE_LINE_SIZE > cxl_dstate->static_mem_size) {
+    if (dpa + CXL_CACHE_LINE_SIZE > cxl_dstate->static_mem_size +
+        ct3d->dc.total_capacity) {
         return CXL_MBOX_INVALID_PA;
     }
 
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index a1fe268560..ac87398089 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -45,7 +45,8 @@ enum {
 
 static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
                                           int dsmad_handle, uint64_t size,
-                                          bool is_pmem, uint64_t dpa_base)
+                                          bool is_pmem, bool is_dynamic,
+                                          uint64_t dpa_base)
 {
     CDATDsmas *dsmas;
     CDATDslbis *dslbis0;
@@ -61,7 +62,8 @@ static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
             .length = sizeof(*dsmas),
         },
         .DSMADhandle = dsmad_handle,
-        .flags = is_pmem ? CDAT_DSMAS_FLAG_NV : 0,
+        .flags = (is_pmem ? CDAT_DSMAS_FLAG_NV : 0) |
+                 (is_dynamic ? CDAT_DSMAS_FLAG_DYNAMIC_CAP : 0),
         .DPA_base = dpa_base,
         .DPA_length = size,
     };
@@ -149,12 +151,13 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
     g_autofree CDATSubHeader **table = NULL;
     CXLType3Dev *ct3d = priv;
     MemoryRegion *volatile_mr = NULL, *nonvolatile_mr = NULL;
+    MemoryRegion *dc_mr = NULL;
     uint64_t vmr_size = 0, pmr_size = 0;
     int dsmad_handle = 0;
     int cur_ent = 0;
     int len = 0;
 
-    if (!ct3d->hostpmem && !ct3d->hostvmem) {
+    if (!ct3d->hostpmem && !ct3d->hostvmem && !ct3d->dc.num_regions) {
         return 0;
     }
 
@@ -176,21 +179,54 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
         pmr_size = memory_region_size(nonvolatile_mr);
     }
 
+    if (ct3d->dc.num_regions) {
+        if (!ct3d->dc.host_dc) {
+            return -EINVAL;
+        }
+        dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
+        if (!dc_mr) {
+            return -EINVAL;
+        }
+        len += CT3_CDAT_NUM_ENTRIES * ct3d->dc.num_regions;
+    }
+
     table = g_malloc0(len * sizeof(*table));
 
     /* Now fill them in */
     if (volatile_mr) {
         ct3_build_cdat_entries_for_mr(table, dsmad_handle++, vmr_size,
-                                      false, 0);
+                                      false, false, 0);
         cur_ent = CT3_CDAT_NUM_ENTRIES;
     }
 
     if (nonvolatile_mr) {
         uint64_t base = vmr_size;
         ct3_build_cdat_entries_for_mr(&(table[cur_ent]), dsmad_handle++,
-                                      pmr_size, true, base);
+                                      pmr_size, true, false, base);
         cur_ent += CT3_CDAT_NUM_ENTRIES;
     }
+
+    if (dc_mr) {
+        int i;
+        uint64_t region_base = vmr_size + pmr_size;
+
+        /*
+         * TODO: we assume the dynamic capacity to be volatile for now.
+         * Non-volatile dynamic capacity will be added if needed in the
+         * future.
+         */
+        for (i = 0; i < ct3d->dc.num_regions; i++) {
+            ct3_build_cdat_entries_for_mr(&(table[cur_ent]),
+                                          dsmad_handle++,
+                                          ct3d->dc.regions[i].len,
+                                          false, true, region_base);
+            ct3d->dc.regions[i].dsmadhandle = dsmad_handle - 1;
+
+            cur_ent += CT3_CDAT_NUM_ENTRIES;
+            region_base += ct3d->dc.regions[i].len;
+        }
+    }
+
     assert(len == cur_ent);
 
     *cdat_table = g_steal_pointer(&table);
@@ -301,10 +337,16 @@ static void build_dvsecs(CXLType3Dev *ct3d)
             range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
                              (ct3d->hostpmem->size & 0xF0000000);
         }
-    } else {
+    } else if (ct3d->hostpmem) {
         range1_size_hi = ct3d->hostpmem->size >> 32;
         range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
                          (ct3d->hostpmem->size & 0xF0000000);
+    } else {
+        /*
+         * For DCD with no static memory, set memory active, memory class bits.
+         * No range is set.
+         */
+        range1_size_lo = (2 << 5) | (2 << 2) | 0x3;
     }
 
     dvsec = (uint8_t *)&(CXLDVSECDevice){
@@ -579,11 +621,27 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
 {
     int i;
     uint64_t region_base = 0;
-    uint64_t region_len =  2 * GiB;
-    uint64_t decode_len = 2 * GiB;
+    uint64_t region_len;
+    uint64_t decode_len;
     uint64_t blk_size = 2 * MiB;
     CXLDCRegion *region;
     MemoryRegion *mr;
+    uint64_t dc_size;
+
+    mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
+    dc_size = memory_region_size(mr);
+    region_len = DIV_ROUND_UP(dc_size, ct3d->dc.num_regions);
+
+    if (dc_size % (ct3d->dc.num_regions * CXL_CAPACITY_MULTIPLIER) != 0) {
+        error_setg(errp, "host backend size must be a multiple of region len");
+        return false;
+    }
+    if (region_len % CXL_CAPACITY_MULTIPLIER != 0) {
+        error_setg(errp, "DC region size is unaligned to 0x%" PRIx64,
+                   CXL_CAPACITY_MULTIPLIER);
+        return false;
+    }
+    decode_len = region_len;
 
     if (ct3d->hostvmem) {
         mr = host_memory_backend_get_memory(ct3d->hostvmem);
@@ -606,6 +664,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
             /* dsmad_handle set when creating CDAT table entries */
             .flags = 0,
         };
+        ct3d->dc.total_capacity += region->len;
     }
 
     return true;
@@ -615,7 +674,8 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
 {
     DeviceState *ds = DEVICE(ct3d);
 
-    if (!ct3d->hostmem && !ct3d->hostvmem && !ct3d->hostpmem) {
+    if (!ct3d->hostmem && !ct3d->hostvmem && !ct3d->hostpmem
+        && !ct3d->dc.num_regions) {
         error_setg(errp, "at least one memdev property must be set");
         return false;
     } else if (ct3d->hostmem && ct3d->hostpmem) {
@@ -679,7 +739,37 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
         g_free(p_name);
     }
 
+    ct3d->dc.total_capacity = 0;
     if (ct3d->dc.num_regions > 0) {
+        MemoryRegion *dc_mr;
+        char *dc_name;
+
+        if (!ct3d->dc.host_dc) {
+            error_setg(errp, "dynamic capacity must have a backing device");
+            return false;
+        }
+
+        dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
+        if (!dc_mr) {
+            error_setg(errp, "dynamic capacity must have a backing device");
+            return false;
+        }
+
+        /*
+         * TODO: set dc as volatile for now, non-volatile support can be added
+         * in the future if needed.
+         */
+        memory_region_set_nonvolatile(dc_mr, false);
+        memory_region_set_enabled(dc_mr, true);
+        host_memory_backend_set_mapped(ct3d->dc.host_dc, true);
+        if (ds->id) {
+            dc_name = g_strdup_printf("cxl-dcd-dpa-dc-space:%s", ds->id);
+        } else {
+            dc_name = g_strdup("cxl-dcd-dpa-dc-space");
+        }
+        address_space_init(&ct3d->dc.host_dc_as, dc_mr, dc_name);
+        g_free(dc_name);
+
         if (!cxl_create_dc_regions(ct3d, errp)) {
             error_setg(errp, "setup DC regions failed");
             return false;
@@ -776,6 +866,9 @@ err_release_cdat:
 err_free_special_ops:
     g_free(regs->special_ops);
 err_address_space_free:
+    if (ct3d->dc.host_dc) {
+        address_space_destroy(&ct3d->dc.host_dc_as);
+    }
     if (ct3d->hostpmem) {
         address_space_destroy(&ct3d->hostpmem_as);
     }
@@ -794,6 +887,9 @@ static void ct3_exit(PCIDevice *pci_dev)
     pcie_aer_exit(pci_dev);
     cxl_doe_cdat_release(cxl_cstate);
     g_free(regs->special_ops);
+    if (ct3d->dc.host_dc) {
+        address_space_destroy(&ct3d->dc.host_dc_as);
+    }
     if (ct3d->hostpmem) {
         address_space_destroy(&ct3d->hostpmem_as);
     }
@@ -872,16 +968,23 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
                                        AddressSpace **as,
                                        uint64_t *dpa_offset)
 {
-    MemoryRegion *vmr = NULL, *pmr = NULL;
+    MemoryRegion *vmr = NULL, *pmr = NULL, *dc_mr = NULL;
+    uint64_t vmr_size = 0, pmr_size = 0, dc_size = 0;
 
     if (ct3d->hostvmem) {
         vmr = host_memory_backend_get_memory(ct3d->hostvmem);
+        vmr_size = memory_region_size(vmr);
     }
     if (ct3d->hostpmem) {
         pmr = host_memory_backend_get_memory(ct3d->hostpmem);
+        pmr_size = memory_region_size(pmr);
+    }
+    if (ct3d->dc.host_dc) {
+        dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
+        dc_size = memory_region_size(dc_mr);
     }
 
-    if (!vmr && !pmr) {
+    if (!vmr && !pmr && !dc_mr) {
         return -ENODEV;
     }
 
@@ -889,19 +992,18 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
         return -EINVAL;
     }
 
-    if (*dpa_offset > ct3d->cxl_dstate.static_mem_size) {
+    if (*dpa_offset >= vmr_size + pmr_size + dc_size) {
         return -EINVAL;
     }
 
-    if (vmr) {
-        if (*dpa_offset < memory_region_size(vmr)) {
-            *as = &ct3d->hostvmem_as;
-        } else {
-            *as = &ct3d->hostpmem_as;
-            *dpa_offset -= memory_region_size(vmr);
-        }
-    } else {
+    if (*dpa_offset < vmr_size) {
+        *as = &ct3d->hostvmem_as;
+    } else if (*dpa_offset < vmr_size + pmr_size) {
         *as = &ct3d->hostpmem_as;
+        *dpa_offset -= vmr_size;
+    } else {
+        *as = &ct3d->dc.host_dc_as;
+        *dpa_offset -= (vmr_size + pmr_size);
     }
 
     return 0;
@@ -983,6 +1085,8 @@ static Property ct3_props[] = {
     DEFINE_PROP_UINT64("sn", CXLType3Dev, sn, UI64_NULL),
     DEFINE_PROP_STRING("cdat", CXLType3Dev, cxl_cstate.cdat.filename),
     DEFINE_PROP_UINT8("num-dc-regions", CXLType3Dev, dc.num_regions, 0),
+    DEFINE_PROP_LINK("volatile-dc-memdev", CXLType3Dev, dc.host_dc,
+                     TYPE_MEMORY_BACKEND, HostMemoryBackend *),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -1049,33 +1153,39 @@ static void set_lsa(CXLType3Dev *ct3d, const void *buf, uint64_t size,
 
 static bool set_cacheline(CXLType3Dev *ct3d, uint64_t dpa_offset, uint8_t *data)
 {
-    MemoryRegion *vmr = NULL, *pmr = NULL;
+    MemoryRegion *vmr = NULL, *pmr = NULL, *dc_mr = NULL;
     AddressSpace *as;
+    uint64_t vmr_size = 0, pmr_size = 0, dc_size = 0;
 
     if (ct3d->hostvmem) {
         vmr = host_memory_backend_get_memory(ct3d->hostvmem);
+        vmr_size = memory_region_size(vmr);
     }
     if (ct3d->hostpmem) {
         pmr = host_memory_backend_get_memory(ct3d->hostpmem);
+        pmr_size = memory_region_size(pmr);
     }
+    if (ct3d->dc.host_dc) {
+        dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
+        dc_size = memory_region_size(dc_mr);
+    }
 
-    if (!vmr && !pmr) {
+    if (!vmr && !pmr && !dc_mr) {
         return false;
     }
 
-    if (dpa_offset + CXL_CACHE_LINE_SIZE > ct3d->cxl_dstate.static_mem_size) {
+    if (dpa_offset + CXL_CACHE_LINE_SIZE > vmr_size + pmr_size + dc_size) {
         return false;
     }
 
-    if (vmr) {
-        if (dpa_offset < memory_region_size(vmr)) {
-            as = &ct3d->hostvmem_as;
-        } else {
-            as = &ct3d->hostpmem_as;
-            dpa_offset -= memory_region_size(vmr);
-        }
-    } else {
+    if (dpa_offset < vmr_size) {
+        as = &ct3d->hostvmem_as;
+    } else if (dpa_offset < vmr_size + pmr_size) {
         as = &ct3d->hostpmem_as;
+        dpa_offset -= vmr_size;
+    } else {
+        as = &ct3d->dc.host_dc_as;
+        dpa_offset -= (vmr_size + pmr_size);
     }
 
     address_space_write(as, dpa_offset, MEMTXATTRS_UNSPECIFIED, &data,
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index f7f56b44e3..c2c3df0d2a 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -467,6 +467,14 @@ struct CXLType3Dev {
     uint64_t poison_list_overflow_ts;
 
     struct dynamic_capacity {
+        HostMemoryBackend *host_dc;
+        AddressSpace host_dc_as;
+        /*
+         * total_capacity is equivalent to the dynamic capacity
+         * memory region size.
+         */
+        uint64_t total_capacity; /* 256M aligned */
+
         uint8_t num_regions; /* 0-8 regions */
         CXLDCRegion regions[DCD_MAX_NUM_REGION];
     } dc;
-- 
2.43.0



* [PATCH v7 07/12] hw/mem/cxl_type3: Add DC extent list representative and get DC extent list mailbox support
  2024-04-18 23:10 [PATCH v7 00/12] Enabling DCD emulation support in Qemu nifan.cxl
                   ` (5 preceding siblings ...)
  2024-04-18 23:10 ` [PATCH v7 06/12] hw/mem/cxl_type3: Add host backend and address space handling for DC regions nifan.cxl
@ 2024-04-18 23:10 ` nifan.cxl
  2024-04-19 16:52   ` Gregory Price
  2024-04-18 23:10 ` [PATCH v7 08/12] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response nifan.cxl
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 65+ messages in thread
From: nifan.cxl @ 2024-04-18 23:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni, Jonathan Cameron

From: Fan Ni <fan.ni@samsung.com>

Add a dynamic capacity extent list representation to the definition of
CXLType3Dev and implement the Get Dynamic Capacity Extent List mailbox
command (Opcode 4801h) per CXL r3.1 section 8.2.9.9.9.2.
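
As a rough illustration of how the number of returned records is bounded,
the sketch below clamps the requested count first by the extents remaining
after the start index and then by what fits in one output payload. All
names and sizes here (sketch_record_count, SKETCH_MAX_PAYLOAD and so on)
are assumptions made for this example and are not taken from the patch;
the 16-byte header and 40-byte record simply mirror the field layout of the
output payload and CXLDCExtentRaw.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes, chosen only for this sketch. */
#define SKETCH_MAX_PAYLOAD 2048u /* assumed mailbox payload limit */
#define SKETCH_HEADER_SIZE 16u   /* count + total + generation + rsvd */
#define SKETCH_RECORD_SIZE 40u   /* start_dpa + len + tag + seq + rsvd */

static uint32_t min_u32(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

/*
 * Number of extent records returned for one request. The caller is
 * expected to have rejected start > total before calling this.
 */
static uint32_t sketch_record_count(uint32_t requested, uint32_t start,
                                    uint32_t total)
{
    /* How many whole records fit after the fixed header. */
    uint32_t fit = (SKETCH_MAX_PAYLOAD - SKETCH_HEADER_SIZE) /
                   SKETCH_RECORD_SIZE;
    uint32_t n;

    assert(start <= total);
    /* Never return more than what is left after the start index. */
    n = min_u32(requested, total - start);
    return min_u32(n, fit);
}
```

With these assumed sizes at most 50 records fit in one response, so a host
wanting a longer list would issue further requests with a larger start
index.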

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  | 73 ++++++++++++++++++++++++++++++++++++-
 hw/mem/cxl_type3.c          |  1 +
 include/hw/cxl/cxl_device.h | 22 +++++++++++
 3 files changed, 95 insertions(+), 1 deletion(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 831cef0567..1915959015 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -84,6 +84,7 @@ enum {
         #define CLEAR_POISON           0x2
     DCD_CONFIG  = 0x48,
         #define GET_DC_CONFIG          0x0
+        #define GET_DYN_CAP_EXT_LIST   0x1
     PHYSICAL_SWITCH = 0x51,
         #define IDENTIFY_SWITCH_DEVICE      0x0
         #define GET_PHYSICAL_PORT_STATE     0x1
@@ -1322,7 +1323,8 @@ static CXLRetCode cmd_dcd_get_dyn_cap_config(const struct cxl_cmd *cmd,
      * to use.
      */
     stl_le_p(&extra_out->num_extents_supported, CXL_NUM_EXTENTS_SUPPORTED);
-    stl_le_p(&extra_out->num_extents_available, CXL_NUM_EXTENTS_SUPPORTED);
+    stl_le_p(&extra_out->num_extents_available, CXL_NUM_EXTENTS_SUPPORTED -
+             ct3d->dc.total_extent_count);
     stl_le_p(&extra_out->num_tags_supported, CXL_NUM_TAGS_SUPPORTED);
     stl_le_p(&extra_out->num_tags_available, CXL_NUM_TAGS_SUPPORTED);
 
@@ -1330,6 +1332,72 @@ static CXLRetCode cmd_dcd_get_dyn_cap_config(const struct cxl_cmd *cmd,
     return CXL_MBOX_SUCCESS;
 }
 
+/*
+ * CXL r3.1 section 8.2.9.9.9.2:
+ * Get Dynamic Capacity Extent List (Opcode 4801h)
+ */
+static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
+                                               uint8_t *payload_in,
+                                               size_t len_in,
+                                               uint8_t *payload_out,
+                                               size_t *len_out,
+                                               CXLCCI *cci)
+{
+    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
+    struct {
+        uint32_t extent_cnt;
+        uint32_t start_extent_id;
+    } QEMU_PACKED *in = (void *)payload_in;
+    struct {
+        uint32_t count;
+        uint32_t total_extents;
+        uint32_t generation_num;
+        uint8_t rsvd[4];
+        CXLDCExtentRaw records[];
+    } QEMU_PACKED *out = (void *)payload_out;
+    uint32_t start_extent_id = in->start_extent_id;
+    CXLDCExtentList *extent_list = &ct3d->dc.extents;
+    uint16_t record_count = 0, i = 0, record_done = 0;
+    uint16_t out_pl_len, size;
+    CXLDCExtent *ent;
+
+    if (start_extent_id > ct3d->dc.total_extent_count) {
+        return CXL_MBOX_INVALID_INPUT;
+    }
+
+    record_count = MIN(in->extent_cnt,
+                       ct3d->dc.total_extent_count - start_extent_id);
+    size = CXL_MAILBOX_MAX_PAYLOAD_SIZE - sizeof(*out);
+    record_count = MIN(record_count, size / sizeof(out->records[0]));
+    out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);
+
+    stl_le_p(&out->count, record_count);
+    stl_le_p(&out->total_extents, ct3d->dc.total_extent_count);
+    stl_le_p(&out->generation_num, ct3d->dc.ext_list_gen_seq);
+
+    if (record_count > 0) {
+        CXLDCExtentRaw *out_rec = &out->records[record_done];
+
+        QTAILQ_FOREACH(ent, extent_list, node) {
+            if (i++ < start_extent_id) {
+                continue;
+            }
+            stq_le_p(&out_rec->start_dpa, ent->start_dpa);
+            stq_le_p(&out_rec->len, ent->len);
+            memcpy(&out_rec->tag, ent->tag, 0x10);
+            stw_le_p(&out_rec->shared_seq, ent->shared_seq);
+            out_rec++;
+            record_done++;
+            if (record_done == record_count) {
+                break;
+            }
+        }
+    }
+
+    *len_out = out_pl_len;
+    return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_DATA_CHANGE (1 << 2)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -1377,6 +1445,9 @@ static const struct cxl_cmd cxl_cmd_set[256][256] = {
 static const struct cxl_cmd cxl_cmd_set_dcd[256][256] = {
     [DCD_CONFIG][GET_DC_CONFIG] = { "DCD_GET_DC_CONFIG",
         cmd_dcd_get_dyn_cap_config, 2, 0 },
+    [DCD_CONFIG][GET_DYN_CAP_EXT_LIST] = {
+        "DCD_GET_DYNAMIC_CAPACITY_EXTENT_LIST", cmd_dcd_get_dyn_cap_ext_list,
+        8, 0 },
 };
 
 static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index ac87398089..9fffeae613 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -666,6 +666,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
         };
         ct3d->dc.total_capacity += region->len;
     }
+    QTAILQ_INIT(&ct3d->dc.extents);
 
     return true;
 }
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index c2c3df0d2a..6aec6ac983 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -424,6 +424,25 @@ typedef QLIST_HEAD(, CXLPoison) CXLPoisonList;
 
 #define DCD_MAX_NUM_REGION 8
 
+typedef struct CXLDCExtentRaw {
+    uint64_t start_dpa;
+    uint64_t len;
+    uint8_t tag[0x10];
+    uint16_t shared_seq;
+    uint8_t rsvd[0x6];
+} QEMU_PACKED CXLDCExtentRaw;
+
+typedef struct CXLDCExtent {
+    uint64_t start_dpa;
+    uint64_t len;
+    uint8_t tag[0x10];
+    uint16_t shared_seq;
+    uint8_t rsvd[0x6];
+
+    QTAILQ_ENTRY(CXLDCExtent) node;
+} CXLDCExtent;
+typedef QTAILQ_HEAD(, CXLDCExtent) CXLDCExtentList;
+
 typedef struct CXLDCRegion {
     uint64_t base;       /* aligned to 256*MiB */
     uint64_t decode_len; /* aligned to 256*MiB */
@@ -474,6 +493,9 @@ struct CXLType3Dev {
          * memory region size.
          */
         uint64_t total_capacity; /* 256M aligned */
+        CXLDCExtentList extents;
+        uint32_t total_extent_count;
+        uint32_t ext_list_gen_seq;
 
         uint8_t num_regions; /* 0-8 regions */
         CXLDCRegion regions[DCD_MAX_NUM_REGION];
-- 
2.43.0



* [PATCH v7 08/12] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response
  2024-04-18 23:10 [PATCH v7 00/12] Enabling DCD emulation support in Qemu nifan.cxl
                   ` (6 preceding siblings ...)
  2024-04-18 23:10 ` [PATCH v7 07/12] hw/mem/cxl_type3: Add DC extent list representative and get DC extent list mailbox support nifan.cxl
@ 2024-04-18 23:10 ` nifan.cxl
  2024-04-19 18:12   ` Gregory Price
  2024-04-18 23:11 ` [PATCH v7 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents nifan.cxl
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 65+ messages in thread
From: nifan.cxl @ 2024-04-18 23:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni

From: Fan Ni <fan.ni@samsung.com>

Per CXL r3.1, two mailbox commands are implemented:
Add Dynamic Capacity Response (Opcode 4802h), section 8.2.9.9.9.3, and
Release Dynamic Capacity (Opcode 4803h), section 8.2.9.9.9.4.

Both commands are processed in two passes:
Pass 1: Check whether the input payload is valid; if it is not, skip
        Pass 2 and return a mailbox error.
Pass 2: Do the real work: add or release extents, respectively.
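
The two-pass flow can be sketched as below. This is a simplified,
self-contained illustration and not the code in this patch: it models a
single region with a fixed-size array instead of the QTAILQ-based extent
list, and every identifier (sketch_dev, sketch_validate, sketch_add,
SKETCH_MAX_EXTENTS) is hypothetical. The point it shows is that the
first pass never modifies device state, so a rejected request leaves the
accepted list untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_EXTENTS 8 /* hypothetical tracking limit */

struct sketch_extent {
    uint64_t dpa;
    uint64_t len;
};

struct sketch_dev {
    uint64_t base, size, block; /* one DC region, block-size aligned */
    struct sketch_extent accepted[SKETCH_MAX_EXTENTS];
    size_t count;
};

static bool overlaps(const struct sketch_extent *a,
                     const struct sketch_extent *b)
{
    return a->dpa < b->dpa + b->len && b->dpa < a->dpa + a->len;
}

/* Pass 1: check every entry without touching device state. */
static bool sketch_validate(const struct sketch_dev *d,
                            const struct sketch_extent *in, size_t n)
{
    size_t i, j;

    assert(d->count <= SKETCH_MAX_EXTENTS);
    if (n > SKETCH_MAX_EXTENTS - d->count) {
        return false; /* would exceed the tracking limit */
    }
    for (i = 0; i < n; i++) {
        if (in[i].len == 0 || in[i].dpa % d->block || in[i].len % d->block) {
            return false; /* not aligned to the block size */
        }
        if (in[i].dpa < d->base || in[i].len > d->size ||
            in[i].dpa - d->base > d->size - in[i].len) {
            return false; /* outside the region */
        }
        for (j = 0; j < i; j++) {
            if (overlaps(&in[i], &in[j])) {
                return false; /* overlap within the request */
            }
        }
        for (j = 0; j < d->count; j++) {
            if (overlaps(&in[i], &d->accepted[j])) {
                return false; /* overlap with an accepted extent */
            }
        }
    }
    return true;
}

/* Pass 2 runs only if pass 1 accepted the whole request. */
static bool sketch_add(struct sketch_dev *d,
                       const struct sketch_extent *in, size_t n)
{
    size_t i;

    if (!sketch_validate(d, in, n)) {
        return false;
    }
    for (i = 0; i < n; i++) {
        d->accepted[d->count++] = in[i];
    }
    return true;
}
```

Splitting the work this way trades a second walk over the request for
all-or-nothing behavior: no partial update has to be rolled back when a
later entry turns out to be invalid.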

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  | 394 ++++++++++++++++++++++++++++++++++++
 hw/mem/cxl_type3.c          |  11 +
 include/hw/cxl/cxl_device.h |   4 +
 3 files changed, 409 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 1915959015..9d54e10cd4 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -19,6 +19,7 @@
 #include "qemu/units.h"
 #include "qemu/uuid.h"
 #include "sysemu/hostmem.h"
+#include "qemu/range.h"
 
 #define CXL_CAPACITY_MULTIPLIER   (256 * MiB)
 #define CXL_DC_EVENT_LOG_SIZE 8
@@ -85,6 +86,8 @@ enum {
     DCD_CONFIG  = 0x48,
         #define GET_DC_CONFIG          0x0
         #define GET_DYN_CAP_EXT_LIST   0x1
+        #define ADD_DYN_CAP_RSP        0x2
+        #define RELEASE_DYN_CAP        0x3
     PHYSICAL_SWITCH = 0x51,
         #define IDENTIFY_SWITCH_DEVICE      0x0
         #define GET_PHYSICAL_PORT_STATE     0x1
@@ -1398,6 +1401,391 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
     return CXL_MBOX_SUCCESS;
 }
 
+/*
+ * Check whether any bit between addr[nr, nr+size) is set,
+ * return true if any bit is set, otherwise return false
+ */
+static bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
+                              unsigned long size)
+{
+    unsigned long res = find_next_bit(addr, size + nr, nr);
+
+    return res < nr + size;
+}
+
+CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
+{
+    int i;
+    CXLDCRegion *region = &ct3d->dc.regions[0];
+
+    if (dpa < region->base ||
+        dpa >= region->base + ct3d->dc.total_capacity) {
+        return NULL;
+    }
+
+    /*
+     * CXL r3.1 section 9.13.3: Dynamic Capacity Device (DCD)
+     *
+     * Regions are used in increasing-DPA order, with Region 0 being used for
+     * the lowest DPA of Dynamic Capacity and Region 7 for the highest DPA.
+     * So check from the last region to find where the dpa belongs. Extents that
+     * cross multiple regions are not allowed.
+     */
+    for (i = ct3d->dc.num_regions - 1; i >= 0; i--) {
+        region = &ct3d->dc.regions[i];
+        if (dpa >= region->base) {
+            if (dpa + len > region->base + region->len) {
+                return NULL;
+            }
+            return region;
+        }
+    }
+
+    return NULL;
+}
+
+static void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
+                                             uint64_t dpa,
+                                             uint64_t len,
+                                             uint8_t *tag,
+                                             uint16_t shared_seq)
+{
+    CXLDCExtent *extent;
+
+    extent = g_new0(CXLDCExtent, 1);
+    extent->start_dpa = dpa;
+    extent->len = len;
+    if (tag) {
+        memcpy(extent->tag, tag, 0x10);
+    }
+    extent->shared_seq = shared_seq;
+
+    QTAILQ_INSERT_TAIL(list, extent, node);
+}
+
+void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
+                                        CXLDCExtent *extent)
+{
+    QTAILQ_REMOVE(list, extent, node);
+    g_free(extent);
+}
+
+/*
+ * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload
+ * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload
+ */
+typedef struct CXLUpdateDCExtentListInPl {
+    uint32_t num_entries_updated;
+    uint8_t flags;
+    uint8_t rsvd[3];
+    /* CXL r3.1 Table 8-169: Updated Extent */
+    struct {
+        uint64_t start_dpa;
+        uint64_t len;
+        uint8_t rsvd[8];
+    } QEMU_PACKED updated_entries[];
+} QEMU_PACKED CXLUpdateDCExtentListInPl;
+
+/*
+ * Check that each extent in the list to operate on is valid:
+ * 1. The extent should be in the range of a valid DC region;
+ * 2. The extent should not cross multiple regions;
+ * 3. The start DPA and the length of the extent should align with the block
+ * size of the region;
+ * 4. The address range of multiple extents in the list should not overlap.
+ */
+static CXLRetCode cxl_detect_malformed_extent_list(CXLType3Dev *ct3d,
+        const CXLUpdateDCExtentListInPl *in)
+{
+    uint64_t min_block_size = UINT64_MAX;
+    CXLDCRegion *region;
+    CXLDCRegion *lastregion = &ct3d->dc.regions[ct3d->dc.num_regions - 1];
+    g_autofree unsigned long *blk_bitmap = NULL;
+    uint64_t dpa, len;
+    uint32_t i;
+
+    for (i = 0; i < ct3d->dc.num_regions; i++) {
+        region = &ct3d->dc.regions[i];
+        min_block_size = MIN(min_block_size, region->block_size);
+    }
+
+    blk_bitmap = bitmap_new((lastregion->base + lastregion->len -
+                             ct3d->dc.regions[0].base) / min_block_size);
+
+    for (i = 0; i < in->num_entries_updated; i++) {
+        dpa = in->updated_entries[i].start_dpa;
+        len = in->updated_entries[i].len;
+
+        region = cxl_find_dc_region(ct3d, dpa, len);
+        if (!region) {
+            return CXL_MBOX_INVALID_PA;
+        }
+
+        dpa -= ct3d->dc.regions[0].base;
+        if (dpa % region->block_size || len % region->block_size) {
+            return CXL_MBOX_INVALID_EXTENT_LIST;
+        }
+        /* the dpa range already covered by some other extents in the list */
+        if (test_any_bits_set(blk_bitmap, dpa / min_block_size,
+            len / min_block_size)) {
+            return CXL_MBOX_INVALID_EXTENT_LIST;
+        }
+        bitmap_set(blk_bitmap, dpa / min_block_size, len / min_block_size);
+    }
+
+    return CXL_MBOX_SUCCESS;
+}
+
+static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
+        const CXLUpdateDCExtentListInPl *in)
+{
+    uint32_t i;
+    CXLDCExtent *ent;
+    uint64_t dpa, len;
+    Range range1, range2;
+
+    for (i = 0; i < in->num_entries_updated; i++) {
+        dpa = in->updated_entries[i].start_dpa;
+        len = in->updated_entries[i].len;
+
+        range_init_nofail(&range1, dpa, len);
+
+        /*
+         * TODO: once the pending extent list is added, check against
+         * the list will be added here.
+         */
+
+        /* to-be-added range should not overlap with range already accepted */
+        QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
+            range_init_nofail(&range2, ent->start_dpa, ent->len);
+            if (range_overlaps_range(&range1, &range2)) {
+                return CXL_MBOX_INVALID_PA;
+            }
+        }
+    }
+    return CXL_MBOX_SUCCESS;
+}
+
+/*
+ * CXL r3.1 section 8.2.9.9.9.3: Add Dynamic Capacity Response (Opcode 4802h)
+ * An extent is added to the extent list and becomes usable only after the
+ * response is processed successfully.
+ */
+static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
+                                          uint8_t *payload_in,
+                                          size_t len_in,
+                                          uint8_t *payload_out,
+                                          size_t *len_out,
+                                          CXLCCI *cci)
+{
+    CXLUpdateDCExtentListInPl *in = (void *)payload_in;
+    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
+    CXLDCExtentList *extent_list = &ct3d->dc.extents;
+    uint32_t i;
+    uint64_t dpa, len;
+    CXLRetCode ret;
+
+    if (in->num_entries_updated == 0) {
+        /*
+         * TODO: once the pending list is introduced, extents in the beginning
+         * will get wiped out.
+         */
+        return CXL_MBOX_SUCCESS;
+    }
+
+    /* Adding the extents would exceed the device's extent tracking ability. */
+    if (in->num_entries_updated + ct3d->dc.total_extent_count >
+        CXL_NUM_EXTENTS_SUPPORTED) {
+        return CXL_MBOX_RESOURCES_EXHAUSTED;
+    }
+
+    ret = cxl_detect_malformed_extent_list(ct3d, in);
+    if (ret != CXL_MBOX_SUCCESS) {
+        return ret;
+    }
+
+    ret = cxl_dcd_add_dyn_cap_rsp_dry_run(ct3d, in);
+    if (ret != CXL_MBOX_SUCCESS) {
+        return ret;
+    }
+
+    for (i = 0; i < in->num_entries_updated; i++) {
+        dpa = in->updated_entries[i].start_dpa;
+        len = in->updated_entries[i].len;
+
+        cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
+        ct3d->dc.total_extent_count += 1;
+        /*
+         * TODO: we will add a pending extent list based on event log record
+         * and process the list accordingly here.
+         */
+    }
+
+    return CXL_MBOX_SUCCESS;
+}
+
+/*
+ * Copy extent list from src to dst
+ * Return value: number of extents copied
+ */
+static uint32_t copy_extent_list(CXLDCExtentList *dst,
+                                 const CXLDCExtentList *src)
+{
+    uint32_t cnt = 0;
+    CXLDCExtent *ent;
+
+    if (!dst || !src) {
+        return 0;
+    }
+
+    QTAILQ_FOREACH(ent, src, node) {
+        cxl_insert_extent_to_extent_list(dst, ent->start_dpa, ent->len,
+                                         ent->tag, ent->shared_seq);
+        cnt++;
+    }
+    return cnt;
+}
+
+static CXLRetCode cxl_dc_extent_release_dry_run(CXLType3Dev *ct3d,
+        const CXLUpdateDCExtentListInPl *in, CXLDCExtentList *updated_list,
+        uint32_t *updated_list_size)
+{
+    CXLDCExtent *ent, *ent_next;
+    uint64_t dpa, len;
+    uint32_t i;
+    int cnt_delta = 0;
+    CXLRetCode ret = CXL_MBOX_SUCCESS;
+
+    QTAILQ_INIT(updated_list);
+    copy_extent_list(updated_list, &ct3d->dc.extents);
+
+    for (i = 0; i < in->num_entries_updated; i++) {
+        Range range;
+
+        dpa = in->updated_entries[i].start_dpa;
+        len = in->updated_entries[i].len;
+
+        while (len > 0) {
+            QTAILQ_FOREACH(ent, updated_list, node) {
+                range_init_nofail(&range, ent->start_dpa, ent->len);
+
+                if (range_contains(&range, dpa)) {
+                    uint64_t len1, len2 = 0, len_done = 0;
+                    uint64_t ent_start_dpa = ent->start_dpa;
+                    uint64_t ent_len = ent->len;
+
+                    len1 = dpa - ent->start_dpa;
+                    /* Found the extent or the subset of an existing extent */
+                    if (range_contains(&range, dpa + len - 1)) {
+                        len2 = ent_start_dpa + ent_len - dpa - len;
+                    } else {
+                        /*
+                         * TODO: we reject the attempt to remove an extent
+                         * that overlaps with multiple extents in the device
+                         * for now. We will allow it once superset release
+                         * support is added.
+                         */
+                        ret = CXL_MBOX_INVALID_PA;
+                        goto free_and_exit;
+                    }
+                    len_done = ent_len - len1 - len2;
+
+                    cxl_remove_extent_from_extent_list(updated_list, ent);
+                    cnt_delta--;
+
+                    if (len1) {
+                        cxl_insert_extent_to_extent_list(updated_list,
+                                                         ent_start_dpa,
+                                                         len1, NULL, 0);
+                        cnt_delta++;
+                    }
+                    if (len2) {
+                        cxl_insert_extent_to_extent_list(updated_list,
+                                                         dpa + len,
+                                                         len2, NULL, 0);
+                        cnt_delta++;
+                    }
+
+                    if (cnt_delta + ct3d->dc.total_extent_count >
+                            CXL_NUM_EXTENTS_SUPPORTED) {
+                        ret = CXL_MBOX_RESOURCES_EXHAUSTED;
+                        goto free_and_exit;
+                    }
+
+                    len -= len_done;
+                    /* len == 0 here until superset release is added */
+                    break;
+                }
+            }
+            if (len) {
+                ret = CXL_MBOX_INVALID_PA;
+                goto free_and_exit;
+            }
+        }
+    }
+free_and_exit:
+    if (ret != CXL_MBOX_SUCCESS) {
+        QTAILQ_FOREACH_SAFE(ent, updated_list, node, ent_next) {
+            cxl_remove_extent_from_extent_list(updated_list, ent);
+        }
+        *updated_list_size = 0;
+    } else {
+        *updated_list_size = ct3d->dc.total_extent_count + cnt_delta;
+    }
+
+    return ret;
+}
+
+/*
+ * CXL r3.1 section 8.2.9.9.9.4: Release Dynamic Capacity (Opcode 4803h)
+ */
+static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
+                                          uint8_t *payload_in,
+                                          size_t len_in,
+                                          uint8_t *payload_out,
+                                          size_t *len_out,
+                                          CXLCCI *cci)
+{
+    CXLUpdateDCExtentListInPl *in = (void *)payload_in;
+    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
+    CXLDCExtentList updated_list;
+    CXLDCExtent *ent, *ent_next;
+    uint32_t updated_list_size;
+    CXLRetCode ret;
+
+    if (in->num_entries_updated == 0) {
+        return CXL_MBOX_INVALID_INPUT;
+    }
+
+    ret = cxl_detect_malformed_extent_list(ct3d, in);
+    if (ret != CXL_MBOX_SUCCESS) {
+        return ret;
+    }
+
+    ret = cxl_dc_extent_release_dry_run(ct3d, in, &updated_list,
+                                        &updated_list_size);
+    if (ret != CXL_MBOX_SUCCESS) {
+        return ret;
+    }
+
+    /*
+     * If the dry run release passes, the returned updated_list will
+     * be the updated extent list and we just need to clear the extents
+     * in the accepted list and copy extents in the updated_list to accepted
+     * list and update the extent count;
+     */
+    QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
+        cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
+    }
+    copy_extent_list(&ct3d->dc.extents, &updated_list);
+    QTAILQ_FOREACH_SAFE(ent, &updated_list, node, ent_next) {
+        cxl_remove_extent_from_extent_list(&updated_list, ent);
+    }
+    ct3d->dc.total_extent_count = updated_list_size;
+
+    return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_DATA_CHANGE (1 << 2)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -1448,6 +1836,12 @@ static const struct cxl_cmd cxl_cmd_set_dcd[256][256] = {
     [DCD_CONFIG][GET_DYN_CAP_EXT_LIST] = {
         "DCD_GET_DYNAMIC_CAPACITY_EXTENT_LIST", cmd_dcd_get_dyn_cap_ext_list,
         8, 0 },
+    [DCD_CONFIG][ADD_DYN_CAP_RSP] = {
+        "DCD_ADD_DYNAMIC_CAPACITY_RESPONSE", cmd_dcd_add_dyn_cap_rsp,
+        ~0, IMMEDIATE_DATA_CHANGE },
+    [DCD_CONFIG][RELEASE_DYN_CAP] = {
+        "DCD_RELEASE_DYNAMIC_CAPACITY", cmd_dcd_release_dyn_cap,
+        ~0, IMMEDIATE_DATA_CHANGE },
 };
 
 static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 9fffeae613..c2cdd6d506 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -671,6 +671,15 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
     return true;
 }
 
+static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
+{
+    CXLDCExtent *ent, *ent_next;
+
+    QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
+        cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
+    }
+}
+
 static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
 {
     DeviceState *ds = DEVICE(ct3d);
@@ -868,6 +877,7 @@ err_free_special_ops:
     g_free(regs->special_ops);
 err_address_space_free:
     if (ct3d->dc.host_dc) {
+        cxl_destroy_dc_regions(ct3d);
         address_space_destroy(&ct3d->dc.host_dc_as);
     }
     if (ct3d->hostpmem) {
@@ -889,6 +899,7 @@ static void ct3_exit(PCIDevice *pci_dev)
     cxl_doe_cdat_release(cxl_cstate);
     g_free(regs->special_ops);
     if (ct3d->dc.host_dc) {
+        cxl_destroy_dc_regions(ct3d);
         address_space_destroy(&ct3d->dc.host_dc_as);
     }
     if (ct3d->hostpmem) {
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 6aec6ac983..df3511e91b 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -551,4 +551,8 @@ void cxl_event_irq_assert(CXLType3Dev *ct3d);
 
 void cxl_set_poison_list_overflowed(CXLType3Dev *ct3d);
 
+CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
+
+void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
+                                        CXLDCExtent *extent);
 #endif
-- 
2.43.0



* [PATCH v7 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2024-04-18 23:10 [PATCH v7 00/12] Enabling DCD emulation support in Qemu nifan.cxl
                   ` (7 preceding siblings ...)
  2024-04-18 23:10 ` [PATCH v7 08/12] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response nifan.cxl
@ 2024-04-18 23:11 ` nifan.cxl
  2024-04-19 18:13   ` Gregory Price
                     ` (3 more replies)
  2024-04-18 23:11 ` [PATCH v7 10/12] hw/mem/cxl_type3: Add DPA range validation for accesses to DC regions nifan.cxl
                   ` (4 subsequent siblings)
  13 siblings, 4 replies; 65+ messages in thread
From: nifan.cxl @ 2024-04-18 23:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni

From: Fan Ni <fan.ni@samsung.com>

To simulate the FM functionality for initiating Dynamic Capacity Add
(Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
r3.1 7.6.7.6.5 and 7.6.7.6.6, this patch implements two QMP interfaces that
issue requests to add/release dynamic capacity extents.

With this change, an extent can be released only when its DPA range
is contained by a single accepted extent in the device. That is to say,
extent superset release is not supported yet.
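
The containment rule above can be illustrated with a minimal standalone
sketch. It is not the QEMU code: struct extent and
contained_by_single_extent() are hypothetical names used only for
illustration, and the accepted list is modelled as a plain array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an accepted extent; not QEMU's CXLDCExtent. */
struct extent {
    uint64_t start_dpa;
    uint64_t len;
};

/*
 * Return true only if [dpa, dpa + len) lies entirely inside ONE extent.
 * A range that spans two adjacent extents is rejected, which mirrors the
 * "no superset release yet" restriction.
 */
static bool contained_by_single_extent(const struct extent *exts, size_t n,
                                       uint64_t dpa, uint64_t len)
{
    if (len == 0) {
        return false;
    }
    for (size_t i = 0; i < n; i++) {
        uint64_t end = exts[i].start_dpa + exts[i].len;

        if (dpa >= exts[i].start_dpa && dpa < end && len <= end - dpa) {
            return true;
        }
    }
    return false;
}
```

With two accepted 128MiB extents at offsets 0 and 128MiB, a release of
[64MiB, 192MiB) fails this check even though every byte of it is backed,
because no single extent contains the whole range.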

1. Add dynamic capacity extents:

For example, the command to add two contiguous extents (each 128MiB long)
to region 0 (starting at DPA offset 0) looks like this:

{ "execute": "qmp_capabilities" }

{ "execute": "cxl-add-dynamic-capacity",
  "arguments": {
      "path": "/machine/peripheral/cxl-dcd0",
      "hid": 0,
      "selection-policy": 2,
      "region-id": 0,
      "tag": "",
      "extents": [
      {
          "offset": 0,
          "len": 134217728
      },
      {
          "offset": 134217728,
          "len": 134217728
      }
      ]
  }
}
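
The "offset" and "len" values in the request are relative to the region
and are checked before an event is generated. A rough sketch of those
checks follows, under stated assumptions: struct region and
extent_to_dpa() are hypothetical names, and the 512MiB region size and
2MiB block size used in the note below are made-up example values, not
taken from this series.

```c
#include <stdbool.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* Hypothetical, trimmed-down region description (not QEMU's CXLDCRegion). */
struct region {
    uint64_t base;        /* DPA where the region starts */
    uint64_t len;         /* region length in bytes */
    uint64_t block_size;  /* extent granularity in bytes */
};

/*
 * Validate one requested extent and compute its DPA.
 * Returns false if the request would be rejected.
 */
static bool extent_to_dpa(const struct region *r, uint64_t offset,
                          uint64_t len, uint64_t *dpa)
{
    if (len == 0) {
        return false;               /* zero-length extents are not allowed */
    }
    if (offset % r->block_size || len % r->block_size) {
        return false;               /* must be block aligned */
    }
    if (offset > r->len || len > r->len - offset) {
        return false;               /* must end inside the region */
    }
    *dpa = r->base + offset;
    return true;
}
```

For a region with base 0, the second extent in the command above
(offset 134217728, i.e. 128 * MiB) maps to DPA 134217728.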

2. Release dynamic capacity extents:

For example, the command to release an extent of size 128MiB from region 0
(DPA offset 128MiB) looks like below:

{ "execute": "cxl-release-dynamic-capacity",
  "arguments": {
      "path": "/machine/peripheral/cxl-dcd0",
      "hid": 0,
      "flags": 1,
      "region-id": 0,
      "tag": "",
      "extents": [
      {
          "offset": 134217728,
          "len": 134217728
      }
      ]
  }
}
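
The "flags" value of 1 in the example selects the prescriptive removal
policy with no other bits set. A small sketch of how the field is
interpreted, based on the bit layout described in qapi/cxl.json in this
patch; release_flags_supported() and SANITIZE_ON_RELEASE_BIT are
illustrative names, not part of the patch:

```c
#include <stdbool.h>
#include <stdint.h>

#define REMOVAL_POLICY_MASK         0x0f        /* bits [3:0] */
#define REMOVAL_POLICY_PRESCRIPTIVE 1
#define FORCED_REMOVAL_BIT          (1u << 4)   /* bit 4 */
#define SANITIZE_ON_RELEASE_BIT     (1u << 5)   /* bit 5, illustrative name */

/*
 * Return true if a release request with these flags would be processed:
 * forced removal is rejected, and only the prescriptive policy is handled.
 * The sanitize bit is not examined in this sketch.
 */
static bool release_flags_supported(uint8_t flags)
{
    if (flags & FORCED_REMOVAL_BIT) {
        return false;
    }
    return (flags & REMOVAL_POLICY_MASK) == REMOVAL_POLICY_PRESCRIPTIVE;
}
```

So flags = 1 is accepted, while flags = 0x11 (prescriptive plus forced
removal) and flags = 0 (a different removal policy) are both refused.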

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  |  62 +++++--
 hw/mem/cxl_type3.c          | 311 +++++++++++++++++++++++++++++++++++-
 hw/mem/cxl_type3_stubs.c    |  20 +++
 include/hw/cxl/cxl_device.h |  22 +++
 include/hw/cxl/cxl_events.h |  18 +++
 qapi/cxl.json               |  69 ++++++++
 6 files changed, 489 insertions(+), 13 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 9d54e10cd4..3569902e9e 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -1405,7 +1405,7 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
  * Check whether any bit between addr[nr, nr+size) is set,
  * return true if any bit is set, otherwise return false
  */
-static bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
+bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
                               unsigned long size)
 {
     unsigned long res = find_next_bit(addr, size + nr, nr);
@@ -1444,7 +1444,7 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
     return NULL;
 }
 
-static void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
+void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
                                              uint64_t dpa,
                                              uint64_t len,
                                              uint8_t *tag,
@@ -1470,6 +1470,44 @@ void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
     g_free(extent);
 }
 
+/*
+ * Add a new extent to the extent "group" if group exists;
+ * otherwise, create a new group
+ * Return value: return the group where the extent is inserted.
+ */
+CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
+                                                    uint64_t dpa,
+                                                    uint64_t len,
+                                                    uint8_t *tag,
+                                                    uint16_t shared_seq)
+{
+    if (!group) {
+        group = g_new0(CXLDCExtentGroup, 1);
+        QTAILQ_INIT(&group->list);
+    }
+    cxl_insert_extent_to_extent_list(&group->list, dpa, len,
+                                     tag, shared_seq);
+    return group;
+}
+
+void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
+                                       CXLDCExtentGroup *group)
+{
+    QTAILQ_INSERT_TAIL(list, group, node);
+}
+
+void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list)
+{
+    CXLDCExtent *ent, *ent_next;
+    CXLDCExtentGroup *group = QTAILQ_FIRST(list);
+
+    QTAILQ_REMOVE(list, group, node);
+    QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
+        cxl_remove_extent_from_extent_list(&group->list, ent);
+    }
+    g_free(group);
+}
+
 /*
  * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload
  * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload
@@ -1541,6 +1579,7 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
 {
     uint32_t i;
     CXLDCExtent *ent;
+    CXLDCExtentGroup *ext_group;
     uint64_t dpa, len;
     Range range1, range2;
 
@@ -1551,9 +1590,13 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
         range_init_nofail(&range1, dpa, len);
 
         /*
-         * TODO: once the pending extent list is added, check against
-         * the list will be added here.
+         * The host-accepted DPA range must be contained by the first extent
+         * group in the pending list
          */
+        ext_group = QTAILQ_FIRST(&ct3d->dc.extents_pending);
+        if (!cxl_extents_contains_dpa_range(&ext_group->list, dpa, len)) {
+            return CXL_MBOX_INVALID_PA;
+        }
 
         /* to-be-added range should not overlap with range already accepted */
         QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
@@ -1586,10 +1629,7 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
     CXLRetCode ret;
 
     if (in->num_entries_updated == 0) {
-        /*
-         * TODO: once the pending list is introduced, extents in the beginning
-         * will get wiped out.
-         */
+        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
         return CXL_MBOX_SUCCESS;
     }
 
@@ -1615,11 +1655,9 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
 
         cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
         ct3d->dc.total_extent_count += 1;
-        /*
-         * TODO: we will add a pending extent list based on event log record
-         * and process the list accordingly here.
-         */
     }
+    /* Remove the first extent group in the pending list*/
+    cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
 
     return CXL_MBOX_SUCCESS;
 }
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index c2cdd6d506..e892b3de7b 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -667,6 +667,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
         ct3d->dc.total_capacity += region->len;
     }
     QTAILQ_INIT(&ct3d->dc.extents);
+    QTAILQ_INIT(&ct3d->dc.extents_pending);
 
     return true;
 }
@@ -674,10 +675,19 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
 static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
 {
     CXLDCExtent *ent, *ent_next;
+    CXLDCExtentGroup *group, *group_next;
 
     QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
         cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
     }
+
+    QTAILQ_FOREACH_SAFE(group, &ct3d->dc.extents_pending, node, group_next) {
+        QTAILQ_REMOVE(&ct3d->dc.extents_pending, group, node);
+        QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
+            cxl_remove_extent_from_extent_list(&group->list, ent);
+        }
+        g_free(group);
+    }
 }
 
 static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
@@ -1443,7 +1453,6 @@ static int ct3d_qmp_cxl_event_log_enc(CxlEventLog log)
         return CXL_EVENT_TYPE_FAIL;
     case CXL_EVENT_LOG_FATAL:
         return CXL_EVENT_TYPE_FATAL;
-/* DCD not yet supported */
     default:
         return -EINVAL;
     }
@@ -1694,6 +1703,306 @@ void qmp_cxl_inject_memory_module_event(const char *path, CxlEventLog log,
     }
 }
 
+/* CXL r3.1 Table 8-50: Dynamic Capacity Event Record */
+static const QemuUUID dynamic_capacity_uuid = {
+    .data = UUID(0xca95afa7, 0xf183, 0x4018, 0x8c, 0x2f,
+                 0x95, 0x26, 0x8e, 0x10, 0x1a, 0x2a),
+};
+
+typedef enum CXLDCEventType {
+    DC_EVENT_ADD_CAPACITY = 0x0,
+    DC_EVENT_RELEASE_CAPACITY = 0x1,
+    DC_EVENT_FORCED_RELEASE_CAPACITY = 0x2,
+    DC_EVENT_REGION_CONFIG_UPDATED = 0x3,
+    DC_EVENT_ADD_CAPACITY_RSP = 0x4,
+    DC_EVENT_CAPACITY_RELEASED = 0x5,
+} CXLDCEventType;
+
+/*
+ * Check whether the range [dpa, dpa + len - 1] overlaps any extent in
+ * the list.
+ * Return value: return true if it overlaps; otherwise, return false
+ */
+static bool cxl_extents_overlaps_dpa_range(CXLDCExtentList *list,
+                                           uint64_t dpa, uint64_t len)
+{
+    CXLDCExtent *ent;
+    Range range1, range2;
+
+    if (!list) {
+        return false;
+    }
+
+    range_init_nofail(&range1, dpa, len);
+    QTAILQ_FOREACH(ent, list, node) {
+        range_init_nofail(&range2, ent->start_dpa, ent->len);
+        if (range_overlaps_range(&range1, &range2)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+/*
+ * Check whether the range [dpa, dpa + len - 1] is contained by extents in
+ * the list.
+ * Will check multiple extents containment once superset release is added.
+ * Return value: return true if range is contained; otherwise, return false
+ */
+bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
+                                    uint64_t dpa, uint64_t len)
+{
+    CXLDCExtent *ent;
+    Range range1, range2;
+
+    if (!list) {
+        return false;
+    }
+
+    range_init_nofail(&range1, dpa, len);
+    QTAILQ_FOREACH(ent, list, node) {
+        range_init_nofail(&range2, ent->start_dpa, ent->len);
+        if (range_contains_range(&range2, &range1)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+static bool cxl_extent_groups_overlaps_dpa_range(CXLDCExtentGroupList *list,
+                                                uint64_t dpa, uint64_t len)
+{
+    CXLDCExtentGroup *group;
+
+    if (!list) {
+        return false;
+    }
+
+    QTAILQ_FOREACH(group, list, node) {
+        if (cxl_extents_overlaps_dpa_range(&group->list, dpa, len)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+/*
+ * The main function to process dynamic capacity event with extent list.
+ * Currently DC extents add/release requests are processed.
+ */
+static void qmp_cxl_process_dynamic_capacity_prescriptive(const char *path,
+        uint16_t hid, CXLDCEventType type, uint8_t rid,
+        CXLDCExtentRecordList *records, Error **errp)
+{
+    Object *obj;
+    CXLEventDynamicCapacity dCap = {};
+    CXLEventRecordHdr *hdr = &dCap.hdr;
+    CXLType3Dev *dcd;
+    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
+    uint32_t num_extents = 0;
+    CXLDCExtentRecordList *list;
+    CXLDCExtentGroup *group = NULL;
+    g_autofree CXLDCExtentRaw *extents = NULL;
+    uint8_t enc_log = CXL_EVENT_TYPE_DYNAMIC_CAP;
+    uint64_t dpa, offset, len, block_size;
+    g_autofree unsigned long *blk_bitmap = NULL;
+    int i;
+
+    obj = object_resolve_path_type(path, TYPE_CXL_TYPE3, NULL);
+    if (!obj) {
+        error_setg(errp, "Unable to resolve CXL type 3 device");
+        return;
+    }
+
+    dcd = CXL_TYPE3(obj);
+    if (!dcd->dc.num_regions) {
+        error_setg(errp, "No dynamic capacity support from the device");
+        return;
+    }
+
+
+    if (rid >= dcd->dc.num_regions) {
+        error_setg(errp, "region id is too large");
+        return;
+    }
+    block_size = dcd->dc.regions[rid].block_size;
+    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
+
+    /* Sanity check and count the extents */
+    list = records;
+    while (list) {
+        offset = list->value->offset;
+        len = list->value->len;
+        dpa = offset + dcd->dc.regions[rid].base;
+
+        if (len == 0) {
+            error_setg(errp, "extent with 0 length is not allowed");
+            return;
+        }
+
+        if (offset % block_size || len % block_size) {
+            error_setg(errp, "dpa or len is not aligned to region block size");
+            return;
+        }
+
+        if (offset + len > dcd->dc.regions[rid].len) {
+            error_setg(errp, "extent range is beyond the region end");
+            return;
+        }
+
+        /* No duplicate or overlapped extents are allowed */
+        if (test_any_bits_set(blk_bitmap, offset / block_size,
+                              len / block_size)) {
+            error_setg(errp, "duplicate or overlapped extents are detected");
+            return;
+        }
+        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
+
+        if (type == DC_EVENT_RELEASE_CAPACITY) {
+            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
+                                                     dpa, len)) {
+                error_setg(errp,
+                           "cannot release extent with pending DPA range");
+                return;
+            }
+            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents, dpa, len)) {
+                error_setg(errp,
+                           "cannot release extent with non-existing DPA range");
+                return;
+            }
+        } else if (type == DC_EVENT_ADD_CAPACITY) {
+            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents, dpa, len)) {
+                error_setg(errp,
+                           "cannot add DPA already accessible to the same LD");
+                return;
+            }
+            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
+                                                     dpa, len)) {
+                error_setg(errp,
+                           "cannot add DPA again while still pending");
+                return;
+            }
+        }
+        list = list->next;
+        num_extents++;
+    }
+
+    /* Create extent list for event being passed to host */
+    i = 0;
+    list = records;
+    extents = g_new0(CXLDCExtentRaw, num_extents);
+    while (list) {
+        offset = list->value->offset;
+        len = list->value->len;
+        dpa = dcd->dc.regions[rid].base + offset;
+
+        extents[i].start_dpa = dpa;
+        extents[i].len = len;
+        memset(extents[i].tag, 0, 0x10);
+        extents[i].shared_seq = 0;
+        if (type == DC_EVENT_ADD_CAPACITY) {
+            group = cxl_insert_extent_to_extent_group(group,
+                                                      extents[i].start_dpa,
+                                                      extents[i].len,
+                                                      extents[i].tag,
+                                                      extents[i].shared_seq);
+        }
+
+        list = list->next;
+        i++;
+    }
+    if (group) {
+        cxl_extent_group_list_insert_tail(&dcd->dc.extents_pending, group);
+    }
+
+    /*
+     * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
+     *
+     * All Dynamic Capacity event records shall set the Event Record Severity
+     * field in the Common Event Record Format to Informational Event. All
+     * Dynamic Capacity related events shall be logged in the Dynamic Capacity
+     * Event Log.
+     */
+    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
+                            cxl_device_get_timestamp(&dcd->cxl_dstate));
+
+    dCap.type = type;
+    /* FIXME: for now, validity flag is cleared */
+    dCap.validity_flags = 0;
+    stw_le_p(&dCap.host_id, hid);
+    /* only valid for DC_REGION_CONFIG_UPDATED event */
+    dCap.updated_region_id = 0;
+    dCap.flags = 0;
+    for (i = 0; i < num_extents; i++) {
+        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
+               sizeof(CXLDCExtentRaw));
+
+        if (i < num_extents - 1) {
+            /* Set "More" flag */
+            dCap.flags |= BIT(0);
+        }
+
+        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
+                             (CXLEventRecordRaw *)&dCap)) {
+            cxl_event_irq_assert(dcd);
+        }
+    }
+}
+
+void qmp_cxl_add_dynamic_capacity(const char *path, uint16_t hid,
+                                  uint8_t sel_policy, uint8_t region_id,
+                                  const char *tag,
+                                  CXLDCExtentRecordList  *records,
+                                  Error **errp)
+{
+    enum {
+        CXL_SEL_POLICY_FREE,
+        CXL_SEL_POLICY_CONTIGUOUS,
+        CXL_SEL_POLICY_PRESCRIPTIVE,
+        CXL_SEL_POLICY_ENABLESHAREDACCESS,
+    };
+    switch (sel_policy) {
+    case CXL_SEL_POLICY_PRESCRIPTIVE:
+        qmp_cxl_process_dynamic_capacity_prescriptive(path, hid,
+                                                      DC_EVENT_ADD_CAPACITY,
+                                                      region_id, records, errp);
+        return;
+    default:
+        error_setg(errp, "Selection policy not supported");
+        return;
+    }
+}
+
+#define REMOVAL_POLICY_MASK 0xf
+#define REMOVAL_POLICY_PRESCRIPTIVE 1
+#define FORCED_REMOVAL_BIT BIT(4)
+
+void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t hid,
+                                      uint8_t flags, uint8_t region_id,
+                                      const char *tag,
+                                      CXLDCExtentRecordList  *records,
+                                      Error **errp)
+{
+    CXLDCEventType type = DC_EVENT_RELEASE_CAPACITY;
+
+    if (flags & FORCED_REMOVAL_BIT) {
+        /* TODO: enable forced removal in the future */
+        type = DC_EVENT_FORCED_RELEASE_CAPACITY;
+        error_setg(errp, "Forced removal not supported yet");
+        return;
+    }
+
+    switch (flags & REMOVAL_POLICY_MASK) {
+    case REMOVAL_POLICY_PRESCRIPTIVE:
+        qmp_cxl_process_dynamic_capacity_prescriptive(path, hid, type,
+                                                      region_id, records, errp);
+        return;
+    default:
+        error_setg(errp, "Removal policy not supported");
+        return;
+    }
+}
+
 static void ct3_class_init(ObjectClass *oc, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(oc);
diff --git a/hw/mem/cxl_type3_stubs.c b/hw/mem/cxl_type3_stubs.c
index 3e1851e32b..810685e0d5 100644
--- a/hw/mem/cxl_type3_stubs.c
+++ b/hw/mem/cxl_type3_stubs.c
@@ -67,3 +67,23 @@ void qmp_cxl_inject_correctable_error(const char *path, CxlCorErrorType type,
 {
     error_setg(errp, "CXL Type 3 support is not compiled in");
 }
+
+void qmp_cxl_add_dynamic_capacity(const char *path,
+                                  uint16_t hid,
+                                  uint8_t sel_policy,
+                                  uint8_t region_id,
+                                  const char *tag,
+                                  CXLDCExtentRecordList  *records,
+                                  Error **errp)
+{
+    error_setg(errp, "CXL Type 3 support is not compiled in");
+}
+
+void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t hid,
+                                      uint8_t flags, uint8_t region_id,
+                                      const char *tag,
+                                      CXLDCExtentRecordList  *records,
+                                      Error **errp)
+{
+    error_setg(errp, "CXL Type 3 support is not compiled in");
+}
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index df3511e91b..c69ff6b5de 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -443,6 +443,12 @@ typedef struct CXLDCExtent {
 } CXLDCExtent;
 typedef QTAILQ_HEAD(, CXLDCExtent) CXLDCExtentList;
 
+typedef struct CXLDCExtentGroup {
+    CXLDCExtentList list;
+    QTAILQ_ENTRY(CXLDCExtentGroup) node;
+} CXLDCExtentGroup;
+typedef QTAILQ_HEAD(, CXLDCExtentGroup) CXLDCExtentGroupList;
+
 typedef struct CXLDCRegion {
     uint64_t base;       /* aligned to 256*MiB */
     uint64_t decode_len; /* aligned to 256*MiB */
@@ -494,6 +500,7 @@ struct CXLType3Dev {
          */
         uint64_t total_capacity; /* 256M aligned */
         CXLDCExtentList extents;
+        CXLDCExtentGroupList extents_pending;
         uint32_t total_extent_count;
         uint32_t ext_list_gen_seq;
 
@@ -555,4 +562,19 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
 
 void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
                                         CXLDCExtent *extent);
+void cxl_insert_extent_to_extent_list(CXLDCExtentList *list, uint64_t dpa,
+                                      uint64_t len, uint8_t *tag,
+                                      uint16_t shared_seq);
+bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
+                       unsigned long size);
+bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
+                                    uint64_t dpa, uint64_t len);
+CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
+                                                    uint64_t dpa,
+                                                    uint64_t len,
+                                                    uint8_t *tag,
+                                                    uint16_t shared_seq);
+void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
+                                       CXLDCExtentGroup *group);
+void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list);
 #endif
diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
index 5170b8dbf8..38cadaa0f3 100644
--- a/include/hw/cxl/cxl_events.h
+++ b/include/hw/cxl/cxl_events.h
@@ -166,4 +166,22 @@ typedef struct CXLEventMemoryModule {
     uint8_t reserved[0x3d];
 } QEMU_PACKED CXLEventMemoryModule;
 
+/*
+ * CXL r3.1 section Table 8-50: Dynamic Capacity Event Record
+ * All fields little endian.
+ */
+typedef struct CXLEventDynamicCapacity {
+    CXLEventRecordHdr hdr;
+    uint8_t type;
+    uint8_t validity_flags;
+    uint16_t host_id;
+    uint8_t updated_region_id;
+    uint8_t flags;
+    uint8_t reserved2[2];
+    uint8_t dynamic_capacity_extent[0x28]; /* defined in cxl_device.h */
+    uint8_t reserved[0x18];
+    uint32_t extents_avail;
+    uint32_t tags_avail;
+} QEMU_PACKED CXLEventDynamicCapacity;
+
 #endif /* CXL_EVENTS_H */
diff --git a/qapi/cxl.json b/qapi/cxl.json
index 4281726dec..2dcf03d973 100644
--- a/qapi/cxl.json
+++ b/qapi/cxl.json
@@ -361,3 +361,72 @@
 ##
 {'command': 'cxl-inject-correctable-error',
  'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
+
+##
+# @CXLDCExtentRecord:
+#
+# Record of a single extent to add/release
+#
+# @offset: offset from the start of the region to the start of the extent
+# @len: length of the extent
+#
+# Since: 9.1
+##
+{ 'struct': 'CXLDCExtentRecord',
+  'data': {
+      'offset':'uint64',
+      'len': 'uint64'
+  }
+}
+
+##
+# @cxl-add-dynamic-capacity:
+#
+# Command to start the add dynamic capacity extents flow. The host will
+# have to acknowledge acceptance of the extents before they are usable.
+#
+# @path: CXL DCD canonical QOM path
+# @hid: host id
+# @selection-policy: policy to use for selecting extents for adding capacity
+# @region-id: id of the region to add the extents to
+# @tag: Context field
+# @extents: Extents to add
+#
+# Since: 9.1
+##
+{ 'command': 'cxl-add-dynamic-capacity',
+  'data': { 'path': 'str',
+            'hid': 'uint16',
+            'selection-policy': 'uint8',
+            'region-id': 'uint8',
+            'tag': 'str',
+            'extents': [ 'CXLDCExtentRecord' ]
+           }
+}
+
+##
+# @cxl-release-dynamic-capacity:
+#
+# Command to start the release dynamic capacity extents flow. The host will
+# need to respond to indicate that it has released the capacity before it
+# is made unavailable for read and write and can be re-added.
+#
+# @path: CXL DCD canonical QOM path
+# @hid: host id
+# @flags: bit[3:0] for removal policy, bit[4] for forced removal, bit[5] for
+#     sanitize on release, bit[7:6] reserved
+# @region-id: id of the region to release the extents from
+# @tag: Context field
+# @extents: Extents to release
+#
+# Since: 9.1
+##
+{ 'command': 'cxl-release-dynamic-capacity',
+  'data': { 'path': 'str',
+            'hid': 'uint16',
+            'flags': 'uint8',
+            'region-id': 'uint8',
+            'tag': 'str',
+            'extents': [ 'CXLDCExtentRecord' ]
+           }
+}
-- 
2.43.0



* [PATCH v7 10/12] hw/mem/cxl_type3: Add DPA range validation for accesses to DC regions
  2024-04-18 23:10 [PATCH v7 00/12] Enabling DCD emulation support in Qemu nifan.cxl
                   ` (8 preceding siblings ...)
  2024-04-18 23:11 ` [PATCH v7 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents nifan.cxl
@ 2024-04-18 23:11 ` nifan.cxl
  2024-04-19 16:57   ` Gregory Price
  2024-04-18 23:11 ` [PATCH v7 11/12] hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support nifan.cxl
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 65+ messages in thread
From: nifan.cxl @ 2024-04-18 23:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni, Jonathan Cameron

From: Fan Ni <fan.ni@samsung.com>

All DPA ranges in the DC regions are invalid to access until an extent
covering the range has been successfully accepted by the host. A bitmap
is added to each region to record whether a DC block in the region has
been backed by a DC extent. Each bit in the bitmap represents a DC block.
When a DC extent is accepted, all the bits representing the blocks in the
extent are set, which will be cleared when the extent is released.
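
For readers who want to try the bookkeeping outside of QEMU, here is a
minimal standalone sketch of the per-region block bitmap described above.
It is not the code in this patch: the DemoRegion type and the demo_*
helpers are hypothetical names, and a plain byte array stands in for the
QEMU bitmap helpers. One bit is kept per DC block; accepting an extent
sets its bits, releasing clears them, and an access is valid only if
every block it touches is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a DC region; not the QEMU CXLDCRegion type. */
typedef struct DemoRegion {
    uint64_t base;        /* first DPA of the region */
    uint64_t len;         /* region length in bytes */
    uint64_t block_size;  /* DC block size in bytes */
    uint8_t *backed;      /* one bit per block, 1 = backed by an extent */
} DemoRegion;

/* Allocate an all-zero bitmap: no block is backed initially. */
static bool demo_region_init(DemoRegion *r, uint64_t base, uint64_t len,
                             uint64_t block_size)
{
    assert(block_size != 0 && len % block_size == 0);
    r->base = base;
    r->len = len;
    r->block_size = block_size;
    r->backed = calloc((size_t)((len / block_size + 7) / 8), 1);
    return r->backed != NULL;
}

/* Set (accept) or clear (release) the bits of a block-aligned extent. */
static void demo_mark(DemoRegion *r, uint64_t dpa, uint64_t len, bool set)
{
    uint64_t first, n, i;

    assert(dpa >= r->base && len <= r->len - (dpa - r->base));
    assert((dpa - r->base) % r->block_size == 0 && len % r->block_size == 0);
    first = (dpa - r->base) / r->block_size;
    n = len / r->block_size;
    for (i = first; i < first + n; i++) {
        if (set) {
            r->backed[i / 8] |= (uint8_t)(1u << (i % 8));
        } else {
            r->backed[i / 8] &= (uint8_t)~(1u << (i % 8));
        }
    }
}

/* True only if every block touched by [dpa, dpa + len) is backed. */
static bool demo_is_backed(const DemoRegion *r, uint64_t dpa, uint64_t len)
{
    uint64_t off, first, last, i;

    if (len == 0 || dpa < r->base) {
        return false;
    }
    off = dpa - r->base;
    if (off >= r->len || len > r->len - off) {
        return false;
    }
    first = off / r->block_size;
    last = (off + len - 1) / r->block_size;
    for (i = first; i <= last; i++) {
        if (!(r->backed[i / 8] & (1u << (i % 8)))) {
            return false;
        }
    }
    return true;
}
```

In this sketch an access that is smaller than a block, or that straddles
two blocks, is checked against every block it touches, so a partially
backed access is rejected.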

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  |  3 ++
 hw/mem/cxl_type3.c          | 76 +++++++++++++++++++++++++++++++++++++
 include/hw/cxl/cxl_device.h |  7 ++++
 3 files changed, 86 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 3569902e9e..57f1ce9cce 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -1655,6 +1655,7 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
 
         cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
         ct3d->dc.total_extent_count += 1;
+        ct3_set_region_block_backed(ct3d, dpa, len);
     }
     /* Remove the first extent group in the pending list*/
     cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
@@ -1813,10 +1814,12 @@ static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
      * list and update the extent count;
      */
     QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
+        ct3_clear_region_block_backed(ct3d, ent->start_dpa, ent->len);
         cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
     }
     copy_extent_list(&ct3d->dc.extents, &updated_list);
     QTAILQ_FOREACH_SAFE(ent, &updated_list, node, ent_next) {
+        ct3_set_region_block_backed(ct3d, ent->start_dpa, ent->len);
         cxl_remove_extent_from_extent_list(&updated_list, ent);
     }
     ct3d->dc.total_extent_count = updated_list_size;
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index e892b3de7b..a3e1a5de25 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -665,6 +665,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
             .flags = 0,
         };
         ct3d->dc.total_capacity += region->len;
+        region->blk_bitmap = bitmap_new(region->len / region->block_size);
     }
     QTAILQ_INIT(&ct3d->dc.extents);
     QTAILQ_INIT(&ct3d->dc.extents_pending);
@@ -676,6 +677,8 @@ static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
 {
     CXLDCExtent *ent, *ent_next;
     CXLDCExtentGroup *group, *group_next;
+    int i;
+    CXLDCRegion *region;
 
     QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
         cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
@@ -688,6 +691,11 @@ static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
         }
         g_free(group);
     }
+
+    for (i = 0; i < ct3d->dc.num_regions; i++) {
+        region = &ct3d->dc.regions[i];
+        g_free(region->blk_bitmap);
+    }
 }
 
 static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
@@ -920,6 +928,70 @@ static void ct3_exit(PCIDevice *pci_dev)
     }
 }
 
+/*
+ * Mark the DPA range [dpa, dpa + len - 1] to be backed and accessible. This
+ * happens when a DC extent is added and accepted by the host.
+ */
+void ct3_set_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+                                 uint64_t len)
+{
+    CXLDCRegion *region;
+
+    region = cxl_find_dc_region(ct3d, dpa, len);
+    if (!region) {
+        return;
+    }
+
+    bitmap_set(region->blk_bitmap, (dpa - region->base) / region->block_size,
+               len / region->block_size);
+}
+
+/*
+ * Check whether the DPA range [dpa, dpa + len - 1] is backed with DC extents.
+ * Used when validating read/write to dc regions
+ */
+bool ct3_test_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+                                  uint64_t len)
+{
+    CXLDCRegion *region;
+    uint64_t nbits;
+    long nr;
+
+    region = cxl_find_dc_region(ct3d, dpa, len);
+    if (!region) {
+        return false;
+    }
+
+    nr = (dpa - region->base) / region->block_size;
+    nbits = DIV_ROUND_UP(len, region->block_size);
+    /*
+     * if bits between [dpa, dpa + len) are all 1s, meaning the DPA range is
+     * backed with DC extents, return true; else return false.
+     */
+    return find_next_zero_bit(region->blk_bitmap, nr + nbits, nr) == nr + nbits;
+}
+
+/*
+ * Mark the DPA range [dpa, dpa + len - 1] to be unbacked and inaccessible.
+ * This happens when a dc extent is released by the host.
+ */
+void ct3_clear_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+                                   uint64_t len)
+{
+    CXLDCRegion *region;
+    uint64_t nbits;
+    long nr;
+
+    region = cxl_find_dc_region(ct3d, dpa, len);
+    if (!region) {
+        return;
+    }
+
+    nr = (dpa - region->base) / region->block_size;
+    nbits = len / region->block_size;
+    bitmap_clear(region->blk_bitmap, nr, nbits);
+}
+
 static bool cxl_type3_dpa(CXLType3Dev *ct3d, hwaddr host_addr, uint64_t *dpa)
 {
     int hdm_inc = R_CXL_HDM_DECODER1_BASE_LO - R_CXL_HDM_DECODER0_BASE_LO;
@@ -1024,6 +1096,10 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
         *as = &ct3d->hostpmem_as;
         *dpa_offset -= vmr_size;
     } else {
+        if (!ct3_test_region_block_backed(ct3d, *dpa_offset, size)) {
+            return -ENODEV;
+        }
+
         *as = &ct3d->dc.host_dc_as;
         *dpa_offset -= (vmr_size + pmr_size);
     }
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index c69ff6b5de..0a4fcb2800 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -456,6 +456,7 @@ typedef struct CXLDCRegion {
     uint64_t block_size;
     uint32_t dsmadhandle;
     uint8_t flags;
+    unsigned long *blk_bitmap;
 } CXLDCRegion;
 
 struct CXLType3Dev {
@@ -577,4 +578,10 @@ CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
 void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
                                        CXLDCExtentGroup *group);
 void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list);
+void ct3_set_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+                                 uint64_t len);
+void ct3_clear_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+                                   uint64_t len);
+bool ct3_test_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+                                  uint64_t len);
 #endif
-- 
2.43.0



* [PATCH v7 11/12] hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support
  2024-04-18 23:10 [PATCH v7 00/12] Enabling DCD emulation support in Qemu nifan.cxl
                   ` (9 preceding siblings ...)
  2024-04-18 23:11 ` [PATCH v7 10/12] hw/mem/cxl_type3: Add DPA range validation for accesses to DC regions nifan.cxl
@ 2024-04-18 23:11 ` nifan.cxl
  2024-04-19 18:20   ` Gregory Price
  2024-04-18 23:11 ` [PATCH v7 12/12] hw/mem/cxl_type3: Allow to release extent superset in QMP interface nifan.cxl
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 65+ messages in thread
From: nifan.cxl @ 2024-04-18 23:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni

From: Fan Ni <fan.ni@samsung.com>

With the change, we extend the extent release mailbox command processing
to allow more flexible release. As long as the DPA range of the extent to
release is covered by accepted extent(s) in the device, the release can be
performed.
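
As an illustration of what "superset release" means for the extent list,
the following standalone sketch removes one DPA range from a set of
accepted extents, where the range may span several of them. It is a
simplification under stated assumptions, not the list walk used in
cxl_dc_extent_release_dry_run: DemoExtent, demo_emit and demo_release are
hypothetical names, the extents live in plain arrays, and the caller is
assumed to have already verified that the range is fully backed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent record: [start, start + len). */
typedef struct DemoExtent {
    uint64_t start;
    uint64_t len;
} DemoExtent;

/* Append one extent to out[]; fails when the output array is full. */
static int demo_emit(DemoExtent *out, size_t cap, size_t *cnt,
                     uint64_t start, uint64_t len)
{
    if (*cnt >= cap) {
        return -1;
    }
    out[*cnt].start = start;
    out[*cnt].len = len;
    (*cnt)++;
    return 0;
}

/*
 * Remove [dpa, dpa + len) from the n extents in in[] and write the pieces
 * that survive to out[]. Each overlapped extent leaves zero, one or two
 * remainders. Returns the number of extents written, or -1 if out[] is
 * too small (this models the "extent overflow" error).
 */
static int demo_release(const DemoExtent *in, size_t n,
                        uint64_t dpa, uint64_t len,
                        DemoExtent *out, size_t out_cap)
{
    uint64_t rel_end = dpa + len;
    size_t cnt = 0;
    size_t i;

    assert(len > 0);
    for (i = 0; i < n; i++) {
        uint64_t s = in[i].start;
        uint64_t e = s + in[i].len;

        if (e <= dpa || s >= rel_end) {
            /* No overlap: the extent is kept unchanged. */
            if (demo_emit(out, out_cap, &cnt, s, e - s)) {
                return -1;
            }
            continue;
        }
        if (s < dpa && demo_emit(out, out_cap, &cnt, s, dpa - s)) {
            return -1;          /* head remainder did not fit */
        }
        if (e > rel_end && demo_emit(out, out_cap, &cnt, rel_end,
                                     e - rel_end)) {
            return -1;          /* tail remainder did not fit */
        }
    }
    return (int)cnt;
}
```

For example, with accepted extents [0, 128) and [128, 256), releasing
[64, 192) leaves [0, 64) and [192, 256). Releasing from the middle of a
single extent splits it in two, which is where the overflow case can
arise.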

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 57f1ce9cce..89f0ab8116 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -1704,6 +1704,13 @@ static CXLRetCode cxl_dc_extent_release_dry_run(CXLType3Dev *ct3d,
         dpa = in->updated_entries[i].start_dpa;
         len = in->updated_entries[i].len;
 
+        /* Check if the DPA range is not fully backed with valid extents */
+        if (!ct3_test_region_block_backed(ct3d, dpa, len)) {
+            ret = CXL_MBOX_INVALID_PA;
+            goto free_and_exit;
+        }
+
+        /* From here on, extent overflow is the only possible error */
         while (len > 0) {
             QTAILQ_FOREACH(ent, updated_list, node) {
                 range_init_nofail(&range, ent->start_dpa, ent->len);
@@ -1718,14 +1725,7 @@ static CXLRetCode cxl_dc_extent_release_dry_run(CXLType3Dev *ct3d,
                     if (range_contains(&range, dpa + len - 1)) {
                         len2 = ent_start_dpa + ent_len - dpa - len;
                     } else {
-                        /*
-                         * TODO: we reject the attempt to remove an extent
-                         * that overlaps with multiple extents in the device
-                         * for now. We will allow it once superset release
-                         * support is added.
-                         */
-                        ret = CXL_MBOX_INVALID_PA;
-                        goto free_and_exit;
+                        dpa = ent_start_dpa + ent_len;
                     }
                     len_done = ent_len - len1 - len2;
 
@@ -1752,14 +1752,9 @@ static CXLRetCode cxl_dc_extent_release_dry_run(CXLType3Dev *ct3d,
                     }
 
                     len -= len_done;
-                    /* len == 0 here until superset release is added */
                     break;
                 }
             }
-            if (len) {
-                ret = CXL_MBOX_INVALID_PA;
-                goto free_and_exit;
-            }
         }
     }
 free_and_exit:
-- 
2.43.0



* [PATCH v7 12/12] hw/mem/cxl_type3: Allow to release extent superset in QMP interface
  2024-04-18 23:10 [PATCH v7 00/12] Enabling DCD emulation support in Qemu nifan.cxl
                   ` (10 preceding siblings ...)
  2024-04-18 23:11 ` [PATCH v7 11/12] hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support nifan.cxl
@ 2024-04-18 23:11 ` nifan.cxl
  2024-04-19 18:20   ` Gregory Price
  2024-04-19 18:24 ` [PATCH v7 00/12] Enabling DCD emulation support in Qemu Gregory Price
  2024-05-14  2:16   ` Zhijian Li (Fujitsu) via
  13 siblings, 1 reply; 65+ messages in thread
From: nifan.cxl @ 2024-04-18 23:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni, Jonathan Cameron

From: Fan Ni <fan.ni@samsung.com>

Before the change, the QMP interface used for adding/releasing DC extents
only allows releasing an extent whose DPA range is contained by a single
accepted extent in the device.

With the change, we relax the constraints.  As long as the DPA range of
the extent is covered by accepted extents, we allow the release.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/mem/cxl_type3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index a3e1a5de25..9e725647f1 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -1941,7 +1941,7 @@ static void qmp_cxl_process_dynamic_capacity_prescriptive(const char *path,
                            "cannot release extent with pending DPA range");
                 return;
             }
-            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents, dpa, len)) {
+            if (!ct3_test_region_block_backed(dcd, dpa, len)) {
                 error_setg(errp,
                            "cannot release extent with non-existing DPA range");
                 return;
-- 
2.43.0



* Re: [PATCH v7 05/12] hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr size instead of mr as argument
  2024-04-18 23:10 ` [PATCH v7 05/12] hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr size instead of mr as argument nifan.cxl
@ 2024-04-19 16:39   ` Gregory Price
  0 siblings, 0 replies; 65+ messages in thread
From: Gregory Price @ 2024-04-19 16:39 UTC (permalink / raw)
  To: nifan.cxl
  Cc: qemu-devel, jonathan.cameron, linux-cxl, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	Jorgen.Hansen, wj28.lee, Fan Ni

On Thu, Apr 18, 2024 at 04:10:56PM -0700, nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
> 
> The function ct3_build_cdat_entries_for_mr only uses size of the passed
> memory region argument, refactor the function definition to make the passed
> arguments more specific.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>  hw/mem/cxl_type3.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 5ceed0ab4c..a1fe268560 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -44,7 +44,7 @@ enum {
>  };

Reviewed-by: Gregory Price <gregory.price@memverge.com>


* Re: [PATCH v7 01/12] hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output payload of identify memory device command
  2024-04-18 23:10 ` [PATCH v7 01/12] hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output payload of identify memory device command nifan.cxl
@ 2024-04-19 16:40   ` Gregory Price
  0 siblings, 0 replies; 65+ messages in thread
From: Gregory Price @ 2024-04-19 16:40 UTC (permalink / raw)
  To: nifan.cxl
  Cc: qemu-devel, jonathan.cameron, linux-cxl, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	Jorgen.Hansen, wj28.lee, Fan Ni

On Thu, Apr 18, 2024 at 04:10:52PM -0700, nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
> 
> Based on CXL spec r3.1 Table 8-127 (Identify Memory Device Output
> Payload), dynamic capacity event log size should be part of
> output of the Identify command.
> Add dc_event_log_size to the output payload for the host to get the info.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>  hw/cxl/cxl-mailbox-utils.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 

Reviewed-by: Gregory Price <gregory.price@memverge.com>


* Re: [PATCH v7 02/12] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support
  2024-04-18 23:10 ` [PATCH v7 02/12] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support nifan.cxl
@ 2024-04-19 16:44   ` Gregory Price
  0 siblings, 0 replies; 65+ messages in thread
From: Gregory Price @ 2024-04-19 16:44 UTC (permalink / raw)
  To: nifan.cxl
  Cc: qemu-devel, jonathan.cameron, linux-cxl, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	Jorgen.Hansen, wj28.lee, Fan Ni

On Thu, Apr 18, 2024 at 04:10:53PM -0700, nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
> 
> Per cxl spec r3.1, add dynamic capacity region representative based on
> Table 8-165 and extend the cxl type3 device definition to include DC region
> information. Also, based on info in 8.2.9.9.9.1, add 'Get Dynamic Capacity
> Configuration' mailbox support.
> 
> Note: we store region decode length as byte-wise length on the device, which
> should be divided by 256 * MiB before being returned to the host
> for "Get Dynamic Capacity Configuration" mailbox command per
> specification.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>  hw/cxl/cxl-mailbox-utils.c  | 96 +++++++++++++++++++++++++++++++++++++
>  include/hw/cxl/cxl_device.h | 16 +++++++
>  2 files changed, 112 insertions(+)
> 

Reviewed-by: Gregory Price <gregory.price@memverge.com>


* Re: [PATCH v7 03/12] include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for type3 memory devices
  2024-04-18 23:10 ` [PATCH v7 03/12] include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for type3 memory devices nifan.cxl
@ 2024-04-19 16:45   ` Gregory Price
  0 siblings, 0 replies; 65+ messages in thread
From: Gregory Price @ 2024-04-19 16:45 UTC (permalink / raw)
  To: nifan.cxl
  Cc: qemu-devel, jonathan.cameron, linux-cxl, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	Jorgen.Hansen, wj28.lee, Fan Ni

On Thu, Apr 18, 2024 at 04:10:54PM -0700, nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
> 
> Rename mem_size as static_mem_size for type3 memdev to cover static RAM and
> pmem capacity, preparing for the introduction of dynamic capacity to support
> dynamic capacity devices.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>  hw/cxl/cxl-mailbox-utils.c  | 4 ++--
>  hw/mem/cxl_type3.c          | 8 ++++----
>  include/hw/cxl/cxl_device.h | 2 +-
>  3 files changed, 7 insertions(+), 7 deletions(-)
> 

Reviewed-by: Gregory Price <gregory.price@memverge.com>



* Re: [PATCH v7 04/12] hw/mem/cxl_type3: Add support to create DC regions to type3 memory devices
  2024-04-18 23:10 ` [PATCH v7 04/12] hw/mem/cxl_type3: Add support to create DC regions to " nifan.cxl
@ 2024-04-19 16:47   ` Gregory Price
  2024-05-14  8:14     ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 65+ messages in thread
From: Gregory Price @ 2024-04-19 16:47 UTC (permalink / raw)
  To: nifan.cxl
  Cc: qemu-devel, jonathan.cameron, linux-cxl, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	Jorgen.Hansen, wj28.lee, Fan Ni

On Thu, Apr 18, 2024 at 04:10:55PM -0700, nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
> 
> With the change, when setting up memory for type3 memory device, we can
> create DC regions.
> A property 'num-dc-regions' is added to ct3_props to allow users to pass the
> number of DC regions to create. To make it easier, other region parameters
> like region base, length, and block size are hard coded. If needed,
> these parameters can be added easily.
> 
> With the change, we can create DC regions with proper kernel side
> support like below:
> 
> region=$(cat /sys/bus/cxl/devices/decoder0.0/create_dc_region)
> echo $region > /sys/bus/cxl/devices/decoder0.0/create_dc_region
> echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity
> echo 1 > /sys/bus/cxl/devices/$region/interleave_ways
> 
> echo "dc0" >/sys/bus/cxl/devices/decoder2.0/mode
> echo 0x40000000 >/sys/bus/cxl/devices/decoder2.0/dpa_size
> 
> echo 0x40000000 > /sys/bus/cxl/devices/$region/size
> echo  "decoder2.0" > /sys/bus/cxl/devices/$region/target0
> echo 1 > /sys/bus/cxl/devices/$region/commit
> echo $region > /sys/bus/cxl/drivers/cxl_region/bind
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>  hw/mem/cxl_type3.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 49 insertions(+)
> 

Reviewed-by: Gregory Price <gregory.price@memverge.com>


* Re: [PATCH v7 07/12] hw/mem/cxl_type3: Add DC extent list representative and get DC extent list mailbox support
  2024-04-18 23:10 ` [PATCH v7 07/12] hw/mem/cxl_type3: Add DC extent list representative and get DC extent list mailbox support nifan.cxl
@ 2024-04-19 16:52   ` Gregory Price
  0 siblings, 0 replies; 65+ messages in thread
From: Gregory Price @ 2024-04-19 16:52 UTC (permalink / raw)
  To: nifan.cxl
  Cc: qemu-devel, jonathan.cameron, linux-cxl, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	Jorgen.Hansen, wj28.lee, Fan Ni

On Thu, Apr 18, 2024 at 04:10:58PM -0700, nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
> 
> Add dynamic capacity extent list representative to the definition of
> CXLType3Dev and implement get DC extent list mailbox command per
> CXL.spec.3.1:.8.2.9.9.9.2.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>  hw/cxl/cxl-mailbox-utils.c  | 73 ++++++++++++++++++++++++++++++++++++-
>  hw/mem/cxl_type3.c          |  1 +
>  include/hw/cxl/cxl_device.h | 22 +++++++++++
>  3 files changed, 95 insertions(+), 1 deletion(-)
> 

Reviewed-by: Gregory Price <gregory.price@memverge.com>


* Re: [PATCH v7 10/12] hw/mem/cxl_type3: Add DPA range validation for accesses to DC regions
  2024-04-18 23:11 ` [PATCH v7 10/12] hw/mem/cxl_type3: Add DPA range validation for accesses to DC regions nifan.cxl
@ 2024-04-19 16:57   ` Gregory Price
  0 siblings, 0 replies; 65+ messages in thread
From: Gregory Price @ 2024-04-19 16:57 UTC (permalink / raw)
  To: nifan.cxl
  Cc: qemu-devel, jonathan.cameron, linux-cxl, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	Jorgen.Hansen, wj28.lee, Fan Ni

On Thu, Apr 18, 2024 at 04:11:01PM -0700, nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
> 
> All DPA ranges in the DC regions are invalid to access until an extent
> covering the range has been successfully accepted by the host. A bitmap
> is added to each region to record whether a DC block in the region has
> been backed by a DC extent. Each bit in the bitmap represents a DC block.
> When a DC extent is accepted, all the bits representing the blocks in the
> extent are set, which will be cleared when the extent is released.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>  hw/cxl/cxl-mailbox-utils.c  |  3 ++
>  hw/mem/cxl_type3.c          | 76 +++++++++++++++++++++++++++++++++++++
>  include/hw/cxl/cxl_device.h |  7 ++++
>  3 files changed, 86 insertions(+)
> 

Reviewed-by: Gregory Price <gregory.price@memverge.com>



* Re: [PATCH v7 06/12] hw/mem/cxl_type3: Add host backend and address space handling for DC regions
  2024-04-18 23:10 ` [PATCH v7 06/12] hw/mem/cxl_type3: Add host backend and address space handling for DC regions nifan.cxl
@ 2024-04-19 17:27   ` Gregory Price
  2024-04-22 11:55       ` Jonathan Cameron via
  2024-04-22 11:52     ` Jonathan Cameron via
  2024-05-14  8:28     ` Zhijian Li (Fujitsu)
  2 siblings, 1 reply; 65+ messages in thread
From: Gregory Price @ 2024-04-19 17:27 UTC (permalink / raw)
  To: nifan.cxl
  Cc: qemu-devel, jonathan.cameron, linux-cxl, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	Jorgen.Hansen, wj28.lee, Fan Ni

On Thu, Apr 18, 2024 at 04:10:57PM -0700, nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
> 
> Add (file/memory backed) host backend for DCD. All the dynamic capacity
> regions will share a single, large enough host backend. Set up address
> space for DC regions to support read/write operations to dynamic capacity
> for DCD.
> 
> With the change, the following support is added:
> 1. Add a new property to type3 device "volatile-dc-memdev" to point to host
>    memory backend for dynamic capacity. Currently, all DC regions share one
>    host backend;
> 2. Add namespace for dynamic capacity for read/write support;
> 3. Create cdat entries for each dynamic capacity region.
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>  hw/cxl/cxl-mailbox-utils.c  |  16 ++--
>  hw/mem/cxl_type3.c          | 172 +++++++++++++++++++++++++++++-------
>  include/hw/cxl/cxl_device.h |   8 ++
>  3 files changed, 160 insertions(+), 36 deletions(-)
> 

A couple of general comments inline for discussion, but patch looks good
otherwise. Notes are mostly on improvements we could make that should
not block this patch.

Reviewed-by: Gregory Price <gregory.price@memverge.com>

>  
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index a1fe268560..ac87398089 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -45,7 +45,8 @@ enum {
>  
>  static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
>                                            int dsmad_handle, uint64_t size,
> -                                          bool is_pmem, uint64_t dpa_base)
> +                                          bool is_pmem, bool is_dynamic,
> +                                          uint64_t dpa_base)

We should probably change the is_* fields into a flags field and do some
error checking on the combination of flags.

>  {
>      CDATDsmas *dsmas;
>      CDATDslbis *dslbis0;
> @@ -61,7 +62,8 @@ static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
>              .length = sizeof(*dsmas),
>          },
>          .DSMADhandle = dsmad_handle,
> -        .flags = is_pmem ? CDAT_DSMAS_FLAG_NV : 0,
> +        .flags = (is_pmem ? CDAT_DSMAS_FLAG_NV : 0) |
> +                 (is_dynamic ? CDAT_DSMAS_FLAG_DYNAMIC_CAP : 0),

For example, as noted elsewhere in the code, is_pmem+is_dynamic is not
presently supported, so this shouldn't even be allowed in this function.
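
A rough sketch of the flags idea, to make the suggestion concrete. All
names here are hypothetical (DEMO_MR_FLAG_*, DEMO_DSMAS_FLAG_*,
demo_mr_flags_valid, demo_dsmas_flags), and the DSMAS bit values are
placeholders for illustration rather than the definitions in the tree.
The point is only that a single flags argument lets the function reject
unsupported combinations in one place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical caller-facing flags replacing is_pmem / is_dynamic. */
#define DEMO_MR_FLAG_PMEM     (1u << 0)
#define DEMO_MR_FLAG_DYNAMIC  (1u << 1)
#define DEMO_MR_FLAG_ALL      (DEMO_MR_FLAG_PMEM | DEMO_MR_FLAG_DYNAMIC)

/* Placeholder DSMAS flag values, for illustration only. */
#define DEMO_DSMAS_FLAG_NV           (1u << 2)
#define DEMO_DSMAS_FLAG_DYNAMIC_CAP  (1u << 5)

/* Reject unknown bits and the unsupported pmem + dynamic combination. */
static bool demo_mr_flags_valid(uint32_t flags)
{
    if (flags & ~DEMO_MR_FLAG_ALL) {
        return false;
    }
    if ((flags & DEMO_MR_FLAG_PMEM) && (flags & DEMO_MR_FLAG_DYNAMIC)) {
        return false;
    }
    return true;
}

/* Translate validated caller flags into the DSMAS flags field. */
static uint8_t demo_dsmas_flags(uint32_t flags)
{
    uint8_t out = 0;

    assert(demo_mr_flags_valid(flags));
    if (flags & DEMO_MR_FLAG_PMEM) {
        out |= DEMO_DSMAS_FLAG_NV;
    }
    if (flags & DEMO_MR_FLAG_DYNAMIC) {
        out |= DEMO_DSMAS_FLAG_DYNAMIC_CAP;
    }
    return out;
}
```

If non-volatile dynamic capacity is supported later, only the check in
demo_mr_flags_valid would need to be relaxed.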

> +    if (dc_mr) {
> +        int i;
> +        uint64_t region_base = vmr_size + pmr_size;
> +
> +        /*
> +         * TODO: we assume the dynamic capacity to be volatile for now.
> +         * Non-volatile dynamic capacity will be added if needed in the
> +         * future.
> +         */

Probably don't need to mark this TODO, can just leave it as a note.

Non-volatile dynamic capacity will coincide with shared memory, so it'll
end up handled.  So this isn't really a TODO for this current work, and
should read more like:

"Dynamic Capacity is always volatile, until shared memory is
implemented"

> +    } else if (ct3d->hostpmem) {
>          range1_size_hi = ct3d->hostpmem->size >> 32;
>          range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
>                           (ct3d->hostpmem->size & 0xF0000000);
> +    } else {
> +        /*
> +         * For DCD with no static memory, set memory active, memory class bits.
> +         * No range is set.
> +         */
> +        range1_size_lo = (2 << 5) | (2 << 2) | 0x3;

We should probably add defs for these fields at some point. Can be
tabled for later work though.
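
Something along these lines is what I have in mind, as a sketch only. The
macro names are hypothetical, and the field layout (bit 0 info valid,
bit 1 memory active, bits [4:2] media type, bits [7:5] memory class) is
my reading of the shifts used in the patch and should be checked against
the DVSEC range size low register definition in the spec before use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; layout assumed from the literals in the patch. */
#define DEMO_RANGE_SIZE_LO_INFO_VALID     (1u << 0)
#define DEMO_RANGE_SIZE_LO_ACTIVE         (1u << 1)
#define DEMO_RANGE_SIZE_LO_MEDIA_TYPE(x)  (((uint32_t)(x) & 0x7) << 2)
#define DEMO_RANGE_SIZE_LO_MEM_CLASS(x)   (((uint32_t)(x) & 0x7) << 5)
#define DEMO_RANGE_SIZE_LO_SIZE_MASK      0xF0000000u

/* Encoding 2 is the value the patch writes into both 3-bit fields. */
#define DEMO_RANGE_ENC_2                  2

#define DEMO_RANGE_SIZE_LO_FIXED                      \
    (DEMO_RANGE_SIZE_LO_MEM_CLASS(DEMO_RANGE_ENC_2) |  \
     DEMO_RANGE_SIZE_LO_MEDIA_TYPE(DEMO_RANGE_ENC_2) | \
     DEMO_RANGE_SIZE_LO_ACTIVE |                       \
     DEMO_RANGE_SIZE_LO_INFO_VALID)

/* The named form must produce the same bits as the open-coded literal. */
static_assert(DEMO_RANGE_SIZE_LO_FIXED == ((2 << 5) | (2 << 2) | 0x3),
              "named fields must match the literal used in the patch");

/* Build the low register value; size 0 covers the "no range set" case. */
static uint32_t demo_range1_size_lo(uint64_t size)
{
    return DEMO_RANGE_SIZE_LO_FIXED |
           (uint32_t)(size & DEMO_RANGE_SIZE_LO_SIZE_MASK);
}
```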

> +        /*
> +         * TODO: set dc as volatile for now, non-volatile support can be added
> +         * in the future if needed.
> +         */
> +        memory_region_set_nonvolatile(dc_mr, false);

Again can probably drop the TODO and just leave a statement.

~Gregory


* Re: [PATCH v7 08/12] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response
  2024-04-18 23:10 ` [PATCH v7 08/12] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response nifan.cxl
@ 2024-04-19 18:12   ` Gregory Price
  0 siblings, 0 replies; 65+ messages in thread
From: Gregory Price @ 2024-04-19 18:12 UTC (permalink / raw)
  To: nifan.cxl
  Cc: qemu-devel, jonathan.cameron, linux-cxl, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	Jorgen.Hansen, wj28.lee, Fan Ni

On Thu, Apr 18, 2024 at 04:10:59PM -0700, nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
> 
> Per CXL spec 3.1, two mailbox commands are implemented:
> Add Dynamic Capacity Response (Opcode 4802h) 8.2.9.9.9.3, and
> Release Dynamic Capacity (Opcode 4803h) 8.2.9.9.9.4.
> 
> For the process of the above two commands, we use two-pass approach.
> Pass 1: Check whether the input payload is valid or not; if not, skip
>         Pass 2 and return mailbox process error.
> Pass 2: Do the real work--add or release extents, respectively.
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>  hw/cxl/cxl-mailbox-utils.c  | 394 ++++++++++++++++++++++++++++++++++++
>  hw/mem/cxl_type3.c          |  11 +
>  include/hw/cxl/cxl_device.h |   4 +
>  3 files changed, 409 insertions(+)
> 

Reviewed-by: Gregory Price <gregory.price@memverge.com>


* Re: [PATCH v7 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2024-04-18 23:11 ` [PATCH v7 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents nifan.cxl
@ 2024-04-19 18:13   ` Gregory Price
  2024-04-22 12:01     ` Jonathan Cameron via
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 65+ messages in thread
From: Gregory Price @ 2024-04-19 18:13 UTC (permalink / raw)
  To: nifan.cxl
  Cc: qemu-devel, jonathan.cameron, linux-cxl, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	Jorgen.Hansen, wj28.lee, Fan Ni

On Thu, Apr 18, 2024 at 04:11:00PM -0700, nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
> 
> To simulate FM functionalities for initiating Dynamic Capacity Add
> (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> add/release dynamic capacity extents requests.
> 
> With the change, we allow to release an extent only when its DPA range
> is contained by a single accepted extent in the device. That is to say,
> extent superset release is not supported yet.
> 
...
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>  hw/cxl/cxl-mailbox-utils.c  |  62 +++++--
>  hw/mem/cxl_type3.c          | 311 +++++++++++++++++++++++++++++++++++-
>  hw/mem/cxl_type3_stubs.c    |  20 +++
>  include/hw/cxl/cxl_device.h |  22 +++
>  include/hw/cxl/cxl_events.h |  18 +++
>  qapi/cxl.json               |  69 ++++++++
>  6 files changed, 489 insertions(+), 13 deletions(-)
> 

Reviewed-by: Gregory Price <gregory.price@memverge.com>


* Re: [PATCH v7 11/12] hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support
  2024-04-18 23:11 ` [PATCH v7 11/12] hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support nifan.cxl
@ 2024-04-19 18:20   ` Gregory Price
  0 siblings, 0 replies; 65+ messages in thread
From: Gregory Price @ 2024-04-19 18:20 UTC (permalink / raw)
  To: nifan.cxl
  Cc: qemu-devel, jonathan.cameron, linux-cxl, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	Jorgen.Hansen, wj28.lee, Fan Ni

On Thu, Apr 18, 2024 at 04:11:02PM -0700, nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
> 
> With the change, we extend the extent release mailbox command processing
> to allow more flexible release. As long as the DPA range of the extent to
> release is covered by accepted extent(s) in the device, the release can be
> performed.
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>  hw/cxl/cxl-mailbox-utils.c | 21 ++++++++-------------
>  1 file changed, 8 insertions(+), 13 deletions(-)
>

Hmmm.  This will complicate MHD accounting, but it looks ok to me as-is.

Reviewed-by: Gregory Price <gregory.price@memverge.com>

> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 57f1ce9cce..89f0ab8116 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -1704,6 +1704,13 @@ static CXLRetCode cxl_dc_extent_release_dry_run(CXLType3Dev *ct3d,
>          dpa = in->updated_entries[i].start_dpa;
>          len = in->updated_entries[i].len;
>  
> +        /* Check if the DPA range is not fully backed with valid extents */
> +        if (!ct3_test_region_block_backed(ct3d, dpa, len)) {
> +            ret = CXL_MBOX_INVALID_PA;
> +            goto free_and_exit;
> +        }
> +
> +        /* After this point, extent overflow is the only error can happen */
>          while (len > 0) {
>              QTAILQ_FOREACH(ent, updated_list, node) {
>                  range_init_nofail(&range, ent->start_dpa, ent->len);
> @@ -1718,14 +1725,7 @@ static CXLRetCode cxl_dc_extent_release_dry_run(CXLType3Dev *ct3d,
>                      if (range_contains(&range, dpa + len - 1)) {
>                          len2 = ent_start_dpa + ent_len - dpa - len;
>                      } else {
> -                        /*
> -                         * TODO: we reject the attempt to remove an extent
> -                         * that overlaps with multiple extents in the device
> -                         * for now. We will allow it once superset release
> -                         * support is added.
> -                         */
> -                        ret = CXL_MBOX_INVALID_PA;
> -                        goto free_and_exit;
> +                        dpa = ent_start_dpa + ent_len;
>                      }
>                      len_done = ent_len - len1 - len2;
>  
> @@ -1752,14 +1752,9 @@ static CXLRetCode cxl_dc_extent_release_dry_run(CXLType3Dev *ct3d,
>                      }
>  
>                      len -= len_done;
> -                    /* len == 0 here until superset release is added */
>                      break;
>                  }
>              }
> -            if (len) {
> -                ret = CXL_MBOX_INVALID_PA;
> -                goto free_and_exit;
> -            }
>          }
>      }
>  free_and_exit:
> -- 
> 2.43.0
> 
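
For readers following the thread, the effect of superset release can be sketched outside QEMU. This is only an illustration under stated assumptions: the patch walks a QTAILQ and calls ct3_test_region_block_backed(), while the sketch below uses a flat array that is assumed to be sorted by start DPA and non-overlapping. Extent, range_covered() and release_range() are hypothetical names, not QEMU code. The out_cap check plays the role of the "extent overflow" error mentioned in the patch comment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an accepted extent; not the QEMU type. */
typedef struct {
    uint64_t start_dpa;
    uint64_t len;
} Extent;

/*
 * True when [dpa, dpa + len) is fully covered by the union of the extents.
 * Assumes ext[] is sorted by start_dpa and the extents do not overlap.
 */
static bool range_covered(const Extent *ext, size_t n,
                          uint64_t dpa, uint64_t len)
{
    uint64_t cur = dpa;
    uint64_t end = dpa + len;

    for (size_t i = 0; i < n && cur < end; i++) {
        uint64_t s = ext[i].start_dpa;
        uint64_t e = s + ext[i].len;

        if (e <= cur) {
            continue;           /* extent lies entirely before the cursor */
        }
        if (s > cur) {
            return false;       /* hole between the cursor and this extent */
        }
        cur = e;                /* covered up to the end of this extent */
    }
    return cur >= end;
}

/*
 * Remove [dpa, dpa + len) from ext[0..n) and write the surviving pieces
 * to out[]. Returns the number of extents written, or -1 when the range
 * is not fully covered or out[] is too small.
 */
static int release_range(const Extent *ext, size_t n,
                         uint64_t dpa, uint64_t len,
                         Extent *out, size_t out_cap)
{
    uint64_t end = dpa + len;
    size_t k = 0;

    if (len == 0 || !range_covered(ext, n, dpa, len)) {
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        uint64_t s = ext[i].start_dpa;
        uint64_t e = s + ext[i].len;

        if (e <= dpa || s >= end) {     /* untouched by the release */
            if (k == out_cap) {
                return -1;
            }
            out[k++] = ext[i];
            continue;
        }
        if (s < dpa) {                  /* piece left before the range */
            if (k == out_cap) {
                return -1;
            }
            out[k++] = (Extent){ s, dpa - s };
        }
        if (e > end) {                  /* piece left after the range */
            if (k == out_cap) {
                return -1;
            }
            out[k++] = (Extent){ end, e - end };
        }
    }
    return (int)k;
}
```

With extents {0,128}, {128,128}, {256,128}, releasing DPA 64 with length 128 spans the first two extents and leaves {0,64}, {192,64}, {256,128}. Before this patch, that request would have been rejected with CXL_MBOX_INVALID_PA.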

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v7 12/12] hw/mem/cxl_type3: Allow to release extent superset in QMP interface
  2024-04-18 23:11 ` [PATCH v7 12/12] hw/mem/cxl_type3: Allow to release extent superset in QMP interface nifan.cxl
@ 2024-04-19 18:20   ` Gregory Price
  0 siblings, 0 replies; 65+ messages in thread
From: Gregory Price @ 2024-04-19 18:20 UTC (permalink / raw)
  To: nifan.cxl
  Cc: qemu-devel, jonathan.cameron, linux-cxl, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	Jorgen.Hansen, wj28.lee, Fan Ni

On Thu, Apr 18, 2024 at 04:11:03PM -0700, nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
> 
> Before the change, the QMP interface used for add/release DC extents
> only allows to release an extent whose DPA range is contained by a single
> accepted extent in the device.
> 
> With the change, we relax the constraints.  As long as the DPA range of
> the extent is covered by accepted extents, we allow the release.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>  hw/mem/cxl_type3.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 

Reviewed-by: Gregory Price <gregory.price@memverge.com>

> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index a3e1a5de25..9e725647f1 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -1941,7 +1941,7 @@ static void qmp_cxl_process_dynamic_capacity_prescriptive(const char *path,
>                             "cannot release extent with pending DPA range");
>                  return;
>              }
> -            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents, dpa, len)) {
> +            if (!ct3_test_region_block_backed(dcd, dpa, len)) {
>                  error_setg(errp,
>                             "cannot release extent with non-existing DPA range");
>                  return;
> -- 
> 2.43.0
> 
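
The one-line change swaps a "contained by a single extent" test for a "every block is backed" test. The internals of ct3_test_region_block_backed() are not shown in this message, so the following is only a sketch of the general idea, assuming one bit per fixed-size block. BlockMap, BLOCK_SIZE, NUM_BLOCKS and the blockmap_* helpers are hypothetical names picked for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define BLOCK_SIZE (2ull * 1024 * 1024)   /* hypothetical block size: 2 MiB */
#define NUM_BLOCKS 256ull                 /* hypothetical region size in blocks */

/* One bit per block; a set bit means an accepted extent backs the block. */
typedef struct {
    uint64_t bits[NUM_BLOCKS / 64];
} BlockMap;

/* The range must be non-empty, block aligned and inside the region. */
static bool blockmap_range_ok(uint64_t dpa, uint64_t len)
{
    return len != 0 &&
           dpa % BLOCK_SIZE == 0 && len % BLOCK_SIZE == 0 &&
           dpa / BLOCK_SIZE < NUM_BLOCKS &&
           len / BLOCK_SIZE <= NUM_BLOCKS - dpa / BLOCK_SIZE;
}

/* Mark [dpa, dpa + len) as backed, as accepting an extent would. */
static bool blockmap_set(BlockMap *m, uint64_t dpa, uint64_t len)
{
    if (!blockmap_range_ok(dpa, len)) {
        return false;
    }
    uint64_t first = dpa / BLOCK_SIZE;
    uint64_t count = len / BLOCK_SIZE;

    for (uint64_t b = first; b < first + count; b++) {
        m->bits[b / 64] |= 1ull << (b % 64);
    }
    return true;
}

/* True only when every block in [dpa, dpa + len) is backed. */
static bool blockmap_test_backed(const BlockMap *m, uint64_t dpa, uint64_t len)
{
    if (!blockmap_range_ok(dpa, len)) {
        return false;
    }
    uint64_t first = dpa / BLOCK_SIZE;
    uint64_t count = len / BLOCK_SIZE;

    for (uint64_t b = first; b < first + count; b++) {
        if (!((m->bits[b / 64] >> (b % 64)) & 1)) {
            return false;
        }
    }
    return true;
}
```

Because the test looks at blocks rather than extent boundaries, a release range that straddles two adjacent accepted extents passes, which is the relaxation described in the commit message.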

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v7 00/12] Enabling DCD emulation support in Qemu
  2024-04-18 23:10 [PATCH v7 00/12] Enabling DCD emulation support in Qemu nifan.cxl
                   ` (11 preceding siblings ...)
  2024-04-18 23:11 ` [PATCH v7 12/12] hw/mem/cxl_type3: Allow to release extent superset in QMP interface nifan.cxl
@ 2024-04-19 18:24 ` Gregory Price
  2024-04-19 18:43   ` fan
  2024-05-16 17:05   ` fan
  2024-05-14  2:16   ` Zhijian Li (Fujitsu) via
  13 siblings, 2 replies; 65+ messages in thread
From: Gregory Price @ 2024-04-19 18:24 UTC (permalink / raw)
  To: nifan.cxl
  Cc: qemu-devel, jonathan.cameron, linux-cxl, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	Jorgen.Hansen, wj28.lee, fan.ni

On Thu, Apr 18, 2024 at 04:10:51PM -0700, nifan.cxl@gmail.com wrote:
> A git tree of this series can be found here (with one extra commit on top
> for printing out accepted/pending extent list): 
> https://github.com/moking/qemu/tree/dcd-v7
> 
> v6->v7:
> 
> 1. Fixed the dvsec range register issue mentioned in the cover letter in v6.
>    Only relevant bits are set to mark the device ready (Patch 6). (Jonathan)
> 2. Moved the if statement in cxl_setup_memory from Patch 6 to Patch 4. (Jonathan)
> 3. Used MIN instead of if statement to get record_count in Patch 7. (Jonathan)
> 4. Added "Reviewed-by" tag to Patch 7.
> 5. Modified cxl_dc_extent_release_dry_run so the updated extent list can be
>    reused in cmd_dcd_release_dyn_cap to simplify the process in Patch 8. (Jørgen) 
> 6. Added comments to indicate further "TODO" items in cmd_dcd_add_dyn_cap_rsp.
>     (Jonathan)
> 7. Avoided irrelevant code reformat in Patch 8. (Jonathan)
> 8. Modified QMP interfaces for adding/releasing DC extents to allow passing
>    tags, selection policy, flags in the interface. (Jonathan, Gregory)
> 9. Redesigned the pending list so extents in the same requests are grouped
>     together. A new data structure is introduced to represent "extent group"
>     in pending list.  (Jonathan)
> 10. Added support in QMP interface for "More" flag. 
> 11. Check "Forced removal" flag for release request and not let it pass through.
> 12. Removed the dynamic capacity log type from CxlEventLog definition in cxl.json
>    to avoid the side effect it may introduce to inject error to DC event log.
>    (Jonathan)
> 13. Hard coded the event log type to dynamic capacity event log in QMP
>     interfaces. (Jonathan)
> 14. Adding space in between "-1]". (Jonathan)
> 15. Some minor comment fixes.
> 
> The code is tested with similar setup and has passed similar tests as listed
> in the cover letter of v5[1] and v6[2].
> Also, the code is tested with the latest DCD kernel patchset[3].
> 
> [1] Qemu DCD patchset v5: https://lore.kernel.org/linux-cxl/20240304194331.1586191-1-nifan.cxl@gmail.com/T/#t
> [2] Qemu DCD patchset v6: https://lore.kernel.org/linux-cxl/20240325190339.696686-1-nifan.cxl@gmail.com/T/#t
> [3] DCD kernel patches: https://lore.kernel.org/linux-cxl/20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com/T/#m11c571e21c4fe17c7d04ec5c2c7bc7cbf2cd07e3
>

added review to all patches, will hopefully be able to add a Tested-by
tag early next week, along with a v1 RFC for MHD bit-tracking.

We've been testing v5/v6 for a bit, so I expect as soon as we get the
MHD code ported over to v7 I'll ship a tested-by tag pretty quick.

The super-set release will complicate a few things but this doesn't
look like a blocker on our end, just a change to how we track bits in a
shared bit/bytemap.

> 
> Fan Ni (12):
>   hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output
>     payload of identify memory device command
>   hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative
>     and mailbox command support
>   include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for
>     type3 memory devices
>   hw/mem/cxl_type3: Add support to create DC regions to type3 memory
>     devices
>   hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr
>     size instead of mr as argument
>   hw/mem/cxl_type3: Add host backend and address space handling for DC
>     regions
>   hw/mem/cxl_type3: Add DC extent list representative and get DC extent
>     list mailbox support
>   hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release
>     dynamic capacity response
>   hw/cxl/events: Add qmp interfaces to add/release dynamic capacity
>     extents
>   hw/mem/cxl_type3: Add DPA range validation for accesses to DC regions
>   hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support
>   hw/mem/cxl_type3: Allow to release extent superset in QMP interface
> 
>  hw/cxl/cxl-mailbox-utils.c  | 620 ++++++++++++++++++++++++++++++++++-
>  hw/mem/cxl_type3.c          | 633 +++++++++++++++++++++++++++++++++---
>  hw/mem/cxl_type3_stubs.c    |  20 ++
>  include/hw/cxl/cxl_device.h |  81 ++++-
>  include/hw/cxl/cxl_events.h |  18 +
>  qapi/cxl.json               |  69 ++++
>  6 files changed, 1396 insertions(+), 45 deletions(-)
> 
> -- 
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v7 00/12] Enabling DCD emulation support in Qemu
  2024-04-19 18:24 ` [PATCH v7 00/12] Enabling DCD emulation support in Qemu Gregory Price
@ 2024-04-19 18:43   ` fan
  2024-04-20 20:35     ` Gregory Price
  2024-05-16 17:05   ` fan
  1 sibling, 1 reply; 65+ messages in thread
From: fan @ 2024-04-19 18:43 UTC (permalink / raw)
  To: Gregory Price
  Cc: nifan.cxl, qemu-devel, jonathan.cameron, linux-cxl, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	Jorgen.Hansen, wj28.lee, fan.ni

On Fri, Apr 19, 2024 at 02:24:36PM -0400, Gregory Price wrote:
> On Thu, Apr 18, 2024 at 04:10:51PM -0700, nifan.cxl@gmail.com wrote:
> > A git tree of this series can be found here (with one extra commit on top
> > for printing out accepted/pending extent list): 
> > https://github.com/moking/qemu/tree/dcd-v7
> > 
> > v6->v7:
> > 
> > 1. Fixed the dvsec range register issue mentioned in the cover letter in v6.
> >    Only relevant bits are set to mark the device ready (Patch 6). (Jonathan)
> > 2. Moved the if statement in cxl_setup_memory from Patch 6 to Patch 4. (Jonathan)
> > 3. Used MIN instead of if statement to get record_count in Patch 7. (Jonathan)
> > 4. Added "Reviewed-by" tag to Patch 7.
> > 5. Modified cxl_dc_extent_release_dry_run so the updated extent list can be
> >    reused in cmd_dcd_release_dyn_cap to simplify the process in Patch 8. (Jørgen) 
> > 6. Added comments to indicate further "TODO" items in cmd_dcd_add_dyn_cap_rsp.
> >     (Jonathan)
> > 7. Avoided irrelevant code reformat in Patch 8. (Jonathan)
> > 8. Modified QMP interfaces for adding/releasing DC extents to allow passing
> >    tags, selection policy, flags in the interface. (Jonathan, Gregory)
> > 9. Redesigned the pending list so extents in the same requests are grouped
> >     together. A new data structure is introduced to represent "extent group"
> >     in pending list.  (Jonathan)
> > 10. Added support in QMP interface for "More" flag. 
> > 11. Check "Forced removal" flag for release request and not let it pass through.
> > 12. Removed the dynamic capacity log type from CxlEventLog definition in cxl.json
> >    to avoid the side effect it may introduce to inject error to DC event log.
> >    (Jonathan)
> > 13. Hard coded the event log type to dynamic capacity event log in QMP
> >     interfaces. (Jonathan)
> > 14. Adding space in between "-1]". (Jonathan)
> > 15. Some minor comment fixes.
> > 
> > The code is tested with similar setup and has passed similar tests as listed
> > in the cover letter of v5[1] and v6[2].
> > Also, the code is tested with the latest DCD kernel patchset[3].
> > 
> > [1] Qemu DCD patchset v5: https://lore.kernel.org/linux-cxl/20240304194331.1586191-1-nifan.cxl@gmail.com/T/#t
> > [2] Qemu DCD patchset v6: https://lore.kernel.org/linux-cxl/20240325190339.696686-1-nifan.cxl@gmail.com/T/#t
> > [3] DCD kernel patches: https://lore.kernel.org/linux-cxl/20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com/T/#m11c571e21c4fe17c7d04ec5c2c7bc7cbf2cd07e3
> >
> 
> added review to all patches, will hopefully be able to add a Tested-by
> tag early next week, along with a v1 RFC for MHD bit-tracking.
> 
> We've been testing v5/v6 for a bit, so I expect as soon as we get the
> > MHD code ported over to v7 I'll ship a tested-by tag pretty quick.
> 
> The super-set release will complicate a few things but this doesn't
> look like a blocker on our end, just a change to how we track bits in a
> shared bit/bytemap.
> 

Hi Gregory,
Thanks for reviewing the patches so quickly. 

No pressure, but look forward to your MHD work. :)

Fan

> > 
> > Fan Ni (12):
> >   hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output
> >     payload of identify memory device command
> >   hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative
> >     and mailbox command support
> >   include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for
> >     type3 memory devices
> >   hw/mem/cxl_type3: Add support to create DC regions to type3 memory
> >     devices
> >   hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr
> >     size instead of mr as argument
> >   hw/mem/cxl_type3: Add host backend and address space handling for DC
> >     regions
> >   hw/mem/cxl_type3: Add DC extent list representative and get DC extent
> >     list mailbox support
> >   hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release
> >     dynamic capacity response
> >   hw/cxl/events: Add qmp interfaces to add/release dynamic capacity
> >     extents
> >   hw/mem/cxl_type3: Add DPA range validation for accesses to DC regions
> >   hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support
> >   hw/mem/cxl_type3: Allow to release extent superset in QMP interface
> > 
> >  hw/cxl/cxl-mailbox-utils.c  | 620 ++++++++++++++++++++++++++++++++++-
> >  hw/mem/cxl_type3.c          | 633 +++++++++++++++++++++++++++++++++---
> >  hw/mem/cxl_type3_stubs.c    |  20 ++
> >  include/hw/cxl/cxl_device.h |  81 ++++-
> >  include/hw/cxl/cxl_events.h |  18 +
> >  qapi/cxl.json               |  69 ++++
> >  6 files changed, 1396 insertions(+), 45 deletions(-)
> > 
> > -- 
> > 2.43.0
> > 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v7 00/12] Enabling DCD emulation support in Qemu
  2024-04-19 18:43   ` fan
@ 2024-04-20 20:35     ` Gregory Price
  2024-04-22 12:04         ` Jonathan Cameron via
  0 siblings, 1 reply; 65+ messages in thread
From: Gregory Price @ 2024-04-20 20:35 UTC (permalink / raw)
  To: fan
  Cc: qemu-devel, jonathan.cameron, linux-cxl, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	Jorgen.Hansen, wj28.lee, fan.ni

On Fri, Apr 19, 2024 at 11:43:14AM -0700, fan wrote:
> On Fri, Apr 19, 2024 at 02:24:36PM -0400, Gregory Price wrote:
> > 
> > added review to all patches, will hopefully be able to add a Tested-by
> > tag early next week, along with a v1 RFC for MHD bit-tracking.
> > 
> > We've been testing v5/v6 for a bit, so I expect as soon as we get the
> > > MHD code ported over to v7 I'll ship a tested-by tag pretty quick.
> > 
> > The super-set release will complicate a few things but this doesn't
> > look like a blocker on our end, just a change to how we track bits in a
> > shared bit/bytemap.
> > 
> 
> Hi Gregory,
> Thanks for reviewing the patches so quickly. 
> 
> No pressure, but look forward to your MHD work. :)
> 
> Fan

Starting to get into versioning hell a bit, since the Niagara work was
based off of Jonathan's branch and the mhd-dcd work needs some of the
extensions from that branch - while this branch is based on master.

Probably we'll need to wait for a new cxl dated branch to try and suss
out the pain points before we push an RFC.  I would not want to have
conflicting commits for something like this for example:

https://lore.kernel.org/qemu-devel/20230901012914.226527-2-gregory.price@memverge.com/

We get merge conflicts here because this is behind that patch. So
pushing up an RFC in this state would be mostly useless to everyone.

~Gregory

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v7 06/12] hw/mem/cxl_type3: Add host backend and address space handling for DC regions
  2024-04-18 23:10 ` [PATCH v7 06/12] hw/mem/cxl_type3: Add host backend and address space handling for DC regions nifan.cxl
@ 2024-04-22 11:52     ` Jonathan Cameron via
  2024-04-22 11:52     ` Jonathan Cameron via
  2024-05-14  8:28     ` Zhijian Li (Fujitsu)
  2 siblings, 0 replies; 65+ messages in thread
From: Jonathan Cameron @ 2024-04-22 11:52 UTC (permalink / raw)
  To: nifan.cxl
  Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
	wj28.lee, Fan Ni

On Thu, 18 Apr 2024 16:10:57 -0700
nifan.cxl@gmail.com wrote:

> From: Fan Ni <fan.ni@samsung.com>
> 
> Add (file/memory backed) host backend for DCD. All the dynamic capacity
> regions will share a single, large enough host backend. Set up address
> space for DC regions to support read/write operations to dynamic capacity
> for DCD.
> 
> With the change, the following support is added:
> 1. Add a new property to type3 device "volatile-dc-memdev" to point to host
>    memory backend for dynamic capacity. Currently, all DC regions share one
>    host backend;
> 2. Add namespace for dynamic capacity for read/write support;
> 3. Create cdat entries for each dynamic capacity region.
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
One fixlet needed inline.

I've set range1_size_hi = 0 there for my tree.

> @@ -301,10 +337,16 @@ static void build_dvsecs(CXLType3Dev *ct3d)
>              range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
>                               (ct3d->hostpmem->size & 0xF0000000);
>          }
> -    } else {
> +    } else if (ct3d->hostpmem) {
>          range1_size_hi = ct3d->hostpmem->size >> 32;
>          range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
>                           (ct3d->hostpmem->size & 0xF0000000);
> +    } else {
> +        /*
> +         * For DCD with no static memory, set memory active, memory class bits.
> +         * No range is set.
> +         */

range1_size_hi is not initialized.

> +        range1_size_lo = (2 << 5) | (2 << 2) | 0x3;
>      }
>  
>      dvsec = (uint8_t *)&(CXLDVSECDevice){


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v7 06/12] hw/mem/cxl_type3: Add host backend and address space handling for DC regions
  2024-04-19 17:27   ` Gregory Price
@ 2024-04-22 11:55       ` Jonathan Cameron via
  0 siblings, 0 replies; 65+ messages in thread
From: Jonathan Cameron @ 2024-04-22 11:55 UTC (permalink / raw)
  To: Gregory Price
  Cc: nifan.cxl, qemu-devel, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
	wj28.lee, Fan Ni

On Fri, 19 Apr 2024 13:27:59 -0400
Gregory Price <gregory.price@memverge.com> wrote:

> On Thu, Apr 18, 2024 at 04:10:57PM -0700, nifan.cxl@gmail.com wrote:
> > From: Fan Ni <fan.ni@samsung.com>
> > 
> > Add (file/memory backed) host backend for DCD. All the dynamic capacity
> > regions will share a single, large enough host backend. Set up address
> > space for DC regions to support read/write operations to dynamic capacity
> > for DCD.
> > 
> > With the change, the following support is added:
> > 1. Add a new property to type3 device "volatile-dc-memdev" to point to host
> >    memory backend for dynamic capacity. Currently, all DC regions share one
> >    host backend;
> > 2. Add namespace for dynamic capacity for read/write support;
> > 3. Create cdat entries for each dynamic capacity region.
> > 
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
> > ---
> >  hw/cxl/cxl-mailbox-utils.c  |  16 ++--
> >  hw/mem/cxl_type3.c          | 172 +++++++++++++++++++++++++++++-------
> >  include/hw/cxl/cxl_device.h |   8 ++
> >  3 files changed, 160 insertions(+), 36 deletions(-)
> >   
> 
> A couple general comments in line for discussion, but patch looks good
> otherwise. Notes are mostly on improvements we could make that should
> not block this patch.
> 
> Reviewed-by: Gregory Price <gregory.price@memverge.com>
> 
> >  
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index a1fe268560..ac87398089 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> > @@ -45,7 +45,8 @@ enum {
> >  
> >  static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
> >                                            int dsmad_handle, uint64_t size,
> > -                                          bool is_pmem, uint64_t dpa_base)
> > +                                          bool is_pmem, bool is_dynamic,
> > +                                          uint64_t dpa_base)  
> 
> We should probably change the is_* fields into a flags field and do some
> error checking on the combination of flags.
> 
> >  {
> >      CDATDsmas *dsmas;
> >      CDATDslbis *dslbis0;
> > @@ -61,7 +62,8 @@ static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
> >              .length = sizeof(*dsmas),
> >          },
> >          .DSMADhandle = dsmad_handle,
> > -        .flags = is_pmem ? CDAT_DSMAS_FLAG_NV : 0,
> > +        .flags = (is_pmem ? CDAT_DSMAS_FLAG_NV : 0) |
> > +                 (is_dynamic ? CDAT_DSMAS_FLAG_DYNAMIC_CAP : 0),  
> 
> For example, as noted elsewhere in the code, is_pmem+is_dynamic is not
> presently supported, so this shouldn't even be allowed in this function.
> 
> > +    if (dc_mr) {
> > +        int i;
> > +        uint64_t region_base = vmr_size + pmr_size;
> > +
> > +        /*
> > +         * TODO: we assume the dynamic capacity to be volatile for now.
> > +         * Non-volatile dynamic capacity will be added if needed in the
> > +         * future.
> > +         */  
> 
> Probably don't need to mark this TODO, can just leave it as a note.
> 
> Non-volatile dynamic capacity will coincide with shared memory, so it'll
> end up handled.  So this isn't really a TODO for this current work, and
> should read more like:
> 
> "Dynamic Capacity is always volatile, until shared memory is
> implemented"

I can sort of see your logic, but there is a difference between
volatile memory that is shared and persistent memory (typically whether
we need to care about deep flushes in some architectures), so I'd expect
volatile shared capacity to still be a thing, even if the host OS treats
it in most ways as persistent.

Also, persistent + DCD could be a thing without sharing sometime in the
future.

> 
> > +    } else if (ct3d->hostpmem) {
> >          range1_size_hi = ct3d->hostpmem->size >> 32;
> >          range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> >                           (ct3d->hostpmem->size & 0xF0000000);
> > +    } else {
> > +        /*
> > +         * For DCD with no static memory, set memory active, memory class bits.
> > +         * No range is set.
> > +         */
> > +        range1_size_lo = (2 << 5) | (2 << 2) | 0x3;  
> 
> We should probably add defs for these fields at some point. Can be
> tabled for later work though.
Agreed - worth tidying up but not on critical path.
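
As a rough idea of what such defs might look like: the macro names below are hypothetical, and the bit positions are inferred from the shifts in the patch plus my reading of the DVSEC range size low layout, so treat them as assumptions to check against the spec rather than a proposal.

```c
#include <stdint.h>

/* Hypothetical names; positions inferred from (2 << 5) | (2 << 2) | 0x3. */
#define RANGE_SIZE_LO_MEM_INFO_VALID  (1u << 0)
#define RANGE_SIZE_LO_MEM_ACTIVE      (1u << 1)
#define RANGE_SIZE_LO_MEDIA_TYPE(x)   (((uint32_t)(x) & 0x7) << 2)
#define RANGE_SIZE_LO_MEM_CLASS(x)    (((uint32_t)(x) & 0x7) << 5)
#define RANGE_SIZE_LO_SIZE_MASK       0xF0000000u

/* Build the low half of a range size register for a backend of this size. */
static uint32_t range_size_lo(uint64_t size)
{
    return RANGE_SIZE_LO_MEM_CLASS(2) |
           RANGE_SIZE_LO_MEDIA_TYPE(2) |
           RANGE_SIZE_LO_MEM_ACTIVE |
           RANGE_SIZE_LO_MEM_INFO_VALID |
           (uint32_t)(size & RANGE_SIZE_LO_SIZE_MASK);
}
```

range_size_lo(0) produces the same value as the open-coded constant in the new else branch.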

> 
> > +        /*
> > +         * TODO: set dc as volatile for now, non-volatile support can be added
> > +         * in the future if needed.
> > +         */
> > +        memory_region_set_nonvolatile(dc_mr, false);  
> 
> Again can probably drop the TODO and just leave a statement.
> 
> ~Gregory
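
The flags suggestion quoted above could be sketched as follows. Everything here is hypothetical: the CT3_MR_FLAG_* names are invented for illustration, and the SKETCH_DSMAS_* values are placeholders rather than the real CDAT_DSMAS_FLAG_* definitions. The pmem + dynamic rejection reflects what this series supports today; since persistent DCD may become a thing later, it is kept as a single check that is easy to drop.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical caller-facing flags replacing the is_pmem/is_dynamic bools. */
#define CT3_MR_FLAG_PMEM     (1u << 0)
#define CT3_MR_FLAG_DYNAMIC  (1u << 1)
#define CT3_MR_FLAG_ALL      (CT3_MR_FLAG_PMEM | CT3_MR_FLAG_DYNAMIC)

/* Placeholder values only; the real bits live in QEMU's CDAT definitions. */
#define SKETCH_DSMAS_NV          (1u << 0)
#define SKETCH_DSMAS_DYNAMIC_CAP (1u << 1)

/*
 * Validate the flag combination and translate it to DSMAS flags.
 * Returns false for unknown bits or for combinations not supported yet.
 */
static bool mr_flags_to_dsmas(uint32_t mr_flags, uint8_t *dsmas_flags)
{
    if (mr_flags & ~CT3_MR_FLAG_ALL) {
        return false;                   /* unknown flag bit */
    }
    if ((mr_flags & CT3_MR_FLAG_PMEM) && (mr_flags & CT3_MR_FLAG_DYNAMIC)) {
        return false;                   /* non-volatile DC not supported yet */
    }
    *dsmas_flags = 0;
    if (mr_flags & CT3_MR_FLAG_PMEM) {
        *dsmas_flags |= SKETCH_DSMAS_NV;
    }
    if (mr_flags & CT3_MR_FLAG_DYNAMIC) {
        *dsmas_flags |= SKETCH_DSMAS_DYNAMIC_CAP;
    }
    return true;
}
```

A caller would then fail early on a bad combination instead of silently emitting a DSMAS entry with both bits set.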


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v7 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2024-04-18 23:11 ` [PATCH v7 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents nifan.cxl
@ 2024-04-22 12:01     ` Jonathan Cameron via
  2024-04-22 12:01     ` Jonathan Cameron via
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 65+ messages in thread
From: Jonathan Cameron @ 2024-04-22 12:01 UTC (permalink / raw)
  To: nifan.cxl
  Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
	wj28.lee, Fan Ni, Markus Armbruster, Michael Roth

On Thu, 18 Apr 2024 16:11:00 -0700
nifan.cxl@gmail.com wrote:

> From: Fan Ni <fan.ni@samsung.com>
>

Hi Fan,

Please expand CC list to include QAPI maintainers.
+CC Markus and Michael.

Also, for future versions +CC Michael Tsirkin.

I'm fine rolling these up as a series with the precursors, but
if it is already something Michael has seen it may speed things up.

Jonathan

p.s. Today I'm just building a tree, but will circle back around
later in the week with a final review of the last few changes.

 
> To simulate FM functionalities for initiating Dynamic Capacity Add
> (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> add/release dynamic capacity extents requests.
> 
> With this change, an extent can be released only when its DPA range
> is contained by a single accepted extent in the device. That is to say,
> extent superset release is not supported yet.
> 
> 1. Add dynamic capacity extents:
> 
> For example, the command to add two contiguous extents (each 128MiB long)
> to region 0 (starting at DPA offset 0) looks like below:
> 
> { "execute": "qmp_capabilities" }
> 
> { "execute": "cxl-add-dynamic-capacity",
>   "arguments": {
>       "path": "/machine/peripheral/cxl-dcd0",
>       "hid": 0,
>       "selection-policy": 2,
>       "region-id": 0,
>       "tag": "",
>       "extents": [
>       {
>           "offset": 0,
>           "len": 134217728
>       },
>       {
>           "offset": 134217728,
>           "len": 134217728
>       }
>       ]
>   }
> }
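
As an aside, the per-extent sanity checks that such a request goes through
(non-zero length, block alignment, staying inside the region) can be
sketched stand-alone as below. ExtentReq and extent_req_is_valid are
hypothetical names, and the 2MiB block size used in the note afterwards is
only an assumed value; the bounds test is written in an overflow-safe form
rather than the offset + len comparison used in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* Hypothetical stand-in for one {"offset", "len"} entry of the request. */
typedef struct ExtentReq {
    uint64_t offset; /* bytes from the start of the region */
    uint64_t len;    /* bytes */
} ExtentReq;

/* block_size must be non-zero. */
static bool extent_req_is_valid(const ExtentReq *e, uint64_t region_len,
                                uint64_t block_size)
{
    if (e->len == 0) {
        return false; /* zero-length extents are rejected */
    }
    if (e->offset % block_size || e->len % block_size) {
        return false; /* both offset and len must be block aligned */
    }
    if (e->offset > region_len || e->len > region_len - e->offset) {
        return false; /* extent would run past the end of the region */
    }
    return true;
}
```

Both extents in the example above (offset 0 and offset 128 * MiB, each
128 * MiB = 134217728 bytes long) pass this check for a 256MiB region with
a 2MiB block size.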
> 
> 2. Release dynamic capacity extents:
> 
> For example, the command to release an extent of size 128MiB from region 0
> (DPA offset 128MiB) looks like below:
> 
> { "execute": "cxl-release-dynamic-capacity",
>   "arguments": {
>       "path": "/machine/peripheral/cxl-dcd0",
>       "hid": 0,
>       "flags": 1,
>       "region-id": 0,
>       "tag": "",
>       "extents": [
>       {
>           "offset": 134217728,
>           "len": 134217728
>       }
>       ]
>   }
> }
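
For reference, "flags": 1 here decodes as removal policy 1 (prescriptive)
with the forced removal bit clear. A minimal sketch of that decoding,
following the bit layout in the QAPI doc further down; the ReleaseCheck enum,
check_release_flags and SANITIZE_ON_RELEASE_BIT are illustrative names, not
something the patch defines.

```c
#include <stdint.h>

#define REMOVAL_POLICY_MASK          0xf
#define REMOVAL_POLICY_PRESCRIPTIVE  1
#define FORCED_REMOVAL_BIT           (1u << 4)
#define SANITIZE_ON_RELEASE_BIT      (1u << 5) /* hypothetical name, bit[5] */

typedef enum ReleaseCheck {
    RELEASE_OK,
    RELEASE_FORCED_UNSUPPORTED,
    RELEASE_POLICY_UNSUPPORTED,
} ReleaseCheck;

/* Same order of checks as the release handler: forced bit, then policy. */
static ReleaseCheck check_release_flags(uint8_t flags)
{
    if (flags & FORCED_REMOVAL_BIT) {
        return RELEASE_FORCED_UNSUPPORTED;
    }
    if ((flags & REMOVAL_POLICY_MASK) != REMOVAL_POLICY_PRESCRIPTIVE) {
        return RELEASE_POLICY_UNSUPPORTED;
    }
    return RELEASE_OK;
}
```

So check_release_flags(1) is accepted, while 0x11 (forced removal set) and
0 (a non-prescriptive policy) are both rejected. The sanitize bit is not
examined, matching what the handler in this patch does.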
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>  hw/cxl/cxl-mailbox-utils.c  |  62 +++++--
>  hw/mem/cxl_type3.c          | 311 +++++++++++++++++++++++++++++++++++-
>  hw/mem/cxl_type3_stubs.c    |  20 +++
>  include/hw/cxl/cxl_device.h |  22 +++
>  include/hw/cxl/cxl_events.h |  18 +++
>  qapi/cxl.json               |  69 ++++++++
>  6 files changed, 489 insertions(+), 13 deletions(-)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 9d54e10cd4..3569902e9e 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -1405,7 +1405,7 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
>   * Check whether any bit between addr[nr, nr+size) is set,
>   * return true if any bit is set, otherwise return false
>   */
> -static bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
> +bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
>                                unsigned long size)
>  {
>      unsigned long res = find_next_bit(addr, size + nr, nr);
> @@ -1444,7 +1444,7 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
>      return NULL;
>  }
>  
> -static void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
> +void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
>                                               uint64_t dpa,
>                                               uint64_t len,
>                                               uint8_t *tag,
> @@ -1470,6 +1470,44 @@ void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
>      g_free(extent);
>  }
>  
> +/*
> + * Add a new extent to the extent "group" if group exists;
> + * otherwise, create a new group
> + * Return value: return the group where the extent is inserted.
> + */
> +CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
> +                                                    uint64_t dpa,
> +                                                    uint64_t len,
> +                                                    uint8_t *tag,
> +                                                    uint16_t shared_seq)
> +{
> +    if (!group) {
> +        group = g_new0(CXLDCExtentGroup, 1);
> +        QTAILQ_INIT(&group->list);
> +    }
> +    cxl_insert_extent_to_extent_list(&group->list, dpa, len,
> +                                     tag, shared_seq);
> +    return group;
> +}
> +
> +void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
> +                                       CXLDCExtentGroup *group)
> +{
> +    QTAILQ_INSERT_TAIL(list, group, node);
> +}
> +
> +void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list)
> +{
> +    CXLDCExtent *ent, *ent_next;
> +    CXLDCExtentGroup *group = QTAILQ_FIRST(list);
> +
> +    QTAILQ_REMOVE(list, group, node);
> +    QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
> +        cxl_remove_extent_from_extent_list(&group->list, ent);
> +    }
> +    g_free(group);
> +}
> +
>  /*
>   * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload
>   * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload
> @@ -1541,6 +1579,7 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
>  {
>      uint32_t i;
>      CXLDCExtent *ent;
> +    CXLDCExtentGroup *ext_group;
>      uint64_t dpa, len;
>      Range range1, range2;
>  
> @@ -1551,9 +1590,13 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
>          range_init_nofail(&range1, dpa, len);
>  
>          /*
> -         * TODO: once the pending extent list is added, check against
> -         * the list will be added here.
> +         * The host-accepted DPA range must be contained by the first extent
> +         * group in the pending list
>           */
> +        ext_group = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> +        if (!cxl_extents_contains_dpa_range(&ext_group->list, dpa, len)) {
> +            return CXL_MBOX_INVALID_PA;
> +        }
>  
>          /* to-be-added range should not overlap with range already accepted */
>          QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
> @@ -1586,10 +1629,7 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
>      CXLRetCode ret;
>  
>      if (in->num_entries_updated == 0) {
> -        /*
> -         * TODO: once the pending list is introduced, extents in the beginning
> -         * will get wiped out.
> -         */
> +        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
>          return CXL_MBOX_SUCCESS;
>      }
>  
> @@ -1615,11 +1655,9 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
>  
>          cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
>          ct3d->dc.total_extent_count += 1;
> -        /*
> -         * TODO: we will add a pending extent list based on event log record
> -         * and process the list accordingly here.
> -         */
>      }
> +    /* Remove the first extent group in the pending list*/
> +    cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
>  
>      return CXL_MBOX_SUCCESS;
>  }
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index c2cdd6d506..e892b3de7b 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -667,6 +667,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
>          ct3d->dc.total_capacity += region->len;
>      }
>      QTAILQ_INIT(&ct3d->dc.extents);
> +    QTAILQ_INIT(&ct3d->dc.extents_pending);
>  
>      return true;
>  }
> @@ -674,10 +675,19 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
>  static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
>  {
>      CXLDCExtent *ent, *ent_next;
> +    CXLDCExtentGroup *group, *group_next;
>  
>      QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
>          cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
>      }
> +
> +    QTAILQ_FOREACH_SAFE(group, &ct3d->dc.extents_pending, node, group_next) {
> +        QTAILQ_REMOVE(&ct3d->dc.extents_pending, group, node);
> +        QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
> +            cxl_remove_extent_from_extent_list(&group->list, ent);
> +        }
> +        g_free(group);
> +    }
>  }
>  
>  static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> @@ -1443,7 +1453,6 @@ static int ct3d_qmp_cxl_event_log_enc(CxlEventLog log)
>          return CXL_EVENT_TYPE_FAIL;
>      case CXL_EVENT_LOG_FATAL:
>          return CXL_EVENT_TYPE_FATAL;
> -/* DCD not yet supported */
>      default:
>          return -EINVAL;
>      }
> @@ -1694,6 +1703,306 @@ void qmp_cxl_inject_memory_module_event(const char *path, CxlEventLog log,
>      }
>  }
>  
> +/* CXL r3.1 Table 8-50: Dynamic Capacity Event Record */
> +static const QemuUUID dynamic_capacity_uuid = {
> +    .data = UUID(0xca95afa7, 0xf183, 0x4018, 0x8c, 0x2f,
> +                 0x95, 0x26, 0x8e, 0x10, 0x1a, 0x2a),
> +};
> +
> +typedef enum CXLDCEventType {
> +    DC_EVENT_ADD_CAPACITY = 0x0,
> +    DC_EVENT_RELEASE_CAPACITY = 0x1,
> +    DC_EVENT_FORCED_RELEASE_CAPACITY = 0x2,
> +    DC_EVENT_REGION_CONFIG_UPDATED = 0x3,
> +    DC_EVENT_ADD_CAPACITY_RSP = 0x4,
> +    DC_EVENT_CAPACITY_RELEASED = 0x5,
> +} CXLDCEventType;
> +
> +/*
> + * Check whether the range [dpa, dpa + len - 1] has overlaps with extents in
> + * the list.
> + * Return value: return true if has overlaps; otherwise, return false
> + */
> +static bool cxl_extents_overlaps_dpa_range(CXLDCExtentList *list,
> +                                           uint64_t dpa, uint64_t len)
> +{
> +    CXLDCExtent *ent;
> +    Range range1, range2;
> +
> +    if (!list) {
> +        return false;
> +    }
> +
> +    range_init_nofail(&range1, dpa, len);
> +    QTAILQ_FOREACH(ent, list, node) {
> +        range_init_nofail(&range2, ent->start_dpa, ent->len);
> +        if (range_overlaps_range(&range1, &range2)) {
> +            return true;
> +        }
> +    }
> +    return false;
> +}
> +
> +/*
> + * Check whether the range [dpa, dpa + len - 1] is contained by extents in
> + * the list.
> + * Will check multiple extents containment once superset release is added.
> + * Return value: return true if range is contained; otherwise, return false
> + */
> +bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
> +                                    uint64_t dpa, uint64_t len)
> +{
> +    CXLDCExtent *ent;
> +    Range range1, range2;
> +
> +    if (!list) {
> +        return false;
> +    }
> +
> +    range_init_nofail(&range1, dpa, len);
> +    QTAILQ_FOREACH(ent, list, node) {
> +        range_init_nofail(&range2, ent->start_dpa, ent->len);
> +        if (range_contains_range(&range2, &range1)) {
> +            return true;
> +        }
> +    }
> +    return false;
> +}
> +
> +static bool cxl_extent_groups_overlaps_dpa_range(CXLDCExtentGroupList *list,
> +                                                uint64_t dpa, uint64_t len)
> +{
> +    CXLDCExtentGroup *group;
> +
> +    if (!list) {
> +        return false;
> +    }
> +
> +    QTAILQ_FOREACH(group, list, node) {
> +        if (cxl_extents_overlaps_dpa_range(&group->list, dpa, len)) {
> +            return true;
> +        }
> +    }
> +    return false;
> +}
> +
> +/*
> + * The main function to process dynamic capacity event with extent list.
> + * Currently DC extents add/release requests are processed.
> + */
> +static void qmp_cxl_process_dynamic_capacity_prescriptive(const char *path,
> +        uint16_t hid, CXLDCEventType type, uint8_t rid,
> +        CXLDCExtentRecordList *records, Error **errp)
> +{
> +    Object *obj;
> +    CXLEventDynamicCapacity dCap = {};
> +    CXLEventRecordHdr *hdr = &dCap.hdr;
> +    CXLType3Dev *dcd;
> +    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
> +    uint32_t num_extents = 0;
> +    CXLDCExtentRecordList *list;
> +    CXLDCExtentGroup *group = NULL;
> +    g_autofree CXLDCExtentRaw *extents = NULL;
> +    uint8_t enc_log = CXL_EVENT_TYPE_DYNAMIC_CAP;
> +    uint64_t dpa, offset, len, block_size;
> +    g_autofree unsigned long *blk_bitmap = NULL;
> +    int i;
> +
> +    obj = object_resolve_path_type(path, TYPE_CXL_TYPE3, NULL);
> +    if (!obj) {
> +        error_setg(errp, "Unable to resolve CXL type 3 device");
> +        return;
> +    }
> +
> +    dcd = CXL_TYPE3(obj);
> +    if (!dcd->dc.num_regions) {
> +        error_setg(errp, "No dynamic capacity support from the device");
> +        return;
> +    }
> +
> +
> +    if (rid >= dcd->dc.num_regions) {
> +        error_setg(errp, "region id is too large");
> +        return;
> +    }
> +    block_size = dcd->dc.regions[rid].block_size;
> +    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
> +
> +    /* Sanity check and count the extents */
> +    list = records;
> +    while (list) {
> +        offset = list->value->offset;
> +        len = list->value->len;
> +        dpa = offset + dcd->dc.regions[rid].base;
> +
> +        if (len == 0) {
> +            error_setg(errp, "extent with 0 length is not allowed");
> +            return;
> +        }
> +
> +        if (offset % block_size || len % block_size) {
> +            error_setg(errp, "dpa or len is not aligned to region block size");
> +            return;
> +        }
> +
> +        if (offset + len > dcd->dc.regions[rid].len) {
> +            error_setg(errp, "extent range is beyond the region end");
> +            return;
> +        }
> +
> +        /* No duplicate or overlapped extents are allowed */
> +        if (test_any_bits_set(blk_bitmap, offset / block_size,
> +                              len / block_size)) {
> +            error_setg(errp, "duplicate or overlapped extents are detected");
> +            return;
> +        }
> +        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> +
> +        if (type == DC_EVENT_RELEASE_CAPACITY) {
> +            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
> +                                                     dpa, len)) {
> +                error_setg(errp,
> +                           "cannot release extent with pending DPA range");
> +                return;
> +            }
> +            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents, dpa, len)) {
> +                error_setg(errp,
> +                           "cannot release extent with non-existing DPA range");
> +                return;
> +            }
> +        } else if (type == DC_EVENT_ADD_CAPACITY) {
> +            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents, dpa, len)) {
> +                error_setg(errp,
> +                           "cannot add DPA already accessible  to the same LD");
> +                return;
> +            }
> +            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
> +                                                     dpa, len)) {
> +                error_setg(errp,
> +                           "cannot add DPA again while still pending");
> +                return;
> +            }
> +        }
> +        list = list->next;
> +        num_extents++;
> +    }
> +
> +    /* Create extent list for event being passed to host */
> +    i = 0;
> +    list = records;
> +    extents = g_new0(CXLDCExtentRaw, num_extents);
> +    while (list) {
> +        offset = list->value->offset;
> +        len = list->value->len;
> +        dpa = dcd->dc.regions[rid].base + offset;
> +
> +        extents[i].start_dpa = dpa;
> +        extents[i].len = len;
> +        memset(extents[i].tag, 0, 0x10);
> +        extents[i].shared_seq = 0;
> +        if (type == DC_EVENT_ADD_CAPACITY) {
> +            group = cxl_insert_extent_to_extent_group(group,
> +                                                      extents[i].start_dpa,
> +                                                      extents[i].len,
> +                                                      extents[i].tag,
> +                                                      extents[i].shared_seq);
> +        }
> +
> +        list = list->next;
> +        i++;
> +    }
> +    if (group) {
> +        cxl_extent_group_list_insert_tail(&dcd->dc.extents_pending, group);
> +    }
> +
> +    /*
> +     * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
> +     *
> +     * All Dynamic Capacity event records shall set the Event Record Severity
> +     * field in the Common Event Record Format to Informational Event. All
> +     * Dynamic Capacity related events shall be logged in the Dynamic Capacity
> +     * Event Log.
> +     */
> +    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
> +                            cxl_device_get_timestamp(&dcd->cxl_dstate));
> +
> +    dCap.type = type;
> +    /* FIXME: for now, validity flag is cleared */
> +    dCap.validity_flags = 0;
> +    stw_le_p(&dCap.host_id, hid);
> +    /* only valid for DC_REGION_CONFIG_UPDATED event */
> +    dCap.updated_region_id = 0;
> +    dCap.flags = 0;
> +    for (i = 0; i < num_extents; i++) {
> +        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
> +               sizeof(CXLDCExtentRaw));
> +
> +        if (i < num_extents - 1) {
> +            /* Set "More" flag */
> +            dCap.flags |= BIT(0);
> +        }
> +
> +        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
> +                             (CXLEventRecordRaw *)&dCap)) {
> +            cxl_event_irq_assert(dcd);
> +        }
> +    }
> +}
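
To illustrate the intent of the "More" flag in the loop above: every record
of a multi-extent request except the last one carries it. A small sketch
that computes the flags afresh for each record; DC_EVENT_FLAG_MORE and
dc_record_flags are hypothetical names used only here.

```c
#include <stdint.h>

#define DC_EVENT_FLAG_MORE (1u << 0) /* hypothetical name for bit 0 */

/* Flags for record i (0-based) out of num_extents records. */
static uint8_t dc_record_flags(uint32_t i, uint32_t num_extents)
{
    uint8_t flags = 0;

    if (i + 1 < num_extents) {
        flags |= DC_EVENT_FLAG_MORE; /* more records follow this one */
    }
    return flags;
}
```

For a two-extent request this gives 1 for the first record and 0 for the
second; a single-extent request gets 0.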
> +
> +void qmp_cxl_add_dynamic_capacity(const char *path, uint16_t hid,
> +                                  uint8_t sel_policy, uint8_t region_id,
> +                                  const char *tag,
> +                                  CXLDCExtentRecordList  *records,
> +                                  Error **errp)
> +{
> +    enum {
> +        CXL_SEL_POLICY_FREE,
> +        CXL_SEL_POLICY_CONTIGUOUS,
> +        CXL_SEL_POLICY_PRESCRIPTIVE,
> +        CXL_SEL_POLICY_ENABLESHAREDACCESS,
> +    };
> +    switch (sel_policy) {
> +    case CXL_SEL_POLICY_PRESCRIPTIVE:
> +        qmp_cxl_process_dynamic_capacity_prescriptive(path, hid,
> +                                                      DC_EVENT_ADD_CAPACITY,
> +                                                      region_id, records, errp);
> +        return;
> +    default:
> +        error_setg(errp, "Selection policy not supported");
> +        return;
> +    }
> +}
> +
> +#define REMOVAL_POLICY_MASK 0xf
> +#define REMOVAL_POLICY_PRESCRIPTIVE 1
> +#define FORCED_REMOVAL_BIT BIT(4)
> +
> +void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t hid,
> +                                      uint8_t flags, uint8_t region_id,
> +                                      const char *tag,
> +                                      CXLDCExtentRecordList  *records,
> +                                      Error **errp)
> +{
> +    CXLDCEventType type = DC_EVENT_RELEASE_CAPACITY;
> +
> +    if (flags & FORCED_REMOVAL_BIT) {
> +        /* TODO: enable forced removal in the future */
> +        type = DC_EVENT_FORCED_RELEASE_CAPACITY;
> +        error_setg(errp, "Forced removal not supported yet");
> +        return;
> +    }
> +
> +    switch (flags & REMOVAL_POLICY_MASK) {
> +    case REMOVAL_POLICY_PRESCRIPTIVE:
> +        qmp_cxl_process_dynamic_capacity_prescriptive(path, hid, type,
> +                                                      region_id, records, errp);
> +        return;
> +    default:
> +        error_setg(errp, "Removal policy not supported");
> +        return;
> +    }
> +}
> +
>  static void ct3_class_init(ObjectClass *oc, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(oc);
> diff --git a/hw/mem/cxl_type3_stubs.c b/hw/mem/cxl_type3_stubs.c
> index 3e1851e32b..810685e0d5 100644
> --- a/hw/mem/cxl_type3_stubs.c
> +++ b/hw/mem/cxl_type3_stubs.c
> @@ -67,3 +67,23 @@ void qmp_cxl_inject_correctable_error(const char *path, CxlCorErrorType type,
>  {
>      error_setg(errp, "CXL Type 3 support is not compiled in");
>  }
> +
> +void qmp_cxl_add_dynamic_capacity(const char *path,
> +                                  uint16_t hid,
> +                                  uint8_t sel_policy,
> +                                  uint8_t region_id,
> +                                  const char *tag,
> +                                  CXLDCExtentRecordList  *records,
> +                                  Error **errp)
> +{
> +    error_setg(errp, "CXL Type 3 support is not compiled in");
> +}
> +
> +void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t hid,
> +                                      uint8_t flags, uint8_t region_id,
> +                                      const char *tag,
> +                                      CXLDCExtentRecordList  *records,
> +                                      Error **errp)
> +{
> +    error_setg(errp, "CXL Type 3 support is not compiled in");
> +}
> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index df3511e91b..c69ff6b5de 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -443,6 +443,12 @@ typedef struct CXLDCExtent {
>  } CXLDCExtent;
>  typedef QTAILQ_HEAD(, CXLDCExtent) CXLDCExtentList;
>  
> +typedef struct CXLDCExtentGroup {
> +    CXLDCExtentList list;
> +    QTAILQ_ENTRY(CXLDCExtentGroup) node;
> +} CXLDCExtentGroup;
> +typedef QTAILQ_HEAD(, CXLDCExtentGroup) CXLDCExtentGroupList;
> +
>  typedef struct CXLDCRegion {
>      uint64_t base;       /* aligned to 256*MiB */
>      uint64_t decode_len; /* aligned to 256*MiB */
> @@ -494,6 +500,7 @@ struct CXLType3Dev {
>           */
>          uint64_t total_capacity; /* 256M aligned */
>          CXLDCExtentList extents;
> +        CXLDCExtentGroupList extents_pending;
>          uint32_t total_extent_count;
>          uint32_t ext_list_gen_seq;
>  
> @@ -555,4 +562,19 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
>  
>  void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
>                                          CXLDCExtent *extent);
> +void cxl_insert_extent_to_extent_list(CXLDCExtentList *list, uint64_t dpa,
> +                                      uint64_t len, uint8_t *tag,
> +                                      uint16_t shared_seq);
> +bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
> +                       unsigned long size);
> +bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
> +                                    uint64_t dpa, uint64_t len);
> +CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
> +                                                    uint64_t dpa,
> +                                                    uint64_t len,
> +                                                    uint8_t *tag,
> +                                                    uint16_t shared_seq);
> +void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
> +                                       CXLDCExtentGroup *group);
> +void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list);
>  #endif
> diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
> index 5170b8dbf8..38cadaa0f3 100644
> --- a/include/hw/cxl/cxl_events.h
> +++ b/include/hw/cxl/cxl_events.h
> @@ -166,4 +166,22 @@ typedef struct CXLEventMemoryModule {
>      uint8_t reserved[0x3d];
>  } QEMU_PACKED CXLEventMemoryModule;
>  
> +/*
> + * CXL r3.1 section Table 8-50: Dynamic Capacity Event Record
> + * All fields little endian.
> + */
> +typedef struct CXLEventDynamicCapacity {
> +    CXLEventRecordHdr hdr;
> +    uint8_t type;
> +    uint8_t validity_flags;
> +    uint16_t host_id;
> +    uint8_t updated_region_id;
> +    uint8_t flags;
> +    uint8_t reserved2[2];
> +    uint8_t dynamic_capacity_extent[0x28]; /* defined in cxl_device.h */
> +    uint8_t reserved[0x18];
> +    uint32_t extents_avail;
> +    uint32_t tags_avail;
> +} QEMU_PACKED CXLEventDynamicCapacity;
> +
>  #endif /* CXL_EVENTS_H */
> diff --git a/qapi/cxl.json b/qapi/cxl.json
> index 4281726dec..2dcf03d973 100644
> --- a/qapi/cxl.json
> +++ b/qapi/cxl.json
> @@ -361,3 +361,72 @@
>  ##
>  {'command': 'cxl-inject-correctable-error',
>   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
> +
> +##
> +# @CXLDCExtentRecord:
> +#
> +# Record of a single extent to add/release
> +#
> +# @offset: offset of the extent from the start of the region
> +# @len: length of the extent
> +#
> +# Since: 9.1
> +##
> +{ 'struct': 'CXLDCExtentRecord',
> +  'data': {
> +      'offset':'uint64',
> +      'len': 'uint64'
> +  }
> +}
> +
> +##
> +# @cxl-add-dynamic-capacity:
> +#
> +# Command to start the add dynamic capacity extents flow. The host will
> +# have to acknowledge acceptance of the extents before they are usable.
> +#
> +# @path: CXL DCD canonical QOM path
> +# @hid: host id
> +# @selection-policy: policy to use for selecting extents for adding capacity
> +# @region-id: id of the region where the extent to add
> +# @tag: Context field
> +# @extents: Extents to add
> +#
> +# Since : 9.1
> +##
> +{ 'command': 'cxl-add-dynamic-capacity',
> +  'data': { 'path': 'str',
> +            'hid': 'uint16',
> +            'selection-policy': 'uint8',
> +            'region-id': 'uint8',
> +            'tag': 'str',
> +            'extents': [ 'CXLDCExtentRecord' ]
> +           }
> +}
> +
> +##
> +# @cxl-release-dynamic-capacity:
> +#
> +# Command to start release dynamic capacity extents flow. The host will
> +# need to respond to indicate that it has released the capacity before it
> +# is made unavailable for read and write and can be re-added.
> +#
> +# @path: CXL DCD canonical QOM path
> +# @hid: host id
> +# @flags: bit[3:0] for removal policy, bit[4] for forced removal, bit[5] for
> +#     sanitize on release, bit[7:6] reserved
> +# @region-id: id of the region where the extent to release
> +# @tag: Context field
> +# @extents: Extents to release
> +#
> +# Since : 9.1
> +##
> +{ 'command': 'cxl-release-dynamic-capacity',
> +  'data': { 'path': 'str',
> +            'hid': 'uint16',
> +            'flags': 'uint8',
> +            'region-id': 'uint8',
> +            'tag': 'str',
> +            'extents': [ 'CXLDCExtentRecord' ]
> +           }
> +}
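
To make the "contained by a single accepted extent" rule from the commit
message concrete, here is a stand-alone sketch of the two range tests. It
uses a plain array and a made-up Extent struct instead of the QTAILQ lists
and QEMU's Range helpers, and assumes len > 0 and ranges that do not wrap
around the 64-bit address space.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* Hypothetical simplified extent: [start, start + len) */
typedef struct Extent {
    uint64_t start;
    uint64_t len;
} Extent;

/* True if [dpa, dpa + len) lies entirely inside one extent in the list. */
static bool extents_contain_range(const Extent *list, size_t n,
                                  uint64_t dpa, uint64_t len)
{
    for (size_t i = 0; i < n; i++) {
        if (dpa >= list[i].start && len <= list[i].len &&
            dpa - list[i].start <= list[i].len - len) {
            return true;
        }
    }
    return false;
}

/* True if [dpa, dpa + len) shares at least one byte with any extent. */
static bool extents_overlap_range(const Extent *list, size_t n,
                                  uint64_t dpa, uint64_t len)
{
    for (size_t i = 0; i < n; i++) {
        if (dpa < list[i].start + list[i].len &&
            list[i].start < dpa + len) {
            return true;
        }
    }
    return false;
}
```

With two accepted extents covering [0, 128MiB) and [128MiB, 256MiB), a
release of 128MiB at offset 128MiB is contained and would be allowed, while
a release of 128MiB at offset 64MiB overlaps both extents but is contained
by neither, which is the superset case that is not supported yet.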


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v7 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
@ 2024-04-22 12:01     ` Jonathan Cameron via
  0 siblings, 0 replies; 65+ messages in thread
From: Jonathan Cameron via @ 2024-04-22 12:01 UTC (permalink / raw)
  To: nifan.cxl
  Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
	wj28.lee, Fan Ni, Markus Armbruster, Michael Roth

On Thu, 18 Apr 2024 16:11:00 -0700
nifan.cxl@gmail.com wrote:

> From: Fan Ni <fan.ni@samsung.com>
>

Hi Fan,

Please expand CC list to include QAPI maintainers.
+CC Markus and Michael.

Also, for future versions +CC Michael Tsirkin.

I'm fine rolling these up as a series with the precursors, but
if it is already something Michael has seen, it may speed things up.

Jonathan

p.s. Today I'm just building a tree, but will circle back around
later in the week with a final review of the last few changes.

 
> To simulate FM functionalities for initiating Dynamic Capacity Add
> (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> add/release dynamic capacity extents requests.
> 
> With this change, an extent can be released only when its DPA range
> is contained by a single accepted extent in the device. That is to say,
> extent superset release is not supported yet.
> 
> 1. Add dynamic capacity extents:
> 
> For example, the command to add two contiguous extents (each 128MiB long)
> to region 0 (starting at DPA offset 0) looks like below:
> 
> { "execute": "qmp_capabilities" }
> 
> { "execute": "cxl-add-dynamic-capacity",
>   "arguments": {
>       "path": "/machine/peripheral/cxl-dcd0",
>       "hid": 0,
>       "selection-policy": 2,
>       "region-id": 0,
>       "tag": "",
>       "extents": [
>       {
>           "offset": 0,
>           "len": 134217728
>       },
>       {
>           "offset": 134217728,
>           "len": 134217728
>       }
>       ]
>   }
> }
> 
> 2. Release dynamic capacity extents:
> 
> For example, the command to release an extent of size 128MiB from region 0
> (DPA offset 128MiB) looks like below:
> 
> { "execute": "cxl-release-dynamic-capacity",
>   "arguments": {
>       "path": "/machine/peripheral/cxl-dcd0",
>       "hid": 0,
>       "flags": 1,
>       "region-id": 0,
>       "tag": "",
>       "extents": [
>       {
>           "offset": 134217728,
>           "len": 134217728
>       }
>       ]
>   }
> }
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>  hw/cxl/cxl-mailbox-utils.c  |  62 +++++--
>  hw/mem/cxl_type3.c          | 311 +++++++++++++++++++++++++++++++++++-
>  hw/mem/cxl_type3_stubs.c    |  20 +++
>  include/hw/cxl/cxl_device.h |  22 +++
>  include/hw/cxl/cxl_events.h |  18 +++
>  qapi/cxl.json               |  69 ++++++++
>  6 files changed, 489 insertions(+), 13 deletions(-)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 9d54e10cd4..3569902e9e 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -1405,7 +1405,7 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
>   * Check whether any bit between addr[nr, nr+size) is set,
>   * return true if any bit is set, otherwise return false
>   */
> -static bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
> +bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
>                                unsigned long size)
>  {
>      unsigned long res = find_next_bit(addr, size + nr, nr);
> @@ -1444,7 +1444,7 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
>      return NULL;
>  }
>  
> -static void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
> +void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
>                                               uint64_t dpa,
>                                               uint64_t len,
>                                               uint8_t *tag,
> @@ -1470,6 +1470,44 @@ void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
>      g_free(extent);
>  }
>  
> +/*
> + * Add a new extent to the extent "group" if group exists;
> + * otherwise, create a new group
> + * Return value: return the group where the extent is inserted.
> + */
> +CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
> +                                                    uint64_t dpa,
> +                                                    uint64_t len,
> +                                                    uint8_t *tag,
> +                                                    uint16_t shared_seq)
> +{
> +    if (!group) {
> +        group = g_new0(CXLDCExtentGroup, 1);
> +        QTAILQ_INIT(&group->list);
> +    }
> +    cxl_insert_extent_to_extent_list(&group->list, dpa, len,
> +                                     tag, shared_seq);
> +    return group;
> +}
> +
> +void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
> +                                       CXLDCExtentGroup *group)
> +{
> +    QTAILQ_INSERT_TAIL(list, group, node);
> +}
> +
> +void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list)
> +{
> +    CXLDCExtent *ent, *ent_next;
> +    CXLDCExtentGroup *group = QTAILQ_FIRST(list);
> +
> +    QTAILQ_REMOVE(list, group, node);
> +    QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
> +        cxl_remove_extent_from_extent_list(&group->list, ent);
> +    }
> +    g_free(group);
> +}
> +
>  /*
>   * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload
>   * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload
> @@ -1541,6 +1579,7 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
>  {
>      uint32_t i;
>      CXLDCExtent *ent;
> +    CXLDCExtentGroup *ext_group;
>      uint64_t dpa, len;
>      Range range1, range2;
>  
> @@ -1551,9 +1590,13 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
>          range_init_nofail(&range1, dpa, len);
>  
>          /*
> -         * TODO: once the pending extent list is added, check against
> -         * the list will be added here.
> +         * The host-accepted DPA range must be contained by the first extent
> +         * group in the pending list
>           */
> +        ext_group = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> +        if (!cxl_extents_contains_dpa_range(&ext_group->list, dpa, len)) {
> +            return CXL_MBOX_INVALID_PA;
> +        }
>  
>          /* to-be-added range should not overlap with range already accepted */
>          QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
> @@ -1586,10 +1629,7 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
>      CXLRetCode ret;
>  
>      if (in->num_entries_updated == 0) {
> -        /*
> -         * TODO: once the pending list is introduced, extents in the beginning
> -         * will get wiped out.
> -         */
> +        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
>          return CXL_MBOX_SUCCESS;
>      }
>  
> @@ -1615,11 +1655,9 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
>  
>          cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
>          ct3d->dc.total_extent_count += 1;
> -        /*
> -         * TODO: we will add a pending extent list based on event log record
> -         * and process the list accordingly here.
> -         */
>      }
> +    /* Remove the first extent group in the pending list */
> +    cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
>  
>      return CXL_MBOX_SUCCESS;
>  }
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index c2cdd6d506..e892b3de7b 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -667,6 +667,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
>          ct3d->dc.total_capacity += region->len;
>      }
>      QTAILQ_INIT(&ct3d->dc.extents);
> +    QTAILQ_INIT(&ct3d->dc.extents_pending);
>  
>      return true;
>  }
> @@ -674,10 +675,19 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
>  static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
>  {
>      CXLDCExtent *ent, *ent_next;
> +    CXLDCExtentGroup *group, *group_next;
>  
>      QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
>          cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
>      }
> +
> +    QTAILQ_FOREACH_SAFE(group, &ct3d->dc.extents_pending, node, group_next) {
> +        QTAILQ_REMOVE(&ct3d->dc.extents_pending, group, node);
> +        QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
> +            cxl_remove_extent_from_extent_list(&group->list, ent);
> +        }
> +        g_free(group);
> +    }
>  }
>  
>  static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> @@ -1443,7 +1453,6 @@ static int ct3d_qmp_cxl_event_log_enc(CxlEventLog log)
>          return CXL_EVENT_TYPE_FAIL;
>      case CXL_EVENT_LOG_FATAL:
>          return CXL_EVENT_TYPE_FATAL;
> -/* DCD not yet supported */
>      default:
>          return -EINVAL;
>      }
> @@ -1694,6 +1703,306 @@ void qmp_cxl_inject_memory_module_event(const char *path, CxlEventLog log,
>      }
>  }
>  
> +/* CXL r3.1 Table 8-50: Dynamic Capacity Event Record */
> +static const QemuUUID dynamic_capacity_uuid = {
> +    .data = UUID(0xca95afa7, 0xf183, 0x4018, 0x8c, 0x2f,
> +                 0x95, 0x26, 0x8e, 0x10, 0x1a, 0x2a),
> +};
> +
> +typedef enum CXLDCEventType {
> +    DC_EVENT_ADD_CAPACITY = 0x0,
> +    DC_EVENT_RELEASE_CAPACITY = 0x1,
> +    DC_EVENT_FORCED_RELEASE_CAPACITY = 0x2,
> +    DC_EVENT_REGION_CONFIG_UPDATED = 0x3,
> +    DC_EVENT_ADD_CAPACITY_RSP = 0x4,
> +    DC_EVENT_CAPACITY_RELEASED = 0x5,
> +} CXLDCEventType;
> +
> +/*
> + * Check whether the range [dpa, dpa + len - 1] has overlaps with extents in
> + * the list.
> + * Return value: true if any extent overlaps the range; otherwise, false
> + */
> +static bool cxl_extents_overlaps_dpa_range(CXLDCExtentList *list,
> +                                           uint64_t dpa, uint64_t len)
> +{
> +    CXLDCExtent *ent;
> +    Range range1, range2;
> +
> +    if (!list) {
> +        return false;
> +    }
> +
> +    range_init_nofail(&range1, dpa, len);
> +    QTAILQ_FOREACH(ent, list, node) {
> +        range_init_nofail(&range2, ent->start_dpa, ent->len);
> +        if (range_overlaps_range(&range1, &range2)) {
> +            return true;
> +        }
> +    }
> +    return false;
> +}
> +
> +/*
> + * Check whether the range [dpa, dpa + len - 1] is contained by extents in
> + * the list.
> + * Will check multiple extents containment once superset release is added.
> + * Return value: return true if range is contained; otherwise, return false
> + */
> +bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
> +                                    uint64_t dpa, uint64_t len)
> +{
> +    CXLDCExtent *ent;
> +    Range range1, range2;
> +
> +    if (!list) {
> +        return false;
> +    }
> +
> +    range_init_nofail(&range1, dpa, len);
> +    QTAILQ_FOREACH(ent, list, node) {
> +        range_init_nofail(&range2, ent->start_dpa, ent->len);
> +        if (range_contains_range(&range2, &range1)) {
> +            return true;
> +        }
> +    }
> +    return false;
> +}
> +
> +static bool cxl_extent_groups_overlaps_dpa_range(CXLDCExtentGroupList *list,
> +                                                uint64_t dpa, uint64_t len)
> +{
> +    CXLDCExtentGroup *group;
> +
> +    if (!list) {
> +        return false;
> +    }
> +
> +    QTAILQ_FOREACH(group, list, node) {
> +        if (cxl_extents_overlaps_dpa_range(&group->list, dpa, len)) {
> +            return true;
> +        }
> +    }
> +    return false;
> +}
> +
> +/*
> + * The main function to process dynamic capacity event with extent list.
> + * Currently DC extents add/release requests are processed.
> + */
> +static void qmp_cxl_process_dynamic_capacity_prescriptive(const char *path,
> +        uint16_t hid, CXLDCEventType type, uint8_t rid,
> +        CXLDCExtentRecordList *records, Error **errp)
> +{
> +    Object *obj;
> +    CXLEventDynamicCapacity dCap = {};
> +    CXLEventRecordHdr *hdr = &dCap.hdr;
> +    CXLType3Dev *dcd;
> +    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
> +    uint32_t num_extents = 0;
> +    CXLDCExtentRecordList *list;
> +    CXLDCExtentGroup *group = NULL;
> +    g_autofree CXLDCExtentRaw *extents = NULL;
> +    uint8_t enc_log = CXL_EVENT_TYPE_DYNAMIC_CAP;
> +    uint64_t dpa, offset, len, block_size;
> +    g_autofree unsigned long *blk_bitmap = NULL;
> +    int i;
> +
> +    obj = object_resolve_path_type(path, TYPE_CXL_TYPE3, NULL);
> +    if (!obj) {
> +        error_setg(errp, "Unable to resolve CXL type 3 device");
> +        return;
> +    }
> +
> +    dcd = CXL_TYPE3(obj);
> +    if (!dcd->dc.num_regions) {
> +        error_setg(errp, "No dynamic capacity support from the device");
> +        return;
> +    }
> +
> +
> +    if (rid >= dcd->dc.num_regions) {
> +        error_setg(errp, "region id is too large");
> +        return;
> +    }
> +    block_size = dcd->dc.regions[rid].block_size;
> +    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
> +
> +    /* Sanity check and count the extents */
> +    list = records;
> +    while (list) {
> +        offset = list->value->offset;
> +        len = list->value->len;
> +        dpa = offset + dcd->dc.regions[rid].base;
> +
> +        if (len == 0) {
> +            error_setg(errp, "extent with 0 length is not allowed");
> +            return;
> +        }
> +
> +        if (offset % block_size || len % block_size) {
> +            error_setg(errp, "dpa or len is not aligned to region block size");
> +            return;
> +        }
> +
> +        if (offset + len > dcd->dc.regions[rid].len) {
> +            error_setg(errp, "extent range is beyond the region end");
> +            return;
> +        }
> +
> +        /* No duplicate or overlapped extents are allowed */
> +        if (test_any_bits_set(blk_bitmap, offset / block_size,
> +                              len / block_size)) {
> +            error_setg(errp, "duplicate or overlapped extents are detected");
> +            return;
> +        }
> +        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> +
> +        if (type == DC_EVENT_RELEASE_CAPACITY) {
> +            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
> +                                                     dpa, len)) {
> +                error_setg(errp,
> +                           "cannot release extent with pending DPA range");
> +                return;
> +            }
> +            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents, dpa, len)) {
> +                error_setg(errp,
> +                           "cannot release extent with non-existing DPA range");
> +                return;
> +            }
> +        } else if (type == DC_EVENT_ADD_CAPACITY) {
> +            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents, dpa, len)) {
> +                error_setg(errp,
> +                           "cannot add DPA already accessible to the same LD");
> +                return;
> +            }
> +            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
> +                                                     dpa, len)) {
> +                error_setg(errp,
> +                           "cannot add DPA again while still pending");
> +                return;
> +            }
> +        }
> +        list = list->next;
> +        num_extents++;
> +    }
> +
> +    /* Create extent list for event being passed to host */
> +    i = 0;
> +    list = records;
> +    extents = g_new0(CXLDCExtentRaw, num_extents);
> +    while (list) {
> +        offset = list->value->offset;
> +        len = list->value->len;
> +        dpa = dcd->dc.regions[rid].base + offset;
> +
> +        extents[i].start_dpa = dpa;
> +        extents[i].len = len;
> +        memset(extents[i].tag, 0, 0x10);
> +        extents[i].shared_seq = 0;
> +        if (type == DC_EVENT_ADD_CAPACITY) {
> +            group = cxl_insert_extent_to_extent_group(group,
> +                                                      extents[i].start_dpa,
> +                                                      extents[i].len,
> +                                                      extents[i].tag,
> +                                                      extents[i].shared_seq);
> +        }
> +
> +        list = list->next;
> +        i++;
> +    }
> +    if (group) {
> +        cxl_extent_group_list_insert_tail(&dcd->dc.extents_pending, group);
> +    }
> +
> +    /*
> +     * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
> +     *
> +     * All Dynamic Capacity event records shall set the Event Record Severity
> +     * field in the Common Event Record Format to Informational Event. All
> +     * Dynamic Capacity related events shall be logged in the Dynamic Capacity
> +     * Event Log.
> +     */
> +    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
> +                            cxl_device_get_timestamp(&dcd->cxl_dstate));
> +
> +    dCap.type = type;
> +    /* FIXME: for now, validity flag is cleared */
> +    dCap.validity_flags = 0;
> +    stw_le_p(&dCap.host_id, hid);
> +    /* only valid for DC_REGION_CONFIG_UPDATED event */
> +    dCap.updated_region_id = 0;
> +    dCap.flags = 0;
> +    for (i = 0; i < num_extents; i++) {
> +        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
> +               sizeof(CXLDCExtentRaw));
> +        /* "More" flag: set on every record of the group except the last */
> +        dCap.flags = 0;
> +        if (i < num_extents - 1) {
> +            dCap.flags |= BIT(0);
> +        }
> +
> +        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
> +                             (CXLEventRecordRaw *)&dCap)) {
> +            cxl_event_irq_assert(dcd);
> +        }
> +    }
> +}
> +
> +void qmp_cxl_add_dynamic_capacity(const char *path, uint16_t hid,
> +                                  uint8_t sel_policy, uint8_t region_id,
> +                                  const char *tag,
> +                                  CXLDCExtentRecordList  *records,
> +                                  Error **errp)
> +{
> +    enum {
> +        CXL_SEL_POLICY_FREE,
> +        CXL_SEL_POLICY_CONTIGUOUS,
> +        CXL_SEL_POLICY_PRESCRIPTIVE,
> +        CXL_SEL_POLICY_ENABLESHAREDACCESS,
> +    };
> +    switch (sel_policy) {
> +    case CXL_SEL_POLICY_PRESCRIPTIVE:
> +        qmp_cxl_process_dynamic_capacity_prescriptive(path, hid,
> +                                                      DC_EVENT_ADD_CAPACITY,
> +                                                      region_id, records, errp);
> +        return;
> +    default:
> +        error_setg(errp, "Selection policy not supported");
> +        return;
> +    }
> +}
> +
> +#define REMOVAL_POLICY_MASK 0xf
> +#define REMOVAL_POLICY_PRESCRIPTIVE 1
> +#define FORCED_REMOVAL_BIT BIT(4)
> +
> +void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t hid,
> +                                      uint8_t flags, uint8_t region_id,
> +                                      const char *tag,
> +                                      CXLDCExtentRecordList  *records,
> +                                      Error **errp)
> +{
> +    CXLDCEventType type = DC_EVENT_RELEASE_CAPACITY;
> +
> +    if (flags & FORCED_REMOVAL_BIT) {
> +        /* TODO: enable forced removal in the future */
> +        type = DC_EVENT_FORCED_RELEASE_CAPACITY;
> +        error_setg(errp, "Forced removal not supported yet");
> +        return;
> +    }
> +
> +    switch (flags & REMOVAL_POLICY_MASK) {
> +    case REMOVAL_POLICY_PRESCRIPTIVE:
> +        qmp_cxl_process_dynamic_capacity_prescriptive(path, hid, type,
> +                                                      region_id, records, errp);
> +        return;
> +    default:
> +        error_setg(errp, "Removal policy not supported");
> +        return;
> +    }
> +}
> +
>  static void ct3_class_init(ObjectClass *oc, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(oc);
> diff --git a/hw/mem/cxl_type3_stubs.c b/hw/mem/cxl_type3_stubs.c
> index 3e1851e32b..810685e0d5 100644
> --- a/hw/mem/cxl_type3_stubs.c
> +++ b/hw/mem/cxl_type3_stubs.c
> @@ -67,3 +67,23 @@ void qmp_cxl_inject_correctable_error(const char *path, CxlCorErrorType type,
>  {
>      error_setg(errp, "CXL Type 3 support is not compiled in");
>  }
> +
> +void qmp_cxl_add_dynamic_capacity(const char *path,
> +                                  uint16_t hid,
> +                                  uint8_t sel_policy,
> +                                  uint8_t region_id,
> +                                  const char *tag,
> +                                  CXLDCExtentRecordList  *records,
> +                                  Error **errp)
> +{
> +    error_setg(errp, "CXL Type 3 support is not compiled in");
> +}
> +
> +void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t hid,
> +                                      uint8_t flags, uint8_t region_id,
> +                                      const char *tag,
> +                                      CXLDCExtentRecordList  *records,
> +                                      Error **errp)
> +{
> +    error_setg(errp, "CXL Type 3 support is not compiled in");
> +}
> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index df3511e91b..c69ff6b5de 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -443,6 +443,12 @@ typedef struct CXLDCExtent {
>  } CXLDCExtent;
>  typedef QTAILQ_HEAD(, CXLDCExtent) CXLDCExtentList;
>  
> +typedef struct CXLDCExtentGroup {
> +    CXLDCExtentList list;
> +    QTAILQ_ENTRY(CXLDCExtentGroup) node;
> +} CXLDCExtentGroup;
> +typedef QTAILQ_HEAD(, CXLDCExtentGroup) CXLDCExtentGroupList;
> +
>  typedef struct CXLDCRegion {
>      uint64_t base;       /* aligned to 256*MiB */
>      uint64_t decode_len; /* aligned to 256*MiB */
> @@ -494,6 +500,7 @@ struct CXLType3Dev {
>           */
>          uint64_t total_capacity; /* 256M aligned */
>          CXLDCExtentList extents;
> +        CXLDCExtentGroupList extents_pending;
>          uint32_t total_extent_count;
>          uint32_t ext_list_gen_seq;
>  
> @@ -555,4 +562,19 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
>  
>  void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
>                                          CXLDCExtent *extent);
> +void cxl_insert_extent_to_extent_list(CXLDCExtentList *list, uint64_t dpa,
> +                                      uint64_t len, uint8_t *tag,
> +                                      uint16_t shared_seq);
> +bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
> +                       unsigned long size);
> +bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
> +                                    uint64_t dpa, uint64_t len);
> +CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
> +                                                    uint64_t dpa,
> +                                                    uint64_t len,
> +                                                    uint8_t *tag,
> +                                                    uint16_t shared_seq);
> +void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
> +                                       CXLDCExtentGroup *group);
> +void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list);
>  #endif
> diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
> index 5170b8dbf8..38cadaa0f3 100644
> --- a/include/hw/cxl/cxl_events.h
> +++ b/include/hw/cxl/cxl_events.h
> @@ -166,4 +166,22 @@ typedef struct CXLEventMemoryModule {
>      uint8_t reserved[0x3d];
>  } QEMU_PACKED CXLEventMemoryModule;
>  
> +/*
> + * CXL r3.1 Table 8-50: Dynamic Capacity Event Record
> + * All fields little endian.
> + */
> +typedef struct CXLEventDynamicCapacity {
> +    CXLEventRecordHdr hdr;
> +    uint8_t type;
> +    uint8_t validity_flags;
> +    uint16_t host_id;
> +    uint8_t updated_region_id;
> +    uint8_t flags;
> +    uint8_t reserved2[2];
> +    uint8_t dynamic_capacity_extent[0x28]; /* defined in cxl_device.h */
> +    uint8_t reserved[0x18];
> +    uint32_t extents_avail;
> +    uint32_t tags_avail;
> +} QEMU_PACKED CXLEventDynamicCapacity;
> +
>  #endif /* CXL_EVENTS_H */
> diff --git a/qapi/cxl.json b/qapi/cxl.json
> index 4281726dec..2dcf03d973 100644
> --- a/qapi/cxl.json
> +++ b/qapi/cxl.json
> @@ -361,3 +361,72 @@
>  ##
>  {'command': 'cxl-inject-correctable-error',
>   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
> +
> +##
> +# @CXLDCExtentRecord:
> +#
> +# Record of a single extent to add/release
> +#
> +# @offset: offset of the extent from the start of the region
> +# @len: length of the extent
> +#
> +# Since: 9.1
> +##
> +{ 'struct': 'CXLDCExtentRecord',
> +  'data': {
> +      'offset':'uint64',
> +      'len': 'uint64'
> +  }
> +}
> +
> +##
> +# @cxl-add-dynamic-capacity:
> +#
> +# Command to start the add dynamic capacity extents flow. The host will
> +# have to acknowledge acceptance of the extents before they are usable.
> +#
> +# @path: CXL DCD canonical QOM path
> +# @hid: host id
> +# @selection-policy: policy to use for selecting extents for adding capacity
> +# @region-id: id of the region to which the extents are added
> +# @tag: Context field
> +# @extents: Extents to add
> +#
> +# Since: 9.1
> +##
> +{ 'command': 'cxl-add-dynamic-capacity',
> +  'data': { 'path': 'str',
> +            'hid': 'uint16',
> +            'selection-policy': 'uint8',
> +            'region-id': 'uint8',
> +            'tag': 'str',
> +            'extents': [ 'CXLDCExtentRecord' ]
> +           }
> +}
> +
> +##
> +# @cxl-release-dynamic-capacity:
> +#
> +# Command to start the release dynamic capacity extents flow. The host will
> +# need to respond to indicate that it has released the capacity before it
> +# is made unavailable for read and write and can be re-added.
> +#
> +# @path: CXL DCD canonical QOM path
> +# @hid: host id
> +# @flags: bit[3:0] for removal policy, bit[4] for forced removal, bit[5] for
> +#     sanitize on release, bit[7:6] reserved
> +# @region-id: id of the region from which the extents are released
> +# @tag: Context field
> +# @extents: Extents to release
> +#
> +# Since: 9.1
> +##
> +{ 'command': 'cxl-release-dynamic-capacity',
> +  'data': { 'path': 'str',
> +            'hid': 'uint16',
> +            'flags': 'uint8',
> +            'region-id': 'uint8',
> +            'tag': 'str',
> +            'extents': [ 'CXLDCExtentRecord' ]
> +           }
> +}
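
For readers following the sanity checks in
qmp_cxl_process_dynamic_capacity_prescriptive(), the per-extent validation
can be sketched outside QEMU roughly as below. This is a minimal standalone
sketch, not the patch's code: Region, ExtentRecord and validate_extents()
are hypothetical names, a plain bool array stands in for QEMU's bitmap
helpers, and the overflow-safe form of the bounds check is an assumption
about what is desirable rather than something the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for a DC region and a QMP extent. */
typedef struct {
    uint64_t len;         /* region length in bytes */
    uint64_t block_size;  /* allocation granularity in bytes */
} Region;

typedef struct {
    uint64_t offset;      /* offset from the start of the region */
    uint64_t len;         /* extent length in bytes */
} ExtentRecord;

/* Returns NULL if every record is acceptable, else a short reason string. */
static const char *validate_extents(const Region *r,
                                    const ExtentRecord *recs, size_t n)
{
    assert(r->block_size != 0);

    size_t nblocks = (size_t)(r->len / r->block_size);
    /* One "claimed" flag per block; plays the role of blk_bitmap. */
    bool *used = calloc(nblocks ? nblocks : 1, sizeof(*used));
    const char *err = NULL;

    if (!used) {
        return "out of memory";
    }

    for (size_t i = 0; i < n && !err; i++) {
        uint64_t off = recs[i].offset;
        uint64_t len = recs[i].len;

        if (len == 0) {
            err = "extent with 0 length is not allowed";
        } else if (off % r->block_size || len % r->block_size) {
            err = "offset or len is not aligned to region block size";
        } else if (off > r->len || len > r->len - off) {
            /* Same intent as "offset + len > region len", without wrap. */
            err = "extent range is beyond the region end";
        } else {
            uint64_t first = off / r->block_size;
            uint64_t last = (off + len) / r->block_size; /* exclusive */

            for (uint64_t b = first; b < last; b++) {
                if (used[b]) {
                    err = "duplicate or overlapped extents are detected";
                    break;
                }
                used[b] = true;
            }
        }
    }

    free(used);
    return err;
}
```

With 4 KiB blocks, passing the two records {0, 8192} and {4096, 4096}
returns the overlap reason, because block 1 is claimed by both extents.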

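The event loop at the end of the prescriptive handler emits one Dynamic
Capacity event record per extent and is meant to set the "More" flag
(bit 0 of the flags byte) on every record except the last one of the group.
A minimal sketch of that rule follows; DC_EVENT_FLAG_MORE and
fill_more_flags() are hypothetical names used only for illustration, and
the function computes just the flags byte, not a full event record.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DC_EVENT_FLAG_MORE (1u << 0)   /* hypothetical name for bit 0 */

/*
 * Fill flags_out[i] with the flags byte for the i-th record in a group of
 * n records: "More" is set on all records but the last.
 */
static void fill_more_flags(uint8_t *flags_out, size_t n)
{
    assert(flags_out != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        /* Recomputed each iteration so the bit cannot carry over. */
        flags_out[i] = (i + 1 < n) ? DC_EVENT_FLAG_MORE : 0;
    }
}
```

Computing the value from scratch per record, rather than OR-ing into a
variable shared across iterations, keeps the final record's flag clear.
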


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v7 00/12] Enabling DCD emulation support in Qemu
  2024-04-20 20:35     ` Gregory Price
@ 2024-04-22 12:04         ` Jonathan Cameron via
  0 siblings, 0 replies; 65+ messages in thread
From: Jonathan Cameron @ 2024-04-22 12:04 UTC (permalink / raw)
  To: Gregory Price
  Cc: fan, qemu-devel, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
	wj28.lee, fan.ni

On Sat, 20 Apr 2024 16:35:46 -0400
Gregory Price <gregory.price@memverge.com> wrote:

> On Fri, Apr 19, 2024 at 11:43:14AM -0700, fan wrote:
> > On Fri, Apr 19, 2024 at 02:24:36PM -0400, Gregory Price wrote:  
> > > 
> > > added review to all patches, will hopefully be able to add a Tested-by
> > > tag early next week, along with a v1 RFC for MHD bit-tracking.
> > > 
> > > We've been testing v5/v6 for a bit, so I expect as soon as we get the
> > > MHD code ported over to v7 i'll ship a tested-by tag pretty quick.
> > > 
> > > The super-set release will complicate a few things but this doesn't
> > > look like a blocker on our end, just a change to how we track bits in a
> > > shared bit/bytemap.
> > >   
> > 
> > Hi Gregory,
> > Thanks for reviewing the patches so quickly. 
> > 
> > No pressure, but look forward to your MHD work. :)
> > 
> > Fan  
> 
> Starting to get into versioning hell a bit, since the Niagara work was
> based off of Jonathan's branch and the mhd-dcd work needs some of the
> extensions from that branch - while this branch is based on master.
> 
> Probably we'll need to wait for a new cxl dated branch to try and sus
> out the pain points before we push an RFC.  I would not want to have
> conflicting commits for something like this for example:
> 
> https://lore.kernel.org/qemu-devel/20230901012914.226527-2-gregory.price@memverge.com/
> 
> We get merge conflicts here because this is behind that patch. So
> pushing up an RFC in this state would be mostly useless to everyone

Subtle hint noted ;) 

I'll build a fresh tree - any remaining rebases until QEMU 9.0 should be
straightforward anyway.  My ideal is that the NUMA GP series lands early
in the 9.1 cycle and this can go in parallel.  I'd really like to
get this in early if possible so we can start clearing some of the other
stuff that ended up built on top of it!

Jonathan

> 
> ~Gregory


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v7 00/12] Enabling DCD emulation support in Qemu
  2024-04-22 12:04         ` Jonathan Cameron via
@ 2024-04-22 14:23           ` Jonathan Cameron via
  -1 siblings, 0 replies; 65+ messages in thread
From: Jonathan Cameron @ 2024-04-22 14:23 UTC (permalink / raw)
  To: Gregory Price
  Cc: fan, qemu-devel, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
	wj28.lee, fan.ni

On Mon, 22 Apr 2024 13:04:48 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> On Sat, 20 Apr 2024 16:35:46 -0400
> Gregory Price <gregory.price@memverge.com> wrote:
> 
> > On Fri, Apr 19, 2024 at 11:43:14AM -0700, fan wrote:  
> > > On Fri, Apr 19, 2024 at 02:24:36PM -0400, Gregory Price wrote:    
> > > > 
> > > > added review to all patches, will hopefully be able to add a Tested-by
> > > > tag early next week, along with a v1 RFC for MHD bit-tracking.
> > > > 
> > > > We've been testing v5/v6 for a bit, so I expect as soon as we get the
> > > > MHD code ported over to v7 i'll ship a tested-by tag pretty quick.
> > > > 
> > > > The super-set release will complicate a few things but this doesn't
> > > > look like a blocker on our end, just a change to how we track bits in a
> > > > shared bit/bytemap.
> > > >     
> > > 
> > > Hi Gregory,
> > > Thanks for reviewing the patches so quickly. 
> > > 
> > > No pressure, but look forward to your MHD work. :)
> > > 
> > > Fan    
> > 
> > Starting to get into versioning hell a bit, since the Niagara work was
> > based off of Jonathan's branch and the mhd-dcd work needs some of the
> > extensions from that branch - while this branch is based on master.
> > 
> > Probably we'll need to wait for a new cxl dated branch to try and sus
> > out the pain points before we push an RFC.  I would not want to have
> > conflicting commits for something like this for example:
> > 
> > https://lore.kernel.org/qemu-devel/20230901012914.226527-2-gregory.price@memverge.com/
> > 
> > We get merge conflicts here because this is behind that patch. So
> > pushing up an RFC in this state would be mostly useless to everyone  
> 
> Subtle hint noted ;) 
> 
> I'll build a fresh tree - any remaining rebases until QEMU 9.0 should be
> straightforward anyway.  My ideal is that the NUMA GP series lands early
> in the 9.1 cycle and this can go in parallel.  I'd really like to
> get this in early if possible so we can start clearing some of the other
> stuff that ended up built on top of it!

I've pushed to gitlab.com/jic23/qemu cxl-2024-04-22-draft
It's extremely lightly tested so far.

To save time, I've temporarily dropped the fm-api DCD initiate
dynamic capacity add patch as that needs non-trivial updates.

I've not yet caught up with some other outstanding series, but
I will almost certainly put them on top of DCD.

Jonathan

> 
> Jonathan
> 
> > 
> > ~Gregory  
> 
> 


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v7 00/12] Enabling DCD emulation support in Qemu
  2024-04-22 14:23           ` Jonathan Cameron via
@ 2024-04-22 15:07             ` Jonathan Cameron via
  -1 siblings, 0 replies; 65+ messages in thread
From: Jonathan Cameron @ 2024-04-22 15:07 UTC (permalink / raw)
  To: Gregory Price
  Cc: fan, qemu-devel, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
	wj28.lee, fan.ni

On Mon, 22 Apr 2024 15:23:16 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:

> On Mon, 22 Apr 2024 13:04:48 +0100
> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> 
> > On Sat, 20 Apr 2024 16:35:46 -0400
> > Gregory Price <gregory.price@memverge.com> wrote:
> >   
> > > On Fri, Apr 19, 2024 at 11:43:14AM -0700, fan wrote:    
> > > > On Fri, Apr 19, 2024 at 02:24:36PM -0400, Gregory Price wrote:      
> > > > > 
> > > > > added review to all patches, will hopefully be able to add a Tested-by
> > > > > tag early next week, along with a v1 RFC for MHD bit-tracking.
> > > > > 
> > > > > We've been testing v5/v6 for a bit, so I expect as soon as we get the
> > > > > > MHD code ported over to v7 I'll ship a tested-by tag pretty quick.
> > > > > 
> > > > > The super-set release will complicate a few things but this doesn't
> > > > > look like a blocker on our end, just a change to how we track bits in a
> > > > > shared bit/bytemap.
> > > > >       
> > > > 
> > > > Hi Gregory,
> > > > Thanks for reviewing the patches so quickly. 
> > > > 
> > > > No pressure, but look forward to your MHD work. :)
> > > > 
> > > > Fan      
> > > 
> > > > Starting to get into versioning hell a bit, since the Niagara work was
> > > > based off of Jonathan's branch and the mhd-dcd work needs some of the
> > > > extensions from that branch - while this branch is based on master.
> > > 
> > > Probably we'll need to wait for a new cxl dated branch to try and sus
> > > out the pain points before we push an RFC.  I would not want to have
> > > conflicting commits for something like this for example:
> > > 
> > > https://lore.kernel.org/qemu-devel/20230901012914.226527-2-gregory.price@memverge.com/
> > > 
> > > We get merge conflicts here because this is behind that patch. So
> > > pushing up an RFC in this state would be mostly useless to everyone    
> > 
> > Subtle hint noted ;) 
> > 
> > I'll build a fresh tree - any remaining rebases until QEMU 9.0 should be
> > > straightforward anyway.   My ideal is that the NUMA GP series lands early
> > in 9.1 cycle and this can go in parallel.  I'd really like to
> > get this in early if possible so we can start clearing some of the other
> > stuff that ended up built on top of it!  
> 
> I've pushed to gitlab.com/jic23/qemu cxl-2024-04-22-draft
> It's extremely lightly tested so far.
> 
> To save time, I've temporarily dropped the fm-api DCD initiate
> dynamic capacity add patch as that needs non trivial updates.
> 
> I've not yet caught up with some other outstanding series, but
> I will almost certainly put them on top of DCD.

If anyone pulled in the meantime... I failed to push down a fix from
my working tree on top of this.
Goes to show I shouldn't ignore patches simply named "Push down" :(

Updated on same branch.

Jonathan
> 
> Jonathan
> 
> > 
> > Jonathan
> >   
> > > 
> > > ~Gregory    
> > 
> >   
> 
> 



* Re: [PATCH v7 00/12] Enabling DCD emulation support in Qemu
  2024-04-22 12:04         ` Jonathan Cameron via
  (?)
  (?)
@ 2024-04-22 15:42         ` Gregory Price
  -1 siblings, 0 replies; 65+ messages in thread
From: Gregory Price @ 2024-04-22 15:42 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: fan, qemu-devel, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
	wj28.lee, fan.ni

On Mon, Apr 22, 2024 at 01:04:48PM +0100, Jonathan Cameron wrote:
> On Sat, 20 Apr 2024 16:35:46 -0400
> Gregory Price <gregory.price@memverge.com> wrote:
> 
> > On Fri, Apr 19, 2024 at 11:43:14AM -0700, fan wrote:
> > > On Fri, Apr 19, 2024 at 02:24:36PM -0400, Gregory Price wrote:  
> > > > 
> > > > added review to all patches, will hopefully be able to add a Tested-by
> > > > tag early next week, along with a v1 RFC for MHD bit-tracking.
> > > > 
> > > > We've been testing v5/v6 for a bit, so I expect as soon as we get the
> > > > MHD code ported over to v7 I'll ship a tested-by tag pretty quick.
> > > > 
> > > > The super-set release will complicate a few things but this doesn't
> > > > look like a blocker on our end, just a change to how we track bits in a
> > > > shared bit/bytemap.
> > > >   
> > > 
> > > Hi Gregory,
> > > Thanks for reviewing the patches so quickly. 
> > > 
> > > No pressure, but look forward to your MHD work. :)
> > > 
> > > Fan  
> > 
> > Starting to get into versioning hell a bit, since the Niagara work was
> > based off of Jonathan's branch and the mhd-dcd work needs some of the
> > extensions from that branch - while this branch is based on master.
> > 
> > Probably we'll need to wait for a new cxl dated branch to try and sus
> > out the pain points before we push an RFC.  I would not want to have
> > conflicting commits for something like this for example:
> > 
> > https://lore.kernel.org/qemu-devel/20230901012914.226527-2-gregory.price@memverge.com/
> > 
> > We get merge conflicts here because this is behind that patch. So
> > pushing up an RFC in this state would be mostly useless to everyone
> 
> Subtle hint noted ;) 
>

Gentle nudge/poke/prod :P

Got your updates, thank you!  We should have something cleaned up today hopefully.

> I'll build a fresh tree - any remaining rebases until QEMU 9.0 should be
> straightforward anyway.   My ideal is that the NUMA GP series lands early
> in 9.1 cycle and this can go in parallel.  I'd really like to
> get this in early if possible so we can start clearing some of the other
> stuff that ended up built on top of it!
> 
> Jonathan
> 
> > 
> > ~Gregory
> 


* Re: [PATCH v7 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2024-04-18 23:11 ` [PATCH v7 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents nifan.cxl
  2024-04-19 18:13   ` Gregory Price
  2024-04-22 12:01     ` Jonathan Cameron via
@ 2024-04-26  9:12   ` Markus Armbruster
  2024-04-26 17:31     ` fan
  2024-05-14  2:35     ` Zhijian Li (Fujitsu) via
  3 siblings, 1 reply; 65+ messages in thread
From: Markus Armbruster @ 2024-04-26  9:12 UTC (permalink / raw)
  To: nifan.cxl
  Cc: qemu-devel, jonathan.cameron, linux-cxl, gregory.price,
	ira.weiny, dan.j.williams, a.manzanares, dave, nmtadam.samsung,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni

nifan.cxl@gmail.com writes:

> From: Fan Ni <fan.ni@samsung.com>
>
> To simulate FM functionalities for initiating Dynamic Capacity Add
> (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> add/release dynamic capacity extents requests.
>
> With this change, an extent can be released only when its DPA range
> is contained by a single accepted extent in the device. That is to say,
> extent superset release is not supported yet.
>
> 1. Add dynamic capacity extents:
>
> For example, the command to add two contiguous extents (each 128MiB long)
> to region 0 (starting at DPA offset 0) looks like this:
>
> { "execute": "qmp_capabilities" }
>
> { "execute": "cxl-add-dynamic-capacity",
>   "arguments": {
>       "path": "/machine/peripheral/cxl-dcd0",
>       "hid": 0,
>       "selection-policy": 2,
>       "region-id": 0,
>       "tag": "",
>       "extents": [
>       {
>           "offset": 0,
>           "len": 134217728
>       },
>       {
>           "offset": 134217728,
>           "len": 134217728
>       }
>       ]
>   }
> }
>
> 2. Release dynamic capacity extents:
>
> For example, the command to release an extent of size 128MiB from region 0
> (DPA offset 128MiB) looks like below:
>
> { "execute": "cxl-release-dynamic-capacity",
>   "arguments": {
>       "path": "/machine/peripheral/cxl-dcd0",
>       "hid": 0,
>       "flags": 1,
>       "region-id": 0,
>       "tag": "",
>       "extents": [
>       {
>           "offset": 134217728,
>           "len": 134217728
>       }
>       ]
>   }
> }
>
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>  hw/cxl/cxl-mailbox-utils.c  |  62 +++++--
>  hw/mem/cxl_type3.c          | 311 +++++++++++++++++++++++++++++++++++-
>  hw/mem/cxl_type3_stubs.c    |  20 +++
>  include/hw/cxl/cxl_device.h |  22 +++
>  include/hw/cxl/cxl_events.h |  18 +++
>  qapi/cxl.json               |  69 ++++++++
>  6 files changed, 489 insertions(+), 13 deletions(-)
>
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 9d54e10cd4..3569902e9e 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -1405,7 +1405,7 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
>   * Check whether any bit between addr[nr, nr+size) is set,
>   * return true if any bit is set, otherwise return false
>   */
> -static bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
> +bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
>                                unsigned long size)
>  {
>      unsigned long res = find_next_bit(addr, size + nr, nr);
> @@ -1444,7 +1444,7 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
>      return NULL;
>  }
>  
> -static void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
> +void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
>                                               uint64_t dpa,
>                                               uint64_t len,
>                                               uint8_t *tag,
> @@ -1470,6 +1470,44 @@ void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
>      g_free(extent);
>  }
>  
> +/*
> + * Add a new extent to the extent "group" if group exists;
> + * otherwise, create a new group
> + * Return value: return the group where the extent is inserted.
> + */
> +CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
> +                                                    uint64_t dpa,
> +                                                    uint64_t len,
> +                                                    uint8_t *tag,
> +                                                    uint16_t shared_seq)
> +{
> +    if (!group) {
> +        group = g_new0(CXLDCExtentGroup, 1);
> +        QTAILQ_INIT(&group->list);
> +    }
> +    cxl_insert_extent_to_extent_list(&group->list, dpa, len,
> +                                     tag, shared_seq);
> +    return group;
> +}
> +
> +void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
> +                                       CXLDCExtentGroup *group)
> +{
> +    QTAILQ_INSERT_TAIL(list, group, node);
> +}
> +
> +void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list)
> +{
> +    CXLDCExtent *ent, *ent_next;
> +    CXLDCExtentGroup *group = QTAILQ_FIRST(list);
> +
> +    QTAILQ_REMOVE(list, group, node);
> +    QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
> +        cxl_remove_extent_from_extent_list(&group->list, ent);
> +    }
> +    g_free(group);
> +}
> +
>  /*
>   * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload
>   * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload
> @@ -1541,6 +1579,7 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
>  {
>      uint32_t i;
>      CXLDCExtent *ent;
> +    CXLDCExtentGroup *ext_group;
>      uint64_t dpa, len;
>      Range range1, range2;
>  
> @@ -1551,9 +1590,13 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
>          range_init_nofail(&range1, dpa, len);
>  
>          /*
> -         * TODO: once the pending extent list is added, check against
> -         * the list will be added here.
> +         * The host-accepted DPA range must be contained by the first extent
> +         * group in the pending list
>           */
> +        ext_group = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> +        if (!cxl_extents_contains_dpa_range(&ext_group->list, dpa, len)) {
> +            return CXL_MBOX_INVALID_PA;
> +        }
>  
>          /* to-be-added range should not overlap with range already accepted */
>          QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
> @@ -1586,10 +1629,7 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
>      CXLRetCode ret;
>  
>      if (in->num_entries_updated == 0) {
> -        /*
> -         * TODO: once the pending list is introduced, extents in the beginning
> -         * will get wiped out.
> -         */
> +        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
>          return CXL_MBOX_SUCCESS;
>      }
>  
> @@ -1615,11 +1655,9 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
>  
>          cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
>          ct3d->dc.total_extent_count += 1;
> -        /*
> -         * TODO: we will add a pending extent list based on event log record
> -         * and process the list accordingly here.
> -         */
>      }
> +    /* Remove the first extent group in the pending list*/
> +    cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
>  
>      return CXL_MBOX_SUCCESS;
>  }
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index c2cdd6d506..e892b3de7b 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -667,6 +667,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
>          ct3d->dc.total_capacity += region->len;
>      }
>      QTAILQ_INIT(&ct3d->dc.extents);
> +    QTAILQ_INIT(&ct3d->dc.extents_pending);
>  
>      return true;
>  }
> @@ -674,10 +675,19 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
>  static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
>  {
>      CXLDCExtent *ent, *ent_next;
> +    CXLDCExtentGroup *group, *group_next;
>  
>      QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
>          cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
>      }
> +
> +    QTAILQ_FOREACH_SAFE(group, &ct3d->dc.extents_pending, node, group_next) {
> +        QTAILQ_REMOVE(&ct3d->dc.extents_pending, group, node);
> +        QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
> +            cxl_remove_extent_from_extent_list(&group->list, ent);
> +        }
> +        g_free(group);
> +    }
>  }
>  
>  static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> @@ -1443,7 +1453,6 @@ static int ct3d_qmp_cxl_event_log_enc(CxlEventLog log)
>          return CXL_EVENT_TYPE_FAIL;
>      case CXL_EVENT_LOG_FATAL:
>          return CXL_EVENT_TYPE_FATAL;
> -/* DCD not yet supported */
>      default:
>          return -EINVAL;
>      }
> @@ -1694,6 +1703,306 @@ void qmp_cxl_inject_memory_module_event(const char *path, CxlEventLog log,
>      }
>  }
>  
> +/* CXL r3.1 Table 8-50: Dynamic Capacity Event Record */
> +static const QemuUUID dynamic_capacity_uuid = {
> +    .data = UUID(0xca95afa7, 0xf183, 0x4018, 0x8c, 0x2f,
> +                 0x95, 0x26, 0x8e, 0x10, 0x1a, 0x2a),
> +};
> +
> +typedef enum CXLDCEventType {
> +    DC_EVENT_ADD_CAPACITY = 0x0,
> +    DC_EVENT_RELEASE_CAPACITY = 0x1,
> +    DC_EVENT_FORCED_RELEASE_CAPACITY = 0x2,
> +    DC_EVENT_REGION_CONFIG_UPDATED = 0x3,
> +    DC_EVENT_ADD_CAPACITY_RSP = 0x4,
> +    DC_EVENT_CAPACITY_RELEASED = 0x5,
> +} CXLDCEventType;
> +
> +/*
> + * Check whether the range [dpa, dpa + len - 1] has overlaps with extents in
> + * the list.
> + * Return value: return true if has overlaps; otherwise, return false
> + */
> +static bool cxl_extents_overlaps_dpa_range(CXLDCExtentList *list,
> +                                           uint64_t dpa, uint64_t len)
> +{
> +    CXLDCExtent *ent;
> +    Range range1, range2;
> +
> +    if (!list) {
> +        return false;
> +    }
> +
> +    range_init_nofail(&range1, dpa, len);
> +    QTAILQ_FOREACH(ent, list, node) {
> +        range_init_nofail(&range2, ent->start_dpa, ent->len);
> +        if (range_overlaps_range(&range1, &range2)) {
> +            return true;
> +        }
> +    }
> +    return false;
> +}
> +
> +/*
> + * Check whether the range [dpa, dpa + len - 1] is contained by extents in
> + * the list.
> + * Will check multiple extents containment once superset release is added.
> + * Return value: return true if range is contained; otherwise, return false
> + */
> +bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
> +                                    uint64_t dpa, uint64_t len)
> +{
> +    CXLDCExtent *ent;
> +    Range range1, range2;
> +
> +    if (!list) {
> +        return false;
> +    }
> +
> +    range_init_nofail(&range1, dpa, len);
> +    QTAILQ_FOREACH(ent, list, node) {
> +        range_init_nofail(&range2, ent->start_dpa, ent->len);
> +        if (range_contains_range(&range2, &range1)) {
> +            return true;
> +        }
> +    }
> +    return false;
> +}
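
For readers following the "single accepted extent" rule from the commit
message, here is a minimal standalone sketch of the containment test. It
does not use QEMU's Range helpers or CXLDCExtent; the Extent type and both
function names are hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an accepted extent; not QEMU's CXLDCExtent. */
typedef struct Extent {
    uint64_t start_dpa;
    uint64_t len;
} Extent;

/* True if [dpa, dpa + len) lies entirely inside the single extent e. */
static bool extent_contains(const Extent *e, uint64_t dpa, uint64_t len)
{
    assert(len > 0 && e->len > 0);
    /* Written with subtractions so that dpa + len cannot overflow. */
    return dpa >= e->start_dpa &&
           len <= e->len &&
           dpa - e->start_dpa <= e->len - len;
}

/* True if at least one extent in the array contains the whole range. */
static bool extents_contain_range(const Extent *ext, size_t n,
                                  uint64_t dpa, uint64_t len)
{
    for (size_t i = 0; i < n; i++) {
        if (extent_contains(&ext[i], dpa, len)) {
            return true;
        }
    }
    return false;
}
```

With two adjacent 128MiB extents accepted, a release of exactly the second
one passes this check, while a release that straddles both (a superset
release) fails it, which matches the limitation stated in the commit
message.
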
> +
> +static bool cxl_extent_groups_overlaps_dpa_range(CXLDCExtentGroupList *list,
> +                                                uint64_t dpa, uint64_t len)
> +{
> +    CXLDCExtentGroup *group;
> +
> +    if (!list) {
> +        return false;
> +    }
> +
> +    QTAILQ_FOREACH(group, list, node) {
> +        if (cxl_extents_overlaps_dpa_range(&group->list, dpa, len)) {
> +            return true;
> +        }
> +    }
> +    return false;
> +}
> +
> +/*
> + * The main function to process dynamic capacity event with extent list.
> + * Currently DC extents add/release requests are processed.
> + */
> +static void qmp_cxl_process_dynamic_capacity_prescriptive(const char *path,
> +        uint16_t hid, CXLDCEventType type, uint8_t rid,
> +        CXLDCExtentRecordList *records, Error **errp)
> +{
> +    Object *obj;
> +    CXLEventDynamicCapacity dCap = {};
> +    CXLEventRecordHdr *hdr = &dCap.hdr;
> +    CXLType3Dev *dcd;
> +    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
> +    uint32_t num_extents = 0;
> +    CXLDCExtentRecordList *list;
> +    CXLDCExtentGroup *group = NULL;
> +    g_autofree CXLDCExtentRaw *extents = NULL;
> +    uint8_t enc_log = CXL_EVENT_TYPE_DYNAMIC_CAP;
> +    uint64_t dpa, offset, len, block_size;
> +    g_autofree unsigned long *blk_bitmap = NULL;
> +    int i;
> +
> +    obj = object_resolve_path_type(path, TYPE_CXL_TYPE3, NULL);
> +    if (!obj) {
> +        error_setg(errp, "Unable to resolve CXL type 3 device");
> +        return;
> +    }
> +
> +    dcd = CXL_TYPE3(obj);
> +    if (!dcd->dc.num_regions) {
> +        error_setg(errp, "No dynamic capacity support from the device");
> +        return;
> +    }
> +
> +
> +    if (rid >= dcd->dc.num_regions) {
> +        error_setg(errp, "region id is too large");
> +        return;
> +    }
> +    block_size = dcd->dc.regions[rid].block_size;
> +    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
> +
> +    /* Sanity check and count the extents */
> +    list = records;
> +    while (list) {
> +        offset = list->value->offset;
> +        len = list->value->len;
> +        dpa = offset + dcd->dc.regions[rid].base;
> +
> +        if (len == 0) {
> +            error_setg(errp, "extent with 0 length is not allowed");
> +            return;
> +        }
> +
> +        if (offset % block_size || len % block_size) {
> +            error_setg(errp, "dpa or len is not aligned to region block size");
> +            return;
> +        }
> +
> +        if (offset + len > dcd->dc.regions[rid].len) {
> +            error_setg(errp, "extent range is beyond the region end");
> +            return;
> +        }
> +
> +        /* No duplicate or overlapped extents are allowed */
> +        if (test_any_bits_set(blk_bitmap, offset / block_size,
> +                              len / block_size)) {
> +            error_setg(errp, "duplicate or overlapped extents are detected");
> +            return;
> +        }
> +        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> +
> +        if (type == DC_EVENT_RELEASE_CAPACITY) {
> +            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
> +                                                     dpa, len)) {
> +                error_setg(errp,
> +                           "cannot release extent with pending DPA range");
> +                return;
> +            }
> +            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents, dpa, len)) {
> +                error_setg(errp,
> +                           "cannot release extent with non-existing DPA range");
> +                return;
> +            }
> +        } else if (type == DC_EVENT_ADD_CAPACITY) {
> +            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents, dpa, len)) {
> +                error_setg(errp,
> +                           "cannot add DPA already accessible  to the same LD");
> +                return;
> +            }
> +            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
> +                                                     dpa, len)) {
> +                error_setg(errp,
> +                           "cannot add DPA again while still pending");
> +                return;
> +            }
> +        }
> +        list = list->next;
> +        num_extents++;
> +    }
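
The sanity checks in the loop above (zero length, block alignment, region
bounds, no duplicates within one request) can be sketched on their own as
follows. This is an illustration under stated assumptions, not the patch's
code: ExtentReq and extent_request_is_valid are hypothetical names, and one
byte per block replaces the packed bitmap to keep the sketch short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical request entry: offset and length relative to region start. */
typedef struct ExtentReq {
    uint64_t offset;
    uint64_t len;
} ExtentReq;

/*
 * Returns true if every requested extent is non-empty, block aligned,
 * inside the region, and shares no block with another extent in the
 * same request.
 */
static bool extent_request_is_valid(const ExtentReq *req, size_t n,
                                    uint64_t region_len, uint64_t block_size)
{
    assert(block_size > 0 && region_len >= block_size);

    uint64_t nblocks = region_len / block_size;
    unsigned char *used = calloc(nblocks, 1); /* one marker per block */
    bool ok = (used != NULL);

    for (size_t i = 0; ok && i < n; i++) {
        uint64_t off = req[i].offset;
        uint64_t len = req[i].len;

        /* Bounds test avoids computing off + len before it is known safe. */
        if (len == 0 || off % block_size || len % block_size ||
            len > region_len || off > region_len - len) {
            ok = false;
            break;
        }
        /* Mark each covered block; a block seen twice means an overlap. */
        for (uint64_t b = off / block_size; b < (off + len) / block_size; b++) {
            if (used[b]) {
                ok = false;
                break;
            }
            used[b] = 1;
        }
    }
    free(used);
    return ok;
}
```

Marking blocks as they are visited makes duplicate detection linear in the
number of blocks covered, at the cost of allocating one marker per block in
the region.
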
> +
> +    /* Create extent list for event being passed to host */
> +    i = 0;
> +    list = records;
> +    extents = g_new0(CXLDCExtentRaw, num_extents);
> +    while (list) {
> +        offset = list->value->offset;
> +        len = list->value->len;
> +        dpa = dcd->dc.regions[rid].base + offset;
> +
> +        extents[i].start_dpa = dpa;
> +        extents[i].len = len;
> +        memset(extents[i].tag, 0, 0x10);
> +        extents[i].shared_seq = 0;
> +        if (type == DC_EVENT_ADD_CAPACITY) {
> +            group = cxl_insert_extent_to_extent_group(group,
> +                                                      extents[i].start_dpa,
> +                                                      extents[i].len,
> +                                                      extents[i].tag,
> +                                                      extents[i].shared_seq);
> +        }
> +
> +        list = list->next;
> +        i++;
> +    }
> +    if (group) {
> +        cxl_extent_group_list_insert_tail(&dcd->dc.extents_pending, group);
> +    }
> +
> +    /*
> +     * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
> +     *
> +     * All Dynamic Capacity event records shall set the Event Record Severity
> +     * field in the Common Event Record Format to Informational Event. All
> +     * Dynamic Capacity related events shall be logged in the Dynamic Capacity
> +     * Event Log.
> +     */
> +    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
> +                            cxl_device_get_timestamp(&dcd->cxl_dstate));
> +
> +    dCap.type = type;
> +    /* FIXME: for now, validity flag is cleared */
> +    dCap.validity_flags = 0;
> +    stw_le_p(&dCap.host_id, hid);
> +    /* only valid for DC_REGION_CONFIG_UPDATED event */
> +    dCap.updated_region_id = 0;
> +    dCap.flags = 0;
> +    for (i = 0; i < num_extents; i++) {
> +        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
> +               sizeof(CXLDCExtentRaw));
> +
> +        if (i < num_extents - 1) {
> +            /* Set "More" flag */
> +            dCap.flags |= BIT(0);
> +        }
> +
> +        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
> +                             (CXLEventRecordRaw *)&dCap)) {
> +            cxl_event_irq_assert(dcd);
> +        }
> +    }
> +}
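
One observation on the loop above: dCap.flags is cleared once before the
loop and only ORed inside it, so once the "More" bit has been set it is
still set when the last record is inserted. Assuming the intent is that
only the final record of a group carries More = 0, a minimal sketch of
computing the flag per record could look like this. DC_EVENT_FLAG_MORE and
dc_record_flags are hypothetical names, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name for bit 0 of the flags field (the "More" flag). */
#define DC_EVENT_FLAG_MORE 0x01

/* Flags for record i out of n: "More" is set on all but the last record. */
static uint8_t dc_record_flags(size_t i, size_t n)
{
    assert(i < n);
    return (i + 1 < n) ? DC_EVENT_FLAG_MORE : 0;
}
```

Deriving the value from the index on every iteration, rather than
accumulating it in a variable shared across iterations, avoids carrying
state from one record to the next.
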
> +
> +void qmp_cxl_add_dynamic_capacity(const char *path, uint16_t hid,
> +                                  uint8_t sel_policy, uint8_t region_id,
> +                                  const char *tag,
> +                                  CXLDCExtentRecordList  *records,
> +                                  Error **errp)
> +{
> +    enum {
> +        CXL_SEL_POLICY_FREE,
> +        CXL_SEL_POLICY_CONTIGUOUS,
> +        CXL_SEL_POLICY_PRESCRIPTIVE,
> +        CXL_SEL_POLICY_ENABLESHAREDACCESS,
> +    };
> +    switch (sel_policy) {
> +    case CXL_SEL_POLICY_PRESCRIPTIVE:
> +        qmp_cxl_process_dynamic_capacity_prescriptive(path, hid,
> +                                                      DC_EVENT_ADD_CAPACITY,
> +                                                      region_id, records, errp);
> +        return;
> +    default:
> +        error_setg(errp, "Selection policy not supported");
> +        return;
> +    }
> +}
> +
> +#define REMOVAL_POLICY_MASK 0xf
> +#define REMOVAL_POLICY_PRESCRIPTIVE 1
> +#define FORCED_REMOVAL_BIT BIT(4)
> +
> +void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t hid,
> +                                      uint8_t flags, uint8_t region_id,
> +                                      const char *tag,
> +                                      CXLDCExtentRecordList  *records,
> +                                      Error **errp)
> +{
> +    CXLDCEventType type = DC_EVENT_RELEASE_CAPACITY;
> +
> +    if (flags & FORCED_REMOVAL_BIT) {
> +        /* TODO: enable forced removal in the future */
> +        type = DC_EVENT_FORCED_RELEASE_CAPACITY;
> +        error_setg(errp, "Forced removal not supported yet");
> +        return;
> +    }
> +
> +    switch (flags & REMOVAL_POLICY_MASK) {
> +    case REMOVAL_POLICY_PRESCRIPTIVE:
> +        qmp_cxl_process_dynamic_capacity_prescriptive(path, hid, type,
> +                                                      region_id, records, errp);
> +        return;
> +    default:
> +        error_setg(errp, "Removal policy not supported");
> +        return;
> +    }
> +}
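
To make the flags encoding easier to follow: the low four bits select the
removal policy and bit 4 requests forced removal. A small standalone sketch
of that decoding, reusing the three macros from the patch (with BIT(4)
spelled out) and a hypothetical ReleaseDecision enum for the outcome:

```c
#include <stdint.h>

#define REMOVAL_POLICY_MASK 0xf
#define REMOVAL_POLICY_PRESCRIPTIVE 1
#define FORCED_REMOVAL_BIT (1u << 4)

/* Hypothetical result type, used only for this illustration. */
typedef enum ReleaseDecision {
    RELEASE_PRESCRIPTIVE,   /* request would be processed */
    RELEASE_REJECT_FORCED,  /* forced removal is not supported yet */
    RELEASE_REJECT_POLICY,  /* any other removal policy is rejected */
} ReleaseDecision;

/* Mirrors the order of checks in the quoted function: forced bit first. */
static ReleaseDecision classify_release_flags(uint8_t flags)
{
    if (flags & FORCED_REMOVAL_BIT) {
        return RELEASE_REJECT_FORCED;
    }
    if ((flags & REMOVAL_POLICY_MASK) == REMOVAL_POLICY_PRESCRIPTIVE) {
        return RELEASE_PRESCRIPTIVE;
    }
    return RELEASE_REJECT_POLICY;
}
```

The "flags": 1 value in the commit message example therefore selects the
prescriptive policy without forced removal, while 0x11 would be rejected
as a forced removal.
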
> +
>  static void ct3_class_init(ObjectClass *oc, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(oc);
> diff --git a/hw/mem/cxl_type3_stubs.c b/hw/mem/cxl_type3_stubs.c
> index 3e1851e32b..810685e0d5 100644
> --- a/hw/mem/cxl_type3_stubs.c
> +++ b/hw/mem/cxl_type3_stubs.c
> @@ -67,3 +67,23 @@ void qmp_cxl_inject_correctable_error(const char *path, CxlCorErrorType type,
>  {
>      error_setg(errp, "CXL Type 3 support is not compiled in");
>  }
> +
> +void qmp_cxl_add_dynamic_capacity(const char *path,
> +                                  uint16_t hid,
> +                                  uint8_t sel_policy,
> +                                  uint8_t region_id,
> +                                  const char *tag,
> +                                  CXLDCExtentRecordList  *records,
> +                                  Error **errp)
> +{
> +    error_setg(errp, "CXL Type 3 support is not compiled in");
> +}
> +
> +void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t hid,
> +                                      uint8_t flags, uint8_t region_id,
> +                                      const char *tag,
> +                                      CXLDCExtentRecordList  *records,
> +                                      Error **errp)
> +{
> +    error_setg(errp, "CXL Type 3 support is not compiled in");
> +}
> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index df3511e91b..c69ff6b5de 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -443,6 +443,12 @@ typedef struct CXLDCExtent {
>  } CXLDCExtent;
>  typedef QTAILQ_HEAD(, CXLDCExtent) CXLDCExtentList;
>  
> +typedef struct CXLDCExtentGroup {
> +    CXLDCExtentList list;
> +    QTAILQ_ENTRY(CXLDCExtentGroup) node;
> +} CXLDCExtentGroup;
> +typedef QTAILQ_HEAD(, CXLDCExtentGroup) CXLDCExtentGroupList;
> +
>  typedef struct CXLDCRegion {
>      uint64_t base;       /* aligned to 256*MiB */
>      uint64_t decode_len; /* aligned to 256*MiB */
> @@ -494,6 +500,7 @@ struct CXLType3Dev {
>           */
>          uint64_t total_capacity; /* 256M aligned */
>          CXLDCExtentList extents;
> +        CXLDCExtentGroupList extents_pending;
>          uint32_t total_extent_count;
>          uint32_t ext_list_gen_seq;
>  
> @@ -555,4 +562,19 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
>  
>  void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
>                                          CXLDCExtent *extent);
> +void cxl_insert_extent_to_extent_list(CXLDCExtentList *list, uint64_t dpa,
> +                                      uint64_t len, uint8_t *tag,
> +                                      uint16_t shared_seq);
> +bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
> +                       unsigned long size);
> +bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
> +                                    uint64_t dpa, uint64_t len);
> +CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
> +                                                    uint64_t dpa,
> +                                                    uint64_t len,
> +                                                    uint8_t *tag,
> +                                                    uint16_t shared_seq);
> +void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
> +                                       CXLDCExtentGroup *group);
> +void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list);
>  #endif
> diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
> index 5170b8dbf8..38cadaa0f3 100644
> --- a/include/hw/cxl/cxl_events.h
> +++ b/include/hw/cxl/cxl_events.h
> @@ -166,4 +166,22 @@ typedef struct CXLEventMemoryModule {
>      uint8_t reserved[0x3d];
>  } QEMU_PACKED CXLEventMemoryModule;
>  
> +/*
> + * CXL r3.1 section Table 8-50: Dynamic Capacity Event Record
> + * All fields little endian.
> + */
> +typedef struct CXLEventDynamicCapacity {
> +    CXLEventRecordHdr hdr;
> +    uint8_t type;
> +    uint8_t validity_flags;
> +    uint16_t host_id;
> +    uint8_t updated_region_id;
> +    uint8_t flags;
> +    uint8_t reserved2[2];
> +    uint8_t dynamic_capacity_extent[0x28]; /* defined in cxl_device.h */
> +    uint8_t reserved[0x18];
> +    uint32_t extents_avail;
> +    uint32_t tags_avail;
> +} QEMU_PACKED CXLEventDynamicCapacity;
> +
>  #endif /* CXL_EVENTS_H */
> diff --git a/qapi/cxl.json b/qapi/cxl.json
> index 4281726dec..2dcf03d973 100644
> --- a/qapi/cxl.json
> +++ b/qapi/cxl.json
> @@ -361,3 +361,72 @@
>  ##
>  {'command': 'cxl-inject-correctable-error',
>   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
> +
> +##
> +# @CXLDCExtentRecord:

Such traffic jams of capital letters are hard to read.  What about
CxlDynamicCapacityExtent?

> +#
> +# Record of a single extent to add/release

Suggest "A dynamic capacity extent."

> +#
> +# @offset: offset to the start of the region where the extent to be operated

Blank line here, please.



> +# @len: length of the extent
> +#
> +# Since: 9.1
> +##
> +{ 'struct': 'CXLDCExtentRecord',
> +  'data': {
> +      'offset':'uint64',
> +      'len': 'uint64'
> +  }
> +}
> +
> +##
> +# @cxl-add-dynamic-capacity:
> +#
> +# Command to start add dynamic capacity extents flow. The device will
> +# have to acknowledged the acceptance of the extents before they are usable.

This text needs work.  More on that at the end of my review.

docs/devel/qapi-code-gen.rst:

    For legibility, wrap text paragraphs so every line is at most 70
    characters long.

    Separate sentences with two spaces.

More elsewhere.

> +#
> +# @path: CXL DCD canonical QOM path

I'd prefer @qom-path, unless you can make a consistency argument for
@path.

Sure the QOM path needs to be canonical?

If not, what about "path to the CXL dynamic capacity device in the QOM
tree".  Intentionally close to existing descriptions of @qom-path
elsewhere.

> +# @hid: host id

@host-id, unless "HID" is established terminology in CXL DCD land.

What is a host ID?

> +# @selection-policy: policy to use for selecting extents for adding capacity

Where are selection policies defined?

> +# @region-id: id of the region where the extent to add

Is "region ID" the established terminology in CXL DCD land?  Or is
"region number" also used?  I'm asking because "ID" in this QEMU device
context suggests a connection to a qdev ID.

If region number is fine, I'd rename to just @region, and rephrase the
description to avoid "ID".  Perhaps "number of the region the extent is
to be added to".  Not entirely happy with the phrasing, doesn't exactly
roll off the tongue, but "where the extent to add" sounds worse to my
ears.  Mind, I'm not a native speaker.

> +# @tag: Context field

What is this about?

> +# @extents: Extents to add

Blank lines between argument descriptions, please.

> +#
> +# Since : 9.1
> +##
> +{ 'command': 'cxl-add-dynamic-capacity',
> +  'data': { 'path': 'str',
> +            'hid': 'uint16',
> +            'selection-policy': 'uint8',
> +            'region-id': 'uint8',
> +            'tag': 'str',
> +            'extents': [ 'CXLDCExtentRecord' ]
> +           }
> +}
> +
> +##
> +# @cxl-release-dynamic-capacity:
> +#
> +# Command to start release dynamic capacity extents flow. The host will
> +# need to respond to indicate that it has released the capacity before it
> +# is made unavailable for read and write and can be re-added.

This text needs work.  More on that at the end of my review.

> +#
> +# @path: CXL DCD canonical QOM path

My comment on cxl-add-dynamic-capacity applies.

> +# @hid: host id

Likewise.

> +# @flags: bit[3:0] for removal policy, bit[4] for forced removal, bit[5] for
> +#     sanitize on release, bit[7:6] reserved

Where are these flags defined?
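
For reference while that gets answered, here is a minimal sketch of how
the byte decodes if the layout in the doc comment above is taken at face
value.  The mask names for bits [3:0] and bit 4 match the macros in the
patch; SANITIZE_ON_RELEASE_BIT, RESERVED_MASK, ReleaseFlags and
decode_release_flags() are made up for illustration and are not in the
series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout as stated in the @flags doc comment. */
#define REMOVAL_POLICY_MASK     0x0fu      /* bits [3:0]: removal policy */
#define FORCED_REMOVAL_BIT      (1u << 4)  /* bit 4: forced removal */
#define SANITIZE_ON_RELEASE_BIT (1u << 5)  /* bit 5: hypothetical name */
#define RESERVED_MASK           0xc0u      /* bits [7:6]: hypothetical name */

/* Hypothetical decoded view of the flags byte, for illustration only. */
typedef struct ReleaseFlags {
    uint8_t removal_policy;
    bool forced_removal;
    bool sanitize_on_release;
    bool reserved_set;
} ReleaseFlags;

static ReleaseFlags decode_release_flags(uint8_t flags)
{
    ReleaseFlags out = {
        .removal_policy = (uint8_t)(flags & REMOVAL_POLICY_MASK),
        .forced_removal = (flags & FORCED_REMOVAL_BIT) != 0,
        .sanitize_on_release = (flags & SANITIZE_ON_RELEASE_BIT) != 0,
        .reserved_set = (flags & RESERVED_MASK) != 0,
    };
    return out;
}
```

With this reading, the "flags": 1 used in the commit message example is
removal policy 1 (prescriptive in the patch) with no forced removal and
no sanitize on release.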

> +# @region-id: id of the region where the extent to release

My comment on cxl-add-dynamic-capacity applies.

> +# @tag: Context field

Likewise.

> +# @extents: Extents to release
> +#
> +# Since : 9.1
> +##
> +{ 'command': 'cxl-release-dynamic-capacity',
> +  'data': { 'path': 'str',
> +            'hid': 'uint16',
> +            'flags': 'uint8',
> +            'region-id': 'uint8',
> +            'tag': 'str',
> +            'extents': [ 'CXLDCExtentRecord' ]
> +           }
> +}

During review of v5, you wrote:

    For add command, the host will send a mailbox command to response to
    the add request to the device to indicate whether it accepts the add
    capacity offer or not.
    
    For release command, the host send a mailbox command (not always a
    response since the host can proactively release capacity if it does
    not need it any more) to device to ask device release the capacity.

Can you briefly sketch the protocol?  Peers and messages involved.
Possibly as a state diagram.
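
To make the question concrete, here is a strawman of the per-extent
lifecycle as I read it from the patch and the v5 explanation quoted
above.  It is only a sketch to be corrected: the state and event names
are invented for illustration, and it assumes a release request leaves
the extent accepted until the host's release command arrives.

```c
#include <stdbool.h>

/* Hypothetical per-extent states, not taken from the patch. */
typedef enum {
    EXTENT_NONE,      /* not offered to the host */
    EXTENT_PENDING,   /* offered via add event, awaiting host response */
    EXTENT_ACCEPTED,  /* host accepted; extent is usable */
} ExtentState;

/* Hypothetical events: who sends what. */
typedef enum {
    EV_FM_ADD,          /* QMP cxl-add-dynamic-capacity */
    EV_HOST_ACCEPT,     /* host's add response lists the extent */
    EV_HOST_REJECT,     /* host's add response with zero extents */
    EV_FM_RELEASE_REQ,  /* QMP cxl-release-dynamic-capacity */
    EV_HOST_RELEASE,    /* host's release mailbox command */
} ExtentEvent;

/* Apply one event; returns false if it is not valid in this state. */
static bool extent_step(ExtentState *state, ExtentEvent ev)
{
    switch (*state) {
    case EXTENT_NONE:
        if (ev == EV_FM_ADD) {
            *state = EXTENT_PENDING;
            return true;
        }
        return false;
    case EXTENT_PENDING:
        if (ev == EV_HOST_ACCEPT) {
            *state = EXTENT_ACCEPTED;
            return true;
        }
        if (ev == EV_HOST_REJECT) {
            *state = EXTENT_NONE;
            return true;
        }
        return false;
    case EXTENT_ACCEPTED:
        if (ev == EV_FM_RELEASE_REQ) {
            /* Assumed: the request only notifies the host. */
            return true;
        }
        if (ev == EV_HOST_RELEASE) {
            /* The host may also release without a prior request. */
            *state = EXTENT_NONE;
            return true;
        }
        return false;
    }
    return false;
}
```

If this matches the intended flow, the doc comments for both commands
could be phrased in terms of these transitions.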



* Re: [PATCH v7 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2024-04-26  9:12   ` Markus Armbruster
@ 2024-04-26 17:31     ` fan
  2024-04-29  7:58       ` Markus Armbruster
  0 siblings, 1 reply; 65+ messages in thread
From: fan @ 2024-04-26 17:31 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: nifan.cxl, qemu-devel, jonathan.cameron, linux-cxl,
	gregory.price, ira.weiny, dan.j.williams, a.manzanares, dave,
	nmtadam.samsung, jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni

On Fri, Apr 26, 2024 at 11:12:50AM +0200, Markus Armbruster wrote:
> nifan.cxl@gmail.com writes:
> 
> > From: Fan Ni <fan.ni@samsung.com>
> >
> > To simulate FM functionalities for initiating Dynamic Capacity Add
> > (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> > r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> > add/release dynamic capacity extents requests.
> >
> > With the change, we allow to release an extent only when its DPA range
> > is contained by a single accepted extent in the device. That is to say,
> > extent superset release is not supported yet.
> >
> > 1. Add dynamic capacity extents:
> >
> > For example, the command to add two continuous extents (each 128MiB long)
> > to region 0 (starting at DPA offset 0) looks like below:
> >
> > { "execute": "qmp_capabilities" }
> >
> > { "execute": "cxl-add-dynamic-capacity",
> >   "arguments": {
> >       "path": "/machine/peripheral/cxl-dcd0",
> >       "hid": 0,
> >       "selection-policy": 2,
> >       "region-id": 0,
> >       "tag": "",
> >       "extents": [
> >       {
> >           "offset": 0,
> >           "len": 134217728
> >       },
> >       {
> >           "offset": 134217728,
> >           "len": 134217728
> >       }
> >       ]
> >   }
> > }
> >
> > 2. Release dynamic capacity extents:
> >
> > For example, the command to release an extent of size 128MiB from region 0
> > (DPA offset 128MiB) looks like below:
> >
> > { "execute": "cxl-release-dynamic-capacity",
> >   "arguments": {
> >       "path": "/machine/peripheral/cxl-dcd0",
> >       "hid": 0,
> >       "flags": 1,
> >       "region-id": 0,
> >       "tag": "",
> >       "extents": [
> >       {
> >           "offset": 134217728,
> >           "len": 134217728
> >       }
> >       ]
> >   }
> > }
> >
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
> > ---
> >  hw/cxl/cxl-mailbox-utils.c  |  62 +++++--
> >  hw/mem/cxl_type3.c          | 311 +++++++++++++++++++++++++++++++++++-
> >  hw/mem/cxl_type3_stubs.c    |  20 +++
> >  include/hw/cxl/cxl_device.h |  22 +++
> >  include/hw/cxl/cxl_events.h |  18 +++
> >  qapi/cxl.json               |  69 ++++++++
> >  6 files changed, 489 insertions(+), 13 deletions(-)
> >
> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > index 9d54e10cd4..3569902e9e 100644
> > --- a/hw/cxl/cxl-mailbox-utils.c
> > +++ b/hw/cxl/cxl-mailbox-utils.c
> > @@ -1405,7 +1405,7 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
> >   * Check whether any bit between addr[nr, nr+size) is set,
> >   * return true if any bit is set, otherwise return false
> >   */
> > -static bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
> > +bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
> >                                unsigned long size)
> >  {
> >      unsigned long res = find_next_bit(addr, size + nr, nr);
> > @@ -1444,7 +1444,7 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
> >      return NULL;
> >  }
> >  
> > -static void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
> > +void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
> >                                               uint64_t dpa,
> >                                               uint64_t len,
> >                                               uint8_t *tag,
> > @@ -1470,6 +1470,44 @@ void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
> >      g_free(extent);
> >  }
> >  
> > +/*
> > + * Add a new extent to the extent "group" if group exists;
> > + * otherwise, create a new group
> > + * Return value: return the group where the extent is inserted.
> > + */
> > +CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
> > +                                                    uint64_t dpa,
> > +                                                    uint64_t len,
> > +                                                    uint8_t *tag,
> > +                                                    uint16_t shared_seq)
> > +{
> > +    if (!group) {
> > +        group = g_new0(CXLDCExtentGroup, 1);
> > +        QTAILQ_INIT(&group->list);
> > +    }
> > +    cxl_insert_extent_to_extent_list(&group->list, dpa, len,
> > +                                     tag, shared_seq);
> > +    return group;
> > +}
> > +
> > +void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
> > +                                       CXLDCExtentGroup *group)
> > +{
> > +    QTAILQ_INSERT_TAIL(list, group, node);
> > +}
> > +
> > +void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list)
> > +{
> > +    CXLDCExtent *ent, *ent_next;
> > +    CXLDCExtentGroup *group = QTAILQ_FIRST(list);
> > +
> > +    QTAILQ_REMOVE(list, group, node);
> > +    QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
> > +        cxl_remove_extent_from_extent_list(&group->list, ent);
> > +    }
> > +    g_free(group);
> > +}
> > +
> >  /*
> >   * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload
> >   * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload
> > @@ -1541,6 +1579,7 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
> >  {
> >      uint32_t i;
> >      CXLDCExtent *ent;
> > +    CXLDCExtentGroup *ext_group;
> >      uint64_t dpa, len;
> >      Range range1, range2;
> >  
> > @@ -1551,9 +1590,13 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
> >          range_init_nofail(&range1, dpa, len);
> >  
> >          /*
> > -         * TODO: once the pending extent list is added, check against
> > -         * the list will be added here.
> > +         * The host-accepted DPA range must be contained by the first extent
> > +         * group in the pending list
> >           */
> > +        ext_group = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> > +        if (!cxl_extents_contains_dpa_range(&ext_group->list, dpa, len)) {
> > +            return CXL_MBOX_INVALID_PA;
> > +        }
> >  
> >          /* to-be-added range should not overlap with range already accepted */
> >          QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
> > @@ -1586,10 +1629,7 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> >      CXLRetCode ret;
> >  
> >      if (in->num_entries_updated == 0) {
> > -        /*
> > -         * TODO: once the pending list is introduced, extents in the beginning
> > -         * will get wiped out.
> > -         */
> > +        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
> >          return CXL_MBOX_SUCCESS;
> >      }
> >  
> > @@ -1615,11 +1655,9 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> >  
> >          cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
> >          ct3d->dc.total_extent_count += 1;
> > -        /*
> > -         * TODO: we will add a pending extent list based on event log record
> > -         * and process the list accordingly here.
> > -         */
> >      }
> > +    /* Remove the first extent group in the pending list*/
> > +    cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
> >  
> >      return CXL_MBOX_SUCCESS;
> >  }
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index c2cdd6d506..e892b3de7b 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> > @@ -667,6 +667,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
> >          ct3d->dc.total_capacity += region->len;
> >      }
> >      QTAILQ_INIT(&ct3d->dc.extents);
> > +    QTAILQ_INIT(&ct3d->dc.extents_pending);
> >  
> >      return true;
> >  }
> > @@ -674,10 +675,19 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
> >  static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
> >  {
> >      CXLDCExtent *ent, *ent_next;
> > +    CXLDCExtentGroup *group, *group_next;
> >  
> >      QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
> >          cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
> >      }
> > +
> > +    QTAILQ_FOREACH_SAFE(group, &ct3d->dc.extents_pending, node, group_next) {
> > +        QTAILQ_REMOVE(&ct3d->dc.extents_pending, group, node);
> > +        QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
> > +            cxl_remove_extent_from_extent_list(&group->list, ent);
> > +        }
> > +        g_free(group);
> > +    }
> >  }
> >  
> >  static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> > @@ -1443,7 +1453,6 @@ static int ct3d_qmp_cxl_event_log_enc(CxlEventLog log)
> >          return CXL_EVENT_TYPE_FAIL;
> >      case CXL_EVENT_LOG_FATAL:
> >          return CXL_EVENT_TYPE_FATAL;
> > -/* DCD not yet supported */
> >      default:
> >          return -EINVAL;
> >      }
> > @@ -1694,6 +1703,306 @@ void qmp_cxl_inject_memory_module_event(const char *path, CxlEventLog log,
> >      }
> >  }
> >  
> > +/* CXL r3.1 Table 8-50: Dynamic Capacity Event Record */
> > +static const QemuUUID dynamic_capacity_uuid = {
> > +    .data = UUID(0xca95afa7, 0xf183, 0x4018, 0x8c, 0x2f,
> > +                 0x95, 0x26, 0x8e, 0x10, 0x1a, 0x2a),
> > +};
> > +
> > +typedef enum CXLDCEventType {
> > +    DC_EVENT_ADD_CAPACITY = 0x0,
> > +    DC_EVENT_RELEASE_CAPACITY = 0x1,
> > +    DC_EVENT_FORCED_RELEASE_CAPACITY = 0x2,
> > +    DC_EVENT_REGION_CONFIG_UPDATED = 0x3,
> > +    DC_EVENT_ADD_CAPACITY_RSP = 0x4,
> > +    DC_EVENT_CAPACITY_RELEASED = 0x5,
> > +} CXLDCEventType;
> > +
> > +/*
> > + * Check whether the range [dpa, dpa + len - 1] has overlaps with extents in
> > + * the list.
> > + * Return value: return true if has overlaps; otherwise, return false
> > + */
> > +static bool cxl_extents_overlaps_dpa_range(CXLDCExtentList *list,
> > +                                           uint64_t dpa, uint64_t len)
> > +{
> > +    CXLDCExtent *ent;
> > +    Range range1, range2;
> > +
> > +    if (!list) {
> > +        return false;
> > +    }
> > +
> > +    range_init_nofail(&range1, dpa, len);
> > +    QTAILQ_FOREACH(ent, list, node) {
> > +        range_init_nofail(&range2, ent->start_dpa, ent->len);
> > +        if (range_overlaps_range(&range1, &range2)) {
> > +            return true;
> > +        }
> > +    }
> > +    return false;
> > +}
> > +
> > +/*
> > + * Check whether the range [dpa, dpa + len - 1] is contained by extents in
> > + * the list.
> > + * Will check multiple extents containment once superset release is added.
> > + * Return value: return true if range is contained; otherwise, return false
> > + */
> > +bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
> > +                                    uint64_t dpa, uint64_t len)
> > +{
> > +    CXLDCExtent *ent;
> > +    Range range1, range2;
> > +
> > +    if (!list) {
> > +        return false;
> > +    }
> > +
> > +    range_init_nofail(&range1, dpa, len);
> > +    QTAILQ_FOREACH(ent, list, node) {
> > +        range_init_nofail(&range2, ent->start_dpa, ent->len);
> > +        if (range_contains_range(&range2, &range1)) {
> > +            return true;
> > +        }
> > +    }
> > +    return false;
> > +}
> > +
> > +static bool cxl_extent_groups_overlaps_dpa_range(CXLDCExtentGroupList *list,
> > +                                                uint64_t dpa, uint64_t len)
> > +{
> > +    CXLDCExtentGroup *group;
> > +
> > +    if (!list) {
> > +        return false;
> > +    }
> > +
> > +    QTAILQ_FOREACH(group, list, node) {
> > +        if (cxl_extents_overlaps_dpa_range(&group->list, dpa, len)) {
> > +            return true;
> > +        }
> > +    }
> > +    return false;
> > +}
> > +
> > +/*
> > + * The main function to process dynamic capacity event with extent list.
> > + * Currently DC extents add/release requests are processed.
> > + */
> > +static void qmp_cxl_process_dynamic_capacity_prescriptive(const char *path,
> > +        uint16_t hid, CXLDCEventType type, uint8_t rid,
> > +        CXLDCExtentRecordList *records, Error **errp)
> > +{
> > +    Object *obj;
> > +    CXLEventDynamicCapacity dCap = {};
> > +    CXLEventRecordHdr *hdr = &dCap.hdr;
> > +    CXLType3Dev *dcd;
> > +    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
> > +    uint32_t num_extents = 0;
> > +    CXLDCExtentRecordList *list;
> > +    CXLDCExtentGroup *group = NULL;
> > +    g_autofree CXLDCExtentRaw *extents = NULL;
> > +    uint8_t enc_log = CXL_EVENT_TYPE_DYNAMIC_CAP;
> > +    uint64_t dpa, offset, len, block_size;
> > +    g_autofree unsigned long *blk_bitmap = NULL;
> > +    int i;
> > +
> > +    obj = object_resolve_path_type(path, TYPE_CXL_TYPE3, NULL);
> > +    if (!obj) {
> > +        error_setg(errp, "Unable to resolve CXL type 3 device");
> > +        return;
> > +    }
> > +
> > +    dcd = CXL_TYPE3(obj);
> > +    if (!dcd->dc.num_regions) {
> > +        error_setg(errp, "No dynamic capacity support from the device");
> > +        return;
> > +    }
> > +
> > +
> > +    if (rid >= dcd->dc.num_regions) {
> > +        error_setg(errp, "region id is too large");
> > +        return;
> > +    }
> > +    block_size = dcd->dc.regions[rid].block_size;
> > +    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
> > +
> > +    /* Sanity check and count the extents */
> > +    list = records;
> > +    while (list) {
> > +        offset = list->value->offset;
> > +        len = list->value->len;
> > +        dpa = offset + dcd->dc.regions[rid].base;
> > +
> > +        if (len == 0) {
> > +            error_setg(errp, "extent with 0 length is not allowed");
> > +            return;
> > +        }
> > +
> > +        if (offset % block_size || len % block_size) {
> > +            error_setg(errp, "dpa or len is not aligned to region block size");
> > +            return;
> > +        }
> > +
> > +        if (offset + len > dcd->dc.regions[rid].len) {
> > +            error_setg(errp, "extent range is beyond the region end");
> > +            return;
> > +        }
> > +
> > +        /* No duplicate or overlapped extents are allowed */
> > +        if (test_any_bits_set(blk_bitmap, offset / block_size,
> > +                              len / block_size)) {
> > +            error_setg(errp, "duplicate or overlapped extents are detected");
> > +            return;
> > +        }
> > +        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> > +
> > +        if (type == DC_EVENT_RELEASE_CAPACITY) {
> > +            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
> > +                                                     dpa, len)) {
> > +                error_setg(errp,
> > +                           "cannot release extent with pending DPA range");
> > +                return;
> > +            }
> > +            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents, dpa, len)) {
> > +                error_setg(errp,
> > +                           "cannot release extent with non-existing DPA range");
> > +                return;
> > +            }
> > +        } else if (type == DC_EVENT_ADD_CAPACITY) {
> > +            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents, dpa, len)) {
> > +                error_setg(errp,
> > +                           "cannot add DPA already accessible  to the same LD");
> > +                return;
> > +            }
> > +            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
> > +                                                     dpa, len)) {
> > +                error_setg(errp,
> > +                           "cannot add DPA again while still pending");
> > +                return;
> > +            }
> > +        }
> > +        list = list->next;
> > +        num_extents++;
> > +    }
> > +
> > +    /* Create extent list for event being passed to host */
> > +    i = 0;
> > +    list = records;
> > +    extents = g_new0(CXLDCExtentRaw, num_extents);
> > +    while (list) {
> > +        offset = list->value->offset;
> > +        len = list->value->len;
> > +        dpa = dcd->dc.regions[rid].base + offset;
> > +
> > +        extents[i].start_dpa = dpa;
> > +        extents[i].len = len;
> > +        memset(extents[i].tag, 0, 0x10);
> > +        extents[i].shared_seq = 0;
> > +        if (type == DC_EVENT_ADD_CAPACITY) {
> > +            group = cxl_insert_extent_to_extent_group(group,
> > +                                                      extents[i].start_dpa,
> > +                                                      extents[i].len,
> > +                                                      extents[i].tag,
> > +                                                      extents[i].shared_seq);
> > +        }
> > +
> > +        list = list->next;
> > +        i++;
> > +    }
> > +    if (group) {
> > +        cxl_extent_group_list_insert_tail(&dcd->dc.extents_pending, group);
> > +    }
> > +
> > +    /*
> > +     * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
> > +     *
> > +     * All Dynamic Capacity event records shall set the Event Record Severity
> > +     * field in the Common Event Record Format to Informational Event. All
> > +     * Dynamic Capacity related events shall be logged in the Dynamic Capacity
> > +     * Event Log.
> > +     */
> > +    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
> > +                            cxl_device_get_timestamp(&dcd->cxl_dstate));
> > +
> > +    dCap.type = type;
> > +    /* FIXME: for now, validity flag is cleared */
> > +    dCap.validity_flags = 0;
> > +    stw_le_p(&dCap.host_id, hid);
> > +    /* only valid for DC_REGION_CONFIG_UPDATED event */
> > +    dCap.updated_region_id = 0;
> > +    dCap.flags = 0;
> > +    for (i = 0; i < num_extents; i++) {
> > +        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
> > +               sizeof(CXLDCExtentRaw));
> > +
> > +        if (i < num_extents - 1) {
> > +            /* Set "More" flag */
> > +            dCap.flags |= BIT(0);
> > +        }
> > +
> > +        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
> > +                             (CXLEventRecordRaw *)&dCap)) {
> > +            cxl_event_irq_assert(dcd);
> > +        }
> > +    }
> > +}
> > +
> > +void qmp_cxl_add_dynamic_capacity(const char *path, uint16_t hid,
> > +                                  uint8_t sel_policy, uint8_t region_id,
> > +                                  const char *tag,
> > +                                  CXLDCExtentRecordList  *records,
> > +                                  Error **errp)
> > +{
> > +    enum {
> > +        CXL_SEL_POLICY_FREE,
> > +        CXL_SEL_POLICY_CONTIGUOUS,
> > +        CXL_SEL_POLICY_PRESCRIPTIVE,
> > +        CXL_SEL_POLICY_ENABLESHAREDACCESS,
> > +    };
> > +    switch (sel_policy) {
> > +    case CXL_SEL_POLICY_PRESCRIPTIVE:
> > +        qmp_cxl_process_dynamic_capacity_prescriptive(path, hid,
> > +                                                      DC_EVENT_ADD_CAPACITY,
> > +                                                      region_id, records, errp);
> > +        return;
> > +    default:
> > +        error_setg(errp, "Selection policy not supported");
> > +        return;
> > +    }
> > +}
> > +
> > +#define REMOVAL_POLICY_MASK 0xf
> > +#define REMOVAL_POLICY_PRESCRIPTIVE 1
> > +#define FORCED_REMOVAL_BIT BIT(4)
> > +
> > +void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t hid,
> > +                                      uint8_t flags, uint8_t region_id,
> > +                                      const char *tag,
> > +                                      CXLDCExtentRecordList  *records,
> > +                                      Error **errp)
> > +{
> > +    CXLDCEventType type = DC_EVENT_RELEASE_CAPACITY;
> > +
> > +    if (flags & FORCED_REMOVAL_BIT) {
> > +        /* TODO: enable forced removal in the future */
> > +        type = DC_EVENT_FORCED_RELEASE_CAPACITY;
> > +        error_setg(errp, "Forced removal not supported yet");
> > +        return;
> > +    }
> > +
> > +    switch (flags & REMOVAL_POLICY_MASK) {
> > +    case REMOVAL_POLICY_PRESCRIPTIVE:
> > +        qmp_cxl_process_dynamic_capacity_prescriptive(path, hid, type,
> > +                                                      region_id, records, errp);
> > +        return;
> > +    default:
> > +        error_setg(errp, "Removal policy not supported");
> > +        return;
> > +    }
> > +}
> > +
> >  static void ct3_class_init(ObjectClass *oc, void *data)
> >  {
> >      DeviceClass *dc = DEVICE_CLASS(oc);
> > diff --git a/hw/mem/cxl_type3_stubs.c b/hw/mem/cxl_type3_stubs.c
> > index 3e1851e32b..810685e0d5 100644
> > --- a/hw/mem/cxl_type3_stubs.c
> > +++ b/hw/mem/cxl_type3_stubs.c
> > @@ -67,3 +67,23 @@ void qmp_cxl_inject_correctable_error(const char *path, CxlCorErrorType type,
> >  {
> >      error_setg(errp, "CXL Type 3 support is not compiled in");
> >  }
> > +
> > +void qmp_cxl_add_dynamic_capacity(const char *path,
> > +                                  uint16_t hid,
> > +                                  uint8_t sel_policy,
> > +                                  uint8_t region_id,
> > +                                  const char *tag,
> > +                                  CXLDCExtentRecordList  *records,
> > +                                  Error **errp)
> > +{
> > +    error_setg(errp, "CXL Type 3 support is not compiled in");
> > +}
> > +
> > +void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t hid,
> > +                                      uint8_t flags, uint8_t region_id,
> > +                                      const char *tag,
> > +                                      CXLDCExtentRecordList  *records,
> > +                                      Error **errp)
> > +{
> > +    error_setg(errp, "CXL Type 3 support is not compiled in");
> > +}
> > diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> > index df3511e91b..c69ff6b5de 100644
> > --- a/include/hw/cxl/cxl_device.h
> > +++ b/include/hw/cxl/cxl_device.h
> > @@ -443,6 +443,12 @@ typedef struct CXLDCExtent {
> >  } CXLDCExtent;
> >  typedef QTAILQ_HEAD(, CXLDCExtent) CXLDCExtentList;
> >  
> > +typedef struct CXLDCExtentGroup {
> > +    CXLDCExtentList list;
> > +    QTAILQ_ENTRY(CXLDCExtentGroup) node;
> > +} CXLDCExtentGroup;
> > +typedef QTAILQ_HEAD(, CXLDCExtentGroup) CXLDCExtentGroupList;
> > +
> >  typedef struct CXLDCRegion {
> >      uint64_t base;       /* aligned to 256*MiB */
> >      uint64_t decode_len; /* aligned to 256*MiB */
> > @@ -494,6 +500,7 @@ struct CXLType3Dev {
> >           */
> >          uint64_t total_capacity; /* 256M aligned */
> >          CXLDCExtentList extents;
> > +        CXLDCExtentGroupList extents_pending;
> >          uint32_t total_extent_count;
> >          uint32_t ext_list_gen_seq;
> >  
> > @@ -555,4 +562,19 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
> >  
> >  void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
> >                                          CXLDCExtent *extent);
> > +void cxl_insert_extent_to_extent_list(CXLDCExtentList *list, uint64_t dpa,
> > +                                      uint64_t len, uint8_t *tag,
> > +                                      uint16_t shared_seq);
> > +bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
> > +                       unsigned long size);
> > +bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
> > +                                    uint64_t dpa, uint64_t len);
> > +CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
> > +                                                    uint64_t dpa,
> > +                                                    uint64_t len,
> > +                                                    uint8_t *tag,
> > +                                                    uint16_t shared_seq);
> > +void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
> > +                                       CXLDCExtentGroup *group);
> > +void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list);
> >  #endif
> > diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
> > index 5170b8dbf8..38cadaa0f3 100644
> > --- a/include/hw/cxl/cxl_events.h
> > +++ b/include/hw/cxl/cxl_events.h
> > @@ -166,4 +166,22 @@ typedef struct CXLEventMemoryModule {
> >      uint8_t reserved[0x3d];
> >  } QEMU_PACKED CXLEventMemoryModule;
> >  
> > +/*
> > + * CXL r3.1 section Table 8-50: Dynamic Capacity Event Record
> > + * All fields little endian.
> > + */
> > +typedef struct CXLEventDynamicCapacity {
> > +    CXLEventRecordHdr hdr;
> > +    uint8_t type;
> > +    uint8_t validity_flags;
> > +    uint16_t host_id;
> > +    uint8_t updated_region_id;
> > +    uint8_t flags;
> > +    uint8_t reserved2[2];
> > +    uint8_t dynamic_capacity_extent[0x28]; /* defined in cxl_device.h */
> > +    uint8_t reserved[0x18];
> > +    uint32_t extents_avail;
> > +    uint32_t tags_avail;
> > +} QEMU_PACKED CXLEventDynamicCapacity;
> > +
> >  #endif /* CXL_EVENTS_H */
> > diff --git a/qapi/cxl.json b/qapi/cxl.json
> > index 4281726dec..2dcf03d973 100644
> > --- a/qapi/cxl.json
> > +++ b/qapi/cxl.json
> > @@ -361,3 +361,72 @@
> >  ##
> >  {'command': 'cxl-inject-correctable-error',
> >   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
> > +
> > +##
> > +# @CXLDCExtentRecord:
> 
> Such traffic jams of capital letters are hard to read.  What about
> CxlDynamicCapacityExtent?
> 
> > +#
> > +# Record of a single extent to add/release
> 
> Suggest "A dynamic capacity extent."
> 
> > +#
> > +# @offset: offset to the start of the region where the extent to be operated
> 
> Blank line here, please.
> 
> 
> 
> > +# @len: length of the extent
> > +#
> > +# Since: 9.1
> > +##
> > +{ 'struct': 'CXLDCExtentRecord',
> > +  'data': {
> > +      'offset':'uint64',
> > +      'len': 'uint64'
> > +  }
> > +}
> > +
> > +##
> > +# @cxl-add-dynamic-capacity:
> > +#
> > +# Command to start add dynamic capacity extents flow. The device will
> > +# have to acknowledged the acceptance of the extents before they are usable.
> 
> This text needs work.  More on that at the end of my review.

Yes. I will work on it for the next version once all the feedback
is collected and the comments are resolved.

See below.

> 
> docs/devel/qapi-code-gen.rst:
> 
>     For legibility, wrap text paragraphs so every line is at most 70
>     characters long.
> 
>     Separate sentences with two spaces.
> 
> More elsewhere.
> 
> > +#
> > +# @path: CXL DCD canonical QOM path
> 
> I'd prefer @qom-path, unless you can make a consistency argument for
> @path.
> 
> Sure the QOM path needs to be canonical?
> 
> If not, what about "path to the CXL dynamic capacity device in the QOM
> tree".  Intentionally close to existing descriptions of @qom-path
> elsewhere.

From the same file, I saw "path" was used for other commands, like
"cxl-inject-memory-module-event", so I followed it.
A DCD is no different from a "type 3 device" except that it can
dynamically change capacity.
Renaming it to "qom-path" is no problem for me; I just want to make sure
it will not break the naming consistency.

> 
> > +# @hid: host id
> 
> @host-id, unless "HID" is established terminology in CXL DCD land.

host-id works.
> 
> What is a host ID?

It is an id identifying the host to which the capacity is being added.

> 
> > +# @selection-policy: policy to use for selecting extents for adding capacity
> 
> Where are selection policies defined?

It is defined in the CXL specification: Specifies the policy to use for
selecting which extents comprise the added capacity.

> 
> > +# @region-id: id of the region where the extent to add
> 
> Is "region ID" the established terminology in CXL DCD land?  Or is
> "region number" also used?  I'm asking because "ID" in this QEMU device
> context suggests a connection to a qdev ID.
> 
> If region number is fine, I'd rename to just @region, and rephrase the
> description to avoid "ID".  Perhaps "number of the region the extent is
> to be added to".  Not entirely happy with the phrasing, doesn't exactly
> roll off the tongue, but "where the extent to add" sounds worse to my
> ears.  Mind, I'm not a native speaker.

Yes, region number is fine. I will rename it to "region".

> 
> > +# @tag: Context field
> 
> What is this about?

Based on the specification, it is "Context field utilized by implementations
that make use of the Dynamic Capacity feature.". Basically, it is a
string (label) attached to a dynamic capacity extent to serve a
specific purpose, like identifying or grouping extents.

> 
> > +# @extents: Extents to add
> 
> Blank lines between argument descriptions, please.
> 
> > +#
> > +# Since : 9.1
> > +##
> > +{ 'command': 'cxl-add-dynamic-capacity',
> > +  'data': { 'path': 'str',
> > +            'hid': 'uint16',
> > +            'selection-policy': 'uint8',
> > +            'region-id': 'uint8',
> > +            'tag': 'str',
> > +            'extents': [ 'CXLDCExtentRecord' ]
> > +           }
> > +}
> > +
> > +##
> > +# @cxl-release-dynamic-capacity:
> > +#
> > +# Command to start release dynamic capacity extents flow. The host will
> > +# need to respond to indicate that it has released the capacity before it
> > +# is made unavailable for read and write and can be re-added.
> 
> This text needs work.  More on that at the end of my review.

Will do.

> 
> > +#
> > +# @path: CXL DCD canonical QOM path
> 
> My comment on cxl-add-dynamic-capacity applies.
> 
> > +# @hid: host id
> 
> Likewise.
> 
> > +# @flags: bit[3:0] for removal policy, bit[4] for forced removal, bit[5] for
> > +#     sanitize on release, bit[7:6] reserved
> 
> Where are these flags defined?

They are defined in the CXL specification and describe the release
behaviour.

> 
> > +# @region-id: id of the region where the extent to release
> 
> My comment on cxl-add-dynamic-capacity applies.
> 
> > +# @tag: Context field
> 
> Likewise.
> 
> > +# @extents: Extents to release
> > +#
> > +# Since : 9.1
> > +##
> > +{ 'command': 'cxl-release-dynamic-capacity',
> > +  'data': { 'path': 'str',
> > +            'hid': 'uint16',
> > +            'flags': 'uint8',
> > +            'region-id': 'uint8',
> > +            'tag': 'str',
> > +            'extents': [ 'CXLDCExtentRecord' ]
> > +           }
> > +}
> 
> During review of v5, you wrote:
> 
>     For add command, the host will send a mailbox command to response to
>     the add request to the device to indicate whether it accepts the add
>     capacity offer or not.
>     
>     For release command, the host send a mailbox command (not always a
>     response since the host can proactively release capacity if it does
>     not need it any more) to device to ask device release the capacity.
> 
> Can you briefly sketch the protocol?  Peers and messages involved.
> Possibly as a state diagram.

Need to think about it. If we can polish the text nicely, maybe the
sketch is not needed. My concern is that the sketch may
introduce unwanted complexity as we expose too many details. The two
commands provide ways to add/release dynamic capacity to/from a host,
that is all. All the other information, like what the host will do or
how the device will react, is a consequence of the command; I am not
sure whether we want to include it here.

@Jonathan, Any thoughts on this?

Fan

> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v7 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2024-04-26 17:31     ` fan
@ 2024-04-29  7:58       ` Markus Armbruster
  2024-04-30 17:17         ` fan
                           ` (2 more replies)
  0 siblings, 3 replies; 65+ messages in thread
From: Markus Armbruster @ 2024-04-29  7:58 UTC (permalink / raw)
  To: fan
  Cc: qemu-devel, jonathan.cameron, linux-cxl, gregory.price,
	ira.weiny, dan.j.williams, a.manzanares, dave, nmtadam.samsung,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni

fan <nifan.cxl@gmail.com> writes:

> On Fri, Apr 26, 2024 at 11:12:50AM +0200, Markus Armbruster wrote:
>> nifan.cxl@gmail.com writes:

[...]

>> > diff --git a/qapi/cxl.json b/qapi/cxl.json
>> > index 4281726dec..2dcf03d973 100644
>> > --- a/qapi/cxl.json
>> > +++ b/qapi/cxl.json
>> > @@ -361,3 +361,72 @@
>> >  ##
>> >  {'command': 'cxl-inject-correctable-error',
>> >   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
>> > +
>> > +##
>> > +# @CXLDCExtentRecord:
>> 
>> Such traffic jams of capital letters are hard to read.  What about
>> CxlDynamicCapacityExtent?
>> 
>> > +#
>> > +# Record of a single extent to add/release
>> 
>> Suggest "A dynamic capacity extent."
>> 
>> > +#
>> > +# @offset: offset to the start of the region where the extent to be operated
>> 
>> Blank line here, please.
>> 
>> 
>> 
>> > +# @len: length of the extent
>> > +#
>> > +# Since: 9.1
>> > +##
>> > +{ 'struct': 'CXLDCExtentRecord',
>> > +  'data': {
>> > +      'offset':'uint64',
>> > +      'len': 'uint64'
>> > +  }
>> > +}
>> > +
>> > +##
>> > +# @cxl-add-dynamic-capacity:
>> > +#
>> > +# Command to start add dynamic capacity extents flow. The device will
>> > +# have to acknowledged the acceptance of the extents before they are usable.
>> 
>> This text needs work.  More on that at the end of my review.
>
> Yes. I will work on it for the next version once all the feedbacks
> are collected and comments are resolved.
>
> See below.
>
>> 
>> docs/devel/qapi-code-gen.rst:
>> 
>>     For legibility, wrap text paragraphs so every line is at most 70
>>     characters long.
>> 
>>     Separate sentences with two spaces.
>> 
>> More elsewhere.
>> 
>> > +#
>> > +# @path: CXL DCD canonical QOM path
>> 
>> I'd prefer @qom-path, unless you can make a consistency argument for
>> @path.
>> 
>> Sure the QOM path needs to be canonical?
>> 
>> If not, what about "path to the CXL dynamic capacity device in the QOM
>> tree".  Intentionally close to existing descriptions of @qom-path
>> elsewhere.
>
> From the same file, I saw "path" was used for other commands, like
> "cxl-inject-memory-module-event", so I followed it.
> DCD is nothing different from "type 3 device" expect it can dynamically
> change capacity. 
> Renaming it to "qom-path" is no problem for me, just want to make sure it
> will not break the naming consistency.

Both @path and @qom-path are used (sadly).  @path is used for all kinds
of paths, whereas @qom-path is only used for QOM paths.  That's why I
prefer it.

However, you're making a compelling local consistency argument: cxl.json
uses only @path.  Sticking to that makes sense.

>> > +# @hid: host id
>> 
>> @host-id, unless "HID" is established terminology in CXL DCD land.
>
> host-id works.
>> 
>> What is a host ID?
>
> It is an id identifying the host to which the capacity is being added.

How are these IDs assigned?

>> > +# @selection-policy: policy to use for selecting extents for adding capacity
>> 
>> Where are selection policies defined?
>
> It is defined in CXL specification: Specifies the policy to use for selecting
> which extents comprise the added capacity

Include a reference to the spec here?

>> > +# @region-id: id of the region where the extent to add
>> 
>> Is "region ID" the established terminology in CXL DCD land?  Or is
>> "region number" also used?  I'm asking because "ID" in this QEMU device
>> context suggests a connection to a qdev ID.
>> 
>> If region number is fine, I'd rename to just @region, and rephrase the
>> description to avoid "ID".  Perhaps "number of the region the extent is
>> to be added to".  Not entirely happy with the phrasing, doesn't exactly
>> roll off the tongue, but "where the extent to add" sounds worse to my
>> ears.  Mind, I'm not a native speaker.
>
> Yes. region number is fine. Will rename it as "region"
>
>> 
>> > +# @tag: Context field
>> 
>> What is this about?
>
> Based on the specification, it is "Context field utilized by implementations
> that make use of the Dynamic Capacity feature.". Basically, it is a
> string (label) attached to an dynamic capacity extent so we can achieve
> specific purpose, like identifying or grouping extents.

Include a reference to the spec here?

>> > +# @extents: Extents to add
>> 
>> Blank lines between argument descriptions, please.
>> 
>> > +#
>> > +# Since : 9.1
>> > +##
>> > +{ 'command': 'cxl-add-dynamic-capacity',
>> > +  'data': { 'path': 'str',
>> > +            'hid': 'uint16',
>> > +            'selection-policy': 'uint8',
>> > +            'region-id': 'uint8',
>> > +            'tag': 'str',
>> > +            'extents': [ 'CXLDCExtentRecord' ]
>> > +           }
>> > +}
>> > +
>> > +##
>> > +# @cxl-release-dynamic-capacity:
>> > +#
>> > +# Command to start release dynamic capacity extents flow. The host will
>> > +# need to respond to indicate that it has released the capacity before it
>> > +# is made unavailable for read and write and can be re-added.
>> 
>> This text needs work.  More on that at the end of my review.
>
> Will do.
>
>> 
>> > +#
>> > +# @path: CXL DCD canonical QOM path
>> 
>> My comment on cxl-add-dynamic-capacity applies.
>> 
>> > +# @hid: host id
>> 
>> Likewise.
>> 
>> > +# @flags: bit[3:0] for removal policy, bit[4] for forced removal, bit[5] for
>> > +#     sanitize on release, bit[7:6] reserved
>> 
>> Where are these flags defined?
>
> Defined in the CXL specification, it defines the release behaviour.

Include a reference to the spec here?

Is the numeric encoding of flags appropriate?

In general, we prefer symbolic encodings.  Numeric encodings can make
sense when

• the encoding is stable, and

• QEMU doesn't need to decode it, only pass it on to something else, and

• both the QMP client and the "something else" prefer a numeric
  encoding.

>> > +# @region-id: id of the region where the extent to release
>> 
>> My comment on cxl-add-dynamic-capacity applies.
>> 
>> > +# @tag: Context field
>> 
>> Likewise.
>> 
>> > +# @extents: Extents to release
>> > +#
>> > +# Since : 9.1
>> > +##
>> > +{ 'command': 'cxl-release-dynamic-capacity',
>> > +  'data': { 'path': 'str',
>> > +            'hid': 'uint16',
>> > +            'flags': 'uint8',
>> > +            'region-id': 'uint8',
>> > +            'tag': 'str',
>> > +            'extents': [ 'CXLDCExtentRecord' ]
>> > +           }
>> > +}
>> 
>> During review of v5, you wrote:
>> 
>>     For add command, the host will send a mailbox command to response to
>>     the add request to the device to indicate whether it accepts the add
>>     capacity offer or not.
>>     
>>     For release command, the host send a mailbox command (not always a
>>     response since the host can proactively release capacity if it does
>>     not need it any more) to device to ask device release the capacity.
>> 
>> Can you briefly sketch the protocol?  Peers and messages involved.
>> Possibly as a state diagram.
>
> Need to think about it. If we can polish the text nicely, maybe the
> sketch is not needed. My concern is that the sketch may
> introduce unwanted complexity as we expose too much details. The two
> commands provide ways to add/release dynamic capacity to/from a host,
> that is all. All the other information, like what the host will do, or
> how the device will react, are consequence of the command, not sure
> whether we want to include here.

The protocol sketch is for me, not necessarily the doc comment.  I'd
like to understand at high level how this stuff works, because only then
can I meaningfully review the docs.

> @Jonathan, Any thoughts on this?

Thanks!


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v7 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2024-04-29  7:58       ` Markus Armbruster
@ 2024-04-30 17:17         ` fan
  2024-05-01 14:58             ` Jonathan Cameron via
  2024-04-30 17:21           ` Jonathan Cameron via
  2024-05-01 22:29         ` fan
  2 siblings, 1 reply; 65+ messages in thread
From: fan @ 2024-04-30 17:17 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: fan, qemu-devel, jonathan.cameron, linux-cxl, gregory.price,
	ira.weiny, dan.j.williams, a.manzanares, dave, nmtadam.samsung,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni

On Mon, Apr 29, 2024 at 09:58:42AM +0200, Markus Armbruster wrote:
> fan <nifan.cxl@gmail.com> writes:
> 
> > On Fri, Apr 26, 2024 at 11:12:50AM +0200, Markus Armbruster wrote:
> >> nifan.cxl@gmail.com writes:
> 
> [...]
> 
> >> > diff --git a/qapi/cxl.json b/qapi/cxl.json
> >> > index 4281726dec..2dcf03d973 100644
> >> > --- a/qapi/cxl.json
> >> > +++ b/qapi/cxl.json
> >> > @@ -361,3 +361,72 @@
> >> >  ##
> >> >  {'command': 'cxl-inject-correctable-error',
> >> >   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
> >> > +
> >> > +##
> >> > +# @CXLDCExtentRecord:
> >> 
> >> Such traffic jams of capital letters are hard to read.  What about
> >> CxlDynamicCapacityExtent?
> >> 
> >> > +#
> >> > +# Record of a single extent to add/release
> >> 
> >> Suggest "A dynamic capacity extent."
> >> 
> >> > +#
> >> > +# @offset: offset to the start of the region where the extent to be operated
> >> 
> >> Blank line here, please.
> >> 
> >> 
> >> 
> >> > +# @len: length of the extent
> >> > +#
> >> > +# Since: 9.1
> >> > +##
> >> > +{ 'struct': 'CXLDCExtentRecord',
> >> > +  'data': {
> >> > +      'offset':'uint64',
> >> > +      'len': 'uint64'
> >> > +  }
> >> > +}
> >> > +
> >> > +##
> >> > +# @cxl-add-dynamic-capacity:
> >> > +#
> >> > +# Command to start add dynamic capacity extents flow. The device will
> >> > +# have to acknowledged the acceptance of the extents before they are usable.
> >> 
> >> This text needs work.  More on that at the end of my review.
> >
> > Yes. I will work on it for the next version once all the feedbacks
> > are collected and comments are resolved.
> >
> > See below.
> >
> >> 
> >> docs/devel/qapi-code-gen.rst:
> >> 
> >>     For legibility, wrap text paragraphs so every line is at most 70
> >>     characters long.
> >> 
> >>     Separate sentences with two spaces.
> >> 
> >> More elsewhere.
> >> 
> >> > +#
> >> > +# @path: CXL DCD canonical QOM path
> >> 
> >> I'd prefer @qom-path, unless you can make a consistency argument for
> >> @path.
> >> 
> >> Sure the QOM path needs to be canonical?
> >> 
> >> If not, what about "path to the CXL dynamic capacity device in the QOM
> >> tree".  Intentionally close to existing descriptions of @qom-path
> >> elsewhere.
> >
> > From the same file, I saw "path" was used for other commands, like
> > "cxl-inject-memory-module-event", so I followed it.
> > DCD is nothing different from "type 3 device" expect it can dynamically
> > change capacity. 
> > Renaming it to "qom-path" is no problem for me, just want to make sure it
> > will not break the naming consistency.
> 
> Both @path and @qom-path are used (sadly).  @path is used for all kinds
> of paths, whereas @qom-path is only used for QOM paths.  That's why I
> prefer it.
> 
> However, you're making a compelling local consistency argument: cxl.json
> uses only @path.  Sticking to that makes sense.
> 
> >> > +# @hid: host id
> >> 
> >> @host-id, unless "HID" is established terminology in CXL DCD land.
> >
> > host-id works.
> >> 
> >> What is a host ID?
> >
> > It is an id identifying the host to which the capacity is being added.
> 
> How are these IDs assigned?

All the arguments passed to the command here are defined in the CXL
spec. I will add a reference to the spec.

Based on the spec, for LD-FAM (Fabric attached memory represented as
logical device), host id is the LD-ID of the host interface to which
the capacity is being added. LD-ID is a unique number (16-bit) assigned
to a host interface.

> 
> >> > +# @selection-policy: policy to use for selecting extents for adding capacity
> >> 
> >> Where are selection policies defined?
> >
> > It is defined in CXL specification: Specifies the policy to use for selecting
> > which extents comprise the added capacity
> 
> Include a reference to the spec here?
Will do.
> 
> >> > +# @region-id: id of the region where the extent to add
> >> 
> >> Is "region ID" the established terminology in CXL DCD land?  Or is
> >> "region number" also used?  I'm asking because "ID" in this QEMU device
> >> context suggests a connection to a qdev ID.
> >> 
> >> If region number is fine, I'd rename to just @region, and rephrase the
> >> description to avoid "ID".  Perhaps "number of the region the extent is
> >> to be added to".  Not entirely happy with the phrasing, doesn't exactly
> >> roll off the tongue, but "where the extent to add" sounds worse to my
> >> ears.  Mind, I'm not a native speaker.
> >
> > Yes. region number is fine. Will rename it as "region"
> >
> >> 
> >> > +# @tag: Context field
> >> 
> >> What is this about?
> >
> > Based on the specification, it is "Context field utilized by implementations
> > that make use of the Dynamic Capacity feature.". Basically, it is a
> > string (label) attached to an dynamic capacity extent so we can achieve
> > specific purpose, like identifying or grouping extents.
> 
> Include a reference to the spec here?
Will do.
> 
> >> > +# @extents: Extents to add
> >> 
> >> Blank lines between argument descriptions, please.
> >> 
> >> > +#
> >> > +# Since : 9.1
> >> > +##
> >> > +{ 'command': 'cxl-add-dynamic-capacity',
> >> > +  'data': { 'path': 'str',
> >> > +            'hid': 'uint16',
> >> > +            'selection-policy': 'uint8',
> >> > +            'region-id': 'uint8',
> >> > +            'tag': 'str',
> >> > +            'extents': [ 'CXLDCExtentRecord' ]
> >> > +           }
> >> > +}
> >> > +
> >> > +##
> >> > +# @cxl-release-dynamic-capacity:
> >> > +#
> >> > +# Command to start release dynamic capacity extents flow. The host will
> >> > +# need to respond to indicate that it has released the capacity before it
> >> > +# is made unavailable for read and write and can be re-added.
> >> 
> >> This text needs work.  More on that at the end of my review.
> >
> > Will do.
> >
> >> 
> >> > +#
> >> > +# @path: CXL DCD canonical QOM path
> >> 
> >> My comment on cxl-add-dynamic-capacity applies.
> >> 
> >> > +# @hid: host id
> >> 
> >> Likewise.
> >> 
> >> > +# @flags: bit[3:0] for removal policy, bit[4] for forced removal, bit[5] for
> >> > +#     sanitize on release, bit[7:6] reserved
> >> 
> >> Where are these flags defined?
> >
> > Defined in the CXL specification, it defines the release behaviour.
> 
> Include a reference to the spec here?
Will do.
> 
> Is the numeric encoding of flags appropriate?
> 
> In general, we prefer symbolic encodings.  Numeric encodings can make
> sense when
> 
> • the encoding is stable, and
> 
> • QEMU doesn't need to decode it, only pass it on to something else, and
> 
> • both the QMP client and the "something else" prefer a numeric
>   encoding.
The encoding is from the specification, and we do not invent anything
here. It is stable and all the updates to the spec need to be backward
compatible.
> 
> >> > +# @region-id: id of the region where the extent to release
> >> 
> >> My comment on cxl-add-dynamic-capacity applies.
> >> 
> >> > +# @tag: Context field
> >> 
> >> Likewise.
> >> 
> >> > +# @extents: Extents to release
> >> > +#
> >> > +# Since : 9.1
> >> > +##
> >> > +{ 'command': 'cxl-release-dynamic-capacity',
> >> > +  'data': { 'path': 'str',
> >> > +            'hid': 'uint16',
> >> > +            'flags': 'uint8',
> >> > +            'region-id': 'uint8',
> >> > +            'tag': 'str',
> >> > +            'extents': [ 'CXLDCExtentRecord' ]
> >> > +           }
> >> > +}
> >> 
> >> During review of v5, you wrote:
> >> 
> >>     For add command, the host will send a mailbox command to response to
> >>     the add request to the device to indicate whether it accepts the add
> >>     capacity offer or not.
> >>     
> >>     For release command, the host send a mailbox command (not always a
> >>     response since the host can proactively release capacity if it does
> >>     not need it any more) to device to ask device release the capacity.
> >> 
> >> Can you briefly sketch the protocol?  Peers and messages involved.
> >> Possibly as a state diagram.
> >
> > Need to think about it. If we can polish the text nicely, maybe the
> > sketch is not needed. My concern is that the sketch may
> > introduce unwanted complexity as we expose too much details. The two
> > commands provide ways to add/release dynamic capacity to/from a host,
> > that is all. All the other information, like what the host will do, or
> > how the device will react, are consequence of the command, not sure
> > whether we want to include here.
> 
> The protocol sketch is for me, not necessarily the doc comment.  I'd
> like to understand at high level how this stuff works, because only then
> can I meaningfully review the docs.

--------------------------------
For the add command, say a user sends a request to the FM (fabric
manager) asking it to add extent A of the device (managed by the FM) to
host 0.
The function cxl-add-dynamic-capacity simulates what the FM needs to do.
1. Verify that extent A is valid (behaviour defined by the spec) and
return an error if not; otherwise,
2. Add a record to the device's event log (indicating the intent to
add extent A to host 0), update the device's internal extent tracking
status, and signal an interrupt to host 0.
(Steps 1 and 2 above are performed in the QMP interface. The following
operations do not involve QMP; only the host and the device take part.)
3. Once the interrupt is received, host 0 fetches the event record from
the device's event log through a mailbox command (out of scope
of this patch series).
4. Host 0 decides whether it accepts extent A or not. Whether it accepts
or rejects, the host needs to send a response (add-response mailbox
command) to the device so the device can update its internal extent
tracking status accordingly.
The device returns a value to the host showing whether the response
succeeded or failed.
5. Based on the mailbox command return value, the host acts
accordingly.
6. The host sends a mailbox command to the device to clear the event
record in the device's event log.
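
In case it helps to see the add flow as state changes, below is a
minimal sketch in C. It is not the code from this series: the type and
function names (Extent, fm_offer_extent, host_add_response,
host_clear_event) are made up for illustration, and the validity check
is only a placeholder bounds check, since the real rules come from the
spec.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical states of one extent; names are illustrative only. */
typedef enum {
    EXTENT_NONE,      /* not offered to any host */
    EXTENT_PENDING,   /* event record logged, waiting for the host */
    EXTENT_ACCEPTED,  /* host accepted, capacity is usable */
} ExtentState;

typedef struct {
    uint64_t offset;   /* offset from the start of the region */
    uint64_t len;      /* length of the extent */
    ExtentState state;
    bool event_logged; /* event record still present in the log */
} Extent;

/* Steps 1 and 2: performed on the QMP side (the FM role). */
static bool fm_offer_extent(Extent *e, uint64_t region_size)
{
    /* Step 1: placeholder check, assumed here to be a bounds check. */
    if (e->len == 0 || e->offset > region_size ||
        e->len > region_size - e->offset) {
        return false;
    }
    /* Step 2: track the extent as pending and log the event record. */
    e->state = EXTENT_PENDING;
    e->event_logged = true;
    return true;
}

/* Step 4: the host answers with the add-response mailbox command. */
static bool host_add_response(Extent *e, bool accept)
{
    if (e->state != EXTENT_PENDING) {
        return false; /* nothing pending, so the command fails */
    }
    e->state = accept ? EXTENT_ACCEPTED : EXTENT_NONE;
    return true;
}

/* Step 6: the host clears the event record. */
static void host_clear_event(Extent *e)
{
    e->event_logged = false;
}
```

Calling fm_offer_extent(), then host_add_response(..., true), then
host_clear_event() walks one extent through steps 1 to 6; a second
host_add_response() on the same extent fails because nothing is pending
any more.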

---------------------------------
For the release command, say a user sends a request to the FM asking
host 0 to release extent A and return it to the device (managed by the
FM).

The function cxl-release-dynamic-capacity simulates what the FM needs to
do.
1. Verify that extent A is valid (defined by the spec) and return an
error if not; otherwise,
2. Add a record to the event log (indicating the intent to
release extent A from host 0) and signal an interrupt to host 0.
(Steps 1 and 2 above are performed in the QMP interface. The following
operations do not involve QMP; only the host and the device take part.)
3. Once the interrupt is received, host 0 fetches the event record from
the device's event log through a mailbox command (out of scope
of this patch series).
4. Host 0 decides whether it can release extent A or not. Whether or not
it can release it, the host needs to send a release (mailbox command) to
the device so the device can update its internal extent tracking status
accordingly.
The device returns a value to host 0 showing whether the release
succeeded or failed.
5. Based on the returned value, the host acts accordingly.
6. The host sends a mailbox command to clear the event record in the
device's event log.

The release command is more complicated. Depending on the release flags
passed to the FM, the FM can behave differently. For example, if the
forced-removal flag is set, the FM can directly take the extent back
from a host for other uses without waiting for the host to send a
command to the device. In that case, step 2 above may not add an event
record to the event log (not supported in this patch series yet).
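
To make the flag handling concrete, here is a minimal sketch in C of
decoding the flags byte and refusing forced removal. The bit positions
come from the @flags description in the patch; the macro, struct, and
function names are made up for illustration, and rejecting requests
with reserved bits set is my assumption, not something the spec text
quoted here requires.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout as described for @flags in the patch. */
#define REL_FLAG_POLICY_MASK   0x0fu      /* bit[3:0]: removal policy */
#define REL_FLAG_FORCED        (1u << 4)  /* bit[4]: forced removal */
#define REL_FLAG_SANITIZE      (1u << 5)  /* bit[5]: sanitize on release */
#define REL_FLAG_RESERVED_MASK 0xc0u      /* bit[7:6]: reserved */

/* Hypothetical decoded view of the flags byte. */
typedef struct {
    uint8_t removal_policy;
    bool forced_removal;
    bool sanitize_on_release;
} ReleaseFlags;

/*
 * Decode the flags byte into *out.  Returns false when the request
 * should be refused: reserved bits set (assumption), or forced
 * removal requested (not supported yet).
 */
static bool decode_release_flags(uint8_t flags, ReleaseFlags *out)
{
    if (flags & REL_FLAG_RESERVED_MASK) {
        return false;
    }
    out->removal_policy = flags & REL_FLAG_POLICY_MASK;
    out->forced_removal = (flags & REL_FLAG_FORCED) != 0;
    out->sanitize_on_release = (flags & REL_FLAG_SANITIZE) != 0;
    return !out->forced_removal;
}
```

With this layout, 0x21 decodes to removal policy 1 with sanitize on
release set and is let through, while 0x10 (forced removal) is refused.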

Also, the release interface here simulates the FM initiating the
release request.
There is another case where the host can proactively release extents it
does not need any more back to the device. However, that case is out of
the scope of this release interface.

Hope the above text helps a little for the context here.
Let me know if further clarification is needed.

Thanks,
Fan



> 
> > @Jonathan, Any thoughts on this?
> 
> Thanks!
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v7 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2024-04-29  7:58       ` Markus Armbruster
@ 2024-04-30 17:21           ` Jonathan Cameron via
  2024-04-30 17:21           ` Jonathan Cameron via
  2024-05-01 22:29         ` fan
  2 siblings, 0 replies; 65+ messages in thread
From: Jonathan Cameron @ 2024-04-30 17:21 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: fan, qemu-devel, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	Jorgen.Hansen, wj28.lee, Fan Ni

On Mon, 29 Apr 2024 09:58:42 +0200
Markus Armbruster <armbru@redhat.com> wrote:

> fan <nifan.cxl@gmail.com> writes:
> 
> > On Fri, Apr 26, 2024 at 11:12:50AM +0200, Markus Armbruster wrote:  
> >> nifan.cxl@gmail.com writes:  
> 
> [...]
> 
> >> > diff --git a/qapi/cxl.json b/qapi/cxl.json
> >> > index 4281726dec..2dcf03d973 100644
> >> > --- a/qapi/cxl.json
> >> > +++ b/qapi/cxl.json
> >> > @@ -361,3 +361,72 @@
> >> >  ##
> >> >  {'command': 'cxl-inject-correctable-error',
> >> >   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
> >> > +
> >> > +##
> >> > +# @CXLDCExtentRecord:  
> >> 
> >> Such traffic jams of capital letters are hard to read.  What about
> >> CxlDynamicCapacityExtent?
> >>   
> >> > +#
> >> > +# Record of a single extent to add/release  
> >> 
> >> Suggest "A dynamic capacity extent."
> >>   
> >> > +#
> >> > +# @offset: offset to the start of the region where the extent to be operated  
> >> 
> >> Blank line here, please.
> >> 
> >> 
> >>   
> >> > +# @len: length of the extent
> >> > +#
> >> > +# Since: 9.1
> >> > +##
> >> > +{ 'struct': 'CXLDCExtentRecord',
> >> > +  'data': {
> >> > +      'offset':'uint64',
> >> > +      'len': 'uint64'
> >> > +  }
> >> > +}
> >> > +
> >> > +##
> >> > +# @cxl-add-dynamic-capacity:
> >> > +#
> >> > +# Command to start add dynamic capacity extents flow. The device will
> >> > +# have to acknowledged the acceptance of the extents before they are usable.  
> >> 
> >> This text needs work.  More on that at the end of my review.  
> >
> > Yes. I will work on it for the next version once all the feedbacks
> > are collected and comments are resolved.
> >
> > See below.
> >  
> >> 
> >> docs/devel/qapi-code-gen.rst:
> >> 
> >>     For legibility, wrap text paragraphs so every line is at most 70
> >>     characters long.
> >> 
> >>     Separate sentences with two spaces.
> >> 
> >> More elsewhere.
> >>   
> >> > +#
> >> > +# @path: CXL DCD canonical QOM path  
> >> 
> >> I'd prefer @qom-path, unless you can make a consistency argument for
> >> @path.
> >> 
> >> Sure the QOM path needs to be canonical?
> >> 
> >> If not, what about "path to the CXL dynamic capacity device in the QOM
> >> tree".  Intentionally close to existing descriptions of @qom-path
> >> elsewhere.  
> >
> > From the same file, I saw "path" was used for other commands, like
> > "cxl-inject-memory-module-event", so I followed it.
> > DCD is nothing different from "type 3 device" expect it can dynamically
> > change capacity. 
> > Renaming it to "qom-path" is no problem for me, just want to make sure it
> > will not break the naming consistency.  
> 
> Both @path and @qom-path are used (sadly).  @path is used for all kinds
> of paths, whereas @qom-path is only used for QOM paths.  That's why I
> prefer it.
> 
> However, you're making a compelling local consistency argument: cxl.json
> uses only @path.  Sticking to that makes sense.
> 
> >> > +# @hid: host id  
> >> 
> >> @host-id, unless "HID" is established terminology in CXL DCD land.  
> >
> > host-id works.  
> >> 
> >> What is a host ID?  
> >
> > It is an id identifying the host to which the capacity is being added.  
> 
> How are these IDs assigned?

Right now there is only 1 option.  We can drop this for now and introduce
it when needed (Default of 0 will be fine).  Multi head device patches
that will need this are on list though I haven't read them yet :(

> 
> >> > +# @selection-policy: policy to use for selecting extents for adding capacity  
> >> 
> >> Where are selection policies defined?  
> >
> > It is defined in CXL specification: Specifies the policy to use for selecting
> > which extents comprise the added capacity  
> 
> Include a reference to the spec here?
> 
> >> > +# @region-id: id of the region where the extent to add  
> >> 
> >> Is "region ID" the established terminology in CXL DCD land?  Or is
> >> "region number" also used?  I'm asking because "ID" in this QEMU device
> >> context suggests a connection to a qdev ID.
> >> 
> >> If region number is fine, I'd rename to just @region, and rephrase the
> >> description to avoid "ID".  Perhaps "number of the region the extent is
> >> to be added to".  Not entirely happy with the phrasing, doesn't exactly
> >> roll off the tongue, but "where the extent to add" sounds worse to my
> >> ears.  Mind, I'm not a native speaker.  
> >
> > Yes. region number is fine. Will rename it as "region"
> >  
> >>   
> >> > +# @tag: Context field  
> >> 
> >> What is this about?  
> >
> > Based on the specification, it is "Context field utilized by implementations
> > that make use of the Dynamic Capacity feature.". Basically, it is a
> > string (label) attached to an dynamic capacity extent so we can achieve
> > specific purpose, like identifying or grouping extents.  
> 
> Include a reference to the spec here?

Agreed - that is the best we can do. It's a magic value.

> 
> >> > +# @extents: Extents to add  
> >> 
> >> Blank lines between argument descriptions, please.
> >>   
> >> > +#
> >> > +# Since : 9.1
> >> > +##
> >> > +{ 'command': 'cxl-add-dynamic-capacity',
> >> > +  'data': { 'path': 'str',
> >> > +            'hid': 'uint16',
> >> > +            'selection-policy': 'uint8',
> >> > +            'region-id': 'uint8',
> >> > +            'tag': 'str',
> >> > +            'extents': [ 'CXLDCExtentRecord' ]
> >> > +           }
> >> > +}
> >> > +
> >> > +##
> >> > +# @cxl-release-dynamic-capacity:
> >> > +#
> >> > +# Command to start release dynamic capacity extents flow. The host will
> >> > +# need to respond to indicate that it has released the capacity before it
> >> > +# is made unavailable for read and write and can be re-added.  
> >> 
> >> This text needs work.  More on that at the end of my review.  
> >
> > Will do.
> >  
> >>   
> >> > +#
> >> > +# @path: CXL DCD canonical QOM path  
> >> 
> >> My comment on cxl-add-dynamic-capacity applies.
> >>   
> >> > +# @hid: host id  
> >> 
> >> Likewise.
> >>   
> >> > +# @flags: bit[3:0] for removal policy, bit[4] for forced removal, bit[5] for
> >> > +#     sanitize on release, bit[7:6] reserved  
> >> 
> >> Where are these flags defined?  
> >
> > Defined in the CXL specification, it defines the release behaviour.  
> 
> Include a reference to the spec here?
> 
> Is the numeric encoding of flags appropriate?

Could definitely break them out as a bunch of flags, with a symbolic
value for the policy.

> 
> In general, we prefer symbolic encodings.  Numeric encodings can make
> sense when
> 
> • the encoding is stable, and
> 
> • QEMU doesn't need to decode it, only pass it on to something else, and
> 
> • both the QMP client and the "something else" prefer a numeric
>   encoding.

I don't think that really applies here - though Gregory's shim from
MCTP to this will have to go through a simple dance to fill them in.
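
To make the "simple dance" concrete, a sketch of converting between the
numeric @flags byte and a broken-out form, using only the bit layout from
the doc comment quoted above (bit[3:0] removal policy, bit[4] forced
removal, bit[5] sanitize on release, bit[7:6] reserved).  The names
ReleaseFlags, decode_release_flags and encode_release_flags are
hypothetical, and rejecting non-zero reserved bits is an assumption of
this sketch rather than something the patch does.

```python
from dataclasses import dataclass


@dataclass
class ReleaseFlags:
    removal_policy: int        # bits [3:0]
    forced_removal: bool       # bit 4
    sanitize_on_release: bool  # bit 5


def decode_release_flags(flags):
    """Split the numeric flags byte into its fields."""
    if not 0 <= flags <= 0xFF:
        raise ValueError("flags must fit in a uint8")
    if flags & 0xC0:
        # Assumption: treat non-zero reserved bits [7:6] as an error.
        raise ValueError("reserved bits [7:6] must be zero")
    return ReleaseFlags(
        removal_policy=flags & 0x0F,
        forced_removal=bool(flags & 0x10),
        sanitize_on_release=bool(flags & 0x20),
    )


def encode_release_flags(fields):
    """Pack the broken-out fields back into the numeric flags byte."""
    if not 0 <= fields.removal_policy <= 0x0F:
        raise ValueError("removal policy must fit in 4 bits")
    return (fields.removal_policy
            | (0x10 if fields.forced_removal else 0)
            | (0x20 if fields.sanitize_on_release else 0))


decoded = decode_release_flags(0x21)
print(decoded)
```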

> 
> >> > +# @region-id: id of the region where the extent to release  
> >> 
> >> My comment on cxl-add-dynamic-capacity applies.
> >>   
> >> > +# @tag: Context field  
> >> 
> >> Likewise.
> >>   
> >> > +# @extents: Extents to release
> >> > +#
> >> > +# Since : 9.1
> >> > +##
> >> > +{ 'command': 'cxl-release-dynamic-capacity',
> >> > +  'data': { 'path': 'str',
> >> > +            'hid': 'uint16',
> >> > +            'flags': 'uint8',
> >> > +            'region-id': 'uint8',
> >> > +            'tag': 'str',
> >> > +            'extents': [ 'CXLDCExtentRecord' ]
> >> > +           }
> >> > +}  
> >> 
> >> During review of v5, you wrote:
> >> 
> >>     For add command, the host will send a mailbox command to response to
> >>     the add request to the device to indicate whether it accepts the add
> >>     capacity offer or not.
> >>     
> >>     For release command, the host send a mailbox command (not always a
> >>     response since the host can proactively release capacity if it does
> >>     not need it any more) to device to ask device release the capacity.
> >> 
> >> Can you briefly sketch the protocol?  Peers and messages involved.
> >> Possibly as a state diagram.  
> >
> > Need to think about it. If we can polish the text nicely, maybe the
> > sketch is not needed. My concern is that the sketch may
> > introduce unwanted complexity as we expose too much details. The two
> > commands provide ways to add/release dynamic capacity to/from a host,
> > that is all. All the other information, like what the host will do, or
> > how the device will react, are consequence of the command, not sure
> > whether we want to include here.  
> 
> The protocol sketch is for me, not necessarily the doc comment.  I'd
> like to understand at high level how this stuff works, because only then
> can I meaningfully review the docs.
> 
> > @Jonathan, Any thoughts on this? 
Makes sense to have a bit of artwork to explain what is going on.
Suitable stuff for the cover letter or patch description for v8
as well as in reply here.  Simple flows should be enough; we don't
need to worry about the messy corner cases (hopefully).

1) Offer extents to a host  + it accepts.
2) Ask for it back, it gives it back.

I can put my non-existent artistic talents on it later in the
week if Fan doesn't get there first.

> 
> Thanks!
> 


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v7 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2024-04-30 17:17         ` fan
@ 2024-05-01 14:58             ` Jonathan Cameron via
  0 siblings, 0 replies; 65+ messages in thread
From: Jonathan Cameron @ 2024-05-01 14:58 UTC (permalink / raw)
  To: fan
  Cc: Markus Armbruster, qemu-devel, linux-cxl, gregory.price,
	ira.weiny, dan.j.williams, a.manzanares, dave, nmtadam.samsung,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni




> > >> > +# @hid: host id  
> > >> 
> > >> @host-id, unless "HID" is established terminology in CXL DCD land.  
> > >
> > > host-id works.  
> > >> 
> > >> What is a host ID?  
> > >
> > > It is an id identifying the host to which the capacity is being added.  
> > 
> > How are these IDs assigned?  
> 
> All the arguments passed to the command here are defined in CXL spec. I
> will add reference to the spec.
> 
> Based on the spec, for LD-FAM (Fabric attached memory represented as
> logical device), host id is the LD-ID of the host interface to which
> the capacity is being added. LD-ID is a unique number (16-bit) assigned
> to a host interface.

Key here is the host doesn't know it.  This ID exists purely for routing
to the appropriate host interface, either via choosing a port on a
multi-head Single Logical Device (MH-SLD) (so today it's always 0 as we only
have one head) or, if we ever implement a switch capable of handling MLDs,
the switch will handle routing of host PCIe accesses so they land
on the interface defined by this ID (and the event turns up in that event log).

            Host A         Host B - could in theory be a RP on host A ;)
              |              |  Doesn't exist yet, but there are partial
             _|______________|_ patches for this on list.
            | LD 0         LD 1|
            |                  |
            |   Multi Head     |
            |   Single Logical |
            |  Device (MH-SLD) |
            |__________________|
Host view similar to the switch case, but just two direct
connected devices.

Or Switch and MLD case - we aren't emulating this yet at all

     Wiring / real topology                 Host View 
         
      Host A     Host B              Host A       Host B
        |          |                   |            |
     ___|__________|___               _|_          _|_
    |   \  SWITCH /    |             |SW0|        | | |
    |    \       /     |             | | |        | | |
    |    LD0   LD1     |             | | |        | | |
    |      \   /       |             | | |        | | |
    |        |         |             | | |        | | |
    |________|_________|             |_|_|        |_|_|
             |                         |            |
      Traffic tagged with LD           |            |
             |                         |            |
     ________|________________     ____|___     ____|___
    | Multilogical Device MLD |   |        |   |        |
    |        |                |   | Simple |   | Another|
    |       / \               |   | CXL    |   | CXL    |
    |      /   \              |   | Memory |   | Memory |
    |    Interfaces           |   | Device |   | Device |
    |   LD0     LD1           |   |        |   |        |
    |_________________________|   |________|   |________|

Note the hosts just see separate devices and switches, with the fun
exception that the memory may actually be available to both at the
same time.

The control plane for the switches and MLD sees what is actually going on.

At this stage the upshot is we could just default this to zero and add an
optional parameter to set it later.
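
As a rough mental model of what the ID is for, a toy sketch in which the
device keeps one event log per host interface and the host ID simply
selects which log a record lands in.  The class MultiHeadDevice and its
methods are invented for this example and do not correspond to QEMU code;
with a single head, any ID other than 0 is rejected.

```python
from collections import defaultdict


class MultiHeadDevice:
    """Toy model: one event log per host interface, keyed by LD-ID."""

    def __init__(self, num_heads=1):
        self.num_heads = num_heads
        self.event_logs = defaultdict(list)

    def post_event(self, record, host_id=0):
        # The host never sees this ID; it only selects the interface
        # (and therefore the event log) the record is delivered to.
        if not 0 <= host_id < self.num_heads:
            raise ValueError(f"no host interface with LD-ID {host_id}")
        self.event_logs[host_id].append(record)


dev = MultiHeadDevice()  # single head, as emulated today
dev.post_event({"type": "add-capacity", "offset": 0, "len": 4096})
print(dict(dev.event_logs))
```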



...

> > >> > +# @extents: Extents to release
> > >> > +#
> > >> > +# Since : 9.1
> > >> > +##
> > >> > +{ 'command': 'cxl-release-dynamic-capacity',
> > >> > +  'data': { 'path': 'str',
> > >> > +            'hid': 'uint16',
> > >> > +            'flags': 'uint8',
> > >> > +            'region-id': 'uint8',
> > >> > +            'tag': 'str',
> > >> > +            'extents': [ 'CXLDCExtentRecord' ]
> > >> > +           }
> > >> > +}  
> > >> 
> > >> During review of v5, you wrote:
> > >> 
> > >>     For add command, the host will send a mailbox command to response to
> > >>     the add request to the device to indicate whether it accepts the add
> > >>     capacity offer or not.
> > >>     
> > >>     For release command, the host send a mailbox command (not always a
> > >>     response since the host can proactively release capacity if it does
> > >>     not need it any more) to device to ask device release the capacity.
> > >> 
> > >> Can you briefly sketch the protocol?  Peers and messages involved.
> > >> Possibly as a state diagram.  
> > >
> > > Need to think about it. If we can polish the text nicely, maybe the
> > > sketch is not needed. My concern is that the sketch may
> > > introduce unwanted complexity as we expose too much details. The two
> > > commands provide ways to add/release dynamic capacity to/from a host,
> > > that is all. All the other information, like what the host will do, or
> > > how the device will react, are consequence of the command, not sure
> > > whether we want to include here.  
> > 
> > The protocol sketch is for me, not necessarily the doc comment.  I'd
> > like to understand at high level how this stuff works, because only then
> > can I meaningfully review the docs.  
> 
> --------------------------------
> For add command, saying a user sends a request to FM to ask to add
> extent A of the device (managed by FM) to host 0.
> The function cxl-add-dynamic-capacity simulates what FM needs to do.

This gets a little fiddly as an explanation.  I'd argue this is more or
less at the level of the FM to device command flow so it's the device
verifying etc. (you could explain this interface as talking to an FM
that is talking to the device, but that just feels complicated to me).

> 1. Verify extent A is valid (behaviour defined by the spec), return
> error if not; otherwise,
> 2. Add a record to the device's event log (indicating the intent to
> add extent A to host 0), update device internal extent tracking status,
> signal an interrupt to host 0;
> (The above step 1 & 2 are performed in the QMP interface, following
> operations are QMP irrelevant, only host and device involved.)

In this patch.

> 3. Once the interrupt is received, host 0 fetch the event record from
> the device's event log through some mailbox command (out of scope
> of this patch series).

It's in patch 8.

> 4. Host 0 decides whether it accepts extent A or not. Whether accept or
> reject, host needs to send a response (add-response mailbox command) to
> the device so the device can update its internal extent tracking
> status accordingly.
> The device return a value to the host showing whether the response is
> successful or failed.

(assuming the host isn't buggy this always succeeds)

> 5. Based on the mailbox command return value, the host process
> accordingly.

Memory now useable by host if it accepted it successfully.

> 6. The host sends a mailbox command to the device to clear the event
> record in the device's event log. 
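
The six steps of the add flow above can be walked through with a toy
model.  This is only a sketch to show the ordering of state changes: the
class DcdModel and its method names are made up, the validity check is a
simplified stand-in for what the spec requires, and the interrupt is not
modelled at all.

```python
class DcdModel:
    """Toy model of a DCD's extent tracking for the add flow."""

    def __init__(self, region_size):
        self.region_size = region_size
        self.event_log = []  # records waiting for the host to read
        self.pending = []    # offered, not yet answered by the host
        self.accepted = []   # accepted by the host, usable memory

    def _overlaps(self, offset, length):
        return any(offset < o + l and o < offset + length
                   for o, l in self.pending + self.accepted)

    def qmp_add(self, offset, length):
        # Step 1: validate (simplified stand-in for the spec's checks).
        if offset < 0 or length <= 0 or offset + length > self.region_size:
            raise ValueError("extent outside the region")
        if self._overlaps(offset, length):
            raise ValueError("extent overlaps a pending or accepted extent")
        # Step 2: log the event and track the extent as pending.
        self.pending.append((offset, length))
        self.event_log.append(("add", offset, length))

    def get_event_records(self):
        # Step 3: the host reads the event log via a mailbox command.
        return list(self.event_log)

    def add_response(self, offset, length, accept):
        # Step 4: the host answers; the device updates its tracking.
        if (offset, length) not in self.pending:
            raise ValueError("no such pending extent")
        self.pending.remove((offset, length))
        if accept:
            self.accepted.append((offset, length))

    def clear_event_records(self):
        # Step 6: the host clears the records it has handled.
        self.event_log.clear()


dev = DcdModel(region_size=1 << 30)
dev.qmp_add(0, 128 << 20)                             # steps 1-2 (QMP)
for kind, offset, length in dev.get_event_records():  # step 3
    dev.add_response(offset, length, accept=True)     # steps 4-5
dev.clear_event_records()                             # step 6
print(dev.accepted)
```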
> 
> ---------------------------------
> For release command, saying a user sends a request to FM to ask host 0
> to release extent A and return it back to the device (managed by FM).
> 
> The function cxl-release-dynamic-capacity simulates what FM needs to do.
> 1. Verify extent A is valid (defined by the spec), return error if not;
> otherwise,
> 2. Add a record to the event log (indicating the intent to
> release extent A from host 0), signal an interrupt to host 0;
> (The above step 1 & 2 are performed in the QMP interface, following
> operations are QMP irrelevant, only host and device involved.
> 3. Once the interrupt is received, host 0 fetch the event record from
> the device's event log through some mailbox command (out of scope
> of this patch series).
> 4. Host 0 decides whether it can release extent A or not. Whether can or
> cannot release, host needs to send a release (mailbox command) to the device
> so the device can update its internal extent tracking status accordingly.
> The device returns a value to host 0 showing whether the release is
> successful or failed.
> 5. Based on the returned value, the host process accordingly.
> 6. The host sends mailbox command to clear the event record in the
> device's event log. 
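
The release flow can be sketched the same way.  Again a toy model with
invented names (DcdModel, qmp_release, host_release); it only handles the
case where the host agrees to release exactly the extent it was asked
for, and it rejects the forced-removal bit (bit[4] of @flags) since that
is not supported in this series.

```python
FORCED_REMOVAL = 1 << 4  # bit[4] of @flags, per the patch's doc comment


class DcdModel:
    """Toy model of a DCD's extent tracking for the release flow."""

    def __init__(self, accepted):
        self.accepted = list(accepted)  # extents currently held by host
        self.event_log = []

    def qmp_release(self, offset, length, flags=0):
        # Step 1: validate the request (simplified: exact match only).
        if flags & FORCED_REMOVAL:
            raise ValueError("forced removal is not supported")
        if (offset, length) not in self.accepted:
            raise ValueError("extent is not currently held by the host")
        # Step 2: log the release request for the host to pick up.
        self.event_log.append(("release", offset, length))

    def get_event_records(self):
        # Step 3: the host reads the event log via a mailbox command.
        return list(self.event_log)

    def host_release(self, offset, length):
        # Step 4: the host sends the release mailbox command.
        if (offset, length) not in self.accepted:
            raise ValueError("extent is not currently held by the host")
        self.accepted.remove((offset, length))

    def clear_event_records(self):
        # Step 6: the host clears the records it has handled.
        self.event_log.clear()


dev = DcdModel(accepted=[(0, 128 << 20)])
dev.qmp_release(0, 128 << 20)                         # steps 1-2 (QMP)
for kind, offset, length in dev.get_event_records():  # step 3
    dev.host_release(offset, length)                  # steps 4-5
dev.clear_event_records()                             # step 6
print(dev.accepted)
```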
> 
> For release command, it is more complicated. Based on the release flag
> passed to FM, FM can behave differently. For example, if the
> forced-removal flag is set, FM can directly get the extent back from a
> host for other uses without waiting for the host to send command to the
> device. For the above step 2, there may be no event record added to the
> event log (not supported in this patch series yet).
I thought we weren't doing force remove yet?  So for that we could
set default value as normal release until we add that support perhaps.

> 
> Also, for the release interface here, it simulates FM initializes the
> release request.
> There is another case where the host can proactively release extents it
> do not need any more back to device. However, this case is out of the
> scope of this release interface.
> 
> Hope the above text helps a little for the context here.
> Let me know if further clarification is needed.

Only thing I'd add is that, for now (because we don't need it for testing
the kernel flows), this does not provide any way for external
agents (e.g. our 'fabric manager') to find out what the state is, i.e.
whether the extents have been accepted by the host etc.  That stuff is all
defined by the spec, but not yet in the QMP interface.  At some point
we may want to add that as a state query type interface.

Jonathan

p.s. Our emails raced yesterday, so great you put together this explanation
of the flows before I got to it :)

> 
> Thanks,
> Fan
> 
> 
> 
> >   
> > > @Jonathan, Any thoughts on this?  
> > 
> > Thanks!
> >   


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v7 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
@ 2024-05-01 14:58             ` Jonathan Cameron via
  0 siblings, 0 replies; 65+ messages in thread
From: Jonathan Cameron via @ 2024-05-01 14:58 UTC (permalink / raw)
  To: fan
  Cc: Markus Armbruster, qemu-devel, linux-cxl, gregory.price,
	ira.weiny, dan.j.williams, a.manzanares, dave, nmtadam.samsung,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni




> > >> > +# @hid: host id  
> > >> 
> > >> @host-id, unless "HID" is established terminology in CXL DCD land.  
> > >
> > > host-id works.  
> > >> 
> > >> What is a host ID?  
> > >
> > > It is an id identifying the host to which the capacity is being added.  
> > 
> > How are these IDs assigned?  
> 
> All the arguments passed to the command here are defined in CXL spec. I
> will add reference to the spec.
> 
> Based on the spec, for LD-FAM (Fabric attached memory represented as
> logical device), host id is the LD-ID of the host interface to which
> the capacity is being added. LD-ID is a unique number (16-bit) assigned
> to a host interface.

Key here is the host doesn't know it.  This ID exists purely for rooting
to the appropriate host interface either via choosing a port on a
multihead Single Logical Device (SLD) (so today it's always 0 as we only
have one head) or if we ever implement a switch capable of handling MLDs
then the switch will handle routing of host PCIe accesses so it lands
on the interface defined by this ID (and the event turns up in that event log.

            Host A         Host B - could in theory be a RP on host A ;)
              |              |  Doesn't exist (yet, but there are partial.
             _|______________|_ patches for this on list.
            | LD 0         LD 1|
            |                  |
            |   Multi Head     |
            |   Single Logical |
            |  Device (MH-SLD) |
            |__________________|
Host view similar to the switch case, but just two direct
connected devices.

Or Switch and MLD case - we aren't emulating this yet at all

     Wiring / real topology                 Host View 
         
      Host A     Host B              Host A       Host B
        |          |                   |            |
     ___|__________|___               _|_          _|_
    |   \  SWITCH /    |             |SW0|        | | |
    |    \       /     |             | | |        | | |
    |    LD0   LD1     |             | | |        | | |
    |      \   /       |             | | |        | | |
    |        |         |             | | |        | | |
    |________|_________|             |_|_|        |_|_|
             |                         |            |
      Traffic tagged with LD           |            |
             |                         |            |
     ________|________________     ____|___     ____|___
    | Multilogical Device MLD |   |        |   |        |
    |        |                |   | Simple |   | Another|
    |       / \               |   | CXL    |   | CXL    |
    |      /   \              |   | Memory |   | Memory |
    |    Interfaces           |   | Device |   | Device |
    |   LD0     LD1           |   |        |   |        |
    |_________________________|   |________|   |________|

Note the hosts just see separate devices and switches with the fun exception that the
memory may actually be available to both at the same time.

Control plane for the switches and MLD see what is actually going on.

At this stage upshot is we could just default this to zero and add an optional
parameter to set it later.



...

> > >> > +# @extents: Extents to release
> > >> > +#
> > >> > +# Since : 9.1
> > >> > +##
> > >> > +{ 'command': 'cxl-release-dynamic-capacity',
> > >> > +  'data': { 'path': 'str',
> > >> > +            'hid': 'uint16',
> > >> > +            'flags': 'uint8',
> > >> > +            'region-id': 'uint8',
> > >> > +            'tag': 'str',
> > >> > +            'extents': [ 'CXLDCExtentRecord' ]
> > >> > +           }
> > >> > +}  
> > >> 
> > >> During review of v5, you wrote:
> > >> 
> > >>     For add command, the host will send a mailbox command to response to
> > >>     the add request to the device to indicate whether it accepts the add
> > >>     capacity offer or not.
> > >>     
> > >>     For release command, the host send a mailbox command (not always a
> > >>     response since the host can proactively release capacity if it does
> > >>     not need it any more) to device to ask device release the capacity.
> > >> 
> > >> Can you briefly sketch the protocol?  Peers and messages involved.
> > >> Possibly as a state diagram.  
> > >
> > > Need to think about it. If we can polish the text nicely, maybe the
> > > sketch is not needed. My concern is that the sketch may
> > > introduce unwanted complexity as we expose too much details. The two
> > > commands provide ways to add/release dynamic capacity to/from a host,
> > > that is all. All the other information, like what the host will do, or
> > > how the device will react, are consequence of the command, not sure
> > > whether we want to include here.  
> > 
> > The protocol sketch is for me, not necessarily the doc comment.  I'd
> > like to understand at high level how this stuff works, because only then
> > can I meaningfully review the docs.  
> 
> --------------------------------
> For add command, saying a user sends a request to FM to ask to add
> extent A of the device (managed by FM) to host 0.
> The function cxl-add-dynamic-capacity simulates what FM needs to do.

This gets a little fiddly as an explanation.  I'd argue this is more or
less at the level of the FM to device command flow so it's the device
verifying etc. (you could explain this interface as talking to an FM
that is talking to the device, but that just feels complicated to me).

> 1. Verify extent A is valid (behaviour defined by the spec), return
> error if not; otherwise,
> 2. Add a record to the device's event log (indicating the intent to
> add extent A to host 0), update device internal extent tracking status,
> signal an interrupt to host 0;
> (The above step 1 & 2 are performed in the QMP interface, following
> operations are QMP irrelevant, only host and device involved.)

In this patch.

> 3. Once the interrupt is received, host 0 fetch the event record from
> the device's event log through some mailbox command (out of scope
> of this patch series).

It's in patch 8.

> 4. Host 0 decides whether it accepts extent A or not. Whether accept or
> reject, host needs to send a response (add-response mailbox command) to
> the device so the device can update its internal extent tracking
> status accordingly.
> The device return a value to the host showing whether the response is
> successful or failed.

(assuming the host isn't buggy this always succeeds)

> 5. Based on the mailbox command return value, the host process
> accordingly.

Memory now useable by host if it accepted it successfully.

> 6. The host sends a mailbox command to the device to clear the event
> record in the device's event log. 
> 
> ---------------------------------
> For release command, saying a user sends a request to FM to ask host 0
> to release extent A and return it back to the device (managed by FM).
> 
> The function cxl-release-dynamic-capacity simulates what FM needs to do.
> 1. Verify extent A is valid (defined by the spec), return error if not;
> otherwise,
> 2. Add a record to the event log (indicating the intent to
> release extent A from host 0), signal an interrupt to host 0;
> (The above step 1 & 2 are performed in the QMP interface, following
> operations are QMP irrelevant, only host and device involved.
> 3. Once the interrupt is received, host 0 fetch the event record from
> the device's event log through some mailbox command (out of scope
> of this patch series).
> 4. Host 0 decides whether it can release extent A or not. Whether can or
> cannot release, host needs to send a release (mailbox command) to the device
> so the device can update its internal extent tracking status accordingly.
> The device returns a value to host 0 showing whether the release is
> successful or failed.
> 5. Based on the returned value, the host process accordingly.
> 6. The host sends mailbox command to clear the event record in the
> device's event log. 
> 
> For release command, it is more complicated. Based on the release flag
> passed to FM, FM can behave differently. For example, if the
> forced-removal flag is set, FM can directly get the extent back from a
> host for other uses without waiting for the host to send a command to the
> device. For the above step 2, there may be no event record added to the
> event log (not supported in this patch series yet).
I thought we weren't doing force remove yet?  So for that we could
set default value as normal release until we add that support perhaps.
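
For reference, a small standalone sketch of how the release "flags" byte
could be checked so that forced removal is refused for now. The bit layout
(bits[3:0] removal policy, bit[4] forced removal, bit[5] sanitize on release)
and the prescriptive policy value 1 come from the patch; the ReleaseCheck
enum and check_release_flags() are hypothetical names used only here.

```c
#include <assert.h>
#include <stdint.h>

#define REMOVAL_POLICY_MASK         0x0fu     /* bits[3:0] */
#define REMOVAL_POLICY_PRESCRIPTIVE 1u
#define FORCED_REMOVAL_BIT          (1u << 4) /* bit[4] */
#define SANITIZE_ON_RELEASE_BIT     (1u << 5) /* bit[5], ignored below */

static_assert((REMOVAL_POLICY_MASK & FORCED_REMOVAL_BIT) == 0,
              "policy field and forced-removal bit must not overlap");

/* Hypothetical result codes, for illustration only. */
typedef enum {
    RELEASE_OK,
    RELEASE_ERR_FORCED_UNSUPPORTED,
    RELEASE_ERR_POLICY_UNSUPPORTED,
} ReleaseCheck;

static ReleaseCheck check_release_flags(uint8_t flags)
{
    /* Forced removal is rejected up front, before looking at the policy. */
    if (flags & FORCED_REMOVAL_BIT) {
        return RELEASE_ERR_FORCED_UNSUPPORTED;
    }
    /* Only the prescriptive removal policy is handled. */
    if ((flags & REMOVAL_POLICY_MASK) != REMOVAL_POLICY_PRESCRIPTIVE) {
        return RELEASE_ERR_POLICY_UNSUPPORTED;
    }
    return RELEASE_OK;
}
```

With this layout, "flags": 1 is a normal prescriptive release, and 0x11
(prescriptive plus forced removal) is refused.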

> 
> Also, for the release interface here, it simulates the FM initiating the
> release request.
> There is another case where the host can proactively release extents it
> does not need any more back to the device. However, this case is out of
> the scope of this release interface.
> 
> Hope the above text helps a little for the context here.
> Let me know if further clarification is needed.

Only thing I'd add is that for now (because we don't need it for testing
the kernel flows) this does not provide any way for external
agents (e.g. our 'fabric manager') to find out what the state is - i.e.
if the extents have been accepted by the host etc. That stuff is all
defined by the spec, but not yet in the QMP interface.  At some point
we may want to add that as a state query type interface.

Jonathan

p.s. Our emails raced yesterday, so great you put together this explanation
of the flows before I got to it :)

> 
> Thanks,
> Fan
> 
> 
> 
> >   
> > > @Jonathan, Any thoughts on this?  
> > 
> > Thanks!
> >   




* Re: [PATCH v7 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2024-04-29  7:58       ` Markus Armbruster
  2024-04-30 17:17         ` fan
  2024-04-30 17:21           ` Jonathan Cameron via
@ 2024-05-01 22:29         ` fan
  2 siblings, 0 replies; 65+ messages in thread
From: fan @ 2024-05-01 22:29 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: fan, qemu-devel, jonathan.cameron, linux-cxl, gregory.price,
	ira.weiny, dan.j.williams, a.manzanares, dave, nmtadam.samsung,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni


From 873f59ec06c38645768ada452d9b18920a34723e Mon Sep 17 00:00:00 2001
From: Fan Ni <fan.ni@samsung.com>
Date: Tue, 20 Feb 2024 09:48:31 -0800
Subject: [PATCH] hw/cxl/events: Add qmp interfaces to add/release dynamic
 capacity extents

To simulate FM functionalities for initiating Dynamic Capacity Add
(Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
add/release dynamic capacity extents requests.

With the change, we allow releasing an extent only when its DPA range
is contained by a single accepted extent in the device. That is to say,
extent superset release is not supported yet.
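
As an illustration of the containment rule (a standalone sketch, not the
code in this patch), the check below accepts a release range only if it lies
inside one accepted extent. The Extent type, the MIB macro and the function
names are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

typedef struct {
    uint64_t start_dpa;
    uint64_t len;
} Extent;

/* True if [dpa, dpa + len) is fully inside the single extent e. */
static bool extent_contains(const Extent *e, uint64_t dpa, uint64_t len)
{
    assert(e != NULL && len > 0);
    /* Written with subtractions so that dpa + len cannot overflow. */
    return dpa >= e->start_dpa && len <= e->len &&
           dpa - e->start_dpa <= e->len - len;
}

/* True if some single extent in the list contains the whole range. */
static bool extents_contain_range(const Extent *list, size_t n,
                                  uint64_t dpa, uint64_t len)
{
    for (size_t i = 0; i < n; i++) {
        if (extent_contains(&list[i], dpa, len)) {
            return true;
        }
    }
    return false;
}
```

With two accepted 128MiB extents at offsets 0 and 128MiB, releasing
[64MiB, 192MiB) is refused because it spans both extents, even though every
byte of it is accepted capacity.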

1. Add dynamic capacity extents:

For example, the command to add two contiguous extents (each 128MiB long)
to region 0 (starting at DPA offset 0) looks like below:

{ "execute": "qmp_capabilities" }

{ "execute": "cxl-add-dynamic-capacity",
  "arguments": {
      "path": "/machine/peripheral/cxl-dcd0",
      "host-id": 0,
      "selection-policy": 2,
      "region": 0,
      "tag": "",
      "extents": [
      {
          "offset": 0,
          "len": 134217728
      },
      {
          "offset": 134217728,
          "len": 134217728
      }
      ]
  }
}

2. Release dynamic capacity extents:

For example, the command to release an extent of size 128MiB from region 0
(DPA offset 128MiB) looks like below:

{ "execute": "cxl-release-dynamic-capacity",
  "arguments": {
      "path": "/machine/peripheral/cxl-dcd0",
      "host-id": 0,
      "flags": 1,
      "region": 0,
      "tag": "",
      "extents": [
      {
          "offset": 134217728,
          "len": 134217728
      }
      ]
  }
}
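
The per-extent sanity checks applied to both commands (non-zero length,
block alignment, staying inside the region, no duplicates or overlaps within
one request) can be sketched standalone as below. This is an illustration
only: ReqExtent, ReqStatus and validate_request() are hypothetical names,
and a plain bool array stands in for the block bitmap used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MIB (1024ULL * 1024ULL)

typedef struct {
    uint64_t offset; /* bytes from the start of the region */
    uint64_t len;    /* bytes */
} ReqExtent;

typedef enum {
    REQ_OK,
    REQ_ZERO_LEN,
    REQ_UNALIGNED,
    REQ_OUT_OF_REGION,
    REQ_OVERLAP,
    REQ_NO_MEMORY,
} ReqStatus;

static ReqStatus validate_request(const ReqExtent *ext, size_t n,
                                  uint64_t region_len, uint64_t block_size)
{
    assert(block_size > 0 && region_len % block_size == 0);

    size_t nblocks = (size_t)(region_len / block_size);
    bool *used = calloc(nblocks ? nblocks : 1, sizeof(*used));
    ReqStatus st = REQ_OK;

    if (!used) {
        return REQ_NO_MEMORY;
    }
    for (size_t i = 0; i < n && st == REQ_OK; i++) {
        uint64_t off = ext[i].offset, len = ext[i].len;

        if (len == 0) {
            st = REQ_ZERO_LEN;
        } else if (off % block_size || len % block_size) {
            st = REQ_UNALIGNED;
        } else if (off > region_len || len > region_len - off) {
            st = REQ_OUT_OF_REGION;
        } else {
            /* Mark each covered block; a block seen twice is an overlap. */
            for (uint64_t b = off / block_size;
                 b < (off + len) / block_size; b++) {
                if (used[b]) {
                    st = REQ_OVERLAP;
                    break;
                }
                used[b] = true;
            }
        }
    }
    free(used);
    return st;
}
```

For the add example above, the two extents {0, 128MiB} and {128MiB, 128MiB}
pass these checks for any region of at least 256MiB whose block size divides
128MiB (the 512MiB region and 2MiB block size used when trying this out are
assumptions, not values from the patch).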

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  |  62 +++++--
 hw/mem/cxl_type3.c          | 311 +++++++++++++++++++++++++++++++++++-
 hw/mem/cxl_type3_stubs.c    |  20 +++
 include/hw/cxl/cxl_device.h |  22 +++
 include/hw/cxl/cxl_events.h |  18 +++
 qapi/cxl.json               |  90 +++++++++++
 6 files changed, 510 insertions(+), 13 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 9d54e10cd4..3569902e9e 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -1405,7 +1405,7 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
  * Check whether any bit between addr[nr, nr+size) is set,
  * return true if any bit is set, otherwise return false
  */
-static bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
+bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
                               unsigned long size)
 {
     unsigned long res = find_next_bit(addr, size + nr, nr);
@@ -1444,7 +1444,7 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
     return NULL;
 }
 
-static void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
+void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
                                              uint64_t dpa,
                                              uint64_t len,
                                              uint8_t *tag,
@@ -1470,6 +1470,44 @@ void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
     g_free(extent);
 }
 
+/*
+ * Add a new extent to the extent "group" if group exists;
+ * otherwise, create a new group
+ * Return value: return the group where the extent is inserted.
+ */
+CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
+                                                    uint64_t dpa,
+                                                    uint64_t len,
+                                                    uint8_t *tag,
+                                                    uint16_t shared_seq)
+{
+    if (!group) {
+        group = g_new0(CXLDCExtentGroup, 1);
+        QTAILQ_INIT(&group->list);
+    }
+    cxl_insert_extent_to_extent_list(&group->list, dpa, len,
+                                     tag, shared_seq);
+    return group;
+}
+
+void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
+                                       CXLDCExtentGroup *group)
+{
+    QTAILQ_INSERT_TAIL(list, group, node);
+}
+
+void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list)
+{
+    CXLDCExtent *ent, *ent_next;
+    CXLDCExtentGroup *group = QTAILQ_FIRST(list);
+
+    QTAILQ_REMOVE(list, group, node);
+    QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
+        cxl_remove_extent_from_extent_list(&group->list, ent);
+    }
+    g_free(group);
+}
+
 /*
  * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload
  * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload
@@ -1541,6 +1579,7 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
 {
     uint32_t i;
     CXLDCExtent *ent;
+    CXLDCExtentGroup *ext_group;
     uint64_t dpa, len;
     Range range1, range2;
 
@@ -1551,9 +1590,13 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
         range_init_nofail(&range1, dpa, len);
 
         /*
-         * TODO: once the pending extent list is added, check against
-         * the list will be added here.
+         * The host-accepted DPA range must be contained by the first extent
+         * group in the pending list
          */
+        ext_group = QTAILQ_FIRST(&ct3d->dc.extents_pending);
+        if (!cxl_extents_contains_dpa_range(&ext_group->list, dpa, len)) {
+            return CXL_MBOX_INVALID_PA;
+        }
 
         /* to-be-added range should not overlap with range already accepted */
         QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
@@ -1586,10 +1629,7 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
     CXLRetCode ret;
 
     if (in->num_entries_updated == 0) {
-        /*
-         * TODO: once the pending list is introduced, extents in the beginning
-         * will get wiped out.
-         */
+        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
         return CXL_MBOX_SUCCESS;
     }
 
@@ -1615,11 +1655,9 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
 
         cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
         ct3d->dc.total_extent_count += 1;
-        /*
-         * TODO: we will add a pending extent list based on event log record
-         * and process the list accordingly here.
-         */
     }
+    /* Remove the first extent group in the pending list */
+    cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
 
     return CXL_MBOX_SUCCESS;
 }
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index c2cdd6d506..1bae3711a0 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -667,6 +667,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
         ct3d->dc.total_capacity += region->len;
     }
     QTAILQ_INIT(&ct3d->dc.extents);
+    QTAILQ_INIT(&ct3d->dc.extents_pending);
 
     return true;
 }
@@ -674,10 +675,19 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
 static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
 {
     CXLDCExtent *ent, *ent_next;
+    CXLDCExtentGroup *group, *group_next;
 
     QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
         cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
     }
+
+    QTAILQ_FOREACH_SAFE(group, &ct3d->dc.extents_pending, node, group_next) {
+        QTAILQ_REMOVE(&ct3d->dc.extents_pending, group, node);
+        QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
+            cxl_remove_extent_from_extent_list(&group->list, ent);
+        }
+        g_free(group);
+    }
 }
 
 static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
@@ -1443,7 +1453,6 @@ static int ct3d_qmp_cxl_event_log_enc(CxlEventLog log)
         return CXL_EVENT_TYPE_FAIL;
     case CXL_EVENT_LOG_FATAL:
         return CXL_EVENT_TYPE_FATAL;
-/* DCD not yet supported */
     default:
         return -EINVAL;
     }
@@ -1694,6 +1703,306 @@ void qmp_cxl_inject_memory_module_event(const char *path, CxlEventLog log,
     }
 }
 
+/* CXL r3.1 Table 8-50: Dynamic Capacity Event Record */
+static const QemuUUID dynamic_capacity_uuid = {
+    .data = UUID(0xca95afa7, 0xf183, 0x4018, 0x8c, 0x2f,
+                 0x95, 0x26, 0x8e, 0x10, 0x1a, 0x2a),
+};
+
+typedef enum CXLDCEventType {
+    DC_EVENT_ADD_CAPACITY = 0x0,
+    DC_EVENT_RELEASE_CAPACITY = 0x1,
+    DC_EVENT_FORCED_RELEASE_CAPACITY = 0x2,
+    DC_EVENT_REGION_CONFIG_UPDATED = 0x3,
+    DC_EVENT_ADD_CAPACITY_RSP = 0x4,
+    DC_EVENT_CAPACITY_RELEASED = 0x5,
+} CXLDCEventType;
+
+/*
+ * Check whether the range [dpa, dpa + len - 1] has overlaps with extents in
+ * the list.
+ * Return value: return true if has overlaps; otherwise, return false
+ */
+static bool cxl_extents_overlaps_dpa_range(CXLDCExtentList *list,
+                                           uint64_t dpa, uint64_t len)
+{
+    CXLDCExtent *ent;
+    Range range1, range2;
+
+    if (!list) {
+        return false;
+    }
+
+    range_init_nofail(&range1, dpa, len);
+    QTAILQ_FOREACH(ent, list, node) {
+        range_init_nofail(&range2, ent->start_dpa, ent->len);
+        if (range_overlaps_range(&range1, &range2)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+/*
+ * Check whether the range [dpa, dpa + len - 1] is contained by extents in
+ * the list.
+ * Will check multiple extents containment once superset release is added.
+ * Return value: return true if range is contained; otherwise, return false
+ */
+bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
+                                    uint64_t dpa, uint64_t len)
+{
+    CXLDCExtent *ent;
+    Range range1, range2;
+
+    if (!list) {
+        return false;
+    }
+
+    range_init_nofail(&range1, dpa, len);
+    QTAILQ_FOREACH(ent, list, node) {
+        range_init_nofail(&range2, ent->start_dpa, ent->len);
+        if (range_contains_range(&range2, &range1)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+static bool cxl_extent_groups_overlaps_dpa_range(CXLDCExtentGroupList *list,
+                                                uint64_t dpa, uint64_t len)
+{
+    CXLDCExtentGroup *group;
+
+    if (!list) {
+        return false;
+    }
+
+    QTAILQ_FOREACH(group, list, node) {
+        if (cxl_extents_overlaps_dpa_range(&group->list, dpa, len)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+/*
+ * The main function to process dynamic capacity event with extent list.
+ * Currently DC extents add/release requests are processed.
+ */
+static void qmp_cxl_process_dynamic_capacity_prescriptive(const char *path,
+        uint16_t hid, CXLDCEventType type, uint8_t rid,
+        CXLDynamicCapacityExtentList *records, Error **errp)
+{
+    Object *obj;
+    CXLEventDynamicCapacity dCap = {};
+    CXLEventRecordHdr *hdr = &dCap.hdr;
+    CXLType3Dev *dcd;
+    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
+    uint32_t num_extents = 0;
+    CXLDynamicCapacityExtentList *list;
+    CXLDCExtentGroup *group = NULL;
+    g_autofree CXLDCExtentRaw *extents = NULL;
+    uint8_t enc_log = CXL_EVENT_TYPE_DYNAMIC_CAP;
+    uint64_t dpa, offset, len, block_size;
+    g_autofree unsigned long *blk_bitmap = NULL;
+    int i;
+
+    obj = object_resolve_path_type(path, TYPE_CXL_TYPE3, NULL);
+    if (!obj) {
+        error_setg(errp, "Unable to resolve CXL type 3 device");
+        return;
+    }
+
+    dcd = CXL_TYPE3(obj);
+    if (!dcd->dc.num_regions) {
+        error_setg(errp, "No dynamic capacity support from the device");
+        return;
+    }
+
+
+    if (rid >= dcd->dc.num_regions) {
+        error_setg(errp, "region id is too large");
+        return;
+    }
+    block_size = dcd->dc.regions[rid].block_size;
+    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
+
+    /* Sanity check and count the extents */
+    list = records;
+    while (list) {
+        offset = list->value->offset;
+        len = list->value->len;
+        dpa = offset + dcd->dc.regions[rid].base;
+
+        if (len == 0) {
+            error_setg(errp, "extent with 0 length is not allowed");
+            return;
+        }
+
+        if (offset % block_size || len % block_size) {
+            error_setg(errp, "dpa or len is not aligned to region block size");
+            return;
+        }
+
+        if (offset + len > dcd->dc.regions[rid].len) {
+            error_setg(errp, "extent range is beyond the region end");
+            return;
+        }
+
+        /* No duplicate or overlapped extents are allowed */
+        if (test_any_bits_set(blk_bitmap, offset / block_size,
+                              len / block_size)) {
+            error_setg(errp, "duplicate or overlapped extents are detected");
+            return;
+        }
+        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
+
+        if (type == DC_EVENT_RELEASE_CAPACITY) {
+            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
+                                                     dpa, len)) {
+                error_setg(errp,
+                           "cannot release extent with pending DPA range");
+                return;
+            }
+            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents, dpa, len)) {
+                error_setg(errp,
+                           "cannot release extent with non-existing DPA range");
+                return;
+            }
+        } else if (type == DC_EVENT_ADD_CAPACITY) {
+            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents, dpa, len)) {
+                error_setg(errp,
+                           "cannot add DPA already accessible  to the same LD");
+                return;
+            }
+            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
+                                                     dpa, len)) {
+                error_setg(errp,
+                           "cannot add DPA again while still pending");
+                return;
+            }
+        }
+        list = list->next;
+        num_extents++;
+    }
+
+    /* Create extent list for event being passed to host */
+    i = 0;
+    list = records;
+    extents = g_new0(CXLDCExtentRaw, num_extents);
+    while (list) {
+        offset = list->value->offset;
+        len = list->value->len;
+        dpa = dcd->dc.regions[rid].base + offset;
+
+        extents[i].start_dpa = dpa;
+        extents[i].len = len;
+        memset(extents[i].tag, 0, 0x10);
+        extents[i].shared_seq = 0;
+        if (type == DC_EVENT_ADD_CAPACITY) {
+            group = cxl_insert_extent_to_extent_group(group,
+                                                      extents[i].start_dpa,
+                                                      extents[i].len,
+                                                      extents[i].tag,
+                                                      extents[i].shared_seq);
+        }
+
+        list = list->next;
+        i++;
+    }
+    if (group) {
+        cxl_extent_group_list_insert_tail(&dcd->dc.extents_pending, group);
+    }
+
+    /*
+     * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
+     *
+     * All Dynamic Capacity event records shall set the Event Record Severity
+     * field in the Common Event Record Format to Informational Event. All
+     * Dynamic Capacity related events shall be logged in the Dynamic Capacity
+     * Event Log.
+     */
+    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
+                            cxl_device_get_timestamp(&dcd->cxl_dstate));
+
+    dCap.type = type;
+    /* FIXME: for now, validity flag is cleared */
+    dCap.validity_flags = 0;
+    stw_le_p(&dCap.host_id, hid);
+    /* only valid for DC_REGION_CONFIG_UPDATED event */
+    dCap.updated_region_id = 0;
+    for (i = 0; i < num_extents; i++) {
+        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
+               sizeof(CXLDCExtentRaw));
+
+        dCap.flags = 0;
+        if (i < num_extents - 1) {
+            /* Set "More" flag */
+            dCap.flags |= BIT(0);
+        }
+
+        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
+                             (CXLEventRecordRaw *)&dCap)) {
+            cxl_event_irq_assert(dcd);
+        }
+    }
+}
+
+void qmp_cxl_add_dynamic_capacity(const char *path, uint16_t host_id,
+                                  uint8_t sel_policy, uint8_t region,
+                                  const char *tag,
+                                  CXLDynamicCapacityExtentList  *extents,
+                                  Error **errp)
+{
+    enum {
+        CXL_SEL_POLICY_FREE,
+        CXL_SEL_POLICY_CONTIGUOUS,
+        CXL_SEL_POLICY_PRESCRIPTIVE,
+        CXL_SEL_POLICY_ENABLESHAREDACCESS,
+    };
+    switch (sel_policy) {
+    case CXL_SEL_POLICY_PRESCRIPTIVE:
+        qmp_cxl_process_dynamic_capacity_prescriptive(path, host_id,
+                                                      DC_EVENT_ADD_CAPACITY,
+                                                      region, extents, errp);
+        return;
+    default:
+        error_setg(errp, "Selection policy not supported");
+        return;
+    }
+}
+
+#define REMOVAL_POLICY_MASK 0xf
+#define REMOVAL_POLICY_PRESCRIPTIVE 1
+#define FORCED_REMOVAL_BIT BIT(4)
+
+void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t host_id,
+                                      uint8_t flags, uint8_t region,
+                                      const char *tag,
+                                      CXLDynamicCapacityExtentList  *extents,
+                                      Error **errp)
+{
+    CXLDCEventType type = DC_EVENT_RELEASE_CAPACITY;
+
+    if (flags & FORCED_REMOVAL_BIT) {
+        /* TODO: enable forced removal in the future */
+        type = DC_EVENT_FORCED_RELEASE_CAPACITY;
+        error_setg(errp, "Forced removal not supported yet");
+        return;
+    }
+
+    switch (flags & REMOVAL_POLICY_MASK) {
+    case REMOVAL_POLICY_PRESCRIPTIVE:
+        qmp_cxl_process_dynamic_capacity_prescriptive(path, host_id, type,
+                                                      region, extents, errp);
+        return;
+    default:
+        error_setg(errp, "Removal policy not supported");
+        return;
+    }
+}
+
 static void ct3_class_init(ObjectClass *oc, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(oc);
diff --git a/hw/mem/cxl_type3_stubs.c b/hw/mem/cxl_type3_stubs.c
index 3e1851e32b..9df530ceec 100644
--- a/hw/mem/cxl_type3_stubs.c
+++ b/hw/mem/cxl_type3_stubs.c
@@ -67,3 +67,23 @@ void qmp_cxl_inject_correctable_error(const char *path, CxlCorErrorType type,
 {
     error_setg(errp, "CXL Type 3 support is not compiled in");
 }
+
+void qmp_cxl_add_dynamic_capacity(const char *path,
+                                  uint16_t host_id,
+                                  uint8_t sel_policy,
+                                  uint8_t region,
+                                  const char *tag,
+                                  CXLDynamicCapacityExtentList *extents,
+                                  Error **errp)
+{
+    error_setg(errp, "CXL Type 3 support is not compiled in");
+}
+
+void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t host_id,
+                                      uint8_t flags, uint8_t region,
+                                      const char *tag,
+                                      CXLDynamicCapacityExtentList *extents,
+                                      Error **errp)
+{
+    error_setg(errp, "CXL Type 3 support is not compiled in");
+}
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index df3511e91b..c69ff6b5de 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -443,6 +443,12 @@ typedef struct CXLDCExtent {
 } CXLDCExtent;
 typedef QTAILQ_HEAD(, CXLDCExtent) CXLDCExtentList;
 
+typedef struct CXLDCExtentGroup {
+    CXLDCExtentList list;
+    QTAILQ_ENTRY(CXLDCExtentGroup) node;
+} CXLDCExtentGroup;
+typedef QTAILQ_HEAD(, CXLDCExtentGroup) CXLDCExtentGroupList;
+
 typedef struct CXLDCRegion {
     uint64_t base;       /* aligned to 256*MiB */
     uint64_t decode_len; /* aligned to 256*MiB */
@@ -494,6 +500,7 @@ struct CXLType3Dev {
          */
         uint64_t total_capacity; /* 256M aligned */
         CXLDCExtentList extents;
+        CXLDCExtentGroupList extents_pending;
         uint32_t total_extent_count;
         uint32_t ext_list_gen_seq;
 
@@ -555,4 +562,19 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
 
 void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
                                         CXLDCExtent *extent);
+void cxl_insert_extent_to_extent_list(CXLDCExtentList *list, uint64_t dpa,
+                                      uint64_t len, uint8_t *tag,
+                                      uint16_t shared_seq);
+bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
+                       unsigned long size);
+bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
+                                    uint64_t dpa, uint64_t len);
+CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
+                                                    uint64_t dpa,
+                                                    uint64_t len,
+                                                    uint8_t *tag,
+                                                    uint16_t shared_seq);
+void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
+                                       CXLDCExtentGroup *group);
+void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list);
 #endif
diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
index 5170b8dbf8..38cadaa0f3 100644
--- a/include/hw/cxl/cxl_events.h
+++ b/include/hw/cxl/cxl_events.h
@@ -166,4 +166,22 @@ typedef struct CXLEventMemoryModule {
     uint8_t reserved[0x3d];
 } QEMU_PACKED CXLEventMemoryModule;
 
+/*
+ * CXL r3.1 section Table 8-50: Dynamic Capacity Event Record
+ * All fields little endian.
+ */
+typedef struct CXLEventDynamicCapacity {
+    CXLEventRecordHdr hdr;
+    uint8_t type;
+    uint8_t validity_flags;
+    uint16_t host_id;
+    uint8_t updated_region_id;
+    uint8_t flags;
+    uint8_t reserved2[2];
+    uint8_t dynamic_capacity_extent[0x28]; /* defined in cxl_device.h */
+    uint8_t reserved[0x18];
+    uint32_t extents_avail;
+    uint32_t tags_avail;
+} QEMU_PACKED CXLEventDynamicCapacity;
+
 #endif /* CXL_EVENTS_H */
diff --git a/qapi/cxl.json b/qapi/cxl.json
index 4281726dec..27cf39f448 100644
--- a/qapi/cxl.json
+++ b/qapi/cxl.json
@@ -361,3 +361,93 @@
 ##
 {'command': 'cxl-inject-correctable-error',
  'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
+
+##
+# @CXLDynamicCapacityExtent:
+#
+# A single dynamic capacity extent
+#
+# @offset: The offset (in bytes) to the start of the region
+#     where the extent belongs to
+#
+# @len: The length of the extent in bytes
+#
+# Since: 9.1
+##
+{ 'struct': 'CXLDynamicCapacityExtent',
+  'data': {
+      'offset':'uint64',
+      'len': 'uint64'
+  }
+}
+
+##
+# @cxl-add-dynamic-capacity:
+#
+# Command to initiate to add dynamic capacity extents to a host.  It
+# simulates operations defined in cxl spec r3.1 7.6.7.6.5.
+#
+# @path: CXL DCD canonical QOM path
+#
+# @host-id: The "Host ID" field as defined in cxl spec r3.1
+#     Table 7-70.
+#
+# @selection-policy: The "Selection Policy" bits as defined in
+#     cxl spec r3.1 Table 7-70.  It specifies the policy to use for
+#     selecting which extents comprise the added capacity.
+#
+# @region: The "Region Number" field as defined in cxl spec r3.1
+#     Table 7-70.  The dynamic capacity region where the capacity
+#     is being added.  Valid range is from 0-7.
+#
+# @tag: The "Tag" field as defined in cxl spec r3.1 Table 7-70.
+#
+# @extents: The "Extent List" field as defined in cxl spec r3.1
+#     Table 7-70.
+#
+# Since : 9.1
+##
+{ 'command': 'cxl-add-dynamic-capacity',
+  'data': { 'path': 'str',
+            'host-id': 'uint16',
+            'selection-policy': 'uint8',
+            'region': 'uint8',
+            'tag': 'str',
+            'extents': [ 'CXLDynamicCapacityExtent' ]
+           }
+}
+
+##
+# @cxl-release-dynamic-capacity:
+#
+# Command to initiate to release dynamic capacity extents from a
+# host.  It simulates operations defined in cxl spec r3.1 7.6.7.6.6.
+#
+# @path: CXL DCD canonical QOM path
+#
+# @host-id: The "Host ID" field as defined in cxl spec r3.1
+#     Table 7-71.
+#
+# @flags: The "Flags" field as defined in cxl spec r3.1 Table 7-71,
+#     with bit[3:0] for removal policy, bit[4] for forced removal,
+#     bit[5] for sanitize on release, bit[7:6] reserved.
+#
+# @region: The dynamic capacity region where the extents will be
+#     released.
+#
+# @tag: The "Tag" field as defined in cxl spec r3.1 Table 7-71.
+#
+# @extents: The "Extent List" field as defined in cxl spec r3.1
+#     Table 7-71.
+#
+# Since : 9.1
+##
+{ 'command': 'cxl-release-dynamic-capacity',
+  'data': { 'path': 'str',
+            'host-id': 'uint16',
+            'flags': 'uint8',
+            'region': 'uint8',
+            'tag': 'str',
+            'extents': [ 'CXLDynamicCapacityExtent' ]
+           }
+}
-- 
2.43.0



* Re: [PATCH v7 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2024-05-01 14:58             ` Jonathan Cameron via
  (?)
@ 2024-05-01 22:36             ` fan
  -1 siblings, 0 replies; 65+ messages in thread
From: fan @ 2024-05-01 22:36 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: fan, Markus Armbruster, qemu-devel, linux-cxl, gregory.price,
	ira.weiny, dan.j.williams, a.manzanares, dave, nmtadam.samsung,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni, mst

Hi Markus, Michael and Jonathan,

FYI. I have updated this patch based on the feedback so far, and posted it here:
https://lore.kernel.org/linux-cxl/20240418232902.583744-1-fan.ni@samsung.com/T/#ma25b6657597d39df23341dc43c22a8c49818e5f9

Comments are welcome and appreciated.

Fan


On Wed, May 01, 2024 at 03:58:12PM +0100, Jonathan Cameron wrote:
> 
> 
> 
> > > >> > +# @hid: host id  
> > > >> 
> > > >> @host-id, unless "HID" is established terminology in CXL DCD land.  
> > > >
> > > > host-id works.  
> > > >> 
> > > >> What is a host ID?  
> > > >
> > > > It is an id identifying the host to which the capacity is being added.  
> > > 
> > > How are these IDs assigned?  
> > 
> > All the arguments passed to the command here are defined in CXL spec. I
> > will add reference to the spec.
> > 
> > Based on the spec, for LD-FAM (Fabric attached memory represented as
> > logical device), host id is the LD-ID of the host interface to which
> > the capacity is being added. LD-ID is a unique number (16-bit) assigned
> > to a host interface.
> 
> Key here is the host doesn't know it.  This ID exists purely for routing
> to the appropriate host interface either via choosing a port on a
> multihead Single Logical Device (SLD) (so today it's always 0 as we only
> have one head) or if we ever implement a switch capable of handling MLDs
> then the switch will handle routing of host PCIe accesses so it lands
> on the interface defined by this ID (and the event turns up in that event log).
> 
>             Host A         Host B - could in theory be a RP on host A ;)
>               |              |  Doesn't exist (yet, but there are partial
>              _|______________|_ patches for this on list).
>             | LD 0         LD 1|
>             |                  |
>             |   Multi Head     |
>             |   Single Logical |
>             |  Device (MH-SLD) |
>             |__________________|
> Host view similar to the switch case, but just two direct
> connected devices.
> 
> Or Switch and MLD case - we aren't emulating this yet at all
> 
>      Wiring / real topology                 Host View 
>          
>       Host A     Host B              Host A       Host B
>         |          |                   |            |
>      ___|__________|___               _|_          _|_
>     |   \  SWITCH /    |             |SW0|        | | |
>     |    \       /     |             | | |        | | |
>     |    LD0   LD1     |             | | |        | | |
>     |      \   /       |             | | |        | | |
>     |        |         |             | | |        | | |
>     |________|_________|             |_|_|        |_|_|
>              |                         |            |
>       Traffic tagged with LD           |            |
>              |                         |            |
>      ________|________________     ____|___     ____|___
>     | Multilogical Device MLD |   |        |   |        |
>     |        |                |   | Simple |   | Another|
>     |       / \               |   | CXL    |   | CXL    |
>     |      /   \              |   | Memory |   | Memory |
>     |    Interfaces           |   | Device |   | Device |
>     |   LD0     LD1           |   |        |   |        |
>     |_________________________|   |________|   |________|
> 
> Note the hosts just see separate devices and switches with the fun exception that the
> memory may actually be available to both at the same time.
> 
> Control plane for the switches and MLD sees what is actually going on.
> 
> At this stage the upshot is we could just default this to zero and add an optional
> parameter to set it later.
> 
> 
> 
> ...
> 
> > > >> > +# @extents: Extents to release
> > > >> > +#
> > > >> > +# Since : 9.1
> > > >> > +##
> > > >> > +{ 'command': 'cxl-release-dynamic-capacity',
> > > >> > +  'data': { 'path': 'str',
> > > >> > +            'hid': 'uint16',
> > > >> > +            'flags': 'uint8',
> > > >> > +            'region-id': 'uint8',
> > > >> > +            'tag': 'str',
> > > >> > +            'extents': [ 'CXLDCExtentRecord' ]
> > > >> > +           }
> > > >> > +}  
> > > >> 
> > > >> During review of v5, you wrote:
> > > >> 
> > > >>     For the add command, the host will send a mailbox command in response
> > > >>     to the add request to the device, to indicate whether it accepts the
> > > >>     add capacity offer or not.
> > > >>     
> > > >>     For the release command, the host sends a mailbox command (not always
> > > >>     a response, since the host can proactively release capacity if it
> > > >>     does not need it any more) to ask the device to release the capacity.
> > > >> 
> > > >> Can you briefly sketch the protocol?  Peers and messages involved.
> > > >> Possibly as a state diagram.  
> > > >
> > > > Need to think about it. If we can polish the text nicely, maybe the
> > > > sketch is not needed. My concern is that the sketch may
> > > > introduce unwanted complexity as we expose too many details. The two
> > > > commands provide ways to add/release dynamic capacity to/from a host,
> > > > that is all. All the other information, like what the host will do or
> > > > how the device will react, is a consequence of the command; I am not
> > > > sure whether we want to include it here.
> > > 
> > > The protocol sketch is for me, not necessarily the doc comment.  I'd
> > > like to understand at high level how this stuff works, because only then
> > > can I meaningfully review the docs.  
> > 
> > --------------------------------
> > For the add command, say a user sends a request to the FM (fabric manager)
> > asking it to add extent A of the device (managed by the FM) to host 0.
> > The cxl-add-dynamic-capacity command simulates what the FM needs to do.
> 
> This gets a little fiddly as an explanation.  I'd argue this is more or
> less at the level of the FM to device command flow so it's the device
> verifying etc. (you could explain this interface as talking to an FM
> that is talking to the device, but that just feels complicated to me).
> 
> > 1. Verify that extent A is valid (behaviour defined by the spec); return
> > an error if not. Otherwise,
> > 2. Add a record to the device's event log (indicating the intent to
> > add extent A to host 0), update the device's internal extent tracking
> > status, and signal an interrupt to host 0.
> > (Steps 1 and 2 above are performed in the QMP interface; the following
> > operations do not involve QMP, only the host and the device.)
> 
> In this patch.
> 
> > 3. Once the interrupt is received, host 0 fetches the event record from
> > the device's event log through a mailbox command (out of scope
> > of this patch series).
> 
> It's in patch 8.
> 
> > 4. Host 0 decides whether it accepts extent A or not. Whether it accepts
> > or rejects, the host needs to send a response (add-response mailbox
> > command) to the device so the device can update its internal extent
> > tracking status accordingly.
> > The device returns a value to the host showing whether the response
> > succeeded or failed.
> 
> (assuming the host isn't buggy this always succeeds)
> 
> > 5. Based on the mailbox command return value, the host proceeds
> > accordingly.
> 
> Memory now useable by host if it accepted it successfully.
> 
> > 6. The host sends a mailbox command to the device to clear the event
> > record in the device's event log. 
> > 
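The add flow above can be read as a small per-extent state machine. What
follows is a minimal C sketch of that reading, not code from the series:
the names (AddExtentState, add_qmp_request and so on) are hypothetical,
and the event log is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical extent states for illustration; not QEMU's data model. */
typedef enum {
    ADD_EXTENT_FREE,     /* owned by the device, not offered to a host */
    ADD_EXTENT_PENDING,  /* offered: event record logged, host not yet replied */
    ADD_EXTENT_ACCEPTED  /* host accepted: memory usable by the host */
} AddExtentState;

typedef struct {
    AddExtentState state;
    int event_records;   /* records outstanding in the DC event log */
    int interrupts;      /* interrupts signalled to host 0 */
} AddModel;

/* Steps 1-2 (QMP side): validate, log an event record, signal the host. */
bool add_qmp_request(AddModel *m, bool extent_valid)
{
    if (!extent_valid || m->state != ADD_EXTENT_FREE) {
        return false;            /* step 1 failed: nothing is logged */
    }
    m->state = ADD_EXTENT_PENDING;
    m->event_records++;
    m->interrupts++;
    return true;
}

/* Step 4 (host side): add-response mailbox command, accept or reject. */
bool add_host_response(AddModel *m, bool accept)
{
    if (m->state != ADD_EXTENT_PENDING) {
        return false;            /* nothing pending to respond to */
    }
    m->state = accept ? ADD_EXTENT_ACCEPTED : ADD_EXTENT_FREE;
    return true;
}

/* Step 6 (host side): clear the event record from the event log. */
bool add_host_clear_event(AddModel *m)
{
    if (m->event_records == 0) {
        return false;
    }
    m->event_records--;
    return true;
}
```

Step 3 (fetching the record) and step 5 (host-side handling) do not change
device state in this simplified model, so they have no function of their own.
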
> > ---------------------------------
> > For the release command, say a user sends a request to the FM asking host 0
> > to release extent A and return it to the device (managed by the FM).
> > 
> > The cxl-release-dynamic-capacity command simulates what the FM needs to do.
> > 1. Verify that extent A is valid (defined by the spec); return an error if
> > not. Otherwise,
> > 2. Add a record to the event log (indicating the intent to
> > release extent A from host 0) and signal an interrupt to host 0.
> > (Steps 1 and 2 above are performed in the QMP interface; the following
> > operations do not involve QMP, only the host and the device.)
> > 3. Once the interrupt is received, host 0 fetches the event record from
> > the device's event log through a mailbox command (out of scope
> > of this patch series).
> > 4. Host 0 decides whether it can release extent A or not. Whether it can
> > or cannot, the host needs to send a release (mailbox command) to the device
> > so the device can update its internal extent tracking status accordingly.
> > The device returns a value to host 0 showing whether the release
> > succeeded or failed.
> > 5. Based on the returned value, the host proceeds accordingly.
> > 6. The host sends a mailbox command to clear the event record in the
> > device's event log.
> > 
> > The release command is more complicated. Depending on the release flags
> > passed to the FM, the FM can behave differently. For example, if the
> > forced-removal flag is set, the FM can directly take the extent back from
> > a host for other uses without waiting for the host to send a command to
> > the device. In that case, step 2 above may not add an event record to the
> > event log (not supported in this patch series yet).
> I thought we weren't doing force remove yet?  So for that we could
> set default value as normal release until we add that support perhaps.
> 
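The release flow, together with the v7 decision to reject forced removal
rather than let it pass through, could be sketched as below. This is only an
illustration under assumptions: the names are hypothetical, and the bit
position used for the forced-removal flag is assumed here and should be
checked against the spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position of the forced-removal flag; verify against the spec. */
#define SKETCH_RELEASE_FORCED_REMOVAL (1u << 4)

/* Hypothetical extent states for illustration; not QEMU's data model. */
typedef enum {
    REL_EXTENT_ACCEPTED,        /* in use by the host */
    REL_EXTENT_RELEASE_PENDING, /* release requested, event record logged */
    REL_EXTENT_FREE             /* returned to the device */
} RelExtentState;

typedef enum {
    REL_OK,
    REL_ERR_INVALID_EXTENT,
    REL_ERR_UNSUPPORTED_FLAG
} RelResult;

typedef struct {
    RelExtentState state;
    int event_records;          /* records outstanding in the DC event log */
} RelModel;

/* Steps 1-2 (QMP side): reject forced removal, validate, log the event. */
RelResult rel_qmp_request(RelModel *m, uint8_t flags, bool extent_valid)
{
    if (flags & SKETCH_RELEASE_FORCED_REMOVAL) {
        return REL_ERR_UNSUPPORTED_FLAG;   /* not supported yet */
    }
    if (!extent_valid || m->state != REL_EXTENT_ACCEPTED) {
        return REL_ERR_INVALID_EXTENT;
    }
    m->state = REL_EXTENT_RELEASE_PENDING;
    m->event_records++;
    return REL_OK;
}

/*
 * Step 4 (host side): release mailbox command. The host may also release
 * proactively, so an accepted extent with no pending request is allowed.
 */
bool rel_host_release(RelModel *m)
{
    if (m->state == REL_EXTENT_FREE) {
        return false;           /* nothing left to release */
    }
    m->state = REL_EXTENT_FREE;
    return true;
}
```

Rejecting the flag before any state change keeps the event log untouched
when the request is refused, which matches the "return error" wording of
step 1.
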
> > 
> > Also, the release interface here simulates the FM initiating the
> > release request.
> > There is another case where the host can proactively release extents it
> > no longer needs back to the device. However, this case is out of the
> > scope of this release interface.
> > 
> > Hope the above text helps a little for the context here.
> > Let me know if further clarification is needed.
> 
> The only thing I'd add is that, for now (because we don't need it for
> testing the kernel flows), this does not provide any way for external
> agents (e.g. our 'fabric manager') to find out what the state is, i.e.
> whether the extents have been accepted by the host etc. That stuff is all
> defined by the spec, but not yet in the QMP interface.  At some point
> we may want to add that as a state query type interface.
> 
> Jonathan
> 
> p.s. Our emails raced yesterday, so great you put together this explanation
> of the flows before I got to it :)
> 
> > 
> > Thanks,
> > Fan
> > 
> > 
> > 
> > >   
> > > > @Jonathan, Any thoughts on this?  
> > > 
> > > Thanks!
> > >   
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v7 00/12] Enabling DCD emulation support in Qemu
  2024-04-18 23:10 [PATCH v7 00/12] Enabling DCD emulation support in Qemu nifan.cxl
@ 2024-05-14  2:16   ` Zhijian Li (Fujitsu) via
  2024-04-18 23:10 ` [PATCH v7 02/12] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support nifan.cxl
                     ` (12 subsequent siblings)
  13 siblings, 0 replies; 65+ messages in thread
From: Zhijian Li (Fujitsu) @ 2024-05-14  2:16 UTC (permalink / raw)
  To: nifan.cxl, qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	Jorgen.Hansen, wj28.lee, fan.ni

Hi Fan


Do you have newer instructions for playing with DCD? It seems that
the instructions in the RFC [0] don't work with the current code.

[0] https://lore.kernel.org/all/20230511175609.2091136-1-fan.ni@samsung.com/



On 19/04/2024 07:10, nifan.cxl@gmail.com wrote:
> A git tree of this series can be found here (with one extra commit on top
> for printing out accepted/pending extent list):
> https://github.com/moking/qemu/tree/dcd-v7
> 
> v6->v7:
> 
> 1. Fixed the dvsec range register issue mentioned in the cover letter in v6.
>     Only relevant bits are set to mark the device ready (Patch 6). (Jonathan)
> 2. Moved the if statement in cxl_setup_memory from Patch 6 to Patch 4. (Jonathan)
> 3. Used MIN instead of if statement to get record_count in Patch 7. (Jonathan)
> 4. Added "Reviewed-by" tag to Patch 7.
> 5. Modified cxl_dc_extent_release_dry_run so the updated extent list can be
>     reused in cmd_dcd_release_dyn_cap to simplify the process in Patch 8. (Jørgen)
> 6. Added comments to indicate further "TODO" items in cmd_dcd_add_dyn_cap_rsp.
>      (Jonathan)
> 7. Avoided irrelevant code reformat in Patch 8. (Jonathan)
> 8. Modified QMP interfaces for adding/releasing DC extents to allow passing
>     tags, selection policy, flags in the interface. (Jonathan, Gregory)
> 9. Redesigned the pending list so extents in the same requests are grouped
>      together. A new data structure is introduced to represent "extent group"
>      in pending list.  (Jonathan)
> 10. Added support in QMP interface for "More" flag.
> 11. Check "Forced removal" flag for release request and not let it pass through.
> 12. Removed the dynamic capacity log type from CxlEventLog definition in cxl.json
>     to avoid the side effect it may introduce to inject error to DC event log.
>     (Jonathan)
> 13. Hard coded the event log type to dynamic capacity event log in QMP
>      interfaces. (Jonathan)
> 14. Adding space in between "-1]". (Jonathan)
> 15. Some minor comment fixes.
> 
> The code is tested with similar setup and has passed similar tests as listed
> in the cover letter of v5[1] and v6[2].
> Also, the code is tested with the latest DCD kernel patchset[3].
> 
> [1] Qemu DCD patchset v5: https://lore.kernel.org/linux-cxl/20240304194331.1586191-1-nifan.cxl@gmail.com/T/#t
> [2] Qemu DCD patchset v6: https://lore.kernel.org/linux-cxl/20240325190339.696686-1-nifan.cxl@gmail.com/T/#t
> [3] DCD kernel patches: https://lore.kernel.org/linux-cxl/20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com/T/#m11c571e21c4fe17c7d04ec5c2c7bc7cbf2cd07e3
> 
> 
> Fan Ni (12):
>    hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output
>      payload of identify memory device command
>    hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative
>      and mailbox command support
>    include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for
>      type3 memory devices
>    hw/mem/cxl_type3: Add support to create DC regions to type3 memory
>      devices
>    hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr
>      size instead of mr as argument
>    hw/mem/cxl_type3: Add host backend and address space handling for DC
>      regions
>    hw/mem/cxl_type3: Add DC extent list representative and get DC extent
>      list mailbox support
>    hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release
>      dynamic capacity response
>    hw/cxl/events: Add qmp interfaces to add/release dynamic capacity
>      extents
>    hw/mem/cxl_type3: Add DPA range validation for accesses to DC regions
>    hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support
>    hw/mem/cxl_type3: Allow to release extent superset in QMP interface
> 
>   hw/cxl/cxl-mailbox-utils.c  | 620 ++++++++++++++++++++++++++++++++++-
>   hw/mem/cxl_type3.c          | 633 +++++++++++++++++++++++++++++++++---
>   hw/mem/cxl_type3_stubs.c    |  20 ++
>   include/hw/cxl/cxl_device.h |  81 ++++-
>   include/hw/cxl/cxl_events.h |  18 +
>   qapi/cxl.json               |  69 ++++
>   6 files changed, 1396 insertions(+), 45 deletions(-)
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v7 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2024-04-18 23:11 ` [PATCH v7 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents nifan.cxl
@ 2024-05-14  2:35     ` Zhijian Li (Fujitsu) via
  2024-04-22 12:01     ` Jonathan Cameron via
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 65+ messages in thread
From: Zhijian Li (Fujitsu) @ 2024-05-14  2:35 UTC (permalink / raw)
  To: nifan.cxl, qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	Jorgen.Hansen, wj28.lee, Fan Ni



On 19/04/2024 07:11, nifan.cxl@gmail.com wrote:
> +        } else if (type == DC_EVENT_ADD_CAPACITY) {
> +            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents, dpa, len)) {
> +                error_setg(errp,
> +                           "cannot add DPA already accessible  to the same LD");
> +                return;
> +            }


Double *space* before 'to'

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v7 04/12] hw/mem/cxl_type3: Add support to create DC regions to type3 memory devices
  2024-04-18 23:10 ` [PATCH v7 04/12] hw/mem/cxl_type3: Add support to create DC regions to " nifan.cxl
@ 2024-05-14  8:14     ` Zhijian Li (Fujitsu)
  2024-05-14  8:14     ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 65+ messages in thread
From: Zhijian Li (Fujitsu) via @ 2024-05-14  8:14 UTC (permalink / raw)
  To: nifan.cxl, qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	Jorgen.Hansen, wj28.lee, Fan Ni



On 19/04/2024 07:10, nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
> 

> +}
> +
>   static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
>   {
>       DeviceState *ds = DEVICE(ct3d);
> @@ -635,6 +676,13 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
>           g_free(p_name);
>       }
>   
> +    if (ct3d->dc.num_regions > 0) {
> +        if (!cxl_create_dc_regions(ct3d, errp)) {
> +            error_setg(errp, "setup DC regions failed");

This error_setg() would trigger an assertion if errp was already set inside cxl_create_dc_regions().
Try error_append_hint() instead.

#3  0x00007f1fdc4fafc6 in annobin_assert.c_end () at /lib64/libc.so.6
#4  0x0000555fd3edbea8 in error_setv
     (errp=0x7ffe6d1a3de0, src=0x555fd3fe262b "../hw/mem/cxl_type3.c", line=807, func=0x555fd3fe2fe0 <__func__.21> "cxl_setup_memory", err_class=ERROR_CLASS_GENERIC_ERROR, fmt=0x555fd3fe2939 "setup DC regions failed", ap=0x7ffe6d1a3c00, suffix=0x0) at ../util/error.c:68
#5  0x0000555fd3edc126 in error_setg_internal
     (errp=0x7ffe6d1a3de0, src=0x555fd3fe262b "../hw/mem/cxl_type3.c", line=807, func=0x555fd3fe2fe0 <__func__.21> "cxl_setup_memory", fmt=0x555fd3fe2939 "setup DC regions failed") at ../util/error.c:105
#6  0x0000555fd3819c9f in cxl_setup_memory (ct3d=0x555fd8b2f3e0, errp=0x7ffe6d1a3de0) at ../hw/mem/cxl_type3.c:807
#7  0x0000555fd3819d7b in ct3_realize (pci_dev=0x555fd8b2f3e0, errp=0x7ffe6d1a3de0) at ../hw/mem/cxl_type3.c:833
#8  0x0000555fd38b575f in pci_qdev_realize (qdev=0x555fd8b2f3e0, errp=0x7ffe6d1a3e60) at ../hw/pci/pci.c:2093
#9  0x0000555fd3ccca9b in device_set_realized (obj=0x555fd8b2f3e0, value=true, errp=0x7ffe6d1a40d0)
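
The failure mode in the backtrace can be reproduced with a small stand-alone
model. The sketch below does not use QEMU's real Error API: MockError and the
mock_* helpers are hypothetical stand-ins that only mimic the one property
that matters here, namely that setting an error over an already-set errp
asserts, while appending a hint to an existing error is fine. The limit of 8
regions is illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error object. */
typedef struct MockError {
    char msg[128];
    char hint[128];
} MockError;

/* Mimics error_setg(): setting an error twice is a programming bug. */
void mock_error_set(MockError **errp, const char *msg)
{
    if (!errp) {
        return;
    }
    assert(*errp == NULL);
    *errp = calloc(1, sizeof(**errp));
    if (!*errp) {
        abort();
    }
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
}

/* Mimics error_append_hint(): only adds detail to an existing error. */
void mock_error_append_hint(MockError **errp, const char *hint)
{
    size_t used;

    if (!errp || !*errp) {
        return;
    }
    used = strlen((*errp)->hint);
    snprintf((*errp)->hint + used, sizeof((*errp)->hint) - used, "%s", hint);
}

/* Callee that reports its own error, like cxl_create_dc_regions(). */
bool mock_create_dc_regions(int num_regions, MockError **errp)
{
    if (num_regions > 8) {      /* illustrative limit */
        mock_error_set(errp, "too many DC regions");
        return false;
    }
    return true;
}

/* Caller: must not set errp again, only append context. */
bool mock_setup_memory(int num_regions, MockError **errp)
{
    if (num_regions > 0 && !mock_create_dc_regions(num_regions, errp)) {
        /* mock_error_set(errp, ...) here would hit the assertion. */
        mock_error_append_hint(errp, "setup DC regions failed\n");
        return false;
    }
    return true;
}
```

In QEMU proper, the error.h documentation also asks for ERRP_GUARD() in
functions that call error_append_hint() or error_prepend() on a
caller-supplied errp; simply returning false and keeping the callee's message
is another option.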

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v7 06/12] hw/mem/cxl_type3: Add host backend and address space handling for DC regions
  2024-04-18 23:10 ` [PATCH v7 06/12] hw/mem/cxl_type3: Add host backend and address space handling for DC regions nifan.cxl
@ 2024-05-14  8:28     ` Zhijian Li (Fujitsu)
  2024-04-22 11:52     ` Jonathan Cameron via
  2024-05-14  8:28     ` Zhijian Li (Fujitsu)
  2 siblings, 0 replies; 65+ messages in thread
From: Zhijian Li (Fujitsu) via @ 2024-05-14  8:28 UTC (permalink / raw)
  To: nifan.cxl, qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	Jorgen.Hansen, wj28.lee, Fan Ni



On 19/04/2024 07:10, nifan.cxl@gmail.com wrote:
> +    uint64_t dc_size;
> +
> +    mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
> +    dc_size = memory_region_size(mr);
> +    region_len = DIV_ROUND_UP(dc_size, ct3d->dc.num_regions);
> +
> +    if (dc_size % (ct3d->dc.num_regions * CXL_CAPACITY_MULTIPLIER) != 0) {
> +        error_setg(errp, "host backend size must be multiples of region len");

I would prefer to have region_len in the error message as well, so that I can
update the backend file accordingly.



> +        return false;
> +    }
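
One way the message could carry the numbers is sketched below as stand-alone
C rather than as a patch. Assumptions: the 256 MiB value for the capacity
multiplier, the function name sketch_check_dc_size, and the exact wording of
the message are all illustrative; in QEMU the text would go through
error_setg() instead of a caller-supplied buffer.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed value of the capacity multiplier (256 MiB). */
#define SKETCH_CAPACITY_MULTIPLIER ((uint64_t)256 << 20)

/*
 * Same divisibility check as in the quoted hunk, but the message reports
 * the actual size and the required alignment so the user can resize the
 * backend file without guessing.
 */
bool sketch_check_dc_size(uint64_t dc_size, uint8_t num_regions,
                          char *msg, size_t msg_len)
{
    uint64_t align;

    if (num_regions == 0) {
        snprintf(msg, msg_len, "number of DC regions must be non-zero");
        return false;
    }

    align = (uint64_t)num_regions * SKETCH_CAPACITY_MULTIPLIER;
    if (dc_size % align != 0) {
        snprintf(msg, msg_len,
                 "host backend size 0x%" PRIx64
                 " must be a multiple of 0x%" PRIx64
                 " (%u regions x 0x%" PRIx64 ")",
                 dc_size, align, (unsigned)num_regions,
                 SKETCH_CAPACITY_MULTIPLIER);
        return false;
    }
    return true;
}
```

With two regions, a 768 MiB backend is rejected and the message names the
512 MiB (0x20000000) alignment it has to meet.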

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v7 00/12] Enabling DCD emulation support in Qemu
  2024-04-19 18:24 ` [PATCH v7 00/12] Enabling DCD emulation support in Qemu Gregory Price
  2024-04-19 18:43   ` fan
@ 2024-05-16 17:05   ` fan
  1 sibling, 0 replies; 65+ messages in thread
From: fan @ 2024-05-16 17:05 UTC (permalink / raw)
  To: Gregory Price
  Cc: nifan.cxl, qemu-devel, jonathan.cameron, linux-cxl, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	Jorgen.Hansen, wj28.lee, fan.ni

On Fri, Apr 19, 2024 at 02:24:36PM -0400, Gregory Price wrote:
> On Thu, Apr 18, 2024 at 04:10:51PM -0700, nifan.cxl@gmail.com wrote:
> > A git tree of this series can be found here (with one extra commit on top
> > for printing out accepted/pending extent list): 
> > https://github.com/moking/qemu/tree/dcd-v7
> > 
> > v6->v7:
> > 
> > > 1. Fixed the dvsec range register issue mentioned in the cover letter in v6.
> >    Only relevant bits are set to mark the device ready (Patch 6). (Jonathan)
> > 2. Moved the if statement in cxl_setup_memory from Patch 6 to Patch 4. (Jonathan)
> > 3. Used MIN instead of if statement to get record_count in Patch 7. (Jonathan)
> > 4. Added "Reviewed-by" tag to Patch 7.
> > 5. Modified cxl_dc_extent_release_dry_run so the updated extent list can be
> >    reused in cmd_dcd_release_dyn_cap to simplify the process in Patch 8. (Jørgen) 
> > 6. Added comments to indicate further "TODO" items in cmd_dcd_add_dyn_cap_rsp.
> >     (Jonathan)
> > 7. Avoided irrelevant code reformat in Patch 8. (Jonathan)
> > 8. Modified QMP interfaces for adding/releasing DC extents to allow passing
> >    tags, selection policy, flags in the interface. (Jonathan, Gregory)
> > 9. Redesigned the pending list so extents in the same requests are grouped
> >     together. A new data structure is introduced to represent "extent group"
> >     in pending list.  (Jonathan)
> > 10. Added support in QMP interface for "More" flag. 
> > 11. Check "Forced removal" flag for release request and not let it pass through.
> > 12. Removed the dynamic capacity log type from CxlEventLog definition in cxl.json
> >    to avoid the side effect it may introduce to inject error to DC event log.
> >    (Jonathan)
> > 13. Hard coded the event log type to dynamic capacity event log in QMP
> >     interfaces. (Jonathan)
> > 14. Adding space in between "-1]". (Jonathan)
> > 15. Some minor comment fixes.
> > 
> > The code is tested with similar setup and has passed similar tests as listed
> > in the cover letter of v5[1] and v6[2].
> > Also, the code is tested with the latest DCD kernel patchset[3].
> > 
> > [1] Qemu DCD patchset v5: https://lore.kernel.org/linux-cxl/20240304194331.1586191-1-nifan.cxl@gmail.com/T/#t
> > [2] Qemu DCD patchset v6: https://lore.kernel.org/linux-cxl/20240325190339.696686-1-nifan.cxl@gmail.com/T/#t
> > [3] DCD kernel patches: https://lore.kernel.org/linux-cxl/20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com/T/#m11c571e21c4fe17c7d04ec5c2c7bc7cbf2cd07e3
> >
> 
> added review to all patches, will hopefully be able to add a Tested-by
> tag early next week, along with a v1 RFC for MHD bit-tracking.
> 
> We've been testing v5/v6 for a bit, so I expect as soon as we get the
> MHD code ported over to v7 i'll ship a tested-by tag pretty quick.
> 
> The super-set release will complicate a few things but this doesn't
> look like a blocker on our end, just a change to how we track bits in a
> shared bit/bytemap.
> 

Hi Gregory,
I am planning to address all the concerns in this series and send out v8
next week. Jonathan mentioned you have a few related patches built on top
of this series; can you point me to the latest version so I can look
into them? Also, would you like me to carry them and send them together
with the next version of my series? That might save you a potential
rebase of your patches.

Let me know.

Thanks,
Fan

> > 
> > Fan Ni (12):
> >   hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output
> >     payload of identify memory device command
> >   hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative
> >     and mailbox command support
> >   include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for
> >     type3 memory devices
> >   hw/mem/cxl_type3: Add support to create DC regions to type3 memory
> >     devices
> >   hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr
> >     size instead of mr as argument
> >   hw/mem/cxl_type3: Add host backend and address space handling for DC
> >     regions
> >   hw/mem/cxl_type3: Add DC extent list representative and get DC extent
> >     list mailbox support
> >   hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release
> >     dynamic capacity response
> >   hw/cxl/events: Add qmp interfaces to add/release dynamic capacity
> >     extents
> >   hw/mem/cxl_type3: Add DPA range validation for accesses to DC regions
> >   hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support
> >   hw/mem/cxl_type3: Allow to release extent superset in QMP interface
> > 
> >  hw/cxl/cxl-mailbox-utils.c  | 620 ++++++++++++++++++++++++++++++++++-
> >  hw/mem/cxl_type3.c          | 633 +++++++++++++++++++++++++++++++++---
> >  hw/mem/cxl_type3_stubs.c    |  20 ++
> >  include/hw/cxl/cxl_device.h |  81 ++++-
> >  include/hw/cxl/cxl_events.h |  18 +
> >  qapi/cxl.json               |  69 ++++
> >  6 files changed, 1396 insertions(+), 45 deletions(-)
> > 
> > -- 
> > 2.43.0
> > 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v7 04/12] hw/mem/cxl_type3: Add support to create DC regions to type3 memory devices
  2024-05-14  8:14     ` Zhijian Li (Fujitsu)
  (?)
@ 2024-05-16 17:06     ` fan
  -1 siblings, 0 replies; 65+ messages in thread
From: fan @ 2024-05-16 17:06 UTC (permalink / raw)
  To: Zhijian Li (Fujitsu)
  Cc: nifan.cxl, qemu-devel, jonathan.cameron, linux-cxl,
	gregory.price, ira.weiny, dan.j.williams, a.manzanares, dave,
	nmtadam.samsung, jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni

On Tue, May 14, 2024 at 08:14:59AM +0000, Zhijian Li (Fujitsu) wrote:
> 
> 
> On 19/04/2024 07:10, nifan.cxl@gmail.com wrote:
> > From: Fan Ni <fan.ni@samsung.com>
> > 
> 
> > +}
> > +
> >   static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> >   {
> >       DeviceState *ds = DEVICE(ct3d);
> > @@ -635,6 +676,13 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> >           g_free(p_name);
> >       }
> >   
> > +    if (ct3d->dc.num_regions > 0) {
> > +        if (!cxl_create_dc_regions(ct3d, errp)) {
> > +            error_setg(errp, "setup DC regions failed");
> 
> This error_setg() would trigger an assertion if errp was already set inside cxl_create_dc_regions().
> Try error_append_hint() instead.
Thanks, let me check and fix.

Fan
> 
> #3  0x00007f1fdc4fafc6 in annobin_assert.c_end () at /lib64/libc.so.6
> #4  0x0000555fd3edbea8 in error_setv
>      (errp=0x7ffe6d1a3de0, src=0x555fd3fe262b "../hw/mem/cxl_type3.c", line=807, func=0x555fd3fe2fe0 <__func__.21> "cxl_setup_memory", err_class=ERROR_CLASS_GENERIC_ERROR, fmt=0x555fd3fe2939 "setup DC regions failed", ap=0x7ffe6d1a3c00, suffix=0x0) at ../util/error.c:68
> #5  0x0000555fd3edc126 in error_setg_internal
>      (errp=0x7ffe6d1a3de0, src=0x555fd3fe262b "../hw/mem/cxl_type3.c", line=807, func=0x555fd3fe2fe0 <__func__.21> "cxl_setup_memory", fmt=0x555fd3fe2939 "setup DC regions failed") at ../util/error.c:105
> #6  0x0000555fd3819c9f in cxl_setup_memory (ct3d=0x555fd8b2f3e0, errp=0x7ffe6d1a3de0) at ../hw/mem/cxl_type3.c:807
> #7  0x0000555fd3819d7b in ct3_realize (pci_dev=0x555fd8b2f3e0, errp=0x7ffe6d1a3de0) at ../hw/mem/cxl_type3.c:833
> #8  0x0000555fd38b575f in pci_qdev_realize (qdev=0x555fd8b2f3e0, errp=0x7ffe6d1a3e60) at ../hw/pci/pci.c:2093
> #9  0x0000555fd3ccca9b in device_set_realized (obj=0x555fd8b2f3e0, value=true, errp=0x7ffe6d1a40d0)
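
The assertion in the backtrace above can be modelled outside QEMU. The sketch below is only an illustration under stated assumptions: ToyError and every toy_* function are hypothetical stand-ins, not QEMU's Error API, and the limit of 8 regions is picked purely so the callee has a way to fail. It shows the pattern the review comment asks for: the callee sets the error once, and the caller adds context to it instead of setting it a second time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for QEMU's Error object; hypothetical, not the real type. */
typedef struct ToyError {
    char msg[160];
} ToyError;

/* Like error_setg(): refuses to overwrite an error that is already set. */
static void toy_error_set(ToyError **errp, const char *msg)
{
    if (!errp) {
        return;
    }
    assert(*errp == NULL);
    *errp = calloc(1, sizeof(**errp));
    if (!*errp) {
        abort();
    }
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
}

/* Like error_append_hint(): adds text to an error that is already set. */
static void toy_error_append_hint(ToyError **errp, const char *hint)
{
    size_t used;

    if (!errp || !*errp) {
        return;
    }
    used = strlen((*errp)->msg);
    snprintf((*errp)->msg + used, sizeof((*errp)->msg) - used, "%s", hint);
}

/* Hypothetical callee: reports its own error and returns false. */
static bool toy_create_dc_regions(int num_regions, ToyError **errp)
{
    if (num_regions > 8) { /* limit chosen only for this example */
        toy_error_set(errp, "too many DC regions");
        return false;
    }
    return true;
}

/* Caller: errp is already set on failure, so only add context to it. */
static bool toy_setup_memory(int num_regions, ToyError **errp)
{
    if (num_regions > 0 && !toy_create_dc_regions(num_regions, errp)) {
        toy_error_append_hint(errp, "; setup DC regions failed");
        return false;
    }
    return true;
}
```

Calling toy_error_set() in the caller instead of toy_error_append_hint() would hit the assert, which is the same shape of failure as frame #4 in the trace.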

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v7 06/12] hw/mem/cxl_type3: Add host backend and address space handling for DC regions
  2024-05-14  8:28     ` Zhijian Li (Fujitsu)
  (?)
@ 2024-05-16 17:07     ` fan
  -1 siblings, 0 replies; 65+ messages in thread
From: fan @ 2024-05-16 17:07 UTC (permalink / raw)
  To: Zhijian Li (Fujitsu)
  Cc: nifan.cxl, qemu-devel, jonathan.cameron, linux-cxl,
	gregory.price, ira.weiny, dan.j.williams, a.manzanares, dave,
	nmtadam.samsung, jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni

On Tue, May 14, 2024 at 08:28:27AM +0000, Zhijian Li (Fujitsu) wrote:
> 
> 
> On 19/04/2024 07:10, nifan.cxl@gmail.com wrote:
> > +    uint64_t dc_size;
> > +
> > +    mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
> > +    dc_size = memory_region_size(mr);
> > +    region_len = DIV_ROUND_UP(dc_size, ct3d->dc.num_regions);
> > +
> > +    if (dc_size % (ct3d->dc.num_regions * CXL_CAPACITY_MULTIPLIER) != 0) {
> > +        error_setg(errp, "host backend size must be multiples of region len");
> 
> I would prefer to have the %region_len% in the error message as well, so that I can update the
> backend file accordingly.

Will add.

Fan
> 
> 
> 
> > +        return false;
> > +    }
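
A rough sketch of what a more informative message could look like, outside QEMU. Assumptions: TOY_CAPACITY_MULTIPLIER is taken to be 256 MiB to mirror CXL_CAPACITY_MULTIPLIER, and toy_check_dc_backend() and its buffer-based error reporting are hypothetical; the real patch would use error_setg() with errp.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TOY_MIB (1024ULL * 1024ULL)
/* Assumed to match QEMU's CXL_CAPACITY_MULTIPLIER (256 MiB). */
#define TOY_CAPACITY_MULTIPLIER (256ULL * TOY_MIB)

/*
 * Same divisibility rule as the quoted hunk: the backend size must be a
 * multiple of num_regions * capacity multiplier. On failure the message
 * names the actual size and the required multiple.
 */
static bool toy_check_dc_backend(uint64_t dc_size, unsigned num_regions,
                                 char *err, size_t err_len)
{
    uint64_t align = (uint64_t)num_regions * TOY_CAPACITY_MULTIPLIER;

    if (num_regions == 0 || dc_size % align != 0) {
        snprintf(err, err_len,
                 "host backend size 0x%" PRIx64
                 " must be a multiple of 0x%" PRIx64
                 " (%u regions x 0x%" PRIx64 ")",
                 dc_size, align, num_regions,
                 (uint64_t)TOY_CAPACITY_MULTIPLIER);
        return false;
    }
    return true;
}
```

With a 2048 MiB backend, 4 regions pass the check (the required multiple is 1024 MiB), while 3 regions fail because 2048 MiB is not a multiple of 768 MiB.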

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v7 00/12] Enabling DCD emulation support in Qemu
  2024-05-14  2:16   ` Zhijian Li (Fujitsu) via
  (?)
@ 2024-05-16 17:12   ` fan
  2024-05-17  2:20       ` Zhijian Li (Fujitsu) via
  -1 siblings, 1 reply; 65+ messages in thread
From: fan @ 2024-05-16 17:12 UTC (permalink / raw)
  To: Zhijian Li (Fujitsu)
  Cc: nifan.cxl, qemu-devel, jonathan.cameron, linux-cxl,
	gregory.price, ira.weiny, dan.j.williams, a.manzanares, dave,
	nmtadam.samsung, jim.harris, Jorgen.Hansen, wj28.lee, fan.ni

On Tue, May 14, 2024 at 02:16:51AM +0000, Zhijian Li (Fujitsu) wrote:
> Hi Fan
> 
> 
> Do you have newer instructions for trying out DCD? It seems that
> the instructions in RFC[0] don't work with the current code.
> 
> [0] https://lore.kernel.org/all/20230511175609.2091136-1-fan.ni@samsung.com/
> 

For the testing, the only thing that has been changed for this series is
the QMP interface for add/release DC extents.

https://lore.kernel.org/linux-cxl/d708f7c8-2598-4a17-9cbb-935c6ae2a2be@fujitsu.com/T/#m05066f0098e976fb1c4b05db5e7ff7ca1bf27b1e

1. Add dynamic capacity extents:

For example, the command to add two contiguous extents (each 128MiB long)
to region 0 (starting at DPA offset 0) looks like below:

{ "execute": "qmp_capabilities" }

{ "execute": "cxl-add-dynamic-capacity",
  "arguments": {
      "path": "/machine/peripheral/cxl-dcd0",
      "hid": 0,
      "selection-policy": 2,
      "region-id": 0,
      "tag": "",
      "extents": [
      {
          "offset": 0,
          "len": 134217728
      },
      {
          "offset": 134217728,
          "len": 134217728
      }
      ]
  }
}

2. Release dynamic capacity extents:

For example, the command to release an extent of size 128MiB from region 0
(DPA offset 128MiB) looks like below:

{ "execute": "cxl-release-dynamic-capacity",
  "arguments": {
      "path": "/machine/peripheral/cxl-dcd0",
      "hid": 0,
      "flags": 1,
      "region-id": 0,
      "tag": "",
      "extents": [
      {
          "offset": 134217728,
          "len": 134217728
      }
      ]
  }
}
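
The "offset" and "len" values in the two commands above are plain byte counts (128 MiB is 134217728 bytes). As a small illustration of that arithmetic, the sketch below fills an array of back-to-back extents and formats one as the JSON fragment used in the "extents" list. ToyExtent and the toy_* helpers are hypothetical names for this example only; they are not part of QEMU or its QMP schema.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TOY_MIB (1024ULL * 1024ULL)

/* One entry of the "extents" list: both fields are in bytes. */
typedef struct ToyExtent {
    uint64_t offset;
    uint64_t len;
} ToyExtent;

/* Fill n extents of equal length laid out back to back from start. */
static void toy_fill_contiguous(ToyExtent *ext, size_t n,
                                uint64_t start, uint64_t len)
{
    for (size_t i = 0; i < n; i++) {
        ext[i].offset = start + i * len;
        ext[i].len = len;
    }
}

/* Format one extent as a JSON object; returns snprintf()'s result. */
static int toy_extent_to_json(const ToyExtent *ext, char *buf, size_t len)
{
    return snprintf(buf, len,
                    "{ \"offset\": %" PRIu64 ", \"len\": %" PRIu64 " }",
                    ext->offset, ext->len);
}
```

Filling two extents of 128 * TOY_MIB from offset 0 gives the pair used in the add command; the second of them is the extent that the release command names.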

btw, I have a wiki page to explain how to test CXL DCD with a tool I
wrote.
https://github.com/moking/moking.github.io/wiki/cxl%E2%80%90test%E2%80%90tool:-A-tool-to-ease-CXL-test-with-QEMU-setup%E2%80%90%E2%80%90Using-DCD-test-as-an-example

Let me know if you need more info for testing.


Fan

> 
> 
> On 19/04/2024 07:10, nifan.cxl@gmail.com wrote:
> > A git tree of this series can be found here (with one extra commit on top
> > for printing out accepted/pending extent list):
> > https://github.com/moking/qemu/tree/dcd-v7
> > 
> > v6->v7:
> > 
> > 1. Fixed the dvsec range register issue mentioned in the cover letter in v6.
> >     Only relevant bits are set to mark the device ready (Patch 6). (Jonathan)
> > 2. Moved the if statement in cxl_setup_memory from Patch 6 to Patch 4. (Jonathan)
> > 3. Used MIN instead of if statement to get record_count in Patch 7. (Jonathan)
> > 4. Added "Reviewed-by" tag to Patch 7.
> > 5. Modified cxl_dc_extent_release_dry_run so the updated extent list can be
> >     reused in cmd_dcd_release_dyn_cap to simplify the process in Patch 8. (Jørgen)
> > 6. Added comments to indicate further "TODO" items in cmd_dcd_add_dyn_cap_rsp.
> >      (Jonathan)
> > 7. Avoided irrelevant code reformat in Patch 8. (Jonathan)
> > 8. Modified QMP interfaces for adding/releasing DC extents to allow passing
> >     tags, selection policy, flags in the interface. (Jonathan, Gregory)
> > 9. Redesigned the pending list so extents in the same requests are grouped
> >      together. A new data structure is introduced to represent "extent group"
> >      in pending list.  (Jonathan)
> > 10. Added support in QMP interface for "More" flag.
> > 11. Check "Forced removal" flag for release request and not let it pass through.
> > 12. Removed the dynamic capacity log type from CxlEventLog definition in cxl.json
> >     to avoid the side effect it may introduce to inject error to DC event log.
> >     (Jonathan)
> > 13. Hard coded the event log type to dynamic capacity event log in QMP
> >      interfaces. (Jonathan)
> > 14. Adding space in between "-1]". (Jonathan)
> > 15. Some minor comment fixes.
> > 
> > The code is tested with similar setup and has passed similar tests as listed
> > in the cover letter of v5[1] and v6[2].
> > Also, the code is tested with the latest DCD kernel patchset[3].
> > 
> > [1] Qemu DCD patchset v5: https://lore.kernel.org/linux-cxl/20240304194331.1586191-1-nifan.cxl@gmail.com/T/#t
> > [2] Qemu DCD patchset v6: https://lore.kernel.org/linux-cxl/20240325190339.696686-1-nifan.cxl@gmail.com/T/#t
> > [3] DCD kernel patches: https://lore.kernel.org/linux-cxl/20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com/T/#m11c571e21c4fe17c7d04ec5c2c7bc7cbf2cd07e3
> > 
> > 
> > Fan Ni (12):
> >    hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output
> >      payload of identify memory device command
> >    hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative
> >      and mailbox command support
> >    include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for
> >      type3 memory devices
> >    hw/mem/cxl_type3: Add support to create DC regions to type3 memory
> >      devices
> >    hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr
> >      size instead of mr as argument
> >    hw/mem/cxl_type3: Add host backend and address space handling for DC
> >      regions
> >    hw/mem/cxl_type3: Add DC extent list representative and get DC extent
> >      list mailbox support
> >    hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release
> >      dynamic capacity response
> >    hw/cxl/events: Add qmp interfaces to add/release dynamic capacity
> >      extents
> >    hw/mem/cxl_type3: Add DPA range validation for accesses to DC regions
> >    hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support
> >    hw/mem/cxl_type3: Allow to release extent superset in QMP interface
> > 
> >   hw/cxl/cxl-mailbox-utils.c  | 620 ++++++++++++++++++++++++++++++++++-
> >   hw/mem/cxl_type3.c          | 633 +++++++++++++++++++++++++++++++++---
> >   hw/mem/cxl_type3_stubs.c    |  20 ++
> >   include/hw/cxl/cxl_device.h |  81 ++++-
> >   include/hw/cxl/cxl_events.h |  18 +
> >   qapi/cxl.json               |  69 ++++
> >   6 files changed, 1396 insertions(+), 45 deletions(-)
> > 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v7 00/12] Enabling DCD emulation support in Qemu
  2024-05-16 17:12   ` fan
@ 2024-05-17  2:20       ` Zhijian Li (Fujitsu) via
  0 siblings, 0 replies; 65+ messages in thread
From: Zhijian Li (Fujitsu) @ 2024-05-17  2:20 UTC (permalink / raw)
  To: fan, ira.weiny
  Cc: qemu-devel, jonathan.cameron, linux-cxl, gregory.price,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	Jorgen.Hansen, wj28.lee, fan.ni

Fan,

Many thanks, it helps a lot. Previously I forgot to create a new dax device (daxctl create-device region0).
Question: why do we need to create dax0.1? Why isn't dax0.0 associated with the newly added DCD region?

Ira,

I would also like to report a kernel panic.

kernel: dcd-2024-04-17
qemu: dcd-2024-04-17

QEMU command line:
<qemu:arg value='-device'/>
<qemu:arg value='cxl-type3,bus=cxl-rp-hb0rp0,persistent-memdev=cxl-pmem0,lsa=cxl-pmem-lsa0,id=pmem-dcmem,volatile-dc-memdev=cxl-dcmem0,num-dc-regions=4'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-file,id=cxl-dcmem0,share=on,mem-path=/home/lizhijian/images/cxldcmem0.raw,size=2048M'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-file,id=cxl-pmem0,share=on,mem-path=/home/lizhijian/images/cxlpmem0.raw,size=2048M'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-file,id=cxl-pmem-lsa0,share=on,mem-path=/home/lizhijian/images/cxlpmem-lsa0.raw,size=4K'/>
<qemu:arg value='-M'/>
<qemu:arg value='cxl=on,cxl-fmw.0.targets.0=pxb-cxl.0,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=8k'/>


Reproducer:
  1. guest: ./create-dc.sh
  2. host: virsh qemu-monitor-command rdma-server-cxl-persistent-dcd $(cat cxl-add-dcd.json)
  3. guest: daxctl create-device region0 # will create dax0.1
  4. guest: daxctl reconfigure-device --mode=system-ram --force dax0.1 -u  # kernel panic

=====================
# cat ./create-dc.sh
#!/bin/bash
set -ex

region=$(cat /sys/bus/cxl/devices/decoder0.0/create_dc_region)
echo $region > /sys/bus/cxl/devices/decoder0.0/create_dc_region
echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity
echo 1 > /sys/bus/cxl/devices/$region/interleave_ways
echo "dc0" > /sys/bus/cxl/devices/decoder2.0/mode
echo 0x10000000 > /sys/bus/cxl/devices/decoder2.0/dpa_size
echo 0x10000000 > /sys/bus/cxl/devices/$region/size
echo "decoder2.0" > /sys/bus/cxl/devices/$region/target0
echo 1 > /sys/bus/cxl/devices/$region/commit
echo $region > /sys/bus/cxl/drivers/cxl_region/bind
=========================
# cat cxl-add-dcd.json
{ "execute": "cxl-add-dynamic-capacity",
   "arguments": {
       "path": "/machine/peripheral/pmem-dcmem",
       "hid": 0,
       "selection-policy": 2,
       "region-id": 0,
       "tag": "",
       "extents": [
       {
           "offset": 0,
           "len": 268435456
       }
       ]
   }
}
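
For reference, the region size written by create-dc.sh (0x10000000) and the extent length in cxl-add-dcd.json (268435456) are the same value, 256 MiB, so the single extent covers the whole region. A minimal sketch of that bounds arithmetic follows; toy_extent_fits() is a hypothetical helper, and whether QEMU or the kernel applies exactly this rule is not stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MIB (1024ULL * 1024ULL)

/*
 * True if [offset, offset + len) lies inside a region of region_size
 * bytes. Written without computing offset + len so it cannot overflow.
 */
static bool toy_extent_fits(uint64_t offset, uint64_t len,
                            uint64_t region_size)
{
    return len != 0 && offset <= region_size &&
           len <= region_size - offset;
}
```

With region_size 0x10000000, the extent at offset 0 and length 268435456 fits exactly, while the same length starting at offset 134217728 would run past the end.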



[  126.909297] Demotion targets for Node 0: preferred: 1, fallback: 1
[  126.911186] Demotion targets for Node 1: null
[  126.913808] BUG: kernel NULL pointer dereference, address: 0000000000000468
[  126.915431] #PF: supervisor read access in kernel mode
[  126.917156] #PF: error_code(0x0000) - not-present page
[  126.918976] PGD 8000000006771067 P4D 8000000006771067 PUD e777067 PMD 0
[  126.920587] Oops: 0000 [#1] PREEMPT SMP PTI
[  126.921714] CPU: 0 PID: 1101 Comm: daxctl Kdump: loaded Not tainted 6.9.0-rc3-lizhijian+ #489
[  126.924914] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[  126.928620] RIP: 0010:cxl_region_perf_attrs_callback+0x25/0x110 [cxl_core]
[  126.930316] Code: 90 90 90 90 90 0f 1f 44 00 00 41 56 41 55 41 54 55 53 8b 6a 24 83 fd ff 74 20 48 83 fe 01 75 1a 48 8b 87 58 ff ff ff 48 89 fb <48> 8b b8 68 04 00 00 e8 cf a2 f4 e0 39 c5 74 13 45 31 e4 5b 44 89
[  126.934920] RSP: 0018:ffffc900007cbc58 EFLAGS: 00010246
[  126.936994] RAX: 0000000000000000 RBX: ffff888007534d60 RCX: 0000000000000020
[  126.939378] RDX: ffffc900007cbcf8 RSI: 0000000000000001 RDI: ffff888007534d60
[  126.942721] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000001
[  126.944762] R10: ffff88807fc31d80 R11: 0000000000000000 R12: 0000000000000000
[  126.946900] R13: 0000000000000001 R14: ffffc900007cbcf8 R15: ffff888007534d60
[  126.948871] FS:  00007fb2ab918880(0000) GS:ffff88807fc00000(0000) knlGS:0000000000000000
[  126.951241] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  126.952722] CR2: 0000000000000468 CR3: 000000000aaf0003 CR4: 00000000001706f0
[  126.954623] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  126.956768] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  126.958887] Call Trace:
[  126.959814]  <TASK>
[  126.960569]  ? __die+0x20/0x70
[  126.961645]  ? page_fault_oops+0x15a/0x450
[  126.962930]  ? search_module_extables+0x33/0x90
[  126.964374]  ? fixup_exception+0x22/0x310
[  126.965693]  ? exc_page_fault+0x68/0x200
[  126.967371]  ? asm_exc_page_fault+0x22/0x30
[  126.968713]  ? cxl_region_perf_attrs_callback+0x25/0x110 [cxl_core]
[  126.972508]  notifier_call_chain+0x40/0x110
[  126.974380]  blocking_notifier_call_chain+0x43/0x60
[  126.975788]  online_pages+0x24c/0x2d0
[  126.977008]  memory_subsys_online+0x233/0x290
[  126.978338]  device_online+0x64/0x90
[  126.979440]  state_store+0xae/0xc0
[  126.980510]  kernfs_fop_write_iter+0x143/0x200
[  126.981734]  vfs_write+0x3a6/0x570
[  126.982851]  ksys_write+0x65/0xf0
[  126.984006]  do_syscall_64+0x6d/0x140
[  126.985309]  entry_SYSCALL_64_after_hwframe+0x71/0x79
[  126.986927] RIP: 0033:0x7fb2abc777a7
[  126.987983] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[  126.992770] RSP: 002b:00007ffebec70b98 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  126.994874] RAX: ffffffffffffffda RBX: 000000000040e1f0 RCX: 00007fb2abc777a7
[  126.996906] RDX: 000000000000000f RSI: 00007fb2abdb6434 RDI: 0000000000000004
[  126.998911] RBP: 00007ffebec70bd0 R08: 0000000000000000 R09: 00007ffebec70640
[  127.000879] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000403840
[  127.003572] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  127.005543]  </TASK>

Thanks
Zhijian

On 17/05/2024 01:12, fan wrote:
> On Tue, May 14, 2024 at 02:16:51AM +0000, Zhijian Li (Fujitsu) wrote:
>> Hi Fan
>>
>>
>> Do you have newer instructions for trying out DCD? It seems that
>> the instructions in RFC[0] don't work with the current code.
>>
>> [0] https://lore.kernel.org/all/20230511175609.2091136-1-fan.ni@samsung.com/
>>
> 
> For the testing, the only thing that has been changed for this series is
> the QMP interface for add/release DC extents.
> 
> https://lore.kernel.org/linux-cxl/d708f7c8-2598-4a17-9cbb-935c6ae2a2be@fujitsu.com/T/#m05066f0098e976fb1c4b05db5e7ff7ca1bf27b1e
> 
> 1. Add dynamic capacity extents:
> 
> For example, the command to add two contiguous extents (each 128MiB long)
> to region 0 (starting at DPA offset 0) looks like below:
> 
> { "execute": "qmp_capabilities" }
> 
> { "execute": "cxl-add-dynamic-capacity",
>    "arguments": {
>        "path": "/machine/peripheral/cxl-dcd0",
>        "hid": 0,
>        "selection-policy": 2,
>        "region-id": 0,
>        "tag": "",
>        "extents": [
>        {
>            "offset": 0,
>            "len": 134217728
>        },
>        {
>            "offset": 134217728,
>            "len": 134217728
>        }
>        ]
>    }
> }
> 
> 2. Release dynamic capacity extents:
> 
> For example, the command to release an extent of size 128MiB from region 0
> (DPA offset 128MiB) looks like below:
> 
> { "execute": "cxl-release-dynamic-capacity",
>    "arguments": {
>        "path": "/machine/peripheral/cxl-dcd0",
>        "hid": 0,
>        "flags": 1,
>        "region-id": 0,
>        "tag": "",
>        "extents": [
>        {
>            "offset": 134217728,
>            "len": 134217728
>        }
>        ]
>    }
> }
> 
> btw, I have a wiki page to explain how to test CXL DCD with a tool I
> wrote.
> https://github.com/moking/moking.github.io/wiki/cxl%E2%80%90test%E2%80%90tool:-A-tool-to-ease-CXL-test-with-QEMU-setup%E2%80%90%E2%80%90Using-DCD-test-as-an-example
> 
> Let me know if you need more info for testing.
> 
> 
> Fan
> 
>>
>>
>> On 19/04/2024 07:10, nifan.cxl@gmail.com wrote:
>>> A git tree of this series can be found here (with one extra commit on top
>>> for printing out accepted/pending extent list):
>>> https://github.com/moking/qemu/tree/dcd-v7
>>>
>>> v6->v7:
>>>
>>> 1. Fixed the dvsec range register issue mentioned in the cover letter in v6.
>>>      Only relevant bits are set to mark the device ready (Patch 6). (Jonathan)
>>> 2. Moved the if statement in cxl_setup_memory from Patch 6 to Patch 4. (Jonathan)
>>> 3. Used MIN instead of if statement to get record_count in Patch 7. (Jonathan)
>>> 4. Added "Reviewed-by" tag to Patch 7.
>>> 5. Modified cxl_dc_extent_release_dry_run so the updated extent list can be
>>>      reused in cmd_dcd_release_dyn_cap to simplify the process in Patch 8. (Jørgen)
>>> 6. Added comments to indicate further "TODO" items in cmd_dcd_add_dyn_cap_rsp.
>>>       (Jonathan)
>>> 7. Avoided irrelevant code reformat in Patch 8. (Jonathan)
>>> 8. Modified QMP interfaces for adding/releasing DC extents to allow passing
>>>      tags, selection policy, flags in the interface. (Jonathan, Gregory)
>>> 9. Redesigned the pending list so extents in the same requests are grouped
>>>       together. A new data structure is introduced to represent "extent group"
>>>       in pending list.  (Jonathan)
>>> 10. Added support in QMP interface for "More" flag.
>>> 11. Check "Forced removal" flag for release request and not let it pass through.
>>> 12. Removed the dynamic capacity log type from CxlEventLog definition in cxl.json
>>>      to avoid the side effect it may introduce to inject error to DC event log.
>>>      (Jonathan)
>>> 13. Hard coded the event log type to dynamic capacity event log in QMP
>>>       interfaces. (Jonathan)
>>> 14. Adding space in between "-1]". (Jonathan)
>>> 15. Some minor comment fixes.
>>>
>>> The code is tested with similar setup and has passed similar tests as listed
>>> in the cover letter of v5[1] and v6[2].
>>> Also, the code is tested with the latest DCD kernel patchset[3].
>>>
>>> [1] Qemu DCD patchset v5: https://lore.kernel.org/linux-cxl/20240304194331.1586191-1-nifan.cxl@gmail.com/T/#t
>>> [2] Qemu DCD patchset v6: https://lore.kernel.org/linux-cxl/20240325190339.696686-1-nifan.cxl@gmail.com/T/#t
>>> [3] DCD kernel patches: https://lore.kernel.org/linux-cxl/20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com/T/#m11c571e21c4fe17c7d04ec5c2c7bc7cbf2cd07e3
>>>
>>>
>>> Fan Ni (12):
>>>     hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output
>>>       payload of identify memory device command
>>>     hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative
>>>       and mailbox command support
>>>     include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for
>>>       type3 memory devices
>>>     hw/mem/cxl_type3: Add support to create DC regions to type3 memory
>>>       devices
>>>     hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr
>>>       size instead of mr as argument
>>>     hw/mem/cxl_type3: Add host backend and address space handling for DC
>>>       regions
>>>     hw/mem/cxl_type3: Add DC extent list representative and get DC extent
>>>       list mailbox support
>>>     hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release
>>>       dynamic capacity response
>>>     hw/cxl/events: Add qmp interfaces to add/release dynamic capacity
>>>       extents
>>>     hw/mem/cxl_type3: Add DPA range validation for accesses to DC regions
>>>     hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support
>>>     hw/mem/cxl_type3: Allow to release extent superset in QMP interface
>>>
>>>    hw/cxl/cxl-mailbox-utils.c  | 620 ++++++++++++++++++++++++++++++++++-
>>>    hw/mem/cxl_type3.c          | 633 +++++++++++++++++++++++++++++++++---
>>>    hw/mem/cxl_type3_stubs.c    |  20 ++
>>>    include/hw/cxl/cxl_device.h |  81 ++++-
>>>    include/hw/cxl/cxl_events.h |  18 +
>>>    qapi/cxl.json               |  69 ++++
>>>    6 files changed, 1396 insertions(+), 45 deletions(-)
>>>

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v7 00/12] Enabling DCD emulation support in Qemu
@ 2024-05-17  2:20       ` Zhijian Li (Fujitsu) via
  0 siblings, 0 replies; 65+ messages in thread
From: Zhijian Li (Fujitsu) via @ 2024-05-17  2:20 UTC (permalink / raw)
  To: fan, ira.weiny
  Cc: qemu-devel, jonathan.cameron, linux-cxl, gregory.price,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	Jorgen.Hansen, wj28.lee, fan.ni

Fan,

Many thanks, it helps a lot. Previous I forgot to create a new dax device(daxctl create-device region0)
Question: Why we need to create a the dax0.1, why the dax0.0 doesn't associate to the new adding DCD region.

Ira,

Let me try to report a kernel panic.

kernel: dcd-2024-04-17
qemu: dcd-2024-04-17

QEMU command line:
164     <qemu:arg value='-device'/>
165     <qemu:arg value='cxl-type3,bus=cxl-rp-hb0rp0,persistent-memdev=cxl-pmem0,lsa=cxl-pmem-lsa0,id=pmem-dcmem,volatile-dc-memdev=cxl-dcmem0,num-dc-regions=4'/>
166     <qemu:arg value='-object'/>
167     <qemu:arg value='memory-backend-file,id=cxl-dcmem0,share=on,mem-path=/home/lizhijian/images/cxldcmem0.raw,size=2048M'/>
168     <qemu:arg value='-object'/>
169     <qemu:arg value='memory-backend-file,id=cxl-pmem0,share=on,mem-path=/home/lizhijian/images/cxlpmem0.raw,size=2048M'/>
170     <qemu:arg value='-object'/>
171     <qemu:arg value='memory-backend-file,id=cxl-pmem-lsa0,share=on,mem-path=/home/lizhijian/images/cxlpmem-lsa0.raw,size=4K'/>
172     <qemu:arg value='-M'/>
173     <qemu:arg value='cxl=on,cxl-fmw.0.targets.0=pxb-cxl.0,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=8k'/>


Reproducer:
  1. guest: ./create-dc.sh
  2. host: virsh qemu-monitor-command rdma-server-cxl-persistent-dcd $(cat cxl-add-dcd.json)
  3. guest: daxctl create-device region0 # will create dax0.1
  4. daxctl reconfigure-device  --mode=system-ram --force  dax0.1 -u  # kernel panic

=====================
# cat ./create-dc.sh
#!/bin/bash
set -ex

region=$(cat /sys/bus/cxl/devices/decoder0.0/create_dc_region)
echo $region> /sys/bus/cxl/devices/decoder0.0/create_dc_region
echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity
echo 1 > /sys/bus/cxl/devices/$region/interleave_ways
echo "dc0" >/sys/bus/cxl/devices/decoder2.0/mode
echo 0x10000000 >/sys/bus/cxl/devices/decoder2.0/dpa_size
echo 0x10000000 > /sys/bus/cxl/devices/$region/size
echo "decoder2.0" > /sys/bus/cxl/devices/$region/target0
echo 1 > /sys/bus/cxl/devices/$region/commit
echo $region > /sys/bus/cxl/drivers/cxl_region/bind
=========================
# cat cxl-add-dcd.json
{ "execute": "cxl-add-dynamic-capacity",
   "arguments": {
       "path": "/machine/peripheral/pmem-dcmem",
       "hid": 0,
       "selection-policy": 2,
       "region-id": 0,
       "tag": "",
       "extents": [
       {
           "offset": 0,
           "len": 268435456
       }
       ]
   }
}



[  126.909297] Demotion targets for Node 0: preferred: 1, fallback: 1
[  126.911186] Demotion targets for Node 1: null
[  126.913808] BUG: kernel NULL pointer dereference, address: 0000000000000468
[  126.915431] #PF: supervisor read access in kernel mode
[  126.917156] #PF: error_code(0x0000) - not-present page
[  126.918976] PGD 8000000006771067 P4D 8000000006771067 PUD e777067 PMD 0
[  126.920587] Oops: 0000 [#1] PREEMPT SMP PTI
[  126.921714] CPU: 0 PID: 1101 Comm: daxctl Kdump: loaded Not tainted 6.9.0-rc3-lizhijian+ #489
[  126.924914] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[  126.928620] RIP: 0010:cxl_region_perf_attrs_callback+0x25/0x110 [cxl_core]
[  126.930316] Code: 90 90 90 90 90 0f 1f 44 00 00 41 56 41 55 41 54 55 53 8b 6a 24 83 fd ff 74 20 48 83 fe 01 75 1a 48 8b 87 58 ff ff ff 48 89 fb <48> 8b b8 68 04 00 00 e8 cf a2 f4 e0 39 c5 74 13 45 31 e4 5b 44 89
[  126.934920] RSP: 0018:ffffc900007cbc58 EFLAGS: 00010246
[  126.936994] RAX: 0000000000000000 RBX: ffff888007534d60 RCX: 0000000000000020
[  126.939378] RDX: ffffc900007cbcf8 RSI: 0000000000000001 RDI: ffff888007534d60
[  126.942721] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000001
[  126.944762] R10: ffff88807fc31d80 R11: 0000000000000000 R12: 0000000000000000
[  126.946900] R13: 0000000000000001 R14: ffffc900007cbcf8 R15: ffff888007534d60
[  126.948871] FS:  00007fb2ab918880(0000) GS:ffff88807fc00000(0000) knlGS:0000000000000000
[  126.951241] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  126.952722] CR2: 0000000000000468 CR3: 000000000aaf0003 CR4: 00000000001706f0
[  126.954623] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  126.956768] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  126.958887] Call Trace:
[  126.959814]  <TASK>
[  126.960569]  ? __die+0x20/0x70
[  126.961645]  ? page_fault_oops+0x15a/0x450
[  126.962930]  ? search_module_extables+0x33/0x90
[  126.964374]  ? fixup_exception+0x22/0x310
[  126.965693]  ? exc_page_fault+0x68/0x200
[  126.967371]  ? asm_exc_page_fault+0x22/0x30
[  126.968713]  ? cxl_region_perf_attrs_callback+0x25/0x110 [cxl_core]
[  126.972508]  notifier_call_chain+0x40/0x110
[  126.974380]  blocking_notifier_call_chain+0x43/0x60
[  126.975788]  online_pages+0x24c/0x2d0
[  126.977008]  memory_subsys_online+0x233/0x290
[  126.978338]  device_online+0x64/0x90
[  126.979440]  state_store+0xae/0xc0
[  126.980510]  kernfs_fop_write_iter+0x143/0x200
[  126.981734]  vfs_write+0x3a6/0x570
[  126.982851]  ksys_write+0x65/0xf0
[  126.984006]  do_syscall_64+0x6d/0x140
[  126.985309]  entry_SYSCALL_64_after_hwframe+0x71/0x79
[  126.986927] RIP: 0033:0x7fb2abc777a7
[  126.987983] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[  126.992770] RSP: 002b:00007ffebec70b98 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  126.994874] RAX: ffffffffffffffda RBX: 000000000040e1f0 RCX: 00007fb2abc777a7
[  126.996906] RDX: 000000000000000f RSI: 00007fb2abdb6434 RDI: 0000000000000004
[  126.998911] RBP: 00007ffebec70bd0 R08: 0000000000000000 R09: 00007ffebec70640
[  127.000879] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000403840
[  127.003572] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  127.005543]  </TASK>

Thanks
Zhijian

On 17/05/2024 01:12, fan wrote:
> On Tue, May 14, 2024 at 02:16:51AM +0000, Zhijian Li (Fujitsu) wrote:
>> Hi Fan
>>
>>
>> Do you have a newer instruction to play with the DCD. It seems that
>> the instruction in RFC[0] doesn't work for current code.
>>
>> [0] https://lore.kernel.org/all/20230511175609.2091136-1-fan.ni@samsung.com/
>>
> 
> For the testing, the only thing that has been changed for this series is
> the QMP interface for add/release DC extents.
> 
> https://lore.kernel.org/linux-cxl/d708f7c8-2598-4a17-9cbb-935c6ae2a2be@fujitsu.com/T/#m05066f0098e976fb1c4b05db5e7ff7ca1bf27b1e
> 
> 1. Add dynamic capacity extents:
> 
> For example, the command to add two continuous extents (each 128MiB long)
> to region 0 (starting at DPA offset 0) looks like below:
> 
> { "execute": "qmp_capabilities" }
> 
> { "execute": "cxl-add-dynamic-capacity",
>    "arguments": {
>        "path": "/machine/peripheral/cxl-dcd0",
>        "hid": 0,
>        "selection-policy": 2,
>        "region-id": 0,
>        "tag": "",
>        "extents": [
>        {
>            "offset": 0,
>            "len": 134217728
>        },
>        {
>            "offset": 134217728,
>            "len": 134217728
>        }
>        ]
>    }
> }
> 
> 2. Release dynamic capacity extents:
> 
> For example, the command to release an extent of size 128MiB from region 0
> (DPA offset 128MiB) looks like below:
> 
> { "execute": "cxl-release-dynamic-capacity",
>    "arguments": {
>        "path": "/machine/peripheral/cxl-dcd0",
>        "hid": 0,
>        "flags": 1,
>        "region-id": 0,
>        "tag": "",
>        "extents": [
>        {
>            "offset": 134217728,
>            "len": 134217728
>        }
>        ]
>    }
> }
> 
> btw, I have a wiki page to explain how to test CXL DCD with a tool I
> wrote.
> https://github.com/moking/moking.github.io/wiki/cxl%E2%80%90test%E2%80%90tool:-A-tool-to-ease-CXL-test-with-QEMU-setup%E2%80%90%E2%80%90Using-DCD-test-as-an-example
> 


> Let me know if you need more info for testing.
> 
> 
> Fan
> 
>>
>>
>> On 19/04/2024 07:10, nifan.cxl@gmail.com wrote:
>>> A git tree of this series can be found here (with one extra commit on top
>>> for printing out accepted/pending extent list):
>>> https://github.com/moking/qemu/tree/dcd-v7
>>>
>>> v6->v7:
>>>
>>> 1. Fixed the dvsec range register issue mentioned in the the cover letter in v6.
>>>      Only relevant bits are set to mark the device ready (Patch 6). (Jonathan)
>>> 2. Moved the if statement in cxl_setup_memory from Patch 6 to Patch 4. (Jonathan)
>>> 3. Used MIN instead of if statement to get record_count in Patch 7. (Jonathan)
>>> 4. Added "Reviewed-by" tag to Patch 7.
>>> 5. Modified cxl_dc_extent_release_dry_run so the updated extent list can be
>>>      reused in cmd_dcd_release_dyn_cap to simplify the process in Patch 8. (Jørgen)
>>> 6. Added comments to indicate further "TODO" items in cmd_dcd_add_dyn_cap_rsp.
>>>       (Jonathan)
>>> 7. Avoided irrelevant code reformat in Patch 8. (Jonathan)
>>> 8. Modified QMP interfaces for adding/releasing DC extents to allow passing
>>>      tags, selection policy, flags in the interface. (Jonathan, Gregory)
>>> 9. Redesigned the pending list so extents in the same requests are grouped
>>>       together. A new data structure is introduced to represent "extent group"
>>>       in pending list.  (Jonathan)
>>> 10. Added support in QMP interface for "More" flag.
>>> 11. Check "Forced removal" flag for release request and not let it pass through.
>>> 12. Removed the dynamic capacity log type from CxlEventLog definition in cxl.json
>>>      to avoid the side effect it may introduce to inject error to DC event log.
>>>      (Jonathan)
>>> 13. Hard coded the event log type to dynamic capacity event log in QMP
>>>       interfaces. (Jonathan)
>>> 14. Adding space in between "-1]". (Jonathan)
>>> 15. Some minor comment fixes.
>>>
>>> The code is tested with similar setup and has passed similar tests as listed
>>> in the cover letter of v5[1] and v6[2].
>>> Also, the code is tested with the latest DCD kernel patchset[3].
>>>
>>> [1] Qemu DCD patchset v5: https://lore.kernel.org/linux-cxl/20240304194331.1586191-1-nifan.cxl@gmail.com/T/#t
>>> [2] Qemu DCD patchset v6: https://lore.kernel.org/linux-cxl/20240325190339.696686-1-nifan.cxl@gmail.com/T/#t
>>> [3] DCD kernel patches: https://lore.kernel.org/linux-cxl/20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com/T/#m11c571e21c4fe17c7d04ec5c2c7bc7cbf2cd07e3
>>>
>>>
>>> Fan Ni (12):
>>>     hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output
>>>       payload of identify memory device command
>>>     hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative
>>>       and mailbox command support
>>>     include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for
>>>       type3 memory devices
>>>     hw/mem/cxl_type3: Add support to create DC regions to type3 memory
>>>       devices
>>>     hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr
>>>       size instead of mr as argument
>>>     hw/mem/cxl_type3: Add host backend and address space handling for DC
>>>       regions
>>>     hw/mem/cxl_type3: Add DC extent list representative and get DC extent
>>>       list mailbox support
>>>     hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release
>>>       dynamic capacity response
>>>     hw/cxl/events: Add qmp interfaces to add/release dynamic capacity
>>>       extents
>>>     hw/mem/cxl_type3: Add DPA range validation for accesses to DC regions
>>>     hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support
>>>     hw/mem/cxl_type3: Allow to release extent superset in QMP interface
>>>
>>>    hw/cxl/cxl-mailbox-utils.c  | 620 ++++++++++++++++++++++++++++++++++-
>>>    hw/mem/cxl_type3.c          | 633 +++++++++++++++++++++++++++++++++---
>>>    hw/mem/cxl_type3_stubs.c    |  20 ++
>>>    include/hw/cxl/cxl_device.h |  81 ++++-
>>>    include/hw/cxl/cxl_events.h |  18 +
>>>    qapi/cxl.json               |  69 ++++
>>>    6 files changed, 1396 insertions(+), 45 deletions(-)
>>>


end of thread, other threads:[~2024-05-17  2:21 UTC | newest]

Thread overview: 65+ messages
2024-04-18 23:10 [PATCH v7 00/12] Enabling DCD emulation support in Qemu nifan.cxl
2024-04-18 23:10 ` [PATCH v7 01/12] hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output payload of identify memory device command nifan.cxl
2024-04-19 16:40   ` Gregory Price
2024-04-18 23:10 ` [PATCH v7 02/12] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support nifan.cxl
2024-04-19 16:44   ` Gregory Price
2024-04-18 23:10 ` [PATCH v7 03/12] include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for type3 memory devices nifan.cxl
2024-04-19 16:45   ` Gregory Price
2024-04-18 23:10 ` [PATCH v7 04/12] hw/mem/cxl_type3: Add support to create DC regions to " nifan.cxl
2024-04-19 16:47   ` Gregory Price
2024-05-14  8:14   ` Zhijian Li (Fujitsu) via
2024-05-14  8:14     ` Zhijian Li (Fujitsu)
2024-05-16 17:06     ` fan
2024-04-18 23:10 ` [PATCH v7 05/12] hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr size instead of mr as argument nifan.cxl
2024-04-19 16:39   ` Gregory Price
2024-04-18 23:10 ` [PATCH v7 06/12] hw/mem/cxl_type3: Add host backend and address space handling for DC regions nifan.cxl
2024-04-19 17:27   ` Gregory Price
2024-04-22 11:55     ` Jonathan Cameron
2024-04-22 11:55       ` Jonathan Cameron via
2024-04-22 11:52   ` Jonathan Cameron
2024-04-22 11:52     ` Jonathan Cameron via
2024-05-14  8:28   ` Zhijian Li (Fujitsu) via
2024-05-14  8:28     ` Zhijian Li (Fujitsu)
2024-05-16 17:07     ` fan
2024-04-18 23:10 ` [PATCH v7 07/12] hw/mem/cxl_type3: Add DC extent list representative and get DC extent list mailbox support nifan.cxl
2024-04-19 16:52   ` Gregory Price
2024-04-18 23:10 ` [PATCH v7 08/12] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response nifan.cxl
2024-04-19 18:12   ` Gregory Price
2024-04-18 23:11 ` [PATCH v7 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents nifan.cxl
2024-04-19 18:13   ` Gregory Price
2024-04-22 12:01   ` Jonathan Cameron
2024-04-22 12:01     ` Jonathan Cameron via
2024-04-26  9:12   ` Markus Armbruster
2024-04-26 17:31     ` fan
2024-04-29  7:58       ` Markus Armbruster
2024-04-30 17:17         ` fan
2024-05-01 14:58           ` Jonathan Cameron
2024-05-01 14:58             ` Jonathan Cameron via
2024-05-01 22:36             ` fan
2024-04-30 17:21         ` Jonathan Cameron
2024-04-30 17:21           ` Jonathan Cameron via
2024-05-01 22:29         ` fan
2024-05-14  2:35   ` Zhijian Li (Fujitsu)
2024-05-14  2:35     ` Zhijian Li (Fujitsu) via
2024-04-18 23:11 ` [PATCH v7 10/12] hw/mem/cxl_type3: Add DPA range validation for accesses to DC regions nifan.cxl
2024-04-19 16:57   ` Gregory Price
2024-04-18 23:11 ` [PATCH v7 11/12] hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support nifan.cxl
2024-04-19 18:20   ` Gregory Price
2024-04-18 23:11 ` [PATCH v7 12/12] hw/mem/cxl_type3: Allow to release extent superset in QMP interface nifan.cxl
2024-04-19 18:20   ` Gregory Price
2024-04-19 18:24 ` [PATCH v7 00/12] Enabling DCD emulation support in Qemu Gregory Price
2024-04-19 18:43   ` fan
2024-04-20 20:35     ` Gregory Price
2024-04-22 12:04       ` Jonathan Cameron
2024-04-22 12:04         ` Jonathan Cameron via
2024-04-22 14:23         ` Jonathan Cameron
2024-04-22 14:23           ` Jonathan Cameron via
2024-04-22 15:07           ` Jonathan Cameron
2024-04-22 15:07             ` Jonathan Cameron via
2024-04-22 15:42         ` Gregory Price
2024-05-16 17:05   ` fan
2024-05-14  2:16 ` Zhijian Li (Fujitsu)
2024-05-14  2:16   ` Zhijian Li (Fujitsu) via
2024-05-16 17:12   ` fan
2024-05-17  2:20     ` Zhijian Li (Fujitsu)
2024-05-17  2:20       ` Zhijian Li (Fujitsu) via
