* [PATCH v6 00/12] Enabling DCD emulation support in Qemu
@ 2024-03-25 19:02 nifan.cxl
From: nifan.cxl @ 2024-03-25 19:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni

From: Fan Ni <fan.ni@samsung.com>

A git tree of this series can be found here (with one extra commit on top
for printing out the accepted/pending extent lists):
https://github.com/moking/qemu/tree/dcd-v6

v5->v6:
1. Picked up tags;
2. Renamed start_region_id to start_rid; (Jonathan)
3. For the get extent list mailbox command, added logic to adjust the returned
   extent count based on the output payload size; (Jonathan)
4. Used the Range type to detect extent overlaps (see the sketch after this
   list); (Jonathan)
5. Renamed extents_pending_to_add to extents_pending; (Jonathan)
6. Updated the commit log of the qmp interface patch by renaming "dpa" to
   "offset" to align with the code. (Gregory)
7. For the DC extent add response and release mailbox commands, we use a
   two-pass approach: the first pass detects any potential errors, and the
   second pass updates the in-device data structures;
8. For the QMP interface for adding/releasing DC extents, use a two-pass
   approach, with the first pass detecting any faulty input and the second
   pass filling the event log.
   Note: based on sswg discussion, we disallow releasing extents whose DPA
   range has not been accepted by the host yet;
9. We enforce in-order processing of the pending list for the DC extent
   release mailbox command, and the head of the pending list is handled
   accordingly.
10. The last patch from v5 has been removed from this series.
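
A minimal sketch of the overlap check mentioned in item 4, mirroring how the
dry-run pass in PATCH 8 uses QEMU's Range helpers (the variable names here
are illustrative, not taken from the patches):

    Range r1, r2;

    range_init_nofail(&r1, new_extent_dpa, new_extent_len);
    range_init_nofail(&r2, accepted_extent_dpa, accepted_extent_len);
    if (range_overlaps_range(&r1, &r2)) {
        /* reject: the to-be-added extent overlaps an accepted extent */
    }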

Note: we did not drop the DC changes in build_dvsecs, which Jonathan
suggested dropping. The reason is that during testing we found that, in the
current kernel code, the DVSEC range registers are checked when determining
whether the media is ready (in cxl_await_media_ready). For a DCD device, if
we leave the DVSEC range registers unset, the device cannot be put into the
"ready" state, which leaves the device inactive.
The related kernel code is here:
https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/tree/drivers/cxl/core/pci.c?h=fixes&id=d206a76d7d2726f3b096037f2079ce0bd3ba329b#n195
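
Roughly, the kernel-side dependency looks like this (a paraphrased sketch,
not the actual kernel source; the helper names are illustrative):

    /* cxl_await_media_ready() only succeeds once every advertised
     * DVSEC range reports valid and active; with the range registers
     * left unset, a DCD never reaches the "ready" state. */
    for (i = 0; i < dvsec_range_count; i++) {
        if (!dvsec_range_valid(i) || !dvsec_range_active(i)) {
            return -ETIMEDOUT;
        }
    }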

Compared to v5[1], patches 8-9 and 11-12 are almost completely re-coded, so
they need extra care during review.

The code was tested with a setup similar to the one listed in the v5 cover
letter and has passed similar tests.
The code also passes similar tests with the latest DCD kernel patchset[2].

[1] Qemu DCD patches v5: https://lore.kernel.org/linux-cxl/20240304194331.1586191-1-nifan.cxl@gmail.com/T/#t
[2] DCD kernel patches: https://lore.kernel.org/linux-cxl/20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com/T/#m11c571e21c4fe17c7d04ec5c2c7bc7cbf2cd07e3



Fan Ni (12):
  hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output
    payload of identify memory device command
  hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative
    and mailbox command support
  include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for
    type3 memory devices
  hw/mem/cxl_type3: Add support to create DC regions to type3 memory
    devices
  hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr
    size instead of mr as argument
  hw/mem/cxl_type3: Add host backend and address space handling for DC
    regions
  hw/mem/cxl_type3: Add DC extent list representative and get DC extent
    list mailbox support
  hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release
    dynamic capacity response
  hw/cxl/events: Add qmp interfaces to add/release dynamic capacity
    extents
  hw/mem/cxl_type3: Add dpa range validation for accesses to DC regions
  hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support
  hw/mem/cxl_type3: Allow to release extent superset in QMP interface

 hw/cxl/cxl-mailbox-utils.c  | 644 +++++++++++++++++++++++++++++++++++-
 hw/mem/cxl_type3.c          | 580 +++++++++++++++++++++++++++++---
 hw/mem/cxl_type3_stubs.c    |  14 +
 include/hw/cxl/cxl_device.h |  67 +++-
 include/hw/cxl/cxl_events.h |  18 +
 qapi/cxl.json               |  61 +++-
 6 files changed, 1334 insertions(+), 50 deletions(-)

-- 
2.43.0



* [PATCH v6 01/12] hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output payload of identify memory device command
From: nifan.cxl @ 2024-03-25 19:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni, Jonathan Cameron

From: Fan Ni <fan.ni@samsung.com>

Based on CXL spec r3.1 Table 8-127 (Identify Memory Device Output
Payload), the dynamic capacity event log size should be part of the
output of the Identify command.
Add dc_event_log_size to the output payload so the host can retrieve the info.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 4bcd727f4c..ba1d9901df 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -21,6 +21,7 @@
 #include "sysemu/hostmem.h"
 
 #define CXL_CAPACITY_MULTIPLIER   (256 * MiB)
+#define CXL_DC_EVENT_LOG_SIZE 8
 
 /*
  * How to add a new command, example. The command set FOO, with cmd BAR.
@@ -780,8 +781,9 @@ static CXLRetCode cmd_identify_memory_device(const struct cxl_cmd *cmd,
         uint16_t inject_poison_limit;
         uint8_t poison_caps;
         uint8_t qos_telemetry_caps;
+        uint16_t dc_event_log_size;
     } QEMU_PACKED *id;
-    QEMU_BUILD_BUG_ON(sizeof(*id) != 0x43);
+    QEMU_BUILD_BUG_ON(sizeof(*id) != 0x45);
     CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
     CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
     CXLDeviceState *cxl_dstate = &ct3d->cxl_dstate;
@@ -807,6 +809,7 @@ static CXLRetCode cmd_identify_memory_device(const struct cxl_cmd *cmd,
     st24_le_p(id->poison_list_max_mer, 256);
     /* No limit - so limited by main poison record limit */
     stw_le_p(&id->inject_poison_limit, 0);
+    stw_le_p(&id->dc_event_log_size, CXL_DC_EVENT_LOG_SIZE);
 
     *len_out = sizeof(*id);
     return CXL_MBOX_SUCCESS;
-- 
2.43.0



* [PATCH v6 02/12] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support
From: nifan.cxl @ 2024-03-25 19:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni, Jonathan Cameron

From: Fan Ni <fan.ni@samsung.com>

Per CXL spec r3.1, add a dynamic capacity region representative based on
Table 8-165 and extend the CXL type3 device definition to include DC region
information. Also, based on the info in section 8.2.9.9.9.1, add 'Get Dynamic
Capacity Configuration' mailbox support.

Note: we store the region decode length on the device as a byte-wise length,
which should be divided by 256 * MiB before being returned to the host
for the "Get Dynamic Capacity Configuration" mailbox command, per the
specification.
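
For example (illustrative values only): a region configured with a 2 GiB
decode length is stored as 2 * GiB on the device and reported as 8 in the
mailbox payload:

    uint64_t decode_len_bytes = 2 * GiB;   /* stored in the region struct */
    uint64_t decode_len_units = decode_len_bytes / CXL_CAPACITY_MULTIPLIER;
    /* decode_len_units == 8 is what the mailbox command returns */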

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  | 96 +++++++++++++++++++++++++++++++++++++
 include/hw/cxl/cxl_device.h | 16 +++++++
 2 files changed, 112 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index ba1d9901df..49c7944d93 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -22,6 +22,8 @@
 
 #define CXL_CAPACITY_MULTIPLIER   (256 * MiB)
 #define CXL_DC_EVENT_LOG_SIZE 8
+#define CXL_NUM_EXTENTS_SUPPORTED 512
+#define CXL_NUM_TAGS_SUPPORTED 0
 
 /*
  * How to add a new command, example. The command set FOO, with cmd BAR.
@@ -80,6 +82,8 @@ enum {
         #define GET_POISON_LIST        0x0
         #define INJECT_POISON          0x1
         #define CLEAR_POISON           0x2
+    DCD_CONFIG  = 0x48,
+        #define GET_DC_CONFIG          0x0
     PHYSICAL_SWITCH = 0x51,
         #define IDENTIFY_SWITCH_DEVICE      0x0
         #define GET_PHYSICAL_PORT_STATE     0x1
@@ -1238,6 +1242,88 @@ static CXLRetCode cmd_media_clear_poison(const struct cxl_cmd *cmd,
     return CXL_MBOX_SUCCESS;
 }
 
+/*
+ * CXL r3.1 section 8.2.9.9.9.1: Get Dynamic Capacity Configuration
+ * (Opcode: 4800h)
+ */
+static CXLRetCode cmd_dcd_get_dyn_cap_config(const struct cxl_cmd *cmd,
+                                             uint8_t *payload_in,
+                                             size_t len_in,
+                                             uint8_t *payload_out,
+                                             size_t *len_out,
+                                             CXLCCI *cci)
+{
+    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
+    struct {
+        uint8_t region_cnt;
+        uint8_t start_rid;
+    } QEMU_PACKED *in = (void *)payload_in;
+    struct {
+        uint8_t num_regions;
+        uint8_t regions_returned;
+        uint8_t rsvd1[6];
+        struct {
+            uint64_t base;
+            uint64_t decode_len;
+            uint64_t region_len;
+            uint64_t block_size;
+            uint32_t dsmadhandle;
+            uint8_t flags;
+            uint8_t rsvd2[3];
+        } QEMU_PACKED records[];
+    } QEMU_PACKED *out = (void *)payload_out;
+    struct {
+        uint32_t num_extents_supported;
+        uint32_t num_extents_available;
+        uint32_t num_tags_supported;
+        uint32_t num_tags_available;
+    } QEMU_PACKED *extra_out;
+    uint16_t record_count;
+    uint16_t i;
+    uint16_t out_pl_len;
+    uint8_t start_rid;
+
+    start_rid = in->start_rid;
+    if (start_rid >= ct3d->dc.num_regions) {
+        return CXL_MBOX_INVALID_INPUT;
+    }
+
+    record_count = MIN(ct3d->dc.num_regions - in->start_rid, in->region_cnt);
+
+    out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);
+    extra_out = (void *)(payload_out + out_pl_len);
+    out_pl_len += sizeof(*extra_out);
+    assert(out_pl_len <= CXL_MAILBOX_MAX_PAYLOAD_SIZE);
+
+    out->num_regions = ct3d->dc.num_regions;
+    out->regions_returned = record_count;
+    for (i = 0; i < record_count; i++) {
+        stq_le_p(&out->records[i].base,
+                 ct3d->dc.regions[start_rid + i].base);
+        stq_le_p(&out->records[i].decode_len,
+                 ct3d->dc.regions[start_rid + i].decode_len /
+                 CXL_CAPACITY_MULTIPLIER);
+        stq_le_p(&out->records[i].region_len,
+                 ct3d->dc.regions[start_rid + i].len);
+        stq_le_p(&out->records[i].block_size,
+                 ct3d->dc.regions[start_rid + i].block_size);
+        stl_le_p(&out->records[i].dsmadhandle,
+                 ct3d->dc.regions[start_rid + i].dsmadhandle);
+        out->records[i].flags = ct3d->dc.regions[start_rid + i].flags;
+    }
+    /*
+     * TODO: Assign values once extents and tags are introduced
+     * to use.
+     */
+    stl_le_p(&extra_out->num_extents_supported, CXL_NUM_EXTENTS_SUPPORTED);
+    stl_le_p(&extra_out->num_extents_available, CXL_NUM_EXTENTS_SUPPORTED);
+    stl_le_p(&extra_out->num_tags_supported, CXL_NUM_TAGS_SUPPORTED);
+    stl_le_p(&extra_out->num_tags_available, CXL_NUM_TAGS_SUPPORTED);
+
+    *len_out = out_pl_len;
+    return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_DATA_CHANGE (1 << 2)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -1282,6 +1368,11 @@ static const struct cxl_cmd cxl_cmd_set[256][256] = {
         cmd_media_clear_poison, 72, 0 },
 };
 
+static const struct cxl_cmd cxl_cmd_set_dcd[256][256] = {
+    [DCD_CONFIG][GET_DC_CONFIG] = { "DCD_GET_DC_CONFIG",
+        cmd_dcd_get_dyn_cap_config, 2, 0 },
+};
+
 static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
     [INFOSTAT][IS_IDENTIFY] = { "IDENTIFY", cmd_infostat_identify, 0, 0 },
     [INFOSTAT][BACKGROUND_OPERATION_STATUS] = { "BACKGROUND_OPERATION_STATUS",
@@ -1487,7 +1578,12 @@ void cxl_initialize_mailbox_swcci(CXLCCI *cci, DeviceState *intf,
 
 void cxl_initialize_mailbox_t3(CXLCCI *cci, DeviceState *d, size_t payload_max)
 {
+    CXLType3Dev *ct3d = CXL_TYPE3(d);
+
     cxl_copy_cci_commands(cci, cxl_cmd_set);
+    if (ct3d->dc.num_regions) {
+        cxl_copy_cci_commands(cci, cxl_cmd_set_dcd);
+    }
     cci->d = d;
 
     /* No separation for PCI MB as protocol handled in PCI device */
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index a5f8e25020..e839370266 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -422,6 +422,17 @@ typedef struct CXLPoison {
 typedef QLIST_HEAD(, CXLPoison) CXLPoisonList;
 #define CXL_POISON_LIST_LIMIT 256
 
+#define DCD_MAX_NUM_REGION 8
+
+typedef struct CXLDCRegion {
+    uint64_t base;       /* aligned to 256*MiB */
+    uint64_t decode_len; /* aligned to 256*MiB */
+    uint64_t len;
+    uint64_t block_size;
+    uint32_t dsmadhandle;
+    uint8_t flags;
+} CXLDCRegion;
+
 struct CXLType3Dev {
     /* Private */
     PCIDevice parent_obj;
@@ -454,6 +465,11 @@ struct CXLType3Dev {
     unsigned int poison_list_cnt;
     bool poison_list_overflowed;
     uint64_t poison_list_overflow_ts;
+
+    struct dynamic_capacity {
+        uint8_t num_regions; /* 0-8 regions */
+        CXLDCRegion regions[DCD_MAX_NUM_REGION];
+    } dc;
 };
 
 #define TYPE_CXL_TYPE3 "cxl-type3"
-- 
2.43.0



* [PATCH v6 03/12] include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for type3 memory devices
From: nifan.cxl @ 2024-03-25 19:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni, Jonathan Cameron

From: Fan Ni <fan.ni@samsung.com>

Rename mem_size to static_mem_size for type3 memory devices to cover static
RAM and pmem capacity, preparing for the introduction of dynamic capacity to
support dynamic capacity devices.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  | 4 ++--
 hw/mem/cxl_type3.c          | 8 ++++----
 include/hw/cxl/cxl_device.h | 2 +-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 49c7944d93..0f2ad58a14 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -803,7 +803,7 @@ static CXLRetCode cmd_identify_memory_device(const struct cxl_cmd *cmd,
     snprintf(id->fw_revision, 0x10, "BWFW VERSION %02d", 0);
 
     stq_le_p(&id->total_capacity,
-             cxl_dstate->mem_size / CXL_CAPACITY_MULTIPLIER);
+             cxl_dstate->static_mem_size / CXL_CAPACITY_MULTIPLIER);
     stq_le_p(&id->persistent_capacity,
              cxl_dstate->pmem_size / CXL_CAPACITY_MULTIPLIER);
     stq_le_p(&id->volatile_capacity,
@@ -1179,7 +1179,7 @@ static CXLRetCode cmd_media_clear_poison(const struct cxl_cmd *cmd,
     struct clear_poison_pl *in = (void *)payload_in;
 
     dpa = ldq_le_p(&in->dpa);
-    if (dpa + CXL_CACHE_LINE_SIZE > cxl_dstate->mem_size) {
+    if (dpa + CXL_CACHE_LINE_SIZE > cxl_dstate->static_mem_size) {
         return CXL_MBOX_INVALID_PA;
     }
 
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index b679dfae1c..0836e9135b 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -608,7 +608,7 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
         }
         address_space_init(&ct3d->hostvmem_as, vmr, v_name);
         ct3d->cxl_dstate.vmem_size = memory_region_size(vmr);
-        ct3d->cxl_dstate.mem_size += memory_region_size(vmr);
+        ct3d->cxl_dstate.static_mem_size += memory_region_size(vmr);
         g_free(v_name);
     }
 
@@ -631,7 +631,7 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
         }
         address_space_init(&ct3d->hostpmem_as, pmr, p_name);
         ct3d->cxl_dstate.pmem_size = memory_region_size(pmr);
-        ct3d->cxl_dstate.mem_size += memory_region_size(pmr);
+        ct3d->cxl_dstate.static_mem_size += memory_region_size(pmr);
         g_free(p_name);
     }
 
@@ -837,7 +837,7 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
         return -EINVAL;
     }
 
-    if (*dpa_offset > ct3d->cxl_dstate.mem_size) {
+    if (*dpa_offset > ct3d->cxl_dstate.static_mem_size) {
         return -EINVAL;
     }
 
@@ -1010,7 +1010,7 @@ static bool set_cacheline(CXLType3Dev *ct3d, uint64_t dpa_offset, uint8_t *data)
         return false;
     }
 
-    if (dpa_offset + CXL_CACHE_LINE_SIZE > ct3d->cxl_dstate.mem_size) {
+    if (dpa_offset + CXL_CACHE_LINE_SIZE > ct3d->cxl_dstate.static_mem_size) {
         return false;
     }
 
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index e839370266..f7f56b44e3 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -234,7 +234,7 @@ typedef struct cxl_device_state {
     } timestamp;
 
     /* memory region size, HDM */
-    uint64_t mem_size;
+    uint64_t static_mem_size;
     uint64_t pmem_size;
     uint64_t vmem_size;
 
-- 
2.43.0



* [PATCH v6 04/12] hw/mem/cxl_type3: Add support to create DC regions to type3 memory devices
From: nifan.cxl @ 2024-03-25 19:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni, Jonathan Cameron

From: Fan Ni <fan.ni@samsung.com>

With this change, when setting up memory for a type3 memory device, we can
create DC regions.
A property 'num-dc-regions' is added to ct3_props to allow users to pass the
number of DC regions to create. To keep it simple, other region parameters
like region base, length, and block size are hard coded. If needed, these
parameters can be made configurable easily.

With this change and proper kernel-side support, we can create DC regions
as shown below:

region=$(cat /sys/bus/cxl/devices/decoder0.0/create_dc_region)
echo $region > /sys/bus/cxl/devices/decoder0.0/create_dc_region
echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity
echo 1 > /sys/bus/cxl/devices/$region/interleave_ways

echo "dc0" >/sys/bus/cxl/devices/decoder2.0/mode
echo 0x40000000 >/sys/bus/cxl/devices/decoder2.0/dpa_size

echo 0x40000000 > /sys/bus/cxl/devices/$region/size
echo  "decoder2.0" > /sys/bus/cxl/devices/$region/target0
echo 1 > /sys/bus/cxl/devices/$region/commit
echo $region > /sys/bus/cxl/drivers/cxl_region/bind

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/mem/cxl_type3.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 0836e9135b..c83d6f8004 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -30,6 +30,7 @@
 #include "hw/pci/msix.h"
 
 #define DWORD_BYTE 4
+#define CXL_CAPACITY_MULTIPLIER   (256 * MiB)
 
 /* Default CDAT entries for a memory region */
 enum {
@@ -567,6 +568,46 @@ static void ct3d_reg_write(void *opaque, hwaddr offset, uint64_t value,
     }
 }
 
+/*
+ * TODO: dc region configuration will be updated once host backend and address
+ * space support is added for DCD.
+ */
+static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
+{
+    int i;
+    uint64_t region_base = 0;
+    uint64_t region_len =  2 * GiB;
+    uint64_t decode_len = 2 * GiB;
+    uint64_t blk_size = 2 * MiB;
+    CXLDCRegion *region;
+    MemoryRegion *mr;
+
+    if (ct3d->hostvmem) {
+        mr = host_memory_backend_get_memory(ct3d->hostvmem);
+        region_base += memory_region_size(mr);
+    }
+    if (ct3d->hostpmem) {
+        mr = host_memory_backend_get_memory(ct3d->hostpmem);
+        region_base += memory_region_size(mr);
+    }
+    assert(region_base % CXL_CAPACITY_MULTIPLIER == 0);
+
+    for (i = 0, region = &ct3d->dc.regions[0];
+         i < ct3d->dc.num_regions;
+         i++, region++, region_base += region_len) {
+        *region = (CXLDCRegion) {
+            .base = region_base,
+            .decode_len = decode_len,
+            .len = region_len,
+            .block_size = blk_size,
+            /* dsmad_handle set when creating CDAT table entries */
+            .flags = 0,
+        };
+    }
+
+    return true;
+}
+
 static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
 {
     DeviceState *ds = DEVICE(ct3d);
@@ -635,6 +676,11 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
         g_free(p_name);
     }
 
+    if (!cxl_create_dc_regions(ct3d, errp)) {
+        error_setg(errp, "setup DC regions failed");
+        return false;
+    }
+
     return true;
 }
 
@@ -930,6 +976,7 @@ static Property ct3_props[] = {
                      HostMemoryBackend *),
     DEFINE_PROP_UINT64("sn", CXLType3Dev, sn, UI64_NULL),
     DEFINE_PROP_STRING("cdat", CXLType3Dev, cxl_cstate.cdat.filename),
+    DEFINE_PROP_UINT8("num-dc-regions", CXLType3Dev, dc.num_regions, 0),
     DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.43.0



* [PATCH v6 05/12] hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr size instead of mr as argument
From: nifan.cxl @ 2024-03-25 19:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni, Jonathan Cameron

From: Fan Ni <fan.ni@samsung.com>

The function ct3_build_cdat_entries_for_mr only uses the size of the passed
memory region argument, so refactor the function definition to make the
passed arguments more specific.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/mem/cxl_type3.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index c83d6f8004..a9e8bdc436 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -44,7 +44,7 @@ enum {
 };
 
 static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
-                                          int dsmad_handle, MemoryRegion *mr,
+                                          int dsmad_handle, uint64_t size,
                                           bool is_pmem, uint64_t dpa_base)
 {
     CDATDsmas *dsmas;
@@ -63,7 +63,7 @@ static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
         .DSMADhandle = dsmad_handle,
         .flags = is_pmem ? CDAT_DSMAS_FLAG_NV : 0,
         .DPA_base = dpa_base,
-        .DPA_length = memory_region_size(mr),
+        .DPA_length = size,
     };
 
     /* For now, no memory side cache, plausiblish numbers */
@@ -132,7 +132,7 @@ static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
          */
         .EFI_memory_type_attr = is_pmem ? 2 : 1,
         .DPA_offset = 0,
-        .DPA_length = memory_region_size(mr),
+        .DPA_length = size,
     };
 
     /* Header always at start of structure */
@@ -149,6 +149,7 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
     g_autofree CDATSubHeader **table = NULL;
     CXLType3Dev *ct3d = priv;
     MemoryRegion *volatile_mr = NULL, *nonvolatile_mr = NULL;
+    uint64_t vmr_size = 0, pmr_size = 0;
     int dsmad_handle = 0;
     int cur_ent = 0;
     int len = 0;
@@ -163,6 +164,7 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
             return -EINVAL;
         }
         len += CT3_CDAT_NUM_ENTRIES;
+        vmr_size = memory_region_size(volatile_mr);
     }
 
     if (ct3d->hostpmem) {
@@ -171,21 +173,22 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
             return -EINVAL;
         }
         len += CT3_CDAT_NUM_ENTRIES;
+        pmr_size = memory_region_size(nonvolatile_mr);
     }
 
     table = g_malloc0(len * sizeof(*table));
 
     /* Now fill them in */
     if (volatile_mr) {
-        ct3_build_cdat_entries_for_mr(table, dsmad_handle++, volatile_mr,
+        ct3_build_cdat_entries_for_mr(table, dsmad_handle++, vmr_size,
                                       false, 0);
         cur_ent = CT3_CDAT_NUM_ENTRIES;
     }
 
     if (nonvolatile_mr) {
-        uint64_t base = volatile_mr ? memory_region_size(volatile_mr) : 0;
+        uint64_t base = vmr_size;
         ct3_build_cdat_entries_for_mr(&(table[cur_ent]), dsmad_handle++,
-                                      nonvolatile_mr, true, base);
+                                      pmr_size, true, base);
         cur_ent += CT3_CDAT_NUM_ENTRIES;
     }
     assert(len == cur_ent);
-- 
2.43.0



* [PATCH v6 06/12] hw/mem/cxl_type3: Add host backend and address space handling for DC regions
From: nifan.cxl @ 2024-03-25 19:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni

From: Fan Ni <fan.ni@samsung.com>

Add a (file- or memory-backed) host backend; all the dynamic capacity regions
will share a single, large enough host backend. Set up an address space for
the DC regions to support read/write operations to dynamic capacity for DCD.

With the change, the following support is added:
1. A new property "volatile-dc-memdev" for the type3 device, pointing to the
   host memory backend for dynamic capacity. Currently, all DC regions share
   one host backend;
2. An address space for dynamic capacity for read/write support;
3. CDAT entries for each dynamic capacity region.
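
The resulting device DPA layout and routing can be summarized as below (a
condensed view of the cxl_type3_hpa_to_as_and_dpa() changes in the diff;
dc_size is the size of the single backend shared by all DC regions):

    /* [0, vmr_size)                              -> hostvmem_as
     * [vmr_size, vmr_size + pmr_size)            -> hostpmem_as
     * [vmr_size + pmr_size, ... + dc_size)       -> dc.host_dc_as */
    if (dpa_offset < vmr_size) {
        as = &ct3d->hostvmem_as;
    } else if (dpa_offset < vmr_size + pmr_size) {
        as = &ct3d->hostpmem_as;
        dpa_offset -= vmr_size;
    } else {
        as = &ct3d->dc.host_dc_as;
        dpa_offset -= (vmr_size + pmr_size);
    }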

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  |  16 ++-
 hw/mem/cxl_type3.c          | 187 +++++++++++++++++++++++++++++-------
 include/hw/cxl/cxl_device.h |   8 ++
 3 files changed, 172 insertions(+), 39 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 0f2ad58a14..831cef0567 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -622,7 +622,8 @@ static CXLRetCode cmd_firmware_update_get_info(const struct cxl_cmd *cmd,
                                                size_t *len_out,
                                                CXLCCI *cci)
 {
-    CXLDeviceState *cxl_dstate = &CXL_TYPE3(cci->d)->cxl_dstate;
+    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
+    CXLDeviceState *cxl_dstate = &ct3d->cxl_dstate;
     struct {
         uint8_t slots_supported;
         uint8_t slot_info;
@@ -636,7 +637,8 @@ static CXLRetCode cmd_firmware_update_get_info(const struct cxl_cmd *cmd,
     QEMU_BUILD_BUG_ON(sizeof(*fw_info) != 0x50);
 
     if ((cxl_dstate->vmem_size < CXL_CAPACITY_MULTIPLIER) ||
-        (cxl_dstate->pmem_size < CXL_CAPACITY_MULTIPLIER)) {
+        (cxl_dstate->pmem_size < CXL_CAPACITY_MULTIPLIER) ||
+        (ct3d->dc.total_capacity < CXL_CAPACITY_MULTIPLIER)) {
         return CXL_MBOX_INTERNAL_ERROR;
     }
 
@@ -793,7 +795,8 @@ static CXLRetCode cmd_identify_memory_device(const struct cxl_cmd *cmd,
     CXLDeviceState *cxl_dstate = &ct3d->cxl_dstate;
 
     if ((!QEMU_IS_ALIGNED(cxl_dstate->vmem_size, CXL_CAPACITY_MULTIPLIER)) ||
-        (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER))) {
+        (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER)) ||
+        (!QEMU_IS_ALIGNED(ct3d->dc.total_capacity, CXL_CAPACITY_MULTIPLIER))) {
         return CXL_MBOX_INTERNAL_ERROR;
     }
 
@@ -835,9 +838,11 @@ static CXLRetCode cmd_ccls_get_partition_info(const struct cxl_cmd *cmd,
         uint64_t next_pmem;
     } QEMU_PACKED *part_info = (void *)payload_out;
     QEMU_BUILD_BUG_ON(sizeof(*part_info) != 0x20);
+    CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
 
     if ((!QEMU_IS_ALIGNED(cxl_dstate->vmem_size, CXL_CAPACITY_MULTIPLIER)) ||
-        (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER))) {
+        (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER)) ||
+        (!QEMU_IS_ALIGNED(ct3d->dc.total_capacity, CXL_CAPACITY_MULTIPLIER))) {
         return CXL_MBOX_INTERNAL_ERROR;
     }
 
@@ -1179,7 +1184,8 @@ static CXLRetCode cmd_media_clear_poison(const struct cxl_cmd *cmd,
     struct clear_poison_pl *in = (void *)payload_in;
 
     dpa = ldq_le_p(&in->dpa);
-    if (dpa + CXL_CACHE_LINE_SIZE > cxl_dstate->static_mem_size) {
+    if (dpa + CXL_CACHE_LINE_SIZE > cxl_dstate->static_mem_size +
+        ct3d->dc.total_capacity) {
         return CXL_MBOX_INVALID_PA;
     }
 
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index a9e8bdc436..75ea9b20e1 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -45,7 +45,8 @@ enum {
 
 static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
                                           int dsmad_handle, uint64_t size,
-                                          bool is_pmem, uint64_t dpa_base)
+                                          bool is_pmem, bool is_dynamic,
+                                          uint64_t dpa_base)
 {
     CDATDsmas *dsmas;
     CDATDslbis *dslbis0;
@@ -61,7 +62,8 @@ static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
             .length = sizeof(*dsmas),
         },
         .DSMADhandle = dsmad_handle,
-        .flags = is_pmem ? CDAT_DSMAS_FLAG_NV : 0,
+        .flags = (is_pmem ? CDAT_DSMAS_FLAG_NV : 0) |
+                 (is_dynamic ? CDAT_DSMAS_FLAG_DYNAMIC_CAP : 0),
         .DPA_base = dpa_base,
         .DPA_length = size,
     };
@@ -149,12 +151,13 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
     g_autofree CDATSubHeader **table = NULL;
     CXLType3Dev *ct3d = priv;
     MemoryRegion *volatile_mr = NULL, *nonvolatile_mr = NULL;
+    MemoryRegion *dc_mr = NULL;
     uint64_t vmr_size = 0, pmr_size = 0;
     int dsmad_handle = 0;
     int cur_ent = 0;
     int len = 0;
 
-    if (!ct3d->hostpmem && !ct3d->hostvmem) {
+    if (!ct3d->hostpmem && !ct3d->hostvmem && !ct3d->dc.num_regions) {
         return 0;
     }
 
@@ -176,21 +179,54 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
         pmr_size = memory_region_size(nonvolatile_mr);
     }
 
+    if (ct3d->dc.num_regions) {
+        if (!ct3d->dc.host_dc) {
+            return -EINVAL;
+        }
+        dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
+        if (!dc_mr) {
+            return -EINVAL;
+        }
+        len += CT3_CDAT_NUM_ENTRIES * ct3d->dc.num_regions;
+    }
+
     table = g_malloc0(len * sizeof(*table));
 
     /* Now fill them in */
     if (volatile_mr) {
         ct3_build_cdat_entries_for_mr(table, dsmad_handle++, vmr_size,
-                                      false, 0);
+                                      false, false, 0);
         cur_ent = CT3_CDAT_NUM_ENTRIES;
     }
 
     if (nonvolatile_mr) {
         uint64_t base = vmr_size;
         ct3_build_cdat_entries_for_mr(&(table[cur_ent]), dsmad_handle++,
-                                      pmr_size, true, base);
+                                      pmr_size, true, false, base);
         cur_ent += CT3_CDAT_NUM_ENTRIES;
     }
+
+    if (dc_mr) {
+        int i;
+        uint64_t region_base = vmr_size + pmr_size;
+
+        /*
+         * TODO: we assume the dynamic capacity to be volatile for now,
+         * non-volatile dynamic capacity will be added if needed in the
+         * future.
+         */
+        for (i = 0; i < ct3d->dc.num_regions; i++) {
+            ct3_build_cdat_entries_for_mr(&(table[cur_ent]),
+                                          dsmad_handle++,
+                                          ct3d->dc.regions[i].len,
+                                          false, true, region_base);
+            ct3d->dc.regions[i].dsmadhandle = dsmad_handle - 1;
+
+            cur_ent += CT3_CDAT_NUM_ENTRIES;
+            region_base += ct3d->dc.regions[i].len;
+        }
+    }
+
     assert(len == cur_ent);
 
     *cdat_table = g_steal_pointer(&table);
@@ -300,11 +336,24 @@ static void build_dvsecs(CXLType3Dev *ct3d)
             range2_size_hi = ct3d->hostpmem->size >> 32;
             range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
                              (ct3d->hostpmem->size & 0xF0000000);
+        } else if (ct3d->dc.host_dc) {
+            range2_size_hi = ct3d->dc.host_dc->size >> 32;
+            range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
+                             (ct3d->dc.host_dc->size & 0xF0000000);
         }
-    } else {
+    } else if (ct3d->hostpmem) {
         range1_size_hi = ct3d->hostpmem->size >> 32;
         range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
                          (ct3d->hostpmem->size & 0xF0000000);
+        if (ct3d->dc.host_dc) {
+            range2_size_hi = ct3d->dc.host_dc->size >> 32;
+            range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
+                             (ct3d->dc.host_dc->size & 0xF0000000);
+        }
+    } else {
+        range1_size_hi = ct3d->dc.host_dc->size >> 32;
+        range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
+                         (ct3d->dc.host_dc->size & 0xF0000000);
     }
 
     dvsec = (uint8_t *)&(CXLDVSECDevice){
@@ -579,11 +628,27 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
 {
     int i;
     uint64_t region_base = 0;
-    uint64_t region_len =  2 * GiB;
-    uint64_t decode_len = 2 * GiB;
+    uint64_t region_len;
+    uint64_t decode_len;
     uint64_t blk_size = 2 * MiB;
     CXLDCRegion *region;
     MemoryRegion *mr;
+    uint64_t dc_size;
+
+    mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
+    dc_size = memory_region_size(mr);
+    region_len = DIV_ROUND_UP(dc_size, ct3d->dc.num_regions);
+
+    if (dc_size % (ct3d->dc.num_regions * CXL_CAPACITY_MULTIPLIER) != 0) {
+        error_setg(errp, "host backend size must be multiples of region len");
+        return false;
+    }
+    if (region_len % CXL_CAPACITY_MULTIPLIER != 0) {
+        error_setg(errp, "DC region size is unaligned to %lx",
+                   CXL_CAPACITY_MULTIPLIER);
+        return false;
+    }
+    decode_len = region_len;
 
     if (ct3d->hostvmem) {
         mr = host_memory_backend_get_memory(ct3d->hostvmem);
@@ -606,6 +671,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
             /* dsmad_handle set when creating CDAT table entries */
             .flags = 0,
         };
+        ct3d->dc.total_capacity += region->len;
     }
 
     return true;
@@ -615,7 +681,8 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
 {
     DeviceState *ds = DEVICE(ct3d);
 
-    if (!ct3d->hostmem && !ct3d->hostvmem && !ct3d->hostpmem) {
+    if (!ct3d->hostmem && !ct3d->hostvmem && !ct3d->hostpmem
+        && !ct3d->dc.num_regions) {
         error_setg(errp, "at least one memdev property must be set");
         return false;
     } else if (ct3d->hostmem && ct3d->hostpmem) {
@@ -679,9 +746,41 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
         g_free(p_name);
     }
 
-    if (!cxl_create_dc_regions(ct3d, errp)) {
-        error_setg(errp, "setup DC regions failed");
-        return false;
+    ct3d->dc.total_capacity = 0;
+    if (ct3d->dc.num_regions) {
+        MemoryRegion *dc_mr;
+        char *dc_name;
+
+        if (!ct3d->dc.host_dc) {
+            error_setg(errp, "dynamic capacity must have a backing device");
+            return false;
+        }
+
+        dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
+        if (!dc_mr) {
+            error_setg(errp, "dynamic capacity must have a backing device");
+            return false;
+        }
+
+        /*
+         * TODO: set dc as volatile for now, non-volatile support can be added
+         * in the future if needed.
+         */
+        memory_region_set_nonvolatile(dc_mr, false);
+        memory_region_set_enabled(dc_mr, true);
+        host_memory_backend_set_mapped(ct3d->dc.host_dc, true);
+        if (ds->id) {
+            dc_name = g_strdup_printf("cxl-dcd-dpa-dc-space:%s", ds->id);
+        } else {
+            dc_name = g_strdup("cxl-dcd-dpa-dc-space");
+        }
+        address_space_init(&ct3d->dc.host_dc_as, dc_mr, dc_name);
+        g_free(dc_name);
+
+        if (!cxl_create_dc_regions(ct3d, errp)) {
+            error_setg(errp, "setup DC regions failed");
+            return false;
+        }
     }
 
     return true;
@@ -773,6 +872,9 @@ err_release_cdat:
 err_free_special_ops:
     g_free(regs->special_ops);
 err_address_space_free:
+    if (ct3d->dc.host_dc) {
+        address_space_destroy(&ct3d->dc.host_dc_as);
+    }
     if (ct3d->hostpmem) {
         address_space_destroy(&ct3d->hostpmem_as);
     }
@@ -791,6 +893,9 @@ static void ct3_exit(PCIDevice *pci_dev)
     pcie_aer_exit(pci_dev);
     cxl_doe_cdat_release(cxl_cstate);
     g_free(regs->special_ops);
+    if (ct3d->dc.host_dc) {
+        address_space_destroy(&ct3d->dc.host_dc_as);
+    }
     if (ct3d->hostpmem) {
         address_space_destroy(&ct3d->hostpmem_as);
     }
@@ -869,16 +974,23 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
                                        AddressSpace **as,
                                        uint64_t *dpa_offset)
 {
-    MemoryRegion *vmr = NULL, *pmr = NULL;
+    MemoryRegion *vmr = NULL, *pmr = NULL, *dc_mr = NULL;
+    uint64_t vmr_size = 0, pmr_size = 0, dc_size = 0;
 
     if (ct3d->hostvmem) {
         vmr = host_memory_backend_get_memory(ct3d->hostvmem);
+        vmr_size = memory_region_size(vmr);
     }
     if (ct3d->hostpmem) {
         pmr = host_memory_backend_get_memory(ct3d->hostpmem);
+        pmr_size = memory_region_size(pmr);
+    }
+    if (ct3d->dc.host_dc) {
+        dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
+        dc_size = memory_region_size(dc_mr);
     }
 
-    if (!vmr && !pmr) {
+    if (!vmr && !pmr && !dc_mr) {
         return -ENODEV;
     }
 
@@ -886,19 +998,18 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
         return -EINVAL;
     }
 
-    if (*dpa_offset > ct3d->cxl_dstate.static_mem_size) {
+    if (*dpa_offset >= vmr_size + pmr_size + dc_size) {
         return -EINVAL;
     }
 
-    if (vmr) {
-        if (*dpa_offset < memory_region_size(vmr)) {
-            *as = &ct3d->hostvmem_as;
-        } else {
-            *as = &ct3d->hostpmem_as;
-            *dpa_offset -= memory_region_size(vmr);
-        }
-    } else {
+    if (*dpa_offset < vmr_size) {
+        *as = &ct3d->hostvmem_as;
+    } else if (*dpa_offset < vmr_size + pmr_size) {
         *as = &ct3d->hostpmem_as;
+        *dpa_offset -= vmr_size;
+    } else {
+        *as = &ct3d->dc.host_dc_as;
+        *dpa_offset -= (vmr_size + pmr_size);
     }
 
     return 0;
@@ -980,6 +1091,8 @@ static Property ct3_props[] = {
     DEFINE_PROP_UINT64("sn", CXLType3Dev, sn, UI64_NULL),
     DEFINE_PROP_STRING("cdat", CXLType3Dev, cxl_cstate.cdat.filename),
     DEFINE_PROP_UINT8("num-dc-regions", CXLType3Dev, dc.num_regions, 0),
+    DEFINE_PROP_LINK("volatile-dc-memdev", CXLType3Dev, dc.host_dc,
+                     TYPE_MEMORY_BACKEND, HostMemoryBackend *),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -1046,33 +1159,39 @@ static void set_lsa(CXLType3Dev *ct3d, const void *buf, uint64_t size,
 
 static bool set_cacheline(CXLType3Dev *ct3d, uint64_t dpa_offset, uint8_t *data)
 {
-    MemoryRegion *vmr = NULL, *pmr = NULL;
+    MemoryRegion *vmr = NULL, *pmr = NULL, *dc_mr = NULL;
     AddressSpace *as;
+    uint64_t vmr_size = 0, pmr_size = 0, dc_size = 0;
 
     if (ct3d->hostvmem) {
         vmr = host_memory_backend_get_memory(ct3d->hostvmem);
+        vmr_size = memory_region_size(vmr);
     }
     if (ct3d->hostpmem) {
         pmr = host_memory_backend_get_memory(ct3d->hostpmem);
+        pmr_size = memory_region_size(pmr);
     }
+    if (ct3d->dc.host_dc) {
+        dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
+        dc_size = memory_region_size(dc_mr);
+     }
 
-    if (!vmr && !pmr) {
+    if (!vmr && !pmr && !dc_mr) {
         return false;
     }
 
-    if (dpa_offset + CXL_CACHE_LINE_SIZE > ct3d->cxl_dstate.static_mem_size) {
+    if (dpa_offset + CXL_CACHE_LINE_SIZE > vmr_size + pmr_size + dc_size) {
         return false;
     }
 
-    if (vmr) {
-        if (dpa_offset < memory_region_size(vmr)) {
-            as = &ct3d->hostvmem_as;
-        } else {
-            as = &ct3d->hostpmem_as;
-            dpa_offset -= memory_region_size(vmr);
-        }
-    } else {
+    if (dpa_offset < vmr_size) {
+        as = &ct3d->hostvmem_as;
+    } else if (dpa_offset < vmr_size + pmr_size) {
         as = &ct3d->hostpmem_as;
+        dpa_offset -= vmr_size;
+    } else {
+        as = &ct3d->dc.host_dc_as;
+        dpa_offset -= (vmr_size + pmr_size);
     }
 
     address_space_write(as, dpa_offset, MEMTXATTRS_UNSPECIFIED, &data,
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index f7f56b44e3..c2c3df0d2a 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -467,6 +467,14 @@ struct CXLType3Dev {
     uint64_t poison_list_overflow_ts;
 
     struct dynamic_capacity {
+        HostMemoryBackend *host_dc;
+        AddressSpace host_dc_as;
+        /*
+         * total_capacity is equivalent to the dynamic capability
+         * memory region size.
+         */
+        uint64_t total_capacity; /* 256M aligned */
+
         uint8_t num_regions; /* 0-8 regions */
         CXLDCRegion regions[DCD_MAX_NUM_REGION];
     } dc;
-- 
2.43.0



* [PATCH v6 07/12] hw/mem/cxl_type3: Add DC extent list representative and get DC extent list mailbox support
From: nifan.cxl @ 2024-03-25 19:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni

From: Fan Ni <fan.ni@samsung.com>

Add a dynamic capacity extent list representative to the definition of
CXLType3Dev and implement the Get Dynamic Capacity Extent List mailbox
command per CXL spec r3.1 section 8.2.9.9.9.2.
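
The number of extents returned is clamped both by the caller's request and by
what fits in the mailbox payload (a condensed view of the logic in the diff
below):

    record_count = MIN(in->extent_cnt,
                       ct3d->dc.total_extent_count - start_extent_id);
    size = CXL_MAILBOX_MAX_PAYLOAD_SIZE - sizeof(*out);
    if (size / sizeof(out->records[0]) < record_count) {
        record_count = size / sizeof(out->records[0]);
    }
    out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);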

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  | 75 ++++++++++++++++++++++++++++++++++++-
 hw/mem/cxl_type3.c          |  1 +
 include/hw/cxl/cxl_device.h | 22 +++++++++++
 3 files changed, 97 insertions(+), 1 deletion(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 831cef0567..30ef46a036 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -84,6 +84,7 @@ enum {
         #define CLEAR_POISON           0x2
     DCD_CONFIG  = 0x48,
         #define GET_DC_CONFIG          0x0
+        #define GET_DYN_CAP_EXT_LIST   0x1
     PHYSICAL_SWITCH = 0x51,
         #define IDENTIFY_SWITCH_DEVICE      0x0
         #define GET_PHYSICAL_PORT_STATE     0x1
@@ -1322,7 +1323,8 @@ static CXLRetCode cmd_dcd_get_dyn_cap_config(const struct cxl_cmd *cmd,
      * to use.
      */
     stl_le_p(&extra_out->num_extents_supported, CXL_NUM_EXTENTS_SUPPORTED);
-    stl_le_p(&extra_out->num_extents_available, CXL_NUM_EXTENTS_SUPPORTED);
+    stl_le_p(&extra_out->num_extents_available, CXL_NUM_EXTENTS_SUPPORTED -
+             ct3d->dc.total_extent_count);
     stl_le_p(&extra_out->num_tags_supported, CXL_NUM_TAGS_SUPPORTED);
     stl_le_p(&extra_out->num_tags_available, CXL_NUM_TAGS_SUPPORTED);
 
@@ -1330,6 +1332,74 @@ static CXLRetCode cmd_dcd_get_dyn_cap_config(const struct cxl_cmd *cmd,
     return CXL_MBOX_SUCCESS;
 }
 
+/*
+ * CXL r3.1 section 8.2.9.9.9.2:
+ * Get Dynamic Capacity Extent List (Opcode 4801h)
+ */
+static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
+                                               uint8_t *payload_in,
+                                               size_t len_in,
+                                               uint8_t *payload_out,
+                                               size_t *len_out,
+                                               CXLCCI *cci)
+{
+    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
+    struct {
+        uint32_t extent_cnt;
+        uint32_t start_extent_id;
+    } QEMU_PACKED *in = (void *)payload_in;
+    struct {
+        uint32_t count;
+        uint32_t total_extents;
+        uint32_t generation_num;
+        uint8_t rsvd[4];
+        CXLDCExtentRaw records[];
+    } QEMU_PACKED *out = (void *)payload_out;
+    uint32_t start_extent_id = in->start_extent_id;
+    CXLDCExtentList *extent_list = &ct3d->dc.extents;
+    uint16_t record_count = 0, i = 0, record_done = 0;
+    uint16_t out_pl_len, size;
+    CXLDCExtent *ent;
+
+    if (start_extent_id > ct3d->dc.total_extent_count) {
+        return CXL_MBOX_INVALID_INPUT;
+    }
+
+    record_count = MIN(in->extent_cnt,
+                       ct3d->dc.total_extent_count - start_extent_id);
+    size = CXL_MAILBOX_MAX_PAYLOAD_SIZE - sizeof(*out);
+    if (size / sizeof(out->records[0]) < record_count) {
+        record_count = size / sizeof(out->records[0]);
+    }
+    out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);
+
+    stl_le_p(&out->count, record_count);
+    stl_le_p(&out->total_extents, ct3d->dc.total_extent_count);
+    stl_le_p(&out->generation_num, ct3d->dc.ext_list_gen_seq);
+
+    if (record_count > 0) {
+        CXLDCExtentRaw *out_rec = &out->records[record_done];
+
+        QTAILQ_FOREACH(ent, extent_list, node) {
+            if (i++ < start_extent_id) {
+                continue;
+            }
+            stq_le_p(&out_rec->start_dpa, ent->start_dpa);
+            stq_le_p(&out_rec->len, ent->len);
+            memcpy(&out_rec->tag, ent->tag, 0x10);
+            stw_le_p(&out_rec->shared_seq, ent->shared_seq);
+
+            record_done++;
+            if (record_done == record_count) {
+                break;
+            }
+        }
+    }
+
+    *len_out = out_pl_len;
+    return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_DATA_CHANGE (1 << 2)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -1377,6 +1447,9 @@ static const struct cxl_cmd cxl_cmd_set[256][256] = {
 static const struct cxl_cmd cxl_cmd_set_dcd[256][256] = {
     [DCD_CONFIG][GET_DC_CONFIG] = { "DCD_GET_DC_CONFIG",
         cmd_dcd_get_dyn_cap_config, 2, 0 },
+    [DCD_CONFIG][GET_DYN_CAP_EXT_LIST] = {
+        "DCD_GET_DYNAMIC_CAPACITY_EXTENT_LIST", cmd_dcd_get_dyn_cap_ext_list,
+        8, 0 },
 };
 
 static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 75ea9b20e1..5be3c904ba 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -673,6 +673,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
         };
         ct3d->dc.total_capacity += region->len;
     }
+    QTAILQ_INIT(&ct3d->dc.extents);
 
     return true;
 }
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index c2c3df0d2a..6aec6ac983 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -424,6 +424,25 @@ typedef QLIST_HEAD(, CXLPoison) CXLPoisonList;
 
 #define DCD_MAX_NUM_REGION 8
 
+typedef struct CXLDCExtentRaw {
+    uint64_t start_dpa;
+    uint64_t len;
+    uint8_t tag[0x10];
+    uint16_t shared_seq;
+    uint8_t rsvd[0x6];
+} QEMU_PACKED CXLDCExtentRaw;
+
+typedef struct CXLDCExtent {
+    uint64_t start_dpa;
+    uint64_t len;
+    uint8_t tag[0x10];
+    uint16_t shared_seq;
+    uint8_t rsvd[0x6];
+
+    QTAILQ_ENTRY(CXLDCExtent) node;
+} CXLDCExtent;
+typedef QTAILQ_HEAD(, CXLDCExtent) CXLDCExtentList;
+
 typedef struct CXLDCRegion {
     uint64_t base;       /* aligned to 256*MiB */
     uint64_t decode_len; /* aligned to 256*MiB */
@@ -474,6 +493,9 @@ struct CXLType3Dev {
          * memory region size.
          */
         uint64_t total_capacity; /* 256M aligned */
+        CXLDCExtentList extents;
+        uint32_t total_extent_count;
+        uint32_t ext_list_gen_seq;
 
         uint8_t num_regions; /* 0-8 regions */
         CXLDCRegion regions[DCD_MAX_NUM_REGION];
-- 
2.43.0



* [PATCH v6 08/12] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response
From: nifan.cxl @ 2024-03-25 19:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni

From: Fan Ni <fan.ni@samsung.com>

Per CXL spec r3.1, two mailbox commands are implemented:
Add Dynamic Capacity Response (Opcode 4802h), section 8.2.9.9.9.3, and
Release Dynamic Capacity (Opcode 4803h), section 8.2.9.9.9.4.

To process the above two commands, we use a two-pass approach.
Pass 1: Check whether the input payload is valid; if not, skip Pass 2 and
        return a mailbox processing error.
Pass 2: Do the real work -- add or release the extents, respectively.
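
In code, both command handlers share roughly this structure (condensed;
cxl_detect_malformed_extent_list() is the first pass shown in the diff
below, and the second pass is specific to each command):

    /* Pass 1: validate the whole input payload, touching no device state */
    ret = cxl_detect_malformed_extent_list(ct3d, in);
    if (ret != CXL_MBOX_SUCCESS) {
        return ret;
    }
    /* Pass 2: apply the changes, e.g. update the device extent list */
    for (i = 0; i < in->num_entries_updated; i++) {
        /* add or release the extent described by in->updated_entries[i] */
    }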

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  | 433 +++++++++++++++++++++++++++++++++++-
 hw/mem/cxl_type3.c          |  11 +
 include/hw/cxl/cxl_device.h |   4 +
 3 files changed, 444 insertions(+), 4 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 30ef46a036..a9eca516c8 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -19,6 +19,7 @@
 #include "qemu/units.h"
 #include "qemu/uuid.h"
 #include "sysemu/hostmem.h"
+#include "qemu/range.h"
 
 #define CXL_CAPACITY_MULTIPLIER   (256 * MiB)
 #define CXL_DC_EVENT_LOG_SIZE 8
@@ -85,6 +86,8 @@ enum {
     DCD_CONFIG  = 0x48,
         #define GET_DC_CONFIG          0x0
         #define GET_DYN_CAP_EXT_LIST   0x1
+        #define ADD_DYN_CAP_RSP        0x2
+        #define RELEASE_DYN_CAP        0x3
     PHYSICAL_SWITCH = 0x51,
         #define IDENTIFY_SWITCH_DEVICE      0x0
         #define GET_PHYSICAL_PORT_STATE     0x1
@@ -1400,6 +1403,422 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
     return CXL_MBOX_SUCCESS;
 }
 
+/*
+ * Check whether any bit between addr[nr, nr+size) is set,
+ * return true if any bit is set, otherwise return false
+ */
+static bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
+                              unsigned long size)
+{
+    unsigned long res = find_next_bit(addr, size + nr, nr);
+
+    return res < nr + size;
+}
+
+CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
+{
+    int i;
+    CXLDCRegion *region = &ct3d->dc.regions[0];
+
+    if (dpa < region->base ||
+        dpa >= region->base + ct3d->dc.total_capacity) {
+        return NULL;
+    }
+
+    /*
+     * CXL r3.1 section 9.13.3: Dynamic Capacity Device (DCD)
+     *
+     * Regions are used in increasing-DPA order, with Region 0 being used for
+     * the lowest DPA of Dynamic Capacity and Region 7 for the highest DPA.
+     * So check from the last region to find where the dpa belongs. Extents that
+     * cross multiple regions are not allowed.
+     */
+    for (i = ct3d->dc.num_regions - 1; i >= 0; i--) {
+        region = &ct3d->dc.regions[i];
+        if (dpa >= region->base) {
+            if (dpa + len > region->base + region->len) {
+                return NULL;
+            }
+            return region;
+        }
+    }
+
+    return NULL;
+}
+
+static void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
+                                             uint64_t dpa,
+                                             uint64_t len,
+                                             uint8_t *tag,
+                                             uint16_t shared_seq)
+{
+    CXLDCExtent *extent;
+
+    extent = g_new0(CXLDCExtent, 1);
+    extent->start_dpa = dpa;
+    extent->len = len;
+    if (tag) {
+        memcpy(extent->tag, tag, 0x10);
+    }
+    extent->shared_seq = shared_seq;
+
+    QTAILQ_INSERT_TAIL(list, extent, node);
+}
+
+void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
+                                        CXLDCExtent *extent)
+{
+    QTAILQ_REMOVE(list, extent, node);
+    g_free(extent);
+}
+
+/*
+ * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload
+ * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload
+ */
+typedef struct CXLUpdateDCExtentListInPl {
+    uint32_t num_entries_updated;
+    uint8_t flags;
+    uint8_t rsvd[3];
+    /* CXL r3.1 Table 8-169: Updated Extent */
+    struct {
+        uint64_t start_dpa;
+        uint64_t len;
+        uint8_t rsvd[8];
+    } QEMU_PACKED updated_entries[];
+} QEMU_PACKED CXLUpdateDCExtentListInPl;
+
+/*
+ * For the extents in the extent list to operate on, check whether they are valid:
+ * 1. The extent should be in the range of a valid DC region;
+ * 2. The extent should not cross multiple regions;
+ * 3. The start DPA and the length of the extent should align with the block
+ * size of the region;
+ * 4. The address range of multiple extents in the list should not overlap.
+ */
+static CXLRetCode cxl_detect_malformed_extent_list(CXLType3Dev *ct3d,
+        const CXLUpdateDCExtentListInPl *in)
+{
+    uint64_t min_block_size = UINT64_MAX;
+    CXLDCRegion *region = &ct3d->dc.regions[0];
+    CXLDCRegion *lastregion = &ct3d->dc.regions[ct3d->dc.num_regions - 1];
+    g_autofree unsigned long *blk_bitmap = NULL;
+    uint64_t dpa, len;
+    uint32_t i;
+
+    for (i = 0; i < ct3d->dc.num_regions; i++) {
+        region = &ct3d->dc.regions[i];
+        min_block_size = MIN(min_block_size, region->block_size);
+    }
+
+    blk_bitmap = bitmap_new((lastregion->base + lastregion->len -
+                             ct3d->dc.regions[0].base) / min_block_size);
+
+    for (i = 0; i < in->num_entries_updated; i++) {
+        dpa = in->updated_entries[i].start_dpa;
+        len = in->updated_entries[i].len;
+
+        region = cxl_find_dc_region(ct3d, dpa, len);
+        if (!region) {
+            return CXL_MBOX_INVALID_PA;
+        }
+
+        dpa -= ct3d->dc.regions[0].base;
+        if (dpa % region->block_size || len % region->block_size) {
+            return CXL_MBOX_INVALID_EXTENT_LIST;
+        }
+        /* the dpa range already covered by some other extents in the list */
+        if (test_any_bits_set(blk_bitmap, dpa / min_block_size,
+            len / min_block_size)) {
+            return CXL_MBOX_INVALID_EXTENT_LIST;
+        }
+        bitmap_set(blk_bitmap, dpa / min_block_size, len / min_block_size);
+    }
+
+    return CXL_MBOX_SUCCESS;
+}
+
+static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
+        const CXLUpdateDCExtentListInPl *in)
+{
+    uint32_t i;
+    CXLDCExtent *ent;
+    uint64_t dpa, len;
+    Range range1, range2;
+
+    for (i = 0; i < in->num_entries_updated; i++) {
+        dpa = in->updated_entries[i].start_dpa;
+        len = in->updated_entries[i].len;
+
+        range_init_nofail(&range1, dpa, len);
+
+        /*
+         * TODO: once the pending extent list is added, check against
+         * the list will be added here.
+         */
+
+        /* to-be-added range should not overlap with range already accepted */
+        QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
+            range_init_nofail(&range2, ent->start_dpa, ent->len);
+            if (range_overlaps_range(&range1, &range2)) {
+                return CXL_MBOX_INVALID_PA;
+            }
+        }
+    }
+    return CXL_MBOX_SUCCESS;
+}
+
+/*
+ * CXL r3.1 section 8.2.9.9.9.3: Add Dynamic Capacity Response (Opcode 4802h)
+ * An extent is added to the extent list and becomes usable only after the
+ * response is processed successfully
+ */
+static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
+                                          uint8_t *payload_in,
+                                          size_t len_in,
+                                          uint8_t *payload_out,
+                                          size_t *len_out,
+                                          CXLCCI *cci)
+{
+    CXLUpdateDCExtentListInPl *in = (void *)payload_in;
+    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
+    CXLDCExtentList *extent_list = &ct3d->dc.extents;
+    uint32_t i;
+    uint64_t dpa, len;
+    CXLRetCode ret;
+
+    if (in->num_entries_updated == 0) {
+        return CXL_MBOX_SUCCESS;
+    }
+
+    /* Adding the extents would exceed the device's extent tracking limit. */
+    if (in->num_entries_updated + ct3d->dc.total_extent_count >
+        CXL_NUM_EXTENTS_SUPPORTED) {
+        return CXL_MBOX_RESOURCES_EXHAUSTED;
+    }
+
+    ret = cxl_detect_malformed_extent_list(ct3d, in);
+    if (ret != CXL_MBOX_SUCCESS) {
+        return ret;
+    }
+
+    ret = cxl_dcd_add_dyn_cap_rsp_dry_run(ct3d, in);
+    if (ret != CXL_MBOX_SUCCESS) {
+        return ret;
+    }
+
+    for (i = 0; i < in->num_entries_updated; i++) {
+        dpa = in->updated_entries[i].start_dpa;
+        len = in->updated_entries[i].len;
+
+        cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
+        ct3d->dc.total_extent_count += 1;
+        /*
+         * TODO: we will add a pending extent list based on event log record
+         * and process the list according here.
+         */
+    }
+
+    return CXL_MBOX_SUCCESS;
+}
+
+/*
+ * Copy extent list from src to dst
+ * Return value: number of extents copied
+ */
+static uint32_t copy_extent_list(CXLDCExtentList *dst,
+                                 const CXLDCExtentList *src)
+{
+    uint32_t cnt = 0;
+    CXLDCExtent *ent;
+
+    if (!dst || !src) {
+        return 0;
+    }
+
+    QTAILQ_FOREACH(ent, src, node) {
+        cxl_insert_extent_to_extent_list(dst, ent->start_dpa, ent->len,
+                                         ent->tag, ent->shared_seq);
+        cnt++;
+    }
+    return cnt;
+}
+
+static CXLRetCode cxl_dc_extent_release_dry_run(CXLType3Dev *ct3d,
+        const CXLUpdateDCExtentListInPl *in)
+{
+    CXLDCExtent *ent, *ent_next;
+    uint64_t dpa, len;
+    uint32_t i;
+    int cnt_delta = 0;
+    CXLDCExtentList tmp_list;
+    CXLRetCode ret = CXL_MBOX_SUCCESS;
+
+    if (in->num_entries_updated == 0) {
+        return CXL_MBOX_INVALID_INPUT;
+    }
+
+    QTAILQ_INIT(&tmp_list);
+    copy_extent_list(&tmp_list, &ct3d->dc.extents);
+
+    for (i = 0; i < in->num_entries_updated; i++) {
+        Range range;
+
+        dpa = in->updated_entries[i].start_dpa;
+        len = in->updated_entries[i].len;
+
+        while (len > 0) {
+            QTAILQ_FOREACH(ent, &tmp_list, node) {
+                range_init_nofail(&range, ent->start_dpa, ent->len);
+
+                if (range_contains(&range, dpa)) {
+                    uint64_t len1, len2, len_done = 0;
+                    uint64_t ent_start_dpa = ent->start_dpa;
+                    uint64_t ent_len = ent->len;
+                    /*
+                     * Found the exact extent or the subset of an existing
+                     * extent.
+                     */
+                    if (range_contains(&range, dpa + len - 1)) {
+                        len1 = dpa - ent->start_dpa;
+                        len2 = ent_start_dpa + ent_len - dpa - len;
+                        len_done = ent_len - len1 - len2;
+
+                        cxl_remove_extent_from_extent_list(&tmp_list, ent);
+                        cnt_delta--;
+
+                        if (len1) {
+                            cxl_insert_extent_to_extent_list(&tmp_list,
+                                                             ent_start_dpa,
+                                                             len1, NULL, 0);
+                            cnt_delta++;
+                        }
+                        if (len2) {
+                            cxl_insert_extent_to_extent_list(&tmp_list,
+                                                             dpa + len,
+                                                             len2, NULL, 0);
+                            cnt_delta++;
+                        }
+
+                        if (cnt_delta + ct3d->dc.total_extent_count >
+                            CXL_NUM_EXTENTS_SUPPORTED) {
+                            ret = CXL_MBOX_RESOURCES_EXHAUSTED;
+                            goto free_and_exit;
+                        }
+                    } else {
+                        /*
+                         * TODO: we reject the attempt to remove an extent
+                         * that overlaps with multiple extents in the device
+                         * for now, we will allow it once superset release
+                         * support is added.
+                         */
+                        ret = CXL_MBOX_INVALID_PA;
+                        goto free_and_exit;
+                    }
+
+                    len -= len_done;
+                    /* len == 0 here until superset release is added */
+                    break;
+                }
+            }
+            if (len) {
+                ret = CXL_MBOX_INVALID_PA;
+                goto free_and_exit;
+            }
+        }
+    }
+free_and_exit:
+    QTAILQ_FOREACH_SAFE(ent, &tmp_list, node, ent_next) {
+        cxl_remove_extent_from_extent_list(&tmp_list, ent);
+    }
+
+    return ret;
+}
+
+/*
+ * CXL r3.1 section 8.2.9.9.9.4: Release Dynamic Capacity (Opcode 4803h)
+ */
+static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
+                                          uint8_t *payload_in,
+                                          size_t len_in,
+                                          uint8_t *payload_out,
+                                          size_t *len_out,
+                                          CXLCCI *cci)
+{
+    CXLUpdateDCExtentListInPl *in = (void *)payload_in;
+    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
+    CXLDCExtentList *extent_list = &ct3d->dc.extents;
+    CXLDCExtent *ent;
+    uint32_t i;
+    uint64_t dpa, len;
+    CXLRetCode ret;
+
+    if (in->num_entries_updated == 0) {
+        return CXL_MBOX_INVALID_INPUT;
+    }
+
+    ret = cxl_detect_malformed_extent_list(ct3d, in);
+    if (ret != CXL_MBOX_SUCCESS) {
+        return ret;
+    }
+
+    ret = cxl_dc_extent_release_dry_run(ct3d, in);
+    if (ret != CXL_MBOX_SUCCESS) {
+        return ret;
+    }
+
+    /* From this point, all the extents to release are valid */
+    for (i = 0; i < in->num_entries_updated; i++) {
+        Range range;
+
+        dpa = in->updated_entries[i].start_dpa;
+        len = in->updated_entries[i].len;
+
+        while (len > 0) {
+            QTAILQ_FOREACH(ent, extent_list, node) {
+                range_init_nofail(&range, ent->start_dpa, ent->len);
+
+                /* Found an extent overlapping with the range to release */
+                if (range_contains(&range, dpa)) {
+                    uint64_t len1, len2 = 0, len_done = 0;
+                    uint64_t ent_start_dpa = ent->start_dpa;
+                    uint64_t ent_len = ent->len;
+
+                    len1 = dpa - ent_start_dpa;
+                    if (range_contains(&range, dpa + len - 1)) {
+                        len2 = ent_start_dpa + ent_len - dpa - len;
+                    }
+                    len_done = ent_len - len1 - len2;
+
+                    cxl_remove_extent_from_extent_list(extent_list, ent);
+                    ct3d->dc.total_extent_count -= 1;
+
+                    if (len1) {
+                        cxl_insert_extent_to_extent_list(extent_list,
+                                                         ent_start_dpa,
+                                                         len1, NULL, 0);
+                        ct3d->dc.total_extent_count += 1;
+                    }
+                    if (len2) {
+                        cxl_insert_extent_to_extent_list(extent_list,
+                                                         dpa + len,
+                                                         len2, NULL, 0);
+                        ct3d->dc.total_extent_count += 1;
+                    }
+
+                    len -= len_done;
+                    /*
+                     * len will always be 0 until superset release is added.
+                     * TODO: superset release will be added.
+                     */
+                    break;
+                }
+            }
+        }
+    }
+    return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_DATA_CHANGE (1 << 2)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -1413,15 +1832,15 @@ static const struct cxl_cmd cxl_cmd_set[256][256] = {
     [EVENTS][CLEAR_RECORDS] = { "EVENTS_CLEAR_RECORDS",
         cmd_events_clear_records, ~0, IMMEDIATE_LOG_CHANGE },
     [EVENTS][GET_INTERRUPT_POLICY] = { "EVENTS_GET_INTERRUPT_POLICY",
-                                      cmd_events_get_interrupt_policy, 0, 0 },
+        cmd_events_get_interrupt_policy, 0, 0 },
     [EVENTS][SET_INTERRUPT_POLICY] = { "EVENTS_SET_INTERRUPT_POLICY",
-                                      cmd_events_set_interrupt_policy,
-                                      ~0, IMMEDIATE_CONFIG_CHANGE },
+        cmd_events_set_interrupt_policy,
+        ~0, IMMEDIATE_CONFIG_CHANGE },
     [FIRMWARE_UPDATE][GET_INFO] = { "FIRMWARE_UPDATE_GET_INFO",
         cmd_firmware_update_get_info, 0, 0 },
     [TIMESTAMP][GET] = { "TIMESTAMP_GET", cmd_timestamp_get, 0, 0 },
     [TIMESTAMP][SET] = { "TIMESTAMP_SET", cmd_timestamp_set,
-                         8, IMMEDIATE_POLICY_CHANGE },
+        8, IMMEDIATE_POLICY_CHANGE },
     [LOGS][GET_SUPPORTED] = { "LOGS_GET_SUPPORTED", cmd_logs_get_supported,
                               0, 0 },
     [LOGS][GET_LOG] = { "LOGS_GET_LOG", cmd_logs_get_log, 0x18, 0 },
@@ -1450,6 +1869,12 @@ static const struct cxl_cmd cxl_cmd_set_dcd[256][256] = {
     [DCD_CONFIG][GET_DYN_CAP_EXT_LIST] = {
         "DCD_GET_DYNAMIC_CAPACITY_EXTENT_LIST", cmd_dcd_get_dyn_cap_ext_list,
         8, 0 },
+    [DCD_CONFIG][ADD_DYN_CAP_RSP] = {
+        "DCD_ADD_DYNAMIC_CAPACITY_RESPONSE", cmd_dcd_add_dyn_cap_rsp,
+        ~0, IMMEDIATE_DATA_CHANGE },
+    [DCD_CONFIG][RELEASE_DYN_CAP] = {
+        "DCD_RELEASE_DYNAMIC_CAPACITY", cmd_dcd_release_dyn_cap,
+        ~0, IMMEDIATE_DATA_CHANGE },
 };
 
 static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 5be3c904ba..951bd79a82 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -678,6 +678,15 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
     return true;
 }
 
+static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
+{
+    CXLDCExtent *ent, *ent_next;
+
+    QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
+        cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
+    }
+}
+
 static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
 {
     DeviceState *ds = DEVICE(ct3d);
@@ -874,6 +883,7 @@ err_free_special_ops:
     g_free(regs->special_ops);
 err_address_space_free:
     if (ct3d->dc.host_dc) {
+        cxl_destroy_dc_regions(ct3d);
         address_space_destroy(&ct3d->dc.host_dc_as);
     }
     if (ct3d->hostpmem) {
@@ -895,6 +905,7 @@ static void ct3_exit(PCIDevice *pci_dev)
     cxl_doe_cdat_release(cxl_cstate);
     g_free(regs->special_ops);
     if (ct3d->dc.host_dc) {
+        cxl_destroy_dc_regions(ct3d);
         address_space_destroy(&ct3d->dc.host_dc_as);
     }
     if (ct3d->hostpmem) {
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 6aec6ac983..df3511e91b 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -551,4 +551,8 @@ void cxl_event_irq_assert(CXLType3Dev *ct3d);
 
 void cxl_set_poison_list_overflowed(CXLType3Dev *ct3d);
 
+CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
+
+void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
+                                        CXLDCExtent *extent);
 #endif
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v6 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2024-03-25 19:02 [PATCH v6 00/12] Enabling DCD emulation support in Qemu nifan.cxl
                   ` (7 preceding siblings ...)
  2024-03-25 19:02 ` [PATCH v6 08/12] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response nifan.cxl
@ 2024-03-25 19:02 ` nifan.cxl
  2024-04-03 18:16   ` Gregory Price
  2024-04-05 12:18     ` Jonathan Cameron via
  2024-03-25 19:02 ` [PATCH v6 10/12] hw/mem/cxl_type3: Add dpa range validation for accesses to DC regions nifan.cxl
                   ` (2 subsequent siblings)
  11 siblings, 2 replies; 65+ messages in thread
From: nifan.cxl @ 2024-03-25 19:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni

From: Fan Ni <fan.ni@samsung.com>

To simulate FM functionalities for initiating Dynamic Capacity Add
(Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
add/release dynamic capacity extents requests.

With this change, an extent can be released only when its DPA range is
contained within a single accepted extent in the device. That is to say,
extent superset release is not supported yet.

1. Add dynamic capacity extents:

For example, the command to add two contiguous extents (each 128MiB long)
to region 0 (starting at DPA offset 0) looks like this:

{ "execute": "qmp_capabilities" }

{ "execute": "cxl-add-dynamic-capacity",
  "arguments": {
      "path": "/machine/peripheral/cxl-dcd0",
      "region-id": 0,
      "extents": [
      {
          "offset": 0,
          "len": 134217728
      },
      {
          "offset": 134217728,
          "len": 134217728
      }
      ]
  }
}

2. Release dynamic capacity extents:

For example, the command to release an extent of size 128MiB from region 0
(DPA offset 128MiB) looks like this:

{ "execute": "cxl-release-dynamic-capacity",
  "arguments": {
      "path": "/machine/peripheral/cxl-dcd0",
      "region-id": 0,
      "extents": [
      {
          "offset": 134217728,
          "len": 134217728
      }
      ]
  }
}

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  |  26 ++--
 hw/mem/cxl_type3.c          | 252 +++++++++++++++++++++++++++++++++++-
 hw/mem/cxl_type3_stubs.c    |  14 ++
 include/hw/cxl/cxl_device.h |   8 ++
 include/hw/cxl/cxl_events.h |  18 +++
 qapi/cxl.json               |  61 ++++++++-
 6 files changed, 367 insertions(+), 12 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index a9eca516c8..7094e007b9 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -1407,7 +1407,7 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
  * Check whether any bit between addr[nr, nr+size) is set,
  * return true if any bit is set, otherwise return false
  */
-static bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
+bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
                               unsigned long size)
 {
     unsigned long res = find_next_bit(addr, size + nr, nr);
@@ -1446,7 +1446,7 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
     return NULL;
 }
 
-static void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
+void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
                                              uint64_t dpa,
                                              uint64_t len,
                                              uint8_t *tag,
@@ -1552,10 +1552,11 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
 
         range_init_nofail(&range1, dpa, len);
 
-        /*
-         * TODO: once the pending extent list is added, check against
-         * the list will be added here.
-         */
+        /* host-accepted DPA range must be contained by pending extent */
+        if (!cxl_extents_contains_dpa_range(&ct3d->dc.extents_pending,
+                                            dpa, len)) {
+            return CXL_MBOX_INVALID_PA;
+        }
 
         /* to-be-added range should not overlap with range already accepted */
         QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
@@ -1585,9 +1586,13 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
     CXLDCExtentList *extent_list = &ct3d->dc.extents;
     uint32_t i;
     uint64_t dpa, len;
+    CXLDCExtent *ent;
     CXLRetCode ret;
 
     if (in->num_entries_updated == 0) {
+        /* Always remove the first pending extent when response received. */
+        ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
+        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);
         return CXL_MBOX_SUCCESS;
     }
 
@@ -1604,6 +1609,8 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
 
     ret = cxl_dcd_add_dyn_cap_rsp_dry_run(ct3d, in);
     if (ret != CXL_MBOX_SUCCESS) {
+        ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
+        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);
         return ret;
     }
 
@@ -1613,10 +1620,9 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
 
         cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
         ct3d->dc.total_extent_count += 1;
-        /*
-         * TODO: we will add a pending extent list based on event log record
-         * and process the list according here.
-         */
+
+        ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
+        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);
     }
 
     return CXL_MBOX_SUCCESS;
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 951bd79a82..74cb64e843 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -674,6 +674,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
         ct3d->dc.total_capacity += region->len;
     }
     QTAILQ_INIT(&ct3d->dc.extents);
+    QTAILQ_INIT(&ct3d->dc.extents_pending);
 
     return true;
 }
@@ -685,6 +686,10 @@ static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
     QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
         cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
     }
+    QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents_pending, node, ent_next) {
+        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending,
+                                           ent);
+    }
 }
 
 static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
@@ -1449,7 +1454,8 @@ static int ct3d_qmp_cxl_event_log_enc(CxlEventLog log)
         return CXL_EVENT_TYPE_FAIL;
     case CXL_EVENT_LOG_FATAL:
         return CXL_EVENT_TYPE_FATAL;
-/* DCD not yet supported */
+    case CXL_EVENT_LOG_DYNCAP:
+        return CXL_EVENT_TYPE_DYNAMIC_CAP;
     default:
         return -EINVAL;
     }
@@ -1700,6 +1706,250 @@ void qmp_cxl_inject_memory_module_event(const char *path, CxlEventLog log,
     }
 }
 
+/* CXL r3.1 Table 8-50: Dynamic Capacity Event Record */
+static const QemuUUID dynamic_capacity_uuid = {
+    .data = UUID(0xca95afa7, 0xf183, 0x4018, 0x8c, 0x2f,
+                 0x95, 0x26, 0x8e, 0x10, 0x1a, 0x2a),
+};
+
+typedef enum CXLDCEventType {
+    DC_EVENT_ADD_CAPACITY = 0x0,
+    DC_EVENT_RELEASE_CAPACITY = 0x1,
+    DC_EVENT_FORCED_RELEASE_CAPACITY = 0x2,
+    DC_EVENT_REGION_CONFIG_UPDATED = 0x3,
+    DC_EVENT_ADD_CAPACITY_RSP = 0x4,
+    DC_EVENT_CAPACITY_RELEASED = 0x5,
+} CXLDCEventType;
+
+/*
+ * Check whether the range [dpa, dpa + len - 1] overlaps with any extent in
+ * the list.
+ * Return value: return true if there is an overlap; otherwise, return false
+ */
+static bool cxl_extents_overlaps_dpa_range(CXLDCExtentList *list,
+                                           uint64_t dpa, uint64_t len)
+{
+    CXLDCExtent *ent;
+    Range range1, range2;
+
+    if (!list) {
+        return false;
+    }
+
+    range_init_nofail(&range1, dpa, len);
+    QTAILQ_FOREACH(ent, list, node) {
+        range_init_nofail(&range2, ent->start_dpa, ent->len);
+        if (range_overlaps_range(&range1, &range2)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+/*
+ * Check whether the range [dpa, dpa + len - 1] is contained by extents in
+ * the list.
+ * Multi-extent containment will be checked once superset release is added.
+ * Return value: return true if the range is contained; otherwise, return false
+ */
+bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
+                                    uint64_t dpa, uint64_t len)
+{
+    CXLDCExtent *ent;
+    Range range1, range2;
+
+    if (!list) {
+        return false;
+    }
+
+    range_init_nofail(&range1, dpa, len);
+    QTAILQ_FOREACH(ent, list, node) {
+        range_init_nofail(&range2, ent->start_dpa, ent->len);
+        if (range_contains_range(&range2, &range1)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+/*
+ * The main function to process dynamic capacity event. Currently DC extents
+ * add/release requests are processed.
+ */
+static void qmp_cxl_process_dynamic_capacity(const char *path, CxlEventLog log,
+                                             CXLDCEventType type, uint16_t hid,
+                                             uint8_t rid,
+                                             CXLDCExtentRecordList *records,
+                                             Error **errp)
+{
+    Object *obj;
+    CXLEventDynamicCapacity dCap = {};
+    CXLEventRecordHdr *hdr = &dCap.hdr;
+    CXLType3Dev *dcd;
+    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
+    uint32_t num_extents = 0;
+    CXLDCExtentRecordList *list;
+    g_autofree CXLDCExtentRaw *extents = NULL;
+    uint8_t enc_log;
+    uint64_t dpa, offset, len, block_size;
+    int i, rc;
+    g_autofree unsigned long *blk_bitmap = NULL;
+
+    obj = object_resolve_path_type(path, TYPE_CXL_TYPE3, NULL);
+    if (!obj) {
+        error_setg(errp, "Unable to resolve CXL type 3 device");
+        return;
+    }
+
+    dcd = CXL_TYPE3(obj);
+    if (!dcd->dc.num_regions) {
+        error_setg(errp, "No dynamic capacity support from the device");
+        return;
+    }
+
+    rc = ct3d_qmp_cxl_event_log_enc(log);
+    if (rc < 0) {
+        error_setg(errp, "Unhandled error log type");
+        return;
+    }
+    enc_log = rc;
+
+    if (rid >= dcd->dc.num_regions) {
+        error_setg(errp, "region id is too large");
+        return;
+    }
+    block_size = dcd->dc.regions[rid].block_size;
+    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
+
+    /* Sanity check and count the extents */
+    list = records;
+    while (list) {
+        offset = list->value->offset;
+        len = list->value->len;
+        dpa = offset + dcd->dc.regions[rid].base;
+
+        if (len == 0) {
+            error_setg(errp, "extent with 0 length is not allowed");
+            return;
+        }
+
+        if (offset % block_size || len % block_size) {
+            error_setg(errp, "dpa or len is not aligned to region block size");
+            return;
+        }
+
+        if (offset + len > dcd->dc.regions[rid].len) {
+            error_setg(errp, "extent range is beyond the region end");
+            return;
+        }
+
+        /* No duplicate or overlapped extents are allowed */
+        if (test_any_bits_set(blk_bitmap, offset / block_size,
+                              len / block_size)) {
+            error_setg(errp, "duplicate or overlapped extents are detected");
+            return;
+        }
+        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
+
+        num_extents++;
+        if (type == DC_EVENT_RELEASE_CAPACITY) {
+            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents_pending,
+                                               dpa, len)) {
+                error_setg(errp,
+                           "cannot release extent with pending DPA range");
+                return;
+            }
+            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents,
+                                                dpa, len)) {
+                error_setg(errp,
+                           "cannot release extent with non-existing DPA range");
+                return;
+            }
+        }
+        list = list->next;
+    }
+    if (num_extents == 0) {
+        error_setg(errp, "no valid extents to send to process");
+        return;
+    }
+
+    /* Create extent list for event being passed to host */
+    i = 0;
+    list = records;
+    extents = g_new0(CXLDCExtentRaw, num_extents);
+    while (list) {
+        offset = list->value->offset;
+        len = list->value->len;
+        dpa = dcd->dc.regions[rid].base + offset;
+
+        extents[i].start_dpa = dpa;
+        extents[i].len = len;
+        memset(extents[i].tag, 0, 0x10);
+        extents[i].shared_seq = 0;
+        list = list->next;
+        i++;
+    }
+
+    /*
+     * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
+     *
+     * All Dynamic Capacity event records shall set the Event Record Severity
+     * field in the Common Event Record Format to Informational Event. All
+     * Dynamic Capacity related events shall be logged in the Dynamic Capacity
+     * Event Log.
+     */
+    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
+                            cxl_device_get_timestamp(&dcd->cxl_dstate));
+
+    dCap.type = type;
+    /* FIXME: for now, validity flag is cleared */
+    dCap.validity_flags = 0;
+    stw_le_p(&dCap.host_id, hid);
+    /* only valid for DC_REGION_CONFIG_UPDATED event */
+    dCap.updated_region_id = 0;
+    /*
+     * FIXME: for now, the "More" flag is cleared as there is only one
+     * extent associating with each record and tag-based release is
+     * not supported.
+     */
+    dCap.flags = 0;
+    for (i = 0; i < num_extents; i++) {
+        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
+               sizeof(CXLDCExtentRaw));
+
+        if (type == DC_EVENT_ADD_CAPACITY) {
+            cxl_insert_extent_to_extent_list(&dcd->dc.extents_pending,
+                                             extents[i].start_dpa,
+                                             extents[i].len,
+                                             extents[i].tag,
+                                             extents[i].shared_seq);
+        }
+
+        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
+                             (CXLEventRecordRaw *)&dCap)) {
+            cxl_event_irq_assert(dcd);
+        }
+    }
+}
+
+void qmp_cxl_add_dynamic_capacity(const char *path, uint8_t region_id,
+                                  CXLDCExtentRecordList  *records,
+                                  Error **errp)
+{
+   qmp_cxl_process_dynamic_capacity(path, CXL_EVENT_LOG_DYNCAP,
+                                    DC_EVENT_ADD_CAPACITY, 0,
+                                    region_id, records, errp);
+}
+
+void qmp_cxl_release_dynamic_capacity(const char *path, uint8_t region_id,
+                                      CXLDCExtentRecordList  *records,
+                                      Error **errp)
+{
+    qmp_cxl_process_dynamic_capacity(path, CXL_EVENT_LOG_DYNCAP,
+                                     DC_EVENT_RELEASE_CAPACITY, 0,
+                                     region_id, records, errp);
+}
+
 static void ct3_class_init(ObjectClass *oc, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(oc);
diff --git a/hw/mem/cxl_type3_stubs.c b/hw/mem/cxl_type3_stubs.c
index 3e1851e32b..d913b11b4d 100644
--- a/hw/mem/cxl_type3_stubs.c
+++ b/hw/mem/cxl_type3_stubs.c
@@ -67,3 +67,17 @@ void qmp_cxl_inject_correctable_error(const char *path, CxlCorErrorType type,
 {
     error_setg(errp, "CXL Type 3 support is not compiled in");
 }
+
+void qmp_cxl_add_dynamic_capacity(const char *path, uint8_t region_id,
+                                  CXLDCExtentRecordList  *records,
+                                  Error **errp)
+{
+    error_setg(errp, "CXL Type 3 support is not compiled in");
+}
+
+void qmp_cxl_release_dynamic_capacity(const char *path, uint8_t region_id,
+                                      CXLDCExtentRecordList  *records,
+                                      Error **errp)
+{
+    error_setg(errp, "CXL Type 3 support is not compiled in");
+}
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index df3511e91b..b84063d9f4 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -494,6 +494,7 @@ struct CXLType3Dev {
          */
         uint64_t total_capacity; /* 256M aligned */
         CXLDCExtentList extents;
+        CXLDCExtentList extents_pending;
         uint32_t total_extent_count;
         uint32_t ext_list_gen_seq;
 
@@ -555,4 +556,11 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
 
 void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
                                         CXLDCExtent *extent);
+void cxl_insert_extent_to_extent_list(CXLDCExtentList *list, uint64_t dpa,
+                                      uint64_t len, uint8_t *tag,
+                                      uint16_t shared_seq);
+bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
+                       unsigned long size);
+bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
+                                    uint64_t dpa, uint64_t len);
 #endif
diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
index 5170b8dbf8..38cadaa0f3 100644
--- a/include/hw/cxl/cxl_events.h
+++ b/include/hw/cxl/cxl_events.h
@@ -166,4 +166,22 @@ typedef struct CXLEventMemoryModule {
     uint8_t reserved[0x3d];
 } QEMU_PACKED CXLEventMemoryModule;
 
+/*
+ * CXL r3.1 Table 8-50: Dynamic Capacity Event Record
+ * All fields little endian.
+ */
+typedef struct CXLEventDynamicCapacity {
+    CXLEventRecordHdr hdr;
+    uint8_t type;
+    uint8_t validity_flags;
+    uint16_t host_id;
+    uint8_t updated_region_id;
+    uint8_t flags;
+    uint8_t reserved2[2];
+    uint8_t dynamic_capacity_extent[0x28]; /* defined in cxl_device.h */
+    uint8_t reserved[0x18];
+    uint32_t extents_avail;
+    uint32_t tags_avail;
+} QEMU_PACKED CXLEventDynamicCapacity;
+
 #endif /* CXL_EVENTS_H */
diff --git a/qapi/cxl.json b/qapi/cxl.json
index 8cc4c72fa9..2645004666 100644
--- a/qapi/cxl.json
+++ b/qapi/cxl.json
@@ -19,13 +19,16 @@
 #
 # @fatal: Fatal Event Log
 #
+# @dyncap: Dynamic Capacity Event Log
+#
 # Since: 8.1
 ##
 { 'enum': 'CxlEventLog',
   'data': ['informational',
            'warning',
            'failure',
-           'fatal']
+           'fatal',
+           'dyncap']
  }
 
 ##
@@ -361,3 +364,59 @@
 ##
 {'command': 'cxl-inject-correctable-error',
  'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
+
+##
+# @CXLDCExtentRecord:
+#
+# Record of a single extent to add/release
+#
+# @offset: offset of the extent, relative to the start of the region
+# @len: length of the extent
+#
+# Since: 9.0
+##
+{ 'struct': 'CXLDCExtentRecord',
+  'data': {
+      'offset':'uint64',
+      'len': 'uint64'
+  }
+}
+
+##
+# @cxl-add-dynamic-capacity:
+#
+# Command to start the add dynamic capacity extents flow. The host will
+# have to acknowledge the acceptance of the extents before they are usable.
+#
+# @path: CXL DCD canonical QOM path
+# @region-id: id of the region to which the extents are added
+# @extents: Extents to add
+#
+# Since: 9.0
+##
+{ 'command': 'cxl-add-dynamic-capacity',
+  'data': { 'path': 'str',
+            'region-id': 'uint8',
+            'extents': [ 'CXLDCExtentRecord' ]
+           }
+}
+
+##
+# @cxl-release-dynamic-capacity:
+#
+# Command to start the release dynamic capacity extents flow. The host will
+# need to respond to indicate that it has released the capacity before it
+# is made unavailable for read and write and can be re-added.
+#
+# @path: CXL DCD canonical QOM path
+# @region-id: id of the region from which the extents are released
+# @extents: Extents to release
+#
+# Since: 9.0
+##
+{ 'command': 'cxl-release-dynamic-capacity',
+  'data': { 'path': 'str',
+            'region-id': 'uint8',
+            'extents': [ 'CXLDCExtentRecord' ]
+           }
+}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v6 10/12] hw/mem/cxl_type3: Add dpa range validation for accesses to DC regions
  2024-03-25 19:02 [PATCH v6 00/12] Enabling DCD emulation support in Qemu nifan.cxl
                   ` (8 preceding siblings ...)
  2024-03-25 19:02 ` [PATCH v6 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents nifan.cxl
@ 2024-03-25 19:02 ` nifan.cxl
  2024-04-05 12:29     ` Jonathan Cameron via
  2024-04-12 22:54   ` Gregory Price
  2024-03-25 19:02 ` [PATCH v6 11/12] hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support nifan.cxl
  2024-03-25 19:02 ` [PATCH v6 12/12] hw/mem/cxl_type3: Allow to release extent superset in QMP interface nifan.cxl
  11 siblings, 2 replies; 65+ messages in thread
From: nifan.cxl @ 2024-03-25 19:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni, Jonathan Cameron

From: Fan Ni <fan.ni@samsung.com>

All DPA ranges in the DC regions are invalid to access until an extent
covering the range has been added. Add a bitmap for each region to
record whether a DC block in the region has been backed by a DC extent.
Each bit in the bitmap represents a DC block. When a DC extent is added,
all the bits of the blocks covered by the extent are set; they are
cleared again when the extent is released.
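
For instance, assuming a region configured with a 2MiB block size, an
extent accepted at region offset 128MiB with length 128MiB sets bits 64
through 127 of that region's bitmap (128MiB / 2MiB = 64 blocks, starting
at block 64). Accesses to the DC address space whose DPA falls in a block
whose bit is clear are rejected.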

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  |  6 +++
 hw/mem/cxl_type3.c          | 76 +++++++++++++++++++++++++++++++++++++
 include/hw/cxl/cxl_device.h |  7 ++++
 3 files changed, 89 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 7094e007b9..a0d2239176 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -1620,6 +1620,7 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
 
         cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
         ct3d->dc.total_extent_count += 1;
+        ct3_set_region_block_backed(ct3d, dpa, len);
 
         ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
         cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);
@@ -1798,18 +1799,23 @@ static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
 
                     cxl_remove_extent_from_extent_list(extent_list, ent);
                     ct3d->dc.total_extent_count -= 1;
+                    ct3_clear_region_block_backed(ct3d, ent_start_dpa,
+                                                  ent_len);
 
                     if (len1) {
                         cxl_insert_extent_to_extent_list(extent_list,
                                                          ent_start_dpa,
                                                          len1, NULL, 0);
                         ct3d->dc.total_extent_count += 1;
+                        ct3_set_region_block_backed(ct3d, ent_start_dpa,
+                                                    len1);
                     }
                     if (len2) {
                         cxl_insert_extent_to_extent_list(extent_list,
                                                          dpa + len,
                                                          len2, NULL, 0);
                         ct3d->dc.total_extent_count += 1;
+                        ct3_set_region_block_backed(ct3d, dpa + len, len2);
                     }
 
                     len -= len_done;
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 74cb64e843..2628a6f50f 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -672,6 +672,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
             .flags = 0,
         };
         ct3d->dc.total_capacity += region->len;
+        region->blk_bitmap = bitmap_new(region->len / region->block_size);
     }
     QTAILQ_INIT(&ct3d->dc.extents);
     QTAILQ_INIT(&ct3d->dc.extents_pending);
@@ -682,6 +683,8 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
 static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
 {
     CXLDCExtent *ent, *ent_next;
+    int i;
+    CXLDCRegion *region;
 
     QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
         cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
@@ -690,6 +693,11 @@ static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
         cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending,
                                            ent);
     }
+
+    for (i = 0; i < ct3d->dc.num_regions; i++) {
+        region = &ct3d->dc.regions[i];
+        g_free(region->blk_bitmap);
+    }
 }
 
 static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
@@ -921,6 +929,70 @@ static void ct3_exit(PCIDevice *pci_dev)
     }
 }
 
+/*
+ * Mark the DPA range [dpa, dpa + len - 1] to be backed and accessible. This
+ * happens when a DC extent is added and accepted by the host.
+ */
+void ct3_set_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+                                 uint64_t len)
+{
+    CXLDCRegion *region;
+
+    region = cxl_find_dc_region(ct3d, dpa, len);
+    if (!region) {
+        return;
+    }
+
+    bitmap_set(region->blk_bitmap, (dpa - region->base) / region->block_size,
+               len / region->block_size);
+}
+
+/*
+ * Check whether the DPA range [dpa, dpa + len - 1] is backed with DC extents.
+ * Used when validating reads/writes to DC regions.
+ */
+bool ct3_test_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+                                  uint64_t len)
+{
+    CXLDCRegion *region;
+    uint64_t nbits;
+    long nr;
+
+    region = cxl_find_dc_region(ct3d, dpa, len);
+    if (!region) {
+        return false;
+    }
+
+    nr = (dpa - region->base) / region->block_size;
+    nbits = DIV_ROUND_UP(len, region->block_size);
+    /*
+     * if bits between [dpa, dpa + len) are all 1s, meaning the DPA range is
+     * backed with DC extents, return true; else return false.
+     */
+    return find_next_zero_bit(region->blk_bitmap, nr + nbits, nr) == nr + nbits;
+}
+
+/*
+ * Mark the DPA range [dpa, dpa + len - 1] to be unbacked and inaccessible.
+ * This happens when a DC extent is released by the host.
+ */
+void ct3_clear_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+                                   uint64_t len)
+{
+    CXLDCRegion *region;
+    uint64_t nbits;
+    long nr;
+
+    region = cxl_find_dc_region(ct3d, dpa, len);
+    if (!region) {
+        return;
+    }
+
+    nr = (dpa - region->base) / region->block_size;
+    nbits = len / region->block_size;
+    bitmap_clear(region->blk_bitmap, nr, nbits);
+}
+
 static bool cxl_type3_dpa(CXLType3Dev *ct3d, hwaddr host_addr, uint64_t *dpa)
 {
     int hdm_inc = R_CXL_HDM_DECODER1_BASE_LO - R_CXL_HDM_DECODER0_BASE_LO;
@@ -1025,6 +1097,10 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
         *as = &ct3d->hostpmem_as;
         *dpa_offset -= vmr_size;
     } else {
+        if (!ct3_test_region_block_backed(ct3d, *dpa_offset, size)) {
+            return -ENODEV;
+        }
+
         *as = &ct3d->dc.host_dc_as;
         *dpa_offset -= (vmr_size + pmr_size);
     }
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index b84063d9f4..bc90da2ca2 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -450,6 +450,7 @@ typedef struct CXLDCRegion {
     uint64_t block_size;
     uint32_t dsmadhandle;
     uint8_t flags;
+    unsigned long *blk_bitmap;
 } CXLDCRegion;
 
 struct CXLType3Dev {
@@ -563,4 +564,10 @@ bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
                        unsigned long size);
 bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
                                     uint64_t dpa, uint64_t len);
+void ct3_set_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+                                 uint64_t len);
+void ct3_clear_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+                                   uint64_t len);
+bool ct3_test_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+                                  uint64_t len);
 #endif
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v6 11/12] hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support
  2024-03-25 19:02 [PATCH v6 00/12] Enabling DCD emulation support in Qemu nifan.cxl
                   ` (9 preceding siblings ...)
  2024-03-25 19:02 ` [PATCH v6 10/12] hw/mem/cxl_type3: Add dpa range validation for accesses to DC regions nifan.cxl
@ 2024-03-25 19:02 ` nifan.cxl
  2024-04-05  9:57   ` Jørgen Hansen
  2024-04-05 12:32     ` Jonathan Cameron via
  2024-03-25 19:02 ` [PATCH v6 12/12] hw/mem/cxl_type3: Allow to release extent superset in QMP interface nifan.cxl
  11 siblings, 2 replies; 65+ messages in thread
From: nifan.cxl @ 2024-03-25 19:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni

From: Fan Ni <fan.ni@samsung.com>

With this change, we extend the extent release mailbox command processing
to allow a more flexible release: as long as the DPA range of the extent to
release is covered by accepted extent(s) in the device, the release can be
performed.
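
For example, if the device holds two accepted extents covering the
adjacent DPA ranges [A, A + 128MiB) and [A + 128MiB, A + 256MiB), a
request to release [A + 64MiB, A + 192MiB) now succeeds: the two extents
are trimmed to [A, A + 64MiB) and [A + 192MiB, A + 256MiB) respectively.
Before this change such a request was rejected with Invalid PA, since the
released range is not contained within a single accepted extent.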

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c | 41 ++++++++++++++++++++++----------------
 1 file changed, 24 insertions(+), 17 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index a0d2239176..3b7949c364 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -1674,6 +1674,12 @@ static CXLRetCode cxl_dc_extent_release_dry_run(CXLType3Dev *ct3d,
         dpa = in->updated_entries[i].start_dpa;
         len = in->updated_entries[i].len;
 
+        /* Check if the DPA range is not fully backed with valid extents */
+        if (!ct3_test_region_block_backed(ct3d, dpa, len)) {
+            ret = CXL_MBOX_INVALID_PA;
+            goto free_and_exit;
+        }
+        /* After this point, extent overflow is the only error that can happen */
         while (len > 0) {
             QTAILQ_FOREACH(ent, &tmp_list, node) {
                 range_init_nofail(&range, ent->start_dpa, ent->len);
@@ -1713,25 +1719,27 @@ static CXLRetCode cxl_dc_extent_release_dry_run(CXLType3Dev *ct3d,
                             goto free_and_exit;
                         }
                     } else {
-                        /*
-                         * TODO: we reject the attempt to remove an extent
-                         * that overlaps with multiple extents in the device
-                         * for now, we will allow it once superset release
-                         * support is added.
-                         */
-                        ret = CXL_MBOX_INVALID_PA;
-                        goto free_and_exit;
+                        len1 = dpa - ent_start_dpa;
+                        len2 = 0;
+                        len_done = ent_len - len1 - len2;
+
+                        cxl_remove_extent_from_extent_list(&tmp_list, ent);
+                        cnt_delta--;
+                        if (len1) {
+                            cxl_insert_extent_to_extent_list(&tmp_list,
+                                                             ent_start_dpa,
+                                                             len1, NULL, 0);
+                            cnt_delta++;
+                        }
                     }
 
                     len -= len_done;
-                    /* len == 0 here until superset release is added */
+                    if (len) {
+                        dpa = ent_start_dpa + ent_len;
+                    }
                     break;
                 }
             }
-            if (len) {
-                ret = CXL_MBOX_INVALID_PA;
-                goto free_and_exit;
-            }
         }
     }
 free_and_exit:
@@ -1819,10 +1827,9 @@ static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
                     }
 
                     len -= len_done;
-                    /*
-                     * len will always be 0 until superset release is added.
-                     * TODO: superset release will be added.
-                     */
+                    if (len > 0) {
+                        dpa = ent_start_dpa + ent_len;
+                    }
                     break;
                 }
             }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v6 12/12] hw/mem/cxl_type3: Allow to release extent superset in QMP interface
  2024-03-25 19:02 [PATCH v6 00/12] Enabling DCD emulation support in Qemu nifan.cxl
                   ` (10 preceding siblings ...)
  2024-03-25 19:02 ` [PATCH v6 11/12] hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support nifan.cxl
@ 2024-03-25 19:02 ` nifan.cxl
  2024-04-05 12:33     ` Jonathan Cameron via
  11 siblings, 1 reply; 65+ messages in thread
From: nifan.cxl @ 2024-03-25 19:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
	jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni

From: Fan Ni <fan.ni@samsung.com>

Before this change, the QMP interface used for adding/releasing DC extents
only allowed releasing an extent whose DPA range is contained within a
single accepted extent in the device.

With this change, we relax that constraint: as long as the DPA range of
the extent is covered by accepted extents, the release is allowed.

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/mem/cxl_type3.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 2628a6f50f..62c2022477 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -1935,8 +1935,7 @@ static void qmp_cxl_process_dynamic_capacity(const char *path, CxlEventLog log,
                            "cannot release extent with pending DPA range");
                 return;
             }
-            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents,
-                                                dpa, len)) {
+            if (!ct3_test_region_block_backed(dcd, dpa, len)) {
                 error_setg(errp,
                            "cannot release extent with non-existing DPA range");
                 return;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2024-03-25 19:02 ` [PATCH v6 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents nifan.cxl
@ 2024-04-03 18:16   ` Gregory Price
  2024-04-05 12:27       ` Jonathan Cameron via
  2024-04-05 12:18     ` Jonathan Cameron via
  1 sibling, 1 reply; 65+ messages in thread
From: Gregory Price @ 2024-04-03 18:16 UTC (permalink / raw)
  To: nifan.cxl
  Cc: qemu-devel, jonathan.cameron, linux-cxl, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	Jorgen.Hansen, wj28.lee, Fan Ni

On Mon, Mar 25, 2024 at 12:02:27PM -0700, nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
> 
> To simulate FM functionalities for initiating Dynamic Capacity Add
> (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> add/release dynamic capacity extents requests.
> 
... snip 
> +
> +/*
> + * The main function to process dynamic capacity event. Currently DC extents
> + * add/release requests are processed.
> + */
> +static void qmp_cxl_process_dynamic_capacity(const char *path, CxlEventLog log,
> +                                             CXLDCEventType type, uint16_t hid,
> +                                             uint8_t rid,
> +                                             CXLDCExtentRecordList *records,
> +                                             Error **errp)
> +{
... snip 
> +    /* Sanity check and count the extents */
> +    list = records;
> +    while (list) {
> +        offset = list->value->offset;
> +        len = list->value->len;
> +        dpa = offset + dcd->dc.regions[rid].base;
> +
> +        if (len == 0) {
> +            error_setg(errp, "extent with 0 length is not allowed");
> +            return;
> +        }
> +
> +        if (offset % block_size || len % block_size) {
> +            error_setg(errp, "dpa or len is not aligned to region block size");
> +            return;
> +        }
> +
> +        if (offset + len > dcd->dc.regions[rid].len) {
> +            error_setg(errp, "extent range is beyond the region end");
> +            return;
> +        }
> +
> +        /* No duplicate or overlapped extents are allowed */
> +        if (test_any_bits_set(blk_bitmap, offset / block_size,
> +                              len / block_size)) {
> +            error_setg(errp, "duplicate or overlapped extents are detected");
> +            return;
> +        }
> +        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> +
> +        num_extents++;

I think num_extents is always equal to the length of the list, otherwise
this code will return with error.

Nitpick:
This can be moved to the bottom w/ `list = list->next` to express that a
little more clearly.

> +        if (type == DC_EVENT_RELEASE_CAPACITY) {
> +            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents_pending,
> +                                               dpa, len)) {
> +                error_setg(errp,
> +                           "cannot release extent with pending DPA range");
> +                return;
> +            }
> +            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents,
> +                                                dpa, len)) {
> +                error_setg(errp,
> +                           "cannot release extent with non-existing DPA range");
> +                return;
> +            }
> +        }
> +        list = list->next;
> +    }
> +
> +    if (num_extents == 0) {

Since num_extents is always the length of the list, this is equivalent to
`if (!records)` prior to the while loop. Makes it a little more clear that:

1. There must be at least 1 extent
2. All extents must be valid for the command to be serviced.

> +        error_setg(errp, "no valid extents to send to process");
> +        return;
> +    }
> +

I'm looking at adding the MHD extensions around this point, e.g.:

/* If MHD cannot allocate requested extents, the cmd fails */
if (type == DC_EVENT_ADD_CAPACITY && dcd->mhd_dcd_extents_allocate &&
    num_extents != dcd->mhd_dcd_extents_allocate(...))
	return;

where mhd_dcd_extents_allocate checks the MHD block bitmap and tags
for correctness (shared // no double-allocations, etc). On success,
it guarantees proper ownership.

the release path would then be done in the release response path from
the host, as opposed to the release event injection.

Do you see any issues with that flow?

> +    /* Create extent list for event being passed to host */
> +    i = 0;
> +    list = records;
> +    extents = g_new0(CXLDCExtentRaw, num_extents);
> +    while (list) {
> +        offset = list->value->offset;
> +        len = list->value->len;
> +        dpa = dcd->dc.regions[rid].base + offset;
> +
> +        extents[i].start_dpa = dpa;
> +        extents[i].len = len;
> +        memset(extents[i].tag, 0, 0x10);
> +        extents[i].shared_seq = 0;
> +        list = list->next;
> +        i++;
> +    }
> +
> +    /*
> +     * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
> +     *
> +     * All Dynamic Capacity event records shall set the Event Record Severity
> +     * field in the Common Event Record Format to Informational Event. All
> +     * Dynamic Capacity related events shall be logged in the Dynamic Capacity
> +     * Event Log.
> +     */
> +    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
> +                            cxl_device_get_timestamp(&dcd->cxl_dstate));
> +
> +    dCap.type = type;
> +    /* FIXME: for now, validity flag is cleared */
> +    dCap.validity_flags = 0;
> +    stw_le_p(&dCap.host_id, hid);
> +    /* only valid for DC_REGION_CONFIG_UPDATED event */
> +    dCap.updated_region_id = 0;
> +    /*
> +     * FIXME: for now, the "More" flag is cleared as there is only one
> +     * extent associating with each record and tag-based release is
> +     * not supported.
> +     */
> +    dCap.flags = 0;
> +    for (i = 0; i < num_extents; i++) {
> +        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
> +               sizeof(CXLDCExtentRaw));
> +
> +        if (type == DC_EVENT_ADD_CAPACITY) {
> +            cxl_insert_extent_to_extent_list(&dcd->dc.extents_pending,
> +                                             extents[i].start_dpa,
> +                                             extents[i].len,
> +                                             extents[i].tag,
> +                                             extents[i].shared_seq);
> +        }
> +
> +        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
> +                             (CXLEventRecordRaw *)&dCap)) {

Pardon if I missed a prior discussion about this, but what happens to
pending events in the scenario where cxl_event_insert fails?

~Gregory

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 08/12] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response
  2024-03-25 19:02 ` [PATCH v6 08/12] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response nifan.cxl
@ 2024-04-04 13:32   ` Jørgen Hansen
  2024-04-05 11:12       ` Jonathan Cameron via
                       ` (3 more replies)
  2024-04-05 11:39     ` Jonathan Cameron via
  1 sibling, 4 replies; 65+ messages in thread
From: Jørgen Hansen @ 2024-04-04 13:32 UTC (permalink / raw)
  To: nifan.cxl, qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	wj28.lee, Fan Ni

On 3/25/24 20:02, nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
> 
> Per CXL spec 3.1, two mailbox commands are implemented:
> Add Dynamic Capacity Response (Opcode 4802h) 8.2.9.9.9.3, and
> Release Dynamic Capacity (Opcode 4803h) 8.2.9.9.9.4.
> 
> For the process of the above two commands, we use two-pass approach.
> Pass 1: Check whether the input payload is valid or not; if not, skip
>          Pass 2 and return mailbox process error.
> Pass 2: Do the real work--add or release extents, respectively.
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>   hw/cxl/cxl-mailbox-utils.c  | 433 +++++++++++++++++++++++++++++++++++-
>   hw/mem/cxl_type3.c          |  11 +
>   include/hw/cxl/cxl_device.h |   4 +
>   3 files changed, 444 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 30ef46a036..a9eca516c8 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -19,6 +19,7 @@
>   #include "qemu/units.h"
>   #include "qemu/uuid.h"
>   #include "sysemu/hostmem.h"
> +#include "qemu/range.h"
> 
>   #define CXL_CAPACITY_MULTIPLIER   (256 * MiB)
>   #define CXL_DC_EVENT_LOG_SIZE 8
> @@ -85,6 +86,8 @@ enum {
>       DCD_CONFIG  = 0x48,
>           #define GET_DC_CONFIG          0x0
>           #define GET_DYN_CAP_EXT_LIST   0x1
> +        #define ADD_DYN_CAP_RSP        0x2
> +        #define RELEASE_DYN_CAP        0x3
>       PHYSICAL_SWITCH = 0x51,
>           #define IDENTIFY_SWITCH_DEVICE      0x0
>           #define GET_PHYSICAL_PORT_STATE     0x1
> @@ -1400,6 +1403,422 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
>       return CXL_MBOX_SUCCESS;
>   }
> 
> +/*
> + * Check whether any bit between addr[nr, nr+size) is set,
> + * return true if any bit is set, otherwise return false
> + */
> +static bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
> +                              unsigned long size)
> +{
> +    unsigned long res = find_next_bit(addr, size + nr, nr);
> +
> +    return res < nr + size;
> +}
> +
> +CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
> +{
> +    int i;
> +    CXLDCRegion *region = &ct3d->dc.regions[0];
> +
> +    if (dpa < region->base ||
> +        dpa >= region->base + ct3d->dc.total_capacity) {
> +        return NULL;
> +    }
> +
> +    /*
> +     * CXL r3.1 section 9.13.3: Dynamic Capacity Device (DCD)
> +     *
> +     * Regions are used in increasing-DPA order, with Region 0 being used for
> +     * the lowest DPA of Dynamic Capacity and Region 7 for the highest DPA.
> +     * So check from the last region to find where the dpa belongs. Extents that
> +     * cross multiple regions are not allowed.
> +     */
> +    for (i = ct3d->dc.num_regions - 1; i >= 0; i--) {
> +        region = &ct3d->dc.regions[i];
> +        if (dpa >= region->base) {
> +            if (dpa + len > region->base + region->len) {
> +                return NULL;
> +            }
> +            return region;
> +        }
> +    }
> +
> +    return NULL;
> +}
> +
> +static void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
> +                                             uint64_t dpa,
> +                                             uint64_t len,
> +                                             uint8_t *tag,
> +                                             uint16_t shared_seq)
> +{
> +    CXLDCExtent *extent;
> +
> +    extent = g_new0(CXLDCExtent, 1);
> +    extent->start_dpa = dpa;
> +    extent->len = len;
> +    if (tag) {
> +        memcpy(extent->tag, tag, 0x10);
> +    }
> +    extent->shared_seq = shared_seq;
> +
> +    QTAILQ_INSERT_TAIL(list, extent, node);
> +}
> +
> +void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
> +                                        CXLDCExtent *extent)
> +{
> +    QTAILQ_REMOVE(list, extent, node);
> +    g_free(extent);
> +}
> +
> +/*
> + * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload
> + * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload
> + */
> +typedef struct CXLUpdateDCExtentListInPl {
> +    uint32_t num_entries_updated;
> +    uint8_t flags;
> +    uint8_t rsvd[3];
> +    /* CXL r3.1 Table 8-169: Updated Extent */
> +    struct {
> +        uint64_t start_dpa;
> +        uint64_t len;
> +        uint8_t rsvd[8];
> +    } QEMU_PACKED updated_entries[];
> +} QEMU_PACKED CXLUpdateDCExtentListInPl;
> +
> +/*
> + * For the extents in the extent list to operate, check whether they are valid
> + * 1. The extent should be in the range of a valid DC region;
> + * 2. The extent should not cross multiple regions;
> + * 3. The start DPA and the length of the extent should align with the block
> + * size of the region;
> + * 4. The address range of multiple extents in the list should not overlap.
> + */
> +static CXLRetCode cxl_detect_malformed_extent_list(CXLType3Dev *ct3d,
> +        const CXLUpdateDCExtentListInPl *in)
> +{
> +    uint64_t min_block_size = UINT64_MAX;
> +    CXLDCRegion *region = &ct3d->dc.regions[0];
> +    CXLDCRegion *lastregion = &ct3d->dc.regions[ct3d->dc.num_regions - 1];
> +    g_autofree unsigned long *blk_bitmap = NULL;
> +    uint64_t dpa, len;
> +    uint32_t i;
> +
> +    for (i = 0; i < ct3d->dc.num_regions; i++) {
> +        region = &ct3d->dc.regions[i];
> +        min_block_size = MIN(min_block_size, region->block_size);
> +    }
> +
> +    blk_bitmap = bitmap_new((lastregion->base + lastregion->len -
> +                             ct3d->dc.regions[0].base) / min_block_size);
> +
> +    for (i = 0; i < in->num_entries_updated; i++) {
> +        dpa = in->updated_entries[i].start_dpa;
> +        len = in->updated_entries[i].len;
> +
> +        region = cxl_find_dc_region(ct3d, dpa, len);
> +        if (!region) {
> +            return CXL_MBOX_INVALID_PA;
> +        }
> +
> +        dpa -= ct3d->dc.regions[0].base;
> +        if (dpa % region->block_size || len % region->block_size) {
> +            return CXL_MBOX_INVALID_EXTENT_LIST;
> +        }
> +        /* the dpa range already covered by some other extents in the list */
> +        if (test_any_bits_set(blk_bitmap, dpa / min_block_size,
> +            len / min_block_size)) {
> +            return CXL_MBOX_INVALID_EXTENT_LIST;
> +        }
> +        bitmap_set(blk_bitmap, dpa / min_block_size, len / min_block_size);
> +   }
> +
> +    return CXL_MBOX_SUCCESS;
> +}
> +
> +static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
> +        const CXLUpdateDCExtentListInPl *in)
> +{
> +    uint32_t i;
> +    CXLDCExtent *ent;
> +    uint64_t dpa, len;
> +    Range range1, range2;
> +
> +    for (i = 0; i < in->num_entries_updated; i++) {
> +        dpa = in->updated_entries[i].start_dpa;
> +        len = in->updated_entries[i].len;
> +
> +        range_init_nofail(&range1, dpa, len);
> +
> +        /*
> +         * TODO: once the pending extent list is added, check against
> +         * the list will be added here.
> +         */
> +
> +        /* to-be-added range should not overlap with range already accepted */
> +        QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
> +            range_init_nofail(&range2, ent->start_dpa, ent->len);
> +            if (range_overlaps_range(&range1, &range2)) {
> +                return CXL_MBOX_INVALID_PA;
> +            }
> +        }
> +    }
> +    return CXL_MBOX_SUCCESS;
> +}

Instead of iterating over all new extents and all existing extents, 
couldn't this be rolled into cxl_detect_malformed_extent_list - the 
bitmap created there summarizes all ranges of the new extents, so you 
can just check that the existing (and pending) extents don't overlap 
with anything in the bitmap? Or allow the bitmap to be returned and used 
for this check, since cxl_detect_malformed_extent_list is also used on 
release, where things aren't as simple.
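
Something like this (rough sketch, assuming cxl_detect_malformed_extent_list
is changed to hand blk_bitmap and min_block_size back to the caller):

    /* Reject if any accepted extent overlaps the new extents' bitmap */
    QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
        if (test_any_bits_set(blk_bitmap,
                              (ent->start_dpa - ct3d->dc.regions[0].base) /
                              min_block_size,
                              ent->len / min_block_size)) {
            return CXL_MBOX_INVALID_PA;
        }
    }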

> +
> +/*
> + * CXL r3.1 section 8.2.9.9.9.3: Add Dynamic Capacity Response (Opcode 4802h)
> + * An extent is added to the extent list and becomes usable only after the
> + * response is processed successfully
> + */
> +static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> +                                          uint8_t *payload_in,
> +                                          size_t len_in,
> +                                          uint8_t *payload_out,
> +                                          size_t *len_out,
> +                                          CXLCCI *cci)
> +{
> +    CXLUpdateDCExtentListInPl *in = (void *)payload_in;
> +    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
> +    CXLDCExtentList *extent_list = &ct3d->dc.extents;
> +    uint32_t i;
> +    uint64_t dpa, len;
> +    CXLRetCode ret;
> +
> +    if (in->num_entries_updated == 0) {
> +        return CXL_MBOX_SUCCESS;
> +    }

The mailbox processing in patch 2 converts from le explicitly, whereas
the mailbox commands here don't. Looking at the existing mailbox
commands, conversion doesn't seem to be rigorously applied, so maybe
that is OK?
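
For reference, explicit conversion would look something like this (sketch
only, using the le*_to_cpu() helpers from qemu/bswap.h):

    uint32_t num_entries = le32_to_cpu(in->num_entries_updated);

    for (i = 0; i < num_entries; i++) {
        dpa = le64_to_cpu(in->updated_entries[i].start_dpa);
        len = le64_to_cpu(in->updated_entries[i].len);
        /* ... */
    }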

> +
> +    /* Adding extents causes exceeding device's extent tracking ability. */
> +    if (in->num_entries_updated + ct3d->dc.total_extent_count >
> +        CXL_NUM_EXTENTS_SUPPORTED) {
> +        return CXL_MBOX_RESOURCES_EXHAUSTED;
> +    }
> +
> +    ret = cxl_detect_malformed_extent_list(ct3d, in);
> +    if (ret != CXL_MBOX_SUCCESS) {
> +        return ret;
> +    }
> +
> +    ret = cxl_dcd_add_dyn_cap_rsp_dry_run(ct3d, in);
> +    if (ret != CXL_MBOX_SUCCESS) {
> +        return ret;
> +    }
> +
> +    for (i = 0; i < in->num_entries_updated; i++) {
> +        dpa = in->updated_entries[i].start_dpa;
> +        len = in->updated_entries[i].len;
> +
> +        cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
> +        ct3d->dc.total_extent_count += 1;
> +        /*
> +         * TODO: we will add a pending extent list based on event log record
> +         * and process the list according here.
> +         */
> +    }
> +
> +    return CXL_MBOX_SUCCESS;
> +}
> +
> +/*
> + * Copy extent list from src to dst
> + * Return value: number of extents copied
> + */
> +static uint32_t copy_extent_list(CXLDCExtentList *dst,
> +                                 const CXLDCExtentList *src)
> +{
> +    uint32_t cnt = 0;
> +    CXLDCExtent *ent;
> +
> +    if (!dst || !src) {
> +        return 0;
> +    }
> +
> +    QTAILQ_FOREACH(ent, src, node) {
> +        cxl_insert_extent_to_extent_list(dst, ent->start_dpa, ent->len,
> +                                         ent->tag, ent->shared_seq);
> +        cnt++;
> +    }
> +    return cnt;
> +}
> +
> +static CXLRetCode cxl_dc_extent_release_dry_run(CXLType3Dev *ct3d,
> +        const CXLUpdateDCExtentListInPl *in)
> +{
> +    CXLDCExtent *ent, *ent_next;
> +    uint64_t dpa, len;
> +    uint32_t i;
> +    int cnt_delta = 0;
> +    CXLDCExtentList tmp_list;
> +    CXLRetCode ret = CXL_MBOX_SUCCESS;
> +
> +    if (in->num_entries_updated == 0) {
> +        return CXL_MBOX_INVALID_INPUT;
> +    }
> +
> +    QTAILQ_INIT(&tmp_list);
> +    copy_extent_list(&tmp_list, &ct3d->dc.extents);
> +
> +    for (i = 0; i < in->num_entries_updated; i++) {
> +        Range range;
> +
> +        dpa = in->updated_entries[i].start_dpa;
> +        len = in->updated_entries[i].len;
> +
> +        while (len > 0) {
> +            QTAILQ_FOREACH(ent, &tmp_list, node) {
> +                range_init_nofail(&range, ent->start_dpa, ent->len);
> +
> +                if (range_contains(&range, dpa)) {
> +                    uint64_t len1, len2, len_done = 0;
> +                    uint64_t ent_start_dpa = ent->start_dpa;
> +                    uint64_t ent_len = ent->len;
> +                    /*
> +                     * Found the exact extent or the subset of an existing
> +                     * extent.
> +                     */
> +                    if (range_contains(&range, dpa + len - 1)) {
> +                        len1 = dpa - ent->start_dpa;
> +                        len2 = ent_start_dpa + ent_len - dpa - len;
> +                        len_done = ent_len - len1 - len2;
> +
> +                        cxl_remove_extent_from_extent_list(&tmp_list, ent);
> +                        cnt_delta--;
> +
> +                        if (len1) {
> +                            cxl_insert_extent_to_extent_list(&tmp_list,
> +                                                             ent_start_dpa,
> +                                                             len1, NULL, 0);
> +                            cnt_delta++;
> +                        }
> +                        if (len2) {
> +                            cxl_insert_extent_to_extent_list(&tmp_list,
> +                                                             dpa + len,
> +                                                             len2, NULL, 0);
> +                            cnt_delta++;
> +                        }
> +
> +                        if (cnt_delta + ct3d->dc.total_extent_count >
> +                            CXL_NUM_EXTENTS_SUPPORTED) {
> +                            ret = CXL_MBOX_RESOURCES_EXHAUSTED;
> +                            goto free_and_exit;
> +                        }
> +                    } else {
> +                        /*
> +                         * TODO: we reject the attempt to remove an extent
> +                         * that overlaps with multiple extents in the device
> +                         * for now, we will allow it once superset release
> +                         * support is added.
> +                         */
> +                        ret = CXL_MBOX_INVALID_PA;
> +                        goto free_and_exit;
> +                    }
> +
> +                    len -= len_done;
> +                    /* len == 0 here until superset release is added */
> +                    break;
> +                }
> +            }
> +            if (len) {
> +                ret = CXL_MBOX_INVALID_PA;
> +                goto free_and_exit;
> +            }
> +        }
> +    }
> +free_and_exit:
> +    QTAILQ_FOREACH_SAFE(ent, &tmp_list, node, ent_next) {
> +        cxl_remove_extent_from_extent_list(&tmp_list, ent);
> +    }
> +
> +    return ret;
> +}
> +
> +/*
> + * CXL r3.1 section 8.2.9.9.9.4: Release Dynamic Capacity (Opcode 4803h)
> + */
> +static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
> +                                          uint8_t *payload_in,
> +                                          size_t len_in,
> +                                          uint8_t *payload_out,
> +                                          size_t *len_out,
> +                                          CXLCCI *cci)
> +{
> +    CXLUpdateDCExtentListInPl *in = (void *)payload_in;
> +    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
> +    CXLDCExtentList *extent_list = &ct3d->dc.extents;
> +    CXLDCExtent *ent;
> +    uint32_t i;
> +    uint64_t dpa, len;
> +    CXLRetCode ret;
> +
> +    if (in->num_entries_updated == 0) {
> +        return CXL_MBOX_INVALID_INPUT;
> +    }
> +
> +    ret = cxl_detect_malformed_extent_list(ct3d, in);
> +    if (ret != CXL_MBOX_SUCCESS) {
> +        return ret;
> +    }
> +
> +    ret = cxl_dc_extent_release_dry_run(ct3d, in);
> +    if (ret != CXL_MBOX_SUCCESS) {
> +        return ret;
> +    }
> +
> +    /* From this point, all the extents to release are valid */
> +    for (i = 0; i < in->num_entries_updated; i++) {
> +        Range range;
> +
> +        dpa = in->updated_entries[i].start_dpa;
> +        len = in->updated_entries[i].len;
> +
> +        while (len > 0) {
> +            QTAILQ_FOREACH(ent, extent_list, node) {
> +                range_init_nofail(&range, ent->start_dpa, ent->len);
> +
> +                /* Found the extent overlapping with */
> +                if (range_contains(&range, dpa)) {
> +                    uint64_t len1, len2 = 0, len_done = 0;
> +                    uint64_t ent_start_dpa = ent->start_dpa;
> +                    uint64_t ent_len = ent->len;
> +
> +                    len1 = dpa - ent_start_dpa;
> +                    if (range_contains(&range, dpa + len - 1)) {
> +                        len2 = ent_start_dpa + ent_len - dpa - len;
> +                    }
> +                    len_done = ent_len - len1 - len2;
> +
> +                    cxl_remove_extent_from_extent_list(extent_list, ent);
> +                    ct3d->dc.total_extent_count -= 1;
> +
> +                    if (len1) {
> +                        cxl_insert_extent_to_extent_list(extent_list,
> +                                                         ent_start_dpa,
> +                                                         len1, NULL, 0);
> +                        ct3d->dc.total_extent_count += 1;
> +                    }
> +                    if (len2) {
> +                        cxl_insert_extent_to_extent_list(extent_list,
> +                                                         dpa + len,
> +                                                         len2, NULL, 0);
> +                        ct3d->dc.total_extent_count += 1;
> +                    }
> +
> +                    len -= len_done;
> +                    /*
> +                     * len will always be 0 until superset release is add.
> +                     * TODO: superset release will be added.
> +                     */
> +                    break;
> +                }
> +            }
> +        }
> +    }

The tmp_list generated in cxl_dc_extent_release_dry_run is identical to 
the updated extent_list after the loops above - so you could swap the 
existing extent_list with the tmp_list and adjust the number of extents 
with the cnt_delta calculated, if the dry run is successful - instead of 
duplicating the logic.
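
Something along these lines in the success path (sketch only, assuming the
dry run hands tmp_list and cnt_delta back to the caller, and ent_next is
declared locally):

    /* Dry run passed: drop the old list and adopt the one built there */
    QTAILQ_FOREACH_SAFE(ent, extent_list, node, ent_next) {
        cxl_remove_extent_from_extent_list(extent_list, ent);
    }
    copy_extent_list(extent_list, &tmp_list);
    ct3d->dc.total_extent_count += cnt_delta;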

Thanks,
Jørgen

> +    return CXL_MBOX_SUCCESS;
> +}
> +
>   #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
>   #define IMMEDIATE_DATA_CHANGE (1 << 2)
>   #define IMMEDIATE_POLICY_CHANGE (1 << 3)
> @@ -1413,15 +1832,15 @@ static const struct cxl_cmd cxl_cmd_set[256][256] = {
>       [EVENTS][CLEAR_RECORDS] = { "EVENTS_CLEAR_RECORDS",
>           cmd_events_clear_records, ~0, IMMEDIATE_LOG_CHANGE },
>       [EVENTS][GET_INTERRUPT_POLICY] = { "EVENTS_GET_INTERRUPT_POLICY",
> -                                      cmd_events_get_interrupt_policy, 0, 0 },
> +        cmd_events_get_interrupt_policy, 0, 0 },
>       [EVENTS][SET_INTERRUPT_POLICY] = { "EVENTS_SET_INTERRUPT_POLICY",
> -                                      cmd_events_set_interrupt_policy,
> -                                      ~0, IMMEDIATE_CONFIG_CHANGE },
> +        cmd_events_set_interrupt_policy,
> +        ~0, IMMEDIATE_CONFIG_CHANGE },
>       [FIRMWARE_UPDATE][GET_INFO] = { "FIRMWARE_UPDATE_GET_INFO",
>           cmd_firmware_update_get_info, 0, 0 },
>       [TIMESTAMP][GET] = { "TIMESTAMP_GET", cmd_timestamp_get, 0, 0 },
>       [TIMESTAMP][SET] = { "TIMESTAMP_SET", cmd_timestamp_set,
> -                         8, IMMEDIATE_POLICY_CHANGE },
> +        8, IMMEDIATE_POLICY_CHANGE },
>       [LOGS][GET_SUPPORTED] = { "LOGS_GET_SUPPORTED", cmd_logs_get_supported,
>                                 0, 0 },
>       [LOGS][GET_LOG] = { "LOGS_GET_LOG", cmd_logs_get_log, 0x18, 0 },
> @@ -1450,6 +1869,12 @@ static const struct cxl_cmd cxl_cmd_set_dcd[256][256] = {
>       [DCD_CONFIG][GET_DYN_CAP_EXT_LIST] = {
>           "DCD_GET_DYNAMIC_CAPACITY_EXTENT_LIST", cmd_dcd_get_dyn_cap_ext_list,
>           8, 0 },
> +    [DCD_CONFIG][ADD_DYN_CAP_RSP] = {
> +        "DCD_ADD_DYNAMIC_CAPACITY_RESPONSE", cmd_dcd_add_dyn_cap_rsp,
> +        ~0, IMMEDIATE_DATA_CHANGE },
> +    [DCD_CONFIG][RELEASE_DYN_CAP] = {
> +        "DCD_RELEASE_DYNAMIC_CAPACITY", cmd_dcd_release_dyn_cap,
> +        ~0, IMMEDIATE_DATA_CHANGE },
>   };
> 
>   static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 5be3c904ba..951bd79a82 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -678,6 +678,15 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
>       return true;
>   }
> 
> +static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
> +{
> +    CXLDCExtent *ent, *ent_next;
> +
> +    QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
> +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
> +    }
> +}
> +
>   static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
>   {
>       DeviceState *ds = DEVICE(ct3d);
> @@ -874,6 +883,7 @@ err_free_special_ops:
>       g_free(regs->special_ops);
>   err_address_space_free:
>       if (ct3d->dc.host_dc) {
> +        cxl_destroy_dc_regions(ct3d);
>           address_space_destroy(&ct3d->dc.host_dc_as);
>       }
>       if (ct3d->hostpmem) {
> @@ -895,6 +905,7 @@ static void ct3_exit(PCIDevice *pci_dev)
>       cxl_doe_cdat_release(cxl_cstate);
>       g_free(regs->special_ops);
>       if (ct3d->dc.host_dc) {
> +        cxl_destroy_dc_regions(ct3d);
>           address_space_destroy(&ct3d->dc.host_dc_as);
>       }
>       if (ct3d->hostpmem) {
> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index 6aec6ac983..df3511e91b 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -551,4 +551,8 @@ void cxl_event_irq_assert(CXLType3Dev *ct3d);
> 
>   void cxl_set_poison_list_overflowed(CXLType3Dev *ct3d);
> 
> +CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
> +
> +void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
> +                                        CXLDCExtent *extent);
>   #endif
> --
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 11/12] hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support
  2024-03-25 19:02 ` [PATCH v6 11/12] hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support nifan.cxl
@ 2024-04-05  9:57   ` Jørgen Hansen
  2024-04-15 20:17     ` fan
  2024-04-05 12:32     ` Jonathan Cameron via
  1 sibling, 1 reply; 65+ messages in thread
From: Jørgen Hansen @ 2024-04-05  9:57 UTC (permalink / raw)
  To: nifan.cxl, qemu-devel
  Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	wj28.lee, Fan Ni

On 3/25/24 20:02, nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
> 
> With the change, we extend the extent release mailbox command processing
> to allow more flexible release. As long as the DPA range of the extent to
> release is covered by accepted extent(s) in the device, the release can be
> performed.
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>   hw/cxl/cxl-mailbox-utils.c | 41 ++++++++++++++++++++++----------------
>   1 file changed, 24 insertions(+), 17 deletions(-)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index a0d2239176..3b7949c364 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -1674,6 +1674,12 @@ static CXLRetCode cxl_dc_extent_release_dry_run(CXLType3Dev *ct3d,
>           dpa = in->updated_entries[i].start_dpa;
>           len = in->updated_entries[i].len;
> 
> +        /* Check if the DPA range is not fully backed with valid extents */
> +        if (!ct3_test_region_block_backed(ct3d, dpa, len)) {
> +            ret = CXL_MBOX_INVALID_PA;
> +            goto free_and_exit;
> +        }

In cxl_dcd_add_dyn_cap_rsp_dry_run, the opposite check (all 0's in the 
bitmap) could be used instead of looping through the full extent list 
(and this also makes my previous comment about reusing the bitmap from 
cxl_detect_malformed_extent_list irrelevant).
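
i.e. something like (sketch only; ct3_test_any_block_backed() is a
hypothetical helper returning true if any block in the range is marked
backed in the region bitmap):

    /* To-be-added range must not overlap already accepted extents */
    if (ct3_test_any_block_backed(ct3d, dpa, len)) {
        return CXL_MBOX_INVALID_PA;
    }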

> +        /* After this point, extent overflow is the only error can happen */
>           while (len > 0) {
>               QTAILQ_FOREACH(ent, &tmp_list, node) {
>                   range_init_nofail(&range, ent->start_dpa, ent->len);
> @@ -1713,25 +1719,27 @@ static CXLRetCode cxl_dc_extent_release_dry_run(CXLType3Dev *ct3d,
>                               goto free_and_exit;
>                           }
>                       } else {
> -                        /*
> -                         * TODO: we reject the attempt to remove an extent
> -                         * that overlaps with multiple extents in the device
> -                         * for now, we will allow it once superset release
> -                         * support is added.
> -                         */
> -                        ret = CXL_MBOX_INVALID_PA;
> -                        goto free_and_exit;
> +                        len1 = dpa - ent_start_dpa;
> +                        len2 = 0;
> +                        len_done = ent_len - len1 - len2;

You don't need len2 in the else statement.
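
i.e. the else branch can just be (sketch):

                        len1 = dpa - ent_start_dpa;
                        len_done = ent_len - len1;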

Thanks,
Jørgen

> +
> +                        cxl_remove_extent_from_extent_list(&tmp_list, ent);
> +                        cnt_delta--;
> +                        if (len1) {
> +                            cxl_insert_extent_to_extent_list(&tmp_list,
> +                                                             ent_start_dpa,
> +                                                             len1, NULL, 0);
> +                            cnt_delta++;
> +                        }
>                       }
> 
>                       len -= len_done;
> -                    /* len == 0 here until superset release is added */
> +                    if (len) {
> +                        dpa = ent_start_dpa + ent_len;
> +                    }
>                       break;
>                   }
>               }
> -            if (len) {
> -                ret = CXL_MBOX_INVALID_PA;
> -                goto free_and_exit;
> -            }
>           }
>       }
>   free_and_exit:
> @@ -1819,10 +1827,9 @@ static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
>                       }
> 
>                       len -= len_done;
> -                    /*
> -                     * len will always be 0 until superset release is add.
> -                     * TODO: superset release will be added.
> -                     */
> +                    if (len > 0) {
> +                        dpa = ent_start_dpa + ent_len;
> +                    }
>                       break;
>                   }
>               }
> --
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 06/12] hw/mem/cxl_type3: Add host backend and address space handling for DC regions
  2024-03-25 19:02 ` [PATCH v6 06/12] hw/mem/cxl_type3: Add host backend and address space handling for DC regions nifan.cxl
@ 2024-04-05 10:58     ` Jonathan Cameron via
  0 siblings, 0 replies; 65+ messages in thread
From: Jonathan Cameron @ 2024-04-05 10:58 UTC (permalink / raw)
  To: nifan.cxl
  Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
	wj28.lee, Fan Ni

On Mon, 25 Mar 2024 12:02:24 -0700
nifan.cxl@gmail.com wrote:

> From: Fan Ni <fan.ni@samsung.com>
> 
> Add (file/memory backed) host backend, all the dynamic capacity regions
> will share a single, large enough host backend. 

This doesn't parse.  I suggest splitting it into 2 sentences.

Add (file/memory backend) host backend for DCD.  All the dynamic capacity
regions will share a single, large enough host backend.

> Set up address space for
> DC regions to support read/write operations to dynamic capacity for DCD.
> 
> With the change, following supports are added:

Oddity of English wrt plurals.

With this change, the following support is added.

> 1. Add a new property to type3 device "volatile-dc-memdev" to point to host
>    memory backend for dynamic capacity. Currently, all dc regions share one
>    host backend.
> 2. Add namespace for dynamic capacity for read/write support;
> 3. Create cdat entries for each dynamic capacity region;
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
All comments are trivial with the exception of the one about setting the size
of the range registers. For now I think just set the flags and we will deal
with whatever output we get from the consortium in the long run.
With that tweaked,

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
>  hw/cxl/cxl-mailbox-utils.c  |  16 ++-
>  hw/mem/cxl_type3.c          | 187 +++++++++++++++++++++++++++++-------
>  include/hw/cxl/cxl_device.h |   8 ++
>  3 files changed, 172 insertions(+), 39 deletions(-)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 0f2ad58a14..831cef0567 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -622,7 +622,8 @@ static CXLRetCode cmd_firmware_update_get_info(const struct cxl_cmd *cmd,

> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index a9e8bdc436..75ea9b20e1 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -45,7 +45,8 @@ enum {



> +    if (dc_mr) {
> +        int i;
> +        uint64_t region_base = vmr_size + pmr_size;
> +
> +        /*
> +         * TODO: we assume the dynamic capacity to be volatile for now,
> +         * non-volatile dynamic capacity will be added if needed in the
> +         * future.

Trivial but I'd make that 2 sentences with a full stop after "now".


>      assert(len == cur_ent);
>  
>      *cdat_table = g_steal_pointer(&table);
> @@ -300,11 +336,24 @@ static void build_dvsecs(CXLType3Dev *ct3d)
>              range2_size_hi = ct3d->hostpmem->size >> 32;
>              range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
>                               (ct3d->hostpmem->size & 0xF0000000);
> +        } else if (ct3d->dc.host_dc) {
> +            range2_size_hi = ct3d->dc.host_dc->size >> 32;
> +            range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> +                             (ct3d->dc.host_dc->size & 0xF0000000);
>          }
> -    } else {
> +    } else if (ct3d->hostpmem) {
>          range1_size_hi = ct3d->hostpmem->size >> 32;
>          range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
>                           (ct3d->hostpmem->size & 0xF0000000);
> +        if (ct3d->dc.host_dc) {
> +            range2_size_hi = ct3d->dc.host_dc->size >> 32;
> +            range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> +                             (ct3d->dc.host_dc->size & 0xF0000000);
> +        }
> +    } else {
> +        range1_size_hi = ct3d->dc.host_dc->size >> 32;
> +        range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> +                         (ct3d->dc.host_dc->size & 0xF0000000);
>      }

As per your cover letter, this is a workaround for an ambiguity in the
spec and what Linux is currently doing with it.  However, as per the call
the other day, Linux only checks the flags.  So I'd set those only and
not the size field.  We may have to deal with spec errata later, but
I don't want to block this series on this corner case in the meantime.

Given the complexity of DC, we'll be waiting forever if we have to get
all clarifications before we land anything!
(Quick though those nice folk in the CXL consortium working groups are :))
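
If I've read the DVSEC range register layout right, that would be something
like this for the DC-only case (sketch, flags only, size bits left as zero):

        range1_size_hi = 0;
        range1_size_lo = (2 << 5) | (2 << 2) | 0x3;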


> @@ -679,9 +746,41 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
>          g_free(p_name);
>      }
>  
> -    if (!cxl_create_dc_regions(ct3d, errp)) {
> -        error_setg(errp, "setup DC regions failed");
> -        return false;
> +    ct3d->dc.total_capacity = 0;
> +    if (ct3d->dc.num_regions) {

Trivial suggestion.

As dc.num_regions already existed from patch 4, maybe it's worth pushing this
if statement back there?  It will be a harmless shortcut for
cxl_create_dc_regions(), which won't do anything if num_regions == 0 anyway,
but will reduce churn a little in this patch.

> +        MemoryRegion *dc_mr;
> +        char *dc_name;
> +
> +        if (!ct3d->dc.host_dc) {
> +            error_setg(errp, "dynamic capacity must have a backing device");
> +            return false;
> +        }
> +
> +        dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
> +        if (!dc_mr) {
> +            error_setg(errp, "dynamic capacity must have a backing device");
> +            return false;
> +        }
> +
> +        /*
> +         * TODO: set dc as volatile for now, non-volatile support can be added
> +         * in the future if needed.
> +         */
> +        memory_region_set_nonvolatile(dc_mr, false);
> +        memory_region_set_enabled(dc_mr, true);
> +        host_memory_backend_set_mapped(ct3d->dc.host_dc, true);
> +        if (ds->id) {
> +            dc_name = g_strdup_printf("cxl-dcd-dpa-dc-space:%s", ds->id);
> +        } else {
> +            dc_name = g_strdup("cxl-dcd-dpa-dc-space");
> +        }
> +        address_space_init(&ct3d->dc.host_dc_as, dc_mr, dc_name);
> +        g_free(dc_name);
> +
> +        if (!cxl_create_dc_regions(ct3d, errp)) {
> +            error_setg(errp, "setup DC regions failed");
> +            return false;
> +        }
>      }

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 07/12] hw/mem/cxl_type3: Add DC extent list representative and get DC extent list mailbox support
  2024-03-25 19:02 ` [PATCH v6 07/12] hw/mem/cxl_type3: Add DC extent list representative and get DC extent list mailbox support nifan.cxl
@ 2024-04-05 11:08     ` Jonathan Cameron via
  0 siblings, 0 replies; 65+ messages in thread
From: Jonathan Cameron @ 2024-04-05 11:08 UTC (permalink / raw)
  To: nifan.cxl
  Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
	wj28.lee, Fan Ni

On Mon, 25 Mar 2024 12:02:25 -0700
nifan.cxl@gmail.com wrote:

> From: Fan Ni <fan.ni@samsung.com>
> 
> Add dynamic capacity extent list representative to the definition of
> CXLType3Dev and implement get DC extent list mailbox command per
> CXL.spec.3.1:.8.2.9.9.9.2.
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>

One really minor comment inline.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

>  
> +/*
> + * CXL r3.1 section 8.2.9.9.9.2:
> + * Get Dynamic Capacity Extent List (Opcode 4801h)
> + */
> +static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
> +                                               uint8_t *payload_in,
> +                                               size_t len_in,
> +                                               uint8_t *payload_out,
> +                                               size_t *len_out,
> +                                               CXLCCI *cci)
> +{
> +    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
> +    struct {
> +        uint32_t extent_cnt;
> +        uint32_t start_extent_id;
> +    } QEMU_PACKED *in = (void *)payload_in;
> +    struct {
> +        uint32_t count;
> +        uint32_t total_extents;
> +        uint32_t generation_num;
> +        uint8_t rsvd[4];
> +        CXLDCExtentRaw records[];
> +    } QEMU_PACKED *out = (void *)payload_out;
> +    uint32_t start_extent_id = in->start_extent_id;
> +    CXLDCExtentList *extent_list = &ct3d->dc.extents;
> +    uint16_t record_count = 0, i = 0, record_done = 0;
> +    uint16_t out_pl_len, size;
> +    CXLDCExtent *ent;
> +
> +    if (start_extent_id > ct3d->dc.total_extent_count) {
> +        return CXL_MBOX_INVALID_INPUT;
> +    }
> +
> +    record_count = MIN(in->extent_cnt,
> +                       ct3d->dc.total_extent_count - start_extent_id);
> +    size = CXL_MAILBOX_MAX_PAYLOAD_SIZE - sizeof(*out);
> +    if (size / sizeof(out->records[0]) < record_count) {
> +        record_count = size / sizeof(out->records[0]);
> +    }

Could use another MIN for this I think?
	record_count = MIN(record_count, size / sizeof(out->records[0]));

> +    out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);
> +
> +    stl_le_p(&out->count, record_count);
> +    stl_le_p(&out->total_extents, ct3d->dc.total_extent_count);
> +    stl_le_p(&out->generation_num, ct3d->dc.ext_list_gen_seq);
> +
> +    if (record_count > 0) {
> +        CXLDCExtentRaw *out_rec = &out->records[record_done];
> +
> +        QTAILQ_FOREACH(ent, extent_list, node) {
> +            if (i++ < start_extent_id) {
> +                continue;
> +            }
> +            stq_le_p(&out_rec->start_dpa, ent->start_dpa);
> +            stq_le_p(&out_rec->len, ent->len);
> +            memcpy(&out_rec->tag, ent->tag, 0x10);
> +            stw_le_p(&out_rec->shared_seq, ent->shared_seq);
> +
> +            record_done++;
> +            if (record_done == record_count) {
> +                break;
> +            }
> +        }
> +    }
> +
> +    *len_out = out_pl_len;
> +    return CXL_MBOX_SUCCESS;
> +}


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 08/12] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response
  2024-04-04 13:32   ` Jørgen Hansen
@ 2024-04-05 11:12       ` Jonathan Cameron via
  2024-04-09 19:21     ` fan
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 65+ messages in thread
From: Jonathan Cameron @ 2024-04-05 11:12 UTC (permalink / raw)
  To: Jørgen Hansen
  Cc: nifan.cxl, qemu-devel, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	wj28.lee, Fan Ni

On Thu, 4 Apr 2024 13:32:23 +0000
Jørgen Hansen <Jorgen.Hansen@wdc.com> wrote:

Hi Jørgen,

> > +static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> > +                                          uint8_t *payload_in,
> > +                                          size_t len_in,
> > +                                          uint8_t *payload_out,
> > +                                          size_t *len_out,
> > +                                          CXLCCI *cci)
> > +{
> > +    CXLUpdateDCExtentListInPl *in = (void *)payload_in;
> > +    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
> > +    CXLDCExtentList *extent_list = &ct3d->dc.extents;
> > +    uint32_t i;
> > +    uint64_t dpa, len;
> > +    CXLRetCode ret;
> > +
> > +    if (in->num_entries_updated == 0) {
> > +        return CXL_MBOX_SUCCESS;
> > +    }  
> 
> The mailbox processing in patch 2 converts from le explicitly, whereas 
> the mailbox commands here don't. Looking at the existing mailbox 
> commands, convertion doesn't seem to be rigorously applied, so maybe 
> that is OK?

The early CXL code didn't take this into account much at all. We've
sort of been fixing stuff up as we happen to be working on it. Hence
some stuff is big endian safe and some not :(

Patches welcome, but it would be good to not introduce more cases
that need fixing when we eventually clean them all up (and have
a big endian test platform to see if we got it right!)

Jonathan

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 08/12] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response
  2024-03-25 19:02 ` [PATCH v6 08/12] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response nifan.cxl
@ 2024-04-05 11:39     ` Jonathan Cameron via
  2024-04-05 11:39     ` Jonathan Cameron via
  1 sibling, 0 replies; 65+ messages in thread
From: Jonathan Cameron @ 2024-04-05 11:39 UTC (permalink / raw)
  To: nifan.cxl
  Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
	wj28.lee, Fan Ni

On Mon, 25 Mar 2024 12:02:26 -0700
nifan.cxl@gmail.com wrote:

> From: Fan Ni <fan.ni@samsung.com>
> 
> Per CXL spec 3.1, two mailbox commands are implemented:
> Add Dynamic Capacity Response (Opcode 4802h) 8.2.9.9.9.3, and
> Release Dynamic Capacity (Opcode 4803h) 8.2.9.9.9.4.
> 
> For the process of the above two commands, we use two-pass approach.
> Pass 1: Check whether the input payload is valid or not; if not, skip
>         Pass 2 and return mailbox process error.
> Pass 2: Do the real work--add or release extents, respectively.
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>

A few additional comments from me.

Jonathan


> +/*
> + * For the extents in the extent list to operate, check whether they are valid
> + * 1. The extent should be in the range of a valid DC region;
> + * 2. The extent should not cross multiple regions;
> + * 3. The start DPA and the length of the extent should align with the block
> + * size of the region;
> + * 4. The address range of multiple extents in the list should not overlap.
> + */
> +static CXLRetCode cxl_detect_malformed_extent_list(CXLType3Dev *ct3d,
> +        const CXLUpdateDCExtentListInPl *in)
> +{
> +    uint64_t min_block_size = UINT64_MAX;
> +    CXLDCRegion *region = &ct3d->dc.regions[0];

This is immediately overwritten if num_regions != 0 (which I think is checked
before calling this function), so there is no need to initialize it.

> +    CXLDCRegion *lastregion = &ct3d->dc.regions[ct3d->dc.num_regions - 1];
> +    g_autofree unsigned long *blk_bitmap = NULL;
> +    uint64_t dpa, len;
> +    uint32_t i;
> +
> +    for (i = 0; i < ct3d->dc.num_regions; i++) {
> +        region = &ct3d->dc.regions[i];
> +        min_block_size = MIN(min_block_size, region->block_size);
> +    }
> +
> +    blk_bitmap = bitmap_new((lastregion->base + lastregion->len -
> +                             ct3d->dc.regions[0].base) / min_block_size);
> +
> +    for (i = 0; i < in->num_entries_updated; i++) {
> +        dpa = in->updated_entries[i].start_dpa;
> +        len = in->updated_entries[i].len;
> +
> +        region = cxl_find_dc_region(ct3d, dpa, len);
> +        if (!region) {
> +            return CXL_MBOX_INVALID_PA;
> +        }
> +
> +        dpa -= ct3d->dc.regions[0].base;
> +        if (dpa % region->block_size || len % region->block_size) {
> +            return CXL_MBOX_INVALID_EXTENT_LIST;
> +        }
> +        /* the dpa range already covered by some other extents in the list */
> +        if (test_any_bits_set(blk_bitmap, dpa / min_block_size,
> +            len / min_block_size)) {
> +            return CXL_MBOX_INVALID_EXTENT_LIST;
> +        }
> +        bitmap_set(blk_bitmap, dpa / min_block_size, len / min_block_size);
> +   }
> +
> +    return CXL_MBOX_SUCCESS;
> +}



> +/*
> + * CXL r3.1 section 8.2.9.9.9.3: Add Dynamic Capacity Response (Opcode 4802h)
> + * An extent is added to the extent list and becomes usable only after the
> + * response is processed successfully
> + */
> +static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> +                                          uint8_t *payload_in,
> +                                          size_t len_in,
> +                                          uint8_t *payload_out,
> +                                          size_t *len_out,
> +                                          CXLCCI *cci)
> +{
> +    CXLUpdateDCExtentListInPl *in = (void *)payload_in;
> +    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
> +    CXLDCExtentList *extent_list = &ct3d->dc.extents;
> +    uint32_t i;
> +    uint64_t dpa, len;
> +    CXLRetCode ret;
> +
> +    if (in->num_entries_updated == 0) {
> +        return CXL_MBOX_SUCCESS;
> +    }


A zero-length response is a rejection of an offered set of extents.
Probably want a todo here to say this will wipe out part of the pending list
(similar to the one you have below).
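Something along these lines perhaps (only a sketch; the pending list itself
only turns up later in the series):

    if (in->num_entries_updated == 0) {
        /*
         * TODO: a zero-entry response rejects the previously offered
         * (pending) extents, so the corresponding entries should be
         * dropped from the pending list here once that list exists.
         */
        return CXL_MBOX_SUCCESS;
    }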

> +
> +    /* Adding extents causes exceeding device's extent tracking ability. */
> +    if (in->num_entries_updated + ct3d->dc.total_extent_count >
> +        CXL_NUM_EXTENTS_SUPPORTED) {
> +        return CXL_MBOX_RESOURCES_EXHAUSTED;
> +    }
> +
> +    ret = cxl_detect_malformed_extent_list(ct3d, in);
> +    if (ret != CXL_MBOX_SUCCESS) {
> +        return ret;
> +    }
> +
> +    ret = cxl_dcd_add_dyn_cap_rsp_dry_run(ct3d, in);
> +    if (ret != CXL_MBOX_SUCCESS) {
> +        return ret;
> +    }
> +
> +    for (i = 0; i < in->num_entries_updated; i++) {
> +        dpa = in->updated_entries[i].start_dpa;
> +        len = in->updated_entries[i].len;
> +
> +        cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
> +        ct3d->dc.total_extent_count += 1;
> +        /*
> +         * TODO: we will add a pending extent list based on event log record
> +         * and process the list according here.
> +         */
> +    }
> +
> +    return CXL_MBOX_SUCCESS;
> +}

> +static CXLRetCode cxl_dc_extent_release_dry_run(CXLType3Dev *ct3d,
> +        const CXLUpdateDCExtentListInPl *in)
> +{
> +    CXLDCExtent *ent, *ent_next;
> +    uint64_t dpa, len;
> +    uint32_t i;
> +    int cnt_delta = 0;
> +    CXLDCExtentList tmp_list;
> +    CXLRetCode ret = CXL_MBOX_SUCCESS;
> +
> +    if (in->num_entries_updated == 0) {

This is only used in paths where we already checked this. I don't think
we need to repeat it.

> +        return CXL_MBOX_INVALID_INPUT;
> +    }
> +
> +    QTAILQ_INIT(&tmp_list);
> +    copy_extent_list(&tmp_list, &ct3d->dc.extents);
> +
> +    for (i = 0; i < in->num_entries_updated; i++) {
> +        Range range;
> +
> +        dpa = in->updated_entries[i].start_dpa;
> +        len = in->updated_entries[i].len;
> +
> +        while (len > 0) {
> +            QTAILQ_FOREACH(ent, &tmp_list, node) {
> +                range_init_nofail(&range, ent->start_dpa, ent->len);
> +
> +                if (range_contains(&range, dpa)) {
> +                    uint64_t len1, len2, len_done = 0;
> +                    uint64_t ent_start_dpa = ent->start_dpa;
> +                    uint64_t ent_len = ent->len;
> +                    /*
> +                     * Found the exact extent or the subset of an existing
> +                     * extent.
> +                     */
> +                    if (range_contains(&range, dpa + len - 1)) {
> +                        len1 = dpa - ent->start_dpa;
> +                        len2 = ent_start_dpa + ent_len - dpa - len;
> +                        len_done = ent_len - len1 - len2;
I'd like this to look a bit more like the real run - possibly allowing code
sharing. Though definitely see if there is a way to share more as Jorgen suggested.

                        len1 = dpa - ent_start_dpa;
                        if (range_contains(&range, dpa + len - 1)) {
                            len2 = ent_start_dpa + ent_len - dpa - len;
                        } else { /* maybe add an if (dry_run) here to allow code reuse */
                            /*
                             * TODO: we reject the attempt to remove an extent
                             * that overlaps with multiple extents in the device
                             * for now; we will allow it once superset release
                             * support is added.
                             */
                            ret = CXL_MBOX_INVALID_PA;
                            goto free_and_exit;
                        }
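
If you go that route, a single helper driven by a dry_run flag could serve
both passes, e.g. (name and exact parameters just illustrative):

    static CXLRetCode cxl_dc_extent_release(CXLType3Dev *ct3d,
                                            const CXLUpdateDCExtentListInPl *in,
                                            bool dry_run);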
> +
> +                        cxl_remove_extent_from_extent_list(&tmp_list, ent);
> +                        cnt_delta--;
> +
> +                        if (len1) {
> +                            cxl_insert_extent_to_extent_list(&tmp_list,
> +                                                             ent_start_dpa,
> +                                                             len1, NULL, 0);
> +                            cnt_delta++;
> +                        }
> +                        if (len2) {
> +                            cxl_insert_extent_to_extent_list(&tmp_list,
> +                                                             dpa + len,
> +                                                             len2, NULL, 0);
> +                            cnt_delta++;
> +                        }
> +
> +                        if (cnt_delta + ct3d->dc.total_extent_count >
> +                            CXL_NUM_EXTENTS_SUPPORTED) {
> +                            ret = CXL_MBOX_RESOURCES_EXHAUSTED;
> +                            goto free_and_exit;
> +                        }
> +                    } else {
> +                        /*
> +                         * TODO: we reject the attempt to remove an extent
> +                         * that overlaps with multiple extents in the device
> +                         * for now, we will allow it once superset release
> +                         * support is added.
> +                         */
> +                        ret = CXL_MBOX_INVALID_PA;
> +                        goto free_and_exit;
> +                    }
> +
> +                    len -= len_done;
> +                    /* len == 0 here until superset release is added */
> +                    break;
> +                }
> +            }
> +            if (len) {
> +                ret = CXL_MBOX_INVALID_PA;
> +                goto free_and_exit;
> +            }
> +        }
> +    }
> +free_and_exit:
> +    QTAILQ_FOREACH_SAFE(ent, &tmp_list, node, ent_next) {
> +        cxl_remove_extent_from_extent_list(&tmp_list, ent);
> +    }
> +
> +    return ret;
> +}
> +
> +/*
> + * CXL r3.1 section 8.2.9.9.9.4: Release Dynamic Capacity (Opcode 4803h)
> + */
> +static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
> +                                          uint8_t *payload_in,
> +                                          size_t len_in,
> +                                          uint8_t *payload_out,
> +                                          size_t *len_out,
> +                                          CXLCCI *cci)
> +{
> +    CXLUpdateDCExtentListInPl *in = (void *)payload_in;
> +    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
> +    CXLDCExtentList *extent_list = &ct3d->dc.extents;
> +    CXLDCExtent *ent;
> +    uint32_t i;
> +    uint64_t dpa, len;
> +    CXLRetCode ret;
> +
> +    if (in->num_entries_updated == 0) {
> +        return CXL_MBOX_INVALID_INPUT;
> +    }
> +
> +    ret = cxl_detect_malformed_extent_list(ct3d, in);
> +    if (ret != CXL_MBOX_SUCCESS) {
> +        return ret;
> +    }
> +
> +    ret = cxl_dc_extent_release_dry_run(ct3d, in);
> +    if (ret != CXL_MBOX_SUCCESS) {
> +        return ret;
> +    }
> +
> +    /* From this point, all the extents to release are valid */

known to be valid

> +    for (i = 0; i < in->num_entries_updated; i++) {
> +        Range range;
Perhaps factor out the handling of each extent?  That will reduce the indent
and give more readable code, I think.
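Something with roughly this shape maybe (name purely illustrative):

    static void cxl_dc_release_one_extent(CXLType3Dev *ct3d,
                                          uint64_t dpa, uint64_t len);

with the outer loop just walking in->updated_entries[] and calling it.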

> +
> +        dpa = in->updated_entries[i].start_dpa;
> +        len = in->updated_entries[i].len;
> +
> +        while (len > 0) {
> +            QTAILQ_FOREACH(ent, extent_list, node) {
> +                range_init_nofail(&range, ent->start_dpa, ent->len);
> +
> +                /* Found the extent overlapping with */
> +                if (range_contains(&range, dpa)) {
> +                    uint64_t len1, len2 = 0, len_done = 0;
> +                    uint64_t ent_start_dpa = ent->start_dpa;
> +                    uint64_t ent_len = ent->len;
> +
> +                    len1 = dpa - ent_start_dpa;
> +                    if (range_contains(&range, dpa + len - 1)) {
> +                        len2 = ent_start_dpa + ent_len - dpa - len;
> +                    }
> +                    len_done = ent_len - len1 - len2;
> +
> +                    cxl_remove_extent_from_extent_list(extent_list, ent);
> +                    ct3d->dc.total_extent_count -= 1;
> +
> +                    if (len1) {
> +                        cxl_insert_extent_to_extent_list(extent_list,
> +                                                         ent_start_dpa,
> +                                                         len1, NULL, 0);
> +                        ct3d->dc.total_extent_count += 1;
> +                    }
> +                    if (len2) {
> +                        cxl_insert_extent_to_extent_list(extent_list,
> +                                                         dpa + len,
> +                                                         len2, NULL, 0);
> +                        ct3d->dc.total_extent_count += 1;
> +                    }
> +
> +                    len -= len_done;
> +                    /*
> +                     * len will always be 0 until superset release is add.
> +                     * TODO: superset release will be added.
> +                     */
> +                    break;
> +                }
> +            }
> +        }
> +    }
> +    return CXL_MBOX_SUCCESS;
> +}
> +
>  #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
>  #define IMMEDIATE_DATA_CHANGE (1 << 2)
>  #define IMMEDIATE_POLICY_CHANGE (1 << 3)
> @@ -1413,15 +1832,15 @@ static const struct cxl_cmd cxl_cmd_set[256][256] = {
>      [EVENTS][CLEAR_RECORDS] = { "EVENTS_CLEAR_RECORDS",
>          cmd_events_clear_records, ~0, IMMEDIATE_LOG_CHANGE },
>      [EVENTS][GET_INTERRUPT_POLICY] = { "EVENTS_GET_INTERRUPT_POLICY",
> -                                      cmd_events_get_interrupt_policy, 0, 0 },
> +        cmd_events_get_interrupt_policy, 0, 0 },
>      [EVENTS][SET_INTERRUPT_POLICY] = { "EVENTS_SET_INTERRUPT_POLICY",
> -                                      cmd_events_set_interrupt_policy,
> -                                      ~0, IMMEDIATE_CONFIG_CHANGE },
> +        cmd_events_set_interrupt_policy,
> +        ~0, IMMEDIATE_CONFIG_CHANGE },

Avoid the reformatting in a patch that does other stuff.
It adds noise and hides any actual changes in the re-indented blocks.

>      [FIRMWARE_UPDATE][GET_INFO] = { "FIRMWARE_UPDATE_GET_INFO",
>          cmd_firmware_update_get_info, 0, 0 },
>      [TIMESTAMP][GET] = { "TIMESTAMP_GET", cmd_timestamp_get, 0, 0 },
>      [TIMESTAMP][SET] = { "TIMESTAMP_SET", cmd_timestamp_set,
> -                         8, IMMEDIATE_POLICY_CHANGE },
> +        8, IMMEDIATE_POLICY_CHANGE },
>      [LOGS][GET_SUPPORTED] = { "LOGS_GET_SUPPORTED", cmd_logs_get_supported,
>                                0, 0 },
>      [LOGS][GET_LOG] = { "LOGS_GET_LOG", cmd_logs_get_log, 0x18, 0 },
> @@ -1450,6 +1869,12 @@ static const struct cxl_cmd cxl_cmd_set_dcd[256][256] = {
>      [DCD_CONFIG][GET_DYN_CAP_EXT_LIST] = {
>          "DCD_GET_DYNAMIC_CAPACITY_EXTENT_LIST", cmd_dcd_get_dyn_cap_ext_list,
>          8, 0 },
> +    [DCD_CONFIG][ADD_DYN_CAP_RSP] = {
> +        "DCD_ADD_DYNAMIC_CAPACITY_RESPONSE", cmd_dcd_add_dyn_cap_rsp,
> +        ~0, IMMEDIATE_DATA_CHANGE },
> +    [DCD_CONFIG][RELEASE_DYN_CAP] = {
> +        "DCD_RELEASE_DYNAMIC_CAPACITY", cmd_dcd_release_dyn_cap,
> +        ~0, IMMEDIATE_DATA_CHANGE },
>  };
>  



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2024-03-25 19:02 ` [PATCH v6 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents nifan.cxl
@ 2024-04-05 12:18     ` Jonathan Cameron via
  2024-04-05 12:18     ` Jonathan Cameron via
  1 sibling, 0 replies; 65+ messages in thread
From: Jonathan Cameron @ 2024-04-05 12:18 UTC (permalink / raw)
  To: nifan.cxl
  Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
	wj28.lee, Fan Ni

On Mon, 25 Mar 2024 12:02:27 -0700
nifan.cxl@gmail.com wrote:

> From: Fan Ni <fan.ni@samsung.com>
> 
> To simulate FM functionalities for initiating Dynamic Capacity Add
> (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> add/release dynamic capacity extents requests.
> 
> With the change, we allow to release an extent only when its DPA range
> is contained by a single accepted extent in the device. That is to say,
> extent superset release is not supported yet.
> 
> 1. Add dynamic capacity extents:
> 
> For example, the command to add two continuous extents (each 128MiB long)
> to region 0 (starting at DPA offset 0) looks like below:
> 
> { "execute": "qmp_capabilities" }
> 
> { "execute": "cxl-add-dynamic-capacity",
>   "arguments": {
>       "path": "/machine/peripheral/cxl-dcd0",
>       "region-id": 0,
>       "extents": [
>       {
>           "offset": 0,
>           "len": 134217728
>       },
>       {
>           "offset": 134217728,
>           "len": 134217728
>       }

Hi Fan,

I talk more on this inline, but to me this interface takes multiple extents
so that we can treat them as a single 'offer' of capacity. That is, they
should be linked in the event log with the more flag and the host should
have to handle them in one go (I know Ira and Navneet's code doesn't handle
this yet, but that doesn't mean QEMU shouldn't).

An alternative for now would be to only support a single entry.  Keep the
interface defined to take multiple entries but reject more than one at runtime.

I don't want to end up with a more complex interface in the end just
because we allowed this form to not set the MORE flag today.
We will need this to do tagged handling and ultimately sharing, so good
to get it right from the start.

For tagged handling I think the right option is to have the tag alongside
region-id not in the individual extents.  That way the interface is naturally
used to generate the right description to the host.

>       ]
>   }
> }
> 
> 2. Release dynamic capacity extents:
> 
> For example, the command to release an extent of size 128MiB from region 0
> (DPA offset 128MiB) looks like below:
> 
> { "execute": "cxl-release-dynamic-capacity",
>   "arguments": {
>       "path": "/machine/peripheral/cxl-dcd0",
>       "region-id": 0,
>       "extents": [
>       {
>           "offset": 134217728,
>           "len": 134217728
>       }
>       ]
>   }
> }
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>



>          /* to-be-added range should not overlap with range already accepted */
>          QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
> @@ -1585,9 +1586,13 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
>      CXLDCExtentList *extent_list = &ct3d->dc.extents;
>      uint32_t i;
>      uint64_t dpa, len;
> +    CXLDCExtent *ent;
>      CXLRetCode ret;
>  
>      if (in->num_entries_updated == 0) {
> +        /* Always remove the first pending extent when response received. */
> +        ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);
>          return CXL_MBOX_SUCCESS;
>      }
>  
> @@ -1604,6 +1609,8 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
>  
>      ret = cxl_dcd_add_dyn_cap_rsp_dry_run(ct3d, in);
>      if (ret != CXL_MBOX_SUCCESS) {
> +        ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);

Ah, this deals with the todo I suggested you add to the earlier patch.
I'd not mind so much if you hadn't been so thorough on other todo notes ;)
Add one in the earlier patch and get rid of it here like you do below.

However, as I note below, I think we need to handle these as groups of extents,
not single extents. That way we keep a set of extents 'offered' at the same time by
a single command together (and expose that to the host using the more flag) and reject
them en masse.


>          return ret;
>      }
>  
> @@ -1613,10 +1620,9 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
>  
>          cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
>          ct3d->dc.total_extent_count += 1;
> -        /*
> -         * TODO: we will add a pending extent list based on event log record
> -         * and process the list according here.
> -         */
> +
> +        ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);
>      }
>  
>      return CXL_MBOX_SUCCESS;
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 951bd79a82..74cb64e843 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c

>  
>  static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> @@ -1449,7 +1454,8 @@ static int ct3d_qmp_cxl_event_log_enc(CxlEventLog log)
>          return CXL_EVENT_TYPE_FAIL;
>      case CXL_EVENT_LOG_FATAL:
>          return CXL_EVENT_TYPE_FATAL;
> -/* DCD not yet supported */

Drop the comment but don't add the code.  We are handling DCD differently
from other events, so this code should never deal with it.

> +    case CXL_EVENT_LOG_DYNCAP:
> +        return CXL_EVENT_TYPE_DYNAMIC_CAP;
>      default:
>          return -EINVAL;
>      }
> @@ -1700,6 +1706,250 @@ void qmp_cxl_inject_memory_module_event(const char *path, CxlEventLog log,
>      }
>  }

> +/*
> + * Check whether the range [dpa, dpa + len -1] has overlaps with extents in

space after - (just looks odd otherwise)

> + * the list.
> + * Return value: return true if has overlaps; otherwise, return false
> + */
> +static bool cxl_extents_overlaps_dpa_range(CXLDCExtentList *list,
> +                                           uint64_t dpa, uint64_t len)
> +{
> +    CXLDCExtent *ent;
> +    Range range1, range2;
> +
> +    if (!list) {
> +        return false;
> +    }
> +
> +    range_init_nofail(&range1, dpa, len);
> +    QTAILQ_FOREACH(ent, list, node) {
> +        range_init_nofail(&range2, ent->start_dpa, ent->len);
> +        if (range_overlaps_range(&range1, &range2)) {
> +            return true;
> +        }
> +    }
> +    return false;
> +}
> +
> +/*
> + * Check whether the range [dpa, dpa + len -1] is contained by extents in 

space after -

> + * the list.
> + * Will check multiple extents containment once superset release is added.
> + * Return value: return true if range is contained; otherwise, return false
> + */
> +bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
> +                                    uint64_t dpa, uint64_t len)
> +{
> +    CXLDCExtent *ent;
> +    Range range1, range2;
> +
> +    if (!list) {
> +        return false;
> +    }
> +
> +    range_init_nofail(&range1, dpa, len);
> +    QTAILQ_FOREACH(ent, list, node) {
> +        range_init_nofail(&range2, ent->start_dpa, ent->len);
> +        if (range_contains_range(&range2, &range1)) {
> +            return true;
> +        }
> +    }
> +    return false;
> +}
> +
> +/*
> + * The main function to process dynamic capacity event. Currently DC extents
> + * add/release requests are processed.
> + */
> +static void qmp_cxl_process_dynamic_capacity(const char *path, CxlEventLog log,

As below. Don't pass in a CxlEventLog.  Whilst some infrastructure is shared
with other event logs, we don't want to accidentally enable other events
being added to the DC event log.

> +                                             CXLDCEventType type, uint16_t hid,
> +                                             uint8_t rid,
> +                                             CXLDCExtentRecordList *records,
> +                                             Error **errp)
> +{
> +    Object *obj;
> +    CXLEventDynamicCapacity dCap = {};
> +    CXLEventRecordHdr *hdr = &dCap.hdr;
> +    CXLType3Dev *dcd;
> +    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
> +    uint32_t num_extents = 0;
> +    CXLDCExtentRecordList *list;
> +    g_autofree CXLDCExtentRaw *extents = NULL;
> +    uint8_t enc_log;
> +    uint64_t dpa, offset, len, block_size;
> +    int i, rc;
> +    g_autofree unsigned long *blk_bitmap = NULL;
> +
> +    obj = object_resolve_path_type(path, TYPE_CXL_TYPE3, NULL);
> +    if (!obj) {
> +        error_setg(errp, "Unable to resolve CXL type 3 device");
> +        return;
> +    }
> +
> +    dcd = CXL_TYPE3(obj);
> +    if (!dcd->dc.num_regions) {
> +        error_setg(errp, "No dynamic capacity support from the device");
> +        return;
> +    }
> +
> +    rc = ct3d_qmp_cxl_event_log_enc(log);

enc_log is always CXL_EVENT_TYPE_DYNAMIC_CAP here, so there is no need to look it up.

> +    if (rc < 0) {
> +        error_setg(errp, "Unhandled error log type");
> +        return;
> +    }
> +    enc_log = rc;
> +
> +    if (rid >= dcd->dc.num_regions) {
> +        error_setg(errp, "region id is too large");
> +        return;
> +    }
> +    block_size = dcd->dc.regions[rid].block_size;
> +    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
> +
> +    /* Sanity check and count the extents */
> +    list = records;
> +    while (list) {
> +        offset = list->value->offset;
> +        len = list->value->len;
> +        dpa = offset + dcd->dc.regions[rid].base;
> +
> +        if (len == 0) {
> +            error_setg(errp, "extent with 0 length is not allowed");
> +            return;
> +        }
> +
> +        if (offset % block_size || len % block_size) {
> +            error_setg(errp, "dpa or len is not aligned to region block size");
> +            return;
> +        }
> +
> +        if (offset + len > dcd->dc.regions[rid].len) {
> +            error_setg(errp, "extent range is beyond the region end");
> +            return;
> +        }
> +
> +        /* No duplicate or overlapped extents are allowed */
> +        if (test_any_bits_set(blk_bitmap, offset / block_size,
> +                              len / block_size)) {
> +            error_setg(errp, "duplicate or overlapped extents are detected");
> +            return;
> +        }
> +        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> +
> +        num_extents++;
> +        if (type == DC_EVENT_RELEASE_CAPACITY) {
> +            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents_pending,
> +                                               dpa, len)) {
> +                error_setg(errp,
> +                           "cannot release extent with pending DPA range");
> +                return;
> +            }
> +            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents,
> +                                                dpa, len)) {
> +                error_setg(errp,
> +                           "cannot release extent with non-existing DPA range");
> +                return;
> +            }
> +        }
> +        list = list->next;
> +    }
> +    if (num_extents == 0) {

We can just check if there is a first one.  That check can be done before
counting them and is probably a little more elegant than leaving it until down here.
I'm not sure we can pass in an empty list, but if we can (easy to poke the interface
and check) then I assume records == NULL.
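e.g. (assuming an empty 'extents' list really does arrive as records == NULL):

    if (!records) {
        error_setg(errp, "no extents provided");
        return;
    }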

> +        error_setg(errp, "no valid extents to send to process");
> +        return;
> +    }
> +
> +    /* Create extent list for event being passed to host */
> +    i = 0;
> +    list = records;
> +    extents = g_new0(CXLDCExtentRaw, num_extents);
> +    while (list) {
> +        offset = list->value->offset;
> +        len = list->value->len;
> +        dpa = dcd->dc.regions[rid].base + offset;
> +
> +        extents[i].start_dpa = dpa;
> +        extents[i].len = len;
> +        memset(extents[i].tag, 0, 0x10);
> +        extents[i].shared_seq = 0;
> +        list = list->next;
> +        i++;
> +    }
> +
> +    /*
> +     * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
> +     *
> +     * All Dynamic Capacity event records shall set the Event Record Severity
> +     * field in the Common Event Record Format to Informational Event. All
> +     * Dynamic Capacity related events shall be logged in the Dynamic Capacity
> +     * Event Log.
> +     */
> +    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
> +                            cxl_device_get_timestamp(&dcd->cxl_dstate));
> +
> +    dCap.type = type;
> +    /* FIXME: for now, validity flag is cleared */
> +    dCap.validity_flags = 0;
> +    stw_le_p(&dCap.host_id, hid);
> +    /* only valid for DC_REGION_CONFIG_UPDATED event */
> +    dCap.updated_region_id = 0;
> +    /*
> +     * FIXME: for now, the "More" flag is cleared as there is only one
> +     * extent associating with each record and tag-based release is
> +     * not supported.

This is misleading by my understanding of the specification.
More isn't directly related to tags (though it is necessary for some
flows with tags, when sharing is enabled anyway).
The reference to record also isn't that relevant. The idea is you set
it for all but the last record pushed to the event log (from a given
action from an FM).

The whole reason to have a multi extent injection interface is to set
the more flag to indicate that the OS needs to treat a bunch of extents
as one 'offer' of capacity.  So a rejection from the OS needs to take
out 'all those records'.  The proposed linux code will currently reject
all but the first extent (I moaned about that yesterday).

It is fine to not support this in the current code, but then I would check
the number of extents and reject any multi-extent commands until we
do support it.
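Until then, something like this in the QMP handler would do (sketch only):

    if (num_extents > 1) {
        error_setg(errp, "only a single extent per command is supported for now");
        return;
    }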

Ultimately I want a qmp command with more than one extent to mean
they are one 'offer' of capacity and must be handled as such by
the OS.  I.e. it can't reply with multiple unrelated acceptance
or rejection replies.

On the add side this is easy to support; the fiddly bit is if the
OS rejects some or all of the capacity and you then need to
take out all the extents offered that it hasn't accepted in its reply.

The pending list will need to maintain that association.
Maybe the easiest way is to have the pending list be a list of sublists?
That way each sublist is handled in one go and any non-accepted extents
in that sublist are dropped.
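Roughly this sort of shape perhaps (names purely illustrative):

    typedef struct CXLDCExtentGroup {
        /* extents offered together, i.e. sharing one MORE-flag chain */
        CXLDCExtentList list;
        QTAILQ_ENTRY(CXLDCExtentGroup) node;
    } CXLDCExtentGroup;
    typedef QTAILQ_HEAD(, CXLDCExtentGroup) CXLDCExtentGroupList;

Accept/reject of a response then works on the group at the head of the
pending list rather than on individual extents.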

 
> +     */
> +    dCap.flags = 0;
> +    for (i = 0; i < num_extents; i++) {
> +        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
> +               sizeof(CXLDCExtentRaw));
> +
> +        if (type == DC_EVENT_ADD_CAPACITY) {
> +            cxl_insert_extent_to_extent_list(&dcd->dc.extents_pending,
> +                                             extents[i].start_dpa,
> +                                             extents[i].len,
> +                                             extents[i].tag,
> +                                             extents[i].shared_seq);
> +        }
> +
> +        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
> +                             (CXLEventRecordRaw *)&dCap)) {
> +            cxl_event_irq_assert(dcd);
> +        }
> +    }
> +}
> +
> +void qmp_cxl_add_dynamic_capacity(const char *path, uint8_t region_id,
> +                                  CXLDCExtentRecordList  *records,
> +                                  Error **errp)
> +{
> +   qmp_cxl_process_dynamic_capacity(path, CXL_EVENT_LOG_DYNCAP,

Drop passing in the log; it doesn't make sense given these events only occur
on that log and we can hard-code it in the function.

> +                                    DC_EVENT_ADD_CAPACITY, 0,
> +                                    region_id, records, errp);
> +}
> +
> +void qmp_cxl_release_dynamic_capacity(const char *path, uint8_t region_id,
> +                                      CXLDCExtentRecordList  *records,
> +                                      Error **errp)
> +{
> +    qmp_cxl_process_dynamic_capacity(path, CXL_EVENT_LOG_DYNCAP,
> +                                     DC_EVENT_RELEASE_CAPACITY, 0,
> +                                     region_id, records, errp);
> +}
> +
>  static void ct3_class_init(ObjectClass *oc, void *data)
>  {


> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index df3511e91b..b84063d9f4 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -494,6 +494,7 @@ struct CXLType3Dev {
>           */
>          uint64_t total_capacity; /* 256M aligned */
>          CXLDCExtentList extents;
> +        CXLDCExtentList extents_pending;
>          uint32_t total_extent_count;
>          uint32_t ext_list_gen_seq;



>  #endif /* CXL_EVENTS_H */
> diff --git a/qapi/cxl.json b/qapi/cxl.json
> index 8cc4c72fa9..2645004666 100644
> --- a/qapi/cxl.json
> +++ b/qapi/cxl.json
> @@ -19,13 +19,16 @@
>  #
>  # @fatal: Fatal Event Log
>  #
> +# @dyncap: Dynamic Capacity Event Log
> +#
>  # Since: 8.1
>  ##
>  { 'enum': 'CxlEventLog',
>    'data': ['informational',
>             'warning',
>             'failure',
> -           'fatal']
> +           'fatal',
> +           'dyncap']

Does this have the side effect of letting us inject error events
onto the dynamic capacity log? 

>   }
>  
>  ##
> @@ -361,3 +364,59 @@
>  ##
>  {'command': 'cxl-inject-correctable-error',
>   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
...

> +##
> +# @cxl-add-dynamic-capacity:
> +#
> +# Command to start add dynamic capacity extents flow. The device will
> +# have to acknowledged the acceptance of the extents before they are usable.
> +#
> +# @path: CXL DCD canonical QOM path
> +# @region-id: id of the region where the extent to add
> +# @extents: Extents to add
> +#
> +# Since : 9.0

9.1

> +##
> +{ 'command': 'cxl-add-dynamic-capacity',
> +  'data': { 'path': 'str',
> +            'region-id': 'uint8',
> +            'extents': [ 'CXLDCExtentRecord' ]
> +           }
> +}
> +
> +##
> +# @cxl-release-dynamic-capacity:
> +#
> +# Command to start release dynamic capacity extents flow. The host will
> +# need to respond to indicate that it has released the capacity before it
> +# is made unavailable for read and write and can be re-added.
> +#
> +# @path: CXL DCD canonical QOM path
> +# @region-id: id of the region where the extent to release
> +# @extents: Extents to release
> +#
> +# Since : 9.0

9.1

> +##
> +{ 'command': 'cxl-release-dynamic-capacity',
> +  'data': { 'path': 'str',
> +            'region-id': 'uint8',
> +            'extents': [ 'CXLDCExtentRecord' ]
> +           }
> +}


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
@ 2024-04-05 12:18     ` Jonathan Cameron via
  0 siblings, 0 replies; 65+ messages in thread
From: Jonathan Cameron via @ 2024-04-05 12:18 UTC (permalink / raw)
  To: nifan.cxl
  Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
	wj28.lee, Fan Ni

On Mon, 25 Mar 2024 12:02:27 -0700
nifan.cxl@gmail.com wrote:

> From: Fan Ni <fan.ni@samsung.com>
> 
> To simulate FM functionalities for initiating Dynamic Capacity Add
> (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> add/release dynamic capacity extents requests.
> 
> With the change, we allow to release an extent only when its DPA range
> is contained by a single accepted extent in the device. That is to say,
> extent superset release is not supported yet.
> 
> 1. Add dynamic capacity extents:
> 
> For example, the command to add two continuous extents (each 128MiB long)
> to region 0 (starting at DPA offset 0) looks like below:
> 
> { "execute": "qmp_capabilities" }
> 
> { "execute": "cxl-add-dynamic-capacity",
>   "arguments": {
>       "path": "/machine/peripheral/cxl-dcd0",
>       "region-id": 0,
>       "extents": [
>       {
>           "offset": 0,
>           "len": 134217728
>       },
>       {
>           "offset": 134217728,
>           "len": 134217728
>       }

Hi Fan,

I talk more on this inline, but to me this interface takes multiple extents
so that we can treat them as a single 'offer' of capacity. That is they
should be linked in the event log with the more flag and the host should
have to handle them in one go (I known Ira and Navneet's code doesn't handle
this yet, but that doesn't mean QEMU shouldn't).

Alternative for now would be to only support a single entry.  Keep the
interface defined to take multiple entries but reject it at runtime.

I don't want to end up with a more complex interface in the end just
because we allowed this form to not set the MORE flag today.
We will need this to do tagged handling and ultimately sharing, so good
to get it right from the start.

For tagged handling I think the right option is to have the tag alongside
region-id not in the individual extents.  That way the interface is naturally
used to generate the right description to the host.

>       ]
>   }
> }
> 
> 2. Release dynamic capacity extents:
> 
> For example, the command to release an extent of size 128MiB from region 0
> (DPA offset 128MiB) looks like below:
> 
> { "execute": "cxl-release-dynamic-capacity",
>   "arguments": {
>       "path": "/machine/peripheral/cxl-dcd0",
>       "region-id": 0,
>       "extents": [
>       {
>           "offset": 134217728,
>           "len": 134217728
>       }
>       ]
>   }
> }
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>



>          /* to-be-added range should not overlap with range already accepted */
>          QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
> @@ -1585,9 +1586,13 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
>      CXLDCExtentList *extent_list = &ct3d->dc.extents;
>      uint32_t i;
>      uint64_t dpa, len;
> +    CXLDCExtent *ent;
>      CXLRetCode ret;
>  
>      if (in->num_entries_updated == 0) {
> +        /* Always remove the first pending extent when response received. */
> +        ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);
>          return CXL_MBOX_SUCCESS;
>      }
>  
> @@ -1604,6 +1609,8 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
>  
>      ret = cxl_dcd_add_dyn_cap_rsp_dry_run(ct3d, in);
>      if (ret != CXL_MBOX_SUCCESS) {
> +        ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);

Ah this deals with the todo I suggest you add to the earlier patch.
I'd not mind so much if you hadn't been so thorough on other todo notes ;)
Add one in the earlier patch and get rid of ti here like you do below.

However as I note below I think we need to handle these as groups of extents
not single extents. That way we keep an 'offered' set offered at the same time by
a single command (and expose to host using the more flag) together and reject
them on mass.


>          return ret;
>      }
>  
> @@ -1613,10 +1620,9 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
>  
>          cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
>          ct3d->dc.total_extent_count += 1;
> -        /*
> -         * TODO: we will add a pending extent list based on event log record
> -         * and process the list according here.
> -         */
> +
> +        ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);
>      }
>  
>      return CXL_MBOX_SUCCESS;
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 951bd79a82..74cb64e843 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c

>  
>  static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> @@ -1449,7 +1454,8 @@ static int ct3d_qmp_cxl_event_log_enc(CxlEventLog log)
>          return CXL_EVENT_TYPE_FAIL;
>      case CXL_EVENT_LOG_FATAL:
>          return CXL_EVENT_TYPE_FATAL;
> -/* DCD not yet supported */

Drop the comment but don't add the code.  We are handling DCD differently
from other events, so this code should never deal with it.

> +    case CXL_EVENT_LOG_DYNCAP:
> +        return CXL_EVENT_TYPE_DYNAMIC_CAP;
>      default:
>          return -EINVAL;
>      }
> @@ -1700,6 +1706,250 @@ void qmp_cxl_inject_memory_module_event(const char *path, CxlEventLog log,
>      }
>  }

> +/*
> + * Check whether the range [dpa, dpa + len -1] has overlaps with extents in

space after - (just looks odd otherwise)

> + * the list.
> + * Return value: return true if has overlaps; otherwise, return false
> + */
> +static bool cxl_extents_overlaps_dpa_range(CXLDCExtentList *list,
> +                                           uint64_t dpa, uint64_t len)
> +{
> +    CXLDCExtent *ent;
> +    Range range1, range2;
> +
> +    if (!list) {
> +        return false;
> +    }
> +
> +    range_init_nofail(&range1, dpa, len);
> +    QTAILQ_FOREACH(ent, list, node) {
> +        range_init_nofail(&range2, ent->start_dpa, ent->len);
> +        if (range_overlaps_range(&range1, &range2)) {
> +            return true;
> +        }
> +    }
> +    return false;
> +}
> +
> +/*
> + * Check whether the range [dpa, dpa + len -1] is contained by extents in 

space after -

> + * the list.
> + * Will check multiple extents containment once superset release is added.
> + * Return value: return true if range is contained; otherwise, return false
> + */
> +bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
> +                                    uint64_t dpa, uint64_t len)
> +{
> +    CXLDCExtent *ent;
> +    Range range1, range2;
> +
> +    if (!list) {
> +        return false;
> +    }
> +
> +    range_init_nofail(&range1, dpa, len);
> +    QTAILQ_FOREACH(ent, list, node) {
> +        range_init_nofail(&range2, ent->start_dpa, ent->len);
> +        if (range_contains_range(&range2, &range1)) {
> +            return true;
> +        }
> +    }
> +    return false;
> +}
> +
> +/*
> + * The main function to process dynamic capacity event. Currently DC extents
> + * add/release requests are processed.
> + */
> +static void qmp_cxl_process_dynamic_capacity(const char *path, CxlEventLog log,

As below. Don't pass in a CxlEventLog.  Whilst some infrastructure is shared
with other event logs, we don't want to accidentally enable other events
being added to the DC event log.

> +                                             CXLDCEventType type, uint16_t hid,
> +                                             uint8_t rid,
> +                                             CXLDCExtentRecordList *records,
> +                                             Error **errp)
> +{
> +    Object *obj;
> +    CXLEventDynamicCapacity dCap = {};
> +    CXLEventRecordHdr *hdr = &dCap.hdr;
> +    CXLType3Dev *dcd;
> +    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
> +    uint32_t num_extents = 0;
> +    CXLDCExtentRecordList *list;
> +    g_autofree CXLDCExtentRaw *extents = NULL;
> +    uint8_t enc_log;
> +    uint64_t dpa, offset, len, block_size;
> +    int i, rc;
> +    g_autofree unsigned long *blk_bitmap = NULL;
> +
> +    obj = object_resolve_path_type(path, TYPE_CXL_TYPE3, NULL);
> +    if (!obj) {
> +        error_setg(errp, "Unable to resolve CXL type 3 device");
> +        return;
> +    }
> +
> +    dcd = CXL_TYPE3(obj);
> +    if (!dcd->dc.num_regions) {
> +        error_setg(errp, "No dynamic capacity support from the device");
> +        return;
> +    }
> +
> +    rc = ct3d_qmp_cxl_event_log_enc(log);

enc_log = CXL_EVENT_TYPE_DYNAMIC_CAP; always so don't look it up.

> +    if (rc < 0) {
> +        error_setg(errp, "Unhandled error log type");
> +        return;
> +    }
> +    enc_log = rc;
> +
> +    if (rid >= dcd->dc.num_regions) {
> +        error_setg(errp, "region id is too large");
> +        return;
> +    }
> +    block_size = dcd->dc.regions[rid].block_size;
> +    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
> +
> +    /* Sanity check and count the extents */
> +    list = records;
> +    while (list) {
> +        offset = list->value->offset;
> +        len = list->value->len;
> +        dpa = offset + dcd->dc.regions[rid].base;
> +
> +        if (len == 0) {
> +            error_setg(errp, "extent with 0 length is not allowed");
> +            return;
> +        }
> +
> +        if (offset % block_size || len % block_size) {
> +            error_setg(errp, "dpa or len is not aligned to region block size");
> +            return;
> +        }
> +
> +        if (offset + len > dcd->dc.regions[rid].len) {
> +            error_setg(errp, "extent range is beyond the region end");
> +            return;
> +        }
> +
> +        /* No duplicate or overlapped extents are allowed */
> +        if (test_any_bits_set(blk_bitmap, offset / block_size,
> +                              len / block_size)) {
> +            error_setg(errp, "duplicate or overlapped extents are detected");
> +            return;
> +        }
> +        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> +
> +        num_extents++;
> +        if (type == DC_EVENT_RELEASE_CAPACITY) {
> +            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents_pending,
> +                                               dpa, len)) {
> +                error_setg(errp,
> +                           "cannot release extent with pending DPA range");
> +                return;
> +            }
> +            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents,
> +                                                dpa, len)) {
> +                error_setg(errp,
> +                           "cannot release extent with non-existing DPA range");
> +                return;
> +            }
> +        }
> +        list = list->next;
> +    }
> +    if (num_extents == 0) {

We can just check if there is a first one.  That check can be done before
counting them and is probably a little more elegant than leaving it until
down here.  I'm not sure we can pass in an empty list, but if we can (easy
to poke the interface and check) then I assume records == NULL.
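Untested sketch of what I mean (reusing the names from the patch):

    if (!records) {
        error_setg(errp, "no valid extents to send to process");
        return;
    }

    /* Sanity check the extents; any bad one fails the whole command */
    for (list = records; list; list = list->next) {
        ...
    }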

> +        error_setg(errp, "no valid extents to send to process");
> +        return;
> +    }
> +
> +    /* Create extent list for event being passed to host */
> +    i = 0;
> +    list = records;
> +    extents = g_new0(CXLDCExtentRaw, num_extents);
> +    while (list) {
> +        offset = list->value->offset;
> +        len = list->value->len;
> +        dpa = dcd->dc.regions[rid].base + offset;
> +
> +        extents[i].start_dpa = dpa;
> +        extents[i].len = len;
> +        memset(extents[i].tag, 0, 0x10);
> +        extents[i].shared_seq = 0;
> +        list = list->next;
> +        i++;
> +    }
> +
> +    /*
> +     * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
> +     *
> +     * All Dynamic Capacity event records shall set the Event Record Severity
> +     * field in the Common Event Record Format to Informational Event. All
> +     * Dynamic Capacity related events shall be logged in the Dynamic Capacity
> +     * Event Log.
> +     */
> +    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
> +                            cxl_device_get_timestamp(&dcd->cxl_dstate));
> +
> +    dCap.type = type;
> +    /* FIXME: for now, validity flag is cleared */
> +    dCap.validity_flags = 0;
> +    stw_le_p(&dCap.host_id, hid);
> +    /* only valid for DC_REGION_CONFIG_UPDATED event */
> +    dCap.updated_region_id = 0;
> +    /*
> +     * FIXME: for now, the "More" flag is cleared as there is only one
> +     * extent associating with each record and tag-based release is
> +     * not supported.

This is misleading by my understanding of the specification.
More isn't directly related to tags (though it is necessary for some
flows with tags, when sharing is enabled anyway).
The reference to record also isn't that relevant. The idea is you set
it for all but the last record pushed to the event log (from a given
action from an FM).

The whole reason to have a multi extent injection interface is to set
the more flag to indicate that the OS needs to treat a bunch of extents
as one 'offer' of capacity.  So a rejection from the OS needs to take
out 'all those records'.  The proposed linux code will currently reject
all but the first extent (I moaned about that yesterday). 

It is fine to not support this in the current code, but then I would check
the number of extents and reject any multi extent commands until we
do support it.

Ultimately I want a qmp command with more than one extent to mean
they are one 'offer' of capacity and must be handled as such by
the OS.  I.e. it can't reply with multiple unrelated acceptance
or reject replies.

On the add side this is easy to support; the fiddly bit is if the
OS rejects some or all of the capacity and you then need to
take out all the extents offered that it hasn't accepted in its reply.

Pending list will need to maintain that association.
Maybe the easiest way is to have pending list be a list of sublists?
That way each sublist is handled in one go and any non accepted extents
in that sub list are dropped.
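Not asking for all of that in this series, but to illustrate the direction,
a rough sketch (whether the More flag really is bit 0 of the record flags
needs checking against the spec - treat that as an assumption here):

    /* Group the extents from one QMP command so they form one 'offer' */
    typedef struct CXLDCExtentGroup {
        CXLDCExtentList list;
        QTAILQ_ENTRY(CXLDCExtentGroup) node;
    } CXLDCExtentGroup;

    for (i = 0; i < num_extents; i++) {
        ...
        /* Assumes bit 0 is 'More': set on all but the last record */
        dCap.flags = (i < num_extents - 1) ? BIT(0) : 0;
        ...
    }

Then the add-response handler can drop whatever is left of a group once
the host has replied to it.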

 
> +     */
> +    dCap.flags = 0;
> +    for (i = 0; i < num_extents; i++) {
> +        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
> +               sizeof(CXLDCExtentRaw));
> +
> +        if (type == DC_EVENT_ADD_CAPACITY) {
> +            cxl_insert_extent_to_extent_list(&dcd->dc.extents_pending,
> +                                             extents[i].start_dpa,
> +                                             extents[i].len,
> +                                             extents[i].tag,
> +                                             extents[i].shared_seq);
> +        }
> +
> +        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
> +                             (CXLEventRecordRaw *)&dCap)) {
> +            cxl_event_irq_assert(dcd);
> +        }
> +    }
> +}
> +
> +void qmp_cxl_add_dynamic_capacity(const char *path, uint8_t region_id,
> +                                  CXLDCExtentRecordList  *records,
> +                                  Error **errp)
> +{
> +   qmp_cxl_process_dynamic_capacity(path, CXL_EVENT_LOG_DYNCAP,

Drop passing in the log; it doesn't make sense given these events only occur
on that log, and we can hard code it in the function.
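i.e. something along these lines (sketch only):

    static void qmp_cxl_process_dynamic_capacity(const char *path,
                                                 CXLDCEventType type,
                                                 uint16_t hid, uint8_t rid,
                                                 CXLDCExtentRecordList *records,
                                                 Error **errp)
    {
        ...
        /* DC events only ever go to the Dynamic Capacity log */
        uint8_t enc_log = CXL_EVENT_TYPE_DYNAMIC_CAP;
        ...
    }

    void qmp_cxl_add_dynamic_capacity(const char *path, uint8_t region_id,
                                      CXLDCExtentRecordList *records,
                                      Error **errp)
    {
        qmp_cxl_process_dynamic_capacity(path, DC_EVENT_ADD_CAPACITY, 0,
                                         region_id, records, errp);
    }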

> +                                    DC_EVENT_ADD_CAPACITY, 0,
> +                                    region_id, records, errp);
> +}
> +
> +void qmp_cxl_release_dynamic_capacity(const char *path, uint8_t region_id,
> +                                      CXLDCExtentRecordList  *records,
> +                                      Error **errp)
> +{
> +    qmp_cxl_process_dynamic_capacity(path, CXL_EVENT_LOG_DYNCAP,
> +                                     DC_EVENT_RELEASE_CAPACITY, 0,
> +                                     region_id, records, errp);
> +}
> +
>  static void ct3_class_init(ObjectClass *oc, void *data)
>  {


> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index df3511e91b..b84063d9f4 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -494,6 +494,7 @@ struct CXLType3Dev {
>           */
>          uint64_t total_capacity; /* 256M aligned */
>          CXLDCExtentList extents;
> +        CXLDCExtentList extents_pending;
>          uint32_t total_extent_count;
>          uint32_t ext_list_gen_seq;



>  #endif /* CXL_EVENTS_H */
> diff --git a/qapi/cxl.json b/qapi/cxl.json
> index 8cc4c72fa9..2645004666 100644
> --- a/qapi/cxl.json
> +++ b/qapi/cxl.json
> @@ -19,13 +19,16 @@
>  #
>  # @fatal: Fatal Event Log
>  #
> +# @dyncap: Dynamic Capacity Event Log
> +#
>  # Since: 8.1
>  ##
>  { 'enum': 'CxlEventLog',
>    'data': ['informational',
>             'warning',
>             'failure',
> -           'fatal']
> +           'fatal',
> +           'dyncap']

Does this have the side effect of letting us inject error events
onto the dynamic capacity log? 

>   }
>  
>  ##
> @@ -361,3 +364,59 @@
>  ##
>  {'command': 'cxl-inject-correctable-error',
>   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
...

> +##
> +# @cxl-add-dynamic-capacity:
> +#
> +# Command to start add dynamic capacity extents flow. The device will
> +# have to acknowledged the acceptance of the extents before they are usable.
> +#
> +# @path: CXL DCD canonical QOM path
> +# @region-id: id of the region where the extent to add
> +# @extents: Extents to add
> +#
> +# Since : 9.0

9.1

> +##
> +{ 'command': 'cxl-add-dynamic-capacity',
> +  'data': { 'path': 'str',
> +            'region-id': 'uint8',
> +            'extents': [ 'CXLDCExtentRecord' ]
> +           }
> +}
> +
> +##
> +# @cxl-release-dynamic-capacity:
> +#
> +# Command to start release dynamic capacity extents flow. The host will
> +# need to respond to indicate that it has released the capacity before it
> +# is made unavailable for read and write and can be re-added.
> +#
> +# @path: CXL DCD canonical QOM path
> +# @region-id: id of the region where the extent to release
> +# @extents: Extents to release
> +#
> +# Since : 9.0

9.1

> +##
> +{ 'command': 'cxl-release-dynamic-capacity',
> +  'data': { 'path': 'str',
> +            'region-id': 'uint8',
> +            'extents': [ 'CXLDCExtentRecord' ]
> +           }
> +}



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2024-04-03 18:16   ` Gregory Price
@ 2024-04-05 12:27       ` Jonathan Cameron via
  0 siblings, 0 replies; 65+ messages in thread
From: Jonathan Cameron @ 2024-04-05 12:27 UTC (permalink / raw)
  To: Gregory Price
  Cc: nifan.cxl, qemu-devel, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
	wj28.lee, Fan Ni

On Wed, 3 Apr 2024 14:16:25 -0400
Gregory Price <gregory.price@memverge.com> wrote:

A few follow up comments.

> On Mon, Mar 25, 2024 at 12:02:27PM -0700, nifan.cxl@gmail.com wrote:
> > From: Fan Ni <fan.ni@samsung.com>
> > 
> > To simulate FM functionalities for initiating Dynamic Capacity Add
> > (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> > r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> > add/release dynamic capacity extents requests.
> >   
> ... snip 
> > +
> > +/*
> > + * The main function to process dynamic capacity event. Currently DC extents
> > + * add/release requests are processed.
> > + */
> > +static void qmp_cxl_process_dynamic_capacity(const char *path, CxlEventLog log,
> > +                                             CXLDCEventType type, uint16_t hid,
> > +                                             uint8_t rid,
> > +                                             CXLDCExtentRecordList *records,
> > +                                             Error **errp)
> > +{  
> ... snip 
> > +    /* Sanity check and count the extents */
> > +    list = records;
> > +    while (list) {
> > +        offset = list->value->offset;
> > +        len = list->value->len;
> > +        dpa = offset + dcd->dc.regions[rid].base;
> > +
> > +        if (len == 0) {
> > +            error_setg(errp, "extent with 0 length is not allowed");
> > +            return;
> > +        }
> > +
> > +        if (offset % block_size || len % block_size) {
> > +            error_setg(errp, "dpa or len is not aligned to region block size");
> > +            return;
> > +        }
> > +
> > +        if (offset + len > dcd->dc.regions[rid].len) {
> > +            error_setg(errp, "extent range is beyond the region end");
> > +            return;
> > +        }
> > +
> > +        /* No duplicate or overlapped extents are allowed */
> > +        if (test_any_bits_set(blk_bitmap, offset / block_size,
> > +                              len / block_size)) {
> > +            error_setg(errp, "duplicate or overlapped extents are detected");
> > +            return;
> > +        }
> > +        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> > +
> > +        num_extents++;  
> 
> I think num_extents is always equal to the length of the list, otherwise
> this code will return with error.
> 
> Nitpick:
> This can be moved to the bottom w/ `list = list->next` to express that a
> little more clearly.
> 
> > +        if (type == DC_EVENT_RELEASE_CAPACITY) {
> > +            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents_pending,
> > +                                               dpa, len)) {
> > +                error_setg(errp,
> > +                           "cannot release extent with pending DPA range");
> > +                return;
> > +            }
> > +            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents,
> > +                                                dpa, len)) {
> > +                error_setg(errp,
> > +                           "cannot release extent with non-existing DPA range");
> > +                return;
> > +            }
> > +        }
> > +        list = list->next;
> > +    }
> > +
> > +    if (num_extents == 0) {  
> 
> Since num_extents is always the length of the list, this is equivalent to
> `if (!records)` prior to the while loop. Makes it a little more clear that:
> 
> 1. There must be at least 1 extent
> 2. All extents must be valid for the command to be serviced.

Agreed.

> 
> > +        error_setg(errp, "no valid extents to send to process");
> > +        return;
> > +    }
> > +  
> 
> I'm looking at adding the MHD extensions around this point, e.g.:
> 
> /* If MHD cannot allocate requested extents, the cmd fails */
> if (type == DC_EVENT_ADD_CAPACITY && dcd->mhd_dcd_extents_allocate &&
>     num_extents != dcd->mhd_dcd_extents_allocate(...))
> 	return;
> 
> where mhd_dcd_extents_allocate checks the MHD block bitmap and tags
> for correctness (shared // no double-allocations, etc). On success,
> it garuantees proper ownership.
> 
> the release path would then be done in the release response path from
> the host, as opposed to the release event injection.

I think it would be polite to check whether the QMP command on release
is asking something plausible - makes for an easier
to use QMP interface.  I guess it's not strictly required though.
What races are there on release?  We aren't supporting force release
for now, and for anything else, it's host specific (unlike add where
the extra rules kick in).   As such I 'think' a check at command
time will be valid as long as the host hasn't done an async
release of capacity between that and the event record.  That
is a race we always have and the host should at most log it and
not release capacity twice.

> 
> Do you see any issues with that flow?
> 
> > +    /* Create extent list for event being passed to host */
> > +    i = 0;
> > +    list = records;
> > +    extents = g_new0(CXLDCExtentRaw, num_extents);
> > +    while (list) {
> > +        offset = list->value->offset;
> > +        len = list->value->len;
> > +        dpa = dcd->dc.regions[rid].base + offset;
> > +
> > +        extents[i].start_dpa = dpa;
> > +        extents[i].len = len;
> > +        memset(extents[i].tag, 0, 0x10);
> > +        extents[i].shared_seq = 0;
> > +        list = list->next;
> > +        i++;
> > +    }
> > +
> > +    /*
> > +     * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
> > +     *
> > +     * All Dynamic Capacity event records shall set the Event Record Severity
> > +     * field in the Common Event Record Format to Informational Event. All
> > +     * Dynamic Capacity related events shall be logged in the Dynamic Capacity
> > +     * Event Log.
> > +     */
> > +    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
> > +                            cxl_device_get_timestamp(&dcd->cxl_dstate));
> > +
> > +    dCap.type = type;
> > +    /* FIXME: for now, validity flag is cleared */
> > +    dCap.validity_flags = 0;
> > +    stw_le_p(&dCap.host_id, hid);
> > +    /* only valid for DC_REGION_CONFIG_UPDATED event */
> > +    dCap.updated_region_id = 0;
> > +    /*
> > +     * FIXME: for now, the "More" flag is cleared as there is only one
> > +     * extent associating with each record and tag-based release is
> > +     * not supported.
> > +     */
> > +    dCap.flags = 0;
> > +    for (i = 0; i < num_extents; i++) {
> > +        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
> > +               sizeof(CXLDCExtentRaw));
> > +
> > +        if (type == DC_EVENT_ADD_CAPACITY) {
> > +            cxl_insert_extent_to_extent_list(&dcd->dc.extents_pending,
> > +                                             extents[i].start_dpa,
> > +                                             extents[i].len,
> > +                                             extents[i].tag,
> > +                                             extents[i].shared_seq);
> > +        }
> > +
> > +        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
> > +                             (CXLEventRecordRaw *)&dCap)) {  
> 
> Pardon if I missed a prior discussion about this, but what happens to
> pending events in the scenario where cxl_event_insert fails?

For an add or release, error returned to the FM that tried it.
For Force release, carry on regardless (setting overflow etc).
Host goes boom but that probably happens anyway :)

Jonathan


> 
> ~Gregory


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 10/12] hw/mem/cxl_type3: Add dpa range validation for accesses to DC regions
  2024-03-25 19:02 ` [PATCH v6 10/12] hw/mem/cxl_type3: Add dpa range validation for accesses to DC regions nifan.cxl
@ 2024-04-05 12:29     ` Jonathan Cameron via
  2024-04-12 22:54   ` Gregory Price
  1 sibling, 0 replies; 65+ messages in thread
From: Jonathan Cameron @ 2024-04-05 12:29 UTC (permalink / raw)
  To: nifan.cxl
  Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
	wj28.lee, Fan Ni

On Mon, 25 Mar 2024 12:02:28 -0700
nifan.cxl@gmail.com wrote:

> From: Fan Ni <fan.ni@samsung.com>
> 
> All dpa ranges in the DC regions are invalid to access until an extent

Let's be more consistent for commit logs and use DPA DC HPA etc all
caps. It's a bit of a mixture in this series at the moment.

> covering the range has been added.
I'd expand that to 'has been successfully accepted by the host.'

> Add a bitmap for each region to
> record whether a DC block in the region has been backed by DC extent.
> For the bitmap, a bit in the bitmap represents a DC block. When a DC
> extent is added, all the bits of the blocks in the extent will be set,
> which will be cleared when the extent is released.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Fan Ni <fan.ni@samsung.com>


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 11/12] hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support
  2024-03-25 19:02 ` [PATCH v6 11/12] hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support nifan.cxl
@ 2024-04-05 12:32     ` Jonathan Cameron via
  2024-04-05 12:32     ` Jonathan Cameron via
  1 sibling, 0 replies; 65+ messages in thread
From: Jonathan Cameron @ 2024-04-05 12:32 UTC (permalink / raw)
  To: nifan.cxl
  Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
	wj28.lee, Fan Ni

On Mon, 25 Mar 2024 12:02:29 -0700
nifan.cxl@gmail.com wrote:

> From: Fan Ni <fan.ni@samsung.com>
> 
> With the change, we extend the extent release mailbox command processing
> to allow more flexible release. As long as the DPA range of the extent to
> release is covered by accepted extent(s) in the device, the release can be
> performed.
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
Nothing to add from me.
Nice and simple which is great.
Jonathan

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 12/12] hw/mem/cxl_type3: Allow to release extent superset in QMP interface
  2024-03-25 19:02 ` [PATCH v6 12/12] hw/mem/cxl_type3: Allow to release extent superset in QMP interface nifan.cxl
@ 2024-04-05 12:33     ` Jonathan Cameron via
  0 siblings, 0 replies; 65+ messages in thread
From: Jonathan Cameron @ 2024-04-05 12:33 UTC (permalink / raw)
  To: nifan.cxl
  Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
	wj28.lee, Fan Ni

On Mon, 25 Mar 2024 12:02:30 -0700
nifan.cxl@gmail.com wrote:

> From: Fan Ni <fan.ni@samsung.com>
> 
> Before the change, the QMP interface used for add/release DC extents
> only allows to release an extent whose DPA range is contained by a single
> accepted extent in the device.
> 
> With the change, we relax the constraints.  As long as the DPA range of
> the extent is covered by accepted extents, we allow the release.
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>

Nice.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  hw/mem/cxl_type3.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 2628a6f50f..62c2022477 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -1935,8 +1935,7 @@ static void qmp_cxl_process_dynamic_capacity(const char *path, CxlEventLog log,
>                             "cannot release extent with pending DPA range");
>                  return;
>              }
> -            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents,
> -                                                dpa, len)) {
> +            if (!ct3_test_region_block_backed(dcd, dpa, len)) {
>                  error_setg(errp,
>                             "cannot release extent with non-existing DPA range");
>                  return;


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2024-04-05 12:27       ` Jonathan Cameron via
  (?)
@ 2024-04-05 16:07       ` Gregory Price
  2024-04-05 17:44           ` Jonathan Cameron via
  -1 siblings, 1 reply; 65+ messages in thread
From: Gregory Price @ 2024-04-05 16:07 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: nifan.cxl, qemu-devel, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
	wj28.lee, Fan Ni

On Fri, Apr 05, 2024 at 01:27:19PM +0100, Jonathan Cameron wrote:
> On Wed, 3 Apr 2024 14:16:25 -0400
> Gregory Price <gregory.price@memverge.com> wrote:
> 
> A few follow up comments.
> 
> > 
> > > +        error_setg(errp, "no valid extents to send to process");
> > > +        return;
> > > +    }
> > > +  
> > 
> > I'm looking at adding the MHD extensions around this point, e.g.:
> > 
> > /* If MHD cannot allocate requested extents, the cmd fails */
> > if (type == DC_EVENT_ADD_CAPACITY && dcd->mhd_dcd_extents_allocate &&
> >     num_extents != dcd->mhd_dcd_extents_allocate(...))
> > 	return;
> > 
> > where mhd_dcd_extents_allocate checks the MHD block bitmap and tags
> > for correctness (shared // no double-allocations, etc). On success,
> > it garuantees proper ownership.
> > 
> > the release path would then be done in the release response path from
> > the host, as opposed to the release event injection.
> 
> I think it would be polite to check if the QMP command on release
> for whether it is asking something plausible - makes for an easier
> to user QMP interface.  I guess it's not strictly required though.
> What races are there on release?

The only real critical section, barring force-release being supported,
is when you clear the bits in the device allowing new requests to swipe
those blocks. The appropriate place appears to be after the host kernel
has responded to the release extent request.

Also need to handle the case of multiple add-requests contending for the
same region, but that's just an "oops failed to get all the bits, roll
back" scenario - easy to handle.

Could go coarse-grained to just lock access to the bitmap entirely while
operating on it, or be fancy and use atomics to go lockless. The latter
code already exists in the Niagara model for reference.
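For reference, the lockless flavor is basically an atomic fetch-or per
bitmap word - something like this sketch (names invented, not the actual
Niagara code):

    /* Try to claim one DC block in the shared bitmap; false if already owned */
    static bool mhd_try_claim_block(uint64_t *bitmap, uint64_t blk)
    {
        uint64_t bit = 1ULL << (blk % 64);

        return !(qatomic_fetch_or(&bitmap[blk / 64], bit) & bit);
    }

with a matching qatomic_and(&bitmap[blk / 64], ~bit) to drop the claim
again when a later block in the same request fails.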

> We aren't support force release
> for now, and for anything else, it's host specific (unlike add where
> the extra rules kick in).   AS such I 'think' a check at command
> time will be valid as long as the host hasn't done an async
> release of capacity between that and the event record.  That
> is a race we always have and the host should at most log it and
> not release capacity twice.
>

Borrowing from Ira's flow chart, here are the pieces I believe are
needed to implement MHD support for DCD.

Orchestrator      FM         Device       Host Kernel    Host User

    |             |           |            |              |
    |-- Add ----->|-- Add --->A--- Add --->|              |
    |  Capacity   |  Extent   |   Extent   |              |
    |             |           |            |              |
    |             |<--Accept--B<--Accept --|              |
    |             |   Extent  |   Extent   |              |
    |             |           |            |              |
    |             |     ... snip ...       |              |
    |             |           |            |              |
    |-- Remove -->|--Release->C---Release->|              |
    |  Capacity   |  Extent   |   Extent   |              |
    |             |           |            |              |
    |             |<-Release--D<--Release--|              |
    |             |  Extent   |   Extent   |              |
    |             |           |            |              |

1. (A) Upon Device Receiving Add Capacity Request
   a. the device sanity checks the request against local mappings
   b. the mhd hook is called to sanity check against global mappings
   c. the mhd bitmap is updated, marking the capacity owned by that head

   function: qmp_cxl_process_dynamic_capacity

2. (B) Upon Device Receiving Add Dynamic Capacity Response
   a. accepted extents are compared to the original request
   b. not accepted extents are cleared from the bitmap (local and MHD)
   (Note: My understanding is that for now each request = 1 extent)

   function: cmd_dcd_add_dyn_cap_rsp

3. (C) Upon Device receiving Release Dynamic Capacity Request
   a. check for a pending release request. If exists, error.
   b. check that the bits in the MHD bitmap are actually set

   function: qmp_cxl_process_dynamic_capacity

4. (D) Upon Device receiving Release Dynamic Capacity Response
   a. clear the bits in the mhd bitmap
   b. remove the pending request from the pending list

   function: cmd_dcd_release_dyn_cap

Something to note: The MHD bitmap is essentially the same as the
existing DCD extent bitmap - except that it is located in a shared
region of memory (mmap file, shm, whatever - pick one).

Maybe it's worth abstracting out the bitmap twiddling to make that
backable by a file mmap'd SHARED and use atomics to twiddle the bits?

That would be about 90% of the way to MH-DCD.

Maybe flock() could be used for coarse locking on a shared bitmap
in the short term?  This mitigates your concern of using shm.h as
the coordination piece, though I'm not sure how portable flock() is.
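Roughly what I mean (sketch; the path, sizes and error handling are
invented/omitted):

    /* Shared, file-backed bitmap with coarse flock() locking */
    int fd = open("/dev/shm/cxl-mhd-bitmap", O_RDWR | O_CREAT, 0600);
    ftruncate(fd, bitmap_bytes);
    unsigned long *bm = mmap(NULL, bitmap_bytes, PROT_READ | PROT_WRITE,
                             MAP_SHARED, fd, 0);

    flock(fd, LOCK_EX);     /* coarse lock around any bitmap update */
    bitmap_set(bm, dpa / block_size, len / block_size);
    flock(fd, LOCK_UN);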

~Gregory

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2024-04-05 16:07       ` Gregory Price
@ 2024-04-05 17:44           ` Jonathan Cameron via
  0 siblings, 0 replies; 65+ messages in thread
From: Jonathan Cameron @ 2024-04-05 17:44 UTC (permalink / raw)
  To: Gregory Price
  Cc: nifan.cxl, qemu-devel, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
	wj28.lee, Fan Ni

On Fri, 5 Apr 2024 12:07:45 -0400
Gregory Price <gregory.price@memverge.com> wrote:

> On Fri, Apr 05, 2024 at 01:27:19PM +0100, Jonathan Cameron wrote:
> > On Wed, 3 Apr 2024 14:16:25 -0400
> > Gregory Price <gregory.price@memverge.com> wrote:
> > 
> > A few follow up comments.
> >   
> > >   
> > > > +        error_setg(errp, "no valid extents to send to process");
> > > > +        return;
> > > > +    }
> > > > +    
> > > 
> > > I'm looking at adding the MHD extensions around this point, e.g.:
> > > 
> > > /* If MHD cannot allocate requested extents, the cmd fails */
> > > if (type == DC_EVENT_ADD_CAPACITY && dcd->mhd_dcd_extents_allocate &&
> > >     num_extents != dcd->mhd_dcd_extents_allocate(...))
> > > 	return;
> > > 
> > > where mhd_dcd_extents_allocate checks the MHD block bitmap and tags
> > > for correctness (shared // no double-allocations, etc). On success,
> > > it garuantees proper ownership.
> > > 
> > > the release path would then be done in the release response path from
> > > the host, as opposed to the release event injection.  
> > 
> > I think it would be polite to check if the QMP command on release
> > for whether it is asking something plausible - makes for an easier
> > to user QMP interface.  I guess it's not strictly required though.
> > What races are there on release?  
> 
> The only real critical section, barring force-release beign supported,
> is when you clear the bits in the device allowing new requests to swipe
> those blocks. The appropriate place appears to be after the host kernel
> has responded to the release extent request.

Agreed you can't release till then, but you can check if it's going to 
work.  I think that's worth doing for ease of use reasons.

> 
> Also need to handle the case of multiple add-requests contending for the
> same region, but that's just an "oops failed to get all the bits, roll
> back" scenario - easy to handle.
> 
> Could go coarse-grained to just lock access to the bitmap entirely while
> operating on it, or be fancy and use atomics to go lockless. The latter
> code already exists in the Niagara model for reference.

I'm fine either way, though I'd just use a lock in an initial version.

> 
> > We aren't support force release
> > for now, and for anything else, it's host specific (unlike add where
> > the extra rules kick in).   AS such I 'think' a check at command
> > time will be valid as long as the host hasn't done an async
> > release of capacity between that and the event record.  That
> > is a race we always have and the host should at most log it and
> > not release capacity twice.
> >  
> 
> Borrowing from the Ira's flow chart, here are the pieces I believe are
> needed to implement MHD support for DCD.
> 
> Orchestrator      FM         Device       Host Kernel    Host User
> 
>     |             |           |            |              |
>     |-- Add ----->|-- Add --->A--- Add --->|              |
>     |  Capacity   |  Extent   |   Extent   |              |
>     |             |           |            |              |
>     |             |<--Accept--B<--Accept --|              |
>     |             |   Extent  |   Extent   |              |
>     |             |           |            |              |
>     |             |     ... snip ...       |              |
>     |             |           |            |              |
>     |-- Remove -->|--Release->C---Release->|              |
>     |  Capacity   |  Extent   |   Extent   |              |
>     |             |           |            |              |
>     |             |<-Release--D<--Release--|              |
>     |             |  Extent   |   Extent   |              |
>     |             |           |            |              |
> 
> 1. (A) Upon Device Receiving Add Capacity Request
>    a. the device sanity checks the request against local mappings
>    b. the mhd hook is called to sanity check against global mappings
>    c. the mhd bitmap is updated, marking the capacity owned by that head
> 
>    function: qmp_cxl_process_dynamic_capacity
> 
> 2. (B) Upon Device Receiving Add Dynamic Capacity Response
>    a. accepted extents are compared to the original request
>    b. not accepted extents are cleared from the bitmap (local and MHD)
>    (Note: My understanding is that for now each request = 1 extent)

Yeah but that is a restriction I think we need to solve soon.

> 
>    function: cmd_dcd_add_dyn_cap_rsp
> 
> 3. (C) Upon Device receiving Release Dynamic Capacity Request
>    a. check for a pending release request. If exists, error.

Not sure that's necessary - can queue as long as the head
can track if the bits are in a pending release state.

>    b. check that the bits in the MHD bitmap are actually set
Good.
> 
>    function: qmp_cxl_process_dynamic_capacity
> 
> 4. (D) Upon Device receiving Release Dynamic Capacity Response
>    a. clear the bits in the mhd bitmap
>    b. remove the pending request from the pending list
> 
>    function: cmd_dcd_release_dyn_cap
> 
> Something to note: The MHD bitmap is essentially the same as the
> existing DCD extent bitmap - except that it is located in a shared
> region of memory (mmap file, shm, whatever - pick one).

I think you will ideally also have a per head one to track head access
to the things offered by the mhd.

> 
> Maybe it's worth abstracting out the bitmap twiddling to make that
> backable by a file mmap'd SHARED and use atomics to twiddle the bits?
> 
> That would be about 90% of the way to MH-DCD.

> 
> Maybe flock() could be used for coarse locking on the a shared bitmap
> in the short term?  This mitigates your concern of using shm.h as
> the coordination piece, though i'm not sure how portable flock() is.
Sounds nice, but you are wandering into that pesky userspace stuff
where I'd have to google a lot to understand :)

Jonathan

> 
> ~Gregory


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2024-04-05 17:44           ` Jonathan Cameron via
  (?)
@ 2024-04-05 18:09           ` Gregory Price
  2024-04-09 16:10               ` Jonathan Cameron via
  -1 siblings, 1 reply; 65+ messages in thread
From: Gregory Price @ 2024-04-05 18:09 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: nifan.cxl, qemu-devel, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
	wj28.lee, Fan Ni

On Fri, Apr 05, 2024 at 06:44:52PM +0100, Jonathan Cameron wrote:
> On Fri, 5 Apr 2024 12:07:45 -0400
> Gregory Price <gregory.price@memverge.com> wrote:
> 
> > 3. (C) Upon Device receiving Release Dynamic Capacity Request
> >    a. check for a pending release request. If exists, error.
> 
> Not sure that's necessary - can queue as long as the head
> can track if the bits are in a pending release state.
> 

Yeah probably it's fine to just queue the event and everything
downstream just handles it.

> >    b. check that the bits in the MHD bitmap are actually set
> Good.
> > 
> >    function: qmp_cxl_process_dynamic_capacity
> > 
> > 4. (D) Upon Device receiving Release Dynamic Capacity Response
> >    a. clear the bits in the mhd bitmap
> >    b. remove the pending request from the pending list
> > 
> >    function: cmd_dcd_release_dyn_cap
> > 
> > Something to note: The MHD bitmap is essentially the same as the
> > existing DCD extent bitmap - except that it is located in a shared
> > region of memory (mmap file, shm, whatever - pick one).
> 
> I think you will ideally also have a per head one to track head access
> to the things offered by the mhd.
> 

Generally I try not to duplicate state, reduces consistency problems.

You do still need a shared memory state and a per-head state to capture
per-head data, but the allocation bitmap is really device-global state.

Either way you have a race condition when checking the bitmap during a
memory access in the process of adding/releasing capacity - but that's
more an indication of bad host behavior than it is of a bug in the
implementation of the emulated device. Probably we don't need to
read-lock the bitmap (for access validation), only write-lock.

My preference, for what it's worth, would be to have a single bitmap
and have it be anonymous memory for Single-head and file-backed for
Multi-head.  I'll have to work out the locking mechanism.
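e.g. something like this (sketch - the property name and where the bitmap
hangs off the device are invented):

    /* Single-head: private anonymous bitmap; MHD: shared file-backed bitmap */
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
    int fd = -1;

    if (dcd->mhd_state_file) {            /* hypothetical MHD-only property */
        fd = open(dcd->mhd_state_file, O_RDWR);
        flags = MAP_SHARED;
    }
    region->blk_bitmap = mmap(NULL, bitmap_bytes, PROT_READ | PROT_WRITE,
                              flags, fd, 0);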

~Gregory

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2024-04-05 18:09           ` Gregory Price
@ 2024-04-09 16:10               ` Jonathan Cameron via
  0 siblings, 0 replies; 65+ messages in thread
From: Jonathan Cameron @ 2024-04-09 16:10 UTC (permalink / raw)
  To: Gregory Price
  Cc: nifan.cxl, qemu-devel, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
	wj28.lee, Fan Ni

On Fri, 5 Apr 2024 14:09:23 -0400
Gregory Price <gregory.price@memverge.com> wrote:

> On Fri, Apr 05, 2024 at 06:44:52PM +0100, Jonathan Cameron wrote:
> > On Fri, 5 Apr 2024 12:07:45 -0400
> > Gregory Price <gregory.price@memverge.com> wrote:
> >   
> > > 3. (C) Upon Device receiving Release Dynamic Capacity Request
> > >    a. check for a pending release request. If exists, error.  
> > 
> > Not sure that's necessary - can queue as long as the head
> > can track if the bits are in a pending release state.
> >   
> 
> Yeah probably it's fine to just queue the event and everything
> downstream just handles it.
> 
> > >    b. check that the bits in the MHD bitmap are actually set  
> > Good.  
> > > 
> > >    function: qmp_cxl_process_dynamic_capacity
> > > 
> > > 4. (D) Upon Device receiving Release Dynamic Capacity Response
> > >    a. clear the bits in the mhd bitmap
> > >    b. remove the pending request from the pending list
> > > 
> > >    function: cmd_dcd_release_dyn_cap
> > > 
> > > Something to note: The MHD bitmap is essentially the same as the
> > > existing DCD extent bitmap - except that it is located in a shared
> > > region of memory (mmap file, shm, whatever - pick one).  
> > 
> > I think you will ideally also have a per head one to track head access
> > to the things offered by the mhd.
> >   
> 
> Generally I try not to duplicate state, reduces consistency problems.
> 
> You do still need a shared memory state and a per-head state to capture
> per-head data, but the allocation bitmap is really device-global state.

There is a separation between 'offered' to a head and 'accepted on that head'.
Sure you could track all outstanding offers (if you let more than one be
outstanding) in the shared memory; it just seemed easier to do that in the
per-head element.
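Roughly, per head, something like (names invented):

    /* Extents offered to this head but not yet accepted/rejected by it */
    typedef struct MHDPendingOffer {
        uint64_t dpa;
        uint64_t len;
        QTAILQ_ENTRY(MHDPendingOffer) node;
    } MHDPendingOffer;

    QTAILQ_HEAD(, MHDPendingOffer) pending_offers;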


> 
> Either way you have a race condition when checking the bitmap during a
> memory access in the process of adding/releasing capacity - but that's
> more an indication of bad host behavior than it is of a bug in the
> implementatio of the emulated device. Probably we don't need to
> read-lock the bitmap (for access validation), only write-lock.
> 
> My preference, for what it's worth, would be to have a single bitmap
> and have it be anonymous-memory for Single-head and file-backed for
> for Multi-head.  I'll have to work out the locking mechanism.
I'll go with maybe until I see the code :)

J
> 
> ~Gregory


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 08/12] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response
  2024-04-04 13:32   ` Jørgen Hansen
  2024-04-05 11:12       ` Jonathan Cameron via
@ 2024-04-09 19:21     ` fan
  2024-04-15 17:56     ` fan
  2024-04-15 18:00     ` fan
  3 siblings, 0 replies; 65+ messages in thread
From: fan @ 2024-04-09 19:21 UTC (permalink / raw)
  To: Jørgen Hansen
  Cc: nifan.cxl, qemu-devel, jonathan.cameron, linux-cxl,
	gregory.price, ira.weiny, dan.j.williams, a.manzanares, dave,
	nmtadam.samsung, jim.harris, wj28.lee, Fan Ni

On Thu, Apr 04, 2024 at 01:32:23PM +0000, Jørgen Hansen wrote:
> On 3/25/24 20:02, nifan.cxl@gmail.com wrote:
> > From: Fan Ni <fan.ni@samsung.com>
> > 
> > Per CXL spec 3.1, two mailbox commands are implemented:
> > Add Dynamic Capacity Response (Opcode 4802h) 8.2.9.9.9.3, and
> > Release Dynamic Capacity (Opcode 4803h) 8.2.9.9.9.4.
> > 
> > For the process of the above two commands, we use two-pass approach.
> > Pass 1: Check whether the input payload is valid or not; if not, skip
> >          Pass 2 and return mailbox process error.
> > Pass 2: Do the real work--add or release extents, respectively.
> > 
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
> > ---
> >   hw/cxl/cxl-mailbox-utils.c  | 433 +++++++++++++++++++++++++++++++++++-
> >   hw/mem/cxl_type3.c          |  11 +
> >   include/hw/cxl/cxl_device.h |   4 +
> >   3 files changed, 444 insertions(+), 4 deletions(-)
> > 
> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > index 30ef46a036..a9eca516c8 100644
> > --- a/hw/cxl/cxl-mailbox-utils.c
> > +++ b/hw/cxl/cxl-mailbox-utils.c
> > @@ -19,6 +19,7 @@
> >   #include "qemu/units.h"
> >   #include "qemu/uuid.h"
> >   #include "sysemu/hostmem.h"
> > +#include "qemu/range.h"
> > 
> >   #define CXL_CAPACITY_MULTIPLIER   (256 * MiB)
> >   #define CXL_DC_EVENT_LOG_SIZE 8
> > @@ -85,6 +86,8 @@ enum {
> >       DCD_CONFIG  = 0x48,
> >           #define GET_DC_CONFIG          0x0
> >           #define GET_DYN_CAP_EXT_LIST   0x1
> > +        #define ADD_DYN_CAP_RSP        0x2
> > +        #define RELEASE_DYN_CAP        0x3
> >       PHYSICAL_SWITCH = 0x51,
> >           #define IDENTIFY_SWITCH_DEVICE      0x0
> >           #define GET_PHYSICAL_PORT_STATE     0x1
> > @@ -1400,6 +1403,422 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
> >       return CXL_MBOX_SUCCESS;
> >   }
> > 
> > +/*
> > + * Check whether any bit between addr[nr, nr+size) is set,
> > + * return true if any bit is set, otherwise return false
> > + */
> > +static bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
> > +                              unsigned long size)
> > +{
> > +    unsigned long res = find_next_bit(addr, size + nr, nr);
> > +
> > +    return res < nr + size;
> > +}
> > +
> > +CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
> > +{
> > +    int i;
> > +    CXLDCRegion *region = &ct3d->dc.regions[0];
> > +
> > +    if (dpa < region->base ||
> > +        dpa >= region->base + ct3d->dc.total_capacity) {
> > +        return NULL;
> > +    }
> > +
> > +    /*
> > +     * CXL r3.1 section 9.13.3: Dynamic Capacity Device (DCD)
> > +     *
> > +     * Regions are used in increasing-DPA order, with Region 0 being used for
> > +     * the lowest DPA of Dynamic Capacity and Region 7 for the highest DPA.
> > +     * So check from the last region to find where the dpa belongs. Extents that
> > +     * cross multiple regions are not allowed.
> > +     */
> > +    for (i = ct3d->dc.num_regions - 1; i >= 0; i--) {
> > +        region = &ct3d->dc.regions[i];
> > +        if (dpa >= region->base) {
> > +            if (dpa + len > region->base + region->len) {
> > +                return NULL;
> > +            }
> > +            return region;
> > +        }
> > +    }
> > +
> > +    return NULL;
> > +}
> > +
> > +static void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
> > +                                             uint64_t dpa,
> > +                                             uint64_t len,
> > +                                             uint8_t *tag,
> > +                                             uint16_t shared_seq)
> > +{
> > +    CXLDCExtent *extent;
> > +
> > +    extent = g_new0(CXLDCExtent, 1);
> > +    extent->start_dpa = dpa;
> > +    extent->len = len;
> > +    if (tag) {
> > +        memcpy(extent->tag, tag, 0x10);
> > +    }
> > +    extent->shared_seq = shared_seq;
> > +
> > +    QTAILQ_INSERT_TAIL(list, extent, node);
> > +}
> > +
> > +void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
> > +                                        CXLDCExtent *extent)
> > +{
> > +    QTAILQ_REMOVE(list, extent, node);
> > +    g_free(extent);
> > +}
> > +
> > +/*
> > + * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload
> > + * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload
> > + */
> > +typedef struct CXLUpdateDCExtentListInPl {
> > +    uint32_t num_entries_updated;
> > +    uint8_t flags;
> > +    uint8_t rsvd[3];
> > +    /* CXL r3.1 Table 8-169: Updated Extent */
> > +    struct {
> > +        uint64_t start_dpa;
> > +        uint64_t len;
> > +        uint8_t rsvd[8];
> > +    } QEMU_PACKED updated_entries[];
> > +} QEMU_PACKED CXLUpdateDCExtentListInPl;
> > +
> > +/*
> > + * For the extents in the extent list to operate, check whether they are valid
> > + * 1. The extent should be in the range of a valid DC region;
> > + * 2. The extent should not cross multiple regions;
> > + * 3. The start DPA and the length of the extent should align with the block
> > + * size of the region;
> > + * 4. The address range of multiple extents in the list should not overlap.
> > + */
> > +static CXLRetCode cxl_detect_malformed_extent_list(CXLType3Dev *ct3d,
> > +        const CXLUpdateDCExtentListInPl *in)
> > +{
> > +    uint64_t min_block_size = UINT64_MAX;
> > +    CXLDCRegion *region = &ct3d->dc.regions[0];
> > +    CXLDCRegion *lastregion = &ct3d->dc.regions[ct3d->dc.num_regions - 1];
> > +    g_autofree unsigned long *blk_bitmap = NULL;
> > +    uint64_t dpa, len;
> > +    uint32_t i;
> > +
> > +    for (i = 0; i < ct3d->dc.num_regions; i++) {
> > +        region = &ct3d->dc.regions[i];
> > +        min_block_size = MIN(min_block_size, region->block_size);
> > +    }
> > +
> > +    blk_bitmap = bitmap_new((lastregion->base + lastregion->len -
> > +                             ct3d->dc.regions[0].base) / min_block_size);
> > +
> > +    for (i = 0; i < in->num_entries_updated; i++) {
> > +        dpa = in->updated_entries[i].start_dpa;
> > +        len = in->updated_entries[i].len;
> > +
> > +        region = cxl_find_dc_region(ct3d, dpa, len);
> > +        if (!region) {
> > +            return CXL_MBOX_INVALID_PA;
> > +        }
> > +
> > +        dpa -= ct3d->dc.regions[0].base;
> > +        if (dpa % region->block_size || len % region->block_size) {
> > +            return CXL_MBOX_INVALID_EXTENT_LIST;
> > +        }
> > +        /* the dpa range already covered by some other extents in the list */
> > +        if (test_any_bits_set(blk_bitmap, dpa / min_block_size,
> > +            len / min_block_size)) {
> > +            return CXL_MBOX_INVALID_EXTENT_LIST;
> > +        }
> > +        bitmap_set(blk_bitmap, dpa / min_block_size, len / min_block_size);
> > +   }
> > +
> > +    return CXL_MBOX_SUCCESS;
> > +}
> > +
> > +static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
> > +        const CXLUpdateDCExtentListInPl *in)
> > +{
> > +    uint32_t i;
> > +    CXLDCExtent *ent;
> > +    uint64_t dpa, len;
> > +    Range range1, range2;
> > +
> > +    for (i = 0; i < in->num_entries_updated; i++) {
> > +        dpa = in->updated_entries[i].start_dpa;
> > +        len = in->updated_entries[i].len;
> > +
> > +        range_init_nofail(&range1, dpa, len);
> > +
> > +        /*
> > +         * TODO: once the pending extent list is added, check against
> > +         * the list will be added here.
> > +         */
> > +
> > +        /* to-be-added range should not overlap with range already accepted */
> > +        QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
> > +            range_init_nofail(&range2, ent->start_dpa, ent->len);
> > +            if (range_overlaps_range(&range1, &range2)) {
> > +                return CXL_MBOX_INVALID_PA;
> > +            }
> > +        }
> > +    }
> > +    return CXL_MBOX_SUCCESS;
> > +}
> 
> Instead of iterating over all new extents and all existing extents, 
> couldn't this be rolled into cxl_detect_malformed_extent_list - the 
> bitmap created there summarizes all ranges of the new extents, so you 
> can just check that the existing (and pending) extents don't overlap 
> with anything in the bitmap? Or allow the bitmap to be returned and used 
> for this check, since cxl_detect_malformed_extent_list is also used on 
> release, where things aren't as simple.

Hi Jørgen,

Thanks for reviewing the code.
cxl_detect_malformed_extent_list only verifies the incoming extent
list in the command payload, while the dry run simulates the real
add/release operations, meaning it touches the in-device data
structures. They detect different types of mailbox errors, so I think it
is clearer to keep them separate. Also, as you can see in the following
patch, we have a bitmap for the purpose you mentioned above.

Regarding your suggestion below to reuse the tmp_list and avoid duplicating
the extent-iteration code for the real release, I think it would work and
can simplify the code. I am working on it and also need to check whether
it will cause problems once superset release is introduced, but for now
it seems very promising.
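
For reference, the rough direction I am experimenting with looks something
like the untested sketch below (built on the helpers in this patch; it may
well change once superset release comes into the picture):

/*
 * Untested sketch: if the release dry run hands back the tmp_list it built
 * (plus cnt_delta), the real release path can simply adopt that list
 * instead of repeating the per-extent splitting logic.
 */
static void cxl_dc_extent_list_adopt(CXLType3Dev *ct3d,
                                     CXLDCExtentList *tmp_list, int cnt_delta)
{
    CXLDCExtent *ent, *ent_next;

    /* Drop the currently accepted extents ... */
    QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
        cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
    }
    /* ... and move over the entries the dry run produced. */
    QTAILQ_FOREACH_SAFE(ent, tmp_list, node, ent_next) {
        QTAILQ_REMOVE(tmp_list, ent, node);
        QTAILQ_INSERT_TAIL(&ct3d->dc.extents, ent, node);
    }
    ct3d->dc.total_extent_count += cnt_delta;
}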

Once the code is completed and works as expected, I will post the
code here so you, Jonathan and others can take a look at it before I
send out the next version of the whole patchset.

Thanks,
Fan

> 
> > +
> > +/*
> > + * CXL r3.1 section 8.2.9.9.9.3: Add Dynamic Capacity Response (Opcode 4802h)
> > + * An extent is added to the extent list and becomes usable only after the
> > + * response is processed successfully
> > + */
> > +static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> > +                                          uint8_t *payload_in,
> > +                                          size_t len_in,
> > +                                          uint8_t *payload_out,
> > +                                          size_t *len_out,
> > +                                          CXLCCI *cci)
> > +{
> > +    CXLUpdateDCExtentListInPl *in = (void *)payload_in;
> > +    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
> > +    CXLDCExtentList *extent_list = &ct3d->dc.extents;
> > +    uint32_t i;
> > +    uint64_t dpa, len;
> > +    CXLRetCode ret;
> > +
> > +    if (in->num_entries_updated == 0) {
> > +        return CXL_MBOX_SUCCESS;
> > +    }
> 
> The mailbox processing in patch 2 converts from le explicitly, whereas 
> the mailbox commands here don't. Looking at the existing mailbox 
> commands, conversion doesn't seem to be rigorously applied, so maybe 
> that is OK?
> 
> > +
> > +    /* Adding extents causes exceeding device's extent tracking ability. */
> > +    if (in->num_entries_updated + ct3d->dc.total_extent_count >
> > +        CXL_NUM_EXTENTS_SUPPORTED) {
> > +        return CXL_MBOX_RESOURCES_EXHAUSTED;
> > +    }
> > +
> > +    ret = cxl_detect_malformed_extent_list(ct3d, in);
> > +    if (ret != CXL_MBOX_SUCCESS) {
> > +        return ret;
> > +    }
> > +
> > +    ret = cxl_dcd_add_dyn_cap_rsp_dry_run(ct3d, in);
> > +    if (ret != CXL_MBOX_SUCCESS) {
> > +        return ret;
> > +    }
> > +
> > +    for (i = 0; i < in->num_entries_updated; i++) {
> > +        dpa = in->updated_entries[i].start_dpa;
> > +        len = in->updated_entries[i].len;
> > +
> > +        cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
> > +        ct3d->dc.total_extent_count += 1;
> > +        /*
> > +         * TODO: we will add a pending extent list based on event log record
> > +         * and process the list according here.
> > +         */
> > +    }
> > +
> > +    return CXL_MBOX_SUCCESS;
> > +}
> > +
> > +/*
> > + * Copy extent list from src to dst
> > + * Return value: number of extents copied
> > + */
> > +static uint32_t copy_extent_list(CXLDCExtentList *dst,
> > +                                 const CXLDCExtentList *src)
> > +{
> > +    uint32_t cnt = 0;
> > +    CXLDCExtent *ent;
> > +
> > +    if (!dst || !src) {
> > +        return 0;
> > +    }
> > +
> > +    QTAILQ_FOREACH(ent, src, node) {
> > +        cxl_insert_extent_to_extent_list(dst, ent->start_dpa, ent->len,
> > +                                         ent->tag, ent->shared_seq);
> > +        cnt++;
> > +    }
> > +    return cnt;
> > +}
> > +
> > +static CXLRetCode cxl_dc_extent_release_dry_run(CXLType3Dev *ct3d,
> > +        const CXLUpdateDCExtentListInPl *in)
> > +{
> > +    CXLDCExtent *ent, *ent_next;
> > +    uint64_t dpa, len;
> > +    uint32_t i;
> > +    int cnt_delta = 0;
> > +    CXLDCExtentList tmp_list;
> > +    CXLRetCode ret = CXL_MBOX_SUCCESS;
> > +
> > +    if (in->num_entries_updated == 0) {
> > +        return CXL_MBOX_INVALID_INPUT;
> > +    }
> > +
> > +    QTAILQ_INIT(&tmp_list);
> > +    copy_extent_list(&tmp_list, &ct3d->dc.extents);
> > +
> > +    for (i = 0; i < in->num_entries_updated; i++) {
> > +        Range range;
> > +
> > +        dpa = in->updated_entries[i].start_dpa;
> > +        len = in->updated_entries[i].len;
> > +
> > +        while (len > 0) {
> > +            QTAILQ_FOREACH(ent, &tmp_list, node) {
> > +                range_init_nofail(&range, ent->start_dpa, ent->len);
> > +
> > +                if (range_contains(&range, dpa)) {
> > +                    uint64_t len1, len2, len_done = 0;
> > +                    uint64_t ent_start_dpa = ent->start_dpa;
> > +                    uint64_t ent_len = ent->len;
> > +                    /*
> > +                     * Found the exact extent or the subset of an existing
> > +                     * extent.
> > +                     */
> > +                    if (range_contains(&range, dpa + len - 1)) {
> > +                        len1 = dpa - ent->start_dpa;
> > +                        len2 = ent_start_dpa + ent_len - dpa - len;
> > +                        len_done = ent_len - len1 - len2;
> > +
> > +                        cxl_remove_extent_from_extent_list(&tmp_list, ent);
> > +                        cnt_delta--;
> > +
> > +                        if (len1) {
> > +                            cxl_insert_extent_to_extent_list(&tmp_list,
> > +                                                             ent_start_dpa,
> > +                                                             len1, NULL, 0);
> > +                            cnt_delta++;
> > +                        }
> > +                        if (len2) {
> > +                            cxl_insert_extent_to_extent_list(&tmp_list,
> > +                                                             dpa + len,
> > +                                                             len2, NULL, 0);
> > +                            cnt_delta++;
> > +                        }
> > +
> > +                        if (cnt_delta + ct3d->dc.total_extent_count >
> > +                            CXL_NUM_EXTENTS_SUPPORTED) {
> > +                            ret = CXL_MBOX_RESOURCES_EXHAUSTED;
> > +                            goto free_and_exit;
> > +                        }
> > +                    } else {
> > +                        /*
> > +                         * TODO: we reject the attempt to remove an extent
> > +                         * that overlaps with multiple extents in the device
> > +                         * for now, we will allow it once superset release
> > +                         * support is added.
> > +                         */
> > +                        ret = CXL_MBOX_INVALID_PA;
> > +                        goto free_and_exit;
> > +                    }
> > +
> > +                    len -= len_done;
> > +                    /* len == 0 here until superset release is added */
> > +                    break;
> > +                }
> > +            }
> > +            if (len) {
> > +                ret = CXL_MBOX_INVALID_PA;
> > +                goto free_and_exit;
> > +            }
> > +        }
> > +    }
> > +free_and_exit:
> > +    QTAILQ_FOREACH_SAFE(ent, &tmp_list, node, ent_next) {
> > +        cxl_remove_extent_from_extent_list(&tmp_list, ent);
> > +    }
> > +
> > +    return ret;
> > +}
> > +
> > +/*
> > + * CXL r3.1 section 8.2.9.9.9.4: Release Dynamic Capacity (Opcode 4803h)
> > + */
> > +static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
> > +                                          uint8_t *payload_in,
> > +                                          size_t len_in,
> > +                                          uint8_t *payload_out,
> > +                                          size_t *len_out,
> > +                                          CXLCCI *cci)
> > +{
> > +    CXLUpdateDCExtentListInPl *in = (void *)payload_in;
> > +    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
> > +    CXLDCExtentList *extent_list = &ct3d->dc.extents;
> > +    CXLDCExtent *ent;
> > +    uint32_t i;
> > +    uint64_t dpa, len;
> > +    CXLRetCode ret;
> > +
> > +    if (in->num_entries_updated == 0) {
> > +        return CXL_MBOX_INVALID_INPUT;
> > +    }
> > +
> > +    ret = cxl_detect_malformed_extent_list(ct3d, in);
> > +    if (ret != CXL_MBOX_SUCCESS) {
> > +        return ret;
> > +    }
> > +
> > +    ret = cxl_dc_extent_release_dry_run(ct3d, in);
> > +    if (ret != CXL_MBOX_SUCCESS) {
> > +        return ret;
> > +    }
> > +
> > +    /* From this point, all the extents to release are valid */
> > +    for (i = 0; i < in->num_entries_updated; i++) {
> > +        Range range;
> > +
> > +        dpa = in->updated_entries[i].start_dpa;
> > +        len = in->updated_entries[i].len;
> > +
> > +        while (len > 0) {
> > +            QTAILQ_FOREACH(ent, extent_list, node) {
> > +                range_init_nofail(&range, ent->start_dpa, ent->len);
> > +
> > +                /* Found the extent overlapping with */
> > +                if (range_contains(&range, dpa)) {
> > +                    uint64_t len1, len2 = 0, len_done = 0;
> > +                    uint64_t ent_start_dpa = ent->start_dpa;
> > +                    uint64_t ent_len = ent->len;
> > +
> > +                    len1 = dpa - ent_start_dpa;
> > +                    if (range_contains(&range, dpa + len - 1)) {
> > +                        len2 = ent_start_dpa + ent_len - dpa - len;
> > +                    }
> > +                    len_done = ent_len - len1 - len2;
> > +
> > +                    cxl_remove_extent_from_extent_list(extent_list, ent);
> > +                    ct3d->dc.total_extent_count -= 1;
> > +
> > +                    if (len1) {
> > +                        cxl_insert_extent_to_extent_list(extent_list,
> > +                                                         ent_start_dpa,
> > +                                                         len1, NULL, 0);
> > +                        ct3d->dc.total_extent_count += 1;
> > +                    }
> > +                    if (len2) {
> > +                        cxl_insert_extent_to_extent_list(extent_list,
> > +                                                         dpa + len,
> > +                                                         len2, NULL, 0);
> > +                        ct3d->dc.total_extent_count += 1;
> > +                    }
> > +
> > +                    len -= len_done;
> > +                    /*
> > +                     * len will always be 0 until superset release is add.
> > +                     * TODO: superset release will be added.
> > +                     */
> > +                    break;
> > +                }
> > +            }
> > +        }
> > +    }
> 
> The tmp_list generated in cxl_dc_extent_release_dry_run is identical to 
> the updated extent_list after the loops above - so you could swap the 
> existing extent_list with the tmp_list and adjust the number of extents 
> with the cnt_delta calculated, if the dry run is successful - instead of 
> duplicating the logic.
> 
> Thanks,
> Jørgen
> 
> > +    return CXL_MBOX_SUCCESS;
> > +}
> > +
> >   #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
> >   #define IMMEDIATE_DATA_CHANGE (1 << 2)
> >   #define IMMEDIATE_POLICY_CHANGE (1 << 3)
> > @@ -1413,15 +1832,15 @@ static const struct cxl_cmd cxl_cmd_set[256][256] = {
> >       [EVENTS][CLEAR_RECORDS] = { "EVENTS_CLEAR_RECORDS",
> >           cmd_events_clear_records, ~0, IMMEDIATE_LOG_CHANGE },
> >       [EVENTS][GET_INTERRUPT_POLICY] = { "EVENTS_GET_INTERRUPT_POLICY",
> > -                                      cmd_events_get_interrupt_policy, 0, 0 },
> > +        cmd_events_get_interrupt_policy, 0, 0 },
> >       [EVENTS][SET_INTERRUPT_POLICY] = { "EVENTS_SET_INTERRUPT_POLICY",
> > -                                      cmd_events_set_interrupt_policy,
> > -                                      ~0, IMMEDIATE_CONFIG_CHANGE },
> > +        cmd_events_set_interrupt_policy,
> > +        ~0, IMMEDIATE_CONFIG_CHANGE },
> >       [FIRMWARE_UPDATE][GET_INFO] = { "FIRMWARE_UPDATE_GET_INFO",
> >           cmd_firmware_update_get_info, 0, 0 },
> >       [TIMESTAMP][GET] = { "TIMESTAMP_GET", cmd_timestamp_get, 0, 0 },
> >       [TIMESTAMP][SET] = { "TIMESTAMP_SET", cmd_timestamp_set,
> > -                         8, IMMEDIATE_POLICY_CHANGE },
> > +        8, IMMEDIATE_POLICY_CHANGE },
> >       [LOGS][GET_SUPPORTED] = { "LOGS_GET_SUPPORTED", cmd_logs_get_supported,
> >                                 0, 0 },
> >       [LOGS][GET_LOG] = { "LOGS_GET_LOG", cmd_logs_get_log, 0x18, 0 },
> > @@ -1450,6 +1869,12 @@ static const struct cxl_cmd cxl_cmd_set_dcd[256][256] = {
> >       [DCD_CONFIG][GET_DYN_CAP_EXT_LIST] = {
> >           "DCD_GET_DYNAMIC_CAPACITY_EXTENT_LIST", cmd_dcd_get_dyn_cap_ext_list,
> >           8, 0 },
> > +    [DCD_CONFIG][ADD_DYN_CAP_RSP] = {
> > +        "DCD_ADD_DYNAMIC_CAPACITY_RESPONSE", cmd_dcd_add_dyn_cap_rsp,
> > +        ~0, IMMEDIATE_DATA_CHANGE },
> > +    [DCD_CONFIG][RELEASE_DYN_CAP] = {
> > +        "DCD_RELEASE_DYNAMIC_CAPACITY", cmd_dcd_release_dyn_cap,
> > +        ~0, IMMEDIATE_DATA_CHANGE },
> >   };
> > 
> >   static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index 5be3c904ba..951bd79a82 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> > @@ -678,6 +678,15 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
> >       return true;
> >   }
> > 
> > +static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
> > +{
> > +    CXLDCExtent *ent, *ent_next;
> > +
> > +    QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
> > +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
> > +    }
> > +}
> > +
> >   static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> >   {
> >       DeviceState *ds = DEVICE(ct3d);
> > @@ -874,6 +883,7 @@ err_free_special_ops:
> >       g_free(regs->special_ops);
> >   err_address_space_free:
> >       if (ct3d->dc.host_dc) {
> > +        cxl_destroy_dc_regions(ct3d);
> >           address_space_destroy(&ct3d->dc.host_dc_as);
> >       }
> >       if (ct3d->hostpmem) {
> > @@ -895,6 +905,7 @@ static void ct3_exit(PCIDevice *pci_dev)
> >       cxl_doe_cdat_release(cxl_cstate);
> >       g_free(regs->special_ops);
> >       if (ct3d->dc.host_dc) {
> > +        cxl_destroy_dc_regions(ct3d);
> >           address_space_destroy(&ct3d->dc.host_dc_as);
> >       }
> >       if (ct3d->hostpmem) {
> > diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> > index 6aec6ac983..df3511e91b 100644
> > --- a/include/hw/cxl/cxl_device.h
> > +++ b/include/hw/cxl/cxl_device.h
> > @@ -551,4 +551,8 @@ void cxl_event_irq_assert(CXLType3Dev *ct3d);
> > 
> >   void cxl_set_poison_list_overflowed(CXLType3Dev *ct3d);
> > 
> > +CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
> > +
> > +void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
> > +                                        CXLDCExtent *extent);
> >   #endif
> > --
> > 2.43.0
> > 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2024-04-05 12:18     ` Jonathan Cameron via
  (?)
@ 2024-04-09 21:26     ` fan
  2024-04-10 19:49         ` Jonathan Cameron via
  -1 siblings, 1 reply; 65+ messages in thread
From: fan @ 2024-04-09 21:26 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: nifan.cxl, qemu-devel, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	Jorgen.Hansen, wj28.lee, Fan Ni

On Fri, Apr 05, 2024 at 01:18:56PM +0100, Jonathan Cameron wrote:
> On Mon, 25 Mar 2024 12:02:27 -0700
> nifan.cxl@gmail.com wrote:
> 
> > From: Fan Ni <fan.ni@samsung.com>
> > 
> > To simulate FM functionalities for initiating Dynamic Capacity Add
> > (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> > r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> > add/release dynamic capacity extents requests.
> > 
> > With the change, we allow to release an extent only when its DPA range
> > is contained by a single accepted extent in the device. That is to say,
> > extent superset release is not supported yet.
> > 
> > 1. Add dynamic capacity extents:
> > 
> > For example, the command to add two continuous extents (each 128MiB long)
> > to region 0 (starting at DPA offset 0) looks like below:
> > 
> > { "execute": "qmp_capabilities" }
> > 
> > { "execute": "cxl-add-dynamic-capacity",
> >   "arguments": {
> >       "path": "/machine/peripheral/cxl-dcd0",
> >       "region-id": 0,
> >       "extents": [
> >       {
> >           "offset": 0,
> >           "len": 134217728
> >       },
> >       {
> >           "offset": 134217728,
> >           "len": 134217728
> >       }
> 
> Hi Fan,
> 
> I talk more on this inline, but to me this interface takes multiple extents
> so that we can treat them as a single 'offer' of capacity. That is they
> should be linked in the event log with the more flag and the host should
> have to handle them in one go (I know Ira and Navneet's code doesn't handle
> this yet, but that doesn't mean QEMU shouldn't).
> 
> Alternative for now would be to only support a single entry.  Keep the
> interface defined to take multiple entries but reject it at runtime.
> 
> I don't want to end up with a more complex interface in the end just
> because we allowed this form to not set the MORE flag today.
> We will need this to do tagged handling and ultimately sharing, so good
> to get it right from the start.
> 
> For tagged handling I think the right option is to have the tag alongside
> region-id not in the individual extents.  That way the interface is naturally
> used to generate the right description to the host.
> 
> >       ]
> >   }
> > }
Hi Jonathan,
Thanks for the detailed comments.

For the QMP interface, I have one question. 
Do we want the interface to follow exactly as shown in
Table 7-70 and Table 7-71 in cxl r3.1?

Fan

> > 
> > 2. Release dynamic capacity extents:
> > 
> > For example, the command to release an extent of size 128MiB from region 0
> > (DPA offset 128MiB) looks like below:
> > 
> > { "execute": "cxl-release-dynamic-capacity",
> >   "arguments": {
> >       "path": "/machine/peripheral/cxl-dcd0",
> >       "region-id": 0,
> >       "extents": [
> >       {
> >           "offset": 134217728,
> >           "len": 134217728
> >       }
> >       ]
> >   }
> > }
> > 
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
> 
> 
> 
> >          /* to-be-added range should not overlap with range already accepted */
> >          QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
> > @@ -1585,9 +1586,13 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> >      CXLDCExtentList *extent_list = &ct3d->dc.extents;
> >      uint32_t i;
> >      uint64_t dpa, len;
> > +    CXLDCExtent *ent;
> >      CXLRetCode ret;
> >  
> >      if (in->num_entries_updated == 0) {
> > +        /* Always remove the first pending extent when response received. */
> > +        ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> > +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);
> >          return CXL_MBOX_SUCCESS;
> >      }
> >  
> > @@ -1604,6 +1609,8 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> >  
> >      ret = cxl_dcd_add_dyn_cap_rsp_dry_run(ct3d, in);
> >      if (ret != CXL_MBOX_SUCCESS) {
> > +        ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> > +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);
> 
> Ah this deals with the todo I suggest you add to the earlier patch.
> I'd not mind so much if you hadn't been so thorough on other todo notes ;)
> Add one in the earlier patch and get rid of it here like you do below.
> 
> However as I note below I think we need to handle these as groups of extents
> not single extents. That way we keep an 'offered' set offered at the same time by
> a single command (and expose to host using the more flag) together and reject
> them en masse.
> 
> 
> >          return ret;
> >      }
> >  
> > @@ -1613,10 +1620,9 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> >  
> >          cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
> >          ct3d->dc.total_extent_count += 1;
> > -        /*
> > -         * TODO: we will add a pending extent list based on event log record
> > -         * and process the list according here.
> > -         */
> > +
> > +        ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> > +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);
> >      }
> >  
> >      return CXL_MBOX_SUCCESS;
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index 951bd79a82..74cb64e843 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> 
> >  
> >  static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> > @@ -1449,7 +1454,8 @@ static int ct3d_qmp_cxl_event_log_enc(CxlEventLog log)
> >          return CXL_EVENT_TYPE_FAIL;
> >      case CXL_EVENT_LOG_FATAL:
> >          return CXL_EVENT_TYPE_FATAL;
> > -/* DCD not yet supported */
> 
> Drop the comment but don't add the code.  We are handling DCD differently
> from other events, so this code should never deal with it.
> 
> > +    case CXL_EVENT_LOG_DYNCAP:
> > +        return CXL_EVENT_TYPE_DYNAMIC_CAP;
> >      default:
> >          return -EINVAL;
> >      }
> > @@ -1700,6 +1706,250 @@ void qmp_cxl_inject_memory_module_event(const char *path, CxlEventLog log,
> >      }
> >  }
> 
> > +/*
> > + * Check whether the range [dpa, dpa + len -1] has overlaps with extents in
> 
> space after - (just looks odd otherwise)
> 
> > + * the list.
> > + * Return value: return true if has overlaps; otherwise, return false
> > + */
> > +static bool cxl_extents_overlaps_dpa_range(CXLDCExtentList *list,
> > +                                           uint64_t dpa, uint64_t len)
> > +{
> > +    CXLDCExtent *ent;
> > +    Range range1, range2;
> > +
> > +    if (!list) {
> > +        return false;
> > +    }
> > +
> > +    range_init_nofail(&range1, dpa, len);
> > +    QTAILQ_FOREACH(ent, list, node) {
> > +        range_init_nofail(&range2, ent->start_dpa, ent->len);
> > +        if (range_overlaps_range(&range1, &range2)) {
> > +            return true;
> > +        }
> > +    }
> > +    return false;
> > +}
> > +
> > +/*
> > + * Check whether the range [dpa, dpa + len -1] is contained by extents in 
> 
> space after -
> 
> > + * the list.
> > + * Will check multiple extents containment once superset release is added.
> > + * Return value: return true if range is contained; otherwise, return false
> > + */
> > +bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
> > +                                    uint64_t dpa, uint64_t len)
> > +{
> > +    CXLDCExtent *ent;
> > +    Range range1, range2;
> > +
> > +    if (!list) {
> > +        return false;
> > +    }
> > +
> > +    range_init_nofail(&range1, dpa, len);
> > +    QTAILQ_FOREACH(ent, list, node) {
> > +        range_init_nofail(&range2, ent->start_dpa, ent->len);
> > +        if (range_contains_range(&range2, &range1)) {
> > +            return true;
> > +        }
> > +    }
> > +    return false;
> > +}
> > +
> > +/*
> > + * The main function to process dynamic capacity event. Currently DC extents
> > + * add/release requests are processed.
> > + */
> > +static void qmp_cxl_process_dynamic_capacity(const char *path, CxlEventLog log,
> 
> As below. Don't pass in a CxlEventLog.  Whilst some infrastructure is shared
> with other event logs, we don't want to accidentally enable other events
> being added to the DC event log.
> 
> > +                                             CXLDCEventType type, uint16_t hid,
> > +                                             uint8_t rid,
> > +                                             CXLDCExtentRecordList *records,
> > +                                             Error **errp)
> > +{
> > +    Object *obj;
> > +    CXLEventDynamicCapacity dCap = {};
> > +    CXLEventRecordHdr *hdr = &dCap.hdr;
> > +    CXLType3Dev *dcd;
> > +    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
> > +    uint32_t num_extents = 0;
> > +    CXLDCExtentRecordList *list;
> > +    g_autofree CXLDCExtentRaw *extents = NULL;
> > +    uint8_t enc_log;
> > +    uint64_t dpa, offset, len, block_size;
> > +    int i, rc;
> > +    g_autofree unsigned long *blk_bitmap = NULL;
> > +
> > +    obj = object_resolve_path_type(path, TYPE_CXL_TYPE3, NULL);
> > +    if (!obj) {
> > +        error_setg(errp, "Unable to resolve CXL type 3 device");
> > +        return;
> > +    }
> > +
> > +    dcd = CXL_TYPE3(obj);
> > +    if (!dcd->dc.num_regions) {
> > +        error_setg(errp, "No dynamic capacity support from the device");
> > +        return;
> > +    }
> > +
> > +    rc = ct3d_qmp_cxl_event_log_enc(log);
> 
> enc_log = CXL_EVENT_TYPE_DYNAMIC_CAP; always so don't look it up.
> 
> > +    if (rc < 0) {
> > +        error_setg(errp, "Unhandled error log type");
> > +        return;
> > +    }
> > +    enc_log = rc;
> > +
> > +    if (rid >= dcd->dc.num_regions) {
> > +        error_setg(errp, "region id is too large");
> > +        return;
> > +    }
> > +    block_size = dcd->dc.regions[rid].block_size;
> > +    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
> > +
> > +    /* Sanity check and count the extents */
> > +    list = records;
> > +    while (list) {
> > +        offset = list->value->offset;
> > +        len = list->value->len;
> > +        dpa = offset + dcd->dc.regions[rid].base;
> > +
> > +        if (len == 0) {
> > +            error_setg(errp, "extent with 0 length is not allowed");
> > +            return;
> > +        }
> > +
> > +        if (offset % block_size || len % block_size) {
> > +            error_setg(errp, "dpa or len is not aligned to region block size");
> > +            return;
> > +        }
> > +
> > +        if (offset + len > dcd->dc.regions[rid].len) {
> > +            error_setg(errp, "extent range is beyond the region end");
> > +            return;
> > +        }
> > +
> > +        /* No duplicate or overlapped extents are allowed */
> > +        if (test_any_bits_set(blk_bitmap, offset / block_size,
> > +                              len / block_size)) {
> > +            error_setg(errp, "duplicate or overlapped extents are detected");
> > +            return;
> > +        }
> > +        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> > +
> > +        num_extents++;
> > +        if (type == DC_EVENT_RELEASE_CAPACITY) {
> > +            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents_pending,
> > +                                               dpa, len)) {
> > +                error_setg(errp,
> > +                           "cannot release extent with pending DPA range");
> > +                return;
> > +            }
> > +            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents,
> > +                                                dpa, len)) {
> > +                error_setg(errp,
> > +                           "cannot release extent with non-existing DPA range");
> > +                return;
> > +            }
> > +        }
> > +        list = list->next;
> > +    }
> > +    if (num_extents == 0) {
> 
> We can just check if there is a first one.  That check can be done before
> counting them and is probably a little more elegant than leaving it to down here.
> I'm not sure we can pass in an empty list but if we can (easy to poke interface
> and check) then I assume records == NULL. 
> 
> > +        error_setg(errp, "no valid extents to send to process");
> > +        return;
> > +    }
> > +
> > +    /* Create extent list for event being passed to host */
> > +    i = 0;
> > +    list = records;
> > +    extents = g_new0(CXLDCExtentRaw, num_extents);
> > +    while (list) {
> > +        offset = list->value->offset;
> > +        len = list->value->len;
> > +        dpa = dcd->dc.regions[rid].base + offset;
> > +
> > +        extents[i].start_dpa = dpa;
> > +        extents[i].len = len;
> > +        memset(extents[i].tag, 0, 0x10);
> > +        extents[i].shared_seq = 0;
> > +        list = list->next;
> > +        i++;
> > +    }
> > +
> > +    /*
> > +     * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
> > +     *
> > +     * All Dynamic Capacity event records shall set the Event Record Severity
> > +     * field in the Common Event Record Format to Informational Event. All
> > +     * Dynamic Capacity related events shall be logged in the Dynamic Capacity
> > +     * Event Log.
> > +     */
> > +    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
> > +                            cxl_device_get_timestamp(&dcd->cxl_dstate));
> > +
> > +    dCap.type = type;
> > +    /* FIXME: for now, validity flag is cleared */
> > +    dCap.validity_flags = 0;
> > +    stw_le_p(&dCap.host_id, hid);
> > +    /* only valid for DC_REGION_CONFIG_UPDATED event */
> > +    dCap.updated_region_id = 0;
> > +    /*
> > +     * FIXME: for now, the "More" flag is cleared as there is only one
> > +     * extent associating with each record and tag-based release is
> > +     * not supported.
> 
> This is misleading by my understanding of the specification.
> More isn't directly related to tags (though it is necessary for some
> flows with tags, when sharing is enabled anyway).
> The reference to record also isn't that relevant. The idea is you set
> it for all but the last record pushed to the event log (from a given
> action from an FM).
> 
> The whole reason to have a multi extent injection interface is to set
> the more flag to indicate that the OS needs to treat a bunch of extents
> as one 'offer' of capacity.  So a rejection from the OS needs to take
> out 'all those records'.  The proposed linux code will currently reject
> all but the first extent (I moaned about that yesterday). 
> 
> It is fine to not support this in the current code, but then I would check
> the number of extents and reject any multi extent commands until we
> do support it.
> 
> Ultimately I want a qmp command with more than one extent to mean
> they are one 'offer' of capacity and must be handled as such by
> the OS.  I.e. it can't reply with multiple unrelated acceptance
> or reject replies.
> 
> On the add side this is easy to support, the fiddly bit is if the
> OS rejects some or all of the capacity and you then need to
> take out all the extents offered that it hasn't accepted in its reply.
> 
> Pending list will need to maintain that association.
> Maybe the easiest way is to have pending list be a list of sublists?
> That way each sublist is handled in one go and any non accepted extents
> in that sub list are dropped.
> 
>  
> > +     */
> > +    dCap.flags = 0;
> > +    for (i = 0; i < num_extents; i++) {
> > +        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
> > +               sizeof(CXLDCExtentRaw));
> > +
> > +        if (type == DC_EVENT_ADD_CAPACITY) {
> > +            cxl_insert_extent_to_extent_list(&dcd->dc.extents_pending,
> > +                                             extents[i].start_dpa,
> > +                                             extents[i].len,
> > +                                             extents[i].tag,
> > +                                             extents[i].shared_seq);
> > +        }
> > +
> > +        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
> > +                             (CXLEventRecordRaw *)&dCap)) {
> > +            cxl_event_irq_assert(dcd);
> > +        }
> > +    }
> > +}
> > +
> > +void qmp_cxl_add_dynamic_capacity(const char *path, uint8_t region_id,
> > +                                  CXLDCExtentRecordList  *records,
> > +                                  Error **errp)
> > +{
> > +   qmp_cxl_process_dynamic_capacity(path, CXL_EVENT_LOG_DYNCAP,
> 
> Drop passing in the log, it doesn't make sense given these events only occur
> on that log and we can hard code it in the function.
> 
> > +                                    DC_EVENT_ADD_CAPACITY, 0,
> > +                                    region_id, records, errp);
> > +}
> > +
> > +void qmp_cxl_release_dynamic_capacity(const char *path, uint8_t region_id,
> > +                                      CXLDCExtentRecordList  *records,
> > +                                      Error **errp)
> > +{
> > +    qmp_cxl_process_dynamic_capacity(path, CXL_EVENT_LOG_DYNCAP,
> > +                                     DC_EVENT_RELEASE_CAPACITY, 0,
> > +                                     region_id, records, errp);
> > +}
> > +
> >  static void ct3_class_init(ObjectClass *oc, void *data)
> >  {
> 
> 
> > diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> > index df3511e91b..b84063d9f4 100644
> > --- a/include/hw/cxl/cxl_device.h
> > +++ b/include/hw/cxl/cxl_device.h
> > @@ -494,6 +494,7 @@ struct CXLType3Dev {
> >           */
> >          uint64_t total_capacity; /* 256M aligned */
> >          CXLDCExtentList extents;
> > +        CXLDCExtentList extents_pending;
> >          uint32_t total_extent_count;
> >          uint32_t ext_list_gen_seq;
> 
> 
> 
> >  #endif /* CXL_EVENTS_H */
> > diff --git a/qapi/cxl.json b/qapi/cxl.json
> > index 8cc4c72fa9..2645004666 100644
> > --- a/qapi/cxl.json
> > +++ b/qapi/cxl.json
> > @@ -19,13 +19,16 @@
> >  #
> >  # @fatal: Fatal Event Log
> >  #
> > +# @dyncap: Dynamic Capacity Event Log
> > +#
> >  # Since: 8.1
> >  ##
> >  { 'enum': 'CxlEventLog',
> >    'data': ['informational',
> >             'warning',
> >             'failure',
> > -           'fatal']
> > +           'fatal',
> > +           'dyncap']
> 
> Does this have the side effect of letting us inject error events
> onto the dynamic capacity log? 
> 
> >   }
> >  
> >  ##
> > @@ -361,3 +364,59 @@
> >  ##
> >  {'command': 'cxl-inject-correctable-error',
> >   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
> ...
> 
> > +##
> > +# @cxl-add-dynamic-capacity:
> > +#
> > +# Command to start add dynamic capacity extents flow. The device will
> > +# have to acknowledged the acceptance of the extents before they are usable.
> > +#
> > +# @path: CXL DCD canonical QOM path
> > +# @region-id: id of the region where the extent to add
> > +# @extents: Extents to add
> > +#
> > +# Since : 9.0
> 
> 9.1
> 
> > +##
> > +{ 'command': 'cxl-add-dynamic-capacity',
> > +  'data': { 'path': 'str',
> > +            'region-id': 'uint8',
> > +            'extents': [ 'CXLDCExtentRecord' ]
> > +           }
> > +}
> > +
> > +##
> > +# @cxl-release-dynamic-capacity:
> > +#
> > +# Command to start release dynamic capacity extents flow. The host will
> > +# need to respond to indicate that it has released the capacity before it
> > +# is made unavailable for read and write and can be re-added.
> > +#
> > +# @path: CXL DCD canonical QOM path
> > +# @region-id: id of the region where the extent to release
> > +# @extents: Extents to release
> > +#
> > +# Since : 9.0
> 
> 9.1
> 
> > +##
> > +{ 'command': 'cxl-release-dynamic-capacity',
> > +  'data': { 'path': 'str',
> > +            'region-id': 'uint8',
> > +            'extents': [ 'CXLDCExtentRecord' ]
> > +           }
> > +}
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2024-04-09 21:26     ` fan
@ 2024-04-10 19:49         ` Jonathan Cameron via
  0 siblings, 0 replies; 65+ messages in thread
From: Jonathan Cameron @ 2024-04-10 19:49 UTC (permalink / raw)
  To: fan
  Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
	wj28.lee, Fan Ni

On Tue, 9 Apr 2024 14:26:51 -0700
fan <nifan.cxl@gmail.com> wrote:

> On Fri, Apr 05, 2024 at 01:18:56PM +0100, Jonathan Cameron wrote:
> > On Mon, 25 Mar 2024 12:02:27 -0700
> > nifan.cxl@gmail.com wrote:
> >   
> > > From: Fan Ni <fan.ni@samsung.com>
> > > 
> > > To simulate FM functionalities for initiating Dynamic Capacity Add
> > > (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> > > r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> > > add/release dynamic capacity extents requests.
> > > 
> > > With the change, we allow to release an extent only when its DPA range
> > > is contained by a single accepted extent in the device. That is to say,
> > > extent superset release is not supported yet.
> > > 
> > > 1. Add dynamic capacity extents:
> > > 
> > > For example, the command to add two continuous extents (each 128MiB long)
> > > to region 0 (starting at DPA offset 0) looks like below:
> > > 
> > > { "execute": "qmp_capabilities" }
> > > 
> > > { "execute": "cxl-add-dynamic-capacity",
> > >   "arguments": {
> > >       "path": "/machine/peripheral/cxl-dcd0",
> > >       "region-id": 0,
> > >       "extents": [
> > >       {
> > >           "offset": 0,
> > >           "len": 134217728
> > >       },
> > >       {
> > >           "offset": 134217728,
> > >           "len": 134217728
> > >       }  
> > 
> > Hi Fan,
> > 
> > I talk more on this inline, but to me this interface takes multiple extents
> > so that we can treat them as a single 'offer' of capacity. That is they
> > should be linked in the event log with the more flag and the host should
> > have to handle them in one go (I know Ira and Navneet's code doesn't handle
> > this yet, but that doesn't mean QEMU shouldn't).
> > 
> > Alternative for now would be to only support a single entry.  Keep the
> > interface defined to take multiple entries but reject it at runtime.
> > 
> > I don't want to end up with a more complex interface in the end just
> > because we allowed this form to not set the MORE flag today.
> > We will need this to do tagged handling and ultimately sharing, so good
> > to get it right from the start.
> > 
> > For tagged handling I think the right option is to have the tag alongside
> > region-id not in the individual extents.  That way the interface is naturally
> > used to generate the right description to the host.
> >   
> > >       ]
> > >   }
> > > }  
> Hi Jonathan,
> Thanks for the detailed comments.
> 
> For the QMP interface, I have one question. 
> Do we want the interface to follow exactly as shown in
> Table 7-70 and Table 7-71 in cxl r3.1?

I don't mind if it doesn't, as long as it lets us pass reasonable
things in to test the kernel code.  I'd have the interface designed
to allow us to generate the set of records associated with a given
'request', e.g. all records with the same tag in a single QMP command.

If we want multiple sets of such records (and the extents to back
them) we can issue multiple calls.
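
As a rough illustration only (the "tag" argument below is hypothetical and
does not exist in this v6; it is just to show where a per-request tag would
sit), I am thinking of something like:

{ "execute": "cxl-add-dynamic-capacity",
  "arguments": {
      "path": "/machine/peripheral/cxl-dcd0",
      "region-id": 0,
      "tag": "tenant-A",
      "extents": [
          { "offset": 0,         "len": 134217728 },
          { "offset": 134217728, "len": 134217728 }
      ]
  }
}

i.e. everything passed in one call is one 'offer' (same tag on every record,
More flag set on all but the last), and an unrelated set of extents would be
a second call.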

Jonathan



> 
> Fan
> 
> > > 
> > > 2. Release dynamic capacity extents:
> > > 
> > > For example, the command to release an extent of size 128MiB from region 0
> > > (DPA offset 128MiB) looks like below:
> > > 
> > > { "execute": "cxl-release-dynamic-capacity",
> > >   "arguments": {
> > >       "path": "/machine/peripheral/cxl-dcd0",
> > >       "region-id": 0,
> > >       "extents": [
> > >       {
> > >           "offset": 134217728,
> > >           "len": 134217728
> > >       }
> > >       ]
> > >   }
> > > }
> > > 
> > > Signed-off-by: Fan Ni <fan.ni@samsung.com>  
> > 
> > 
> >   
> > >          /* to-be-added range should not overlap with range already accepted */
> > >          QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
> > > @@ -1585,9 +1586,13 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> > >      CXLDCExtentList *extent_list = &ct3d->dc.extents;
> > >      uint32_t i;
> > >      uint64_t dpa, len;
> > > +    CXLDCExtent *ent;
> > >      CXLRetCode ret;
> > >  
> > >      if (in->num_entries_updated == 0) {
> > > +        /* Always remove the first pending extent when response received. */
> > > +        ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> > > +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);
> > >          return CXL_MBOX_SUCCESS;
> > >      }
> > >  
> > > @@ -1604,6 +1609,8 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> > >  
> > >      ret = cxl_dcd_add_dyn_cap_rsp_dry_run(ct3d, in);
> > >      if (ret != CXL_MBOX_SUCCESS) {
> > > +        ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> > > +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);  
> > 
> > Ah this deals with the todo I suggest you add to the earlier patch.
> > I'd not mind so much if you hadn't been so thorough on other todo notes ;)
> > Add one in the earlier patch and get rid of it here like you do below.
> > 
> > However as I note below I think we need to handle these as groups of extents
> > not single extents. That way we keep an 'offered' set offered at the same time by
> > a single command (and expose to host using the more flag) together and reject
> > them en masse.
> > 
> >   
> > >          return ret;
> > >      }
> > >  
> > > @@ -1613,10 +1620,9 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> > >  
> > >          cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
> > >          ct3d->dc.total_extent_count += 1;
> > > -        /*
> > > -         * TODO: we will add a pending extent list based on event log record
> > > -         * and process the list according here.
> > > -         */
> > > +
> > > +        ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> > > +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);
> > >      }
> > >  
> > >      return CXL_MBOX_SUCCESS;
> > > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > > index 951bd79a82..74cb64e843 100644
> > > --- a/hw/mem/cxl_type3.c
> > > +++ b/hw/mem/cxl_type3.c  
> >   
> > >  
> > >  static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> > > @@ -1449,7 +1454,8 @@ static int ct3d_qmp_cxl_event_log_enc(CxlEventLog log)
> > >          return CXL_EVENT_TYPE_FAIL;
> > >      case CXL_EVENT_LOG_FATAL:
> > >          return CXL_EVENT_TYPE_FATAL;
> > > -/* DCD not yet supported */  
> > 
> > Drop the comment but don't add the code.  We are handling DCD differently
> > from other events, so this code should never deal with it.
> >   
> > > +    case CXL_EVENT_LOG_DYNCAP:
> > > +        return CXL_EVENT_TYPE_DYNAMIC_CAP;
> > >      default:
> > >          return -EINVAL;
> > >      }
> > > @@ -1700,6 +1706,250 @@ void qmp_cxl_inject_memory_module_event(const char *path, CxlEventLog log,
> > >      }
> > >  }  
> >   
> > > +/*
> > > + * Check whether the range [dpa, dpa + len -1] has overlaps with extents in  
> > 
> > space after - (just looks odd otherwise)
> >   
> > > + * the list.
> > > + * Return value: return true if has overlaps; otherwise, return false
> > > + */
> > > +static bool cxl_extents_overlaps_dpa_range(CXLDCExtentList *list,
> > > +                                           uint64_t dpa, uint64_t len)
> > > +{
> > > +    CXLDCExtent *ent;
> > > +    Range range1, range2;
> > > +
> > > +    if (!list) {
> > > +        return false;
> > > +    }
> > > +
> > > +    range_init_nofail(&range1, dpa, len);
> > > +    QTAILQ_FOREACH(ent, list, node) {
> > > +        range_init_nofail(&range2, ent->start_dpa, ent->len);
> > > +        if (range_overlaps_range(&range1, &range2)) {
> > > +            return true;
> > > +        }
> > > +    }
> > > +    return false;
> > > +}
> > > +
> > > +/*
> > > + * Check whether the range [dpa, dpa + len -1] is contained by extents in   
> > 
> > space after -
> >   
> > > + * the list.
> > > + * Will check multiple extents containment once superset release is added.
> > > + * Return value: return true if range is contained; otherwise, return false
> > > + */
> > > +bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
> > > +                                    uint64_t dpa, uint64_t len)
> > > +{
> > > +    CXLDCExtent *ent;
> > > +    Range range1, range2;
> > > +
> > > +    if (!list) {
> > > +        return false;
> > > +    }
> > > +
> > > +    range_init_nofail(&range1, dpa, len);
> > > +    QTAILQ_FOREACH(ent, list, node) {
> > > +        range_init_nofail(&range2, ent->start_dpa, ent->len);
> > > +        if (range_contains_range(&range2, &range1)) {
> > > +            return true;
> > > +        }
> > > +    }
> > > +    return false;
> > > +}
> > > +
> > > +/*
> > > + * The main function to process dynamic capacity event. Currently DC extents
> > > + * add/release requests are processed.
> > > + */
> > > +static void qmp_cxl_process_dynamic_capacity(const char *path, CxlEventLog log,  
> > 
> > As below. Don't pass in a CxlEventLog.  Whilst some infrastructure is shared
> > with other event logs, we don't want to accidentally enable other events
> > being added to the DC event log.
> >   
> > > +                                             CXLDCEventType type, uint16_t hid,
> > > +                                             uint8_t rid,
> > > +                                             CXLDCExtentRecordList *records,
> > > +                                             Error **errp)
> > > +{
> > > +    Object *obj;
> > > +    CXLEventDynamicCapacity dCap = {};
> > > +    CXLEventRecordHdr *hdr = &dCap.hdr;
> > > +    CXLType3Dev *dcd;
> > > +    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
> > > +    uint32_t num_extents = 0;
> > > +    CXLDCExtentRecordList *list;
> > > +    g_autofree CXLDCExtentRaw *extents = NULL;
> > > +    uint8_t enc_log;
> > > +    uint64_t dpa, offset, len, block_size;
> > > +    int i, rc;
> > > +    g_autofree unsigned long *blk_bitmap = NULL;
> > > +
> > > +    obj = object_resolve_path_type(path, TYPE_CXL_TYPE3, NULL);
> > > +    if (!obj) {
> > > +        error_setg(errp, "Unable to resolve CXL type 3 device");
> > > +        return;
> > > +    }
> > > +
> > > +    dcd = CXL_TYPE3(obj);
> > > +    if (!dcd->dc.num_regions) {
> > > +        error_setg(errp, "No dynamic capacity support from the device");
> > > +        return;
> > > +    }
> > > +
> > > +    rc = ct3d_qmp_cxl_event_log_enc(log);  
> > 
> > enc_log = CXL_EVENT_TYPE_DYNAMIC_CAP; always so don't look it up.
> >   
> > > +    if (rc < 0) {
> > > +        error_setg(errp, "Unhandled error log type");
> > > +        return;
> > > +    }
> > > +    enc_log = rc;
> > > +
> > > +    if (rid >= dcd->dc.num_regions) {
> > > +        error_setg(errp, "region id is too large");
> > > +        return;
> > > +    }
> > > +    block_size = dcd->dc.regions[rid].block_size;
> > > +    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
> > > +
> > > +    /* Sanity check and count the extents */
> > > +    list = records;
> > > +    while (list) {
> > > +        offset = list->value->offset;
> > > +        len = list->value->len;
> > > +        dpa = offset + dcd->dc.regions[rid].base;
> > > +
> > > +        if (len == 0) {
> > > +            error_setg(errp, "extent with 0 length is not allowed");
> > > +            return;
> > > +        }
> > > +
> > > +        if (offset % block_size || len % block_size) {
> > > +            error_setg(errp, "dpa or len is not aligned to region block size");
> > > +            return;
> > > +        }
> > > +
> > > +        if (offset + len > dcd->dc.regions[rid].len) {
> > > +            error_setg(errp, "extent range is beyond the region end");
> > > +            return;
> > > +        }
> > > +
> > > +        /* No duplicate or overlapped extents are allowed */
> > > +        if (test_any_bits_set(blk_bitmap, offset / block_size,
> > > +                              len / block_size)) {
> > > +            error_setg(errp, "duplicate or overlapped extents are detected");
> > > +            return;
> > > +        }
> > > +        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> > > +
> > > +        num_extents++;
> > > +        if (type == DC_EVENT_RELEASE_CAPACITY) {
> > > +            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents_pending,
> > > +                                               dpa, len)) {
> > > +                error_setg(errp,
> > > +                           "cannot release extent with pending DPA range");
> > > +                return;
> > > +            }
> > > +            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents,
> > > +                                                dpa, len)) {
> > > +                error_setg(errp,
> > > +                           "cannot release extent with non-existing DPA range");
> > > +                return;
> > > +            }
> > > +        }
> > > +        list = list->next;
> > > +    }
> > > +    if (num_extents == 0) {  
> > 
> > We can just check if there is a first one.  That check can be done before
> > counting them and is probably a little more elegant than leaving it to down here.
> > I'm not sure we can pass in an empty list but if we can (easy to poke interface
> > and check) then I assume records == NULL. 
> >   
> > > +        error_setg(errp, "no valid extents to send to process");
> > > +        return;
> > > +    }
> > > +
> > > +    /* Create extent list for event being passed to host */
> > > +    i = 0;
> > > +    list = records;
> > > +    extents = g_new0(CXLDCExtentRaw, num_extents);
> > > +    while (list) {
> > > +        offset = list->value->offset;
> > > +        len = list->value->len;
> > > +        dpa = dcd->dc.regions[rid].base + offset;
> > > +
> > > +        extents[i].start_dpa = dpa;
> > > +        extents[i].len = len;
> > > +        memset(extents[i].tag, 0, 0x10);
> > > +        extents[i].shared_seq = 0;
> > > +        list = list->next;
> > > +        i++;
> > > +    }
> > > +
> > > +    /*
> > > +     * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
> > > +     *
> > > +     * All Dynamic Capacity event records shall set the Event Record Severity
> > > +     * field in the Common Event Record Format to Informational Event. All
> > > +     * Dynamic Capacity related events shall be logged in the Dynamic Capacity
> > > +     * Event Log.
> > > +     */
> > > +    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
> > > +                            cxl_device_get_timestamp(&dcd->cxl_dstate));
> > > +
> > > +    dCap.type = type;
> > > +    /* FIXME: for now, validity flag is cleared */
> > > +    dCap.validity_flags = 0;
> > > +    stw_le_p(&dCap.host_id, hid);
> > > +    /* only valid for DC_REGION_CONFIG_UPDATED event */
> > > +    dCap.updated_region_id = 0;
> > > +    /*
> > > +     * FIXME: for now, the "More" flag is cleared as there is only one
> > > +     * extent associating with each record and tag-based release is
> > > +     * not supported.  
> > 
> > This is misleading by my understanding of the specification.
> > More isn't directly related to tags (though it is necessary for some
> > flows with tags, when sharing is enabled anyway).
> > The reference to record also isn't that relevant. The idea is you set
> > it for all but the last record pushed to the event log (from a given
> > action from an FM).
> > 
> > The whole reason to have a multi extent injection interface is to set
> > the more flag to indicate that the OS needs to treat a bunch of extents
> > as one 'offer' of capacity.  So a rejection from the OS needs to take
> > out 'all those records'.  The proposed linux code will currently reject
> > all but the first extent (I moaned about that yesterday). 
> > 
> > It is fine to not support this in the current code, but then I would check
> > the number of extents and reject any multi extent commands until we
> > do support it.
> > 
> > Ultimately I want a qmp command with more than one extent to mean
> > they are one 'offer' of capacity and must be handled as such by
> > the OS.  I.e. it can't reply with multiple unrelated acceptance
> > or reject replies.
> > 
> > On the add side this is easy to support, the fiddly bit is if the
> > OS rejects some or all of the capacity and you then need to
> > take out all the extents offered that it hasn't accepted in its reply.
> > 
> > Pending list will need to maintain that association.
> > Maybe the easiest way is to have pending list be a list of sublists?
> > That way each sublist is handled in one go and any non accepted extents
> > in that sub list are dropped.
> > 
> >    
> > > +     */
> > > +    dCap.flags = 0;
> > > +    for (i = 0; i < num_extents; i++) {
> > > +        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
> > > +               sizeof(CXLDCExtentRaw));
> > > +
> > > +        if (type == DC_EVENT_ADD_CAPACITY) {
> > > +            cxl_insert_extent_to_extent_list(&dcd->dc.extents_pending,
> > > +                                             extents[i].start_dpa,
> > > +                                             extents[i].len,
> > > +                                             extents[i].tag,
> > > +                                             extents[i].shared_seq);
> > > +        }
> > > +
> > > +        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
> > > +                             (CXLEventRecordRaw *)&dCap)) {
> > > +            cxl_event_irq_assert(dcd);
> > > +        }
> > > +    }
> > > +}
> > > +
> > > +void qmp_cxl_add_dynamic_capacity(const char *path, uint8_t region_id,
> > > +                                  CXLDCExtentRecordList  *records,
> > > +                                  Error **errp)
> > > +{
> > > +   qmp_cxl_process_dynamic_capacity(path, CXL_EVENT_LOG_DYNCAP,  
> > 
> > Drop passing in the log, it doesn't make sense given these events only occur
> > on that log and we can hard code it in the function.
> >   
> > > +                                    DC_EVENT_ADD_CAPACITY, 0,
> > > +                                    region_id, records, errp);
> > > +}
> > > +
> > > +void qmp_cxl_release_dynamic_capacity(const char *path, uint8_t region_id,
> > > +                                      CXLDCExtentRecordList  *records,
> > > +                                      Error **errp)
> > > +{
> > > +    qmp_cxl_process_dynamic_capacity(path, CXL_EVENT_LOG_DYNCAP,
> > > +                                     DC_EVENT_RELEASE_CAPACITY, 0,
> > > +                                     region_id, records, errp);
> > > +}
> > > +
> > >  static void ct3_class_init(ObjectClass *oc, void *data)
> > >  {  
> > 
> >   
> > > diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> > > index df3511e91b..b84063d9f4 100644
> > > --- a/include/hw/cxl/cxl_device.h
> > > +++ b/include/hw/cxl/cxl_device.h
> > > @@ -494,6 +494,7 @@ struct CXLType3Dev {
> > >           */
> > >          uint64_t total_capacity; /* 256M aligned */
> > >          CXLDCExtentList extents;
> > > +        CXLDCExtentList extents_pending;
> > >          uint32_t total_extent_count;
> > >          uint32_t ext_list_gen_seq;  
> > 
> > 
> >   
> > >  #endif /* CXL_EVENTS_H */
> > > diff --git a/qapi/cxl.json b/qapi/cxl.json
> > > index 8cc4c72fa9..2645004666 100644
> > > --- a/qapi/cxl.json
> > > +++ b/qapi/cxl.json
> > > @@ -19,13 +19,16 @@
> > >  #
> > >  # @fatal: Fatal Event Log
> > >  #
> > > +# @dyncap: Dynamic Capacity Event Log
> > > +#
> > >  # Since: 8.1
> > >  ##
> > >  { 'enum': 'CxlEventLog',
> > >    'data': ['informational',
> > >             'warning',
> > >             'failure',
> > > -           'fatal']
> > > +           'fatal',
> > > +           'dyncap']  
> > 
> > Does this have the side effect of letting us inject error events
> > onto the dynamic capacity log? 
> >   
> > >   }
> > >  
> > >  ##
> > > @@ -361,3 +364,59 @@
> > >  ##
> > >  {'command': 'cxl-inject-correctable-error',
> > >   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}  
> > ...
> >   
> > > +##
> > > +# @cxl-add-dynamic-capacity:
> > > +#
> > > +# Command to start add dynamic capacity extents flow. The device will
> > > +# have to acknowledged the acceptance of the extents before they are usable.
> > > +#
> > > +# @path: CXL DCD canonical QOM path
> > > +# @region-id: id of the region where the extent to add
> > > +# @extents: Extents to add
> > > +#
> > > +# Since : 9.0  
> > 
> > 9.1
> >   
> > > +##
> > > +{ 'command': 'cxl-add-dynamic-capacity',
> > > +  'data': { 'path': 'str',
> > > +            'region-id': 'uint8',
> > > +            'extents': [ 'CXLDCExtentRecord' ]
> > > +           }
> > > +}
> > > +
> > > +##
> > > +# @cxl-release-dynamic-capacity:
> > > +#
> > > +# Command to start release dynamic capacity extents flow. The host will
> > > +# need to respond to indicate that it has released the capacity before it
> > > +# is made unavailable for read and write and can be re-added.
> > > +#
> > > +# @path: CXL DCD canonical QOM path
> > > +# @region-id: id of the region where the extent to release
> > > +# @extents: Extents to release
> > > +#
> > > +# Since : 9.0  
> > 
> > 9.1
> >   
> > > +##
> > > +{ 'command': 'cxl-release-dynamic-capacity',
> > > +  'data': { 'path': 'str',
> > > +            'region-id': 'uint8',
> > > +            'extents': [ 'CXLDCExtentRecord' ]
> > > +           }
> > > +}  
> >   


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
@ 2024-04-10 19:49         ` Jonathan Cameron via
  0 siblings, 0 replies; 65+ messages in thread
From: Jonathan Cameron via @ 2024-04-10 19:49 UTC (permalink / raw)
  To: fan
  Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
	wj28.lee, Fan Ni

On Tue, 9 Apr 2024 14:26:51 -0700
fan <nifan.cxl@gmail.com> wrote:

> On Fri, Apr 05, 2024 at 01:18:56PM +0100, Jonathan Cameron wrote:
> > On Mon, 25 Mar 2024 12:02:27 -0700
> > nifan.cxl@gmail.com wrote:
> >   
> > > From: Fan Ni <fan.ni@samsung.com>
> > > 
> > > To simulate FM functionalities for initiating Dynamic Capacity Add
> > > (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> > > r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> > > add/release dynamic capacity extents requests.
> > > 
> > > With the change, we allow to release an extent only when its DPA range
> > > is contained by a single accepted extent in the device. That is to say,
> > > extent superset release is not supported yet.
> > > 
> > > 1. Add dynamic capacity extents:
> > > 
> > > For example, the command to add two continuous extents (each 128MiB long)
> > > to region 0 (starting at DPA offset 0) looks like below:
> > > 
> > > { "execute": "qmp_capabilities" }
> > > 
> > > { "execute": "cxl-add-dynamic-capacity",
> > >   "arguments": {
> > >       "path": "/machine/peripheral/cxl-dcd0",
> > >       "region-id": 0,
> > >       "extents": [
> > >       {
> > >           "offset": 0,
> > >           "len": 134217728
> > >       },
> > >       {
> > >           "offset": 134217728,
> > >           "len": 134217728
> > >       }  
> > 
> > Hi Fan,
> > 
> > I talk more on this inline, but to me this interface takes multiple extents
> > so that we can treat them as a single 'offer' of capacity. That is they
> > should be linked in the event log with the more flag and the host should
> > have to handle them in one go (I know Ira and Navneet's code doesn't handle
> > this yet, but that doesn't mean QEMU shouldn't).
> > 
> > Alternative for now would be to only support a single entry.  Keep the
> > interface defined to take multiple entries but reject it at runtime.
> > 
> > I don't want to end up with a more complex interface in the end just
> > because we allowed this form to not set the MORE flag today.
> > We will need this to do tagged handling and ultimately sharing, so good
> > to get it right from the start.
> > 
> > For tagged handling I think the right option is to have the tag alongside
> > region-id not in the individual extents.  That way the interface is naturally
> > used to generate the right description to the host.
> >   
> > >       ]
> > >   }
> > > }  
> Hi Jonathan,
> Thanks for the detailed comments.
> 
> For the QMP interface, I have one question. 
> Do we want the interface to follow exactly as shown in
> Table 7-70 and Table 7-71 in cxl r3.1?

I don't mind if it doesn't as long as it lets us pass reasonable
things in to test the kernel code.  I'd have the interface designed
to allow us to generate the set of records associated with a given
'request', e.g. all records with the same tag in the same QMP command.

If we want multiple sets of such records (and the extents to back
them) we can issue multiple calls.
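
Purely as a sketch of that idea (not part of the current series): a tagged
request could carry the tag once, next to region-id, with every extent in
the command belonging to that single offer, e.g.

    { "execute": "cxl-add-dynamic-capacity",
      "arguments": {
          "path": "/machine/peripheral/cxl-dcd0",
          "region-id": 0,
          "tag": "tenant-A",
          "extents": [
              { "offset": 0,         "len": 134217728 },
              { "offset": 134217728, "len": 134217728 }
          ]
      }
    }

Here "tag" is a hypothetical new argument (how a 16-byte device tag would
be encoded in QMP is an open question), and the device would emit all of
the resulting records with the More flag set on all but the last one.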

Jonathan



> 
> Fan
> 
> > > 
> > > 2. Release dynamic capacity extents:
> > > 
> > > For example, the command to release an extent of size 128MiB from region 0
> > > (DPA offset 128MiB) looks like below:
> > > 
> > > { "execute": "cxl-release-dynamic-capacity",
> > >   "arguments": {
> > >       "path": "/machine/peripheral/cxl-dcd0",
> > >       "region-id": 0,
> > >       "extents": [
> > >       {
> > >           "offset": 134217728,
> > >           "len": 134217728
> > >       }
> > >       ]
> > >   }
> > > }
> > > 
> > > Signed-off-by: Fan Ni <fan.ni@samsung.com>  
> > 
> > 
> >   
> > >          /* to-be-added range should not overlap with range already accepted */
> > >          QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
> > > @@ -1585,9 +1586,13 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> > >      CXLDCExtentList *extent_list = &ct3d->dc.extents;
> > >      uint32_t i;
> > >      uint64_t dpa, len;
> > > +    CXLDCExtent *ent;
> > >      CXLRetCode ret;
> > >  
> > >      if (in->num_entries_updated == 0) {
> > > +        /* Always remove the first pending extent when response received. */
> > > +        ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> > > +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);
> > >          return CXL_MBOX_SUCCESS;
> > >      }
> > >  
> > > @@ -1604,6 +1609,8 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> > >  
> > >      ret = cxl_dcd_add_dyn_cap_rsp_dry_run(ct3d, in);
> > >      if (ret != CXL_MBOX_SUCCESS) {
> > > +        ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> > > +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);  
> > 
> > Ah this deals with the todo I suggest you add to the earlier patch.
> > I'd not mind so much if you hadn't been so thorough on other todo notes ;)
> > Add one in the earlier patch and get rid of it here like you do below.
> > 
> > However as I note below I think we need to handle these as groups of extents
> > not single extents. That way we keep an 'offered' set offered at the same time by
> > a single command (and expose to host using the more flag) together and reject
> > them en masse.
> > 
> >   
> > >          return ret;
> > >      }
> > >  
> > > @@ -1613,10 +1620,9 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> > >  
> > >          cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
> > >          ct3d->dc.total_extent_count += 1;
> > > -        /*
> > > -         * TODO: we will add a pending extent list based on event log record
> > > -         * and process the list according here.
> > > -         */
> > > +
> > > +        ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> > > +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);
> > >      }
> > >  
> > >      return CXL_MBOX_SUCCESS;
> > > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > > index 951bd79a82..74cb64e843 100644
> > > --- a/hw/mem/cxl_type3.c
> > > +++ b/hw/mem/cxl_type3.c  
> >   
> > >  
> > >  static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> > > @@ -1449,7 +1454,8 @@ static int ct3d_qmp_cxl_event_log_enc(CxlEventLog log)
> > >          return CXL_EVENT_TYPE_FAIL;
> > >      case CXL_EVENT_LOG_FATAL:
> > >          return CXL_EVENT_TYPE_FATAL;
> > > -/* DCD not yet supported */  
> > 
> > Drop the comment but don't add the code.  We are handling DCD differently
> > from other events, so this code should never deal with it.
> >   
> > > +    case CXL_EVENT_LOG_DYNCAP:
> > > +        return CXL_EVENT_TYPE_DYNAMIC_CAP;
> > >      default:
> > >          return -EINVAL;
> > >      }
> > > @@ -1700,6 +1706,250 @@ void qmp_cxl_inject_memory_module_event(const char *path, CxlEventLog log,
> > >      }
> > >  }  
> >   
> > > +/*
> > > + * Check whether the range [dpa, dpa + len -1] has overlaps with extents in  
> > 
> > space after - (just looks odd otherwise)
> >   
> > > + * the list.
> > > + * Return value: return true if has overlaps; otherwise, return false
> > > + */
> > > +static bool cxl_extents_overlaps_dpa_range(CXLDCExtentList *list,
> > > +                                           uint64_t dpa, uint64_t len)
> > > +{
> > > +    CXLDCExtent *ent;
> > > +    Range range1, range2;
> > > +
> > > +    if (!list) {
> > > +        return false;
> > > +    }
> > > +
> > > +    range_init_nofail(&range1, dpa, len);
> > > +    QTAILQ_FOREACH(ent, list, node) {
> > > +        range_init_nofail(&range2, ent->start_dpa, ent->len);
> > > +        if (range_overlaps_range(&range1, &range2)) {
> > > +            return true;
> > > +        }
> > > +    }
> > > +    return false;
> > > +}
> > > +
> > > +/*
> > > + * Check whether the range [dpa, dpa + len -1] is contained by extents in   
> > 
> > space after -
> >   
> > > + * the list.
> > > + * Will check multiple extents containment once superset release is added.
> > > + * Return value: return true if range is contained; otherwise, return false
> > > + */
> > > +bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
> > > +                                    uint64_t dpa, uint64_t len)
> > > +{
> > > +    CXLDCExtent *ent;
> > > +    Range range1, range2;
> > > +
> > > +    if (!list) {
> > > +        return false;
> > > +    }
> > > +
> > > +    range_init_nofail(&range1, dpa, len);
> > > +    QTAILQ_FOREACH(ent, list, node) {
> > > +        range_init_nofail(&range2, ent->start_dpa, ent->len);
> > > +        if (range_contains_range(&range2, &range1)) {
> > > +            return true;
> > > +        }
> > > +    }
> > > +    return false;
> > > +}
> > > +
> > > +/*
> > > + * The main function to process dynamic capacity event. Currently DC extents
> > > + * add/release requests are processed.
> > > + */
> > > +static void qmp_cxl_process_dynamic_capacity(const char *path, CxlEventLog log,  
> > 
> > As below. Don't pass in a CxlEventLog.  Whilst some infrastructure is shared
> > with other event logs, we don't want to accidentally enable other events
> > being added to the DC event log.
> >   
> > > +                                             CXLDCEventType type, uint16_t hid,
> > > +                                             uint8_t rid,
> > > +                                             CXLDCExtentRecordList *records,
> > > +                                             Error **errp)
> > > +{
> > > +    Object *obj;
> > > +    CXLEventDynamicCapacity dCap = {};
> > > +    CXLEventRecordHdr *hdr = &dCap.hdr;
> > > +    CXLType3Dev *dcd;
> > > +    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
> > > +    uint32_t num_extents = 0;
> > > +    CXLDCExtentRecordList *list;
> > > +    g_autofree CXLDCExtentRaw *extents = NULL;
> > > +    uint8_t enc_log;
> > > +    uint64_t dpa, offset, len, block_size;
> > > +    int i, rc;
> > > +    g_autofree unsigned long *blk_bitmap = NULL;
> > > +
> > > +    obj = object_resolve_path_type(path, TYPE_CXL_TYPE3, NULL);
> > > +    if (!obj) {
> > > +        error_setg(errp, "Unable to resolve CXL type 3 device");
> > > +        return;
> > > +    }
> > > +
> > > +    dcd = CXL_TYPE3(obj);
> > > +    if (!dcd->dc.num_regions) {
> > > +        error_setg(errp, "No dynamic capacity support from the device");
> > > +        return;
> > > +    }
> > > +
> > > +    rc = ct3d_qmp_cxl_event_log_enc(log);  
> > 
> > enc_log = CXL_EVENT_TYPE_DYNAMIC_CAP; always so don't look it up.
> >   
> > > +    if (rc < 0) {
> > > +        error_setg(errp, "Unhandled error log type");
> > > +        return;
> > > +    }
> > > +    enc_log = rc;
> > > +
> > > +    if (rid >= dcd->dc.num_regions) {
> > > +        error_setg(errp, "region id is too large");
> > > +        return;
> > > +    }
> > > +    block_size = dcd->dc.regions[rid].block_size;
> > > +    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
> > > +
> > > +    /* Sanity check and count the extents */
> > > +    list = records;
> > > +    while (list) {
> > > +        offset = list->value->offset;
> > > +        len = list->value->len;
> > > +        dpa = offset + dcd->dc.regions[rid].base;
> > > +
> > > +        if (len == 0) {
> > > +            error_setg(errp, "extent with 0 length is not allowed");
> > > +            return;
> > > +        }
> > > +
> > > +        if (offset % block_size || len % block_size) {
> > > +            error_setg(errp, "dpa or len is not aligned to region block size");
> > > +            return;
> > > +        }
> > > +
> > > +        if (offset + len > dcd->dc.regions[rid].len) {
> > > +            error_setg(errp, "extent range is beyond the region end");
> > > +            return;
> > > +        }
> > > +
> > > +        /* No duplicate or overlapped extents are allowed */
> > > +        if (test_any_bits_set(blk_bitmap, offset / block_size,
> > > +                              len / block_size)) {
> > > +            error_setg(errp, "duplicate or overlapped extents are detected");
> > > +            return;
> > > +        }
> > > +        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> > > +
> > > +        num_extents++;
> > > +        if (type == DC_EVENT_RELEASE_CAPACITY) {
> > > +            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents_pending,
> > > +                                               dpa, len)) {
> > > +                error_setg(errp,
> > > +                           "cannot release extent with pending DPA range");
> > > +                return;
> > > +            }
> > > +            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents,
> > > +                                                dpa, len)) {
> > > +                error_setg(errp,
> > > +                           "cannot release extent with non-existing DPA range");
> > > +                return;
> > > +            }
> > > +        }
> > > +        list = list->next;
> > > +    }
> > > +    if (num_extents == 0) {  
> > 
> > We can just check if there is a first one.  That check can be done before
> > counting them and is probably a little more elegant than leaving it to down here.
> > I'm not sure we can pass in an empty list but if we can (easy to poke interface
> > and check) then I assume records == NULL. 
> >   
> > > +        error_setg(errp, "no valid extents to send to process");
> > > +        return;
> > > +    }
> > > +
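
A minimal sketch of the up-front check suggested above, assuming an empty
"extents" array really does arrive here as records == NULL:

    if (!records) {
        error_setg(errp, "no extents to process");
        return;
    }
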
> > > +    /* Create extent list for event being passed to host */
> > > +    i = 0;
> > > +    list = records;
> > > +    extents = g_new0(CXLDCExtentRaw, num_extents);
> > > +    while (list) {
> > > +        offset = list->value->offset;
> > > +        len = list->value->len;
> > > +        dpa = dcd->dc.regions[rid].base + offset;
> > > +
> > > +        extents[i].start_dpa = dpa;
> > > +        extents[i].len = len;
> > > +        memset(extents[i].tag, 0, 0x10);
> > > +        extents[i].shared_seq = 0;
> > > +        list = list->next;
> > > +        i++;
> > > +    }
> > > +
> > > +    /*
> > > +     * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
> > > +     *
> > > +     * All Dynamic Capacity event records shall set the Event Record Severity
> > > +     * field in the Common Event Record Format to Informational Event. All
> > > +     * Dynamic Capacity related events shall be logged in the Dynamic Capacity
> > > +     * Event Log.
> > > +     */
> > > +    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
> > > +                            cxl_device_get_timestamp(&dcd->cxl_dstate));
> > > +
> > > +    dCap.type = type;
> > > +    /* FIXME: for now, validity flag is cleared */
> > > +    dCap.validity_flags = 0;
> > > +    stw_le_p(&dCap.host_id, hid);
> > > +    /* only valid for DC_REGION_CONFIG_UPDATED event */
> > > +    dCap.updated_region_id = 0;
> > > +    /*
> > > +     * FIXME: for now, the "More" flag is cleared as there is only one
> > > +     * extent associating with each record and tag-based release is
> > > +     * not supported.  
> > 
> > This is misleading by my understanding of the specification.
> > More isn't directly related to tags (though it is necessary for some
> > flows with tags, when sharing is enabled anyway).
> > The reference to record also isn't that relevant. The idea is you set
> > it for all but the last record pushed to the event log (from a given
> > action from an FM).
> > 
> > The whole reason to have a multi extent injection interface is to set
> > the more flag to indicate that the OS needs to treat a bunch of extents
> > as one 'offer' of capacity.  So a rejection from the OS needs to take
> > out 'all those records'.  The proposed linux code will currently reject
> > all but the first extent (I moaned about that yesterday). 
> > 
> > It is fine to not support this in the current code, but then I would check
> > the number of extents and reject any multi extent commands until we
> > do support it.
> > 
> > Ultimately I want a qmp command with more than one extent to mean
> > they are one 'offer' of capacity and must be handled as such by
> > the OS.  I.e. it can't reply with multiple unrelated acceptance
> > or reject replies.
> > 
> > On the add side this is easy to support, the fiddly bit is if the
> > OS rejects some or all of the capacity and you then need to
> > take out all the extents offered that it hasn't accepted in its reply.
> > 
> > Pending list will need to maintain that association.
> > Maybe the easiest way is to have pending list be a list of sublists?
> > That way each sublist is handled in one go and any non accepted extents
> > in that sub list are dropped.
> > 
> >    
> > > +     */
> > > +    dCap.flags = 0;
> > > +    for (i = 0; i < num_extents; i++) {
> > > +        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
> > > +               sizeof(CXLDCExtentRaw));
> > > +
> > > +        if (type == DC_EVENT_ADD_CAPACITY) {
> > > +            cxl_insert_extent_to_extent_list(&dcd->dc.extents_pending,
> > > +                                             extents[i].start_dpa,
> > > +                                             extents[i].len,
> > > +                                             extents[i].tag,
> > > +                                             extents[i].shared_seq);
> > > +        }
> > > +
> > > +        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
> > > +                             (CXLEventRecordRaw *)&dCap)) {
> > > +            cxl_event_irq_assert(dcd);
> > > +        }
> > > +    }
> > > +}
> > > +
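
A rough sketch of the "pending list as a list of sublists" idea from the
comment further up; CXLDCExtentGroup and its naming are invented here
purely for illustration, building on the existing CXLDCExtentList:

    /*
     * Extents offered by one QMP command form one group.  The More flag
     * is set on all but the last record generated from a group, and when
     * the host responds the head group is consumed as a whole, with any
     * extents the host did not accept simply dropped.
     */
    typedef struct CXLDCExtentGroup {
        CXLDCExtentList list;
        QTAILQ_ENTRY(CXLDCExtentGroup) node;
    } CXLDCExtentGroup;
    typedef QTAILQ_HEAD(CXLDCExtentGroupList, CXLDCExtentGroup)
        CXLDCExtentGroupList;

extents_pending would then become a CXLDCExtentGroupList rather than a
flat CXLDCExtentList.
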
> > > +void qmp_cxl_add_dynamic_capacity(const char *path, uint8_t region_id,
> > > +                                  CXLDCExtentRecordList  *records,
> > > +                                  Error **errp)
> > > +{
> > > +   qmp_cxl_process_dynamic_capacity(path, CXL_EVENT_LOG_DYNCAP,  
> > 
> > Drop passing in the log, it doesn't make sense given these events only occur
> > on that log and we can hard code it in the function.
> >   
> > > +                                    DC_EVENT_ADD_CAPACITY, 0,
> > > +                                    region_id, records, errp);
> > > +}
> > > +
> > > +void qmp_cxl_release_dynamic_capacity(const char *path, uint8_t region_id,
> > > +                                      CXLDCExtentRecordList  *records,
> > > +                                      Error **errp)
> > > +{
> > > +    qmp_cxl_process_dynamic_capacity(path, CXL_EVENT_LOG_DYNCAP,
> > > +                                     DC_EVENT_RELEASE_CAPACITY, 0,
> > > +                                     region_id, records, errp);
> > > +}
> > > +
> > >  static void ct3_class_init(ObjectClass *oc, void *data)
> > >  {  
> > 
> >   
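
Taken together, the comments above about the log argument amount to
something like the following reshaped entry points (only a sketch of the
suggestion, with the body elided):

    static void qmp_cxl_process_dynamic_capacity(const char *path,
                                                 CXLDCEventType type,
                                                 uint16_t hid, uint8_t rid,
                                                 CXLDCExtentRecordList *records,
                                                 Error **errp)
    {
        /* DC events can only ever land in the Dynamic Capacity log */
        uint8_t enc_log = CXL_EVENT_TYPE_DYNAMIC_CAP;
        /* ... rest of the body as before, minus the log lookup ... */
    }

    void qmp_cxl_add_dynamic_capacity(const char *path, uint8_t region_id,
                                      CXLDCExtentRecordList *records,
                                      Error **errp)
    {
        qmp_cxl_process_dynamic_capacity(path, DC_EVENT_ADD_CAPACITY, 0,
                                         region_id, records, errp);
    }

with cxl-release-dynamic-capacity reworked the same way.
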
> > > diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> > > index df3511e91b..b84063d9f4 100644
> > > --- a/include/hw/cxl/cxl_device.h
> > > +++ b/include/hw/cxl/cxl_device.h
> > > @@ -494,6 +494,7 @@ struct CXLType3Dev {
> > >           */
> > >          uint64_t total_capacity; /* 256M aligned */
> > >          CXLDCExtentList extents;
> > > +        CXLDCExtentList extents_pending;
> > >          uint32_t total_extent_count;
> > >          uint32_t ext_list_gen_seq;  
> > 
> > 
> >   
> > >  #endif /* CXL_EVENTS_H */
> > > diff --git a/qapi/cxl.json b/qapi/cxl.json
> > > index 8cc4c72fa9..2645004666 100644
> > > --- a/qapi/cxl.json
> > > +++ b/qapi/cxl.json
> > > @@ -19,13 +19,16 @@
> > >  #
> > >  # @fatal: Fatal Event Log
> > >  #
> > > +# @dyncap: Dynamic Capacity Event Log
> > > +#
> > >  # Since: 8.1
> > >  ##
> > >  { 'enum': 'CxlEventLog',
> > >    'data': ['informational',
> > >             'warning',
> > >             'failure',
> > > -           'fatal']
> > > +           'fatal',
> > > +           'dyncap']  
> > 
> > Does this have the side effect of letting us inject error events
> > onto the dynamic capacity log? 
> >   
> > >   }
> > >  
> > >  ##
> > > @@ -361,3 +364,59 @@
> > >  ##
> > >  {'command': 'cxl-inject-correctable-error',
> > >   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}  
> > ...
> >   
> > > +##
> > > +# @cxl-add-dynamic-capacity:
> > > +#
> > > +# Command to start add dynamic capacity extents flow. The device will
> > > +# have to acknowledged the acceptance of the extents before they are usable.
> > > +#
> > > +# @path: CXL DCD canonical QOM path
> > > +# @region-id: id of the region where the extent to add
> > > +# @extents: Extents to add
> > > +#
> > > +# Since : 9.0  
> > 
> > 9.1
> >   
> > > +##
> > > +{ 'command': 'cxl-add-dynamic-capacity',
> > > +  'data': { 'path': 'str',
> > > +            'region-id': 'uint8',
> > > +            'extents': [ 'CXLDCExtentRecord' ]
> > > +           }
> > > +}
> > > +
> > > +##
> > > +# @cxl-release-dynamic-capacity:
> > > +#
> > > +# Command to start release dynamic capacity extents flow. The host will
> > > +# need to respond to indicate that it has released the capacity before it
> > > +# is made unavailable for read and write and can be re-added.
> > > +#
> > > +# @path: CXL DCD canonical QOM path
> > > +# @region-id: id of the region where the extent to release
> > > +# @extents: Extents to release
> > > +#
> > > +# Since : 9.0  
> > 
> > 9.1
> >   
> > > +##
> > > +{ 'command': 'cxl-release-dynamic-capacity',
> > > +  'data': { 'path': 'str',
> > > +            'region-id': 'uint8',
> > > +            'extents': [ 'CXLDCExtentRecord' ]
> > > +           }
> > > +}  
> >   



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 10/12] hw/mem/cxl_type3: Add dpa range validation for accesses to DC regions
  2024-03-25 19:02 ` [PATCH v6 10/12] hw/mem/cxl_type3: Add dpa range validation for accesses to DC regions nifan.cxl
  2024-04-05 12:29     ` Jonathan Cameron via
@ 2024-04-12 22:54   ` Gregory Price
  2024-04-15 17:37     ` fan
  1 sibling, 1 reply; 65+ messages in thread
From: Gregory Price @ 2024-04-12 22:54 UTC (permalink / raw)
  To: nifan.cxl
  Cc: qemu-devel, jonathan.cameron, linux-cxl, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	Jorgen.Hansen, wj28.lee, Fan Ni

On Mon, Mar 25, 2024 at 12:02:28PM -0700, nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
> 
> All dpa ranges in the DC regions are invalid to access until an extent
> covering the range has been added. Add a bitmap for each region to
> record whether a DC block in the region has been backed by DC extent.
> For the bitmap, a bit in the bitmap represents a DC block. When a DC
> extent is added, all the bits of the blocks in the extent will be set,
> which will be cleared when the extent is released.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>  hw/cxl/cxl-mailbox-utils.c  |  6 +++
>  hw/mem/cxl_type3.c          | 76 +++++++++++++++++++++++++++++++++++++
>  include/hw/cxl/cxl_device.h |  7 ++++
>  3 files changed, 89 insertions(+)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 7094e007b9..a0d2239176 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -1620,6 +1620,7 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
>  
>          cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
>          ct3d->dc.total_extent_count += 1;
> +        ct3_set_region_block_backed(ct3d, dpa, len);
>  
>          ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
>          cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);

while looking at the MHD code, we had decided to "reserve" the blocks in
the bitmap in the call to `qmp_cxl_process_dynamic_capacity` in order to
prevent a potential double-allocation (basically we need to sanity check
that two hosts aren't reserving the region PRIOR to the host being
notified).

I did not see any checks in the `qmp_cxl_process_dynamic_capacity` path
to prevent pending extents from being double-allocated.  Is this an
explicit choice?

I can see, for example, why you may want to allow the following in the
pending list: [Add X, Remove X, Add X].  I just want to know if this is
intentional or not. If not, you may consider adding a pending check
during the sanity check phase of `qmp_cxl_process_dynamic_capacity`
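
For reference, one way such a check could look, reusing the overlap helper
this series already adds and sitting in the first (sanity-check) pass of
qmp_cxl_process_dynamic_capacity(), mirroring the release-side checks; this
is only a sketch, and whether a re-offer of a still-pending range should be
rejected at all is exactly the policy question above:

    if (type == DC_EVENT_ADD_CAPACITY &&
        cxl_extents_overlaps_dpa_range(&dcd->dc.extents_pending, dpa, len)) {
        error_setg(errp, "extent overlaps with a pending extent");
        return;
    }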

~Gregory

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 10/12] hw/mem/cxl_type3: Add dpa range validation for accesses to DC regions
  2024-04-12 22:54   ` Gregory Price
@ 2024-04-15 17:37     ` fan
  2024-04-16 15:00         ` Jonathan Cameron via
  0 siblings, 1 reply; 65+ messages in thread
From: fan @ 2024-04-15 17:37 UTC (permalink / raw)
  To: Gregory Price
  Cc: nifan.cxl, qemu-devel, jonathan.cameron, linux-cxl, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	Jorgen.Hansen, wj28.lee, Fan Ni

On Fri, Apr 12, 2024 at 06:54:42PM -0400, Gregory Price wrote:
> On Mon, Mar 25, 2024 at 12:02:28PM -0700, nifan.cxl@gmail.com wrote:
> > From: Fan Ni <fan.ni@samsung.com>
> > 
> > All dpa ranges in the DC regions are invalid to access until an extent
> > covering the range has been added. Add a bitmap for each region to
> > record whether a DC block in the region has been backed by DC extent.
> > For the bitmap, a bit in the bitmap represents a DC block. When a DC
> > extent is added, all the bits of the blocks in the extent will be set,
> > which will be cleared when the extent is released.
> > 
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
> > ---
> >  hw/cxl/cxl-mailbox-utils.c  |  6 +++
> >  hw/mem/cxl_type3.c          | 76 +++++++++++++++++++++++++++++++++++++
> >  include/hw/cxl/cxl_device.h |  7 ++++
> >  3 files changed, 89 insertions(+)
> > 
> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > index 7094e007b9..a0d2239176 100644
> > --- a/hw/cxl/cxl-mailbox-utils.c
> > +++ b/hw/cxl/cxl-mailbox-utils.c
> > @@ -1620,6 +1620,7 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> >  
> >          cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
> >          ct3d->dc.total_extent_count += 1;
> > +        ct3_set_region_block_backed(ct3d, dpa, len);
> >  
> >          ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> >          cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);
> 
> while looking at the MHD code, we had decided to "reserve" the blocks in
> the bitmap in the call to `qmp_cxl_process_dynamic_capacity` in order to
> prevent a potential double-allocation (basically we need to sanity check
> that two hosts aren't reserving the region PRIOR to the host being
> notified).
> 
> I did not see any checks in the `qmp_cxl_process_dynamic_capacity` path
> to prevent pending extents from being double-allocated.  Is this an
> explicit choice?
> 
> I can see, for example, why you may want to allow the following in the
> pending list: [Add X, Remove X, Add X].  I just want to know if this is
> intentional or not. If not, you may consider adding a pending check
> during the sanity check phase of `qmp_cxl_process_dynamic_capacity`
> 
> ~Gregory

First, for a remove request, the pending list is not involved. See CXL r3.1,
9.13.3.3. Pending basically means "pending to add".
So for the above example, the pending list can contain [Add X, Add X] if the
events are not processed in time.
Second, from the spec, I cannot find any text saying we cannot issue
another add of extent X while it is still pending.
From the kernel side, if the first one is accepted, the second one will
get rejected, and there is no issue there.
If the first is rejected for some reason, the second one can get
accepted or rejected and does not need to worry about the first one.


Fan


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 08/12] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response
  2024-04-04 13:32   ` Jørgen Hansen
  2024-04-05 11:12       ` Jonathan Cameron via
  2024-04-09 19:21     ` fan
@ 2024-04-15 17:56     ` fan
  2024-04-16 10:02       ` Jørgen Hansen
  2024-04-15 18:00     ` fan
  3 siblings, 1 reply; 65+ messages in thread
From: fan @ 2024-04-15 17:56 UTC (permalink / raw)
  To: Jørgen Hansen
  Cc: nifan.cxl, qemu-devel, jonathan.cameron, linux-cxl,
	gregory.price, ira.weiny, dan.j.williams, a.manzanares, dave,
	nmtadam.samsung, jim.harris, wj28.lee, Fan Ni


From 4b9695299d3d4b22f83666f8ab79099ec9f9817f Mon Sep 17 00:00:00 2001
From: Fan Ni <fan.ni@samsung.com>
Date: Tue, 20 Feb 2024 09:48:30 -0800
Subject: [PATCH 08/13] hw/cxl/cxl-mailbox-utils: Add mailbox commands to
 support add/release dynamic capacity response

Per CXL spec 3.1, two mailbox commands are implemented:
Add Dynamic Capacity Response (Opcode 4802h) 8.2.9.9.9.3, and
Release Dynamic Capacity (Opcode 4803h) 8.2.9.9.9.4.

For processing the above two commands, we use a two-pass approach.
Pass 1: Check whether the input payload is valid or not; if not, skip
        Pass 2 and return a mailbox process error.
Pass 2: Do the real work--add or release extents, respectively.

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  | 396 ++++++++++++++++++++++++++++++++++++
 hw/mem/cxl_type3.c          |  11 +
 include/hw/cxl/cxl_device.h |   4 +
 3 files changed, 411 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 1915959015..cd9092b6bf 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -19,6 +19,7 @@
 #include "qemu/units.h"
 #include "qemu/uuid.h"
 #include "sysemu/hostmem.h"
+#include "qemu/range.h"
 
 #define CXL_CAPACITY_MULTIPLIER   (256 * MiB)
 #define CXL_DC_EVENT_LOG_SIZE 8
@@ -85,6 +86,8 @@ enum {
     DCD_CONFIG  = 0x48,
         #define GET_DC_CONFIG          0x0
         #define GET_DYN_CAP_EXT_LIST   0x1
+        #define ADD_DYN_CAP_RSP        0x2
+        #define RELEASE_DYN_CAP        0x3
     PHYSICAL_SWITCH = 0x51,
         #define IDENTIFY_SWITCH_DEVICE      0x0
         #define GET_PHYSICAL_PORT_STATE     0x1
@@ -1398,6 +1401,393 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
     return CXL_MBOX_SUCCESS;
 }
 
+/*
+ * Check whether any bit between addr[nr, nr+size) is set,
+ * return true if any bit is set, otherwise return false
+ */
+static bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
+                              unsigned long size)
+{
+    unsigned long res = find_next_bit(addr, size + nr, nr);
+
+    return res < nr + size;
+}
+
+CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
+{
+    int i;
+    CXLDCRegion *region = &ct3d->dc.regions[0];
+
+    if (dpa < region->base ||
+        dpa >= region->base + ct3d->dc.total_capacity) {
+        return NULL;
+    }
+
+    /*
+     * CXL r3.1 section 9.13.3: Dynamic Capacity Device (DCD)
+     *
+     * Regions are used in increasing-DPA order, with Region 0 being used for
+     * the lowest DPA of Dynamic Capacity and Region 7 for the highest DPA.
+     * So check from the last region to find where the dpa belongs. Extents that
+     * cross multiple regions are not allowed.
+     */
+    for (i = ct3d->dc.num_regions - 1; i >= 0; i--) {
+        region = &ct3d->dc.regions[i];
+        if (dpa >= region->base) {
+            if (dpa + len > region->base + region->len) {
+                return NULL;
+            }
+            return region;
+        }
+    }
+
+    return NULL;
+}
+
+static void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
+                                             uint64_t dpa,
+                                             uint64_t len,
+                                             uint8_t *tag,
+                                             uint16_t shared_seq)
+{
+    CXLDCExtent *extent;
+
+    extent = g_new0(CXLDCExtent, 1);
+    extent->start_dpa = dpa;
+    extent->len = len;
+    if (tag) {
+        memcpy(extent->tag, tag, 0x10);
+    }
+    extent->shared_seq = shared_seq;
+
+    QTAILQ_INSERT_TAIL(list, extent, node);
+}
+
+void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
+                                        CXLDCExtent *extent)
+{
+    QTAILQ_REMOVE(list, extent, node);
+    g_free(extent);
+}
+
+/*
+ * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload
+ * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload
+ */
+typedef struct CXLUpdateDCExtentListInPl {
+    uint32_t num_entries_updated;
+    uint8_t flags;
+    uint8_t rsvd[3];
+    /* CXL r3.1 Table 8-169: Updated Extent */
+    struct {
+        uint64_t start_dpa;
+        uint64_t len;
+        uint8_t rsvd[8];
+    } QEMU_PACKED updated_entries[];
+} QEMU_PACKED CXLUpdateDCExtentListInPl;
+
+/*
+ * For the extents in the extent list to operate, check whether they are valid
+ * 1. The extent should be in the range of a valid DC region;
+ * 2. The extent should not cross multiple regions;
+ * 3. The start DPA and the length of the extent should align with the block
+ * size of the region;
+ * 4. The address range of multiple extents in the list should not overlap.
+ */
+static CXLRetCode cxl_detect_malformed_extent_list(CXLType3Dev *ct3d,
+        const CXLUpdateDCExtentListInPl *in)
+{
+    uint64_t min_block_size = UINT64_MAX;
+    CXLDCRegion *region;
+    CXLDCRegion *lastregion = &ct3d->dc.regions[ct3d->dc.num_regions - 1];
+    g_autofree unsigned long *blk_bitmap = NULL;
+    uint64_t dpa, len;
+    uint32_t i;
+
+    for (i = 0; i < ct3d->dc.num_regions; i++) {
+        region = &ct3d->dc.regions[i];
+        min_block_size = MIN(min_block_size, region->block_size);
+    }
+
+    blk_bitmap = bitmap_new((lastregion->base + lastregion->len -
+                             ct3d->dc.regions[0].base) / min_block_size);
+
+    for (i = 0; i < in->num_entries_updated; i++) {
+        dpa = in->updated_entries[i].start_dpa;
+        len = in->updated_entries[i].len;
+
+        region = cxl_find_dc_region(ct3d, dpa, len);
+        if (!region) {
+            return CXL_MBOX_INVALID_PA;
+        }
+
+        dpa -= ct3d->dc.regions[0].base;
+        if (dpa % region->block_size || len % region->block_size) {
+            return CXL_MBOX_INVALID_EXTENT_LIST;
+        }
+        /* the dpa range already covered by some other extents in the list */
+        if (test_any_bits_set(blk_bitmap, dpa / min_block_size,
+            len / min_block_size)) {
+            return CXL_MBOX_INVALID_EXTENT_LIST;
+        }
+        bitmap_set(blk_bitmap, dpa / min_block_size, len / min_block_size);
+    }
+
+    return CXL_MBOX_SUCCESS;
+}
+
+static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
+        const CXLUpdateDCExtentListInPl *in)
+{
+    uint32_t i;
+    CXLDCExtent *ent;
+    uint64_t dpa, len;
+    Range range1, range2;
+
+    for (i = 0; i < in->num_entries_updated; i++) {
+        dpa = in->updated_entries[i].start_dpa;
+        len = in->updated_entries[i].len;
+
+        range_init_nofail(&range1, dpa, len);
+
+        /*
+         * TODO: once the pending extent list is added, check against
+         * the list will be added here.
+         */
+
+        /* to-be-added range should not overlap with range already accepted */
+        QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
+            range_init_nofail(&range2, ent->start_dpa, ent->len);
+            if (range_overlaps_range(&range1, &range2)) {
+                return CXL_MBOX_INVALID_PA;
+            }
+        }
+    }
+    return CXL_MBOX_SUCCESS;
+}
+
+/*
+ * CXL r3.1 section 8.2.9.9.9.3: Add Dynamic Capacity Response (Opcode 4802h)
+ * An extent is added to the extent list and becomes usable only after the
+ * response is processed successfully.
+ * TODO: Action on the pending list will be added for both error path and
+ *       success path once the pending extent list is introduced.
+ */
+static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
+                                          uint8_t *payload_in,
+                                          size_t len_in,
+                                          uint8_t *payload_out,
+                                          size_t *len_out,
+                                          CXLCCI *cci)
+{
+    CXLUpdateDCExtentListInPl *in = (void *)payload_in;
+    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
+    CXLDCExtentList *extent_list = &ct3d->dc.extents;
+    uint32_t i;
+    uint64_t dpa, len;
+    CXLRetCode ret;
+
+    if (in->num_entries_updated == 0) {
+        /*
+         * TODO: once the pending list is introduced, extents in the beginning
+         * will get wiped out.
+         */
+        return CXL_MBOX_SUCCESS;
+    }
+
+    /* Adding these extents would exceed the device's extent tracking limit. */
+    if (in->num_entries_updated + ct3d->dc.total_extent_count >
+        CXL_NUM_EXTENTS_SUPPORTED) {
+        return CXL_MBOX_RESOURCES_EXHAUSTED;
+    }
+
+    ret = cxl_detect_malformed_extent_list(ct3d, in);
+    if (ret != CXL_MBOX_SUCCESS) {
+        return ret;
+    }
+
+    ret = cxl_dcd_add_dyn_cap_rsp_dry_run(ct3d, in);
+    if (ret != CXL_MBOX_SUCCESS) {
+        return ret;
+    }
+
+    for (i = 0; i < in->num_entries_updated; i++) {
+        dpa = in->updated_entries[i].start_dpa;
+        len = in->updated_entries[i].len;
+
+        cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
+        ct3d->dc.total_extent_count += 1;
+        /*
+         * TODO: we will add a pending extent list based on event log record
+         * and process the list accordingly here.
+         */
+    }
+
+    return CXL_MBOX_SUCCESS;
+}
+
+/*
+ * Copy extent list from src to dst
+ * Return value: number of extents copied
+ */
+static uint32_t copy_extent_list(CXLDCExtentList *dst,
+                                 const CXLDCExtentList *src)
+{
+    uint32_t cnt = 0;
+    CXLDCExtent *ent;
+
+    if (!dst || !src) {
+        return 0;
+    }
+
+    QTAILQ_FOREACH(ent, src, node) {
+        cxl_insert_extent_to_extent_list(dst, ent->start_dpa, ent->len,
+                                         ent->tag, ent->shared_seq);
+        cnt++;
+    }
+    return cnt;
+}
+
+static CXLRetCode cxl_dc_extent_release_dry_run(CXLType3Dev *ct3d,
+        const CXLUpdateDCExtentListInPl *in, CXLDCExtentList *updated_list,
+        uint32_t *updated_list_size)
+{
+    CXLDCExtent *ent, *ent_next;
+    uint64_t dpa, len;
+    uint32_t i;
+    int cnt_delta = 0;
+    CXLRetCode ret = CXL_MBOX_SUCCESS;
+
+    QTAILQ_INIT(updated_list);
+    copy_extent_list(updated_list, &ct3d->dc.extents);
+
+    for (i = 0; i < in->num_entries_updated; i++) {
+        Range range;
+
+        dpa = in->updated_entries[i].start_dpa;
+        len = in->updated_entries[i].len;
+
+        while (len > 0) {
+            QTAILQ_FOREACH(ent, updated_list, node) {
+                range_init_nofail(&range, ent->start_dpa, ent->len);
+
+                if (range_contains(&range, dpa)) {
+                    uint64_t len1, len2 = 0, len_done = 0;
+                    uint64_t ent_start_dpa = ent->start_dpa;
+                    uint64_t ent_len = ent->len;
+
+                    len1 = dpa - ent->start_dpa;
+                    /* Found the extent or the subset of an existing extent */
+                    if (range_contains(&range, dpa + len - 1)) {
+                        len2 = ent_start_dpa + ent_len - dpa - len;
+                    } else {
+                        /*
+                         * TODO: we reject the attempt to remove an extent
+                         * that overlaps with multiple extents in the device
+                         * for now. We will allow it once superset release
+                         * support is added.
+                         */
+                        ret = CXL_MBOX_INVALID_PA;
+                        goto free_and_exit;
+                    }
+                    len_done = ent_len - len1 - len2;
+
+                    cxl_remove_extent_from_extent_list(updated_list, ent);
+                    cnt_delta--;
+
+                    if (len1) {
+                        cxl_insert_extent_to_extent_list(updated_list,
+                                                         ent_start_dpa,
+                                                         len1, NULL, 0);
+                        cnt_delta++;
+                    }
+                    if (len2) {
+                        cxl_insert_extent_to_extent_list(updated_list,
+                                                         dpa + len,
+                                                         len2, NULL, 0);
+                        cnt_delta++;
+                    }
+
+                    if (cnt_delta + ct3d->dc.total_extent_count >
+                            CXL_NUM_EXTENTS_SUPPORTED) {
+                        ret = CXL_MBOX_RESOURCES_EXHAUSTED;
+                        goto free_and_exit;
+                    }
+
+                    len -= len_done;
+                    /* len == 0 here until superset release is added */
+                    break;
+                }
+            }
+            if (len) {
+                ret = CXL_MBOX_INVALID_PA;
+                goto free_and_exit;
+            }
+        }
+    }
+free_and_exit:
+    if (ret != CXL_MBOX_SUCCESS) {
+        QTAILQ_FOREACH_SAFE(ent, updated_list, node, ent_next) {
+            cxl_remove_extent_from_extent_list(updated_list, ent);
+        }
+        *updated_list_size = 0;
+    } else {
+        *updated_list_size = ct3d->dc.total_extent_count + cnt_delta;
+    }
+
+    return ret;
+}
+
+/*
+ * CXL r3.1 section 8.2.9.9.9.4: Release Dynamic Capacity (Opcode 4803h)
+ */
+static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
+                                          uint8_t *payload_in,
+                                          size_t len_in,
+                                          uint8_t *payload_out,
+                                          size_t *len_out,
+                                          CXLCCI *cci)
+{
+    CXLUpdateDCExtentListInPl *in = (void *)payload_in;
+    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
+    CXLDCExtentList updated_list;
+    CXLDCExtent *ent, *ent_next;
+    uint32_t updated_list_size;
+    CXLRetCode ret;
+
+    if (in->num_entries_updated == 0) {
+        return CXL_MBOX_INVALID_INPUT;
+    }
+
+    ret = cxl_detect_malformed_extent_list(ct3d, in);
+    if (ret != CXL_MBOX_SUCCESS) {
+        return ret;
+    }
+
+    ret = cxl_dc_extent_release_dry_run(ct3d, in, &updated_list,
+                                        &updated_list_size);
+    if (ret != CXL_MBOX_SUCCESS) {
+        return ret;
+    }
+
+    /*
+     * If the dry run release passes, the returned updated_list is the new
+     * extent list: clear the currently accepted list, copy the extents
+     * from updated_list into it, and update the extent count.
+     */
+    QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
+        cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
+    }
+    copy_extent_list(&ct3d->dc.extents, &updated_list);
+    QTAILQ_FOREACH_SAFE(ent, &updated_list, node, ent_next) {
+        cxl_remove_extent_from_extent_list(&updated_list, ent);
+    }
+    ct3d->dc.total_extent_count = updated_list_size;
+
+    return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_DATA_CHANGE (1 << 2)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -1448,6 +1838,12 @@ static const struct cxl_cmd cxl_cmd_set_dcd[256][256] = {
     [DCD_CONFIG][GET_DYN_CAP_EXT_LIST] = {
         "DCD_GET_DYNAMIC_CAPACITY_EXTENT_LIST", cmd_dcd_get_dyn_cap_ext_list,
         8, 0 },
+    [DCD_CONFIG][ADD_DYN_CAP_RSP] = {
+        "DCD_ADD_DYNAMIC_CAPACITY_RESPONSE", cmd_dcd_add_dyn_cap_rsp,
+        ~0, IMMEDIATE_DATA_CHANGE },
+    [DCD_CONFIG][RELEASE_DYN_CAP] = {
+        "DCD_RELEASE_DYNAMIC_CAPACITY", cmd_dcd_release_dyn_cap,
+        ~0, IMMEDIATE_DATA_CHANGE },
 };
 
 static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 48cce3bb13..2d4b6242f0 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -671,6 +671,15 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
     return true;
 }
 
+static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
+{
+    CXLDCExtent *ent, *ent_next;
+
+    QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
+        cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
+    }
+}
+
 static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
 {
     DeviceState *ds = DEVICE(ct3d);
@@ -867,6 +876,7 @@ err_free_special_ops:
     g_free(regs->special_ops);
 err_address_space_free:
     if (ct3d->dc.host_dc) {
+        cxl_destroy_dc_regions(ct3d);
         address_space_destroy(&ct3d->dc.host_dc_as);
     }
     if (ct3d->hostpmem) {
@@ -888,6 +898,7 @@ static void ct3_exit(PCIDevice *pci_dev)
     cxl_doe_cdat_release(cxl_cstate);
     g_free(regs->special_ops);
     if (ct3d->dc.host_dc) {
+        cxl_destroy_dc_regions(ct3d);
         address_space_destroy(&ct3d->dc.host_dc_as);
     }
     if (ct3d->hostpmem) {
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 6aec6ac983..df3511e91b 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -551,4 +551,8 @@ void cxl_event_irq_assert(CXLType3Dev *ct3d);
 
 void cxl_set_poison_list_overflowed(CXLType3Dev *ct3d);
 
+CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
+
+void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
+                                        CXLDCExtent *extent);
 #endif
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 08/12] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response
  2024-04-04 13:32   ` Jørgen Hansen
                       ` (2 preceding siblings ...)
  2024-04-15 17:56     ` fan
@ 2024-04-15 18:00     ` fan
  3 siblings, 0 replies; 65+ messages in thread
From: fan @ 2024-04-15 18:00 UTC (permalink / raw)
  To: Jørgen Hansen
  Cc: nifan.cxl, qemu-devel, jonathan.cameron, linux-cxl,
	gregory.price, ira.weiny, dan.j.williams, a.manzanares, dave,
	nmtadam.samsung, jim.harris, wj28.lee, Fan Ni

On Thu, Apr 04, 2024 at 01:32:23PM +0000, Jørgen Hansen wrote:
> On 3/25/24 20:02, nifan.cxl@gmail.com wrote:
> > From: Fan Ni <fan.ni@samsung.com>
> > 
> > Per CXL spec 3.1, two mailbox commands are implemented:
> > Add Dynamic Capacity Response (Opcode 4802h) 8.2.9.9.9.3, and
> > Release Dynamic Capacity (Opcode 4803h) 8.2.9.9.9.4.
> > 
> > For the process of the above two commands, we use two-pass approach.
> > Pass 1: Check whether the input payload is valid or not; if not, skip
> >          Pass 2 and return mailbox process error.
> > Pass 2: Do the real work--add or release extents, respectively.
> > 
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
> > ---
> >   hw/cxl/cxl-mailbox-utils.c  | 433 +++++++++++++++++++++++++++++++++++-
> >   hw/mem/cxl_type3.c          |  11 +
> >   include/hw/cxl/cxl_device.h |   4 +
> >   3 files changed, 444 insertions(+), 4 deletions(-)
> > 
> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > index 30ef46a036..a9eca516c8 100644
> > --- a/hw/cxl/cxl-mailbox-utils.c
> > +++ b/hw/cxl/cxl-mailbox-utils.c
> > @@ -19,6 +19,7 @@
> >   #include "qemu/units.h"
> >   #include "qemu/uuid.h"
> >   #include "sysemu/hostmem.h"
> > +#include "qemu/range.h"
> > 
> >   #define CXL_CAPACITY_MULTIPLIER   (256 * MiB)
> >   #define CXL_DC_EVENT_LOG_SIZE 8
> > @@ -85,6 +86,8 @@ enum {
> >       DCD_CONFIG  = 0x48,
> >           #define GET_DC_CONFIG          0x0
> >           #define GET_DYN_CAP_EXT_LIST   0x1
> > +        #define ADD_DYN_CAP_RSP        0x2
> > +        #define RELEASE_DYN_CAP        0x3
> >       PHYSICAL_SWITCH = 0x51,
> >           #define IDENTIFY_SWITCH_DEVICE      0x0
> >           #define GET_PHYSICAL_PORT_STATE     0x1
> > @@ -1400,6 +1403,422 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
> >       return CXL_MBOX_SUCCESS;
> >   }
> > 
> > +/*
> > + * Check whether any bit between addr[nr, nr+size) is set,
> > + * return true if any bit is set, otherwise return false
> > + */
> > +static bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
> > +                              unsigned long size)
> > +{
> > +    unsigned long res = find_next_bit(addr, size + nr, nr);
> > +
> > +    return res < nr + size;
> > +}
> > +
> > +CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
> > +{
> > +    int i;
> > +    CXLDCRegion *region = &ct3d->dc.regions[0];
> > +
> > +    if (dpa < region->base ||
> > +        dpa >= region->base + ct3d->dc.total_capacity) {
> > +        return NULL;
> > +    }
> > +
> > +    /*
> > +     * CXL r3.1 section 9.13.3: Dynamic Capacity Device (DCD)
> > +     *
> > +     * Regions are used in increasing-DPA order, with Region 0 being used for
> > +     * the lowest DPA of Dynamic Capacity and Region 7 for the highest DPA.
> > +     * So check from the last region to find where the dpa belongs. Extents that
> > +     * cross multiple regions are not allowed.
> > +     */
> > +    for (i = ct3d->dc.num_regions - 1; i >= 0; i--) {
> > +        region = &ct3d->dc.regions[i];
> > +        if (dpa >= region->base) {
> > +            if (dpa + len > region->base + region->len) {
> > +                return NULL;
> > +            }
> > +            return region;
> > +        }
> > +    }
> > +
> > +    return NULL;
> > +}
> > +
> > +static void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
> > +                                             uint64_t dpa,
> > +                                             uint64_t len,
> > +                                             uint8_t *tag,
> > +                                             uint16_t shared_seq)
> > +{
> > +    CXLDCExtent *extent;
> > +
> > +    extent = g_new0(CXLDCExtent, 1);
> > +    extent->start_dpa = dpa;
> > +    extent->len = len;
> > +    if (tag) {
> > +        memcpy(extent->tag, tag, 0x10);
> > +    }
> > +    extent->shared_seq = shared_seq;
> > +
> > +    QTAILQ_INSERT_TAIL(list, extent, node);
> > +}
> > +
> > +void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
> > +                                        CXLDCExtent *extent)
> > +{
> > +    QTAILQ_REMOVE(list, extent, node);
> > +    g_free(extent);
> > +}
> > +
> > +/*
> > + * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload
> > + * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload
> > + */
> > +typedef struct CXLUpdateDCExtentListInPl {
> > +    uint32_t num_entries_updated;
> > +    uint8_t flags;
> > +    uint8_t rsvd[3];
> > +    /* CXL r3.1 Table 8-169: Updated Extent */
> > +    struct {
> > +        uint64_t start_dpa;
> > +        uint64_t len;
> > +        uint8_t rsvd[8];
> > +    } QEMU_PACKED updated_entries[];
> > +} QEMU_PACKED CXLUpdateDCExtentListInPl;
> > +
> > +/*
> > + * For the extents in the extent list to operate, check whether they are valid
> > + * 1. The extent should be in the range of a valid DC region;
> > + * 2. The extent should not cross multiple regions;
> > + * 3. The start DPA and the length of the extent should align with the block
> > + * size of the region;
> > + * 4. The address range of multiple extents in the list should not overlap.
> > + */
> > +static CXLRetCode cxl_detect_malformed_extent_list(CXLType3Dev *ct3d,
> > +        const CXLUpdateDCExtentListInPl *in)
> > +{
> > +    uint64_t min_block_size = UINT64_MAX;
> > +    CXLDCRegion *region = &ct3d->dc.regions[0];
> > +    CXLDCRegion *lastregion = &ct3d->dc.regions[ct3d->dc.num_regions - 1];
> > +    g_autofree unsigned long *blk_bitmap = NULL;
> > +    uint64_t dpa, len;
> > +    uint32_t i;
> > +
> > +    for (i = 0; i < ct3d->dc.num_regions; i++) {
> > +        region = &ct3d->dc.regions[i];
> > +        min_block_size = MIN(min_block_size, region->block_size);
> > +    }
> > +
> > +    blk_bitmap = bitmap_new((lastregion->base + lastregion->len -
> > +                             ct3d->dc.regions[0].base) / min_block_size);
> > +
> > +    for (i = 0; i < in->num_entries_updated; i++) {
> > +        dpa = in->updated_entries[i].start_dpa;
> > +        len = in->updated_entries[i].len;
> > +
> > +        region = cxl_find_dc_region(ct3d, dpa, len);
> > +        if (!region) {
> > +            return CXL_MBOX_INVALID_PA;
> > +        }
> > +
> > +        dpa -= ct3d->dc.regions[0].base;
> > +        if (dpa % region->block_size || len % region->block_size) {
> > +            return CXL_MBOX_INVALID_EXTENT_LIST;
> > +        }
> > +        /* the dpa range already covered by some other extents in the list */
> > +        if (test_any_bits_set(blk_bitmap, dpa / min_block_size,
> > +            len / min_block_size)) {
> > +            return CXL_MBOX_INVALID_EXTENT_LIST;
> > +        }
> > +        bitmap_set(blk_bitmap, dpa / min_block_size, len / min_block_size);
> > +   }
> > +
> > +    return CXL_MBOX_SUCCESS;
> > +}
> > +
> > +static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
> > +        const CXLUpdateDCExtentListInPl *in)
> > +{
> > +    uint32_t i;
> > +    CXLDCExtent *ent;
> > +    uint64_t dpa, len;
> > +    Range range1, range2;
> > +
> > +    for (i = 0; i < in->num_entries_updated; i++) {
> > +        dpa = in->updated_entries[i].start_dpa;
> > +        len = in->updated_entries[i].len;
> > +
> > +        range_init_nofail(&range1, dpa, len);
> > +
> > +        /*
> > +         * TODO: once the pending extent list is added, check against
> > +         * the list will be added here.
> > +         */
> > +
> > +        /* to-be-added range should not overlap with range already accepted */
> > +        QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
> > +            range_init_nofail(&range2, ent->start_dpa, ent->len);
> > +            if (range_overlaps_range(&range1, &range2)) {
> > +                return CXL_MBOX_INVALID_PA;
> > +            }
> > +        }
> > +    }
> > +    return CXL_MBOX_SUCCESS;
> > +}
> 
> Instead of iterating over all new extents and all existing extents, 
> couldn't this be rolled into cxl_detect_malformed_extent_list - the 
> bitmap created there summarizes all ranges of the new extents, so you 
> can just check that the existing (and pending) extents don't overlap 
> with anything in the bitmap? Or allow the bitmap to be returned and used 
> for this check, since cxl_detect_malformed_extent_list is also used on 
> release, where things aren't as simple.
> 
> > +
> > +/*
> > + * CXL r3.1 section 8.2.9.9.9.3: Add Dynamic Capacity Response (Opcode 4802h)
> > + * An extent is added to the extent list and becomes usable only after the
> > + * response is processed successfully
> > + */
> > +static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> > +                                          uint8_t *payload_in,
> > +                                          size_t len_in,
> > +                                          uint8_t *payload_out,
> > +                                          size_t *len_out,
> > +                                          CXLCCI *cci)
> > +{
> > +    CXLUpdateDCExtentListInPl *in = (void *)payload_in;
> > +    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
> > +    CXLDCExtentList *extent_list = &ct3d->dc.extents;
> > +    uint32_t i;
> > +    uint64_t dpa, len;
> > +    CXLRetCode ret;
> > +
> > +    if (in->num_entries_updated == 0) {
> > +        return CXL_MBOX_SUCCESS;
> > +    }
> 
> The mailbox processing in patch 2 converts from le explicitly, whereas 
> the mailbox commands here don't. Looking at the existing mailbox 
> commands, convertion doesn't seem to be rigorously applied, so maybe 
> that is OK?
> 
> > +
> > +    /* Adding extents causes exceeding device's extent tracking ability. */
> > +    if (in->num_entries_updated + ct3d->dc.total_extent_count >
> > +        CXL_NUM_EXTENTS_SUPPORTED) {
> > +        return CXL_MBOX_RESOURCES_EXHAUSTED;
> > +    }
> > +
> > +    ret = cxl_detect_malformed_extent_list(ct3d, in);
> > +    if (ret != CXL_MBOX_SUCCESS) {
> > +        return ret;
> > +    }
> > +
> > +    ret = cxl_dcd_add_dyn_cap_rsp_dry_run(ct3d, in);
> > +    if (ret != CXL_MBOX_SUCCESS) {
> > +        return ret;
> > +    }
> > +
> > +    for (i = 0; i < in->num_entries_updated; i++) {
> > +        dpa = in->updated_entries[i].start_dpa;
> > +        len = in->updated_entries[i].len;
> > +
> > +        cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
> > +        ct3d->dc.total_extent_count += 1;
> > +        /*
> > +         * TODO: we will add a pending extent list based on event log record
> > +         * and process the list according here.
> > +         */
> > +    }
> > +
> > +    return CXL_MBOX_SUCCESS;
> > +}
> > +
> > +/*
> > + * Copy extent list from src to dst
> > + * Return value: number of extents copied
> > + */
> > +static uint32_t copy_extent_list(CXLDCExtentList *dst,
> > +                                 const CXLDCExtentList *src)
> > +{
> > +    uint32_t cnt = 0;
> > +    CXLDCExtent *ent;
> > +
> > +    if (!dst || !src) {
> > +        return 0;
> > +    }
> > +
> > +    QTAILQ_FOREACH(ent, src, node) {
> > +        cxl_insert_extent_to_extent_list(dst, ent->start_dpa, ent->len,
> > +                                         ent->tag, ent->shared_seq);
> > +        cnt++;
> > +    }
> > +    return cnt;
> > +}
> > +
> > +static CXLRetCode cxl_dc_extent_release_dry_run(CXLType3Dev *ct3d,
> > +        const CXLUpdateDCExtentListInPl *in)
> > +{
> > +    CXLDCExtent *ent, *ent_next;
> > +    uint64_t dpa, len;
> > +    uint32_t i;
> > +    int cnt_delta = 0;
> > +    CXLDCExtentList tmp_list;
> > +    CXLRetCode ret = CXL_MBOX_SUCCESS;
> > +
> > +    if (in->num_entries_updated == 0) {
> > +        return CXL_MBOX_INVALID_INPUT;
> > +    }
> > +
> > +    QTAILQ_INIT(&tmp_list);
> > +    copy_extent_list(&tmp_list, &ct3d->dc.extents);
> > +
> > +    for (i = 0; i < in->num_entries_updated; i++) {
> > +        Range range;
> > +
> > +        dpa = in->updated_entries[i].start_dpa;
> > +        len = in->updated_entries[i].len;
> > +
> > +        while (len > 0) {
> > +            QTAILQ_FOREACH(ent, &tmp_list, node) {
> > +                range_init_nofail(&range, ent->start_dpa, ent->len);
> > +
> > +                if (range_contains(&range, dpa)) {
> > +                    uint64_t len1, len2, len_done = 0;
> > +                    uint64_t ent_start_dpa = ent->start_dpa;
> > +                    uint64_t ent_len = ent->len;
> > +                    /*
> > +                     * Found the exact extent or the subset of an existing
> > +                     * extent.
> > +                     */
> > +                    if (range_contains(&range, dpa + len - 1)) {
> > +                        len1 = dpa - ent->start_dpa;
> > +                        len2 = ent_start_dpa + ent_len - dpa - len;
> > +                        len_done = ent_len - len1 - len2;
> > +
> > +                        cxl_remove_extent_from_extent_list(&tmp_list, ent);
> > +                        cnt_delta--;
> > +
> > +                        if (len1) {
> > +                            cxl_insert_extent_to_extent_list(&tmp_list,
> > +                                                             ent_start_dpa,
> > +                                                             len1, NULL, 0);
> > +                            cnt_delta++;
> > +                        }
> > +                        if (len2) {
> > +                            cxl_insert_extent_to_extent_list(&tmp_list,
> > +                                                             dpa + len,
> > +                                                             len2, NULL, 0);
> > +                            cnt_delta++;
> > +                        }
> > +
> > +                        if (cnt_delta + ct3d->dc.total_extent_count >
> > +                            CXL_NUM_EXTENTS_SUPPORTED) {
> > +                            ret = CXL_MBOX_RESOURCES_EXHAUSTED;
> > +                            goto free_and_exit;
> > +                        }
> > +                    } else {
> > +                        /*
> > +                         * TODO: we reject the attempt to remove an extent
> > +                         * that overlaps with multiple extents in the device
> > +                         * for now, we will allow it once superset release
> > +                         * support is added.
> > +                         */
> > +                        ret = CXL_MBOX_INVALID_PA;
> > +                        goto free_and_exit;
> > +                    }
> > +
> > +                    len -= len_done;
> > +                    /* len == 0 here until superset release is added */
> > +                    break;
> > +                }
> > +            }
> > +            if (len) {
> > +                ret = CXL_MBOX_INVALID_PA;
> > +                goto free_and_exit;
> > +            }
> > +        }
> > +    }
> > +free_and_exit:
> > +    QTAILQ_FOREACH_SAFE(ent, &tmp_list, node, ent_next) {
> > +        cxl_remove_extent_from_extent_list(&tmp_list, ent);
> > +    }
> > +
> > +    return ret;
> > +}
> > +
> > +/*
> > + * CXL r3.1 section 8.2.9.9.9.4: Release Dynamic Capacity (Opcode 4803h)
> > + */
> > +static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
> > +                                          uint8_t *payload_in,
> > +                                          size_t len_in,
> > +                                          uint8_t *payload_out,
> > +                                          size_t *len_out,
> > +                                          CXLCCI *cci)
> > +{
> > +    CXLUpdateDCExtentListInPl *in = (void *)payload_in;
> > +    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
> > +    CXLDCExtentList *extent_list = &ct3d->dc.extents;
> > +    CXLDCExtent *ent;
> > +    uint32_t i;
> > +    uint64_t dpa, len;
> > +    CXLRetCode ret;
> > +
> > +    if (in->num_entries_updated == 0) {
> > +        return CXL_MBOX_INVALID_INPUT;
> > +    }
> > +
> > +    ret = cxl_detect_malformed_extent_list(ct3d, in);
> > +    if (ret != CXL_MBOX_SUCCESS) {
> > +        return ret;
> > +    }
> > +
> > +    ret = cxl_dc_extent_release_dry_run(ct3d, in);
> > +    if (ret != CXL_MBOX_SUCCESS) {
> > +        return ret;
> > +    }
> > +
> > +    /* From this point, all the extents to release are valid */
> > +    for (i = 0; i < in->num_entries_updated; i++) {
> > +        Range range;
> > +
> > +        dpa = in->updated_entries[i].start_dpa;
> > +        len = in->updated_entries[i].len;
> > +
> > +        while (len > 0) {
> > +            QTAILQ_FOREACH(ent, extent_list, node) {
> > +                range_init_nofail(&range, ent->start_dpa, ent->len);
> > +
> > +                /* Found the extent overlapping with */
> > +                if (range_contains(&range, dpa)) {
> > +                    uint64_t len1, len2 = 0, len_done = 0;
> > +                    uint64_t ent_start_dpa = ent->start_dpa;
> > +                    uint64_t ent_len = ent->len;
> > +
> > +                    len1 = dpa - ent_start_dpa;
> > +                    if (range_contains(&range, dpa + len - 1)) {
> > +                        len2 = ent_start_dpa + ent_len - dpa - len;
> > +                    }
> > +                    len_done = ent_len - len1 - len2;
> > +
> > +                    cxl_remove_extent_from_extent_list(extent_list, ent);
> > +                    ct3d->dc.total_extent_count -= 1;
> > +
> > +                    if (len1) {
> > +                        cxl_insert_extent_to_extent_list(extent_list,
> > +                                                         ent_start_dpa,
> > +                                                         len1, NULL, 0);
> > +                        ct3d->dc.total_extent_count += 1;
> > +                    }
> > +                    if (len2) {
> > +                        cxl_insert_extent_to_extent_list(extent_list,
> > +                                                         dpa + len,
> > +                                                         len2, NULL, 0);
> > +                        ct3d->dc.total_extent_count += 1;
> > +                    }
> > +
> > +                    len -= len_done;
> > +                    /*
> > +                     * len will always be 0 until superset release is add.
> > +                     * TODO: superset release will be added.
> > +                     */
> > +                    break;
> > +                }
> > +            }
> > +        }
> > +    }
> 
> The tmp_list generated in cxl_dc_extent_release_dry_run is identical to 
> the updated extent_list after the loops above - so you could swap the 
> existing extent_list with the tmp_list and adjust the number of extents 
> with the cnt_delta calculated, if the dry run is successful - instead of 
> duplicating the logic.
> 
> Thanks,
> Jørgen

Hi Jorgen and Jonathan,
Based on your feedback, I have simplified the code by reusing the
tmp_list.
I have redone this patch and all the following ones and will share them in
this thread; please take a look if you have time.
I will send out the next full series if the changes look good to you.

Thanks,
Fan

> 
> > +    return CXL_MBOX_SUCCESS;
> > +}
> > +
> >   #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
> >   #define IMMEDIATE_DATA_CHANGE (1 << 2)
> >   #define IMMEDIATE_POLICY_CHANGE (1 << 3)
> > @@ -1413,15 +1832,15 @@ static const struct cxl_cmd cxl_cmd_set[256][256] = {
> >       [EVENTS][CLEAR_RECORDS] = { "EVENTS_CLEAR_RECORDS",
> >           cmd_events_clear_records, ~0, IMMEDIATE_LOG_CHANGE },
> >       [EVENTS][GET_INTERRUPT_POLICY] = { "EVENTS_GET_INTERRUPT_POLICY",
> > -                                      cmd_events_get_interrupt_policy, 0, 0 },
> > +        cmd_events_get_interrupt_policy, 0, 0 },
> >       [EVENTS][SET_INTERRUPT_POLICY] = { "EVENTS_SET_INTERRUPT_POLICY",
> > -                                      cmd_events_set_interrupt_policy,
> > -                                      ~0, IMMEDIATE_CONFIG_CHANGE },
> > +        cmd_events_set_interrupt_policy,
> > +        ~0, IMMEDIATE_CONFIG_CHANGE },
> >       [FIRMWARE_UPDATE][GET_INFO] = { "FIRMWARE_UPDATE_GET_INFO",
> >           cmd_firmware_update_get_info, 0, 0 },
> >       [TIMESTAMP][GET] = { "TIMESTAMP_GET", cmd_timestamp_get, 0, 0 },
> >       [TIMESTAMP][SET] = { "TIMESTAMP_SET", cmd_timestamp_set,
> > -                         8, IMMEDIATE_POLICY_CHANGE },
> > +        8, IMMEDIATE_POLICY_CHANGE },
> >       [LOGS][GET_SUPPORTED] = { "LOGS_GET_SUPPORTED", cmd_logs_get_supported,
> >                                 0, 0 },
> >       [LOGS][GET_LOG] = { "LOGS_GET_LOG", cmd_logs_get_log, 0x18, 0 },
> > @@ -1450,6 +1869,12 @@ static const struct cxl_cmd cxl_cmd_set_dcd[256][256] = {
> >       [DCD_CONFIG][GET_DYN_CAP_EXT_LIST] = {
> >           "DCD_GET_DYNAMIC_CAPACITY_EXTENT_LIST", cmd_dcd_get_dyn_cap_ext_list,
> >           8, 0 },
> > +    [DCD_CONFIG][ADD_DYN_CAP_RSP] = {
> > +        "DCD_ADD_DYNAMIC_CAPACITY_RESPONSE", cmd_dcd_add_dyn_cap_rsp,
> > +        ~0, IMMEDIATE_DATA_CHANGE },
> > +    [DCD_CONFIG][RELEASE_DYN_CAP] = {
> > +        "DCD_RELEASE_DYNAMIC_CAPACITY", cmd_dcd_release_dyn_cap,
> > +        ~0, IMMEDIATE_DATA_CHANGE },
> >   };
> > 
> >   static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index 5be3c904ba..951bd79a82 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> > @@ -678,6 +678,15 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
> >       return true;
> >   }
> > 
> > +static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
> > +{
> > +    CXLDCExtent *ent, *ent_next;
> > +
> > +    QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
> > +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
> > +    }
> > +}
> > +
> >   static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> >   {
> >       DeviceState *ds = DEVICE(ct3d);
> > @@ -874,6 +883,7 @@ err_free_special_ops:
> >       g_free(regs->special_ops);
> >   err_address_space_free:
> >       if (ct3d->dc.host_dc) {
> > +        cxl_destroy_dc_regions(ct3d);
> >           address_space_destroy(&ct3d->dc.host_dc_as);
> >       }
> >       if (ct3d->hostpmem) {
> > @@ -895,6 +905,7 @@ static void ct3_exit(PCIDevice *pci_dev)
> >       cxl_doe_cdat_release(cxl_cstate);
> >       g_free(regs->special_ops);
> >       if (ct3d->dc.host_dc) {
> > +        cxl_destroy_dc_regions(ct3d);
> >           address_space_destroy(&ct3d->dc.host_dc_as);
> >       }
> >       if (ct3d->hostpmem) {
> > diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> > index 6aec6ac983..df3511e91b 100644
> > --- a/include/hw/cxl/cxl_device.h
> > +++ b/include/hw/cxl/cxl_device.h
> > @@ -551,4 +551,8 @@ void cxl_event_irq_assert(CXLType3Dev *ct3d);
> > 
> >   void cxl_set_poison_list_overflowed(CXLType3Dev *ct3d);
> > 
> > +CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
> > +
> > +void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
> > +                                        CXLDCExtent *extent);
> >   #endif
> > --
> > 2.43.0
> > 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2024-04-10 19:49         ` Jonathan Cameron via
  (?)
@ 2024-04-15 20:06         ` fan
  2024-04-16 14:58             ` Jonathan Cameron via
  -1 siblings, 1 reply; 65+ messages in thread
From: fan @ 2024-04-15 20:06 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: fan, qemu-devel, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	Jorgen.Hansen, wj28.lee, Fan Ni


From ce75be83e915fbc4dd6e489f976665b81174002b Mon Sep 17 00:00:00 2001
From: Fan Ni <fan.ni@samsung.com>
Date: Tue, 20 Feb 2024 09:48:31 -0800
Subject: [PATCH 09/13] hw/cxl/events: Add qmp interfaces to add/release
 dynamic capacity extents

To simulate FM functionalities for initiating Dynamic Capacity Add
(Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
add/release dynamic capacity extent requests.

With this change, an extent can be released only when its DPA range is
contained within a single accepted extent in the device. That is to say,
extent superset release is not supported yet.

1. Add dynamic capacity extents:

For example, the command to add two contiguous extents (each 128MiB long)
to region 0 (starting at DPA offset 0) looks like below:

{ "execute": "qmp_capabilities" }

{ "execute": "cxl-add-dynamic-capacity",
  "arguments": {
      "path": "/machine/peripheral/cxl-dcd0",
      "hid": 0,
      "selection-policy": 2,
      "region-id": 0,
      "tag": "",
      "extents": [
      {
          "offset": 0,
          "len": 134217728
      },
      {
          "offset": 134217728,
          "len": 134217728
      }
      ]
  }
}

2. Release dynamic capacity extents:

For example, the command to release an extent of size 128MiB from region 0
(DPA offset 128MiB) looks like below:

{ "execute": "cxl-release-dynamic-capacity",
  "arguments": {
      "path": "/machine/peripheral/cxl-dcd0",
      "hid": 0,
      "flags": 1,
      "region-id": 0,
      "tag": "",
      "extents": [
      {
          "offset": 134217728,
          "len": 134217728
      }
      ]
  }
}

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  |  65 ++++++--
 hw/mem/cxl_type3.c          | 310 +++++++++++++++++++++++++++++++++++-
 hw/mem/cxl_type3_stubs.c    |  20 +++
 include/hw/cxl/cxl_device.h |  22 +++
 include/hw/cxl/cxl_events.h |  18 +++
 qapi/cxl.json               |  69 ++++++++
 6 files changed, 491 insertions(+), 13 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index cd9092b6bf..839ae836a1 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -1405,7 +1405,7 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
  * Check whether any bit between addr[nr, nr+size) is set,
  * return true if any bit is set, otherwise return false
  */
-static bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
+bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
                               unsigned long size)
 {
     unsigned long res = find_next_bit(addr, size + nr, nr);
@@ -1444,7 +1444,7 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
     return NULL;
 }
 
-static void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
+void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
                                              uint64_t dpa,
                                              uint64_t len,
                                              uint8_t *tag,
@@ -1470,6 +1470,44 @@ void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
     g_free(extent);
 }
 
+/*
+ * Add a new extent to the extent "group" if group exists;
+ * otherwise, create a new group
+ * Return value: return the group where the extent is inserted.
+ */
+CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
+                                                    uint64_t dpa,
+                                                    uint64_t len,
+                                                    uint8_t *tag,
+                                                    uint16_t shared_seq)
+{
+    if (!group) {
+        group = g_new0(CXLDCExtentGroup, 1);
+        QTAILQ_INIT(&group->list);
+    }
+    cxl_insert_extent_to_extent_list(&group->list, dpa, len,
+                                     tag, shared_seq);
+    return group;
+}
+
+void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
+                                       CXLDCExtentGroup *group)
+{
+    QTAILQ_INSERT_TAIL(list, group, node);
+}
+
+void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list)
+{
+    CXLDCExtent *ent, *ent_next;
+    CXLDCExtentGroup *group = QTAILQ_FIRST(list);
+
+    QTAILQ_REMOVE(list, group, node);
+    QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
+        cxl_remove_extent_from_extent_list(&group->list, ent);
+    }
+    g_free(group);
+}
+
 /*
  * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload
  * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload
@@ -1541,6 +1579,7 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
 {
     uint32_t i;
     CXLDCExtent *ent;
+    CXLDCExtentGroup *ext_group;
     uint64_t dpa, len;
     Range range1, range2;
 
@@ -1551,9 +1590,13 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
         range_init_nofail(&range1, dpa, len);
 
         /*
-         * TODO: once the pending extent list is added, check against
-         * the list will be added here.
+         * The host-accepted DPA range must be contained by the first extent
+         * group in the pending list
          */
+        ext_group = QTAILQ_FIRST(&ct3d->dc.extents_pending);
+        if (!cxl_extents_contains_dpa_range(&ext_group->list, dpa, len)) {
+            return CXL_MBOX_INVALID_PA;
+        }
 
         /* to-be-added range should not overlap with range already accepted */
         QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
@@ -1588,26 +1631,26 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
     CXLRetCode ret;
 
     if (in->num_entries_updated == 0) {
-        /*
-         * TODO: once the pending list is introduced, extents in the beginning
-         * will get wiped out.
-         */
+        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
         return CXL_MBOX_SUCCESS;
     }
 
     /* Adding extents causes exceeding device's extent tracking ability. */
     if (in->num_entries_updated + ct3d->dc.total_extent_count >
         CXL_NUM_EXTENTS_SUPPORTED) {
+        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
         return CXL_MBOX_RESOURCES_EXHAUSTED;
     }
 
     ret = cxl_detect_malformed_extent_list(ct3d, in);
     if (ret != CXL_MBOX_SUCCESS) {
+        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
         return ret;
     }
 
     ret = cxl_dcd_add_dyn_cap_rsp_dry_run(ct3d, in);
     if (ret != CXL_MBOX_SUCCESS) {
+        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
         return ret;
     }
 
@@ -1617,11 +1660,9 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
 
         cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
         ct3d->dc.total_extent_count += 1;
-        /*
-         * TODO: we will add a pending extent list based on event log record
-         * and process the list accordingly here.
-         */
     }
+    /* Remove the first extent group in the pending list */
+    cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
 
     return CXL_MBOX_SUCCESS;
 }
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 2d4b6242f0..8d99b27b27 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -667,6 +667,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
         ct3d->dc.total_capacity += region->len;
     }
     QTAILQ_INIT(&ct3d->dc.extents);
+    QTAILQ_INIT(&ct3d->dc.extents_pending);
 
     return true;
 }
@@ -674,10 +675,19 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
 static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
 {
     CXLDCExtent *ent, *ent_next;
+    CXLDCExtentGroup *group, *group_next;
 
     QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
         cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
     }
+
+    QTAILQ_FOREACH_SAFE(group, &ct3d->dc.extents_pending, node, group_next) {
+        QTAILQ_REMOVE(&ct3d->dc.extents_pending, group, node);
+        QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
+            cxl_remove_extent_from_extent_list(&group->list, ent);
+        }
+        g_free(group);
+    }
 }
 
 static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
@@ -1442,7 +1452,6 @@ static int ct3d_qmp_cxl_event_log_enc(CxlEventLog log)
         return CXL_EVENT_TYPE_FAIL;
     case CXL_EVENT_LOG_FATAL:
         return CXL_EVENT_TYPE_FATAL;
-/* DCD not yet supported */
     default:
         return -EINVAL;
     }
@@ -1693,6 +1702,305 @@ void qmp_cxl_inject_memory_module_event(const char *path, CxlEventLog log,
     }
 }
 
+/* CXL r3.1 Table 8-50: Dynamic Capacity Event Record */
+static const QemuUUID dynamic_capacity_uuid = {
+    .data = UUID(0xca95afa7, 0xf183, 0x4018, 0x8c, 0x2f,
+                 0x95, 0x26, 0x8e, 0x10, 0x1a, 0x2a),
+};
+
+typedef enum CXLDCEventType {
+    DC_EVENT_ADD_CAPACITY = 0x0,
+    DC_EVENT_RELEASE_CAPACITY = 0x1,
+    DC_EVENT_FORCED_RELEASE_CAPACITY = 0x2,
+    DC_EVENT_REGION_CONFIG_UPDATED = 0x3,
+    DC_EVENT_ADD_CAPACITY_RSP = 0x4,
+    DC_EVENT_CAPACITY_RELEASED = 0x5,
+} CXLDCEventType;
+
+/*
+ * Check whether the range [dpa, dpa + len - 1] has overlaps with extents in
+ * the list.
+ * Return value: return true if has overlaps; otherwise, return false
+ */
+static bool cxl_extents_overlaps_dpa_range(CXLDCExtentList *list,
+                                           uint64_t dpa, uint64_t len)
+{
+    CXLDCExtent *ent;
+    Range range1, range2;
+
+    if (!list) {
+        return false;
+    }
+
+    range_init_nofail(&range1, dpa, len);
+    QTAILQ_FOREACH(ent, list, node) {
+        range_init_nofail(&range2, ent->start_dpa, ent->len);
+        if (range_overlaps_range(&range1, &range2)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+/*
+ * Check whether the range [dpa, dpa + len - 1] is contained by extents in
+ * the list.
+ * Multi-extent containment will be checked once superset release is added.
+ * Return value: return true if range is contained; otherwise, return false
+ */
+bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
+                                    uint64_t dpa, uint64_t len)
+{
+    CXLDCExtent *ent;
+    Range range1, range2;
+
+    if (!list) {
+        return false;
+    }
+
+    range_init_nofail(&range1, dpa, len);
+    QTAILQ_FOREACH(ent, list, node) {
+        range_init_nofail(&range2, ent->start_dpa, ent->len);
+        if (range_contains_range(&range2, &range1)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+static bool cxl_extent_groups_overlaps_dpa_range(CXLDCExtentGroupList *list,
+                                                uint64_t dpa, uint64_t len)
+{
+    CXLDCExtentGroup *group;
+
+    if (!list) {
+        return false;
+    }
+
+    QTAILQ_FOREACH(group, list, node) {
+        if (cxl_extents_overlaps_dpa_range(&group->list, dpa, len)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+/*
+ * The main function to process dynamic capacity event with extent list.
+ * Currently DC extents add/release requests are processed.
+ */
+static void qmp_cxl_process_dynamic_capacity_prescriptive(const char *path,
+        uint16_t hid, CXLDCEventType type, uint8_t rid,
+        CXLDCExtentRecordList *records, Error **errp)
+{
+    Object *obj;
+    CXLEventDynamicCapacity dCap = {};
+    CXLEventRecordHdr *hdr = &dCap.hdr;
+    CXLType3Dev *dcd;
+    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
+    uint32_t num_extents = 0;
+    CXLDCExtentRecordList *list;
+    CXLDCExtentGroup *group = NULL;
+    g_autofree CXLDCExtentRaw *extents = NULL;
+    uint8_t enc_log = CXL_EVENT_TYPE_DYNAMIC_CAP;
+    uint64_t dpa, offset, len, block_size;
+    g_autofree unsigned long *blk_bitmap = NULL;
+    int i;
+
+    obj = object_resolve_path_type(path, TYPE_CXL_TYPE3, NULL);
+    if (!obj) {
+        error_setg(errp, "Unable to resolve CXL type 3 device");
+        return;
+    }
+
+    dcd = CXL_TYPE3(obj);
+    if (!dcd->dc.num_regions) {
+        error_setg(errp, "No dynamic capacity support from the device");
+        return;
+    }
+
+
+    if (rid >= dcd->dc.num_regions) {
+        error_setg(errp, "region id is too large");
+        return;
+    }
+    block_size = dcd->dc.regions[rid].block_size;
+    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
+
+    /* Sanity check and count the extents */
+    list = records;
+    while (list) {
+        offset = list->value->offset;
+        len = list->value->len;
+        dpa = offset + dcd->dc.regions[rid].base;
+
+        if (len == 0) {
+            error_setg(errp, "extent with 0 length is not allowed");
+            return;
+        }
+
+        if (offset % block_size || len % block_size) {
+            error_setg(errp, "dpa or len is not aligned to region block size");
+            return;
+        }
+
+        if (offset + len > dcd->dc.regions[rid].len) {
+            error_setg(errp, "extent range is beyond the region end");
+            return;
+        }
+
+        /* No duplicate or overlapped extents are allowed */
+        if (test_any_bits_set(blk_bitmap, offset / block_size,
+                              len / block_size)) {
+            error_setg(errp, "duplicate or overlapped extents are detected");
+            return;
+        }
+        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
+
+        if (type == DC_EVENT_RELEASE_CAPACITY) {
+            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
+                                                     dpa, len)) {
+                error_setg(errp,
+                           "cannot release extent with pending DPA range");
+                return;
+            }
+            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents, dpa, len)) {
+                error_setg(errp,
+                           "cannot release extent with non-existing DPA range");
+                return;
+            }
+        } else if (type == DC_EVENT_ADD_CAPACITY) {
+            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents, dpa, len)) {
+                error_setg(errp,
+                           "cannot add DPA already accessible to the same LD");
+                return;
+            }
+        }
+        list = list->next;
+        num_extents++;
+    }
+
+    if (num_extents > 1) {
+        error_setg(errp,
+                   "TODO: remove the check once the kernel supports the More flag");
+        return;
+    }
+
+    /* Create extent list for event being passed to host */
+    i = 0;
+    list = records;
+    extents = g_new0(CXLDCExtentRaw, num_extents);
+    while (list) {
+        offset = list->value->offset;
+        len = list->value->len;
+        dpa = dcd->dc.regions[rid].base + offset;
+
+        extents[i].start_dpa = dpa;
+        extents[i].len = len;
+        memset(extents[i].tag, 0, 0x10);
+        extents[i].shared_seq = 0;
+        if (type == DC_EVENT_ADD_CAPACITY) {
+            group = cxl_insert_extent_to_extent_group(group,
+                                                      extents[i].start_dpa,
+                                                      extents[i].len,
+                                                      extents[i].tag,
+                                                      extents[i].shared_seq);
+        }
+
+        list = list->next;
+        i++;
+    }
+    if (group) {
+        cxl_extent_group_list_insert_tail(&dcd->dc.extents_pending, group);
+    }
+
+    /*
+     * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
+     *
+     * All Dynamic Capacity event records shall set the Event Record Severity
+     * field in the Common Event Record Format to Informational Event. All
+     * Dynamic Capacity related events shall be logged in the Dynamic Capacity
+     * Event Log.
+     */
+    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
+                            cxl_device_get_timestamp(&dcd->cxl_dstate));
+
+    dCap.type = type;
+    /* FIXME: for now, validity flag is cleared */
+    dCap.validity_flags = 0;
+    stw_le_p(&dCap.host_id, hid);
+    /* only valid for DC_REGION_CONFIG_UPDATED event */
+    dCap.updated_region_id = 0;
+    dCap.flags = 0;
+    for (i = 0; i < num_extents; i++) {
+        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
+               sizeof(CXLDCExtentRaw));
+
+        if (i < num_extents - 1) {
+            /* Set "More" flag */
+            dCap.flags |= BIT(0);
+        }
+
+        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
+                             (CXLEventRecordRaw *)&dCap)) {
+            cxl_event_irq_assert(dcd);
+        }
+    }
+}
+
+void qmp_cxl_add_dynamic_capacity(const char *path, uint16_t hid,
+                                  uint8_t sel_policy, uint8_t region_id,
+                                  const char *tag,
+                                  CXLDCExtentRecordList  *records,
+                                  Error **errp)
+{
+    enum {
+        CXL_SEL_POLICY_FREE,
+        CXL_SEL_POLICY_CONTIGUOUS,
+        CXL_SEL_POLICY_PRESCRIPTIVE,
+        CXL_SEL_POLICY_ENABLESHAREDACCESS,
+    };
+    switch (sel_policy) {
+    case CXL_SEL_POLICY_PRESCRIPTIVE:
+        qmp_cxl_process_dynamic_capacity_prescriptive(path, hid,
+                                                      DC_EVENT_ADD_CAPACITY,
+                                                      region_id, records, errp);
+        break;
+    default:
+        error_setg(errp, "Selection policy not supported");
+        break;
+    }
+}
+
+#define REMOVAL_POLICY_MASK 0xf
+#define FORCED_REMOVAL_BIT BIT(4)
+
+void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t hid,
+                                      uint8_t flags, uint8_t region_id,
+                                      const char *tag,
+                                      CXLDCExtentRecordList  *records,
+                                      Error **errp)
+{
+    CXLDCEventType type = DC_EVENT_RELEASE_CAPACITY;
+
+    if (flags & FORCED_REMOVAL_BIT) {
+        /* TODO: enable forced removal in the future */
+        type = DC_EVENT_FORCED_RELEASE_CAPACITY;
+        error_setg(errp, "Forced removal not supported yet");
+        return;
+    }
+
+    switch (flags & REMOVAL_POLICY_MASK) {
+    case 1:
+        qmp_cxl_process_dynamic_capacity_prescriptive(path, hid, type,
+                                                      region_id, records, errp);
+        break;
+    default:
+        error_setg(errp, "Removal policy not supported");
+        break;
+    }
+}
+
 static void ct3_class_init(ObjectClass *oc, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(oc);
diff --git a/hw/mem/cxl_type3_stubs.c b/hw/mem/cxl_type3_stubs.c
index 3e1851e32b..810685e0d5 100644
--- a/hw/mem/cxl_type3_stubs.c
+++ b/hw/mem/cxl_type3_stubs.c
@@ -67,3 +67,23 @@ void qmp_cxl_inject_correctable_error(const char *path, CxlCorErrorType type,
 {
     error_setg(errp, "CXL Type 3 support is not compiled in");
 }
+
+void qmp_cxl_add_dynamic_capacity(const char *path,
+                                  uint16_t hid,
+                                  uint8_t sel_policy,
+                                  uint8_t region_id,
+                                  const char *tag,
+                                  CXLDCExtentRecordList  *records,
+                                  Error **errp)
+{
+    error_setg(errp, "CXL Type 3 support is not compiled in");
+}
+
+void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t hid,
+                                      uint8_t flags, uint8_t region_id,
+                                      const char *tag,
+                                      CXLDCExtentRecordList  *records,
+                                      Error **errp)
+{
+    error_setg(errp, "CXL Type 3 support is not compiled in");
+}
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index df3511e91b..c69ff6b5de 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -443,6 +443,12 @@ typedef struct CXLDCExtent {
 } CXLDCExtent;
 typedef QTAILQ_HEAD(, CXLDCExtent) CXLDCExtentList;
 
+typedef struct CXLDCExtentGroup {
+    CXLDCExtentList list;
+    QTAILQ_ENTRY(CXLDCExtentGroup) node;
+} CXLDCExtentGroup;
+typedef QTAILQ_HEAD(, CXLDCExtentGroup) CXLDCExtentGroupList;
+
 typedef struct CXLDCRegion {
     uint64_t base;       /* aligned to 256*MiB */
     uint64_t decode_len; /* aligned to 256*MiB */
@@ -494,6 +500,7 @@ struct CXLType3Dev {
          */
         uint64_t total_capacity; /* 256M aligned */
         CXLDCExtentList extents;
+        CXLDCExtentGroupList extents_pending;
         uint32_t total_extent_count;
         uint32_t ext_list_gen_seq;
 
@@ -555,4 +562,19 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
 
 void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
                                         CXLDCExtent *extent);
+void cxl_insert_extent_to_extent_list(CXLDCExtentList *list, uint64_t dpa,
+                                      uint64_t len, uint8_t *tag,
+                                      uint16_t shared_seq);
+bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
+                       unsigned long size);
+bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
+                                    uint64_t dpa, uint64_t len);
+CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
+                                                    uint64_t dpa,
+                                                    uint64_t len,
+                                                    uint8_t *tag,
+                                                    uint16_t shared_seq);
+void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
+                                       CXLDCExtentGroup *group);
+void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list);
 #endif
diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
index 5170b8dbf8..38cadaa0f3 100644
--- a/include/hw/cxl/cxl_events.h
+++ b/include/hw/cxl/cxl_events.h
@@ -166,4 +166,22 @@ typedef struct CXLEventMemoryModule {
     uint8_t reserved[0x3d];
 } QEMU_PACKED CXLEventMemoryModule;
 
+/*
+ * CXL r3.1 Table 8-50: Dynamic Capacity Event Record
+ * All fields little endian.
+ */
+typedef struct CXLEventDynamicCapacity {
+    CXLEventRecordHdr hdr;
+    uint8_t type;
+    uint8_t validity_flags;
+    uint16_t host_id;
+    uint8_t updated_region_id;
+    uint8_t flags;
+    uint8_t reserved2[2];
+    uint8_t dynamic_capacity_extent[0x28]; /* defined in cxl_device.h */
+    uint8_t reserved[0x18];
+    uint32_t extents_avail;
+    uint32_t tags_avail;
+} QEMU_PACKED CXLEventDynamicCapacity;
+
 #endif /* CXL_EVENTS_H */
diff --git a/qapi/cxl.json b/qapi/cxl.json
index 8cc4c72fa9..8cdcfce708 100644
--- a/qapi/cxl.json
+++ b/qapi/cxl.json
@@ -361,3 +361,72 @@
 ##
 {'command': 'cxl-inject-correctable-error',
  'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
+
+##
+# @CXLDCExtentRecord:
+#
+# Record of a single extent to add/release
+#
+# @offset: offset of the extent relative to the start of the region
+# @len: length of the extent
+#
+# Since: 9.1
+##
+{ 'struct': 'CXLDCExtentRecord',
+  'data': {
+      'offset':'uint64',
+      'len': 'uint64'
+  }
+}
+
+##
+# @cxl-add-dynamic-capacity:
+#
+# Command to start the add dynamic capacity extents flow. The device will
+# have to acknowledge the acceptance of the extents before they are usable.
+#
+# @path: CXL DCD canonical QOM path
+# @hid: host id
+# @selection-policy: policy to use for selecting extents for adding capacity
+# @region-id: id of the region where the extents are to be added
+# @tag: Context field
+# @extents: Extents to add
+#
+# Since: 9.1
+##
+{ 'command': 'cxl-add-dynamic-capacity',
+  'data': { 'path': 'str',
+            'hid': 'uint16',
+            'selection-policy': 'uint8',
+            'region-id': 'uint8',
+            'tag': 'str',
+            'extents': [ 'CXLDCExtentRecord' ]
+           }
+}
+
+##
+# @cxl-release-dynamic-capacity:
+#
+# Command to start the release dynamic capacity extents flow. The host will
+# need to respond to indicate that it has released the capacity before it
+# is made unavailable for read and write and can be re-added.
+#
+# @path: CXL DCD canonical QOM path
+# @hid: host id
+# @flags: bit[3:0] for removal policy, bit[4] for forced removal, bit[5] for
+#     sanitize on release, bit[7:6] reserved
+# @region-id: id of the region where the extents are to be released
+# @tag: Context field
+# @extents: Extents to release
+#
+# Since: 9.1
+##
+{ 'command': 'cxl-release-dynamic-capacity',
+  'data': { 'path': 'str',
+            'hid': 'uint16',
+            'flags': 'uint8',
+            'region-id': 'uint8',
+            'tag': 'str',
+            'extents': [ 'CXLDCExtentRecord' ]
+           }
+}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 11/12] hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support
  2024-04-05  9:57   ` Jørgen Hansen
@ 2024-04-15 20:17     ` fan
  0 siblings, 0 replies; 65+ messages in thread
From: fan @ 2024-04-15 20:17 UTC (permalink / raw)
  To: Jørgen Hansen
  Cc: nifan.cxl, qemu-devel, jonathan.cameron, linux-cxl,
	gregory.price, ira.weiny, dan.j.williams, a.manzanares, dave,
	nmtadam.samsung, jim.harris, wj28.lee, Fan Ni

On Fri, Apr 05, 2024 at 09:57:18AM +0000, Jørgen Hansen wrote:
> On 3/25/24 20:02, nifan.cxl@gmail.com wrote:
> > From: Fan Ni <fan.ni@samsung.com>
> > 
> > With the change, we extend the extent release mailbox command processing
> > to allow more flexible release. As long as the DPA range of the extent to
> > release is covered by accepted extent(s) in the device, the release can be
> > performed.
> > 
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
> > ---
> >   hw/cxl/cxl-mailbox-utils.c | 41 ++++++++++++++++++++++----------------
> >   1 file changed, 24 insertions(+), 17 deletions(-)
> > 
> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > index a0d2239176..3b7949c364 100644
> > --- a/hw/cxl/cxl-mailbox-utils.c
> > +++ b/hw/cxl/cxl-mailbox-utils.c
> > @@ -1674,6 +1674,12 @@ static CXLRetCode cxl_dc_extent_release_dry_run(CXLType3Dev *ct3d,
> >           dpa = in->updated_entries[i].start_dpa;
> >           len = in->updated_entries[i].len;
> > 
> > +        /* Check if the DPA range is not fully backed with valid extents */
> > +        if (!ct3_test_region_block_backed(ct3d, dpa, len)) {
> > +            ret = CXL_MBOX_INVALID_PA;
> > +            goto free_and_exit;
> > +        }
> 
> In cxl_dcd_add_dyn_cap_rsp_dry_run, the opposite check (all 0's in the 
> bitmap) could be used instead of looping through the full extent list 
> (and this also makes my previous comment about reusing the bitmap from 
> cxl_detect_malformed_extent_list irrelevant).

For adding, we need to make sure the incoming extents have no overlap
with already accepted extents; that means if any bit in the range is
set, we return an error. We cannot use !ct3_test_region_block_backed
for this purpose, as it only returns true when all bits are set, not
when any bit is set.

For that, we would need a function like
ct3_test_region_block_all_cleared or ct3_test_region_block_non_backed,
which the current code does not have.
Checking the bitmap is more efficient, but it introduces more changes,
so I will leave it as it is unless there are more concerns.

Fan
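
A rough sketch of such a helper (the name is invented here and it assumes the
per-region blk_bitmap tracking added elsewhere in this series, so treat it as
illustrative only, not as posted code):

/* True only if no block in [dpa, dpa + len) is marked backed in its region */
static bool ct3_test_region_block_all_cleared(CXLType3Dev *ct3d,
                                              uint64_t dpa, uint64_t len)
{
    CXLDCRegion *region = cxl_find_dc_region(ct3d, dpa, len);

    if (!region) {
        return false;
    }

    return !test_any_bits_set(region->blk_bitmap,
                              (dpa - region->base) / region->block_size,
                              len / region->block_size);
}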

> 
> > +        /* After this point, extent overflow is the only error can happen */
> >           while (len > 0) {
> >               QTAILQ_FOREACH(ent, &tmp_list, node) {
> >                   range_init_nofail(&range, ent->start_dpa, ent->len);
> > @@ -1713,25 +1719,27 @@ static CXLRetCode cxl_dc_extent_release_dry_run(CXLType3Dev *ct3d,
> >                               goto free_and_exit;
> >                           }
> >                       } else {
> > -                        /*
> > -                         * TODO: we reject the attempt to remove an extent
> > -                         * that overlaps with multiple extents in the device
> > -                         * for now, we will allow it once superset release
> > -                         * support is added.
> > -                         */
> > -                        ret = CXL_MBOX_INVALID_PA;
> > -                        goto free_and_exit;
> > +                        len1 = dpa - ent_start_dpa;
> > +                        len2 = 0;
> > +                        len_done = ent_len - len1 - len2;
> 
> You don't need len2 in the else statement.
> 
> Thanks,
> Jørgen
> 
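For illustration, the else branch without len2 would reduce to the following
(a sketch against the hunk quoted below, behavior unchanged):

                        } else {
                            len1 = dpa - ent_start_dpa;
                            len_done = ent_len - len1;

                            cxl_remove_extent_from_extent_list(&tmp_list, ent);
                            cnt_delta--;
                            if (len1) {
                                cxl_insert_extent_to_extent_list(&tmp_list,
                                                                 ent_start_dpa,
                                                                 len1, NULL, 0);
                                cnt_delta++;
                            }
                        }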
> > +
> > +                        cxl_remove_extent_from_extent_list(&tmp_list, ent);
> > +                        cnt_delta--;
> > +                        if (len1) {
> > +                            cxl_insert_extent_to_extent_list(&tmp_list,
> > +                                                             ent_start_dpa,
> > +                                                             len1, NULL, 0);
> > +                            cnt_delta++;
> > +                        }
> >                       }
> > 
> >                       len -= len_done;
> > -                    /* len == 0 here until superset release is added */
> > +                    if (len) {
> > +                        dpa = ent_start_dpa + ent_len;
> > +                    }
> >                       break;
> >                   }
> >               }
> > -            if (len) {
> > -                ret = CXL_MBOX_INVALID_PA;
> > -                goto free_and_exit;
> > -            }
> >           }
> >       }
> >   free_and_exit:
> > @@ -1819,10 +1827,9 @@ static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
> >                       }
> > 
> >                       len -= len_done;
> > -                    /*
> > -                     * len will always be 0 until superset release is add.
> > -                     * TODO: superset release will be added.
> > -                     */
> > +                    if (len > 0) {
> > +                        dpa = ent_start_dpa + ent_len;
> > +                    }
> >                       break;
> >                   }
> >               }
> > --
> > 2.43.0
> > 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 08/12] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response
  2024-04-15 17:56     ` fan
@ 2024-04-16 10:02       ` Jørgen Hansen
  2024-04-16 16:27         ` fan
  0 siblings, 1 reply; 65+ messages in thread
From: Jørgen Hansen @ 2024-04-16 10:02 UTC (permalink / raw)
  To: fan
  Cc: qemu-devel, jonathan.cameron, linux-cxl, gregory.price,
	ira.weiny, dan.j.williams, a.manzanares, dave, nmtadam.samsung,
	jim.harris, wj28.lee, Fan Ni

On 4/15/24 19:56, fan wrote:
>  From 4b9695299d3d4b22f83666f8ab79099ec9f9817f Mon Sep 17 00:00:00 2001
> From: Fan Ni <fan.ni@samsung.com>
> Date: Tue, 20 Feb 2024 09:48:30 -0800
> Subject: [PATCH 08/13] hw/cxl/cxl-mailbox-utils: Add mailbox commands to
>   support add/release dynamic capacity response
> 
> Per CXL spec 3.1, two mailbox commands are implemented:
> Add Dynamic Capacity Response (Opcode 4802h) 8.2.9.9.9.3, and
> Release Dynamic Capacity (Opcode 4803h) 8.2.9.9.9.4.
> 
> To process the above two commands, we use a two-pass approach.
> Pass 1: Check whether the input payload is valid or not; if not, skip
>          Pass 2 and return mailbox process error.
> Pass 2: Do the real work--add or release extents, respectively.
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>   hw/cxl/cxl-mailbox-utils.c  | 396 ++++++++++++++++++++++++++++++++++++
>   hw/mem/cxl_type3.c          |  11 +
>   include/hw/cxl/cxl_device.h |   4 +
>   3 files changed, 411 insertions(+)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 1915959015..cd9092b6bf 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c

snip

> +/*
> + * Copy extent list from src to dst
> + * Return value: number of extents copied
> + */
> +static uint32_t copy_extent_list(CXLDCExtentList *dst,
> +                                 const CXLDCExtentList *src)
> +{
> +    uint32_t cnt = 0;
> +    CXLDCExtent *ent;
> +
> +    if (!dst || !src) {
> +        return 0;
> +    }
> +
> +    QTAILQ_FOREACH(ent, src, node) {
> +        cxl_insert_extent_to_extent_list(dst, ent->start_dpa, ent->len,
> +                                         ent->tag, ent->shared_seq);
> +        cnt++;
> +    }
> +    return cnt;
> +}
> +
> +static CXLRetCode cxl_dc_extent_release_dry_run(CXLType3Dev *ct3d,
> +        const CXLUpdateDCExtentListInPl *in, CXLDCExtentList *updated_list,
> +        uint32_t *updated_list_size)
> +{
> +    CXLDCExtent *ent, *ent_next;
> +    uint64_t dpa, len;
> +    uint32_t i;
> +    int cnt_delta = 0;
> +    CXLRetCode ret = CXL_MBOX_SUCCESS;
> +
> +    QTAILQ_INIT(updated_list);
> +    copy_extent_list(updated_list, &ct3d->dc.extents);
> +
> +    for (i = 0; i < in->num_entries_updated; i++) {
> +        Range range;
> +
> +        dpa = in->updated_entries[i].start_dpa;
> +        len = in->updated_entries[i].len;
> +
> +        while (len > 0) {
> +            QTAILQ_FOREACH(ent, updated_list, node) {
> +                range_init_nofail(&range, ent->start_dpa, ent->len);
> +
> +                if (range_contains(&range, dpa)) {
> +                    uint64_t len1, len2 = 0, len_done = 0;
> +                    uint64_t ent_start_dpa = ent->start_dpa;
> +                    uint64_t ent_len = ent->len;
> +
> +                    len1 = dpa - ent->start_dpa;
> +                    /* Found the extent or the subset of an existing extent */
> +                    if (range_contains(&range, dpa + len - 1)) {
> +                        len2 = ent_start_dpa + ent_len - dpa - len;
> +                    } else {
> +                        /*
> +                         * TODO: we reject the attempt to remove an extent
> +                         * that overlaps with multiple extents in the device
> +                         * for now. We will allow it once superset release
> +                         * support is added.
> +                         */
> +                        ret = CXL_MBOX_INVALID_PA;
> +                        goto free_and_exit;
> +                    }
> +                    len_done = ent_len - len1 - len2;
> +
> +                    cxl_remove_extent_from_extent_list(updated_list, ent);
> +                    cnt_delta--;
> +
> +                    if (len1) {
> +                        cxl_insert_extent_to_extent_list(updated_list,
> +                                                         ent_start_dpa,
> +                                                         len1, NULL, 0);
> +                        cnt_delta++;
> +                    }
> +                    if (len2) {
> +                        cxl_insert_extent_to_extent_list(updated_list,
> +                                                         dpa + len,
> +                                                         len2, NULL, 0);
> +                        cnt_delta++;
> +                    }
> +
> +                    if (cnt_delta + ct3d->dc.total_extent_count >
> +                            CXL_NUM_EXTENTS_SUPPORTED) {
> +                        ret = CXL_MBOX_RESOURCES_EXHAUSTED;
> +                        goto free_and_exit;
> +                    }
> +
> +                    len -= len_done;
> +                    /* len == 0 here until superset release is added */
> +                    break;
> +                }
> +            }
> +            if (len) {
> +                ret = CXL_MBOX_INVALID_PA;
> +                goto free_and_exit;
> +            }
> +        }
> +    }
> +free_and_exit:
> +    if (ret != CXL_MBOX_SUCCESS) {
> +        QTAILQ_FOREACH_SAFE(ent, updated_list, node, ent_next) {
> +            cxl_remove_extent_from_extent_list(updated_list, ent);
> +        }
> +        *updated_list_size = 0;
> +    } else {
> +        *updated_list_size = ct3d->dc.total_extent_count + cnt_delta;
> +    }
> +
> +    return ret;
> +}
> +
> +/*
> + * CXL r3.1 section 8.2.9.9.9.4: Release Dynamic Capacity (Opcode 4803h)
> + */
> +static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
> +                                          uint8_t *payload_in,
> +                                          size_t len_in,
> +                                          uint8_t *payload_out,
> +                                          size_t *len_out,
> +                                          CXLCCI *cci)
> +{
> +    CXLUpdateDCExtentListInPl *in = (void *)payload_in;
> +    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
> +    CXLDCExtentList updated_list;
> +    CXLDCExtent *ent, *ent_next;
> +    uint32_t updated_list_size;
> +    CXLRetCode ret;
> +
> +    if (in->num_entries_updated == 0) {
> +        return CXL_MBOX_INVALID_INPUT;
> +    }
> +
> +    ret = cxl_detect_malformed_extent_list(ct3d, in);
> +    if (ret != CXL_MBOX_SUCCESS) {
> +        return ret;
> +    }
> +
> +    ret = cxl_dc_extent_release_dry_run(ct3d, in, &updated_list,
> +                                        &updated_list_size);
> +    if (ret != CXL_MBOX_SUCCESS) {
> +        return ret;
> +    }
> +
> +    /*
> +     * If the dry run release passes, the returned updated_list will
> +     * be the updated extent list and we just need to clear the extents
> +     * in the accepted list and copy extents in the updated_list to accepted
> +     * list and update the extent count;
> +     */
> +    QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
> +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
> +    }
> +    copy_extent_list(&ct3d->dc.extents, &updated_list);
> +    QTAILQ_FOREACH_SAFE(ent, &updated_list, node, ent_next) {
> +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
> +    }

Instead of doing a copy-delete, it should be simple to just relink the 
list pointers of updated_list to ct3d->dc.extents - similar to the 
QSIMPLEQ_CONCAT operation for QSIMPLEQ (unfortunately there isn't one 
defined already for QTAILQ, but you could add one :)

Otherwise, looks great to me. Thanks for the update,
Jørgen
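
As a rough sketch of the relinking idea (not from the thread; the helper
name is made up, and a constant-time splice would instead need a
QTAILQ_CONCAT-style macro added to qemu/queue.h):

    static void cxl_extent_list_move_all(CXLDCExtentList *dst,
                                         CXLDCExtentList *src)
    {
        CXLDCExtent *ent, *ent_next;

        /* Relink the existing nodes rather than copying and freeing them */
        QTAILQ_FOREACH_SAFE(ent, src, node, ent_next) {
            QTAILQ_REMOVE(src, ent, node);
            QTAILQ_INSERT_TAIL(dst, ent, node);
        }
    }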

> +    ct3d->dc.total_extent_count = updated_list_size;
> +
> +    return CXL_MBOX_SUCCESS;
> +}
> +
>   #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
>   #define IMMEDIATE_DATA_CHANGE (1 << 2)
>   #define IMMEDIATE_POLICY_CHANGE (1 << 3)
> @@ -1448,6 +1838,12 @@ static const struct cxl_cmd cxl_cmd_set_dcd[256][256] = {
>       [DCD_CONFIG][GET_DYN_CAP_EXT_LIST] = {
>           "DCD_GET_DYNAMIC_CAPACITY_EXTENT_LIST", cmd_dcd_get_dyn_cap_ext_list,
>           8, 0 },
> +    [DCD_CONFIG][ADD_DYN_CAP_RSP] = {
> +        "DCD_ADD_DYNAMIC_CAPACITY_RESPONSE", cmd_dcd_add_dyn_cap_rsp,
> +        ~0, IMMEDIATE_DATA_CHANGE },
> +    [DCD_CONFIG][RELEASE_DYN_CAP] = {
> +        "DCD_RELEASE_DYNAMIC_CAPACITY", cmd_dcd_release_dyn_cap,
> +        ~0, IMMEDIATE_DATA_CHANGE },
>   };
> 
>   static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 48cce3bb13..2d4b6242f0 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -671,6 +671,15 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
>       return true;
>   }
> 
> +static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
> +{
> +    CXLDCExtent *ent, *ent_next;
> +
> +    QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
> +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
> +    }
> +}
> +
>   static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
>   {
>       DeviceState *ds = DEVICE(ct3d);
> @@ -867,6 +876,7 @@ err_free_special_ops:
>       g_free(regs->special_ops);
>   err_address_space_free:
>       if (ct3d->dc.host_dc) {
> +        cxl_destroy_dc_regions(ct3d);
>           address_space_destroy(&ct3d->dc.host_dc_as);
>       }
>       if (ct3d->hostpmem) {
> @@ -888,6 +898,7 @@ static void ct3_exit(PCIDevice *pci_dev)
>       cxl_doe_cdat_release(cxl_cstate);
>       g_free(regs->special_ops);
>       if (ct3d->dc.host_dc) {
> +        cxl_destroy_dc_regions(ct3d);
>           address_space_destroy(&ct3d->dc.host_dc_as);
>       }
>       if (ct3d->hostpmem) {
> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index 6aec6ac983..df3511e91b 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -551,4 +551,8 @@ void cxl_event_irq_assert(CXLType3Dev *ct3d);
> 
>   void cxl_set_poison_list_overflowed(CXLType3Dev *ct3d);
> 
> +CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
> +
> +void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
> +                                        CXLDCExtent *extent);
>   #endif
> --
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2024-04-15 20:06         ` fan
@ 2024-04-16 14:58             ` Jonathan Cameron via
  0 siblings, 0 replies; 65+ messages in thread
From: Jonathan Cameron @ 2024-04-16 14:58 UTC (permalink / raw)
  To: fan
  Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
	wj28.lee, Fan Ni

On Mon, 15 Apr 2024 13:06:04 -0700
fan <nifan.cxl@gmail.com> wrote:

> From ce75be83e915fbc4dd6e489f976665b81174002b Mon Sep 17 00:00:00 2001
> From: Fan Ni <fan.ni@samsung.com>
> Date: Tue, 20 Feb 2024 09:48:31 -0800
> Subject: [PATCH 09/13] hw/cxl/events: Add qmp interfaces to add/release
>  dynamic capacity extents
> 
> To simulate FM functionalities for initiating Dynamic Capacity Add
> (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> add/release dynamic capacity extents requests.
> 
> With this change, an extent can be released only when its DPA range
> is contained by a single accepted extent in the device. That is to say,
> extent superset release is not supported yet.
> 
> 1. Add dynamic capacity extents:
> 
> For example, the command to add two continuous extents (each 128MiB long)
> to region 0 (starting at DPA offset 0) looks like below:
> 
> { "execute": "qmp_capabilities" }
> 
> { "execute": "cxl-add-dynamic-capacity",
>   "arguments": {
>       "path": "/machine/peripheral/cxl-dcd0",
>       "hid": 0,
>       "selection-policy": 2,
>       "region-id": 0,
>       "tag": "",
>       "extents": [
>       {
>           "offset": 0,
>           "len": 134217728
>       },
>       {
>           "offset": 134217728,
>           "len": 134217728
>       }
>       ]
>   }
> }
> 
> 2. Release dynamic capacity extents:
> 
> For example, the command to release an extent of size 128MiB from region 0
> (DPA offset 128MiB) looks like below:
> 
> { "execute": "cxl-release-dynamic-capacity",
>   "arguments": {
>       "path": "/machine/peripheral/cxl-dcd0",
>       "hid": 0,
>       "flags": 1,
>       "region-id": 0,
>       "tag": "",
>       "extents": [
>       {
>           "offset": 134217728,
>           "len": 134217728
>       }
>       ]
>   }
> }
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>

Nice!  A few small comments inline - particularly don't be nice to the
kernel by blocking things it doesn't understand yet ;)

Jonathan

> ---
>  hw/cxl/cxl-mailbox-utils.c  |  65 ++++++--
>  hw/mem/cxl_type3.c          | 310 +++++++++++++++++++++++++++++++++++-
>  hw/mem/cxl_type3_stubs.c    |  20 +++
>  include/hw/cxl/cxl_device.h |  22 +++
>  include/hw/cxl/cxl_events.h |  18 +++
>  qapi/cxl.json               |  69 ++++++++
>  6 files changed, 491 insertions(+), 13 deletions(-)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index cd9092b6bf..839ae836a1 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c

>  /*
>   * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload
>   * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload
> @@ -1541,6 +1579,7 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
>  {
>      uint32_t i;
>      CXLDCExtent *ent;
> +    CXLDCExtentGroup *ext_group;
>      uint64_t dpa, len;
>      Range range1, range2;
>  
> @@ -1551,9 +1590,13 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
>          range_init_nofail(&range1, dpa, len);
>  
>          /*
> -         * TODO: once the pending extent list is added, check against
> -         * the list will be added here.
> +         * The host-accepted DPA range must be contained by the first extent
> +         * group in the pending list
>           */
> +        ext_group = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> +        if (!cxl_extents_contains_dpa_range(&ext_group->list, dpa, len)) {
> +            return CXL_MBOX_INVALID_PA;
> +        }
>  
>          /* to-be-added range should not overlap with range already accepted */
>          QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
> @@ -1588,26 +1631,26 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
>      CXLRetCode ret;
>  
>      if (in->num_entries_updated == 0) {
> -        /*
> -         * TODO: once the pending list is introduced, extents in the beginning
> -         * will get wiped out.
> -         */
> +        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
>          return CXL_MBOX_SUCCESS;
>      }
>  
>      /* Adding extents causes exceeding device's extent tracking ability. */
>      if (in->num_entries_updated + ct3d->dc.total_extent_count >
>          CXL_NUM_EXTENTS_SUPPORTED) {
> +        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
>          return CXL_MBOX_RESOURCES_EXHAUSTED;
>      }
>  
>      ret = cxl_detect_malformed_extent_list(ct3d, in);
>      if (ret != CXL_MBOX_SUCCESS) {
> +        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);

If it's a bad message from the host, I don't think the device is supposed to
do anything with pending extents.

>          return ret;
>      }
>  
>      ret = cxl_dcd_add_dyn_cap_rsp_dry_run(ct3d, in);
>      if (ret != CXL_MBOX_SUCCESS) {
> +        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
>          return ret;
>      }



> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 2d4b6242f0..8d99b27b27 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c

> +/*
> + * The main function to process dynamic capacity event with extent list.
> + * Currently DC extents add/release requests are processed.
> + */
> +static void qmp_cxl_process_dynamic_capacity_prescriptive(const char *path,
> +        uint16_t hid, CXLDCEventType type, uint8_t rid,
> +        CXLDCExtentRecordList *records, Error **errp)
> +{
> +    Object *obj;
> +    CXLEventDynamicCapacity dCap = {};
> +    CXLEventRecordHdr *hdr = &dCap.hdr;
> +    CXLType3Dev *dcd;
> +    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
> +    uint32_t num_extents = 0;
> +    CXLDCExtentRecordList *list;
> +    CXLDCExtentGroup *group = NULL;
> +    g_autofree CXLDCExtentRaw *extents = NULL;
> +    uint8_t enc_log = CXL_EVENT_TYPE_DYNAMIC_CAP;
> +    uint64_t dpa, offset, len, block_size;
> +    g_autofree unsigned long *blk_bitmap = NULL;
> +    int i;
> +
> +    obj = object_resolve_path_type(path, TYPE_CXL_TYPE3, NULL);
> +    if (!obj) {
> +        error_setg(errp, "Unable to resolve CXL type 3 device");
> +        return;
> +    }
> +
> +    dcd = CXL_TYPE3(obj);
> +    if (!dcd->dc.num_regions) {
> +        error_setg(errp, "No dynamic capacity support from the device");
> +        return;
> +    }
> +
> +
> +    if (rid >= dcd->dc.num_regions) {
> +        error_setg(errp, "region id is too large");
> +        return;
> +    }
> +    block_size = dcd->dc.regions[rid].block_size;
> +    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
> +
> +    /* Sanity check and count the extents */
> +    list = records;
> +    while (list) {
> +        offset = list->value->offset;
> +        len = list->value->len;
> +        dpa = offset + dcd->dc.regions[rid].base;
> +
> +        if (len == 0) {
> +            error_setg(errp, "extent with 0 length is not allowed");
> +            return;
> +        }
> +
> +        if (offset % block_size || len % block_size) {
> +            error_setg(errp, "dpa or len is not aligned to region block size");
> +            return;
> +        }
> +
> +        if (offset + len > dcd->dc.regions[rid].len) {
> +            error_setg(errp, "extent range is beyond the region end");
> +            return;
> +        }
> +
> +        /* No duplicate or overlapped extents are allowed */
> +        if (test_any_bits_set(blk_bitmap, offset / block_size,
> +                              len / block_size)) {
> +            error_setg(errp, "duplicate or overlapped extents are detected");
> +            return;
> +        }
> +        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> +
> +        if (type == DC_EVENT_RELEASE_CAPACITY) {
> +            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
> +                                                     dpa, len)) {
> +                error_setg(errp,
> +                           "cannot release extent with pending DPA range");
> +                return;
> +            }
> +            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents, dpa, len)) {
> +                error_setg(errp,
> +                           "cannot release extent with non-existing DPA range");
> +                return;
> +            }
> +        } else if (type == DC_EVENT_ADD_CAPACITY) {
> +            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents, dpa, len)) {
> +                error_setg(errp,
> +                           "cannot add DPA already accessible  to the same LD");
> +                return;
> +            }
> +        }
> +        list = list->next;
> +        num_extents++;
> +    }
> +
> +    if (num_extents > 1) {
> +        error_setg(errp,
> +                   "TODO: remove the check once kernel support More flag");
Not our problem :)  For now we can just test the kernel by passing in single
extents via separate commands.

I don't want to carry unnecessary limitations in qemu.
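
For example, the two-extent add shown in the commit message could just as
well be issued as two single-extent commands with the same arguments
(illustration only, reusing the schema above):

  { "execute": "cxl-add-dynamic-capacity",
    "arguments": { "path": "/machine/peripheral/cxl-dcd0", "hid": 0,
      "selection-policy": 2, "region-id": 0, "tag": "",
      "extents": [ { "offset": 0, "len": 134217728 } ] } }

  { "execute": "cxl-add-dynamic-capacity",
    "arguments": { "path": "/machine/peripheral/cxl-dcd0", "hid": 0,
      "selection-policy": 2, "region-id": 0, "tag": "",
      "extents": [ { "offset": 134217728, "len": 134217728 } ] } }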

> +        return;
> +    }
> +

> +
> +#define REMOVAL_POLICY_MASK 0xf
> +#define FORCED_REMOVAL_BIT BIT(4)
> +
> +void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t hid,
> +                                      uint8_t flags, uint8_t region_id,
> +                                      const char *tag,
> +                                      CXLDCExtentRecordList  *records,
> +                                      Error **errp)
> +{
> +    CXLDCEventType type = DC_EVENT_RELEASE_CAPACITY;
> +
> +    if (flags & FORCED_REMOVAL_BIT) {
> +        /* TODO: enable forced removal in the future */
> +        type = DC_EVENT_FORCED_RELEASE_CAPACITY;
> +        error_setg(errp, "Forced removal not supported yet");
> +        return;
> +    }
> +
> +    switch (flags & REMOVAL_POLICY_MASK) {
> +    case 1:
Probably benefit from a suitable define.
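
For instance (sketch only; the name is invented here, and the value simply
mirrors the bare 1 in the case below, which this patch maps to the
prescriptive policy):

    #define DC_REMOVAL_POLICY_PRESCRIPTIVE    1

so the switch could read "case DC_REMOVAL_POLICY_PRESCRIPTIVE:" instead of
"case 1:".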

> +        qmp_cxl_process_dynamic_capacity_prescriptive(path, hid, type,
> +                                                      region_id, records, errp);
> +        break;

I'd not noticed before but might as well return from these case blocks.

> +    default:
> +        error_setg(errp, "Removal policy not supported");
> +        break;
> +    }
> +}

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 10/12] hw/mem/cxl_type3: Add dpa range validation for accesses to DC regions
  2024-04-15 17:37     ` fan
@ 2024-04-16 15:00         ` Jonathan Cameron via
  0 siblings, 0 replies; 65+ messages in thread
From: Jonathan Cameron @ 2024-04-16 15:00 UTC (permalink / raw)
  To: fan
  Cc: Gregory Price, qemu-devel, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
	wj28.lee, Fan Ni

On Mon, 15 Apr 2024 10:37:00 -0700
fan <nifan.cxl@gmail.com> wrote:

> On Fri, Apr 12, 2024 at 06:54:42PM -0400, Gregory Price wrote:
> > On Mon, Mar 25, 2024 at 12:02:28PM -0700, nifan.cxl@gmail.com wrote:  
> > > From: Fan Ni <fan.ni@samsung.com>
> > > 
> > > All dpa ranges in the DC regions are invalid to access until an extent
> > > covering the range has been added. Add a bitmap for each region to
> > > record whether a DC block in the region has been backed by DC extent.
> > > For the bitmap, a bit in the bitmap represents a DC block. When a DC
> > > extent is added, all the bits of the blocks in the extent will be set,
> > > which will be cleared when the extent is released.
> > > 
> > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > Signed-off-by: Fan Ni <fan.ni@samsung.com>
> > > ---
> > >  hw/cxl/cxl-mailbox-utils.c  |  6 +++
> > >  hw/mem/cxl_type3.c          | 76 +++++++++++++++++++++++++++++++++++++
> > >  include/hw/cxl/cxl_device.h |  7 ++++
> > >  3 files changed, 89 insertions(+)
> > > 
> > > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > > index 7094e007b9..a0d2239176 100644
> > > --- a/hw/cxl/cxl-mailbox-utils.c
> > > +++ b/hw/cxl/cxl-mailbox-utils.c
> > > @@ -1620,6 +1620,7 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> > >  
> > >          cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
> > >          ct3d->dc.total_extent_count += 1;
> > > +        ct3_set_region_block_backed(ct3d, dpa, len);
> > >  
> > >          ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> > >          cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);  
> > 
> > while looking at the MHD code, we had decided to "reserve" the blocks in
> > the bitmap in the call to `qmp_cxl_process_dynamic_capacity` in order to
> > prevent a potential double-allocation (basically we need to sanity check
> > that two hosts aren't reserving the region PRIOR to the host being
> > notified).
> > 
> > I did not see any checks in the `qmp_cxl_process_dynamic_capacity` path
> > to prevent pending extents from being double-allocated.  Is this an
> > explicit choice?
> > 
> > I can see, for example, why you may want to allow the following in the
> > pending list: [Add X, Remove X, Add X].  I just want to know if this is
> > intentional or not. If not, you may consider adding a pending check
> > during the sanity check phase of `qmp_cxl_process_dynamic_capacity`
> > 
> > ~Gregory  
> 
> First, for remove request, pending list is not involved. See cxl r3.1,
> 9.13.3.3. Pending basically means "pending to add". 
> So for the above example, in the pending list, you can see [Add x, add x] if the
> event is not processed in time.
> Second, from the spec, I cannot find any text saying we cannot issue
> another add extent X if it is still pending.

I think there is text saying that the capacity is not released for reuse
by the device until it receives a response from the host.   Whilst
it's not explicit on offers to the same host, I'm not sure that matters.
So I don't think it is supposed to queue multiple extents...


> From the kernel side, if the first one is accepted, the second one will
> get rejected, and there is no issue there.
> If the first is reject for some reason, the second one can get
> accepted or rejected and do not need to worry about the first one.
> 
> 
> Fan
> 


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 08/12] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response
  2024-04-16 10:02       ` Jørgen Hansen
@ 2024-04-16 16:27         ` fan
  0 siblings, 0 replies; 65+ messages in thread
From: fan @ 2024-04-16 16:27 UTC (permalink / raw)
  To: Jørgen Hansen
  Cc: fan, qemu-devel, jonathan.cameron, linux-cxl, gregory.price,
	ira.weiny, dan.j.williams, a.manzanares, dave, nmtadam.samsung,
	jim.harris, wj28.lee, Fan Ni

On Tue, Apr 16, 2024 at 10:02:53AM +0000, Jørgen Hansen wrote:
> On 4/15/24 19:56, fan wrote:
> >  From 4b9695299d3d4b22f83666f8ab79099ec9f9817f Mon Sep 17 00:00:00 2001
> > From: Fan Ni <fan.ni@samsung.com>
> > Date: Tue, 20 Feb 2024 09:48:30 -0800
> > Subject: [PATCH 08/13] hw/cxl/cxl-mailbox-utils: Add mailbox commands to
> >   support add/release dynamic capacity response
> > 
> > Per CXL spec 3.1, two mailbox commands are implemented:
> > Add Dynamic Capacity Response (Opcode 4802h) 8.2.9.9.9.3, and
> > Release Dynamic Capacity (Opcode 4803h) 8.2.9.9.9.4.
> > 
> > To process the above two commands, we use a two-pass approach.
> > Pass 1: Check whether the input payload is valid or not; if not, skip
> >          Pass 2 and return mailbox process error.
> > Pass 2: Do the real work--add or release extents, respectively.
> > 
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
> > ---
> >   hw/cxl/cxl-mailbox-utils.c  | 396 ++++++++++++++++++++++++++++++++++++
> >   hw/mem/cxl_type3.c          |  11 +
> >   include/hw/cxl/cxl_device.h |   4 +
> >   3 files changed, 411 insertions(+)
> > 
> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > index 1915959015..cd9092b6bf 100644
> > --- a/hw/cxl/cxl-mailbox-utils.c
> > +++ b/hw/cxl/cxl-mailbox-utils.c
> 
> snip
> 
> > +/*
> > + * Copy extent list from src to dst
> > + * Return value: number of extents copied
> > + */
> > +static uint32_t copy_extent_list(CXLDCExtentList *dst,
> > +                                 const CXLDCExtentList *src)
> > +{
> > +    uint32_t cnt = 0;
> > +    CXLDCExtent *ent;
> > +
> > +    if (!dst || !src) {
> > +        return 0;
> > +    }
> > +
> > +    QTAILQ_FOREACH(ent, src, node) {
> > +        cxl_insert_extent_to_extent_list(dst, ent->start_dpa, ent->len,
> > +                                         ent->tag, ent->shared_seq);
> > +        cnt++;
> > +    }
> > +    return cnt;
> > +}
> > +
> > +static CXLRetCode cxl_dc_extent_release_dry_run(CXLType3Dev *ct3d,
> > +        const CXLUpdateDCExtentListInPl *in, CXLDCExtentList *updated_list,
> > +        uint32_t *updated_list_size)
> > +{
> > +    CXLDCExtent *ent, *ent_next;
> > +    uint64_t dpa, len;
> > +    uint32_t i;
> > +    int cnt_delta = 0;
> > +    CXLRetCode ret = CXL_MBOX_SUCCESS;
> > +
> > +    QTAILQ_INIT(updated_list);
> > +    copy_extent_list(updated_list, &ct3d->dc.extents);
> > +
> > +    for (i = 0; i < in->num_entries_updated; i++) {
> > +        Range range;
> > +
> > +        dpa = in->updated_entries[i].start_dpa;
> > +        len = in->updated_entries[i].len;
> > +
> > +        while (len > 0) {
> > +            QTAILQ_FOREACH(ent, updated_list, node) {
> > +                range_init_nofail(&range, ent->start_dpa, ent->len);
> > +
> > +                if (range_contains(&range, dpa)) {
> > +                    uint64_t len1, len2 = 0, len_done = 0;
> > +                    uint64_t ent_start_dpa = ent->start_dpa;
> > +                    uint64_t ent_len = ent->len;
> > +
> > +                    len1 = dpa - ent->start_dpa;
> > +                    /* Found the extent or the subset of an existing extent */
> > +                    if (range_contains(&range, dpa + len - 1)) {
> > +                        len2 = ent_start_dpa + ent_len - dpa - len;
> > +                    } else {
> > +                        /*
> > +                         * TODO: we reject the attempt to remove an extent
> > +                         * that overlaps with multiple extents in the device
> > +                         * for now. We will allow it once superset release
> > +                         * support is added.
> > +                         */
> > +                        ret = CXL_MBOX_INVALID_PA;
> > +                        goto free_and_exit;
> > +                    }
> > +                    len_done = ent_len - len1 - len2;
> > +
> > +                    cxl_remove_extent_from_extent_list(updated_list, ent);
> > +                    cnt_delta--;
> > +
> > +                    if (len1) {
> > +                        cxl_insert_extent_to_extent_list(updated_list,
> > +                                                         ent_start_dpa,
> > +                                                         len1, NULL, 0);
> > +                        cnt_delta++;
> > +                    }
> > +                    if (len2) {
> > +                        cxl_insert_extent_to_extent_list(updated_list,
> > +                                                         dpa + len,
> > +                                                         len2, NULL, 0);
> > +                        cnt_delta++;
> > +                    }
> > +
> > +                    if (cnt_delta + ct3d->dc.total_extent_count >
> > +                            CXL_NUM_EXTENTS_SUPPORTED) {
> > +                        ret = CXL_MBOX_RESOURCES_EXHAUSTED;
> > +                        goto free_and_exit;
> > +                    }
> > +
> > +                    len -= len_done;
> > +                    /* len == 0 here until superset release is added */
> > +                    break;
> > +                }
> > +            }
> > +            if (len) {
> > +                ret = CXL_MBOX_INVALID_PA;
> > +                goto free_and_exit;
> > +            }
> > +        }
> > +    }
> > +free_and_exit:
> > +    if (ret != CXL_MBOX_SUCCESS) {
> > +        QTAILQ_FOREACH_SAFE(ent, updated_list, node, ent_next) {
> > +            cxl_remove_extent_from_extent_list(updated_list, ent);
> > +        }
> > +        *updated_list_size = 0;
> > +    } else {
> > +        *updated_list_size = ct3d->dc.total_extent_count + cnt_delta;
> > +    }
> > +
> > +    return ret;
> > +}
> > +
> > +/*
> > + * CXL r3.1 section 8.2.9.9.9.4: Release Dynamic Capacity (Opcode 4803h)
> > + */
> > +static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
> > +                                          uint8_t *payload_in,
> > +                                          size_t len_in,
> > +                                          uint8_t *payload_out,
> > +                                          size_t *len_out,
> > +                                          CXLCCI *cci)
> > +{
> > +    CXLUpdateDCExtentListInPl *in = (void *)payload_in;
> > +    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
> > +    CXLDCExtentList updated_list;
> > +    CXLDCExtent *ent, *ent_next;
> > +    uint32_t updated_list_size;
> > +    CXLRetCode ret;
> > +
> > +    if (in->num_entries_updated == 0) {
> > +        return CXL_MBOX_INVALID_INPUT;
> > +    }
> > +
> > +    ret = cxl_detect_malformed_extent_list(ct3d, in);
> > +    if (ret != CXL_MBOX_SUCCESS) {
> > +        return ret;
> > +    }
> > +
> > +    ret = cxl_dc_extent_release_dry_run(ct3d, in, &updated_list,
> > +                                        &updated_list_size);
> > +    if (ret != CXL_MBOX_SUCCESS) {
> > +        return ret;
> > +    }
> > +
> > +    /*
> > +     * If the dry run release passes, the returned updated_list will
> > +     * be the updated extent list and we just need to clear the extents
> > +     * in the accepted list and copy extents in the updated_list to accepted
> > +     * list and update the extent count;
> > +     */
> > +    QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
> > +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
> > +    }
> > +    copy_extent_list(&ct3d->dc.extents, &updated_list);
> > +    QTAILQ_FOREACH_SAFE(ent, &updated_list, node, ent_next) {
> > +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
> > +    }
> 
> Instead of doing a copy-delete, it should be simple to just relink the 
> list pointers of updated_list to ct3d->dc.extents - similar to the 
> QSIMPLEQ_CONCAT operation for QSIMPLEQ (unfortunately there isn't one 
> defined already for QTAILQ, but you could add one :)
> 
> Otherwise, looks great to me. Thanks for the update,
> Jørgen

Hi Jorgen,
Thanks for the suggestion. The issue here is that the next patch introduces
a bitmap indicating which DPA ranges are backed by added extents, so for
the add/release processing we need to update the bitmap to reflect the
up-to-date extent information. The remove-and-add sequence here provides a
natural place to update the bitmap, like below:


   QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
        ct3_clear_region_block_backed(ct3d, ent->start_dpa, ent->len);
        cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
    }
    copy_extent_list(&ct3d->dc.extents, &updated_list);
    QTAILQ_FOREACH_SAFE(ent, &updated_list, node, ent_next) {
        ct3_set_region_block_backed(ct3d, ent->start_dpa, ent->len);
        cxl_remove_extent_from_extent_list(&updated_list, ent);
    }

Fan
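
For reference, a minimal sketch of what ct3_set_region_block_backed() used
above might look like (the blk_bitmap field name and layout are
assumptions; the real definition lives in the dpa-range-validation patch),
with ct3_clear_region_block_backed() as its bitmap_clear() mirror:

    static void ct3_set_region_block_backed(CXLType3Dev *ct3d,
                                            uint64_t dpa, uint64_t len)
    {
        CXLDCRegion *region = cxl_find_dc_region(ct3d, dpa, len);

        bitmap_set(region->blk_bitmap,
                   (dpa - region->base) / region->block_size,
                   len / region->block_size);
    }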
> 
> > +    ct3d->dc.total_extent_count = updated_list_size;
> > +
> > +    return CXL_MBOX_SUCCESS;
> > +}
> > +
> >   #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
> >   #define IMMEDIATE_DATA_CHANGE (1 << 2)
> >   #define IMMEDIATE_POLICY_CHANGE (1 << 3)
> > @@ -1448,6 +1838,12 @@ static const struct cxl_cmd cxl_cmd_set_dcd[256][256] = {
> >       [DCD_CONFIG][GET_DYN_CAP_EXT_LIST] = {
> >           "DCD_GET_DYNAMIC_CAPACITY_EXTENT_LIST", cmd_dcd_get_dyn_cap_ext_list,
> >           8, 0 },
> > +    [DCD_CONFIG][ADD_DYN_CAP_RSP] = {
> > +        "DCD_ADD_DYNAMIC_CAPACITY_RESPONSE", cmd_dcd_add_dyn_cap_rsp,
> > +        ~0, IMMEDIATE_DATA_CHANGE },
> > +    [DCD_CONFIG][RELEASE_DYN_CAP] = {
> > +        "DCD_RELEASE_DYNAMIC_CAPACITY", cmd_dcd_release_dyn_cap,
> > +        ~0, IMMEDIATE_DATA_CHANGE },
> >   };
> > 
> >   static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index 48cce3bb13..2d4b6242f0 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> > @@ -671,6 +671,15 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
> >       return true;
> >   }
> > 
> > +static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
> > +{
> > +    CXLDCExtent *ent, *ent_next;
> > +
> > +    QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
> > +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
> > +    }
> > +}
> > +
> >   static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> >   {
> >       DeviceState *ds = DEVICE(ct3d);
> > @@ -867,6 +876,7 @@ err_free_special_ops:
> >       g_free(regs->special_ops);
> >   err_address_space_free:
> >       if (ct3d->dc.host_dc) {
> > +        cxl_destroy_dc_regions(ct3d);
> >           address_space_destroy(&ct3d->dc.host_dc_as);
> >       }
> >       if (ct3d->hostpmem) {
> > @@ -888,6 +898,7 @@ static void ct3_exit(PCIDevice *pci_dev)
> >       cxl_doe_cdat_release(cxl_cstate);
> >       g_free(regs->special_ops);
> >       if (ct3d->dc.host_dc) {
> > +        cxl_destroy_dc_regions(ct3d);
> >           address_space_destroy(&ct3d->dc.host_dc_as);
> >       }
> >       if (ct3d->hostpmem) {
> > diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> > index 6aec6ac983..df3511e91b 100644
> > --- a/include/hw/cxl/cxl_device.h
> > +++ b/include/hw/cxl/cxl_device.h
> > @@ -551,4 +551,8 @@ void cxl_event_irq_assert(CXLType3Dev *ct3d);
> > 
> >   void cxl_set_poison_list_overflowed(CXLType3Dev *ct3d);
> > 
> > +CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
> > +
> > +void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
> > +                                        CXLDCExtent *extent);
> >   #endif
> > --
> > 2.43.0
> > 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 10/12] hw/mem/cxl_type3: Add dpa range validation for accesses to DC regions
  2024-04-16 15:00         ` Jonathan Cameron via
  (?)
@ 2024-04-16 16:37         ` fan
  2024-04-17 11:59             ` Jonathan Cameron via
  -1 siblings, 1 reply; 65+ messages in thread
From: fan @ 2024-04-16 16:37 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: fan, Gregory Price, qemu-devel, linux-cxl, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	Jorgen.Hansen, wj28.lee, Fan Ni

On Tue, Apr 16, 2024 at 04:00:56PM +0100, Jonathan Cameron wrote:
> On Mon, 15 Apr 2024 10:37:00 -0700
> fan <nifan.cxl@gmail.com> wrote:
> 
> > On Fri, Apr 12, 2024 at 06:54:42PM -0400, Gregory Price wrote:
> > > On Mon, Mar 25, 2024 at 12:02:28PM -0700, nifan.cxl@gmail.com wrote:  
> > > > From: Fan Ni <fan.ni@samsung.com>
> > > > 
> > > > All dpa ranges in the DC regions are invalid to access until an extent
> > > > covering the range has been added. Add a bitmap for each region to
> > > > record whether a DC block in the region has been backed by DC extent.
> > > > For the bitmap, a bit in the bitmap represents a DC block. When a DC
> > > > extent is added, all the bits of the blocks in the extent will be set,
> > > > which will be cleared when the extent is released.
> > > > 
> > > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > > Signed-off-by: Fan Ni <fan.ni@samsung.com>
> > > > ---
> > > >  hw/cxl/cxl-mailbox-utils.c  |  6 +++
> > > >  hw/mem/cxl_type3.c          | 76 +++++++++++++++++++++++++++++++++++++
> > > >  include/hw/cxl/cxl_device.h |  7 ++++
> > > >  3 files changed, 89 insertions(+)
> > > > 
> > > > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > > > index 7094e007b9..a0d2239176 100644
> > > > --- a/hw/cxl/cxl-mailbox-utils.c
> > > > +++ b/hw/cxl/cxl-mailbox-utils.c
> > > > @@ -1620,6 +1620,7 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> > > >  
> > > >          cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
> > > >          ct3d->dc.total_extent_count += 1;
> > > > +        ct3_set_region_block_backed(ct3d, dpa, len);
> > > >  
> > > >          ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> > > >          cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);  
> > > 
> > > while looking at the MHD code, we had decided to "reserve" the blocks in
> > > the bitmap in the call to `qmp_cxl_process_dynamic_capacity` in order to
> > > prevent a potential double-allocation (basically we need to sanity check
> > > that two hosts aren't reserving the region PRIOR to the host being
> > > notified).
> > > 
> > > I did not see any checks in the `qmp_cxl_process_dynamic_capacity` path
> > > to prevent pending extents from being double-allocated.  Is this an
> > > explicit choice?
> > > 
> > > I can see, for example, why you may want to allow the following in the
> > > pending list: [Add X, Remove X, Add X].  I just want to know if this is
> > > intentional or not. If not, you may consider adding a pending check
> > > during the sanity check phase of `qmp_cxl_process_dynamic_capacity`
> > > 
> > > ~Gregory  
> > 
> > First, for remove request, pending list is not involved. See cxl r3.1,
> > 9.13.3.3. Pending basically means "pending to add". 
> > So for the above example, in the pending list, you can see [Add x, add x] if the
> > event is not processed in time.
> > Second, from the spec, I cannot find any text saying we cannot issue
> > another add extent X if it is still pending.
> 
> I think there is text saying that the capacity is not released for reuse
> by the device until it receives a response from the host.   Whilst
> it's not explicit on offers to the same host, I'm not sure that matters.
> So I don't think it is supposed to queue multiple extents...

Are you suggesting we add a check here to reject the second add when the
first one is still pending?

Currently, we do not allow releasing an extent while it is still pending,
which aligns with the "not released for reuse" case you mentioned above, I
think.
Could the second add be a retry rather than a reuse?

Fan

> 
> 
> > From the kernel side, if the first one is accepted, the second one will
> > get rejected, and there is no issue there.
> > If the first is reject for some reason, the second one can get
> > accepted or rejected and do not need to worry about the first one.
> > 
> > 
> > Fan
> > 
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2024-04-16 14:58             ` Jonathan Cameron via
  (?)
@ 2024-04-16 16:52             ` fan
  2024-04-17 11:50                 ` Jonathan Cameron via
  -1 siblings, 1 reply; 65+ messages in thread
From: fan @ 2024-04-16 16:52 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: fan, qemu-devel, linux-cxl, gregory.price, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
	Jorgen.Hansen, wj28.lee, Fan Ni

On Tue, Apr 16, 2024 at 03:58:22PM +0100, Jonathan Cameron wrote:
> On Mon, 15 Apr 2024 13:06:04 -0700
> fan <nifan.cxl@gmail.com> wrote:
> 
> > From ce75be83e915fbc4dd6e489f976665b81174002b Mon Sep 17 00:00:00 2001
> > From: Fan Ni <fan.ni@samsung.com>
> > Date: Tue, 20 Feb 2024 09:48:31 -0800
> > Subject: [PATCH 09/13] hw/cxl/events: Add qmp interfaces to add/release
> >  dynamic capacity extents
> > 
> > To simulate FM functionalities for initiating Dynamic Capacity Add
> > (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> > r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> > add/release dynamic capacity extents requests.
> > 
> > With the change, we allow to release an extent only when its DPA range
> > is contained by a single accepted extent in the device. That is to say,
> > extent superset release is not supported yet.
> > 
> > 1. Add dynamic capacity extents:
> > 
> > For example, the command to add two continuous extents (each 128MiB long)
> > to region 0 (starting at DPA offset 0) looks like below:
> > 
> > { "execute": "qmp_capabilities" }
> > 
> > { "execute": "cxl-add-dynamic-capacity",
> >   "arguments": {
> >       "path": "/machine/peripheral/cxl-dcd0",
> >       "hid": 0,
> >       "selection-policy": 2,
> >       "region-id": 0,
> >       "tag": "",
> >       "extents": [
> >       {
> >           "offset": 0,
> >           "len": 134217728
> >       },
> >       {
> >           "offset": 134217728,
> >           "len": 134217728
> >       }
> >       ]
> >   }
> > }
> > 
> > 2. Release dynamic capacity extents:
> > 
> > For example, the command to release an extent of size 128MiB from region 0
> > (DPA offset 128MiB) looks like below:
> > 
> > { "execute": "cxl-release-dynamic-capacity",
> >   "arguments": {
> >       "path": "/machine/peripheral/cxl-dcd0",
> >       "hid": 0,
> >       "flags": 1,
> >       "region-id": 0,
> >       "tag": "",
> >       "extents": [
> >       {
> >           "offset": 134217728,
> >           "len": 134217728
> >       }
> >       ]
> >   }
> > }
> > 
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
> 
> Nice!  A few small comments inline - particularly don't be nice to the
> kernel by blocking things it doesn't understand yet ;)
> 
> Jonathan
> 
> > ---
> >  hw/cxl/cxl-mailbox-utils.c  |  65 ++++++--
> >  hw/mem/cxl_type3.c          | 310 +++++++++++++++++++++++++++++++++++-
> >  hw/mem/cxl_type3_stubs.c    |  20 +++
> >  include/hw/cxl/cxl_device.h |  22 +++
> >  include/hw/cxl/cxl_events.h |  18 +++
> >  qapi/cxl.json               |  69 ++++++++
> >  6 files changed, 491 insertions(+), 13 deletions(-)
> > 
> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > index cd9092b6bf..839ae836a1 100644
> > --- a/hw/cxl/cxl-mailbox-utils.c
> > +++ b/hw/cxl/cxl-mailbox-utils.c
> 
> >  /*
> >   * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload
> >   * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload
> > @@ -1541,6 +1579,7 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
> >  {
> >      uint32_t i;
> >      CXLDCExtent *ent;
> > +    CXLDCExtentGroup *ext_group;
> >      uint64_t dpa, len;
> >      Range range1, range2;
> >  
> > @@ -1551,9 +1590,13 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
> >          range_init_nofail(&range1, dpa, len);
> >  
> >          /*
> > -         * TODO: once the pending extent list is added, check against
> > -         * the list will be added here.
> > +         * The host-accepted DPA range must be contained by the first extent
> > +         * group in the pending list
> >           */
> > +        ext_group = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> > +        if (!cxl_extents_contains_dpa_range(&ext_group->list, dpa, len)) {
> > +            return CXL_MBOX_INVALID_PA;
> > +        }
> >  
> >          /* to-be-added range should not overlap with range already accepted */
> >          QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
> > @@ -1588,26 +1631,26 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> >      CXLRetCode ret;
> >  
> >      if (in->num_entries_updated == 0) {
> > -        /*
> > -         * TODO: once the pending list is introduced, extents in the beginning
> > -         * will get wiped out.
> > -         */
> > +        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
> >          return CXL_MBOX_SUCCESS;
> >      }
> >  
> >      /* Adding extents causes exceeding device's extent tracking ability. */
> >      if (in->num_entries_updated + ct3d->dc.total_extent_count >
> >          CXL_NUM_EXTENTS_SUPPORTED) {
> > +        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
> >          return CXL_MBOX_RESOURCES_EXHAUSTED;
> >      }
> >  
> >      ret = cxl_detect_malformed_extent_list(ct3d, in);
> >      if (ret != CXL_MBOX_SUCCESS) {
> > +        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
> 
> If it's a bad message from the host, I don't think the device is supposed to
> do anything with pending extents.

This is not clear to me.

In the spec r3.1 8.2.9.9.9.3, Add Dynamic Capacity Response (Opcode 4802h),
there is text like "After this command is received, the device is free to
reclaim capacity that the host does not utilize.", which seems to imply that
as long as the response is received, we need to update the pending list so
the unused capacity can be reclaimed. But of course, we can argue that if
there is an error, we cannot tell whether the host accepted the extents or
not, so we should not update the pending list.

> 
> >          return ret;
> >      }
> >  
> >      ret = cxl_dcd_add_dyn_cap_rsp_dry_run(ct3d, in);
> >      if (ret != CXL_MBOX_SUCCESS) {
> > +        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
> >          return ret;
> >      }
> 
> 
> 
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index 2d4b6242f0..8d99b27b27 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> 
> > +/*
> > + * The main function to process dynamic capacity event with extent list.
> > + * Currently DC extents add/release requests are processed.
> > + */
> > +static void qmp_cxl_process_dynamic_capacity_prescriptive(const char *path,
> > +        uint16_t hid, CXLDCEventType type, uint8_t rid,
> > +        CXLDCExtentRecordList *records, Error **errp)
> > +{
> > +    Object *obj;
> > +    CXLEventDynamicCapacity dCap = {};
> > +    CXLEventRecordHdr *hdr = &dCap.hdr;
> > +    CXLType3Dev *dcd;
> > +    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
> > +    uint32_t num_extents = 0;
> > +    CXLDCExtentRecordList *list;
> > +    CXLDCExtentGroup *group = NULL;
> > +    g_autofree CXLDCExtentRaw *extents = NULL;
> > +    uint8_t enc_log = CXL_EVENT_TYPE_DYNAMIC_CAP;
> > +    uint64_t dpa, offset, len, block_size;
> > +    g_autofree unsigned long *blk_bitmap = NULL;
> > +    int i;
> > +
> > +    obj = object_resolve_path_type(path, TYPE_CXL_TYPE3, NULL);
> > +    if (!obj) {
> > +        error_setg(errp, "Unable to resolve CXL type 3 device");
> > +        return;
> > +    }
> > +
> > +    dcd = CXL_TYPE3(obj);
> > +    if (!dcd->dc.num_regions) {
> > +        error_setg(errp, "No dynamic capacity support from the device");
> > +        return;
> > +    }
> > +
> > +
> > +    if (rid >= dcd->dc.num_regions) {
> > +        error_setg(errp, "region id is too large");
> > +        return;
> > +    }
> > +    block_size = dcd->dc.regions[rid].block_size;
> > +    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
> > +
> > +    /* Sanity check and count the extents */
> > +    list = records;
> > +    while (list) {
> > +        offset = list->value->offset;
> > +        len = list->value->len;
> > +        dpa = offset + dcd->dc.regions[rid].base;
> > +
> > +        if (len == 0) {
> > +            error_setg(errp, "extent with 0 length is not allowed");
> > +            return;
> > +        }
> > +
> > +        if (offset % block_size || len % block_size) {
> > +            error_setg(errp, "dpa or len is not aligned to region block size");
> > +            return;
> > +        }
> > +
> > +        if (offset + len > dcd->dc.regions[rid].len) {
> > +            error_setg(errp, "extent range is beyond the region end");
> > +            return;
> > +        }
> > +
> > +        /* No duplicate or overlapped extents are allowed */
> > +        if (test_any_bits_set(blk_bitmap, offset / block_size,
> > +                              len / block_size)) {
> > +            error_setg(errp, "duplicate or overlapped extents are detected");
> > +            return;
> > +        }
> > +        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> > +
> > +        if (type == DC_EVENT_RELEASE_CAPACITY) {
> > +            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
> > +                                                     dpa, len)) {
> > +                error_setg(errp,
> > +                           "cannot release extent with pending DPA range");
> > +                return;
> > +            }
> > +            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents, dpa, len)) {
> > +                error_setg(errp,
> > +                           "cannot release extent with non-existing DPA range");
> > +                return;
> > +            }
> > +        } else if (type == DC_EVENT_ADD_CAPACITY) {
> > +            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents, dpa, len)) {
> > +                error_setg(errp,
> > +                           "cannot add DPA already accessible  to the same LD");
> > +                return;
> > +            }
> > +        }
> > +        list = list->next;
> > +        num_extents++;
> > +    }
> > +
> > +    if (num_extents > 1) {
> > +        error_setg(errp,
> > +                   "TODO: remove the check once kernel support More flag");
> Not our problem :)  For now we can just test the kernel by passing in single
> extents via separate commands.
> 
> I don't want to carry unnecessary limitations in qemu.
> 

Will remove the check here.

> > +        return;
> > +    }
> > +
> 
> > +
> > +#define REMOVAL_POLICY_MASK 0xf
> > +#define FORCED_REMOVAL_BIT BIT(4)
> > +
> > +void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t hid,
> > +                                      uint8_t flags, uint8_t region_id,
> > +                                      const char *tag,
> > +                                      CXLDCExtentRecordList  *records,
> > +                                      Error **errp)
> > +{
> > +    CXLDCEventType type = DC_EVENT_RELEASE_CAPACITY;
> > +
> > +    if (flags & FORCED_REMOVAL_BIT) {
> > +        /* TODO: enable forced removal in the future */
> > +        type = DC_EVENT_FORCED_RELEASE_CAPACITY;
> > +        error_setg(errp, "Forced removal not supported yet");
> > +        return;
> > +    }
> > +
> > +    switch (flags & REMOVAL_POLICY_MASK) {
> > +    case 1:
> Probably benefit from a suitable define.
> 
> > +        qmp_cxl_process_dynamic_capacity_prescriptive(path, hid, type,
> > +                                                      region_id, records, errp);
> > +        break;
> 
> I'd not noticed before but might as well return from these case blocks.

Sorry, I do not follow here. What do you mean by "return from these case
blocks"? Are you referring to the check above about the forced removal case?

Fan

> 
> > +    default:
> > +        error_setg(errp, "Removal policy not supported");
> > +        break;
> > +    }
> > +}

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2024-04-16 14:58             ` Jonathan Cameron via
  (?)
  (?)
@ 2024-04-16 17:14             ` Gregory Price
  -1 siblings, 0 replies; 65+ messages in thread
From: Gregory Price @ 2024-04-16 17:14 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: fan, qemu-devel, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
	wj28.lee, Fan Ni

On Tue, Apr 16, 2024 at 03:58:22PM +0100, Jonathan Cameron wrote:
> On Mon, 15 Apr 2024 13:06:04 -0700
> fan <nifan.cxl@gmail.com> wrote:
> 
> > From ce75be83e915fbc4dd6e489f976665b81174002b Mon Sep 17 00:00:00 2001
> > From: Fan Ni <fan.ni@samsung.com>
> > Date: Tue, 20 Feb 2024 09:48:31 -0800
> > Subject: [PATCH 09/13] hw/cxl/events: Add qmp interfaces to add/release
> >  dynamic capacity extents
> > 
> > +
> > +    if (num_extents > 1) {
> > +        error_setg(errp,
> > +                   "TODO: remove the check once kernel support More flag");
> Not our problem :)  For now we can just test the kernel by passing in single
> extents via separate commands.
> 
> I don't want to carry unnecessary limitations in qemu.
> 

Probably worth popping in to say that some out-of-band discussions around the
`more bit` suggest it may be a while before it is supported.

Allowing QEMU to send more bit messages to the kernel would be extremely
helpful for validation that the kernel won't blow up if/when a real
device implements it.  So yes, please allow it!

~Gregory

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 10/12] hw/mem/cxl_type3: Add dpa range validation for accesses to DC regions
  2024-04-16 15:00         ` Jonathan Cameron via
  (?)
  (?)
@ 2024-04-16 17:15         ` Gregory Price
  -1 siblings, 0 replies; 65+ messages in thread
From: Gregory Price @ 2024-04-16 17:15 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: fan, qemu-devel, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
	wj28.lee, Fan Ni

On Tue, Apr 16, 2024 at 04:00:56PM +0100, Jonathan Cameron wrote:
> On Mon, 15 Apr 2024 10:37:00 -0700
> fan <nifan.cxl@gmail.com> wrote:
> 
> > On Fri, Apr 12, 2024 at 06:54:42PM -0400, Gregory Price wrote:
> > > On Mon, Mar 25, 2024 at 12:02:28PM -0700, nifan.cxl@gmail.com wrote:  
> > > > From: Fan Ni <fan.ni@samsung.com>
> > > > 
> > > > All dpa ranges in the DC regions are invalid to access until an extent
> > > > covering the range has been added. Add a bitmap for each region to
> > > > record whether a DC block in the region has been backed by DC extent.
> > > > For the bitmap, a bit in the bitmap represents a DC block. When a DC
> > > > extent is added, all the bits of the blocks in the extent will be set,
> > > > which will be cleared when the extent is released.
> > > > 
> > > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > > Signed-off-by: Fan Ni <fan.ni@samsung.com>
> > > > ---
> > > >  hw/cxl/cxl-mailbox-utils.c  |  6 +++
> > > >  hw/mem/cxl_type3.c          | 76 +++++++++++++++++++++++++++++++++++++
> > > >  include/hw/cxl/cxl_device.h |  7 ++++
> > > >  3 files changed, 89 insertions(+)
> > > > 
> > > > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > > > index 7094e007b9..a0d2239176 100644
> > > > --- a/hw/cxl/cxl-mailbox-utils.c
> > > > +++ b/hw/cxl/cxl-mailbox-utils.c
> > > > @@ -1620,6 +1620,7 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> > > >  
> > > >          cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
> > > >          ct3d->dc.total_extent_count += 1;
> > > > +        ct3_set_region_block_backed(ct3d, dpa, len);
> > > >  
> > > >          ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> > > >          cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);  
> > > 
> > > while looking at the MHD code, we had decided to "reserve" the blocks in
> > > the bitmap in the call to `qmp_cxl_process_dynamic_capacity` in order to
> > > prevent a potential double-allocation (basically we need to sanity check
> > > that two hosts aren't reserving the region PRIOR to the host being
> > > notified).
> > > 
> > > I did not see any checks in the `qmp_cxl_process_dynamic_capacity` path
> > > to prevent pending extents from being double-allocated.  Is this an
> > > explicit choice?
> > > 
> > > I can see, for example, why you may want to allow the following in the
> > > pending list: [Add X, Remove X, Add X].  I just want to know if this is
> > > intentional or not. If not, you may consider adding a pending check
> > > during the sanity check phase of `qmp_cxl_process_dynamic_capacity`
> > > 
> > > ~Gregory  
> > 
> > First, for remove request, pending list is not involved. See cxl r3.1,
> > 9.13.3.3. Pending basically means "pending to add". 
> > So for the above example, in the pending list, you can see [Add x, add x] if the
> > event is not processed in time.
> > Second, from the spec, I cannot find any text saying we cannot issue
> > another add extent X if it is still pending.
> 
> I think there is text saying that the capacity is not released for reuse
> by the device until it receives a response from the host.   Whilst
> it's not explicit on offers to the same host, I'm not sure that matters.
> So I don't think it is supposed to queue multiple extents...
> 
> 

It definitely should not release capacity until it receives a response,
because the host could tell the device to kick rocks (which would be
reasonable under a variety of circumstances).

~Gregory

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2024-04-16 16:52             ` fan
@ 2024-04-17 11:50                 ` Jonathan Cameron via
  0 siblings, 0 replies; 65+ messages in thread
From: Jonathan Cameron @ 2024-04-17 11:50 UTC (permalink / raw)
  To: fan
  Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
	wj28.lee, Fan Ni


> > >  
> > >      ret = cxl_detect_malformed_extent_list(ct3d, in);
> > >      if (ret != CXL_MBOX_SUCCESS) {
> > > +        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);  
> > 
> > If it's a bad message from the host, I don't think the device is supposed to
> > do anything with pending extents.  
> 
> This is not clear to me.
> 
> In the spec r3.1 8.2.9.9.9.3, Add Dynamic Capacity Response (Opcode 4802h),
> there is text like "After this command is received, the device is free to
> reclaim capacity that the host does not utilize.", which seems to imply that
> as long as the response is received, we need to update the pending list so
> the unused capacity can be reclaimed. But of course, we can argue that if
> there is an error, we cannot tell whether the host accepted the extents or
> not, so we should not update the pending list.

I can try to get a clarification, as I agree 'is received' is unclear,
but in general any command that gets an error response should have no
effect on device state. If it does, then what effect it has must be stated
in the specification.
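
As an illustration only (not part of the posted patches), one way to follow
that rule in cmd_dcd_add_dyn_cap_rsp is to leave the pending list untouched on
every error path and only retire the head extent group once the response has
actually been handled, roughly:

    if (in->num_entries_updated == 0) {
        /* A valid "accept nothing" response: the offer has been dealt with. */
        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
        return CXL_MBOX_SUCCESS;
    }

    /* Error paths return without touching the pending list. */
    if (in->num_entries_updated + ct3d->dc.total_extent_count >
        CXL_NUM_EXTENTS_SUPPORTED) {
        return CXL_MBOX_RESOURCES_EXHAUSTED;
    }

    ret = cxl_detect_malformed_extent_list(ct3d, in);
    if (ret != CXL_MBOX_SUCCESS) {
        return ret;
    }

    ret = cxl_dcd_add_dyn_cap_rsp_dry_run(ct3d, in);
    if (ret != CXL_MBOX_SUCCESS) {
        return ret;
    }

    /* ... accept the extents as in the patch ... */

    /* Only now has the head of the pending list been fully dealt with. */
    cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
    return CXL_MBOX_SUCCESS;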

> >   
> > > +
> > > +#define REMOVAL_POLICY_MASK 0xf
> > > +#define FORCED_REMOVAL_BIT BIT(4)
> > > +
> > > +void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t hid,
> > > +                                      uint8_t flags, uint8_t region_id,
> > > +                                      const char *tag,
> > > +                                      CXLDCExtentRecordList  *records,
> > > +                                      Error **errp)
> > > +{
> > > +    CXLDCEventType type = DC_EVENT_RELEASE_CAPACITY;
> > > +
> > > +    if (flags & FORCED_REMOVAL_BIT) {
> > > +        /* TODO: enable forced removal in the future */
> > > +        type = DC_EVENT_FORCED_RELEASE_CAPACITY;
> > > +        error_setg(errp, "Forced removal not supported yet");
> > > +        return;
> > > +    }
> > > +
> > > +    switch (flags & REMOVAL_POLICY_MASK) {
> > > +    case 1:  
> > Probably benefit from a suitable define.  
> >   
> > > +        qmp_cxl_process_dynamic_capacity_prescriptive(path, hid, type,
> > > +                                                      region_id, records, errp);
> > > +        break;  
> > 
> > I'd not noticed before but might as well return from these case blocks.  
> 
> Sorry, I do not follow here. What do you mean by "return from these case
> blocks"? Are you referring to the check above about the forced removal case?

No, what I meant was much simpler - just a code refactoring thing.
        case 1:
            qmp_cxl_process_dynamic_capacity_prescriptive(path, hid, type,
                                                          region_id, records, errp);

            //break;
            return;
> 
> Fan
> 
> >   
> > > +    default:
> > > +        error_setg(errp, "Removal policy not supported");
> > > +        break;
               return;
> > > +    }
> > > +}  
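
Spelled out, a minimal sketch of that switch with early returns could look
like this (REMOVAL_POLICY_PRESCRIPTIVE is an illustrative name for the
suggested define, matching policy value 1 in the patch; the series itself does
not define it):

/* Illustrative name only; the posted series uses the bare value 1. */
#define REMOVAL_POLICY_PRESCRIPTIVE 1

    switch (flags & REMOVAL_POLICY_MASK) {
    case REMOVAL_POLICY_PRESCRIPTIVE:
        qmp_cxl_process_dynamic_capacity_prescriptive(path, hid, type,
                                                      region_id, records, errp);
        return;
    default:
        error_setg(errp, "Removal policy not supported");
        return;
    }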


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 10/12] hw/mem/cxl_type3: Add dpa range validation for accesses to DC regions
  2024-04-16 16:37         ` fan
@ 2024-04-17 11:59             ` Jonathan Cameron via
  0 siblings, 0 replies; 65+ messages in thread
From: Jonathan Cameron @ 2024-04-17 11:59 UTC (permalink / raw)
  To: fan
  Cc: Gregory Price, qemu-devel, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
	wj28.lee, Fan Ni

On Tue, 16 Apr 2024 09:37:09 -0700
fan <nifan.cxl@gmail.com> wrote:

> On Tue, Apr 16, 2024 at 04:00:56PM +0100, Jonathan Cameron wrote:
> > On Mon, 15 Apr 2024 10:37:00 -0700
> > fan <nifan.cxl@gmail.com> wrote:
> >   
> > > On Fri, Apr 12, 2024 at 06:54:42PM -0400, Gregory Price wrote:  
> > > > On Mon, Mar 25, 2024 at 12:02:28PM -0700, nifan.cxl@gmail.com wrote:    
> > > > > From: Fan Ni <fan.ni@samsung.com>
> > > > > 
> > > > > All dpa ranges in the DC regions are invalid to access until an extent
> > > > > covering the range has been added. Add a bitmap for each region to
> > > > > record whether a DC block in the region has been backed by DC extent.
> > > > > For the bitmap, a bit in the bitmap represents a DC block. When a DC
> > > > > extent is added, all the bits of the blocks in the extent will be set,
> > > > > which will be cleared when the extent is released.
> > > > > 
> > > > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > > > Signed-off-by: Fan Ni <fan.ni@samsung.com>
> > > > > ---
> > > > >  hw/cxl/cxl-mailbox-utils.c  |  6 +++
> > > > >  hw/mem/cxl_type3.c          | 76 +++++++++++++++++++++++++++++++++++++
> > > > >  include/hw/cxl/cxl_device.h |  7 ++++
> > > > >  3 files changed, 89 insertions(+)
> > > > > 
> > > > > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > > > > index 7094e007b9..a0d2239176 100644
> > > > > --- a/hw/cxl/cxl-mailbox-utils.c
> > > > > +++ b/hw/cxl/cxl-mailbox-utils.c
> > > > > @@ -1620,6 +1620,7 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> > > > >  
> > > > >          cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
> > > > >          ct3d->dc.total_extent_count += 1;
> > > > > +        ct3_set_region_block_backed(ct3d, dpa, len);
> > > > >  
> > > > >          ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> > > > >          cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);    
> > > > 
> > > > while looking at the MHD code, we had decided to "reserve" the blocks in
> > > > the bitmap in the call to `qmp_cxl_process_dynamic_capacity` in order to
> > > > prevent a potential double-allocation (basically we need to sanity check
> > > > that two hosts aren't reserving the region PRIOR to the host being
> > > > notified).
> > > > 
> > > > I did not see any checks in the `qmp_cxl_process_dynamic_capacity` path
> > > > to prevent pending extents from being double-allocated.  Is this an
> > > > explicit choice?
> > > > 
> > > > I can see, for example, why you may want to allow the following in the
> > > > pending list: [Add X, Remove X, Add X].  I just want to know if this is
> > > > intentional or not. If not, you may consider adding a pending check
> > > > during the sanity check phase of `qmp_cxl_process_dynamic_capacity`
> > > > 
> > > > ~Gregory    
> > > 
> > > First, for remove request, pending list is not involved. See cxl r3.1,
> > > 9.13.3.3. Pending basically means "pending to add". 
> > > So for the above example, in the pending list, you can see [Add x, add x] if the
> > > event is not processed in time.
> > > Second, from the spec, I cannot find any text saying we cannot issue
> > > another add extent X if it is still pending.  
> > 
> > I think there is text saying that the capacity is not released for reuse
> > by the device until it receives a response from the host.   Whilst
> > it's not explicit on offers to the same host, I'm not sure that matters.
> > So I don't think it is supposed to queue multiple extents...  
> 
> Are you suggesting we add a check here to reject the second add when the
> first one is still pending?

Yes.  The capacity has not come back to the device, so it is not available to reissue.
On an MH-MLD/SLD we'd need to prevent it being added (not shared) to multiple hosts;
this is kind of the temporal equivalent of that.

> 
> Currently, we do not allow releasing an extent when it is still pending,
> which aligns with the case you mentioned above "not release for reuse", I
> think.
> Can the second add mean a retry instead of reuse? 
No - or at least the device should not be doing that.  The FM might try
again, but only once it knows try 1 failed. Because this aligns with the
MHD case, where you definitely can't offer it to more than one host,
I think we should not do it.  Whether we should put any effort into blocking
it is a different question.  User error :)

Note, the host must not remove a log entry until it has dealt with it
(sent a response) so there is no obvious reason to bother with a retry.
Maybe a booting host would reject all offered extents (because it's not ready
for them yet), but then I'd want the FM to explicitly decide to tell the device
to offer again.

Whilst this is a custom interface, the equivalent FM API does say.

"The command, with selection policy Enable Shared Access, shall also fail with Invalid
Input under the following conditions:
• When the specified region is not Sharable
• When the tagged capacity is already mapped to any Host ID via a non-Sharable
region
• When the tagged capacity cannot be added to the requested region due to
device-imposed restrictions
• When the same tagged capacity is currently accessible by the same LD"

It's a little fuzzy because of the whole pending vs 'mapped / accessible' wording,
but I think the intent is that you can't send again until the first one is dealt with.
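
As a sketch of the kind of guard being discussed (illustrative only, reusing
the pending-list helper the release path in the patch already uses), the add
branch of qmp_cxl_process_dynamic_capacity_prescriptive could also reject
ranges that are still sitting in the pending list:

        } else if (type == DC_EVENT_ADD_CAPACITY) {
            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents, dpa, len)) {
                error_setg(errp,
                           "cannot add DPA already accessible to the same LD");
                return;
            }
            /*
             * The range has already been offered to the host but not yet
             * responded to, so it is not available to be offered again.
             */
            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
                                                     dpa, len)) {
                error_setg(errp,
                           "cannot add DPA range that is still pending");
                return;
            }
        }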

Jonathan

> 
> Fan
> 
> > 
> >   
> > > From the kernel side, if the first one is accepted, the second one will
> > > get rejected, and there is no issue there.
> > > If the first is rejected for some reason, the second one can get
> > > accepted or rejected and do not need to worry about the first one.
> > > 
> > > 
> > > Fan
> > >   
> >   


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v6 10/12] hw/mem/cxl_type3: Add dpa range validation for accesses to DC regions
  2024-04-17 11:59             ` Jonathan Cameron via
  (?)
@ 2024-04-18 17:58             ` Gregory Price
  -1 siblings, 0 replies; 65+ messages in thread
From: Gregory Price @ 2024-04-18 17:58 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: fan, qemu-devel, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
	wj28.lee, Fan Ni

On Wed, Apr 17, 2024 at 12:59:51PM +0100, Jonathan Cameron wrote:
> On Tue, 16 Apr 2024 09:37:09 -0700
> fan <nifan.cxl@gmail.com> wrote:
> 
> > 
> > Currently, we do not allow releasing an extent when it is still pending,
> > which aligns with the case you mentioned above "not release for reuse", I
> > think.
> > Can the second add mean a retry instead of reuse? 
> No - or at least the device should not be doing that.  The FM might try
> again, but only once it knows try 1 failed. Because this aligns with the
> MHD case, where you definitely can't offer it to more than one host,
> I think we should not do it.  Whether we should put any effort into blocking
> it is a different question.  User error :)
> 
> Note, the host must not remove a log entry until it has dealt with it
> (sent a response) so there is no obvious reason to bother with a retry.
> Maybe a booting host would reject all offered extents (because it's not ready
> for them yet), but then I'd want the FM to explicitly decide to tell the device
> to offer again.
> 

This might be the only time a forced-removal makes sense, considering
that removal of a pending add could be catastrophic, but if
the FM knows the host is not coming up and is never coming up, an
allocation stuck in pending would not be recoverable unless you
force-removed it.

> Whilst this is a custom interface, the equivalent FM API does say.
> 
> "The command, with selection policy Enable Shared Access, shall also fail with Invalid
> Input under the following conditions:
> • When the specified region is not Sharable
> • When the tagged capacity is already mapped to any Host ID via a non-Sharable
> region
> • When the tagged capacity cannot be added to the requested region due to
> device-imposed restrictions
> • When the same tagged capacity is currently accessible by the same LD"
> 
> It's a little fuzzy because of the whole pending vs 'mapped / accessible' wording,
> but I think the intent is that you can't send again until the first one is dealt with.
> 
> Jonathan
> 
> > 
> > Fan
> > 
> > > 
> > >   
> > > > From the kernel side, if the first one is accepted, the second one will
> > > > get rejected, and there is no issue there.
> > > > If the first is rejected for some reason, the second one can get
> > > > accepted or rejected and do not need to worry about the first one.
> > > > 
> > > > 
> > > > Fan
> > > >   
> > >   
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

end of thread, other threads:[~2024-04-18 17:58 UTC | newest]

Thread overview: 65+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-25 19:02 [PATCH v6 00/12] Enabling DCD emulation support in Qemu nifan.cxl
2024-03-25 19:02 ` [PATCH v6 01/12] hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output payload of identify memory device command nifan.cxl
2024-03-25 19:02 ` [PATCH v6 02/12] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support nifan.cxl
2024-03-25 19:02 ` [PATCH v6 03/12] include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for type3 memory devices nifan.cxl
2024-03-25 19:02 ` [PATCH v6 04/12] hw/mem/cxl_type3: Add support to create DC regions to " nifan.cxl
2024-03-25 19:02 ` [PATCH v6 05/12] hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr size instead of mr as argument nifan.cxl
2024-03-25 19:02 ` [PATCH v6 06/12] hw/mem/cxl_type3: Add host backend and address space handling for DC regions nifan.cxl
2024-04-05 10:58   ` Jonathan Cameron
2024-04-05 10:58     ` Jonathan Cameron via
2024-03-25 19:02 ` [PATCH v6 07/12] hw/mem/cxl_type3: Add DC extent list representative and get DC extent list mailbox support nifan.cxl
2024-04-05 11:08   ` Jonathan Cameron
2024-04-05 11:08     ` Jonathan Cameron via
2024-03-25 19:02 ` [PATCH v6 08/12] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response nifan.cxl
2024-04-04 13:32   ` Jørgen Hansen
2024-04-05 11:12     ` Jonathan Cameron
2024-04-05 11:12       ` Jonathan Cameron via
2024-04-09 19:21     ` fan
2024-04-15 17:56     ` fan
2024-04-16 10:02       ` Jørgen Hansen
2024-04-16 16:27         ` fan
2024-04-15 18:00     ` fan
2024-04-05 11:39   ` Jonathan Cameron
2024-04-05 11:39     ` Jonathan Cameron via
2024-03-25 19:02 ` [PATCH v6 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents nifan.cxl
2024-04-03 18:16   ` Gregory Price
2024-04-05 12:27     ` Jonathan Cameron
2024-04-05 12:27       ` Jonathan Cameron via
2024-04-05 16:07       ` Gregory Price
2024-04-05 17:44         ` Jonathan Cameron
2024-04-05 17:44           ` Jonathan Cameron via
2024-04-05 18:09           ` Gregory Price
2024-04-09 16:10             ` Jonathan Cameron
2024-04-09 16:10               ` Jonathan Cameron via
2024-04-05 12:18   ` Jonathan Cameron
2024-04-05 12:18     ` Jonathan Cameron via
2024-04-09 21:26     ` fan
2024-04-10 19:49       ` Jonathan Cameron
2024-04-10 19:49         ` Jonathan Cameron via
2024-04-15 20:06         ` fan
2024-04-16 14:58           ` Jonathan Cameron
2024-04-16 14:58             ` Jonathan Cameron via
2024-04-16 16:52             ` fan
2024-04-17 11:50               ` Jonathan Cameron
2024-04-17 11:50                 ` Jonathan Cameron via
2024-04-16 17:14             ` Gregory Price
2024-03-25 19:02 ` [PATCH v6 10/12] hw/mem/cxl_type3: Add dpa range validation for accesses to DC regions nifan.cxl
2024-04-05 12:29   ` Jonathan Cameron
2024-04-05 12:29     ` Jonathan Cameron via
2024-04-12 22:54   ` Gregory Price
2024-04-15 17:37     ` fan
2024-04-16 15:00       ` Jonathan Cameron
2024-04-16 15:00         ` Jonathan Cameron via
2024-04-16 16:37         ` fan
2024-04-17 11:59           ` Jonathan Cameron
2024-04-17 11:59             ` Jonathan Cameron via
2024-04-18 17:58             ` Gregory Price
2024-04-16 17:15         ` Gregory Price
2024-03-25 19:02 ` [PATCH v6 11/12] hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support nifan.cxl
2024-04-05  9:57   ` Jørgen Hansen
2024-04-15 20:17     ` fan
2024-04-05 12:32   ` Jonathan Cameron
2024-04-05 12:32     ` Jonathan Cameron via
2024-03-25 19:02 ` [PATCH v6 12/12] hw/mem/cxl_type3: Allow to release extent superset in QMP interface nifan.cxl
2024-04-05 12:33   ` Jonathan Cameron
2024-04-05 12:33     ` Jonathan Cameron via
