* [PATCH v5 01/13] hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output payload of identify memory device command
2024-03-04 19:33 [PATCH v5 00/13] Enabling DCD emulation support in Qemu nifan.cxl
@ 2024-03-04 19:33 ` nifan.cxl
2024-03-06 15:07 ` Jonathan Cameron via
2024-03-04 19:33 ` [PATCH v5 02/13] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support nifan.cxl
` (11 subsequent siblings)
12 siblings, 1 reply; 81+ messages in thread
From: nifan.cxl @ 2024-03-04 19:33 UTC (permalink / raw)
To: qemu-devel
Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni
From: Fan Ni <fan.ni@samsung.com>
Based on CXL spec r3.1 Table 8-127 (Identify Memory Device Output
Payload), dynamic capacity event log size should be part of
output of the Identify command.
Add dc_event_log_size to the output payload for the host to get the info.
Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
hw/cxl/cxl-mailbox-utils.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 4bcd727f4c..ba1d9901df 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -21,6 +21,7 @@
#include "sysemu/hostmem.h"
#define CXL_CAPACITY_MULTIPLIER (256 * MiB)
+#define CXL_DC_EVENT_LOG_SIZE 8
/*
* How to add a new command, example. The command set FOO, with cmd BAR.
@@ -780,8 +781,9 @@ static CXLRetCode cmd_identify_memory_device(const struct cxl_cmd *cmd,
uint16_t inject_poison_limit;
uint8_t poison_caps;
uint8_t qos_telemetry_caps;
+ uint16_t dc_event_log_size;
} QEMU_PACKED *id;
- QEMU_BUILD_BUG_ON(sizeof(*id) != 0x43);
+ QEMU_BUILD_BUG_ON(sizeof(*id) != 0x45);
CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
CXLDeviceState *cxl_dstate = &ct3d->cxl_dstate;
@@ -807,6 +809,7 @@ static CXLRetCode cmd_identify_memory_device(const struct cxl_cmd *cmd,
st24_le_p(id->poison_list_max_mer, 256);
/* No limit - so limited by main poison record limit */
stw_le_p(&id->inject_poison_limit, 0);
+ stw_le_p(&id->dc_event_log_size, CXL_DC_EVENT_LOG_SIZE);
*len_out = sizeof(*id);
return CXL_MBOX_SUCCESS;
--
2.43.0
^ permalink raw reply related [flat|nested] 81+ messages in thread
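For readers following the size check changed in this patch, here is a minimal standalone sketch, not QEMU code. It assumes a GCC/Clang-style packed attribute. `struct id_tail`, `store_le16`, and `fill_dc_event_log_size` are hypothetical names; only the trailing fields, the value 8, and the 2-byte growth (0x43 to 0x45) come from the patch. `store_le16` is a stand-in for QEMU's `stw_le_p`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CXL_DC_EVENT_LOG_SIZE 8

/* Hypothetical excerpt: only the last fields of the identify payload. */
struct id_tail {
    uint16_t inject_poison_limit;
    uint8_t poison_caps;
    uint8_t qos_telemetry_caps;
    uint16_t dc_event_log_size; /* field added by the patch */
} __attribute__((packed));

/* Appending a packed uint16_t grows the payload by exactly 2 bytes. */
_Static_assert(sizeof(struct id_tail) == 6, "tail must be 6 bytes");
_Static_assert(offsetof(struct id_tail, dc_event_log_size) == 4,
               "new field must directly follow qos_telemetry_caps");

/* Stand-in for QEMU's stw_le_p(): store a 16-bit value little-endian. */
static void store_le16(uint8_t *p, uint16_t v)
{
    p[0] = v & 0xff;
    p[1] = v >> 8;
}

/* Write the event log size into the tail, byte by byte. */
static void fill_dc_event_log_size(struct id_tail *id)
{
    uint8_t *raw = (uint8_t *)id;

    store_le16(raw + offsetof(struct id_tail, dc_event_log_size),
               CXL_DC_EVENT_LOG_SIZE);
}
```

Storing through a byte pointer keeps the sketch independent of host endianness and of the alignment of the packed member.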
* Re: [PATCH v5 01/13] hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output payload of identify memory device command
2024-03-04 19:33 ` [PATCH v5 01/13] hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output payload of identify memory device command nifan.cxl
@ 2024-03-06 15:07 ` Jonathan Cameron via
0 siblings, 0 replies; 81+ messages in thread
From: Jonathan Cameron @ 2024-03-06 15:07 UTC (permalink / raw)
To: nifan.cxl
Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
wj28.lee, Fan Ni
On Mon, 4 Mar 2024 11:33:56 -0800
nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
>
> Based on CXL spec r3.1 Table 8-127 (Identify Memory Device Output
> Payload), dynamic capacity event log size should be part of
> output of the Identify command.
> Add dc_event_log_size to the output payload for the host to get the info.
>
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
I'm going to give tags on this mostly so I can easily see what I was happy
with on any future versions if we need them.
If I end up sending the code to Michael, I may squash them into
just a SoB.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
^ permalink raw reply [flat|nested] 81+ messages in thread
* [PATCH v5 02/13] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support
2024-03-04 19:33 [PATCH v5 00/13] Enabling DCD emulation support in Qemu nifan.cxl
2024-03-04 19:33 ` [PATCH v5 01/13] hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output payload of identify memory device command nifan.cxl
@ 2024-03-04 19:33 ` nifan.cxl
2024-03-06 15:24 ` Jonathan Cameron via
2024-03-04 19:33 ` [PATCH v5 03/13] include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for type3 memory devices nifan.cxl
` (10 subsequent siblings)
12 siblings, 1 reply; 81+ messages in thread
From: nifan.cxl @ 2024-03-04 19:33 UTC (permalink / raw)
To: qemu-devel
Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni
From: Fan Ni <fan.ni@samsung.com>
Per cxl spec r3.1, add dynamic capacity region representative based on
Table 8-165 and extend the cxl type3 device definition to include dc region
information. Also, based on info in 8.2.9.9.9.1, add 'Get Dynamic Capacity
Configuration' mailbox support.
Note: we store region decode length as byte-wise length on the device, which
should be divided by 256 * MiB before being returned to the host
for "Get Dynamic Capacity Configuration" mailbox command per
specification.
Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
hw/cxl/cxl-mailbox-utils.c | 99 +++++++++++++++++++++++++++++++++++++
include/hw/cxl/cxl_device.h | 16 ++++++
2 files changed, 115 insertions(+)
diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index ba1d9901df..5792010c12 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -22,6 +22,8 @@
#define CXL_CAPACITY_MULTIPLIER (256 * MiB)
#define CXL_DC_EVENT_LOG_SIZE 8
+#define CXL_NUM_EXTENTS_SUPPORTED 512
+#define CXL_NUM_TAGS_SUPPORTED 0
/*
* How to add a new command, example. The command set FOO, with cmd BAR.
@@ -80,6 +82,8 @@ enum {
#define GET_POISON_LIST 0x0
#define INJECT_POISON 0x1
#define CLEAR_POISON 0x2
+ DCD_CONFIG = 0x48,
+ #define GET_DC_CONFIG 0x0
PHYSICAL_SWITCH = 0x51,
#define IDENTIFY_SWITCH_DEVICE 0x0
#define GET_PHYSICAL_PORT_STATE 0x1
@@ -1238,6 +1242,91 @@ static CXLRetCode cmd_media_clear_poison(const struct cxl_cmd *cmd,
return CXL_MBOX_SUCCESS;
}
+/*
+ * CXL r3.1 section 8.2.9.9.9.1: Get Dynamic Capacity Configuration
+ * (Opcode: 4800h)
+ */
+static CXLRetCode cmd_dcd_get_dyn_cap_config(const struct cxl_cmd *cmd,
+ uint8_t *payload_in,
+ size_t len_in,
+ uint8_t *payload_out,
+ size_t *len_out,
+ CXLCCI *cci)
+{
+ CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
+ struct {
+ uint8_t region_cnt;
+ uint8_t start_region_id;
+ } QEMU_PACKED *in;
+ struct {
+ uint8_t num_regions;
+ uint8_t regions_returned;
+ uint8_t rsvd1[6];
+ struct {
+ uint64_t base;
+ uint64_t decode_len;
+ uint64_t region_len;
+ uint64_t block_size;
+ uint32_t dsmadhandle;
+ uint8_t flags;
+ uint8_t rsvd2[3];
+ } QEMU_PACKED records[];
+ } QEMU_PACKED *out;
+ struct {
+ uint32_t num_extents_supported;
+ uint32_t num_extents_available;
+ uint32_t num_tags_supported;
+ uint32_t num_tags_available;
+ } QEMU_PACKED *extra_out;
+ uint16_t record_count;
+ uint16_t i;
+ uint16_t out_pl_len;
+ uint8_t start_region_id;
+
+ in = (void *)payload_in;
+ out = (void *)payload_out;
+ start_region_id = in->start_region_id;
+ if (start_region_id >= ct3d->dc.num_regions) {
+ return CXL_MBOX_INVALID_INPUT;
+ }
+
+ record_count = MIN(ct3d->dc.num_regions - in->start_region_id,
+ in->region_cnt);
+
+ out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);
+ extra_out = (void *)(payload_out + out_pl_len);
+ out_pl_len += sizeof(*extra_out);
+ assert(out_pl_len <= CXL_MAILBOX_MAX_PAYLOAD_SIZE);
+
+ out->num_regions = ct3d->dc.num_regions;
+ out->regions_returned = record_count;
+ for (i = 0; i < record_count; i++) {
+ stq_le_p(&out->records[i].base,
+ ct3d->dc.regions[start_region_id + i].base);
+ stq_le_p(&out->records[i].decode_len,
+ ct3d->dc.regions[start_region_id + i].decode_len /
+ CXL_CAPACITY_MULTIPLIER);
+ stq_le_p(&out->records[i].region_len,
+ ct3d->dc.regions[start_region_id + i].len);
+ stq_le_p(&out->records[i].block_size,
+ ct3d->dc.regions[start_region_id + i].block_size);
+ stl_le_p(&out->records[i].dsmadhandle,
+ ct3d->dc.regions[start_region_id + i].dsmadhandle);
+ out->records[i].flags = ct3d->dc.regions[start_region_id + i].flags;
+ }
+ /*
+ * TODO: will assign proper values when extents and tags are introduced
+ * to use.
+ */
+ stl_le_p(&extra_out->num_extents_supported, CXL_NUM_EXTENTS_SUPPORTED);
+ stl_le_p(&extra_out->num_extents_available, CXL_NUM_EXTENTS_SUPPORTED);
+ stl_le_p(&extra_out->num_tags_supported, CXL_NUM_TAGS_SUPPORTED);
+ stl_le_p(&extra_out->num_tags_available, CXL_NUM_TAGS_SUPPORTED);
+
+ *len_out = out_pl_len;
+ return CXL_MBOX_SUCCESS;
+}
+
#define IMMEDIATE_CONFIG_CHANGE (1 << 1)
#define IMMEDIATE_DATA_CHANGE (1 << 2)
#define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -1282,6 +1371,11 @@ static const struct cxl_cmd cxl_cmd_set[256][256] = {
cmd_media_clear_poison, 72, 0 },
};
+static const struct cxl_cmd cxl_cmd_set_dcd[256][256] = {
+ [DCD_CONFIG][GET_DC_CONFIG] = { "DCD_GET_DC_CONFIG",
+ cmd_dcd_get_dyn_cap_config, 2, 0 },
+};
+
static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
[INFOSTAT][IS_IDENTIFY] = { "IDENTIFY", cmd_infostat_identify, 0, 0 },
[INFOSTAT][BACKGROUND_OPERATION_STATUS] = { "BACKGROUND_OPERATION_STATUS",
@@ -1487,7 +1581,12 @@ void cxl_initialize_mailbox_swcci(CXLCCI *cci, DeviceState *intf,
void cxl_initialize_mailbox_t3(CXLCCI *cci, DeviceState *d, size_t payload_max)
{
+ CXLType3Dev *ct3d = CXL_TYPE3(d);
+
cxl_copy_cci_commands(cci, cxl_cmd_set);
+ if (ct3d->dc.num_regions) {
+ cxl_copy_cci_commands(cci, cxl_cmd_set_dcd);
+ }
cci->d = d;
/* No separation for PCI MB as protocol handled in PCI device */
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 3cf3077afa..93ce047b28 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -422,6 +422,17 @@ typedef struct CXLPoison {
typedef QLIST_HEAD(, CXLPoison) CXLPoisonList;
#define CXL_POISON_LIST_LIMIT 256
+#define DCD_MAX_NUM_REGION 8
+
+typedef struct CXLDCRegion {
+ uint64_t base; /* aligned to 256*MiB */
+ uint64_t decode_len; /* aligned to 256*MiB */
+ uint64_t len;
+ uint64_t block_size;
+ uint32_t dsmadhandle;
+ uint8_t flags;
+} CXLDCRegion;
+
struct CXLType3Dev {
/* Private */
PCIDevice parent_obj;
@@ -454,6 +465,11 @@ struct CXLType3Dev {
unsigned int poison_list_cnt;
bool poison_list_overflowed;
uint64_t poison_list_overflow_ts;
+
+ struct dynamic_capacity {
+ uint8_t num_regions; /* 0-8 regions */
+ CXLDCRegion regions[DCD_MAX_NUM_REGION];
+ } dc;
};
#define TYPE_CXL_TYPE3 "cxl-type3"
--
2.43.0
^ permalink raw reply related [flat|nested] 81+ messages in thread
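The arithmetic in the handler above can be sketched in isolation. This is a rough illustration rather than the patch's code: the helper names and the `DC_CFG_*` constants are hypothetical, and the byte sizes are derived by hand from the packed structs in the patch (8-byte header, 40-byte region record, 16-byte trailer). Returning 0 for an out-of-range start index is a convention of this sketch only; the patch returns `CXL_MBOX_INVALID_INPUT` in that case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)
#define CXL_CAPACITY_MULTIPLIER (256 * MiB)

/* Sizes implied by the packed structs in the patch. */
enum {
    DC_CFG_HDR_SIZE = 8,     /* num_regions + regions_returned + rsvd1[6] */
    DC_CFG_RECORD_SIZE = 40, /* 4 * u64 + u32 + u8 + rsvd2[3] */
    DC_CFG_EXTRA_SIZE = 16,  /* 4 * u32 extent and tag counters */
};

/* Clamp the requested record count to the regions that actually exist. */
static uint16_t dc_record_count(uint8_t num_regions, uint8_t start_region_id,
                                uint8_t region_cnt)
{
    uint8_t remaining;

    if (start_region_id >= num_regions) {
        return 0; /* sketch only: the handler rejects this input */
    }
    remaining = num_regions - start_region_id;
    return remaining < region_cnt ? remaining : region_cnt;
}

/* Total output payload length for a given number of records. */
static size_t dc_payload_len(uint16_t record_count)
{
    return DC_CFG_HDR_SIZE + (size_t)record_count * DC_CFG_RECORD_SIZE +
           DC_CFG_EXTRA_SIZE;
}

/* Decode length is stored in bytes but reported in 256 MiB units. */
static uint64_t dc_decode_len_units(uint64_t decode_len_bytes)
{
    return decode_len_bytes / CXL_CAPACITY_MULTIPLIER;
}
```

With the maximum of 8 regions this gives 8 + 8 * 40 + 16 = 344 bytes, and a 2 GiB decode length is reported to the host as 8.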
* Re: [PATCH v5 02/13] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support
2024-03-04 19:33 ` [PATCH v5 02/13] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support nifan.cxl
@ 2024-03-06 15:24 ` Jonathan Cameron via
0 siblings, 0 replies; 81+ messages in thread
From: Jonathan Cameron @ 2024-03-06 15:24 UTC (permalink / raw)
To: nifan.cxl
Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
wj28.lee, Fan Ni
On Mon, 4 Mar 2024 11:33:57 -0800
nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
>
> Per cxl spec r3.1, add dynamic capacity region representative based on
> Table 8-165 and extend the cxl type3 device definition to include dc region
> information. Also, based on info in 8.2.9.9.9.1, add 'Get Dynamic Capacity
> Configuration' mailbox support.
>
> Note: we store region decode length as byte-wise length on the device, which
> should be divided by 256 * MiB before being returned to the host
> for "Get Dynamic Capacity Configuration" mailbox command per
> specification.
>
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
A few really minor, nice-to-have comments inline.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> hw/cxl/cxl-mailbox-utils.c | 99 +++++++++++++++++++++++++++++++++++++
> include/hw/cxl/cxl_device.h | 16 ++++++
> 2 files changed, 115 insertions(+)
>
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index ba1d9901df..5792010c12 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -22,6 +22,8 @@
>
> #define CXL_CAPACITY_MULTIPLIER (256 * MiB)
> #define CXL_DC_EVENT_LOG_SIZE 8
> +#define CXL_NUM_EXTENTS_SUPPORTED 512
> +#define CXL_NUM_TAGS_SUPPORTED 0
>
> /*
> * How to add a new command, example. The command set FOO, with cmd BAR.
> @@ -80,6 +82,8 @@ enum {
> #define GET_POISON_LIST 0x0
> #define INJECT_POISON 0x1
> #define CLEAR_POISON 0x2
> + DCD_CONFIG = 0x48,
> + #define GET_DC_CONFIG 0x0
> PHYSICAL_SWITCH = 0x51,
> #define IDENTIFY_SWITCH_DEVICE 0x0
> #define GET_PHYSICAL_PORT_STATE 0x1
> @@ -1238,6 +1242,91 @@ static CXLRetCode cmd_media_clear_poison(const struct cxl_cmd *cmd,
> return CXL_MBOX_SUCCESS;
> }
>
> +/*
> + * CXL r3.1 section 8.2.9.9.9.1: Get Dynamic Capacity Configuration
> + * (Opcode: 4800h)
> + */
> +static CXLRetCode cmd_dcd_get_dyn_cap_config(const struct cxl_cmd *cmd,
> + uint8_t *payload_in,
> + size_t len_in,
> + uint8_t *payload_out,
> + size_t *len_out,
> + CXLCCI *cci)
> +{
> + CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
> + struct {
> + uint8_t region_cnt;
> + uint8_t start_region_id;
> + } QEMU_PACKED *in;
If you respin, a few line breaks might help readability.
I'd stick one after each struct.
> + struct {
> + uint8_t num_regions;
> + uint8_t regions_returned;
> + uint8_t rsvd1[6];
> + struct {
> + uint64_t base;
> + uint64_t decode_len;
> + uint64_t region_len;
> + uint64_t block_size;
> + uint32_t dsmadhandle;
> + uint8_t flags;
> + uint8_t rsvd2[3];
> + } QEMU_PACKED records[];
> + } QEMU_PACKED *out;
} QEMU_PACKED *out = (void *)payload_out;
(see below)
> + struct {
> + uint32_t num_extents_supported;
> + uint32_t num_extents_available;
> + uint32_t num_tags_supported;
> + uint32_t num_tags_available;
> + } QEMU_PACKED *extra_out;
> + uint16_t record_count;
> + uint16_t i;
> + uint16_t out_pl_len;
> + uint8_t start_region_id;
> +
> + in = (void *)payload_in;
> + out = (void *)payload_out;
These are a bit uninteresting, so could they just be assigned at the definitions above?
> + start_region_id = in->start_region_id;
Perhaps something shorter like start_rid for the local variable?
> + if (start_region_id >= ct3d->dc.num_regions) {
> + return CXL_MBOX_INVALID_INPUT;
> + }
> +
> + record_count = MIN(ct3d->dc.num_regions - in->start_region_id,
> + in->region_cnt);
I'd align this with the character just after the opening bracket.
> +
> + out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);
> + extra_out = (void *)(payload_out + out_pl_len);
> + out_pl_len += sizeof(*extra_out);
> + assert(out_pl_len <= CXL_MAILBOX_MAX_PAYLOAD_SIZE);
> +
> + out->num_regions = ct3d->dc.num_regions;
> + out->regions_returned = record_count;
> + for (i = 0; i < record_count; i++) {
> + stq_le_p(&out->records[i].base,
> + ct3d->dc.regions[start_region_id + i].base);
> + stq_le_p(&out->records[i].decode_len,
> + ct3d->dc.regions[start_region_id + i].decode_len /
> + CXL_CAPACITY_MULTIPLIER);
> + stq_le_p(&out->records[i].region_len,
> + ct3d->dc.regions[start_region_id + i].len);
> + stq_le_p(&out->records[i].block_size,
> + ct3d->dc.regions[start_region_id + i].block_size);
> + stl_le_p(&out->records[i].dsmadhandle,
> + ct3d->dc.regions[start_region_id + i].dsmadhandle);
> + out->records[i].flags = ct3d->dc.regions[start_region_id + i].flags;
> + }
> + /*
> + * TODO: will assign proper values when extents and tags are introduced
> + * to use.
Drop the "to use":
* TODO: Assign values once extents and tags are introduced.
> + */
> + stl_le_p(&extra_out->num_extents_supported, CXL_NUM_EXTENTS_SUPPORTED);
> + stl_le_p(&extra_out->num_extents_available, CXL_NUM_EXTENTS_SUPPORTED);
> + stl_le_p(&extra_out->num_tags_supported, CXL_NUM_TAGS_SUPPORTED);
> + stl_le_p(&extra_out->num_tags_available, CXL_NUM_TAGS_SUPPORTED);
> +
> + *len_out = out_pl_len;
> + return CXL_MBOX_SUCCESS;
> +}
^ permalink raw reply [flat|nested] 81+ messages in thread
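The two declaration-site suggestions in the review above (assign the payload pointer where it is declared, and shorten the local to `start_rid`) might look roughly like the following standalone sketch. `struct dc_cfg_in`, `dc_cfg_first_region`, and the -1 error value are hypothetical; in the patch the struct is anonymous and the error is `CXL_MBOX_INVALID_INPUT`.

```c
#include <assert.h>
#include <stdint.h>

/* Named copy of the patch's anonymous input payload struct. */
struct dc_cfg_in {
    uint8_t region_cnt;
    uint8_t start_region_id;
} __attribute__((packed));

/* Returns the validated start region index, or -1 if it is out of range. */
static int dc_cfg_first_region(uint8_t *payload_in, uint8_t num_regions)
{
    /* Pointer assigned at its definition, as the review suggests. */
    struct dc_cfg_in *in = (void *)payload_in;
    uint8_t start_rid = in->start_region_id;

    if (start_rid >= num_regions) {
        return -1;
    }
    return start_rid;
}
```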
* [PATCH v5 03/13] include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for type3 memory devices
2024-03-04 19:33 [PATCH v5 00/13] Enabling DCD emulation support in Qemu nifan.cxl
2024-03-04 19:33 ` [PATCH v5 01/13] hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output payload of identify memory device command nifan.cxl
2024-03-04 19:33 ` [PATCH v5 02/13] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support nifan.cxl
@ 2024-03-04 19:33 ` nifan.cxl
2024-03-06 15:39 ` Jonathan Cameron via
2024-03-04 19:33 ` [PATCH v5 04/13] hw/mem/cxl_type3: Add support to create DC regions to " nifan.cxl
` (9 subsequent siblings)
12 siblings, 1 reply; 81+ messages in thread
From: nifan.cxl @ 2024-03-04 19:33 UTC (permalink / raw)
To: qemu-devel
Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni
From: Fan Ni <fan.ni@samsung.com>
Rename mem_size as static_mem_size for type3 memdev to cover static RAM and
pmem capacity, preparing for the introduction of dynamic capacity to support
dynamic capacity devices.
Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
hw/cxl/cxl-mailbox-utils.c | 4 ++--
hw/mem/cxl_type3.c | 8 ++++----
include/hw/cxl/cxl_device.h | 2 +-
3 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 5792010c12..853dadba39 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -803,7 +803,7 @@ static CXLRetCode cmd_identify_memory_device(const struct cxl_cmd *cmd,
snprintf(id->fw_revision, 0x10, "BWFW VERSION %02d", 0);
stq_le_p(&id->total_capacity,
- cxl_dstate->mem_size / CXL_CAPACITY_MULTIPLIER);
+ cxl_dstate->static_mem_size / CXL_CAPACITY_MULTIPLIER);
stq_le_p(&id->persistent_capacity,
cxl_dstate->pmem_size / CXL_CAPACITY_MULTIPLIER);
stq_le_p(&id->volatile_capacity,
@@ -1179,7 +1179,7 @@ static CXLRetCode cmd_media_clear_poison(const struct cxl_cmd *cmd,
struct clear_poison_pl *in = (void *)payload_in;
dpa = ldq_le_p(&in->dpa);
- if (dpa + CXL_CACHE_LINE_SIZE > cxl_dstate->mem_size) {
+ if (dpa + CXL_CACHE_LINE_SIZE > cxl_dstate->static_mem_size) {
return CXL_MBOX_INVALID_PA;
}
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index e8801805b9..244d2b5fd5 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -608,7 +608,7 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
}
address_space_init(&ct3d->hostvmem_as, vmr, v_name);
ct3d->cxl_dstate.vmem_size = memory_region_size(vmr);
- ct3d->cxl_dstate.mem_size += memory_region_size(vmr);
+ ct3d->cxl_dstate.static_mem_size += memory_region_size(vmr);
g_free(v_name);
}
@@ -631,7 +631,7 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
}
address_space_init(&ct3d->hostpmem_as, pmr, p_name);
ct3d->cxl_dstate.pmem_size = memory_region_size(pmr);
- ct3d->cxl_dstate.mem_size += memory_region_size(pmr);
+ ct3d->cxl_dstate.static_mem_size += memory_region_size(pmr);
g_free(p_name);
}
@@ -837,7 +837,7 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
return -EINVAL;
}
- if (*dpa_offset > ct3d->cxl_dstate.mem_size) {
+ if (*dpa_offset > ct3d->cxl_dstate.static_mem_size) {
return -EINVAL;
}
@@ -1010,7 +1010,7 @@ static bool set_cacheline(CXLType3Dev *ct3d, uint64_t dpa_offset, uint8_t *data)
return false;
}
- if (dpa_offset + CXL_CACHE_LINE_SIZE > ct3d->cxl_dstate.mem_size) {
+ if (dpa_offset + CXL_CACHE_LINE_SIZE > ct3d->cxl_dstate.static_mem_size) {
return false;
}
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 93ce047b28..f82d018422 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -234,7 +234,7 @@ typedef struct cxl_device_state {
} timestamp;
/* memory region size, HDM */
- uint64_t mem_size;
+ uint64_t static_mem_size;
uint64_t pmem_size;
uint64_t vmem_size;
--
2.43.0
^ permalink raw reply related [flat|nested] 81+ messages in thread
* Re: [PATCH v5 03/13] include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for type3 memory devices
2024-03-04 19:33 ` [PATCH v5 03/13] include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for type3 memory devices nifan.cxl
@ 2024-03-06 15:39 ` Jonathan Cameron via
0 siblings, 0 replies; 81+ messages in thread
From: Jonathan Cameron @ 2024-03-06 15:39 UTC (permalink / raw)
To: nifan.cxl
Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
wj28.lee, Fan Ni
On Mon, 4 Mar 2024 11:33:58 -0800
nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
>
> Rename mem_size as static_mem_size for type3 memdev to cover static RAM and
> pmem capacity, preparing for the introduction of dynamic capacity to support
> dynamic capacity devices.
>
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
^ permalink raw reply [flat|nested] 81+ messages in thread
* [PATCH v5 04/13] hw/mem/cxl_type3: Add support to create DC regions to type3 memory devices
2024-03-04 19:33 [PATCH v5 00/13] Enabling DCD emulation support in Qemu nifan.cxl
` (2 preceding siblings ...)
2024-03-04 19:33 ` [PATCH v5 03/13] include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for type3 memory devices nifan.cxl
@ 2024-03-04 19:33 ` nifan.cxl
2024-03-06 15:48 ` Jonathan Cameron via
2024-03-04 19:34 ` [PATCH v5 05/13] hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr size insead of mr as argument nifan.cxl
` (8 subsequent siblings)
12 siblings, 1 reply; 81+ messages in thread
From: nifan.cxl @ 2024-03-04 19:33 UTC (permalink / raw)
To: qemu-devel
Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni
From: Fan Ni <fan.ni@samsung.com>
With the change, when setting up memory for type3 memory device, we can
create DC regions.
A property 'num-dc-regions' is added to ct3_props to allow users to pass the
number of DC regions to create. To make it easier, other region parameters
like region base, length, and block size are hard coded. If needed,
these parameters can be added easily.
With the change, we can create DC regions with proper kernel side
support like below:
region=$(cat /sys/bus/cxl/devices/decoder0.0/create_dc_region)
echo $region > /sys/bus/cxl/devices/decoder0.0/create_dc_region
echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity
echo 1 > /sys/bus/cxl/devices/$region/interleave_ways
echo "dc0" >/sys/bus/cxl/devices/decoder2.0/mode
echo 0x40000000 >/sys/bus/cxl/devices/decoder2.0/dpa_size
echo 0x40000000 > /sys/bus/cxl/devices/$region/size
echo "decoder2.0" > /sys/bus/cxl/devices/$region/target0
echo 1 > /sys/bus/cxl/devices/$region/commit
echo $region > /sys/bus/cxl/drivers/cxl_region/bind
Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
hw/mem/cxl_type3.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 46 insertions(+)
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 244d2b5fd5..a191211009 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -30,6 +30,7 @@
#include "hw/pci/msix.h"
#define DWORD_BYTE 4
+#define CXL_CAPACITY_MULTIPLIER (256 * MiB)
/* Default CDAT entries for a memory region */
enum {
@@ -567,6 +568,45 @@ static void ct3d_reg_write(void *opaque, hwaddr offset, uint64_t value,
}
}
+/*
+ * TODO: dc region configuration will be updated once host backend and address
+ * space support is added for DCD.
+ */
+static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
+{
+ int i;
+ uint64_t region_base = 0;
+ uint64_t region_len = 2 * GiB;
+ uint64_t decode_len = 2 * GiB;
+ uint64_t blk_size = 2 * MiB;
+ CXLDCRegion *region;
+ MemoryRegion *mr;
+
+ if (ct3d->hostvmem) {
+ mr = host_memory_backend_get_memory(ct3d->hostvmem);
+ region_base += memory_region_size(mr);
+ }
+ if (ct3d->hostpmem) {
+ mr = host_memory_backend_get_memory(ct3d->hostpmem);
+ region_base += memory_region_size(mr);
+ }
+ assert(region_base % CXL_CAPACITY_MULTIPLIER == 0);
+
+ for (i = 0; i < ct3d->dc.num_regions; i++) {
+ region = &ct3d->dc.regions[i];
+ region->base = region_base;
+ region->decode_len = decode_len;
+ region->len = region_len;
+ region->block_size = blk_size;
+ /* dsmad_handle is set when creating cdat table entries */
+ region->flags = 0;
+
+ region_base += region->len;
+ }
+
+ return true;
+}
+
static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
{
DeviceState *ds = DEVICE(ct3d);
@@ -635,6 +675,11 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
g_free(p_name);
}
+ if (!cxl_create_dc_regions(ct3d, errp)) {
+ error_setg(errp, "setup DC regions failed");
+ return false;
+ }
+
return true;
}
@@ -930,6 +975,7 @@ static Property ct3_props[] = {
HostMemoryBackend *),
DEFINE_PROP_UINT64("sn", CXLType3Dev, sn, UI64_NULL),
DEFINE_PROP_STRING("cdat", CXLType3Dev, cxl_cstate.cdat.filename),
+ DEFINE_PROP_UINT8("num-dc-regions", CXLType3Dev, dc.num_regions, 0),
DEFINE_PROP_END_OF_LIST(),
};
--
2.43.0
^ permalink raw reply related [flat|nested] 81+ messages in thread
* Re: [PATCH v5 04/13] hw/mem/cxl_type3: Add support to create DC regions to type3 memory devices
2024-03-04 19:33 ` [PATCH v5 04/13] hw/mem/cxl_type3: Add support to create DC regions to " nifan.cxl
@ 2024-03-06 15:48 ` Jonathan Cameron via
0 siblings, 0 replies; 81+ messages in thread
From: Jonathan Cameron @ 2024-03-06 15:48 UTC (permalink / raw)
To: nifan.cxl
Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
wj28.lee, Fan Ni
On Mon, 4 Mar 2024 11:33:59 -0800
nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
>
> With the change, when setting up memory for type3 memory device, we can
> create DC regions.
> A property 'num-dc-regions' is added to ct3_props to allow users to pass the
> number of DC regions to create. To make it easier, other region parameters
> like region base, length, and block size are hard coded. If needed,
> these parameters can be added easily.
>
> With the change, we can create DC regions with proper kernel side
> support like below:
>
> region=$(cat /sys/bus/cxl/devices/decoder0.0/create_dc_region)
> echo $region > /sys/bus/cxl/devices/decoder0.0/create_dc_region
> echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity
> echo 1 > /sys/bus/cxl/devices/$region/interleave_ways
>
> echo "dc0" >/sys/bus/cxl/devices/decoder2.0/mode
> echo 0x40000000 >/sys/bus/cxl/devices/decoder2.0/dpa_size
>
> echo 0x40000000 > /sys/bus/cxl/devices/$region/size
> echo "decoder2.0" > /sys/bus/cxl/devices/$region/target0
> echo 1 > /sys/bus/cxl/devices/$region/commit
> echo $region > /sys/bus/cxl/drivers/cxl_region/bind
>
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
Suggested changes are trivial formatting things
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> hw/mem/cxl_type3.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 46 insertions(+)
>
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 244d2b5fd5..a191211009 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -30,6 +30,7 @@
> #include "hw/pci/msix.h"
>
> #define DWORD_BYTE 4
> +#define CXL_CAPACITY_MULTIPLIER (256 * MiB)
>
> /* Default CDAT entries for a memory region */
> enum {
> @@ -567,6 +568,45 @@ static void ct3d_reg_write(void *opaque, hwaddr offset, uint64_t value,
> }
> }
>
> +/*
> + * TODO: dc region configuration will be updated once host backend and address
> + * space support is added for DCD.
> + */
> +static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
> +{
> + int i;
> + uint64_t region_base = 0;
> + uint64_t region_len = 2 * GiB;
> + uint64_t decode_len = 2 * GiB;
> + uint64_t blk_size = 2 * MiB;
> + CXLDCRegion *region;
> + MemoryRegion *mr;
> +
> + if (ct3d->hostvmem) {
> + mr = host_memory_backend_get_memory(ct3d->hostvmem);
> + region_base += memory_region_size(mr);
> + }
> + if (ct3d->hostpmem) {
> + mr = host_memory_backend_get_memory(ct3d->hostpmem);
> + region_base += memory_region_size(mr);
> + }
> + assert(region_base % CXL_CAPACITY_MULTIPLIER == 0);
> +
> + for (i = 0; i < ct3d->dc.num_regions; i++) {
> + region = &ct3d->dc.regions[i];
> + region->base = region_base;
> + region->decode_len = decode_len;
> + region->len = region_len;
> + region->block_size = blk_size;
> + /* dsmad_handle is set when creating cdat table entries */
> + region->flags = 0;
> +
> + region_base += region->len;
Maybe have the for-loop header do some or all of the variable updates
(perhaps doing all of them there is a bit too complex!)
for (i = 0, region = &ct3d->dc.regions[0];
i < ct3d->dc.num_regions;
i++, region++, region_base += region_len) {
Also, using this style of assignment will avoid lots of repetition of region->.
*region = (CXLDCRegion) {
.base = region_base,
.decode_len = decode_len,
.len = region_len,
.block_size = blk_size,
/* dsmad_handle set when creating CDAT table entries */
.flags = 0,
};
}
> + }
> +
> + return true;
> +}
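A self-contained sketch of the loop style suggested above, assuming stand-in names (DCRegion and layout_dc_regions are illustrative and are not the QEMU definitions). It only shows the pattern: the for-loop header advances the index, the region pointer and the base together, and a compound literal fills each region in one assignment.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1ULL << 20)
#define GiB (1ULL << 30)

/* Stand-in for CXLDCRegion; only the fields touched by this patch. */
typedef struct {
    uint64_t base;
    uint64_t decode_len;
    uint64_t len;
    uint64_t block_size;
    uint32_t dsmadhandle;
    uint8_t flags;
} DCRegion;

/* Lay out num_regions regions back to back, starting at first_base. */
static void layout_dc_regions(DCRegion *regions, int num_regions,
                              uint64_t first_base)
{
    const uint64_t region_len = 2 * GiB;
    const uint64_t blk_size = 2 * MiB;
    uint64_t region_base = first_base;
    DCRegion *region;
    int i;

    for (i = 0, region = &regions[0];
         i < num_regions;
         i++, region++, region_base += region_len) {
        /* Fields not named here (dsmadhandle) are zero-initialized. */
        *region = (DCRegion) {
            .base = region_base,
            .decode_len = region_len,
            .len = region_len,
            .block_size = blk_size,
            .flags = 0,
        };
    }
}
```

One thing to keep in mind with this style: the compound literal overwrites the whole structure, so any field set before the loop would be reset to zero. That is harmless here only because the handle is assigned later, when the CDAT entries are built.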
^ permalink raw reply [flat|nested] 81+ messages in thread
* [PATCH v5 05/13] hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr size insead of mr as argument
2024-03-04 19:33 [PATCH v5 00/13] Enabling DCD emulation support in Qemu nifan.cxl
` (3 preceding siblings ...)
2024-03-04 19:33 ` [PATCH v5 04/13] hw/mem/cxl_type3: Add support to create DC regions to " nifan.cxl
@ 2024-03-04 19:34 ` nifan.cxl
2024-03-06 16:02 ` Jonathan Cameron via
2024-03-06 16:03 ` Jonathan Cameron via
2024-03-04 19:34 ` [PATCH v5 06/13] hw/mem/cxl_type3: Add host backend and address space handling for DC regions nifan.cxl
` (7 subsequent siblings)
12 siblings, 2 replies; 81+ messages in thread
From: nifan.cxl @ 2024-03-04 19:34 UTC (permalink / raw)
To: qemu-devel
Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni
From: Fan Ni <fan.ni@samsung.com>
The function ct3_build_cdat_entries_for_mr only uses the size of the memory
region passed to it. Refactor the function to take the size directly, which
makes the arguments more specific.
Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
hw/mem/cxl_type3.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index a191211009..c045fee32d 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -44,7 +44,7 @@ enum {
};
static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
- int dsmad_handle, MemoryRegion *mr,
+ int dsmad_handle, uint64_t size,
bool is_pmem, uint64_t dpa_base)
{
g_autofree CDATDsmas *dsmas = NULL;
@@ -63,7 +63,7 @@ static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
.DSMADhandle = dsmad_handle,
.flags = is_pmem ? CDAT_DSMAS_FLAG_NV : 0,
.DPA_base = dpa_base,
- .DPA_length = memory_region_size(mr),
+ .DPA_length = size,
};
/* For now, no memory side cache, plausiblish numbers */
@@ -132,7 +132,7 @@ static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
*/
.EFI_memory_type_attr = is_pmem ? 2 : 1,
.DPA_offset = 0,
- .DPA_length = memory_region_size(mr),
+ .DPA_length = size,
};
/* Header always at start of structure */
@@ -149,6 +149,7 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
g_autofree CDATSubHeader **table = NULL;
CXLType3Dev *ct3d = priv;
MemoryRegion *volatile_mr = NULL, *nonvolatile_mr = NULL;
+ uint64_t vmr_size = 0, pmr_size = 0;
int dsmad_handle = 0;
int cur_ent = 0;
int len = 0;
@@ -163,6 +164,7 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
return -EINVAL;
}
len += CT3_CDAT_NUM_ENTRIES;
+ vmr_size = memory_region_size(volatile_mr);
}
if (ct3d->hostpmem) {
@@ -171,21 +173,22 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
return -EINVAL;
}
len += CT3_CDAT_NUM_ENTRIES;
+ pmr_size = memory_region_size(nonvolatile_mr);
}
table = g_malloc0(len * sizeof(*table));
/* Now fill them in */
if (volatile_mr) {
- ct3_build_cdat_entries_for_mr(table, dsmad_handle++, volatile_mr,
+ ct3_build_cdat_entries_for_mr(table, dsmad_handle++, vmr_size,
false, 0);
cur_ent = CT3_CDAT_NUM_ENTRIES;
}
if (nonvolatile_mr) {
- uint64_t base = volatile_mr ? memory_region_size(volatile_mr) : 0;
+ uint64_t base = vmr_size;
ct3_build_cdat_entries_for_mr(&(table[cur_ent]), dsmad_handle++,
- nonvolatile_mr, true, base);
+ pmr_size, true, base);
cur_ent += CT3_CDAT_NUM_ENTRIES;
}
assert(len == cur_ent);
--
2.43.0
^ permalink raw reply related [flat|nested] 81+ messages in thread
* Re: [PATCH v5 05/13] hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr size insead of mr as argument
2024-03-04 19:34 ` [PATCH v5 05/13] hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr size insead of mr as argument nifan.cxl
@ 2024-03-06 16:02 ` Jonathan Cameron via
2024-03-06 16:03 ` Jonathan Cameron via
1 sibling, 0 replies; 81+ messages in thread
From: Jonathan Cameron @ 2024-03-06 16:02 UTC (permalink / raw)
To: nifan.cxl
Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
wj28.lee, Fan Ni
On Mon, 4 Mar 2024 11:34:00 -0800
nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
>
> The function ct3_build_cdat_entries_for_mr only uses the size of the memory
> region passed to it. Refactor the function to take the size directly, which
> makes the arguments more specific.
>
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 05/13] hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr size insead of mr as argument
2024-03-04 19:34 ` [PATCH v5 05/13] hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr size insead of mr as argument nifan.cxl
@ 2024-03-06 16:03 ` Jonathan Cameron via
2024-03-06 16:03 ` Jonathan Cameron via
1 sibling, 0 replies; 81+ messages in thread
From: Jonathan Cameron @ 2024-03-06 16:03 UTC (permalink / raw)
To: nifan.cxl
Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
wj28.lee, Fan Ni
On Mon, 4 Mar 2024 11:34:00 -0800
nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
>
> The function ct3_build_cdat_entries_for_mr only uses the size of the memory
> region passed to it. Refactor the function to take the size directly, which
> makes the arguments more specific.
Typo in the title: "insead" should be "instead".
^ permalink raw reply [flat|nested] 81+ messages in thread
* [PATCH v5 06/13] hw/mem/cxl_type3: Add host backend and address space handling for DC regions
2024-03-04 19:33 [PATCH v5 00/13] Enabling DCD emulation support in Qemu nifan.cxl
` (4 preceding siblings ...)
2024-03-04 19:34 ` [PATCH v5 05/13] hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr size insead of mr as argument nifan.cxl
@ 2024-03-04 19:34 ` nifan.cxl
2024-03-06 16:28 ` Jonathan Cameron via
2024-03-04 19:34 ` [PATCH v5 07/13] hw/mem/cxl_type3: Add DC extent list representative and get DC extent list mailbox support nifan.cxl
` (6 subsequent siblings)
12 siblings, 1 reply; 81+ messages in thread
From: nifan.cxl @ 2024-03-04 19:34 UTC (permalink / raw)
To: qemu-devel
Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni
From: Fan Ni <fan.ni@samsung.com>
Add a (file/memory backed) host backend for dynamic capacity. All the dynamic
capacity regions share a single, sufficiently large host backend. Set up an
address space for the DC regions to support read/write operations to dynamic
capacity for DCD.
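As an illustration of how one backend is divided among the regions, here is a minimal sketch of the size check, assuming hypothetical names (dc_region_len and CAPACITY_MULTIPLIER stand in for the QEMU code and macro). The patch rounds the quotient up and then compares against the backend size; the remainder test below accepts the same non-zero inputs. Rejecting a zero-length region is an extra check that is not in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MiB (1ULL << 20)
#define CAPACITY_MULTIPLIER (256 * MiB)

/*
 * Hypothetical helper: split a backend of dc_size bytes into num_regions
 * equally sized regions. Returns false when the split is not exact or the
 * resulting region length is not a multiple of 256 MiB.
 */
static bool dc_region_len(uint64_t dc_size, unsigned num_regions,
                          uint64_t *region_len)
{
    uint64_t len;

    if (num_regions == 0 || dc_size % num_regions != 0) {
        return false; /* backend size is not a multiple of the region count */
    }
    len = dc_size / num_regions;
    if (len == 0 || len % CAPACITY_MULTIPLIER != 0) {
        return false; /* region length must be 256 MiB aligned */
    }
    *region_len = len;
    return true;
}
```

For example, a 4 GiB backend with two regions yields 2 GiB per region, while a 768 MiB backend with two regions is rejected because 384 MiB is not 256 MiB aligned.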
With this change, the following support is added:
1. Add a new property to the type3 device, "volatile-dc-memdev", that points
to the host memory backend for dynamic capacity. Currently, all DC regions
share one host backend;
2. Add an address space for dynamic capacity for read/write support;
3. Create CDAT entries for each dynamic capacity region;
4. Fix DVSEC range registers to include DC regions.
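The DPA layout this implies (volatile memory first, then persistent memory, then dynamic capacity) can be sketched as below. This is a simplified stand-alone model, assuming hypothetical names (enum dpa_space, route_dpa); the patch itself selects an AddressSpace pointer rather than returning an enum.

```c
#include <assert.h>
#include <stdint.h>

enum dpa_space { DPA_VMEM, DPA_PMEM, DPA_DC, DPA_INVALID };

/*
 * Pick the backing space for a device physical address and rewrite
 * *dpa_offset to be relative to the start of that space.
 */
static enum dpa_space route_dpa(uint64_t vmem_size, uint64_t pmem_size,
                                uint64_t dc_size, uint64_t *dpa_offset)
{
    if (*dpa_offset >= vmem_size + pmem_size + dc_size) {
        return DPA_INVALID; /* beyond all capacity on the device */
    }
    if (*dpa_offset < vmem_size) {
        return DPA_VMEM; /* offset is already relative to vmem */
    }
    if (*dpa_offset < vmem_size + pmem_size) {
        *dpa_offset -= vmem_size;
        return DPA_PMEM;
    }
    *dpa_offset -= vmem_size + pmem_size;
    return DPA_DC;
}
```

Because absent backends contribute a size of zero, the same three-way comparison covers devices with any combination of volatile, persistent and dynamic capacity.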
Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
hw/cxl/cxl-mailbox-utils.c | 16 ++-
hw/mem/cxl_type3.c | 189 +++++++++++++++++++++++++++++-------
include/hw/cxl/cxl_device.h | 4 +
3 files changed, 170 insertions(+), 39 deletions(-)
diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 853dadba39..8309f27a2b 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -622,7 +622,8 @@ static CXLRetCode cmd_firmware_update_get_info(const struct cxl_cmd *cmd,
size_t *len_out,
CXLCCI *cci)
{
- CXLDeviceState *cxl_dstate = &CXL_TYPE3(cci->d)->cxl_dstate;
+ CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
+ CXLDeviceState *cxl_dstate = &ct3d->cxl_dstate;
struct {
uint8_t slots_supported;
uint8_t slot_info;
@@ -636,7 +637,8 @@ static CXLRetCode cmd_firmware_update_get_info(const struct cxl_cmd *cmd,
QEMU_BUILD_BUG_ON(sizeof(*fw_info) != 0x50);
if ((cxl_dstate->vmem_size < CXL_CAPACITY_MULTIPLIER) ||
- (cxl_dstate->pmem_size < CXL_CAPACITY_MULTIPLIER)) {
+ (cxl_dstate->pmem_size < CXL_CAPACITY_MULTIPLIER) ||
+ (ct3d->dc.total_capacity < CXL_CAPACITY_MULTIPLIER)) {
return CXL_MBOX_INTERNAL_ERROR;
}
@@ -793,7 +795,8 @@ static CXLRetCode cmd_identify_memory_device(const struct cxl_cmd *cmd,
CXLDeviceState *cxl_dstate = &ct3d->cxl_dstate;
if ((!QEMU_IS_ALIGNED(cxl_dstate->vmem_size, CXL_CAPACITY_MULTIPLIER)) ||
- (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER))) {
+ (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER)) ||
+ (!QEMU_IS_ALIGNED(ct3d->dc.total_capacity, CXL_CAPACITY_MULTIPLIER))) {
return CXL_MBOX_INTERNAL_ERROR;
}
@@ -835,9 +838,11 @@ static CXLRetCode cmd_ccls_get_partition_info(const struct cxl_cmd *cmd,
uint64_t next_pmem;
} QEMU_PACKED *part_info = (void *)payload_out;
QEMU_BUILD_BUG_ON(sizeof(*part_info) != 0x20);
+ CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
if ((!QEMU_IS_ALIGNED(cxl_dstate->vmem_size, CXL_CAPACITY_MULTIPLIER)) ||
- (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER))) {
+ (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER)) ||
+ (!QEMU_IS_ALIGNED(ct3d->dc.total_capacity, CXL_CAPACITY_MULTIPLIER))) {
return CXL_MBOX_INTERNAL_ERROR;
}
@@ -1179,7 +1184,8 @@ static CXLRetCode cmd_media_clear_poison(const struct cxl_cmd *cmd,
struct clear_poison_pl *in = (void *)payload_in;
dpa = ldq_le_p(&in->dpa);
- if (dpa + CXL_CACHE_LINE_SIZE > cxl_dstate->static_mem_size) {
+ if (dpa + CXL_CACHE_LINE_SIZE > cxl_dstate->static_mem_size +
+ ct3d->dc.total_capacity) {
return CXL_MBOX_INVALID_PA;
}
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index c045fee32d..2b380a260b 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -45,7 +45,8 @@ enum {
static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
int dsmad_handle, uint64_t size,
- bool is_pmem, uint64_t dpa_base)
+ bool is_pmem, bool is_dynamic,
+ uint64_t dpa_base)
{
g_autofree CDATDsmas *dsmas = NULL;
g_autofree CDATDslbis *dslbis0 = NULL;
@@ -61,7 +62,8 @@ static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
.length = sizeof(*dsmas),
},
.DSMADhandle = dsmad_handle,
- .flags = is_pmem ? CDAT_DSMAS_FLAG_NV : 0,
+ .flags = (is_pmem ? CDAT_DSMAS_FLAG_NV : 0) |
+ (is_dynamic ? CDAT_DSMAS_FLAG_DYNAMIC_CAP : 0),
.DPA_base = dpa_base,
.DPA_length = size,
};
@@ -149,12 +151,13 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
g_autofree CDATSubHeader **table = NULL;
CXLType3Dev *ct3d = priv;
MemoryRegion *volatile_mr = NULL, *nonvolatile_mr = NULL;
+ MemoryRegion *dc_mr = NULL;
uint64_t vmr_size = 0, pmr_size = 0;
int dsmad_handle = 0;
int cur_ent = 0;
int len = 0;
- if (!ct3d->hostpmem && !ct3d->hostvmem) {
+ if (!ct3d->hostpmem && !ct3d->hostvmem && !ct3d->dc.num_regions) {
return 0;
}
@@ -176,21 +179,55 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
pmr_size = memory_region_size(nonvolatile_mr);
}
+ if (ct3d->dc.num_regions) {
+ if (ct3d->dc.host_dc) {
+ dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
+ if (!dc_mr) {
+ return -EINVAL;
+ }
+ len += CT3_CDAT_NUM_ENTRIES * ct3d->dc.num_regions;
+ } else {
+ return -EINVAL;
+ }
+ }
+
table = g_malloc0(len * sizeof(*table));
/* Now fill them in */
if (volatile_mr) {
ct3_build_cdat_entries_for_mr(table, dsmad_handle++, vmr_size,
- false, 0);
+ false, false, 0);
cur_ent = CT3_CDAT_NUM_ENTRIES;
}
if (nonvolatile_mr) {
uint64_t base = vmr_size;
ct3_build_cdat_entries_for_mr(&(table[cur_ent]), dsmad_handle++,
- pmr_size, true, base);
+ pmr_size, true, false, base);
cur_ent += CT3_CDAT_NUM_ENTRIES;
}
+
+ if (dc_mr) {
+ int i;
+ uint64_t region_base = vmr_size + pmr_size;
+
+ /*
+ * TODO: we assume the dynamic capacity to be volatile for now,
+ * non-volatile dynamic capacity will be added if needed in the
+ * future.
+ */
+ for (i = 0; i < ct3d->dc.num_regions; i++) {
+ ct3_build_cdat_entries_for_mr(&(table[cur_ent]),
+ dsmad_handle++,
+ ct3d->dc.regions[i].len,
+ false, true, region_base);
+ ct3d->dc.regions[i].dsmadhandle = dsmad_handle - 1;
+
+ cur_ent += CT3_CDAT_NUM_ENTRIES;
+ region_base += ct3d->dc.regions[i].len;
+ }
+ }
+
assert(len == cur_ent);
*cdat_table = g_steal_pointer(&table);
@@ -300,11 +337,24 @@ static void build_dvsecs(CXLType3Dev *ct3d)
range2_size_hi = ct3d->hostpmem->size >> 32;
range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
(ct3d->hostpmem->size & 0xF0000000);
+ } else if (ct3d->dc.host_dc) {
+ range2_size_hi = ct3d->dc.host_dc->size >> 32;
+ range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
+ (ct3d->dc.host_dc->size & 0xF0000000);
}
- } else {
+ } else if (ct3d->hostpmem) {
range1_size_hi = ct3d->hostpmem->size >> 32;
range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
(ct3d->hostpmem->size & 0xF0000000);
+ if (ct3d->dc.host_dc) {
+ range2_size_hi = ct3d->dc.host_dc->size >> 32;
+ range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
+ (ct3d->dc.host_dc->size & 0xF0000000);
+ }
+ } else {
+ range1_size_hi = ct3d->dc.host_dc->size >> 32;
+ range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
+ (ct3d->dc.host_dc->size & 0xF0000000);
}
dvsec = (uint8_t *)&(CXLDVSECDevice){
@@ -579,11 +629,27 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
{
int i;
uint64_t region_base = 0;
- uint64_t region_len = 2 * GiB;
- uint64_t decode_len = 2 * GiB;
+ uint64_t region_len;
+ uint64_t decode_len;
uint64_t blk_size = 2 * MiB;
CXLDCRegion *region;
MemoryRegion *mr;
+ uint64_t dc_size;
+
+ mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
+ dc_size = memory_region_size(mr);
+ region_len = DIV_ROUND_UP(dc_size, ct3d->dc.num_regions);
+
+ if (region_len * ct3d->dc.num_regions > dc_size) {
+ error_setg(errp, "host backend size must be multiples of region len");
+ return false;
+ }
+ if (region_len % CXL_CAPACITY_MULTIPLIER != 0) {
+ error_setg(errp, "DC region size is unaligned to %lx",
+ CXL_CAPACITY_MULTIPLIER);
+ return false;
+ }
+ decode_len = region_len;
if (ct3d->hostvmem) {
mr = host_memory_backend_get_memory(ct3d->hostvmem);
@@ -605,6 +671,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
region->flags = 0;
region_base += region->len;
+ ct3d->dc.total_capacity += region->len;
}
return true;
@@ -614,7 +681,8 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
{
DeviceState *ds = DEVICE(ct3d);
- if (!ct3d->hostmem && !ct3d->hostvmem && !ct3d->hostpmem) {
+ if (!ct3d->hostmem && !ct3d->hostvmem && !ct3d->hostpmem
+ && !ct3d->dc.num_regions) {
error_setg(errp, "at least one memdev property must be set");
return false;
} else if (ct3d->hostmem && ct3d->hostpmem) {
@@ -678,9 +746,41 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
g_free(p_name);
}
- if (!cxl_create_dc_regions(ct3d, errp)) {
- error_setg(errp, "setup DC regions failed");
- return false;
+ ct3d->dc.total_capacity = 0;
+ if (ct3d->dc.num_regions) {
+ MemoryRegion *dc_mr;
+ char *dc_name;
+
+ if (!ct3d->dc.host_dc) {
+ error_setg(errp, "dynamic capacity must have a backing device");
+ return false;
+ }
+
+ dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
+ if (!dc_mr) {
+ error_setg(errp, "dynamic capacity must have a backing device");
+ return false;
+ }
+
+ /*
+ * TODO: set dc as volatile for now, non-volatile support can be added
+ * in the future if needed.
+ */
+ memory_region_set_nonvolatile(dc_mr, false);
+ memory_region_set_enabled(dc_mr, true);
+ host_memory_backend_set_mapped(ct3d->dc.host_dc, true);
+ if (ds->id) {
+ dc_name = g_strdup_printf("cxl-dcd-dpa-dc-space:%s", ds->id);
+ } else {
+ dc_name = g_strdup("cxl-dcd-dpa-dc-space");
+ }
+ address_space_init(&ct3d->dc.host_dc_as, dc_mr, dc_name);
+ g_free(dc_name);
+
+ if (!cxl_create_dc_regions(ct3d, errp)) {
+ error_setg(errp, "setup DC regions failed");
+ return false;
+ }
}
return true;
@@ -772,6 +872,9 @@ err_release_cdat:
err_free_special_ops:
g_free(regs->special_ops);
err_address_space_free:
+ if (ct3d->dc.host_dc) {
+ address_space_destroy(&ct3d->dc.host_dc_as);
+ }
if (ct3d->hostpmem) {
address_space_destroy(&ct3d->hostpmem_as);
}
@@ -790,6 +893,9 @@ static void ct3_exit(PCIDevice *pci_dev)
pcie_aer_exit(pci_dev);
cxl_doe_cdat_release(cxl_cstate);
g_free(regs->special_ops);
+ if (ct3d->dc.host_dc) {
+ address_space_destroy(&ct3d->dc.host_dc_as);
+ }
if (ct3d->hostpmem) {
address_space_destroy(&ct3d->hostpmem_as);
}
@@ -868,16 +974,24 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
AddressSpace **as,
uint64_t *dpa_offset)
{
- MemoryRegion *vmr = NULL, *pmr = NULL;
+ MemoryRegion *vmr = NULL, *pmr = NULL, *dc_mr = NULL;
+ uint64_t vmr_size = 0, pmr_size = 0, dc_size = 0;
if (ct3d->hostvmem) {
vmr = host_memory_backend_get_memory(ct3d->hostvmem);
+ vmr_size = memory_region_size(vmr);
}
if (ct3d->hostpmem) {
pmr = host_memory_backend_get_memory(ct3d->hostpmem);
+ pmr_size = memory_region_size(pmr);
+ }
+ if (ct3d->dc.host_dc) {
+ dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
+ /* Do we want dc_size to be dc_mr->size or not?? */
+ dc_size = ct3d->dc.total_capacity;
}
- if (!vmr && !pmr) {
+ if (!vmr && !pmr && !dc_mr) {
return -ENODEV;
}
@@ -885,19 +999,18 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
return -EINVAL;
}
- if (*dpa_offset > ct3d->cxl_dstate.static_mem_size) {
+ if (*dpa_offset >= vmr_size + pmr_size + dc_size) {
return -EINVAL;
}
- if (vmr) {
- if (*dpa_offset < memory_region_size(vmr)) {
- *as = &ct3d->hostvmem_as;
- } else {
- *as = &ct3d->hostpmem_as;
- *dpa_offset -= memory_region_size(vmr);
- }
- } else {
+ if (*dpa_offset < vmr_size) {
+ *as = &ct3d->hostvmem_as;
+ } else if (*dpa_offset < vmr_size + pmr_size) {
*as = &ct3d->hostpmem_as;
+ *dpa_offset -= vmr_size;
+ } else {
+ *as = &ct3d->dc.host_dc_as;
+ *dpa_offset -= (vmr_size + pmr_size);
}
return 0;
@@ -979,6 +1092,8 @@ static Property ct3_props[] = {
DEFINE_PROP_UINT64("sn", CXLType3Dev, sn, UI64_NULL),
DEFINE_PROP_STRING("cdat", CXLType3Dev, cxl_cstate.cdat.filename),
DEFINE_PROP_UINT8("num-dc-regions", CXLType3Dev, dc.num_regions, 0),
+ DEFINE_PROP_LINK("volatile-dc-memdev", CXLType3Dev, dc.host_dc,
+ TYPE_MEMORY_BACKEND, HostMemoryBackend *),
DEFINE_PROP_END_OF_LIST(),
};
@@ -1045,33 +1160,39 @@ static void set_lsa(CXLType3Dev *ct3d, const void *buf, uint64_t size,
static bool set_cacheline(CXLType3Dev *ct3d, uint64_t dpa_offset, uint8_t *data)
{
- MemoryRegion *vmr = NULL, *pmr = NULL;
+ MemoryRegion *vmr = NULL, *pmr = NULL, *dc_mr = NULL;
AddressSpace *as;
+ uint64_t vmr_size = 0, pmr_size = 0, dc_size = 0;
if (ct3d->hostvmem) {
vmr = host_memory_backend_get_memory(ct3d->hostvmem);
+ vmr_size = memory_region_size(vmr);
}
if (ct3d->hostpmem) {
pmr = host_memory_backend_get_memory(ct3d->hostpmem);
+ pmr_size = memory_region_size(pmr);
}
+ if (ct3d->dc.host_dc) {
+ dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
+ dc_size = ct3d->dc.total_capacity;
+ }
- if (!vmr && !pmr) {
+ if (!vmr && !pmr && !dc_mr) {
return false;
}
- if (dpa_offset + CXL_CACHE_LINE_SIZE > ct3d->cxl_dstate.static_mem_size) {
+ if (dpa_offset + CXL_CACHE_LINE_SIZE > vmr_size + pmr_size + dc_size) {
return false;
}
- if (vmr) {
- if (dpa_offset < memory_region_size(vmr)) {
- as = &ct3d->hostvmem_as;
- } else {
- as = &ct3d->hostpmem_as;
- dpa_offset -= memory_region_size(vmr);
- }
- } else {
+ if (dpa_offset < vmr_size) {
+ as = &ct3d->hostvmem_as;
+ } else if (dpa_offset < vmr_size + pmr_size) {
as = &ct3d->hostpmem_as;
+ dpa_offset -= vmr_size;
+ } else {
+ as = &ct3d->dc.host_dc_as;
+ dpa_offset -= (vmr_size + pmr_size);
}
address_space_write(as, dpa_offset, MEMTXATTRS_UNSPECIFIED, &data,
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index f82d018422..265679302c 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -467,6 +467,10 @@ struct CXLType3Dev {
uint64_t poison_list_overflow_ts;
struct dynamic_capacity {
+ HostMemoryBackend *host_dc;
+ AddressSpace host_dc_as;
+ uint64_t total_capacity; /* 256M aligned */
+
uint8_t num_regions; /* 0-8 regions */
CXLDCRegion regions[DCD_MAX_NUM_REGION];
} dc;
--
2.43.0
^ permalink raw reply related [flat|nested] 81+ messages in thread
* Re: [PATCH v5 06/13] hw/mem/cxl_type3: Add host backend and address space handling for DC regions
2024-03-04 19:34 ` [PATCH v5 06/13] hw/mem/cxl_type3: Add host backend and address space handling for DC regions nifan.cxl
@ 2024-03-06 16:28 ` Jonathan Cameron via
0 siblings, 0 replies; 81+ messages in thread
From: Jonathan Cameron @ 2024-03-06 16:28 UTC (permalink / raw)
To: nifan.cxl
Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
wj28.lee, Fan Ni
On Mon, 4 Mar 2024 11:34:01 -0800
nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
>
> Add a (file/memory backed) host backend for dynamic capacity. All the dynamic
> capacity regions share a single, sufficiently large host backend. Set up an
> address space for the DC regions to support read/write operations to dynamic
> capacity for DCD.
>
> With this change, the following support is added:
> 1. Add a new property to the type3 device, "volatile-dc-memdev", that points
> to the host memory backend for dynamic capacity. Currently, all DC regions
> share one host backend;
> 2. Add an address space for dynamic capacity for read/write support;
> 3. Create CDAT entries for each dynamic capacity region;
> 4. Fix DVSEC range registers to include DC regions.
>
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
Hi Fan,
This one has a few more significant comments inline.
thanks,
Jonathan
> ---
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index c045fee32d..2b380a260b 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -45,7 +45,8 @@ enum {
>
> static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
> int dsmad_handle, uint64_t size,
> - bool is_pmem, uint64_t dpa_base)
> + bool is_pmem, bool is_dynamic,
> + uint64_t dpa_base)
> {
> g_autofree CDATDsmas *dsmas = NULL;
> g_autofree CDATDslbis *dslbis0 = NULL;
There is a fixlet going through for these as the autofree doesn't do anything.
Will require a rebase. I'll do it on my tree, but might not push that out for a
few days so this is just a heads up for anyone using these.
https://lore.kernel.org/qemu-devel/20240304104406.59855-1-thuth@redhat.com/
It went in clean for me, so may not even be something anyone notices!
> @@ -61,7 +62,8 @@ static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
> .length = sizeof(*dsmas),
> },
> .DSMADhandle = dsmad_handle,
> - .flags = is_pmem ? CDAT_DSMAS_FLAG_NV : 0,
> + .flags = (is_pmem ? CDAT_DSMAS_FLAG_NV : 0) |
> + (is_dynamic ? CDAT_DSMAS_FLAG_DYNAMIC_CAP : 0),
> .DPA_base = dpa_base,
> .DPA_length = size,
> };
> @@ -149,12 +151,13 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
> g_autofree CDATSubHeader **table = NULL;
>
>
> @@ -176,21 +179,55 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
> pmr_size = memory_region_size(nonvolatile_mr);
> }
>
> + if (ct3d->dc.num_regions) {
> + if (ct3d->dc.host_dc) {
> + dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
> + if (!dc_mr) {
> + return -EINVAL;
> + }
> + len += CT3_CDAT_NUM_ENTRIES * ct3d->dc.num_regions;
> + } else {
> + return -EINVAL;
Flip the logic to get the error out of the way first and reduce the indent.
if (ct3d->dc.num_regions) {
if (!ct3d->dc.host_dc) {
return -EINVAL;
}
dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
if (!dc_mr) {
return -EINVAL;
}
len += CT3...
}
> + }
> + }
> +
>
> *cdat_table = g_steal_pointer(&table);
> @@ -300,11 +337,24 @@ static void build_dvsecs(CXLType3Dev *ct3d)
> range2_size_hi = ct3d->hostpmem->size >> 32;
> range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> (ct3d->hostpmem->size & 0xF0000000);
> + } else if (ct3d->dc.host_dc) {
> + range2_size_hi = ct3d->dc.host_dc->size >> 32;
> + range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> + (ct3d->dc.host_dc->size & 0xF0000000);
> }
> - } else {
> + } else if (ct3d->hostpmem) {
> range1_size_hi = ct3d->hostpmem->size >> 32;
> range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> (ct3d->hostpmem->size & 0xF0000000);
> + if (ct3d->dc.host_dc) {
> + range2_size_hi = ct3d->dc.host_dc->size >> 32;
> + range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> + (ct3d->dc.host_dc->size & 0xF0000000);
> + }
> + } else {
> + range1_size_hi = ct3d->dc.host_dc->size >> 32;
> + range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> + (ct3d->dc.host_dc->size & 0xF0000000);
I've forgotten if we ever closed out on the right thing to do
with the legacy range registers. Maybe, just ignoring DC is the
right option for now? So I'd drop this block of changes.
Maybe Linux will do the wrong thing if we do, but then we should
make Linux more flexible on this.
If we did get a clarification that this is the right way to go
then add a note here.
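
For anyone following along, the encoding that each of the branches above repeats can be sketched in isolation. This is only an illustration: encode_range_size(), struct range_size and RANGE_SIZE_LO_FLAGS are hypothetical names, not anything in cxl_type3.c, and the flag value is copied from the patch without interpreting the individual fields.

```c
#include <stdint.h>

/* Flag bits copied verbatim from the patch: (2 << 5) | (2 << 2) | 0x3. */
#define RANGE_SIZE_LO_FLAGS ((2u << 5) | (2u << 2) | 0x3u)

/* Hypothetical container for one pair of DVSEC range size registers. */
struct range_size {
    uint32_t hi; /* upper 32 bits of the size */
    uint32_t lo; /* size bits 31:28 plus the flag bits */
};

/*
 * Split a backend size the same way build_dvsecs() does in the patch.
 * Only bits 31:28 of the low word carry size information, which is
 * lossless as long as the size is a multiple of 256 MiB.
 */
static struct range_size encode_range_size(uint64_t size)
{
    struct range_size r = {
        .hi = (uint32_t)(size >> 32),
        .lo = RANGE_SIZE_LO_FLAGS | (uint32_t)(size & 0xF0000000),
    };
    return r;
}
```

A helper along these lines would also remove the four near-identical copies of the expression if the block ends up being kept.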
> }
>
> dvsec = (uint8_t *)&(CXLDVSECDevice){
> @@ -579,11 +629,27 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
> {
> int i;
> uint64_t region_base = 0;
> - uint64_t region_len = 2 * GiB;
> - uint64_t decode_len = 2 * GiB;
> + uint64_t region_len;
> + uint64_t decode_len;
> uint64_t blk_size = 2 * MiB;
> CXLDCRegion *region;
> MemoryRegion *mr;
> + uint64_t dc_size;
> +
> + mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
> + dc_size = memory_region_size(mr);
> + region_len = DIV_ROUND_UP(dc_size, ct3d->dc.num_regions);
> +
> + if (region_len * ct3d->dc.num_regions > dc_size) {
This check had me scratching my head for a minute.
Why not just check
if (dc_size % (ct3d->dc.num_regions * CXL_CAPACITY_MULTIPLIER) != 0) {
    error_setg(errp, "host backend must be a multiple of 256MiB and region len");
    return false;
}
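
To spell out why the single modulo test covers both of the original checks, here is a minimal standalone sketch. The names dc_backend_size_ok(), two_step_check() and CAPACITY_MULTIPLIER are made up for illustration and are not part of the patch. It assumes the product of num_regions and 256 MiB does not overflow, and two_step_check() additionally assumes num_regions is non-zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for CXL_CAPACITY_MULTIPLIER (256 MiB) from the patch. */
#define CAPACITY_MULTIPLIER (256ULL * 1024 * 1024)

/*
 * Suggested form: true when dc_size splits into num_regions equal
 * regions that are each a multiple of 256 MiB.
 */
static bool dc_backend_size_ok(uint64_t dc_size, uint8_t num_regions)
{
    if (num_regions == 0) {
        return false;
    }
    return dc_size % (num_regions * CAPACITY_MULTIPLIER) == 0;
}

/*
 * Form used in the patch: round the region length up, reject sizes
 * that do not divide evenly, then check the 256 MiB alignment.
 * Caller must pass a non-zero num_regions.
 */
static bool two_step_check(uint64_t dc_size, uint8_t num_regions)
{
    uint64_t region_len = (dc_size + num_regions - 1) / num_regions;

    if (region_len * num_regions > dc_size) {
        return false;
    }
    return region_len % CAPACITY_MULTIPLIER == 0;
}
```

Both functions accept exactly the sizes that are a multiple of num_regions * 256 MiB, so the one-line check loses nothing.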
> + error_setg(errp, "host backend size must be multiples of region len");
> + return false;
> + }
> + if (region_len % CXL_CAPACITY_MULTIPLIER != 0) {
> + error_setg(errp, "DC region size is unaligned to %lx",
> + CXL_CAPACITY_MULTIPLIER);
> + return false;
> + }
> + decode_len = region_len;
> @@ -868,16 +974,24 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
> AddressSpace **as,
> uint64_t *dpa_offset)
> {
> - MemoryRegion *vmr = NULL, *pmr = NULL;
> + MemoryRegion *vmr = NULL, *pmr = NULL, *dc_mr = NULL;
> + uint64_t vmr_size = 0, pmr_size = 0, dc_size = 0;
>
> if (ct3d->hostvmem) {
> vmr = host_memory_backend_get_memory(ct3d->hostvmem);
> + vmr_size = memory_region_size(vmr);
> }
> if (ct3d->hostpmem) {
> pmr = host_memory_backend_get_memory(ct3d->hostpmem);
> + pmr_size = memory_region_size(pmr);
> + }
> + if (ct3d->dc.host_dc) {
> + dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
> + /* Do we want dc_size to be dc_mr->size or not?? */
Maybe - definitely don't want to leave this comment here
unanswered and I think you enforce it above anyway.
So if we get here ct3d->dc.total_capacity == memory_region_size(ct3d->dc.host_dc);
As such we could just not stash total_capacity at all?
> + dc_size = ct3d->dc.total_capacity;
> }
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 06/13] hw/mem/cxl_type3: Add host backend and address space handling for DC regions
2024-03-06 16:28 ` Jonathan Cameron via
(?)
@ 2024-03-06 19:14 ` fan
2024-03-07 12:16 ` Jonathan Cameron via
-1 siblings, 1 reply; 81+ messages in thread
From: fan @ 2024-03-06 19:14 UTC (permalink / raw)
To: Jonathan Cameron
Cc: nifan.cxl, qemu-devel, linux-cxl, gregory.price, ira.weiny,
dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
Jorgen.Hansen, wj28.lee, Fan Ni
On Wed, Mar 06, 2024 at 04:28:16PM +0000, Jonathan Cameron wrote:
> On Mon, 4 Mar 2024 11:34:01 -0800
> nifan.cxl@gmail.com wrote:
>
> > From: Fan Ni <fan.ni@samsung.com>
> >
> > Add (file/memory backed) host backend, all the dynamic capacity regions
> > will share a single, large enough host backend. Set up address space for
> > DC regions to support read/write operations to dynamic capacity for DCD.
> >
> > With the change, following supports are added:
> > 1. Add a new property to type3 device "volatile-dc-memdev" to point to host
> > memory backend for dynamic capacity. Currently, all dc regions share one
> > host backend.
> > 2. Add namespace for dynamic capacity for read/write support;
> > 3. Create cdat entries for each dynamic capacity region;
> > 4. Fix dvsec range registers to include DC regions.
> >
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
> Hi Fan,
>
> This one has a few more significant comments inline.
>
> thanks,
>
> Jonathan
>
> > ---
Hi Jonathan,
Thanks for the review. See below,
>
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index c045fee32d..2b380a260b 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> > @@ -45,7 +45,8 @@ enum {
> >
> > static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
> > int dsmad_handle, uint64_t size,
> > - bool is_pmem, uint64_t dpa_base)
> > + bool is_pmem, bool is_dynamic,
> > + uint64_t dpa_base)
> > {
> > g_autofree CDATDsmas *dsmas = NULL;
> > g_autofree CDATDslbis *dslbis0 = NULL;
>
> There is a fixlet going through for these as the autofree doesn't do anything.
> Will require a rebase. I'll do it on my tree, but might not push that out for a
> few days so this is just a heads up for anyone using these.
>
> https://lore.kernel.org/qemu-devel/20240304104406.59855-1-thuth@redhat.com/
>
> It went in clean for me, so may not even be something anyone notices!
>
OK. So I will not rebase for v6 until there is a break.
> > @@ -61,7 +62,8 @@ static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
> > .length = sizeof(*dsmas),
> > },
> > .DSMADhandle = dsmad_handle,
> > - .flags = is_pmem ? CDAT_DSMAS_FLAG_NV : 0,
> > + .flags = (is_pmem ? CDAT_DSMAS_FLAG_NV : 0) |
> > + (is_dynamic ? CDAT_DSMAS_FLAG_DYNAMIC_CAP : 0),
> > .DPA_base = dpa_base,
> > .DPA_length = size,
> > };
> > @@ -149,12 +151,13 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
> > g_autofree CDATSubHeader **table = NULL;
> >
> >
> > @@ -176,21 +179,55 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
> > pmr_size = memory_region_size(nonvolatile_mr);
> > }
> >
> > + if (ct3d->dc.num_regions) {
> > + if (ct3d->dc.host_dc) {
> > + dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
> > + if (!dc_mr) {
> > + return -EINVAL;
> > + }
> > + len += CT3_CDAT_NUM_ENTRIES * ct3d->dc.num_regions;
> > + } else {
> > + return -EINVAL;
>
> Flip logic to get the error out the way first and reduce indent.
>
> if (ct3d->dc.num_regions) {
> if (!ct3d->dc.host_dc) {
> return -EINVAL;
> }
> dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
> if (!dc_mr) {
> return -EINVAL;
> }
> len += CT3...
> }
Will do.
>
> > + }
> > + }
> > +
>
> >
> > *cdat_table = g_steal_pointer(&table);
> > @@ -300,11 +337,24 @@ static void build_dvsecs(CXLType3Dev *ct3d)
> > range2_size_hi = ct3d->hostpmem->size >> 32;
> > range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> > (ct3d->hostpmem->size & 0xF0000000);
> > + } else if (ct3d->dc.host_dc) {
> > + range2_size_hi = ct3d->dc.host_dc->size >> 32;
> > + range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> > + (ct3d->dc.host_dc->size & 0xF0000000);
> > }
> > - } else {
> > + } else if (ct3d->hostpmem) {
> > range1_size_hi = ct3d->hostpmem->size >> 32;
> > range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> > (ct3d->hostpmem->size & 0xF0000000);
> > + if (ct3d->dc.host_dc) {
> > + range2_size_hi = ct3d->dc.host_dc->size >> 32;
> > + range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> > + (ct3d->dc.host_dc->size & 0xF0000000);
> > + }
> > + } else {
> > + range1_size_hi = ct3d->dc.host_dc->size >> 32;
> > + range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> > + (ct3d->dc.host_dc->size & 0xF0000000);
>
> I've forgotten if we ever closed out on the right thing to do
> with the legacy range registers. Maybe, just ignoring DC is the
> right option for now? So I'd drop this block of changes.
> Maybe Linux will do the wrong thing if we do, but then we should
> make Linux more flexible on this.
>
> If we did get a clarification that this is the right way to go
> then add a note here.
>
OK. Will drop the changes here.
>
> > }
> >
> > dvsec = (uint8_t *)&(CXLDVSECDevice){
> > @@ -579,11 +629,27 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
> > {
> > int i;
> > uint64_t region_base = 0;
> > - uint64_t region_len = 2 * GiB;
> > - uint64_t decode_len = 2 * GiB;
> > + uint64_t region_len;
> > + uint64_t decode_len;
> > uint64_t blk_size = 2 * MiB;
> > CXLDCRegion *region;
> > MemoryRegion *mr;
> > + uint64_t dc_size;
> > +
> > + mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
> > + dc_size = memory_region_size(mr);
> > + region_len = DIV_ROUND_UP(dc_size, ct3d->dc.num_regions);
> > +
> > + if (region_len * ct3d->dc.num_regions > dc_size) {
> This check had me scratching my head for a minute.
> Why not just check
>
> if (dc_size % (ct3d->dc.num_regions * CXL_CAPACITY_MULTIPLIER) != 0) {
> error_setg(errp, "host backend must be a multiple of 256MiB and region len");
> return false;
Your way is more straightforward, will follow your suggestion.
> }
> > + error_setg(errp, "host backend size must be multiples of region len");
> > + return false;
> > + }
> > + if (region_len % CXL_CAPACITY_MULTIPLIER != 0) {
> > + error_setg(errp, "DC region size is unaligned to %lx",
> > + CXL_CAPACITY_MULTIPLIER);
> > + return false;
> > + }
> > + decode_len = region_len;
>
>
>
>
> > @@ -868,16 +974,24 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
> > AddressSpace **as,
> > uint64_t *dpa_offset)
> > {
> > - MemoryRegion *vmr = NULL, *pmr = NULL;
> > + MemoryRegion *vmr = NULL, *pmr = NULL, *dc_mr = NULL;
> > + uint64_t vmr_size = 0, pmr_size = 0, dc_size = 0;
> >
> > if (ct3d->hostvmem) {
> > vmr = host_memory_backend_get_memory(ct3d->hostvmem);
> > + vmr_size = memory_region_size(vmr);
> > }
> > if (ct3d->hostpmem) {
> > pmr = host_memory_backend_get_memory(ct3d->hostpmem);
> > + pmr_size = memory_region_size(pmr);
> > + }
> > + if (ct3d->dc.host_dc) {
> > + dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
> > + /* Do we want dc_size to be dc_mr->size or not?? */
>
> Maybe - definitely don't want to leave this comment here
> unanswered and I think you enforce it above anyway.
>
> So if we get here ct3d->dc.total_capacity == memory_region_size(ct3d->dc.host_dc);
> As such we could just not stash total_capacity at all?
I cannot identify a case where these two will be different. But
total_capacity is referenced in quite a few places, so it may be nice to
keep it so that we do not need to call the function to get the value every time?
Fan
>
>
> > + dc_size = ct3d->dc.total_capacity;
> > }
>
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 06/13] hw/mem/cxl_type3: Add host backend and address space handling for DC regions
2024-03-06 19:14 ` fan
@ 2024-03-07 12:16 ` Jonathan Cameron via
0 siblings, 0 replies; 81+ messages in thread
From: Jonathan Cameron @ 2024-03-07 12:16 UTC (permalink / raw)
To: fan
Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
wj28.lee, Fan Ni
> > > @@ -868,16 +974,24 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
> > > AddressSpace **as,
> > > uint64_t *dpa_offset)
> > > {
> > > - MemoryRegion *vmr = NULL, *pmr = NULL;
> > > + MemoryRegion *vmr = NULL, *pmr = NULL, *dc_mr = NULL;
> > > + uint64_t vmr_size = 0, pmr_size = 0, dc_size = 0;
> > >
> > > if (ct3d->hostvmem) {
> > > vmr = host_memory_backend_get_memory(ct3d->hostvmem);
> > > + vmr_size = memory_region_size(vmr);
> > > }
> > > if (ct3d->hostpmem) {
> > > pmr = host_memory_backend_get_memory(ct3d->hostpmem);
> > > + pmr_size = memory_region_size(pmr);
> > > + }
> > > + if (ct3d->dc.host_dc) {
> > > + dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
> > > + /* Do we want dc_size to be dc_mr->size or not?? */
> >
> > Maybe - definitely don't want to leave this comment here
> > unanswered and I think you enforce it above anyway.
> >
> > So if we get here ct3d->dc.total_capacity == memory_region_size(ct3d->dc.host_dc);
> > As such we could just not stash total_capacity at all?
>
> I cannot identify a case where these two will be different. But
> total_capacity is referenced at quite some places, it may be nice to have
> it so we do not need to call the function to get the value every time?
I kind of like having it via one path so that there is no confusion
for the reader, but up to you on this one. The function called is trivial
(other than some magic to handle very large memory regions) so
this is just a readability question, not a perf one.
Either way, don't leave the question behind. It is fine to have a comment
that says they are always the same size if you don't get rid
of the total_capacity representation.
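
One possible shape for that, as a sketch only: the struct and function names below are hypothetical, trimmed-down stand-ins rather than the real QEMU types, and the backend size is modelled as a plain field instead of a memory_region_size() call.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the host memory backend. */
struct dc_backend {
    uint64_t size;
};

/* Hypothetical stand-in for the dc sub-structure of CXLType3Dev. */
struct dc_state {
    struct dc_backend *host_dc;
    uint64_t total_capacity; /* always equal to host_dc->size */
};

/*
 * Single access path for the dynamic capacity size. The assert
 * records the invariant instead of leaving an open question in a
 * comment.
 */
static uint64_t dc_total_capacity(const struct dc_state *dc)
{
    assert(dc->host_dc && dc->total_capacity == dc->host_dc->size);
    return dc->total_capacity;
}
```

That keeps the cached value while giving readers one place that states the two sizes never differ.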
Jonathan
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 06/13] hw/mem/cxl_type3: Add host backend and address space handling for DC regions
2024-03-07 12:16 ` Jonathan Cameron via
(?)
@ 2024-03-07 23:34 ` fan
-1 siblings, 0 replies; 81+ messages in thread
From: fan @ 2024-03-07 23:34 UTC (permalink / raw)
To: Jonathan Cameron
Cc: fan, qemu-devel, linux-cxl, gregory.price, ira.weiny,
dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
Jorgen.Hansen, wj28.lee, Fan Ni
On Thu, Mar 07, 2024 at 12:16:05PM +0000, Jonathan Cameron wrote:
> > > > @@ -868,16 +974,24 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
> > > > AddressSpace **as,
> > > > uint64_t *dpa_offset)
> > > > {
> > > > - MemoryRegion *vmr = NULL, *pmr = NULL;
> > > > + MemoryRegion *vmr = NULL, *pmr = NULL, *dc_mr = NULL;
> > > > + uint64_t vmr_size = 0, pmr_size = 0, dc_size = 0;
> > > >
> > > > if (ct3d->hostvmem) {
> > > > vmr = host_memory_backend_get_memory(ct3d->hostvmem);
> > > > + vmr_size = memory_region_size(vmr);
> > > > }
> > > > if (ct3d->hostpmem) {
> > > > pmr = host_memory_backend_get_memory(ct3d->hostpmem);
> > > > + pmr_size = memory_region_size(pmr);
> > > > + }
> > > > + if (ct3d->dc.host_dc) {
> > > > + dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
> > > > + /* Do we want dc_size to be dc_mr->size or not?? */
> > >
> > > Maybe - definitely don't want to leave this comment here
> > > unanswered and I think you enforce it above anyway.
> > >
> > > So if we get here ct3d->dc.total_capacity == memory_region_size(ct3d->dc.host_dc);
> > > As such we could just not stash total_capacity at all?
> >
> > I cannot identify a case where these two will be different. But
> > total_capacity is referenced at quite some places, it may be nice to have
> > it so we do not need to call the function to get the value every time?
>
> I kind of like having it via one path so that there is no confusion
> for the reader, but up to you on this one. The function called is trivial
> (other than some magic to handle very large memory regions) so
> this is just a readability question, not a perf one.
>
> Either way, don't leave the question behind. It is fine to have a comment
> that says they are always the same size if you don't get rid
> of the total_capacity representation.
>
I will fix it.
For static capacity, we have a variable static_mem_size, although we
can calculate it from the volatile and non-volatile memory region sizes.
Quite a few places need the dynamic capacity, so it is much more
convenient to have a variable ready to use. I will keep it for
now.
Fan
>
> Jonathan
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 06/13] hw/mem/cxl_type3: Add host backend and address space handling for DC regions
2024-03-06 16:28 ` Jonathan Cameron via
(?)
(?)
@ 2024-03-14 20:43 ` fan
-1 siblings, 0 replies; 81+ messages in thread
From: fan @ 2024-03-14 20:43 UTC (permalink / raw)
To: Jonathan Cameron
Cc: nifan.cxl, qemu-devel, linux-cxl, gregory.price, ira.weiny,
dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
Jorgen.Hansen, wj28.lee, Fan Ni
On Wed, Mar 06, 2024 at 04:28:16PM +0000, Jonathan Cameron wrote:
> On Mon, 4 Mar 2024 11:34:01 -0800
> nifan.cxl@gmail.com wrote:
>
> > From: Fan Ni <fan.ni@samsung.com>
> >
> > Add (file/memory backed) host backend, all the dynamic capacity regions
> > will share a single, large enough host backend. Set up address space for
> > DC regions to support read/write operations to dynamic capacity for DCD.
> >
> > With the change, following supports are added:
> > 1. Add a new property to type3 device "volatile-dc-memdev" to point to host
> > memory backend for dynamic capacity. Currently, all dc regions share one
> > host backend.
> > 2. Add namespace for dynamic capacity for read/write support;
> > 3. Create cdat entries for each dynamic capacity region;
> > 4. Fix dvsec range registers to include DC regions.
> >
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
> Hi Fan,
>
> This one has a few more significant comments inline.
>
> thanks,
>
> Jonathan
>
> > ---
>
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index c045fee32d..2b380a260b 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> > @@ -45,7 +45,8 @@ enum {
> >
> > static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
> > int dsmad_handle, uint64_t size,
> > - bool is_pmem, uint64_t dpa_base)
> > + bool is_pmem, bool is_dynamic,
> > + uint64_t dpa_base)
> > {
> > g_autofree CDATDsmas *dsmas = NULL;
> > g_autofree CDATDslbis *dslbis0 = NULL;
>
> There is a fixlet going through for these as the autofree doesn't do anything.
> Will require a rebase. I'll do it on my tree, but might not push that out for a
> few days so this is just a heads up for anyone using these.
>
> https://lore.kernel.org/qemu-devel/20240304104406.59855-1-thuth@redhat.com/
>
> It went in clean for me, so may not even be something anyone notices!
>
> > @@ -61,7 +62,8 @@ static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
> > .length = sizeof(*dsmas),
> > },
> > .DSMADhandle = dsmad_handle,
> > - .flags = is_pmem ? CDAT_DSMAS_FLAG_NV : 0,
> > + .flags = (is_pmem ? CDAT_DSMAS_FLAG_NV : 0) |
> > + (is_dynamic ? CDAT_DSMAS_FLAG_DYNAMIC_CAP : 0),
> > .DPA_base = dpa_base,
> > .DPA_length = size,
> > };
> > @@ -149,12 +151,13 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
> > g_autofree CDATSubHeader **table = NULL;
> >
> >
> > @@ -176,21 +179,55 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
> > pmr_size = memory_region_size(nonvolatile_mr);
> > }
> >
> > + if (ct3d->dc.num_regions) {
> > + if (ct3d->dc.host_dc) {
> > + dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
> > + if (!dc_mr) {
> > + return -EINVAL;
> > + }
> > + len += CT3_CDAT_NUM_ENTRIES * ct3d->dc.num_regions;
> > + } else {
> > + return -EINVAL;
>
> Flip logic to get the error out the way first and reduce indent.
>
> if (ct3d->dc.num_regions) {
> if (!ct3d->dc.host_dc) {
> return -EINVAL;
> }
> dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
> if (!dc_mr) {
> return -EINVAL;
> }
> len += CT3...
> }
>
> > + }
> > + }
> > +
>
> >
> > *cdat_table = g_steal_pointer(&table);
> > @@ -300,11 +337,24 @@ static void build_dvsecs(CXLType3Dev *ct3d)
> > range2_size_hi = ct3d->hostpmem->size >> 32;
> > range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> > (ct3d->hostpmem->size & 0xF0000000);
> > + } else if (ct3d->dc.host_dc) {
> > + range2_size_hi = ct3d->dc.host_dc->size >> 32;
> > + range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> > + (ct3d->dc.host_dc->size & 0xF0000000);
> > }
> > - } else {
> > + } else if (ct3d->hostpmem) {
> > range1_size_hi = ct3d->hostpmem->size >> 32;
> > range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> > (ct3d->hostpmem->size & 0xF0000000);
> > + if (ct3d->dc.host_dc) {
> > + range2_size_hi = ct3d->dc.host_dc->size >> 32;
> > + range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> > + (ct3d->dc.host_dc->size & 0xF0000000);
> > + }
> > + } else {
> > + range1_size_hi = ct3d->dc.host_dc->size >> 32;
> > + range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> > + (ct3d->dc.host_dc->size & 0xF0000000);
>
> I've forgotten if we ever closed out on the right thing to do
> with the legacy range registers. Maybe, just ignoring DC is the
> right option for now? So I'd drop this block of changes.
> Maybe Linux will do the wrong thing if we do, but then we should
> make Linux more flexible on this.
>
> If we did get a clarification that this is the right way to go
> then add a note here.
Hi Jonathan,
I have noticed that in the current kernel code, when checking whether the
media is ready (in cxl_await_media_ready), we need to check the DVSEC
range registers. For a DCD device, if we leave the DVSEC range registers
unset, the device cannot be put into the "ready" state, which leaves the
device inactive.
https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/tree/drivers/cxl/core/pci.c?h=fixes&id=d206a76d7d2726f3b096037f2079ce0bd3ba329b#n195
So do we need to set them as above? Am I missing anything?
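
A simplified model of the concern, not the kernel code itself: range_ready() is a made-up helper, and the two bit positions are assumed from the 0x3 that the patch ORs into the low size register (treated here as "info valid" and "active").

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed meaning of the two low bits that the patch sets via 0x3. */
#define MEM_INFO_VALID (1u << 0)
#define MEM_ACTIVE     (1u << 1)

/*
 * Hypothetical model of a per-range readiness test: the range counts
 * as ready only when both bits are set in the low size register.
 */
static bool range_ready(uint32_t size_lo)
{
    return (size_lo & MEM_INFO_VALID) && (size_lo & MEM_ACTIVE);
}
```

Under this model a register left at 0 never reports ready, while the value written in the patch does.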
Fan
>
>
> > }
> >
> > dvsec = (uint8_t *)&(CXLDVSECDevice){
> > @@ -579,11 +629,27 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
> > {
> > int i;
> > uint64_t region_base = 0;
> > - uint64_t region_len = 2 * GiB;
> > - uint64_t decode_len = 2 * GiB;
> > + uint64_t region_len;
> > + uint64_t decode_len;
> > uint64_t blk_size = 2 * MiB;
> > CXLDCRegion *region;
> > MemoryRegion *mr;
> > + uint64_t dc_size;
> > +
> > + mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
> > + dc_size = memory_region_size(mr);
> > + region_len = DIV_ROUND_UP(dc_size, ct3d->dc.num_regions);
> > +
> > + if (region_len * ct3d->dc.num_regions > dc_size) {
> This check had me scratching my head for a minute.
> Why not just check
>
> if (dc_size % (ct3d->dc.num_regions * CXL_CAPACITY_MULTIPLIER) != 0) {
> error_setg(errp, "host backend must be a multiple of 256MiB and region len");
> return false;
> }
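For reference, a minimal standalone sketch of that single check, outside QEMU. The helper name dc_backend_size_ok and the local MiB / CXL_CAPACITY_MULTIPLIER definitions are assumptions made for illustration. As far as I can tell it rejects the same sizes as the two checks in the patch (backend not evenly divisible by the region count, or region length not 256MiB aligned), since both conditions hold exactly when dc_size is a multiple of num_regions * 256MiB.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the QEMU definitions, so the sketch builds on its own. */
#define MiB (1024ULL * 1024ULL)
#define CXL_CAPACITY_MULTIPLIER (256 * MiB)

/*
 * Hypothetical helper: true when dc_size splits into num_regions equal
 * regions, each of which is a multiple of 256MiB.
 */
static bool dc_backend_size_ok(uint64_t dc_size, uint8_t num_regions)
{
    if (num_regions == 0) {
        /* No regions to split into; also avoids a modulo by zero. */
        return false;
    }
    return dc_size % (num_regions * CXL_CAPACITY_MULTIPLIER) == 0;
}
```

With 8 regions, a 2GiB backend passes (8 regions of 256MiB) while a 1GiB backend is rejected.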
> > + error_setg(errp, "host backend size must be multiples of region len");
> > + return false;
> > + }
> > + if (region_len % CXL_CAPACITY_MULTIPLIER != 0) {
> > + error_setg(errp, "DC region size is unaligned to %lx",
> > + CXL_CAPACITY_MULTIPLIER);
> > + return false;
> > + }
> > + decode_len = region_len;
>
>
>
>
> > @@ -868,16 +974,24 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
> > AddressSpace **as,
> > uint64_t *dpa_offset)
> > {
> > - MemoryRegion *vmr = NULL, *pmr = NULL;
> > + MemoryRegion *vmr = NULL, *pmr = NULL, *dc_mr = NULL;
> > + uint64_t vmr_size = 0, pmr_size = 0, dc_size = 0;
> >
> > if (ct3d->hostvmem) {
> > vmr = host_memory_backend_get_memory(ct3d->hostvmem);
> > + vmr_size = memory_region_size(vmr);
> > }
> > if (ct3d->hostpmem) {
> > pmr = host_memory_backend_get_memory(ct3d->hostpmem);
> > + pmr_size = memory_region_size(pmr);
> > + }
> > + if (ct3d->dc.host_dc) {
> > + dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
> > + /* Do we want dc_size to be dc_mr->size or not?? */
>
> Maybe - definitely don't want to leave this comment here
> unanswered and I think you enforce it above anyway.
>
> So if we get here ct3d->dc.total_capacity == memory_region_size(ct3d->dc.host_dc);
> As such we could just not stash total_capacity at all?
>
>
> > + dc_size = ct3d->dc.total_capacity;
> > }
>
^ permalink raw reply [flat|nested] 81+ messages in thread
* [PATCH v5 07/13] hw/mem/cxl_type3: Add DC extent list representative and get DC extent list mailbox support
2024-03-04 19:33 [PATCH v5 00/13] Enabling DCD emulation support in Qemu nifan.cxl
` (5 preceding siblings ...)
2024-03-04 19:34 ` [PATCH v5 06/13] hw/mem/cxl_type3: Add host backend and address space handling for DC regions nifan.cxl
@ 2024-03-04 19:34 ` nifan.cxl
2024-03-06 16:37 ` Jonathan Cameron via
2024-03-04 19:34 ` [PATCH v5 08/13] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response nifan.cxl
` (5 subsequent siblings)
12 siblings, 1 reply; 81+ messages in thread
From: nifan.cxl @ 2024-03-04 19:34 UTC (permalink / raw)
To: qemu-devel
Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni
From: Fan Ni <fan.ni@samsung.com>
Add dynamic capacity extent list representative to the definition of
CXLType3Dev and add get DC extent list mailbox command per
CXL.spec.3.1:.8.2.9.9.9.2.
Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
hw/cxl/cxl-mailbox-utils.c | 71 ++++++++++++++++++++++++++++++++++++-
hw/mem/cxl_type3.c | 1 +
include/hw/cxl/cxl_device.h | 22 ++++++++++++
3 files changed, 93 insertions(+), 1 deletion(-)
diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 8309f27a2b..425b378a2c 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -84,6 +84,7 @@ enum {
#define CLEAR_POISON 0x2
DCD_CONFIG = 0x48,
#define GET_DC_CONFIG 0x0
+ #define GET_DYN_CAP_EXT_LIST 0x1
PHYSICAL_SWITCH = 0x51,
#define IDENTIFY_SWITCH_DEVICE 0x0
#define GET_PHYSICAL_PORT_STATE 0x1
@@ -1325,7 +1326,8 @@ static CXLRetCode cmd_dcd_get_dyn_cap_config(const struct cxl_cmd *cmd,
* to use.
*/
stl_le_p(&extra_out->num_extents_supported, CXL_NUM_EXTENTS_SUPPORTED);
- stl_le_p(&extra_out->num_extents_available, CXL_NUM_EXTENTS_SUPPORTED);
+ stl_le_p(&extra_out->num_extents_available, CXL_NUM_EXTENTS_SUPPORTED -
+ ct3d->dc.total_extent_count);
stl_le_p(&extra_out->num_tags_supported, CXL_NUM_TAGS_SUPPORTED);
stl_le_p(&extra_out->num_tags_available, CXL_NUM_TAGS_SUPPORTED);
@@ -1333,6 +1335,70 @@ static CXLRetCode cmd_dcd_get_dyn_cap_config(const struct cxl_cmd *cmd,
return CXL_MBOX_SUCCESS;
}
+/*
+ * CXL r3.1 section 8.2.9.9.9.2:
+ * Get Dynamic Capacity Extent List (Opcode 4801h)
+ */
+static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
+ uint8_t *payload_in,
+ size_t len_in,
+ uint8_t *payload_out,
+ size_t *len_out,
+ CXLCCI *cci)
+{
+ CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
+ struct {
+ uint32_t extent_cnt;
+ uint32_t start_extent_id;
+ } QEMU_PACKED *in = (void *)payload_in;
+ struct {
+ uint32_t count;
+ uint32_t total_extents;
+ uint32_t generation_num;
+ uint8_t rsvd[4];
+ CXLDCExtentRaw records[];
+ } QEMU_PACKED *out = (void *)payload_out;
+ uint16_t record_count = 0, i = 0, record_done = 0;
+ uint16_t out_pl_len;
+ uint32_t start_extent_id = in->start_extent_id;
+ CXLDCExtentList *extent_list = &ct3d->dc.extents;
+ CXLDCExtent *ent;
+
+ if (start_extent_id > ct3d->dc.total_extent_count) {
+ return CXL_MBOX_INVALID_INPUT;
+ }
+
+ record_count = MIN(in->extent_cnt,
+ ct3d->dc.total_extent_count - start_extent_id);
+
+ out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);
+ assert(out_pl_len <= CXL_MAILBOX_MAX_PAYLOAD_SIZE);
+
+ stl_le_p(&out->count, record_count);
+ stl_le_p(&out->total_extents, ct3d->dc.total_extent_count);
+ stl_le_p(&out->generation_num, ct3d->dc.ext_list_gen_seq);
+
+ if (record_count > 0) {
+ QTAILQ_FOREACH(ent, extent_list, node) {
+ if (i++ < start_extent_id) {
+ continue;
+ }
+ stq_le_p(&out->records[record_done].start_dpa, ent->start_dpa);
+ stq_le_p(&out->records[record_done].len, ent->len);
+ memcpy(&out->records[record_done].tag, ent->tag, 0x10);
+ stw_le_p(&out->records[record_done].shared_seq, ent->shared_seq);
+
+ record_done++;
+ if (record_done == record_count) {
+ break;
+ }
+ }
+ }
+
+ *len_out = out_pl_len;
+ return CXL_MBOX_SUCCESS;
+}
+
#define IMMEDIATE_CONFIG_CHANGE (1 << 1)
#define IMMEDIATE_DATA_CHANGE (1 << 2)
#define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -1380,6 +1446,9 @@ static const struct cxl_cmd cxl_cmd_set[256][256] = {
static const struct cxl_cmd cxl_cmd_set_dcd[256][256] = {
[DCD_CONFIG][GET_DC_CONFIG] = { "DCD_GET_DC_CONFIG",
cmd_dcd_get_dyn_cap_config, 2, 0 },
+ [DCD_CONFIG][GET_DYN_CAP_EXT_LIST] = {
+ "DCD_GET_DYNAMIC_CAPACITY_EXTENT_LIST", cmd_dcd_get_dyn_cap_ext_list,
+ 8, 0 },
};
static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 2b380a260b..102fa8151e 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -673,6 +673,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
region_base += region->len;
ct3d->dc.total_capacity += region->len;
}
+ QTAILQ_INIT(&ct3d->dc.extents);
return true;
}
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 265679302c..8148bcc34b 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -424,6 +424,25 @@ typedef QLIST_HEAD(, CXLPoison) CXLPoisonList;
#define DCD_MAX_NUM_REGION 8
+typedef struct CXLDCExtentRaw {
+ uint64_t start_dpa;
+ uint64_t len;
+ uint8_t tag[0x10];
+ uint16_t shared_seq;
+ uint8_t rsvd[0x6];
+} QEMU_PACKED CXLDCExtentRaw;
+
+typedef struct CXLDCExtent {
+ uint64_t start_dpa;
+ uint64_t len;
+ uint8_t tag[0x10];
+ uint16_t shared_seq;
+ uint8_t rsvd[0x6];
+
+ QTAILQ_ENTRY(CXLDCExtent) node;
+} CXLDCExtent;
+typedef QTAILQ_HEAD(, CXLDCExtent) CXLDCExtentList;
+
typedef struct CXLDCRegion {
uint64_t base; /* aligned to 256*MiB */
uint64_t decode_len; /* aligned to 256*MiB */
@@ -470,6 +489,9 @@ struct CXLType3Dev {
HostMemoryBackend *host_dc;
AddressSpace host_dc_as;
uint64_t total_capacity; /* 256M aligned */
+ CXLDCExtentList extents;
+ uint32_t total_extent_count;
+ uint32_t ext_list_gen_seq;
uint8_t num_regions; /* 0-8 regions */
CXLDCRegion regions[DCD_MAX_NUM_REGION];
--
2.43.0
^ permalink raw reply related [flat|nested] 81+ messages in thread
* Re: [PATCH v5 07/13] hw/mem/cxl_type3: Add DC extent list representative and get DC extent list mailbox support
2024-03-04 19:34 ` [PATCH v5 07/13] hw/mem/cxl_type3: Add DC extent list representative and get DC extent list mailbox support nifan.cxl
@ 2024-03-06 16:37 ` Jonathan Cameron via
0 siblings, 0 replies; 81+ messages in thread
From: Jonathan Cameron @ 2024-03-06 16:37 UTC (permalink / raw)
To: nifan.cxl
Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
wj28.lee, Fan Ni
On Mon, 4 Mar 2024 11:34:02 -0800
nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
>
> Add dynamic capacity extent list representative to the definition of
> CXLType3Dev and add get DC extent list mailbox command per
> CXL.spec.3.1:.8.2.9.9.9.2.
>
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
Hi Fan,
A small thing in here around the assert to ensure we don't overflow
the mailbox. We don't need that, just check what fits and return
fewer extents than asked for if they won't fit.
>
> +/*
> + * CXL r3.1 section 8.2.9.9.9.2:
> + * Get Dynamic Capacity Extent List (Opcode 4801h)
> + */
> +static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
> + uint8_t *payload_in,
> + size_t len_in,
> + uint8_t *payload_out,
> + size_t *len_out,
> + CXLCCI *cci)
> + uint16_t record_count = 0, i = 0, record_done = 0;
> + uint16_t out_pl_len;
> + uint32_t start_extent_id = in->start_extent_id;
> + CXLDCExtentList *extent_list = &ct3d->dc.extents;
> + CXLDCExtent *ent;
> +
> + if (start_extent_id > ct3d->dc.total_extent_count) {
> + return CXL_MBOX_INVALID_INPUT;
> + }
> +
> + record_count = MIN(in->extent_cnt,
> + ct3d->dc.total_extent_count - start_extent_id);
Should clamp this using the length so that it fits in the mailbox...
> +
> + out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);
> + assert(out_pl_len <= CXL_MAILBOX_MAX_PAYLOAD_SIZE);
and get rid of this nasty assert.
The command is not obliged to even try to return as many as requested;
it can return any smaller number (1 or more), on the assumption that the
driver just asks for the next ones.
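A rough standalone sketch of that clamping, not QEMU code. The names clamp_record_count, HDR_SIZE and REC_SIZE are made up for illustration; the two sizes are taken from the packed structs in this patch (16 bytes of output header, 40 bytes per CXLDCExtentRaw), and the maximum payload size is passed in rather than assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Sizes of the packed structs in the patch (assumed, not pulled from QEMU). */
#define HDR_SIZE 16 /* count, total_extents, generation_num, rsvd[4] */
#define REC_SIZE 40 /* start_dpa, len, tag[0x10], shared_seq, rsvd[6] */

/*
 * Hypothetical helper: number of extent records to return, limited by
 * what was requested, what is left in the list after 'start', and what
 * fits in a payload of max_payload bytes.
 */
static uint32_t clamp_record_count(uint32_t requested, uint32_t total,
                                   uint32_t start, size_t max_payload)
{
    uint32_t remaining, n;
    size_t fit;

    if (start > total || max_payload < HDR_SIZE) {
        return 0;
    }
    remaining = total - start;
    fit = (max_payload - HDR_SIZE) / REC_SIZE;

    n = requested < remaining ? requested : remaining;
    if (n > fit) {
        n = (uint32_t)fit;
    }
    return n;
}
```

The output length then follows as HDR_SIZE + n * REC_SIZE, which cannot exceed max_payload, so no assert is needed.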
> +
> + stl_le_p(&out->count, record_count);
> + stl_le_p(&out->total_extents, ct3d->dc.total_extent_count);
> + stl_le_p(&out->generation_num, ct3d->dc.ext_list_gen_seq);
> +
> + if (record_count > 0) {
> + QTAILQ_FOREACH(ent, extent_list, node) {
> + if (i++ < start_extent_id) {
> + continue;
> + }
> + stq_le_p(&out->records[record_done].start_dpa, ent->start_dpa);
Maybe a local variable for out->records[record_done]?
CXLDCExtentRaw *out_rec = &out->records[record_done];
stq_le_p(&out_rec->len, ent->len);
etc
> + stq_le_p(&out->records[record_done].len, ent->len);
> + memcpy(&out->records[record_done].tag, ent->tag, 0x10);
> + stw_le_p(&out->records[record_done].shared_seq, ent->shared_seq);
> +
> + record_done++;
> + if (record_done == record_count) {
> + break;
> + }
> + }
> + }
> +
> + *len_out = out_pl_len;
> + return CXL_MBOX_SUCCESS;
> +}
^ permalink raw reply [flat|nested] 81+ messages in thread
* [PATCH v5 08/13] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response
2024-03-04 19:33 [PATCH v5 00/13] Enabling DCD emulation support in Qemu nifan.cxl
` (6 preceding siblings ...)
2024-03-04 19:34 ` [PATCH v5 07/13] hw/mem/cxl_type3: Add DC extent list representative and get DC extent list mailbox support nifan.cxl
@ 2024-03-04 19:34 ` nifan.cxl
2024-03-06 17:28 ` Jonathan Cameron via
2024-03-04 19:34 ` [PATCH v5 09/13] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents nifan.cxl
` (4 subsequent siblings)
12 siblings, 1 reply; 81+ messages in thread
From: nifan.cxl @ 2024-03-04 19:34 UTC (permalink / raw)
To: qemu-devel
Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni
From: Fan Ni <fan.ni@samsung.com>
Per CXL spec 3.1, two mailbox commands are implemented:
Add Dynamic Capacity Response (Opcode 4802h) 8.2.9.9.9.3, and
Release Dynamic Capacity (Opcode 4803h) 8.2.9.9.9.4.
Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
hw/cxl/cxl-mailbox-utils.c | 310 ++++++++++++++++++++++++++++++++++++
hw/mem/cxl_type3.c | 12 ++
include/hw/cxl/cxl_device.h | 4 +
3 files changed, 326 insertions(+)
diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 425b378a2c..8c59635a9f 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -85,6 +85,8 @@ enum {
DCD_CONFIG = 0x48,
#define GET_DC_CONFIG 0x0
#define GET_DYN_CAP_EXT_LIST 0x1
+ #define ADD_DYN_CAP_RSP 0x2
+ #define RELEASE_DYN_CAP 0x3
PHYSICAL_SWITCH = 0x51,
#define IDENTIFY_SWITCH_DEVICE 0x0
#define GET_PHYSICAL_PORT_STATE 0x1
@@ -1399,6 +1401,308 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
return CXL_MBOX_SUCCESS;
}
+/*
+ * Check whether any bit between addr[nr, nr+size) is set,
+ * return true if any bit is set, otherwise return false
+ */
+static bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
+ unsigned long size)
+{
+ unsigned long res = find_next_bit(addr, size + nr, nr);
+
+ return res < nr + size;
+}
+
+CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
+{
+ int i;
+ CXLDCRegion *region = &ct3d->dc.regions[0];
+
+ if (dpa < region->base ||
+ dpa >= region->base + ct3d->dc.total_capacity) {
+ return NULL;
+ }
+
+ /*
+ * CXL r3.1 section 9.13.3: Dynamic Capacity Device (DCD)
+ *
+ * Regions are used in increasing-DPA order, with Region 0 being used for
+ * the lowest DPA of Dynamic Capacity and Region 7 for the highest DPA.
+ * So check from the last region to find where the dpa belongs. Extents that
+ * cross multiple regions are not allowed.
+ */
+ for (i = ct3d->dc.num_regions - 1; i >= 0; i--) {
+ region = &ct3d->dc.regions[i];
+ if (dpa >= region->base) {
+ if (dpa + len > region->base + region->len) {
+ return NULL;
+ }
+ return region;
+ }
+ }
+
+ return NULL;
+}
+
+static void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
+ uint64_t dpa,
+ uint64_t len,
+ uint8_t *tag,
+ uint16_t shared_seq)
+{
+ CXLDCExtent *extent;
+
+ extent = g_new0(CXLDCExtent, 1);
+ extent->start_dpa = dpa;
+ extent->len = len;
+ if (tag) {
+ memcpy(extent->tag, tag, 0x10);
+ }
+ extent->shared_seq = shared_seq;
+
+ QTAILQ_INSERT_TAIL(list, extent, node);
+}
+
+void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
+ CXLDCExtent *extent)
+{
+ QTAILQ_REMOVE(list, extent, node);
+ g_free(extent);
+}
+
+/*
+ * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload
+ * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload
+ */
+typedef struct CXLUpdateDCExtentListInPl {
+ uint32_t num_entries_updated;
+ uint8_t flags;
+ uint8_t rsvd[3];
+ /* CXL r3.1 Table 8-169: Updated Extent */
+ struct {
+ uint64_t start_dpa;
+ uint64_t len;
+ uint8_t rsvd[8];
+ } QEMU_PACKED updated_entries[];
+} QEMU_PACKED CXLUpdateDCExtentListInPl;
+
+/*
+ * For the extents in the extent list to operate, check whether they are valid
+ * 1. The extent should be in the range of a valid DC region;
+ * 2. The extent should not cross multiple regions;
+ * 3. The start DPA and the length of the extent should align with the block
+ * size of the region;
+ * 4. The address range of multiple extents in the list should not overlap.
+ */
+static CXLRetCode cxl_detect_malformed_extent_list(CXLType3Dev *ct3d,
+ const CXLUpdateDCExtentListInPl *in)
+{
+ uint64_t min_block_size = UINT64_MAX;
+ CXLDCRegion *region = &ct3d->dc.regions[0];
+ CXLDCRegion *lastregion = &ct3d->dc.regions[ct3d->dc.num_regions - 1];
+ g_autofree unsigned long *blk_bitmap = NULL;
+ uint64_t dpa, len;
+ uint32_t i;
+
+ for (i = 0; i < ct3d->dc.num_regions; i++) {
+ region = &ct3d->dc.regions[i];
+ min_block_size = MIN(min_block_size, region->block_size);
+ }
+
+ blk_bitmap = bitmap_new((lastregion->base + lastregion->len -
+ ct3d->dc.regions[0].base) / min_block_size);
+
+ for (i = 0; i < in->num_entries_updated; i++) {
+ dpa = in->updated_entries[i].start_dpa;
+ len = in->updated_entries[i].len;
+
+ region = cxl_find_dc_region(ct3d, dpa, len);
+ if (!region) {
+ return CXL_MBOX_INVALID_PA;
+ }
+
+ dpa -= ct3d->dc.regions[0].base;
+ if (dpa % region->block_size || len % region->block_size) {
+ return CXL_MBOX_INVALID_EXTENT_LIST;
+ }
+ /* the dpa range already covered by some other extents in the list */
+ if (test_any_bits_set(blk_bitmap, dpa / min_block_size,
+ len / min_block_size)) {
+ return CXL_MBOX_INVALID_EXTENT_LIST;
+ }
+ bitmap_set(blk_bitmap, dpa / min_block_size, len / min_block_size);
+ }
+
+ return CXL_MBOX_SUCCESS;
+}
+
+/*
+ * CXL r3.1 section 8.2.9.9.9.3: Add Dynamic Capacity Response (Opcode 4802h)
+ * An extent is added to the extent list and becomes usable only after the
+ * response is processed successfully
+ */
+static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
+ uint8_t *payload_in,
+ size_t len_in,
+ uint8_t *payload_out,
+ size_t *len_out,
+ CXLCCI *cci)
+{
+ CXLUpdateDCExtentListInPl *in = (void *)payload_in;
+ CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
+ CXLDCExtentList *extent_list = &ct3d->dc.extents;
+ CXLDCExtent *ent;
+ uint32_t i;
+ uint64_t dpa, len;
+ CXLRetCode ret;
+
+ if (in->num_entries_updated == 0) {
+ return CXL_MBOX_SUCCESS;
+ }
+
+ /* Adding extents causes exceeding device's extent tracking ability. */
+ if (in->num_entries_updated + ct3d->dc.total_extent_count >
+ CXL_NUM_EXTENTS_SUPPORTED) {
+ return CXL_MBOX_RESOURCES_EXHAUSTED;
+ }
+
+ ret = cxl_detect_malformed_extent_list(ct3d, in);
+ if (ret != CXL_MBOX_SUCCESS) {
+ return ret;
+ }
+
+ for (i = 0; i < in->num_entries_updated; i++) {
+ dpa = in->updated_entries[i].start_dpa;
+ len = in->updated_entries[i].len;
+
+ /*
+ * Check if the DPA range of the to-be-added extent overlaps with
+ * existing extent list maintained by the device.
+ */
+ QTAILQ_FOREACH(ent, extent_list, node) {
+ if (ent->start_dpa <= dpa &&
+ dpa + len <= ent->start_dpa + ent->len) {
+ return CXL_MBOX_INVALID_PA;
+ /* Overlapping one end of the other */
+ } else if ((dpa < ent->start_dpa + ent->len &&
+ dpa + len > ent->start_dpa + ent->len) ||
+ (dpa < ent->start_dpa && dpa + len > ent->start_dpa)) {
+ return CXL_MBOX_INVALID_PA;
+ }
+ }
+
+ /*
+ * TODO: we will add a pending extent list based on event log record
+ * and verify the input response; also, the "More" flag is not
+ * considered at the moment.
+ */
+
+ cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
+ ct3d->dc.total_extent_count += 1;
+ }
+
+ return CXL_MBOX_SUCCESS;
+}
+
+/*
+ * CXL r3.1 section 8.2.9.9.9.4: Release Dynamic Capacity (Opcode 4803h)
+ */
+static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
+ uint8_t *payload_in,
+ size_t len_in,
+ uint8_t *payload_out,
+ size_t *len_out,
+ CXLCCI *cci)
+{
+ CXLUpdateDCExtentListInPl *in = (void *)payload_in;
+ CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
+ CXLDCExtentList *extent_list = &ct3d->dc.extents;
+ CXLDCExtent *ent;
+ uint32_t i;
+ uint64_t dpa, len;
+ CXLRetCode ret;
+
+ if (in->num_entries_updated == 0) {
+ return CXL_MBOX_INVALID_INPUT;
+ }
+
+ ret = cxl_detect_malformed_extent_list(ct3d, in);
+ if (ret != CXL_MBOX_SUCCESS) {
+ return ret;
+ }
+
+ for (i = 0; i < in->num_entries_updated; i++) {
+ bool found = false;
+
+ dpa = in->updated_entries[i].start_dpa;
+ len = in->updated_entries[i].len;
+
+ QTAILQ_FOREACH(ent, extent_list, node) {
+ /* Found the extent overlapping with */
+ if (ent->start_dpa <= dpa && dpa < ent->start_dpa + ent->len) {
+ if (dpa + len <= ent->start_dpa + ent->len) {
+ /*
+ * The incoming extent covers a portion of an extent
+ * in the device extent list, remove only the overlapping
+ * portion, meaning
+ * 1. the portions that are not covered by the incoming
+ * extent at both end of the original extent will become
+ * new extents and inserted to the extent list; and
+ * 2. the original extent is removed from the extent list;
+ * 3. DC extent count is updated accordingly.
+ */
+ uint64_t ent_start_dpa = ent->start_dpa;
+ uint64_t ent_len = ent->len;
+ uint64_t len1 = dpa - ent_start_dpa;
+ uint64_t len2 = ent_start_dpa + ent_len - dpa - len;
+
+ /*
+ * TODO: checking for possible extent overflow, will be
+ * moved into a dedicated function of detecting extent
+ * overflow.
+ */
+ if (len1 && len2 && ct3d->dc.total_extent_count ==
+ CXL_NUM_EXTENTS_SUPPORTED) {
+ return CXL_MBOX_RESOURCES_EXHAUSTED;
+ }
+
+ found = true;
+ cxl_remove_extent_from_extent_list(extent_list, ent);
+ ct3d->dc.total_extent_count -= 1;
+
+ if (len1) {
+ cxl_insert_extent_to_extent_list(extent_list,
+ ent_start_dpa, len1,
+ NULL, 0);
+ ct3d->dc.total_extent_count += 1;
+ }
+ if (len2) {
+ cxl_insert_extent_to_extent_list(extent_list, dpa + len,
+ len2, NULL, 0);
+ ct3d->dc.total_extent_count += 1;
+ }
+ break;
+ } else {
+ /*
+ * TODO: we reject the attempt to remove an extent that
+ * overlaps with multiple extents in the device for now,
+ * once the bitmap indicating whether a DPA range is
+ * covered by valid extents is introduced, will allow it.
+ */
+ return CXL_MBOX_INVALID_PA;
+ }
+ }
+ }
+
+ if (!found) {
+ /* Try to remove a non-existing extent. */
+ return CXL_MBOX_INVALID_PA;
+ }
+ }
+
+ return CXL_MBOX_SUCCESS;
+}
+
#define IMMEDIATE_CONFIG_CHANGE (1 << 1)
#define IMMEDIATE_DATA_CHANGE (1 << 2)
#define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -1449,6 +1753,12 @@ static const struct cxl_cmd cxl_cmd_set_dcd[256][256] = {
[DCD_CONFIG][GET_DYN_CAP_EXT_LIST] = {
"DCD_GET_DYNAMIC_CAPACITY_EXTENT_LIST", cmd_dcd_get_dyn_cap_ext_list,
8, 0 },
+ [DCD_CONFIG][ADD_DYN_CAP_RSP] = {
+ "DCD_ADD_DYNAMIC_CAPACITY_RESPONSE", cmd_dcd_add_dyn_cap_rsp,
+ ~0, IMMEDIATE_DATA_CHANGE },
+ [DCD_CONFIG][RELEASE_DYN_CAP] = {
+ "DCD_RELEASE_DYNAMIC_CAPACITY", cmd_dcd_release_dyn_cap,
+ ~0, IMMEDIATE_DATA_CHANGE },
};
static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 102fa8151e..dccfaaad3a 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -678,6 +678,16 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
return true;
}
+static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
+{
+ CXLDCExtent *ent;
+
+ while (!QTAILQ_EMPTY(&ct3d->dc.extents)) {
+ ent = QTAILQ_FIRST(&ct3d->dc.extents);
+ cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
+ }
+}
+
static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
{
DeviceState *ds = DEVICE(ct3d);
@@ -874,6 +884,7 @@ err_free_special_ops:
g_free(regs->special_ops);
err_address_space_free:
if (ct3d->dc.host_dc) {
+ cxl_destroy_dc_regions(ct3d);
address_space_destroy(&ct3d->dc.host_dc_as);
}
if (ct3d->hostpmem) {
@@ -895,6 +906,7 @@ static void ct3_exit(PCIDevice *pci_dev)
cxl_doe_cdat_release(cxl_cstate);
g_free(regs->special_ops);
if (ct3d->dc.host_dc) {
+ cxl_destroy_dc_regions(ct3d);
address_space_destroy(&ct3d->dc.host_dc_as);
}
if (ct3d->hostpmem) {
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 8148bcc34b..341260e6e4 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -547,4 +547,8 @@ void cxl_event_irq_assert(CXLType3Dev *ct3d);
void cxl_set_poison_list_overflowed(CXLType3Dev *ct3d);
+CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
+
+void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
+ CXLDCExtent *extent);
#endif
--
2.43.0
^ permalink raw reply related [flat|nested] 81+ messages in thread
* Re: [PATCH v5 08/13] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response
2024-03-04 19:34 ` [PATCH v5 08/13] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response nifan.cxl
@ 2024-03-06 17:28 ` Jonathan Cameron via
0 siblings, 0 replies; 81+ messages in thread
From: Jonathan Cameron @ 2024-03-06 17:28 UTC (permalink / raw)
To: nifan.cxl
Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
wj28.lee, Fan Ni
On Mon, 4 Mar 2024 11:34:03 -0800
nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
>
> Per CXL spec 3.1, two mailbox commands are implemented:
> Add Dynamic Capacity Response (Opcode 4802h) 8.2.9.9.9.3, and
> Release Dynamic Capacity (Opcode 4803h) 8.2.9.9.9.4.
>
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
Hmm. So I had a thought which would work for what you
have here. See include/qemu/range.h
I like the region merging stuff that is also in the list operators
but we shouldn't use that because we have other reasons not to
fuse ranges (sequence numbering etc)
We could make an extent a wrapper around a struct Range though
so that we can use the comparison stuff directly.
+ we can use the list manipulation in there as the basis for a future
extent merging infrastructure that is tag and sequence number (if
provided - so shared capacity or pmem) aware.
Jonathan
> ---
> +
> +/*
> + * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload
> + * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload
> + */
> +typedef struct CXLUpdateDCExtentListInPl {
> + uint32_t num_entries_updated;
> + uint8_t flags;
> + uint8_t rsvd[3];
> + /* CXL r3.1 Table 8-169: Updated Extent */
> + struct {
> + uint64_t start_dpa;
> + uint64_t len;
> + uint8_t rsvd[8];
> + } QEMU_PACKED updated_entries[];
> +} QEMU_PACKED CXLUpdateDCExtentListInPl;
> +
> +/*
> + * For the extents in the extent list to operate, check whether they are valid
> + * 1. The extent should be in the range of a valid DC region;
> + * 2. The extent should not cross multiple regions;
> + * 3. The start DPA and the length of the extent should align with the block
> + * size of the region;
> + * 4. The address range of multiple extents in the list should not overlap.
Hmm. Interesting. I was thinking a given add / remove command rather than
just the extents can't overlap a region. However I can't find text on that
so I believe your interpretation is correct. It is only specified for the
event records, but that is good enough I think. We might want to propose
tightening the spec on this to allow devices to say no to such complex
extent lists. Maybe a nice friendly Memory vendor should query this one if
it's a potential problem for real devices. Might not be!
> + */
> +static CXLRetCode cxl_detect_malformed_extent_list(CXLType3Dev *ct3d,
> + const CXLUpdateDCExtentListInPl *in)
> +{
> + uint64_t min_block_size = UINT64_MAX;
> + CXLDCRegion *region = &ct3d->dc.regions[0];
> + CXLDCRegion *lastregion = &ct3d->dc.regions[ct3d->dc.num_regions - 1];
> + g_autofree unsigned long *blk_bitmap = NULL;
> + uint64_t dpa, len;
> + uint32_t i;
> +
> + for (i = 0; i < ct3d->dc.num_regions; i++) {
> + region = &ct3d->dc.regions[i];
> + min_block_size = MIN(min_block_size, region->block_size);
> + }
> +
> + blk_bitmap = bitmap_new((lastregion->base + lastregion->len -
> + ct3d->dc.regions[0].base) / min_block_size);
> +
> + for (i = 0; i < in->num_entries_updated; i++) {
> + dpa = in->updated_entries[i].start_dpa;
> + len = in->updated_entries[i].len;
> +
> + region = cxl_find_dc_region(ct3d, dpa, len);
> + if (!region) {
> + return CXL_MBOX_INVALID_PA;
> + }
> +
> + dpa -= ct3d->dc.regions[0].base;
> + if (dpa % region->block_size || len % region->block_size) {
> + return CXL_MBOX_INVALID_EXTENT_LIST;
> + }
> + /* the dpa range already covered by some other extents in the list */
> + if (test_any_bits_set(blk_bitmap, dpa / min_block_size,
> + len / min_block_size)) {
> + return CXL_MBOX_INVALID_EXTENT_LIST;
> + }
> + bitmap_set(blk_bitmap, dpa / min_block_size, len / min_block_size);
> + }
> +
> + return CXL_MBOX_SUCCESS;
> +}
> +
> +/*
> + * CXL r3.1 section 8.2.9.9.9.3: Add Dynamic Capacity Response (Opcode 4802h)
> + * An extent is added to the extent list and becomes usable only after the
> + * response is processed successfully
> + */
> +static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> + uint8_t *payload_in,
> + size_t len_in,
> + uint8_t *payload_out,
> + size_t *len_out,
> + CXLCCI *cci)
> +{
> + CXLUpdateDCExtentListInPl *in = (void *)payload_in;
> + CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
> + CXLDCExtentList *extent_list = &ct3d->dc.extents;
> + CXLDCExtent *ent;
> + uint32_t i;
> + uint64_t dpa, len;
> + CXLRetCode ret;
> +
> + if (in->num_entries_updated == 0) {
> + return CXL_MBOX_SUCCESS;
> + }
> +
> + /* Adding extents causes exceeding device's extent tracking ability. */
> + if (in->num_entries_updated + ct3d->dc.total_extent_count >
> + CXL_NUM_EXTENTS_SUPPORTED) {
> + return CXL_MBOX_RESOURCES_EXHAUSTED;
> + }
> +
> + ret = cxl_detect_malformed_extent_list(ct3d, in);
> + if (ret != CXL_MBOX_SUCCESS) {
> + return ret;
> + }
> +
> + for (i = 0; i < in->num_entries_updated; i++) {
> + dpa = in->updated_entries[i].start_dpa;
> + len = in->updated_entries[i].len;
> +
> + /*
> + * Check if the DPA range of the to-be-added extent overlaps with
> + * existing extent list maintained by the device.
> + */
> + QTAILQ_FOREACH(ent, extent_list, node) {
There are too many checks in here for an overlap test.
Conditions are
           | Extent tested against |
        | Overlap entirely            |
  | overlap left edge |
                       | overlap right edge |
Think of it in the inverse condition and it is easier to reason about.
             | Extent tested against |
| to left |---                       ---| to right |
which I think is something like:
if (!((dpa + len <= ent->start_dpa) || (dpa >= ent->start_dpa + ent->len))) {
    return CXL_MBOX_INVALID_PA;
}
Hmm. For internal tracking (not the exposed values) we should probably use
struct Range from include/qemu/range.h.
Felt like there had to be something better than doing this ourselves so I went
looking. Note it uses an inclusive upper bound so be careful with that!
The advantage is we get these checks for free.
https://elixir.bootlin.com/qemu/latest/source/include/qemu/range.h#L152
range_overlaps_range()
There are functions to set them up nicely for us, including by base and size,
which should tidy that part up.
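As a rough, self-contained sketch of the inverse test (the helper names are made up for illustration and are not the range.h API): the first helper takes base and length with an exclusive end, the second takes inclusive lower and upper bounds, which is the convention the note above warns about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Half-open form: [a, a + a_len) and [b, b + b_len) overlap unless one
 * lies entirely to the left or to the right of the other.
 * Hypothetical helper; assumes base + len does not wrap around.
 */
static bool extents_overlap(uint64_t a, uint64_t a_len,
                            uint64_t b, uint64_t b_len)
{
    return !(a + a_len <= b || a >= b + b_len);
}

/*
 * Inclusive form: each range is [lob, upb], where upb is the last valid
 * byte (base + len - 1 for a non-zero len).
 */
static bool extents_overlap_inclusive(uint64_t a_lob, uint64_t a_upb,
                                      uint64_t b_lob, uint64_t b_upb)
{
    return !(a_upb < b_lob || b_upb < a_lob);
}
```

Two back-to-back extents such as [0x0, 0x1000) and [0x1000, 0x2000) must not be reported as overlapping in either form; with inclusive bounds that means passing 0xfff, not 0x1000, as the first upper bound.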
> + if (ent->start_dpa <= dpa &&
> + dpa + len <= ent->start_dpa + ent->len) {
> + return CXL_MBOX_INVALID_PA;
> + /* Overlapping one end of the other */
> + } else if ((dpa < ent->start_dpa + ent->len &&
> + dpa + len > ent->start_dpa + ent->len) ||
> + (dpa < ent->start_dpa && dpa + len > ent->start_dpa)) {
> + return CXL_MBOX_INVALID_PA;
> + }
> + }
> +
> + /*
> + * TODO: we will add a pending extent list based on event log record
> + * and verify the input response; also, the "More" flag is not
> + * considered at the moment.
> + */
> +
> + cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
> + ct3d->dc.total_extent_count += 1;
> + }
> +
> + return CXL_MBOX_SUCCESS;
> +}
> +
> +/*
> + * CXL r3.1 section 8.2.9.9.9.4: Release Dynamic Capacity (Opcode 4803h)
> + */
> +static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
> + uint8_t *payload_in,
> + size_t len_in,
> + uint8_t *payload_out,
> + size_t *len_out,
> + CXLCCI *cci)
> +{
> + CXLUpdateDCExtentListInPl *in = (void *)payload_in;
> + CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
> + CXLDCExtentList *extent_list = &ct3d->dc.extents;
> + CXLDCExtent *ent;
> + uint32_t i;
> + uint64_t dpa, len;
> + CXLRetCode ret;
> +
> + if (in->num_entries_updated == 0) {
> + return CXL_MBOX_INVALID_INPUT;
> + }
> +
> + ret = cxl_detect_malformed_extent_list(ct3d, in);
> + if (ret != CXL_MBOX_SUCCESS) {
> + return ret;
> + }
> +
> + for (i = 0; i < in->num_entries_updated; i++) {
> + bool found = false;
> +
> + dpa = in->updated_entries[i].start_dpa;
> + len = in->updated_entries[i].len;
> +
> + QTAILQ_FOREACH(ent, extent_list, node) {
> + /* Found the extent overlapping with */
> + if (ent->start_dpa <= dpa && dpa < ent->start_dpa + ent->len) {
> + if (dpa + len <= ent->start_dpa + ent->len) {
> + /*
> + * The incoming extent covers a portion of an extent
> + * in the device extent list, remove only the overlapping
> + * portion, meaning
> + * 1. the portions that are not covered by the incoming
> + * extent at both end of the original extent will become
> + * new extents and inserted to the extent list; and
> + * 2. the original extent is removed from the extent list;
> + * 3. DC extent count is updated accordingly.
> + */
> + uint64_t ent_start_dpa = ent->start_dpa;
> + uint64_t ent_len = ent->len;
> + uint64_t len1 = dpa - ent_start_dpa;
> + uint64_t len2 = ent_start_dpa + ent_len - dpa - len;
> +
> + /*
> + * TODO: checking for possible extent overflow, will be
> + * moved into a dedicated function of detecting extent
> + * overflow.
> + */
> + if (len1 && len2 && ct3d->dc.total_extent_count ==
> + CXL_NUM_EXTENTS_SUPPORTED) {
> + return CXL_MBOX_RESOURCES_EXHAUSTED;
> + }
> +
> + found = true;
> + cxl_remove_extent_from_extent_list(extent_list, ent);
> + ct3d->dc.total_extent_count -= 1;
> +
> + if (len1) {
> + cxl_insert_extent_to_extent_list(extent_list,
> + ent_start_dpa, len1,
> + NULL, 0);
> + ct3d->dc.total_extent_count += 1;
> + }
> + if (len2) {
> + cxl_insert_extent_to_extent_list(extent_list, dpa + len,
> + len2, NULL, 0);
> + ct3d->dc.total_extent_count += 1;
> + }
> + break;
Maybe this makes sense after the support below is added, but at this
point in the series
return CXL_MBOX_SUCCESS;
then found isn't relevant so can drop that. Looks like you drop it later in the
series anyway.
> + } else {
> + /*
> + * TODO: we reject the attempt to remove an extent that
> + * overlaps with multiple extents in the device for now,
> + * once the bitmap indicating whether a DPA range is
> + * covered by valid extents is introduced, will allow it.
> + */
> + return CXL_MBOX_INVALID_PA;
> + }
> + }
> + }
> +
> + if (!found) {
> + /* Try to remove a non-existing extent. */
> + return CXL_MBOX_INVALID_PA;
> + }
> + }
> +
> + return CXL_MBOX_SUCCESS;
> +}
> +
> static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 102fa8151e..dccfaaad3a 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -678,6 +678,16 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
> return true;
> }
>
> +static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
> +{
> + CXLDCExtent *ent;
> +
> + while (!QTAILQ_EMPTY(&ct3d->dc.extents)) {
> + ent = QTAILQ_FIRST(&ct3d->dc.extents);
> + cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
Isn't this the same as something like the following (with a second
CXLDCExtent *ent_next declared for the macro):
QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
    cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
    //This wrapper is small enough I'd be tempted to just have the
    //code inline at the places it's called.
}
> + }
> +}
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 08/13] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response
2024-03-06 17:28 ` Jonathan Cameron via
(?)
@ 2024-03-06 21:39 ` fan
2024-03-07 12:20 ` Jonathan Cameron via
-1 siblings, 1 reply; 81+ messages in thread
From: fan @ 2024-03-06 21:39 UTC (permalink / raw)
To: Jonathan Cameron
Cc: nifan.cxl, qemu-devel, linux-cxl, gregory.price, ira.weiny,
dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
Jorgen.Hansen, wj28.lee, Fan Ni
On Wed, Mar 06, 2024 at 05:28:27PM +0000, Jonathan Cameron wrote:
> On Mon, 4 Mar 2024 11:34:03 -0800
> nifan.cxl@gmail.com wrote:
>
> > From: Fan Ni <fan.ni@samsung.com>
> >
> > Per CXL spec 3.1, two mailbox commands are implemented:
> > Add Dynamic Capacity Response (Opcode 4802h) 8.2.9.9.9.3, and
> > Release Dynamic Capacity (Opcode 4803h) 8.2.9.9.9.4.
> >
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
>
> Hmm. So I had a thought which would work for what you
> have here. See include/qemu/range.h
> I like the region merging stuff that is also in the list operators
> but we shouldn't use that because we have other reasons not to
> fuse ranges (sequence numbering etc)
>
> We could make an extent a wrapper around a struct Range though
> so that we can use the comparison stuff directly.
> + we can use the list manipulation in there as the basis for a future
> extent merging infrastructure that is tag and sequence number (if
> provided - so shared capacity or pmem) aware.
>
> Jonathan
>
>
> > ---
> > +
> > +/*
> > + * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload
> > + * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload
> > + */
> > +typedef struct CXLUpdateDCExtentListInPl {
> > + uint32_t num_entries_updated;
> > + uint8_t flags;
> > + uint8_t rsvd[3];
> > + /* CXL r3.1 Table 8-169: Updated Extent */
> > + struct {
> > + uint64_t start_dpa;
> > + uint64_t len;
> > + uint8_t rsvd[8];
> > + } QEMU_PACKED updated_entries[];
> > +} QEMU_PACKED CXLUpdateDCExtentListInPl;
> > +
> > +/*
> > + * For the extents in the extent list to operate, check whether they are valid
> > + * 1. The extent should be in the range of a valid DC region;
> > + * 2. The extent should not cross multiple regions;
> > + * 3. The start DPA and the length of the extent should align with the block
> > + * size of the region;
> > + * 4. The address range of multiple extents in the list should not overlap.
>
> Hmm. Interesting. I was thinking a given add / remove command rather than
> just the extents can't overlap a region. However I can't find text on that
> so I believe your interpretation is correct. It is only specified for the
> event records, but that is good enough I think. We might want to propose
> tightening the spec on this to allow devices to say no to such complex
> extent lists. Maybe a nice friendly Memory vendor should query this one if
> it's a potential problem for real devices. Might not be!
>
> > + */
> > +static CXLRetCode cxl_detect_malformed_extent_list(CXLType3Dev *ct3d,
> > + const CXLUpdateDCExtentListInPl *in)
> > +{
> > + uint64_t min_block_size = UINT64_MAX;
> > + CXLDCRegion *region = &ct3d->dc.regions[0];
> > + CXLDCRegion *lastregion = &ct3d->dc.regions[ct3d->dc.num_regions - 1];
> > + g_autofree unsigned long *blk_bitmap = NULL;
> > + uint64_t dpa, len;
> > + uint32_t i;
> > +
> > + for (i = 0; i < ct3d->dc.num_regions; i++) {
> > + region = &ct3d->dc.regions[i];
> > + min_block_size = MIN(min_block_size, region->block_size);
> > + }
> > +
> > + blk_bitmap = bitmap_new((lastregion->base + lastregion->len -
> > + ct3d->dc.regions[0].base) / min_block_size);
> > +
> > + for (i = 0; i < in->num_entries_updated; i++) {
> > + dpa = in->updated_entries[i].start_dpa;
> > + len = in->updated_entries[i].len;
> > +
> > + region = cxl_find_dc_region(ct3d, dpa, len);
> > + if (!region) {
> > + return CXL_MBOX_INVALID_PA;
> > + }
> > +
> > + dpa -= ct3d->dc.regions[0].base;
> > + if (dpa % region->block_size || len % region->block_size) {
> > + return CXL_MBOX_INVALID_EXTENT_LIST;
> > + }
> > + /* the dpa range already covered by some other extents in the list */
> > + if (test_any_bits_set(blk_bitmap, dpa / min_block_size,
> > + len / min_block_size)) {
> > + return CXL_MBOX_INVALID_EXTENT_LIST;
> > + }
> > + bitmap_set(blk_bitmap, dpa / min_block_size, len / min_block_size);
> > + }
> > +
> > + return CXL_MBOX_SUCCESS;
> > +}
> > +
> > +/*
> > + * CXL r3.1 section 8.2.9.9.9.3: Add Dynamic Capacity Response (Opcode 4802h)
> > + * An extent is added to the extent list and becomes usable only after the
> > + * response is processed successfully
> > + */
> > +static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> > + uint8_t *payload_in,
> > + size_t len_in,
> > + uint8_t *payload_out,
> > + size_t *len_out,
> > + CXLCCI *cci)
> > +{
> > + CXLUpdateDCExtentListInPl *in = (void *)payload_in;
> > + CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
> > + CXLDCExtentList *extent_list = &ct3d->dc.extents;
> > + CXLDCExtent *ent;
> > + uint32_t i;
> > + uint64_t dpa, len;
> > + CXLRetCode ret;
> > +
> > + if (in->num_entries_updated == 0) {
> > + return CXL_MBOX_SUCCESS;
> > + }
> > +
> > + /* Adding extents causes exceeding device's extent tracking ability. */
> > + if (in->num_entries_updated + ct3d->dc.total_extent_count >
> > + CXL_NUM_EXTENTS_SUPPORTED) {
> > + return CXL_MBOX_RESOURCES_EXHAUSTED;
> > + }
> > +
> > + ret = cxl_detect_malformed_extent_list(ct3d, in);
> > + if (ret != CXL_MBOX_SUCCESS) {
> > + return ret;
> > + }
> > +
> > + for (i = 0; i < in->num_entries_updated; i++) {
> > + dpa = in->updated_entries[i].start_dpa;
> > + len = in->updated_entries[i].len;
> > +
> > + /*
> > + * Check if the DPA range of the to-be-added extent overlaps with
> > + * existing extent list maintained by the device.
> > + */
> > + QTAILQ_FOREACH(ent, extent_list, node) {
>
> There are too many checks in here for an overlap test.
>
> Conditions are
>
>            | Extent tested against |
>         | Overlap entirely            |
>   | overlap left edge |
>                        | overlap right edge |
> Think of it in the inverse condition and it is easier to reason about.
>
>              | Extent tested against |
> | to left |---                       ---| to right |
>
> which I think is something like:
>
> if (!((dpa + len <= ent->start_dpa) || (dpa >= ent->start_dpa + ent->len))) {
>     return CXL_MBOX_INVALID_PA;
> }
>
> Hmm. For internal tracking (not the exposed values) we should probably use
> struct Range from include/qemu/range.h.
> Felt like there had to be something better than doing this ourselves so I went
> looking. Note it uses an inclusive upper bound so be careful with that!
>
> The advantage is we get these checks for free.
> https://elixir.bootlin.com/qemu/latest/source/include/qemu/range.h#L152
> range_overlaps_range()
>
> There are functions to set them up nicely for us, including by base and size,
> which should tidy that part up.
>
>
>
> > + if (ent->start_dpa <= dpa &&
> > + dpa + len <= ent->start_dpa + ent->len) {
> > + return CXL_MBOX_INVALID_PA;
> > + /* Overlapping one end of the other */
> > + } else if ((dpa < ent->start_dpa + ent->len &&
> > + dpa + len > ent->start_dpa + ent->len) ||
> > + (dpa < ent->start_dpa && dpa + len > ent->start_dpa)) {
> > + return CXL_MBOX_INVALID_PA;
> > + }
> > + }
> > +
> > + /*
> > + * TODO: we will add a pending extent list based on event log record
> > + * and verify the input response; also, the "More" flag is not
> > + * considered at the moment.
> > + */
> > +
> > + cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
> > + ct3d->dc.total_extent_count += 1;
> > + }
> > +
> > + return CXL_MBOX_SUCCESS;
> > +}
> > +
> > +/*
> > + * CXL r3.1 section 8.2.9.9.9.4: Release Dynamic Capacity (Opcode 4803h)
> > + */
> > +static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
> > + uint8_t *payload_in,
> > + size_t len_in,
> > + uint8_t *payload_out,
> > + size_t *len_out,
> > + CXLCCI *cci)
> > +{
> > + CXLUpdateDCExtentListInPl *in = (void *)payload_in;
> > + CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
> > + CXLDCExtentList *extent_list = &ct3d->dc.extents;
> > + CXLDCExtent *ent;
> > + uint32_t i;
> > + uint64_t dpa, len;
> > + CXLRetCode ret;
> > +
> > + if (in->num_entries_updated == 0) {
> > + return CXL_MBOX_INVALID_INPUT;
> > + }
> > +
> > + ret = cxl_detect_malformed_extent_list(ct3d, in);
> > + if (ret != CXL_MBOX_SUCCESS) {
> > + return ret;
> > + }
> > +
> > + for (i = 0; i < in->num_entries_updated; i++) {
> > + bool found = false;
> > +
> > + dpa = in->updated_entries[i].start_dpa;
> > + len = in->updated_entries[i].len;
> > +
> > + QTAILQ_FOREACH(ent, extent_list, node) {
> > + /* Found the extent overlapping with */
> > + if (ent->start_dpa <= dpa && dpa < ent->start_dpa + ent->len) {
> > + if (dpa + len <= ent->start_dpa + ent->len) {
> > + /*
> > + * The incoming extent covers a portion of an extent
> > + * in the device extent list, remove only the overlapping
> > + * portion, meaning
> > + * 1. the portions that are not covered by the incoming
> > + * extent at both end of the original extent will become
> > + * new extents and inserted to the extent list; and
> > + * 2. the original extent is removed from the extent list;
> > + * 3. DC extent count is updated accordingly.
> > + */
> > + uint64_t ent_start_dpa = ent->start_dpa;
> > + uint64_t ent_len = ent->len;
> > + uint64_t len1 = dpa - ent_start_dpa;
> > + uint64_t len2 = ent_start_dpa + ent_len - dpa - len;
> > +
> > + /*
> > + * TODO: checking for possible extent overflow, will be
> > + * moved into a dedicated function of detecting extent
> > + * overflow.
> > + */
> > + if (len1 && len2 && ct3d->dc.total_extent_count ==
> > + CXL_NUM_EXTENTS_SUPPORTED) {
> > + return CXL_MBOX_RESOURCES_EXHAUSTED;
> > + }
> > +
> > + found = true;
> > + cxl_remove_extent_from_extent_list(extent_list, ent);
> > + ct3d->dc.total_extent_count -= 1;
> > +
> > + if (len1) {
> > + cxl_insert_extent_to_extent_list(extent_list,
> > + ent_start_dpa, len1,
> > + NULL, 0);
> > + ct3d->dc.total_extent_count += 1;
> > + }
> > + if (len2) {
> > + cxl_insert_extent_to_extent_list(extent_list, dpa + len,
> > + len2, NULL, 0);
> > + ct3d->dc.total_extent_count += 1;
> > + }
> > + break;
> Maybe this makes sense after the support below is added, but at this
> point in the series
> return CXL_MBOX_SUCCESS;
> then found isn't relevant so can drop that. Looks like you drop it later in the
> series anyway.
We cannot return directly as we have more extents to release.
One thing I think I need to add is a dry run that checks whether any extent in
the incoming list is not contained by an extent in the extent list, and
returns an error before starting the real release. The spec just says
we need to return Invalid PA; it does not specify whether we should update the
list until we find a "bad" extent or reject the request up front. The current
code can leave the extent list partially updated by the time it finds a
"bad" extent to release.
>
> > + } else {
> > + /*
> > + * TODO: we reject the attempt to remove an extent that
> > + * overlaps with multiple extents in the device for now,
> > + * once the bitmap indicating whether a DPA range is
> > + * covered by valid extents is introduced, will allow it.
> > + */
> > + return CXL_MBOX_INVALID_PA;
> > + }
> > + }
> > + }
> > +
> > + if (!found) {
> > + /* Try to remove a non-existing extent. */
> > + return CXL_MBOX_INVALID_PA;
> > + }
> > + }
> > +
> > + return CXL_MBOX_SUCCESS;
> > +}
> > +
>
> > static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index 102fa8151e..dccfaaad3a 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> > @@ -678,6 +678,16 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
> > return true;
> > }
> >
> > +static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
> > +{
> > + CXLDCExtent *ent;
> > +
> > + while (!QTAILQ_EMPTY(&ct3d->dc.extents)) {
> > + ent = QTAILQ_FIRST(&ct3d->dc.extents);
> > + cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
>
> Isn't this the same as something like the following (with a second
> CXLDCExtent *ent_next declared for the macro):
> QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
> cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
> //This wrapper is small enough I'd be tempted to just have the
> //code inline at the places it's called.
Good point, will update.
Fan
>
> }
> > + }
> > +}
>
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 08/13] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response
2024-03-06 21:39 ` fan
@ 2024-03-07 12:20 ` Jonathan Cameron via
0 siblings, 0 replies; 81+ messages in thread
From: Jonathan Cameron @ 2024-03-07 12:20 UTC (permalink / raw)
To: fan
Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
wj28.lee, Fan Ni
On Wed, 6 Mar 2024 13:39:50 -0800
fan <nifan.cxl@gmail.com> wrote:
> > > + }
> > > + if (len2) {
> > > + cxl_insert_extent_to_extent_list(extent_list, dpa + len,
> > > + len2, NULL, 0);
> > > + ct3d->dc.total_extent_count += 1;
> > > + }
> > > + break;
> > Maybe this makes sense after the support below is added, but at this
> > point in the series
> > return CXL_MBOX_SUCCESS;
> > then found isn't relevant so can drop that. Looks like you drop it later in the
> > series anyway.
>
> We cannot return directly as we have more extents to release.
Ah good point. I'd missed the double loop.
> One thing I think I need to add is a dry run that checks whether any extent in
> the incoming list is not contained by an extent in the extent list, and
> returns an error before starting the real release. The spec just says
> we need to return Invalid PA; it does not specify whether we should update the
> list until we find a "bad" extent or reject the request up front. The current
> code can leave the extent list partially updated by the time it finds a
> "bad" extent to release.
Yes, I'm not sure of the correct answer to this either. My assumption is that in
error cases there are no side effects, but I don't see a clear statement of that.
So I think we are in the world of best practice, not spec compliance.
If we wanted to recover from such an error case we'd have to verify the current
extent list. I'll fire off a question to the relevant folk in the appropriate forum.
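One way to get the no-side-effects behaviour would be a validation pass over the whole request before touching the list. A minimal sketch, assuming a plain array in place of the QTAILQ and made-up names (Extent, contained_in_one, release_dry_run); it does not cover the extent-count limit check:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a tracked extent; not the QEMU CXLDCExtent. */
typedef struct {
    uint64_t start;
    uint64_t len;
} Extent;

/* True if [dpa, dpa + len) lies entirely inside one tracked extent. */
static bool contained_in_one(const Extent *list, size_t n,
                             uint64_t dpa, uint64_t len)
{
    for (size_t i = 0; i < n; i++) {
        if (list[i].start <= dpa &&
            dpa + len <= list[i].start + list[i].len) {
            return true;
        }
    }
    return false;
}

/*
 * Dry run: succeed only if every requested extent could be released.
 * Nothing is modified, so a failure leaves the list untouched.
 */
static bool release_dry_run(const Extent *list, size_t n,
                            const Extent *req, size_t nreq)
{
    for (size_t i = 0; i < nreq; i++) {
        if (!contained_in_one(list, n, req[i].start, req[i].len)) {
            return false;
        }
    }
    return true;
}
```

Checking against the unmodified list is enough here because the malformed-list check already rejects requests whose extents overlap each other, so splitting one tracked extent cannot invalidate a later request that was contained in it.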
Jonathan
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 08/13] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response
2024-03-06 17:28 ` Jonathan Cameron via
(?)
(?)
@ 2024-03-06 22:34 ` fan
2024-03-07 12:30 ` Jonathan Cameron via
-1 siblings, 1 reply; 81+ messages in thread
From: fan @ 2024-03-06 22:34 UTC (permalink / raw)
To: Jonathan Cameron
Cc: nifan.cxl, qemu-devel, linux-cxl, gregory.price, ira.weiny,
dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
Jorgen.Hansen, wj28.lee, Fan Ni
On Wed, Mar 06, 2024 at 05:28:27PM +0000, Jonathan Cameron wrote:
> On Mon, 4 Mar 2024 11:34:03 -0800
> nifan.cxl@gmail.com wrote:
>
> > From: Fan Ni <fan.ni@samsung.com>
> >
> > Per CXL spec 3.1, two mailbox commands are implemented:
> > Add Dynamic Capacity Response (Opcode 4802h) 8.2.9.9.9.3, and
> > Release Dynamic Capacity (Opcode 4803h) 8.2.9.9.9.4.
> >
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
>
> Hmm. So I had a thought which would work for what you
> have here. See include/qemu/range.h
> I like the region merging stuff that is also in the list operators
> but we shouldn't use that because we have other reasons not to
> fuse ranges (sequence numbering etc)
>
> We could make an extent a wrapper around a struct Range though
> so that we can use the comparison stuff directly.
> + we can use the list manipulation in there as the basis for a future
> extent merging infrastructure that is tag and sequence number (if
> provided - so shared capacity or pmem) aware.
>
> Jonathan
>
>
> > ---
> > +
> > +/*
> > + * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload
> > + * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload
> > + */
> > +typedef struct CXLUpdateDCExtentListInPl {
> > + uint32_t num_entries_updated;
> > + uint8_t flags;
> > + uint8_t rsvd[3];
> > + /* CXL r3.1 Table 8-169: Updated Extent */
> > + struct {
> > + uint64_t start_dpa;
> > + uint64_t len;
> > + uint8_t rsvd[8];
> > + } QEMU_PACKED updated_entries[];
> > +} QEMU_PACKED CXLUpdateDCExtentListInPl;
> > +
> > +/*
> > + * For the extents in the extent list to operate, check whether they are valid
> > + * 1. The extent should be in the range of a valid DC region;
> > + * 2. The extent should not cross multiple regions;
> > + * 3. The start DPA and the length of the extent should align with the block
> > + * size of the region;
> > + * 4. The address range of multiple extents in the list should not overlap.
>
> Hmm. Interesting. I was thinking a given add / remove command rather than
> just the extents can't overlap a region. However I can't find text on that
> so I believe your interpretation is correct. It is only specified for the
> event records, but that is good enough I think. We might want to propose
> tightening the spec on this to allow devices to say no to such complex
> extent lists. Maybe a nice friendly Memory vendor should query this one if
> it's a potential problem for real devices. Might not be!
>
> > + */
> > +static CXLRetCode cxl_detect_malformed_extent_list(CXLType3Dev *ct3d,
> > + const CXLUpdateDCExtentListInPl *in)
> > +{
> > + uint64_t min_block_size = UINT64_MAX;
> > + CXLDCRegion *region = &ct3d->dc.regions[0];
> > + CXLDCRegion *lastregion = &ct3d->dc.regions[ct3d->dc.num_regions - 1];
> > + g_autofree unsigned long *blk_bitmap = NULL;
> > + uint64_t dpa, len;
> > + uint32_t i;
> > +
> > + for (i = 0; i < ct3d->dc.num_regions; i++) {
> > + region = &ct3d->dc.regions[i];
> > + min_block_size = MIN(min_block_size, region->block_size);
> > + }
> > +
> > + blk_bitmap = bitmap_new((lastregion->base + lastregion->len -
> > + ct3d->dc.regions[0].base) / min_block_size);
> > +
> > + for (i = 0; i < in->num_entries_updated; i++) {
> > + dpa = in->updated_entries[i].start_dpa;
> > + len = in->updated_entries[i].len;
> > +
> > + region = cxl_find_dc_region(ct3d, dpa, len);
> > + if (!region) {
> > + return CXL_MBOX_INVALID_PA;
> > + }
> > +
> > + dpa -= ct3d->dc.regions[0].base;
> > + if (dpa % region->block_size || len % region->block_size) {
> > + return CXL_MBOX_INVALID_EXTENT_LIST;
> > + }
> > + /* the dpa range already covered by some other extents in the list */
> > + if (test_any_bits_set(blk_bitmap, dpa / min_block_size,
> > + len / min_block_size)) {
> > + return CXL_MBOX_INVALID_EXTENT_LIST;
> > + }
> > + bitmap_set(blk_bitmap, dpa / min_block_size, len / min_block_size);
> > + }
> > +
> > + return CXL_MBOX_SUCCESS;
> > +}
> > +
> > +/*
> > + * CXL r3.1 section 8.2.9.9.9.3: Add Dynamic Capacity Response (Opcode 4802h)
> > + * An extent is added to the extent list and becomes usable only after the
> > + * response is processed successfully
> > + */
> > +static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> > + uint8_t *payload_in,
> > + size_t len_in,
> > + uint8_t *payload_out,
> > + size_t *len_out,
> > + CXLCCI *cci)
> > +{
> > + CXLUpdateDCExtentListInPl *in = (void *)payload_in;
> > + CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
> > + CXLDCExtentList *extent_list = &ct3d->dc.extents;
> > + CXLDCExtent *ent;
> > + uint32_t i;
> > + uint64_t dpa, len;
> > + CXLRetCode ret;
> > +
> > + if (in->num_entries_updated == 0) {
> > + return CXL_MBOX_SUCCESS;
> > + }
> > +
> > + /* Adding extents causes exceeding device's extent tracking ability. */
> > + if (in->num_entries_updated + ct3d->dc.total_extent_count >
> > + CXL_NUM_EXTENTS_SUPPORTED) {
> > + return CXL_MBOX_RESOURCES_EXHAUSTED;
> > + }
> > +
> > + ret = cxl_detect_malformed_extent_list(ct3d, in);
> > + if (ret != CXL_MBOX_SUCCESS) {
> > + return ret;
> > + }
> > +
> > + for (i = 0; i < in->num_entries_updated; i++) {
> > + dpa = in->updated_entries[i].start_dpa;
> > + len = in->updated_entries[i].len;
> > +
> > + /*
> > + * Check if the DPA range of the to-be-added extent overlaps with
> > + * existing extent list maintained by the device.
> > + */
> > + QTAILQ_FOREACH(ent, extent_list, node) {
>
> There are too many checks in here for an overlapping test.
>
> Conditions are
>
> | Extent tested against |
> | Overlap entirely |
> | overlap left edge |
> | overlap right edge |
> Think of it in the inverse condition and it is easier to reason about.
>
> | Extent tested against |
> | to left |--- ---| to right |
>
> which I think is something like.
>
> if (!((dpa + len <= ent->start_dpa) || (dpa >= ent->start_dpa + ent->len))) {
> return CXL_MBOX_INVALID_PA;
> }
>
> Hmm. For internal tracking (not the exposed values) we should probably use
> struct range from include/qemu/range.h.
> Felt like there had to be something better than doing this ourselves so I went
> looking. Note it uses inclusive upper bound so be careful with that!
>
> Advantage is we get these checks for free.
> https://elixir.bootlin.com/qemu/latest/source/include/qemu/range.h#L152
> range_overlaps_range()
>
> There are functions to set them up nicely for us and by base and size
> as well which should tidy that part up.
>
>
>
> > + if (ent->start_dpa <= dpa &&
> > + dpa + len <= ent->start_dpa + ent->len) {
> > + return CXL_MBOX_INVALID_PA;
> > + /* Overlapping one end of the other */
> > + } else if ((dpa < ent->start_dpa + ent->len &&
> > + dpa + len > ent->start_dpa + ent->len) ||
> > + (dpa < ent->start_dpa && dpa + len > ent->start_dpa)) {
> > + return CXL_MBOX_INVALID_PA;
> > + }
> > + }
> > +
> > + /*
> > + * TODO: we will add a pending extent list based on event log record
> > + * and verify the input response; also, the "More" flag is not
> > + * considered at the moment.
> > + */
> > +
> > + cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
> > + ct3d->dc.total_extent_count += 1;
> > + }
> > +
> > + return CXL_MBOX_SUCCESS;
> > +}
> > +
> > +/*
> > + * CXL r3.1 section 8.2.9.9.9.4: Release Dynamic Capacity (Opcode 4803h)
> > + */
> > +static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
> > + uint8_t *payload_in,
> > + size_t len_in,
> > + uint8_t *payload_out,
> > + size_t *len_out,
> > + CXLCCI *cci)
> > +{
> > + CXLUpdateDCExtentListInPl *in = (void *)payload_in;
> > + CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
> > + CXLDCExtentList *extent_list = &ct3d->dc.extents;
> > + CXLDCExtent *ent;
> > + uint32_t i;
> > + uint64_t dpa, len;
> > + CXLRetCode ret;
> > +
> > + if (in->num_entries_updated == 0) {
> > + return CXL_MBOX_INVALID_INPUT;
> > + }
> > +
> > + ret = cxl_detect_malformed_extent_list(ct3d, in);
> > + if (ret != CXL_MBOX_SUCCESS) {
> > + return ret;
> > + }
> > +
> > + for (i = 0; i < in->num_entries_updated; i++) {
> > + bool found = false;
> > +
> > + dpa = in->updated_entries[i].start_dpa;
> > + len = in->updated_entries[i].len;
> > +
> > + QTAILQ_FOREACH(ent, extent_list, node) {
> > + /* Found the extent overlapping with */
> > + if (ent->start_dpa <= dpa && dpa < ent->start_dpa + ent->len) {
> > + if (dpa + len <= ent->start_dpa + ent->len) {
> > + /*
> > + * The incoming extent covers a portion of an extent
> > + * in the device extent list, remove only the overlapping
> > + * portion, meaning
> > + * 1. the portions that are not covered by the incoming
> > + * extent at both end of the original extent will become
> > + * new extents and inserted to the extent list; and
> > + * 2. the original extent is removed from the extent list;
> > + * 3. DC extent count is updated accordingly.
> > + */
> > + uint64_t ent_start_dpa = ent->start_dpa;
> > + uint64_t ent_len = ent->len;
> > + uint64_t len1 = dpa - ent_start_dpa;
> > + uint64_t len2 = ent_start_dpa + ent_len - dpa - len;
> > +
> > + /*
> > + * TODO: checking for possible extent overflow, will be
> > + * moved into a dedicated function of detecting extent
> > + * overflow.
> > + */
> > + if (len1 && len2 && ct3d->dc.total_extent_count ==
> > + CXL_NUM_EXTENTS_SUPPORTED) {
> > + return CXL_MBOX_RESOURCES_EXHAUSTED;
> > + }
> > +
> > + found = true;
> > + cxl_remove_extent_from_extent_list(extent_list, ent);
> > + ct3d->dc.total_extent_count -= 1;
> > +
> > + if (len1) {
> > + cxl_insert_extent_to_extent_list(extent_list,
> > + ent_start_dpa, len1,
> > + NULL, 0);
> > + ct3d->dc.total_extent_count += 1;
> > + }
> > + if (len2) {
> > + cxl_insert_extent_to_extent_list(extent_list, dpa + len,
> > + len2, NULL, 0);
> > + ct3d->dc.total_extent_count += 1;
> > + }
> > + break;
> Maybe this makes sense after the support below is added, but at this
> point in the series
> return CXL_MBOX_SUCCESS;
> then found isn't relevant so can drop that. Looks like you drop it later in the
> series anyway.
>
> > + } else {
> > + /*
> > + * TODO: we reject the attempt to remove an extent that
> > + * overlaps with multiple extents in the device for now,
> > + * once the bitmap indicating whether a DPA range is
> > + * covered by valid extents is introduced, will allow it.
> > + */
> > + return CXL_MBOX_INVALID_PA;
> > + }
> > + }
> > + }
> > +
> > + if (!found) {
> > + /* Try to remove a non-existing extent. */
> > + return CXL_MBOX_INVALID_PA;
> > + }
> > + }
> > +
> > + return CXL_MBOX_SUCCESS;
> > +}
> > +
>
> > static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index 102fa8151e..dccfaaad3a 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> > @@ -678,6 +678,16 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
> > return true;
> > }
> >
> > +static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
> > +{
> > + CXLDCExtent *ent;
> > +
> > + while (!QTAILQ_EMPTY(&ct3d->dc.extents)) {
> > + ent = QTAILQ_FIRST(&ct3d->dc.extents);
> > + cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
>
> Isn't this the same as something like:
> QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
> cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
> //This wrapper is small enough I'd be tempted to just have the
> //code inline at the places it's called.
>
We will have more to release after we introduce pending list as well as
bitmap. Keep it?
Fan
> }
> > + }
> > +}
>
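
For reference, the inverse-condition overlap test suggested in the review
above can be sketched in standalone C as below. The helper names are made up
for illustration and the IncRange type is only a local stand-in for an
inclusive-upper-bound range; it is not QEMU's struct Range and does not use
the range.h API. Both variants assume non-zero lengths and no wrap-around.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Half-open form: [a_start, a_start + a_len) vs [b_start, b_start + b_len).
 * Two ranges do NOT overlap only if one lies entirely to the left or
 * entirely to the right of the other, so overlap is the negation of that.
 */
static bool extents_overlap(uint64_t a_start, uint64_t a_len,
                            uint64_t b_start, uint64_t b_len)
{
    return !(a_start + a_len <= b_start || a_start >= b_start + b_len);
}

/* Inclusive form: both lob and upb belong to the range. */
typedef struct {
    uint64_t lob; /* lower bound, inclusive */
    uint64_t upb; /* upper bound, inclusive */
} IncRange;

/* Build an inclusive range from base and size; len must be non-zero. */
static IncRange inc_range(uint64_t start, uint64_t len)
{
    return (IncRange){ start, start + len - 1 };
}

/* Same inverse test, with strict comparisons because upb is inclusive. */
static bool inc_ranges_overlap(IncRange a, IncRange b)
{
    return !(a.upb < b.lob || b.upb < a.lob);
}
```

The only difference between the two forms is where the "- 1" and the strict
comparison end up, which is the detail to be careful with when converting
(start_dpa, len) pairs to an inclusive representation.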
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 08/13] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response
2024-03-06 22:34 ` fan
@ 2024-03-07 12:30 ` Jonathan Cameron via
0 siblings, 0 replies; 81+ messages in thread
From: Jonathan Cameron @ 2024-03-07 12:30 UTC (permalink / raw)
To: fan
Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
wj28.lee, Fan Ni
> > > +static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
> > > +{
> > > + CXLDCExtent *ent;
> > > +
> > > + while (!QTAILQ_EMPTY(&ct3d->dc.extents)) {
> > > + ent = QTAILQ_FIRST(&ct3d->dc.extents);
> > > + cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
> >
> > Isn't this the same as something like:
> > QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
> > cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
> > //This wrapper is small enough I'd be tempted to just have the
> > //code inline at the places it's called.
> >
> We will have more to release after we introduce pending list as well as
> bitmap. Keep it?
ok.
>
> Fan
>
> > }
> > > + }
> > > +}
> >
^ permalink raw reply [flat|nested] 81+ messages in thread
* [PATCH v5 09/13] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
2024-03-04 19:33 [PATCH v5 00/13] Enabling DCD emulation support in Qemu nifan.cxl
` (7 preceding siblings ...)
2024-03-04 19:34 ` [PATCH v5 08/13] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response nifan.cxl
@ 2024-03-04 19:34 ` nifan.cxl
2024-03-06 17:48 ` Jonathan Cameron via
2024-04-24 13:09 ` Markus Armbruster
2024-03-04 19:34 ` [PATCH v5 10/13] hw/mem/cxl_type3: Add dpa range validation for accesses to DC regions nifan.cxl
` (3 subsequent siblings)
12 siblings, 2 replies; 81+ messages in thread
From: nifan.cxl @ 2024-03-04 19:34 UTC (permalink / raw)
To: qemu-devel
Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni
From: Fan Ni <fan.ni@samsung.com>
Since fabric manager emulation is not supported yet, the change implements
the functions to add/release dynamic capacity extents as QMP interfaces.
Note: we skip any FM-issued extent release request if the exact extent
does not exist in the extent list of the device. We will loosen the
restriction later once we have partial release support in the kernel.
1. Add dynamic capacity extents:
For example, the command to add two contiguous extents (each 128MiB long)
to region 0 (starting at DPA offset 0) looks like below:
{ "execute": "qmp_capabilities" }
{ "execute": "cxl-add-dynamic-capacity",
"arguments": {
"path": "/machine/peripheral/cxl-dcd0",
"region-id": 0,
"extents": [
{
"offset": 0,
"len": 134217728
},
{
"offset": 134217728,
"len": 134217728
}
]
}
}
2. Release dynamic capacity extents:
For example, the command to release an extent of size 128MiB from region 0
(DPA offset 128MiB) looks like below:
{ "execute": "cxl-release-dynamic-capacity",
"arguments": {
"path": "/machine/peripheral/cxl-dcd0",
"region-id": 0,
"extents": [
{
"offset": 134217728,
"len": 134217728
}
]
}
}
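
As a rough illustration of the numbers used in the two examples above: 128MiB
is 134217728 bytes, and each extent is given as an offset into the region that
the device turns into a DPA by adding the region base. The sketch below
mirrors the per-extent sanity checks in this patch (non-zero length, block
size alignment, staying inside the region). The Region type, the MIB macro
and both helper names are hypothetical and only serve this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL) /* local helper, not QEMU's MiB macro */

/* Hypothetical, simplified view of a DC region. */
typedef struct {
    uint64_t base;       /* DPA of the start of the region */
    uint64_t len;        /* region length in bytes */
    uint64_t block_size; /* extent granularity in bytes */
} Region;

/* Check one (offset, len) pair against the region it targets. */
static bool extent_fits_region(const Region *r, uint64_t offset, uint64_t len)
{
    if (len == 0) {
        return false; /* zero-length extents are rejected */
    }
    if (offset % r->block_size || len % r->block_size) {
        return false; /* must align to the region block size */
    }
    /* Written this way so that offset + len cannot overflow. */
    if (offset > r->len || len > r->len - offset) {
        return false; /* extends beyond the region end */
    }
    return true;
}

/* Offsets in the QMP command are relative to the region, not absolute. */
static uint64_t extent_start_dpa(const Region *r, uint64_t offset)
{
    return r->base + offset;
}
```

For a region with a non-zero base, the second extent of the add example would
therefore start at base + 134217728 rather than at DPA 134217728.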
Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
hw/cxl/cxl-mailbox-utils.c | 26 ++--
hw/mem/cxl_type3.c | 245 +++++++++++++++++++++++++++++++++++-
hw/mem/cxl_type3_stubs.c | 14 +++
include/hw/cxl/cxl_device.h | 6 +
include/hw/cxl/cxl_events.h | 18 +++
qapi/cxl.json | 61 ++++++++-
6 files changed, 361 insertions(+), 9 deletions(-)
diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 8c59635a9f..53ebc526ae 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -1405,7 +1405,7 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
* Check whether any bit between addr[nr, nr+size) is set,
* return true if any bit is set, otherwise return false
*/
-static bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
+bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
unsigned long size)
{
unsigned long res = find_next_bit(addr, size + nr, nr);
@@ -1444,7 +1444,7 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
return NULL;
}
-static void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
+void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
uint64_t dpa,
uint64_t len,
uint8_t *tag,
@@ -1591,16 +1591,28 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
}
}
- /*
- * TODO: we will add a pending extent list based on event log record
- * and verify the input response; also, the "More" flag is not
- * considered at the moment.
- */
+ QTAILQ_FOREACH(ent, &ct3d->dc.extents_pending_to_add, node) {
+ if (ent->start_dpa <= dpa &&
+ dpa + len <= ent->start_dpa + ent->len) {
+ break;
+ }
+ }
+ if (!ent) {
+ return CXL_MBOX_INVALID_PA;
+ }
+
+ cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending_to_add,
+ ent);
cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
ct3d->dc.total_extent_count += 1;
}
+ /*
+ * TODO: extents_pending_to_add needs to be cleared so the extents not
+ * accepted can be reclaimed based on spec r3.1: 8.2.9.9.9.3
+ */
+
return CXL_MBOX_SUCCESS;
}
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index dccfaaad3a..e9c8994cdb 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -674,6 +674,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
ct3d->dc.total_capacity += region->len;
}
QTAILQ_INIT(&ct3d->dc.extents);
+ QTAILQ_INIT(&ct3d->dc.extents_pending_to_add);
return true;
}
@@ -686,6 +687,12 @@ static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
ent = QTAILQ_FIRST(&ct3d->dc.extents);
cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
}
+
+ while (!QTAILQ_EMPTY(&ct3d->dc.extents_pending_to_add)) {
+ ent = QTAILQ_FIRST(&ct3d->dc.extents_pending_to_add);
+ cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending_to_add,
+ ent);
+ }
}
static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
@@ -1451,7 +1458,8 @@ static int ct3d_qmp_cxl_event_log_enc(CxlEventLog log)
return CXL_EVENT_TYPE_FAIL;
case CXL_EVENT_LOG_FATAL:
return CXL_EVENT_TYPE_FATAL;
-/* DCD not yet supported */
+ case CXL_EVENT_LOG_DYNCAP:
+ return CXL_EVENT_TYPE_DYNAMIC_CAP;
default:
return -EINVAL;
}
@@ -1702,6 +1710,241 @@ void qmp_cxl_inject_memory_module_event(const char *path, CxlEventLog log,
}
}
+/* CXL r3.1 Table 8-50: Dynamic Capacity Event Record */
+static const QemuUUID dynamic_capacity_uuid = {
+ .data = UUID(0xca95afa7, 0xf183, 0x4018, 0x8c, 0x2f,
+ 0x95, 0x26, 0x8e, 0x10, 0x1a, 0x2a),
+};
+
+typedef enum CXLDCEventType {
+ DC_EVENT_ADD_CAPACITY = 0x0,
+ DC_EVENT_RELEASE_CAPACITY = 0x1,
+ DC_EVENT_FORCED_RELEASE_CAPACITY = 0x2,
+ DC_EVENT_REGION_CONFIG_UPDATED = 0x3,
+ DC_EVENT_ADD_CAPACITY_RSP = 0x4,
+ DC_EVENT_CAPACITY_RELEASED = 0x5,
+} CXLDCEventType;
+
+/*
+ * Check whether the exact extent exists in the list
+ * Return value: the extent pointer in the list; else null
+ */
+static CXLDCExtent *cxl_dc_extent_exists(CXLDCExtentList *list,
+ CXLDCExtentRaw *ext)
+{
+ CXLDCExtent *ent;
+
+ if (!ext || !list) {
+ return NULL;
+ }
+
+ QTAILQ_FOREACH(ent, list, node) {
+ if (ent->start_dpa != ext->start_dpa) {
+ continue;
+ }
+
+ /* Found exact extent */
+ return ent->len == ext->len ? ent : NULL;
+ }
+
+ return NULL;
+}
+
+/*
+ * The main function to process dynamic capacity event. Currently DC extents
+ * add/release requests are processed.
+ */
+static void qmp_cxl_process_dynamic_capacity(const char *path, CxlEventLog log,
+ CXLDCEventType type, uint16_t hid,
+ uint8_t rid,
+ CXLDCExtentRecordList *records,
+ Error **errp)
+{
+ Object *obj;
+ CXLEventDynamicCapacity dCap = {};
+ CXLEventRecordHdr *hdr = &dCap.hdr;
+ CXLType3Dev *dcd;
+ uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
+ uint32_t num_extents = 0;
+ CXLDCExtentRecordList *list;
+ g_autofree CXLDCExtentRaw *extents = NULL;
+ uint8_t enc_log;
+ uint64_t offset, len, block_size;
+ int i;
+ int rc;
+ g_autofree unsigned long *blk_bitmap = NULL;
+
+ obj = object_resolve_path(path, NULL);
+ if (!obj) {
+ error_setg(errp, "Unable to resolve path");
+ return;
+ }
+ if (!object_dynamic_cast(obj, TYPE_CXL_TYPE3)) {
+ error_setg(errp, "Path does not point to a valid CXL type3 device");
+ return;
+ }
+
+ dcd = CXL_TYPE3(obj);
+ if (!dcd->dc.num_regions) {
+ error_setg(errp, "No dynamic capacity support from the device");
+ return;
+ }
+
+ rc = ct3d_qmp_cxl_event_log_enc(log);
+ if (rc < 0) {
+ error_setg(errp, "Unhandled error log type");
+ return;
+ }
+ enc_log = rc;
+
+ if (rid >= dcd->dc.num_regions) {
+ error_setg(errp, "region id is too large");
+ return;
+ }
+ block_size = dcd->dc.regions[rid].block_size;
+
+ /* Sanity check and count the extents */
+ list = records;
+ while (list) {
+ offset = list->value->offset;
+ len = list->value->len;
+
+ if (len == 0) {
+ error_setg(errp, "extent with 0 length is not allowed");
+ return;
+ }
+
+ if (offset % block_size || len % block_size) {
+ error_setg(errp, "dpa or len is not aligned to region block size");
+ return;
+ }
+
+ if (offset + len > dcd->dc.regions[rid].len) {
+ error_setg(errp, "extent range is beyond the region end");
+ return;
+ }
+
+ num_extents++;
+ list = list->next;
+ }
+ if (num_extents == 0) {
+ error_setg(errp, "No extents found in the command");
+ return;
+ }
+
+ blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
+
+ /* Create Extent list for event being passed to host */
+ i = 0;
+ list = records;
+ extents = g_new0(CXLDCExtentRaw, num_extents);
+ while (list) {
+ CXLDCExtent *ent;
+ bool skip_extent = false;
+
+ offset = list->value->offset;
+ len = list->value->len;
+
+ extents[i].start_dpa = offset + dcd->dc.regions[rid].base;
+ extents[i].len = len;
+ memset(extents[i].tag, 0, 0x10);
+ extents[i].shared_seq = 0;
+
+ if (type == DC_EVENT_RELEASE_CAPACITY ||
+ type == DC_EVENT_FORCED_RELEASE_CAPACITY) {
+ /*
+ * if the extent is still pending to be added to the host,
+ * remove it from the pending extent list, so later when the add
+ * response for the extent arrives, the device can reject the
+ * extent as it is not in the pending list.
+ */
+ ent = cxl_dc_extent_exists(&dcd->dc.extents_pending_to_add,
+ &extents[i]);
+ if (ent) {
+ QTAILQ_REMOVE(&dcd->dc.extents_pending_to_add, ent, node);
+ g_free(ent);
+ skip_extent = true;
+ } else if (!cxl_dc_extent_exists(&dcd->dc.extents, &extents[i])) {
+ /* If the exact extent is not in the accepted list, skip */
+ skip_extent = true;
+ }
+ }
+
+ /* No duplicate or overlapped extents are allowed */
+ if (test_any_bits_set(blk_bitmap, offset / block_size,
+ len / block_size)) {
+ error_setg(errp, "duplicate or overlapped extents are detected");
+ return;
+ }
+ bitmap_set(blk_bitmap, offset / block_size, len / block_size);
+
+ list = list->next;
+ if (!skip_extent) {
+ i++;
+ }
+ }
+ num_extents = i;
+
+ /*
+ * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
+ *
+ * All Dynamic Capacity event records shall set the Event Record Severity
+ * field in the Common Event Record Format to Informational Event. All
+ * Dynamic Capacity related events shall be logged in the Dynamic Capacity
+ * Event Log.
+ */
+ cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
+ cxl_device_get_timestamp(&dcd->cxl_dstate));
+
+ dCap.type = type;
+ /* FIXME: for now, validity flag is cleared */
+ dCap.validity_flags = 0;
+ stw_le_p(&dCap.host_id, hid);
+ /* only valid for DC_REGION_CONFIG_UPDATED event */
+ dCap.updated_region_id = 0;
+ /*
+ * FIXME: for now, the "More" flag is cleared as there is only one
+ * extent associating with each record and tag-based release is
+ * not supported.
+ */
+ dCap.flags = 0;
+ for (i = 0; i < num_extents; i++) {
+ memcpy(&dCap.dynamic_capacity_extent, &extents[i],
+ sizeof(CXLDCExtentRaw));
+
+ if (type == DC_EVENT_ADD_CAPACITY) {
+ cxl_insert_extent_to_extent_list(&dcd->dc.extents_pending_to_add,
+ extents[i].start_dpa,
+ extents[i].len,
+ extents[i].tag,
+ extents[i].shared_seq);
+ }
+
+ if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
+ (CXLEventRecordRaw *)&dCap)) {
+ cxl_event_irq_assert(dcd);
+ }
+ }
+}
+
+void qmp_cxl_add_dynamic_capacity(const char *path, uint8_t region_id,
+ CXLDCExtentRecordList *records,
+ Error **errp)
+{
+ qmp_cxl_process_dynamic_capacity(path, CXL_EVENT_LOG_DYNCAP,
+ DC_EVENT_ADD_CAPACITY, 0,
+ region_id, records, errp);
+}
+
+void qmp_cxl_release_dynamic_capacity(const char *path, uint8_t region_id,
+ CXLDCExtentRecordList *records,
+ Error **errp)
+{
+ qmp_cxl_process_dynamic_capacity(path, CXL_EVENT_LOG_DYNCAP,
+ DC_EVENT_RELEASE_CAPACITY, 0,
+ region_id, records, errp);
+}
+
static void ct3_class_init(ObjectClass *oc, void *data)
{
DeviceClass *dc = DEVICE_CLASS(oc);
diff --git a/hw/mem/cxl_type3_stubs.c b/hw/mem/cxl_type3_stubs.c
index 3e1851e32b..d913b11b4d 100644
--- a/hw/mem/cxl_type3_stubs.c
+++ b/hw/mem/cxl_type3_stubs.c
@@ -67,3 +67,17 @@ void qmp_cxl_inject_correctable_error(const char *path, CxlCorErrorType type,
{
error_setg(errp, "CXL Type 3 support is not compiled in");
}
+
+void qmp_cxl_add_dynamic_capacity(const char *path, uint8_t region_id,
+ CXLDCExtentRecordList *records,
+ Error **errp)
+{
+ error_setg(errp, "CXL Type 3 support is not compiled in");
+}
+
+void qmp_cxl_release_dynamic_capacity(const char *path, uint8_t region_id,
+ CXLDCExtentRecordList *records,
+ Error **errp)
+{
+ error_setg(errp, "CXL Type 3 support is not compiled in");
+}
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 341260e6e4..b524c5e699 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -490,6 +490,7 @@ struct CXLType3Dev {
AddressSpace host_dc_as;
uint64_t total_capacity; /* 256M aligned */
CXLDCExtentList extents;
+ CXLDCExtentList extents_pending_to_add;
uint32_t total_extent_count;
uint32_t ext_list_gen_seq;
@@ -551,4 +552,9 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
CXLDCExtent *extent);
+void cxl_insert_extent_to_extent_list(CXLDCExtentList *list, uint64_t dpa,
+ uint64_t len, uint8_t *tag,
+ uint16_t shared_seq);
+bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
+ unsigned long size);
#endif
diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
index 5170b8dbf8..38cadaa0f3 100644
--- a/include/hw/cxl/cxl_events.h
+++ b/include/hw/cxl/cxl_events.h
@@ -166,4 +166,22 @@ typedef struct CXLEventMemoryModule {
uint8_t reserved[0x3d];
} QEMU_PACKED CXLEventMemoryModule;
+/*
+ * CXL r3.1 Table 8-50: Dynamic Capacity Event Record
+ * All fields little endian.
+ */
+typedef struct CXLEventDynamicCapacity {
+ CXLEventRecordHdr hdr;
+ uint8_t type;
+ uint8_t validity_flags;
+ uint16_t host_id;
+ uint8_t updated_region_id;
+ uint8_t flags;
+ uint8_t reserved2[2];
+ uint8_t dynamic_capacity_extent[0x28]; /* defined in cxl_device.h */
+ uint8_t reserved[0x18];
+ uint32_t extents_avail;
+ uint32_t tags_avail;
+} QEMU_PACKED CXLEventDynamicCapacity;
+
#endif /* CXL_EVENTS_H */
diff --git a/qapi/cxl.json b/qapi/cxl.json
index 8cc4c72fa9..2645004666 100644
--- a/qapi/cxl.json
+++ b/qapi/cxl.json
@@ -19,13 +19,16 @@
#
# @fatal: Fatal Event Log
#
+# @dyncap: Dynamic Capacity Event Log
+#
# Since: 8.1
##
{ 'enum': 'CxlEventLog',
'data': ['informational',
'warning',
'failure',
- 'fatal']
+ 'fatal',
+ 'dyncap']
}
##
@@ -361,3 +364,59 @@
##
{'command': 'cxl-inject-correctable-error',
'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
+
+##
+# @CXLDCExtentRecord:
+#
+# Record of a single extent to add/release
+#
+# @offset: offset of the extent from the start of the region
+# @len: length of the extent
+#
+# Since: 9.0
+##
+{ 'struct': 'CXLDCExtentRecord',
+ 'data': {
+ 'offset':'uint64',
+ 'len': 'uint64'
+ }
+}
+
+##
+# @cxl-add-dynamic-capacity:
+#
+# Command to start the add dynamic capacity extents flow. The host will
+# have to acknowledge the acceptance of the extents before they are usable.
+#
+# @path: CXL DCD canonical QOM path
+# @region-id: id of the region to add the extents to
+# @extents: Extents to add
+#
+# Since: 9.0
+##
+{ 'command': 'cxl-add-dynamic-capacity',
+ 'data': { 'path': 'str',
+ 'region-id': 'uint8',
+ 'extents': [ 'CXLDCExtentRecord' ]
+ }
+}
+
+##
+# @cxl-release-dynamic-capacity:
+#
+# Command to start the release dynamic capacity extents flow. The host will
+# need to respond to indicate that it has released the capacity before it
+# is made unavailable for read and write and can be re-added.
+#
+# @path: CXL DCD canonical QOM path
+# @region-id: id of the region to release the extents from
+# @extents: Extents to release
+#
+# Since: 9.0
+##
+{ 'command': 'cxl-release-dynamic-capacity',
+ 'data': { 'path': 'str',
+ 'region-id': 'uint8',
+ 'extents': [ 'CXLDCExtentRecord' ]
+ }
+}
--
2.43.0
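
A simplified, standalone sketch of the duplicate/overlap detection that the
QMP handler in this patch performs with a block bitmap: every extent marks
the blocks it covers, and hitting an already-marked block means two extents
in the same command overlap. To stay self-contained it uses one byte per
block and plain calloc() instead of QEMU's bitmap helpers, and the ExtentReq
type and function name are invented for this example. It assumes the extents
were already checked to be block aligned and inside the region, as the
sanity-check loop in the patch does before the bitmap is used.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical request entry: offset and length relative to the region. */
typedef struct {
    uint64_t offset;
    uint64_t len;
} ExtentReq;

/*
 * Return true if any two requested extents share at least one block.
 * Precondition: every extent is block aligned and lies within region_len.
 */
static bool extents_overlap_in_region(const ExtentReq *req, size_t n,
                                      uint64_t region_len, uint64_t block_size)
{
    size_t nblocks = region_len / block_size;
    unsigned char *used = calloc(nblocks ? nblocks : 1, 1);
    bool overlap = false;

    if (!used) {
        return true; /* be conservative if the map cannot be allocated */
    }

    for (size_t i = 0; i < n && !overlap; i++) {
        uint64_t first = req[i].offset / block_size;
        uint64_t count = req[i].len / block_size;

        for (uint64_t b = first; b < first + count; b++) {
            if (used[b]) {
                overlap = true; /* block already claimed by another extent */
                break;
            }
            used[b] = 1;
        }
    }

    free(used);
    return overlap;
}
```

The cost is proportional to the number of blocks touched rather than to the
square of the number of extents, which is the main reason to prefer a block
map over pairwise comparison when a command carries many extents.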
^ permalink raw reply related [flat|nested] 81+ messages in thread
* Re: [PATCH v5 09/13] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
2024-03-04 19:34 ` [PATCH v5 09/13] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents nifan.cxl
@ 2024-03-06 17:48 ` Jonathan Cameron via
2024-04-24 13:09 ` Markus Armbruster
1 sibling, 0 replies; 81+ messages in thread
From: Jonathan Cameron @ 2024-03-06 17:48 UTC (permalink / raw)
To: nifan.cxl
Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
wj28.lee, Fan Ni
On Mon, 4 Mar 2024 11:34:04 -0800
nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
>
> Since fabric manager emulation is not supported yet, the change implements
> the functions to add/release dynamic capacity extents as QMP interfaces.
We'll need them anyway; the alternative is implementing an FM interface via
QMP, which is going to be ugly and complex.
>
> Note: we skips any FM issued extent release request if the exact extent
> does not exist in the extent list of the device. We will loose the
> restriction later once we have partial release support in the kernel.
Maybe the kernel will treat it as a request to release the extent it
is tracking that contains it. So we may want to add a way to poke that.
Not today though!
>
> 1. Add dynamic capacity extents:
>
> For example, the command to add two continuous extents (each 128MiB long)
> to region 0 (starting at DPA offset 0) looks like below:
>
> { "execute": "qmp_capabilities" }
>
> { "execute": "cxl-add-dynamic-capacity",
> "arguments": {
> "path": "/machine/peripheral/cxl-dcd0",
> "region-id": 0,
> "extents": [
> {
> "dpa": 0,
> "len": 134217728
> },
> {
> "dpa": 134217728,
> "len": 134217728
> }
> ]
> }
> }
>
> 2. Release dynamic capacity extents:
>
> For example, the command to release an extent of size 128MiB from region 0
> (DPA offset 128MiB) look like below:
>
> { "execute": "cxl-release-dynamic-capacity",
> "arguments": {
> "path": "/machine/peripheral/cxl-dcd0",
> "region-id": 0,
> "extents": [
> {
> "dpa": 134217728,
> "len": 134217728
> }
> ]
> }
> }
>
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
...
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index dccfaaad3a..e9c8994cdb 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -674,6 +674,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
> ct3d->dc.total_capacity += region->len;
> }
> QTAILQ_INIT(&ct3d->dc.extents);
> + QTAILQ_INIT(&ct3d->dc.extents_pending_to_add);
>
> return true;
> }
> @@ -686,6 +687,12 @@ static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
> ent = QTAILQ_FIRST(&ct3d->dc.extents);
> cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
> }
> +
> + while (!QTAILQ_EMPTY(&ct3d->dc.extents_pending_to_add)) {
QTAILQ_FOREACH_SAFE
> + ent = QTAILQ_FIRST(&ct3d->dc.extents_pending_to_add);
> + cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending_to_add,
> + ent);
> + }
> }
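A minimal, self-contained sketch of the pattern the QTAILQ_FOREACH_SAFE
suggestion refers to: the successor is saved before the current node is freed,
so the loop body may drop the entry it is visiting. The Node type and the
FOREACH_SAFE macro below are hypothetical stand-ins written for illustration;
they are not the definitions from QEMU's queue.h.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an extent list entry; not the QEMU type. */
typedef struct Node {
    unsigned long start;
    struct Node *next;
} Node;

/* Save the successor first so the body may free the current node. */
#define FOREACH_SAFE(var, head, tmp) \
    for ((var) = (head); (var) && ((tmp) = (var)->next, 1); (var) = (tmp))

/* Prepend a node and return the new head. */
static Node *push(Node *head, unsigned long start)
{
    Node *n = malloc(sizeof(*n));

    if (!n) {
        abort();
    }
    n->start = start;
    n->next = head;
    return n;
}

/* Free every node and return how many were freed. */
static int drain(Node **head)
{
    Node *ent, *tmp;
    int freed = 0;

    FOREACH_SAFE(ent, *head, tmp) {
        free(ent);
        freed++;
    }
    *head = NULL;
    return freed;
}
```

Compared with the while (!EMPTY) / FIRST / remove loop, this form makes it
explicit that the list is walked once and every entry is released.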
> +/*
> + * The main function to process dynamic capacity event. Currently DC extents
> + * add/release requests are processed.
> + */
> +static void qmp_cxl_process_dynamic_capacity(const char *path, CxlEventLog log,
> + CXLDCEventType type, uint16_t hid,
> + uint8_t rid,
> + CXLDCExtentRecordList *records,
> + Error **errp)
> +{
> + Object *obj;
> + CXLEventDynamicCapacity dCap = {};
> + CXLEventRecordHdr *hdr = &dCap.hdr;
> + CXLType3Dev *dcd;
> + uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
> + uint32_t num_extents = 0;
> + CXLDCExtentRecordList *list;
> + g_autofree CXLDCExtentRaw *extents = NULL;
> + uint8_t enc_log;
> + uint64_t offset, len, block_size;
> + int i;
> + int rc;
Combine the two lines above.
> + g_autofree unsigned long *blk_bitmap = NULL;
> +
> + obj = object_resolve_path(path, NULL);
> + if (!obj) {
> + error_setg(errp, "Unable to resolve path");
> + return;
> + }
Use object_resolve_path_type() and skip a step (we should do this in various
places in our existing code!)
> + if (!object_dynamic_cast(obj, TYPE_CXL_TYPE3)) {
> + error_setg(errp, "Path not point to a valid CXL type3 device");
> + return;
> + }
> +
> + dcd = CXL_TYPE3(obj);
> + if (!dcd->dc.num_regions) {
> + error_setg(errp, "No dynamic capacity support from the device");
> + return;
> + }
> +
> + rc = ct3d_qmp_cxl_event_log_enc(log);
> + if (rc < 0) {
> + error_setg(errp, "Unhandled error log type");
> + return;
> + }
> + enc_log = rc;
> +
> + if (rid >= dcd->dc.num_regions) {
> + error_setg(errp, "region id is too large");
> + return;
> + }
> + block_size = dcd->dc.regions[rid].block_size;
> +
> + /* Sanity check and count the extents */
> + list = records;
> + while (list) {
> + offset = list->value->offset;
> + len = list->value->len;
> +
> + if (len == 0) {
> + error_setg(errp, "extent with 0 length is not allowed");
> + return;
> + }
> +
> + if (offset % block_size || len % block_size) {
> + error_setg(errp, "dpa or len is not aligned to region block size");
> + return;
> + }
> +
> + if (offset + len > dcd->dc.regions[rid].len) {
> + error_setg(errp, "extent range is beyond the region end");
> + return;
> + }
> +
> + num_extents++;
> + list = list->next;
> + }
> + if (num_extents == 0) {
> + error_setg(errp, "No extents found in the command");
> + return;
> + }
> +
> + blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
> +
> + /* Create Extent list for event being passed to host */
> + i = 0;
> + list = records;
> + extents = g_new0(CXLDCExtentRaw, num_extents);
> + while (list) {
> + CXLDCExtent *ent;
> + bool skip_extent = false;
> +
> + offset = list->value->offset;
> + len = list->value->len;
> +
> + extents[i].start_dpa = offset + dcd->dc.regions[rid].base;
> + extents[i].len = len;
> + memset(extents[i].tag, 0, 0x10);
> + extents[i].shared_seq = 0;
> +
> + if (type == DC_EVENT_RELEASE_CAPACITY ||
> + type == DC_EVENT_FORCED_RELEASE_CAPACITY) {
> + /*
> + * if the extent is still pending to be added to the host,
Odd spacing.
> + * remove it from the pending extent list, so later when the add
> + * response for the extent arrives, the device can reject the
> + * extent as it is not in the pending list.
> + */
> + ent = cxl_dc_extent_exists(&dcd->dc.extents_pending_to_add,
> + &extents[i]);
> + if (ent) {
> + QTAILQ_REMOVE(&dcd->dc.extents_pending_to_add, ent, node);
> + g_free(ent);
> + skip_extent = true;
> + } else if (!cxl_dc_extent_exists(&dcd->dc.extents, &extents[i])) {
> + /* If the exact extent is not in the accepted list, skip */
> + skip_extent = true;
> + }
I think we need to reject the case where some extents are skipped and others
are not. That's not supported yet, so we need to at least complain if we get it.
Maybe we need to do two passes so we know early that this has happened (or
perhaps this is a later patch, in which case a todo here would help).
> +
> +
> + /* No duplicate or overlapped extents are allowed */
> + if (test_any_bits_set(blk_bitmap, offset / block_size,
> + len / block_size)) {
> + error_setg(errp, "duplicate or overlapped extents are detected");
> + return;
> + }
> + bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> +
> + list = list->next;
> + if (!skip_extent) {
> + i++;
The problem is that if we skip one in the middle, the records will be wrong below.
> + }
> + }
> + num_extents = i;
> +
> + /*
> + * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
> + *
> + * All Dynamic Capacity event records shall set the Event Record Severity
> + * field in the Common Event Record Format to Informational Event. All
> + * Dynamic Capacity related events shall be logged in the Dynamic Capacity
> + * Event Log.
> + */
> + cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
> + cxl_device_get_timestamp(&dcd->cxl_dstate));
> +
> + dCap.type = type;
> + /* FIXME: for now, validity flag is cleared */
> + dCap.validity_flags = 0;
> + stw_le_p(&dCap.host_id, hid);
> + /* only valid for DC_REGION_CONFIG_UPDATED event */
> + dCap.updated_region_id = 0;
> + /*
> + * FIXME: for now, the "More" flag is cleared as there is only one
> + * extent associating with each record and tag-based release is
> + * not supported.
Hmm. Seems like tag support would be easy. Add an optional QMP parameter;
if a tag is set, we set the More flag for all but the last entry in this
loop. I'm ok with that being a follow-up patch though.
> + */
> + dCap.flags = 0;
> + for (i = 0; i < num_extents; i++) {
> + memcpy(&dCap.dynamic_capacity_extent, &extents[i],
> + sizeof(CXLDCExtentRaw));
> +
> + if (type == DC_EVENT_ADD_CAPACITY) {
> + cxl_insert_extent_to_extent_list(&dcd->dc.extents_pending_to_add,
> + extents[i].start_dpa,
> + extents[i].len,
> + extents[i].tag,
> + extents[i].shared_seq);
> + }
> +
> + if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
> + (CXLEventRecordRaw *)&dCap)) {
> + cxl_event_irq_assert(dcd);
> + }
> + }
> +}
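To illustrate the More flag suggestion above, here is a rough sketch of setting
the flag on every record of a group except the last. The FLAG_MORE bit position
and the fill_more_flags() helper are assumptions made for this example only;
they are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit position for the "More" flag; illustrative only. */
#define FLAG_MORE (1u << 0)

/*
 * For a group of n records that belong together, mark every record but
 * the last one as having more records to follow.
 */
static void fill_more_flags(uint8_t *flags, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        flags[i] = (i + 1 < n) ? FLAG_MORE : 0;
    }
}
```

With a single extent per request the flag stays 0, which matches what the
patch does today.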
> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index 341260e6e4..b524c5e699 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -490,6 +490,7 @@ struct CXLType3Dev {
> AddressSpace host_dc_as;
> uint64_t total_capacity; /* 256M aligned */
> CXLDCExtentList extents;
> + CXLDCExtentList extents_pending_to_add;
Long name, extents_pending or just pending is plenty I think.
> uint32_t total_extent_count;
> uint32_t ext_list_gen_seq;
>
> @@ -551,4 +552,9 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
>
> void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
> CXLDCExtent *extent);
> +void cxl_insert_extent_to_extent_list(CXLDCExtentList *list, uint64_t dpa,
> + uint64_t len, uint8_t *tag,
> + uint16_t shared_seq);
> +bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
> + unsigned long size);
> #endif
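The body of test_any_bits_set() is not quoted here, so the following is only a
sketch of how the duplicate/overlap check appears to work, assuming the helper
returns true when any bit in [nr, nr + size) is set. The names any_bits_set(),
set_bits() and claim_extent() are hypothetical and stand in for the QEMU
bitmap helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Assumed semantics: true if any bit in [nr, nr + size) is set. */
static bool any_bits_set(const unsigned long *map, unsigned long nr,
                         unsigned long size)
{
    for (unsigned long b = nr; b < nr + size; b++) {
        if (map[b / BITS_PER_WORD] & (1UL << (b % BITS_PER_WORD))) {
            return true;
        }
    }
    return false;
}

/* Set every bit in [nr, nr + size). */
static void set_bits(unsigned long *map, unsigned long nr, unsigned long size)
{
    for (unsigned long b = nr; b < nr + size; b++) {
        map[b / BITS_PER_WORD] |= 1UL << (b % BITS_PER_WORD);
    }
}

/*
 * Mark the blocks of one extent. Returns false if any block was already
 * claimed by an earlier extent in the same request. offset and len are
 * assumed to be block aligned already.
 */
static bool claim_extent(unsigned long *map, unsigned long offset,
                         unsigned long len, unsigned long block_size)
{
    unsigned long first = offset / block_size;
    unsigned long count = len / block_size;

    if (any_bits_set(map, first, count)) {
        return false;
    }
    set_bits(map, first, count);
    return true;
}
```

One bit per block keeps the check linear in the size of the request rather
than comparing every pair of extents.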
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 09/13] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
2024-03-06 17:48 ` Jonathan Cameron via
(?)
@ 2024-03-06 23:15 ` fan
2024-03-07 12:45 ` Jonathan Cameron via
-1 siblings, 1 reply; 81+ messages in thread
From: fan @ 2024-03-06 23:15 UTC (permalink / raw)
To: Jonathan Cameron
Cc: nifan.cxl, qemu-devel, linux-cxl, gregory.price, ira.weiny,
dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
Jorgen.Hansen, wj28.lee, Fan Ni
On Wed, Mar 06, 2024 at 05:48:11PM +0000, Jonathan Cameron wrote:
> On Mon, 4 Mar 2024 11:34:04 -0800
> nifan.cxl@gmail.com wrote:
>
> > From: Fan Ni <fan.ni@samsung.com>
> >
> > Since fabric manager emulation is not supported yet, the change implements
> > the functions to add/release dynamic capacity extents as QMP interfaces.
>
> We'll need them anyway, or to implement an fm interface via QMP which is
> going to be ugly and complex.
>
> >
> > Note: we skips any FM issued extent release request if the exact extent
> > does not exist in the extent list of the device. We will loose the
> > restriction later once we have partial release support in the kernel.
>
> Maybe the kernel will treat it as a request to release the extent it
> is tracking that contains it. So we may want to add a way to poke that.
> Not today though!
>
> >
> > 1. Add dynamic capacity extents:
> >
> > For example, the command to add two continuous extents (each 128MiB long)
> > to region 0 (starting at DPA offset 0) looks like below:
> >
> > { "execute": "qmp_capabilities" }
> >
> > { "execute": "cxl-add-dynamic-capacity",
> > "arguments": {
> > "path": "/machine/peripheral/cxl-dcd0",
> > "region-id": 0,
> > "extents": [
> > {
> > "dpa": 0,
> > "len": 134217728
> > },
> > {
> > "dpa": 134217728,
> > "len": 134217728
> > }
> > ]
> > }
> > }
> >
> > 2. Release dynamic capacity extents:
> >
> > For example, the command to release an extent of size 128MiB from region 0
> > (DPA offset 128MiB) look like below:
> >
> > { "execute": "cxl-release-dynamic-capacity",
> > "arguments": {
> > "path": "/machine/peripheral/cxl-dcd0",
> > "region-id": 0,
> > "extents": [
> > {
> > "dpa": 134217728,
> > "len": 134217728
> > }
> > ]
> > }
> > }
> >
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
>
> ...
>
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index dccfaaad3a..e9c8994cdb 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> > @@ -674,6 +674,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
> > ct3d->dc.total_capacity += region->len;
> > }
> > QTAILQ_INIT(&ct3d->dc.extents);
> > + QTAILQ_INIT(&ct3d->dc.extents_pending_to_add);
> >
> > return true;
> > }
> > @@ -686,6 +687,12 @@ static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
> > ent = QTAILQ_FIRST(&ct3d->dc.extents);
> > cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
> > }
> > +
> > + while (!QTAILQ_EMPTY(&ct3d->dc.extents_pending_to_add)) {
>
> QTAILQ_FOREACH_SAFE
>
> > + ent = QTAILQ_FIRST(&ct3d->dc.extents_pending_to_add);
> > + cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending_to_add,
> > + ent);
> > + }
> > }
>
> > +/*
> > + * The main function to process dynamic capacity event. Currently DC extents
> > + * add/release requests are processed.
> > + */
> > +static void qmp_cxl_process_dynamic_capacity(const char *path, CxlEventLog log,
> > + CXLDCEventType type, uint16_t hid,
> > + uint8_t rid,
> > + CXLDCExtentRecordList *records,
> > + Error **errp)
> > +{
> > + Object *obj;
> > + CXLEventDynamicCapacity dCap = {};
> > + CXLEventRecordHdr *hdr = &dCap.hdr;
> > + CXLType3Dev *dcd;
> > + uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
> > + uint32_t num_extents = 0;
> > + CXLDCExtentRecordList *list;
> > + g_autofree CXLDCExtentRaw *extents = NULL;
> > + uint8_t enc_log;
> > + uint64_t offset, len, block_size;
> > + int i;
> > + int rc;
>
> Combine the two lines above.
>
> > + g_autofree unsigned long *blk_bitmap = NULL;
> > +
> > + obj = object_resolve_path(path, NULL);
> > + if (!obj) {
> > + error_setg(errp, "Unable to resolve path");
> > + return;
> > + }
>
> object_resolve_path_type() and skip a step (should do this in various places
> in our existing code!)
>
> > + if (!object_dynamic_cast(obj, TYPE_CXL_TYPE3)) {
> > + error_setg(errp, "Path not point to a valid CXL type3 device");
> > + return;
> > + }
> > +
> > + dcd = CXL_TYPE3(obj);
> > + if (!dcd->dc.num_regions) {
> > + error_setg(errp, "No dynamic capacity support from the device");
> > + return;
> > + }
> > +
> > + rc = ct3d_qmp_cxl_event_log_enc(log);
> > + if (rc < 0) {
> > + error_setg(errp, "Unhandled error log type");
> > + return;
> > + }
> > + enc_log = rc;
> > +
> > + if (rid >= dcd->dc.num_regions) {
> > + error_setg(errp, "region id is too large");
> > + return;
> > + }
> > + block_size = dcd->dc.regions[rid].block_size;
> > +
> > + /* Sanity check and count the extents */
> > + list = records;
> > + while (list) {
> > + offset = list->value->offset;
> > + len = list->value->len;
> > +
> > + if (len == 0) {
> > + error_setg(errp, "extent with 0 length is not allowed");
> > + return;
> > + }
> > +
> > + if (offset % block_size || len % block_size) {
> > + error_setg(errp, "dpa or len is not aligned to region block size");
> > + return;
> > + }
> > +
> > + if (offset + len > dcd->dc.regions[rid].len) {
> > + error_setg(errp, "extent range is beyond the region end");
> > + return;
> > + }
> > +
> > + num_extents++;
> > + list = list->next;
> > + }
> > + if (num_extents == 0) {
> > + error_setg(errp, "No extents found in the command");
> > + return;
> > + }
> > +
> > + blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
> > +
> > + /* Create Extent list for event being passed to host */
> > + i = 0;
> > + list = records;
> > + extents = g_new0(CXLDCExtentRaw, num_extents);
> > + while (list) {
> > + CXLDCExtent *ent;
> > + bool skip_extent = false;
> > +
> > + offset = list->value->offset;
> > + len = list->value->len;
> > +
> > + extents[i].start_dpa = offset + dcd->dc.regions[rid].base;
> > + extents[i].len = len;
> > + memset(extents[i].tag, 0, 0x10);
> > + extents[i].shared_seq = 0;
> > +
> > + if (type == DC_EVENT_RELEASE_CAPACITY ||
> > + type == DC_EVENT_FORCED_RELEASE_CAPACITY) {
> > + /*
> > + * if the extent is still pending to be added to the host,
>
> Odd spacing.
>
> > + * remove it from the pending extent list, so later when the add
> > + * response for the extent arrives, the device can reject the
> > + * extent as it is not in the pending list.
> > + */
> > + ent = cxl_dc_extent_exists(&dcd->dc.extents_pending_to_add,
> > + &extents[i]);
> > + if (ent) {
> > + QTAILQ_REMOVE(&dcd->dc.extents_pending_to_add, ent, node);
> > + g_free(ent);
> > + skip_extent = true;
> > + } else if (!cxl_dc_extent_exists(&dcd->dc.extents, &extents[i])) {
> > + /* If the exact extent is not in the accepted list, skip */
> > + skip_extent = true;
> > + }
> I think we need to reject case of some extents skipped and others not.
> That's not supported yet so we need to complain if we get it at least. Maybe we need
> to do two passes so we know this has happened early (or perhaps this is a later
> patch in which case a todo here would help).
Skip here does not mean the extent is invalid; it just means the extent
is still pending to add, so removing it from the pending list is
enough to reject the extent, with no need to release it further. That is
based on your feedback on v4.
The loop here only collects the extents to send to the event log.
But as you said, we need one pass before updating the pending list.
Actually, if we do not allow the above case, where an extent to release is
still in the pending-to-add list, we can just return here with an error;
no extra dry run is needed.
What do you think?
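For the "one pass before updating the pending list" option, a rough sketch of
the shape I have in mind, using simplified hypothetical types (Req, Region)
rather than the real structures: the first pass only reads, and the second
pass, which changes state, runs only if every request passed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified types, not the QEMU structures. */
typedef struct {
    uint64_t offset;
    uint64_t len;
} Req;

typedef struct {
    uint64_t region_len;
    uint64_t block_size;
    size_t committed;      /* number of requests applied so far */
} Region;

static bool req_valid(const Region *r, const Req *q)
{
    if (q->len == 0 || q->offset % r->block_size || q->len % r->block_size) {
        return false;
    }
    /* Written this way so that offset + len cannot wrap around. */
    return q->offset <= r->region_len && q->len <= r->region_len - q->offset;
}

/* Pass 1 only reads; pass 2 mutates and runs only if pass 1 succeeded. */
static bool apply_all(Region *r, const Req *reqs, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        if (!req_valid(r, &reqs[k])) {
            return false;   /* nothing has been modified yet */
        }
    }
    for (size_t k = 0; k < n; k++) {
        r->committed++;
    }
    return true;
}
```

The point is only that a failure on the second extent leaves the device state
exactly as it was before the command.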
>
> > +
> > +
> > + /* No duplicate or overlapped extents are allowed */
> > + if (test_any_bits_set(blk_bitmap, offset / block_size,
> > + len / block_size)) {
> > + error_setg(errp, "duplicate or overlapped extents are detected");
> > + return;
> > + }
> > + bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> > +
> > + list = list->next;
> > + if (!skip_extent) {
> > + i++;
> Problem is if we skip one in the middle the records will be wrong below.
Why? Only extents that pass the check are kept in the extents array and
processed further, and only then is i advanced.
For skipped ones, since i is not advanced, they will be
overwritten by following valid ones.
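A stripped-down sketch of that indexing pattern, with a hypothetical Extent
type and skip predicate in place of the real ones: each candidate is written
to out[i], and i only moves on when the candidate is kept.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the raw extent record. */
typedef struct {
    uint64_t start_dpa;
    uint64_t len;
} Extent;

/*
 * Copy each candidate into out[i], but only advance i when the candidate
 * is kept. A skipped candidate is overwritten by the next one, and a
 * skipped trailing candidate is ignored because the returned count is i.
 * out must have room for n entries.
 */
static size_t collect_kept(const Extent *in, size_t n,
                           bool (*skip)(const Extent *), Extent *out)
{
    size_t i = 0;

    for (size_t k = 0; k < n; k++) {
        out[i] = in[k];
        if (!skip(&out[i])) {
            i++;
        }
    }
    return i;
}

/* Example predicate: pretend the extent starting at 128 is still pending. */
static bool skip_pending(const Extent *e)
{
    return e->start_dpa == 128;
}
```

Skipping the middle entry of three leaves two packed records and a count of
two, so the later loop never sees the stale slot.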
Fan
> > + }
> > + }
> > + num_extents = i;
> > +
> > + /*
> > + * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
> > + *
> > + * All Dynamic Capacity event records shall set the Event Record Severity
> > + * field in the Common Event Record Format to Informational Event. All
> > + * Dynamic Capacity related events shall be logged in the Dynamic Capacity
> > + * Event Log.
> > + */
> > + cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
> > + cxl_device_get_timestamp(&dcd->cxl_dstate));
> > +
> > + dCap.type = type;
> > + /* FIXME: for now, validity flag is cleared */
> > + dCap.validity_flags = 0;
> > + stw_le_p(&dCap.host_id, hid);
> > + /* only valid for DC_REGION_CONFIG_UPDATED event */
> > + dCap.updated_region_id = 0;
> > + /*
> > + * FIXME: for now, the "More" flag is cleared as there is only one
> > + * extent associating with each record and tag-based release is
> > + * not supported.
>
> Hmm. Seems like tag support would be easy. Add an optional qmp parameter,
> if a tag is set, we set the more flag for all but the last entry in this
> loop. I'm ok with that being a follow up patch though.
>
> > + */
> > + dCap.flags = 0;
> > + for (i = 0; i < num_extents; i++) {
> > + memcpy(&dCap.dynamic_capacity_extent, &extents[i],
> > + sizeof(CXLDCExtentRaw));
> > +
> > + if (type == DC_EVENT_ADD_CAPACITY) {
> > + cxl_insert_extent_to_extent_list(&dcd->dc.extents_pending_to_add,
> > + extents[i].start_dpa,
> > + extents[i].len,
> > + extents[i].tag,
> > + extents[i].shared_seq);
> > + }
> > +
> > + if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
> > + (CXLEventRecordRaw *)&dCap)) {
> > + cxl_event_irq_assert(dcd);
> > + }
> > + }
> > +}
>
>
>
>
> > diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> > index 341260e6e4..b524c5e699 100644
> > --- a/include/hw/cxl/cxl_device.h
> > +++ b/include/hw/cxl/cxl_device.h
> > @@ -490,6 +490,7 @@ struct CXLType3Dev {
> > AddressSpace host_dc_as;
> > uint64_t total_capacity; /* 256M aligned */
> > CXLDCExtentList extents;
> > + CXLDCExtentList extents_pending_to_add;
>
> Long name, extents_pending or just pending is plenty I think.
>
> > uint32_t total_extent_count;
> > uint32_t ext_list_gen_seq;
> >
> > @@ -551,4 +552,9 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
> >
> > void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
> > CXLDCExtent *extent);
> > +void cxl_insert_extent_to_extent_list(CXLDCExtentList *list, uint64_t dpa,
> > + uint64_t len, uint8_t *tag,
> > + uint16_t shared_seq);
> > +bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
> > + unsigned long size);
> > #endif
>
>
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 09/13] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
2024-03-06 23:15 ` fan
@ 2024-03-07 12:45 ` Jonathan Cameron via
0 siblings, 0 replies; 81+ messages in thread
From: Jonathan Cameron @ 2024-03-07 12:45 UTC (permalink / raw)
To: fan
Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
wj28.lee, Fan Ni
...
> > > + list = records;
> > > + extents = g_new0(CXLDCExtentRaw, num_extents);
> > > + while (list) {
> > > + CXLDCExtent *ent;
> > > + bool skip_extent = false;
> > > +
> > > + offset = list->value->offset;
> > > + len = list->value->len;
> > > +
> > > + extents[i].start_dpa = offset + dcd->dc.regions[rid].base;
> > > + extents[i].len = len;
> > > + memset(extents[i].tag, 0, 0x10);
> > > + extents[i].shared_seq = 0;
> > > +
> > > + if (type == DC_EVENT_RELEASE_CAPACITY ||
> > > + type == DC_EVENT_FORCED_RELEASE_CAPACITY) {
> > > + /*
> > > + * if the extent is still pending to be added to the host,
> >
> > Odd spacing.
> >
> > > + * remove it from the pending extent list, so later when the add
> > > + * response for the extent arrives, the device can reject the
> > > + * extent as it is not in the pending list.
> > > + */
> > > + ent = cxl_dc_extent_exists(&dcd->dc.extents_pending_to_add,
> > > + &extents[i]);
> > > + if (ent) {
> > > + QTAILQ_REMOVE(&dcd->dc.extents_pending_to_add, ent, node);
> > > + g_free(ent);
> > > + skip_extent = true;
> > > + } else if (!cxl_dc_extent_exists(&dcd->dc.extents, &extents[i])) {
> > > + /* If the exact extent is not in the accepted list, skip */
> > > + skip_extent = true;
> > > + }
> > I think we need to reject case of some extents skipped and others not.
> > That's not supported yet so we need to complain if we get it at least. Maybe we need
> > to do two passes so we know this has happened early (or perhaps this is a later
> > patch in which case a todo here would help).
>
> Skip here does not mean the extent is invalid, it just means the extent
> is still pending to add, so remove them from pending list would be
> enough to reject the extent, no need to release further. That is based
> on your feedback on v4.
Ah. I'd misunderstood.
>
> The loop here is only to collect the extents to sent to the event log.
> But as you said, we need one pass before updating pending list.
> Actually if we do not allow the above case where extents to release is
> still in the pending to add list, we can just return here with error, no
> extra dry run needed.
>
> What do you think?
I think we need a way to back out extents from the pending to add list
so we can create the race where they are offered to the OS and it takes
forever to accept and by the time it does we've removed them.
>
> >
> > > +
> > > +
> > > + /* No duplicate or overlapped extents are allowed */
> > > + if (test_any_bits_set(blk_bitmap, offset / block_size,
> > > + len / block_size)) {
> > > + error_setg(errp, "duplicate or overlapped extents are detected");
> > > + return;
> > > + }
> > > + bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> > > +
> > > + list = list->next;
> > > + if (!skip_extent) {
> > > + i++;
> > Problem is if we skip one in the middle the records will be wrong below.
>
> Why? Only extents passed the check will be stored in variable extents and
> processed further and i be updated.
> For skipped ones, since i is not updated, they will be
> overwritten by following valid ones.
Ah. I'd missed the fact you store into the extent without a check on validity
but only move the index on if they were valid. Then rely on not passing a trailing
entry at the end.
It would be more readable, I think, if local variables were used for the parameters
until we've decided not to skip, and then this ended with
if (!skip_extent) {
    extents[i] = (CXLDCExtentRaw) {
        .start_dpa = ...
        ...
    };
    i++;
}
We have local len already so probably just need
uint64_t start_dpa = offset + dcd->dc.regions[rid].base;
Also maybe skip_extent_evlog or something like that to explain we are only
skipping that part.
Helps people like me who read it completely wrong!
Jonathan
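A minimal sketch of the restructuring suggested above, under stated assumptions: ExtentRaw and ExtentRecord are simplified, hypothetical stand-ins for CXLDCExtentRaw and the QMP record list, and the skip decision is passed in as a precomputed flag array rather than derived from the pending/accepted list lookups.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CXLDCExtentRaw; the field layout is assumed. */
typedef struct {
    uint64_t start_dpa;
    uint64_t len;
    uint8_t tag[0x10];
    uint16_t shared_seq;
} ExtentRaw;

/* Hypothetical input record: offset within the region plus length. */
typedef struct {
    uint64_t offset;
    uint64_t len;
} ExtentRecord;

/*
 * Copy only the records that should reach the event log into out[].
 * skip_evlog[r] is true when record r must not be logged.
 * Returns the number of entries written to out[].
 */
static size_t collect_extents(const ExtentRecord *recs,
                              const bool *skip_evlog, size_t n,
                              uint64_t region_base, ExtentRaw *out)
{
    size_t i = 0;

    assert(recs && skip_evlog && out);
    for (size_t r = 0; r < n; r++) {
        /* Work on locals until we know the extent will be kept. */
        uint64_t start_dpa = recs[r].offset + region_base;
        uint64_t len = recs[r].len;

        if (!skip_evlog[r]) {
            /* The compound literal zero-fills tag and shared_seq. */
            out[i] = (ExtentRaw) {
                .start_dpa = start_dpa,
                .len = len,
            };
            i++;
        }
    }
    return i;
}
```

With this shape nothing is written to out[i] for a skipped record, so the code no longer depends on a later valid record overwriting a stale slot.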
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 09/13] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
2024-03-07 12:45 ` Jonathan Cameron via
(?)
@ 2024-03-09 4:35 ` fan
2024-03-12 12:37 ` Jonathan Cameron via
-1 siblings, 1 reply; 81+ messages in thread
From: fan @ 2024-03-09 4:35 UTC (permalink / raw)
To: Jonathan Cameron
Cc: fan, qemu-devel, linux-cxl, gregory.price, ira.weiny,
dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
Jorgen.Hansen, wj28.lee, Fan Ni
On Thu, Mar 07, 2024 at 12:45:55PM +0000, Jonathan Cameron wrote:
> ...
>
> > > > + list = records;
> > > > + extents = g_new0(CXLDCExtentRaw, num_extents);
> > > > + while (list) {
> > > > + CXLDCExtent *ent;
> > > > + bool skip_extent = false;
> > > > +
> > > > + offset = list->value->offset;
> > > > + len = list->value->len;
> > > > +
> > > > + extents[i].start_dpa = offset + dcd->dc.regions[rid].base;
> > > > + extents[i].len = len;
> > > > + memset(extents[i].tag, 0, 0x10);
> > > > + extents[i].shared_seq = 0;
> > > > +
> > > > + if (type == DC_EVENT_RELEASE_CAPACITY ||
> > > > + type == DC_EVENT_FORCED_RELEASE_CAPACITY) {
> > > > + /*
> > > > + * if the extent is still pending to be added to the host,
> > >
> > > Odd spacing.
> > >
> > > > + * remove it from the pending extent list, so later when the add
> > > > + * response for the extent arrives, the device can reject the
> > > > + * extent as it is not in the pending list.
> > > > + */
> > > > + ent = cxl_dc_extent_exists(&dcd->dc.extents_pending_to_add,
> > > > + &extents[i]);
> > > > + if (ent) {
> > > > + QTAILQ_REMOVE(&dcd->dc.extents_pending_to_add, ent, node);
> > > > + g_free(ent);
> > > > + skip_extent = true;
> > > > + } else if (!cxl_dc_extent_exists(&dcd->dc.extents, &extents[i])) {
> > > > + /* If the exact extent is not in the accepted list, skip */
> > > > + skip_extent = true;
> > > > + }
> > > I think we need to reject case of some extents skipped and others not.
> > > That's not supported yet so we need to complain if we get it at least. Maybe we need
> > > to do two passes so we know this has happened early (or perhaps this is a later
> > > patch in which case a todo here would help).
> >
> > Skip here does not mean the extent is invalid, it just means the extent
> > is still pending to add, so remove them from pending list would be
> > enough to reject the extent, no need to release further. That is based
> > on your feedback on v4.
>
> Ah. I'd missunderstood.
Hi Jonathan,
I think we should not allow releasing extents that are still pending to
be added.
If we allow it, there is a case that will not work.
Consider the following sequence (in time order):
1. Send a request to add extent A to the host; (A --> pending list)
2. Send a request to release A from the host; (Delete A from the pending list,
hoping the following add response for A will fail as there is no matching
extent in the pending list).
3. The host sends a response to the device for the add request; however, for
some reason, it does not accept any of it, so the updated list is empty,
which the spec allows. Based on the spec, we need to drop the extent at the
head of the event log. Now we have a problem. Since extent A is already
dropped from the list, we either cannot drop anything because the list is
empty, which is not the worst case, or, if we have more extents in the list,
we may drop the one following A, which is for another request. If this
happens, all the following extents will be acked incorrectly as the order
has been shifted.
Does the above reasoning make sense to you?
Fan
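The ordering problem in steps 1-3 above can be reproduced with a toy model. This is a sketch under stated assumptions: PendingList is a hypothetical fixed-size FIFO keyed only by start DPA, not QEMU's QTAILQ-based extents_pending_to_add list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PENDING 8

/* Hypothetical, simplified pending-to-add list: a FIFO of start DPAs. */
typedef struct {
    uint64_t dpa[MAX_PENDING];
    size_t count;
} PendingList;

/* Step 1: an add request queues the extent at the tail. */
static void pending_push(PendingList *p, uint64_t dpa)
{
    assert(p->count < MAX_PENDING);
    p->dpa[p->count++] = dpa;
}

/* Step 2: releasing a still-pending extent removes it by DPA. */
static int pending_remove(PendingList *p, uint64_t dpa)
{
    for (size_t i = 0; i < p->count; i++) {
        if (p->dpa[i] == dpa) {
            memmove(&p->dpa[i], &p->dpa[i + 1],
                    (p->count - i - 1) * sizeof(p->dpa[0]));
            p->count--;
            return 0;
        }
    }
    return -1;
}

/*
 * Step 3: an add response with an empty extent list carries no DPA to
 * match on, so the only option is to drop the head. Returns the DPA that
 * was dropped, or UINT64_MAX if the list was already empty.
 */
static uint64_t pending_drop_head(PendingList *p)
{
    uint64_t head;

    if (p->count == 0) {
        return UINT64_MAX;
    }
    head = p->dpa[0];
    memmove(&p->dpa[0], &p->dpa[1], (p->count - 1) * sizeof(p->dpa[0]));
    p->count--;
    return head;
}
```

Pushing A then B, removing A, and then handling the empty response meant for A drops B instead, which is the shifted ordering described in step 3.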
>
> >
> > The loop here is only to collect the extents to sent to the event log.
> > But as you said, we need one pass before updating pending list.
> > Actually if we do not allow the above case where extents to release is
> > still in the pending to add list, we can just return here with error, no
> > extra dry run needed.
> >
> > What do you think?
>
> I think we need a way to back out extents from the pending to add list
> so we can create the race where they are offered to the OS and it takes
> forever to accept and by the time it does we've removed them.
>
> >
> > >
> > > > +
> > > > +
> > > > + /* No duplicate or overlapped extents are allowed */
> > > > + if (test_any_bits_set(blk_bitmap, offset / block_size,
> > > > + len / block_size)) {
> > > > + error_setg(errp, "duplicate or overlapped extents are detected");
> > > > + return;
> > > > + }
> > > > + bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> > > > +
> > > > + list = list->next;
> > > > + if (!skip_extent) {
> > > > + i++;
> > > Problem is if we skip one in the middle the records will be wrong below.
> >
> > Why? Only extents passed the check will be stored in variable extents and
> > processed further and i be updated.
> > For skipped ones, since i is not updated, they will be
> > overwritten by following valid ones.
> Ah. I'd missed the fact you store into the extent without a check on validity
> but only move the index on if they were valid. Then rely on not passing a trailing
> entry at the end.
> If would be more readable I think if local variables were used for the parameters
> until we've decided not to skip and the this ended with
>
> if (!skip_extent) {
> extents[i] = (DCXLDCExtentRaw) {
> .start_dpa = ...
> ...
> };
> i++
> }
> We have local len already so probably just need
> uint64_t start_dpa = offset + dcd->dc.regions[rid].base;
>
> Also maybe skip_extent_evlog or something like that to explain we are only
> skipping that part.
> Helps people like me who read it completely wrong!
>
> Jonathan
>
>
>
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 09/13] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
2024-03-09 4:35 ` fan
@ 2024-03-12 12:37 ` Jonathan Cameron via
0 siblings, 0 replies; 81+ messages in thread
From: Jonathan Cameron @ 2024-03-12 12:37 UTC (permalink / raw)
To: fan
Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
wj28.lee, Fan Ni
On Fri, 8 Mar 2024 20:35:53 -0800
fan <nifan.cxl@gmail.com> wrote:
> On Thu, Mar 07, 2024 at 12:45:55PM +0000, Jonathan Cameron wrote:
> > ...
> >
> > > > > + list = records;
> > > > > + extents = g_new0(CXLDCExtentRaw, num_extents);
> > > > > + while (list) {
> > > > > + CXLDCExtent *ent;
> > > > > + bool skip_extent = false;
> > > > > +
> > > > > + offset = list->value->offset;
> > > > > + len = list->value->len;
> > > > > +
> > > > > + extents[i].start_dpa = offset + dcd->dc.regions[rid].base;
> > > > > + extents[i].len = len;
> > > > > + memset(extents[i].tag, 0, 0x10);
> > > > > + extents[i].shared_seq = 0;
> > > > > +
> > > > > + if (type == DC_EVENT_RELEASE_CAPACITY ||
> > > > > + type == DC_EVENT_FORCED_RELEASE_CAPACITY) {
> > > > > + /*
> > > > > + * if the extent is still pending to be added to the host,
> > > >
> > > > Odd spacing.
> > > >
> > > > > + * remove it from the pending extent list, so later when the add
> > > > > + * response for the extent arrives, the device can reject the
> > > > > + * extent as it is not in the pending list.
> > > > > + */
> > > > > + ent = cxl_dc_extent_exists(&dcd->dc.extents_pending_to_add,
> > > > > + &extents[i]);
> > > > > + if (ent) {
> > > > > + QTAILQ_REMOVE(&dcd->dc.extents_pending_to_add, ent, node);
> > > > > + g_free(ent);
> > > > > + skip_extent = true;
> > > > > + } else if (!cxl_dc_extent_exists(&dcd->dc.extents, &extents[i])) {
> > > > > + /* If the exact extent is not in the accepted list, skip */
> > > > > + skip_extent = true;
> > > > > + }
> > > > I think we need to reject case of some extents skipped and others not.
> > > > That's not supported yet so we need to complain if we get it at least. Maybe we need
> > > > to do two passes so we know this has happened early (or perhaps this is a later
> > > > patch in which case a todo here would help).
> > >
> > > Skip here does not mean the extent is invalid, it just means the extent
> > > is still pending to add, so remove them from pending list would be
> > > enough to reject the extent, no need to release further. That is based
> > > on your feedback on v4.
> >
> > Ah. I'd missunderstood.
>
> Hi Jonathan,
>
> I think we should not allow to release extents that are still pending to
> add.
> If we allow it, there is a case that will not work.
> Let's see the following case (time order):
> 1. Send request to add extent A to host; (A --> pending list)
> 2. Send request to release A from the host; (Delete A from pending list,
> hoping the following add response for A will fail as there is not a matched
> extent in the pending list).
Definitely do not allow the host to release something it hasn't accepted.
We should allow QMP to release such entries though (and same for fmapi when
we get there). Any such request from the host should be treated as whatever
the spec says to do if you release an extent that you don't have.
> 3. Host send response to the device for the add request, however, for
> some reason, it does not accept any of it, so updated list is empty,
> spec allows it. Based on the spec, we need to drop the extent at the
> head of the event log. Now we have problem. Since extent A is already
> dropped from the list, we either cannot drop as the list is empty, which
> is not the worst. If we have more extents in the list, we may drop the
> one following A, which is for another request. If this happens, all the
> following extents will be acked incorrectly as the order has been
> shifted.
>
> Does the above reasoning make sense to you?
Absolutely. I got confused here on who was doing release.
Host definitely can't release stuff it hasn't successfully accepted.
Jonathan
>
> Fan
>
> >
> > >
> > > The loop here is only to collect the extents to sent to the event log.
> > > But as you said, we need one pass before updating pending list.
> > > Actually if we do not allow the above case where extents to release is
> > > still in the pending to add list, we can just return here with error, no
> > > extra dry run needed.
> > >
> > > What do you think?
> >
> > I think we need a way to back out extents from the pending to add list
> > so we can create the race where they are offered to the OS and it takes
> > forever to accept and by the time it does we've removed them.
> >
> > >
> > > >
> > > > > +
> > > > > +
> > > > > + /* No duplicate or overlapped extents are allowed */
> > > > > + if (test_any_bits_set(blk_bitmap, offset / block_size,
> > > > > + len / block_size)) {
> > > > > + error_setg(errp, "duplicate or overlapped extents are detected");
> > > > > + return;
> > > > > + }
> > > > > + bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> > > > > +
> > > > > + list = list->next;
> > > > > + if (!skip_extent) {
> > > > > + i++;
> > > > Problem is if we skip one in the middle the records will be wrong below.
> > >
> > > Why? Only extents passed the check will be stored in variable extents and
> > > processed further and i be updated.
> > > For skipped ones, since i is not updated, they will be
> > > overwritten by following valid ones.
> > Ah. I'd missed the fact you store into the extent without a check on validity
> > but only move the index on if they were valid. Then rely on not passing a trailing
> > entry at the end.
> > If would be more readable I think if local variables were used for the parameters
> > until we've decided not to skip and the this ended with
> >
> > if (!skip_extent) {
> > extents[i] = (DCXLDCExtentRaw) {
> > .start_dpa = ...
> > ...
> > };
> > i++
> > }
> > We have local len already so probably just need
> > uint64_t start_dpa = offset + dcd->dc.regions[rid].base;
> >
> > Also maybe skip_extent_evlog or something like that to explain we are only
> > skipping that part.
> > Helps people like me who read it completely wrong!
> >
> > Jonathan
> >
> >
> >
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 09/13] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
2024-03-12 12:37 ` Jonathan Cameron via
(?)
@ 2024-03-12 16:27 ` fan
-1 siblings, 0 replies; 81+ messages in thread
From: fan @ 2024-03-12 16:27 UTC (permalink / raw)
To: Jonathan Cameron
Cc: fan, qemu-devel, linux-cxl, gregory.price, ira.weiny,
dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
Jorgen.Hansen, wj28.lee, Fan Ni
On Tue, Mar 12, 2024 at 12:37:23PM +0000, Jonathan Cameron wrote:
> On Fri, 8 Mar 2024 20:35:53 -0800
> fan <nifan.cxl@gmail.com> wrote:
>
> > On Thu, Mar 07, 2024 at 12:45:55PM +0000, Jonathan Cameron wrote:
> > > ...
> > >
> > > > > > + list = records;
> > > > > > + extents = g_new0(CXLDCExtentRaw, num_extents);
> > > > > > + while (list) {
> > > > > > + CXLDCExtent *ent;
> > > > > > + bool skip_extent = false;
> > > > > > +
> > > > > > + offset = list->value->offset;
> > > > > > + len = list->value->len;
> > > > > > +
> > > > > > + extents[i].start_dpa = offset + dcd->dc.regions[rid].base;
> > > > > > + extents[i].len = len;
> > > > > > + memset(extents[i].tag, 0, 0x10);
> > > > > > + extents[i].shared_seq = 0;
> > > > > > +
> > > > > > + if (type == DC_EVENT_RELEASE_CAPACITY ||
> > > > > > + type == DC_EVENT_FORCED_RELEASE_CAPACITY) {
> > > > > > + /*
> > > > > > + * if the extent is still pending to be added to the host,
> > > > >
> > > > > Odd spacing.
> > > > >
> > > > > > + * remove it from the pending extent list, so later when the add
> > > > > > + * response for the extent arrives, the device can reject the
> > > > > > + * extent as it is not in the pending list.
> > > > > > + */
> > > > > > + ent = cxl_dc_extent_exists(&dcd->dc.extents_pending_to_add,
> > > > > > + &extents[i]);
> > > > > > + if (ent) {
> > > > > > + QTAILQ_REMOVE(&dcd->dc.extents_pending_to_add, ent, node);
> > > > > > + g_free(ent);
> > > > > > + skip_extent = true;
> > > > > > + } else if (!cxl_dc_extent_exists(&dcd->dc.extents, &extents[i])) {
> > > > > > + /* If the exact extent is not in the accepted list, skip */
> > > > > > + skip_extent = true;
> > > > > > + }
> > > > > I think we need to reject case of some extents skipped and others not.
> > > > > That's not supported yet so we need to complain if we get it at least. Maybe we need
> > > > > to do two passes so we know this has happened early (or perhaps this is a later
> > > > > patch in which case a todo here would help).
> > > >
> > > > Skip here does not mean the extent is invalid, it just means the extent
> > > > is still pending to add, so remove them from pending list would be
> > > > enough to reject the extent, no need to release further. That is based
> > > > on your feedback on v4.
> > >
> > > Ah. I'd missunderstood.
> >
> > Hi Jonathan,
> >
> > I think we should not allow to release extents that are still pending to
> > add.
> > If we allow it, there is a case that will not work.
> > Let's see the following case (time order):
> > 1. Send request to add extent A to host; (A --> pending list)
> > 2. Send request to release A from the host; (Delete A from pending list,
> > hoping the following add response for A will fail as there is not a matched
> > extent in the pending list).
>
> Definitely not allow the host to release something it hasn't accepted.
> Should allow QMP to release such entrees though (and same for fmapi when
> we get there). Any such requested from host should be treated as whatever
> it says to do if you release an extent that you don't have.
Not sure how it works here. If we allow QMP to release such extents and
clear the pending list entries accordingly, and the host later responds with
an empty extent list, how can the device figure out which request the
response is for? The spec assumes the responses come in order, so the head of
the pending list should be removed from the pending list; however, the QMP
process may have already removed it.
The key problem here is that for an empty updated extent list, we do not have
a way to figure out the corresponding request, as there is no DPA info to
look into.
>
> > 3. Host send response to the device for the add request, however, for
> > some reason, it does not accept any of it, so updated list is empty,
> > spec allows it. Based on the spec, we need to drop the extent at the
> > head of the event log. Now we have problem. Since extent A is already
> > dropped from the list, we either cannot drop as the list is empty, which
> > is not the worst. If we have more extents in the list, we may drop the
> > one following A, which is for another request. If this happens, all the
> > following extents will be acked incorrectly as the order has been
> > shifted.
> >
> > Does the above reasoning make sense to you?
> Absolutely. I got confused here on who was doing release.
> Host definitely can't release stuff it hasn't successfully accepted.
>
> Jonathan
>
The assumption here is that the FM first initiates the request to add some
extents to the host, and later the FM initiates a release of those extents
while they have not yet been accepted by the host.
Fan
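One of the options raised earlier in this thread is to reject a release request outright when any of its extents is still pending, validating before any list is modified. A minimal sketch of that validation pass follows, assuming exact-match lookups and flat arrays in place of the device's lists. Extent, extent_in and check_release are hypothetical names, and whether QMP should be allowed to release pending extents is still open in the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified extent: start DPA plus length. */
typedef struct {
    uint64_t start_dpa;
    uint64_t len;
} Extent;

/* Exact-match lookup, mirroring the "exact extent" rule in the patch. */
static bool extent_in(const Extent *list, size_t n, Extent e)
{
    for (size_t i = 0; i < n; i++) {
        if (list[i].start_dpa == e.start_dpa && list[i].len == e.len) {
            return true;
        }
    }
    return false;
}

/*
 * Validate a release request without touching any list.
 * Returns -1 if any requested extent is still pending to be added;
 * otherwise returns how many requested extents exactly match an accepted
 * extent (the ones that would be sent to the event log). Requested
 * extents that match neither list are silently skipped.
 */
static int check_release(const Extent *pending, size_t n_pending,
                         const Extent *accepted, size_t n_accepted,
                         const Extent *req, size_t n_req)
{
    int to_log = 0;

    for (size_t i = 0; i < n_req; i++) {
        if (extent_in(pending, n_pending, req[i])) {
            return -1;
        }
        if (extent_in(accepted, n_accepted, req[i])) {
            to_log++;
        }
    }
    return to_log;
}
```

Because the pending list is never modified here, a later add response with an empty extent list can still be matched to the head of that list.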
> >
> > Fan
> >
> > >
> > > >
> > > > The loop here is only to collect the extents to sent to the event log.
> > > > But as you said, we need one pass before updating pending list.
> > > > Actually if we do not allow the above case where extents to release is
> > > > still in the pending to add list, we can just return here with error, no
> > > > extra dry run needed.
> > > >
> > > > What do you think?
> > >
> > > I think we need a way to back out extents from the pending to add list
> > > so we can create the race where they are offered to the OS and it takes
> > > forever to accept and by the time it does we've removed them.
> > >
> > > >
> > > > >
> > > > > > +
> > > > > > +
> > > > > > + /* No duplicate or overlapped extents are allowed */
> > > > > > + if (test_any_bits_set(blk_bitmap, offset / block_size,
> > > > > > + len / block_size)) {
> > > > > > + error_setg(errp, "duplicate or overlapped extents are detected");
> > > > > > + return;
> > > > > > + }
> > > > > > + bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> > > > > > +
> > > > > > + list = list->next;
> > > > > > + if (!skip_extent) {
> > > > > > + i++;
> > > > > Problem is if we skip one in the middle the records will be wrong below.
> > > >
> > > > Why? Only extents passed the check will be stored in variable extents and
> > > > processed further and i be updated.
> > > > For skipped ones, since i is not updated, they will be
> > > > overwritten by following valid ones.
> > > Ah. I'd missed the fact you store into the extent without a check on validity
> > > but only move the index on if they were valid. Then rely on not passing a trailing
> > > entry at the end.
> > > If would be more readable I think if local variables were used for the parameters
> > > until we've decided not to skip and the this ended with
> > >
> > > if (!skip_extent) {
> > > extents[i] = (DCXLDCExtentRaw) {
> > > .start_dpa = ...
> > > ...
> > > };
> > > i++
> > > }
> > > We have local len already so probably just need
> > > uint64_t start_dpa = offset + dcd->dc.regions[rid].base;
> > >
> > > Also maybe skip_extent_evlog or something like that to explain we are only
> > > skipping that part.
> > > Helps people like me who read it completely wrong!
> > >
> > > Jonathan
> > >
> > >
> > >
>
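The suggestion quoted above (keep the parameters in locals, then do a single
compound-literal store guarded by the skip flag) could look roughly like the
sketch below. This is a standalone illustration, not the patch itself:
ExtentRaw, ExtentReq, skip_evlog and build_event_extents are made-up names
standing in for CXLDCExtentRaw and the QMP record list, and the skip decision
is assumed to have been made already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the raw extent record used in the patch. */
typedef struct ExtentRaw {
    uint64_t start_dpa;
    uint64_t len;
    uint8_t tag[0x10];
    uint16_t shared_seq;
} ExtentRaw;

/* Hypothetical request entry: offset/len are relative to the region start. */
typedef struct ExtentReq {
    uint64_t offset;
    uint64_t len;
    bool skip_evlog;   /* true when no event record should be generated */
} ExtentReq;

/*
 * Copy only the extents that need an event record into out[], which must
 * have room for n entries. Returns the number of records written.
 */
static size_t build_event_extents(const ExtentReq *req, size_t n,
                                  uint64_t region_base, ExtentRaw *out)
{
    size_t i = 0;

    assert(req != NULL && out != NULL);

    for (size_t k = 0; k < n; k++) {
        /* Work on locals until we know the extent is kept. */
        uint64_t start_dpa = req[k].offset + region_base;
        uint64_t len = req[k].len;

        if (!req[k].skip_evlog) {
            /* Unnamed members (tag, shared_seq) are zero-initialised. */
            out[i] = (ExtentRaw) {
                .start_dpa = start_dpa,
                .len = len,
            };
            i++;
        }
    }
    return i;
}
```

Written this way, a skipped entry never touches out[], so the reader does not
have to work out that a stale slot gets overwritten by the next valid extent.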
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 09/13] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
2024-03-06 17:48 ` Jonathan Cameron via
(?)
(?)
@ 2024-03-06 23:36 ` fan
2024-03-07 12:47 ` Jonathan Cameron via
-1 siblings, 1 reply; 81+ messages in thread
From: fan @ 2024-03-06 23:36 UTC (permalink / raw)
To: Jonathan Cameron
Cc: nifan.cxl, qemu-devel, linux-cxl, gregory.price, ira.weiny,
dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
Jorgen.Hansen, wj28.lee, Fan Ni
On Wed, Mar 06, 2024 at 05:48:11PM +0000, Jonathan Cameron wrote:
> On Mon, 4 Mar 2024 11:34:04 -0800
> nifan.cxl@gmail.com wrote:
>
> > From: Fan Ni <fan.ni@samsung.com>
> >
> > Since fabric manager emulation is not supported yet, the change implements
> > the functions to add/release dynamic capacity extents as QMP interfaces.
>
> We'll need them anyway, or to implement an fm interface via QMP which is
> going to be ugly and complex.
>
> >
> > Note: we skip any FM issued extent release request if the exact extent
> > does not exist in the extent list of the device. We will loosen the
> > restriction later once we have partial release support in the kernel.
>
> Maybe the kernel will treat it as a request to release the extent it
> is tracking that contains it. So we may want to add a way to poke that.
> Not today though!
>
> >
> > 1. Add dynamic capacity extents:
> >
> > For example, the command to add two continuous extents (each 128MiB long)
> > to region 0 (starting at DPA offset 0) looks like below:
> >
> > { "execute": "qmp_capabilities" }
> >
> > { "execute": "cxl-add-dynamic-capacity",
> > "arguments": {
> > "path": "/machine/peripheral/cxl-dcd0",
> > "region-id": 0,
> > "extents": [
> > {
> > "offset": 0,
> > "len": 134217728
> > },
> > {
> > "offset": 134217728,
> > "len": 134217728
> > }
> > ]
> > }
> > }
> >
> > 2. Release dynamic capacity extents:
> >
> > For example, the command to release an extent of size 128MiB from region 0
> > (DPA offset 128MiB) looks like below:
> >
> > { "execute": "cxl-release-dynamic-capacity",
> > "arguments": {
> > "path": "/machine/peripheral/cxl-dcd0",
> > "region-id": 0,
> > "extents": [
> > {
> > "offset": 134217728,
> > "len": 134217728
> > }
> > ]
> > }
> > }
> >
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
>
> ...
>
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index dccfaaad3a..e9c8994cdb 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> > @@ -674,6 +674,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
> > ct3d->dc.total_capacity += region->len;
> > }
> > QTAILQ_INIT(&ct3d->dc.extents);
> > + QTAILQ_INIT(&ct3d->dc.extents_pending_to_add);
> >
> > return true;
> > }
> > @@ -686,6 +687,12 @@ static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
> > ent = QTAILQ_FIRST(&ct3d->dc.extents);
> > cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
> > }
> > +
> > + while (!QTAILQ_EMPTY(&ct3d->dc.extents_pending_to_add)) {
>
> QTAILQ_FOREACH_SAFE
>
> > + ent = QTAILQ_FIRST(&ct3d->dc.extents_pending_to_add);
> > + cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending_to_add,
> > + ent);
> > + }
> > }
>
> > +/*
> > + * The main function to process dynamic capacity event. Currently DC extents
> > + * add/release requests are processed.
> > + */
> > +static void qmp_cxl_process_dynamic_capacity(const char *path, CxlEventLog log,
> > + CXLDCEventType type, uint16_t hid,
> > + uint8_t rid,
> > + CXLDCExtentRecordList *records,
> > + Error **errp)
> > +{
> > + Object *obj;
> > + CXLEventDynamicCapacity dCap = {};
> > + CXLEventRecordHdr *hdr = &dCap.hdr;
> > + CXLType3Dev *dcd;
> > + uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
> > + uint32_t num_extents = 0;
> > + CXLDCExtentRecordList *list;
> > + g_autofree CXLDCExtentRaw *extents = NULL;
> > + uint8_t enc_log;
> > + uint64_t offset, len, block_size;
> > + int i;
> > + int rc;
>
> Combine the two lines above.
>
> > + g_autofree unsigned long *blk_bitmap = NULL;
> > +
> > + obj = object_resolve_path(path, NULL);
> > + if (!obj) {
> > + error_setg(errp, "Unable to resolve path");
> > + return;
> > + }
>
> object_resolve_path_type() and skip a step (should do this in various places
> in our existing code!)
>
> > + if (!object_dynamic_cast(obj, TYPE_CXL_TYPE3)) {
> > + error_setg(errp, "Path not point to a valid CXL type3 device");
> > + return;
> > + }
> > +
> > + dcd = CXL_TYPE3(obj);
> > + if (!dcd->dc.num_regions) {
> > + error_setg(errp, "No dynamic capacity support from the device");
> > + return;
> > + }
> > +
> > + rc = ct3d_qmp_cxl_event_log_enc(log);
> > + if (rc < 0) {
> > + error_setg(errp, "Unhandled error log type");
> > + return;
> > + }
> > + enc_log = rc;
> > +
> > + if (rid >= dcd->dc.num_regions) {
> > + error_setg(errp, "region id is too large");
> > + return;
> > + }
> > + block_size = dcd->dc.regions[rid].block_size;
> > +
> > + /* Sanity check and count the extents */
> > + list = records;
> > + while (list) {
> > + offset = list->value->offset;
> > + len = list->value->len;
> > +
> > + if (len == 0) {
> > + error_setg(errp, "extent with 0 length is not allowed");
> > + return;
> > + }
> > +
> > + if (offset % block_size || len % block_size) {
> > + error_setg(errp, "dpa or len is not aligned to region block size");
> > + return;
> > + }
> > +
> > + if (offset + len > dcd->dc.regions[rid].len) {
> > + error_setg(errp, "extent range is beyond the region end");
> > + return;
> > + }
> > +
> > + num_extents++;
> > + list = list->next;
> > + }
> > + if (num_extents == 0) {
> > + error_setg(errp, "No extents found in the command");
> > + return;
> > + }
> > +
> > + blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
> > +
> > + /* Create Extent list for event being passed to host */
> > + i = 0;
> > + list = records;
> > + extents = g_new0(CXLDCExtentRaw, num_extents);
> > + while (list) {
> > + CXLDCExtent *ent;
> > + bool skip_extent = false;
> > +
> > + offset = list->value->offset;
> > + len = list->value->len;
> > +
> > + extents[i].start_dpa = offset + dcd->dc.regions[rid].base;
> > + extents[i].len = len;
> > + memset(extents[i].tag, 0, 0x10);
> > + extents[i].shared_seq = 0;
> > +
> > + if (type == DC_EVENT_RELEASE_CAPACITY ||
> > + type == DC_EVENT_FORCED_RELEASE_CAPACITY) {
> > + /*
> > + * if the extent is still pending to be added to the host,
>
> Odd spacing.
>
> > + * remove it from the pending extent list, so later when the add
> > + * response for the extent arrives, the device can reject the
> > + * extent as it is not in the pending list.
> > + */
> > + ent = cxl_dc_extent_exists(&dcd->dc.extents_pending_to_add,
> > + &extents[i]);
> > + if (ent) {
> > + QTAILQ_REMOVE(&dcd->dc.extents_pending_to_add, ent, node);
> > + g_free(ent);
> > + skip_extent = true;
> > + } else if (!cxl_dc_extent_exists(&dcd->dc.extents, &extents[i])) {
> > + /* If the exact extent is not in the accepted list, skip */
> > + skip_extent = true;
> > + }
> I think we need to reject case of some extents skipped and others not.
> That's not supported yet so we need to complain if we get it at least. Maybe we need
> to do two passes so we know this has happened early (or perhaps this is a later
> patch in which case a todo here would help).
In the second skip_extent case, I will reject earlier instead of
skipping.
Fan
>
> > +
> > +
> > + /* No duplicate or overlapped extents are allowed */
> > + if (test_any_bits_set(blk_bitmap, offset / block_size,
> > + len / block_size)) {
> > + error_setg(errp, "duplicate or overlapped extents are detected");
> > + return;
> > + }
> > + bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> > +
> > + list = list->next;
> > + if (!skip_extent) {
> > + i++;
> Problem is if we skip one in the middle the records will be wrong below.
> > + }
> > + }
> > + num_extents = i;
> > +
> > + /*
> > + * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
> > + *
> > + * All Dynamic Capacity event records shall set the Event Record Severity
> > + * field in the Common Event Record Format to Informational Event. All
> > + * Dynamic Capacity related events shall be logged in the Dynamic Capacity
> > + * Event Log.
> > + */
> > + cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
> > + cxl_device_get_timestamp(&dcd->cxl_dstate));
> > +
> > + dCap.type = type;
> > + /* FIXME: for now, validity flag is cleared */
> > + dCap.validity_flags = 0;
> > + stw_le_p(&dCap.host_id, hid);
> > + /* only valid for DC_REGION_CONFIG_UPDATED event */
> > + dCap.updated_region_id = 0;
> > + /*
> > + * FIXME: for now, the "More" flag is cleared as there is only one
> > + * extent associating with each record and tag-based release is
> > + * not supported.
>
> Hmm. Seems like tag support would be easy. Add an optional qmp parameter,
> if a tag is set, we set the more flag for all but the last entry in this
> loop. I'm ok with that being a follow up patch though.
>
> > + */
> > + dCap.flags = 0;
> > + for (i = 0; i < num_extents; i++) {
> > + memcpy(&dCap.dynamic_capacity_extent, &extents[i],
> > + sizeof(CXLDCExtentRaw));
> > +
> > + if (type == DC_EVENT_ADD_CAPACITY) {
> > + cxl_insert_extent_to_extent_list(&dcd->dc.extents_pending_to_add,
> > + extents[i].start_dpa,
> > + extents[i].len,
> > + extents[i].tag,
> > + extents[i].shared_seq);
> > + }
> > +
> > + if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
> > + (CXLEventRecordRaw *)&dCap)) {
> > + cxl_event_irq_assert(dcd);
> > + }
> > + }
> > +}
>
>
>
>
> > diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> > index 341260e6e4..b524c5e699 100644
> > --- a/include/hw/cxl/cxl_device.h
> > +++ b/include/hw/cxl/cxl_device.h
> > @@ -490,6 +490,7 @@ struct CXLType3Dev {
> > AddressSpace host_dc_as;
> > uint64_t total_capacity; /* 256M aligned */
> > CXLDCExtentList extents;
> > + CXLDCExtentList extents_pending_to_add;
>
> Long name, extents_pending or just pending is plenty I think.
>
> > uint32_t total_extent_count;
> > uint32_t ext_list_gen_seq;
> >
> > @@ -551,4 +552,9 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
> >
> > void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
> > CXLDCExtent *extent);
> > +void cxl_insert_extent_to_extent_list(CXLDCExtentList *list, uint64_t dpa,
> > + uint64_t len, uint8_t *tag,
> > + uint16_t shared_seq);
> > +bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
> > + unsigned long size);
> > #endif
>
>
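For reference, the duplicate/overlap check in the quoted hunk boils down to
claiming one bit per block and refusing any extent that touches a block
already claimed. A minimal standalone sketch of that idea follows. It does not
use the QEMU bitmap helpers or test_any_bits_set() from the patch; BlockMap,
blockmap_init, blockmap_claim and blockmap_free are hypothetical names, and
one byte per block is used instead of a packed bitmap to keep it short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tracker: one byte per block instead of a packed bitmap. */
typedef struct BlockMap {
    uint8_t *used;
    uint64_t nblocks;
    uint64_t block_size;
} BlockMap;

/* Returns false if the tracking array cannot be allocated. */
static bool blockmap_init(BlockMap *m, uint64_t region_len,
                          uint64_t block_size)
{
    assert(block_size != 0 && region_len != 0);
    assert(region_len % block_size == 0);

    m->nblocks = region_len / block_size;
    m->block_size = block_size;
    m->used = calloc((size_t)m->nblocks, 1);
    return m->used != NULL;
}

/*
 * Claim [offset, offset + len). Returns false, leaving the map untouched,
 * if the range is empty, misaligned, out of bounds, or overlaps a range
 * claimed earlier.
 */
static bool blockmap_claim(BlockMap *m, uint64_t offset, uint64_t len)
{
    uint64_t first, count, i;

    if (len == 0 || offset % m->block_size || len % m->block_size) {
        return false;
    }
    first = offset / m->block_size;
    count = len / m->block_size;
    /* Written so that first + count cannot overflow. */
    if (first >= m->nblocks || count > m->nblocks - first) {
        return false;
    }
    for (i = first; i < first + count; i++) {
        if (m->used[i]) {
            return false;
        }
    }
    for (i = first; i < first + count; i++) {
        m->used[i] = 1;
    }
    return true;
}

static void blockmap_free(BlockMap *m)
{
    free(m->used);
    m->used = NULL;
}
```

The bounds test is phrased as count > nblocks - first rather than
offset + len > region_len so that a very large offset or len from the
command cannot wrap around; whether that matters for the real code depends on
how the QMP inputs are validated elsewhere.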
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 09/13] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
2024-03-06 23:36 ` fan
@ 2024-03-07 12:47 ` Jonathan Cameron via
0 siblings, 0 replies; 81+ messages in thread
From: Jonathan Cameron @ 2024-03-07 12:47 UTC (permalink / raw)
To: fan
Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
wj28.lee, Fan Ni
> >
> > > + * remove it from the pending extent list, so later when the add
> > > + * response for the extent arrives, the device can reject the
> > > + * extent as it is not in the pending list.
> > > + */
> > > + ent = cxl_dc_extent_exists(&dcd->dc.extents_pending_to_add,
> > > + &extents[i]);
> > > + if (ent) {
> > > + QTAILQ_REMOVE(&dcd->dc.extents_pending_to_add, ent, node);
> > > + g_free(ent);
> > > + skip_extent = true;
> > > + } else if (!cxl_dc_extent_exists(&dcd->dc.extents, &extents[i])) {
> > > + /* If the exact extent is not in the accepted list, skip */
> > > + skip_extent = true;
> > > + }
> > I think we need to reject case of some extents skipped and others not.
> > That's not supported yet so we need to complain if we get it at least. Maybe we need
> > to do two passes so we know this has happened early (or perhaps this is a later
> > patch in which case a todo here would help).
>
> In the second skip_extent case, I will reject earlier instead of
> skipping.
That was me misunderstanding the flow. I think this is fine as you have it already.
Jonathan
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 09/13] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
2024-03-04 19:34 ` [PATCH v5 09/13] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents nifan.cxl
2024-03-06 17:48 ` Jonathan Cameron via
@ 2024-04-24 13:09 ` Markus Armbruster
2024-04-24 17:10 ` fan
` (2 more replies)
1 sibling, 3 replies; 81+ messages in thread
From: Markus Armbruster @ 2024-04-24 13:09 UTC (permalink / raw)
To: nifan.cxl
Cc: qemu-devel, jonathan.cameron, linux-cxl, gregory.price,
ira.weiny, dan.j.williams, a.manzanares, dave, nmtadam.samsung,
jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni
nifan.cxl@gmail.com writes:
> From: Fan Ni <fan.ni@samsung.com>
>
> Since fabric manager emulation is not supported yet, the change implements
> the functions to add/release dynamic capacity extents as QMP interfaces.
Will fabric manager emulation obsolete these commands?
> Note: we skip any FM issued extent release request if the exact extent
> does not exist in the extent list of the device. We will loosen the
> restriction later once we have partial release support in the kernel.
>
> 1. Add dynamic capacity extents:
>
> For example, the command to add two continuous extents (each 128MiB long)
> to region 0 (starting at DPA offset 0) looks like below:
>
> { "execute": "qmp_capabilities" }
>
> { "execute": "cxl-add-dynamic-capacity",
> "arguments": {
> "path": "/machine/peripheral/cxl-dcd0",
> "region-id": 0,
> "extents": [
> {
> "offset": 0,
> "len": 134217728
> },
> {
> "offset": 134217728,
> "len": 134217728
> }
> ]
> }
> }
>
> 2. Release dynamic capacity extents:
>
> For example, the command to release an extent of size 128MiB from region 0
> (DPA offset 128MiB) looks like below:
>
> { "execute": "cxl-release-dynamic-capacity",
> "arguments": {
> "path": "/machine/peripheral/cxl-dcd0",
> "region-id": 0,
> "extents": [
> {
> "offset": 134217728,
> "len": 134217728
> }
> ]
> }
> }
>
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
[...]
> diff --git a/qapi/cxl.json b/qapi/cxl.json
> index 8cc4c72fa9..2645004666 100644
> --- a/qapi/cxl.json
> +++ b/qapi/cxl.json
> @@ -19,13 +19,16 @@
> #
> # @fatal: Fatal Event Log
> #
> +# @dyncap: Dynamic Capacity Event Log
> +#
> # Since: 8.1
> ##
> { 'enum': 'CxlEventLog',
> 'data': ['informational',
> 'warning',
> 'failure',
> - 'fatal']
> + 'fatal',
> + 'dyncap']
We tend to avoid abbreviations in QMP identifiers: dynamic-capacity.
> }
>
> ##
> @@ -361,3 +364,59 @@
> ##
> {'command': 'cxl-inject-correctable-error',
> 'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
> +
> +##
> +# @CXLDCExtentRecord:
Such traffic jams of capital letters are hard to read.
What does DC mean?
> +#
> +# Record of a single extent to add/release
> +#
> +# @offset: offset to the start of the region where the extent to be operated
Blank line here, please
> +# @len: length of the extent
> +#
> +# Since: 9.0
> +##
> +{ 'struct': 'CXLDCExtentRecord',
> + 'data': {
> + 'offset':'uint64',
> + 'len': 'uint64'
> + }
> +}
> +
> +##
> +# @cxl-add-dynamic-capacity:
> +#
> +# Command to start add dynamic capacity extents flow. The device will
I think we're missing an article here. Is it "a flow" or "the flow"?
> +# have to acknowledged the acceptance of the extents before they are usable.
to acknowledge
docs/devel/qapi-code-gen.rst:
For legibility, wrap text paragraphs so every line is at most 70
characters long.
Separate sentences with two spaces.
> +#
> +# @path: CXL DCD canonical QOM path
What is a CXL DCD? Is it a device?
I'd prefer @qom-path, unless you can make a consistency argument for
@path.
> +# @region-id: id of the region where the extent to add
What's a region, and how do they get their IDs?
> +# @extents: Extents to add
Blank lines between argument descriptions, please.
> +#
> +# Since : 9.0
9.1
> +##
> +{ 'command': 'cxl-add-dynamic-capacity',
> + 'data': { 'path': 'str',
> + 'region-id': 'uint8',
> + 'extents': [ 'CXLDCExtentRecord' ]
> + }
> +}
> +
> +##
> +# @cxl-release-dynamic-capacity:
> +#
> +# Command to start release dynamic capacity extents flow. The host will
Article again.
The host? In cxl-add-dynamic-capacity's doc comment, it's the device.
> +# need to respond to indicate that it has released the capacity before it
> +# is made unavailable for read and write and can be re-added.
Is "and can be re-added" relevant here?
> +#
> +# @path: CXL DCD canonical QOM path
> +# @region-id: id of the region where the extent to release
> +# @extents: Extents to release
> +#
> +# Since : 9.0
9.1
> +##
> +{ 'command': 'cxl-release-dynamic-capacity',
> + 'data': { 'path': 'str',
> + 'region-id': 'uint8',
> + 'extents': [ 'CXLDCExtentRecord' ]
> + }
> +}
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 09/13] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
2024-04-24 13:09 ` Markus Armbruster
@ 2024-04-24 17:10 ` fan
2024-04-24 17:26 ` Markus Armbruster
2024-04-24 17:33 ` Ira Weiny
2024-04-24 17:39 ` fan
2 siblings, 1 reply; 81+ messages in thread
From: fan @ 2024-04-24 17:10 UTC (permalink / raw)
To: Markus Armbruster
Cc: nifan.cxl, qemu-devel, jonathan.cameron, linux-cxl,
gregory.price, ira.weiny, dan.j.williams, a.manzanares, dave,
nmtadam.samsung, jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni
On Wed, Apr 24, 2024 at 03:09:52PM +0200, Markus Armbruster wrote:
> nifan.cxl@gmail.com writes:
>
> > From: Fan Ni <fan.ni@samsung.com>
> >
> > Since fabric manager emulation is not supported yet, the change implements
> > the functions to add/release dynamic capacity extents as QMP interfaces.
>
> Will fabric manager emulation obsolete these commands?
>
Hi Markus,
Thanks for reviewing the patchset. This is v5; we sent out v7
recently, and there are a lot of changes from v5 to v7.
FYI. v7: https://lore.kernel.org/linux-cxl/ZiaFYUB6FC9NR7W4@memverge.com/T/#t
Fan
> > Note: we skip any FM issued extent release request if the exact extent
> > does not exist in the extent list of the device. We will loosen the
> > restriction later once we have partial release support in the kernel.
> >
> > 1. Add dynamic capacity extents:
> >
> > For example, the command to add two continuous extents (each 128MiB long)
> > to region 0 (starting at DPA offset 0) looks like below:
> >
> > { "execute": "qmp_capabilities" }
> >
> > { "execute": "cxl-add-dynamic-capacity",
> > "arguments": {
> > "path": "/machine/peripheral/cxl-dcd0",
> > "region-id": 0,
> > "extents": [
> > {
> > "offset": 0,
> > "len": 134217728
> > },
> > {
> > "offset": 134217728,
> > "len": 134217728
> > }
> > ]
> > }
> > }
> >
> > 2. Release dynamic capacity extents:
> >
> > For example, the command to release an extent of size 128MiB from region 0
> > (DPA offset 128MiB) looks like below:
> >
> > { "execute": "cxl-release-dynamic-capacity",
> > "arguments": {
> > "path": "/machine/peripheral/cxl-dcd0",
> > "region-id": 0,
> > "extents": [
> > {
> > "offset": 134217728,
> > "len": 134217728
> > }
> > ]
> > }
> > }
> >
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
>
> [...]
>
> > diff --git a/qapi/cxl.json b/qapi/cxl.json
> > index 8cc4c72fa9..2645004666 100644
> > --- a/qapi/cxl.json
> > +++ b/qapi/cxl.json
> > @@ -19,13 +19,16 @@
> > #
> > # @fatal: Fatal Event Log
> > #
> > +# @dyncap: Dynamic Capacity Event Log
> > +#
> > # Since: 8.1
> > ##
> > { 'enum': 'CxlEventLog',
> > 'data': ['informational',
> > 'warning',
> > 'failure',
> > - 'fatal']
> > + 'fatal',
> > + 'dyncap']
>
> We tend to avoid abbreviations in QMP identifiers: dynamic-capacity.
>
> > }
> >
> > ##
> > @@ -361,3 +364,59 @@
> > ##
> > {'command': 'cxl-inject-correctable-error',
> > 'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
> > +
> > +##
> > +# @CXLDCExtentRecord:
>
> Such traffic jams of capital letters are hard to read.
>
> What does DC mean?
>
> > +#
> > +# Record of a single extent to add/release
> > +#
> > +# @offset: offset to the start of the region where the extent to be operated
>
> Blank line here, please
>
> > +# @len: length of the extent
> > +#
> > +# Since: 9.0
> > +##
> > +{ 'struct': 'CXLDCExtentRecord',
> > + 'data': {
> > + 'offset':'uint64',
> > + 'len': 'uint64'
> > + }
> > +}
> > +
> > +##
> > +# @cxl-add-dynamic-capacity:
> > +#
> > +# Command to start add dynamic capacity extents flow. The device will
>
> I think we're missing an article here. Is it "a flow" or "the flow"?
>
> > +# have to acknowledged the acceptance of the extents before they are usable.
>
> to acknowledge
>
> docs/devel/qapi-code-gen.rst:
>
> For legibility, wrap text paragraphs so every line is at most 70
> characters long.
>
> Separate sentences with two spaces.
>
> > +#
> > +# @path: CXL DCD canonical QOM path
>
> What is a CXL DCD? Is it a device?
>
> I'd prefer @qom-path, unless you can make a consistency argument for
> @path.
>
> > +# @region-id: id of the region where the extent to add
>
> What's a region, and how do they get their IDs?
>
> > +# @extents: Extents to add
>
> Blank lines between argument descriptions, please.
>
> > +#
> > +# Since : 9.0
>
> 9.1
>
> > +##
> > +{ 'command': 'cxl-add-dynamic-capacity',
> > + 'data': { 'path': 'str',
> > + 'region-id': 'uint8',
> > + 'extents': [ 'CXLDCExtentRecord' ]
> > + }
> > +}
> > +
> > +##
> > +# @cxl-release-dynamic-capacity:
> > +#
> > +# Command to start release dynamic capacity extents flow. The host will
>
> Article again.
>
> The host? In cxl-add-dynamic-capacity's doc comment, it's the device.
>
> > +# need to respond to indicate that it has released the capacity before it
> > +# is made unavailable for read and write and can be re-added.
>
> Is "and can be re-added" relevant here?
>
> > +#
> > +# @path: CXL DCD canonical QOM path
> > +# @region-id: id of the region where the extent to release
> > +# @extents: Extents to release
> > +#
> > +# Since : 9.0
>
> 9.1
>
> > +##
> > +{ 'command': 'cxl-release-dynamic-capacity',
> > + 'data': { 'path': 'str',
> > + 'region-id': 'uint8',
> > + 'extents': [ 'CXLDCExtentRecord' ]
> > + }
> > +}
>
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 09/13] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
2024-04-24 17:10 ` fan
@ 2024-04-24 17:26 ` Markus Armbruster
2024-04-24 17:44 ` fan
0 siblings, 1 reply; 81+ messages in thread
From: Markus Armbruster @ 2024-04-24 17:26 UTC (permalink / raw)
To: fan
Cc: qemu-devel, jonathan.cameron, linux-cxl, gregory.price,
ira.weiny, dan.j.williams, a.manzanares, dave, nmtadam.samsung,
jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni
fan <nifan.cxl@gmail.com> writes:
> On Wed, Apr 24, 2024 at 03:09:52PM +0200, Markus Armbruster wrote:
>> nifan.cxl@gmail.com writes:
>>
>> > From: Fan Ni <fan.ni@samsung.com>
>> >
>> > Since fabric manager emulation is not supported yet, the change implements
>> > the functions to add/release dynamic capacity extents as QMP interfaces.
>>
>> Will fabric manager emulation obsolete these commands?
>>
>
> Hi Markus,
> Thanks for reviewing the patchset. This is v5; we sent out v7
> recently, and there are a lot of changes from v5 to v7.
>
> FYI. v7: https://lore.kernel.org/linux-cxl/ZiaFYUB6FC9NR7W4@memverge.com/T/#t
Missed it because you neglected to cc: me for qapi/cxl.json :)
Thanks!
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 09/13] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
2024-04-24 17:26 ` Markus Armbruster
@ 2024-04-24 17:44 ` fan
0 siblings, 0 replies; 81+ messages in thread
From: fan @ 2024-04-24 17:44 UTC (permalink / raw)
To: Markus Armbruster
Cc: fan, qemu-devel, jonathan.cameron, linux-cxl, gregory.price,
ira.weiny, dan.j.williams, a.manzanares, dave, nmtadam.samsung,
jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni
On Wed, Apr 24, 2024 at 07:26:23PM +0200, Markus Armbruster wrote:
> fan <nifan.cxl@gmail.com> writes:
>
> > On Wed, Apr 24, 2024 at 03:09:52PM +0200, Markus Armbruster wrote:
> >> nifan.cxl@gmail.com writes:
> >>
> >> > From: Fan Ni <fan.ni@samsung.com>
> >> >
> >> > Since fabric manager emulation is not supported yet, the change implements
> >> > the functions to add/release dynamic capacity extents as QMP interfaces.
> >>
> >> Will fabric manager emulation obsolete these commands?
> >>
> >
> > Hi Markus,
> > > Thanks for reviewing the patchset. This is v5; we sent out v7
> > > recently, and there are a lot of changes from v5 to v7.
> >
> > FYI. v7: https://lore.kernel.org/linux-cxl/ZiaFYUB6FC9NR7W4@memverge.com/T/#t
>
> Missed it because you neglected to cc: me for qapi/cxl.json :)
>
> Thanks!
Sorry about that. This is the first time I have made changes to qapi/cxl.json, so
I missed that. I will cc you when I send out the next version.
Btw, thanks for the review. I have replied to your comments in another reply.
Fan
>
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 09/13] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
2024-04-24 13:09 ` Markus Armbruster
2024-04-24 17:10 ` fan
@ 2024-04-24 17:33 ` Ira Weiny
2024-04-26 15:55 ` Jonathan Cameron via
2024-04-24 17:39 ` fan
2 siblings, 1 reply; 81+ messages in thread
From: Ira Weiny @ 2024-04-24 17:33 UTC (permalink / raw)
To: Markus Armbruster, nifan.cxl
Cc: qemu-devel, jonathan.cameron, linux-cxl, gregory.price,
ira.weiny, dan.j.williams, a.manzanares, dave, nmtadam.samsung,
jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni
Markus Armbruster wrote:
> nifan.cxl@gmail.com writes:
>
> > From: Fan Ni <fan.ni@samsung.com>
> >
> > Since fabric manager emulation is not supported yet, the change implements
> > the functions to add/release dynamic capacity extents as QMP interfaces.
>
> Will fabric manager emulation obsolete these commands?
I don't think so. In the development of the kernel, I see these being
valuable to do CI and regression testing without the complexity of an FM.
Ira
>
> > Note: we skip any FM issued extent release request if the exact extent
> > does not exist in the extent list of the device. We will loosen the
> > restriction later once we have partial release support in the kernel.
> >
> > 1. Add dynamic capacity extents:
> >
> > For example, the command to add two continuous extents (each 128MiB long)
> > to region 0 (starting at DPA offset 0) looks like below:
> >
> > { "execute": "qmp_capabilities" }
> >
> > { "execute": "cxl-add-dynamic-capacity",
> > "arguments": {
> > "path": "/machine/peripheral/cxl-dcd0",
> > "region-id": 0,
> > "extents": [
> > {
> > "offset": 0,
> > "len": 134217728
> > },
> > {
> > "offset": 134217728,
> > "len": 134217728
> > }
> > ]
> > }
> > }
> >
> > 2. Release dynamic capacity extents:
> >
> > For example, the command to release an extent of size 128MiB from region 0
> > (DPA offset 128MiB) looks like below:
> >
> > { "execute": "cxl-release-dynamic-capacity",
> > "arguments": {
> > "path": "/machine/peripheral/cxl-dcd0",
> > "region-id": 0,
> > "extents": [
> > {
> > "offset": 134217728,
> > "len": 134217728
> > }
> > ]
> > }
> > }
> >
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
>
> [...]
>
> > diff --git a/qapi/cxl.json b/qapi/cxl.json
> > index 8cc4c72fa9..2645004666 100644
> > --- a/qapi/cxl.json
> > +++ b/qapi/cxl.json
> > @@ -19,13 +19,16 @@
> > #
> > # @fatal: Fatal Event Log
> > #
> > +# @dyncap: Dynamic Capacity Event Log
> > +#
> > # Since: 8.1
> > ##
> > { 'enum': 'CxlEventLog',
> > 'data': ['informational',
> > 'warning',
> > 'failure',
> > - 'fatal']
> > + 'fatal',
> > + 'dyncap']
>
> We tend to avoid abbreviations in QMP identifiers: dynamic-capacity.
>
> > }
> >
> > ##
> > @@ -361,3 +364,59 @@
> > ##
> > {'command': 'cxl-inject-correctable-error',
> > 'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
> > +
> > +##
> > +# @CXLDCExtentRecord:
>
> Such traffic jams of capital letters are hard to read.
>
> What does DC mean?
>
> > +#
> > +# Record of a single extent to add/release
> > +#
> > +# @offset: offset to the start of the region where the extent to be operated
>
> Blank line here, please
>
> > +# @len: length of the extent
> > +#
> > +# Since: 9.0
> > +##
> > +{ 'struct': 'CXLDCExtentRecord',
> > + 'data': {
> > + 'offset':'uint64',
> > + 'len': 'uint64'
> > + }
> > +}
> > +
> > +##
> > +# @cxl-add-dynamic-capacity:
> > +#
> > +# Command to start add dynamic capacity extents flow. The device will
>
> I think we're missing an article here. Is it "a flow" or "the flow"?
>
> > +# have to acknowledged the acceptance of the extents before they are usable.
>
> to acknowledge
>
> docs/devel/qapi-code-gen.rst:
>
> For legibility, wrap text paragraphs so every line is at most 70
> characters long.
>
> Separate sentences with two spaces.
>
> > +#
> > +# @path: CXL DCD canonical QOM path
>
> What is a CXL DCD? Is it a device?
>
> I'd prefer @qom-path, unless you can make a consistency argument for
> @path.
>
> > +# @region-id: id of the region where the extent to add
>
> What's a region, and how do they get their IDs?
>
> > +# @extents: Extents to add
>
> Blank lines between argument descriptions, please.
>
> > +#
> > +# Since : 9.0
>
> 9.1
>
> > +##
> > +{ 'command': 'cxl-add-dynamic-capacity',
> > + 'data': { 'path': 'str',
> > + 'region-id': 'uint8',
> > + 'extents': [ 'CXLDCExtentRecord' ]
> > + }
> > +}
> > +
> > +##
> > +# @cxl-release-dynamic-capacity:
> > +#
> > +# Command to start release dynamic capacity extents flow. The host will
>
> Article again.
>
> The host? In cxl-add-dynamic-capacity's doc comment, it's the device.
>
> > +# need to respond to indicate that it has released the capacity before it
> > +# is made unavailable for read and write and can be re-added.
>
> Is "and can be re-added" relevant here?
>
> > +#
> > +# @path: CXL DCD canonical QOM path
> > +# @region-id: id of the region where the extent to release
> > +# @extents: Extents to release
> > +#
> > +# Since : 9.0
>
> 9.1
>
> > +##
> > +{ 'command': 'cxl-release-dynamic-capacity',
> > + 'data': { 'path': 'str',
> > + 'region-id': 'uint8',
> > + 'extents': [ 'CXLDCExtentRecord' ]
> > + }
> > +}
>
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 09/13] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
2024-04-24 17:33 ` Ira Weiny
@ 2024-04-26 15:55 ` Jonathan Cameron via
0 siblings, 0 replies; 81+ messages in thread
From: Jonathan Cameron @ 2024-04-26 15:55 UTC (permalink / raw)
To: Ira Weiny
Cc: Markus Armbruster, nifan.cxl, qemu-devel, linux-cxl,
gregory.price, dan.j.williams, a.manzanares, dave,
nmtadam.samsung, jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni
On Wed, 24 Apr 2024 10:33:33 -0700
Ira Weiny <ira.weiny@intel.com> wrote:
> Markus Armbruster wrote:
> > nifan.cxl@gmail.com writes:
> >
> > > From: Fan Ni <fan.ni@samsung.com>
> > >
> > > Since fabric manager emulation is not supported yet, the change implements
> > > the functions to add/release dynamic capacity extents as QMP interfaces.
> >
> > Will fabric manager emulation obsolete these commands?
>
> I don't think so. In the development of the kernel, I see these being
> valuable to do CI and regression testing without the complexity of an FM.
Fully agree - I also long term see these as the drivers for one
possible virtualization stack for DCD devices (whether it turns
out to be the way forwards for that is going to take a while to
resolve!)
It doesn't make much sense to add a fabric manager into that flow
or to expose an appropriate (maybe MCTP) interface from QEMU just
to poke the emulated device.
Jonathan
>
> Ira
>
> >
> > > Note: we skips any FM issued extent release request if the exact extent
> > > does not exist in the extent list of the device. We will loose the
> > > restriction later once we have partial release support in the kernel.
> > >
> > > 1. Add dynamic capacity extents:
> > >
> > > For example, the command to add two continuous extents (each 128MiB long)
> > > to region 0 (starting at DPA offset 0) looks like below:
> > >
> > > { "execute": "qmp_capabilities" }
> > >
> > > { "execute": "cxl-add-dynamic-capacity",
> > > "arguments": {
> > > "path": "/machine/peripheral/cxl-dcd0",
> > > "region-id": 0,
> > > "extents": [
> > > {
> > > "dpa": 0,
> > > "len": 134217728
> > > },
> > > {
> > > "dpa": 134217728,
> > > "len": 134217728
> > > }
> > > ]
> > > }
> > > }
> > >
> > > 2. Release dynamic capacity extents:
> > >
> > > For example, the command to release an extent of size 128MiB from region 0
> > > (DPA offset 128MiB) look like below:
> > >
> > > { "execute": "cxl-release-dynamic-capacity",
> > > "arguments": {
> > > "path": "/machine/peripheral/cxl-dcd0",
> > > "region-id": 0,
> > > "extents": [
> > > {
> > > "dpa": 134217728,
> > > "len": 134217728
> > > }
> > > ]
> > > }
> > > }
> > >
> > > Signed-off-by: Fan Ni <fan.ni@samsung.com>
> >
> > [...]
> >
> > > diff --git a/qapi/cxl.json b/qapi/cxl.json
> > > index 8cc4c72fa9..2645004666 100644
> > > --- a/qapi/cxl.json
> > > +++ b/qapi/cxl.json
> > > @@ -19,13 +19,16 @@
> > > #
> > > # @fatal: Fatal Event Log
> > > #
> > > +# @dyncap: Dynamic Capacity Event Log
> > > +#
> > > # Since: 8.1
> > > ##
> > > { 'enum': 'CxlEventLog',
> > > 'data': ['informational',
> > > 'warning',
> > > 'failure',
> > > - 'fatal']
> > > + 'fatal',
> > > + 'dyncap']
> >
> > We tend to avoid abbreviations in QMP identifiers: dynamic-capacity.
> >
> > > }
> > >
> > > ##
> > > @@ -361,3 +364,59 @@
> > > ##
> > > {'command': 'cxl-inject-correctable-error',
> > > 'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
> > > +
> > > +##
> > > +# @CXLDCExtentRecord:
> >
> > Such traffic jams of capital letters are hard to read.
> >
> > What does DC mean?
> >
> > > +#
> > > +# Record of a single extent to add/release
> > > +#
> > > +# @offset: offset to the start of the region where the extent to be operated
> >
> > Blank line here, please
> >
> > > +# @len: length of the extent
> > > +#
> > > +# Since: 9.0
> > > +##
> > > +{ 'struct': 'CXLDCExtentRecord',
> > > + 'data': {
> > > + 'offset':'uint64',
> > > + 'len': 'uint64'
> > > + }
> > > +}
> > > +
> > > +##
> > > +# @cxl-add-dynamic-capacity:
> > > +#
> > > +# Command to start add dynamic capacity extents flow. The device will
> >
> > I think we're missing an article here. Is it "a flow" or "the flow"?
> >
> > > +# have to acknowledged the acceptance of the extents before they are usable.
> >
> > to acknowledge
> >
> > docs/devel/qapi-code-gen.rst:
> >
> > For legibility, wrap text paragraphs so every line is at most 70
> > characters long.
> >
> > Separate sentences with two spaces.
> >
> > > +#
> > > +# @path: CXL DCD canonical QOM path
> >
> > What is a CXL DCD? Is it a device?
> >
> > I'd prefer @qom-path, unless you can make a consistency argument for
> > @path.
> >
> > > +# @region-id: id of the region where the extent to add
> >
> > What's a region, and how do they get their IDs?
> >
> > > +# @extents: Extents to add
> >
> > Blank lines between argument descriptions, please.
> >
> > > +#
> > > +# Since : 9.0
> >
> > 9.1
> >
> > > +##
> > > +{ 'command': 'cxl-add-dynamic-capacity',
> > > + 'data': { 'path': 'str',
> > > + 'region-id': 'uint8',
> > > + 'extents': [ 'CXLDCExtentRecord' ]
> > > + }
> > > +}
> > > +
> > > +##
> > > +# @cxl-release-dynamic-capacity:
> > > +#
> > > +# Command to start release dynamic capacity extents flow. The host will
> >
> > Article again.
> >
> > The host? In cxl-add-dynamic-capacity's doc comment, it's the device.
> >
> > > +# need to respond to indicate that it has released the capacity before it
> > > +# is made unavailable for read and write and can be re-added.
> >
> > Is "and can be re-added" relevant here?
> >
> > > +#
> > > +# @path: CXL DCD canonical QOM path
> > > +# @region-id: id of the region where the extent to release
> > > +# @extents: Extents to release
> > > +#
> > > +# Since : 9.0
> >
> > 9.1
> >
> > > +##
> > > +{ 'command': 'cxl-release-dynamic-capacity',
> > > + 'data': { 'path': 'str',
> > > + 'region-id': 'uint8',
> > > + 'extents': [ 'CXLDCExtentRecord' ]
> > > + }
> > > +}
> >
>
>
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 09/13] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
2024-04-26 15:55 ` Jonathan Cameron via
(?)
@ 2024-04-26 16:22 ` Gregory Price
-1 siblings, 0 replies; 81+ messages in thread
From: Gregory Price @ 2024-04-26 16:22 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Ira Weiny, Markus Armbruster, nifan.cxl, qemu-devel, linux-cxl,
dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
Jorgen.Hansen, wj28.lee, Fan Ni
On Fri, Apr 26, 2024 at 04:55:55PM +0100, Jonathan Cameron wrote:
> On Wed, 24 Apr 2024 10:33:33 -0700
> Ira Weiny <ira.weiny@intel.com> wrote:
>
> > Markus Armbruster wrote:
> > > nifan.cxl@gmail.com writes:
> > >
> > > > From: Fan Ni <fan.ni@samsung.com>
> > > >
> > > > Since fabric manager emulation is not supported yet, the change implements
> > > > the functions to add/release dynamic capacity extents as QMP interfaces.
> > >
> > > Will fabric manager emulation obsolete these commands?
> >
> > I don't think so. In the development of the kernel, I see these being
> > valuable to do CI and regression testing without the complexity of an FM.
>
> Fully agree - I also long term see these as the drivers for one
> possible virtualization stack for DCD devices (whether it turns
> out to be the way forwards for that is going to take a while to
> resolve!)
>
> It doesn't make much sense to add a fabric manager into that flow
> or to expose an appropriate (maybe MCTP) interface from QEMU just
> to poke the emulated device.
>
> Jonathan
>
fwiw it's useful in modeling the Orchestrator/Fabric Manager interaction,
since you can basically build a little emulated MHD FM-LD on top of this.
You basically just put a tiny software layer in front that converts what
would be MCTP or whatever commands into QMP commands forwarded to the
appropriate socket.
When a real device comes around, you just point it at the real thing
instead of that small software layer.
But for the actual fabric manager, less useful. (Also, if you're
confused, it's because fabric manager is such an overloaded term
*laughcry*)
~Gregory
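
A minimal sketch of the "tiny software layer" idea above, assuming the
layer has already decoded an incoming request into an operation, a QOM
path, a region number and a list of (offset, length) pairs. The function
and table names are hypothetical. The argument names follow the schema
under review in this thread ('path', 'region-id', 'offset', 'len'); the
commit message example spells the extent field "dpa", so the final name
is an assumption.

```python
import json

# Hypothetical translation layer: the names and the decoded request shape
# are illustrative, not part of QEMU or of any fabric manager API.
QMP_COMMANDS = {
    "add": "cxl-add-dynamic-capacity",
    "release": "cxl-release-dynamic-capacity",
}


def to_qmp(op, qom_path, region_id, extents):
    """Build the QMP command (as a JSON string) for an add/release request.

    extents is a list of (offset, length) pairs in bytes.
    """
    if op not in QMP_COMMANDS:
        raise ValueError(f"unsupported operation: {op!r}")
    command = {
        "execute": QMP_COMMANDS[op],
        "arguments": {
            "path": qom_path,
            "region-id": region_id,
            # Field names follow the schema under review ('offset', 'len');
            # the commit message example uses "dpa" for the first one.
            "extents": [{"offset": off, "len": length}
                        for off, length in extents],
        },
    }
    return json.dumps(command)


MIB = 1024 * 1024
# Two contiguous 128 MiB extents in region 0, as in the commit message.
print(to_qmp("add", "/machine/peripheral/cxl-dcd0", 0,
             [(0, 128 * MIB), (128 * MIB, 128 * MIB)]))
```

The resulting string would be written to the QMP socket after the usual
qmp_capabilities handshake; pointing at a real device would mean
replacing only this function.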
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 09/13] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
2024-04-24 13:09 ` Markus Armbruster
2024-04-24 17:10 ` fan
2024-04-24 17:33 ` Ira Weiny
@ 2024-04-24 17:39 ` fan
2024-04-25 5:48 ` Markus Armbruster
2 siblings, 1 reply; 81+ messages in thread
From: fan @ 2024-04-24 17:39 UTC (permalink / raw)
To: Markus Armbruster
Cc: nifan.cxl, qemu-devel, jonathan.cameron, linux-cxl,
gregory.price, ira.weiny, dan.j.williams, a.manzanares, dave,
nmtadam.samsung, jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni
On Wed, Apr 24, 2024 at 03:09:52PM +0200, Markus Armbruster wrote:
> nifan.cxl@gmail.com writes:
>
> > From: Fan Ni <fan.ni@samsung.com>
> >
> > Since fabric manager emulation is not supported yet, the change implements
> > the functions to add/release dynamic capacity extents as QMP interfaces.
>
> Will fabric manager emulation obsolete these commands?
If fabric manager emulation supports commands for dynamic capacity
extent add/release in the future, we may no longer need these commands.
But that does not seem likely to happen soon, and we need the QMP
commands for end-to-end testing with kernel DCD support.
>
> > Note: we skips any FM issued extent release request if the exact extent
> > does not exist in the extent list of the device. We will loose the
> > restriction later once we have partial release support in the kernel.
> >
> > 1. Add dynamic capacity extents:
> >
> > For example, the command to add two continuous extents (each 128MiB long)
> > to region 0 (starting at DPA offset 0) looks like below:
> >
> > { "execute": "qmp_capabilities" }
> >
> > { "execute": "cxl-add-dynamic-capacity",
> > "arguments": {
> > "path": "/machine/peripheral/cxl-dcd0",
> > "region-id": 0,
> > "extents": [
> > {
> > "dpa": 0,
> > "len": 134217728
> > },
> > {
> > "dpa": 134217728,
> > "len": 134217728
> > }
> > ]
> > }
> > }
> >
> > 2. Release dynamic capacity extents:
> >
> > For example, the command to release an extent of size 128MiB from region 0
> > (DPA offset 128MiB) look like below:
> >
> > { "execute": "cxl-release-dynamic-capacity",
> > "arguments": {
> > "path": "/machine/peripheral/cxl-dcd0",
> > "region-id": 0,
> > "extents": [
> > {
> > "dpa": 134217728,
> > "len": 134217728
> > }
> > ]
> > }
> > }
> >
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
>
> [...]
>
> > diff --git a/qapi/cxl.json b/qapi/cxl.json
> > index 8cc4c72fa9..2645004666 100644
> > --- a/qapi/cxl.json
> > +++ b/qapi/cxl.json
> > @@ -19,13 +19,16 @@
> > #
> > # @fatal: Fatal Event Log
> > #
> > +# @dyncap: Dynamic Capacity Event Log
> > +#
> > # Since: 8.1
> > ##
> > { 'enum': 'CxlEventLog',
> > 'data': ['informational',
> > 'warning',
> > 'failure',
> > - 'fatal']
> > + 'fatal',
> > + 'dyncap']
>
> We tend to avoid abbreviations in QMP identifiers: dynamic-capacity.
FYI: this has been removed in the latest post to avoid a potential
side effect.
v7: https://lore.kernel.org/linux-cxl/ZiaFYUB6FC9NR7W4@memverge.com/T/#t
>
> > }
> >
> > ##
> > @@ -361,3 +364,59 @@
> > ##
> > {'command': 'cxl-inject-correctable-error',
> > 'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
> > +
> > +##
> > +# @CXLDCExtentRecord:
>
> Such traffic jams of capital letters are hard to read.
>
> What does DC mean?
Dynamic capacity
>
> > +#
> > +# Record of a single extent to add/release
> > +#
> > +# @offset: offset to the start of the region where the extent to be operated
>
> Blank line here, please
>
> > +# @len: length of the extent
> > +#
> > +# Since: 9.0
> > +##
> > +{ 'struct': 'CXLDCExtentRecord',
> > + 'data': {
> > + 'offset':'uint64',
> > + 'len': 'uint64'
> > + }
> > +}
> > +
> > +##
> > +# @cxl-add-dynamic-capacity:
> > +#
> > +# Command to start add dynamic capacity extents flow. The device will
>
> I think we're missing an article here. Is it "a flow" or "the flow"?
>
> > +# have to acknowledged the acceptance of the extents before they are usable.
>
> to acknowledge
It should be "to be acknowledged".
>
> docs/devel/qapi-code-gen.rst:
>
> For legibility, wrap text paragraphs so every line is at most 70
> characters long.
>
> Separate sentences with two spaces.
Thanks. Will fix.
>
> > +#
> > +# @path: CXL DCD canonical QOM path
>
> What is a CXL DCD? Is it a device?
Dynamic capacity device.
Yes. It is a CXL memory device that can change its capacity dynamically.
>
> I'd prefer @qom-path, unless you can make a consistency argument for
> @path.
>
> > +# @region-id: id of the region where the extent to add
>
> What's a region, and how do they get their IDs?
Each DCD can support up to 8 regions (IDs 0-7).
>
> > +# @extents: Extents to add
>
> Blank lines between argument descriptions, please.
>
> > +#
> > +# Since : 9.0
>
> 9.1
Already fixed in the latest post.
>
> > +##
> > +{ 'command': 'cxl-add-dynamic-capacity',
> > + 'data': { 'path': 'str',
> > + 'region-id': 'uint8',
> > + 'extents': [ 'CXLDCExtentRecord' ]
> > + }
> > +}
> > +
> > +##
> > +# @cxl-release-dynamic-capacity:
> > +#
> > +# Command to start release dynamic capacity extents flow. The host will
>
> Article again.
>
> The host? In cxl-add-dynamic-capacity's doc comment, it's the device.
For the add command, the host sends a mailbox command to the device in
response to the add request, indicating whether it accepts the offered
capacity or not.
For the release command, the host sends a mailbox command to the device
asking it to release the capacity. This is not always a response, since
the host can proactively release capacity it no longer needs.
But yes, the text needs to be polished.
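
To make "which side initiates what" concrete, here is a toy model of a
single extent, written only from the description in this thread. The
class, state and method names are all hypothetical and this is not how
QEMU tracks extents; it only illustrates the ordering of the two flows.

```python
from enum import Enum


class ExtentState(Enum):
    OFFERED = "offered"                      # device offered it, host has not answered
    ACCEPTED = "accepted"                    # host accepted, extent is usable
    REJECTED = "rejected"                    # host declined the offer
    RELEASE_REQUESTED = "release-requested"  # device asked the host to give it back
    RELEASED = "released"                    # host released it, no longer usable


class Extent:
    """Toy model of one dynamic capacity extent (illustrative only)."""

    def __init__(self):
        # Created when the add command makes the device offer the extent.
        self.state = ExtentState.OFFERED

    def host_responds_to_add(self, accept):
        # The host answers the offer with a mailbox command.
        if self.state is not ExtentState.OFFERED:
            raise RuntimeError("no pending offer")
        self.state = ExtentState.ACCEPTED if accept else ExtentState.REJECTED

    def device_requests_release(self):
        # Started by the release command.
        if self.state is not ExtentState.ACCEPTED:
            raise RuntimeError("extent is not in use")
        self.state = ExtentState.RELEASE_REQUESTED

    def host_releases(self):
        # Allowed with or without a prior request, since the host may
        # release capacity proactively.
        if self.state not in (ExtentState.ACCEPTED,
                              ExtentState.RELEASE_REQUESTED):
            raise RuntimeError("extent is not in use")
        self.state = ExtentState.RELEASED

    @property
    def usable(self):
        # The extent stays usable until the host has actually released it.
        return self.state in (ExtentState.ACCEPTED,
                              ExtentState.RELEASE_REQUESTED)
```

In this model the QMP commands only ever trigger the device-side steps
(the offer and the release request); every transition that changes
whether the extent is usable is driven by the host.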
>
> > +# need to respond to indicate that it has released the capacity before it
> > +# is made unavailable for read and write and can be re-added.
>
> Is "and can be re-added" relevant here?
Not really. Will fix.
>
> > +#
> > +# @path: CXL DCD canonical QOM path
> > +# @region-id: id of the region where the extent to release
> > +# @extents: Extents to release
> > +#
> > +# Since : 9.0
>
> 9.1
Already fixed in the latest post.
Thanks again for the review. Will take care of the comments in the next
version.
Fan
>
> > +##
> > +{ 'command': 'cxl-release-dynamic-capacity',
> > + 'data': { 'path': 'str',
> > + 'region-id': 'uint8',
> > + 'extents': [ 'CXLDCExtentRecord' ]
> > + }
> > +}
>
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 09/13] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
2024-04-24 17:39 ` fan
@ 2024-04-25 5:48 ` Markus Armbruster
2024-04-25 17:30 ` Ira Weiny
0 siblings, 1 reply; 81+ messages in thread
From: Markus Armbruster @ 2024-04-25 5:48 UTC (permalink / raw)
To: fan
Cc: qemu-devel, jonathan.cameron, linux-cxl, gregory.price,
ira.weiny, dan.j.williams, a.manzanares, dave, nmtadam.samsung,
jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni
fan <nifan.cxl@gmail.com> writes:
> On Wed, Apr 24, 2024 at 03:09:52PM +0200, Markus Armbruster wrote:
>> nifan.cxl@gmail.com writes:
>>
>> > From: Fan Ni <fan.ni@samsung.com>
>> >
>> > Since fabric manager emulation is not supported yet, the change implements
>> > the functions to add/release dynamic capacity extents as QMP interfaces.
>>
>> Will fabric manager emulation obsolete these commands?
>
> If in the future, fabric manager emulation supports commands for dynamic capacity
> extent add/release, it is possible we do not need the commands.
> But it seems not to happen soon, we need the qmp commands for the
> end-to-end test with kernel DCD support.
I asked because if the commands are temporary testing aids, they should
probably be declared unstable. Even if they are permanent testing aids,
unstable might be the right choice. This is for the CXL maintainers to
decide.
What does "unstable" mean? docs/devel/qapi-code-gen.rst: "Interfaces so
marked may be withdrawn or changed incompatibly in future releases."
Management applications need stable interfaces. Libvirt developers
generally refuse to touch anything in QMP that's declared unstable.
Human users and their ad hoc scripts appreciate stability, but they
don't need it nearly as much as management applications do.
A stability promise increases the maintenance burden. By how much is
unclear. In other words, by promising stability, the maintainers take
on risk. Are the CXL maintainers happy to accept the risk here?
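
For reference, a rough sketch of what "declared unstable" would look
like, assuming the special feature 'unstable' described in
docs/devel/qapi-code-gen.rst. The command definition is the one from the
patch with a 'features' member added; it is written as a Python dict
that mirrors the schema expression, and the helper is_unstable() is
purely illustrative.

```python
# Hypothetical variant of the command definition from the patch, written
# as a Python dict that mirrors the QAPI schema expression.
add_command = {
    "command": "cxl-add-dynamic-capacity",
    "data": {
        "path": "str",
        "region-id": "uint8",
        "extents": ["CXLDCExtentRecord"],
    },
    # This member is the only addition: it marks the command unstable.
    "features": ["unstable"],
}


def is_unstable(expr):
    """Return True if a schema expression carries the 'unstable' feature.

    A feature may be given as a plain string or as a dict with a 'name'
    key; both spellings are handled.
    """
    features = expr.get("features", [])
    names = [f["name"] if isinstance(f, dict) else f for f in features]
    return "unstable" in names


print(is_unstable(add_command))
```

Whether to add the feature is exactly the decision being asked of the
CXL maintainers here; the sketch only shows how small the schema change
would be.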
>> > Note: we skips any FM issued extent release request if the exact extent
>> > does not exist in the extent list of the device. We will loose the
>> > restriction later once we have partial release support in the kernel.
>> >
>> > 1. Add dynamic capacity extents:
>> >
>> > For example, the command to add two continuous extents (each 128MiB long)
>> > to region 0 (starting at DPA offset 0) looks like below:
>> >
>> > { "execute": "qmp_capabilities" }
>> >
>> > { "execute": "cxl-add-dynamic-capacity",
>> > "arguments": {
>> > "path": "/machine/peripheral/cxl-dcd0",
>> > "region-id": 0,
>> > "extents": [
>> > {
>> > "dpa": 0,
>> > "len": 134217728
>> > },
>> > {
>> > "dpa": 134217728,
>> > "len": 134217728
>> > }
>> > ]
>> > }
>> > }
>> >
>> > 2. Release dynamic capacity extents:
>> >
>> > For example, the command to release an extent of size 128MiB from region 0
>> > (DPA offset 128MiB) look like below:
>> >
>> > { "execute": "cxl-release-dynamic-capacity",
>> > "arguments": {
>> > "path": "/machine/peripheral/cxl-dcd0",
>> > "region-id": 0,
>> > "extents": [
>> > {
>> > "dpa": 134217728,
>> > "len": 134217728
>> > }
>> > ]
>> > }
>> > }
>> >
>> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
>>
>> [...]
>>
>> > diff --git a/qapi/cxl.json b/qapi/cxl.json
>> > index 8cc4c72fa9..2645004666 100644
>> > --- a/qapi/cxl.json
>> > +++ b/qapi/cxl.json
>> > @@ -19,13 +19,16 @@
>> > #
>> > # @fatal: Fatal Event Log
>> > #
>> > +# @dyncap: Dynamic Capacity Event Log
>> > +#
>> > # Since: 8.1
>> > ##
>> > { 'enum': 'CxlEventLog',
>> > 'data': ['informational',
>> > 'warning',
>> > 'failure',
>> > - 'fatal']
>> > + 'fatal',
>> > + 'dyncap']
>>
>> We tend to avoid abbreviations in QMP identifiers: dynamic-capacity.
>
> FYI. This has been removed to avoid the potential side effect in the
> latest post.
> v7: https://lore.kernel.org/linux-cxl/ZiaFYUB6FC9NR7W4@memverge.com/T/#t
>
>>
>> > }
>> >
>> > ##
>> > @@ -361,3 +364,59 @@
>> > ##
>> > {'command': 'cxl-inject-correctable-error',
>> > 'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
>> > +
>> > +##
>> > +# @CXLDCExtentRecord:
>>
>> Such traffic jams of capital letters are hard to read.
>>
>> What does DC mean?
>
> Dynamic capacity
Suggest CxlDynamicCapacityExtent.
>> > +#
>> > +# Record of a single extent to add/release
>> > +#
>> > +# @offset: offset to the start of the region where the extent to be operated
>>
>> Blank line here, please
>>
>> > +# @len: length of the extent
>> > +#
>> > +# Since: 9.0
>> > +##
>> > +{ 'struct': 'CXLDCExtentRecord',
>> > + 'data': {
>> > + 'offset':'uint64',
>> > + 'len': 'uint64'
>> > + }
>> > +}
>> > +
>> > +##
>> > +# @cxl-add-dynamic-capacity:
>> > +#
>> > +# Command to start add dynamic capacity extents flow. The device will
>>
>> I think we're missing an article here. Is it "a flow" or "the flow"?
>>
>> > +# have to acknowledged the acceptance of the extents before they are usable.
>>
>> to acknowledge
>
> It should be "to be acknowledged".
>
>>
>> docs/devel/qapi-code-gen.rst:
>>
>> For legibility, wrap text paragraphs so every line is at most 70
>> characters long.
>>
>> Separate sentences with two spaces.
>
> Thanks. Will fix.
>>
>> > +#
>> > +# @path: CXL DCD canonical QOM path
>>
>> What is a CXL DCD? Is it a device?
>
> Dynamic capacity device.
> Yes. It is cxl memory device that can change capacity dynamically.
Are you sure the QOM path needs to be canonical?
If not, what about "path to the CXL dynamic capacity device in the QOM
tree"? This is intentionally close to existing descriptions of @qom-path
elsewhere.
>> I'd prefer @qom-path, unless you can make a consistency argument for
>> @path.
>>
>> > +# @region-id: id of the region where the extent to add
>>
>> What's a region, and how do they get their IDs?
>
> Each DCD device can support up to 8 regions (0-7).
Is "region ID" the established terminology in CXL-land? Or is "region
number" also used? I'm asking because "ID" in this QEMU device context
suggests a connection to a qdev ID.
If region number is fine, I'd rename to just @region, and rephrase the
description to avoid "ID". Perhaps "number of the region the extent is
to be added to". Not entirely happy with the phrasing, doesn't exactly
roll off the tongue, but "where the extent to add" sounds worse to my
ears. Mind, I'm not a native speaker.
>> > +# @extents: Extents to add
>>
>> Blank lines between argument descriptions, please.
>>
>> > +#
>> > +# Since : 9.0
>>
>> 9.1
>
> Already fixed in the latest post.
>
>>
>> > +##
>> > +{ 'command': 'cxl-add-dynamic-capacity',
>> > + 'data': { 'path': 'str',
>> > + 'region-id': 'uint8',
>> > + 'extents': [ 'CXLDCExtentRecord' ]
>> > + }
>> > +}
>> > +
>> > +##
>> > +# @cxl-release-dynamic-capacity:
>> > +#
>> > +# Command to start release dynamic capacity extents flow. The host will
>>
>> Article again.
>>
>> The host? In cxl-add-dynamic-capacity's doc comment, it's the device.
>
> For add command, the host will send a mailbox command to response to the add
> request to the device to indicate whether it accepts the add capacity offer
> or not.
>
> For release command, the host send a mailbox command (not always a response
> since the host can proactively release capacity if it does not need it
> any more) to device to ask device release the capacity.
>
> But yes, the text needs to be polished.
Please do. You may have to briefly explain which peer initiates what
for this to make sense.
>> > +# need to respond to indicate that it has released the capacity before it
>> > +# is made unavailable for read and write and can be re-added.
>>
>> Is "and can be re-added" relevant here?
>
> Not really. Will fix.
>
>>
>> > +#
>> > +# @path: CXL DCD canonical QOM path
>> > +# @region-id: id of the region where the extent to release
>> > +# @extents: Extents to release
>> > +#
>> > +# Since : 9.0
>>
>> 9.1
>
> Already fixed in the latest post.
>
> Thanks again for the review. Will take care of the comments in the next
> version.
You're welcome!
> Fan
>>
>> > +##
>> > +{ 'command': 'cxl-release-dynamic-capacity',
>> > + 'data': { 'path': 'str',
>> > + 'region-id': 'uint8',
>> > + 'extents': [ 'CXLDCExtentRecord' ]
>> > + }
>> > +}
>>
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 09/13] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
2024-04-25 5:48 ` Markus Armbruster
@ 2024-04-25 17:30 ` Ira Weiny
2024-04-26 16:00 ` Jonathan Cameron via
0 siblings, 1 reply; 81+ messages in thread
From: Ira Weiny @ 2024-04-25 17:30 UTC (permalink / raw)
To: Markus Armbruster, fan
Cc: qemu-devel, jonathan.cameron, linux-cxl, gregory.price,
ira.weiny, dan.j.williams, a.manzanares, dave, nmtadam.samsung,
jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni
Markus Armbruster wrote:
> fan <nifan.cxl@gmail.com> writes:
>
> > On Wed, Apr 24, 2024 at 03:09:52PM +0200, Markus Armbruster wrote:
> >> nifan.cxl@gmail.com writes:
> >>
> >> > From: Fan Ni <fan.ni@samsung.com>
> >> >
> >> > Since fabric manager emulation is not supported yet, the change implements
> >> > the functions to add/release dynamic capacity extents as QMP interfaces.
> >>
> >> Will fabric manager emulation obsolete these commands?
> >
> > If in the future, fabric manager emulation supports commands for dynamic capacity
> > extent add/release, it is possible we do not need the commands.
> > But it seems not to happen soon, we need the qmp commands for the
> > end-to-end test with kernel DCD support.
>
> I asked because if the commands are temporary testing aids, they should
> probably be declared unstable. Even if they are permanent testing aids,
> unstable might be the right choice. This is for the CXL maintainers to
> decide.
>
> What does "unstable" mean? docs/devel/qapi-code-gen.rst: "Interfaces so
> marked may be withdrawn or changed incompatibly in future releases."
>
> Management applications need stable interfaces. Libvirt developers
> generally refuse to touch anything in QMP that's declared unstable.
>
> Human users and their ad hoc scripts appreciate stability, but they
> don't need it nearly as much as management applications do.
>
> A stability promise increases the maintenance burden. By how much is
> unclear. In other words, by promising stability, the maintainers take
> on risk. Are the CXL maintainers happy to accept the risk here?
>
Ah... All great points.
Outside of CXL development I don't think there is a strong need for them
to be stable. I would like to see more than ad hoc scripts use them,
though, so I don't think they are going to be changed without some
thought.
Ira
[snip]
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 09/13] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
2024-04-25 17:30 ` Ira Weiny
@ 2024-04-26 16:00 ` Jonathan Cameron via
0 siblings, 0 replies; 81+ messages in thread
From: Jonathan Cameron @ 2024-04-26 16:00 UTC (permalink / raw)
To: Ira Weiny
Cc: Markus Armbruster, fan, qemu-devel, linux-cxl, gregory.price,
dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
Jorgen.Hansen, wj28.lee, Fan Ni
On Thu, 25 Apr 2024 10:30:51 -0700
Ira Weiny <ira.weiny@intel.com> wrote:
> Markus Armbruster wrote:
> > fan <nifan.cxl@gmail.com> writes:
> >
> > > On Wed, Apr 24, 2024 at 03:09:52PM +0200, Markus Armbruster wrote:
> > >> nifan.cxl@gmail.com writes:
> > >>
> > >> > From: Fan Ni <fan.ni@samsung.com>
> > >> >
> > >> > Since fabric manager emulation is not supported yet, the change implements
> > >> > the functions to add/release dynamic capacity extents as QMP interfaces.
> > >>
> > >> Will fabric manager emulation obsolete these commands?
> > >
> > > If in the future, fabric manager emulation supports commands for dynamic capacity
> > > extent add/release, it is possible we do not need the commands.
> > > But it seems not to happen soon, we need the qmp commands for the
> > > end-to-end test with kernel DCD support.
> >
> > I asked because if the commands are temporary testing aids, they should
> > probably be declared unstable. Even if they are permanent testing aids,
> > unstable might be the right choice. This is for the CXL maintainers to
> > decide.
> >
> > What does "unstable" mean? docs/devel/qapi-code-gen.rst: "Interfaces so
> > marked may be withdrawn or changed incompatibly in future releases."
> >
> > Management applications need stable interfaces. Libvirt developers
> > generally refuse to touch anything in QMP that's declared unstable.
> >
> > Human users and their ad hoc scripts appreciate stability, but they
> > don't need it nearly as much as management applications do.
> >
> > A stability promise increases the maintenance burden. By how much is
> > unclear. In other words, by promising stability, the maintainers take
> > on risk. Are the CXL maintainers happy to accept the risk here?
> >
>
> Ah... All great points.
>
> Outside of CXL development I don't think there is a strong need for them
> to be stable. I would like to see more than ad hoc scripts use them
> though. So I don't think they are going to be changed without some
> thought though.
These align closely with the data that comes from the fabric management
API in the CXL spec. So I don't see a big maintenance burden problem
in having these as stable interfaces. Whilst they aren't doing quite
the same job as the FM-API (which will be emulated such that it is
visible to the guest as that aids some other types of testing) that
interface defines the limits on what we can tell the device to do.
So yes, risk for these is minimal and I'm happy to accept that.
It'll be a while before we need libvirt to use them but I do
expect to see that happen. (subject to some guessing on a future
virtualization stack!)
Jonathan
>
> Ira
>
> [snip]
^ permalink raw reply [flat|nested] 81+ messages in thread
* [PATCH v5 10/13] hw/mem/cxl_type3: Add dpa range validation for accesses to DC regions
2024-03-04 19:33 [PATCH v5 00/13] Enabling DCD emulation support in Qemu nifan.cxl
` (8 preceding siblings ...)
2024-03-04 19:34 ` [PATCH v5 09/13] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents nifan.cxl
@ 2024-03-04 19:34 ` nifan.cxl
2024-03-06 17:50 ` Jonathan Cameron via
2024-03-04 19:34 ` [PATCH v5 11/13] hw/cxl/cxl-mailbox-utils: Add partial and superset extent release mailbox support nifan.cxl
` (2 subsequent siblings)
12 siblings, 1 reply; 81+ messages in thread
From: nifan.cxl @ 2024-03-04 19:34 UTC (permalink / raw)
To: qemu-devel
Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni
From: Fan Ni <fan.ni@samsung.com>
Not all dpa range in the DC regions is valid to access until an extent
covering the range has been added. Add a bitmap for each region to
record whether a DC block in the region has been backed by DC extent.
For the bitmap, a bit in the bitmap represents a DC block. When a DC
extent is added, all the bits of the blocks in the extent will be set,
which will be cleared when the extent is released.
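To make the bookkeeping above concrete, here is a minimal standalone sketch of a one-bit-per-block map. It is not the QEMU code: DemoRegion and the demo_* helpers are hypothetical names, it uses plain C instead of QEMU's bitmap helpers, and it assumes that marked ranges are block-aligned and lie inside the region.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for a DC region; only the fields the sketch needs. */
typedef struct {
    uint64_t base;        /* first DPA of the region */
    uint64_t len;         /* region length in bytes */
    uint64_t block_size;  /* DC block size in bytes */
    unsigned long *bits;  /* one bit per DC block, 1 = backed by an extent */
} DemoRegion;

/* Allocate a zeroed bitmap: no block is backed until an extent is added. */
static bool demo_region_init(DemoRegion *r, uint64_t base, uint64_t len,
                             uint64_t block_size)
{
    uint64_t nblocks = len / block_size;

    r->base = base;
    r->len = len;
    r->block_size = block_size;
    r->bits = calloc((nblocks + DEMO_BITS_PER_WORD - 1) / DEMO_BITS_PER_WORD,
                     sizeof(unsigned long));
    return r->bits != NULL;
}

/* Set (extent added) or clear (extent released) the bits of [dpa, dpa + len). */
static void demo_mark(DemoRegion *r, uint64_t dpa, uint64_t len, bool backed)
{
    assert(dpa >= r->base && dpa + len <= r->base + r->len);

    uint64_t first = (dpa - r->base) / r->block_size;
    uint64_t n = len / r->block_size;

    for (uint64_t i = first; i < first + n; i++) {
        unsigned long mask = 1UL << (i % DEMO_BITS_PER_WORD);

        if (backed) {
            r->bits[i / DEMO_BITS_PER_WORD] |= mask;
        } else {
            r->bits[i / DEMO_BITS_PER_WORD] &= ~mask;
        }
    }
}

/* True only if every block touched by [dpa, dpa + len) is backed. */
static bool demo_is_backed(const DemoRegion *r, uint64_t dpa, uint64_t len)
{
    if (len == 0 || dpa < r->base || dpa + len > r->base + r->len) {
        return false;
    }

    uint64_t first = (dpa - r->base) / r->block_size;
    uint64_t last = (dpa + len - 1 - r->base) / r->block_size;

    for (uint64_t i = first; i <= last; i++) {
        if (!((r->bits[i / DEMO_BITS_PER_WORD] >> (i % DEMO_BITS_PER_WORD)) & 1UL)) {
            return false;
        }
    }
    return true;
}
```

In this sketch an access check derives the first and last touched block from the byte range, so a small unaligned read inside a backed block passes while a range that spills into an unbacked block fails.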
Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
hw/cxl/cxl-mailbox-utils.c | 4 ++
hw/mem/cxl_type3.c | 76 +++++++++++++++++++++++++++++++++++++
include/hw/cxl/cxl_device.h | 7 ++++
3 files changed, 87 insertions(+)
diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 53ebc526ae..b538297bb5 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -1606,6 +1606,7 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
ct3d->dc.total_extent_count += 1;
+ ct3_set_region_block_backed(ct3d, dpa, len);
}
/*
@@ -1681,17 +1682,20 @@ static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
found = true;
cxl_remove_extent_from_extent_list(extent_list, ent);
ct3d->dc.total_extent_count -= 1;
+ ct3_clear_region_block_backed(ct3d, ent_start_dpa, ent_len);
if (len1) {
cxl_insert_extent_to_extent_list(extent_list,
ent_start_dpa, len1,
NULL, 0);
ct3d->dc.total_extent_count += 1;
+ ct3_set_region_block_backed(ct3d, ent_start_dpa, len1);
}
if (len2) {
cxl_insert_extent_to_extent_list(extent_list, dpa + len,
len2, NULL, 0);
ct3d->dc.total_extent_count += 1;
+ ct3_set_region_block_backed(ct3d, dpa + len, len2);
}
break;
} else {
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index e9c8994cdb..c164cf4580 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -672,6 +672,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
region_base += region->len;
ct3d->dc.total_capacity += region->len;
+ region->blk_bitmap = bitmap_new(region->len / region->block_size);
}
QTAILQ_INIT(&ct3d->dc.extents);
QTAILQ_INIT(&ct3d->dc.extents_pending_to_add);
@@ -682,6 +683,8 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
{
CXLDCExtent *ent;
+ int i;
+ CXLDCRegion *region;
while (!QTAILQ_EMPTY(&ct3d->dc.extents)) {
ent = QTAILQ_FIRST(&ct3d->dc.extents);
@@ -693,6 +696,11 @@ static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending_to_add,
ent);
}
+
+ for (i = 0; i < ct3d->dc.num_regions; i++) {
+ region = &ct3d->dc.regions[i];
+ g_free(region->blk_bitmap);
+ }
}
static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
@@ -924,6 +932,70 @@ static void ct3_exit(PCIDevice *pci_dev)
}
}
+/*
+ * Mark the DPA range [dpa, dpa + len) to be backed and accessible. This
+ * happens when a DC extent is added and accepted by the host.
+ */
+void ct3_set_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+ uint64_t len)
+{
+ CXLDCRegion *region;
+
+ region = cxl_find_dc_region(ct3d, dpa, len);
+ if (!region) {
+ return;
+ }
+
+ bitmap_set(region->blk_bitmap, (dpa - region->base) / region->block_size,
+ len / region->block_size);
+}
+
+/*
+ * Check whether the DPA range [dpa, dpa + len) is backed with DC extents.
+ * Used when validating read/write to dc regions
+ */
+bool ct3_test_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+ uint64_t len)
+{
+ CXLDCRegion *region;
+ uint64_t nbits;
+ long nr;
+
+ region = cxl_find_dc_region(ct3d, dpa, len);
+ if (!region) {
+ return false;
+ }
+
+ nr = (dpa - region->base) / region->block_size;
+ nbits = DIV_ROUND_UP(len, region->block_size);
+ /*
+ * if bits between [dpa, dpa + len) are all 1s, meaning the DPA range is
+ * backed with DC extents, return true; else return false.
+ */
+ return find_next_zero_bit(region->blk_bitmap, nr + nbits, nr) == nr + nbits;
+}
+
+/*
+ * Mark the DPA range [dpa, dpa + len) to be unbacked and inaccessible. This
+ * happens when a dc extent is released by the host.
+ */
+void ct3_clear_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+ uint64_t len)
+{
+ CXLDCRegion *region;
+ uint64_t nbits;
+ long nr;
+
+ region = cxl_find_dc_region(ct3d, dpa, len);
+ if (!region) {
+ return;
+ }
+
+ nr = (dpa - region->base) / region->block_size;
+ nbits = len / region->block_size;
+ bitmap_clear(region->blk_bitmap, nr, nbits);
+}
+
static bool cxl_type3_dpa(CXLType3Dev *ct3d, hwaddr host_addr, uint64_t *dpa)
{
int hdm_inc = R_CXL_HDM_DECODER1_BASE_LO - R_CXL_HDM_DECODER0_BASE_LO;
@@ -1029,6 +1101,10 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
*as = &ct3d->hostpmem_as;
*dpa_offset -= vmr_size;
} else {
+ if (!ct3_test_region_block_backed(ct3d, *dpa_offset, size)) {
+ return -ENODEV;
+ }
+
*as = &ct3d->dc.host_dc_as;
*dpa_offset -= (vmr_size + pmr_size);
}
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index b524c5e699..b213149de2 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -450,6 +450,7 @@ typedef struct CXLDCRegion {
uint64_t block_size;
uint32_t dsmadhandle;
uint8_t flags;
+ unsigned long *blk_bitmap;
} CXLDCRegion;
struct CXLType3Dev {
@@ -557,4 +558,10 @@ void cxl_insert_extent_to_extent_list(CXLDCExtentList *list, uint64_t dpa,
uint16_t shared_seq);
bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
unsigned long size);
+void ct3_set_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+ uint64_t len);
+void ct3_clear_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+ uint64_t len);
+bool ct3_test_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+ uint64_t len);
#endif
--
2.43.0
^ permalink raw reply related [flat|nested] 81+ messages in thread
* Re: [PATCH v5 10/13] hw/mem/cxl_type3: Add dpa range validation for accesses to DC regions
2024-03-04 19:34 ` [PATCH v5 10/13] hw/mem/cxl_type3: Add dpa range validation for accesses to DC regions nifan.cxl
@ 2024-03-06 17:50 ` Jonathan Cameron via
0 siblings, 0 replies; 81+ messages in thread
From: Jonathan Cameron @ 2024-03-06 17:50 UTC (permalink / raw)
To: nifan.cxl
Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
wj28.lee, Fan Ni
On Mon, 4 Mar 2024 11:34:05 -0800
nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
>
> Not all dpa range in the DC regions is valid to access until an extent
All DPA ranges in the DC regions are invalid to access until an extent
covering the range has been added.
> covering the range has been added. Add a bitmap for each region to
> record whether a DC block in the region has been backed by DC extent.
> For the bitmap, a bit in the bitmap represents a DC block. When a DC
> extent is added, all the bits of the blocks in the extent will be set,
> which will be cleared when the extent is released.
>
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
^ permalink raw reply [flat|nested] 81+ messages in thread
* [PATCH v5 11/13] hw/cxl/cxl-mailbox-utils: Add partial and superset extent release mailbox support
2024-03-04 19:33 [PATCH v5 00/13] Enabling DCD emulation support in Qemu nifan.cxl
` (9 preceding siblings ...)
2024-03-04 19:34 ` [PATCH v5 10/13] hw/mem/cxl_type3: Add dpa range validation for accesses to DC regions nifan.cxl
@ 2024-03-04 19:34 ` nifan.cxl
2024-03-06 18:09 ` Jonathan Cameron via
2024-03-04 19:34 ` [PATCH v5 12/13] hw/mem/cxl_type3: Allow to release partial extent and extent superset in QMP interface nifan.cxl
2024-03-04 19:34 ` [PATCH v5 13/13] qapi/cxl.json: Add QMP interfaces to print out accepted and pending DC extents nifan.cxl
12 siblings, 1 reply; 81+ messages in thread
From: nifan.cxl @ 2024-03-04 19:34 UTC (permalink / raw)
To: qemu-devel
Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni
From: Fan Ni <fan.ni@samsung.com>
With the change, we extend the extent release mailbox command processing
to allow more flexible release. As long as the DPA range of the extent to
release is covered by valid extent(s) in the device, the release can be
performed.
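The coverage condition above can be illustrated with a small standalone sketch. It is not taken from the patch, which checks coverage through the per-region block bitmap. Here DemoExtent and demo_range_covered are hypothetical names, and the extents are assumed to be sorted by start address and non-overlapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified extent: a start DPA and a length in bytes. */
typedef struct {
    uint64_t start;
    uint64_t len;
} DemoExtent;

/*
 * Return true if [dpa, dpa + len) is fully covered by the extents in exts,
 * possibly by several adjacent ones. Assumes exts is sorted by start and
 * has no overlaps.
 */
static bool demo_range_covered(const DemoExtent *exts, size_t n,
                               uint64_t dpa, uint64_t len)
{
    assert(exts != NULL || n == 0);

    uint64_t pos = dpa;         /* first byte not yet known to be covered */
    uint64_t end = dpa + len;

    if (len == 0) {
        return false;
    }

    for (size_t i = 0; i < n && pos < end; i++) {
        uint64_t e_start = exts[i].start;
        uint64_t e_end = e_start + exts[i].len;

        if (e_end <= pos) {
            continue;           /* extent lies entirely before pos */
        }
        if (e_start > pos) {
            return false;       /* gap between pos and the next extent */
        }
        pos = e_end;            /* covered up to the end of this extent */
    }
    return pos >= end;
}
```

With extents [0, 4), [4, 8) and [12, 16), the range [2, 6) is covered by two adjacent extents and could be released, while [6, 14) hits the gap at [8, 12) and would be rejected.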
Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
hw/cxl/cxl-mailbox-utils.c | 211 +++++++++++++++++++++++++++++++++----
1 file changed, 188 insertions(+), 23 deletions(-)
diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index b538297bb5..eaff5c4c93 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -1617,6 +1617,155 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
return CXL_MBOX_SUCCESS;
}
+/*
+ * Return value: the id of the DC region that covers the DPA range
+ * [dpa, dpa+len) The assumption is that the range is valid and within
+ * a DC region.
+ */
+static uint8_t cxl_find_dc_region_id(const CXLType3Dev *ct3d, uint64_t dpa,
+ uint64_t len)
+{
+ int i;
+ const CXLDCRegion *region;
+
+ for (i = ct3d->dc.num_regions - 1; i >= 0; i--) {
+ region = &ct3d->dc.regions[i];
+ if (dpa >= region->base) {
+ break;
+ }
+ }
+ return i;
+}
+
+/*
+ * Copy extent list from src to dst
+ * Return value: number of extents copied
+ */
+static uint32_t copy_extent_list(CXLDCExtentList *dst,
+ const CXLDCExtentList *src)
+{
+ uint32_t cnt = 0;
+ CXLDCExtent *ent;
+
+ if (!dst || !src) {
+ return 0;
+ }
+
+ QTAILQ_FOREACH(ent, src, node) {
+ cxl_insert_extent_to_extent_list(dst, ent->start_dpa, ent->len,
+ ent->tag, ent->shared_seq);
+ cnt++;
+ }
+ return cnt;
+}
+
+/*
+ * Detect potential extent overflow caused by extent split during processing
+ * extent release requests, also allow releasing superset of extents where the
+ * extent to release covers the range of multiple extents in the device.
+ * Note:
+ * 1.we will reject releasing an extent if some portion of its rang is
+ * not covered by valid extents.
+ * 2.This function is called after cxl_detect_malformed_extent_list so checks
+ * already performed there will be skipped.
+ */
+static CXLRetCode cxl_detect_extent_overflow(const CXLType3Dev *ct3d,
+ const CXLUpdateDCExtentListInPl *in)
+{
+ uint64_t nbits, offset;
+ const CXLDCRegion *region;
+ unsigned long **bitmaps_copied;
+ uint64_t dpa, len;
+ int i, rid;
+ CXLRetCode ret = CXL_MBOX_SUCCESS;
+ long extent_cnt_delta = 0;
+ CXLDCExtentList tmp_list;
+ CXLDCExtent *ent;
+
+ QTAILQ_INIT(&tmp_list);
+ copy_extent_list(&tmp_list, &ct3d->dc.extents);
+
+ bitmaps_copied = g_new0(unsigned long *, ct3d->dc.num_regions);
+ for (i = 0; i < ct3d->dc.num_regions; i++) {
+ region = &ct3d->dc.regions[i];
+ nbits = region->len / region->block_size;
+ bitmaps_copied[i] = bitmap_new(nbits);
+ bitmap_copy(bitmaps_copied[i], region->blk_bitmap, nbits);
+ }
+
+ for (i = 0; i < in->num_entries_updated; i++) {
+ dpa = in->updated_entries[i].start_dpa;
+ len = in->updated_entries[i].len;
+
+ rid = cxl_find_dc_region_id(ct3d, dpa, len);
+ region = &ct3d->dc.regions[rid];
+ offset = (dpa - region->base) / region->block_size;
+ nbits = len / region->block_size;
+
+ /* Check whether range [dpa, dpa + len) is covered by valid range */
+ if (find_next_zero_bit(bitmaps_copied[rid], offset + nbits, offset) <
+ offset + nbits) {
+ ret = CXL_MBOX_INVALID_PA;
+ goto free_and_exit;
+ }
+
+ QTAILQ_FOREACH(ent, &tmp_list, node) {
+ /* Only split within an extent can cause extent count increase */
+ if (ent->start_dpa <= dpa &&
+ dpa + len <= ent->start_dpa + ent->len) {
+ uint64_t ent_start_dpa = ent->start_dpa;
+ uint64_t ent_len = ent->len;
+ uint64_t len1 = dpa - ent_start_dpa;
+ uint64_t len2 = ent_start_dpa + ent_len - dpa - len;
+
+ extent_cnt_delta += len1 && len2 ? 2 : (len1 || len2 ? 1 : 0);
+ extent_cnt_delta -= 1;
+ if (ct3d->dc.total_extent_count + extent_cnt_delta >
+ CXL_NUM_EXTENTS_SUPPORTED) {
+ ret = CXL_MBOX_RESOURCES_EXHAUSTED;
+ goto free_and_exit;
+ }
+
+ offset = (ent->start_dpa - region->base) / region->block_size;
+ nbits = ent->len / region->block_size;
+ bitmap_clear(bitmaps_copied[rid], offset, nbits);
+ cxl_remove_extent_from_extent_list(&tmp_list, ent);
+
+ if (len1) {
+ offset = (dpa - region->base) / region->block_size;
+ nbits = len1 / region->block_size;
+ bitmap_set(bitmaps_copied[rid], offset, nbits);
+ cxl_insert_extent_to_extent_list(&tmp_list,
+ ent_start_dpa, len1,
+ NULL, 0);
+ }
+
+ if (len2) {
+ offset = (dpa + len - region->base) / region->block_size;
+ nbits = len2 / region->block_size;
+ bitmap_set(bitmaps_copied[rid], offset, nbits);
+ cxl_insert_extent_to_extent_list(&tmp_list, dpa + len,
+ len2, NULL, 0);
+ }
+ break;
+ }
+ }
+ }
+
+free_and_exit:
+ for (i = 0; i < ct3d->dc.num_regions; i++) {
+ g_free(bitmaps_copied[i]);
+ }
+ g_free(bitmaps_copied);
+
+ while (!QTAILQ_EMPTY(&tmp_list)) {
+ ent = QTAILQ_FIRST(&tmp_list);
+ cxl_remove_extent_from_extent_list(&tmp_list, ent);
+ }
+
+ return ret;
+}
+
/*
* CXL r3.1 section 8.2.9.9.9.4: Release Dynamic Capacity (Opcode 4803h)
*/
@@ -1644,15 +1793,28 @@ static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
return ret;
}
- for (i = 0; i < in->num_entries_updated; i++) {
- bool found = false;
+ ret = cxl_detect_extent_overflow(ct3d, in);
+ if (ret != CXL_MBOX_SUCCESS) {
+ return ret;
+ }
+ /*
+ * After this point, it is guaranteed that the extents in the
+ * updated extent list to release is valid, that means:
+ * 1. All extents in the list have no overlaps;
+ * 2. Each extent belongs to a valid DC region;
+ * 3. The DPA range of each extent is covered by valid extent
+ * in the device.
+ */
+ for (i = 0; i < in->num_entries_updated; i++) {
dpa = in->updated_entries[i].start_dpa;
len = in->updated_entries[i].len;
+process_leftover:
QTAILQ_FOREACH(ent, extent_list, node) {
/* Found the extent overlapping with */
if (ent->start_dpa <= dpa && dpa < ent->start_dpa + ent->len) {
+ /* Case 1: The to-release extent is subset of ent */
if (dpa + len <= ent->start_dpa + ent->len) {
/*
* The incoming extent covers a portion of an extent
@@ -1669,17 +1831,6 @@ static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
uint64_t len1 = dpa - ent_start_dpa;
uint64_t len2 = ent_start_dpa + ent_len - dpa - len;
- /*
- * TODO: checking for possible extent overflow, will be
- * moved into a dedicated function of detecting extent
- * overflow.
- */
- if (len1 && len2 && ct3d->dc.total_extent_count ==
- CXL_NUM_EXTENTS_SUPPORTED) {
- return CXL_MBOX_RESOURCES_EXHAUSTED;
- }
-
- found = true;
cxl_remove_extent_from_extent_list(extent_list, ent);
ct3d->dc.total_extent_count -= 1;
ct3_clear_region_block_backed(ct3d, ent_start_dpa, ent_len);
@@ -1700,20 +1851,34 @@ static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
break;
} else {
/*
- * TODO: we reject the attempt to remove an extent that
- * overlaps with multiple extents in the device for now,
- * once the bitmap indicating whether a DPA range is
- * covered by valid extents is introduced, will allow it.
+ * Case 2: the to-release extent overlaps with multiple
+ * extents, including the superset case
*/
- return CXL_MBOX_INVALID_PA;
+ uint64_t ent_start_dpa = ent->start_dpa;
+ uint64_t ent_len = ent->len;
+ uint64_t len1 = dpa - ent_start_dpa;
+
+ cxl_remove_extent_from_extent_list(extent_list, ent);
+ ct3d->dc.total_extent_count -= 1;
+ ct3_clear_region_block_backed(ct3d, ent_start_dpa, ent_len);
+
+ if (len1) {
+ cxl_insert_extent_to_extent_list(extent_list,
+ ent_start_dpa, len1,
+ NULL, 0);
+ ct3d->dc.total_extent_count += 1;
+ ct3_set_region_block_backed(ct3d, ent_start_dpa, len1);
+ }
+ /*
+ * processing the portion of the range following current
+ * extent
+ */
+ len = dpa + len - ent_start_dpa - ent_len;
+ dpa = ent_start_dpa + ent_len;
+ goto process_leftover;
}
}
}
-
- if (!found) {
- /* Try to remove a non-existing extent. */
- return CXL_MBOX_INVALID_PA;
- }
}
return CXL_MBOX_SUCCESS;
--
2.43.0
^ permalink raw reply related [flat|nested] 81+ messages in thread
* Re: [PATCH v5 11/13] hw/cxl/cxl-mailbox-utils: Add partial and superset extent release mailbox support
2024-03-04 19:34 ` [PATCH v5 11/13] hw/cxl/cxl-mailbox-utils: Add partial and superset extent release mailbox support nifan.cxl
@ 2024-03-06 18:09 ` Jonathan Cameron via
0 siblings, 0 replies; 81+ messages in thread
From: Jonathan Cameron @ 2024-03-06 18:09 UTC (permalink / raw)
To: nifan.cxl
Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
wj28.lee, Fan Ni
On Mon, 4 Mar 2024 11:34:06 -0800
nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
>
> With the change, we extend the extent release mailbox command processing
> to allow more flexible release. As long as the DPA range of the extent to
> release is covered by valid extent(s) in the device, the release can be
> performed.
>
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
Ouch this is more complex than I was thinking, but seems correct to me.
A few minor comments inline
Jonathan
> +/*
> + * Detect potential extent overflow caused by extent split during processing
> + * extent release requests, also allow releasing superset of extents where the
> + * extent to release covers the range of multiple extents in the device.
> + * Note:
> + * 1.we will reject releasing an extent if some portion of its rang is
range
> + * not covered by valid extents.
> + * 2.This function is called after cxl_detect_malformed_extent_list so checks
> + * already performed there will be skipped.
> + */
> +static CXLRetCode cxl_detect_extent_overflow(const CXLType3Dev *ct3d,
> + const CXLUpdateDCExtentListInPl *in)
This code is basically dry running the actual removal. Can we just
make the core code the same for both cases? The bit where you update bitmaps
and extent lists at least.
> +{
> + uint64_t nbits, offset;
> + const CXLDCRegion *region;
> + unsigned long **bitmaps_copied;
> + uint64_t dpa, len;
> + int i, rid;
> + CXLRetCode ret = CXL_MBOX_SUCCESS;
> + long extent_cnt_delta = 0;
> + CXLDCExtentList tmp_list;
> + CXLDCExtent *ent;
> +
> + QTAILQ_INIT(&tmp_list);
> + copy_extent_list(&tmp_list, &ct3d->dc.extents);
> +
> + bitmaps_copied = g_new0(unsigned long *, ct3d->dc.num_regions);
> + for (i = 0; i < ct3d->dc.num_regions; i++) {
> + region = &ct3d->dc.regions[i];
> + nbits = region->len / region->block_size;
> + bitmaps_copied[i] = bitmap_new(nbits);
> + bitmap_copy(bitmaps_copied[i], region->blk_bitmap, nbits);
> + }
> +
> + for (i = 0; i < in->num_entries_updated; i++) {
> + dpa = in->updated_entries[i].start_dpa;
> + len = in->updated_entries[i].len;
> +
> + rid = cxl_find_dc_region_id(ct3d, dpa, len);
> + region = &ct3d->dc.regions[rid];
> + offset = (dpa - region->base) / region->block_size;
> + nbits = len / region->block_size;
> +
> + /* Check whether range [dpa, dpa + len) is covered by valid range */
> + if (find_next_zero_bit(bitmaps_copied[rid], offset + nbits, offset) <
> + offset + nbits) {
> + ret = CXL_MBOX_INVALID_PA;
> + goto free_and_exit;
> + }
> +
> + QTAILQ_FOREACH(ent, &tmp_list, node) {
> + /* Only split within an extent can cause extent count increase */
> + if (ent->start_dpa <= dpa &&
> + dpa + len <= ent->start_dpa + ent->len) {
> + uint64_t ent_start_dpa = ent->start_dpa;
> + uint64_t ent_len = ent->len;
> + uint64_t len1 = dpa - ent_start_dpa;
> + uint64_t len2 = ent_start_dpa + ent_len - dpa - len;
> +
> + extent_cnt_delta += len1 && len2 ? 2 : (len1 || len2 ? 1 : 0);
I think this is the same as
if (len1)
extent_cnt_delta++;
if (len2)
extent_cnt_delta++;
extent_cnt_delta--;
> + extent_cnt_delta -= 1;
> + if (ct3d->dc.total_extent_count + extent_cnt_delta >
> + CXL_NUM_EXTENTS_SUPPORTED) {
This early overflow detect seems valid to me because a device might run
out of resources mid processing the list even if it would fit at the end.
Good.
> + ret = CXL_MBOX_RESOURCES_EXHAUSTED;
> + goto free_and_exit;
> + }
> +
> + offset = (ent->start_dpa - region->base) / region->block_size;
> + nbits = ent->len / region->block_size;
> + bitmap_clear(bitmaps_copied[rid], offset, nbits);
> + cxl_remove_extent_from_extent_list(&tmp_list, ent);
> +
> + if (len1) {
> + offset = (dpa - region->base) / region->block_size;
> + nbits = len1 / region->block_size;
> + bitmap_set(bitmaps_copied[rid], offset, nbits);
> + cxl_insert_extent_to_extent_list(&tmp_list,
> + ent_start_dpa, len1,
> + NULL, 0);
> + }
> +
> + if (len2) {
> + offset = (dpa + len - region->base) / region->block_size;
> + nbits = len2 / region->block_size;
> + bitmap_set(bitmaps_copied[rid], offset, nbits);
> + cxl_insert_extent_to_extent_list(&tmp_list, dpa + len,
> + len2, NULL, 0);
> + }
> + break;
> + }
> + }
> + }
> +
> +free_and_exit:
> + for (i = 0; i < ct3d->dc.num_regions; i++) {
> + g_free(bitmaps_copied[i]);
> + }
> + g_free(bitmaps_copied);
> +
> + while (!QTAILQ_EMPTY(&tmp_list)) {
> + ent = QTAILQ_FIRST(&tmp_list);
> + cxl_remove_extent_from_extent_list(&tmp_list, ent);
> + }
> +
> + return ret;
> +}
> +
> /*
> * CXL r3.1 section 8.2.9.9.9.4: Release Dynamic Capacity (Opcode 4803h)
> */
> @@ -1644,15 +1793,28 @@ static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
> return ret;
> }
>
> - for (i = 0; i < in->num_entries_updated; i++) {
> - bool found = false;
> + ret = cxl_detect_extent_overflow(ct3d, in);
> + if (ret != CXL_MBOX_SUCCESS) {
> + return ret;
> + }
>
> + /*
> + * After this point, it is guaranteed that the extents in the
> + * updated extent list to release is valid, that means:
> + * 1. All extents in the list have no overlaps;
> + * 2. Each extent belongs to a valid DC region;
> + * 3. The DPA range of each extent is covered by valid extent
> + * in the device.
> + */
> + for (i = 0; i < in->num_entries_updated; i++) {
> dpa = in->updated_entries[i].start_dpa;
> len = in->updated_entries[i].len;
>
> +process_leftover:
> QTAILQ_FOREACH(ent, extent_list, node) {
> /* Found the extent overlapping with */
> if (ent->start_dpa <= dpa && dpa < ent->start_dpa + ent->len) {
> + /* Case 1: The to-release extent is subset of ent */
> if (dpa + len <= ent->start_dpa + ent->len) {
> /*
> * The incoming extent covers a portion of an extent
> @@ -1669,17 +1831,6 @@ static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
> uint64_t len1 = dpa - ent_start_dpa;
> uint64_t len2 = ent_start_dpa + ent_len - dpa - len;
>
> - /*
> - * TODO: checking for possible extent overflow, will be
> - * moved into a dedicated function of detecting extent
> - * overflow.
> - */
> - if (len1 && len2 && ct3d->dc.total_extent_count ==
> - CXL_NUM_EXTENTS_SUPPORTED) {
> - return CXL_MBOX_RESOURCES_EXHAUSTED;
> - }
> -
> - found = true;
> cxl_remove_extent_from_extent_list(extent_list, ent);
> ct3d->dc.total_extent_count -= 1;
> ct3_clear_region_block_backed(ct3d, ent_start_dpa, ent_len);
> @@ -1700,20 +1851,34 @@ static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
> break;
> } else {
> /*
> - * TODO: we reject the attempt to remove an extent that
> - * overlaps with multiple extents in the device for now,
> - * once the bitmap indicating whether a DPA range is
> - * covered by valid extents is introduced, will allow it.
> + * Case 2: the to-release extent overlaps with multiple
> + * extents, including the superset case
> */
> - return CXL_MBOX_INVALID_PA;
> + uint64_t ent_start_dpa = ent->start_dpa;
> + uint64_t ent_len = ent->len;
> + uint64_t len1 = dpa - ent_start_dpa;
> +
> + cxl_remove_extent_from_extent_list(extent_list, ent);
> + ct3d->dc.total_extent_count -= 1;
> + ct3_clear_region_block_backed(ct3d, ent_start_dpa, ent_len);
> +
> + if (len1) {
> + cxl_insert_extent_to_extent_list(extent_list,
> + ent_start_dpa, len1,
> + NULL, 0);
> + ct3d->dc.total_extent_count += 1;
> + ct3_set_region_block_backed(ct3d, ent_start_dpa, len1);
> + }
> + /*
> + * processing the portion of the range following current
> + * extent
> + */
> + len = dpa + len - ent_start_dpa - ent_len;
> + dpa = ent_start_dpa + ent_len;
> + goto process_leftover;
I'd slightly prefer a while loop I think based on len > 0
It does add indent, but easier to follow for me than a retry type goto.
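For illustration only, a minimal standalone sketch of that loop shape. It is not QEMU code: DemoExtentList and the demo_* helpers are hypothetical, a fixed array stands in for the QTAILQ extent list, the bitmap updates are left out, and it assumes the coverage and overflow checks have already passed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_EXTENTS 16

typedef struct {
    uint64_t start;
    uint64_t len;
} DemoExtent;

/* Hypothetical array-backed extent list; order is not maintained. */
typedef struct {
    DemoExtent ext[DEMO_MAX_EXTENTS];
    size_t count;
} DemoExtentList;

static void demo_remove_at(DemoExtentList *l, size_t idx)
{
    memmove(&l->ext[idx], &l->ext[idx + 1],
            (l->count - idx - 1) * sizeof(l->ext[0]));
    l->count--;
}

static void demo_append(DemoExtentList *l, uint64_t start, uint64_t len)
{
    assert(l->count < DEMO_MAX_EXTENTS);
    l->ext[l->count].start = start;
    l->ext[l->count].len = len;
    l->count++;
}

/* Release [dpa, dpa + len), which may span several extents. */
static void demo_release(DemoExtentList *l, uint64_t dpa, uint64_t len)
{
    while (len > 0) {
        size_t i;

        /* Find the extent containing the current start of the range. */
        for (i = 0; i < l->count; i++) {
            if (l->ext[i].start <= dpa &&
                dpa < l->ext[i].start + l->ext[i].len) {
                break;
            }
        }
        assert(i < l->count);   /* guaranteed by the earlier coverage check */

        uint64_t e_start = l->ext[i].start;
        uint64_t e_end = e_start + l->ext[i].len;
        uint64_t rel_end = dpa + len < e_end ? dpa + len : e_end;

        demo_remove_at(l, i);
        if (dpa > e_start) {
            demo_append(l, e_start, dpa - e_start);   /* head stays */
        }
        if (rel_end < e_end) {
            demo_append(l, rel_end, e_end - rel_end); /* tail stays */
        }

        /* Continue with whatever part of the range follows this extent. */
        len -= rel_end - dpa;
        dpa = rel_end;
    }
}
```

Here the subset case and the multi-extent case share one body: the loop ends when the whole range has been consumed, with no separate retry label.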
> }
> }
> }
> -
> - if (!found) {
> - /* Try to remove a non-existing extent. */
> - return CXL_MBOX_INVALID_PA;
> - }
> }
>
> return CXL_MBOX_SUCCESS;
^ permalink raw reply [flat|nested] 81+ messages in thread
> + goto free_and_exit;
> + }
> +
> + QTAILQ_FOREACH(ent, &tmp_list, node) {
> + /* Only split within an extent can cause extent count increase */
> + if (ent->start_dpa <= dpa &&
> + dpa + len <= ent->start_dpa + ent->len) {
> + uint64_t ent_start_dpa = ent->start_dpa;
> + uint64_t ent_len = ent->len;
> + uint64_t len1 = dpa - ent_start_dpa;
> + uint64_t len2 = ent_start_dpa + ent_len - dpa - len;
> +
> + extent_cnt_delta += len1 && len2 ? 2 : (len1 || len2 ? 1 : 0);
I think this is the same as
if (len1)
extent_cnt_delta++;
if (len2)
extent_cnt_delta++;
extent_cnt_delta--;
> + extent_cnt_delta -= 1;
> + if (ct3d->dc.total_extent_count + extent_cnt_delta >
> + CXL_NUM_EXTENTS_SUPPORTED) {
This early overflow detection seems valid to me, because a device might run
out of resources midway through processing the list even if it would fit at
the end. Good.
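A minimal sketch of the two points above, assuming a made-up limit MAX_EXTENTS in place of CXL_NUM_EXTENTS_SUPPORTED and hypothetical helper names (none of these are in the patch). It shows the if-based form of the count update next to the single-expression form, and why the check has to run per request rather than once at the end:

```c
#include <stdint.h>

/* Hypothetical limit, standing in for CXL_NUM_EXTENTS_SUPPORTED. */
#define MAX_EXTENTS 4

/*
 * Net change in extent count when a range is carved out of one extent:
 * the extent itself goes away (-1), and each non-empty remainder
 * (len1 before the hole, len2 after it) comes back as a new extent (+1).
 */
static long split_delta(uint64_t len1, uint64_t len2)
{
    long delta = 0;

    if (len1) {
        delta++;
    }
    if (len2) {
        delta++;
    }
    delta--;
    return delta;
}

/* The single-expression form used in the patch, kept for comparison. */
static long split_delta_ternary(uint64_t len1, uint64_t len2)
{
    return (len1 && len2 ? 2 : (len1 || len2 ? 1 : 0)) - 1;
}

/*
 * Walk per-request (len1, len2) pairs and report whether the running
 * extent count ever exceeds MAX_EXTENTS, even if the final count fits.
 */
static int overflows_midway(long count, const uint64_t (*rem)[2], int n)
{
    for (int i = 0; i < n; i++) {
        count += split_delta(rem[i][0], rem[i][1]);
        if (count > MAX_EXTENTS) {
            return 1;
        }
    }
    return 0;
}
```

With the device already at the limit, a request list that first splits an extent in the middle (+1) and then releases a whole extent (-1) ends at the same count it started with, yet it exceeds the limit after the first entry.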
> + ret = CXL_MBOX_RESOURCES_EXHAUSTED;
> + goto free_and_exit;
> + }
> +
> + offset = (ent->start_dpa - region->base) / region->block_size;
> + nbits = ent->len / region->block_size;
> + bitmap_clear(bitmaps_copied[rid], offset, nbits);
> + cxl_remove_extent_from_extent_list(&tmp_list, ent);
> +
> + if (len1) {
> + offset = (dpa - region->base) / region->block_size;
> + nbits = len1 / region->block_size;
> + bitmap_set(bitmaps_copied[rid], offset, nbits);
> + cxl_insert_extent_to_extent_list(&tmp_list,
> + ent_start_dpa, len1,
> + NULL, 0);
> + }
> +
> + if (len2) {
> + offset = (dpa + len - region->base) / region->block_size;
> + nbits = len2 / region->block_size;
> + bitmap_set(bitmaps_copied[rid], offset, nbits);
> + cxl_insert_extent_to_extent_list(&tmp_list, dpa + len,
> + len2, NULL, 0);
> + }
> + break;
> + }
> + }
> + }
> +
> +free_and_exit:
> + for (i = 0; i < ct3d->dc.num_regions; i++) {
> + g_free(bitmaps_copied[i]);
> + }
> + g_free(bitmaps_copied);
> +
> + while (!QTAILQ_EMPTY(&tmp_list)) {
> + ent = QTAILQ_FIRST(&tmp_list);
> + cxl_remove_extent_from_extent_list(&tmp_list, ent);
> + }
> +
> + return ret;
> +}
> +
> /*
> * CXL r3.1 section 8.2.9.9.9.4: Release Dynamic Capacity (Opcode 4803h)
> */
> @@ -1644,15 +1793,28 @@ static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
> return ret;
> }
>
> - for (i = 0; i < in->num_entries_updated; i++) {
> - bool found = false;
> + ret = cxl_detect_extent_overflow(ct3d, in);
> + if (ret != CXL_MBOX_SUCCESS) {
> + return ret;
> + }
>
> + /*
> + * After this point, it is guaranteed that the extents in the
> + * updated extent list to release is valid, that means:
> + * 1. All extents in the list have no overlaps;
> + * 2. Each extent belongs to a valid DC region;
> + * 3. The DPA range of each extent is covered by valid extent
> + * in the device.
> + */
> + for (i = 0; i < in->num_entries_updated; i++) {
> dpa = in->updated_entries[i].start_dpa;
> len = in->updated_entries[i].len;
>
> +process_leftover:
> QTAILQ_FOREACH(ent, extent_list, node) {
> /* Found the extent overlapping with */
> if (ent->start_dpa <= dpa && dpa < ent->start_dpa + ent->len) {
> + /* Case 1: The to-release extent is subset of ent */
> if (dpa + len <= ent->start_dpa + ent->len) {
> /*
> * The incoming extent covers a portion of an extent
> @@ -1669,17 +1831,6 @@ static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
> uint64_t len1 = dpa - ent_start_dpa;
> uint64_t len2 = ent_start_dpa + ent_len - dpa - len;
>
> - /*
> - * TODO: checking for possible extent overflow, will be
> - * moved into a dedicated function of detecting extent
> - * overflow.
> - */
> - if (len1 && len2 && ct3d->dc.total_extent_count ==
> - CXL_NUM_EXTENTS_SUPPORTED) {
> - return CXL_MBOX_RESOURCES_EXHAUSTED;
> - }
> -
> - found = true;
> cxl_remove_extent_from_extent_list(extent_list, ent);
> ct3d->dc.total_extent_count -= 1;
> ct3_clear_region_block_backed(ct3d, ent_start_dpa, ent_len);
> @@ -1700,20 +1851,34 @@ static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
> break;
> } else {
> /*
> - * TODO: we reject the attempt to remove an extent that
> - * overlaps with multiple extents in the device for now,
> - * once the bitmap indicating whether a DPA range is
> - * covered by valid extents is introduced, will allow it.
> + * Case 2: the to-release extent overlaps with multiple
> + * extents, including the superset case
> */
> - return CXL_MBOX_INVALID_PA;
> + uint64_t ent_start_dpa = ent->start_dpa;
> + uint64_t ent_len = ent->len;
> + uint64_t len1 = dpa - ent_start_dpa;
> +
> + cxl_remove_extent_from_extent_list(extent_list, ent);
> + ct3d->dc.total_extent_count -= 1;
> + ct3_clear_region_block_backed(ct3d, ent_start_dpa, ent_len);
> +
> + if (len1) {
> + cxl_insert_extent_to_extent_list(extent_list,
> + ent_start_dpa, len1,
> + NULL, 0);
> + ct3d->dc.total_extent_count += 1;
> + ct3_set_region_block_backed(ct3d, ent_start_dpa, len1);
> + }
> + /*
> + * processing the portion of the range following current
> + * extent
> + */
> + len = dpa + len - ent_start_dpa - ent_len;
> + dpa = ent_start_dpa + ent_len;
> + goto process_leftover;
I'd slightly prefer a while loop, I think, based on len > 0.
It does add indentation, but it is easier for me to follow than a retry-type goto.
> }
> }
> }
> -
> - if (!found) {
> - /* Try to remove a non-existing extent. */
> - return CXL_MBOX_INVALID_PA;
> - }
> }
>
> return CXL_MBOX_SUCCESS;
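
The while-loop shape suggested above could look roughly like the following. This is only a sketch under stated assumptions: the Extent/ExtentList types and the list_* helpers are hypothetical stand-ins for the QTAILQ-based extent list, capacity checking is omitted, and the bitmap updates done by the real code are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for CXLDCExtent and the QTAILQ extent list. */
typedef struct {
    uint64_t start;
    uint64_t len;
} Extent;

typedef struct {
    Extent ents[16];    /* fixed capacity; bounds checking omitted */
    size_t count;
} ExtentList;

static void list_add(ExtentList *l, uint64_t start, uint64_t len)
{
    l->ents[l->count++] = (Extent){ start, len };
}

static void list_del(ExtentList *l, size_t idx)
{
    /* Order does not matter here, so move the last entry into the hole. */
    l->ents[idx] = l->ents[--l->count];
}

/* Return the index of the extent containing dpa, or -1 if none does. */
static long list_find(const ExtentList *l, uint64_t dpa)
{
    for (size_t i = 0; i < l->count; i++) {
        if (l->ents[i].start <= dpa &&
            dpa < l->ents[i].start + l->ents[i].len) {
            return (long)i;
        }
    }
    return -1;
}

/*
 * Release [dpa, dpa + len). Each pass consumes the part of the range that
 * falls inside one extent; the loop ends when nothing is left to release.
 * Returns -1 if some part of the range is not covered (the patch rules
 * this out beforehand, so the list is never left half-updated there).
 */
static int release_range(ExtentList *l, uint64_t dpa, uint64_t len)
{
    while (len > 0) {
        long idx = list_find(l, dpa);

        if (idx < 0) {
            return -1;
        }

        Extent e = l->ents[idx];
        uint64_t ent_end = e.start + e.len;
        uint64_t end = dpa + len;
        uint64_t len1 = dpa - e.start;

        list_del(l, (size_t)idx);
        if (len1) {
            list_add(l, e.start, len1);         /* keep the head */
        }

        if (end <= ent_end) {
            if (ent_end - end) {
                list_add(l, end, ent_end - end); /* keep the tail */
            }
            len = 0;                             /* Case 1: done */
        } else {
            len = end - ent_end;                 /* Case 2: leftover */
            dpa = ent_end;
        }
    }
    return 0;
}
```

The exit condition is now the loop condition itself, so both cases fall out of the same body and no label is needed.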
^ permalink raw reply [flat|nested] 81+ messages in thread
* [PATCH v5 12/13] hw/mem/cxl_type3: Allow to release partial extent and extent superset in QMP interface
2024-03-04 19:33 [PATCH v5 00/13] Enabling DCD emulation support in Qemu nifan.cxl
` (10 preceding siblings ...)
2024-03-04 19:34 ` [PATCH v5 11/13] hw/cxl/cxl-mailbox-utils: Add partial and superset extent release mailbox support nifan.cxl
@ 2024-03-04 19:34 ` nifan.cxl
2024-03-06 18:14 ` Jonathan Cameron via
2024-03-04 19:34 ` [PATCH v5 13/13] qapi/cxl.json: Add QMP interfaces to print out accepted and pending DC extents nifan.cxl
12 siblings, 1 reply; 81+ messages in thread
From: nifan.cxl @ 2024-03-04 19:34 UTC (permalink / raw)
To: qemu-devel
Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni
From: Fan Ni <fan.ni@samsung.com>
Before this change, the QMP interface used to add/release DC extents
only allowed releasing extents that exist in either the pending-to-add list
or the accepted list in the device, which means the DPA range of the extent
had to match exactly that of an extent in one of the lists. Otherwise, the
release request was ignored.
With this change, we relax the constraint. As long as the DPA range of the
extent to release is covered by extents in one of the two lists
mentioned above, we allow the release.
Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
hw/mem/cxl_type3.c | 110 ++++++++++++++++++++++++++++++++++++---------
1 file changed, 89 insertions(+), 21 deletions(-)
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index c164cf4580..5bd64e604e 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -1802,28 +1802,79 @@ typedef enum CXLDCEventType {
} CXLDCEventType;
/*
- * Check whether the exact extent exists in the list
- * Return value: the extent pointer in the list; else null
+ * Testing whether the DPA range [dpa, dpa + len) is covered by
+ * extents in the list.
*/
-static CXLDCExtent *cxl_dc_extent_exists(CXLDCExtentList *list,
- CXLDCExtentRaw *ext)
+static bool cxl_test_dpa_range_covered_by_extents(CXLDCExtentList *list,
+ uint64_t dpa, uint64_t len)
{
CXLDCExtent *ent;
- if (!ext || !list) {
- return NULL;
+ if (!list) {
+ return false;
}
- QTAILQ_FOREACH(ent, list, node) {
- if (ent->start_dpa != ext->start_dpa) {
- continue;
- }
+ while (len) {
+ bool has_leftover = false;
- /* Found exact extent */
- return ent->len == ext->len ? ent : NULL;
+ QTAILQ_FOREACH(ent, list, node) {
+ if (ent->start_dpa <= dpa && dpa < ent->start_dpa + ent->len) {
+ if (dpa + len <= ent->start_dpa + ent->len) {
+ return true;
+ } else {
+ len = dpa + len - ent->start_dpa - ent->len;
+ dpa = ent->start_dpa + ent->len;
+ has_leftover = true;
+ break;
+ }
+ }
+ }
+ if (!has_leftover) {
+ break;
+ }
}
+ return false;
+}
+
+/*
+ * Remove all extents whose DPA range has overlaps with the DPA range
+ * [dpa, dpa + len) from the list, and delete the overlapped portion.
+ * Note:
+ * 1. If a removed extent is fully within the DPA range, delete the extent;
+ * 2. Otherwise, keep the portion that does not overlap, inserting new extents
+ * into the list if needed for the non-overlapping part.
+ */
+static void cxl_delist_extent_by_dpa_range(CXLDCExtentList *list,
+ uint64_t dpa, uint64_t len)
+{
+ CXLDCExtent *ent;
- return NULL;
+process_leftover:
+ QTAILQ_FOREACH(ent, list, node) {
+ if (ent->start_dpa <= dpa && dpa < ent->start_dpa + ent->len) {
+ uint64_t ent_start_dpa = ent->start_dpa;
+ uint64_t ent_len = ent->len;
+ uint64_t len1 = dpa - ent_start_dpa;
+
+ cxl_remove_extent_from_extent_list(list, ent);
+ if (len1) {
+ cxl_insert_extent_to_extent_list(list, ent_start_dpa,
+ len1, NULL, 0);
+ }
+
+ if (dpa + len <= ent_start_dpa + ent_len) {
+ uint64_t len2 = ent_start_dpa + ent_len - dpa - len;
+ if (len2) {
+ cxl_insert_extent_to_extent_list(list, dpa + len,
+ len2, NULL, 0);
+ }
+ } else {
+ len = dpa + len - ent_start_dpa - ent_len;
+ dpa = ent_start_dpa + ent_len;
+ goto process_leftover;
+ }
+ }
+ }
}
/*
@@ -1915,8 +1966,8 @@ static void qmp_cxl_process_dynamic_capacity(const char *path, CxlEventLog log,
list = records;
extents = g_new0(CXLDCExtentRaw, num_extents);
while (list) {
- CXLDCExtent *ent;
bool skip_extent = false;
+ CXLDCExtentList *extent_list;
offset = list->value->offset;
len = list->value->len;
@@ -1933,15 +1984,32 @@ static void qmp_cxl_process_dynamic_capacity(const char *path, CxlEventLog log,
* remove it from the pending extent list, so later when the add
* response for the extent arrives, the device can reject the
* extent as it is not in the pending list.
+ * Now, we can handle the case where the extent covers the DPA
+ * range of multiple extents in the pending_to_add list.
+ * TODO: we do not yet allow the extent to cover ranges of extents in both
+ * the pending_to_add list and the accepted list at the same time.
*/
- ent = cxl_dc_extent_exists(&dcd->dc.extents_pending_to_add,
- &extents[i]);
- if (ent) {
- QTAILQ_REMOVE(&dcd->dc.extents_pending_to_add, ent, node);
- g_free(ent);
+ extent_list = &dcd->dc.extents_pending_to_add;
+ if (cxl_test_dpa_range_covered_by_extents(extent_list,
+ extents[i].start_dpa,
+ extents[i].len)) {
+ cxl_delist_extent_by_dpa_range(extent_list,
+ extents[i].start_dpa,
+ extents[i].len);
+ } else if (!ct3_test_region_block_backed(dcd, extents[i].start_dpa,
+ extents[i].len)) {
+ /*
+ * If the DPA range of the extent is not covered by extents
+ * in the accepted list, skip
+ */
skip_extent = true;
- } else if (!cxl_dc_extent_exists(&dcd->dc.extents, &extents[i])) {
- /* If the exact extent is not in the accepted list, skip */
+ }
+ } else if (type == DC_EVENT_ADD_CAPACITY) {
+ extent_list = &dcd->dc.extents;
+ /* If the DPA range is already covered by accepted extents, skip */
+ if (cxl_test_dpa_range_covered_by_extents(extent_list,
+ extents[i].start_dpa,
+ extents[i].len)) {
skip_extent = true;
}
}
--
2.43.0
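
The relaxed rule described in the commit message (a release is allowed when its DPA range is covered by extents in one list) can be sketched as below. This is an illustration only: Range and range_covered are hypothetical names, and a plain array stands in for the QTAILQ list used by cxl_test_dpa_range_covered_by_extents.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CXLDCExtent: a half-open range [start, start + len). */
typedef struct {
    uint64_t start;
    uint64_t len;
} Range;

/*
 * Return true if [dpa, dpa + len) is fully covered by the given extents.
 * Each pass finds the extent holding the current dpa and either finishes
 * inside it or advances dpa to that extent's end and retries with the rest.
 * As in the patch, a zero-length request is reported as not covered.
 */
static bool range_covered(const Range *ents, size_t n,
                          uint64_t dpa, uint64_t len)
{
    while (len > 0) {
        bool advanced = false;

        for (size_t i = 0; i < n; i++) {
            uint64_t ent_end = ents[i].start + ents[i].len;

            if (ents[i].start <= dpa && dpa < ent_end) {
                if (dpa + len <= ent_end) {
                    return true;        /* remainder fits in this extent */
                }
                len = dpa + len - ent_end;
                dpa = ent_end;
                advanced = true;
                break;
            }
        }
        if (!advanced) {
            return false;               /* hit a gap: dpa is in no extent */
        }
    }
    return false;
}
```

A range that spans several adjacent extents passes, while one that crosses a gap between extents is rejected, which matches the behaviour the commit message describes for a single list.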
^ permalink raw reply related [flat|nested] 81+ messages in thread
* Re: [PATCH v5 12/13] hw/mem/cxl_type3: Allow to release partial extent and extent superset in QMP interface
2024-03-04 19:34 ` [PATCH v5 12/13] hw/mem/cxl_type3: Allow to release partial extent and extent superset in QMP interface nifan.cxl
@ 2024-03-06 18:14 ` Jonathan Cameron via
0 siblings, 0 replies; 81+ messages in thread
From: Jonathan Cameron @ 2024-03-06 18:14 UTC (permalink / raw)
To: nifan.cxl
Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
wj28.lee, Fan Ni
On Mon, 4 Mar 2024 11:34:07 -0800
nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
>
> Before this change, the QMP interface used to add/release DC extents
> only allowed releasing extents that exist in either the pending-to-add list
> or the accepted list in the device, which means the DPA range of the extent
> had to match exactly that of an extent in one of the lists. Otherwise, the
> release request was ignored.
>
> With this change, we relax the constraint. As long as the DPA range of the
> extent to release is covered by extents in one of the two lists
> mentioned above, we allow the release.
>
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
I've run out of time today, so I just took a very quick look at this.
It seemed fine, but I have similar comments on exit conditions and retry
gotos as on the earlier patches.
> +/*
> + * Remove all extents whose DPA range has overlaps with the DPA range
> + * [dpa, dpa + len) from the list, and delete the overlapped portion.
> + * Note:
> + * 1. If a removed extent is fully within the DPA range, delete the extent;
> + * 2. Otherwise, keep the portion that does not overlap, inserting new extents
> + * into the list if needed for the non-overlapping part.
> + */
> +static void cxl_delist_extent_by_dpa_range(CXLDCExtentList *list,
> + uint64_t dpa, uint64_t len)
> +{
> + CXLDCExtent *ent;
>
> - return NULL;
> +process_leftover:
As before, can we turn this into a while loop so the exit conditions are
more obvious? Based on len, I think.
> + QTAILQ_FOREACH(ent, list, node) {
> + if (ent->start_dpa <= dpa && dpa < ent->start_dpa + ent->len) {
> + uint64_t ent_start_dpa = ent->start_dpa;
> + uint64_t ent_len = ent->len;
> + uint64_t len1 = dpa - ent_start_dpa;
> +
> + cxl_remove_extent_from_extent_list(list, ent);
> + if (len1) {
> + cxl_insert_extent_to_extent_list(list, ent_start_dpa,
> + len1, NULL, 0);
> + }
> +
> + if (dpa + len <= ent_start_dpa + ent_len) {
> + uint64_t len2 = ent_start_dpa + ent_len - dpa - len;
> + if (len2) {
> + cxl_insert_extent_to_extent_list(list, dpa + len,
> + len2, NULL, 0);
> + }
> + } else {
> + len = dpa + len - ent_start_dpa - ent_len;
> + dpa = ent_start_dpa + ent_len;
> + goto process_leftover;
> + }
> + }
> + }
> }
>
> /*
> @@ -1915,8 +1966,8 @@ static void qmp_cxl_process_dynamic_capacity(const char *path, CxlEventLog log,
> list = records;
> extents = g_new0(CXLDCExtentRaw, num_extents);
> while (list) {
> - CXLDCExtent *ent;
> bool skip_extent = false;
> + CXLDCExtentList *extent_list;
>
> offset = list->value->offset;
> len = list->value->len;
> @@ -1933,15 +1984,32 @@ static void qmp_cxl_process_dynamic_capacity(const char *path, CxlEventLog log,
> * remove it from the pending extent list, so later when the add
> * response for the extent arrives, the device can reject the
> * extent as it is not in the pending list.
> + * Now, we can handle the case where the extent covers the DPA
No need for "Now". Anyone reading it is looking at the code here.
> + * range of multiple extents in the pending_to_add list.
> + * TODO: we do not yet allow the extent to cover ranges of extents in both
> + * the pending_to_add list and the accepted list at the same time.
> */
> - ent = cxl_dc_extent_exists(&dcd->dc.extents_pending_to_add,
> - &extents[i]);
> - if (ent) {
> - QTAILQ_REMOVE(&dcd->dc.extents_pending_to_add, ent, node);
> - g_free(ent);
> + extent_list = &dcd->dc.extents_pending_to_add;
> + if (cxl_test_dpa_range_covered_by_extents(extent_list,
> + extents[i].start_dpa,
> + extents[i].len)) {
> + cxl_delist_extent_by_dpa_range(extent_list,
> + extents[i].start_dpa,
> + extents[i].len);
> + } else if (!ct3_test_region_block_backed(dcd, extents[i].start_dpa,
> + extents[i].len)) {
> + /*
> + * If the DPA range of the extent is not covered by extents
> + * in the accepted list, skip
> + */
> skip_extent = true;
> - } else if (!cxl_dc_extent_exists(&dcd->dc.extents, &extents[i])) {
> - /* If the exact extent is not in the accepted list, skip */
> + }
> + } else if (type == DC_EVENT_ADD_CAPACITY) {
> + extent_list = &dcd->dc.extents;
> + /* If the DPA range is already covered by accepted extents, skip */
> + if (cxl_test_dpa_range_covered_by_extents(extent_list,
> + extents[i].start_dpa,
> + extents[i].len)) {
> skip_extent = true;
> }
> }
^ permalink raw reply [flat|nested] 81+ messages in thread
* [PATCH v5 13/13] qapi/cxl.json: Add QMP interfaces to print out accepted and pending DC extents
2024-03-04 19:33 [PATCH v5 00/13] Enabling DCD emulation support in Qemu nifan.cxl
` (11 preceding siblings ...)
2024-03-04 19:34 ` [PATCH v5 12/13] hw/mem/cxl_type3: Allow to release partial extent and extent superset in QMP interface nifan.cxl
@ 2024-03-04 19:34 ` nifan.cxl
2024-03-05 16:09 ` Jonathan Cameron via
12 siblings, 1 reply; 81+ messages in thread
From: nifan.cxl @ 2024-03-04 19:34 UTC (permalink / raw)
To: qemu-devel
Cc: jonathan.cameron, linux-cxl, gregory.price, ira.weiny,
dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan.cxl,
jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni
From: Fan Ni <fan.ni@samsung.com>
With the change, we add the following two QMP interfaces to print out
extents information in the device,
1. cxl-display-accepted-dc-extents: print out the accepted DC extents in
the device;
2. cxl-display-pending-to-add-dc-extents: print out the pending-to-add
DC extents in the device;
The output is appended to a file passed to the command and by default
it is /tmp/dc-extent.txt.
Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
hw/mem/cxl_type3.c | 80 ++++++++++++++++++++++++++++++++++++++++
hw/mem/cxl_type3_stubs.c | 12 ++++++
qapi/cxl.json | 32 ++++++++++++++++
3 files changed, 124 insertions(+)
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 5bd64e604e..6a08e7ae40 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -2089,6 +2089,86 @@ void qmp_cxl_release_dynamic_capacity(const char *path, uint8_t region_id,
region_id, records, errp);
}
+static void cxl_dcd_display_extent_list(const CXLType3Dev *dcd, const char *f,
+ bool accepted_list, Error **errp)
+{
+ const CXLDCExtentList *list;
+ CXLDCExtent *ent;
+ FILE *fp = NULL;
+ int i = 0;
+
+ if (!dcd->dc.num_regions) {
+ error_setg(errp, "No dynamic capacity support from the device");
+ return;
+ }
+
+ if (!f) {
+ fp = fopen("/tmp/dc-extent.txt", "a+");
+ } else {
+ fp = fopen(f, "a+");
+ }
+
+ if (!fp) {
+ error_setg(errp, "Open log file failed");
+ return;
+ }
+ if (accepted_list) {
+ list = &dcd->dc.extents;
+ fprintf(fp, "Print accepted extent info:\n");
+ } else {
+ list = &dcd->dc.extents_pending_to_add;
+ fprintf(fp, "Print pending-to-add extent info:\n");
+ }
+
+ QTAILQ_FOREACH(ent, list, node) {
+ fprintf(fp, "%d: [0x%lx - 0x%lx]\n", i++, ent->start_dpa,
+ ent->start_dpa + ent->len);
+ }
+ fprintf(fp, "In total, %d extents printed!\n", i);
+ fclose(fp);
+}
+
+void qmp_cxl_display_accepted_dc_extents(const char *path, const char *f,
+ Error **errp)
+{
+ Object *obj;
+ CXLType3Dev *dcd;
+
+ obj = object_resolve_path(path, NULL);
+ if (!obj) {
+ error_setg(errp, "Unable to resolve path");
+ return;
+ }
+ if (!object_dynamic_cast(obj, TYPE_CXL_TYPE3)) {
+ error_setg(errp, "Path not point to a valid CXL type3 device");
+ return;
+ }
+
+ dcd = CXL_TYPE3(obj);
+ cxl_dcd_display_extent_list(dcd, f, true, errp);
+}
+
+void qmp_cxl_display_pending_to_add_dc_extents(const char *path, const char *f,
+ Error **errp)
+{
+ Object *obj;
+ CXLType3Dev *dcd;
+
+ obj = object_resolve_path(path, NULL);
+ if (!obj) {
+ error_setg(errp, "Unable to resolve path");
+ return;
+ }
+ if (!object_dynamic_cast(obj, TYPE_CXL_TYPE3)) {
+ error_setg(errp, "Path not point to a valid CXL type3 device");
+ return;
+ }
+
+
+ dcd = CXL_TYPE3(obj);
+ cxl_dcd_display_extent_list(dcd, f, false, errp);
+}
+
static void ct3_class_init(ObjectClass *oc, void *data)
{
DeviceClass *dc = DEVICE_CLASS(oc);
diff --git a/hw/mem/cxl_type3_stubs.c b/hw/mem/cxl_type3_stubs.c
index d913b11b4d..d896758301 100644
--- a/hw/mem/cxl_type3_stubs.c
+++ b/hw/mem/cxl_type3_stubs.c
@@ -81,3 +81,15 @@ void qmp_cxl_release_dynamic_capacity(const char *path, uint8_t region_id,
{
error_setg(errp, "CXL Type 3 support is not compiled in");
}
+
+void qmp_cxl_display_accepted_dc_extents(const char *path, const char *f,
+ Error **errp)
+{
+ error_setg(errp, "CXL Type 3 support is not compiled in");
+}
+
+void qmp_cxl_display_pending_to_add_dc_extents(const char *path, const char *f,
+ Error **errp)
+{
+ error_setg(errp, "CXL Type 3 support is not compiled in");
+}
diff --git a/qapi/cxl.json b/qapi/cxl.json
index 2645004666..6f10300ec6 100644
--- a/qapi/cxl.json
+++ b/qapi/cxl.json
@@ -420,3 +420,35 @@
'extents': [ 'CXLDCExtentRecord' ]
}
}
+
+##
+# @cxl-display-accepted-dc-extents:
+#
+# Command to print out all the accepted DC extents in the device
+#
+# @path: CXL DCD canonical QOM path
+# @output: path of output file to dump the results to
+#
+# Since : 9.0
+##
+{ 'command': 'cxl-display-accepted-dc-extents',
+ 'data': { 'path': 'str',
+ 'output': 'str'
+ }
+}
+
+##
+# @cxl-display-pending-to-add-dc-extents:
+#
+# Command to print out all the pending-to-add DC extents in the device
+#
+# @path: CXL DCD canonical QOM path
+# @output: path of output file to dump the results to
+#
+# Since : 9.0
+##
+{ 'command': 'cxl-display-pending-to-add-dc-extents',
+ 'data': { 'path': 'str',
+ 'output': 'str'
+ }
+}
--
2.43.0
^ permalink raw reply related [flat|nested] 81+ messages in thread
* Re: [PATCH v5 13/13] qapi/cxl.json: Add QMP interfaces to print out accepted and pending DC extents
2024-03-04 19:34 ` [PATCH v5 13/13] qapi/cxl.json: Add QMP interfaces to print out accepted and pending DC extents nifan.cxl
@ 2024-03-05 16:09 ` Jonathan Cameron via
0 siblings, 0 replies; 81+ messages in thread
From: Jonathan Cameron @ 2024-03-05 16:09 UTC (permalink / raw)
To: nifan.cxl
Cc: qemu-devel, linux-cxl, gregory.price, ira.weiny, dan.j.williams,
a.manzanares, dave, nmtadam.samsung, jim.harris, Jorgen.Hansen,
wj28.lee, Fan Ni
On Mon, 4 Mar 2024 11:34:08 -0800
nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
>
> With the change, we add the following two QMP interfaces to print out
> extents information in the device,
> 1. cxl-display-accepted-dc-extents: print out the accepted DC extents in
> the device;
> 2. cxl-display-pending-to-add-dc-extents: print out the pending-to-add
> DC extents in the device;
> The output is appended to a file passed to the command and by default
> it is /tmp/dc-extent.txt.
Hi Fan,
Is there precedent for this sort of logging to a file from a QMP
command? I can see something like this being useful.
A few comments inline.
Jonathan
>
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
> hw/mem/cxl_type3.c | 80 ++++++++++++++++++++++++++++++++++++++++
> hw/mem/cxl_type3_stubs.c | 12 ++++++
> qapi/cxl.json | 32 ++++++++++++++++
> 3 files changed, 124 insertions(+)
>
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 5bd64e604e..6a08e7ae40 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -2089,6 +2089,86 @@ void qmp_cxl_release_dynamic_capacity(const char *path, uint8_t region_id,
> region_id, records, errp);
> }
>
> +static void cxl_dcd_display_extent_list(const CXLType3Dev *dcd, const char *f,
> + bool accepted_list, Error **errp)
> +{
> + const CXLDCExtentList *list;
> + CXLDCExtent *ent;
> + FILE *fp = NULL;
> + int i = 0;
> +
> + if (!dcd->dc.num_regions) {
> + error_setg(errp, "No dynamic capacity support from the device");
> + return;
> + }
> +
> + if (!f) {
> + fp = fopen("/tmp/dc-extent.txt", "a+");
> + } else {
> + fp = fopen(f, "a+");
> + }
> +
> + if (!fp) {
> + error_setg(errp, "Open log file failed");
> + return;
> + }
> + if (accepted_list) {
> + list = &dcd->dc.extents;
> + fprintf(fp, "Print accepted extent info:\n");
> + } else {
> + list = &dcd->dc.extents_pending_to_add;
> + fprintf(fp, "Print pending-to-add extent info:\n");
> + }
> +
> + QTAILQ_FOREACH(ent, list, node) {
> + fprintf(fp, "%d: [0x%lx - 0x%lx]\n", i++, ent->start_dpa,
> + ent->start_dpa + ent->len);
> + }
> + fprintf(fp, "In total, %d extents printed!\n", i);
> + fclose(fp);
> +}
> +void qmp_cxl_display_pending_to_add_dc_extents(const char *path, const char *f,
> + Error **errp)
> +{
> + Object *obj;
> + CXLType3Dev *dcd;
> +
> + obj = object_resolve_path(path, NULL);
As an aside, we could probably flatten a lot of these cases into
object_resolve_path_type()
> + if (!obj) {
> + error_setg(errp, "Unable to resolve path");
> + return;
> + }
> + if (!object_dynamic_cast(obj, TYPE_CXL_TYPE3)) {
> + error_setg(errp, "Path not point to a valid CXL type3 device");
> + return;
> + }
> +
> +
> + dcd = CXL_TYPE3(obj);
> + cxl_dcd_display_extent_list(dcd, f, false, errp);
> +}
> +
> static void ct3_class_init(ObjectClass *oc, void *data)
> {
> DeviceClass *dc = DEVICE_CLASS(oc);
> diff --git a/qapi/cxl.json b/qapi/cxl.json
> index 2645004666..6f10300ec6 100644
> --- a/qapi/cxl.json
> +++ b/qapi/cxl.json
> @@ -420,3 +420,35 @@
> 'extents': [ 'CXLDCExtentRecord' ]
> }
> }
> +
> +##
> +# @cxl-display-accepted-dc-extents:
> +#
> +# Command to print out all the accepted DC extents in the device
> +#
> +# @path: CXL DCD canonical QOM path
> +# @output: path of output file to dump the results to
We take a path, but dump to the same file whatever this is set to?
I'm not sure what precedent there is for qom commands that
dump to a debug log. Perhaps reference any other cases in the
patch description.
> +#
> +# Since : 9.0
> +##
> +{ 'command': 'cxl-display-accepted-dc-extents',
> + 'data': { 'path': 'str',
> + 'output': 'str'
> + }
> +}
> +
> +##
> +# @cxl-display-pending-to-add-dc-extents:
> +#
> +# Command to print out all the pending-to-add DC extents in the device
> +#
> +# @path: CXL DCD canonical QOM path
> +# @output: path of output file to dump the results to
> +#
> +# Since : 9.0
> +##
> +{ 'command': 'cxl-display-pending-to-add-dc-extents',
> + 'data': { 'path': 'str',
> + 'output': 'str'
> + }
> +}
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 13/13] qapi/cxl.json: Add QMP interfaces to print out accepted and pending DC extents
2024-03-05 16:09 ` Jonathan Cameron via
(?)
@ 2024-03-05 16:15 ` Daniel P. Berrangé
2024-03-05 17:09 ` fan
-1 siblings, 1 reply; 81+ messages in thread
From: Daniel P. Berrangé @ 2024-03-05 16:15 UTC (permalink / raw)
To: Jonathan Cameron
Cc: nifan.cxl, qemu-devel, linux-cxl, gregory.price, ira.weiny,
dan.j.williams, a.manzanares, dave, nmtadam.samsung, jim.harris,
Jorgen.Hansen, wj28.lee, Fan Ni
On Tue, Mar 05, 2024 at 04:09:08PM +0000, Jonathan Cameron via wrote:
> On Mon, 4 Mar 2024 11:34:08 -0800
> nifan.cxl@gmail.com wrote:
>
> > From: Fan Ni <fan.ni@samsung.com>
> >
> > With the change, we add the following two QMP interfaces to print out
> > extents information in the device,
> > 1. cxl-display-accepted-dc-extents: print out the accepted DC extents in
> > the device;
> > 2. cxl-display-pending-to-add-dc-extents: print out the pending-to-add
> > DC extents in the device;
> > The output is appended to a file passed to the command and by default
> > it is /tmp/dc-extent.txt.
> Hi Fan,
>
> Is there precedence for this sort of logging to a file from a qmp
> command? I can see something like this being useful.
This is pretty unusual.
For runtime debugging information our strong preference is to integrate
'trace' probes throughout the code:
https://www.qemu.org/docs/master/devel/tracing.html#tracing
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 13/13] qapi/cxl.json: Add QMP interfaces to print out accepted and pending DC extents
2024-03-05 16:15 ` Daniel P. Berrangé
@ 2024-03-05 17:09 ` fan
2024-03-05 17:14 ` Daniel P. Berrangé
0 siblings, 1 reply; 81+ messages in thread
From: fan @ 2024-03-05 17:09 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Jonathan Cameron, nifan.cxl, qemu-devel, linux-cxl,
gregory.price, ira.weiny, dan.j.williams, a.manzanares, dave,
nmtadam.samsung, jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni
On Tue, Mar 05, 2024 at 04:15:30PM +0000, Daniel P. Berrangé wrote:
> On Tue, Mar 05, 2024 at 04:09:08PM +0000, Jonathan Cameron via wrote:
> > On Mon, 4 Mar 2024 11:34:08 -0800
> > nifan.cxl@gmail.com wrote:
> >
> > > From: Fan Ni <fan.ni@samsung.com>
> > >
> > > With the change, we add the following two QMP interfaces to print out
> > > extents information in the device,
> > > 1. cxl-display-accepted-dc-extents: print out the accepted DC extents in
> > > the device;
> > > 2. cxl-display-pending-to-add-dc-extents: print out the pending-to-add
> > > DC extents in the device;
> > > The output is appended to a file passed to the command and by default
> > > it is /tmp/dc-extent.txt.
> > Hi Fan,
> >
> > Is there precedence for this sort of logging to a file from a qmp
> > command? I can see something like this being useful.
>
> This is pretty unusual.
Yeah, I cannot find anything similar in the existing code. My initial plan
was to print to the screen directly; however, I could not find a nice way
to do it, so I decided to go with a file.
Is there a reason why we do not want to go with this approach?
>
> For runtime debugging information our strong preference is to integrate
> 'trace' probes throughout the code:
>
> https://www.qemu.org/docs/master/devel/tracing.html#tracing
I am not familiar with the trace mechanism. However, I think the
approach in this patch may be useful for more than debugging.
Although I have not tried it yet, maybe we could also use the approach to
set some parameters at runtime, as procfs does?
Just a rough thought.
Fan
>
> With regards,
> Daniel
> --
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
>
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 13/13] qapi/cxl.json: Add QMP interfaces to print out accepted and pending DC extents
2024-03-05 17:09 ` fan
@ 2024-03-05 17:14 ` Daniel P. Berrangé
2024-04-24 13:12 ` Markus Armbruster
0 siblings, 1 reply; 81+ messages in thread
From: Daniel P. Berrangé @ 2024-03-05 17:14 UTC (permalink / raw)
To: fan
Cc: Jonathan Cameron, qemu-devel, linux-cxl, gregory.price,
ira.weiny, dan.j.williams, a.manzanares, dave, nmtadam.samsung,
jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni
On Tue, Mar 05, 2024 at 09:09:05AM -0800, fan wrote:
> On Tue, Mar 05, 2024 at 04:15:30PM +0000, Daniel P. Berrangé wrote:
> > On Tue, Mar 05, 2024 at 04:09:08PM +0000, Jonathan Cameron via wrote:
> > > On Mon, 4 Mar 2024 11:34:08 -0800
> > > nifan.cxl@gmail.com wrote:
> > >
> > > > From: Fan Ni <fan.ni@samsung.com>
> > > >
> > > > With the change, we add the following two QMP interfaces to print out
> > > > extents information in the device,
> > > > 1. cxl-display-accepted-dc-extents: print out the accepted DC extents in
> > > > the device;
> > > > 2. cxl-display-pending-to-add-dc-extents: print out the pending-to-add
> > > > DC extents in the device;
> > > > The output is appended to a file passed to the command and by default
> > > > it is /tmp/dc-extent.txt.
> > > Hi Fan,
> > >
> > > Is there precedence for this sort of logging to a file from a qmp
> > > command? I can see something like this being useful.
> >
> > This is pretty unusual.
>
> Yeah. I cannot find anything similar in existing code, my initial plan
> was to print out to the screen directly, however, cannot find out how to
> do it nicely, so decided to go with a file.
>
> Is there a reason why we do not want to go with this approach?
>
> >
> > For runtime debugging information our strong preference is to integrate
> > 'trace' probes throughout the code:
> >
> > https://www.qemu.org/docs/master/devel/tracing.html#tracing
>
> I am not familiar with the trace mechanism. However, I think the
> approach in this patch may be useful not only for debugging purpose.
> Although not tried yet, maybe we can also use the approach to set
> some parameters at runtime like what procfs does?
Please don't invent something new unless you can show why QEMU's existing
tracing system isn't sufficiently good for the problem. QEMU's tracing
can dump to the terminal directly, or integrate with a variety of other
backends, and data can be turned off/on at runtime per-trace point.
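To make the per-trace-point on/off idea concrete, here is a rough standalone
model, not QEMU's tracing API: ToyTracePoint, toy_trace_set_state() and
toy_trace_extent() are hypothetical names, and the caller picks the output
stream. In QEMU the equivalent would be an entry in a trace-events file plus
the generated trace_*() call, with events enabled through the -trace command
line option or the trace-event monitor command.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for a trace point: a name plus a runtime on/off switch.
 * Hypothetical; QEMU generates the real equivalents from trace-events. */
typedef struct ToyTracePoint {
    const char *name;
    bool enabled;
} ToyTracePoint;

/* Illustrative trace points, both disabled by default. */
static ToyTracePoint toy_trace_points[] = {
    { "toy_dcd_accepted_extent", false },
    { "toy_dcd_pending_extent", false },
};

#define TOY_N_POINTS (sizeof(toy_trace_points) / sizeof(toy_trace_points[0]))

/* Flip one trace point at runtime; returns false if the name is unknown. */
static bool toy_trace_set_state(const char *name, bool enabled)
{
    for (size_t i = 0; i < TOY_N_POINTS; i++) {
        if (strcmp(toy_trace_points[i].name, name) == 0) {
            toy_trace_points[i].enabled = enabled;
            return true;
        }
    }
    return false;
}

/* Emit one extent record to 'out' if the point is enabled.
 * Returns true when a line was written. */
static bool toy_trace_extent(FILE *out, const ToyTracePoint *tp,
                             uint64_t start_dpa, uint64_t len)
{
    assert(out != NULL && tp != NULL);
    if (!tp->enabled) {
        return false;
    }
    fprintf(out, "%s start_dpa=0x%" PRIx64 " len=0x%" PRIx64 "\n",
            tp->name, start_dpa, len);
    return true;
}
```

Walking the accepted or pending extent list and calling such a point once per
extent would give the same information as the proposed commands, without a
dedicated output file and with the ability to switch each point off again.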
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 13/13] qapi/cxl.json: Add QMP interfaces to print out accepted and pending DC extents
2024-03-05 17:14 ` Daniel P. Berrangé
@ 2024-04-24 13:12 ` Markus Armbruster
2024-04-24 17:12 ` fan
0 siblings, 1 reply; 81+ messages in thread
From: Markus Armbruster @ 2024-04-24 13:12 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: fan, Jonathan Cameron, qemu-devel, linux-cxl, gregory.price,
ira.weiny, dan.j.williams, a.manzanares, dave, nmtadam.samsung,
jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni
Daniel P. Berrangé <berrange@redhat.com> writes:
> On Tue, Mar 05, 2024 at 09:09:05AM -0800, fan wrote:
>> On Tue, Mar 05, 2024 at 04:15:30PM +0000, Daniel P. Berrangé wrote:
>> > On Tue, Mar 05, 2024 at 04:09:08PM +0000, Jonathan Cameron via wrote:
>> > > On Mon, 4 Mar 2024 11:34:08 -0800
>> > > nifan.cxl@gmail.com wrote:
>> > >
>> > > > From: Fan Ni <fan.ni@samsung.com>
>> > > >
>> > > > With the change, we add the following two QMP interfaces to print out
>> > > > extents information in the device,
>> > > > 1. cxl-display-accepted-dc-extents: print out the accepted DC extents in
>> > > > the device;
>> > > > 2. cxl-display-pending-to-add-dc-extents: print out the pending-to-add
>> > > > DC extents in the device;
>> > > > The output is appended to a file passed to the command and by default
>> > > > it is /tmp/dc-extent.txt.
>> > > Hi Fan,
>> > >
>> > > Is there precedence for this sort of logging to a file from a qmp
>> > > command? I can see something like this being useful.
>> >
>> > This is pretty unusual.
>>
>> Yeah. I cannot find anything similar in existing code, my initial plan
>> was to print out to the screen directly, however, cannot find out how to
>> do it nicely, so decided to go with a file.
>>
>> Is there a reason why we do not want to go with this approach?
>>
>> >
>> > For runtime debugging information our strong preference is to integrate
>> > 'trace' probes throughout the code:
>> >
>> > https://www.qemu.org/docs/master/devel/tracing.html#tracing
>>
>> I am not familiar with the trace mechanism. However, I think the
>> approach in this patch may be useful not only for debugging purpose.
>> Although not tried yet, maybe we can also use the approach to set
>> some parameters at runtime like what procfs does?
>
> Please don't invent something new unless you can show why QEMU's existing
> tracing system isn't sufficiently good for the problem. QEMU's tracing
> can dump to the terminal directly, or integrate with a variety of other
> backends, and data can be turned off/on at runtime per-trace point.
Seconded.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v5 13/13] qapi/cxl.json: Add QMP interfaces to print out accepted and pending DC extents
2024-04-24 13:12 ` Markus Armbruster
@ 2024-04-24 17:12 ` fan
0 siblings, 0 replies; 81+ messages in thread
From: fan @ 2024-04-24 17:12 UTC (permalink / raw)
To: Markus Armbruster
Cc: Daniel P. Berrangé,
fan, Jonathan Cameron, qemu-devel, linux-cxl, gregory.price,
ira.weiny, dan.j.williams, a.manzanares, dave, nmtadam.samsung,
jim.harris, Jorgen.Hansen, wj28.lee, Fan Ni
On Wed, Apr 24, 2024 at 03:12:34PM +0200, Markus Armbruster wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
>
> > On Tue, Mar 05, 2024 at 09:09:05AM -0800, fan wrote:
> >> On Tue, Mar 05, 2024 at 04:15:30PM +0000, Daniel P. Berrangé wrote:
> >> > On Tue, Mar 05, 2024 at 04:09:08PM +0000, Jonathan Cameron via wrote:
> >> > > On Mon, 4 Mar 2024 11:34:08 -0800
> >> > > nifan.cxl@gmail.com wrote:
> >> > >
> >> > > > From: Fan Ni <fan.ni@samsung.com>
> >> > > >
> >> > > > With the change, we add the following two QMP interfaces to print out
> >> > > > extents information in the device,
> >> > > > 1. cxl-display-accepted-dc-extents: print out the accepted DC extents in
> >> > > > the device;
> >> > > > 2. cxl-display-pending-to-add-dc-extents: print out the pending-to-add
> >> > > > DC extents in the device;
> >> > > > The output is appended to a file passed to the command and by default
> >> > > > it is /tmp/dc-extent.txt.
> >> > > Hi Fan,
> >> > >
> >> > > Is there precedence for this sort of logging to a file from a qmp
> >> > > command? I can see something like this being useful.
> >> >
> >> > This is pretty unusual.
> >>
> >> Yeah. I cannot find anything similar in existing code, my initial plan
> >> was to print out to the screen directly, however, cannot find out how to
> >> do it nicely, so decided to go with a file.
> >>
> >> Is there a reason why we do not want to go with this approach?
> >>
> >> >
> >> > For runtime debugging information our strong preference is to integrate
> >> > 'trace' probes throughout the code:
> >> >
> >> > https://www.qemu.org/docs/master/devel/tracing.html#tracing
> >>
> >> I am not familiar with the trace mechanism. However, I think the
> >> approach in this patch may be useful not only for debugging purpose.
> >> Although not tried yet, maybe we can also use the approach to set
> >> some parameters at runtime like what procfs does?
> >
> > Please don't invent something new unless you can show why QEMU's existing
> > tracing system isn't sufficiently good for the problem. QEMU's tracing
> > can dump to the terminal directly, or integrate with a variety of other
> > backends, and data can be turned off/on at runtime per-trace point.
>
> Seconded.
>
Thanks.
This patch has been removed from the latest version (v7):
https://lore.kernel.org/linux-cxl/ZiaFYUB6FC9NR7W4@memverge.com/T/#t
Fan
^ permalink raw reply [flat|nested] 81+ messages in thread