All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4 v3] Multi-Region and Volatile Memory support for CXL Type-3 Devices
@ 2022-10-26  0:47 ` Gregory Price
  2022-10-26  0:47   ` [PATCH 1/4] hw/i386/pc.c: CXL Fixed Memory Window should not reserve e820 in bios Gregory Price
                     ` (6 more replies)
  0 siblings, 7 replies; 23+ messages in thread
From: Gregory Price @ 2022-10-26  0:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, mst, marcel.apfelbaum, imammedo,
	ani, alison.schofield, dave, a.manzanares, bwidawsk,
	gregory.price, hchkuo, cbrowy, ira.weiny

Submitted as an extention to the multi-feature branch maintained
by Jonathan Cameron at:
https://gitlab.com/jic23/qemu/-/tree/cxl-2022-10-24 


Summary of Changes:
1) E820 CFMW Bug fix.  
2) Add CXL_CAPACITY_MULTIPLIER definition to replace magic numbers
3) Multi-Region and Volatile Memory support for CXL Type-3 Devices
4) CXL Type-3 SRAT Generation when NUMA node is attached to memdev


Regarding the E820 fix
  * This bugfix is required for memory regions to work on x86
  * input from Dan Williams and others suggest that E820 entry for
    the CFMW should not exist, as it is expected to be dynamically
    assigned at runtime.  If this entry exists, it instead blocks
    region creation by nature of the memory region being marked as
    reserved.

Regarding Multi-Region and Volatile Memory
  * Developed with input from Jonathan Cameron and Davidlohr Bueso.

Regarding SRAT Generation for Type-3 Devices
  * Co-Developed by Davidlohr Bueso.  Built from his base patch and
    extended to work with both volatile and persistent regions.
  * This can be used to demonstrate static type-3 device mapping and
    testing numa-access to type-3 device memory regions.


This series brings 3 features to CXL Type-3 Devices:
    1) Volatile Memory Region support
    2) Multi-Region support (1 Volatile, 1 Persistent)
    3) (optional) SRAT Entry generation for type-3 device regions

In this series we implement multi-region and volatile region support
through 7 major changes to CXL devices
    1) The HostMemoryBackend [hostmem] has been replaced by two
       [hostvmem] and [hostpmem] to store volatile and persistent
       memory respectively
    2) The single AddressSpace has been replaced by two AddressSpaces
       [hostvmem_as] and [hostpmem_as] to map respective memdevs.
    3) Each memory region size and total region are stored separately
    4) The CDAT and DVSEC memory map entries have been updated:
       a) if vmem is present, vmem is mapped at DPA(0)
       b) if pmem is present
          i)  and vmem is present, pmem is mapped at DPA(vmem->size)
          ii) else, pmem is mapped at DPA(0)
       c) partitioning of pmem is not supported in this patch set but
          has been discussed and this design should suffice.
    5) Read/Write functions have been updated to access AddressSpaces
       according to the mapping described in #4
    6) cxl-mailbox has been updated to report the respective size of
       volatile and persistent memory regions
    7) SRAT entries may optionally be generated by manually assigning
       memdevs to a cpuless numa node

To support the Device Physical Address (DPA) Mapping decisions, see
CXL Spec (3.0) Section 8.2.9.8.2.0 - Get Partition Info:
  Active Volatile Memory
    The device shall provide this volatile capacity starting at DPA 0
  Active Persistent Memory
    The device shall provide this persistent capacity starting at the
    DPA immediately following the volatile capacity

Partitioning of Persistent Memory regions may be supported on
following patch sets.


Gregory Price (4):
  hw/i386/pc.c: CXL Fixed Memory Window should not reserve e820 in bios
  hw/cxl: Add CXL_CAPACITY_MULTIPLIER definition
  hw/cxl: Multi-Region CXL Type-3 Devices (Volatile and Persistent)
  hw/acpi/cxl.c: Fill in SRAT for vmem/pmem if NUMA node is assigned

 docs/system/devices/cxl.rst |  74 ++++++++--
 hw/acpi/cxl.c               |  67 +++++++++
 hw/cxl/cxl-mailbox-utils.c  |  23 +--
 hw/i386/acpi-build.c        |   4 +
 hw/i386/pc.c                |   2 -
 hw/mem/cxl_type3.c          | 274 +++++++++++++++++++++++++++---------
 include/hw/acpi/cxl.h       |   1 +
 include/hw/cxl/cxl_device.h |  11 +-
 tests/qtest/cxl-test.c      | 111 +++++++++++----
 9 files changed, 443 insertions(+), 124 deletions(-)

-- 
2.37.3


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/4] hw/i386/pc.c: CXL Fixed Memory Window should not reserve e820 in bios
  2022-10-26  0:47 ` [PATCH 0/4 v3] Multi-Region and Volatile Memory support for CXL Type-3 Devices Gregory Price
@ 2022-10-26  0:47   ` Gregory Price
  2022-10-26 20:33     ` Michael S. Tsirkin
  2022-10-26  0:47   ` [PATCH 2/4] hw/cxl: Add CXL_CAPACITY_MULTIPLIER definition Gregory Price
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Gregory Price @ 2022-10-26  0:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, mst, marcel.apfelbaum, imammedo,
	ani, alison.schofield, dave, a.manzanares, bwidawsk,
	gregory.price, hchkuo, cbrowy, ira.weiny

Early-boot e820 records will be inserted by the bios/efi/early boot
software and be reported to the kernel via insert_resource.  Later, when
CXL drivers iterate through the regions again, they will insert another
resource and make the RESERVED memory area a child.

This RESERVED memory area causes the memory region to become unusable,
and as a result attempting to create memory regions with

    `cxl create-region ...`

Will fail due to the RESERVED area intersecting with the CXL window.

During boot the following traceback is observed:

0xffffffff81101650 in insert_resource_expand_to_fit ()
0xffffffff83d964c5 in e820__reserve_resources_late ()
0xffffffff83e03210 in pcibios_resource_survey ()
0xffffffff83e04f4a in pcibios_init ()

Which produces a call to reserve the CFMWS area:

(gdb) p *new
$54 = {start = 0x290000000, end = 0x2cfffffff, name = "Reserved",
       flags = 0x200, desc = 0x7, parent = 0x0, sibling = 0x0,
       child = 0x0}

Later the Kernel parses ACPI tables and reserves the exact same area as
the CXL Fixed Memory Window:

0xffffffff811016a4 in insert_resource_conflict ()
                      insert_resource ()
0xffffffff81a81389 in cxl_parse_cfmws ()
0xffffffff818c4a81 in call_handler ()
                      acpi_parse_entries_array ()

(gdb) p/x *new
$59 = {start = 0x290000000, end = 0x2cfffffff, name = "CXL Window 0",
       flags = 0x200, desc = 0x0, parent = 0x0, sibling = 0x0,
       child = 0x0}

This produces the following output in /proc/iomem:

590000000-68fffffff : CXL Window 0
  590000000-68fffffff : Reserved

This reserved area causes `get_free_mem_region()` to fail due to a check
against `__region_intersects()`.  Due to this reserved area, the
intersect check will only ever return REGION_INTERSECTS, which causes
`cxl create-region` to always fail.

Signed-off-by: Gregory Price <gregory.price@memverge.com>
---
 hw/i386/pc.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 768982ae9a..203c90fedb 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1062,7 +1062,6 @@ void pc_memory_init(PCMachineState *pcms,
         hwaddr cxl_size = MiB;
 
         cxl_base = pc_get_cxl_range_start(pcms);
-        e820_add_entry(cxl_base, cxl_size, E820_RESERVED);
         memory_region_init(mr, OBJECT(machine), "cxl_host_reg", cxl_size);
         memory_region_add_subregion(system_memory, cxl_base, mr);
         cxl_resv_end = cxl_base + cxl_size;
@@ -1078,7 +1077,6 @@ void pc_memory_init(PCMachineState *pcms,
                 memory_region_init_io(&fw->mr, OBJECT(machine), &cfmws_ops, fw,
                                       "cxl-fixed-memory-region", fw->size);
                 memory_region_add_subregion(system_memory, fw->base, &fw->mr);
-                e820_add_entry(fw->base, fw->size, E820_RESERVED);
                 cxl_fmw_base += fw->size;
                 cxl_resv_end = cxl_fmw_base;
             }
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/4] hw/cxl: Add CXL_CAPACITY_MULTIPLIER definition
  2022-10-26  0:47 ` [PATCH 0/4 v3] Multi-Region and Volatile Memory support for CXL Type-3 Devices Gregory Price
  2022-10-26  0:47   ` [PATCH 1/4] hw/i386/pc.c: CXL Fixed Memory Window should not reserve e820 in bios Gregory Price
@ 2022-10-26  0:47   ` Gregory Price
  2022-11-14 17:55       ` Jonathan Cameron via
  2022-10-26  0:47   ` [PATCH 3/4] hw/cxl: Multi-Region CXL Type-3 Devices (Volatile and Persistent) Gregory Price
                     ` (4 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Gregory Price @ 2022-10-26  0:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, mst, marcel.apfelbaum, imammedo,
	ani, alison.schofield, dave, a.manzanares, bwidawsk,
	gregory.price, hchkuo, cbrowy, ira.weiny

Remove usage of magic numbers when accessing capacity fields and replace
with CXL_CAPACITY_MULTIPLIER, matching the kernel definition.

Signed-off-by: Gregory Price <gregory.price@memverge.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
---
 hw/cxl/cxl-mailbox-utils.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 3e23d29e2d..d7543fd5b4 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -15,6 +15,8 @@
 #include "qemu/log.h"
 #include "qemu/uuid.h"
 
+#define CXL_CAPACITY_MULTIPLIER   0x10000000 /* SZ_256M */
+
 /*
  * How to add a new command, example. The command set FOO, with cmd BAR.
  *  1. Add the command set and cmd to the enum.
@@ -267,7 +269,7 @@ static ret_code cmd_firmware_update_get_info(struct cxl_cmd *cmd,
     } QEMU_PACKED *fw_info;
     QEMU_BUILD_BUG_ON(sizeof(*fw_info) != 0x50);
 
-    if (cxl_dstate->pmem_size < (256 << 20)) {
+    if (cxl_dstate->pmem_size < CXL_CAPACITY_MULTIPLIER) {
         return CXL_MBOX_INTERNAL_ERROR;
     }
 
@@ -412,7 +414,7 @@ static ret_code cmd_identify_memory_device(struct cxl_cmd *cmd,
     CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
     uint64_t size = cxl_dstate->pmem_size;
 
-    if (!QEMU_IS_ALIGNED(size, 256 << 20)) {
+    if (!QEMU_IS_ALIGNED(size, CXL_CAPACITY_MULTIPLIER)) {
         return CXL_MBOX_INTERNAL_ERROR;
     }
 
@@ -422,8 +424,8 @@ static ret_code cmd_identify_memory_device(struct cxl_cmd *cmd,
     /* PMEM only */
     snprintf(id->fw_revision, 0x10, "BWFW VERSION %02d", 0);
 
-    id->total_capacity = size / (256 << 20);
-    id->persistent_capacity = size / (256 << 20);
+    id->total_capacity = size / CXL_CAPACITY_MULTIPLIER;
+    id->persistent_capacity = size / CXL_CAPACITY_MULTIPLIER;
     id->lsa_size = cvc->get_lsa_size(ct3d);
     id->poison_list_max_mer[1] = 0x1; /* 256 poison records */
 
@@ -444,14 +446,14 @@ static ret_code cmd_ccls_get_partition_info(struct cxl_cmd *cmd,
     QEMU_BUILD_BUG_ON(sizeof(*part_info) != 0x20);
     uint64_t size = cxl_dstate->pmem_size;
 
-    if (!QEMU_IS_ALIGNED(size, 256 << 20)) {
+    if (!QEMU_IS_ALIGNED(size, CXL_CAPACITY_MULTIPLIER)) {
         return CXL_MBOX_INTERNAL_ERROR;
     }
 
     /* PMEM only */
     part_info->active_vmem = 0;
     part_info->next_vmem = 0;
-    part_info->active_pmem = size / (256 << 20);
+    part_info->active_pmem = size / CXL_CAPACITY_MULTIPLIER;
     part_info->next_pmem = 0;
 
     *len = sizeof(*part_info);
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/4] hw/cxl: Multi-Region CXL Type-3 Devices (Volatile and Persistent)
  2022-10-26  0:47 ` [PATCH 0/4 v3] Multi-Region and Volatile Memory support for CXL Type-3 Devices Gregory Price
  2022-10-26  0:47   ` [PATCH 1/4] hw/i386/pc.c: CXL Fixed Memory Window should not reserve e820 in bios Gregory Price
  2022-10-26  0:47   ` [PATCH 2/4] hw/cxl: Add CXL_CAPACITY_MULTIPLIER definition Gregory Price
@ 2022-10-26  0:47   ` Gregory Price
  2022-11-14 17:53       ` Jonathan Cameron via
  2022-10-26  0:47   ` [PATCH 4/4] hw/acpi/cxl.c: Fill in SRAT for vmem/pmem if NUMA node is assigned Gregory Price
                     ` (3 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Gregory Price @ 2022-10-26  0:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, mst, marcel.apfelbaum, imammedo,
	ani, alison.schofield, dave, a.manzanares, bwidawsk,
	gregory.price, hchkuo, cbrowy, ira.weiny

This commit enables each CXL Type-3 device to contain one volatile
memory region and one persistent region.

Two new properties have been added to cxl-type3 device initialization:
    [volatile-memdev] and [persistent-memdev]

The existing [memdev] property has been deprecated and will default the
memory region to a persistent memory region (although a user may assign
the region to a ram or file backed region). It cannot be used in
combination with the new [persistent-memdev] property.

Partitioning volatile memory from persistent memory is not yet supported.

Volatile memory is mapped at DPA(0x0), while Persistent memory is mapped
at DPA(vmem->size), per CXL Spec 8.2.9.8.2.0 - Get Partition Info.

Signed-off-by: Gregory Price <gregory.price@memverge.com>
---
 docs/system/devices/cxl.rst |  53 +++++--
 hw/cxl/cxl-mailbox-utils.c  |  21 ++-
 hw/mem/cxl_type3.c          | 274 +++++++++++++++++++++++++++---------
 include/hw/cxl/cxl_device.h |  11 +-
 tests/qtest/cxl-test.c      | 111 +++++++++++----
 5 files changed, 348 insertions(+), 122 deletions(-)

diff --git a/docs/system/devices/cxl.rst b/docs/system/devices/cxl.rst
index f25783a4ec..9e165064c8 100644
--- a/docs/system/devices/cxl.rst
+++ b/docs/system/devices/cxl.rst
@@ -300,15 +300,36 @@ Example topology involving a switch::
 
 Example command lines
 ---------------------
-A very simple setup with just one directly attached CXL Type 3 device::
+A very simple setup with just one directly attached CXL Type 3 Persistent Memory device::
 
   qemu-system-aarch64 -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 -cpu max \
   ...
-  -object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=256M \
-  -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=256M \
+  -object memory-backend-file,pmem=true,id=pmem0,share=on,mem-path=/tmp/cxltest.raw,size=256M \
+  -object memory-backend-file,pmem=true,id=cxl-lsa0,share=on,mem-path=/tmp/lsa.raw,size=256M \
+  -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
+  -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
+  -device cxl-type3,bus=root_port13,persistent-memdev=pmem0,lsa=cxl-lsa1,id=cxl-pmem0 \
+  -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G
+
+A very simple setup with just one directly attached CXL Type 3 Volatile Memory device::
+
+  qemu-system-aarch64 -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 -cpu max \
+  ...
+  -object memory-backend-ram,id=vmem0,share=on,size=256M \
   -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
   -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
-  -device cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0 \
+  -device cxl-type3,bus=root_port13,volatile-memdev=vmem0,id=cxl-vmem0 \
+  -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G
+
+The same volatile setup may optionally include an LSA region::
+
+  qemu-system-aarch64 -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 -cpu max \
+  ...
+  -object memory-backend-ram,id=vmem0,share=on,size=256M \
+  -object memory-backend-file,id=cxl-lsa0,share=on,mem-path=/tmp/lsa.raw,size=256M \
+  -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
+  -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
+  -device cxl-type3,bus=root_port13,volatile-memdev=vmem0,lsa=cxl-lsa0,id=cxl-vmem0 \
   -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G
 
 A setup suitable for 4 way interleave. Only one fixed window provided, to enable 2 way
@@ -328,13 +349,13 @@ the CXL Type3 device directly attached (no switches).::
   -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
   -device pxb-cxl,bus_nr=222,bus=pcie.0,id=cxl.2 \
   -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
-  -device cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0 \
+  -device cxl-type3,bus=root_port13,persistent-memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0 \
   -device cxl-rp,port=1,bus=cxl.1,id=root_port14,chassis=0,slot=3 \
-  -device cxl-type3,bus=root_port14,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem1 \
+  -device cxl-type3,bus=root_port14,persistent-memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem1 \
   -device cxl-rp,port=0,bus=cxl.2,id=root_port15,chassis=0,slot=5 \
-  -device cxl-type3,bus=root_port15,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem2 \
+  -device cxl-type3,bus=root_port15,persistent-memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem2 \
   -device cxl-rp,port=1,bus=cxl.2,id=root_port16,chassis=0,slot=6 \
-  -device cxl-type3,bus=root_port16,memdev=cxl-mem4,lsa=cxl-lsa4,id=cxl-pmem3 \
+  -device cxl-type3,bus=root_port16,persistent-memdev=cxl-mem4,lsa=cxl-lsa4,id=cxl-pmem3 \
   -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.targets.1=cxl.2,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=8k
 
 An example of 4 devices below a switch suitable for 1, 2 or 4 way interleave::
@@ -354,15 +375,23 @@ An example of 4 devices below a switch suitable for 1, 2 or 4 way interleave::
   -device cxl-rp,port=1,bus=cxl.1,id=root_port1,chassis=0,slot=1 \
   -device cxl-upstream,bus=root_port0,id=us0 \
   -device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \
-  -device cxl-type3,bus=swport0,memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0,size=256M \
+  -device cxl-type3,bus=swport0,persistent-memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0,size=256M \
   -device cxl-downstream,port=1,bus=us0,id=swport1,chassis=0,slot=5 \
-  -device cxl-type3,bus=swport1,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem1,size=256M \
+  -device cxl-type3,bus=swport1,persistent-memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem1,size=256M \
   -device cxl-downstream,port=2,bus=us0,id=swport2,chassis=0,slot=6 \
-  -device cxl-type3,bus=swport2,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem2,size=256M \
+  -device cxl-type3,bus=swport2,persistent-memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem2,size=256M \
   -device cxl-downstream,port=3,bus=us0,id=swport3,chassis=0,slot=7 \
-  -device cxl-type3,bus=swport3,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem3,size=256M \
+  -device cxl-type3,bus=swport3,persistent-memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem3,size=256M \
   -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=4k
 
+Deprecations
+------------
+
+The Type 3 device [memdev] attribute has been deprecated in favor
+of the [persistent-memdev] and [volatile-memdev] attributes. [memdev]
+will default to a persistent memory device for backward compatibility
+and is incapable of being used in combination with [persistent-memdev].
+
 Kernel Configuration Options
 ----------------------------
 
diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index d7543fd5b4..c1183614b9 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -269,7 +269,7 @@ static ret_code cmd_firmware_update_get_info(struct cxl_cmd *cmd,
     } QEMU_PACKED *fw_info;
     QEMU_BUILD_BUG_ON(sizeof(*fw_info) != 0x50);
 
-    if (cxl_dstate->pmem_size < CXL_CAPACITY_MULTIPLIER) {
+    if (cxl_dstate->mem_size < CXL_CAPACITY_MULTIPLIER) {
         return CXL_MBOX_INTERNAL_ERROR;
     }
 
@@ -412,20 +412,20 @@ static ret_code cmd_identify_memory_device(struct cxl_cmd *cmd,
 
     CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
     CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
-    uint64_t size = cxl_dstate->pmem_size;
 
-    if (!QEMU_IS_ALIGNED(size, CXL_CAPACITY_MULTIPLIER)) {
+    if ((!QEMU_IS_ALIGNED(cxl_dstate->vmem_size, CXL_CAPACITY_MULTIPLIER)) ||
+        (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER))) {
         return CXL_MBOX_INTERNAL_ERROR;
     }
 
     id = (void *)cmd->payload;
     memset(id, 0, sizeof(*id));
 
-    /* PMEM only */
     snprintf(id->fw_revision, 0x10, "BWFW VERSION %02d", 0);
 
-    id->total_capacity = size / CXL_CAPACITY_MULTIPLIER;
-    id->persistent_capacity = size / CXL_CAPACITY_MULTIPLIER;
+    id->total_capacity = cxl_dstate->mem_size / CXL_CAPACITY_MULTIPLIER;
+    id->persistent_capacity = cxl_dstate->pmem_size / CXL_CAPACITY_MULTIPLIER;
+    id->volatile_capacity = cxl_dstate->vmem_size / CXL_CAPACITY_MULTIPLIER;
     id->lsa_size = cvc->get_lsa_size(ct3d);
     id->poison_list_max_mer[1] = 0x1; /* 256 poison records */
 
@@ -444,16 +444,15 @@ static ret_code cmd_ccls_get_partition_info(struct cxl_cmd *cmd,
         uint64_t next_pmem;
     } QEMU_PACKED *part_info = (void *)cmd->payload;
     QEMU_BUILD_BUG_ON(sizeof(*part_info) != 0x20);
-    uint64_t size = cxl_dstate->pmem_size;
 
-    if (!QEMU_IS_ALIGNED(size, CXL_CAPACITY_MULTIPLIER)) {
+    if ((!QEMU_IS_ALIGNED(cxl_dstate->vmem_size, CXL_CAPACITY_MULTIPLIER)) ||
+        (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER))) {
         return CXL_MBOX_INTERNAL_ERROR;
     }
 
-    /* PMEM only */
-    part_info->active_vmem = 0;
+    part_info->active_vmem = cxl_dstate->vmem_size / CXL_CAPACITY_MULTIPLIER;
     part_info->next_vmem = 0;
-    part_info->active_pmem = size / CXL_CAPACITY_MULTIPLIER;
+    part_info->active_pmem = cxl_dstate->pmem_size / CXL_CAPACITY_MULTIPLIER;
     part_info->next_pmem = 0;
 
     *len = sizeof(*part_info);
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 0317bd96a6..21e866dcaf 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -32,7 +32,8 @@ enum {
 };
 
 static int ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
-                                         int dsmad_handle, MemoryRegion *mr)
+                                         int dsmad_handle, MemoryRegion *mr,
+                                         bool is_pmem, uint64_t dpa_base)
 {
     g_autofree CDATDsmas *dsmas = NULL;
     g_autofree CDATDslbis *dslbis0 = NULL;
@@ -51,8 +52,8 @@ static int ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
             .length = sizeof(*dsmas),
         },
         .DSMADhandle = dsmad_handle,
-        .flags = CDAT_DSMAS_FLAG_NV,
-        .DPA_base = 0,
+        .flags = is_pmem ? CDAT_DSMAS_FLAG_NV : 0,
+        .DPA_base = dpa_base,
         .DPA_length = int128_get64(mr->size),
     };
 
@@ -151,33 +152,67 @@ static int ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
 static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
 {
     g_autofree CDATSubHeader **table = NULL;
-    MemoryRegion *nonvolatile_mr;
     CXLType3Dev *ct3d = priv;
+    MemoryRegion *volatile_mr = NULL, *nonvolatile_mr = NULL;
     int dsmad_handle = 0;
+    int cur_ent = 0;
+    int len = 0;
     int rc;
 
-    if (!ct3d->hostmem) {
+    if (!ct3d->hostpmem && !ct3d->hostvmem) {
         return 0;
     }
 
-    nonvolatile_mr = host_memory_backend_get_memory(ct3d->hostmem);
-    if (!nonvolatile_mr) {
-        return -EINVAL;
+    if (ct3d->hostvmem) {
+        volatile_mr = host_memory_backend_get_memory(ct3d->hostvmem);
+        if (!volatile_mr) {
+            return -EINVAL;
+        }
+        len += CT3_CDAT_NUM_ENTRIES;
+    }
+
+    if (ct3d->hostpmem) {
+        nonvolatile_mr = host_memory_backend_get_memory(ct3d->hostpmem);
+        if (!nonvolatile_mr) {
+            return -EINVAL;
+        }
+        len += CT3_CDAT_NUM_ENTRIES;
     }
 
-    table = g_malloc0(CT3_CDAT_NUM_ENTRIES * sizeof(*table));
+    table = g_malloc0(len * sizeof(*table));
     if (!table) {
         return -ENOMEM;
     }
 
-    rc = ct3_build_cdat_entries_for_mr(table, dsmad_handle++, nonvolatile_mr);
-    if (rc < 0) {
-        return rc;
+    /* Now fill them in */
+    if (volatile_mr) {
+        rc = ct3_build_cdat_entries_for_mr(table, dsmad_handle++, volatile_mr,
+                true, 0);
+        if (rc < 0) {
+            return rc;
+        }
+        cur_ent = CT3_CDAT_NUM_ENTRIES;
+    }
+
+    if (nonvolatile_mr) {
+        rc = ct3_build_cdat_entries_for_mr(&(table[cur_ent]), dsmad_handle++,
+                nonvolatile_mr, true, (volatile_mr ? volatile_mr->size : 0));
+        if (rc < 0) {
+            goto error_cleanup;
+        }
+        cur_ent += CT3_CDAT_NUM_ENTRIES;
     }
+    assert(len == cur_ent);
 
     *cdat_table = g_steal_pointer(&table);
 
-    return CT3_CDAT_NUM_ENTRIES;
+    return len;
+error_cleanup:
+    int i;
+    for (i = 0; i < cur_ent; i++) {
+        g_free(*cdat_table[i]);
+    }
+    return rc;
 }
 
 static void ct3_free_cdat_table(CDATSubHeader **cdat_table, int num, void *priv)
@@ -378,16 +413,48 @@ static void build_dvsecs(CXLType3Dev *ct3d)
     CXLDVSECRegisterLocator *regloc_dvsec;
     uint8_t *dvsec;
     int i;
+    uint32_t range1_size_hi = 0, range1_size_lo = 0,
+             range1_base_hi = 0, range1_base_lo = 0,
+             range2_size_hi = 0, range2_size_lo = 0,
+             range2_base_hi = 0, range2_base_lo = 0;
+
+    /*
+     * Volatile memory is mapped as (0x0)
+     * Persistent memory is mapped at (volatile->size)
+     */
+    if (ct3d->hostvmem && ct3d->hostpmem) {
+        range1_size_hi = ct3d->hostvmem->size >> 32;
+        range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
+                         (ct3d->hostvmem->size & 0xF0000000);
+        range1_base_hi = 0;
+        range1_base_lo = 0;
+        range2_size_hi = ct3d->hostpmem->size >> 32;
+        range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
+                         (ct3d->hostpmem->size & 0xF0000000);
+        range2_base_hi = ct3d->hostvmem->size >> 32;
+        range2_base_lo = ct3d->hostvmem->size & 0xF0000000;
+    } else {
+        HostMemoryBackend *hmbe = ct3d->hostvmem ?
+                                  ct3d->hostvmem : ct3d->hostpmem;
+        range1_size_hi = hmbe->size >> 32;
+        range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
+                         (hmbe->size & 0xF0000000);
+        range1_base_hi = 0;
+        range1_base_lo = 0;
+    }
 
     dvsec = (uint8_t *)&(CXLDVSECDevice){
         .cap = 0x1e,
         .ctrl = 0x2,
         .status2 = 0x2,
-        .range1_size_hi = ct3d->hostmem->size >> 32,
-        .range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
-        (ct3d->hostmem->size & 0xF0000000),
-        .range1_base_hi = 0,
-        .range1_base_lo = 0,
+        .range1_size_hi = range1_size_hi,
+        .range1_size_lo = range1_size_lo,
+        .range1_base_hi = range1_base_hi,
+        .range1_base_lo = range1_base_lo,
+        .range2_size_hi = range2_size_hi,
+        .range2_size_lo = range2_size_lo,
+        .range2_base_hi = range2_base_hi,
+        .range2_base_lo = range2_base_lo
     };
     cxl_component_create_dvsec(cxl_cstate, CXL2_TYPE3_DEVICE,
                                PCIE_CXL_DEVICE_DVSEC_LENGTH,
@@ -475,33 +542,62 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
     MemoryRegion *mr;
     char *name;
 
-    if (!ct3d->hostmem) {
-        error_setg(errp, "memdev property must be set");
+    if (!ct3d->hostmem && !ct3d->hostvmem && !ct3d->hostpmem) {
+        error_setg(errp, "at least one memdev property must be set");
         return false;
+    } else if (ct3d->hostmem && ct3d->hostpmem) {
+        error_setg(errp, "[memdev] cannot be used with new "
+                         "[persistent-memdev] property");
+        return false;
+    } else if (ct3d->hostmem) {
+        /* Use of hostmem property implies pmem */
+        ct3d->hostpmem = ct3d->hostmem;
+        ct3d->hostmem = NULL;
     }
 
-    mr = host_memory_backend_get_memory(ct3d->hostmem);
-    if (!mr) {
-        error_setg(errp, "memdev property must be set");
+    if (ct3d->hostpmem && !ct3d->lsa) {
+        error_setg(errp, "lsa property must be set for persistent devices");
         return false;
     }
-    memory_region_set_nonvolatile(mr, true);
-    memory_region_set_enabled(mr, true);
-    host_memory_backend_set_mapped(ct3d->hostmem, true);
 
-    if (ds->id) {
-        name = g_strdup_printf("cxl-type3-dpa-space:%s", ds->id);
-    } else {
-        name = g_strdup("cxl-type3-dpa-space");
+    if (ct3d->hostvmem) {
+        mr = host_memory_backend_get_memory(ct3d->hostvmem);
+        if (!mr) {
+            error_setg(errp, "volatile memdev must have backing device");
+            return false;
+        }
+        memory_region_set_nonvolatile(mr, false);
+        memory_region_set_enabled(mr, true);
+        host_memory_backend_set_mapped(ct3d->hostvmem, true);
+        if (ds->id) {
+            name = g_strdup_printf("cxl-type3-dpa-vmem-space:%s", ds->id);
+        } else {
+            name = g_strdup("cxl-type3-dpa-vmem-space");
+        }
+        address_space_init(&ct3d->hostvmem_as, mr, name);
+        ct3d->cxl_dstate.vmem_size = mr->size;
+        ct3d->cxl_dstate.mem_size += mr->size;
+        g_free(name);
     }
-    address_space_init(&ct3d->hostmem_as, mr, name);
-    g_free(name);
-
-    ct3d->cxl_dstate.pmem_size = ct3d->hostmem->size;
 
-    if (!ct3d->lsa) {
-        error_setg(errp, "lsa property must be set");
-        return false;
+    if (ct3d->hostpmem) {
+        mr = host_memory_backend_get_memory(ct3d->hostpmem);
+        if (!mr) {
+            error_setg(errp, "persistent memdev must have backing device");
+            return false;
+        }
+        memory_region_set_nonvolatile(mr, true);
+        memory_region_set_enabled(mr, true);
+        host_memory_backend_set_mapped(ct3d->hostpmem, true);
+        if (ds->id) {
+            name = g_strdup_printf("cxl-type3-dpa-pmem-space:%s", ds->id);
+        } else {
+            name = g_strdup("cxl-type3-dpa-pmem-space");
+        }
+        address_space_init(&ct3d->hostpmem_as, mr, name);
+        ct3d->cxl_dstate.pmem_size = mr->size;
+        ct3d->cxl_dstate.mem_size += mr->size;
+        g_free(name);
     }
 
     return true;
@@ -609,7 +705,12 @@ err_free_spdm_socket:
 err_free_special_ops:
     g_free(regs->special_ops);
 err_address_space_free:
-    address_space_destroy(&ct3d->hostmem_as);
+    if (ct3d->hostvmem) {
+        address_space_destroy(&ct3d->hostvmem_as);
+    }
+    if (ct3d->hostpmem) {
+        address_space_destroy(&ct3d->hostpmem_as);
+    }
     return;
 }
 
@@ -623,7 +724,12 @@ static void ct3_exit(PCIDevice *pci_dev)
     cxl_doe_cdat_release(cxl_cstate);
     spdm_sock_fini(ct3d->doe_spdm.socket);
     g_free(regs->special_ops);
-    address_space_destroy(&ct3d->hostmem_as);
+    if (ct3d->hostvmem) {
+        address_space_destroy(&ct3d->hostvmem_as);
+    }
+    if (ct3d->hostpmem) {
+        address_space_destroy(&ct3d->hostpmem_as);
+    }
 }
 
 /* TODO: Support multiple HDM decoders and DPA skip */
@@ -663,11 +769,17 @@ MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, uint64_t *data,
 {
     CXLType3Dev *ct3d = CXL_TYPE3(d);
     uint64_t dpa_offset;
-    MemoryRegion *mr;
+    MemoryRegion *vmr = NULL, *pmr = NULL;
+    AddressSpace *as;
+
+    if (ct3d->hostvmem) {
+        vmr = host_memory_backend_get_memory(ct3d->hostvmem);
+    }
+    if (ct3d->hostpmem) {
+        pmr = host_memory_backend_get_memory(ct3d->hostpmem);
+    }
 
-    /* TODO support volatile region */
-    mr = host_memory_backend_get_memory(ct3d->hostmem);
-    if (!mr) {
+    if (!vmr && !pmr) {
         return MEMTX_ERROR;
     }
 
@@ -675,11 +787,13 @@ MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, uint64_t *data,
         return MEMTX_ERROR;
     }
 
-    if (dpa_offset > int128_get64(mr->size)) {
+    if (dpa_offset > int128_get64(ct3d->cxl_dstate.mem_size)) {
         return MEMTX_ERROR;
     }
 
-    return address_space_read(&ct3d->hostmem_as, dpa_offset, attrs, data, size);
+    as = (vmr && (dpa_offset <= int128_get64(vmr->size))) ?
+         &ct3d->hostvmem_as : &ct3d->hostpmem_as;
+    return address_space_read(as, dpa_offset, attrs, data, size);
 }
 
 MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
@@ -687,10 +801,17 @@ MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
 {
     CXLType3Dev *ct3d = CXL_TYPE3(d);
     uint64_t dpa_offset;
-    MemoryRegion *mr;
+    MemoryRegion *vmr = NULL, *pmr = NULL;
+    AddressSpace *as;
+
+    if (ct3d->hostvmem) {
+        vmr = host_memory_backend_get_memory(ct3d->hostvmem);
+    }
+    if (ct3d->hostpmem) {
+        pmr = host_memory_backend_get_memory(ct3d->hostpmem);
+    }
 
-    mr = host_memory_backend_get_memory(ct3d->hostmem);
-    if (!mr) {
+    if (!vmr && !pmr) {
         return MEMTX_OK;
     }
 
@@ -698,11 +819,13 @@ MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
         return MEMTX_OK;
     }
 
-    if (dpa_offset > int128_get64(mr->size)) {
+    if (dpa_offset > int128_get64(ct3d->cxl_dstate.mem_size)) {
         return MEMTX_OK;
     }
-    return address_space_write(&ct3d->hostmem_as, dpa_offset, attrs,
-                               &data, size);
+
+    as = (vmr && (dpa_offset <= int128_get64(vmr->size))) ?
+         &ct3d->hostvmem_as : &ct3d->hostpmem_as;
+    return address_space_write(as, dpa_offset, attrs, &data, size);
 }
 
 static void ct3d_reset(DeviceState *dev)
@@ -717,7 +840,11 @@ static void ct3d_reset(DeviceState *dev)
 
 static Property ct3_props[] = {
     DEFINE_PROP_LINK("memdev", CXLType3Dev, hostmem, TYPE_MEMORY_BACKEND,
-                     HostMemoryBackend *),
+                     HostMemoryBackend *), /* for backward compatibility */
+    DEFINE_PROP_LINK("persistent-memdev", CXLType3Dev, hostpmem,
+                     TYPE_MEMORY_BACKEND, HostMemoryBackend *),
+    DEFINE_PROP_LINK("volatile-memdev", CXLType3Dev, hostvmem,
+                     TYPE_MEMORY_BACKEND, HostMemoryBackend *),
     DEFINE_PROP_LINK("lsa", CXLType3Dev, lsa, TYPE_MEMORY_BACKEND,
                      HostMemoryBackend *),
     DEFINE_PROP_UINT64("sn", CXLType3Dev, sn, UI64_NULL),
@@ -728,10 +855,12 @@ static Property ct3_props[] = {
 
 static uint64_t get_lsa_size(CXLType3Dev *ct3d)
 {
-    MemoryRegion *mr;
-
-    mr = host_memory_backend_get_memory(ct3d->lsa);
-    return memory_region_size(mr);
+    MemoryRegion *mr = NULL;
+    if (ct3d->lsa) {
+        mr = host_memory_backend_get_memory(ct3d->lsa);
+        return memory_region_size(mr);
+    }
+    return 0;
 }
 
 static void validate_lsa_access(MemoryRegion *mr, uint64_t size,
@@ -744,30 +873,35 @@ static void validate_lsa_access(MemoryRegion *mr, uint64_t size,
 static uint64_t get_lsa(CXLType3Dev *ct3d, void *buf, uint64_t size,
                     uint64_t offset)
 {
-    MemoryRegion *mr;
+    MemoryRegion *mr = NULL;
     void *lsa;
 
-    mr = host_memory_backend_get_memory(ct3d->lsa);
-    validate_lsa_access(mr, size, offset);
+    if (ct3d->lsa) {
+        mr = host_memory_backend_get_memory(ct3d->lsa);
+        validate_lsa_access(mr, size, offset);
 
-    lsa = memory_region_get_ram_ptr(mr) + offset;
-    memcpy(buf, lsa, size);
+        lsa = memory_region_get_ram_ptr(mr) + offset;
+        memcpy(buf, lsa, size);
+        return size;
+    }
 
-    return size;
+    return 0;
 }
 
 static void set_lsa(CXLType3Dev *ct3d, const void *buf, uint64_t size,
                     uint64_t offset)
 {
-    MemoryRegion *mr;
-    void *lsa;
+    MemoryRegion *mr = NULL;
+    void *lsa = NULL;
 
-    mr = host_memory_backend_get_memory(ct3d->lsa);
-    validate_lsa_access(mr, size, offset);
+    if (ct3d->lsa) {
+        mr = host_memory_backend_get_memory(ct3d->lsa);
+        validate_lsa_access(mr, size, offset);
 
-    lsa = memory_region_get_ram_ptr(mr) + offset;
-    memcpy(lsa, buf, size);
-    memory_region_set_dirty(mr, offset, size);
+        lsa = memory_region_get_ram_ptr(mr) + offset;
+        memcpy(lsa, buf, size);
+        memory_region_set_dirty(mr, offset, size);
+    }
 
     /*
      * Just like the PMEM, if the guest is not allowed to exit gracefully, label
@@ -978,7 +1112,7 @@ static void ct3_class_init(ObjectClass *oc, void *data)
     pc->config_read = ct3d_config_read;
 
     set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
-    dc->desc = "CXL PMEM Device (Type 3)";
+    dc->desc = "CXL Memory Device (Type 3)";
     dc->reset = ct3d_reset;
     device_class_set_props(dc, ct3_props);
 
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 1cd71afcb6..1b366b739c 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -180,8 +180,10 @@ typedef struct cxl_device_state {
         uint64_t host_set;
     } timestamp;
 
-    /* memory region for persistent memory, HDM */
+    /* memory region size, HDM */
+    uint64_t mem_size;
     uint64_t pmem_size;
+    uint64_t vmem_size;
 
     struct cxl_cmd (*cxl_cmd_set)[256];
     /* Move me later */
@@ -311,12 +313,15 @@ struct CXLType3Dev {
     PCIDevice parent_obj;
 
     /* Properties */
-    HostMemoryBackend *hostmem;
+    HostMemoryBackend *hostmem; /* deprecated */
+    HostMemoryBackend *hostvmem;
+    HostMemoryBackend *hostpmem;
     HostMemoryBackend *lsa;
     uint64_t sn;
 
     /* State */
-    AddressSpace hostmem_as;
+    AddressSpace hostvmem_as;
+    AddressSpace hostpmem_as;
     CXLComponentState cxl_cstate;
     CXLDeviceState cxl_dstate;
 
diff --git a/tests/qtest/cxl-test.c b/tests/qtest/cxl-test.c
index 6baed747fa..a05ddc0c7b 100644
--- a/tests/qtest/cxl-test.c
+++ b/tests/qtest/cxl-test.c
@@ -34,29 +34,46 @@
                  "-device cxl-rp,id=rp2,bus=cxl.1,chassis=0,slot=2 " \
                  "-device cxl-rp,id=rp3,bus=cxl.1,chassis=0,slot=3 "
 
-#define QEMU_T3D "-object memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M " \
-                 "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M "    \
-                 "-device cxl-type3,bus=rp0,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 "
-
-#define QEMU_2T3D "-object memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M "    \
-                  "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M "    \
-                  "-device cxl-type3,bus=rp0,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 " \
-                  "-object memory-backend-file,id=cxl-mem1,mem-path=%s,size=256M "    \
-                  "-object memory-backend-file,id=lsa1,mem-path=%s,size=256M "    \
-                  "-device cxl-type3,bus=rp1,memdev=cxl-mem1,lsa=lsa1,id=cxl-pmem1 "
-
-#define QEMU_4T3D "-object memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M " \
-                  "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M "    \
-                  "-device cxl-type3,bus=rp0,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 " \
-                  "-object memory-backend-file,id=cxl-mem1,mem-path=%s,size=256M "    \
-                  "-object memory-backend-file,id=lsa1,mem-path=%s,size=256M "    \
-                  "-device cxl-type3,bus=rp1,memdev=cxl-mem1,lsa=lsa1,id=cxl-pmem1 " \
-                  "-object memory-backend-file,id=cxl-mem2,mem-path=%s,size=256M "    \
-                  "-object memory-backend-file,id=lsa2,mem-path=%s,size=256M "    \
-                  "-device cxl-type3,bus=rp2,memdev=cxl-mem2,lsa=lsa2,id=cxl-pmem2 " \
-                  "-object memory-backend-file,id=cxl-mem3,mem-path=%s,size=256M "    \
-                  "-object memory-backend-file,id=lsa3,mem-path=%s,size=256M "    \
-                  "-device cxl-type3,bus=rp3,memdev=cxl-mem3,lsa=lsa3,id=cxl-pmem3 "
+#define QEMU_T3D_DEPRECATED \
+  "-object memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M " \
+  "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M "    \
+  "-device cxl-type3,bus=rp0,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 "
+
+#define QEMU_T3D_PMEM \
+  "-object memory-backend-file,id=m0,mem-path=%s,size=256M " \
+  "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M " \
+  "-device cxl-type3,bus=rp0,persistent-memdev=cxl-m0,lsa=lsa0,id=pmem0 "
+
+#define QEMU_T3D_VMEM \
+  "-object memory-backend-ram,id=mem0,size=256M " \
+  "-device cxl-type3,bus=rp0,volatile-memdev=mem0,id=mem0 "
+
+#define QEMU_T3D_VMEM_LSA \
+  "-object memory-backend-ram,id=mem0,size=256M " \
+  "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M " \
+  "-device cxl-type3,bus=rp0,volatile-memdev=mem0,lsa=lsa0,id=mem0 "
+
+#define QEMU_2T3D \
+  "-object memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M "    \
+  "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M "    \
+  "-device cxl-type3,bus=rp0,persistent-memdev=cxl-mem0,lsa=lsa0,id=pmem0 " \
+  "-object memory-backend-file,id=cxl-mem1,mem-path=%s,size=256M "    \
+  "-object memory-backend-file,id=lsa1,mem-path=%s,size=256M "    \
+  "-device cxl-type3,bus=rp1,persistent-memdev=cxl-mem1,lsa=lsa1,id=pmem1 "
+
+#define QEMU_4T3D \
+  "-object memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M " \
+  "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M "    \
+  "-device cxl-type3,bus=rp0,persistent-memdev=cxl-mem0,lsa=lsa0,id=pmem0 " \
+  "-object memory-backend-file,id=cxl-mem1,mem-path=%s,size=256M "    \
+  "-object memory-backend-file,id=lsa1,mem-path=%s,size=256M "    \
+  "-device cxl-type3,bus=rp1,persistent-memdev=cxl-mem1,lsa=lsa1,id=pmem1 " \
+  "-object memory-backend-file,id=cxl-mem2,mem-path=%s,size=256M "    \
+  "-object memory-backend-file,id=lsa2,mem-path=%s,size=256M "    \
+  "-device cxl-type3,bus=rp2,persistent-memdev=cxl-mem2,lsa=lsa2,id=pmem2 " \
+  "-object memory-backend-file,id=cxl-mem3,mem-path=%s,size=256M "    \
+  "-object memory-backend-file,id=lsa3,mem-path=%s,size=256M "    \
+  "-device cxl-type3,bus=rp3,persistent-memdev=cxl-mem3,lsa=lsa3,id=pmem3 "
 
 static void cxl_basic_hb(void)
 {
@@ -95,14 +112,53 @@ static void cxl_2root_port(void)
 }
 
 #ifdef CONFIG_POSIX
-static void cxl_t3d(void)
+static void cxl_t3d_deprecated(void)
 {
     g_autoptr(GString) cmdline = g_string_new(NULL);
     g_autofree const char *tmpfs = NULL;
 
     tmpfs = g_dir_make_tmp("cxl-test-XXXXXX", NULL);
 
-    g_string_printf(cmdline, QEMU_PXB_CMD QEMU_RP QEMU_T3D, tmpfs, tmpfs);
+    g_string_printf(cmdline, QEMU_PXB_CMD QEMU_RP QEMU_T3D_DEPRECATED,
+                    tmpfs, tmpfs);
+
+    qtest_start(cmdline->str);
+    qtest_end();
+}
+
+static void cxl_t3d_persistent(void)
+{
+    g_autoptr(GString) cmdline = g_string_new(NULL);
+    g_autofree const char *tmpfs = NULL;
+
+    tmpfs = g_dir_make_tmp("cxl-test-XXXXXX", NULL);
+
+    g_string_printf(cmdline, QEMU_PXB_CMD QEMU_RP QEMU_T3D_PMEM,
+                    tmpfs, tmpfs);
+
+    qtest_start(cmdline->str);
+    qtest_end();
+}
+
+static void cxl_t3d_volatile(void)
+{
+    g_autoptr(GString) cmdline = g_string_new(NULL);
+
+    g_string_printf(cmdline, QEMU_PXB_CMD QEMU_RP QEMU_T3D_VMEM);
+
+    qtest_start(cmdline->str);
+    qtest_end();
+}
+
+static void cxl_t3d_volatile_lsa(void)
+{
+    g_autoptr(GString) cmdline = g_string_new(NULL);
+    g_autofree const char *tmpfs = NULL;
+
+    tmpfs = g_dir_make_tmp("cxl-test-XXXXXX", NULL);
+
+    g_string_printf(cmdline, QEMU_PXB_CMD QEMU_RP QEMU_T3D_VMEM_LSA,
+                    tmpfs);
 
     qtest_start(cmdline->str);
     qtest_end();
@@ -167,7 +223,10 @@ int main(int argc, char **argv)
         qtest_add_func("/pci/cxl/rp", cxl_root_port);
         qtest_add_func("/pci/cxl/rp_x2", cxl_2root_port);
 #ifdef CONFIG_POSIX
-        qtest_add_func("/pci/cxl/type3_device", cxl_t3d);
+        qtest_add_func("/pci/cxl/type3_device", cxl_t3d_deprecated);
+        qtest_add_func("/pci/cxl/type3_device_pmem", cxl_t3d_persistent);
+        qtest_add_func("/pci/cxl/type3_device_vmem", cxl_t3d_volatile);
+        qtest_add_func("/pci/cxl/type3_device_vmem_lsa", cxl_t3d_volatile_lsa);
         qtest_add_func("/pci/cxl/rp_x2_type3_x2", cxl_1pxb_2rp_2t3d);
         qtest_add_func("/pci/cxl/pxb_x2_root_port_x4_type3_x4",
                        cxl_2pxb_4rp_4t3d);
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 4/4] hw/acpi/cxl.c: Fill in SRAT for vmem/pmem if NUMA node is assigned
  2022-10-26  0:47 ` [PATCH 0/4 v3] Multi-Region and Volatile Memory support for CXL Type-3 Devices Gregory Price
                     ` (2 preceding siblings ...)
  2022-10-26  0:47   ` [PATCH 3/4] hw/cxl: Multi-Region CXL Type-3 Devices (Volatile and Persistent) Gregory Price
@ 2022-10-26  0:47   ` Gregory Price
  2022-10-26 20:13   ` [PATCH 0/4 v3] Multi-Region and Volatile Memory support for CXL Type-3 Devices Adam Manzanares
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Gregory Price @ 2022-10-26  0:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, mst, marcel.apfelbaum, imammedo,
	ani, alison.schofield, dave, a.manzanares, bwidawsk,
	gregory.price, hchkuo, cbrowy, ira.weiny

This patch enables the direct assignment of a NUMA node to a volatile or
persistent memory region on a CXL type-3 device.  This is useful for
testing static mapping for type-3 device memory regions as memory and
leveraging them directly via its NUMA node.

Co-developed-By: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Gregory Price <gregory.price@memverge.com>
---
 docs/system/devices/cxl.rst | 21 ++++++++++++
 hw/acpi/cxl.c               | 67 +++++++++++++++++++++++++++++++++++++
 hw/i386/acpi-build.c        |  4 +++
 include/hw/acpi/cxl.h       |  1 +
 4 files changed, 93 insertions(+)

diff --git a/docs/system/devices/cxl.rst b/docs/system/devices/cxl.rst
index 9e165064c8..32bf84a97c 100644
--- a/docs/system/devices/cxl.rst
+++ b/docs/system/devices/cxl.rst
@@ -332,6 +332,27 @@ The same volatile setup may optionally include an LSA region::
   -device cxl-type3,bus=root_port13,volatile-memdev=vmem0,lsa=cxl-lsa0,id=cxl-vmem0 \
   -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G
 
+
+Volatile and Persistent Memory regions may also be assigned an SRAT entry and statically
+mapped into the system by manually assigning them a CPU-less NUMA node. This is an example
+of a CXL Type 3 Volatile Memory device being assigned an SRAT entry via a NUMA node mapping::
+
+    qemu-system-aarch64 -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 -cpu max \
+    ...
+    -smp 4 \
+    -enable-kvm \
+    -nographic \
+    -object memory-backend-ram,id=mem0,size=2G,share=on \
+    -object memory-backend-ram,id=mem1,size=2G,share=on \
+    -numa node,memdev=mem0,cpus=0-3,nodeid=0 \
+    -numa node,memdev=mem1,nodeid=1, \
+    -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
+    -device cxl-rp,port=0,id=rp0,bus=cxl.0,chassis=0,slot=0 \
+    -device cxl-rp,port=1,id=rp1,bus=cxl.0,chassis=0,slot=1 \
+    -device cxl-type3,bus=rp0,volatile-memdev=mem1,id=cxl-mem0 
+
+
+
 A setup suitable for 4 way interleave. Only one fixed window provided, to enable 2 way
 interleave across 2 CXL host bridges.  Each host bridge has 2 CXL Root Ports, with
 the CXL Type3 device directly attached (no switches).::
diff --git a/hw/acpi/cxl.c b/hw/acpi/cxl.c
index 2bf8c07993..a8c6166b7f 100644
--- a/hw/acpi/cxl.c
+++ b/hw/acpi/cxl.c
@@ -254,3 +254,70 @@ void build_cxl_osc_method(Aml *dev)
     aml_append(dev, aml_name_decl("CTRC", aml_int(0)));
     aml_append(dev, __build_cxl_osc_method());
 }
+
+static int cxl_device_list(Object *obj, void *opaque)
+{
+    GSList **list = opaque;
+
+    if (object_dynamic_cast(obj, TYPE_CXL_TYPE3)) {
+        *list = g_slist_append(*list, DEVICE(obj));
+    }
+
+    object_child_foreach(obj, cxl_device_list, opaque);
+    return 0;
+}
+
+static GSList *cxl_get_device_list(void)
+{
+    GSList *list = NULL;
+
+    object_child_foreach(qdev_get_machine(), cxl_device_list, &list);
+    return list;
+}
+
+void cxl_build_srat(GArray *table_data, NodeInfo* numa_info, int nb_numa_nodes)
+{
+    GSList *device_list, *list = cxl_get_device_list();
+    int node = 0;
+
+    for (device_list = list; device_list; device_list = device_list->next) {
+        DeviceState *dev = device_list->data;
+        CXLType3Dev *ct3d = CXL_TYPE3(dev);
+        MemoryRegion *mr = NULL;
+
+        if (ct3d->hostvmem) {
+            /* Find the numa node associated with this memdev */
+            for (node = 0; node < nb_numa_nodes; node++) {
+                if (numa_info[node].node_memdev == ct3d->hostvmem) {
+                    break;
+                }
+            }
+            if (node != nb_numa_nodes) {
+                mr = host_memory_backend_get_memory(ct3d->hostvmem);
+                if (mr) {
+                    build_srat_memory(table_data, mr->addr, mr->size, node,
+                          (MEM_AFFINITY_ENABLED | MEM_AFFINITY_HOTPLUGGABLE));
+                }
+            }
+        }
+
+        if (ct3d->hostpmem) {
+            /* Find the numa node associated with this memdev */
+            for (node = 0; node < nb_numa_nodes; node++) {
+                if (numa_info[node].node_memdev == ct3d->hostpmem) {
+                    break;
+                }
+            }
+            if (node != nb_numa_nodes) {
+                mr = host_memory_backend_get_memory(ct3d->hostpmem);
+                if (mr) {
+                    build_srat_memory(table_data, mr->addr, mr->size, node,
+                          (MEM_AFFINITY_ENABLED | MEM_AFFINITY_HOTPLUGGABLE |
+                           MEM_AFFINITY_NON_VOLATILE));
+                }
+            }
+        }
+    }
+
+    g_slist_free(list);
+}
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 4f54b61904..af62c888e5 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2080,6 +2080,10 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
      * Memory devices may override proximity set by this entry,
      * providing _PXM method if necessary.
      */
+    if (pcms->cxl_devices_state.is_enabled) {
+        cxl_build_srat(table_data, numa_info, nb_numa_nodes);
+    }
+
     if (hotpluggable_address_space_size) {
         build_srat_memory(table_data, machine->device_memory->base,
                           hotpluggable_address_space_size, nb_numa_nodes - 1,
diff --git a/include/hw/acpi/cxl.h b/include/hw/acpi/cxl.h
index acf4418886..b4974297db 100644
--- a/include/hw/acpi/cxl.h
+++ b/include/hw/acpi/cxl.h
@@ -25,5 +25,6 @@ void cxl_build_cedt(GArray *table_offsets, GArray *table_data,
                     BIOSLinker *linker, const char *oem_id,
                     const char *oem_table_id, CXLState *cxl_state);
 void build_cxl_osc_method(Aml *dev);
+void cxl_build_srat(GArray *table_data, NodeInfo* numa_info, int nb_numa_nodes);
 
 #endif
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/4 v3] Multi-Region and Volatile Memory support for CXL Type-3 Devices
  2022-10-26  0:47 ` [PATCH 0/4 v3] Multi-Region and Volatile Memory support for CXL Type-3 Devices Gregory Price
                     ` (3 preceding siblings ...)
  2022-10-26  0:47   ` [PATCH 4/4] hw/acpi/cxl.c: Fill in SRAT for vmem/pmem if NUMA node is assigned Gregory Price
@ 2022-10-26 20:13   ` Adam Manzanares
  2022-10-26 20:47     ` Gregory Price
  2022-10-26 20:15   ` Michael S. Tsirkin
  2022-10-26 20:20   ` Michael S. Tsirkin
  6 siblings, 1 reply; 23+ messages in thread
From: Adam Manzanares @ 2022-10-26 20:13 UTC (permalink / raw)
  To: Gregory Price
  Cc: qemu-devel, jonathan.cameron, linux-cxl, mst, marcel.apfelbaum,
	imammedo, ani, alison.schofield, dave, bwidawsk, gregory.price,
	hchkuo, cbrowy, ira.weiny

On Tue, Oct 25, 2022 at 08:47:33PM -0400, Gregory Price wrote:
> Submitted as an extention to the multi-feature branch maintained
> by Jonathan Cameron at:
> https://urldefense.com/v3/__https://gitlab.com/jic23/qemu/-/tree/cxl-2022-10-24__;!!EwVzqGoTKBqv-0DWAJBm!RyiGL5B1XmQnVFwgxikKJeosPMKtoO1cTr61gIq8fwqfju8l4cbGZGwAEkKXIJB-Dbkfi_LNN2rGCbzMISz65cTxpAxI9pQ$   
> 
> 
> Summary of Changes:
> 1) E820 CFMW Bug fix.  
> 2) Add CXL_CAPACITY_MULTIPLIER definition to replace magic numbers
> 3) Multi-Region and Volatile Memory support for CXL Type-3 Devices
> 4) CXL Type-3 SRAT Generation when NUMA node is attached to memdev
> 
> 
> Regarding the E820 fix
>   * This bugfix is required for memory regions to work on x86
>   * input from Dan Williams and others suggest that E820 entry for
>     the CFMW should not exist, as it is expected to be dynamically
>     assigned at runtime.  If this entry exists, it instead blocks
>     region creation by nature of the memory region being marked as
>     reserved.

For CXL 2.0 it is my understanding that volatile capacity present at boot will
be advertised by the firmware. In the absence of EFI I would assume this would
be provided in the e820 map. 

Is the region driver meant to cover volatile capacity present at boot? I was
under the impression that it would be used for hot added volatile memory. It
would be good to cover all of these assumptions for the e820 fix. 

Lastly it is my understanding that the region driver does not have support for
volatile memory. It would be great to add that functionality if we make this
change in QEMU.

> 
> Regarding Multi-Region and Volatile Memory
>   * Developed with input from Jonathan Cameron and Davidlohr Bueso.
> 
> Regarding SRAT Generation for Type-3 Devices
>   * Co-Developed by Davidlohr Bueso.  Built from his base patch and
>     extended to work with both volatile and persistent regions.
>   * This can be used to demonstrate static type-3 device mapping and
>     testing numa-access to type-3 device memory regions.
> 
> 
> This series brings 3 features to CXL Type-3 Devices:
>     1) Volatile Memory Region support
>     2) Multi-Region support (1 Volatile, 1 Persistent)
>     3) (optional) SRAT Entry generation for type-3 device regions
> 
> In this series we implement multi-region and volatile region support
> through 7 major changes to CXL devices
>     1) The HostMemoryBackend [hostmem] has been replaced by two
>        [hostvmem] and [hostpmem] to store volatile and persistent
>        memory respectively
>     2) The single AddressSpace has been replaced by two AddressSpaces
>        [hostvmem_as] and [hostpmem_as] to map respective memdevs.
>     3) Each memory region size and total region are stored separately
>     4) The CDAT and DVSEC memory map entries have been updated:
>        a) if vmem is present, vmem is mapped at DPA(0)
>        b) if pmem is present
>           i)  and vmem is present, pmem is mapped at DPA(vmem->size)
>           ii) else, pmem is mapped at DPA(0)
>        c) partitioning of pmem is not supported in this patch set but
>           has been discussed and this design should suffice.
>     5) Read/Write functions have been updated to access AddressSpaces
>        according to the mapping described in #4
>     6) cxl-mailbox has been updated to report the respective size of
>        volatile and persistent memory regions
>     7) SRAT entries may optionally be generated by manually assigning
>        memdevs to a cpuless numa node
> 
> To support the Device Physical Address (DPA) Mapping decisions, see
> CXL Spec (3.0) Section 8.2.9.8.2.0 - Get Partition Info:
>   Active Volatile Memory
>     The device shall provide this volatile capacity starting at DPA 0
>   Active Persistent Memory
>     The device shall provide this persistent capacity starting at the
>     DPA immediately following the volatile capacity
> 
> Partitioning of Persistent Memory regions may be supported on
> following patch sets.
> 
> 
> Gregory Price (4):
>   hw/i386/pc.c: CXL Fixed Memory Window should not reserve e820 in bios
>   hw/cxl: Add CXL_CAPACITY_MULTIPLIER definition
>   hw/cxl: Multi-Region CXL Type-3 Devices (Volatile and Persistent)
>   hw/acpi/cxl.c: Fill in SRAT for vmem/pmem if NUMA node is assigned
> 
>  docs/system/devices/cxl.rst |  74 ++++++++--
>  hw/acpi/cxl.c               |  67 +++++++++
>  hw/cxl/cxl-mailbox-utils.c  |  23 +--
>  hw/i386/acpi-build.c        |   4 +
>  hw/i386/pc.c                |   2 -
>  hw/mem/cxl_type3.c          | 274 +++++++++++++++++++++++++++---------
>  include/hw/acpi/cxl.h       |   1 +
>  include/hw/cxl/cxl_device.h |  11 +-
>  tests/qtest/cxl-test.c      | 111 +++++++++++----
>  9 files changed, 443 insertions(+), 124 deletions(-)
> 
> -- 
> 2.37.3
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/4 v3] Multi-Region and Volatile Memory support for CXL Type-3 Devices
  2022-10-26  0:47 ` [PATCH 0/4 v3] Multi-Region and Volatile Memory support for CXL Type-3 Devices Gregory Price
                     ` (4 preceding siblings ...)
  2022-10-26 20:13   ` [PATCH 0/4 v3] Multi-Region and Volatile Memory support for CXL Type-3 Devices Adam Manzanares
@ 2022-10-26 20:15   ` Michael S. Tsirkin
  2022-10-26 20:20   ` Michael S. Tsirkin
  6 siblings, 0 replies; 23+ messages in thread
From: Michael S. Tsirkin @ 2022-10-26 20:15 UTC (permalink / raw)
  To: Gregory Price
  Cc: qemu-devel, jonathan.cameron, linux-cxl, marcel.apfelbaum,
	imammedo, ani, alison.schofield, dave, a.manzanares, bwidawsk,
	gregory.price, hchkuo, cbrowy, ira.weiny

On Tue, Oct 25, 2022 at 08:47:33PM -0400, Gregory Price wrote:
> Submitted as an extention to the multi-feature branch maintained
> by Jonathan Cameron at:
> https://gitlab.com/jic23/qemu/-/tree/cxl-2022-10-24 

BTW pls set subject prefix for all patches, and put it before the patch #.
-v parameter to format-patch will do this for you.

> 
> Summary of Changes:
> 1) E820 CFMW Bug fix.  
> 2) Add CXL_CAPACITY_MULTIPLIER definition to replace magic numbers
> 3) Multi-Region and Volatile Memory support for CXL Type-3 Devices
> 4) CXL Type-3 SRAT Generation when NUMA node is attached to memdev
> 
> 
> Regarding the E820 fix
>   * This bugfix is required for memory regions to work on x86
>   * input from Dan Williams and others suggest that E820 entry for
>     the CFMW should not exist, as it is expected to be dynamically
>     assigned at runtime.  If this entry exists, it instead blocks
>     region creation by nature of the memory region being marked as
>     reserved.
> 
> Regarding Multi-Region and Volatile Memory
>   * Developed with input from Jonathan Cameron and Davidlohr Bueso.
> 
> Regarding SRAT Generation for Type-3 Devices
>   * Co-Developed by Davidlohr Bueso.  Built from his base patch and
>     extended to work with both volatile and persistent regions.
>   * This can be used to demonstrate static type-3 device mapping and
>     testing numa-access to type-3 device memory regions.
> 
> 
> This series brings 3 features to CXL Type-3 Devices:
>     1) Volatile Memory Region support
>     2) Multi-Region support (1 Volatile, 1 Persistent)
>     3) (optional) SRAT Entry generation for type-3 device regions
> 
> In this series we implement multi-region and volatile region support
> through 7 major changes to CXL devices
>     1) The HostMemoryBackend [hostmem] has been replaced by two
>        [hostvmem] and [hostpmem] to store volatile and persistent
>        memory respectively
>     2) The single AddressSpace has been replaced by two AddressSpaces
>        [hostvmem_as] and [hostpmem_as] to map respective memdevs.
>     3) Each memory region size and total region are stored separately
>     4) The CDAT and DVSEC memory map entries have been updated:
>        a) if vmem is present, vmem is mapped at DPA(0)
>        b) if pmem is present
>           i)  and vmem is present, pmem is mapped at DPA(vmem->size)
>           ii) else, pmem is mapped at DPA(0)
>        c) partitioning of pmem is not supported in this patch set but
>           has been discussed and this design should suffice.
>     5) Read/Write functions have been updated to access AddressSpaces
>        according to the mapping described in #4
>     6) cxl-mailbox has been updated to report the respective size of
>        volatile and persistent memory regions
>     7) SRAT entries may optionally be generated by manually assigning
>        memdevs to a cpuless numa node
> 
> To support the Device Physical Address (DPA) Mapping decisions, see
> CXL Spec (3.0) Section 8.2.9.8.2.0 - Get Partition Info:
>   Active Volatile Memory
>     The device shall provide this volatile capacity starting at DPA 0
>   Active Persistent Memory
>     The device shall provide this persistent capacity starting at the
>     DPA immediately following the volatile capacity
> 
> Partitioning of Persistent Memory regions may be supported on
> following patch sets.
> 
> 
> Gregory Price (4):
>   hw/i386/pc.c: CXL Fixed Memory Window should not reserve e820 in bios
>   hw/cxl: Add CXL_CAPACITY_MULTIPLIER definition
>   hw/cxl: Multi-Region CXL Type-3 Devices (Volatile and Persistent)
>   hw/acpi/cxl.c: Fill in SRAT for vmem/pmem if NUMA node is assigned
> 
>  docs/system/devices/cxl.rst |  74 ++++++++--
>  hw/acpi/cxl.c               |  67 +++++++++
>  hw/cxl/cxl-mailbox-utils.c  |  23 +--
>  hw/i386/acpi-build.c        |   4 +
>  hw/i386/pc.c                |   2 -
>  hw/mem/cxl_type3.c          | 274 +++++++++++++++++++++++++++---------
>  include/hw/acpi/cxl.h       |   1 +
>  include/hw/cxl/cxl_device.h |  11 +-
>  tests/qtest/cxl-test.c      | 111 +++++++++++----
>  9 files changed, 443 insertions(+), 124 deletions(-)
> 
> -- 
> 2.37.3


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/4 v3] Multi-Region and Volatile Memory support for CXL Type-3 Devices
  2022-10-26  0:47 ` [PATCH 0/4 v3] Multi-Region and Volatile Memory support for CXL Type-3 Devices Gregory Price
                     ` (5 preceding siblings ...)
  2022-10-26 20:15   ` Michael S. Tsirkin
@ 2022-10-26 20:20   ` Michael S. Tsirkin
  2022-10-26 20:48     ` Gregory Price
  6 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2022-10-26 20:20 UTC (permalink / raw)
  To: Gregory Price
  Cc: qemu-devel, jonathan.cameron, linux-cxl, marcel.apfelbaum,
	imammedo, ani, alison.schofield, dave, a.manzanares, bwidawsk,
	gregory.price, hchkuo, cbrowy, ira.weiny

On Tue, Oct 25, 2022 at 08:47:33PM -0400, Gregory Price wrote:
> Submitted as an extention to the multi-feature branch maintained
> by Jonathan Cameron at:
> https://gitlab.com/jic23/qemu/-/tree/cxl-2022-10-24 
> 

I am not supposed to merge this patchset yet, right?
That branch has a bunch of patches not yet posted for review.
Pls add "RFC" in the subject when that is the case.

Thanks!


> Summary of Changes:
> 1) E820 CFMW Bug fix.  
> 2) Add CXL_CAPACITY_MULTIPLIER definition to replace magic numbers
> 3) Multi-Region and Volatile Memory support for CXL Type-3 Devices
> 4) CXL Type-3 SRAT Generation when NUMA node is attached to memdev
> 
> 
> Regarding the E820 fix
>   * This bugfix is required for memory regions to work on x86
>   * input from Dan Williams and others suggest that E820 entry for
>     the CFMW should not exist, as it is expected to be dynamically
>     assigned at runtime.  If this entry exists, it instead blocks
>     region creation by nature of the memory region being marked as
>     reserved.
> 
> Regarding Multi-Region and Volatile Memory
>   * Developed with input from Jonathan Cameron and Davidlohr Bueso.
> 
> Regarding SRAT Generation for Type-3 Devices
>   * Co-Developed by Davidlohr Bueso.  Built from his base patch and
>     extended to work with both volatile and persistent regions.
>   * This can be used to demonstrate static type-3 device mapping and
>     testing numa-access to type-3 device memory regions.
> 
> 
> This series brings 3 features to CXL Type-3 Devices:
>     1) Volatile Memory Region support
>     2) Multi-Region support (1 Volatile, 1 Persistent)
>     3) (optional) SRAT Entry generation for type-3 device regions
> 
> In this series we implement multi-region and volatile region support
> through 7 major changes to CXL devices
>     1) The HostMemoryBackend [hostmem] has been replaced by two
>        [hostvmem] and [hostpmem] to store volatile and persistent
>        memory respectively
>     2) The single AddressSpace has been replaced by two AddressSpaces
>        [hostvmem_as] and [hostpmem_as] to map respective memdevs.
>     3) Each memory region size and total region are stored separately
>     4) The CDAT and DVSEC memory map entries have been updated:
>        a) if vmem is present, vmem is mapped at DPA(0)
>        b) if pmem is present
>           i)  and vmem is present, pmem is mapped at DPA(vmem->size)
>           ii) else, pmem is mapped at DPA(0)
>        c) partitioning of pmem is not supported in this patch set but
>           has been discussed and this design should suffice.
>     5) Read/Write functions have been updated to access AddressSpaces
>        according to the mapping described in #4
>     6) cxl-mailbox has been updated to report the respective size of
>        volatile and persistent memory regions
>     7) SRAT entries may optionally be generated by manually assigning
>        memdevs to a cpuless numa node
> 
> To support the Device Physical Address (DPA) Mapping decisions, see
> CXL Spec (3.0) Section 8.2.9.8.2.0 - Get Partition Info:
>   Active Volatile Memory
>     The device shall provide this volatile capacity starting at DPA 0
>   Active Persistent Memory
>     The device shall provide this persistent capacity starting at the
>     DPA immediately following the volatile capacity
> 
> Partitioning of Persistent Memory regions may be supported on
> following patch sets.
> 
> 
> Gregory Price (4):
>   hw/i386/pc.c: CXL Fixed Memory Window should not reserve e820 in bios
>   hw/cxl: Add CXL_CAPACITY_MULTIPLIER definition
>   hw/cxl: Multi-Region CXL Type-3 Devices (Volatile and Persistent)
>   hw/acpi/cxl.c: Fill in SRAT for vmem/pmem if NUMA node is assigned
> 
>  docs/system/devices/cxl.rst |  74 ++++++++--
>  hw/acpi/cxl.c               |  67 +++++++++
>  hw/cxl/cxl-mailbox-utils.c  |  23 +--
>  hw/i386/acpi-build.c        |   4 +
>  hw/i386/pc.c                |   2 -
>  hw/mem/cxl_type3.c          | 274 +++++++++++++++++++++++++++---------
>  include/hw/acpi/cxl.h       |   1 +
>  include/hw/cxl/cxl_device.h |  11 +-
>  tests/qtest/cxl-test.c      | 111 +++++++++++----
>  9 files changed, 443 insertions(+), 124 deletions(-)
> 
> -- 
> 2.37.3


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/4] hw/i386/pc.c: CXL Fixed Memory Window should not reserve e820 in bios
  2022-10-26  0:47   ` [PATCH 1/4] hw/i386/pc.c: CXL Fixed Memory Window should not reserve e820 in bios Gregory Price
@ 2022-10-26 20:33     ` Michael S. Tsirkin
  0 siblings, 0 replies; 23+ messages in thread
From: Michael S. Tsirkin @ 2022-10-26 20:33 UTC (permalink / raw)
  To: Gregory Price
  Cc: qemu-devel, jonathan.cameron, linux-cxl, marcel.apfelbaum,
	imammedo, ani, alison.schofield, dave, a.manzanares, bwidawsk,
	gregory.price, hchkuo, cbrowy, ira.weiny

On Tue, Oct 25, 2022 at 08:47:34PM -0400, Gregory Price wrote:
> Early-boot e820 records will be inserted by the bios/efi/early boot
> software and be reported to the kernel via insert_resource.  Later, when
> CXL drivers iterate through the regions again, they will insert another
> resource and make the RESERVED memory area a child.
> 
> This RESERVED memory area causes the memory region to become unusable,
> and as a result attempting to create memory regions with
> 
>     `cxl create-region ...`
> 
> Will fail due to the RESERVED area intersecting with the CXL window.
> 
> During boot the following traceback is observed:
> 
> 0xffffffff81101650 in insert_resource_expand_to_fit ()
> 0xffffffff83d964c5 in e820__reserve_resources_late ()
> 0xffffffff83e03210 in pcibios_resource_survey ()
> 0xffffffff83e04f4a in pcibios_init ()
> 
> Which produces a call to reserve the CFMWS area:
> 
> (gdb) p *new
> $54 = {start = 0x290000000, end = 0x2cfffffff, name = "Reserved",
>        flags = 0x200, desc = 0x7, parent = 0x0, sibling = 0x0,
>        child = 0x0}
> 
> Later the Kernel parses ACPI tables and reserves the exact same area as
> the CXL Fixed Memory Window:
> 
> 0xffffffff811016a4 in insert_resource_conflict ()
>                       insert_resource ()
> 0xffffffff81a81389 in cxl_parse_cfmws ()
> 0xffffffff818c4a81 in call_handler ()
>                       acpi_parse_entries_array ()
> 
> (gdb) p/x *new
> $59 = {start = 0x290000000, end = 0x2cfffffff, name = "CXL Window 0",
>        flags = 0x200, desc = 0x0, parent = 0x0, sibling = 0x0,
>        child = 0x0}
> 
> This produces the following output in /proc/iomem:
> 
> 590000000-68fffffff : CXL Window 0
>   590000000-68fffffff : Reserved
> 
> This reserved area causes `get_free_mem_region()` to fail due to a check
> against `__region_intersects()`.  Due to this reserved area, the
> intersect check will only ever return REGION_INTERSECTS, which causes
> `cxl create-region` to always fail.
> 
> Signed-off-by: Gregory Price <gregory.price@memverge.com>


Do we want this as a bugfix, separate from the rest of the
patchset?

> ---
>  hw/i386/pc.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 768982ae9a..203c90fedb 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1062,7 +1062,6 @@ void pc_memory_init(PCMachineState *pcms,
>          hwaddr cxl_size = MiB;
>  
>          cxl_base = pc_get_cxl_range_start(pcms);
> -        e820_add_entry(cxl_base, cxl_size, E820_RESERVED);
>          memory_region_init(mr, OBJECT(machine), "cxl_host_reg", cxl_size);
>          memory_region_add_subregion(system_memory, cxl_base, mr);
>          cxl_resv_end = cxl_base + cxl_size;
> @@ -1078,7 +1077,6 @@ void pc_memory_init(PCMachineState *pcms,
>                  memory_region_init_io(&fw->mr, OBJECT(machine), &cfmws_ops, fw,
>                                        "cxl-fixed-memory-region", fw->size);
>                  memory_region_add_subregion(system_memory, fw->base, &fw->mr);
> -                e820_add_entry(fw->base, fw->size, E820_RESERVED);
>                  cxl_fmw_base += fw->size;
>                  cxl_resv_end = cxl_fmw_base;
>              }
> -- 
> 2.37.3


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/4 v3] Multi-Region and Volatile Memory support for CXL Type-3 Devices
  2022-10-26 20:13   ` [PATCH 0/4 v3] Multi-Region and Volatile Memory support for CXL Type-3 Devices Adam Manzanares
@ 2022-10-26 20:47     ` Gregory Price
  2022-10-27 10:58         ` Jonathan Cameron via
  0 siblings, 1 reply; 23+ messages in thread
From: Gregory Price @ 2022-10-26 20:47 UTC (permalink / raw)
  To: Adam Manzanares
  Cc: Gregory Price, qemu-devel, jonathan.cameron, linux-cxl, mst,
	marcel.apfelbaum, imammedo, ani, alison.schofield, dave,
	bwidawsk, hchkuo, cbrowy, ira.weiny

On Wed, Oct 26, 2022 at 08:13:24PM +0000, Adam Manzanares wrote:
> On Tue, Oct 25, 2022 at 08:47:33PM -0400, Gregory Price wrote:
> > Submitted as an extention to the multi-feature branch maintained
> > by Jonathan Cameron at:
> > https://urldefense.com/v3/__https://gitlab.com/jic23/qemu/-/tree/cxl-2022-10-24__;!!EwVzqGoTKBqv-0DWAJBm!RyiGL5B1XmQnVFwgxikKJeosPMKtoO1cTr61gIq8fwqfju8l4cbGZGwAEkKXIJB-Dbkfi_LNN2rGCbzMISz65cTxpAxI9pQ$   
> > 
> > 
> > Summary of Changes:
> > 1) E820 CFMW Bug fix.  
> > 2) Add CXL_CAPACITY_MULTIPLIER definition to replace magic numbers
> > 3) Multi-Region and Volatile Memory support for CXL Type-3 Devices
> > 4) CXL Type-3 SRAT Generation when NUMA node is attached to memdev
> > 
> > 
> > Regarding the E820 fix
> >   * This bugfix is required for memory regions to work on x86
> >   * input from Dan Williams and others suggest that E820 entry for
> >     the CFMW should not exist, as it is expected to be dynamically
> >     assigned at runtime.  If this entry exists, it instead blocks
> >     region creation by nature of the memory region being marked as
> >     reserved.
> 
> For CXL 2.0 it is my understanding that volatile capacity present at boot will
> be advertised by the firmware. In the absence of EFI I would assume this would
> be provided in the e820 map. 

The issue in this case is very explicitly that a double-mapping occurs
for the same region.  An E820 mapping for RESERVED is set *and* the
region driver allocates a CXL CFMW mapping.  As a result the region
drive straight up fails to allocate regions.

So in either case - at least from my view - the entry added as RESERVED
is just wrong.

This is separate from type-3 device SRAT entries and default mappings
for volatile regions.  For this situation, if you explicitly assign the
memdev backing a type-3 device to a numa node, then an SRAT area is
generated and an explicit e820 entry is generated and marked usable -
though I think there are likely issues with this kind of
double-referencing.

> 
> Is the region driver meant to cover volatile capacity present at boot? I was
> under the impression that it would be used for hot added volatile memory. It
> would be good to cover all of these assumptions for the e820 fix.

This region appears to cover hotplug memory behind the CFMW.  The
problem is that this e820 RESERVED mapping blocks the CFMW region from
being used at all.

Without this, you can't use a type-3 persistent region, even with
support, let alone a volatile region.  In attempting to use a persistent
region as volatile via ndctl and friends, I'm seeing further issues (it
cannot be assigned to a numa node successfully), but that's a separate
issue.

> 
> Lastly it is my understanding that the region driver does not have support for
> volatile memory. It would be great to add that functionality if we make this
> change in QEMU.
> 

Right now this is true, but it seems a bit of a chicken/egg scenario.
Nothing to test against vs no support.  Nudging this along such that we
can at least report an (unusable) hot-add volatile memory region would
provide someone working with the region driver something to poke and
prod at.

> > Regarding SRAT Generation for Type-3 Devices
> >   * Co-Developed by Davidlohr Bueso.  Built from his base patch and
> >     extended to work with both volatile and persistent regions.
> >   * This can be used to demonstrate static type-3 device mapping and
> >     testing numa-access to type-3 device memory regions.

Regarding "volatile memory present at boot" - there is still two ways
for that memory to be onlined: Statically (entered as an explicit e820
region after reading the SRAT), or Dynamically (hot-add by the region
driver).

This patch would at least allow an SRAT to be generated if you
explicitly add a NUMA node mapping to it.  Although I concede that I'm
not entirely sure what is "correct" here.

What this ends up looking like is mapping a memdev to both a numa node
and to a type-3 device.  Though that seems wrong.

After further testing it seems like creating a CPU-less, Memory-less
NUMA node with the intent of mapping volatile memory regions to it is
not supported (yet?).

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/4 v3] Multi-Region and Volatile Memory support for CXL Type-3 Devices
  2022-10-26 20:20   ` Michael S. Tsirkin
@ 2022-10-26 20:48     ` Gregory Price
  0 siblings, 0 replies; 23+ messages in thread
From: Gregory Price @ 2022-10-26 20:48 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Gregory Price, qemu-devel, jonathan.cameron, linux-cxl,
	marcel.apfelbaum, imammedo, ani, alison.schofield, dave,
	a.manzanares, bwidawsk, hchkuo, cbrowy, ira.weiny

On Wed, Oct 26, 2022 at 04:20:40PM -0400, Michael S. Tsirkin wrote:
> On Tue, Oct 25, 2022 at 08:47:33PM -0400, Gregory Price wrote:
> > Submitted as an extention to the multi-feature branch maintained
> > by Jonathan Cameron at:
> > https://gitlab.com/jic23/qemu/-/tree/cxl-2022-10-24 
> > 
> 
> I am not supposed to merge this patchset yet, right?
> That branch has a bunch of patches not yet posted for review.
> Pls add "RFC" in the subject when that is the case.
> 
> Thanks!
> 
> 

Correct, sorry, Jonathan asked me to send out a new round to incorporate
into his branch, I should have marked it RFC.

I will push up separate patch requests for the E820 and PCI_MEMORY_CXL
bug fixes.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/4 v3] Multi-Region and Volatile Memory support for CXL Type-3 Devices
  2022-10-26 20:47     ` Gregory Price
@ 2022-10-27 10:58         ` Jonathan Cameron via
  0 siblings, 0 replies; 23+ messages in thread
From: Jonathan Cameron @ 2022-10-27 10:58 UTC (permalink / raw)
  To: Gregory Price
  Cc: Adam Manzanares, Gregory Price, qemu-devel, linux-cxl, mst,
	marcel.apfelbaum, imammedo, ani, alison.schofield, dave,
	bwidawsk, hchkuo, cbrowy, ira.weiny, Dan Williams

On Wed, 26 Oct 2022 16:47:18 -0400
Gregory Price <gregory.price@memverge.com> wrote:

> On Wed, Oct 26, 2022 at 08:13:24PM +0000, Adam Manzanares wrote:
> > On Tue, Oct 25, 2022 at 08:47:33PM -0400, Gregory Price wrote:  
> > > Submitted as an extention to the multi-feature branch maintained
> > > by Jonathan Cameron at:
> > > https://urldefense.com/v3/__https://gitlab.com/jic23/qemu/-/tree/cxl-2022-10-24__;!!EwVzqGoTKBqv-0DWAJBm!RyiGL5B1XmQnVFwgxikKJeosPMKtoO1cTr61gIq8fwqfju8l4cbGZGwAEkKXIJB-Dbkfi_LNN2rGCbzMISz65cTxpAxI9pQ$   
> > > 
> > > 
> > > Summary of Changes:
> > > 1) E820 CFMW Bug fix.  
> > > 2) Add CXL_CAPACITY_MULTIPLIER definition to replace magic numbers
> > > 3) Multi-Region and Volatile Memory support for CXL Type-3 Devices
> > > 4) CXL Type-3 SRAT Generation when NUMA node is attached to memdev

+CC Dan for a question on status of Generic Ports ACPI code first ECN.
Given that was done on the mailing list I can find any public tracking
of whether it was accepted or not - hence whether we can get on with
implementation.  There hasn't been a release ACPI spec since before
that was proposed so we need that confirmation of the code first proposal
being accepted to get things moving.

> > > 
> > > 
> > > Regarding the E820 fix
> > >   * This bugfix is required for memory regions to work on x86
> > >   * input from Dan Williams and others suggest that E820 entry for
> > >     the CFMW should not exist, as it is expected to be dynamically
> > >     assigned at runtime.  If this entry exists, it instead blocks
> > >     region creation by nature of the memory region being marked as
> > >     reserved.  
> > 
> > For CXL 2.0 it is my understanding that volatile capacity present at boot will
> > be advertised by the firmware. In the absence of EFI I would assume this would
> > be provided in the e820 map.   

Whilst this is one option, it is certainly not the case that all CXL 2.0
platforms will decide to do any setup of CXL memory (volatile or not) in the
firmware.  They can leave it entirely to the OS, so a cold plug flow.
Early platforms will do the setup in BIOS to support unware OSes, once
that's not a problem any more the only reason you'd want to do this is if
you don't have other RAM to boot the system, or for some reason you want
your host kernel etc in the CXL attached memory.

I'd expect to see BIOS having OS managed configuration as an option in the
intermediate period where some OSes are aware, others not.
OS knows more about usecase / policy so can make better choices on interleaving
etc of volatile CXL type 3 memory (let alone the fun corner of devices
where you can dynamically change the mix of volatile and non volatile
memory).


> 
> The issue in this case is very explicitly that a double-mapping occurs
> for the same region.  An E820 mapping for RESERVED is set *and* the
> region driver allocates a CXL CFMW mapping.  As a result the region
> drive straight up fails to allocate regions.
> 
> So in either case - at least from my view - the entry added as RESERVED
> is just wrong.
> 
> This is separate from type-3 device SRAT entries and default mappings
> for volatile regions.  For this situation, if you explicitly assign the
> memdev backing a type-3 device to a numa node, then an SRAT area is
> generated and an explicit e820 entry is generated and marked usable -
> though I think there are likely issues with this kind of
> double-referencing.

SRAT setup for CXL type 3 devices is to my mind is a job for a full BIOS,
not QEMU level of faking a few things. That BIOS should also
be doing the full configuration (HDM Decoders and all the rest).  ARM has
a prototype for one of the fixed virtual platforms that does this (talk
at Plumbers Uconf), we should look to do the same if we want to test
those flows using QEMU via appropriate changes in EDK2 to walk topology
and configure everything.  Until the devices are configured there is no
way to configure the SLIT, HMAT entries that align with the SRAT ones
(in theory those can be updated at runtime via _SLI, _HMA but in 
practice, I'm fairly sure Linux doesn't support doing that.)


> 
> > 
> > Is the region driver meant to cover volatile capacity present at boot? I was
> > under the impression that it would be used for hot added volatile memory. It
> > would be good to cover all of these assumptions for the e820 fix.  
> 
> This region appears to cover hotplug memory behind the CFMW.  The
> problem is that this e820 RESERVED mapping blocks the CFMW region from
> being used at all.
> 
> Without this, you can't use a type-3 persistent region, even with
> support, let alone a volatile region.  In attempting to use a persistent
> region as volatile via ndctl and friends, I'm seeing further issues (it
> cannot be assigned to a numa node successfully), but that's a separate
> issue.
For the Numa node bit...

So far, the CDAT table isn't used in Linux (read out for debug purposes
is supported only).  That all needs to be added yet to get the NUMA node
allocations to work correctly.  It shouldn't have anything to do with SRAT
unless the BIOS has done the full full configuration (same way CXL will work
with a legacy OS).  Come to think of it, that should definitely be on the
priority list for kernel support (don't think it was on the list on Tuesday?)

> 
> > 
> > Lastly it is my understanding that the region driver does not have support for
> > volatile memory. It would be great to add that functionality if we make this
> > change in QEMU.
> >   
> 
> Right now this is true, but it seems a bit of a chicken/egg scenario.
> Nothing to test against vs no support.  Nudging this along such that we
> can at least report an (unusable) hot-add volatile memory region would
> provide someone working with the region driver something to poke and
> prod at.

Agreed. This is same place as Ben's original CXL QEMU work on non volatile.
Need something to develop against so we'll at least have QEMU patches
available whilst the kernel side is in development (hopefully this cycle)
> 
> > > Regarding SRAT Generation for Type-3 Devices
> > >   * Co-Developed by Davidlohr Bueso.  Built from his base patch and
> > >     extended to work with both volatile and persistent regions.
> > >   * This can be used to demonstrate static type-3 device mapping and
> > >     testing numa-access to type-3 device memory regions.  
> 
> Regarding "volatile memory present at boot" - there is still two ways
> for that memory to be onlined: Statically (entered as an explicit e820
> region after reading the SRAT), or Dynamically (hot-add by the region
> driver).
> 
> This patch would at least allow an SRAT to be generated if you
> explicitly add a NUMA node mapping to it.  Although I concede that I'm
> not entirely sure what is "correct" here.

For hotplug, needs to be the region driver, not here (or BIOS if you
are doing a BIOS driven flow - in which case the whole topology is
discovered and mostly configured by the BIOS before the OS reaches it
- including filling in SRAT, SLIT, HMAT etCc).
> 
> What this ends up looking like is mapping a memdev to both a numa node
> and to a type-3 device.  Though that seems wrong.
> 
> After further testing it seems like creating a CPU-less, Memory-less
> NUMA node with the intent of mapping volatile memory regions to it is
> not supported (yet?).

Yup, and I doubt it ever will be for reasons above. 

BIOS does it all, or OS does it all.  Mixing and matching is a mess
(there is an exception - Generic Port entries in SRAT - those we do
need for the OS driven flow and possibly also the BIOS flow
- patches welcome! I'd forgotten that on my list of QEMU stuff that
needs doing.)

https://lore.kernel.org/all/e1a52da9aec90766da5de51b1b839fd95d63a5af.camel@intel.com/

If anyone is implementing that, also good to do Generic Initiators
as they are very similar.

Jonathan
 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/4 v3] Multi-Region and Volatile Memory support for CXL Type-3 Devices
@ 2022-10-27 10:58         ` Jonathan Cameron via
  0 siblings, 0 replies; 23+ messages in thread
From: Jonathan Cameron via @ 2022-10-27 10:58 UTC (permalink / raw)
  To: Gregory Price
  Cc: Adam Manzanares, Gregory Price, qemu-devel, linux-cxl, mst,
	marcel.apfelbaum, imammedo, ani, alison.schofield, dave,
	bwidawsk, hchkuo, cbrowy, ira.weiny, Dan Williams

On Wed, 26 Oct 2022 16:47:18 -0400
Gregory Price <gregory.price@memverge.com> wrote:

> On Wed, Oct 26, 2022 at 08:13:24PM +0000, Adam Manzanares wrote:
> > On Tue, Oct 25, 2022 at 08:47:33PM -0400, Gregory Price wrote:  
> > > Submitted as an extention to the multi-feature branch maintained
> > > by Jonathan Cameron at:
> > > https://urldefense.com/v3/__https://gitlab.com/jic23/qemu/-/tree/cxl-2022-10-24__;!!EwVzqGoTKBqv-0DWAJBm!RyiGL5B1XmQnVFwgxikKJeosPMKtoO1cTr61gIq8fwqfju8l4cbGZGwAEkKXIJB-Dbkfi_LNN2rGCbzMISz65cTxpAxI9pQ$   
> > > 
> > > 
> > > Summary of Changes:
> > > 1) E820 CFMW Bug fix.  
> > > 2) Add CXL_CAPACITY_MULTIPLIER definition to replace magic numbers
> > > 3) Multi-Region and Volatile Memory support for CXL Type-3 Devices
> > > 4) CXL Type-3 SRAT Generation when NUMA node is attached to memdev

+CC Dan for a question on status of Generic Ports ACPI code first ECN.
Given that was done on the mailing list I can find any public tracking
of whether it was accepted or not - hence whether we can get on with
implementation.  There hasn't been a release ACPI spec since before
that was proposed so we need that confirmation of the code first proposal
being accepted to get things moving.

> > > 
> > > 
> > > Regarding the E820 fix
> > >   * This bugfix is required for memory regions to work on x86
> > >   * input from Dan Williams and others suggest that E820 entry for
> > >     the CFMW should not exist, as it is expected to be dynamically
> > >     assigned at runtime.  If this entry exists, it instead blocks
> > >     region creation by nature of the memory region being marked as
> > >     reserved.  
> > 
> > For CXL 2.0 it is my understanding that volatile capacity present at boot will
> > be advertised by the firmware. In the absence of EFI I would assume this would
> > be provided in the e820 map.   

Whilst this is one option, it is certainly not the case that all CXL 2.0
platforms will decide to do any setup of CXL memory (volatile or not) in the
firmware.  They can leave it entirely to the OS, so a cold plug flow.
Early platforms will do the setup in BIOS to support unware OSes, once
that's not a problem any more the only reason you'd want to do this is if
you don't have other RAM to boot the system, or for some reason you want
your host kernel etc in the CXL attached memory.

I'd expect to see BIOS having OS managed configuration as an option in the
intermediate period where some OSes are aware, others not.
OS knows more about usecase / policy so can make better choices on interleaving
etc of volatile CXL type 3 memory (let alone the fun corner of devices
where you can dynamically change the mix of volatile and non volatile
memory).


> 
> The issue in this case is very explicitly that a double-mapping occurs
> for the same region.  An E820 mapping for RESERVED is set *and* the
> region driver allocates a CXL CFMW mapping.  As a result the region
> drive straight up fails to allocate regions.
> 
> So in either case - at least from my view - the entry added as RESERVED
> is just wrong.
> 
> This is separate from type-3 device SRAT entries and default mappings
> for volatile regions.  For this situation, if you explicitly assign the
> memdev backing a type-3 device to a numa node, then an SRAT area is
> generated and an explicit e820 entry is generated and marked usable -
> though I think there are likely issues with this kind of
> double-referencing.

SRAT setup for CXL type 3 devices is to my mind is a job for a full BIOS,
not QEMU level of faking a few things. That BIOS should also
be doing the full configuration (HDM Decoders and all the rest).  ARM has
a prototype for one of the fixed virtual platforms that does this (talk
at Plumbers Uconf), we should look to do the same if we want to test
those flows using QEMU via appropriate changes in EDK2 to walk topology
and configure everything.  Until the devices are configured there is no
way to configure the SLIT, HMAT entries that align with the SRAT ones
(in theory those can be updated at runtime via _SLI, _HMA but in 
practice, I'm fairly sure Linux doesn't support doing that.)


> 
> > 
> > Is the region driver meant to cover volatile capacity present at boot? I was
> > under the impression that it would be used for hot added volatile memory. It
> > would be good to cover all of these assumptions for the e820 fix.  
> 
> This region appears to cover hotplug memory behind the CFMW.  The
> problem is that this e820 RESERVED mapping blocks the CFMW region from
> being used at all.
> 
> Without this, you can't use a type-3 persistent region, even with
> support, let alone a volatile region.  In attempting to use a persistent
> region as volatile via ndctl and friends, I'm seeing further issues (it
> cannot be assigned to a numa node successfully), but that's a separate
> issue.
For the Numa node bit...

So far, the CDAT table isn't used in Linux (read out for debug purposes
is supported only).  That all needs to be added yet to get the NUMA node
allocations to work correctly.  It shouldn't have anything to do with SRAT
unless the BIOS has done the full full configuration (same way CXL will work
with a legacy OS).  Come to think of it, that should definitely be on the
priority list for kernel support (don't think it was on the list on Tuesday?)

> 
> > 
> > Lastly it is my understanding that the region driver does not have support for
> > volatile memory. It would be great to add that functionality if we make this
> > change in QEMU.
> >   
> 
> Right now this is true, but it seems a bit of a chicken/egg scenario.
> Nothing to test against vs no support.  Nudging this along such that we
> can at least report an (unusable) hot-add volatile memory region would
> provide someone working with the region driver something to poke and
> prod at.

Agreed. This is same place as Ben's original CXL QEMU work on non volatile.
Need something to develop against so we'll at least have QEMU patches
available whilst the kernel side is in development (hopefully this cycle)
> 
> > > Regarding SRAT Generation for Type-3 Devices
> > >   * Co-Developed by Davidlohr Bueso.  Built from his base patch and
> > >     extended to work with both volatile and persistent regions.
> > >   * This can be used to demonstrate static type-3 device mapping and
> > >     testing numa-access to type-3 device memory regions.  
> 
> Regarding "volatile memory present at boot" - there is still two ways
> for that memory to be onlined: Statically (entered as an explicit e820
> region after reading the SRAT), or Dynamically (hot-add by the region
> driver).
> 
> This patch would at least allow an SRAT to be generated if you
> explicitly add a NUMA node mapping to it.  Although I concede that I'm
> not entirely sure what is "correct" here.

For hotplug, needs to be the region driver, not here (or BIOS if you
are doing a BIOS driven flow - in which case the whole topology is
discovered and mostly configured by the BIOS before the OS reaches it
- including filling in SRAT, SLIT, HMAT etCc).
> 
> What this ends up looking like is mapping a memdev to both a numa node
> and to a type-3 device.  Though that seems wrong.
> 
> After further testing it seems like creating a CPU-less, Memory-less
> NUMA node with the intent of mapping volatile memory regions to it is
> not supported (yet?).

Yup, and I doubt it ever will be for reasons above. 

BIOS does it all, or OS does it all.  Mixing and matching is a mess
(there is an exception - Generic Port entries in SRAT - those we do
need for the OS driven flow and possibly also the BIOS flow
- patches welcome! I'd forgotten that on my list of QEMU stuff that
needs doing.)

https://lore.kernel.org/all/e1a52da9aec90766da5de51b1b839fd95d63a5af.camel@intel.com/

If anyone is implementing that, also good to do Generic Initiators
as they are very similar.

Jonathan
 



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/4 v3] Multi-Region and Volatile Memory support for CXL Type-3 Devices
  2022-10-27 10:58         ` Jonathan Cameron via
  (?)
@ 2022-10-27 14:29         ` Gregory Price
  -1 siblings, 0 replies; 23+ messages in thread
From: Gregory Price @ 2022-10-27 14:29 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Adam Manzanares, Gregory Price, qemu-devel, linux-cxl, mst,
	marcel.apfelbaum, imammedo, ani, alison.schofield, dave,
	bwidawsk, hchkuo, cbrowy, ira.weiny, Dan Williams

ok to summarize then:

patch 1) e820 - submitted as a separate patch/bugfix for mst to pick up

patch 2&3) Pickup by Jonathan for his branch as it depends on DOE and other changes.

patch 4) incorrect, this should be done in bios/efi, drop entirely

On Thu, Oct 27, 2022 at 11:58:54AM +0100, Jonathan Cameron wrote:
> On Wed, 26 Oct 2022 16:47:18 -0400
> Gregory Price <gregory.price@memverge.com> wrote:
> 
> > On Wed, Oct 26, 2022 at 08:13:24PM +0000, Adam Manzanares wrote:
> > > On Tue, Oct 25, 2022 at 08:47:33PM -0400, Gregory Price wrote:  
> > > > Submitted as an extention to the multi-feature branch maintained
> > > > by Jonathan Cameron at:
> > > > https://urldefense.com/v3/__https://gitlab.com/jic23/qemu/-/tree/cxl-2022-10-24__;!!EwVzqGoTKBqv-0DWAJBm!RyiGL5B1XmQnVFwgxikKJeosPMKtoO1cTr61gIq8fwqfju8l4cbGZGwAEkKXIJB-Dbkfi_LNN2rGCbzMISz65cTxpAxI9pQ$   
> > > > 
> > > > 
> > > > Summary of Changes:
> > > > 1) E820 CFMW Bug fix.  
> > > > 2) Add CXL_CAPACITY_MULTIPLIER definition to replace magic numbers
> > > > 3) Multi-Region and Volatile Memory support for CXL Type-3 Devices
> > > > 4) CXL Type-3 SRAT Generation when NUMA node is attached to memdev
> 
> +CC Dan for a question on status of Generic Ports ACPI code first ECN.
> Given that was done on the mailing list I can find any public tracking
> of whether it was accepted or not - hence whether we can get on with
> implementation.  There hasn't been a release ACPI spec since before
> that was proposed so we need that confirmation of the code first proposal
> being accepted to get things moving.
> 

/* snip for brevity */

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/4 v3] Multi-Region and Volatile Memory support for CXL Type-3 Devices
  2022-10-27 10:58         ` Jonathan Cameron via
  (?)
  (?)
@ 2022-10-27 18:10         ` Adam Manzanares
  -1 siblings, 0 replies; 23+ messages in thread
From: Adam Manzanares @ 2022-10-27 18:10 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Gregory Price, Gregory Price, qemu-devel, linux-cxl, mst,
	marcel.apfelbaum, imammedo, ani, alison.schofield, dave,
	bwidawsk, hchkuo, cbrowy, ira.weiny, Dan Williams

On Thu, Oct 27, 2022 at 11:58:54AM +0100, Jonathan Cameron wrote:
> On Wed, 26 Oct 2022 16:47:18 -0400
> Gregory Price <gregory.price@memverge.com> wrote:
> 
> > On Wed, Oct 26, 2022 at 08:13:24PM +0000, Adam Manzanares wrote:
> > > On Tue, Oct 25, 2022 at 08:47:33PM -0400, Gregory Price wrote:  
> > > > Submitted as an extention to the multi-feature branch maintained
> > > > by Jonathan Cameron at:
> > > > https://urldefense.com/v3/__https://gitlab.com/jic23/qemu/-/tree/cxl-2022-10-24__;!!EwVzqGoTKBqv-0DWAJBm!RyiGL5B1XmQnVFwgxikKJeosPMKtoO1cTr61gIq8fwqfju8l4cbGZGwAEkKXIJB-Dbkfi_LNN2rGCbzMISz65cTxpAxI9pQ$   
> > > > 
> > > > 
> > > > Summary of Changes:
> > > > 1) E820 CFMW Bug fix.  
> > > > 2) Add CXL_CAPACITY_MULTIPLIER definition to replace magic numbers
> > > > 3) Multi-Region and Volatile Memory support for CXL Type-3 Devices
> > > > 4) CXL Type-3 SRAT Generation when NUMA node is attached to memdev
> 
> +CC Dan for a question on status of Generic Ports ACPI code first ECN.
> Given that was done on the mailing list I can find any public tracking
> of whether it was accepted or not - hence whether we can get on with
> implementation.  There hasn't been a release ACPI spec since before
> that was proposed so we need that confirmation of the code first proposal
> being accepted to get things moving.
> 
> > > > 
> > > > 
> > > > Regarding the E820 fix
> > > >   * This bugfix is required for memory regions to work on x86
> > > >   * input from Dan Williams and others suggest that E820 entry for
> > > >     the CFMW should not exist, as it is expected to be dynamically
> > > >     assigned at runtime.  If this entry exists, it instead blocks
> > > >     region creation by nature of the memory region being marked as
> > > >     reserved.  
> > > 
> > > For CXL 2.0 it is my understanding that volatile capacity present at boot will
> > > be advertised by the firmware. In the absence of EFI I would assume this would
> > > be provided in the e820 map.   
> 
> Whilst this is one option, it is certainly not the case that all CXL 2.0
> platforms will decide to do any setup of CXL memory (volatile or not) in the
> firmware.  They can leave it entirely to the OS, so a cold plug flow.
> Early platforms will do the setup in BIOS to support unware OSes, once
> that's not a problem any more the only reason you'd want to do this is if
> you don't have other RAM to boot the system, or for some reason you want
> your host kernel etc in the CXL attached memory.
> 
> I'd expect to see BIOS having OS managed configuration as an option in the
> intermediate period where some OSes are aware, others not.
> OS knows more about usecase / policy so can make better choices on interleaving
> etc of volatile CXL type 3 memory (let alone the fun corner of devices
> where you can dynamically change the mix of volatile and non volatile
> memory).
> 
> 
> > 
> > The issue in this case is very explicitly that a double-mapping occurs
> > for the same region.  An E820 mapping for RESERVED is set *and* the
> > region driver allocates a CXL CFMW mapping.  As a result the region
> > drive straight up fails to allocate regions.
> > 
> > So in either case - at least from my view - the entry added as RESERVED
> > is just wrong.
> > 
> > This is separate from type-3 device SRAT entries and default mappings
> > for volatile regions.  For this situation, if you explicitly assign the
> > memdev backing a type-3 device to a numa node, then an SRAT area is
> > generated and an explicit e820 entry is generated and marked usable -
> > though I think there are likely issues with this kind of
> > double-referencing.
> 
> SRAT setup for CXL type 3 devices is to my mind is a job for a full BIOS,
> not QEMU level of faking a few things. That BIOS should also
> be doing the full configuration (HDM Decoders and all the rest).  ARM has
> a prototype for one of the fixed virtual platforms that does this (talk
> at Plumbers Uconf), we should look to do the same if we want to test
> those flows using QEMU via appropriate changes in EDK2 to walk topology
> and configure everything.  Until the devices are configured there is no
> way to configure the SLIT, HMAT entries that align with the SRAT ones
> (in theory those can be updated at runtime via _SLI, _HMA but in 
> practice, I'm fairly sure Linux doesn't support doing that.)
> 
> 
> > 
> > > 
> > > Is the region driver meant to cover volatile capacity present at boot? I was
> > > under the impression that it would be used for hot added volatile memory. It
> > > would be good to cover all of these assumptions for the e820 fix.  
> > 
> > This region appears to cover hotplug memory behind the CFMW.  The
> > problem is that this e820 RESERVED mapping blocks the CFMW region from
> > being used at all.
> > 
> > Without this, you can't use a type-3 persistent region, even with
> > support, let alone a volatile region.  In attempting to use a persistent
> > region as volatile via ndctl and friends, I'm seeing further issues (it
> > cannot be assigned to a numa node successfully), but that's a separate
> > issue.
> For the Numa node bit...
> 
> So far, the CDAT table isn't used in Linux (read out for debug purposes
> is supported only).  That all needs to be added yet to get the NUMA node
> allocations to work correctly.  It shouldn't have anything to do with SRAT
> unless the BIOS has done the full full configuration (same way CXL will work
> with a legacy OS).  Come to think of it, that should definitely be on the
> priority list for kernel support (don't think it was on the list on Tuesday?)
> 
> > 
> > > 
> > > Lastly it is my understanding that the region driver does not have support for
> > > volatile memory. It would be great to add that functionality if we make this
> > > change in QEMU.
> > >   
> > 
> > Right now this is true, but it seems a bit of a chicken/egg scenario.
> > Nothing to test against vs no support.  Nudging this along such that we
> > can at least report an (unusable) hot-add volatile memory region would
> > provide someone working with the region driver something to poke and
> > prod at.
> 
> Agreed. This is same place as Ben's original CXL QEMU work on non volatile.
> Need something to develop against so we'll at least have QEMU patches
> available whilst the kernel side is in development (hopefully this cycle)
> > 
> > > > Regarding SRAT Generation for Type-3 Devices
> > > >   * Co-Developed by Davidlohr Bueso.  Built from his base patch and
> > > >     extended to work with both volatile and persistent regions.
> > > >   * This can be used to demonstrate static type-3 device mapping and
> > > >     testing numa-access to type-3 device memory regions.  
> > 
> > Regarding "volatile memory present at boot" - there is still two ways
> > for that memory to be onlined: Statically (entered as an explicit e820
> > region after reading the SRAT), or Dynamically (hot-add by the region
> > driver).
> > 
> > This patch would at least allow an SRAT to be generated if you
> > explicitly add a NUMA node mapping to it.  Although I concede that I'm
> > not entirely sure what is "correct" here.
> 
> For hotplug, needs to be the region driver, not here (or BIOS if you
> are doing a BIOS driven flow - in which case the whole topology is
> discovered and mostly configured by the BIOS before the OS reaches it
> - including filling in SRAT, SLIT, HMAT etCc).
> > 
> > What this ends up looking like is mapping a memdev to both a numa node
> > and to a type-3 device.  Though that seems wrong.
> > 
> > After further testing it seems like creating a CPU-less, Memory-less
> > NUMA node with the intent of mapping volatile memory regions to it is
> > not supported (yet?).
> 
> Yup, and I doubt it ever will be for reasons above. 
> 
> BIOS does it all, or OS does it all.  Mixing and matching is a mess
> (there is an exception - Generic Port entries in SRAT - those we do
> need for the OS driven flow and possibly also the BIOS flow
> - patches welcome! I'd forgotten that on my list of QEMU stuff that
> needs doing.)

Based on the discussions is it safe to say we have settled on an OS driven flow
for the current QEMU CXL emulation. 

> 
> https://urldefense.com/v3/__https://lore.kernel.org/all/e1a52da9aec90766da5de51b1b839fd95d63a5af.camel@intel.com/__;!!EwVzqGoTKBqv-0DWAJBm!XLqjPd1aXt3i5NXrZhakQlGdgJ5o4tcfV_5iUEp8vwBLv8T1Ft1OVHPI0p7KpYSFDYzhAGj_ulMz1LfZVmJgrOvrO-_v7udl$  
> 
> If anyone is implementing that, also good to do Generic Initiators
> as they are very similar.
> 
> Jonathan
>  
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] hw/cxl: Multi-Region CXL Type-3 Devices (Volatile and Persistent)
  2022-10-26  0:47   ` [PATCH 3/4] hw/cxl: Multi-Region CXL Type-3 Devices (Volatile and Persistent) Gregory Price
@ 2022-11-14 17:53       ` Jonathan Cameron via
  0 siblings, 0 replies; 23+ messages in thread
From: Jonathan Cameron @ 2022-11-14 17:53 UTC (permalink / raw)
  To: Gregory Price
  Cc: qemu-devel, linux-cxl, mst, marcel.apfelbaum, imammedo, ani,
	alison.schofield, dave, a.manzanares, bwidawsk, gregory.price,
	hchkuo, cbrowy, ira.weiny

On Tue, 25 Oct 2022 20:47:36 -0400
Gregory Price <gourry.memverge@gmail.com> wrote:

> This commit enables each CXL Type-3 device to contain one volatile
> memory region and one persistent region.
> 
> Two new properties have been added to cxl-type3 device initialization:
>     [volatile-memdev] and [persistent-memdev]
> 
> The existing [memdev] property has been deprecated and will default the
> memory region to a persistent memory region (although a user may assign
> the region to a ram or file backed region). It cannot be used in
> combination with the new [persistent-memdev] property.
> 
> Partitioning volatile memory from persistent memory is not yet supported.
> 
> Volatile memory is mapped at DPA(0x0), while Persistent memory is mapped
> at DPA(vmem->size), per CXL Spec 8.2.9.8.2.0 - Get Partition Info.
> 
> Signed-off-by: Gregory Price <gregory.price@memverge.com>

Hi Gregory,

I've not been rushing on this purely because we are after the feature
freeze for this QEMU cycle so no great rush to line up new features
(and there was some fun with the pull request the previous set of QEMU
CXL features were in - unrelated to those patches).

A few comments inline.

Once I've chased down a segfault on power down of my qemu setup (that
seems to have nothing to do with the CXL support. *sigh*) I'll push out
an updated tree with this on it for testing purposes.

Thanks,

Jonathan

> ---
>  docs/system/devices/cxl.rst |  53 +++++--
>  hw/cxl/cxl-mailbox-utils.c  |  21 ++-
>  hw/mem/cxl_type3.c          | 274 +++++++++++++++++++++++++++---------
>  include/hw/cxl/cxl_device.h |  11 +-
>  tests/qtest/cxl-test.c      | 111 +++++++++++----
>  5 files changed, 348 insertions(+), 122 deletions(-)
> 
> diff --git a/docs/system/devices/cxl.rst b/docs/system/devices/cxl.rst
> index f25783a4ec..9e165064c8 100644
> --- a/docs/system/devices/cxl.rst
> +++ b/docs/system/devices/cxl.rst
> @@ -300,15 +300,36 @@ Example topology involving a switch::
>  
>  Example command lines
>  ---------------------
> -A very simple setup with just one directly attached CXL Type 3 device::
> +A very simple setup with just one directly attached CXL Type 3 Persistent Memory device::
>  
>    qemu-system-aarch64 -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 -cpu max \
>    ...
> -  -object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=256M \
> -  -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=256M \
> +  -object memory-backend-file,pmem=true,id=pmem0,share=on,mem-path=/tmp/cxltest.raw,size=256M \
> +  -object memory-backend-file,pmem=true,id=cxl-lsa0,share=on,mem-path=/tmp/lsa.raw,size=256M \

Why make the pmem=true change in here? If we want to do that I think it should be in a
separate patch with explanation.

> +  -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
> +  -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
> +  -device cxl-type3,bus=root_port13,persistent-memdev=pmem0,lsa=cxl-lsa1,id=cxl-pmem0 \
> +  -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G
> +
> +A very simple setup with just one directly attached CXL Type 3 Volatile Memory device::
> +
> +  qemu-system-aarch64 -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 -cpu max \
> +  ...
> +  -object memory-backend-ram,id=vmem0,share=on,size=256M \
>    -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
>    -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
> -  -device cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0 \
> +  -device cxl-type3,bus=root_port13,volatile-memdev=vmem0,id=cxl-vmem0 \
> +  -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G

...


> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index d7543fd5b4..c1183614b9 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -269,7 +269,7 @@ static ret_code cmd_firmware_update_get_info(struct cxl_cmd *cmd,
>      } QEMU_PACKED *fw_info;
>      QEMU_BUILD_BUG_ON(sizeof(*fw_info) != 0x50);
>  
> -    if (cxl_dstate->pmem_size < CXL_CAPACITY_MULTIPLIER) {
> +    if (cxl_dstate->mem_size < CXL_CAPACITY_MULTIPLIER) {

I think this is also true of the individual types. Should we check each of them instead
much like we do below for cmd_identify_memoyr_Device.

>          return CXL_MBOX_INTERNAL_ERROR;
>      }
>  
> @@ -412,20 +412,20 @@ static ret_code cmd_identify_memory_device(struct cxl_cmd *cmd,
>  
>      CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
>      CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
> -    uint64_t size = cxl_dstate->pmem_size;
>  
> -    if (!QEMU_IS_ALIGNED(size, CXL_CAPACITY_MULTIPLIER)) {
> +    if ((!QEMU_IS_ALIGNED(cxl_dstate->vmem_size, CXL_CAPACITY_MULTIPLIER)) ||
> +        (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER))) {
>          return CXL_MBOX_INTERNAL_ERROR;
>      }
...


> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 0317bd96a6..21e866dcaf 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -32,7 +32,8 @@ enum {
>  };
>  

...

>  
> @@ -151,33 +152,67 @@ static int ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
>  static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
>  {
>      g_autofree CDATSubHeader **table = NULL;
> -    MemoryRegion *nonvolatile_mr;
>      CXLType3Dev *ct3d = priv;
> +    MemoryRegion *volatile_mr = NULL, *nonvolatile_mr = NULL;
>      int dsmad_handle = 0;
> +    int cur_ent = 0;
> +    int len = 0;
>      int rc;
>  
> -    if (!ct3d->hostmem) {
> +    if (!ct3d->hostpmem && !ct3d->hostvmem) {
>          return 0;
>      }
>  
> -    nonvolatile_mr = host_memory_backend_get_memory(ct3d->hostmem);
> -    if (!nonvolatile_mr) {
> -        return -EINVAL;
> +    if (ct3d->hostvmem) {
> +        volatile_mr = host_memory_backend_get_memory(ct3d->hostvmem);
> +        if (!volatile_mr) {
> +            return -EINVAL;
> +        }
> +        len += CT3_CDAT_NUM_ENTRIES;
> +    }
> +
> +    if (ct3d->hostpmem) {
> +        nonvolatile_mr = host_memory_backend_get_memory(ct3d->hostpmem);
> +        if (!nonvolatile_mr) {
> +            return -EINVAL;
> +        }
> +        len += CT3_CDAT_NUM_ENTRIES;
>      }
>  
> -    table = g_malloc0(CT3_CDAT_NUM_ENTRIES * sizeof(*table));
> +    table = g_malloc0(len * sizeof(*table));
>      if (!table) {
>          return -ENOMEM;
>      }
>  
> -    rc = ct3_build_cdat_entries_for_mr(table, dsmad_handle++, nonvolatile_mr);
> -    if (rc < 0) {
> -        return rc;
> +    /* Now fill them in */
> +    if (volatile_mr) {
> +        rc = ct3_build_cdat_entries_for_mr(table, dsmad_handle++, volatile_mr,
> +                true, 0);

Align second line of parameters with ( would look nicer.

> +        if (rc < 0) {
> +            return rc;
> +        }
> +        cur_ent = CT3_CDAT_NUM_ENTRIES;
> +    }
> +
> +    if (nonvolatile_mr) {
> +        rc = ct3_build_cdat_entries_for_mr(&(table[cur_ent]), dsmad_handle++,
> +                nonvolatile_mr, true, (volatile_mr ? volatile_mr->size : 0));
> +        if (rc < 0) {
> +            goto error_cleanup;
> +        }
> +        cur_ent += CT3_CDAT_NUM_ENTRIES;
>      }
> +    assert(len == cur_ent);
>  
>      *cdat_table = g_steal_pointer(&table);
>  
> -    return CT3_CDAT_NUM_ENTRIES;
> +    return len;
> +error_cleanup:
> +    int i;
> +    for (i = 0; i < cur_ent; i++) {
> +        g_free(*cdat_table[i]);

Until the steal pointer above, *cdata_table not set to anything.
Possibly gfree(table[i])?


> +    }
> +    return rc;
>  }
>  
>  static void ct3_free_cdat_table(CDATSubHeader **cdat_table, int num, void *priv)
> @@ -378,16 +413,48 @@ static void build_dvsecs(CXLType3Dev *ct3d)
>      CXLDVSECRegisterLocator *regloc_dvsec;
>      uint8_t *dvsec;
>      int i;
> +    uint32_t range1_size_hi = 0, range1_size_lo = 0,
> +             range1_base_hi = 0, range1_base_lo = 0,

range1 values always set, so no need to initialize. Hopefully the compiler can
see that.

> +             range2_size_hi = 0, range2_size_lo = 0,
> +             range2_base_hi = 0, range2_base_lo = 0;
> +
> +    /*
> +     * Volatile memory is mapped as (0x0)
> +     * Persistent memory is mapped at (volatile->size)
> +     */
> +    if (ct3d->hostvmem && ct3d->hostpmem) {
> +        range1_size_hi = ct3d->hostvmem->size >> 32;
> +        range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> +                         (ct3d->hostvmem->size & 0xF0000000);
> +        range1_base_hi = 0;
> +        range1_base_lo = 0;
> +        range2_size_hi = ct3d->hostpmem->size >> 32;
> +        range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> +                         (ct3d->hostpmem->size & 0xF0000000);
> +        range2_base_hi = ct3d->hostvmem->size >> 32;
> +        range2_base_lo = ct3d->hostvmem->size & 0xF0000000;
> +    } else {
> +        HostMemoryBackend *hmbe = ct3d->hostvmem ?
> +                                  ct3d->hostvmem : ct3d->hostpmem;
> +        range1_size_hi = hmbe->size >> 32;
> +        range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> +                         (hmbe->size & 0xF0000000);
> +        range1_base_hi = 0;
> +        range1_base_lo = 0;
> +    }

Hmm. I wonder if this is simpler done as below. Both look fine
to me though so up to you for next version.  Trade off between
slightly ugly nested logic, and the readability always lost when
a ternary operator puts in an appearance.

    if (ct3d->hostvmem) {
        range1_size_hi = ct3d->hostvmem->size >> 32;
        range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
                         (ct3d->hostvmem->size & 0xF0000000);
        range1_base_hi = 0;
        range1_base_lo = 0;
	if (ct3d->hostpmem) {
            range2_size_hi = ct3d->hostpmem->size >> 32;
            range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
                             (ct3d->hostpmem->size & 0xF0000000);
            range2_base_hi = ct3d->hostvmem->size >> 32;
            range2_base_lo = ct3d->hostvmem->size & 0xF0000000;
       }
   } else {
       range1_size_hi = hostpmem->size >> 32;
       range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
                        (hostpmem->size & 0xF0000000);
       range1_base_hi = 0;
       range1_base_lo = 0;
   }


>  
>      dvsec = (uint8_t *)&(CXLDVSECDevice){
>          .cap = 0x1e,
>          .ctrl = 0x2,
>          .status2 = 0x2,
> -        .range1_size_hi = ct3d->hostmem->size >> 32,
> -        .range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> -        (ct3d->hostmem->size & 0xF0000000),
> -        .range1_base_hi = 0,
> -        .range1_base_lo = 0,
> +        .range1_size_hi = range1_size_hi,
> +        .range1_size_lo = range1_size_lo,
> +        .range1_base_hi = range1_base_hi,
> +        .range1_base_lo = range1_base_lo,
> +        .range2_size_hi = range2_size_hi,
> +        .range2_size_lo = range2_size_lo,
> +        .range2_base_hi = range2_base_hi,
> +        .range2_base_lo = range2_base_lo
>      };
>      cxl_component_create_dvsec(cxl_cstate, CXL2_TYPE3_DEVICE,
>                                 PCIE_CXL_DEVICE_DVSEC_LENGTH,
> @@ -475,33 +542,62 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
>      MemoryRegion *mr;
>      char *name;


>  MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
> @@ -687,10 +801,17 @@ MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
>  {
>      CXLType3Dev *ct3d = CXL_TYPE3(d);
>      uint64_t dpa_offset;
> -    MemoryRegion *mr;
> +    MemoryRegion *vmr = NULL, *pmr = NULL;
> +    AddressSpace *as;
> +
> +    if (ct3d->hostvmem) {
> +        vmr = host_memory_backend_get_memory(ct3d->hostvmem);
> +    }
> +    if (ct3d->hostpmem) {
> +        pmr = host_memory_backend_get_memory(ct3d->hostpmem);
> +    }
>  
> -    mr = host_memory_backend_get_memory(ct3d->hostmem);
> -    if (!mr) {
> +    if (!vmr && !pmr) {
>          return MEMTX_OK;
>      }
>  
> @@ -698,11 +819,13 @@ MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
>          return MEMTX_OK;
>      }
>  
> -    if (dpa_offset > int128_get64(mr->size)) {
> +    if (dpa_offset > int128_get64(ct3d->cxl_dstate.mem_size)) {
>          return MEMTX_OK;
>      }
> -    return address_space_write(&ct3d->hostmem_as, dpa_offset, attrs,
> -                               &data, size);
> +
> +    as = (vmr && (dpa_offset <= int128_get64(vmr->size))) ?
> +         &ct3d->hostvmem_as : &ct3d->hostpmem_as;
> +    return address_space_write(as, dpa_offset, attrs, &data, size);

If we have both volatile and persistent and yet still only have our one HDM
decoder, then I think a write into the persistent range will have the wrong offset.

dpa_offset == address space offset when we only had one region. Not so much any more.
For persistent i think we'll need to subtract the size of the volatile region
(possibly taking care with alignment - I need to check that).

...

> diff --git a/tests/qtest/cxl-test.c b/tests/qtest/cxl-test.c
> index 6baed747fa..a05ddc0c7b 100644
> --- a/tests/qtest/cxl-test.c
> +++ b/tests/qtest/cxl-test.c
> @@ -34,29 +34,46 @@
>                   "-device cxl-rp,id=rp2,bus=cxl.1,chassis=0,slot=2 " \
>                   "-device cxl-rp,id=rp3,bus=cxl.1,chassis=0,slot=3 "
>  
> -#define QEMU_T3D "-object memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M " \
> -                 "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M "    \
> -                 "-device cxl-type3,bus=rp0,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 "
> -
> -#define QEMU_2T3D "-object memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M "    \
> -                  "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M "    \
> -                  "-device cxl-type3,bus=rp0,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 " \
> -                  "-object memory-backend-file,id=cxl-mem1,mem-path=%s,size=256M "    \
> -                  "-object memory-backend-file,id=lsa1,mem-path=%s,size=256M "    \
> -                  "-device cxl-type3,bus=rp1,memdev=cxl-mem1,lsa=lsa1,id=cxl-pmem1 "
> -
> -#define QEMU_4T3D "-object memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M " \
> -                  "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M "    \
> -                  "-device cxl-type3,bus=rp0,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 " \
> -                  "-object memory-backend-file,id=cxl-mem1,mem-path=%s,size=256M "    \
> -                  "-object memory-backend-file,id=lsa1,mem-path=%s,size=256M "    \
> -                  "-device cxl-type3,bus=rp1,memdev=cxl-mem1,lsa=lsa1,id=cxl-pmem1 " \
> -                  "-object memory-backend-file,id=cxl-mem2,mem-path=%s,size=256M "    \
> -                  "-object memory-backend-file,id=lsa2,mem-path=%s,size=256M "    \
> -                  "-device cxl-type3,bus=rp2,memdev=cxl-mem2,lsa=lsa2,id=cxl-pmem2 " \
> -                  "-object memory-backend-file,id=cxl-mem3,mem-path=%s,size=256M "    \
> -                  "-object memory-backend-file,id=lsa3,mem-path=%s,size=256M "    \
> -                  "-device cxl-type3,bus=rp3,memdev=cxl-mem3,lsa=lsa3,id=cxl-pmem3 "

If re-indenting makes sense (I'm really convinced it is worth the noise) then do it
in a precusor no-op patch so we can more easily see what is new here.)

 
> +#define QEMU_T3D_DEPRECATED \
> +  "-object memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M " \
> +  "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M "    \
> +  "-device cxl-type3,bus=rp0,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 "
> +
> +#define QEMU_T3D_PMEM \
> +  "-object memory-backend-file,id=m0,mem-path=%s,size=256M " \
> +  "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M " \
> +  "-device cxl-type3,bus=rp0,persistent-memdev=cxl-m0,lsa=lsa0,id=pmem0 "
> +
> +#define QEMU_T3D_VMEM \
> +  "-object memory-backend-ram,id=mem0,size=256M " \
> +  "-device cxl-type3,bus=rp0,volatile-memdev=mem0,id=mem0 "
> +
> +#define QEMU_T3D_VMEM_LSA \
> +  "-object memory-backend-ram,id=mem0,size=256M " \
> +  "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M " \
> +  "-device cxl-type3,bus=rp0,volatile-memdev=mem0,lsa=lsa0,id=mem0 "
> +
> +#define QEMU_2T3D \
> +  "-object memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M "    \
> +  "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M "    \
> +  "-device cxl-type3,bus=rp0,persistent-memdev=cxl-mem0,lsa=lsa0,id=pmem0 " \
> +  "-object memory-backend-file,id=cxl-mem1,mem-path=%s,size=256M "    \
> +  "-object memory-backend-file,id=lsa1,mem-path=%s,size=256M "    \
> +  "-device cxl-type3,bus=rp1,persistent-memdev=cxl-mem1,lsa=lsa1,id=pmem1 "
> +
> +#define QEMU_4T3D \
> +  "-object memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M " \
> +  "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M "    \
> +  "-device cxl-type3,bus=rp0,persistent-memdev=cxl-mem0,lsa=lsa0,id=pmem0 " \
> +  "-object memory-backend-file,id=cxl-mem1,mem-path=%s,size=256M "    \
> +  "-object memory-backend-file,id=lsa1,mem-path=%s,size=256M "    \
> +  "-device cxl-type3,bus=rp1,persistent-memdev=cxl-mem1,lsa=lsa1,id=pmem1 " \
> +  "-object memory-backend-file,id=cxl-mem2,mem-path=%s,size=256M "    \
> +  "-object memory-backend-file,id=lsa2,mem-path=%s,size=256M "    \
> +  "-device cxl-type3,bus=rp2,persistent-memdev=cxl-mem2,lsa=lsa2,id=pmem2 " \
> +  "-object memory-backend-file,id=cxl-mem3,mem-path=%s,size=256M "    \
> +  "-object memory-backend-file,id=lsa3,mem-path=%s,size=256M "    \
> +  "-device cxl-type3,bus=rp3,persistent-memdev=cxl-mem3,lsa=lsa3,id=pmem3 "
>  
>  static void cxl_basic_hb(void)
>  {
> @@ -95,14 +112,53 @@ static void cxl_2root_port(void)
>  }
>  



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] hw/cxl: Multi-Region CXL Type-3 Devices (Volatile and Persistent)
@ 2022-11-14 17:53       ` Jonathan Cameron via
  0 siblings, 0 replies; 23+ messages in thread
From: Jonathan Cameron via @ 2022-11-14 17:53 UTC (permalink / raw)
  To: Gregory Price
  Cc: qemu-devel, linux-cxl, mst, marcel.apfelbaum, imammedo, ani,
	alison.schofield, dave, a.manzanares, bwidawsk, gregory.price,
	hchkuo, cbrowy, ira.weiny

On Tue, 25 Oct 2022 20:47:36 -0400
Gregory Price <gourry.memverge@gmail.com> wrote:

> This commit enables each CXL Type-3 device to contain one volatile
> memory region and one persistent region.
> 
> Two new properties have been added to cxl-type3 device initialization:
>     [volatile-memdev] and [persistent-memdev]
> 
> The existing [memdev] property has been deprecated and will default the
> memory region to a persistent memory region (although a user may assign
> the region to a ram or file backed region). It cannot be used in
> combination with the new [persistent-memdev] property.
> 
> Partitioning volatile memory from persistent memory is not yet supported.
> 
> Volatile memory is mapped at DPA(0x0), while Persistent memory is mapped
> at DPA(vmem->size), per CXL Spec 8.2.9.8.2.0 - Get Partition Info.
> 
> Signed-off-by: Gregory Price <gregory.price@memverge.com>

Hi Gregory,

I've not been rushing on this purely because we are after the feature
freeze for this QEMU cycle so no great rush to line up new features
(and there was some fun with the pull request the previous set of QEMU
CXL features were in - unrelated to those patches).

A few comments inline.

Once I've chased down a segfault on power down of my qemu setup (that
seems to have nothing to do with the CXL support. *sigh*) I'll push out
an updated tree with this on it for testing purposes.

Thanks,

Jonathan

> ---
>  docs/system/devices/cxl.rst |  53 +++++--
>  hw/cxl/cxl-mailbox-utils.c  |  21 ++-
>  hw/mem/cxl_type3.c          | 274 +++++++++++++++++++++++++++---------
>  include/hw/cxl/cxl_device.h |  11 +-
>  tests/qtest/cxl-test.c      | 111 +++++++++++----
>  5 files changed, 348 insertions(+), 122 deletions(-)
> 
> diff --git a/docs/system/devices/cxl.rst b/docs/system/devices/cxl.rst
> index f25783a4ec..9e165064c8 100644
> --- a/docs/system/devices/cxl.rst
> +++ b/docs/system/devices/cxl.rst
> @@ -300,15 +300,36 @@ Example topology involving a switch::
>  
>  Example command lines
>  ---------------------
> -A very simple setup with just one directly attached CXL Type 3 device::
> +A very simple setup with just one directly attached CXL Type 3 Persistent Memory device::
>  
>    qemu-system-aarch64 -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 -cpu max \
>    ...
> -  -object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=256M \
> -  -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=256M \
> +  -object memory-backend-file,pmem=true,id=pmem0,share=on,mem-path=/tmp/cxltest.raw,size=256M \
> +  -object memory-backend-file,pmem=true,id=cxl-lsa0,share=on,mem-path=/tmp/lsa.raw,size=256M \

Why make the pmem=true change in here? If we want to do that I think it should be in a
separate patch with explanation.

> +  -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
> +  -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
> +  -device cxl-type3,bus=root_port13,persistent-memdev=pmem0,lsa=cxl-lsa1,id=cxl-pmem0 \
> +  -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G
> +
> +A very simple setup with just one directly attached CXL Type 3 Volatile Memory device::
> +
> +  qemu-system-aarch64 -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 -cpu max \
> +  ...
> +  -object memory-backend-ram,id=vmem0,share=on,size=256M \
>    -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
>    -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
> -  -device cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0 \
> +  -device cxl-type3,bus=root_port13,volatile-memdev=vmem0,id=cxl-vmem0 \
> +  -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G

...


> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index d7543fd5b4..c1183614b9 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -269,7 +269,7 @@ static ret_code cmd_firmware_update_get_info(struct cxl_cmd *cmd,
>      } QEMU_PACKED *fw_info;
>      QEMU_BUILD_BUG_ON(sizeof(*fw_info) != 0x50);
>  
> -    if (cxl_dstate->pmem_size < CXL_CAPACITY_MULTIPLIER) {
> +    if (cxl_dstate->mem_size < CXL_CAPACITY_MULTIPLIER) {

I think this is also true of the individual types. Should we check each of them instead
much like we do below for cmd_identify_memoyr_Device.

>          return CXL_MBOX_INTERNAL_ERROR;
>      }
>  
> @@ -412,20 +412,20 @@ static ret_code cmd_identify_memory_device(struct cxl_cmd *cmd,
>  
>      CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
>      CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
> -    uint64_t size = cxl_dstate->pmem_size;
>  
> -    if (!QEMU_IS_ALIGNED(size, CXL_CAPACITY_MULTIPLIER)) {
> +    if ((!QEMU_IS_ALIGNED(cxl_dstate->vmem_size, CXL_CAPACITY_MULTIPLIER)) ||
> +        (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER))) {
>          return CXL_MBOX_INTERNAL_ERROR;
>      }
...


> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 0317bd96a6..21e866dcaf 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -32,7 +32,8 @@ enum {
>  };
>  

...

>  
> @@ -151,33 +152,67 @@ static int ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
>  static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
>  {
>      g_autofree CDATSubHeader **table = NULL;
> -    MemoryRegion *nonvolatile_mr;
>      CXLType3Dev *ct3d = priv;
> +    MemoryRegion *volatile_mr = NULL, *nonvolatile_mr = NULL;
>      int dsmad_handle = 0;
> +    int cur_ent = 0;
> +    int len = 0;
>      int rc;
>  
> -    if (!ct3d->hostmem) {
> +    if (!ct3d->hostpmem && !ct3d->hostvmem) {
>          return 0;
>      }
>  
> -    nonvolatile_mr = host_memory_backend_get_memory(ct3d->hostmem);
> -    if (!nonvolatile_mr) {
> -        return -EINVAL;
> +    if (ct3d->hostvmem) {
> +        volatile_mr = host_memory_backend_get_memory(ct3d->hostvmem);
> +        if (!volatile_mr) {
> +            return -EINVAL;
> +        }
> +        len += CT3_CDAT_NUM_ENTRIES;
> +    }
> +
> +    if (ct3d->hostpmem) {
> +        nonvolatile_mr = host_memory_backend_get_memory(ct3d->hostpmem);
> +        if (!nonvolatile_mr) {
> +            return -EINVAL;
> +        }
> +        len += CT3_CDAT_NUM_ENTRIES;
>      }
>  
> -    table = g_malloc0(CT3_CDAT_NUM_ENTRIES * sizeof(*table));
> +    table = g_malloc0(len * sizeof(*table));
>      if (!table) {
>          return -ENOMEM;
>      }
>  
> -    rc = ct3_build_cdat_entries_for_mr(table, dsmad_handle++, nonvolatile_mr);
> -    if (rc < 0) {
> -        return rc;
> +    /* Now fill them in */
> +    if (volatile_mr) {
> +        rc = ct3_build_cdat_entries_for_mr(table, dsmad_handle++, volatile_mr,
> +                true, 0);

Align second line of parameters with ( would look nicer.

> +        if (rc < 0) {
> +            return rc;
> +        }
> +        cur_ent = CT3_CDAT_NUM_ENTRIES;
> +    }
> +
> +    if (nonvolatile_mr) {
> +        rc = ct3_build_cdat_entries_for_mr(&(table[cur_ent]), dsmad_handle++,
> +                nonvolatile_mr, true, (volatile_mr ? volatile_mr->size : 0));
> +        if (rc < 0) {
> +            goto error_cleanup;
> +        }
> +        cur_ent += CT3_CDAT_NUM_ENTRIES;
>      }
> +    assert(len == cur_ent);
>  
>      *cdat_table = g_steal_pointer(&table);
>  
> -    return CT3_CDAT_NUM_ENTRIES;
> +    return len;
> +error_cleanup:
> +    int i;
> +    for (i = 0; i < cur_ent; i++) {
> +        g_free(*cdat_table[i]);

Until the steal pointer above, *cdata_table not set to anything.
Possibly gfree(table[i])?


> +    }
> +    return rc;
>  }
>  
>  static void ct3_free_cdat_table(CDATSubHeader **cdat_table, int num, void *priv)
> @@ -378,16 +413,48 @@ static void build_dvsecs(CXLType3Dev *ct3d)
>      CXLDVSECRegisterLocator *regloc_dvsec;
>      uint8_t *dvsec;
>      int i;
> +    uint32_t range1_size_hi = 0, range1_size_lo = 0,
> +             range1_base_hi = 0, range1_base_lo = 0,

range1 values always set, so no need to initialize. Hopefully the compiler can
see that.

> +             range2_size_hi = 0, range2_size_lo = 0,
> +             range2_base_hi = 0, range2_base_lo = 0;
> +
> +    /*
> +     * Volatile memory is mapped as (0x0)
> +     * Persistent memory is mapped at (volatile->size)
> +     */
> +    if (ct3d->hostvmem && ct3d->hostpmem) {
> +        range1_size_hi = ct3d->hostvmem->size >> 32;
> +        range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> +                         (ct3d->hostvmem->size & 0xF0000000);
> +        range1_base_hi = 0;
> +        range1_base_lo = 0;
> +        range2_size_hi = ct3d->hostpmem->size >> 32;
> +        range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> +                         (ct3d->hostpmem->size & 0xF0000000);
> +        range2_base_hi = ct3d->hostvmem->size >> 32;
> +        range2_base_lo = ct3d->hostvmem->size & 0xF0000000;
> +    } else {
> +        HostMemoryBackend *hmbe = ct3d->hostvmem ?
> +                                  ct3d->hostvmem : ct3d->hostpmem;
> +        range1_size_hi = hmbe->size >> 32;
> +        range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> +                         (hmbe->size & 0xF0000000);
> +        range1_base_hi = 0;
> +        range1_base_lo = 0;
> +    }

Hmm. I wonder if this is simpler done as below. Both look fine
to me though so up to you for next version.  Trade off between
slightly ugly nested logic, and the readability always lost when
a ternary operator puts in an appearance.

    if (ct3d->hostvmem) {
        range1_size_hi = ct3d->hostvmem->size >> 32;
        range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
                         (ct3d->hostvmem->size & 0xF0000000);
        range1_base_hi = 0;
        range1_base_lo = 0;
	if (ct3d->hostpmem) {
            range2_size_hi = ct3d->hostpmem->size >> 32;
            range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
                             (ct3d->hostpmem->size & 0xF0000000);
            range2_base_hi = ct3d->hostvmem->size >> 32;
            range2_base_lo = ct3d->hostvmem->size & 0xF0000000;
       }
   } else {
       range1_size_hi = hostpmem->size >> 32;
       range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
                        (hostpmem->size & 0xF0000000);
       range1_base_hi = 0;
       range1_base_lo = 0;
   }


>  
>      dvsec = (uint8_t *)&(CXLDVSECDevice){
>          .cap = 0x1e,
>          .ctrl = 0x2,
>          .status2 = 0x2,
> -        .range1_size_hi = ct3d->hostmem->size >> 32,
> -        .range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> -        (ct3d->hostmem->size & 0xF0000000),
> -        .range1_base_hi = 0,
> -        .range1_base_lo = 0,
> +        .range1_size_hi = range1_size_hi,
> +        .range1_size_lo = range1_size_lo,
> +        .range1_base_hi = range1_base_hi,
> +        .range1_base_lo = range1_base_lo,
> +        .range2_size_hi = range2_size_hi,
> +        .range2_size_lo = range2_size_lo,
> +        .range2_base_hi = range2_base_hi,
> +        .range2_base_lo = range2_base_lo
>      };
>      cxl_component_create_dvsec(cxl_cstate, CXL2_TYPE3_DEVICE,
>                                 PCIE_CXL_DEVICE_DVSEC_LENGTH,
> @@ -475,33 +542,62 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
>      MemoryRegion *mr;
>      char *name;


>  MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
> @@ -687,10 +801,17 @@ MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
>  {
>      CXLType3Dev *ct3d = CXL_TYPE3(d);
>      uint64_t dpa_offset;
> -    MemoryRegion *mr;
> +    MemoryRegion *vmr = NULL, *pmr = NULL;
> +    AddressSpace *as;
> +
> +    if (ct3d->hostvmem) {
> +        vmr = host_memory_backend_get_memory(ct3d->hostvmem);
> +    }
> +    if (ct3d->hostpmem) {
> +        pmr = host_memory_backend_get_memory(ct3d->hostpmem);
> +    }
>  
> -    mr = host_memory_backend_get_memory(ct3d->hostmem);
> -    if (!mr) {
> +    if (!vmr && !pmr) {
>          return MEMTX_OK;
>      }
>  
> @@ -698,11 +819,13 @@ MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
>          return MEMTX_OK;
>      }
>  
> -    if (dpa_offset > int128_get64(mr->size)) {
> +    if (dpa_offset > int128_get64(ct3d->cxl_dstate.mem_size)) {
>          return MEMTX_OK;
>      }
> -    return address_space_write(&ct3d->hostmem_as, dpa_offset, attrs,
> -                               &data, size);
> +
> +    as = (vmr && (dpa_offset <= int128_get64(vmr->size))) ?
> +         &ct3d->hostvmem_as : &ct3d->hostpmem_as;
> +    return address_space_write(as, dpa_offset, attrs, &data, size);

If we have both volatile and persistent and yet still only have our one HDM
decoder, then I think a write into the persistent range will have the wrong offset.

dpa_offset == address space offset when we only had one region. Not so much any more.
For persistent i think we'll need to subtract the size of the volatile region
(possibly taking care with alignment - I need to check that).

...

> diff --git a/tests/qtest/cxl-test.c b/tests/qtest/cxl-test.c
> index 6baed747fa..a05ddc0c7b 100644
> --- a/tests/qtest/cxl-test.c
> +++ b/tests/qtest/cxl-test.c
> @@ -34,29 +34,46 @@
>                   "-device cxl-rp,id=rp2,bus=cxl.1,chassis=0,slot=2 " \
>                   "-device cxl-rp,id=rp3,bus=cxl.1,chassis=0,slot=3 "
>  
> -#define QEMU_T3D "-object memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M " \
> -                 "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M "    \
> -                 "-device cxl-type3,bus=rp0,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 "
> -
> -#define QEMU_2T3D "-object memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M "    \
> -                  "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M "    \
> -                  "-device cxl-type3,bus=rp0,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 " \
> -                  "-object memory-backend-file,id=cxl-mem1,mem-path=%s,size=256M "    \
> -                  "-object memory-backend-file,id=lsa1,mem-path=%s,size=256M "    \
> -                  "-device cxl-type3,bus=rp1,memdev=cxl-mem1,lsa=lsa1,id=cxl-pmem1 "
> -
> -#define QEMU_4T3D "-object memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M " \
> -                  "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M "    \
> -                  "-device cxl-type3,bus=rp0,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 " \
> -                  "-object memory-backend-file,id=cxl-mem1,mem-path=%s,size=256M "    \
> -                  "-object memory-backend-file,id=lsa1,mem-path=%s,size=256M "    \
> -                  "-device cxl-type3,bus=rp1,memdev=cxl-mem1,lsa=lsa1,id=cxl-pmem1 " \
> -                  "-object memory-backend-file,id=cxl-mem2,mem-path=%s,size=256M "    \
> -                  "-object memory-backend-file,id=lsa2,mem-path=%s,size=256M "    \
> -                  "-device cxl-type3,bus=rp2,memdev=cxl-mem2,lsa=lsa2,id=cxl-pmem2 " \
> -                  "-object memory-backend-file,id=cxl-mem3,mem-path=%s,size=256M "    \
> -                  "-object memory-backend-file,id=lsa3,mem-path=%s,size=256M "    \
> -                  "-device cxl-type3,bus=rp3,memdev=cxl-mem3,lsa=lsa3,id=cxl-pmem3 "

If re-indenting makes sense (I'm really convinced it is worth the noise) then do it
in a precusor no-op patch so we can more easily see what is new here.)

 
> +#define QEMU_T3D_DEPRECATED \
> +  "-object memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M " \
> +  "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M "    \
> +  "-device cxl-type3,bus=rp0,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 "
> +
> +#define QEMU_T3D_PMEM \
> +  "-object memory-backend-file,id=m0,mem-path=%s,size=256M " \
> +  "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M " \
> +  "-device cxl-type3,bus=rp0,persistent-memdev=cxl-m0,lsa=lsa0,id=pmem0 "
> +
> +#define QEMU_T3D_VMEM \
> +  "-object memory-backend-ram,id=mem0,size=256M " \
> +  "-device cxl-type3,bus=rp0,volatile-memdev=mem0,id=mem0 "
> +
> +#define QEMU_T3D_VMEM_LSA \
> +  "-object memory-backend-ram,id=mem0,size=256M " \
> +  "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M " \
> +  "-device cxl-type3,bus=rp0,volatile-memdev=mem0,lsa=lsa0,id=mem0 "
> +
> +#define QEMU_2T3D \
> +  "-object memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M "    \
> +  "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M "    \
> +  "-device cxl-type3,bus=rp0,persistent-memdev=cxl-mem0,lsa=lsa0,id=pmem0 " \
> +  "-object memory-backend-file,id=cxl-mem1,mem-path=%s,size=256M "    \
> +  "-object memory-backend-file,id=lsa1,mem-path=%s,size=256M "    \
> +  "-device cxl-type3,bus=rp1,persistent-memdev=cxl-mem1,lsa=lsa1,id=pmem1 "
> +
> +#define QEMU_4T3D \
> +  "-object memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M " \
> +  "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M "    \
> +  "-device cxl-type3,bus=rp0,persistent-memdev=cxl-mem0,lsa=lsa0,id=pmem0 " \
> +  "-object memory-backend-file,id=cxl-mem1,mem-path=%s,size=256M "    \
> +  "-object memory-backend-file,id=lsa1,mem-path=%s,size=256M "    \
> +  "-device cxl-type3,bus=rp1,persistent-memdev=cxl-mem1,lsa=lsa1,id=pmem1 " \
> +  "-object memory-backend-file,id=cxl-mem2,mem-path=%s,size=256M "    \
> +  "-object memory-backend-file,id=lsa2,mem-path=%s,size=256M "    \
> +  "-device cxl-type3,bus=rp2,persistent-memdev=cxl-mem2,lsa=lsa2,id=pmem2 " \
> +  "-object memory-backend-file,id=cxl-mem3,mem-path=%s,size=256M "    \
> +  "-object memory-backend-file,id=lsa3,mem-path=%s,size=256M "    \
> +  "-device cxl-type3,bus=rp3,persistent-memdev=cxl-mem3,lsa=lsa3,id=pmem3 "
>  
>  static void cxl_basic_hb(void)
>  {
> @@ -95,14 +112,53 @@ static void cxl_2root_port(void)
>  }
>  




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/4] hw/cxl: Add CXL_CAPACITY_MULTIPLIER definition
  2022-10-26  0:47   ` [PATCH 2/4] hw/cxl: Add CXL_CAPACITY_MULTIPLIER definition Gregory Price
@ 2022-11-14 17:55       ` Jonathan Cameron via
  0 siblings, 0 replies; 23+ messages in thread
From: Jonathan Cameron @ 2022-11-14 17:55 UTC (permalink / raw)
  To: Gregory Price
  Cc: qemu-devel, linux-cxl, mst, marcel.apfelbaum, imammedo, ani,
	alison.schofield, dave, a.manzanares, bwidawsk, gregory.price,
	hchkuo, cbrowy, ira.weiny

On Tue, 25 Oct 2022 20:47:35 -0400
Gregory Price <gourry.memverge@gmail.com> wrote:

> Remove usage of magic numbers when accessing capacity fields and replace
> with CXL_CAPACITY_MULTIPLIER, matching the kernel definition.
> 
> Signed-off-by: Gregory Price <gregory.price@memverge.com>
> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Makes sense. I'll add it to my local tree.

Depending on how timing works out, I might roll this into a wider series
of cleanups sent to Michael or we might want to keep this as part of your
series.

Jonathan
 
> ---
>  hw/cxl/cxl-mailbox-utils.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 3e23d29e2d..d7543fd5b4 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -15,6 +15,8 @@
>  #include "qemu/log.h"
>  #include "qemu/uuid.h"
>  
> +#define CXL_CAPACITY_MULTIPLIER   0x10000000 /* SZ_256M */
> +
>  /*
>   * How to add a new command, example. The command set FOO, with cmd BAR.
>   *  1. Add the command set and cmd to the enum.
> @@ -267,7 +269,7 @@ static ret_code cmd_firmware_update_get_info(struct cxl_cmd *cmd,
>      } QEMU_PACKED *fw_info;
>      QEMU_BUILD_BUG_ON(sizeof(*fw_info) != 0x50);
>  
> -    if (cxl_dstate->pmem_size < (256 << 20)) {
> +    if (cxl_dstate->pmem_size < CXL_CAPACITY_MULTIPLIER) {
>          return CXL_MBOX_INTERNAL_ERROR;
>      }
>  
> @@ -412,7 +414,7 @@ static ret_code cmd_identify_memory_device(struct cxl_cmd *cmd,
>      CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
>      uint64_t size = cxl_dstate->pmem_size;
>  
> -    if (!QEMU_IS_ALIGNED(size, 256 << 20)) {
> +    if (!QEMU_IS_ALIGNED(size, CXL_CAPACITY_MULTIPLIER)) {
>          return CXL_MBOX_INTERNAL_ERROR;
>      }
>  
> @@ -422,8 +424,8 @@ static ret_code cmd_identify_memory_device(struct cxl_cmd *cmd,
>      /* PMEM only */
>      snprintf(id->fw_revision, 0x10, "BWFW VERSION %02d", 0);
>  
> -    id->total_capacity = size / (256 << 20);
> -    id->persistent_capacity = size / (256 << 20);
> +    id->total_capacity = size / CXL_CAPACITY_MULTIPLIER;
> +    id->persistent_capacity = size / CXL_CAPACITY_MULTIPLIER;
>      id->lsa_size = cvc->get_lsa_size(ct3d);
>      id->poison_list_max_mer[1] = 0x1; /* 256 poison records */
>  
> @@ -444,14 +446,14 @@ static ret_code cmd_ccls_get_partition_info(struct cxl_cmd *cmd,
>      QEMU_BUILD_BUG_ON(sizeof(*part_info) != 0x20);
>      uint64_t size = cxl_dstate->pmem_size;
>  
> -    if (!QEMU_IS_ALIGNED(size, 256 << 20)) {
> +    if (!QEMU_IS_ALIGNED(size, CXL_CAPACITY_MULTIPLIER)) {
>          return CXL_MBOX_INTERNAL_ERROR;
>      }
>  
>      /* PMEM only */
>      part_info->active_vmem = 0;
>      part_info->next_vmem = 0;
> -    part_info->active_pmem = size / (256 << 20);
> +    part_info->active_pmem = size / CXL_CAPACITY_MULTIPLIER;
>      part_info->next_pmem = 0;
>  
>      *len = sizeof(*part_info);


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/4] hw/cxl: Add CXL_CAPACITY_MULTIPLIER definition
@ 2022-11-14 17:55       ` Jonathan Cameron via
  0 siblings, 0 replies; 23+ messages in thread
From: Jonathan Cameron via @ 2022-11-14 17:55 UTC (permalink / raw)
  To: Gregory Price
  Cc: qemu-devel, linux-cxl, mst, marcel.apfelbaum, imammedo, ani,
	alison.schofield, dave, a.manzanares, bwidawsk, gregory.price,
	hchkuo, cbrowy, ira.weiny

On Tue, 25 Oct 2022 20:47:35 -0400
Gregory Price <gourry.memverge@gmail.com> wrote:

> Remove usage of magic numbers when accessing capacity fields and replace
> with CXL_CAPACITY_MULTIPLIER, matching the kernel definition.
> 
> Signed-off-by: Gregory Price <gregory.price@memverge.com>
> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Makes sense. I'll add it to my local tree.

Depending on how timing works out, I might roll this into a wider series
of cleanups sent to Michael or we might want to keep this as part of your
series.

Jonathan
 
> ---
>  hw/cxl/cxl-mailbox-utils.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 3e23d29e2d..d7543fd5b4 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -15,6 +15,8 @@
>  #include "qemu/log.h"
>  #include "qemu/uuid.h"
>  
> +#define CXL_CAPACITY_MULTIPLIER   0x10000000 /* SZ_256M */
> +
>  /*
>   * How to add a new command, example. The command set FOO, with cmd BAR.
>   *  1. Add the command set and cmd to the enum.
> @@ -267,7 +269,7 @@ static ret_code cmd_firmware_update_get_info(struct cxl_cmd *cmd,
>      } QEMU_PACKED *fw_info;
>      QEMU_BUILD_BUG_ON(sizeof(*fw_info) != 0x50);
>  
> -    if (cxl_dstate->pmem_size < (256 << 20)) {
> +    if (cxl_dstate->pmem_size < CXL_CAPACITY_MULTIPLIER) {
>          return CXL_MBOX_INTERNAL_ERROR;
>      }
>  
> @@ -412,7 +414,7 @@ static ret_code cmd_identify_memory_device(struct cxl_cmd *cmd,
>      CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
>      uint64_t size = cxl_dstate->pmem_size;
>  
> -    if (!QEMU_IS_ALIGNED(size, 256 << 20)) {
> +    if (!QEMU_IS_ALIGNED(size, CXL_CAPACITY_MULTIPLIER)) {
>          return CXL_MBOX_INTERNAL_ERROR;
>      }
>  
> @@ -422,8 +424,8 @@ static ret_code cmd_identify_memory_device(struct cxl_cmd *cmd,
>      /* PMEM only */
>      snprintf(id->fw_revision, 0x10, "BWFW VERSION %02d", 0);
>  
> -    id->total_capacity = size / (256 << 20);
> -    id->persistent_capacity = size / (256 << 20);
> +    id->total_capacity = size / CXL_CAPACITY_MULTIPLIER;
> +    id->persistent_capacity = size / CXL_CAPACITY_MULTIPLIER;
>      id->lsa_size = cvc->get_lsa_size(ct3d);
>      id->poison_list_max_mer[1] = 0x1; /* 256 poison records */
>  
> @@ -444,14 +446,14 @@ static ret_code cmd_ccls_get_partition_info(struct cxl_cmd *cmd,
>      QEMU_BUILD_BUG_ON(sizeof(*part_info) != 0x20);
>      uint64_t size = cxl_dstate->pmem_size;
>  
> -    if (!QEMU_IS_ALIGNED(size, 256 << 20)) {
> +    if (!QEMU_IS_ALIGNED(size, CXL_CAPACITY_MULTIPLIER)) {
>          return CXL_MBOX_INTERNAL_ERROR;
>      }
>  
>      /* PMEM only */
>      part_info->active_vmem = 0;
>      part_info->next_vmem = 0;
> -    part_info->active_pmem = size / (256 << 20);
> +    part_info->active_pmem = size / CXL_CAPACITY_MULTIPLIER;
>      part_info->next_pmem = 0;
>  
>      *len = sizeof(*part_info);



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] hw/cxl: Multi-Region CXL Type-3 Devices (Volatile and Persistent)
  2022-11-14 17:53       ` Jonathan Cameron via
  (?)
@ 2022-11-14 23:00       ` Gregory Price
  2022-11-17 13:53           ` Jonathan Cameron via
  -1 siblings, 1 reply; 23+ messages in thread
From: Gregory Price @ 2022-11-14 23:00 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Gregory Price, qemu-devel, linux-cxl, mst, marcel.apfelbaum,
	imammedo, ani, alison.schofield, dave, a.manzanares, bwidawsk,
	hchkuo, cbrowy, ira.weiny

On Mon, Nov 14, 2022 at 05:53:41PM +0000, Jonathan Cameron wrote:
> Hi Gregory,
> 
> I've not been rushing on this purely because we are after the feature
> freeze for this QEMU cycle so no great rush to line up new features
> (and there was some fun with the pull request the previous set of QEMU
> CXL features were in - unrelated to those patches).
> 
> A few comments inline.
> 
> Once I've chased down a segfault on power down of my qemu setup (that
> seems to have nothing to do with the CXL support. *sigh*) I'll push out
> an updated tree with this on it for testing purposes.
> 
> Thanks,
> 
> Jonathan
> 

All good, I've been wrapped up in other work.  Just ping me when you are
pushing a new branch and i'll gladly rebased and address the notes.

Regards
Gregory

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] hw/cxl: Multi-Region CXL Type-3 Devices (Volatile and Persistent)
  2022-11-14 23:00       ` Gregory Price
@ 2022-11-17 13:53           ` Jonathan Cameron via
  0 siblings, 0 replies; 23+ messages in thread
From: Jonathan Cameron @ 2022-11-17 13:53 UTC (permalink / raw)
  To: Gregory Price
  Cc: Gregory Price, qemu-devel, linux-cxl, mst, marcel.apfelbaum,
	imammedo, ani, alison.schofield, dave, a.manzanares, bwidawsk,
	hchkuo, cbrowy, ira.weiny

On Mon, 14 Nov 2022 18:00:59 -0500
Gregory Price <gregory.price@memverge.com> wrote:

> On Mon, Nov 14, 2022 at 05:53:41PM +0000, Jonathan Cameron wrote:
> > Hi Gregory,
> > 
> > I've not been rushing on this purely because we are after the feature
> > freeze for this QEMU cycle so no great rush to line up new features
> > (and there was some fun with the pull request the previous set of QEMU
> > CXL features were in - unrelated to those patches).
> > 
> > A few comments inline.
> > 
> > Once I've chased down a segfault on power down of my qemu setup (that
> > seems to have nothing to do with the CXL support. *sigh*) I'll push out
> > an updated tree with this on it for testing purposes.
> > 
> > Thanks,
> > 
> > Jonathan
> >   
> 
> All good, I've been wrapped up in other work.  Just ping me when you are
> pushing a new branch and i'll gladly rebased and address the notes.

https://gitlab.com/jic23/qemu/-/tree/cxl-2022-11-17

Has two patches from this series on top currently.  I'll switch those out
for your next version when available.

There is a segmentation fault on powering down the qemu VMs at the moment,
but it's not connected to the cxl code, but rather just memory backends
(happens with upstream, including when cxl is turned off and the memory
backends aren't being used).

I'm not currently carrying Ira's error injection series yet but otherwise this
has pretty much everything that is in flight.

Jonathan

> 
> Regards
> Gregory


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] hw/cxl: Multi-Region CXL Type-3 Devices (Volatile and Persistent)
@ 2022-11-17 13:53           ` Jonathan Cameron via
  0 siblings, 0 replies; 23+ messages in thread
From: Jonathan Cameron via @ 2022-11-17 13:53 UTC (permalink / raw)
  To: Gregory Price
  Cc: Gregory Price, qemu-devel, linux-cxl, mst, marcel.apfelbaum,
	imammedo, ani, alison.schofield, dave, a.manzanares, bwidawsk,
	hchkuo, cbrowy, ira.weiny

On Mon, 14 Nov 2022 18:00:59 -0500
Gregory Price <gregory.price@memverge.com> wrote:

> On Mon, Nov 14, 2022 at 05:53:41PM +0000, Jonathan Cameron wrote:
> > Hi Gregory,
> > 
> > I've not been rushing on this purely because we are after the feature
> > freeze for this QEMU cycle so no great rush to line up new features
> > (and there was some fun with the pull request the previous set of QEMU
> > CXL features were in - unrelated to those patches).
> > 
> > A few comments inline.
> > 
> > Once I've chased down a segfault on power down of my qemu setup (that
> > seems to have nothing to do with the CXL support. *sigh*) I'll push out
> > an updated tree with this on it for testing purposes.
> > 
> > Thanks,
> > 
> > Jonathan
> >   
> 
> All good, I've been wrapped up in other work.  Just ping me when you are
> pushing a new branch and i'll gladly rebased and address the notes.

https://gitlab.com/jic23/qemu/-/tree/cxl-2022-11-17

Has two patches from this series on top currently.  I'll switch those out
for your next version when available.

There is a segmentation fault on powering down the qemu VMs at the moment,
but it's not connected to the cxl code, but rather just memory backends
(happens with upstream, including when cxl is turned off and the memory
backends aren't being used).

I'm not currently carrying Ira's error injection series yet but otherwise this
has pretty much everything that is in flight.

Jonathan

> 
> Regards
> Gregory



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] hw/cxl: Multi-Region CXL Type-3 Devices (Volatile and Persistent)
  2022-11-14 17:53       ` Jonathan Cameron via
  (?)
  (?)
@ 2022-11-23 17:42       ` Gregory Price
  -1 siblings, 0 replies; 23+ messages in thread
From: Gregory Price @ 2022-11-23 17:42 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Gregory Price, qemu-devel, linux-cxl, mst, marcel.apfelbaum,
	imammedo, ani, alison.schofield, dave, a.manzanares, bwidawsk,
	hchkuo, cbrowy, ira.weiny

> > -  -object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=256M \
> > -  -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=256M \
> > +  -object memory-backend-file,pmem=true,id=pmem0,share=on,mem-path=/tmp/cxltest.raw,size=256M \
> > +  -object memory-backend-file,pmem=true,id=cxl-lsa0,share=on,mem-path=/tmp/lsa.raw,size=256M \
> 
> Why make the pmem=true change in here? If we want to do that I think it should be in a
> separate patch with explanation.
> 

this is mostly an observation that memory-backend's have a pmem option.
It was unclear to me that using this backend for a pmem region without
setting pmem=true was "correct", but i also am not sure it has a real
effect.  I'll drop this from the changeset.

> > +error_cleanup:
> > +    int i;
> > +    for (i = 0; i < cur_ent; i++) {
> > +        g_free(*cdat_table[i]);
> 
> Until the steal pointer above, *cdata_table not set to anything.
> Possibly gfree(table[i])?
> 
> 

good catch

> Hmm. I wonder if this is simpler done as below. Both look fine
> to me though so up to you for next version.  Trade off between
> slightly ugly nested logic, and the readability always lost when
> a ternary operator puts in an appearance.
> 
>     if (ct3d->hostvmem) {

this seems reasonable, pulled in

> 
> If we have both volatile and persistent and yet still only have our one HDM
> decoder, then I think a write into the persistent range will have the wrong offset.
> 
> dpa_offset == address space offset when we only had one region. Not so much any more.
> For persistent i think we'll need to subtract the size of the volatile region
> (possibly taking care with alignment - I need to check that).
>

I had originally done this, but it wasn't clear to me what was correct
here, I'll make the adjustment

> > diff --git a/tests/qtest/cxl-test.c b/tests/qtest/cxl-test.c
> > index 6baed747fa..a05ddc0c7b 100644
> > --- a/tests/qtest/cxl-test.c
> > +++ b/tests/qtest/cxl-test.c
> > @@ -34,29 +34,46 @@
> > -                  "-object memory-backend-file,id=cxl-mem3,mem-path=%s,size=256M "    \
> > -                  "-object memory-backend-file,id=lsa3,mem-path=%s,size=256M "    \
> > -                  "-device cxl-type3,bus=rp3,memdev=cxl-mem3,lsa=lsa3,id=cxl-pmem3 "
> 
> If re-indenting makes sense (I'm really convinced it is worth the noise) then do it
> in a precusor no-op patch so we can more easily see what is new here.)
> 

can do

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2022-11-23 17:42 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CGME20221026004835uscas1p1d37255ba8daaba8fc7e16e5129cb94b5@uscas1p1.samsung.com>
2022-10-26  0:47 ` [PATCH 0/4 v3] Multi-Region and Volatile Memory support for CXL Type-3 Devices Gregory Price
2022-10-26  0:47   ` [PATCH 1/4] hw/i386/pc.c: CXL Fixed Memory Window should not reserve e820 in bios Gregory Price
2022-10-26 20:33     ` Michael S. Tsirkin
2022-10-26  0:47   ` [PATCH 2/4] hw/cxl: Add CXL_CAPACITY_MULTIPLIER definition Gregory Price
2022-11-14 17:55     ` Jonathan Cameron
2022-11-14 17:55       ` Jonathan Cameron via
2022-10-26  0:47   ` [PATCH 3/4] hw/cxl: Multi-Region CXL Type-3 Devices (Volatile and Persistent) Gregory Price
2022-11-14 17:53     ` Jonathan Cameron
2022-11-14 17:53       ` Jonathan Cameron via
2022-11-14 23:00       ` Gregory Price
2022-11-17 13:53         ` Jonathan Cameron
2022-11-17 13:53           ` Jonathan Cameron via
2022-11-23 17:42       ` Gregory Price
2022-10-26  0:47   ` [PATCH 4/4] hw/acpi/cxl.c: Fill in SRAT for vmem/pmem if NUMA node is assigned Gregory Price
2022-10-26 20:13   ` [PATCH 0/4 v3] Multi-Region and Volatile Memory support for CXL Type-3 Devices Adam Manzanares
2022-10-26 20:47     ` Gregory Price
2022-10-27 10:58       ` Jonathan Cameron
2022-10-27 10:58         ` Jonathan Cameron via
2022-10-27 14:29         ` Gregory Price
2022-10-27 18:10         ` Adam Manzanares
2022-10-26 20:15   ` Michael S. Tsirkin
2022-10-26 20:20   ` Michael S. Tsirkin
2022-10-26 20:48     ` Gregory Price

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.