* [Qemu-devel] [RFC PATCH 0/4] ppc: spapr: virtual NVDIMM support
@ 2019-02-06  5:24 Shivaprasad G Bhat
  2019-02-06  5:25 ` [Qemu-devel] [RFC PATCH 1/4] mem: make nvdimm_device_list global Shivaprasad G Bhat
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Shivaprasad G Bhat @ 2019-02-06  5:24 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiaoguangrong.eric, sbhat, mst, bharata, qemu-ppc, vaibhav,
	imammedo, david

This patchset attempts to implement virtual NVDIMM support for pseries.

Under PAPR semantics, each NVDIMM device comprises multiple SCM (Storage Class
Memory) blocks. The hypervisor is expected to prepare the FDT for the NVDIMM
device and send the guest a hotplug interrupt with the new type
RTAS_LOG_V6_HP_TYPE_PMEM, which the upstream kernel already handles. In response
to that interrupt, the guest asks the hypervisor to bind each of the SCM blocks
of the NVDIMM device using hcalls. SCM block unbind requests can arrive in case
of driver errors or unplug (not supported now). NVDIMM label reads and writes
are done through hcalls.

There are also new hcalls, intended for future use and currently unused in the
kernel, for querying information such as the binding and the logical addresses
of the SCM blocks. The current patchset leaves them unimplemented.

Since each virtual NVDIMM device is divided into multiple SCM blocks, the bind,
unbind, and query hcalls on those blocks can arrive independently. This doesn't
fit well with the qemu device semantics, where map/unmap is done at the
granularity of the whole device/object. The patchset uses the existing NVDIMM
class structures for the implementation. Bind/unbind is left to happen at the
object_add/del phase itself instead of on demand in the hcalls.

The guest kernel makes bind/unbind requests for the virtual NVDIMM device at
region-level granularity. Without interleaving, each virtual NVDIMM device is
presented as a separate region. There is currently no way to configure virtual
NVDIMM interleaving for guests, so an hcall can never carry a partial
bind/unbind request covering only a subset of the SCM blocks of a virtual
NVDIMM. Hence it is safe to bind/unbind everything during object_add/del.

Memory hotplug in the free device-memory region is done in units of LMBs, each
256 MiB in size and expected to be aligned to 256 MiB. As the SCM blocks are
mapped into the same region, they also need to be aligned to this size for
subsequent memory hotplug to work. The minimum SCM block size is set to this
size for that reason and can be made user-configurable in the future if
required.
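
The alignment rule described above can be sketched in C as follows. This is
only an illustration and not code from the patchset: SCM_BLOCK_SIZE,
scm_align_down, scm_num_blocks and scm_is_aligned are hypothetical names, and
the 256 MiB value is simply the LMB size mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: the SCM block size equals the 256 MiB LMB size. */
#define SCM_BLOCK_SIZE (256ULL * 1024 * 1024)

/* Round value down to a multiple of block (block must be non-zero). */
static inline uint64_t scm_align_down(uint64_t value, uint64_t block)
{
    assert(block != 0);
    return value - (value % block);
}

/* Number of whole SCM blocks that fit in size bytes. */
static inline uint64_t scm_num_blocks(uint64_t size)
{
    return size / SCM_BLOCK_SIZE;
}

/* Non-zero when value (an address or a size) lies on an SCM block boundary. */
static inline int scm_is_aligned(uint64_t value)
{
    return (value % SCM_BLOCK_SIZE) == 0;
}
```

For instance, a 512 MiB region followed by a 4 KiB tail would be rounded down
to 512 MiB, that is, two SCM blocks.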

The first patch moves an existing static function to a common area so that the
subsequent patches can use it. The second patch implements
memory_device_set_region_size, for which we already have a "get" implementation.
Of the remaining two, one adds the FDT entries and basic device support, and
the other adds the hcall implementations.

The patches are also available at https://github.com/ShivaprasadGBhat/qemu.git -
pseries-nvdimm branch and can be used with the upstream kernel. ndctl can be
used for configuring the nvdimms inside the guest.

This is how it can be used:
Add nvdimm=on to the qemu machine argument, for example:
-machine pseries,nvdimm=on
For coldplug, add the device on the qemu command line as shown below:
-object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0
-device nvdimm,label-size=128k,memdev=memnvdimm0,id=nvdimm0,slot=0

For hotplug, add the device from the monitor as shown below:
object_add memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdi
device_add nvdimm,label-size=128k,memdev=memnvdimm0,id=nvdimm0,slot=0

---

Shivaprasad G Bhat (4):
      mem: make nvdimm_device_list global
      mem: implement memory_device_set_region_size
      spapr: Add NVDIMM device support
      spapr: Add Hcalls to support PAPR NVDIMM device


 default-configs/ppc64-softmmu.mak |    1 
 hw/acpi/nvdimm.c                  |   27 ----
 hw/mem/memory-device.c            |   15 ++
 hw/mem/nvdimm.c                   |   27 ++++
 hw/ppc/spapr.c                    |  212 ++++++++++++++++++++++++++++++++--
 hw/ppc/spapr_drc.c                |   17 +++
 hw/ppc/spapr_events.c             |    4 +
 hw/ppc/spapr_hcall.c              |  230 +++++++++++++++++++++++++++++++++++++
 include/hw/mem/memory-device.h    |    2 
 include/hw/mem/nvdimm.h           |    2 
 include/hw/ppc/spapr.h            |   20 +++
 include/hw/ppc/spapr_drc.h        |    9 +
 12 files changed, 526 insertions(+), 40 deletions(-)

--
Signature


* [Qemu-devel] [RFC PATCH 1/4] mem: make nvdimm_device_list global
  2019-02-06  5:24 [Qemu-devel] [RFC PATCH 0/4] ppc: spapr: virtual NVDIMM support Shivaprasad G Bhat
@ 2019-02-06  5:25 ` Shivaprasad G Bhat
  2019-02-19  7:59   ` Igor Mammedov
  2019-02-06  5:26 ` [Qemu-devel] [RFC PATCH 2/4] mem: implement memory_device_set_region_size Shivaprasad G Bhat
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Shivaprasad G Bhat @ 2019-02-06  5:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiaoguangrong.eric, sbhat, mst, bharata, qemu-ppc, vaibhav,
	imammedo, david

Subsequent patches need nvdimm_get_device_list() to walk the list of NVDIMM
devices. Move it, along with its helper nvdimm_device_list(), to a common area.

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
---
 hw/acpi/nvdimm.c        |   27 ---------------------------
 hw/mem/nvdimm.c         |   27 +++++++++++++++++++++++++++
 include/hw/mem/nvdimm.h |    2 ++
 3 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index e53b2cb681..34322298c2 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -33,33 +33,6 @@
 #include "hw/nvram/fw_cfg.h"
 #include "hw/mem/nvdimm.h"
 
-static int nvdimm_device_list(Object *obj, void *opaque)
-{
-    GSList **list = opaque;
-
-    if (object_dynamic_cast(obj, TYPE_NVDIMM)) {
-        *list = g_slist_append(*list, DEVICE(obj));
-    }
-
-    object_child_foreach(obj, nvdimm_device_list, opaque);
-    return 0;
-}
-
-/*
- * inquire NVDIMM devices and link them into the list which is
- * returned to the caller.
- *
- * Note: it is the caller's responsibility to free the list to avoid
- * memory leak.
- */
-static GSList *nvdimm_get_device_list(void)
-{
-    GSList *list = NULL;
-
-    object_child_foreach(qdev_get_machine(), nvdimm_device_list, &list);
-    return list;
-}
-
 #define NVDIMM_UUID_LE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)             \
    { (a) & 0xff, ((a) >> 8) & 0xff, ((a) >> 16) & 0xff, ((a) >> 24) & 0xff, \
      (b) & 0xff, ((b) >> 8) & 0xff, (c) & 0xff, ((c) >> 8) & 0xff,          \
diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index bf2adf5e16..f221ec7a9a 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -29,6 +29,33 @@
 #include "hw/mem/nvdimm.h"
 #include "hw/mem/memory-device.h"
 
+static int nvdimm_device_list(Object *obj, void *opaque)
+{
+    GSList **list = opaque;
+
+    if (object_dynamic_cast(obj, TYPE_NVDIMM)) {
+        *list = g_slist_append(*list, DEVICE(obj));
+    }
+
+    object_child_foreach(obj, nvdimm_device_list, opaque);
+    return 0;
+}
+
+/*
+ * inquire NVDIMM devices and link them into the list which is
+ * returned to the caller.
+ *
+ * Note: it is the caller's responsibility to free the list to avoid
+ * memory leak.
+ */
+GSList *nvdimm_get_device_list(void)
+{
+    GSList *list = NULL;
+
+    object_child_foreach(qdev_get_machine(), nvdimm_device_list, &list);
+    return list;
+}
+
 static void nvdimm_get_label_size(Object *obj, Visitor *v, const char *name,
                                   void *opaque, Error **errp)
 {
diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
index c5c9b3c7f8..e8b086f2df 100644
--- a/include/hw/mem/nvdimm.h
+++ b/include/hw/mem/nvdimm.h
@@ -150,4 +150,6 @@ void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
                        uint32_t ram_slots);
 void nvdimm_plug(AcpiNVDIMMState *state);
 void nvdimm_acpi_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev);
+GSList *nvdimm_get_device_list(void);
+
 #endif


* [Qemu-devel] [RFC PATCH 2/4] mem: implement memory_device_set_region_size
  2019-02-06  5:24 [Qemu-devel] [RFC PATCH 0/4] ppc: spapr: virtual NVDIMM support Shivaprasad G Bhat
  2019-02-06  5:25 ` [Qemu-devel] [RFC PATCH 1/4] mem: make nvdimm_device_list global Shivaprasad G Bhat
@ 2019-02-06  5:26 ` Shivaprasad G Bhat
  2019-02-06  5:26 ` [Qemu-devel] [RFC PATCH 3/4] spapr: Add NVDIMM device support Shivaprasad G Bhat
  2019-02-06  5:26 ` [Qemu-devel] [RFC PATCH 4/4] spapr: Add Hcalls to support PAPR NVDIMM device Shivaprasad G Bhat
  3 siblings, 0 replies; 19+ messages in thread
From: Shivaprasad G Bhat @ 2019-02-06  5:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiaoguangrong.eric, sbhat, mst, bharata, qemu-ppc, vaibhav,
	imammedo, david

The PAPR NVDIMM implementation needs memory_device_set_region_size to align
the region size to the SCM block size.

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
---
 hw/mem/memory-device.c         |   15 +++++++++++++++
 include/hw/mem/memory-device.h |    2 ++
 2 files changed, 17 insertions(+)

diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
index 5f2c408036..ad0419e203 100644
--- a/hw/mem/memory-device.c
+++ b/hw/mem/memory-device.c
@@ -330,6 +330,21 @@ uint64_t memory_device_get_region_size(const MemoryDeviceState *md,
     return memory_region_size(mr);
 }
 
+void memory_device_set_region_size(const MemoryDeviceState *md,
+                                   uint64_t size, Error **errp)
+{
+    const MemoryDeviceClass *mdc = MEMORY_DEVICE_GET_CLASS(md);
+    MemoryRegion *mr;
+
+    /* get_memory_region() takes a non-const pointer, so drop const here */
+    mr = mdc->get_memory_region((MemoryDeviceState *)md, errp);
+    if (!mr) {
+        return;
+    }
+
+    memory_region_set_size(mr, size);
+}
+
 static const TypeInfo memory_device_info = {
     .name          = TYPE_MEMORY_DEVICE,
     .parent        = TYPE_INTERFACE,
diff --git a/include/hw/mem/memory-device.h b/include/hw/mem/memory-device.h
index 0293a96abb..ba9b72fd28 100644
--- a/include/hw/mem/memory-device.h
+++ b/include/hw/mem/memory-device.h
@@ -103,5 +103,7 @@ void memory_device_plug(MemoryDeviceState *md, MachineState *ms);
 void memory_device_unplug(MemoryDeviceState *md, MachineState *ms);
 uint64_t memory_device_get_region_size(const MemoryDeviceState *md,
                                        Error **errp);
+void memory_device_set_region_size(const MemoryDeviceState *md,
+                                   uint64_t size, Error **errp);
 
 #endif


* [Qemu-devel] [RFC PATCH 3/4] spapr: Add NVDIMM device support
  2019-02-06  5:24 [Qemu-devel] [RFC PATCH 0/4] ppc: spapr: virtual NVDIMM support Shivaprasad G Bhat
  2019-02-06  5:25 ` [Qemu-devel] [RFC PATCH 1/4] mem: make nvdimm_device_list global Shivaprasad G Bhat
  2019-02-06  5:26 ` [Qemu-devel] [RFC PATCH 2/4] mem: implement memory_device_set_region_size Shivaprasad G Bhat
@ 2019-02-06  5:26 ` Shivaprasad G Bhat
  2019-02-12  1:49   ` David Gibson
  2019-02-19  8:11   ` Igor Mammedov
  2019-02-06  5:26 ` [Qemu-devel] [RFC PATCH 4/4] spapr: Add Hcalls to support PAPR NVDIMM device Shivaprasad G Bhat
  3 siblings, 2 replies; 19+ messages in thread
From: Shivaprasad G Bhat @ 2019-02-06  5:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiaoguangrong.eric, sbhat, mst, bharata, qemu-ppc, vaibhav,
	imammedo, david

Add support for NVDIMM devices for sPAPR. Piggyback on the existing nvdimm
device interface in QEMU to support virtual NVDIMM devices for Power (we may
have to revisit this later). Create the required DT entries for the device
(some entries have dummy values right now).

The patch creates the required DT node and sends a hotplug interrupt to the
guest. The guest is expected to take the normal DR resource add path in
response and start issuing PAPR SCM hcalls.
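
As a rough illustration of how the DT node for one device is identified, the
sketch below derives a connector id from the guest address and formats the
"pmem@<index>" node name. The helper names and SCM_BLOCK_SIZE are hypothetical;
the DRC index is treated as an opaque input here, since QEMU derives it from
the connector type and id.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption for this sketch: one SCM block is 256 MiB. */
#define SCM_BLOCK_SIZE (256ULL * 1024 * 1024)

/* Connector id for an NVDIMM mapped at guest physical address addr. */
static inline uint32_t pmem_drc_id(uint64_t addr)
{
    return (uint32_t)(addr / SCM_BLOCK_SIZE);
}

/*
 * Format the node name "pmem@<drc index in hex>" into buf.
 * Returns the number of characters the full name needs (as snprintf does).
 */
static inline int pmem_node_name(char *buf, size_t len, uint32_t drc_index)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "pmem@%" PRIx32, drc_index);
}
```

Using snprintf with an explicit buffer length avoids overrunning the name
buffer if the format ever grows.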

This is how it can be used:
Add nvdimm=on to the qemu machine argument, for example:
-machine pseries,nvdimm=on
For coldplug, add the device on the qemu command line as shown below:
-object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0.img,share=yes,size=512m
-device nvdimm,label-size=128k,memdev=memnvdimm0,id=nvdimm0,slot=0

For hotplug, add the device from the monitor as shown below:
object_add memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0.img,share=yes,size=512m
device_add nvdimm,label-size=128k,memdev=memnvdimm0,id=nvdimm0,slot=0

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
               [Early implementation]
---
 default-configs/ppc64-softmmu.mak |    1 
 hw/ppc/spapr.c                    |  212 +++++++++++++++++++++++++++++++++++--
 hw/ppc/spapr_drc.c                |   17 +++
 hw/ppc/spapr_events.c             |    4 +
 include/hw/ppc/spapr.h            |   10 ++
 include/hw/ppc/spapr_drc.h        |    9 ++
 6 files changed, 241 insertions(+), 12 deletions(-)

diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
index 7f34ad0528..b6e1aa5125 100644
--- a/default-configs/ppc64-softmmu.mak
+++ b/default-configs/ppc64-softmmu.mak
@@ -20,4 +20,5 @@ CONFIG_XIVE=$(CONFIG_PSERIES)
 CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
 CONFIG_MEM_DEVICE=y
 CONFIG_DIMM=y
+CONFIG_NVDIMM=y
 CONFIG_SPAPR_RNG=y
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 0fcdd35cbe..7e7a1a8041 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -73,6 +73,7 @@
 #include "qemu/cutils.h"
 #include "hw/ppc/spapr_cpu_core.h"
 #include "hw/mem/memory-device.h"
+#include "hw/mem/nvdimm.h"
 
 #include <libfdt.h>
 
@@ -690,6 +691,7 @@ static int spapr_populate_drmem_v2(sPAPRMachineState *spapr, void *fdt,
     uint8_t *int_buf, *cur_index, buf_len;
     int ret;
     uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
+    uint64_t scm_block_size = SPAPR_MINIMUM_SCM_BLOCK_SIZE;
     uint64_t addr, cur_addr, size;
     uint32_t nr_boot_lmbs = (machine->device_memory->base / lmb_size);
     uint64_t mem_end = machine->device_memory->base +
@@ -726,15 +728,24 @@ static int spapr_populate_drmem_v2(sPAPRMachineState *spapr, void *fdt,
             nr_entries++;
         }
 
-        /* Entry for DIMM */
-        drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
-        g_assert(drc);
-        elem = spapr_get_drconf_cell(size / lmb_size, addr,
-                                     spapr_drc_index(drc), node,
-                                     SPAPR_LMB_FLAGS_ASSIGNED);
+        if (info->value->type == MEMORY_DEVICE_INFO_KIND_NVDIMM) {
+            /* Entry for NVDIMM */
+            drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM, addr / scm_block_size);
+            g_assert(drc);
+            elem = spapr_get_drconf_cell(size / scm_block_size, addr,
+                                         spapr_drc_index(drc), -1, 0);
+            cur_addr = ROUND_UP(addr + size, scm_block_size);
+        } else {
+            /* Entry for DIMM */
+            drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
+            g_assert(drc);
+            elem = spapr_get_drconf_cell(size / lmb_size, addr,
+                                         spapr_drc_index(drc), node,
+                                         SPAPR_LMB_FLAGS_ASSIGNED);
+            cur_addr = addr + size;
+        }
         QSIMPLEQ_INSERT_TAIL(&drconf_queue, elem, entry);
         nr_entries++;
-        cur_addr = addr + size;
     }
 
     /* Entry for remaining hotpluggable area */
@@ -1225,6 +1236,42 @@ static void spapr_dt_hypervisor(sPAPRMachineState *spapr, void *fdt)
     }
 }
 
+static int spapr_populate_nvdimm_node(void *fdt, int fdt_offset,
+                                      uint32_t node, uint64_t addr,
+                                      uint64_t size, uint64_t label_size);
+static void spapr_create_nvdimm(void *fdt)
+{
+    int offset = fdt_subnode_offset(fdt, 0, "persistent-memory");
+    GSList *list, *dimms;
+
+    if (offset < 0) {
+        offset = fdt_add_subnode(fdt, 0, "persistent-memory");
+        _FDT(offset);
+        _FDT((fdt_setprop_cell(fdt, offset, "#address-cells", 0x2)));
+        _FDT((fdt_setprop_cell(fdt, offset, "#size-cells", 0x0)));
+        _FDT((fdt_setprop_string(fdt, offset, "name", "persistent-memory")));
+        _FDT((fdt_setprop_string(fdt, offset, "device_type",
+                                 "ibm,persistent-memory")));
+    }
+
+    /*NB : Add drc-info array here */
+
+    /* Create DT entries for cold plugged NVDIMM devices */
+    list = nvdimm_get_device_list();
+    for (dimms = list; dimms; dimms = dimms->next) {
+        NVDIMMDevice *nvdimm = dimms->data;
+        PCDIMMDevice *di = PC_DIMM(nvdimm);
+        uint64_t lsize = nvdimm->label_size;
+        int size = object_property_get_int(OBJECT(nvdimm), PC_DIMM_SIZE_PROP,
+                                           NULL);
+
+        spapr_populate_nvdimm_node(fdt, offset, di->node, di->addr,
+                                   size, lsize);
+    }
+    g_slist_free(list);
+    return;
+}
+
 static void *spapr_build_fdt(sPAPRMachineState *spapr)
 {
     MachineState *machine = MACHINE(spapr);
@@ -1348,6 +1395,11 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr)
         exit(1);
     }
 
+    /* NVDIMM devices */
+    if (spapr->nvdimm_enabled) {
+        spapr_create_nvdimm(fdt);
+    }
+
     return fdt;
 }
 
@@ -3143,6 +3195,20 @@ static void spapr_set_ic_mode(Object *obj, const char *value, Error **errp)
     }
 }
 
+static bool spapr_get_nvdimm(Object *obj, Error **errp)
+{
+    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
+
+    return spapr->nvdimm_enabled;
+}
+
+static void spapr_set_nvdimm(Object *obj, bool value, Error **errp)
+{
+    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
+
+    spapr->nvdimm_enabled = value;
+}
+
 static void spapr_instance_init(Object *obj)
 {
     sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
@@ -3188,6 +3254,11 @@ static void spapr_instance_init(Object *obj)
     object_property_set_description(obj, "ic-mode",
                  "Specifies the interrupt controller mode (xics, xive, dual)",
                  NULL);
+    object_property_add_bool(obj, "nvdimm",
+                            spapr_get_nvdimm, spapr_set_nvdimm, NULL);
+    object_property_set_description(obj, "nvdimm",
+                                    "Enable support for nvdimm devices",
+                                    NULL);
 }
 
 static void spapr_machine_finalizefn(Object *obj)
@@ -3267,12 +3338,103 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t size,
     }
 }
 
+static int spapr_populate_nvdimm_node(void *fdt, int fdt_offset, uint32_t node,
+                                      uint64_t addr, uint64_t size,
+                                      uint64_t label_size)
+{
+    int offset;
+    char buf[40];
+    GString *lcode = g_string_sized_new(10);
+    sPAPRDRConnector *drc;
+    QemuUUID uuid;
+    uint32_t drc_idx;
+    uint32_t associativity[] = {
+        cpu_to_be32(0x4), /* length */
+        cpu_to_be32(0x0), cpu_to_be32(0x0),
+        cpu_to_be32(0x0), cpu_to_be32(node)
+    };
+
+    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM,
+                          addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
+    g_assert(drc);
+
+    drc_idx = spapr_drc_index(drc);
+
+    sprintf(buf, "pmem@%x", drc_idx);
+    offset = fdt_add_subnode(fdt, fdt_offset, buf);
+    _FDT(offset);
+
+    _FDT((fdt_setprop_cell(fdt, offset, "reg", drc_idx)));
+    _FDT((fdt_setprop_string(fdt, offset, "compatible", "ibm,pmemory")));
+    _FDT((fdt_setprop_string(fdt, offset, "name", "pmem")));
+    _FDT((fdt_setprop_string(fdt, offset, "device_type", "ibm,pmemory")));
+
+    /*NB : Supposed to be random strings. Currently empty 10 strings! */
+    _FDT((fdt_setprop(fdt, offset, "ibm,loc-code", lcode->str, lcode->len)));
+    g_string_free(lcode, TRUE);
+
+    _FDT((fdt_setprop(fdt, offset, "ibm,associativity", associativity,
+                      sizeof(associativity))));
+    g_random_set_seed(drc_idx);
+    qemu_uuid_generate(&uuid);
+
+    qemu_uuid_unparse(&uuid, buf);
+    _FDT((fdt_setprop_string(fdt, offset, "ibm,unit-guid", buf)));
+
+    _FDT((fdt_setprop_cell(fdt, offset, "ibm,my-drc-index", drc_idx)));
+
+    /*NB : What it should be? */
+    _FDT(fdt_setprop_cell(fdt, offset, "ibm,latency-attribute", 828));
+
+    _FDT((fdt_setprop_u64(fdt, offset, "ibm,block-size",
+                          SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
+    _FDT((fdt_setprop_u64(fdt, offset, "ibm,number-of-blocks",
+                          size / SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
+    _FDT((fdt_setprop_cell(fdt, offset, "ibm,metadata-size", label_size)));
+
+    return offset;
+}
+
+static void spapr_add_nvdimm(DeviceState *dev, uint64_t addr,
+                             uint64_t size, uint32_t node,
+                             Error **errp)
+{
+    sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_hotplug_handler(dev));
+    sPAPRDRConnector *drc;
+    bool hotplugged = spapr_drc_hotplugged(dev);
+    NVDIMMDevice *nvdimm = NVDIMM(OBJECT(dev));
+    void *fdt;
+    int fdt_offset, fdt_size;
+    Error *local_err = NULL;
+
+    spapr_dr_connector_new(OBJECT(spapr), TYPE_SPAPR_DRC_PMEM,
+                           addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
+    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM,
+                          addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
+    g_assert(drc);
+
+    fdt = create_device_tree(&fdt_size);
+    fdt_offset = spapr_populate_nvdimm_node(fdt, 0, node, addr,
+                                            size, nvdimm->label_size);
+
+    spapr_drc_attach(drc, dev, fdt, fdt_offset, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    if (hotplugged) {
+        spapr_hotplug_req_add_by_index(drc);
+    }
+}
+
 static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
                               Error **errp)
 {
     Error *local_err = NULL;
     sPAPRMachineState *ms = SPAPR_MACHINE(hotplug_dev);
     PCDIMMDevice *dimm = PC_DIMM(dev);
+    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
     uint64_t size, addr;
     uint32_t node;
 
@@ -3291,9 +3453,14 @@ static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
 
     node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP,
                                     &error_abort);
-    spapr_add_lmbs(dev, addr, size, node,
-                   spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
-                   &local_err);
+    if (!is_nvdimm) {
+        spapr_add_lmbs(dev, addr, size, node,
+                       spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
+                       &local_err);
+    } else {
+        spapr_add_nvdimm(dev, addr, size, node, &local_err);
+    }
+
     if (local_err) {
         goto out_unplug;
     }
@@ -3311,6 +3478,7 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
 {
     const sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(hotplug_dev);
     sPAPRMachineState *spapr = SPAPR_MACHINE(hotplug_dev);
+    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
     PCDIMMDevice *dimm = PC_DIMM(dev);
     Error *local_err = NULL;
     uint64_t size;
@@ -3328,10 +3496,30 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
         return;
     }
 
-    if (size % SPAPR_MEMORY_BLOCK_SIZE) {
+    if (!is_nvdimm && size % SPAPR_MEMORY_BLOCK_SIZE) {
         error_setg(errp, "Hotplugged memory size must be a multiple of "
-                      "%" PRIu64 " MB", SPAPR_MEMORY_BLOCK_SIZE / MiB);
+                          "%" PRIu64 " MB", SPAPR_MEMORY_BLOCK_SIZE / MiB);
         return;
+    } else if (is_nvdimm) {
+        NVDIMMDevice *nvdimm = NVDIMM(OBJECT(dev));
+        if ((nvdimm->label_size + size) % SPAPR_MINIMUM_SCM_BLOCK_SIZE) {
+            error_setg(errp, "NVDIMM memory size must be a multiple of "
+                       "%" PRIu64 "MB", SPAPR_MINIMUM_SCM_BLOCK_SIZE / MiB);
+            return;
+        }
+        if (((nvdimm->label_size + size) / SPAPR_MINIMUM_SCM_BLOCK_SIZE) == 1) {
+            error_setg(errp, "NVDIMM size must be at least "
+                       "%" PRIu64 "MB", 2 * SPAPR_MINIMUM_SCM_BLOCK_SIZE / MiB);
+            return;
+        }
+
+        /* Align to scm block size, exclude the label */
+        memory_device_set_region_size(MEMORY_DEVICE(nvdimm),
+               QEMU_ALIGN_DOWN(size, SPAPR_MINIMUM_SCM_BLOCK_SIZE), &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
     }
 
     memdev = object_property_get_link(OBJECT(dimm), PC_DIMM_MEMDEV_PROP,
diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 2edb7d1e9c..94ddd102cc 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -696,6 +696,16 @@ static void spapr_drc_lmb_class_init(ObjectClass *k, void *data)
     drck->release = spapr_lmb_release;
 }
 
+static void spapr_drc_pmem_class_init(ObjectClass *k, void *data)
+{
+    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_CLASS(k);
+
+    drck->typeshift = SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM;
+    drck->typename = "MEM";
+    drck->drc_name_prefix = "PMEM ";
+    drck->release = NULL;
+}
+
 static const TypeInfo spapr_dr_connector_info = {
     .name          = TYPE_SPAPR_DR_CONNECTOR,
     .parent        = TYPE_DEVICE,
@@ -739,6 +749,12 @@ static const TypeInfo spapr_drc_lmb_info = {
     .class_init    = spapr_drc_lmb_class_init,
 };
 
+static const TypeInfo spapr_drc_pmem_info = {
+    .name          = TYPE_SPAPR_DRC_PMEM,
+    .parent        = TYPE_SPAPR_DRC_LOGICAL,
+    .class_init    = spapr_drc_pmem_class_init,
+};
+
 /* helper functions for external users */
 
 sPAPRDRConnector *spapr_drc_by_index(uint32_t index)
@@ -1189,6 +1205,7 @@ static void spapr_drc_register_types(void)
     type_register_static(&spapr_drc_cpu_info);
     type_register_static(&spapr_drc_pci_info);
     type_register_static(&spapr_drc_lmb_info);
+    type_register_static(&spapr_drc_pmem_info);
 
     spapr_rtas_register(RTAS_SET_INDICATOR, "set-indicator",
                         rtas_set_indicator);
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index 32719a1b72..a4fed84346 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -193,6 +193,7 @@ struct rtas_event_log_v6_hp {
 #define RTAS_LOG_V6_HP_TYPE_SLOT                         3
 #define RTAS_LOG_V6_HP_TYPE_PHB                          4
 #define RTAS_LOG_V6_HP_TYPE_PCI                          5
+#define RTAS_LOG_V6_HP_TYPE_PMEM                         6
     uint8_t hotplug_action;
 #define RTAS_LOG_V6_HP_ACTION_ADD                        1
 #define RTAS_LOG_V6_HP_ACTION_REMOVE                     2
@@ -526,6 +527,9 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
     case SPAPR_DR_CONNECTOR_TYPE_CPU:
         hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_CPU;
         break;
+    case SPAPR_DR_CONNECTOR_TYPE_PMEM:
+        hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PMEM;
+        break;
     default:
         /* we shouldn't be signaling hotplug events for resources
          * that don't support them
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index a947a0a0dc..21a9709afe 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -187,6 +187,7 @@ struct sPAPRMachineState {
 
     bool cmd_line_caps[SPAPR_CAP_NUM];
     sPAPRCapabilities def, eff, mig;
+    bool nvdimm_enabled;
 };
 
 #define H_SUCCESS         0
@@ -798,6 +799,15 @@ int spapr_rtc_import_offset(sPAPRRTCState *rtc, int64_t legacy_offset);
 #define SPAPR_LMB_FLAGS_DRC_INVALID 0x00000020
 #define SPAPR_LMB_FLAGS_RESERVED 0x00000080
 
+/*
+ * The nvdimm size should be aligned to SCM block size.
+ * The SCM block size should be aligned to SPAPR_MEMORY_BLOCK_SIZE
+ * so that SCM regions do not overlap with dimm memory regions.
+ * The SCM devices can have variable block sizes. For now, fixing the
+ * block size to the minimum value.
+ */
+#define SPAPR_MINIMUM_SCM_BLOCK_SIZE SPAPR_MEMORY_BLOCK_SIZE
+
 void spapr_do_system_reset_on_cpu(CPUState *cs, run_on_cpu_data arg);
 
 #define HTAB_SIZE(spapr)        (1ULL << ((spapr)->htab_shift))
diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
index f6ff32e7e2..65925d00b1 100644
--- a/include/hw/ppc/spapr_drc.h
+++ b/include/hw/ppc/spapr_drc.h
@@ -70,6 +70,13 @@
 #define SPAPR_DRC_LMB(obj) OBJECT_CHECK(sPAPRDRConnector, (obj), \
                                         TYPE_SPAPR_DRC_LMB)
 
+#define TYPE_SPAPR_DRC_PMEM "spapr-drc-pmem"
+#define SPAPR_DRC_PMEM_GET_CLASS(obj) \
+        OBJECT_GET_CLASS(sPAPRDRConnectorClass, obj, TYPE_SPAPR_DRC_PMEM)
+#define SPAPR_DRC_PMEM_CLASS(klass) \
+        OBJECT_CLASS_CHECK(sPAPRDRConnectorClass, klass, TYPE_SPAPR_DRC_PMEM)
+#define SPAPR_DRC_PMEM(obj) OBJECT_CHECK(sPAPRDRConnector, (obj), \
+                                        TYPE_SPAPR_DRC_PMEM)
 /*
  * Various hotplug types managed by sPAPRDRConnector
  *
@@ -87,6 +94,7 @@ typedef enum {
     SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO = 3,
     SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI = 4,
     SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB = 8,
+    SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM = 9,
 } sPAPRDRConnectorTypeShift;
 
 typedef enum {
@@ -96,6 +104,7 @@ typedef enum {
     SPAPR_DR_CONNECTOR_TYPE_VIO = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO,
     SPAPR_DR_CONNECTOR_TYPE_PCI = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI,
     SPAPR_DR_CONNECTOR_TYPE_LMB = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB,
+    SPAPR_DR_CONNECTOR_TYPE_PMEM = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM,
 } sPAPRDRConnectorType;
 
 /*


* [Qemu-devel] [RFC PATCH 4/4] spapr: Add Hcalls to support PAPR NVDIMM device
  2019-02-06  5:24 [Qemu-devel] [RFC PATCH 0/4] ppc: spapr: virtual NVDIMM support Shivaprasad G Bhat
                   ` (2 preceding siblings ...)
  2019-02-06  5:26 ` [Qemu-devel] [RFC PATCH 3/4] spapr: Add NVDIMM device support Shivaprasad G Bhat
@ 2019-02-06  5:26 ` Shivaprasad G Bhat
  2019-02-12  2:28   ` David Gibson
  3 siblings, 1 reply; 19+ messages in thread
From: Shivaprasad G Bhat @ 2019-02-06  5:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiaoguangrong.eric, sbhat, mst, bharata, qemu-ppc, vaibhav,
	imammedo, david

This patch implements a few of the hcalls necessary for nvdimm support.

Under PAPR semantics, each NVDIMM device comprises multiple SCM (Storage Class
Memory) blocks. The guest asks the hypervisor to bind each of the SCM blocks of
the NVDIMM device using hcalls. SCM block unbind requests can arrive in case of
driver errors or unplug (not supported now). NVDIMM label reads and writes are
done through hcalls.

Since each virtual NVDIMM device is divided into multiple SCM blocks, the bind,
unbind, and query hcalls on those blocks can arrive independently. This doesn't
fit well with the qemu device semantics, where map/unmap is done at the
granularity of the whole device/object. The patch doesn't actually bind/unbind
in the hcalls but lets that happen at the object_add/del phase itself instead.

The guest kernel makes bind/unbind requests for the virtual NVDIMM device at
region granularity. Without interleaving, each virtual NVDIMM device is
presented as a separate region. There is currently no way to configure
virtual NVDIMM interleaving for guests, so an hcall can never carry a partial
bind/unbind request covering only a subset of a virtual NVDIMM's SCM blocks.
Hence it is safe to bind/unbind everything during object_add/del.
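
As a rough sketch of the block arithmetic behind that argument: the number of
SCM blocks is the device size divided by the block size, and a bind request
is described by a starting block index and a block count. The names below
(scm_check_bind() and the enum) are hypothetical and the function is not the
patch's hcall handler. It also rejects a start index equal to the block
count, which is stricter than the comparison used in the diff.

```c
#include <stdint.h>

/* Hypothetical result codes for this sketch only. */
enum scm_bind_check {
    SCM_BIND_OK,
    SCM_BIND_BAD_START,   /* start index is past the last block */
    SCM_BIND_BAD_COUNT,   /* range runs past the end of the device */
    SCM_BIND_PARTIAL,     /* valid range, but not the whole device */
};

/* block_size must be non-zero; the caller is expected to guarantee that. */
static enum scm_bind_check scm_check_bind(uint64_t dev_size,
                                          uint64_t block_size,
                                          uint64_t start, uint64_t count)
{
    uint64_t total = dev_size / block_size;

    if (start >= total) {
        return SCM_BIND_BAD_START;
    }

    /* Written as a subtraction so that start + count cannot overflow. */
    if (count > total - start) {
        return SCM_BIND_BAD_COUNT;
    }

    /*
     * Under the assumption stated above (no interleaving), a request is
     * expected to span the whole device.
     */
    if (start != 0 || count != total) {
        return SCM_BIND_PARTIAL;
    }

    return SCM_BIND_OK;
}
```

With an assumed 256 MiB block size, a 512 MiB device has two blocks, so only
a request for blocks 0..1 is classified as covering the whole device.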

The kernel does not use the following hcalls today: h_scm_mem_query,
h_scm_mem_clear, h_scm_query_logical_mem_binding and
h_scm_query_block_mem_binding. They are just stubs in this patch.

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
---
 hw/ppc/spapr_hcall.c   |  230 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h |   12 ++-
 2 files changed, 240 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 17bcaa3822..40553e80d6 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -3,11 +3,13 @@
 #include "sysemu/hw_accel.h"
 #include "sysemu/sysemu.h"
 #include "qemu/log.h"
+#include "qemu/range.h"
 #include "qemu/error-report.h"
 #include "cpu.h"
 #include "exec/exec-all.h"
 #include "helper_regs.h"
 #include "hw/ppc/spapr.h"
+#include "hw/ppc/spapr_drc.h"
 #include "hw/ppc/spapr_cpu_core.h"
 #include "mmu-hash64.h"
 #include "cpu-models.h"
@@ -16,6 +18,7 @@
 #include "hw/ppc/spapr_ovec.h"
 #include "mmu-book3s-v3.h"
 #include "hw/mem/memory-device.h"
+#include "hw/mem/nvdimm.h"
 
 struct LPCRSyncState {
     target_ulong value;
@@ -1808,6 +1811,222 @@ static target_ulong h_update_dt(PowerPCCPU *cpu, sPAPRMachineState *spapr,
     return H_SUCCESS;
 }
 
+static target_ulong h_scm_read_metadata(PowerPCCPU *cpu,
+                                        sPAPRMachineState *spapr,
+                                        target_ulong opcode,
+                                        target_ulong *args)
+{
+    uint32_t drc_index = args[0];
+    uint64_t offset = args[1];
+    uint64_t numBytesToRead = args[2];
+    sPAPRDRConnector *drc = spapr_drc_by_index(drc_index);
+    NVDIMMDevice *nvdimm = NULL;
+    NVDIMMClass *ddc = NULL;
+
+    if (numBytesToRead != 1 && numBytesToRead != 2 &&
+        numBytesToRead != 4 && numBytesToRead != 8) {
+        return H_P3;
+    }
+
+    if (offset & (numBytesToRead - 1)) {
+        return H_P2;
+    }
+
+    if (!drc || spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
+        return H_PARAMETER;
+    }
+
+    nvdimm = NVDIMM(drc->dev);
+    ddc = NVDIMM_GET_CLASS(nvdimm);
+
+    ddc->read_label_data(nvdimm, &args[0], numBytesToRead, offset);
+
+    return H_SUCCESS;
+}
+
+
+static target_ulong h_scm_write_metadata(PowerPCCPU *cpu,
+                                         sPAPRMachineState *spapr,
+                                         target_ulong opcode,
+                                         target_ulong *args)
+{
+    uint32_t drc_index = args[0];
+    uint64_t offset = args[1];
+    uint64_t data = args[2];
+    uint64_t numBytesToWrite = args[3];
+    sPAPRDRConnector *drc = spapr_drc_by_index(drc_index);
+    NVDIMMDevice *nvdimm = NULL;
+    DeviceState *dev = NULL;
+    NVDIMMClass *ddc = NULL;
+
+    if (numBytesToWrite != 1 && numBytesToWrite != 2 &&
+        numBytesToWrite != 4 && numBytesToWrite != 8) {
+        return H_P4;
+    }
+
+    if (offset & (numBytesToWrite - 1)) {
+        return H_P2;
+    }
+
+    if (!drc || spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
+        return H_PARAMETER;
+    }
+
+    dev = drc->dev;
+    nvdimm = NVDIMM(dev);
+    if (offset >= nvdimm->label_size) {
+        return H_P3;
+    }
+
+    ddc = NVDIMM_GET_CLASS(nvdimm);
+
+    ddc->write_label_data(nvdimm, &data, numBytesToWrite, offset);
+
+    return H_SUCCESS;
+}
+
+static target_ulong h_scm_bind_mem(PowerPCCPU *cpu, sPAPRMachineState *spapr,
+                                        target_ulong opcode,
+                                        target_ulong *args)
+{
+    uint32_t drc_index = args[0];
+    uint64_t starting_index = args[1];
+    uint64_t no_of_scm_blocks_to_bind = args[2];
+    uint64_t target_logical_mem_addr = args[3];
+    uint64_t continue_token = args[4];
+    uint64_t size;
+    uint64_t total_no_of_scm_blocks;
+
+    sPAPRDRConnector *drc = spapr_drc_by_index(drc_index);
+    hwaddr addr;
+    DeviceState *dev = NULL;
+    PCDIMMDevice *dimm = NULL;
+    Error *local_err = NULL;
+
+    if (!drc || spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
+        return H_PARAMETER;
+    }
+
+    dev = drc->dev;
+    dimm = PC_DIMM(dev);
+
+    size = object_property_get_uint(OBJECT(dimm),
+                                    PC_DIMM_SIZE_PROP, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+        return H_PARAMETER;
+    }
+
+    total_no_of_scm_blocks = size / SPAPR_MINIMUM_SCM_BLOCK_SIZE;
+
+    if (starting_index > total_no_of_scm_blocks) {
+        return H_P2;
+    }
+
+    if ((starting_index + no_of_scm_blocks_to_bind) > total_no_of_scm_blocks) {
+        return H_P3;
+    }
+
+    /* Currently qemu assigns the address. */
+    if (target_logical_mem_addr != 0xffffffffffffffff) {
+        return H_OVERLAP;
+    }
+
+    /*
+     * Currently the continue token should be zero: qemu has already bound
+     * everything, so this hcall never returns H_BUSY.
+     */
+    if (continue_token > 0) {
+        return H_P5;
+    }
+
+    /* NB : Already bound, Return target logical address in R4 */
+    addr = object_property_get_uint(OBJECT(dimm),
+                                    PC_DIMM_ADDR_PROP, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+        return H_PARAMETER;
+    }
+
+    args[1] = addr;
+
+    return H_SUCCESS;
+}
+
+static target_ulong h_scm_unbind_mem(PowerPCCPU *cpu, sPAPRMachineState *spapr,
+                                        target_ulong opcode,
+                                        target_ulong *args)
+{
+    uint64_t starting_scm_logical_addr = args[0];
+    uint64_t no_of_scm_blocks_to_unbind = args[1];
+    uint64_t size_to_unbind;
+    uint64_t continue_token = args[2];
+    Range as = range_empty;
+    GSList *dimms = NULL;
+    bool valid = false;
+
+    size_to_unbind = no_of_scm_blocks_to_unbind * SPAPR_MINIMUM_SCM_BLOCK_SIZE;
+
+    /* Check if starting_scm_logical_addr is block aligned */
+    if (!QEMU_IS_ALIGNED(starting_scm_logical_addr,
+                         SPAPR_MINIMUM_SCM_BLOCK_SIZE)) {
+        return H_PARAMETER;
+    }
+
+    range_init_nofail(&as, starting_scm_logical_addr, size_to_unbind);
+
+    dimms = nvdimm_get_device_list();
+    for (; dimms; dimms = dimms->next) {
+        NVDIMMDevice *nvdimm = dimms->data;
+        Range tmp;
+        uint64_t size = object_property_get_uint(OBJECT(nvdimm),
+                                                 PC_DIMM_SIZE_PROP, NULL);
+        uint64_t addr = object_property_get_uint(OBJECT(nvdimm),
+                                                 PC_DIMM_ADDR_PROP, NULL);
+        range_init_nofail(&tmp, addr, size);
+
+        if (range_contains_range(&tmp, &as)) {
+            valid = true;
+            break;
+        }
+    }
+
+    if (!valid) {
+        return H_P2;
+    }
+
+    if (continue_token > 0) {
+        return H_P3;
+    }
+
+    /* NB: do nothing here; let object_del take care of this for now. */
+
+    return H_SUCCESS;
+}
+
+static target_ulong h_scm_query_block_mem_binding(PowerPCCPU *cpu,
+                                                  sPAPRMachineState *spapr,
+                                                  target_ulong opcode,
+                                                  target_ulong *args)
+{
+    return H_SUCCESS;
+}
+
+static target_ulong h_scm_query_logical_mem_binding(PowerPCCPU *cpu,
+                                                    sPAPRMachineState *spapr,
+                                                    target_ulong opcode,
+                                                    target_ulong *args)
+{
+    return H_SUCCESS;
+}
+
+static target_ulong h_scm_mem_query(PowerPCCPU *cpu, sPAPRMachineState *spapr,
+                                        target_ulong opcode,
+                                        target_ulong *args)
+{
+    return H_SUCCESS;
+}
+
 static spapr_hcall_fn papr_hypercall_table[(MAX_HCALL_OPCODE / 4) + 1];
 static spapr_hcall_fn kvmppc_hypercall_table[KVMPPC_HCALL_MAX - KVMPPC_HCALL_BASE + 1];
 
@@ -1907,6 +2126,17 @@ static void hypercall_register_types(void)
     /* qemu/KVM-PPC specific hcalls */
     spapr_register_hypercall(KVMPPC_H_RTAS, h_rtas);
 
+    /* qemu/scm specific hcalls */
+    spapr_register_hypercall(H_SCM_READ_METADATA, h_scm_read_metadata);
+    spapr_register_hypercall(H_SCM_WRITE_METADATA, h_scm_write_metadata);
+    spapr_register_hypercall(H_SCM_BIND_MEM, h_scm_bind_mem);
+    spapr_register_hypercall(H_SCM_UNBIND_MEM, h_scm_unbind_mem);
+    spapr_register_hypercall(H_SCM_QUERY_BLOCK_MEM_BINDING,
+                             h_scm_query_block_mem_binding);
+    spapr_register_hypercall(H_SCM_QUERY_LOGICAL_MEM_BINDING,
+                             h_scm_query_logical_mem_binding);
+    spapr_register_hypercall(H_SCM_MEM_QUERY, h_scm_mem_query);
+
     /* ibm,client-architecture-support support */
     spapr_register_hypercall(KVMPPC_H_CAS, h_client_architecture_support);
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 21a9709afe..28249567f4 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -268,6 +268,7 @@ struct sPAPRMachineState {
 #define H_P7              -60
 #define H_P8              -61
 #define H_P9              -62
+#define H_OVERLAP         -68
 #define H_UNSUPPORTED_FLAG -256
 #define H_MULTI_THREADS_ACTIVE -9005
 
@@ -473,8 +474,15 @@ struct sPAPRMachineState {
 #define H_INT_ESB               0x3C8
 #define H_INT_SYNC              0x3CC
 #define H_INT_RESET             0x3D0
-
-#define MAX_HCALL_OPCODE        H_INT_RESET
+#define H_SCM_READ_METADATA     0x3E4
+#define H_SCM_WRITE_METADATA    0x3E8
+#define H_SCM_BIND_MEM          0x3EC
+#define H_SCM_UNBIND_MEM        0x3F0
+#define H_SCM_QUERY_BLOCK_MEM_BINDING 0x3F4
+#define H_SCM_QUERY_LOGICAL_MEM_BINDING 0x3F8
+#define H_SCM_MEM_QUERY         0x3FC
+
+#define MAX_HCALL_OPCODE        H_SCM_MEM_QUERY
 
 /* The hcalls above are standardized in PAPR and implemented by pHyp
  * as well.

* Re: [Qemu-devel] [RFC PATCH 3/4] spapr: Add NVDIMM device support
  2019-02-06  5:26 ` [Qemu-devel] [RFC PATCH 3/4] spapr: Add NVDIMM device support Shivaprasad G Bhat
@ 2019-02-12  1:49   ` David Gibson
  2019-02-15 11:11     ` Shivaprasad G Bhat
  2019-02-19  8:11   ` Igor Mammedov
  1 sibling, 1 reply; 19+ messages in thread
From: David Gibson @ 2019-02-12  1:49 UTC (permalink / raw)
  To: Shivaprasad G Bhat
  Cc: qemu-devel, xiaoguangrong.eric, mst, bharata, qemu-ppc, vaibhav,
	imammedo

On Tue, Feb 05, 2019 at 11:26:27PM -0600, Shivaprasad G Bhat wrote:
> Add support for NVDIMM devices for sPAPR. Piggyback on existing nvdimm
> device interface in QEMU to support virtual NVDIMM devices for Power (May have
> to re-look at this later).  Create the required DT entries for the
> device (some entries have dummy values right now).
> 
> The patch creates the required DT node and sends a hotplug
> interrupt to the guest. Guest is expected to undertake the normal
> DR resource add path in response and start issuing PAPR SCM hcalls.
> 
> This is how it can be used:
> Add nvdimm=on to the qemu machine argument.
> Ex: -machine pseries,nvdimm=on
> For coldplug, add the device on the qemu command line as shown below:
> -object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0.img,share=yes,size=512m
> -device nvdimm,label-size=128k,memdev=memnvdimm0,id=nvdimm0,slot=0
> 
> For hotplug, add the device from the monitor as shown below:
> object_add memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0.img,share=yes,size=512m
> device_add nvdimm,label-size=128k,memdev=memnvdimm0,id=nvdimm0,slot=0
> 
> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
> Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
>                [Early implementation]
> ---
>  default-configs/ppc64-softmmu.mak |    1 
>  hw/ppc/spapr.c                    |  212 +++++++++++++++++++++++++++++++++++--
>  hw/ppc/spapr_drc.c                |   17 +++
>  hw/ppc/spapr_events.c             |    4 +
>  include/hw/ppc/spapr.h            |   10 ++
>  include/hw/ppc/spapr_drc.h        |    9 ++
>  6 files changed, 241 insertions(+), 12 deletions(-)
> 
> diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
> index 7f34ad0528..b6e1aa5125 100644
> --- a/default-configs/ppc64-softmmu.mak
> +++ b/default-configs/ppc64-softmmu.mak
> @@ -20,4 +20,5 @@ CONFIG_XIVE=$(CONFIG_PSERIES)
>  CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
>  CONFIG_MEM_DEVICE=y
>  CONFIG_DIMM=y
> +CONFIG_NVDIMM=y
>  CONFIG_SPAPR_RNG=y
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 0fcdd35cbe..7e7a1a8041 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -73,6 +73,7 @@
>  #include "qemu/cutils.h"
>  #include "hw/ppc/spapr_cpu_core.h"
>  #include "hw/mem/memory-device.h"
> +#include "hw/mem/nvdimm.h"
>  
>  #include <libfdt.h>
>  
> @@ -690,6 +691,7 @@ static int spapr_populate_drmem_v2(sPAPRMachineState *spapr, void *fdt,
>      uint8_t *int_buf, *cur_index, buf_len;
>      int ret;
>      uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
> +    uint64_t scm_block_size = SPAPR_MINIMUM_SCM_BLOCK_SIZE;
>      uint64_t addr, cur_addr, size;
>      uint32_t nr_boot_lmbs = (machine->device_memory->base / lmb_size);
>      uint64_t mem_end = machine->device_memory->base +
> @@ -726,15 +728,24 @@ static int spapr_populate_drmem_v2(sPAPRMachineState *spapr, void *fdt,
>              nr_entries++;
>          }
>  
> -        /* Entry for DIMM */
> -        drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
> -        g_assert(drc);
> -        elem = spapr_get_drconf_cell(size / lmb_size, addr,
> -                                     spapr_drc_index(drc), node,
> -                                     SPAPR_LMB_FLAGS_ASSIGNED);
> +        if (info->value->type == MEMORY_DEVICE_INFO_KIND_NVDIMM) {
> +            /* Entry for NVDIMM */
> +            drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM, addr / scm_block_size);
> +            g_assert(drc);
> +            elem = spapr_get_drconf_cell(size / scm_block_size, addr,
> +                                         spapr_drc_index(drc), -1, 0);
> +            cur_addr = ROUND_UP(addr + size, scm_block_size);
> +        } else {
> +            /* Entry for DIMM */
> +            drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
> +            g_assert(drc);
> +            elem = spapr_get_drconf_cell(size / lmb_size, addr,
> +                                         spapr_drc_index(drc), node,
> +                                         SPAPR_LMB_FLAGS_ASSIGNED);
> +            cur_addr = addr + size;
> +        }
>          QSIMPLEQ_INSERT_TAIL(&drconf_queue, elem, entry);
>          nr_entries++;
> -        cur_addr = addr + size;
>      }
>  
>      /* Entry for remaining hotpluggable area */
> @@ -1225,6 +1236,42 @@ static void spapr_dt_hypervisor(sPAPRMachineState *spapr, void *fdt)
>      }
>  }
>  
> +static int spapr_populate_nvdimm_node(void *fdt, int fdt_offset,
> +                                      uint32_t node, uint64_t addr,
> +                                      uint64_t size, uint64_t label_size);

Re-ordering the code is generally preferred to static forward declarations.

> +static void spapr_create_nvdimm(void *fdt)

I'm trying to standardize on spapr_dt_*() for functions which generate
bits of the device tree.

> +{
> +    int offset = fdt_subnode_offset(fdt, 0, "persistent-memory");
> +    GSList *dimms = NULL;
> +
> +    if (offset < 0) {
> +        offset = fdt_add_subnode(fdt, 0, "persistent-memory");
> +        _FDT(offset);
> +        _FDT((fdt_setprop_cell(fdt, offset, "#address-cells", 0x2)));
> +        _FDT((fdt_setprop_cell(fdt, offset, "#size-cells", 0x0)));
> +        _FDT((fdt_setprop_string(fdt, offset, "name", "persistent-memory")));

No need to explicitly set the "name" property, that's implicit in the
node name.

> +        _FDT((fdt_setprop_string(fdt, offset, "device_type",
> +                                 "ibm,persistent-memory")));
> +    }
> +
> +    /*NB : Add drc-info array here */
> +
> +    /* Create DT entries for cold plugged NVDIMM devices */
> +    dimms = nvdimm_get_device_list();
> +    for (; dimms; dimms = dimms->next) {
> +        NVDIMMDevice *nvdimm = dimms->data;
> +        PCDIMMDevice *di = PC_DIMM(nvdimm);
> +        uint64_t lsize = nvdimm->label_size;
> +        int size = object_property_get_int(OBJECT(nvdimm), PC_DIMM_SIZE_PROP,
> +                                           NULL);
> +
> +        spapr_populate_nvdimm_node(fdt, offset, di->node, di->addr,
> +                                   size, lsize);

It might be cleaner to just pass the NVDIMMDevice * rather than
umpteen parameters.

> +    }
> +    g_slist_free(dimms);
> +    return;
> +}
> +
>  static void *spapr_build_fdt(sPAPRMachineState *spapr)
>  {
>      MachineState *machine = MACHINE(spapr);
> @@ -1348,6 +1395,11 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr)
>          exit(1);
>      }
>  
> +    /* NVDIMM devices */
> +    if (spapr->nvdimm_enabled) {
> +        spapr_create_nvdimm(fdt);
> +    }
> +
>      return fdt;
>  }
>  
> @@ -3143,6 +3195,20 @@ static void spapr_set_ic_mode(Object *obj, const char *value, Error **errp)
>      }
>  }
>  
> +static bool spapr_get_nvdimm(Object *obj, Error **errp)
> +{
> +    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> +
> +    return spapr->nvdimm_enabled;
> +}
> +
> +static void spapr_set_nvdimm(Object *obj, bool value, Error **errp)
> +{
> +    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> +
> +    spapr->nvdimm_enabled = value;
> +}
> +
>  static void spapr_instance_init(Object *obj)
>  {
>      sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> @@ -3188,6 +3254,11 @@ static void spapr_instance_init(Object *obj)
>      object_property_set_description(obj, "ic-mode",
>                   "Specifies the interrupt controller mode (xics, xive, dual)",
>                   NULL);
> +    object_property_add_bool(obj, "nvdimm",
> +                            spapr_get_nvdimm, spapr_set_nvdimm, NULL);
> +    object_property_set_description(obj, "nvdimm",
> +                                    "Enable support for nvdimm devices",
> +                                    NULL);

I'm not seeing a lot of point to this machine parameter.

>  }
>  
>  static void spapr_machine_finalizefn(Object *obj)
> @@ -3267,12 +3338,103 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t size,
>      }
>  }
>  
> +static int spapr_populate_nvdimm_node(void *fdt, int fdt_offset, uint32_t node,
> +                                      uint64_t addr, uint64_t size,
> +                                      uint64_t label_size)
> +{
> +    int offset;
> +    char buf[40];
> +    GString *lcode = g_string_sized_new(10);
> +    sPAPRDRConnector *drc;
> +    QemuUUID uuid;
> +    uint32_t drc_idx;
> +    uint32_t associativity[] = {
> +        cpu_to_be32(0x4), /* length */
> +        cpu_to_be32(0x0), cpu_to_be32(0x0),
> +        cpu_to_be32(0x0), cpu_to_be32(node)
> +    };
> +
> +    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM,
> +                          addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
> +    g_assert(drc);
> +
> +    drc_idx = spapr_drc_index(drc);
> +
> +    sprintf(buf, "pmem@%x", drc_idx);
> +    offset = fdt_add_subnode(fdt, fdt_offset, buf);

"fdt_offset" vs. "offset" isn't very obvious.  Maybe parent_offset /
child_offset or something?

> +    _FDT(offset);
> +
> +    _FDT((fdt_setprop_cell(fdt, offset, "reg", drc_idx)));
> +    _FDT((fdt_setprop_string(fdt, offset, "compatible", "ibm,pmemory")));
> +    _FDT((fdt_setprop_string(fdt, offset, "name", "pmem")));

Again, no need to set "name".

> +    _FDT((fdt_setprop_string(fdt, offset, "device_type", "ibm,pmemory")));
> +
> +    /*NB : Supposed to be random strings. Currently empty 10 strings! */
> +    _FDT((fdt_setprop(fdt, offset, "ibm,loc-code", lcode->str, lcode->len)));
> +    g_string_free(lcode, TRUE);

I think leaving this property out would be preferable to including it
but putting nothing useful there.

> +
> +    _FDT((fdt_setprop(fdt, offset, "ibm,associativity", associativity,
> +                      sizeof(associativity))));
> +    g_random_set_seed(drc_idx);
> +    qemu_uuid_generate(&uuid);

This looks bogus.  I'm guessing the set seed is so that you generate
consistent UUIDs for the same NVDIMM in a guest.  First, that's making
a lot of assumptions about how qemu_uuid_generate() works that aren't
really warranted.  Second, it poisons the RNG for anything running
after this which actually wants (pseudo) random numbers.

I think you need to make the UUID a property of the device instead.
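
To make the second point concrete, here is a small standalone sketch. It
uses the C library's srand()/rand() as a stand-in for the GLib RNG, so it is
an analogy rather than a reproduction of what g_random_set_seed() and
qemu_uuid_generate() do. All demo_* names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical device model: the identifier is stored with the device. */
struct demo_nvdimm {
    uint8_t uuid[16];
};

/*
 * Anti-pattern: reseed the global PRNG with a fixed value in order to get
 * a repeatable identifier.
 */
static void demo_uuid_by_reseeding(uint32_t drc_idx, uint8_t out[16])
{
    srand(drc_idx);
    for (int i = 0; i < 16; i++) {
        out[i] = (uint8_t)(rand() & 0xff);
    }
}

/* Stand-in for an unrelated caller that wants a random number afterwards. */
static int demo_next_random(void)
{
    return rand();
}

/*
 * Alternative: the identifier is configuration carried by the device, and
 * reading it does not touch any global RNG state.
 */
static void demo_uuid_from_property(const struct demo_nvdimm *dev,
                                    uint8_t out[16])
{
    memcpy(out, dev->uuid, 16);
}
```

After demo_uuid_by_reseeding() runs with the same index, the next
demo_next_random() call yields the same value every time, which is the
poisoning described above. demo_uuid_from_property() has no such side
effect.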

> +
> +    qemu_uuid_unparse(&uuid, buf);
> +    _FDT((fdt_setprop_string(fdt, offset, "ibm,unit-guid", buf)));
> +
> +    _FDT((fdt_setprop_cell(fdt, offset, "ibm,my-drc-index", drc_idx)));
> +
> +    /*NB : What it should be? */
> +    _FDT(fdt_setprop_cell(fdt, offset, "ibm,latency-attribute", 828));
> +
> +    _FDT((fdt_setprop_u64(fdt, offset, "ibm,block-size",
> +                          SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
> +    _FDT((fdt_setprop_u64(fdt, offset, "ibm,number-of-blocks",
> +                          size / SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
> +    _FDT((fdt_setprop_cell(fdt, offset, "ibm,metadata-size", label_size)));
> +
> +    return offset;
> +}
> +
> +static void spapr_add_nvdimm(DeviceState *dev, uint64_t addr,
> +                             uint64_t size, uint32_t node,
> +                             Error **errp)
> +{
> +    sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_hotplug_handler(dev));
> +    sPAPRDRConnector *drc;
> +    bool hotplugged = spapr_drc_hotplugged(dev);
> +    NVDIMMDevice *nvdimm = NVDIMM(OBJECT(dev));
> +    void *fdt;
> +    int fdt_offset, fdt_size;
> +    Error *local_err = NULL;
> +
> +    spapr_dr_connector_new(OBJECT(spapr), TYPE_SPAPR_DRC_PMEM,
> +                           addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
> +    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM,
> +                          addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
> +    g_assert(drc);

Creating the DRC in the hotplug path looks bogus.  Generally the DRC
has to exist before you can even attempt to plug the device.
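
A toy model of that ordering, for illustration only. It does not use the
real sPAPRDRConnector API, and every demo_* name is made up. Connectors are
created once at machine init, and the plug path only looks one up:

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_SLOTS 4

/* Hypothetical connector: created at machine init, filled at plug time. */
struct demo_connector {
    bool present;       /* connector exists */
    const void *dev;    /* attached device, NULL when empty */
};

static struct demo_connector demo_slots[DEMO_MAX_SLOTS];

/* Machine init: create one connector per possible slot up front. */
static void demo_machine_init(unsigned nr_slots)
{
    for (unsigned i = 0; i < DEMO_MAX_SLOTS; i++) {
        demo_slots[i].present = i < nr_slots;
        demo_slots[i].dev = NULL;
    }
}

/* Plug path: looks the connector up and never creates one. */
static int demo_plug(unsigned slot, const void *dev)
{
    if (slot >= DEMO_MAX_SLOTS || !demo_slots[slot].present) {
        return -1;      /* no connector, so the plug is rejected */
    }
    if (demo_slots[slot].dev) {
        return -2;      /* connector already in use */
    }
    demo_slots[slot].dev = dev;
    return 0;
}
```

In this model a plug into a slot without a connector fails early instead of
creating the connector as a side effect.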

> +    fdt = create_device_tree(&fdt_size);
> +    fdt_offset = spapr_populate_nvdimm_node(fdt, 0, node, addr,
> +                                            size, nvdimm->label_size);
> +
> +    spapr_drc_attach(drc, dev, fdt, fdt_offset, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
> +    if (hotplugged) {
> +        spapr_hotplug_req_add_by_index(drc);
> +    }
> +}
> +
>  static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>                                Error **errp)
>  {
>      Error *local_err = NULL;
>      sPAPRMachineState *ms = SPAPR_MACHINE(hotplug_dev);
>      PCDIMMDevice *dimm = PC_DIMM(dev);
> +    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
>      uint64_t size, addr;
>      uint32_t node;
>  
> @@ -3291,9 +3453,14 @@ static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>  
>      node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP,
>                                      &error_abort);
> -    spapr_add_lmbs(dev, addr, size, node,
> -                   spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
> -                   &local_err);
> +    if (!is_nvdimm) {
> +        spapr_add_lmbs(dev, addr, size, node,
> +                       spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
> +                       &local_err);
> +    } else {
> +        spapr_add_nvdimm(dev, addr, size, node, &local_err);
> +    }
> +
>      if (local_err) {
>          goto out_unplug;
>      }
> @@ -3311,6 +3478,7 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>  {
>      const sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(hotplug_dev);
>      sPAPRMachineState *spapr = SPAPR_MACHINE(hotplug_dev);
> +    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
>      PCDIMMDevice *dimm = PC_DIMM(dev);
>      Error *local_err = NULL;
>      uint64_t size;
> @@ -3328,10 +3496,30 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>          return;
>      }
>  
> -    if (size % SPAPR_MEMORY_BLOCK_SIZE) {
> +    if (!is_nvdimm && size % SPAPR_MEMORY_BLOCK_SIZE) {
>          error_setg(errp, "Hotplugged memory size must be a multiple of "
> -                      "%" PRIu64 " MB", SPAPR_MEMORY_BLOCK_SIZE / MiB);
> +                          "%" PRIu64 " MB", SPAPR_MEMORY_BLOCK_SIZE / MiB);
>          return;
> +    } else if (is_nvdimm) {
> +        NVDIMMDevice *nvdimm = NVDIMM(OBJECT(dev));
> +        if ((nvdimm->label_size + size) % SPAPR_MINIMUM_SCM_BLOCK_SIZE) {
> +            error_setg(errp, "NVDIMM memory size must be a multiple of "
> +                       "%" PRIu64 "MB", SPAPR_MINIMUM_SCM_BLOCK_SIZE / MiB);
> +            return;
> +        }
> +        if (((nvdimm->label_size + size) / SPAPR_MINIMUM_SCM_BLOCK_SIZE) == 1) {
> +            error_setg(errp, "NVDIMM size must be at least "
> +                       "%" PRIu64 "MB", 2 * SPAPR_MINIMUM_SCM_BLOCK_SIZE / MiB);
> +            return;
> +        }
> +
> +        /* Align to scm block size, exclude the label */
> +        memory_device_set_region_size(MEMORY_DEVICE(nvdimm),
> +               QEMU_ALIGN_DOWN(size, SPAPR_MINIMUM_SCM_BLOCK_SIZE), &local_err);
> +        if (local_err) {
> +            error_propagate(errp, local_err);
> +            return;
> +        }
>      }
>  
>      memdev = object_property_get_link(OBJECT(dimm), PC_DIMM_MEMDEV_PROP,
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 2edb7d1e9c..94ddd102cc 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -696,6 +696,16 @@ static void spapr_drc_lmb_class_init(ObjectClass *k, void *data)
>      drck->release = spapr_lmb_release;
>  }
>  
> +static void spapr_drc_pmem_class_init(ObjectClass *k, void *data)
> +{
> +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_CLASS(k);
> +
> +    drck->typeshift = SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM;
> +    drck->typename = "MEM";
> +    drck->drc_name_prefix = "PMEM ";
> +    drck->release = NULL;
> +}
> +
>  static const TypeInfo spapr_dr_connector_info = {
>      .name          = TYPE_SPAPR_DR_CONNECTOR,
>      .parent        = TYPE_DEVICE,
> @@ -739,6 +749,12 @@ static const TypeInfo spapr_drc_lmb_info = {
>      .class_init    = spapr_drc_lmb_class_init,
>  };
>  
> +static const TypeInfo spapr_drc_pmem_info = {
> +    .name          = TYPE_SPAPR_DRC_PMEM,
> +    .parent        = TYPE_SPAPR_DRC_LOGICAL,
> +    .class_init    = spapr_drc_pmem_class_init,
> +};
> +
>  /* helper functions for external users */
>  
>  sPAPRDRConnector *spapr_drc_by_index(uint32_t index)
> @@ -1189,6 +1205,7 @@ static void spapr_drc_register_types(void)
>      type_register_static(&spapr_drc_cpu_info);
>      type_register_static(&spapr_drc_pci_info);
>      type_register_static(&spapr_drc_lmb_info);
> +    type_register_static(&spapr_drc_pmem_info);
>  
>      spapr_rtas_register(RTAS_SET_INDICATOR, "set-indicator",
>                          rtas_set_indicator);
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index 32719a1b72..a4fed84346 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -193,6 +193,7 @@ struct rtas_event_log_v6_hp {
>  #define RTAS_LOG_V6_HP_TYPE_SLOT                         3
>  #define RTAS_LOG_V6_HP_TYPE_PHB                          4
>  #define RTAS_LOG_V6_HP_TYPE_PCI                          5
> +#define RTAS_LOG_V6_HP_TYPE_PMEM                         6
>      uint8_t hotplug_action;
>  #define RTAS_LOG_V6_HP_ACTION_ADD                        1
>  #define RTAS_LOG_V6_HP_ACTION_REMOVE                     2
> @@ -526,6 +527,9 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
>      case SPAPR_DR_CONNECTOR_TYPE_CPU:
>          hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_CPU;
>          break;
> +    case SPAPR_DR_CONNECTOR_TYPE_PMEM:
> +        hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PMEM;
> +        break;
>      default:
>          /* we shouldn't be signaling hotplug events for resources
>           * that don't support them
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index a947a0a0dc..21a9709afe 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -187,6 +187,7 @@ struct sPAPRMachineState {
>  
>      bool cmd_line_caps[SPAPR_CAP_NUM];
>      sPAPRCapabilities def, eff, mig;
> +    bool nvdimm_enabled;
>  };
>  
>  #define H_SUCCESS         0
> @@ -798,6 +799,15 @@ int spapr_rtc_import_offset(sPAPRRTCState *rtc, int64_t legacy_offset);
>  #define SPAPR_LMB_FLAGS_DRC_INVALID 0x00000020
>  #define SPAPR_LMB_FLAGS_RESERVED 0x00000080
>  
> +/*
> + * The nvdimm size should be aligned to SCM block size.
> + * The SCM block size should be aligned to SPAPR_MEMORY_BLOCK_SIZE
> + * in order to keep SCM regions from overlapping dimm memory regions.
> + * The SCM devices can have variable block sizes. For now, fixing the
> + * block size to the minimum value.
> + */
> +#define SPAPR_MINIMUM_SCM_BLOCK_SIZE SPAPR_MEMORY_BLOCK_SIZE
> +
>  void spapr_do_system_reset_on_cpu(CPUState *cs, run_on_cpu_data arg);
>  
>  #define HTAB_SIZE(spapr)        (1ULL << ((spapr)->htab_shift))
> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> index f6ff32e7e2..65925d00b1 100644
> --- a/include/hw/ppc/spapr_drc.h
> +++ b/include/hw/ppc/spapr_drc.h
> @@ -70,6 +70,13 @@
>  #define SPAPR_DRC_LMB(obj) OBJECT_CHECK(sPAPRDRConnector, (obj), \
>                                          TYPE_SPAPR_DRC_LMB)
>  
> +#define TYPE_SPAPR_DRC_PMEM "spapr-drc-pmem"
> +#define SPAPR_DRC_PMEM_GET_CLASS(obj) \
> +        OBJECT_GET_CLASS(sPAPRDRConnectorClass, obj, TYPE_SPAPR_DRC_PMEM)
> +#define SPAPR_DRC_PMEM_CLASS(klass) \
> +        OBJECT_CLASS_CHECK(sPAPRDRConnectorClass, klass, TYPE_SPAPR_DRC_PMEM)
> +#define SPAPR_DRC_PMEM(obj) OBJECT_CHECK(sPAPRDRConnector, (obj), \
> +                                        TYPE_SPAPR_DRC_PMEM)
>  /*
>   * Various hotplug types managed by sPAPRDRConnector
>   *
> @@ -87,6 +94,7 @@ typedef enum {
>      SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO = 3,
>      SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI = 4,
>      SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB = 8,
> +    SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM = 9,
>  } sPAPRDRConnectorTypeShift;
>  
>  typedef enum {
> @@ -96,6 +104,7 @@ typedef enum {
>      SPAPR_DR_CONNECTOR_TYPE_VIO = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO,
>      SPAPR_DR_CONNECTOR_TYPE_PCI = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI,
>      SPAPR_DR_CONNECTOR_TYPE_LMB = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB,
> +    SPAPR_DR_CONNECTOR_TYPE_PMEM = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM,
>  } sPAPRDRConnectorType;
>  
>  /*
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [RFC PATCH 4/4] spapr: Add Hcalls to support PAPR NVDIMM device
  2019-02-06  5:26 ` [Qemu-devel] [RFC PATCH 4/4] spapr: Add Hcalls to support PAPR NVDIMM device Shivaprasad G Bhat
@ 2019-02-12  2:28   ` David Gibson
  2019-02-15 11:11     ` Shivaprasad G Bhat
  0 siblings, 1 reply; 19+ messages in thread
From: David Gibson @ 2019-02-12  2:28 UTC (permalink / raw)
  To: Shivaprasad G Bhat
  Cc: qemu-devel, xiaoguangrong.eric, mst, bharata, qemu-ppc, vaibhav,
	imammedo

On Tue, Feb 05, 2019 at 11:26:41PM -0600, Shivaprasad G Bhat wrote:
> This patch implements few of the necessary hcalls for the nvdimm support.
> 
> PAPR semantics is such that each NVDIMM device is comprising of multiple
> SCM(Storage Class Memory) blocks. The guest requests the hypervisor to bind
> each of the SCM blocks of the NVDIMM device using hcalls. There can be
> SCM block unbind requests in case of driver errors or unplug(not supported now)
> use cases. The NVDIMM label read/writes are done through hcalls.
> 
> Since each virtual NVDIMM device is divided into multiple SCM blocks, the bind,
> unbind, and queries using hcalls on those blocks can come independently. This
> doesn't fit well into the qemu device semantics, where the map/unmap are done
> at the (whole)device/object level granularity. The patch doesnt actually
> bind/unbind on hcalls but let it happen at the object_add/del phase itself
> instead.
> 
> The guest kernel makes bind/unbind requests for the virtual NVDIMM device at the
> region level granularity. Without interleaving, each virtual NVDIMM device is
> presented as separate region. There is no way to configure the virtual NVDIMM
> interleaving for the guests today. So, there is no way a partial bind/unbind
> request can come for the vNVDIMM in a hcall for a subset of SCM blocks of a
> virtual NVDIMM. Hence it is safe to do bind/unbind everything during the
> object_add/del.

Hrm.  I don't entirely follow the above, but implementing something
that doesn't really match the PAPR model seems like it could lead to
problems.

> 
> The kernel today is not using the hcalls - h_scm_mem_query, h_scm_mem_clear,
> h_scm_query_logical_mem_binding and h_scm_query_block_mem_binding. They are just
> stubs in this patch.
> 
> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
> ---
>  hw/ppc/spapr_hcall.c   |  230 ++++++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h |   12 ++-
>  2 files changed, 240 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 17bcaa3822..40553e80d6 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -3,11 +3,13 @@
>  #include "sysemu/hw_accel.h"
>  #include "sysemu/sysemu.h"
>  #include "qemu/log.h"
> +#include "qemu/range.h"
>  #include "qemu/error-report.h"
>  #include "cpu.h"
>  #include "exec/exec-all.h"
>  #include "helper_regs.h"
>  #include "hw/ppc/spapr.h"
> +#include "hw/ppc/spapr_drc.h"
>  #include "hw/ppc/spapr_cpu_core.h"
>  #include "mmu-hash64.h"
>  #include "cpu-models.h"
> @@ -16,6 +18,7 @@
>  #include "hw/ppc/spapr_ovec.h"
>  #include "mmu-book3s-v3.h"
>  #include "hw/mem/memory-device.h"
> +#include "hw/mem/nvdimm.h"
>  
>  struct LPCRSyncState {
>      target_ulong value;
> @@ -1808,6 +1811,222 @@ static target_ulong h_update_dt(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>      return H_SUCCESS;
>  }
>  
> +static target_ulong h_scm_read_metadata(PowerPCCPU *cpu,
> +                                        sPAPRMachineState *spapr,
> +                                        target_ulong opcode,
> +                                        target_ulong *args)
> +{
> +    uint32_t drc_index = args[0];
> +    uint64_t offset = args[1];
> +    uint8_t numBytesToRead = args[2];

This will truncate the argument to 8 bits _before_ you validate it,
which doesn't seem like what you want.
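
To illustrate the truncation, here is a standalone sketch (not QEMU code;
the helper names are made up for the example). A guest passing 0x101 as the
length is accepted once the value has been narrowed to 8 bits, whereas
validating the full-width register value rejects it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true only for the access sizes 1, 2, 4 and 8. */
static bool valid_access_size(uint64_t len)
{
    return len == 1 || len == 2 || len == 4 || len == 8;
}

/* Mirrors the pattern in the patch: narrow to 8 bits, then validate. */
static bool check_after_truncation(uint64_t arg)
{
    uint8_t len = (uint8_t)arg;   /* 0x101 becomes 0x01 here */
    return valid_access_size(len);
}

/* Validate the full-width value; narrow only after it is known good. */
static bool check_before_truncation(uint64_t arg)
{
    return valid_access_size(arg);
}
```

With these helpers, check_after_truncation(0x101) returns true while
check_before_truncation(0x101) returns false.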

> +    sPAPRDRConnector *drc = spapr_drc_by_index(drc_index);
> +    NVDIMMDevice *nvdimm = NULL;
> +    NVDIMMClass *ddc = NULL;
> +
> +    if (numBytesToRead != 1 && numBytesToRead != 2 &&
> +        numBytesToRead != 4 && numBytesToRead != 8) {
> +        return H_P3;
> +    }
> +
> +    if (offset & (numBytesToRead - 1)) {
> +        return H_P2;
> +    }
> +
> +    if (drc && spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
> +        return H_PARAMETER;
> +    }
> +
> +    nvdimm = NVDIMM(drc->dev);
> +    ddc = NVDIMM_GET_CLASS(nvdimm);
> +
> +    ddc->read_label_data(nvdimm, &args[0], numBytesToRead, offset);

Hm.  Is this the only way to access the label data, or is it also
mapped into the guest visible address space?  I ask because some of
the calculations you made about size+label_size in an earlier patch
seemed to suggest it was part of the address space.

> +    return H_SUCCESS;
> +}
> +
> +
> +static target_ulong h_scm_write_metadata(PowerPCCPU *cpu,
> +                                         sPAPRMachineState *spapr,
> +                                         target_ulong opcode,
> +                                         target_ulong *args)
> +{
> +    uint32_t drc_index = args[0];
> +    uint64_t offset = args[1];
> +    uint64_t data = args[2];
> +    int8_t numBytesToWrite = args[3];
> +    sPAPRDRConnector *drc = spapr_drc_by_index(drc_index);
> +    NVDIMMDevice *nvdimm = NULL;
> +    DeviceState *dev = NULL;
> +    NVDIMMClass *ddc = NULL;
> +
> +    if (numBytesToWrite != 1 && numBytesToWrite != 2 &&
> +        numBytesToWrite != 4 && numBytesToWrite != 8) {
> +        return H_P4;
> +    }
> +
> +    if (offset & (numBytesToWrite - 1)) {
> +        return H_P2;
> +    }
> +
> +    if (drc && spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
> +        return H_PARAMETER;
> +    }
> +
> +    dev = drc->dev;
> +    nvdimm = NVDIMM(dev);
> +    if (offset >= nvdimm->label_size) {
> +        return H_P3;
> +    }
> +
> +    ddc = NVDIMM_GET_CLASS(nvdimm);
> +
> +    ddc->write_label_data(nvdimm, &data, numBytesToWrite, offset);
> +
> +    return H_SUCCESS;
> +}
> +
> +static target_ulong h_scm_bind_mem(PowerPCCPU *cpu, sPAPRMachineState *spapr,
> +                                        target_ulong opcode,
> +                                        target_ulong *args)
> +{
> +    uint32_t drc_index = args[0];
> +    uint64_t starting_index = args[1];
> +    uint64_t no_of_scm_blocks_to_bind = args[2];
> +    uint64_t target_logical_mem_addr = args[3];
> +    uint64_t continue_token = args[4];
> +    uint64_t size;
> +    uint64_t total_no_of_scm_blocks;
> +
> +    sPAPRDRConnector *drc = spapr_drc_by_index(drc_index);
> +    hwaddr addr;
> +    DeviceState *dev = NULL;
> +    PCDIMMDevice *dimm = NULL;
> +    Error *local_err = NULL;
> +
> +    if (drc && spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
> +        return H_PARAMETER;
> +    }
> +
> +    dev = drc->dev;
> +    dimm = PC_DIMM(dev);
> +
> +    size = object_property_get_uint(OBJECT(dimm),
> +                                    PC_DIMM_SIZE_PROP, &local_err);
> +    if (local_err) {
> +        error_report_err(local_err);
> +        return H_PARAMETER;

This should probably be H_HARDWARE, no?  The error isn't caused by one
of the parameters.

> +    }
> +
> +    total_no_of_scm_blocks = size / SPAPR_MINIMUM_SCM_BLOCK_SIZE;
> +
> +    if (starting_index > total_no_of_scm_blocks) {
> +        return H_P2;
> +    }
> +
> +    if ((starting_index + no_of_scm_blocks_to_bind) >
> total_no_of_scm_blocks) {

You should probably have a check for integer overflow here as well,
just to be thorough.
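
One way to write such a check, as a standalone sketch (the function names
are hypothetical and not part of the patch), is to compare against the
remaining room with a subtraction so that the sum is never formed:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if the blocks [start, start + count) fit within a device
 * of 'total' blocks. The sum start + count is never computed, so it
 * cannot wrap around.
 */
static bool scm_block_range_ok(uint64_t start, uint64_t count, uint64_t total)
{
    if (start > total) {
        return false;
    }
    return count <= total - start;
}

/* The additive form, kept only for comparison: the sum can wrap. */
static bool scm_block_range_naive(uint64_t start, uint64_t count,
                                  uint64_t total)
{
    return start <= total && start + count <= total;
}
```

For start = 1 and count = UINT64_MAX the additive form wraps to 0 and
accepts the request; the subtractive form rejects it.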

> +        return H_P3;
> +    }
> +
> +    /* Currently qemu assigns the address. */
> +    if (target_logical_mem_addr != 0xffffffffffffffff) {
> +        return H_OVERLAP;
> +    }
> +
> +    /*
> +     * Currently continue token should be zero qemu has already bound
> +     * everything and this hcall doesnt return H_BUSY.
> +     */
> +    if (continue_token > 0) {
> +        return H_P5;
> +    }
> +
> +    /* NB : Already bound, Return target logical address in R4 */
> +    addr = object_property_get_uint(OBJECT(dimm),
> +                                    PC_DIMM_ADDR_PROP, &local_err);
> +    if (local_err) {
> +        error_report_err(local_err);
> +        return H_PARAMETER;
> +    }
> +
> +    args[1] = addr;
> +
> +    return H_SUCCESS;
> +}
> +
> +static target_ulong h_scm_unbind_mem(PowerPCCPU *cpu, sPAPRMachineState *spapr,
> +                                        target_ulong opcode,
> +                                        target_ulong *args)
> +{
> +    uint64_t starting_scm_logical_addr = args[0];
> +    uint64_t no_of_scm_blocks_to_unbind = args[1];
> +    uint64_t size_to_unbind;
> +    uint64_t continue_token = args[2];
> +    Range as = range_empty;
> +    GSList *dimms = NULL;
> +    bool valid = false;
> +
> +    size_to_unbind = no_of_scm_blocks_to_unbind * SPAPR_MINIMUM_SCM_BLOCK_SIZE;
> +
> +    /* Check if starting_scm_logical_addr is block aligned */
> +    if (!QEMU_IS_ALIGNED(starting_scm_logical_addr,
> +                         SPAPR_MINIMUM_SCM_BLOCK_SIZE)) {
> +        return H_PARAMETER;
> +    }
> +
> +    range_init_nofail(&as, starting_scm_logical_addr, size_to_unbind);
> +
> +    dimms = nvdimm_get_device_list();
> +    for (; dimms; dimms = dimms->next) {
> +        NVDIMMDevice *nvdimm = dimms->data;
> +        Range tmp;
> +        int size = object_property_get_int(OBJECT(nvdimm), PC_DIMM_SIZE_PROP,
> +                                           NULL);
> +        int addr = object_property_get_int(OBJECT(nvdimm), PC_DIMM_ADDR_PROP,
> +                                           NULL);
> +        range_init_nofail(&tmp, addr, size);
> +
> +        if (range_contains_range(&tmp, &as)) {
> +            valid = true;
> +            break;
> +        }
> +    }
> +
> +    if (!valid) {
> +        return H_P2;
> +    }
> +
> +    if (continue_token > 0) {
> +        return H_P3;
> +    }
> +
> +    /*NB : dont do anything, let object_del take care of this for now. */
> +
> +    return H_SUCCESS;
> +}
> +
> +static target_ulong h_scm_query_block_mem_binding(PowerPCCPU *cpu,
> +                                                  sPAPRMachineState *spapr,
> +                                                  target_ulong opcode,
> +                                                  target_ulong *args)
> +{
> +    return H_SUCCESS;
> +}
> +
> +static target_ulong h_scm_query_logical_mem_binding(PowerPCCPU *cpu,
> +                                                    sPAPRMachineState *spapr,
> +                                                    target_ulong opcode,
> +                                                    target_ulong *args)
> +{
> +    return H_SUCCESS;
> +}
> +
> +static target_ulong h_scm_mem_query(PowerPCCPU *cpu, sPAPRMachineState *spapr,
> +                                        target_ulong opcode,
> +                                        target_ulong *args)
> +{
> +    return H_SUCCESS;
> +}
> +
>  static spapr_hcall_fn papr_hypercall_table[(MAX_HCALL_OPCODE / 4) + 1];
>  static spapr_hcall_fn kvmppc_hypercall_table[KVMPPC_HCALL_MAX - KVMPPC_HCALL_BASE + 1];
>  
> @@ -1907,6 +2126,17 @@ static void hypercall_register_types(void)
>      /* qemu/KVM-PPC specific hcalls */
>      spapr_register_hypercall(KVMPPC_H_RTAS, h_rtas);
>  
> +    /* qemu/scm specific hcalls */
> +    spapr_register_hypercall(H_SCM_READ_METADATA, h_scm_read_metadata);
> +    spapr_register_hypercall(H_SCM_WRITE_METADATA, h_scm_write_metadata);
> +    spapr_register_hypercall(H_SCM_BIND_MEM, h_scm_bind_mem);
> +    spapr_register_hypercall(H_SCM_UNBIND_MEM, h_scm_unbind_mem);
> +    spapr_register_hypercall(H_SCM_QUERY_BLOCK_MEM_BINDING,
> +                             h_scm_query_block_mem_binding);
> +    spapr_register_hypercall(H_SCM_QUERY_LOGICAL_MEM_BINDING,
> +                             h_scm_query_logical_mem_binding);
> +    spapr_register_hypercall(H_SCM_MEM_QUERY, h_scm_mem_query);
> +
>      /* ibm,client-architecture-support support */
>      spapr_register_hypercall(KVMPPC_H_CAS, h_client_architecture_support);
>  
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 21a9709afe..28249567f4 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -268,6 +268,7 @@ struct sPAPRMachineState {
>  #define H_P7              -60
>  #define H_P8              -61
>  #define H_P9              -62
> +#define H_OVERLAP         -68
>  #define H_UNSUPPORTED_FLAG -256
>  #define H_MULTI_THREADS_ACTIVE -9005
>  
> @@ -473,8 +474,15 @@ struct sPAPRMachineState {
>  #define H_INT_ESB               0x3C8
>  #define H_INT_SYNC              0x3CC
>  #define H_INT_RESET             0x3D0
> -
> -#define MAX_HCALL_OPCODE        H_INT_RESET
> +#define H_SCM_READ_METADATA     0x3E4
> +#define H_SCM_WRITE_METADATA     0x3E8
> +#define H_SCM_BIND_MEM          0x3EC
> +#define H_SCM_UNBIND_MEM        0x3F0
> +#define H_SCM_QUERY_BLOCK_MEM_BINDING 0x3F4
> +#define H_SCM_QUERY_LOGICAL_MEM_BINDING 0x3F8
> +#define H_SCM_MEM_QUERY         0x3FC
> +
> +#define MAX_HCALL_OPCODE        H_SCM_MEM_QUERY
>  
>  /* The hcalls above are standardized in PAPR and implemented by pHyp
>   * as well.
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [RFC PATCH 3/4] spapr: Add NVDIMM device support
  2019-02-12  1:49   ` David Gibson
@ 2019-02-15 11:11     ` Shivaprasad G Bhat
  2019-02-17 23:02       ` David Gibson
  0 siblings, 1 reply; 19+ messages in thread
From: Shivaprasad G Bhat @ 2019-02-15 11:11 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-devel, xiaoguangrong.eric, mst, bharata, qemu-ppc, vaibhav,
	imammedo

Thanks for the comments, David. Please find my replies inline.


On 02/12/2019 07:19 AM, David Gibson wrote:
> On Tue, Feb 05, 2019 at 11:26:27PM -0600, Shivaprasad G Bhat wrote:
>> Add support for NVDIMM devices for sPAPR. Piggyback on existing nvdimm
>> device interface in QEMU to support virtual NVDIMM devices for Power (May have
>> to re-look at this later).  Create the required DT entries for the
>> device (some entries have dummy values right now).
>>
>> The patch creates the required DT node and sends a hotplug
>> interrupt to the guest. Guest is expected to undertake the normal
>> DR resource add path in response and start issuing PAPR SCM hcalls.
>>
>> This is how it can be used ..
>> Add nvdimm=on to the qemu machine argument.
>> Ex : -machine pseries,nvdimm=on
>> For coldplug, the device to be added in qemu command line as shown below
>> -object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0.img,share=yes,size=512m
>> -device nvdimm,label-size=128k,memdev=memnvdimm0,id=nvdimm0,slot=0
>>
>> For hotplug, the device to be added from monitor as below
>> object_add memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0.img,share=yes,size=512m
>> device_add nvdimm,label-size=128k,memdev=memnvdimm0,id=nvdimm0,slot=0
>>
>> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
>> Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
>>                 [Early implementation]
>> ---
>>   default-configs/ppc64-softmmu.mak |    1
>>   hw/ppc/spapr.c                    |  212 +++++++++++++++++++++++++++++++++++--
>>   hw/ppc/spapr_drc.c                |   17 +++
>>   hw/ppc/spapr_events.c             |    4 +
>>   include/hw/ppc/spapr.h            |   10 ++
>>   include/hw/ppc/spapr_drc.h        |    9 ++
>>   6 files changed, 241 insertions(+), 12 deletions(-)
>>
>> diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
>> index 7f34ad0528..b6e1aa5125 100644
>> --- a/default-configs/ppc64-softmmu.mak
>> +++ b/default-configs/ppc64-softmmu.mak
>> @@ -20,4 +20,5 @@ CONFIG_XIVE=$(CONFIG_PSERIES)
>>   CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
>>   CONFIG_MEM_DEVICE=y
>>   CONFIG_DIMM=y
>> +CONFIG_NVDIMM=y
>>   CONFIG_SPAPR_RNG=y
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 0fcdd35cbe..7e7a1a8041 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -73,6 +73,7 @@
>>   #include "qemu/cutils.h"
>>   #include "hw/ppc/spapr_cpu_core.h"
>>   #include "hw/mem/memory-device.h"
>> +#include "hw/mem/nvdimm.h"
>>   
>>   #include <libfdt.h>
>>   
>> @@ -690,6 +691,7 @@ static int spapr_populate_drmem_v2(sPAPRMachineState *spapr, void *fdt,
>>       uint8_t *int_buf, *cur_index, buf_len;
>>       int ret;
>>       uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
>> +    uint64_t scm_block_size = SPAPR_MINIMUM_SCM_BLOCK_SIZE;
>>       uint64_t addr, cur_addr, size;
>>       uint32_t nr_boot_lmbs = (machine->device_memory->base / lmb_size);
>>       uint64_t mem_end = machine->device_memory->base +
>> @@ -726,15 +728,24 @@ static int spapr_populate_drmem_v2(sPAPRMachineState *spapr, void *fdt,
>>               nr_entries++;
>>           }
>>   
>> -        /* Entry for DIMM */
>> -        drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
>> -        g_assert(drc);
>> -        elem = spapr_get_drconf_cell(size / lmb_size, addr,
>> -                                     spapr_drc_index(drc), node,
>> -                                     SPAPR_LMB_FLAGS_ASSIGNED);
>> +        if (info->value->type == MEMORY_DEVICE_INFO_KIND_NVDIMM) {
>> +            /* Entry for NVDIMM */
>> +            drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM, addr / scm_block_size);
>> +            g_assert(drc);
>> +            elem = spapr_get_drconf_cell(size / scm_block_size, addr,
>> +                                         spapr_drc_index(drc), -1, 0);
>> +            cur_addr = ROUND_UP(addr + size, scm_block_size);
>> +        } else {
>> +            /* Entry for DIMM */
>> +            drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
>> +            g_assert(drc);
>> +            elem = spapr_get_drconf_cell(size / lmb_size, addr,
>> +                                         spapr_drc_index(drc), node,
>> +                                         SPAPR_LMB_FLAGS_ASSIGNED);
>> +            cur_addr = addr + size;
>> +        }
>>           QSIMPLEQ_INSERT_TAIL(&drconf_queue, elem, entry);
>>           nr_entries++;
>> -        cur_addr = addr + size;
>>       }
>>   
>>       /* Entry for remaining hotpluggable area */
>> @@ -1225,6 +1236,42 @@ static void spapr_dt_hypervisor(sPAPRMachineState *spapr, void *fdt)
>>       }
>>   }
>>   
>> +static int spapr_populate_nvdimm_node(void *fdt, int fdt_offset,
>> +                                      uint32_t node, uint64_t addr,
>> +                                      uint64_t size, uint64_t label_size);
> Re-ordering the code is generally preferred to static forward declarations.
Ok
>> +static void spapr_create_nvdimm(void *fdt)
> I'm trying to standardize on spapr_dt_*() for functions which generate
> bits of the device tree.
Ok. Will rename to spapr_dt_create_nvdimm
>> +{
>> +    int offset = fdt_subnode_offset(fdt, 0, "persistent-memory");
>> +    GSList *dimms = NULL;
>> +
>> +    if (offset < 0) {
>> +        offset = fdt_add_subnode(fdt, 0, "persistent-memory");
>> +        _FDT(offset);
>> +        _FDT((fdt_setprop_cell(fdt, offset, "#address-cells", 0x2)));
>> +        _FDT((fdt_setprop_cell(fdt, offset, "#size-cells", 0x0)));
>> +        _FDT((fdt_setprop_string(fdt, offset, "name", "persistent-memory")));
> No need to explicitly set the "name" property, that's implicit in the
> node name.
Ok
>> +        _FDT((fdt_setprop_string(fdt, offset, "device_type",
>> +                                 "ibm,persistent-memory")));
>> +    }
>> +
>> +    /*NB : Add drc-info array here */
>> +
>> +    /* Create DT entries for cold plugged NVDIMM devices */
>> +    dimms = nvdimm_get_device_list();
>> +    for (; dimms; dimms = dimms->next) {
>> +        NVDIMMDevice *nvdimm = dimms->data;
>> +        PCDIMMDevice *di = PC_DIMM(nvdimm);
>> +        uint64_t lsize = nvdimm->label_size;
>> +        int size = object_property_get_int(OBJECT(nvdimm), PC_DIMM_SIZE_PROP,
>> +                                           NULL);
>> +
>> +        spapr_populate_nvdimm_node(fdt, offset, di->node, di->addr,
>> +                                   size, lsize);
> It might be cleaner to just pass the NVDIMMDevice * rather than
> umpteen parameters.
Ok.
>> +    }
>> +    g_slist_free(dimms);
>> +    return;
>> +}
>> +
>>   static void *spapr_build_fdt(sPAPRMachineState *spapr)
>>   {
>>       MachineState *machine = MACHINE(spapr);
>> @@ -1348,6 +1395,11 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr)
>>           exit(1);
>>       }
>>   
>> +    /* NVDIMM devices */
>> +    if (spapr->nvdimm_enabled) {
>> +        spapr_create_nvdimm(fdt);
>> +    }
>> +
>>       return fdt;
>>   }
>>   
>> @@ -3143,6 +3195,20 @@ static void spapr_set_ic_mode(Object *obj, const char *value, Error **errp)
>>       }
>>   }
>>   
>> +static bool spapr_get_nvdimm(Object *obj, Error **errp)
>> +{
>> +    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
>> +
>> +    return spapr->nvdimm_enabled;
>> +}
>> +
>> +static void spapr_set_nvdimm(Object *obj, bool value, Error **errp)
>> +{
>> +    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
>> +
>> +    spapr->nvdimm_enabled = value;
>> +}
>> +
>>   static void spapr_instance_init(Object *obj)
>>   {
>>       sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
>> @@ -3188,6 +3254,11 @@ static void spapr_instance_init(Object *obj)
>>       object_property_set_description(obj, "ic-mode",
>>                    "Specifies the interrupt controller mode (xics, xive, dual)",
>>                    NULL);
>> +    object_property_add_bool(obj, "nvdimm",
>> +                            spapr_get_nvdimm, spapr_set_nvdimm, NULL);
>> +    object_property_set_description(obj, "nvdimm",
>> +                                    "Enable support for nvdimm devices",
>> +                                    NULL);
> I'm not seeing a lot of point to this machine parameter.
I just followed what x86 does here.

>>   }
>>   
>>   static void spapr_machine_finalizefn(Object *obj)
>> @@ -3267,12 +3338,103 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t size,
>>       }
>>   }
>>   
>> +static int spapr_populate_nvdimm_node(void *fdt, int fdt_offset, uint32_t node,
>> +                                      uint64_t addr, uint64_t size,
>> +                                      uint64_t label_size)
>> +{
>> +    int offset;
>> +    char buf[40];
>> +    GString *lcode = g_string_sized_new(10);
>> +    sPAPRDRConnector *drc;
>> +    QemuUUID uuid;
>> +    uint32_t drc_idx;
>> +    uint32_t associativity[] = {
>> +        cpu_to_be32(0x4), /* length */
>> +        cpu_to_be32(0x0), cpu_to_be32(0x0),
>> +        cpu_to_be32(0x0), cpu_to_be32(node)
>> +    };
>> +
>> +    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM,
>> +                          addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
>> +    g_assert(drc);
>> +
>> +    drc_idx = spapr_drc_index(drc);
>> +
>> +    sprintf(buf, "pmem@%x", drc_idx);
>> +    offset = fdt_add_subnode(fdt, fdt_offset, buf);
> "fdt_offset" vs. "offset" isn't very obvious.  Maybe parent_offset /
> child_offset or something?
Ok
>> +    _FDT(offset);
>> +
>> +    _FDT((fdt_setprop_cell(fdt, offset, "reg", drc_idx)));
>> +    _FDT((fdt_setprop_string(fdt, offset, "compatible", "ibm,pmemory")));
>> +    _FDT((fdt_setprop_string(fdt, offset, "name", "pmem")));
> Again, no need to set "name".
Ok
>> +    _FDT((fdt_setprop_string(fdt, offset, "device_type", "ibm,pmemory")));
>> +
>> +    /*NB : Supposed to be random strings. Currently empty 10 strings! */
>> +    _FDT((fdt_setprop(fdt, offset, "ibm,loc-code", lcode->str, lcode->len)));
>> +    g_string_free(lcode, TRUE);
> I think leaving this property out would be preferable to including it
> but putting nothing useful there.
Ok.
>> +
>> +    _FDT((fdt_setprop(fdt, offset, "ibm,associativity", associativity,
>> +                      sizeof(associativity))));
>> +    g_random_set_seed(drc_idx);
>> +    qemu_uuid_generate(&uuid);
> This looks bogus.  I'm guessing the set seed is so that you generate
> consistent UUIDs for the same NVDIMM in a guest.  First, that's making
> a lot of assumptions about how qemu_uuid_generate() works that aren't
> really warranted.  Second, it poisons the RNG for anything running
> after this which actually wants (pseudo) random numbers.
>
> I think you need to make the UUID a property of the device instead.
Ok.
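
The side effect on other RNG users can be shown with a standalone sketch.
It uses the C library's srand()/rand() purely as a stand-in for the shared
generator (the patch uses g_random_set_seed(), and the function names below
are hypothetical):

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Stand-in for the pattern in the patch: reseed the shared generator
 * with a predictable value (the DRC index) to get a repeatable ID.
 */
static int derive_id_by_reseeding(unsigned int drc_idx)
{
    srand(drc_idx);
    return rand();
}

/* Stand-in for an unrelated consumer that expects unpredictable values. */
static int unrelated_random_value(void)
{
    return rand();
}
```

Any call to unrelated_random_value() that follows
derive_id_by_reseeding(42) returns the same value every time, because the
shared state was reset to a known seed. Storing the UUID as a device
property avoids touching the shared generator at all.
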
>> +
>> +    qemu_uuid_unparse(&uuid, buf);
>> +    _FDT((fdt_setprop_string(fdt, offset, "ibm,unit-guid", buf)));
>> +
>> +    _FDT((fdt_setprop_cell(fdt, offset, "ibm,my-drc-index", drc_idx)));
>> +
>> +    /*NB : What it should be? */
>> +    _FDT(fdt_setprop_cell(fdt, offset, "ibm,latency-attribute", 828));
>> +
>> +    _FDT((fdt_setprop_u64(fdt, offset, "ibm,block-size",
>> +                          SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
>> +    _FDT((fdt_setprop_u64(fdt, offset, "ibm,number-of-blocks",
>> +                          size / SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
>> +    _FDT((fdt_setprop_cell(fdt, offset, "ibm,metadata-size", label_size)));
>> +
>> +    return offset;
>> +}
>> +
>> +static void spapr_add_nvdimm(DeviceState *dev, uint64_t addr,
>> +                             uint64_t size, uint32_t node,
>> +                             Error **errp)
>> +{
>> +    sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_hotplug_handler(dev));
>> +    sPAPRDRConnector *drc;
>> +    bool hotplugged = spapr_drc_hotplugged(dev);
>> +    NVDIMMDevice *nvdimm = NVDIMM(OBJECT(dev));
>> +    void *fdt;
>> +    int fdt_offset, fdt_size;
>> +    Error *local_err = NULL;
>> +
>> +    spapr_dr_connector_new(OBJECT(spapr), TYPE_SPAPR_DRC_PMEM,
>> +                           addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
>> +    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM,
>> +                          addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
>> +    g_assert(drc);
> Creating the DRC in the hotplug path looks bogus.  Generally the DRC
> has to exist before you can even attempt to plug the device.

We don't really know how many DRCs to create. Unlike memory hotplug,
where we know how many LMBs are required to fit up to maxmem, in this
case we don't know how many NVDIMM devices a guest can have. That is the
reason I am creating the DRC on demand. I'll see if it is possible to
address this by putting a cap on the maximum number of NVDIMM devices a
guest can have.
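
A rough sketch of the capped approach, in plain C rather than QEMU's object
model (MAX_NVDIMMS, struct pmem_slot and the functions below are all
hypothetical): the connectors are created once up front, and a plug request
only claims a free one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NVDIMMS 4   /* hypothetical cap on NVDIMM devices per guest */

/* Stand-in for a pre-created PMEM DR connector. */
struct pmem_slot {
    unsigned int id;
    bool in_use;
};

static struct pmem_slot pmem_slots[MAX_NVDIMMS];

/* Done once at machine init, before any device can be plugged. */
static void pmem_slots_init(void)
{
    for (unsigned int i = 0; i < MAX_NVDIMMS; i++) {
        pmem_slots[i].id = i;
        pmem_slots[i].in_use = false;
    }
}

/* Plug path: claim an existing slot; NULL means the cap is reached. */
static struct pmem_slot *pmem_slot_claim(void)
{
    for (unsigned int i = 0; i < MAX_NVDIMMS; i++) {
        if (!pmem_slots[i].in_use) {
            pmem_slots[i].in_use = true;
            return &pmem_slots[i];
        }
    }
    return NULL;
}
```

With this layout the plug path never creates a connector; it fails cleanly
once all MAX_NVDIMMS slots are taken.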


>> +    fdt = create_device_tree(&fdt_size);
>> +    fdt_offset = spapr_populate_nvdimm_node(fdt, 0, node, addr,
>> +                                            size, nvdimm->label_size);
>> +
>> +    spapr_drc_attach(drc, dev, fdt, fdt_offset, &local_err);
>> +    if (local_err) {
>> +        error_propagate(errp, local_err);
>> +        return;
>> +    }
>> +
>> +    if (hotplugged) {
>> +        spapr_hotplug_req_add_by_index(drc);
>> +    }
>> +}
>> +
>>   static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>                                 Error **errp)
>>   {
>>       Error *local_err = NULL;
>>       sPAPRMachineState *ms = SPAPR_MACHINE(hotplug_dev);
>>       PCDIMMDevice *dimm = PC_DIMM(dev);
>> +    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
>>       uint64_t size, addr;
>>       uint32_t node;
>>   
>> @@ -3291,9 +3453,14 @@ static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>   
>>       node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP,
>>                                       &error_abort);
>> -    spapr_add_lmbs(dev, addr, size, node,
>> -                   spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
>> -                   &local_err);
>> +    if (!is_nvdimm) {
>> +        spapr_add_lmbs(dev, addr, size, node,
>> +                       spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
>> +                       &local_err);
>> +    } else {
>> +        spapr_add_nvdimm(dev, addr, size, node, &local_err);
>> +    }
>> +
>>       if (local_err) {
>>           goto out_unplug;
>>       }
>> @@ -3311,6 +3478,7 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>   {
>>       const sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(hotplug_dev);
>>       sPAPRMachineState *spapr = SPAPR_MACHINE(hotplug_dev);
>> +    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
>>       PCDIMMDevice *dimm = PC_DIMM(dev);
>>       Error *local_err = NULL;
>>       uint64_t size;
>> @@ -3328,10 +3496,30 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>           return;
>>       }
>>   
>> -    if (size % SPAPR_MEMORY_BLOCK_SIZE) {
>> +    if (!is_nvdimm && size % SPAPR_MEMORY_BLOCK_SIZE) {
>>           error_setg(errp, "Hotplugged memory size must be a multiple of "
>> -                      "%" PRIu64 " MB", SPAPR_MEMORY_BLOCK_SIZE / MiB);
>> +                          "%" PRIu64 " MB", SPAPR_MEMORY_BLOCK_SIZE / MiB);
>>           return;
>> +    } else if (is_nvdimm) {
>> +        NVDIMMDevice *nvdimm = NVDIMM(OBJECT(dev));
>> +        if ((nvdimm->label_size + size) % SPAPR_MINIMUM_SCM_BLOCK_SIZE) {
>> +            error_setg(errp, "NVDIMM memory size must be a multiple of "
>> +                       "%" PRIu64 "MB", SPAPR_MINIMUM_SCM_BLOCK_SIZE / MiB);
>> +            return;
>> +        }
>> +        if (((nvdimm->label_size + size) / SPAPR_MINIMUM_SCM_BLOCK_SIZE) == 1) {
>> +            error_setg(errp, "NVDIMM size must be atleast "
>> +                       "%" PRIu64 "MB", 2 * SPAPR_MINIMUM_SCM_BLOCK_SIZE / MiB);
>> +            return;
>> +        }
>> +
>> +        /* Align to scm block size, exclude the label */
>> +        memory_device_set_region_size(MEMORY_DEVICE(nvdimm),
>> +               QEMU_ALIGN_DOWN(size, SPAPR_MINIMUM_SCM_BLOCK_SIZE), &local_err);
>> +        if (local_err) {
>> +            error_propagate(errp, local_err);
>> +            return;
>> +        }
>>       }
>>   
>>       memdev = object_property_get_link(OBJECT(dimm), PC_DIMM_MEMDEV_PROP,
>> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
>> index 2edb7d1e9c..94ddd102cc 100644
>> --- a/hw/ppc/spapr_drc.c
>> +++ b/hw/ppc/spapr_drc.c
>> @@ -696,6 +696,16 @@ static void spapr_drc_lmb_class_init(ObjectClass *k, void *data)
>>       drck->release = spapr_lmb_release;
>>   }
>>   
>> +static void spapr_drc_pmem_class_init(ObjectClass *k, void *data)
>> +{
>> +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_CLASS(k);
>> +
>> +    drck->typeshift = SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM;
>> +    drck->typename = "MEM";
>> +    drck->drc_name_prefix = "PMEM ";
>> +    drck->release = NULL;
>> +}
>> +
>>   static const TypeInfo spapr_dr_connector_info = {
>>       .name          = TYPE_SPAPR_DR_CONNECTOR,
>>       .parent        = TYPE_DEVICE,
>> @@ -739,6 +749,12 @@ static const TypeInfo spapr_drc_lmb_info = {
>>       .class_init    = spapr_drc_lmb_class_init,
>>   };
>>   
>> +static const TypeInfo spapr_drc_pmem_info = {
>> +    .name          = TYPE_SPAPR_DRC_PMEM,
>> +    .parent        = TYPE_SPAPR_DRC_LOGICAL,
>> +    .class_init    = spapr_drc_pmem_class_init,
>> +};
>> +
>>   /* helper functions for external users */
>>   
>>   sPAPRDRConnector *spapr_drc_by_index(uint32_t index)
>> @@ -1189,6 +1205,7 @@ static void spapr_drc_register_types(void)
>>       type_register_static(&spapr_drc_cpu_info);
>>       type_register_static(&spapr_drc_pci_info);
>>       type_register_static(&spapr_drc_lmb_info);
>> +    type_register_static(&spapr_drc_pmem_info);
>>   
>>       spapr_rtas_register(RTAS_SET_INDICATOR, "set-indicator",
>>                           rtas_set_indicator);
>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>> index 32719a1b72..a4fed84346 100644
>> --- a/hw/ppc/spapr_events.c
>> +++ b/hw/ppc/spapr_events.c
>> @@ -193,6 +193,7 @@ struct rtas_event_log_v6_hp {
>>   #define RTAS_LOG_V6_HP_TYPE_SLOT                         3
>>   #define RTAS_LOG_V6_HP_TYPE_PHB                          4
>>   #define RTAS_LOG_V6_HP_TYPE_PCI                          5
>> +#define RTAS_LOG_V6_HP_TYPE_PMEM                         6
>>       uint8_t hotplug_action;
>>   #define RTAS_LOG_V6_HP_ACTION_ADD                        1
>>   #define RTAS_LOG_V6_HP_ACTION_REMOVE                     2
>> @@ -526,6 +527,9 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
>>       case SPAPR_DR_CONNECTOR_TYPE_CPU:
>>           hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_CPU;
>>           break;
>> +    case SPAPR_DR_CONNECTOR_TYPE_PMEM:
>> +        hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PMEM;
>> +        break;
>>       default:
>>           /* we shouldn't be signaling hotplug events for resources
>>            * that don't support them
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index a947a0a0dc..21a9709afe 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -187,6 +187,7 @@ struct sPAPRMachineState {
>>   
>>       bool cmd_line_caps[SPAPR_CAP_NUM];
>>       sPAPRCapabilities def, eff, mig;
>> +    bool nvdimm_enabled;
>>   };
>>   
>>   #define H_SUCCESS         0
>> @@ -798,6 +799,15 @@ int spapr_rtc_import_offset(sPAPRRTCState *rtc, int64_t legacy_offset);
>>   #define SPAPR_LMB_FLAGS_DRC_INVALID 0x00000020
>>   #define SPAPR_LMB_FLAGS_RESERVED 0x00000080
>>   
>> +/*
>> + * The nvdimm size should be aligned to SCM block size.
>> + * The SCM block size should be aligned to SPAPR_MEMORY_BLOCK_SIZE
>> + * inorder to have SCM regions not to overlap with dimm memory regions.
>> + * The SCM devices can have variable block sizes. For now, fixing the
>> + * block size to the minimum value.
>> + */
>> +#define SPAPR_MINIMUM_SCM_BLOCK_SIZE SPAPR_MEMORY_BLOCK_SIZE
>> +
>>   void spapr_do_system_reset_on_cpu(CPUState *cs, run_on_cpu_data arg);
>>   
>>   #define HTAB_SIZE(spapr)        (1ULL << ((spapr)->htab_shift))
>> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
>> index f6ff32e7e2..65925d00b1 100644
>> --- a/include/hw/ppc/spapr_drc.h
>> +++ b/include/hw/ppc/spapr_drc.h
>> @@ -70,6 +70,13 @@
>>   #define SPAPR_DRC_LMB(obj) OBJECT_CHECK(sPAPRDRConnector, (obj), \
>>                                           TYPE_SPAPR_DRC_LMB)
>>   
>> +#define TYPE_SPAPR_DRC_PMEM "spapr-drc-pmem"
>> +#define SPAPR_DRC_PMEM_GET_CLASS(obj) \
>> +        OBJECT_GET_CLASS(sPAPRDRConnectorClass, obj, TYPE_SPAPR_DRC_PMEM)
>> +#define SPAPR_DRC_PMEM_CLASS(klass) \
>> +        OBJECT_CLASS_CHECK(sPAPRDRConnectorClass, klass, TYPE_SPAPR_DRC_PMEM)
>> +#define SPAPR_DRC_PMEM(obj) OBJECT_CHECK(sPAPRDRConnector, (obj), \
>> +                                        TYPE_SPAPR_DRC_PMEM)
>>   /*
>>    * Various hotplug types managed by sPAPRDRConnector
>>    *
>> @@ -87,6 +94,7 @@ typedef enum {
>>       SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO = 3,
>>       SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI = 4,
>>       SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB = 8,
>> +    SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM = 9,
>>   } sPAPRDRConnectorTypeShift;
>>   
>>   typedef enum {
>> @@ -96,6 +104,7 @@ typedef enum {
>>       SPAPR_DR_CONNECTOR_TYPE_VIO = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO,
>>       SPAPR_DR_CONNECTOR_TYPE_PCI = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI,
>>       SPAPR_DR_CONNECTOR_TYPE_LMB = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB,
>> +    SPAPR_DR_CONNECTOR_TYPE_PMEM = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM,
>>   } sPAPRDRConnectorType;
>>   
>>   /*
>>


* Re: [Qemu-devel] [RFC PATCH 4/4] spapr: Add Hcalls to support PAPR NVDIMM device
  2019-02-12  2:28   ` David Gibson
@ 2019-02-15 11:11     ` Shivaprasad G Bhat
  2019-02-19  5:33       ` David Gibson
  0 siblings, 1 reply; 19+ messages in thread
From: Shivaprasad G Bhat @ 2019-02-15 11:11 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-devel, xiaoguangrong.eric, mst, bharata, qemu-ppc, vaibhav,
	imammedo



On 02/12/2019 07:58 AM, David Gibson wrote:
> On Tue, Feb 05, 2019 at 11:26:41PM -0600, Shivaprasad G Bhat wrote:
>> This patch implements a few of the hcalls necessary for nvdimm support.
>>
>> PAPR semantics are such that each NVDIMM device comprises multiple
>> SCM (Storage Class Memory) blocks. The guest requests the hypervisor to bind
>> each of the SCM blocks of the NVDIMM device using hcalls. There can be
>> SCM block unbind requests in case of driver errors or unplug (not supported now)
>> use cases. The NVDIMM label reads/writes are done through hcalls.
>>
>> Since each virtual NVDIMM device is divided into multiple SCM blocks, the bind,
>> unbind, and queries using hcalls on those blocks can come independently. This
>> doesn't fit well into the qemu device semantics, where the map/unmap are done
>> at the (whole) device/object level granularity. The patch doesn't actually
>> bind/unbind on hcalls but lets it happen at the object_add/del phase itself
>> instead.
>>
>> The guest kernel makes bind/unbind requests for the virtual NVDIMM device at the
>> region level granularity. Without interleaving, each virtual NVDIMM device is
>> presented as a separate region. There is no way to configure the virtual NVDIMM
>> interleaving for the guests today. So, there is no way a partial bind/unbind
>> request can come for the vNVDIMM in an hcall for a subset of SCM blocks of a
>> virtual NVDIMM. Hence it is safe to bind/unbind everything during
>> object_add/del.
> Hrm.  I don't entirely follow the above, but implementing something
> that doesn't really match the PAPR model seems like it could lead to
> problems.

In qemu, the device is mapped at the hotplug stage. However, the SCM block
map requests can come later, block by block. So, we will have to figure out
whether the NVDIMM device model is the right fit here.

With interleaving, the guest can actually send requests to bind different
blocks of different devices on demand, and thus end up with partial
mappings. But I don't see how interleaving can be supported for virtual
NVDIMMs, given that the existing support comes only from firmware
interfaces like UEFI/BIOS.

I chose this approach because virtual NVDIMM interleaving is unlikely to be
supported, so pre-mapping is safe and we can build on the existing NVDIMM
model.

>> The kernel today is not using the hcalls - h_scm_mem_query, h_scm_mem_clear,
>> h_scm_query_logical_mem_binding and h_scm_query_block_mem_binding. They are just
>> stubs in this patch.
>>
>> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
>> ---
>>   hw/ppc/spapr_hcall.c   |  230 ++++++++++++++++++++++++++++++++++++++++++++++++
>>   include/hw/ppc/spapr.h |   12 ++-
>>   2 files changed, 240 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
>> index 17bcaa3822..40553e80d6 100644
>> --- a/hw/ppc/spapr_hcall.c
>> +++ b/hw/ppc/spapr_hcall.c
>> @@ -3,11 +3,13 @@
>>   #include "sysemu/hw_accel.h"
>>   #include "sysemu/sysemu.h"
>>   #include "qemu/log.h"
>> +#include "qemu/range.h"
>>   #include "qemu/error-report.h"
>>   #include "cpu.h"
>>   #include "exec/exec-all.h"
>>   #include "helper_regs.h"
>>   #include "hw/ppc/spapr.h"
>> +#include "hw/ppc/spapr_drc.h"
>>   #include "hw/ppc/spapr_cpu_core.h"
>>   #include "mmu-hash64.h"
>>   #include "cpu-models.h"
>> @@ -16,6 +18,7 @@
>>   #include "hw/ppc/spapr_ovec.h"
>>   #include "mmu-book3s-v3.h"
>>   #include "hw/mem/memory-device.h"
>> +#include "hw/mem/nvdimm.h"
>>   
>>   struct LPCRSyncState {
>>       target_ulong value;
>> @@ -1808,6 +1811,222 @@ static target_ulong h_update_dt(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>>       return H_SUCCESS;
>>   }
>>   
>> +static target_ulong h_scm_read_metadata(PowerPCCPU *cpu,
>> +                                        sPAPRMachineState *spapr,
>> +                                        target_ulong opcode,
>> +                                        target_ulong *args)
>> +{
>> +    uint32_t drc_index = args[0];
>> +    uint64_t offset = args[1];
>> +    uint8_t numBytesToRead = args[2];
> This will truncate the argument to 8 bits _before_ you validate it,
> which doesn't seem like what you want.
I'll fix it.
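
A minimal sketch of the problem and one possible fix, assuming the length is checked at full width before it is narrowed. Both helper names are hypothetical and not part of the patch; the second one only mirrors the `uint8_t numBytesToRead = args[2]` declaration to show what gets lost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): validate the raw 64-bit hcall
 * argument before it is stored in any narrower type.
 */
static bool scm_metadata_len_valid(uint64_t len)
{
    return len == 1 || len == 2 || len == 4 || len == 8;
}

/*
 * Mirrors the declaration under review: the argument is narrowed to
 * 8 bits first, so the upper bits are gone before the check runs.
 */
static bool scm_metadata_len_valid_truncated(uint64_t raw)
{
    uint8_t len = raw;   /* e.g. 0x104 becomes 0x04 here */

    return len == 1 || len == 2 || len == 4 || len == 8;
}
```

With a guest-supplied length of 0x104, the truncating variant accepts the request as a 4-byte access, while the full-width check rejects it.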

>> +    sPAPRDRConnector *drc = spapr_drc_by_index(drc_index);
>> +    NVDIMMDevice *nvdimm = NULL;
>> +    NVDIMMClass *ddc = NULL;
>> +
>> +    if (numBytesToRead != 1 && numBytesToRead != 2 &&
>> +        numBytesToRead != 4 && numBytesToRead != 8) {
>> +        return H_P3;
>> +    }
>> +
>> +    if (offset & (numBytesToRead - 1)) {
>> +        return H_P2;
>> +    }
>> +
>> +    if (drc && spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
>> +        return H_PARAMETER;
>> +    }
>> +
>> +    nvdimm = NVDIMM(drc->dev);
>> +    ddc = NVDIMM_GET_CLASS(nvdimm);
>> +
>> +    ddc->read_label_data(nvdimm, &args[0], numBytesToRead, offset);
> Hm.  Is this the only way to access the label data, or is it also
> mapped into the guest visible address space?  I ask because some of
> the calculations you made about size+label_size in an earlier patch
> seemed to suggest it was part of the address space.
Yes, this is the only way. The label is not mapped into the guest visible
address space. You are right in pointing that out, it's a bug.
That calculation is not needed, as in the same patch I am doing
QEMU_ALIGN_DOWN(size, SPAPR_MINIMUM_SCM_BLOCK_SIZE) on the
nvdimm size in spapr_memory_pre_plug().
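
To illustrate the alignment mentioned above, here is a minimal sketch. It assumes a 256 MiB SCM block size (the value of SPAPR_MEMORY_BLOCK_SIZE in QEMU at the time of writing) and reproduces the arithmetic of QEMU_ALIGN_DOWN in a hypothetical stand-alone helper. The sizes in the usage note are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: SCM block size equals the 256 MiB sPAPR memory block size. */
#define SCM_BLOCK_SIZE (256ULL << 20)

/*
 * Hypothetical stand-in for QEMU_ALIGN_DOWN(n, m), which is defined as
 * ((n) / (m) * (m)): round n down to the nearest multiple of m.
 */
static uint64_t scm_align_down(uint64_t n, uint64_t m)
{
    return n / m * m;
}
```

For example, a usable size of 512 MiB minus a 128 KiB label area rounds down to a single 256 MiB block, whereas an exact 512 MiB stays at two blocks. Once the size is aligned this way, the label size no longer needs to enter the block count.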

>> +    return H_SUCCESS;
>> +}
>> +
>> +
>> +static target_ulong h_scm_write_metadata(PowerPCCPU *cpu,
>> +                                         sPAPRMachineState *spapr,
>> +                                         target_ulong opcode,
>> +                                         target_ulong *args)
>> +{
>> +    uint32_t drc_index = args[0];
>> +    uint64_t offset = args[1];
>> +    uint64_t data = args[2];
>> +    int8_t numBytesToWrite = args[3];
>> +    sPAPRDRConnector *drc = spapr_drc_by_index(drc_index);
>> +    NVDIMMDevice *nvdimm = NULL;
>> +    DeviceState *dev = NULL;
>> +    NVDIMMClass *ddc = NULL;
>> +
>> +    if (numBytesToWrite != 1 && numBytesToWrite != 2 &&
>> +        numBytesToWrite != 4 && numBytesToWrite != 8) {
>> +        return H_P4;
>> +    }
>> +
>> +    if (offset & (numBytesToWrite - 1)) {
>> +        return H_P2;
>> +    }
>> +
>> +    if (drc && spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
>> +        return H_PARAMETER;
>> +    }
>> +
>> +    dev = drc->dev;
>> +    nvdimm = NVDIMM(dev);
>> +    if (offset >= nvdimm->label_size) {
>> +        return H_P3;
>> +    }
>> +
>> +    ddc = NVDIMM_GET_CLASS(nvdimm);
>> +
>> +    ddc->write_label_data(nvdimm, &data, numBytesToWrite, offset);
>> +
>> +    return H_SUCCESS;
>> +}
>> +
>> +static target_ulong h_scm_bind_mem(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>> +                                        target_ulong opcode,
>> +                                        target_ulong *args)
>> +{
>> +    uint32_t drc_index = args[0];
>> +    uint64_t starting_index = args[1];
>> +    uint64_t no_of_scm_blocks_to_bind = args[2];
>> +    uint64_t target_logical_mem_addr = args[3];
>> +    uint64_t continue_token = args[4];
>> +    uint64_t size;
>> +    uint64_t total_no_of_scm_blocks;
>> +
>> +    sPAPRDRConnector *drc = spapr_drc_by_index(drc_index);
>> +    hwaddr addr;
>> +    DeviceState *dev = NULL;
>> +    PCDIMMDevice *dimm = NULL;
>> +    Error *local_err = NULL;
>> +
>> +    if (drc && spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
>> +        return H_PARAMETER;
>> +    }
>> +
>> +    dev = drc->dev;
>> +    dimm = PC_DIMM(dev);
>> +
>> +    size = object_property_get_uint(OBJECT(dimm),
>> +                                    PC_DIMM_SIZE_PROP, &local_err);
>> +    if (local_err) {
>> +        error_report_err(local_err);
>> +        return H_PARAMETER;
> This should probably be H_HARDWARE, no?  The error isn't caused by one
> of the parameters.
It's not clearly defined, so I chose H_PARAMETER to suggest the drc index
was probably wrong.
>> +    }
>> +
>> +    total_no_of_scm_blocks = size / SPAPR_MINIMUM_SCM_BLOCK_SIZE;
>> +
>> +    if (starting_index > total_no_of_scm_blocks) {
>> +        return H_P2;
>> +    }
>> +
>> +    if ((starting_index + no_of_scm_blocks_to_bind) > total_no_of_scm_blocks) {
> You should probably have a check for integer overflow here as well,
> just to be thorough.
Ok
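
One way to make the range check overflow-safe is to compare against the remaining block count instead of adding the two guest-supplied values. This is only a sketch; the helper name is hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if the block range
 * [start, start + count) lies within a device of `total` blocks.
 * The subtraction cannot wrap because start <= total is checked first,
 * so no 64-bit addition of guest-controlled values is needed.
 */
static bool scm_block_range_valid(uint64_t start, uint64_t count,
                                  uint64_t total)
{
    if (start > total) {
        return false;
    }
    return count <= total - start;
}
```

With start = 1 and count = UINT64_MAX, the sum in the original condition wraps to 0 and passes the comparison, while this form rejects the request.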
>> +        return H_P3;
>> +    }
>> +
>> +    /* Currently qemu assigns the address. */
>> +    if (target_logical_mem_addr != 0xffffffffffffffff) {
>> +        return H_OVERLAP;
>> +    }
>> +
>> +    /*
>> +     * Currently continue token should be zero qemu has already bound
>> +     * everything and this hcall doesnt return H_BUSY.
>> +     */
>> +    if (continue_token > 0) {
>> +        return H_P5;
>> +    }
>> +
>> +    /* NB : Already bound, Return target logical address in R4 */
>> +    addr = object_property_get_uint(OBJECT(dimm),
>> +                                    PC_DIMM_ADDR_PROP, &local_err);
>> +    if (local_err) {
>> +        error_report_err(local_err);
>> +        return H_PARAMETER;
>> +    }
>> +
>> +    args[1] = addr;
>> +
>> +    return H_SUCCESS;
>> +}
>> +
>> +static target_ulong h_scm_unbind_mem(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>> +                                        target_ulong opcode,
>> +                                        target_ulong *args)
>> +{
>> +    uint64_t starting_scm_logical_addr = args[0];
>> +    uint64_t no_of_scm_blocks_to_unbind = args[1];
>> +    uint64_t size_to_unbind;
>> +    uint64_t continue_token = args[2];
>> +    Range as = range_empty;
>> +    GSList *dimms = NULL;
>> +    bool valid = false;
>> +
>> +    size_to_unbind = no_of_scm_blocks_to_unbind * SPAPR_MINIMUM_SCM_BLOCK_SIZE;
>> +
>> +    /* Check if starting_scm_logical_addr is block aligned */
>> +    if (!QEMU_IS_ALIGNED(starting_scm_logical_addr,
>> +                         SPAPR_MINIMUM_SCM_BLOCK_SIZE)) {
>> +        return H_PARAMETER;
>> +    }
>> +
>> +    range_init_nofail(&as, starting_scm_logical_addr, size_to_unbind);
>> +
>> +    dimms = nvdimm_get_device_list();
>> +    for (; dimms; dimms = dimms->next) {
>> +        NVDIMMDevice *nvdimm = dimms->data;
>> +        Range tmp;
>> +        int size = object_property_get_int(OBJECT(nvdimm), PC_DIMM_SIZE_PROP,
>> +                                           NULL);
>> +        int addr = object_property_get_int(OBJECT(nvdimm), PC_DIMM_ADDR_PROP,
>> +                                           NULL);
>> +        range_init_nofail(&tmp, addr, size);
>> +
>> +        if (range_contains_range(&tmp, &as)) {
>> +            valid = true;
>> +            break;
>> +        }
>> +    }
>> +
>> +    if (!valid) {
>> +        return H_P2;
>> +    }
>> +
>> +    if (continue_token > 0) {
>> +        return H_P3;
>> +    }
>> +
>> +    /*NB : dont do anything, let object_del take care of this for now. */
>> +
>> +    return H_SUCCESS;
>> +}
>> +
>> +static target_ulong h_scm_query_block_mem_binding(PowerPCCPU *cpu,
>> +                                                  sPAPRMachineState *spapr,
>> +                                                  target_ulong opcode,
>> +                                                  target_ulong *args)
>> +{
>> +    return H_SUCCESS;
>> +}
>> +
>> +static target_ulong h_scm_query_logical_mem_binding(PowerPCCPU *cpu,
>> +                                                    sPAPRMachineState *spapr,
>> +                                                    target_ulong opcode,
>> +                                                    target_ulong *args)
>> +{
>> +    return H_SUCCESS;
>> +}
>> +
>> +static target_ulong h_scm_mem_query(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>> +                                        target_ulong opcode,
>> +                                        target_ulong *args)
>> +{
>> +    return H_SUCCESS;
>> +}
>> +
>>   static spapr_hcall_fn papr_hypercall_table[(MAX_HCALL_OPCODE / 4) + 1];
>>   static spapr_hcall_fn kvmppc_hypercall_table[KVMPPC_HCALL_MAX - KVMPPC_HCALL_BASE + 1];
>>   
>> @@ -1907,6 +2126,17 @@ static void hypercall_register_types(void)
>>       /* qemu/KVM-PPC specific hcalls */
>>       spapr_register_hypercall(KVMPPC_H_RTAS, h_rtas);
>>   
>> +    /* qemu/scm specific hcalls */
>> +    spapr_register_hypercall(H_SCM_READ_METADATA, h_scm_read_metadata);
>> +    spapr_register_hypercall(H_SCM_WRITE_METADATA, h_scm_write_metadata);
>> +    spapr_register_hypercall(H_SCM_BIND_MEM, h_scm_bind_mem);
>> +    spapr_register_hypercall(H_SCM_UNBIND_MEM, h_scm_unbind_mem);
>> +    spapr_register_hypercall(H_SCM_QUERY_BLOCK_MEM_BINDING,
>> +                             h_scm_query_block_mem_binding);
>> +    spapr_register_hypercall(H_SCM_QUERY_LOGICAL_MEM_BINDING,
>> +                             h_scm_query_logical_mem_binding);
>> +    spapr_register_hypercall(H_SCM_MEM_QUERY, h_scm_mem_query);
>> +
>>       /* ibm,client-architecture-support support */
>>       spapr_register_hypercall(KVMPPC_H_CAS, h_client_architecture_support);
>>   
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 21a9709afe..28249567f4 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -268,6 +268,7 @@ struct sPAPRMachineState {
>>   #define H_P7              -60
>>   #define H_P8              -61
>>   #define H_P9              -62
>> +#define H_OVERLAP         -68
>>   #define H_UNSUPPORTED_FLAG -256
>>   #define H_MULTI_THREADS_ACTIVE -9005
>>   
>> @@ -473,8 +474,15 @@ struct sPAPRMachineState {
>>   #define H_INT_ESB               0x3C8
>>   #define H_INT_SYNC              0x3CC
>>   #define H_INT_RESET             0x3D0
>> -
>> -#define MAX_HCALL_OPCODE        H_INT_RESET
>> +#define H_SCM_READ_METADATA     0x3E4
>> +#define H_SCM_WRITE_METADATA     0x3E8
>> +#define H_SCM_BIND_MEM          0x3EC
>> +#define H_SCM_UNBIND_MEM        0x3F0
>> +#define H_SCM_QUERY_BLOCK_MEM_BINDING 0x3F4
>> +#define H_SCM_QUERY_LOGICAL_MEM_BINDING 0x3F8
>> +#define H_SCM_MEM_QUERY         0x3FC
>> +
>> +#define MAX_HCALL_OPCODE        H_SCM_MEM_QUERY
>>   
>>   /* The hcalls above are standardized in PAPR and implemented by pHyp
>>    * as well.
>>


* Re: [Qemu-devel] [RFC PATCH 3/4] spapr: Add NVDIMM device support
  2019-02-15 11:11     ` Shivaprasad G Bhat
@ 2019-02-17 23:02       ` David Gibson
  2019-02-18 16:15         ` Shivaprasad G Bhat
  0 siblings, 1 reply; 19+ messages in thread
From: David Gibson @ 2019-02-17 23:02 UTC (permalink / raw)
  To: Shivaprasad G Bhat
  Cc: qemu-devel, xiaoguangrong.eric, mst, bharata, qemu-ppc, vaibhav,
	imammedo


On Fri, Feb 15, 2019 at 04:41:09PM +0530, Shivaprasad G Bhat wrote:
> Thanks for the comments David. Please find my replies inline..
> 
> 
> On 02/12/2019 07:19 AM, David Gibson wrote:
> > On Tue, Feb 05, 2019 at 11:26:27PM -0600, Shivaprasad G Bhat wrote:
> > > Add support for NVDIMM devices for sPAPR. Piggyback on existing nvdimm
> > > device interface in QEMU to support virtual NVDIMM devices for Power (May have
> > > to re-look at this later).  Create the required DT entries for the
> > > device (some entries have dummy values right now).
> > > 
> > > The patch creates the required DT node and sends a hotplug
> > > interrupt to the guest. Guest is expected to undertake the normal
> > > DR resource add path in response and start issuing PAPR SCM hcalls.
> > > 
> > > This is how it can be used ..
> > > Add nvdimm=on to the qemu machine argument.
> > > Ex : -machine pseries,nvdimm=on
> > > For coldplug, the device to be added in qemu command line as shown below
> > > -object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0.img,share=yes,size=512m
> > > -device nvdimm,label-size=128k,memdev=memnvdimm0,id=nvdimm0,slot=0
> > > 
> > > For hotplug, the device to be added from monitor as below
> > > object_add memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0.img,share=yes,size=512m
> > > device_add nvdimm,label-size=128k,memdev=memnvdimm0,id=nvdimm0,slot=0
> > > 
> > > Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
> > > Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
> > >                 [Early implementation]
> > > ---
> > >   default-configs/ppc64-softmmu.mak |    1
> > >   hw/ppc/spapr.c                    |  212 +++++++++++++++++++++++++++++++++++--
> > >   hw/ppc/spapr_drc.c                |   17 +++
> > >   hw/ppc/spapr_events.c             |    4 +
> > >   include/hw/ppc/spapr.h            |   10 ++
> > >   include/hw/ppc/spapr_drc.h        |    9 ++
> > >   6 files changed, 241 insertions(+), 12 deletions(-)
> > > 
> > > diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
> > > index 7f34ad0528..b6e1aa5125 100644
> > > --- a/default-configs/ppc64-softmmu.mak
> > > +++ b/default-configs/ppc64-softmmu.mak
> > > @@ -20,4 +20,5 @@ CONFIG_XIVE=$(CONFIG_PSERIES)
> > >   CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
> > >   CONFIG_MEM_DEVICE=y
> > >   CONFIG_DIMM=y
> > > +CONFIG_NVDIMM=y
> > >   CONFIG_SPAPR_RNG=y
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index 0fcdd35cbe..7e7a1a8041 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -73,6 +73,7 @@
> > >   #include "qemu/cutils.h"
> > >   #include "hw/ppc/spapr_cpu_core.h"
> > >   #include "hw/mem/memory-device.h"
> > > +#include "hw/mem/nvdimm.h"
> > >   #include <libfdt.h>
> > > @@ -690,6 +691,7 @@ static int spapr_populate_drmem_v2(sPAPRMachineState *spapr, void *fdt,
> > >       uint8_t *int_buf, *cur_index, buf_len;
> > >       int ret;
> > >       uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
> > > +    uint64_t scm_block_size = SPAPR_MINIMUM_SCM_BLOCK_SIZE;
> > >       uint64_t addr, cur_addr, size;
> > >       uint32_t nr_boot_lmbs = (machine->device_memory->base / lmb_size);
> > >       uint64_t mem_end = machine->device_memory->base +
> > > @@ -726,15 +728,24 @@ static int spapr_populate_drmem_v2(sPAPRMachineState *spapr, void *fdt,
> > >               nr_entries++;
> > >           }
> > > -        /* Entry for DIMM */
> > > -        drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
> > > -        g_assert(drc);
> > > -        elem = spapr_get_drconf_cell(size / lmb_size, addr,
> > > -                                     spapr_drc_index(drc), node,
> > > -                                     SPAPR_LMB_FLAGS_ASSIGNED);
> > > +        if (info->value->type == MEMORY_DEVICE_INFO_KIND_NVDIMM) {
> > > +            /* Entry for NVDIMM */
> > > +            drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM, addr / scm_block_size);
> > > +            g_assert(drc);
> > > +            elem = spapr_get_drconf_cell(size / scm_block_size, addr,
> > > +                                         spapr_drc_index(drc), -1, 0);
> > > +            cur_addr = ROUND_UP(addr + size, scm_block_size);
> > > +        } else {
> > > +            /* Entry for DIMM */
> > > +            drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
> > > +            g_assert(drc);
> > > +            elem = spapr_get_drconf_cell(size / lmb_size, addr,
> > > +                                         spapr_drc_index(drc), node,
> > > +                                         SPAPR_LMB_FLAGS_ASSIGNED);
> > > +            cur_addr = addr + size;
> > > +        }
> > >           QSIMPLEQ_INSERT_TAIL(&drconf_queue, elem, entry);
> > >           nr_entries++;
> > > -        cur_addr = addr + size;
> > >       }
> > >       /* Entry for remaining hotpluggable area */
> > > @@ -1225,6 +1236,42 @@ static void spapr_dt_hypervisor(sPAPRMachineState *spapr, void *fdt)
> > >       }
> > >   }
> > > +static int spapr_populate_nvdimm_node(void *fdt, int fdt_offset,
> > > +                                      uint32_t node, uint64_t addr,
> > > +                                      uint64_t size, uint64_t label_size);
> > Re-ordering the code is generally preferred to static forward declarations.
> Ok
> > > +static void spapr_create_nvdimm(void *fdt)
> > I'm trying to standardize on spapr_dt_*() for functions which generate
> > bits of the device tree.
> Ok. Will rename to spapr_dt_create_nvdimm

Just spapr_dt_nvdimm() would be preferred.

> > > +{
> > > +    int offset = fdt_subnode_offset(fdt, 0, "persistent-memory");
> > > +    GSList *dimms = NULL;
> > > +
> > > +    if (offset < 0) {
> > > +        offset = fdt_add_subnode(fdt, 0, "persistent-memory");
> > > +        _FDT(offset);
> > > +        _FDT((fdt_setprop_cell(fdt, offset, "#address-cells", 0x2)));
> > > +        _FDT((fdt_setprop_cell(fdt, offset, "#size-cells", 0x0)));
> > > +        _FDT((fdt_setprop_string(fdt, offset, "name", "persistent-memory")));
> > No need to explicitly set the "name" property, that's implicit in the
> > node name.
> Ok
> > > +        _FDT((fdt_setprop_string(fdt, offset, "device_type",
> > > +                                 "ibm,persistent-memory")));
> > > +    }
> > > +
> > > +    /*NB : Add drc-info array here */
> > > +
> > > +    /* Create DT entries for cold plugged NVDIMM devices */
> > > +    dimms = nvdimm_get_device_list();
> > > +    for (; dimms; dimms = dimms->next) {
> > > +        NVDIMMDevice *nvdimm = dimms->data;
> > > +        PCDIMMDevice *di = PC_DIMM(nvdimm);
> > > +        uint64_t lsize = nvdimm->label_size;
> > > +        int size = object_property_get_int(OBJECT(nvdimm), PC_DIMM_SIZE_PROP,
> > > +                                           NULL);
> > > +
> > > +        spapr_populate_nvdimm_node(fdt, offset, di->node, di->addr,
> > > +                                   size, lsize);
> > It might be cleaner to just pass the NVDIMMDevice * rather than
> > umpteen parameters.
> Ok.
> > > +    }
> > > +    g_slist_free(dimms);
> > > +    return;
> > > +}
> > > +
> > >   static void *spapr_build_fdt(sPAPRMachineState *spapr)
> > >   {
> > >       MachineState *machine = MACHINE(spapr);
> > > @@ -1348,6 +1395,11 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr)
> > >           exit(1);
> > >       }
> > > +    /* NVDIMM devices */
> > > +    if (spapr->nvdimm_enabled) {
> > > +        spapr_create_nvdimm(fdt);
> > > +    }
> > > +
> > >       return fdt;
> > >   }
> > > @@ -3143,6 +3195,20 @@ static void spapr_set_ic_mode(Object *obj, const char *value, Error **errp)
> > >       }
> > >   }
> > > +static bool spapr_get_nvdimm(Object *obj, Error **errp)
> > > +{
> > > +    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> > > +
> > > +    return spapr->nvdimm_enabled;
> > > +}
> > > +
> > > +static void spapr_set_nvdimm(Object *obj, bool value, Error **errp)
> > > +{
> > > +    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> > > +
> > > +    spapr->nvdimm_enabled = value;
> > > +}
> > > +
> > >   static void spapr_instance_init(Object *obj)
> > >   {
> > >       sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> > > @@ -3188,6 +3254,11 @@ static void spapr_instance_init(Object *obj)
> > >       object_property_set_description(obj, "ic-mode",
> > >                    "Specifies the interrupt controller mode (xics, xive, dual)",
> > >                    NULL);
> > > +    object_property_add_bool(obj, "nvdimm",
> > > +                            spapr_get_nvdimm, spapr_set_nvdimm, NULL);
> > > +    object_property_set_description(obj, "nvdimm",
> > > +                                    "Enable support for nvdimm devices",
> > > +                                    NULL);
> > I'm not seeing a lot of point to this machine parameter.
> Just followed what the x86 is doing here.

Hmm.  I wonder what the rationale for the property is there.

> > >   }
> > >   static void spapr_machine_finalizefn(Object *obj)
> > > @@ -3267,12 +3338,103 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t size,
> > >       }
> > >   }
> > > +static int spapr_populate_nvdimm_node(void *fdt, int fdt_offset, uint32_t node,
> > > +                                      uint64_t addr, uint64_t size,
> > > +                                      uint64_t label_size)
> > > +{
> > > +    int offset;
> > > +    char buf[40];
> > > +    GString *lcode = g_string_sized_new(10);
> > > +    sPAPRDRConnector *drc;
> > > +    QemuUUID uuid;
> > > +    uint32_t drc_idx;
> > > +    uint32_t associativity[] = {
> > > +        cpu_to_be32(0x4), /* length */
> > > +        cpu_to_be32(0x0), cpu_to_be32(0x0),
> > > +        cpu_to_be32(0x0), cpu_to_be32(node)
> > > +    };
> > > +
> > > +    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM,
> > > +                          addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
> > > +    g_assert(drc);
> > > +
> > > +    drc_idx = spapr_drc_index(drc);
> > > +
> > > +    sprintf(buf, "pmem@%x", drc_idx);
> > > +    offset = fdt_add_subnode(fdt, fdt_offset, buf);
> > "fdt_offset" vs. "offset" isn't very obvious.  Maybe parent_offset /
> > child_offset or something?
> Ok
> > > +    _FDT(offset);
> > > +
> > > +    _FDT((fdt_setprop_cell(fdt, offset, "reg", drc_idx)));
> > > +    _FDT((fdt_setprop_string(fdt, offset, "compatible", "ibm,pmemory")));
> > > +    _FDT((fdt_setprop_string(fdt, offset, "name", "pmem")));
> > Again, no need to set "name".
> Ok
> > > +    _FDT((fdt_setprop_string(fdt, offset, "device_type", "ibm,pmemory")));
> > > +
> > > +    /*NB : Supposed to be random strings. Currently empty 10 strings! */
> > > +    _FDT((fdt_setprop(fdt, offset, "ibm,loc-code", lcode->str, lcode->len)));
> > > +    g_string_free(lcode, TRUE);
> > I think leaving this property out would be preferable to including it
> > but putting nothing useful there.
> Ok.
> > > +
> > > +    _FDT((fdt_setprop(fdt, offset, "ibm,associativity", associativity,
> > > +                      sizeof(associativity))));
> > > +    g_random_set_seed(drc_idx);
> > > +    qemu_uuid_generate(&uuid);
> > This looks bogus.  I'm guessing the set seed is so that you generate
> > consistent UUIDs for the same NVDIMM in a guest.  First, that's making
> > a lot of assumptions about how qemu_uuid_generate() works that aren't
> > really warranted.  Second, it poisons the RNG for anything running
> > after this which actually wants (pseudo) random numbers.
> > 
> > I think you need to make the UUID a property of the device instead.
> Ok.
> > > +
> > > +    qemu_uuid_unparse(&uuid, buf);
> > > +    _FDT((fdt_setprop_string(fdt, offset, "ibm,unit-guid", buf)));
> > > +
> > > +    _FDT((fdt_setprop_cell(fdt, offset, "ibm,my-drc-index", drc_idx)));
> > > +
> > > +    /*NB : What it should be? */
> > > +    _FDT(fdt_setprop_cell(fdt, offset, "ibm,latency-attribute", 828));
> > > +
> > > +    _FDT((fdt_setprop_u64(fdt, offset, "ibm,block-size",
> > > +                          SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
> > > +    _FDT((fdt_setprop_u64(fdt, offset, "ibm,number-of-blocks",
> > > +                          size / SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
> > > +    _FDT((fdt_setprop_cell(fdt, offset, "ibm,metadata-size", label_size)));
> > > +
> > > +    return offset;
> > > +}
> > > +
> > > +static void spapr_add_nvdimm(DeviceState *dev, uint64_t addr,
> > > +                             uint64_t size, uint32_t node,
> > > +                             Error **errp)
> > > +{
> > > +    sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_hotplug_handler(dev));
> > > +    sPAPRDRConnector *drc;
> > > +    bool hotplugged = spapr_drc_hotplugged(dev);
> > > +    NVDIMMDevice *nvdimm = NVDIMM(OBJECT(dev));
> > > +    void *fdt;
> > > +    int fdt_offset, fdt_size;
> > > +    Error *local_err = NULL;
> > > +
> > > +    spapr_dr_connector_new(OBJECT(spapr), TYPE_SPAPR_DRC_PMEM,
> > > +                           addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
> > > +    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM,
> > > +                          addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
> > > +    g_assert(drc);
> > Creating the DRC in the hotplug path looks bogus.  Generally the DRC
> > has to exist before you can even attempt to plug the device.
> 
> We don't really know how many DRCs to create. Unlike memory hotplug,
> where we know how many LMBs are required to fit up to maxmem, in this
> case we don't know how many NVDIMM devices a guest can have. That is the
> reason I am creating the DRC on demand. I'll see if it is possible to
> address this by putting a cap on the maximum number of NVDIMM devices a
> guest can have.

Urgh, PAPR.  First it specifies a crappy hotplug model that requires
zillions of fixed attachment points to be instantiated, then it breaks
its own model.

But.. I still don't really understand how this works.

a) How does the guest know the DRC index to use for the new NVDIMM?
   Generally that comes from the device tree, but the guest doesn't
   get new device tree information until it calls configure-connector
   for which it needs the DRC index.

b) AFAICT, NVDIMMs would also require HPT space, much like regular
   memory would.  PowerVM doesn't have HPT resizing, so surely it must
   already have some sort of cap on the amount of NVDIMM space in
   order to size the HPT correctly.
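
To make point (b) concrete, here is a minimal sketch of how an HPT size
could be derived from the largest amount of memory a guest may ever map.
It is loosely modelled on the sizing rule QEMU uses for pseries RAM (a
table of about 1/128 of the memory size, clamped to 2^18..2^46 bytes);
the ratio, the clamps and all function names below are assumptions made
for illustration, not code from this series.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to the next power of two (v must be in 1 .. 2^63). */
static uint64_t pow2ceil_u64(uint64_t v)
{
    uint64_t p = 1;

    assert(v != 0 && v <= (1ULL << 63));
    while (p < v) {
        p <<= 1;
    }
    return p;
}

/* Return log2(p) for a power of two p. */
static int log2_pow2(uint64_t p)
{
    int shift = 0;

    while (p > 1) {
        p >>= 1;
        shift++;
    }
    return shift;
}

/*
 * Hypothetical helper: HPT shift for 'bytes' of mappable memory.
 * Assumes a table of 1/128 of the (rounded up) size, clamped to an
 * assumed architected range of 2^18 .. 2^46 bytes.
 */
static int hpt_shift_for_size(uint64_t bytes)
{
    int shift = log2_pow2(pow2ceil_u64(bytes)) - 7;

    if (shift < 18) {
        shift = 18;
    }
    if (shift > 46) {
        shift = 46;
    }
    return shift;
}
```

With these assumptions, 32 GiB of RAM gives a shift of 28 (a 256 MiB
table), while 32 GiB of RAM plus 1 TiB of NVDIMM space gives 34. Without
HPT resizing the larger figure has to be known when the table is
allocated, which is why some upper bound on NVDIMM space seems
unavoidable.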


> > > +    fdt = create_device_tree(&fdt_size);
> > > +    fdt_offset = spapr_populate_nvdimm_node(fdt, 0, node, addr,
> > > +                                            size, nvdimm->label_size);
> > > +
> > > +    spapr_drc_attach(drc, dev, fdt, fdt_offset, &local_err);
> > > +    if (local_err) {
> > > +        error_propagate(errp, local_err);
> > > +        return;
> > > +    }
> > > +
> > > +    if (hotplugged) {
> > > +        spapr_hotplug_req_add_by_index(drc);
> > > +    }
> > > +}
> > > +
> > >   static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > >                                 Error **errp)
> > >   {
> > >       Error *local_err = NULL;
> > >       sPAPRMachineState *ms = SPAPR_MACHINE(hotplug_dev);
> > >       PCDIMMDevice *dimm = PC_DIMM(dev);
> > > +    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
> > >       uint64_t size, addr;
> > >       uint32_t node;
> > > @@ -3291,9 +3453,14 @@ static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > >       node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP,
> > >                                       &error_abort);
> > > -    spapr_add_lmbs(dev, addr, size, node,
> > > -                   spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
> > > -                   &local_err);
> > > +    if (!is_nvdimm) {
> > > +        spapr_add_lmbs(dev, addr, size, node,
> > > +                       spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
> > > +                       &local_err);
> > > +    } else {
> > > +        spapr_add_nvdimm(dev, addr, size, node, &local_err);
> > > +    }
> > > +
> > >       if (local_err) {
> > >           goto out_unplug;
> > >       }
> > > @@ -3311,6 +3478,7 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > >   {
> > >       const sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(hotplug_dev);
> > >       sPAPRMachineState *spapr = SPAPR_MACHINE(hotplug_dev);
> > > +    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
> > >       PCDIMMDevice *dimm = PC_DIMM(dev);
> > >       Error *local_err = NULL;
> > >       uint64_t size;
> > > @@ -3328,10 +3496,30 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > >           return;
> > >       }
> > > -    if (size % SPAPR_MEMORY_BLOCK_SIZE) {
> > > +    if (!is_nvdimm && size % SPAPR_MEMORY_BLOCK_SIZE) {
> > >           error_setg(errp, "Hotplugged memory size must be a multiple of "
> > > -                      "%" PRIu64 " MB", SPAPR_MEMORY_BLOCK_SIZE / MiB);
> > > +                          "%" PRIu64 " MB", SPAPR_MEMORY_BLOCK_SIZE / MiB);
> > >           return;
> > > +    } else if (is_nvdimm) {
> > > +        NVDIMMDevice *nvdimm = NVDIMM(OBJECT(dev));
> > > +        if ((nvdimm->label_size + size) % SPAPR_MINIMUM_SCM_BLOCK_SIZE) {
> > > +            error_setg(errp, "NVDIMM memory size must be a multiple of "
> > > +                       "%" PRIu64 "MB", SPAPR_MINIMUM_SCM_BLOCK_SIZE / MiB);
> > > +            return;
> > > +        }
> > > +        if (((nvdimm->label_size + size) / SPAPR_MINIMUM_SCM_BLOCK_SIZE) == 1) {
> > > +            error_setg(errp, "NVDIMM size must be atleast "
> > > +                       "%" PRIu64 "MB", 2 * SPAPR_MINIMUM_SCM_BLOCK_SIZE / MiB);
> > > +            return;
> > > +        }
> > > +
> > > +        /* Align to scm block size, exclude the label */
> > > +        memory_device_set_region_size(MEMORY_DEVICE(nvdimm),
> > > +               QEMU_ALIGN_DOWN(size, SPAPR_MINIMUM_SCM_BLOCK_SIZE), &local_err);
> > > +        if (local_err) {
> > > +            error_propagate(errp, local_err);
> > > +            return;
> > > +        }
> > >       }
> > >       memdev = object_property_get_link(OBJECT(dimm), PC_DIMM_MEMDEV_PROP,
> > > diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> > > index 2edb7d1e9c..94ddd102cc 100644
> > > --- a/hw/ppc/spapr_drc.c
> > > +++ b/hw/ppc/spapr_drc.c
> > > @@ -696,6 +696,16 @@ static void spapr_drc_lmb_class_init(ObjectClass *k, void *data)
> > >       drck->release = spapr_lmb_release;
> > >   }
> > > +static void spapr_drc_pmem_class_init(ObjectClass *k, void *data)
> > > +{
> > > +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_CLASS(k);
> > > +
> > > +    drck->typeshift = SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM;
> > > +    drck->typename = "MEM";
> > > +    drck->drc_name_prefix = "PMEM ";
> > > +    drck->release = NULL;
> > > +}
> > > +
> > >   static const TypeInfo spapr_dr_connector_info = {
> > >       .name          = TYPE_SPAPR_DR_CONNECTOR,
> > >       .parent        = TYPE_DEVICE,
> > > @@ -739,6 +749,12 @@ static const TypeInfo spapr_drc_lmb_info = {
> > >       .class_init    = spapr_drc_lmb_class_init,
> > >   };
> > > +static const TypeInfo spapr_drc_pmem_info = {
> > > +    .name          = TYPE_SPAPR_DRC_PMEM,
> > > +    .parent        = TYPE_SPAPR_DRC_LOGICAL,
> > > +    .class_init    = spapr_drc_pmem_class_init,
> > > +};
> > > +
> > >   /* helper functions for external users */
> > >   sPAPRDRConnector *spapr_drc_by_index(uint32_t index)
> > > @@ -1189,6 +1205,7 @@ static void spapr_drc_register_types(void)
> > >       type_register_static(&spapr_drc_cpu_info);
> > >       type_register_static(&spapr_drc_pci_info);
> > >       type_register_static(&spapr_drc_lmb_info);
> > > +    type_register_static(&spapr_drc_pmem_info);
> > >       spapr_rtas_register(RTAS_SET_INDICATOR, "set-indicator",
> > >                           rtas_set_indicator);
> > > diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> > > index 32719a1b72..a4fed84346 100644
> > > --- a/hw/ppc/spapr_events.c
> > > +++ b/hw/ppc/spapr_events.c
> > > @@ -193,6 +193,7 @@ struct rtas_event_log_v6_hp {
> > >   #define RTAS_LOG_V6_HP_TYPE_SLOT                         3
> > >   #define RTAS_LOG_V6_HP_TYPE_PHB                          4
> > >   #define RTAS_LOG_V6_HP_TYPE_PCI                          5
> > > +#define RTAS_LOG_V6_HP_TYPE_PMEM                         6
> > >       uint8_t hotplug_action;
> > >   #define RTAS_LOG_V6_HP_ACTION_ADD                        1
> > >   #define RTAS_LOG_V6_HP_ACTION_REMOVE                     2
> > > @@ -526,6 +527,9 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
> > >       case SPAPR_DR_CONNECTOR_TYPE_CPU:
> > >           hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_CPU;
> > >           break;
> > > +    case SPAPR_DR_CONNECTOR_TYPE_PMEM:
> > > +        hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PMEM;
> > > +        break;
> > >       default:
> > >           /* we shouldn't be signaling hotplug events for resources
> > >            * that don't support them
> > > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > > index a947a0a0dc..21a9709afe 100644
> > > --- a/include/hw/ppc/spapr.h
> > > +++ b/include/hw/ppc/spapr.h
> > > @@ -187,6 +187,7 @@ struct sPAPRMachineState {
> > >       bool cmd_line_caps[SPAPR_CAP_NUM];
> > >       sPAPRCapabilities def, eff, mig;
> > > +    bool nvdimm_enabled;
> > >   };
> > >   #define H_SUCCESS         0
> > > @@ -798,6 +799,15 @@ int spapr_rtc_import_offset(sPAPRRTCState *rtc, int64_t legacy_offset);
> > >   #define SPAPR_LMB_FLAGS_DRC_INVALID 0x00000020
> > >   #define SPAPR_LMB_FLAGS_RESERVED 0x00000080
> > > +/*
> > > + * The nvdimm size should be aligned to SCM block size.
> > > + * The SCM block size should be aligned to SPAPR_MEMORY_BLOCK_SIZE
> > > + * inorder to have SCM regions not to overlap with dimm memory regions.
> > > + * The SCM devices can have variable block sizes. For now, fixing the
> > > + * block size to the minimum value.
> > > + */
> > > +#define SPAPR_MINIMUM_SCM_BLOCK_SIZE SPAPR_MEMORY_BLOCK_SIZE
> > > +
> > >   void spapr_do_system_reset_on_cpu(CPUState *cs, run_on_cpu_data arg);
> > >   #define HTAB_SIZE(spapr)        (1ULL << ((spapr)->htab_shift))
> > > diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> > > index f6ff32e7e2..65925d00b1 100644
> > > --- a/include/hw/ppc/spapr_drc.h
> > > +++ b/include/hw/ppc/spapr_drc.h
> > > @@ -70,6 +70,13 @@
> > >   #define SPAPR_DRC_LMB(obj) OBJECT_CHECK(sPAPRDRConnector, (obj), \
> > >                                           TYPE_SPAPR_DRC_LMB)
> > > +#define TYPE_SPAPR_DRC_PMEM "spapr-drc-pmem"
> > > +#define SPAPR_DRC_PMEM_GET_CLASS(obj) \
> > > +        OBJECT_GET_CLASS(sPAPRDRConnectorClass, obj, TYPE_SPAPR_DRC_PMEM)
> > > +#define SPAPR_DRC_PMEM_CLASS(klass) \
> > > +        OBJECT_CLASS_CHECK(sPAPRDRConnectorClass, klass, TYPE_SPAPR_DRC_PMEM)
> > > +#define SPAPR_DRC_PMEM(obj) OBJECT_CHECK(sPAPRDRConnector, (obj), \
> > > +                                        TYPE_SPAPR_DRC_PMEM)
> > >   /*
> > >    * Various hotplug types managed by sPAPRDRConnector
> > >    *
> > > @@ -87,6 +94,7 @@ typedef enum {
> > >       SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO = 3,
> > >       SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI = 4,
> > >       SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB = 8,
> > > +    SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM = 9,
> > >   } sPAPRDRConnectorTypeShift;
> > >   typedef enum {
> > > @@ -96,6 +104,7 @@ typedef enum {
> > >       SPAPR_DR_CONNECTOR_TYPE_VIO = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO,
> > >       SPAPR_DR_CONNECTOR_TYPE_PCI = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI,
> > >       SPAPR_DR_CONNECTOR_TYPE_LMB = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB,
> > > +    SPAPR_DR_CONNECTOR_TYPE_PMEM = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM,
> > >   } sPAPRDRConnectorType;
> > >   /*
> > > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 3/4] spapr: Add NVDIMM device support
  2019-02-17 23:02       ` David Gibson
@ 2019-02-18 16:15         ` Shivaprasad G Bhat
  2019-02-27  4:27           ` David Gibson
  0 siblings, 1 reply; 19+ messages in thread
From: Shivaprasad G Bhat @ 2019-02-18 16:15 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-devel, xiaoguangrong.eric, mst, bharata, qemu-ppc, vaibhav,
	imammedo



On 02/18/2019 04:32 AM, David Gibson wrote:
> On Fri, Feb 15, 2019 at 04:41:09PM +0530, Shivaprasad G Bhat wrote:
>> Thanks for the comments David. Please find my replies inline..
>>
>>
>> On 02/12/2019 07:19 AM, David Gibson wrote:
>>> On Tue, Feb 05, 2019 at 11:26:27PM -0600, Shivaprasad G Bhat wrote:
>>>> Add support for NVDIMM devices for sPAPR. Piggyback on existing nvdimm
>>>> device interface in QEMU to support virtual NVDIMM devices for Power (May have
>>>> to re-look at this later).  Create the required DT entries for the
>>>> device (some entries have dummy values right now).
>>>>
>>>> The patch creates the required DT node and sends a hotplug
>>>> interrupt to the guest. Guest is expected to undertake the normal
>>>> DR resource add path in response and start issuing PAPR SCM hcalls.
>>>>
>>>> This is how it can be used ..
>>>> Add nvdimm=on to the qemu machine argument.
>>>> Ex : -machine pseries,nvdimm=on
>>>> For coldplug, the device to be added in qemu command line as shown below
>>>> -object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0.img,share=yes,size=512m
>>>> -device nvdimm,label-size=128k,memdev=memnvdimm0,id=nvdimm0,slot=0
>>>>
>>>> For hotplug, the device to be added from monitor as below
>>>> object_add memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0.img,share=yes,size=512m
>>>> device_add nvdimm,label-size=128k,memdev=memnvdimm0,id=nvdimm0,slot=0
>>>>
>>>> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
>>>> Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
>>>>                  [Early implementation]
>>>> ---
>>>>    default-configs/ppc64-softmmu.mak |    1
>>>>    hw/ppc/spapr.c                    |  212 +++++++++++++++++++++++++++++++++++--
>>>>    hw/ppc/spapr_drc.c                |   17 +++
>>>>    hw/ppc/spapr_events.c             |    4 +
>>>>    include/hw/ppc/spapr.h            |   10 ++
>>>>    include/hw/ppc/spapr_drc.h        |    9 ++
>>>>    6 files changed, 241 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
>>>> index 7f34ad0528..b6e1aa5125 100644
>>>> --- a/default-configs/ppc64-softmmu.mak
>>>> +++ b/default-configs/ppc64-softmmu.mak
>>>> @@ -20,4 +20,5 @@ CONFIG_XIVE=$(CONFIG_PSERIES)
>>>>    CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
>>>>    CONFIG_MEM_DEVICE=y
>>>>    CONFIG_DIMM=y
>>>> +CONFIG_NVDIMM=y
>>>>    CONFIG_SPAPR_RNG=y
>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>> index 0fcdd35cbe..7e7a1a8041 100644
>>>> --- a/hw/ppc/spapr.c
>>>> +++ b/hw/ppc/spapr.c
>>>> @@ -73,6 +73,7 @@
>>>>    #include "qemu/cutils.h"
>>>>    #include "hw/ppc/spapr_cpu_core.h"
>>>>    #include "hw/mem/memory-device.h"
>>>> +#include "hw/mem/nvdimm.h"
>>>>    #include <libfdt.h>
>>>> @@ -690,6 +691,7 @@ static int spapr_populate_drmem_v2(sPAPRMachineState *spapr, void *fdt,
>>>>        uint8_t *int_buf, *cur_index, buf_len;
>>>>        int ret;
>>>>        uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
>>>> +    uint64_t scm_block_size = SPAPR_MINIMUM_SCM_BLOCK_SIZE;
>>>>        uint64_t addr, cur_addr, size;
>>>>        uint32_t nr_boot_lmbs = (machine->device_memory->base / lmb_size);
>>>>        uint64_t mem_end = machine->device_memory->base +
>>>> @@ -726,15 +728,24 @@ static int spapr_populate_drmem_v2(sPAPRMachineState *spapr, void *fdt,
>>>>                nr_entries++;
>>>>            }
>>>> -        /* Entry for DIMM */
>>>> -        drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
>>>> -        g_assert(drc);
>>>> -        elem = spapr_get_drconf_cell(size / lmb_size, addr,
>>>> -                                     spapr_drc_index(drc), node,
>>>> -                                     SPAPR_LMB_FLAGS_ASSIGNED);
>>>> +        if (info->value->type == MEMORY_DEVICE_INFO_KIND_NVDIMM) {
>>>> +            /* Entry for NVDIMM */
>>>> +            drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM, addr / scm_block_size);
>>>> +            g_assert(drc);
>>>> +            elem = spapr_get_drconf_cell(size / scm_block_size, addr,
>>>> +                                         spapr_drc_index(drc), -1, 0);
>>>> +            cur_addr = ROUND_UP(addr + size, scm_block_size);
>>>> +        } else {
>>>> +            /* Entry for DIMM */
>>>> +            drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
>>>> +            g_assert(drc);
>>>> +            elem = spapr_get_drconf_cell(size / lmb_size, addr,
>>>> +                                         spapr_drc_index(drc), node,
>>>> +                                         SPAPR_LMB_FLAGS_ASSIGNED);
>>>> +            cur_addr = addr + size;
>>>> +        }
>>>>            QSIMPLEQ_INSERT_TAIL(&drconf_queue, elem, entry);
>>>>            nr_entries++;
>>>> -        cur_addr = addr + size;
>>>>        }
>>>>        /* Entry for remaining hotpluggable area */
>>>> @@ -1225,6 +1236,42 @@ static void spapr_dt_hypervisor(sPAPRMachineState *spapr, void *fdt)
>>>>        }
>>>>    }
>>>> +static int spapr_populate_nvdimm_node(void *fdt, int fdt_offset,
>>>> +                                      uint32_t node, uint64_t addr,
>>>> +                                      uint64_t size, uint64_t label_size);
>>> Re-ordering the code is generally preferred to static forward declarations.
>> Ok
>>>> +static void spapr_create_nvdimm(void *fdt)
>>> I'm trying to standardize on spapr_dt_*() for functions which generate
>>> bits of the device tree.
>> Ok. Will rename to spapr_dt_create_nvdimm
> Just spapr_dt_nvdimm() would be preferred.
Ok.
>
>>>> +{
>>>> +    int offset = fdt_subnode_offset(fdt, 0, "persistent-memory");
>>>> +    GSList *dimms = NULL;
>>>> +
>>>> +    if (offset < 0) {
>>>> +        offset = fdt_add_subnode(fdt, 0, "persistent-memory");
>>>> +        _FDT(offset);
>>>> +        _FDT((fdt_setprop_cell(fdt, offset, "#address-cells", 0x2)));
>>>> +        _FDT((fdt_setprop_cell(fdt, offset, "#size-cells", 0x0)));
>>>> +        _FDT((fdt_setprop_string(fdt, offset, "name", "persistent-memory")));
>>> No need to explicitly set the "name" property, that's implicit in the
>>> node name.
>> Ok
>>>> +        _FDT((fdt_setprop_string(fdt, offset, "device_type",
>>>> +                                 "ibm,persistent-memory")));
>>>> +    }
>>>> +
>>>> +    /*NB : Add drc-info array here */
>>>> +
>>>> +    /* Create DT entries for cold plugged NVDIMM devices */
>>>> +    dimms = nvdimm_get_device_list();
>>>> +    for (; dimms; dimms = dimms->next) {
>>>> +        NVDIMMDevice *nvdimm = dimms->data;
>>>> +        PCDIMMDevice *di = PC_DIMM(nvdimm);
>>>> +        uint64_t lsize = nvdimm->label_size;
>>>> +        int size = object_property_get_int(OBJECT(nvdimm), PC_DIMM_SIZE_PROP,
>>>> +                                           NULL);
>>>> +
>>>> +        spapr_populate_nvdimm_node(fdt, offset, di->node, di->addr,
>>>> +                                   size, lsize);
>>> It might be cleaner to just pass the NVDIMMDevice * rather than
>>> umpteen parameters.
>> Ok.
>>>> +    }
>>>> +    g_slist_free(dimms);
>>>> +    return;
>>>> +}
>>>> +
>>>>    static void *spapr_build_fdt(sPAPRMachineState *spapr)
>>>>    {
>>>>        MachineState *machine = MACHINE(spapr);
>>>> @@ -1348,6 +1395,11 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr)
>>>>            exit(1);
>>>>        }
>>>> +    /* NVDIMM devices */
>>>> +    if (spapr->nvdimm_enabled) {
>>>> +        spapr_create_nvdimm(fdt);
>>>> +    }
>>>> +
>>>>        return fdt;
>>>>    }
>>>> @@ -3143,6 +3195,20 @@ static void spapr_set_ic_mode(Object *obj, const char *value, Error **errp)
>>>>        }
>>>>    }
>>>> +static bool spapr_get_nvdimm(Object *obj, Error **errp)
>>>> +{
>>>> +    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
>>>> +
>>>> +    return spapr->nvdimm_enabled;
>>>> +}
>>>> +
>>>> +static void spapr_set_nvdimm(Object *obj, bool value, Error **errp)
>>>> +{
>>>> +    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
>>>> +
>>>> +    spapr->nvdimm_enabled = value;
>>>> +}
>>>> +
>>>>    static void spapr_instance_init(Object *obj)
>>>>    {
>>>>        sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
>>>> @@ -3188,6 +3254,11 @@ static void spapr_instance_init(Object *obj)
>>>>        object_property_set_description(obj, "ic-mode",
>>>>                     "Specifies the interrupt controller mode (xics, xive, dual)",
>>>>                     NULL);
>>>> +    object_property_add_bool(obj, "nvdimm",
>>>> +                            spapr_get_nvdimm, spapr_set_nvdimm, NULL);
>>>> +    object_property_set_description(obj, "nvdimm",
>>>> +                                    "Enable support for nvdimm devices",
>>>> +                                    NULL);
>>> I'm not seeing a lot of point to this machine parameter.
>> Just followed what the x86 is doing here.
> Hmm.  I wonder what the rationale for the property is there.
>
>>>>    }
>>>>    static void spapr_machine_finalizefn(Object *obj)
>>>> @@ -3267,12 +3338,103 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t size,
>>>>        }
>>>>    }
>>>> +static int spapr_populate_nvdimm_node(void *fdt, int fdt_offset, uint32_t node,
>>>> +                                      uint64_t addr, uint64_t size,
>>>> +                                      uint64_t label_size)
>>>> +{
>>>> +    int offset;
>>>> +    char buf[40];
>>>> +    GString *lcode = g_string_sized_new(10);
>>>> +    sPAPRDRConnector *drc;
>>>> +    QemuUUID uuid;
>>>> +    uint32_t drc_idx;
>>>> +    uint32_t associativity[] = {
>>>> +        cpu_to_be32(0x4), /* length */
>>>> +        cpu_to_be32(0x0), cpu_to_be32(0x0),
>>>> +        cpu_to_be32(0x0), cpu_to_be32(node)
>>>> +    };
>>>> +
>>>> +    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM,
>>>> +                          addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
>>>> +    g_assert(drc);
>>>> +
>>>> +    drc_idx = spapr_drc_index(drc);
>>>> +
>>>> +    sprintf(buf, "pmem@%x", drc_idx);
>>>> +    offset = fdt_add_subnode(fdt, fdt_offset, buf);
>>> "fdt_offset" vs. "offset" isn't very obvious.  Maybe parent_offset /
>>> child_offset or something?
>> Ok
>>>> +    _FDT(offset);
>>>> +
>>>> +    _FDT((fdt_setprop_cell(fdt, offset, "reg", drc_idx)));
>>>> +    _FDT((fdt_setprop_string(fdt, offset, "compatible", "ibm,pmemory")));
>>>> +    _FDT((fdt_setprop_string(fdt, offset, "name", "pmem")));
>>> Again, no need to set "name".
>> Ok
>>>> +    _FDT((fdt_setprop_string(fdt, offset, "device_type", "ibm,pmemory")));
>>>> +
>>>> +    /*NB : Supposed to be random strings. Currently empty 10 strings! */
>>>> +    _FDT((fdt_setprop(fdt, offset, "ibm,loc-code", lcode->str, lcode->len)));
>>>> +    g_string_free(lcode, TRUE);
>>> I think leaving this property out would be preferable to including it
>>> but putting nothing useful there.
>> Ok.
>>>> +
>>>> +    _FDT((fdt_setprop(fdt, offset, "ibm,associativity", associativity,
>>>> +                      sizeof(associativity))));
>>>> +    g_random_set_seed(drc_idx);
>>>> +    qemu_uuid_generate(&uuid);
>>> This looks bogus.  I'm guessing the set seed is so that you generate
>>> consistent UUIDs for the same NVDIMM in a guest.  First, that's making
>>> a lot of assumptions about how qemu_uuid_generate() works that aren't
>>> really warranted.  Second, it poisons the RNG for anything running
>>> after this which actually wants (pseudo) random numbers.
>>>
>>> I think you need to make the UUID a property of the device instead.
>> Ok.
>>>> +
>>>> +    qemu_uuid_unparse(&uuid, buf);
>>>> +    _FDT((fdt_setprop_string(fdt, offset, "ibm,unit-guid", buf)));
>>>> +
>>>> +    _FDT((fdt_setprop_cell(fdt, offset, "ibm,my-drc-index", drc_idx)));
>>>> +
>>>> +    /*NB : What it should be? */
>>>> +    _FDT(fdt_setprop_cell(fdt, offset, "ibm,latency-attribute", 828));
>>>> +
>>>> +    _FDT((fdt_setprop_u64(fdt, offset, "ibm,block-size",
>>>> +                          SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
>>>> +    _FDT((fdt_setprop_u64(fdt, offset, "ibm,number-of-blocks",
>>>> +                          size / SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
>>>> +    _FDT((fdt_setprop_cell(fdt, offset, "ibm,metadata-size", label_size)));
>>>> +
>>>> +    return offset;
>>>> +}
>>>> +
>>>> +static void spapr_add_nvdimm(DeviceState *dev, uint64_t addr,
>>>> +                             uint64_t size, uint32_t node,
>>>> +                             Error **errp)
>>>> +{
>>>> +    sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_hotplug_handler(dev));
>>>> +    sPAPRDRConnector *drc;
>>>> +    bool hotplugged = spapr_drc_hotplugged(dev);
>>>> +    NVDIMMDevice *nvdimm = NVDIMM(OBJECT(dev));
>>>> +    void *fdt;
>>>> +    int fdt_offset, fdt_size;
>>>> +    Error *local_err = NULL;
>>>> +
>>>> +    spapr_dr_connector_new(OBJECT(spapr), TYPE_SPAPR_DRC_PMEM,
>>>> +                           addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
>>>> +    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM,
>>>> +                          addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
>>>> +    g_assert(drc);
>>> Creating the DRC in the hotplug path looks bogus.  Generally the DRC
>>> has to exist before you can even attempt to plug the device.
>> We don't really know how many DRCs to create. Unlike memory hotplug,
>> where we know how many LMBs are required to fit up to maxmem, in this
>> case we don't know how many NVDIMM devices a guest can have. That is the
>> reason I am creating the DRC on demand. I'll see if it is possible to
>> address this by putting a cap on the maximum number of NVDIMM devices a
>> guest can have.
> Urgh, PAPR.  First it specifies a crappy hotplug model that requires
> zillions of fixed attachment points to be instantiated, then it breaks
> its own model.
>
> But.. I still don't really understand how this works.
>
> a) How does the guest know the DRC index to use for the new NVDIMM?
>     Generally that comes from the device tree, but the guest doesn't
>     get new device tree information until it calls configure-connector
>     for which it needs the DRC index.
The DRC is passed in the device tree blob that is sent as the payload of
the hotplug interrupt. The guest picks the DRC index from that blob and
makes the subsequent calls.
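
For reference, a small sketch of how the DRC index for a PMEM connector
would be composed in this patch: the DRC id is addr divided by the SCM
block size, and the index combines the type shift (9 for PMEM, 8 for LMB,
as in the enum added by the patch) with that id. The position of the type
field (bit 28) and the 256 MiB block size are assumptions based on how the
existing spapr DRC code and SPAPR_MEMORY_BLOCK_SIZE are usually defined;
the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: type in the top bits, id in the low 28 bits. */
#define DRC_INDEX_TYPE_SHIFT 28
#define DRC_INDEX_ID_MASK    ((1U << DRC_INDEX_TYPE_SHIFT) - 1)

/* Type shifts as listed in sPAPRDRConnectorTypeShift in the patch. */
#define TYPE_SHIFT_LMB  8
#define TYPE_SHIFT_PMEM 9

/* Assumption: SCM block size equals a 256 MiB memory block. */
#define SCM_BLOCK_SIZE (256ULL << 20)

/* Compose a DRC index from a connector type shift and a DRC id. */
static uint32_t drc_index(uint32_t typeshift, uint32_t id)
{
    return (typeshift << DRC_INDEX_TYPE_SHIFT) | (id & DRC_INDEX_ID_MASK);
}

/* DRC index of the PMEM connector for an NVDIMM mapped at 'addr'. */
static uint32_t pmem_drc_index_for_addr(uint64_t addr)
{
    assert(addr % SCM_BLOCK_SIZE == 0);
    return drc_index(TYPE_SHIFT_PMEM, (uint32_t)(addr / SCM_BLOCK_SIZE));
}
```

Under these assumptions an NVDIMM mapped at 4 GiB gets DRC id 16 and DRC
index 0x90000010, whereas the LMB connector covering the same address
would be 0x80000010.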
> b) AFAICT, NVDIMMs would also require HPT space, much like regular
>     memory would.  PowerVM doesn't have HPT resizing, so surely it must
>     already have some sort of cap on the amount of NVDIMM space in
>     order to size the HPT correctly.
On Power KVM we will enforce that the NVDIMM is mapped within maxmem,
although the spec allows it to be mapped outside of it. Coming back to the
original point about creating the DRCs at hotplug time, we could impose a
limit on the number of NVDIMM devices that can be hotplugged, so that we
can create the DRCs at machine init time.
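
A rough, standalone sketch of that idea, not QEMU code: a fixed number of
PMEM connectors is created once at machine init, and the plug path only
looks one up and fails cleanly when the cap is exceeded. The cap value,
the types and the function names are hypothetical, and indexing the
connectors by slot rather than by address is an assumption made to keep
the pool bounded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NVDIMM_SLOTS 4   /* hypothetical cap on NVDIMM devices */

typedef struct {
    uint32_t id;             /* DRC id, here simply the slot number */
    bool attached;           /* true once a device is plugged */
} PmemDrc;

typedef struct {
    PmemDrc drcs[MAX_NVDIMM_SLOTS];
} Machine;

/* Machine init: create every PMEM connector up front. */
static void machine_init_pmem_drcs(Machine *m)
{
    for (uint32_t i = 0; i < MAX_NVDIMM_SLOTS; i++) {
        m->drcs[i].id = i;
        m->drcs[i].attached = false;
    }
}

/* Plug path helper: look up an existing connector, never create one. */
static PmemDrc *pmem_drc_by_slot(Machine *m, int slot)
{
    if (slot < 0 || slot >= MAX_NVDIMM_SLOTS) {
        return NULL;
    }
    return &m->drcs[slot];
}

/* Returns 0 on success, -1 if beyond the cap, -2 if the slot is in use. */
static int plug_nvdimm(Machine *m, int slot)
{
    PmemDrc *drc = pmem_drc_by_slot(m, slot);

    if (!drc) {
        return -1;
    }
    if (drc->attached) {
        return -2;
    }
    drc->attached = true;
    return 0;
}
```

The point of the sketch is only the ordering: the connector exists before
any plug attempt, so the plug handler no longer needs to call anything
like spapr_dr_connector_new().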
>>>> +    fdt = create_device_tree(&fdt_size);
>>>> +    fdt_offset = spapr_populate_nvdimm_node(fdt, 0, node, addr,
>>>> +                                            size, nvdimm->label_size);
>>>> +
>>>> +    spapr_drc_attach(drc, dev, fdt, fdt_offset, &local_err);
>>>> +    if (local_err) {
>>>> +        error_propagate(errp, local_err);
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    if (hotplugged) {
>>>> +        spapr_hotplug_req_add_by_index(drc);
>>>> +    }
>>>> +}
>>>> +
>>>>    static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>>>                                  Error **errp)
>>>>    {
>>>>        Error *local_err = NULL;
>>>>        sPAPRMachineState *ms = SPAPR_MACHINE(hotplug_dev);
>>>>        PCDIMMDevice *dimm = PC_DIMM(dev);
>>>> +    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
>>>>        uint64_t size, addr;
>>>>        uint32_t node;
>>>> @@ -3291,9 +3453,14 @@ static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>>>        node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP,
>>>>                                        &error_abort);
>>>> -    spapr_add_lmbs(dev, addr, size, node,
>>>> -                   spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
>>>> -                   &local_err);
>>>> +    if (!is_nvdimm) {
>>>> +        spapr_add_lmbs(dev, addr, size, node,
>>>> +                       spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
>>>> +                       &local_err);
>>>> +    } else {
>>>> +        spapr_add_nvdimm(dev, addr, size, node, &local_err);
>>>> +    }
>>>> +
>>>>        if (local_err) {
>>>>            goto out_unplug;
>>>>        }
>>>> @@ -3311,6 +3478,7 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>>>    {
>>>>        const sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(hotplug_dev);
>>>>        sPAPRMachineState *spapr = SPAPR_MACHINE(hotplug_dev);
>>>> +    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
>>>>        PCDIMMDevice *dimm = PC_DIMM(dev);
>>>>        Error *local_err = NULL;
>>>>        uint64_t size;
>>>> @@ -3328,10 +3496,30 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>>>            return;
>>>>        }
>>>> -    if (size % SPAPR_MEMORY_BLOCK_SIZE) {
>>>> +    if (!is_nvdimm && size % SPAPR_MEMORY_BLOCK_SIZE) {
>>>>            error_setg(errp, "Hotplugged memory size must be a multiple of "
>>>> -                      "%" PRIu64 " MB", SPAPR_MEMORY_BLOCK_SIZE / MiB);
>>>> +                          "%" PRIu64 " MB", SPAPR_MEMORY_BLOCK_SIZE / MiB);
>>>>            return;
>>>> +    } else if (is_nvdimm) {
>>>> +        NVDIMMDevice *nvdimm = NVDIMM(OBJECT(dev));
>>>> +        if ((nvdimm->label_size + size) % SPAPR_MINIMUM_SCM_BLOCK_SIZE) {
>>>> +            error_setg(errp, "NVDIMM memory size must be a multiple of "
>>>> +                       "%" PRIu64 "MB", SPAPR_MINIMUM_SCM_BLOCK_SIZE / MiB);
>>>> +            return;
>>>> +        }
>>>> +        if (((nvdimm->label_size + size) / SPAPR_MINIMUM_SCM_BLOCK_SIZE) == 1) {
>>>> +            error_setg(errp, "NVDIMM size must be atleast "
>>>> +                       "%" PRIu64 "MB", 2 * SPAPR_MINIMUM_SCM_BLOCK_SIZE / MiB);
>>>> +            return;
>>>> +        }
>>>> +
>>>> +        /* Align to scm block size, exclude the label */
>>>> +        memory_device_set_region_size(MEMORY_DEVICE(nvdimm),
>>>> +               QEMU_ALIGN_DOWN(size, SPAPR_MINIMUM_SCM_BLOCK_SIZE), &local_err);
>>>> +        if (local_err) {
>>>> +            error_propagate(errp, local_err);
>>>> +            return;
>>>> +        }
>>>>        }
>>>>        memdev = object_property_get_link(OBJECT(dimm), PC_DIMM_MEMDEV_PROP,
>>>> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
>>>> index 2edb7d1e9c..94ddd102cc 100644
>>>> --- a/hw/ppc/spapr_drc.c
>>>> +++ b/hw/ppc/spapr_drc.c
>>>> @@ -696,6 +696,16 @@ static void spapr_drc_lmb_class_init(ObjectClass *k, void *data)
>>>>        drck->release = spapr_lmb_release;
>>>>    }
>>>> +static void spapr_drc_pmem_class_init(ObjectClass *k, void *data)
>>>> +{
>>>> +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_CLASS(k);
>>>> +
>>>> +    drck->typeshift = SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM;
>>>> +    drck->typename = "MEM";
>>>> +    drck->drc_name_prefix = "PMEM ";
>>>> +    drck->release = NULL;
>>>> +}
>>>> +
>>>>    static const TypeInfo spapr_dr_connector_info = {
>>>>        .name          = TYPE_SPAPR_DR_CONNECTOR,
>>>>        .parent        = TYPE_DEVICE,
>>>> @@ -739,6 +749,12 @@ static const TypeInfo spapr_drc_lmb_info = {
>>>>        .class_init    = spapr_drc_lmb_class_init,
>>>>    };
>>>> +static const TypeInfo spapr_drc_pmem_info = {
>>>> +    .name          = TYPE_SPAPR_DRC_PMEM,
>>>> +    .parent        = TYPE_SPAPR_DRC_LOGICAL,
>>>> +    .class_init    = spapr_drc_pmem_class_init,
>>>> +};
>>>> +
>>>>    /* helper functions for external users */
>>>>    sPAPRDRConnector *spapr_drc_by_index(uint32_t index)
>>>> @@ -1189,6 +1205,7 @@ static void spapr_drc_register_types(void)
>>>>        type_register_static(&spapr_drc_cpu_info);
>>>>        type_register_static(&spapr_drc_pci_info);
>>>>        type_register_static(&spapr_drc_lmb_info);
>>>> +    type_register_static(&spapr_drc_pmem_info);
>>>>        spapr_rtas_register(RTAS_SET_INDICATOR, "set-indicator",
>>>>                            rtas_set_indicator);
>>>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>>>> index 32719a1b72..a4fed84346 100644
>>>> --- a/hw/ppc/spapr_events.c
>>>> +++ b/hw/ppc/spapr_events.c
>>>> @@ -193,6 +193,7 @@ struct rtas_event_log_v6_hp {
>>>>    #define RTAS_LOG_V6_HP_TYPE_SLOT                         3
>>>>    #define RTAS_LOG_V6_HP_TYPE_PHB                          4
>>>>    #define RTAS_LOG_V6_HP_TYPE_PCI                          5
>>>> +#define RTAS_LOG_V6_HP_TYPE_PMEM                         6
>>>>        uint8_t hotplug_action;
>>>>    #define RTAS_LOG_V6_HP_ACTION_ADD                        1
>>>>    #define RTAS_LOG_V6_HP_ACTION_REMOVE                     2
>>>> @@ -526,6 +527,9 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
>>>>        case SPAPR_DR_CONNECTOR_TYPE_CPU:
>>>>            hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_CPU;
>>>>            break;
>>>> +    case SPAPR_DR_CONNECTOR_TYPE_PMEM:
>>>> +        hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PMEM;
>>>> +        break;
>>>>        default:
>>>>            /* we shouldn't be signaling hotplug events for resources
>>>>             * that don't support them
>>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>>> index a947a0a0dc..21a9709afe 100644
>>>> --- a/include/hw/ppc/spapr.h
>>>> +++ b/include/hw/ppc/spapr.h
>>>> @@ -187,6 +187,7 @@ struct sPAPRMachineState {
>>>>        bool cmd_line_caps[SPAPR_CAP_NUM];
>>>>        sPAPRCapabilities def, eff, mig;
>>>> +    bool nvdimm_enabled;
>>>>    };
>>>>    #define H_SUCCESS         0
>>>> @@ -798,6 +799,15 @@ int spapr_rtc_import_offset(sPAPRRTCState *rtc, int64_t legacy_offset);
>>>>    #define SPAPR_LMB_FLAGS_DRC_INVALID 0x00000020
>>>>    #define SPAPR_LMB_FLAGS_RESERVED 0x00000080
>>>> +/*
>>>> + * The nvdimm size should be aligned to SCM block size.
>>>> + * The SCM block size should be aligned to SPAPR_MEMORY_BLOCK_SIZE
>>>> + * inorder to have SCM regions not to overlap with dimm memory regions.
>>>> + * The SCM devices can have variable block sizes. For now, fixing the
>>>> + * block size to the minimum value.
>>>> + */
>>>> +#define SPAPR_MINIMUM_SCM_BLOCK_SIZE SPAPR_MEMORY_BLOCK_SIZE
>>>> +
>>>>    void spapr_do_system_reset_on_cpu(CPUState *cs, run_on_cpu_data arg);
>>>>    #define HTAB_SIZE(spapr)        (1ULL << ((spapr)->htab_shift))
>>>> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
>>>> index f6ff32e7e2..65925d00b1 100644
>>>> --- a/include/hw/ppc/spapr_drc.h
>>>> +++ b/include/hw/ppc/spapr_drc.h
>>>> @@ -70,6 +70,13 @@
>>>>    #define SPAPR_DRC_LMB(obj) OBJECT_CHECK(sPAPRDRConnector, (obj), \
>>>>                                            TYPE_SPAPR_DRC_LMB)
>>>> +#define TYPE_SPAPR_DRC_PMEM "spapr-drc-pmem"
>>>> +#define SPAPR_DRC_PMEM_GET_CLASS(obj) \
>>>> +        OBJECT_GET_CLASS(sPAPRDRConnectorClass, obj, TYPE_SPAPR_DRC_PMEM)
>>>> +#define SPAPR_DRC_PMEM_CLASS(klass) \
>>>> +        OBJECT_CLASS_CHECK(sPAPRDRConnectorClass, klass, TYPE_SPAPR_DRC_PMEM)
>>>> +#define SPAPR_DRC_PMEM(obj) OBJECT_CHECK(sPAPRDRConnector, (obj), \
>>>> +                                        TYPE_SPAPR_DRC_PMEM)
>>>>    /*
>>>>     * Various hotplug types managed by sPAPRDRConnector
>>>>     *
>>>> @@ -87,6 +94,7 @@ typedef enum {
>>>>        SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO = 3,
>>>>        SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI = 4,
>>>>        SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB = 8,
>>>> +    SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM = 9,
>>>>    } sPAPRDRConnectorTypeShift;
>>>>    typedef enum {
>>>> @@ -96,6 +104,7 @@ typedef enum {
>>>>        SPAPR_DR_CONNECTOR_TYPE_VIO = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO,
>>>>        SPAPR_DR_CONNECTOR_TYPE_PCI = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI,
>>>>        SPAPR_DR_CONNECTOR_TYPE_LMB = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB,
>>>> +    SPAPR_DR_CONNECTOR_TYPE_PMEM = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM,
>>>>    } sPAPRDRConnectorType;
>>>>    /*
>>>>


* Re: [Qemu-devel] [RFC PATCH 4/4] spapr: Add Hcalls to support PAPR NVDIMM device
  2019-02-15 11:11     ` Shivaprasad G Bhat
@ 2019-02-19  5:33       ` David Gibson
  0 siblings, 0 replies; 19+ messages in thread
From: David Gibson @ 2019-02-19  5:33 UTC (permalink / raw)
  To: Shivaprasad G Bhat
  Cc: qemu-devel, xiaoguangrong.eric, mst, bharata, qemu-ppc, vaibhav,
	imammedo


On Fri, Feb 15, 2019 at 04:41:10PM +0530, Shivaprasad G Bhat wrote:
> 
> 
> On 02/12/2019 07:58 AM, David Gibson wrote:
> > On Tue, Feb 05, 2019 at 11:26:41PM -0600, Shivaprasad G Bhat wrote:
> > > This patch implements a few of the hcalls necessary for NVDIMM support.
> > >
> > > PAPR semantics are such that each NVDIMM device comprises multiple
> > > SCM (Storage Class Memory) blocks. The guest requests the hypervisor to bind
> > > each of the SCM blocks of the NVDIMM device using hcalls. There can be
> > > SCM block unbind requests in case of driver errors or unplug (not supported
> > > now) use cases. The NVDIMM label reads/writes are done through hcalls.
> > >
> > > Since each virtual NVDIMM device is divided into multiple SCM blocks, the bind,
> > > unbind, and query hcalls on those blocks can come independently. This
> > > doesn't fit well into the QEMU device semantics, where map/unmap is done
> > > at whole-device/object granularity. The patch doesn't actually
> > > bind/unbind on hcalls but lets it happen at the object_add/del phase
> > > instead.
> > >
> > > The guest kernel makes bind/unbind requests for the virtual NVDIMM device at
> > > region-level granularity. Without interleaving, each virtual NVDIMM device is
> > > presented as a separate region. There is no way to configure virtual NVDIMM
> > > interleaving for guests today. So a hcall can never carry a partial
> > > bind/unbind request covering only a subset of the SCM blocks of a
> > > virtual NVDIMM. Hence it is safe to bind/unbind everything during
> > > object_add/del.
> > Hrm.  I don't entirely follow the above, but implementing something
> > that doesn't really match the PAPR model seems like it could lead to
> > problems.
> 
> In QEMU, the device is mapped at the hotplug stage. However, the map requests
> for the SCM blocks can come later, block by block. So we will have to figure
> out whether the NVDIMM device model is the right fit here.

I don't really understand what that means.  Is there any documentation
I can get on the PAPR pmem model?

> With interleaved NVDIMMs, the guest can send requests to bind
> different blocks of different devices on demand, and thus end up with a
> partial mapping.
> But I don't see how interleaving can be supported for virtual NVDIMMs, given
> that the existing support comes only from firmware interfaces like
> UEFI/BIOS.

Um.. I don't know what you mean by interleaving.

> I chose this approach because virtual NVDIMM interleaving is unlikely to be
> supported, so pre-mapping is safe and we can build on the existing NVDIMM
> model.
> 
> > > The kernel today is not using the hcalls - h_scm_mem_query, h_scm_mem_clear,
> > > h_scm_query_logical_mem_binding and h_scm_query_block_mem_binding. They are just
> > > stubs in this patch.
> > > 
> > > Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
> > > ---
> > >   hw/ppc/spapr_hcall.c   |  230 ++++++++++++++++++++++++++++++++++++++++++++++++
> > >   include/hw/ppc/spapr.h |   12 ++-
> > >   2 files changed, 240 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> > > index 17bcaa3822..40553e80d6 100644
> > > --- a/hw/ppc/spapr_hcall.c
> > > +++ b/hw/ppc/spapr_hcall.c
> > > @@ -3,11 +3,13 @@
> > >   #include "sysemu/hw_accel.h"
> > >   #include "sysemu/sysemu.h"
> > >   #include "qemu/log.h"
> > > +#include "qemu/range.h"
> > >   #include "qemu/error-report.h"
> > >   #include "cpu.h"
> > >   #include "exec/exec-all.h"
> > >   #include "helper_regs.h"
> > >   #include "hw/ppc/spapr.h"
> > > +#include "hw/ppc/spapr_drc.h"
> > >   #include "hw/ppc/spapr_cpu_core.h"
> > >   #include "mmu-hash64.h"
> > >   #include "cpu-models.h"
> > > @@ -16,6 +18,7 @@
> > >   #include "hw/ppc/spapr_ovec.h"
> > >   #include "mmu-book3s-v3.h"
> > >   #include "hw/mem/memory-device.h"
> > > +#include "hw/mem/nvdimm.h"
> > >   struct LPCRSyncState {
> > >       target_ulong value;
> > > @@ -1808,6 +1811,222 @@ static target_ulong h_update_dt(PowerPCCPU *cpu, sPAPRMachineState *spapr,
> > >       return H_SUCCESS;
> > >   }
> > > +static target_ulong h_scm_read_metadata(PowerPCCPU *cpu,
> > > +                                        sPAPRMachineState *spapr,
> > > +                                        target_ulong opcode,
> > > +                                        target_ulong *args)
> > > +{
> > > +    uint32_t drc_index = args[0];
> > > +    uint64_t offset = args[1];
> > > +    uint8_t numBytesToRead = args[2];
> > This will truncate the argument to 8 bits _before_ you validate it,
> > which doesn't seem like what you want.
> I'll fix it.
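
To make the truncation concern concrete, here is a minimal standalone
sketch (not the QEMU code; the function names and return codes are made up
for illustration). It contrasts validating after narrowing to uint8_t, as
the quoted hunk does, with validating the full-width argument first:

```c
#include <stdint.h>

/* Hypothetical result codes for this sketch only; not QEMU's H_* values. */
enum { LEN_OK = 0, LEN_BAD = -1 };

/* Mirrors the quoted hunk: narrow to 8 bits first, then validate. */
static int check_len_truncating(uint64_t arg)
{
    uint8_t n = (uint8_t)arg;   /* 0x104 becomes 0x04 here */

    return (n == 1 || n == 2 || n == 4 || n == 8) ? LEN_OK : LEN_BAD;
}

/* Validate at full width; narrow only once the value is known to be good. */
static int check_len_full_width(uint64_t arg)
{
    return (arg == 1 || arg == 2 || arg == 4 || arg == 8) ? LEN_OK : LEN_BAD;
}
```

With an argument such as 0x104, the first variant accepts the request as a
4-byte access, while the second rejects it.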
> 
> > > +    sPAPRDRConnector *drc = spapr_drc_by_index(drc_index);
> > > +    NVDIMMDevice *nvdimm = NULL;
> > > +    NVDIMMClass *ddc = NULL;
> > > +
> > > +    if (numBytesToRead != 1 && numBytesToRead != 2 &&
> > > +        numBytesToRead != 4 && numBytesToRead != 8) {
> > > +        return H_P3;
> > > +    }
> > > +
> > > +    if (offset & (numBytesToRead - 1)) {
> > > +        return H_P2;
> > > +    }
> > > +
> > > +    if (drc && spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
> > > +        return H_PARAMETER;
> > > +    }
> > > +
> > > +    nvdimm = NVDIMM(drc->dev);
> > > +    ddc = NVDIMM_GET_CLASS(nvdimm);
> > > +
> > > +    ddc->read_label_data(nvdimm, &args[0], numBytesToRead, offset);
> > Hm.  Is this the only way to access the label data, or is it also
> > mapped into the guest visible address space?  I ask because some of
> > the calculations you made about size+label_size in an earlier patch
> > seemed to suggest it was part of the address space.
> Yes. The label is not mapped into the guest-visible address space.
> You are right to point that out; it's a bug.
> That calculation is not needed, since in the same patch I apply
> QEMU_ALIGN_DOWN(size, SPAPR_MINIMUM_SCM_BLOCK_SIZE) to the
> nvdimm size in spapr_memory_pre_plug().
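
For reference, the align-down arithmetic mentioned here can be sketched as
below. This is a standalone illustration, not the QEMU macro itself:
align_down() and scm_block_count() are hypothetical helpers, and the
256 MiB block size is an assumption about the value of
SPAPR_MEMORY_BLOCK_SIZE. The input sizes are purely illustrative.

```c
#include <stdint.h>

/* Assumption for this sketch: a 256 MiB SCM block size. */
#define SCM_BLOCK_SIZE (UINT64_C(1) << 28)

/* Round n down to a multiple of m (m must be non-zero). */
static uint64_t align_down(uint64_t n, uint64_t m)
{
    return n / m * m;
}

/* Number of whole SCM blocks that fit in a device of the given size. */
static uint64_t scm_block_count(uint64_t size)
{
    return align_down(size, SCM_BLOCK_SIZE) / SCM_BLOCK_SIZE;
}
```

Under these assumptions, a size of 512 MiB minus 128 KiB aligns down to
256 MiB, i.e. one block, whereas exactly 512 MiB gives two blocks.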
> 
> > > +    return H_SUCCESS;
> > > +}
> > > +
> > > +
> > > +static target_ulong h_scm_write_metadata(PowerPCCPU *cpu,
> > > +                                         sPAPRMachineState *spapr,
> > > +                                         target_ulong opcode,
> > > +                                         target_ulong *args)
> > > +{
> > > +    uint32_t drc_index = args[0];
> > > +    uint64_t offset = args[1];
> > > +    uint64_t data = args[2];
> > > +    int8_t numBytesToWrite = args[3];
> > > +    sPAPRDRConnector *drc = spapr_drc_by_index(drc_index);
> > > +    NVDIMMDevice *nvdimm = NULL;
> > > +    DeviceState *dev = NULL;
> > > +    NVDIMMClass *ddc = NULL;
> > > +
> > > +    if (numBytesToWrite != 1 && numBytesToWrite != 2 &&
> > > +        numBytesToWrite != 4 && numBytesToWrite != 8) {
> > > +        return H_P4;
> > > +    }
> > > +
> > > +    if (offset & (numBytesToWrite - 1)) {
> > > +        return H_P2;
> > > +    }
> > > +
> > > +    if (drc && spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
> > > +        return H_PARAMETER;
> > > +    }
> > > +
> > > +    dev = drc->dev;
> > > +    nvdimm = NVDIMM(dev);
> > > +    if (offset >= nvdimm->label_size) {
> > > +        return H_P3;
> > > +    }
> > > +
> > > +    ddc = NVDIMM_GET_CLASS(nvdimm);
> > > +
> > > +    ddc->write_label_data(nvdimm, &data, numBytesToWrite, offset);
> > > +
> > > +    return H_SUCCESS;
> > > +}
> > > +
> > > +static target_ulong h_scm_bind_mem(PowerPCCPU *cpu, sPAPRMachineState *spapr,
> > > +                                        target_ulong opcode,
> > > +                                        target_ulong *args)
> > > +{
> > > +    uint32_t drc_index = args[0];
> > > +    uint64_t starting_index = args[1];
> > > +    uint64_t no_of_scm_blocks_to_bind = args[2];
> > > +    uint64_t target_logical_mem_addr = args[3];
> > > +    uint64_t continue_token = args[4];
> > > +    uint64_t size;
> > > +    uint64_t total_no_of_scm_blocks;
> > > +
> > > +    sPAPRDRConnector *drc = spapr_drc_by_index(drc_index);
> > > +    hwaddr addr;
> > > +    DeviceState *dev = NULL;
> > > +    PCDIMMDevice *dimm = NULL;
> > > +    Error *local_err = NULL;
> > > +
> > > +    if (drc && spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
> > > +        return H_PARAMETER;
> > > +    }
> > > +
> > > +    dev = drc->dev;
> > > +    dimm = PC_DIMM(dev);
> > > +
> > > +    size = object_property_get_uint(OBJECT(dimm),
> > > +                                    PC_DIMM_SIZE_PROP, &local_err);
> > > +    if (local_err) {
> > > +        error_report_err(local_err);
> > > +        return H_PARAMETER;
> > This should probably be H_HARDWARE, no?  The error isn't caused by one
> > of the parameters.
> It's not clearly defined, so I chose H_PARAMETER to suggest the drc index
> was probably wrong.
> > > +    }
> > > +
> > > +    total_no_of_scm_blocks = size / SPAPR_MINIMUM_SCM_BLOCK_SIZE;
> > > +
> > > +    if (starting_index > total_no_of_scm_blocks) {
> > > +        return H_P2;
> > > +    }
> > > +
> > > > +    if ((starting_index + no_of_scm_blocks_to_bind) > total_no_of_scm_blocks) {
> > You should probably have a check for integer overflow here as well,
> > just to be thorough.
> Ok
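
One way such an overflow check could look, as a standalone sketch rather
than the actual fix (the helper names are hypothetical): instead of adding
the two guest-supplied values, compare the count against the room left
after the starting index.

```c
#include <stdbool.h>
#include <stdint.h>

/* Naive form, as in the quoted hunk: start + count can wrap modulo 2^64. */
static bool range_ok_naive(uint64_t start, uint64_t count, uint64_t total)
{
    return start + count <= total;
}

/* Overflow-safe form: never add the two untrusted values together. */
static bool range_ok_checked(uint64_t start, uint64_t count, uint64_t total)
{
    if (start > total) {
        return false;
    }
    return count <= total - start;   /* total - start cannot underflow here */
}
```

With start = 1 and count = UINT64_MAX the naive sum wraps to 0 and passes
the check, while the second form rejects the request.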
> > > +        return H_P3;
> > > +    }
> > > +
> > > +    /* Currently qemu assigns the address. */
> > > +    if (target_logical_mem_addr != 0xffffffffffffffff) {
> > > +        return H_OVERLAP;
> > > +    }
> > > +
> > > +    /*
> > > +     * Currently continue token should be zero qemu has already bound
> > > +     * everything and this hcall doesnt return H_BUSY.
> > > +     */
> > > +    if (continue_token > 0) {
> > > +        return H_P5;
> > > +    }
> > > +
> > > +    /* NB : Already bound, Return target logical address in R4 */
> > > +    addr = object_property_get_uint(OBJECT(dimm),
> > > +                                    PC_DIMM_ADDR_PROP, &local_err);
> > > +    if (local_err) {
> > > +        error_report_err(local_err);
> > > +        return H_PARAMETER;
> > > +    }
> > > +
> > > +    args[1] = addr;
> > > +
> > > +    return H_SUCCESS;
> > > +}
> > > +
> > > +static target_ulong h_scm_unbind_mem(PowerPCCPU *cpu, sPAPRMachineState *spapr,
> > > +                                        target_ulong opcode,
> > > +                                        target_ulong *args)
> > > +{
> > > +    uint64_t starting_scm_logical_addr = args[0];
> > > +    uint64_t no_of_scm_blocks_to_unbind = args[1];
> > > +    uint64_t size_to_unbind;
> > > +    uint64_t continue_token = args[2];
> > > +    Range as = range_empty;
> > > +    GSList *dimms = NULL;
> > > +    bool valid = false;
> > > +
> > > +    size_to_unbind = no_of_scm_blocks_to_unbind * SPAPR_MINIMUM_SCM_BLOCK_SIZE;
> > > +
> > > +    /* Check if starting_scm_logical_addr is block aligned */
> > > +    if (!QEMU_IS_ALIGNED(starting_scm_logical_addr,
> > > +                         SPAPR_MINIMUM_SCM_BLOCK_SIZE)) {
> > > +        return H_PARAMETER;
> > > +    }
> > > +
> > > +    range_init_nofail(&as, starting_scm_logical_addr, size_to_unbind);
> > > +
> > > +    dimms = nvdimm_get_device_list();
> > > +    for (; dimms; dimms = dimms->next) {
> > > +        NVDIMMDevice *nvdimm = dimms->data;
> > > +        Range tmp;
> > > +        int size = object_property_get_int(OBJECT(nvdimm), PC_DIMM_SIZE_PROP,
> > > +                                           NULL);
> > > +        int addr = object_property_get_int(OBJECT(nvdimm), PC_DIMM_ADDR_PROP,
> > > +                                           NULL);
> > > +        range_init_nofail(&tmp, addr, size);
> > > +
> > > +        if (range_contains_range(&tmp, &as)) {
> > > +            valid = true;
> > > +            break;
> > > +        }
> > > +    }
> > > +
> > > +    if (!valid) {
> > > +        return H_P2;
> > > +    }
> > > +
> > > +    if (continue_token > 0) {
> > > +        return H_P3;
> > > +    }
> > > +
> > > +    /*NB : dont do anything, let object_del take care of this for now. */
> > > +
> > > +    return H_SUCCESS;
> > > +}
> > > +
> > > +static target_ulong h_scm_query_block_mem_binding(PowerPCCPU *cpu,
> > > +                                                  sPAPRMachineState *spapr,
> > > +                                                  target_ulong opcode,
> > > +                                                  target_ulong *args)
> > > +{
> > > +    return H_SUCCESS;
> > > +}
> > > +
> > > +static target_ulong h_scm_query_logical_mem_binding(PowerPCCPU *cpu,
> > > +                                                    sPAPRMachineState *spapr,
> > > +                                                    target_ulong opcode,
> > > +                                                    target_ulong *args)
> > > +{
> > > +    return H_SUCCESS;
> > > +}
> > > +
> > > +static target_ulong h_scm_mem_query(PowerPCCPU *cpu, sPAPRMachineState *spapr,
> > > +                                        target_ulong opcode,
> > > +                                        target_ulong *args)
> > > +{
> > > +    return H_SUCCESS;
> > > +}
> > > +
> > >   static spapr_hcall_fn papr_hypercall_table[(MAX_HCALL_OPCODE / 4) + 1];
> > >   static spapr_hcall_fn kvmppc_hypercall_table[KVMPPC_HCALL_MAX - KVMPPC_HCALL_BASE + 1];
> > > @@ -1907,6 +2126,17 @@ static void hypercall_register_types(void)
> > >       /* qemu/KVM-PPC specific hcalls */
> > >       spapr_register_hypercall(KVMPPC_H_RTAS, h_rtas);
> > > +    /* qemu/scm specific hcalls */
> > > +    spapr_register_hypercall(H_SCM_READ_METADATA, h_scm_read_metadata);
> > > +    spapr_register_hypercall(H_SCM_WRITE_METADATA, h_scm_write_metadata);
> > > +    spapr_register_hypercall(H_SCM_BIND_MEM, h_scm_bind_mem);
> > > +    spapr_register_hypercall(H_SCM_UNBIND_MEM, h_scm_unbind_mem);
> > > +    spapr_register_hypercall(H_SCM_QUERY_BLOCK_MEM_BINDING,
> > > +                             h_scm_query_block_mem_binding);
> > > +    spapr_register_hypercall(H_SCM_QUERY_LOGICAL_MEM_BINDING,
> > > +                             h_scm_query_logical_mem_binding);
> > > +    spapr_register_hypercall(H_SCM_MEM_QUERY, h_scm_mem_query);
> > > +
> > >       /* ibm,client-architecture-support support */
> > >       spapr_register_hypercall(KVMPPC_H_CAS, h_client_architecture_support);
> > > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > > index 21a9709afe..28249567f4 100644
> > > --- a/include/hw/ppc/spapr.h
> > > +++ b/include/hw/ppc/spapr.h
> > > @@ -268,6 +268,7 @@ struct sPAPRMachineState {
> > >   #define H_P7              -60
> > >   #define H_P8              -61
> > >   #define H_P9              -62
> > > +#define H_OVERLAP         -68
> > >   #define H_UNSUPPORTED_FLAG -256
> > >   #define H_MULTI_THREADS_ACTIVE -9005
> > > @@ -473,8 +474,15 @@ struct sPAPRMachineState {
> > >   #define H_INT_ESB               0x3C8
> > >   #define H_INT_SYNC              0x3CC
> > >   #define H_INT_RESET             0x3D0
> > > -
> > > -#define MAX_HCALL_OPCODE        H_INT_RESET
> > > +#define H_SCM_READ_METADATA     0x3E4
> > > +#define H_SCM_WRITE_METADATA     0x3E8
> > > +#define H_SCM_BIND_MEM          0x3EC
> > > +#define H_SCM_UNBIND_MEM        0x3F0
> > > +#define H_SCM_QUERY_BLOCK_MEM_BINDING 0x3F4
> > > +#define H_SCM_QUERY_LOGICAL_MEM_BINDING 0x3F8
> > > +#define H_SCM_MEM_QUERY         0x3FC
> > > +
> > > +#define MAX_HCALL_OPCODE        H_SCM_MEM_QUERY
> > >   /* The hcalls above are standardized in PAPR and implemented by pHyp
> > >    * as well.
> > > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [Qemu-devel] [RFC PATCH 1/4] mem: make nvdimm_device_list global
  2019-02-06  5:25 ` [Qemu-devel] [RFC PATCH 1/4] mem: make nvdimm_device_list global Shivaprasad G Bhat
@ 2019-02-19  7:59   ` Igor Mammedov
  0 siblings, 0 replies; 19+ messages in thread
From: Igor Mammedov @ 2019-02-19  7:59 UTC (permalink / raw)
  To: Shivaprasad G Bhat
  Cc: qemu-devel, xiaoguangrong.eric, mst, bharata, qemu-ppc, vaibhav, david

On Tue, 05 Feb 2019 23:25:54 -0600
Shivaprasad G Bhat <sbhat@linux.ibm.com> wrote:

> nvdimm_device_list is required for parsing the list for devices
> in subsequent patches. Move it to common area.
> 
> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>

Reviewed-by: Igor Mammedov <imammedo@redhat.com>

> ---
>  hw/acpi/nvdimm.c        |   27 ---------------------------
>  hw/mem/nvdimm.c         |   27 +++++++++++++++++++++++++++
>  include/hw/mem/nvdimm.h |    2 ++
>  3 files changed, 29 insertions(+), 27 deletions(-)
> 
> diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
> index e53b2cb681..34322298c2 100644
> --- a/hw/acpi/nvdimm.c
> +++ b/hw/acpi/nvdimm.c
> @@ -33,33 +33,6 @@
>  #include "hw/nvram/fw_cfg.h"
>  #include "hw/mem/nvdimm.h"
>  
> -static int nvdimm_device_list(Object *obj, void *opaque)
> -{
> -    GSList **list = opaque;
> -
> -    if (object_dynamic_cast(obj, TYPE_NVDIMM)) {
> -        *list = g_slist_append(*list, DEVICE(obj));
> -    }
> -
> -    object_child_foreach(obj, nvdimm_device_list, opaque);
> -    return 0;
> -}
> -
> -/*
> - * inquire NVDIMM devices and link them into the list which is
> - * returned to the caller.
> - *
> - * Note: it is the caller's responsibility to free the list to avoid
> - * memory leak.
> - */
> -static GSList *nvdimm_get_device_list(void)
> -{
> -    GSList *list = NULL;
> -
> -    object_child_foreach(qdev_get_machine(), nvdimm_device_list, &list);
> -    return list;
> -}
> -
>  #define NVDIMM_UUID_LE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)             \
>     { (a) & 0xff, ((a) >> 8) & 0xff, ((a) >> 16) & 0xff, ((a) >> 24) & 0xff, \
>       (b) & 0xff, ((b) >> 8) & 0xff, (c) & 0xff, ((c) >> 8) & 0xff,          \
> diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
> index bf2adf5e16..f221ec7a9a 100644
> --- a/hw/mem/nvdimm.c
> +++ b/hw/mem/nvdimm.c
> @@ -29,6 +29,33 @@
>  #include "hw/mem/nvdimm.h"
>  #include "hw/mem/memory-device.h"
>  
> +static int nvdimm_device_list(Object *obj, void *opaque)
> +{
> +    GSList **list = opaque;
> +
> +    if (object_dynamic_cast(obj, TYPE_NVDIMM)) {
> +        *list = g_slist_append(*list, DEVICE(obj));
> +    }
> +
> +    object_child_foreach(obj, nvdimm_device_list, opaque);
> +    return 0;
> +}
> +
> +/*
> + * inquire NVDIMM devices and link them into the list which is
> + * returned to the caller.
> + *
> + * Note: it is the caller's responsibility to free the list to avoid
> + * memory leak.
> + */
> +GSList *nvdimm_get_device_list(void)
> +{
> +    GSList *list = NULL;
> +
> +    object_child_foreach(qdev_get_machine(), nvdimm_device_list, &list);
> +    return list;
> +}
> +
>  static void nvdimm_get_label_size(Object *obj, Visitor *v, const char *name,
>                                    void *opaque, Error **errp)
>  {
> diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
> index c5c9b3c7f8..e8b086f2df 100644
> --- a/include/hw/mem/nvdimm.h
> +++ b/include/hw/mem/nvdimm.h
> @@ -150,4 +150,6 @@ void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
>                         uint32_t ram_slots);
>  void nvdimm_plug(AcpiNVDIMMState *state);
>  void nvdimm_acpi_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev);
> +GSList *nvdimm_get_device_list(void);
> +
>  #endif
> 
> 


* Re: [Qemu-devel] [RFC PATCH 3/4] spapr: Add NVDIMM device support
  2019-02-06  5:26 ` [Qemu-devel] [RFC PATCH 3/4] spapr: Add NVDIMM device support Shivaprasad G Bhat
  2019-02-12  1:49   ` David Gibson
@ 2019-02-19  8:11   ` Igor Mammedov
  2019-02-19  9:29     ` Shivaprasad G Bhat
  1 sibling, 1 reply; 19+ messages in thread
From: Igor Mammedov @ 2019-02-19  8:11 UTC (permalink / raw)
  To: Shivaprasad G Bhat
  Cc: qemu-devel, xiaoguangrong.eric, mst, bharata, qemu-ppc, vaibhav, david

On Tue, 05 Feb 2019 23:26:27 -0600
Shivaprasad G Bhat <sbhat@linux.ibm.com> wrote:

> Add support for NVDIMM devices for sPAPR. Piggyback on existing nvdimm
> device interface in QEMU to support virtual NVDIMM devices for Power (May have
> to re-look at this later).  Create the required DT entries for the
> device (some entries have dummy values right now).
> 
> The patch creates the required DT node and sends a hotplug
> interrupt to the guest. Guest is expected to undertake the normal
> DR resource add path in response and start issuing PAPR SCM hcalls.
> 
> Usage:
> Add nvdimm=on to the QEMU machine options.
> Example: -machine pseries,nvdimm=on
> For coldplug, add the device on the QEMU command line as shown below:
> -object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0.img,share=yes,size=512m
> -device nvdimm,label-size=128k,memdev=memnvdimm0,id=nvdimm0,slot=0
> 
> For hotplug, add the device from the monitor as shown below:
> object_add memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0.img,share=yes,size=512m
> device_add nvdimm,label-size=128k,memdev=memnvdimm0,id=nvdimm0,slot=0
> 
> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
> Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
>                [Early implementation]
> ---
>  default-configs/ppc64-softmmu.mak |    1 
>  hw/ppc/spapr.c                    |  212 +++++++++++++++++++++++++++++++++++--
>  hw/ppc/spapr_drc.c                |   17 +++
>  hw/ppc/spapr_events.c             |    4 +
>  include/hw/ppc/spapr.h            |   10 ++
>  include/hw/ppc/spapr_drc.h        |    9 ++
>  6 files changed, 241 insertions(+), 12 deletions(-)
> 
> diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
> index 7f34ad0528..b6e1aa5125 100644
> --- a/default-configs/ppc64-softmmu.mak
> +++ b/default-configs/ppc64-softmmu.mak
> @@ -20,4 +20,5 @@ CONFIG_XIVE=$(CONFIG_PSERIES)
>  CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
>  CONFIG_MEM_DEVICE=y
>  CONFIG_DIMM=y
> +CONFIG_NVDIMM=y
>  CONFIG_SPAPR_RNG=y
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 0fcdd35cbe..7e7a1a8041 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -73,6 +73,7 @@
>  #include "qemu/cutils.h"
>  #include "hw/ppc/spapr_cpu_core.h"
>  #include "hw/mem/memory-device.h"
> +#include "hw/mem/nvdimm.h"
>  
>  #include <libfdt.h>
>  
> @@ -690,6 +691,7 @@ static int spapr_populate_drmem_v2(sPAPRMachineState *spapr, void *fdt,
>      uint8_t *int_buf, *cur_index, buf_len;
>      int ret;
>      uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
> +    uint64_t scm_block_size = SPAPR_MINIMUM_SCM_BLOCK_SIZE;
>      uint64_t addr, cur_addr, size;
>      uint32_t nr_boot_lmbs = (machine->device_memory->base / lmb_size);
>      uint64_t mem_end = machine->device_memory->base +
> @@ -726,15 +728,24 @@ static int spapr_populate_drmem_v2(sPAPRMachineState *spapr, void *fdt,
>              nr_entries++;
>          }
>  
> -        /* Entry for DIMM */
> -        drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
> -        g_assert(drc);
> -        elem = spapr_get_drconf_cell(size / lmb_size, addr,
> -                                     spapr_drc_index(drc), node,
> -                                     SPAPR_LMB_FLAGS_ASSIGNED);
> +        if (info->value->type == MEMORY_DEVICE_INFO_KIND_NVDIMM) {
> +            /* Entry for NVDIMM */
> +            drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM, addr / scm_block_size);
> +            g_assert(drc);
> +            elem = spapr_get_drconf_cell(size / scm_block_size, addr,
> +                                         spapr_drc_index(drc), -1, 0);
> +            cur_addr = ROUND_UP(addr + size, scm_block_size);
> +        } else {
> +            /* Entry for DIMM */
> +            drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
> +            g_assert(drc);
> +            elem = spapr_get_drconf_cell(size / lmb_size, addr,
> +                                         spapr_drc_index(drc), node,
> +                                         SPAPR_LMB_FLAGS_ASSIGNED);
> +            cur_addr = addr + size;
> +        }
>          QSIMPLEQ_INSERT_TAIL(&drconf_queue, elem, entry);
>          nr_entries++;
> -        cur_addr = addr + size;
>      }
>  
>      /* Entry for remaining hotpluggable area */
> @@ -1225,6 +1236,42 @@ static void spapr_dt_hypervisor(sPAPRMachineState *spapr, void *fdt)
>      }
>  }
>  
> +static int spapr_populate_nvdimm_node(void *fdt, int fdt_offset,
> +                                      uint32_t node, uint64_t addr,
> +                                      uint64_t size, uint64_t label_size);
> +static void spapr_create_nvdimm(void *fdt)
> +{
> +    int offset = fdt_subnode_offset(fdt, 0, "persistent-memory");
> +    GSList *dimms = NULL;
> +
> +    if (offset < 0) {
> +        offset = fdt_add_subnode(fdt, 0, "persistent-memory");
> +        _FDT(offset);
> +        _FDT((fdt_setprop_cell(fdt, offset, "#address-cells", 0x2)));
> +        _FDT((fdt_setprop_cell(fdt, offset, "#size-cells", 0x0)));
> +        _FDT((fdt_setprop_string(fdt, offset, "name", "persistent-memory")));
> +        _FDT((fdt_setprop_string(fdt, offset, "device_type",
> +                                 "ibm,persistent-memory")));
> +    }
> +
> +    /*NB : Add drc-info array here */
> +
> +    /* Create DT entries for cold plugged NVDIMM devices */
> +    dimms = nvdimm_get_device_list();
> +    for (; dimms; dimms = dimms->next) {
> +        NVDIMMDevice *nvdimm = dimms->data;
> +        PCDIMMDevice *di = PC_DIMM(nvdimm);
> +        uint64_t lsize = nvdimm->label_size;
> +        int size = object_property_get_int(OBJECT(nvdimm), PC_DIMM_SIZE_PROP,
> +                                           NULL);
> +
> +        spapr_populate_nvdimm_node(fdt, offset, di->node, di->addr,
> +                                   size, lsize);
> +    }
> +    g_slist_free(dimms);
> +    return;
> +}
> +
>  static void *spapr_build_fdt(sPAPRMachineState *spapr)
>  {
>      MachineState *machine = MACHINE(spapr);
> @@ -1348,6 +1395,11 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr)
>          exit(1);
>      }
>  
> +    /* NVDIMM devices */
> +    if (spapr->nvdimm_enabled) {
> +        spapr_create_nvdimm(fdt);
> +    }
> +
>      return fdt;
>  }
>  
> @@ -3143,6 +3195,20 @@ static void spapr_set_ic_mode(Object *obj, const char *value, Error **errp)
>      }
>  }
>  
> +static bool spapr_get_nvdimm(Object *obj, Error **errp)
> +{
> +    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> +
> +    return spapr->nvdimm_enabled;
> +}
> +
> +static void spapr_set_nvdimm(Object *obj, bool value, Error **errp)
> +{
> +    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> +
> +    spapr->nvdimm_enabled = value;
> +}
> +
>  static void spapr_instance_init(Object *obj)
>  {
>      sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> @@ -3188,6 +3254,11 @@ static void spapr_instance_init(Object *obj)
>      object_property_set_description(obj, "ic-mode",
>                   "Specifies the interrupt controller mode (xics, xive, dual)",
>                   NULL);
> +    object_property_add_bool(obj, "nvdimm",
> +                            spapr_get_nvdimm, spapr_set_nvdimm, NULL);
> +    object_property_set_description(obj, "nvdimm",
> +                                    "Enable support for nvdimm devices",
> +                                    NULL);
>  }
>  
>  static void spapr_machine_finalizefn(Object *obj)
> @@ -3267,12 +3338,103 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t size,
>      }
>  }
>  
> +static int spapr_populate_nvdimm_node(void *fdt, int fdt_offset, uint32_t node,
> +                                      uint64_t addr, uint64_t size,
> +                                      uint64_t label_size)
> +{
> +    int offset;
> +    char buf[40];
> +    GString *lcode = g_string_sized_new(10);
> +    sPAPRDRConnector *drc;
> +    QemuUUID uuid;
> +    uint32_t drc_idx;
> +    uint32_t associativity[] = {
> +        cpu_to_be32(0x4), /* length */
> +        cpu_to_be32(0x0), cpu_to_be32(0x0),
> +        cpu_to_be32(0x0), cpu_to_be32(node)
> +    };
> +
> +    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM,
> +                          addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
> +    g_assert(drc);
> +
> +    drc_idx = spapr_drc_index(drc);
> +
> +    sprintf(buf, "pmem@%x", drc_idx);
> +    offset = fdt_add_subnode(fdt, fdt_offset, buf);
> +    _FDT(offset);
> +
> +    _FDT((fdt_setprop_cell(fdt, offset, "reg", drc_idx)));
> +    _FDT((fdt_setprop_string(fdt, offset, "compatible", "ibm,pmemory")));
> +    _FDT((fdt_setprop_string(fdt, offset, "name", "pmem")));
> +    _FDT((fdt_setprop_string(fdt, offset, "device_type", "ibm,pmemory")));
> +
> +    /*NB : Supposed to be random strings. Currently empty 10 strings! */
> +    _FDT((fdt_setprop(fdt, offset, "ibm,loc-code", lcode->str, lcode->len)));
> +    g_string_free(lcode, TRUE);
> +
> +    _FDT((fdt_setprop(fdt, offset, "ibm,associativity", associativity,
> +                      sizeof(associativity))));
> +    g_random_set_seed(drc_idx);
> +    qemu_uuid_generate(&uuid);
> +
> +    qemu_uuid_unparse(&uuid, buf);
> +    _FDT((fdt_setprop_string(fdt, offset, "ibm,unit-guid", buf)));
> +
> +    _FDT((fdt_setprop_cell(fdt, offset, "ibm,my-drc-index", drc_idx)));
> +
> +    /*NB : What it should be? */
> +    _FDT(fdt_setprop_cell(fdt, offset, "ibm,latency-attribute", 828));
> +
> +    _FDT((fdt_setprop_u64(fdt, offset, "ibm,block-size",
> +                          SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
> +    _FDT((fdt_setprop_u64(fdt, offset, "ibm,number-of-blocks",
> +                          size / SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
> +    _FDT((fdt_setprop_cell(fdt, offset, "ibm,metadata-size", label_size)));
> +
> +    return offset;
> +}
> +
> +static void spapr_add_nvdimm(DeviceState *dev, uint64_t addr,
> +                             uint64_t size, uint32_t node,
> +                             Error **errp)
> +{
> +    sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_hotplug_handler(dev));
> +    sPAPRDRConnector *drc;
> +    bool hotplugged = spapr_drc_hotplugged(dev);
> +    NVDIMMDevice *nvdimm = NVDIMM(OBJECT(dev));
> +    void *fdt;
> +    int fdt_offset, fdt_size;
> +    Error *local_err = NULL;
> +
> +    spapr_dr_connector_new(OBJECT(spapr), TYPE_SPAPR_DRC_PMEM,
> +                           addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
> +    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM,
> +                          addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
> +    g_assert(drc);
> +
> +    fdt = create_device_tree(&fdt_size);
> +    fdt_offset = spapr_populate_nvdimm_node(fdt, 0, node, addr,
> +                                            size, nvdimm->label_size);
> +
> +    spapr_drc_attach(drc, dev, fdt, fdt_offset, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
> +    if (hotplugged) {
> +        spapr_hotplug_req_add_by_index(drc);
> +    }
> +}
> +
>  static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>                                Error **errp)
>  {
>      Error *local_err = NULL;
>      sPAPRMachineState *ms = SPAPR_MACHINE(hotplug_dev);
>      PCDIMMDevice *dimm = PC_DIMM(dev);
> +    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
>      uint64_t size, addr;
>      uint32_t node;
>  
> @@ -3291,9 +3453,14 @@ static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>  
>      node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP,
>                                      &error_abort);
> -    spapr_add_lmbs(dev, addr, size, node,
> -                   spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
> -                   &local_err);
> +    if (!is_nvdimm) {
> +        spapr_add_lmbs(dev, addr, size, node,
> +                       spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
> +                       &local_err);
> +    } else {
> +        spapr_add_nvdimm(dev, addr, size, node, &local_err);
> +    }
> +
>      if (local_err) {
>          goto out_unplug;
>      }
> @@ -3311,6 +3478,7 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>  {
>      const sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(hotplug_dev);
>      sPAPRMachineState *spapr = SPAPR_MACHINE(hotplug_dev);
> +    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
>      PCDIMMDevice *dimm = PC_DIMM(dev);
>      Error *local_err = NULL;
>      uint64_t size;
> @@ -3328,10 +3496,30 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>          return;
>      }
>  
> -    if (size % SPAPR_MEMORY_BLOCK_SIZE) {
> +    if (!is_nvdimm && size % SPAPR_MEMORY_BLOCK_SIZE) {
>          error_setg(errp, "Hotplugged memory size must be a multiple of "
> -                      "%" PRIu64 " MB", SPAPR_MEMORY_BLOCK_SIZE / MiB);
> +                          "%" PRIu64 " MB", SPAPR_MEMORY_BLOCK_SIZE / MiB);
>          return;
> +    } else if (is_nvdimm) {
> +        NVDIMMDevice *nvdimm = NVDIMM(OBJECT(dev));
> +        if ((nvdimm->label_size + size) % SPAPR_MINIMUM_SCM_BLOCK_SIZE) {
> +            error_setg(errp, "NVDIMM memory size must be a multiple of "
> +                       "%" PRIu64 "MB", SPAPR_MINIMUM_SCM_BLOCK_SIZE / MiB);
> +            return;
> +        }
> +        if (((nvdimm->label_size + size) / SPAPR_MINIMUM_SCM_BLOCK_SIZE) == 1) {
> +            error_setg(errp, "NVDIMM size must be atleast "
> +                       "%" PRIu64 "MB", 2 * SPAPR_MINIMUM_SCM_BLOCK_SIZE / MiB);
> +            return;
> +        }
> +
> +        /* Align to scm block size, exclude the label */
> +        memory_device_set_region_size(MEMORY_DEVICE(nvdimm),
> +               QEMU_ALIGN_DOWN(size, SPAPR_MINIMUM_SCM_BLOCK_SIZE), &local_err);
I'm not sure that arbitrarily fixing up region size is the right thing to do
and also what you are trying to achieve here isn't clear, could you explain it some more?

> +        if (local_err) {
> +            error_propagate(errp, local_err);
> +            return;
> +        }
>      }
>  
>      memdev = object_property_get_link(OBJECT(dimm), PC_DIMM_MEMDEV_PROP,
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 2edb7d1e9c..94ddd102cc 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -696,6 +696,16 @@ static void spapr_drc_lmb_class_init(ObjectClass *k, void *data)
>      drck->release = spapr_lmb_release;
>  }
>  
> +static void spapr_drc_pmem_class_init(ObjectClass *k, void *data)
> +{
> +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_CLASS(k);
> +
> +    drck->typeshift = SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM;
> +    drck->typename = "MEM";
> +    drck->drc_name_prefix = "PMEM ";
> +    drck->release = NULL;
> +}
> +
>  static const TypeInfo spapr_dr_connector_info = {
>      .name          = TYPE_SPAPR_DR_CONNECTOR,
>      .parent        = TYPE_DEVICE,
> @@ -739,6 +749,12 @@ static const TypeInfo spapr_drc_lmb_info = {
>      .class_init    = spapr_drc_lmb_class_init,
>  };
>  
> +static const TypeInfo spapr_drc_pmem_info = {
> +    .name          = TYPE_SPAPR_DRC_PMEM,
> +    .parent        = TYPE_SPAPR_DRC_LOGICAL,
> +    .class_init    = spapr_drc_pmem_class_init,
> +};
> +
>  /* helper functions for external users */
>  
>  sPAPRDRConnector *spapr_drc_by_index(uint32_t index)
> @@ -1189,6 +1205,7 @@ static void spapr_drc_register_types(void)
>      type_register_static(&spapr_drc_cpu_info);
>      type_register_static(&spapr_drc_pci_info);
>      type_register_static(&spapr_drc_lmb_info);
> +    type_register_static(&spapr_drc_pmem_info);
>  
>      spapr_rtas_register(RTAS_SET_INDICATOR, "set-indicator",
>                          rtas_set_indicator);
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index 32719a1b72..a4fed84346 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -193,6 +193,7 @@ struct rtas_event_log_v6_hp {
>  #define RTAS_LOG_V6_HP_TYPE_SLOT                         3
>  #define RTAS_LOG_V6_HP_TYPE_PHB                          4
>  #define RTAS_LOG_V6_HP_TYPE_PCI                          5
> +#define RTAS_LOG_V6_HP_TYPE_PMEM                         6
>      uint8_t hotplug_action;
>  #define RTAS_LOG_V6_HP_ACTION_ADD                        1
>  #define RTAS_LOG_V6_HP_ACTION_REMOVE                     2
> @@ -526,6 +527,9 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
>      case SPAPR_DR_CONNECTOR_TYPE_CPU:
>          hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_CPU;
>          break;
> +    case SPAPR_DR_CONNECTOR_TYPE_PMEM:
> +        hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PMEM;
> +        break;
>      default:
>          /* we shouldn't be signaling hotplug events for resources
>           * that don't support them
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index a947a0a0dc..21a9709afe 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -187,6 +187,7 @@ struct sPAPRMachineState {
>  
>      bool cmd_line_caps[SPAPR_CAP_NUM];
>      sPAPRCapabilities def, eff, mig;
> +    bool nvdimm_enabled;
>  };
>  
>  #define H_SUCCESS         0
> @@ -798,6 +799,15 @@ int spapr_rtc_import_offset(sPAPRRTCState *rtc, int64_t legacy_offset);
>  #define SPAPR_LMB_FLAGS_DRC_INVALID 0x00000020
>  #define SPAPR_LMB_FLAGS_RESERVED 0x00000080
>  
> +/*
> + * The nvdimm size should be aligned to SCM block size.
> + * The SCM block size should be aligned to SPAPR_MEMORY_BLOCK_SIZE
> + * inorder to have SCM regions not to overlap with dimm memory regions.
> + * The SCM devices can have variable block sizes. For now, fixing the
> + * block size to the minimum value.
> + */
> +#define SPAPR_MINIMUM_SCM_BLOCK_SIZE SPAPR_MEMORY_BLOCK_SIZE
> +
>  void spapr_do_system_reset_on_cpu(CPUState *cs, run_on_cpu_data arg);
>  
>  #define HTAB_SIZE(spapr)        (1ULL << ((spapr)->htab_shift))
> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> index f6ff32e7e2..65925d00b1 100644
> --- a/include/hw/ppc/spapr_drc.h
> +++ b/include/hw/ppc/spapr_drc.h
> @@ -70,6 +70,13 @@
>  #define SPAPR_DRC_LMB(obj) OBJECT_CHECK(sPAPRDRConnector, (obj), \
>                                          TYPE_SPAPR_DRC_LMB)
>  
> +#define TYPE_SPAPR_DRC_PMEM "spapr-drc-pmem"
> +#define SPAPR_DRC_PMEM_GET_CLASS(obj) \
> +        OBJECT_GET_CLASS(sPAPRDRConnectorClass, obj, TYPE_SPAPR_DRC_PMEM)
> +#define SPAPR_DRC_PMEM_CLASS(klass) \
> +        OBJECT_CLASS_CHECK(sPAPRDRConnectorClass, klass, TYPE_SPAPR_DRC_PMEM)
> +#define SPAPR_DRC_PMEM(obj) OBJECT_CHECK(sPAPRDRConnector, (obj), \
> +                                        TYPE_SPAPR_DRC_PMEM)
>  /*
>   * Various hotplug types managed by sPAPRDRConnector
>   *
> @@ -87,6 +94,7 @@ typedef enum {
>      SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO = 3,
>      SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI = 4,
>      SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB = 8,
> +    SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM = 9,
>  } sPAPRDRConnectorTypeShift;
>  
>  typedef enum {
> @@ -96,6 +104,7 @@ typedef enum {
>      SPAPR_DR_CONNECTOR_TYPE_VIO = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO,
>      SPAPR_DR_CONNECTOR_TYPE_PCI = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI,
>      SPAPR_DR_CONNECTOR_TYPE_LMB = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB,
> +    SPAPR_DR_CONNECTOR_TYPE_PMEM = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM,
>  } sPAPRDRConnectorType;
>  
>  /*
> 
> 


* Re: [Qemu-devel] [RFC PATCH 3/4] spapr: Add NVDIMM device support
  2019-02-19  8:11   ` Igor Mammedov
@ 2019-02-19  9:29     ` Shivaprasad G Bhat
  2019-02-21 14:12       ` Igor Mammedov
  0 siblings, 1 reply; 19+ messages in thread
From: Shivaprasad G Bhat @ 2019-02-19  9:29 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, xiaoguangrong.eric, mst, bharata, qemu-ppc, vaibhav, david



On 02/19/2019 01:41 PM, Igor Mammedov wrote:
> On Tue, 05 Feb 2019 23:26:27 -0600
> Shivaprasad G Bhat <sbhat@linux.ibm.com> wrote:
>
>> Add support for NVDIMM devices for sPAPR. Piggyback on existing nvdimm
>> device interface in QEMU to support virtual NVDIMM devices for Power (May have
>> to re-look at this later).  Create the required DT entries for the
>> device (some entries have dummy values right now).
>>
>> The patch creates the required DT node and sends a hotplug
>> interrupt to the guest. Guest is expected to undertake the normal
>> DR resource add path in response and start issuing PAPR SCM hcalls.
>>
>> This is how it can be used ..
>> Add nvdimm=on to the qemu machine argument.
>> Ex : -machine pseries,nvdimm=on
>> For coldplug, the device to be added in qemu command line as shown below
>> -object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0.img,share=yes,size=512m
>> -device nvdimm,label-size=128k,memdev=memnvdimm0,id=nvdimm0,slot=0
>>
>> For hotplug, the device to be added from monitor as below
>> object_add memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0.img,share=yes,size=512m
>> device_add nvdimm,label-size=128k,memdev=memnvdimm0,id=nvdimm0,slot=0
>>
>> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
>> Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
>>                 [Early implementation]
>> ---
>>   default-configs/ppc64-softmmu.mak |    1
>>   hw/ppc/spapr.c                    |  212 +++++++++++++++++++++++++++++++++++--
>>   hw/ppc/spapr_drc.c                |   17 +++
>>   hw/ppc/spapr_events.c             |    4 +
>>   include/hw/ppc/spapr.h            |   10 ++
>>   include/hw/ppc/spapr_drc.h        |    9 ++
>>   6 files changed, 241 insertions(+), 12 deletions(-)
>>
>> diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
>> index 7f34ad0528..b6e1aa5125 100644
>> --- a/default-configs/ppc64-softmmu.mak
>> +++ b/default-configs/ppc64-softmmu.mak
>> @@ -20,4 +20,5 @@ CONFIG_XIVE=$(CONFIG_PSERIES)
>>   CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
>>   CONFIG_MEM_DEVICE=y
>>   CONFIG_DIMM=y
>> +CONFIG_NVDIMM=y
>>   CONFIG_SPAPR_RNG=y
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 0fcdd35cbe..7e7a1a8041 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -73,6 +73,7 @@
>>   #include "qemu/cutils.h"
>>   #include "hw/ppc/spapr_cpu_core.h"
>>   #include "hw/mem/memory-device.h"
>> +#include "hw/mem/nvdimm.h"
>>   
>>   #include <libfdt.h>
>>   
>> @@ -690,6 +691,7 @@ static int spapr_populate_drmem_v2(sPAPRMachineState *spapr, void *fdt,
>>       uint8_t *int_buf, *cur_index, buf_len;
>>       int ret;
>>       uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
>> +    uint64_t scm_block_size = SPAPR_MINIMUM_SCM_BLOCK_SIZE;
>>       uint64_t addr, cur_addr, size;
>>       uint32_t nr_boot_lmbs = (machine->device_memory->base / lmb_size);
>>       uint64_t mem_end = machine->device_memory->base +
>> @@ -726,15 +728,24 @@ static int spapr_populate_drmem_v2(sPAPRMachineState *spapr, void *fdt,
>>               nr_entries++;
>>           }
>>   
>> -        /* Entry for DIMM */
>> -        drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
>> -        g_assert(drc);
>> -        elem = spapr_get_drconf_cell(size / lmb_size, addr,
>> -                                     spapr_drc_index(drc), node,
>> -                                     SPAPR_LMB_FLAGS_ASSIGNED);
>> +        if (info->value->type == MEMORY_DEVICE_INFO_KIND_NVDIMM) {
>> +            /* Entry for NVDIMM */
>> +            drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM, addr / scm_block_size);
>> +            g_assert(drc);
>> +            elem = spapr_get_drconf_cell(size / scm_block_size, addr,
>> +                                         spapr_drc_index(drc), -1, 0);
>> +            cur_addr = ROUND_UP(addr + size, scm_block_size);
>> +        } else {
>> +            /* Entry for DIMM */
>> +            drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
>> +            g_assert(drc);
>> +            elem = spapr_get_drconf_cell(size / lmb_size, addr,
>> +                                         spapr_drc_index(drc), node,
>> +                                         SPAPR_LMB_FLAGS_ASSIGNED);
>> +            cur_addr = addr + size;
>> +        }
>>           QSIMPLEQ_INSERT_TAIL(&drconf_queue, elem, entry);
>>           nr_entries++;
>> -        cur_addr = addr + size;
>>       }
>>   
>>       /* Entry for remaining hotpluggable area */
>> @@ -1225,6 +1236,42 @@ static void spapr_dt_hypervisor(sPAPRMachineState *spapr, void *fdt)
>>       }
>>   }
>>   
>> +static int spapr_populate_nvdimm_node(void *fdt, int fdt_offset,
>> +                                      uint32_t node, uint64_t addr,
>> +                                      uint64_t size, uint64_t label_size);
>> +static void spapr_create_nvdimm(void *fdt)
>> +{
>> +    int offset = fdt_subnode_offset(fdt, 0, "persistent-memory");
>> +    GSList *dimms = NULL;
>> +
>> +    if (offset < 0) {
>> +        offset = fdt_add_subnode(fdt, 0, "persistent-memory");
>> +        _FDT(offset);
>> +        _FDT((fdt_setprop_cell(fdt, offset, "#address-cells", 0x2)));
>> +        _FDT((fdt_setprop_cell(fdt, offset, "#size-cells", 0x0)));
>> +        _FDT((fdt_setprop_string(fdt, offset, "name", "persistent-memory")));
>> +        _FDT((fdt_setprop_string(fdt, offset, "device_type",
>> +                                 "ibm,persistent-memory")));
>> +    }
>> +
>> +    /*NB : Add drc-info array here */
>> +
>> +    /* Create DT entries for cold plugged NVDIMM devices */
>> +    dimms = nvdimm_get_device_list();
>> +    for (; dimms; dimms = dimms->next) {
>> +        NVDIMMDevice *nvdimm = dimms->data;
>> +        PCDIMMDevice *di = PC_DIMM(nvdimm);
>> +        uint64_t lsize = nvdimm->label_size;
>> +        int size = object_property_get_int(OBJECT(nvdimm), PC_DIMM_SIZE_PROP,
>> +                                           NULL);
>> +
>> +        spapr_populate_nvdimm_node(fdt, offset, di->node, di->addr,
>> +                                   size, lsize);
>> +    }
>> +    g_slist_free(dimms);
>> +    return;
>> +}
>> +
>>   static void *spapr_build_fdt(sPAPRMachineState *spapr)
>>   {
>>       MachineState *machine = MACHINE(spapr);
>> @@ -1348,6 +1395,11 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr)
>>           exit(1);
>>       }
>>   
>> +    /* NVDIMM devices */
>> +    if (spapr->nvdimm_enabled) {
>> +        spapr_create_nvdimm(fdt);
>> +    }
>> +
>>       return fdt;
>>   }
>>   
>> @@ -3143,6 +3195,20 @@ static void spapr_set_ic_mode(Object *obj, const char *value, Error **errp)
>>       }
>>   }
>>   
>> +static bool spapr_get_nvdimm(Object *obj, Error **errp)
>> +{
>> +    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
>> +
>> +    return spapr->nvdimm_enabled;
>> +}
>> +
>> +static void spapr_set_nvdimm(Object *obj, bool value, Error **errp)
>> +{
>> +    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
>> +
>> +    spapr->nvdimm_enabled = value;
>> +}
>> +
>>   static void spapr_instance_init(Object *obj)
>>   {
>>       sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
>> @@ -3188,6 +3254,11 @@ static void spapr_instance_init(Object *obj)
>>       object_property_set_description(obj, "ic-mode",
>>                    "Specifies the interrupt controller mode (xics, xive, dual)",
>>                    NULL);
>> +    object_property_add_bool(obj, "nvdimm",
>> +                            spapr_get_nvdimm, spapr_set_nvdimm, NULL);
>> +    object_property_set_description(obj, "nvdimm",
>> +                                    "Enable support for nvdimm devices",
>> +                                    NULL);
>>   }
>>   
>>   static void spapr_machine_finalizefn(Object *obj)
>> @@ -3267,12 +3338,103 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t size,
>>       }
>>   }
>>   
>> +static int spapr_populate_nvdimm_node(void *fdt, int fdt_offset, uint32_t node,
>> +                                      uint64_t addr, uint64_t size,
>> +                                      uint64_t label_size)
>> +{
>> +    int offset;
>> +    char buf[40];
>> +    GString *lcode = g_string_sized_new(10);
>> +    sPAPRDRConnector *drc;
>> +    QemuUUID uuid;
>> +    uint32_t drc_idx;
>> +    uint32_t associativity[] = {
>> +        cpu_to_be32(0x4), /* length */
>> +        cpu_to_be32(0x0), cpu_to_be32(0x0),
>> +        cpu_to_be32(0x0), cpu_to_be32(node)
>> +    };
>> +
>> +    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM,
>> +                          addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
>> +    g_assert(drc);
>> +
>> +    drc_idx = spapr_drc_index(drc);
>> +
>> +    sprintf(buf, "pmem@%x", drc_idx);
>> +    offset = fdt_add_subnode(fdt, fdt_offset, buf);
>> +    _FDT(offset);
>> +
>> +    _FDT((fdt_setprop_cell(fdt, offset, "reg", drc_idx)));
>> +    _FDT((fdt_setprop_string(fdt, offset, "compatible", "ibm,pmemory")));
>> +    _FDT((fdt_setprop_string(fdt, offset, "name", "pmem")));
>> +    _FDT((fdt_setprop_string(fdt, offset, "device_type", "ibm,pmemory")));
>> +
>> +    /*NB : Supposed to be random strings. Currently empty 10 strings! */
>> +    _FDT((fdt_setprop(fdt, offset, "ibm,loc-code", lcode->str, lcode->len)));
>> +    g_string_free(lcode, TRUE);
>> +
>> +    _FDT((fdt_setprop(fdt, offset, "ibm,associativity", associativity,
>> +                      sizeof(associativity))));
>> +    g_random_set_seed(drc_idx);
>> +    qemu_uuid_generate(&uuid);
>> +
>> +    qemu_uuid_unparse(&uuid, buf);
>> +    _FDT((fdt_setprop_string(fdt, offset, "ibm,unit-guid", buf)));
>> +
>> +    _FDT((fdt_setprop_cell(fdt, offset, "ibm,my-drc-index", drc_idx)));
>> +
>> +    /*NB : What it should be? */
>> +    _FDT(fdt_setprop_cell(fdt, offset, "ibm,latency-attribute", 828));
>> +
>> +    _FDT((fdt_setprop_u64(fdt, offset, "ibm,block-size",
>> +                          SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
>> +    _FDT((fdt_setprop_u64(fdt, offset, "ibm,number-of-blocks",
>> +                          size / SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
>> +    _FDT((fdt_setprop_cell(fdt, offset, "ibm,metadata-size", label_size)));
>> +
>> +    return offset;
>> +}
>> +
>> +static void spapr_add_nvdimm(DeviceState *dev, uint64_t addr,
>> +                             uint64_t size, uint32_t node,
>> +                             Error **errp)
>> +{
>> +    sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_hotplug_handler(dev));
>> +    sPAPRDRConnector *drc;
>> +    bool hotplugged = spapr_drc_hotplugged(dev);
>> +    NVDIMMDevice *nvdimm = NVDIMM(OBJECT(dev));
>> +    void *fdt;
>> +    int fdt_offset, fdt_size;
>> +    Error *local_err = NULL;
>> +
>> +    spapr_dr_connector_new(OBJECT(spapr), TYPE_SPAPR_DRC_PMEM,
>> +                           addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
>> +    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM,
>> +                          addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
>> +    g_assert(drc);
>> +
>> +    fdt = create_device_tree(&fdt_size);
>> +    fdt_offset = spapr_populate_nvdimm_node(fdt, 0, node, addr,
>> +                                            size, nvdimm->label_size);
>> +
>> +    spapr_drc_attach(drc, dev, fdt, fdt_offset, &local_err);
>> +    if (local_err) {
>> +        error_propagate(errp, local_err);
>> +        return;
>> +    }
>> +
>> +    if (hotplugged) {
>> +        spapr_hotplug_req_add_by_index(drc);
>> +    }
>> +}
>> +
>>   static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>                                 Error **errp)
>>   {
>>       Error *local_err = NULL;
>>       sPAPRMachineState *ms = SPAPR_MACHINE(hotplug_dev);
>>       PCDIMMDevice *dimm = PC_DIMM(dev);
>> +    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
>>       uint64_t size, addr;
>>       uint32_t node;
>>   
>> @@ -3291,9 +3453,14 @@ static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>   
>>       node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP,
>>                                       &error_abort);
>> -    spapr_add_lmbs(dev, addr, size, node,
>> -                   spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
>> -                   &local_err);
>> +    if (!is_nvdimm) {
>> +        spapr_add_lmbs(dev, addr, size, node,
>> +                       spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
>> +                       &local_err);
>> +    } else {
>> +        spapr_add_nvdimm(dev, addr, size, node, &local_err);
>> +    }
>> +
>>       if (local_err) {
>>           goto out_unplug;
>>       }
>> @@ -3311,6 +3478,7 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>   {
>>       const sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(hotplug_dev);
>>       sPAPRMachineState *spapr = SPAPR_MACHINE(hotplug_dev);
>> +    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
>>       PCDIMMDevice *dimm = PC_DIMM(dev);
>>       Error *local_err = NULL;
>>       uint64_t size;
>> @@ -3328,10 +3496,30 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>           return;
>>       }
>>   
>> -    if (size % SPAPR_MEMORY_BLOCK_SIZE) {
>> +    if (!is_nvdimm && size % SPAPR_MEMORY_BLOCK_SIZE) {
>>           error_setg(errp, "Hotplugged memory size must be a multiple of "
>> -                      "%" PRIu64 " MB", SPAPR_MEMORY_BLOCK_SIZE / MiB);
>> +                          "%" PRIu64 " MB", SPAPR_MEMORY_BLOCK_SIZE / MiB);
>>           return;
>> +    } else if (is_nvdimm) {
>> +        NVDIMMDevice *nvdimm = NVDIMM(OBJECT(dev));
>> +        if ((nvdimm->label_size + size) % SPAPR_MINIMUM_SCM_BLOCK_SIZE) {
>> +            error_setg(errp, "NVDIMM memory size must be a multiple of "
>> +                       "%" PRIu64 "MB", SPAPR_MINIMUM_SCM_BLOCK_SIZE / MiB);
>> +            return;
>> +        }
>> +        if (((nvdimm->label_size + size) / SPAPR_MINIMUM_SCM_BLOCK_SIZE) == 1) {
>> +            error_setg(errp, "NVDIMM size must be atleast "
>> +                       "%" PRIu64 "MB", 2 * SPAPR_MINIMUM_SCM_BLOCK_SIZE / MiB);
>> +            return;
>> +        }
>> +
>> +        /* Align to scm block size, exclude the label */
>> +        memory_device_set_region_size(MEMORY_DEVICE(nvdimm),
>> +               QEMU_ALIGN_DOWN(size, SPAPR_MINIMUM_SCM_BLOCK_SIZE), &local_err);
> I'm not sure that arbitrarily fixing up region size is the right thing to do
> and also what you are trying to achieve here isn't clear, could you explain it some more?
The resize is required for subsequent memory hotplugs to work. If no base
address is specified, the next DIMM hotplug starts at the end of this region.
If the region is not aligned to the LMB size, the guest refuses to claim the
newly hotplugged memory. The label area can be small and need not be aligned
to the LMB/SCM block size. The region size is the memory backend size minus
label_size, which can be unaligned to the LMB size, so aligning down to the
SCM block size is necessary here.
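
To illustrate the arithmetic with the numbers from the commit message
(size=512m, label-size=128k), here is a small standalone sketch. It is not
code from the patch: the 256 MiB block size and the helper names
align_down() and region_after_label() are assumptions made for illustration
only, and align_down() merely mirrors the idea behind QEMU_ALIGN_DOWN.

```c
#include <stdint.h>

#define KiB 1024ULL
#define MiB (1024ULL * KiB)

/* Assumption: 256 MiB stands in for SPAPR_MINIMUM_SCM_BLOCK_SIZE here. */
#define SCM_BLOCK_SIZE (256ULL * MiB)

/* Round n down to the nearest multiple of m (m must be non-zero). */
static uint64_t align_down(uint64_t n, uint64_t m)
{
    return n / m * m;
}

/* Hypothetical helper: region size once the label area is carved out. */
static uint64_t region_after_label(uint64_t backend_size, uint64_t label_size)
{
    return backend_size - label_size;
}

/*
 * Hypothetical helper: end address of the NVDIMM region, i.e. where the
 * next DIMM would be placed if no base address is given.
 */
static uint64_t next_dimm_base(uint64_t base, uint64_t region_size)
{
    return base + region_size;
}
```

With a 512 MiB backend and a 128 KiB label, region_after_label() gives
512 MiB - 128 KiB, which is not a multiple of the block size. Without the
fixup, next_dimm_base() for a block-aligned base would land on an unaligned
address; after align_down() the region is exactly 256 MiB and the next base
stays block-aligned.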
>
>> +        if (local_err) {
>> +            error_propagate(errp, local_err);
>> +            return;
>> +        }
>>       }
>>   
>>       memdev = object_property_get_link(OBJECT(dimm), PC_DIMM_MEMDEV_PROP,
>> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
>> index 2edb7d1e9c..94ddd102cc 100644
>> --- a/hw/ppc/spapr_drc.c
>> +++ b/hw/ppc/spapr_drc.c
>> @@ -696,6 +696,16 @@ static void spapr_drc_lmb_class_init(ObjectClass *k, void *data)
>>       drck->release = spapr_lmb_release;
>>   }
>>   
>> +static void spapr_drc_pmem_class_init(ObjectClass *k, void *data)
>> +{
>> +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_CLASS(k);
>> +
>> +    drck->typeshift = SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM;
>> +    drck->typename = "MEM";
>> +    drck->drc_name_prefix = "PMEM ";
>> +    drck->release = NULL;
>> +}
>> +
>>   static const TypeInfo spapr_dr_connector_info = {
>>       .name          = TYPE_SPAPR_DR_CONNECTOR,
>>       .parent        = TYPE_DEVICE,
>> @@ -739,6 +749,12 @@ static const TypeInfo spapr_drc_lmb_info = {
>>       .class_init    = spapr_drc_lmb_class_init,
>>   };
>>   
>> +static const TypeInfo spapr_drc_pmem_info = {
>> +    .name          = TYPE_SPAPR_DRC_PMEM,
>> +    .parent        = TYPE_SPAPR_DRC_LOGICAL,
>> +    .class_init    = spapr_drc_pmem_class_init,
>> +};
>> +
>>   /* helper functions for external users */
>>   
>>   sPAPRDRConnector *spapr_drc_by_index(uint32_t index)
>> @@ -1189,6 +1205,7 @@ static void spapr_drc_register_types(void)
>>       type_register_static(&spapr_drc_cpu_info);
>>       type_register_static(&spapr_drc_pci_info);
>>       type_register_static(&spapr_drc_lmb_info);
>> +    type_register_static(&spapr_drc_pmem_info);
>>   
>>       spapr_rtas_register(RTAS_SET_INDICATOR, "set-indicator",
>>                           rtas_set_indicator);
>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>> index 32719a1b72..a4fed84346 100644
>> --- a/hw/ppc/spapr_events.c
>> +++ b/hw/ppc/spapr_events.c
>> @@ -193,6 +193,7 @@ struct rtas_event_log_v6_hp {
>>   #define RTAS_LOG_V6_HP_TYPE_SLOT                         3
>>   #define RTAS_LOG_V6_HP_TYPE_PHB                          4
>>   #define RTAS_LOG_V6_HP_TYPE_PCI                          5
>> +#define RTAS_LOG_V6_HP_TYPE_PMEM                         6
>>       uint8_t hotplug_action;
>>   #define RTAS_LOG_V6_HP_ACTION_ADD                        1
>>   #define RTAS_LOG_V6_HP_ACTION_REMOVE                     2
>> @@ -526,6 +527,9 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
>>       case SPAPR_DR_CONNECTOR_TYPE_CPU:
>>           hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_CPU;
>>           break;
>> +    case SPAPR_DR_CONNECTOR_TYPE_PMEM:
>> +        hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PMEM;
>> +        break;
>>       default:
>>           /* we shouldn't be signaling hotplug events for resources
>>            * that don't support them
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index a947a0a0dc..21a9709afe 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -187,6 +187,7 @@ struct sPAPRMachineState {
>>   
>>       bool cmd_line_caps[SPAPR_CAP_NUM];
>>       sPAPRCapabilities def, eff, mig;
>> +    bool nvdimm_enabled;
>>   };
>>   
>>   #define H_SUCCESS         0
>> @@ -798,6 +799,15 @@ int spapr_rtc_import_offset(sPAPRRTCState *rtc, int64_t legacy_offset);
>>   #define SPAPR_LMB_FLAGS_DRC_INVALID 0x00000020
>>   #define SPAPR_LMB_FLAGS_RESERVED 0x00000080
>>   
>> +/*
>> + * The nvdimm size should be aligned to SCM block size.
>> + * The SCM block size should be aligned to SPAPR_MEMORY_BLOCK_SIZE
>> + * inorder to have SCM regions not to overlap with dimm memory regions.
>> + * The SCM devices can have variable block sizes. For now, fixing the
>> + * block size to the minimum value.
>> + */
>> +#define SPAPR_MINIMUM_SCM_BLOCK_SIZE SPAPR_MEMORY_BLOCK_SIZE
>> +
>>   void spapr_do_system_reset_on_cpu(CPUState *cs, run_on_cpu_data arg);
>>   
>>   #define HTAB_SIZE(spapr)        (1ULL << ((spapr)->htab_shift))
>> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
>> index f6ff32e7e2..65925d00b1 100644
>> --- a/include/hw/ppc/spapr_drc.h
>> +++ b/include/hw/ppc/spapr_drc.h
>> @@ -70,6 +70,13 @@
>>   #define SPAPR_DRC_LMB(obj) OBJECT_CHECK(sPAPRDRConnector, (obj), \
>>                                           TYPE_SPAPR_DRC_LMB)
>>   
>> +#define TYPE_SPAPR_DRC_PMEM "spapr-drc-pmem"
>> +#define SPAPR_DRC_PMEM_GET_CLASS(obj) \
>> +        OBJECT_GET_CLASS(sPAPRDRConnectorClass, obj, TYPE_SPAPR_DRC_PMEM)
>> +#define SPAPR_DRC_PMEM_CLASS(klass) \
>> +        OBJECT_CLASS_CHECK(sPAPRDRConnectorClass, klass, TYPE_SPAPR_DRC_PMEM)
>> +#define SPAPR_DRC_PMEM(obj) OBJECT_CHECK(sPAPRDRConnector, (obj), \
>> +                                        TYPE_SPAPR_DRC_PMEM)
>>   /*
>>    * Various hotplug types managed by sPAPRDRConnector
>>    *
>> @@ -87,6 +94,7 @@ typedef enum {
>>       SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO = 3,
>>       SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI = 4,
>>       SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB = 8,
>> +    SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM = 9,
>>   } sPAPRDRConnectorTypeShift;
>>   
>>   typedef enum {
>> @@ -96,6 +104,7 @@ typedef enum {
>>       SPAPR_DR_CONNECTOR_TYPE_VIO = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO,
>>       SPAPR_DR_CONNECTOR_TYPE_PCI = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI,
>>       SPAPR_DR_CONNECTOR_TYPE_LMB = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB,
>> +    SPAPR_DR_CONNECTOR_TYPE_PMEM = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM,
>>   } sPAPRDRConnectorType;
>>   
>>   /*
>>
>>


* Re: [Qemu-devel] [RFC PATCH 3/4] spapr: Add NVDIMM device support
  2019-02-19  9:29     ` Shivaprasad G Bhat
@ 2019-02-21 14:12       ` Igor Mammedov
  2019-02-28  8:54         ` Shivaprasad G Bhat
  0 siblings, 1 reply; 19+ messages in thread
From: Igor Mammedov @ 2019-02-21 14:12 UTC (permalink / raw)
  To: Shivaprasad G Bhat
  Cc: qemu-devel, xiaoguangrong.eric, mst, bharata, qemu-ppc, vaibhav, david

On Tue, 19 Feb 2019 14:59:25 +0530
Shivaprasad G Bhat <sbhat@linux.ibm.com> wrote:

> On 02/19/2019 01:41 PM, Igor Mammedov wrote:
> > On Tue, 05 Feb 2019 23:26:27 -0600
> > Shivaprasad G Bhat <sbhat@linux.ibm.com> wrote:
> >  
> >> Add support for NVDIMM devices for sPAPR. Piggyback on existing nvdimm
> >> device interface in QEMU to support virtual NVDIMM devices for Power (May have
> >> to re-look at this later).  Create the required DT entries for the
> >> device (some entries have dummy values right now).
> >>
> >> The patch creates the required DT node and sends a hotplug
> >> interrupt to the guest. Guest is expected to undertake the normal
> >> DR resource add path in response and start issuing PAPR SCM hcalls.
> >>
> >> This is how it can be used ..
> >> Add nvdimm=on to the qemu machine argument.
> >> Ex : -machine pseries,nvdimm=on
> >> For coldplug, the device to be added in qemu command line as shown below
> >> -object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0.img,share=yes,size=512m
> >> -device nvdimm,label-size=128k,memdev=memnvdimm0,id=nvdimm0,slot=0
> >>
> >> For hotplug, the device to be added from monitor as below
> >> object_add memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0.img,share=yes,size=512m
> >> device_add nvdimm,label-size=128k,memdev=memnvdimm0,id=nvdimm0,slot=0
> >>
> >> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
> >> Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
> >>                 [Early implementation]
> >> ---
> >>   default-configs/ppc64-softmmu.mak |    1
> >>   hw/ppc/spapr.c                    |  212 +++++++++++++++++++++++++++++++++++--
> >>   hw/ppc/spapr_drc.c                |   17 +++
> >>   hw/ppc/spapr_events.c             |    4 +
> >>   include/hw/ppc/spapr.h            |   10 ++
> >>   include/hw/ppc/spapr_drc.h        |    9 ++
> >>   6 files changed, 241 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
> >> index 7f34ad0528..b6e1aa5125 100644
> >> --- a/default-configs/ppc64-softmmu.mak
> >> +++ b/default-configs/ppc64-softmmu.mak
> >> @@ -20,4 +20,5 @@ CONFIG_XIVE=$(CONFIG_PSERIES)
> >>   CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
> >>   CONFIG_MEM_DEVICE=y
> >>   CONFIG_DIMM=y
> >> +CONFIG_NVDIMM=y
> >>   CONFIG_SPAPR_RNG=y
> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >> index 0fcdd35cbe..7e7a1a8041 100644
> >> --- a/hw/ppc/spapr.c
> >> +++ b/hw/ppc/spapr.c
> >> @@ -73,6 +73,7 @@
> >>   #include "qemu/cutils.h"
> >>   #include "hw/ppc/spapr_cpu_core.h"
> >>   #include "hw/mem/memory-device.h"
> >> +#include "hw/mem/nvdimm.h"
> >>   
> >>   #include <libfdt.h>
> >>   
> >> @@ -690,6 +691,7 @@ static int spapr_populate_drmem_v2(sPAPRMachineState *spapr, void *fdt,
> >>       uint8_t *int_buf, *cur_index, buf_len;
> >>       int ret;
> >>       uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
> >> +    uint64_t scm_block_size = SPAPR_MINIMUM_SCM_BLOCK_SIZE;
> >>       uint64_t addr, cur_addr, size;
> >>       uint32_t nr_boot_lmbs = (machine->device_memory->base / lmb_size);
> >>       uint64_t mem_end = machine->device_memory->base +
> >> @@ -726,15 +728,24 @@ static int spapr_populate_drmem_v2(sPAPRMachineState *spapr, void *fdt,
> >>               nr_entries++;
> >>           }
> >>   
> >> -        /* Entry for DIMM */
> >> -        drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
> >> -        g_assert(drc);
> >> -        elem = spapr_get_drconf_cell(size / lmb_size, addr,
> >> -                                     spapr_drc_index(drc), node,
> >> -                                     SPAPR_LMB_FLAGS_ASSIGNED);
> >> +        if (info->value->type == MEMORY_DEVICE_INFO_KIND_NVDIMM) {
> >> +            /* Entry for NVDIMM */
> >> +            drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM, addr / scm_block_size);
> >> +            g_assert(drc);
> >> +            elem = spapr_get_drconf_cell(size / scm_block_size, addr,
> >> +                                         spapr_drc_index(drc), -1, 0);
> >> +            cur_addr = ROUND_UP(addr + size, scm_block_size);
> >> +        } else {
> >> +            /* Entry for DIMM */
> >> +            drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
> >> +            g_assert(drc);
> >> +            elem = spapr_get_drconf_cell(size / lmb_size, addr,
> >> +                                         spapr_drc_index(drc), node,
> >> +                                         SPAPR_LMB_FLAGS_ASSIGNED);
> >> +            cur_addr = addr + size;
> >> +        }
> >>           QSIMPLEQ_INSERT_TAIL(&drconf_queue, elem, entry);
> >>           nr_entries++;
> >> -        cur_addr = addr + size;
> >>       }
> >>   
> >>       /* Entry for remaining hotpluggable area */
> >> @@ -1225,6 +1236,42 @@ static void spapr_dt_hypervisor(sPAPRMachineState *spapr, void *fdt)
> >>       }
> >>   }
> >>   
> >> +static int spapr_populate_nvdimm_node(void *fdt, int fdt_offset,
> >> +                                      uint32_t node, uint64_t addr,
> >> +                                      uint64_t size, uint64_t label_size);
> >> +static void spapr_create_nvdimm(void *fdt)
> >> +{
> >> +    int offset = fdt_subnode_offset(fdt, 0, "persistent-memory");
> >> +    GSList *dimms = NULL;
> >> +
> >> +    if (offset < 0) {
> >> +        offset = fdt_add_subnode(fdt, 0, "persistent-memory");
> >> +        _FDT(offset);
> >> +        _FDT((fdt_setprop_cell(fdt, offset, "#address-cells", 0x2)));
> >> +        _FDT((fdt_setprop_cell(fdt, offset, "#size-cells", 0x0)));
> >> +        _FDT((fdt_setprop_string(fdt, offset, "name", "persistent-memory")));
> >> +        _FDT((fdt_setprop_string(fdt, offset, "device_type",
> >> +                                 "ibm,persistent-memory")));
> >> +    }
> >> +
> >> +    /*NB : Add drc-info array here */
> >> +
> >> +    /* Create DT entries for cold plugged NVDIMM devices */
> >> +    dimms = nvdimm_get_device_list();
> >> +    for (; dimms; dimms = dimms->next) {
> >> +        NVDIMMDevice *nvdimm = dimms->data;
> >> +        PCDIMMDevice *di = PC_DIMM(nvdimm);
> >> +        uint64_t lsize = nvdimm->label_size;
> >> +        int size = object_property_get_int(OBJECT(nvdimm), PC_DIMM_SIZE_PROP,
> >> +                                           NULL);
> >> +
> >> +        spapr_populate_nvdimm_node(fdt, offset, di->node, di->addr,
> >> +                                   size, lsize);
> >> +    }
> >> +    g_slist_free(dimms);
> >> +    return;
> >> +}
> >> +
> >>   static void *spapr_build_fdt(sPAPRMachineState *spapr)
> >>   {
> >>       MachineState *machine = MACHINE(spapr);
> >> @@ -1348,6 +1395,11 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr)
> >>           exit(1);
> >>       }
> >>   
> >> +    /* NVDIMM devices */
> >> +    if (spapr->nvdimm_enabled) {
> >> +        spapr_create_nvdimm(fdt);
> >> +    }
> >> +
> >>       return fdt;
> >>   }
> >>   
> >> @@ -3143,6 +3195,20 @@ static void spapr_set_ic_mode(Object *obj, const char *value, Error **errp)
> >>       }
> >>   }
> >>   
> >> +static bool spapr_get_nvdimm(Object *obj, Error **errp)
> >> +{
> >> +    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> >> +
> >> +    return spapr->nvdimm_enabled;
> >> +}
> >> +
> >> +static void spapr_set_nvdimm(Object *obj, bool value, Error **errp)
> >> +{
> >> +    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> >> +
> >> +    spapr->nvdimm_enabled = value;
> >> +}
> >> +
> >>   static void spapr_instance_init(Object *obj)
> >>   {
> >>       sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> >> @@ -3188,6 +3254,11 @@ static void spapr_instance_init(Object *obj)
> >>       object_property_set_description(obj, "ic-mode",
> >>                    "Specifies the interrupt controller mode (xics, xive, dual)",
> >>                    NULL);
> >> +    object_property_add_bool(obj, "nvdimm",
> >> +                            spapr_get_nvdimm, spapr_set_nvdimm, NULL);
> >> +    object_property_set_description(obj, "nvdimm",
> >> +                                    "Enable support for nvdimm devices",
> >> +                                    NULL);
> >>   }
> >>   
> >>   static void spapr_machine_finalizefn(Object *obj)
> >> @@ -3267,12 +3338,103 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t size,
> >>       }
> >>   }
> >>   
> >> +static int spapr_populate_nvdimm_node(void *fdt, int fdt_offset, uint32_t node,
> >> +                                      uint64_t addr, uint64_t size,
> >> +                                      uint64_t label_size)
> >> +{
> >> +    int offset;
> >> +    char buf[40];
> >> +    GString *lcode = g_string_sized_new(10);
> >> +    sPAPRDRConnector *drc;
> >> +    QemuUUID uuid;
> >> +    uint32_t drc_idx;
> >> +    uint32_t associativity[] = {
> >> +        cpu_to_be32(0x4), /* length */
> >> +        cpu_to_be32(0x0), cpu_to_be32(0x0),
> >> +        cpu_to_be32(0x0), cpu_to_be32(node)
> >> +    };
> >> +
> >> +    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM,
> >> +                          addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
> >> +    g_assert(drc);
> >> +
> >> +    drc_idx = spapr_drc_index(drc);
> >> +
> >> +    sprintf(buf, "pmem@%x", drc_idx);
> >> +    offset = fdt_add_subnode(fdt, fdt_offset, buf);
> >> +    _FDT(offset);
> >> +
> >> +    _FDT((fdt_setprop_cell(fdt, offset, "reg", drc_idx)));
> >> +    _FDT((fdt_setprop_string(fdt, offset, "compatible", "ibm,pmemory")));
> >> +    _FDT((fdt_setprop_string(fdt, offset, "name", "pmem")));
> >> +    _FDT((fdt_setprop_string(fdt, offset, "device_type", "ibm,pmemory")));
> >> +
> >> +    /*NB : Supposed to be random strings. Currently empty 10 strings! */
> >> +    _FDT((fdt_setprop(fdt, offset, "ibm,loc-code", lcode->str, lcode->len)));
> >> +    g_string_free(lcode, TRUE);
> >> +
> >> +    _FDT((fdt_setprop(fdt, offset, "ibm,associativity", associativity,
> >> +                      sizeof(associativity))));
> >> +    g_random_set_seed(drc_idx);
> >> +    qemu_uuid_generate(&uuid);
> >> +
> >> +    qemu_uuid_unparse(&uuid, buf);
> >> +    _FDT((fdt_setprop_string(fdt, offset, "ibm,unit-guid", buf)));
> >> +
> >> +    _FDT((fdt_setprop_cell(fdt, offset, "ibm,my-drc-index", drc_idx)));
> >> +
> >> +    /*NB : What it should be? */
> >> +    _FDT(fdt_setprop_cell(fdt, offset, "ibm,latency-attribute", 828));
> >> +
> >> +    _FDT((fdt_setprop_u64(fdt, offset, "ibm,block-size",
> >> +                          SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
> >> +    _FDT((fdt_setprop_u64(fdt, offset, "ibm,number-of-blocks",
> >> +                          size / SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
> >> +    _FDT((fdt_setprop_cell(fdt, offset, "ibm,metadata-size", label_size)));
> >> +
> >> +    return offset;
> >> +}
> >> +
> >> +static void spapr_add_nvdimm(DeviceState *dev, uint64_t addr,
> >> +                             uint64_t size, uint32_t node,
> >> +                             Error **errp)
> >> +{
> >> +    sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_hotplug_handler(dev));
> >> +    sPAPRDRConnector *drc;
> >> +    bool hotplugged = spapr_drc_hotplugged(dev);
> >> +    NVDIMMDevice *nvdimm = NVDIMM(OBJECT(dev));
> >> +    void *fdt;
> >> +    int fdt_offset, fdt_size;
> >> +    Error *local_err = NULL;
> >> +
> >> +    spapr_dr_connector_new(OBJECT(spapr), TYPE_SPAPR_DRC_PMEM,
> >> +                           addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
> >> +    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM,
> >> +                          addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
> >> +    g_assert(drc);
> >> +
> >> +    fdt = create_device_tree(&fdt_size);
> >> +    fdt_offset = spapr_populate_nvdimm_node(fdt, 0, node, addr,
> >> +                                            size, nvdimm->label_size);
> >> +
> >> +    spapr_drc_attach(drc, dev, fdt, fdt_offset, &local_err);
> >> +    if (local_err) {
> >> +        error_propagate(errp, local_err);
> >> +        return;
> >> +    }
> >> +
> >> +    if (hotplugged) {
> >> +        spapr_hotplug_req_add_by_index(drc);
> >> +    }
> >> +}
> >> +
> >>   static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >>                                 Error **errp)
> >>   {
> >>       Error *local_err = NULL;
> >>       sPAPRMachineState *ms = SPAPR_MACHINE(hotplug_dev);
> >>       PCDIMMDevice *dimm = PC_DIMM(dev);
> >> +    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
> >>       uint64_t size, addr;
> >>       uint32_t node;
> >>   
> >> @@ -3291,9 +3453,14 @@ static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >>   
> >>       node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP,
> >>                                       &error_abort);
> >> -    spapr_add_lmbs(dev, addr, size, node,
> >> -                   spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
> >> -                   &local_err);
> >> +    if (!is_nvdimm) {
> >> +        spapr_add_lmbs(dev, addr, size, node,
> >> +                       spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
> >> +                       &local_err);
> >> +    } else {
> >> +        spapr_add_nvdimm(dev, addr, size, node, &local_err);
> >> +    }
> >> +
> >>       if (local_err) {
> >>           goto out_unplug;
> >>       }
> >> @@ -3311,6 +3478,7 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >>   {
> >>       const sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(hotplug_dev);
> >>       sPAPRMachineState *spapr = SPAPR_MACHINE(hotplug_dev);
> >> +    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
> >>       PCDIMMDevice *dimm = PC_DIMM(dev);
> >>       Error *local_err = NULL;
> >>       uint64_t size;
> >> @@ -3328,10 +3496,30 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >>           return;
> >>       }
> >>   
> >> -    if (size % SPAPR_MEMORY_BLOCK_SIZE) {
> >> +    if (!is_nvdimm && size % SPAPR_MEMORY_BLOCK_SIZE) {
> >>           error_setg(errp, "Hotplugged memory size must be a multiple of "
> >> -                      "%" PRIu64 " MB", SPAPR_MEMORY_BLOCK_SIZE / MiB);
> >> +                          "%" PRIu64 " MB", SPAPR_MEMORY_BLOCK_SIZE / MiB);
> >>           return;
> >> +    } else if (is_nvdimm) {
> >> +        NVDIMMDevice *nvdimm = NVDIMM(OBJECT(dev));
> >> +        if ((nvdimm->label_size + size) % SPAPR_MINIMUM_SCM_BLOCK_SIZE) {
> >> +            error_setg(errp, "NVDIMM memory size must be a multiple of "
> >> +                       "%" PRIu64 "MB", SPAPR_MINIMUM_SCM_BLOCK_SIZE / MiB);
> >> +            return;
> >> +        }
> >> +        if (((nvdimm->label_size + size) / SPAPR_MINIMUM_SCM_BLOCK_SIZE) == 1) {
> >> +            error_setg(errp, "NVDIMM size must be atleast "
> >> +                       "%" PRIu64 "MB", 2 * SPAPR_MINIMUM_SCM_BLOCK_SIZE / MiB);
> >> +            return;
> >> +        }
On second glance, two things look odd here:
  1. We shouldn't poke inside the nvdimm object directly; there is
     NVDIMM_LABEL_SIZE_PROP if you really need to get the label size.
  2. Why do we need to care about label_size here at all?

> >> +        /* Align to scm block size, exclude the label */
> >> +        memory_device_set_region_size(MEMORY_DEVICE(nvdimm),
> >> +               QEMU_ALIGN_DOWN(size, SPAPR_MINIMUM_SCM_BLOCK_SIZE), &local_err);  
> > I'm not sure that arbitrarily fixing up region size is the right thing to do
> > and also what you are trying to achieve here isn't clear, could you explain it some more?  
> The resize is required to allow the subsequent memory hotplugs to work. The
> base address (if not specified) for the next dimm hotplug starts at the end
> of this region. If the region is not aligned to LMB size, the guest refuses
> to claim the newly hotplugged memory. The label area can be small and need
> not be aligned to (LMB/SCM block) size. The region size is actually the size
> minus the label_size, which can be unaligned to LMB size. So, aligning down
> to SCM block size is necessary here.
Well, fixing up an object (MemoryRegion) which belongs to the backend from
machine level to satisfy machine-specific alignment requirements looks
like the wrong thing to do.

So we need to come up with another approach.
I'm still not sure what the problem is, but nvdimm already
has a notion of a data region (without label size); look for
nvdimm->nvdimm_mr and mdc->get_memory_region, and that's what you have in
local var 'size'. So what you are doing here looks even more incorrect:
besides the fact that we shouldn't do it at all, you are
sizing down the data area, which already excludes the label size.
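
To make the point about 'size' concrete, here is a minimal standalone sketch of the arithmetic (not QEMU code). The helper names are made up for illustration, and the 256 MiB SCM block size is taken from SPAPR_MINIMUM_SCM_BLOCK_SIZE being defined as SPAPR_MEMORY_BLOCK_SIZE in the patch:

```c
#include <assert.h>
#include <stdint.h>

#define KiB 1024ULL
#define MiB (1024ULL * 1024ULL)

/* Assumption: SCM block size == LMB size == 256 MiB, as in the patch. */
#define SCM_BLOCK_SIZE (256 * MiB)

/*
 * Hypothetical helper: the data region an nvdimm exposes is the backend
 * size minus the label area, which is what 'size' holds in pre_plug.
 */
static uint64_t nvdimm_data_size(uint64_t backend_size, uint64_t label_size)
{
    assert(label_size < backend_size);
    return backend_size - label_size;
}

/* Round x down to a multiple of align (align must be a power of two). */
static uint64_t align_down(uint64_t x, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return x & ~(align - 1);
}
```

With the values from the commit message (size=512m, label-size=128k) the data region is 512 MiB minus 128 KiB. Aligning that down to the SCM block size leaves only 256 MiB, so the guest would lose almost half of the backend.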

What I'd suggest is to align up the GPA of the memory being added on
   MAX(LMB size, backend_page_size, max supported huge page size)
so a hotplugged dimm or whatever else would be properly aligned;
see pc_dimm_pre_plug(,legacy_align,) and how PC uses it.
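
A rough standalone sketch of that alignment rule follows. It is only the arithmetic, not the pc_dimm_pre_plug() code path; max3(), align_up() and next_plug_addr() are hypothetical names, and all three sizes are assumed to be powers of two:

```c
#include <assert.h>
#include <stdint.h>

/* Largest of three values. */
static uint64_t max3(uint64_t a, uint64_t b, uint64_t c)
{
    uint64_t m = a > b ? a : b;
    return m > c ? m : c;
}

/* Round x up to a multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t x, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper: pick the guest physical address for the next
 * device, given where the previous one ended. The maximum of several
 * powers of two is itself a power of two, so align_up() stays valid.
 */
static uint64_t next_plug_addr(uint64_t prev_end, uint64_t lmb_size,
                               uint64_t backend_page_size,
                               uint64_t max_huge_page_size)
{
    return align_up(prev_end,
                    max3(lmb_size, backend_page_size, max_huge_page_size));
}
```

For example, if an nvdimm data region ends at 0x9FFE0000 (unaligned because of the 128 KiB label), the next device would be placed at 0xA0000000 with a 256 MiB LMB size, and the nvdimm region itself is left untouched.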

> >> +        if (local_err) {
> >> +            error_propagate(errp, local_err);
> >> +            return;
> >> +        }
> >>       }
> >>   
> >>       memdev = object_property_get_link(OBJECT(dimm), PC_DIMM_MEMDEV_PROP,
> >> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> >> index 2edb7d1e9c..94ddd102cc 100644
> >> --- a/hw/ppc/spapr_drc.c
> >> +++ b/hw/ppc/spapr_drc.c
> >> @@ -696,6 +696,16 @@ static void spapr_drc_lmb_class_init(ObjectClass *k, void *data)
> >>       drck->release = spapr_lmb_release;
> >>   }
> >>   
> >> +static void spapr_drc_pmem_class_init(ObjectClass *k, void *data)
> >> +{
> >> +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_CLASS(k);
> >> +
> >> +    drck->typeshift = SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM;
> >> +    drck->typename = "MEM";
> >> +    drck->drc_name_prefix = "PMEM ";
> >> +    drck->release = NULL;
> >> +}
> >> +
> >>   static const TypeInfo spapr_dr_connector_info = {
> >>       .name          = TYPE_SPAPR_DR_CONNECTOR,
> >>       .parent        = TYPE_DEVICE,
> >> @@ -739,6 +749,12 @@ static const TypeInfo spapr_drc_lmb_info = {
> >>       .class_init    = spapr_drc_lmb_class_init,
> >>   };
> >>   
> >> +static const TypeInfo spapr_drc_pmem_info = {
> >> +    .name          = TYPE_SPAPR_DRC_PMEM,
> >> +    .parent        = TYPE_SPAPR_DRC_LOGICAL,
> >> +    .class_init    = spapr_drc_pmem_class_init,
> >> +};
> >> +
> >>   /* helper functions for external users */
> >>   
> >>   sPAPRDRConnector *spapr_drc_by_index(uint32_t index)
> >> @@ -1189,6 +1205,7 @@ static void spapr_drc_register_types(void)
> >>       type_register_static(&spapr_drc_cpu_info);
> >>       type_register_static(&spapr_drc_pci_info);
> >>       type_register_static(&spapr_drc_lmb_info);
> >> +    type_register_static(&spapr_drc_pmem_info);
> >>   
> >>       spapr_rtas_register(RTAS_SET_INDICATOR, "set-indicator",
> >>                           rtas_set_indicator);
> >> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> >> index 32719a1b72..a4fed84346 100644
> >> --- a/hw/ppc/spapr_events.c
> >> +++ b/hw/ppc/spapr_events.c
> >> @@ -193,6 +193,7 @@ struct rtas_event_log_v6_hp {
> >>   #define RTAS_LOG_V6_HP_TYPE_SLOT                         3
> >>   #define RTAS_LOG_V6_HP_TYPE_PHB                          4
> >>   #define RTAS_LOG_V6_HP_TYPE_PCI                          5
> >> +#define RTAS_LOG_V6_HP_TYPE_PMEM                         6
> >>       uint8_t hotplug_action;
> >>   #define RTAS_LOG_V6_HP_ACTION_ADD                        1
> >>   #define RTAS_LOG_V6_HP_ACTION_REMOVE                     2
> >> @@ -526,6 +527,9 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
> >>       case SPAPR_DR_CONNECTOR_TYPE_CPU:
> >>           hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_CPU;
> >>           break;
> >> +    case SPAPR_DR_CONNECTOR_TYPE_PMEM:
> >> +        hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PMEM;
> >> +        break;
> >>       default:
> >>           /* we shouldn't be signaling hotplug events for resources
> >>            * that don't support them
> >> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> >> index a947a0a0dc..21a9709afe 100644
> >> --- a/include/hw/ppc/spapr.h
> >> +++ b/include/hw/ppc/spapr.h
> >> @@ -187,6 +187,7 @@ struct sPAPRMachineState {
> >>   
> >>       bool cmd_line_caps[SPAPR_CAP_NUM];
> >>       sPAPRCapabilities def, eff, mig;
> >> +    bool nvdimm_enabled;
> >>   };
> >>   
> >>   #define H_SUCCESS         0
> >> @@ -798,6 +799,15 @@ int spapr_rtc_import_offset(sPAPRRTCState *rtc, int64_t legacy_offset);
> >>   #define SPAPR_LMB_FLAGS_DRC_INVALID 0x00000020
> >>   #define SPAPR_LMB_FLAGS_RESERVED 0x00000080
> >>   
> >> +/*
> >> + * The nvdimm size should be aligned to SCM block size.
> >> + * The SCM block size should be aligned to SPAPR_MEMORY_BLOCK_SIZE
> >> + * inorder to have SCM regions not to overlap with dimm memory regions.
> >> + * The SCM devices can have variable block sizes. For now, fixing the
> >> + * block size to the minimum value.
> >> + */
> >> +#define SPAPR_MINIMUM_SCM_BLOCK_SIZE SPAPR_MEMORY_BLOCK_SIZE
> >> +
> >>   void spapr_do_system_reset_on_cpu(CPUState *cs, run_on_cpu_data arg);
> >>   
> >>   #define HTAB_SIZE(spapr)        (1ULL << ((spapr)->htab_shift))
> >> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> >> index f6ff32e7e2..65925d00b1 100644
> >> --- a/include/hw/ppc/spapr_drc.h
> >> +++ b/include/hw/ppc/spapr_drc.h
> >> @@ -70,6 +70,13 @@
> >>   #define SPAPR_DRC_LMB(obj) OBJECT_CHECK(sPAPRDRConnector, (obj), \
> >>                                           TYPE_SPAPR_DRC_LMB)
> >>   
> >> +#define TYPE_SPAPR_DRC_PMEM "spapr-drc-pmem"
> >> +#define SPAPR_DRC_PMEM_GET_CLASS(obj) \
> >> +        OBJECT_GET_CLASS(sPAPRDRConnectorClass, obj, TYPE_SPAPR_DRC_PMEM)
> >> +#define SPAPR_DRC_PMEM_CLASS(klass) \
> >> +        OBJECT_CLASS_CHECK(sPAPRDRConnectorClass, klass, TYPE_SPAPR_DRC_PMEM)
> >> +#define SPAPR_DRC_PMEM(obj) OBJECT_CHECK(sPAPRDRConnector, (obj), \
> >> +                                        TYPE_SPAPR_DRC_PMEM)
> >>   /*
> >>    * Various hotplug types managed by sPAPRDRConnector
> >>    *
> >> @@ -87,6 +94,7 @@ typedef enum {
> >>       SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO = 3,
> >>       SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI = 4,
> >>       SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB = 8,
> >> +    SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM = 9,
> >>   } sPAPRDRConnectorTypeShift;
> >>   
> >>   typedef enum {
> >> @@ -96,6 +104,7 @@ typedef enum {
> >>       SPAPR_DR_CONNECTOR_TYPE_VIO = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO,
> >>       SPAPR_DR_CONNECTOR_TYPE_PCI = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI,
> >>       SPAPR_DR_CONNECTOR_TYPE_LMB = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB,
> >> +    SPAPR_DR_CONNECTOR_TYPE_PMEM = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM,
> >>   } sPAPRDRConnectorType;
> >>   
> >>   /*
> >>
> >>  
> 


* Re: [Qemu-devel] [RFC PATCH 3/4] spapr: Add NVDIMM device support
  2019-02-18 16:15         ` Shivaprasad G Bhat
@ 2019-02-27  4:27           ` David Gibson
  0 siblings, 0 replies; 19+ messages in thread
From: David Gibson @ 2019-02-27  4:27 UTC (permalink / raw)
  To: Shivaprasad G Bhat
  Cc: qemu-devel, xiaoguangrong.eric, mst, bharata, qemu-ppc, vaibhav,
	imammedo


On Mon, Feb 18, 2019 at 09:45:13PM +0530, Shivaprasad G Bhat wrote:
> 
> 
> On 02/18/2019 04:32 AM, David Gibson wrote:
> > On Fri, Feb 15, 2019 at 04:41:09PM +0530, Shivaprasad G Bhat wrote:
> > > Thanks for the comments David. Please find my replies inline..
[snip]
> > > > > +
> > > > > +    qemu_uuid_unparse(&uuid, buf);
> > > > > +    _FDT((fdt_setprop_string(fdt, offset, "ibm,unit-guid", buf)));
> > > > > +
> > > > > +    _FDT((fdt_setprop_cell(fdt, offset, "ibm,my-drc-index", drc_idx)));
> > > > > +
> > > > > +    /*NB : What it should be? */
> > > > > +    _FDT(fdt_setprop_cell(fdt, offset, "ibm,latency-attribute", 828));
> > > > > +
> > > > > +    _FDT((fdt_setprop_u64(fdt, offset, "ibm,block-size",
> > > > > +                          SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
> > > > > +    _FDT((fdt_setprop_u64(fdt, offset, "ibm,number-of-blocks",
> > > > > +                          size / SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
> > > > > +    _FDT((fdt_setprop_cell(fdt, offset, "ibm,metadata-size", label_size)));
> > > > > +
> > > > > +    return offset;
> > > > > +}
> > > > > +
> > > > > +static void spapr_add_nvdimm(DeviceState *dev, uint64_t addr,
> > > > > +                             uint64_t size, uint32_t node,
> > > > > +                             Error **errp)
> > > > > +{
> > > > > +    sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_hotplug_handler(dev));
> > > > > +    sPAPRDRConnector *drc;
> > > > > +    bool hotplugged = spapr_drc_hotplugged(dev);
> > > > > +    NVDIMMDevice *nvdimm = NVDIMM(OBJECT(dev));
> > > > > +    void *fdt;
> > > > > +    int fdt_offset, fdt_size;
> > > > > +    Error *local_err = NULL;
> > > > > +
> > > > > +    spapr_dr_connector_new(OBJECT(spapr), TYPE_SPAPR_DRC_PMEM,
> > > > > +                           addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
> > > > > +    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM,
> > > > > +                          addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
> > > > > +    g_assert(drc);
> > > > Creating the DRC in the hotplug path looks bogus.  Generally the DRC
> > > > has to exist before you can even attempt to plug the device.
> > > We don't really know how many DRCs to create. Unlike memory hotplug,
> > > where we know how many LMBs are required to fit till the maxmem, in this
> > > case we don't know how many NVDIMM devices the guest can have. That is the
> > > reason I am creating the DRC on demand. I'll see if it is possible to
> > > address this by putting a cap on the maximum number of NVDIMM devices a
> > > guest can have.
> > Urgh, PAPR.  First it specifies a crappy hotplug model that requires
> > zillions of fixed attachment points to be instantiated, then it breaks
> > its own model.
> > 
> > But.. I still don't really understand how this works.
> > 
> > a) How does the guest know the DRC index to use for the new NVDIMM?
> >     Generally that comes from the device tree, but the guest doesn't
> >     get new device tree information until it calls configure-connector
> >     for which it needs the DRC index.
> The DRC is passed in the device tree blob passed as payload of hotplug
> interrupt

Um.. there is no device tree blob as payload of a hotplug interrupt.
The guest only gets device tree information when it makes
configure-connector calls.

I see that there is a drc identifier field though, so I guess you're
getting the DRC from that.  In existing cases the guest looks that up
in the *existing* device tree to find information about that DRC.  I
guess in the case of NVDIMMs here it doesn't need any more info.
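
For reference, a small standalone sketch of how such a DRC index could be derived for the PMEM connector in this patch. The type shift values 8 (LMB) and 9 (PMEM) and the addr / SCM block size id come from the patch itself; the 28-bit split between type and id is my recollection of spapr_drc.c and should be treated as an assumption, as should the helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: type lives in the top 4 bits, id in the low 28 bits. */
#define DRC_INDEX_TYPE_SHIFT 28
#define DRC_INDEX_ID_MASK    ((1U << DRC_INDEX_TYPE_SHIFT) - 1)

/* Type shifts as listed in the patch's sPAPRDRConnectorTypeShift enum. */
#define TYPE_SHIFT_LMB  8U
#define TYPE_SHIFT_PMEM 9U

/* SPAPR_MINIMUM_SCM_BLOCK_SIZE == SPAPR_MEMORY_BLOCK_SIZE == 256 MiB. */
#define SCM_BLOCK_SIZE (1ULL << 28)

/* Hypothetical helper: combine connector type and id into one index. */
static uint32_t drc_index(uint32_t typeshift, uint32_t id)
{
    assert(typeshift < 16);
    return (typeshift << DRC_INDEX_TYPE_SHIFT) | (id & DRC_INDEX_ID_MASK);
}

/* The patch derives the PMEM DRC id from the device's base address. */
static uint32_t pmem_drc_id(uint64_t addr)
{
    return (uint32_t)(addr / SCM_BLOCK_SIZE);
}
```

Under these assumptions an NVDIMM plugged at guest address 0x80000000 gets id 8 and index 0x90000008, while an LMB at the same address would be 0x80000008, so the type field alone keeps the two apart.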

> from which the guest picks the DRC index and makes the subsequent calls.
> > b) AFAICT, NVDIMMs would also require HPT space, much like regular
> >     memory would.  PowerVM doesn't have HPT resizing, so surely it must
> >     already have some sort of cap on the amount of NVDIMM space in
> >     order to size the HPT correctly.
> On Power KVM we will enforce the NVDIMM is mapped within the maxmem,
> however the spec allows outside of it. Coming back to the original point of
> creating the DRCs at the hotplug time, we could impose a limit on the
> number of NVDIMM devices that could be hotplugged so that we can
> create the DRCs at the machine init time.

Ah, so NVDIMMs live within the same maxmem limit as regular memory.
Ok, I guess that makes sense.
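
If that holds, and the DRC id stays addr / SCM block size as in this patch, then every possible PMEM DRC id falls inside a fixed range that is known at machine init. A minimal sketch of that bound, with a made-up helper and struct name and assuming the hotplug window base and size are both SCM-block aligned:

```c
#include <assert.h>
#include <stdint.h>

/* SPAPR_MINIMUM_SCM_BLOCK_SIZE == SPAPR_MEMORY_BLOCK_SIZE == 256 MiB. */
#define SCM_BLOCK_SIZE (1ULL << 28)

/* Hypothetical: range of PMEM DRC ids that could ever be needed. */
typedef struct {
    uint32_t first_id;
    uint32_t count;
} PmemDrcRange;

/*
 * Hypothetical helper: given the device-memory (hotplug) window, return
 * the ids an NVDIMM placed anywhere inside it could map to.
 */
static PmemDrcRange pmem_drc_range(uint64_t window_base, uint64_t window_size)
{
    PmemDrcRange r;

    assert(window_base % SCM_BLOCK_SIZE == 0);
    assert(window_size % SCM_BLOCK_SIZE == 0);

    r.first_id = (uint32_t)(window_base / SCM_BLOCK_SIZE);
    r.count = (uint32_t)(window_size / SCM_BLOCK_SIZE);
    return r;
}
```

For a window starting at 2 GiB and spanning 4 GiB this gives ids 8 through 23, i.e. 16 connectors that could be created up front instead of in the plug path. Whether that many PMEM DRCs is acceptable is a separate question.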

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [Qemu-devel] [RFC PATCH 3/4] spapr: Add NVDIMM device support
  2019-02-21 14:12       ` Igor Mammedov
@ 2019-02-28  8:54         ` Shivaprasad G Bhat
  2019-03-05  9:13           ` Igor Mammedov
  0 siblings, 1 reply; 19+ messages in thread
From: Shivaprasad G Bhat @ 2019-02-28  8:54 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, xiaoguangrong.eric, mst, bharata, qemu-ppc, vaibhav, david

Hi Igor,

Thanks for the elaboration. Please find my response inline.


On 02/21/2019 07:42 PM, Igor Mammedov wrote:
> On Tue, 19 Feb 2019 14:59:25 +0530
> Shivaprasad G Bhat <sbhat@linux.ibm.com> wrote:
>
>> On 02/19/2019 01:41 PM, Igor Mammedov wrote:
>>> On Tue, 05 Feb 2019 23:26:27 -0600
>>> Shivaprasad G Bhat <sbhat@linux.ibm.com> wrote:
>>>   
>>>> Add support for NVDIMM devices for sPAPR. Piggyback on existing nvdimm
>>>> device interface in QEMU to support virtual NVDIMM devices for Power (May have
>>>> to re-look at this later).  Create the required DT entries for the
>>>> device (some entries have dummy values right now).
>>>>
>>>> The patch creates the required DT node and sends a hotplug
>>>> interrupt to the guest. Guest is expected to undertake the normal
>>>> DR resource add path in response and start issuing PAPR SCM hcalls.
>>>>
>>>> This is how it can be used ..
>>>> Add nvdimm=on to the qemu machine argument.
>>>> Ex : -machine pseries,nvdimm=on
>>>> For coldplug, the device to be added in qemu command line as shown below
>>>> -object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0.img,share=yes,size=512m
>>>> -device nvdimm,label-size=128k,memdev=memnvdimm0,id=nvdimm0,slot=0
>>>>
>>>> For hotplug, the device to be added from monitor as below
>>>> object_add memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0.img,share=yes,size=512m
>>>> device_add nvdimm,label-size=128k,memdev=memnvdimm0,id=nvdimm0,slot=0
>>>>
>>>> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
>>>> Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
>>>>                  [Early implementation]
>>>> ---
>>>>    default-configs/ppc64-softmmu.mak |    1
>>>>    hw/ppc/spapr.c                    |  212 +++++++++++++++++++++++++++++++++++--
>>>>    hw/ppc/spapr_drc.c                |   17 +++
>>>>    hw/ppc/spapr_events.c             |    4 +
>>>>    include/hw/ppc/spapr.h            |   10 ++
>>>>    include/hw/ppc/spapr_drc.h        |    9 ++
>>>>    6 files changed, 241 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
>>>> index 7f34ad0528..b6e1aa5125 100644
>>>> --- a/default-configs/ppc64-softmmu.mak
>>>> +++ b/default-configs/ppc64-softmmu.mak
>>>> @@ -20,4 +20,5 @@ CONFIG_XIVE=$(CONFIG_PSERIES)
>>>>    CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
>>>>    CONFIG_MEM_DEVICE=y
>>>>    CONFIG_DIMM=y
>>>> +CONFIG_NVDIMM=y
>>>>    CONFIG_SPAPR_RNG=y
>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>> index 0fcdd35cbe..7e7a1a8041 100644
>>>> --- a/hw/ppc/spapr.c
>>>> +++ b/hw/ppc/spapr.c
>>>> @@ -73,6 +73,7 @@
>>>>    #include "qemu/cutils.h"
>>>>    #include "hw/ppc/spapr_cpu_core.h"
>>>>    #include "hw/mem/memory-device.h"
>>>> +#include "hw/mem/nvdimm.h"
>>>>    
>>>>    #include <libfdt.h>
>>>>    
>>>> @@ -690,6 +691,7 @@ static int spapr_populate_drmem_v2(sPAPRMachineState *spapr, void *fdt,
>>>>        uint8_t *int_buf, *cur_index, buf_len;
>>>>        int ret;
>>>>        uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
>>>> +    uint64_t scm_block_size = SPAPR_MINIMUM_SCM_BLOCK_SIZE;
>>>>        uint64_t addr, cur_addr, size;
>>>>        uint32_t nr_boot_lmbs = (machine->device_memory->base / lmb_size);
>>>>        uint64_t mem_end = machine->device_memory->base +
>>>> @@ -726,15 +728,24 @@ static int spapr_populate_drmem_v2(sPAPRMachineState *spapr, void *fdt,
>>>>                nr_entries++;
>>>>            }
>>>>    
>>>> -        /* Entry for DIMM */
>>>> -        drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
>>>> -        g_assert(drc);
>>>> -        elem = spapr_get_drconf_cell(size / lmb_size, addr,
>>>> -                                     spapr_drc_index(drc), node,
>>>> -                                     SPAPR_LMB_FLAGS_ASSIGNED);
>>>> +        if (info->value->type == MEMORY_DEVICE_INFO_KIND_NVDIMM) {
>>>> +            /* Entry for NVDIMM */
>>>> +            drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM, addr / scm_block_size);
>>>> +            g_assert(drc);
>>>> +            elem = spapr_get_drconf_cell(size / scm_block_size, addr,
>>>> +                                         spapr_drc_index(drc), -1, 0);
>>>> +            cur_addr = ROUND_UP(addr + size, scm_block_size);
>>>> +        } else {
>>>> +            /* Entry for DIMM */
>>>> +            drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
>>>> +            g_assert(drc);
>>>> +            elem = spapr_get_drconf_cell(size / lmb_size, addr,
>>>> +                                         spapr_drc_index(drc), node,
>>>> +                                         SPAPR_LMB_FLAGS_ASSIGNED);
>>>> +            cur_addr = addr + size;
>>>> +        }
>>>>            QSIMPLEQ_INSERT_TAIL(&drconf_queue, elem, entry);
>>>>            nr_entries++;
>>>> -        cur_addr = addr + size;
>>>>        }
>>>>    
>>>>        /* Entry for remaining hotpluggable area */
>>>> @@ -1225,6 +1236,42 @@ static void spapr_dt_hypervisor(sPAPRMachineState *spapr, void *fdt)
>>>>        }
>>>>    }
>>>>    
>>>> +static int spapr_populate_nvdimm_node(void *fdt, int fdt_offset,
>>>> +                                      uint32_t node, uint64_t addr,
>>>> +                                      uint64_t size, uint64_t label_size);
>>>> +static void spapr_create_nvdimm(void *fdt)
>>>> +{
>>>> +    int offset = fdt_subnode_offset(fdt, 0, "persistent-memory");
>>>> +    GSList *dimms = NULL;
>>>> +
>>>> +    if (offset < 0) {
>>>> +        offset = fdt_add_subnode(fdt, 0, "persistent-memory");
>>>> +        _FDT(offset);
>>>> +        _FDT((fdt_setprop_cell(fdt, offset, "#address-cells", 0x2)));
>>>> +        _FDT((fdt_setprop_cell(fdt, offset, "#size-cells", 0x0)));
>>>> +        _FDT((fdt_setprop_string(fdt, offset, "name", "persistent-memory")));
>>>> +        _FDT((fdt_setprop_string(fdt, offset, "device_type",
>>>> +                                 "ibm,persistent-memory")));
>>>> +    }
>>>> +
>>>> +    /*NB : Add drc-info array here */
>>>> +
>>>> +    /* Create DT entries for cold plugged NVDIMM devices */
>>>> +    dimms = nvdimm_get_device_list();
>>>> +    for (; dimms; dimms = dimms->next) {
>>>> +        NVDIMMDevice *nvdimm = dimms->data;
>>>> +        PCDIMMDevice *di = PC_DIMM(nvdimm);
>>>> +        uint64_t lsize = nvdimm->label_size;
>>>> +        int size = object_property_get_int(OBJECT(nvdimm), PC_DIMM_SIZE_PROP,
>>>> +                                           NULL);
>>>> +
>>>> +        spapr_populate_nvdimm_node(fdt, offset, di->node, di->addr,
>>>> +                                   size, lsize);
>>>> +    }
>>>> +    g_slist_free(dimms);
>>>> +    return;
>>>> +}
>>>> +
>>>>    static void *spapr_build_fdt(sPAPRMachineState *spapr)
>>>>    {
>>>>        MachineState *machine = MACHINE(spapr);
>>>> @@ -1348,6 +1395,11 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr)
>>>>            exit(1);
>>>>        }
>>>>    
>>>> +    /* NVDIMM devices */
>>>> +    if (spapr->nvdimm_enabled) {
>>>> +        spapr_create_nvdimm(fdt);
>>>> +    }
>>>> +
>>>>        return fdt;
>>>>    }
>>>>    
>>>> @@ -3143,6 +3195,20 @@ static void spapr_set_ic_mode(Object *obj, const char *value, Error **errp)
>>>>        }
>>>>    }
>>>>    
>>>> +static bool spapr_get_nvdimm(Object *obj, Error **errp)
>>>> +{
>>>> +    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
>>>> +
>>>> +    return spapr->nvdimm_enabled;
>>>> +}
>>>> +
>>>> +static void spapr_set_nvdimm(Object *obj, bool value, Error **errp)
>>>> +{
>>>> +    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
>>>> +
>>>> +    spapr->nvdimm_enabled = value;
>>>> +}
>>>> +
>>>>    static void spapr_instance_init(Object *obj)
>>>>    {
>>>>        sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
>>>> @@ -3188,6 +3254,11 @@ static void spapr_instance_init(Object *obj)
>>>>        object_property_set_description(obj, "ic-mode",
>>>>                     "Specifies the interrupt controller mode (xics, xive, dual)",
>>>>                     NULL);
>>>> +    object_property_add_bool(obj, "nvdimm",
>>>> +                            spapr_get_nvdimm, spapr_set_nvdimm, NULL);
>>>> +    object_property_set_description(obj, "nvdimm",
>>>> +                                    "Enable support for nvdimm devices",
>>>> +                                    NULL);
>>>>    }
>>>>    
>>>>    static void spapr_machine_finalizefn(Object *obj)
>>>> @@ -3267,12 +3338,103 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t size,
>>>>        }
>>>>    }
>>>>    
>>>> +static int spapr_populate_nvdimm_node(void *fdt, int fdt_offset, uint32_t node,
>>>> +                                      uint64_t addr, uint64_t size,
>>>> +                                      uint64_t label_size)
>>>> +{
>>>> +    int offset;
>>>> +    char buf[40];
>>>> +    GString *lcode = g_string_sized_new(10);
>>>> +    sPAPRDRConnector *drc;
>>>> +    QemuUUID uuid;
>>>> +    uint32_t drc_idx;
>>>> +    uint32_t associativity[] = {
>>>> +        cpu_to_be32(0x4), /* length */
>>>> +        cpu_to_be32(0x0), cpu_to_be32(0x0),
>>>> +        cpu_to_be32(0x0), cpu_to_be32(node)
>>>> +    };
>>>> +
>>>> +    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM,
>>>> +                          addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
>>>> +    g_assert(drc);
>>>> +
>>>> +    drc_idx = spapr_drc_index(drc);
>>>> +
>>>> +    sprintf(buf, "pmem@%x", drc_idx);
>>>> +    offset = fdt_add_subnode(fdt, fdt_offset, buf);
>>>> +    _FDT(offset);
>>>> +
>>>> +    _FDT((fdt_setprop_cell(fdt, offset, "reg", drc_idx)));
>>>> +    _FDT((fdt_setprop_string(fdt, offset, "compatible", "ibm,pmemory")));
>>>> +    _FDT((fdt_setprop_string(fdt, offset, "name", "pmem")));
>>>> +    _FDT((fdt_setprop_string(fdt, offset, "device_type", "ibm,pmemory")));
>>>> +
>>>> +    /*NB : Supposed to be random strings. Currently empty 10 strings! */
>>>> +    _FDT((fdt_setprop(fdt, offset, "ibm,loc-code", lcode->str, lcode->len)));
>>>> +    g_string_free(lcode, TRUE);
>>>> +
>>>> +    _FDT((fdt_setprop(fdt, offset, "ibm,associativity", associativity,
>>>> +                      sizeof(associativity))));
>>>> +    g_random_set_seed(drc_idx);
>>>> +    qemu_uuid_generate(&uuid);
>>>> +
>>>> +    qemu_uuid_unparse(&uuid, buf);
>>>> +    _FDT((fdt_setprop_string(fdt, offset, "ibm,unit-guid", buf)));
>>>> +
>>>> +    _FDT((fdt_setprop_cell(fdt, offset, "ibm,my-drc-index", drc_idx)));
>>>> +
>>>> +    /*NB : What it should be? */
>>>> +    _FDT(fdt_setprop_cell(fdt, offset, "ibm,latency-attribute", 828));
>>>> +
>>>> +    _FDT((fdt_setprop_u64(fdt, offset, "ibm,block-size",
>>>> +                          SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
>>>> +    _FDT((fdt_setprop_u64(fdt, offset, "ibm,number-of-blocks",
>>>> +                          size / SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
>>>> +    _FDT((fdt_setprop_cell(fdt, offset, "ibm,metadata-size", label_size)));
>>>> +
>>>> +    return offset;
>>>> +}
>>>> +
>>>> +static void spapr_add_nvdimm(DeviceState *dev, uint64_t addr,
>>>> +                             uint64_t size, uint32_t node,
>>>> +                             Error **errp)
>>>> +{
>>>> +    sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_hotplug_handler(dev));
>>>> +    sPAPRDRConnector *drc;
>>>> +    bool hotplugged = spapr_drc_hotplugged(dev);
>>>> +    NVDIMMDevice *nvdimm = NVDIMM(OBJECT(dev));
>>>> +    void *fdt;
>>>> +    int fdt_offset, fdt_size;
>>>> +    Error *local_err = NULL;
>>>> +
>>>> +    spapr_dr_connector_new(OBJECT(spapr), TYPE_SPAPR_DRC_PMEM,
>>>> +                           addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
>>>> +    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM,
>>>> +                          addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
>>>> +    g_assert(drc);
>>>> +
>>>> +    fdt = create_device_tree(&fdt_size);
>>>> +    fdt_offset = spapr_populate_nvdimm_node(fdt, 0, node, addr,
>>>> +                                            size, nvdimm->label_size);
>>>> +
>>>> +    spapr_drc_attach(drc, dev, fdt, fdt_offset, &local_err);
>>>> +    if (local_err) {
>>>> +        error_propagate(errp, local_err);
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    if (hotplugged) {
>>>> +        spapr_hotplug_req_add_by_index(drc);
>>>> +    }
>>>> +}
>>>> +
>>>>    static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>>>                                  Error **errp)
>>>>    {
>>>>        Error *local_err = NULL;
>>>>        sPAPRMachineState *ms = SPAPR_MACHINE(hotplug_dev);
>>>>        PCDIMMDevice *dimm = PC_DIMM(dev);
>>>> +    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
>>>>        uint64_t size, addr;
>>>>        uint32_t node;
>>>>    
>>>> @@ -3291,9 +3453,14 @@ static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>>>    
>>>>        node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP,
>>>>                                        &error_abort);
>>>> -    spapr_add_lmbs(dev, addr, size, node,
>>>> -                   spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
>>>> -                   &local_err);
>>>> +    if (!is_nvdimm) {
>>>> +        spapr_add_lmbs(dev, addr, size, node,
>>>> +                       spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
>>>> +                       &local_err);
>>>> +    } else {
>>>> +        spapr_add_nvdimm(dev, addr, size, node, &local_err);
>>>> +    }
>>>> +
>>>>        if (local_err) {
>>>>            goto out_unplug;
>>>>        }
>>>> @@ -3311,6 +3478,7 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>>>    {
>>>>        const sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(hotplug_dev);
>>>>        sPAPRMachineState *spapr = SPAPR_MACHINE(hotplug_dev);
>>>> +    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
>>>>        PCDIMMDevice *dimm = PC_DIMM(dev);
>>>>        Error *local_err = NULL;
>>>>        uint64_t size;
>>>> @@ -3328,10 +3496,30 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>>>            return;
>>>>        }
>>>>    
>>>> -    if (size % SPAPR_MEMORY_BLOCK_SIZE) {
>>>> +    if (!is_nvdimm && size % SPAPR_MEMORY_BLOCK_SIZE) {
>>>>            error_setg(errp, "Hotplugged memory size must be a multiple of "
>>>> -                      "%" PRIu64 " MB", SPAPR_MEMORY_BLOCK_SIZE / MiB);
>>>> +                          "%" PRIu64 " MB", SPAPR_MEMORY_BLOCK_SIZE / MiB);
>>>>            return;
>>>> +    } else if (is_nvdimm) {
>>>> +        NVDIMMDevice *nvdimm = NVDIMM(OBJECT(dev));
>>>> +        if ((nvdimm->label_size + size) % SPAPR_MINIMUM_SCM_BLOCK_SIZE) {
>>>> +            error_setg(errp, "NVDIMM memory size must be a multiple of "
>>>> +                       "%" PRIu64 "MB", SPAPR_MINIMUM_SCM_BLOCK_SIZE / MiB);
>>>> +            return;
>>>> +        }
>>>> +        if (((nvdimm->label_size + size) / SPAPR_MINIMUM_SCM_BLOCK_SIZE) == 1) {
>>>> +            error_setg(errp, "NVDIMM size must be atleast "
>>>> +                       "%" PRIu64 "MB", 2 * SPAPR_MINIMUM_SCM_BLOCK_SIZE / MiB);
>>>> +            return;
>>>> +        }
> on the second glance 2 things looks weird here:
>    1. we shouldn't poke inside of nvdimm object directly, there is NVDIMM_LABEL_SIZE_PROP
>       if you really need to get label size

Ok. Will use the property.

>    2. why do we need to care about label_size here at all?

On PPC, there is no explicit way to specify the size of the NVDIMM device
to the guest. It is inferred as (number of SCM blocks) * (SCM block size),
as specified in the device tree. The label area is part of the nvdimm but
is not exposed to the guest, as you mentioned. So, if the user specifies
size=1GB and label_size=3MB, qemu reports (1GB-3MB)/256MB (block size) = 3
blocks with a block size of 256MB, and the user gets 768MB of the device.
Since the minimum required size is 256MB, the label area lies outside it,
and we want the size aligned to 256MB, I am forcing the minimum device
size to be 512MB. Actually it can be anything above 256MB + label_size.
I see your point: I can ignore label_size and check only whether the
nvdimm size (minus label_size) is aligned to 256MB.
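
The block arithmetic above can be sketched in a few lines of C. This is
only an illustration of the calculation described in this mail, not code
from the patch: the helper names are hypothetical, and the 256MB block
size is assumed to match SPAPR_MINIMUM_SCM_BLOCK_SIZE.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)
/* Assumption: SCM block size equals the 256MB LMB size used in this thread. */
#define SCM_BLOCK_SIZE (256ULL * MiB)

/* Data area of the device: backend size minus the label area. */
static uint64_t nvdimm_data_size(uint64_t backend_size, uint64_t label_size)
{
    assert(label_size <= backend_size);
    return backend_size - label_size;
}

/* Whole SCM blocks that fit in the data area (integer division rounds down). */
static uint64_t scm_block_count(uint64_t backend_size, uint64_t label_size)
{
    return nvdimm_data_size(backend_size, label_size) / SCM_BLOCK_SIZE;
}

/* Bytes the guest can actually use: block count times block size. */
static uint64_t scm_usable_size(uint64_t backend_size, uint64_t label_size)
{
    return scm_block_count(backend_size, label_size) * SCM_BLOCK_SIZE;
}
```

With size=1GB and label_size=3MB, scm_block_count() gives 3 and
scm_usable_size() gives 768MB, matching the numbers above.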

>>>> +        /* Align to scm block size, exclude the label */
>>>> +        memory_device_set_region_size(MEMORY_DEVICE(nvdimm),
>>>> +               QEMU_ALIGN_DOWN(size, SPAPR_MINIMUM_SCM_BLOCK_SIZE), &local_err);
>>> I'm not sure that arbitrarily fixing up region size is the right thing to do
>>> and also what you are trying to achieve here isn't clear, could you explain it some more?
>> The resize is required to allow the subsequent memory hotplugs to work. The
>> base address(if not specified) for the next dimm hotplug, starts at the
>> end of
>> this region. If the region is not aligned to LMB size, guest refuses to
>> claim the
>> newly hotplugged memory.  The label area can be small and need not be
>> aligned to (LMB/SCM block) size. The region size is actually the size
>> minus the
>> label_size which can be unaligned to LMB size. So, align down to SCM block
>> size is necessary here.
> Well fixing up object(MemoryRegion) which belongs to the backend from
> machine level to satisfy machine specific alignment requirements looks
> like a wrong thing to do.

For a 1GB device with, say, a 3MB label_size, qemu exposes 3 SCM blocks of
256MB each, and the guest actually accesses 768MB of the region even
though the memory region size is (1GB-3MB). On x86, by contrast, the guest
sees the full (1GB-3MB), not any less. Because the memory region is larger
than 768MB and not aligned to 256MB, a subsequent dimm hotplug fails: the
next free address returned by memory_device_get_free_addr() is not aligned
to 256MB.

[   35.617767] pseries-hotplug-mem: dlpar_memory: Memory add LMBs
[   35.619598] pseries-hotplug-mem: Attempting to hot-add 1 LMB(s) at index 80000040
[   35.619966] pseries-hotplug-mem: Attempting to hot-add in range 40fe00000 - 40fe00000
[   35.620416] pseries-hotplug-mem: Attempting to hot-add in range 40fe00000 - 40fe00000
[   35.621330] Block size [0x10000000 or 268435456] unaligned hotplug range: start 0x40fe00000, size 0x10000000
[   35.621432] pseries-hotplug-mem: Memory indexed-count-add failed, removing any added LMBs
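
The failure in the log above comes down to a simple alignment check. The
following is a minimal sketch, assuming the next free address is just the
base plus the region size; the helper names and the 4GB base address are
hypothetical and not taken from QEMU or the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* True if addr is a multiple of align. */
static bool is_aligned(uint64_t addr, uint64_t align)
{
    assert(align != 0);
    return (addr % align) == 0;
}

/* Round value down to the previous multiple of align. */
static uint64_t align_down(uint64_t value, uint64_t align)
{
    assert(align != 0);
    return value - (value % align);
}

/* Assumption: the next device is placed right after the previous region. */
static uint64_t next_free_addr(uint64_t base, uint64_t region_size)
{
    return base + region_size;
}
```

With a base of 0x100000000 and a region of (1GB-3MB), the next address is
0x13fd00000, which is not a multiple of 256MB. After aligning the region
size down to 768MB it becomes 0x130000000, which is. The same check
rejects the start address 0x40fe00000 reported in the log.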

This alignment problem is not unique to Power. I see the same thing
happening on x86_64 too, as hotplugged memory is required to be aligned to
the 128MB memory block size there.

[   26.558423] Block size [0x8000000] unaligned hotplug range: start 0x11ffe0000, size 0x8000000
[   26.558427] acpi PNP0C80:00: add_memory failed
[   26.558431] acpi PNP0C80:00: acpi_memory_enable_device() error
[   26.558433] acpi PNP0C80:00: Enumeration failure

The user has to work around this alignment issue by explicitly giving an
align size of 256MB on PPC and 128MB on x86_64 in the memory-backend-file
object options.

> So we need to come up with another approach.
> I'm sill not sure what problem is there but nvdimm already
> has a notion of data region (without label size) look for
> nvdimm->nvdimm_mr and mdc->get_memory_region and that's what you have in
> local var 'size'.
>    
>
> So what you are doing here look incorrect even more,
> i.e. beside we shouldn't do it at all and the second thing is that you are
> sizing down data area which already excludes label size.

To get nvdimm and dimm working together, the user has to give align=256m
as a memory-backend-file option for the nvdimm backend object. With the
align option, nvdimm_prepare_memory_region() does the QEMU_ALIGN_DOWN of
the memory region, which is what I am doing here by default. Doing it by
default still makes sense, as the actual size the guest gets to use is
only 768MB in this case. However, I should probably align down only if
the user has not specified the value explicitly, and otherwise let
nvdimm_prepare_memory_region() do it.

Since this is a PPC-specific alignment requirement, I think this is the
right place to enforce it and size down by default. The PPC-specific DIMM
size alignment requirement is already checked here in the same way. If
this is not the right place, could you suggest a better one?
nvdimm_prepare_memory_region() is generic, and I can't set
machine-specific device properties there.


> What I'd suggest is to align up GPA of being added memory on
>     MAX(LMB size, backend_page_size, max supported huge page size)
> so hotplugged dimm or whatever else would be properly aligned,
> see pc_dimm_pre_plug(,legacy_align,) and how PC uses it.

Although this approach ensures that memory_device_get_free_addr() returns
an address aligned to the given legacy_align, it also requires the size of
the device to be aligned to that legacy_align. If I do not size down the
region, (size - label_size) will not be aligned to it. That is another
reason why sizing down the region size is still needed.
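
To make the point concrete, here is a rough sketch of the align-up
approach together with the size constraint described above. It is a
simplified model, not the memory_device_get_free_addr() implementation:
the function names are hypothetical, and the 64KB page size and 16MB huge
page size are made-up values used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define KiB 1024ULL
#define MiB (1024ULL * KiB)

/* Largest of three alignment requirements, as in MAX(LMB, page, huge page). */
static uint64_t max3(uint64_t a, uint64_t b, uint64_t c)
{
    uint64_t m = a > b ? a : b;
    return m > c ? m : c;
}

/* Round value up to the next multiple of align. */
static uint64_t align_up(uint64_t value, uint64_t align)
{
    assert(align != 0);
    return ((value + align - 1) / align) * align;
}

/*
 * Pick an aligned address at or above hint.
 * Returns 0 and sets *addr on success, or -1 if size is not a multiple
 * of align (the constraint this mail says the approach imposes).
 */
static int place_device(uint64_t hint, uint64_t size, uint64_t align,
                        uint64_t *addr)
{
    if (size % align) {
        return -1;
    }
    *addr = align_up(hint, align);
    return 0;
}
```

In this model a (1GB-3MB) region is rejected for a 256MB alignment, while
a 768MB region placed with hint 0x13fd00000 lands at 0x140000000.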

>>>> +        if (local_err) {
>>>> +            error_propagate(errp, local_err);
>>>> +            return;
>>>> +        }
>>>>        }
>>>>    
>>>>        memdev = object_property_get_link(OBJECT(dimm), PC_DIMM_MEMDEV_PROP,
>>>> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
>>>> index 2edb7d1e9c..94ddd102cc 100644
>>>> --- a/hw/ppc/spapr_drc.c
>>>> +++ b/hw/ppc/spapr_drc.c
>>>> @@ -696,6 +696,16 @@ static void spapr_drc_lmb_class_init(ObjectClass *k, void *data)
>>>>        drck->release = spapr_lmb_release;
>>>>    }
>>>>    
>>>> +static void spapr_drc_pmem_class_init(ObjectClass *k, void *data)
>>>> +{
>>>> +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_CLASS(k);
>>>> +
>>>> +    drck->typeshift = SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM;
>>>> +    drck->typename = "MEM";
>>>> +    drck->drc_name_prefix = "PMEM ";
>>>> +    drck->release = NULL;
>>>> +}
>>>> +
>>>>    static const TypeInfo spapr_dr_connector_info = {
>>>>        .name          = TYPE_SPAPR_DR_CONNECTOR,
>>>>        .parent        = TYPE_DEVICE,
>>>> @@ -739,6 +749,12 @@ static const TypeInfo spapr_drc_lmb_info = {
>>>>        .class_init    = spapr_drc_lmb_class_init,
>>>>    };
>>>>    
>>>> +static const TypeInfo spapr_drc_pmem_info = {
>>>> +    .name          = TYPE_SPAPR_DRC_PMEM,
>>>> +    .parent        = TYPE_SPAPR_DRC_LOGICAL,
>>>> +    .class_init    = spapr_drc_pmem_class_init,
>>>> +};
>>>> +
>>>>    /* helper functions for external users */
>>>>    
>>>>    sPAPRDRConnector *spapr_drc_by_index(uint32_t index)
>>>> @@ -1189,6 +1205,7 @@ static void spapr_drc_register_types(void)
>>>>        type_register_static(&spapr_drc_cpu_info);
>>>>        type_register_static(&spapr_drc_pci_info);
>>>>        type_register_static(&spapr_drc_lmb_info);
>>>> +    type_register_static(&spapr_drc_pmem_info);
>>>>    
>>>>        spapr_rtas_register(RTAS_SET_INDICATOR, "set-indicator",
>>>>                            rtas_set_indicator);
>>>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>>>> index 32719a1b72..a4fed84346 100644
>>>> --- a/hw/ppc/spapr_events.c
>>>> +++ b/hw/ppc/spapr_events.c
>>>> @@ -193,6 +193,7 @@ struct rtas_event_log_v6_hp {
>>>>    #define RTAS_LOG_V6_HP_TYPE_SLOT                         3
>>>>    #define RTAS_LOG_V6_HP_TYPE_PHB                          4
>>>>    #define RTAS_LOG_V6_HP_TYPE_PCI                          5
>>>> +#define RTAS_LOG_V6_HP_TYPE_PMEM                         6
>>>>        uint8_t hotplug_action;
>>>>    #define RTAS_LOG_V6_HP_ACTION_ADD                        1
>>>>    #define RTAS_LOG_V6_HP_ACTION_REMOVE                     2
>>>> @@ -526,6 +527,9 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
>>>>        case SPAPR_DR_CONNECTOR_TYPE_CPU:
>>>>            hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_CPU;
>>>>            break;
>>>> +    case SPAPR_DR_CONNECTOR_TYPE_PMEM:
>>>> +        hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PMEM;
>>>> +        break;
>>>>        default:
>>>>            /* we shouldn't be signaling hotplug events for resources
>>>>             * that don't support them
>>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>>> index a947a0a0dc..21a9709afe 100644
>>>> --- a/include/hw/ppc/spapr.h
>>>> +++ b/include/hw/ppc/spapr.h
>>>> @@ -187,6 +187,7 @@ struct sPAPRMachineState {
>>>>    
>>>>        bool cmd_line_caps[SPAPR_CAP_NUM];
>>>>        sPAPRCapabilities def, eff, mig;
>>>> +    bool nvdimm_enabled;
>>>>    };
>>>>    
>>>>    #define H_SUCCESS         0
>>>> @@ -798,6 +799,15 @@ int spapr_rtc_import_offset(sPAPRRTCState *rtc, int64_t legacy_offset);
>>>>    #define SPAPR_LMB_FLAGS_DRC_INVALID 0x00000020
>>>>    #define SPAPR_LMB_FLAGS_RESERVED 0x00000080
>>>>    
>>>> +/*
>>>> + * The nvdimm size should be aligned to SCM block size.
>>>> + * The SCM block size should be aligned to SPAPR_MEMORY_BLOCK_SIZE
>>>> + * inorder to have SCM regions not to overlap with dimm memory regions.
>>>> + * The SCM devices can have variable block sizes. For now, fixing the
>>>> + * block size to the minimum value.
>>>> + */
>>>> +#define SPAPR_MINIMUM_SCM_BLOCK_SIZE SPAPR_MEMORY_BLOCK_SIZE
>>>> +
>>>>    void spapr_do_system_reset_on_cpu(CPUState *cs, run_on_cpu_data arg);
>>>>    
>>>>    #define HTAB_SIZE(spapr)        (1ULL << ((spapr)->htab_shift))
>>>> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
>>>> index f6ff32e7e2..65925d00b1 100644
>>>> --- a/include/hw/ppc/spapr_drc.h
>>>> +++ b/include/hw/ppc/spapr_drc.h
>>>> @@ -70,6 +70,13 @@
>>>>    #define SPAPR_DRC_LMB(obj) OBJECT_CHECK(sPAPRDRConnector, (obj), \
>>>>                                            TYPE_SPAPR_DRC_LMB)
>>>>    
>>>> +#define TYPE_SPAPR_DRC_PMEM "spapr-drc-pmem"
>>>> +#define SPAPR_DRC_PMEM_GET_CLASS(obj) \
>>>> +        OBJECT_GET_CLASS(sPAPRDRConnectorClass, obj, TYPE_SPAPR_DRC_PMEM)
>>>> +#define SPAPR_DRC_PMEM_CLASS(klass) \
>>>> +        OBJECT_CLASS_CHECK(sPAPRDRConnectorClass, klass, TYPE_SPAPR_DRC_PMEM)
>>>> +#define SPAPR_DRC_PMEM(obj) OBJECT_CHECK(sPAPRDRConnector, (obj), \
>>>> +                                        TYPE_SPAPR_DRC_PMEM)
>>>>    /*
>>>>     * Various hotplug types managed by sPAPRDRConnector
>>>>     *
>>>> @@ -87,6 +94,7 @@ typedef enum {
>>>>        SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO = 3,
>>>>        SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI = 4,
>>>>        SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB = 8,
>>>> +    SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM = 9,
>>>>    } sPAPRDRConnectorTypeShift;
>>>>    
>>>>    typedef enum {
>>>> @@ -96,6 +104,7 @@ typedef enum {
>>>>        SPAPR_DR_CONNECTOR_TYPE_VIO = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO,
>>>>        SPAPR_DR_CONNECTOR_TYPE_PCI = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI,
>>>>        SPAPR_DR_CONNECTOR_TYPE_LMB = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB,
>>>> +    SPAPR_DR_CONNECTOR_TYPE_PMEM = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM,
>>>>    } sPAPRDRConnectorType;
>>>>    
>>>>    /*
>>>>
>>>>   


* Re: [Qemu-devel] [RFC PATCH 3/4] spapr: Add NVDIMM device support
  2019-02-28  8:54         ` Shivaprasad G Bhat
@ 2019-03-05  9:13           ` Igor Mammedov
  0 siblings, 0 replies; 19+ messages in thread
From: Igor Mammedov @ 2019-03-05  9:13 UTC (permalink / raw)
  To: Shivaprasad G Bhat
  Cc: qemu-devel, xiaoguangrong.eric, mst, bharata, qemu-ppc, vaibhav, david

On Thu, 28 Feb 2019 14:24:07 +0530
Shivaprasad G Bhat <sbhat@linux.ibm.com> wrote:

> Hi Igor,
> 
> Thanks for the elaboration. Please find my response inline.
> 
> 
> On 02/21/2019 07:42 PM, Igor Mammedov wrote:
> > On Tue, 19 Feb 2019 14:59:25 +0530
> > Shivaprasad G Bhat <sbhat@linux.ibm.com> wrote:
> >  
> >> On 02/19/2019 01:41 PM, Igor Mammedov wrote:  
> >>> On Tue, 05 Feb 2019 23:26:27 -0600
> >>> Shivaprasad G Bhat <sbhat@linux.ibm.com> wrote:
> >>>     
> >>>> Add support for NVDIMM devices for sPAPR. Piggyback on existing nvdimm
> >>>> device interface in QEMU to support virtual NVDIMM devices for Power (May have
> >>>> to re-look at this later).  Create the required DT entries for the
> >>>> device (some entries have dummy values right now).
> >>>>
> >>>> The patch creates the required DT node and sends a hotplug
> >>>> interrupt to the guest. Guest is expected to undertake the normal
> >>>> DR resource add path in response and start issuing PAPR SCM hcalls.
> >>>>
> >>>> This is how it can be used ..
> >>>> Add nvdimm=on to the qemu machine argument.
> >>>> Ex : -machine pseries,nvdimm=on
> >>>> For coldplug, the device to be added in qemu command line as shown below
> >>>> -object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0.img,share=yes,size=512m
> >>>> -device nvdimm,label-size=128k,memdev=memnvdimm0,id=nvdimm0,slot=0
> >>>>
> >>>> For hotplug, the device to be added from monitor as below
> >>>> object_add memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0.img,share=yes,size=512m
> >>>> device_add nvdimm,label-size=128k,memdev=memnvdimm0,id=nvdimm0,slot=0
> >>>>
> >>>> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
> >>>> Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
> >>>>                  [Early implementation]
> >>>> ---
> >>>>    default-configs/ppc64-softmmu.mak |    1
> >>>>    hw/ppc/spapr.c                    |  212 +++++++++++++++++++++++++++++++++++--
> >>>>    hw/ppc/spapr_drc.c                |   17 +++
> >>>>    hw/ppc/spapr_events.c             |    4 +
> >>>>    include/hw/ppc/spapr.h            |   10 ++
> >>>>    include/hw/ppc/spapr_drc.h        |    9 ++
> >>>>    6 files changed, 241 insertions(+), 12 deletions(-)
> >>>>
> >>>> diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
> >>>> index 7f34ad0528..b6e1aa5125 100644
> >>>> --- a/default-configs/ppc64-softmmu.mak
> >>>> +++ b/default-configs/ppc64-softmmu.mak
> >>>> @@ -20,4 +20,5 @@ CONFIG_XIVE=$(CONFIG_PSERIES)
> >>>>    CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
> >>>>    CONFIG_MEM_DEVICE=y
> >>>>    CONFIG_DIMM=y
> >>>> +CONFIG_NVDIMM=y
> >>>>    CONFIG_SPAPR_RNG=y
> >>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >>>> index 0fcdd35cbe..7e7a1a8041 100644
> >>>> --- a/hw/ppc/spapr.c
> >>>> +++ b/hw/ppc/spapr.c
> >>>> @@ -73,6 +73,7 @@
> >>>>    #include "qemu/cutils.h"
> >>>>    #include "hw/ppc/spapr_cpu_core.h"
> >>>>    #include "hw/mem/memory-device.h"
> >>>> +#include "hw/mem/nvdimm.h"
> >>>>    
> >>>>    #include <libfdt.h>
> >>>>    
> >>>> @@ -690,6 +691,7 @@ static int spapr_populate_drmem_v2(sPAPRMachineState *spapr, void *fdt,
> >>>>        uint8_t *int_buf, *cur_index, buf_len;
> >>>>        int ret;
> >>>>        uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
> >>>> +    uint64_t scm_block_size = SPAPR_MINIMUM_SCM_BLOCK_SIZE;
> >>>>        uint64_t addr, cur_addr, size;
> >>>>        uint32_t nr_boot_lmbs = (machine->device_memory->base / lmb_size);
> >>>>        uint64_t mem_end = machine->device_memory->base +
> >>>> @@ -726,15 +728,24 @@ static int spapr_populate_drmem_v2(sPAPRMachineState *spapr, void *fdt,
> >>>>                nr_entries++;
> >>>>            }
> >>>>    
> >>>> -        /* Entry for DIMM */
> >>>> -        drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
> >>>> -        g_assert(drc);
> >>>> -        elem = spapr_get_drconf_cell(size / lmb_size, addr,
> >>>> -                                     spapr_drc_index(drc), node,
> >>>> -                                     SPAPR_LMB_FLAGS_ASSIGNED);
> >>>> +        if (info->value->type == MEMORY_DEVICE_INFO_KIND_NVDIMM) {
> >>>> +            /* Entry for NVDIMM */
> >>>> +            drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM, addr / scm_block_size);
> >>>> +            g_assert(drc);
> >>>> +            elem = spapr_get_drconf_cell(size / scm_block_size, addr,
> >>>> +                                         spapr_drc_index(drc), -1, 0);
> >>>> +            cur_addr = ROUND_UP(addr + size, scm_block_size);
> >>>> +        } else {
> >>>> +            /* Entry for DIMM */
> >>>> +            drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
> >>>> +            g_assert(drc);
> >>>> +            elem = spapr_get_drconf_cell(size / lmb_size, addr,
> >>>> +                                         spapr_drc_index(drc), node,
> >>>> +                                         SPAPR_LMB_FLAGS_ASSIGNED);
> >>>> +            cur_addr = addr + size;
> >>>> +        }
> >>>>            QSIMPLEQ_INSERT_TAIL(&drconf_queue, elem, entry);
> >>>>            nr_entries++;
> >>>> -        cur_addr = addr + size;
> >>>>        }
> >>>>    
> >>>>        /* Entry for remaining hotpluggable area */
> >>>> @@ -1225,6 +1236,42 @@ static void spapr_dt_hypervisor(sPAPRMachineState *spapr, void *fdt)
> >>>>        }
> >>>>    }
> >>>>    
> >>>> +static int spapr_populate_nvdimm_node(void *fdt, int fdt_offset,
> >>>> +                                      uint32_t node, uint64_t addr,
> >>>> +                                      uint64_t size, uint64_t label_size);
> >>>> +static void spapr_create_nvdimm(void *fdt)
> >>>> +{
> >>>> +    int offset = fdt_subnode_offset(fdt, 0, "persistent-memory");
> >>>> +    GSList *dimms = NULL;
> >>>> +
> >>>> +    if (offset < 0) {
> >>>> +        offset = fdt_add_subnode(fdt, 0, "persistent-memory");
> >>>> +        _FDT(offset);
> >>>> +        _FDT((fdt_setprop_cell(fdt, offset, "#address-cells", 0x2)));
> >>>> +        _FDT((fdt_setprop_cell(fdt, offset, "#size-cells", 0x0)));
> >>>> +        _FDT((fdt_setprop_string(fdt, offset, "name", "persistent-memory")));
> >>>> +        _FDT((fdt_setprop_string(fdt, offset, "device_type",
> >>>> +                                 "ibm,persistent-memory")));
> >>>> +    }
> >>>> +
> >>>> +    /*NB : Add drc-info array here */
> >>>> +
> >>>> +    /* Create DT entries for cold plugged NVDIMM devices */
> >>>> +    dimms = nvdimm_get_device_list();
> >>>> +    for (; dimms; dimms = dimms->next) {
> >>>> +        NVDIMMDevice *nvdimm = dimms->data;
> >>>> +        PCDIMMDevice *di = PC_DIMM(nvdimm);
> >>>> +        uint64_t lsize = nvdimm->label_size;
> >>>> +        int size = object_property_get_int(OBJECT(nvdimm), PC_DIMM_SIZE_PROP,
> >>>> +                                           NULL);
> >>>> +
> >>>> +        spapr_populate_nvdimm_node(fdt, offset, di->node, di->addr,
> >>>> +                                   size, lsize);
> >>>> +    }
> >>>> +    g_slist_free(dimms);
> >>>> +    return;
> >>>> +}
> >>>> +
> >>>>    static void *spapr_build_fdt(sPAPRMachineState *spapr)
> >>>>    {
> >>>>        MachineState *machine = MACHINE(spapr);
> >>>> @@ -1348,6 +1395,11 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr)
> >>>>            exit(1);
> >>>>        }
> >>>>    
> >>>> +    /* NVDIMM devices */
> >>>> +    if (spapr->nvdimm_enabled) {
> >>>> +        spapr_create_nvdimm(fdt);
> >>>> +    }
> >>>> +
> >>>>        return fdt;
> >>>>    }
> >>>>    
> >>>> @@ -3143,6 +3195,20 @@ static void spapr_set_ic_mode(Object *obj, const char *value, Error **errp)
> >>>>        }
> >>>>    }
> >>>>    
> >>>> +static bool spapr_get_nvdimm(Object *obj, Error **errp)
> >>>> +{
> >>>> +    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> >>>> +
> >>>> +    return spapr->nvdimm_enabled;
> >>>> +}
> >>>> +
> >>>> +static void spapr_set_nvdimm(Object *obj, bool value, Error **errp)
> >>>> +{
> >>>> +    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> >>>> +
> >>>> +    spapr->nvdimm_enabled = value;
> >>>> +}
> >>>> +
> >>>>    static void spapr_instance_init(Object *obj)
> >>>>    {
> >>>>        sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> >>>> @@ -3188,6 +3254,11 @@ static void spapr_instance_init(Object *obj)
> >>>>        object_property_set_description(obj, "ic-mode",
> >>>>                     "Specifies the interrupt controller mode (xics, xive, dual)",
> >>>>                     NULL);
> >>>> +    object_property_add_bool(obj, "nvdimm",
> >>>> +                            spapr_get_nvdimm, spapr_set_nvdimm, NULL);
> >>>> +    object_property_set_description(obj, "nvdimm",
> >>>> +                                    "Enable support for nvdimm devices",
> >>>> +                                    NULL);
> >>>>    }
> >>>>    
> >>>>    static void spapr_machine_finalizefn(Object *obj)
> >>>> @@ -3267,12 +3338,103 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t size,
> >>>>        }
> >>>>    }
> >>>>    
> >>>> +static int spapr_populate_nvdimm_node(void *fdt, int fdt_offset, uint32_t node,
> >>>> +                                      uint64_t addr, uint64_t size,
> >>>> +                                      uint64_t label_size)
> >>>> +{
> >>>> +    int offset;
> >>>> +    char buf[40];
> >>>> +    GString *lcode = g_string_sized_new(10);
> >>>> +    sPAPRDRConnector *drc;
> >>>> +    QemuUUID uuid;
> >>>> +    uint32_t drc_idx;
> >>>> +    uint32_t associativity[] = {
> >>>> +        cpu_to_be32(0x4), /* length */
> >>>> +        cpu_to_be32(0x0), cpu_to_be32(0x0),
> >>>> +        cpu_to_be32(0x0), cpu_to_be32(node)
> >>>> +    };
> >>>> +
> >>>> +    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM,
> >>>> +                          addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
> >>>> +    g_assert(drc);
> >>>> +
> >>>> +    drc_idx = spapr_drc_index(drc);
> >>>> +
> >>>> +    sprintf(buf, "pmem@%x", drc_idx);
> >>>> +    offset = fdt_add_subnode(fdt, fdt_offset, buf);
> >>>> +    _FDT(offset);
> >>>> +
> >>>> +    _FDT((fdt_setprop_cell(fdt, offset, "reg", drc_idx)));
> >>>> +    _FDT((fdt_setprop_string(fdt, offset, "compatible", "ibm,pmemory")));
> >>>> +    _FDT((fdt_setprop_string(fdt, offset, "name", "pmem")));
> >>>> +    _FDT((fdt_setprop_string(fdt, offset, "device_type", "ibm,pmemory")));
> >>>> +
> >>>> +    /*NB : Supposed to be random strings. Currently empty 10 strings! */
> >>>> +    _FDT((fdt_setprop(fdt, offset, "ibm,loc-code", lcode->str, lcode->len)));
> >>>> +    g_string_free(lcode, TRUE);
> >>>> +
> >>>> +    _FDT((fdt_setprop(fdt, offset, "ibm,associativity", associativity,
> >>>> +                      sizeof(associativity))));
> >>>> +    g_random_set_seed(drc_idx);
> >>>> +    qemu_uuid_generate(&uuid);
> >>>> +
> >>>> +    qemu_uuid_unparse(&uuid, buf);
> >>>> +    _FDT((fdt_setprop_string(fdt, offset, "ibm,unit-guid", buf)));
> >>>> +
> >>>> +    _FDT((fdt_setprop_cell(fdt, offset, "ibm,my-drc-index", drc_idx)));
> >>>> +
> >>>> +    /*NB : What it should be? */
> >>>> +    _FDT(fdt_setprop_cell(fdt, offset, "ibm,latency-attribute", 828));
> >>>> +
> >>>> +    _FDT((fdt_setprop_u64(fdt, offset, "ibm,block-size",
> >>>> +                          SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
> >>>> +    _FDT((fdt_setprop_u64(fdt, offset, "ibm,number-of-blocks",
> >>>> +                          size / SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
> >>>> +    _FDT((fdt_setprop_cell(fdt, offset, "ibm,metadata-size", label_size)));
> >>>> +
> >>>> +    return offset;
> >>>> +}
> >>>> +
> >>>> +static void spapr_add_nvdimm(DeviceState *dev, uint64_t addr,
> >>>> +                             uint64_t size, uint32_t node,
> >>>> +                             Error **errp)
> >>>> +{
> >>>> +    sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_hotplug_handler(dev));
> >>>> +    sPAPRDRConnector *drc;
> >>>> +    bool hotplugged = spapr_drc_hotplugged(dev);
> >>>> +    NVDIMMDevice *nvdimm = NVDIMM(OBJECT(dev));
> >>>> +    void *fdt;
> >>>> +    int fdt_offset, fdt_size;
> >>>> +    Error *local_err = NULL;
> >>>> +
> >>>> +    spapr_dr_connector_new(OBJECT(spapr), TYPE_SPAPR_DRC_PMEM,
> >>>> +                           addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
> >>>> +    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM,
> >>>> +                          addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
> >>>> +    g_assert(drc);
> >>>> +
> >>>> +    fdt = create_device_tree(&fdt_size);
> >>>> +    fdt_offset = spapr_populate_nvdimm_node(fdt, 0, node, addr,
> >>>> +                                            size, nvdimm->label_size);
> >>>> +
> >>>> +    spapr_drc_attach(drc, dev, fdt, fdt_offset, &local_err);
> >>>> +    if (local_err) {
> >>>> +        error_propagate(errp, local_err);
> >>>> +        return;
> >>>> +    }
> >>>> +
> >>>> +    if (hotplugged) {
> >>>> +        spapr_hotplug_req_add_by_index(drc);
> >>>> +    }
> >>>> +}
> >>>> +
> >>>>    static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >>>>                                  Error **errp)
> >>>>    {
> >>>>        Error *local_err = NULL;
> >>>>        sPAPRMachineState *ms = SPAPR_MACHINE(hotplug_dev);
> >>>>        PCDIMMDevice *dimm = PC_DIMM(dev);
> >>>> +    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
> >>>>        uint64_t size, addr;
> >>>>        uint32_t node;
> >>>>    
> >>>> @@ -3291,9 +3453,14 @@ static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >>>>    
> >>>>        node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP,
> >>>>                                        &error_abort);
> >>>> -    spapr_add_lmbs(dev, addr, size, node,
> >>>> -                   spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
> >>>> -                   &local_err);
> >>>> +    if (!is_nvdimm) {
> >>>> +        spapr_add_lmbs(dev, addr, size, node,
> >>>> +                       spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
> >>>> +                       &local_err);
> >>>> +    } else {
> >>>> +        spapr_add_nvdimm(dev, addr, size, node, &local_err);
> >>>> +    }
> >>>> +
> >>>>        if (local_err) {
> >>>>            goto out_unplug;
> >>>>        }
> >>>> @@ -3311,6 +3478,7 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >>>>    {
> >>>>        const sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(hotplug_dev);
> >>>>        sPAPRMachineState *spapr = SPAPR_MACHINE(hotplug_dev);
> >>>> +    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
> >>>>        PCDIMMDevice *dimm = PC_DIMM(dev);
> >>>>        Error *local_err = NULL;
> >>>>        uint64_t size;
> >>>> @@ -3328,10 +3496,30 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >>>>            return;
> >>>>        }
> >>>>    
> >>>> -    if (size % SPAPR_MEMORY_BLOCK_SIZE) {
> >>>> +    if (!is_nvdimm && size % SPAPR_MEMORY_BLOCK_SIZE) {
> >>>>            error_setg(errp, "Hotplugged memory size must be a multiple of "
> >>>> -                      "%" PRIu64 " MB", SPAPR_MEMORY_BLOCK_SIZE / MiB);
> >>>> +                          "%" PRIu64 " MB", SPAPR_MEMORY_BLOCK_SIZE / MiB);
> >>>>            return;
> >>>> +    } else if (is_nvdimm) {
> >>>> +        NVDIMMDevice *nvdimm = NVDIMM(OBJECT(dev));
> >>>> +        if ((nvdimm->label_size + size) % SPAPR_MINIMUM_SCM_BLOCK_SIZE) {
> >>>> +            error_setg(errp, "NVDIMM memory size must be a multiple of "
> >>>> +                       "%" PRIu64 "MB", SPAPR_MINIMUM_SCM_BLOCK_SIZE / MiB);
> >>>> +            return;
> >>>> +        }
> >>>> +        if (((nvdimm->label_size + size) / SPAPR_MINIMUM_SCM_BLOCK_SIZE) == 1) {
> >>>> +            error_setg(errp, "NVDIMM size must be atleast "
> >>>> +                       "%" PRIu64 "MB", 2 * SPAPR_MINIMUM_SCM_BLOCK_SIZE / MiB);
> >>>> +            return;
> >>>> +        }  
> > on second glance, 2 things look weird here:
> >    1. we shouldn't poke inside of nvdimm object directly, there is NVDIMM_LABEL_SIZE_PROP
> >       if you really need to get label size  
> 
> Ok. Will use the property.
> 
> >    2. why do we need to care about label_size here at all?  
> 
> On PPC, there is no explicit way to specify the size of the NVDIMM device
> for the guest. It is inferred as (number of SCM blocks) * (SCM block size),
> as specified in the device tree. The label area is part of the nvdimm but is
> not exposed to the guest, as you mentioned. So, if the user specifies
> size=1GB and label_size=3MB, qemu will say (1GB-3MB)/256MB (block size) = 3
> blocks, with a block size of 256MB. The user gets 768MB of the device. Since
> the minimum required size is 256MB, the label area is outside it, and we
> want it aligned to 256MB, I am forcing the minimum device size to be 512MB.
> Actually it can be anything above 256MB + label_size. I see your point: I
> can ignore label_size and only check whether the nvdimm size (minus
> label_size) is aligned to 256MB.
> 
> >>>> +        /* Align to scm block size, exclude the label */
> >>>> +        memory_device_set_region_size(MEMORY_DEVICE(nvdimm),
> >>>> +               QEMU_ALIGN_DOWN(size, SPAPR_MINIMUM_SCM_BLOCK_SIZE), &local_err);  
> >>> I'm not sure that arbitrarily fixing up region size is the right thing to do
> >>> and also what you are trying to achieve here isn't clear, could you explain it some more?  
> >> The resize is required to allow subsequent memory hotplugs to work. The
> >> base address (if not specified) for the next dimm hotplug starts at the
> >> end of this region. If the region is not aligned to the LMB size, the
> >> guest refuses to claim the newly hotplugged memory. The label area can be
> >> small and need not be aligned to the (LMB/SCM block) size. The region size
> >> is actually the size minus the label_size, which can be unaligned to the
> >> LMB size. So aligning down to the SCM block size is necessary here.  
> > Well fixing up object(MemoryRegion) which belongs to the backend from
> > machine level to satisfy machine specific alignment requirements looks
> > like a wrong thing to do.  
> 
> For a 1GB device with, say, a 3MB label_size, qemu exposes 3 SCM blocks of
> 256MB each and the guest actually accesses 768MB of the region, even though
> the memory region size is (1GB-3MB). But on x86, the guest actually sees
> (1GB-3MB), not any less. Since the memory region size is larger than 768MB
> and is unaligned to 256MB, the subsequent dimm hotplug would fail, as the
> next free address obtained from memory_device_get_free_addr is not aligned
> to 256MB.
> 
> [   35.617767] pseries-hotplug-mem: dlpar_memory: Memory add LMBs
> [   35.619598] pseries-hotplug-mem: Attempting to hot-add 1 LMB(s) at index 80000040
> [   35.619966] pseries-hotplug-mem: Attempting to hot-add in range 40fe00000 - 40fe00000
> [   35.620416] pseries-hotplug-mem: Attempting to hot-add in range 40fe00000 - 40fe00000
> [   35.621330] Block size [0x10000000 or 268435456] unaligned hotplug range: start 0x40fe00000, size 0x10000000
> [   35.621432] pseries-hotplug-mem: Memory indexed-count-add failed, removing any added LMBs
> 
> This alignment problem is not unique to Power; I see the same happening on
> x86_64 too, as the memory block size is required to be aligned to 128MB there.
> 
> [   26.558423] Block size [0x8000000] unaligned hotplug range: start 0x11ffe0000, size 0x8000000
> [   26.558427] acpi PNP0C80:00: add_memory failed
> [   26.558431] acpi PNP0C80:00: acpi_memory_enable_device() error
> [   26.558433] acpi PNP0C80:00: Enumeration failure
> 
> The user has to circumvent this alignment issue by explicitly giving 256MB
> and 128MB as the align size in the memory-backend-file object option on PPC
> and x86_64 respectively.
> 
> > So we need to come up with another approach.
> > I'm still not sure what the problem is, but nvdimm already
> > has a notion of a data region (without label size); look for
> > nvdimm->nvdimm_mr and mdc->get_memory_region. That's what you have in
> > local var 'size'.
> >    
> >
> > So what you are doing here looks even more incorrect:
> > first, we shouldn't do it at all, and second, you are
> > sizing down a data area which already excludes the label size.  
> 
> To get things (nvdimm & dimm) working together, the user has to give
> align=256m in the memory-backend-file options for the nvdimm device's
> backend object. With the align option, nvdimm_prepare_memory_region() does
> the QEMU_ALIGN_DOWN of the memory region, which is what I am doing here by
> default. Doing it by default still makes sense, as the actual size the guest
> gets to use is only 768MB in this case. However, I should probably align
> down only if the user has not specified the value themselves, and let
> nvdimm_prepare_memory_region() do it otherwise.
> 
> Since this is a PPC-specific alignment requirement, I think this is the
> right place to enforce it and size down by default. The PPC-specific DIMM
> size alignment requirement is checked here in the same way. If this is not
> the right place, could you suggest a better one, as
> nvdimm_prepare_memory_region() is generic and I can't set machine-specific
> device properties there?
> 
> 
> > What I'd suggest is to align up the GPA of the memory being added to
> >     MAX(LMB size, backend_page_size, max supported huge page size)
> > so a hotplugged dimm or whatever else would be properly aligned,
> > see pc_dimm_pre_plug(,legacy_align,) and how PC uses it.  
> 
> Although this approach ensures that memory_device_get_free_addr() returns an
> address aligned to the given legacy_align, it also requires the size of the
> device to be aligned to that legacy_align. If I do not size down the region,
> size-label_size will not be aligned to it. That is another reason why sizing
> down the region size is still needed.
The thing with size alignment is that it's a guest-specific requirement
that varies depending on which OS is running inside. So if the spec doesn't
specify an alignment, I'd look at the backend page size as the alignment.
It's up to the mgmt layer to configure the size properly, as the upper stack
should be aware of which guest it runs.

In the case of SPAPR, is size alignment an architectural requirement, or is it
just a specific guest implementation detail?
If it's the former, I'd replace sizing down with a check and refuse an
improperly sized nvdimm; in the case of the latter, I'd let mgmt pick the size
properly depending on the guest OS.
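
For illustration only, "a check instead of sizing down" could look roughly
like the sketch below. It is standalone C, not QEMU code: the helper names
and the SCM_BLOCK_SIZE macro are hypothetical, and the 256MB value is
assumed from the block size quoted earlier in this thread. The data size is
the size with the label area already excluded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value: 256 MiB, the SCM block size quoted in this thread. */
#define SCM_BLOCK_SIZE (256ULL * 1024 * 1024)

/*
 * Hypothetical helper, not a QEMU function: accept an NVDIMM only if its
 * guest-visible data size (label area already excluded) is a non-zero
 * multiple of block_size. Nothing is resized; bad sizes are refused.
 */
static bool scm_size_is_valid(uint64_t data_size, uint64_t block_size)
{
    assert(block_size != 0);
    return data_size != 0 && data_size % block_size == 0;
}

/* Number of SCM blocks advertised for a size that passed the check. */
static uint64_t scm_num_blocks(uint64_t data_size, uint64_t block_size)
{
    assert(scm_size_is_valid(data_size, block_size));
    return data_size / block_size;
}
```

With the numbers from this thread, a data size of 1GB-3MB would be refused
rather than silently trimmed, while a 768MB data size would pass and map to
3 blocks.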

end of thread, other threads:[~2019-03-05  9:13 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-06  5:24 [Qemu-devel] [RFC PATCH 0/4] ppc: spapr: virtual NVDIMM support Shivaprasad G Bhat
2019-02-06  5:25 ` [Qemu-devel] [RFC PATCH 1/4] mem: make nvdimm_device_list global Shivaprasad G Bhat
2019-02-19  7:59   ` Igor Mammedov
2019-02-06  5:26 ` [Qemu-devel] [RFC PATCH 2/4] mem: implement memory_device_set_region_size Shivaprasad G Bhat
2019-02-06  5:26 ` [Qemu-devel] [RFC PATCH 3/4] spapr: Add NVDIMM device support Shivaprasad G Bhat
2019-02-12  1:49   ` David Gibson
2019-02-15 11:11     ` Shivaprasad G Bhat
2019-02-17 23:02       ` David Gibson
2019-02-18 16:15         ` Shivaprasad G Bhat
2019-02-27  4:27           ` David Gibson
2019-02-19  8:11   ` Igor Mammedov
2019-02-19  9:29     ` Shivaprasad G Bhat
2019-02-21 14:12       ` Igor Mammedov
2019-02-28  8:54         ` Shivaprasad G Bhat
2019-03-05  9:13           ` Igor Mammedov
2019-02-06  5:26 ` [Qemu-devel] [RFC PATCH 4/4] spapr: Add Hcalls to support PAPR NVDIMM device Shivaprasad G Bhat
2019-02-12  2:28   ` David Gibson
2019-02-15 11:11     ` Shivaprasad G Bhat
2019-02-19  5:33       ` David Gibson
