All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [RFC v2 PATCH 0/3] ppc: spapr: virtual NVDIMM support
@ 2019-05-13  9:25 Shivaprasad G Bhat
  2019-05-13  9:27 ` [Qemu-devel] [RFC v2 PATCH 1/3] mem: make nvdimm_device_list global Shivaprasad G Bhat
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Shivaprasad G Bhat @ 2019-05-13  9:25 UTC (permalink / raw)
  To: imammedo, david, xiaoguangrong.eric, mst; +Cc: qemu-ppc, qemu-devel, sbhat

The patchset attempts to implement the virtual NVDIMM for pseries.

PAPR semantics is such that each NVDIMM device is comprising of multiple
SCM(Storage Class Memory) blocks. The hypervisor is expected to prepare the
FDT for the NVDIMM device and send guest a hotplug interrupt with new type 
RTAS_LOG_V6_HP_TYPE_PMEM currently handled by the upstream kernel. In response
to that interrupt, the guest requests the hypervisor to bind each of the SCM
blocks of the NVDIMM device using hcalls. There can be SCM block unbind
requests in case of driver errors or unplug(not supported now) use cases. The
NVDIMM label read/writes are done through hcalls.

There are also new futuristic hcalls added(currently unused in the kernel), for
querying the informations such as binding, logical addresses of the SCM blocks.
The current patchset leaves them unimplemented.

Since each virtual NVDIMM device is divided into multiple SCM blocks, the bind,
unbind, and queries using hcalls on those blocks can come independently. This
doesnt fit well into the qemu device semantics, where the map/unmap are done at
the (whole)device/object level granularity. The patchset uses the existing
NVDIMM class structures for the implementation. The bind/unbind is left to
happen at the object_add/del phase itself instead of at hcalls on-demand.

The guest kernel makes bind/unbind requests for the virtual NVDIMM device at the
region level granularity. Without interleaving, each virtual NVDIMM device is
presented as separate region. There is no way to configure the virtual NVDIMM
interleaving for the guests today. So, there is no way a partial bind/unbind
request can come for the vNVDIMM in a hcall for a subset of SCM blocks of a
virtual NVDIMM. Hence it is safe to do bind/unbind everything during the
object_add/del.

The free device-memory region which is used for memory hotplug are done using
multiple LMBs of size(256MiB) and are expected to be aligned to 256 MiB. As the
SCM blocks are mapped to the same region, the SCM blocks also need to be
aligned to this size for the subsequent memory hotplug to work. The minimum SCM
block size is set to this size for that reason and can be made user configurable
in future if required.

The first patch moves around the existing static function to common area
for using it in the subsequent patches. Second patch adds the FDT entries and
basic device support, the third patch adds the hcalls implementation.

The patches are also available at https://github.com/ShivaprasadGBhat/qemu.git -
pseries-nvdimm branch and can be used with the upstream kernel. ndctl can be
used for configuring the nvdimms inside the guest.

This is how it can be used ..
Add nvdimm=on to the qemu machine argument,
Ex : -machine pseries,nvdimm=on
For coldplug, the device to be added in qemu command line as shown below
-object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0,share=yes,size=1073872896
-device nvdimm,label-size=128k,uuid=75a3cdd7-6a2f-4791-8d15-fe0a920e8e9e,memdev=memnvdimm0,id=nvdimm0,slot=0

For hotplug, the device to be added from monitor as below
object_add memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0,share=yes,size=1073872896
device_add nvdimm,label-size=128k,uuid=75a3cdd7-6a2f-4791-8d15-fe0a920e8e9e,memdev=memnvdimm0,id=nvdimm0,slot=0

---
v1 : http://lists.nongnu.org/archive/html/qemu-devel/2019-02/msg01545.html
Changes from v1:
     - Rebased to upstream, this required required dt_populate implementation
       for nvdimm hotplug support
     - Added uuid option to nvdimm device
     - Removed the memory region sizing down code as suggested by Igor,
       now erroring out if NVDIMM size excluding the label area is not
       aligned to 256MB, so patch 2 from previous series no longer needed.
     - Removed un-implemented hcalls
     - Changed the hcalls to different kinds of checks and return
       different values.
     - Addressed comments for v1

Shivaprasad G Bhat (3):
      mem: make nvdimm_device_list global
      spapr: Add NVDIMM device support
      spapr: Add Hcalls to support PAPR NVDIMM device


 default-configs/ppc64-softmmu.mak |    1 
 hw/acpi/nvdimm.c                  |   27 -----
 hw/mem/Kconfig                    |    2 
 hw/mem/nvdimm.c                   |   70 +++++++++++++
 hw/ppc/spapr.c                    |  202 +++++++++++++++++++++++++++++++++++--
 hw/ppc/spapr_drc.c                |   18 +++
 hw/ppc/spapr_events.c             |    4 +
 hw/ppc/spapr_hcall.c              |  202 +++++++++++++++++++++++++++++++++++++
 include/hw/mem/nvdimm.h           |    8 +
 include/hw/ppc/spapr.h            |   19 +++
 include/hw/ppc/spapr_drc.h        |    9 ++
 11 files changed, 523 insertions(+), 39 deletions(-)

--
Signature



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Qemu-devel] [RFC v2 PATCH 1/3] mem: make nvdimm_device_list global
  2019-05-13  9:25 [Qemu-devel] [RFC v2 PATCH 0/3] ppc: spapr: virtual NVDIMM support Shivaprasad G Bhat
@ 2019-05-13  9:27 ` Shivaprasad G Bhat
  2019-05-13  9:28 ` [Qemu-devel] [RFC v2 PATCH 2/3] spapr: Add NVDIMM device support Shivaprasad G Bhat
  2019-05-13  9:28 ` [Qemu-devel] [RFC v2 PATCH 3/3] spapr: Add Hcalls to support PAPR NVDIMM device Shivaprasad G Bhat
  2 siblings, 0 replies; 16+ messages in thread
From: Shivaprasad G Bhat @ 2019-05-13  9:27 UTC (permalink / raw)
  To: imammedo, david, xiaoguangrong.eric, mst; +Cc: qemu-ppc, qemu-devel, sbhat

nvdimm_device_list is required for parsing the list for devices
in subsequent patches. Move it to common area.

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Reviewed-By: Igor Mammedov <imammedo@redhat.com>
---
This looks to break the mips*-softmmu build.
The mips depend on CONFIG_NVDIMM_ACPI, adding CONFIG_NVDIMM looks wrong.
Is there some CONFIG tweak I need to do here? OR

Should I move these functions to utilities like I have
done here -(https://github.com/ShivaprasadGBhat/qemu/commit/1b8eaea132a8b19c90b4fcc4d93da356029f4667)?
---
 hw/acpi/nvdimm.c        |   27 ---------------------------
 hw/mem/nvdimm.c         |   27 +++++++++++++++++++++++++++
 include/hw/mem/nvdimm.h |    2 ++
 3 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 9fdad6dc3f..94baba1b8f 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -33,33 +33,6 @@
 #include "hw/nvram/fw_cfg.h"
 #include "hw/mem/nvdimm.h"
 
-static int nvdimm_device_list(Object *obj, void *opaque)
-{
-    GSList **list = opaque;
-
-    if (object_dynamic_cast(obj, TYPE_NVDIMM)) {
-        *list = g_slist_append(*list, DEVICE(obj));
-    }
-
-    object_child_foreach(obj, nvdimm_device_list, opaque);
-    return 0;
-}
-
-/*
- * inquire NVDIMM devices and link them into the list which is
- * returned to the caller.
- *
- * Note: it is the caller's responsibility to free the list to avoid
- * memory leak.
- */
-static GSList *nvdimm_get_device_list(void)
-{
-    GSList *list = NULL;
-
-    object_child_foreach(qdev_get_machine(), nvdimm_device_list, &list);
-    return list;
-}
-
 #define NVDIMM_UUID_LE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)             \
    { (a) & 0xff, ((a) >> 8) & 0xff, ((a) >> 16) & 0xff, ((a) >> 24) & 0xff, \
      (b) & 0xff, ((b) >> 8) & 0xff, (c) & 0xff, ((c) >> 8) & 0xff,          \
diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index bf2adf5e16..f221ec7a9a 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -29,6 +29,33 @@
 #include "hw/mem/nvdimm.h"
 #include "hw/mem/memory-device.h"
 
+static int nvdimm_device_list(Object *obj, void *opaque)
+{
+    GSList **list = opaque;
+
+    if (object_dynamic_cast(obj, TYPE_NVDIMM)) {
+        *list = g_slist_append(*list, DEVICE(obj));
+    }
+
+    object_child_foreach(obj, nvdimm_device_list, opaque);
+    return 0;
+}
+
+/*
+ * inquire NVDIMM devices and link them into the list which is
+ * returned to the caller.
+ *
+ * Note: it is the caller's responsibility to free the list to avoid
+ * memory leak.
+ */
+GSList *nvdimm_get_device_list(void)
+{
+    GSList *list = NULL;
+
+    object_child_foreach(qdev_get_machine(), nvdimm_device_list, &list);
+    return list;
+}
+
 static void nvdimm_get_label_size(Object *obj, Visitor *v, const char *name,
                                   void *opaque, Error **errp)
 {
diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
index 523a9b3d4a..bad4fc04b5 100644
--- a/include/hw/mem/nvdimm.h
+++ b/include/hw/mem/nvdimm.h
@@ -150,4 +150,6 @@ void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
                        uint32_t ram_slots);
 void nvdimm_plug(NVDIMMState *state);
 void nvdimm_acpi_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev);
+GSList *nvdimm_get_device_list(void);
+
 #endif



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Qemu-devel] [RFC v2 PATCH 2/3] spapr: Add NVDIMM device support
  2019-05-13  9:25 [Qemu-devel] [RFC v2 PATCH 0/3] ppc: spapr: virtual NVDIMM support Shivaprasad G Bhat
  2019-05-13  9:27 ` [Qemu-devel] [RFC v2 PATCH 1/3] mem: make nvdimm_device_list global Shivaprasad G Bhat
@ 2019-05-13  9:28 ` Shivaprasad G Bhat
  2019-05-14  2:22   ` David Gibson
  2019-05-22 16:07   ` Fabiano Rosas
  2019-05-13  9:28 ` [Qemu-devel] [RFC v2 PATCH 3/3] spapr: Add Hcalls to support PAPR NVDIMM device Shivaprasad G Bhat
  2 siblings, 2 replies; 16+ messages in thread
From: Shivaprasad G Bhat @ 2019-05-13  9:28 UTC (permalink / raw)
  To: imammedo, david, xiaoguangrong.eric, mst; +Cc: qemu-ppc, qemu-devel, sbhat

Add support for NVDIMM devices for sPAPR. Piggyback on existing nvdimm
device interface in QEMU to support virtual NVDIMM devices for Power (May have
to re-look at this later).  Create the required DT entries for the
device (some entries have dummy values right now).

The patch creates the required DT node and sends a hotplug
interrupt to the guest. Guest is expected to undertake the normal
DR resource add path in response and start issuing PAPR SCM hcalls.

This is how it can be used ..
Add nvdimm=on to the qemu machine argument.
Ex : -machine pseries,nvdimm=on
For coldplug, the device to be added in qemu command line as shown below
-object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0,share=yes,size=1073872896
-device nvdimm,label-size=128k,uuid=75a3cdd7-6a2f-4791-8d15-fe0a920e8e9e,memdev=memnvdimm0,id=nvdimm0,slot=0

For hotplug, the device to be added from monitor as below
object_add memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0,share=yes,size=1073872896
device_add nvdimm,label-size=128k,uuid=75a3cdd7-6a2f-4791-8d15-fe0a920e8e9e,memdev=memnvdimm0,id=nvdimm0,slot=0

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
               [Early implementation]
---
---
 default-configs/ppc64-softmmu.mak |    1 
 hw/mem/Kconfig                    |    2 
 hw/mem/nvdimm.c                   |   43 ++++++++
 hw/ppc/spapr.c                    |  202 +++++++++++++++++++++++++++++++++++--
 hw/ppc/spapr_drc.c                |   18 +++
 hw/ppc/spapr_events.c             |    4 +
 include/hw/mem/nvdimm.h           |    6 +
 include/hw/ppc/spapr.h            |   12 ++
 include/hw/ppc/spapr_drc.h        |    9 ++
 9 files changed, 286 insertions(+), 11 deletions(-)

diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
index cca52665d9..ae0841fa3a 100644
--- a/default-configs/ppc64-softmmu.mak
+++ b/default-configs/ppc64-softmmu.mak
@@ -8,3 +8,4 @@ CONFIG_POWERNV=y
 
 # For pSeries
 CONFIG_PSERIES=y
+CONFIG_NVDIMM=y
diff --git a/hw/mem/Kconfig b/hw/mem/Kconfig
index 620fd4cb59..2ad052a536 100644
--- a/hw/mem/Kconfig
+++ b/hw/mem/Kconfig
@@ -8,4 +8,4 @@ config MEM_DEVICE
 config NVDIMM
     bool
     default y
-    depends on PC
+    depends on (PC || PSERIES)
diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index f221ec7a9a..deaebaaaa5 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -93,11 +93,54 @@ out:
     error_propagate(errp, local_err);
 }
 
+static void nvdimm_get_uuid(Object *obj, Visitor *v, const char *name,
+                                  void *opaque, Error **errp)
+{
+    NVDIMMDevice *nvdimm = NVDIMM(obj);
+    char *value = NULL;
+
+    value = qemu_uuid_unparse_strdup(&nvdimm->uuid);
+
+    visit_type_str(v, name, &value, errp);
+}
+
+
+static void nvdimm_set_uuid(Object *obj, Visitor *v, const char *name,
+                                  void *opaque, Error **errp)
+{
+    NVDIMMDevice *nvdimm = NVDIMM(obj);
+    Error *local_err = NULL;
+    char *value;
+
+    visit_type_str(v, name, &value, &local_err);
+    if (local_err) {
+        goto out;
+    }
+
+    if (strcmp(value, "") == 0) {
+        error_setg(&local_err, "Property '%s.%s' %s is required"
+                   " at least 0x%lx", object_get_typename(obj),
+                   name, value, MIN_NAMESPACE_LABEL_SIZE);
+        goto out;
+    }
+
+    if (qemu_uuid_parse(value, &nvdimm->uuid) != 0) {
+        error_setg(errp, "Invalid UUID");
+        return;
+    }
+out:
+    error_propagate(errp, local_err);
+}
+
+
 static void nvdimm_init(Object *obj)
 {
     object_property_add(obj, NVDIMM_LABEL_SIZE_PROP, "int",
                         nvdimm_get_label_size, nvdimm_set_label_size, NULL,
                         NULL, NULL);
+
+    object_property_add(obj, NVDIMM_UUID_PROP, "QemuUUID", nvdimm_get_uuid,
+                        nvdimm_set_uuid, NULL, NULL, NULL);
 }
 
 static void nvdimm_finalize(Object *obj)
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 2ef3ce4362..b6951577e7 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -74,6 +74,7 @@
 #include "qemu/cutils.h"
 #include "hw/ppc/spapr_cpu_core.h"
 #include "hw/mem/memory-device.h"
+#include "hw/mem/nvdimm.h"
 
 #include <libfdt.h>
 
@@ -699,6 +700,7 @@ static int spapr_populate_drmem_v2(SpaprMachineState *spapr, void *fdt,
     uint8_t *int_buf, *cur_index;
     int ret;
     uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
+    uint64_t scm_block_size = SPAPR_MINIMUM_SCM_BLOCK_SIZE;
     uint64_t addr, cur_addr, size;
     uint32_t nr_boot_lmbs = (machine->device_memory->base / lmb_size);
     uint64_t mem_end = machine->device_memory->base +
@@ -735,12 +737,20 @@ static int spapr_populate_drmem_v2(SpaprMachineState *spapr, void *fdt,
             nr_entries++;
         }
 
-        /* Entry for DIMM */
-        drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
-        g_assert(drc);
-        elem = spapr_get_drconf_cell(size / lmb_size, addr,
-                                     spapr_drc_index(drc), node,
-                                     SPAPR_LMB_FLAGS_ASSIGNED);
+        if (info->value->type == MEMORY_DEVICE_INFO_KIND_NVDIMM) {
+            /* Entry for NVDIMM */
+            drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM, addr / scm_block_size);
+            g_assert(drc);
+            elem = spapr_get_drconf_cell(size / scm_block_size, addr,
+                                         spapr_drc_index(drc), -1, 0);
+        } else {
+            /* Entry for DIMM */
+            drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
+            g_assert(drc);
+            elem = spapr_get_drconf_cell(size / lmb_size, addr,
+                                         spapr_drc_index(drc), node,
+                                         SPAPR_LMB_FLAGS_ASSIGNED);
+        }
         QSIMPLEQ_INSERT_TAIL(&drconf_queue, elem, entry);
         nr_entries++;
         cur_addr = addr + size;
@@ -1235,6 +1245,87 @@ static void spapr_dt_hypervisor(SpaprMachineState *spapr, void *fdt)
     }
 }
 
+static int spapr_populate_nvdimm_node(void *fdt, int parent_offset,
+                                      NVDIMMDevice *nvdimm)
+{
+    int child_offset;
+    char buf[40];
+    SpaprDrc *drc;
+    uint32_t drc_idx;
+    uint32_t node = object_property_get_uint(OBJECT(nvdimm), PC_DIMM_NODE_PROP,
+                                             &error_abort);
+    uint64_t addr = object_property_get_uint(OBJECT(nvdimm), PC_DIMM_ADDR_PROP,
+                                             &error_abort);
+    uint32_t associativity[] = {
+        cpu_to_be32(0x4), /* length */
+        cpu_to_be32(0x0), cpu_to_be32(0x0),
+        cpu_to_be32(0x0), cpu_to_be32(node)
+    };
+    uint64_t lsize = nvdimm->label_size;
+    uint64_t size = object_property_get_int(OBJECT(nvdimm), PC_DIMM_SIZE_PROP,
+                                            NULL);
+
+    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM,
+                          addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
+    g_assert(drc);
+
+    drc_idx = spapr_drc_index(drc);
+
+    sprintf(buf, "pmem@%x", drc_idx);
+    child_offset = fdt_add_subnode(fdt, parent_offset, buf);
+    _FDT(child_offset);
+
+    _FDT((fdt_setprop_cell(fdt, child_offset, "reg", drc_idx)));
+    _FDT((fdt_setprop_string(fdt, child_offset, "compatible", "ibm,pmemory")));
+    _FDT((fdt_setprop_string(fdt, child_offset, "device_type", "ibm,pmemory")));
+
+    _FDT((fdt_setprop(fdt, child_offset, "ibm,associativity", associativity,
+                      sizeof(associativity))));
+
+    qemu_uuid_unparse(&nvdimm->uuid, buf);
+    _FDT((fdt_setprop_string(fdt, child_offset, "ibm,unit-guid", buf)));
+
+    _FDT((fdt_setprop_cell(fdt, child_offset, "ibm,my-drc-index", drc_idx)));
+
+    /*NB : What it should be? */
+    _FDT(fdt_setprop_cell(fdt, child_offset, "ibm,latency-attribute", 828));
+
+    _FDT((fdt_setprop_u64(fdt, child_offset, "ibm,block-size",
+                          SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
+    _FDT((fdt_setprop_u64(fdt, child_offset, "ibm,number-of-blocks",
+                          size / SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
+    _FDT((fdt_setprop_cell(fdt, child_offset, "ibm,metadata-size", lsize)));
+
+    return child_offset;
+}
+
+static void spapr_dt_nvdimm(void *fdt)
+{
+    int offset = fdt_subnode_offset(fdt, 0, "persistent-memory");
+    GSList *dimms = NULL;
+
+    if (offset < 0) {
+        offset = fdt_add_subnode(fdt, 0, "persistent-memory");
+        _FDT(offset);
+        _FDT((fdt_setprop_cell(fdt, offset, "#address-cells", 0x2)));
+        _FDT((fdt_setprop_cell(fdt, offset, "#size-cells", 0x0)));
+        _FDT((fdt_setprop_string(fdt, offset, "device_type",
+                                 "ibm,persistent-memory")));
+    }
+
+    /*NB : Add drc-info array here */
+
+    /* Create DT entries for cold plugged NVDIMM devices */
+    dimms = nvdimm_get_device_list();
+    for (; dimms; dimms = dimms->next) {
+        NVDIMMDevice *nvdimm = dimms->data;
+
+        spapr_populate_nvdimm_node(fdt, offset, nvdimm);
+    }
+    g_slist_free(dimms);
+    return;
+}
+
 static void *spapr_build_fdt(SpaprMachineState *spapr)
 {
     MachineState *machine = MACHINE(spapr);
@@ -1368,6 +1459,11 @@ static void *spapr_build_fdt(SpaprMachineState *spapr)
         }
     }
 
+    /* NVDIMM devices */
+    if (spapr->nvdimm_enabled) {
+        spapr_dt_nvdimm(fdt);
+    }
+
     return fdt;
 }
 
@@ -3324,6 +3420,20 @@ static void spapr_set_host_serial(Object *obj, const char *value, Error **errp)
     spapr->host_serial = g_strdup(value);
 }
 
+static bool spapr_get_nvdimm(Object *obj, Error **errp)
+{
+    SpaprMachineState *spapr = SPAPR_MACHINE(obj);
+
+    return spapr->nvdimm_enabled;
+}
+
+static void spapr_set_nvdimm(Object *obj, bool value, Error **errp)
+{
+    SpaprMachineState *spapr = SPAPR_MACHINE(obj);
+
+    spapr->nvdimm_enabled = value;
+}
+
 static void spapr_instance_init(Object *obj)
 {
     SpaprMachineState *spapr = SPAPR_MACHINE(obj);
@@ -3380,6 +3490,12 @@ static void spapr_instance_init(Object *obj)
         &error_abort);
     object_property_set_description(obj, "host-serial",
         "Host serial number to advertise in guest device tree", &error_abort);
+
+    object_property_add_bool(obj, "nvdimm",
+                            spapr_get_nvdimm, spapr_set_nvdimm, NULL);
+    object_property_set_description(obj, "nvdimm",
+                                    "Enable support for nvdimm devices",
+                                    NULL);
 }
 
 static void spapr_machine_finalizefn(Object *obj)
@@ -3404,6 +3520,16 @@ static void spapr_nmi(NMIState *n, int cpu_index, Error **errp)
     }
 }
 
+int spapr_pmem_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
+                           void *fdt, int *fdt_start_offset, Error **errp)
+{
+    NVDIMMDevice *nvdimm = NVDIMM(drc->dev);
+
+    *fdt_start_offset = spapr_populate_nvdimm_node(fdt, 0, nvdimm);
+
+    return 0;
+}
+
 int spapr_lmb_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
                           void *fdt, int *fdt_start_offset, Error **errp)
 {
@@ -3466,12 +3592,37 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t size,
     }
 }
 
+static void spapr_add_nvdimm(DeviceState *dev, uint64_t addr, Error **errp)
+{
+    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_hotplug_handler(dev));
+    SpaprDrc *drc;
+    bool hotplugged = spapr_drc_hotplugged(dev);
+    Error *local_err = NULL;
+
+    spapr_dr_connector_new(OBJECT(spapr), TYPE_SPAPR_DRC_PMEM,
+                           addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
+    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM,
+                          addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
+    g_assert(drc);
+
+    spapr_drc_attach(drc, dev, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    if (hotplugged) {
+        spapr_hotplug_req_add_by_index(drc);
+    }
+}
+
 static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
                               Error **errp)
 {
     Error *local_err = NULL;
     SpaprMachineState *ms = SPAPR_MACHINE(hotplug_dev);
     PCDIMMDevice *dimm = PC_DIMM(dev);
+    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
     uint64_t size, addr;
 
     size = memory_device_get_region_size(MEMORY_DEVICE(dev), &error_abort);
@@ -3487,8 +3638,14 @@ static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
         goto out_unplug;
     }
 
-    spapr_add_lmbs(dev, addr, size, spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
-                   &local_err);
+    if (!is_nvdimm) {
+        spapr_add_lmbs(dev, addr, size,
+                       spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
+                       &local_err);
+    } else {
+        spapr_add_nvdimm(dev, addr, &local_err);
+    }
+
     if (local_err) {
         goto out_unplug;
     }
@@ -3506,6 +3663,7 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
 {
     const SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(hotplug_dev);
     SpaprMachineState *spapr = SPAPR_MACHINE(hotplug_dev);
+    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
     PCDIMMDevice *dimm = PC_DIMM(dev);
     Error *local_err = NULL;
     uint64_t size;
@@ -3523,10 +3681,28 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
         return;
     }
 
-    if (size % SPAPR_MEMORY_BLOCK_SIZE) {
+    if (!is_nvdimm && size % SPAPR_MEMORY_BLOCK_SIZE) {
         error_setg(errp, "Hotplugged memory size must be a multiple of "
-                      "%" PRIu64 " MB", SPAPR_MEMORY_BLOCK_SIZE / MiB);
+                          "%" PRIu64 " MB", SPAPR_MEMORY_BLOCK_SIZE / MiB);
         return;
+    } else if (is_nvdimm) {
+        char *uuidstr = NULL;
+        QemuUUID uuid;
+        if (size % SPAPR_MINIMUM_SCM_BLOCK_SIZE) {
+            error_setg(errp, "NVDIMM memory size excluding the label area"
+                       " must be a multiple of "
+                       "%" PRIu64 "MB",
+                       SPAPR_MINIMUM_SCM_BLOCK_SIZE / MiB);
+            return;
+        }
+
+        uuidstr = object_property_get_str(OBJECT(dimm), NVDIMM_UUID_PROP, NULL);
+        qemu_uuid_parse(uuidstr, &uuid);
+        if (qemu_uuid_is_null(&uuid)) {
+            error_setg(errp, "NVDIMM device requires the uuid to be set");
+            return;
+        }
+        g_free(uuidstr);
     }
 
     memdev = object_property_get_link(OBJECT(dimm), PC_DIMM_MEMDEV_PROP,
@@ -3666,6 +3842,12 @@ static void spapr_memory_unplug_request(HotplugHandler *hotplug_dev,
     int i;
     SpaprDrc *drc;
 
+    if (object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM)) {
+        error_setg(&local_err,
+                   "nvdimm device hot unplug is not supported yet.");
+        goto out;
+    }
+
     size = memory_device_get_region_size(MEMORY_DEVICE(dimm), &error_abort);
     nr_lmbs = size / SPAPR_MEMORY_BLOCK_SIZE;
 
diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 597f236b9c..983440a711 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -707,6 +707,17 @@ static void spapr_drc_phb_class_init(ObjectClass *k, void *data)
     drck->dt_populate = spapr_phb_dt_populate;
 }
 
+static void spapr_drc_pmem_class_init(ObjectClass *k, void *data)
+{
+    SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_CLASS(k);
+
+    drck->typeshift = SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM;
+    drck->typename = "MEM";
+    drck->drc_name_prefix = "PMEM ";
+    drck->release = NULL;
+    drck->dt_populate = spapr_pmem_dt_populate;
+}
+
 static const TypeInfo spapr_dr_connector_info = {
     .name          = TYPE_SPAPR_DR_CONNECTOR,
     .parent        = TYPE_DEVICE,
@@ -757,6 +768,12 @@ static const TypeInfo spapr_drc_phb_info = {
     .class_init    = spapr_drc_phb_class_init,
 };
 
+static const TypeInfo spapr_drc_pmem_info = {
+    .name          = TYPE_SPAPR_DRC_PMEM,
+    .parent        = TYPE_SPAPR_DRC_LOGICAL,
+    .class_init    = spapr_drc_pmem_class_init,
+};
+
 /* helper functions for external users */
 
 SpaprDrc *spapr_drc_by_index(uint32_t index)
@@ -1226,6 +1243,7 @@ static void spapr_drc_register_types(void)
     type_register_static(&spapr_drc_pci_info);
     type_register_static(&spapr_drc_lmb_info);
     type_register_static(&spapr_drc_phb_info);
+    type_register_static(&spapr_drc_pmem_info);
 
     spapr_rtas_register(RTAS_SET_INDICATOR, "set-indicator",
                         rtas_set_indicator);
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index ae0f093f59..1141203a87 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -193,6 +193,7 @@ struct rtas_event_log_v6_hp {
 #define RTAS_LOG_V6_HP_TYPE_SLOT                         3
 #define RTAS_LOG_V6_HP_TYPE_PHB                          4
 #define RTAS_LOG_V6_HP_TYPE_PCI                          5
+#define RTAS_LOG_V6_HP_TYPE_PMEM                         6
     uint8_t hotplug_action;
 #define RTAS_LOG_V6_HP_ACTION_ADD                        1
 #define RTAS_LOG_V6_HP_ACTION_REMOVE                     2
@@ -529,6 +530,9 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
     case SPAPR_DR_CONNECTOR_TYPE_PHB:
         hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PHB;
         break;
+    case SPAPR_DR_CONNECTOR_TYPE_PMEM:
+        hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PMEM;
+        break;
     default:
         /* we shouldn't be signaling hotplug events for resources
          * that don't support them
diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
index bad4fc04b5..3089615e17 100644
--- a/include/hw/mem/nvdimm.h
+++ b/include/hw/mem/nvdimm.h
@@ -49,6 +49,7 @@
                                                TYPE_NVDIMM)
 
 #define NVDIMM_LABEL_SIZE_PROP "label-size"
+#define NVDIMM_UUID_PROP "uuid"
 #define NVDIMM_UNARMED_PROP    "unarmed"
 
 struct NVDIMMDevice {
@@ -83,6 +84,11 @@ struct NVDIMMDevice {
      * the guest write persistence.
      */
     bool unarmed;
+
+    /*
+     * The PPC64 - spapr requires each nvdimm device have a uuid.
+     */
+    QemuUUID uuid;
 };
 typedef struct NVDIMMDevice NVDIMMDevice;
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 7e32f309c2..394ea26335 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -202,6 +202,7 @@ struct SpaprMachineState {
     SpaprCapabilities def, eff, mig;
 
     unsigned gpu_numa_id;
+    bool nvdimm_enabled;
 };
 
 #define H_SUCCESS         0
@@ -794,6 +795,8 @@ int spapr_core_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
 void spapr_lmb_release(DeviceState *dev);
 int spapr_lmb_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
                           void *fdt, int *fdt_start_offset, Error **errp);
+int spapr_pmem_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
+                          void *fdt, int *fdt_start_offset, Error **errp);
 void spapr_phb_release(DeviceState *dev);
 int spapr_phb_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
                           void *fdt, int *fdt_start_offset, Error **errp);
@@ -829,6 +832,15 @@ int spapr_rtc_import_offset(SpaprRtcState *rtc, int64_t legacy_offset);
 #define SPAPR_LMB_FLAGS_DRC_INVALID 0x00000020
 #define SPAPR_LMB_FLAGS_RESERVED 0x00000080
 
+/*
+ * The nvdimm size should be aligned to SCM block size.
+ * The SCM block size should be aligned to SPAPR_MEMORY_BLOCK_SIZE
+ * inorder to have SCM regions not to overlap with dimm memory regions.
+ * The SCM devices can have variable block sizes. For now, fixing the
+ * block size to the minimum value.
+ */
+#define SPAPR_MINIMUM_SCM_BLOCK_SIZE SPAPR_MEMORY_BLOCK_SIZE
+
 void spapr_do_system_reset_on_cpu(CPUState *cs, run_on_cpu_data arg);
 
 #define HTAB_SIZE(spapr)        (1ULL << ((spapr)->htab_shift))
diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
index fad0a887f9..8b7ce41a0f 100644
--- a/include/hw/ppc/spapr_drc.h
+++ b/include/hw/ppc/spapr_drc.h
@@ -79,6 +79,13 @@
 #define SPAPR_DRC_PHB(obj) OBJECT_CHECK(SpaprDrc, (obj), \
                                         TYPE_SPAPR_DRC_PHB)
 
+#define TYPE_SPAPR_DRC_PMEM "spapr-drc-pmem"
+#define SPAPR_DRC_PMEM_GET_CLASS(obj) \
+        OBJECT_GET_CLASS(SpaprDrcClass, obj, TYPE_SPAPR_DRC_PMEM)
+#define SPAPR_DRC_PMEM_CLASS(klass) \
+        OBJECT_CLASS_CHECK(SpaprDrcClass, klass, TYPE_SPAPR_DRC_PMEM)
+#define SPAPR_DRC_PMEM(obj) OBJECT_CHECK(SpaprDrc, (obj), \
+                                        TYPE_SPAPR_DRC_PMEM)
 /*
  * Various hotplug types managed by SpaprDrc
  *
@@ -96,6 +103,7 @@ typedef enum {
     SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO = 3,
     SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI = 4,
     SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB = 8,
+    SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM = 9,
 } SpaprDrcTypeShift;
 
 typedef enum {
@@ -105,6 +113,7 @@ typedef enum {
     SPAPR_DR_CONNECTOR_TYPE_VIO = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO,
     SPAPR_DR_CONNECTOR_TYPE_PCI = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI,
     SPAPR_DR_CONNECTOR_TYPE_LMB = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB,
+    SPAPR_DR_CONNECTOR_TYPE_PMEM = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM,
 } SpaprDrcType;
 
 /*



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Qemu-devel] [RFC v2 PATCH 3/3] spapr: Add Hcalls to support PAPR NVDIMM device
  2019-05-13  9:25 [Qemu-devel] [RFC v2 PATCH 0/3] ppc: spapr: virtual NVDIMM support Shivaprasad G Bhat
  2019-05-13  9:27 ` [Qemu-devel] [RFC v2 PATCH 1/3] mem: make nvdimm_device_list global Shivaprasad G Bhat
  2019-05-13  9:28 ` [Qemu-devel] [RFC v2 PATCH 2/3] spapr: Add NVDIMM device support Shivaprasad G Bhat
@ 2019-05-13  9:28 ` Shivaprasad G Bhat
  2019-05-14  4:38   ` David Gibson
  2019-05-22 18:08   ` Fabiano Rosas
  2 siblings, 2 replies; 16+ messages in thread
From: Shivaprasad G Bhat @ 2019-05-13  9:28 UTC (permalink / raw)
  To: imammedo, david, xiaoguangrong.eric, mst; +Cc: qemu-ppc, qemu-devel, sbhat

This patch implements few of the necessary hcalls for the nvdimm support.

PAPR semantics is such that each NVDIMM device is comprising of multiple
SCM(Storage Class Memory) blocks. The guest requests the hypervisor to bind
each of the SCM blocks of the NVDIMM device using hcalls. There can be
SCM block unbind requests in case of driver errors or unplug(not supported now)
use cases. The NVDIMM label read/writes are done through hcalls.

Since each virtual NVDIMM device is divided into multiple SCM blocks, the bind,
unbind, and queries using hcalls on those blocks can come independently. This
doesn't fit well into the qemu device semantics, where the map/unmap are done
at the (whole)device/object level granularity. The patch doesnt actually
bind/unbind on hcalls but let it happen at the object_add/del phase itself
instead.

The guest kernel makes bind/unbind requests for the virtual NVDIMM device at the
region level granularity. Without interleaving, each virtual NVDIMM device is
presented as separate region. There is no way to configure the virtual NVDIMM
interleaving for the guests today. So, there is no way a partial bind/unbind
request can come for the vNVDIMM in a hcall for a subset of SCM blocks of a
virtual NVDIMM. Hence it is safe to do bind/unbind everything during the
object_add/del.

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
---
 hw/ppc/spapr_hcall.c   |  202 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h |    7 +-
 2 files changed, 208 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 6c16d2b120..b6e7d04dcf 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -3,11 +3,13 @@
 #include "sysemu/hw_accel.h"
 #include "sysemu/sysemu.h"
 #include "qemu/log.h"
+#include "qemu/range.h"
 #include "qemu/error-report.h"
 #include "cpu.h"
 #include "exec/exec-all.h"
 #include "helper_regs.h"
 #include "hw/ppc/spapr.h"
+#include "hw/ppc/spapr_drc.h"
 #include "hw/ppc/spapr_cpu_core.h"
 #include "mmu-hash64.h"
 #include "cpu-models.h"
@@ -16,6 +18,7 @@
 #include "hw/ppc/spapr_ovec.h"
 #include "mmu-book3s-v3.h"
 #include "hw/mem/memory-device.h"
+#include "hw/mem/nvdimm.h"
 
 static bool has_spr(PowerPCCPU *cpu, int spr)
 {
@@ -1795,6 +1798,199 @@ static target_ulong h_update_dt(PowerPCCPU *cpu, SpaprMachineState *spapr,
     return H_SUCCESS;
 }
 
+static target_ulong h_scm_read_metadata(PowerPCCPU *cpu,
+                                        SpaprMachineState *spapr,
+                                        target_ulong opcode,
+                                        target_ulong *args)
+{
+    uint32_t drc_index = args[0];
+    uint64_t offset = args[1];
+    uint64_t numBytesToRead = args[2];
+    SpaprDrc *drc = spapr_drc_by_index(drc_index);
+    NVDIMMDevice *nvdimm = NULL;
+    NVDIMMClass *ddc = NULL;
+
+    if (drc && spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
+        return H_PARAMETER;
+    }
+
+    if (numBytesToRead != 1 && numBytesToRead != 2 &&
+        numBytesToRead != 4 && numBytesToRead != 8) {
+        return H_P3;
+    }
+
+    nvdimm = NVDIMM(drc->dev);
+    if ((offset + numBytesToRead < offset) ||
+        (nvdimm->label_size < numBytesToRead + offset)) {
+        return H_P2;
+    }
+
+    ddc = NVDIMM_GET_CLASS(nvdimm);
+    ddc->read_label_data(nvdimm, &args[0], numBytesToRead, offset);
+
+    return H_SUCCESS;
+}
+
+
+static target_ulong h_scm_write_metadata(PowerPCCPU *cpu,
+                                         SpaprMachineState *spapr,
+                                         target_ulong opcode,
+                                         target_ulong *args)
+{
+    uint32_t drc_index = args[0];
+    uint64_t offset = args[1];
+    uint64_t data = args[2];
+    int8_t numBytesToWrite = args[3];
+    SpaprDrc *drc = spapr_drc_by_index(drc_index);
+    NVDIMMDevice *nvdimm = NULL;
+    DeviceState *dev = NULL;
+    NVDIMMClass *ddc = NULL;
+
+    if (drc && spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
+        return H_PARAMETER;
+    }
+
+    if (numBytesToWrite != 1 && numBytesToWrite != 2 &&
+        numBytesToWrite != 4 && numBytesToWrite != 8) {
+        return H_P4;
+    }
+
+    dev = drc->dev;
+    nvdimm = NVDIMM(dev);
+    if ((nvdimm->label_size < numBytesToWrite + offset) ||
+        (offset + numBytesToWrite < offset)) {
+        return H_P2;
+    }
+
+    ddc = NVDIMM_GET_CLASS(nvdimm);
+    ddc->write_label_data(nvdimm, &data, numBytesToWrite, offset);
+
+    return H_SUCCESS;
+}
+
+static target_ulong h_scm_bind_mem(PowerPCCPU *cpu, SpaprMachineState *spapr,
+                                        target_ulong opcode,
+                                        target_ulong *args)
+{
+    uint32_t drc_index = args[0];
+    uint64_t starting_idx = args[1];
+    uint64_t no_of_scm_blocks_to_bind = args[2];
+    uint64_t target_logical_mem_addr = args[3];
+    uint64_t continue_token = args[4];
+    uint64_t size;
+    uint64_t total_no_of_scm_blocks;
+
+    SpaprDrc *drc = spapr_drc_by_index(drc_index);
+    hwaddr addr;
+    DeviceState *dev = NULL;
+    PCDIMMDevice *dimm = NULL;
+    Error *local_err = NULL;
+
+    if (drc && spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
+        return H_PARAMETER;
+    }
+
+    dev = drc->dev;
+    dimm = PC_DIMM(dev);
+
+    size = object_property_get_uint(OBJECT(dimm),
+                                    PC_DIMM_SIZE_PROP, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+        return H_PARAMETER;
+    }
+
+    total_no_of_scm_blocks = size / SPAPR_MINIMUM_SCM_BLOCK_SIZE;
+
+    if ((starting_idx > total_no_of_scm_blocks) ||
+        (no_of_scm_blocks_to_bind > total_no_of_scm_blocks)) {
+        return H_P2;
+    }
+
+    if (((starting_idx + no_of_scm_blocks_to_bind) < starting_idx) ||
+        ((starting_idx + no_of_scm_blocks_to_bind) > total_no_of_scm_blocks)) {
+        return H_P3;
+    }
+
+    /* Currently qemu assigns the address. */
+    if (target_logical_mem_addr != 0xffffffffffffffff) {
+        return H_OVERLAP;
+    }
+
+    /*
+     * Currently continue token should be zero qemu has already bound
+     * everything and this hcall doesnt return H_BUSY.
+     */
+    if (continue_token > 0) {
+        return H_P5;
+    }
+
+    /* NB : Already bound, Return target logical address in R4 */
+    addr = object_property_get_uint(OBJECT(dimm),
+                                    PC_DIMM_ADDR_PROP, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+        return H_PARAMETER;
+    }
+
+    args[1] = addr;
+    args[2] = no_of_scm_blocks_to_bind;
+
+    return H_SUCCESS;
+}
+
+static target_ulong h_scm_unbind_mem(PowerPCCPU *cpu, SpaprMachineState *spapr,
+                                        target_ulong opcode,
+                                        target_ulong *args)
+{
+    uint32_t drc_index = args[0];
+    uint64_t starting_scm_logical_addr = args[1];
+    uint64_t no_of_scm_blocks_to_unbind = args[2];
+    uint64_t size_to_unbind;
+    uint64_t continue_token = args[3];
+    Range blockrange = range_empty;
+    Range nvdimmrange = range_empty;
+    SpaprDrc *drc = spapr_drc_by_index(drc_index);
+    DeviceState *dev = NULL;
+    PCDIMMDevice *dimm = NULL;
+    uint64_t size, addr;
+
+    if (drc && spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
+        return H_PARAMETER;
+    }
+
+    /* Check if starting_scm_logical_addr is block aligned */
+    if (!QEMU_IS_ALIGNED(starting_scm_logical_addr,
+                         SPAPR_MINIMUM_SCM_BLOCK_SIZE)) {
+        return H_P2;
+    }
+
+    dev = drc->dev;
+    dimm = PC_DIMM(dev);
+    size = object_property_get_int(OBJECT(dimm), PC_DIMM_SIZE_PROP, NULL);
+    addr = object_property_get_int(OBJECT(dimm), PC_DIMM_ADDR_PROP, NULL);
+
+    range_init_nofail(&nvdimmrange, addr, size);
+
+    size_to_unbind = no_of_scm_blocks_to_unbind * SPAPR_MINIMUM_SCM_BLOCK_SIZE;
+
+
+    range_init_nofail(&blockrange, starting_scm_logical_addr, size_to_unbind);
+
+    if (!range_contains_range(&nvdimmrange, &blockrange)) {
+        return H_P3;
+    }
+
+    if (continue_token > 0) {
+        return H_P3;
+    }
+
+    args[1] = no_of_scm_blocks_to_unbind;
+
+    /*NB : dont do anything, let object_del take care of this for now. */
+    return H_SUCCESS;
+}
+
 static spapr_hcall_fn papr_hypercall_table[(MAX_HCALL_OPCODE / 4) + 1];
 static spapr_hcall_fn kvmppc_hypercall_table[KVMPPC_HCALL_MAX - KVMPPC_HCALL_BASE + 1];
 
@@ -1894,6 +2090,12 @@ static void hypercall_register_types(void)
     /* qemu/KVM-PPC specific hcalls */
     spapr_register_hypercall(KVMPPC_H_RTAS, h_rtas);
 
+    /* qemu/scm specific hcalls */
+    spapr_register_hypercall(H_SCM_READ_METADATA, h_scm_read_metadata);
+    spapr_register_hypercall(H_SCM_WRITE_METADATA, h_scm_write_metadata);
+    spapr_register_hypercall(H_SCM_BIND_MEM, h_scm_bind_mem);
+    spapr_register_hypercall(H_SCM_UNBIND_MEM, h_scm_unbind_mem);
+
     /* ibm,client-architecture-support support */
     spapr_register_hypercall(KVMPPC_H_CAS, h_client_architecture_support);
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 394ea26335..48e2cc9d67 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -283,6 +283,7 @@ struct SpaprMachineState {
 #define H_P7              -60
 #define H_P8              -61
 #define H_P9              -62
+#define H_OVERLAP         -68
 #define H_UNSUPPORTED_FLAG -256
 #define H_MULTI_THREADS_ACTIVE -9005
 
@@ -490,8 +491,12 @@ struct SpaprMachineState {
 #define H_INT_ESB               0x3C8
 #define H_INT_SYNC              0x3CC
 #define H_INT_RESET             0x3D0
+#define H_SCM_READ_METADATA     0x3E4
+#define H_SCM_WRITE_METADATA    0x3E8
+#define H_SCM_BIND_MEM          0x3EC
+#define H_SCM_UNBIND_MEM        0x3F0
 
-#define MAX_HCALL_OPCODE        H_INT_RESET
+#define MAX_HCALL_OPCODE        H_SCM_UNBIND_MEM
 
 /* The hcalls above are standardized in PAPR and implemented by pHyp
  * as well.



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 2/3] spapr: Add NVDIMM device support
  2019-05-13  9:28 ` [Qemu-devel] [RFC v2 PATCH 2/3] spapr: Add NVDIMM device support Shivaprasad G Bhat
@ 2019-05-14  2:22   ` David Gibson
  2019-05-15  7:00     ` Shivaprasad G Bhat
  2019-05-22 16:07   ` Fabiano Rosas
  1 sibling, 1 reply; 16+ messages in thread
From: David Gibson @ 2019-05-14  2:22 UTC (permalink / raw)
  To: Shivaprasad G Bhat
  Cc: qemu-devel, imammedo, qemu-ppc, xiaoguangrong.eric, mst

[-- Attachment #1: Type: text/plain, Size: 24396 bytes --]

On Mon, May 13, 2019 at 04:28:02AM -0500, Shivaprasad G Bhat wrote:
> Add support for NVDIMM devices for sPAPR. Piggyback on existing nvdimm
> device interface in QEMU to support virtual NVDIMM devices for Power (May have
> to re-look at this later).  Create the required DT entries for the
> device (some entries have dummy values right now).
> 
> The patch creates the required DT node and sends a hotplug
> interrupt to the guest. Guest is expected to undertake the normal
> DR resource add path in response and start issuing PAPR SCM hcalls.
> 
> This is how it can be used ..
> Add nvdimm=on to the qemu machine argument.
> Ex : -machine pseries,nvdimm=on
> For coldplug, the device to be added in qemu command line as shown below
> -object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0,share=yes,size=1073872896
> -device nvdimm,label-size=128k,uuid=75a3cdd7-6a2f-4791-8d15-fe0a920e8e9e,memdev=memnvdimm0,id=nvdimm0,slot=0
> 
> For hotplug, the device to be added from monitor as below
> object_add memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0,share=yes,size=1073872896
> device_add nvdimm,label-size=128k,uuid=75a3cdd7-6a2f-4791-8d15-fe0a920e8e9e,memdev=memnvdimm0,id=nvdimm0,slot=0
> 
> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
> Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
>                [Early implementation]
> ---
> ---
>  default-configs/ppc64-softmmu.mak |    1 
>  hw/mem/Kconfig                    |    2 
>  hw/mem/nvdimm.c                   |   43 ++++++++
>  hw/ppc/spapr.c                    |  202 +++++++++++++++++++++++++++++++++++--
>  hw/ppc/spapr_drc.c                |   18 +++
>  hw/ppc/spapr_events.c             |    4 +
>  include/hw/mem/nvdimm.h           |    6 +
>  include/hw/ppc/spapr.h            |   12 ++
>  include/hw/ppc/spapr_drc.h        |    9 ++
>  9 files changed, 286 insertions(+), 11 deletions(-)
> 
> diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
> index cca52665d9..ae0841fa3a 100644
> --- a/default-configs/ppc64-softmmu.mak
> +++ b/default-configs/ppc64-softmmu.mak
> @@ -8,3 +8,4 @@ CONFIG_POWERNV=y
>  
>  # For pSeries
>  CONFIG_PSERIES=y
> +CONFIG_NVDIMM=y
> diff --git a/hw/mem/Kconfig b/hw/mem/Kconfig
> index 620fd4cb59..2ad052a536 100644
> --- a/hw/mem/Kconfig
> +++ b/hw/mem/Kconfig
> @@ -8,4 +8,4 @@ config MEM_DEVICE
>  config NVDIMM
>      bool
>      default y
> -    depends on PC
> +    depends on (PC || PSERIES)
> diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
> index f221ec7a9a..deaebaaaa5 100644
> --- a/hw/mem/nvdimm.c
> +++ b/hw/mem/nvdimm.c
> @@ -93,11 +93,54 @@ out:
>      error_propagate(errp, local_err);
>  }
>  
> +static void nvdimm_get_uuid(Object *obj, Visitor *v, const char *name,
> +                                  void *opaque, Error **errp)
> +{
> +    NVDIMMDevice *nvdimm = NVDIMM(obj);
> +    char *value = NULL;
> +
> +    value = qemu_uuid_unparse_strdup(&nvdimm->uuid);
> +
> +    visit_type_str(v, name, &value, errp);
> +}
> +
> +
> +static void nvdimm_set_uuid(Object *obj, Visitor *v, const char *name,
> +                                  void *opaque, Error **errp)
> +{
> +    NVDIMMDevice *nvdimm = NVDIMM(obj);
> +    Error *local_err = NULL;
> +    char *value;
> +
> +    visit_type_str(v, name, &value, &local_err);
> +    if (local_err) {
> +        goto out;
> +    }
> +
> +    if (strcmp(value, "") == 0) {
> +        error_setg(&local_err, "Property '%s.%s' %s is required"
> +                   " at least 0x%lx", object_get_typename(obj),
> +                   name, value, MIN_NAMESPACE_LABEL_SIZE);
> +        goto out;
> +    }
> +
> +    if (qemu_uuid_parse(value, &nvdimm->uuid) != 0) {
> +        error_setg(errp, "Invalid UUID");
> +        return;
> +    }
> +out:
> +    error_propagate(errp, local_err);
> +}
> +
> +
>  static void nvdimm_init(Object *obj)
>  {
>      object_property_add(obj, NVDIMM_LABEL_SIZE_PROP, "int",
>                          nvdimm_get_label_size, nvdimm_set_label_size, NULL,
>                          NULL, NULL);
> +
> +    object_property_add(obj, NVDIMM_UUID_PROP, "QemuUUID", nvdimm_get_uuid,
> +                        nvdimm_set_uuid, NULL, NULL, NULL);
>  }
>  
>  static void nvdimm_finalize(Object *obj)
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 2ef3ce4362..b6951577e7 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -74,6 +74,7 @@
>  #include "qemu/cutils.h"
>  #include "hw/ppc/spapr_cpu_core.h"
>  #include "hw/mem/memory-device.h"
> +#include "hw/mem/nvdimm.h"
>  
>  #include <libfdt.h>
>  
> @@ -699,6 +700,7 @@ static int spapr_populate_drmem_v2(SpaprMachineState *spapr, void *fdt,
>      uint8_t *int_buf, *cur_index;
>      int ret;
>      uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
> +    uint64_t scm_block_size = SPAPR_MINIMUM_SCM_BLOCK_SIZE;
>      uint64_t addr, cur_addr, size;
>      uint32_t nr_boot_lmbs = (machine->device_memory->base / lmb_size);
>      uint64_t mem_end = machine->device_memory->base +
> @@ -735,12 +737,20 @@ static int spapr_populate_drmem_v2(SpaprMachineState *spapr, void *fdt,
>              nr_entries++;
>          }
>  
> -        /* Entry for DIMM */
> -        drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
> -        g_assert(drc);
> -        elem = spapr_get_drconf_cell(size / lmb_size, addr,
> -                                     spapr_drc_index(drc), node,
> -                                     SPAPR_LMB_FLAGS_ASSIGNED);
> +        if (info->value->type == MEMORY_DEVICE_INFO_KIND_NVDIMM) {
> +            /* Entry for NVDIMM */
> +            drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM, addr / scm_block_size);
> +            g_assert(drc);
> +            elem = spapr_get_drconf_cell(size / scm_block_size, addr,
> +                                         spapr_drc_index(drc), -1, 0);
> +        } else {
> +            /* Entry for DIMM */
> +            drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
> +            g_assert(drc);
> +            elem = spapr_get_drconf_cell(size / lmb_size, addr,
> +                                         spapr_drc_index(drc), node,
> +                                         SPAPR_LMB_FLAGS_ASSIGNED);
> +        }
>          QSIMPLEQ_INSERT_TAIL(&drconf_queue, elem, entry);
>          nr_entries++;
>          cur_addr = addr + size;
> @@ -1235,6 +1245,87 @@ static void spapr_dt_hypervisor(SpaprMachineState *spapr, void *fdt)
>      }
>  }
>  
> +static int spapr_populate_nvdimm_node(void *fdt, int parent_offset,
> +                                      NVDIMMDevice *nvdimm)

spapr_dt_*() is the new preferred naming convention for DT creation
functions.

> +{
> +    int child_offset;
> +    char buf[40];
> +    SpaprDrc *drc;
> +    uint32_t drc_idx;
> +    uint32_t node = object_property_get_uint(OBJECT(nvdimm), PC_DIMM_NODE_PROP,
> +                                             &error_abort);
> +    uint64_t addr = object_property_get_uint(OBJECT(nvdimm), PC_DIMM_ADDR_PROP,
> +                                             &error_abort);
> +    uint32_t associativity[] = {
> +        cpu_to_be32(0x4), /* length */
> +        cpu_to_be32(0x0), cpu_to_be32(0x0),
> +        cpu_to_be32(0x0), cpu_to_be32(node)
> +    };
> +    uint64_t lsize = nvdimm->label_size;
> +    uint64_t size = object_property_get_int(OBJECT(nvdimm), PC_DIMM_SIZE_PROP,
> +                                            NULL);
> +
> +    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM,
> +                          addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
> +    g_assert(drc);
> +
> +    drc_idx = spapr_drc_index(drc);
> +
> +    sprintf(buf, "pmem@%x", drc_idx);

This doesn't look right.  According to the PAPR fragment I have, the
name is supposed to be "ibm,pmemory" and the reg is supposed to be
(addr, size) pairs, not DRC indices.

Or do I have an out of date document?

> +    child_offset = fdt_add_subnode(fdt, parent_offset, buf);
> +    _FDT(child_offset);
> +
> +    _FDT((fdt_setprop_cell(fdt, child_offset, "reg", drc_idx)));
> +    _FDT((fdt_setprop_string(fdt, child_offset, "compatible", "ibm,pmemory")));
> +    _FDT((fdt_setprop_string(fdt, child_offset, "device_type", "ibm,pmemory")));
> +
> +    _FDT((fdt_setprop(fdt, child_offset, "ibm,associativity", associativity,
> +                      sizeof(associativity))));
> +
> +    qemu_uuid_unparse(&nvdimm->uuid, buf);
> +    _FDT((fdt_setprop_string(fdt, child_offset, "ibm,unit-guid", buf)));
> +
> +    _FDT((fdt_setprop_cell(fdt, child_offset, "ibm,my-drc-index", drc_idx)));
> +
> +    /*NB : What it should be? */

If you don't have a good value, can you just leave it out?

> +    _FDT(fdt_setprop_cell(fdt, child_offset, "ibm,latency-attribute", 828));
> +
> +    _FDT((fdt_setprop_u64(fdt, child_offset, "ibm,block-size",
> +                          SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
> +    _FDT((fdt_setprop_u64(fdt, child_offset, "ibm,number-of-blocks",
> +                          size / SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
> +    _FDT((fdt_setprop_cell(fdt, child_offset, "ibm,metadata-size", lsize)));
> +
> +    return child_offset;
> +}
> +
> +static void spapr_dt_nvdimm(void *fdt)

I'd suggest spapr_dt_nvdimms() or spapr_dt_persistent_memory(), with
spapr_dt_nvdimm() for the single-NVDIMM functions earlier.

> +{
> +    int offset = fdt_subnode_offset(fdt, 0, "persistent-memory");
> +    GSList *dimms = NULL;
> +
> +    if (offset < 0) {
> +        offset = fdt_add_subnode(fdt, 0, "persistent-memory");
> +        _FDT(offset);
> +        _FDT((fdt_setprop_cell(fdt, offset, "#address-cells", 0x2)));

Hmm.  So the docuiment I have specifies #address-cells = 1 here, but
that doesn't make sense with the reg format it specifies for the
children.  So *something* is screwy.  Is there an up to date PAPR
specification for persistent memory I can get?

> +        _FDT((fdt_setprop_cell(fdt, offset, "#size-cells", 0x0)));
> +        _FDT((fdt_setprop_string(fdt, offset, "device_type",
> +                                 "ibm,persistent-memory")));
> +    }
> +
> +    /*NB : Add drc-info array here */
> +
> +    /* Create DT entries for cold plugged NVDIMM devices */
> +    dimms = nvdimm_get_device_list();
> +    for (; dimms; dimms = dimms->next) {
> +        NVDIMMDevice *nvdimm = dimms->data;
> +
> +        spapr_populate_nvdimm_node(fdt, offset, nvdimm);
> +    }
> +    g_slist_free(dimms);
> +    return;
> +}
> +
>  static void *spapr_build_fdt(SpaprMachineState *spapr)
>  {
>      MachineState *machine = MACHINE(spapr);
> @@ -1368,6 +1459,11 @@ static void *spapr_build_fdt(SpaprMachineState *spapr)
>          }
>      }
>  
> +    /* NVDIMM devices */
> +    if (spapr->nvdimm_enabled) {
> +        spapr_dt_nvdimm(fdt);
> +    }
> +
>      return fdt;
>  }
>  
> @@ -3324,6 +3420,20 @@ static void spapr_set_host_serial(Object *obj, const char *value, Error **errp)
>      spapr->host_serial = g_strdup(value);
>  }
>  
> +static bool spapr_get_nvdimm(Object *obj, Error **errp)
> +{
> +    SpaprMachineState *spapr = SPAPR_MACHINE(obj);
> +
> +    return spapr->nvdimm_enabled;
> +}
> +
> +static void spapr_set_nvdimm(Object *obj, bool value, Error **errp)
> +{
> +    SpaprMachineState *spapr = SPAPR_MACHINE(obj);
> +
> +    spapr->nvdimm_enabled = value;
> +}
> +
>  static void spapr_instance_init(Object *obj)
>  {
>      SpaprMachineState *spapr = SPAPR_MACHINE(obj);
> @@ -3380,6 +3490,12 @@ static void spapr_instance_init(Object *obj)
>          &error_abort);
>      object_property_set_description(obj, "host-serial",
>          "Host serial number to advertise in guest device tree", &error_abort);
> +
> +    object_property_add_bool(obj, "nvdimm",
> +                            spapr_get_nvdimm, spapr_set_nvdimm, NULL);

Is there actually a reason to have a property for this, rather than
just always having it on?

> +    object_property_set_description(obj, "nvdimm",
> +                                    "Enable support for nvdimm devices",
> +                                    NULL);
>  }
>  
>  static void spapr_machine_finalizefn(Object *obj)
> @@ -3404,6 +3520,16 @@ static void spapr_nmi(NMIState *n, int cpu_index, Error **errp)
>      }
>  }
>  
> +int spapr_pmem_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
> +                           void *fdt, int *fdt_start_offset, Error **errp)
> +{
> +    NVDIMMDevice *nvdimm = NVDIMM(drc->dev);
> +
> +    *fdt_start_offset = spapr_populate_nvdimm_node(fdt, 0, nvdimm);
> +
> +    return 0;
> +}
> +
>  int spapr_lmb_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
>                            void *fdt, int *fdt_start_offset, Error **errp)
>  {
> @@ -3466,12 +3592,37 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t size,
>      }
>  }
>  
> +static void spapr_add_nvdimm(DeviceState *dev, uint64_t addr, Error **errp)
> +{
> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_hotplug_handler(dev));
> +    SpaprDrc *drc;
> +    bool hotplugged = spapr_drc_hotplugged(dev);
> +    Error *local_err = NULL;
> +
> +    spapr_dr_connector_new(OBJECT(spapr), TYPE_SPAPR_DRC_PMEM,
> +                           addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);

This still seems bogus to me.  The whole point of DRCs is that they
exist *before* the device is plugged as a handle for the guest side
plug mechanisms.  Creating them only when the device is added doesn't
make sense.

> +    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM,
> +                          addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
> +    g_assert(drc);
> +
> +    spapr_drc_attach(drc, dev, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
> +    if (hotplugged) {
> +        spapr_hotplug_req_add_by_index(drc);
> +    }
> +}
> +
>  static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>                                Error **errp)
>  {
>      Error *local_err = NULL;
>      SpaprMachineState *ms = SPAPR_MACHINE(hotplug_dev);
>      PCDIMMDevice *dimm = PC_DIMM(dev);
> +    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
>      uint64_t size, addr;
>  
>      size = memory_device_get_region_size(MEMORY_DEVICE(dev), &error_abort);
> @@ -3487,8 +3638,14 @@ static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>          goto out_unplug;
>      }
>  
> -    spapr_add_lmbs(dev, addr, size, spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
> -                   &local_err);
> +    if (!is_nvdimm) {
> +        spapr_add_lmbs(dev, addr, size,
> +                       spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
> +                       &local_err);
> +    } else {
> +        spapr_add_nvdimm(dev, addr, &local_err);
> +    }
> +
>      if (local_err) {
>          goto out_unplug;
>      }
> @@ -3506,6 +3663,7 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>  {
>      const SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(hotplug_dev);
>      SpaprMachineState *spapr = SPAPR_MACHINE(hotplug_dev);
> +    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
>      PCDIMMDevice *dimm = PC_DIMM(dev);
>      Error *local_err = NULL;
>      uint64_t size;
> @@ -3523,10 +3681,28 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>          return;
>      }
>  
> -    if (size % SPAPR_MEMORY_BLOCK_SIZE) {
> +    if (!is_nvdimm && size % SPAPR_MEMORY_BLOCK_SIZE) {
>          error_setg(errp, "Hotplugged memory size must be a multiple of "
> -                      "%" PRIu64 " MB", SPAPR_MEMORY_BLOCK_SIZE / MiB);
> +                          "%" PRIu64 " MB", SPAPR_MEMORY_BLOCK_SIZE / MiB);
>          return;
> +    } else if (is_nvdimm) {
> +        char *uuidstr = NULL;
> +        QemuUUID uuid;
> +        if (size % SPAPR_MINIMUM_SCM_BLOCK_SIZE) {
> +            error_setg(errp, "NVDIMM memory size excluding the label area"

Is this reference to the label area still accurate?

> +                       " must be a multiple of "
> +                       "%" PRIu64 "MB",
> +                       SPAPR_MINIMUM_SCM_BLOCK_SIZE / MiB);
> +            return;
> +        }
> +
> +        uuidstr = object_property_get_str(OBJECT(dimm), NVDIMM_UUID_PROP, NULL);
> +        qemu_uuid_parse(uuidstr, &uuid);
> +        if (qemu_uuid_is_null(&uuid)) {
> +            error_setg(errp, "NVDIMM device requires the uuid to be set");
> +            return;
> +        }
> +        g_free(uuidstr);
>      }
>  
>      memdev = object_property_get_link(OBJECT(dimm), PC_DIMM_MEMDEV_PROP,
> @@ -3666,6 +3842,12 @@ static void spapr_memory_unplug_request(HotplugHandler *hotplug_dev,
>      int i;
>      SpaprDrc *drc;
>  
> +    if (object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM)) {
> +        error_setg(&local_err,
> +                   "nvdimm device hot unplug is not supported yet.");
> +        goto out;
> +    }
> +
>      size = memory_device_get_region_size(MEMORY_DEVICE(dimm), &error_abort);
>      nr_lmbs = size / SPAPR_MEMORY_BLOCK_SIZE;
>  
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 597f236b9c..983440a711 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -707,6 +707,17 @@ static void spapr_drc_phb_class_init(ObjectClass *k, void *data)
>      drck->dt_populate = spapr_phb_dt_populate;
>  }
>  
> +static void spapr_drc_pmem_class_init(ObjectClass *k, void *data)
> +{
> +    SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_CLASS(k);
> +
> +    drck->typeshift = SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM;
> +    drck->typename = "MEM";
> +    drck->drc_name_prefix = "PMEM ";
> +    drck->release = NULL;
> +    drck->dt_populate = spapr_pmem_dt_populate;
> +}
> +
>  static const TypeInfo spapr_dr_connector_info = {
>      .name          = TYPE_SPAPR_DR_CONNECTOR,
>      .parent        = TYPE_DEVICE,
> @@ -757,6 +768,12 @@ static const TypeInfo spapr_drc_phb_info = {
>      .class_init    = spapr_drc_phb_class_init,
>  };
>  
> +static const TypeInfo spapr_drc_pmem_info = {
> +    .name          = TYPE_SPAPR_DRC_PMEM,
> +    .parent        = TYPE_SPAPR_DRC_LOGICAL,
> +    .class_init    = spapr_drc_pmem_class_init,
> +};
> +
>  /* helper functions for external users */
>  
>  SpaprDrc *spapr_drc_by_index(uint32_t index)
> @@ -1226,6 +1243,7 @@ static void spapr_drc_register_types(void)
>      type_register_static(&spapr_drc_pci_info);
>      type_register_static(&spapr_drc_lmb_info);
>      type_register_static(&spapr_drc_phb_info);
> +    type_register_static(&spapr_drc_pmem_info);
>  
>      spapr_rtas_register(RTAS_SET_INDICATOR, "set-indicator",
>                          rtas_set_indicator);
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index ae0f093f59..1141203a87 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -193,6 +193,7 @@ struct rtas_event_log_v6_hp {
>  #define RTAS_LOG_V6_HP_TYPE_SLOT                         3
>  #define RTAS_LOG_V6_HP_TYPE_PHB                          4
>  #define RTAS_LOG_V6_HP_TYPE_PCI                          5
> +#define RTAS_LOG_V6_HP_TYPE_PMEM                         6
>      uint8_t hotplug_action;
>  #define RTAS_LOG_V6_HP_ACTION_ADD                        1
>  #define RTAS_LOG_V6_HP_ACTION_REMOVE                     2
> @@ -529,6 +530,9 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
>      case SPAPR_DR_CONNECTOR_TYPE_PHB:
>          hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PHB;
>          break;
> +    case SPAPR_DR_CONNECTOR_TYPE_PMEM:
> +        hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PMEM;
> +        break;
>      default:
>          /* we shouldn't be signaling hotplug events for resources
>           * that don't support them
> diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
> index bad4fc04b5..3089615e17 100644
> --- a/include/hw/mem/nvdimm.h
> +++ b/include/hw/mem/nvdimm.h
> @@ -49,6 +49,7 @@
>                                                 TYPE_NVDIMM)
>  
>  #define NVDIMM_LABEL_SIZE_PROP "label-size"
> +#define NVDIMM_UUID_PROP "uuid"
>  #define NVDIMM_UNARMED_PROP    "unarmed"
>  
>  struct NVDIMMDevice {
> @@ -83,6 +84,11 @@ struct NVDIMMDevice {
>       * the guest write persistence.
>       */
>      bool unarmed;
> +
> +    /*
> +     * The PPC64 - spapr requires each nvdimm device have a uuid.
> +     */
> +    QemuUUID uuid;
>  };
>  typedef struct NVDIMMDevice NVDIMMDevice;
>  
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 7e32f309c2..394ea26335 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -202,6 +202,7 @@ struct SpaprMachineState {
>      SpaprCapabilities def, eff, mig;
>  
>      unsigned gpu_numa_id;
> +    bool nvdimm_enabled;
>  };
>  
>  #define H_SUCCESS         0
> @@ -794,6 +795,8 @@ int spapr_core_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
>  void spapr_lmb_release(DeviceState *dev);
>  int spapr_lmb_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
>                            void *fdt, int *fdt_start_offset, Error **errp);
> +int spapr_pmem_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
> +                          void *fdt, int *fdt_start_offset, Error **errp);
>  void spapr_phb_release(DeviceState *dev);
>  int spapr_phb_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
>                            void *fdt, int *fdt_start_offset, Error **errp);
> @@ -829,6 +832,15 @@ int spapr_rtc_import_offset(SpaprRtcState *rtc, int64_t legacy_offset);
>  #define SPAPR_LMB_FLAGS_DRC_INVALID 0x00000020
>  #define SPAPR_LMB_FLAGS_RESERVED 0x00000080
>  
> +/*
> + * The nvdimm size should be aligned to SCM block size.
> + * The SCM block size should be aligned to SPAPR_MEMORY_BLOCK_SIZE
> + * inorder to have SCM regions not to overlap with dimm memory regions.
> + * The SCM devices can have variable block sizes. For now, fixing the
> + * block size to the minimum value.
> + */
> +#define SPAPR_MINIMUM_SCM_BLOCK_SIZE SPAPR_MEMORY_BLOCK_SIZE
> +
>  void spapr_do_system_reset_on_cpu(CPUState *cs, run_on_cpu_data arg);
>  
>  #define HTAB_SIZE(spapr)        (1ULL << ((spapr)->htab_shift))
> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> index fad0a887f9..8b7ce41a0f 100644
> --- a/include/hw/ppc/spapr_drc.h
> +++ b/include/hw/ppc/spapr_drc.h
> @@ -79,6 +79,13 @@
>  #define SPAPR_DRC_PHB(obj) OBJECT_CHECK(SpaprDrc, (obj), \
>                                          TYPE_SPAPR_DRC_PHB)
>  
> +#define TYPE_SPAPR_DRC_PMEM "spapr-drc-pmem"
> +#define SPAPR_DRC_PMEM_GET_CLASS(obj) \
> +        OBJECT_GET_CLASS(SpaprDrcClass, obj, TYPE_SPAPR_DRC_PMEM)
> +#define SPAPR_DRC_PMEM_CLASS(klass) \
> +        OBJECT_CLASS_CHECK(SpaprDrcClass, klass, TYPE_SPAPR_DRC_PMEM)
> +#define SPAPR_DRC_PMEM(obj) OBJECT_CHECK(SpaprDrc, (obj), \
> +                                        TYPE_SPAPR_DRC_PMEM)
>  /*
>   * Various hotplug types managed by SpaprDrc
>   *
> @@ -96,6 +103,7 @@ typedef enum {
>      SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO = 3,
>      SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI = 4,
>      SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB = 8,
> +    SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM = 9,
>  } SpaprDrcTypeShift;
>  
>  typedef enum {
> @@ -105,6 +113,7 @@ typedef enum {
>      SPAPR_DR_CONNECTOR_TYPE_VIO = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO,
>      SPAPR_DR_CONNECTOR_TYPE_PCI = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI,
>      SPAPR_DR_CONNECTOR_TYPE_LMB = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB,
> +    SPAPR_DR_CONNECTOR_TYPE_PMEM = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM,
>  } SpaprDrcType;
>  
>  /*
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 3/3] spapr: Add Hcalls to support PAPR NVDIMM device
  2019-05-13  9:28 ` [Qemu-devel] [RFC v2 PATCH 3/3] spapr: Add Hcalls to support PAPR NVDIMM device Shivaprasad G Bhat
@ 2019-05-14  4:38   ` David Gibson
  2019-07-11  5:46     ` Shivaprasad G Bhat
  2019-05-22 18:08   ` Fabiano Rosas
  1 sibling, 1 reply; 16+ messages in thread
From: David Gibson @ 2019-05-14  4:38 UTC (permalink / raw)
  To: Shivaprasad G Bhat
  Cc: qemu-devel, imammedo, qemu-ppc, xiaoguangrong.eric, mst

[-- Attachment #1: Type: text/plain, Size: 12008 bytes --]

On Mon, May 13, 2019 at 04:28:36AM -0500, Shivaprasad G Bhat wrote:
> This patch implements few of the necessary hcalls for the nvdimm support.
> 
> PAPR semantics is such that each NVDIMM device is comprising of multiple
> SCM(Storage Class Memory) blocks. The guest requests the hypervisor to bind
> each of the SCM blocks of the NVDIMM device using hcalls. There can be
> SCM block unbind requests in case of driver errors or unplug(not supported now)
> use cases. The NVDIMM label read/writes are done through hcalls.
> 
> Since each virtual NVDIMM device is divided into multiple SCM blocks, the bind,
> unbind, and queries using hcalls on those blocks can come independently. This
> doesn't fit well into the qemu device semantics, where the map/unmap are done
> at the (whole)device/object level granularity. The patch doesnt actually
> bind/unbind on hcalls but let it happen at the object_add/del phase itself
> instead.
> 
> The guest kernel makes bind/unbind requests for the virtual NVDIMM device at the
> region level granularity. Without interleaving, each virtual NVDIMM device is
> presented as separate region. There is no way to configure the virtual NVDIMM
> interleaving for the guests today. So, there is no way a partial bind/unbind
> request can come for the vNVDIMM in a hcall for a subset of SCM blocks of a
> virtual NVDIMM. Hence it is safe to do bind/unbind everything during the
> object_add/del.
> 
> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
> ---
>  hw/ppc/spapr_hcall.c   |  202 ++++++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h |    7 +-
>  2 files changed, 208 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 6c16d2b120..b6e7d04dcf 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -3,11 +3,13 @@
>  #include "sysemu/hw_accel.h"
>  #include "sysemu/sysemu.h"
>  #include "qemu/log.h"
> +#include "qemu/range.h"
>  #include "qemu/error-report.h"
>  #include "cpu.h"
>  #include "exec/exec-all.h"
>  #include "helper_regs.h"
>  #include "hw/ppc/spapr.h"
> +#include "hw/ppc/spapr_drc.h"
>  #include "hw/ppc/spapr_cpu_core.h"
>  #include "mmu-hash64.h"
>  #include "cpu-models.h"
> @@ -16,6 +18,7 @@
>  #include "hw/ppc/spapr_ovec.h"
>  #include "mmu-book3s-v3.h"
>  #include "hw/mem/memory-device.h"
> +#include "hw/mem/nvdimm.h"
>  
>  static bool has_spr(PowerPCCPU *cpu, int spr)
>  {
> @@ -1795,6 +1798,199 @@ static target_ulong h_update_dt(PowerPCCPU *cpu, SpaprMachineState *spapr,
>      return H_SUCCESS;
>  }
>  
> +static target_ulong h_scm_read_metadata(PowerPCCPU *cpu,
> +                                        SpaprMachineState *spapr,
> +                                        target_ulong opcode,
> +                                        target_ulong *args)
> +{
> +    uint32_t drc_index = args[0];
> +    uint64_t offset = args[1];
> +    uint64_t numBytesToRead = args[2];
> +    SpaprDrc *drc = spapr_drc_by_index(drc_index);
> +    NVDIMMDevice *nvdimm = NULL;
> +    NVDIMMClass *ddc = NULL;
> +
> +    if (drc && spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
> +        return H_PARAMETER;
> +    }
> +
> +    if (numBytesToRead != 1 && numBytesToRead != 2 &&
> +        numBytesToRead != 4 && numBytesToRead != 8) {
> +        return H_P3;
> +    }
> +
> +    nvdimm = NVDIMM(drc->dev);
> +    if ((offset + numBytesToRead < offset) ||
> +        (nvdimm->label_size < numBytesToRead + offset)) {
> +        return H_P2;
> +    }
> +
> +    ddc = NVDIMM_GET_CLASS(nvdimm);
> +    ddc->read_label_data(nvdimm, &args[0], numBytesToRead, offset);

I'm pretty sure this will need some sort of byteswap.  args[0] is
effectively in host native order (since it maps to a register, not
memory).  But the guest will have to assume some byteorder (BE, I'm
guessing, because PAPR) in order to map that register to an in-memory
byteorder.  So, you'll need to compensate for that here.

> +    return H_SUCCESS;
> +}
> +
> +
> +static target_ulong h_scm_write_metadata(PowerPCCPU *cpu,
> +                                         SpaprMachineState *spapr,
> +                                         target_ulong opcode,
> +                                         target_ulong *args)
> +{
> +    uint32_t drc_index = args[0];
> +    uint64_t offset = args[1];
> +    uint64_t data = args[2];
> +    int8_t numBytesToWrite = args[3];
> +    SpaprDrc *drc = spapr_drc_by_index(drc_index);
> +    NVDIMMDevice *nvdimm = NULL;
> +    DeviceState *dev = NULL;
> +    NVDIMMClass *ddc = NULL;
> +
> +    if (drc && spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
> +        return H_PARAMETER;
> +    }
> +
> +    if (numBytesToWrite != 1 && numBytesToWrite != 2 &&
> +        numBytesToWrite != 4 && numBytesToWrite != 8) {
> +        return H_P4;
> +    }
> +
> +    dev = drc->dev;
> +    nvdimm = NVDIMM(dev);
> +    if ((nvdimm->label_size < numBytesToWrite + offset) ||
> +        (offset + numBytesToWrite < offset)) {
> +        return H_P2;
> +    }
> +
> +    ddc = NVDIMM_GET_CLASS(nvdimm);
> +    ddc->write_label_data(nvdimm, &data, numBytesToWrite, offset);

Likewise here.

> +
> +    return H_SUCCESS;
> +}
> +
> +static target_ulong h_scm_bind_mem(PowerPCCPU *cpu, SpaprMachineState *spapr,
> +                                        target_ulong opcode,
> +                                        target_ulong *args)
> +{
> +    uint32_t drc_index = args[0];
> +    uint64_t starting_idx = args[1];
> +    uint64_t no_of_scm_blocks_to_bind = args[2];
> +    uint64_t target_logical_mem_addr = args[3];
> +    uint64_t continue_token = args[4];
> +    uint64_t size;
> +    uint64_t total_no_of_scm_blocks;
> +
> +    SpaprDrc *drc = spapr_drc_by_index(drc_index);
> +    hwaddr addr;
> +    DeviceState *dev = NULL;
> +    PCDIMMDevice *dimm = NULL;
> +    Error *local_err = NULL;
> +
> +    if (drc && spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
> +        return H_PARAMETER;
> +    }
> +
> +    dev = drc->dev;
> +    dimm = PC_DIMM(dev);
> +
> +    size = object_property_get_uint(OBJECT(dimm),
> +                                    PC_DIMM_SIZE_PROP, &local_err);
> +    if (local_err) {
> +        error_report_err(local_err);
> +        return H_PARAMETER;
> +    }
> +
> +    total_no_of_scm_blocks = size / SPAPR_MINIMUM_SCM_BLOCK_SIZE;
> +
> +    if ((starting_idx > total_no_of_scm_blocks) ||
> +        (no_of_scm_blocks_to_bind > total_no_of_scm_blocks)) {
> +        return H_P2;
> +    }
> +
> +    if (((starting_idx + no_of_scm_blocks_to_bind) < starting_idx) ||
> +        ((starting_idx + no_of_scm_blocks_to_bind) > total_no_of_scm_blocks)) {
> +        return H_P3;
> +    }
> +
> +    /* Currently qemu assigns the address. */
> +    if (target_logical_mem_addr != 0xffffffffffffffff) {
> +        return H_OVERLAP;

So, only supporting one mode of the interface is reasonable.  However,
it seems a somewhat unnatural way to handle the PAPR interface.  It
seems to me it would be more natural to instantiate the underlying
NVDIMM objects so that they're not in address_space_memory, but rather
in their own initially inaccssible address_space.  Then the BIND call
would alias a chunk of address_space_memory into the NVDIMMs address
space.

> +    }
> +
> +    /*
> +     * Currently continue token should be zero qemu has already bound
> +     * everything and this hcall doesnt return H_BUSY.
> +     */
> +    if (continue_token > 0) {
> +        return H_P5;
> +    }
> +
> +    /* NB : Already bound, Return target logical address in R4 */
> +    addr = object_property_get_uint(OBJECT(dimm),
> +                                    PC_DIMM_ADDR_PROP, &local_err);
> +    if (local_err) {
> +        error_report_err(local_err);
> +        return H_PARAMETER;
> +    }
> +
> +    args[1] = addr;

Don't you need to adjust this if start_idx is non-zero?

> +    args[2] = no_of_scm_blocks_to_bind;
> +
> +    return H_SUCCESS;
> +}
> +
> +static target_ulong h_scm_unbind_mem(PowerPCCPU *cpu, SpaprMachineState *spapr,
> +                                        target_ulong opcode,
> +                                        target_ulong *args)
> +{
> +    uint32_t drc_index = args[0];
> +    uint64_t starting_scm_logical_addr = args[1];
> +    uint64_t no_of_scm_blocks_to_unbind = args[2];
> +    uint64_t size_to_unbind;
> +    uint64_t continue_token = args[3];
> +    Range blockrange = range_empty;
> +    Range nvdimmrange = range_empty;
> +    SpaprDrc *drc = spapr_drc_by_index(drc_index);
> +    DeviceState *dev = NULL;
> +    PCDIMMDevice *dimm = NULL;
> +    uint64_t size, addr;
> +
> +    if (drc && spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
> +        return H_PARAMETER;
> +    }
> +
> +    /* Check if starting_scm_logical_addr is block aligned */
> +    if (!QEMU_IS_ALIGNED(starting_scm_logical_addr,
> +                         SPAPR_MINIMUM_SCM_BLOCK_SIZE)) {
> +        return H_P2;
> +    }
> +
> +    dev = drc->dev;
> +    dimm = PC_DIMM(dev);
> +    size = object_property_get_int(OBJECT(dimm), PC_DIMM_SIZE_PROP, NULL);
> +    addr = object_property_get_int(OBJECT(dimm), PC_DIMM_ADDR_PROP, NULL);
> +
> +    range_init_nofail(&nvdimmrange, addr, size);
> +
> +    size_to_unbind = no_of_scm_blocks_to_unbind * SPAPR_MINIMUM_SCM_BLOCK_SIZE;
> +
> +
> +    range_init_nofail(&blockrange, starting_scm_logical_addr, size_to_unbind);
> +
> +    if (!range_contains_range(&nvdimmrange, &blockrange)) {
> +        return H_P3;
> +    }
> +
> +    if (continue_token > 0) {
> +        return H_P3;
> +    }
> +
> +    args[1] = no_of_scm_blocks_to_unbind;
> +
> +    /*NB : dont do anything, let object_del take care of this for now. */
> +    return H_SUCCESS;
> +}
> +
>  static spapr_hcall_fn papr_hypercall_table[(MAX_HCALL_OPCODE / 4) + 1];
>  static spapr_hcall_fn kvmppc_hypercall_table[KVMPPC_HCALL_MAX - KVMPPC_HCALL_BASE + 1];
>  
> @@ -1894,6 +2090,12 @@ static void hypercall_register_types(void)
>      /* qemu/KVM-PPC specific hcalls */
>      spapr_register_hypercall(KVMPPC_H_RTAS, h_rtas);
>  
> +    /* qemu/scm specific hcalls */
> +    spapr_register_hypercall(H_SCM_READ_METADATA, h_scm_read_metadata);
> +    spapr_register_hypercall(H_SCM_WRITE_METADATA, h_scm_write_metadata);
> +    spapr_register_hypercall(H_SCM_BIND_MEM, h_scm_bind_mem);
> +    spapr_register_hypercall(H_SCM_UNBIND_MEM, h_scm_unbind_mem);
> +
>      /* ibm,client-architecture-support support */
>      spapr_register_hypercall(KVMPPC_H_CAS, h_client_architecture_support);
>  
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 394ea26335..48e2cc9d67 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -283,6 +283,7 @@ struct SpaprMachineState {
>  #define H_P7              -60
>  #define H_P8              -61
>  #define H_P9              -62
> +#define H_OVERLAP         -68
>  #define H_UNSUPPORTED_FLAG -256
>  #define H_MULTI_THREADS_ACTIVE -9005
>  
> @@ -490,8 +491,12 @@ struct SpaprMachineState {
>  #define H_INT_ESB               0x3C8
>  #define H_INT_SYNC              0x3CC
>  #define H_INT_RESET             0x3D0
> +#define H_SCM_READ_METADATA     0x3E4
> +#define H_SCM_WRITE_METADATA    0x3E8
> +#define H_SCM_BIND_MEM          0x3EC
> +#define H_SCM_UNBIND_MEM        0x3F0
>  
> -#define MAX_HCALL_OPCODE        H_INT_RESET
> +#define MAX_HCALL_OPCODE        H_SCM_UNBIND_MEM
>  
>  /* The hcalls above are standardized in PAPR and implemented by pHyp
>   * as well.
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 2/3] spapr: Add NVDIMM device support
  2019-05-14  2:22   ` David Gibson
@ 2019-05-15  7:00     ` Shivaprasad G Bhat
  2019-05-16  1:32       ` David Gibson
  0 siblings, 1 reply; 16+ messages in thread
From: Shivaprasad G Bhat @ 2019-05-15  7:00 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-devel, imammedo, qemu-ppc, xiaoguangrong.eric, mst

Hi David,

Thanks for the comments. Replies inline..

On 05/14/2019 07:52 AM, David Gibson wrote:
> On Mon, May 13, 2019 at 04:28:02AM -0500, Shivaprasad G Bhat wrote:
>> Add support for NVDIMM devices for sPAPR. Piggyback on existing nvdimm
>> device interface in QEMU to support virtual NVDIMM devices for Power (May have
>> to re-look at this later).  Create the required DT entries for the
>> device (some entries have dummy values right now).
>>
>> The patch creates the required DT node and sends a hotplug
>> interrupt to the guest. Guest is expected to undertake the normal
>> DR resource add path in response and start issuing PAPR SCM hcalls.
>>
>> This is how it can be used ..
>> Add nvdimm=on to the qemu machine argument.
>> Ex : -machine pseries,nvdimm=on
>> For coldplug, the device to be added in qemu command line as shown below
>> -object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0,share=yes,size=1073872896
>> -device nvdimm,label-size=128k,uuid=75a3cdd7-6a2f-4791-8d15-fe0a920e8e9e,memdev=memnvdimm0,id=nvdimm0,slot=0
>>
>> For hotplug, the device to be added from monitor as below
>> object_add memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0,share=yes,size=1073872896
>> device_add nvdimm,label-size=128k,uuid=75a3cdd7-6a2f-4791-8d15-fe0a920e8e9e,memdev=memnvdimm0,id=nvdimm0,slot=0
>>
>> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
>> Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
>>                 [Early implementation]
>> ---
>> ---
>>   default-configs/ppc64-softmmu.mak |    1
>>   hw/mem/Kconfig                    |    2
>>   hw/mem/nvdimm.c                   |   43 ++++++++
>>   hw/ppc/spapr.c                    |  202 +++++++++++++++++++++++++++++++++++--
>>   hw/ppc/spapr_drc.c                |   18 +++
>>   hw/ppc/spapr_events.c             |    4 +
>>   include/hw/mem/nvdimm.h           |    6 +
>>   include/hw/ppc/spapr.h            |   12 ++
>>   include/hw/ppc/spapr_drc.h        |    9 ++
>>   9 files changed, 286 insertions(+), 11 deletions(-)
>>
>> diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
>> index cca52665d9..ae0841fa3a 100644
>> --- a/default-configs/ppc64-softmmu.mak
>> +++ b/default-configs/ppc64-softmmu.mak
>> @@ -8,3 +8,4 @@ CONFIG_POWERNV=y
>>   
>>   # For pSeries
>>   CONFIG_PSERIES=y
>> +CONFIG_NVDIMM=y
>> diff --git a/hw/mem/Kconfig b/hw/mem/Kconfig
>> index 620fd4cb59..2ad052a536 100644
>> --- a/hw/mem/Kconfig
>> +++ b/hw/mem/Kconfig
>> @@ -8,4 +8,4 @@ config MEM_DEVICE
>>   config NVDIMM
>>       bool
>>       default y
>> -    depends on PC
>> +    depends on (PC || PSERIES)
>> diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
>> index f221ec7a9a..deaebaaaa5 100644
>> --- a/hw/mem/nvdimm.c
>> +++ b/hw/mem/nvdimm.c
>> @@ -93,11 +93,54 @@ out:
>>       error_propagate(errp, local_err);
>>   }
>>   
>> +static void nvdimm_get_uuid(Object *obj, Visitor *v, const char *name,
>> +                                  void *opaque, Error **errp)
>> +{
>> +    NVDIMMDevice *nvdimm = NVDIMM(obj);
>> +    char *value = NULL;
>> +
>> +    value = qemu_uuid_unparse_strdup(&nvdimm->uuid);
>> +
>> +    visit_type_str(v, name, &value, errp);
>> +}
>> +
>> +
>> +static void nvdimm_set_uuid(Object *obj, Visitor *v, const char *name,
>> +                                  void *opaque, Error **errp)
>> +{
>> +    NVDIMMDevice *nvdimm = NVDIMM(obj);
>> +    Error *local_err = NULL;
>> +    char *value;
>> +
>> +    visit_type_str(v, name, &value, &local_err);
>> +    if (local_err) {
>> +        goto out;
>> +    }
>> +
>> +    if (strcmp(value, "") == 0) {
>> +        error_setg(&local_err, "Property '%s.%s' %s is required"
>> +                   " at least 0x%lx", object_get_typename(obj),
>> +                   name, value, MIN_NAMESPACE_LABEL_SIZE);
>> +        goto out;
>> +    }
>> +
>> +    if (qemu_uuid_parse(value, &nvdimm->uuid) != 0) {
>> +        error_setg(errp, "Invalid UUID");
>> +        return;
>> +    }
>> +out:
>> +    error_propagate(errp, local_err);
>> +}
>> +
>> +
>>   static void nvdimm_init(Object *obj)
>>   {
>>       object_property_add(obj, NVDIMM_LABEL_SIZE_PROP, "int",
>>                           nvdimm_get_label_size, nvdimm_set_label_size, NULL,
>>                           NULL, NULL);
>> +
>> +    object_property_add(obj, NVDIMM_UUID_PROP, "QemuUUID", nvdimm_get_uuid,
>> +                        nvdimm_set_uuid, NULL, NULL, NULL);
>>   }
>>   
>>   static void nvdimm_finalize(Object *obj)
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 2ef3ce4362..b6951577e7 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -74,6 +74,7 @@
>>   #include "qemu/cutils.h"
>>   #include "hw/ppc/spapr_cpu_core.h"
>>   #include "hw/mem/memory-device.h"
>> +#include "hw/mem/nvdimm.h"
>>   
>>   #include <libfdt.h>
>>   
>> @@ -699,6 +700,7 @@ static int spapr_populate_drmem_v2(SpaprMachineState *spapr, void *fdt,
>>       uint8_t *int_buf, *cur_index;
>>       int ret;
>>       uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
>> +    uint64_t scm_block_size = SPAPR_MINIMUM_SCM_BLOCK_SIZE;
>>       uint64_t addr, cur_addr, size;
>>       uint32_t nr_boot_lmbs = (machine->device_memory->base / lmb_size);
>>       uint64_t mem_end = machine->device_memory->base +
>> @@ -735,12 +737,20 @@ static int spapr_populate_drmem_v2(SpaprMachineState *spapr, void *fdt,
>>               nr_entries++;
>>           }
>>   
>> -        /* Entry for DIMM */
>> -        drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
>> -        g_assert(drc);
>> -        elem = spapr_get_drconf_cell(size / lmb_size, addr,
>> -                                     spapr_drc_index(drc), node,
>> -                                     SPAPR_LMB_FLAGS_ASSIGNED);
>> +        if (info->value->type == MEMORY_DEVICE_INFO_KIND_NVDIMM) {
>> +            /* Entry for NVDIMM */
>> +            drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM, addr / scm_block_size);
>> +            g_assert(drc);
>> +            elem = spapr_get_drconf_cell(size / scm_block_size, addr,
>> +                                         spapr_drc_index(drc), -1, 0);
>> +        } else {
>> +            /* Entry for DIMM */
>> +            drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
>> +            g_assert(drc);
>> +            elem = spapr_get_drconf_cell(size / lmb_size, addr,
>> +                                         spapr_drc_index(drc), node,
>> +                                         SPAPR_LMB_FLAGS_ASSIGNED);
>> +        }
>>           QSIMPLEQ_INSERT_TAIL(&drconf_queue, elem, entry);
>>           nr_entries++;
>>           cur_addr = addr + size;
>> @@ -1235,6 +1245,87 @@ static void spapr_dt_hypervisor(SpaprMachineState *spapr, void *fdt)
>>       }
>>   }
>>   
>> +static int spapr_populate_nvdimm_node(void *fdt, int parent_offset,
>> +                                      NVDIMMDevice *nvdimm)
> spapr_dt_*() is the new preferred naming convention for DT creation
> functions.

Okay

>> +{
>> +    int child_offset;
>> +    char buf[40];
>> +    SpaprDrc *drc;
>> +    uint32_t drc_idx;
>> +    uint32_t node = object_property_get_uint(OBJECT(nvdimm), PC_DIMM_NODE_PROP,
>> +                                             &error_abort);
>> +    uint64_t addr = object_property_get_uint(OBJECT(nvdimm), PC_DIMM_ADDR_PROP,
>> +                                             &error_abort);
>> +    uint32_t associativity[] = {
>> +        cpu_to_be32(0x4), /* length */
>> +        cpu_to_be32(0x0), cpu_to_be32(0x0),
>> +        cpu_to_be32(0x0), cpu_to_be32(node)
>> +    };
>> +    uint64_t lsize = nvdimm->label_size;
>> +    uint64_t size = object_property_get_int(OBJECT(nvdimm), PC_DIMM_SIZE_PROP,
>> +                                            NULL);
>> +
>> +    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM,
>> +                          addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
>> +    g_assert(drc);
>> +
>> +    drc_idx = spapr_drc_index(drc);
>> +
>> +    sprintf(buf, "pmem@%x", drc_idx);
> This doesn't look right.  According to the PAPR fragment I have, the
> name is supposed to be "ibm,pmemory" and the reg is supposed to be
> (addr, size) pairs, not DRC indices.
>
> Or do I have an out of date document?

Ah, you are right. I missed it, this went unnoticed as all my test cases
were to check pmem functionality and the current kernel functional code
doesnt refer this property anywhere.

>
>> +    child_offset = fdt_add_subnode(fdt, parent_offset, buf);
>> +    _FDT(child_offset);
>> +
>> +    _FDT((fdt_setprop_cell(fdt, child_offset, "reg", drc_idx)));
>> +    _FDT((fdt_setprop_string(fdt, child_offset, "compatible", "ibm,pmemory")));
>> +    _FDT((fdt_setprop_string(fdt, child_offset, "device_type", "ibm,pmemory")));
>> +
>> +    _FDT((fdt_setprop(fdt, child_offset, "ibm,associativity", associativity,
>> +                      sizeof(associativity))));
>> +
>> +    qemu_uuid_unparse(&nvdimm->uuid, buf);
>> +    _FDT((fdt_setprop_string(fdt, child_offset, "ibm,unit-guid", buf)));
>> +
>> +    _FDT((fdt_setprop_cell(fdt, child_offset, "ibm,my-drc-index", drc_idx)));
>> +
>> +    /*NB : What it should be? */
> If you don't have a good value, can you just leave it out?

Okay

>
>> +    _FDT(fdt_setprop_cell(fdt, child_offset, "ibm,latency-attribute", 828));
>> +
>> +    _FDT((fdt_setprop_u64(fdt, child_offset, "ibm,block-size",
>> +                          SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
>> +    _FDT((fdt_setprop_u64(fdt, child_offset, "ibm,number-of-blocks",
>> +                          size / SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
>> +    _FDT((fdt_setprop_cell(fdt, child_offset, "ibm,metadata-size", lsize)));
>> +
>> +    return child_offset;
>> +}
>> +
>> +static void spapr_dt_nvdimm(void *fdt)
> I'd suggest spapr_dt_nvdimms() or spapr_dt_persistent_memory(), with
> spapr_dt_nvdimm() for the single-NVDIMM functions earlier.

Okay

>
>> +{
>> +    int offset = fdt_subnode_offset(fdt, 0, "persistent-memory");
>> +    GSList *dimms = NULL;
>> +
>> +    if (offset < 0) {
>> +        offset = fdt_add_subnode(fdt, 0, "persistent-memory");
>> +        _FDT(offset);
>> +        _FDT((fdt_setprop_cell(fdt, offset, "#address-cells", 0x2)));
> Hmm.  So the docuiment I have specifies #address-cells = 1 here, but
> that doesn't make sense with the reg format it specifies for the
> children.  So *something* is screwy.  Is there an up to date PAPR
> specification for persistent memory I can get?

Document is uptodate. I'll correct this part as well.

>
>> +        _FDT((fdt_setprop_cell(fdt, offset, "#size-cells", 0x0)));
>> +        _FDT((fdt_setprop_string(fdt, offset, "device_type",
>> +                                 "ibm,persistent-memory")));
>> +    }
>> +
>> +    /*NB : Add drc-info array here */
>> +
>> +    /* Create DT entries for cold plugged NVDIMM devices */
>> +    dimms = nvdimm_get_device_list();
>> +    for (; dimms; dimms = dimms->next) {
>> +        NVDIMMDevice *nvdimm = dimms->data;
>> +
>> +        spapr_populate_nvdimm_node(fdt, offset, nvdimm);
>> +    }
>> +    g_slist_free(dimms);
>> +    return;
>> +}
>> +
>>   static void *spapr_build_fdt(SpaprMachineState *spapr)
>>   {
>>       MachineState *machine = MACHINE(spapr);
>> @@ -1368,6 +1459,11 @@ static void *spapr_build_fdt(SpaprMachineState *spapr)
>>           }
>>       }
>>   
>> +    /* NVDIMM devices */
>> +    if (spapr->nvdimm_enabled) {
>> +        spapr_dt_nvdimm(fdt);
>> +    }
>> +
>>       return fdt;
>>   }
>>   
>> @@ -3324,6 +3420,20 @@ static void spapr_set_host_serial(Object *obj, const char *value, Error **errp)
>>       spapr->host_serial = g_strdup(value);
>>   }
>>   
>> +static bool spapr_get_nvdimm(Object *obj, Error **errp)
>> +{
>> +    SpaprMachineState *spapr = SPAPR_MACHINE(obj);
>> +
>> +    return spapr->nvdimm_enabled;
>> +}
>> +
>> +static void spapr_set_nvdimm(Object *obj, bool value, Error **errp)
>> +{
>> +    SpaprMachineState *spapr = SPAPR_MACHINE(obj);
>> +
>> +    spapr->nvdimm_enabled = value;
>> +}
>> +
>>   static void spapr_instance_init(Object *obj)
>>   {
>>       SpaprMachineState *spapr = SPAPR_MACHINE(obj);
>> @@ -3380,6 +3490,12 @@ static void spapr_instance_init(Object *obj)
>>           &error_abort);
>>       object_property_set_description(obj, "host-serial",
>>           "Host serial number to advertise in guest device tree", &error_abort);
>> +
>> +    object_property_add_bool(obj, "nvdimm",
>> +                            spapr_get_nvdimm, spapr_set_nvdimm, NULL);
> Is there actually a reason to have a property for this, rather than
> just always having it on?

No reason, just keeps inline with x86 workflow for nvdimm. Since its only
a limitation where one cannot add nvdimm if not started with nvdimm=on,
I'll remove it.

I think I still have to tie nvdimm to pseries machine version for the
supportability checks just the way memory and cpu hotplugs are tied to.

>> +    object_property_set_description(obj, "nvdimm",
>> +                                    "Enable support for nvdimm devices",
>> +                                    NULL);
>>   }
>>   
>>   static void spapr_machine_finalizefn(Object *obj)
>> @@ -3404,6 +3520,16 @@ static void spapr_nmi(NMIState *n, int cpu_index, Error **errp)
>>       }
>>   }
>>   
>> +int spapr_pmem_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
>> +                           void *fdt, int *fdt_start_offset, Error **errp)
>> +{
>> +    NVDIMMDevice *nvdimm = NVDIMM(drc->dev);
>> +
>> +    *fdt_start_offset = spapr_populate_nvdimm_node(fdt, 0, nvdimm);
>> +
>> +    return 0;
>> +}
>> +
>>   int spapr_lmb_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
>>                             void *fdt, int *fdt_start_offset, Error **errp)
>>   {
>> @@ -3466,12 +3592,37 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t size,
>>       }
>>   }
>>   
>> +static void spapr_add_nvdimm(DeviceState *dev, uint64_t addr, Error **errp)
>> +{
>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_hotplug_handler(dev));
>> +    SpaprDrc *drc;
>> +    bool hotplugged = spapr_drc_hotplugged(dev);
>> +    Error *local_err = NULL;
>> +
>> +    spapr_dr_connector_new(OBJECT(spapr), TYPE_SPAPR_DRC_PMEM,
>> +                           addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
> This still seems bogus to me.  The whole point of DRCs is that they
> exist *before* the device is plugged as a handle for the guest side
> plug mechanisms.  Creating them only when the device is added doesn't
> make sense.

We don't have a known maxpmem like maxmem or maxcpu, so we don't know 
how many
DRCs to create.

>
>> +    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM,
>> +                          addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
>> +    g_assert(drc);
>> +
>> +    spapr_drc_attach(drc, dev, &local_err);
>> +    if (local_err) {
>> +        error_propagate(errp, local_err);
>> +        return;
>> +    }
>> +
>> +    if (hotplugged) {
>> +        spapr_hotplug_req_add_by_index(drc);
>> +    }
>> +}
>> +
>>   static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>                                 Error **errp)
>>   {
>>       Error *local_err = NULL;
>>       SpaprMachineState *ms = SPAPR_MACHINE(hotplug_dev);
>>       PCDIMMDevice *dimm = PC_DIMM(dev);
>> +    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
>>       uint64_t size, addr;
>>   
>>       size = memory_device_get_region_size(MEMORY_DEVICE(dev), &error_abort);
>> @@ -3487,8 +3638,14 @@ static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>           goto out_unplug;
>>       }
>>   
>> -    spapr_add_lmbs(dev, addr, size, spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
>> -                   &local_err);
>> +    if (!is_nvdimm) {
>> +        spapr_add_lmbs(dev, addr, size,
>> +                       spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
>> +                       &local_err);
>> +    } else {
>> +        spapr_add_nvdimm(dev, addr, &local_err);
>> +    }
>> +
>>       if (local_err) {
>>           goto out_unplug;
>>       }
>> @@ -3506,6 +3663,7 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>   {
>>       const SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(hotplug_dev);
>>       SpaprMachineState *spapr = SPAPR_MACHINE(hotplug_dev);
>> +    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
>>       PCDIMMDevice *dimm = PC_DIMM(dev);
>>       Error *local_err = NULL;
>>       uint64_t size;
>> @@ -3523,10 +3681,28 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>           return;
>>       }
>>   
>> -    if (size % SPAPR_MEMORY_BLOCK_SIZE) {
>> +    if (!is_nvdimm && size % SPAPR_MEMORY_BLOCK_SIZE) {
>>           error_setg(errp, "Hotplugged memory size must be a multiple of "
>> -                      "%" PRIu64 " MB", SPAPR_MEMORY_BLOCK_SIZE / MiB);
>> +                          "%" PRIu64 " MB", SPAPR_MEMORY_BLOCK_SIZE / MiB);
>>           return;
>> +    } else if (is_nvdimm) {
>> +        char *uuidstr = NULL;
>> +        QemuUUID uuid;
>> +        if (size % SPAPR_MINIMUM_SCM_BLOCK_SIZE) {
>> +            error_setg(errp, "NVDIMM memory size excluding the label area"
> Is this reference to the label area still accurate?

Yes. If the anytime the base_address for dimm gets unaligned to 256MiB 
anytime, the subsequent
memory hotplug fails in the kernel.

>
>> +                       " must be a multiple of "
>> +                       "%" PRIu64 "MB",
>> +                       SPAPR_MINIMUM_SCM_BLOCK_SIZE / MiB);
>> +            return;
>> +        }
>> +
>> +        uuidstr = object_property_get_str(OBJECT(dimm), NVDIMM_UUID_PROP, NULL);
>> +        qemu_uuid_parse(uuidstr, &uuid);
>> +        if (qemu_uuid_is_null(&uuid)) {
>> +            error_setg(errp, "NVDIMM device requires the uuid to be set");
>> +            return;
>> +        }
>> +        g_free(uuidstr);
>>       }
>>   
>>       memdev = object_property_get_link(OBJECT(dimm), PC_DIMM_MEMDEV_PROP,
>> @@ -3666,6 +3842,12 @@ static void spapr_memory_unplug_request(HotplugHandler *hotplug_dev,
>>       int i;
>>       SpaprDrc *drc;
>>   
>> +    if (object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM)) {
>> +        error_setg(&local_err,
>> +                   "nvdimm device hot unplug is not supported yet.");
>> +        goto out;
>> +    }
>> +
>>       size = memory_device_get_region_size(MEMORY_DEVICE(dimm), &error_abort);
>>       nr_lmbs = size / SPAPR_MEMORY_BLOCK_SIZE;
>>   
>> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
>> index 597f236b9c..983440a711 100644
>> --- a/hw/ppc/spapr_drc.c
>> +++ b/hw/ppc/spapr_drc.c
>> @@ -707,6 +707,17 @@ static void spapr_drc_phb_class_init(ObjectClass *k, void *data)
>>       drck->dt_populate = spapr_phb_dt_populate;
>>   }
>>   
>> +static void spapr_drc_pmem_class_init(ObjectClass *k, void *data)
>> +{
>> +    SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_CLASS(k);
>> +
>> +    drck->typeshift = SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM;
>> +    drck->typename = "MEM";
>> +    drck->drc_name_prefix = "PMEM ";
>> +    drck->release = NULL;
>> +    drck->dt_populate = spapr_pmem_dt_populate;
>> +}
>> +
>>   static const TypeInfo spapr_dr_connector_info = {
>>       .name          = TYPE_SPAPR_DR_CONNECTOR,
>>       .parent        = TYPE_DEVICE,
>> @@ -757,6 +768,12 @@ static const TypeInfo spapr_drc_phb_info = {
>>       .class_init    = spapr_drc_phb_class_init,
>>   };
>>   
>> +static const TypeInfo spapr_drc_pmem_info = {
>> +    .name          = TYPE_SPAPR_DRC_PMEM,
>> +    .parent        = TYPE_SPAPR_DRC_LOGICAL,
>> +    .class_init    = spapr_drc_pmem_class_init,
>> +};
>> +
>>   /* helper functions for external users */
>>   
>>   SpaprDrc *spapr_drc_by_index(uint32_t index)
>> @@ -1226,6 +1243,7 @@ static void spapr_drc_register_types(void)
>>       type_register_static(&spapr_drc_pci_info);
>>       type_register_static(&spapr_drc_lmb_info);
>>       type_register_static(&spapr_drc_phb_info);
>> +    type_register_static(&spapr_drc_pmem_info);
>>   
>>       spapr_rtas_register(RTAS_SET_INDICATOR, "set-indicator",
>>                           rtas_set_indicator);
>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>> index ae0f093f59..1141203a87 100644
>> --- a/hw/ppc/spapr_events.c
>> +++ b/hw/ppc/spapr_events.c
>> @@ -193,6 +193,7 @@ struct rtas_event_log_v6_hp {
>>   #define RTAS_LOG_V6_HP_TYPE_SLOT                         3
>>   #define RTAS_LOG_V6_HP_TYPE_PHB                          4
>>   #define RTAS_LOG_V6_HP_TYPE_PCI                          5
>> +#define RTAS_LOG_V6_HP_TYPE_PMEM                         6
>>       uint8_t hotplug_action;
>>   #define RTAS_LOG_V6_HP_ACTION_ADD                        1
>>   #define RTAS_LOG_V6_HP_ACTION_REMOVE                     2
>> @@ -529,6 +530,9 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
>>       case SPAPR_DR_CONNECTOR_TYPE_PHB:
>>           hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PHB;
>>           break;
>> +    case SPAPR_DR_CONNECTOR_TYPE_PMEM:
>> +        hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PMEM;
>> +        break;
>>       default:
>>           /* we shouldn't be signaling hotplug events for resources
>>            * that don't support them
>> diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
>> index bad4fc04b5..3089615e17 100644
>> --- a/include/hw/mem/nvdimm.h
>> +++ b/include/hw/mem/nvdimm.h
>> @@ -49,6 +49,7 @@
>>                                                  TYPE_NVDIMM)
>>   
>>   #define NVDIMM_LABEL_SIZE_PROP "label-size"
>> +#define NVDIMM_UUID_PROP "uuid"
>>   #define NVDIMM_UNARMED_PROP    "unarmed"
>>   
>>   struct NVDIMMDevice {
>> @@ -83,6 +84,11 @@ struct NVDIMMDevice {
>>        * the guest write persistence.
>>        */
>>       bool unarmed;
>> +
>> +    /*
>> +     * The PPC64 - spapr requires each nvdimm device have a uuid.
>> +     */
>> +    QemuUUID uuid;
>>   };
>>   typedef struct NVDIMMDevice NVDIMMDevice;
>>   
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 7e32f309c2..394ea26335 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -202,6 +202,7 @@ struct SpaprMachineState {
>>       SpaprCapabilities def, eff, mig;
>>   
>>       unsigned gpu_numa_id;
>> +    bool nvdimm_enabled;
>>   };
>>   
>>   #define H_SUCCESS         0
>> @@ -794,6 +795,8 @@ int spapr_core_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
>>   void spapr_lmb_release(DeviceState *dev);
>>   int spapr_lmb_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
>>                             void *fdt, int *fdt_start_offset, Error **errp);
>> +int spapr_pmem_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
>> +                          void *fdt, int *fdt_start_offset, Error **errp);
>>   void spapr_phb_release(DeviceState *dev);
>>   int spapr_phb_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
>>                             void *fdt, int *fdt_start_offset, Error **errp);
>> @@ -829,6 +832,15 @@ int spapr_rtc_import_offset(SpaprRtcState *rtc, int64_t legacy_offset);
>>   #define SPAPR_LMB_FLAGS_DRC_INVALID 0x00000020
>>   #define SPAPR_LMB_FLAGS_RESERVED 0x00000080
>>   
>> +/*
>> + * The nvdimm size should be aligned to SCM block size.
>> + * The SCM block size should be aligned to SPAPR_MEMORY_BLOCK_SIZE
>> + * inorder to have SCM regions not to overlap with dimm memory regions.
>> + * The SCM devices can have variable block sizes. For now, fixing the
>> + * block size to the minimum value.
>> + */
>> +#define SPAPR_MINIMUM_SCM_BLOCK_SIZE SPAPR_MEMORY_BLOCK_SIZE
>> +
>>   void spapr_do_system_reset_on_cpu(CPUState *cs, run_on_cpu_data arg);
>>   
>>   #define HTAB_SIZE(spapr)        (1ULL << ((spapr)->htab_shift))
>> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
>> index fad0a887f9..8b7ce41a0f 100644
>> --- a/include/hw/ppc/spapr_drc.h
>> +++ b/include/hw/ppc/spapr_drc.h
>> @@ -79,6 +79,13 @@
>>   #define SPAPR_DRC_PHB(obj) OBJECT_CHECK(SpaprDrc, (obj), \
>>                                           TYPE_SPAPR_DRC_PHB)
>>   
>> +#define TYPE_SPAPR_DRC_PMEM "spapr-drc-pmem"
>> +#define SPAPR_DRC_PMEM_GET_CLASS(obj) \
>> +        OBJECT_GET_CLASS(SpaprDrcClass, obj, TYPE_SPAPR_DRC_PMEM)
>> +#define SPAPR_DRC_PMEM_CLASS(klass) \
>> +        OBJECT_CLASS_CHECK(SpaprDrcClass, klass, TYPE_SPAPR_DRC_PMEM)
>> +#define SPAPR_DRC_PMEM(obj) OBJECT_CHECK(SpaprDrc, (obj), \
>> +                                        TYPE_SPAPR_DRC_PMEM)
>>   /*
>>    * Various hotplug types managed by SpaprDrc
>>    *
>> @@ -96,6 +103,7 @@ typedef enum {
>>       SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO = 3,
>>       SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI = 4,
>>       SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB = 8,
>> +    SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM = 9,
>>   } SpaprDrcTypeShift;
>>   
>>   typedef enum {
>> @@ -105,6 +113,7 @@ typedef enum {
>>       SPAPR_DR_CONNECTOR_TYPE_VIO = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO,
>>       SPAPR_DR_CONNECTOR_TYPE_PCI = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI,
>>       SPAPR_DR_CONNECTOR_TYPE_LMB = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB,
>> +    SPAPR_DR_CONNECTOR_TYPE_PMEM = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM,
>>   } SpaprDrcType;
>>   
>>   /*
>>



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 2/3] spapr: Add NVDIMM device support
  2019-05-15  7:00     ` Shivaprasad G Bhat
@ 2019-05-16  1:32       ` David Gibson
  2019-05-17  7:42         ` Shivaprasad G Bhat
  0 siblings, 1 reply; 16+ messages in thread
From: David Gibson @ 2019-05-16  1:32 UTC (permalink / raw)
  To: Shivaprasad G Bhat
  Cc: qemu-devel, imammedo, qemu-ppc, xiaoguangrong.eric, mst

[-- Attachment #1: Type: text/plain, Size: 28536 bytes --]

On Wed, May 15, 2019 at 12:30:07PM +0530, Shivaprasad G Bhat wrote:
> Hi David,
> 
> Thanks for the comments. Replies inline..
> 
> On 05/14/2019 07:52 AM, David Gibson wrote:
> > On Mon, May 13, 2019 at 04:28:02AM -0500, Shivaprasad G Bhat wrote:
> > > Add support for NVDIMM devices for sPAPR. Piggyback on existing nvdimm
> > > device interface in QEMU to support virtual NVDIMM devices for Power (May have
> > > to re-look at this later).  Create the required DT entries for the
> > > device (some entries have dummy values right now).
> > > 
> > > The patch creates the required DT node and sends a hotplug
> > > interrupt to the guest. Guest is expected to undertake the normal
> > > DR resource add path in response and start issuing PAPR SCM hcalls.
> > > 
> > > This is how it can be used ..
> > > Add nvdimm=on to the qemu machine argument.
> > > Ex : -machine pseries,nvdimm=on
> > > For coldplug, the device to be added in qemu command line as shown below
> > > -object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0,share=yes,size=1073872896
> > > -device nvdimm,label-size=128k,uuid=75a3cdd7-6a2f-4791-8d15-fe0a920e8e9e,memdev=memnvdimm0,id=nvdimm0,slot=0
> > > 
> > > For hotplug, the device to be added from monitor as below
> > > object_add memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0,share=yes,size=1073872896
> > > device_add nvdimm,label-size=128k,uuid=75a3cdd7-6a2f-4791-8d15-fe0a920e8e9e,memdev=memnvdimm0,id=nvdimm0,slot=0
> > > 
> > > Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
> > > Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
> > >                 [Early implementation]
> > > ---
> > > ---
> > >   default-configs/ppc64-softmmu.mak |    1
> > >   hw/mem/Kconfig                    |    2
> > >   hw/mem/nvdimm.c                   |   43 ++++++++
> > >   hw/ppc/spapr.c                    |  202 +++++++++++++++++++++++++++++++++++--
> > >   hw/ppc/spapr_drc.c                |   18 +++
> > >   hw/ppc/spapr_events.c             |    4 +
> > >   include/hw/mem/nvdimm.h           |    6 +
> > >   include/hw/ppc/spapr.h            |   12 ++
> > >   include/hw/ppc/spapr_drc.h        |    9 ++
> > >   9 files changed, 286 insertions(+), 11 deletions(-)
> > > 
> > > diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
> > > index cca52665d9..ae0841fa3a 100644
> > > --- a/default-configs/ppc64-softmmu.mak
> > > +++ b/default-configs/ppc64-softmmu.mak
> > > @@ -8,3 +8,4 @@ CONFIG_POWERNV=y
> > >   # For pSeries
> > >   CONFIG_PSERIES=y
> > > +CONFIG_NVDIMM=y
> > > diff --git a/hw/mem/Kconfig b/hw/mem/Kconfig
> > > index 620fd4cb59..2ad052a536 100644
> > > --- a/hw/mem/Kconfig
> > > +++ b/hw/mem/Kconfig
> > > @@ -8,4 +8,4 @@ config MEM_DEVICE
> > >   config NVDIMM
> > >       bool
> > >       default y
> > > -    depends on PC
> > > +    depends on (PC || PSERIES)
> > > diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
> > > index f221ec7a9a..deaebaaaa5 100644
> > > --- a/hw/mem/nvdimm.c
> > > +++ b/hw/mem/nvdimm.c
> > > @@ -93,11 +93,54 @@ out:
> > >       error_propagate(errp, local_err);
> > >   }
> > > +static void nvdimm_get_uuid(Object *obj, Visitor *v, const char *name,
> > > +                                  void *opaque, Error **errp)
> > > +{
> > > +    NVDIMMDevice *nvdimm = NVDIMM(obj);
> > > +    char *value = NULL;
> > > +
> > > +    value = qemu_uuid_unparse_strdup(&nvdimm->uuid);
> > > +
> > > +    visit_type_str(v, name, &value, errp);
> > > +}
> > > +
> > > +
> > > +static void nvdimm_set_uuid(Object *obj, Visitor *v, const char *name,
> > > +                                  void *opaque, Error **errp)
> > > +{
> > > +    NVDIMMDevice *nvdimm = NVDIMM(obj);
> > > +    Error *local_err = NULL;
> > > +    char *value;
> > > +
> > > +    visit_type_str(v, name, &value, &local_err);
> > > +    if (local_err) {
> > > +        goto out;
> > > +    }
> > > +
> > > +    if (strcmp(value, "") == 0) {
> > > +        error_setg(&local_err, "Property '%s.%s' %s is required"
> > > +                   " at least 0x%lx", object_get_typename(obj),
> > > +                   name, value, MIN_NAMESPACE_LABEL_SIZE);
> > > +        goto out;
> > > +    }
> > > +
> > > +    if (qemu_uuid_parse(value, &nvdimm->uuid) != 0) {
> > > +        error_setg(errp, "Invalid UUID");
> > > +        return;
> > > +    }
> > > +out:
> > > +    error_propagate(errp, local_err);
> > > +}
> > > +
> > > +
> > >   static void nvdimm_init(Object *obj)
> > >   {
> > >       object_property_add(obj, NVDIMM_LABEL_SIZE_PROP, "int",
> > >                           nvdimm_get_label_size, nvdimm_set_label_size, NULL,
> > >                           NULL, NULL);
> > > +
> > > +    object_property_add(obj, NVDIMM_UUID_PROP, "QemuUUID", nvdimm_get_uuid,
> > > +                        nvdimm_set_uuid, NULL, NULL, NULL);
> > >   }
> > >   static void nvdimm_finalize(Object *obj)
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index 2ef3ce4362..b6951577e7 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -74,6 +74,7 @@
> > >   #include "qemu/cutils.h"
> > >   #include "hw/ppc/spapr_cpu_core.h"
> > >   #include "hw/mem/memory-device.h"
> > > +#include "hw/mem/nvdimm.h"
> > >   #include <libfdt.h>
> > > @@ -699,6 +700,7 @@ static int spapr_populate_drmem_v2(SpaprMachineState *spapr, void *fdt,
> > >       uint8_t *int_buf, *cur_index;
> > >       int ret;
> > >       uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
> > > +    uint64_t scm_block_size = SPAPR_MINIMUM_SCM_BLOCK_SIZE;
> > >       uint64_t addr, cur_addr, size;
> > >       uint32_t nr_boot_lmbs = (machine->device_memory->base / lmb_size);
> > >       uint64_t mem_end = machine->device_memory->base +
> > > @@ -735,12 +737,20 @@ static int spapr_populate_drmem_v2(SpaprMachineState *spapr, void *fdt,
> > >               nr_entries++;
> > >           }
> > > -        /* Entry for DIMM */
> > > -        drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
> > > -        g_assert(drc);
> > > -        elem = spapr_get_drconf_cell(size / lmb_size, addr,
> > > -                                     spapr_drc_index(drc), node,
> > > -                                     SPAPR_LMB_FLAGS_ASSIGNED);
> > > +        if (info->value->type == MEMORY_DEVICE_INFO_KIND_NVDIMM) {
> > > +            /* Entry for NVDIMM */
> > > +            drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM, addr / scm_block_size);
> > > +            g_assert(drc);
> > > +            elem = spapr_get_drconf_cell(size / scm_block_size, addr,
> > > +                                         spapr_drc_index(drc), -1, 0);
> > > +        } else {
> > > +            /* Entry for DIMM */
> > > +            drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
> > > +            g_assert(drc);
> > > +            elem = spapr_get_drconf_cell(size / lmb_size, addr,
> > > +                                         spapr_drc_index(drc), node,
> > > +                                         SPAPR_LMB_FLAGS_ASSIGNED);
> > > +        }
> > >           QSIMPLEQ_INSERT_TAIL(&drconf_queue, elem, entry);
> > >           nr_entries++;
> > >           cur_addr = addr + size;
> > > @@ -1235,6 +1245,87 @@ static void spapr_dt_hypervisor(SpaprMachineState *spapr, void *fdt)
> > >       }
> > >   }
> > > +static int spapr_populate_nvdimm_node(void *fdt, int parent_offset,
> > > +                                      NVDIMMDevice *nvdimm)
> > spapr_dt_*() is the new preferred naming convention for DT creation
> > functions.
> 
> Okay
> 
> > > +{
> > > +    int child_offset;
> > > +    char buf[40];
> > > +    SpaprDrc *drc;
> > > +    uint32_t drc_idx;
> > > +    uint32_t node = object_property_get_uint(OBJECT(nvdimm), PC_DIMM_NODE_PROP,
> > > +                                             &error_abort);
> > > +    uint64_t addr = object_property_get_uint(OBJECT(nvdimm), PC_DIMM_ADDR_PROP,
> > > +                                             &error_abort);
> > > +    uint32_t associativity[] = {
> > > +        cpu_to_be32(0x4), /* length */
> > > +        cpu_to_be32(0x0), cpu_to_be32(0x0),
> > > +        cpu_to_be32(0x0), cpu_to_be32(node)
> > > +    };
> > > +    uint64_t lsize = nvdimm->label_size;
> > > +    uint64_t size = object_property_get_int(OBJECT(nvdimm), PC_DIMM_SIZE_PROP,
> > > +                                            NULL);
> > > +
> > > +    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM,
> > > +                          addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
> > > +    g_assert(drc);
> > > +
> > > +    drc_idx = spapr_drc_index(drc);
> > > +
> > > +    sprintf(buf, "pmem@%x", drc_idx);
> > This doesn't look right.  According to the PAPR fragment I have, the
> > name is supposed to be "ibm,pmemory" and the reg is supposed to be
> > (addr, size) pairs, not DRC indices.
> > 
> > Or do I have an out of date document?
> 
> Ah, you are right. I missed it, this went unnoticed as all my test cases
> were to check pmem functionality and the current kernel functional code
> doesnt refer this property anywhere.
> 
> > 
> > > +    child_offset = fdt_add_subnode(fdt, parent_offset, buf);
> > > +    _FDT(child_offset);
> > > +
> > > +    _FDT((fdt_setprop_cell(fdt, child_offset, "reg", drc_idx)));
> > > +    _FDT((fdt_setprop_string(fdt, child_offset, "compatible", "ibm,pmemory")));
> > > +    _FDT((fdt_setprop_string(fdt, child_offset, "device_type", "ibm,pmemory")));
> > > +
> > > +    _FDT((fdt_setprop(fdt, child_offset, "ibm,associativity", associativity,
> > > +                      sizeof(associativity))));
> > > +
> > > +    qemu_uuid_unparse(&nvdimm->uuid, buf);
> > > +    _FDT((fdt_setprop_string(fdt, child_offset, "ibm,unit-guid", buf)));
> > > +
> > > +    _FDT((fdt_setprop_cell(fdt, child_offset, "ibm,my-drc-index", drc_idx)));
> > > +
> > > +    /*NB : What it should be? */
> > If you don't have a good value, can you just leave it out?
> 
> Okay
> 
> > 
> > > +    _FDT(fdt_setprop_cell(fdt, child_offset, "ibm,latency-attribute", 828));
> > > +
> > > +    _FDT((fdt_setprop_u64(fdt, child_offset, "ibm,block-size",
> > > +                          SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
> > > +    _FDT((fdt_setprop_u64(fdt, child_offset, "ibm,number-of-blocks",
> > > +                          size / SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
> > > +    _FDT((fdt_setprop_cell(fdt, child_offset, "ibm,metadata-size", lsize)));
> > > +
> > > +    return child_offset;
> > > +}
> > > +
> > > +static void spapr_dt_nvdimm(void *fdt)
> > I'd suggest spapr_dt_nvdimms() or spapr_dt_persistent_memory(), with
> > spapr_dt_nvdimm() for the single-NVDIMM functions earlier.
> 
> Okay
> 
> > 
> > > +{
> > > +    int offset = fdt_subnode_offset(fdt, 0, "persistent-memory");
> > > +    GSList *dimms = NULL;
> > > +
> > > +    if (offset < 0) {
> > > +        offset = fdt_add_subnode(fdt, 0, "persistent-memory");
> > > +        _FDT(offset);
> > > +        _FDT((fdt_setprop_cell(fdt, offset, "#address-cells", 0x2)));
> > Hmm.  So the docuiment I have specifies #address-cells = 1 here, but
> > that doesn't make sense with the reg format it specifies for the
> > children.  So *something* is screwy.  Is there an up to date PAPR
> > specification for persistent memory I can get?
> 
> Document is uptodate. I'll correct this part as well.

Uh.. correct it to what.  My point is that the document contradicts
itself, or at least conflicts with base OF conventions.  So we need to
resolve that somehow, not slavishly implement a clearly-wrong document.

> 
> > 
> > > +        _FDT((fdt_setprop_cell(fdt, offset, "#size-cells", 0x0)));
> > > +        _FDT((fdt_setprop_string(fdt, offset, "device_type",
> > > +                                 "ibm,persistent-memory")));
> > > +    }
> > > +
> > > +    /*NB : Add drc-info array here */
> > > +
> > > +    /* Create DT entries for cold plugged NVDIMM devices */
> > > +    dimms = nvdimm_get_device_list();
> > > +    for (; dimms; dimms = dimms->next) {
> > > +        NVDIMMDevice *nvdimm = dimms->data;
> > > +
> > > +        spapr_populate_nvdimm_node(fdt, offset, nvdimm);
> > > +    }
> > > +    g_slist_free(dimms);
> > > +    return;
> > > +}
> > > +
> > >   static void *spapr_build_fdt(SpaprMachineState *spapr)
> > >   {
> > >       MachineState *machine = MACHINE(spapr);
> > > @@ -1368,6 +1459,11 @@ static void *spapr_build_fdt(SpaprMachineState *spapr)
> > >           }
> > >       }
> > > +    /* NVDIMM devices */
> > > +    if (spapr->nvdimm_enabled) {
> > > +        spapr_dt_nvdimm(fdt);
> > > +    }
> > > +
> > >       return fdt;
> > >   }
> > > @@ -3324,6 +3420,20 @@ static void spapr_set_host_serial(Object *obj, const char *value, Error **errp)
> > >       spapr->host_serial = g_strdup(value);
> > >   }
> > > +static bool spapr_get_nvdimm(Object *obj, Error **errp)
> > > +{
> > > +    SpaprMachineState *spapr = SPAPR_MACHINE(obj);
> > > +
> > > +    return spapr->nvdimm_enabled;
> > > +}
> > > +
> > > +static void spapr_set_nvdimm(Object *obj, bool value, Error **errp)
> > > +{
> > > +    SpaprMachineState *spapr = SPAPR_MACHINE(obj);
> > > +
> > > +    spapr->nvdimm_enabled = value;
> > > +}
> > > +
> > >   static void spapr_instance_init(Object *obj)
> > >   {
> > >       SpaprMachineState *spapr = SPAPR_MACHINE(obj);
> > > @@ -3380,6 +3490,12 @@ static void spapr_instance_init(Object *obj)
> > >           &error_abort);
> > >       object_property_set_description(obj, "host-serial",
> > >           "Host serial number to advertise in guest device tree", &error_abort);
> > > +
> > > +    object_property_add_bool(obj, "nvdimm",
> > > +                            spapr_get_nvdimm, spapr_set_nvdimm, NULL);
> > Is there actually a reason to have a property for this, rather than
> > just always having it on?
> 
> No reason, just keeps inline with x86 workflow for nvdimm. Since its only
> a limitation where one cannot add nvdimm if not started with nvdimm=on,
> I'll remove it.

Let's do that for now.  It might be useful to consider a followup
patch adding it, if management layers will be made easier if they can
set the same property for x86 and power.

> I think I still have to tie nvdimm to pseries machine version for the
> supportability checks just the way memory and cpu hotplugs are tied
> to.

Yes, that might be needed.

> 
> > > +    object_property_set_description(obj, "nvdimm",
> > > +                                    "Enable support for nvdimm devices",
> > > +                                    NULL);
> > >   }
> > >   static void spapr_machine_finalizefn(Object *obj)
> > > @@ -3404,6 +3520,16 @@ static void spapr_nmi(NMIState *n, int cpu_index, Error **errp)
> > >       }
> > >   }
> > > +int spapr_pmem_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
> > > +                           void *fdt, int *fdt_start_offset, Error **errp)
> > > +{
> > > +    NVDIMMDevice *nvdimm = NVDIMM(drc->dev);
> > > +
> > > +    *fdt_start_offset = spapr_populate_nvdimm_node(fdt, 0, nvdimm);
> > > +
> > > +    return 0;
> > > +}
> > > +
> > >   int spapr_lmb_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
> > >                             void *fdt, int *fdt_start_offset, Error **errp)
> > >   {
> > > @@ -3466,12 +3592,37 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t size,
> > >       }
> > >   }
> > > +static void spapr_add_nvdimm(DeviceState *dev, uint64_t addr, Error **errp)
> > > +{
> > > +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_hotplug_handler(dev));
> > > +    SpaprDrc *drc;
> > > +    bool hotplugged = spapr_drc_hotplugged(dev);
> > > +    Error *local_err = NULL;
> > > +
> > > +    spapr_dr_connector_new(OBJECT(spapr), TYPE_SPAPR_DRC_PMEM,
> > > +                           addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
> > This still seems bogus to me.  The whole point of DRCs is that they
> > exist *before* the device is plugged as a handle for the guest side
> > plug mechanisms.  Creating them only when the device is added doesn't
> > make sense.
> 
> We don't have a known maxpmem like maxmem or maxcpu, so we don't know how
> many
> DRCs to create.

Urgh.  Which means this PAPR extension basically doesn't make sense
with the existing PAPR stuff.  What a mess.

> 
> > 
> > > +    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM,
> > > +                          addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
> > > +    g_assert(drc);
> > > +
> > > +    spapr_drc_attach(drc, dev, &local_err);
> > > +    if (local_err) {
> > > +        error_propagate(errp, local_err);
> > > +        return;
> > > +    }
> > > +
> > > +    if (hotplugged) {
> > > +        spapr_hotplug_req_add_by_index(drc);
> > > +    }
> > > +}
> > > +
> > >   static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > >                                 Error **errp)
> > >   {
> > >       Error *local_err = NULL;
> > >       SpaprMachineState *ms = SPAPR_MACHINE(hotplug_dev);
> > >       PCDIMMDevice *dimm = PC_DIMM(dev);
> > > +    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
> > >       uint64_t size, addr;
> > >       size = memory_device_get_region_size(MEMORY_DEVICE(dev), &error_abort);
> > > @@ -3487,8 +3638,14 @@ static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > >           goto out_unplug;
> > >       }
> > > -    spapr_add_lmbs(dev, addr, size, spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
> > > -                   &local_err);
> > > +    if (!is_nvdimm) {
> > > +        spapr_add_lmbs(dev, addr, size,
> > > +                       spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
> > > +                       &local_err);
> > > +    } else {
> > > +        spapr_add_nvdimm(dev, addr, &local_err);
> > > +    }
> > > +
> > >       if (local_err) {
> > >           goto out_unplug;
> > >       }
> > > @@ -3506,6 +3663,7 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > >   {
> > >       const SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(hotplug_dev);
> > >       SpaprMachineState *spapr = SPAPR_MACHINE(hotplug_dev);
> > > +    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
> > >       PCDIMMDevice *dimm = PC_DIMM(dev);
> > >       Error *local_err = NULL;
> > >       uint64_t size;
> > > @@ -3523,10 +3681,28 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > >           return;
> > >       }
> > > -    if (size % SPAPR_MEMORY_BLOCK_SIZE) {
> > > +    if (!is_nvdimm && size % SPAPR_MEMORY_BLOCK_SIZE) {
> > >           error_setg(errp, "Hotplugged memory size must be a multiple of "
> > > -                      "%" PRIu64 " MB", SPAPR_MEMORY_BLOCK_SIZE / MiB);
> > > +                          "%" PRIu64 " MB", SPAPR_MEMORY_BLOCK_SIZE / MiB);
> > >           return;
> > > +    } else if (is_nvdimm) {
> > > +        char *uuidstr = NULL;
> > > +        QemuUUID uuid;
> > > +        if (size % SPAPR_MINIMUM_SCM_BLOCK_SIZE) {
> > > +            error_setg(errp, "NVDIMM memory size excluding the label area"
> > Is this reference to the label area still accurate?
> 
> Yes. If the anytime the base_address for dimm gets unaligned to 256MiB
> anytime, the subsequent
> memory hotplug fails in the kernel.


Yes... but AFAICT the label information is now out-of-band accessed
via the hcalls, rather than living within the nvdimm address space.
So what does the label have to do with the check above?  Or am I
missing something?
> 
> > 
> > > +                       " must be a multiple of "
> > > +                       "%" PRIu64 "MB",
> > > +                       SPAPR_MINIMUM_SCM_BLOCK_SIZE / MiB);
> > > +            return;
> > > +        }
> > > +
> > > +        uuidstr = object_property_get_str(OBJECT(dimm), NVDIMM_UUID_PROP, NULL);
> > > +        qemu_uuid_parse(uuidstr, &uuid);
> > > +        if (qemu_uuid_is_null(&uuid)) {
> > > +            error_setg(errp, "NVDIMM device requires the uuid to be set");
> > > +            return;
> > > +        }
> > > +        g_free(uuidstr);
> > >       }
> > >       memdev = object_property_get_link(OBJECT(dimm), PC_DIMM_MEMDEV_PROP,
> > > @@ -3666,6 +3842,12 @@ static void spapr_memory_unplug_request(HotplugHandler *hotplug_dev,
> > >       int i;
> > >       SpaprDrc *drc;
> > > +    if (object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM)) {
> > > +        error_setg(&local_err,
> > > +                   "nvdimm device hot unplug is not supported yet.");
> > > +        goto out;
> > > +    }
> > > +
> > >       size = memory_device_get_region_size(MEMORY_DEVICE(dimm), &error_abort);
> > >       nr_lmbs = size / SPAPR_MEMORY_BLOCK_SIZE;
> > > diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> > > index 597f236b9c..983440a711 100644
> > > --- a/hw/ppc/spapr_drc.c
> > > +++ b/hw/ppc/spapr_drc.c
> > > @@ -707,6 +707,17 @@ static void spapr_drc_phb_class_init(ObjectClass *k, void *data)
> > >       drck->dt_populate = spapr_phb_dt_populate;
> > >   }
> > > +static void spapr_drc_pmem_class_init(ObjectClass *k, void *data)
> > > +{
> > > +    SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_CLASS(k);
> > > +
> > > +    drck->typeshift = SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM;
> > > +    drck->typename = "MEM";
> > > +    drck->drc_name_prefix = "PMEM ";
> > > +    drck->release = NULL;
> > > +    drck->dt_populate = spapr_pmem_dt_populate;
> > > +}
> > > +
> > >   static const TypeInfo spapr_dr_connector_info = {
> > >       .name          = TYPE_SPAPR_DR_CONNECTOR,
> > >       .parent        = TYPE_DEVICE,
> > > @@ -757,6 +768,12 @@ static const TypeInfo spapr_drc_phb_info = {
> > >       .class_init    = spapr_drc_phb_class_init,
> > >   };
> > > +static const TypeInfo spapr_drc_pmem_info = {
> > > +    .name          = TYPE_SPAPR_DRC_PMEM,
> > > +    .parent        = TYPE_SPAPR_DRC_LOGICAL,
> > > +    .class_init    = spapr_drc_pmem_class_init,
> > > +};
> > > +
> > >   /* helper functions for external users */
> > >   SpaprDrc *spapr_drc_by_index(uint32_t index)
> > > @@ -1226,6 +1243,7 @@ static void spapr_drc_register_types(void)
> > >       type_register_static(&spapr_drc_pci_info);
> > >       type_register_static(&spapr_drc_lmb_info);
> > >       type_register_static(&spapr_drc_phb_info);
> > > +    type_register_static(&spapr_drc_pmem_info);
> > >       spapr_rtas_register(RTAS_SET_INDICATOR, "set-indicator",
> > >                           rtas_set_indicator);
> > > diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> > > index ae0f093f59..1141203a87 100644
> > > --- a/hw/ppc/spapr_events.c
> > > +++ b/hw/ppc/spapr_events.c
> > > @@ -193,6 +193,7 @@ struct rtas_event_log_v6_hp {
> > >   #define RTAS_LOG_V6_HP_TYPE_SLOT                         3
> > >   #define RTAS_LOG_V6_HP_TYPE_PHB                          4
> > >   #define RTAS_LOG_V6_HP_TYPE_PCI                          5
> > > +#define RTAS_LOG_V6_HP_TYPE_PMEM                         6
> > >       uint8_t hotplug_action;
> > >   #define RTAS_LOG_V6_HP_ACTION_ADD                        1
> > >   #define RTAS_LOG_V6_HP_ACTION_REMOVE                     2
> > > @@ -529,6 +530,9 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
> > >       case SPAPR_DR_CONNECTOR_TYPE_PHB:
> > >           hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PHB;
> > >           break;
> > > +    case SPAPR_DR_CONNECTOR_TYPE_PMEM:
> > > +        hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PMEM;
> > > +        break;
> > >       default:
> > >           /* we shouldn't be signaling hotplug events for resources
> > >            * that don't support them
> > > diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
> > > index bad4fc04b5..3089615e17 100644
> > > --- a/include/hw/mem/nvdimm.h
> > > +++ b/include/hw/mem/nvdimm.h
> > > @@ -49,6 +49,7 @@
> > >                                                  TYPE_NVDIMM)
> > >   #define NVDIMM_LABEL_SIZE_PROP "label-size"
> > > +#define NVDIMM_UUID_PROP "uuid"
> > >   #define NVDIMM_UNARMED_PROP    "unarmed"
> > >   struct NVDIMMDevice {
> > > @@ -83,6 +84,11 @@ struct NVDIMMDevice {
> > >        * the guest write persistence.
> > >        */
> > >       bool unarmed;
> > > +
> > > +    /*
> > > +     * The PPC64 - spapr requires each nvdimm device have a uuid.
> > > +     */
> > > +    QemuUUID uuid;
> > >   };
> > >   typedef struct NVDIMMDevice NVDIMMDevice;
> > > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > > index 7e32f309c2..394ea26335 100644
> > > --- a/include/hw/ppc/spapr.h
> > > +++ b/include/hw/ppc/spapr.h
> > > @@ -202,6 +202,7 @@ struct SpaprMachineState {
> > >       SpaprCapabilities def, eff, mig;
> > >       unsigned gpu_numa_id;
> > > +    bool nvdimm_enabled;
> > >   };
> > >   #define H_SUCCESS         0
> > > @@ -794,6 +795,8 @@ int spapr_core_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
> > >   void spapr_lmb_release(DeviceState *dev);
> > >   int spapr_lmb_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
> > >                             void *fdt, int *fdt_start_offset, Error **errp);
> > > +int spapr_pmem_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
> > > +                          void *fdt, int *fdt_start_offset, Error **errp);
> > >   void spapr_phb_release(DeviceState *dev);
> > >   int spapr_phb_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
> > >                             void *fdt, int *fdt_start_offset, Error **errp);
> > > @@ -829,6 +832,15 @@ int spapr_rtc_import_offset(SpaprRtcState *rtc, int64_t legacy_offset);
> > >   #define SPAPR_LMB_FLAGS_DRC_INVALID 0x00000020
> > >   #define SPAPR_LMB_FLAGS_RESERVED 0x00000080
> > > +/*
> > > + * The nvdimm size should be aligned to SCM block size.
> > > + * The SCM block size should be aligned to SPAPR_MEMORY_BLOCK_SIZE
> > > + * inorder to have SCM regions not to overlap with dimm memory regions.
> > > + * The SCM devices can have variable block sizes. For now, fixing the
> > > + * block size to the minimum value.
> > > + */
> > > +#define SPAPR_MINIMUM_SCM_BLOCK_SIZE SPAPR_MEMORY_BLOCK_SIZE
> > > +
> > >   void spapr_do_system_reset_on_cpu(CPUState *cs, run_on_cpu_data arg);
> > >   #define HTAB_SIZE(spapr)        (1ULL << ((spapr)->htab_shift))
> > > diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> > > index fad0a887f9..8b7ce41a0f 100644
> > > --- a/include/hw/ppc/spapr_drc.h
> > > +++ b/include/hw/ppc/spapr_drc.h
> > > @@ -79,6 +79,13 @@
> > >   #define SPAPR_DRC_PHB(obj) OBJECT_CHECK(SpaprDrc, (obj), \
> > >                                           TYPE_SPAPR_DRC_PHB)
> > > +#define TYPE_SPAPR_DRC_PMEM "spapr-drc-pmem"
> > > +#define SPAPR_DRC_PMEM_GET_CLASS(obj) \
> > > +        OBJECT_GET_CLASS(SpaprDrcClass, obj, TYPE_SPAPR_DRC_PMEM)
> > > +#define SPAPR_DRC_PMEM_CLASS(klass) \
> > > +        OBJECT_CLASS_CHECK(SpaprDrcClass, klass, TYPE_SPAPR_DRC_PMEM)
> > > +#define SPAPR_DRC_PMEM(obj) OBJECT_CHECK(SpaprDrc, (obj), \
> > > +                                        TYPE_SPAPR_DRC_PMEM)
> > >   /*
> > >    * Various hotplug types managed by SpaprDrc
> > >    *
> > > @@ -96,6 +103,7 @@ typedef enum {
> > >       SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO = 3,
> > >       SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI = 4,
> > >       SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB = 8,
> > > +    SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM = 9,
> > >   } SpaprDrcTypeShift;
> > >   typedef enum {
> > > @@ -105,6 +113,7 @@ typedef enum {
> > >       SPAPR_DR_CONNECTOR_TYPE_VIO = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO,
> > >       SPAPR_DR_CONNECTOR_TYPE_PCI = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI,
> > >       SPAPR_DR_CONNECTOR_TYPE_LMB = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB,
> > > +    SPAPR_DR_CONNECTOR_TYPE_PMEM = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM,
> > >   } SpaprDrcType;
> > >   /*
> > > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 2/3] spapr: Add NVDIMM device support
  2019-05-16  1:32       ` David Gibson
@ 2019-05-17  7:42         ` Shivaprasad G Bhat
  0 siblings, 0 replies; 16+ messages in thread
From: Shivaprasad G Bhat @ 2019-05-17  7:42 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-devel, imammedo, qemu-ppc, xiaoguangrong.eric, mst



On 05/16/2019 07:02 AM, David Gibson wrote:
> On Wed, May 15, 2019 at 12:30:07PM +0530, Shivaprasad G Bhat wrote:
>> Hi David,
>>
>> Thanks for the comments. Replies inline..
>>
>> On 05/14/2019 07:52 AM, David Gibson wrote:
>>> On Mon, May 13, 2019 at 04:28:02AM -0500, Shivaprasad G Bhat wrote:
>>>> Add support for NVDIMM devices for sPAPR. Piggyback on existing nvdimm
>>>> device interface in QEMU to support virtual NVDIMM devices for Power (May have
>>>> to re-look at this later).  Create the required DT entries for the
>>>> device (some entries have dummy values right now).
>>>>
>>>> The patch creates the required DT node and sends a hotplug
>>>> interrupt to the guest. Guest is expected to undertake the normal
>>>> DR resource add path in response and start issuing PAPR SCM hcalls.
>>>>
>>>> This is how it can be used ..
>>>> Add nvdimm=on to the qemu machine argument.
>>>> Ex : -machine pseries,nvdimm=on
>>>> For coldplug, the device to be added in qemu command line as shown below
>>>> -object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0,share=yes,size=1073872896
>>>> -device nvdimm,label-size=128k,uuid=75a3cdd7-6a2f-4791-8d15-fe0a920e8e9e,memdev=memnvdimm0,id=nvdimm0,slot=0
>>>>
>>>> For hotplug, the device to be added from monitor as below
>>>> object_add memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0,share=yes,size=1073872896
>>>> device_add nvdimm,label-size=128k,uuid=75a3cdd7-6a2f-4791-8d15-fe0a920e8e9e,memdev=memnvdimm0,id=nvdimm0,slot=0
>>>>
>>>> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
>>>> Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
>>>>                  [Early implementation]
>>>> ---
>>>> ---
>>>>    default-configs/ppc64-softmmu.mak |    1
>>>>    hw/mem/Kconfig                    |    2
>>>>    hw/mem/nvdimm.c                   |   43 ++++++++
>>>>    hw/ppc/spapr.c                    |  202 +++++++++++++++++++++++++++++++++++--
>>>>    hw/ppc/spapr_drc.c                |   18 +++
>>>>    hw/ppc/spapr_events.c             |    4 +
>>>>    include/hw/mem/nvdimm.h           |    6 +
>>>>    include/hw/ppc/spapr.h            |   12 ++
>>>>    include/hw/ppc/spapr_drc.h        |    9 ++
>>>>    9 files changed, 286 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
>>>> index cca52665d9..ae0841fa3a 100644
>>>> --- a/default-configs/ppc64-softmmu.mak
>>>> +++ b/default-configs/ppc64-softmmu.mak
>>>> @@ -8,3 +8,4 @@ CONFIG_POWERNV=y
>>>>    # For pSeries
>>>>    CONFIG_PSERIES=y
>>>> +CONFIG_NVDIMM=y
>>>> diff --git a/hw/mem/Kconfig b/hw/mem/Kconfig
>>>> index 620fd4cb59..2ad052a536 100644
>>>> --- a/hw/mem/Kconfig
>>>> +++ b/hw/mem/Kconfig
>>>> @@ -8,4 +8,4 @@ config MEM_DEVICE
>>>>    config NVDIMM
>>>>        bool
>>>>        default y
>>>> -    depends on PC
>>>> +    depends on (PC || PSERIES)
>>>> diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
>>>> index f221ec7a9a..deaebaaaa5 100644
>>>> --- a/hw/mem/nvdimm.c
>>>> +++ b/hw/mem/nvdimm.c
>>>> @@ -93,11 +93,54 @@ out:
>>>>        error_propagate(errp, local_err);
>>>>    }
>>>> +static void nvdimm_get_uuid(Object *obj, Visitor *v, const char *name,
>>>> +                                  void *opaque, Error **errp)
>>>> +{
>>>> +    NVDIMMDevice *nvdimm = NVDIMM(obj);
>>>> +    char *value = NULL;
>>>> +
>>>> +    value = qemu_uuid_unparse_strdup(&nvdimm->uuid);
>>>> +
>>>> +    visit_type_str(v, name, &value, errp);
>>>> +}
>>>> +
>>>> +
>>>> +static void nvdimm_set_uuid(Object *obj, Visitor *v, const char *name,
>>>> +                                  void *opaque, Error **errp)
>>>> +{
>>>> +    NVDIMMDevice *nvdimm = NVDIMM(obj);
>>>> +    Error *local_err = NULL;
>>>> +    char *value;
>>>> +
>>>> +    visit_type_str(v, name, &value, &local_err);
>>>> +    if (local_err) {
>>>> +        goto out;
>>>> +    }
>>>> +
>>>> +    if (strcmp(value, "") == 0) {
>>>> +        error_setg(&local_err, "Property '%s.%s' %s is required"
>>>> +                   " at least 0x%lx", object_get_typename(obj),
>>>> +                   name, value, MIN_NAMESPACE_LABEL_SIZE);
>>>> +        goto out;
>>>> +    }
>>>> +
>>>> +    if (qemu_uuid_parse(value, &nvdimm->uuid) != 0) {
>>>> +        error_setg(errp, "Invalid UUID");
>>>> +        return;
>>>> +    }
>>>> +out:
>>>> +    error_propagate(errp, local_err);
>>>> +}
>>>> +
>>>> +
>>>>    static void nvdimm_init(Object *obj)
>>>>    {
>>>>        object_property_add(obj, NVDIMM_LABEL_SIZE_PROP, "int",
>>>>                            nvdimm_get_label_size, nvdimm_set_label_size, NULL,
>>>>                            NULL, NULL);
>>>> +
>>>> +    object_property_add(obj, NVDIMM_UUID_PROP, "QemuUUID", nvdimm_get_uuid,
>>>> +                        nvdimm_set_uuid, NULL, NULL, NULL);
>>>>    }
>>>>    static void nvdimm_finalize(Object *obj)
>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>> index 2ef3ce4362..b6951577e7 100644
>>>> --- a/hw/ppc/spapr.c
>>>> +++ b/hw/ppc/spapr.c
>>>> @@ -74,6 +74,7 @@
>>>>    #include "qemu/cutils.h"
>>>>    #include "hw/ppc/spapr_cpu_core.h"
>>>>    #include "hw/mem/memory-device.h"
>>>> +#include "hw/mem/nvdimm.h"
>>>>    #include <libfdt.h>
>>>> @@ -699,6 +700,7 @@ static int spapr_populate_drmem_v2(SpaprMachineState *spapr, void *fdt,
>>>>        uint8_t *int_buf, *cur_index;
>>>>        int ret;
>>>>        uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
>>>> +    uint64_t scm_block_size = SPAPR_MINIMUM_SCM_BLOCK_SIZE;
>>>>        uint64_t addr, cur_addr, size;
>>>>        uint32_t nr_boot_lmbs = (machine->device_memory->base / lmb_size);
>>>>        uint64_t mem_end = machine->device_memory->base +
>>>> @@ -735,12 +737,20 @@ static int spapr_populate_drmem_v2(SpaprMachineState *spapr, void *fdt,
>>>>                nr_entries++;
>>>>            }
>>>> -        /* Entry for DIMM */
>>>> -        drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
>>>> -        g_assert(drc);
>>>> -        elem = spapr_get_drconf_cell(size / lmb_size, addr,
>>>> -                                     spapr_drc_index(drc), node,
>>>> -                                     SPAPR_LMB_FLAGS_ASSIGNED);
>>>> +        if (info->value->type == MEMORY_DEVICE_INFO_KIND_NVDIMM) {
>>>> +            /* Entry for NVDIMM */
>>>> +            drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM, addr / scm_block_size);
>>>> +            g_assert(drc);
>>>> +            elem = spapr_get_drconf_cell(size / scm_block_size, addr,
>>>> +                                         spapr_drc_index(drc), -1, 0);
>>>> +        } else {
>>>> +            /* Entry for DIMM */
>>>> +            drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
>>>> +            g_assert(drc);
>>>> +            elem = spapr_get_drconf_cell(size / lmb_size, addr,
>>>> +                                         spapr_drc_index(drc), node,
>>>> +                                         SPAPR_LMB_FLAGS_ASSIGNED);
>>>> +        }
>>>>            QSIMPLEQ_INSERT_TAIL(&drconf_queue, elem, entry);
>>>>            nr_entries++;
>>>>            cur_addr = addr + size;
>>>> @@ -1235,6 +1245,87 @@ static void spapr_dt_hypervisor(SpaprMachineState *spapr, void *fdt)
>>>>        }
>>>>    }
>>>> +static int spapr_populate_nvdimm_node(void *fdt, int parent_offset,
>>>> +                                      NVDIMMDevice *nvdimm)
>>> spapr_dt_*() is the new preferred naming convention for DT creation
>>> functions.
>> Okay
>>
>>>> +{
>>>> +    int child_offset;
>>>> +    char buf[40];
>>>> +    SpaprDrc *drc;
>>>> +    uint32_t drc_idx;
>>>> +    uint32_t node = object_property_get_uint(OBJECT(nvdimm), PC_DIMM_NODE_PROP,
>>>> +                                             &error_abort);
>>>> +    uint64_t addr = object_property_get_uint(OBJECT(nvdimm), PC_DIMM_ADDR_PROP,
>>>> +                                             &error_abort);
>>>> +    uint32_t associativity[] = {
>>>> +        cpu_to_be32(0x4), /* length */
>>>> +        cpu_to_be32(0x0), cpu_to_be32(0x0),
>>>> +        cpu_to_be32(0x0), cpu_to_be32(node)
>>>> +    };
>>>> +    uint64_t lsize = nvdimm->label_size;
>>>> +    uint64_t size = object_property_get_int(OBJECT(nvdimm), PC_DIMM_SIZE_PROP,
>>>> +                                            NULL);
>>>> +
>>>> +    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM,
>>>> +                          addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
>>>> +    g_assert(drc);
>>>> +
>>>> +    drc_idx = spapr_drc_index(drc);
>>>> +
>>>> +    sprintf(buf, "pmem@%x", drc_idx);
>>> This doesn't look right.  According to the PAPR fragment I have, the
>>> name is supposed to be "ibm,pmemory" and the reg is supposed to be
>>> (addr, size) pairs, not DRC indices.
>>>
>>> Or do I have an out of date document?
>> Ah, you are right. I missed it, this went unnoticed as all my test cases
>> were to check pmem functionality and the current kernel functional code
>> doesnt refer this property anywhere.
>>
>>>> +    child_offset = fdt_add_subnode(fdt, parent_offset, buf);
>>>> +    _FDT(child_offset);
>>>> +
>>>> +    _FDT((fdt_setprop_cell(fdt, child_offset, "reg", drc_idx)));
>>>> +    _FDT((fdt_setprop_string(fdt, child_offset, "compatible", "ibm,pmemory")));
>>>> +    _FDT((fdt_setprop_string(fdt, child_offset, "device_type", "ibm,pmemory")));
>>>> +
>>>> +    _FDT((fdt_setprop(fdt, child_offset, "ibm,associativity", associativity,
>>>> +                      sizeof(associativity))));
>>>> +
>>>> +    qemu_uuid_unparse(&nvdimm->uuid, buf);
>>>> +    _FDT((fdt_setprop_string(fdt, child_offset, "ibm,unit-guid", buf)));
>>>> +
>>>> +    _FDT((fdt_setprop_cell(fdt, child_offset, "ibm,my-drc-index", drc_idx)));
>>>> +
>>>> +    /*NB : What it should be? */
>>> If you don't have a good value, can you just leave it out?
>> Okay
>>
>>>> +    _FDT(fdt_setprop_cell(fdt, child_offset, "ibm,latency-attribute", 828));
>>>> +
>>>> +    _FDT((fdt_setprop_u64(fdt, child_offset, "ibm,block-size",
>>>> +                          SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
>>>> +    _FDT((fdt_setprop_u64(fdt, child_offset, "ibm,number-of-blocks",
>>>> +                          size / SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
>>>> +    _FDT((fdt_setprop_cell(fdt, child_offset, "ibm,metadata-size", lsize)));
>>>> +
>>>> +    return child_offset;
>>>> +}
>>>> +
>>>> +static void spapr_dt_nvdimm(void *fdt)
>>> I'd suggest spapr_dt_nvdimms() or spapr_dt_persistent_memory(), with
>>> spapr_dt_nvdimm() for the single-NVDIMM functions earlier.
>> Okay
>>
>>>> +{
>>>> +    int offset = fdt_subnode_offset(fdt, 0, "persistent-memory");
>>>> +    GSList *dimms = NULL;
>>>> +
>>>> +    if (offset < 0) {
>>>> +        offset = fdt_add_subnode(fdt, 0, "persistent-memory");
>>>> +        _FDT(offset);
>>>> +        _FDT((fdt_setprop_cell(fdt, offset, "#address-cells", 0x2)));
>>> Hmm.  So the docuiment I have specifies #address-cells = 1 here, but
>>> that doesn't make sense with the reg format it specifies for the
>>> children.  So *something* is screwy.  Is there an up to date PAPR
>>> specification for persistent memory I can get?
>> Document is uptodate. I'll correct this part as well.
> Uh.. correct it to what.  My point is that the document contradicts
> itself, or at least conflicts with base OF conventions.  So we need to
> resolve that somehow, not slavishly implement a clearly-wrong document.

Ah, you are right! I missed the whole point earlier. I'll get it checked to
see what it actually should be.

>>>> +        _FDT((fdt_setprop_cell(fdt, offset, "#size-cells", 0x0)));
>>>> +        _FDT((fdt_setprop_string(fdt, offset, "device_type",
>>>> +                                 "ibm,persistent-memory")));
>>>> +    }
>>>> +
>>>> +    /*NB : Add drc-info array here */
>>>> +
>>>> +    /* Create DT entries for cold plugged NVDIMM devices */
>>>> +    dimms = nvdimm_get_device_list();
>>>> +    for (; dimms; dimms = dimms->next) {
>>>> +        NVDIMMDevice *nvdimm = dimms->data;
>>>> +
>>>> +        spapr_populate_nvdimm_node(fdt, offset, nvdimm);
>>>> +    }
>>>> +    g_slist_free(dimms);
>>>> +    return;
>>>> +}
>>>> +
>>>>    static void *spapr_build_fdt(SpaprMachineState *spapr)
>>>>    {
>>>>        MachineState *machine = MACHINE(spapr);
>>>> @@ -1368,6 +1459,11 @@ static void *spapr_build_fdt(SpaprMachineState *spapr)
>>>>            }
>>>>        }
>>>> +    /* NVDIMM devices */
>>>> +    if (spapr->nvdimm_enabled) {
>>>> +        spapr_dt_nvdimm(fdt);
>>>> +    }
>>>> +
>>>>        return fdt;
>>>>    }
>>>> @@ -3324,6 +3420,20 @@ static void spapr_set_host_serial(Object *obj, const char *value, Error **errp)
>>>>        spapr->host_serial = g_strdup(value);
>>>>    }
>>>> +static bool spapr_get_nvdimm(Object *obj, Error **errp)
>>>> +{
>>>> +    SpaprMachineState *spapr = SPAPR_MACHINE(obj);
>>>> +
>>>> +    return spapr->nvdimm_enabled;
>>>> +}
>>>> +
>>>> +static void spapr_set_nvdimm(Object *obj, bool value, Error **errp)
>>>> +{
>>>> +    SpaprMachineState *spapr = SPAPR_MACHINE(obj);
>>>> +
>>>> +    spapr->nvdimm_enabled = value;
>>>> +}
>>>> +
>>>>    static void spapr_instance_init(Object *obj)
>>>>    {
>>>>        SpaprMachineState *spapr = SPAPR_MACHINE(obj);
>>>> @@ -3380,6 +3490,12 @@ static void spapr_instance_init(Object *obj)
>>>>            &error_abort);
>>>>        object_property_set_description(obj, "host-serial",
>>>>            "Host serial number to advertise in guest device tree", &error_abort);
>>>> +
>>>> +    object_property_add_bool(obj, "nvdimm",
>>>> +                            spapr_get_nvdimm, spapr_set_nvdimm, NULL);
>>> Is there actually a reason to have a property for this, rather than
>>> just always having it on?
>> No reason, just keeps inline with x86 workflow for nvdimm. Since its only
>> a limitation where one cannot add nvdimm if not started with nvdimm=on,
>> I'll remove it.
> Let's do that for now.  It might be useful to consider a followup
> patch adding it, if management layers will be made easier if they can
> set the same property for x86 and power.

Ok

>
>> I think I still have to tie nvdimm to pseries machine version for the
>> supportability checks just the way memory and cpu hotplugs are tied
>> to.
> Yes, that might be needed.
>
>>>> +    object_property_set_description(obj, "nvdimm",
>>>> +                                    "Enable support for nvdimm devices",
>>>> +                                    NULL);
>>>>    }
>>>>    static void spapr_machine_finalizefn(Object *obj)
>>>> @@ -3404,6 +3520,16 @@ static void spapr_nmi(NMIState *n, int cpu_index, Error **errp)
>>>>        }
>>>>    }
>>>> +int spapr_pmem_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
>>>> +                           void *fdt, int *fdt_start_offset, Error **errp)
>>>> +{
>>>> +    NVDIMMDevice *nvdimm = NVDIMM(drc->dev);
>>>> +
>>>> +    *fdt_start_offset = spapr_populate_nvdimm_node(fdt, 0, nvdimm);
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>>    int spapr_lmb_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
>>>>                              void *fdt, int *fdt_start_offset, Error **errp)
>>>>    {
>>>> @@ -3466,12 +3592,37 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t size,
>>>>        }
>>>>    }
>>>> +static void spapr_add_nvdimm(DeviceState *dev, uint64_t addr, Error **errp)
>>>> +{
>>>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_hotplug_handler(dev));
>>>> +    SpaprDrc *drc;
>>>> +    bool hotplugged = spapr_drc_hotplugged(dev);
>>>> +    Error *local_err = NULL;
>>>> +
>>>> +    spapr_dr_connector_new(OBJECT(spapr), TYPE_SPAPR_DRC_PMEM,
>>>> +                           addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
>>> This still seems bogus to me.  The whole point of DRCs is that they
>>> exist *before* the device is plugged as a handle for the guest side
>>> plug mechanisms.  Creating them only when the device is added doesn't
>>> make sense.
>> We don't have a known maxpmem like maxmem or maxcpu, so we don't know how
>> many
>> DRCs to create.
> Urgh.  Which means this PAPR extension basically doesn't make sense
> with the existing PAPR stuff.  What a mess.

Greg hinted me if we can use the maxslots set by the user along with maxmem
of -m option for this purpose and create those many DRCs before. It actually
makes sense, only thing is the slots are shared with memory dimms
and some DRCs may never be used.

>>>> +    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM,
>>>> +                          addr / SPAPR_MINIMUM_SCM_BLOCK_SIZE);
>>>> +    g_assert(drc);
>>>> +
>>>> +    spapr_drc_attach(drc, dev, &local_err);
>>>> +    if (local_err) {
>>>> +        error_propagate(errp, local_err);
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    if (hotplugged) {
>>>> +        spapr_hotplug_req_add_by_index(drc);
>>>> +    }
>>>> +}
>>>> +
>>>>    static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>>>                                  Error **errp)
>>>>    {
>>>>        Error *local_err = NULL;
>>>>        SpaprMachineState *ms = SPAPR_MACHINE(hotplug_dev);
>>>>        PCDIMMDevice *dimm = PC_DIMM(dev);
>>>> +    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
>>>>        uint64_t size, addr;
>>>>        size = memory_device_get_region_size(MEMORY_DEVICE(dev), &error_abort);
>>>> @@ -3487,8 +3638,14 @@ static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>>>            goto out_unplug;
>>>>        }
>>>> -    spapr_add_lmbs(dev, addr, size, spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
>>>> -                   &local_err);
>>>> +    if (!is_nvdimm) {
>>>> +        spapr_add_lmbs(dev, addr, size,
>>>> +                       spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
>>>> +                       &local_err);
>>>> +    } else {
>>>> +        spapr_add_nvdimm(dev, addr, &local_err);
>>>> +    }
>>>> +
>>>>        if (local_err) {
>>>>            goto out_unplug;
>>>>        }
>>>> @@ -3506,6 +3663,7 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>>>    {
>>>>        const SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(hotplug_dev);
>>>>        SpaprMachineState *spapr = SPAPR_MACHINE(hotplug_dev);
>>>> +    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
>>>>        PCDIMMDevice *dimm = PC_DIMM(dev);
>>>>        Error *local_err = NULL;
>>>>        uint64_t size;
>>>> @@ -3523,10 +3681,28 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>>>            return;
>>>>        }
>>>> -    if (size % SPAPR_MEMORY_BLOCK_SIZE) {
>>>> +    if (!is_nvdimm && size % SPAPR_MEMORY_BLOCK_SIZE) {
>>>>            error_setg(errp, "Hotplugged memory size must be a multiple of "
>>>> -                      "%" PRIu64 " MB", SPAPR_MEMORY_BLOCK_SIZE / MiB);
>>>> +                          "%" PRIu64 " MB", SPAPR_MEMORY_BLOCK_SIZE / MiB);
>>>>            return;
>>>> +    } else if (is_nvdimm) {
>>>> +        char *uuidstr = NULL;
>>>> +        QemuUUID uuid;
>>>> +        if (size % SPAPR_MINIMUM_SCM_BLOCK_SIZE) {
>>>> +            error_setg(errp, "NVDIMM memory size excluding the label area"
>>> Is this reference to the label area still accurate?
>> Yes. If the anytime the base_address for dimm gets unaligned to 256MiB
>> anytime, the subsequent
>> memory hotplug fails in the kernel.
>
> Yes... but AFAICT the label information is now out-of-band accessed
> via the hcalls, rather than living within the nvdimm address space.
> So what does the label have to do with the check above?  Or am I
> missing something?

NVDIMM size usable by guest is minus the label area

See nvdimm_prepare_memory_region(),
     pmem_size = size - nvdimm->label_size;

For arbitrary label size, the alignment of guest address space would go 
wrong,
that would affect subsequent dimm hotplugs .

For a guest with 1GB memory and 1GB nvdimm with 128k label area,
without the above check,

address-space: memory
   0000000000000000-ffffffffffffffff (prio 0, i/o): system
     0000000000000000-000000003fffffff (prio 0, ram): ppc_spapr.ram
     0000000040000000-00000004ffffffff (prio 0, i/o): device-memory
       0000000040000000-000000007ffdffff (prio 0, nv-i/o): alias 
nvdimm-memory @/objects/memnvdimm0 0000000000000000-000000003ffdffff
The next address 000000007ffe0000 - is unaligned to 256MiB.

Now after hotplugging 256MiB,
   0000000000000000-ffffffffffffffff (prio 0, i/o): system
     0000000000000000-000000003fffffff (prio 0, ram): ppc_spapr.ram
     0000000040000000-00000004ffffffff (prio 0, i/o): device-memory
       0000000040000000-000000007ffdffff (prio 0, nv-i/o): alias 
nvdimm-memory @/objects/memnvdimm0 0000000000000000-000000003ffdffff
       0000000080000000-000000008fffffff (prio 0, ram): memdimm1
In kernel we hit
pseries-hotplug-mem: Attempting to hot-add 1 LMB(s) at index 80000008
[   48.001215] Block size [0x10000000 or 268435456] unaligned hotplug 
range: start 0x8ffe0000, size 0x10000000
[   48.001300] pseries-hotplug-mem: Memory indexed-count-add failed, 
removing any added LMBs

Also, the nvdimm size actually less, 3 SCM blocks of 256MiB and that is 
768MiB not 1GB, as we
calculate number of SCM blocks = size / 256MiB.

So, this check is still necessary.

>>>> +                       " must be a multiple of "
>>>> +                       "%" PRIu64 "MB",
>>>> +                       SPAPR_MINIMUM_SCM_BLOCK_SIZE / MiB);
>>>> +            return;
>>>> +        }
>>>> +
>>>> +        uuidstr = object_property_get_str(OBJECT(dimm), NVDIMM_UUID_PROP, NULL);
>>>> +        qemu_uuid_parse(uuidstr, &uuid);
>>>> +        if (qemu_uuid_is_null(&uuid)) {
>>>> +            error_setg(errp, "NVDIMM device requires the uuid to be set");
>>>> +            return;
>>>> +        }
>>>> +        g_free(uuidstr);
>>>>        }
>>>>        memdev = object_property_get_link(OBJECT(dimm), PC_DIMM_MEMDEV_PROP,
>>>> @@ -3666,6 +3842,12 @@ static void spapr_memory_unplug_request(HotplugHandler *hotplug_dev,
>>>>        int i;
>>>>        SpaprDrc *drc;
>>>> +    if (object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM)) {
>>>> +        error_setg(&local_err,
>>>> +                   "nvdimm device hot unplug is not supported yet.");
>>>> +        goto out;
>>>> +    }
>>>> +
>>>>        size = memory_device_get_region_size(MEMORY_DEVICE(dimm), &error_abort);
>>>>        nr_lmbs = size / SPAPR_MEMORY_BLOCK_SIZE;
>>>> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
>>>> index 597f236b9c..983440a711 100644
>>>> --- a/hw/ppc/spapr_drc.c
>>>> +++ b/hw/ppc/spapr_drc.c
>>>> @@ -707,6 +707,17 @@ static void spapr_drc_phb_class_init(ObjectClass *k, void *data)
>>>>        drck->dt_populate = spapr_phb_dt_populate;
>>>>    }
>>>> +static void spapr_drc_pmem_class_init(ObjectClass *k, void *data)
>>>> +{
>>>> +    SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_CLASS(k);
>>>> +
>>>> +    drck->typeshift = SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM;
>>>> +    drck->typename = "MEM";
>>>> +    drck->drc_name_prefix = "PMEM ";
>>>> +    drck->release = NULL;
>>>> +    drck->dt_populate = spapr_pmem_dt_populate;
>>>> +}
>>>> +
>>>>    static const TypeInfo spapr_dr_connector_info = {
>>>>        .name          = TYPE_SPAPR_DR_CONNECTOR,
>>>>        .parent        = TYPE_DEVICE,
>>>> @@ -757,6 +768,12 @@ static const TypeInfo spapr_drc_phb_info = {
>>>>        .class_init    = spapr_drc_phb_class_init,
>>>>    };
>>>> +static const TypeInfo spapr_drc_pmem_info = {
>>>> +    .name          = TYPE_SPAPR_DRC_PMEM,
>>>> +    .parent        = TYPE_SPAPR_DRC_LOGICAL,
>>>> +    .class_init    = spapr_drc_pmem_class_init,
>>>> +};
>>>> +
>>>>    /* helper functions for external users */
>>>>    SpaprDrc *spapr_drc_by_index(uint32_t index)
>>>> @@ -1226,6 +1243,7 @@ static void spapr_drc_register_types(void)
>>>>        type_register_static(&spapr_drc_pci_info);
>>>>        type_register_static(&spapr_drc_lmb_info);
>>>>        type_register_static(&spapr_drc_phb_info);
>>>> +    type_register_static(&spapr_drc_pmem_info);
>>>>        spapr_rtas_register(RTAS_SET_INDICATOR, "set-indicator",
>>>>                            rtas_set_indicator);
>>>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>>>> index ae0f093f59..1141203a87 100644
>>>> --- a/hw/ppc/spapr_events.c
>>>> +++ b/hw/ppc/spapr_events.c
>>>> @@ -193,6 +193,7 @@ struct rtas_event_log_v6_hp {
>>>>    #define RTAS_LOG_V6_HP_TYPE_SLOT                         3
>>>>    #define RTAS_LOG_V6_HP_TYPE_PHB                          4
>>>>    #define RTAS_LOG_V6_HP_TYPE_PCI                          5
>>>> +#define RTAS_LOG_V6_HP_TYPE_PMEM                         6
>>>>        uint8_t hotplug_action;
>>>>    #define RTAS_LOG_V6_HP_ACTION_ADD                        1
>>>>    #define RTAS_LOG_V6_HP_ACTION_REMOVE                     2
>>>> @@ -529,6 +530,9 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
>>>>        case SPAPR_DR_CONNECTOR_TYPE_PHB:
>>>>            hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PHB;
>>>>            break;
>>>> +    case SPAPR_DR_CONNECTOR_TYPE_PMEM:
>>>> +        hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PMEM;
>>>> +        break;
>>>>        default:
>>>>            /* we shouldn't be signaling hotplug events for resources
>>>>             * that don't support them
>>>> diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
>>>> index bad4fc04b5..3089615e17 100644
>>>> --- a/include/hw/mem/nvdimm.h
>>>> +++ b/include/hw/mem/nvdimm.h
>>>> @@ -49,6 +49,7 @@
>>>>                                                   TYPE_NVDIMM)
>>>>    #define NVDIMM_LABEL_SIZE_PROP "label-size"
>>>> +#define NVDIMM_UUID_PROP "uuid"
>>>>    #define NVDIMM_UNARMED_PROP    "unarmed"
>>>>    struct NVDIMMDevice {
>>>> @@ -83,6 +84,11 @@ struct NVDIMMDevice {
>>>>         * the guest write persistence.
>>>>         */
>>>>        bool unarmed;
>>>> +
>>>> +    /*
>>>> +     * The PPC64 - spapr requires each nvdimm device have a uuid.
>>>> +     */
>>>> +    QemuUUID uuid;
>>>>    };
>>>>    typedef struct NVDIMMDevice NVDIMMDevice;
>>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>>> index 7e32f309c2..394ea26335 100644
>>>> --- a/include/hw/ppc/spapr.h
>>>> +++ b/include/hw/ppc/spapr.h
>>>> @@ -202,6 +202,7 @@ struct SpaprMachineState {
>>>>        SpaprCapabilities def, eff, mig;
>>>>        unsigned gpu_numa_id;
>>>> +    bool nvdimm_enabled;
>>>>    };
>>>>    #define H_SUCCESS         0
>>>> @@ -794,6 +795,8 @@ int spapr_core_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
>>>>    void spapr_lmb_release(DeviceState *dev);
>>>>    int spapr_lmb_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
>>>>                              void *fdt, int *fdt_start_offset, Error **errp);
>>>> +int spapr_pmem_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
>>>> +                          void *fdt, int *fdt_start_offset, Error **errp);
>>>>    void spapr_phb_release(DeviceState *dev);
>>>>    int spapr_phb_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
>>>>                              void *fdt, int *fdt_start_offset, Error **errp);
>>>> @@ -829,6 +832,15 @@ int spapr_rtc_import_offset(SpaprRtcState *rtc, int64_t legacy_offset);
>>>>    #define SPAPR_LMB_FLAGS_DRC_INVALID 0x00000020
>>>>    #define SPAPR_LMB_FLAGS_RESERVED 0x00000080
>>>> +/*
>>>> + * The nvdimm size should be aligned to SCM block size.
>>>> + * The SCM block size should be aligned to SPAPR_MEMORY_BLOCK_SIZE
>>>> + * inorder to have SCM regions not to overlap with dimm memory regions.
>>>> + * The SCM devices can have variable block sizes. For now, fixing the
>>>> + * block size to the minimum value.
>>>> + */
>>>> +#define SPAPR_MINIMUM_SCM_BLOCK_SIZE SPAPR_MEMORY_BLOCK_SIZE
>>>> +
>>>>    void spapr_do_system_reset_on_cpu(CPUState *cs, run_on_cpu_data arg);
>>>>    #define HTAB_SIZE(spapr)        (1ULL << ((spapr)->htab_shift))
>>>> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
>>>> index fad0a887f9..8b7ce41a0f 100644
>>>> --- a/include/hw/ppc/spapr_drc.h
>>>> +++ b/include/hw/ppc/spapr_drc.h
>>>> @@ -79,6 +79,13 @@
>>>>    #define SPAPR_DRC_PHB(obj) OBJECT_CHECK(SpaprDrc, (obj), \
>>>>                                            TYPE_SPAPR_DRC_PHB)
>>>> +#define TYPE_SPAPR_DRC_PMEM "spapr-drc-pmem"
>>>> +#define SPAPR_DRC_PMEM_GET_CLASS(obj) \
>>>> +        OBJECT_GET_CLASS(SpaprDrcClass, obj, TYPE_SPAPR_DRC_PMEM)
>>>> +#define SPAPR_DRC_PMEM_CLASS(klass) \
>>>> +        OBJECT_CLASS_CHECK(SpaprDrcClass, klass, TYPE_SPAPR_DRC_PMEM)
>>>> +#define SPAPR_DRC_PMEM(obj) OBJECT_CHECK(SpaprDrc, (obj), \
>>>> +                                        TYPE_SPAPR_DRC_PMEM)
>>>>    /*
>>>>     * Various hotplug types managed by SpaprDrc
>>>>     *
>>>> @@ -96,6 +103,7 @@ typedef enum {
>>>>        SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO = 3,
>>>>        SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI = 4,
>>>>        SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB = 8,
>>>> +    SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM = 9,
>>>>    } SpaprDrcTypeShift;
>>>>    typedef enum {
>>>> @@ -105,6 +113,7 @@ typedef enum {
>>>>        SPAPR_DR_CONNECTOR_TYPE_VIO = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO,
>>>>        SPAPR_DR_CONNECTOR_TYPE_PCI = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI,
>>>>        SPAPR_DR_CONNECTOR_TYPE_LMB = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB,
>>>> +    SPAPR_DR_CONNECTOR_TYPE_PMEM = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PMEM,
>>>>    } SpaprDrcType;
>>>>    /*
>>>>



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 2/3] spapr: Add NVDIMM device support
  2019-05-13  9:28 ` [Qemu-devel] [RFC v2 PATCH 2/3] spapr: Add NVDIMM device support Shivaprasad G Bhat
  2019-05-14  2:22   ` David Gibson
@ 2019-05-22 16:07   ` Fabiano Rosas
  2019-07-16  7:53     ` Shivaprasad G Bhat
  1 sibling, 1 reply; 16+ messages in thread
From: Fabiano Rosas @ 2019-05-22 16:07 UTC (permalink / raw)
  To: Shivaprasad G Bhat, imammedo, david, xiaoguangrong.eric, mst
  Cc: qemu-ppc, qemu-devel, sbhat

Shivaprasad G Bhat <sbhat@linux.ibm.com> writes:

> +    /* Create DT entries for cold plugged NVDIMM devices */
> +    dimms = nvdimm_get_device_list();
> +    for (; dimms; dimms = dimms->next) {
> +        NVDIMMDevice *nvdimm = dimms->data;
> +
> +        spapr_populate_nvdimm_node(fdt, offset, nvdimm);
> +    }
> +    g_slist_free(dimms);

To free the whole list you'll need another variable in the loop above,
right?

Cheers



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 3/3] spapr: Add Hcalls to support PAPR NVDIMM device
  2019-05-13  9:28 ` [Qemu-devel] [RFC v2 PATCH 3/3] spapr: Add Hcalls to support PAPR NVDIMM device Shivaprasad G Bhat
  2019-05-14  4:38   ` David Gibson
@ 2019-05-22 18:08   ` Fabiano Rosas
  2019-05-22 18:11     ` Fabiano Rosas
                       ` (2 more replies)
  1 sibling, 3 replies; 16+ messages in thread
From: Fabiano Rosas @ 2019-05-22 18:08 UTC (permalink / raw)
  To: Shivaprasad G Bhat, imammedo, david, xiaoguangrong.eric, mst
  Cc: qemu-ppc, qemu-devel, sbhat

Shivaprasad G Bhat <sbhat@linux.ibm.com> writes:

> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 6c16d2b120..b6e7d04dcf 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -3,11 +3,13 @@
>  #include "sysemu/hw_accel.h"
>  #include "sysemu/sysemu.h"
>  #include "qemu/log.h"
> +#include "qemu/range.h"
>  #include "qemu/error-report.h"
>  #include "cpu.h"
>  #include "exec/exec-all.h"
>  #include "helper_regs.h"
>  #include "hw/ppc/spapr.h"
> +#include "hw/ppc/spapr_drc.h"
>  #include "hw/ppc/spapr_cpu_core.h"
>  #include "mmu-hash64.h"
>  #include "cpu-models.h"
> @@ -16,6 +18,7 @@
>  #include "hw/ppc/spapr_ovec.h"
>  #include "mmu-book3s-v3.h"
>  #include "hw/mem/memory-device.h"
> +#include "hw/mem/nvdimm.h"
>  
>  static bool has_spr(PowerPCCPU *cpu, int spr)
>  {
> @@ -1795,6 +1798,199 @@ static target_ulong h_update_dt(PowerPCCPU *cpu, SpaprMachineState *spapr,
>      return H_SUCCESS;
>  }
>  
> +static target_ulong h_scm_read_metadata(PowerPCCPU *cpu,
> +                                        SpaprMachineState *spapr,
> +                                        target_ulong opcode,
> +                                        target_ulong *args)
> +{
> +    uint32_t drc_index = args[0];
> +    uint64_t offset = args[1];
> +    uint64_t numBytesToRead = args[2];

This variable's case is inconsistent with the rest of the file.

> +    SpaprDrc *drc = spapr_drc_by_index(drc_index);
> +    NVDIMMDevice *nvdimm = NULL;
> +    NVDIMMClass *ddc = NULL;
> +
> +    if (drc && spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
> +        return H_PARAMETER;
> +    }
> +
> +    if (numBytesToRead != 1 && numBytesToRead != 2 &&
> +        numBytesToRead != 4 && numBytesToRead != 8) {
> +        return H_P3;
> +    }
> +
> +    nvdimm = NVDIMM(drc->dev);
> +    if ((offset + numBytesToRead < offset) ||
> +        (nvdimm->label_size < numBytesToRead + offset)) {
> +        return H_P2;
> +    }

Won't the first clause always be false? Considering they're both uint64_t.

> +
> +    ddc = NVDIMM_GET_CLASS(nvdimm);
> +    ddc->read_label_data(nvdimm, &args[0], numBytesToRead, offset);
> +
> +    return H_SUCCESS;
> +}
> +
> +
> +static target_ulong h_scm_write_metadata(PowerPCCPU *cpu,
> +                                         SpaprMachineState *spapr,
> +                                         target_ulong opcode,
> +                                         target_ulong *args)
> +{
> +    uint32_t drc_index = args[0];
> +    uint64_t offset = args[1];
> +    uint64_t data = args[2];
> +    int8_t numBytesToWrite = args[3];

Likewise.

> +    SpaprDrc *drc = spapr_drc_by_index(drc_index);
> +    NVDIMMDevice *nvdimm = NULL;
> +    DeviceState *dev = NULL;
> +    NVDIMMClass *ddc = NULL;
> +
> +    if (drc && spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
> +        return H_PARAMETER;
> +    }
> +
> +    if (numBytesToWrite != 1 && numBytesToWrite != 2 &&
> +        numBytesToWrite != 4 && numBytesToWrite != 8) {
> +        return H_P4;
> +    }
> +
> +    dev = drc->dev;
> +    nvdimm = NVDIMM(dev);
> +    if ((nvdimm->label_size < numBytesToWrite + offset) ||
> +        (offset + numBytesToWrite < offset)) {
> +        return H_P2;
> +    }
> +
> +    ddc = NVDIMM_GET_CLASS(nvdimm);
> +    ddc->write_label_data(nvdimm, &data, numBytesToWrite, offset);
> +
> +    return H_SUCCESS;
> +}
> +
> +static target_ulong h_scm_bind_mem(PowerPCCPU *cpu, SpaprMachineState *spapr,
> +                                        target_ulong opcode,
> +                                        target_ulong *args)
> +{
> +    uint32_t drc_index = args[0];
> +    uint64_t starting_idx = args[1];
> +    uint64_t no_of_scm_blocks_to_bind = args[2];
> +    uint64_t target_logical_mem_addr = args[3];
> +    uint64_t continue_token = args[4];
> +    uint64_t size;
> +    uint64_t total_no_of_scm_blocks;
> +
> +    SpaprDrc *drc = spapr_drc_by_index(drc_index);
> +    hwaddr addr;
> +    DeviceState *dev = NULL;
> +    PCDIMMDevice *dimm = NULL;
> +    Error *local_err = NULL;
> +
> +    if (drc && spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
> +        return H_PARAMETER;
> +    }
> +
> +    dev = drc->dev;
> +    dimm = PC_DIMM(dev);
> +
> +    size = object_property_get_uint(OBJECT(dimm),
> +                                    PC_DIMM_SIZE_PROP, &local_err);
> +    if (local_err) {
> +        error_report_err(local_err);
> +        return H_PARAMETER;
> +    }
> +
> +    total_no_of_scm_blocks = size / SPAPR_MINIMUM_SCM_BLOCK_SIZE;
> +
> +    if ((starting_idx > total_no_of_scm_blocks) ||
> +        (no_of_scm_blocks_to_bind > total_no_of_scm_blocks)) {
> +        return H_P2;
> +    }
> +
> +    if (((starting_idx + no_of_scm_blocks_to_bind) < starting_idx) ||
> +        ((starting_idx + no_of_scm_blocks_to_bind) > total_no_of_scm_blocks)) {
> +        return H_P3;
> +    }

Same here.

> +
> +    /* Currently qemu assigns the address. */
> +    if (target_logical_mem_addr != 0xffffffffffffffff) {
> +        return H_OVERLAP;
> +    }
> +
> +    /*
> +     * Currently continue token should be zero qemu has already bound
> +     * everything and this hcall doesnt return H_BUSY.
> +     */
> +    if (continue_token > 0) {
> +        return H_P5;
> +    }
> +
> +    /* NB : Already bound, Return target logical address in R4 */
> +    addr = object_property_get_uint(OBJECT(dimm),
> +                                    PC_DIMM_ADDR_PROP, &local_err);
> +    if (local_err) {
> +        error_report_err(local_err);
> +        return H_PARAMETER;
> +    }
> +
> +    args[1] = addr;
> +    args[2] = no_of_scm_blocks_to_bind;
> +
> +    return H_SUCCESS;
> +}
> +
> +static target_ulong h_scm_unbind_mem(PowerPCCPU *cpu, SpaprMachineState *spapr,
> +                                        target_ulong opcode,
> +                                        target_ulong *args)
> +{
> +    uint32_t drc_index = args[0];
> +    uint64_t starting_scm_logical_addr = args[1];
> +    uint64_t no_of_scm_blocks_to_unbind = args[2];
> +    uint64_t size_to_unbind;
> +    uint64_t continue_token = args[3];
> +    Range blockrange = range_empty;
> +    Range nvdimmrange = range_empty;
> +    SpaprDrc *drc = spapr_drc_by_index(drc_index);
> +    DeviceState *dev = NULL;
> +    PCDIMMDevice *dimm = NULL;
> +    uint64_t size, addr;
> +
> +    if (drc && spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
> +        return H_PARAMETER;
> +    }
> +
> +    /* Check if starting_scm_logical_addr is block aligned */
> +    if (!QEMU_IS_ALIGNED(starting_scm_logical_addr,
> +                         SPAPR_MINIMUM_SCM_BLOCK_SIZE)) {
> +        return H_P2;
> +    }
> +
> +    dev = drc->dev;
> +    dimm = PC_DIMM(dev);
> +    size = object_property_get_int(OBJECT(dimm), PC_DIMM_SIZE_PROP, NULL);
> +    addr = object_property_get_int(OBJECT(dimm), PC_DIMM_ADDR_PROP, NULL);
> +
> +    range_init_nofail(&nvdimmrange, addr, size);
> +
> +    size_to_unbind = no_of_scm_blocks_to_unbind * SPAPR_MINIMUM_SCM_BLOCK_SIZE;
> +
> +
> +    range_init_nofail(&blockrange, starting_scm_logical_addr, size_to_unbind);
> +
> +    if (!range_contains_range(&nvdimmrange, &blockrange)) {
> +        return H_P3;
> +    }
> +
> +    if (continue_token > 0) {
> +        return H_P3;
> +    }
> +
> +    args[1] = no_of_scm_blocks_to_unbind;
> +
> +    /*NB : dont do anything, let object_del take care of this for now. */
> +    return H_SUCCESS;
> +}
> +
>  static spapr_hcall_fn papr_hypercall_table[(MAX_HCALL_OPCODE / 4) + 1];
>  static spapr_hcall_fn kvmppc_hypercall_table[KVMPPC_HCALL_MAX - KVMPPC_HCALL_BASE + 1];
>  
> @@ -1894,6 +2090,12 @@ static void hypercall_register_types(void)
>      /* qemu/KVM-PPC specific hcalls */
>      spapr_register_hypercall(KVMPPC_H_RTAS, h_rtas);
>  
> +    /* qemu/scm specific hcalls */
> +    spapr_register_hypercall(H_SCM_READ_METADATA, h_scm_read_metadata);
> +    spapr_register_hypercall(H_SCM_WRITE_METADATA, h_scm_write_metadata);
> +    spapr_register_hypercall(H_SCM_BIND_MEM, h_scm_bind_mem);
> +    spapr_register_hypercall(H_SCM_UNBIND_MEM, h_scm_unbind_mem);
> +
>      /* ibm,client-architecture-support support */
>      spapr_register_hypercall(KVMPPC_H_CAS, h_client_architecture_support);
>  
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 394ea26335..48e2cc9d67 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -283,6 +283,7 @@ struct SpaprMachineState {
>  #define H_P7              -60
>  #define H_P8              -61
>  #define H_P9              -62
> +#define H_OVERLAP         -68
>  #define H_UNSUPPORTED_FLAG -256
>  #define H_MULTI_THREADS_ACTIVE -9005
>  
> @@ -490,8 +491,12 @@ struct SpaprMachineState {
>  #define H_INT_ESB               0x3C8
>  #define H_INT_SYNC              0x3CC
>  #define H_INT_RESET             0x3D0
> +#define H_SCM_READ_METADATA     0x3E4
> +#define H_SCM_WRITE_METADATA    0x3E8
> +#define H_SCM_BIND_MEM          0x3EC
> +#define H_SCM_UNBIND_MEM        0x3F0
>  
> -#define MAX_HCALL_OPCODE        H_INT_RESET
> +#define MAX_HCALL_OPCODE        H_SCM_UNBIND_MEM
>  
>  /* The hcalls above are standardized in PAPR and implemented by pHyp
>   * as well.



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 3/3] spapr: Add Hcalls to support PAPR NVDIMM device
  2019-05-22 18:08   ` Fabiano Rosas
@ 2019-05-22 18:11     ` Fabiano Rosas
  2019-05-23  0:46     ` David Gibson
  2019-07-11  5:51     ` Shivaprasad G Bhat
  2 siblings, 0 replies; 16+ messages in thread
From: Fabiano Rosas @ 2019-05-22 18:11 UTC (permalink / raw)
  To: Shivaprasad G Bhat, imammedo, david, xiaoguangrong.eric, mst
  Cc: qemu-ppc, qemu-devel, sbhat

Fabiano Rosas <farosas@linux.ibm.com> writes:

>> +    nvdimm = NVDIMM(drc->dev);
>> +    if ((offset + numBytesToRead < offset) ||
>> +        (nvdimm->label_size < numBytesToRead + offset)) {
>> +        return H_P2;
>> +    }
>
> Won't the first clause always be false? Considering they're both uint64_t.

Neverming this question. I just saw that David asked for it in the previous
version.



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 3/3] spapr: Add Hcalls to support PAPR NVDIMM device
  2019-05-22 18:08   ` Fabiano Rosas
  2019-05-22 18:11     ` Fabiano Rosas
@ 2019-05-23  0:46     ` David Gibson
  2019-07-11  5:51     ` Shivaprasad G Bhat
  2 siblings, 0 replies; 16+ messages in thread
From: David Gibson @ 2019-05-23  0:46 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: xiaoguangrong.eric, Shivaprasad G Bhat, mst, qemu-devel,
	qemu-ppc, imammedo

[-- Attachment #1: Type: text/plain, Size: 10423 bytes --]

On Wed, May 22, 2019 at 03:08:34PM -0300, Fabiano Rosas wrote:
> Shivaprasad G Bhat <sbhat@linux.ibm.com> writes:
> 
> > diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> > index 6c16d2b120..b6e7d04dcf 100644
> > --- a/hw/ppc/spapr_hcall.c
> > +++ b/hw/ppc/spapr_hcall.c
> > @@ -3,11 +3,13 @@
> >  #include "sysemu/hw_accel.h"
> >  #include "sysemu/sysemu.h"
> >  #include "qemu/log.h"
> > +#include "qemu/range.h"
> >  #include "qemu/error-report.h"
> >  #include "cpu.h"
> >  #include "exec/exec-all.h"
> >  #include "helper_regs.h"
> >  #include "hw/ppc/spapr.h"
> > +#include "hw/ppc/spapr_drc.h"
> >  #include "hw/ppc/spapr_cpu_core.h"
> >  #include "mmu-hash64.h"
> >  #include "cpu-models.h"
> > @@ -16,6 +18,7 @@
> >  #include "hw/ppc/spapr_ovec.h"
> >  #include "mmu-book3s-v3.h"
> >  #include "hw/mem/memory-device.h"
> > +#include "hw/mem/nvdimm.h"
> >  
> >  static bool has_spr(PowerPCCPU *cpu, int spr)
> >  {
> > @@ -1795,6 +1798,199 @@ static target_ulong h_update_dt(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >      return H_SUCCESS;
> >  }
> >  
> > +static target_ulong h_scm_read_metadata(PowerPCCPU *cpu,
> > +                                        SpaprMachineState *spapr,
> > +                                        target_ulong opcode,
> > +                                        target_ulong *args)
> > +{
> > +    uint32_t drc_index = args[0];
> > +    uint64_t offset = args[1];
> > +    uint64_t numBytesToRead = args[2];
> 
> This variable's case is inconsistent with the rest of the file.
> 
> > +    SpaprDrc *drc = spapr_drc_by_index(drc_index);
> > +    NVDIMMDevice *nvdimm = NULL;
> > +    NVDIMMClass *ddc = NULL;
> > +
> > +    if (drc && spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
> > +        return H_PARAMETER;
> > +    }
> > +
> > +    if (numBytesToRead != 1 && numBytesToRead != 2 &&
> > +        numBytesToRead != 4 && numBytesToRead != 8) {
> > +        return H_P3;
> > +    }
> > +
> > +    nvdimm = NVDIMM(drc->dev);
> > +    if ((offset + numBytesToRead < offset) ||
> > +        (nvdimm->label_size < numBytesToRead + offset)) {
> > +        return H_P2;
> > +    }
> 
> Won't the first clause always be false? Considering they're both
> uint64_t.

Generally yes, but that's caleed supplied input, so we need to protect
against 64-bit overflow.

> 
> > +
> > +    ddc = NVDIMM_GET_CLASS(nvdimm);
> > +    ddc->read_label_data(nvdimm, &args[0], numBytesToRead, offset);
> > +
> > +    return H_SUCCESS;
> > +}
> > +
> > +
> > +static target_ulong h_scm_write_metadata(PowerPCCPU *cpu,
> > +                                         SpaprMachineState *spapr,
> > +                                         target_ulong opcode,
> > +                                         target_ulong *args)
> > +{
> > +    uint32_t drc_index = args[0];
> > +    uint64_t offset = args[1];
> > +    uint64_t data = args[2];
> > +    int8_t numBytesToWrite = args[3];
> 
> Likewise.
> 
> > +    SpaprDrc *drc = spapr_drc_by_index(drc_index);
> > +    NVDIMMDevice *nvdimm = NULL;
> > +    DeviceState *dev = NULL;
> > +    NVDIMMClass *ddc = NULL;
> > +
> > +    if (drc && spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
> > +        return H_PARAMETER;
> > +    }
> > +
> > +    if (numBytesToWrite != 1 && numBytesToWrite != 2 &&
> > +        numBytesToWrite != 4 && numBytesToWrite != 8) {
> > +        return H_P4;
> > +    }
> > +
> > +    dev = drc->dev;
> > +    nvdimm = NVDIMM(dev);
> > +    if ((nvdimm->label_size < numBytesToWrite + offset) ||
> > +        (offset + numBytesToWrite < offset)) {
> > +        return H_P2;
> > +    }
> > +
> > +    ddc = NVDIMM_GET_CLASS(nvdimm);
> > +    ddc->write_label_data(nvdimm, &data, numBytesToWrite, offset);
> > +
> > +    return H_SUCCESS;
> > +}
> > +
> > +static target_ulong h_scm_bind_mem(PowerPCCPU *cpu, SpaprMachineState *spapr,
> > +                                        target_ulong opcode,
> > +                                        target_ulong *args)
> > +{
> > +    uint32_t drc_index = args[0];
> > +    uint64_t starting_idx = args[1];
> > +    uint64_t no_of_scm_blocks_to_bind = args[2];
> > +    uint64_t target_logical_mem_addr = args[3];
> > +    uint64_t continue_token = args[4];
> > +    uint64_t size;
> > +    uint64_t total_no_of_scm_blocks;
> > +
> > +    SpaprDrc *drc = spapr_drc_by_index(drc_index);
> > +    hwaddr addr;
> > +    DeviceState *dev = NULL;
> > +    PCDIMMDevice *dimm = NULL;
> > +    Error *local_err = NULL;
> > +
> > +    if (drc && spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
> > +        return H_PARAMETER;
> > +    }
> > +
> > +    dev = drc->dev;
> > +    dimm = PC_DIMM(dev);
> > +
> > +    size = object_property_get_uint(OBJECT(dimm),
> > +                                    PC_DIMM_SIZE_PROP, &local_err);
> > +    if (local_err) {
> > +        error_report_err(local_err);
> > +        return H_PARAMETER;
> > +    }
> > +
> > +    total_no_of_scm_blocks = size / SPAPR_MINIMUM_SCM_BLOCK_SIZE;
> > +
> > +    if ((starting_idx > total_no_of_scm_blocks) ||
> > +        (no_of_scm_blocks_to_bind > total_no_of_scm_blocks)) {
> > +        return H_P2;
> > +    }
> > +
> > +    if (((starting_idx + no_of_scm_blocks_to_bind) < starting_idx) ||
> > +        ((starting_idx + no_of_scm_blocks_to_bind) > total_no_of_scm_blocks)) {
> > +        return H_P3;
> > +    }
> 
> Same here.
> 
> > +
> > +    /* Currently qemu assigns the address. */
> > +    if (target_logical_mem_addr != 0xffffffffffffffff) {
> > +        return H_OVERLAP;
> > +    }
> > +
> > +    /*
> > +     * Currently continue token should be zero qemu has already bound
> > +     * everything and this hcall doesnt return H_BUSY.
> > +     */
> > +    if (continue_token > 0) {
> > +        return H_P5;
> > +    }
> > +
> > +    /* NB : Already bound, Return target logical address in R4 */
> > +    addr = object_property_get_uint(OBJECT(dimm),
> > +                                    PC_DIMM_ADDR_PROP, &local_err);
> > +    if (local_err) {
> > +        error_report_err(local_err);
> > +        return H_PARAMETER;
> > +    }
> > +
> > +    args[1] = addr;
> > +    args[2] = no_of_scm_blocks_to_bind;
> > +
> > +    return H_SUCCESS;
> > +}
> > +
> > +static target_ulong h_scm_unbind_mem(PowerPCCPU *cpu, SpaprMachineState *spapr,
> > +                                        target_ulong opcode,
> > +                                        target_ulong *args)
> > +{
> > +    uint32_t drc_index = args[0];
> > +    uint64_t starting_scm_logical_addr = args[1];
> > +    uint64_t no_of_scm_blocks_to_unbind = args[2];
> > +    uint64_t size_to_unbind;
> > +    uint64_t continue_token = args[3];
> > +    Range blockrange = range_empty;
> > +    Range nvdimmrange = range_empty;
> > +    SpaprDrc *drc = spapr_drc_by_index(drc_index);
> > +    DeviceState *dev = NULL;
> > +    PCDIMMDevice *dimm = NULL;
> > +    uint64_t size, addr;
> > +
> > +    if (drc && spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
> > +        return H_PARAMETER;
> > +    }
> > +
> > +    /* Check if starting_scm_logical_addr is block aligned */
> > +    if (!QEMU_IS_ALIGNED(starting_scm_logical_addr,
> > +                         SPAPR_MINIMUM_SCM_BLOCK_SIZE)) {
> > +        return H_P2;
> > +    }
> > +
> > +    dev = drc->dev;
> > +    dimm = PC_DIMM(dev);
> > +    size = object_property_get_int(OBJECT(dimm), PC_DIMM_SIZE_PROP, NULL);
> > +    addr = object_property_get_int(OBJECT(dimm), PC_DIMM_ADDR_PROP, NULL);
> > +
> > +    range_init_nofail(&nvdimmrange, addr, size);
> > +
> > +    size_to_unbind = no_of_scm_blocks_to_unbind * SPAPR_MINIMUM_SCM_BLOCK_SIZE;
> > +
> > +
> > +    range_init_nofail(&blockrange, starting_scm_logical_addr, size_to_unbind);
> > +
> > +    if (!range_contains_range(&nvdimmrange, &blockrange)) {
> > +        return H_P3;
> > +    }
> > +
> > +    if (continue_token > 0) {
> > +        return H_P3;
> > +    }
> > +
> > +    args[1] = no_of_scm_blocks_to_unbind;
> > +
> > +    /*NB : dont do anything, let object_del take care of this for now. */
> > +    return H_SUCCESS;
> > +}
> > +
> >  static spapr_hcall_fn papr_hypercall_table[(MAX_HCALL_OPCODE / 4) + 1];
> >  static spapr_hcall_fn kvmppc_hypercall_table[KVMPPC_HCALL_MAX - KVMPPC_HCALL_BASE + 1];
> >  
> > @@ -1894,6 +2090,12 @@ static void hypercall_register_types(void)
> >      /* qemu/KVM-PPC specific hcalls */
> >      spapr_register_hypercall(KVMPPC_H_RTAS, h_rtas);
> >  
> > +    /* qemu/scm specific hcalls */
> > +    spapr_register_hypercall(H_SCM_READ_METADATA, h_scm_read_metadata);
> > +    spapr_register_hypercall(H_SCM_WRITE_METADATA, h_scm_write_metadata);
> > +    spapr_register_hypercall(H_SCM_BIND_MEM, h_scm_bind_mem);
> > +    spapr_register_hypercall(H_SCM_UNBIND_MEM, h_scm_unbind_mem);
> > +
> >      /* ibm,client-architecture-support support */
> >      spapr_register_hypercall(KVMPPC_H_CAS, h_client_architecture_support);
> >  
> > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > index 394ea26335..48e2cc9d67 100644
> > --- a/include/hw/ppc/spapr.h
> > +++ b/include/hw/ppc/spapr.h
> > @@ -283,6 +283,7 @@ struct SpaprMachineState {
> >  #define H_P7              -60
> >  #define H_P8              -61
> >  #define H_P9              -62
> > +#define H_OVERLAP         -68
> >  #define H_UNSUPPORTED_FLAG -256
> >  #define H_MULTI_THREADS_ACTIVE -9005
> >  
> > @@ -490,8 +491,12 @@ struct SpaprMachineState {
> >  #define H_INT_ESB               0x3C8
> >  #define H_INT_SYNC              0x3CC
> >  #define H_INT_RESET             0x3D0
> > +#define H_SCM_READ_METADATA     0x3E4
> > +#define H_SCM_WRITE_METADATA    0x3E8
> > +#define H_SCM_BIND_MEM          0x3EC
> > +#define H_SCM_UNBIND_MEM        0x3F0
> >  
> > -#define MAX_HCALL_OPCODE        H_INT_RESET
> > +#define MAX_HCALL_OPCODE        H_SCM_UNBIND_MEM
> >  
> >  /* The hcalls above are standardized in PAPR and implemented by pHyp
> >   * as well.
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 3/3] spapr: Add Hcalls to support PAPR NVDIMM device
  2019-05-14  4:38   ` David Gibson
@ 2019-07-11  5:46     ` Shivaprasad G Bhat
  0 siblings, 0 replies; 16+ messages in thread
From: Shivaprasad G Bhat @ 2019-07-11  5:46 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-devel, imammedo, qemu-ppc, xiaoguangrong.eric, mst



On 05/14/2019 10:08 AM, David Gibson wrote:
> On Mon, May 13, 2019 at 04:28:36AM -0500, Shivaprasad G Bhat wrote:
>> This patch implements few of the necessary hcalls for the nvdimm support.
>>
>> PAPR semantics is such that each NVDIMM device is comprising of multiple
>> SCM(Storage Class Memory) blocks. The guest requests the hypervisor to bind
>> each of the SCM blocks of the NVDIMM device using hcalls. There can be
>> SCM block unbind requests in case of driver errors or unplug(not supported now)
>> use cases. The NVDIMM label read/writes are done through hcalls.
>>
>> Since each virtual NVDIMM device is divided into multiple SCM blocks, the bind,
>> unbind, and queries using hcalls on those blocks can come independently. This
>> doesn't fit well into the qemu device semantics, where the map/unmap are done
>> at the (whole)device/object level granularity. The patch doesnt actually
>> bind/unbind on hcalls but let it happen at the object_add/del phase itself
>> instead.
>>
>> The guest kernel makes bind/unbind requests for the virtual NVDIMM device at the
>> region level granularity. Without interleaving, each virtual NVDIMM device is
>> presented as separate region. There is no way to configure the virtual NVDIMM
>> interleaving for the guests today. So, there is no way a partial bind/unbind
>> request can come for the vNVDIMM in a hcall for a subset of SCM blocks of a
>> virtual NVDIMM. Hence it is safe to do bind/unbind everything during the
>> object_add/del.
>>
>> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
>> ---
>>   hw/ppc/spapr_hcall.c   |  202 ++++++++++++++++++++++++++++++++++++++++++++++++
>>   include/hw/ppc/spapr.h |    7 +-
>>   2 files changed, 208 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
>> index 6c16d2b120..b6e7d04dcf 100644
>> --- a/hw/ppc/spapr_hcall.c
>> +++ b/hw/ppc/spapr_hcall.c
>> @@ -3,11 +3,13 @@
>>   #include "sysemu/hw_accel.h"
>>   #include "sysemu/sysemu.h"
>>   #include "qemu/log.h"
>> +#include "qemu/range.h"
>>   #include "qemu/error-report.h"
>>   #include "cpu.h"
>>   #include "exec/exec-all.h"
>>   #include "helper_regs.h"
>>   #include "hw/ppc/spapr.h"
>> +#include "hw/ppc/spapr_drc.h"
>>   #include "hw/ppc/spapr_cpu_core.h"
>>   #include "mmu-hash64.h"
>>   #include "cpu-models.h"
>> @@ -16,6 +18,7 @@
>>   #include "hw/ppc/spapr_ovec.h"
>>   #include "mmu-book3s-v3.h"
>>   #include "hw/mem/memory-device.h"
>> +#include "hw/mem/nvdimm.h"
>>   
>>   static bool has_spr(PowerPCCPU *cpu, int spr)
>>   {
>> @@ -1795,6 +1798,199 @@ static target_ulong h_update_dt(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>       return H_SUCCESS;
>>   }
>>   
>> +static target_ulong h_scm_read_metadata(PowerPCCPU *cpu,
>> +                                        SpaprMachineState *spapr,
>> +                                        target_ulong opcode,
>> +                                        target_ulong *args)
>> +{
>> +    uint32_t drc_index = args[0];
>> +    uint64_t offset = args[1];
>> +    uint64_t numBytesToRead = args[2];
>> +    SpaprDrc *drc = spapr_drc_by_index(drc_index);
>> +    NVDIMMDevice *nvdimm = NULL;
>> +    NVDIMMClass *ddc = NULL;
>> +
>> +    if (drc && spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
>> +        return H_PARAMETER;
>> +    }
>> +
>> +    if (numBytesToRead != 1 && numBytesToRead != 2 &&
>> +        numBytesToRead != 4 && numBytesToRead != 8) {
>> +        return H_P3;
>> +    }
>> +
>> +    nvdimm = NVDIMM(drc->dev);
>> +    if ((offset + numBytesToRead < offset) ||
>> +        (nvdimm->label_size < numBytesToRead + offset)) {
>> +        return H_P2;
>> +    }
>> +
>> +    ddc = NVDIMM_GET_CLASS(nvdimm);
>> +    ddc->read_label_data(nvdimm, &args[0], numBytesToRead, offset);
> I'm pretty sure this will need some sort of byteswap.  args[0] is
> effectively in host native order (since it maps to a register, not
> memory).  But the guest will have to assume some byteorder (BE, I'm
> guessing, because PAPR) in order to map that register to an in-memory
> byteorder.  So, you'll need to compensate for that here.

You are right, I have figured out the required changes working with the 
kernel
team. Will post it in the next version.

>
>> +    return H_SUCCESS;
>> +}
>> +
>> +
>> +static target_ulong h_scm_write_metadata(PowerPCCPU *cpu,
>> +                                         SpaprMachineState *spapr,
>> +                                         target_ulong opcode,
>> +                                         target_ulong *args)
>> +{
>> +    uint32_t drc_index = args[0];
>> +    uint64_t offset = args[1];
>> +    uint64_t data = args[2];
>> +    int8_t numBytesToWrite = args[3];
>> +    SpaprDrc *drc = spapr_drc_by_index(drc_index);
>> +    NVDIMMDevice *nvdimm = NULL;
>> +    DeviceState *dev = NULL;
>> +    NVDIMMClass *ddc = NULL;
>> +
>> +    if (drc && spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
>> +        return H_PARAMETER;
>> +    }
>> +
>> +    if (numBytesToWrite != 1 && numBytesToWrite != 2 &&
>> +        numBytesToWrite != 4 && numBytesToWrite != 8) {
>> +        return H_P4;
>> +    }
>> +
>> +    dev = drc->dev;
>> +    nvdimm = NVDIMM(dev);
>> +    if ((nvdimm->label_size < numBytesToWrite + offset) ||
>> +        (offset + numBytesToWrite < offset)) {
>> +        return H_P2;
>> +    }
>> +
>> +    ddc = NVDIMM_GET_CLASS(nvdimm);
>> +    ddc->write_label_data(nvdimm, &data, numBytesToWrite, offset);
> Likewise here.
>
>> +
>> +    return H_SUCCESS;
>> +}
>> +
>> +static target_ulong h_scm_bind_mem(PowerPCCPU *cpu, SpaprMachineState *spapr,
>> +                                        target_ulong opcode,
>> +                                        target_ulong *args)
>> +{
>> +    uint32_t drc_index = args[0];
>> +    uint64_t starting_idx = args[1];
>> +    uint64_t no_of_scm_blocks_to_bind = args[2];
>> +    uint64_t target_logical_mem_addr = args[3];
>> +    uint64_t continue_token = args[4];
>> +    uint64_t size;
>> +    uint64_t total_no_of_scm_blocks;
>> +
>> +    SpaprDrc *drc = spapr_drc_by_index(drc_index);
>> +    hwaddr addr;
>> +    DeviceState *dev = NULL;
>> +    PCDIMMDevice *dimm = NULL;
>> +    Error *local_err = NULL;
>> +
>> +    if (drc && spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
>> +        return H_PARAMETER;
>> +    }
>> +
>> +    dev = drc->dev;
>> +    dimm = PC_DIMM(dev);
>> +
>> +    size = object_property_get_uint(OBJECT(dimm),
>> +                                    PC_DIMM_SIZE_PROP, &local_err);
>> +    if (local_err) {
>> +        error_report_err(local_err);
>> +        return H_PARAMETER;
>> +    }
>> +
>> +    total_no_of_scm_blocks = size / SPAPR_MINIMUM_SCM_BLOCK_SIZE;
>> +
>> +    if ((starting_idx > total_no_of_scm_blocks) ||
>> +        (no_of_scm_blocks_to_bind > total_no_of_scm_blocks)) {
>> +        return H_P2;
>> +    }
>> +
>> +    if (((starting_idx + no_of_scm_blocks_to_bind) < starting_idx) ||
>> +        ((starting_idx + no_of_scm_blocks_to_bind) > total_no_of_scm_blocks)) {
>> +        return H_P3;
>> +    }
>> +
>> +    /* Currently qemu assigns the address. */
>> +    if (target_logical_mem_addr != 0xffffffffffffffff) {
>> +        return H_OVERLAP;
> So, only supporting one mode of the interface is reasonable.  However,
> it seems a somewhat unnatural way to handle the PAPR interface.  It
> seems to me it would be more natural to instantiate the underlying
> NVDIMM objects so that they're not in address_space_memory, but rather
> in their own initially inaccssible address_space.  Then the BIND call
> would alias a chunk of address_space_memory into the NVDIMMs address
> space.

The pre-plug checks require the memory region to be initialized before 
reaching
there. So, I cant avoid the minimal initialization with the size calling
memory_region_init() at the nvdimm_prepare_memory_region(). If I delay
the aliasing till the bind hcall, things seem to work.

Are you suggesting me to do something like this?

https://github.ibm.com/shivapbh/qemu-1/commit/04e0a5c7ef71db942be5ced936fde93dd7bb3ee4

>> +    }
>> +
>> +    /*
>> +     * Currently continue token should be zero qemu has already bound
>> +     * everything and this hcall doesnt return H_BUSY.
>> +     */
>> +    if (continue_token > 0) {
>> +        return H_P5;
>> +    }
>> +
>> +    /* NB : Already bound, Return target logical address in R4 */
>> +    addr = object_property_get_uint(OBJECT(dimm),
>> +                                    PC_DIMM_ADDR_PROP, &local_err);
>> +    if (local_err) {
>> +        error_report_err(local_err);
>> +        return H_PARAMETER;
>> +    }
>> +
>> +    args[1] = addr;
> Don't you need to adjust this if start_idx is non-zero?

Since I don't support partial bind request, start_idx can never be 
non-zero. I think
I need to enforce it with more checks here.

>
>> +    args[2] = no_of_scm_blocks_to_bind;
>> +
>> +    return H_SUCCESS;
>> +}
>> +
>> +static target_ulong h_scm_unbind_mem(PowerPCCPU *cpu, SpaprMachineState *spapr,
>> +                                        target_ulong opcode,
>> +                                        target_ulong *args)
>> +{
>> +    uint32_t drc_index = args[0];
>> +    uint64_t starting_scm_logical_addr = args[1];
>> +    uint64_t no_of_scm_blocks_to_unbind = args[2];
>> +    uint64_t size_to_unbind;
>> +    uint64_t continue_token = args[3];
>> +    Range blockrange = range_empty;
>> +    Range nvdimmrange = range_empty;
>> +    SpaprDrc *drc = spapr_drc_by_index(drc_index);
>> +    DeviceState *dev = NULL;
>> +    PCDIMMDevice *dimm = NULL;
>> +    uint64_t size, addr;
>> +
>> +    if (drc && spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
>> +        return H_PARAMETER;
>> +    }
>> +
>> +    /* Check if starting_scm_logical_addr is block aligned */
>> +    if (!QEMU_IS_ALIGNED(starting_scm_logical_addr,
>> +                         SPAPR_MINIMUM_SCM_BLOCK_SIZE)) {
>> +        return H_P2;
>> +    }
>> +
>> +    dev = drc->dev;
>> +    dimm = PC_DIMM(dev);
>> +    size = object_property_get_int(OBJECT(dimm), PC_DIMM_SIZE_PROP, NULL);
>> +    addr = object_property_get_int(OBJECT(dimm), PC_DIMM_ADDR_PROP, NULL);
>> +
>> +    range_init_nofail(&nvdimmrange, addr, size);
>> +
>> +    size_to_unbind = no_of_scm_blocks_to_unbind * SPAPR_MINIMUM_SCM_BLOCK_SIZE;
>> +
>> +
>> +    range_init_nofail(&blockrange, starting_scm_logical_addr, size_to_unbind);
>> +
>> +    if (!range_contains_range(&nvdimmrange, &blockrange)) {
>> +        return H_P3;
>> +    }
>> +
>> +    if (continue_token > 0) {
>> +        return H_P3;
>> +    }
>> +
>> +    args[1] = no_of_scm_blocks_to_unbind;
>> +
>> +    /*NB : dont do anything, let object_del take care of this for now. */
>> +    return H_SUCCESS;
>> +}
>> +
>>   static spapr_hcall_fn papr_hypercall_table[(MAX_HCALL_OPCODE / 4) + 1];
>>   static spapr_hcall_fn kvmppc_hypercall_table[KVMPPC_HCALL_MAX - KVMPPC_HCALL_BASE + 1];
>>   
>> @@ -1894,6 +2090,12 @@ static void hypercall_register_types(void)
>>       /* qemu/KVM-PPC specific hcalls */
>>       spapr_register_hypercall(KVMPPC_H_RTAS, h_rtas);
>>   
>> +    /* qemu/scm specific hcalls */
>> +    spapr_register_hypercall(H_SCM_READ_METADATA, h_scm_read_metadata);
>> +    spapr_register_hypercall(H_SCM_WRITE_METADATA, h_scm_write_metadata);
>> +    spapr_register_hypercall(H_SCM_BIND_MEM, h_scm_bind_mem);
>> +    spapr_register_hypercall(H_SCM_UNBIND_MEM, h_scm_unbind_mem);
>> +
>>       /* ibm,client-architecture-support support */
>>       spapr_register_hypercall(KVMPPC_H_CAS, h_client_architecture_support);
>>   
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 394ea26335..48e2cc9d67 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -283,6 +283,7 @@ struct SpaprMachineState {
>>   #define H_P7              -60
>>   #define H_P8              -61
>>   #define H_P9              -62
>> +#define H_OVERLAP         -68
>>   #define H_UNSUPPORTED_FLAG -256
>>   #define H_MULTI_THREADS_ACTIVE -9005
>>   
>> @@ -490,8 +491,12 @@ struct SpaprMachineState {
>>   #define H_INT_ESB               0x3C8
>>   #define H_INT_SYNC              0x3CC
>>   #define H_INT_RESET             0x3D0
>> +#define H_SCM_READ_METADATA     0x3E4
>> +#define H_SCM_WRITE_METADATA    0x3E8
>> +#define H_SCM_BIND_MEM          0x3EC
>> +#define H_SCM_UNBIND_MEM        0x3F0
>>   
>> -#define MAX_HCALL_OPCODE        H_INT_RESET
>> +#define MAX_HCALL_OPCODE        H_SCM_UNBIND_MEM
>>   
>>   /* The hcalls above are standardized in PAPR and implemented by pHyp
>>    * as well.
>>



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 3/3] spapr: Add Hcalls to support PAPR NVDIMM device
  2019-05-22 18:08   ` Fabiano Rosas
  2019-05-22 18:11     ` Fabiano Rosas
  2019-05-23  0:46     ` David Gibson
@ 2019-07-11  5:51     ` Shivaprasad G Bhat
  2 siblings, 0 replies; 16+ messages in thread
From: Shivaprasad G Bhat @ 2019-07-11  5:51 UTC (permalink / raw)
  To: Fabiano Rosas, imammedo, david, xiaoguangrong.eric, mst
  Cc: qemu-ppc, qemu-devel

Thanks for the comments Fabiano.


On 05/22/2019 11:38 PM, Fabiano Rosas wrote:
> Shivaprasad G Bhat <sbhat@linux.ibm.com> writes:
>
> +
>> +    ddc = NVDIMM_GET_CLASS(nvdimm);
>> +    ddc->read_label_data(nvdimm, &args[0], numBytesToRead, offset);
>> +
>> +    return H_SUCCESS;
>> +}
>> +
>> +
>> +static target_ulong h_scm_write_metadata(PowerPCCPU *cpu,
>> +                                         SpaprMachineState *spapr,
>> +                                         target_ulong opcode,
>> +                                         target_ulong *args)
>> +{
>> +    uint32_t drc_index = args[0];
>> +    uint64_t offset = args[1];
>> +    uint64_t data = args[2];
>> +    int8_t numBytesToWrite = args[3];
> Likewise.

This is supposed to be uint64_t like used in other places.
Will fix it in the next version.

Rest of the comments I think David has already answered the rational
behind the usage to avoid integer overflow.

>
>> +    SpaprDrc *drc = spapr_drc_by_index(drc_index);
>> +    NVDIMMDevice *nvdimm = NULL;
>> +    DeviceState *dev = NULL;
>>

Regards,
Shivaprasad



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [RFC v2 PATCH 2/3] spapr: Add NVDIMM device support
  2019-05-22 16:07   ` Fabiano Rosas
@ 2019-07-16  7:53     ` Shivaprasad G Bhat
  0 siblings, 0 replies; 16+ messages in thread
From: Shivaprasad G Bhat @ 2019-07-16  7:53 UTC (permalink / raw)
  To: Fabiano Rosas, imammedo, david, xiaoguangrong.eric, mst
  Cc: qemu-ppc, qemu-devel

Hi Fabiano,


On 05/22/2019 09:37 PM, Fabiano Rosas wrote:
> Shivaprasad G Bhat <sbhat@linux.ibm.com> writes:
>
>> +    /* Create DT entries for cold plugged NVDIMM devices */
>> +    dimms = nvdimm_get_device_list();
>> +    for (; dimms; dimms = dimms->next) {
>> +        NVDIMMDevice *nvdimm = dimms->data;
>> +
>> +        spapr_populate_nvdimm_node(fdt, offset, nvdimm);
>> +    }
>> +    g_slist_free(dimms);
> To free the whole list you'll need another variable in the loop above,
> right?

Nope. The  g_slist_free() takes care of freeing the list node pointers.

As I am iterating using the original dimms variable, I would still not
be freeing the list here though, fixing that in next version.

Valgrind pointed out few more leaks in many places, will fix them all.

>
> Cheers



^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2019-07-16  7:53 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-13  9:25 [Qemu-devel] [RFC v2 PATCH 0/3] ppc: spapr: virtual NVDIMM support Shivaprasad G Bhat
2019-05-13  9:27 ` [Qemu-devel] [RFC v2 PATCH 1/3] mem: make nvdimm_device_list global Shivaprasad G Bhat
2019-05-13  9:28 ` [Qemu-devel] [RFC v2 PATCH 2/3] spapr: Add NVDIMM device support Shivaprasad G Bhat
2019-05-14  2:22   ` David Gibson
2019-05-15  7:00     ` Shivaprasad G Bhat
2019-05-16  1:32       ` David Gibson
2019-05-17  7:42         ` Shivaprasad G Bhat
2019-05-22 16:07   ` Fabiano Rosas
2019-07-16  7:53     ` Shivaprasad G Bhat
2019-05-13  9:28 ` [Qemu-devel] [RFC v2 PATCH 3/3] spapr: Add Hcalls to support PAPR NVDIMM device Shivaprasad G Bhat
2019-05-14  4:38   ` David Gibson
2019-07-11  5:46     ` Shivaprasad G Bhat
2019-05-22 18:08   ` Fabiano Rosas
2019-05-22 18:11     ` Fabiano Rosas
2019-05-23  0:46     ` David Gibson
2019-07-11  5:51     ` Shivaprasad G Bhat

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.