All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [QEMU RFC PATCH v2 0/6] Migration: ensure hotplug and migration work together
@ 2016-05-24 17:55 Jianjun Duan
  2016-05-24 17:55 ` [Qemu-devel] [QEMU RFC PATCH v2 1/6] spapr: ensure device trees are always associated with DRC Jianjun Duan
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Jianjun Duan @ 2016-05-24 17:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, mdroth, david, amit.shah, quintela, duanj

v2: - Introduce a general approach to migrate QTAILQ in qemu/queue.h.
    - Migrate signalled field in the DRC state.
    - Put the newly added migrating fields in subsections so that backward migration is not broken.  
    - Set detach_cb field right after migration so that a migrated hot-unplug event could finish its course.

v1: - Inital version.
(link: https://lists.nongnu.org/archive/html/qemu-devel/2016-04/msg02601.html)

To make guest device (PCI, CPU and memory) hotplug work together 
with guest migration, spapr drc state needs be transmitted in
migration. This patch defines the VMStateDescription struct for
spapr drc state to enable it.

To fix the potential racing between hotplug events on guest and 
guest migration, ccs_list and pending_events of spapr state need be 
transmitted in migration. This patch also takes care of it. 


Alexey Kardashevskiy (1):
  vmstate: Define VARRAY with VMS_ALLOC

Jianjun Duan (5):
  spapr: ensure device trees are always associated with DRC
  Migration: Defined VMStateDescription struct for spapr_drc
  Migration: migrate QTAILQ
  Migration: migrate ccs_list in spapr state
  Migration: migrate pending_events of spapr state

 hw/ppc/spapr.c              | 92 ++++++++++++++++++++++++++++++++++++++++-----
 hw/ppc/spapr_drc.c          | 61 ++++++++++++++++++++++++++++++
 hw/ppc/spapr_events.c       | 22 ++++++-----
 hw/ppc/spapr_pci.c          | 34 +++++++++++++----
 include/hw/ppc/spapr.h      |  3 +-
 include/hw/ppc/spapr_drc.h  |  9 +++++
 include/migration/vmstate.h | 69 ++++++++++++++++++++++++++++++++++
 include/qemu/queue.h        | 38 +++++++++++++++++++
 migration/vmstate.c         | 84 +++++++++++++++++++++++++++++++++++++++++
 9 files changed, 385 insertions(+), 27 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Qemu-devel] [QEMU RFC PATCH v2 1/6] spapr: ensure device trees are always associated with DRC
  2016-05-24 17:55 [Qemu-devel] [QEMU RFC PATCH v2 0/6] Migration: ensure hotplug and migration work together Jianjun Duan
@ 2016-05-24 17:55 ` Jianjun Duan
  2016-05-25  5:20   ` David Gibson
  2016-05-24 17:55 ` [Qemu-devel] [QEMU RFC PATCH v2 2/6] Migration: Defined VMStateDescription struct for spapr_drc Jianjun Duan
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Jianjun Duan @ 2016-05-24 17:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, mdroth, david, amit.shah, quintela, duanj

There are possible racing situations involving hotplug events and
guest migration. For cases where a hotplug event is migrated, or
the guest is in the process of fetching device tree at the time of
migration, we need to ensure the device tree is created and
associated with the corresponding DRC for devices that were
hotplugged on the source, but 'coldplugged' on the target.

Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c     | 16 ++++++----------
 hw/ppc/spapr_pci.c | 12 +++++-------
 2 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index b69995e..ec18bee 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2131,15 +2131,6 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr, uint64_t size,
     int i, fdt_offset, fdt_size;
     void *fdt;
 
-    /*
-     * Check for DRC connectors and send hotplug notification to the
-     * guest only in case of hotplugged memory. This allows cold plugged
-     * memory to be specified at boot time.
-     */
-    if (!dev->hotplugged) {
-        return;
-    }
-
     for (i = 0; i < nr_lmbs; i++) {
         drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
                 addr/SPAPR_MEMORY_BLOCK_SIZE);
@@ -2153,7 +2144,12 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr, uint64_t size,
         drck->attach(drc, dev, fdt, fdt_offset, !dev->hotplugged, errp);
         addr += SPAPR_MEMORY_BLOCK_SIZE;
     }
-    spapr_hotplug_req_add_by_count(SPAPR_DR_CONNECTOR_TYPE_LMB, nr_lmbs);
+    /* send hotplug notification to the
+     * guest only in case of hotplugged memory
+     */
+    if (dev->hotplugged) {
+       spapr_hotplug_req_add_by_count(SPAPR_DR_CONNECTOR_TYPE_LMB, nr_lmbs);
+    }
 }
 
 static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 573e635..f0b9de9 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1092,13 +1092,11 @@ static void spapr_phb_add_pci_device(sPAPRDRConnector *drc,
         spapr_tce_set_need_vfio(tcet, true);
     }
 
-    if (dev->hotplugged) {
-        fdt = create_device_tree(&fdt_size);
-        fdt_start_offset = spapr_create_pci_child_dt(phb, pdev, fdt, 0);
-        if (!fdt_start_offset) {
-            error_setg(errp, "Failed to create pci child device tree node");
-            goto out;
-        }
+    fdt = create_device_tree(&fdt_size);
+    fdt_start_offset = spapr_create_pci_child_dt(phb, pdev, fdt, 0);
+    if (!fdt_start_offset) {
+        error_setg(errp, "Failed to create pci child device tree node");
+        goto out;
     }
 
     drck->attach(drc, DEVICE(pdev),
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [QEMU RFC PATCH v2 2/6] Migration: Defined VMStateDescription struct for spapr_drc
  2016-05-24 17:55 [Qemu-devel] [QEMU RFC PATCH v2 0/6] Migration: ensure hotplug and migration work together Jianjun Duan
  2016-05-24 17:55 ` [Qemu-devel] [QEMU RFC PATCH v2 1/6] spapr: ensure device trees are always associated with DRC Jianjun Duan
@ 2016-05-24 17:55 ` Jianjun Duan
  2016-05-24 17:55 ` [Qemu-devel] [QEMU RFC PATCH v2 3/6] vmstate: Define VARRAY with VMS_ALLOC Jianjun Duan
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Jianjun Duan @ 2016-05-24 17:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, mdroth, david, amit.shah, quintela, duanj

To manage hotplug/unplug of dynamic resources such as PCI cards,
memory, and CPU on sPAPR guests, a firmware abstraction known as
a Dynamic Resource Connector (DRC) is used to assign a particular
dynamic resource to the guest, and provide an interface for the
guest to manage configuration/removal of the resource associated
with it.

To migrate the hotplugged resources in migration, the
associated DRC state need be migrated. To migrate the DRC state,
we defined the VMStateDescription struct for spapr_drc to enable
the transmission of spapr_drc state in migration.

Not all the elements in the DRC state are migrated. Only those
ones modifiable or needed by guest actions or device add/remove
operation are migrated. From the perspective of device
hotplugging, if we hotplug a device on the source, we need to
"coldplug" it on the target. The states across two hosts for the
same device are not the same. Ideally we want the states be same
after migration so that the device would function as hotplugged
on the target. For example we can unplug it. The minimum DRC
state we need to transfer should cover all the pieces changed by
hotplugging. Out of the elements of the DRC state, isolation_state,
allocation_sate, and configured are involved in the DR state
transition diagram from PAPR+ 2.7, 13.4. configured and signalled
are needed in attaching and detaching devices. indicator_state
provides users with hardware state information. These 6 elements
are migrated.

detach_cb in the DRC state is a function pointer that cannot be
migrated. We set it right after DRC state is migrated so that
a migrated hot-unplug event could finish its work.

Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
---
 hw/ppc/spapr_drc.c         | 61 ++++++++++++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr_pci.c         | 22 +++++++++++++++++
 include/hw/ppc/spapr_drc.h |  9 +++++++
 3 files changed, 92 insertions(+)

diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 94c875d..1fb5e23 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -617,6 +617,65 @@ static void spapr_dr_connector_instance_init(Object *obj)
                         NULL, NULL, NULL, NULL);
 }
 
+static bool spapr_drc_needed(void *opaque)
+{
+    sPAPRDRConnector *drc = (sPAPRDRConnector *)opaque;
+    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    bool rc = false;
+    sPAPRDREntitySense value;
+
+    drck->entity_sense(drc, &value);
+    /* If no dev is plugged in there is no need to migrate the DRC state */
+    if (value != SPAPR_DR_ENTITY_SENSE_PRESENT) {
+        return false;
+    }
+    /*
+     * If there is dev plugged in, we need to migrate the DRC state when
+     * it is different from cold-plugged state
+     */
+    switch(drc->type) {
+    /* for PCI type */
+    case SPAPR_DR_CONNECTOR_TYPE_PCI:
+        rc = !((drc->isolation_state == SPAPR_DR_ISOLATION_STATE_UNISOLATED) &&
+               (drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_USABLE) &&
+               drc->configured && drc->signalled && !drc->awaiting_release);
+        break;
+    /* for LMB type */
+    case SPAPR_DR_CONNECTOR_TYPE_LMB:
+        rc = !((drc->isolation_state == SPAPR_DR_ISOLATION_STATE_ISOLATED) &&
+               (drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_UNUSABLE) &&
+               drc->configured && drc->signalled && !drc->awaiting_release);
+        break;
+    default:
+        ;
+    }
+
+    return rc;
+}
+
+/* detach_cb needs be set since it is not migrated */
+static void postmigrate_set_detach_cb(sPAPRDRConnector *drc,
+                                      spapr_drc_detach_cb *detach_cb)
+{
+    drc->detach_cb = detach_cb;
+}
+
+static const VMStateDescription vmstate_spapr_drc = {
+    .name = "spapr_drc",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = spapr_drc_needed,
+    .fields  = (VMStateField []) {
+        VMSTATE_UINT32(isolation_state, sPAPRDRConnector),
+        VMSTATE_UINT32(allocation_state, sPAPRDRConnector),
+        VMSTATE_UINT32(indicator_state, sPAPRDRConnector),
+        VMSTATE_BOOL(configured, sPAPRDRConnector),
+        VMSTATE_BOOL(awaiting_release, sPAPRDRConnector),
+        VMSTATE_BOOL(signalled, sPAPRDRConnector),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
 {
     DeviceClass *dk = DEVICE_CLASS(k);
@@ -625,6 +684,7 @@ static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
     dk->reset = reset;
     dk->realize = realize;
     dk->unrealize = unrealize;
+    dk->vmsd = &vmstate_spapr_drc;
     drck->set_isolation_state = set_isolation_state;
     drck->set_indicator_state = set_indicator_state;
     drck->set_allocation_state = set_allocation_state;
@@ -638,6 +698,7 @@ static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
     drck->detach = detach;
     drck->release_pending = release_pending;
     drck->set_signalled = set_signalled;
+    drck->postmigrate_set_detach_cb = postmigrate_set_detach_cb;
     /*
      * Reason: it crashes FIXME find and document the real reason
      */
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index f0b9de9..e10774d 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1561,11 +1561,33 @@ static void spapr_pci_pre_save(void *opaque)
     }
 }
 
+/*
+ * detach_cb in the DRC state is a function pointer that cannot be
+ * migrated. We set it right after migration so that a migrated
+ * hot-unplug event could finish its work.
+ */
+static void spapr_pci_set_detach_cb(PCIBus *bus, PCIDevice *pdev,
+                                 void *opaque)
+{
+    sPAPRPHBState *sphb = opaque;
+    sPAPRDRConnector *drc = spapr_phb_get_pci_drc(sphb, pdev);
+    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    drck->postmigrate_set_detach_cb(drc, spapr_phb_remove_pci_device_cb);
+}
+
 static int spapr_pci_post_load(void *opaque, int version_id)
 {
     sPAPRPHBState *sphb = opaque;
     gpointer key, value;
     int i;
+    PCIBus *bus = PCI_HOST_BRIDGE(sphb)->bus;
+    unsigned int bus_no = 0;
+
+    /* Set detach_cb for the drc unconditionally after migration */
+    if (bus) {
+        pci_for_each_device(bus, pci_bus_num(bus), spapr_pci_set_detach_cb,
+                            &bus_no);
+    }
 
     for (i = 0; i < sphb->msi_devs_num; ++i) {
         key = g_memdup(&sphb->msi_devs[i].key,
diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
index fa21ba0..a68433b 100644
--- a/include/hw/ppc/spapr_drc.h
+++ b/include/hw/ppc/spapr_drc.h
@@ -190,6 +190,15 @@ typedef struct sPAPRDRConnectorClass {
                    void *detach_cb_opaque, Error **errp);
     bool (*release_pending)(sPAPRDRConnector *drc);
     void (*set_signalled)(sPAPRDRConnector *drc);
+
+    /*
+     * QEMU interface for setting detach_cb after migration.
+     * detach_cb in the DRC state is a function pointer that cannot be
+     * migrated. We set it right after migration so that a migrated
+     * hot-unplug event could finish its work.
+     */
+    void (*postmigrate_set_detach_cb)(sPAPRDRConnector *drc,
+                                      spapr_drc_detach_cb *detach_cb);
 } sPAPRDRConnectorClass;
 
 sPAPRDRConnector *spapr_dr_connector_new(Object *owner,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [QEMU RFC PATCH v2 3/6] vmstate: Define VARRAY with VMS_ALLOC
  2016-05-24 17:55 [Qemu-devel] [QEMU RFC PATCH v2 0/6] Migration: ensure hotplug and migration work together Jianjun Duan
  2016-05-24 17:55 ` [Qemu-devel] [QEMU RFC PATCH v2 1/6] spapr: ensure device trees are always associated with DRC Jianjun Duan
  2016-05-24 17:55 ` [Qemu-devel] [QEMU RFC PATCH v2 2/6] Migration: Defined VMStateDescription struct for spapr_drc Jianjun Duan
@ 2016-05-24 17:55 ` Jianjun Duan
  2016-05-24 17:55 ` [Qemu-devel] [QEMU RFC PATCH v2 4/6] Migration: migrate QTAILQ Jianjun Duan
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Jianjun Duan @ 2016-05-24 17:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, mdroth, david, amit.shah, quintela, duanj

From: Alexey Kardashevskiy <aik@ozlabs.ru>

This allows dynamic allocation for migrating arrays.

Already existing VMSTATE_VARRAY_UINT32 requires an array to be
pre-allocated, however there are cases when the size is not known in
advance and there is no real need to enforce it.

This defines another variant of VMSTATE_VARRAY_UINT32 with WMS_ALLOC
flag which tells the receiving side to allocate memory for the array
before receiving the data.

The first user of it is a dynamic DMA window which existence and size
are totally dynamic.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
---
 include/migration/vmstate.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 84ee355..1622638 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -386,6 +386,16 @@ extern const VMStateInfo vmstate_info_bitmap;
     .offset     = vmstate_offset_pointer(_state, _field, _type),     \
 }
 
+#define VMSTATE_VARRAY_UINT32_ALLOC(_field, _state, _field_num, _version, _info, _type) {\
+    .name       = (stringify(_field)),                               \
+    .version_id = (_version),                                        \
+    .num_offset = vmstate_offset_value(_state, _field_num, uint32_t),\
+    .info       = &(_info),                                          \
+    .size       = sizeof(_type),                                     \
+    .flags      = VMS_VARRAY_UINT32|VMS_POINTER|VMS_ALLOC,           \
+    .offset     = vmstate_offset_pointer(_state, _field, _type),     \
+}
+
 #define VMSTATE_VARRAY_UINT16_UNSAFE(_field, _state, _field_num, _version, _info, _type) {\
     .name       = (stringify(_field)),                               \
     .version_id = (_version),                                        \
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [QEMU RFC PATCH v2 4/6] Migration: migrate QTAILQ
  2016-05-24 17:55 [Qemu-devel] [QEMU RFC PATCH v2 0/6] Migration: ensure hotplug and migration work together Jianjun Duan
                   ` (2 preceding siblings ...)
  2016-05-24 17:55 ` [Qemu-devel] [QEMU RFC PATCH v2 3/6] vmstate: Define VARRAY with VMS_ALLOC Jianjun Duan
@ 2016-05-24 17:55 ` Jianjun Duan
  2016-05-25  5:38   ` David Gibson
  2016-05-25  8:14   ` Paolo Bonzini
  2016-05-24 17:55 ` [Qemu-devel] [QEMU RFC PATCH v2 5/6] Migration: migrate ccs_list in spapr state Jianjun Duan
  2016-05-24 17:55 ` [Qemu-devel] [QEMU RFC PATCH v2 6/6] Migration: migrate pending_events of " Jianjun Duan
  5 siblings, 2 replies; 18+ messages in thread
From: Jianjun Duan @ 2016-05-24 17:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, mdroth, david, amit.shah, quintela, duanj

A recursive structure has elements of the same type in itself. Currently
we cannot directly transfer a QTAILQ instance, or any recursive
structure such as lists in migration because of the limitation in the
migration code. Here we introduce a general approach to transfer such
structures. In our approach a recursive structure is tagged with VMS_CSTM.
We extend VMStateField with 3 fields: meta_data to store the meta data
about the recursive structure in question, extend_get to load the structure
from migration stream to memory, and extend_put to dump the structure into
the migration stream. This extension mirrors VMStateInfo. We then modify
vmstate_save_state and vmstate_load_state so that when VMS_CSTM is
encountered, extend_put and extend_get are called respectively with the
knowledge of the meta data.

To make it work for QTAILQ in qemu/queue.h, we created the meta data
format, extend_get and extend_put for it. We will use this approach to
transfer pending_events and ccs_list in spapr state.

We also create some macros in qemu/queue.h to get the layout information
about QTAILQ. This ensures that we do not depend on the implementation
details about QTAILQ in the migration code.

Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
---
 include/migration/vmstate.h | 59 +++++++++++++++++++++++++++++++
 include/qemu/queue.h        | 38 ++++++++++++++++++++
 migration/vmstate.c         | 84 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 181 insertions(+)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 1622638..bf57b25 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -183,6 +183,8 @@ enum VMStateFlags {
      * to determine the number of entries in the array. Only valid in
      * combination with one of VMS_VARRAY*. */
     VMS_MULTIPLY_ELEMENTS = 0x4000,
+    /* For fields which need customized handling, such as QTAILQ in queue.h*/
+    VMS_CSTM            = 0x8000,
 };
 
 typedef struct {
@@ -198,6 +200,14 @@ typedef struct {
     const VMStateDescription *vmsd;
     int version_id;
     bool (*field_exists)(void *opaque, int version_id);
+    /*
+     * Following 3 fields are for VMStateField which needs customized handling,
+     * such as QTAILQ in qemu/queue.h, lists, and tree.
+     */
+    const void *meta_data;
+    int (*extend_get)(QEMUFile *f, const void *metadata, void *opaque);
+    void (*extend_put)(QEMUFile *f, const void *metadata, void *opaque,
+                       QJSON *vmdesc);
 } VMStateField;
 
 struct VMStateDescription {
@@ -654,6 +664,18 @@ extern const VMStateInfo vmstate_info_bitmap;
     .offset       = offsetof(_state, _field),                        \
 }
 
+/* For fields that need customized handling, such as queue, list */
+#define VMSTATE_CSTM_V(_field, _state, _version, _meta_data, _get, _put)  \
+{                                                                         \
+    .name         = (stringify(_field)),                                  \
+    .version_id   = (_version),                                           \
+    .flags        = VMS_CSTM,                                             \
+    .offset       = offsetof(_state, _field),                             \
+    .meta_data    = &(_meta_data),                                        \
+    .extend_get   = (_get),                                               \
+    .extend_put   = (_put),                                               \
+}
+
 /* _f : field name
    _f_n : num of elements field_name
    _n : num of elements
@@ -970,4 +992,41 @@ int64_t self_announce_delay(int round)
 
 void dump_vmstate_json_to_file(FILE *out_fp);
 
+
+/* Meta data for QTAILQ */
+typedef struct QTAILQMetaData {
+    /* the offset of tqh_first in QTAILQ_HEAD */
+    size_t first;
+    /* the offset of tqh_last in QTAILQ_HEAD */
+    size_t last;
+    /* the offset of tqe_next in QTAILQ_ENTRY */
+    size_t next;
+    /* the offset of tqe_prev in QTAILQ_ENTRY */
+    size_t prev;
+    /* the offset of QTAILQ_ENTRY in a QTAILQ element*/
+    size_t entry;
+    /* size of a QTAILQ element */
+    size_t size;
+    /* vmsd of a QTAILQ element */
+    const VMStateDescription *vmsd;
+} QTAILQMetaData;
+
+#define VMSTATE_QTAILQ_METADATA(_field, _state, _type, _next, _vmsd) { \
+    .first = QTAILQ_FIRST_OFFSET(typeof_field(_state, _field)),        \
+    .last =  QTAILQ_LAST_OFFSET(typeof_field(_state, _field)),         \
+    .next = QTAILQ_NEXT_OFFSET(_type, _next),                          \
+    .prev = QTAILQ_PREV_OFFSET(_type, _next),                          \
+    .entry = offsetof(_type, _next),                                   \
+    .size = sizeof(_type),                                             \
+    .vmsd = &(_vmsd),                                                  \
+}
+
+int vmstate_get_qtailq(QEMUFile *f, const void *metadata, void *opaque);
+void vmstate_put_qtailq(QEMUFile *f, const void *metadata,
+                         void *opaque, QJSON *vmdesc);
+
+/* VMStateField for QTAILQ field */
+#define VMSTATE_QTAILQ_V(_field, _state, _version, _meta_data)               \
+    VMSTATE_CSTM_V(_field, _state, _version, _meta_data, vmstate_get_qtailq, \
+                   vmstate_put_qtailq)
 #endif
diff --git a/include/qemu/queue.h b/include/qemu/queue.h
index f781aa2..46962d7 100644
--- a/include/qemu/queue.h
+++ b/include/qemu/queue.h
@@ -437,3 +437,41 @@ struct {                                                                \
         (*(((struct headname *)((elm)->field.tqe_prev))->tqh_last))
 
 #endif  /* !QEMU_SYS_QUEUE_H_ */
+
+/*
+ * Offsets of layout of a tail queue head.
+ */
+#define QTAILQ_FIRST_OFFSET(head_type) \
+        ((size_t) ((char *) &((head_type *)0)->tqh_first - (char *)0))
+
+#define QTAILQ_LAST_OFFSET(head_type) \
+        ((size_t) ((char *) &((head_type *)0)->tqh_last - (char *)0))
+
+/*
+ * Offsets of layout of a tail queue element.
+ */
+#define QTAILQ_NEXT_OFFSET(ele_type, field)                            \
+        ((size_t) ((char *) &((ele_type *)0)->field.tqe_next -         \
+                   (char *) &((ele_type *)0)->field))
+
+#define QTAILQ_PREV_OFFSET(ele_type, field) \
+        ((size_t) ((char *) &((ele_type *)0)->field.tqe_prev -         \
+                   (char *) &((ele_type *)0)->field))
+/*
+ * Tail queue tranversal using pointer arithmetic.
+ */
+#define QTAILQ_RAW_FOREACH(elm, head, entry, first, next)              \
+        for ((elm) = *((void **) ((char *) (head) + (first)));         \
+             (elm);                                                    \
+             (elm) = *((void **) ((char *) (elm) + (entry) + (next))))
+/*
+ * Tail queue insertion using pointer arithmetic.
+ */
+#define QTAILQ_RAW_INSERT_TAIL(head, elm, entry, first, last, next, prev) do { \
+            *((void **) ((char *) (elm) + (entry) + (next))) = NULL;           \
+            *((void **) ((char *) (elm) + (entry) + (prev))) =                 \
+                *((void **) ((char *) (head) + (last)));                       \
+            **((void ***)((char *) (head) + (last))) = (elm);                  \
+            *((void **) ((char *) (head) + (last))) =                          \
+                (void *) ((char *) (elm) + (entry) + (next));                  \
+} while (/*CONSTCOND*/0)
diff --git a/migration/vmstate.c b/migration/vmstate.c
index bf3d5db..47cd052 100644
--- a/migration/vmstate.c
+++ b/migration/vmstate.c
@@ -5,6 +5,7 @@
 #include "migration/vmstate.h"
 #include "qemu/bitops.h"
 #include "qemu/error-report.h"
+#include "qemu/queue.h"
 #include "trace.h"
 #include "qjson.h"
 
@@ -121,6 +122,8 @@ int vmstate_load_state(QEMUFile *f, const VMStateDescription *vmsd,
                 if (field->flags & VMS_STRUCT) {
                     ret = vmstate_load_state(f, field->vmsd, addr,
                                              field->vmsd->version_id);
+                } else if (field->flags & VMS_CSTM) {
+                    ret = field->extend_get(f, field->meta_data, addr);
                 } else {
                     ret = field->info->get(f, addr, size);
 
@@ -193,6 +196,8 @@ static const char *vmfield_get_type_name(VMStateField *field)
 
     if (field->flags & VMS_STRUCT) {
         type = "struct";
+    } else if (field->flags & VMS_CSTM) {
+        type = "customized";
     } else if (field->info->name) {
         type = field->info->name;
     }
@@ -327,6 +332,8 @@ void vmstate_save_state(QEMUFile *f, const VMStateDescription *vmsd,
                 }
                 if (field->flags & VMS_STRUCT) {
                     vmstate_save_state(f, field->vmsd, addr, vmdesc_loop);
+                } else if  (field->flags & VMS_CSTM) {
+                    field->extend_put(f, field->meta_data, addr, vmdesc_loop);
                 } else {
                     field->info->put(f, addr, size);
                 }
@@ -916,3 +923,80 @@ const VMStateInfo vmstate_info_bitmap = {
     .get = get_bitmap,
     .put = put_bitmap,
 };
+
+/* extend_get for QTAILQ */
+int vmstate_get_qtailq(QEMUFile *f, const void *metadata, void *opaque)
+{
+    bool link;
+    int ret = 0;
+    size_t first, last, next, prev, entry, size;
+    QTAILQMetaData *data = (QTAILQMetaData *)metadata;
+    const VMStateDescription *vmsd = data->vmsd;
+    int version_id = vmsd->version_id;
+    void *elm;
+
+    first = data->first;
+    last = data->last;
+    next = data->next;
+    prev = data->prev;
+    entry = data->entry;
+    size = data->size;
+
+    trace_vmstate_load_state(vmsd->name, version_id);
+    if (version_id > vmsd->version_id) {
+        trace_vmstate_load_state_end(vmsd->name, "too new", -EINVAL);
+        return -EINVAL;
+    }
+    if (version_id < vmsd->minimum_version_id) {
+        trace_vmstate_load_state_end(vmsd->name, "too old", -EINVAL);
+        return -EINVAL;
+    }
+
+    while (true) {
+        vmstate_info_bool.get(f, &link, sizeof(bool));
+        if (!link) {
+            break;
+        }
+
+        elm =  g_malloc(size);
+        ret = vmstate_load_state(f, vmsd, elm, version_id);
+        if (ret) {
+            return ret;
+        }
+        QTAILQ_RAW_INSERT_TAIL(opaque, elm, entry, first, last, next, prev);
+    }
+
+    trace_vmstate_load_state_end(vmsd->name, "end", ret);
+    return ret;
+}
+
+/* extend_get for QTAILQ */
+void vmstate_put_qtailq(QEMUFile *f, const void *metadata, void *opaque,
+                         QJSON *vmdesc)
+{
+    bool link = true;
+    size_t first, next, entry;
+    QTAILQMetaData *data = (QTAILQMetaData *)metadata;
+    const VMStateDescription *vmsd = data->vmsd;
+    void *elm;
+
+    first = data->first;
+    next = data->next;
+    entry = data->entry;
+
+    if (vmdesc) {
+        json_prop_str(vmdesc, "vmsd_name", vmsd->name);
+        json_prop_int(vmdesc, "version", vmsd->version_id);
+        json_start_array(vmdesc, "fields");
+    }
+
+    QTAILQ_RAW_FOREACH(elm, opaque, entry, first, next) {
+        vmstate_info_bool.put(f, &link, sizeof(bool));
+        vmstate_save_state(f, vmsd, elm, vmdesc);
+    }
+    link = false;
+    vmstate_info_bool.put(f, &link, sizeof(bool));
+    if (vmdesc) {
+        json_end_array(vmdesc);
+    }
+}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [QEMU RFC PATCH v2 5/6] Migration: migrate ccs_list in spapr state
  2016-05-24 17:55 [Qemu-devel] [QEMU RFC PATCH v2 0/6] Migration: ensure hotplug and migration work together Jianjun Duan
                   ` (3 preceding siblings ...)
  2016-05-24 17:55 ` [Qemu-devel] [QEMU RFC PATCH v2 4/6] Migration: migrate QTAILQ Jianjun Duan
@ 2016-05-24 17:55 ` Jianjun Duan
  2016-05-24 17:55 ` [Qemu-devel] [QEMU RFC PATCH v2 6/6] Migration: migrate pending_events of " Jianjun Duan
  5 siblings, 0 replies; 18+ messages in thread
From: Jianjun Duan @ 2016-05-24 17:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, mdroth, david, amit.shah, quintela, duanj

ccs_list in spapr state maintains the device tree related
information on the rtas side for hotplugged devices. In racing
situations between hotplug events and migration operation, a rtas
hotplug event could be migrated from the source guest to target
guest, or the source guest could have not yet finished fetching
the device tree when migration is started, the target will try
to finish fetching the device tree. By migrating ccs_list, the
target can fetch the device tree properly.

ccs_list is put in a subsection in the spapr state VMSD to make
sure migration across different versions is not broken.

Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index ec18bee..88b6e3b 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1265,6 +1265,40 @@ static bool version_before_3(void *opaque, int version_id)
     return version_id < 3;
 }
 
+static bool spapr_ccs_list_needed(void *opaque)
+{
+    sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
+    return !QTAILQ_EMPTY(&spapr->ccs_list);
+}
+
+static const VMStateDescription vmstate_spapr_ccs = {
+    .name = "spaprconfigureconnectorstate",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(drc_index, sPAPRConfigureConnectorState),
+        VMSTATE_INT32(fdt_offset, sPAPRConfigureConnectorState),
+        VMSTATE_INT32(fdt_depth, sPAPRConfigureConnectorState),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static const QTAILQMetaData spapr_ccs_metadata =
+    VMSTATE_QTAILQ_METADATA( ccs_list, sPAPRMachineState,
+                             sPAPRConfigureConnectorState, next,
+                             vmstate_spapr_ccs);
+
+static const VMStateDescription vmstate_spapr_ccs_list = {
+    .name = "spaprccslist",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = spapr_ccs_list_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_QTAILQ_V(ccs_list, sPAPRMachineState, 1, spapr_ccs_metadata),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static const VMStateDescription vmstate_spapr = {
     .name = "spapr",
     .version_id = 3,
@@ -1280,6 +1314,10 @@ static const VMStateDescription vmstate_spapr = {
         VMSTATE_PPC_TIMEBASE_V(tb, sPAPRMachineState, 2),
         VMSTATE_END_OF_LIST()
     },
+    .subsections = (const VMStateDescription*[]) {
+        &vmstate_spapr_ccs_list,
+        NULL
+    }
 };
 
 static int htab_save_setup(QEMUFile *f, void *opaque)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [QEMU RFC PATCH v2 6/6] Migration: migrate pending_events of spapr state
  2016-05-24 17:55 [Qemu-devel] [QEMU RFC PATCH v2 0/6] Migration: ensure hotplug and migration work together Jianjun Duan
                   ` (4 preceding siblings ...)
  2016-05-24 17:55 ` [Qemu-devel] [QEMU RFC PATCH v2 5/6] Migration: migrate ccs_list in spapr state Jianjun Duan
@ 2016-05-24 17:55 ` Jianjun Duan
  5 siblings, 0 replies; 18+ messages in thread
From: Jianjun Duan @ 2016-05-24 17:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, mdroth, david, amit.shah, quintela, duanj

In racing situations between hotplug events and migration operation,
a rtas hotplug event could have not yet be delivered to the source
guest when migration is started. In this case the pending_events of
spapr state need be transmitted to the target so that the hotplug
event can be finished on the target.

All the different fields of the events are encoded as defined by
PAPR. We can migrate them as uint8_t binary stream without any
concerns about data padding or endianess.

pending_events is put in a subsection in the spapr state VMSD to make
sure migration across different versions is not broken.

Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c         | 38 ++++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr_events.c  | 22 +++++++++++++---------
 include/hw/ppc/spapr.h |  3 ++-
 3 files changed, 53 insertions(+), 10 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 88b6e3b..236c579 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1265,12 +1265,32 @@ static bool version_before_3(void *opaque, int version_id)
     return version_id < 3;
 }
 
+static bool spapr_pending_events_needed(void *opaque)
+{
+    sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
+    return !QTAILQ_EMPTY(&spapr->pending_events);
+}
+
 static bool spapr_ccs_list_needed(void *opaque)
 {
     sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
     return !QTAILQ_EMPTY(&spapr->ccs_list);
 }
 
+static const VMStateDescription vmstate_spapr_event_entry = {
+    .name = "spapreventlogentry",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_INT32(log_type, sPAPREventLogEntry),
+        VMSTATE_BOOL(exception, sPAPREventLogEntry),
+        VMSTATE_UINT32(data_size, sPAPREventLogEntry),
+        VMSTATE_VARRAY_UINT32_ALLOC(data, sPAPREventLogEntry, data_size,
+                                    0, vmstate_info_uint8, uint8_t),
+        VMSTATE_END_OF_LIST()
+  },
+};
+
 static const VMStateDescription vmstate_spapr_ccs = {
     .name = "spaprconfigureconnectorstate",
     .version_id = 1,
@@ -1283,11 +1303,28 @@ static const VMStateDescription vmstate_spapr_ccs = {
     },
 };
 
+static const QTAILQMetaData spapr_events_metadata =
+    VMSTATE_QTAILQ_METADATA(pending_events, sPAPRMachineState,
+                            sPAPREventLogEntry, next,
+                            vmstate_spapr_event_entry);
+
 static const QTAILQMetaData spapr_ccs_metadata =
     VMSTATE_QTAILQ_METADATA( ccs_list, sPAPRMachineState,
                              sPAPRConfigureConnectorState, next,
                              vmstate_spapr_ccs);
 
+static const VMStateDescription vmstate_spapr_pending_events = {
+    .name = "spaprpendingevents",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = spapr_pending_events_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_QTAILQ_V(pending_events, sPAPRMachineState, 1,
+                       spapr_events_metadata),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static const VMStateDescription vmstate_spapr_ccs_list = {
     .name = "spaprccslist",
     .version_id = 1,
@@ -1315,6 +1352,7 @@ static const VMStateDescription vmstate_spapr = {
         VMSTATE_END_OF_LIST()
     },
     .subsections = (const VMStateDescription*[]) {
+        &vmstate_spapr_pending_events,
         &vmstate_spapr_ccs_list,
         NULL
     }
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index 049fb1b..1680c08 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -239,7 +239,8 @@ void spapr_events_fdt_skel(void *fdt, uint32_t check_exception_irq)
     _FDT((fdt_end_node(fdt)));
 }
 
-static void rtas_event_log_queue(int log_type, void *data, bool exception)
+static void rtas_event_log_queue(int log_type, void *data, bool exception,
+				 int data_size)
 {
     sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
     sPAPREventLogEntry *entry = g_new(sPAPREventLogEntry, 1);
@@ -248,6 +249,7 @@ static void rtas_event_log_queue(int log_type, void *data, bool exception)
     entry->log_type = log_type;
     entry->exception = exception;
     entry->data = data;
+    entry->data_size = data_size;
     QTAILQ_INSERT_TAIL(&spapr->pending_events, entry, next);
 }
 
@@ -350,6 +352,7 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
     struct rtas_event_log_v6_mainb *mainb;
     struct rtas_event_log_v6_epow *epow;
     struct epow_log_full *new_epow;
+    uint32_t data_size;
 
     new_epow = g_malloc0(sizeof(*new_epow));
     hdr = &new_epow->hdr;
@@ -358,13 +361,13 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
     mainb = &new_epow->mainb;
     epow = &new_epow->epow;
 
+    data_size = sizeof(*new_epow);
     hdr->summary = cpu_to_be32(RTAS_LOG_VERSION_6
                                | RTAS_LOG_SEVERITY_EVENT
                                | RTAS_LOG_DISPOSITION_NOT_RECOVERED
                                | RTAS_LOG_OPTIONAL_PART_PRESENT
                                | RTAS_LOG_TYPE_EPOW);
-    hdr->extended_length = cpu_to_be32(sizeof(*new_epow)
-                                       - sizeof(new_epow->hdr));
+    hdr->extended_length = cpu_to_be32(data_size - sizeof(new_epow->hdr));
 
     spapr_init_v6hdr(v6hdr);
     spapr_init_maina(maina, 3 /* Main-A, Main-B and EPOW */);
@@ -384,7 +387,7 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
     epow->event_modifier = RTAS_LOG_V6_EPOW_MODIFIER_NORMAL;
     epow->extended_modifier = RTAS_LOG_V6_EPOW_XMODIFIER_PARTITION_SPECIFIC;
 
-    rtas_event_log_queue(RTAS_LOG_TYPE_EPOW, new_epow, true);
+    rtas_event_log_queue(RTAS_LOG_TYPE_EPOW, new_epow, true, data_size);
 
     qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->check_exception_irq));
 }
@@ -407,6 +410,7 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
     struct rtas_event_log_v6_maina *maina;
     struct rtas_event_log_v6_mainb *mainb;
     struct rtas_event_log_v6_hp *hp;
+    uint32_t data_size;
 
     new_hp = g_malloc0(sizeof(struct hp_log_full));
     hdr = &new_hp->hdr;
@@ -415,14 +419,14 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
     mainb = &new_hp->mainb;
     hp = &new_hp->hp;
 
+    data_size = sizeof(*new_hp);
     hdr->summary = cpu_to_be32(RTAS_LOG_VERSION_6
                                | RTAS_LOG_SEVERITY_EVENT
                                | RTAS_LOG_DISPOSITION_NOT_RECOVERED
                                | RTAS_LOG_OPTIONAL_PART_PRESENT
                                | RTAS_LOG_INITIATOR_HOTPLUG
                                | RTAS_LOG_TYPE_HOTPLUG);
-    hdr->extended_length = cpu_to_be32(sizeof(*new_hp)
-                                       - sizeof(new_hp->hdr));
+    hdr->extended_length = cpu_to_be32(data_size - sizeof(new_hp->hdr));
 
     spapr_init_v6hdr(v6hdr);
     spapr_init_maina(maina, 3 /* Main-A, Main-B, HP */);
@@ -463,7 +467,7 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
         hp->drc.index = cpu_to_be32(drc);
     }
 
-    rtas_event_log_queue(RTAS_LOG_TYPE_HOTPLUG, new_hp, true);
+    rtas_event_log_queue(RTAS_LOG_TYPE_HOTPLUG, new_hp, true, data_size);
 
     qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->check_exception_irq));
 }
@@ -530,7 +534,7 @@ static void check_exception(PowerPCCPU *cpu, sPAPRMachineState *spapr,
         goto out_no_events;
     }
 
-    hdr = event->data;
+    hdr = (struct rtas_error_log *)event->data;
     event_len = be32_to_cpu(hdr->extended_length) + sizeof(*hdr);
 
     if (event_len < len) {
@@ -580,7 +584,7 @@ static void event_scan(PowerPCCPU *cpu, sPAPRMachineState *spapr,
         goto out_no_events;
     }
 
-    hdr = event->data;
+    hdr = (struct rtas_error_log *)event->data;
     event_len = be32_to_cpu(hdr->extended_length) + sizeof(*hdr);
 
     if (event_len < len) {
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 815d5ee..5ded8b1 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -552,7 +552,8 @@ sPAPRTCETable *spapr_tce_find_by_liobn(target_ulong liobn);
 struct sPAPREventLogEntry {
     int log_type;
     bool exception;
-    void *data;
+    uint32_t data_size;
+    uint8_t *data;
     QTAILQ_ENTRY(sPAPREventLogEntry) next;
 };
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [QEMU RFC PATCH v2 1/6] spapr: ensure device trees are always associated with DRC
  2016-05-24 17:55 ` [Qemu-devel] [QEMU RFC PATCH v2 1/6] spapr: ensure device trees are always associated with DRC Jianjun Duan
@ 2016-05-25  5:20   ` David Gibson
  0 siblings, 0 replies; 18+ messages in thread
From: David Gibson @ 2016-05-25  5:20 UTC (permalink / raw)
  To: Jianjun Duan; +Cc: qemu-devel, qemu-ppc, mdroth, amit.shah, quintela

[-- Attachment #1: Type: text/plain, Size: 751 bytes --]

On Tue, May 24, 2016 at 10:55:04AM -0700, Jianjun Duan wrote:
> There are possible racing situations involving hotplug events and
> guest migration. For cases where a hotplug event is migrated, or
> the guest is in the process of fetching device tree at the time of
> migration, we need to ensure the device tree is created and
> associated with the corresponding DRC for devices that were
> hotplugged on the source, but 'coldplugged' on the target.
> 
> Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>

Applied to ppc-for-2.7, thanks.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [QEMU RFC PATCH v2 4/6] Migration: migrate QTAILQ
  2016-05-24 17:55 ` [Qemu-devel] [QEMU RFC PATCH v2 4/6] Migration: migrate QTAILQ Jianjun Duan
@ 2016-05-25  5:38   ` David Gibson
  2016-05-25  8:14   ` Paolo Bonzini
  1 sibling, 0 replies; 18+ messages in thread
From: David Gibson @ 2016-05-25  5:38 UTC (permalink / raw)
  To: Jianjun Duan; +Cc: qemu-devel, qemu-ppc, mdroth, amit.shah, quintela

[-- Attachment #1: Type: text/plain, Size: 12570 bytes --]

On Tue, May 24, 2016 at 10:55:07AM -0700, Jianjun Duan wrote:
> A recursive structure has elements of the same type in itself. Currently
> we cannot directly transfer a QTAILQ instance, or any recursive
> structure such as lists in migration because of the limitation in the
> migration code. Here we introduce a general approach to transfer such
> structures. In our approach a recursive structure is tagged with VMS_CSTM.
> We extend VMStateField with 3 fields: meta_data to store the meta data
> about the recursive structure in question, extend_get to load the structure
> from migration stream to memory, and extend_put to dump the structure into
> the migration stream. This extension mirrors VMStateInfo. We then modify
> vmstate_save_state and vmstate_load_state so that when VMS_CSTM is
> encountered, extend_put and extend_get are called respectively with the
> knowledge of the meta data.

All the talk about recursive structures just obscures the issue.  The
point is you're building a way of migrating QTAILQs - just stick to
that.

> To make it work for QTAILQ in qemu/queue.h, we created the meta data
> format, extend_get and extend_put for it. We will use this approach to
> transfer pending_events and ccs_list in spapr state.
> 
> We also create some macros in qemu/queue.h to get the layout information
> about QTAILQ. This ensures that we do not depend on the implementation
> details about QTAILQ in the migration code.

I'm interested to see what Juan and Dave Gilbert think about this.


> 
> Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
> ---
>  include/migration/vmstate.h | 59 +++++++++++++++++++++++++++++++
>  include/qemu/queue.h        | 38 ++++++++++++++++++++
>  migration/vmstate.c         | 84 +++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 181 insertions(+)
> 
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index 1622638..bf57b25 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -183,6 +183,8 @@ enum VMStateFlags {
>       * to determine the number of entries in the array. Only valid in
>       * combination with one of VMS_VARRAY*. */
>      VMS_MULTIPLY_ELEMENTS = 0x4000,
> +    /* For fields which need customized handling, such as QTAILQ in queue.h*/
> +    VMS_CSTM            = 0x8000,
>  };
>  
>  typedef struct {
> @@ -198,6 +200,14 @@ typedef struct {
>      const VMStateDescription *vmsd;
>      int version_id;
>      bool (*field_exists)(void *opaque, int version_id);
> +    /*
> +     * Following 3 fields are for VMStateField which needs customized handling,
> +     * such as QTAILQ in qemu/queue.h, lists, and tree.
> +     */
> +    const void *meta_data;
> +    int (*extend_get)(QEMUFile *f, const void *metadata, void *opaque);
> +    void (*extend_put)(QEMUFile *f, const void *metadata, void *opaque,
> +                       QJSON *vmdesc);
>  } VMStateField;
>  
>  struct VMStateDescription {
> @@ -654,6 +664,18 @@ extern const VMStateInfo vmstate_info_bitmap;
>      .offset       = offsetof(_state, _field),                        \
>  }
>  
> +/* For fields that need customized handling, such as queue, list */
> +#define VMSTATE_CSTM_V(_field, _state, _version, _meta_data, _get, _put)  \
> +{                                                                         \
> +    .name         = (stringify(_field)),                                  \
> +    .version_id   = (_version),                                           \
> +    .flags        = VMS_CSTM,                                             \
> +    .offset       = offsetof(_state, _field),                             \
> +    .meta_data    = &(_meta_data),                                        \
> +    .extend_get   = (_get),                                               \
> +    .extend_put   = (_put),                                               \
> +}
> +
>  /* _f : field name
>     _f_n : num of elements field_name
>     _n : num of elements
> @@ -970,4 +992,41 @@ int64_t self_announce_delay(int round)
>  
>  void dump_vmstate_json_to_file(FILE *out_fp);
>  
> +
> +/* Meta data for QTAILQ */
> +typedef struct QTAILQMetaData {
> +    /* the offset of tqh_first in QTAILQ_HEAD */
> +    size_t first;
> +    /* the offset of tqh_last in QTAILQ_HEAD */
> +    size_t last;
> +    /* the offset of tqe_next in QTAILQ_ENTRY */
> +    size_t next;
> +    /* the offset of tqe_prev in QTAILQ_ENTRY */
> +    size_t prev;
> +    /* the offset of QTAILQ_ENTRY in a QTAILQ element*/
> +    size_t entry;
> +    /* size of a QTAILQ element */
> +    size_t size;
> +    /* vmsd of a QTAILQ element */
> +    const VMStateDescription *vmsd;
> +} QTAILQMetaData;
> +
> +#define VMSTATE_QTAILQ_METADATA(_field, _state, _type, _next, _vmsd) { \
> +    .first = QTAILQ_FIRST_OFFSET(typeof_field(_state, _field)),        \
> +    .last =  QTAILQ_LAST_OFFSET(typeof_field(_state, _field)),         \
> +    .next = QTAILQ_NEXT_OFFSET(_type, _next),                          \
> +    .prev = QTAILQ_PREV_OFFSET(_type, _next),                          \
> +    .entry = offsetof(_type, _next),                                   \
> +    .size = sizeof(_type),                                             \
> +    .vmsd = &(_vmsd),                                                  \
> +}
> +
> +int vmstate_get_qtailq(QEMUFile *f, const void *metadata, void *opaque);
> +void vmstate_put_qtailq(QEMUFile *f, const void *metadata,
> +                         void *opaque, QJSON *vmdesc);
> +
> +/* VMStateField for QTAILQ field */
> +#define VMSTATE_QTAILQ_V(_field, _state, _version, _meta_data)               \
> +    VMSTATE_CSTM_V(_field, _state, _version, _meta_data, vmstate_get_qtailq, \
> +                   vmstate_put_qtailq)
>  #endif
> diff --git a/include/qemu/queue.h b/include/qemu/queue.h
> index f781aa2..46962d7 100644
> --- a/include/qemu/queue.h
> +++ b/include/qemu/queue.h
> @@ -437,3 +437,41 @@ struct {                                                                \
>          (*(((struct headname *)((elm)->field.tqe_prev))->tqh_last))
>  
>  #endif  /* !QEMU_SYS_QUEUE_H_ */
> +
> +/*
> + * Offsets of layout of a tail queue head.
> + */
> +#define QTAILQ_FIRST_OFFSET(head_type) \
> +        ((size_t) ((char *) &((head_type *)0)->tqh_first - (char *)0))
> +
> +#define QTAILQ_LAST_OFFSET(head_type) \
> +        ((size_t) ((char *) &((head_type *)0)->tqh_last - (char *)0))
> +
> +/*
> + * Offsets of layout of a tail queue element.
> + */
> +#define QTAILQ_NEXT_OFFSET(ele_type, field)                            \
> +        ((size_t) ((char *) &((ele_type *)0)->field.tqe_next -         \
> +                   (char *) &((ele_type *)0)->field))
> +
> +#define QTAILQ_PREV_OFFSET(ele_type, field) \
> +        ((size_t) ((char *) &((ele_type *)0)->field.tqe_prev -         \
> +                   (char *) &((ele_type *)0)->field))
> +/*
> + * Tail queue tranversal using pointer arithmetic.
> + */
> +#define QTAILQ_RAW_FOREACH(elm, head, entry, first, next)              \
> +        for ((elm) = *((void **) ((char *) (head) + (first)));         \
> +             (elm);                                                    \
> +             (elm) = *((void **) ((char *) (elm) + (entry) + (next))))
> +/*
> + * Tail queue insertion using pointer arithmetic.
> + */
> +#define QTAILQ_RAW_INSERT_TAIL(head, elm, entry, first, last, next, prev) do { \
> +            *((void **) ((char *) (elm) + (entry) + (next))) = NULL;           \
> +            *((void **) ((char *) (elm) + (entry) + (prev))) =                 \
> +                *((void **) ((char *) (head) + (last)));                       \
> +            **((void ***)((char *) (head) + (last))) = (elm);                  \
> +            *((void **) ((char *) (head) + (last))) =                          \
> +                (void *) ((char *) (elm) + (entry) + (next));                  \
> +} while (/*CONSTCOND*/0)
> diff --git a/migration/vmstate.c b/migration/vmstate.c
> index bf3d5db..47cd052 100644
> --- a/migration/vmstate.c
> +++ b/migration/vmstate.c
> @@ -5,6 +5,7 @@
>  #include "migration/vmstate.h"
>  #include "qemu/bitops.h"
>  #include "qemu/error-report.h"
> +#include "qemu/queue.h"
>  #include "trace.h"
>  #include "qjson.h"
>  
> @@ -121,6 +122,8 @@ int vmstate_load_state(QEMUFile *f, const VMStateDescription *vmsd,
>                  if (field->flags & VMS_STRUCT) {
>                      ret = vmstate_load_state(f, field->vmsd, addr,
>                                               field->vmsd->version_id);
> +                } else if (field->flags & VMS_CSTM) {
> +                    ret = field->extend_get(f, field->meta_data, addr);
>                  } else {
>                      ret = field->info->get(f, addr, size);
>  
> @@ -193,6 +196,8 @@ static const char *vmfield_get_type_name(VMStateField *field)
>  
>      if (field->flags & VMS_STRUCT) {
>          type = "struct";
> +    } else if (field->flags & VMS_CSTM) {
> +        type = "customized";
>      } else if (field->info->name) {
>          type = field->info->name;
>      }
> @@ -327,6 +332,8 @@ void vmstate_save_state(QEMUFile *f, const VMStateDescription *vmsd,
>                  }
>                  if (field->flags & VMS_STRUCT) {
>                      vmstate_save_state(f, field->vmsd, addr, vmdesc_loop);
> +                } else if  (field->flags & VMS_CSTM) {
> +                    field->extend_put(f, field->meta_data, addr, vmdesc_loop);
>                  } else {
>                      field->info->put(f, addr, size);
>                  }
> @@ -916,3 +923,80 @@ const VMStateInfo vmstate_info_bitmap = {
>      .get = get_bitmap,
>      .put = put_bitmap,
>  };
> +
> +/* extend_get for QTAILQ */
> +int vmstate_get_qtailq(QEMUFile *f, const void *metadata, void *opaque)
> +{
> +    bool link;
> +    int ret = 0;
> +    size_t first, last, next, prev, entry, size;
> +    QTAILQMetaData *data = (QTAILQMetaData *)metadata;
> +    const VMStateDescription *vmsd = data->vmsd;
> +    int version_id = vmsd->version_id;
> +    void *elm;
> +
> +    first = data->first;
> +    last = data->last;
> +    next = data->next;
> +    prev = data->prev;
> +    entry = data->entry;
> +    size = data->size;
> +
> +    trace_vmstate_load_state(vmsd->name, version_id);
> +    if (version_id > vmsd->version_id) {
> +        trace_vmstate_load_state_end(vmsd->name, "too new", -EINVAL);
> +        return -EINVAL;
> +    }
> +    if (version_id < vmsd->minimum_version_id) {
> +        trace_vmstate_load_state_end(vmsd->name, "too old", -EINVAL);
> +        return -EINVAL;
> +    }
> +
> +    while (true) {
> +        vmstate_info_bool.get(f, &link, sizeof(bool));
> +        if (!link) {
> +            break;
> +        }
> +
> +        elm =  g_malloc(size);
> +        ret = vmstate_load_state(f, vmsd, elm, version_id);
> +        if (ret) {
> +            return ret;
> +        }
> +        QTAILQ_RAW_INSERT_TAIL(opaque, elm, entry, first, last, next, prev);
> +    }
> +
> +    trace_vmstate_load_state_end(vmsd->name, "end", ret);
> +    return ret;
> +}
> +
> +/* extend_get for QTAILQ */
> +void vmstate_put_qtailq(QEMUFile *f, const void *metadata, void *opaque,
> +                         QJSON *vmdesc)
> +{
> +    bool link = true;
> +    size_t first, next, entry;
> +    QTAILQMetaData *data = (QTAILQMetaData *)metadata;
> +    const VMStateDescription *vmsd = data->vmsd;
> +    void *elm;
> +
> +    first = data->first;
> +    next = data->next;
> +    entry = data->entry;
> +
> +    if (vmdesc) {
> +        json_prop_str(vmdesc, "vmsd_name", vmsd->name);
> +        json_prop_int(vmdesc, "version", vmsd->version_id);
> +        json_start_array(vmdesc, "fields");
> +    }
> +
> +    QTAILQ_RAW_FOREACH(elm, opaque, entry, first, next) {
> +        vmstate_info_bool.put(f, &link, sizeof(bool));
> +        vmstate_save_state(f, vmsd, elm, vmdesc);
> +    }
> +    link = false;
> +    vmstate_info_bool.put(f, &link, sizeof(bool));
> +    if (vmdesc) {
> +        json_end_array(vmdesc);
> +    }
> +}

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [QEMU RFC PATCH v2 4/6] Migration: migrate QTAILQ
  2016-05-24 17:55 ` [Qemu-devel] [QEMU RFC PATCH v2 4/6] Migration: migrate QTAILQ Jianjun Duan
  2016-05-25  5:38   ` David Gibson
@ 2016-05-25  8:14   ` Paolo Bonzini
  2016-05-25 16:37     ` Jianjun Duan
  1 sibling, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2016-05-25  8:14 UTC (permalink / raw)
  To: Jianjun Duan, qemu-devel; +Cc: quintela, mdroth, qemu-ppc, amit.shah, david



On 24/05/2016 19:55, Jianjun Duan wrote:
> +/*
> + * Offsets of layout of a tail queue head.
> + */
> +#define QTAILQ_FIRST_OFFSET(head_type) \
> +        ((size_t) ((char *) &((head_type *)0)->tqh_first - (char *)0))
> +
> +#define QTAILQ_LAST_OFFSET(head_type) \
> +        ((size_t) ((char *) &((head_type *)0)->tqh_last - (char *)0))
> +
> +/*
> + * Offsets of layout of a tail queue element.
> + */
> +#define QTAILQ_NEXT_OFFSET(ele_type, field)                            \
> +        ((size_t) ((char *) &((ele_type *)0)->field.tqe_next -         \
> +                   (char *) &((ele_type *)0)->field))
> +
> +#define QTAILQ_PREV_OFFSET(ele_type, field) \
> +        ((size_t) ((char *) &((ele_type *)0)->field.tqe_prev -         \
> +                   (char *) &((ele_type *)0)->field))

Please use offsetof.

> +    /*
> +     * Following 3 fields are for VMStateField which needs customized handling,
> +     * such as QTAILQ in qemu/queue.h, lists, and tree.
> +     */
> +    const void *meta_data;
> +    int (*extend_get)(QEMUFile *f, const void *metadata, void *opaque);
> +    void (*extend_put)(QEMUFile *f, const void *metadata, void *opaque,
> +                       QJSON *vmdesc);
>  } VMStateField;

Do not add these two function pointers to VMStateField, instead add
QJSON* and VMStateField* arguments as needed to VMStateInfo's get and put.

> +#define VMSTATE_QTAILQ_METADATA(_field, _state, _type, _next, _vmsd) { \
> +    .first = QTAILQ_FIRST_OFFSET(typeof_field(_state, _field)),        \
> +    .last =  QTAILQ_LAST_OFFSET(typeof_field(_state, _field)),         \
> +    .next = QTAILQ_NEXT_OFFSET(_type, _next),                          \
> +    .prev = QTAILQ_PREV_OFFSET(_type, _next),                          \
> +    .entry = offsetof(_type, _next),                                   \
> +    .size = sizeof(_type),                                             \
> +    .vmsd = &(_vmsd),                                                  \
> +}

.last and .prev are unnecessary, since they come just after .first and
.next (and their use is hidden behind QTAILQ_RAW_*).  .first and .next
can be placed in .offset and .num_offset respectively.  So you don't
really need an extra metadata struct.

If you prefer you could have something like

    union {
        size_t num_offset;    /* VMS_VARRAY */
        size_t size_offset;   /* VMS_VBUFFER */
        size_t next_offset;   /* VMS_TAILQ */
    } offsets;

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [QEMU RFC PATCH v2 4/6] Migration: migrate QTAILQ
  2016-05-25  8:14   ` Paolo Bonzini
@ 2016-05-25 16:37     ` Jianjun Duan
  2016-05-25 17:51       ` Paolo Bonzini
  0 siblings, 1 reply; 18+ messages in thread
From: Jianjun Duan @ 2016-05-25 16:37 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: quintela, mdroth, qemu-ppc, amit.shah, david



On 05/25/2016 01:14 AM, Paolo Bonzini wrote:
> 
> 
> On 24/05/2016 19:55, Jianjun Duan wrote:
>> +/*
>> + * Offsets of layout of a tail queue head.
>> + */
>> +#define QTAILQ_FIRST_OFFSET(head_type) \
>> +        ((size_t) ((char *) &((head_type *)0)->tqh_first - (char *)0))
>> +
>> +#define QTAILQ_LAST_OFFSET(head_type) \
>> +        ((size_t) ((char *) &((head_type *)0)->tqh_last - (char *)0))
>> +
>> +/*
>> + * Offsets of layout of a tail queue element.
>> + */
>> +#define QTAILQ_NEXT_OFFSET(ele_type, field)                            \
>> +        ((size_t) ((char *) &((ele_type *)0)->field.tqe_next -         \
>> +                   (char *) &((ele_type *)0)->field))
>> +
>> +#define QTAILQ_PREV_OFFSET(ele_type, field) \
>> +        ((size_t) ((char *) &((ele_type *)0)->field.tqe_prev -         \
>> +                   (char *) &((ele_type *)0)->field))
> 
> Please use offsetof.
OK.

> 
>> +    /*
>> +     * Following 3 fields are for VMStateField which needs customized handling,
>> +     * such as QTAILQ in qemu/queue.h, lists, and tree.
>> +     */
>> +    const void *meta_data;
>> +    int (*extend_get)(QEMUFile *f, const void *metadata, void *opaque);
>> +    void (*extend_put)(QEMUFile *f, const void *metadata, void *opaque,
>> +                       QJSON *vmdesc);
>>  } VMStateField;
> 
> Do not add these two function pointers to VMStateField, instead add
> QJSON* and VMStateField* arguments as needed to VMStateInfo's get and put.
> 
That is definitely the ideal way. However, VMStateInfo's get/put are
already used extensively. If I change them, I need change all the
calling sites of them. Not very sure about whether it will be welcomed.
>> +#define VMSTATE_QTAILQ_METADATA(_field, _state, _type, _next, _vmsd) { \
>> +    .first = QTAILQ_FIRST_OFFSET(typeof_field(_state, _field)),        \
>> +    .last =  QTAILQ_LAST_OFFSET(typeof_field(_state, _field)),         \
>> +    .next = QTAILQ_NEXT_OFFSET(_type, _next),                          \
>> +    .prev = QTAILQ_PREV_OFFSET(_type, _next),                          \
>> +    .entry = offsetof(_type, _next),                                   \
>> +    .size = sizeof(_type),                                             \
>> +    .vmsd = &(_vmsd),                                                  \
>> +}
> 
> .last and .prev are unnecessary, since they come just after .first and
> .next (and their use is hidden behind QTAILQ_RAW_*).  .first and .next
> can be placed in .offset and .num_offset respectively.  So you don't
> really need an extra metadata struct.
> 
> If you prefer you could have something like
> 
>     union {
>         size_t num_offset;    /* VMS_VARRAY */
>         size_t size_offset;   /* VMS_VBUFFER */
>         size_t next_offset;   /* VMS_TAILQ */
>     } offsets;
> 
Actually I explored the approach in which I use a VMSD to encode all the
information. But a VMSD describes actual memory layout. Interpreting it
another way could be confusing.

One of the assumption about QTAILQ is that we can only use the
interfaces defined in queue.h to access it. I intend not to depend on
its actually layout. From this perspective, these 6 parameters are
needed.

> Thanks,
> 
> Paolo
> 
Thanks,
Jianjun

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [QEMU RFC PATCH v2 4/6] Migration: migrate QTAILQ
  2016-05-25 16:37     ` Jianjun Duan
@ 2016-05-25 17:51       ` Paolo Bonzini
  2016-05-25 18:30         ` Jianjun Duan
  0 siblings, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2016-05-25 17:51 UTC (permalink / raw)
  To: Jianjun Duan; +Cc: qemu-devel, quintela, mdroth, qemu-ppc, amit shah, david

> >> +    /*
> >> +     * Following 3 fields are for VMStateField which needs customized
> >> handling,
> >> +     * such as QTAILQ in qemu/queue.h, lists, and tree.
> >> +     */
> >> +    const void *meta_data;
> >> +    int (*extend_get)(QEMUFile *f, const void *metadata, void *opaque);
> >> +    void (*extend_put)(QEMUFile *f, const void *metadata, void *opaque,
> >> +                       QJSON *vmdesc);
> >>  } VMStateField;
> > 
> > Do not add these two function pointers to VMStateField, instead add
> > QJSON* and VMStateField* arguments as needed to VMStateInfo's get and put.
> 
> That is definitely the ideal way. However, VMStateInfo's get/put are
> already used extensively. If I change them, I need change all the
> calling sites of them. Not very sure about whether it will be welcomed.

Sure, don't be worried. :)  True, there are quite a few VMStateInfos (about
35) but the changes are simple.

> >> +#define VMSTATE_QTAILQ_METADATA(_field, _state, _type, _next, _vmsd) { \
> >> +    .first = QTAILQ_FIRST_OFFSET(typeof_field(_state, _field)),        \
> >> +    .last =  QTAILQ_LAST_OFFSET(typeof_field(_state, _field)),         \
> >> +    .next = QTAILQ_NEXT_OFFSET(_type, _next),                          \
> >> +    .prev = QTAILQ_PREV_OFFSET(_type, _next),                          \
> >> +    .entry = offsetof(_type, _next),                                   \
> >> +    .size = sizeof(_type),                                             \
> >> +    .vmsd = &(_vmsd),                                                  \
> >> +}
> > 
> > .last and .prev are unnecessary, since they come just after .first and
> > .next (and their use is hidden behind QTAILQ_RAW_*).  .first and .next
                                                          ^^^^^^^^^^^^^^^^

Actually, .first and .next are always 0.  I misread your changes to qemu/queue.h.
See below.

> > can be placed in .offset and .num_offset respectively.  So you don't
> > really need an extra metadata struct.
> > 
> > If you prefer you could have something like
> > 
> >     union {
> >         size_t num_offset;    /* VMS_VARRAY */
> >         size_t size_offset;   /* VMS_VBUFFER */
> >         size_t next_offset;   /* VMS_TAILQ */
> >     } offsets;
>
> Actually I explored the approach in which I use a VMSD to encode all the
> information. But a VMSD describes actual memory layout. Interpreting it
> another way could be confusing.
> 
> One of the assumption about QTAILQ is that we can only use the
> interfaces defined in queue.h to access it. I intend not to depend on
> its actually layout. From this perspective, these 6 parameters are
> needed.

You are already adding QTAILQ_RAW_*, aren't you?  Those interfaces
would need to know about the layout, and you are passing redundant
information:

- .next_offset should always be 0
- .prev_offset should always be sizeof(void *)
- .first_offset should always be 0
- .last_offset should always be sizeof(void *)

so you only need head and entry, which you can store in .offset and
.num_offset.  The .vmsd field you have in ->metadata can be stored
in VMStateField's .vmsd, and likewise for .size (which can be stored
in VMStateField's .vmsd).

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [QEMU RFC PATCH v2 4/6] Migration: migrate QTAILQ
  2016-05-25 17:51       ` Paolo Bonzini
@ 2016-05-25 18:30         ` Jianjun Duan
  2016-05-25 19:22           ` Paolo Bonzini
  0 siblings, 1 reply; 18+ messages in thread
From: Jianjun Duan @ 2016-05-25 18:30 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, quintela, mdroth, qemu-ppc, amit shah, david


I will try to explain my design rationale in details here.

1 QTAILQ should only be accessed using the interfaces defined in
queue.h. Its structs should not be directly used. So I created
interfaces in queue.h to query about its layout. If the implementation
is changed, these interfaces should be changed accordingly. Code using
these interfaces should not break.

2 Based on point 1, vmstate_load_state/vmstate_put_state in vmstate.c
doesn't exactly know the structs of QTAILAQ head and entry. So pointer
arithmetic is needed to put/get a QTAILQ. To do it, we need those 6
parameters to be passed in. So it is not redundant if we only want to
only use the interfaces.

3 At this moment, vmstate_load_state/vmstate_put_state couldn't handle a
queue, or a list, or another recursive structure. To make it
extensible, I think a metadata is needed. The idea is for any
structure which needs special handling, customized metadata/put/get
should provide enough flexibility to hack around.

There are two issues I tried to address. One is the recursive queue,
another is working around of hiding of the QTAILQ implementation
details. It seems we need to agree on how the latter should be
addressed.

I will give it a try to fix those 35 calling sites of VMStateInfo.

Thanks,
Jianjun

On 05/25/2016 10:51 AM, Paolo Bonzini wrote:
>>>> +    /*
>>>> +     * Following 3 fields are for VMStateField which needs customized
>>>> handling,
>>>> +     * such as QTAILQ in qemu/queue.h, lists, and tree.
>>>> +     */
>>>> +    const void *meta_data;
>>>> +    int (*extend_get)(QEMUFile *f, const void *metadata, void *opaque);
>>>> +    void (*extend_put)(QEMUFile *f, const void *metadata, void *opaque,
>>>> +                       QJSON *vmdesc);
>>>>  } VMStateField;
>>>
>>> Do not add these two function pointers to VMStateField, instead add
>>> QJSON* and VMStateField* arguments as needed to VMStateInfo's get and put.
>>
>> That is definitely the ideal way. However, VMStateInfo's get/put are
>> already used extensively. If I change them, I need change all the
>> calling sites of them. Not very sure about whether it will be welcomed.
> 
> Sure, don't be worried. :)  True, there are quite a few VMStateInfos (about
> 35) but the changes are simple.
> 
>>>> +#define VMSTATE_QTAILQ_METADATA(_field, _state, _type, _next, _vmsd) { \
>>>> +    .first = QTAILQ_FIRST_OFFSET(typeof_field(_state, _field)),        \
>>>> +    .last =  QTAILQ_LAST_OFFSET(typeof_field(_state, _field)),         \
>>>> +    .next = QTAILQ_NEXT_OFFSET(_type, _next),                          \
>>>> +    .prev = QTAILQ_PREV_OFFSET(_type, _next),                          \
>>>> +    .entry = offsetof(_type, _next),                                   \
>>>> +    .size = sizeof(_type),                                             \
>>>> +    .vmsd = &(_vmsd),                                                  \
>>>> +}
>>>
>>> .last and .prev are unnecessary, since they come just after .first and
>>> .next (and their use is hidden behind QTAILQ_RAW_*).  .first and .next
>                                                           ^^^^^^^^^^^^^^^^
> 
> Actually, .first and .next are always 0.  I misread your changes to qemu/queue.h.
> See below.
> 
>>> can be placed in .offset and .num_offset respectively.  So you don't
>>> really need an extra metadata struct.
>>>
>>> If you prefer you could have something like
>>>
>>>     union {
>>>         size_t num_offset;    /* VMS_VARRAY */
>>>         size_t size_offset;   /* VMS_VBUFFER */
>>>         size_t next_offset;   /* VMS_TAILQ */
>>>     } offsets;
>>
>> Actually I explored the approach in which I use a VMSD to encode all the
>> information. But a VMSD describes actual memory layout. Interpreting it
>> another way could be confusing.
>>
>> One of the assumption about QTAILQ is that we can only use the
>> interfaces defined in queue.h to access it. I intend not to depend on
>> its actually layout. From this perspective, these 6 parameters are
>> needed.
> 
> You are already adding QTAILQ_RAW_*, aren't you?  Those interfaces
> would need to know about the layout, and you are passing redundant
> information:
> 
> - .next_offset should always be 0
> - .prev_offset should always be sizeof(void *)
> - .first_offset should always be 0
> - .last_offset should always be sizeof(void *)
> 
> so you only need head and entry, which you can store in .offset and
> .num_offset.  The .vmsd field you have in ->metadata can be stored
> in VMStateField's .vmsd, and likewise for .size (which can be stored
> in VMStateField's .vmsd).
> 
> Thanks,
> 
> Paolo
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [QEMU RFC PATCH v2 4/6] Migration: migrate QTAILQ
  2016-05-25 18:30         ` Jianjun Duan
@ 2016-05-25 19:22           ` Paolo Bonzini
  2016-05-25 20:17             ` Jianjun Duan
  0 siblings, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2016-05-25 19:22 UTC (permalink / raw)
  To: Jianjun Duan; +Cc: qemu-devel, quintela, mdroth, qemu-ppc, amit shah, david

> 1 QTAILQ should only be accessed using the interfaces defined in
> queue.h. Its structs should not be directly used. So I created
> interfaces in queue.h to query about its layout. If the implementation
> is changed, these interfaces should be changed accordingly. Code using
> these interfaces should not break.

You don't need to query the layout, as long as the knowledge
remains hidden in QTAILQ_RAW_* macros.  And because QTAILQ_*_OFFSET
returns constant values, you can just put the knowledge of the offsets
directly in QTAILQ_RAW_FOREACH and QTAILQ_RAW_INSERT_TAIL.

> 2 Based on point 1, vmstate_load_state/vmstate_put_state in vmstate.c
> doesn't exactly know the structs of QTAILAQ head and entry. So pointer
> arithmetic is needed to put/get a QTAILQ. To do it, we need those 6
> parameters to be passed in. So it is not redundant if we only want to
> only use the interfaces.

No, you only need two.  The other four are internal to qemu/queue.h.
Just like QTAILQ users do not know about tqh_* and tqe_*, they need not
know about their offsets, only the fields that contain them.

> 3 At this moment, vmstate_load_state/vmstate_put_state couldn't handle a
> queue, or a list, or another recursive structure. To make it
> extensible, I think a metadata is needed. The idea is for any
> structure which needs special handling, customized metadata/put/get
> should provide enough flexibility to hack around.

I think your solution is a bit overengineered.  If the metadata can
fit in the VMStateField, you can use VMStateField.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [QEMU RFC PATCH v2 4/6] Migration: migrate QTAILQ
  2016-05-25 19:22           ` Paolo Bonzini
@ 2016-05-25 20:17             ` Jianjun Duan
  2016-05-26  7:11               ` Paolo Bonzini
  0 siblings, 1 reply; 18+ messages in thread
From: Jianjun Duan @ 2016-05-25 20:17 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, quintela, mdroth, qemu-ppc, amit shah, david



On 05/25/2016 12:22 PM, Paolo Bonzini wrote:
>> 1 QTAILQ should only be accessed using the interfaces defined in
>> queue.h. Its structs should not be directly used. So I created
>> interfaces in queue.h to query about its layout. If the implementation
>> is changed, these interfaces should be changed accordingly. Code using
>> these interfaces should not break.
> 
> You don't need to query the layout, as long as the knowledge
> remains hidden in QTAILQ_RAW_* macros.  And because QTAILQ_*_OFFSET
> returns constant values, you can just put the knowledge of the offsets
> directly in QTAILQ_RAW_FOREACH and QTAILQ_RAW_INSERT_TAIL.
> 
>> 2 Based on point 1, vmstate_load_state/vmstate_put_state in vmstate.c
>> doesn't exactly know the structs of QTAILAQ head and entry. So pointer
>> arithmetic is needed to put/get a QTAILQ. To do it, we need those 6
>> parameters to be passed in. So it is not redundant if we only want to
>> only use the interfaces.
> 
> No, you only need two.  The other four are internal to qemu/queue.h.
> Just like QTAILQ users do not know about tqh_* and tqe_*, they need not
> know about their offsets, only the fields that contain them.
> 

In vmstate_load/put_state usually we use a VMSD to describe the type for
dump/load purpose. The metadata for the QTAILQ is for the same purpose.
Of course we can encode the information in a VMSD or VMStateField if
we don't have to change much.

The user may only care the position of head and entry. But to
implement QTAILQ_RAW_***, we need more offset information than that.
If we don't query the offsets using something like offset() and store
it in a metadata, we have to make the assumption that all the pointer
types have the same size. Since in vmstate_load/put_state we don't have
any type information about the QTAILQ instance, we cannot use offset()
in QTAILQ_RAW_*** macros. May have to stick the constants there for
first/last/next/prev in QTAILQ_RAW_***. Not sure if it will work for
all arches.
>> 3 At this moment, vmstate_load_state/vmstate_put_state couldn't handle a
>> queue, or a list, or another recursive structure. To make it
>> extensible, I think a metadata is needed. The idea is for any
>> structure which needs special handling, customized metadata/put/get
>> should provide enough flexibility to hack around.
> 
> I think your solution is a bit overengineered.  If the metadata can
> fit in the VMStateField, you can use VMStateField.
> 
> Thanks,
> 
> Paolo
> 

Thanks,
Jianjun

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [QEMU RFC PATCH v2 4/6] Migration: migrate QTAILQ
  2016-05-25 20:17             ` Jianjun Duan
@ 2016-05-26  7:11               ` Paolo Bonzini
  2016-05-26 16:43                 ` Jianjun Duan
  0 siblings, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2016-05-26  7:11 UTC (permalink / raw)
  To: Jianjun Duan; +Cc: quintela, mdroth, qemu-devel, qemu-ppc, amit shah, david



On 25/05/2016 22:17, Jianjun Duan wrote:
> 
> 
> On 05/25/2016 12:22 PM, Paolo Bonzini wrote:
>>> 1 QTAILQ should only be accessed using the interfaces defined in
>>> queue.h. Its structs should not be directly used. So I created
>>> interfaces in queue.h to query about its layout. If the implementation
>>> is changed, these interfaces should be changed accordingly. Code using
>>> these interfaces should not break.
>>
>> You don't need to query the layout, as long as the knowledge
>> remains hidden in QTAILQ_RAW_* macros.  And because QTAILQ_*_OFFSET
>> returns constant values, you can just put the knowledge of the offsets
>> directly in QTAILQ_RAW_FOREACH and QTAILQ_RAW_INSERT_TAIL.
>>
>>> 2 Based on point 1, vmstate_load_state/vmstate_put_state in vmstate.c
>>> doesn't exactly know the structs of QTAILAQ head and entry. So pointer
>>> arithmetic is needed to put/get a QTAILQ. To do it, we need those 6
>>> parameters to be passed in. So it is not redundant if we only want to
>>> only use the interfaces.
>>
>> No, you only need two.  The other four are internal to qemu/queue.h.
>> Just like QTAILQ users do not know about tqh_* and tqe_*, they need not
>> know about their offsets, only the fields that contain them.
> 
> In vmstate_load/put_state usually we use a VMSD to describe the type for
> dump/load purpose. The metadata for the QTAILQ is for the same purpose.
> Of course we can encode the information in a VMSD or VMStateField if
> we don't have to change much.

A VMSD describes the "sub-structure" of the field.  Here, the
substructure of a QTAILQ is the structure of each entry.

> The user may only care the position of head and entry. But to
> implement QTAILQ_RAW_***, we need more offset information than that.
> If we don't query the offsets using something like offset() and store
> it in a metadata, we have to make the assumption that all the pointer
> types have the same size.

We make this assumption elsewhere anyway.  _You_ are making it by doing
loads and stores through void** in QTAILQ_RAW_*.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [QEMU RFC PATCH v2 4/6] Migration: migrate QTAILQ
  2016-05-26  7:11               ` Paolo Bonzini
@ 2016-05-26 16:43                 ` Jianjun Duan
  2016-05-26 16:52                   ` Paolo Bonzini
  0 siblings, 1 reply; 18+ messages in thread
From: Jianjun Duan @ 2016-05-26 16:43 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: quintela, mdroth, qemu-devel, qemu-ppc, amit shah, david



On 05/26/2016 12:11 AM, Paolo Bonzini wrote:
> 
> 
> On 25/05/2016 22:17, Jianjun Duan wrote:
>>
>>
>> On 05/25/2016 12:22 PM, Paolo Bonzini wrote:
>>>> 1 QTAILQ should only be accessed using the interfaces defined in
>>>> queue.h. Its structs should not be directly used. So I created
>>>> interfaces in queue.h to query about its layout. If the implementation
>>>> is changed, these interfaces should be changed accordingly. Code using
>>>> these interfaces should not break.
>>>
>>> You don't need to query the layout, as long as the knowledge
>>> remains hidden in QTAILQ_RAW_* macros.  And because QTAILQ_*_OFFSET
>>> returns constant values, you can just put the knowledge of the offsets
>>> directly in QTAILQ_RAW_FOREACH and QTAILQ_RAW_INSERT_TAIL.
>>>
>>>> 2 Based on point 1, vmstate_load_state/vmstate_put_state in vmstate.c
>>>> doesn't exactly know the structs of QTAILAQ head and entry. So pointer
>>>> arithmetic is needed to put/get a QTAILQ. To do it, we need those 6
>>>> parameters to be passed in. So it is not redundant if we only want to
>>>> only use the interfaces.
>>>
>>> No, you only need two.  The other four are internal to qemu/queue.h.
>>> Just like QTAILQ users do not know about tqh_* and tqe_*, they need not
>>> know about their offsets, only the fields that contain them.
>>
>> In vmstate_load/put_state usually we use a VMSD to describe the type for
>> dump/load purpose. The metadata for the QTAILQ is for the same purpose.
>> Of course we can encode the information in a VMSD or VMStateField if
>> we don't have to change much.
> 
> A VMSD describes the "sub-structure" of the field.  Here, the
> substructure of a QTAILQ is the structure of each entry.
> 
>> The user may only care the position of head and entry. But to
>> implement QTAILQ_RAW_***, we need more offset information than that.
>> If we don't query the offsets using something like offset() and store
>> it in a metadata, we have to make the assumption that all the pointer
>> types have the same size.
> 
> We make this assumption elsewhere anyway.  _You_ are making it by doing
> loads and stores through void** in QTAILQ_RAW_*.
I thought every data pointer can be converted to and from void type of
pointer. Anyway we can make that assumption here.

How about alignment? Is it OK to assume pointers are always aligned at
4byte for 32 bit and 8byte for 64 bit? If yes then we can just add the
size of a void pointer to first to get last in QTAILQ head. We will get
something like:
first=0
last = sizeof(void *)
next=0
prev=sizeof(void *)

Thanks,
Jianjun



> 
> Thanks,
> 
> Paolo
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [QEMU RFC PATCH v2 4/6] Migration: migrate QTAILQ
  2016-05-26 16:43                 ` Jianjun Duan
@ 2016-05-26 16:52                   ` Paolo Bonzini
  0 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2016-05-26 16:52 UTC (permalink / raw)
  To: Jianjun Duan; +Cc: quintela, mdroth, qemu-devel, qemu-ppc, amit shah, david



On 26/05/2016 18:43, Jianjun Duan wrote:
>>> The user may only care the position of head and entry. But to
>>> implement QTAILQ_RAW_***, we need more offset information than that.
>>> If we don't query the offsets using something like offset() and store
>>> it in a metadata, we have to make the assumption that all the pointer
>>> types have the same size.
>>
>> We make this assumption elsewhere anyway.  _You_ are making it by doing
>> loads and stores through void** in QTAILQ_RAW_*.
> 
> I thought every data pointer can be converted to and from void type of
> pointer. Anyway we can make that assumption here.

Yes, you can do that.

> How about alignment? Is it OK to assume pointers are always aligned at
> 4byte for 32 bit and 8byte for 64 bit? If yes then we can just add the
> size of a void pointer to first to get last in QTAILQ head. We will get
> something like:
> first=0
> last = sizeof(void *)
> next=0
> prev=sizeof(void *)

Yes, that's my point.  These four are constants.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2016-05-26 16:52 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-24 17:55 [Qemu-devel] [QEMU RFC PATCH v2 0/6] Migration: ensure hotplug and migration work together Jianjun Duan
2016-05-24 17:55 ` [Qemu-devel] [QEMU RFC PATCH v2 1/6] spapr: ensure device trees are always associated with DRC Jianjun Duan
2016-05-25  5:20   ` David Gibson
2016-05-24 17:55 ` [Qemu-devel] [QEMU RFC PATCH v2 2/6] Migration: Defined VMStateDescription struct for spapr_drc Jianjun Duan
2016-05-24 17:55 ` [Qemu-devel] [QEMU RFC PATCH v2 3/6] vmstate: Define VARRAY with VMS_ALLOC Jianjun Duan
2016-05-24 17:55 ` [Qemu-devel] [QEMU RFC PATCH v2 4/6] Migration: migrate QTAILQ Jianjun Duan
2016-05-25  5:38   ` David Gibson
2016-05-25  8:14   ` Paolo Bonzini
2016-05-25 16:37     ` Jianjun Duan
2016-05-25 17:51       ` Paolo Bonzini
2016-05-25 18:30         ` Jianjun Duan
2016-05-25 19:22           ` Paolo Bonzini
2016-05-25 20:17             ` Jianjun Duan
2016-05-26  7:11               ` Paolo Bonzini
2016-05-26 16:43                 ` Jianjun Duan
2016-05-26 16:52                   ` Paolo Bonzini
2016-05-24 17:55 ` [Qemu-devel] [QEMU RFC PATCH v2 5/6] Migration: migrate ccs_list in spapr state Jianjun Duan
2016-05-24 17:55 ` [Qemu-devel] [QEMU RFC PATCH v2 6/6] Migration: migrate pending_events of " Jianjun Duan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.