All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/5] migration: ensure hotplug and migration work together
@ 2016-04-15 20:33 Jianjun Duan
  2016-04-15 20:33 ` [Qemu-devel] [PATCH 1/5] spapr: ensure device trees are always associated with DRC Jianjun Duan
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Jianjun Duan @ 2016-04-15 20:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, mdroth, david, Jianjun Duan

To make guest device (PCI, CPU and memory) hotplug work together 
with guest migration, spapr drc state needs be transmitted in
migration. This patch defines the VMStateDescription struct for
spapr drc state to enable it.

To fix the potential racing between hotplug events on guest and 
guest migration, ccs_list and pending_events of spapr state need be 
transmitted in migration. This patch also takes care of it. 
Especially, temporary cache is used for each of them to enable the 
transmission. 

Alexey Kardashevskiy (1):
  vmstate: Define VARRAY with VMS_ALLOC

Jianjun Duan (4):
  spapr: ensure device trees are always associated with DRC
  Migration: Defined VMStateDescription struct for spapr_drc
  Migration: migrate ccs_list in spapr state
  Migration: migrate pending_events of spapr state

 hw/ppc/spapr.c              | 129 ++++++++++++++++++++++++++++++++++++++++----
 hw/ppc/spapr_drc.c          |  15 ++++++
 hw/ppc/spapr_events.c       |  24 +++++----
 hw/ppc/spapr_pci.c          |  12 ++---
 hw/ppc/spapr_rtas.c         |   2 +
 include/hw/ppc/spapr.h      |  25 ++++++++-
 include/migration/vmstate.h |  18 ++++++-
 7 files changed, 196 insertions(+), 29 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH 1/5] spapr: ensure device trees are always associated with DRC
  2016-04-15 20:33 [Qemu-devel] [PATCH 0/5] migration: ensure hotplug and migration work together Jianjun Duan
@ 2016-04-15 20:33 ` Jianjun Duan
  2016-04-20  4:30   ` David Gibson
  2016-04-15 20:33 ` [Qemu-devel] [PATCH 2/5] Migration: Defined VMStateDescription struct for spapr_drc Jianjun Duan
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Jianjun Duan @ 2016-04-15 20:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, mdroth, david, Jianjun Duan

There are possible racing situations involving hotplug events and
guest migration. For cases where a hotplug event is migrated, or
the guest is in the process of fetching device tree at the time of
migration, we need to ensure the device tree is created and
associated with the corresponding DRC for devices that were
hotplugged on the source, but 'coldplugged' on the target.

Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c     | 16 ++++++----------
 hw/ppc/spapr_pci.c | 12 +++++-------
 2 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index feaab08..af4745c 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2132,15 +2132,6 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr, uint64_t size,
     int i, fdt_offset, fdt_size;
     void *fdt;
 
-    /*
-     * Check for DRC connectors and send hotplug notification to the
-     * guest only in case of hotplugged memory. This allows cold plugged
-     * memory to be specified at boot time.
-     */
-    if (!dev->hotplugged) {
-        return;
-    }
-
     for (i = 0; i < nr_lmbs; i++) {
         drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
                 addr/SPAPR_MEMORY_BLOCK_SIZE);
@@ -2154,7 +2145,12 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr, uint64_t size,
         drck->attach(drc, dev, fdt, fdt_offset, !dev->hotplugged, errp);
         addr += SPAPR_MEMORY_BLOCK_SIZE;
     }
-    spapr_hotplug_req_add_by_count(SPAPR_DR_CONNECTOR_TYPE_LMB, nr_lmbs);
+    /* send hotplug notification to the
+     * guest only in case of hotplugged memory
+     */
+    if (dev->hotplugged) {
+       spapr_hotplug_req_add_by_count(SPAPR_DR_CONNECTOR_TYPE_LMB, nr_lmbs);
+    }
 }
 
 static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 8c20d34..b179e42 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1092,13 +1092,11 @@ static void spapr_phb_add_pci_device(sPAPRDRConnector *drc,
         spapr_tce_set_need_vfio(tcet, true);
     }
 
-    if (dev->hotplugged) {
-        fdt = create_device_tree(&fdt_size);
-        fdt_start_offset = spapr_create_pci_child_dt(phb, pdev, fdt, 0);
-        if (!fdt_start_offset) {
-            error_setg(errp, "Failed to create pci child device tree node");
-            goto out;
-        }
+    fdt = create_device_tree(&fdt_size);
+    fdt_start_offset = spapr_create_pci_child_dt(phb, pdev, fdt, 0);
+    if (!fdt_start_offset) {
+        error_setg(errp, "Failed to create pci child device tree node");
+        goto out;
     }
 
     drck->attach(drc, DEVICE(pdev),
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH 2/5] Migration: Defined VMStateDescription struct for spapr_drc
  2016-04-15 20:33 [Qemu-devel] [PATCH 0/5] migration: ensure hotplug and migration work together Jianjun Duan
  2016-04-15 20:33 ` [Qemu-devel] [PATCH 1/5] spapr: ensure device trees are always associated with DRC Jianjun Duan
@ 2016-04-15 20:33 ` Jianjun Duan
  2016-04-20  4:32   ` David Gibson
  2016-04-15 20:33 ` [Qemu-devel] [PATCH 3/5] vmstate: Define VARRAY with VMS_ALLOC Jianjun Duan
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Jianjun Duan @ 2016-04-15 20:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, mdroth, david, Jianjun Duan

To manage hotplug/unplug of dynamic resources such as PCI cards,
memory, and CPU on sPAPR guests, a firmware abstraction known as
a Dynamic Resource Connector (DRC) is used to assign a particular
dynamic resource to the guest, and provide an interface for the
guest to manage configuration/removal of the resource associated
with it.

To migrate the hotplugged resources in migration, the
associated DRC state need be migrated. To migrate the DRC state,
we defined the VMStateDescription struct for spapr_drc to enable
the transmission of spapr_drc state in migration.

Not all the elements in the DRC state are migrated. Only those
ones modifiable by guest actions or device add/remove operation
are migrated.

Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
---
 hw/ppc/spapr_drc.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 3173940..5f7a25f 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -610,6 +610,20 @@ static void spapr_dr_connector_instance_init(Object *obj)
                         NULL, NULL, NULL, NULL);
 }
 
+static const VMStateDescription vmstate_spapr_drc = {
+    .name = "spapr_drc",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields  = (VMStateField []) {
+        VMSTATE_UINT32(isolation_state, sPAPRDRConnector),
+        VMSTATE_UINT32(allocation_state, sPAPRDRConnector),
+        VMSTATE_UINT32(indicator_state, sPAPRDRConnector),
+        VMSTATE_BOOL(configured, sPAPRDRConnector),
+        VMSTATE_BOOL(awaiting_release, sPAPRDRConnector),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
 {
     DeviceClass *dk = DEVICE_CLASS(k);
@@ -618,6 +632,7 @@ static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
     dk->reset = reset;
     dk->realize = realize;
     dk->unrealize = unrealize;
+    dk->vmsd = &vmstate_spapr_drc;
     drck->set_isolation_state = set_isolation_state;
     drck->set_indicator_state = set_indicator_state;
     drck->set_allocation_state = set_allocation_state;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH 3/5] vmstate: Define VARRAY with VMS_ALLOC
  2016-04-15 20:33 [Qemu-devel] [PATCH 0/5] migration: ensure hotplug and migration work together Jianjun Duan
  2016-04-15 20:33 ` [Qemu-devel] [PATCH 1/5] spapr: ensure device trees are always associated with DRC Jianjun Duan
  2016-04-15 20:33 ` [Qemu-devel] [PATCH 2/5] Migration: Defined VMStateDescription struct for spapr_drc Jianjun Duan
@ 2016-04-15 20:33 ` Jianjun Duan
  2016-04-15 20:33 ` [Qemu-devel] [PATCH 4/5] Migration: migrate ccs_list in spapr state Jianjun Duan
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Jianjun Duan @ 2016-04-15 20:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, mdroth, david, Alexey Kardashevskiy, Jianjun Duan

From: Alexey Kardashevskiy <aik@ozlabs.ru>

This allows dynamic allocation for migrating arrays.

Already existing VMSTATE_VARRAY_UINT32 requires an array to be
pre-allocated, however there are cases when the size is not known in
advance and there is no real need to enforce it.

This defines another variant of VMSTATE_VARRAY_UINT32 with WMS_ALLOC
flag which tells the receiving side to allocate memory for the array
before receiving the data.

The first user of it is a dynamic DMA window which existence and size
are totally dynamic.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
---
 include/migration/vmstate.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 84ee355..1622638 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -386,6 +386,16 @@ extern const VMStateInfo vmstate_info_bitmap;
     .offset     = vmstate_offset_pointer(_state, _field, _type),     \
 }
 
+#define VMSTATE_VARRAY_UINT32_ALLOC(_field, _state, _field_num, _version, _info, _type) {\
+    .name       = (stringify(_field)),                               \
+    .version_id = (_version),                                        \
+    .num_offset = vmstate_offset_value(_state, _field_num, uint32_t),\
+    .info       = &(_info),                                          \
+    .size       = sizeof(_type),                                     \
+    .flags      = VMS_VARRAY_UINT32|VMS_POINTER|VMS_ALLOC,           \
+    .offset     = vmstate_offset_pointer(_state, _field, _type),     \
+}
+
 #define VMSTATE_VARRAY_UINT16_UNSAFE(_field, _state, _field_num, _version, _info, _type) {\
     .name       = (stringify(_field)),                               \
     .version_id = (_version),                                        \
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH 4/5] Migration: migrate ccs_list in spapr state
  2016-04-15 20:33 [Qemu-devel] [PATCH 0/5] migration: ensure hotplug and migration work together Jianjun Duan
                   ` (2 preceding siblings ...)
  2016-04-15 20:33 ` [Qemu-devel] [PATCH 3/5] vmstate: Define VARRAY with VMS_ALLOC Jianjun Duan
@ 2016-04-15 20:33 ` Jianjun Duan
  2016-04-20  5:14   ` David Gibson
  2016-04-15 20:33 ` [Qemu-devel] [PATCH 5/5] Migration: migrate pending_events of " Jianjun Duan
  2016-04-20  4:29 ` [Qemu-devel] [PATCH 0/5] migration: ensure hotplug and migration work together David Gibson
  5 siblings, 1 reply; 17+ messages in thread
From: Jianjun Duan @ 2016-04-15 20:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, mdroth, david, Jianjun Duan

ccs_list in spapr state maintains the device tree related
information on the rtas side for hotplugged devices. In racing
situations between hotplug events and migration operation, a rtas
hotplug event could be migrated from the source guest to target
guest, or the source guest could have not yet finished fetching
the device tree when migration is started, the target will try
to finish fetching the device tree. By migrating ccs_list, the
target can fetch the device tree properly.

We tracked the size of ccs_list queue, and introduced a dynamic
cache for ccs_list to get around having to create VMSD for the
queue. There also existence tests in place for the newly added
fields in the spapr state VMSD to make sure forward migration
is not broken.

Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c              | 60 ++++++++++++++++++++++++++++++++++++++++++++-
 hw/ppc/spapr_rtas.c         |  2 ++
 include/hw/ppc/spapr.h      | 11 +++++++++
 include/migration/vmstate.h |  8 +++++-
 4 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index af4745c..eab95f0 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1245,10 +1245,31 @@ static bool spapr_vga_init(PCIBus *pci_bus, Error **errp)
     }
 }
 
+static void spapr_pre_save(void *opaque)
+{
+    sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
+    sPAPRConfigureConnectorState *ccs;
+    sPAPRConfigureConnectorStateCache *ccs_cache;
+
+    /* Copy ccs_list to ccs_list_cache */
+    spapr->ccs_list_cache = g_new0(sPAPRConfigureConnectorStateCache,
+                                   spapr->ccs_list_num);
+    ccs_cache = spapr->ccs_list_cache;
+    QTAILQ_FOREACH(ccs, &spapr->ccs_list, next) {
+        ccs_cache->drc_index = ccs->drc_index;
+        ccs_cache->fdt_offset = ccs->fdt_offset;
+        ccs_cache->fdt_depth = ccs->fdt_depth;
+        ccs_cache++;
+    }
+}
+
 static int spapr_post_load(void *opaque, int version_id)
 {
     sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
     int err = 0;
+    sPAPRConfigureConnectorState *ccs;
+    sPAPRConfigureConnectorStateCache *ccs_cache = spapr->ccs_list_cache;
+    int index = 0;
 
     /* In earlier versions, there was no separate qdev for the PAPR
      * RTC, so the RTC offset was stored directly in sPAPREnvironment.
@@ -1258,6 +1279,19 @@ static int spapr_post_load(void *opaque, int version_id)
         err = spapr_rtc_import_offset(spapr->rtc, spapr->rtc_offset);
     }
 
+    if (version_id < 4) {
+        return err;
+    }
+    /* Copy ccs_list_cache to ccs_list */
+    for (index = 0; index < spapr->ccs_list_num; index ++) {
+        ccs = g_new0(sPAPRConfigureConnectorState, 1);
+        ccs->drc_index = (ccs_cache + index)->drc_index;
+        ccs->fdt_offset = (ccs_cache + index)->fdt_offset;
+        ccs->fdt_depth = (ccs_cache + index)->fdt_depth;
+        QTAILQ_INSERT_TAIL(&spapr->ccs_list, ccs, next);
+    }
+    g_free(spapr->ccs_list_cache);
+
     return err;
 }
 
@@ -1266,10 +1300,28 @@ static bool version_before_3(void *opaque, int version_id)
     return version_id < 3;
 }
 
+static bool version_ge_4(void *opaque, int version_id)
+{
+    return version_id >= 4;
+}
+
+static const VMStateDescription vmstate_spapr_ccs_cache = {
+    .name = "spaprconfigureconnectorstate",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(drc_index, sPAPRConfigureConnectorStateCache),
+        VMSTATE_INT32(fdt_offset, sPAPRConfigureConnectorStateCache),
+        VMSTATE_INT32(fdt_depth, sPAPRConfigureConnectorStateCache),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static const VMStateDescription vmstate_spapr = {
     .name = "spapr",
-    .version_id = 3,
+    .version_id = 4,
     .minimum_version_id = 1,
+    .pre_save = spapr_pre_save,
     .post_load = spapr_post_load,
     .fields = (VMStateField[]) {
         /* used to be @next_irq */
@@ -1279,6 +1331,12 @@ static const VMStateDescription vmstate_spapr = {
         VMSTATE_UINT64_TEST(rtc_offset, sPAPRMachineState, version_before_3),
 
         VMSTATE_PPC_TIMEBASE_V(tb, sPAPRMachineState, 2),
+        /* RTAS state */
+        VMSTATE_INT32_TEST(ccs_list_num, sPAPRMachineState, version_ge_4),
+        VMSTATE_STRUCT_VARRAY_ALLOC_TEST(ccs_list_cache, sPAPRMachineState,
+                                         version_ge_4, ccs_list_num, 1,
+                                         vmstate_spapr_ccs_cache,
+                                         sPAPRConfigureConnectorStateCache),
         VMSTATE_END_OF_LIST()
     },
 };
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index f073258..9cfd559 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -70,6 +70,7 @@ static void spapr_ccs_add(sPAPRMachineState *spapr,
 {
     g_assert(!spapr_ccs_find(spapr, ccs->drc_index));
     QTAILQ_INSERT_HEAD(&spapr->ccs_list, ccs, next);
+    spapr->ccs_list_num++;
 }
 
 static void spapr_ccs_remove(sPAPRMachineState *spapr,
@@ -77,6 +78,7 @@ static void spapr_ccs_remove(sPAPRMachineState *spapr,
 {
     QTAILQ_REMOVE(&spapr->ccs_list, ccs, next);
     g_free(ccs);
+    spapr->ccs_list_num--;
 }
 
 void spapr_ccs_reset_hook(void *opaque)
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 815d5ee..c8be926 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -11,6 +11,8 @@ struct VIOsPAPRBus;
 struct sPAPRPHBState;
 struct sPAPRNVRAM;
 typedef struct sPAPRConfigureConnectorState sPAPRConfigureConnectorState;
+typedef struct sPAPRConfigureConnectorStateCache
+        sPAPRConfigureConnectorStateCache;
 typedef struct sPAPREventLogEntry sPAPREventLogEntry;
 
 #define HPTE64_V_HPTE_DIRTY     0x0000000000000040ULL
@@ -75,6 +77,9 @@ struct sPAPRMachineState {
 
     /* RTAS state */
     QTAILQ_HEAD(, sPAPRConfigureConnectorState) ccs_list;
+    /* Temporary cache for migration purposes */
+    int32_t ccs_list_num;
+    sPAPRConfigureConnectorStateCache *ccs_list_cache;
 
     /*< public >*/
     char *kvm_type;
@@ -589,6 +594,12 @@ struct sPAPRConfigureConnectorState {
     QTAILQ_ENTRY(sPAPRConfigureConnectorState) next;
 };
 
+struct sPAPRConfigureConnectorStateCache {
+    uint32_t drc_index;
+    int fdt_offset;
+    int fdt_depth;
+};
+
 void spapr_ccs_reset_hook(void *opaque);
 
 #define TYPE_SPAPR_RTC "spapr-rtc"
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 1622638..7966979 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -549,9 +549,10 @@ extern const VMStateInfo vmstate_info_bitmap;
     .offset     = offsetof(_state, _field),                          \
 }
 
-#define VMSTATE_STRUCT_VARRAY_ALLOC(_field, _state, _field_num, _version, _vmsd, _type) {\
+#define VMSTATE_STRUCT_VARRAY_ALLOC_TEST(_field, _state, _test, _field_num, _version, _vmsd, _type) { \
     .name       = (stringify(_field)),                               \
     .version_id = (_version),                                        \
+    .field_exists = (_test),                                         \
     .vmsd       = &(_vmsd),                                          \
     .num_offset = vmstate_offset_value(_state, _field_num, int32_t), \
     .size       = sizeof(_type),                                     \
@@ -677,6 +678,11 @@ extern const VMStateInfo vmstate_info_bitmap;
     VMSTATE_STRUCT_ARRAY_TEST(_field, _state, _num, NULL, _version,   \
             _vmsd, _type)
 
+#define VMSTATE_STRUCT_VARRAY_ALLOC(_field, _state, _field_num, _version, \
+                                    _vmsd, _type)                         \
+    VMSTATE_STRUCT_VARRAY_ALLOC_TEST(_field, _state, NULL, _field_num,    \
+            _version, _vmsd, _type)
+
 #define VMSTATE_BUFFER_UNSAFE_INFO(_field, _state, _version, _info, _size) \
     VMSTATE_BUFFER_UNSAFE_INFO_TEST(_field, _state, NULL, _version, _info, \
             _size)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH 5/5] Migration: migrate pending_events of spapr state
  2016-04-15 20:33 [Qemu-devel] [PATCH 0/5] migration: ensure hotplug and migration work together Jianjun Duan
                   ` (3 preceding siblings ...)
  2016-04-15 20:33 ` [Qemu-devel] [PATCH 4/5] Migration: migrate ccs_list in spapr state Jianjun Duan
@ 2016-04-15 20:33 ` Jianjun Duan
  2016-04-20  4:29 ` [Qemu-devel] [PATCH 0/5] migration: ensure hotplug and migration work together David Gibson
  5 siblings, 0 replies; 17+ messages in thread
From: Jianjun Duan @ 2016-04-15 20:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, mdroth, david, Jianjun Duan

In racing situations between hotplug events and migration operation,
a rtas hotplug event could have not yet be delivered to the source
guest when migration is started. In this case the pending_events of
spapr state need be transmitted to the target so that the hotplug
event can be finished on the target.

All the different fields of the events are encoded as defined by
PAPR. We can migrate them as uint8_t binary stream without any
concerns about data padding or endianess. We tracked the size of
pending_events queue, and introduced a dynamic cache for
pending_events to get around having to create VMSD for the queue.

We also have existence test in place for the newly added
pending_events related fields in spapr state VMSD to make sure
forward migration is not broken.

Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c         | 53 ++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr_events.c  | 24 ++++++++++++++---------
 include/hw/ppc/spapr.h | 14 ++++++++++++-
 3 files changed, 81 insertions(+), 10 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index eab95f0..73e4401 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1248,9 +1248,24 @@ static bool spapr_vga_init(PCIBus *pci_bus, Error **errp)
 static void spapr_pre_save(void *opaque)
 {
     sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
+    sPAPREventLogEntry *event;
+    sPAPREventLogEntryCache *event_cache;
     sPAPRConfigureConnectorState *ccs;
     sPAPRConfigureConnectorStateCache *ccs_cache;
 
+    /* Copy pending_events to pending_events_cache */
+    spapr->pending_events_cache = g_new0(sPAPREventLogEntryCache,
+                                         spapr->pending_events_num);
+    event_cache = spapr->pending_events_cache;
+    QTAILQ_FOREACH(event, &spapr->pending_events, next) {
+        event_cache->log_type = event->log_type;
+        event_cache->exception = event->exception;
+        event_cache->data_size = event->data_size;
+        event_cache->data = g_malloc0(event_cache->data_size);
+        memcpy(event_cache->data, event->data, event_cache->data_size);
+        event_cache++;
+    }
+
     /* Copy ccs_list to ccs_list_cache */
     spapr->ccs_list_cache = g_new0(sPAPRConfigureConnectorStateCache,
                                    spapr->ccs_list_num);
@@ -1267,6 +1282,8 @@ static int spapr_post_load(void *opaque, int version_id)
 {
     sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
     int err = 0;
+    sPAPREventLogEntry *event;
+    sPAPREventLogEntryCache *event_cache = spapr->pending_events_cache;
     sPAPRConfigureConnectorState *ccs;
     sPAPRConfigureConnectorStateCache *ccs_cache = spapr->ccs_list_cache;
     int index = 0;
@@ -1282,6 +1299,18 @@ static int spapr_post_load(void *opaque, int version_id)
     if (version_id < 4) {
         return err;
     }
+    /* Copy pending_events_cache to pending_events */
+    for (index = 0; index < spapr->pending_events_num; index++) {
+        event = g_new0(sPAPREventLogEntry, 1);
+        event->log_type = (event_cache + index)->log_type;
+        event->exception = (event_cache + index)->exception;
+        event->data_size = (event_cache + index)->data_size;
+        event->data = g_malloc0(event->data_size);
+        memcpy(event->data, (event_cache + index)->data, event->data_size);
+        QTAILQ_INSERT_TAIL(&spapr->pending_events, event, next);
+    }
+    g_free(spapr->pending_events_cache);
+
     /* Copy ccs_list_cache to ccs_list */
     for (index = 0; index < spapr->ccs_list_num; index ++) {
         ccs = g_new0(sPAPRConfigureConnectorState, 1);
@@ -1305,6 +1334,20 @@ static bool version_ge_4(void *opaque, int version_id)
     return version_id >= 4;
 }
 
+static const VMStateDescription vmstate_spapr_event_cache = {
+    .name = "spapreventlogentrycache",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_INT32(log_type, sPAPREventLogEntryCache),
+        VMSTATE_BOOL(exception, sPAPREventLogEntryCache),
+        VMSTATE_UINT32(data_size, sPAPREventLogEntryCache),
+        VMSTATE_VARRAY_UINT32_ALLOC(data, sPAPREventLogEntryCache, data_size,
+                                    0, vmstate_info_uint8, uint8_t),
+        VMSTATE_END_OF_LIST()
+  },
+};
+
 static const VMStateDescription vmstate_spapr_ccs_cache = {
     .name = "spaprconfigureconnectorstate",
     .version_id = 1,
@@ -1331,6 +1374,16 @@ static const VMStateDescription vmstate_spapr = {
         VMSTATE_UINT64_TEST(rtc_offset, sPAPRMachineState, version_before_3),
 
         VMSTATE_PPC_TIMEBASE_V(tb, sPAPRMachineState, 2),
+
+        /* Pending sPAPR event list*/
+        VMSTATE_INT32_TEST(pending_events_num, sPAPRMachineState,
+                           version_ge_4),
+        VMSTATE_STRUCT_VARRAY_ALLOC_TEST(pending_events_cache,
+                                         sPAPRMachineState,
+                                         version_ge_4,
+                                         pending_events_num, 1,
+                                         vmstate_spapr_event_cache,
+                                         sPAPREventLogEntryCache),
         /* RTAS state */
         VMSTATE_INT32_TEST(ccs_list_num, sPAPRMachineState, version_ge_4),
         VMSTATE_STRUCT_VARRAY_ALLOC_TEST(ccs_list_cache, sPAPRMachineState,
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index 269ab7e..1bf228e 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -239,7 +239,8 @@ void spapr_events_fdt_skel(void *fdt, uint32_t check_exception_irq)
     _FDT((fdt_end_node(fdt)));
 }
 
-static void rtas_event_log_queue(int log_type, void *data, bool exception)
+static void rtas_event_log_queue(int log_type, void *data, bool exception,
+				 int data_size)
 {
     sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
     sPAPREventLogEntry *entry = g_new(sPAPREventLogEntry, 1);
@@ -248,7 +249,9 @@ static void rtas_event_log_queue(int log_type, void *data, bool exception)
     entry->log_type = log_type;
     entry->exception = exception;
     entry->data = data;
+    entry->data_size = data_size;
     QTAILQ_INSERT_TAIL(&spapr->pending_events, entry, next);
+    spapr->pending_events_num++;
 }
 
 static sPAPREventLogEntry *rtas_event_log_dequeue(uint32_t event_mask,
@@ -276,6 +279,7 @@ static sPAPREventLogEntry *rtas_event_log_dequeue(uint32_t event_mask,
 
     if (entry) {
         QTAILQ_REMOVE(&spapr->pending_events, entry, next);
+        spapr->pending_events_num--;
     }
 
     return entry;
@@ -350,6 +354,7 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
     struct rtas_event_log_v6_mainb *mainb;
     struct rtas_event_log_v6_epow *epow;
     struct epow_log_full *new_epow;
+    uint32_t data_size;
 
     new_epow = g_malloc0(sizeof(*new_epow));
     hdr = &new_epow->hdr;
@@ -358,13 +363,13 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
     mainb = &new_epow->mainb;
     epow = &new_epow->epow;
 
+    data_size = sizeof(*new_epow);
     hdr->summary = cpu_to_be32(RTAS_LOG_VERSION_6
                                | RTAS_LOG_SEVERITY_EVENT
                                | RTAS_LOG_DISPOSITION_NOT_RECOVERED
                                | RTAS_LOG_OPTIONAL_PART_PRESENT
                                | RTAS_LOG_TYPE_EPOW);
-    hdr->extended_length = cpu_to_be32(sizeof(*new_epow)
-                                       - sizeof(new_epow->hdr));
+    hdr->extended_length = cpu_to_be32(data_size - sizeof(new_epow->hdr));
 
     spapr_init_v6hdr(v6hdr);
     spapr_init_maina(maina, 3 /* Main-A, Main-B and EPOW */);
@@ -384,7 +389,7 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
     epow->event_modifier = RTAS_LOG_V6_EPOW_MODIFIER_NORMAL;
     epow->extended_modifier = RTAS_LOG_V6_EPOW_XMODIFIER_PARTITION_SPECIFIC;
 
-    rtas_event_log_queue(RTAS_LOG_TYPE_EPOW, new_epow, true);
+    rtas_event_log_queue(RTAS_LOG_TYPE_EPOW, new_epow, true, data_size);
 
     qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->check_exception_irq));
 }
@@ -407,6 +412,7 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
     struct rtas_event_log_v6_maina *maina;
     struct rtas_event_log_v6_mainb *mainb;
     struct rtas_event_log_v6_hp *hp;
+    uint32_t data_size;
 
     new_hp = g_malloc0(sizeof(struct hp_log_full));
     hdr = &new_hp->hdr;
@@ -415,14 +421,14 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
     mainb = &new_hp->mainb;
     hp = &new_hp->hp;
 
+    data_size = sizeof(*new_hp);
     hdr->summary = cpu_to_be32(RTAS_LOG_VERSION_6
                                | RTAS_LOG_SEVERITY_EVENT
                                | RTAS_LOG_DISPOSITION_NOT_RECOVERED
                                | RTAS_LOG_OPTIONAL_PART_PRESENT
                                | RTAS_LOG_INITIATOR_HOTPLUG
                                | RTAS_LOG_TYPE_HOTPLUG);
-    hdr->extended_length = cpu_to_be32(sizeof(*new_hp)
-                                       - sizeof(new_hp->hdr));
+    hdr->extended_length = cpu_to_be32(data_size - sizeof(new_hp->hdr));
 
     spapr_init_v6hdr(v6hdr);
     spapr_init_maina(maina, 3 /* Main-A, Main-B, HP */);
@@ -460,7 +466,7 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
         hp->drc.index = cpu_to_be32(drc);
     }
 
-    rtas_event_log_queue(RTAS_LOG_TYPE_HOTPLUG, new_hp, true);
+    rtas_event_log_queue(RTAS_LOG_TYPE_HOTPLUG, new_hp, true, data_size);
 
     if (hp->hotplug_action == RTAS_LOG_V6_HP_ACTION_ADD) {
         spapr_hotplug_set_signalled(drc);
@@ -531,7 +537,7 @@ static void check_exception(PowerPCCPU *cpu, sPAPRMachineState *spapr,
         goto out_no_events;
     }
 
-    hdr = event->data;
+    hdr = (struct rtas_error_log *)event->data;
     event_len = be32_to_cpu(hdr->extended_length) + sizeof(*hdr);
 
     if (event_len < len) {
@@ -581,7 +587,7 @@ static void event_scan(PowerPCCPU *cpu, sPAPRMachineState *spapr,
         goto out_no_events;
     }
 
-    hdr = event->data;
+    hdr = (struct rtas_error_log *)event->data;
     event_len = be32_to_cpu(hdr->extended_length) + sizeof(*hdr);
 
     if (event_len < len) {
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index c8be926..12419c2 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -14,6 +14,7 @@ typedef struct sPAPRConfigureConnectorState sPAPRConfigureConnectorState;
 typedef struct sPAPRConfigureConnectorStateCache
         sPAPRConfigureConnectorStateCache;
 typedef struct sPAPREventLogEntry sPAPREventLogEntry;
+typedef struct sPAPREventLogEntryCache sPAPREventLogEntryCache;
 
 #define HPTE64_V_HPTE_DIRTY     0x0000000000000040ULL
 #define SPAPR_ENTRY_POINT       0x100
@@ -69,6 +70,9 @@ struct sPAPRMachineState {
     uint32_t check_exception_irq;
     Notifier epow_notifier;
     QTAILQ_HEAD(, sPAPREventLogEntry) pending_events;
+    /* Temporary cache for migration purposes */
+    int32_t pending_events_num;
+    sPAPREventLogEntryCache *pending_events_cache;
 
     /* Migration state */
     int htab_save_index;
@@ -557,10 +561,18 @@ sPAPRTCETable *spapr_tce_find_by_liobn(target_ulong liobn);
 struct sPAPREventLogEntry {
     int log_type;
     bool exception;
-    void *data;
+    uint32_t data_size;
+    uint8_t *data;
     QTAILQ_ENTRY(sPAPREventLogEntry) next;
 };
 
+struct sPAPREventLogEntryCache {
+    int log_type;
+    bool exception;
+    uint32_t data_size;
+    uint8_t *data;
+};
+
 void spapr_events_init(sPAPRMachineState *sm);
 void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq);
 int spapr_h_cas_compose_response(sPAPRMachineState *sm,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 0/5] migration: ensure hotplug and migration work together
  2016-04-15 20:33 [Qemu-devel] [PATCH 0/5] migration: ensure hotplug and migration work together Jianjun Duan
                   ` (4 preceding siblings ...)
  2016-04-15 20:33 ` [Qemu-devel] [PATCH 5/5] Migration: migrate pending_events of " Jianjun Duan
@ 2016-04-20  4:29 ` David Gibson
  5 siblings, 0 replies; 17+ messages in thread
From: David Gibson @ 2016-04-20  4:29 UTC (permalink / raw)
  To: Jianjun Duan; +Cc: qemu-devel, qemu-ppc, mdroth

[-- Attachment #1: Type: text/plain, Size: 925 bytes --]

On Fri, Apr 15, 2016 at 01:33:00PM -0700, Jianjun Duan wrote:
> To make guest device (PCI, CPU and memory) hotplug work together 
> with guest migration, spapr drc state needs be transmitted in
> migration. This patch defines the VMStateDescription struct for
> spapr drc state to enable it.
> 
> To fix the potential racing between hotplug events on guest and 
> guest migration, ccs_list and pending_events of spapr state need be 
> transmitted in migration. This patch also takes care of it. 
> Especially, temporary cache is used for each of them to enable the 
> transmission.

Hmm, so this will certainly have to be for 2.7 at this stage.  I have
some concerns that I'll comment on in the individual patches.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 1/5] spapr: ensure device trees are always associated with DRC
  2016-04-15 20:33 ` [Qemu-devel] [PATCH 1/5] spapr: ensure device trees are always associated with DRC Jianjun Duan
@ 2016-04-20  4:30   ` David Gibson
  2016-04-21 16:53     ` [Qemu-devel] [Qemu-ppc] " Jianjun Duan
  0 siblings, 1 reply; 17+ messages in thread
From: David Gibson @ 2016-04-20  4:30 UTC (permalink / raw)
  To: Jianjun Duan; +Cc: qemu-devel, qemu-ppc, mdroth

[-- Attachment #1: Type: text/plain, Size: 3425 bytes --]

On Fri, Apr 15, 2016 at 01:33:01PM -0700, Jianjun Duan wrote:
> There are possible racing situations involving hotplug events and
> guest migration. For cases where a hotplug event is migrated, or
> the guest is in the process of fetching device tree at the time of
> migration, we need to ensure the device tree is created and
> associated with the corresponding DRC for devices that were
> hotplugged on the source, but 'coldplugged' on the target.
> 
> Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>

This seems fairly sensible - should be harmless and a bit cleaner
whether or not we strictly speaking need it.

However, it is likely to conflict with some of the device tree
construction cleanups I have in my pipeline, so it may well need
rework for that.

> ---
>  hw/ppc/spapr.c     | 16 ++++++----------
>  hw/ppc/spapr_pci.c | 12 +++++-------
>  2 files changed, 11 insertions(+), 17 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index feaab08..af4745c 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2132,15 +2132,6 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr, uint64_t size,
>      int i, fdt_offset, fdt_size;
>      void *fdt;
>  
> -    /*
> -     * Check for DRC connectors and send hotplug notification to the
> -     * guest only in case of hotplugged memory. This allows cold plugged
> -     * memory to be specified at boot time.
> -     */
> -    if (!dev->hotplugged) {
> -        return;
> -    }
> -
>      for (i = 0; i < nr_lmbs; i++) {
>          drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
>                  addr/SPAPR_MEMORY_BLOCK_SIZE);
> @@ -2154,7 +2145,12 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr, uint64_t size,
>          drck->attach(drc, dev, fdt, fdt_offset, !dev->hotplugged, errp);
>          addr += SPAPR_MEMORY_BLOCK_SIZE;
>      }
> -    spapr_hotplug_req_add_by_count(SPAPR_DR_CONNECTOR_TYPE_LMB, nr_lmbs);
> +    /* send hotplug notification to the
> +     * guest only in case of hotplugged memory
> +     */
> +    if (dev->hotplugged) {
> +       spapr_hotplug_req_add_by_count(SPAPR_DR_CONNECTOR_TYPE_LMB, nr_lmbs);
> +    }
>  }
>  
>  static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 8c20d34..b179e42 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1092,13 +1092,11 @@ static void spapr_phb_add_pci_device(sPAPRDRConnector *drc,
>          spapr_tce_set_need_vfio(tcet, true);
>      }
>  
> -    if (dev->hotplugged) {
> -        fdt = create_device_tree(&fdt_size);
> -        fdt_start_offset = spapr_create_pci_child_dt(phb, pdev, fdt, 0);
> -        if (!fdt_start_offset) {
> -            error_setg(errp, "Failed to create pci child device tree node");
> -            goto out;
> -        }
> +    fdt = create_device_tree(&fdt_size);
> +    fdt_start_offset = spapr_create_pci_child_dt(phb, pdev, fdt, 0);
> +    if (!fdt_start_offset) {
> +        error_setg(errp, "Failed to create pci child device tree node");
> +        goto out;
>      }
>  
>      drck->attach(drc, DEVICE(pdev),

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 2/5] Migration: Defined VMStateDescription struct for spapr_drc
  2016-04-15 20:33 ` [Qemu-devel] [PATCH 2/5] Migration: Defined VMStateDescription struct for spapr_drc Jianjun Duan
@ 2016-04-20  4:32   ` David Gibson
  2016-04-21 17:03     ` Jianjun Duan
  0 siblings, 1 reply; 17+ messages in thread
From: David Gibson @ 2016-04-20  4:32 UTC (permalink / raw)
  To: Jianjun Duan; +Cc: qemu-devel, qemu-ppc, mdroth

[-- Attachment #1: Type: text/plain, Size: 3101 bytes --]

On Fri, Apr 15, 2016 at 01:33:02PM -0700, Jianjun Duan wrote:
> To manage hotplug/unplug of dynamic resources such as PCI cards,
> memory, and CPU on sPAPR guests, a firmware abstraction known as
> a Dynamic Resource Connector (DRC) is used to assign a particular
> dynamic resource to the guest, and provide an interface for the
> guest to manage configuration/removal of the resource associated
> with it.
> 
> To migrate the hotplugged resources in migration, the
> associated DRC state need be migrated. To migrate the DRC state,
> we defined the VMStateDescription struct for spapr_drc to enable
> the transmission of spapr_drc state in migration.
> 
> Not all the elements in the DRC state are migrated. Only those
> ones modifiable by guest actions or device add/remove operation
> are migrated.
> 
> Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>

Urgh.  It would be really nice if we could avoid this and instead
calculate these states from other information.  I hate migrating
what's essentially transitory state if we can possibly avoid it - is
there any way to defer or retry hotplug operations to make this
unnececessary?

Even if we have to migrate some state here, I'm a bit dubious about
whether directly migrating the PAPR indicator states is the best way.
It does have the advantage of having a spec, on the other hand the
PAPR indicators are really weird and hard to understand the meaning
of.

> ---
>  hw/ppc/spapr_drc.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 3173940..5f7a25f 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -610,6 +610,20 @@ static void spapr_dr_connector_instance_init(Object *obj)
>                          NULL, NULL, NULL, NULL);
>  }
>  
> +static const VMStateDescription vmstate_spapr_drc = {
> +    .name = "spapr_drc",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields  = (VMStateField []) {
> +        VMSTATE_UINT32(isolation_state, sPAPRDRConnector),
> +        VMSTATE_UINT32(allocation_state, sPAPRDRConnector),
> +        VMSTATE_UINT32(indicator_state, sPAPRDRConnector),
> +        VMSTATE_BOOL(configured, sPAPRDRConnector),
> +        VMSTATE_BOOL(awaiting_release, sPAPRDRConnector),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
>  static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
>  {
>      DeviceClass *dk = DEVICE_CLASS(k);
> @@ -618,6 +632,7 @@ static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
>      dk->reset = reset;
>      dk->realize = realize;
>      dk->unrealize = unrealize;
> +    dk->vmsd = &vmstate_spapr_drc;
>      drck->set_isolation_state = set_isolation_state;
>      drck->set_indicator_state = set_indicator_state;
>      drck->set_allocation_state = set_allocation_state;

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 4/5] Migration: migrate ccs_list in spapr state
  2016-04-15 20:33 ` [Qemu-devel] [PATCH 4/5] Migration: migrate ccs_list in spapr state Jianjun Duan
@ 2016-04-20  5:14   ` David Gibson
  2016-04-21 17:22     ` Jianjun Duan
  0 siblings, 1 reply; 17+ messages in thread
From: David Gibson @ 2016-04-20  5:14 UTC (permalink / raw)
  To: Jianjun Duan; +Cc: qemu-devel, qemu-ppc, mdroth

[-- Attachment #1: Type: text/plain, Size: 9753 bytes --]

On Fri, Apr 15, 2016 at 01:33:04PM -0700, Jianjun Duan wrote:
> ccs_list in spapr state maintains the device tree related
> information on the rtas side for hotplugged devices. In racing
> situations between hotplug events and migration operation, a rtas
> hotplug event could be migrated from the source guest to target
> guest, or the source guest could have not yet finished fetching
> the device tree when migration is started, the target will try
> to finish fetching the device tree. By migrating ccs_list, the
> target can fetch the device tree properly.
> 
> We tracked the size of ccs_list queue, and introduced a dynamic
> cache for ccs_list to get around having to create VMSD for the
> queue. There also existence tests in place for the newly added
> fields in the spapr state VMSD to make sure forward migration
> is not broken.
> 
> Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>

So there's problems here at a couple of levels.

1. Even more so that the indicators, this is transitional state, which
it would be really nice if we can avoid migrating.  At the very least
the new state should go into a subsection with an is_needed function
which will skip it if we don't have a configure connector in progress.
That means we'll be able to backwards migrate as long as we're not in
the middle of a hotplug, which won't be possible with this version.

Again, if there's some way we can defer or retry the operation instead
of migrating the interim state, that would be even better.

2. Having to copy the list of elements out into an array just for
migration is pretty horrible.  I'm almost include to suggest we should
add list migration into the savevm core rather than this.

> ---
>  hw/ppc/spapr.c              | 60 ++++++++++++++++++++++++++++++++++++++++++++-
>  hw/ppc/spapr_rtas.c         |  2 ++
>  include/hw/ppc/spapr.h      | 11 +++++++++
>  include/migration/vmstate.h |  8 +++++-
>  4 files changed, 79 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index af4745c..eab95f0 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1245,10 +1245,31 @@ static bool spapr_vga_init(PCIBus *pci_bus, Error **errp)
>      }
>  }
>  
> +static void spapr_pre_save(void *opaque)
> +{
> +    sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
> +    sPAPRConfigureConnectorState *ccs;
> +    sPAPRConfigureConnectorStateCache *ccs_cache;
> +
> +    /* Copy ccs_list to ccs_list_cache */
> +    spapr->ccs_list_cache = g_new0(sPAPRConfigureConnectorStateCache,
> +                                   spapr->ccs_list_num);
> +    ccs_cache = spapr->ccs_list_cache;
> +    QTAILQ_FOREACH(ccs, &spapr->ccs_list, next) {
> +        ccs_cache->drc_index = ccs->drc_index;
> +        ccs_cache->fdt_offset = ccs->fdt_offset;
> +        ccs_cache->fdt_depth = ccs->fdt_depth;
> +        ccs_cache++;
> +    }
> +}
> +
>  static int spapr_post_load(void *opaque, int version_id)
>  {
>      sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
>      int err = 0;
> +    sPAPRConfigureConnectorState *ccs;
> +    sPAPRConfigureConnectorStateCache *ccs_cache = spapr->ccs_list_cache;
> +    int index = 0;
>  
>      /* In earlier versions, there was no separate qdev for the PAPR
>       * RTC, so the RTC offset was stored directly in sPAPREnvironment.
> @@ -1258,6 +1279,19 @@ static int spapr_post_load(void *opaque, int version_id)
>          err = spapr_rtc_import_offset(spapr->rtc, spapr->rtc_offset);
>      }
>  
> +    if (version_id < 4) {
> +        return err;
> +    }
> +    /* Copy ccs_list_cache to ccs_list */
> +    for (index = 0; index < spapr->ccs_list_num; index ++) {
> +        ccs = g_new0(sPAPRConfigureConnectorState, 1);
> +        ccs->drc_index = (ccs_cache + index)->drc_index;
> +        ccs->fdt_offset = (ccs_cache + index)->fdt_offset;
> +        ccs->fdt_depth = (ccs_cache + index)->fdt_depth;
> +        QTAILQ_INSERT_TAIL(&spapr->ccs_list, ccs, next);
> +    }
> +    g_free(spapr->ccs_list_cache);
> +
>      return err;
>  }
>  
> @@ -1266,10 +1300,28 @@ static bool version_before_3(void *opaque, int version_id)
>      return version_id < 3;
>  }
>  
> +static bool version_ge_4(void *opaque, int version_id)
> +{
> +    return version_id >= 4;
> +}
> +
> +static const VMStateDescription vmstate_spapr_ccs_cache = {
> +    .name = "spaprconfigureconnectorstate",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT32(drc_index, sPAPRConfigureConnectorStateCache),
> +        VMSTATE_INT32(fdt_offset, sPAPRConfigureConnectorStateCache),
> +        VMSTATE_INT32(fdt_depth, sPAPRConfigureConnectorStateCache),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
>  static const VMStateDescription vmstate_spapr = {
>      .name = "spapr",
> -    .version_id = 3,
> +    .version_id = 4,
>      .minimum_version_id = 1,
> +    .pre_save = spapr_pre_save,
>      .post_load = spapr_post_load,
>      .fields = (VMStateField[]) {
>          /* used to be @next_irq */
> @@ -1279,6 +1331,12 @@ static const VMStateDescription vmstate_spapr = {
>          VMSTATE_UINT64_TEST(rtc_offset, sPAPRMachineState, version_before_3),
>  
>          VMSTATE_PPC_TIMEBASE_V(tb, sPAPRMachineState, 2),
> +        /* RTAS state */
> +        VMSTATE_INT32_TEST(ccs_list_num, sPAPRMachineState, version_ge_4),

You don't generally need to write your own version test functions for
>= specific-version.  Instead you can just use VMSTATE_INT32_V.

> +        VMSTATE_STRUCT_VARRAY_ALLOC_TEST(ccs_list_cache, sPAPRMachineState,
> +                                         version_ge_4, ccs_list_num, 1,
> +                                         vmstate_spapr_ccs_cache,
> +                                         sPAPRConfigureConnectorStateCache),
>          VMSTATE_END_OF_LIST()
>      },
>  };
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index f073258..9cfd559 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -70,6 +70,7 @@ static void spapr_ccs_add(sPAPRMachineState *spapr,
>  {
>      g_assert(!spapr_ccs_find(spapr, ccs->drc_index));
>      QTAILQ_INSERT_HEAD(&spapr->ccs_list, ccs, next);
> +    spapr->ccs_list_num++;
>  }
>  
>  static void spapr_ccs_remove(sPAPRMachineState *spapr,
> @@ -77,6 +78,7 @@ static void spapr_ccs_remove(sPAPRMachineState *spapr,
>  {
>      QTAILQ_REMOVE(&spapr->ccs_list, ccs, next);
>      g_free(ccs);
> +    spapr->ccs_list_num--;
>  }
>  
>  void spapr_ccs_reset_hook(void *opaque)
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 815d5ee..c8be926 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -11,6 +11,8 @@ struct VIOsPAPRBus;
>  struct sPAPRPHBState;
>  struct sPAPRNVRAM;
>  typedef struct sPAPRConfigureConnectorState sPAPRConfigureConnectorState;
> +typedef struct sPAPRConfigureConnectorStateCache
> +        sPAPRConfigureConnectorStateCache;
>  typedef struct sPAPREventLogEntry sPAPREventLogEntry;
>  
>  #define HPTE64_V_HPTE_DIRTY     0x0000000000000040ULL
> @@ -75,6 +77,9 @@ struct sPAPRMachineState {
>  
>      /* RTAS state */
>      QTAILQ_HEAD(, sPAPRConfigureConnectorState) ccs_list;
> +    /* Temporary cache for migration purposes */
> +    int32_t ccs_list_num;
> +    sPAPRConfigureConnectorStateCache *ccs_list_cache;
>  
>      /*< public >*/
>      char *kvm_type;
> @@ -589,6 +594,12 @@ struct sPAPRConfigureConnectorState {
>      QTAILQ_ENTRY(sPAPRConfigureConnectorState) next;
>  };
>  
> +struct sPAPRConfigureConnectorStateCache {
> +    uint32_t drc_index;
> +    int fdt_offset;
> +    int fdt_depth;
> +};
> +
>  void spapr_ccs_reset_hook(void *opaque);
>  
>  #define TYPE_SPAPR_RTC "spapr-rtc"
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index 1622638..7966979 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -549,9 +549,10 @@ extern const VMStateInfo vmstate_info_bitmap;
>      .offset     = offsetof(_state, _field),                          \
>  }
>  
> -#define VMSTATE_STRUCT_VARRAY_ALLOC(_field, _state, _field_num, _version, _vmsd, _type) {\
> +#define VMSTATE_STRUCT_VARRAY_ALLOC_TEST(_field, _state, _test, _field_num, _version, _vmsd, _type) { \
>      .name       = (stringify(_field)),                               \
>      .version_id = (_version),                                        \
> +    .field_exists = (_test),                                         \
>      .vmsd       = &(_vmsd),                                          \
>      .num_offset = vmstate_offset_value(_state, _field_num, int32_t), \
>      .size       = sizeof(_type),                                     \
> @@ -677,6 +678,11 @@ extern const VMStateInfo vmstate_info_bitmap;
>      VMSTATE_STRUCT_ARRAY_TEST(_field, _state, _num, NULL, _version,   \
>              _vmsd, _type)
>  
> +#define VMSTATE_STRUCT_VARRAY_ALLOC(_field, _state, _field_num, _version, \
> +                                    _vmsd, _type)                         \
> +    VMSTATE_STRUCT_VARRAY_ALLOC_TEST(_field, _state, NULL, _field_num,    \
> +            _version, _vmsd, _type)
> +
>  #define VMSTATE_BUFFER_UNSAFE_INFO(_field, _state, _version, _info, _size) \
>      VMSTATE_BUFFER_UNSAFE_INFO_TEST(_field, _state, NULL, _version, _info, \
>              _size)

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 1/5] spapr: ensure device trees are always associated with DRC
  2016-04-20  4:30   ` David Gibson
@ 2016-04-21 16:53     ` Jianjun Duan
  0 siblings, 0 replies; 17+ messages in thread
From: Jianjun Duan @ 2016-04-21 16:53 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, mdroth



On 04/19/2016 09:30 PM, David Gibson wrote:
> On Fri, Apr 15, 2016 at 01:33:01PM -0700, Jianjun Duan wrote:
>> There are possible racing situations involving hotplug events and
>> guest migration. For cases where a hotplug event is migrated, or
>> the guest is in the process of fetching device tree at the time of
>> migration, we need to ensure the device tree is created and
>> associated with the corresponding DRC for devices that were
>> hotplugged on the source, but 'coldplugged' on the target.
>>
>> Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
> This seems fairly sensible - should be harmless and a bit cleaner
> whether or not we strictly speaking need it.
>
> However, it is likely to conflict with some of the device tree
> construction cleanups I have in my pipeline, so it may well need
> rework for that.
Sure. It will need rework for CPU hotplug too.
>> ---
>>   hw/ppc/spapr.c     | 16 ++++++----------
>>   hw/ppc/spapr_pci.c | 12 +++++-------
>>   2 files changed, 11 insertions(+), 17 deletions(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index feaab08..af4745c 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -2132,15 +2132,6 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr, uint64_t size,
>>       int i, fdt_offset, fdt_size;
>>       void *fdt;
>>   
>> -    /*
>> -     * Check for DRC connectors and send hotplug notification to the
>> -     * guest only in case of hotplugged memory. This allows cold plugged
>> -     * memory to be specified at boot time.
>> -     */
>> -    if (!dev->hotplugged) {
>> -        return;
>> -    }
>> -
>>       for (i = 0; i < nr_lmbs; i++) {
>>           drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
>>                   addr/SPAPR_MEMORY_BLOCK_SIZE);
>> @@ -2154,7 +2145,12 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr, uint64_t size,
>>           drck->attach(drc, dev, fdt, fdt_offset, !dev->hotplugged, errp);
>>           addr += SPAPR_MEMORY_BLOCK_SIZE;
>>       }
>> -    spapr_hotplug_req_add_by_count(SPAPR_DR_CONNECTOR_TYPE_LMB, nr_lmbs);
>> +    /* send hotplug notification to the
>> +     * guest only in case of hotplugged memory
>> +     */
>> +    if (dev->hotplugged) {
>> +       spapr_hotplug_req_add_by_count(SPAPR_DR_CONNECTOR_TYPE_LMB, nr_lmbs);
>> +    }
>>   }
>>   
>>   static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>> index 8c20d34..b179e42 100644
>> --- a/hw/ppc/spapr_pci.c
>> +++ b/hw/ppc/spapr_pci.c
>> @@ -1092,13 +1092,11 @@ static void spapr_phb_add_pci_device(sPAPRDRConnector *drc,
>>           spapr_tce_set_need_vfio(tcet, true);
>>       }
>>   
>> -    if (dev->hotplugged) {
>> -        fdt = create_device_tree(&fdt_size);
>> -        fdt_start_offset = spapr_create_pci_child_dt(phb, pdev, fdt, 0);
>> -        if (!fdt_start_offset) {
>> -            error_setg(errp, "Failed to create pci child device tree node");
>> -            goto out;
>> -        }
>> +    fdt = create_device_tree(&fdt_size);
>> +    fdt_start_offset = spapr_create_pci_child_dt(phb, pdev, fdt, 0);
>> +    if (!fdt_start_offset) {
>> +        error_setg(errp, "Failed to create pci child device tree node");
>> +        goto out;
>>       }
>>   
>>       drck->attach(drc, DEVICE(pdev),

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 2/5] Migration: Defined VMStateDescription struct for spapr_drc
  2016-04-20  4:32   ` David Gibson
@ 2016-04-21 17:03     ` Jianjun Duan
  2016-04-22  4:25       ` David Gibson
  0 siblings, 1 reply; 17+ messages in thread
From: Jianjun Duan @ 2016-04-21 17:03 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-devel, qemu-ppc, mdroth



On 04/19/2016 09:32 PM, David Gibson wrote:
> On Fri, Apr 15, 2016 at 01:33:02PM -0700, Jianjun Duan wrote:
>> To manage hotplug/unplug of dynamic resources such as PCI cards,
>> memory, and CPU on sPAPR guests, a firmware abstraction known as
>> a Dynamic Resource Connector (DRC) is used to assign a particular
>> dynamic resource to the guest, and provide an interface for the
>> guest to manage configuration/removal of the resource associated
>> with it.
>>
>> To migrate the hotplugged resources in migration, the
>> associated DRC state need be migrated. To migrate the DRC state,
>> we defined the VMStateDescription struct for spapr_drc to enable
>> the transmission of spapr_drc state in migration.
>>
>> Not all the elements in the DRC state are migrated. Only those
>> ones modifiable by guest actions or device add/remove operation
>> are migrated.
>>
>> Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
> Urgh.  It would be really nice if we could avoid this and instead
> calculate these states from other information.  I hate migrating
> what's essentially transitory state if we can possibly avoid it - is
> there any way to defer or retry hotplug operations to make this
> unnececessary?
>
> Even if we have to migrate some state here, I'm a bit dubious about
> whether directly migrating the PAPR indicator states is the best way.
> It does have the advantage of having a spec, on the other hand the
> PAPR indicators are really weird and hard to understand the meaning
> of.
I don't think we can avoid this. I would think migrating the machine 
state is
actually a clean approach, as you said it does have PAPR spec.
>> ---
>>   hw/ppc/spapr_drc.c | 15 +++++++++++++++
>>   1 file changed, 15 insertions(+)
>>
>> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
>> index 3173940..5f7a25f 100644
>> --- a/hw/ppc/spapr_drc.c
>> +++ b/hw/ppc/spapr_drc.c
>> @@ -610,6 +610,20 @@ static void spapr_dr_connector_instance_init(Object *obj)
>>                           NULL, NULL, NULL, NULL);
>>   }
>>   
>> +static const VMStateDescription vmstate_spapr_drc = {
>> +    .name = "spapr_drc",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .fields  = (VMStateField []) {
>> +        VMSTATE_UINT32(isolation_state, sPAPRDRConnector),
>> +        VMSTATE_UINT32(allocation_state, sPAPRDRConnector),
>> +        VMSTATE_UINT32(indicator_state, sPAPRDRConnector),
>> +        VMSTATE_BOOL(configured, sPAPRDRConnector),
>> +        VMSTATE_BOOL(awaiting_release, sPAPRDRConnector),
>> +        VMSTATE_END_OF_LIST()
>> +    }
>> +};
>> +
>>   static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
>>   {
>>       DeviceClass *dk = DEVICE_CLASS(k);
>> @@ -618,6 +632,7 @@ static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
>>       dk->reset = reset;
>>       dk->realize = realize;
>>       dk->unrealize = unrealize;
>> +    dk->vmsd = &vmstate_spapr_drc;
>>       drck->set_isolation_state = set_isolation_state;
>>       drck->set_indicator_state = set_indicator_state;
>>       drck->set_allocation_state = set_allocation_state;

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 4/5] Migration: migrate ccs_list in spapr state
  2016-04-20  5:14   ` David Gibson
@ 2016-04-21 17:22     ` Jianjun Duan
  2016-04-22  4:28       ` David Gibson
  0 siblings, 1 reply; 17+ messages in thread
From: Jianjun Duan @ 2016-04-21 17:22 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-devel, qemu-ppc, mdroth

[-- Attachment #1: Type: text/plain, Size: 10559 bytes --]



On 04/19/2016 10:14 PM, David Gibson wrote:
> On Fri, Apr 15, 2016 at 01:33:04PM -0700, Jianjun Duan wrote:
>> ccs_list in spapr state maintains the device tree related
>> information on the rtas side for hotplugged devices. In racing
>> situations between hotplug events and migration operation, a rtas
>> hotplug event could be migrated from the source guest to target
>> guest, or the source guest could have not yet finished fetching
>> the device tree when migration is started, the target will try
>> to finish fetching the device tree. By migrating ccs_list, the
>> target can fetch the device tree properly.
>>
>> We tracked the size of ccs_list queue, and introduced a dynamic
>> cache for ccs_list to get around having to create VMSD for the
>> queue. There also existence tests in place for the newly added
>> fields in the spapr state VMSD to make sure forward migration
>> is not broken.
>>
>> Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
> So there's problems here at a couple of levels.
>
> 1. Even more so that the indicators, this is transitional state, which
> it would be really nice if we can avoid migrating.  At the very least
> the new state should go into a subsection with an is_needed function
> which will skip it if we don't have a configure connector in progress.
> That means we'll be able to backwards migrate as long as we're not in
> the middle of a hotplug, which won't be possible with this version.
>
> Again, if there's some way we can defer or retry the operation instead
> of migrating the interim state, that would be even better.
I am not sure why transitional state should not be migrated. I think the
fact that it changes is the reason why it should be migrated. As for
backward migration, I thought about it, but decided to leave it later
to beat the time. We can do it later, or do it this time and delayed the
patches more. I agree that subsection seems to be the solution here.

>
> 2. Having to copy the list of elements out into an array just for
> migration is pretty horrible.  I'm almost include to suggest we should
> add list migration into the savevm core rather than this.
I thought about a general approach of adding something to savevm to
handle the queue. But the queue used here uses macro and doesn't really
support polymorphism.  And we need to use  QTAILQ_FOREACH(var, head, field)
to access it. It is not easy as we may need to modify the macro 
definition to carry
type name of the elements of the queue.

Also I am following Alexey's code here ([PATCH qemu v15 07/17] 
spapr_iommu: Migrate full state).
It was reviewed by you.
>> ---
>>   hw/ppc/spapr.c              | 60 ++++++++++++++++++++++++++++++++++++++++++++-
>>   hw/ppc/spapr_rtas.c         |  2 ++
>>   include/hw/ppc/spapr.h      | 11 +++++++++
>>   include/migration/vmstate.h |  8 +++++-
>>   4 files changed, 79 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index af4745c..eab95f0 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -1245,10 +1245,31 @@ static bool spapr_vga_init(PCIBus *pci_bus, Error **errp)
>>       }
>>   }
>>   
>> +static void spapr_pre_save(void *opaque)
>> +{
>> +    sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
>> +    sPAPRConfigureConnectorState *ccs;
>> +    sPAPRConfigureConnectorStateCache *ccs_cache;
>> +
>> +    /* Copy ccs_list to ccs_list_cache */
>> +    spapr->ccs_list_cache = g_new0(sPAPRConfigureConnectorStateCache,
>> +                                   spapr->ccs_list_num);
>> +    ccs_cache = spapr->ccs_list_cache;
>> +    QTAILQ_FOREACH(ccs, &spapr->ccs_list, next) {
>> +        ccs_cache->drc_index = ccs->drc_index;
>> +        ccs_cache->fdt_offset = ccs->fdt_offset;
>> +        ccs_cache->fdt_depth = ccs->fdt_depth;
>> +        ccs_cache++;
>> +    }
>> +}
>> +
>>   static int spapr_post_load(void *opaque, int version_id)
>>   {
>>       sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
>>       int err = 0;
>> +    sPAPRConfigureConnectorState *ccs;
>> +    sPAPRConfigureConnectorStateCache *ccs_cache = spapr->ccs_list_cache;
>> +    int index = 0;
>>   
>>       /* In earlier versions, there was no separate qdev for the PAPR
>>        * RTC, so the RTC offset was stored directly in sPAPREnvironment.
>> @@ -1258,6 +1279,19 @@ static int spapr_post_load(void *opaque, int version_id)
>>           err = spapr_rtc_import_offset(spapr->rtc, spapr->rtc_offset);
>>       }
>>   
>> +    if (version_id < 4) {
>> +        return err;
>> +    }
>> +    /* Copy ccs_list_cache to ccs_list */
>> +    for (index = 0; index < spapr->ccs_list_num; index ++) {
>> +        ccs = g_new0(sPAPRConfigureConnectorState, 1);
>> +        ccs->drc_index = (ccs_cache + index)->drc_index;
>> +        ccs->fdt_offset = (ccs_cache + index)->fdt_offset;
>> +        ccs->fdt_depth = (ccs_cache + index)->fdt_depth;
>> +        QTAILQ_INSERT_TAIL(&spapr->ccs_list, ccs, next);
>> +    }
>> +    g_free(spapr->ccs_list_cache);
>> +
>>       return err;
>>   }
>>   
>> @@ -1266,10 +1300,28 @@ static bool version_before_3(void *opaque, int version_id)
>>       return version_id < 3;
>>   }
>>   
>> +static bool version_ge_4(void *opaque, int version_id)
>> +{
>> +    return version_id >= 4;
>> +}
>> +
>> +static const VMStateDescription vmstate_spapr_ccs_cache = {
>> +    .name = "spaprconfigureconnectorstate",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .fields = (VMStateField[]) {
>> +        VMSTATE_UINT32(drc_index, sPAPRConfigureConnectorStateCache),
>> +        VMSTATE_INT32(fdt_offset, sPAPRConfigureConnectorStateCache),
>> +        VMSTATE_INT32(fdt_depth, sPAPRConfigureConnectorStateCache),
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +};
>> +
>>   static const VMStateDescription vmstate_spapr = {
>>       .name = "spapr",
>> -    .version_id = 3,
>> +    .version_id = 4,
>>       .minimum_version_id = 1,
>> +    .pre_save = spapr_pre_save,
>>       .post_load = spapr_post_load,
>>       .fields = (VMStateField[]) {
>>           /* used to be @next_irq */
>> @@ -1279,6 +1331,12 @@ static const VMStateDescription vmstate_spapr = {
>>           VMSTATE_UINT64_TEST(rtc_offset, sPAPRMachineState, version_before_3),
>>   
>>           VMSTATE_PPC_TIMEBASE_V(tb, sPAPRMachineState, 2),
>> +        /* RTAS state */
>> +        VMSTATE_INT32_TEST(ccs_list_num, sPAPRMachineState, version_ge_4),
> You don't generally need to write your own version test functions for specific-version.  Instead you can just use VMSTATE_INT32_V.
I agree. I realized that after the code was posted here.
>> +        VMSTATE_STRUCT_VARRAY_ALLOC_TEST(ccs_list_cache, sPAPRMachineState,
>> +                                         version_ge_4, ccs_list_num, 1,
>> +                                         vmstate_spapr_ccs_cache,
>> +                                         sPAPRConfigureConnectorStateCache),
>>           VMSTATE_END_OF_LIST()
>>       },
>>   };
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index f073258..9cfd559 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -70,6 +70,7 @@ static void spapr_ccs_add(sPAPRMachineState *spapr,
>>   {
>>       g_assert(!spapr_ccs_find(spapr, ccs->drc_index));
>>       QTAILQ_INSERT_HEAD(&spapr->ccs_list, ccs, next);
>> +    spapr->ccs_list_num++;
>>   }
>>   
>>   static void spapr_ccs_remove(sPAPRMachineState *spapr,
>> @@ -77,6 +78,7 @@ static void spapr_ccs_remove(sPAPRMachineState *spapr,
>>   {
>>       QTAILQ_REMOVE(&spapr->ccs_list, ccs, next);
>>       g_free(ccs);
>> +    spapr->ccs_list_num--;
>>   }
>>   
>>   void spapr_ccs_reset_hook(void *opaque)
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 815d5ee..c8be926 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -11,6 +11,8 @@ struct VIOsPAPRBus;
>>   struct sPAPRPHBState;
>>   struct sPAPRNVRAM;
>>   typedef struct sPAPRConfigureConnectorState sPAPRConfigureConnectorState;
>> +typedef struct sPAPRConfigureConnectorStateCache
>> +        sPAPRConfigureConnectorStateCache;
>>   typedef struct sPAPREventLogEntry sPAPREventLogEntry;
>>   
>>   #define HPTE64_V_HPTE_DIRTY     0x0000000000000040ULL
>> @@ -75,6 +77,9 @@ struct sPAPRMachineState {
>>   
>>       /* RTAS state */
>>       QTAILQ_HEAD(, sPAPRConfigureConnectorState) ccs_list;
>> +    /* Temporary cache for migration purposes */
>> +    int32_t ccs_list_num;
>> +    sPAPRConfigureConnectorStateCache *ccs_list_cache;
>>   
>>       /*< public >*/
>>       char *kvm_type;
>> @@ -589,6 +594,12 @@ struct sPAPRConfigureConnectorState {
>>       QTAILQ_ENTRY(sPAPRConfigureConnectorState) next;
>>   };
>>   
>> +struct sPAPRConfigureConnectorStateCache {
>> +    uint32_t drc_index;
>> +    int fdt_offset;
>> +    int fdt_depth;
>> +};
>> +
>>   void spapr_ccs_reset_hook(void *opaque);
>>   
>>   #define TYPE_SPAPR_RTC "spapr-rtc"
>> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
>> index 1622638..7966979 100644
>> --- a/include/migration/vmstate.h
>> +++ b/include/migration/vmstate.h
>> @@ -549,9 +549,10 @@ extern const VMStateInfo vmstate_info_bitmap;
>>       .offset     = offsetof(_state, _field),                          \
>>   }
>>   
>> -#define VMSTATE_STRUCT_VARRAY_ALLOC(_field, _state, _field_num, _version, _vmsd, _type) {\
>> +#define VMSTATE_STRUCT_VARRAY_ALLOC_TEST(_field, _state, _test, _field_num, _version, _vmsd, _type) { \
>>       .name       = (stringify(_field)),                               \
>>       .version_id = (_version),                                        \
>> +    .field_exists = (_test),                                         \
>>       .vmsd       = &(_vmsd),                                          \
>>       .num_offset = vmstate_offset_value(_state, _field_num, int32_t), \
>>       .size       = sizeof(_type),                                     \
>> @@ -677,6 +678,11 @@ extern const VMStateInfo vmstate_info_bitmap;
>>       VMSTATE_STRUCT_ARRAY_TEST(_field, _state, _num, NULL, _version,   \
>>               _vmsd, _type)
>>   
>> +#define VMSTATE_STRUCT_VARRAY_ALLOC(_field, _state, _field_num, _version, \
>> +                                    _vmsd, _type)                         \
>> +    VMSTATE_STRUCT_VARRAY_ALLOC_TEST(_field, _state, NULL, _field_num,    \
>> +            _version, _vmsd, _type)
>> +
>>   #define VMSTATE_BUFFER_UNSAFE_INFO(_field, _state, _version, _info, _size) \
>>       VMSTATE_BUFFER_UNSAFE_INFO_TEST(_field, _state, NULL, _version, _info, \
>>               _size)


[-- Attachment #2: Type: text/html, Size: 11451 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 2/5] Migration: Defined VMStateDescription struct for spapr_drc
  2016-04-21 17:03     ` Jianjun Duan
@ 2016-04-22  4:25       ` David Gibson
  2016-04-22 16:47         ` [Qemu-devel] [Qemu-ppc] " Jianjun Duan
  0 siblings, 1 reply; 17+ messages in thread
From: David Gibson @ 2016-04-22  4:25 UTC (permalink / raw)
  To: Jianjun Duan; +Cc: qemu-devel, qemu-ppc, mdroth

[-- Attachment #1: Type: text/plain, Size: 3834 bytes --]

On Thu, Apr 21, 2016 at 10:03:56AM -0700, Jianjun Duan wrote:
> 
> 
> On 04/19/2016 09:32 PM, David Gibson wrote:
> >On Fri, Apr 15, 2016 at 01:33:02PM -0700, Jianjun Duan wrote:
> >>To manage hotplug/unplug of dynamic resources such as PCI cards,
> >>memory, and CPU on sPAPR guests, a firmware abstraction known as
> >>a Dynamic Resource Connector (DRC) is used to assign a particular
> >>dynamic resource to the guest, and provide an interface for the
> >>guest to manage configuration/removal of the resource associated
> >>with it.
> >>
> >>To migrate the hotplugged resources in migration, the
> >>associated DRC state need be migrated. To migrate the DRC state,
> >>we defined the VMStateDescription struct for spapr_drc to enable
> >>the transmission of spapr_drc state in migration.
> >>
> >>Not all the elements in the DRC state are migrated. Only those
> >>ones modifiable by guest actions or device add/remove operation
> >>are migrated.
> >>
> >>Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
> >Urgh.  It would be really nice if we could avoid this and instead
> >calculate these states from other information.  I hate migrating
> >what's essentially transitory state if we can possibly avoid it - is
> >there any way to defer or retry hotplug operations to make this
> >unnececessary?
> >
> >Even if we have to migrate some state here, I'm a bit dubious about
> >whether directly migrating the PAPR indicator states is the best way.
> >It does have the advantage of having a spec, on the other hand the
> >PAPR indicators are really weird and hard to understand the meaning
> >of.
> I don't think we can avoid this. I would think migrating the machine state
> is
> actually a clean approach, as you said it does have PAPR spec.

Hm, ok.  The next question, though, is what's the absolute minimum of
state we can transfer.  The indicator states are described by PAPR,
but the configured and awaiting_release states are internal.  Not
thinking carefully about the data model of what we migrate is a recipe
for future migration problems.

> >>---
> >>  hw/ppc/spapr_drc.c | 15 +++++++++++++++
> >>  1 file changed, 15 insertions(+)
> >>
> >>diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> >>index 3173940..5f7a25f 100644
> >>--- a/hw/ppc/spapr_drc.c
> >>+++ b/hw/ppc/spapr_drc.c
> >>@@ -610,6 +610,20 @@ static void spapr_dr_connector_instance_init(Object *obj)
> >>                          NULL, NULL, NULL, NULL);
> >>  }
> >>+static const VMStateDescription vmstate_spapr_drc = {
> >>+    .name = "spapr_drc",
> >>+    .version_id = 1,
> >>+    .minimum_version_id = 1,
> >>+    .fields  = (VMStateField []) {
> >>+        VMSTATE_UINT32(isolation_state, sPAPRDRConnector),
> >>+        VMSTATE_UINT32(allocation_state, sPAPRDRConnector),
> >>+        VMSTATE_UINT32(indicator_state, sPAPRDRConnector),
> >>+        VMSTATE_BOOL(configured, sPAPRDRConnector),
> >>+        VMSTATE_BOOL(awaiting_release, sPAPRDRConnector),
> >>+        VMSTATE_END_OF_LIST()
> >>+    }
> >>+};
> >>+
> >>  static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
> >>  {
> >>      DeviceClass *dk = DEVICE_CLASS(k);
> >>@@ -618,6 +632,7 @@ static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
> >>      dk->reset = reset;
> >>      dk->realize = realize;
> >>      dk->unrealize = unrealize;
> >>+    dk->vmsd = &vmstate_spapr_drc;
> >>      drck->set_isolation_state = set_isolation_state;
> >>      drck->set_indicator_state = set_indicator_state;
> >>      drck->set_allocation_state = set_allocation_state;
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 4/5] Migration: migrate ccs_list in spapr state
  2016-04-21 17:22     ` Jianjun Duan
@ 2016-04-22  4:28       ` David Gibson
  2016-04-22 16:55         ` Jianjun Duan
  0 siblings, 1 reply; 17+ messages in thread
From: David Gibson @ 2016-04-22  4:28 UTC (permalink / raw)
  To: Jianjun Duan; +Cc: qemu-devel, qemu-ppc, mdroth

[-- Attachment #1: Type: text/plain, Size: 11711 bytes --]

On Thu, Apr 21, 2016 at 10:22:10AM -0700, Jianjun Duan wrote:
> 
> 
> On 04/19/2016 10:14 PM, David Gibson wrote:
> >On Fri, Apr 15, 2016 at 01:33:04PM -0700, Jianjun Duan wrote:
> >>ccs_list in spapr state maintains the device tree related
> >>information on the rtas side for hotplugged devices. In racing
> >>situations between hotplug events and migration operation, a rtas
> >>hotplug event could be migrated from the source guest to target
> >>guest, or the source guest could have not yet finished fetching
> >>the device tree when migration is started, the target will try
> >>to finish fetching the device tree. By migrating ccs_list, the
> >>target can fetch the device tree properly.
> >>
> >>We tracked the size of ccs_list queue, and introduced a dynamic
> >>cache for ccs_list to get around having to create VMSD for the
> >>queue. There also existence tests in place for the newly added
> >>fields in the spapr state VMSD to make sure forward migration
> >>is not broken.
> >>
> >>Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
> >So there's problems here at a couple of levels.
> >
> >1. Even more so that the indicators, this is transitional state, which
> >it would be really nice if we can avoid migrating.  At the very least
> >the new state should go into a subsection with an is_needed function
> >which will skip it if we don't have a configure connector in progress.
> >That means we'll be able to backwards migrate as long as we're not in
> >the middle of a hotplug, which won't be possible with this version.
> >
> >Again, if there's some way we can defer or retry the operation instead
> >of migrating the interim state, that would be even better.
> I am not sure why transitional state should not be migrated.

Because every extra piece of state to migrate is something which can
go wrong.  Getting migration working properly is a difficult and
fragile process as it is - every extra bit of state we add makes it
harder.

> I think the
> fact that it changes is the reason why it should be migrated. As for
> backward migration, I thought about it, but decided to leave it later
> to beat the time. We can do it later, or do it this time and delayed the
> patches more. I agree that subsection seems to be the solution here.

Leaving backwards migration until later - after you've already
introduced a new stream version - will make implementing it much, much
more difficult.

> >2. Having to copy the list of elements out into an array just for
> >migration is pretty horrible.  I'm almost include to suggest we should
> >add list migration into the savevm core rather than this.
> I thought about a general approach of adding something to savevm to
> handle the queue. But the queue used here uses macro and doesn't really
> support polymorphism.  And we need to use  QTAILQ_FOREACH(var, head, field)
> to access it. It is not easy as we may need to modify the macro definition
> to carry
> type name of the elements of the queue.
> 
> Also I am following Alexey's code here ([PATCH qemu v15 07/17] spapr_iommu:
> Migrate full state).
> It was reviewed by you.

Yeah.  It's not incorrect, but it's ugly and every extra time I see
it, the impetus to find a better way increases.

> >>---
> >>  hw/ppc/spapr.c              | 60 ++++++++++++++++++++++++++++++++++++++++++++-
> >>  hw/ppc/spapr_rtas.c         |  2 ++
> >>  include/hw/ppc/spapr.h      | 11 +++++++++
> >>  include/migration/vmstate.h |  8 +++++-
> >>  4 files changed, 79 insertions(+), 2 deletions(-)
> >>
> >>diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >>index af4745c..eab95f0 100644
> >>--- a/hw/ppc/spapr.c
> >>+++ b/hw/ppc/spapr.c
> >>@@ -1245,10 +1245,31 @@ static bool spapr_vga_init(PCIBus *pci_bus, Error **errp)
> >>      }
> >>  }
> >>+static void spapr_pre_save(void *opaque)
> >>+{
> >>+    sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
> >>+    sPAPRConfigureConnectorState *ccs;
> >>+    sPAPRConfigureConnectorStateCache *ccs_cache;
> >>+
> >>+    /* Copy ccs_list to ccs_list_cache */
> >>+    spapr->ccs_list_cache = g_new0(sPAPRConfigureConnectorStateCache,
> >>+                                   spapr->ccs_list_num);
> >>+    ccs_cache = spapr->ccs_list_cache;
> >>+    QTAILQ_FOREACH(ccs, &spapr->ccs_list, next) {
> >>+        ccs_cache->drc_index = ccs->drc_index;
> >>+        ccs_cache->fdt_offset = ccs->fdt_offset;
> >>+        ccs_cache->fdt_depth = ccs->fdt_depth;
> >>+        ccs_cache++;
> >>+    }
> >>+}
> >>+
> >>  static int spapr_post_load(void *opaque, int version_id)
> >>  {
> >>      sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
> >>      int err = 0;
> >>+    sPAPRConfigureConnectorState *ccs;
> >>+    sPAPRConfigureConnectorStateCache *ccs_cache = spapr->ccs_list_cache;
> >>+    int index = 0;
> >>      /* In earlier versions, there was no separate qdev for the PAPR
> >>       * RTC, so the RTC offset was stored directly in sPAPREnvironment.
> >>@@ -1258,6 +1279,19 @@ static int spapr_post_load(void *opaque, int version_id)
> >>          err = spapr_rtc_import_offset(spapr->rtc, spapr->rtc_offset);
> >>      }
> >>+    if (version_id < 4) {
> >>+        return err;
> >>+    }
> >>+    /* Copy ccs_list_cache to ccs_list */
> >>+    for (index = 0; index < spapr->ccs_list_num; index ++) {
> >>+        ccs = g_new0(sPAPRConfigureConnectorState, 1);
> >>+        ccs->drc_index = (ccs_cache + index)->drc_index;
> >>+        ccs->fdt_offset = (ccs_cache + index)->fdt_offset;
> >>+        ccs->fdt_depth = (ccs_cache + index)->fdt_depth;
> >>+        QTAILQ_INSERT_TAIL(&spapr->ccs_list, ccs, next);
> >>+    }
> >>+    g_free(spapr->ccs_list_cache);
> >>+
> >>      return err;
> >>  }
> >>@@ -1266,10 +1300,28 @@ static bool version_before_3(void *opaque, int version_id)
> >>      return version_id < 3;
> >>  }
> >>+static bool version_ge_4(void *opaque, int version_id)
> >>+{
> >>+    return version_id >= 4;
> >>+}
> >>+
> >>+static const VMStateDescription vmstate_spapr_ccs_cache = {
> >>+    .name = "spaprconfigureconnectorstate",
> >>+    .version_id = 1,
> >>+    .minimum_version_id = 1,
> >>+    .fields = (VMStateField[]) {
> >>+        VMSTATE_UINT32(drc_index, sPAPRConfigureConnectorStateCache),
> >>+        VMSTATE_INT32(fdt_offset, sPAPRConfigureConnectorStateCache),
> >>+        VMSTATE_INT32(fdt_depth, sPAPRConfigureConnectorStateCache),
> >>+        VMSTATE_END_OF_LIST()
> >>+    },
> >>+};
> >>+
> >>  static const VMStateDescription vmstate_spapr = {
> >>      .name = "spapr",
> >>-    .version_id = 3,
> >>+    .version_id = 4,
> >>      .minimum_version_id = 1,
> >>+    .pre_save = spapr_pre_save,
> >>      .post_load = spapr_post_load,
> >>      .fields = (VMStateField[]) {
> >>          /* used to be @next_irq */
> >>@@ -1279,6 +1331,12 @@ static const VMStateDescription vmstate_spapr = {
> >>          VMSTATE_UINT64_TEST(rtc_offset, sPAPRMachineState, version_before_3),
> >>          VMSTATE_PPC_TIMEBASE_V(tb, sPAPRMachineState, 2),
> >>+        /* RTAS state */
> >>+        VMSTATE_INT32_TEST(ccs_list_num, sPAPRMachineState, version_ge_4),
> >You don't generally need to write your own version test functions for specific-version.  Instead you can just use VMSTATE_INT32_V.
> I agree. I realized that after the code was posted here.

Ok.

> >>+        VMSTATE_STRUCT_VARRAY_ALLOC_TEST(ccs_list_cache, sPAPRMachineState,
> >>+                                         version_ge_4, ccs_list_num, 1,
> >>+                                         vmstate_spapr_ccs_cache,
> >>+                                         sPAPRConfigureConnectorStateCache),
> >>          VMSTATE_END_OF_LIST()
> >>      },
> >>  };
> >>diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> >>index f073258..9cfd559 100644
> >>--- a/hw/ppc/spapr_rtas.c
> >>+++ b/hw/ppc/spapr_rtas.c
> >>@@ -70,6 +70,7 @@ static void spapr_ccs_add(sPAPRMachineState *spapr,
> >>  {
> >>      g_assert(!spapr_ccs_find(spapr, ccs->drc_index));
> >>      QTAILQ_INSERT_HEAD(&spapr->ccs_list, ccs, next);
> >>+    spapr->ccs_list_num++;
> >>  }
> >>  static void spapr_ccs_remove(sPAPRMachineState *spapr,
> >>@@ -77,6 +78,7 @@ static void spapr_ccs_remove(sPAPRMachineState *spapr,
> >>  {
> >>      QTAILQ_REMOVE(&spapr->ccs_list, ccs, next);
> >>      g_free(ccs);
> >>+    spapr->ccs_list_num--;
> >>  }
> >>  void spapr_ccs_reset_hook(void *opaque)
> >>diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> >>index 815d5ee..c8be926 100644
> >>--- a/include/hw/ppc/spapr.h
> >>+++ b/include/hw/ppc/spapr.h
> >>@@ -11,6 +11,8 @@ struct VIOsPAPRBus;
> >>  struct sPAPRPHBState;
> >>  struct sPAPRNVRAM;
> >>  typedef struct sPAPRConfigureConnectorState sPAPRConfigureConnectorState;
> >>+typedef struct sPAPRConfigureConnectorStateCache
> >>+        sPAPRConfigureConnectorStateCache;
> >>  typedef struct sPAPREventLogEntry sPAPREventLogEntry;
> >>  #define HPTE64_V_HPTE_DIRTY     0x0000000000000040ULL
> >>@@ -75,6 +77,9 @@ struct sPAPRMachineState {
> >>      /* RTAS state */
> >>      QTAILQ_HEAD(, sPAPRConfigureConnectorState) ccs_list;
> >>+    /* Temporary cache for migration purposes */
> >>+    int32_t ccs_list_num;
> >>+    sPAPRConfigureConnectorStateCache *ccs_list_cache;
> >>      /*< public >*/
> >>      char *kvm_type;
> >>@@ -589,6 +594,12 @@ struct sPAPRConfigureConnectorState {
> >>      QTAILQ_ENTRY(sPAPRConfigureConnectorState) next;
> >>  };
> >>+struct sPAPRConfigureConnectorStateCache {
> >>+    uint32_t drc_index;
> >>+    int fdt_offset;
> >>+    int fdt_depth;
> >>+};
> >>+
> >>  void spapr_ccs_reset_hook(void *opaque);
> >>  #define TYPE_SPAPR_RTC "spapr-rtc"
> >>diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> >>index 1622638..7966979 100644
> >>--- a/include/migration/vmstate.h
> >>+++ b/include/migration/vmstate.h
> >>@@ -549,9 +549,10 @@ extern const VMStateInfo vmstate_info_bitmap;
> >>      .offset     = offsetof(_state, _field),                          \
> >>  }
> >>-#define VMSTATE_STRUCT_VARRAY_ALLOC(_field, _state, _field_num, _version, _vmsd, _type) {\
> >>+#define VMSTATE_STRUCT_VARRAY_ALLOC_TEST(_field, _state, _test, _field_num, _version, _vmsd, _type) { \
> >>      .name       = (stringify(_field)),                               \
> >>      .version_id = (_version),                                        \
> >>+    .field_exists = (_test),                                         \
> >>      .vmsd       = &(_vmsd),                                          \
> >>      .num_offset = vmstate_offset_value(_state, _field_num, int32_t), \
> >>      .size       = sizeof(_type),                                     \
> >>@@ -677,6 +678,11 @@ extern const VMStateInfo vmstate_info_bitmap;
> >>      VMSTATE_STRUCT_ARRAY_TEST(_field, _state, _num, NULL, _version,   \
> >>              _vmsd, _type)
> >>+#define VMSTATE_STRUCT_VARRAY_ALLOC(_field, _state, _field_num, _version, \
> >>+                                    _vmsd, _type)                         \
> >>+    VMSTATE_STRUCT_VARRAY_ALLOC_TEST(_field, _state, NULL, _field_num,    \
> >>+            _version, _vmsd, _type)
> >>+
> >>  #define VMSTATE_BUFFER_UNSAFE_INFO(_field, _state, _version, _info, _size) \
> >>      VMSTATE_BUFFER_UNSAFE_INFO_TEST(_field, _state, NULL, _version, _info, \
> >>              _size)
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/5] Migration: Defined VMStateDescription struct for spapr_drc
  2016-04-22  4:25       ` David Gibson
@ 2016-04-22 16:47         ` Jianjun Duan
  0 siblings, 0 replies; 17+ messages in thread
From: Jianjun Duan @ 2016-04-22 16:47 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, mdroth



On 04/21/2016 09:25 PM, David Gibson wrote:
> On Thu, Apr 21, 2016 at 10:03:56AM -0700, Jianjun Duan wrote:
>>
>>
>> On 04/19/2016 09:32 PM, David Gibson wrote:
>>> On Fri, Apr 15, 2016 at 01:33:02PM -0700, Jianjun Duan wrote:
>>>> To manage hotplug/unplug of dynamic resources such as PCI cards,
>>>> memory, and CPU on sPAPR guests, a firmware abstraction known as
>>>> a Dynamic Resource Connector (DRC) is used to assign a particular
>>>> dynamic resource to the guest, and provide an interface for the
>>>> guest to manage configuration/removal of the resource associated
>>>> with it.
>>>>
>>>> To migrate the hotplugged resources in migration, the
>>>> associated DRC state need be migrated. To migrate the DRC state,
>>>> we defined the VMStateDescription struct for spapr_drc to enable
>>>> the transmission of spapr_drc state in migration.
>>>>
>>>> Not all the elements in the DRC state are migrated. Only those
>>>> ones modifiable by guest actions or device add/remove operation
>>>> are migrated.
>>>>
>>>> Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
>>> Urgh.  It would be really nice if we could avoid this and instead
>>> calculate these states from other information.  I hate migrating
>>> what's essentially transitory state if we can possibly avoid it - is
>>> there any way to defer or retry hotplug operations to make this
>>> unnececessary?
>>>
>>> Even if we have to migrate some state here, I'm a bit dubious about
>>> whether directly migrating the PAPR indicator states is the best way.
>>> It does have the advantage of having a spec, on the other hand the
>>> PAPR indicators are really weird and hard to understand the meaning
>>> of.
>> I don't think we can avoid this. I would think migrating the machine state
>> is
>> actually a clean approach, as you said it does have PAPR spec.
> 
> Hm, ok.  The next question, though, is what's the absolute minimum of
> state we can transfer.  The indicator states are described by PAPR,
> but the configured and awaiting_release states are internal.  Not
> thinking carefully about the data model of what we migrate is a recipe
> for future migration problems.
> 
>From the perspective of device hotplugging, if we hotplug a device on
the source, we need to "coldplug" it on the target. The states across
two hosts for the same device are not the same. Ideally we want the
states be same after migration so that the device would function as
hotplugged on the target. For example we can unplug it. DRC is the
mechanism used to implement hotplugging, its state need be transferred.
The minimum state we need to transfer should cover all the pieces
changed by hotplugging.
>>>> ---
>>>>  hw/ppc/spapr_drc.c | 15 +++++++++++++++
>>>>  1 file changed, 15 insertions(+)
>>>>
>>>> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
>>>> index 3173940..5f7a25f 100644
>>>> --- a/hw/ppc/spapr_drc.c
>>>> +++ b/hw/ppc/spapr_drc.c
>>>> @@ -610,6 +610,20 @@ static void spapr_dr_connector_instance_init(Object *obj)
>>>>                          NULL, NULL, NULL, NULL);
>>>>  }
>>>> +static const VMStateDescription vmstate_spapr_drc = {
>>>> +    .name = "spapr_drc",
>>>> +    .version_id = 1,
>>>> +    .minimum_version_id = 1,
>>>> +    .fields  = (VMStateField []) {
>>>> +        VMSTATE_UINT32(isolation_state, sPAPRDRConnector),
>>>> +        VMSTATE_UINT32(allocation_state, sPAPRDRConnector),
>>>> +        VMSTATE_UINT32(indicator_state, sPAPRDRConnector),
>>>> +        VMSTATE_BOOL(configured, sPAPRDRConnector),
>>>> +        VMSTATE_BOOL(awaiting_release, sPAPRDRConnector),
>>>> +        VMSTATE_END_OF_LIST()
>>>> +    }
>>>> +};
>>>> +
>>>>  static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
>>>>  {
>>>>      DeviceClass *dk = DEVICE_CLASS(k);
>>>> @@ -618,6 +632,7 @@ static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
>>>>      dk->reset = reset;
>>>>      dk->realize = realize;
>>>>      dk->unrealize = unrealize;
>>>> +    dk->vmsd = &vmstate_spapr_drc;
>>>>      drck->set_isolation_state = set_isolation_state;
>>>>      drck->set_indicator_state = set_indicator_state;
>>>>      drck->set_allocation_state = set_allocation_state;
>>
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 4/5] Migration: migrate ccs_list in spapr state
  2016-04-22  4:28       ` David Gibson
@ 2016-04-22 16:55         ` Jianjun Duan
  0 siblings, 0 replies; 17+ messages in thread
From: Jianjun Duan @ 2016-04-22 16:55 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, mdroth



On 04/21/2016 09:28 PM, David Gibson wrote:
> On Thu, Apr 21, 2016 at 10:22:10AM -0700, Jianjun Duan wrote:
>>
>>
>> On 04/19/2016 10:14 PM, David Gibson wrote:
>>> On Fri, Apr 15, 2016 at 01:33:04PM -0700, Jianjun Duan wrote:
>>>> ccs_list in spapr state maintains the device tree related
>>>> information on the rtas side for hotplugged devices. In racing
>>>> situations between hotplug events and migration operation, a rtas
>>>> hotplug event could be migrated from the source guest to target
>>>> guest, or the source guest could have not yet finished fetching
>>>> the device tree when migration is started, the target will try
>>>> to finish fetching the device tree. By migrating ccs_list, the
>>>> target can fetch the device tree properly.
>>>>
>>>> We tracked the size of ccs_list queue, and introduced a dynamic
>>>> cache for ccs_list to get around having to create VMSD for the
>>>> queue. There also existence tests in place for the newly added
>>>> fields in the spapr state VMSD to make sure forward migration
>>>> is not broken.
>>>>
>>>> Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
>>> So there's problems here at a couple of levels.
>>>
>>> 1. Even more so that the indicators, this is transitional state, which
>>> it would be really nice if we can avoid migrating.  At the very least
>>> the new state should go into a subsection with an is_needed function
>>> which will skip it if we don't have a configure connector in progress.
>>> That means we'll be able to backwards migrate as long as we're not in
>>> the middle of a hotplug, which won't be possible with this version.
>>>
>>> Again, if there's some way we can defer or retry the operation instead
>>> of migrating the interim state, that would be even better.
>> I am not sure why transitional state should not be migrated.
> 
> Because every extra piece of state to migrate is something which can
> go wrong.  Getting migration working properly is a difficult and
> fragile process as it is - every extra bit of state we add makes it
> harder.
> 
Migrating this and pending_events does fix the racing. The alternative
such as delay or throttling involves timing issues, which IMHO are even
trickier to get it right.
>> I think the
>> fact that it changes is the reason why it should be migrated. As for
>> backward migration, I thought about it, but decided to leave it later
>> to beat the time. We can do it later, or do it this time and delayed the
>> patches more. I agree that subsection seems to be the solution here.
> 
> Leaving backwards migration until later - after you've already
> introduced a new stream version - will make implementing it much, much
> more difficult.
> 
I will use subsection to fix the backward migration.
>>> 2. Having to copy the list of elements out into an array just for
>>> migration is pretty horrible.  I'm almost include to suggest we should
>>> add list migration into the savevm core rather than this.
>> I thought about a general approach of adding something to savevm to
>> handle the queue. But the queue used here uses macro and doesn't really
>> support polymorphism.  And we need to use  QTAILQ_FOREACH(var, head, field)
>> to access it. It is not easy as we may need to modify the macro definition
>> to carry
>> type name of the elements of the queue.
>>
>> Also I am following Alexey's code here ([PATCH qemu v15 07/17] spapr_iommu:
>> Migrate full state).
>> It was reviewed by you.
> 
> Yeah.  It's not incorrect, but it's ugly and every extra time I see
> it, the impetus to find a better way increases.
> 
I agree it is not elegant. I will look into it to see if I can create
something similar to QTAILQ_FOREACH to get around the type issue.
>>>> ---
>>>>  hw/ppc/spapr.c              | 60 ++++++++++++++++++++++++++++++++++++++++++++-
>>>>  hw/ppc/spapr_rtas.c         |  2 ++
>>>>  include/hw/ppc/spapr.h      | 11 +++++++++
>>>>  include/migration/vmstate.h |  8 +++++-
>>>>  4 files changed, 79 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>> index af4745c..eab95f0 100644
>>>> --- a/hw/ppc/spapr.c
>>>> +++ b/hw/ppc/spapr.c
>>>> @@ -1245,10 +1245,31 @@ static bool spapr_vga_init(PCIBus *pci_bus, Error **errp)
>>>>      }
>>>>  }
>>>> +static void spapr_pre_save(void *opaque)
>>>> +{
>>>> +    sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
>>>> +    sPAPRConfigureConnectorState *ccs;
>>>> +    sPAPRConfigureConnectorStateCache *ccs_cache;
>>>> +
>>>> +    /* Copy ccs_list to ccs_list_cache */
>>>> +    spapr->ccs_list_cache = g_new0(sPAPRConfigureConnectorStateCache,
>>>> +                                   spapr->ccs_list_num);
>>>> +    ccs_cache = spapr->ccs_list_cache;
>>>> +    QTAILQ_FOREACH(ccs, &spapr->ccs_list, next) {
>>>> +        ccs_cache->drc_index = ccs->drc_index;
>>>> +        ccs_cache->fdt_offset = ccs->fdt_offset;
>>>> +        ccs_cache->fdt_depth = ccs->fdt_depth;
>>>> +        ccs_cache++;
>>>> +    }
>>>> +}
>>>> +
>>>>  static int spapr_post_load(void *opaque, int version_id)
>>>>  {
>>>>      sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
>>>>      int err = 0;
>>>> +    sPAPRConfigureConnectorState *ccs;
>>>> +    sPAPRConfigureConnectorStateCache *ccs_cache = spapr->ccs_list_cache;
>>>> +    int index = 0;
>>>>      /* In earlier versions, there was no separate qdev for the PAPR
>>>>       * RTC, so the RTC offset was stored directly in sPAPREnvironment.
>>>> @@ -1258,6 +1279,19 @@ static int spapr_post_load(void *opaque, int version_id)
>>>>          err = spapr_rtc_import_offset(spapr->rtc, spapr->rtc_offset);
>>>>      }
>>>> +    if (version_id < 4) {
>>>> +        return err;
>>>> +    }
>>>> +    /* Copy ccs_list_cache to ccs_list */
>>>> +    for (index = 0; index < spapr->ccs_list_num; index ++) {
>>>> +        ccs = g_new0(sPAPRConfigureConnectorState, 1);
>>>> +        ccs->drc_index = (ccs_cache + index)->drc_index;
>>>> +        ccs->fdt_offset = (ccs_cache + index)->fdt_offset;
>>>> +        ccs->fdt_depth = (ccs_cache + index)->fdt_depth;
>>>> +        QTAILQ_INSERT_TAIL(&spapr->ccs_list, ccs, next);
>>>> +    }
>>>> +    g_free(spapr->ccs_list_cache);
>>>> +
>>>>      return err;
>>>>  }
>>>> @@ -1266,10 +1300,28 @@ static bool version_before_3(void *opaque, int version_id)
>>>>      return version_id < 3;
>>>>  }
>>>> +static bool version_ge_4(void *opaque, int version_id)
>>>> +{
>>>> +    return version_id >= 4;
>>>> +}
>>>> +
>>>> +static const VMStateDescription vmstate_spapr_ccs_cache = {
>>>> +    .name = "spaprconfigureconnectorstate",
>>>> +    .version_id = 1,
>>>> +    .minimum_version_id = 1,
>>>> +    .fields = (VMStateField[]) {
>>>> +        VMSTATE_UINT32(drc_index, sPAPRConfigureConnectorStateCache),
>>>> +        VMSTATE_INT32(fdt_offset, sPAPRConfigureConnectorStateCache),
>>>> +        VMSTATE_INT32(fdt_depth, sPAPRConfigureConnectorStateCache),
>>>> +        VMSTATE_END_OF_LIST()
>>>> +    },
>>>> +};
>>>> +
>>>>  static const VMStateDescription vmstate_spapr = {
>>>>      .name = "spapr",
>>>> -    .version_id = 3,
>>>> +    .version_id = 4,
>>>>      .minimum_version_id = 1,
>>>> +    .pre_save = spapr_pre_save,
>>>>      .post_load = spapr_post_load,
>>>>      .fields = (VMStateField[]) {
>>>>          /* used to be @next_irq */
>>>> @@ -1279,6 +1331,12 @@ static const VMStateDescription vmstate_spapr = {
>>>>          VMSTATE_UINT64_TEST(rtc_offset, sPAPRMachineState, version_before_3),
>>>>          VMSTATE_PPC_TIMEBASE_V(tb, sPAPRMachineState, 2),
>>>> +        /* RTAS state */
>>>> +        VMSTATE_INT32_TEST(ccs_list_num, sPAPRMachineState, version_ge_4),
>>> You don't generally need to write your own version test functions for specific-version.  Instead you can just use VMSTATE_INT32_V.
>> I agree. I realized that after the code was posted here.
> 
> Ok.
> 
>>>> +        VMSTATE_STRUCT_VARRAY_ALLOC_TEST(ccs_list_cache, sPAPRMachineState,
>>>> +                                         version_ge_4, ccs_list_num, 1,
>>>> +                                         vmstate_spapr_ccs_cache,
>>>> +                                         sPAPRConfigureConnectorStateCache),
>>>>          VMSTATE_END_OF_LIST()
>>>>      },
>>>>  };
>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>>>> index f073258..9cfd559 100644
>>>> --- a/hw/ppc/spapr_rtas.c
>>>> +++ b/hw/ppc/spapr_rtas.c
>>>> @@ -70,6 +70,7 @@ static void spapr_ccs_add(sPAPRMachineState *spapr,
>>>>  {
>>>>      g_assert(!spapr_ccs_find(spapr, ccs->drc_index));
>>>>      QTAILQ_INSERT_HEAD(&spapr->ccs_list, ccs, next);
>>>> +    spapr->ccs_list_num++;
>>>>  }
>>>>  static void spapr_ccs_remove(sPAPRMachineState *spapr,
>>>> @@ -77,6 +78,7 @@ static void spapr_ccs_remove(sPAPRMachineState *spapr,
>>>>  {
>>>>      QTAILQ_REMOVE(&spapr->ccs_list, ccs, next);
>>>>      g_free(ccs);
>>>> +    spapr->ccs_list_num--;
>>>>  }
>>>>  void spapr_ccs_reset_hook(void *opaque)
>>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>>> index 815d5ee..c8be926 100644
>>>> --- a/include/hw/ppc/spapr.h
>>>> +++ b/include/hw/ppc/spapr.h
>>>> @@ -11,6 +11,8 @@ struct VIOsPAPRBus;
>>>>  struct sPAPRPHBState;
>>>>  struct sPAPRNVRAM;
>>>>  typedef struct sPAPRConfigureConnectorState sPAPRConfigureConnectorState;
>>>> +typedef struct sPAPRConfigureConnectorStateCache
>>>> +        sPAPRConfigureConnectorStateCache;
>>>>  typedef struct sPAPREventLogEntry sPAPREventLogEntry;
>>>>  #define HPTE64_V_HPTE_DIRTY     0x0000000000000040ULL
>>>> @@ -75,6 +77,9 @@ struct sPAPRMachineState {
>>>>      /* RTAS state */
>>>>      QTAILQ_HEAD(, sPAPRConfigureConnectorState) ccs_list;
>>>> +    /* Temporary cache for migration purposes */
>>>> +    int32_t ccs_list_num;
>>>> +    sPAPRConfigureConnectorStateCache *ccs_list_cache;
>>>>      /*< public >*/
>>>>      char *kvm_type;
>>>> @@ -589,6 +594,12 @@ struct sPAPRConfigureConnectorState {
>>>>      QTAILQ_ENTRY(sPAPRConfigureConnectorState) next;
>>>>  };
>>>> +struct sPAPRConfigureConnectorStateCache {
>>>> +    uint32_t drc_index;
>>>> +    int fdt_offset;
>>>> +    int fdt_depth;
>>>> +};
>>>> +
>>>>  void spapr_ccs_reset_hook(void *opaque);
>>>>  #define TYPE_SPAPR_RTC "spapr-rtc"
>>>> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
>>>> index 1622638..7966979 100644
>>>> --- a/include/migration/vmstate.h
>>>> +++ b/include/migration/vmstate.h
>>>> @@ -549,9 +549,10 @@ extern const VMStateInfo vmstate_info_bitmap;
>>>>      .offset     = offsetof(_state, _field),                          \
>>>>  }
>>>> -#define VMSTATE_STRUCT_VARRAY_ALLOC(_field, _state, _field_num, _version, _vmsd, _type) {\
>>>> +#define VMSTATE_STRUCT_VARRAY_ALLOC_TEST(_field, _state, _test, _field_num, _version, _vmsd, _type) { \
>>>>      .name       = (stringify(_field)),                               \
>>>>      .version_id = (_version),                                        \
>>>> +    .field_exists = (_test),                                         \
>>>>      .vmsd       = &(_vmsd),                                          \
>>>>      .num_offset = vmstate_offset_value(_state, _field_num, int32_t), \
>>>>      .size       = sizeof(_type),                                     \
>>>> @@ -677,6 +678,11 @@ extern const VMStateInfo vmstate_info_bitmap;
>>>>      VMSTATE_STRUCT_ARRAY_TEST(_field, _state, _num, NULL, _version,   \
>>>>              _vmsd, _type)
>>>> +#define VMSTATE_STRUCT_VARRAY_ALLOC(_field, _state, _field_num, _version, \
>>>> +                                    _vmsd, _type)                         \
>>>> +    VMSTATE_STRUCT_VARRAY_ALLOC_TEST(_field, _state, NULL, _field_num,    \
>>>> +            _version, _vmsd, _type)
>>>> +
>>>>  #define VMSTATE_BUFFER_UNSAFE_INFO(_field, _state, _version, _info, _size) \
>>>>      VMSTATE_BUFFER_UNSAFE_INFO_TEST(_field, _state, NULL, _version, _info, \
>>>>              _size)
>>
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2016-04-22 16:56 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-15 20:33 [Qemu-devel] [PATCH 0/5] migration: ensure hotplug and migration work together Jianjun Duan
2016-04-15 20:33 ` [Qemu-devel] [PATCH 1/5] spapr: ensure device trees are always associated with DRC Jianjun Duan
2016-04-20  4:30   ` David Gibson
2016-04-21 16:53     ` [Qemu-devel] [Qemu-ppc] " Jianjun Duan
2016-04-15 20:33 ` [Qemu-devel] [PATCH 2/5] Migration: Defined VMStateDescription struct for spapr_drc Jianjun Duan
2016-04-20  4:32   ` David Gibson
2016-04-21 17:03     ` Jianjun Duan
2016-04-22  4:25       ` David Gibson
2016-04-22 16:47         ` [Qemu-devel] [Qemu-ppc] " Jianjun Duan
2016-04-15 20:33 ` [Qemu-devel] [PATCH 3/5] vmstate: Define VARRAY with VMS_ALLOC Jianjun Duan
2016-04-15 20:33 ` [Qemu-devel] [PATCH 4/5] Migration: migrate ccs_list in spapr state Jianjun Duan
2016-04-20  5:14   ` David Gibson
2016-04-21 17:22     ` Jianjun Duan
2016-04-22  4:28       ` David Gibson
2016-04-22 16:55         ` Jianjun Duan
2016-04-15 20:33 ` [Qemu-devel] [PATCH 5/5] Migration: migrate pending_events of " Jianjun Duan
2016-04-20  4:29 ` [Qemu-devel] [PATCH 0/5] migration: ensure hotplug and migration work together David Gibson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.