All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH for-2.8 0/3] spapr: fix breakage of memory unplug after migration
@ 2016-11-18  1:40 Michael Roth
  2016-11-18  1:40 ` [Qemu-devel] [PATCH for-2.8 1/3] migration: alternative way to set instance_id in SaveStateEntry Michael Roth
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Michael Roth @ 2016-11-18  1:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, david, duanj, bharata, dgilbert, quintela, amit.shah

These patches are based on David's ppc-for-2.8 tree, and are also
available from:

  https://github.com/mdroth/qemu/commits/spapr-cas-migration

Currently, memory hotplugged to a pseries guest cannot be removed after
the guest has been migrated. This is due to 2 issues:

1) The coldplugged state of memory on the target side is one where the
   corresponding DRC's allocation state is:

     allocation_state == unallocated,
     awaiting_allocation == true,

   When the guest attempts to unplug memory on the target side, it first
   checks that allocation_state == allocated. If we fix this, the guest
   can successfully notify QEMU of completion on it's end, but then the
   DRC code sees that awaiting_allocation == true, so it defers the
   finalizing of the LMB and corresponding DIMM since it assumes that
   the DIMM must have been previously allocated before it can be removed.

   To address this, we pull in patches 1-2 from Jian Jun's DRC migration
   series:

     https://lists.gnu.org/archive/html/qemu-ppc/2016-10/msg00048.html

   with some minor changes relating to prior review comments, and
   the addition of migrating the DRC's awaiting_allocation value, which
   wasn't part of the original patch. This doesn't address the full scope
   of the issues Jian Jun was looking at (involving synchronizing state
   when migration occurs during fairly small race windows), just this
   particular case, which is more user visible since the time window is
   indefinite.

2) The ability to unplug memory is gated on the QEMU side by a check as
   to whether or not support for newer-style hotplug events was negotiated
   via CAS during boot. The check is performed by checking the corresponding
   entry in the sPAPROptionVector structure. However, since this value isn't
   migrated currently, we are unable to unplug until after the guest reboots.

   We address that here by adding migration support for sPAPROptionVectors,
   and including the CAS-negotiated vector as part of the migration stream
   for any cases where we advertise newer-style hotplug event support to
   the guest.

David,

These fixes ended up going out much later than planned. I'm not sure
if you're planning another pull for 2.8 or not, and realize there are
some patches here not specifically pseries-related so it's
understandable if we opt to pursue these for 2.9/2.8.1 instead. But if
possible I'm hoping to get these in so that the memory unplug
support is fully functional for 2.8.


 hw/core/qdev.c              |  6 +++++-
 hw/ppc/spapr.c              | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr_drc.c          | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr_ovec.c         | 12 ++++++++++++
 hw/ppc/spapr_pci.c          | 22 ++++++++++++++++++++++
 include/hw/ppc/spapr_drc.h  |  9 +++++++++
 include/hw/ppc/spapr_ovec.h |  4 ++++
 include/hw/qdev-core.h      |  9 +++++++++
 migration/savevm.c          |  4 ++--
 9 files changed, 201 insertions(+), 3 deletions(-)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH for-2.8 1/3] migration: alternative way to set instance_id in SaveStateEntry
  2016-11-18  1:40 [Qemu-devel] [PATCH for-2.8 0/3] spapr: fix breakage of memory unplug after migration Michael Roth
@ 2016-11-18  1:40 ` Michael Roth
  2016-11-22  6:15   ` David Gibson
  2016-11-18  1:40 ` [Qemu-devel] [PATCH for-2.8 2/3] migration: spapr_drc: defined VMStateDescription struct Michael Roth
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: Michael Roth @ 2016-11-18  1:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, david, duanj, bharata, dgilbert, quintela, amit.shah

From: Jianjun Duan <duanj@linux.vnet.ibm.com>

Currently migrated Devices are identified with an idstr which is
calculated automatically using their path in the QOM composition
tree. In some cases these Devices may have a more reliable
identifier that may be preferable in situations where there's a
chance their path in the composition tree might change in the
future as a resulting of changes in how the device is modeled
in QEMU.

One such Device is the "spapr-dr-connector" used to handle hotplug
for various resources on pseries machine types, where the PAPR
specification mandates that each DRC have a 32-bit globally unique
identifier associated with it. As such, this identifier is also ideal
as a reliable way to identify a particular DRC in the migration
stream, so we introduce support here for using a caller-side supplied
instance_id for Devices in preparation for that.

register_savevm_live() and vmstate_register_with_alias_id() already
provide a means to let the caller supply an instance_id instead of
relying on the migration subsystem to generate one automatically,
but in cases where we're registering SaveVMHandlers or VMSDs
(respectively) that are associated with a Device, the instance_id is
ignored in favor of a string identifier based solely on the QOM path.

This patch generalizes this so that an instance_id can also be
supplied by the caller for Devices. Since VMSD registration for
Devices is generally handled automatically by qdev, we also introduce
a DeviceClass->dev_get_instance_id() method that, when set, is called
by qdev to obtain the corresponding instance_id that should be used
for a particular Device. Otherwise we maintain the original behavior
of passing instance_id == -1 and thus falling back to the previous
logic of using the QOM path.

Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
* moved usage of DeviceClass->dev_get_instance_id() out of savevm.c
  and into caller (qdev) instead
* clarified usage/intent in comments and commit msg
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/core/qdev.c         | 6 +++++-
 include/hw/qdev-core.h | 9 +++++++++
 migration/savevm.c     | 4 ++--
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 5783442..c8c0c44 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -933,7 +933,11 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
         }
 
         if (qdev_get_vmsd(dev)) {
-            vmstate_register_with_alias_id(dev, -1, qdev_get_vmsd(dev), dev,
+            int instance_id = DEVICE_GET_CLASS(dev)->dev_get_instance_id
+                ? DEVICE_GET_CLASS(dev)->dev_get_instance_id(dev)
+                : -1;
+
+            vmstate_register_with_alias_id(dev, instance_id, qdev_get_vmsd(dev), dev,
                                            dev->instance_id_alias,
                                            dev->alias_required_for_version);
         }
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 2c97347..8ba82af 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -139,6 +139,15 @@ typedef struct DeviceClass {
     qdev_initfn init; /* TODO remove, once users are converted to realize */
     qdev_event exit; /* TODO remove, once users are converted to unrealize */
     const char *bus_type;
+
+    /* When this field is set, instead of using the device's QOM path,
+     * SaveStateEntry's for devices will be identified using a combination
+     * of the corresponding VMSD name and an instance_id returned by this
+     * function. This should only be necessary for situations where the
+     * QOM path is anticipated to change and a more stable identifier is
+     * desired to identify a device in the migration stream.
+     */
+    int (*dev_get_instance_id)(DeviceState *dev);
 } DeviceClass;
 
 typedef struct NamedGPIOList NamedGPIOList;
diff --git a/migration/savevm.c b/migration/savevm.c
index 0363372..a95fff9 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -556,7 +556,7 @@ int register_savevm_live(DeviceState *dev,
         se->is_ram = 1;
     }
 
-    if (dev) {
+    if (dev && instance_id == -1) {
         char *id = qdev_get_dev_path(dev);
         if (id) {
             pstrcpy(se->idstr, sizeof(se->idstr), id);
@@ -640,7 +640,7 @@ int vmstate_register_with_alias_id(DeviceState *dev, int instance_id,
     se->vmsd = vmsd;
     se->alias_id = alias_id;
 
-    if (dev) {
+    if (dev && instance_id == -1) {
         char *id = qdev_get_dev_path(dev);
         if (id) {
             pstrcpy(se->idstr, sizeof(se->idstr), id);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH for-2.8 2/3] migration: spapr_drc: defined VMStateDescription struct
  2016-11-18  1:40 [Qemu-devel] [PATCH for-2.8 0/3] spapr: fix breakage of memory unplug after migration Michael Roth
  2016-11-18  1:40 ` [Qemu-devel] [PATCH for-2.8 1/3] migration: alternative way to set instance_id in SaveStateEntry Michael Roth
@ 2016-11-18  1:40 ` Michael Roth
  2016-11-18  6:04   ` David Gibson
  2016-11-22 16:35   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
  2016-11-18  1:40 ` [Qemu-devel] [PATCH for-2.8 3/3] spapr: migration support for CAS-negotiated option vectors Michael Roth
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 21+ messages in thread
From: Michael Roth @ 2016-11-18  1:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, david, duanj, bharata, dgilbert, quintela, amit.shah

From: Jianjun Duan <duanj@linux.vnet.ibm.com>

To manage hotplug/unplug of dynamic resources such as PCI cards,
memory, and CPU on sPAPR guests, a firmware abstraction known as
a Dynamic Resource Connector (DRC) is used to assign a particular
dynamic resource to the guest, and provide an interface for the
guest to manage configuration/removal of the resource associated
with it.

To migrate the hotplugged resources in migration, the
associated DRC state need be migrated. To migrate the DRC state,
we defined the VMStateDescription struct for spapr_drc to enable
the transmission of spapr_drc state in migration.

Not all the elements in the DRC state are migrated. Only those
ones modifiable or needed by guest actions or device add/remove
operation are migrated. From the perspective of device
hotplugging, if we hotplug a device on the source, we need to
"coldplug" it on the target. The states across two hosts for the
same device are not the same. Ideally we want the states be same
after migration so that the device would function as hotplugged
on the target. For example we can unplug it. The minimum DRC
state we need to transfer should cover all the pieces changed by
hotplugging. Out of the elements of the DRC state, isolation_state,
allocation_sate, and configured are involved in the DR state
transition diagram from PAPR+ 2.7, 13.4. configured and signalled
are needed in attaching and detaching devices. indicator_state
provides users with hardware state information. These 6 elements
are migrated.

detach_cb in the DRC state is a function pointer that cannot be
migrated. We set it right after DRC state is migrated so that
a migrated hot-unplug event could finish its work.

The instance_id is used to identify objects in migration. We set
instance_id of DRC using the unique index so that it is the same
across migration.

Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
* add migration for awaiting_allocation state
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr_drc.c         | 70 ++++++++++++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr_pci.c         | 22 +++++++++++++++
 include/hw/ppc/spapr_drc.h |  9 ++++++
 3 files changed, 101 insertions(+)

diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index a0c44ee..1ec6551 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -632,6 +632,72 @@ static void spapr_dr_connector_instance_init(Object *obj)
                         NULL, NULL, NULL, NULL);
 }
 
+static bool spapr_drc_needed(void *opaque)
+{
+    sPAPRDRConnector *drc = (sPAPRDRConnector *)opaque;
+    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    bool rc = false;
+    sPAPRDREntitySense value;
+
+    drck->entity_sense(drc, &value);
+    /* If no dev is plugged in there is no need to migrate the DRC state */
+    if (value != SPAPR_DR_ENTITY_SENSE_PRESENT) {
+        return false;
+    }
+    /*
+     * If there is dev plugged in, we need to migrate the DRC state when
+     * it is different from cold-plugged state
+     */
+    switch(drc->type) {
+    /* for PCI type */
+    case SPAPR_DR_CONNECTOR_TYPE_PCI:
+        rc = !((drc->isolation_state == SPAPR_DR_ISOLATION_STATE_UNISOLATED) &&
+               (drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_USABLE) &&
+               drc->configured && drc->signalled && !drc->awaiting_release);
+        break;
+    /* for LMB type */
+    case SPAPR_DR_CONNECTOR_TYPE_LMB:
+        rc = !((drc->isolation_state == SPAPR_DR_ISOLATION_STATE_ISOLATED) &&
+               (drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_UNUSABLE) &&
+               drc->configured && drc->signalled && !drc->awaiting_release);
+        break;
+    default:
+        ;
+    }
+
+    return rc;
+}
+
+/* detach_cb needs be set since it is not migrated */
+static void postmigrate_set_detach_cb(sPAPRDRConnector *drc,
+                                      spapr_drc_detach_cb *detach_cb)
+{
+    drc->detach_cb = detach_cb;
+}
+
+/* return the unique drc index as instance_id for qom interfaces*/
+static int get_instance_id(DeviceState *dev)
+{
+    return (int)get_index(SPAPR_DR_CONNECTOR(OBJECT(dev)));
+}
+
+static const VMStateDescription vmstate_spapr_drc = {
+    .name = "spapr_drc",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = spapr_drc_needed,
+    .fields  = (VMStateField []) {
+        VMSTATE_UINT32(isolation_state, sPAPRDRConnector),
+        VMSTATE_UINT32(allocation_state, sPAPRDRConnector),
+        VMSTATE_UINT32(indicator_state, sPAPRDRConnector),
+        VMSTATE_BOOL(configured, sPAPRDRConnector),
+        VMSTATE_BOOL(awaiting_release, sPAPRDRConnector),
+        VMSTATE_BOOL(awaiting_allocation, sPAPRDRConnector),
+        VMSTATE_BOOL(signalled, sPAPRDRConnector),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
 {
     DeviceClass *dk = DEVICE_CLASS(k);
@@ -640,6 +706,8 @@ static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
     dk->reset = reset;
     dk->realize = realize;
     dk->unrealize = unrealize;
+    dk->vmsd = &vmstate_spapr_drc;
+    dk->dev_get_instance_id = get_instance_id;
     drck->set_isolation_state = set_isolation_state;
     drck->set_indicator_state = set_indicator_state;
     drck->set_allocation_state = set_allocation_state;
@@ -653,6 +721,8 @@ static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
     drck->detach = detach;
     drck->release_pending = release_pending;
     drck->set_signalled = set_signalled;
+    drck->postmigrate_set_detach_cb = postmigrate_set_detach_cb;
+
     /*
      * Reason: it crashes FIXME find and document the real reason
      */
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index f9661b7..661f7d8 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1638,11 +1638,33 @@ static void spapr_pci_pre_save(void *opaque)
     }
 }
 
+/*
+ * detach_cb in the DRC state is a function pointer that cannot be
+ * migrated. We set it right after migration so that a migrated
+ * hot-unplug event could finish its work.
+ */
+static void spapr_pci_set_detach_cb(PCIBus *bus, PCIDevice *pdev,
+                                 void *opaque)
+{
+    sPAPRPHBState *sphb = opaque;
+    sPAPRDRConnector *drc = spapr_phb_get_pci_drc(sphb, pdev);
+    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    drck->postmigrate_set_detach_cb(drc, spapr_phb_remove_pci_device_cb);
+}
+
 static int spapr_pci_post_load(void *opaque, int version_id)
 {
     sPAPRPHBState *sphb = opaque;
     gpointer key, value;
     int i;
+    PCIBus *bus = PCI_HOST_BRIDGE(sphb)->bus;
+    unsigned int bus_no = 0;
+
+    /* Set detach_cb for the drc unconditionally after migration */
+    if (bus) {
+        pci_for_each_device(bus, pci_bus_num(bus), spapr_pci_set_detach_cb,
+                            &bus_no);
+    }
 
     for (i = 0; i < sphb->msi_devs_num; ++i) {
         key = g_memdup(&sphb->msi_devs[i].key,
diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
index fa531d5..17589c8 100644
--- a/include/hw/ppc/spapr_drc.h
+++ b/include/hw/ppc/spapr_drc.h
@@ -192,6 +192,15 @@ typedef struct sPAPRDRConnectorClass {
                    void *detach_cb_opaque, Error **errp);
     bool (*release_pending)(sPAPRDRConnector *drc);
     void (*set_signalled)(sPAPRDRConnector *drc);
+
+    /*
+     * QEMU interface for setting detach_cb after migration.
+     * detach_cb in the DRC state is a function pointer that cannot be
+     * migrated. We set it right after migration so that a migrated
+     * hot-unplug event could finish its work.
+     */
+    void (*postmigrate_set_detach_cb)(sPAPRDRConnector *drc,
+                                      spapr_drc_detach_cb *detach_cb);
 } sPAPRDRConnectorClass;
 
 sPAPRDRConnector *spapr_dr_connector_new(Object *owner,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH for-2.8 3/3] spapr: migration support for CAS-negotiated option vectors
  2016-11-18  1:40 [Qemu-devel] [PATCH for-2.8 0/3] spapr: fix breakage of memory unplug after migration Michael Roth
  2016-11-18  1:40 ` [Qemu-devel] [PATCH for-2.8 1/3] migration: alternative way to set instance_id in SaveStateEntry Michael Roth
  2016-11-18  1:40 ` [Qemu-devel] [PATCH for-2.8 2/3] migration: spapr_drc: defined VMStateDescription struct Michael Roth
@ 2016-11-18  1:40 ` Michael Roth
  2016-11-18 16:08   ` Michael Roth
  2016-11-18  1:51 ` [Qemu-devel] [PATCH for-2.8 0/3] spapr: fix breakage of memory unplug after migration no-reply
  2016-11-18  5:45 ` David Gibson
  4 siblings, 1 reply; 21+ messages in thread
From: Michael Roth @ 2016-11-18  1:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, david, duanj, bharata, dgilbert, quintela, amit.shah

With the additional of the OV5_HP_EVT option vector, we now have
certain functionality (namely, memory unplug) that checks at run-time
for whether or not the guest negotiated the option via CAS. Because
we don't currently migrate these negotiated values, we are unable
to unplug memory from a guest after it's been migrated until after
the guest is rebooted and CAS-negotiation is repeated.

This patch fixes this by adding CAS-negotiated options to the
migration stream. We do this using a subsection, since the
negotiated value of OV5_HP_EVT is the only option currently needed
to maintain proper functionality for a running guest.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c              | 68 +++++++++++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr_ovec.c         | 12 ++++++++
 include/hw/ppc/spapr_ovec.h |  4 +++
 3 files changed, 84 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 0cbab24..9e08aed 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1267,6 +1267,70 @@ static bool version_before_3(void *opaque, int version_id)
     return version_id < 3;
 }
 
+static bool spapr_ov5_cas_needed(void *opaque)
+{
+    sPAPRMachineState *spapr = opaque;
+    sPAPROptionVector *ov5_mask = spapr_ovec_new();
+    sPAPROptionVector *ov5_legacy = spapr_ovec_new();
+    sPAPROptionVector *ov5_removed = spapr_ovec_new();
+    bool cas_needed;
+
+    /* Prior to the introduction of sPAPROptionVector, we had two option
+     * vectors we dealt with: OV5_FORM1_AFFINITY, and OV5_DRCONF_MEMORY.
+     * Both of these options encode machine topology into the device-tree
+     * in such a way that the now-booted OS should still be able to interact
+     * appropriately with QEMU regardless of what options were actually
+     * negotiatied on the source side.
+     *
+     * As such, we can avoid migrating the CAS-negotiated options if these
+     * are the only options available on the current machine/platform.
+     * Since these are the only options available for pseries-2.7 and
+     * earlier, this allows us to maintain old->new/new->old migration
+     * compatibility.
+     *
+     * For QEMU 2.8+, there are additional CAS-negotiatable options available
+     * via default pseries-2.8 machines and explicit command-line parameters.
+     * Some of these options, like OV5_HP_EVT, *do* require QEMU to be aware
+     * of the actual CAS-negotiated values to continue working properly. For
+     * example, availability of memory unplug depends on knowing whether
+     * OV5_HP_EVT was negotiated via CAS.
+     *
+     * Thus, for any cases where the set of available CAS-negotiatable
+     * options extends beyond OV5_FORM1_AFFINITY and OV5_DRCONF_MEMORY, we
+     * include the CAS-negotiated options in the migration stream.
+     */
+    spapr_ovec_set(ov5_mask, OV5_FORM1_AFFINITY);
+    spapr_ovec_set(ov5_mask, OV5_DRCONF_MEMORY);
+
+    /* spapr_ovec_diff returns true if bits were removed. we avoid using
+     * the mask itself since in the future it's possible "legacy" bits may be
+     * removed via machine options, which could generate a false positive
+     * that breaks migration.
+     */
+    spapr_ovec_intersect(ov5_legacy, spapr->ov5, ov5_mask);
+    cas_needed = spapr_ovec_diff(ov5_removed, spapr->ov5, ov5_legacy);
+
+    spapr_ovec_cleanup(ov5_mask);
+    spapr_ovec_cleanup(ov5_legacy);
+    spapr_ovec_cleanup(ov5_removed);
+
+    error_report("MIGRATION NEEDED: %d", cas_needed);
+
+    return cas_needed;
+}
+
+static const VMStateDescription vmstate_spapr_ov5_cas = {
+    .name = "spapr_option_vector_ov5_cas",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = spapr_ov5_cas_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_STRUCT_POINTER_V(ov5_cas, sPAPRMachineState, 1,
+                                 vmstate_spapr_ovec, sPAPROptionVector),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static const VMStateDescription vmstate_spapr = {
     .name = "spapr",
     .version_id = 3,
@@ -1282,6 +1346,10 @@ static const VMStateDescription vmstate_spapr = {
         VMSTATE_PPC_TIMEBASE_V(tb, sPAPRMachineState, 2),
         VMSTATE_END_OF_LIST()
     },
+    .subsections = (const VMStateDescription*[]) {
+        &vmstate_spapr_ov5_cas,
+        NULL
+    }
 };
 
 static int htab_save_setup(QEMUFile *f, void *opaque)
diff --git a/hw/ppc/spapr_ovec.c b/hw/ppc/spapr_ovec.c
index c2a0d18..3eb1d59 100644
--- a/hw/ppc/spapr_ovec.c
+++ b/hw/ppc/spapr_ovec.c
@@ -37,6 +37,17 @@
  */
 struct sPAPROptionVector {
     unsigned long *bitmap;
+    int32_t bitmap_size; /* only used for migration */
+};
+
+const VMStateDescription vmstate_spapr_ovec = {
+    .name = "spapr_option_vector",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_BITMAP(bitmap, sPAPROptionVector, 1, bitmap_size),
+        VMSTATE_END_OF_LIST()
+    }
 };
 
 sPAPROptionVector *spapr_ovec_new(void)
@@ -45,6 +56,7 @@ sPAPROptionVector *spapr_ovec_new(void)
 
     ov = g_new0(sPAPROptionVector, 1);
     ov->bitmap = bitmap_new(OV_MAXBITS);
+    ov->bitmap_size = OV_MAXBITS;
 
     return ov;
 }
diff --git a/include/hw/ppc/spapr_ovec.h b/include/hw/ppc/spapr_ovec.h
index 6a06da3..355a344 100644
--- a/include/hw/ppc/spapr_ovec.h
+++ b/include/hw/ppc/spapr_ovec.h
@@ -37,6 +37,7 @@
 #define _SPAPR_OVEC_H
 
 #include "cpu.h"
+#include "migration/vmstate.h"
 
 typedef struct sPAPROptionVector sPAPROptionVector;
 
@@ -64,4 +65,7 @@ sPAPROptionVector *spapr_ovec_parse_vector(target_ulong table_addr, int vector);
 int spapr_ovec_populate_dt(void *fdt, int fdt_offset,
                            sPAPROptionVector *ov, const char *name);
 
+/* migration */
+extern const VMStateDescription vmstate_spapr_ovec;
+
 #endif /* !defined (_SPAPR_OVEC_H) */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.8 0/3] spapr: fix breakage of memory unplug after migration
  2016-11-18  1:40 [Qemu-devel] [PATCH for-2.8 0/3] spapr: fix breakage of memory unplug after migration Michael Roth
                   ` (2 preceding siblings ...)
  2016-11-18  1:40 ` [Qemu-devel] [PATCH for-2.8 3/3] spapr: migration support for CAS-negotiated option vectors Michael Roth
@ 2016-11-18  1:51 ` no-reply
  2016-11-18  5:45 ` David Gibson
  4 siblings, 0 replies; 21+ messages in thread
From: no-reply @ 2016-11-18  1:51 UTC (permalink / raw)
  To: mdroth
  Cc: famz, qemu-devel, duanj, quintela, dgilbert, qemu-ppc, bharata,
	amit.shah, david

Hi,

Your series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [PATCH for-2.8 0/3] spapr: fix breakage of memory unplug after migration
Type: series
Message-id: 1479433227-29238-1-git-send-email-mdroth@linux.vnet.ibm.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

# Useful git options
git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]         patchew/1479433227-29238-1-git-send-email-mdroth@linux.vnet.ibm.com -> patchew/1479433227-29238-1-git-send-email-mdroth@linux.vnet.ibm.com
Switched to a new branch 'test'
8ab69b2 spapr: migration support for CAS-negotiated option vectors
3b5efd5 migration: spapr_drc: defined VMStateDescription struct
494d624 migration: alternative way to set instance_id in SaveStateEntry

=== OUTPUT BEGIN ===
Checking PATCH 1/3: migration: alternative way to set instance_id in SaveStateEntry...
WARNING: line over 80 characters
#59: FILE: hw/core/qdev.c:940:
+            vmstate_register_with_alias_id(dev, instance_id, qdev_get_vmsd(dev), dev,

total: 0 errors, 1 warnings, 43 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 2/3: migration: spapr_drc: defined VMStateDescription struct...
ERROR: space required before the open parenthesis '('
#71: FILE: hw/ppc/spapr_drc.c:651:
+    switch(drc->type) {

total: 1 errors, 0 warnings, 136 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 3/3: spapr: migration support for CAS-negotiated option vectors...
ERROR: spaces required around that '*' (ctx:VxV)
#100: FILE: hw/ppc/spapr.c:1349:
+    .subsections = (const VMStateDescription*[]) {
                                             ^

total: 1 errors, 0 warnings, 118 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.8 0/3] spapr: fix breakage of memory unplug after migration
  2016-11-18  1:40 [Qemu-devel] [PATCH for-2.8 0/3] spapr: fix breakage of memory unplug after migration Michael Roth
                   ` (3 preceding siblings ...)
  2016-11-18  1:51 ` [Qemu-devel] [PATCH for-2.8 0/3] spapr: fix breakage of memory unplug after migration no-reply
@ 2016-11-18  5:45 ` David Gibson
  2016-11-18 16:39   ` Michael Roth
  4 siblings, 1 reply; 21+ messages in thread
From: David Gibson @ 2016-11-18  5:45 UTC (permalink / raw)
  To: Michael Roth
  Cc: qemu-devel, qemu-ppc, duanj, bharata, dgilbert, quintela, amit.shah

[-- Attachment #1: Type: text/plain, Size: 4052 bytes --]

On Thu, Nov 17, 2016 at 07:40:24PM -0600, Michael Roth wrote:
> These patches are based on David's ppc-for-2.8 tree, and are also
> available from:
> 
>   https://github.com/mdroth/qemu/commits/spapr-cas-migration
> 
> Currently, memory hotplugged to a pseries guest cannot be removed after
> the guest has been migrated. This is due to 2 issues:
> 
> 1) The coldplugged state of memory on the target side is one where the
>    corresponding DRC's allocation state is:
> 
>      allocation_state == unallocated,
>      awaiting_allocation == true,
> 
>    When the guest attempts to unplug memory on the target side, it first
>    checks that allocation_state == allocated. If we fix this, the guest
>    can successfully notify QEMU of completion on it's end, but then the
>    DRC code sees that awaiting_allocation == true, so it defers the
>    finalizing of the LMB and corresponding DIMM since it assumes that
>    the DIMM must have been previously allocated before it can be removed.
> 
>    To address this, we pull in patches 1-2 from Jian Jun's DRC migration
>    series:
> 
>      https://lists.gnu.org/archive/html/qemu-ppc/2016-10/msg00048.html
> 
>    with some minor changes relating to prior review comments, and
>    the addition of migrating the DRC's awaiting_allocation value, which
>    wasn't part of the original patch. This doesn't address the full scope
>    of the issues Jian Jun was looking at (involving synchronizing state
>    when migration occurs during fairly small race windows), just this
>    particular case, which is more user visible since the time window is
>    indefinite.
> 
> 2) The ability to unplug memory is gated on the QEMU side by a check as
>    to whether or not support for newer-style hotplug events was negotiated
>    via CAS during boot. The check is performed by checking the corresponding
>    entry in the sPAPROptionVector structure. However, since this value isn't
>    migrated currently, we are unable to unplug until after the guest reboots.
> 
>    We address that here by adding migration support for sPAPROptionVectors,
>    and including the CAS-negotiated vector as part of the migration stream
>    for any cases where we advertise newer-style hotplug event support to
>    the guest.
> 
> David,
> 
> These fixes ended up going out much later than planned. I'm not sure
> if you're planning another pull for 2.8 or not, and realize there are
> some patches here not specifically pseries-related so it's
> understandable if we opt to pursue these for 2.9/2.8.1 instead. But if
> possible I'm hoping to get these in so that the memory unplug
> support is fully functional for 2.8.

Yeah, I'm still expecting to push a few bugfixes in before 2.8.  So,
I've merged these patches into ppc-for-2.8 (fixing a couple of trivial
style nits along the way).  I have a couple of comments that I'll make
on the patches, but they're not important enough to stop these going
in ASAP.

Unfortunately, of course, this is not the only migration breakage we
have at the moment.  I'm presently wrestling with both breakage due to
changes in the insns_flags masks, and due to the reworking of the mmio
windows for the PHB.

> 
> 
>  hw/core/qdev.c              |  6 +++++-
>  hw/ppc/spapr.c              | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/ppc/spapr_drc.c          | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/ppc/spapr_ovec.c         | 12 ++++++++++++
>  hw/ppc/spapr_pci.c          | 22 ++++++++++++++++++++++
>  include/hw/ppc/spapr_drc.h  |  9 +++++++++
>  include/hw/ppc/spapr_ovec.h |  4 ++++
>  include/hw/qdev-core.h      |  9 +++++++++
>  migration/savevm.c          |  4 ++--
>  9 files changed, 201 insertions(+), 3 deletions(-)
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.8 2/3] migration: spapr_drc: defined VMStateDescription struct
  2016-11-18  1:40 ` [Qemu-devel] [PATCH for-2.8 2/3] migration: spapr_drc: defined VMStateDescription struct Michael Roth
@ 2016-11-18  6:04   ` David Gibson
  2016-11-18 16:32     ` Michael Roth
  2016-11-22 16:35   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
  1 sibling, 1 reply; 21+ messages in thread
From: David Gibson @ 2016-11-18  6:04 UTC (permalink / raw)
  To: Michael Roth
  Cc: qemu-devel, qemu-ppc, duanj, bharata, dgilbert, quintela, amit.shah

[-- Attachment #1: Type: text/plain, Size: 9384 bytes --]

On Thu, Nov 17, 2016 at 07:40:26PM -0600, Michael Roth wrote:
> From: Jianjun Duan <duanj@linux.vnet.ibm.com>
> 
> To manage hotplug/unplug of dynamic resources such as PCI cards,
> memory, and CPU on sPAPR guests, a firmware abstraction known as
> a Dynamic Resource Connector (DRC) is used to assign a particular
> dynamic resource to the guest, and provide an interface for the
> guest to manage configuration/removal of the resource associated
> with it.
> 
> To migrate the hotplugged resources in migration, the
> associated DRC state need be migrated. To migrate the DRC state,
> we defined the VMStateDescription struct for spapr_drc to enable
> the transmission of spapr_drc state in migration.
> 
> Not all the elements in the DRC state are migrated. Only those
> ones modifiable or needed by guest actions or device add/remove
> operation are migrated. From the perspective of device
> hotplugging, if we hotplug a device on the source, we need to
> "coldplug" it on the target. The states across two hosts for the
> same device are not the same. Ideally we want the states be same
> after migration so that the device would function as hotplugged
> on the target. For example we can unplug it. The minimum DRC
> state we need to transfer should cover all the pieces changed by
> hotplugging. Out of the elements of the DRC state, isolation_state,
> allocation_sate, and configured are involved in the DR state
> transition diagram from PAPR+ 2.7, 13.4. configured and signalled
> are needed in attaching and detaching devices. indicator_state
> provides users with hardware state information. These 6 elements
> are migrated.
> 
> detach_cb in the DRC state is a function pointer that cannot be
> migrated. We set it right after DRC state is migrated so that
> a migrated hot-unplug event could finish its work.
> 
> The instance_id is used to identify objects in migration. We set
> instance_id of DRC using the unique index so that it is the same
> across migration.
> 
> Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
> * add migration for awaiting_allocation state
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_drc.c         | 70 ++++++++++++++++++++++++++++++++++++++++++++++
>  hw/ppc/spapr_pci.c         | 22 +++++++++++++++
>  include/hw/ppc/spapr_drc.h |  9 ++++++
>  3 files changed, 101 insertions(+)
> 
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index a0c44ee..1ec6551 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -632,6 +632,72 @@ static void spapr_dr_connector_instance_init(Object *obj)
>                          NULL, NULL, NULL, NULL);
>  }
>  
> +static bool spapr_drc_needed(void *opaque)
> +{
> +    sPAPRDRConnector *drc = (sPAPRDRConnector *)opaque;
> +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +    bool rc = false;
> +    sPAPRDREntitySense value;
> +
> +    drck->entity_sense(drc, &value);
> +    /* If no dev is plugged in there is no need to migrate the DRC state */
> +    if (value != SPAPR_DR_ENTITY_SENSE_PRESENT) {
> +        return false;
> +    }
> +    /*
> +     * If there is dev plugged in, we need to migrate the DRC state when
> +     * it is different from cold-plugged state
> +     */
> +    switch(drc->type) {
> +    /* for PCI type */
> +    case SPAPR_DR_CONNECTOR_TYPE_PCI:
> +        rc = !((drc->isolation_state == SPAPR_DR_ISOLATION_STATE_UNISOLATED) &&
> +               (drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_USABLE) &&
> +               drc->configured && drc->signalled && !drc->awaiting_release);
> +        break;
> +    /* for LMB type */
> +    case SPAPR_DR_CONNECTOR_TYPE_LMB:
> +        rc = !((drc->isolation_state == SPAPR_DR_ISOLATION_STATE_ISOLATED) &&
> +               (drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_UNUSABLE) &&
> +               drc->configured && drc->signalled && !drc->awaiting_release);
> +        break;
> +    default:
> +        ;
> +    }
> +
> +    return rc;
> +}
> +
> +/* detach_cb needs be set since it is not migrated */
> +static void postmigrate_set_detach_cb(sPAPRDRConnector *drc,
> +                                      spapr_drc_detach_cb *detach_cb)
> +{
> +    drc->detach_cb = detach_cb;
> +}
> +
> +/* return the unique drc index as instance_id for qom interfaces*/
> +static int get_instance_id(DeviceState *dev)
> +{
> +    return (int)get_index(SPAPR_DR_CONNECTOR(OBJECT(dev)));
> +}
> +
> +static const VMStateDescription vmstate_spapr_drc = {
> +    .name = "spapr_drc",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = spapr_drc_needed,
> +    .fields  = (VMStateField []) {
> +        VMSTATE_UINT32(isolation_state, sPAPRDRConnector),
> +        VMSTATE_UINT32(allocation_state, sPAPRDRConnector),
> +        VMSTATE_UINT32(indicator_state, sPAPRDRConnector),
> +        VMSTATE_BOOL(configured, sPAPRDRConnector),
> +        VMSTATE_BOOL(awaiting_release, sPAPRDRConnector),
> +        VMSTATE_BOOL(awaiting_allocation, sPAPRDRConnector),
> +        VMSTATE_BOOL(signalled, sPAPRDRConnector),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
>  static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
>  {
>      DeviceClass *dk = DEVICE_CLASS(k);
> @@ -640,6 +706,8 @@ static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
>      dk->reset = reset;
>      dk->realize = realize;
>      dk->unrealize = unrealize;
> +    dk->vmsd = &vmstate_spapr_drc;
> +    dk->dev_get_instance_id = get_instance_id;
>      drck->set_isolation_state = set_isolation_state;
>      drck->set_indicator_state = set_indicator_state;
>      drck->set_allocation_state = set_allocation_state;
> @@ -653,6 +721,8 @@ static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
>      drck->detach = detach;
>      drck->release_pending = release_pending;
>      drck->set_signalled = set_signalled;
> +    drck->postmigrate_set_detach_cb = postmigrate_set_detach_cb;
> +
>      /*
>       * Reason: it crashes FIXME find and document the real reason
>       */
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index f9661b7..661f7d8 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1638,11 +1638,33 @@ static void spapr_pci_pre_save(void *opaque)
>      }
>  }
>  
> +/*
> + * detach_cb in the DRC state is a function pointer that cannot be
> + * migrated. We set it right after migration so that a migrated
> + * hot-unplug event could finish its work.
> + */
> +static void spapr_pci_set_detach_cb(PCIBus *bus, PCIDevice *pdev,
> +                                 void *opaque)
> +{
> +    sPAPRPHBState *sphb = opaque;
> +    sPAPRDRConnector *drc = spapr_phb_get_pci_drc(sphb, pdev);
> +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +    drck->postmigrate_set_detach_cb(drc, spapr_phb_remove_pci_device_cb);
> +}
> +
>  static int spapr_pci_post_load(void *opaque, int version_id)
>  {
>      sPAPRPHBState *sphb = opaque;
>      gpointer key, value;
>      int i;
> +    PCIBus *bus = PCI_HOST_BRIDGE(sphb)->bus;
> +    unsigned int bus_no = 0;
> +
> +    /* Set detach_cb for the drc unconditionally after migration */
> +    if (bus) {
> +        pci_for_each_device(bus, pci_bus_num(bus), spapr_pci_set_detach_cb,
> +                            &bus_no);
> +    }

This reconstruction of the detach_cbs is a bit clunky.  As noted
elsewhere, I'm not going to delay the patches for that, of course.

IIRC, this callback is always either NULL, or the detach callback that
belongs to the particular DRC type.

So for the sake of migration, I think it would make sense to change
the representation of the DRC to instead of including a callback
directly, simply have a boolean flag for whether the callback is
active at the moment.  When we need to call the callback, we check if
it is active, and if it is look up the right function for the DRC type
(we could use subclasses of DRC to do this).

>  
>      for (i = 0; i < sphb->msi_devs_num; ++i) {
>          key = g_memdup(&sphb->msi_devs[i].key,
> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> index fa531d5..17589c8 100644
> --- a/include/hw/ppc/spapr_drc.h
> +++ b/include/hw/ppc/spapr_drc.h
> @@ -192,6 +192,15 @@ typedef struct sPAPRDRConnectorClass {
>                     void *detach_cb_opaque, Error **errp);
>      bool (*release_pending)(sPAPRDRConnector *drc);
>      void (*set_signalled)(sPAPRDRConnector *drc);
> +
> +    /*
> +     * QEMU interface for setting detach_cb after migration.
> +     * detach_cb in the DRC state is a function pointer that cannot be
> +     * migrated. We set it right after migration so that a migrated
> +     * hot-unplug event could finish its work.
> +     */
> +    void (*postmigrate_set_detach_cb)(sPAPRDRConnector *drc,
> +                                      spapr_drc_detach_cb *detach_cb);
>  } sPAPRDRConnectorClass;
>  
>  sPAPRDRConnector *spapr_dr_connector_new(Object *owner,

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.8 3/3] spapr: migration support for CAS-negotiated option vectors
  2016-11-18  1:40 ` [Qemu-devel] [PATCH for-2.8 3/3] spapr: migration support for CAS-negotiated option vectors Michael Roth
@ 2016-11-18 16:08   ` Michael Roth
  2016-11-20 23:57     ` David Gibson
  0 siblings, 1 reply; 21+ messages in thread
From: Michael Roth @ 2016-11-18 16:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: duanj, quintela, dgilbert, qemu-ppc, bharata, amit.shah, david

Quoting Michael Roth (2016-11-17 19:40:27)
> With the additional of the OV5_HP_EVT option vector, we now have
> certain functionality (namely, memory unplug) that checks at run-time
> for whether or not the guest negotiated the option via CAS. Because
> we don't currently migrate these negotiated values, we are unable
> to unplug memory from a guest after it's been migrated until after
> the guest is rebooted and CAS-negotiation is repeated.
> 
> This patch fixes this by adding CAS-negotiated options to the
> migration stream. We do this using a subsection, since the
> negotiated value of OV5_HP_EVT is the only option currently needed
> to maintain proper functionality for a running guest.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr.c              | 68 +++++++++++++++++++++++++++++++++++++++++++++
>  hw/ppc/spapr_ovec.c         | 12 ++++++++
>  include/hw/ppc/spapr_ovec.h |  4 +++
>  3 files changed, 84 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 0cbab24..9e08aed 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1267,6 +1267,70 @@ static bool version_before_3(void *opaque, int version_id)
>      return version_id < 3;
>  }
> 
> +static bool spapr_ov5_cas_needed(void *opaque)
> +{
> +    sPAPRMachineState *spapr = opaque;
> +    sPAPROptionVector *ov5_mask = spapr_ovec_new();
> +    sPAPROptionVector *ov5_legacy = spapr_ovec_new();
> +    sPAPROptionVector *ov5_removed = spapr_ovec_new();
> +    bool cas_needed;
> +
> +    /* Prior to the introduction of sPAPROptionVector, we had two option
> +     * vectors we dealt with: OV5_FORM1_AFFINITY, and OV5_DRCONF_MEMORY.
> +     * Both of these options encode machine topology into the device-tree
> +     * in such a way that the now-booted OS should still be able to interact
> +     * appropriately with QEMU regardless of what options were actually
> +     * negotiatied on the source side.
> +     *
> +     * As such, we can avoid migrating the CAS-negotiated options if these
> +     * are the only options available on the current machine/platform.
> +     * Since these are the only options available for pseries-2.7 and
> +     * earlier, this allows us to maintain old->new/new->old migration
> +     * compatibility.
> +     *
> +     * For QEMU 2.8+, there are additional CAS-negotiatable options available
> +     * via default pseries-2.8 machines and explicit command-line parameters.
> +     * Some of these options, like OV5_HP_EVT, *do* require QEMU to be aware
> +     * of the actual CAS-negotiated values to continue working properly. For
> +     * example, availability of memory unplug depends on knowing whether
> +     * OV5_HP_EVT was negotiated via CAS.
> +     *
> +     * Thus, for any cases where the set of available CAS-negotiatable
> +     * options extends beyond OV5_FORM1_AFFINITY and OV5_DRCONF_MEMORY, we
> +     * include the CAS-negotiated options in the migration stream.
> +     */
> +    spapr_ovec_set(ov5_mask, OV5_FORM1_AFFINITY);
> +    spapr_ovec_set(ov5_mask, OV5_DRCONF_MEMORY);
> +
> +    /* spapr_ovec_diff returns true if bits were removed. we avoid using
> +     * the mask itself since in the future it's possible "legacy" bits may be
> +     * removed via machine options, which could generate a false positive
> +     * that breaks migration.
> +     */
> +    spapr_ovec_intersect(ov5_legacy, spapr->ov5, ov5_mask);
> +    cas_needed = spapr_ovec_diff(ov5_removed, spapr->ov5, ov5_legacy);
> +
> +    spapr_ovec_cleanup(ov5_mask);
> +    spapr_ovec_cleanup(ov5_legacy);
> +    spapr_ovec_cleanup(ov5_removed);
> +
> +    error_report("MIGRATION NEEDED: %d", cas_needed);

Argh, sorry, I just noticed this stray debug comment that slipped in.
Would you prefer a v2, or just removing it in-tree?

> +
> +    return cas_needed;
> +}
> +
> +static const VMStateDescription vmstate_spapr_ov5_cas = {
> +    .name = "spapr_option_vector_ov5_cas",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = spapr_ov5_cas_needed,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_STRUCT_POINTER_V(ov5_cas, sPAPRMachineState, 1,
> +                                 vmstate_spapr_ovec, sPAPROptionVector),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
>  static const VMStateDescription vmstate_spapr = {
>      .name = "spapr",
>      .version_id = 3,
> @@ -1282,6 +1346,10 @@ static const VMStateDescription vmstate_spapr = {
>          VMSTATE_PPC_TIMEBASE_V(tb, sPAPRMachineState, 2),
>          VMSTATE_END_OF_LIST()
>      },
> +    .subsections = (const VMStateDescription*[]) {
> +        &vmstate_spapr_ov5_cas,
> +        NULL
> +    }
>  };
> 
>  static int htab_save_setup(QEMUFile *f, void *opaque)
> diff --git a/hw/ppc/spapr_ovec.c b/hw/ppc/spapr_ovec.c
> index c2a0d18..3eb1d59 100644
> --- a/hw/ppc/spapr_ovec.c
> +++ b/hw/ppc/spapr_ovec.c
> @@ -37,6 +37,17 @@
>   */
>  struct sPAPROptionVector {
>      unsigned long *bitmap;
> +    int32_t bitmap_size; /* only used for migration */
> +};
> +
> +const VMStateDescription vmstate_spapr_ovec = {
> +    .name = "spapr_option_vector",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_BITMAP(bitmap, sPAPROptionVector, 1, bitmap_size),
> +        VMSTATE_END_OF_LIST()
> +    }
>  };
> 
>  sPAPROptionVector *spapr_ovec_new(void)
> @@ -45,6 +56,7 @@ sPAPROptionVector *spapr_ovec_new(void)
> 
>      ov = g_new0(sPAPROptionVector, 1);
>      ov->bitmap = bitmap_new(OV_MAXBITS);
> +    ov->bitmap_size = OV_MAXBITS;
> 
>      return ov;
>  }
> diff --git a/include/hw/ppc/spapr_ovec.h b/include/hw/ppc/spapr_ovec.h
> index 6a06da3..355a344 100644
> --- a/include/hw/ppc/spapr_ovec.h
> +++ b/include/hw/ppc/spapr_ovec.h
> @@ -37,6 +37,7 @@
>  #define _SPAPR_OVEC_H
> 
>  #include "cpu.h"
> +#include "migration/vmstate.h"
> 
>  typedef struct sPAPROptionVector sPAPROptionVector;
> 
> @@ -64,4 +65,7 @@ sPAPROptionVector *spapr_ovec_parse_vector(target_ulong table_addr, int vector);
>  int spapr_ovec_populate_dt(void *fdt, int fdt_offset,
>                             sPAPROptionVector *ov, const char *name);
> 
> +/* migration */
> +extern const VMStateDescription vmstate_spapr_ovec;
> +
>  #endif /* !defined (_SPAPR_OVEC_H) */
> -- 
> 1.9.1
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.8 2/3] migration: spapr_drc: defined VMStateDescription struct
  2016-11-18  6:04   ` David Gibson
@ 2016-11-18 16:32     ` Michael Roth
  0 siblings, 0 replies; 21+ messages in thread
From: Michael Roth @ 2016-11-18 16:32 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-devel, qemu-ppc, duanj, bharata, dgilbert, quintela, amit.shah

Quoting David Gibson (2016-11-18 00:04:33)
> On Thu, Nov 17, 2016 at 07:40:26PM -0600, Michael Roth wrote:
> > From: Jianjun Duan <duanj@linux.vnet.ibm.com>
> > 
> > To manage hotplug/unplug of dynamic resources such as PCI cards,
> > memory, and CPU on sPAPR guests, a firmware abstraction known as
> > a Dynamic Resource Connector (DRC) is used to assign a particular
> > dynamic resource to the guest, and provide an interface for the
> > guest to manage configuration/removal of the resource associated
> > with it.
> > 
> > To migrate the hotplugged resources in migration, the
> > associated DRC state need be migrated. To migrate the DRC state,
> > we defined the VMStateDescription struct for spapr_drc to enable
> > the transmission of spapr_drc state in migration.
> > 
> > Not all the elements in the DRC state are migrated. Only those
> > ones modifiable or needed by guest actions or device add/remove
> > operation are migrated. From the perspective of device
> > hotplugging, if we hotplug a device on the source, we need to
> > "coldplug" it on the target. The states across two hosts for the
> > same device are not the same. Ideally we want the states be same
> > after migration so that the device would function as hotplugged
> > on the target. For example we can unplug it. The minimum DRC
> > state we need to transfer should cover all the pieces changed by
> > hotplugging. Out of the elements of the DRC state, isolation_state,
> > allocation_sate, and configured are involved in the DR state
> > transition diagram from PAPR+ 2.7, 13.4. configured and signalled
> > are needed in attaching and detaching devices. indicator_state
> > provides users with hardware state information. These 6 elements
> > are migrated.
> > 
> > detach_cb in the DRC state is a function pointer that cannot be
> > migrated. We set it right after DRC state is migrated so that
> > a migrated hot-unplug event could finish its work.
> > 
> > The instance_id is used to identify objects in migration. We set
> > instance_id of DRC using the unique index so that it is the same
> > across migration.
> > 
> > Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
> > * add migration for awaiting_allocation state
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/spapr_drc.c         | 70 ++++++++++++++++++++++++++++++++++++++++++++++
> >  hw/ppc/spapr_pci.c         | 22 +++++++++++++++
> >  include/hw/ppc/spapr_drc.h |  9 ++++++
> >  3 files changed, 101 insertions(+)
> > 
> > diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> > index a0c44ee..1ec6551 100644
> > --- a/hw/ppc/spapr_drc.c
> > +++ b/hw/ppc/spapr_drc.c
> > @@ -632,6 +632,72 @@ static void spapr_dr_connector_instance_init(Object *obj)
> >                          NULL, NULL, NULL, NULL);
> >  }
> >  
> > +static bool spapr_drc_needed(void *opaque)
> > +{
> > +    sPAPRDRConnector *drc = (sPAPRDRConnector *)opaque;
> > +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > +    bool rc = false;
> > +    sPAPRDREntitySense value;
> > +
> > +    drck->entity_sense(drc, &value);
> > +    /* If no dev is plugged in there is no need to migrate the DRC state */
> > +    if (value != SPAPR_DR_ENTITY_SENSE_PRESENT) {
> > +        return false;
> > +    }
> > +    /*
> > +     * If there is dev plugged in, we need to migrate the DRC state when
> > +     * it is different from cold-plugged state
> > +     */
> > +    switch(drc->type) {
> > +    /* for PCI type */
> > +    case SPAPR_DR_CONNECTOR_TYPE_PCI:
> > +        rc = !((drc->isolation_state == SPAPR_DR_ISOLATION_STATE_UNISOLATED) &&
> > +               (drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_USABLE) &&
> > +               drc->configured && drc->signalled && !drc->awaiting_release);
> > +        break;
> > +    /* for LMB type */
> > +    case SPAPR_DR_CONNECTOR_TYPE_LMB:
> > +        rc = !((drc->isolation_state == SPAPR_DR_ISOLATION_STATE_ISOLATED) &&
> > +               (drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_UNUSABLE) &&
> > +               drc->configured && drc->signalled && !drc->awaiting_release);
> > +        break;
> > +    default:
> > +        ;
> > +    }
> > +
> > +    return rc;
> > +}
> > +
> > +/* detach_cb needs be set since it is not migrated */
> > +static void postmigrate_set_detach_cb(sPAPRDRConnector *drc,
> > +                                      spapr_drc_detach_cb *detach_cb)
> > +{
> > +    drc->detach_cb = detach_cb;
> > +}
> > +
> > +/* return the unique drc index as instance_id for qom interfaces*/
> > +static int get_instance_id(DeviceState *dev)
> > +{
> > +    return (int)get_index(SPAPR_DR_CONNECTOR(OBJECT(dev)));
> > +}
> > +
> > +static const VMStateDescription vmstate_spapr_drc = {
> > +    .name = "spapr_drc",
> > +    .version_id = 1,
> > +    .minimum_version_id = 1,
> > +    .needed = spapr_drc_needed,
> > +    .fields  = (VMStateField []) {
> > +        VMSTATE_UINT32(isolation_state, sPAPRDRConnector),
> > +        VMSTATE_UINT32(allocation_state, sPAPRDRConnector),
> > +        VMSTATE_UINT32(indicator_state, sPAPRDRConnector),
> > +        VMSTATE_BOOL(configured, sPAPRDRConnector),
> > +        VMSTATE_BOOL(awaiting_release, sPAPRDRConnector),
> > +        VMSTATE_BOOL(awaiting_allocation, sPAPRDRConnector),
> > +        VMSTATE_BOOL(signalled, sPAPRDRConnector),
> > +        VMSTATE_END_OF_LIST()
> > +    }
> > +};
> > +
> >  static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
> >  {
> >      DeviceClass *dk = DEVICE_CLASS(k);
> > @@ -640,6 +706,8 @@ static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
> >      dk->reset = reset;
> >      dk->realize = realize;
> >      dk->unrealize = unrealize;
> > +    dk->vmsd = &vmstate_spapr_drc;
> > +    dk->dev_get_instance_id = get_instance_id;
> >      drck->set_isolation_state = set_isolation_state;
> >      drck->set_indicator_state = set_indicator_state;
> >      drck->set_allocation_state = set_allocation_state;
> > @@ -653,6 +721,8 @@ static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
> >      drck->detach = detach;
> >      drck->release_pending = release_pending;
> >      drck->set_signalled = set_signalled;
> > +    drck->postmigrate_set_detach_cb = postmigrate_set_detach_cb;
> > +
> >      /*
> >       * Reason: it crashes FIXME find and document the real reason
> >       */
> > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > index f9661b7..661f7d8 100644
> > --- a/hw/ppc/spapr_pci.c
> > +++ b/hw/ppc/spapr_pci.c
> > @@ -1638,11 +1638,33 @@ static void spapr_pci_pre_save(void *opaque)
> >      }
> >  }
> >  
> > +/*
> > + * detach_cb in the DRC state is a function pointer that cannot be
> > + * migrated. We set it right after migration so that a migrated
> > + * hot-unplug event could finish its work.
> > + */
> > +static void spapr_pci_set_detach_cb(PCIBus *bus, PCIDevice *pdev,
> > +                                 void *opaque)
> > +{
> > +    sPAPRPHBState *sphb = opaque;
> > +    sPAPRDRConnector *drc = spapr_phb_get_pci_drc(sphb, pdev);
> > +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > +    drck->postmigrate_set_detach_cb(drc, spapr_phb_remove_pci_device_cb);
> > +}
> > +
> >  static int spapr_pci_post_load(void *opaque, int version_id)
> >  {
> >      sPAPRPHBState *sphb = opaque;
> >      gpointer key, value;
> >      int i;
> > +    PCIBus *bus = PCI_HOST_BRIDGE(sphb)->bus;
> > +    unsigned int bus_no = 0;
> > +
> > +    /* Set detach_cb for the drc unconditionally after migration */
> > +    if (bus) {
> > +        pci_for_each_device(bus, pci_bus_num(bus), spapr_pci_set_detach_cb,
> > +                            &bus_no);
> > +    }
> 
> This reconstruction of the detach_cbs is a bit clunky.  As noted
> elsewhere, I'm not going to delay the patches for that, of course.

Agreed, and I thought about pulling it in as part of this series but
was worried some of the changes in the DRC logic might be a bit risky
for hard-freeze. There are a couple other edge-cases in the DRC logic
that we don't fully handle correctly either, so I think it would be
good to introduce some unit tests for the DRC code and do the
refactoring on top of that. Will work on that for 2.9.

> 
> IIRC, this callback is always either NULL, or the detach callback that
> belongs to the particular DRC type.
> 
> So for the sake of migration, I think it would make sense to change
> the representation of the DRC to instead of including a callback
> directly, simply have a boolean flag for whether the callback is
> active at the moment.  When we need to call the callback, we check if
> it is active, and if it is look up the right function for the DRC type
> (we could use subclasses of DRC to do this).

Sub-classes would be pretty elegant... I considered doing that early on
for some of the more core logic, but there seemed to be a lot of overlap
(though the logic seems to be getting more and more type-specific as
time goes on). For the attach/detach callbacks the distinctions are much
clearer though...

Another approach would be to register all callbacks as part of the
DRC creation. Having control over the callbacks at the DRC creation
callsites might have some benefits WRT doing the unit tests, since
the DRCs remain fairly self-contained. But probably not too hard to
make it work either way.

> 
> >  
> >      for (i = 0; i < sphb->msi_devs_num; ++i) {
> >          key = g_memdup(&sphb->msi_devs[i].key,
> > diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> > index fa531d5..17589c8 100644
> > --- a/include/hw/ppc/spapr_drc.h
> > +++ b/include/hw/ppc/spapr_drc.h
> > @@ -192,6 +192,15 @@ typedef struct sPAPRDRConnectorClass {
> >                     void *detach_cb_opaque, Error **errp);
> >      bool (*release_pending)(sPAPRDRConnector *drc);
> >      void (*set_signalled)(sPAPRDRConnector *drc);
> > +
> > +    /*
> > +     * QEMU interface for setting detach_cb after migration.
> > +     * detach_cb in the DRC state is a function pointer that cannot be
> > +     * migrated. We set it right after migration so that a migrated
> > +     * hot-unplug event could finish its work.
> > +     */
> > +    void (*postmigrate_set_detach_cb)(sPAPRDRConnector *drc,
> > +                                      spapr_drc_detach_cb *detach_cb);
> >  } sPAPRDRConnectorClass;
> >  
> >  sPAPRDRConnector *spapr_dr_connector_new(Object *owner,
> 
> -- 
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.8 0/3] spapr: fix breakage of memory unplug after migration
  2016-11-18  5:45 ` David Gibson
@ 2016-11-18 16:39   ` Michael Roth
  2016-11-20 23:58     ` David Gibson
  0 siblings, 1 reply; 21+ messages in thread
From: Michael Roth @ 2016-11-18 16:39 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-devel, qemu-ppc, duanj, bharata, dgilbert, quintela, amit.shah

Quoting David Gibson (2016-11-17 23:45:05)
> On Thu, Nov 17, 2016 at 07:40:24PM -0600, Michael Roth wrote:
> > These patches are based on David's ppc-for-2.8 tree, and are also
> > available from:
> > 
> >   https://github.com/mdroth/qemu/commits/spapr-cas-migration
> > 
> > Currently, memory hotplugged to a pseries guest cannot be removed after
> > the guest has been migrated. This is due to 2 issues:
> > 
> > 1) The coldplugged state of memory on the target side is one where the
> >    corresponding DRC's allocation state is:
> > 
> >      allocation_state == unallocated,
> >      awaiting_allocation == true,
> > 
> >    When the guest attempts to unplug memory on the target side, it first
> >    checks that allocation_state == allocated. If we fix this, the guest
> >    can successfully notify QEMU of completion on it's end, but then the
> >    DRC code sees that awaiting_allocation == true, so it defers the
> >    finalizing of the LMB and corresponding DIMM since it assumes that
> >    the DIMM must have been previously allocated before it can be removed.
> > 
> >    To address this, we pull in patches 1-2 from Jian Jun's DRC migration
> >    series:
> > 
> >      https://lists.gnu.org/archive/html/qemu-ppc/2016-10/msg00048.html
> > 
> >    with some minor changes relating to prior review comments, and
> >    the addition of migrating the DRC's awaiting_allocation value, which
> >    wasn't part of the original patch. This doesn't address the full scope
> >    of the issues Jian Jun was looking at (involving synchronizing state
> >    when migration occurs during fairly small race windows), just this
> >    particular case, which is more user visible since the time window is
> >    indefinite.
> > 
> > 2) The ability to unplug memory is gated on the QEMU side by a check as
> >    to whether or not support for newer-style hotplug events was negotiated
> >    via CAS during boot. The check is performed by checking the corresponding
> >    entry in the sPAPROptionVector structure. However, since this value isn't
> >    migrated currently, we are unable to unplug until after the guest reboots.
> > 
> >    We address that here by adding migration support for sPAPROptionVectors,
> >    and including the CAS-negotiated vector as part of the migration stream
> >    for any cases where we advertise newer-style hotplug event support to
> >    the guest.
> > 
> > David,
> > 
> > These fixes ended up going out much later than planned. I'm not sure
> > if you're planning another pull for 2.8 or not, and realize there are
> > some patches here not specifically pseries-related so it's
> > understandable if we opt to pursue these for 2.9/2.8.1 instead. But if
> > possible I'm hoping to get these in so that the memory unplug
> > support is fully functional for 2.8.
> 
> Yeah, I'm still expecting to push a few bugfixes in before 2.8.  So,
> I've merged these patches into ppc-for-2.8 (fixing a couple of trivial
> style nits along the way).  I have a couple of comments that I'll make
> on the patches, but they're not important enough to stop these going
> in ASAP.
> 
> Unfortunately, of course, this is not the only migration breakage we
> have at the moment.  I'm presently wrestling with both breakage due to
> changes in the insns_flags masks, and due to the reworking of the mmio
> windows for the PHB.

Ok, thanks for the heads up. FYI I'm still hoping to get the insns_flags
fix in for 2.7.1 (which is a bit behind at this point, should have schedule
and initial tree posted next week though), so I will keep an eye out for
those.

> 
> > 
> > 
> >  hw/core/qdev.c              |  6 +++++-
> >  hw/ppc/spapr.c              | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  hw/ppc/spapr_drc.c          | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  hw/ppc/spapr_ovec.c         | 12 ++++++++++++
> >  hw/ppc/spapr_pci.c          | 22 ++++++++++++++++++++++
> >  include/hw/ppc/spapr_drc.h  |  9 +++++++++
> >  include/hw/ppc/spapr_ovec.h |  4 ++++
> >  include/hw/qdev-core.h      |  9 +++++++++
> >  migration/savevm.c          |  4 ++--
> >  9 files changed, 201 insertions(+), 3 deletions(-)
> > 
> 
> -- 
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.8 3/3] spapr: migration support for CAS-negotiated option vectors
  2016-11-18 16:08   ` Michael Roth
@ 2016-11-20 23:57     ` David Gibson
  0 siblings, 0 replies; 21+ messages in thread
From: David Gibson @ 2016-11-20 23:57 UTC (permalink / raw)
  To: Michael Roth
  Cc: qemu-devel, duanj, quintela, dgilbert, qemu-ppc, bharata, amit.shah

[-- Attachment #1: Type: text/plain, Size: 4350 bytes --]

On Fri, Nov 18, 2016 at 10:08:59AM -0600, Michael Roth wrote:
> Quoting Michael Roth (2016-11-17 19:40:27)
> > With the additional of the OV5_HP_EVT option vector, we now have
> > certain functionality (namely, memory unplug) that checks at run-time
> > for whether or not the guest negotiated the option via CAS. Because
> > we don't currently migrate these negotiated values, we are unable
> > to unplug memory from a guest after it's been migrated until after
> > the guest is rebooted and CAS-negotiation is repeated.
> > 
> > This patch fixes this by adding CAS-negotiated options to the
> > migration stream. We do this using a subsection, since the
> > negotiated value of OV5_HP_EVT is the only option currently needed
> > to maintain proper functionality for a running guest.
> > 
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/spapr.c              | 68 +++++++++++++++++++++++++++++++++++++++++++++
> >  hw/ppc/spapr_ovec.c         | 12 ++++++++
> >  include/hw/ppc/spapr_ovec.h |  4 +++
> >  3 files changed, 84 insertions(+)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 0cbab24..9e08aed 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -1267,6 +1267,70 @@ static bool version_before_3(void *opaque, int version_id)
> >      return version_id < 3;
> >  }
> > 
> > +static bool spapr_ov5_cas_needed(void *opaque)
> > +{
> > +    sPAPRMachineState *spapr = opaque;
> > +    sPAPROptionVector *ov5_mask = spapr_ovec_new();
> > +    sPAPROptionVector *ov5_legacy = spapr_ovec_new();
> > +    sPAPROptionVector *ov5_removed = spapr_ovec_new();
> > +    bool cas_needed;
> > +
> > +    /* Prior to the introduction of sPAPROptionVector, we had two option
> > +     * vectors we dealt with: OV5_FORM1_AFFINITY, and OV5_DRCONF_MEMORY.
> > +     * Both of these options encode machine topology into the device-tree
> > +     * in such a way that the now-booted OS should still be able to interact
> > +     * appropriately with QEMU regardless of what options were actually
> > +     * negotiatied on the source side.
> > +     *
> > +     * As such, we can avoid migrating the CAS-negotiated options if these
> > +     * are the only options available on the current machine/platform.
> > +     * Since these are the only options available for pseries-2.7 and
> > +     * earlier, this allows us to maintain old->new/new->old migration
> > +     * compatibility.
> > +     *
> > +     * For QEMU 2.8+, there are additional CAS-negotiatable options available
> > +     * via default pseries-2.8 machines and explicit command-line parameters.
> > +     * Some of these options, like OV5_HP_EVT, *do* require QEMU to be aware
> > +     * of the actual CAS-negotiated values to continue working properly. For
> > +     * example, availability of memory unplug depends on knowing whether
> > +     * OV5_HP_EVT was negotiated via CAS.
> > +     *
> > +     * Thus, for any cases where the set of available CAS-negotiatable
> > +     * options extends beyond OV5_FORM1_AFFINITY and OV5_DRCONF_MEMORY, we
> > +     * include the CAS-negotiated options in the migration stream.
> > +     */
> > +    spapr_ovec_set(ov5_mask, OV5_FORM1_AFFINITY);
> > +    spapr_ovec_set(ov5_mask, OV5_DRCONF_MEMORY);
> > +
> > +    /* spapr_ovec_diff returns true if bits were removed. we avoid using
> > +     * the mask itself since in the future it's possible "legacy" bits may be
> > +     * removed via machine options, which could generate a false positive
> > +     * that breaks migration.
> > +     */
> > +    spapr_ovec_intersect(ov5_legacy, spapr->ov5, ov5_mask);
> > +    cas_needed = spapr_ovec_diff(ov5_removed, spapr->ov5, ov5_legacy);
> > +
> > +    spapr_ovec_cleanup(ov5_mask);
> > +    spapr_ovec_cleanup(ov5_legacy);
> > +    spapr_ovec_cleanup(ov5_removed);
> > +
> > +    error_report("MIGRATION NEEDED: %d", cas_needed);
> 
> Argh, sorry, I just noticed this stray debug comment that slipped in.
> Would you prefer a v2, or just removing it in-tree?

I've fixed it in tree.  Thanks for pointing it out.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.8 0/3] spapr: fix breakage of memory unplug after migration
  2016-11-18 16:39   ` Michael Roth
@ 2016-11-20 23:58     ` David Gibson
  0 siblings, 0 replies; 21+ messages in thread
From: David Gibson @ 2016-11-20 23:58 UTC (permalink / raw)
  To: Michael Roth
  Cc: qemu-devel, qemu-ppc, duanj, bharata, dgilbert, quintela, amit.shah

[-- Attachment #1: Type: text/plain, Size: 4217 bytes --]

On Fri, Nov 18, 2016 at 10:39:49AM -0600, Michael Roth wrote:
> Quoting David Gibson (2016-11-17 23:45:05)
> > On Thu, Nov 17, 2016 at 07:40:24PM -0600, Michael Roth wrote:
> > > These patches are based on David's ppc-for-2.8 tree, and are also
> > > available from:
> > > 
> > >   https://github.com/mdroth/qemu/commits/spapr-cas-migration
> > > 
> > > Currently, memory hotplugged to a pseries guest cannot be removed after
> > > the guest has been migrated. This is due to 2 issues:
> > > 
> > > 1) The coldplugged state of memory on the target side is one where the
> > >    corresponding DRC's allocation state is:
> > > 
> > >      allocation_state == unallocated,
> > >      awaiting_allocation == true,
> > > 
> > >    When the guest attempts to unplug memory on the target side, it first
> > >    checks that allocation_state == allocated. If we fix this, the guest
> > >    can successfully notify QEMU of completion on it's end, but then the
> > >    DRC code sees that awaiting_allocation == true, so it defers the
> > >    finalizing of the LMB and corresponding DIMM since it assumes that
> > >    the DIMM must have been previously allocated before it can be removed.
> > > 
> > >    To address this, we pull in patches 1-2 from Jian Jun's DRC migration
> > >    series:
> > > 
> > >      https://lists.gnu.org/archive/html/qemu-ppc/2016-10/msg00048.html
> > > 
> > >    with some minor changes relating to prior review comments, and
> > >    the addition of migrating the DRC's awaiting_allocation value, which
> > >    wasn't part of the original patch. This doesn't address the full scope
> > >    of the issues Jian Jun was looking at (involving synchronizing state
> > >    when migration occurs during fairly small race windows), just this
> > >    particular case, which is more user visible since the time window is
> > >    indefinite.
> > > 
> > > 2) The ability to unplug memory is gated on the QEMU side by a check as
> > >    to whether or not support for newer-style hotplug events was negotiated
> > >    via CAS during boot. The check is performed by checking the corresponding
> > >    entry in the sPAPROptionVector structure. However, since this value isn't
> > >    migrated currently, we are unable to unplug until after the guest reboots.
> > > 
> > >    We address that here by adding migration support for sPAPROptionVectors,
> > >    and including the CAS-negotiated vector as part of the migration stream
> > >    for any cases where we advertise newer-style hotplug event support to
> > >    the guest.
> > > 
> > > David,
> > > 
> > > These fixes ended up going out much later than planned. I'm not sure
> > > if you're planning another pull for 2.8 or not, and realize there are
> > > some patches here not specifically pseries-related so it's
> > > understandable if we opt to pursue these for 2.9/2.8.1 instead. But if
> > > possible I'm hoping to get these in so that the memory unplug
> > > support is fully functional for 2.8.
> > 
> > Yeah, I'm still expecting to push a few bugfixes in before 2.8.  So,
> > I've merged these patches into ppc-for-2.8 (fixing a couple of trivial
> > style nits along the way).  I have a couple of comments that I'll make
> > on the patches, but they're not important enough to stop these going
> > in ASAP.
> > 
> > Unfortunately, of course, this is not the only migration breakage we
> > have at the moment.  I'm presently wrestling with both breakage due to
> > changes in the insns_flags masks, and due to the reworking of the mmio
> > windows for the PHB.
> 
> Ok, thanks for the heads up. FYI I'm still hoping to get the insns_flags
> fix in for 2.7.1 (which is a bit behind at this point, should have schedule
> and initial tree posted next week though), so I will keep an eye out for
> those.

Yeah, in addition to being sick, I've had to rethink how to fix these
migration problems, including how to address this for 2.7.1.  I'm
working on it right now.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.8 1/3] migration: alternative way to set instance_id in SaveStateEntry
  2016-11-18  1:40 ` [Qemu-devel] [PATCH for-2.8 1/3] migration: alternative way to set instance_id in SaveStateEntry Michael Roth
@ 2016-11-22  6:15   ` David Gibson
  2016-11-22 10:23     ` Dr. David Alan Gilbert
  2016-11-22 22:58     ` Michael Roth
  0 siblings, 2 replies; 21+ messages in thread
From: David Gibson @ 2016-11-22  6:15 UTC (permalink / raw)
  To: Michael Roth
  Cc: qemu-devel, qemu-ppc, duanj, bharata, dgilbert, quintela, amit.shah

[-- Attachment #1: Type: text/plain, Size: 6387 bytes --]

On Thu, Nov 17, 2016 at 07:40:25PM -0600, Michael Roth wrote:
> From: Jianjun Duan <duanj@linux.vnet.ibm.com>
> 
> Currently migrated Devices are identified with an idstr which is
> calculated automatically using their path in the QOM composition
> tree. In some cases these Devices may have a more reliable
> identifier that may be preferable in situations where there's a
> chance their path in the composition tree might change in the
> future as a resulting of changes in how the device is modeled
> in QEMU.
> 
> One such Device is the "spapr-dr-connector" used to handle hotplug
> for various resources on pseries machine types, where the PAPR
> specification mandates that each DRC have a 32-bit globally unique
> identifier associated with it. As such, this identifier is also ideal
> as a reliable way to identify a particular DRC in the migration
> stream, so we introduce support here for using a caller-side supplied
> instance_id for Devices in preparation for that.
> 
> register_savevm_live() and vmstate_register_with_alias_id() already
> provide a means to let the caller supply an instance_id instead of
> relying on the migration subsystem to generate one automatically,
> but in cases where we're registering SaveVMHandlers or VMSDs
> (respectively) that are associated with a Device, the instance_id is
> ignored in favor of a string identifier based solely on the QOM path.
> 
> This patch generalizes this so that an instance_id can also be
> supplied by the caller for Devices. Since VMSD registration for
> Devices is generally handled automatically by qdev, we also introduce
> a DeviceClass->dev_get_instance_id() method that, when set, is called
> by qdev to obtain the corresponding instance_id that should be used
> for a particular Device. Otherwise we maintain the original behavior
> of passing instance_id == -1 and thus falling back to the previous
> logic of using the QOM path.
> 
> Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
> * moved usage of DeviceClass->dev_get_instance_id() out of savevm.c
>   and into caller (qdev) instead
> * clarified usage/intent in comments and commit msg
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

I've had to remove this patch and 2/3 from ppc-for-2.8, because I
discovered (as I was preparing a pull request) that it causes a weird
breakage.

Specifically, on some, but not all, setups it causes the postcopy-test
to fail with the error:

qemu-system-ppc64: RP: Received invalid message 0x0000 length 0x0000
FAIL

For me, it fails when running on a ppc64le or ppc64 host (RHEL7.3),
and on 32-bit x86 (Fedora container) but not on x86_64 host (Fedora
24).

That's a pretty baffling set of symptoms, and so far I haven't gotten
far in figuring out how it could happen.  But I really want to get the
rest of the patches in ppc-for-2.8 pulled, so I've dropped these for
now.

I'll try to debug this once I've prepared the next pull request, but
if you're able to work out what's going on for me, that would be
extremely helpful.

I do have one vague theory..

> ---
>  hw/core/qdev.c         | 6 +++++-
>  include/hw/qdev-core.h | 9 +++++++++
>  migration/savevm.c     | 4 ++--
>  3 files changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> index 5783442..c8c0c44 100644
> --- a/hw/core/qdev.c
> +++ b/hw/core/qdev.c
> @@ -933,7 +933,11 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
>          }
>  
>          if (qdev_get_vmsd(dev)) {
> -            vmstate_register_with_alias_id(dev, -1, qdev_get_vmsd(dev), dev,
> +            int instance_id = DEVICE_GET_CLASS(dev)->dev_get_instance_id
> +                ? DEVICE_GET_CLASS(dev)->dev_get_instance_id(dev)
> +                : -1;
> +
> +            vmstate_register_with_alias_id(dev, instance_id, qdev_get_vmsd(dev), dev,
>                                             dev->instance_id_alias,
>                                             dev->alias_required_for_version);
>          }
> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> index 2c97347..8ba82af 100644
> --- a/include/hw/qdev-core.h
> +++ b/include/hw/qdev-core.h
> @@ -139,6 +139,15 @@ typedef struct DeviceClass {
>      qdev_initfn init; /* TODO remove, once users are converted to realize */
>      qdev_event exit; /* TODO remove, once users are converted to unrealize */
>      const char *bus_type;
> +
> +    /* When this field is set, instead of using the device's QOM path,
> +     * SaveStateEntry's for devices will be identified using a combination
> +     * of the corresponding VMSD name and an instance_id returned by this
> +     * function. This should only be necessary for situations where the
> +     * QOM path is anticipated to change and a more stable identifier is
> +     * desired to identify a device in the migration stream.
> +     */
> +    int (*dev_get_instance_id)(DeviceState *dev);
>  } DeviceClass;
>  
>  typedef struct NamedGPIOList NamedGPIOList;
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 0363372..a95fff9 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -556,7 +556,7 @@ int register_savevm_live(DeviceState *dev,
>          se->is_ram = 1;
>      }
>  
> -    if (dev) {
> +    if (dev && instance_id == -1) {

.. so, I'm not sure how this could lead to the observed symptoms, but
I did note that adding this condition makes another if within the
block redundant, because it also checks for instance_id == -1.  That
suggests that simply skipping the whole block in this case is probably
not the right thing to do.

>          char *id = qdev_get_dev_path(dev);
>          if (id) {
>              pstrcpy(se->idstr, sizeof(se->idstr), id);
> @@ -640,7 +640,7 @@ int vmstate_register_with_alias_id(DeviceState *dev, int instance_id,
>      se->vmsd = vmsd;
>      se->alias_id = alias_id;
>  
> -    if (dev) {
> +    if (dev && instance_id == -1) {
>          char *id = qdev_get_dev_path(dev);
>          if (id) {
>              pstrcpy(se->idstr, sizeof(se->idstr), id);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.8 1/3] migration: alternative way to set instance_id in SaveStateEntry
  2016-11-22  6:15   ` David Gibson
@ 2016-11-22 10:23     ` Dr. David Alan Gilbert
  2016-11-22 22:58     ` Michael Roth
  1 sibling, 0 replies; 21+ messages in thread
From: Dr. David Alan Gilbert @ 2016-11-22 10:23 UTC (permalink / raw)
  To: David Gibson
  Cc: Michael Roth, qemu-devel, qemu-ppc, duanj, bharata, quintela, amit.shah

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Thu, Nov 17, 2016 at 07:40:25PM -0600, Michael Roth wrote:
> > From: Jianjun Duan <duanj@linux.vnet.ibm.com>
> > 
> > Currently migrated Devices are identified with an idstr which is
> > calculated automatically using their path in the QOM composition
> > tree. In some cases these Devices may have a more reliable
> > identifier that may be preferable in situations where there's a
> > chance their path in the composition tree might change in the
> > future as a resulting of changes in how the device is modeled
> > in QEMU.
> > 
> > One such Device is the "spapr-dr-connector" used to handle hotplug
> > for various resources on pseries machine types, where the PAPR
> > specification mandates that each DRC have a 32-bit globally unique
> > identifier associated with it. As such, this identifier is also ideal
> > as a reliable way to identify a particular DRC in the migration
> > stream, so we introduce support here for using a caller-side supplied
> > instance_id for Devices in preparation for that.
> > 
> > register_savevm_live() and vmstate_register_with_alias_id() already
> > provide a means to let the caller supply an instance_id instead of
> > relying on the migration subsystem to generate one automatically,
> > but in cases where we're registering SaveVMHandlers or VMSDs
> > (respectively) that are associated with a Device, the instance_id is
> > ignored in favor of a string identifier based solely on the QOM path.
> > 
> > This patch generalizes this so that an instance_id can also be
> > supplied by the caller for Devices. Since VMSD registration for
> > Devices is generally handled automatically by qdev, we also introduce
> > a DeviceClass->dev_get_instance_id() method that, when set, is called
> > by qdev to obtain the corresponding instance_id that should be used
> > for a particular Device. Otherwise we maintain the original behavior
> > of passing instance_id == -1 and thus falling back to the previous
> > logic of using the QOM path.
> > 
> > Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
> > * moved usage of DeviceClass->dev_get_instance_id() out of savevm.c
> >   and into caller (qdev) instead
> > * clarified usage/intent in comments and commit msg
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> 
> I've had to remove this patch and 2/3 from ppc-for-2.8, because I
> discovered (as I was preparing a pull request) that it causes a weird
> breakage.
> 
> Specifically, on some, but not all, setups it causes the postcopy-test
> to fail with the error:
> 
> qemu-system-ppc64: RP: Received invalid message 0x0000 length 0x0000
> FAIL

Hmm; that's from the source side of the migration, it probably means the destination
has died and that's hiding the real error somewhere.

Dave


> For me, it fails when running on a ppc64le or ppc64 host (RHEL7.3),
> and on 32-bit x86 (Fedora container) but not on x86_64 host (Fedora
> 24).
> 
> That's a pretty baffling set of symptoms, and so far I haven't gotten
> far in figuring out how it could happen.  But I really want to get the
> rest of the patches in ppc-for-2.8 pulled, so I've dropped these for
> now.
> 
> I'll try to debug this once I've prepared the next pull request, but
> if you're able to work out what's going on for me, that would be
> extremely helpful.
> 
> I do have one vague theory..
> 
> > ---
> >  hw/core/qdev.c         | 6 +++++-
> >  include/hw/qdev-core.h | 9 +++++++++
> >  migration/savevm.c     | 4 ++--
> >  3 files changed, 16 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> > index 5783442..c8c0c44 100644
> > --- a/hw/core/qdev.c
> > +++ b/hw/core/qdev.c
> > @@ -933,7 +933,11 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
> >          }
> >  
> >          if (qdev_get_vmsd(dev)) {
> > -            vmstate_register_with_alias_id(dev, -1, qdev_get_vmsd(dev), dev,
> > +            int instance_id = DEVICE_GET_CLASS(dev)->dev_get_instance_id
> > +                ? DEVICE_GET_CLASS(dev)->dev_get_instance_id(dev)
> > +                : -1;
> > +
> > +            vmstate_register_with_alias_id(dev, instance_id, qdev_get_vmsd(dev), dev,
> >                                             dev->instance_id_alias,
> >                                             dev->alias_required_for_version);
> >          }
> > diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> > index 2c97347..8ba82af 100644
> > --- a/include/hw/qdev-core.h
> > +++ b/include/hw/qdev-core.h
> > @@ -139,6 +139,15 @@ typedef struct DeviceClass {
> >      qdev_initfn init; /* TODO remove, once users are converted to realize */
> >      qdev_event exit; /* TODO remove, once users are converted to unrealize */
> >      const char *bus_type;
> > +
> > +    /* When this field is set, instead of using the device's QOM path,
> > +     * SaveStateEntry's for devices will be identified using a combination
> > +     * of the corresponding VMSD name and an instance_id returned by this
> > +     * function. This should only be necessary for situations where the
> > +     * QOM path is anticipated to change and a more stable identifier is
> > +     * desired to identify a device in the migration stream.
> > +     */
> > +    int (*dev_get_instance_id)(DeviceState *dev);
> >  } DeviceClass;
> >  
> >  typedef struct NamedGPIOList NamedGPIOList;
> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index 0363372..a95fff9 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -556,7 +556,7 @@ int register_savevm_live(DeviceState *dev,
> >          se->is_ram = 1;
> >      }
> >  
> > -    if (dev) {
> > +    if (dev && instance_id == -1) {
> 
> .. so, I'm not sure how this could lead to the observed symptoms, but
> I did note that adding this condition makes another if within the
> block redundant, because it also checks for instance_id == -1.  That
> suggests that simply skipping the whole block in this case is probably
> not the right thing to do.
> 
> >          char *id = qdev_get_dev_path(dev);
> >          if (id) {
> >              pstrcpy(se->idstr, sizeof(se->idstr), id);
> > @@ -640,7 +640,7 @@ int vmstate_register_with_alias_id(DeviceState *dev, int instance_id,
> >      se->vmsd = vmsd;
> >      se->alias_id = alias_id;
> >  
> > -    if (dev) {
> > +    if (dev && instance_id == -1) {
> >          char *id = qdev_get_dev_path(dev);
> >          if (id) {
> >              pstrcpy(se->idstr, sizeof(se->idstr), id);
> 
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH for-2.8 2/3] migration: spapr_drc: defined VMStateDescription struct
  2016-11-18  1:40 ` [Qemu-devel] [PATCH for-2.8 2/3] migration: spapr_drc: defined VMStateDescription struct Michael Roth
  2016-11-18  6:04   ` David Gibson
@ 2016-11-22 16:35   ` Greg Kurz
  2016-11-22 17:24     ` Michael Roth
  1 sibling, 1 reply; 21+ messages in thread
From: Greg Kurz @ 2016-11-22 16:35 UTC (permalink / raw)
  To: Michael Roth
  Cc: qemu-devel, duanj, quintela, dgilbert, qemu-ppc, bharata,
	amit.shah, david

On Thu, 17 Nov 2016 19:40:26 -0600
Michael Roth <mdroth@linux.vnet.ibm.com> wrote:

> From: Jianjun Duan <duanj@linux.vnet.ibm.com>
> 
> To manage hotplug/unplug of dynamic resources such as PCI cards,
> memory, and CPU on sPAPR guests, a firmware abstraction known as
> a Dynamic Resource Connector (DRC) is used to assign a particular
> dynamic resource to the guest, and provide an interface for the
> guest to manage configuration/removal of the resource associated
> with it.
> 
> To migrate the hotplugged resources in migration, the
> associated DRC state need be migrated. To migrate the DRC state,
> we defined the VMStateDescription struct for spapr_drc to enable
> the transmission of spapr_drc state in migration.
> 
> Not all the elements in the DRC state are migrated. Only those
> ones modifiable or needed by guest actions or device add/remove
> operation are migrated. From the perspective of device
> hotplugging, if we hotplug a device on the source, we need to
> "coldplug" it on the target. The states across two hosts for the
> same device are not the same. Ideally we want the states be same
> after migration so that the device would function as hotplugged
> on the target. For example we can unplug it. The minimum DRC
> state we need to transfer should cover all the pieces changed by
> hotplugging. Out of the elements of the DRC state, isolation_state,
> allocation_sate, and configured are involved in the DR state
> transition diagram from PAPR+ 2.7, 13.4. configured and signalled
> are needed in attaching and detaching devices. indicator_state
> provides users with hardware state information. These 6 elements
> are migrated.
> 
> detach_cb in the DRC state is a function pointer that cannot be
> migrated. We set it right after DRC state is migrated so that
> a migrated hot-unplug event could finish its work.
> 
> The instance_id is used to identify objects in migration. We set
> instance_id of DRC using the unique index so that it is the same
> across migration.
> 
> Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
> * add migration for awaiting_allocation state
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_drc.c         | 70 ++++++++++++++++++++++++++++++++++++++++++++++
>  hw/ppc/spapr_pci.c         | 22 +++++++++++++++
>  include/hw/ppc/spapr_drc.h |  9 ++++++
>  3 files changed, 101 insertions(+)
> 
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index a0c44ee..1ec6551 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -632,6 +632,72 @@ static void spapr_dr_connector_instance_init(Object *obj)
>                          NULL, NULL, NULL, NULL);
>  }
>  
> +static bool spapr_drc_needed(void *opaque)
> +{
> +    sPAPRDRConnector *drc = (sPAPRDRConnector *)opaque;
> +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +    bool rc = false;
> +    sPAPRDREntitySense value;
> +
> +    drck->entity_sense(drc, &value);
> +    /* If no dev is plugged in there is no need to migrate the DRC state */
> +    if (value != SPAPR_DR_ENTITY_SENSE_PRESENT) {
> +        return false;
> +    }
> +    /*
> +     * If there is dev plugged in, we need to migrate the DRC state when
> +     * it is different from cold-plugged state
> +     */
> +    switch(drc->type) {
> +    /* for PCI type */
> +    case SPAPR_DR_CONNECTOR_TYPE_PCI:
> +        rc = !((drc->isolation_state == SPAPR_DR_ISOLATION_STATE_UNISOLATED) &&
> +               (drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_USABLE) &&
> +               drc->configured && drc->signalled && !drc->awaiting_release);
> +        break;
> +    /* for LMB type */
> +    case SPAPR_DR_CONNECTOR_TYPE_LMB:
> +        rc = !((drc->isolation_state == SPAPR_DR_ISOLATION_STATE_ISOLATED) &&
> +               (drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_UNUSABLE) &&
> +               drc->configured && drc->signalled && !drc->awaiting_release);
> +        break;
> +    default:
> +        ;
> +    }
> +
> +    return rc;
> +}
> +
> +/* detach_cb needs be set since it is not migrated */
> +static void postmigrate_set_detach_cb(sPAPRDRConnector *drc,
> +                                      spapr_drc_detach_cb *detach_cb)
> +{
> +    drc->detach_cb = detach_cb;
> +}
> +
> +/* return the unique drc index as instance_id for qom interfaces*/
> +static int get_instance_id(DeviceState *dev)
> +{
> +    return (int)get_index(SPAPR_DR_CONNECTOR(OBJECT(dev)));
> +}
> +
> +static const VMStateDescription vmstate_spapr_drc = {
> +    .name = "spapr_drc",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = spapr_drc_needed,
> +    .fields  = (VMStateField []) {
> +        VMSTATE_UINT32(isolation_state, sPAPRDRConnector),
> +        VMSTATE_UINT32(allocation_state, sPAPRDRConnector),
> +        VMSTATE_UINT32(indicator_state, sPAPRDRConnector),
> +        VMSTATE_BOOL(configured, sPAPRDRConnector),
> +        VMSTATE_BOOL(awaiting_release, sPAPRDRConnector),
> +        VMSTATE_BOOL(awaiting_allocation, sPAPRDRConnector),
> +        VMSTATE_BOOL(signalled, sPAPRDRConnector),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
>  static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
>  {
>      DeviceClass *dk = DEVICE_CLASS(k);
> @@ -640,6 +706,8 @@ static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
>      dk->reset = reset;
>      dk->realize = realize;
>      dk->unrealize = unrealize;
> +    dk->vmsd = &vmstate_spapr_drc;
> +    dk->dev_get_instance_id = get_instance_id;
>      drck->set_isolation_state = set_isolation_state;
>      drck->set_indicator_state = set_indicator_state;
>      drck->set_allocation_state = set_allocation_state;
> @@ -653,6 +721,8 @@ static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
>      drck->detach = detach;
>      drck->release_pending = release_pending;
>      drck->set_signalled = set_signalled;
> +    drck->postmigrate_set_detach_cb = postmigrate_set_detach_cb;
> +
>      /*
>       * Reason: it crashes FIXME find and document the real reason
>       */
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index f9661b7..661f7d8 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1638,11 +1638,33 @@ static void spapr_pci_pre_save(void *opaque)
>      }
>  }
>  
> +/*
> + * detach_cb in the DRC state is a function pointer that cannot be
> + * migrated. We set it right after migration so that a migrated
> + * hot-unplug event could finish its work.
> + */
> +static void spapr_pci_set_detach_cb(PCIBus *bus, PCIDevice *pdev,
> +                                 void *opaque)
> +{
> +    sPAPRPHBState *sphb = opaque;
> +    sPAPRDRConnector *drc = spapr_phb_get_pci_drc(sphb, pdev);
> +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);

So we assume here that all PCI devices have an associated DRC, which is
of course wrong for coldplug devices.

> +    drck->postmigrate_set_detach_cb(drc, spapr_phb_remove_pci_device_cb);
> +}
> +
>  static int spapr_pci_post_load(void *opaque, int version_id)
>  {
>      sPAPRPHBState *sphb = opaque;
>      gpointer key, value;
>      int i;
> +    PCIBus *bus = PCI_HOST_BRIDGE(sphb)->bus;
> +    unsigned int bus_no = 0;
> +
> +    /* Set detach_cb for the drc unconditionally after migration */
> +    if (bus) {
> +        pci_for_each_device(bus, pci_bus_num(bus), spapr_pci_set_detach_cb,
> +                            &bus_no);
> +    }
>  
>      for (i = 0; i < sphb->msi_devs_num; ++i) {
>          key = g_memdup(&sphb->msi_devs[i].key,
> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> index fa531d5..17589c8 100644
> --- a/include/hw/ppc/spapr_drc.h
> +++ b/include/hw/ppc/spapr_drc.h
> @@ -192,6 +192,15 @@ typedef struct sPAPRDRConnectorClass {
>                     void *detach_cb_opaque, Error **errp);
>      bool (*release_pending)(sPAPRDRConnector *drc);
>      void (*set_signalled)(sPAPRDRConnector *drc);
> +
> +    /*
> +     * QEMU interface for setting detach_cb after migration.
> +     * detach_cb in the DRC state is a function pointer that cannot be
> +     * migrated. We set it right after migration so that a migrated
> +     * hot-unplug event could finish its work.
> +     */
> +    void (*postmigrate_set_detach_cb)(sPAPRDRConnector *drc,
> +                                      spapr_drc_detach_cb *detach_cb);
>  } sPAPRDRConnectorClass;
>  
>  sPAPRDRConnector *spapr_dr_connector_new(Object *owner,

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH for-2.8 2/3] migration: spapr_drc: defined VMStateDescription struct
  2016-11-22 16:35   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
@ 2016-11-22 17:24     ` Michael Roth
  2016-11-22 17:33       ` Michael Roth
  2016-11-22 20:09       ` Greg Kurz
  0 siblings, 2 replies; 21+ messages in thread
From: Michael Roth @ 2016-11-22 17:24 UTC (permalink / raw)
  To: Greg Kurz
  Cc: qemu-devel, duanj, quintela, dgilbert, qemu-ppc, bharata,
	amit.shah, david

Quoting Greg Kurz (2016-11-22 10:35:52)
> On Thu, 17 Nov 2016 19:40:26 -0600
> Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> 
> > From: Jianjun Duan <duanj@linux.vnet.ibm.com>
> > 
> > To manage hotplug/unplug of dynamic resources such as PCI cards,
> > memory, and CPU on sPAPR guests, a firmware abstraction known as
> > a Dynamic Resource Connector (DRC) is used to assign a particular
> > dynamic resource to the guest, and provide an interface for the
> > guest to manage configuration/removal of the resource associated
> > with it.
> > 
> > To migrate the hotplugged resources in migration, the
> > associated DRC state need be migrated. To migrate the DRC state,
> > we defined the VMStateDescription struct for spapr_drc to enable
> > the transmission of spapr_drc state in migration.
> > 
> > Not all the elements in the DRC state are migrated. Only those
> > ones modifiable or needed by guest actions or device add/remove
> > operation are migrated. From the perspective of device
> > hotplugging, if we hotplug a device on the source, we need to
> > "coldplug" it on the target. The states across two hosts for the
> > same device are not the same. Ideally we want the states be same
> > after migration so that the device would function as hotplugged
> > on the target. For example we can unplug it. The minimum DRC
> > state we need to transfer should cover all the pieces changed by
> > hotplugging. Out of the elements of the DRC state, isolation_state,
> > allocation_sate, and configured are involved in the DR state
> > transition diagram from PAPR+ 2.7, 13.4. configured and signalled
> > are needed in attaching and detaching devices. indicator_state
> > provides users with hardware state information. These 6 elements
> > are migrated.
> > 
> > detach_cb in the DRC state is a function pointer that cannot be
> > migrated. We set it right after DRC state is migrated so that
> > a migrated hot-unplug event could finish its work.
> > 
> > The instance_id is used to identify objects in migration. We set
> > instance_id of DRC using the unique index so that it is the same
> > across migration.
> > 
> > Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
> > * add migration for awaiting_allocation state
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/spapr_drc.c         | 70 ++++++++++++++++++++++++++++++++++++++++++++++
> >  hw/ppc/spapr_pci.c         | 22 +++++++++++++++
> >  include/hw/ppc/spapr_drc.h |  9 ++++++
> >  3 files changed, 101 insertions(+)
> > 
> > diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> > index a0c44ee..1ec6551 100644
> > --- a/hw/ppc/spapr_drc.c
> > +++ b/hw/ppc/spapr_drc.c
> > @@ -632,6 +632,72 @@ static void spapr_dr_connector_instance_init(Object *obj)
> >                          NULL, NULL, NULL, NULL);
> >  }
> >  
> > +static bool spapr_drc_needed(void *opaque)
> > +{
> > +    sPAPRDRConnector *drc = (sPAPRDRConnector *)opaque;
> > +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > +    bool rc = false;
> > +    sPAPRDREntitySense value;
> > +
> > +    drck->entity_sense(drc, &value);
> > +    /* If no dev is plugged in there is no need to migrate the DRC state */
> > +    if (value != SPAPR_DR_ENTITY_SENSE_PRESENT) {
> > +        return false;
> > +    }
> > +    /*
> > +     * If there is dev plugged in, we need to migrate the DRC state when
> > +     * it is different from cold-plugged state
> > +     */
> > +    switch(drc->type) {
> > +    /* for PCI type */
> > +    case SPAPR_DR_CONNECTOR_TYPE_PCI:
> > +        rc = !((drc->isolation_state == SPAPR_DR_ISOLATION_STATE_UNISOLATED) &&
> > +               (drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_USABLE) &&
> > +               drc->configured && drc->signalled && !drc->awaiting_release);
> > +        break;
> > +    /* for LMB type */
> > +    case SPAPR_DR_CONNECTOR_TYPE_LMB:
> > +        rc = !((drc->isolation_state == SPAPR_DR_ISOLATION_STATE_ISOLATED) &&
> > +               (drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_UNUSABLE) &&
> > +               drc->configured && drc->signalled && !drc->awaiting_release);
> > +        break;
> > +    default:
> > +        ;
> > +    }
> > +
> > +    return rc;
> > +}
> > +
> > +/* detach_cb needs be set since it is not migrated */
> > +static void postmigrate_set_detach_cb(sPAPRDRConnector *drc,
> > +                                      spapr_drc_detach_cb *detach_cb)
> > +{
> > +    drc->detach_cb = detach_cb;
> > +}
> > +
> > +/* return the unique drc index as instance_id for qom interfaces*/
> > +static int get_instance_id(DeviceState *dev)
> > +{
> > +    return (int)get_index(SPAPR_DR_CONNECTOR(OBJECT(dev)));
> > +}
> > +
> > +static const VMStateDescription vmstate_spapr_drc = {
> > +    .name = "spapr_drc",
> > +    .version_id = 1,
> > +    .minimum_version_id = 1,
> > +    .needed = spapr_drc_needed,
> > +    .fields  = (VMStateField []) {
> > +        VMSTATE_UINT32(isolation_state, sPAPRDRConnector),
> > +        VMSTATE_UINT32(allocation_state, sPAPRDRConnector),
> > +        VMSTATE_UINT32(indicator_state, sPAPRDRConnector),
> > +        VMSTATE_BOOL(configured, sPAPRDRConnector),
> > +        VMSTATE_BOOL(awaiting_release, sPAPRDRConnector),
> > +        VMSTATE_BOOL(awaiting_allocation, sPAPRDRConnector),
> > +        VMSTATE_BOOL(signalled, sPAPRDRConnector),
> > +        VMSTATE_END_OF_LIST()
> > +    }
> > +};
> > +
> >  static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
> >  {
> >      DeviceClass *dk = DEVICE_CLASS(k);
> > @@ -640,6 +706,8 @@ static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
> >      dk->reset = reset;
> >      dk->realize = realize;
> >      dk->unrealize = unrealize;
> > +    dk->vmsd = &vmstate_spapr_drc;
> > +    dk->dev_get_instance_id = get_instance_id;
> >      drck->set_isolation_state = set_isolation_state;
> >      drck->set_indicator_state = set_indicator_state;
> >      drck->set_allocation_state = set_allocation_state;
> > @@ -653,6 +721,8 @@ static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
> >      drck->detach = detach;
> >      drck->release_pending = release_pending;
> >      drck->set_signalled = set_signalled;
> > +    drck->postmigrate_set_detach_cb = postmigrate_set_detach_cb;
> > +
> >      /*
> >       * Reason: it crashes FIXME find and document the real reason
> >       */
> > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > index f9661b7..661f7d8 100644
> > --- a/hw/ppc/spapr_pci.c
> > +++ b/hw/ppc/spapr_pci.c
> > @@ -1638,11 +1638,33 @@ static void spapr_pci_pre_save(void *opaque)
> >      }
> >  }
> >  
> > +/*
> > + * detach_cb in the DRC state is a function pointer that cannot be
> > + * migrated. We set it right after migration so that a migrated
> > + * hot-unplug event could finish its work.
> > + */
> > +static void spapr_pci_set_detach_cb(PCIBus *bus, PCIDevice *pdev,
> > +                                 void *opaque)
> > +{
> > +    sPAPRPHBState *sphb = opaque;
> > +    sPAPRDRConnector *drc = spapr_phb_get_pci_drc(sphb, pdev);
> > +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> 
> So we assume here that all PCI devices have an associated DRC, which is
> of course wrong for coldplug devices.

In spapr_phb_realize we do actually create a DRC for each possible PCI
device:

   /* allocate connectors for child PCI devices */
    if (sphb->dr_enabled) {
        for (i = 0; i < PCI_SLOT_MAX * 8; i++) {
            spapr_dr_connector_new(OBJECT(phb),
                                   SPAPR_DR_CONNECTOR_TYPE_PCI,
                                   (sphb->index << 16) | i);
        }
    }

This is so coldplugged devices still have a mechanism for hot unplug
later. Is there another scenario that I'm missing?

However, now that I notice the sphb->dr_enabled that does make me
concerned that this assumption will not hold for older machine types
with dr disabled. Will make sure to check on that (and the postcopy
test issue) before re-submitting.

> 
> > +    drck->postmigrate_set_detach_cb(drc, spapr_phb_remove_pci_device_cb);
> > +}
> > +
> >  static int spapr_pci_post_load(void *opaque, int version_id)
> >  {
> >      sPAPRPHBState *sphb = opaque;
> >      gpointer key, value;
> >      int i;
> > +    PCIBus *bus = PCI_HOST_BRIDGE(sphb)->bus;
> > +    unsigned int bus_no = 0;
> > +
> > +    /* Set detach_cb for the drc unconditionally after migration */
> > +    if (bus) {
> > +        pci_for_each_device(bus, pci_bus_num(bus), spapr_pci_set_detach_cb,
> > +                            &bus_no);
> > +    }
> >  
> >      for (i = 0; i < sphb->msi_devs_num; ++i) {
> >          key = g_memdup(&sphb->msi_devs[i].key,
> > diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> > index fa531d5..17589c8 100644
> > --- a/include/hw/ppc/spapr_drc.h
> > +++ b/include/hw/ppc/spapr_drc.h
> > @@ -192,6 +192,15 @@ typedef struct sPAPRDRConnectorClass {
> >                     void *detach_cb_opaque, Error **errp);
> >      bool (*release_pending)(sPAPRDRConnector *drc);
> >      void (*set_signalled)(sPAPRDRConnector *drc);
> > +
> > +    /*
> > +     * QEMU interface for setting detach_cb after migration.
> > +     * detach_cb in the DRC state is a function pointer that cannot be
> > +     * migrated. We set it right after migration so that a migrated
> > +     * hot-unplug event could finish its work.
> > +     */
> > +    void (*postmigrate_set_detach_cb)(sPAPRDRConnector *drc,
> > +                                      spapr_drc_detach_cb *detach_cb);
> >  } sPAPRDRConnectorClass;
> >  
> >  sPAPRDRConnector *spapr_dr_connector_new(Object *owner,
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH for-2.8 2/3] migration: spapr_drc: defined VMStateDescription struct
  2016-11-22 17:24     ` Michael Roth
@ 2016-11-22 17:33       ` Michael Roth
  2016-11-22 21:28         ` Greg Kurz
  2016-11-22 20:09       ` Greg Kurz
  1 sibling, 1 reply; 21+ messages in thread
From: Michael Roth @ 2016-11-22 17:33 UTC (permalink / raw)
  To: Greg Kurz
  Cc: qemu-devel, duanj, quintela, dgilbert, qemu-ppc, bharata,
	amit.shah, david

Quoting Michael Roth (2016-11-22 11:24:23)
> Quoting Greg Kurz (2016-11-22 10:35:52)
> > On Thu, 17 Nov 2016 19:40:26 -0600
> > Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> > 
> > > From: Jianjun Duan <duanj@linux.vnet.ibm.com>
> > > 
> > > To manage hotplug/unplug of dynamic resources such as PCI cards,
> > > memory, and CPU on sPAPR guests, a firmware abstraction known as
> > > a Dynamic Resource Connector (DRC) is used to assign a particular
> > > dynamic resource to the guest, and provide an interface for the
> > > guest to manage configuration/removal of the resource associated
> > > with it.
> > > 
> > > To migrate the hotplugged resources in migration, the
> > > associated DRC state need be migrated. To migrate the DRC state,
> > > we defined the VMStateDescription struct for spapr_drc to enable
> > > the transmission of spapr_drc state in migration.
> > > 
> > > Not all the elements in the DRC state are migrated. Only those
> > > ones modifiable or needed by guest actions or device add/remove
> > > operation are migrated. From the perspective of device
> > > hotplugging, if we hotplug a device on the source, we need to
> > > "coldplug" it on the target. The states across two hosts for the
> > > same device are not the same. Ideally we want the states be same
> > > after migration so that the device would function as hotplugged
> > > on the target. For example we can unplug it. The minimum DRC
> > > state we need to transfer should cover all the pieces changed by
> > > hotplugging. Out of the elements of the DRC state, isolation_state,
> > > allocation_sate, and configured are involved in the DR state
> > > transition diagram from PAPR+ 2.7, 13.4. configured and signalled
> > > are needed in attaching and detaching devices. indicator_state
> > > provides users with hardware state information. These 6 elements
> > > are migrated.
> > > 
> > > detach_cb in the DRC state is a function pointer that cannot be
> > > migrated. We set it right after DRC state is migrated so that
> > > a migrated hot-unplug event could finish its work.
> > > 
> > > The instance_id is used to identify objects in migration. We set
> > > instance_id of DRC using the unique index so that it is the same
> > > across migration.
> > > 
> > > Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
> > > * add migration for awaiting_allocation state
> > > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > > ---
> > >  hw/ppc/spapr_drc.c         | 70 ++++++++++++++++++++++++++++++++++++++++++++++
> > >  hw/ppc/spapr_pci.c         | 22 +++++++++++++++
> > >  include/hw/ppc/spapr_drc.h |  9 ++++++
> > >  3 files changed, 101 insertions(+)
> > > 
> > > diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> > > index a0c44ee..1ec6551 100644
> > > --- a/hw/ppc/spapr_drc.c
> > > +++ b/hw/ppc/spapr_drc.c
> > > @@ -632,6 +632,72 @@ static void spapr_dr_connector_instance_init(Object *obj)
> > >                          NULL, NULL, NULL, NULL);
> > >  }
> > >  
> > > +static bool spapr_drc_needed(void *opaque)
> > > +{
> > > +    sPAPRDRConnector *drc = (sPAPRDRConnector *)opaque;
> > > +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > +    bool rc = false;
> > > +    sPAPRDREntitySense value;
> > > +
> > > +    drck->entity_sense(drc, &value);
> > > +    /* If no dev is plugged in there is no need to migrate the DRC state */
> > > +    if (value != SPAPR_DR_ENTITY_SENSE_PRESENT) {
> > > +        return false;
> > > +    }
> > > +    /*
> > > +     * If there is dev plugged in, we need to migrate the DRC state when
> > > +     * it is different from cold-plugged state
> > > +     */
> > > +    switch(drc->type) {
> > > +    /* for PCI type */
> > > +    case SPAPR_DR_CONNECTOR_TYPE_PCI:
> > > +        rc = !((drc->isolation_state == SPAPR_DR_ISOLATION_STATE_UNISOLATED) &&
> > > +               (drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_USABLE) &&
> > > +               drc->configured && drc->signalled && !drc->awaiting_release);
> > > +        break;
> > > +    /* for LMB type */
> > > +    case SPAPR_DR_CONNECTOR_TYPE_LMB:
> > > +        rc = !((drc->isolation_state == SPAPR_DR_ISOLATION_STATE_ISOLATED) &&
> > > +               (drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_UNUSABLE) &&
> > > +               drc->configured && drc->signalled && !drc->awaiting_release);
> > > +        break;
> > > +    default:
> > > +        ;
> > > +    }
> > > +
> > > +    return rc;
> > > +}
> > > +
> > > +/* detach_cb needs be set since it is not migrated */
> > > +static void postmigrate_set_detach_cb(sPAPRDRConnector *drc,
> > > +                                      spapr_drc_detach_cb *detach_cb)
> > > +{
> > > +    drc->detach_cb = detach_cb;
> > > +}
> > > +
> > > +/* return the unique drc index as instance_id for qom interfaces*/
> > > +static int get_instance_id(DeviceState *dev)
> > > +{
> > > +    return (int)get_index(SPAPR_DR_CONNECTOR(OBJECT(dev)));
> > > +}
> > > +
> > > +static const VMStateDescription vmstate_spapr_drc = {
> > > +    .name = "spapr_drc",
> > > +    .version_id = 1,
> > > +    .minimum_version_id = 1,
> > > +    .needed = spapr_drc_needed,
> > > +    .fields  = (VMStateField []) {
> > > +        VMSTATE_UINT32(isolation_state, sPAPRDRConnector),
> > > +        VMSTATE_UINT32(allocation_state, sPAPRDRConnector),
> > > +        VMSTATE_UINT32(indicator_state, sPAPRDRConnector),
> > > +        VMSTATE_BOOL(configured, sPAPRDRConnector),
> > > +        VMSTATE_BOOL(awaiting_release, sPAPRDRConnector),
> > > +        VMSTATE_BOOL(awaiting_allocation, sPAPRDRConnector),
> > > +        VMSTATE_BOOL(signalled, sPAPRDRConnector),
> > > +        VMSTATE_END_OF_LIST()
> > > +    }
> > > +};
> > > +
> > >  static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
> > >  {
> > >      DeviceClass *dk = DEVICE_CLASS(k);
> > > @@ -640,6 +706,8 @@ static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
> > >      dk->reset = reset;
> > >      dk->realize = realize;
> > >      dk->unrealize = unrealize;
> > > +    dk->vmsd = &vmstate_spapr_drc;
> > > +    dk->dev_get_instance_id = get_instance_id;
> > >      drck->set_isolation_state = set_isolation_state;
> > >      drck->set_indicator_state = set_indicator_state;
> > >      drck->set_allocation_state = set_allocation_state;
> > > @@ -653,6 +721,8 @@ static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
> > >      drck->detach = detach;
> > >      drck->release_pending = release_pending;
> > >      drck->set_signalled = set_signalled;
> > > +    drck->postmigrate_set_detach_cb = postmigrate_set_detach_cb;
> > > +
> > >      /*
> > >       * Reason: it crashes FIXME find and document the real reason
> > >       */
> > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > > index f9661b7..661f7d8 100644
> > > --- a/hw/ppc/spapr_pci.c
> > > +++ b/hw/ppc/spapr_pci.c
> > > @@ -1638,11 +1638,33 @@ static void spapr_pci_pre_save(void *opaque)
> > >      }
> > >  }
> > >  
> > > +/*
> > > + * detach_cb in the DRC state is a function pointer that cannot be
> > > + * migrated. We set it right after migration so that a migrated
> > > + * hot-unplug event could finish its work.
> > > + */
> > > +static void spapr_pci_set_detach_cb(PCIBus *bus, PCIDevice *pdev,
> > > +                                 void *opaque)
> > > +{
> > > +    sPAPRPHBState *sphb = opaque;
> > > +    sPAPRDRConnector *drc = spapr_phb_get_pci_drc(sphb, pdev);
> > > +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > 
> > So we assume here that all PCI devices have an associated DRC, which is
> > of course wrong for coldplug devices.
> 
> In spapr_phb_realize we do actually create a DRC for each possible PCI
> device:
> 
>    /* allocate connectors for child PCI devices */
>     if (sphb->dr_enabled) {
>         for (i = 0; i < PCI_SLOT_MAX * 8; i++) {
>             spapr_dr_connector_new(OBJECT(phb),
>                                    SPAPR_DR_CONNECTOR_TYPE_PCI,
>                                    (sphb->index << 16) | i);
>         }
>     }
> 
> This is so coldplugged devices still have a mechanism for hot unplug
> later. Is there another scenario that I'm missing?
> 
> However, now that I notice the sphb->dr_enabled that does make me
> concerned that this assumption will not hold for older machine types
> with dr disabled. Will make sure to check on that (and the postcopy
> test issue) before re-submitting.

I'm also just noticing that the post-migrate hook to set the detach_cb
is only necessary for cases where migration occurs after device_del
was issued on the source (but before the device has been released
by the guest).  This is part of the "race window" mentioned in the
summary and not actually within the scope of what this series is trying
to fix. In our scenario the device_del gets issued on the target side,
which is when the detach_cb's are set prior to sending unplug event to
guest, so we don't need the post-migrate hook.

I'll go ahead and pull it out completely since there were other
discussions about better ways to approach this anyway.

> 
> > 
> > > +    drck->postmigrate_set_detach_cb(drc, spapr_phb_remove_pci_device_cb);
> > > +}
> > > +
> > >  static int spapr_pci_post_load(void *opaque, int version_id)
> > >  {
> > >      sPAPRPHBState *sphb = opaque;
> > >      gpointer key, value;
> > >      int i;
> > > +    PCIBus *bus = PCI_HOST_BRIDGE(sphb)->bus;
> > > +    unsigned int bus_no = 0;
> > > +
> > > +    /* Set detach_cb for the drc unconditionally after migration */
> > > +    if (bus) {
> > > +        pci_for_each_device(bus, pci_bus_num(bus), spapr_pci_set_detach_cb,
> > > +                            &bus_no);
> > > +    }
> > >  
> > >      for (i = 0; i < sphb->msi_devs_num; ++i) {
> > >          key = g_memdup(&sphb->msi_devs[i].key,
> > > diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> > > index fa531d5..17589c8 100644
> > > --- a/include/hw/ppc/spapr_drc.h
> > > +++ b/include/hw/ppc/spapr_drc.h
> > > @@ -192,6 +192,15 @@ typedef struct sPAPRDRConnectorClass {
> > >                     void *detach_cb_opaque, Error **errp);
> > >      bool (*release_pending)(sPAPRDRConnector *drc);
> > >      void (*set_signalled)(sPAPRDRConnector *drc);
> > > +
> > > +    /*
> > > +     * QEMU interface for setting detach_cb after migration.
> > > +     * detach_cb in the DRC state is a function pointer that cannot be
> > > +     * migrated. We set it right after migration so that a migrated
> > > +     * hot-unplug event could finish its work.
> > > +     */
> > > +    void (*postmigrate_set_detach_cb)(sPAPRDRConnector *drc,
> > > +                                      spapr_drc_detach_cb *detach_cb);
> > >  } sPAPRDRConnectorClass;
> > >  
> > >  sPAPRDRConnector *spapr_dr_connector_new(Object *owner,
> > 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH for-2.8 2/3] migration: spapr_drc: defined VMStateDescription struct
  2016-11-22 17:24     ` Michael Roth
  2016-11-22 17:33       ` Michael Roth
@ 2016-11-22 20:09       ` Greg Kurz
  1 sibling, 0 replies; 21+ messages in thread
From: Greg Kurz @ 2016-11-22 20:09 UTC (permalink / raw)
  To: Michael Roth
  Cc: qemu-devel, duanj, quintela, dgilbert, qemu-ppc, bharata,
	amit.shah, david

On Tue, 22 Nov 2016 11:24:23 -0600
Michael Roth <mdroth@linux.vnet.ibm.com> wrote:

> Quoting Greg Kurz (2016-11-22 10:35:52)
> > On Thu, 17 Nov 2016 19:40:26 -0600
> > Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> >   
> > > From: Jianjun Duan <duanj@linux.vnet.ibm.com>
> > > 
> > > To manage hotplug/unplug of dynamic resources such as PCI cards,
> > > memory, and CPU on sPAPR guests, a firmware abstraction known as
> > > a Dynamic Resource Connector (DRC) is used to assign a particular
> > > dynamic resource to the guest, and provide an interface for the
> > > guest to manage configuration/removal of the resource associated
> > > with it.
> > > 
> > > To migrate the hotplugged resources in migration, the
> > > associated DRC state need be migrated. To migrate the DRC state,
> > > we defined the VMStateDescription struct for spapr_drc to enable
> > > the transmission of spapr_drc state in migration.
> > > 
> > > Not all the elements in the DRC state are migrated. Only those
> > > ones modifiable or needed by guest actions or device add/remove
> > > operation are migrated. From the perspective of device
> > > hotplugging, if we hotplug a device on the source, we need to
> > > "coldplug" it on the target. The states across two hosts for the
> > > same device are not the same. Ideally we want the states be same
> > > after migration so that the device would function as hotplugged
> > > on the target. For example we can unplug it. The minimum DRC
> > > state we need to transfer should cover all the pieces changed by
> > > hotplugging. Out of the elements of the DRC state, isolation_state,
> > > allocation_sate, and configured are involved in the DR state
> > > transition diagram from PAPR+ 2.7, 13.4. configured and signalled
> > > are needed in attaching and detaching devices. indicator_state
> > > provides users with hardware state information. These 6 elements
> > > are migrated.
> > > 
> > > detach_cb in the DRC state is a function pointer that cannot be
> > > migrated. We set it right after DRC state is migrated so that
> > > a migrated hot-unplug event could finish its work.
> > > 
> > > The instance_id is used to identify objects in migration. We set
> > > instance_id of DRC using the unique index so that it is the same
> > > across migration.
> > > 
> > > Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
> > > * add migration for awaiting_allocation state
> > > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > > ---
> > >  hw/ppc/spapr_drc.c         | 70 ++++++++++++++++++++++++++++++++++++++++++++++
> > >  hw/ppc/spapr_pci.c         | 22 +++++++++++++++
> > >  include/hw/ppc/spapr_drc.h |  9 ++++++
> > >  3 files changed, 101 insertions(+)
> > > 
> > > diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> > > index a0c44ee..1ec6551 100644
> > > --- a/hw/ppc/spapr_drc.c
> > > +++ b/hw/ppc/spapr_drc.c
> > > @@ -632,6 +632,72 @@ static void spapr_dr_connector_instance_init(Object *obj)
> > >                          NULL, NULL, NULL, NULL);
> > >  }
> > >  
> > > +static bool spapr_drc_needed(void *opaque)
> > > +{
> > > +    sPAPRDRConnector *drc = (sPAPRDRConnector *)opaque;
> > > +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > +    bool rc = false;
> > > +    sPAPRDREntitySense value;
> > > +
> > > +    drck->entity_sense(drc, &value);
> > > +    /* If no dev is plugged in there is no need to migrate the DRC state */
> > > +    if (value != SPAPR_DR_ENTITY_SENSE_PRESENT) {
> > > +        return false;
> > > +    }
> > > +    /*
> > > +     * If there is dev plugged in, we need to migrate the DRC state when
> > > +     * it is different from cold-plugged state
> > > +     */
> > > +    switch(drc->type) {
> > > +    /* for PCI type */
> > > +    case SPAPR_DR_CONNECTOR_TYPE_PCI:
> > > +        rc = !((drc->isolation_state == SPAPR_DR_ISOLATION_STATE_UNISOLATED) &&
> > > +               (drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_USABLE) &&
> > > +               drc->configured && drc->signalled && !drc->awaiting_release);
> > > +        break;
> > > +    /* for LMB type */
> > > +    case SPAPR_DR_CONNECTOR_TYPE_LMB:
> > > +        rc = !((drc->isolation_state == SPAPR_DR_ISOLATION_STATE_ISOLATED) &&
> > > +               (drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_UNUSABLE) &&
> > > +               drc->configured && drc->signalled && !drc->awaiting_release);
> > > +        break;
> > > +    default:
> > > +        ;
> > > +    }
> > > +
> > > +    return rc;
> > > +}
> > > +
> > > +/* detach_cb needs be set since it is not migrated */
> > > +static void postmigrate_set_detach_cb(sPAPRDRConnector *drc,
> > > +                                      spapr_drc_detach_cb *detach_cb)
> > > +{
> > > +    drc->detach_cb = detach_cb;
> > > +}
> > > +
> > > +/* return the unique drc index as instance_id for qom interfaces*/
> > > +static int get_instance_id(DeviceState *dev)
> > > +{
> > > +    return (int)get_index(SPAPR_DR_CONNECTOR(OBJECT(dev)));
> > > +}
> > > +
> > > +static const VMStateDescription vmstate_spapr_drc = {
> > > +    .name = "spapr_drc",
> > > +    .version_id = 1,
> > > +    .minimum_version_id = 1,
> > > +    .needed = spapr_drc_needed,
> > > +    .fields  = (VMStateField []) {
> > > +        VMSTATE_UINT32(isolation_state, sPAPRDRConnector),
> > > +        VMSTATE_UINT32(allocation_state, sPAPRDRConnector),
> > > +        VMSTATE_UINT32(indicator_state, sPAPRDRConnector),
> > > +        VMSTATE_BOOL(configured, sPAPRDRConnector),
> > > +        VMSTATE_BOOL(awaiting_release, sPAPRDRConnector),
> > > +        VMSTATE_BOOL(awaiting_allocation, sPAPRDRConnector),
> > > +        VMSTATE_BOOL(signalled, sPAPRDRConnector),
> > > +        VMSTATE_END_OF_LIST()
> > > +    }
> > > +};
> > > +
> > >  static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
> > >  {
> > >      DeviceClass *dk = DEVICE_CLASS(k);
> > > @@ -640,6 +706,8 @@ static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
> > >      dk->reset = reset;
> > >      dk->realize = realize;
> > >      dk->unrealize = unrealize;
> > > +    dk->vmsd = &vmstate_spapr_drc;
> > > +    dk->dev_get_instance_id = get_instance_id;
> > >      drck->set_isolation_state = set_isolation_state;
> > >      drck->set_indicator_state = set_indicator_state;
> > >      drck->set_allocation_state = set_allocation_state;
> > > @@ -653,6 +721,8 @@ static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
> > >      drck->detach = detach;
> > >      drck->release_pending = release_pending;
> > >      drck->set_signalled = set_signalled;
> > > +    drck->postmigrate_set_detach_cb = postmigrate_set_detach_cb;
> > > +
> > >      /*
> > >       * Reason: it crashes FIXME find and document the real reason
> > >       */
> > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > > index f9661b7..661f7d8 100644
> > > --- a/hw/ppc/spapr_pci.c
> > > +++ b/hw/ppc/spapr_pci.c
> > > @@ -1638,11 +1638,33 @@ static void spapr_pci_pre_save(void *opaque)
> > >      }
> > >  }
> > >  
> > > +/*
> > > + * detach_cb in the DRC state is a function pointer that cannot be
> > > + * migrated. We set it right after migration so that a migrated
> > > + * hot-unplug event could finish its work.
> > > + */
> > > +static void spapr_pci_set_detach_cb(PCIBus *bus, PCIDevice *pdev,
> > > +                                 void *opaque)
> > > +{
> > > +    sPAPRPHBState *sphb = opaque;
> > > +    sPAPRDRConnector *drc = spapr_phb_get_pci_drc(sphb, pdev);
> > > +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);  
> > 
> > So we assume here that all PCI devices have an associated DRC, which is
> > of course wrong for coldplug devices.  
> 
> In spapr_phb_realize we do actually create a DRC for each possible PCI
> device:
> 
>    /* allocate connectors for child PCI devices */
>     if (sphb->dr_enabled) {
>         for (i = 0; i < PCI_SLOT_MAX * 8; i++) {
>             spapr_dr_connector_new(OBJECT(phb),
>                                    SPAPR_DR_CONNECTOR_TYPE_PCI,
>                                    (sphb->index << 16) | i);
>         }
>     }
> 
> This is so coldplugged devices still have a mechanism for hot unplug
> later. Is there another scenario that I'm missing?
> 
> However, now that I notice the sphb->dr_enabled that does make me
> concerned that this assumption will not hold for older machine types
> with dr disabled. Will make sure to check on that (and the postcopy
> test issue) before re-submitting.
> 

Ok, I was testing migration of a pseries-2.6 machine when I hit the issue. :)
Checking sphb->dr_enabled in spapr_pci_set_detach_cb() looks like the right
thing to do indeed.

Cheers.

--
Greg

> >   
> > > +    drck->postmigrate_set_detach_cb(drc, spapr_phb_remove_pci_device_cb);
> > > +}
> > > +
> > >  static int spapr_pci_post_load(void *opaque, int version_id)
> > >  {
> > >      sPAPRPHBState *sphb = opaque;
> > >      gpointer key, value;
> > >      int i;
> > > +    PCIBus *bus = PCI_HOST_BRIDGE(sphb)->bus;
> > > +    unsigned int bus_no = 0;
> > > +
> > > +    /* Set detach_cb for the drc unconditionally after migration */
> > > +    if (bus) {
> > > +        pci_for_each_device(bus, pci_bus_num(bus), spapr_pci_set_detach_cb,
> > > +                            &bus_no);
> > > +    }
> > >  
> > >      for (i = 0; i < sphb->msi_devs_num; ++i) {
> > >          key = g_memdup(&sphb->msi_devs[i].key,
> > > diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> > > index fa531d5..17589c8 100644
> > > --- a/include/hw/ppc/spapr_drc.h
> > > +++ b/include/hw/ppc/spapr_drc.h
> > > @@ -192,6 +192,15 @@ typedef struct sPAPRDRConnectorClass {
> > >                     void *detach_cb_opaque, Error **errp);
> > >      bool (*release_pending)(sPAPRDRConnector *drc);
> > >      void (*set_signalled)(sPAPRDRConnector *drc);
> > > +
> > > +    /*
> > > +     * QEMU interface for setting detach_cb after migration.
> > > +     * detach_cb in the DRC state is a function pointer that cannot be
> > > +     * migrated. We set it right after migration so that a migrated
> > > +     * hot-unplug event could finish its work.
> > > +     */
> > > +    void (*postmigrate_set_detach_cb)(sPAPRDRConnector *drc,
> > > +                                      spapr_drc_detach_cb *detach_cb);
> > >  } sPAPRDRConnectorClass;
> > >  
> > >  sPAPRDRConnector *spapr_dr_connector_new(Object *owner,  
> >   
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH for-2.8 2/3] migration: spapr_drc: defined VMStateDescription struct
  2016-11-22 17:33       ` Michael Roth
@ 2016-11-22 21:28         ` Greg Kurz
  0 siblings, 0 replies; 21+ messages in thread
From: Greg Kurz @ 2016-11-22 21:28 UTC (permalink / raw)
  To: Michael Roth
  Cc: duanj, quintela, qemu-devel, dgilbert, qemu-ppc, bharata,
	amit.shah, david

On Tue, 22 Nov 2016 11:33:44 -0600
Michael Roth <mdroth@linux.vnet.ibm.com> wrote:

> Quoting Michael Roth (2016-11-22 11:24:23)
> > Quoting Greg Kurz (2016-11-22 10:35:52)  
> > > On Thu, 17 Nov 2016 19:40:26 -0600
> > > Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> > >   
> > > > From: Jianjun Duan <duanj@linux.vnet.ibm.com>
> > > > 
> > > > To manage hotplug/unplug of dynamic resources such as PCI cards,
> > > > memory, and CPU on sPAPR guests, a firmware abstraction known as
> > > > a Dynamic Resource Connector (DRC) is used to assign a particular
> > > > dynamic resource to the guest, and provide an interface for the
> > > > guest to manage configuration/removal of the resource associated
> > > > with it.
> > > > 
> > > > To migrate the hotplugged resources in migration, the
> > > > associated DRC state need be migrated. To migrate the DRC state,
> > > > we defined the VMStateDescription struct for spapr_drc to enable
> > > > the transmission of spapr_drc state in migration.
> > > > 
> > > > Not all the elements in the DRC state are migrated. Only those
> > > > ones modifiable or needed by guest actions or device add/remove
> > > > operation are migrated. From the perspective of device
> > > > hotplugging, if we hotplug a device on the source, we need to
> > > > "coldplug" it on the target. The states across two hosts for the
> > > > same device are not the same. Ideally we want the states be same
> > > > after migration so that the device would function as hotplugged
> > > > on the target. For example we can unplug it. The minimum DRC
> > > > state we need to transfer should cover all the pieces changed by
> > > > hotplugging. Out of the elements of the DRC state, isolation_state,
> > > > allocation_sate, and configured are involved in the DR state
> > > > transition diagram from PAPR+ 2.7, 13.4. configured and signalled
> > > > are needed in attaching and detaching devices. indicator_state
> > > > provides users with hardware state information. These 6 elements
> > > > are migrated.
> > > > 
> > > > detach_cb in the DRC state is a function pointer that cannot be
> > > > migrated. We set it right after DRC state is migrated so that
> > > > a migrated hot-unplug event could finish its work.
> > > > 
> > > > The instance_id is used to identify objects in migration. We set
> > > > instance_id of DRC using the unique index so that it is the same
> > > > across migration.
> > > > 
> > > > Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
> > > > * add migration for awaiting_allocation state
> > > > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > > > ---
> > > >  hw/ppc/spapr_drc.c         | 70 ++++++++++++++++++++++++++++++++++++++++++++++
> > > >  hw/ppc/spapr_pci.c         | 22 +++++++++++++++
> > > >  include/hw/ppc/spapr_drc.h |  9 ++++++
> > > >  3 files changed, 101 insertions(+)
> > > > 
> > > > diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> > > > index a0c44ee..1ec6551 100644
> > > > --- a/hw/ppc/spapr_drc.c
> > > > +++ b/hw/ppc/spapr_drc.c
> > > > @@ -632,6 +632,72 @@ static void spapr_dr_connector_instance_init(Object *obj)
> > > >                          NULL, NULL, NULL, NULL);
> > > >  }
> > > >  
> > > > +static bool spapr_drc_needed(void *opaque)
> > > > +{
> > > > +    sPAPRDRConnector *drc = (sPAPRDRConnector *)opaque;
> > > > +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > > +    bool rc = false;
> > > > +    sPAPRDREntitySense value;
> > > > +
> > > > +    drck->entity_sense(drc, &value);
> > > > +    /* If no dev is plugged in there is no need to migrate the DRC state */
> > > > +    if (value != SPAPR_DR_ENTITY_SENSE_PRESENT) {
> > > > +        return false;
> > > > +    }
> > > > +    /*
> > > > +     * If there is dev plugged in, we need to migrate the DRC state when
> > > > +     * it is different from cold-plugged state
> > > > +     */
> > > > +    switch(drc->type) {
> > > > +    /* for PCI type */
> > > > +    case SPAPR_DR_CONNECTOR_TYPE_PCI:
> > > > +        rc = !((drc->isolation_state == SPAPR_DR_ISOLATION_STATE_UNISOLATED) &&
> > > > +               (drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_USABLE) &&
> > > > +               drc->configured && drc->signalled && !drc->awaiting_release);
> > > > +        break;
> > > > +    /* for LMB type */
> > > > +    case SPAPR_DR_CONNECTOR_TYPE_LMB:
> > > > +        rc = !((drc->isolation_state == SPAPR_DR_ISOLATION_STATE_ISOLATED) &&
> > > > +               (drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_UNUSABLE) &&
> > > > +               drc->configured && drc->signalled && !drc->awaiting_release);
> > > > +        break;
> > > > +    default:
> > > > +        ;
> > > > +    }
> > > > +
> > > > +    return rc;
> > > > +}
> > > > +
> > > > +/* detach_cb needs be set since it is not migrated */
> > > > +static void postmigrate_set_detach_cb(sPAPRDRConnector *drc,
> > > > +                                      spapr_drc_detach_cb *detach_cb)
> > > > +{
> > > > +    drc->detach_cb = detach_cb;
> > > > +}
> > > > +
> > > > +/* return the unique drc index as instance_id for qom interfaces*/
> > > > +static int get_instance_id(DeviceState *dev)
> > > > +{
> > > > +    return (int)get_index(SPAPR_DR_CONNECTOR(OBJECT(dev)));
> > > > +}
> > > > +
> > > > +static const VMStateDescription vmstate_spapr_drc = {
> > > > +    .name = "spapr_drc",
> > > > +    .version_id = 1,
> > > > +    .minimum_version_id = 1,
> > > > +    .needed = spapr_drc_needed,
> > > > +    .fields  = (VMStateField []) {
> > > > +        VMSTATE_UINT32(isolation_state, sPAPRDRConnector),
> > > > +        VMSTATE_UINT32(allocation_state, sPAPRDRConnector),
> > > > +        VMSTATE_UINT32(indicator_state, sPAPRDRConnector),
> > > > +        VMSTATE_BOOL(configured, sPAPRDRConnector),
> > > > +        VMSTATE_BOOL(awaiting_release, sPAPRDRConnector),
> > > > +        VMSTATE_BOOL(awaiting_allocation, sPAPRDRConnector),
> > > > +        VMSTATE_BOOL(signalled, sPAPRDRConnector),
> > > > +        VMSTATE_END_OF_LIST()
> > > > +    }
> > > > +};
> > > > +
> > > >  static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
> > > >  {
> > > >      DeviceClass *dk = DEVICE_CLASS(k);
> > > > @@ -640,6 +706,8 @@ static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
> > > >      dk->reset = reset;
> > > >      dk->realize = realize;
> > > >      dk->unrealize = unrealize;
> > > > +    dk->vmsd = &vmstate_spapr_drc;
> > > > +    dk->dev_get_instance_id = get_instance_id;
> > > >      drck->set_isolation_state = set_isolation_state;
> > > >      drck->set_indicator_state = set_indicator_state;
> > > >      drck->set_allocation_state = set_allocation_state;
> > > > @@ -653,6 +721,8 @@ static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
> > > >      drck->detach = detach;
> > > >      drck->release_pending = release_pending;
> > > >      drck->set_signalled = set_signalled;
> > > > +    drck->postmigrate_set_detach_cb = postmigrate_set_detach_cb;
> > > > +
> > > >      /*
> > > >       * Reason: it crashes FIXME find and document the real reason
> > > >       */
> > > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > > > index f9661b7..661f7d8 100644
> > > > --- a/hw/ppc/spapr_pci.c
> > > > +++ b/hw/ppc/spapr_pci.c
> > > > @@ -1638,11 +1638,33 @@ static void spapr_pci_pre_save(void *opaque)
> > > >      }
> > > >  }
> > > >  
> > > > +/*
> > > > + * detach_cb in the DRC state is a function pointer that cannot be
> > > > + * migrated. We set it right after migration so that a migrated
> > > > + * hot-unplug event could finish its work.
> > > > + */
> > > > +static void spapr_pci_set_detach_cb(PCIBus *bus, PCIDevice *pdev,
> > > > +                                 void *opaque)
> > > > +{
> > > > +    sPAPRPHBState *sphb = opaque;
> > > > +    sPAPRDRConnector *drc = spapr_phb_get_pci_drc(sphb, pdev);
> > > > +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);  
> > > 
> > > So we assume here that all PCI devices have an associated DRC, which is
> > > of course wrong for coldplug devices.  
> > 
> > In spapr_phb_realize we do actually create a DRC for each possible PCI
> > device:
> > 
> >    /* allocate connectors for child PCI devices */
> >     if (sphb->dr_enabled) {
> >         for (i = 0; i < PCI_SLOT_MAX * 8; i++) {
> >             spapr_dr_connector_new(OBJECT(phb),
> >                                    SPAPR_DR_CONNECTOR_TYPE_PCI,
> >                                    (sphb->index << 16) | i);
> >         }
> >     }
> > 
> > This is so coldplugged devices still have a mechanism for hot unplug
> > later. Is there another scenario that I'm missing?
> > 
> > However, now that I notice the sphb->dr_enabled that does make me
> > concerned that this assumption will not hold for older machine types
> > with dr disabled. Will make sure to check on that (and the postcopy
> > test issue) before re-submitting.  
> 
> I'm also just noticing that the post-migrate hook to set the detach_cb
> is only necessary for cases where migration occurs after device_del
> was issued on the source (but before the device has been released
> by the guest).  This is part of the "race window" mentioned in the
> summary and not actually within the scope of what this series is trying
> to fix. In our scenario the device_del gets issued on the target side,
> which is when the detach_cb's are set prior to sending unplug event to
> guest, so we don't need the post-migrate hook.
> 
> I'll go ahead and pull it out completely since there were other
> discussions about better ways to approach this anyway.
> 

Ok, I've now read the whole thread and I agree the specific case where
migration happens while unplug is not yet finalized should be handled
separately.

Cheers.

--
Greg

> >   
> > >   
> > > > +    drck->postmigrate_set_detach_cb(drc, spapr_phb_remove_pci_device_cb);
> > > > +}
> > > > +
> > > >  static int spapr_pci_post_load(void *opaque, int version_id)
> > > >  {
> > > >      sPAPRPHBState *sphb = opaque;
> > > >      gpointer key, value;
> > > >      int i;
> > > > +    PCIBus *bus = PCI_HOST_BRIDGE(sphb)->bus;
> > > > +    unsigned int bus_no = 0;
> > > > +
> > > > +    /* Set detach_cb for the drc unconditionally after migration */
> > > > +    if (bus) {
> > > > +        pci_for_each_device(bus, pci_bus_num(bus), spapr_pci_set_detach_cb,
> > > > +                            &bus_no);
> > > > +    }
> > > >  
> > > >      for (i = 0; i < sphb->msi_devs_num; ++i) {
> > > >          key = g_memdup(&sphb->msi_devs[i].key,
> > > > diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> > > > index fa531d5..17589c8 100644
> > > > --- a/include/hw/ppc/spapr_drc.h
> > > > +++ b/include/hw/ppc/spapr_drc.h
> > > > @@ -192,6 +192,15 @@ typedef struct sPAPRDRConnectorClass {
> > > >                     void *detach_cb_opaque, Error **errp);
> > > >      bool (*release_pending)(sPAPRDRConnector *drc);
> > > >      void (*set_signalled)(sPAPRDRConnector *drc);
> > > > +
> > > > +    /*
> > > > +     * QEMU interface for setting detach_cb after migration.
> > > > +     * detach_cb in the DRC state is a function pointer that cannot be
> > > > +     * migrated. We set it right after migration so that a migrated
> > > > +     * hot-unplug event could finish its work.
> > > > +     */
> > > > +    void (*postmigrate_set_detach_cb)(sPAPRDRConnector *drc,
> > > > +                                      spapr_drc_detach_cb *detach_cb);
> > > >  } sPAPRDRConnectorClass;
> > > >  
> > > >  sPAPRDRConnector *spapr_dr_connector_new(Object *owner,  
> > >   
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.8 1/3] migration: alternative way to set instance_id in SaveStateEntry
  2016-11-22  6:15   ` David Gibson
  2016-11-22 10:23     ` Dr. David Alan Gilbert
@ 2016-11-22 22:58     ` Michael Roth
  2016-11-30 22:22       ` Michael Roth
  1 sibling, 1 reply; 21+ messages in thread
From: Michael Roth @ 2016-11-22 22:58 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-devel, qemu-ppc, duanj, bharata, dgilbert, quintela, amit.shah

Quoting David Gibson (2016-11-22 00:15:10)
> On Thu, Nov 17, 2016 at 07:40:25PM -0600, Michael Roth wrote:
> > From: Jianjun Duan <duanj@linux.vnet.ibm.com>
> > 
> > Currently migrated Devices are identified with an idstr which is
> > calculated automatically using their path in the QOM composition
> > tree. In some cases these Devices may have a more reliable
> > identifier that may be preferable in situations where there's a
> > chance their path in the composition tree might change in the
> > future as a resulting of changes in how the device is modeled
> > in QEMU.
> > 
> > One such Device is the "spapr-dr-connector" used to handle hotplug
> > for various resources on pseries machine types, where the PAPR
> > specification mandates that each DRC have a 32-bit globally unique
> > identifier associated with it. As such, this identifier is also ideal
> > as a reliable way to identify a particular DRC in the migration
> > stream, so we introduce support here for using a caller-side supplied
> > instance_id for Devices in preparation for that.
> > 
> > register_savevm_live() and vmstate_register_with_alias_id() already
> > provide a means to let the caller supply an instance_id instead of
> > relying on the migration subsystem to generate one automatically,
> > but in cases where we're registering SaveVMHandlers or VMSDs
> > (respectively) that are associated with a Device, the instance_id is
> > ignored in favor of a string identifier based solely on the QOM path.
> > 
> > This patch generalizes this so that an instance_id can also be
> > supplied by the caller for Devices. Since VMSD registration for
> > Devices is generally handled automatically by qdev, we also introduce
> > a DeviceClass->dev_get_instance_id() method that, when set, is called
> > by qdev to obtain the corresponding instance_id that should be used
> > for a particular Device. Otherwise we maintain the original behavior
> > of passing instance_id == -1 and thus falling back to the previous
> > logic of using the QOM path.
> > 
> > Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
> > * moved usage of DeviceClass->dev_get_instance_id() out of savevm.c
> >   and into caller (qdev) instead
> > * clarified usage/intent in comments and commit msg
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> 
> I've had to remove this patch and 2/3 from ppc-for-2.8, because I
> discovered (as I was preparing a pull request) that it causes a weird
> breakage.

Sorry for the breakage. I will make sure to run make check on ppc64
beforehand next time as well. Though in this case we might've gotten
lucky that it happened to trigger at all...

> 
> Specifically, on some, but not all, setups it causes the postcopy-test
> to fail with the error:
> 
> qemu-system-ppc64: RP: Received invalid message 0x0000 length 0x0000
> FAIL
> 
> For me, it fails when running on a ppc64le or ppc64 host (RHEL7.3),
> and on 32-bit x86 (Fedora container) but not on x86_64 host (Fedora
> 24).
> 
> That's a pretty baffling set of symptoms, and so far I haven't gotten
> far in figuring out how it could happen.  But I really want to get the
> rest of the patches in ppc-for-2.8 pulled, so I've dropped these for
> now.
> 
> I'll try to debug this once I've prepared the next pull request, but
> if you're able to work out what's going on for me, that would be
> extremely helpful.

It turns out all these strange connections might just be red herrings.
I think the problem is here in spapr_pci_post_load:

    unsigned int bus_no = 0;

    /* Set detach_cb for the drc unconditionally after migration */
    if (bus) {
        pci_for_each_device(bus, pci_bus_num(bus), spapr_pci_set_detach_cb,
                            &bus_no);
    }

I think that was a copy/paste artifact from one of the other callsites in
spapr_pci.c. &bus_no is an opaque that spapr_pci_set_detach_cb ends up trying
to use as sPAPRMachineState, which was leading to a NULL pointer dereference
when trying to access the DRC corresponding the the PCIDevice. The search
query is something like:

    return spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_PCI,
                                    (phb->index << 16) |
                                    (busnr << 8) |
                                    devfn);

My guess is that in some cases phb->index just happened to be 0, which would
in most cases lead to a successful query. That might explain the different
results on different architectures.

Based on the some discussion elsehwere is turns out we don't need this
detach cb stuff at all for the unplug issue we're trying to fix here, so
I will strip it out for the next version.

> 
> I do have one vague theory..

I think that needs to be dealt with as well, more comments below.

> 
> > ---
> >  hw/core/qdev.c         | 6 +++++-
> >  include/hw/qdev-core.h | 9 +++++++++
> >  migration/savevm.c     | 4 ++--
> >  3 files changed, 16 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> > index 5783442..c8c0c44 100644
> > --- a/hw/core/qdev.c
> > +++ b/hw/core/qdev.c
> > @@ -933,7 +933,11 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
> >          }
> >  
> >          if (qdev_get_vmsd(dev)) {
> > -            vmstate_register_with_alias_id(dev, -1, qdev_get_vmsd(dev), dev,
> > +            int instance_id = DEVICE_GET_CLASS(dev)->dev_get_instance_id
> > +                ? DEVICE_GET_CLASS(dev)->dev_get_instance_id(dev)
> > +                : -1;
> > +
> > +            vmstate_register_with_alias_id(dev, instance_id, qdev_get_vmsd(dev), dev,
> >                                             dev->instance_id_alias,
> >                                             dev->alias_required_for_version);
> >          }
> > diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> > index 2c97347..8ba82af 100644
> > --- a/include/hw/qdev-core.h
> > +++ b/include/hw/qdev-core.h
> > @@ -139,6 +139,15 @@ typedef struct DeviceClass {
> >      qdev_initfn init; /* TODO remove, once users are converted to realize */
> >      qdev_event exit; /* TODO remove, once users are converted to unrealize */
> >      const char *bus_type;
> > +
> > +    /* When this field is set, instead of using the device's QOM path,
> > +     * SaveStateEntry's for devices will be identified using a combination
> > +     * of the corresponding VMSD name and an instance_id returned by this
> > +     * function. This should only be necessary for situations where the
> > +     * QOM path is anticipated to change and a more stable identifier is
> > +     * desired to identify a device in the migration stream.
> > +     */
> > +    int (*dev_get_instance_id)(DeviceState *dev);
> >  } DeviceClass;
> >  
> >  typedef struct NamedGPIOList NamedGPIOList;
> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index 0363372..a95fff9 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -556,7 +556,7 @@ int register_savevm_live(DeviceState *dev,
> >          se->is_ram = 1;
> >      }
> >  
> > -    if (dev) {
> > +    if (dev && instance_id == -1) {
> 
> .. so, I'm not sure how this could lead to the observed symptoms, but
> I did note that adding this condition makes another if within the
> block redundant, because it also checks for instance_id == -1.  That
> suggests that simply skipping the whole block in this case is probably
> not the right thing to do.

I don't think this is the cause (verified the loading orders didn't
change before/after these patches for the scenario in question), but
that's a good catch:

there was previously se->compat->idstr/instance_id handling for
cases where a we register a VMSD for a Device and provide an explicit
instance_id. This registers an additional check so that we can map
the se->compat values from an older QEMU to the current ID. This
unfortunately doesn't work in the opposite direction so didn't suite
our purposes for controlling identifiers in the outgoing stream, but
by accidentally bypassing it completely there's now a chance old->new
configurations might break for certain devices, so this needs to be
re-thought a bit...

> 
> >          char *id = qdev_get_dev_path(dev);
> >          if (id) {
> >              pstrcpy(se->idstr, sizeof(se->idstr), id);
> > @@ -640,7 +640,7 @@ int vmstate_register_with_alias_id(DeviceState *dev, int instance_id,
> >      se->vmsd = vmsd;
> >      se->alias_id = alias_id;
> >  
> > -    if (dev) {
> > +    if (dev && instance_id == -1) {
> >          char *id = qdev_get_dev_path(dev);
> >          if (id) {
> >              pstrcpy(se->idstr, sizeof(se->idstr), id);
> 
> -- 
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.8 1/3] migration: alternative way to set instance_id in SaveStateEntry
  2016-11-22 22:58     ` Michael Roth
@ 2016-11-30 22:22       ` Michael Roth
  0 siblings, 0 replies; 21+ messages in thread
From: Michael Roth @ 2016-11-30 22:22 UTC (permalink / raw)
  To: David Gibson
  Cc: duanj, quintela, qemu-devel, dgilbert, qemu-ppc, bharata, amit.shah

Quoting Michael Roth (2016-11-22 16:58:44)
> Quoting David Gibson (2016-11-22 00:15:10)
> > On Thu, Nov 17, 2016 at 07:40:25PM -0600, Michael Roth wrote:
> > > From: Jianjun Duan <duanj@linux.vnet.ibm.com>
> > > 
> > > Currently migrated Devices are identified with an idstr which is
> > > calculated automatically using their path in the QOM composition
> > > tree. In some cases these Devices may have a more reliable
> > > identifier that may be preferable in situations where there's a
> > > chance their path in the composition tree might change in the
> > > future as a resulting of changes in how the device is modeled
> > > in QEMU.
> > > 
> > > One such Device is the "spapr-dr-connector" used to handle hotplug
> > > for various resources on pseries machine types, where the PAPR
> > > specification mandates that each DRC have a 32-bit globally unique
> > > identifier associated with it. As such, this identifier is also ideal
> > > as a reliable way to identify a particular DRC in the migration
> > > stream, so we introduce support here for using a caller-side supplied
> > > instance_id for Devices in preparation for that.
> > > 
> > > register_savevm_live() and vmstate_register_with_alias_id() already
> > > provide a means to let the caller supply an instance_id instead of
> > > relying on the migration subsystem to generate one automatically,
> > > but in cases where we're registering SaveVMHandlers or VMSDs
> > > (respectively) that are associated with a Device, the instance_id is
> > > ignored in favor of a string identifier based solely on the QOM path.
> > > 
> > > This patch generalizes this so that an instance_id can also be
> > > supplied by the caller for Devices. Since VMSD registration for
> > > Devices is generally handled automatically by qdev, we also introduce
> > > a DeviceClass->dev_get_instance_id() method that, when set, is called
> > > by qdev to obtain the corresponding instance_id that should be used
> > > for a particular Device. Otherwise we maintain the original behavior
> > > of passing instance_id == -1 and thus falling back to the previous
> > > logic of using the QOM path.
> > > 
> > > Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
> > > * moved usage of DeviceClass->dev_get_instance_id() out of savevm.c
> > >   and into caller (qdev) instead
> > > * clarified usage/intent in comments and commit msg
> > > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > 
> > I've had to remove this patch and 2/3 from ppc-for-2.8, because I
> > discovered (as I was preparing a pull request) that it causes a weird
> > breakage.
> 
> Sorry for the breakage. I will make sure to run make check on ppc64
> beforehand next time as well. Though in this case we might've gotten
> lucky that it happened to trigger at all...
> 
> > 
> > Specifically, on some, but not all, setups it causes the postcopy-test
> > to fail with the error:
> > 
> > qemu-system-ppc64: RP: Received invalid message 0x0000 length 0x0000
> > FAIL
> > 
> > For me, it fails when running on a ppc64le or ppc64 host (RHEL7.3),
> > and on 32-bit x86 (Fedora container) but not on x86_64 host (Fedora
> > 24).
> > 
> > That's a pretty baffling set of symptoms, and so far I haven't gotten
> > far in figuring out how it could happen.  But I really want to get the
> > rest of the patches in ppc-for-2.8 pulled, so I've dropped these for
> > now.
> > 
> > I'll try to debug this once I've prepared the next pull request, but
> > if you're able to work out what's going on for me, that would be
> > extremely helpful.
> 
> It turns out all these strange connections might just be red herrings.
> I think the problem is here in spapr_pci_post_load:
> 
>     unsigned int bus_no = 0;
> 
>     /* Set detach_cb for the drc unconditionally after migration */
>     if (bus) {
>         pci_for_each_device(bus, pci_bus_num(bus), spapr_pci_set_detach_cb,
>                             &bus_no);
>     }
> 
> I think that was a copy/paste artifact from one of the other callsites in
> spapr_pci.c. &bus_no is an opaque that spapr_pci_set_detach_cb ends up trying
> to use as sPAPRMachineState, which was leading to a NULL pointer dereference
> when trying to access the DRC corresponding the the PCIDevice. The search
> query is something like:
> 
>     return spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_PCI,
>                                     (phb->index << 16) |
>                                     (busnr << 8) |
>                                     devfn);
> 
> My guess is that in some cases phb->index just happened to be 0, which would
> in most cases lead to a successful query. That might explain the different
> results on different architectures.
> 
> Based on the some discussion elsehwere is turns out we don't need this
> detach cb stuff at all for the unplug issue we're trying to fix here, so
> I will strip it out for the next version.
> 
> > 
> > I do have one vague theory..
> 
> I think that needs to be dealt with as well, more comments below.
> 
> > 
> > > ---
> > >  hw/core/qdev.c         | 6 +++++-
> > >  include/hw/qdev-core.h | 9 +++++++++
> > >  migration/savevm.c     | 4 ++--
> > >  3 files changed, 16 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> > > index 5783442..c8c0c44 100644
> > > --- a/hw/core/qdev.c
> > > +++ b/hw/core/qdev.c
> > > @@ -933,7 +933,11 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
> > >          }
> > >  
> > >          if (qdev_get_vmsd(dev)) {
> > > -            vmstate_register_with_alias_id(dev, -1, qdev_get_vmsd(dev), dev,
> > > +            int instance_id = DEVICE_GET_CLASS(dev)->dev_get_instance_id
> > > +                ? DEVICE_GET_CLASS(dev)->dev_get_instance_id(dev)
> > > +                : -1;
> > > +
> > > +            vmstate_register_with_alias_id(dev, instance_id, qdev_get_vmsd(dev), dev,
> > >                                             dev->instance_id_alias,
> > >                                             dev->alias_required_for_version);
> > >          }
> > > diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> > > index 2c97347..8ba82af 100644
> > > --- a/include/hw/qdev-core.h
> > > +++ b/include/hw/qdev-core.h
> > > @@ -139,6 +139,15 @@ typedef struct DeviceClass {
> > >      qdev_initfn init; /* TODO remove, once users are converted to realize */
> > >      qdev_event exit; /* TODO remove, once users are converted to unrealize */
> > >      const char *bus_type;
> > > +
> > > +    /* When this field is set, instead of using the device's QOM path,
> > > +     * SaveStateEntry's for devices will be identified using a combination
> > > +     * of the corresponding VMSD name and an instance_id returned by this
> > > +     * function. This should only be necessary for situations where the
> > > +     * QOM path is anticipated to change and a more stable identifier is
> > > +     * desired to identify a device in the migration stream.
> > > +     */
> > > +    int (*dev_get_instance_id)(DeviceState *dev);
> > >  } DeviceClass;
> > >  
> > >  typedef struct NamedGPIOList NamedGPIOList;
> > > diff --git a/migration/savevm.c b/migration/savevm.c
> > > index 0363372..a95fff9 100644
> > > --- a/migration/savevm.c
> > > +++ b/migration/savevm.c
> > > @@ -556,7 +556,7 @@ int register_savevm_live(DeviceState *dev,
> > >          se->is_ram = 1;
> > >      }
> > >  
> > > -    if (dev) {
> > > +    if (dev && instance_id == -1) {
> > 
> > .. so, I'm not sure how this could lead to the observed symptoms, but
> > I did note that adding this condition makes another if within the
> > block redundant, because it also checks for instance_id == -1.  That
> > suggests that simply skipping the whole block in this case is probably
> > not the right thing to do.
> 
> I don't think this is the cause (verified the loading orders didn't
> change before/after these patches for the scenario in question), but
> that's a good catch:
> 
> there was previously se->compat->idstr/instance_id handling for
> cases where a we register a VMSD for a Device and provide an explicit
> instance_id. This registers an additional check so that we can map
> the se->compat values from an older QEMU to the current ID. This
> unfortunately doesn't work in the opposite direction so didn't suite
> our purposes for controlling identifiers in the outgoing stream, but
> by accidentally bypassing it completely there's now a chance old->new
> configurations might break for certain devices, so this needs to be
> re-thought a bit...

It turns out spapr_iommu already had a fairly straightforward solution.
Rather than relying on qdev to register it's vmstate, it does it
manually via:

  vmstate_register(DEVICE(tcet), tcet->liobn, &vmstate_spapr_tce_table,
                   tcet);

This avoids the qom-path-based code due to the fact that
qdev_get_dev_path() is NULL for bus-less Devices, so as a fallback the
VMSD gets identified via "spapr_iommu"/liobn. So using this approach,
we don't actually need patch 1 at all.

But it turns out, for this particular memory unplug issue, we don't
actually need DRC migration. The real issue is that the default DRC
state for coldplugged LMBs *should* correspond to the post-hotplug
state. Otherwise, unplug fails for cold-plugged LMBs *even if you
don't migrate beforehand*. The fix for that is much more
straight-forward, and the same approach we currently use for
coldplugged CPUs. Will send a patch shortly.

> 
> > 
> > >          char *id = qdev_get_dev_path(dev);
> > >          if (id) {
> > >              pstrcpy(se->idstr, sizeof(se->idstr), id);
> > > @@ -640,7 +640,7 @@ int vmstate_register_with_alias_id(DeviceState *dev, int instance_id,
> > >      se->vmsd = vmsd;
> > >      se->alias_id = alias_id;
> > >  
> > > -    if (dev) {
> > > +    if (dev && instance_id == -1) {
> > >          char *id = qdev_get_dev_path(dev);
> > >          if (id) {
> > >              pstrcpy(se->idstr, sizeof(se->idstr), id);
> > 
> > -- 
> > David Gibson                    | I'll have my music baroque, and my code
> > david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
> >                                 | _way_ _around_!
> > http://www.ozlabs.org/~dgibson
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2016-11-30 22:23 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-18  1:40 [Qemu-devel] [PATCH for-2.8 0/3] spapr: fix breakage of memory unplug after migration Michael Roth
2016-11-18  1:40 ` [Qemu-devel] [PATCH for-2.8 1/3] migration: alternative way to set instance_id in SaveStateEntry Michael Roth
2016-11-22  6:15   ` David Gibson
2016-11-22 10:23     ` Dr. David Alan Gilbert
2016-11-22 22:58     ` Michael Roth
2016-11-30 22:22       ` Michael Roth
2016-11-18  1:40 ` [Qemu-devel] [PATCH for-2.8 2/3] migration: spapr_drc: defined VMStateDescription struct Michael Roth
2016-11-18  6:04   ` David Gibson
2016-11-18 16:32     ` Michael Roth
2016-11-22 16:35   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
2016-11-22 17:24     ` Michael Roth
2016-11-22 17:33       ` Michael Roth
2016-11-22 21:28         ` Greg Kurz
2016-11-22 20:09       ` Greg Kurz
2016-11-18  1:40 ` [Qemu-devel] [PATCH for-2.8 3/3] spapr: migration support for CAS-negotiated option vectors Michael Roth
2016-11-18 16:08   ` Michael Roth
2016-11-20 23:57     ` David Gibson
2016-11-18  1:51 ` [Qemu-devel] [PATCH for-2.8 0/3] spapr: fix breakage of memory unplug after migration no-reply
2016-11-18  5:45 ` David Gibson
2016-11-18 16:39   ` Michael Roth
2016-11-20 23:58     ` David Gibson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.