* [Qemu-devel] [RFC 0/7] Live Migration with Pass-through Devices proposal
@ 2015-04-17  8:53 Chen Fan
  2015-04-17  8:53 ` [Qemu-devel] [RFC 1/7] qemu-agent: add agent init callback when detecting guest setup Chen Fan
                   ` (9 more replies)
  0 siblings, 10 replies; 45+ messages in thread
From: Chen Fan @ 2015-04-17  8:53 UTC (permalink / raw)
  To: libvir-list; +Cc: izumi.taku, qemu-devel

Background:
Live migration is one of the most important features of virtualization technology.
For recent virtualization workloads, network I/O performance is critical.
Current network I/O virtualization (e.g. para-virtualized I/O, VMDq) still has a
significant performance gap compared with native network I/O. Pass-through network
devices offer near-native performance, but so far they have prevented live migration.
No existing method solves the problem of live migration with pass-through devices
perfectly.

One idea for solving this problem is described in:
https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
Please refer to the above document for detailed information.

So I think this problem could be solved by combining existing technologies.
The following are the steps we are considering for the implementation:

-  before booting the VM, we specify in the XML two NICs that will form a bonding
   device in the guest (one pass-through NIC and one virtual NIC). The NICs' MAC
   addresses are given in the XML, which lets qemu-guest-agent find the corresponding
   network interfaces in the guest (see the XML sketch after this list).

-  when qemu-guest-agent starts up in the guest, it sends a notification to libvirt,
   and libvirt then invokes the previously registered initialization callbacks. Through
   these callbacks we create the bonding device according to the XML configuration;
   here we use the netcf tool, which makes creating the bonding device easy.

-  during migration, unplug the pass-through NIC, then do a normal live migration.

-  on the destination side, check whether a new NIC needs to be hotplugged according
   to the specified XML. Usually we use the migrate "--xml" option to specify the
   destination host's NIC MAC address, because the MAC address of the pass-through
   NIC on the source side is different; the device is then hotplugged according to
   the destination XML configuration.
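
As a rough sketch only (the element names follow the examples in patches 3/7 and
5/7 of this series, and all MAC/IP/PCI addresses below are placeholders), the
relevant part of the domain XML could look like this:

  <interface type='network'>
    <mac address='52:54:00:e8:c0:f3'/>
    <source network='default'/>
    <model type='virtio'/>
  </interface>
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio' type='bond'/>
    <bond>
      <ip address='192.168.122.5' family='ipv4' prefix='24'/>
      <route family='ipv4' address='0.0.0.0' gateway='192.168.122.1'/>
      <interface address='52:54:00:e8:c0:f3'/>
      <interface address='44:33:4c:06:f5:8e'/>
    </bond>
    <source>
      <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </source>
  </hostdev>

Here the two <interface address=.../> entries under <bond> name the virtual NIC's
and the pass-through NIC's MAC addresses, so the guest agent can enslave both
interfaces into the bond.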

TODO:
  1.  when a new NIC is hot-added on the destination side after migration has finished,
      the NIC device needs to be re-enslaved to the bonding device in the guest;
      otherwise it stays offline. Maybe we should consider having the bonding driver
      support adding interfaces dynamically (a manual sysfs workaround is sketched
      below).
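
As a manual workaround, and only as a sketch (it assumes the guest kernel exposes
the usual bonding sysfs interface and that the newly hot-added NIC shows up as
eth1), the interface could be re-enslaved from inside the guest with:

  ip link set eth1 down
  echo +eth1 > /sys/class/net/bond0/bonding/slaves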

This is an example of how this might work, and I would like to hear some opinions on this scenario.

Thanks,
Chen

Chen Fan (7):
  qemu-agent: add agent init callback when detecting guest setup
  qemu: add guest init event callback to do the initialize work for
    guest
  hostdev: add a 'bond' type element in <hostdev> element
  qemu-agent: add qemuAgentCreateBond interface
  hostdev: add parse ip and route for bond configure
  migrate: hot remove hostdev at perform phase for bond device
  migrate: add hostdev migrate status to support hostdev migration

 docs/schemas/basictypes.rng   |   6 ++
 docs/schemas/domaincommon.rng |  37 ++++++++
 src/conf/domain_conf.c        | 195 ++++++++++++++++++++++++++++++++++++++---
 src/conf/domain_conf.h        |  40 +++++++--
 src/conf/networkcommon_conf.c |  17 ----
 src/conf/networkcommon_conf.h |  17 ++++
 src/libvirt_private.syms      |   1 +
 src/qemu/qemu_agent.c         | 196 +++++++++++++++++++++++++++++++++++++++++-
 src/qemu/qemu_agent.h         |  12 +++
 src/qemu/qemu_command.c       |   3 +
 src/qemu/qemu_domain.c        |  70 +++++++++++++++
 src/qemu/qemu_domain.h        |  14 +++
 src/qemu/qemu_driver.c        |  38 ++++++++
 src/qemu/qemu_hotplug.c       |   8 +-
 src/qemu/qemu_migration.c     |  91 ++++++++++++++++++++
 src/qemu/qemu_migration.h     |   4 +
 src/qemu/qemu_process.c       |  32 +++++++
 src/util/virhostdev.c         |   3 +
 18 files changed, 745 insertions(+), 39 deletions(-)

-- 
1.9.3


* [Qemu-devel] [RFC 1/7] qemu-agent: add agent init callback when detecting guest setup
  2015-04-17  8:53 [Qemu-devel] [RFC 0/7] Live Migration with Pass-through Devices proposal Chen Fan
@ 2015-04-17  8:53 ` Chen Fan
  2015-04-17  8:53 ` [Qemu-devel] [RFC 2/7] qemu: add guest init event callback to do the initialize work for guest Chen Fan
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 45+ messages in thread
From: Chen Fan @ 2015-04-17  8:53 UTC (permalink / raw)
  To: libvir-list; +Cc: izumi.taku, qemu-devel

Sometimes we want to do some initialization work in the guest when the guest starts
up, but currently qemu-agent doesn't support that. So here we add an init callback:
when the guest starts up, it notifies libvirt that it is up, and then libvirt can do
some work for the guest.
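
For reference, the greeting that the code below matches on is assumed to be a
one-line JSON object written on the agent channel when the guest agent comes up
(upstream qemu-ga does not emit this today, so a corresponding agent-side change
is assumed), e.g.:

  {"status": "connected"}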

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
---
 src/qemu/qemu_agent.c   | 26 +++++++++++++++++++++++---
 src/qemu/qemu_agent.h   |  2 ++
 src/qemu/qemu_process.c |  6 ++++++
 3 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/src/qemu/qemu_agent.c b/src/qemu/qemu_agent.c
index 548d580..cee0f8b 100644
--- a/src/qemu/qemu_agent.c
+++ b/src/qemu/qemu_agent.c
@@ -92,6 +92,7 @@ struct _qemuAgent {
     int watch;
 
     bool connectPending;
+    bool connected;
 
     virDomainObjPtr vm;
 
@@ -306,6 +307,7 @@ qemuAgentIOProcessLine(qemuAgentPtr mon,
     virJSONValuePtr obj = NULL;
     int ret = -1;
     unsigned long long id;
+    const char *status;
 
     VIR_DEBUG("Line [%s]", line);
 
@@ -318,7 +320,11 @@ qemuAgentIOProcessLine(qemuAgentPtr mon,
         goto cleanup;
     }
 
-    if (virJSONValueObjectHasKey(obj, "QMP") == 1) {
+    if (virJSONValueObjectHasKey(obj, "QMP") == 1 ||
+        virJSONValueObjectHasKey(obj, "status") == 1) {
+        status = virJSONValueObjectGetString(obj, "status");
+        if (STREQ_NULLABLE(status, "connected"))
+            mon->connected = true;
         ret = 0;
     } else if (virJSONValueObjectHasKey(obj, "event") == 1) {
         ret = qemuAgentIOProcessEvent(mon, obj);
@@ -700,8 +706,22 @@ qemuAgentIO(int watch, int fd, int events, void *opaque)
         VIR_DEBUG("Triggering error callback");
         (errorNotify)(mon, vm);
     } else {
-        virObjectUnlock(mon);
-        virObjectUnref(mon);
+        if (mon->connected) {
+            void (*init)(qemuAgentPtr, virDomainObjPtr)
+                = mon->cb->init;
+            virDomainObjPtr vm = mon->vm;
+
+            mon->connected = false;
+
+            virObjectUnlock(mon);
+            virObjectUnref(mon);
+
+            VIR_DEBUG("Triggering init callback");
+            (init)(mon, vm);
+        } else {
+            virObjectUnlock(mon);
+            virObjectUnref(mon);
+        }
     }
 }
 
diff --git a/src/qemu/qemu_agent.h b/src/qemu/qemu_agent.h
index 1cd5749..42414a7 100644
--- a/src/qemu/qemu_agent.h
+++ b/src/qemu/qemu_agent.h
@@ -34,6 +34,8 @@ typedef qemuAgent *qemuAgentPtr;
 typedef struct _qemuAgentCallbacks qemuAgentCallbacks;
 typedef qemuAgentCallbacks *qemuAgentCallbacksPtr;
 struct _qemuAgentCallbacks {
+    void (*init)(qemuAgentPtr mon,
+                 virDomainObjPtr vm);
     void (*destroy)(qemuAgentPtr mon,
                     virDomainObjPtr vm);
     void (*eofNotify)(qemuAgentPtr mon,
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 276837e..e6fc53a 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -194,8 +194,14 @@ static void qemuProcessHandleAgentDestroy(qemuAgentPtr agent,
     virObjectUnref(vm);
 }
 
+static void qemuProcessHandleAgentInit(qemuAgentPtr agent ATTRIBUTE_UNUSED,
+                                       virDomainObjPtr vm)
+{
+    VIR_DEBUG("Received init from agent on %p '%s'", vm, vm->def->name);
+}
 
 static qemuAgentCallbacks agentCallbacks = {
+    .init = qemuProcessHandleAgentInit,
     .destroy = qemuProcessHandleAgentDestroy,
     .eofNotify = qemuProcessHandleAgentEOF,
     .errorNotify = qemuProcessHandleAgentError,
-- 
1.9.3


* [Qemu-devel] [RFC 2/7] qemu: add guest init event callback to do the initialize work for guest
  2015-04-17  8:53 [Qemu-devel] [RFC 0/7] Live Migration with Pass-through Devices proposal Chen Fan
  2015-04-17  8:53 ` [Qemu-devel] [RFC 1/7] qemu-agent: add agent init callback when detecting guest setup Chen Fan
@ 2015-04-17  8:53 ` Chen Fan
  2015-04-17  8:53 ` [Qemu-devel] [RFC 3/7] hostdev: add a 'bond' type element in <hostdev> element Chen Fan
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 45+ messages in thread
From: Chen Fan @ 2015-04-17  8:53 UTC (permalink / raw)
  To: libvir-list; +Cc: izumi.taku, qemu-devel

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
---
 src/qemu/qemu_domain.h  |  7 +++++++
 src/qemu/qemu_driver.c  | 32 ++++++++++++++++++++++++++++++++
 src/qemu/qemu_process.c | 22 ++++++++++++++++++++++
 3 files changed, 61 insertions(+)

diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index 3225abb..19f4b27 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -136,6 +136,8 @@ struct qemuDomainJobObj {
 typedef void (*qemuDomainCleanupCallback)(virQEMUDriverPtr driver,
                                           virDomainObjPtr vm);
 
+typedef void (*qemuDomainInitCallback)(virDomainObjPtr vm);
+
 typedef struct _qemuDomainObjPrivate qemuDomainObjPrivate;
 typedef qemuDomainObjPrivate *qemuDomainObjPrivatePtr;
 struct _qemuDomainObjPrivate {
@@ -185,6 +187,10 @@ struct _qemuDomainObjPrivate {
     size_t ncleanupCallbacks;
     size_t ncleanupCallbacks_max;
 
+    qemuDomainInitCallback *initCallbacks;
+    size_t nInitCallbacks;
+    size_t nInitCallbacks_max;
+
     virCgroupPtr cgroup;
 
     virCond unplugFinished; /* signals that unpluggingDevice was unplugged */
@@ -205,6 +211,7 @@ typedef enum {
     QEMU_PROCESS_EVENT_NIC_RX_FILTER_CHANGED,
     QEMU_PROCESS_EVENT_SERIAL_CHANGED,
     QEMU_PROCESS_EVENT_BLOCK_JOB,
+    QEMU_PROCESS_EVENT_GUESTINIT,
 
     QEMU_PROCESS_EVENT_LAST
 } qemuProcessEventType;
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index f37b95d..7368145 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -4073,6 +4073,35 @@ processGuestPanicEvent(virQEMUDriverPtr driver,
 
 
 static void
+processGuestInitEvent(virQEMUDriverPtr driver,
+                      virDomainObjPtr vm)
+{
+    qemuDomainObjPrivatePtr priv;
+    int i;
+
+    VIR_DEBUG("init guest from domain %p %s",
+              vm, vm->def->name);
+
+    if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0)
+        return;
+
+    if (!virDomainObjIsActive(vm)) {
+        VIR_DEBUG("Domain is not running");
+        goto endjob;
+    }
+
+    priv = vm->privateData;
+
+    for (i = 0; i < priv->nInitCallbacks; i++) {
+        if (priv->initCallbacks[i])
+            priv->initCallbacks[i](vm);
+    }
+
+ endjob:
+    qemuDomainObjEndJob(driver, vm);
+}
+
+static void
 processDeviceDeletedEvent(virQEMUDriverPtr driver,
                           virDomainObjPtr vm,
                           char *devAlias)
@@ -4627,6 +4656,9 @@ static void qemuProcessEventHandler(void *data, void *opaque)
                              processEvent->action,
                              processEvent->status);
         break;
+    case QEMU_PROCESS_EVENT_GUESTINIT:
+        processGuestInitEvent(driver, vm);
+        break;
     case QEMU_PROCESS_EVENT_LAST:
         break;
     }
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index e6fc53a..fcc0566 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -197,7 +197,29 @@ static void qemuProcessHandleAgentDestroy(qemuAgentPtr agent,
 static void qemuProcessHandleAgentInit(qemuAgentPtr agent ATTRIBUTE_UNUSED,
                                        virDomainObjPtr vm)
 {
+    struct qemuProcessEvent *processEvent = NULL;
+    virQEMUDriverPtr driver = qemu_driver;
+
+    virObjectLock(vm);
+
     VIR_DEBUG("Received init from agent on %p '%s'", vm, vm->def->name);
+
+    if (VIR_ALLOC(processEvent) < 0)
+        goto cleanup;
+
+    processEvent->eventType = QEMU_PROCESS_EVENT_GUESTINIT;
+    processEvent->vm = vm;
+
+    virObjectRef(vm);
+    if (virThreadPoolSendJob(driver->workerPool, 0, processEvent) < 0) {
+        if (!virObjectUnref(vm))
+            vm = NULL;
+        VIR_FREE(processEvent);
+    }
+
+ cleanup:
+    if (vm)
+        virObjectUnlock(vm);
 }
 
 static qemuAgentCallbacks agentCallbacks = {
-- 
1.9.3


* [Qemu-devel] [RFC 3/7] hostdev: add a 'bond' type element in <hostdev> element
  2015-04-17  8:53 [Qemu-devel] [RFC 0/7] Live Migration with Pass-through Devices proposal Chen Fan
  2015-04-17  8:53 ` [Qemu-devel] [RFC 1/7] qemu-agent: add agent init callback when detecting guest setup Chen Fan
  2015-04-17  8:53 ` [Qemu-devel] [RFC 2/7] qemu: add guest init event callback to do the initialize work for guest Chen Fan
@ 2015-04-17  8:53 ` Chen Fan
  2015-04-17  8:53 ` [Qemu-devel] [RFC 4/7] qemu-agent: add qemuAgentCreateBond interface Chen Fan
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 45+ messages in thread
From: Chen Fan @ 2015-04-17  8:53 UTC (permalink / raw)
  To: libvir-list; +Cc: izumi.taku, qemu-devel

This 'bond' element is used to create a bond device when the guest starts up.
The XML looks like:
<hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio' type='bond'/>
    <bond>
      <interface address='XXX'/>
      <interface address='XXX1'/>
    </bond>
</hostdev>

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
---
 docs/schemas/basictypes.rng   |   6 ++
 docs/schemas/domaincommon.rng |  16 ++++++
 src/conf/domain_conf.c        | 131 ++++++++++++++++++++++++++++++++++++++----
 src/conf/domain_conf.h        |  13 +++++
 src/libvirt_private.syms      |   1 +
 5 files changed, 157 insertions(+), 10 deletions(-)

diff --git a/docs/schemas/basictypes.rng b/docs/schemas/basictypes.rng
index f086ad2..aef24fe 100644
--- a/docs/schemas/basictypes.rng
+++ b/docs/schemas/basictypes.rng
@@ -66,6 +66,12 @@
     </choice>
   </define>
 
+  <define name="pciinterface">
+    <attribute name="address">
+      <ref name="uniMacAddr"/>
+    </attribute>
+  </define>
+
   <define name="pciaddress">
     <optional>
       <attribute name="domain">
diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index 03fd541..0cf82cb 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
@@ -3766,9 +3766,25 @@
               <value>xen</value>
             </choice>
           </attribute>
+          <optional>
+            <attribute name="type">
+              <choice>
+                <value>bond</value>
+              </choice>
+            </attribute>
+          </optional>
           <empty/>
         </element>
       </optional>
+      <optional>
+        <element name="bond">
+          <zeroOrMore>
+            <element name="interface">
+              <ref name="pciinterface"/>
+            </element>
+          </zeroOrMore>
+        </element>
+      </optional>
       <element name="source">
         <optional>
           <ref name="startupPolicy"/>
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 4d7e3c9..14bcae1 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -610,6 +610,11 @@ VIR_ENUM_IMPL(virDomainHostdevSubsysPCIBackend,
               "vfio",
               "xen")
 
+VIR_ENUM_IMPL(virDomainHostdevSubsysPCIDevice,
+              VIR_DOMAIN_HOSTDEV_PCI_DEVICE_TYPE_LAST,
+              "default",
+              "bond")
+
 VIR_ENUM_IMPL(virDomainHostdevSubsysSCSIProtocol,
               VIR_DOMAIN_HOSTDEV_SCSI_PROTOCOL_TYPE_LAST,
               "adapter",
@@ -1907,6 +1912,10 @@ void virDomainHostdevDefClear(virDomainHostdevDefPtr def)
             } else {
                 VIR_FREE(scsisrc->u.host.adapter);
             }
+        } else if (def->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) {
+            virDomainHostdevSubsysPCIPtr pcisrc = &def->source.subsys.u.pci;
+            if (pcisrc->device == VIR_DOMAIN_HOSTDEV_PCI_DEVICE_BOND)
+                VIR_FREE(pcisrc->macs);
         }
         break;
     }
@@ -4978,7 +4987,9 @@ virDomainHostdevDefParseXMLSubsys(xmlNodePtr node,
     char *sgio = NULL;
     char *rawio = NULL;
     char *backendStr = NULL;
+    char *deviceStr = NULL;
     int backend;
+    int device;
     int ret = -1;
     virDomainHostdevSubsysPCIPtr pcisrc = &def->source.subsys.u.pci;
     virDomainHostdevSubsysSCSIPtr scsisrc = &def->source.subsys.u.scsi;
@@ -5077,6 +5088,68 @@ virDomainHostdevDefParseXMLSubsys(xmlNodePtr node,
         }
         pcisrc->backend = backend;
 
+        device =  VIR_DOMAIN_HOSTDEV_PCI_DEVICE_DEFAULT;
+        if ((deviceStr = virXPathString("string(./driver/@type)", ctxt)) &&
+            (((device = virDomainHostdevSubsysPCIDeviceTypeFromString(deviceStr)) < 0) ||
+             device == VIR_DOMAIN_HOSTDEV_PCI_DEVICE_DEFAULT)) {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                           _("Unknown PCI device <driver type='%s'/> "
+                             "has been specified"), deviceStr);
+            goto error;
+        }
+        pcisrc->device = device;
+
+        if (device == VIR_DOMAIN_HOSTDEV_PCI_DEVICE_BOND) {
+            xmlNodePtr *macs = NULL;
+            int n = 0;
+            int i;
+            char *macStr = NULL;
+
+            if (!(virXPathNode("./bond", ctxt))) {
+                virReportError(VIR_ERR_XML_ERROR, "%s",
+                               _("missing <nond> node specified by bond type"));
+                goto error;
+            }
+
+            if ((n = virXPathNodeSet("./bond/interface", ctxt, &macs)) < 0) {
+                virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                               _("Cannot extract interface nodes"));
+                goto error;
+            }
+
+            VIR_FREE(pcisrc->macs);
+            if (VIR_ALLOC_N(pcisrc->macs, n) < 0)
+                goto error;
+
+            pcisrc->nmac = n;
+            for (i = 0; i < n; i++) {
+                xmlNodePtr cur_node = macs[i];
+
+                macStr = virXMLPropString(cur_node, "address");
+                if (!macStr) {
+                    virReportError(VIR_ERR_XML_ERROR, "%s",
+                                   _("Missing required address attribute "
+                                   "in interface element"));
+                    goto error;
+                }
+                if (virMacAddrParse((const char *)macStr, &pcisrc->macs[i]) < 0) {
+                    virReportError(VIR_ERR_XML_ERROR,
+                                   _("unable to parse mac address '%s'"),
+                                   (const char *)macStr);
+                    VIR_FREE(macStr);
+                    goto error;
+                }
+                if (virMacAddrIsMulticast(&pcisrc->macs[i])) {
+                    virReportError(VIR_ERR_XML_ERROR,
+                                   _("expected unicast mac address, found multicast '%s'"),
+                                   (const char *)macStr);
+                    VIR_FREE(macStr);
+                    goto error;
+                }
+                VIR_FREE(macStr);
+            }
+        }
+
         break;
 
     case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_USB:
@@ -18389,18 +18462,56 @@ virDomainHostdevDefFormatSubsys(virBufferPtr buf,
     virDomainHostdevSubsysSCSIHostPtr scsihostsrc = &scsisrc->u.host;
     virDomainHostdevSubsysSCSIiSCSIPtr iscsisrc = &scsisrc->u.iscsi;
 
-    if (def->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI &&
-        pcisrc->backend != VIR_DOMAIN_HOSTDEV_PCI_BACKEND_DEFAULT) {
-        const char *backend =
-            virDomainHostdevSubsysPCIBackendTypeToString(pcisrc->backend);
+    if (def->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) {
+        const char *backend = NULL;
+        const char *device = NULL;
+        int i;
+        char macstr[VIR_MAC_STRING_BUFLEN];
 
-        if (!backend) {
-            virReportError(VIR_ERR_INTERNAL_ERROR,
-                           _("unexpected pci hostdev driver name type %d"),
-                           pcisrc->backend);
-            return -1;
+        if (pcisrc->backend != VIR_DOMAIN_HOSTDEV_PCI_BACKEND_DEFAULT) {
+            backend =
+                virDomainHostdevSubsysPCIBackendTypeToString(pcisrc->backend);
+
+            if (!backend) {
+                virReportError(VIR_ERR_INTERNAL_ERROR,
+                               _("unexpected pci hostdev driver name type %d"),
+                               pcisrc->backend);
+                return -1;
+            }
+        }
+
+        if (pcisrc->device != VIR_DOMAIN_HOSTDEV_PCI_DEVICE_DEFAULT) {
+            device =
+                virDomainHostdevSubsysPCIDeviceTypeToString(pcisrc->device);
+
+            if (!device) {
+                virReportError(VIR_ERR_INTERNAL_ERROR,
+                               _("unexpected pci hostdev device name type %d"),
+                               pcisrc->device);
+                return -1;
+            }
+        }
+
+        if (backend) {
+            virBufferAddLit(buf, "<driver");
+            virBufferAsprintf(buf, " name='%s'", backend);
+            if (device)
+                virBufferAsprintf(buf, " type='%s'", device);
+
+            virBufferAddLit(buf, "/>\n");
+        }
+
+        if (pcisrc->device == VIR_DOMAIN_HOSTDEV_PCI_DEVICE_BOND &&
+            pcisrc->nmac > 0) {
+            virBufferAddLit(buf, "<bond>\n");
+            virBufferAdjustIndent(buf, 2);
+            for (i = 0; i < pcisrc->nmac; i++) {
+                virBufferAsprintf(buf, "<interface address='%s'/>\n",
+                                  virMacAddrFormat(&pcisrc->macs[i], macstr));
+            }
+            virBufferAdjustIndent(buf, -2);
+            virBufferAddLit(buf, "</bond>\n");
         }
-        virBufferAsprintf(buf, "<driver name='%s'/>\n", backend);
     }
 
     virBufferAddLit(buf, "<source");
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index e6fa3c9..e62979f 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -416,6 +416,16 @@ typedef enum {
 
 VIR_ENUM_DECL(virDomainHostdevSubsysPCIBackend)
 
+/* the type used for PCI hostdev devices */
+typedef enum {
+    VIR_DOMAIN_HOSTDEV_PCI_DEVICE_DEFAULT, /* default */
+    VIR_DOMAIN_HOSTDEV_PCI_DEVICE_BOND,    /* bond device */
+
+    VIR_DOMAIN_HOSTDEV_PCI_DEVICE_TYPE_LAST
+} virDomainHostdevSubsysPCIDeviceType;
+
+VIR_ENUM_DECL(virDomainHostdevSubsysPCIDevice)
+
 typedef enum {
     VIR_DOMAIN_HOSTDEV_SCSI_PROTOCOL_TYPE_NONE,
     VIR_DOMAIN_HOSTDEV_SCSI_PROTOCOL_TYPE_ISCSI,
@@ -442,6 +452,9 @@ typedef virDomainHostdevSubsysPCI *virDomainHostdevSubsysPCIPtr;
 struct _virDomainHostdevSubsysPCI {
     virDevicePCIAddress addr; /* host address */
     int backend; /* enum virDomainHostdevSubsysPCIBackendType */
+    int device;  /* enum virDomainHostdevSubsysPCIDeviceType */
+    size_t nmac;
+    virMacAddr* macs;
 };
 
 typedef struct _virDomainHostdevSubsysSCSIHost virDomainHostdevSubsysSCSIHost;
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index aafc385..43a769d 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -320,6 +320,7 @@ virDomainHostdevInsert;
 virDomainHostdevModeTypeToString;
 virDomainHostdevRemove;
 virDomainHostdevSubsysPCIBackendTypeToString;
+virDomainHostdevSubsysPCIDeviceTypeToString;
 virDomainHostdevSubsysTypeToString;
 virDomainHubTypeFromString;
 virDomainHubTypeToString;
-- 
1.9.3


* [Qemu-devel] [RFC 4/7] qemu-agent: add qemuAgentCreateBond interface
  2015-04-17  8:53 [Qemu-devel] [RFC 0/7] Live Migration with Pass-through Devices proposal Chen Fan
                   ` (2 preceding siblings ...)
  2015-04-17  8:53 ` [Qemu-devel] [RFC 3/7] hostdev: add a 'bond' type element in <hostdev> element Chen Fan
@ 2015-04-17  8:53 ` Chen Fan
  2015-05-19  9:13   ` Michael S. Tsirkin
  2015-05-29  7:37   ` Michal Privoznik
  2015-04-17  8:53 ` [Qemu-devel] [RFC 5/7] hostdev: add parse ip and route for bond configure Chen Fan
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 45+ messages in thread
From: Chen Fan @ 2015-04-17  8:53 UTC (permalink / raw)
  To: libvir-list; +Cc: izumi.taku, qemu-devel

Use the initialization callback to create the bond device in the guest.
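
For illustration only (guest-network-set-interface is a proposed guest-agent
command that this series relies on, not an existing upstream qemu-ga command,
and the sub-interface names are placeholders), the command built by
qemuAgentCreateBond() would look roughly like:

  {"execute": "guest-network-set-interface",
   "arguments": {"interface": {"type": "bond",
                               "name": "bond0",
                               "onboot": "onboot",
                               "subInterfaces": [{"name": "ens3"},
                                                 {"name": "ens4"}]}}}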

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
---
 src/qemu/qemu_agent.c   | 118 ++++++++++++++++++++++++++++++++++++++++++++++++
 src/qemu/qemu_agent.h   |  10 ++++
 src/qemu/qemu_domain.c  |  70 ++++++++++++++++++++++++++++
 src/qemu/qemu_domain.h  |   7 +++
 src/qemu/qemu_process.c |   4 ++
 5 files changed, 209 insertions(+)

diff --git a/src/qemu/qemu_agent.c b/src/qemu/qemu_agent.c
index cee0f8b..b8eba01 100644
--- a/src/qemu/qemu_agent.c
+++ b/src/qemu/qemu_agent.c
@@ -2169,3 +2169,121 @@ qemuAgentGetInterfaces(qemuAgentPtr mon,
 
     goto cleanup;
 }
+
+static virDomainInterfacePtr
+findInterfaceByMac(virDomainInterfacePtr *info,
+                   size_t len,
+                   const char *macstr)
+{
+    size_t i;
+    bool found = false;
+
+    for (i = 0; i < len; i++) {
+        if (info[i]->hwaddr &&
+            STREQ(info[i]->hwaddr, macstr)) {
+            found = true;
+            break;
+        }
+    }
+
+    if (found) {
+        return info[i];
+    }
+
+    return NULL;
+}
+
+/*
+ * qemuAgentCreateBond:
+ */
+int
+qemuAgentCreateBond(qemuAgentPtr mon,
+                    virDomainHostdevSubsysPCIPtr pcisrc)
+{
+    int ret = -1;
+    virJSONValuePtr cmd = NULL;
+    virJSONValuePtr reply = NULL;
+    size_t i;
+    char macstr[VIR_MAC_STRING_BUFLEN];
+    virDomainInterfacePtr *interfaceInfo = NULL;
+    virDomainInterfacePtr interface;
+    virJSONValuePtr new_interface = NULL;
+    virJSONValuePtr subInterfaces = NULL;
+    virJSONValuePtr subInterface = NULL;
+    int len;
+
+    if (!(pcisrc->nmac || pcisrc->macs))
+        return ret;
+
+    len = qemuAgentGetInterfaces(mon, &interfaceInfo);
+    if (len < 0)
+        return ret;
+
+    if (!(new_interface = virJSONValueNewObject()))
+        goto cleanup;
+
+    if (virJSONValueObjectAppendString(new_interface, "type", "bond") < 0)
+        goto cleanup;
+
+    if (virJSONValueObjectAppendString(new_interface, "name", "bond0") < 0)
+        goto cleanup;
+
+    if (virJSONValueObjectAppendString(new_interface, "onboot", "onboot") < 0)
+        goto cleanup;
+
+    if (!(subInterfaces = virJSONValueNewArray()))
+        goto cleanup;
+
+    for (i = 0; i < pcisrc->nmac; i++) {
+        virMacAddrFormat(&pcisrc->macs[i], macstr);
+        interface = findInterfaceByMac(interfaceInfo, len, macstr);
+        if (!interface) {
+            goto cleanup;
+        }
+
+        if (!(subInterface = virJSONValueNewObject()))
+            goto cleanup;
+
+        if (virJSONValueObjectAppendString(subInterface, "name", interface->name) < 0)
+            goto cleanup;
+
+        if (virJSONValueArrayAppend(subInterfaces, subInterface) < 0)
+            goto cleanup;
+
+        subInterface = NULL;
+    }
+
+    if (i && virJSONValueObjectAppend(new_interface, "subInterfaces", subInterfaces) < 0)
+        goto cleanup;
+
+    cmd = qemuAgentMakeCommand("guest-network-set-interface",
+                               "a:interface", new_interface,
+                               NULL);
+
+    if (!cmd)
+        goto cleanup;
+
+    subInterfaces = NULL;
+    new_interface = NULL;
+
+    if (qemuAgentCommand(mon, cmd, &reply, true,
+                         VIR_DOMAIN_QEMU_AGENT_COMMAND_BLOCK) < 0)
+        goto cleanup;
+
+    if (virJSONValueObjectGetNumberInt(reply, "return", &ret) < 0) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("malformed return value"));
+    }
+
+ cleanup:
+    virJSONValueFree(subInterfaces);
+    virJSONValueFree(subInterface);
+    virJSONValueFree(new_interface);
+    virJSONValueFree(cmd);
+    virJSONValueFree(reply);
+    if (interfaceInfo)
+        for (i = 0; i < len; i++)
+            virDomainInterfaceFree(interfaceInfo[i]);
+    VIR_FREE(interfaceInfo);
+    return ret;
+}
diff --git a/src/qemu/qemu_agent.h b/src/qemu/qemu_agent.h
index 42414a7..744cb0a 100644
--- a/src/qemu/qemu_agent.h
+++ b/src/qemu/qemu_agent.h
@@ -97,6 +97,13 @@ struct _qemuAgentCPUInfo {
     bool offlinable;    /* true if the CPU can be offlined */
 };
 
+typedef struct _qemuAgentInterfaceInfo qemuAgentInterfaceInfo;
+typedef qemuAgentInterfaceInfo *qemuAgentInterfaceInfoPtr;
+struct _qemuAgentInterfaceInfo {
+    char *name;
+    char *hardware_address;
+};
+
 int qemuAgentGetVCPUs(qemuAgentPtr mon, qemuAgentCPUInfoPtr *info);
 int qemuAgentSetVCPUs(qemuAgentPtr mon, qemuAgentCPUInfoPtr cpus, size_t ncpus);
 int qemuAgentUpdateCPUInfo(unsigned int nvcpus,
@@ -114,4 +121,7 @@ int qemuAgentSetTime(qemuAgentPtr mon,
 int qemuAgentGetInterfaces(qemuAgentPtr mon,
                            virDomainInterfacePtr **ifaces);
 
+int qemuAgentCreateBond(qemuAgentPtr mon,
+                        virDomainHostdevSubsysPCIPtr pcisrc);
+
 #endif /* __QEMU_AGENT_H__ */
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 603360f..584fefb 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -2722,6 +2722,46 @@ qemuDomainCleanupRun(virQEMUDriverPtr driver,
     priv->ncleanupCallbacks_max = 0;
 }
 
+/*
+ * The vm must be locked when any of the following init functions is
+ * called.
+ */
+int
+qemuDomainInitAdd(virDomainObjPtr vm,
+                  qemuDomainInitCallback cb)
+{
+    qemuDomainObjPrivatePtr priv = vm->privateData;
+    size_t i;
+
+    VIR_DEBUG("vm=%s, cb=%p", vm->def->name, cb);
+
+    for (i = 0; i < priv->nInitCallbacks; i++) {
+        if (priv->initCallbacks[i] == cb)
+            return 0;
+    }
+
+    if (VIR_RESIZE_N(priv->initCallbacks,
+                     priv->nInitCallbacks_max,
+                     priv->nInitCallbacks, 1) < 0)
+        return -1;
+
+    priv->initCallbacks[priv->nInitCallbacks++] = cb;
+    return 0;
+}
+
+void
+qemuDomainInitCleanup(virDomainObjPtr vm)
+{
+    qemuDomainObjPrivatePtr priv = vm->privateData;
+
+    VIR_DEBUG("vm=%s", vm->def->name);
+
+    VIR_FREE(priv->initCallbacks);
+    priv->nInitCallbacks = 0;
+    priv->nInitCallbacks_max = 0;
+}
+
+
 static void
 qemuDomainGetImageIds(virQEMUDriverConfigPtr cfg,
                       virDomainObjPtr vm,
@@ -3083,3 +3123,33 @@ qemuDomainSupportsBlockJobs(virDomainObjPtr vm,
 
     return 0;
 }
+
+void
+qemuDomainPrepareHostdevInit(virDomainObjPtr vm)
+{
+    qemuDomainObjPrivatePtr priv = vm->privateData;
+    virDomainDefPtr def = vm->def;
+    int i;
+
+    if (!def->nhostdevs)
+        return;
+
+    if (!qemuDomainAgentAvailable(vm, false))
+        return;
+
+    if (!virDomainObjIsActive(vm))
+        return;
+
+    for (i = 0; i < def->nhostdevs; i++) {
+        virDomainHostdevDefPtr hostdev = def->hostdevs[i];
+        virDomainHostdevSubsysPCIPtr pcisrc = &hostdev->source.subsys.u.pci;
+
+        if (hostdev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI &&
+            hostdev->source.subsys.u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO &&
+            hostdev->source.subsys.u.pci.device == VIR_DOMAIN_HOSTDEV_PCI_DEVICE_BOND) {
+            qemuDomainObjEnterAgent(vm);
+            qemuAgentCreateBond(priv->agent, pcisrc);
+            qemuDomainObjExitAgent(vm);
+        }
+    }
+}
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index 19f4b27..3244ca0 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -403,6 +403,10 @@ void qemuDomainCleanupRemove(virDomainObjPtr vm,
 void qemuDomainCleanupRun(virQEMUDriverPtr driver,
                           virDomainObjPtr vm);
 
+int qemuDomainInitAdd(virDomainObjPtr vm,
+                      qemuDomainInitCallback cb);
+void qemuDomainInitCleanup(virDomainObjPtr vm);
+
 extern virDomainXMLPrivateDataCallbacks virQEMUDriverPrivateDataCallbacks;
 extern virDomainXMLNamespace virQEMUDriverDomainXMLNamespace;
 extern virDomainDefParserConfig virQEMUDriverDomainDefParserConfig;
@@ -444,4 +448,7 @@ void qemuDomObjEndAPI(virDomainObjPtr *vm);
 int qemuDomainAlignMemorySizes(virDomainDefPtr def);
 void qemuDomainMemoryDeviceAlignSize(virDomainMemoryDefPtr mem);
 
+void
+qemuDomainPrepareHostdevInit(virDomainObjPtr vm);
+
 #endif /* __QEMU_DOMAIN_H__ */
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index fcc0566..0a72aca 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -4444,6 +4444,9 @@ int qemuProcessStart(virConnectPtr conn,
                                hostdev_flags) < 0)
         goto cleanup;
 
+    if (qemuDomainInitAdd(vm, qemuDomainPrepareHostdevInit))
+        goto cleanup;
+
     VIR_DEBUG("Preparing chr devices");
     if (virDomainChrDefForeach(vm->def,
                                true,
@@ -5186,6 +5189,7 @@ void qemuProcessStop(virQEMUDriverPtr driver,
                                  VIR_QEMU_PROCESS_KILL_NOCHECK));
 
     qemuDomainCleanupRun(driver, vm);
+    qemuDomainInitCleanup(vm);
 
     /* Stop autodestroy in case guest is restarted */
     qemuProcessAutoDestroyRemove(driver, vm);
-- 
1.9.3


* [Qemu-devel] [RFC 5/7] hostdev: add parse ip and route for bond configure
  2015-04-17  8:53 [Qemu-devel] [RFC 0/7] Live Migration with Pass-through Devices proposal Chen Fan
                   ` (3 preceding siblings ...)
  2015-04-17  8:53 ` [Qemu-devel] [RFC 4/7] qemu-agent: add qemuAgentCreateBond interface Chen Fan
@ 2015-04-17  8:53 ` Chen Fan
  2015-04-17  8:53 ` [Qemu-devel] [RFC 6/7] migrate: hot remove hostdev at perform phase for bond device Chen Fan
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 45+ messages in thread
From: Chen Fan @ 2015-04-17  8:53 UTC (permalink / raw)
  To: libvir-list; +Cc: izumi.taku, qemu-devel

A bond device always needs its IP address and gateway route to be configured,
so here we add that to the interface.

The XML looks like:
<hostdev mode='subsystem' type='pci' managed='no'>
  <driver name='vfio' type='bond'/>
  <bond>
    <ip address='192.168.122.5' family='ipv4' prefix='24'/>
    <route family='ipv4' address='0.0.0.0' gateway='192.168.122.1'/>
    <interface address='52:54:00:e8:c0:f3'/>
    <interface address='44:33:4c:06:f5:8e'/>
  </bond>
</hostdev>
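
With the IP and route information, the JSON built by qemuAgentCreateBond() for the
proposed guest-network-set-interface command would then carry an "ip-address"
object as well (a sketch only; the field names and the hard-coded bonding options
string follow the code below, and the sub-interface names are placeholders):

  {"execute": "guest-network-set-interface",
   "arguments": {"interface": {"type": "bond",
                               "name": "bond0",
                               "onboot": "onboot",
                               "options": "mode=active-backup miimon=100 updelay=10",
                               "ip-address": {"ip-address": "192.168.122.5",
                                              "ip-address-type": "ipv4",
                                              "prefix": 24,
                                              "gateway": "192.168.122.1"},
                               "subInterfaces": [{"name": "ens3"},
                                                 {"name": "ens4"}]}}}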

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
---
 docs/schemas/domaincommon.rng | 21 +++++++++++
 src/conf/domain_conf.c        | 87 ++++++++++++++++++++++++++++++++++++-------
 src/conf/domain_conf.h        | 24 ++++++++----
 src/conf/networkcommon_conf.c | 17 ---------
 src/conf/networkcommon_conf.h | 17 +++++++++
 src/qemu/qemu_agent.c         | 58 +++++++++++++++++++++++++++--
 6 files changed, 183 insertions(+), 41 deletions(-)

diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index 0cf82cb..4056cbd 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
@@ -3779,6 +3779,27 @@
       <optional>
         <element name="bond">
           <zeroOrMore>
+            <element name="ip">
+              <attribute name="address">
+                <ref name="ipAddr"/>
+              </attribute>
+              <optional>
+                <attribute name="family">
+                  <ref name="addr-family"/>
+                </attribute>
+              </optional>
+              <optional>
+                <attribute name="prefix">
+                  <ref name="ipPrefix"/>
+                </attribute>
+              </optional>
+              <empty/>
+            </element>
+          </zeroOrMore>
+          <zeroOrMore>
+            <ref name="route"/>
+          </zeroOrMore>
+          <zeroOrMore>
             <element name="interface">
               <ref name="pciinterface"/>
             </element>
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 14bcae1..7d1cd3e 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -797,6 +797,8 @@ static virClassPtr virDomainXMLOptionClass;
 static void virDomainObjDispose(void *obj);
 static void virDomainObjListDispose(void *obj);
 static void virDomainXMLOptionClassDispose(void *obj);
+static virDomainNetIpDefPtr virDomainNetIpParseXML(xmlNodePtr node);
+
 
 static int virDomainObjOnceInit(void)
 {
@@ -1914,8 +1916,17 @@ void virDomainHostdevDefClear(virDomainHostdevDefPtr def)
             }
         } else if (def->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) {
             virDomainHostdevSubsysPCIPtr pcisrc = &def->source.subsys.u.pci;
-            if (pcisrc->device == VIR_DOMAIN_HOSTDEV_PCI_DEVICE_BOND)
-                VIR_FREE(pcisrc->macs);
+            if (pcisrc->device == VIR_DOMAIN_HOSTDEV_PCI_DEVICE_BOND) {
+                for (i = 0; i < pcisrc->net.nmacs; i++)
+                    VIR_FREE(pcisrc->net.macs[i]);
+                VIR_FREE(pcisrc->net.macs);
+                for (i = 0; i < pcisrc->net.nips; i++)
+                    VIR_FREE(pcisrc->net.ips[i]);
+                VIR_FREE(pcisrc->net.ips);
+                for (i = 0; i < pcisrc->net.nroutes; i++)
+                    VIR_FREE(pcisrc->net.routes[i]);
+                VIR_FREE(pcisrc->net.routes);
+            }
         }
         break;
     }
@@ -5102,26 +5113,68 @@ virDomainHostdevDefParseXMLSubsys(xmlNodePtr node,
         if (device == VIR_DOMAIN_HOSTDEV_PCI_DEVICE_BOND) {
             xmlNodePtr *macs = NULL;
             int n = 0;
-            int i;
+            size_t i;
             char *macStr = NULL;
+            xmlNodePtr *ipnodes = NULL;
+            int nipnodes;
+            xmlNodePtr *routenodes = NULL;
+            int nroutenodes;
 
             if (!(virXPathNode("./bond", ctxt))) {
                 virReportError(VIR_ERR_XML_ERROR, "%s",
-                               _("missing <nond> node specified by bond type"));
+                               _("missing <bond> node specified by bond type"));
                 goto error;
             }
 
+            if ((nipnodes = virXPathNodeSet("./bond/ip", ctxt, &ipnodes)) < 0)
+                goto error;
+
+            if (nipnodes) {
+                for (i = 0; i < nipnodes; i++) {
+                    virDomainNetIpDefPtr ip = virDomainNetIpParseXML(ipnodes[i]);
+
+                    if (!ip)
+                        goto error;
+
+                    if (VIR_APPEND_ELEMENT(pcisrc->net.ips,
+                                           pcisrc->net.nips, ip) < 0) {
+                        VIR_FREE(ip);
+                        goto error;
+                    }
+                }
+            }
+
+            if ((nroutenodes = virXPathNodeSet("./bond/route", ctxt, &routenodes)) < 0)
+                goto error;
+
+            if (nroutenodes) {
+                for (i = 0; i < nroutenodes; i++) {
+                    virNetworkRouteDefPtr route = NULL;
+
+                    if (!(route = virNetworkRouteDefParseXML(_("Domain hostdev device"),
+                                                             routenodes[i],
+                                                             ctxt)))
+                        goto error;
+
+                    if (VIR_APPEND_ELEMENT(pcisrc->net.routes,
+                                           pcisrc->net.nroutes, route) < 0) {
+                        virNetworkRouteDefFree(route);
+                        goto error;
+                    }
+                }
+            }
+
             if ((n = virXPathNodeSet("./bond/interface", ctxt, &macs)) < 0) {
                 virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                                _("Cannot extract interface nodes"));
                 goto error;
             }
 
-            VIR_FREE(pcisrc->macs);
-            if (VIR_ALLOC_N(pcisrc->macs, n) < 0)
+            VIR_FREE(pcisrc->net.macs);
+            if (VIR_ALLOC_N(pcisrc->net.macs, n) < 0)
                 goto error;
 
-            pcisrc->nmac = n;
+            pcisrc->net.nmacs = n;
             for (i = 0; i < n; i++) {
                 xmlNodePtr cur_node = macs[i];
 
@@ -5132,14 +5185,18 @@ virDomainHostdevDefParseXMLSubsys(xmlNodePtr node,
                                    "in interface element"));
                     goto error;
                 }
-                if (virMacAddrParse((const char *)macStr, &pcisrc->macs[i]) < 0) {
+
+                if (VIR_ALLOC(pcisrc->net.macs[i]) < 0)
+                    goto error;
+
+                if (virMacAddrParse((const char *)macStr, pcisrc->net.macs[i]) < 0) {
                     virReportError(VIR_ERR_XML_ERROR,
                                    _("unable to parse mac address '%s'"),
                                    (const char *)macStr);
                     VIR_FREE(macStr);
                     goto error;
                 }
-                if (virMacAddrIsMulticast(&pcisrc->macs[i])) {
+                if (virMacAddrIsMulticast(pcisrc->net.macs[i])) {
                     virReportError(VIR_ERR_XML_ERROR,
                                    _("expected unicast mac address, found multicast '%s'"),
                                    (const char *)macStr);
@@ -18501,13 +18558,17 @@ virDomainHostdevDefFormatSubsys(virBufferPtr buf,
             virBufferAddLit(buf, "/>\n");
         }
 
-        if (pcisrc->device == VIR_DOMAIN_HOSTDEV_PCI_DEVICE_BOND &&
-            pcisrc->nmac > 0) {
+        if (pcisrc->device == VIR_DOMAIN_HOSTDEV_PCI_DEVICE_BOND) {
             virBufferAddLit(buf, "<bond>\n");
             virBufferAdjustIndent(buf, 2);
-            for (i = 0; i < pcisrc->nmac; i++) {
+            if (virDomainNetIpsFormat(buf, pcisrc->net.ips, pcisrc->net.nips) < 0)
+                return -1;
+            if (virDomainNetRoutesFormat(buf, pcisrc->net.routes, pcisrc->net.nroutes) < 0)
+                return -1;
+
+            for (i = 0; i < pcisrc->net.nmacs; i++) {
                 virBufferAsprintf(buf, "<interface address='%s'/>\n",
-                                  virMacAddrFormat(&pcisrc->macs[i], macstr));
+                                  virMacAddrFormat(pcisrc->net.macs[i], macstr));
             }
             virBufferAdjustIndent(buf, -2);
             virBufferAddLit(buf, "</bond>\n");
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index e62979f..723f07b 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -447,14 +447,28 @@ struct _virDomainHostdevSubsysUSB {
     unsigned product;
 };
 
+typedef struct _virDomainNetIpDef virDomainNetIpDef;
+typedef virDomainNetIpDef *virDomainNetIpDefPtr;
+struct _virDomainNetIpDef {
+    virSocketAddr address;       /* ipv4 or ipv6 address */
+    unsigned int prefix; /* number of 1 bits in the net mask */
+};
+
 typedef struct _virDomainHostdevSubsysPCI virDomainHostdevSubsysPCI;
 typedef virDomainHostdevSubsysPCI *virDomainHostdevSubsysPCIPtr;
 struct _virDomainHostdevSubsysPCI {
     virDevicePCIAddress addr; /* host address */
     int backend; /* enum virDomainHostdevSubsysPCIBackendType */
     int device;  /* enum virDomainHostdevSubsysPCIDeviceType */
-    size_t nmac;
-    virMacAddr* macs;
+
+    struct {
+        size_t nips;
+        virDomainNetIpDefPtr *ips;
+        size_t nroutes;
+        virNetworkRouteDefPtr *routes;
+        size_t nmacs;
+        virMacAddrPtr *macs;
+    } net;
 };
 
 typedef struct _virDomainHostdevSubsysSCSIHost virDomainHostdevSubsysSCSIHost;
@@ -507,12 +521,6 @@ typedef enum {
     VIR_DOMAIN_HOSTDEV_CAPS_TYPE_LAST
 } virDomainHostdevCapsType;
 
-typedef struct _virDomainNetIpDef virDomainNetIpDef;
-typedef virDomainNetIpDef *virDomainNetIpDefPtr;
-struct _virDomainNetIpDef {
-    virSocketAddr address;       /* ipv4 or ipv6 address */
-    unsigned int prefix; /* number of 1 bits in the net mask */
-};
 
 typedef struct _virDomainHostdevCaps virDomainHostdevCaps;
 typedef virDomainHostdevCaps *virDomainHostdevCapsPtr;
diff --git a/src/conf/networkcommon_conf.c b/src/conf/networkcommon_conf.c
index 7b7a851..c11baf6 100644
--- a/src/conf/networkcommon_conf.c
+++ b/src/conf/networkcommon_conf.c
@@ -32,23 +32,6 @@
 
 #define VIR_FROM_THIS VIR_FROM_NETWORK
 
-struct _virNetworkRouteDef {
-    char *family;               /* ipv4 or ipv6 - default is ipv4 */
-    virSocketAddr address;      /* Routed Network IP address */
-
-    /* One or the other of the following two will be used for a given
-     * Network address, but never both. The parser guarantees this.
-     * The virSocketAddrGetIpPrefix() can be used to get a
-     * valid prefix.
-     */
-    virSocketAddr netmask;      /* ipv4 - either netmask or prefix specified */
-    unsigned int prefix;        /* ipv6 - only prefix allowed */
-    bool has_prefix;            /* prefix= was specified */
-    unsigned int metric;        /* value for metric (defaults to 1) */
-    bool has_metric;            /* metric= was specified */
-    virSocketAddr gateway;      /* gateway IP address for ip-route */
-};
-
 void
 virNetworkRouteDefFree(virNetworkRouteDefPtr def)
 {
diff --git a/src/conf/networkcommon_conf.h b/src/conf/networkcommon_conf.h
index 1500d0f..a9f58e8 100644
--- a/src/conf/networkcommon_conf.h
+++ b/src/conf/networkcommon_conf.h
@@ -35,6 +35,23 @@
 typedef struct _virNetworkRouteDef virNetworkRouteDef;
 typedef virNetworkRouteDef *virNetworkRouteDefPtr;
 
+struct _virNetworkRouteDef {
+    char *family;               /* ipv4 or ipv6 - default is ipv4 */
+    virSocketAddr address;      /* Routed Network IP address */
+
+    /* One or the other of the following two will be used for a given
+     * Network address, but never both. The parser guarantees this.
+     * The virSocketAddrGetIpPrefix() can be used to get a
+     * valid prefix.
+     */
+    virSocketAddr netmask;      /* ipv4 - either netmask or prefix specified */
+    unsigned int prefix;        /* ipv6 - only prefix allowed */
+    bool has_prefix;            /* prefix= was specified */
+    unsigned int metric;        /* value for metric (defaults to 1) */
+    bool has_metric;            /* metric= was specified */
+    virSocketAddr gateway;      /* gateway IP address for ip-route */
+};
+
 void
 virNetworkRouteDefFree(virNetworkRouteDefPtr def);
 
diff --git a/src/qemu/qemu_agent.c b/src/qemu/qemu_agent.c
index b8eba01..f9823e2 100644
--- a/src/qemu/qemu_agent.c
+++ b/src/qemu/qemu_agent.c
@@ -2208,11 +2208,14 @@ qemuAgentCreateBond(qemuAgentPtr mon,
     virDomainInterfacePtr *interfaceInfo = NULL;
     virDomainInterfacePtr interface;
     virJSONValuePtr new_interface = NULL;
+    virJSONValuePtr ip_interface = NULL;
     virJSONValuePtr subInterfaces = NULL;
     virJSONValuePtr subInterface = NULL;
     int len;
 
-    if (!(pcisrc->nmac || pcisrc->macs))
+    if (!(pcisrc->net.nmacs &&
+          pcisrc->net.nips &&
+          pcisrc->net.nroutes))
         return ret;
 
     len = qemuAgentGetInterfaces(mon, &interfaceInfo);
@@ -2231,11 +2234,60 @@ qemuAgentCreateBond(qemuAgentPtr mon,
     if (virJSONValueObjectAppendString(new_interface, "onboot", "onboot") < 0)
         goto cleanup;
 
+    if (virJSONValueObjectAppendString(new_interface,
+                                       "options",
+                                       "mode=active-backup miimon=100 updelay=10") < 0)
+        goto cleanup;
+
+    if (!(ip_interface = virJSONValueNewObject()))
+        goto cleanup;
+
+    if (pcisrc->net.nips) {
+        /* the first valid */
+        virSocketAddrPtr address = &pcisrc->net.ips[0]->address;
+        char *ipStr = virSocketAddrFormat(address);
+        const char *familyStr = NULL;
+
+        if (virJSONValueObjectAppendString(ip_interface, "ip-address", ipStr) < 0)
+            goto cleanup;
+        VIR_FREE(ipStr);
+
+        if (VIR_SOCKET_ADDR_IS_FAMILY(address, AF_INET6))
+            familyStr = "ipv6";
+        else if (VIR_SOCKET_ADDR_IS_FAMILY(address, AF_INET))
+            familyStr = "ipv4";
+
+        if (familyStr)
+            if (virJSONValueObjectAppendString(ip_interface, "ip-address-type", familyStr) < 0)
+                goto cleanup;
+        if (pcisrc->net.ips[0]->prefix != 0)
+            if (virJSONValueObjectAppendNumberInt(ip_interface, "prefix",
+                                                  pcisrc->net.ips[0]->prefix) < 0)
+                goto cleanup;
+    }
+
+    if (pcisrc->net.nroutes) {
+        /* the first valid */
+        char *addr = NULL;
+        virSocketAddrPtr gateway = &pcisrc->net.routes[0]->gateway;
+
+        if (!(addr = virSocketAddrFormat(gateway)))
+            goto cleanup;
+        if (virJSONValueObjectAppendString(ip_interface, "gateway", addr) < 0)
+            goto cleanup;
+        VIR_FREE(addr);
+    }
+
+    if ((pcisrc->net.nroutes ||
+         pcisrc->net.nips) &&
+        virJSONValueObjectAppend(new_interface, "ip-address", ip_interface) < 0)
+        goto cleanup;
+
     if (!(subInterfaces = virJSONValueNewArray()))
         goto cleanup;
 
-    for (i = 0; i < pcisrc->nmac; i++) {
-        virMacAddrFormat(&pcisrc->macs[i], macstr);
+    for (i = 0; i < pcisrc->net.nmacs; i++) {
+        virMacAddrFormat(pcisrc->net.macs[i], macstr);
         interface = findInterfaceByMac(interfaceInfo, len, macstr);
         if (!interface) {
             goto cleanup;
-- 
1.9.3


* [Qemu-devel] [RFC 6/7] migrate: hot remove hostdev at perform phase for bond device
  2015-04-17  8:53 [Qemu-devel] [RFC 0/7] Live Migration with Pass-through Devices proposal Chen Fan
                   ` (4 preceding siblings ...)
  2015-04-17  8:53 ` [Qemu-devel] [RFC 5/7] hostdev: add parse ip and route for bond configure Chen Fan
@ 2015-04-17  8:53 ` Chen Fan
  2015-04-17  8:53 ` [Qemu-devel] [RFC 7/7] migrate: add hostdev migrate status to support hostdev migration Chen Fan
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 45+ messages in thread
From: Chen Fan @ 2015-04-17  8:53 UTC (permalink / raw)
  To: libvir-list; +Cc: izumi.taku, qemu-devel

For a bond device we can support migration: we simply hot-remove the device
on the source side, and after migration ends we hot-add the new device on the
destination side.
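
As a usage sketch (host names and file paths are placeholders), migrating a domain
that contains such a bond-type hostdev would then be the ordinary live migration,
with the destination-side hostdev description supplied via --xml as described in
the cover letter:

  virsh migrate --live --persistent --xml dest-guest.xml guest qemu+ssh://dest-host/system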

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
---
 src/qemu/qemu_driver.c    | 57 +++++++++++++++++++++++++++++++++++++++++++++++
 src/qemu/qemu_migration.c |  7 ++++++
 2 files changed, 64 insertions(+)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 7368145..0ba9e4a 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -12353,6 +12353,58 @@ qemuDomainMigrateBegin3(virDomainPtr domain,
                               cookieout, cookieoutlen, flags);
 }
 
+static int
+qemuDomainRemovePciPassThruDevices(virConnectPtr conn,
+                                   virDomainObjPtr vm)
+{
+    virQEMUDriverPtr driver = conn->privateData;
+    virDomainDeviceDef dev;
+    virDomainDeviceDefPtr dev_copy = NULL;
+    virCapsPtr caps = NULL;
+    int ret = -1;
+    size_t i;
+
+    if (!(caps = virQEMUDriverGetCapabilities(driver, false)))
+        goto cleanup;
+
+    if (!qemuMigrationJobIsActive(vm, QEMU_ASYNC_JOB_MIGRATION_OUT))
+        goto cleanup;
+
+    /* unplug passthrough bond device */
+    for (i = 0; i < vm->def->nhostdevs; i++) {
+        virDomainHostdevDefPtr hostdev = vm->def->hostdevs[i];
+
+        if (hostdev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS &&
+            hostdev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI &&
+            hostdev->source.subsys.u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO &&
+            hostdev->source.subsys.u.pci.device == VIR_DOMAIN_HOSTDEV_PCI_DEVICE_BOND) {
+
+            dev.type = VIR_DOMAIN_DEVICE_HOSTDEV;
+            dev.data.hostdev = hostdev;
+
+            dev_copy = virDomainDeviceDefCopy(&dev, vm->def, caps, driver->xmlopt);
+            if (!dev_copy)
+                goto cleanup;
+
+            if (qemuDomainDetachHostDevice(driver, vm, dev_copy) < 0) {
+                virDomainDeviceDefFree(dev_copy);
+                goto cleanup;
+            }
+
+            virDomainDeviceDefFree(dev_copy);
+            if (qemuDomainUpdateDeviceList(driver, vm, QEMU_ASYNC_JOB_NONE) < 0)
+                goto cleanup;
+        }
+    }
+
+    ret = 0;
+
+ cleanup:
+    virObjectUnref(caps);
+
+    return ret;
+}
+
 static char *
 qemuDomainMigrateBegin3Params(virDomainPtr domain,
                               virTypedParameterPtr params,
@@ -12688,6 +12740,11 @@ qemuDomainMigratePerform3Params(virDomainPtr dom,
         return -1;
     }
 
+    if (qemuDomainRemovePciPassThruDevices(dom->conn, vm) < 0) {
+        qemuDomObjEndAPI(&vm);
+        return -1;
+    }
+
     return qemuMigrationPerform(driver, dom->conn, vm, dom_xml,
                                 dconnuri, uri, graphicsuri, listenAddress,
                                 cookiein, cookieinlen, cookieout, cookieoutlen,
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index 611f53a..9ea83df 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -2000,6 +2000,13 @@ qemuMigrationIsAllowed(virQEMUDriverPtr driver, virDomainObjPtr vm,
     forbid = false;
     for (i = 0; i < def->nhostdevs; i++) {
         virDomainHostdevDefPtr hostdev = def->hostdevs[i];
+
+        if (hostdev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS &&
+            hostdev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI &&
+            hostdev->source.subsys.u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO &&
+            hostdev->source.subsys.u.pci.device == VIR_DOMAIN_HOSTDEV_PCI_DEVICE_BOND)
+            continue;
+
         if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS ||
             hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_USB) {
             forbid = true;
-- 
1.9.3


* [Qemu-devel] [RFC 7/7] migrate: add hostdev migrate status to support hostdev migration
  2015-04-17  8:53 [Qemu-devel] [RFC 0/7] Live Migration with Pass-through Devices proposal Chen Fan
                   ` (5 preceding siblings ...)
  2015-04-17  8:53 ` [Qemu-devel] [RFC 6/7] migrate: hot remove hostdev at perform phase for bond device Chen Fan
@ 2015-04-17  8:53 ` Chen Fan
  2015-04-17  8:53 ` [Qemu-devel] [RFC 0/3] add support migration with passthrough device Chen Fan
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 45+ messages in thread
From: Chen Fan @ 2015-04-17  8:53 UTC (permalink / raw)
  To: libvir-list; +Cc: izumi.taku, qemu-devel

We add a migrate state for hostdevs to indicate that the device doesn't need to be
initialized when the VM starts up; after migration ends, the hostdev carrying this
migrate state is added, so hostdev migration can be supported.

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
---
 src/conf/domain_conf.c    |  3 ++
 src/conf/domain_conf.h    |  7 ++++
 src/qemu/qemu_command.c   |  3 ++
 src/qemu/qemu_driver.c    | 53 +--------------------------
 src/qemu/qemu_hotplug.c   |  8 +++--
 src/qemu/qemu_migration.c | 92 ++++++++++++++++++++++++++++++++++++++++++++---
 src/qemu/qemu_migration.h |  4 +++
 src/util/virhostdev.c     |  3 ++
 8 files changed, 114 insertions(+), 59 deletions(-)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 7d1cd3e..b56c6fa 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -3035,6 +3035,9 @@ virDomainDeviceInfoIterateInternal(virDomainDefPtr def,
     device.type = VIR_DOMAIN_DEVICE_HOSTDEV;
     for (i = 0; i < def->nhostdevs; i++) {
         device.data.hostdev = def->hostdevs[i];
+        if (device.data.hostdev->state == VIR_DOMAIN_HOSTDEV_STATE_READY_FOR_MIGRATE)
+            continue;
+
         if (cb(def, &device, def->hostdevs[i]->info, opaque) < 0)
             return -1;
     }
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 723f07b..4b7b4c9 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -543,6 +543,12 @@ struct _virDomainHostdevCaps {
     } u;
 };
 
+typedef enum {
+    VIR_DOMAIN_HOSTDEV_STATE_DEFAULT,
+    VIR_DOMAIN_HOSTDEV_STATE_READY_FOR_MIGRATE,
+
+    VIR_DOMAIN_HOSTDEV_STATE_LAST
+} virDomainHostdevState;
 
 /* basic device for direct passthrough */
 struct _virDomainHostdevDef {
@@ -559,6 +565,7 @@ struct _virDomainHostdevDef {
     } source;
     virDomainHostdevOrigStates origstates;
     virDomainDeviceInfoPtr info; /* Guest address */
+    int state;
 };
 
 
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index e7e0937..dc5245a 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -10365,6 +10365,9 @@ qemuBuildCommandLine(virConnectPtr conn,
         virDomainHostdevDefPtr hostdev = def->hostdevs[i];
         char *devstr;
 
+        if (hostdev->state == VIR_DOMAIN_HOSTDEV_STATE_READY_FOR_MIGRATE)
+            continue;
+
         if (hostdev->info->bootIndex) {
             if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS ||
                 (hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI &&
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 0ba9e4a..4724171 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -12353,57 +12353,6 @@ qemuDomainMigrateBegin3(virDomainPtr domain,
                               cookieout, cookieoutlen, flags);
 }
 
-static int
-qemuDomainRemovePciPassThruDevices(virConnectPtr conn,
-                                   virDomainObjPtr vm)
-{
-    virQEMUDriverPtr driver = conn->privateData;
-    virDomainDeviceDef dev;
-    virDomainDeviceDefPtr dev_copy = NULL;
-    virCapsPtr caps = NULL;
-    int ret = -1;
-    size_t i;
-
-    if (!(caps = virQEMUDriverGetCapabilities(driver, false)))
-        goto cleanup;
-
-    if (!qemuMigrationJobIsActive(vm, QEMU_ASYNC_JOB_MIGRATION_OUT))
-        goto cleanup;
-
-    /* unplug passthrough bond device */
-    for (i = 0; i < vm->def->nhostdevs; i++) {
-        virDomainHostdevDefPtr hostdev = vm->def->hostdevs[i];
-
-        if (hostdev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS &&
-            hostdev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI &&
-            hostdev->source.subsys.u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO &&
-            hostdev->source.subsys.u.pci.device == VIR_DOMAIN_HOSTDEV_PCI_DEVICE_BOND) {
-
-            dev.type = VIR_DOMAIN_DEVICE_HOSTDEV;
-            dev.data.hostdev = hostdev;
-
-            dev_copy = virDomainDeviceDefCopy(&dev, vm->def, caps, driver->xmlopt);
-            if (!dev_copy)
-                goto cleanup;
-
-            if (qemuDomainDetachHostDevice(driver, vm, dev_copy) < 0) {
-                virDomainDeviceDefFree(dev_copy);
-                goto cleanup;
-            }
-
-            virDomainDeviceDefFree(dev_copy);
-            if (qemuDomainUpdateDeviceList(driver, vm, QEMU_ASYNC_JOB_NONE) < 0)
-                goto cleanup;
-        }
-    }
-
-    ret = 0;
-
- cleanup:
-    virObjectUnref(caps);
-
-    return ret;
-}
 
 static char *
 qemuDomainMigrateBegin3Params(virDomainPtr domain,
@@ -12740,7 +12689,7 @@ qemuDomainMigratePerform3Params(virDomainPtr dom,
         return -1;
     }
 
-    if (qemuDomainRemovePciPassThruDevices(dom->conn, vm) < 0) {
+    if (qemuDomainMigratePciPassThruDevices(driver, vm, false) < 0) {
         qemuDomObjEndAPI(&vm);
         return -1;
     }
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index f07c54d..13a7338 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -1239,8 +1239,9 @@ qemuDomainAttachHostPCIDevice(virQEMUDriverPtr driver,
     virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver);
     unsigned int flags = 0;
 
-    if (VIR_REALLOC_N(vm->def->hostdevs, vm->def->nhostdevs + 1) < 0)
-        return -1;
+    if (hostdev->state != VIR_DOMAIN_HOSTDEV_STATE_READY_FOR_MIGRATE)
+        if (VIR_REALLOC_N(vm->def->hostdevs, vm->def->nhostdevs + 1) < 0)
+            return -1;
 
     if (!cfg->relaxedACS)
         flags |= VIR_HOSTDEV_STRICT_ACS_CHECK;
@@ -1344,7 +1345,8 @@ qemuDomainAttachHostPCIDevice(virQEMUDriverPtr driver,
     if (ret < 0)
         goto error;
 
-    vm->def->hostdevs[vm->def->nhostdevs++] = hostdev;
+    if (hostdev->state != VIR_DOMAIN_HOSTDEV_STATE_READY_FOR_MIGRATE)
+        vm->def->hostdevs[vm->def->nhostdevs++] = hostdev;
 
     VIR_FREE(devstr);
     VIR_FREE(configfd_name);
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index 9ea83df..291cb9f 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -2001,10 +2001,7 @@ qemuMigrationIsAllowed(virQEMUDriverPtr driver, virDomainObjPtr vm,
     for (i = 0; i < def->nhostdevs; i++) {
         virDomainHostdevDefPtr hostdev = def->hostdevs[i];
 
-        if (hostdev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS &&
-            hostdev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI &&
-            hostdev->source.subsys.u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO &&
-            hostdev->source.subsys.u.pci.device == VIR_DOMAIN_HOSTDEV_PCI_DEVICE_BOND)
+        if (hostdev->state == VIR_DOMAIN_HOSTDEV_STATE_READY_FOR_MIGRATE)
             continue;
 
         if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS ||
@@ -2629,6 +2626,80 @@ qemuMigrationCleanup(virDomainObjPtr vm,
 }
 
 
+static void
+qemuMigrationSetStateForHostdev(virDomainDefPtr def,
+                                int state)
+{
+    virDomainHostdevDefPtr hostdev;
+    size_t i;
+
+    if (!def)
+        return;
+
+    for (i = 0; i < def->nhostdevs; i++) {
+        hostdev = def->hostdevs[i];
+
+        if (hostdev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS &&
+            hostdev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI &&
+            hostdev->source.subsys.u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO &&
+            hostdev->source.subsys.u.pci.device == VIR_DOMAIN_HOSTDEV_PCI_DEVICE_BOND)
+            hostdev->state = state;
+    }
+}
+
+
+int
+qemuDomainMigratePciPassThruDevices(virQEMUDriverPtr driver,
+                                    virDomainObjPtr vm,
+                                    bool isPlug)
+{
+    virDomainDeviceDef dev;
+    virDomainDeviceDefPtr dev_copy = NULL;
+    virDomainHostdevDefPtr hostdev;
+    virCapsPtr caps = NULL;
+    int ret = -1;
+    int i;
+
+    if (!(caps = virQEMUDriverGetCapabilities(driver, false)))
+        goto cleanup;
+
+    /* plug/unplug passthrough bond device */
+    for (i = vm->def->nhostdevs - 1; i >= 0; i--) {
+        hostdev = vm->def->hostdevs[i];
+
+        if (hostdev->state == VIR_DOMAIN_HOSTDEV_STATE_READY_FOR_MIGRATE) {
+            if (!isPlug) {
+                dev.type = VIR_DOMAIN_DEVICE_HOSTDEV;
+                dev.data.hostdev = hostdev;
+
+                dev_copy = virDomainDeviceDefCopy(&dev, vm->def, caps, driver->xmlopt);
+                if (!dev_copy)
+                    goto cleanup;
+
+                if (qemuDomainDetachHostDevice(driver, vm, dev_copy) < 0) {
+                    virDomainDeviceDefFree(dev_copy);
+                    goto cleanup;
+                }
+                virDomainDeviceDefFree(dev_copy);
+            } else {
+                qemuMigrationSetStateForHostdev(vm->def, VIR_DOMAIN_HOSTDEV_STATE_DEFAULT);
+                if (qemuDomainAttachHostDevice(NULL, driver, vm, hostdev) < 0)
+                    goto cleanup;
+            }
+            if (qemuDomainUpdateDeviceList(driver, vm, QEMU_ASYNC_JOB_NONE) < 0)
+                goto cleanup;
+        }
+    }
+
+    ret = 0;
+
+ cleanup:
+    virObjectUnref(caps);
+
+    return ret;
+}
+
+
 /* The caller is supposed to lock the vm and start a migration job. */
 static char
 *qemuMigrationBeginPhase(virQEMUDriverPtr driver,
@@ -2662,6 +2733,8 @@ static char
     if (priv->job.asyncJob == QEMU_ASYNC_JOB_MIGRATION_OUT)
         qemuMigrationJobSetPhase(driver, vm, QEMU_MIGRATION_PHASE_BEGIN3);
 
+    qemuMigrationSetStateForHostdev(vm->def, VIR_DOMAIN_HOSTDEV_STATE_READY_FOR_MIGRATE);
+
     if (!qemuMigrationIsAllowed(driver, vm, NULL, true, abort_on_error))
         goto cleanup;
 
@@ -2885,6 +2958,8 @@ qemuMigrationPrepareAny(virQEMUDriverPtr driver,
     if (!(caps = virQEMUDriverGetCapabilities(driver, false)))
         goto cleanup;
 
+    qemuMigrationSetStateForHostdev(*def, VIR_DOMAIN_HOSTDEV_STATE_READY_FOR_MIGRATE);
+
     if (!qemuMigrationIsAllowed(driver, NULL, *def, true, abort_on_error))
         goto cleanup;
 
@@ -5315,6 +5390,13 @@ qemuMigrationFinish(virQEMUDriverPtr driver,
             goto endjob;
         }
 
+        /* re-attach hostdevs previously marked for migration */
+        if (qemuDomainMigratePciPassThruDevices(driver, vm, true) < 0) {
+            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                           _("failed to re-attach passthrough hostdev"));
+            goto endjob;
+        }
+
         /* Guest is successfully running, so cancel previous auto destroy */
         qemuProcessAutoDestroyRemove(driver, vm);
     } else if (!(flags & VIR_MIGRATE_OFFLINE)) {
@@ -5331,6 +5413,8 @@ qemuMigrationFinish(virQEMUDriverPtr driver,
         VIR_WARN("Unable to encode migration cookie");
 
  endjob:
+    qemuMigrationSetStateForHostdev(vm->def, VIR_DOMAIN_HOSTDEV_STATE_DEFAULT);
+
     qemuMigrationJobFinish(driver, vm);
     if (!vm->persistent && !virDomainObjIsActive(vm))
         qemuDomainRemoveInactive(driver, vm);
diff --git a/src/qemu/qemu_migration.h b/src/qemu/qemu_migration.h
index 1726455..fa21752 100644
--- a/src/qemu/qemu_migration.h
+++ b/src/qemu/qemu_migration.h
@@ -177,4 +177,8 @@ int qemuMigrationToFile(virQEMUDriverPtr driver, virDomainObjPtr vm,
     ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(5)
     ATTRIBUTE_RETURN_CHECK;
 
+int qemuDomainMigratePciPassThruDevices(virQEMUDriverPtr driver,
+                                        virDomainObjPtr vm,
+                                        bool isPlug);
+
 #endif /* __QEMU_MIGRATION_H__ */
diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c
index f583e54..4b6152a 100644
--- a/src/util/virhostdev.c
+++ b/src/util/virhostdev.c
@@ -206,6 +206,9 @@ virHostdevGetPCIHostDeviceList(virDomainHostdevDefPtr *hostdevs, int nhostdevs)
         virDomainHostdevSubsysPCIPtr pcisrc = &hostdev->source.subsys.u.pci;
         virPCIDevicePtr dev;
 
+        if (hostdev->state == VIR_DOMAIN_HOSTDEV_STATE_READY_FOR_MIGRATE)
+            continue;
+
         if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS)
             continue;
         if (hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI)
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [Qemu-devel] [RFC 0/3] add support migration with passthrough device
  2015-04-17  8:53 [Qemu-devel] [RFC 0/7] Live Migration with Pass-through Devices proposal Chen Fan
                   ` (6 preceding siblings ...)
  2015-04-17  8:53 ` [Qemu-devel] [RFC 7/7] migrate: add hostdev migrate status to support hostdev migration Chen Fan
@ 2015-04-17  8:53 ` Chen Fan
  2015-04-17  8:53   ` [Qemu-devel] [RFC 1/3] qemu-agent: add guest-network-set-interface command Chen Fan
                     ` (2 more replies)
  2015-04-19 22:29 ` [Qemu-devel] [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal Laine Stump
  2015-04-22  9:23 ` [Qemu-devel] " Daniel P. Berrange
  9 siblings, 3 replies; 45+ messages in thread
From: Chen Fan @ 2015-04-17  8:53 UTC (permalink / raw)
  To: libvir-list; +Cc: izumi.taku, qemu-devel

These patches add qemu-guest-agent commands that allow libvirt to support
migration with passthrough devices using existing features.

Chen Fan (3):
  qemu-agent: add guest-network-set-interface command
  qemu-agent: add guest-network-delete-interface command
  qemu-agent: add notify for qemu-ga boot

 configure            |  16 +++
 qga/commands-posix.c | 312 +++++++++++++++++++++++++++++++++++++++++++++++++++
 qga/commands-win32.c |  13 +++
 qga/main.c           |  13 +++
 qga/qapi-schema.json |  65 +++++++++++
 5 files changed, 419 insertions(+)

-- 
1.9.3

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Qemu-devel] [RFC 1/3] qemu-agent: add guest-network-set-interface command
  2015-04-17  8:53 ` [Qemu-devel] [RFC 0/3] add support migration with passthrough device Chen Fan
@ 2015-04-17  8:53   ` Chen Fan
  2015-05-21 13:52     ` Olga Krishtal
  2015-04-17  8:53   ` [Qemu-devel] [RFC 2/3] qemu-agent: add guest-network-delete-interface command Chen Fan
  2015-04-17  8:53   ` [Qemu-devel] [RFC 3/3] qemu-agent: add notify for qemu-ga boot Chen Fan
  2 siblings, 1 reply; 45+ messages in thread
From: Chen Fan @ 2015-04-17  8:53 UTC (permalink / raw)
  To: libvir-list; +Cc: izumi.taku, qemu-devel

QEMU already supports hotplug of physical NICs for high network throughput,
but passthrough NICs conflict with live migration. To keep network
connectivity across a migration, a bond device can be used in the guest: the
bonding driver enslaves multiple network interfaces into a single "bond"
interface, and its active-backup mode provides automatic failover. This patch
adds a guest-network-set-interface command for creating such a bond device,
so that management software can create one dynamically while the guest is
running.
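
For illustration only, assuming a guest running an agent built with these
patches (and with netcf available), the command could be exercised by hand
roughly like this; the domain name, interface names, addresses and options
are all made up:

  virsh qemu-agent-command demo-guest '{
    "execute": "guest-network-set-interface",
    "arguments": { "interface": {
      "type": "bond", "name": "bond0", "onboot": "onboot",
      "options": "mode=active-backup miimon=100",
      "ip-address": { "ip-address": "192.168.122.50",
                      "ip-address-type": "ipv4",
                      "prefix": 24,
                      "gateway": "192.168.122.1" },
      "subInterfaces": [ { "name": "eth0" }, { "name": "eth1" } ] } } }'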

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
---
 configure            |  16 ++++
 qga/commands-posix.c | 261 +++++++++++++++++++++++++++++++++++++++++++++++++++
 qga/commands-win32.c |   7 ++
 qga/qapi-schema.json |  54 +++++++++++
 4 files changed, 338 insertions(+)

diff --git a/configure b/configure
index f185dd0..ebfcc6a 100755
--- a/configure
+++ b/configure
@@ -3618,6 +3618,18 @@ if test "$darwin" != "yes" -a "$mingw32" != "yes" -a "$solaris" != yes -a \
 fi
 
 ##########################################
+# Do we need netcf
+netcf=no
+cat > $TMPC << EOF
+#include <netcf.h>
+int main(void) { return 0; }
+EOF
+if compile_prog "" "-lnetcf" ; then
+    netcf=yes
+    libs_qga="$libs_qga -lnetcf"
+fi
+
+##########################################
 # spice probe
 if test "$spice" != "no" ; then
   cat > $TMPC << EOF
@@ -4697,6 +4709,10 @@ if test "$spice" = "yes" ; then
   echo "CONFIG_SPICE=y" >> $config_host_mak
 fi
 
+if test "$netcf" = "yes" ; then
+  echo "CONFIG_NETCF=y" >> $config_host_mak
+fi
+
 if test "$smartcard_nss" = "yes" ; then
   echo "CONFIG_SMARTCARD_NSS=y" >> $config_host_mak
   echo "NSS_LIBS=$nss_libs" >> $config_host_mak
diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index f6f3e3c..5ee7949 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -46,6 +46,10 @@ extern char **environ;
 #include <sys/socket.h>
 #include <net/if.h>
 
+#ifdef CONFIG_NETCF
+#include <netcf.h>
+#endif
+
 #ifdef FIFREEZE
 #define CONFIG_FSFREEZE
 #endif
@@ -1719,6 +1723,263 @@ error:
     return NULL;
 }
 
+#ifdef CONFIG_NETCF
+static const char *interface_type_string[] = {
+    "bond",
+};
+
+static const char *ip_address_type_string[] = {
+    "ipv4",
+    "ipv6",
+};
+
+static char *parse_options(const char *str, const char *needle)
+{
+    char *start, *end, *buffer = NULL;
+    char *ret = NULL;
+
+    buffer = g_strdup(str);
+    start = buffer;
+    if ((start = strstr(start, needle))) {
+        start += strlen(needle);
+        end = strchr(start, ' ');
+        if (end) {
+            *end = '\0';
+        }
+        if (strlen(start) == 0) {
+            goto cleanup;
+        }
+        ret = g_strdup(start);
+    }
+
+cleanup:
+    g_free(buffer);
+    return ret;
+}
+
+/**
+ * adjust_indent:
+ * @buffer: xml string buffer being built
+ * @indent: number of spaces to append before the next element
+ */
+static void adjust_indent(char **buffer, int indent)
+{
+    char spaces[1024];
+    int i;
+
+    if (!*buffer) {
+        return;
+    }
+
+    if (indent < 0 || indent >= 1024) {
+        return;
+    }
+    memset(spaces, 0, sizeof(spaces));
+    for (i = 0; i < indent; i++) {
+        spaces[i] = ' ';
+    }
+
+    sprintf(*buffer + strlen(*buffer), "%s", spaces);
+}
+
+static char *create_bond_interface(GuestNetworkInterface2 *interface)
+{
+    char *target_xml;
+
+    target_xml = g_malloc0(1024);
+    if (!target_xml) {
+        return NULL;
+    }
+
+    sprintf(target_xml, "<interface type='%s' name='%s'>\n",
+            interface_type_string[interface->type], interface->name);
+    adjust_indent(&target_xml, 2);
+    sprintf(target_xml + strlen(target_xml), "<start mode='%s'/>\n",
+            interface->has_onboot ? interface->onboot : "none");
+    if (interface->has_ip_address) {
+        GuestIpAddress *address_item = interface->ip_address;
+
+        adjust_indent(&target_xml, 2);
+        sprintf(target_xml + strlen(target_xml), "<protocol family='%s'>\n",
+                ip_address_type_string[address_item->ip_address_type]);
+        adjust_indent(&target_xml, 4);
+        sprintf(target_xml + strlen(target_xml), "<ip address='%s' prefix='%" PRId64 "'/>\n",
+                address_item->ip_address, address_item->prefix);
+        if (address_item->has_gateway) {
+            adjust_indent(&target_xml, 4);
+            sprintf(target_xml + strlen(target_xml), "<route gateway='%s'/>\n",
+                    address_item->gateway);
+        }
+        adjust_indent(&target_xml, 2);
+        sprintf(target_xml + strlen(target_xml), "%s\n", "</protocol>");
+    }
+
+    adjust_indent(&target_xml, 2);
+    if (interface->has_options) {
+        char *value;
+
+        value = parse_options(interface->options, "mode=");
+        if (value) {
+            sprintf(target_xml + strlen(target_xml), "<bond mode='%s'>\n",
+                    value);
+            g_free(value);
+        } else {
+            sprintf(target_xml + strlen(target_xml), "%s\n", "<bond>");
+        }
+
+        value = parse_options(interface->options, "miimon=");
+        if (value) {
+            adjust_indent(&target_xml, 4);
+            sprintf(target_xml + strlen(target_xml), "<miimon freq='%s'",
+                   value);
+            g_free(value);
+
+            value = parse_options(interface->options, "updelay=");
+            if (value) {
+                sprintf(target_xml + strlen(target_xml), " updelay='%s'",
+                        value);
+                g_free(value);
+            }
+            value = parse_options(interface->options, "downdelay=");
+            if (value) {
+                sprintf(target_xml + strlen(target_xml), " downdelay='%s'",
+                        value);
+                g_free(value);
+            }
+            value = parse_options(interface->options, "use_carrier=");
+            if (value) {
+                sprintf(target_xml + strlen(target_xml), " carrier='%s'",
+                        value);
+                g_free(value);
+            }
+
+            sprintf(target_xml + strlen(target_xml), "%s\n", "/>");
+        }
+
+        value = parse_options(interface->options, "arp_interval=");
+        if (value) {
+            adjust_indent(&target_xml, 4);
+            sprintf(target_xml + strlen(target_xml), "<arpmon interval='%s'",
+                    value);
+            g_free(value);
+
+            value = parse_options(interface->options, "arp_ip_target=");
+            if (value) {
+                sprintf(target_xml + strlen(target_xml), " target='%s'",
+                        value);
+                g_free(value);
+            }
+
+            value = parse_options(interface->options, "arp_validate=");
+            if (value) {
+                sprintf(target_xml + strlen(target_xml), " validate='%s'",
+                        value);
+                g_free(value);
+            }
+
+            sprintf(target_xml + strlen(target_xml), "%s\n", "/>");
+        }
+    } else {
+        sprintf(target_xml + strlen(target_xml), "%s\n", "<bond>");
+    }
+
+    if (interface->has_subInterfaces) {
+        GuestNetworkInterfaceList *head = interface->subInterfaces;
+
+        for (; head; head = head->next) {
+            adjust_indent(&target_xml, 4);
+            sprintf(target_xml + strlen(target_xml),
+                    "<interface type='ethernet' name='%s'/>\n",
+                    head->value->name);
+        }
+    }
+
+    adjust_indent(&target_xml, 2);
+    sprintf(target_xml + strlen(target_xml), "%s\n", "</bond>");
+    sprintf(target_xml + strlen(target_xml), "%s\n", "</interface>");
+
+    return target_xml;
+}
+
+static struct netcf *netcf;
+
+static void create_interface(GuestNetworkInterface2 *interface, Error **errp)
+{
+    int ret = -1;
+    struct netcf_if *iface;
+    unsigned int flags = 0;
+    char *target_xml;
+
+    /* open netcf */
+    if (netcf == NULL) {
+        if (ncf_init(&netcf, NULL) != 0) {
+            error_setg(errp, "netcf init failed");
+            return;
+        }
+    }
+
+    if (interface->type != GUEST_INTERFACE_TYPE_BOND) {
+        error_setg(errp, "unsupported interface type, only 'bond' is supported");
+        return;
+    }
+
+    target_xml = create_bond_interface(interface);
+    if (!target_xml) {
+        error_setg(errp, "not enough memory");
+        return;
+    }
+
+    iface = ncf_define(netcf, target_xml);
+    if (!iface) {
+        error_setg(errp, "netcf interface define failed");
+        g_free(target_xml);
+        goto cleanup;
+    }
+
+    g_free(target_xml);
+
+    if (ncf_if_status(iface, &flags) < 0) {
+        error_setg(errp, "netcf interface get status failed");
+        goto cleanup;
+    }
+
+    if (flags & NETCF_IFACE_ACTIVE) {
+        error_setg(errp, "interface is already running");
+        goto cleanup;
+    }
+
+    ret = ncf_if_up(iface);
+    if (ret < 0) {
+        error_setg(errp, "netcf interface up failed");
+        goto cleanup;
+    }
+
+ cleanup:
+    ncf_if_free(iface);
+}
+
+int64_t qmp_guest_network_set_interface(GuestNetworkInterface2 *interface,
+                                        Error **errp)
+{
+    Error *local_err = NULL;
+
+    create_interface(interface, &local_err);
+    if (local_err != NULL) {
+        error_propagate(errp, local_err);
+        return -1;
+    }
+
+    return 0;
+}
+#else
+int64_t qmp_guest_network_set_interface(GuestNetworkInterface2 *interface,
+                                        Error **errp)
+{
+    error_set(errp, QERR_UNSUPPORTED);
+    return -1;
+}
+#endif
+
 #define SYSCONF_EXACT(name, errp) sysconf_exact((name), #name, (errp))
 
 static long sysconf_exact(int name, const char *name_str, Error **errp)
diff --git a/qga/commands-win32.c b/qga/commands-win32.c
index 3bcbeae..4c14514 100644
--- a/qga/commands-win32.c
+++ b/qga/commands-win32.c
@@ -446,6 +446,13 @@ int64_t qmp_guest_set_vcpus(GuestLogicalProcessorList *vcpus, Error **errp)
     return -1;
 }
 
+int64_t qmp_guest_network_set_interface(GuestNetworkInterface2 *interface,
+                                        Error **errp)
+{
+    error_set(errp, QERR_UNSUPPORTED);
+    return -1;
+}
+
 /* add unsupported commands to the blacklist */
 GList *ga_command_blacklist_init(GList *blacklist)
 {
diff --git a/qga/qapi-schema.json b/qga/qapi-schema.json
index 376e79f..77f499b 100644
--- a/qga/qapi-schema.json
+++ b/qga/qapi-schema.json
@@ -556,6 +556,7 @@
 { 'type': 'GuestIpAddress',
   'data': {'ip-address': 'str',
            'ip-address-type': 'GuestIpAddressType',
+           '*gateway': 'str',
            'prefix': 'int'} }
 
 ##
@@ -575,6 +576,43 @@
            '*ip-addresses': ['GuestIpAddress'] } }
 
 ##
+# @GuestInterfaceType:
+#
+# An enumeration of supported interface types
+#
+# @bond: bond device
+#
+# Since: 2.3
+##
+{ 'enum': 'GuestInterfaceType',
+  'data': [ 'bond' ] }
+
+##
+# @GuestNetworkInterface2:
+#
+# @type: the interface type, one of the values of GuestInterfaceType.
+#
+# @name: the interface name.
+#
+# @onboot: the interface start mode.
+#
+# @ip-address: IP address to configure on the interface.
+#
+# @options: bonding driver options (e.g. "mode=active-backup miimon=100").
+#
+# @subInterfaces: the slave interfaces to enslave into the bond.
+#
+# Since: 2.3
+##
+{ 'type': 'GuestNetworkInterface2',
+  'data': {'type': 'GuestInterfaceType',
+           'name': 'str',
+           '*onboot': 'str',
+           '*ip-address': 'GuestIpAddress',
+           '*options': 'str',
+           '*subInterfaces': ['GuestNetworkInterface'] } }
+
+##
 # @guest-network-get-interfaces:
 #
 # Get list of guest IP addresses, MAC addresses
@@ -588,6 +626,22 @@
   'returns': ['GuestNetworkInterface'] }
 
 ##
+# @guest-network-set-interface:
+#
+# Set guest network interface
+#
+# Returns: 0 if the call succeeded,
+#
+#          -1 if the call failed.
+#
+#
+# Since: 2.3
+##
+{ 'command': 'guest-network-set-interface',
+  'data'   : {'interface': 'GuestNetworkInterface2' },
+  'returns': 'int' }
+
+##
 # @GuestLogicalProcessor:
 #
 # @logical-id: Arbitrary guest-specific unique identifier of the VCPU.
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [Qemu-devel] [RFC 2/3] qemu-agent: add guest-network-delete-interface command
  2015-04-17  8:53 ` [Qemu-devel] [RFC 0/3] add support migration with passthrough device Chen Fan
  2015-04-17  8:53   ` [Qemu-devel] [RFC 1/3] qemu-agent: add guest-network-set-interface command Chen Fan
@ 2015-04-17  8:53   ` Chen Fan
  2015-04-17  8:53   ` [Qemu-devel] [RFC 3/3] qemu-agent: add notify for qemu-ga boot Chen Fan
  2 siblings, 0 replies; 45+ messages in thread
From: Chen Fan @ 2015-04-17  8:53 UTC (permalink / raw)
  To: libvir-list; +Cc: izumi.taku, qemu-devel

Add a corresponding command to guest-network-set-interface.
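
For symmetry with guest-network-set-interface, a hypothetical invocation
(domain and interface name made up) would look roughly like:

  virsh qemu-agent-command demo-guest \
    '{ "execute": "guest-network-delete-interface",
       "arguments": { "name": "bond0" } }'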

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
---
 qga/commands-posix.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 qga/commands-win32.c |  6 ++++++
 qga/qapi-schema.json | 11 +++++++++++
 3 files changed, 68 insertions(+)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index 5ee7949..058085f 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -1971,6 +1971,51 @@ int64_t qmp_guest_network_set_interface(GuestNetworkInterface2 *interface,
 
     return 0;
 }
+
+int64_t qmp_guest_network_delete_interface(const char *name, Error **errp)
+{
+    struct netcf_if *iface;
+    int ret = -1;
+    unsigned int flags = 0;
+
+    /* open netcf */
+    if (netcf == NULL) {
+        if (ncf_init(&netcf, NULL) != 0) {
+            error_setg(errp, "netcf init failed");
+            return ret;
+        }
+    }
+
+    iface = ncf_lookup_by_name(netcf, name);
+    if (!iface) {
+       error_setg(errp, "couldn't find interface named '%s'", name);
+       return ret;
+    }
+
+    if (ncf_if_status(iface, &flags) < 0) {
+        error_setg(errp, "netcf interface get status failed");
+        goto cleanup;
+    }
+
+    if (flags & NETCF_IFACE_ACTIVE) {
+        ret = ncf_if_down(iface);
+        if (ret < 0) {
+            error_setg(errp, "netcf interface stop failed");
+            goto cleanup;
+        }
+    }
+
+    ret = ncf_if_undefine(iface);
+    if (ret < 0) {
+        error_setg(errp, "netcf interface delete failed");
+        goto cleanup;
+    }
+
+    ret = 0;
+cleanup:
+    ncf_if_free(iface);
+    return ret;
+}
 #else
 int64_t qmp_guest_network_set_interface(GuestNetworkInterface2 *interface,
                                         Error **errp)
@@ -1978,6 +2023,12 @@ int64_t qmp_guest_network_set_interface(GuestNetworkInterface2 *interface,
     error_set(errp, QERR_UNSUPPORTED);
     return -1;
 }
+
+int64_t qmp_guest_network_delete_interface(const char *name, Error **errp)
+{
+    error_set(errp, QERR_UNSUPPORTED);
+    return -1;
+}
 #endif
 
 #define SYSCONF_EXACT(name, errp) sysconf_exact((name), #name, (errp))
diff --git a/qga/commands-win32.c b/qga/commands-win32.c
index 4c14514..52f6e47 100644
--- a/qga/commands-win32.c
+++ b/qga/commands-win32.c
@@ -453,6 +453,12 @@ int64_t qmp_guest_network_set_interface(GuestNetworkInterface2 *interface,
     return -1;
 }
 
+int64_t qmp_guest_network_delete_interface(const char *name, Error **errp)
+{
+    error_set(errp, QERR_UNSUPPORTED);
+    return -1;
+}
+
 /* add unsupported commands to the blacklist */
 GList *ga_command_blacklist_init(GList *blacklist)
 {
diff --git a/qga/qapi-schema.json b/qga/qapi-schema.json
index 77f499b..b886f97 100644
--- a/qga/qapi-schema.json
+++ b/qga/qapi-schema.json
@@ -642,6 +642,17 @@
   'returns': 'int' }
 
 ##
+# @guest-network-delete-interface:
+#
+# @name: interface name.
+#
+# Since: 2.3
+##
+{ 'command': 'guest-network-delete-interface',
+  'data'   : {'name': 'str' },
+  'returns': 'int' }
+
+##
 # @GuestLogicalProcessor:
 #
 # @logical-id: Arbitrary guest-specific unique identifier of the VCPU.
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [Qemu-devel] [RFC 3/3] qemu-agent: add notify for qemu-ga boot
  2015-04-17  8:53 ` [Qemu-devel] [RFC 0/3] add support migration with passthrough device Chen Fan
  2015-04-17  8:53   ` [Qemu-devel] [RFC 1/3] qemu-agent: add guest-network-set-interface command Chen Fan
  2015-04-17  8:53   ` [Qemu-devel] [RFC 2/3] qemu-agent: add guest-network-delete-interface command Chen Fan
@ 2015-04-17  8:53   ` Chen Fan
  2015-04-21 23:38     ` Eric Blake
  2 siblings, 1 reply; 45+ messages in thread
From: Chen Fan @ 2015-04-17  8:53 UTC (permalink / raw)
  To: libvir-list; +Cc: izumi.taku, qemu-devel

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
---
 qga/main.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/qga/main.c b/qga/main.c
index 9939a2b..f011ce0 100644
--- a/qga/main.c
+++ b/qga/main.c
@@ -1170,6 +1170,19 @@ int main(int argc, char **argv)
         g_critical("failed to initialize guest agent channel");
         goto out_bad;
     }
+
+    /* notify the host that the agent channel is connected */
+    if (ga_state->channel) {
+        QDict *qdict = qdict_new();
+        int ret;
+
+        qdict_put_obj(qdict, "status", QOBJECT(qstring_from_str("connected")));
+        ret = send_response(s, QOBJECT(qdict));
+        if (ret < 0) {
+            g_warning("error sending connected status");
+        }
+    }
+
 #ifndef _WIN32
     g_main_loop_run(ga_state->main_loop);
 #else
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
  2015-04-17  8:53 [Qemu-devel] [RFC 0/7] Live Migration with Pass-through Devices proposal Chen Fan
                   ` (7 preceding siblings ...)
  2015-04-17  8:53 ` [Qemu-devel] [RFC 0/3] add support migration with passthrough device Chen Fan
@ 2015-04-19 22:29 ` Laine Stump
  2015-04-22  4:22   ` Chen Fan
  2015-04-23  8:34   ` Chen Fan
  2015-04-22  9:23 ` [Qemu-devel] " Daniel P. Berrange
  9 siblings, 2 replies; 45+ messages in thread
From: Laine Stump @ 2015-04-19 22:29 UTC (permalink / raw)
  To: Chen Fan, libvir-list; +Cc: qemu-devel

On 04/17/2015 04:53 AM, Chen Fan wrote:
> backgrond:
> Live migration is one of the most important features of virtualization technology.
> With regard to recent virtualization techniques, performance of network I/O is critical.
> Current network I/O virtualization (e.g. Para-virtualized I/O, VMDq) has a significant
> performance gap with native network I/O. Pass-through network devices have near
> native performance, however, they have thus far prevented live migration. No existing
> methods solve the problem of live migration with pass-through devices perfectly.
>
> There was an idea to solve the problem in website:
> https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
> Please refer to above document for detailed information.

This functionality has been on my mind/bug list for a long time, but I
haven't been able to pursue it much. See this BZ, along with the
original patches submitted by Shradha Shah from SolarFlare:

https://bugzilla.redhat.com/show_bug.cgi?id=896716

(I was a bit optimistic in my initial review of the patches - there are
actually a lot of issues that weren't handled by those patches.)

>
> So I think this problem maybe could be solved by using the combination of existing
> technology. and the following steps are we considering to implement:
>
> -  before boot VM, we anticipate to specify two NICs for creating bonding device
>    (one plugged and one virtual NIC) in XML. here we can specify the NIC's mac addresses
>    in XML, which could facilitate qemu-guest-agent to find the network interfaces in guest.

An interesting idea, but I think that is a 2nd level enhancement, not
necessary initially (and maybe not ever, due to the high possibility of
it being extremely difficult to get right in 100% of the cases).

>
> -  when qemu-guest-agent startup in guest it would send a notification to libvirt,
>    then libvirt will call the previous registered initialize callbacks. so through
>    the callback functions, we can create the bonding device according to the XML
>    configuration. and here we use netcf tool which can facilitate to create bonding device
>    easily.

This isn't quite making sense - the bond will be on the guest, which may
not have netcf installed. Anyway, I think it should be up to the guest's
own system network config to have the bond already setup. If you try to
impose it from outside that infrastructure, you run too much risk of
running afoul of something on the guest (e.g. NetworkManager)


>
> -  during migration, unplug the passthroughed NIC. then do native migration.

Correct. This is the most important part. But not just unplugging it,
you also need to wait until the unplug operation completes (it is
asynchronous). (After this point, the emulated NIC that is part of the
bond would get all of the traffic).

>
> -  on destination side, check whether need to hotplug new NIC according to specified XML.
>    usually, we use migrate "--xml" command option to specify the destination host NIC mac
>    address to hotplug a new NIC, because source side passthrough NIC mac address is different,
>    then hotplug the deivce according to the destination XML configuration.

Why does the MAC address need to be different? Are you suggesting doing
this with passed-through non-SRIOV NICs? An SRIOV virtual function gets
its MAC address from the libvirt config, so it's very simple to use the
same MAC address across the migration. Any network card that would be
able to do this on any sort of useful scale will be SRIOV-capable (or
should be replaced with one that is - some of them are not that expensive).


>
> TODO:
>   1.  when hot add a new NIC in destination side after migration finished, the NIC device
>       need to re-enslave on bonding device in guest. otherwise, it is offline. maybe
>       we should consider bonding driver to support add interfaces dynamically.

I never looked at the details of how SolarFlare's code handled the guest
side (they have/had their own patchset they maintained for some older
version of libvirt which integrated with some sort of enhanced bonding
driver on the guests). I assumed the bond driver could handle this
already, but have to say I never investigated.


>
> This is an example on how this might work, so I want to hear some voices about this scenario.
>
> Thanks,
> Chen
>
> Chen Fan (7):
>   qemu-agent: add agent init callback when detecting guest setup
>   qemu: add guest init event callback to do the initialize work for
>     guest
>   hostdev: add a 'bond' type element in <hostdev> element


Putting this into <hostdev> is the wrong approach, for two reasons: 1)
it doesn't account for the device to be used being in a different
address on the source and destination hosts, 2) the <interface> element
already has much of the config you need, and an interface type
supporting hostdev passthrough.

It has been possible to do passthrough of an SRIOV VF via <interface
type='hostdev'> for a long time now and, even better, via an <interface
type='network'> where the network pointed to contains a pool of VFs - As
long as the source and destination hosts both have networks with the
same name, libvirt will be able to find a currently available device on
the destination as it migrates from one host to another instead of
relying on both hosts having the exact same device at the exact same
address on the host and destination (and also magically unused by any
other guest). This page explains the use of a "hostdev network" which
has a pool of devices:

http://wiki.libvirt.org/page/Networking#Assignment_from_a_pool_of_SRIOV_VFs_in_a_libvirt_.3Cnetwork.3E_definition

This was designed specifically with the idea in mind that one day it
would be possible to migrate a domain with a hostdev device (as long as
the guest could handle the hostdev device being temporarily unplugged
during the migration).
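
For example (network name, PF name and MAC address are only illustrative),
the host network with a pool of VFs and the domain interface using it would
look roughly like:

  <network>
    <name>passthrough-pool</name>
    <forward mode='hostdev' managed='yes'>
      <pf dev='eth3'/>
    </forward>
  </network>

  <interface type='network'>
    <mac address='52:54:00:6d:90:02'/>
    <source network='passthrough-pool'/>
  </interface>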

>   qemu-agent: add qemuAgentCreateBond interface
>   hostdev: add parse ip and route for bond configure

Again, I think that this level of detail about the guest network config
belongs on the guest, not in libvirt.

>   migrate: hot remove hostdev at perform phase for bond device

^^ this is the useful part but I don't think the right method is to make
this action dependent on the device being a "bond".

I think that in this respect Shradha's patches had a better idea - any
hostdev (or, by implication <interface type='hostdev'> or, much more
usefully <interface type='network'> pointing to a pool of VFs - could
have an attribute "ephemeral". If ephemeral was "yes", then the device
would always be unplugged prior to migration and re-plugged when
migration was completed (the same thing should be done when
saving/restoring a domain which also can't currently be done with a
domain that has a passthrough device).

For that matter, this could be a general-purpose thing (although
probably most useful for hostdevs) - just make it possible for *any*
hotpluggable device to be "ephemeral"; the meaning of this would be that
every device marked as ephemeral should be unplugged prior to migration
or save (and libvirt should wait for qemu to notify that the unplug is
completed), and re-plugged right after the guest is restarted.

(possibly it should be implemented as an <ephemeral> *element* rather
than attribute, so that options could be specified).
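
Purely as a sketch of the idea (none of this syntax exists today), that could
end up looking something like:

  <interface type='network' ephemeral='yes'>
    <source network='passthrough-pool'/>
  </interface>

or, as an element that can carry options (the timeout attribute is invented):

  <interface type='network'>
    <source network='passthrough-pool'/>
    <ephemeral unplugTimeout='30'/>
  </interface>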

After that is implemented and works properly, then it might be the time
to think about auto-creating the bond (although again, my opinion is
that this is getting a bit too intrusive into the guest (and making it
more likely to fail - I know from long experience with netcf that it is
all too easy for some other service on the system (ahem) to mess up all
your hard work); I think it would be better to just let the guest deal
with setting up a bond in its system network config, and if the bond
driver can't handle having a device in the bond unplugging and plugging,
then the bond driver should be enhanced).


>   migrate: add hostdev migrate status to support hostdev migration
>
>  docs/schemas/basictypes.rng   |   6 ++
>  docs/schemas/domaincommon.rng |  37 ++++++++
>  src/conf/domain_conf.c        | 195 ++++++++++++++++++++++++++++++++++++++---
>  src/conf/domain_conf.h        |  40 +++++++--
>  src/conf/networkcommon_conf.c |  17 ----
>  src/conf/networkcommon_conf.h |  17 ++++
>  src/libvirt_private.syms      |   1 +
>  src/qemu/qemu_agent.c         | 196 +++++++++++++++++++++++++++++++++++++++++-
>  src/qemu/qemu_agent.h         |  12 +++
>  src/qemu/qemu_command.c       |   3 +
>  src/qemu/qemu_domain.c        |  70 +++++++++++++++
>  src/qemu/qemu_domain.h        |  14 +++
>  src/qemu/qemu_driver.c        |  38 ++++++++
>  src/qemu/qemu_hotplug.c       |   8 +-
>  src/qemu/qemu_migration.c     |  91 ++++++++++++++++++++
>  src/qemu/qemu_migration.h     |   4 +
>  src/qemu/qemu_process.c       |  32 +++++++
>  src/util/virhostdev.c         |   3 +
>  18 files changed, 745 insertions(+), 39 deletions(-)
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [RFC 3/3] qemu-agent: add notify for qemu-ga boot
  2015-04-17  8:53   ` [Qemu-devel] [RFC 3/3] qemu-agent: add notify for qemu-ga boot Chen Fan
@ 2015-04-21 23:38     ` Eric Blake
  0 siblings, 0 replies; 45+ messages in thread
From: Eric Blake @ 2015-04-21 23:38 UTC (permalink / raw)
  To: Chen Fan, libvir-list; +Cc: izumi.taku, qemu-devel


On 04/17/2015 02:53 AM, Chen Fan wrote:
> Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
> ---
>  qga/main.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)

I'm not sure that qga should be sending asynchronous messages (so far,
it only ever replies synchronously).  As it is, we already wired up a
qemu event that fires any time the guest opens or closes the virtio
connection powering the agent; libvirt can already use those events to
know when the agent has opened the connection, and is presumably ready
to listen to commands after first booting.  So I don't think this patch
is needed.
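
(The event in question is VSERPORT_CHANGE; on the monitor it looks roughly
like the following, with the port id and timestamp obviously depending on the
configuration:

  {"timestamp": {"seconds": 1429284000, "microseconds": 0},
   "event": "VSERPORT_CHANGE",
   "data": {"id": "channel0", "open": true}}

libvirt already listens for this event to track whether the agent is
connected.)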

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
  2015-04-19 22:29 ` [Qemu-devel] [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal Laine Stump
@ 2015-04-22  4:22   ` Chen Fan
  2015-04-23 14:14     ` Laine Stump
  2015-04-23  8:34   ` Chen Fan
  1 sibling, 1 reply; 45+ messages in thread
From: Chen Fan @ 2015-04-22  4:22 UTC (permalink / raw)
  To: Laine Stump, libvir-list; +Cc: izumi.taku, qemu-devel

Hi Laine,

Thanks for your review of my patches.

Do you know whether SolarFlare's patches have been updated since

https://www.redhat.com/archives/libvir-list/2012-November/msg01324.html

?

If not, I would like to go on and complete this work. ;)

Thanks,
Chen


On 04/20/2015 06:29 AM, Laine Stump wrote:
> On 04/17/2015 04:53 AM, Chen Fan wrote:
>> backgrond:
>> Live migration is one of the most important features of virtualization technology.
>> With regard to recent virtualization techniques, performance of network I/O is critical.
>> Current network I/O virtualization (e.g. Para-virtualized I/O, VMDq) has a significant
>> performance gap with native network I/O. Pass-through network devices have near
>> native performance, however, they have thus far prevented live migration. No existing
>> methods solve the problem of live migration with pass-through devices perfectly.
>>
>> There was an idea to solve the problem in website:
>> https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
>> Please refer to above document for detailed information.
> This functionality has been on my mind/bug list for a long time, but I
> haven't been able to pursue it much. See this BZ, along with the
> original patches submitted by Shradha Shah from SolarFlare:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=896716
>
> (I was a bit optimistic in my initial review of the patches - there are
> actually a lot of issues that weren't handled by those patches.)
>
>> So I think this problem maybe could be solved by using the combination of existing
>> technology. and the following steps are we considering to implement:
>>
>> -  before boot VM, we anticipate to specify two NICs for creating bonding device
>>     (one plugged and one virtual NIC) in XML. here we can specify the NIC's mac addresses
>>     in XML, which could facilitate qemu-guest-agent to find the network interfaces in guest.
> An interesting idea, but I think that is a 2nd level enhancement, not
> necessary initially (and maybe not ever, due to the high possibility of
> it being extremely difficult to get right in 100% of the cases).
>
>> -  when qemu-guest-agent startup in guest it would send a notification to libvirt,
>>     then libvirt will call the previous registered initialize callbacks. so through
>>     the callback functions, we can create the bonding device according to the XML
>>     configuration. and here we use netcf tool which can facilitate to create bonding device
>>     easily.
> This isn't quite making sense - the bond will be on the guest, which may
> not have netcf installed. Anyway, I think it should be up to the guest's
> own system network config to have the bond already setup. If you try to
> impose it from outside that infrastructure, you run too much risk of
> running afoul of something on the guest (e.g. NetworkManager)
>
>
>> -  during migration, unplug the passthroughed NIC. then do native migration.
> Correct. This is the most important part. But not just unplugging it,
> you also need to wait until the unplug operation completes (it is
> asynchronous). (After this point, the emulated NIC that is part of the
> bond would get all of the traffic).
>
>> -  on destination side, check whether need to hotplug new NIC according to specified XML.
>>     usually, we use migrate "--xml" command option to specify the destination host NIC mac
>>     address to hotplug a new NIC, because source side passthrough NIC mac address is different,
>>     then hotplug the deivce according to the destination XML configuration.
> Why does the MAC address need to be different? Are you suggesting doing
> this with passed-through non-SRIOV NICs? An SRIOV virtual function gets
> its MAC address from the libvirt config, so it's very simple to use the
> same MAC address across the migration. Any network card that would be
> able to do this on any sort of useful scale will be SRIOV-capable (or
> should be replaced with one that is - some of them are not that expensive).
>
>
>> TODO:
>>    1.  when hot add a new NIC in destination side after migration finished, the NIC device
>>        need to re-enslave on bonding device in guest. otherwise, it is offline. maybe
>>        we should consider bonding driver to support add interfaces dynamically.
> I never looked at the details of how SolarFlare's code handled the guest
> side (they have/had their own patchset they maintained for some older
> version of libvirt which integrated with some sort of enhanced bonding
> driver on the guests). I assumed the bond driver could handle this
> already, but have to say I never investigated.
>
>
>> This is an example on how this might work, so I want to hear some voices about this scenario.
>>
>> Thanks,
>> Chen
>>
>> Chen Fan (7):
>>    qemu-agent: add agent init callback when detecting guest setup
>>    qemu: add guest init event callback to do the initialize work for
>>      guest
>>    hostdev: add a 'bond' type element in <hostdev> element
>
> Putting this into <hostdev> is the wrong approach, for two reasons: 1)
> it doesn't account for the device to be used being in a different
> address on the source and destination hosts, 2) the <interface> element
> already has much of the config you need, and an interface type
> supporting hostdev passthrough.
>
> It has been possible to do passthrough of an SRIOV VF via <interface
> type='hostdev'> for a long time now and, even better, via an <interface
> type='network'> where the network pointed to contains a pool of VFs - As
> long as the source and destination hosts both have networks with the
> same name, libvirt will be able to find a currently available device on
> the destination as it migrates from one host to another instead of
> relying on both hosts having the exact same device at the exact same
> address on the host and destination (and also magically unused by any
> other guest). This page explains the use of a "hostdev network" which
> has a pool of devices:
>
> http://wiki.libvirt.org/page/Networking#Assignment_from_a_pool_of_SRIOV_VFs_in_a_libvirt_.3Cnetwork.3E_definition
>
> This was designed specifically with the idea in mind that one day it
> would be possible to migrate a domain with a hostdev device (as long as
> the guest could handle the hostdev device being temporarily unplugged
> during the migration).
>
>>    qemu-agent: add qemuAgentCreateBond interface
>>    hostdev: add parse ip and route for bond configure
> Again, I think that this level of detail about the guest network config
> belongs on the guest, not in libvirt.
>
>>    migrate: hot remove hostdev at perform phase for bond device
> ^^ this is the useful part but I don't think the right method is to make
> this action dependent on the device being a "bond".
>
> I think that in this respect Shradha's patches had a better idea - any
> hostdev (or, by implication <interface type='hostdev'> or, much more
> usefully <interface type='network'> pointing to a pool of VFs - could
> have an attribute "ephemeral". If ephemeral was "yes", then the device
> would always be unplugged prior to migration and re-plugged when
> migration was completed (the same thing should be done when
> saving/restoring a domain which also can't currently be done with a
> domain that has a passthrough device).
>
> For that matter, this could be a general-purpose thing (although
> probably most useful for hostdevs) - just make it possible for *any*
> hotpluggable device to be "ephemeral"; the meaning of this would be that
> every device marked as ephemeral should be unplugged prior to migration
> or save (and libvirt should wait for qemu to notify that the unplug is
> completed), and re-plugged right after the guest is restarted.
>
> (possibly it should be implemented as an <ephemeral> *element* rather
> than attribute, so that options could be specified).
>
> After that is implemented and works properly, then it might be the time
> to think about auto-creating the bond (although again, my opinion is
> that this is getting a bit too intrusive into the guest (and making it
> more likely to fail - I know from long experience with netcf that it is
> all too easy for some other service on the system (ahem) to mess up all
> your hard work); I think it would be better to just let the guest deal
> with setting up a bond in its system network config, and if the bond
> driver can't handle having a device in the bond unplugging and plugging,
> then the bond driver should be enhanced).
>
>
>>    migrate: add hostdev migrate status to support hostdev migration
>>
>>   docs/schemas/basictypes.rng   |   6 ++
>>   docs/schemas/domaincommon.rng |  37 ++++++++
>>   src/conf/domain_conf.c        | 195 ++++++++++++++++++++++++++++++++++++++---
>>   src/conf/domain_conf.h        |  40 +++++++--
>>   src/conf/networkcommon_conf.c |  17 ----
>>   src/conf/networkcommon_conf.h |  17 ++++
>>   src/libvirt_private.syms      |   1 +
>>   src/qemu/qemu_agent.c         | 196 +++++++++++++++++++++++++++++++++++++++++-
>>   src/qemu/qemu_agent.h         |  12 +++
>>   src/qemu/qemu_command.c       |   3 +
>>   src/qemu/qemu_domain.c        |  70 +++++++++++++++
>>   src/qemu/qemu_domain.h        |  14 +++
>>   src/qemu/qemu_driver.c        |  38 ++++++++
>>   src/qemu/qemu_hotplug.c       |   8 +-
>>   src/qemu/qemu_migration.c     |  91 ++++++++++++++++++++
>>   src/qemu/qemu_migration.h     |   4 +
>>   src/qemu/qemu_process.c       |  32 +++++++
>>   src/util/virhostdev.c         |   3 +
>>   18 files changed, 745 insertions(+), 39 deletions(-)
>>
> .
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [RFC 0/7] Live Migration with Pass-through Devices proposal
  2015-04-17  8:53 [Qemu-devel] [RFC 0/7] Live Migration with Pass-through Devices proposal Chen Fan
                   ` (8 preceding siblings ...)
  2015-04-19 22:29 ` [Qemu-devel] [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal Laine Stump
@ 2015-04-22  9:23 ` Daniel P. Berrange
  2015-04-22 13:05   ` Daniel P. Berrange
                     ` (2 more replies)
  9 siblings, 3 replies; 45+ messages in thread
From: Daniel P. Berrange @ 2015-04-22  9:23 UTC (permalink / raw)
  To: Chen Fan; +Cc: libvir-list, izumi.taku, qemu-devel

On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:
> backgrond:
> Live migration is one of the most important features of virtualization technology.
> With regard to recent virtualization techniques, performance of network I/O is critical.
> Current network I/O virtualization (e.g. Para-virtualized I/O, VMDq) has a significant
> performance gap with native network I/O. Pass-through network devices have near
> native performance, however, they have thus far prevented live migration. No existing
> methods solve the problem of live migration with pass-through devices perfectly.
> 
> There was an idea to solve the problem in website:
> https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
> Please refer to above document for detailed information.
> 
> So I think this problem maybe could be solved by using the combination of existing
> technology. and the following steps are we considering to implement:
> 
> -  before boot VM, we anticipate to specify two NICs for creating bonding device
>    (one plugged and one virtual NIC) in XML. here we can specify the NIC's mac addresses
>    in XML, which could facilitate qemu-guest-agent to find the network interfaces in guest.
> 
> -  when qemu-guest-agent startup in guest it would send a notification to libvirt,
>    then libvirt will call the previous registered initialize callbacks. so through
>    the callback functions, we can create the bonding device according to the XML
>    configuration. and here we use netcf tool which can facilitate to create bonding device
>    easily.

I'm not really clear on why libvirt/guest agent needs to be involved in this.
I think configuration of networking is really something that must be left to
the guest OS admin to control. I don't think the guest agent should be trying
to reconfigure guest networking itself, as that is inevitably going to conflict
with configuration attempted by things in the guest like NetworkManager or
systemd-networkd.

IOW, if you want to do this setup where the guest is given multiple NICs connected
to the same host LAN, then I think we should just let the gues admin configure
bonding in whatever manner they decide is best for their OS install.

> -  during migration, unplug the passthroughed NIC. then do native migration.
> 
> -  on destination side, check whether need to hotplug new NIC according to specified XML.
>    usually, we use migrate "--xml" command option to specify the destination host NIC mac
>    address to hotplug a new NIC, because source side passthrough NIC mac address is different,
>    then hotplug the deivce according to the destination XML configuration.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [RFC 0/7] Live Migration with Pass-through Devices proposal
  2015-04-22  9:23 ` [Qemu-devel] " Daniel P. Berrange
@ 2015-04-22 13:05   ` Daniel P. Berrange
  2015-04-22 17:01   ` Dr. David Alan Gilbert
  2015-05-19  9:07   ` [Qemu-devel] " Michael S. Tsirkin
  2 siblings, 0 replies; 45+ messages in thread
From: Daniel P. Berrange @ 2015-04-22 13:05 UTC (permalink / raw)
  To: Chen Fan; +Cc: libvir-list, izumi.taku, qemu-devel

On Wed, Apr 22, 2015 at 10:23:04AM +0100, Daniel P. Berrange wrote:
> On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:
> > backgrond:
> > Live migration is one of the most important features of virtualization technology.
> > With regard to recent virtualization techniques, performance of network I/O is critical.
> > Current network I/O virtualization (e.g. Para-virtualized I/O, VMDq) has a significant
> > performance gap with native network I/O. Pass-through network devices have near
> > native performance, however, they have thus far prevented live migration. No existing
> > methods solve the problem of live migration with pass-through devices perfectly.
> > 
> > There was an idea to solve the problem in website:
> > https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
> > Please refer to above document for detailed information.
> > 
> > So I think this problem maybe could be solved by using the combination of existing
> > technology. and the following steps are we considering to implement:
> > 
> > -  before boot VM, we anticipate to specify two NICs for creating bonding device
> >    (one plugged and one virtual NIC) in XML. here we can specify the NIC's mac addresses
> >    in XML, which could facilitate qemu-guest-agent to find the network interfaces in guest.
> > 
> > -  when qemu-guest-agent startup in guest it would send a notification to libvirt,
> >    then libvirt will call the previous registered initialize callbacks. so through
> >    the callback functions, we can create the bonding device according to the XML
> >    configuration. and here we use netcf tool which can facilitate to create bonding device
> >    easily.
> 
> I'm not really clear on why libvirt/guest agent needs to be involved in this.
> I think configuration of networking is really something that must be left to
> the guest OS admin to control. I don't think the guest agent should be trying
> to reconfigure guest networking itself, as that is inevitably going to conflict
> with configuration attempted by things in the guest like NetworkManager or
> systemd-networkd.
> 
> IOW, if you want to do this setup where the guest is given multiple NICs connected
> to the same host LAN, then I think we should just let the gues admin configure
> bonding in whatever manner they decide is best for their OS install.

Thinking about it some more I'm not even convinced this should need direct
support in libvirt or QEMU at all.  We already have the ability to hotplug
and unplug NICs, and the guest OS can be set up to run appropriate scripts
when a PCI hotadd/remove event occurs (e.g. via udev rules). So I think
this functionality can be done entirely within the mgmt application (oVirt
or OpenStack) and the guest OS.
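
For instance (purely an illustrative, untested sketch -- the bond name and
the VF driver match are assumptions about one particular setup), a guest
image could ship a udev rule along these lines to re-enslave a hot-added
NIC into an existing bond:

  # /etc/udev/rules.d/99-enslave-vf.rules  (sketch only)
  # when a NIC driven by ixgbevf appears, add it back into bond0
  ACTION=="add", SUBSYSTEM=="net", DRIVERS=="ixgbevf", RUN+="/usr/sbin/ip link set %k master bond0"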


Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [RFC 0/7] Live Migration with Pass-through Devices proposal
  2015-04-22  9:23 ` [Qemu-devel] " Daniel P. Berrange
  2015-04-22 13:05   ` Daniel P. Berrange
@ 2015-04-22 17:01   ` Dr. David Alan Gilbert
  2015-04-22 17:06     ` Daniel P. Berrange
  2015-05-19  9:07   ` [Qemu-devel] " Michael S. Tsirkin
  2 siblings, 1 reply; 45+ messages in thread
From: Dr. David Alan Gilbert @ 2015-04-22 17:01 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: Chen Fan, libvir-list, qemu-devel, izumi.taku

* Daniel P. Berrange (berrange@redhat.com) wrote:
> On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:
> > backgrond:
> > Live migration is one of the most important features of virtualization technology.
> > With regard to recent virtualization techniques, performance of network I/O is critical.
> > Current network I/O virtualization (e.g. Para-virtualized I/O, VMDq) has a significant
> > performance gap with native network I/O. Pass-through network devices have near
> > native performance, however, they have thus far prevented live migration. No existing
> > methods solve the problem of live migration with pass-through devices perfectly.
> > 
> > There was an idea to solve the problem in website:
> > https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
> > Please refer to above document for detailed information.
> > 
> > So I think this problem maybe could be solved by using the combination of existing
> > technology. and the following steps are we considering to implement:
> > 
> > -  before boot VM, we anticipate to specify two NICs for creating bonding device
> >    (one plugged and one virtual NIC) in XML. here we can specify the NIC's mac addresses
> >    in XML, which could facilitate qemu-guest-agent to find the network interfaces in guest.
> > 
> > -  when qemu-guest-agent startup in guest it would send a notification to libvirt,
> >    then libvirt will call the previous registered initialize callbacks. so through
> >    the callback functions, we can create the bonding device according to the XML
> >    configuration. and here we use netcf tool which can facilitate to create bonding device
> >    easily.
> 
> I'm not really clear on why libvirt/guest agent needs to be involved in this.
> I think configuration of networking is really something that must be left to
> the guest OS admin to control. I don't think the guest agent should be trying
> to reconfigure guest networking itself, as that is inevitably going to conflict
> with configuration attempted by things in the guest like NetworkManager or
> systemd-networkd.
> 
> IOW, if you want to do this setup where the guest is given multiple NICs connected
> to the same host LAN, then I think we should just let the gues admin configure
> bonding in whatever manner they decide is best for their OS install.

I disagree; there should be a way for the admin to avoid having to do this
manually; however it should interact well with existing management stuff.

At the simplest, something that marks the two NICs in a discoverable way
so that they can be recognized as being part of a set; with just that ID
system an installer or setup tool can notice them and offer to put them into
a bond automatically. I'd assume it would be possible to add a rule somewhere
that says anything with the same ID is automatically added to the bond.
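
To make that concrete (hypothetical only -- the ID_NIC_SET property and the
helper script below don't exist anywhere today, they are exactly the marker
and rule that would need to be defined and exposed to the guest somehow):

  # any NIC carrying the same set ID gets handed to a helper that bonds it
  ACTION=="add", SUBSYSTEM=="net", ENV{ID_NIC_SET}=="?*", RUN+="/usr/local/sbin/enslave-by-set %k $env{ID_NIC_SET}"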

However, I agree that you might be able to avoid having to do anything in the
guest agent.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [RFC 0/7] Live Migration with Pass-through Devices proposal
  2015-04-22 17:01   ` Dr. David Alan Gilbert
@ 2015-04-22 17:06     ` Daniel P. Berrange
  2015-04-22 17:12       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 45+ messages in thread
From: Daniel P. Berrange @ 2015-04-22 17:06 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Chen Fan, libvir-list, qemu-devel, izumi.taku

On Wed, Apr 22, 2015 at 06:01:56PM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrange (berrange@redhat.com) wrote:
> > On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:
> > > backgrond:
> > > Live migration is one of the most important features of virtualization technology.
> > > With regard to recent virtualization techniques, performance of network I/O is critical.
> > > Current network I/O virtualization (e.g. Para-virtualized I/O, VMDq) has a significant
> > > performance gap with native network I/O. Pass-through network devices have near
> > > native performance, however, they have thus far prevented live migration. No existing
> > > methods solve the problem of live migration with pass-through devices perfectly.
> > > 
> > > There was an idea to solve the problem in website:
> > > https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
> > > Please refer to above document for detailed information.
> > > 
> > > So I think this problem maybe could be solved by using the combination of existing
> > > technology. and the following steps are we considering to implement:
> > > 
> > > -  before boot VM, we anticipate to specify two NICs for creating bonding device
> > >    (one plugged and one virtual NIC) in XML. here we can specify the NIC's mac addresses
> > >    in XML, which could facilitate qemu-guest-agent to find the network interfaces in guest.
> > > 
> > > -  when qemu-guest-agent startup in guest it would send a notification to libvirt,
> > >    then libvirt will call the previous registered initialize callbacks. so through
> > >    the callback functions, we can create the bonding device according to the XML
> > >    configuration. and here we use netcf tool which can facilitate to create bonding device
> > >    easily.
> > 
> > I'm not really clear on why libvirt/guest agent needs to be involved in this.
> > I think configuration of networking is really something that must be left to
> > the guest OS admin to control. I don't think the guest agent should be trying
> > to reconfigure guest networking itself, as that is inevitably going to conflict
> > with configuration attempted by things in the guest like NetworkManager or
> > systemd-networkd.
> > 
> > IOW, if you want to do this setup where the guest is given multiple NICs connected
> > to the same host LAN, then I think we should just let the gues admin configure
> > bonding in whatever manner they decide is best for their OS install.
> 
> I disagree; there should be a way for the admin not to have to do this manually;
> however it should interact well with existing management stuff.
> 
> At the simplest, something that marks the two NICs in a discoverable way
> so that they can be seen that they're part of a set;  with just that ID system
> then an installer or setup tool can notice them and offer to put them into
> a bond automatically; I'd assume it would be possible to add a rule somewhere
> that said anything with the same ID would automatically be added to the bond.

I didn't mean the admin would literally configure stuff manually. I really
just meant that the guest OS itself should decide how it is done, whether
NetworkManager magically does the right thing, or the person building the
cloud disk image provides a magic udev rule, or $something else. I just
don't think that the QEMU guest agent should be involved, as that will
definitely trample all over other things that manage networking in the
guest.  I could see this being solved in the cloud disk images by using
cloud-init metadata to mark the NICs as being in a set, or perhaps there
is some magic you could define in SMBIOS tables, or something else again.
A cloud-init based solution wouldn't need any QEMU work, but an SMBIOS
solution might.
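
As a rough illustration of the cloud-init variant (a sketch from memory, not
a tested config; the interface names and MAC addresses are made up), the
metadata could hand the guest a network config declaring the pair as an
active-backup bond:

  network:
    version: 1
    config:
      - type: physical
        name: ens3
        mac_address: "52:54:00:12:34:56"    # emulated virtio NIC
      - type: physical
        name: ens4
        mac_address: "52:54:00:12:34:57"    # passed-through VF
      - type: bond
        name: bond0
        bond_interfaces: [ens3, ens4]
        params:
          bond-mode: active-backup
          bond-primary: ens4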

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [RFC 0/7] Live Migration with Pass-through Devices proposal
  2015-04-22 17:06     ` Daniel P. Berrange
@ 2015-04-22 17:12       ` Dr. David Alan Gilbert
  2015-04-22 17:15         ` Daniel P. Berrange
  0 siblings, 1 reply; 45+ messages in thread
From: Dr. David Alan Gilbert @ 2015-04-22 17:12 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: Chen Fan, libvir-list, qemu-devel, izumi.taku

* Daniel P. Berrange (berrange@redhat.com) wrote:
> On Wed, Apr 22, 2015 at 06:01:56PM +0100, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrange (berrange@redhat.com) wrote:
> > > On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:
> > > > backgrond:
> > > > Live migration is one of the most important features of virtualization technology.
> > > > With regard to recent virtualization techniques, performance of network I/O is critical.
> > > > Current network I/O virtualization (e.g. Para-virtualized I/O, VMDq) has a significant
> > > > performance gap with native network I/O. Pass-through network devices have near
> > > > native performance, however, they have thus far prevented live migration. No existing
> > > > methods solve the problem of live migration with pass-through devices perfectly.
> > > > 
> > > > There was an idea to solve the problem in website:
> > > > https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
> > > > Please refer to above document for detailed information.
> > > > 
> > > > So I think this problem maybe could be solved by using the combination of existing
> > > > technology. and the following steps are we considering to implement:
> > > > 
> > > > -  before boot VM, we anticipate to specify two NICs for creating bonding device
> > > >    (one plugged and one virtual NIC) in XML. here we can specify the NIC's mac addresses
> > > >    in XML, which could facilitate qemu-guest-agent to find the network interfaces in guest.
> > > > 
> > > > -  when qemu-guest-agent startup in guest it would send a notification to libvirt,
> > > >    then libvirt will call the previous registered initialize callbacks. so through
> > > >    the callback functions, we can create the bonding device according to the XML
> > > >    configuration. and here we use netcf tool which can facilitate to create bonding device
> > > >    easily.
> > > 
> > > I'm not really clear on why libvirt/guest agent needs to be involved in this.
> > > I think configuration of networking is really something that must be left to
> > > the guest OS admin to control. I don't think the guest agent should be trying
> > > to reconfigure guest networking itself, as that is inevitably going to conflict
> > > with configuration attempted by things in the guest like NetworkManager or
> > > systemd-networkd.
> > > 
> > > IOW, if you want to do this setup where the guest is given multiple NICs connected
> > > to the same host LAN, then I think we should just let the gues admin configure
> > > bonding in whatever manner they decide is best for their OS install.
> > 
> > I disagree; there should be a way for the admin not to have to do this manually;
> > however it should interact well with existing management stuff.
> > 
> > At the simplest, something that marks the two NICs in a discoverable way
> > so that they can be seen that they're part of a set;  with just that ID system
> > then an installer or setup tool can notice them and offer to put them into
> > a bond automatically; I'd assume it would be possible to add a rule somewhere
> > that said anything with the same ID would automatically be added to the bond.
> 
> I didn't mean the admin would literally configure stuff manually. I really
> just meant that the guest OS itself should decide how it is done, whether
> NetworkManager magically does the right thing, or the person building the
> cloud disk image provides a magic udev rule, or $something else. I just
> don't think that the QEMU guest agent should be involved, as that will
> definitely trample all over other things that manage networking in the
> guest.

OK, good, that's about the same level I was at.

> I could see this being solved in the cloud disk images by using
> cloud-init metadata to mark the NICs as being in a set, or perhaps there
> is some magic you could define in SMBIOS tables, or something else again.
> A cloud-init based solution wouldn't need any QEMU work, but an SMBIOS
> solution might.

Would either of these work with hotplug though?  I guess since the VM starts
off with the pair of NICs, when you remove one and add it back after
migration you don't need any more information added; so yes,
cloud-init or SMBIOS would do it.  (I was thinking of SMBIOS in the sense
that you get the device/slot numbering that NIC naming is sometimes based
on.)

What about the case where we hot-add a new NIC later on (not during migration)?
A normal hot-add of a NIC now turns into a hot-add of two new NICs; how
do we pass the information at hot-add time to make that work?

Dave

> 
> Regards,
> Daniel
> -- 
> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org              -o-             http://virt-manager.org :|
> |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [RFC 0/7] Live Migration with Pass-through Devices proposal
  2015-04-22 17:12       ` Dr. David Alan Gilbert
@ 2015-04-22 17:15         ` Daniel P. Berrange
  2015-04-22 17:20           ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 45+ messages in thread
From: Daniel P. Berrange @ 2015-04-22 17:15 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Chen Fan, libvir-list, qemu-devel, izumi.taku

On Wed, Apr 22, 2015 at 06:12:25PM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrange (berrange@redhat.com) wrote:
> > On Wed, Apr 22, 2015 at 06:01:56PM +0100, Dr. David Alan Gilbert wrote:
> > > * Daniel P. Berrange (berrange@redhat.com) wrote:
> > > > On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:
> > > > > backgrond:
> > > > > Live migration is one of the most important features of virtualization technology.
> > > > > With regard to recent virtualization techniques, performance of network I/O is critical.
> > > > > Current network I/O virtualization (e.g. Para-virtualized I/O, VMDq) has a significant
> > > > > performance gap with native network I/O. Pass-through network devices have near
> > > > > native performance, however, they have thus far prevented live migration. No existing
> > > > > methods solve the problem of live migration with pass-through devices perfectly.
> > > > > 
> > > > > There was an idea to solve the problem in website:
> > > > > https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
> > > > > Please refer to above document for detailed information.
> > > > > 
> > > > > So I think this problem maybe could be solved by using the combination of existing
> > > > > technology. and the following steps are we considering to implement:
> > > > > 
> > > > > -  before boot VM, we anticipate to specify two NICs for creating bonding device
> > > > >    (one plugged and one virtual NIC) in XML. here we can specify the NIC's mac addresses
> > > > >    in XML, which could facilitate qemu-guest-agent to find the network interfaces in guest.
> > > > > 
> > > > > -  when qemu-guest-agent startup in guest it would send a notification to libvirt,
> > > > >    then libvirt will call the previous registered initialize callbacks. so through
> > > > >    the callback functions, we can create the bonding device according to the XML
> > > > >    configuration. and here we use netcf tool which can facilitate to create bonding device
> > > > >    easily.
> > > > 
> > > > I'm not really clear on why libvirt/guest agent needs to be involved in this.
> > > > I think configuration of networking is really something that must be left to
> > > > the guest OS admin to control. I don't think the guest agent should be trying
> > > > to reconfigure guest networking itself, as that is inevitably going to conflict
> > > > with configuration attempted by things in the guest like NetworkManager or
> > > > systemd-networkd.
> > > > 
> > > > IOW, if you want to do this setup where the guest is given multiple NICs connected
> > > > to the same host LAN, then I think we should just let the gues admin configure
> > > > bonding in whatever manner they decide is best for their OS install.
> > > 
> > > I disagree; there should be a way for the admin not to have to do this manually;
> > > however it should interact well with existing management stuff.
> > > 
> > > At the simplest, something that marks the two NICs in a discoverable way
> > > so that they can be seen that they're part of a set;  with just that ID system
> > > then an installer or setup tool can notice them and offer to put them into
> > > a bond automatically; I'd assume it would be possible to add a rule somewhere
> > > that said anything with the same ID would automatically be added to the bond.
> > 
> > I didn't mean the admin would literally configure stuff manually. I really
> > just meant that the guest OS itself should decide how it is done, whether
> > NetworkManager magically does the right thing, or the person building the
> > cloud disk image provides a magic udev rule, or $something else. I just
> > don't think that the QEMU guest agent should be involved, as that will
> > definitely trample all over other things that manage networking in the
> > guest.
> 
> OK, good, that's about the same level I was at.
> 
> > I could see this being solved in the cloud disk images by using
> > cloud-init metadata to mark the NICs as being in a set, or perhaps there
> > is some magic you could define in SMBIOS tables, or something else again.
> > A cloud-init based solution wouldn't need any QEMU work, but an SMBIOS
> > solution might.
> 
> Would either of these work with hotplug though?   I guess as the VM starts
> off with the pair of NICs, then when you remove one and add it back after
> migration then you don't need any more information added; so yes
> cloud-init or SMBIOS would do it.  (I was thinking SMBIOS stuff
> in the way that you get device/slot numbering that NIC naming is sometimes based
> off).
>
> What about if we hot-add a new NIC later on (not during migration);
> a normal hot-add of a NIC now turns into a hot-add of two new NICs; how
> do we pass the information at hot-add time to provide that?

Hmm, yes, actually hotplug would be a problem with that.

An even simpler idea would be to just keep things really dumb and simply
use the same MAC address for both NICs. Once you put them in a bond
device, the kernel will be copying the MAC address of the first NIC
into the second NIC anyway, so unless I'm missing something, we might
as well just use the same MAC address for both right away. That makes
it easy for the guest to discover NICs in the same set and works with
hotplug trivially.
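
To illustrate (sketch only, for the SRIOV VF case where libvirt can set the
VF's MAC; the MAC and the PCI address below are just placeholders), the
domain XML would simply carry the same <mac> on both devices:

  <interface type='network'>
    <mac address='52:54:00:aa:bb:cc'/>
    <source network='default'/>
    <model type='virtio'/>
  </interface>
  <interface type='hostdev' managed='yes'>
    <mac address='52:54:00:aa:bb:cc'/>
    <source>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x10' function='0x1'/>
    </source>
  </interface>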

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [RFC 0/7] Live Migration with Pass-through Devices proposal
  2015-04-22 17:15         ` Daniel P. Berrange
@ 2015-04-22 17:20           ` Dr. David Alan Gilbert
  2015-04-23 16:35             ` [Qemu-devel] [libvirt] " Laine Stump
  0 siblings, 1 reply; 45+ messages in thread
From: Dr. David Alan Gilbert @ 2015-04-22 17:20 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: Chen Fan, libvir-list, qemu-devel, izumi.taku

* Daniel P. Berrange (berrange@redhat.com) wrote:
> On Wed, Apr 22, 2015 at 06:12:25PM +0100, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrange (berrange@redhat.com) wrote:
> > > On Wed, Apr 22, 2015 at 06:01:56PM +0100, Dr. David Alan Gilbert wrote:
> > > > * Daniel P. Berrange (berrange@redhat.com) wrote:
> > > > > On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:
> > > > > > backgrond:
> > > > > > Live migration is one of the most important features of virtualization technology.
> > > > > > With regard to recent virtualization techniques, performance of network I/O is critical.
> > > > > > Current network I/O virtualization (e.g. Para-virtualized I/O, VMDq) has a significant
> > > > > > performance gap with native network I/O. Pass-through network devices have near
> > > > > > native performance, however, they have thus far prevented live migration. No existing
> > > > > > methods solve the problem of live migration with pass-through devices perfectly.
> > > > > > 
> > > > > > There was an idea to solve the problem in website:
> > > > > > https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
> > > > > > Please refer to above document for detailed information.
> > > > > > 
> > > > > > So I think this problem maybe could be solved by using the combination of existing
> > > > > > technology. and the following steps are we considering to implement:
> > > > > > 
> > > > > > -  before boot VM, we anticipate to specify two NICs for creating bonding device
> > > > > >    (one plugged and one virtual NIC) in XML. here we can specify the NIC's mac addresses
> > > > > >    in XML, which could facilitate qemu-guest-agent to find the network interfaces in guest.
> > > > > > 
> > > > > > -  when qemu-guest-agent startup in guest it would send a notification to libvirt,
> > > > > >    then libvirt will call the previous registered initialize callbacks. so through
> > > > > >    the callback functions, we can create the bonding device according to the XML
> > > > > >    configuration. and here we use netcf tool which can facilitate to create bonding device
> > > > > >    easily.
> > > > > 
> > > > > I'm not really clear on why libvirt/guest agent needs to be involved in this.
> > > > > I think configuration of networking is really something that must be left to
> > > > > the guest OS admin to control. I don't think the guest agent should be trying
> > > > > to reconfigure guest networking itself, as that is inevitably going to conflict
> > > > > with configuration attempted by things in the guest like NetworkManager or
> > > > > systemd-networkd.
> > > > > 
> > > > > IOW, if you want to do this setup where the guest is given multiple NICs connected
> > > > > to the same host LAN, then I think we should just let the gues admin configure
> > > > > bonding in whatever manner they decide is best for their OS install.
> > > > 
> > > > I disagree; there should be a way for the admin not to have to do this manually;
> > > > however it should interact well with existing management stuff.
> > > > 
> > > > At the simplest, something that marks the two NICs in a discoverable way
> > > > so that they can be seen that they're part of a set;  with just that ID system
> > > > then an installer or setup tool can notice them and offer to put them into
> > > > a bond automatically; I'd assume it would be possible to add a rule somewhere
> > > > that said anything with the same ID would automatically be added to the bond.
> > > 
> > > I didn't mean the admin would literally configure stuff manually. I really
> > > just meant that the guest OS itself should decide how it is done, whether
> > > NetworkManager magically does the right thing, or the person building the
> > > cloud disk image provides a magic udev rule, or $something else. I just
> > > don't think that the QEMU guest agent should be involved, as that will
> > > definitely trample all over other things that manage networking in the
> > > guest.
> > 
> > OK, good, that's about the same level I was at.
> > 
> > > I could see this being solved in the cloud disk images by using
> > > cloud-init metadata to mark the NICs as being in a set, or perhaps there
> > > is some magic you could define in SMBIOS tables, or something else again.
> > > A cloud-init based solution wouldn't need any QEMU work, but an SMBIOS
> > > solution might.
> > 
> > Would either of these work with hotplug though?   I guess as the VM starts
> > off with the pair of NICs, then when you remove one and add it back after
> > migration then you don't need any more information added; so yes
> > cloud-init or SMBIOS would do it.  (I was thinking SMBIOS stuff
> > in the way that you get device/slot numbering that NIC naming is sometimes based
> > off).
> >
> > What about if we hot-add a new NIC later on (not during migration);
> > a normal hot-add of a NIC now turns into a hot-add of two new NICs; how
> > do we pass the information at hot-add time to provide that?
> 
> Hmm, yes, actually hotplug would be a problem with that.
> 
> A even simpler idea would be to just keep things real dumb and simply
> use the same MAC address for both NICs. Once you put them in a bond
> device, the kernel will be copying the MAC address of the first NIC
> into the second NIC anyway, so unless I'm missing something, we might
> as well just use the same MAC address for both right away. That makes
> it easy for guest to discover NICs in the same set and works with
> hotplug trivially.

I bet you need to distinguish the two NICs though; you'd want the bond
to send all the traffic through the real NIC during normal use;
and how does the guest know, when it sees the hotplug of the 1st NIC in the
pair, that this is a special NIC and that it's about to see its sibling arrive?
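
(e.g. with an active-backup bond you'd want something along these lines in
the guest -- device names made up, just to show the knobs involved:

  echo active-backup > /sys/class/net/bond0/bonding/mode
  echo eth1 > /sys/class/net/bond0/bonding/primary    # the passthrough NIC

so the guest still has to know which member is the "real" NIC.)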

Dave

> 
> Regards,
> Daniel
> -- 
> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org              -o-             http://virt-manager.org :|
> |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
  2015-04-19 22:29 ` [Qemu-devel] [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal Laine Stump
  2015-04-22  4:22   ` Chen Fan
@ 2015-04-23  8:34   ` Chen Fan
  2015-04-23 15:01     ` Laine Stump
  1 sibling, 1 reply; 45+ messages in thread
From: Chen Fan @ 2015-04-23  8:34 UTC (permalink / raw)
  To: Laine Stump, libvir-list; +Cc: izumi.taku, qemu-devel


On 04/20/2015 06:29 AM, Laine Stump wrote:
> On 04/17/2015 04:53 AM, Chen Fan wrote:
>> backgrond:
>> Live migration is one of the most important features of virtualization technology.
>> With regard to recent virtualization techniques, performance of network I/O is critical.
>> Current network I/O virtualization (e.g. Para-virtualized I/O, VMDq) has a significant
>> performance gap with native network I/O. Pass-through network devices have near
>> native performance, however, they have thus far prevented live migration. No existing
>> methods solve the problem of live migration with pass-through devices perfectly.
>>
>> There was an idea to solve the problem in website:
>> https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
>> Please refer to above document for detailed information.
> This functionality has been on my mind/bug list for a long time, but I
> haven't been able to pursue it much. See this BZ, along with the
> original patches submitted by Shradha Shah from SolarFlare:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=896716
>
> (I was a bit optimistic in my initial review of the patches - there are
> actually a lot of issues that weren't handled by those patches.)
>
>> So I think this problem maybe could be solved by using the combination of existing
>> technology. and the following steps are we considering to implement:
>>
>> -  before boot VM, we anticipate to specify two NICs for creating bonding device
>>     (one plugged and one virtual NIC) in XML. here we can specify the NIC's mac addresses
>>     in XML, which could facilitate qemu-guest-agent to find the network interfaces in guest.
> An interesting idea, but I think that is a 2nd level enhancement, not
> necessary initially (and maybe not ever, due to the high possibility of
> it being extremely difficult to get right in 100% of the cases).
>
>> -  when qemu-guest-agent startup in guest it would send a notification to libvirt,
>>     then libvirt will call the previous registered initialize callbacks. so through
>>     the callback functions, we can create the bonding device according to the XML
>>     configuration. and here we use netcf tool which can facilitate to create bonding device
>>     easily.
> This isn't quite making sense - the bond will be on the guest, which may
> not have netcf installed. Anyway, I think it should be up to the guest's
> own system network config to have the bond already setup. If you try to
> impose it from outside that infrastructure, you run too much risk of
> running afoul of something on the guest (e.g. NetworkManager)
>
>
>> -  during migration, unplug the passthroughed NIC. then do native migration.
> Correct. This is the most important part. But not just unplugging it,
> you also need to wait until the unplug operation completes (it is
> asynchronous). (After this point, the emulated NIC that is part of the
> bond would get all of the traffic).
>
>> -  on destination side, check whether need to hotplug new NIC according to specified XML.
>>     usually, we use migrate "--xml" command option to specify the destination host NIC mac
>>     address to hotplug a new NIC, because source side passthrough NIC mac address is different,
>>     then hotplug the deivce according to the destination XML configuration.
> Why does the MAC address need to be different? Are you suggesting doing
> this with passed-through non-SRIOV NICs? An SRIOV virtual function gets
> its MAC address from the libvirt config, so it's very simple to use the
> same MAC address across the migration. Any network card that would be
> able to do this on any sort of useful scale will be SRIOV-capable (or
> should be replaced with one that is - some of them are not that expensive).
Hi Laine,

I think using an SRIOV virtual NIC to support migration is a good idea,
but there are also passthrough NICs that are not SRIOV-capable. For
those NIC devices we are only able to use <hostdev> to specify the
passthrough function, so I think we should support those NICs too.

Thanks,
Chen

>
>
>> TODO:
>>    1.  when hot add a new NIC in destination side after migration finished, the NIC device
>>        need to re-enslave on bonding device in guest. otherwise, it is offline. maybe
>>        we should consider bonding driver to support add interfaces dynamically.
> I never looked at the details of how SolarFlare's code handled the guest
> side (they have/had their own patchset they maintained for some older
> version of libvirt which integrated with some sort of enhanced bonding
> driver on the guests). I assumed the bond driver could handle this
> already, but have to say I never investigated.
>
>
>> This is an example on how this might work, so I want to hear some voices about this scenario.
>>
>> Thanks,
>> Chen
>>
>> Chen Fan (7):
>>    qemu-agent: add agent init callback when detecting guest setup
>>    qemu: add guest init event callback to do the initialize work for
>>      guest
>>    hostdev: add a 'bond' type element in <hostdev> element
>
> Putting this into <hostdev> is the wrong approach, for two reasons: 1)
> it doesn't account for the device to be used being in a different
> address on the source and destination hosts, 2) the <interface> element
> already has much of the config you need, and an interface type
> supporting hostdev passthrough.
>
> It has been possible to do passthrough of an SRIOV VF via <interface
> type='hostdev'> for a long time now and, even better, via an <interface
> type='network'> where the network pointed to contains a pool of VFs - As
> long as the source and destination hosts both have networks with the
> same name, libvirt will be able to find a currently available device on
> the destination as it migrates from one host to another instead of
> relying on both hosts having the exact same device at the exact same
> address on the host and destination (and also magically unused by any
> other guest). This page explains the use of a "hostdev network" which
> has a pool of devices:
>
> http://wiki.libvirt.org/page/Networking#Assignment_from_a_pool_of_SRIOV_VFs_in_a_libvirt_.3Cnetwork.3E_definition
>
> This was designed specifically with the idea in mind that one day it
> would be possible to migrate a domain with a hostdev device (as long as
> the guest could handle the hostdev device being temporarily unplugged
> during the migration).
>
>>    qemu-agent: add qemuAgentCreateBond interface
>>    hostdev: add parse ip and route for bond configure
> Again, I think that this level of detail about the guest network config
> belongs on the guest, not in libvirt.
>
>>    migrate: hot remove hostdev at perform phase for bond device
> ^^ this is the useful part but I don't think the right method is to make
> this action dependent on the device being a "bond".
>
> I think that in this respect Shradha's patches had a better idea - any
> hostdev (or, by implication <interface type='hostdev'> or, much more
> usefully <interface type='network'> pointing to a pool of VFs - could
> have an attribute "ephemeral". If ephemeral was "yes", then the device
> would always be unplugged prior to migration and re-plugged when
> migration was completed (the same thing should be done when
> saving/restoring a domain which also can't currently be done with a
> domain that has a passthrough device).
>
> For that matter, this could be a general-purpose thing (although
> probably most useful for hostdevs) - just make it possible for *any*
> hotpluggable device to be "ephemeral"; the meaning of this would be that
> every device marked as ephemeral should be unplugged prior to migration
> or save (and libvirt should wait for qemu to notify that the unplug is
> completed), and re-plugged right after the guest is restarted.
>
> (possibly it should be implemented as an <ephemeral> *element* rather
> than attribute, so that options could be specified).
>
> After that is implemented and works properly, then it might be the time
> to think about auto-creating the bond (although again, my opinion is
> that this is getting a bit too intrusive into the guest (and making it
> more likely to fail - I know from long experience with netcf that it is
> all too easy for some other service on the system (ahem) to mess up all
> your hard work); I think it would be better to just let the guest deal
> with setting up a bond in its system network config, and if the bond
> driver can't handle having a device in the bond unplugging and plugging,
> then the bond driver should be enhanced).
>
>
>>    migrate: add hostdev migrate status to support hostdev migration
>>
>>   docs/schemas/basictypes.rng   |   6 ++
>>   docs/schemas/domaincommon.rng |  37 ++++++++
>>   src/conf/domain_conf.c        | 195 ++++++++++++++++++++++++++++++++++++++---
>>   src/conf/domain_conf.h        |  40 +++++++--
>>   src/conf/networkcommon_conf.c |  17 ----
>>   src/conf/networkcommon_conf.h |  17 ++++
>>   src/libvirt_private.syms      |   1 +
>>   src/qemu/qemu_agent.c         | 196 +++++++++++++++++++++++++++++++++++++++++-
>>   src/qemu/qemu_agent.h         |  12 +++
>>   src/qemu/qemu_command.c       |   3 +
>>   src/qemu/qemu_domain.c        |  70 +++++++++++++++
>>   src/qemu/qemu_domain.h        |  14 +++
>>   src/qemu/qemu_driver.c        |  38 ++++++++
>>   src/qemu/qemu_hotplug.c       |   8 +-
>>   src/qemu/qemu_migration.c     |  91 ++++++++++++++++++++
>>   src/qemu/qemu_migration.h     |   4 +
>>   src/qemu/qemu_process.c       |  32 +++++++
>>   src/util/virhostdev.c         |   3 +
>>   18 files changed, 745 insertions(+), 39 deletions(-)
>>
> .
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
  2015-04-22  4:22   ` Chen Fan
@ 2015-04-23 14:14     ` Laine Stump
  0 siblings, 0 replies; 45+ messages in thread
From: Laine Stump @ 2015-04-23 14:14 UTC (permalink / raw)
  To: Laine Stump, libvir-list; +Cc: Chen Fan, qemu-devel

On 04/22/2015 12:22 AM, Chen Fan wrote:
> Hi Laine,
>
> Thanks for your review for my patches.
>
> and do you know that solarflare's patches have made some update version
> since
>
> https://www.redhat.com/archives/libvir-list/2012-November/msg01324.html
>
> ?
>
> if not, I hope to go on to complete this work. ;)
>

I haven't heard of any updates. Their priorities may have changed.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
  2015-04-23  8:34   ` Chen Fan
@ 2015-04-23 15:01     ` Laine Stump
  2015-05-19  9:10       ` Michael S. Tsirkin
  0 siblings, 1 reply; 45+ messages in thread
From: Laine Stump @ 2015-04-23 15:01 UTC (permalink / raw)
  To: libvir-list; +Cc: Chen Fan, qemu-devel

On 04/23/2015 04:34 AM, Chen Fan wrote:
>
> On 04/20/2015 06:29 AM, Laine Stump wrote:
>> On 04/17/2015 04:53 AM, Chen Fan wrote:
>>> -  on destination side, check whether need to hotplug new NIC
>>> according to specified XML.
>>>     usually, we use migrate "--xml" command option to specify the
>>> destination host NIC mac
>>>     address to hotplug a new NIC, because source side passthrough
>>> NIC mac address is different,
>>>     then hotplug the deivce according to the destination XML
>>> configuration.

>> Why does the MAC address need to be different? Are you suggesting doing
>> this with passed-through non-SRIOV NICs? An SRIOV virtual function gets
>> its MAC address from the libvirt config, so it's very simple to use the
>> same MAC address across the migration. Any network card that would be
>> able to do this on any sort of useful scale will be SRIOV-capable (or
>> should be replaced with one that is - some of them are not that
>> expensive).

> Hi Laine,
>
> I think SRIOV virtual NIC to support migration is good idea,
> but I also think some passthrough NIC without SRIOV-capable. for
> these NIC devices we only able to use <hostdev> to specify the
> passthrough
> function, so for these NIC I think we should support too.

As I think you've already discovered, passing through non-SRIOV NICs is
problematic. It is completely impossible for the host to change their
MAC address before assigning them to the guest - the guest's driver sees
standard netdev hardware and resets it, which resets the MAC address to
the original value burned into the firmware. This makes management more
complicated, especially when you get into scenarios such as what we're
discussing (i.e. migration) where the actual hardware (and thus MAC
address) may be different from one run to the next.

Since libvirt's <interface> element requires a fixed MAC address in the
XML, it's not possible to have an <interface> that gets the actual
device from a network pool (without some serious hacking to that code),
and there is no support for plain (non-network) <hostdev> device pools;
there would need to be a separate (nonexistent) driver for that. Since
the <hostdev> element relies on the PCI address of the device (in the
<source> subelement, which also must be fixed) to determine which device
to passthrough, a domain config with a <hostdev> that could be run on
two different machines would require the device to reside at exactly the
same PCI address on both machines, which is a very serious limitation to
have in an environment large enough that migrating domains is a requirement.
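
(For comparison, the SRIOV "pool" approach that already works today is just
a libvirt network using forward mode='hostdev' -- the PF name here is only
an example:

  <network>
    <name>vf-pool</name>
    <forward mode='hostdev' managed='yes'>
      <pf dev='eth3'/>
    </forward>
  </network>

and the domain then uses <interface type='network'> with <source
network='vf-pool'/>, so libvirt picks whichever VF happens to be free on
the host the domain lands on.)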

Also, non-SRIOV NICs are limited to a single device per physical port,
meaning probably at most 4 devices per physical host PCIe slot, and this
results in a greatly reduced density on the host (and even more so on
the switch that connects to the host!) compared to even the old Intel
82576 cards, which have 14 VFs (7VFs x 2 ethernet ports). Think about it
- with an 82576, you can get 14 guests into 1 PCIe slot and 2 switch
ports, while the same number of guests with non-SRIOV would take 4 PCIe
slots and 14(!) switch ports. The difference is even more striking when
comparing to chips like the 82599 (64 VFs per port x 2), or a Mellanox
(also 64?) or SolarFlare (128?) card. And don't forget that, because you
don't have pools of devices to be automatically chosen from, that each
guest domain that will be migrated requires a reserved NIC on *every*
machine it will be migrated to (no other domain can be configured to use
that NIC, in order to avoid conflicts).

Of course you could complicate the software by adding a driver that
manages pools of generic hostdevs, and coordinates MAC address changes
with the guest (part of what you're suggesting), but all that extra
complexity not only takes a lot of time and effort to develop, it also
creates more code that needs to be maintained and tested for regressions
at each release.

The alternative is to just spend $130 per host for an 82576 or Intel
I350 card (these are the cheapest SRIOV options I'm aware of). When
compared to the total cost of any hardware installation large enough to
support migration and have performance requirements high enough that NIC
passthrough is needed, this is a trivial amount.

I guess the bottom line of all this is that (in my opinion, of course
:-) supporting useful migration of domains that used passed-through
non-SRIOV NICs would be an interesting experiment, but I don't see much
utility to it, other than "scratching an intellectual itch", and I'm
concerned that it would create more long term maintenance cost than it
was worth.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
  2015-04-22 17:20           ` Dr. David Alan Gilbert
@ 2015-04-23 16:35             ` Laine Stump
  2015-05-19  9:04               ` Michael S. Tsirkin
  0 siblings, 1 reply; 45+ messages in thread
From: Laine Stump @ 2015-04-23 16:35 UTC (permalink / raw)
  To: libvir-list; +Cc: Dr. David Alan Gilbert, qemu-devel

On 04/22/2015 01:20 PM, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrange (berrange@redhat.com) wrote:
>> On Wed, Apr 22, 2015 at 06:12:25PM +0100, Dr. David Alan Gilbert wrote:
>>> * Daniel P. Berrange (berrange@redhat.com) wrote:
>>>> On Wed, Apr 22, 2015 at 06:01:56PM +0100, Dr. David Alan Gilbert wrote:
>>>>> * Daniel P. Berrange (berrange@redhat.com) wrote:
>>>>>> On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:
>>>>>>> backgrond:
>>>>>>> Live migration is one of the most important features of virtualization technology.
>>>>>>> With regard to recent virtualization techniques, performance of network I/O is critical.
>>>>>>> Current network I/O virtualization (e.g. Para-virtualized I/O, VMDq) has a significant
>>>>>>> performance gap with native network I/O. Pass-through network devices have near
>>>>>>> native performance, however, they have thus far prevented live migration. No existing
>>>>>>> methods solve the problem of live migration with pass-through devices perfectly.
>>>>>>>
>>>>>>> There was an idea to solve the problem in website:
>>>>>>> https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
>>>>>>> Please refer to above document for detailed information.
>>>>>>>
>>>>>>> So I think this problem maybe could be solved by using the combination of existing
>>>>>>> technology. and the following steps are we considering to implement:
>>>>>>>
>>>>>>> -  before boot VM, we anticipate to specify two NICs for creating bonding device
>>>>>>>    (one plugged and one virtual NIC) in XML. here we can specify the NIC's mac addresses
>>>>>>>    in XML, which could facilitate qemu-guest-agent to find the network interfaces in guest.
>>>>>>>
>>>>>>> -  when qemu-guest-agent startup in guest it would send a notification to libvirt,
>>>>>>>    then libvirt will call the previous registered initialize callbacks. so through
>>>>>>>    the callback functions, we can create the bonding device according to the XML
>>>>>>>    configuration. and here we use netcf tool which can facilitate to create bonding device
>>>>>>>    easily.
>>>>>> I'm not really clear on why libvirt/guest agent needs to be involved in this.
>>>>>> I think configuration of networking is really something that must be left to
>>>>>> the guest OS admin to control. I don't think the guest agent should be trying
>>>>>> to reconfigure guest networking itself, as that is inevitably going to conflict
>>>>>> with configuration attempted by things in the guest like NetworkManager or
>>>>>> systemd-networkd.
>>>>>>
>>>>>> IOW, if you want to do this setup where the guest is given multiple NICs connected
>>>>>> to the same host LAN, then I think we should just let the gues admin configure
>>>>>> bonding in whatever manner they decide is best for their OS install.
>>>>> I disagree; there should be a way for the admin not to have to do this manually;
>>>>> however it should interact well with existing management stuff.
>>>>>
>>>>> At the simplest, something that marks the two NICs in a discoverable way
>>>>> so that they can be seen that they're part of a set;  with just that ID system
>>>>> then an installer or setup tool can notice them and offer to put them into
>>>>> a bond automatically; I'd assume it would be possible to add a rule somewhere
>>>>> that said anything with the same ID would automatically be added to the bond.
>>>> I didn't mean the admin would literally configure stuff manually. I really
>>>> just meant that the guest OS itself should decide how it is done, whether
>>>> NetworkManager magically does the right thing, or the person building the
>>>> cloud disk image provides a magic udev rule, or $something else. I just
>>>> don't think that the QEMU guest agent should be involved, as that will
>>>> definitely trample all over other things that manage networking in the
>>>> guest.
>>> OK, good, that's about the same level I was at.
>>>
>>>> I could see this being solved in the cloud disk images by using
>>>> cloud-init metadata to mark the NICs as being in a set, or perhaps there
>>>> is some magic you could define in SMBIOS tables, or something else again.
>>>> A cloud-init based solution wouldn't need any QEMU work, but an SMBIOS
>>>> solution might.
>>> Would either of these work with hotplug though?   I guess as the VM starts
>>> off with the pair of NICs, then when you remove one and add it back after
>>> migration then you don't need any more information added; so yes
>>> cloud-init or SMBIOS would do it.  (I was thinking SMBIOS stuff
>>> in the way that you get device/slot numbering that NIC naming is sometimes based
>>> off).
>>>
>>> What about if we hot-add a new NIC later on (not during migration);
>>> a normal hot-add of a NIC now turns into a hot-add of two new NICs; how
>>> do we pass the information at hot-add time to provide that?
>> Hmm, yes, actually hotplug would be a problem with that.
>>
>> A even simpler idea would be to just keep things real dumb and simply
>> use the same MAC address for both NICs. Once you put them in a bond
>> device, the kernel will be copying the MAC address of the first NIC
>> into the second NIC anyway, so unless I'm missing something, we might
>> as well just use the same MAC address for both right away. That makes
>> it easy for guest to discover NICs in the same set and works with
>> hotplug trivially.
> I bet you need to distinguish the two NICs though; you'd want the bond
> to send all the traffic through the real NIC during normal use;
> and how does the guest know when it sees the hotplug of the 1st NIC in the pair
> that this is a special NIC that it's about to see it's sibbling arrive.

Yeah, there needs to be *some way* for the guest OS to differentiate
between the emulated NIC (which will be operational all the time, but
only used during migration when the passed-through NIC is missing) and
the passed-through NIC (which should be preferred for all traffic when
it is present). The simplest method of differentiating would be for the
admin who configures it to know the MAC address. Another way could be
[some bit of magic I don't know how to do] that sets the bonding config
based on which driver is used for the NIC (the emulated NIC will almost
certainly be virtio, and the passed-through NIC will use igbvf, ixgbevf, or
similar).

A complicating factor with using MAC address to differentiate is that it
isn't possible for the guest to modify the MAC address of a
passed-through SRIOV VF - the only way that could be done would be for
the guest to notify the host, then the host could use an RTM_SETLINK
message sent for the PF+VF# to change the MAC address, otherwise it is
prohibited by the hardware.

Likewise (but at least technically possible to solve with current
libvirt+qemu), the default configuration for a macvtap connection to an
emulated guest ethernet device (which is probably what the "backup"
device of the bond would be) doesn't pass any traffic once the guest has
changed the MAC address of the emulated device - qemu does send an
RX_FILTER_CHANGED event to libvirt, and if the interface's config has
trustGuestRxFilters='yes', then and only then libvirt will modify the
MAC address of the host side of the macvtap device.
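
(i.e. something like the following on the emulated, macvtap-backed side --
the source dev is only an example):

  <interface type='direct' trustGuestRxFilters='yes'>
    <mac address='52:54:00:aa:bb:cc'/>
    <source dev='eth0' mode='bridge'/>
    <model type='virtio'/>
  </interface>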

Thinking about this more, it seems a bit problematic from a security
point of view to allow the guest to arbitrarily change its MAC addresses
just to support this, so maybe the requirement should be that the MAC
addresses be set to the same value, and the guest config be required to
figure out which is the "preferred" and which is the "backup" by
examining the driver used for the device.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
  2015-04-23 16:35             ` [Qemu-devel] [libvirt] " Laine Stump
@ 2015-05-19  9:04               ` Michael S. Tsirkin
  0 siblings, 0 replies; 45+ messages in thread
From: Michael S. Tsirkin @ 2015-05-19  9:04 UTC (permalink / raw)
  To: Laine Stump; +Cc: libvir-list, Dr. David Alan Gilbert, qemu-devel

On Thu, Apr 23, 2015 at 12:35:28PM -0400, Laine Stump wrote:
> On 04/22/2015 01:20 PM, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrange (berrange@redhat.com) wrote:
> >> On Wed, Apr 22, 2015 at 06:12:25PM +0100, Dr. David Alan Gilbert wrote:
> >>> * Daniel P. Berrange (berrange@redhat.com) wrote:
> >>>> On Wed, Apr 22, 2015 at 06:01:56PM +0100, Dr. David Alan Gilbert wrote:
> >>>>> * Daniel P. Berrange (berrange@redhat.com) wrote:
> >>>>>> On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:
> >>>>>>> backgrond:
> >>>>>>> Live migration is one of the most important features of virtualization technology.
> >>>>>>> With regard to recent virtualization techniques, performance of network I/O is critical.
> >>>>>>> Current network I/O virtualization (e.g. Para-virtualized I/O, VMDq) has a significant
> >>>>>>> performance gap with native network I/O. Pass-through network devices have near
> >>>>>>> native performance, however, they have thus far prevented live migration. No existing
> >>>>>>> methods solve the problem of live migration with pass-through devices perfectly.
> >>>>>>>
> >>>>>>> There was an idea to solve the problem in website:
> >>>>>>> https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
> >>>>>>> Please refer to above document for detailed information.
> >>>>>>>
> >>>>>>> So I think this problem maybe could be solved by using the combination of existing
> >>>>>>> technology. and the following steps are we considering to implement:
> >>>>>>>
> >>>>>>> -  before boot VM, we anticipate to specify two NICs for creating bonding device
> >>>>>>>    (one plugged and one virtual NIC) in XML. here we can specify the NIC's mac addresses
> >>>>>>>    in XML, which could facilitate qemu-guest-agent to find the network interfaces in guest.
> >>>>>>>
> >>>>>>> -  when qemu-guest-agent startup in guest it would send a notification to libvirt,
> >>>>>>>    then libvirt will call the previous registered initialize callbacks. so through
> >>>>>>>    the callback functions, we can create the bonding device according to the XML
> >>>>>>>    configuration. and here we use netcf tool which can facilitate to create bonding device
> >>>>>>>    easily.
> >>>>>> I'm not really clear on why libvirt/guest agent needs to be involved in this.
> >>>>>> I think configuration of networking is really something that must be left to
> >>>>>> the guest OS admin to control. I don't think the guest agent should be trying
> >>>>>> to reconfigure guest networking itself, as that is inevitably going to conflict
> >>>>>> with configuration attempted by things in the guest like NetworkManager or
> >>>>>> systemd-networkd.
> >>>>>>
> >>>>>> IOW, if you want to do this setup where the guest is given multiple NICs connected
> >>>>>> to the same host LAN, then I think we should just let the gues admin configure
> >>>>>> bonding in whatever manner they decide is best for their OS install.
> >>>>> I disagree; there should be a way for the admin not to have to do this manually;
> >>>>> however it should interact well with existing management stuff.
> >>>>>
> >>>>> At the simplest, something that marks the two NICs in a discoverable way
> >>>>> so that they can be seen that they're part of a set;  with just that ID system
> >>>>> then an installer or setup tool can notice them and offer to put them into
> >>>>> a bond automatically; I'd assume it would be possible to add a rule somewhere
> >>>>> that said anything with the same ID would automatically be added to the bond.
> >>>> I didn't mean the admin would literally configure stuff manually. I really
> >>>> just meant that the guest OS itself should decide how it is done, whether
> >>>> NetworkManager magically does the right thing, or the person building the
> >>>> cloud disk image provides a magic udev rule, or $something else. I just
> >>>> don't think that the QEMU guest agent should be involved, as that will
> >>>> definitely trample all over other things that manage networking in the
> >>>> guest.
> >>> OK, good, that's about the same level I was at.
> >>>
> >>>> I could see this being solved in the cloud disk images by using
> >>>> cloud-init metadata to mark the NICs as being in a set, or perhaps there
> >>>> is some magic you could define in SMBIOS tables, or something else again.
> >>>> A cloud-init based solution wouldn't need any QEMU work, but an SMBIOS
> >>>> solution might.
> >>> Would either of these work with hotplug though?   I guess as the VM starts
> >>> off with the pair of NICs, then when you remove one and add it back after
> >>> migration then you don't need any more information added; so yes
> >>> cloud-init or SMBIOS would do it.  (I was thinking SMBIOS stuff
> >>> in the way that you get device/slot numbering that NIC naming is sometimes based
> >>> off).
> >>>
> >>> What about if we hot-add a new NIC later on (not during migration);
> >>> a normal hot-add of a NIC now turns into a hot-add of two new NICs; how
> >>> do we pass the information at hot-add time to provide that?
> >> Hmm, yes, actually hotplug would be a problem with that.
> >>
> >> A even simpler idea would be to just keep things real dumb and simply
> >> use the same MAC address for both NICs. Once you put them in a bond
> >> device, the kernel will be copying the MAC address of the first NIC
> >> into the second NIC anyway, so unless I'm missing something, we might
> >> as well just use the same MAC address for both right away. That makes
> >> it easy for guest to discover NICs in the same set and works with
> >> hotplug trivially.
> > I bet you need to distinguish the two NICs though; you'd want the bond
> > to send all the traffic through the real NIC during normal use;
> > and how does the guest know when it sees the hotplug of the 1st NIC in the pair
> > that this is a special NIC that it's about to see its sibling arrive?
> 
> Yeah, there needs to be *some way* for the guest OS to differentiate
> between the emulated NIC (which will be operational all the time, but
> only used during migration when the passed-through NIC is missing) and
> the passed-through NIC (which should be preferred for all traffic when
> it is present). The simplest method of differentiating would be for the
> admin who configures it to know the MAC address. Another way could be
> [some bit of magic I don't know how to do] that sets the bonding config
> based on which driver is used for the NIC (the emulated NIC will almost
> certainly be virtio, and the passed-through one will be igbvf, ixgbevf, or
> similar).

Why not supply this information using the qemu ga?
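
(As a sketch of what that could look like: the guest-network-set-interface
command below is the one proposed in patch 4/7 of this series, not an existing
qemu-ga command, and the interface names are made up.)

    { "execute": "guest-network-set-interface",
      "arguments": {
        "interface": {
          "type": "bond",
          "name": "bond0",
          "onboot": "onboot",
          "subInterfaces": [ { "name": "ens3" },
                             { "name": "ens7" } ]
        }
      }
    }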

> A complicating factor with using MAC address to differentiate is that it
> isn't possible for the guest to modify the MAC address of a
> passed-through SRIOV VF - the only way that could be done would be for
> the guest to notify the host, then the host could use an RTM_SETLINK
> message sent for the PF+VF# to change the MAC address, otherwise it is
> prohibited by the hardware.
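
(Host-side illustration of the path described above, with a made-up PF name,
VF index and MAC; the RTM_SETLINK request is what ip(8) issues for this:)

    ip link set dev enp4s0f0 vf 3 mac 52:54:00:6d:90:02
    ip link show dev enp4s0f0    # the "vf 3" line should now show the new MAC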
> 
> Likewise (but at least technically possible to solve with current
> libvirt+qemu), the default configuration for a macvtap connection to an
> emulated guest ethernet device (which is probably what the "backup"
> device of the bond would be) doesn't pass any traffic once the guest has
> changed the MAC address of the emulated device - qemu does send an
> RX_FILTER_CHANGED event to libvirt, and if the interface's config has
> trustGuestRxFilters='yes', then and only then libvirt will modify the
> MAC address of the host side of the macvtap device.
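
(A minimal sketch of the libvirt config being referred to, with an illustrative
source device; only with the trustGuestRxFilters='yes' attribute will libvirt
act on the RX_FILTER_CHANGED event and update the macvtap MAC:)

    <interface type='direct' trustGuestRxFilters='yes'>
      <source dev='enp4s0f0' mode='bridge'/>
      <model type='virtio'/>
    </interface>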
> 
> Thinking about this more, it seems a bit problematic from a security
> point of view to allow the guest to arbitrarily change its MAC addresses
> just to support this, so maybe the requirement should be that the MAC
> addresses be set to the same value, and the guest config required to
> figure out which is the "preferred" and which is the "backup" by
> examining the driver used for the device.

That's an unrelated question.  Some people want to allow changing
the MAC, some don't. Don't use MAC addresses to identify devices,
and the problem will go away.
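
A rough guest-side sketch of one non-MAC approach mentioned above - choosing
bond members by bound driver; the driver names, interface discovery and bond
settings are assumptions for illustration, not part of the proposal:

    #!/bin/sh
    # Classify NICs by driver: VF driver -> preferred, virtio -> backup.
    primary= ; backup=
    for dev in /sys/class/net/*; do
        [ -e "$dev/device/driver" ] || continue
        case $(basename "$(readlink -f "$dev/device/driver")") in
            igbvf|ixgbevf) primary=$(basename "$dev") ;;   # passed-through VF
            virtio_net)    backup=$(basename "$dev") ;;    # emulated NIC
        esac
    done
    [ -n "$primary" ] && [ -n "$backup" ] || exit 1
    # Active-backup bond with the VF preferred while it is present.
    ip link add bond0 type bond mode active-backup miimon 100
    for s in "$primary" "$backup"; do
        ip link set "$s" down
        ip link set "$s" master bond0
    done
    echo "$primary" > /sys/class/net/bond0/bonding/primary
    ip link set bond0 up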

-- 
MST

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [RFC 0/7] Live Migration with Pass-through Devices proposal
  2015-04-22  9:23 ` [Qemu-devel] " Daniel P. Berrange
  2015-04-22 13:05   ` Daniel P. Berrange
  2015-04-22 17:01   ` Dr. David Alan Gilbert
@ 2015-05-19  9:07   ` Michael S. Tsirkin
  2015-05-19 14:15     ` [Qemu-devel] [libvirt] " Laine Stump
  2 siblings, 1 reply; 45+ messages in thread
From: Michael S. Tsirkin @ 2015-05-19  9:07 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: Chen Fan, libvir-list, qemu-devel, izumi.taku

On Wed, Apr 22, 2015 at 10:23:04AM +0100, Daniel P. Berrange wrote:
> On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:
> > backgrond:
> > Live migration is one of the most important features of virtualization technology.
> > With regard to recent virtualization techniques, performance of network I/O is critical.
> > Current network I/O virtualization (e.g. Para-virtualized I/O, VMDq) has a significant
> > performance gap with native network I/O. Pass-through network devices have near
> > native performance, however, they have thus far prevented live migration. No existing
> > methods solve the problem of live migration with pass-through devices perfectly.
> > 
> > There was an idea to solve the problem in website:
> > https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
> > Please refer to above document for detailed information.
> > 
> > So I think this problem maybe could be solved by using the combination of existing
> > technology. and the following steps are we considering to implement:
> > 
> > -  before boot VM, we anticipate to specify two NICs for creating bonding device
> >    (one plugged and one virtual NIC) in XML. here we can specify the NIC's mac addresses
> >    in XML, which could facilitate qemu-guest-agent to find the network interfaces in guest.
> > 
> > -  when qemu-guest-agent startup in guest it would send a notification to libvirt,
> >    then libvirt will call the previous registered initialize callbacks. so through
> >    the callback functions, we can create the bonding device according to the XML
> >    configuration. and here we use netcf tool which can facilitate to create bonding device
> >    easily.
> 
> I'm not really clear on why libvirt/guest agent needs to be involved in this.
> I think configuration of networking is really something that must be left to
> the guest OS admin to control. I don't think the guest agent should be trying
> to reconfigure guest networking itself, as that is inevitably going to conflict
> with configuration attempted by things in the guest like NetworkManager or
> systemd-networkd.

There should not be a conflict.
The guest agent should just give NM the information, and have NM do
the right thing.

> IOW, if you want to do this setup where the guest is given multiple NICs connected
> to the same host LAN, then I think we should just let the gues admin configure
> bonding in whatever manner they decide is best for their OS install.
> 
> > -  during migration, unplug the passthroughed NIC. then do native migration.
> > 
> > -  on destination side, check whether need to hotplug new NIC according to specified XML.
> >    usually, we use migrate "--xml" command option to specify the destination host NIC mac
> >    address to hotplug a new NIC, because source side passthrough NIC mac address is different,
> >    then hotplug the deivce according to the destination XML configuration.
> 
> Regards,
> Daniel

Users are actually asking for this functionality.

Configuring everything manually is possible but error
prone. We probably should leave manual configuration
as an option for the 10% of people who want to tweak
guest networking config, but this does not mean we shouldn't
have it all work out of the box for the 90% of people who
just want networking to go fast with no tweaks.




> -- 
> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org              -o-             http://virt-manager.org :|
> |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
  2015-04-23 15:01     ` Laine Stump
@ 2015-05-19  9:10       ` Michael S. Tsirkin
  0 siblings, 0 replies; 45+ messages in thread
From: Michael S. Tsirkin @ 2015-05-19  9:10 UTC (permalink / raw)
  To: Laine Stump; +Cc: libvir-list, Chen Fan, qemu-devel

On Thu, Apr 23, 2015 at 11:01:44AM -0400, Laine Stump wrote:
> On 04/23/2015 04:34 AM, Chen Fan wrote:
> >
> > On 04/20/2015 06:29 AM, Laine Stump wrote:
> >> On 04/17/2015 04:53 AM, Chen Fan wrote:
> >>> -  on destination side, check whether need to hotplug new NIC
> >>> according to specified XML.
> >>>     usually, we use migrate "--xml" command option to specify the
> >>> destination host NIC mac
> >>>     address to hotplug a new NIC, because source side passthrough
> >>> NIC mac address is different,
> >>>     then hotplug the deivce according to the destination XML
> >>> configuration.
> 
> >> Why does the MAC address need to be different? Are you suggesting doing
> >> this with passed-through non-SRIOV NICs? An SRIOV virtual function gets
> >> its MAC address from the libvirt config, so it's very simple to use the
> >> same MAC address across the migration. Any network card that would be
> >> able to do this on any sort of useful scale will be SRIOV-capable (or
> >> should be replaced with one that is - some of them are not that
> >> expensive).
> 
> > Hi Laine,
> >
> > I think supporting migration with SRIOV virtual NICs is a good idea,
> > but some passthrough NICs are not SRIOV-capable. For those NIC devices
> > we are only able to use <hostdev> to specify the passthrough function,
> > so I think we should support them too.
> 
> As I think you've already discovered, passing through non-SRIOV NICs is
> problematic. It is completely impossible for the host to change their
> MAC address before assigning them to the guest - the guest's driver sees
> standard netdev hardware and resets it, which resets the MAC address to
> the original value burned into the firmware. This makes management more
> complicated, especially when you get into scenarios such as what we're
> discussing (i.e. migration) where the actual hardware (and thus MAC
> address) may be different from one run to the next.

Right, passing through PFs is also insecure.  Let's get
everything working fine with VFs first, and worry about PFs later.


> Since libvirt's <interface> element requires a fixed MAC address in the
> XML, it's not possible to have an <interface> that gets the actual
> device from a network pool (without some serious hacking to that code),
> and there is no support for plain (non-network) <hostdev> device pools;
> there would need to be a separate (nonexistent) driver for that. Since
> the <hostdev> element relies on the PCI address of the device (in the
> <source> subelement, which also must be fixed) to determine which device
> to passthrough, a domain config with a <hostdev> that could be run on
> two different machines would require the device to reside at exactly the
> same PCI address on both machines, which is a very serious limitation to
> have in an environment large enough that migrating domains is a requirement.
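
(For reference, this is the sort of <hostdev> element in question; the host PCI
address in <source> is fixed in the XML, which is what makes reusing one config
across hosts awkward - the address values here are illustrative:)

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x06' slot='0x10' function='0x2'/>
      </source>
    </hostdev>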
> 
> Also, non-SRIOV NICs are limited to a single device per physical port,
> meaning probably at most 4 devices per physical host PCIe slot, and this
> results in a greatly reduced density on the host (and even more so on
> the switch that connects to the host!) compared to even the old Intel
> 82576 cards, which have 14 VFs (7VFs x 2 ethernet ports). Think about it
> - with an 82576, you can get 14 guests into 1 PCIe slot and 2 switch
> ports, while the same number of guests with non-SRIOV would take 4 PCIe
> slots and 14(!) switch ports. The difference is even more striking when
> comparing to chips like the 82599 (64 VFs per port x 2), or a Mellanox
> (also 64?) or SolarFlare (128?) card. And don't forget that, because you
> don't have pools of devices to be automatically chosen from, that each
> guest domain that will be migrated requires a reserved NIC on *every*
> machine it will be migrated to (no other domain can be configured to use
> that NIC, in order to avoid conflicts).
> 
> Of course you could complicate the software by adding a driver that
> manages pools of generic hostdevs, and coordinates MAC address changes
> with the guest (part of what you're suggesting), but all that extra
> complexity not only takes a lot of time and effort to develop, it also
> creates more code that needs to be maintained and tested for regressions
> at each release.
> 
> The alternative is to just spend $130 per host for an 82576 or Intel
> I350 card (these are the cheapest SRIOV options I'm aware of). When
> compared to the total cost of any hardware installation large enough to
> support migration and have performance requirements high enough that NIC
> passthrough is needed, this is a trivial amount.
> 
> I guess the bottom line of all this is that (in my opinion, of course
> :-) supporting useful migration of domains that used passed-through
> non-SRIOV NICs would be an interesting experiment, but I don't see much
> utility to it, other than "scratching an intellectual itch", and I'm
> concerned that it would create more long term maintenance cost than it
> was worth.

I'm not sure it has no utility, but it's easy to agree that
VFs are more important, and that focusing on them first is a good
idea.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [RFC 4/7] qemu-agent: add qemuAgentCreateBond interface
  2015-04-17  8:53 ` [Qemu-devel] [RFC 4/7] qemu-agent: add qemuAgentCreateBond interface Chen Fan
@ 2015-05-19  9:13   ` Michael S. Tsirkin
  2015-05-29  7:37   ` Michal Privoznik
  1 sibling, 0 replies; 45+ messages in thread
From: Michael S. Tsirkin @ 2015-05-19  9:13 UTC (permalink / raw)
  To: Chen Fan; +Cc: libvir-list, izumi.taku, qemu-devel

On Fri, Apr 17, 2015 at 04:53:06PM +0800, Chen Fan wrote:
> via initialize callback to create bond device.
> 
> Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
> ---
>  src/qemu/qemu_agent.c   | 118 ++++++++++++++++++++++++++++++++++++++++++++++++
>  src/qemu/qemu_agent.h   |  10 ++++
>  src/qemu/qemu_domain.c  |  70 ++++++++++++++++++++++++++++
>  src/qemu/qemu_domain.h  |   7 +++
>  src/qemu/qemu_process.c |   4 ++
>  5 files changed, 209 insertions(+)
> 
> diff --git a/src/qemu/qemu_agent.c b/src/qemu/qemu_agent.c
> index cee0f8b..b8eba01 100644
> --- a/src/qemu/qemu_agent.c
> +++ b/src/qemu/qemu_agent.c
> @@ -2169,3 +2169,121 @@ qemuAgentGetInterfaces(qemuAgentPtr mon,
>  
>      goto cleanup;
>  }
> +
> +static virDomainInterfacePtr
> +findInterfaceByMac(virDomainInterfacePtr *info,
> +                   size_t len,
> +                   const char *macstr)
> +{
> +    size_t i;
> +    bool found = false;
> +
> +    for (i = 0; i < len; i++) {
> +        if (info[i]->hwaddr &&
> +            STREQ(info[i]->hwaddr, macstr)) {
> +            found = true;
> +            break;
> +        }
> +    }
> +
> +    if (found) {
> +        return info[i];
> +    }
> +
> +    return NULL;
> +}
> +

I think PCI addresses are a better way to identify the devices
for this purpose. That way a software MAC change (softmac) doesn't break this
functionality.
Do you see anything wrong with that?
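
A minimal sketch of how a guest-side tool could do that lookup through sysfs
(the PCI address is illustrative):

    addr=0000:00:07.0
    iface=$(ls /sys/bus/pci/devices/$addr/net/ | head -n1)
    drv=$(basename "$(readlink -f /sys/bus/pci/devices/$addr/driver)")
    echo "device $addr is $iface (driver $drv)"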


> +/*
> + * qemuAgentCreateBond:
> + */
> +int
> +qemuAgentCreateBond(qemuAgentPtr mon,
> +                    virDomainHostdevSubsysPCIPtr pcisrc)
> +{
> +    int ret = -1;
> +    virJSONValuePtr cmd = NULL;
> +    virJSONValuePtr reply = NULL;
> +    size_t i;
> +    char macstr[VIR_MAC_STRING_BUFLEN];
> +    virDomainInterfacePtr *interfaceInfo = NULL;
> +    virDomainInterfacePtr interface;
> +    virJSONValuePtr new_interface = NULL;
> +    virJSONValuePtr subInterfaces = NULL;
> +    virJSONValuePtr subInterface = NULL;
> +    int len;
> +
> +    if (!(pcisrc->nmac || pcisrc->macs))
> +        return ret;
> +
> +    len = qemuAgentGetInterfaces(mon, &interfaceInfo);
> +    if (len < 0)
> +        return ret;
> +
> +    if (!(new_interface = virJSONValueNewObject()))
> +        goto cleanup;
> +
> +    if (virJSONValueObjectAppendString(new_interface, "type", "bond") < 0)
> +        goto cleanup;
> +
> +    if (virJSONValueObjectAppendString(new_interface, "name", "bond0") < 0)
> +        goto cleanup;
> +
> +    if (virJSONValueObjectAppendString(new_interface, "onboot", "onboot") < 0)
> +        goto cleanup;
> +
> +    if (!(subInterfaces = virJSONValueNewArray()))
> +        goto cleanup;
> +
> +    for (i = 0; i < pcisrc->nmac; i++) {
> +        virMacAddrFormat(&pcisrc->macs[i], macstr);
> +        interface = findInterfaceByMac(interfaceInfo, len, macstr);
> +        if (!interface) {
> +            goto cleanup;
> +        }
> +
> +        if (!(subInterface = virJSONValueNewObject()))
> +            goto cleanup;
> +
> +        if (virJSONValueObjectAppendString(subInterface, "name", interface->name) < 0)
> +            goto cleanup;
> +
> +        if (virJSONValueArrayAppend(subInterfaces, subInterface) < 0)
> +            goto cleanup;
> +
> +        subInterface = NULL;
> +    }
> +
> +    if (i && virJSONValueObjectAppend(new_interface, "subInterfaces", subInterfaces) < 0)
> +        goto cleanup;
> +
> +    cmd = qemuAgentMakeCommand("guest-network-set-interface",
> +                               "a:interface", new_interface,
> +                               NULL);
> +
> +    if (!cmd)
> +        goto cleanup;
> +
> +    subInterfaces = NULL;
> +    new_interface = NULL;
> +
> +    if (qemuAgentCommand(mon, cmd, &reply, true,
> +                         VIR_DOMAIN_QEMU_AGENT_COMMAND_BLOCK) < 0)
> +        goto cleanup;
> +
> +    if (virJSONValueObjectGetNumberInt(reply, "return", &ret) < 0) {
> +        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> +                       _("malformed return value"));
> +    }
> +
> + cleanup:
> +    virJSONValueFree(subInterfaces);
> +    virJSONValueFree(subInterface);
> +    virJSONValueFree(new_interface);
> +    virJSONValueFree(cmd);
> +    virJSONValueFree(reply);
> +    if (interfaceInfo)
> +        for (i = 0; i < len; i++)
> +            virDomainInterfaceFree(interfaceInfo[i]);
> +    VIR_FREE(interfaceInfo);
> +    return ret;
> +}
> diff --git a/src/qemu/qemu_agent.h b/src/qemu/qemu_agent.h
> index 42414a7..744cb0a 100644
> --- a/src/qemu/qemu_agent.h
> +++ b/src/qemu/qemu_agent.h
> @@ -97,6 +97,13 @@ struct _qemuAgentCPUInfo {
>      bool offlinable;    /* true if the CPU can be offlined */
>  };
>  
> +typedef struct _qemuAgentInterfaceInfo qemuAgentInterfaceInfo;
> +typedef qemuAgentInterfaceInfo *qemuAgentInterfaceInfoPtr;
> +struct _qemuAgentInterfaceInfo {
> +    char *name;
> +    char *hardware_address;
> +};
> +
>  int qemuAgentGetVCPUs(qemuAgentPtr mon, qemuAgentCPUInfoPtr *info);
>  int qemuAgentSetVCPUs(qemuAgentPtr mon, qemuAgentCPUInfoPtr cpus, size_t ncpus);
>  int qemuAgentUpdateCPUInfo(unsigned int nvcpus,
> @@ -114,4 +121,7 @@ int qemuAgentSetTime(qemuAgentPtr mon,
>  int qemuAgentGetInterfaces(qemuAgentPtr mon,
>                             virDomainInterfacePtr **ifaces);
>  
> +int qemuAgentCreateBond(qemuAgentPtr mon,
> +                        virDomainHostdevSubsysPCIPtr pcisrc);
> +
>  #endif /* __QEMU_AGENT_H__ */
> diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
> index 603360f..584fefb 100644
> --- a/src/qemu/qemu_domain.c
> +++ b/src/qemu/qemu_domain.c
> @@ -2722,6 +2722,46 @@ qemuDomainCleanupRun(virQEMUDriverPtr driver,
>      priv->ncleanupCallbacks_max = 0;
>  }
>  
> +/*
> + * The vm must be locked when any of the following init functions is
> + * called.
> + */
> +int
> +qemuDomainInitAdd(virDomainObjPtr vm,
> +                  qemuDomainInitCallback cb)
> +{
> +    qemuDomainObjPrivatePtr priv = vm->privateData;
> +    size_t i;
> +
> +    VIR_DEBUG("vm=%s, cb=%p", vm->def->name, cb);
> +
> +    for (i = 0; i < priv->nInitCallbacks; i++) {
> +        if (priv->initCallbacks[i] == cb)
> +            return 0;
> +    }
> +
> +    if (VIR_RESIZE_N(priv->initCallbacks,
> +                     priv->nInitCallbacks_max,
> +                     priv->nInitCallbacks, 1) < 0)
> +        return -1;
> +
> +    priv->initCallbacks[priv->nInitCallbacks++] = cb;
> +    return 0;
> +}
> +
> +void
> +qemuDomainInitCleanup(virDomainObjPtr vm)
> +{
> +    qemuDomainObjPrivatePtr priv = vm->privateData;
> +
> +    VIR_DEBUG("vm=%s", vm->def->name);
> +
> +    VIR_FREE(priv->initCallbacks);
> +    priv->nInitCallbacks = 0;
> +    priv->nInitCallbacks_max = 0;
> +}
> +
> +
>  static void
>  qemuDomainGetImageIds(virQEMUDriverConfigPtr cfg,
>                        virDomainObjPtr vm,
> @@ -3083,3 +3123,33 @@ qemuDomainSupportsBlockJobs(virDomainObjPtr vm,
>  
>      return 0;
>  }
> +
> +void
> +qemuDomainPrepareHostdevInit(virDomainObjPtr vm)
> +{
> +    qemuDomainObjPrivatePtr priv = vm->privateData;
> +    virDomainDefPtr def = vm->def;
> +    int i;
> +
> +    if (!def->nhostdevs)
> +        return;
> +
> +    if (!qemuDomainAgentAvailable(vm, false))
> +        return;
> +
> +    if (!virDomainObjIsActive(vm))
> +        return;
> +
> +    for (i = 0; i < def->nhostdevs; i++) {
> +        virDomainHostdevDefPtr hostdev = def->hostdevs[i];
> +        virDomainHostdevSubsysPCIPtr pcisrc = &hostdev->source.subsys.u.pci;
> +
> +        if (hostdev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI &&
> +            hostdev->source.subsys.u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO &&
> +            hostdev->source.subsys.u.pci.device == VIR_DOMAIN_HOSTDEV_PCI_DEVICE_BOND) {
> +            qemuDomainObjEnterAgent(vm);
> +            qemuAgentCreateBond(priv->agent, pcisrc);
> +            qemuDomainObjExitAgent(vm);
> +        }
> +    }
> +}
> diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
> index 19f4b27..3244ca0 100644
> --- a/src/qemu/qemu_domain.h
> +++ b/src/qemu/qemu_domain.h
> @@ -403,6 +403,10 @@ void qemuDomainCleanupRemove(virDomainObjPtr vm,
>  void qemuDomainCleanupRun(virQEMUDriverPtr driver,
>                            virDomainObjPtr vm);
>  
> +int qemuDomainInitAdd(virDomainObjPtr vm,
> +                      qemuDomainInitCallback cb);
> +void qemuDomainInitCleanup(virDomainObjPtr vm);
> +
>  extern virDomainXMLPrivateDataCallbacks virQEMUDriverPrivateDataCallbacks;
>  extern virDomainXMLNamespace virQEMUDriverDomainXMLNamespace;
>  extern virDomainDefParserConfig virQEMUDriverDomainDefParserConfig;
> @@ -444,4 +448,7 @@ void qemuDomObjEndAPI(virDomainObjPtr *vm);
>  int qemuDomainAlignMemorySizes(virDomainDefPtr def);
>  void qemuDomainMemoryDeviceAlignSize(virDomainMemoryDefPtr mem);
>  
> +void
> +qemuDomainPrepareHostdevInit(virDomainObjPtr vm);
> +
>  #endif /* __QEMU_DOMAIN_H__ */
> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> index fcc0566..0a72aca 100644
> --- a/src/qemu/qemu_process.c
> +++ b/src/qemu/qemu_process.c
> @@ -4444,6 +4444,9 @@ int qemuProcessStart(virConnectPtr conn,
>                                 hostdev_flags) < 0)
>          goto cleanup;
>  
> +    if (qemuDomainInitAdd(vm, qemuDomainPrepareHostdevInit))
> +        goto cleanup;
> +
>      VIR_DEBUG("Preparing chr devices");
>      if (virDomainChrDefForeach(vm->def,
>                                 true,
> @@ -5186,6 +5189,7 @@ void qemuProcessStop(virQEMUDriverPtr driver,
>                                   VIR_QEMU_PROCESS_KILL_NOCHECK));
>  
>      qemuDomainCleanupRun(driver, vm);
> +    qemuDomainInitCleanup(vm);
>  
>      /* Stop autodestroy in case guest is restarted */
>      qemuProcessAutoDestroyRemove(driver, vm);
> -- 
> 1.9.3
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
  2015-05-19  9:07   ` [Qemu-devel] " Michael S. Tsirkin
@ 2015-05-19 14:15     ` Laine Stump
  2015-05-19 14:21       ` Daniel P. Berrange
  2015-05-19 15:14       ` Michael S. Tsirkin
  0 siblings, 2 replies; 45+ messages in thread
From: Laine Stump @ 2015-05-19 14:15 UTC (permalink / raw)
  To: libvir-list; +Cc: qemu-devel, Michael S. Tsirkin

On 05/19/2015 05:07 AM, Michael S. Tsirkin wrote:
> On Wed, Apr 22, 2015 at 10:23:04AM +0100, Daniel P. Berrange wrote:
>> On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:
>>> backgrond:
>>> Live migration is one of the most important features of virtualization technology.
>>> With regard to recent virtualization techniques, performance of network I/O is critical.
>>> Current network I/O virtualization (e.g. Para-virtualized I/O, VMDq) has a significant
>>> performance gap with native network I/O. Pass-through network devices have near
>>> native performance, however, they have thus far prevented live migration. No existing
>>> methods solve the problem of live migration with pass-through devices perfectly.
>>>
>>> There was an idea to solve the problem in website:
>>> https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
>>> Please refer to above document for detailed information.
>>>
>>> So I think this problem maybe could be solved by using the combination of existing
>>> technology. and the following steps are we considering to implement:
>>>
>>> -  before boot VM, we anticipate to specify two NICs for creating bonding device
>>>    (one plugged and one virtual NIC) in XML. here we can specify the NIC's mac addresses
>>>    in XML, which could facilitate qemu-guest-agent to find the network interfaces in guest.
>>>
>>> -  when qemu-guest-agent startup in guest it would send a notification to libvirt,
>>>    then libvirt will call the previous registered initialize callbacks. so through
>>>    the callback functions, we can create the bonding device according to the XML
>>>    configuration. and here we use netcf tool which can facilitate to create bonding device
>>>    easily.
>> I'm not really clear on why libvirt/guest agent needs to be involved in this.
>> I think configuration of networking is really something that must be left to
>> the guest OS admin to control. I don't think the guest agent should be trying
>> to reconfigure guest networking itself, as that is inevitably going to conflict
>> with configuration attempted by things in the guest like NetworkManager or
>> systemd-networkd.
> There should not be a conflict.
> guest agent should just give NM the information, and have  NM do
> the right thing.

That assumes the guest will have NM running. Unless you want to severely
limit the scope of usefulness, you also need to handle systems that have
NM disabled, and among those the different styles of system network
config. It gets messy very fast.

>
> Users are actually asking for this functionality.
>
> Configuring everything manually is possible but error
> prone.

Yes, but attempting to do it automatically is also error prone (due to
the myriad of different guest network config systems, even just within
the seemingly narrow category of "Linux guests"). Pick your poison :-)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
  2015-05-19 14:15     ` [Qemu-devel] [libvirt] " Laine Stump
@ 2015-05-19 14:21       ` Daniel P. Berrange
  2015-05-19 15:03         ` Dr. David Alan Gilbert
  2015-05-19 15:21         ` Michael S. Tsirkin
  2015-05-19 15:14       ` Michael S. Tsirkin
  1 sibling, 2 replies; 45+ messages in thread
From: Daniel P. Berrange @ 2015-05-19 14:21 UTC (permalink / raw)
  To: Laine Stump; +Cc: libvir-list, qemu-devel, Michael S. Tsirkin

On Tue, May 19, 2015 at 10:15:17AM -0400, Laine Stump wrote:
> On 05/19/2015 05:07 AM, Michael S. Tsirkin wrote:
> > On Wed, Apr 22, 2015 at 10:23:04AM +0100, Daniel P. Berrange wrote:
> >> On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:
> >>> backgrond:
> >>> Live migration is one of the most important features of virtualization technology.
> >>> With regard to recent virtualization techniques, performance of network I/O is critical.
> >>> Current network I/O virtualization (e.g. Para-virtualized I/O, VMDq) has a significant
> >>> performance gap with native network I/O. Pass-through network devices have near
> >>> native performance, however, they have thus far prevented live migration. No existing
> >>> methods solve the problem of live migration with pass-through devices perfectly.
> >>>
> >>> There was an idea to solve the problem in website:
> >>> https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
> >>> Please refer to above document for detailed information.
> >>>
> >>> So I think this problem maybe could be solved by using the combination of existing
> >>> technology. and the following steps are we considering to implement:
> >>>
> >>> -  before boot VM, we anticipate to specify two NICs for creating bonding device
> >>>    (one plugged and one virtual NIC) in XML. here we can specify the NIC's mac addresses
> >>>    in XML, which could facilitate qemu-guest-agent to find the network interfaces in guest.
> >>>
> >>> -  when qemu-guest-agent startup in guest it would send a notification to libvirt,
> >>>    then libvirt will call the previous registered initialize callbacks. so through
> >>>    the callback functions, we can create the bonding device according to the XML
> >>>    configuration. and here we use netcf tool which can facilitate to create bonding device
> >>>    easily.
> >> I'm not really clear on why libvirt/guest agent needs to be involved in this.
> >> I think configuration of networking is really something that must be left to
> >> the guest OS admin to control. I don't think the guest agent should be trying
> >> to reconfigure guest networking itself, as that is inevitably going to conflict
> >> with configuration attempted by things in the guest like NetworkManager or
> >> systemd-networkd.
> > There should not be a conflict.
> > guest agent should just give NM the information, and have  NM do
> > the right thing.
> 
> That assumes the guest will have NM running. Unless you want to severely
> limit the scope of usefulness, you also need to handle systems that have
> NM disabled, and among those the different styles of system network
> config. It gets messy very fast.

Also, OpenStack already has a way to pass guests information about the
required network setup, via cloud-init, so it would not be interested
in anything that used the QEMU guest agent to configure NetworkManager.
Which is really just another example of why this does not
belong anywhere in libvirt or lower.  The decision to use NM is a
policy decision that will always be wrong for a non-negligible set
of use cases and as such does not belong in libvirt or QEMU. It is
the job of higher level apps to make that kind of policy decision.
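
For comparison, a rough sketch of how such a bond could be described in
cloud-init's version-1 network config; the names, MAC addresses and bonding
parameters are illustrative and the exact schema should be checked against the
cloud-init documentation:

    network:
      version: 1
      config:
        - type: physical
          name: eth0
          mac_address: "52:54:00:ab:cd:01"    # emulated virtio NIC
        - type: physical
          name: eth1
          mac_address: "52:54:00:ab:cd:02"    # passed-through VF
        - type: bond
          name: bond0
          bond_interfaces: [eth0, eth1]
          params:
            bond-mode: active-backup
            bond-miimon: 100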

> > Users are actually asking for this functionality.
> >
> > Configuring everything manually is possible but error
> > prone.
> 
> Yes, but attempting to do it automatically is also error prone (due to
> the myriad of different guest network config systems, even just within
> the seemingly narrow category of "Linux guests"). Pick your poison :-)

Also note I'm not debating the usefulness of the overall concept
or the need for automation. It simply doesn't belong in libvirt or
lower - it is a job for the higher level management applications to
define a policy that fits in with the way they are managing the
virtual machines and the networking.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
  2015-05-19 14:21       ` Daniel P. Berrange
@ 2015-05-19 15:03         ` Dr. David Alan Gilbert
  2015-05-19 15:18           ` Michael S. Tsirkin
  2015-05-19 15:35           ` Daniel P. Berrange
  2015-05-19 15:21         ` Michael S. Tsirkin
  1 sibling, 2 replies; 45+ messages in thread
From: Dr. David Alan Gilbert @ 2015-05-19 15:03 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: libvir-list, qemu-devel, Laine Stump, Michael S. Tsirkin

* Daniel P. Berrange (berrange@redhat.com) wrote:
> On Tue, May 19, 2015 at 10:15:17AM -0400, Laine Stump wrote:
> > On 05/19/2015 05:07 AM, Michael S. Tsirkin wrote:
> > > On Wed, Apr 22, 2015 at 10:23:04AM +0100, Daniel P. Berrange wrote:
> > >> On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:
> > >>> backgrond:
> > >>> Live migration is one of the most important features of virtualization technology.
> > >>> With regard to recent virtualization techniques, performance of network I/O is critical.
> > >>> Current network I/O virtualization (e.g. Para-virtualized I/O, VMDq) has a significant
> > >>> performance gap with native network I/O. Pass-through network devices have near
> > >>> native performance, however, they have thus far prevented live migration. No existing
> > >>> methods solve the problem of live migration with pass-through devices perfectly.
> > >>>
> > >>> There was an idea to solve the problem in website:
> > >>> https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
> > >>> Please refer to above document for detailed information.
> > >>>
> > >>> So I think this problem maybe could be solved by using the combination of existing
> > >>> technology. and the following steps are we considering to implement:
> > >>>
> > >>> -  before boot VM, we anticipate to specify two NICs for creating bonding device
> > >>>    (one plugged and one virtual NIC) in XML. here we can specify the NIC's mac addresses
> > >>>    in XML, which could facilitate qemu-guest-agent to find the network interfaces in guest.
> > >>>
> > >>> -  when qemu-guest-agent startup in guest it would send a notification to libvirt,
> > >>>    then libvirt will call the previous registered initialize callbacks. so through
> > >>>    the callback functions, we can create the bonding device according to the XML
> > >>>    configuration. and here we use netcf tool which can facilitate to create bonding device
> > >>>    easily.
> > >> I'm not really clear on why libvirt/guest agent needs to be involved in this.
> > >> I think configuration of networking is really something that must be left to
> > >> the guest OS admin to control. I don't think the guest agent should be trying
> > >> to reconfigure guest networking itself, as that is inevitably going to conflict
> > >> with configuration attempted by things in the guest like NetworkManager or
> > >> systemd-networkd.
> > > There should not be a conflict.
> > > guest agent should just give NM the information, and have  NM do
> > > the right thing.
> > 
> > That assumes the guest will have NM running. Unless you want to severely
> > limit the scope of usefulness, you also need to handle systems that have
> > NM disabled, and among those the different styles of system network
> > config. It gets messy very fast.
> 
> Also OpenStack already has a way to pass guest information about the
> required network setup, via cloud-init, so it would not be interested
> in any thing that used the QEMU guest agent to configure network
> manager. Which is really just another example of why this does not
> belong anywhere in libvirt or lower.  The decision to use NM is a
> policy decision that will always be wrong for a non-negligble set
> of use cases and as such does not belong in libvirt or QEMU. It is
> the job of higher level apps to make that kind of policy decision.

This is exactly my worry though; why should every higher level management
system have its own way of communicating network config for hotpluggable
devices?  You shouldn't need to reconfigure a VM to move it between them.

This just makes it hard to move it between management layers; there needs
to be some standardisation (or abstraction) of this;  if libvirt isn't the place
to do it, then what is?


Dave

> > > Users are actually asking for this functionality.
> > >
> > > Configuring everything manually is possible but error
> > > prone.
> > 
> > Yes, but attempting to do it automatically is also error prone (due to
> > the myriad of different guest network config systems, even just within
> > the seemingly narrow category of "Linux guests"). Pick your poison :-)
> 
> Also note I'm not debating the usefulness of the overall concept
> or the need for automation. It simply doesn't belong in libvirt or
> lower - it is a job for the higher level management applications to
> define a policy for that fits in with the way they are managing the
> virtual machines and the networking.
> 
> Regards,
> Daniel
> -- 
> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org              -o-             http://virt-manager.org :|
> |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
  2015-05-19 14:15     ` [Qemu-devel] [libvirt] " Laine Stump
  2015-05-19 14:21       ` Daniel P. Berrange
@ 2015-05-19 15:14       ` Michael S. Tsirkin
  1 sibling, 0 replies; 45+ messages in thread
From: Michael S. Tsirkin @ 2015-05-19 15:14 UTC (permalink / raw)
  To: Laine Stump; +Cc: libvir-list, qemu-devel

On Tue, May 19, 2015 at 10:15:17AM -0400, Laine Stump wrote:
> On 05/19/2015 05:07 AM, Michael S. Tsirkin wrote:
> > On Wed, Apr 22, 2015 at 10:23:04AM +0100, Daniel P. Berrange wrote:
> >> On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:
> >>> backgrond:
> >>> Live migration is one of the most important features of virtualization technology.
> >>> With regard to recent virtualization techniques, performance of network I/O is critical.
> >>> Current network I/O virtualization (e.g. Para-virtualized I/O, VMDq) has a significant
> >>> performance gap with native network I/O. Pass-through network devices have near
> >>> native performance, however, they have thus far prevented live migration. No existing
> >>> methods solve the problem of live migration with pass-through devices perfectly.
> >>>
> >>> There was an idea to solve the problem in website:
> >>> https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
> >>> Please refer to above document for detailed information.
> >>>
> >>> So I think this problem maybe could be solved by using the combination of existing
> >>> technology. and the following steps are we considering to implement:
> >>>
> >>> -  before boot VM, we anticipate to specify two NICs for creating bonding device
> >>>    (one plugged and one virtual NIC) in XML. here we can specify the NIC's mac addresses
> >>>    in XML, which could facilitate qemu-guest-agent to find the network interfaces in guest.
> >>>
> >>> -  when qemu-guest-agent startup in guest it would send a notification to libvirt,
> >>>    then libvirt will call the previous registered initialize callbacks. so through
> >>>    the callback functions, we can create the bonding device according to the XML
> >>>    configuration. and here we use netcf tool which can facilitate to create bonding device
> >>>    easily.
> >> I'm not really clear on why libvirt/guest agent needs to be involved in this.
> >> I think configuration of networking is really something that must be left to
> >> the guest OS admin to control. I don't think the guest agent should be trying
> >> to reconfigure guest networking itself, as that is inevitably going to conflict
> >> with configuration attempted by things in the guest like NetworkManager or
> >> systemd-networkd.
> > There should not be a conflict.
> > guest agent should just give NM the information, and have  NM do
> > the right thing.
> 
> That assumes the guest will have NM running. Unless you want to severely
> limit the scope of usefulness, you also need to handle systems that have
> NM disabled, and among those the different styles of system network
> config. It gets messy very fast.

Systems with system network config can just do the configuration
manually; they won't be worse off than they are now.

> >
> > Users are actually asking for this functionality.
> >
> > Configuring everything manually is possible but error
> > prone.
> 
> Yes, but attempting to do it automatically is also error prone (due to
> the myriad of different guest network config systems, even just within
> the seemingly narrow category of "Linux guests"). Pick your poison :-)

Make it work well for RHEL guests. Others will work with less integration.

-- 
MST

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
  2015-05-19 15:03         ` Dr. David Alan Gilbert
@ 2015-05-19 15:18           ` Michael S. Tsirkin
  2015-05-19 15:35           ` Daniel P. Berrange
  1 sibling, 0 replies; 45+ messages in thread
From: Michael S. Tsirkin @ 2015-05-19 15:18 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: libvir-list, qemu-devel, Laine Stump

On Tue, May 19, 2015 at 04:03:04PM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrange (berrange@redhat.com) wrote:
> > On Tue, May 19, 2015 at 10:15:17AM -0400, Laine Stump wrote:
> > > On 05/19/2015 05:07 AM, Michael S. Tsirkin wrote:
> > > > On Wed, Apr 22, 2015 at 10:23:04AM +0100, Daniel P. Berrange wrote:
> > > >> On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:
> > > >>> backgrond:
> > > >>> Live migration is one of the most important features of virtualization technology.
> > > >>> With regard to recent virtualization techniques, performance of network I/O is critical.
> > > >>> Current network I/O virtualization (e.g. Para-virtualized I/O, VMDq) has a significant
> > > >>> performance gap with native network I/O. Pass-through network devices have near
> > > >>> native performance, however, they have thus far prevented live migration. No existing
> > > >>> methods solve the problem of live migration with pass-through devices perfectly.
> > > >>>
> > > >>> There was an idea to solve the problem in website:
> > > >>> https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
> > > >>> Please refer to above document for detailed information.
> > > >>>
> > > >>> So I think this problem maybe could be solved by using the combination of existing
> > > >>> technology. and the following steps are we considering to implement:
> > > >>>
> > > >>> -  before boot VM, we anticipate to specify two NICs for creating bonding device
> > > >>>    (one plugged and one virtual NIC) in XML. here we can specify the NIC's mac addresses
> > > >>>    in XML, which could facilitate qemu-guest-agent to find the network interfaces in guest.
> > > >>>
> > > >>> -  when qemu-guest-agent startup in guest it would send a notification to libvirt,
> > > >>>    then libvirt will call the previous registered initialize callbacks. so through
> > > >>>    the callback functions, we can create the bonding device according to the XML
> > > >>>    configuration. and here we use netcf tool which can facilitate to create bonding device
> > > >>>    easily.
> > > >> I'm not really clear on why libvirt/guest agent needs to be involved in this.
> > > >> I think configuration of networking is really something that must be left to
> > > >> the guest OS admin to control. I don't think the guest agent should be trying
> > > >> to reconfigure guest networking itself, as that is inevitably going to conflict
> > > >> with configuration attempted by things in the guest like NetworkManager or
> > > >> systemd-networkd.
> > > > There should not be a conflict.
> > > > guest agent should just give NM the information, and have  NM do
> > > > the right thing.
> > > 
> > > That assumes the guest will have NM running. Unless you want to severely
> > > limit the scope of usefulness, you also need to handle systems that have
> > > NM disabled, and among those the different styles of system network
> > > config. It gets messy very fast.
> > 
> > Also OpenStack already has a way to pass guest information about the
> > required network setup, via cloud-init, so it would not be interested
> > in any thing that used the QEMU guest agent to configure network
> > manager. Which is really just another example of why this does not
> > belong anywhere in libvirt or lower.  The decision to use NM is a
> > policy decision that will always be wrong for a non-negligble set
> > of use cases and as such does not belong in libvirt or QEMU. It is
> > the job of higher level apps to make that kind of policy decision.
> 
> This is exactly my worry though; why should every higher level management
> system have it's own way of communicating network config for hotpluggable
> devices.  You shoudln't need to reconfigure a VM to move it between them.
> 
> This just makes it hard to move it between management layers; there needs
> to be some standardisation (or abstraction) of this;  if libvirt isn't the place
> to do it, then what is?
> 
> 
> Dave

+1

> > > > Users are actually asking for this functionality.
> > > >
> > > > Configuring everything manually is possible but error
> > > > prone.
> > > 
> > > Yes, but attempting to do it automatically is also error prone (due to
> > > the myriad of different guest network config systems, even just within
> > > the seemingly narrow category of "Linux guests"). Pick your poison :-)
> > 
> > Also note I'm not debating the usefulness of the overall concept
> > or the need for automation. It simply doesn't belong in libvirt or
> > lower - it is a job for the higher level management applications to
> > define a policy for that fits in with the way they are managing the
> > virtual machines and the networking.
> > 
> > Regards,
> > Daniel
> > -- 
> > |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> > |: http://libvirt.org              -o-             http://virt-manager.org :|
> > |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
> > |: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
> > 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
  2015-05-19 14:21       ` Daniel P. Berrange
  2015-05-19 15:03         ` Dr. David Alan Gilbert
@ 2015-05-19 15:21         ` Michael S. Tsirkin
  1 sibling, 0 replies; 45+ messages in thread
From: Michael S. Tsirkin @ 2015-05-19 15:21 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: libvir-list, qemu-devel, Laine Stump

On Tue, May 19, 2015 at 03:21:49PM +0100, Daniel P. Berrange wrote:
> On Tue, May 19, 2015 at 10:15:17AM -0400, Laine Stump wrote:
> > On 05/19/2015 05:07 AM, Michael S. Tsirkin wrote:
> > > On Wed, Apr 22, 2015 at 10:23:04AM +0100, Daniel P. Berrange wrote:
> > >> On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:
> > >>> backgrond:
> > >>> Live migration is one of the most important features of virtualization technology.
> > >>> With regard to recent virtualization techniques, performance of network I/O is critical.
> > >>> Current network I/O virtualization (e.g. Para-virtualized I/O, VMDq) has a significant
> > >>> performance gap with native network I/O. Pass-through network devices have near
> > >>> native performance, however, they have thus far prevented live migration. No existing
> > >>> methods solve the problem of live migration with pass-through devices perfectly.
> > >>>
> > >>> There was an idea to solve the problem in website:
> > >>> https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
> > >>> Please refer to above document for detailed information.
> > >>>
> > >>> So I think this problem maybe could be solved by using the combination of existing
> > >>> technology. and the following steps are we considering to implement:
> > >>>
> > >>> -  before boot VM, we anticipate to specify two NICs for creating bonding device
> > >>>    (one plugged and one virtual NIC) in XML. here we can specify the NIC's mac addresses
> > >>>    in XML, which could facilitate qemu-guest-agent to find the network interfaces in guest.
> > >>>
> > >>> -  when qemu-guest-agent startup in guest it would send a notification to libvirt,
> > >>>    then libvirt will call the previous registered initialize callbacks. so through
> > >>>    the callback functions, we can create the bonding device according to the XML
> > >>>    configuration. and here we use netcf tool which can facilitate to create bonding device
> > >>>    easily.
> > >> I'm not really clear on why libvirt/guest agent needs to be involved in this.
> > >> I think configuration of networking is really something that must be left to
> > >> the guest OS admin to control. I don't think the guest agent should be trying
> > >> to reconfigure guest networking itself, as that is inevitably going to conflict
> > >> with configuration attempted by things in the guest like NetworkManager or
> > >> systemd-networkd.
> > > There should not be a conflict.
> > > guest agent should just give NM the information, and have  NM do
> > > the right thing.
> > 
> > That assumes the guest will have NM running. Unless you want to severely
> > limit the scope of usefulness, you also need to handle systems that have
> > NM disabled, and among those the different styles of system network
> > config. It gets messy very fast.
> 
> Also OpenStack already has a way to pass guest information about the
> required network setup, via cloud-init, so it would not be interested
> in any thing that used the QEMU guest agent to configure network
> manager. Which is really just another example of why this does not
> belong anywhere in libvirt or lower.  The decision to use NM is a
> policy decision that will always be wrong for a non-negligble set
> of use cases and as such does not belong in libvirt or QEMU. It is
> the job of higher level apps to make that kind of policy decision.

Using NM is up to users. On some of my VMs, I bring up links manually
after each boot.  We can provide the info to the guest, and teach NM to use
it.  If someone wants to write bash scripts to use this info, that's also
fine.

> > > Users are actually asking for this functionality.
> > >
> > > Configuring everything manually is possible but error
> > > prone.
> > 
> > Yes, but attempting to do it automatically is also error prone (due to
> > the myriad of different guest network config systems, even just within
> > the seemingly narrow category of "Linux guests"). Pick your poison :-)
> 
> Also note I'm not debating the usefulness of the overall concept
> or the need for automation. It simply doesn't belong in libvirt or
> lower - it is a job for the higher level management applications to
> define a policy for that fits in with the way they are managing the
> virtual machines and the networking.
> 
> Regards,
> Daniel

Users are asking for this automation, so it's useful to them. We can
always tell them no. Saying no because we seem unable to
decide where this useful functionality fits does not look like a good
reason.

> -- 
> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org              -o-             http://virt-manager.org :|
> |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
  2015-05-19 15:03         ` Dr. David Alan Gilbert
  2015-05-19 15:18           ` Michael S. Tsirkin
@ 2015-05-19 15:35           ` Daniel P. Berrange
  2015-05-19 15:39             ` Michael S. Tsirkin
  1 sibling, 1 reply; 45+ messages in thread
From: Daniel P. Berrange @ 2015-05-19 15:35 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: libvir-list, qemu-devel, Laine Stump, Michael S. Tsirkin

On Tue, May 19, 2015 at 04:03:04PM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrange (berrange@redhat.com) wrote:
> > On Tue, May 19, 2015 at 10:15:17AM -0400, Laine Stump wrote:
> > > On 05/19/2015 05:07 AM, Michael S. Tsirkin wrote:
> > > > On Wed, Apr 22, 2015 at 10:23:04AM +0100, Daniel P. Berrange wrote:
> > > >> On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:
> > > >>> backgrond:
> > > >>> Live migration is one of the most important features of virtualization technology.
> > > >>> With regard to recent virtualization techniques, performance of network I/O is critical.
> > > >>> Current network I/O virtualization (e.g. Para-virtualized I/O, VMDq) has a significant
> > > >>> performance gap with native network I/O. Pass-through network devices have near
> > > >>> native performance, however, they have thus far prevented live migration. No existing
> > > >>> methods solve the problem of live migration with pass-through devices perfectly.
> > > >>>
> > > >>> There was an idea to solve the problem in website:
> > > >>> https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
> > > >>> Please refer to above document for detailed information.
> > > >>>
> > > >>> So I think this problem maybe could be solved by using the combination of existing
> > > >>> technology. and the following steps are we considering to implement:
> > > >>>
> > > >>> -  before boot VM, we anticipate to specify two NICs for creating bonding device
> > > >>>    (one plugged and one virtual NIC) in XML. here we can specify the NIC's mac addresses
> > > >>>    in XML, which could facilitate qemu-guest-agent to find the network interfaces in guest.
> > > >>>
> > > >>> -  when qemu-guest-agent startup in guest it would send a notification to libvirt,
> > > >>>    then libvirt will call the previous registered initialize callbacks. so through
> > > >>>    the callback functions, we can create the bonding device according to the XML
> > > >>>    configuration. and here we use netcf tool which can facilitate to create bonding device
> > > >>>    easily.
> > > >> I'm not really clear on why libvirt/guest agent needs to be involved in this.
> > > >> I think configuration of networking is really something that must be left to
> > > >> the guest OS admin to control. I don't think the guest agent should be trying
> > > >> to reconfigure guest networking itself, as that is inevitably going to conflict
> > > >> with configuration attempted by things in the guest like NetworkManager or
> > > >> systemd-networkd.
> > > > There should not be a conflict.
> > > > guest agent should just give NM the information, and have  NM do
> > > > the right thing.
> > > 
> > > That assumes the guest will have NM running. Unless you want to severely
> > > limit the scope of usefulness, you also need to handle systems that have
> > > NM disabled, and among those the different styles of system network
> > > config. It gets messy very fast.
> > 
> > Also OpenStack already has a way to pass guest information about the
> > required network setup, via cloud-init, so it would not be interested
> > in any thing that used the QEMU guest agent to configure network
> > manager. Which is really just another example of why this does not
> > belong anywhere in libvirt or lower.  The decision to use NM is a
> > policy decision that will always be wrong for a non-negligble set
> > of use cases and as such does not belong in libvirt or QEMU. It is
> > the job of higher level apps to make that kind of policy decision.
> 
> This is exactly my worry though; why should every higher level management
> system have it's own way of communicating network config for hotpluggable
> devices.  You shoudln't need to reconfigure a VM to move it between them.
> 
> This just makes it hard to move it between management layers; there needs
> to be some standardisation (or abstraction) of this;  if libvirt isn't the place
> to do it, then what is?

NB, OpenStack isn't really defining a custom thing for networking here. It
is actually integrating with the standard cloud-init guest tools for this
task. Also note that OpenStack has defined a mechanism that works for
guest images regardless of what hypervisor they are running on - i.e. it does
not rely on any QEMU- or libvirt-specific functionality here.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
  2015-05-19 15:35           ` Daniel P. Berrange
@ 2015-05-19 15:39             ` Michael S. Tsirkin
  2015-05-19 15:45               ` Daniel P. Berrange
  0 siblings, 1 reply; 45+ messages in thread
From: Michael S. Tsirkin @ 2015-05-19 15:39 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: libvir-list, Dr. David Alan Gilbert, Laine Stump, qemu-devel

On Tue, May 19, 2015 at 04:35:08PM +0100, Daniel P. Berrange wrote:
> On Tue, May 19, 2015 at 04:03:04PM +0100, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrange (berrange@redhat.com) wrote:
> > > On Tue, May 19, 2015 at 10:15:17AM -0400, Laine Stump wrote:
> > > > On 05/19/2015 05:07 AM, Michael S. Tsirkin wrote:
> > > > > On Wed, Apr 22, 2015 at 10:23:04AM +0100, Daniel P. Berrange wrote:
> > > > >> On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:
> > > > >>> backgrond:
> > > > >>> Live migration is one of the most important features of virtualization technology.
> > > > >>> With regard to recent virtualization techniques, performance of network I/O is critical.
> > > > >>> Current network I/O virtualization (e.g. Para-virtualized I/O, VMDq) has a significant
> > > > >>> performance gap with native network I/O. Pass-through network devices have near
> > > > >>> native performance, however, they have thus far prevented live migration. No existing
> > > > >>> methods solve the problem of live migration with pass-through devices perfectly.
> > > > >>>
> > > > >>> There was an idea to solve the problem in website:
> > > > >>> https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
> > > > >>> Please refer to above document for detailed information.
> > > > >>>
> > > > >>> So I think this problem maybe could be solved by using the combination of existing
> > > > >>> technology. and the following steps are we considering to implement:
> > > > >>>
> > > > >>> -  before boot VM, we anticipate to specify two NICs for creating bonding device
> > > > >>>    (one plugged and one virtual NIC) in XML. here we can specify the NIC's mac addresses
> > > > >>>    in XML, which could facilitate qemu-guest-agent to find the network interfaces in guest.
> > > > >>>
> > > > >>> -  when qemu-guest-agent startup in guest it would send a notification to libvirt,
> > > > >>>    then libvirt will call the previous registered initialize callbacks. so through
> > > > >>>    the callback functions, we can create the bonding device according to the XML
> > > > >>>    configuration. and here we use netcf tool which can facilitate to create bonding device
> > > > >>>    easily.
> > > > >> I'm not really clear on why libvirt/guest agent needs to be involved in this.
> > > > >> I think configuration of networking is really something that must be left to
> > > > >> the guest OS admin to control. I don't think the guest agent should be trying
> > > > >> to reconfigure guest networking itself, as that is inevitably going to conflict
> > > > >> with configuration attempted by things in the guest like NetworkManager or
> > > > >> systemd-networkd.
> > > > > There should not be a conflict.
> > > > > guest agent should just give NM the information, and have  NM do
> > > > > the right thing.
> > > > 
> > > > That assumes the guest will have NM running. Unless you want to severely
> > > > limit the scope of usefulness, you also need to handle systems that have
> > > > NM disabled, and among those the different styles of system network
> > > > config. It gets messy very fast.
> > > 
> > > Also OpenStack already has a way to pass guest information about the
> > > required network setup, via cloud-init, so it would not be interested
> > > in any thing that used the QEMU guest agent to configure network
> > > manager. Which is really just another example of why this does not
> > > belong anywhere in libvirt or lower.  The decision to use NM is a
> > > policy decision that will always be wrong for a non-negligble set
> > > of use cases and as such does not belong in libvirt or QEMU. It is
> > > the job of higher level apps to make that kind of policy decision.
> > 
> > This is exactly my worry though; why should every higher level management
> > system have it's own way of communicating network config for hotpluggable
> > devices.  You shoudln't need to reconfigure a VM to move it between them.
> > 
> > This just makes it hard to move it between management layers; there needs
> > to be some standardisation (or abstraction) of this;  if libvirt isn't the place
> > to do it, then what is?
> 
> NB, openstack isn't really defining a custom thing for networking here. It
> is actually integrating with the standard cloud-init guest tools for this
> task. Also note that OpenStack has defined a mechanism that works for
> guest images regardless of what hypervisor they are running on - ie does
> not rely on any QEMU or libvirt specific functionality here.
> 
> Regards,
> Daniel

I'm not sure what the implication is.  That no new functionality should be
implemented unless we also add it to VMware?  People who don't want
KVM-specific functionality won't use it.

> -- 
> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org              -o-             http://virt-manager.org :|
> |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
  2015-05-19 15:39             ` Michael S. Tsirkin
@ 2015-05-19 15:45               ` Daniel P. Berrange
  2015-05-19 16:08                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 45+ messages in thread
From: Daniel P. Berrange @ 2015-05-19 15:45 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: libvir-list, Dr. David Alan Gilbert, Laine Stump, qemu-devel

On Tue, May 19, 2015 at 05:39:05PM +0200, Michael S. Tsirkin wrote:
> On Tue, May 19, 2015 at 04:35:08PM +0100, Daniel P. Berrange wrote:
> > On Tue, May 19, 2015 at 04:03:04PM +0100, Dr. David Alan Gilbert wrote:
> > > * Daniel P. Berrange (berrange@redhat.com) wrote:
> > > > On Tue, May 19, 2015 at 10:15:17AM -0400, Laine Stump wrote:
> > > > > On 05/19/2015 05:07 AM, Michael S. Tsirkin wrote:
> > > > > > On Wed, Apr 22, 2015 at 10:23:04AM +0100, Daniel P. Berrange wrote:
> > > > > >> On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:
> > > > > >>> backgrond:
> > > > > >>> Live migration is one of the most important features of virtualization technology.
> > > > > >>> With regard to recent virtualization techniques, performance of network I/O is critical.
> > > > > >>> Current network I/O virtualization (e.g. Para-virtualized I/O, VMDq) has a significant
> > > > > >>> performance gap with native network I/O. Pass-through network devices have near
> > > > > >>> native performance, however, they have thus far prevented live migration. No existing
> > > > > >>> methods solve the problem of live migration with pass-through devices perfectly.
> > > > > >>>
> > > > > >>> There was an idea to solve the problem in website:
> > > > > >>> https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
> > > > > >>> Please refer to above document for detailed information.
> > > > > >>>
> > > > > >>> So I think this problem maybe could be solved by using the combination of existing
> > > > > >>> technology. and the following steps are we considering to implement:
> > > > > >>>
> > > > > >>> -  before boot VM, we anticipate to specify two NICs for creating bonding device
> > > > > >>>    (one plugged and one virtual NIC) in XML. here we can specify the NIC's mac addresses
> > > > > >>>    in XML, which could facilitate qemu-guest-agent to find the network interfaces in guest.
> > > > > >>>
> > > > > >>> -  when qemu-guest-agent startup in guest it would send a notification to libvirt,
> > > > > >>>    then libvirt will call the previous registered initialize callbacks. so through
> > > > > >>>    the callback functions, we can create the bonding device according to the XML
> > > > > >>>    configuration. and here we use netcf tool which can facilitate to create bonding device
> > > > > >>>    easily.
> > > > > >> I'm not really clear on why libvirt/guest agent needs to be involved in this.
> > > > > >> I think configuration of networking is really something that must be left to
> > > > > >> the guest OS admin to control. I don't think the guest agent should be trying
> > > > > >> to reconfigure guest networking itself, as that is inevitably going to conflict
> > > > > >> with configuration attempted by things in the guest like NetworkManager or
> > > > > >> systemd-networkd.
> > > > > > There should not be a conflict.
> > > > > > guest agent should just give NM the information, and have  NM do
> > > > > > the right thing.
> > > > > 
> > > > > That assumes the guest will have NM running. Unless you want to severely
> > > > > limit the scope of usefulness, you also need to handle systems that have
> > > > > NM disabled, and among those the different styles of system network
> > > > > config. It gets messy very fast.
> > > > 
> > > > Also OpenStack already has a way to pass guest information about the
> > > > required network setup, via cloud-init, so it would not be interested
> > > > in any thing that used the QEMU guest agent to configure network
> > > > manager. Which is really just another example of why this does not
> > > > belong anywhere in libvirt or lower.  The decision to use NM is a
> > > > policy decision that will always be wrong for a non-negligble set
> > > > of use cases and as such does not belong in libvirt or QEMU. It is
> > > > the job of higher level apps to make that kind of policy decision.
> > > 
> > > This is exactly my worry though; why should every higher level management
> > > system have it's own way of communicating network config for hotpluggable
> > > devices.  You shoudln't need to reconfigure a VM to move it between them.
> > > 
> > > This just makes it hard to move it between management layers; there needs
> > > to be some standardisation (or abstraction) of this;  if libvirt isn't the place
> > > to do it, then what is?
> > 
> > NB, openstack isn't really defining a custom thing for networking here. It
> > is actually integrating with the standard cloud-init guest tools for this
> > task. Also note that OpenStack has defined a mechanism that works for
> > guest images regardless of what hypervisor they are running on - ie does
> > not rely on any QEMU or libvirt specific functionality here.
> 
> I'm not sure what the implication is.  No new functionality should be
> implemented unless we also add it to vmware?  People that don't want kvm
> specific functionality, won't use it.

I'm saying that standardization of virtualization policy in libvirt is the
wrong solution, because different applications will have different viewpoints
as to what "standardization" is useful / appropriate. Creating a standardized
policy in libvirt for KVM does not help OpenStack; it may help people who
only care about KVM, but that is not the entire ecosystem. OpenStack has a
standardized solution for guest configuration information that works across
all the hypervisors it targets.  This is just yet another example of exactly
why libvirt aims to design its APIs such that they expose direct mechanisms
and leave usage policy decisions up to the management applications. Libvirt
is not best placed to decide which policy all these management apps must use
for this task.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
  2015-05-19 15:45               ` Daniel P. Berrange
@ 2015-05-19 16:08                 ` Michael S. Tsirkin
  2015-05-19 16:13                   ` Daniel P. Berrange
  2015-05-19 16:27                   ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 45+ messages in thread
From: Michael S. Tsirkin @ 2015-05-19 16:08 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: libvir-list, Dr. David Alan Gilbert, Laine Stump, qemu-devel

On Tue, May 19, 2015 at 04:45:03PM +0100, Daniel P. Berrange wrote:
> On Tue, May 19, 2015 at 05:39:05PM +0200, Michael S. Tsirkin wrote:
> > On Tue, May 19, 2015 at 04:35:08PM +0100, Daniel P. Berrange wrote:
> > > On Tue, May 19, 2015 at 04:03:04PM +0100, Dr. David Alan Gilbert wrote:
> > > > * Daniel P. Berrange (berrange@redhat.com) wrote:
> > > > > On Tue, May 19, 2015 at 10:15:17AM -0400, Laine Stump wrote:
> > > > > > On 05/19/2015 05:07 AM, Michael S. Tsirkin wrote:
> > > > > > > On Wed, Apr 22, 2015 at 10:23:04AM +0100, Daniel P. Berrange wrote:
> > > > > > >> On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:
> > > > > > >>> backgrond:
> > > > > > >>> Live migration is one of the most important features of virtualization technology.
> > > > > > >>> With regard to recent virtualization techniques, performance of network I/O is critical.
> > > > > > >>> Current network I/O virtualization (e.g. Para-virtualized I/O, VMDq) has a significant
> > > > > > >>> performance gap with native network I/O. Pass-through network devices have near
> > > > > > >>> native performance, however, they have thus far prevented live migration. No existing
> > > > > > >>> methods solve the problem of live migration with pass-through devices perfectly.
> > > > > > >>>
> > > > > > >>> There was an idea to solve the problem in website:
> > > > > > >>> https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
> > > > > > >>> Please refer to above document for detailed information.
> > > > > > >>>
> > > > > > >>> So I think this problem maybe could be solved by using the combination of existing
> > > > > > >>> technology. and the following steps are we considering to implement:
> > > > > > >>>
> > > > > > >>> -  before boot VM, we anticipate to specify two NICs for creating bonding device
> > > > > > >>>    (one plugged and one virtual NIC) in XML. here we can specify the NIC's mac addresses
> > > > > > >>>    in XML, which could facilitate qemu-guest-agent to find the network interfaces in guest.
> > > > > > >>>
> > > > > > >>> -  when qemu-guest-agent startup in guest it would send a notification to libvirt,
> > > > > > >>>    then libvirt will call the previous registered initialize callbacks. so through
> > > > > > >>>    the callback functions, we can create the bonding device according to the XML
> > > > > > >>>    configuration. and here we use netcf tool which can facilitate to create bonding device
> > > > > > >>>    easily.
> > > > > > >> I'm not really clear on why libvirt/guest agent needs to be involved in this.
> > > > > > >> I think configuration of networking is really something that must be left to
> > > > > > >> the guest OS admin to control. I don't think the guest agent should be trying
> > > > > > >> to reconfigure guest networking itself, as that is inevitably going to conflict
> > > > > > >> with configuration attempted by things in the guest like NetworkManager or
> > > > > > >> systemd-networkd.
> > > > > > > There should not be a conflict.
> > > > > > > guest agent should just give NM the information, and have  NM do
> > > > > > > the right thing.
> > > > > > 
> > > > > > That assumes the guest will have NM running. Unless you want to severely
> > > > > > limit the scope of usefulness, you also need to handle systems that have
> > > > > > NM disabled, and among those the different styles of system network
> > > > > > config. It gets messy very fast.
> > > > > 
> > > > > Also OpenStack already has a way to pass guest information about the
> > > > > required network setup, via cloud-init, so it would not be interested
> > > > > in any thing that used the QEMU guest agent to configure network
> > > > > manager. Which is really just another example of why this does not
> > > > > belong anywhere in libvirt or lower.  The decision to use NM is a
> > > > > policy decision that will always be wrong for a non-negligble set
> > > > > of use cases and as such does not belong in libvirt or QEMU. It is
> > > > > the job of higher level apps to make that kind of policy decision.
> > > > 
> > > > This is exactly my worry though; why should every higher level management
> > > > system have it's own way of communicating network config for hotpluggable
> > > > devices.  You shoudln't need to reconfigure a VM to move it between them.
> > > > 
> > > > This just makes it hard to move it between management layers; there needs
> > > > to be some standardisation (or abstraction) of this;  if libvirt isn't the place
> > > > to do it, then what is?
> > > 
> > > NB, openstack isn't really defining a custom thing for networking here. It
> > > is actually integrating with the standard cloud-init guest tools for this
> > > task. Also note that OpenStack has defined a mechanism that works for
> > > guest images regardless of what hypervisor they are running on - ie does
> > > not rely on any QEMU or libvirt specific functionality here.
> > 
> > I'm not sure what the implication is.  No new functionality should be
> > implemented unless we also add it to vmware?  People that don't want kvm
> > specific functionality, won't use it.
> 
> I'm saying that standardization of virtualization policy in libvirt is the
> wrong solution, because different applications will have different viewpoints
> as to what "standardization" is useful / appropriate. Creating a standardized
> policy in libvirt for KVM, does not help OpenStack may help people who only
> care about KVM, but that is not the entire ecosystem. OpenStack has a
> standardized solution for guest configuration imformation that works across
> all the hypervisors it targets.  This is just yet another example of exactly
> why libvirt aims to design its APIs such that it exposes direct mechanisms
> and leaves usage policy decisions upto the management applications. Libvirt
> is not best placed to decide which policy all these mgmt apps must use for
> this task.
> 
> Regards,
> Daniel


I don't think we are pushing policy into libvirt here.

What we want is a mechanism that lets users specify in the XML:
"interface X is the fallback for pass-through device Y".
Then, when requesting migration, specify that device Z on the
destination should be used as the replacement for Y.

We are asking libvirt to do the following automatically:
1. when migration is requested, request unplug of Y
2. wait until Y is deleted
3. start migration
4. wait until migration is completed
5. plug device Z on the destination


I don't see any policy above: libvirt is in control of the migration
and seems best placed to implement this.
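
For illustration, a minimal sketch of that sequence in terms of the
existing libvirt C API (the <hostdev> XML snippets, the destination
connection handling and all error/timeout policy are assumed and
hypothetical here; the wait in step 2 is left to the caller):

  #include <libvirt/libvirt.h>

  /* Steps 1-5 above, driven through the public API.  hostdev_y_xml and
   * hostdev_z_xml are hypothetical <hostdev> descriptions of device Y
   * (source) and its replacement Z (destination). */
  static int
  migrate_with_hostdev(virDomainPtr dom, virConnectPtr dst_conn,
                       const char *dst_uri,
                       const char *hostdev_y_xml, const char *hostdev_z_xml)
  {
      virDomainPtr dst_dom;

      /* 1: request unplug of Y (completion is asynchronous) */
      if (virDomainDetachDeviceFlags(dom, hostdev_y_xml,
                                     VIR_DOMAIN_AFFECT_LIVE) < 0)
          return -1;

      /* 2: wait until Y is really gone - not shown here */

      /* 3 + 4: live migration; the call returns once it has completed */
      if (virDomainMigrateToURI(dom, dst_uri, VIR_MIGRATE_LIVE, NULL, 0) < 0)
          return -1;

      /* 5: plug the replacement device Z on the destination */
      if (!(dst_dom = virDomainLookupByName(dst_conn, virDomainGetName(dom))))
          return -1;
      if (virDomainAttachDeviceFlags(dst_dom, hostdev_z_xml,
                                     VIR_DOMAIN_AFFECT_LIVE) < 0) {
          virDomainFree(dst_dom);
          return -1;
      }
      virDomainFree(dst_dom);
      return 0;
  }

The sketch uses VIR_DOMAIN_AFFECT_LIVE so the changes only apply to the
running guest, not its persistent configuration.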



> -- 
> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org              -o-             http://virt-manager.org :|
> |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
  2015-05-19 16:08                 ` Michael S. Tsirkin
@ 2015-05-19 16:13                   ` Daniel P. Berrange
  2015-05-19 16:27                   ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 45+ messages in thread
From: Daniel P. Berrange @ 2015-05-19 16:13 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: libvir-list, Dr. David Alan Gilbert, Laine Stump, qemu-devel

On Tue, May 19, 2015 at 06:08:10PM +0200, Michael S. Tsirkin wrote:
> On Tue, May 19, 2015 at 04:45:03PM +0100, Daniel P. Berrange wrote:
> > On Tue, May 19, 2015 at 05:39:05PM +0200, Michael S. Tsirkin wrote:
> > > On Tue, May 19, 2015 at 04:35:08PM +0100, Daniel P. Berrange wrote:
> > > > On Tue, May 19, 2015 at 04:03:04PM +0100, Dr. David Alan Gilbert wrote:
> > > > > * Daniel P. Berrange (berrange@redhat.com) wrote:
> > > > > > On Tue, May 19, 2015 at 10:15:17AM -0400, Laine Stump wrote:
> > > > > > > On 05/19/2015 05:07 AM, Michael S. Tsirkin wrote:
> > > > > > > > On Wed, Apr 22, 2015 at 10:23:04AM +0100, Daniel P. Berrange wrote:
> > > > > > > >> On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:
> > > > > > > >>> backgrond:
> > > > > > > >>> Live migration is one of the most important features of virtualization technology.
> > > > > > > >>> With regard to recent virtualization techniques, performance of network I/O is critical.
> > > > > > > >>> Current network I/O virtualization (e.g. Para-virtualized I/O, VMDq) has a significant
> > > > > > > >>> performance gap with native network I/O. Pass-through network devices have near
> > > > > > > >>> native performance, however, they have thus far prevented live migration. No existing
> > > > > > > >>> methods solve the problem of live migration with pass-through devices perfectly.
> > > > > > > >>>
> > > > > > > >>> There was an idea to solve the problem in website:
> > > > > > > >>> https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
> > > > > > > >>> Please refer to above document for detailed information.
> > > > > > > >>>
> > > > > > > >>> So I think this problem maybe could be solved by using the combination of existing
> > > > > > > >>> technology. and the following steps are we considering to implement:
> > > > > > > >>>
> > > > > > > >>> -  before boot VM, we anticipate to specify two NICs for creating bonding device
> > > > > > > >>>    (one plugged and one virtual NIC) in XML. here we can specify the NIC's mac addresses
> > > > > > > >>>    in XML, which could facilitate qemu-guest-agent to find the network interfaces in guest.
> > > > > > > >>>
> > > > > > > >>> -  when qemu-guest-agent startup in guest it would send a notification to libvirt,
> > > > > > > >>>    then libvirt will call the previous registered initialize callbacks. so through
> > > > > > > >>>    the callback functions, we can create the bonding device according to the XML
> > > > > > > >>>    configuration. and here we use netcf tool which can facilitate to create bonding device
> > > > > > > >>>    easily.
> > > > > > > >> I'm not really clear on why libvirt/guest agent needs to be involved in this.
> > > > > > > >> I think configuration of networking is really something that must be left to
> > > > > > > >> the guest OS admin to control. I don't think the guest agent should be trying
> > > > > > > >> to reconfigure guest networking itself, as that is inevitably going to conflict
> > > > > > > >> with configuration attempted by things in the guest like NetworkManager or
> > > > > > > >> systemd-networkd.
> > > > > > > > There should not be a conflict.
> > > > > > > > guest agent should just give NM the information, and have  NM do
> > > > > > > > the right thing.
> > > > > > > 
> > > > > > > That assumes the guest will have NM running. Unless you want to severely
> > > > > > > limit the scope of usefulness, you also need to handle systems that have
> > > > > > > NM disabled, and among those the different styles of system network
> > > > > > > config. It gets messy very fast.
> > > > > > 
> > > > > > Also OpenStack already has a way to pass guest information about the
> > > > > > required network setup, via cloud-init, so it would not be interested
> > > > > > in any thing that used the QEMU guest agent to configure network
> > > > > > manager. Which is really just another example of why this does not
> > > > > > belong anywhere in libvirt or lower.  The decision to use NM is a
> > > > > > policy decision that will always be wrong for a non-negligble set
> > > > > > of use cases and as such does not belong in libvirt or QEMU. It is
> > > > > > the job of higher level apps to make that kind of policy decision.
> > > > > 
> > > > > This is exactly my worry though; why should every higher level management
> > > > > system have it's own way of communicating network config for hotpluggable
> > > > > devices.  You shoudln't need to reconfigure a VM to move it between them.
> > > > > 
> > > > > This just makes it hard to move it between management layers; there needs
> > > > > to be some standardisation (or abstraction) of this;  if libvirt isn't the place
> > > > > to do it, then what is?
> > > > 
> > > > NB, openstack isn't really defining a custom thing for networking here. It
> > > > is actually integrating with the standard cloud-init guest tools for this
> > > > task. Also note that OpenStack has defined a mechanism that works for
> > > > guest images regardless of what hypervisor they are running on - ie does
> > > > not rely on any QEMU or libvirt specific functionality here.
> > > 
> > > I'm not sure what the implication is.  No new functionality should be
> > > implemented unless we also add it to vmware?  People that don't want kvm
> > > specific functionality, won't use it.
> > 
> > I'm saying that standardization of virtualization policy in libvirt is the
> > wrong solution, because different applications will have different viewpoints
> > as to what "standardization" is useful / appropriate. Creating a standardized
> > policy in libvirt for KVM, does not help OpenStack may help people who only
> > care about KVM, but that is not the entire ecosystem. OpenStack has a
> > standardized solution for guest configuration imformation that works across
> > all the hypervisors it targets.  This is just yet another example of exactly
> > why libvirt aims to design its APIs such that it exposes direct mechanisms
> > and leaves usage policy decisions upto the management applications. Libvirt
> > is not best placed to decide which policy all these mgmt apps must use for
> > this task.
> > 
> > Regards,
> > Daniel
> 
> 
> I don't think we are pushing policy in libvirt here.
> 
> What we want is a mechanism that let users specify in the XML:
> interface X is fallback for pass-through device Y
> Then when requesting migration, specify that it should use
> device Z on destination as replacement for Y.
> 
> We are asking libvirt to automatically
> 1.- when migration is requested, request unplug of Y
> 2.- wait until Y is deleted
> 3.- start migration
> 4.- wait until migration is completed
> 5.- plug device Z on destination
> 
> 
> I don't see any policy above: libvirt is in control of migration and
> seems best placed to implement this.

Even this implies policy in libvirt about handling of failure conditions:
how long to wait for the unplug, what to do when the unplug fails, what to
do if the plug fails on the target. It is hard to report these errors to
the application, and when multiple devices are to be plugged/unplugged the
application will also have trouble determining whether some or all of the
devices are still present after a failure. Even beyond that, this is
pointless, as all 5 steps you describe here are already possible to perform
with existing functionality in libvirt, with the application having direct
control over what to do in the failure scenarios.
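
For example - a rough sketch, assuming libvirt >= 1.1.1 and an
application-driven event loop set up with virEventRegisterDefaultImpl() -
the "wait until Y is deleted" part can already be done by the application
via the DEVICE_REMOVED event, keeping the timeout and failure policy
entirely in the application's hands:

  #include <string.h>
  #include <libvirt/libvirt.h>

  static int removed;   /* set once the watched device alias is gone */

  static void
  device_removed_cb(virConnectPtr conn, virDomainPtr dom,
                    const char *devAlias, void *opaque)
  {
      if (strcmp(devAlias, (const char *)opaque) == 0)
          removed = 1;
  }

  /* Wait for the hostdev with alias @alias to disappear, bounded by the
   * application's own choice of loop iterations / timeouts. */
  static int
  wait_for_unplug(virConnectPtr conn, virDomainPtr dom,
                  const char *alias, int max_iterations)
  {
      int id, i;

      id = virConnectDomainEventRegisterAny(conn, dom,
               VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED,
               VIR_DOMAIN_EVENT_CALLBACK(device_removed_cb),
               (void *)alias, NULL);
      if (id < 0)
          return -1;

      for (i = 0; i < max_iterations && !removed; i++)
          virEventRunDefaultImpl();  /* add virEventAddTimeout() for a hard bound */

      virConnectDomainEventDeregisterAny(conn, id);
      return removed ? 0 : -1;       /* the caller decides what a timeout means */
  }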

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
  2015-05-19 16:08                 ` Michael S. Tsirkin
  2015-05-19 16:13                   ` Daniel P. Berrange
@ 2015-05-19 16:27                   ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 45+ messages in thread
From: Dr. David Alan Gilbert @ 2015-05-19 16:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: libvir-list, Dr. David Alan Gilbert, Laine Stump, qemu-devel

* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Tue, May 19, 2015 at 04:45:03PM +0100, Daniel P. Berrange wrote:
> > On Tue, May 19, 2015 at 05:39:05PM +0200, Michael S. Tsirkin wrote:
> > > On Tue, May 19, 2015 at 04:35:08PM +0100, Daniel P. Berrange wrote:
> > > > On Tue, May 19, 2015 at 04:03:04PM +0100, Dr. David Alan Gilbert wrote:
> > > > > * Daniel P. Berrange (berrange@redhat.com) wrote:
> > > > > > On Tue, May 19, 2015 at 10:15:17AM -0400, Laine Stump wrote:
> > > > > > > On 05/19/2015 05:07 AM, Michael S. Tsirkin wrote:
> > > > > > > > On Wed, Apr 22, 2015 at 10:23:04AM +0100, Daniel P. Berrange wrote:
> > > > > > > >> On Fri, Apr 17, 2015 at 04:53:02PM +0800, Chen Fan wrote:
> > > > > > > >>> backgrond:
> > > > > > > >>> Live migration is one of the most important features of virtualization technology.
> > > > > > > >>> With regard to recent virtualization techniques, performance of network I/O is critical.
> > > > > > > >>> Current network I/O virtualization (e.g. Para-virtualized I/O, VMDq) has a significant
> > > > > > > >>> performance gap with native network I/O. Pass-through network devices have near
> > > > > > > >>> native performance, however, they have thus far prevented live migration. No existing
> > > > > > > >>> methods solve the problem of live migration with pass-through devices perfectly.
> > > > > > > >>>
> > > > > > > >>> There was an idea to solve the problem in website:
> > > > > > > >>> https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
> > > > > > > >>> Please refer to above document for detailed information.
> > > > > > > >>>
> > > > > > > >>> So I think this problem maybe could be solved by using the combination of existing
> > > > > > > >>> technology. and the following steps are we considering to implement:
> > > > > > > >>>
> > > > > > > >>> -  before boot VM, we anticipate to specify two NICs for creating bonding device
> > > > > > > >>>    (one plugged and one virtual NIC) in XML. here we can specify the NIC's mac addresses
> > > > > > > >>>    in XML, which could facilitate qemu-guest-agent to find the network interfaces in guest.
> > > > > > > >>>
> > > > > > > >>> -  when qemu-guest-agent startup in guest it would send a notification to libvirt,
> > > > > > > >>>    then libvirt will call the previous registered initialize callbacks. so through
> > > > > > > >>>    the callback functions, we can create the bonding device according to the XML
> > > > > > > >>>    configuration. and here we use netcf tool which can facilitate to create bonding device
> > > > > > > >>>    easily.
> > > > > > > >> I'm not really clear on why libvirt/guest agent needs to be involved in this.
> > > > > > > >> I think configuration of networking is really something that must be left to
> > > > > > > >> the guest OS admin to control. I don't think the guest agent should be trying
> > > > > > > >> to reconfigure guest networking itself, as that is inevitably going to conflict
> > > > > > > >> with configuration attempted by things in the guest like NetworkManager or
> > > > > > > >> systemd-networkd.
> > > > > > > > There should not be a conflict.
> > > > > > > > guest agent should just give NM the information, and have  NM do
> > > > > > > > the right thing.
> > > > > > > 
> > > > > > > That assumes the guest will have NM running. Unless you want to severely
> > > > > > > limit the scope of usefulness, you also need to handle systems that have
> > > > > > > NM disabled, and among those the different styles of system network
> > > > > > > config. It gets messy very fast.
> > > > > > 
> > > > > > Also OpenStack already has a way to pass guest information about the
> > > > > > required network setup, via cloud-init, so it would not be interested
> > > > > > in any thing that used the QEMU guest agent to configure network
> > > > > > manager. Which is really just another example of why this does not
> > > > > > belong anywhere in libvirt or lower.  The decision to use NM is a
> > > > > > policy decision that will always be wrong for a non-negligble set
> > > > > > of use cases and as such does not belong in libvirt or QEMU. It is
> > > > > > the job of higher level apps to make that kind of policy decision.
> > > > > 
> > > > > This is exactly my worry though; why should every higher level management
> > > > > system have it's own way of communicating network config for hotpluggable
> > > > > devices.  You shoudln't need to reconfigure a VM to move it between them.
> > > > > 
> > > > > This just makes it hard to move it between management layers; there needs
> > > > > to be some standardisation (or abstraction) of this;  if libvirt isn't the place
> > > > > to do it, then what is?
> > > > 
> > > > NB, openstack isn't really defining a custom thing for networking here. It
> > > > is actually integrating with the standard cloud-init guest tools for this
> > > > task. Also note that OpenStack has defined a mechanism that works for
> > > > guest images regardless of what hypervisor they are running on - ie does
> > > > not rely on any QEMU or libvirt specific functionality here.
> > > 
> > > I'm not sure what the implication is.  No new functionality should be
> > > implemented unless we also add it to vmware?  People that don't want kvm
> > > specific functionality, won't use it.
> > 
> > I'm saying that standardization of virtualization policy in libvirt is the
> > wrong solution, because different applications will have different viewpoints
> > as to what "standardization" is useful / appropriate. Creating a standardized
> > policy in libvirt for KVM, does not help OpenStack may help people who only
> > care about KVM, but that is not the entire ecosystem. OpenStack has a
> > standardized solution for guest configuration imformation that works across
> > all the hypervisors it targets.  This is just yet another example of exactly
> > why libvirt aims to design its APIs such that it exposes direct mechanisms
> > and leaves usage policy decisions upto the management applications. Libvirt
> > is not best placed to decide which policy all these mgmt apps must use for
> > this task.
> > 
> > Regards,
> > Daniel
> 
> 
> I don't think we are pushing policy in libvirt here.
> 
> What we want is a mechanism that let users specify in the XML:
> interface X is fallback for pass-through device Y
> Then when requesting migration, specify that it should use
> device Z on destination as replacement for Y.
> 
> We are asking libvirt to automatically
> 1.- when migration is requested, request unplug of Y
> 2.- wait until Y is deleted
> 3.- start migration
> 4.- wait until migration is completed
> 5.- plug device Z on destination
> 
> I don't see any policy above: libvirt is in control of migration and
> seems best placed to implement this.

The steps that list is missing are:
  0. Tell the guest that *this virtio NIC (X) and *this real NIC (Y) are a bond pair
  6. Tell the guest that *this real NIC (Z) is now the other half of that bond pair

  0 has to happen both at startup and at hotplug of a new pair;  I'm not clear
whether 6 is actually needed, depending on whether it can be done based on what was in 0.
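
As a concrete (but hypothetical) sketch of what "telling the guest" could
look like if the guest-agent route were taken: libvirt's existing qemu
agent passthrough could drive the guest-network-set-interface command
proposed in the accompanying qemu-ga patches in this thread - that command
is RFC-only, and the argument layout below simply mirrors the
GuestNetworkInterface2 type from that patch:

  #include <stdio.h>
  #include <stdlib.h>
  #include <libvirt/libvirt.h>
  #include <libvirt/libvirt-qemu.h>

  /* Ask the guest agent to (re)build bond0 from virtio NIC X plus the
   * currently plugged real NIC (Y on the source, Z on the destination).
   * Interface names here are illustrative. */
  static int
  tell_guest_bond_pair(virDomainPtr dom,
                       const char *virtio_if,    /* NIC X, e.g. "eth0" */
                       const char *hostdev_if)   /* NIC Y or Z, e.g. "eth1" */
  {
      char cmd[1024];
      char *reply;

      snprintf(cmd, sizeof(cmd),
               "{\"execute\": \"guest-network-set-interface\","
               " \"arguments\": {\"interface\": {"
               "  \"type\": \"bond\", \"name\": \"bond0\","
               "  \"options\": \"mode=active-backup miimon=100\","
               "  \"subInterfaces\": [{\"name\": \"%s\"}, {\"name\": \"%s\"}]}}}",
               virtio_if, hostdev_if);

      reply = virDomainQemuAgentCommand(dom, cmd, 10 /* seconds */, 0);
      if (!reply)
          return -1;
      free(reply);
      return 0;
  }

Whether libvirt itself or the management layer issues such a call is the
policy question being argued above.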

Dave

> 
> 
> 
> > -- 
> > |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> > |: http://libvirt.org              -o-             http://virt-manager.org :|
> > |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
> > |: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [RFC 1/3] qemu-agent: add guest-network-set-interface command
  2015-04-17  8:53   ` [Qemu-devel] [RFC 1/3] qemu-agent: add guest-network-set-interface command Chen Fan
@ 2015-05-21 13:52     ` Olga Krishtal
  2015-05-21 14:43       ` [Qemu-devel] [libvirt] " Eric Blake
  0 siblings, 1 reply; 45+ messages in thread
From: Olga Krishtal @ 2015-05-21 13:52 UTC (permalink / raw)
  To: Chen Fan, libvir-list; +Cc: izumi.taku, qemu-devel

On 17/04/15 11:53, Chen Fan wrote:
> Nowadays, qemu has supported physical NIC hotplug for high network
> throughput. but it's in conflict with live migration feature, to keep
> network connectivity, we could to create bond device interface which
> provides a mechanism for enslaving multiple network interfaces into a
> single "bond" interface. the active-backup mode can be used for an
> automatic switch. so this patch is adding a guest-network-set-interface
> command for creating bond device. so the management can easy to create
> a bond device dynamically when guest running.
>
> Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
> ---
>   configure            |  16 ++++
>   qga/commands-posix.c | 261 +++++++++++++++++++++++++++++++++++++++++++++++++++
>   qga/commands-win32.c |   7 ++
>   qga/qapi-schema.json |  54 +++++++++++
>   4 files changed, 338 insertions(+)
>
> diff --git a/configure b/configure
> index f185dd0..ebfcc6a 100755
> --- a/configure
> +++ b/configure
> @@ -3618,6 +3618,18 @@ if test "$darwin" != "yes" -a "$mingw32" != "yes" -a "$solaris" != yes -a \
>   fi
>   
>   ##########################################
> +# Do we need netcf
> +netcf=no
> +cat > $TMPC << EOF
> +#include <netcf.h>
> +int main(void) { return 0; }
> +EOF
> +if compile_prog "" "-lnetcf" ; then
> +    netcf=yes
> +    libs_qga="$libs_qga -lnetcf"
> +fi
> +
> +##########################################
>   # spice probe
>   if test "$spice" != "no" ; then
>     cat > $TMPC << EOF
> @@ -4697,6 +4709,10 @@ if test "$spice" = "yes" ; then
>     echo "CONFIG_SPICE=y" >> $config_host_mak
>   fi
>   
> +if test "$netcf" = "yes" ; then
> +  echo "CONFIG_NETCF=y" >> $config_host_mak
> +fi
> +
>   if test "$smartcard_nss" = "yes" ; then
>     echo "CONFIG_SMARTCARD_NSS=y" >> $config_host_mak
>     echo "NSS_LIBS=$nss_libs" >> $config_host_mak
> diff --git a/qga/commands-posix.c b/qga/commands-posix.c
> index f6f3e3c..5ee7949 100644
> --- a/qga/commands-posix.c
> +++ b/qga/commands-posix.c
> @@ -46,6 +46,10 @@ extern char **environ;
>   #include <sys/socket.h>
>   #include <net/if.h>
>   
> +#ifdef CONFIG_NETCF
> +#include <netcf.h>
> +#endif
> +
>   #ifdef FIFREEZE
>   #define CONFIG_FSFREEZE
>   #endif
> @@ -1719,6 +1723,263 @@ error:
>       return NULL;
>   }
>   
> +#ifdef CONFIG_NETCF
> +static const char *interface_type_string[] = {
> +    "bond",
> +};
> +
> +static const char *ip_address_type_string[] = {
> +    "ipv4",
> +    "ipv6",
> +};
> +
> +static char *parse_options(const char *str, const char *needle)
> +{
> +    char *start, *end, *buffer = NULL;
> +    char *ret = NULL;
> +
> +    buffer = g_strdup(str);
> +    start = buffer;
> +    if ((start = strstr(start, needle))) {
> +        start += strlen(needle);
> +        end = strchr(start, ' ');
> +        if (end) {
> +            *end = '\0';
> +        }
> +        if (strlen(start) == 0) {
> +            goto cleanup;
> +        }
> +        ret = g_strdup(start);
> +    }
> +
> +cleanup:
> +    g_free(buffer);
> +    return ret;
> +}
> +
> +/**
> + * @buffer: xml string data to be formatted
> + * @indent: indent number relative to first line
> + *
> + */
> +static void adjust_indent(char **buffer, int indent)
> +{
> +    char spaces[1024];
> +    int i;
> +
> +    if (!*buffer) {
> +        return;
> +    }
> +
> +    if (indent < 0 || indent >= 1024) {
> +        return;
> +    }
> +    memset(spaces, 0, sizeof(spaces));
> +    for (i = 0; i < indent; i++) {
> +        spaces[i] = ' ';
> +    }
> +
> +    sprintf(*buffer + strlen(*buffer), "%s", spaces);
> +}
> +
> +static char *create_bond_interface(GuestNetworkInterface2 *interface)
> +{
> +    char *target_xml;
> +
> +    target_xml = g_malloc0(1024);
> +    if (!target_xml) {
> +        return NULL;
> +    }
> +
> +    sprintf(target_xml, "<interface type='%s' name='%s'>\n",
> +            interface_type_string[interface->type], interface->name);
> +    adjust_indent(&target_xml, 2);
> +    sprintf(target_xml + strlen(target_xml), "<start mode='%s'/>\n",
> +            interface->has_onboot ? interface->onboot : "none");
> +    if (interface->has_ip_address) {
> +        GuestIpAddress *address_item = interface->ip_address;
> +
> +        adjust_indent(&target_xml, 2);
> +        sprintf(target_xml + strlen(target_xml), "<protocol family='%s'>\n",
> +                ip_address_type_string[address_item->ip_address_type]);
> +        adjust_indent(&target_xml, 4);
> +        sprintf(target_xml + strlen(target_xml), "<ip address='%s' prefix='%" PRId64 "'/>\n",
> +                address_item->ip_address, address_item->prefix);
> +        if (address_item->has_gateway) {
> +            adjust_indent(&target_xml, 4);
> +            sprintf(target_xml + strlen(target_xml), "<route gateway='%s'/>\n",
> +                    address_item->gateway);
> +        }
> +        adjust_indent(&target_xml, 2);
> +        sprintf(target_xml + strlen(target_xml), "%s\n", "</protocol>");
> +    }
> +
> +    adjust_indent(&target_xml, 2);
> +    if (interface->has_options) {
> +        char *value;
> +
> +        value = parse_options(interface->options, "mode=");
> +        if (value) {
> +            sprintf(target_xml + strlen(target_xml), "<bond mode='%s'>\n",
> +                    value);
> +            g_free(value);
> +        } else {
> +            sprintf(target_xml + strlen(target_xml), "%s\n", "<bond>");
> +        }
> +
> +        value = parse_options(interface->options, "miimon=");
> +        if (value) {
> +            adjust_indent(&target_xml, 4);
> +            sprintf(target_xml + strlen(target_xml), "<miimon freq='%s'",
> +                   value);
> +            g_free(value);
> +
> +            value = parse_options(interface->options, "updelay=");
> +            if (value) {
> +                sprintf(target_xml + strlen(target_xml), " updelay='%s'",
> +                        value);
> +                g_free(value);
> +            }
> +            value = parse_options(interface->options, "downdelay=");
> +            if (value) {
> +                sprintf(target_xml + strlen(target_xml), " downdelay='%s'",
> +                        value);
> +                g_free(value);
> +            }
> +            value = parse_options(interface->options, "use_carrier=");
> +            if (value) {
> +                sprintf(target_xml + strlen(target_xml), " carrier='%s'",
> +                        value);
> +                g_free(value);
> +            }
> +
> +            sprintf(target_xml + strlen(target_xml), "%s\n", "/>");
> +        }
> +
> +        value = parse_options(interface->options, "arp_interval=");
> +        if (value) {
> +            adjust_indent(&target_xml, 4);
> +            sprintf(target_xml + strlen(target_xml), "<arpmon interval='%s'",
> +                    value);
> +            g_free(value);
> +
> +            value = parse_options(interface->options, "arp_ip_target=");
> +            if (value) {
> +                sprintf(target_xml + strlen(target_xml), " target='%s'",
> +                        value);
> +                g_free(value);
> +            }
> +
> +            value = parse_options(interface->options, "arp_validate=");
> +            if (value) {
> +                sprintf(target_xml + strlen(target_xml), " validate='%s'",
> +                        value);
> +                g_free(value);
> +            }
> +
> +            sprintf(target_xml + strlen(target_xml), "%s\n", "/>");
> +        }
> +    } else {
> +        sprintf(target_xml + strlen(target_xml), "%s\n", "<bond>");
> +    }
> +
> +    if (interface->has_subInterfaces) {
> +        GuestNetworkInterfaceList *head = interface->subInterfaces;
> +
> +        for (; head; head = head->next) {
> +            adjust_indent(&target_xml, 4);
> +            sprintf(target_xml + strlen(target_xml),
> +                    "<interface type='ethernet' name='%s'/>\n",
> +                    head->value->name);
> +        }
> +    }
> +
> +    adjust_indent(&target_xml, 2);
> +    sprintf(target_xml + strlen(target_xml), "%s\n", "</bond>");
> +    sprintf(target_xml + strlen(target_xml), "%s\n", "</interface>");
> +
> +    return target_xml;
> +}
> +
> +static struct netcf *netcf;
> +
> +static void create_interface(GuestNetworkInterface2 *interface, Error **errp)
> +{
> +    int ret = -1;
> +    struct netcf_if *iface;
> +    unsigned int flags = 0;
> +    char *target_xml;
> +
> +    /* open netcf */
> +    if (netcf == NULL) {
> +        if (ncf_init(&netcf, NULL) != 0) {
> +            error_setg(errp, "netcf init failed");
> +            return;
> +        }
> +    }
> +
> +    if (interface->type != GUEST_INTERFACE_TYPE_BOND) {
> +        error_setg(errp, "interface type is not supported, only support 'bond' type");
> +        return;
> +    }
> +
> +   target_xml = create_bond_interface(interface);
> +   if (!target_xml) {
> +        error_setg(errp, "no enough memory spaces");
> +        return;
> +    }
> +
> +    iface = ncf_define(netcf, target_xml);
> +    if (!iface) {
> +        error_setg(errp, "netcf interface define failed");
> +        g_free(target_xml);
> +        goto cleanup;
> +    }
> +
> +    g_free(target_xml);
> +
> +    if (ncf_if_status(iface, &flags) < 0) {
> +        error_setg(errp, "netcf interface get status failed");
> +        goto cleanup;
> +    }
> +
> +    if (flags & NETCF_IFACE_ACTIVE) {
> +        error_setg(errp, "interface is already running");
> +        goto cleanup;
> +    }
> +
> +    ret = ncf_if_up(iface);
> +    if (ret < 0) {
> +        error_setg(errp, "netcf interface up failed");
> +        goto cleanup;
> +    }
> +
> + cleanup:
> +    ncf_if_free(iface);
> +}
> +
> +int64_t qmp_guest_network_set_interface(GuestNetworkInterface2 *interface,
> +                                        Error **errp)
> +{
> +    Error *local_err = NULL;
> +
> +    create_interface(interface, &local_err);
> +    if (local_err != NULL) {
> +        error_propagate(errp, local_err);
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +#else
> +int64_t qmp_guest_network_set_interface(GuestNetworkInterface2 *interface,
> +                                        Error **errp)
> +{
> +    error_set(errp, QERR_UNSUPPORTED);
> +    return -1;
> +}
> +#endif
> +
>   #define SYSCONF_EXACT(name, errp) sysconf_exact((name), #name, (errp))
>   
>   static long sysconf_exact(int name, const char *name_str, Error **errp)
> diff --git a/qga/commands-win32.c b/qga/commands-win32.c
> index 3bcbeae..4c14514 100644
> --- a/qga/commands-win32.c
> +++ b/qga/commands-win32.c
> @@ -446,6 +446,13 @@ int64_t qmp_guest_set_vcpus(GuestLogicalProcessorList *vcpus, Error **errp)
>       return -1;
>   }
>   
> +int64_t qmp_guest_network_set_interface(GuestNetworkInterface2 *interface,
> +                                        Error **errp)
> +{
> +    error_set(errp, QERR_UNSUPPORTED);
> +    return -1;
> +}
> +
>   /* add unsupported commands to the blacklist */
>   GList *ga_command_blacklist_init(GList *blacklist)
>   {
> diff --git a/qga/qapi-schema.json b/qga/qapi-schema.json
> index 376e79f..77f499b 100644
> --- a/qga/qapi-schema.json
> +++ b/qga/qapi-schema.json
> @@ -556,6 +556,7 @@
>   { 'type': 'GuestIpAddress',
>     'data': {'ip-address': 'str',
>              'ip-address-type': 'GuestIpAddressType',
> +           '*gateway': 'str',
>              'prefix': 'int'} }
>   
>   ##
> @@ -575,6 +576,43 @@
>              '*ip-addresses': ['GuestIpAddress'] } }
>   
>   ##
> +# @GuestInterfaceType:
> +#
> +# An enumeration of supported interface types
> +#
> +# @bond: bond device
> +#
> +# Since: 2.3
> +##
> +{ 'enum': 'GuestInterfaceType',
> +  'data': [ 'bond' ] }
> +
> +##
> +# @GuestNetworkInterface2:
> +#
> +# @type: the interface type which supported in enum GuestInterfaceType.
> +#
> +# @name: the interface name.
> +#
> +# @onboot: the interface start model.
> +#
> +# @ip-address: IP address.
> +#
> +# @options: the options argument.
> +#
> +# @subInterfaces: the slave interfaces.
> +#
> +# Since: 2.3
> +##
> +{ 'type': 'GuestNetworkInterface2',
> +  'data': {'type': 'GuestInterfaceType',
> +           'name': 'str',
> +           '*onboot': 'str',
> +           '*ip-address': 'GuestIpAddress',
> +           '*options': 'str',
> +           '*subInterfaces': ['GuestNetworkInterface'] } }
> +
> +##
>   # @guest-network-get-interfaces:
>   #
>   # Get list of guest IP addresses, MAC addresses
> @@ -588,6 +626,22 @@
>     'returns': ['GuestNetworkInterface'] }
>   
>   ##
> +# @guest-network-set-interface:
> +#
> +# Set guest network interface
> +#
> +# return: 0:      call successful.
> +#
> +#         -1:     call failed.
> +#
> +#
> +# Since: 2.3
> +##
> +{ 'command': 'guest-network-set-interface',
> +  'data'   : {'interface': 'GuestNetworkInterface2' },
> +  'returns': 'int' }
I thought that using a built-in type as the return value is
deprecated.
Let's return a dictionary in guest-network-set (get)-interface.
> +
> +##
>   # @GuestLogicalProcessor:
>   #
>   # @logical-id: Arbitrary guest-specific unique identifier of the VCPU.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 1/3] qemu-agent: add guest-network-set-interface command
  2015-05-21 13:52     ` Olga Krishtal
@ 2015-05-21 14:43       ` Eric Blake
  0 siblings, 0 replies; 45+ messages in thread
From: Eric Blake @ 2015-05-21 14:43 UTC (permalink / raw)
  To: Olga Krishtal, Chen Fan, libvir-list; +Cc: qemu-devel


On 05/21/2015 07:52 AM, Olga Krishtal wrote:
> On 17/04/15 11:53, Chen Fan wrote:
>> Nowadays, qemu has supported physical NIC hotplug for high network
>> throughput. but it's in conflict with live migration feature, to keep
>> network connectivity, we could to create bond device interface which
>> provides a mechanism for enslaving multiple network interfaces into a
>> single "bond" interface. the active-backup mode can be used for an
>> automatic switch. so this patch is adding a guest-network-set-interface
>> command for creating bond device. so the management can easy to create
>> a bond device dynamically when guest running.
>>
>> Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
>> ---

>> @@ -588,6 +626,22 @@
>>     'returns': ['GuestNetworkInterface'] }
>>     ##
>> +# @guest-network-set-interface:
>> +#
>> +# Set guest network interface
>> +#
>> +# return: 0:      call successful.
>> +#
>> +#         -1:     call failed.
>> +#
>> +#
>> +# Since: 2.3

You've missed 2.3; if we still want this, it will need to be updated to 2.4.

>> +##
>> +{ 'command': 'guest-network-set-interface',
>> +  'data'   : {'interface': 'GuestNetworkInterface2' },
>> +  'returns': 'int' }
> I thought that usage of built-in types as the returning value is
> deprecated.
> Lets return dictionary in guest-network-set (get)-interface

Correct. Returning a non-dictionary now causes the generator to barf if
you don't update a whitelist.  But you don't even need a return value -
QGA is already set up to return {} on success and an error message on
failure, if you have nothing further to add.  Just omit 'returns' from
your 'command' definition.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [RFC 4/7] qemu-agent: add qemuAgentCreateBond interface
  2015-04-17  8:53 ` [Qemu-devel] [RFC 4/7] qemu-agent: add qemuAgentCreateBond interface Chen Fan
  2015-05-19  9:13   ` Michael S. Tsirkin
@ 2015-05-29  7:37   ` Michal Privoznik
  1 sibling, 0 replies; 45+ messages in thread
From: Michal Privoznik @ 2015-05-29  7:37 UTC (permalink / raw)
  To: Chen Fan, libvir-list; +Cc: izumi.taku, qemu-devel

On 17.04.2015 10:53, Chen Fan wrote:
> via initialize callback to create bond device.
> 
> Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
> ---
>  src/qemu/qemu_agent.c   | 118 ++++++++++++++++++++++++++++++++++++++++++++++++
>  src/qemu/qemu_agent.h   |  10 ++++
>  src/qemu/qemu_domain.c  |  70 ++++++++++++++++++++++++++++
>  src/qemu/qemu_domain.h  |   7 +++
>  src/qemu/qemu_process.c |   4 ++
>  5 files changed, 209 insertions(+)
> 

If we go this way, we should introduce a much broader set of interface
types to create. In fact, I don't like the idea of qemu-ga mangling the
guest network, especially when there are so many tools for that.

Michal

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2015-05-29  7:37 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-17  8:53 [Qemu-devel] [RFC 0/7] Live Migration with Pass-through Devices proposal Chen Fan
2015-04-17  8:53 ` [Qemu-devel] [RFC 1/7] qemu-agent: add agent init callback when detecting guest setup Chen Fan
2015-04-17  8:53 ` [Qemu-devel] [RFC 2/7] qemu: add guest init event callback to do the initialize work for guest Chen Fan
2015-04-17  8:53 ` [Qemu-devel] [RFC 3/7] hostdev: add a 'bond' type element in <hostdev> element Chen Fan
2015-04-17  8:53 ` [Qemu-devel] [RFC 4/7] qemu-agent: add qemuAgentCreateBond interface Chen Fan
2015-05-19  9:13   ` Michael S. Tsirkin
2015-05-29  7:37   ` Michal Privoznik
2015-04-17  8:53 ` [Qemu-devel] [RFC 5/7] hostdev: add parse ip and route for bond configure Chen Fan
2015-04-17  8:53 ` [Qemu-devel] [RFC 6/7] migrate: hot remove hostdev at perform phase for bond device Chen Fan
2015-04-17  8:53 ` [Qemu-devel] [RFC 7/7] migrate: add hostdev migrate status to support hostdev migration Chen Fan
2015-04-17  8:53 ` [Qemu-devel] [RFC 0/3] add support migration with passthrough device Chen Fan
2015-04-17  8:53   ` [Qemu-devel] [RFC 1/3] qemu-agent: add guest-network-set-interface command Chen Fan
2015-05-21 13:52     ` Olga Krishtal
2015-05-21 14:43       ` [Qemu-devel] [libvirt] " Eric Blake
2015-04-17  8:53   ` [Qemu-devel] [RFC 2/3] qemu-agent: add guest-network-delete-interface command Chen Fan
2015-04-17  8:53   ` [Qemu-devel] [RFC 3/3] qemu-agent: add notify for qemu-ga boot Chen Fan
2015-04-21 23:38     ` Eric Blake
2015-04-19 22:29 ` [Qemu-devel] [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal Laine Stump
2015-04-22  4:22   ` Chen Fan
2015-04-23 14:14     ` Laine Stump
2015-04-23  8:34   ` Chen Fan
2015-04-23 15:01     ` Laine Stump
2015-05-19  9:10       ` Michael S. Tsirkin
2015-04-22  9:23 ` [Qemu-devel] " Daniel P. Berrange
2015-04-22 13:05   ` Daniel P. Berrange
2015-04-22 17:01   ` Dr. David Alan Gilbert
2015-04-22 17:06     ` Daniel P. Berrange
2015-04-22 17:12       ` Dr. David Alan Gilbert
2015-04-22 17:15         ` Daniel P. Berrange
2015-04-22 17:20           ` Dr. David Alan Gilbert
2015-04-23 16:35             ` [Qemu-devel] [libvirt] " Laine Stump
2015-05-19  9:04               ` Michael S. Tsirkin
2015-05-19  9:07   ` [Qemu-devel] " Michael S. Tsirkin
2015-05-19 14:15     ` [Qemu-devel] [libvirt] " Laine Stump
2015-05-19 14:21       ` Daniel P. Berrange
2015-05-19 15:03         ` Dr. David Alan Gilbert
2015-05-19 15:18           ` Michael S. Tsirkin
2015-05-19 15:35           ` Daniel P. Berrange
2015-05-19 15:39             ` Michael S. Tsirkin
2015-05-19 15:45               ` Daniel P. Berrange
2015-05-19 16:08                 ` Michael S. Tsirkin
2015-05-19 16:13                   ` Daniel P. Berrange
2015-05-19 16:27                   ` Dr. David Alan Gilbert
2015-05-19 15:21         ` Michael S. Tsirkin
2015-05-19 15:14       ` Michael S. Tsirkin
