* [Qemu-devel] [PATCH v1 0/6] A migration performance testing framework
From: Daniel P. Berrange @ 2016-05-05 14:27 UTC
  To: qemu-devel
  Cc: Juan Quintela, Amit Shah, Dr. David Alan Gilbert, Daniel P. Berrange

This series of patches provides a framework for testing migration performance
characteristics. The motivating factor for this is planning that is underway
in OpenStack wrt making use of QEMU migration features such as compression,
auto-converge and post-copy. The primary aim for OpenStack is to have Nova
autonomously manage migration features & tunables to maximise chances that
migration will complete. The problem faced is figuring out just which QEMU
migration features are "best" suited to our needs. This means we want data
on how well they are able to ensure completion of a migration, against the
host resources used and the impact on the guest workload performance.

The test framework produced here takes a pathological guest workload (every
CPU burning 100% of its time xor'ing every byte of guest memory with random
data). This is quite a pessimistic test because most guest workloads are not
going to be this heavy on memory writes, and their data won't be uniformly
random and so will compress better than this test does.
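
For illustration, the workload is conceptually along the lines of this
minimal Python sketch (the real workload is the C program added by patch 6
as tests/migration/stress.c; the function, buffer handling and sizes here
are purely illustrative):

  import os, time

  GB = 1024 * 1024 * 1024
  PAGE = 4096

  def burn_memory(size_gb=1):
      random_page = bytearray(os.urandom(PAGE))    # 4k block of random data
      buf = bytearray(size_gb * GB)                # stand-in for guest RAM
      while True:
          start = time.time()
          for off in range(0, len(buf), PAGE):
              for i in range(PAGE):
                  buf[off + i] ^= random_page[i]   # dirty every byte of "RAM"
          print("xor'd %d GB in %d ms" % (size_gb, (time.time() - start) * 1000))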

With this worst case guest, I have produced a set of tests using UNIX socket,
TCP localhost, TCP remote and RDMA remote socket transports, with both a
1 GB RAM + 1 CPU guest and an 8 GB RAM + 4 CPU guest.

The TCP/RDMA remote host tests were run over a 10-GigE network interface.

I have put the results online to view here:

  https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/

The charts here are showing two core sets of data:

 - The guest CPU performance. The left axis shows the time in milliseconds
   required to xor 1 GB of memory. This is shown per guest CPU and combined
   across all CPUs.

 - The host CPU utilization. The right axis shows the overall QEMU process
   CPU utilization, and the per-VCPU utilization.

Note that the charts are interactive - you can turn on/off each plot line and
zoom in by selecting regions on the chart.


Some interesting things that I have observed with this data:

 - At the start of each iteration of migration there is a distinct drop in
   guest CPU performance as shown by a spike in the guest CPU time lines.
   Performance would drop from 200ms/GB to 400ms/GB. Presumably this is
   related to QEMU recalculating the dirty bitmap for the guest RAM. See
   the spikes in the green line in:

    https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-1gb-1cpu/post-copy-bandwidth/post-copy-bw-1gbs.html

 - For the larger sized guests, the auto-converge code has to throttle the
   guest to 90% or more before it is able to meet the 500ms max downtime
   value:

    https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-1gb-1cpu/auto-converge-bandwidth/auto-converge-bw-1gbs.html

   Even then I often saw tests aborting as they hit the max number of
   iterations I permitted (30 iterations):

    https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-8gb-4cpu/auto-converge-bandwidth/auto-converge-bw-10gbs.html

 - MT compression is actively harmful to the chances of successful migration
   when the guest RAM is not compression friendly. My workload is a worst case
   since it is splattering RAM with totally random bytes. The MT compression
   dramatically increases the time for each iteration as we bottleneck on CPU
   compression speed, leaving the network largely idle. This causes migrations
   which would have completed without compression to fail. It also burns huge
   amounts of host CPU time:

     https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-1gb-1cpu/compr-mt/compr-mt-threads-4.html

 - XBZRLE compression did not have as much of a CPU performance penalty on the
   host as MT compression, but also did not help migration to actually complete.
   Again this is largely due to the workload being the worst case scenario with
   random bytes. The downside is obviously the potentially significant memory
   overhead on the host due to the cache sizing:

    https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-1gb-1cpu/compr-xbzrle/compr-xbzrle-cache-50.html


 - Post-copy, by its very nature, obviously ensured that the migration would
   complete. While post-copy was running in pre-copy mode there was a somewhat
   chaotic small impact on guest CPU performance, causing performance to
   periodically oscillate between 400ms/GB and 800ms/GB. This is less than
   the impact at the start of each migration iteration, which was 1000ms/GB
   in this test. There was also a massive penalty at the time of switchover
   from pre to post copy, as is to be expected. The migration completed in the
   post-copy phase quite quickly though. For this workload, the number of
   iterations in pre-copy mode before switching to post-copy did not have much
   impact. I expect a less extreme workload would have shown more interesting
   results wrt the number of iterations of pre-copy:

    https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-8gb-4cpu/post-copy-iters.html


Overall, if we're looking for a solution that can guarantee completion under
the most extreme guest workload, then only post-copy & auto-converge appear
up to the job.

The MT compression is seriously harmful to migration and has severe CPU
overhead. The XBZRLE compression is moderately harmful to migration and has
potentially severe memory overhead for the large cache sizes needed to make
it useful.

While auto-converge can ensure that guest migration completes, it has a
pretty significant long-term impact on guest CPU performance to achieve
this, i.e. the guest spends a long time in pre-copy mode with its CPUs very
dramatically throttled down. The level of throttling required makes one
wonder whether it is worth using at all, as compared to simply pausing the
guest workload. The latter has a hard blackout period, but over quite a
short time frame if the network is fast.
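
(As a rough back-of-the-envelope estimate, assuming the remaining RAM is
transferred at full wire speed while the guest is paused: a 10-GigE link
moves roughly 1.25 GB/sec, so the blackout would be on the order of
8 / 1.25 = ~6.5 seconds for the 8 GB guest, and well under a second for
the 1 GB guest.)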

The post-copy code does have an impact on guest performance while in pre-copy
mode, vs a plain migration. It also has a fairly high spike when in post-copy
mode, but this lasts for a pretty short time. As compared to auto-converge,
it is able to ensure the migration completes in a finite time without having
a prolonged impact on guest CPU performance. The penalty during the post-copy
phase is on a par with the penalty imposed by auto-converge when it has to
throttle to 90%+.


Overall, in the context of a worst case guest workload, it appears that
post-copy is the clear winning strategy, ensuring completion of migration
without imposing a long-duration penalty on guest performance. If the
risk of failure from post-copy is unacceptable then auto-converge is a
good fallback option, if the long-duration guest CPU penalty can be
accepted.

The compression options are only worth using if the host has free CPU
resources, and the guest RAM is believed to be compression friendly,
as they steal significant CPU time away from guests in order to run
compression, often with a negative impact on migration completion
chances.

Looking at migration in general, even with a 10-GigE NIC and RDMA
transport it is possible for a single guest to provide a workload that
will saturate the network during migration & thus prevent completion.
Based on this, there is little point in attempting to run migrations
in parallel on the same host, unless multiple NICs are available,
as parallel migrations would reduce the chances of either one ever
completing. Better reliability & faster overall completion would
likely be achieved by fully serializing migration operations per
host.

There is clearly scope for more investigation here, in particular:

 - Produce some alternative guest workloads that try to present
   a more "average" scenario workload, instead of the worst-case.
   These would likely allow compression to have some positive
   impact.

 - Try various combinations of strategies. For example, combining
   post-copy and auto-converge at the same time, or compression
   combined with either post-copy or auto-converge.

 - Investigate block migration performance too, with NBD migration
   server.

 - Investigate effect of dynamically changing max downtime value
   during migration, rather than using a fixed 500ms value.


Daniel P. Berrange (6):
  scripts: add __init__.py file to scripts/qmp/
  scripts: add a 'debug' parameter to QEMUMonitorProtocol
  scripts: refactor the VM class in iotests for reuse
  scripts: set timeout when waiting for qemu monitor connection
  scripts: ensure monitor socket has SO_REUSEADDR set
  tests: introduce a framework for testing migration performance

 configure                               |   2 +
 scripts/qemu.py                         | 202 +++++++++++
 scripts/qmp/__init__.py                 |   0
 scripts/qmp/qmp.py                      |  15 +-
 scripts/qtest.py                        |  34 ++
 tests/Makefile                          |  12 +
 tests/migration/.gitignore              |   2 +
 tests/migration/guestperf-batch.py      |  26 ++
 tests/migration/guestperf-plot.py       |  26 ++
 tests/migration/guestperf.py            |  27 ++
 tests/migration/guestperf/__init__.py   |   0
 tests/migration/guestperf/comparison.py | 124 +++++++
 tests/migration/guestperf/engine.py     | 439 ++++++++++++++++++++++
 tests/migration/guestperf/hardware.py   |  62 ++++
 tests/migration/guestperf/plot.py       | 623 ++++++++++++++++++++++++++++++++
 tests/migration/guestperf/progress.py   | 117 ++++++
 tests/migration/guestperf/report.py     |  98 +++++
 tests/migration/guestperf/scenario.py   |  95 +++++
 tests/migration/guestperf/shell.py      | 255 +++++++++++++
 tests/migration/guestperf/timings.py    |  55 +++
 tests/migration/stress.c                | 367 +++++++++++++++++++
 tests/qemu-iotests/iotests.py           | 135 +------
 22 files changed, 2583 insertions(+), 133 deletions(-)
 create mode 100644 scripts/qemu.py
 create mode 100644 scripts/qmp/__init__.py
 create mode 100644 tests/migration/.gitignore
 create mode 100755 tests/migration/guestperf-batch.py
 create mode 100755 tests/migration/guestperf-plot.py
 create mode 100755 tests/migration/guestperf.py
 create mode 100644 tests/migration/guestperf/__init__.py
 create mode 100644 tests/migration/guestperf/comparison.py
 create mode 100644 tests/migration/guestperf/engine.py
 create mode 100644 tests/migration/guestperf/hardware.py
 create mode 100644 tests/migration/guestperf/plot.py
 create mode 100644 tests/migration/guestperf/progress.py
 create mode 100644 tests/migration/guestperf/report.py
 create mode 100644 tests/migration/guestperf/scenario.py
 create mode 100644 tests/migration/guestperf/shell.py
 create mode 100644 tests/migration/guestperf/timings.py
 create mode 100644 tests/migration/stress.c

-- 
2.5.5

* [Qemu-devel] [PATCH v1 1/6] scripts: add __init__.py file to scripts/qmp/
From: Daniel P. Berrange @ 2016-05-05 14:27 UTC
  To: qemu-devel
  Cc: Juan Quintela, Amit Shah, Dr. David Alan Gilbert, Daniel P. Berrange

When searching for modules to load, python will ignore any
sub-directory which does not contain __init__.py. This means
that both scripts/ and scripts/qmp/ have to be explicitly added
to the python path. By adding a __init__.py file to scripts/qmp,
we only need to add scripts/ to the python path and can then simply
do 'from qmp import qmp' to load scripts/qmp/qmp.py.
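
For example, a test script then only needs something like the following
(a sketch; the relative path is hypothetical and depends on where the
caller lives):

 import os, sys

 # Only scripts/ needs adding to the search path; qmp/ is now importable
 # as a package thanks to the new __init__.py file
 sys.path.append(os.path.join(os.path.dirname(__file__), '..', '..', 'scripts'))

 from qmp import qmp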

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
---
 scripts/qmp/__init__.py | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 scripts/qmp/__init__.py

diff --git a/scripts/qmp/__init__.py b/scripts/qmp/__init__.py
new file mode 100644
index 0000000..e69de29
-- 
2.5.5

* [Qemu-devel] [PATCH v1 2/6] scripts: add a 'debug' parameter to QEMUMonitorProtocol
From: Daniel P. Berrange @ 2016-05-05 14:27 UTC
  To: qemu-devel
  Cc: Juan Quintela, Amit Shah, Dr. David Alan Gilbert, Daniel P. Berrange

Add a 'debug' parameter to the QEMUMonitorProtocol class
which will cause it to print out all JSON strings on
sys.stderr

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
---
 scripts/qmp/qmp.py | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/scripts/qmp/qmp.py b/scripts/qmp/qmp.py
index 779332f..70e927e 100644
--- a/scripts/qmp/qmp.py
+++ b/scripts/qmp/qmp.py
@@ -11,6 +11,7 @@
 import json
 import errno
 import socket
+import sys
 
 class QMPError(Exception):
     pass
@@ -25,7 +26,7 @@ class QMPTimeoutError(QMPError):
     pass
 
 class QEMUMonitorProtocol:
-    def __init__(self, address, server=False):
+    def __init__(self, address, server=False, debug=False):
         """
         Create a QEMUMonitorProtocol class.
 
@@ -39,6 +40,7 @@ class QEMUMonitorProtocol:
         """
         self.__events = []
         self.__address = address
+        self._debug = debug
         self.__sock = self.__get_sock()
         if server:
             self.__sock.bind(self.__address)
@@ -68,6 +70,8 @@ class QEMUMonitorProtocol:
                 return
             resp = json.loads(data)
             if 'event' in resp:
+                if self._debug:
+                    print >>sys.stderr, "QMP:<<< %s" % resp
                 self.__events.append(resp)
                 if not only_event:
                     continue
@@ -148,13 +152,18 @@ class QEMUMonitorProtocol:
         @return QMP response as a Python dict or None if the connection has
                 been closed
         """
+        if self._debug:
+            print >>sys.stderr, "QMP:>>> %s" % qmp_cmd
         try:
             self.__sock.sendall(json.dumps(qmp_cmd))
         except socket.error as err:
             if err[0] == errno.EPIPE:
                 return
             raise socket.error(err)
-        return self.__json_read()
+        resp = self.__json_read()
+        if self._debug:
+            print >>sys.stderr, "QMP:<<< %s" % resp
+        return resp
 
     def cmd(self, name, args=None, id=None):
         """
-- 
2.5.5

* [Qemu-devel] [PATCH v1 3/6] scripts: refactor the VM class in iotests for reuse
From: Daniel P. Berrange @ 2016-05-05 14:27 UTC
  To: qemu-devel
  Cc: Juan Quintela, Amit Shah, Dr. David Alan Gilbert, Daniel P. Berrange

The iotests module has a python class for controlling QEMU
processes. Pull the generic functionality out of this file
and create a scripts/qemu.py module containing a QEMUMachine
class. Put the QTest integration support into a subclass
QEMUQtestMachine.
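
A hypothetical caller, with scripts/ on the python path, can then do
something like this (a sketch only; the binary path, arguments and QMP
command are purely illustrative):

 from qemu import QEMUMachine

 vm = QEMUMachine("/usr/bin/qemu-system-x86_64", args=["-m", "512"])
 vm.launch()
 print(vm.command("query-status"))
 vm.shutdown()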

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
---
 scripts/qemu.py               | 202 ++++++++++++++++++++++++++++++++++++++++++
 scripts/qtest.py              |  34 +++++++
 tests/qemu-iotests/iotests.py | 135 +---------------------------
 3 files changed, 240 insertions(+), 131 deletions(-)
 create mode 100644 scripts/qemu.py

diff --git a/scripts/qemu.py b/scripts/qemu.py
new file mode 100644
index 0000000..9cdad24
--- /dev/null
+++ b/scripts/qemu.py
@@ -0,0 +1,202 @@
+# QEMU library
+#
+# Copyright (C) 2015-2016 Red Hat Inc.
+# Copyright (C) 2012 IBM Corp.
+#
+# Authors:
+#  Fam Zheng <famz@redhat.com>
+#
+# This work is licensed under the terms of the GNU GPL, version 2.  See
+# the COPYING file in the top-level directory.
+#
+# Based on qmp.py.
+#
+
+import errno
+import string
+import os
+import sys
+import subprocess
+import qmp.qmp
+
+
+class QEMUMachine(object):
+    '''A QEMU VM'''
+
+    def __init__(self, binary, args=[], wrapper=[], name=None, test_dir="/var/tmp",
+                 monitor_address=None, debug=False):
+        if name is None:
+            name = "qemu-%d" % os.getpid()
+        if monitor_address is None:
+            monitor_address = os.path.join(test_dir, name + "-monitor.sock")
+        self._monitor_address = monitor_address
+        self._qemu_log_path = os.path.join(test_dir, name + ".log")
+        self._popen = None
+        self._binary = binary
+        self._args = args
+        self._wrapper = wrapper
+        self._events = []
+        self._iolog = None
+        self._debug = debug
+
+    # This can be used to add an unused monitor instance.
+    def add_monitor_telnet(self, ip, port):
+        args = 'tcp:%s:%d,server,nowait,telnet' % (ip, port)
+        self._args.append('-monitor')
+        self._args.append(args)
+
+    def add_fd(self, fd, fdset, opaque, opts=''):
+        '''Pass a file descriptor to the VM'''
+        options = ['fd=%d' % fd,
+                   'set=%d' % fdset,
+                   'opaque=%s' % opaque]
+        if opts:
+            options.append(opts)
+
+        self._args.append('-add-fd')
+        self._args.append(','.join(options))
+        return self
+
+    def send_fd_scm(self, fd_file_path):
+        # In iotest.py, the qmp should always use unix socket.
+        assert self._qmp.is_scm_available()
+        bin = socket_scm_helper
+        if os.path.exists(bin) == False:
+            print "Scm help program does not present, path '%s'." % bin
+            return -1
+        fd_param = ["%s" % bin,
+                    "%d" % self._qmp.get_sock_fd(),
+                    "%s" % fd_file_path]
+        devnull = open('/dev/null', 'rb')
+        p = subprocess.Popen(fd_param, stdin=devnull, stdout=sys.stdout,
+                             stderr=sys.stderr)
+        return p.wait()
+
+    @staticmethod
+    def _remove_if_exists(path):
+        '''Remove file object at path if it exists'''
+        try:
+            os.remove(path)
+        except OSError as exception:
+            if exception.errno == errno.ENOENT:
+                return
+            raise
+
+    def get_pid(self):
+        if not self._popen:
+            return None
+        return self._popen.pid
+
+    def _load_io_log(self):
+        with open(self._qemu_log_path, "r") as fh:
+            self._iolog = fh.read()
+
+    def _base_args(self):
+        if isinstance(self._monitor_address, tuple):
+            moncdev = "socket,id=mon,host=%s,port=%s" % (
+                self._monitor_address[0],
+                self._monitor_address[1])
+        else:
+            moncdev = 'socket,id=mon,path=%s' % self._monitor_address
+        return ['-chardev', moncdev,
+                '-mon', 'chardev=mon,mode=control',
+                '-display', 'none', '-vga', 'none']
+
+    def _pre_launch(self):
+        self._qmp = qmp.qmp.QEMUMonitorProtocol(self._monitor_address, server=True,
+                                                debug=self._debug)
+
+    def _post_launch(self):
+        self._qmp.accept()
+
+    def _post_shutdown(self):
+        if not isinstance(self._monitor_address, tuple):
+            self._remove_if_exists(self._monitor_address)
+        self._remove_if_exists(self._qemu_log_path)
+
+    def launch(self):
+        '''Launch the VM and establish a QMP connection'''
+        devnull = open('/dev/null', 'rb')
+        qemulog = open(self._qemu_log_path, 'wb')
+        try:
+            self._pre_launch()
+            args = self._wrapper + [self._binary] + self._base_args() + self._args
+            self._popen = subprocess.Popen(args, stdin=devnull, stdout=qemulog,
+                                           stderr=subprocess.STDOUT, shell=False)
+            self._post_launch()
+        except:
+            if self._popen:
+                self._popen.kill()
+            self._load_io_log()
+            self._post_shutdown()
+            self._popen = None
+            raise
+
+    def shutdown(self):
+        '''Terminate the VM and clean up'''
+        if not self._popen is None:
+            try:
+                self._qmp.cmd('quit')
+                self._qmp.close()
+            except:
+                self._popen.kill()
+
+            exitcode = self._popen.wait()
+            if exitcode < 0:
+                sys.stderr.write('qemu received signal %i: %s\n' % (-exitcode, ' '.join(self._args)))
+            self._load_io_log()
+            self._post_shutdown()
+            self._popen = None
+
+    underscore_to_dash = string.maketrans('_', '-')
+    def qmp(self, cmd, conv_keys=True, **args):
+        '''Invoke a QMP command and return the result dict'''
+        qmp_args = dict()
+        for k in args.keys():
+            if conv_keys:
+                qmp_args[k.translate(self.underscore_to_dash)] = args[k]
+            else:
+                qmp_args[k] = args[k]
+
+        return self._qmp.cmd(cmd, args=qmp_args)
+
+    def command(self, cmd, conv_keys=True, **args):
+        reply = self.qmp(cmd, conv_keys, **args)
+        if reply is None:
+            raise Exception("Monitor is closed")
+        if "error" in reply:
+            raise Exception(reply["error"]["desc"])
+        return reply["return"]
+
+    def get_qmp_event(self, wait=False):
+        '''Poll for one queued QMP events and return it'''
+        if len(self._events) > 0:
+            return self._events.pop(0)
+        return self._qmp.pull_event(wait=wait)
+
+    def get_qmp_events(self, wait=False):
+        '''Poll for queued QMP events and return a list of dicts'''
+        events = self._qmp.get_events(wait=wait)
+        events.extend(self._events)
+        del self._events[:]
+        self._qmp.clear_events()
+        return events
+
+    def event_wait(self, name, timeout=60.0, match=None):
+        # Search cached events
+        for event in self._events:
+            if (event['event'] == name) and event_match(event, match):
+                self._events.remove(event)
+                return event
+
+        # Poll for new events
+        while True:
+            event = self._qmp.pull_event(wait=timeout)
+            if (event['event'] == name) and event_match(event, match):
+                return event
+            self._events.append(event)
+
+        return None
+
+    def get_log(self):
+        return self._iolog
diff --git a/scripts/qtest.py b/scripts/qtest.py
index a971445..03bc7f6 100644
--- a/scripts/qtest.py
+++ b/scripts/qtest.py
@@ -13,6 +13,11 @@
 
 import errno
 import socket
+import string
+import os
+import subprocess
+import qmp.qmp
+import qemu
 
 class QEMUQtestProtocol(object):
     def __init__(self, address, server=False):
@@ -69,3 +74,32 @@ class QEMUQtestProtocol(object):
 
     def settimeout(self, timeout):
         self._sock.settimeout(timeout)
+
+
+class QEMUQtestMachine(qemu.QEMUMachine):
+    '''A QEMU VM'''
+
+    def __init__(self, binary, args=[], name=None, test_dir="/var/tmp"):
+        super(self, QEMUQtestMachine).__init__(binary, args, name, test_dir)
+        self._qtest_path = os.path.join(test_dir, name + "-qtest.sock")
+
+    def _base_args(self):
+        args = super(self, QEMUQtestMachine)._base_args()
+        args.extend(['-qtest', 'unix:path=' + self._qtest_path])
+        return args
+
+    def _pre_launch(self):
+        super(self, QEMUQtestMachine)._pre_launch()
+        self._qtest = QEMUQtestProtocol(self._qtest_path, server=True)
+
+    def _post_launch(self):
+        super(self, QEMUQtestMachine)._post_launch()
+        self._qtest.accept()
+
+    def _post_shutdown(self):
+        super(self, QEMUQtestMachine)._post_shutdown()
+        self._remove_if_exists(self._qtest_path)
+
+    def qtest(self, cmd):
+        '''Send a qtest command to guest'''
+        return self._qtest.cmd(cmd)
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 56f988a..67d1185 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -24,8 +24,6 @@ import string
 import unittest
 import sys
 sys.path.append(os.path.join(os.path.dirname(__file__), '..', '..', 'scripts'))
-sys.path.append(os.path.join(os.path.dirname(__file__), '..', '..', 'scripts', 'qmp'))
-import qmp
 import qtest
 import struct
 import json
@@ -41,9 +39,8 @@ qemu_io_args = [os.environ.get('QEMU_IO_PROG', 'qemu-io')]
 if os.environ.get('QEMU_IO_OPTIONS'):
     qemu_io_args += os.environ['QEMU_IO_OPTIONS'].strip().split(' ')
 
-qemu_args = [os.environ.get('QEMU_PROG', 'qemu')]
-if os.environ.get('QEMU_OPTIONS'):
-    qemu_args += os.environ['QEMU_OPTIONS'].strip().split(' ')
+qemu_prog = [os.environ.get('QEMU_PROG', 'qemu')]
+qemu_opts = os.environ.get('QEMU_OPTIONS', '').strip().split(' ')
 
 imgfmt = os.environ.get('IMGFMT', 'raw')
 imgproto = os.environ.get('IMGPROTO', 'file')
@@ -148,27 +145,12 @@ def event_match(event, match=None):
 
     return True
 
-class VM(object):
+class VM(qtest.QEMUMachine):
     '''A QEMU VM'''
 
     def __init__(self):
-        self._monitor_path = os.path.join(test_dir, 'qemu-mon.%d' % os.getpid())
-        self._qemu_log_path = os.path.join(test_dir, 'qemu-log.%d' % os.getpid())
-        self._qtest_path = os.path.join(test_dir, 'qemu-qtest.%d' % os.getpid())
-        self._args = qemu_args + ['-chardev',
-                     'socket,id=mon,path=' + self._monitor_path,
-                     '-mon', 'chardev=mon,mode=control',
-                     '-qtest', 'unix:path=' + self._qtest_path,
-                     '-machine', 'accel=qtest',
-                     '-display', 'none', '-vga', 'none']
+        super(self, VM).__init__(qemu_prog, qemu_opts, test_dir)
         self._num_drives = 0
-        self._events = []
-
-    # This can be used to add an unused monitor instance.
-    def add_monitor_telnet(self, ip, port):
-        args = 'tcp:%s:%d,server,nowait,telnet' % (ip, port)
-        self._args.append('-monitor')
-        self._args.append(args)
 
     def add_drive_raw(self, opts):
         self._args.append('-drive')
@@ -211,106 +193,6 @@ class VM(object):
         return self.qmp('human-monitor-command',
                         command_line='qemu-io %s "%s"' % (drive, cmd))
 
-    def add_fd(self, fd, fdset, opaque, opts=''):
-        '''Pass a file descriptor to the VM'''
-        options = ['fd=%d' % fd,
-                   'set=%d' % fdset,
-                   'opaque=%s' % opaque]
-        if opts:
-            options.append(opts)
-
-        self._args.append('-add-fd')
-        self._args.append(','.join(options))
-        return self
-
-    def send_fd_scm(self, fd_file_path):
-        # In iotest.py, the qmp should always use unix socket.
-        assert self._qmp.is_scm_available()
-        bin = socket_scm_helper
-        if os.path.exists(bin) == False:
-            print "Scm help program does not present, path '%s'." % bin
-            return -1
-        fd_param = ["%s" % bin,
-                    "%d" % self._qmp.get_sock_fd(),
-                    "%s" % fd_file_path]
-        devnull = open('/dev/null', 'rb')
-        p = subprocess.Popen(fd_param, stdin=devnull, stdout=sys.stdout,
-                             stderr=sys.stderr)
-        return p.wait()
-
-    def launch(self):
-        '''Launch the VM and establish a QMP connection'''
-        devnull = open('/dev/null', 'rb')
-        qemulog = open(self._qemu_log_path, 'wb')
-        try:
-            self._qmp = qmp.QEMUMonitorProtocol(self._monitor_path, server=True)
-            self._qtest = qtest.QEMUQtestProtocol(self._qtest_path, server=True)
-            self._popen = subprocess.Popen(self._args, stdin=devnull, stdout=qemulog,
-                                           stderr=subprocess.STDOUT)
-            self._qmp.accept()
-            self._qtest.accept()
-        except:
-            _remove_if_exists(self._monitor_path)
-            _remove_if_exists(self._qtest_path)
-            raise
-
-    def shutdown(self):
-        '''Terminate the VM and clean up'''
-        if not self._popen is None:
-            self._qmp.cmd('quit')
-            exitcode = self._popen.wait()
-            if exitcode < 0:
-                sys.stderr.write('qemu received signal %i: %s\n' % (-exitcode, ' '.join(self._args)))
-            os.remove(self._monitor_path)
-            os.remove(self._qtest_path)
-            os.remove(self._qemu_log_path)
-            self._popen = None
-
-    underscore_to_dash = string.maketrans('_', '-')
-    def qmp(self, cmd, conv_keys=True, **args):
-        '''Invoke a QMP command and return the result dict'''
-        qmp_args = dict()
-        for k in args.keys():
-            if conv_keys:
-                qmp_args[k.translate(self.underscore_to_dash)] = args[k]
-            else:
-                qmp_args[k] = args[k]
-
-        return self._qmp.cmd(cmd, args=qmp_args)
-
-    def qtest(self, cmd):
-        '''Send a qtest command to guest'''
-        return self._qtest.cmd(cmd)
-
-    def get_qmp_event(self, wait=False):
-        '''Poll for one queued QMP events and return it'''
-        if len(self._events) > 0:
-            return self._events.pop(0)
-        return self._qmp.pull_event(wait=wait)
-
-    def get_qmp_events(self, wait=False):
-        '''Poll for queued QMP events and return a list of dicts'''
-        events = self._qmp.get_events(wait=wait)
-        events.extend(self._events)
-        del self._events[:]
-        self._qmp.clear_events()
-        return events
-
-    def event_wait(self, name='BLOCK_JOB_COMPLETED', timeout=60.0, match=None):
-        # Search cached events
-        for event in self._events:
-            if (event['event'] == name) and event_match(event, match):
-                self._events.remove(event)
-                return event
-
-        # Poll for new events
-        while True:
-            event = self._qmp.pull_event(wait=timeout)
-            if (event['event'] == name) and event_match(event, match):
-                return event
-            self._events.append(event)
-
-        return None
 
 index_re = re.compile(r'([^\[]+)\[([^\]]+)\]')
 
@@ -427,15 +309,6 @@ class QMPTestCase(unittest.TestCase):
         event = self.wait_until_completed(drive=drive)
         self.assert_qmp(event, 'data/type', 'mirror')
 
-def _remove_if_exists(path):
-    '''Remove file object at path if it exists'''
-    try:
-        os.remove(path)
-    except OSError as exception:
-        if exception.errno == errno.ENOENT:
-           return
-        raise
-
 def notrun(reason):
     '''Skip this test suite'''
     # Each test in qemu-iotests has a number ("seq")
-- 
2.5.5

* [Qemu-devel] [PATCH v1 4/6] scripts: set timeout when waiting for qemu monitor connection
From: Daniel P. Berrange @ 2016-05-05 14:27 UTC
  To: qemu-devel
  Cc: Juan Quintela, Amit Shah, Dr. David Alan Gilbert, Daniel P. Berrange

If QEMU fails to launch for some reason, the QEMUMonitorProtocol
class accept() method will wait forever in a socket accept call.
Set a timeout of 15 seconds so that we fail more gracefully
instead of hanging the test script forever

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
---
 scripts/qmp/qmp.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/scripts/qmp/qmp.py b/scripts/qmp/qmp.py
index 70e927e..2d0d926 100644
--- a/scripts/qmp/qmp.py
+++ b/scripts/qmp/qmp.py
@@ -140,6 +140,7 @@ class QEMUMonitorProtocol:
         @raise QMPConnectError if the greeting is not received
         @raise QMPCapabilitiesError if fails to negotiate capabilities
         """
+        self.__sock.settimeout(15)
         self.__sock, _ = self.__sock.accept()
         self.__sockfile = self.__sock.makefile()
         return self.__negotiate_capabilities()
-- 
2.5.5

* [Qemu-devel] [PATCH v1 5/6] scripts: ensure monitor socket has SO_REUSEADDR set
From: Daniel P. Berrange @ 2016-05-05 14:27 UTC
  To: qemu-devel
  Cc: Juan Quintela, Amit Shah, Dr. David Alan Gilbert, Daniel P. Berrange

If tests use a TCP based monitor socket, the connection will
go into a TIME_WAIT state when the test exits. This can
prevent the test from being re-run until a certain time
period has passed. Set the SO_REUSEADDR flag on the socket to
ensure we can immediately re-run the tests.
---
 scripts/qmp/qmp.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/scripts/qmp/qmp.py b/scripts/qmp/qmp.py
index 2d0d926..62d3651 100644
--- a/scripts/qmp/qmp.py
+++ b/scripts/qmp/qmp.py
@@ -43,6 +43,7 @@ class QEMUMonitorProtocol:
         self._debug = debug
         self.__sock = self.__get_sock()
         if server:
+            self.__sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
             self.__sock.bind(self.__address)
             self.__sock.listen(1)
 
-- 
2.5.5

* [Qemu-devel] [PATCH v1 6/6] tests: introduce a framework for testing migration performance
From: Daniel P. Berrange @ 2016-05-05 14:28 UTC
  To: qemu-devel
  Cc: Juan Quintela, Amit Shah, Dr. David Alan Gilbert, Daniel P. Berrange

This introduces a moderately general purpose framework for
testing performance of migration.

The initial guest workload is provided by the included 'stress'
program, which is configured to spawn one thread per guest CPU
and run a maximally memory intensive workload. It will loop
over the guest's memory, xor'ing each byte with data from a 4k
array of random bytes. This ensures heavy read and write load
across all of guest memory to stress the migration performance.
While running, the 'stress' program will record how long it
takes to xor each GB of memory and print this data for later
reporting.

The test engine will spawn a pair of QEMU processes, either on
the same host, or with the target on a remote host via ssh,
using the host kernel and a custom initrd built with 'stress'
as the /init binary. Kernel command line args are set to ensure
a fast kernel boot time (< 1 second) between launching QEMU and
the stress program starting execution.

Nonetheless, the test engine will initially wait N seconds for
the guest workload to stabilize, before starting the migration
operation. When migration is running, the engine will use the
pause, post-copy, auto-converge, xbzrle compression and
multithread compression features, as well as downtime & bandwidth
tuning, to encourage completion. If migration completes, the test
engine will wait N seconds again for the guest workload to
stabilize on the target host. If migration does not complete
after a preset number of iterations, it will be aborted.

While the QEMU process is running on the source host, the test
engine will sample the host CPU usage of QEMU as a whole, and of
each vCPU thread. While migration is running, it will record
all the stats reported by 'query-migrate'. Finally, it will
capture the output of the stress program running in the guest.

All the data produced from a single test execution is recorded
in a structured JSON file. A separate program is then able to
create interactive charts using the "plotly" python + javascript
libraries, showing the characteristics of the migration.

The data output provides visualization of the effect on guest
vCPU workloads from the migration process, the corresponding
vCPU utilization on the host, and the overall CPU hit from
QEMU on the host. This is correlated with statistics from the
migration process, such as downtime, vCPU throttling and iteration
number.

While the tests can be run individually with arbitrary parameters,
there is also a facility for producing batch reports for a number
of pre-defined scenarios / comparisons, in order to be able to
get standardized results across different hardware configurations
(e.g. TCP vs RDMA, or comparing different VCPU counts / memory
sizes, etc).

To use this, first you must build the initrd image

 $ make tests/migration/initrd-stress.img

To run a one-shot test with all default parameters

 $ ./tests/migration/guestperf.py > result.json

This has many command line args for varying its behaviour.
For example, to increase the RAM size and CPU count and
bind it to specific host NUMA nodes

 $ ./tests/migration/guestperf.py \
       --mem 4 --cpus 2 \
       --src-mem-bind 0 --src-cpu-bind 0,1 \
       --dst-mem-bind 1 --dst-cpu-bind 2,3 \
       > result.json

Using mem + cpu binding is strongly recommended on NUMA
machines, otherwise the guest performance results will
vary wildly between runs of the test due to lucky/unlucky
NUMA placement, making sensible data analysis impossible.

To make it run across separate hosts:

 $ ./tests/migration/guestperf.py \
       --dst-host somehostname > result.json

To request that post-copy is enabled, with switchover
after 5 iterations

 $ ./tests/migration/guestperf.py \
       --post-copy --post-copy-iters 5 > result.json

Once a result.json file is created, a graph of the data
can be generated, showing guest workload performance per
thread and the migration iteration points:

 $ ./tests/migration/guestperf-plot.py --output result.html \
        --migration-iters --split-guest-cpu result.json

To further include host vCPU utilization and overall QEMU
utilization

 $ ./tests/migration/guestperf-plot.py --output result.html \
        --migration-iters --split-guest-cpu \
        --qemu-cpu --vcpu-cpu result.json

NB, the 'guestperf-plot.py' command requires that you have
the plotly python library installed, e.g. you must do

 $ pip install --user  plotly

Viewing the result.html file requires that you have the
plotly.min.js file in the same directory as the HTML
output. This js file is installed as part of the plotly
python library, so can be found in

  $HOME/.local/lib/python2.7/site-packages/plotly/offline/plotly.min.js

The guestperf-plot.py program can accept multiple json files
to plot, enabling results from different configurations to
be compared.

Finally, to run the entire standardized set of comparisons

  $ ./tests/migration/guestperf-batch.py \
       --dst-host somehost \
       --mem 4 --cpus 2 \
       --src-mem-bind 0 --src-cpu-bind 0,1 \
       --dst-mem-bind 1 --dst-cpu-bind 2,3 \
       --output tcp-somehost-4gb-2cpu

will store JSON files from all scenarios in the directory
named tcp-somehost-4gb-2cpu

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
---
 configure                               |   2 +
 tests/Makefile                          |  12 +
 tests/migration/.gitignore              |   2 +
 tests/migration/guestperf-batch.py      |  26 ++
 tests/migration/guestperf-plot.py       |  26 ++
 tests/migration/guestperf.py            |  27 ++
 tests/migration/guestperf/__init__.py   |   0
 tests/migration/guestperf/comparison.py | 124 +++++++
 tests/migration/guestperf/engine.py     | 439 ++++++++++++++++++++++
 tests/migration/guestperf/hardware.py   |  62 ++++
 tests/migration/guestperf/plot.py       | 623 ++++++++++++++++++++++++++++++++
 tests/migration/guestperf/progress.py   | 117 ++++++
 tests/migration/guestperf/report.py     |  98 +++++
 tests/migration/guestperf/scenario.py   |  95 +++++
 tests/migration/guestperf/shell.py      | 255 +++++++++++++
 tests/migration/guestperf/timings.py    |  55 +++
 tests/migration/stress.c                | 367 +++++++++++++++++++
 17 files changed, 2330 insertions(+)
 create mode 100644 tests/migration/.gitignore
 create mode 100755 tests/migration/guestperf-batch.py
 create mode 100755 tests/migration/guestperf-plot.py
 create mode 100755 tests/migration/guestperf.py
 create mode 100644 tests/migration/guestperf/__init__.py
 create mode 100644 tests/migration/guestperf/comparison.py
 create mode 100644 tests/migration/guestperf/engine.py
 create mode 100644 tests/migration/guestperf/hardware.py
 create mode 100644 tests/migration/guestperf/plot.py
 create mode 100644 tests/migration/guestperf/progress.py
 create mode 100644 tests/migration/guestperf/report.py
 create mode 100644 tests/migration/guestperf/scenario.py
 create mode 100644 tests/migration/guestperf/shell.py
 create mode 100644 tests/migration/guestperf/timings.py
 create mode 100644 tests/migration/stress.c

diff --git a/configure b/configure
index c37fc5f..43403d2 100755
--- a/configure
+++ b/configure
@@ -3091,6 +3091,7 @@ else
       if test "$found" = "no"; then
         LIBS="$pthread_lib $LIBS"
       fi
+      PTHREAD_LIB="$pthread_lib"
       break
     fi
   done
@@ -5515,6 +5516,7 @@ echo "LDFLAGS=$LDFLAGS" >> $config_host_mak
 echo "LDFLAGS_NOPIE=$LDFLAGS_NOPIE" >> $config_host_mak
 echo "LIBS+=$LIBS" >> $config_host_mak
 echo "LIBS_TOOLS+=$libs_tools" >> $config_host_mak
+echo "PTHREAD_LIB=$PTHREAD_LIB" >> $config_host_mak
 echo "EXESUF=$EXESUF" >> $config_host_mak
 echo "DSOSUF=$DSOSUF" >> $config_host_mak
 echo "LDFLAGS_SHARED=$LDFLAGS_SHARED" >> $config_host_mak
diff --git a/tests/Makefile b/tests/Makefile
index 9194f18..50b9d43 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -589,6 +589,18 @@ tests/test-filter-redirector$(EXESUF): tests/test-filter-redirector.o $(qtest-ob
 tests/ivshmem-test$(EXESUF): tests/ivshmem-test.o contrib/ivshmem-server/ivshmem-server.o $(libqos-pc-obj-y)
 tests/vhost-user-bridge$(EXESUF): tests/vhost-user-bridge.o
 
+tests/migration/stress$(EXESUF): tests/migration/stress.o
+	$(call quiet-command, $(LINKPROG) -static -O3 $(PTHREAD_LIB) -o $@ $< ,"  LINK  $(TARGET_DIR)$@")
+
+INITRD_WORK_DIR=tests/migration/initrd
+
+tests/migration/initrd-stress.img: tests/migration/stress$(EXESUF)
+	mkdir -p $(INITRD_WORK_DIR)
+	cp $< $(INITRD_WORK_DIR)/init
+	(cd $(INITRD_WORK_DIR) && (find | cpio --quiet -o -H newc | gzip -9)) > $@
+	rm $(INITRD_WORK_DIR)/init
+	rmdir $(INITRD_WORK_DIR)
+
 ifeq ($(CONFIG_POSIX),y)
 LIBS += -lutil
 endif
diff --git a/tests/migration/.gitignore b/tests/migration/.gitignore
new file mode 100644
index 0000000..84f3755
--- /dev/null
+++ b/tests/migration/.gitignore
@@ -0,0 +1,2 @@
+initrd-stress.img
+stress
diff --git a/tests/migration/guestperf-batch.py b/tests/migration/guestperf-batch.py
new file mode 100755
index 0000000..cb150ce
--- /dev/null
+++ b/tests/migration/guestperf-batch.py
@@ -0,0 +1,26 @@
+#!/usr/bin/python
+#
+# Migration test batch comparison invocation
+#
+# Copyright (c) 2016 Red Hat, Inc.
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+#
+
+import sys
+
+from guestperf.shell import BatchShell
+
+shell = BatchShell()
+sys.exit(shell.run(sys.argv[1:]))
diff --git a/tests/migration/guestperf-plot.py b/tests/migration/guestperf-plot.py
new file mode 100755
index 0000000..d70bb7a
--- /dev/null
+++ b/tests/migration/guestperf-plot.py
@@ -0,0 +1,26 @@
+#!/usr/bin/python
+#
+# Migration test graph plotting command
+#
+# Copyright (c) 2016 Red Hat, Inc.
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+#
+
+import sys
+
+from guestperf.shell import PlotShell
+
+shell = PlotShell()
+sys.exit(shell.run(sys.argv[1:]))
diff --git a/tests/migration/guestperf.py b/tests/migration/guestperf.py
new file mode 100755
index 0000000..99b027e
--- /dev/null
+++ b/tests/migration/guestperf.py
@@ -0,0 +1,27 @@
+#!/usr/bin/python
+#
+# Migration test direct invocation command
+#
+# Copyright (c) 2016 Red Hat, Inc.
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+#
+
+
+import sys
+
+from guestperf.shell import Shell
+
+shell = Shell()
+sys.exit(shell.run(sys.argv[1:]))
diff --git a/tests/migration/guestperf/__init__.py b/tests/migration/guestperf/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/tests/migration/guestperf/comparison.py b/tests/migration/guestperf/comparison.py
new file mode 100644
index 0000000..d0b7df9
--- /dev/null
+++ b/tests/migration/guestperf/comparison.py
@@ -0,0 +1,124 @@
+#
+# Migration test scenario comparison mapping
+#
+# Copyright (c) 2016 Red Hat, Inc.
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+#
+
+from guestperf.scenario import Scenario
+
+class Comparison(object):
+    def __init__(self, name, scenarios):
+        self._name = name
+        self._scenarios = scenarios
+
+COMPARISONS = [
+    # Looking at effect of pausing guest during migration
+    # at various stages of iteration over RAM
+    Comparison("pause-iters", scenarios = [
+        Scenario("pause-iters-0",
+                 pause=True, pause_iters=0),
+        Scenario("pause-iters-1",
+                 pause=True, pause_iters=1),
+        Scenario("pause-iters-5",
+                 pause=True, pause_iters=5),
+        Scenario("pause-iters-20",
+                 pause=True, pause_iters=20),
+    ]),
+
+
+    # Looking at use of post-copy in relation to bandwidth
+    # available for migration
+    Comparison("post-copy-bandwidth", scenarios = [
+        Scenario("post-copy-bw-100mbs",
+                 post_copy=True, bandwidth=12),
+        Scenario("post-copy-bw-300mbs",
+                 post_copy=True, bandwidth=37),
+        Scenario("post-copy-bw-1gbs",
+                 post_copy=True, bandwidth=125),
+        Scenario("post-copy-bw-10gbs",
+                 post_copy=True, bandwidth=1250),
+        Scenario("post-copy-bw-100gbs",
+                 post_copy=True, bandwidth=12500),
+    ]),
+
+
+    # Looking at effect of starting post-copy at different
+    # stages of the migration
+    Comparison("post-copy-iters", scenarios = [
+        Scenario("post-copy-iters-0",
+                 post_copy=True, post_copy_iters=0),
+        Scenario("post-copy-iters-1",
+                 post_copy=True, post_copy_iters=1),
+        Scenario("post-copy-iters-5",
+                 post_copy=True, post_copy_iters=5),
+        Scenario("post-copy-iters-20",
+                 post_copy=True, post_copy_iters=20),
+    ]),
+
+
+    # Looking at effect of auto-converge with different
+    # throttling percentage step rates
+    Comparison("auto-converge-iters", scenarios = [
+        Scenario("auto-converge-step-5",
+                 auto_converge=True, auto_converge_step=5),
+        Scenario("auto-converge-step-10",
+                 auto_converge=True, auto_converge_step=10),
+        Scenario("auto-converge-step-20",
+                 auto_converge=True, auto_converge_step=20),
+    ]),
+
+
+    # Looking at use of auto-converge in relation to bandwidth
+    # available for migration
+    Comparison("auto-converge-bandwidth", scenarios = [
+        Scenario("auto-converge-bw-100mbs",
+                 auto_converge=True, bandwidth=12),
+        Scenario("auto-converge-bw-300mbs",
+                 auto_converge=True, bandwidth=37),
+        Scenario("auto-converge-bw-1gbs",
+                 auto_converge=True, bandwidth=125),
+        Scenario("auto-converge-bw-10gbs",
+                 auto_converge=True, bandwidth=1250),
+        Scenario("auto-converge-bw-100gbs",
+                 auto_converge=True, bandwidth=12500),
+    ]),
+
+
+    # Looking at effect of multi-thread compression with
+    # varying numbers of threads
+    Comparison("compr-mt", scenarios = [
+        Scenario("compr-mt-threads-1",
+                 compression_mt=True, compression_mt_threads=1),
+        Scenario("compr-mt-threads-2",
+                 compression_mt=True, compression_mt_threads=2),
+        Scenario("compr-mt-threads-4",
+                 compression_mt=True, compression_mt_threads=4),
+    ]),
+
+
+    # Looking at effect of xbzrle compression with varying
+    # cache sizes
+    Comparison("compr-xbzrle", scenarios = [
+        Scenario("compr-xbzrle-cache-5",
+                 compression_xbzrle=True, compression_xbzrle_cache=5),
+        Scenario("compr-xbzrle-cache-10",
+                 compression_xbzrle=True, compression_xbzrle_cache=10),
+        Scenario("compr-xbzrle-cache-20",
+                 compression_xbzrle=True, compression_xbzrle_cache=20),
+        Scenario("compr-xbzrle-cache-50",
+                 compression_xbzrle=True, compression_xbzrle_cache=50),
+    ]),
+]
diff --git a/tests/migration/guestperf/engine.py b/tests/migration/guestperf/engine.py
new file mode 100644
index 0000000..5cf669a
--- /dev/null
+++ b/tests/migration/guestperf/engine.py
@@ -0,0 +1,439 @@
+#
+# Migration test main engine
+#
+# Copyright (c) 2016 Red Hat, Inc.
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+#
+
+
+import os
+import re
+import sys
+import time
+
+sys.path.append(os.path.join(os.path.dirname(__file__), '..', '..', '..', 'scripts'))
+import qemu
+import qmp.qmp
+from guestperf.progress import Progress, ProgressStats
+from guestperf.report import Report
+from guestperf.timings import TimingRecord, Timings
+
+
+class Engine(object):
+
+    def __init__(self, binary, dst_host, kernel, initrd, transport="tcp",
+                 sleep=15, verbose=False, debug=False):
+
+        self._binary = binary # Path to QEMU binary
+        self._dst_host = dst_host # Hostname of target host
+        self._kernel = kernel # Path to kernel image
+        self._initrd = initrd # Path to stress initrd
+        self._transport = transport # 'unix' or 'tcp' or 'rdma'
+        self._sleep = sleep
+        self._verbose = verbose
+        self._debug = debug
+
+        if debug:
+            self._verbose = debug
+
+    def _vcpu_timing(self, pid, tid_list):
+        records = []
+        now = time.time()
+
+        jiffies_per_sec = os.sysconf(os.sysconf_names['SC_CLK_TCK'])
+        for tid in tid_list:
+            statfile = "/proc/%d/task/%d/stat" % (pid, tid)
+            with open(statfile, "r") as fh:
+                stat = fh.readline()
+                fields = stat.split(" ")
+                stime = int(fields[13])
+                utime = int(fields[14])
+                records.append(TimingRecord(tid, now, 1000 * (stime + utime) / jiffies_per_sec))
+        return records
+
+    def _cpu_timing(self, pid):
+        records = []
+        now = time.time()
+
+        jiffies_per_sec = os.sysconf(os.sysconf_names['SC_CLK_TCK'])
+        statfile = "/proc/%d/stat" % pid
+        with open(statfile, "r") as fh:
+            stat = fh.readline()
+            fields = stat.split(" ")
+            stime = int(fields[13])
+            utime = int(fields[14])
+            return TimingRecord(pid, now, 1000 * (stime + utime) / jiffies_per_sec)
+
+    def _migrate_progress(self, vm):
+        info = vm.command("query-migrate")
+
+        if "ram" not in info:
+            info["ram"] = {}
+
+        return Progress(
+            info.get("status", "active"),
+            ProgressStats(
+                info["ram"].get("transferred", 0),
+                info["ram"].get("remaining", 0),
+                info["ram"].get("total", 0),
+                info["ram"].get("duplicate", 0),
+                info["ram"].get("skipped", 0),
+                info["ram"].get("normal", 0),
+                info["ram"].get("normal-bytes", 0),
+                info["ram"].get("dirty-pages-rate", 0),
+                info["ram"].get("mbps", 0),
+                info["ram"].get("dirty-sync-count", 0)
+            ),
+            time.time(),
+            info.get("total-time", 0),
+            info.get("downtime", 0),
+            info.get("expected-downtime", 0),
+            info.get("setup-time", 0),
+            info.get("x-cpu-throttle-percentage", 0),
+        )
+
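+    # Drive a single migration run: apply the scenario's tunables, start the
+    # migration, then poll "query-migrate" every 50ms (sampling host CPU
+    # usage roughly once per second) until it completes, fails or is cancelled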
+    def _migrate(self, hardware, scenario, src, dst, connect_uri):
+        src_qemu_time = []
+        src_vcpu_time = []
+        src_pid = src.get_pid()
+
+        vcpus = src.command("query-cpus")
+        src_threads = []
+        for vcpu in vcpus:
+            src_threads.append(vcpu["thread_id"])
+
+        # XXX how to get dst timings on remote host ?
+
+        if self._verbose:
+            print "Sleeping %d seconds for initial guest workload run" % self._sleep
+        sleep_secs = self._sleep
+        while sleep_secs > 1:
+            src_qemu_time.append(self._cpu_timing(src_pid))
+            src_vcpu_time.extend(self._vcpu_timing(src_pid, src_threads))
+            time.sleep(1)
+            sleep_secs -= 1
+
+        if self._verbose:
+            print "Starting migration"
+        if scenario._auto_converge:
+            resp = src.command("migrate-set-capabilities",
+                               capabilities = [
+                                   { "capability": "auto-converge",
+                                     "state": True }
+                               ])
+            resp = src.command("migrate-set-parameters",
+                               x_cpu_throttle_increment=scenario._auto_converge_step)
+
+        if scenario._post_copy:
+            resp = src.command("migrate-set-capabilities",
+                               capabilities = [
+                                   { "capability": "postcopy-ram",
+                                     "state": True }
+                               ])
+            resp = dst.command("migrate-set-capabilities",
+                               capabilities = [
+                                   { "capability": "postcopy-ram",
+                                     "state": True }
+                               ])
+
+        resp = src.command("migrate_set_speed",
+                           value=scenario._bandwidth * 1024 * 1024)
+
+        resp = src.command("migrate_set_downtime",
+                           value=scenario._downtime / 1024.0)
+
+        if scenario._compression_mt:
+            resp = src.command("migrate-set-capabilities",
+                               capabilities = [
+                                   { "capability": "compress",
+                                     "state": True }
+                               ])
+            resp = src.command("migrate-set-parameters",
+                               compress_threads=scenario._compression_mt_threads)
+            resp = dst.command("migrate-set-capabilities",
+                               capabilities = [
+                                   { "capability": "compress",
+                                     "state": True }
+                               ])
+            resp = dst.command("migrate-set-parameters",
+                               decompress_threads=scenario._compression_mt_threads)
+
+        if scenario._compression_xbzrle:
+            resp = src.command("migrate-set-capabilities",
+                               capabilities = [
+                                   { "capability": "xbzrle",
+                                     "state": True }
+                               ])
+            resp = dst.command("migrate-set-capabilities",
+                               capabilities = [
+                                   { "capability": "xbzrle",
+                                     "state": True }
+                               ])
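+            # The xbzrle cache size is expressed as a percentage of guest
+            # RAM; convert it to bytes for migrate-set-cache-size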
+            resp = src.command("migrate-set-cache-size",
+                               value=(hardware._mem * 1024 * 1024 * 1024 / 100 *
+                                      scenario._compression_xbzrle_cache))
+
+        resp = src.command("migrate", uri=connect_uri)
+
+        post_copy = False
+        paused = False
+
+        progress_history = []
+
+        start = time.time()
+        loop = 0
+        while True:
+            loop = loop + 1
+            time.sleep(0.05)
+
+            progress = self._migrate_progress(src)
+            if (loop % 20) == 0:
+                src_qemu_time.append(self._cpu_timing(src_pid))
+                src_vcpu_time.extend(self._vcpu_timing(src_pid, src_threads))
+
+            if (len(progress_history) == 0 or
+                (progress_history[-1]._ram._iterations <
+                 progress._ram._iterations)):
+                progress_history.append(progress)
+
+            if progress._status in ("completed", "failed", "cancelled"):
+                if paused:
+                    dst.command("cont")
+                if progress_history[-1] != progress:
+                    progress_history.append(progress)
+
+                if progress._status == "completed":
+                    if self._verbose:
+                        print "Sleeping %d seconds for final guest workload run" % self._sleep
+                    sleep_secs = self._sleep
+                    while sleep_secs > 1:
+                        time.sleep(1)
+                        src_qemu_time.append(self._cpu_timing(src_pid))
+                        src_vcpu_time.extend(self._vcpu_timing(src_pid, src_threads))
+                        sleep_secs -= 1
+
+                return [progress_history, src_qemu_time, src_vcpu_time]
+
+            if self._verbose and (loop % 20) == 0:
+                print "Iter %d: remain %5dMB of %5dMB (total %5dMB @ %5dMb/sec)" % (
+                    progress._ram._iterations,
+                    progress._ram._remaining_bytes / (1024 * 1024),
+                    progress._ram._total_bytes / (1024 * 1024),
+                    progress._ram._transferred_bytes / (1024 * 1024),
+                    progress._ram._transfer_rate_mbs,
+                )
+
+            if progress._ram._iterations > scenario._max_iters:
+                if self._verbose:
+                    print "No completion after %d iterations over RAM" % scenario._max_iters
+                src.command("migrate_cancel")
+                continue
+
+            if time.time() > (start + scenario._max_time):
+                if self._verbose:
+                    print "No completion after %d seconds" % scenario._max_time
+                src.command("migrate_cancel")
+                continue
+
+            if (scenario._post_copy and
+                progress._ram._iterations >= scenario._post_copy_iters and
+                not post_copy):
+                if self._verbose:
+                    print "Switching to post-copy after %d iterations" % scenario._post_copy_iters
+                resp = src.command("migrate-start-postcopy")
+                post_copy = True
+
+            if (scenario._pause and
+                progress._ram._iterations >= scenario._pause_iters and
+                not paused):
+                if self._verbose:
+                    print "Pausing VM after %d iterations" % scenario._pause_iters
+                resp = src.command("stop")
+                paused = True
+
+    def _get_common_args(self, hardware, tunnelled=False):
+        args = [
+            "noapic",
+            "edd=off",
+            "printk.time=1",
+            "noreplace-smp",
+            "cgroup_disable=memory",
+            "pci=noearly",
+            "console=ttyS0",
+        ]
+        if self._debug:
+            args.append("debug")
+        else:
+            args.append("quiet")
+
+        args.append("ramsize=%s" % hardware._mem)
+
+        cmdline = " ".join(args)
+        if tunnelled:
+            cmdline = "'" + cmdline + "'"
+
+        argv = [
+            "-machine", "accel=kvm",
+            "-cpu", "host",
+            "-kernel", self._kernel,
+            "-initrd", self._initrd,
+            "-append", cmdline,
+            "-chardev", "stdio,id=cdev0",
+            "-device", "isa-serial,chardev=cdev0",
+            "-m", str((hardware._mem * 1024) + 512),
+            "-smp", str(hardware._cpus),
+        ]
+
+        if self._debug:
+            argv.extend(["-device", "sga"])
+
+        if hardware._prealloc_pages:
+            argv += ["-mem-path", "/dev/shm",
+                     "-mem-prealloc"]
+        if hardware._locked_pages:
+            argv += ["-realtime", "mlock=on"]
+        if hardware._huge_pages:
+            pass
+
+        return argv
+
+    def _get_src_args(self, hardware):
+        return self._get_common_args(hardware)
+
+    def _get_dst_args(self, hardware, uri):
+        tunnelled = False
+        if self._dst_host != "localhost":
+            tunnelled = True
+        argv = self._get_common_args(hardware, tunnelled)
+        return argv + ["-incoming", uri]
+
+    @staticmethod
+    def _get_common_wrapper(cpu_bind, mem_bind):
+        wrapper = []
+        if len(cpu_bind) > 0 or len(mem_bind) > 0:
+            wrapper.append("numactl")
+            if cpu_bind:
+                wrapper.append("--physcpubind=%s" % ",".join(cpu_bind))
+            if mem_bind:
+                wrapper.append("--membind=%s" % ",".join(mem_bind))
+
+        return wrapper
+
+    def _get_src_wrapper(self, hardware):
+        return self._get_common_wrapper(hardware._src_cpu_bind, hardware._src_mem_bind)
+
+    def _get_dst_wrapper(self, hardware):
+        wrapper = self._get_common_wrapper(hardware._dst_cpu_bind, hardware._dst_mem_bind)
+        if self._dst_host != "localhost":
+            return ["ssh",
+                    "-R", "9001:localhost:9001",
+                    self._dst_host] + wrapper
+        else:
+            return wrapper
+
+    def _get_timings(self, vm):
+        log = vm.get_log()
+        if not log:
+            return []
+        if self._debug:
+            print log
+
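+        # The stress workload logs lines of the form
+        #   "<name> (<tid>): INFO: <timestamp>ms copied <N> GB in <interval>ms"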
+        regex = r"[^\s]+\s\((\d+)\):\sINFO:\s(\d+)ms\scopied\s\d+\sGB\sin\s(\d+)ms"
+        matcher = re.compile(regex)
+        records = []
+        for line in log.split("\n"):
+            match = matcher.match(line)
+            if match:
+                records.append(TimingRecord(int(match.group(1)),
+                                            int(match.group(2)) / 1000.0,
+                                            int(match.group(3))))
+        return records
+
+    def run(self, hardware, scenario, result_dir=os.getcwd()):
+        abs_result_dir = os.path.join(result_dir, scenario._name)
+
+        if self._transport == "tcp":
+            uri = "tcp:%s:9000" % self._dst_host
+        elif self._transport == "rdma":
+            uri = "rdma:%s:9000" % self._dst_host
+        elif self._transport == "unix":
+            if self._dst_host != "localhost":
+                raise Exception("Running use unix migration transport for non-local host")
+            uri = "unix:/var/tmp/qemu-migrate-%d.migrate" % os.getpid()
+            try:
+                os.remove(uri[5:])
+                os.remove(monaddr)
+            except:
+                pass
+
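+        # With a remote target the destination monitor connection is expected
+        # to come back over the "ssh -R 9001" reverse forward added by
+        # _get_dst_wrapper(); for a local target plain UNIX sockets are used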
+        if self._dst_host != "localhost":
+            dstmonaddr = ("localhost", 9001)
+        else:
+            dstmonaddr = "/var/tmp/qemu-dst-%d-monitor.sock" % os.getpid()
+        srcmonaddr = "/var/tmp/qemu-src-%d-monitor.sock" % os.getpid()
+
+        src = qemu.QEMUMachine(self._binary,
+                               args=self._get_src_args(hardware),
+                               wrapper=self._get_src_wrapper(hardware),
+                               name="qemu-src-%d" % os.getpid(),
+                               monitor_address=srcmonaddr,
+                               debug=self._debug)
+
+        dst = qemu.QEMUMachine(self._binary,
+                               args=self._get_dst_args(hardware, uri),
+                               wrapper=self._get_dst_wrapper(hardware),
+                               name="qemu-dst-%d" % os.getpid(),
+                               monitor_address=dstmonaddr,
+                               debug=self._debug)
+
+        try:
+            src.launch()
+            dst.launch()
+
+            ret = self._migrate(hardware, scenario, src, dst, uri)
+            progress_history = ret[0]
+            qemu_timings = ret[1]
+            vcpu_timings = ret[2]
+            if uri[0:5] == "unix:":
+                os.remove(uri[5:])
+            if self._verbose:
+                print "Finished migration"
+
+            src.shutdown()
+            dst.shutdown()
+
+            return Report(hardware, scenario, progress_history,
+                          Timings(self._get_timings(src) + self._get_timings(dst)),
+                          Timings(qemu_timings),
+                          Timings(vcpu_timings),
+                          self._binary, self._dst_host, self._kernel,
+                          self._initrd, self._transport, self._sleep)
+        except Exception as e:
+            if self._debug:
+                print "Failed: %s" % str(e)
+            try:
+                src.shutdown()
+            except:
+                pass
+            try:
+                dst.shutdown()
+            except:
+                pass
+
+            if self._debug:
+                print src.get_log()
+                print dst.get_log()
+            raise
+
diff --git a/tests/migration/guestperf/hardware.py b/tests/migration/guestperf/hardware.py
new file mode 100644
index 0000000..0dcf477
--- /dev/null
+++ b/tests/migration/guestperf/hardware.py
@@ -0,0 +1,62 @@
+#
+# Migration test hardware configuration description
+#
+# Copyright (c) 2016 Red Hat, Inc.
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+#
+
+
+class Hardware(object):
+    def __init__(self, cpus=1, mem=1,
+                 src_cpu_bind=None, src_mem_bind=None,
+                 dst_cpu_bind=None, dst_mem_bind=None,
+                 prealloc_pages = False,
+                 huge_pages=False, locked_pages=False):
+        self._cpus = cpus
+        self._mem = mem # GiB
+        self._src_mem_bind = src_mem_bind # List of NUMA nodes
+        self._src_cpu_bind = src_cpu_bind # List of pCPUs
+        self._dst_mem_bind = dst_mem_bind # List of NUMA nodes
+        self._dst_cpu_bind = dst_cpu_bind # List of pCPUs
+        self._prealloc_pages = prealloc_pages
+        self._huge_pages = huge_pages
+        self._locked_pages = locked_pages
+
+
+    def serialize(self):
+        return {
+            "cpus": self._cpus,
+            "mem": self._cpus,
+            "src_mem_bind": self._src_mem_bind,
+            "dst_mem_bind": self._dst_mem_bind,
+            "src_cpu_bind": self._src_cpu_bind,
+            "dst_cpu_bind": self._dst_cpu_bind,
+            "prealloc_pages": self._prealloc_pages,
+            "huge_pages": self._huge_pages,
+            "locked_pages": self._locked_pages,
+        }
+
+    @classmethod
+    def deserialize(cls, data):
+        return cls(
+            data["cpus"],
+            data["mem"],
+            data["src_cpu_bind"],
+            data["src_mem_bind"],
+            data["dst_cpu_bind"],
+            data["dst_mem_bind"],
+            data["prealloc_pages"],
+            data["huge_pages"],
+            data["locked_pages"])
diff --git a/tests/migration/guestperf/plot.py b/tests/migration/guestperf/plot.py
new file mode 100644
index 0000000..bc42249
--- /dev/null
+++ b/tests/migration/guestperf/plot.py
@@ -0,0 +1,623 @@
+#
+# Migration test graph plotting
+#
+# Copyright (c) 2016 Red Hat, Inc.
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+#
+
+import sys
+
+
+class Plot(object):
+
+    # Generated using
+    # http://tools.medialab.sciences-po.fr/iwanthue/
+    COLORS = ["#CD54D0",
+              "#79D94C",
+              "#7470CD",
+              "#D2D251",
+              "#863D79",
+              "#76DDA6",
+              "#D4467B",
+              "#61923D",
+              "#CB9CCA",
+              "#D98F36",
+              "#8CC8DA",
+              "#CE4831",
+              "#5E7693",
+              "#9B803F",
+              "#412F4C",
+              "#CECBA6",
+              "#6D3229",
+              "#598B73",
+              "#C8827C",
+              "#394427"]
+
+    def __init__(self,
+                 reports,
+                 migration_iters,
+                 total_guest_cpu,
+                 split_guest_cpu,
+                 qemu_cpu,
+                 vcpu_cpu):
+
+        self._reports = reports
+        self._migration_iters = migration_iters
+        self._total_guest_cpu = total_guest_cpu
+        self._split_guest_cpu = split_guest_cpu
+        self._qemu_cpu = qemu_cpu
+        self._vcpu_cpu = vcpu_cpu
+        self._color_idx = 0
+
+    def _next_color(self):
+        color = self.COLORS[self._color_idx]
+        self._color_idx += 1
+        if self._color_idx >= len(self.COLORS):
+            self._color_idx = 0
+        return color
+
+    def _get_progress_label(self, progress):
+        if progress:
+            return "\n\n" + "\n".join(
+                ["Status: %s" % progress._status,
+                 "Iteration: %d" % progress._ram._iterations,
+                 "Throttle: %02d%%" % progress._throttle_pcent,
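+                 # dirty-pages-rate counts pages/sec; x4/1024 gives MB/s
+                 # assuming 4KB pages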
+                 "Dirty rate: %dMB/s" % (progress._ram._dirty_rate_pps * 4 / 1024.0)])
+        else:
+            return "\n\n" + "\n".join(
+                ["Status: %s" % "none",
+                 "Iteration: %d" % 0])
+
+    def _find_start_time(self, report):
+        startqemu = report._qemu_timings._records[0]._timestamp
+        startguest = report._guest_timings._records[0]._timestamp
+        if startqemu < startguest:
+            return startqemu
+        else:
+            return startguest
+
+    def _get_guest_max_value(self, report):
+        maxvalue = 0
+        for record in report._guest_timings._records:
+            if record._value > maxvalue:
+                maxvalue = record._value
+        return maxvalue
+
+    def _get_qemu_max_value(self, report):
+        maxvalue = 0
+        oldvalue = None
+        oldtime = None
+        for record in report._qemu_timings._records:
+            if oldvalue is not None:
+                cpudelta = (record._value - oldvalue) / 1000.0
+                timedelta = record._timestamp - oldtime
+                if timedelta == 0:
+                    continue
+                util = cpudelta / timedelta * 100.0
+            else:
+                util = 0
+            oldvalue = record._value
+            oldtime = record._timestamp
+
+            if util > maxvalue:
+                maxvalue = util
+        return maxvalue
+
+    def _get_total_guest_cpu_graph(self, report, starttime):
+        xaxis = []
+        yaxis = []
+        labels = []
+        progress_idx = -1
+        for record in report._guest_timings._records:
+            while ((progress_idx + 1) < len(report._progress_history) and
+                   report._progress_history[progress_idx + 1]._now < record._timestamp):
+                progress_idx = progress_idx + 1
+
+            if progress_idx >= 0:
+                progress = report._progress_history[progress_idx]
+            else:
+                progress = None
+
+            xaxis.append(record._timestamp - starttime)
+            yaxis.append(record._value)
+            labels.append(self._get_progress_label(progress))
+
+        from plotly import graph_objs as go
+        return go.Scatter(x=xaxis,
+                          y=yaxis,
+                          name="Guest PIDs: %s" % report._scenario._name,
+                          mode='lines',
+                          line={
+                              "dash": "solid",
+                              "color": self._next_color(),
+                              "shape": "linear",
+                              "width": 1
+                          },
+                          text=labels)
+
+    def _get_split_guest_cpu_graphs(self, report, starttime):
+        threads = {}
+        for record in report._guest_timings._records:
+            if record._tid in threads:
+                continue
+            threads[record._tid] = {
+                "xaxis": [],
+                "yaxis": [],
+                "labels": [],
+            }
+
+        progress_idx = -1
+        for record in report._guest_timings._records:
+            while ((progress_idx + 1) < len(report._progress_history) and
+                   report._progress_history[progress_idx + 1]._now < record._timestamp):
+                progress_idx = progress_idx + 1
+
+            if progress_idx >= 0:
+                progress = report._progress_history[progress_idx]
+            else:
+                progress = None
+
+            threads[record._tid]["xaxis"].append(record._timestamp - starttime)
+            threads[record._tid]["yaxis"].append(record._value)
+            threads[record._tid]["labels"].append(self._get_progress_label(progress))
+
+
+        graphs = []
+        from plotly import graph_objs as go
+        for tid in threads.keys():
+            graphs.append(
+                go.Scatter(x=threads[tid]["xaxis"],
+                           y=threads[tid]["yaxis"],
+                           name="PID %s: %s" % (tid, report._scenario._name),
+                           mode="lines",
+                           line={
+                               "dash": "solid",
+                               "color": self._next_color(),
+                               "shape": "linear",
+                               "width": 1
+                           },
+                           text=threads[tid]["labels"]))
+        return graphs
+
+    def _get_migration_iters_graph(self, report, starttime):
+        xaxis = []
+        yaxis = []
+        labels = []
+        for progress in report._progress_history:
+            xaxis.append(progress._now - starttime)
+            yaxis.append(0)
+            labels.append(self._get_progress_label(progress))
+
+        from plotly import graph_objs as go
+        return go.Scatter(x=xaxis,
+                          y=yaxis,
+                          text=labels,
+                          name="Migration iterations",
+                          mode="markers",
+                          marker={
+                              "color": self._next_color(),
+                              "symbol": "star",
+                              "size": 5
+                          })
+
+    def _get_qemu_cpu_graph(self, report, starttime):
+        xaxis = []
+        yaxis = []
+        labels = []
+        progress_idx = -1
+
+        first = report._qemu_timings._records[0]
+        abstimestamps = [first._timestamp]
+        absvalues = [first._value]
+
+        for record in report._qemu_timings._records[1:]:
+            while ((progress_idx + 1) < len(report._progress_history) and
+                   report._progress_history[progress_idx + 1]._now < record._timestamp):
+                progress_idx = progress_idx + 1
+
+            if progress_idx >= 0:
+                progress = report._progress_history[progress_idx]
+            else:
+                progress = None
+
+            oldvalue = absvalues[-1]
+            oldtime = abstimestamps[-1]
+
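+            # Turn the delta in cumulative CPU time (ms) over the wallclock
+            # delta (secs) into a utilization percentage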
+            cpudelta = (record._value - oldvalue) / 1000.0
+            timedelta = record._timestamp - oldtime
+            if timedelta == 0:
+                continue
+            util = cpudelta / timedelta * 100.0
+
+            abstimestamps.append(record._timestamp)
+            absvalues.append(record._value)
+
+            xaxis.append(record._timestamp - starttime)
+            yaxis.append(util)
+            labels.append(self._get_progress_label(progress))
+
+        from plotly import graph_objs as go
+        return go.Scatter(x=xaxis,
+                          y=yaxis,
+                          yaxis="y2",
+                          name="QEMU: %s" % report._scenario._name,
+                          mode='lines',
+                          line={
+                              "dash": "solid",
+                              "color": self._next_color(),
+                              "shape": "linear",
+                              "width": 1
+                          },
+                          text=labels)
+
+    def _get_vcpu_cpu_graphs(self, report, starttime):
+        threads = {}
+        for record in report._vcpu_timings._records:
+            if record._tid in threads:
+                continue
+            threads[record._tid] = {
+                "xaxis": [],
+                "yaxis": [],
+                "labels": [],
+                "absvalue": [record._value],
+                "abstime": [record._timestamp],
+            }
+
+        progress_idx = -1
+        for record in report._vcpu_timings._records:
+            while ((progress_idx + 1) < len(report._progress_history) and
+                   report._progress_history[progress_idx + 1]._now < record._timestamp):
+                progress_idx = progress_idx + 1
+
+            if progress_idx >= 0:
+                progress = report._progress_history[progress_idx]
+            else:
+                progress = None
+
+            oldvalue = threads[record._tid]["absvalue"][-1]
+            oldtime = threads[record._tid]["abstime"][-1]
+
+            cpudelta = (record._value - oldvalue) / 1000.0
+            timedelta = record._timestamp - oldtime
+            if timedelta == 0:
+                continue
+            util = cpudelta / timedelta * 100.0
+            if util > 100:
+                util = 100
+
+            threads[record._tid]["absvalue"].append(record._value)
+            threads[record._tid]["abstime"].append(record._timestamp)
+
+            threads[record._tid]["xaxis"].append(record._timestamp - starttime)
+            threads[record._tid]["yaxis"].append(util)
+            threads[record._tid]["labels"].append(self._get_progress_label(progress))
+
+
+        graphs = []
+        from plotly import graph_objs as go
+        for tid in threads.keys():
+            graphs.append(
+                go.Scatter(x=threads[tid]["xaxis"],
+                           y=threads[tid]["yaxis"],
+                           yaxis="y2",
+                           name="VCPU %s: %s" % (tid, report._scenario._name),
+                           mode="lines",
+                           line={
+                               "dash": "solid",
+                               "color": self._next_color(),
+                               "shape": "linear",
+                               "width": 1
+                           },
+                           text=threads[tid]["labels"]))
+        return graphs
+
+    def _generate_chart_report(self, report):
+        graphs = []
+        starttime = self._find_start_time(report)
+        if self._total_guest_cpu:
+            graphs.append(self._get_total_guest_cpu_graph(report, starttime))
+        if self._split_guest_cpu:
+            graphs.extend(self._get_split_guest_cpu_graphs(report, starttime))
+        if self._qemu_cpu:
+            graphs.append(self._get_qemu_cpu_graph(report, starttime))
+        if self._vcpu_cpu:
+            graphs.extend(self._get_vcpu_cpu_graphs(report, starttime))
+        if self._migration_iters:
+            graphs.append(self._get_migration_iters_graph(report, starttime))
+        return graphs
+
+    def _generate_annotation(self, starttime, progress):
+        return {
+            "text": progress._status,
+            "x": progress._now - starttime,
+            "y": 10,
+        }
+
+    def _generate_annotations(self, report):
+        starttime = self._find_start_time(report)
+        annotations = {}
+        for progress in report._progress_history:
+            if progress._status == "setup":
+                continue
+            if progress._status not in annotations:
+                annotations[progress._status] = self._generate_annotation(starttime, progress)
+
+        return annotations.values()
+
+    def _generate_chart(self):
+        from plotly.offline import plot
+        from plotly import graph_objs as go
+
+        graphs = []
+        yaxismax = 0
+        yaxismax2 = 0
+        for report in self._reports:
+            graphs.extend(self._generate_chart_report(report))
+
+            maxvalue = self._get_guest_max_value(report)
+            if maxvalue > yaxismax:
+                yaxismax = maxvalue
+
+            maxvalue = self._get_qemu_max_value(report)
+            if maxvalue > yaxismax2:
+                yaxismax2 = maxvalue
+
+        yaxismax += 100
+        if not self._qemu_cpu:
+            yaxismax2 = 110
+        yaxismax2 += 10
+
+        annotations = []
+        if self._migration_iters:
+            for report in self._reports:
+                annotations.extend(self._generate_annotations(report))
+
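+        # Guest memory update speed goes on the left y-axis, host CPU
+        # utilization on the right ("y2") axis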
+        layout = go.Layout(title="Migration comparison",
+                           xaxis={
+                               "title": "Wallclock time (secs)",
+                               "showgrid": False,
+                           },
+                           yaxis={
+                               "title": "Memory update speed (ms/GB)",
+                               "showgrid": False,
+                               "range": [0, yaxismax],
+                           },
+                           yaxis2={
+                               "title": "Host utilization (%)",
+                               "overlaying": "y",
+                               "side": "right",
+                               "range": [0, yaxismax2],
+                               "showgrid": False,
+                           },
+                           annotations=annotations)
+
+        figure = go.Figure(data=graphs, layout=layout)
+
+        return plot(figure,
+                    show_link=False,
+                    include_plotlyjs=False,
+                    output_type="div")
+
+
+    def _generate_report(self):
+        pieces = []
+        for report in self._reports:
+            pieces.append("""
+<h3>Report %s</h3>
+<table>
+""" % report._scenario._name)
+
+            pieces.append("""
+  <tr class="subhead">
+    <th colspan="2">Test config</th>
+  </tr>
+  <tr>
+    <th>Emulator:</th>
+    <td>%s</td>
+  </tr>
+  <tr>
+    <th>Kernel:</th>
+    <td>%s</td>
+  </tr>
+  <tr>
+    <th>Ramdisk:</th>
+    <td>%s</td>
+  </tr>
+  <tr>
+    <th>Transport:</th>
+    <td>%s</td>
+  </tr>
+  <tr>
+    <th>Host:</th>
+    <td>%s</td>
+  </tr>
+""" % (report._binary, report._kernel,
+       report._initrd, report._transport, report._dst_host))
+
+            hardware = report._hardware
+            pieces.append("""
+  <tr class="subhead">
+    <th colspan="2">Hardware config</th>
+  </tr>
+  <tr>
+    <th>CPUs:</th>
+    <td>%d</td>
+  </tr>
+  <tr>
+    <th>RAM:</th>
+    <td>%d GB</td>
+  </tr>
+  <tr>
+    <th>Source CPU bind:</th>
+    <td>%s</td>
+  </tr>
+  <tr>
+    <th>Source RAM bind:</th>
+    <td>%s</td>
+  </tr>
+  <tr>
+    <th>Dest CPU bind:</th>
+    <td>%s</td>
+  </tr>
+  <tr>
+    <th>Dest RAM bind:</th>
+    <td>%s</td>
+  </tr>
+  <tr>
+    <th>Preallocate RAM:</th>
+    <td>%s</td>
+  </tr>
+  <tr>
+    <th>Locked RAM:</th>
+    <td>%s</td>
+  </tr>
+  <tr>
+    <th>Huge pages:</th>
+    <td>%s</td>
+  </tr>
+""" % (hardware._cpus, hardware._mem,
+       ",".join(hardware._src_cpu_bind),
+       ",".join(hardware._src_mem_bind),
+       ",".join(hardware._dst_cpu_bind),
+       ",".join(hardware._dst_mem_bind),
+       "yes" if hardware._prealloc_pages else "no",
+       "yes" if hardware._locked_pages else "no",
+       "yes" if hardware._huge_pages else "no"))
+
+            scenario = report._scenario
+            pieces.append("""
+  <tr class="subhead">
+    <th colspan="2">Scenario config</th>
+  </tr>
+  <tr>
+    <th>Max downtime:</th>
+    <td>%d milliseconds</td>
+  </tr>
+  <tr>
+    <th>Max bandwidth:</th>
+    <td>%d MB/sec</td>
+  </tr>
+  <tr>
+    <th>Max iters:</th>
+    <td>%d</td>
+  </tr>
+  <tr>
+    <th>Max time:</th>
+    <td>%d secs</td>
+  </tr>
+  <tr>
+    <th>Pause:</th>
+    <td>%s</td>
+  </tr>
+  <tr>
+    <th>Pause iters:</th>
+    <td>%d</td>
+  </tr>
+  <tr>
+    <th>Post-copy:</th>
+    <td>%s</td>
+  </tr>
+  <tr>
+    <th>Post-copy iters:</th>
+    <td>%d</td>
+  </tr>
+  <tr>
+    <th>Auto-converge:</th>
+    <td>%s</td>
+  </tr>
+  <tr>
+    <th>Auto-converge throttle step:</th>
+    <td>%d%%</td>
+  </tr>
+  <tr>
+    <th>MT compression:</th>
+    <td>%s</td>
+  </tr>
+  <tr>
+    <th>MT compression threads:</th>
+    <td>%d</td>
+  </tr>
+  <tr>
+    <th>XBZRLE compression:</th>
+    <td>%s</td>
+  </tr>
+  <tr>
+    <th>XBZRLE compression cache:</th>
+    <td>%d%% of RAM</td>
+  </tr>
+""" % (scenario._downtime, scenario._bandwidth,
+       scenario._max_iters, scenario._max_time,
+       "yes" if scenario._pause else "no", scenario._pause_iters,
+       "yes" if scenario._post_copy else "no", scenario._post_copy_iters,
+       "yes" if scenario._auto_converge else "no", scenario._auto_converge_step,
+       "yes" if scenario._compression_mt else "no", scenario._compression_mt_threads,
+       "yes" if scenario._compression_xbzrle else "no", scenario._compression_xbzrle_cache))
+
+            pieces.append("""
+</table>
+""")
+
+        return "\n".join(pieces)
+
+    def _generate_style(self):
+        return """
+#report table tr th {
+    text-align: right;
+}
+#report table tr td {
+    text-align: left;
+}
+#report table tr.subhead th {
+    background: rgb(192, 192, 192);
+    text-align: center;
+}
+
+"""
+
+    def generate_html(self, fh):
+        print >>fh, """<html>
+  <head>
+    <script type="text/javascript" src="plotly.min.js">
+    </script>
+    <style type="text/css">
+%s
+    </style>
+    <title>Migration report</title>
+  </head>
+  <body>
+    <h1>Migration report</h1>
+    <h2>Chart summary</h2>
+    <div id="chart">
+""" % self._generate_style()
+        print >>fh, self._generate_chart()
+        print >>fh, """
+    </div>
+    <h2>Report details</h2>
+    <div id="report">
+"""
+        print >>fh, self._generate_report()
+        print >>fh, """
+    </div>
+  </body>
+</html>
+"""
+
+    def generate(self, filename):
+        if filename is None:
+            self.generate_html(sys.stdout)
+        else:
+            with open(filename, "w") as fh:
+                self.generate_html(fh)
diff --git a/tests/migration/guestperf/progress.py b/tests/migration/guestperf/progress.py
new file mode 100644
index 0000000..46d2157
--- /dev/null
+++ b/tests/migration/guestperf/progress.py
@@ -0,0 +1,117 @@
+#
+# Migration test migration operation progress
+#
+# Copyright (c) 2016 Red Hat, Inc.
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+#
+
+
+class ProgressStats(object):
+
+    def __init__(self,
+                 transferred_bytes,
+                 remaining_bytes,
+                 total_bytes,
+                 duplicate_pages,
+                 skipped_pages,
+                 normal_pages,
+                 normal_bytes,
+                 dirty_rate_pps,
+                 transfer_rate_mbs,
+                 iterations):
+        self._transferred_bytes = transferred_bytes
+        self._remaining_bytes = remaining_bytes
+        self._total_bytes = total_bytes
+        self._duplicate_pages = duplicate_pages
+        self._skipped_pages = skipped_pages
+        self._normal_pages = normal_pages
+        self._normal_bytes = normal_bytes
+        self._dirty_rate_pps = dirty_rate_pps
+        self._transfer_rate_mbs = transfer_rate_mbs
+        self._iterations = iterations
+
+    def serialize(self):
+        return {
+            "transferred_bytes": self._transferred_bytes,
+            "remaining_bytes": self._remaining_bytes,
+            "total_bytes": self._total_bytes,
+            "duplicate_pages": self._duplicate_pages,
+            "skipped_pages": self._skipped_pages,
+            "normal_pages": self._normal_pages,
+            "normal_bytes": self._normal_bytes,
+            "dirty_rate_pps": self._dirty_rate_pps,
+            "transfer_rate_mbs": self._transfer_rate_mbs,
+            "iterations": self._iterations,
+        }
+
+    @classmethod
+    def deserialize(cls, data):
+        return cls(
+            data["transferred_bytes"],
+            data["remaining_bytes"],
+            data["total_bytes"],
+            data["duplicate_pages"],
+            data["skipped_pages"],
+            data["normal_pages"],
+            data["normal_bytes"],
+            data["dirty_rate_pps"],
+            data["transfer_rate_mbs"],
+            data["iterations"])
+
+
+class Progress(object):
+
+    def __init__(self,
+                 status,
+                 ram,
+                 now,
+                 duration,
+                 downtime,
+                 downtime_expected,
+                 setup_time,
+                 throttle_pcent):
+
+        self._status = status
+        self._ram = ram
+        self._now = now
+        self._duration = duration
+        self._downtime = downtime
+        self._downtime_expected = downtime_expected
+        self._setup_time = setup_time
+        self._throttle_pcent = throttle_pcent
+
+    def serialize(self):
+        return {
+            "status": self._status,
+            "ram": self._ram.serialize(),
+            "now": self._now,
+            "duration": self._duration,
+            "downtime": self._downtime,
+            "downtime_expected": self._downtime_expected,
+            "setup_time": self._setup_time,
+            "throttle_pcent": self._throttle_pcent,
+        }
+
+    @classmethod
+    def deserialize(cls, data):
+        return cls(
+            data["status"],
+            ProgressStats.deserialize(data["ram"]),
+            data["now"],
+            data["duration"],
+            data["downtime"],
+            data["downtime_expected"],
+            data["setup_time"],
+            data["throttle_pcent"])
diff --git a/tests/migration/guestperf/report.py b/tests/migration/guestperf/report.py
new file mode 100644
index 0000000..6a1f971
--- /dev/null
+++ b/tests/migration/guestperf/report.py
@@ -0,0 +1,98 @@
+#
+# Migration test output result reporting
+#
+# Copyright (c) 2016 Red Hat, Inc.
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+#
+
+import json
+
+from guestperf.hardware import Hardware
+from guestperf.scenario import Scenario
+from guestperf.progress import Progress
+from guestperf.timings import Timings
+
+class Report(object):
+
+    def __init__(self,
+                 hardware,
+                 scenario,
+                 progress_history,
+                 guest_timings,
+                 qemu_timings,
+                 vcpu_timings,
+                 binary,
+                 dst_host,
+                 kernel,
+                 initrd,
+                 transport,
+                 sleep):
+
+        self._hardware = hardware
+        self._scenario = scenario
+        self._progress_history = progress_history
+        self._guest_timings = guest_timings
+        self._qemu_timings = qemu_timings
+        self._vcpu_timings = vcpu_timings
+        self._binary = binary
+        self._dst_host = dst_host
+        self._kernel = kernel
+        self._initrd = initrd
+        self._transport = transport
+        self._sleep = sleep
+
+    def serialize(self):
+        return {
+            "hardware": self._hardware.serialize(),
+            "scenario": self._scenario.serialize(),
+            "progress_history": [progress.serialize() for progress in self._progress_history],
+            "guest_timings": self._guest_timings.serialize(),
+            "qemu_timings": self._qemu_timings.serialize(),
+            "vcpu_timings": self._vcpu_timings.serialize(),
+            "binary": self._binary,
+            "dst_host": self._dst_host,
+            "kernel": self._kernel,
+            "initrd": self._initrd,
+            "transport": self._transport,
+            "sleep": self._sleep,
+        }
+
+    @classmethod
+    def deserialize(cls, data):
+        return cls(
+            Hardware.deserialize(data["hardware"]),
+            Scenario.deserialize(data["scenario"]),
+            [Progress.deserialize(record) for record in data["progress_history"]],
+            Timings.deserialize(data["guest_timings"]),
+            Timings.deserialize(data["qemu_timings"]),
+            Timings.deserialize(data["vcpu_timings"]),
+            data["binary"],
+            data["dst_host"],
+            data["kernel"],
+            data["initrd"],
+            data["transport"],
+            data["sleep"])
+
+    def to_json(self):
+        return json.dumps(self.serialize(), indent=4)
+
+    @classmethod
+    def from_json(cls, data):
+        return cls.deserialize(json.loads(data))
+
+    @classmethod
+    def from_json_file(cls, filename):
+        with open(filename, "r") as fh:
+            return cls.deserialize(json.load(fh))
diff --git a/tests/migration/guestperf/scenario.py b/tests/migration/guestperf/scenario.py
new file mode 100644
index 0000000..705c2e8
--- /dev/null
+++ b/tests/migration/guestperf/scenario.py
@@ -0,0 +1,95 @@
+#
+# Migration test scenario parameter description
+#
+# Copyright (c) 2016 Red Hat, Inc.
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+#
+
+
+class Scenario(object):
+
+    def __init__(self, name,
+                 downtime=500,
+                 bandwidth=125000, # MiB per second; ~1,000 gig-e, effectively unlimited
+                 max_iters=30,
+                 max_time=300,
+                 pause=False, pause_iters=5,
+                 post_copy=False, post_copy_iters=5,
+                 auto_converge=False, auto_converge_step=10,
+                 compression_mt=False, compression_mt_threads=1,
+                 compression_xbzrle=False, compression_xbzrle_cache=10):
+
+        self._name = name
+
+        # General migration tunables
+        self._downtime = downtime  # milliseconds
+        self._bandwidth = bandwidth # MiB per second
+        self._max_iters = max_iters
+        self._max_time = max_time # seconds
+
+
+        # Strategies for ensuring completion
+        self._pause = pause
+        self._pause_iters = pause_iters
+
+        self._post_copy = post_copy
+        self._post_copy_iters = post_copy_iters
+
+        self._auto_converge = auto_converge
+        self._auto_converge_step = auto_converge_step # percentage CPU time
+
+        self._compression_mt = compression_mt
+        self._compression_mt_threads = compression_mt_threads
+
+        self._compression_xbzrle = compression_xbzrle
+        self._compression_xbzrle_cache = compression_xbzrle_cache # percentage of guest RAM
+
+    def serialize(self):
+        return {
+            "name": self._name,
+            "downtime": self._downtime,
+            "bandwidth": self._bandwidth,
+            "max_iters": self._max_iters,
+            "max_time": self._max_time,
+            "pause": self._pause,
+            "pause_iters": self._pause_iters,
+            "post_copy": self._post_copy,
+            "post_copy_iters": self._post_copy_iters,
+            "auto_converge": self._auto_converge,
+            "auto_converge_step": self._auto_converge_step,
+            "compression_mt": self._compression_mt,
+            "compression_mt_threads": self._compression_mt_threads,
+            "compression_xbzrle": self._compression_xbzrle,
+            "compression_xbzrle_cache": self._compression_xbzrle_cache,
+        }
+
+    @classmethod
+    def deserialize(cls, data):
+        return cls(
+            data["name"],
+            data["downtime"],
+            data["bandwidth"],
+            data["max_iters"],
+            data["max_time"],
+            data["pause"],
+            data["pause_iters"],
+            data["post_copy"],
+            data["post_copy_iters"],
+            data["auto_converge"],
+            data["auto_converge_step"],
+            data["compression_mt"],
+            data["compression_mt_threads"],
+            data["compression_xbzrle"],
+            data["compression_xbzrle_cache"])
diff --git a/tests/migration/guestperf/shell.py b/tests/migration/guestperf/shell.py
new file mode 100644
index 0000000..185c569
--- /dev/null
+++ b/tests/migration/guestperf/shell.py
@@ -0,0 +1,255 @@
+#
+# Migration test command line shell integration
+#
+# Copyright (c) 2016 Red Hat, Inc.
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+#
+
+
+import argparse
+import fnmatch
+import os
+import os.path
+import platform
+import sys
+
+from guestperf.hardware import Hardware
+from guestperf.engine import Engine
+from guestperf.scenario import Scenario
+from guestperf.comparison import COMPARISONS
+from guestperf.plot import Plot
+from guestperf.report import Report
+
+
+class BaseShell(object):
+
+    def __init__(self):
+        parser = argparse.ArgumentParser(description="Migration Test Tool")
+
+        # Test args
+        parser.add_argument("--debug", dest="debug", default=False, action="store_true")
+        parser.add_argument("--verbose", dest="verbose", default=False, action="store_true")
+        parser.add_argument("--sleep", dest="sleep", default=15, type=int)
+        parser.add_argument("--binary", dest="binary", default="/usr/bin/qemu-system-x86_64")
+        parser.add_argument("--dst-host", dest="dst_host", default="localhost")
+        parser.add_argument("--kernel", dest="kernel", default="/boot/vmlinuz-%s" % platform.release())
+        parser.add_argument("--initrd", dest="initrd", default="tests/migration/initrd-stress.img")
+        parser.add_argument("--transport", dest="transport", default="unix")
+
+
+        # Hardware args
+        parser.add_argument("--cpus", dest="cpus", default=1, type=int)
+        parser.add_argument("--mem", dest="mem", default=1, type=int)
+        parser.add_argument("--src-cpu-bind", dest="src_cpu_bind", default="")
+        parser.add_argument("--src-mem-bind", dest="src_mem_bind", default="")
+        parser.add_argument("--dst-cpu-bind", dest="dst_cpu_bind", default="")
+        parser.add_argument("--dst-mem-bind", dest="dst_mem_bind", default="")
+        parser.add_argument("--prealloc-pages", dest="prealloc_pages", default=False)
+        parser.add_argument("--huge-pages", dest="huge_pages", default=False)
+        parser.add_argument("--locked-pages", dest="locked_pages", default=False)
+
+        self._parser = parser
+
+    def get_engine(self, args):
+        return Engine(binary=args.binary,
+                      dst_host=args.dst_host,
+                      kernel=args.kernel,
+                      initrd=args.initrd,
+                      transport=args.transport,
+                      sleep=args.sleep,
+                      debug=args.debug,
+                      verbose=args.verbose)
+
+    def get_hardware(self, args):
+        def split_map(value):
+            if value == "":
+                return []
+            return value.split(",")
+
+        return Hardware(cpus=args.cpus,
+                        mem=args.mem,
+
+                        src_cpu_bind=split_map(args.src_cpu_bind),
+                        src_mem_bind=split_map(args.src_mem_bind),
+                        dst_cpu_bind=split_map(args.dst_cpu_bind),
+                        dst_mem_bind=split_map(args.dst_mem_bind),
+
+                        locked_pages=args.locked_pages,
+                        huge_pages=args.huge_pages,
+                        prealloc_pages=args.prealloc_pages)
+
+
+class Shell(BaseShell):
+
+    def __init__(self):
+        super(Shell, self).__init__()
+
+        parser = self._parser
+
+        parser.add_argument("--output", dest="output", default=None)
+
+        # Scenario args
+        parser.add_argument("--max-iters", dest="max_iters", default=30, type=int)
+        parser.add_argument("--max-time", dest="max_time", default=300, type=int)
+        parser.add_argument("--bandwidth", dest="bandwidth", default=125000, type=int)
+        parser.add_argument("--downtime", dest="downtime", default=500, type=int)
+
+        parser.add_argument("--pause", dest="pause", default=False, action="store_true")
+        parser.add_argument("--pause-iters", dest="pause_iters", default=5, type=int)
+
+        parser.add_argument("--post-copy", dest="post_copy", default=False, action="store_true")
+        parser.add_argument("--post-copy-iters", dest="post_copy_iters", default=5, type=int)
+
+        parser.add_argument("--auto-converge", dest="auto_converge", default=False, action="store_true")
+        parser.add_argument("--auto-converge-step", dest="auto_converge_step", default=10, type=int)
+
+        parser.add_argument("--compression-mt", dest="compression_mt", default=False, action="store_true")
+        parser.add_argument("--compression-mt-threads", dest="compression_mt_threads", default=1, type=int)
+
+        parser.add_argument("--compression-xbzrle", dest="compression_xbzrle", default=False, action="store_true")
+        parser.add_argument("--compression-xbzrle-cache", dest="compression_xbzrle_cache", default=10, type=int)
+
+    def get_scenario(self, args):
+        return Scenario(name="perfreport",
+                        downtime=args.downtime,
+                        bandwidth=args.bandwidth,
+                        max_iters=args.max_iters,
+                        max_time=args.max_time,
+
+                        pause=args.pause,
+                        pause_iters=args.pause_iters,
+
+                        post_copy=args.post_copy,
+                        post_copy_iters=args.post_copy_iters,
+
+                        auto_converge=args.auto_converge,
+                        auto_converge_step=args.auto_converge_step,
+
+                        compression_mt=args.compression_mt,
+                        compression_mt_threads=args.compression_mt_threads,
+
+                        compression_xbzrle=args.compression_xbzrle,
+                        compression_xbzrle_cache=args.compression_xbzrle_cache)
+
+    def run(self, argv):
+        args = self._parser.parse_args(argv)
+
+        engine = self.get_engine(args)
+        hardware = self.get_hardware(args)
+        scenario = self.get_scenario(args)
+
+        try:
+            report = engine.run(hardware, scenario)
+            if args.output is None:
+                print report.to_json()
+            else:
+                with open(args.output, "w") as fh:
+                    print >>fh, report.to_json()
+            return 0
+        except Exception as e:
+            print >>sys.stderr, "Error: %s" % str(e)
+            if args.debug:
+                raise
+            return 1
+
+
+class BatchShell(BaseShell):
+
+    def __init__(self):
+        super(BatchShell, self).__init__()
+
+        parser = self._parser
+
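+        # --filter is an fnmatch pattern matched against
+        # "<comparison>/<scenario>" names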
+        parser.add_argument("--filter", dest="filter", default="*")
+        parser.add_argument("--output", dest="output", default=os.getcwd())
+
+    def run(self, argv):
+        args = self._parser.parse_args(argv)
+
+        engine = self.get_engine(args)
+        hardware = self.get_hardware(args)
+
+        try:
+            for comparison in COMPARISONS:
+                for scenario in comparison._scenarios:
+                    name = os.path.join(comparison._name, scenario._name)
+                    if not fnmatch.fnmatch(name, args.filter):
+                        if args.verbose:
+                            print "Skipping %s" % name
+                        continue
+
+                    if args.verbose:
+                        print "Running %s" % name
+
+                    dirname = os.path.join(args.output, comparison._name)
+                    filename = os.path.join(dirname, scenario._name + ".json")
+                    if not os.path.exists(dirname):
+                        os.makedirs(dirname)
+                    report = engine.run(hardware, scenario)
+                    with open(filename, "w") as fh:
+                        print >>fh, report.to_json()
+        except Exception as e:
+            print >>sys.stderr, "Error: %s" % str(e)
+            if args.debug:
+                raise
+
+
+class PlotShell(object):
+
+    def __init__(self):
+        super(PlotShell, self).__init__()
+
+        self._parser = argparse.ArgumentParser(description="Migration Test Tool")
+
+        self._parser.add_argument("--output", dest="output", default=None)
+
+        self._parser.add_argument("--debug", dest="debug", default=False, action="store_true")
+        self._parser.add_argument("--verbose", dest="verbose", default=False, action="store_true")
+
+        self._parser.add_argument("--migration-iters", dest="migration_iters", default=False, action="store_true")
+        self._parser.add_argument("--total-guest-cpu", dest="total_guest_cpu", default=False, action="store_true")
+        self._parser.add_argument("--split-guest-cpu", dest="split_guest_cpu", default=False, action="store_true")
+        self._parser.add_argument("--qemu-cpu", dest="qemu_cpu", default=False, action="store_true")
+        self._parser.add_argument("--vcpu-cpu", dest="vcpu_cpu", default=False, action="store_true")
+
+        self._parser.add_argument("reports", nargs='*')
+
+    def run(self, argv):
+        args = self._parser.parse_args(argv)
+
+        if len(args.reports) == 0:
+            print >>sys.stderr, "At least one report required"
+            return 1
+
+        if not (args.qemu_cpu or
+                args.vcpu_cpu or
+                args.total_guest_cpu or
+                args.split_guest_cpu):
+            print >>sys.stderr, "At least one chart type is required"
+            return 1
+
+        reports = []
+        for report in args.reports:
+            reports.append(Report.from_json_file(report))
+
+        plot = Plot(reports,
+                    args.migration_iters,
+                    args.total_guest_cpu,
+                    args.split_guest_cpu,
+                    args.qemu_cpu,
+                    args.vcpu_cpu)
+
+        plot.generate(args.output)
diff --git a/tests/migration/guestperf/timings.py b/tests/migration/guestperf/timings.py
new file mode 100644
index 0000000..f94d809
--- /dev/null
+++ b/tests/migration/guestperf/timings.py
@@ -0,0 +1,55 @@
+#
+# Migration test timing records
+#
+# Copyright (c) 2016 Red Hat, Inc.
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+#
+
+
+class TimingRecord(object):
+
+    def __init__(self, tid, timestamp, value):
+
+        self._tid = tid
+        self._timestamp = timestamp
+        self._value = value
+
+    def serialize(self):
+        return {
+            "tid": self._tid,
+            "timestamp": self._timestamp,
+            "value": self._value
+        }
+
+    @classmethod
+    def deserialize(cls, data):
+        return cls(
+            data["tid"],
+            data["timestamp"],
+            data["value"])
+
+
+class Timings(object):
+
+    def __init__(self, records):
+
+        self._records = records
+
+    def serialize(self):
+        return [record.serialize() for record in self._records]
+
+    @classmethod
+    def deserialize(cls, data):
+        return cls([TimingRecord.deserialize(record) for record in data])
diff --git a/tests/migration/stress.c b/tests/migration/stress.c
new file mode 100644
index 0000000..cf8ce8b
--- /dev/null
+++ b/tests/migration/stress.c
@@ -0,0 +1,367 @@
+/*
+ * Migration stress workload
+ *
+ * Copyright (c) 2016 Red Hat, Inc.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <stdio.h>
+#include <getopt.h>
+#include <string.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <unistd.h>
+#include <sys/reboot.h>
+#include <sys/syscall.h>
+#include <linux/random.h>
+#include <sys/time.h>
+#include <pthread.h>
+#include <fcntl.h>
+#include <sys/mount.h>
+#include <sys/stat.h>
+#include <sys/mman.h>
+
+const char *argv0;
+
+#define PAGE_SIZE 4096
+
+static int gettid(void)
+{
+    return syscall(SYS_gettid);
+}
+
+static __attribute__((noreturn)) void exit_failure(void)
+{
+    if (getpid() == 1) {
+        sync();
+        reboot(RB_POWER_OFF);
+        fprintf(stderr, "%s (%05d): ERROR: cannot reboot: %s\n",
+                argv0, gettid(), strerror(errno));
+        abort();
+    } else {
+        exit(1);
+    }
+}
+
+static __attribute__((noreturn)) void exit_success(void)
+{
+    if (getpid() == 1) {
+        sync();
+        reboot(RB_POWER_OFF);
+        fprintf(stderr, "%s (%05d): ERROR: cannot reboot: %s\n",
+                argv0, gettid(), strerror(errno));
+        abort();
+    } else {
+        exit(0);
+    }
+}
+
+static int get_command_arg_str(const char *name,
+                               char **val)
+{
+    static char line[1024];
+    FILE *fp = fopen("/proc/cmdline", "r");
+    char *start, *end;
+
+    if (fp == NULL) {
+        fprintf(stderr, "%s (%05d): ERROR: cannot open /proc/cmdline: %s\n",
+                argv0, gettid(), strerror(errno));
+        return -1;
+    }
+
+    if (!fgets(line, sizeof line, fp)) {
+        fprintf(stderr, "%s (%05d): ERROR: cannot read /proc/cmdline: %s\n",
+                argv0, gettid(), strerror(errno));
+        fclose(fp);
+        return -1;
+    }
+    fclose(fp);
+
+    start = strstr(line, name);
+    if (!start)
+        return 0;
+
+    start += strlen(name);
+
+    if (*start != '=') {
+        fprintf(stderr, "%s (%05d): ERROR: no value provided for '%s' in /proc/cmdline\n",
+                argv0, gettid(), name);
+        return -1;
+    }
+    start++;
+
+    end = strstr(start, " ");
+    if (!end)
+        end = strstr(start, "\n");
+
+    if (end == start) {
+        fprintf(stderr, "%s (%05d): ERROR: no value provided for '%s' in /proc/cmdline\n",
+                argv0, gettid(), name);
+        return -1;
+    }
+
+    if (end)
+        *val = strndup(start, end - start);
+    else
+        *val = strdup(start);
+    return 1;
+}
+
+
+static int get_command_arg_ull(const char *name,
+                               unsigned long long *val)
+{
+    char *valstr;
+    char *end;
+
+    int ret = get_command_arg_str(name, &valstr);
+    if (ret <= 0)
+        return ret;
+
+    errno = 0;
+    *val = strtoll(valstr, &end, 10);
+    if (errno || *end) {
+        fprintf(stderr, "%s (%05d): ERROR: cannot parse %s value %s\n",
+                argv0, gettid(), name, valstr);
+        free(valstr);
+        return -1;
+    }
+    free(valstr);
+    return 0;
+}
+
+
+static int random_bytes(char *buf, size_t len)
+{
+    int fd;
+
+    fd = open("/dev/urandom", O_RDONLY);
+    if (fd < 0) {
+        fprintf(stderr, "%s (%05d): ERROR: cannot open /dev/urandom: %s\n",
+                argv0, gettid(), strerror(errno));
+        return -1;
+    }
+
+    if (read(fd, buf, len) != len) {
+        fprintf(stderr, "%s (%05d): ERROR: cannot read /dev/urandom: %s\n",
+                argv0, gettid(), strerror(errno));
+        close(fd);
+        return -1;
+    }
+
+    close(fd);
+
+    return 0;
+}
+
+
+static unsigned long long now(void)
+{
+    struct timeval tv;
+
+    gettimeofday(&tv, NULL);
+
+    return (tv.tv_sec * 1000ull) + (tv.tv_usec / 1000ull);
+}
+
+static int stressone(unsigned long long ramsizeMB)
+{
+    size_t pagesPerMB = 1024 * 1024 / PAGE_SIZE;
+    char *ram = malloc(ramsizeMB * 1024 * 1024);
+    char *ramptr;
+    size_t i, j, k;
+    char *data = malloc(PAGE_SIZE);
+    char *dataptr;
+    size_t nMB = 0;
+    unsigned long long before, after;
+
+    if (!ram) {
+        fprintf(stderr, "%s (%05d): ERROR: cannot allocate %llu MB of RAM: %s\n",
+                argv0, gettid(), ramsizeMB, strerror(errno));
+        return -1;
+    }
+    if (!data) {
+        fprintf(stderr, "%s (%d): ERROR: cannot allocate %d bytes of RAM: %s\n",
+                argv0, gettid(), PAGE_SIZE, strerror(errno));
+        free(ram);
+        return -1;
+    }
+
+    /* We don't care about initial state, but we do want
+     * to fault it all into RAM, otherwise the first iter
+     * of the loop below will be quite slow. We can't use
+     * 0x0 as the byte as gcc optimizes that away into a
+     * calloc instead :-) */
+    memset(ram, 0xfe, ramsizeMB * 1024 * 1024);
+
+    if (random_bytes(data, PAGE_SIZE) < 0) {
+        free(ram);
+        free(data);
+        return -1;
+    }
+
+    before = now();
+
+    while (1) {
+
+        ramptr = ram;
+        for (i = 0; i < ramsizeMB; i++, nMB++) {
+            for (j = 0; j < pagesPerMB; j++) {
+                dataptr = data;
+                for (k = 0; k < PAGE_SIZE; k += sizeof(long long)) {
+                    *(unsigned long long *)ramptr ^= *(unsigned long long *)dataptr;
+                    ramptr += sizeof(long long);
+                    dataptr += sizeof(long long);
+                }
+            }
+
+            if (nMB == 1024) {
+                after = now();
+                fprintf(stderr, "%s (%05d): INFO: %06llums copied 1 GB in %05llums\n",
+                        argv0, gettid(), after, after - before);
+                before = now();
+                nMB = 0;
+            }
+        }
+    }
+
+    free(data);
+    free(ram);
+}
+
+
+static void *stressthread(void *arg)
+{
+    unsigned long long ramsizeMB = *(unsigned long long *)arg;
+
+    stressone(ramsizeMB);
+
+    return NULL;
+}
+
+static int stress(unsigned long long ramsizeGB, int ncpus)
+{
+    size_t i;
+    unsigned long long ramsizeMB = ramsizeGB * 1024 / ncpus;
+    ncpus--;
+
+    for (i = 0; i < ncpus; i++) {
+        pthread_t thr;
+        pthread_create(&thr, NULL,
+                       stressthread, &ramsizeMB);
+    }
+
+    stressone(ramsizeMB);
+
+    return 0;
+}
+
+
+static int mount_misc(const char *fstype, const char *dir)
+{
+    if (mkdir(dir, 0755) < 0 && errno != EEXIST) {
+        fprintf(stderr, "%s (%05d): ERROR: cannot create %s: %s\n",
+                argv0, gettid(), dir, strerror(errno));
+        return -1;
+    }
+
+    if (mount("none", dir, fstype, 0, NULL) < 0) {
+        fprintf(stderr, "%s (%05d): ERROR: cannot mount %s: %s\n",
+                argv0, gettid(), dir, strerror(errno));
+        return -1;
+    }
+
+    return 0;
+}
+
+static int mount_all(void)
+{
+    if (mount_misc("proc", "/proc") < 0 ||
+        mount_misc("sysfs", "/sys") < 0 ||
+        mount_misc("tmpfs", "/dev") < 0)
+        return -1;
+
+    mknod("/dev/urandom", 0777 | S_IFCHR, makedev(1, 9));
+    mknod("/dev/random", 0777 | S_IFCHR, makedev(1, 8));
+
+    return 0;
+}
+
+int main(int argc, char **argv)
+{
+    unsigned long long ramsizeGB = 1;
+    char *end;
+    int ch;
+    int opt_ind = 0;
+    const char *sopt = "hr:c:";
+    struct option lopt[] = {
+        { "help", no_argument, NULL, 'h' },
+        { "ramsize", required_argument, NULL, 'r' },
+        { "cpus", required_argument, NULL, 'c' },
+        { NULL, 0, NULL, 0 }
+    };
+    int ret;
+    int ncpus = 0;
+
+    argv0 = argv[0];
+
+    while ((ch = getopt_long(argc, argv, sopt, lopt, &opt_ind)) != -1) {
+        switch (ch) {
+        case 'r':
+            errno = 0;
+            ramsizeGB = strtoll(optarg, &end, 10);
+            if (errno != 0 || *end) {
+                fprintf(stderr, "%s (%05d): ERROR: Cannot parse RAM size %s\n",
+                        argv0, gettid(), optarg);
+                exit_failure();
+            }
+            break;
+
+        case 'c':
+            errno = 0;
+            ncpus = strtoll(optarg, &end, 10);
+            if (errno != 0 || *end) {
+                fprintf(stderr, "%s (%05d): ERROR: Cannot parse CPU count %s\n",
+                        argv0, gettid(), optarg);
+                exit_failure();
+            }
+            break;
+
+        case '?':
+        case 'h':
+            fprintf(stderr, "%s: [--help][--ramsize GB][--cpus N]\n", argv0);
+            exit_failure();
+        }
+    }
+
+    if (getpid() == 1) {
+        if (mount_all() < 0)
+            exit_failure();
+
+        ret = get_command_arg_ull("ramsize", &ramsizeGB);
+        if (ret < 0)
+            exit_failure();
+    }
+
+    if (ncpus == 0)
+        ncpus = sysconf(_SC_NPROCESSORS_ONLN);
+
+    fprintf(stdout, "%s (%05d): INFO: RAM %llu GiB across %d CPUs\n",
+            argv0, gettid(), ramsizeGB, ncpus);
+
+    if (stress(ramsizeGB, ncpus) < 0)
+        exit_failure();
+
+    exit_success();
+}
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v1 0/6] A migration performance testing framework
  2016-05-05 14:27 [Qemu-devel] [PATCH v1 0/6] A migration performance testing framework Daniel P. Berrange
                   ` (5 preceding siblings ...)
  2016-05-05 14:28 ` [Qemu-devel] [PATCH v1 6/6] tests: introduce a framework for testing migration performance Daniel P. Berrange
@ 2016-05-05 15:39 ` Dr. David Alan Gilbert
  2016-05-09 15:01   ` Daniel P. Berrange
  2016-05-06  1:33 ` Li, Liang Z
  2016-05-06 10:45 ` Amit Shah
  8 siblings, 1 reply; 13+ messages in thread
From: Dr. David Alan Gilbert @ 2016-05-05 15:39 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: qemu-devel, Juan Quintela, Amit Shah

* Daniel P. Berrange (berrange@redhat.com) wrote:
> This series of patches provides a framework for testing migration performance
> characteristics. The motivating factor for this is planning that is underway
> in OpenStack wrt making use of QEMU migration features such as compression,
> auto-converge and post-copy. The primary aim for OpenStack is to have Nova
> autonomously manage migration features & tunables to maximise chances that
> migration will complete. The problem faced is figuring out just which QEMU
> migration features are "best" suited to our needs. This means we want data
> on how well they are able to ensure completion of a migration, against the
> host resources used and the impact on the guest workload performance.
> 
> The test framework produced here takes a pathelogical guest workload (every
> CPU just burning 100% of time xor'ing every byte of guest memory with random
> data). This is quite a pessimistic test because most guest workloads are not
> giong to be this heavy on memory writes, and their data won't be uniformly
> random and so will be able to compress better than this test does.
> 
> With this worst case guest, I have produced a set of tests using UNIX socket,
> TCP localhost, TCP remote and RDMA remote socket transports, with both a
> 1 GB RAM + 1 CPU guest and a 8 GB RAM + 4 CPU guest.
> 
> The TCP/RDMA remote host tests were run over a 10-GiG-E network interface.
> 
> I have put the results online to view here:
> 
>   https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/
> 
> The charts here are showing two core sets of data:
> 
>  - The guest CPU performance. The left axis is showing the time in milliseconds
>    required to xor 1 GB of memory. This is shown per-guest CPU and combined all
>    CPUs.
> 
>  - The host CPU utilization. The right axis is showing the overall QEMU process
>    CPU utilization, and the per-VCPU utilization.
> 
> Note that the charts are interactive - you can turn on/off each plot line and
> zoom in by selecting regions on the chart.
> 
> 
> Some interesting things that I have observed with this
> 
>  - At the start of each iteration of migration there is a distinct drop in
>    guest CPU performance as shown by a spike in the guest CPU time lines.
>    Performance would drop from 200ms/GB to 400ms/GB. Presumably this is
>    related to QEMU recalculating the dirty bitmap for the guest RAM. See
>    the spikes in the green line in:
> 
>     https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-1gb-1cpu/post-copy-bandwidth/post-copy-bw-1gbs.html

Yeh, that doesn't surprise me too much.

>  - For the larger sized guests, the auto-converge code has to throttle the
>    guest to as much as 90% or more before it is able to meet the 500ms max
>    downtime value
> 
>     https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-1gb-1cpu/auto-converge-bandwidth/auto-converge-bw-1gbs.html
> 
>    Even then I often saw tests aborting as they hit the max number of
>    iterations I permitted (30 iters max)
> 
>     https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-8gb-4cpu/auto-converge-bandwidth/auto-converge-bw-10gbs.html

It doesn't take much CPU to dirty memory, so you need an awful lot of throttling,
and the throttling is indiscriminate, so it throttles threads that are dirtying
memory heavily just as hard as those that aren't.
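
(As a very rough sketch of why the throttle ends up so high - the rates below
are invented assumptions, not measurements: auto-converge only lets an
iteration catch up once the throttled dirty rate drops below the transfer
bandwidth, so the required throttle climbs steeply as soon as the unthrottled
dirty rate exceeds the link speed:)

    # hypothetical numbers, purely for illustration
    dirty_rate_mbs = 8000.0   # unthrottled guest dirtying rate, MB/s (assumed)
    bandwidth_mbs = 1100.0    # usable migration bandwidth, MB/s (assumed ~10GigE)

    # convergence needs throttled_dirty_rate < bandwidth; throttling scales
    # the dirty rate roughly linearly with the CPU time left to the guest
    required_throttle = max(0.0, 1.0 - bandwidth_mbs / dirty_rate_mbs)
    print("need to throttle guest CPUs by ~%d%%" % (required_throttle * 100))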

>  - MT compression is actively harmful to chances of successful migration when
>    the guest RAM is not compression friendly. My work load is worst case since
>    it is splattering RAM with totally random bytes. The MT compression is
>    dramatically increasing the time for each iteration as we bottleneck on CPU
>    compression speed, leaving the network largely idle. This causes migration
>    which would have completed without compression, to fail. It also burns huge
>    amounts of host CPU time
> 
>      https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-1gb-1cpu/compr-mt/compr-mt-threads-4.html

Yes; I think the hope is that this will work well with compression accelerator
hardware.  I look forward to the vendors of that hardware using your scripts
to produce comparisons.

>  - XBZRLE compression did not have as much of a CPU peformance penalty on the
>    host as MT comprssion, but also did not help migration to actually complete.
>    Again this is largely due to the workload being the worst case scenario with
>    random bytes. The downside is obviously the potentially significant memory
>    overhead on the host due to the cache sizing
> 
>     https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-1gb-1cpu/compr-xbzrle/compr-xbzrle-cache-50.html
> 
> 
>  - Post-copy, by its very nature, obviously ensured that the migraton would
>    complete. While post-copy was running in pre-copy mode there was a somewhat
>    chaotic small impact on guest CPU performance, causing performance to
>    periodically oscillate between 400ms/GB and 800ms/GB. This is less than
>    the impact at the start of each migration iteration which was 1000ms/GB
>    in this test. There was also a massive penalty at time of switchover from
>    pre to post copy, as to be expected. The migration completed in post-copy
>    phase quite quickly though. For this workload, number of iterations in
>    pre-copy mode before switching to post-copy did not have much impact. I
>    expect a less extreme workload would have shown more interesting results
>    wrt number of iterations of pre-copy:
> 
>     https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-8gb-4cpu/post-copy-iters.html

Hmm; I hadn't actually expected that much performance difference during the
precopy phase (there used to be one in earlier postcopy versions, but the later
versions should have become simpler).  The number of iterations wouldn't make
much difference for your workload - because you're changing all of memory, we're
going to have to resend it all anyway; if you had a workload where some of the
memory was mostly static and some was rapidly changing, then one or two passes
to transfer the mostly static data would show a benefit.
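
(A quick sketch of that intuition - the fractions below are assumptions purely
for illustration: with a partly static working set only the first pass pays for
the static portion and later passes only resend what got redirtied, whereas
with this workload every pass costs the full guest RAM:)

    def data_per_pass(ram_gb, dirty_fraction, passes):
        # first pass sends everything, later passes resend the redirtied part
        return [ram_gb if i == 0 else ram_gb * dirty_fraction
                for i in range(passes)]

    print(data_per_pass(8, 1.0, 3))   # fully dirty workload: [8, 8.0, 8.0]
    print(data_per_pass(8, 0.2, 3))   # mostly static guest:  [8, 1.6, 1.6]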

> Overall, if we're looking for a solution that can guarantee completion under
> the most extreme guest workload, then only post-copy & autoconverge appear
> upto the job.
> 
> The MT compression is seriously harmful to migration and has severe CPU
> overhead. The XBZRLE compression is moderatly harmful to migration and has
> potentilly severa memory overhead for large cache sizes to make it useful.
> 
> While auto-converge can ensure that guest migration completes, it has a
> pretty significantly long term impact on guest CPU performance to achieve
> this. ie the guest spends a long time in pre-copy mode with its CPUs very
> dramatically throttled down. The level of throttling required makes one
> wonder whether it is worth using, against simply pausing the guest workload.
> The latter has a hard blackout period, but over a quite short time frame
> if network speed is fast.
> 
> The post-copy code does have an impact on guest performance while in pre
> copy mode, vs a plain migration. It also has a fairly high spike when in
> post-copy mode, but this last for a pretty short time. As compared to
> auto-converge, it is able to ensure the migration completes in a finite
> time without having a prolonged impact on guest CPU performance. The
> penalty during the post-copy phase is on a par with the penalty impose
> by auto-converge when it has to throttle to 90%+.

One advantage of postcopy here is that it should be more selective on
real workloads than auto-converge.  If some threads have the data they
need on the destination already, they can run without much performance impact.
(Indeed that's actually threads, not even vCPUs, since the asynchronous
page faults let the guest keep the vCPU busy even if one
thread is waiting).

> Overall, in the context of a worst case guest workload, it appears that
> post-copy is the clear winning strategy ensuring completion of migration
> without imposing an long duration penalty on guest peformance. If the
> risk of failure from post-copy is unacceptable then auto-converge is a
> good fallback option, if the long duration guest CPU penalty can be
> accepted.

Excellent :-) And we have a GSoC student looking at recovery after network
failure during postcopy - but as you see the actual postcopy phase is
pretty short anyway, so the risk window is short.

> The compression options are only worth using if the host has free CPU
> resources, and the guest RAM is believed to be compression friendly,
> as they steal significant CPU time away from guests in order to run
> compression, often with a negative impact on migration completion
> chances.
> 
> Looking at migration in general, even with a 10-GiG-E NIC and RDMA
> transport it is possible for a single guest to provide a workload that
> will saturate the network during migration & thus prevent completion.
> Based on this, there is little point in attempting to run migrations
> in parallel on the same host, unless multiple NICs are available,
> as parallel migrations would reduce the chances of either one ever
> completing. Better reliability & faster overall completion would
> likely be achieved by fully serializing migration operations per
> host.

Probably true; however:
   a) 10Gb is a bit slow these days - you can get dual port 100Gb cards;
      even so, you'll never keep up with a CPU bus going flat
      out.

   b) If your bandwidth is limited by single core CPU (e.g. encryption or
      just TCP speed) then two migrations might make sense if you
      still have spare cores and bandwidth.

   c) If the destination of the two migrations are different hosts then
      hmm the discussion probably gets more complex :-)

> 
> There is clearly scope for more investigation here, in particular
> 
>  - Produce some alternative guest workloads that try to present
>    a more "average" scenario workload, instead of the worst-case.
>    These would likely allow compression to have some positive
>    impact.
> 
>  - Try various combinations of strategies. For example, combining
>    post-copy and auto-converge at the same time, or compression
>    combined with either post-copy or auto-converge.
> 
>  - Investigate block migration performance too, with NBD migration
>    server.
> 
>  - Investigate effect of dynamically changing max downtime value
>    during migration, rather than using a fixed 500ms value.

Yes, and probably worth trying to figure out what's happening during
the precopy phase of postcopy.

Dave

> 
> 
> Daniel P. Berrange (6):
>   scripts: add __init__.py file to scripts/qmp/
>   scripts: add a 'debug' parameter to QEMUMonitorProtocol
>   scripts: refactor the VM class in iotests for reuse
>   scripts: set timeout when waiting for qemu monitor connection
>   scripts: ensure monitor socket has SO_REUSEADDR set
>   tests: introduce a framework for testing migration performance
> 
>  configure                               |   2 +
>  scripts/qemu.py                         | 202 +++++++++++
>  scripts/qmp/__init__.py                 |   0
>  scripts/qmp/qmp.py                      |  15 +-
>  scripts/qtest.py                        |  34 ++
>  tests/Makefile                          |  12 +
>  tests/migration/.gitignore              |   2 +
>  tests/migration/guestperf-batch.py      |  26 ++
>  tests/migration/guestperf-plot.py       |  26 ++
>  tests/migration/guestperf.py            |  27 ++
>  tests/migration/guestperf/__init__.py   |   0
>  tests/migration/guestperf/comparison.py | 124 +++++++
>  tests/migration/guestperf/engine.py     | 439 ++++++++++++++++++++++
>  tests/migration/guestperf/hardware.py   |  62 ++++
>  tests/migration/guestperf/plot.py       | 623 ++++++++++++++++++++++++++++++++
>  tests/migration/guestperf/progress.py   | 117 ++++++
>  tests/migration/guestperf/report.py     |  98 +++++
>  tests/migration/guestperf/scenario.py   |  95 +++++
>  tests/migration/guestperf/shell.py      | 255 +++++++++++++
>  tests/migration/guestperf/timings.py    |  55 +++
>  tests/migration/stress.c                | 367 +++++++++++++++++++
>  tests/qemu-iotests/iotests.py           | 135 +------
>  22 files changed, 2583 insertions(+), 133 deletions(-)
>  create mode 100644 scripts/qemu.py
>  create mode 100644 scripts/qmp/__init__.py
>  create mode 100644 tests/migration/.gitignore
>  create mode 100755 tests/migration/guestperf-batch.py
>  create mode 100755 tests/migration/guestperf-plot.py
>  create mode 100755 tests/migration/guestperf.py
>  create mode 100644 tests/migration/guestperf/__init__.py
>  create mode 100644 tests/migration/guestperf/comparison.py
>  create mode 100644 tests/migration/guestperf/engine.py
>  create mode 100644 tests/migration/guestperf/hardware.py
>  create mode 100644 tests/migration/guestperf/plot.py
>  create mode 100644 tests/migration/guestperf/progress.py
>  create mode 100644 tests/migration/guestperf/report.py
>  create mode 100644 tests/migration/guestperf/scenario.py
>  create mode 100644 tests/migration/guestperf/shell.py
>  create mode 100644 tests/migration/guestperf/timings.py
>  create mode 100644 tests/migration/stress.c
> 
> -- 
> 2.5.5
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v1 0/6] A migration performance testing framework
  2016-05-05 14:27 [Qemu-devel] [PATCH v1 0/6] A migration performance testing framework Daniel P. Berrange
                   ` (6 preceding siblings ...)
  2016-05-05 15:39 ` [Qemu-devel] [PATCH v1 0/6] A migration performance testing framework Dr. David Alan Gilbert
@ 2016-05-06  1:33 ` Li, Liang Z
  2016-05-06 10:45 ` Amit Shah
  8 siblings, 0 replies; 13+ messages in thread
From: Li, Liang Z @ 2016-05-06  1:33 UTC (permalink / raw)
  To: Daniel P. Berrange, qemu-devel
  Cc: Amit Shah, Dr. David Alan Gilbert, Juan Quintela

> This series of patches provides a framework for testing migration
> performance characteristics. The motivating factor for this is planning that is
> underway in OpenStack wrt making use of QEMU migration features such as
> compression, auto-converge and post-copy. The primary aim for OpenStack
> is to have Nova autonomously manage migration features & tunables to
> maximise chances that migration will complete. The problem faced is figuring
> out just which QEMU migration features are "best" suited to our needs. This
> means we want data on how well they are able to ensure completion of a
> migration, against the host resources used and the impact on the guest
> workload performance.
> 
> The test framework produced here takes a pathelogical guest workload
> (every CPU just burning 100% of time xor'ing every byte of guest memory
> with random data). This is quite a pessimistic test because most guest
> workloads are not giong to be this heavy on memory writes, and their data
> won't be uniformly random and so will be able to compress better than this
> test does.
> 


Wonderful test report!

> With this worst case guest, I have produced a set of tests using UNIX socket,
> TCP localhost, TCP remote and RDMA remote socket transports, with both a
> 1 GB RAM + 1 CPU guest and a 8 GB RAM + 4 CPU guest.
> 
> The TCP/RDMA remote host tests were run over a 10-GiG-E network
> interface.
> 
> I have put the results online to view here:
> 
>   https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/
> 
> The charts here are showing two core sets of data:
> 
>  - The guest CPU performance. The left axis is showing the time in
> milliseconds
>    required to xor 1 GB of memory. This is shown per-guest CPU and
> combined all
>    CPUs.
> 
>  - The host CPU utilization. The right axis is showing the overall QEMU process
>    CPU utilization, and the per-VCPU utilization.
> 
> Note that the charts are interactive - you can turn on/off each plot line and
> zoom in by selecting regions on the chart.
> 
> 
> Some interesting things that I have observed with this
> 
>  - At the start of each iteration of migration there is a distinct drop in
>    guest CPU performance as shown by a spike in the guest CPU time lines.
>    Performance would drop from 200ms/GB to 400ms/GB. Presumably this is
>    related to QEMU recalculating the dirty bitmap for the guest RAM. See
>    the spikes in the green line in:
> 
>     https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-
> remote-1gb-1cpu/post-copy-bandwidth/post-copy-bw-1gbs.html
> 
>  - For the larger sized guests, the auto-converge code has to throttle the
>    guest to as much as 90% or more before it is able to meet the 500ms max
>    downtime value
> 
>     https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-
> remote-1gb-1cpu/auto-converge-bandwidth/auto-converge-bw-1gbs.html
> 
>    Even then I often saw tests aborting as they hit the max number of
>    iterations I permitted (30 iters max)
> 
>     https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-
> remote-8gb-4cpu/auto-converge-bandwidth/auto-converge-bw-10gbs.html
> 
>  - MT compression is actively harmful to chances of successful migration
> when
>    the guest RAM is not compression friendly. My work load is worst case
> since
>    it is splattering RAM with totally random bytes. The MT compression is
>    dramatically increasing the time for each iteration as we bottleneck on CPU
>    compression speed, leaving the network largely idle. This causes migration
>    which would have completed without compression, to fail. It also burns
> huge
>    amounts of host CPU time
> 
>      https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-
> remote-1gb-1cpu/compr-mt/compr-mt-threads-4.html
> 
>  - XBZRLE compression did not have as much of a CPU peformance penalty on
> the
>    host as MT comprssion, but also did not help migration to actually complete.
>    Again this is largely due to the workload being the worst case scenario with
>    random bytes. The downside is obviously the potentially significant
> memory
>    overhead on the host due to the cache sizing
> 
>     https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-
> remote-1gb-1cpu/compr-xbzrle/compr-xbzrle-cache-50.html
> 
> 
>  - Post-copy, by its very nature, obviously ensured that the migraton would
>    complete. While post-copy was running in pre-copy mode there was a
> somewhat
>    chaotic small impact on guest CPU performance, causing performance to
>    periodically oscillate between 400ms/GB and 800ms/GB. This is less than
>    the impact at the start of each migration iteration which was 1000ms/GB
>    in this test. There was also a massive penalty at time of switchover from
>    pre to post copy, as to be expected. The migration completed in post-copy
>    phase quite quickly though. For this workload, number of iterations in
>    pre-copy mode before switching to post-copy did not have much impact. I
>    expect a less extreme workload would have shown more interesting results
>    wrt number of iterations of pre-copy:
> 
>     https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-
> remote-8gb-4cpu/post-copy-iters.html
> 
> 
> Overall, if we're looking for a solution that can guarantee completion under
> the most extreme guest workload, then only post-copy & autoconverge
> appear upto the job.
> 
> The MT compression is seriously harmful to migration and has severe CPU
> overhead. The XBZRLE compression is moderatly harmful to migration and
> has potentilly severa memory overhead for large cache sizes to make it
> useful.
> 
> While auto-converge can ensure that guest migration completes, it has a
> pretty significantly long term impact on guest CPU performance to achieve
> this. ie the guest spends a long time in pre-copy mode with its CPUs very
> dramatically throttled down. The level of throttling required makes one
> wonder whether it is worth using, against simply pausing the guest workload.
> The latter has a hard blackout period, but over a quite short time frame if
> network speed is fast.
> 
> The post-copy code does have an impact on guest performance while in pre
> copy mode, vs a plain migration. It also has a fairly high spike when in post-
> copy mode, but this last for a pretty short time. As compared to auto-
> converge, it is able to ensure the migration completes in a finite time without
> having a prolonged impact on guest CPU performance. The penalty during
> the post-copy phase is on a par with the penalty impose by auto-converge
> when it has to throttle to 90%+.
> 
> 
> Overall, in the context of a worst case guest workload, it appears that post-
> copy is the clear winning strategy ensuring completion of migration without
> imposing an long duration penalty on guest peformance. If the risk of failure
> from post-copy is unacceptable then auto-converge is a good fallback option,
> if the long duration guest CPU penalty can be accepted.
> 
> The compression options are only worth using if the host has free CPU
> resources, and the guest RAM is believed to be compression friendly, as they
> steal significant CPU time away from guests in order to run compression,
> often with a negative impact on migration completion chances.
> 

MT compression should only be used when network bandwidth is the bottleneck
that affects live migration. Using another, faster (de)compression algorithm
can reduce the CPU overhead.
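
(That matches a simple throughput model - all numbers below are illustrative
assumptions: the rate at which guest RAM is actually delivered is bounded both
by how fast the threads can compress and by how fast the wire can carry the
compressed stream, so compression only wins when the wire, not the compressor,
is the limit:)

    def effective_rate_mbs(net_mbs, compress_mbs, ratio):
        # ratio = compressed size / original size (close to 1.0 for random data)
        with_compression = min(compress_mbs, net_mbs / ratio)
        without_compression = net_mbs
        return with_compression, without_compression

    print(effective_rate_mbs(1100, 300, 0.4))  # fast 10GigE link: (300, 1100)
    print(effective_rate_mbs(110, 300, 0.4))   # slow 1Gb/s link: (275.0, 110)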

Liang

> Looking at migration in general, even with a 10-GiG-E NIC and RDMA
> transport it is possible for a single guest to provide a workload that will
> saturate the network during migration & thus prevent completion.
> Based on this, there is little point in attempting to run migrations in parallel
> on the same host, unless multiple NICs are available, as parallel migrations
> would reduce the chances of either one ever completing. Better reliability &
> faster overall completion would likely be achieved by fully serializing
> migration operations per host.
> 
> There is clearly scope for more investigation here, in particular
> 
>  - Produce some alternative guest workloads that try to present
>    a more "average" scenario workload, instead of the worst-case.
>    These would likely allow compression to have some positive
>    impact.
> 
>  - Try various combinations of strategies. For example, combining
>    post-copy and auto-converge at the same time, or compression
>    combined with either post-copy or auto-converge.
> 
>  - Investigate block migration performance too, with NBD migration
>    server.
> 
>  - Investigate effect of dynamically changing max downtime value
>    during migration, rather than using a fixed 500ms value.
> 
> 
> Daniel P. Berrange (6):
>   scripts: add __init__.py file to scripts/qmp/
>   scripts: add a 'debug' parameter to QEMUMonitorProtocol
>   scripts: refactor the VM class in iotests for reuse
>   scripts: set timeout when waiting for qemu monitor connection
>   scripts: ensure monitor socket has SO_REUSEADDR set
>   tests: introduce a framework for testing migration performance
> 
>  configure                               |   2 +
>  scripts/qemu.py                         | 202 +++++++++++
>  scripts/qmp/__init__.py                 |   0
>  scripts/qmp/qmp.py                      |  15 +-
>  scripts/qtest.py                        |  34 ++
>  tests/Makefile                          |  12 +
>  tests/migration/.gitignore              |   2 +
>  tests/migration/guestperf-batch.py      |  26 ++
>  tests/migration/guestperf-plot.py       |  26 ++
>  tests/migration/guestperf.py            |  27 ++
>  tests/migration/guestperf/__init__.py   |   0
>  tests/migration/guestperf/comparison.py | 124 +++++++
>  tests/migration/guestperf/engine.py     | 439 ++++++++++++++++++++++
>  tests/migration/guestperf/hardware.py   |  62 ++++
>  tests/migration/guestperf/plot.py       | 623
> ++++++++++++++++++++++++++++++++
>  tests/migration/guestperf/progress.py   | 117 ++++++
>  tests/migration/guestperf/report.py     |  98 +++++
>  tests/migration/guestperf/scenario.py   |  95 +++++
>  tests/migration/guestperf/shell.py      | 255 +++++++++++++
>  tests/migration/guestperf/timings.py    |  55 +++
>  tests/migration/stress.c                | 367 +++++++++++++++++++
>  tests/qemu-iotests/iotests.py           | 135 +------
>  22 files changed, 2583 insertions(+), 133 deletions(-)  create mode 100644
> scripts/qemu.py  create mode 100644 scripts/qmp/__init__.py  create mode
> 100644 tests/migration/.gitignore  create mode 100755
> tests/migration/guestperf-batch.py
>  create mode 100755 tests/migration/guestperf-plot.py  create mode 100755
> tests/migration/guestperf.py  create mode 100644
> tests/migration/guestperf/__init__.py
>  create mode 100644 tests/migration/guestperf/comparison.py
>  create mode 100644 tests/migration/guestperf/engine.py
>  create mode 100644 tests/migration/guestperf/hardware.py
>  create mode 100644 tests/migration/guestperf/plot.py  create mode 100644
> tests/migration/guestperf/progress.py
>  create mode 100644 tests/migration/guestperf/report.py
>  create mode 100644 tests/migration/guestperf/scenario.py
>  create mode 100644 tests/migration/guestperf/shell.py
>  create mode 100644 tests/migration/guestperf/timings.py
>  create mode 100644 tests/migration/stress.c
> 
> --
> 2.5.5
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v1 0/6] A migration performance testing framework
  2016-05-05 14:27 [Qemu-devel] [PATCH v1 0/6] A migration performance testing framework Daniel P. Berrange
                   ` (7 preceding siblings ...)
  2016-05-06  1:33 ` Li, Liang Z
@ 2016-05-06 10:45 ` Amit Shah
  8 siblings, 0 replies; 13+ messages in thread
From: Amit Shah @ 2016-05-06 10:45 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: qemu-devel, Juan Quintela, Dr. David Alan Gilbert

On (Thu) 05 May 2016 [15:27:54], Daniel P. Berrange wrote:
> This series of patches provides a framework for testing migration performance
> characteristics. The motivating factor for this is planning that is underway
> in OpenStack wrt making use of QEMU migration features such as compression,
> auto-converge and post-copy. The primary aim for OpenStack is to have Nova
> autonomously manage migration features & tunables to maximise chances that
> migration will complete. The problem faced is figuring out just which QEMU
> migration features are "best" suited to our needs. This means we want data
> on how well they are able to ensure completion of a migration, against the
> host resources used and the impact on the guest workload performance.

This is nice.

> 
> The test framework produced here takes a pathelogical guest workload (every
> CPU just burning 100% of time xor'ing every byte of guest memory with random
> data). This is quite a pessimistic test because most guest workloads are not
> giong to be this heavy on memory writes, and their data won't be uniformly
> random and so will be able to compress better than this test does.
> 
> With this worst case guest, I have produced a set of tests using UNIX socket,
> TCP localhost, TCP remote and RDMA remote socket transports, with both a
> 1 GB RAM + 1 CPU guest and a 8 GB RAM + 4 CPU guest.

Are typical OpenStack guests just 8 GB or 4 vCPUs?  Do you see larger
guests?  Interesting things only start when we have lots more RAM, and
lots more vCPUs to dirty that RAM.


		Amit

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v1 0/6] A migration performance testing framework
  2016-05-05 15:39 ` [Qemu-devel] [PATCH v1 0/6] A migration performance testing framework Dr. David Alan Gilbert
@ 2016-05-09 15:01   ` Daniel P. Berrange
  2016-05-09 15:36     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel P. Berrange @ 2016-05-09 15:01 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: qemu-devel, Juan Quintela, Amit Shah

On Thu, May 05, 2016 at 04:39:45PM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrange (berrange@redhat.com) wrote:
> > Some interesting things that I have observed with this

> >  - Post-copy, by its very nature, obviously ensured that the migraton would
> >    complete. While post-copy was running in pre-copy mode there was a somewhat
> >    chaotic small impact on guest CPU performance, causing performance to
> >    periodically oscillate between 400ms/GB and 800ms/GB. This is less than
> >    the impact at the start of each migration iteration which was 1000ms/GB
> >    in this test. There was also a massive penalty at time of switchover from
> >    pre to post copy, as to be expected. The migration completed in post-copy
> >    phase quite quickly though. For this workload, number of iterations in
> >    pre-copy mode before switching to post-copy did not have much impact. I
> >    expect a less extreme workload would have shown more interesting results
> >    wrt number of iterations of pre-copy:
> > 
> >     https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-8gb-4cpu/post-copy-iters.html
> 
> Hmm; I hadn't actually expected that much performance difference during the
> precopy phase (it used to in earlier postcopy versions but the later versions
> should have got simpler).  The number of iterations wouldn't make that much difference
> for your workload - because you're changing all of memory then we're going to have to
> resend it; if you had a workload where some of the memory was mostly static
> and some was rapidly changing, then one or two passes to transfer the mostly
> static data would show a benefit.

Ok, so I have repeated the tests with a standard kernel. I also measured
the exact same settings except without post-copy active, and I see the
exact same magnitude of jitter without post-copy. IOW, this is not
the fault of post-copy, it's a factor whenever migration is running. What
is most interesting is that I see greater jitter in guest performance
the higher the overall network transfer bandwidth is, i.e. with migration
throttled to 100Mb/s the jitter is massively smaller than the jitter when
it is allowed to use 10Gb/s. Also, I only see the jitter on my 4 vCPU guest,
not the 1 vCPU guest.

The QEMU process is confined to only run on 4 pCPUs, so I believe the
cause of this jitter is simply a result of the migration thread in QEMU
stealing a little time from the vCPU threads.
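
(For anyone wanting to quantify that jitter from the report files: the guest
timing records are serialized as tid/timestamp/value triples per
guestperf/timings.py, so a throwaway sketch along these lines - working
directly on the serialized record list rather than assuming any particular
key name in the report JSON - gives the per-thread spread of the ms/GB figures:)

    def per_tid_spread(records):
        # records: list of {"tid": ..., "timestamp": ..., "value": ms-per-GB}
        by_tid = {}
        for rec in records:
            by_tid.setdefault(rec["tid"], []).append(rec["value"])
        return dict((tid, max(vals) - min(vals))
                    for tid, vals in by_tid.items())

    print(per_tid_spread([{"tid": 1, "timestamp": 0, "value": 400},
                          {"tid": 1, "timestamp": 1, "value": 800}]))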

IOW completely expected and there is no penalty of having post-copy
enabled even if you never get beyond the pre-copy stage :-)

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v1 0/6] A migration performance testing framework
  2016-05-09 15:01   ` Daniel P. Berrange
@ 2016-05-09 15:36     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 13+ messages in thread
From: Dr. David Alan Gilbert @ 2016-05-09 15:36 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: qemu-devel, Juan Quintela, Amit Shah

* Daniel P. Berrange (berrange@redhat.com) wrote:
> On Thu, May 05, 2016 at 04:39:45PM +0100, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrange (berrange@redhat.com) wrote:
> > > Some interesting things that I have observed with this
> 
> > >  - Post-copy, by its very nature, obviously ensured that the migraton would
> > >    complete. While post-copy was running in pre-copy mode there was a somewhat
> > >    chaotic small impact on guest CPU performance, causing performance to
> > >    periodically oscillate between 400ms/GB and 800ms/GB. This is less than
> > >    the impact at the start of each migration iteration which was 1000ms/GB
> > >    in this test. There was also a massive penalty at time of switchover from
> > >    pre to post copy, as to be expected. The migration completed in post-copy
> > >    phase quite quickly though. For this workload, number of iterations in
> > >    pre-copy mode before switching to post-copy did not have much impact. I
> > >    expect a less extreme workload would have shown more interesting results
> > >    wrt number of iterations of pre-copy:
> > > 
> > >     https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-8gb-4cpu/post-copy-iters.html
> > 
> > Hmm; I hadn't actually expected that much performance difference during the
> > precopy phase (it used to in earlier postcopy versions but the later versions
> > should have got simpler).  The number of iterations wouldn't make that much difference
> > for your workload - because you're changing all of memory then we're going to have to
> > resend it; if you had a workload where some of the memory was mostly static
> > and some was rapidly changing, then one or two passes to transfer the mostly
> > static data would show a benefit.
> 
> Ok, so I have repeated the tests with a standard kernel. I also measured
> the exact same settings except without post-copy active, and also see
> the exact same magnitude of jitter without post-copy. IOW, this is not
> the fault of post-copy, its a factor whenever migration is running.

OK, good.

> What
> is most interesting is that I see greater jitter in guest performance,
> the higher the overall network transfer bandwidth is. ie with migration
> throttled to 100mbs, the jitter is massively smaller than the jitter when
> it is allowed to use 10gbs.

That doesn't surprise me, for a few reasons; I think there are three
main sources of overhead:
    a) The syncing of the bitmap
    b) Write faults after a sync when the guest redirties a page
    c) The CPU overhead of shuffling pages down a socket and checking if
       they're zero etc

With a lower bandwidth connection (a) happens more rarely and (c) is lower.
Also, since (a) happens more rarely, and you only fault a page once
between syncs, (b) has a lower overhead.
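
(Roughly, with invented numbers just for illustration: a bitmap sync happens
about once per iteration and an iteration lasts about dirty-RAM / bandwidth,
so how often the guest pays for (a) and (b) scales almost linearly with the
link speed:)

    def syncs_per_minute(dirty_gb, bandwidth_gbs):
        iteration_secs = dirty_gb / bandwidth_gbs   # time to push one pass
        return 60.0 / iteration_secs

    print(syncs_per_minute(8, 1.25))    # ~10Gb/s link:  ~9.4 syncs/min
    print(syncs_per_minute(8, 0.0125))  # ~100Mb/s link: ~0.09 syncs/min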

> Also, I only see the jitter on my 4 vCPU guest, not the 1 vCPU guest.
>
> The QEMU process is confined to only run on 4 pCPUs, so I believe the
> cause of this jitter is simply a result of the migration thread in QEMU
> stealing a little time from the vCPU threads.

Oh yes, that's cruel - you need an extra pCPU for migration if you've
got a fast network connection because (a) & (c) are quite expensive.

> IOW completely expected and there is no penalty of having post-copy
> enabled even if you never get beyond the pre-copy stage :-)

Great.

Dave

> Regards,
> Daniel
> -- 
> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org              -o-             http://virt-manager.org :|
> |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v1 5/6] scripts: ensure monitor socket has SO_REUSEADDR set
  2016-05-05 14:27 ` [Qemu-devel] [PATCH v1 5/6] scripts: ensure monitor socket has SO_REUSEADDR set Daniel P. Berrange
@ 2016-05-23  7:34   ` Amit Shah
  0 siblings, 0 replies; 13+ messages in thread
From: Amit Shah @ 2016-05-23  7:34 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: qemu-devel, Juan Quintela, Dr. David Alan Gilbert

On (Thu) 05 May 2016 [15:27:59], Daniel P. Berrange wrote:
> If tests use a TCP based monitor socket, the connection will
> go into a TIMED_WAIT state when the test exits. This will
> randomly prevent the test from being re-run without a certain
> time period. Set the SO_REUSEADDR flag on the socket to ensure
> we can immediately re-run the tests

Missing sign-off.
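
(For reference, the change boils down to something like the following on the
listening monitor socket before bind() - a minimal standalone sketch, not the
actual patch hunk, and the address/port here are arbitrary:)

    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # allow immediate re-bind even if connections from a previous test
    # run are still sitting in TIME_WAIT
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("127.0.0.1", 4444))
    sock.listen(1)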

		Amit

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-05-23  7:34 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-05 14:27 [Qemu-devel] [PATCH v1 0/6] A migration performance testing framework Daniel P. Berrange
2016-05-05 14:27 ` [Qemu-devel] [PATCH v1 1/6] scripts: add __init__.py file to scripts/qmp/ Daniel P. Berrange
2016-05-05 14:27 ` [Qemu-devel] [PATCH v1 2/6] scripts: add a 'debug' parameter to QEMUMonitorProtocol Daniel P. Berrange
2016-05-05 14:27 ` [Qemu-devel] [PATCH v1 3/6] scripts: refactor the VM class in iotests for reuse Daniel P. Berrange
2016-05-05 14:27 ` [Qemu-devel] [PATCH v1 4/6] scripts: set timeout when waiting for qemu monitor connection Daniel P. Berrange
2016-05-05 14:27 ` [Qemu-devel] [PATCH v1 5/6] scripts: ensure monitor socket has SO_REUSEADDR set Daniel P. Berrange
2016-05-23  7:34   ` Amit Shah
2016-05-05 14:28 ` [Qemu-devel] [PATCH v1 6/6] tests: introduce a framework for testing migration performance Daniel P. Berrange
2016-05-05 15:39 ` [Qemu-devel] [PATCH v1 0/6] A migration performance testing framework Dr. David Alan Gilbert
2016-05-09 15:01   ` Daniel P. Berrange
2016-05-09 15:36     ` Dr. David Alan Gilbert
2016-05-06  1:33 ` Li, Liang Z
2016-05-06 10:45 ` Amit Shah
