All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/8] block: generic image streaming
@ 2011-10-27 15:22 Stefan Hajnoczi
  2011-10-27 15:22 ` [Qemu-devel] [PATCH 1/8] coroutine: add co_sleep_ns() coroutine sleep function Stefan Hajnoczi
                   ` (9 more replies)
  0 siblings, 10 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2011-10-27 15:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Marcelo Tosatti, Stefan Hajnoczi

This series adds the 'block_stream' command which copies the contents of a
backing file into the image file while the VM is running.  The series builds on
copy-on-read and zero detection features which I sent out recently and I
suggest grabbing my git tree to try it out without merging these dependencies
yourself:

http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/image-streaming-api

The image streaming HMP/QMP commands are documented in the patch and also
described here:

http://wiki.qemu.org/Features/LiveBlockMigration/ImageStreamingAPI

The basic idea is to execute 'block_stream virtio0' while the guest is running.
Progress can be monitored using 'info block-jobs'.  When the streaming
operation completes it raises a QMP event.

This series includes a Python test script called test-stream.py.  When run in a
QEMU source tree it performs basic image streaming QMP tests.

TODO:
 * support 'base' argument for stream partial backing file chains
 * rate-limiting support, currently a NotSupported error is raised

My plan is to add rate-limiting shortly but the 'base' argument will require
more work later.  I'm sending these patches out to share the general direction
and let folks know what to expect as I continue to test this code.

Stefan Hajnoczi (8):
  coroutine: add co_sleep_ns() coroutine sleep function
  block: add BlockJob interface for long-running operations
  block: add image streaming block job
  qmp: add block_stream command
  qmp: add block_job_set_speed command
  qmp: add block_job_cancel command
  qmp: add query-block-jobs
  test: add image streaming test cases

 Makefile.objs          |    3 +-
 block/stream.c         |  135 ++++++++++++++++++++++++++++++++
 block_int.h            |   98 +++++++++++++++++++++++
 blockdev.c             |  183 +++++++++++++++++++++++++++++++++++++++++++
 blockdev.h             |    8 ++
 hmp-commands.hx        |   45 +++++++++++
 monitor.c              |   19 +++++
 monitor.h              |    1 +
 qemu-coroutine-sleep.c |   38 +++++++++
 qemu-coroutine.h       |    6 ++
 qerror.c               |    4 +
 qerror.h               |    3 +
 qmp-commands.hx        |  173 +++++++++++++++++++++++++++++++++++++++++
 test-stream.py         |  200 ++++++++++++++++++++++++++++++++++++++++++++++++
 trace-events           |   10 +++
 15 files changed, 925 insertions(+), 1 deletions(-)
 create mode 100644 block/stream.c
 create mode 100644 qemu-coroutine-sleep.c
 create mode 100644 test-stream.py

-- 
1.7.7

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH 1/8] coroutine: add co_sleep_ns() coroutine sleep function
  2011-10-27 15:22 [Qemu-devel] [PATCH 0/8] block: generic image streaming Stefan Hajnoczi
@ 2011-10-27 15:22 ` Stefan Hajnoczi
  2011-10-27 15:22 ` [Qemu-devel] [PATCH 2/8] block: add BlockJob interface for long-running operations Stefan Hajnoczi
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2011-10-27 15:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Marcelo Tosatti, Stefan Hajnoczi

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 Makefile.objs          |    2 +-
 qemu-coroutine-sleep.c |   38 ++++++++++++++++++++++++++++++++++++++
 qemu-coroutine.h       |    6 ++++++
 3 files changed, 45 insertions(+), 1 deletions(-)
 create mode 100644 qemu-coroutine-sleep.c

diff --git a/Makefile.objs b/Makefile.objs
index 01587c8..c290fd3 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -12,7 +12,7 @@ oslib-obj-$(CONFIG_POSIX) += oslib-posix.o qemu-thread-posix.o
 
 #######################################################################
 # coroutines
-coroutine-obj-y = qemu-coroutine.o qemu-coroutine-lock.o
+coroutine-obj-y = qemu-coroutine.o qemu-coroutine-lock.o qemu-coroutine-sleep.o
 ifeq ($(CONFIG_UCONTEXT_COROUTINE),y)
 coroutine-obj-$(CONFIG_POSIX) += coroutine-ucontext.o
 else
diff --git a/qemu-coroutine-sleep.c b/qemu-coroutine-sleep.c
new file mode 100644
index 0000000..fd65274
--- /dev/null
+++ b/qemu-coroutine-sleep.c
@@ -0,0 +1,38 @@
+/*
+ * QEMU coroutine sleep
+ *
+ * Copyright IBM, Corp. 2011
+ *
+ * Authors:
+ *  Stefan Hajnoczi    <stefanha@linux.vnet.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "qemu-coroutine.h"
+#include "qemu-timer.h"
+
+typedef struct CoSleepCB {
+    QEMUTimer *ts;
+    Coroutine *co;
+} CoSleepCB;
+
+static void co_sleep_cb(void *opaque)
+{
+    CoSleepCB *sleep_cb = opaque;
+
+    qemu_free_timer(sleep_cb->ts);
+    qemu_coroutine_enter(sleep_cb->co, NULL);
+}
+
+void coroutine_fn co_sleep_ns(QEMUClock *clock, int64_t ns)
+{
+    CoSleepCB sleep_cb = {
+        .co = qemu_coroutine_self(),
+    };
+    sleep_cb.ts = qemu_new_timer(clock, SCALE_NS, co_sleep_cb, &sleep_cb);
+    qemu_mod_timer(sleep_cb.ts, qemu_get_clock_ns(clock) + ns);
+    qemu_coroutine_yield();
+}
diff --git a/qemu-coroutine.h b/qemu-coroutine.h
index b8fc4f4..e824327 100644
--- a/qemu-coroutine.h
+++ b/qemu-coroutine.h
@@ -17,6 +17,7 @@
 
 #include <stdbool.h>
 #include "qemu-queue.h"
+#include "qemu-timer.h"
 
 /**
  * Coroutines are a mechanism for stack switching and can be used for
@@ -188,4 +189,9 @@ void qemu_co_rwlock_wrlock(CoRwlock *lock);
  */
 void qemu_co_rwlock_unlock(CoRwlock *lock);
 
+/**
+ * Yield the coroutine for a given duration
+ */
+void coroutine_fn co_sleep_ns(QEMUClock *clock, int64_t ns);
+
 #endif /* QEMU_COROUTINE_H */
-- 
1.7.7

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH 2/8] block: add BlockJob interface for long-running operations
  2011-10-27 15:22 [Qemu-devel] [PATCH 0/8] block: generic image streaming Stefan Hajnoczi
  2011-10-27 15:22 ` [Qemu-devel] [PATCH 1/8] coroutine: add co_sleep_ns() coroutine sleep function Stefan Hajnoczi
@ 2011-10-27 15:22 ` Stefan Hajnoczi
  2011-10-27 15:22 ` [Qemu-devel] [PATCH 3/8] block: add image streaming block job Stefan Hajnoczi
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2011-10-27 15:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Marcelo Tosatti, Stefan Hajnoczi

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 block_int.h |   95 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 95 insertions(+), 0 deletions(-)

diff --git a/block_int.h b/block_int.h
index 8295dae..4da1b74 100644
--- a/block_int.h
+++ b/block_int.h
@@ -51,6 +51,37 @@ typedef struct AIOPool {
     BlockDriverAIOCB *free_aiocb;
 } AIOPool;
 
+typedef void BlockJobCancelFunc(void *opaque);
+typedef struct BlockJob BlockJob;
+typedef struct BlockJobType {
+    /** Derived BlockJob struct size */
+    size_t instance_size;
+
+    /** String describing the operation, part of query-block-jobs QMP API */
+    const char *job_type;
+
+    /** Is the block_job_set_speed() rate-limiting interface supported? */
+    bool set_speed_supported;
+} BlockJobType;
+
+/**
+ * Long-running operation on a BlockDriverState
+ */
+struct BlockJob {
+    const BlockJobType *job_type;
+    BlockDriverState *bs;
+
+    /* These fields are published by the query-block-jobs QMP API */
+    int64_t offset;
+    int64_t len;
+    int64_t speed;
+
+    BlockDriverCompletionFunc *cb;
+    void *opaque;
+    BlockJobCancelFunc *cancel_cb;
+    void *cancel_opaque;
+};
+
 struct BlockDriver {
     const char *format_name;
     int instance_size;
@@ -228,6 +259,9 @@ struct BlockDriverState {
 
     int request_tracking;       /* reference count, 0 means off */
     QLIST_HEAD(, BdrvTrackedRequest) tracked_requests;
+
+    /* long-running background operation */
+    BlockJob *job;
 };
 
 struct BlockDriverAIOCB {
@@ -251,4 +285,65 @@ void qemu_aio_release(void *p);
 int is_windows_drive(const char *filename);
 #endif
 
+static inline void *block_job_create(const BlockJobType *job_type,
+                                     BlockDriverState *bs,
+                                     BlockDriverCompletionFunc *cb,
+                                     void *opaque)
+{
+    BlockJob *job;
+
+    if (bdrv_in_use(bs)) {
+        return NULL;
+    }
+    bdrv_set_in_use(bs, 1);
+
+    job = g_malloc0(job_type->instance_size);
+    job->job_type      = job_type;
+    job->bs            = bs;
+    job->cb            = cb;
+    job->opaque        = opaque;
+    bs->job = job;
+    return job;
+}
+
+static inline void block_job_complete(BlockJob *job, int ret)
+{
+    BlockDriverState *bs = job->bs;
+
+    assert(bs->job == job);
+    if (job->cancel_cb) {
+        job->cancel_cb(job->cancel_opaque);
+    } else {
+        job->cb(job->opaque, ret);
+    }
+    bs->job = NULL;
+    g_free(job);
+    bdrv_set_in_use(bs, 0);
+}
+
+static inline int block_job_set_speed(BlockJob *job, int64_t value)
+{
+    if (!job->job_type->set_speed_supported) {
+        return -ENOTSUP;
+    }
+    job->speed = value;
+    return 0;
+}
+
+/*
+ * Caller must wait for the completion callback to be invoked
+ */
+static inline void block_job_cancel(BlockJob *job, BlockJobCancelFunc *cb,
+                                    void *opaque)
+{
+    assert(!job->cancel_cb);
+    job->cancel_cb     = cb;
+    job->cancel_opaque = opaque;
+}
+
+static inline bool block_job_is_cancelled(BlockJob *job)
+{
+    return job->cancel_cb;
+}
+
 #endif /* BLOCK_INT_H */
-- 
1.7.7

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH 3/8] block: add image streaming block job
  2011-10-27 15:22 [Qemu-devel] [PATCH 0/8] block: generic image streaming Stefan Hajnoczi
  2011-10-27 15:22 ` [Qemu-devel] [PATCH 1/8] coroutine: add co_sleep_ns() coroutine sleep function Stefan Hajnoczi
  2011-10-27 15:22 ` [Qemu-devel] [PATCH 2/8] block: add BlockJob interface for long-running operations Stefan Hajnoczi
@ 2011-10-27 15:22 ` Stefan Hajnoczi
  2011-11-01 18:06   ` Marcelo Tosatti
  2011-10-27 15:22 ` [Qemu-devel] [PATCH 4/8] qmp: add block_stream command Stefan Hajnoczi
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 17+ messages in thread
From: Stefan Hajnoczi @ 2011-10-27 15:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Marcelo Tosatti, Stefan Hajnoczi

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 Makefile.objs  |    1 +
 block/stream.c |  135 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 block_int.h    |    3 +
 trace-events   |    4 ++
 4 files changed, 143 insertions(+), 0 deletions(-)
 create mode 100644 block/stream.c

diff --git a/Makefile.objs b/Makefile.objs
index c290fd3..802db96 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -34,6 +34,7 @@ block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow
 block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-nested-y += qed-check.o
 block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
+block-nested-y += stream.o
 block-nested-$(CONFIG_WIN32) += raw-win32.o
 block-nested-$(CONFIG_POSIX) += raw-posix.o
 block-nested-$(CONFIG_CURL) += curl.o
diff --git a/block/stream.c b/block/stream.c
new file mode 100644
index 0000000..8cdf566
--- /dev/null
+++ b/block/stream.c
@@ -0,0 +1,135 @@
+/*
+ * Image streaming
+ *
+ * Copyright IBM, Corp. 2011
+ *
+ * Authors:
+ *  Stefan Hajnoczi   <stefanha@linux.vnet.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "trace.h"
+#include "block_int.h"
+
+enum {
+    /*
+     * Size of data buffer for populating the image file.  This should be large
+     * enough to process multiple clusters in a single call, so that populating
+     * contiguous regions of the image is efficient.
+     */
+    STREAM_BUFFER_SIZE = 512 * 1024, /* in bytes */
+};
+
+typedef struct StreamBlockJob {
+    BlockJob common;
+    BlockDriverState *base;
+} StreamBlockJob;
+
+static int coroutine_fn stream_populate(BlockDriverState *bs,
+                                        int64_t sector_num, int nb_sectors,
+                                        void *buf)
+{
+    struct iovec iov = {
+        .iov_base = buf,
+        .iov_len  = nb_sectors * BDRV_SECTOR_SIZE,
+    };
+    QEMUIOVector qiov;
+
+    qemu_iovec_init_external(&qiov, &iov, 1);
+
+    /* Copy-on-read the unallocated clusters */
+    return bdrv_co_readv(bs, sector_num, nb_sectors, &qiov);
+}
+
+static int stream_one_iteration(StreamBlockJob *s, int64_t sector_num,
+                                void *buf, int max_sectors, int *n)
+{
+    BlockDriverState *bs = s->common.bs;
+    int ret;
+
+    trace_stream_one_iteration(s, sector_num, max_sectors);
+
+    ret = bdrv_is_allocated(bs, sector_num, max_sectors, n);
+    if (ret < 0) {
+        return ret;
+    }
+    if (!ret) {
+        ret = stream_populate(bs, sector_num, *n, buf);
+    }
+    return ret;
+}
+
+static void coroutine_fn stream_run(void *opaque)
+{
+    StreamBlockJob *s = opaque;
+    BlockDriverState *bs = s->common.bs;
+    int64_t sector_num, end;
+    int ret = 0;
+    int n;
+    void *buf;
+
+    buf = qemu_blockalign(bs, STREAM_BUFFER_SIZE);
+    s->common.len = bdrv_getlength(bs);
+    bdrv_get_geometry(bs, (uint64_t *)&end);
+
+    bdrv_set_zero_detection(bs, true);
+    bdrv_set_copy_on_read(bs, true);
+
+    for (sector_num = 0; sector_num < end; sector_num += n) {
+        if (block_job_is_cancelled(&s->common)) {
+            break;
+        }
+
+        /* TODO rate-limit */
+        /* Note that even when no rate limit is applied we need to yield with
+         * no pending I/O here so that qemu_aio_flush() is able to return.
+         */
+        co_sleep_ns(rt_clock, 0);
+
+        ret = stream_one_iteration(s, sector_num, buf,
+                                   STREAM_BUFFER_SIZE / BDRV_SECTOR_SIZE, &n);
+        if (ret < 0) {
+            break;
+        }
+
+        /* Publish progress */
+        s->common.offset += n * BDRV_SECTOR_SIZE;
+    }
+
+    bdrv_set_copy_on_read(bs, false);
+    bdrv_set_zero_detection(bs, false);
+
+    if (sector_num == end && ret == 0) {
+        bdrv_change_backing_file(bs, NULL, NULL);
+    }
+
+    qemu_vfree(buf);
+    block_job_complete(&s->common, ret);
+}
+
+static BlockJobType stream_job_type = {
+    .instance_size = sizeof(StreamBlockJob),
+    .job_type      = "stream",
+};
+
+int stream_start(BlockDriverState *bs, BlockDriverState *base,
+                 BlockDriverCompletionFunc *cb, void *opaque)
+{
+    StreamBlockJob *s;
+    Coroutine *co;
+
+    if (bs->job) {
+        return -EBUSY;
+    }
+
+    s = block_job_create(&stream_job_type, bs, cb, opaque);
+    s->base = base;
+
+    co = qemu_coroutine_create(stream_run);
+    trace_stream_start(bs, base, s, co, opaque);
+    qemu_coroutine_enter(co, s);
+    return 0;
+}
diff --git a/block_int.h b/block_int.h
index 4da1b74..c454170 100644
--- a/block_int.h
+++ b/block_int.h
@@ -346,4 +346,7 @@ static inline bool block_job_is_cancelled(BlockJob *job)
     return job->cancel_cb;
 }
 
+int stream_start(BlockDriverState *bs, BlockDriverState *base,
+                 BlockDriverCompletionFunc *cb, void *opaque);
+
 #endif /* BLOCK_INT_H */
diff --git a/trace-events b/trace-events
index e3a1e6f..487d560 100644
--- a/trace-events
+++ b/trace-events
@@ -71,6 +71,10 @@ bdrv_co_writev(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num %"
 bdrv_co_io_em(void *bs, int64_t sector_num, int nb_sectors, int is_write, void *acb) "bs %p sector_num %"PRId64" nb_sectors %d is_write %d acb %p"
 bdrv_co_copy_on_readv(void *bs, int64_t sector_num, int nb_sectors, int64_t cluster_sector_num, int cluster_nb_sectors) "bs %p sector_num %"PRId64" nb_sectors %d cluster_sector_num %"PRId64" cluster_nb_sectors %d"
 
+# block/stream.c
+stream_one_iteration(void *s, int64_t sector_num, int max_sectors) "s %p sector_num %"PRId64" max_sectors %d"
+stream_start(void *bs, void *base, void *s, void *co, void *opaque) "bs %p base %p s %p co %p opaque %p"
+
 # hw/virtio-blk.c
 virtio_blk_req_complete(void *req, int status) "req %p status %d"
 virtio_blk_rw_complete(void *req, int ret) "req %p ret %d"
-- 
1.7.7

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH 4/8] qmp: add block_stream command
  2011-10-27 15:22 [Qemu-devel] [PATCH 0/8] block: generic image streaming Stefan Hajnoczi
                   ` (2 preceding siblings ...)
  2011-10-27 15:22 ` [Qemu-devel] [PATCH 3/8] block: add image streaming block job Stefan Hajnoczi
@ 2011-10-27 15:22 ` Stefan Hajnoczi
  2011-10-27 15:22 ` [Qemu-devel] [PATCH 5/8] qmp: add block_job_set_speed command Stefan Hajnoczi
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2011-10-27 15:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Marcelo Tosatti, Stefan Hajnoczi

Add the block_stream command, which starts copy backing file contents
into the image file.  Later patches add control over the background copy
speed, cancelation, and querying running streaming operations.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 blockdev.c      |   69 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 blockdev.h      |    1 +
 hmp-commands.hx |   14 +++++++++++
 monitor.c       |    3 ++
 monitor.h       |    1 +
 qerror.c        |    4 +++
 qerror.h        |    3 ++
 qmp-commands.hx |   65 +++++++++++++++++++++++++++++++++++++++++++++++++++
 trace-events    |    4 +++
 9 files changed, 164 insertions(+), 0 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 1dd0f23..de911de 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -13,8 +13,10 @@
 #include "qerror.h"
 #include "qemu-option.h"
 #include "qemu-config.h"
+#include "qemu-objects.h"
 #include "sysemu.h"
 #include "block_int.h"
+#include "trace.h"
 
 static QTAILQ_HEAD(drivelist, DriveInfo) drives = QTAILQ_HEAD_INITIALIZER(drives);
 
@@ -782,3 +784,70 @@ int do_block_resize(Monitor *mon, const QDict *qdict, QObject **ret_data)
 
     return 0;
 }
+
+static QObject *qobject_from_block_job(BlockJob *job)
+{
+    return qobject_from_jsonf("{ 'type': %s,"
+                              "'device': %s,"
+                              "'len': %" PRId64 ","
+                              "'offset': %" PRId64 ","
+                              "'speed': %" PRId64 " }",
+                              job->job_type->job_type,
+                              bdrv_get_device_name(job->bs),
+                              job->len,
+                              job->offset,
+                              job->speed);
+}
+
+static void block_stream_cb(void *opaque, int ret)
+{
+    BlockDriverState *bs = opaque;
+    QObject *obj;
+
+    trace_block_stream_cb(bs, bs->job, ret);
+
+    assert(bs->job);
+    obj = qobject_from_block_job(bs->job);
+    if (ret < 0) {
+        QDict *dict = qobject_to_qdict(obj);
+        qdict_put(dict, "error", qstring_from_str(strerror(-ret)));
+    }
+
+    monitor_protocol_event(QEVENT_BLOCK_JOB_COMPLETED, obj);
+    qobject_decref(obj);
+}
+
+int do_block_stream(Monitor *mon, const QDict *params, QObject **ret_data)
+{
+    const char *device = qdict_get_str(params, "device");
+    const char *base = qdict_get_try_str(params, "base");
+    BlockDriverState *bs;
+    int ret;
+
+    bs = bdrv_find(device);
+    if (!bs) {
+        qerror_report(QERR_DEVICE_NOT_FOUND, device);
+        return -1;
+    }
+
+    /* Base device not supported */
+    if (base) {
+        qerror_report(QERR_NOT_SUPPORTED);
+        return -1;
+    }
+
+    ret = stream_start(bs, NULL, block_stream_cb, bs);
+    if (ret < 0) {
+        switch (ret) {
+        case -EBUSY:
+            qerror_report(QERR_DEVICE_IN_USE, device);
+            return -1;
+        default:
+            qerror_report(QERR_NOT_SUPPORTED);
+            return -1;
+        }
+    }
+
+    trace_do_block_stream(bs, bs->job);
+    return 0;
+}
diff --git a/blockdev.h b/blockdev.h
index 3587786..ad98d37 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -65,5 +65,6 @@ int do_change_block(Monitor *mon, const char *device,
 int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_block_resize(Monitor *mon, const QDict *qdict, QObject **ret_data);
+int do_block_stream(Monitor *mon, const QDict *qdict, QObject **ret_data);
 
 #endif
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 05a1498..2aeb2e0 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -70,6 +70,20 @@ but should be used with extreme caution.  Note that this command only
 resizes image files, it can not resize block devices like LVM volumes.
 ETEXI
 
+    {
+        .name       = "block_stream",
+        .args_type  = "device:B,base:s?",
+        .params     = "device [base]",
+        .help       = "copy data from a backing file into a block device",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_block_stream,
+    },
+
+STEXI
+@item block_stream
+@findex block_stream
+Copy data from a backing file into a block device.
+ETEXI
 
     {
         .name       = "eject",
diff --git a/monitor.c b/monitor.c
index ffda0fe..38addcf 100644
--- a/monitor.c
+++ b/monitor.c
@@ -482,6 +482,9 @@ void monitor_protocol_event(MonitorEvent event, QObject *data)
         case QEVENT_SPICE_DISCONNECTED:
             event_name = "SPICE_DISCONNECTED";
             break;
+        case QEVENT_BLOCK_JOB_COMPLETED:
+            event_name = "BLOCK_JOB_COMPLETED";
+            break;
         default:
             abort();
             break;
diff --git a/monitor.h b/monitor.h
index 4f2d328..135c927 100644
--- a/monitor.h
+++ b/monitor.h
@@ -35,6 +35,7 @@ typedef enum MonitorEvent {
     QEVENT_SPICE_CONNECTED,
     QEVENT_SPICE_INITIALIZED,
     QEVENT_SPICE_DISCONNECTED,
+    QEVENT_BLOCK_JOB_COMPLETED,
     QEVENT_MAX,
 } MonitorEvent;
 
diff --git a/qerror.c b/qerror.c
index 68998d4..f531afa 100644
--- a/qerror.c
+++ b/qerror.c
@@ -162,6 +162,10 @@ static const QErrorStringTable qerror_table[] = {
         .desc      = "No '%(bus)' bus found for device '%(device)'",
     },
     {
+        .error_fmt = QERR_NOT_SUPPORTED,
+        .desc      = "Not supported",
+    },
+    {
         .error_fmt = QERR_OPEN_FILE_FAILED,
         .desc      = "Could not open '%(filename)'",
     },
diff --git a/qerror.h b/qerror.h
index d4bfcfd..3988610 100644
--- a/qerror.h
+++ b/qerror.h
@@ -141,6 +141,9 @@ QError *qobject_to_qerror(const QObject *obj);
 #define QERR_NO_BUS_FOR_DEVICE \
     "{ 'class': 'NoBusForDevice', 'data': { 'device': %s, 'bus': %s } }"
 
+#define QERR_NOT_SUPPORTED \
+    "{ 'class': 'NotSupported', 'data': {} }"
+
 #define QERR_OPEN_FILE_FAILED \
     "{ 'class': 'OpenFileFailed', 'data': { 'filename': %s } }"
 
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 4328e8b..31cde4b 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -685,6 +685,71 @@ Example:
 EQMP
 
     {
+        .name       = "block_stream",
+        .args_type  = "device:B,base:s?",
+        .params     = "device [base]",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_block_stream,
+    },
+
+SQMP
+block_stream
+------------
+
+Copy data from a backing file into a block device.
+
+The block streaming operation is performed in the background until the entire
+backing file has been copied.  This command returns immediately once streaming
+has started.  The status of ongoing block streaming operations can be checked
+with query-block-jobs.  The operation can be stopped before it has completed
+using the block_job_cancel command.
+
+If a base file is specified then sectors are not copied from that base file and
+its backing chain.  When streaming completes the image file will have the base
+file as its backing file.  This can be used to stream a subset of the backing
+file chain instead of flattening the entire image.
+
+On successful completion the image file is updated to drop the backing file.
+
+Arguments:
+
+- device: device name (json-string)
+- base:   common backing file (json-string, optional)
+
+Errors:
+
+DeviceInUse:    streaming is already active on this device
+DeviceNotFound: device name is invalid
+NotSupported:   image streaming is not supported by this device
+
+Events:
+
+On completion the BLOCK_JOB_COMPLETED event is raised with the following
+fields:
+
+- type:     job type ("stream" for image streaming, json-string)
+- device:   device name (json-string)
+- len:      maximum progress value (json-int)
+- offset:   current progress value (json-int)
+- speed:    rate limit, bytes per second (json-int)
+- error:    error message (json-string, only on error)
+
+The completion event is raised both on success and on failure.  On
+success offset is equal to len.  On failure offset and len can be
+used to indicate at which point the operation failed.
+
+On failure the error field contains a human-readable error message.  There are
+no semantics other than that streaming has failed and clients should not try
+to interpret the error string.
+
+Examples:
+
+-> { "execute": "block_stream", "arguments": { "device": "virtio0" } }
+<- { "return":  {} }
+
+EQMP
+
+    {
         .name       = "blockdev-snapshot-sync",
         .args_type  = "device:B,snapshot-file:s?,format:s?",
         .params     = "device [new-image-file] [format]",
diff --git a/trace-events b/trace-events
index 487d560..15a6b7a 100644
--- a/trace-events
+++ b/trace-events
@@ -75,6 +75,10 @@ bdrv_co_copy_on_readv(void *bs, int64_t sector_num, int nb_sectors, int64_t clus
 stream_one_iteration(void *s, int64_t sector_num, int max_sectors) "s %p sector_num %"PRId64" max_sectors %d"
 stream_start(void *bs, void *base, void *s, void *co, void *opaque) "bs %p base %p s %p co %p opaque %p"
 
+# blockdev.c
+block_stream_cb(void *bs, void *job, int ret) "bs %p job %p ret %d"
+do_block_stream(void *bs, void *job) "bs %p job %p"
+
 # hw/virtio-blk.c
 virtio_blk_req_complete(void *req, int status) "req %p status %d"
 virtio_blk_rw_complete(void *req, int ret) "req %p ret %d"
-- 
1.7.7

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH 5/8] qmp: add block_job_set_speed command
  2011-10-27 15:22 [Qemu-devel] [PATCH 0/8] block: generic image streaming Stefan Hajnoczi
                   ` (3 preceding siblings ...)
  2011-10-27 15:22 ` [Qemu-devel] [PATCH 4/8] qmp: add block_stream command Stefan Hajnoczi
@ 2011-10-27 15:22 ` Stefan Hajnoczi
  2011-10-27 15:22 ` [Qemu-devel] [PATCH 6/8] qmp: add block_job_cancel command Stefan Hajnoczi
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2011-10-27 15:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Marcelo Tosatti, Stefan Hajnoczi

Add block_job_set_speed, which sets the maximum speed for a background
block operation.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 blockdev.c      |   33 +++++++++++++++++++++++++++++++++
 blockdev.h      |    2 ++
 hmp-commands.hx |   15 +++++++++++++++
 qmp-commands.hx |   35 +++++++++++++++++++++++++++++++++++
 4 files changed, 85 insertions(+), 0 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index de911de..781825b 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -851,3 +851,36 @@ int do_block_stream(Monitor *mon, const QDict *params, QObject **ret_data)
     trace_do_block_stream(bs, bs->job);
     return 0;
 }
+
+static BlockJob *find_block_job(const char *device)
+{
+    BlockDriverState *bs;
+
+    bs = bdrv_find(device);
+    if (!bs || !bs->job) {
+        return NULL;
+    }
+    return bs->job;
+}
+
+int do_block_job_set_speed(Monitor *mon, const QDict *params,
+                           QObject **ret_data)
+{
+    const char *device = qdict_get_str(params, "device");
+    BlockJob *job = find_block_job(device);
+    int64_t value;
+    int ret;
+
+    if (!job) {
+        qerror_report(QERR_DEVICE_NOT_ACTIVE, device);
+        return -1;
+    }
+
+    value = qdict_get_int(params, "value");
+    ret = block_job_set_speed(job, value);
+    if (ret == -ENOTSUP) {
+        qerror_report(QERR_NOT_SUPPORTED);
+        return -1;
+    }
+    return 0;
+}
diff --git a/blockdev.h b/blockdev.h
index ad98d37..6b48405 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -66,5 +66,7 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_block_resize(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_block_stream(Monitor *mon, const QDict *qdict, QObject **ret_data);
+int do_block_job_set_speed(Monitor *mon, const QDict *qdict,
+                           QObject **ret_data);
 
 #endif
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 2aeb2e0..2cdfa0b 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -86,6 +86,21 @@ Copy data from a backing file into a block device.
 ETEXI
 
     {
+        .name       = "block_job_set_speed",
+        .args_type  = "device:B,value:o",
+        .params     = "device value",
+        .help       = "set maximum speed for a background block operation",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_block_job_set_speed,
+    },
+
+STEXI
+@item block_job_set_stream
+@findex block_job_set_stream
+Set maximum speed for a background block operation.
+ETEXI
+
+    {
         .name       = "eject",
         .args_type  = "force:-f,device:B",
         .params     = "[-f] device",
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 31cde4b..6cfb548 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -750,6 +750,41 @@ Examples:
 EQMP
 
     {
+        .name       = "block_job_set_speed",
+        .args_type  = "device:B,value:o",
+        .params     = "device value",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_block_job_set_speed,
+    },
+
+SQMP
+
+block_job_set_speed
+-------------------
+
+Set maximum speed for a background block operation.
+
+This command can only be issued when there is an active block job.
+
+Throttling can be disabled by setting the speed to 0.
+
+Arguments:
+
+- device: device name (json-string)
+- value:  maximum speed, in bytes per second (json-int)
+
+Errors:
+NotSupported:    job type does not support speed setting
+DeviceNotActive: streaming is not active on this device
+
+Example:
+
+-> { "execute": "block_job_set_speed",
+    "arguments": { "device": "virtio0", "value": 1024 } }
+
+EQMP
+
+    {
         .name       = "blockdev-snapshot-sync",
         .args_type  = "device:B,snapshot-file:s?,format:s?",
         .params     = "device [new-image-file] [format]",
-- 
1.7.7

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH 6/8] qmp: add block_job_cancel command
  2011-10-27 15:22 [Qemu-devel] [PATCH 0/8] block: generic image streaming Stefan Hajnoczi
                   ` (4 preceding siblings ...)
  2011-10-27 15:22 ` [Qemu-devel] [PATCH 5/8] qmp: add block_job_set_speed command Stefan Hajnoczi
@ 2011-10-27 15:22 ` Stefan Hajnoczi
  2011-10-27 15:22 ` [Qemu-devel] [PATCH 7/8] qmp: add query-block-jobs Stefan Hajnoczi
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2011-10-27 15:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Marcelo Tosatti, Stefan Hajnoczi

Add block_job_cancel, which stops an active block streaming operation.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 blockdev.c      |   35 +++++++++++++++++++++++++++++++++++
 blockdev.h      |    3 +++
 hmp-commands.hx |   16 ++++++++++++++++
 qmp-commands.hx |   41 +++++++++++++++++++++++++++++++++++++++++
 trace-events    |    2 ++
 5 files changed, 97 insertions(+), 0 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 781825b..5ad7d3b 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -884,3 +884,38 @@ int do_block_job_set_speed(Monitor *mon, const QDict *params,
     }
     return 0;
 }
+
+typedef struct BlockJobCancelData {
+    MonitorCompletion *cb;
+    void *opaque;
+} BlockJobCancelData;
+
+static void do_block_job_cancel_cb(void *opaque)
+{
+    BlockJobCancelData *cancel_data = opaque;
+
+    trace_block_job_cancel_cb(cancel_data, cancel_data->opaque);
+
+    cancel_data->cb(cancel_data->opaque, NULL);
+    g_free(cancel_data);
+}
+
+int do_block_job_cancel(Monitor *mon, const QDict *params,
+                        MonitorCompletion *cb, void *opaque)
+{
+    const char *device = qdict_get_str(params, "device");
+    BlockJob *job = find_block_job(device);
+    BlockJobCancelData *cancel_data;
+
+    if (!job) {
+        qerror_report(QERR_DEVICE_NOT_ACTIVE, device);
+        return -1;
+    }
+
+    cancel_data = g_malloc(sizeof(*cancel_data));
+    cancel_data->cb = cb;
+    cancel_data->opaque = opaque;
+    trace_do_block_job_cancel(job, cancel_data, opaque);
+    block_job_cancel(job, do_block_job_cancel_cb, cancel_data);
+    return 0;
+}
diff --git a/blockdev.h b/blockdev.h
index 6b48405..7d3db30 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -11,6 +11,7 @@
 #define BLOCKDEV_H
 
 #include "block.h"
+#include "monitor.h"
 #include "qemu-queue.h"
 
 void blockdev_mark_auto_del(BlockDriverState *bs);
@@ -68,5 +69,7 @@ int do_block_resize(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_block_stream(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_block_job_set_speed(Monitor *mon, const QDict *qdict,
                            QObject **ret_data);
+int do_block_job_cancel(Monitor *mon, const QDict *params,
+                        MonitorCompletion *cb, void *opaque);
 
 #endif
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 2cdfa0b..a8b83f0 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -101,6 +101,22 @@ Set maximum speed for a background block operation.
 ETEXI
 
     {
+        .name       = "block_job_cancel",
+        .args_type  = "device:B",
+        .params     = "device",
+        .help       = "stop an active block streaming operation",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_async = do_block_job_cancel,
+        .flags      = MONITOR_CMD_ASYNC,
+    },
+
+STEXI
+@item block_job_cancel
+@findex block_job_cancel
+Stop an active block streaming operation.
+ETEXI
+
+    {
         .name       = "eject",
         .args_type  = "force:-f,device:B",
         .params     = "[-f] device",
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 6cfb548..741c2f8 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -785,6 +785,47 @@ Example:
 EQMP
 
     {
+        .name       = "block_job_cancel",
+        .args_type  = "device:B",
+        .params     = "device",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_async = do_block_job_cancel,
+        .flags      = MONITOR_CMD_ASYNC,
+    },
+
+SQMP
+
+block_job_cancel
+----------------
+
+Stop an active block streaming operation.
+
+This command returns once the active block streaming operation has been
+stopped.  It is an error to call this command if no operation is in progress.
+
+The image file retains its backing file unless the streaming operation happens
+to complete just as it is being cancelled.
+
+A new block streaming operation can be started at a later time to finish
+copying all data from the backing file.
+
+Arguments:
+
+- device: device name (json-string)
+
+Errors:
+
+DeviceNotActive: streaming is not active on this device
+DeviceInUse:     cancellation already in progress
+
+Examples:
+
+-> { "execute": "block_job_cancel", "arguments": { "device": "virtio0" } }
+<- { "return":  {} }
+
+EQMP
+
+    {
         .name       = "blockdev-snapshot-sync",
         .args_type  = "device:B,snapshot-file:s?,format:s?",
         .params     = "device [new-image-file] [format]",
diff --git a/trace-events b/trace-events
index 15a6b7a..39023cc 100644
--- a/trace-events
+++ b/trace-events
@@ -76,6 +76,8 @@ stream_one_iteration(void *s, int64_t sector_num, int max_sectors) "s %p sector_
 stream_start(void *bs, void *base, void *s, void *co, void *opaque) "bs %p base %p s %p co %p opaque %p"
 
 # blockdev.c
+block_job_cancel_cb(void *cancel_data, void *opaque) "cancel_data %p opaque %p"
+do_block_job_cancel(void *job, void *cancel_data, void *opaque) "job %p cancel_data %p opaque %p"
 block_stream_cb(void *bs, void *job, int ret) "bs %p job %p ret %d"
 do_block_stream(void *bs, void *job) "bs %p job %p"
 
-- 
1.7.7

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH 7/8] qmp: add query-block-jobs
  2011-10-27 15:22 [Qemu-devel] [PATCH 0/8] block: generic image streaming Stefan Hajnoczi
                   ` (5 preceding siblings ...)
  2011-10-27 15:22 ` [Qemu-devel] [PATCH 6/8] qmp: add block_job_cancel command Stefan Hajnoczi
@ 2011-10-27 15:22 ` Stefan Hajnoczi
  2011-10-27 15:22 ` [Qemu-devel] [PATCH 8/8] test: add image streaming test cases Stefan Hajnoczi
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2011-10-27 15:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Marcelo Tosatti, Stefan Hajnoczi

Add query-block-jobs, which shows the progress of ongoing block device
operations.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 blockdev.c      |   46 ++++++++++++++++++++++++++++++++++++++++++++++
 blockdev.h      |    2 ++
 monitor.c       |   16 ++++++++++++++++
 qmp-commands.hx |   32 ++++++++++++++++++++++++++++++++
 4 files changed, 96 insertions(+), 0 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 5ad7d3b..6d78597 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -919,3 +919,49 @@ int do_block_job_cancel(Monitor *mon, const QDict *params,
     block_job_cancel(job, do_block_job_cancel_cb, cancel_data);
     return 0;
 }
+
+static void monitor_print_block_jobs_one(QObject *info, void *opaque)
+{
+    QDict *stream = qobject_to_qdict(info);
+    Monitor *mon = opaque;
+
+    /* TODO libvirt searches for "Streaming device" but this might be a
+     * non-stream job */
+    monitor_printf(mon, "Streaming device %s: Completed %" PRId64 " of %"
+                   PRId64 " bytes, speed limit %" PRId64 " bytes/s\n",
+                   qdict_get_str(stream, "device"),
+                   qdict_get_int(stream, "offset"),
+                   qdict_get_int(stream, "len"),
+                   qdict_get_int(stream, "speed"));
+}
+
+void monitor_print_block_jobs(Monitor *mon, const QObject *data)
+{
+    QList *list = qobject_to_qlist(data);
+
+    assert(list);
+
+    if (qlist_empty(list)) {
+        monitor_printf(mon, "No active jobs\n");
+        return;
+    }
+
+    qlist_iter(list, monitor_print_block_jobs_one, mon);
+}
+
+static void do_info_block_jobs_one(void *opaque, BlockDriverState *bs)
+{
+    QList *list = opaque;
+    BlockJob *job = bs->job;
+
+    if (job) {
+        qlist_append_obj(list, qobject_from_block_job(job));
+    }
+}
+
+void do_info_block_jobs(Monitor *mon, QObject **ret_data)
+{
+    QList *list = qlist_new();
+    bdrv_iterate(do_info_block_jobs_one, list);
+    *ret_data = QOBJECT(list);
+}
diff --git a/blockdev.h b/blockdev.h
index 7d3db30..9bfc33d 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -71,5 +71,7 @@ int do_block_job_set_speed(Monitor *mon, const QDict *qdict,
                            QObject **ret_data);
 int do_block_job_cancel(Monitor *mon, const QDict *params,
                         MonitorCompletion *cb, void *opaque);
+void monitor_print_block_jobs(Monitor *mon, const QObject *data);
+void do_info_block_jobs(Monitor *mon, QObject **ret_data);
 
 #endif
diff --git a/monitor.c b/monitor.c
index 38addcf..ad878af 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2786,6 +2786,14 @@ static const mon_cmd_t info_cmds[] = {
         .mhandler.info_new = bdrv_info_stats,
     },
     {
+        .name       = "block-jobs",
+        .args_type  = "",
+        .params     = "",
+        .help       = "show progress of ongoing block device operations",
+        .user_print = monitor_print_block_jobs,
+        .mhandler.info_new = do_info_block_jobs,
+    },
+    {
         .name       = "registers",
         .args_type  = "",
         .params     = "",
@@ -3080,6 +3088,14 @@ static const mon_cmd_t qmp_query_cmds[] = {
         .mhandler.info_new = bdrv_info_stats,
     },
     {
+        .name       = "block-jobs",
+        .args_type  = "",
+        .params     = "",
+        .help       = "show progress of ongoing block device operations",
+        .user_print = monitor_print_block_jobs,
+        .mhandler.info_new = do_info_block_jobs,
+    },
+    {
         .name       = "cpus",
         .args_type  = "",
         .params     = "",
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 741c2f8..6e8207b 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -2089,3 +2089,35 @@ Example:
 
 EQMP
 
+SQMP
+
+query-block-jobs
+----------------
+
+Show progress of ongoing block device operations.
+
+Return a json-array of all block device operations.  If no operation is active
+then return an empty array.  Each operation is a json-object with the following
+data:
+
+- type:     job type ("stream" for image streaming, json-string)
+- device:   device name (json-string)
+- len:      maximum progress value (json-int)
+- offset:   current progress value (json-int)
+- speed:    rate limit, bytes per second (json-int)
+
+Progress can be observed as offset increases and it reaches len when the
+operation completes.  Offset and len have undefined units but can be used to
+calculate a percentage indicating the progress that has been made.
+
+Example:
+
+-> { "execute": "query-block-jobs" }
+<- { "return":[
+      { "type": "stream", "device": "virtio0",
+        "len": 10737418240, "offset": 709632,
+        "speed": 0 }
+   ]
+ }
+
+EQMP
-- 
1.7.7

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH 8/8] test: add image streaming test cases
  2011-10-27 15:22 [Qemu-devel] [PATCH 0/8] block: generic image streaming Stefan Hajnoczi
                   ` (6 preceding siblings ...)
  2011-10-27 15:22 ` [Qemu-devel] [PATCH 7/8] qmp: add query-block-jobs Stefan Hajnoczi
@ 2011-10-27 15:22 ` Stefan Hajnoczi
  2011-10-27 18:58 ` [Qemu-devel] [PATCH 0/8] block: generic image streaming Luiz Capitulino
  2011-11-01 16:46 ` Marcelo Tosatti
  9 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2011-10-27 15:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Marcelo Tosatti, Stefan Hajnoczi

python test-stream.py

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 test-stream.py |  200 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 200 insertions(+), 0 deletions(-)
 create mode 100644 test-stream.py

diff --git a/test-stream.py b/test-stream.py
new file mode 100644
index 0000000..22200ec
--- /dev/null
+++ b/test-stream.py
@@ -0,0 +1,200 @@
+#!/usr/bin/env python
+import unittest
+import subprocess
+import re
+import os
+import sys; sys.path.append('QMP/')
+import qmp
+
+def qemu_img(*args):
+    devnull = open('/dev/null', 'r+')
+    return subprocess.call(['./qemu-img'] + list(args), stdin=devnull, stdout=devnull)
+
+def qemu_io(*args):
+    args = ['./qemu-io'] + list(args)
+    return subprocess.Popen(args, stdout=subprocess.PIPE).communicate()[0]
+
+class VM(object):
+    def __init__(self):
+        self._monitor_path = '/tmp/qemu-mon.%d' % os.getpid()
+        self._qemu_log_path = '/tmp/qemu-log.%d' % os.getpid()
+        self._args = ['x86_64-softmmu/qemu-system-x86_64',
+                      '-chardev', 'socket,id=mon,path=' + self._monitor_path,
+                      '-mon', 'chardev=mon,mode=control', '-nographic']
+        self._num_drives = 0
+
+    def add_drive(self, path, opts=''):
+        options = ['if=virtio',
+                   'cache=none',
+                   'file=%s' % path,
+                   'id=drive%d' % self._num_drives]
+        if opts:
+            options.append(opts)
+
+        self._args.append('-drive')
+        self._args.append(','.join(options))
+        self._num_drives += 1
+        return self
+
+    def launch(self):
+        devnull = open('/dev/null', 'rb')
+        qemulog = open(self._qemu_log_path, 'wb')
+        self._qmp = qmp.QEMUMonitorProtocol(self._monitor_path, server=True)
+        self._popen = subprocess.Popen(self._args, stdin=devnull, stdout=qemulog,
+                                       stderr=subprocess.STDOUT)
+        self._qmp.accept()
+
+    def shutdown(self):
+        self._qmp.cmd('quit')
+        self._popen.wait()
+        os.remove(self._monitor_path)
+        os.remove(self._qemu_log_path)
+
+    def qmp(self, cmd, **args):
+        return self._qmp.cmd(cmd, args=args)
+
+    def get_qmp_events(self, wait=False):
+        events = self._qmp.get_events(wait=wait)
+        self._qmp.clear_events()
+        return events
+
+index_re = re.compile(r'([^\[]+)\[([^\]]+)\]')
+
+class QMPTestCase(unittest.TestCase):
+    def dictpath(self, d, path):
+        """Traverse a path in a nested dict"""
+        for component in path.split('/'):
+            m = index_re.match(component)
+            if m:
+                component, idx = m.groups()
+                idx = int(idx)
+
+            if not isinstance(d, dict) or component not in d:
+                self.fail('failed path traversal for "%s" in "%s"' % (path, str(d)))
+            d = d[component]
+
+            if m:
+                if not isinstance(d, list):
+                    self.fail('path component "%s" in "%s" is not a list in "%s"' % (component, path, str(d)))
+                try:
+                    d = d[idx]
+                except IndexError:
+                    self.fail('invalid index "%s" in path "%s" in "%s"' % (idx, path, str(d)))
+        return d
+
+    def assert_qmp(self, d, path, value):
+        result = self.dictpath(d, path)
+        self.assertEqual(result, value, 'values not equal "%s" and "%s"' % (str(result), str(value)))
+
+    def assert_no_active_streams(self):
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return', [])
+
+class TestSingleDrive(QMPTestCase):
+    image_len = 1 * 1024 * 1024 # MB
+
+    def setUp(self):
+        qemu_img('create', 'backing.img', str(TestSingleDrive.image_len))
+        qemu_img('create', '-f', 'qed', '-o', 'backing_file=backing.img', 'test.qed')
+        self.vm = VM().add_drive('test.qed')
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove('test.qed')
+        os.remove('backing.img')
+
+    def test_stream(self):
+        self.assert_no_active_streams()
+
+        result = self.vm.qmp('block_stream', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        completed = False
+        while not completed:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_COMPLETED':
+                    self.assert_qmp(event, 'data/type', 'stream')
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/offset', self.image_len)
+                    self.assert_qmp(event, 'data/len', self.image_len)
+                    completed = True
+
+        self.assert_no_active_streams()
+
+        self.assertFalse('sectors not allocated' in qemu_io('-c', 'map', 'test.qed'),
+                         'image file not fully populated after streaming')
+
+    def test_device_not_found(self):
+        result = self.vm.qmp('block_stream', device='nonexistent')
+        self.assert_qmp(result, 'error/class', 'DeviceNotFound')
+
+class TestStreamStop(QMPTestCase):
+    image_len = 8 * 1024 * 1024 * 1024 # GB
+
+    def setUp(self):
+        qemu_img('create', 'backing.img', str(TestStreamStop.image_len))
+        qemu_img('create', '-f', 'qed', '-o', 'backing_file=backing.img', 'test.qed')
+        self.vm = VM().add_drive('test.qed')
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove('test.qed')
+        os.remove('backing.img')
+
+    def test_stream_stop(self):
+        import time
+
+        self.assert_no_active_streams()
+
+        result = self.vm.qmp('block_stream', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        time.sleep(1)
+        events = self.vm.get_qmp_events(wait=False)
+        self.assertEqual(events, [], 'unexpected QMP event: %s' % events)
+
+        self.vm.qmp('block_job_cancel', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        self.assert_no_active_streams()
+
+class TestSetSpeed(QMPTestCase):
+    image_len = 80 * 1024 * 1024 # MB
+
+    def setUp(self):
+        qemu_img('create', 'backing.img', str(TestSetSpeed.image_len))
+        qemu_img('create', '-f', 'qed', '-o', 'backing_file=backing.img', 'test.qed')
+        self.vm = VM().add_drive('test.qed')
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove('test.qed')
+        os.remove('backing.img')
+
+    # This doesn't print or verify anything, only use it via "test-stream.py TestSetSpeed"
+    def test_stream_set_speed(self):
+        self.assert_no_active_streams()
+
+        result = self.vm.qmp('block_stream', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('block_job_set_speed', device='drive0', value=8 * 1024 * 1024)
+        self.assert_qmp(result, 'return', {})
+
+        completed = False
+        while not completed:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_COMPLETED':
+                    self.assert_qmp(event, 'data/type', 'stream')
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/offset', self.image_len)
+                    self.assert_qmp(event, 'data/len', self.image_len)
+                    completed = True
+
+        self.assert_no_active_streams()
+
+if __name__ == '__main__':
+    unittest.main()
-- 
1.7.7

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 0/8] block: generic image streaming
  2011-10-27 15:22 [Qemu-devel] [PATCH 0/8] block: generic image streaming Stefan Hajnoczi
                   ` (7 preceding siblings ...)
  2011-10-27 15:22 ` [Qemu-devel] [PATCH 8/8] test: add image streaming test cases Stefan Hajnoczi
@ 2011-10-27 18:58 ` Luiz Capitulino
  2011-11-01 16:46 ` Marcelo Tosatti
  9 siblings, 0 replies; 17+ messages in thread
From: Luiz Capitulino @ 2011-10-27 18:58 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, Anthony Liguori, Marcelo Tosatti, qemu-devel

On Thu, 27 Oct 2011 16:22:47 +0100
Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> wrote:

> This series adds the 'block_stream' command which copies the contents of a
> backing file into the image file while the VM is running.  The series builds on
> copy-on-read and zero detection features which I sent out recently and I
> suggest grabbing my git tree to try it out without merging these dependencies
> yourself:
> 
> http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/image-streaming-api
> 
> The image streaming HMP/QMP commands are documented in the patch and also
> described here:
> 
> http://wiki.qemu.org/Features/LiveBlockMigration/ImageStreamingAPI

I haven't reviewed the series yet, but new QMP commands should be using
the QAPI now that the basic infrastructure is in place.

I'll write some documentation on how to do it shortly, but you already can
find examples in the tree (eg. qmp_stop()/hmp_stop() and
qmp_query_chardev()/hmp_info_chardev()).

> The basic idea is to execute 'block_stream virtio0' while the guest is running.
> Progress can be monitored using 'info block-jobs'.  When the streaming
> operation completes it raises a QMP event.
> 
> This series includes a Python test script called test-stream.py.  When run in a
> QEMU source tree it performs basic image streaming QMP tests.
> 
> TODO:
>  * support 'base' argument for stream partial backing file chains
>  * rate-limiting support, currently a NotSupported error is raised
> 
> My plan is to add rate-limiting shortly but the 'base' argument will require
> more work later.  I'm sending these patches out to share the general direction
> and let folks know what to expect as I continue to test this code.
> 
> Stefan Hajnoczi (8):
>   coroutine: add co_sleep_ns() coroutine sleep function
>   block: add BlockJob interface for long-running operations
>   block: add image streaming block job
>   qmp: add block_stream command
>   qmp: add block_job_set_speed command
>   qmp: add block_job_cancel command
>   qmp: add query-block-jobs
>   test: add image streaming test cases
> 
>  Makefile.objs          |    3 +-
>  block/stream.c         |  135 ++++++++++++++++++++++++++++++++
>  block_int.h            |   98 +++++++++++++++++++++++
>  blockdev.c             |  183 +++++++++++++++++++++++++++++++++++++++++++
>  blockdev.h             |    8 ++
>  hmp-commands.hx        |   45 +++++++++++
>  monitor.c              |   19 +++++
>  monitor.h              |    1 +
>  qemu-coroutine-sleep.c |   38 +++++++++
>  qemu-coroutine.h       |    6 ++
>  qerror.c               |    4 +
>  qerror.h               |    3 +
>  qmp-commands.hx        |  173 +++++++++++++++++++++++++++++++++++++++++
>  test-stream.py         |  200 ++++++++++++++++++++++++++++++++++++++++++++++++
>  trace-events           |   10 +++
>  15 files changed, 925 insertions(+), 1 deletions(-)
>  create mode 100644 block/stream.c
>  create mode 100644 qemu-coroutine-sleep.c
>  create mode 100644 test-stream.py
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 0/8] block: generic image streaming
  2011-10-27 15:22 [Qemu-devel] [PATCH 0/8] block: generic image streaming Stefan Hajnoczi
                   ` (8 preceding siblings ...)
  2011-10-27 18:58 ` [Qemu-devel] [PATCH 0/8] block: generic image streaming Luiz Capitulino
@ 2011-11-01 16:46 ` Marcelo Tosatti
  2011-11-02 11:06   ` Stefan Hajnoczi
  9 siblings, 1 reply; 17+ messages in thread
From: Marcelo Tosatti @ 2011-11-01 16:46 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, Anthony Liguori, qemu-devel

On Thu, Oct 27, 2011 at 04:22:47PM +0100, Stefan Hajnoczi wrote:
> This series adds the 'block_stream' command which copies the contents of a
> backing file into the image file while the VM is running.  The series builds on
> copy-on-read and zero detection features which I sent out recently and I
> suggest grabbing my git tree to try it out without merging these dependencies
> yourself:
> 
> http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/image-streaming-api

Zero detection and COR are not in this branch? (and its non-trivial to
clone it).

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 3/8] block: add image streaming block job
  2011-10-27 15:22 ` [Qemu-devel] [PATCH 3/8] block: add image streaming block job Stefan Hajnoczi
@ 2011-11-01 18:06   ` Marcelo Tosatti
  2011-11-02 15:43     ` Stefan Hajnoczi
  0 siblings, 1 reply; 17+ messages in thread
From: Marcelo Tosatti @ 2011-11-01 18:06 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, Anthony Liguori, qemu-devel

On Thu, Oct 27, 2011 at 04:22:50PM +0100, Stefan Hajnoczi wrote:
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> ---
>  Makefile.objs  |    1 +
>  block/stream.c |  135 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  block_int.h    |    3 +
>  trace-events   |    4 ++
>  4 files changed, 143 insertions(+), 0 deletions(-)
>  create mode 100644 block/stream.c
> 
> diff --git a/Makefile.objs b/Makefile.objs
> index c290fd3..802db96 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -34,6 +34,7 @@ block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow
>  block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>  block-nested-y += qed-check.o
>  block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
> +block-nested-y += stream.o
>  block-nested-$(CONFIG_WIN32) += raw-win32.o
>  block-nested-$(CONFIG_POSIX) += raw-posix.o
>  block-nested-$(CONFIG_CURL) += curl.o
> diff --git a/block/stream.c b/block/stream.c
> new file mode 100644
> index 0000000..8cdf566
> --- /dev/null
> +++ b/block/stream.c
> @@ -0,0 +1,135 @@
> +/*
> + * Image streaming
> + *
> + * Copyright IBM, Corp. 2011
> + *
> + * Authors:
> + *  Stefan Hajnoczi   <stefanha@linux.vnet.ibm.com>
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> + * See the COPYING.LIB file in the top-level directory.
> + *
> + */
> +
> +#include "trace.h"
> +#include "block_int.h"
> +
> +enum {
> +    /*
> +     * Size of data buffer for populating the image file.  This should be large
> +     * enough to process multiple clusters in a single call, so that populating
> +     * contiguous regions of the image is efficient.
> +     */
> +    STREAM_BUFFER_SIZE = 512 * 1024, /* in bytes */
> +};
> +
> +typedef struct StreamBlockJob {
> +    BlockJob common;
> +    BlockDriverState *base;
> +} StreamBlockJob;
> +
> +static int coroutine_fn stream_populate(BlockDriverState *bs,
> +                                        int64_t sector_num, int nb_sectors,
> +                                        void *buf)
> +{
> +    struct iovec iov = {
> +        .iov_base = buf,
> +        .iov_len  = nb_sectors * BDRV_SECTOR_SIZE,
> +    };
> +    QEMUIOVector qiov;
> +
> +    qemu_iovec_init_external(&qiov, &iov, 1);
> +
> +    /* Copy-on-read the unallocated clusters */
> +    return bdrv_co_readv(bs, sector_num, nb_sectors, &qiov);
> +}
> +
> +static int stream_one_iteration(StreamBlockJob *s, int64_t sector_num,
> +                                void *buf, int max_sectors, int *n)
> +{
> +    BlockDriverState *bs = s->common.bs;
> +    int ret;
> +
> +    trace_stream_one_iteration(s, sector_num, max_sectors);
> +
> +    ret = bdrv_is_allocated(bs, sector_num, max_sectors, n);
> +    if (ret < 0) {
> +        return ret;
> +    }

bdrv_is_allocated is still synchronous? If so, there should be at least
a plan to make it asynchronous.

> +    if (!ret) {
> +        ret = stream_populate(bs, sector_num, *n, buf);
> +    }
> +    return ret;
> +}
> +
> +static void coroutine_fn stream_run(void *opaque)
> +{
> +    StreamBlockJob *s = opaque;
> +    BlockDriverState *bs = s->common.bs;
> +    int64_t sector_num, end;
> +    int ret = 0;
> +    int n;
> +    void *buf;
> +
> +    buf = qemu_blockalign(bs, STREAM_BUFFER_SIZE);
> +    s->common.len = bdrv_getlength(bs);
> +    bdrv_get_geometry(bs, (uint64_t *)&end);
> +
> +    bdrv_set_zero_detection(bs, true);
> +    bdrv_set_copy_on_read(bs, true);

Should distinguish between stream initiated and user initiated setting 
of zero detection and cor (so that unsetting below does not clear
user settings).

> +
> +    for (sector_num = 0; sector_num < end; sector_num += n) {
> +        if (block_job_is_cancelled(&s->common)) {
> +            break;
> +        }

If cancellation is seen here in the last loop iteration,
bdrv_change_backing_file below should not be executed.

> +
> +        /* TODO rate-limit */
> +        /* Note that even when no rate limit is applied we need to yield with
> +         * no pending I/O here so that qemu_aio_flush() is able to return.
> +         */
> +        co_sleep_ns(rt_clock, 0);

How do you plan to implement rate limit?

> +
> +        ret = stream_one_iteration(s, sector_num, buf,
> +                                   STREAM_BUFFER_SIZE / BDRV_SECTOR_SIZE, &n);
> +        if (ret < 0) {
> +            break;
> +        }
> +
> +        /* Publish progress */
> +        s->common.offset += n * BDRV_SECTOR_SIZE;
> +    }
> +
> +    bdrv_set_copy_on_read(bs, false);
> +    bdrv_set_zero_detection(bs, false);
> +
> +    if (sector_num == end && ret == 0) {
> +        bdrv_change_backing_file(bs, NULL, NULL);
> +    }
> +
> +    qemu_vfree(buf);
> +    block_job_complete(&s->common, ret);
> +}
> +
> +static BlockJobType stream_job_type = {
> +    .instance_size = sizeof(StreamBlockJob),
> +    .job_type      = "stream",
> +};
> +
> +int stream_start(BlockDriverState *bs, BlockDriverState *base,
> +                 BlockDriverCompletionFunc *cb, void *opaque)
> +{
> +    StreamBlockJob *s;
> +    Coroutine *co;
> +
> +    if (bs->job) {
> +        return -EBUSY;
> +    }
> +
> +    s = block_job_create(&stream_job_type, bs, cb, opaque);
> +    s->base = base;
> +
> +    co = qemu_coroutine_create(stream_run);
> +    trace_stream_start(bs, base, s, co, opaque);
> +    qemu_coroutine_enter(co, s);
> +    return 0;
> +}

I'd like to see the shared base code implemented before this is merged.

On a related note, the maze of coroutine locks and waiting queues makes
it difficult to have a clear picture of the execution flow (perhaps that
is due to lack of familiarity with the use of coroutines in the block
code).

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 0/8] block: generic image streaming
  2011-11-01 16:46 ` Marcelo Tosatti
@ 2011-11-02 11:06   ` Stefan Hajnoczi
  0 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2011-11-02 11:06 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, qemu-devel

On Tue, Nov 1, 2011 at 4:46 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Thu, Oct 27, 2011 at 04:22:47PM +0100, Stefan Hajnoczi wrote:
>> This series adds the 'block_stream' command which copies the contents of a
>> backing file into the image file while the VM is running.  The series builds on
>> copy-on-read and zero detection features which I sent out recently and I
>> suggest grabbing my git tree to try it out without merging these dependencies
>> yourself:
>>
>> http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/image-streaming-api
>
> Zero detection and COR are not in this branch? (and its non-trivial to
> clone it).

Hi Marcelo,
I have pushed the world that is based on CoR and zero detection:
http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/image-streaming-api

Stefan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 3/8] block: add image streaming block job
  2011-11-01 18:06   ` Marcelo Tosatti
@ 2011-11-02 15:43     ` Stefan Hajnoczi
  2011-11-02 16:43       ` Kevin Wolf
  2011-11-03 16:34       ` Marcelo Tosatti
  0 siblings, 2 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2011-11-02 15:43 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, qemu-devel

On Tue, Nov 1, 2011 at 6:06 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Thu, Oct 27, 2011 at 04:22:50PM +0100, Stefan Hajnoczi wrote:
>> +static int stream_one_iteration(StreamBlockJob *s, int64_t sector_num,
>> +                                void *buf, int max_sectors, int *n)
>> +{
>> +    BlockDriverState *bs = s->common.bs;
>> +    int ret;
>> +
>> +    trace_stream_one_iteration(s, sector_num, max_sectors);
>> +
>> +    ret = bdrv_is_allocated(bs, sector_num, max_sectors, n);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>
> bdrv_is_allocated is still synchronous? If so, there should be at least
> a plan to make it asynchronous.

Yes, that's a good discussion to have.  My thoughts are that
bdrv_is_allocated() should be executed in coroutine context.  The
semantics are a little tricky because of parallel requests:

1. If a write request is in progress when we do bdrv_is_allocated() we
might get back "unallocated" even though clusters are just being
allocated.
2. If a TRIM request is in progress when we do bdrv_is_allocated() we
might get back "allocated" even though clusters are just being
deallocated.

In order to be reliable the caller needs to be aware of parallel
requests.  I think it's correct to defer this problem to the caller.

In the case of image streaming we're not TRIM-safe, I haven't really
thought about it yet.  But we are safe against parallel write requests
because there is serialization to prevent copy-on-read requests from
racing with write requests.

>> +    if (!ret) {
>> +        ret = stream_populate(bs, sector_num, *n, buf);
>> +    }
>> +    return ret;
>> +}
>> +
>> +static void coroutine_fn stream_run(void *opaque)
>> +{
>> +    StreamBlockJob *s = opaque;
>> +    BlockDriverState *bs = s->common.bs;
>> +    int64_t sector_num, end;
>> +    int ret = 0;
>> +    int n;
>> +    void *buf;
>> +
>> +    buf = qemu_blockalign(bs, STREAM_BUFFER_SIZE);
>> +    s->common.len = bdrv_getlength(bs);
>> +    bdrv_get_geometry(bs, (uint64_t *)&end);
>> +
>> +    bdrv_set_zero_detection(bs, true);
>> +    bdrv_set_copy_on_read(bs, true);
>
> Should distinguish between stream initiated and user initiated setting
> of zero detection and cor (so that unsetting below does not clear
> user settings).

For zero detection I agree.

For copy-on-read it is questionable since once streaming is complete
it does not make sense to have copy-on-read enabled.

I will address this in the next revision and think more about the
copy-on-read case.

>> +
>> +    for (sector_num = 0; sector_num < end; sector_num += n) {
>> +        if (block_job_is_cancelled(&s->common)) {
>> +            break;
>> +        }
>
> If cancellation is seen here in the last loop iteration,
> bdrv_change_backing_file below should not be executed.

I documented this case in the QMP API.  I'm not sure if it's possible
to guarantee that the operation isn't just completing as you cancel
it.  Any blocking point between completion of the last iteration and
completing the operation is vulnerable to missing the cancel.  It's
easier to explicitly say the operation might just have completed when
you canceled, rather than trying to protect the completion path.  Do
you think it's a problem to have these loose semantics that I
described?

>> +
>> +        /* TODO rate-limit */
>> +        /* Note that even when no rate limit is applied we need to yield with
>> +         * no pending I/O here so that qemu_aio_flush() is able to return.
>> +         */
>> +        co_sleep_ns(rt_clock, 0);
>
> How do you plan to implement rate limit?

It was implemented in the QED-specific image streaming series:

http://repo.or.cz/w/qemu/stefanha.git/commitdiff/22f2c09d2fcfe5e49ac4604fd23e4744f549a476

That implementation works fine and is small but I'd like to reuse the
migration speed limit, if possible.  That way we don't have 3
different rate-limiting implementations in QEMU :).

Stefan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 3/8] block: add image streaming block job
  2011-11-02 15:43     ` Stefan Hajnoczi
@ 2011-11-02 16:43       ` Kevin Wolf
  2011-11-03 16:34       ` Marcelo Tosatti
  1 sibling, 0 replies; 17+ messages in thread
From: Kevin Wolf @ 2011-11-02 16:43 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Anthony Liguori, Marcelo Tosatti, Stefan Hajnoczi, qemu-devel

Am 02.11.2011 16:43, schrieb Stefan Hajnoczi:
> On Tue, Nov 1, 2011 at 6:06 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>> On Thu, Oct 27, 2011 at 04:22:50PM +0100, Stefan Hajnoczi wrote:
>>> +static int stream_one_iteration(StreamBlockJob *s, int64_t sector_num,
>>> +                                void *buf, int max_sectors, int *n)
>>> +{
>>> +    BlockDriverState *bs = s->common.bs;
>>> +    int ret;
>>> +
>>> +    trace_stream_one_iteration(s, sector_num, max_sectors);
>>> +
>>> +    ret = bdrv_is_allocated(bs, sector_num, max_sectors, n);
>>> +    if (ret < 0) {
>>> +        return ret;
>>> +    }
>>
>> bdrv_is_allocated is still synchronous? If so, there should be at least
>> a plan to make it asynchronous.
> 
> Yes, that's a good discussion to have.  My thoughts are that
> bdrv_is_allocated() should be executed in coroutine context.  

bdrv_is_allocated() isn't coroutine-safe. You need to introduce
bdrv_co_is_allocated and convert all drivers before you can do this. You
don't want to access the qcow2 metadata cache without holding the lock,
for example.

> The
> semantics are a little tricky because of parallel requests:
> 
> 1. If a write request is in progress when we do bdrv_is_allocated() we
> might get back "unallocated" even though clusters are just being
> allocated.
> 2. If a TRIM request is in progress when we do bdrv_is_allocated() we
> might get back "allocated" even though clusters are just being
> deallocated.
> 
> In order to be reliable the caller needs to be aware of parallel
> requests.  I think it's correct to defer this problem to the caller.

I agree.

> In the case of image streaming we're not TRIM-safe, I haven't really
> thought about it yet.  But we are safe against parallel write requests
> because there is serialization to prevent copy-on-read requests from
> racing with write requests.

I don't think it matters. If you lose a bdrv_discard, nothing bad has
happened. bdrv_discard means that you have undefined content afterwards.

Kevin

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 3/8] block: add image streaming block job
  2011-11-02 15:43     ` Stefan Hajnoczi
  2011-11-02 16:43       ` Kevin Wolf
@ 2011-11-03 16:34       ` Marcelo Tosatti
  2011-11-04  8:03         ` Stefan Hajnoczi
  1 sibling, 1 reply; 17+ messages in thread
From: Marcelo Tosatti @ 2011-11-03 16:34 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, qemu-devel

On Wed, Nov 02, 2011 at 03:43:49PM +0000, Stefan Hajnoczi wrote:
> On Tue, Nov 1, 2011 at 6:06 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > On Thu, Oct 27, 2011 at 04:22:50PM +0100, Stefan Hajnoczi wrote:
> >> +static int stream_one_iteration(StreamBlockJob *s, int64_t sector_num,
> >> +                                void *buf, int max_sectors, int *n)
> >> +{
> >> +    BlockDriverState *bs = s->common.bs;
> >> +    int ret;
> >> +
> >> +    trace_stream_one_iteration(s, sector_num, max_sectors);
> >> +
> >> +    ret = bdrv_is_allocated(bs, sector_num, max_sectors, n);
> >> +    if (ret < 0) {
> >> +        return ret;
> >> +    }
> >
> > bdrv_is_allocated is still synchronous? If so, there should be at least
> > a plan to make it asynchronous.
> 
> Yes, that's a good discussion to have.  My thoughts are that
> bdrv_is_allocated() should be executed in coroutine context.  The
> semantics are a little tricky because of parallel requests:
> 
> 1. If a write request is in progress when we do bdrv_is_allocated() we
> might get back "unallocated" even though clusters are just being
> allocated.
> 2. If a TRIM request is in progress when we do bdrv_is_allocated() we
> might get back "allocated" even though clusters are just being
> deallocated.
> 
> In order to be reliable the caller needs to be aware of parallel
> requests.  I think it's correct to defer this problem to the caller.
> 
> In the case of image streaming we're not TRIM-safe, I haven't really
> thought about it yet.  But we are safe against parallel write requests
> because there is serialization to prevent copy-on-read requests from
> racing with write requests.
> 
> >> +    if (!ret) {
> >> +        ret = stream_populate(bs, sector_num, *n, buf);
> >> +    }
> >> +    return ret;
> >> +}
> >> +
> >> +static void coroutine_fn stream_run(void *opaque)
> >> +{
> >> +    StreamBlockJob *s = opaque;
> >> +    BlockDriverState *bs = s->common.bs;
> >> +    int64_t sector_num, end;
> >> +    int ret = 0;
> >> +    int n;
> >> +    void *buf;
> >> +
> >> +    buf = qemu_blockalign(bs, STREAM_BUFFER_SIZE);
> >> +    s->common.len = bdrv_getlength(bs);
> >> +    bdrv_get_geometry(bs, (uint64_t *)&end);
> >> +
> >> +    bdrv_set_zero_detection(bs, true);
> >> +    bdrv_set_copy_on_read(bs, true);
> >
> > Should distinguish between stream initiated and user initiated setting
> > of zero detection and cor (so that unsetting below does not clear
> > user settings).
> 
> For zero detection I agree.
> 
> For copy-on-read it is questionable since once streaming is complete
> it does not make sense to have copy-on-read enabled.
> 
> I will address this in the next revision and think more about the
> copy-on-read case.
> 
> >> +
> >> +    for (sector_num = 0; sector_num < end; sector_num += n) {
> >> +        if (block_job_is_cancelled(&s->common)) {
> >> +            break;
> >> +        }
> >
> > If cancellation is seen here in the last loop iteration,
> > bdrv_change_backing_file below should not be executed.
> 
> I documented this case in the QMP API.  I'm not sure if it's possible
> to guarantee that the operation isn't just completing as you cancel
> it.  Any blocking point between completion of the last iteration and
> completing the operation is vulnerable to missing the cancel.  It's
> easier to explicitly say the operation might just have completed when
> you canceled, rather than trying to protect the completion path.  Do
> you think it's a problem to have these loose semantics that I
> described?

No, that is ok. I'm referring to bdrv_change_backing_file() being
executed without the entire image being streamed.

"if (sector_num == end && ret == 0)" includes both all sectors being 
streamed and all sectors except the last iteration being streamed (due
to job cancelled break).

> >> +
> >> +        /* TODO rate-limit */
> >> +        /* Note that even when no rate limit is applied we need to yield with
> >> +         * no pending I/O here so that qemu_aio_flush() is able to return.
> >> +         */
> >> +        co_sleep_ns(rt_clock, 0);
> >
> > How do you plan to implement rate limit?
> 
> It was implemented in the QED-specific image streaming series:
> 
> http://repo.or.cz/w/qemu/stefanha.git/commitdiff/22f2c09d2fcfe5e49ac4604fd23e4744f549a476
> 
> That implementation works fine and is small but I'd like to reuse the
> migration speed limit, if possible.  That way we don't have 3
> different rate-limiting implementations in QEMU :).

One possibility would be to create a "virtual" block device for
streaming, sitting on top of the real block device. Then enforce block
I/O limits on the virtual block device, the guest would remain accessing
the real block device.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 3/8] block: add image streaming block job
  2011-11-03 16:34       ` Marcelo Tosatti
@ 2011-11-04  8:03         ` Stefan Hajnoczi
  0 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2011-11-04  8:03 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Kevin Wolf, Stefan Hajnoczi, Anthony Liguori, qemu-devel

On Thu, Nov 03, 2011 at 02:34:24PM -0200, Marcelo Tosatti wrote:
> On Wed, Nov 02, 2011 at 03:43:49PM +0000, Stefan Hajnoczi wrote:
> > On Tue, Nov 1, 2011 at 6:06 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > > On Thu, Oct 27, 2011 at 04:22:50PM +0100, Stefan Hajnoczi wrote:
> > >> +
> > >> +    for (sector_num = 0; sector_num < end; sector_num += n) {
> > >> +        if (block_job_is_cancelled(&s->common)) {
> > >> +            break;
> > >> +        }
> > >
> > > If cancellation is seen here in the last loop iteration,
> > > bdrv_change_backing_file below should not be executed.
> > 
> > I documented this case in the QMP API.  I'm not sure if it's possible
> > to guarantee that the operation isn't just completing as you cancel
> > it.  Any blocking point between completion of the last iteration and
> > completing the operation is vulnerable to missing the cancel.  It's
> > easier to explicitly say the operation might just have completed when
> > you canceled, rather than trying to protect the completion path.  Do
> > you think it's a problem to have these loose semantics that I
> > described?
> 
> No, that is ok. I'm referring to bdrv_change_backing_file() being
> executed without the entire image being streamed.
> 
> "if (sector_num == end && ret == 0)" includes both all sectors being 
> streamed and all sectors except the last iteration being streamed (due
> to job cancelled break).

I don't see the case you mention.  Here is the code again:

for (sector_num = 0; sector_num < end; sector_num += n) {
    if (block_job_is_cancelled(&s->common)) {
        break;
    }

If we are on the last iteration, then sector_num = end - m, where m > 0
and is the number of sectors we are about to stream.

If we are cancelled during this last iteration then sector_num == end
- m.  Therefore the "if (sector_num == end && ret == 0)" case does not
  evaluate to true.

The only way we can reach sector_num == end is by having successfully
streamed those last m sectors.  Why?  Because sector_num is a 0-based
index and not a 1-based index, so it excludes end.

> > >> +
> > >> +        /* TODO rate-limit */
> > >> +        /* Note that even when no rate limit is applied we need to yield with
> > >> +         * no pending I/O here so that qemu_aio_flush() is able to return.
> > >> +         */
> > >> +        co_sleep_ns(rt_clock, 0);
> > >
> > > How do you plan to implement rate limit?
> > 
> > It was implemented in the QED-specific image streaming series:
> > 
> > http://repo.or.cz/w/qemu/stefanha.git/commitdiff/22f2c09d2fcfe5e49ac4604fd23e4744f549a476
> > 
> > That implementation works fine and is small but I'd like to reuse the
> > migration speed limit, if possible.  That way we don't have 3
> > different rate-limiting implementations in QEMU :).
> 
> One possibility would be to create a "virtual" block device for
> streaming, sitting on top of the real block device. Then enforce block
> I/O limits on the virtual block device, the guest would remain accessing
> the real block device.

That's an interesting idea.  I have also experimented with rate-limiting
and it seems the common code is really small - the rate-limiting code is
quite short to begin with.  So I'm now tending to reimplementing it.

Stefan

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2011-11-04  9:16 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-10-27 15:22 [Qemu-devel] [PATCH 0/8] block: generic image streaming Stefan Hajnoczi
2011-10-27 15:22 ` [Qemu-devel] [PATCH 1/8] coroutine: add co_sleep_ns() coroutine sleep function Stefan Hajnoczi
2011-10-27 15:22 ` [Qemu-devel] [PATCH 2/8] block: add BlockJob interface for long-running operations Stefan Hajnoczi
2011-10-27 15:22 ` [Qemu-devel] [PATCH 3/8] block: add image streaming block job Stefan Hajnoczi
2011-11-01 18:06   ` Marcelo Tosatti
2011-11-02 15:43     ` Stefan Hajnoczi
2011-11-02 16:43       ` Kevin Wolf
2011-11-03 16:34       ` Marcelo Tosatti
2011-11-04  8:03         ` Stefan Hajnoczi
2011-10-27 15:22 ` [Qemu-devel] [PATCH 4/8] qmp: add block_stream command Stefan Hajnoczi
2011-10-27 15:22 ` [Qemu-devel] [PATCH 5/8] qmp: add block_job_set_speed command Stefan Hajnoczi
2011-10-27 15:22 ` [Qemu-devel] [PATCH 6/8] qmp: add block_job_cancel command Stefan Hajnoczi
2011-10-27 15:22 ` [Qemu-devel] [PATCH 7/8] qmp: add query-block-jobs Stefan Hajnoczi
2011-10-27 15:22 ` [Qemu-devel] [PATCH 8/8] test: add image streaming test cases Stefan Hajnoczi
2011-10-27 18:58 ` [Qemu-devel] [PATCH 0/8] block: generic image streaming Luiz Capitulino
2011-11-01 16:46 ` Marcelo Tosatti
2011-11-02 11:06   ` Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.