* [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3
@ 2012-09-26 15:56 Paolo Bonzini
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 01/45] qerror/block: introduce QERR_BLOCK_JOB_NOT_ACTIVE Paolo Bonzini
                   ` (45 more replies)
  0 siblings, 46 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

Hi all, this is the resubmission of my block job patches, originally
meant for 1.2.  This still does not include a persistent dirty bitmap,
which I hope to post in October.

The patches are organized as follows:

01-13   preparatory work for block job errors, including support for
        pausing and resuming jobs

14-18   introduce block job errors, and add support in block-stream

19-25   preparatory work for block mirroring: new commands/concepts
        and creating new functions out of existing code.

26-33   introduce a simple version of mirroring.  The initial patch
        adds the mirroring logic, followed by the ability to switch to
        the destination of migration and to handle errors during the job.
        All these changes come with testcases.  Removing the ability to
        query the target file is the main change from v1.

34-41   These patches introduce the first optimizations, namely supporting
        an arbitrary granularity for the dirty bitmap.  The current default,
        1M, is too coarse to let the job converge quickly and in almost
        real-time.  These patches reimplement the block device dirty bitmap
        to allow efficient iteration, and add cluster copy-on-write logic.
        Cluster copy-on-write is needed because management will want to
        start the copy before the backing file is in place in the destination;
        if mirroring takes care of copy-on-write, BDRV_O_NO_BACKING can be
        used even if the granularity is smaller than the cluster size.

42-45   A second round of optimizations, replacing serialized read-write
        operations with multiple asynchronous I/O operations.  The various
        in-flight operations can be of arbitrary size.  The initial copy
        will end up reading large chunks sequentially (10M by default),
        while subsequent passes can mimic more closely the guest's I/O
        patterns.
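
To illustrate why copy-on-write matters when the granularity is smaller
than the cluster size: a dirty chunk then covers only part of a cluster,
so the range has to be widened to cluster boundaries before it is copied
to a target whose backing file is not yet in place.  What follows is only
a minimal sketch, assuming all sizes are expressed in sectors;
round_range_to_clusters is a hypothetical name chosen for illustration
and is not the round_to_clusters function that the series makes public.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the series): widen the sector range
 * [sector, sector + nb) so that it starts and ends on a cluster boundary.
 * cluster_sectors is the cluster size in sectors.
 */
static void round_range_to_clusters(int64_t sector, int64_t nb,
                                    int64_t cluster_sectors,
                                    int64_t *out_sector, int64_t *out_nb)
{
    int64_t start, end, rem;

    assert(cluster_sectors > 0);
    assert(sector >= 0 && nb > 0);

    /* Round the start down to the previous cluster boundary. */
    start = sector - sector % cluster_sectors;

    /* Round the end up to the next cluster boundary. */
    end = sector + nb;
    rem = end % cluster_sectors;
    if (rem != 0) {
        end += cluster_sectors - rem;
    }

    *out_sector = start;
    *out_nb = end - start;
}
```

For example, with 128-sector (64 KiB) clusters and an 8-sector dirty
chunk at sector 136, the widened range is sectors 128 to 255, that is,
the whole cluster containing the chunk.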

All comments from Kevin's partial review are addressed, so I believe
the first 29 patches should be ready to go.  Laszlo already reviewed some
of the subsequent parts.

Please review!

Jeff Cody (1):
  blockdev: rename block_stream_cb to a generic block_job_cb

Paolo Bonzini (44):
  qerror/block: introduce QERR_BLOCK_JOB_NOT_ACTIVE
  block: fix documentation of block_job_cancel_sync
  block: move job APIs to separate files
  block: add block_job_query
  block: add support for job pause/resume
  qmp: add block-job-pause and block-job-resume
  qemu-iotests: add test for pausing a streaming operation
  block: rename block_job_complete to block_job_completed
  iostatus: rename BlockErrorAction, BlockQMPEventAction
  iostatus: move BlockdevOnError declaration to QAPI
  iostatus: change is_read to a bool
  iostatus: reorganize io error code
  block: introduce block job error
  stream: add on-error argument
  blkdebug: process all set_state rules in the old state
  qemu-iotests: map underscore to dash in QMP argument names
  qemu-iotests: add tests for streaming error handling
  block: add bdrv_query_info
  block: add bdrv_query_stats
  block: add bdrv_open_backing_file
  block: introduce new dirty bitmap functionality
  block: export dirty bitmap information in query-block
  block: add block-job-complete
  block: introduce BLOCK_JOB_READY event
  mirror: introduce mirror job
  qmp: add drive-mirror command
  mirror: implement completion
  qemu-iotests: add mirroring test case
  iostatus: forward bdrv_iostatus_reset to block job
  mirror: add support for on-source-error/on-target-error
  qmp: add pull_event function
  qemu-iotests: add testcases for mirroring
    on-source-error/on-target-error
  host-utils: add ffsl
  add hierarchical bitmap data type and test cases
  block: implement dirty bitmap using HBitmap
  block: make round_to_clusters public
  mirror: perform COW if the cluster size is bigger than the
    granularity
  block: return count of dirty sectors, not chunks
  block: allow customizing the granularity of the dirty bitmap
  mirror: allow customizing the granularity
  mirror: switch mirror_iteration to AIO
  mirror: add buf-size argument to drive-mirror
  mirror: support more than one in-flight AIO operation
  mirror: support arbitrarily-sized iterations

 Makefile.objs                 |   5 +-
 QMP/qmp-events.txt            |  42 +++
 QMP/qmp.py                    |  20 ++
 block-migration.c             |   7 +-
 block.c                       | 480 +++++++++++++-----------------
 block.h                       |  37 ++-
 block/Makefile.objs           |   3 +-
 block/blkdebug.c              |  12 +-
 block/mirror.c                | 576 ++++++++++++++++++++++++++++++++++++
 block/stream.c                |  33 ++-
 block_int.h                   | 192 +++---------
 blockdev.c                    | 259 ++++++++++++++---
 blockjob.c                    | 282 ++++++++++++++++++
 blockjob.h                    | 280 ++++++++++++++++++
 hbitmap.c                     | 400 +++++++++++++++++++++++++
 hbitmap.h                     | 207 +++++++++++++
 hmp-commands.hx               |  73 ++++-
 hmp.c                         |  65 ++++-
 hmp.h                         |   4 +
 host-utils.h                  |  26 ++
 hw/fdc.c                      |   4 +-
 hw/ide/core.c                 |  22 +-
 hw/ide/pci.c                  |   4 +-
 hw/scsi-disk.c                |  25 +-
 hw/scsi-generic.c             |   4 +-
 hw/virtio-blk.c               |  23 +-
 monitor.c                     |   2 +
 monitor.h                     |   2 +
 qapi-schema.json              | 206 ++++++++++++-
 qemu-tool.c                   |   6 +
 qerror.h                      |   9 +
 qmp-commands.hx               |  75 ++++-
 tests/Makefile                |   2 +
 tests/qemu-iotests/030        | 260 ++++++++++++++++-
 tests/qemu-iotests/030.out    |   4 +-
 tests/qemu-iotests/040        | 658 ++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/040.out    |   5 +
 tests/qemu-iotests/group      |   5 +-
 tests/qemu-iotests/iotests.py |  19 +-
 tests/test-hbitmap.c          | 408 ++++++++++++++++++++++++++
 trace-events                  |  24 +-
 41 files changed, 4176 insertions(+), 594 deletions(-)
 create mode 100644 block/mirror.c
 create mode 100644 blockjob.c
 create mode 100644 blockjob.h
 create mode 100644 hbitmap.c
 create mode 100644 hbitmap.h
 create mode 100755 tests/qemu-iotests/040
 create mode 100644 tests/qemu-iotests/040.out
 create mode 100644 tests/test-hbitmap.c

-- 
1.7.12


* [Qemu-devel] [PATCH v2 01/45] qerror/block: introduce QERR_BLOCK_JOB_NOT_ACTIVE
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 02/45] blockdev: rename block_stream_cb to a generic block_job_cb Paolo Bonzini
                   ` (44 subsequent siblings)
  45 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

The DeviceNotActive text is not a particularly good match; add
a separate text while keeping the same class.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: rebased for Error changes
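
The mechanism the patch relies on is that each QERR_* macro expands to
two arguments, an error class followed by a format string, so a new
message can share an existing class.  A self-contained sketch of that
idea follows; every Demo*/DEMO_*/demo_* name is a stand-in invented for
this example and none of them is QEMU's actual Error API.  Only the two
message strings are taken from the diff below.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Illustrative stand-ins for the error classes. */
typedef enum {
    DEMO_CLASS_GENERIC_ERROR,
    DEMO_CLASS_DEVICE_NOT_ACTIVE
} DemoErrorClass;

typedef struct {
    DemoErrorClass err_class;
    char msg[128];
} DemoError;

/* Each macro expands to TWO arguments: the class and the format string. */
#define DEMO_QERR_BASE_NOT_FOUND \
    DEMO_CLASS_GENERIC_ERROR, "Base '%s' not found"
#define DEMO_QERR_BLOCK_JOB_NOT_ACTIVE \
    DEMO_CLASS_DEVICE_NOT_ACTIVE, "No active block job on device '%s'"

/* Record the class and format the human-readable message. */
static void demo_error_set(DemoError *err, DemoErrorClass err_class,
                           const char *fmt, ...)
{
    va_list ap;

    err->err_class = err_class;
    va_start(ap, fmt);
    vsnprintf(err->msg, sizeof(err->msg), fmt, ap);
    va_end(ap);
}
```

A call such as demo_error_set(&err, DEMO_QERR_BLOCK_JOB_NOT_ACTIVE,
"virtio0") keeps the DeviceNotActive class for clients that match on it,
while the message text now describes the missing job.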

 blockdev.c | 4 ++--
 qerror.h   | 3 +++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index e5d450f..de5457d 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1142,7 +1142,7 @@ void qmp_block_job_set_speed(const char *device, int64_t speed, Error **errp)
     BlockJob *job = find_block_job(device);
 
     if (!job) {
-        error_set(errp, QERR_DEVICE_NOT_ACTIVE, device);
+        error_set(errp, QERR_BLOCK_JOB_NOT_ACTIVE, device);
         return;
     }
 
@@ -1154,7 +1154,7 @@ void qmp_block_job_cancel(const char *device, Error **errp)
     BlockJob *job = find_block_job(device);
 
     if (!job) {
-        error_set(errp, QERR_DEVICE_NOT_ACTIVE, device);
+        error_set(errp, QERR_BLOCK_JOB_NOT_ACTIVE, device);
         return;
     }
 
diff --git a/qerror.h b/qerror.h
index d0a76a4..485c773 100644
--- a/qerror.h
+++ b/qerror.h
@@ -48,6 +48,9 @@ void assert_no_error(Error *err);
 #define QERR_BASE_NOT_FOUND \
     ERROR_CLASS_GENERIC_ERROR, "Base '%s' not found"
 
+#define QERR_BLOCK_JOB_NOT_ACTIVE \
+    ERROR_CLASS_DEVICE_NOT_ACTIVE, "No active block job on device '%s'"
+
 #define QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED \
     ERROR_CLASS_GENERIC_ERROR, "Block format '%s' used by device '%s' does not support feature '%s'"
 
-- 
1.7.12


* [Qemu-devel] [PATCH v2 02/45] blockdev: rename block_stream_cb to a generic block_job_cb
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 01/45] qerror/block: introduce QERR_BLOCK_JOB_NOT_ACTIVE Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-27 11:56   ` Kevin Wolf
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 03/45] block: fix documentation of block_job_cancel_sync Paolo Bonzini
                   ` (43 subsequent siblings)
  45 siblings, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

From: Jeff Cody <jcody@redhat.com>

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
        v1->v2: now synced with Jeff's live commit patch, and moved towards
        the beginning of the series to minimize conflicts

 blockdev.c   | 6 +++---
 trace-events | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index de5457d..fa338fb 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1065,12 +1065,12 @@ static QObject *qobject_from_block_job(BlockJob *job)
                               job->speed);
 }
 
-static void block_stream_cb(void *opaque, int ret)
+static void block_job_cb(void *opaque, int ret)
 {
     BlockDriverState *bs = opaque;
     QObject *obj;
 
-    trace_block_stream_cb(bs, bs->job, ret);
+    trace_block_job_cb(bs, bs->job, ret);
 
     assert(bs->job);
     obj = qobject_from_block_job(bs->job);
@@ -1112,7 +1112,7 @@ void qmp_block_stream(const char *device, bool has_base,
     }
 
     stream_start(bs, base_bs, base, has_speed ? speed : 0,
-                 block_stream_cb, bs, &local_err);
+                 block_job_cb, bs, &local_err);
     if (error_is_set(&local_err)) {
         error_propagate(errp, local_err);
         return;
diff --git a/trace-events b/trace-events
index f5b5097..e90de71 100644
--- a/trace-events
+++ b/trace-events
@@ -77,7 +77,7 @@ stream_start(void *bs, void *base, void *s, void *co, void *opaque) "bs %p base
 
 # blockdev.c
 qmp_block_job_cancel(void *job) "job %p"
-block_stream_cb(void *bs, void *job, int ret) "bs %p job %p ret %d"
+block_job_cb(void *bs, void *job, int ret) "bs %p job %p ret %d"
 qmp_block_stream(void *bs, void *job) "bs %p job %p"
 
 # hw/virtio-blk.c
-- 
1.7.12


* [Qemu-devel] [PATCH v2 03/45] block: fix documentation of block_job_cancel_sync
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 01/45] qerror/block: introduce QERR_BLOCK_JOB_NOT_ACTIVE Paolo Bonzini
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 02/45] blockdev: rename block_stream_cb to a generic block_job_cb Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-27 12:03   ` Kevin Wolf
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 04/45] block: move job APIs to separate files Paolo Bonzini
                   ` (42 subsequent siblings)
  45 siblings, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

Do this in a separate commit before we move the functions to
blockjob.h.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: split out of the next patch
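
For readers unfamiliar with the function whose comment is being fixed,
the "cancel and wait" pattern can be modelled roughly as below: the
caller swaps in its own completion callback, requests cancellation, and
spins the event loop until that callback has stored a result.  This is a
toy model under stated assumptions, not QEMU code: the Demo*/demo_*
names are hypothetical, and demo_event_loop_step merely stands in for
one iteration of the real event loop.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

typedef void DemoCompletionFunc(void *opaque, int ret);

typedef struct DemoJob {
    DemoCompletionFunc *cb;   /* completion callback */
    void *opaque;             /* argument passed to cb */
    bool cancelled;           /* set when cancellation is requested */
    int pending_steps;        /* loop iterations until the job finishes */
} DemoJob;

/* Toy stand-in for one event loop iteration. */
static void demo_event_loop_step(DemoJob *job)
{
    if (job->pending_steps > 0 && --job->pending_steps == 0) {
        job->cb(job->opaque, 0);
    }
}

typedef struct {
    DemoJob *job;
    DemoCompletionFunc *cb;
    void *opaque;
    bool cancelled;
    int ret;
} DemoCancelData;

/* Store the result, then chain to the caller's original callback. */
static void demo_cancel_cb(void *opaque, int ret)
{
    DemoCancelData *data = opaque;

    data->cancelled = data->job->cancelled;
    data->ret = ret;
    data->cb(data->opaque, ret);
}

static int demo_job_cancel_sync(DemoJob *job)
{
    DemoCancelData data = {
        .job = job, .cb = job->cb, .opaque = job->opaque,
        .ret = -EINPROGRESS,
    };

    assert(job->pending_steps > 0);   /* otherwise the loop never ends */
    job->cb = demo_cancel_cb;
    job->opaque = &data;
    job->cancelled = true;
    while (data.ret == -EINPROGRESS) {
        demo_event_loop_step(job);
    }
    return (data.cancelled && data.ret == 0) ? -ECANCELED : data.ret;
}

/* Example original callback: count how many times it was invoked. */
static void demo_count_cb(void *opaque, int ret)
{
    (void)ret;
    ++*(int *)opaque;
}
```

In this model the wait itself is synchronous, but the original callback
still runs from inside the event loop rather than directly from the
caller, which is the distinction the corrected comment draws.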

 block_int.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block_int.h b/block_int.h
index ac4245c..7bb95b7 100644
--- a/block_int.h
+++ b/block_int.h
@@ -425,10 +425,10 @@ void block_job_cancel(BlockJob *job);
 bool block_job_is_cancelled(BlockJob *job);
 
 /**
- * block_job_cancel:
+ * block_job_cancel_sync:
  * @job: The job to be canceled.
  *
- * Asynchronously cancel the job and wait for it to reach a quiescent
+ * Synchronously cancel the job and wait for it to reach a quiescent
  * state.  Note that the completion callback will still be called
  * asynchronously, hence it is *not* valid to call #bdrv_delete
  * immediately after #block_job_cancel_sync.  Users of block jobs
-- 
1.7.12


* [Qemu-devel] [PATCH v2 04/45] block: move job APIs to separate files
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (2 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 03/45] block: fix documentation of block_job_cancel_sync Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 05/45] block: add block_job_query Paolo Bonzini
                   ` (41 subsequent siblings)
  45 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Makefile.objs       |   5 +-
 block.c             | 128 +-----------------------------------
 block.h             |   2 +
 block/Makefile.objs |   3 +-
 block/stream.c      |   1 +
 block_int.h         | 153 -------------------------------------------
 blockdev.c          |   1 +
 blockjob.c          | 163 ++++++++++++++++++++++++++++++++++++++++++++++
 blockjob.h          | 182 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 9 files changed, 355 insertions(+), 283 deletions(-)
 create mode 100644 blockjob.c
 create mode 100644 blockjob.h

diff --git a/Makefile.objs b/Makefile.objs
index 4412757..9401516 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -42,7 +42,8 @@ coroutine-obj-$(CONFIG_WIN32) += coroutine-win32.o
 # block-obj-y is code used by both qemu system emulation and qemu-img
 
 block-obj-y = cutils.o iov.o cache-utils.o qemu-option.o module.o async.o
-block-obj-y += nbd.o block.o aio.o aes.o qemu-config.o qemu-progress.o qemu-sockets.o
+block-obj-y += nbd.o block.o blockjob.o aio.o aes.o qemu-config.o
+block-obj-y += qemu-progress.o qemu-sockets.o
 block-obj-y += $(coroutine-obj-y) $(qobject-obj-y) $(version-obj-y)
 block-obj-$(CONFIG_POSIX) += posix-aio-compat.o
 block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
@@ -59,7 +60,7 @@ endif
 # suppress *all* target specific code in case of system emulation, i.e. a
 # single QEMU executable should support all CPUs and machines.
 
-common-obj-y = $(block-obj-y) blockdev.o
+common-obj-y = $(block-obj-y) blockdev.o block/
 common-obj-y += net.o net/
 common-obj-y += qom/
 common-obj-y += readline.o console.o cursor.o
diff --git a/block.c b/block.c
index 751ebdc..a4816ad 100644
--- a/block.c
+++ b/block.c
@@ -26,6 +26,7 @@
 #include "trace.h"
 #include "monitor.h"
 #include "block_int.h"
+#include "blockjob.h"
 #include "module.h"
 #include "qjson.h"
 #include "qemu-coroutine.h"
@@ -4247,130 +4248,3 @@ out:
 
     return ret;
 }
-
-void *block_job_create(const BlockJobType *job_type, BlockDriverState *bs,
-                       int64_t speed, BlockDriverCompletionFunc *cb,
-                       void *opaque, Error **errp)
-{
-    BlockJob *job;
-
-    if (bs->job || bdrv_in_use(bs)) {
-        error_set(errp, QERR_DEVICE_IN_USE, bdrv_get_device_name(bs));
-        return NULL;
-    }
-    bdrv_set_in_use(bs, 1);
-
-    job = g_malloc0(job_type->instance_size);
-    job->job_type      = job_type;
-    job->bs            = bs;
-    job->cb            = cb;
-    job->opaque        = opaque;
-    job->busy          = true;
-    bs->job = job;
-
-    /* Only set speed when necessary to avoid NotSupported error */
-    if (speed != 0) {
-        Error *local_err = NULL;
-
-        block_job_set_speed(job, speed, &local_err);
-        if (error_is_set(&local_err)) {
-            bs->job = NULL;
-            g_free(job);
-            bdrv_set_in_use(bs, 0);
-            error_propagate(errp, local_err);
-            return NULL;
-        }
-    }
-    return job;
-}
-
-void block_job_complete(BlockJob *job, int ret)
-{
-    BlockDriverState *bs = job->bs;
-
-    assert(bs->job == job);
-    job->cb(job->opaque, ret);
-    bs->job = NULL;
-    g_free(job);
-    bdrv_set_in_use(bs, 0);
-}
-
-void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
-{
-    Error *local_err = NULL;
-
-    if (!job->job_type->set_speed) {
-        error_set(errp, QERR_NOT_SUPPORTED);
-        return;
-    }
-    job->job_type->set_speed(job, speed, &local_err);
-    if (error_is_set(&local_err)) {
-        error_propagate(errp, local_err);
-        return;
-    }
-
-    job->speed = speed;
-}
-
-void block_job_cancel(BlockJob *job)
-{
-    job->cancelled = true;
-    if (job->co && !job->busy) {
-        qemu_coroutine_enter(job->co, NULL);
-    }
-}
-
-bool block_job_is_cancelled(BlockJob *job)
-{
-    return job->cancelled;
-}
-
-struct BlockCancelData {
-    BlockJob *job;
-    BlockDriverCompletionFunc *cb;
-    void *opaque;
-    bool cancelled;
-    int ret;
-};
-
-static void block_job_cancel_cb(void *opaque, int ret)
-{
-    struct BlockCancelData *data = opaque;
-
-    data->cancelled = block_job_is_cancelled(data->job);
-    data->ret = ret;
-    data->cb(data->opaque, ret);
-}
-
-int block_job_cancel_sync(BlockJob *job)
-{
-    struct BlockCancelData data;
-    BlockDriverState *bs = job->bs;
-
-    assert(bs->job == job);
-
-    /* Set up our own callback to store the result and chain to
-     * the original callback.
-     */
-    data.job = job;
-    data.cb = job->cb;
-    data.opaque = job->opaque;
-    data.ret = -EINPROGRESS;
-    job->cb = block_job_cancel_cb;
-    job->opaque = &data;
-    block_job_cancel(job);
-    while (data.ret == -EINPROGRESS) {
-        qemu_aio_wait();
-    }
-    return (data.cancelled && data.ret == 0) ? -ECANCELED : data.ret;
-}
-
-void block_job_sleep_ns(BlockJob *job, QEMUClock *clock, int64_t ns)
-{
-    /* Check cancellation *before* setting busy = false, too!  */
-    if (!block_job_is_cancelled(job)) {
-        job->busy = false;
-        co_sleep_ns(clock, ns);
-        job->busy = true;
-    }
-}
diff --git a/block.h b/block.h
index b1095d8..bd788e0 100644
--- a/block.h
+++ b/block.h
@@ -6,9 +6,11 @@
 #include "qemu-option.h"
 #include "qemu-coroutine.h"
 #include "qobject.h"
+#include "qapi-types.h"
 
 /* block.c */
 typedef struct BlockDriver BlockDriver;
+typedef struct BlockJob BlockJob;
 
 typedef struct BlockDriverInfo {
     /* in bytes, 0 if irrelevant */
diff --git a/block/Makefile.objs b/block/Makefile.objs
index b5754d3..c45affc 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -3,9 +3,10 @@ block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-c
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
 block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
-block-obj-y += stream.o
 block-obj-$(CONFIG_WIN32) += raw-win32.o
 block-obj-$(CONFIG_POSIX) += raw-posix.o
 block-obj-$(CONFIG_LIBISCSI) += iscsi.o
 block-obj-$(CONFIG_CURL) += curl.o
 block-obj-$(CONFIG_RBD) += rbd.o
+
+common-obj-y += stream.o
diff --git a/block/stream.c b/block/stream.c
index c4f87dd..57e4be7 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -13,6 +13,7 @@
 
 #include "trace.h"
 #include "block_int.h"
+#include "blockjob.h"
 #include "qemu/ratelimit.h"
 
 enum {
diff --git a/block_int.h b/block_int.h
index 7bb95b7..0da1067 100644
--- a/block_int.h
+++ b/block_int.h
@@ -67,73 +67,6 @@ typedef struct BlockIOBaseValue {
     uint64_t ios[2];
 } BlockIOBaseValue;
 
-typedef struct BlockJob BlockJob;
-
-/**
- * BlockJobType:
- *
- * A class type for block job objects.
- */
-typedef struct BlockJobType {
-    /** Derived BlockJob struct size */
-    size_t instance_size;
-
-    /** String describing the operation, part of query-block-jobs QMP API */
-    const char *job_type;
-
-    /** Optional callback for job types that support setting a speed limit */
-    void (*set_speed)(BlockJob *job, int64_t speed, Error **errp);
-} BlockJobType;
-
-/**
- * BlockJob:
- *
- * Long-running operation on a BlockDriverState.
- */
-struct BlockJob {
-    /** The job type, including the job vtable.  */
-    const BlockJobType *job_type;
-
-    /** The block device on which the job is operating.  */
-    BlockDriverState *bs;
-
-    /**
-     * The coroutine that executes the job.  If not NULL, it is
-     * reentered when busy is false and the job is cancelled.
-     */
-    Coroutine *co;
-
-    /**
-     * Set to true if the job should cancel itself.  The flag must
-     * always be tested just before toggling the busy flag from false
-     * to true.  After a job has been cancelled, it should only yield
-     * if #qemu_aio_wait will ("sooner or later") reenter the coroutine.
-     */
-    bool cancelled;
-
-    /**
-     * Set to false by the job while it is in a quiescent state, where
-     * no I/O is pending and the job has yielded on any condition
-     * that is not detected by #qemu_aio_wait, such as a timer.
-     */
-    bool busy;
-
-    /** Offset that is published by the query-block-jobs QMP API */
-    int64_t offset;
-
-    /** Length that is published by the query-block-jobs QMP API */
-    int64_t len;
-
-    /** Speed that was set with @block_job_set_speed.  */
-    int64_t speed;
-
-    /** The completion function that will be called when the job completes.  */
-    BlockDriverCompletionFunc *cb;
-
-    /** The opaque value that is passed to the completion function.  */
-    void *opaque;
-};
-
 struct BlockDriver {
     const char *format_name;
     int instance_size;
@@ -355,92 +288,6 @@ int is_windows_drive(const char *filename);
 #endif
 
 /**
- * block_job_create:
- * @job_type: The class object for the newly-created job.
- * @bs: The block
- * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
- * @cb: Completion function for the job.
- * @opaque: Opaque pointer value passed to @cb.
- * @errp: Error object.
- *
- * Create a new long-running block device job and return it.  The job
- * will call @cb asynchronously when the job completes.  Note that
- * @bs may have been closed at the time the @cb it is called.  If
- * this is the case, the job may be reported as either cancelled or
- * completed.
- *
- * This function is not part of the public job interface; it should be
- * called from a wrapper that is specific to the job type.
- */
-void *block_job_create(const BlockJobType *job_type, BlockDriverState *bs,
-                       int64_t speed, BlockDriverCompletionFunc *cb,
-                       void *opaque, Error **errp);
-
-/**
- * block_job_sleep_ns:
- * @job: The job that calls the function.
- * @clock: The clock to sleep on.
- * @ns: How many nanoseconds to stop for.
- *
- * Put the job to sleep (assuming that it wasn't canceled) for @ns
- * nanoseconds.  Canceling the job will interrupt the wait immediately.
- */
-void block_job_sleep_ns(BlockJob *job, QEMUClock *clock, int64_t ns);
-
-/**
- * block_job_complete:
- * @job: The job being completed.
- * @ret: The status code.
- *
- * Call the completion function that was registered at creation time, and
- * free @job.
- */
-void block_job_complete(BlockJob *job, int ret);
-
-/**
- * block_job_set_speed:
- * @job: The job to set the speed for.
- * @speed: The new value
- * @errp: Error object.
- *
- * Set a rate-limiting parameter for the job; the actual meaning may
- * vary depending on the job type.
- */
-void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp);
-
-/**
- * block_job_cancel:
- * @job: The job to be canceled.
- *
- * Asynchronously cancel the specified job.
- */
-void block_job_cancel(BlockJob *job);
-
-/**
- * block_job_is_cancelled:
- * @job: The job being queried.
- *
- * Returns whether the job is scheduled for cancellation.
- */
-bool block_job_is_cancelled(BlockJob *job);
-
-/**
- * block_job_cancel_sync:
- * @job: The job to be canceled.
- *
- * Synchronously cancel the job and wait for it to reach a quiescent
- * state.  Note that the completion callback will still be called
- * asynchronously, hence it is *not* valid to call #bdrv_delete
- * immediately after #block_job_cancel_sync.  Users of block jobs
- * will usually protect the BlockDriverState objects with a reference
- * count, should this be a concern.
- *
- * Returns the return value from the job if the job actually completed
- * during the call, or -ECANCELED if it was canceled.
- */
-int block_job_cancel_sync(BlockJob *job);
-
-/**
  * stream_start:
  * @bs: Block device to operate on.
  * @base: Block device that will become the new base, or %NULL to
diff --git a/blockdev.c b/blockdev.c
index fa338fb..7ab7d5e 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -9,6 +9,7 @@
 
 #include "blockdev.h"
 #include "hw/block-common.h"
+#include "blockjob.h"
 #include "monitor.h"
 #include "qerror.h"
 #include "qemu-option.h"
diff --git a/blockjob.c b/blockjob.c
new file mode 100644
index 0000000..9737a43
--- /dev/null
+++ b/blockjob.c
@@ -0,0 +1,163 @@
+/*
+ * QEMU System Emulator block driver
+ *
+ * Copyright (c) 2011 IBM Corp.
+ * Copyright (c) 2012 Red Hat, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "config-host.h"
+#include "qemu-common.h"
+#include "trace.h"
+#include "monitor.h"
+#include "block.h"
+#include "blockjob.h"
+#include "block_int.h"
+#include "qjson.h"
+#include "qemu-coroutine.h"
+#include "qmp-commands.h"
+#include "qemu-timer.h"
+
+void *block_job_create(const BlockJobType *job_type, BlockDriverState *bs,
+                       int64_t speed, BlockDriverCompletionFunc *cb,
+                       void *opaque, Error **errp)
+{
+    BlockJob *job;
+
+    if (bs->job || bdrv_in_use(bs)) {
+        error_set(errp, QERR_DEVICE_IN_USE, bdrv_get_device_name(bs));
+        return NULL;
+    }
+    bdrv_set_in_use(bs, 1);
+
+    job = g_malloc0(job_type->instance_size);
+    job->job_type      = job_type;
+    job->bs            = bs;
+    job->cb            = cb;
+    job->opaque        = opaque;
+    job->busy          = true;
+    bs->job = job;
+
+    /* Only set speed when necessary to avoid NotSupported error */
+    if (speed != 0) {
+        Error *local_err = NULL;
+
+        block_job_set_speed(job, speed, &local_err);
+        if (error_is_set(&local_err)) {
+            bs->job = NULL;
+            g_free(job);
+            bdrv_set_in_use(bs, 0);
+            error_propagate(errp, local_err);
+            return NULL;
+        }
+    }
+    return job;
+}
+
+void block_job_complete(BlockJob *job, int ret)
+{
+    BlockDriverState *bs = job->bs;
+
+    assert(bs->job == job);
+    job->cb(job->opaque, ret);
+    bs->job = NULL;
+    g_free(job);
+    bdrv_set_in_use(bs, 0);
+}
+
+void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
+{
+    Error *local_err = NULL;
+
+    if (!job->job_type->set_speed) {
+        error_set(errp, QERR_NOT_SUPPORTED);
+        return;
+    }
+    job->job_type->set_speed(job, speed, &local_err);
+    if (error_is_set(&local_err)) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    job->speed = speed;
+}
+
+void block_job_cancel(BlockJob *job)
+{
+    job->cancelled = true;
+    if (job->co && !job->busy) {
+        qemu_coroutine_enter(job->co, NULL);
+    }
+}
+
+bool block_job_is_cancelled(BlockJob *job)
+{
+    return job->cancelled;
+}
+
+struct BlockCancelData {
+    BlockJob *job;
+    BlockDriverCompletionFunc *cb;
+    void *opaque;
+    bool cancelled;
+    int ret;
+};
+
+static void block_job_cancel_cb(void *opaque, int ret)
+{
+    struct BlockCancelData *data = opaque;
+
+    data->cancelled = block_job_is_cancelled(data->job);
+    data->ret = ret;
+    data->cb(data->opaque, ret);
+}
+
+int block_job_cancel_sync(BlockJob *job)
+{
+    struct BlockCancelData data;
+    BlockDriverState *bs = job->bs;
+
+    assert(bs->job == job);
+
+    /* Set up our own callback to store the result and chain to
+     * the original callback.
+     */
+    data.job = job;
+    data.cb = job->cb;
+    data.opaque = job->opaque;
+    data.ret = -EINPROGRESS;
+    job->cb = block_job_cancel_cb;
+    job->opaque = &data;
+    block_job_cancel(job);
+    while (data.ret == -EINPROGRESS) {
+        qemu_aio_wait();
+    }
+    return (data.cancelled && data.ret == 0) ? -ECANCELED : data.ret;
+}
+
+void block_job_sleep_ns(BlockJob *job, QEMUClock *clock, int64_t ns)
+{
+    /* Check cancellation *before* setting busy = false, too!  */
+    if (!block_job_is_cancelled(job)) {
+        job->busy = false;
+        co_sleep_ns(clock, ns);
+        job->busy = true;
+    }
+}
diff --git a/blockjob.h b/blockjob.h
new file mode 100644
index 0000000..559518a
--- /dev/null
+++ b/blockjob.h
@@ -0,0 +1,181 @@
+/*
+ * Declarations for long-running block device operations
+ *
+ * Copyright (c) 2011 IBM Corp.
+ * Copyright (c) 2012 Red Hat, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+#ifndef BLOCKJOB_H
+#define BLOCKJOB_H 1
+
+#include "block.h"
+
+/**
+ * BlockJobType:
+ *
+ * A class type for block job objects.
+ */
+typedef struct BlockJobType {
+    /** Derived BlockJob struct size */
+    size_t instance_size;
+
+    /** String describing the operation, part of query-block-jobs QMP API */
+    const char *job_type;
+
+    /** Optional callback for job types that support setting a speed limit */
+    void (*set_speed)(BlockJob *job, int64_t speed, Error **errp);
+} BlockJobType;
+
+/**
+ * BlockJob:
+ *
+ * Long-running operation on a BlockDriverState.
+ */
+struct BlockJob {
+    /** The job type, including the job vtable.  */
+    const BlockJobType *job_type;
+
+    /** The block device on which the job is operating.  */
+    BlockDriverState *bs;
+
+    /**
+     * The coroutine that executes the job.  If not NULL, it is
+     * reentered when busy is false and the job is cancelled.
+     */
+    Coroutine *co;
+
+    /**
+     * Set to true if the job should cancel itself.  The flag must
+     * always be tested just before toggling the busy flag from false
+     * to true.  After a job has been cancelled, it should only yield
+     * if #qemu_aio_wait will ("sooner or later") reenter the coroutine.
+     */
+    bool cancelled;
+
+    /**
+     * Set to false by the job while it is in a quiescent state, where
+     * no I/O is pending and the job has yielded on any condition
+     * that is not detected by #qemu_aio_wait, such as a timer.
+     */
+    bool busy;
+
+    /** Offset that is published by the query-block-jobs QMP API */
+    int64_t offset;
+
+    /** Length that is published by the query-block-jobs QMP API */
+    int64_t len;
+
+    /** Speed that was set with @block_job_set_speed.  */
+    int64_t speed;
+
+    /** The completion function that will be called when the job completes.  */
+    BlockDriverCompletionFunc *cb;
+
+    /** The opaque value that is passed to the completion function.  */
+    void *opaque;
+};
+
+/**
+ * block_job_create:
+ * @job_type: The class object for the newly-created job.
+ * @bs: The block device on which the job operates.
+ * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
+ * @cb: Completion function for the job.
+ * @opaque: Opaque pointer value passed to @cb.
+ * @errp: Error object.
+ *
+ * Create a new long-running block device job and return it.  The job
+ * will call @cb asynchronously when the job completes.  Note that
+ * @bs may have been closed at the time @cb is called.  If
+ * this is the case, the job may be reported as either cancelled or
+ * completed.
+ *
+ * This function is not part of the public job interface; it should be
+ * called from a wrapper that is specific to the job type.
+ */
+void *block_job_create(const BlockJobType *job_type, BlockDriverState *bs,
+                       int64_t speed, BlockDriverCompletionFunc *cb,
+                       void *opaque, Error **errp);
+
+/**
+ * block_job_sleep_ns:
+ * @job: The job that calls the function.
+ * @clock: The clock to sleep on.
+ * @ns: How many nanoseconds to stop for.
+ *
+ * Put the job to sleep (assuming that it wasn't canceled) for @ns
+ * nanoseconds.  Canceling the job will interrupt the wait immediately.
+ */
+void block_job_sleep_ns(BlockJob *job, QEMUClock *clock, int64_t ns);
+
+/**
+ * block_job_complete:
+ * @job: The job being completed.
+ * @ret: The status code.
+ *
+ * Call the completion function that was registered at creation time, and
+ * free @job.
+ */
+void block_job_complete(BlockJob *job, int ret);
+
+/**
+ * block_job_set_speed:
+ * @job: The job to set the speed for.
+ * @speed: The new value
+ * @errp: Error object.
+ *
+ * Set a rate-limiting parameter for the job; the actual meaning may
+ * vary depending on the job type.
+ */
+void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp);
+
+/**
+ * block_job_cancel:
+ * @job: The job to be canceled.
+ *
+ * Asynchronously cancel the specified job.
+ */
+void block_job_cancel(BlockJob *job);
+
+/**
+ * block_job_is_cancelled:
+ * @job: The job being queried.
+ *
+ * Returns whether the job is scheduled for cancellation.
+ */
+bool block_job_is_cancelled(BlockJob *job);
+
+/**
+ * block_job_cancel_sync:
+ * @job: The job to be canceled.
+ *
+ * Synchronously cancel the job and wait for it to reach a quiescent
+ * state.  Note that the completion callback will still be called
+ * asynchronously, hence it is *not* valid to call #bdrv_delete
+ * immediately after #block_job_cancel_sync.  Users of block jobs
+ * will usually protect the BlockDriverState objects with a reference
+ * count, should this be a concern.
+ *
+ * Returns the return value from the job if the job actually completed
+ * during the call, or -ECANCELED if it was canceled.
+ */
+int block_job_cancel_sync(BlockJob *job);
+
+#endif
-- 
1.7.12
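
The callback interposition that block_job_cancel_sync relies on can be
sketched outside QEMU as below.  This is only an illustration under stated
assumptions: every toy_* name and Toy* type is hypothetical, the body of the
interposed callback is assumed rather than quoted from this patch, and the
qemu_aio_wait() loop is replaced by an immediate, simulated completion.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the QEMU types; not the real definitions. */
typedef void ToyCompletionFunc(void *opaque, int ret);

typedef struct ToyJob {
    bool cancelled;
    ToyCompletionFunc *cb;
    void *opaque;
} ToyJob;

typedef struct ToyCancelData {
    ToyJob *job;
    ToyCompletionFunc *cb;   /* the caller's original callback */
    void *opaque;            /* the caller's original opaque value */
    bool cancelled;
    int ret;
} ToyCancelData;

/* Interposed callback: record the outcome, then chain to the original. */
static void toy_cancel_cb(void *opaque, int ret)
{
    ToyCancelData *data = opaque;

    data->cancelled = data->job->cancelled;
    data->ret = ret;
    data->cb(data->opaque, ret);
}

/* Stand-in for the job finishing: invoke whatever callback is installed. */
static void toy_job_completed(ToyJob *job, int ret)
{
    job->cb(job->opaque, ret);
}

/* job_ret is the status the simulated job finishes with. */
static int toy_cancel_sync(ToyJob *job, int job_ret)
{
    ToyCancelData data;

    data.job = job;
    data.cb = job->cb;
    data.opaque = job->opaque;
    data.cancelled = false;
    data.ret = -EINPROGRESS;
    job->cb = toy_cancel_cb;
    job->opaque = &data;
    job->cancelled = true;

    /* The patch loops on qemu_aio_wait() here; the toy completes at once. */
    toy_job_completed(job, job_ret);

    return (data.cancelled && data.ret == 0) ? -ECANCELED : data.ret;
}

/* Example original callback: counts how many times it was invoked. */
static void toy_count_cb(void *opaque, int ret)
{
    (void)ret;
    ++*(int *)opaque;
}
```

The original callback still runs exactly once, which is why the caller
must not assume the job's resources are gone before it has been invoked.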


* [Qemu-devel] [PATCH v2 05/45] block: add block_job_query
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (3 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 04/45] block: move job APIs to separate files Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 06/45] block: add support for job pause/resume Paolo Bonzini
                   ` (40 subsequent siblings)
  45 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

Extract it out of the implementation of info block-jobs.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: use g_new0.

 blockdev.c | 15 ++-------------
 blockjob.c | 11 +++++++++++
 blockjob.h |  8 ++++++++
 3 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 7ab7d5e..5772c11 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1169,19 +1169,8 @@ static void do_qmp_query_block_jobs_one(void *opaque, BlockDriverState *bs)
     BlockJob *job = bs->job;
 
     if (job) {
-        BlockJobInfoList *elem;
-        BlockJobInfo *info = g_new(BlockJobInfo, 1);
-        *info = (BlockJobInfo){
-            .type   = g_strdup(job->job_type->job_type),
-            .device = g_strdup(bdrv_get_device_name(bs)),
-            .len    = job->len,
-            .offset = job->offset,
-            .speed  = job->speed,
-        };
-
-        elem = g_new0(BlockJobInfoList, 1);
-        elem->value = info;
-
+        BlockJobInfoList *elem = g_new0(BlockJobInfoList, 1);
+        elem->value = block_job_query(bs->job);
         (*prev)->next = elem;
         *prev = elem;
     }
diff --git a/blockjob.c b/blockjob.c
index 9737a43..dea63f8 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -161,3 +161,14 @@ void block_job_sleep_ns(BlockJob *job, QEMUClock *clock, int64_t ns)
         job->busy = true;
     }
 }
+
+BlockJobInfo *block_job_query(BlockJob *job)
+{
+    BlockJobInfo *info = g_new0(BlockJobInfo, 1);
+    info->type   = g_strdup(job->job_type->job_type);
+    info->device = g_strdup(bdrv_get_device_name(job->bs));
+    info->len    = job->len;
+    info->offset = job->offset;
+    info->speed  = job->speed;
+    return info;
+}
diff --git a/blockjob.h b/blockjob.h
index 559518a..c6af0fb 100644
--- a/blockjob.h
+++ b/blockjob.h
@@ -163,6 +163,14 @@ void block_job_cancel(BlockJob *job);
 bool block_job_is_cancelled(BlockJob *job);
 
 /**
+ * block_job_query:
+ * @job: The job to get information about.
+ *
+ * Return information about a job.
+ */
+BlockJobInfo *block_job_query(BlockJob *job);
+
+/**
  * block_job_cancel_sync:
  * @job: The job to be canceled.
  *
-- 
1.7.12


* [Qemu-devel] [PATCH v2 06/45] block: add support for job pause/resume
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (4 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 05/45] block: add block_job_query Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-26 17:31   ` Eric Blake
  2012-09-27 12:18   ` Kevin Wolf
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 07/45] qmp: add block-job-pause and block-job-resume Paolo Bonzini
                   ` (39 subsequent siblings)
  45 siblings, 2 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

Job pausing reuses the existing support for cancellable sleeps.  A pause
happens at the next sleeping point and lasts until the coroutine is
re-entered explicitly.  Cancellation was already doing a forced resume,
so implement it explicitly in terms of resume.

Paused jobs cannot be canceled without first resuming them.  This ensures
that I/O errors are never missed by management.
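
As a rough model of the flag handling described above (not the QEMU code
itself), the following sketch replaces the coroutine with plain state
changes.  All pause_job_* names and the SleepOutcome enum are hypothetical;
the "re-enter" step stands in for qemu_coroutine_enter().

```c
#include <assert.h>
#include <stdbool.h>

/* What a sleeping point ends up doing in this toy model. */
typedef enum {
    SLEEP_SKIPPED,   /* job was cancelled: do not sleep at all */
    SLEEP_TIMED,     /* ordinary timed sleep */
    SLEEP_PAUSED     /* yield until explicitly re-entered */
} SleepOutcome;

typedef struct PauseJob {
    bool cancelled;
    bool paused;
    bool busy;
} PauseJob;

/* Pausing only sets a flag; it takes effect at the next sleeping point. */
static void pause_job_pause(PauseJob *job)
{
    job->paused = true;
}

/* Models the decision taken at a sleeping point. */
static SleepOutcome pause_job_sleep_point(PauseJob *job)
{
    assert(job->busy);

    /* Cancellation is checked before busy is cleared. */
    if (job->cancelled) {
        return SLEEP_SKIPPED;
    }
    job->busy = false;
    return job->paused ? SLEEP_PAUSED : SLEEP_TIMED;
}

/* Resume clears the flag and re-enters a quiescent job.
 * Returns true if the job was re-entered. */
static bool pause_job_resume(PauseJob *job)
{
    job->paused = false;
    if (!job->busy) {
        job->busy = true;   /* stand-in for re-entering the coroutine */
        return true;
    }
    return false;
}

/* Cancellation is expressed in terms of resume. */
static bool pause_job_cancel(PauseJob *job)
{
    job->cancelled = true;
    return pause_job_resume(job);
}
```

In this model a busy job that is asked to pause keeps running until it
reaches pause_job_sleep_point(), which matches "a pause happens at the next
sleeping point".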

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: rebased for Error changes

 blockdev.c       |  4 ++++
 blockjob.c       | 35 ++++++++++++++++++++++++++++++-----
 blockjob.h       | 30 ++++++++++++++++++++++++++++++
 qapi-schema.json |  4 +++-
 qerror.h         |  3 +++
 5 files changed, 70 insertions(+), 6 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 5772c11..df0c449 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1158,6 +1158,10 @@ void qmp_block_job_cancel(const char *device, Error **errp)
         error_set(errp, QERR_BLOCK_JOB_NOT_ACTIVE, device);
         return;
     }
+    if (job->paused) {
+        error_set(errp, QERR_BLOCK_JOB_PAUSED, device);
+        return;
+    }
 
     trace_qmp_block_job_cancel(job);
     block_job_cancel(job);
diff --git a/blockjob.c b/blockjob.c
index dea63f8..6c65521 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -99,14 +99,30 @@ void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
     job->speed = speed;
 }
 
-void block_job_cancel(BlockJob *job)
+void block_job_pause(BlockJob *job)
 {
-    job->cancelled = true;
+    job->paused = true;
+}
+
+bool block_job_is_paused(BlockJob *job)
+{
+    return job->paused;
+}
+
+void block_job_resume(BlockJob *job)
+{
+    job->paused = false;
     if (job->co && !job->busy) {
         qemu_coroutine_enter(job->co, NULL);
     }
 }
 
+void block_job_cancel(BlockJob *job)
+{
+    job->cancelled = true;
+    block_job_resume(job);
+}
+
 bool block_job_is_cancelled(BlockJob *job)
 {
     return job->cancelled;
@@ -154,12 +170,20 @@ int block_job_cancel_sync(BlockJob *job)
 
 void block_job_sleep_ns(BlockJob *job, QEMUClock *clock, int64_t ns)
 {
+    assert(job->busy);
+
     /* Check cancellation *before* setting busy = false, too!  */
-    if (!block_job_is_cancelled(job)) {
-        job->busy = false;
+    if (block_job_is_cancelled(job)) {
+        return;
+    }
+
+    job->busy = false;
+    if (block_job_is_paused(job)) {
+        qemu_coroutine_yield();
+    } else {
         co_sleep_ns(clock, ns);
-        job->busy = true;
     }
+    job->busy = true;
 }
 
 BlockJobInfo *block_job_query(BlockJob *job)
@@ -168,6 +192,7 @@ BlockJobInfo *block_job_query(BlockJob *job)
     info->type   = g_strdup(job->job_type->job_type);
     info->device = g_strdup(bdrv_get_device_name(job->bs));
     info->len    = job->len;
+    info->paused = job->paused;
     info->offset = job->offset;
     info->speed  = job->speed;
     return info;
diff --git a/blockjob.h b/blockjob.h
index c6af0fb..a2bacba 100644
--- a/blockjob.h
+++ b/blockjob.h
@@ -70,6 +70,12 @@ struct BlockJob {
     bool cancelled;
 
     /**
+     * Set to true if the job is either paused, or will pause itself
+     * as soon as possible (if busy == true).
+     */
+    bool paused;
+
+    /**
      * Set to false by the job while it is in a quiescent state, where
      * no I/O is pending and the job has yielded on any condition
      * that is not detected by #qemu_aio_wait, such as a timer.
@@ -171,6 +177,30 @@ bool block_job_is_cancelled(BlockJob *job);
 BlockJobInfo *block_job_query(BlockJob *job);
 
 /**
+ * block_job_pause:
+ * @job: The job to be paused.
+ *
+ * Asynchronously pause the specified job.
+ */
+void block_job_pause(BlockJob *job);
+
+/**
+ * block_job_resume:
+ * @job: The job to be resumed.
+ *
+ * Resume the specified job.
+ */
+void block_job_resume(BlockJob *job);
+
+/**
+ * block_job_is_paused:
+ * @job: The job being queried.
+ *
+ * Returns whether the job is currently paused.
+ */
+bool block_job_is_paused(BlockJob *job);
+
+/**
  * block_job_cancel_sync:
  * @job: The job to be canceled.
  *
diff --git a/qapi-schema.json b/qapi-schema.json
index 14e4419..f8a67ae 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1098,6 +1098,8 @@
 #
 # @len: the maximum progress value
 #
+# @paused: whether the job is paused (since 1.3)
+#
 # @offset: the current progress value
 #
 # @speed: the rate limit, bytes per second
@@ -1106,7 +1108,7 @@
 ##
 { 'type': 'BlockJobInfo',
   'data': {'type': 'str', 'device': 'str', 'len': 'int',
-           'offset': 'int', 'speed': 'int'} }
+           'offset': 'int', 'paused': 'bool', 'speed': 'int'} }
 
 ##
 # @query-block-jobs:
diff --git a/qerror.h b/qerror.h
index 485c773..c91708c 100644
--- a/qerror.h
+++ b/qerror.h
@@ -51,6 +51,9 @@ void assert_no_error(Error *err);
 #define QERR_BLOCK_JOB_NOT_ACTIVE \
     ERROR_CLASS_DEVICE_NOT_ACTIVE, "No active block job on device '%s'"
 
+#define QERR_BLOCK_JOB_PAUSED \
+    ERROR_CLASS_GENERIC_ERROR, "The block job for device '%s' is currently paused"
+
 #define QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED \
     ERROR_CLASS_GENERIC_ERROR, "Block format '%s' used by device '%s' does not support feature '%s'"
 
-- 
1.7.12


* [Qemu-devel] [PATCH v2 07/45] qmp: add block-job-pause and block-job-resume
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (5 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 06/45] block: add support for job pause/resume Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-26 17:45   ` Eric Blake
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 08/45] qemu-iotests: add test for pausing a streaming operation Paolo Bonzini
                   ` (38 subsequent siblings)
  45 siblings, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

Add QMP commands matching the functionality.

Paused jobs cannot be canceled without first resuming them.  This
ensures that I/O errors are never missed by management.  However, an
optional force argument can be specified to allow that.
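
The resulting cancel policy can be sketched as a small decision function.
This is a simplified model, assuming a hypothetical ForceJob struct and
CancelResult enum in place of the real job object and Error reporting; a
NULL job stands for "no active job on the device".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    CANCEL_OK,
    CANCEL_ERR_NOT_ACTIVE,   /* no job on the device */
    CANCEL_ERR_PAUSED        /* job is paused and force was not given */
} CancelResult;

typedef struct ForceJob {
    bool paused;
    bool cancelled;
} ForceJob;

static CancelResult force_job_cancel(ForceJob *job, bool has_force, bool force)
{
    /* An absent optional argument defaults to false. */
    if (!has_force) {
        force = false;
    }
    if (job == NULL) {
        return CANCEL_ERR_NOT_ACTIVE;
    }
    if (job->paused && !force) {
        return CANCEL_ERR_PAUSED;
    }
    /* Cancelling implies a resume, so the paused flag is cleared. */
    job->cancelled = true;
    job->paused = false;
    return CANCEL_OK;
}
```

Keeping the default at "refuse" means existing callers that never pass
force keep the safe behaviour from the previous patch.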

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: document that the commands do not nest; a single resume
        command will always resume.

 blockdev.c       | 35 +++++++++++++++++++++++++++++++++--
 hmp-commands.hx  | 35 ++++++++++++++++++++++++++++++++---
 hmp.c            | 23 ++++++++++++++++++++++-
 hmp.h            |  2 ++
 qapi-schema.json | 45 ++++++++++++++++++++++++++++++++++++++++++++-
 qmp-commands.hx  | 12 +++++++++++-
 trace-events     |  2 ++
 7 files changed, 146 insertions(+), 8 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index df0c449..a718380 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1150,15 +1150,20 @@ void qmp_block_job_set_speed(const char *device, int64_t speed, Error **errp)
     block_job_set_speed(job, speed, errp);
 }
 
-void qmp_block_job_cancel(const char *device, Error **errp)
+void qmp_block_job_cancel(const char *device,
+                          bool has_force, bool force, Error **errp)
 {
     BlockJob *job = find_block_job(device);
 
+    if (!has_force) {
+        force = false;
+    }
+
     if (!job) {
         error_set(errp, QERR_BLOCK_JOB_NOT_ACTIVE, device);
         return;
     }
-    if (job->paused) {
+    if (job->paused && !force) {
         error_set(errp, QERR_BLOCK_JOB_PAUSED, device);
         return;
     }
@@ -1167,6 +1172,32 @@ void qmp_block_job_cancel(const char *device, Error **errp)
     block_job_cancel(job);
 }
 
+void qmp_block_job_pause(const char *device, Error **errp)
+{
+    BlockJob *job = find_block_job(device);
+
+    if (!job) {
+        error_set(errp, QERR_BLOCK_JOB_NOT_ACTIVE, device);
+        return;
+    }
+
+    trace_qmp_block_job_pause(job);
+    block_job_pause(job);
+}
+
+void qmp_block_job_resume(const char *device, Error **errp)
+{
+    BlockJob *job = find_block_job(device);
+
+    if (!job) {
+        error_set(errp, QERR_BLOCK_JOB_NOT_ACTIVE, device);
+        return;
+    }
+
+    trace_qmp_block_job_resume(job);
+    block_job_resume(job);
+}
+
 static void do_qmp_query_block_jobs_one(void *opaque, BlockDriverState *bs)
 {
     BlockJobInfoList **prev = opaque;
diff --git a/hmp-commands.hx b/hmp-commands.hx
index ed67e99..27d90a2 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -99,9 +99,10 @@ ETEXI
 
     {
         .name       = "block_job_cancel",
-        .args_type  = "device:B",
-        .params     = "device",
-        .help       = "stop an active background block operation",
+        .args_type  = "force:-f,device:B",
+        .params     = "[-f] device",
+        .help       = "stop an active background block operation (use -f"
+                      "\n\t\t\t if the operation is currently paused)",
         .mhandler.cmd = hmp_block_job_cancel,
     },
 
@@ -112,6 +113,34 @@ Stop an active block streaming operation.
 ETEXI
 
     {
+        .name       = "block_job_pause",
+        .args_type  = "device:B",
+        .params     = "device",
+        .help       = "pause an active background block operation",
+        .mhandler.cmd = hmp_block_job_pause,
+    },
+
+STEXI
+@item block_job_pause
+@findex block_job_pause
+Pause an active block streaming operation.
+ETEXI
+
+    {
+        .name       = "block_job_resume",
+        .args_type  = "device:B",
+        .params     = "device",
+        .help       = "resume a paused background block operation",
+        .mhandler.cmd = hmp_block_job_resume,
+    },
+
+STEXI
+@item block_job_resume
+@findex block_job_resume
+Resume a paused block streaming operation.
+ETEXI
+
+    {
         .name       = "eject",
         .args_type  = "force:-f,device:B",
         .params     = "[-f] device",
diff --git a/hmp.c b/hmp.c
index ba6fbd3..55601f7 100644
--- a/hmp.c
+++ b/hmp.c
@@ -950,8 +950,29 @@ void hmp_block_job_cancel(Monitor *mon, const QDict *qdict)
 {
     Error *error = NULL;
     const char *device = qdict_get_str(qdict, "device");
+    bool force = qdict_get_try_bool(qdict, "force", 0);
 
-    qmp_block_job_cancel(device, &error);
+    qmp_block_job_cancel(device, true, force, &error);
+
+    hmp_handle_error(mon, &error);
+}
+
+void hmp_block_job_pause(Monitor *mon, const QDict *qdict)
+{
+    Error *error = NULL;
+    const char *device = qdict_get_str(qdict, "device");
+
+    qmp_block_job_pause(device, &error);
+
+    hmp_handle_error(mon, &error);
+}
+
+void hmp_block_job_resume(Monitor *mon, const QDict *qdict)
+{
+    Error *error = NULL;
+    const char *device = qdict_get_str(qdict, "device");
+
+    qmp_block_job_resume(device, &error);
 
     hmp_handle_error(mon, &error);
 }
diff --git a/hmp.h b/hmp.h
index 48b9c59..71ea384 100644
--- a/hmp.h
+++ b/hmp.h
@@ -64,6 +64,8 @@ void hmp_block_set_io_throttle(Monitor *mon, const QDict *qdict);
 void hmp_block_stream(Monitor *mon, const QDict *qdict);
 void hmp_block_job_set_speed(Monitor *mon, const QDict *qdict);
 void hmp_block_job_cancel(Monitor *mon, const QDict *qdict);
+void hmp_block_job_pause(Monitor *mon, const QDict *qdict);
+void hmp_block_job_resume(Monitor *mon, const QDict *qdict);
 void hmp_migrate(Monitor *mon, const QDict *qdict);
 void hmp_device_del(Monitor *mon, const QDict *qdict);
 void hmp_dump_guest_memory(Monitor *mon, const QDict *qdict);
diff --git a/qapi-schema.json b/qapi-schema.json
index f8a67ae..2bd94dc 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1855,12 +1855,55 @@
 #
 # @device: the device name
 #
+# @force: #optional whether to allow cancellation of a paused job (default false)
+#
 # Returns: Nothing on success
 #          If no background operation is active on this device, DeviceNotActive
 #
 # Since: 1.1
 ##
-{ 'command': 'block-job-cancel', 'data': { 'device': 'str' } }
+{ 'command': 'block-job-cancel', 'data': { 'device': 'str', '*force': 'bool' } }
+
+##
+# @block-job-pause:
+#
+# Pause an active background block operation.
+#
+# This command returns immediately after marking the active background block
+# operation for pausing.  It is an error to call this command if no
+# operation is in progress.  Pausing an already paused job has no cumulative
+# effect; a single block-job-resume command will resume the job.
+#
+# The operation will pause as soon as possible.  No event is emitted when
+# the operation is actually paused.  Cancelling a paused job automatically
+# resumes it.
+#
+# @device: the device name
+#
+# Returns: Nothing on success
+#          If no background operation is active on this device, DeviceNotActive
+#
+# Since: 1.3
+##
+{ 'command': 'block-job-pause', 'data': { 'device': 'str' } }
+
+##
+# @block-job-resume:
+#
+# Resume a paused background block operation.
+#
+# This command returns immediately after resuming a paused background block
+# operation.  It is an error to call this command if no operation is in
+# progress.  Resuming an already running job is not an error.
+#
+# @device: the device name
+#
+# Returns: Nothing on success
+#          If no background operation is active on this device, DeviceNotActive
+#
+# Since: 1.3
+##
+{ 'command': 'block-job-resume', 'data': { 'device': 'str' } }
 
 ##
 # @ObjectTypeInfo:
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 6e21ddb..85eacb5 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -799,10 +799,20 @@ EQMP
 
     {
         .name       = "block-job-cancel",
-        .args_type  = "device:B",
+        .args_type  = "device:B,force:b?",
         .mhandler.cmd_new = qmp_marshal_input_block_job_cancel,
     },
     {
+        .name       = "block-job-pause",
+        .args_type  = "device:B",
+        .mhandler.cmd_new = qmp_marshal_input_block_job_pause,
+    },
+    {
+        .name       = "block-job-resume",
+        .args_type  = "device:B",
+        .mhandler.cmd_new = qmp_marshal_input_block_job_resume,
+    },
+    {
         .name       = "transaction",
         .args_type  = "actions:q",
         .mhandler.cmd_new = qmp_marshal_input_transaction,
diff --git a/trace-events b/trace-events
index e90de71..75f7357 100644
--- a/trace-events
+++ b/trace-events
@@ -77,6 +77,8 @@ stream_start(void *bs, void *base, void *s, void *co, void *opaque) "bs %p base
 
 # blockdev.c
 qmp_block_job_cancel(void *job) "job %p"
+qmp_block_job_pause(void *job) "job %p"
+qmp_block_job_resume(void *job) "job %p"
 block_job_cb(void *bs, void *job, int ret) "bs %p job %p ret %d"
 qmp_block_stream(void *bs, void *job) "bs %p job %p"
 
-- 
1.7.12


* [Qemu-devel] [PATCH v2 08/45] qemu-iotests: add test for pausing a streaming operation
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (6 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 07/45] qmp: add block-job-pause and block-job-resume Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 09/45] block: rename block_job_complete to block_job_completed Paolo Bonzini
                   ` (37 subsequent siblings)
  45 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

This test checks that a paused streaming job does not advance its offset.

Sometimes the new test fails; the map is different between the source
and the destination of the streaming because qemu-io does not always
pack adjacent clusters that have the same allocated/unallocated state.
However, this also happens with the existing test_stream testcase, and
is better fixed in qemu-io.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 tests/qemu-iotests/030     | 40 ++++++++++++++++++++++++++++++++++++++--
 tests/qemu-iotests/030.out |  4 ++--
 tests/qemu-iotests/group   |   2 +-
 3 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index 55b16f8..dfacdf1 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -18,6 +18,7 @@
 # along with this program.  If not, see <http://www.gnu.org/licenses/>.
 #
 
+import time
 import os
 import iotests
 from iotests import qemu_img, qemu_io
@@ -98,6 +99,43 @@ class TestSingleDrive(ImageStreamingTestCase):
                          qemu_io('-c', 'map', test_img),
                          'image file map does not match backing file after streaming')
 
+    def test_stream_pause(self):
+        self.assert_no_active_streams()
+
+        result = self.vm.qmp('block-stream', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('block-job-pause', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        time.sleep(1)
+        result = self.vm.qmp('query-block-jobs')
+        offset = self.dictpath(result, 'return[0]/offset')
+
+        time.sleep(1)
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return[0]/offset', offset)
+
+        result = self.vm.qmp('block-job-resume', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        completed = False
+        while not completed:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_COMPLETED':
+                    self.assert_qmp(event, 'data/type', 'stream')
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/offset', self.image_len)
+                    self.assert_qmp(event, 'data/len', self.image_len)
+                    completed = True
+
+        self.assert_no_active_streams()
+        self.vm.shutdown()
+
+        self.assertEqual(qemu_io('-c', 'map', backing_img),
+                         qemu_io('-c', 'map', test_img),
+                         'image file map does not match backing file after streaming')
+
     def test_stream_partial(self):
         self.assert_no_active_streams()
 
@@ -173,8 +211,6 @@ class TestStreamStop(ImageStreamingTestCase):
         os.remove(backing_img)
 
     def test_stream_stop(self):
-        import time
-
         self.assert_no_active_streams()
 
         result = self.vm.qmp('block-stream', device='drive0')
diff --git a/tests/qemu-iotests/030.out b/tests/qemu-iotests/030.out
index 2f7d390..594c16f 100644
--- a/tests/qemu-iotests/030.out
+++ b/tests/qemu-iotests/030.out
@@ -1,5 +1,5 @@
-.......
+........
 ----------------------------------------------------------------------
-Ran 7 tests
+Ran 8 tests
 
 OK
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index ebb5ca4..fa4da7d 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -36,7 +36,7 @@
 027 rw auto quick
 028 rw backing auto
 029 rw auto quick
-030 rw auto
+030 rw auto backing
 031 rw auto quick
 032 rw auto
 033 rw auto
-- 
1.7.12


* [Qemu-devel] [PATCH v2 09/45] block: rename block_job_complete to block_job_completed
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (7 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 08/45] qemu-iotests: add test for pausing a streaming operation Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-27 12:30   ` Kevin Wolf
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 10/45] iostatus: rename BlockErrorAction, BlockQMPEventAction Paolo Bonzini
                   ` (36 subsequent siblings)
  45 siblings, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

The imperative form of the name will be used for the QMP command.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/stream.c | 4 ++--
 blockjob.c     | 2 +-
 blockjob.h     | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index 57e4be7..a8f585a 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -84,7 +84,7 @@ static void coroutine_fn stream_run(void *opaque)
 
     s->common.len = bdrv_getlength(bs);
     if (s->common.len < 0) {
-        block_job_complete(&s->common, s->common.len);
+        block_job_completed(&s->common, s->common.len);
         return;
     }
 
@@ -167,7 +167,7 @@ wait:
     }
 
     qemu_vfree(buf);
-    block_job_complete(&s->common, ret);
+    block_job_completed(&s->common, ret);
 }
 
 static void stream_set_speed(BlockJob *job, int64_t speed, Error **errp)
diff --git a/blockjob.c b/blockjob.c
index 6c65521..884bd2b 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -71,7 +71,7 @@ void *block_job_create(const BlockJobType *job_type, BlockDriverState *bs,
     return job;
 }
 
-void block_job_complete(BlockJob *job, int ret)
+void block_job_completed(BlockJob *job, int ret)
 {
     BlockDriverState *bs = job->bs;
 
diff --git a/blockjob.h b/blockjob.h
index a2bacba..a0d1b5c 100644
--- a/blockjob.h
+++ b/blockjob.h
@@ -132,14 +132,14 @@ void *block_job_create(const BlockJobType *job_type, BlockDriverState *bs,
 void block_job_sleep_ns(BlockJob *job, QEMUClock *clock, int64_t ns);
 
 /**
- * block_job_complete:
+ * block_job_completed:
  * @job: The job being completed.
  * @ret: The status code.
  *
  * Call the completion function that was registered at creation time, and
  * free @job.
  */
-void block_job_complete(BlockJob *job, int ret);
+void block_job_completed(BlockJob *job, int ret);
 
 /**
  * block_job_set_speed:
-- 
1.7.12


* [Qemu-devel] [PATCH v2 10/45] iostatus: rename BlockErrorAction, BlockQMPEventAction
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (8 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 09/45] block: rename block_job_complete to block_job_completed Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 11/45] iostatus: move BlockdevOnError declaration to QAPI Paolo Bonzini
                   ` (35 subsequent siblings)
  45 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

We want to remove knowledge of BLOCK_ERR_STOP_ENOSPC from drivers;
drivers should only be told whether to stop/report/ignore the error.
On the other hand, we want to keep using the nicer BlockErrorAction
name in the drivers.  So rename the enums, while leaving aside the
names of the enum values for now.
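
To illustrate the split between the configured policy and the action a
driver is told to take, here is a minimal sketch.  The enum values are the
ones that appear in block.h after this patch; the helper
toy_get_error_action() is hypothetical and not part of this patch, and its
handling of BLOCK_ERR_STOP_ENOSPC (stop only on ENOSPC, report otherwise)
is an assumption.

```c
#include <assert.h>
#include <errno.h>

/* Policy configured by the user (enum values as in block.h). */
typedef enum {
    BLOCK_ERR_REPORT, BLOCK_ERR_IGNORE, BLOCK_ERR_STOP_ENOSPC,
    BLOCK_ERR_STOP_ANY
} BlockdevOnError;

/* Action the driver actually has to perform. */
typedef enum {
    BDRV_ACTION_REPORT, BDRV_ACTION_IGNORE, BDRV_ACTION_STOP
} BlockErrorAction;

/* Hypothetical helper: map a policy and a positive errno to an action,
 * so that drivers never need to look at BLOCK_ERR_STOP_ENOSPC. */
static BlockErrorAction toy_get_error_action(BlockdevOnError on_err, int error)
{
    switch (on_err) {
    case BLOCK_ERR_STOP_ENOSPC:
        return error == ENOSPC ? BDRV_ACTION_STOP : BDRV_ACTION_REPORT;
    case BLOCK_ERR_STOP_ANY:
        return BDRV_ACTION_STOP;
    case BLOCK_ERR_IGNORE:
        return BDRV_ACTION_IGNORE;
    case BLOCK_ERR_REPORT:
    default:
        return BDRV_ACTION_REPORT;
    }
}
```

With a helper of this shape, a device model would only switch on the three
BlockErrorAction values.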

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c         |  8 ++++----
 block.h         | 12 ++++++------
 block_int.h     |  2 +-
 hw/ide/core.c   |  2 +-
 hw/scsi-disk.c  |  2 +-
 hw/virtio-blk.c |  2 +-
 6 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/block.c b/block.c
index a4816ad..a4225e7 100644
--- a/block.c
+++ b/block.c
@@ -1387,7 +1387,7 @@ void bdrv_set_dev_ops(BlockDriverState *bs, const BlockDevOps *ops,
 }
 
 void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
-                               BlockQMPEventAction action, int is_read)
+                               BlockErrorAction action, int is_read)
 {
     QObject *data;
     const char *action_str;
@@ -2331,14 +2331,14 @@ void bdrv_set_io_limits(BlockDriverState *bs,
     bs->io_limits_enabled = bdrv_io_limits_enabled(bs);
 }
 
-void bdrv_set_on_error(BlockDriverState *bs, BlockErrorAction on_read_error,
-                       BlockErrorAction on_write_error)
+void bdrv_set_on_error(BlockDriverState *bs, BlockdevOnError on_read_error,
+                       BlockdevOnError on_write_error)
 {
     bs->on_read_error = on_read_error;
     bs->on_write_error = on_write_error;
 }
 
-BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read)
+BlockdevOnError bdrv_get_on_error(BlockDriverState *bs, int is_read)
 {
     return is_read ? bs->on_read_error : bs->on_write_error;
 }
diff --git a/block.h b/block.h
index bd788e0..b4ef643 100644
--- a/block.h
+++ b/block.h
@@ -93,11 +93,11 @@ typedef struct BlockDevOps {
 typedef enum {
     BLOCK_ERR_REPORT, BLOCK_ERR_IGNORE, BLOCK_ERR_STOP_ENOSPC,
     BLOCK_ERR_STOP_ANY
-} BlockErrorAction;
+} BlockdevOnError;
 
 typedef enum {
     BDRV_ACTION_REPORT, BDRV_ACTION_IGNORE, BDRV_ACTION_STOP
-} BlockQMPEventAction;
+} BlockErrorAction;
 
 typedef QSIMPLEQ_HEAD(BlockReopenQueue, BlockReopenQueueEntry) BlockReopenQueue;
 
@@ -114,7 +114,7 @@ void bdrv_iostatus_disable(BlockDriverState *bs);
 bool bdrv_iostatus_is_enabled(const BlockDriverState *bs);
 void bdrv_iostatus_set_err(BlockDriverState *bs, int error);
 void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
-                               BlockQMPEventAction action, int is_read);
+                               BlockErrorAction action, int is_read);
 void bdrv_info_print(Monitor *mon, const QObject *data);
 void bdrv_info(Monitor *mon, QObject **ret_data);
 void bdrv_stats_print(Monitor *mon, const QObject *data);
@@ -279,9 +279,9 @@ int bdrv_has_zero_init(BlockDriverState *bs);
 int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
                       int *pnum);
 
-void bdrv_set_on_error(BlockDriverState *bs, BlockErrorAction on_read_error,
-                       BlockErrorAction on_write_error);
-BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read);
+void bdrv_set_on_error(BlockDriverState *bs, BlockdevOnError on_read_error,
+                       BlockdevOnError on_write_error);
+BlockdevOnError bdrv_get_on_error(BlockDriverState *bs, int is_read);
 int bdrv_is_read_only(BlockDriverState *bs);
 int bdrv_is_sg(BlockDriverState *bs);
 int bdrv_enable_write_cache(BlockDriverState *bs);
diff --git a/block_int.h b/block_int.h
index 0da1067..db487eb 100644
--- a/block_int.h
+++ b/block_int.h
@@ -262,7 +262,7 @@ struct BlockDriverState {
 
     /* NOTE: the following infos are only hints for real hardware
        drivers. They are not used by the block driver */
-    BlockErrorAction on_read_error, on_write_error;
+    BlockdevOnError on_read_error, on_write_error;
     bool iostatus_enabled;
     BlockDeviceIoStatus iostatus;
     char device_name[32];
diff --git a/hw/ide/core.c b/hw/ide/core.c
index d6fb69c..57b9fa4 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -557,7 +557,7 @@ void ide_dma_error(IDEState *s)
 static int ide_handle_rw_error(IDEState *s, int error, int op)
 {
     int is_read = (op & BM_STATUS_RETRY_READ);
-    BlockErrorAction action = bdrv_get_on_error(s->bs, is_read);
+    BlockdevOnError action = bdrv_get_on_error(s->bs, is_read);
 
     if (action == BLOCK_ERR_IGNORE) {
         bdrv_emit_qmp_error_event(s->bs, BDRV_ACTION_IGNORE, is_read);
diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index 95e9158..fef83a3 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -388,7 +388,7 @@ static int scsi_handle_rw_error(SCSIDiskReq *r, int error)
 {
     int is_read = (r->req.cmd.xfer == SCSI_XFER_FROM_DEV);
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
-    BlockErrorAction action = bdrv_get_on_error(s->qdev.conf.bs, is_read);
+    BlockdevOnError action = bdrv_get_on_error(s->qdev.conf.bs, is_read);
 
     if (action == BLOCK_ERR_IGNORE) {
         bdrv_emit_qmp_error_event(s->qdev.conf.bs, BDRV_ACTION_IGNORE, is_read);
diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 6f6d172..01e537d 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -66,7 +66,7 @@ static void virtio_blk_req_complete(VirtIOBlockReq *req, int status)
 static int virtio_blk_handle_rw_error(VirtIOBlockReq *req, int error,
     int is_read)
 {
-    BlockErrorAction action = bdrv_get_on_error(req->dev->bs, is_read);
+    BlockdevOnError action = bdrv_get_on_error(req->dev->bs, is_read);
     VirtIOBlock *s = req->dev;
 
     if (action == BLOCK_ERR_IGNORE) {
-- 
1.7.12


* [Qemu-devel] [PATCH v2 11/45] iostatus: move BlockdevOnError declaration to QAPI
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (9 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 10/45] iostatus: rename BlockErrorAction, BlockQMPEventAction Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-26 17:54   ` Eric Blake
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 12/45] iostatus: change is_read to a bool Paolo Bonzini
                   ` (34 subsequent siblings)
  45 siblings, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

This will let block-stream reuse the enum.  Places that used the old
enum values are updated to the QAPI-generated names.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c           |  6 +++---
 block.h           |  5 -----
 blockdev.c        | 12 ++++++------
 hw/fdc.c          |  4 ++--
 hw/ide/core.c     |  6 +++---
 hw/scsi-disk.c    |  6 +++---
 hw/scsi-generic.c |  4 ++--
 hw/virtio-blk.c   |  6 +++---
 qapi-schema.json  | 23 +++++++++++++++++++++++
 9 files changed, 45 insertions(+), 27 deletions(-)
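
For readers who want to see the shape of the enum in isolation, here is a
minimal standalone sketch (not part of the patch).  It assumes the values
follow the schema order (report, ignore, enospc, stop); OnError,
on_error_names and parse_on_error are hypothetical stand-ins, not the
QAPI-generated identifiers.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the QAPI-generated enum, in schema order. */
typedef enum {
    ON_ERROR_REPORT,
    ON_ERROR_IGNORE,
    ON_ERROR_ENOSPC,
    ON_ERROR_STOP,
    ON_ERROR_MAX
} OnError;

/* Hypothetical name table; index i holds the schema name of value i. */
static const char *const on_error_names[ON_ERROR_MAX] = {
    "report", "ignore", "enospc", "stop"
};

/*
 * Map an rerror=/werror= string to an enum value, or -1 if invalid.
 * "enospc" is accepted for writes only, mirroring the check in
 * parse_block_error_action().
 */
static int parse_on_error(const char *buf, bool is_read)
{
    for (int i = 0; i < ON_ERROR_MAX; i++) {
        if (strcmp(buf, on_error_names[i]) == 0) {
            if (i == ON_ERROR_ENOSPC && is_read) {
                return -1;
            }
            return i;
        }
    }
    return -1;
}
```

A single name table keeps the option parser and the enum from drifting
apart, which is roughly what moving the declaration to the schema buys.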

diff --git a/block.c b/block.c
index a4225e7..ab2f7e0 100644
--- a/block.c
+++ b/block.c
@@ -4050,9 +4050,9 @@ void bdrv_iostatus_enable(BlockDriverState *bs)
 bool bdrv_iostatus_is_enabled(const BlockDriverState *bs)
 {
     return (bs->iostatus_enabled &&
-           (bs->on_write_error == BLOCK_ERR_STOP_ENOSPC ||
-            bs->on_write_error == BLOCK_ERR_STOP_ANY    ||
-            bs->on_read_error == BLOCK_ERR_STOP_ANY));
+           (bs->on_write_error == BLOCKDEV_ON_ERROR_ENOSPC ||
+            bs->on_write_error == BLOCKDEV_ON_ERROR_STOP   ||
+            bs->on_read_error == BLOCKDEV_ON_ERROR_STOP));
 }
 
 void bdrv_iostatus_disable(BlockDriverState *bs)
diff --git a/block.h b/block.h
index b4ef643..d93cbc3 100644
--- a/block.h
+++ b/block.h
@@ -91,11 +91,6 @@ typedef struct BlockDevOps {
 #define BDRV_SECTOR_MASK   ~(BDRV_SECTOR_SIZE - 1)
 
 typedef enum {
-    BLOCK_ERR_REPORT, BLOCK_ERR_IGNORE, BLOCK_ERR_STOP_ENOSPC,
-    BLOCK_ERR_STOP_ANY
-} BlockdevOnError;
-
-typedef enum {
     BDRV_ACTION_REPORT, BDRV_ACTION_IGNORE, BDRV_ACTION_STOP
 } BlockErrorAction;
 
diff --git a/blockdev.c b/blockdev.c
index a718380..9fc794f 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -241,13 +241,13 @@ static void drive_put_ref_bh_schedule(DriveInfo *dinfo)
 static int parse_block_error_action(const char *buf, int is_read)
 {
     if (!strcmp(buf, "ignore")) {
-        return BLOCK_ERR_IGNORE;
+        return BLOCKDEV_ON_ERROR_IGNORE;
     } else if (!is_read && !strcmp(buf, "enospc")) {
-        return BLOCK_ERR_STOP_ENOSPC;
+        return BLOCKDEV_ON_ERROR_ENOSPC;
     } else if (!strcmp(buf, "stop")) {
-        return BLOCK_ERR_STOP_ANY;
+        return BLOCKDEV_ON_ERROR_STOP;
     } else if (!strcmp(buf, "report")) {
-        return BLOCK_ERR_REPORT;
+        return BLOCKDEV_ON_ERROR_REPORT;
     } else {
         error_report("'%s' invalid %s error action",
                      buf, is_read ? "read" : "write");
@@ -433,7 +433,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
         return NULL;
     }
 
-    on_write_error = BLOCK_ERR_STOP_ENOSPC;
+    on_write_error = BLOCKDEV_ON_ERROR_ENOSPC;
     if ((buf = qemu_opt_get(opts, "werror")) != NULL) {
         if (type != IF_IDE && type != IF_SCSI && type != IF_VIRTIO && type != IF_NONE) {
             error_report("werror is not supported by this bus type");
@@ -446,7 +446,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
         }
     }
 
-    on_read_error = BLOCK_ERR_REPORT;
+    on_read_error = BLOCKDEV_ON_ERROR_REPORT;
     if ((buf = qemu_opt_get(opts, "rerror")) != NULL) {
         if (type != IF_IDE && type != IF_VIRTIO && type != IF_SCSI && type != IF_NONE) {
             error_report("rerror is not supported by this bus type");
diff --git a/hw/fdc.c b/hw/fdc.c
index 08830c1..43b0f20 100644
--- a/hw/fdc.c
+++ b/hw/fdc.c
@@ -1994,11 +1994,11 @@ static int fdctrl_connect_drives(FDCtrl *fdctrl)
         drive->fdctrl = fdctrl;
 
         if (drive->bs) {
-            if (bdrv_get_on_error(drive->bs, 0) != BLOCK_ERR_STOP_ENOSPC) {
+            if (bdrv_get_on_error(drive->bs, 0) != BLOCKDEV_ON_ERROR_ENOSPC) {
                 error_report("fdc doesn't support drive option werror");
                 return -1;
             }
-            if (bdrv_get_on_error(drive->bs, 1) != BLOCK_ERR_REPORT) {
+            if (bdrv_get_on_error(drive->bs, 1) != BLOCKDEV_ON_ERROR_REPORT) {
                 error_report("fdc doesn't support drive option rerror");
                 return -1;
             }
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 57b9fa4..2620e87 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -559,13 +559,13 @@ static int ide_handle_rw_error(IDEState *s, int error, int op)
     int is_read = (op & BM_STATUS_RETRY_READ);
     BlockdevOnError action = bdrv_get_on_error(s->bs, is_read);
 
-    if (action == BLOCK_ERR_IGNORE) {
+    if (action == BLOCKDEV_ON_ERROR_IGNORE) {
         bdrv_emit_qmp_error_event(s->bs, BDRV_ACTION_IGNORE, is_read);
         return 0;
     }
 
-    if ((error == ENOSPC && action == BLOCK_ERR_STOP_ENOSPC)
-            || action == BLOCK_ERR_STOP_ANY) {
+    if ((error == ENOSPC && action == BLOCKDEV_ON_ERROR_ENOSPC)
+            || action == BLOCKDEV_ON_ERROR_STOP) {
         s->bus->dma->ops->set_unit(s->bus->dma, s->unit);
         s->bus->error_status = op;
         bdrv_emit_qmp_error_event(s->bs, BDRV_ACTION_STOP, is_read);
diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index fef83a3..c295326 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -390,13 +390,13 @@ static int scsi_handle_rw_error(SCSIDiskReq *r, int error)
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
     BlockdevOnError action = bdrv_get_on_error(s->qdev.conf.bs, is_read);
 
-    if (action == BLOCK_ERR_IGNORE) {
+    if (action == BLOCKDEV_ON_ERROR_IGNORE) {
         bdrv_emit_qmp_error_event(s->qdev.conf.bs, BDRV_ACTION_IGNORE, is_read);
         return 0;
     }
 
-    if ((error == ENOSPC && action == BLOCK_ERR_STOP_ENOSPC)
-            || action == BLOCK_ERR_STOP_ANY) {
+    if ((error == ENOSPC && action == BLOCKDEV_ON_ERROR_ENOSPC)
+            || action == BLOCKDEV_ON_ERROR_STOP) {
 
         bdrv_emit_qmp_error_event(s->qdev.conf.bs, BDRV_ACTION_STOP, is_read);
         vm_stop(RUN_STATE_IO_ERROR);
diff --git a/hw/scsi-generic.c b/hw/scsi-generic.c
index a5eb663..d904534 100644
--- a/hw/scsi-generic.c
+++ b/hw/scsi-generic.c
@@ -400,11 +400,11 @@ static int scsi_generic_initfn(SCSIDevice *s)
         return -1;
     }
 
-    if (bdrv_get_on_error(s->conf.bs, 0) != BLOCK_ERR_STOP_ENOSPC) {
+    if (bdrv_get_on_error(s->conf.bs, 0) != BLOCKDEV_ON_ERROR_ENOSPC) {
         error_report("Device doesn't support drive option werror");
         return -1;
     }
-    if (bdrv_get_on_error(s->conf.bs, 1) != BLOCK_ERR_REPORT) {
+    if (bdrv_get_on_error(s->conf.bs, 1) != BLOCKDEV_ON_ERROR_REPORT) {
         error_report("Device doesn't support drive option rerror");
         return -1;
     }
diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 01e537d..f178fa8 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -69,13 +69,13 @@ static int virtio_blk_handle_rw_error(VirtIOBlockReq *req, int error,
     BlockdevOnError action = bdrv_get_on_error(req->dev->bs, is_read);
     VirtIOBlock *s = req->dev;
 
-    if (action == BLOCK_ERR_IGNORE) {
+    if (action == BLOCKDEV_ON_ERROR_IGNORE) {
         bdrv_emit_qmp_error_event(s->bs, BDRV_ACTION_IGNORE, is_read);
         return 0;
     }
 
-    if ((error == ENOSPC && action == BLOCK_ERR_STOP_ENOSPC)
-            || action == BLOCK_ERR_STOP_ANY) {
+    if ((error == ENOSPC && action == BLOCKDEV_ON_ERROR_ENOSPC)
+            || action == BLOCKDEV_ON_ERROR_STOP) {
         req->next = s->rq;
         s->rq = req;
         bdrv_emit_qmp_error_event(s->bs, BDRV_ACTION_STOP, is_read);
diff --git a/qapi-schema.json b/qapi-schema.json
index 2bd94dc..43d3345 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1088,6 +1088,29 @@
 { 'command': 'query-pci', 'returns': ['PciInfo'] }
 
 ##
+# @BlockdevOnError:
+#
+# An enumeration of possible behaviors for errors on I/O operations.
+# The exact meaning depends on whether the I/O was initiated by a guest
+# or by a block job
+#
+# @report: for guest operations, report the error to the guest;
+#          for jobs, cancel the job
+#
+# @ignore: ignore the error, only report a QMP event (BLOCK_IO_ERROR
+#          or BLOCK_JOB_ERROR)
+#
+# @stop: for guest operations, stop the virtual machine;
+#        for jobs, pause the job
+#
+# @enospc: same as @stop on ENOSPC, same as @report otherwise.
+#
+# Since: 1.3
+##
+{ 'enum': 'BlockdevOnError',
+  'data': ['report', 'ignore', 'enospc', 'stop'] }
+
+##
 # @BlockJobInfo:
 #
 # Information about a long-running block device operation.
-- 
1.7.12


* [Qemu-devel] [PATCH v2 12/45] iostatus: change is_read to a bool
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (10 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 11/45] iostatus: move BlockdevOnError declaration to QAPI Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 13/45] iostatus: reorganize io error code Paolo Bonzini
                   ` (33 subsequent siblings)
  45 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

Do this while we are touching this part of the code, before introducing
more uses of "int is_read".

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: new

 block.c         | 4 ++--
 block.h         | 4 ++--
 blockdev.c      | 2 +-
 hw/ide/core.c   | 2 +-
 hw/ide/pci.c    | 4 ++--
 hw/scsi-disk.c  | 2 +-
 hw/virtio-blk.c | 4 ++--
 7 files changed, 11 insertions(+), 11 deletions(-)
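
A small standalone illustration of what the conversion changes (not part
of the patch).  RETRY_READ_FLAG and its value are hypothetical; the point
is only that the int form carries the raw masked bit while the bool form
is normalized to 0 or 1.

```c
#include <stdbool.h>

/* Hypothetical flag bit; the real BM_STATUS_RETRY_READ value is not assumed. */
#define RETRY_READ_FLAG 0x20

/* Old style: the result is the masked bit itself (0 or 0x20). */
static int is_read_as_int(int op)
{
    return op & RETRY_READ_FLAG;
}

/* New style: the explicit comparison yields exactly true or false. */
static bool is_read_as_bool(int op)
{
    return (op & RETRY_READ_FLAG) != 0;
}
```

Writing the "!= 0" out keeps the result well defined even if the value is
later compared against true or stored in a narrower type.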

diff --git a/block.c b/block.c
index ab2f7e0..85114c5 100644
--- a/block.c
+++ b/block.c
@@ -1387,7 +1387,7 @@ void bdrv_set_dev_ops(BlockDriverState *bs, const BlockDevOps *ops,
 }
 
 void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
-                               BlockErrorAction action, int is_read)
+                               BlockErrorAction action, bool is_read)
 {
     QObject *data;
     const char *action_str;
@@ -2338,7 +2338,7 @@ void bdrv_set_on_error(BlockDriverState *bs, BlockdevOnError on_read_error,
     bs->on_write_error = on_write_error;
 }
 
-BlockdevOnError bdrv_get_on_error(BlockDriverState *bs, int is_read)
+BlockdevOnError bdrv_get_on_error(BlockDriverState *bs, bool is_read)
 {
     return is_read ? bs->on_read_error : bs->on_write_error;
 }
diff --git a/block.h b/block.h
index d93cbc3..433721d 100644
--- a/block.h
+++ b/block.h
@@ -109,7 +109,7 @@ void bdrv_iostatus_disable(BlockDriverState *bs);
 bool bdrv_iostatus_is_enabled(const BlockDriverState *bs);
 void bdrv_iostatus_set_err(BlockDriverState *bs, int error);
 void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
-                               BlockErrorAction action, int is_read);
+                               BlockErrorAction action, bool is_read);
 void bdrv_info_print(Monitor *mon, const QObject *data);
 void bdrv_info(Monitor *mon, QObject **ret_data);
 void bdrv_stats_print(Monitor *mon, const QObject *data);
@@ -276,7 +276,7 @@ int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
 
 void bdrv_set_on_error(BlockDriverState *bs, BlockdevOnError on_read_error,
                        BlockdevOnError on_write_error);
-BlockdevOnError bdrv_get_on_error(BlockDriverState *bs, int is_read);
+BlockdevOnError bdrv_get_on_error(BlockDriverState *bs, bool is_read);
 int bdrv_is_read_only(BlockDriverState *bs);
 int bdrv_is_sg(BlockDriverState *bs);
 int bdrv_enable_write_cache(BlockDriverState *bs);
diff --git a/blockdev.c b/blockdev.c
index 9fc794f..4a7d2dd 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -238,7 +238,7 @@ static void drive_put_ref_bh_schedule(DriveInfo *dinfo)
     qemu_bh_schedule(s->bh);
 }
 
-static int parse_block_error_action(const char *buf, int is_read)
+static int parse_block_error_action(const char *buf, bool is_read)
 {
     if (!strcmp(buf, "ignore")) {
         return BLOCKDEV_ON_ERROR_IGNORE;
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 2620e87..c03db4a 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -556,7 +556,7 @@ void ide_dma_error(IDEState *s)
 
 static int ide_handle_rw_error(IDEState *s, int error, int op)
 {
-    int is_read = (op & BM_STATUS_RETRY_READ);
+    bool is_read = (op & BM_STATUS_RETRY_READ) != 0;
     BlockdevOnError action = bdrv_get_on_error(s->bs, is_read);
 
     if (action == BLOCKDEV_ON_ERROR_IGNORE) {
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index 88c0942..644533f 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -188,7 +188,7 @@ static void bmdma_restart_bh(void *opaque)
 {
     BMDMAState *bm = opaque;
     IDEBus *bus = bm->bus;
-    int is_read;
+    bool is_read;
     int error_status;
 
     qemu_bh_delete(bm->bh);
@@ -198,7 +198,7 @@ static void bmdma_restart_bh(void *opaque)
         return;
     }
 
-    is_read = !!(bus->error_status & BM_STATUS_RETRY_READ);
+    is_read = (bus->error_status & BM_STATUS_RETRY_READ) != 0;
 
     /* The error status must be cleared before resubmitting the request: The
      * request may fail again, and this case can only be distinguished if the
diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index c295326..2dd99a9 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -386,7 +386,7 @@ static void scsi_read_data(SCSIRequest *req)
  */
 static int scsi_handle_rw_error(SCSIDiskReq *r, int error)
 {
-    int is_read = (r->req.cmd.xfer == SCSI_XFER_FROM_DEV);
+    bool is_read = (r->req.cmd.xfer == SCSI_XFER_FROM_DEV);
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
     BlockdevOnError action = bdrv_get_on_error(s->qdev.conf.bs, is_read);
 
diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index f178fa8..1ac2483 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -64,7 +64,7 @@ static void virtio_blk_req_complete(VirtIOBlockReq *req, int status)
 }
 
 static int virtio_blk_handle_rw_error(VirtIOBlockReq *req, int error,
-    int is_read)
+    bool is_read)
 {
     BlockdevOnError action = bdrv_get_on_error(req->dev->bs, is_read);
     VirtIOBlock *s = req->dev;
@@ -98,7 +98,7 @@ static void virtio_blk_rw_complete(void *opaque, int ret)
     trace_virtio_blk_rw_complete(req, ret);
 
     if (ret) {
-        int is_read = !(ldl_p(&req->out->type) & VIRTIO_BLK_T_OUT);
+        bool is_read = !(ldl_p(&req->out->type) & VIRTIO_BLK_T_OUT);
         if (virtio_blk_handle_rw_error(req, -ret, is_read))
             return;
     }
-- 
1.7.12


* [Qemu-devel] [PATCH v2 13/45] iostatus: reorganize io error code
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (11 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 12/45] iostatus: change is_read to a bool Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 14/45] block: introduce block job error Paolo Bonzini
                   ` (32 subsequent siblings)
  45 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

Move the common part of IDE/SCSI/virtio error handling to the block
layer.  The new function bdrv_error_action subsumes all three of
bdrv_emit_qmp_error_event, vm_stop, bdrv_iostatus_set_err.

The same scheme will be used for errors in block jobs.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c         | 46 ++++++++++++++++++++++++++++++++++++++--------
 block.h         |  5 +++--
 hw/ide/core.c   | 20 +++++---------------
 hw/scsi-disk.c  | 23 +++++++----------------
 hw/virtio-blk.c | 19 +++++--------------
 qemu-tool.c     |  6 ++++++
 6 files changed, 64 insertions(+), 55 deletions(-)
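
The split between "decide" and "act" can be tried outside QEMU with the
sketch below (not part of the patch).  The types and the names Dev,
get_error_action and handle_rw_error are hypothetical stand-ins for
BlockDriverState, bdrv_get_error_action and the per-device handlers; the
device-specific work is reduced to comments.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for BlockdevOnError and BlockErrorAction. */
typedef enum { ON_ERROR_REPORT, ON_ERROR_IGNORE,
               ON_ERROR_ENOSPC, ON_ERROR_STOP } OnError;
typedef enum { ACTION_REPORT, ACTION_IGNORE, ACTION_STOP } ErrorAction;

/* Hypothetical stand-in for the policy fields of BlockDriverState. */
typedef struct {
    OnError on_read_error;
    OnError on_write_error;
} Dev;

/* Decide: fold the configured policy and the errno into one action. */
static ErrorAction get_error_action(const Dev *d, bool is_read, int error)
{
    OnError on_err = is_read ? d->on_read_error : d->on_write_error;

    assert(error >= 0);   /* callers pass a positive errno value */
    switch (on_err) {
    case ON_ERROR_ENOSPC:
        return error == ENOSPC ? ACTION_STOP : ACTION_REPORT;
    case ON_ERROR_STOP:
        return ACTION_STOP;
    case ON_ERROR_REPORT:
        return ACTION_REPORT;
    case ON_ERROR_IGNORE:
        return ACTION_IGNORE;
    }
    abort();
}

/* Act: a device handler only has to look at the three-way action. */
static int handle_rw_error(const Dev *d, bool is_read, int error)
{
    ErrorAction action = get_error_action(d, is_read, error);

    if (action == ACTION_STOP) {
        /* queue the request for retry after the VM resumes */
    } else if (action == ACTION_REPORT) {
        /* fail the request towards the guest */
    }
    /* the common event/stop/iostatus step would be called here */
    return action != ACTION_IGNORE;
}
```

With this shape the ENOSPC special case lives in one place, and the
handlers return nonzero exactly when the request was consumed.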

diff --git a/block.c b/block.c
index 85114c5..b7b04aa 100644
--- a/block.c
+++ b/block.c
@@ -29,6 +29,7 @@
 #include "blockjob.h"
 #include "module.h"
 #include "qjson.h"
+#include "sysemu.h"
 #include "qemu-coroutine.h"
 #include "qmp-commands.h"
 #include "qemu-timer.h"
@@ -1386,8 +1387,8 @@ void bdrv_set_dev_ops(BlockDriverState *bs, const BlockDevOps *ops,
     }
 }
 
-void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
-                               BlockErrorAction action, bool is_read)
+static void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
+                                      BlockErrorAction action, bool is_read)
 {
     QObject *data;
     const char *action_str;
@@ -2343,6 +2344,39 @@ BlockdevOnError bdrv_get_on_error(BlockDriverState *bs, bool is_read)
     return is_read ? bs->on_read_error : bs->on_write_error;
 }
 
+BlockErrorAction bdrv_get_error_action(BlockDriverState *bs, bool is_read, int error)
+{
+    BlockdevOnError on_err = is_read ? bs->on_read_error : bs->on_write_error;
+
+    switch (on_err) {
+    case BLOCKDEV_ON_ERROR_ENOSPC:
+        return (error == ENOSPC) ? BDRV_ACTION_STOP : BDRV_ACTION_REPORT;
+    case BLOCKDEV_ON_ERROR_STOP:
+        return BDRV_ACTION_STOP;
+    case BLOCKDEV_ON_ERROR_REPORT:
+        return BDRV_ACTION_REPORT;
+    case BLOCKDEV_ON_ERROR_IGNORE:
+        return BDRV_ACTION_IGNORE;
+    default:
+        abort();
+    }
+}
+
+/* This is done by device models because, while the block layer knows
+ * about the error, it does not know whether an operation comes from
+ * the device or the block layer (from a job, for example).
+ */
+void bdrv_error_action(BlockDriverState *bs, BlockErrorAction action,
+                       bool is_read, int error)
+{
+    assert(error >= 0);
+    bdrv_emit_qmp_error_event(bs, action, is_read);
+    if (action == BDRV_ACTION_STOP) {
+        vm_stop(RUN_STATE_IO_ERROR);
+        bdrv_iostatus_set_err(bs, error);
+    }
+}
+
 int bdrv_is_read_only(BlockDriverState *bs)
 {
     return bs->read_only;
@@ -4067,14 +4101,10 @@ void bdrv_iostatus_reset(BlockDriverState *bs)
     }
 }
 
-/* XXX: Today this is set by device models because it makes the implementation
-   quite simple. However, the block layer knows about the error, so it's
-   possible to implement this without device models being involved */
 void bdrv_iostatus_set_err(BlockDriverState *bs, int error)
 {
-    if (bdrv_iostatus_is_enabled(bs) &&
-        bs->iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
-        assert(error >= 0);
+    assert(bdrv_iostatus_is_enabled(bs));
+    if (bs->iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
         bs->iostatus = error == ENOSPC ? BLOCK_DEVICE_IO_STATUS_NOSPACE :
                                          BLOCK_DEVICE_IO_STATUS_FAILED;
     }
diff --git a/block.h b/block.h
index 433721d..7db23b8 100644
--- a/block.h
+++ b/block.h
@@ -108,8 +108,6 @@ void bdrv_iostatus_reset(BlockDriverState *bs);
 void bdrv_iostatus_disable(BlockDriverState *bs);
 bool bdrv_iostatus_is_enabled(const BlockDriverState *bs);
 void bdrv_iostatus_set_err(BlockDriverState *bs, int error);
-void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
-                               BlockErrorAction action, bool is_read);
 void bdrv_info_print(Monitor *mon, const QObject *data);
 void bdrv_info(Monitor *mon, QObject **ret_data);
 void bdrv_stats_print(Monitor *mon, const QObject *data);
@@ -277,6 +275,9 @@ int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
 void bdrv_set_on_error(BlockDriverState *bs, BlockdevOnError on_read_error,
                        BlockdevOnError on_write_error);
 BlockdevOnError bdrv_get_on_error(BlockDriverState *bs, bool is_read);
+BlockErrorAction bdrv_get_error_action(BlockDriverState *bs, bool is_read, int error);
+void bdrv_error_action(BlockDriverState *bs, BlockErrorAction action,
+                       bool is_read, int error);
 int bdrv_is_read_only(BlockDriverState *bs);
 int bdrv_is_sg(BlockDriverState *bs);
 int bdrv_enable_write_cache(BlockDriverState *bs);
diff --git a/hw/ide/core.c b/hw/ide/core.c
index c03db4a..d683a8c 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -557,31 +557,21 @@ void ide_dma_error(IDEState *s)
 static int ide_handle_rw_error(IDEState *s, int error, int op)
 {
     bool is_read = (op & BM_STATUS_RETRY_READ) != 0;
-    BlockdevOnError action = bdrv_get_on_error(s->bs, is_read);
+    BlockErrorAction action = bdrv_get_error_action(s->bs, is_read, error);
 
-    if (action == BLOCKDEV_ON_ERROR_IGNORE) {
-        bdrv_emit_qmp_error_event(s->bs, BDRV_ACTION_IGNORE, is_read);
-        return 0;
-    }
-
-    if ((error == ENOSPC && action == BLOCKDEV_ON_ERROR_ENOSPC)
-            || action == BLOCKDEV_ON_ERROR_STOP) {
+    if (action == BDRV_ACTION_STOP) {
         s->bus->dma->ops->set_unit(s->bus->dma, s->unit);
         s->bus->error_status = op;
-        bdrv_emit_qmp_error_event(s->bs, BDRV_ACTION_STOP, is_read);
-        vm_stop(RUN_STATE_IO_ERROR);
-        bdrv_iostatus_set_err(s->bs, error);
-    } else {
+    } else if (action == BDRV_ACTION_REPORT) {
         if (op & BM_STATUS_DMA_RETRY) {
             dma_buf_commit(s);
             ide_dma_error(s);
         } else {
             ide_rw_error(s);
         }
-        bdrv_emit_qmp_error_event(s->bs, BDRV_ACTION_REPORT, is_read);
     }
-
-    return 1;
+    bdrv_error_action(s->bs, action, is_read, error);
+    return action != BDRV_ACTION_IGNORE;
 }
 
 void ide_dma_cb(void *opaque, int ret)
diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index 2dd99a9..99bb02e 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -388,21 +388,9 @@ static int scsi_handle_rw_error(SCSIDiskReq *r, int error)
 {
     bool is_read = (r->req.cmd.xfer == SCSI_XFER_FROM_DEV);
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
-    BlockdevOnError action = bdrv_get_on_error(s->qdev.conf.bs, is_read);
+    BlockErrorAction action = bdrv_get_error_action(s->qdev.conf.bs, is_read, error);
 
-    if (action == BLOCKDEV_ON_ERROR_IGNORE) {
-        bdrv_emit_qmp_error_event(s->qdev.conf.bs, BDRV_ACTION_IGNORE, is_read);
-        return 0;
-    }
-
-    if ((error == ENOSPC && action == BLOCKDEV_ON_ERROR_ENOSPC)
-            || action == BLOCKDEV_ON_ERROR_STOP) {
-
-        bdrv_emit_qmp_error_event(s->qdev.conf.bs, BDRV_ACTION_STOP, is_read);
-        vm_stop(RUN_STATE_IO_ERROR);
-        bdrv_iostatus_set_err(s->qdev.conf.bs, error);
-        scsi_req_retry(&r->req);
-    } else {
+    if (action == BDRV_ACTION_REPORT) {
         switch (error) {
         case ENOMEDIUM:
             scsi_check_condition(r, SENSE_CODE(NO_MEDIUM));
@@ -417,9 +405,12 @@ static int scsi_handle_rw_error(SCSIDiskReq *r, int error)
             scsi_check_condition(r, SENSE_CODE(IO_ERROR));
             break;
         }
-        bdrv_emit_qmp_error_event(s->qdev.conf.bs, BDRV_ACTION_REPORT, is_read);
     }
-    return 1;
+    bdrv_error_action(s->qdev.conf.bs, action, is_read, error);
+    if (action == BDRV_ACTION_STOP) {
+        scsi_req_retry(&r->req);
+    }
+    return action != BDRV_ACTION_IGNORE;
 }
 
 static void scsi_write_complete(void * opaque, int ret)
diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 1ac2483..e25cc96 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -66,29 +66,20 @@ static void virtio_blk_req_complete(VirtIOBlockReq *req, int status)
 static int virtio_blk_handle_rw_error(VirtIOBlockReq *req, int error,
     bool is_read)
 {
-    BlockdevOnError action = bdrv_get_on_error(req->dev->bs, is_read);
+    BlockErrorAction action = bdrv_get_error_action(req->dev->bs, is_read, error);
     VirtIOBlock *s = req->dev;
 
-    if (action == BLOCKDEV_ON_ERROR_IGNORE) {
-        bdrv_emit_qmp_error_event(s->bs, BDRV_ACTION_IGNORE, is_read);
-        return 0;
-    }
-
-    if ((error == ENOSPC && action == BLOCKDEV_ON_ERROR_ENOSPC)
-            || action == BLOCKDEV_ON_ERROR_STOP) {
+    if (action == BDRV_ACTION_STOP) {
         req->next = s->rq;
         s->rq = req;
-        bdrv_emit_qmp_error_event(s->bs, BDRV_ACTION_STOP, is_read);
-        vm_stop(RUN_STATE_IO_ERROR);
-        bdrv_iostatus_set_err(s->bs, error);
-    } else {
+    } else if (action == BDRV_ACTION_REPORT) {
         virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
         bdrv_acct_done(s->bs, &req->acct);
         g_free(req);
-        bdrv_emit_qmp_error_event(s->bs, BDRV_ACTION_REPORT, is_read);
     }
 
-    return 1;
+    bdrv_error_action(s->bs, action, is_read, error);
+    return action != BDRV_ACTION_IGNORE;
 }
 
 static void virtio_blk_rw_complete(void *opaque, int ret)
diff --git a/qemu-tool.c b/qemu-tool.c
index 18205ba..f2f9813 100644
--- a/qemu-tool.c
+++ b/qemu-tool.c
@@ -19,6 +19,7 @@
 #include "qemu-log.h"
 #include "migration.h"
 #include "main-loop.h"
+#include "sysemu.h"
 #include "qemu_socket.h"
 #include "slirp/libslirp.h"
 
@@ -37,6 +38,11 @@ const char *qemu_get_vm_name(void)
 
 Monitor *cur_mon;
 
+void vm_stop(RunState state)
+{
+    abort();
+}
+
 int monitor_cur_is_qmp(void)
 {
     return 0;
-- 
1.7.12


* [Qemu-devel] [PATCH v2 14/45] block: introduce block job error
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (12 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 13/45] iostatus: reorganize io error code Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-26 19:10   ` Eric Blake
  2012-09-27 13:41   ` Kevin Wolf
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 15/45] stream: add on-error argument Paolo Bonzini
                   ` (31 subsequent siblings)
  45 siblings, 2 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

The following behaviors are possible:

'report': The behavior is the same as in 1.1.  An I/O error during
either a read or a write will complete the job immediately with an
error code.

'ignore': An I/O error during either a read or a write will be
ignored.  For streaming, the job will complete with an error and the
backing file will be left in place.  For mirroring, the sector will be
marked as dirty again and re-examined later.

'stop': The job will be paused and the job iostatus will be set to
failed or nospace, while the VM will keep running.  This can only be
specified if the block device has rerror=stop and werror=stop or enospc.

'enospc': Behaves as 'stop' for ENOSPC errors, 'report' for others.

In all cases, even for 'report', the I/O error is reported as a QMP
event BLOCK_JOB_ERROR, with the same arguments as BLOCK_IO_ERROR.

It is possible that while stopping the VM a BLOCK_IO_ERROR event will be
reported and will clobber the event from BLOCK_JOB_ERROR, or vice versa.
This is not really avoidable since stopping the VM completes all pending
I/O requests.  In fact, it is already possible now that a series of
BLOCK_IO_ERROR events are reported with rerror=stop, because vm_stop
calls bdrv_drain_all and this can generate further errors.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: introduced block_job_iostatus_reset.  Removed sorting
        of iostatus values with "failed" overriding "nospace" but not
        vice versa.  Documented that block-job-resume clears the
        iostatus field.  Always set errors on the block job even if
        they happen on the target; this removes the need to expose
        the target's BlockInfo in "query-blockjobs".

 QMP/qmp-events.txt | 22 ++++++++++++++++++++
 block.c            |  9 ++++----
 block_int.h        |  4 ++++
 blockjob.c         | 61 ++++++++++++++++++++++++++++++++++++++++++++++++------
 blockjob.h         | 25 ++++++++++++++++++++++
 monitor.c          |  1 +
 monitor.h          |  1 +
 qapi-schema.json   |  7 ++++++-
 8 files changed, 119 insertions(+), 11 deletions(-)

diff --git a/QMP/qmp-events.txt b/QMP/qmp-events.txt
index 2878058..ae19db2 100644
--- a/QMP/qmp-events.txt
+++ b/QMP/qmp-events.txt
@@ -94,6 +94,28 @@ Example:
                "speed": 0 },
      "timestamp": { "seconds": 1267061043, "microseconds": 959568 } }
 
+BLOCK_JOB_ERROR
+---------------
+
+Emitted when a block job encounters an error.
+
+Data:
+
+- "device": device name (json-string)
+- "operation": I/O operation (json-string, "read" or "write")
+- "action": action that has been taken, it's one of the following (json-string):
+    "ignore": error has been ignored, the job may fail later
+    "report": error will be reported and the job canceled
+    "stop": error caused job to be paused
+
+Example:
+
+{ "event": "BLOCK_JOB_ERROR",
+    "data": { "device": "ide0-hd1",
+              "operation": "write",
+              "action": "stop" },
+    "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
+
 DEVICE_TRAY_MOVED
 -----------------
 
diff --git a/block.c b/block.c
index b7b04aa..83a695b 100644
--- a/block.c
+++ b/block.c
@@ -1387,8 +1387,9 @@ void bdrv_set_dev_ops(BlockDriverState *bs, const BlockDevOps *ops,
     }
 }
 
-static void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
-                                      BlockErrorAction action, bool is_read)
+void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
+                               enum MonitorEvent ev,
+                               BlockErrorAction action, bool is_read)
 {
     QObject *data;
     const char *action_str;
@@ -1411,7 +1412,7 @@ static void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
                               bdrv->device_name,
                               action_str,
                               is_read ? "read" : "write");
-    monitor_protocol_event(QEVENT_BLOCK_IO_ERROR, data);
+    monitor_protocol_event(ev, data);
 
     qobject_decref(data);
 }
@@ -2370,7 +2371,7 @@ void bdrv_error_action(BlockDriverState *bs, BlockErrorAction action,
                        bool is_read, int error)
 {
     assert(error >= 0);
-    bdrv_emit_qmp_error_event(bs, action, is_read);
+    bdrv_emit_qmp_error_event(bs, QEVENT_BLOCK_IO_ERROR, action, is_read);
     if (action == BDRV_ACTION_STOP) {
         vm_stop(RUN_STATE_IO_ERROR);
         bdrv_iostatus_set_err(bs, error);
diff --git a/block_int.h b/block_int.h
index db487eb..5257b10 100644
--- a/block_int.h
+++ b/block_int.h
@@ -31,6 +31,7 @@
 #include "qemu-timer.h"
 #include "qapi-types.h"
 #include "qerror.h"
+#include "monitor.h"
 
 #define BLOCK_FLAG_ENCRYPT          1
 #define BLOCK_FLAG_COMPAT6          4
@@ -286,6 +287,9 @@ void bdrv_set_io_limits(BlockDriverState *bs,
 #ifdef _WIN32
 int is_windows_drive(const char *filename);
 #endif
+void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
+                               enum MonitorEvent ev,
+                               BlockErrorAction action, bool is_read);
 
 /**
  * stream_start:
diff --git a/blockjob.c b/blockjob.c
index 884bd2b..5dd9c1e 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -112,6 +112,7 @@ bool block_job_is_paused(BlockJob *job)
 void block_job_resume(BlockJob *job)
 {
     job->paused = false;
+    block_job_iostatus_reset(job);
     if (job->co && !job->busy) {
         qemu_coroutine_enter(job->co, NULL);
     }
@@ -128,6 +129,11 @@ bool block_job_is_cancelled(BlockJob *job)
     return job->cancelled;
 }
 
+void block_job_iostatus_reset(BlockJob *job)
+{
+    job->iostatus = BLOCK_DEVICE_IO_STATUS_OK;
+}
+
 struct BlockCancelData {
     BlockJob *job;
     BlockDriverCompletionFunc *cb;
@@ -189,11 +195,54 @@ void block_job_sleep_ns(BlockJob *job, QEMUClock *clock, int64_t ns)
 BlockJobInfo *block_job_query(BlockJob *job)
 {
     BlockJobInfo *info = g_new0(BlockJobInfo, 1);
-    info->type   = g_strdup(job->job_type->job_type);
-    info->device = g_strdup(bdrv_get_device_name(job->bs));
-    info->len    = job->len;
-    info->paused = job->paused;
-    info->offset = job->offset;
-    info->speed  = job->speed;
+    info->type      = g_strdup(job->job_type->job_type);
+    info->device    = g_strdup(bdrv_get_device_name(job->bs));
+    info->len       = job->len;
+    info->paused    = job->paused;
+    info->offset    = job->offset;
+    info->speed     = job->speed;
+    info->io_status = job->iostatus;
     return info;
 }
+
+static void block_job_iostatus_set_err(BlockJob *job, int error)
+{
+    if (job->iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
+        job->iostatus = error == ENOSPC ? BLOCK_DEVICE_IO_STATUS_NOSPACE :
+                                          BLOCK_DEVICE_IO_STATUS_FAILED;
+    }
+}
+
+
+BlockErrorAction block_job_error_action(BlockJob *job, BlockDriverState *bs,
+                                        BlockdevOnError on_err,
+                                        int is_read, int error)
+{
+    BlockErrorAction action;
+
+    switch (on_err) {
+    case BLOCKDEV_ON_ERROR_ENOSPC:
+        action = (error == ENOSPC) ? BDRV_ACTION_STOP : BDRV_ACTION_REPORT;
+        break;
+    case BLOCKDEV_ON_ERROR_STOP:
+        action = BDRV_ACTION_STOP;
+        break;
+    case BLOCKDEV_ON_ERROR_REPORT:
+        action = BDRV_ACTION_REPORT;
+        break;
+    case BLOCKDEV_ON_ERROR_IGNORE:
+        action = BDRV_ACTION_IGNORE;
+        break;
+    default:
+        abort();
+    }
+    bdrv_emit_qmp_error_event(job->bs, QEVENT_BLOCK_JOB_ERROR, action, is_read);
+    if (action == BDRV_ACTION_STOP) {
+        block_job_pause(job);
+        block_job_iostatus_set_err(job, error);
+        if (bs != job->bs) {
+            bdrv_iostatus_set_err(bs, error);
+        }
+    }
+    return action;
+}
diff --git a/blockjob.h b/blockjob.h
index a0d1b5c..2070a1b 100644
--- a/blockjob.h
+++ b/blockjob.h
@@ -82,6 +82,9 @@ struct BlockJob {
      */
     bool busy;
 
+    /** Status that is published by the query-block-jobs QMP API */
+    BlockDeviceIoStatus iostatus;
+
     /** Offset that is published by the query-block-jobs QMP API */
     int64_t offset;
 
@@ -216,4 +219,26 @@ bool block_job_is_paused(BlockJob *job);
  */
 int block_job_cancel_sync(BlockJob *job);
 
+/**
+ * block_job_iostatus_reset:
+ * @job: The job whose I/O status should be reset.
+ *
+ * Reset I/O status on @job.
+ */
+void block_job_iostatus_reset(BlockJob *job);
+
+/**
+ * block_job_error_action:
+ * @job: The job to signal an error for.
+ * @bs: The block device on which to set an I/O error.
+ * @on_err: The error action setting.
+ * @is_read: Whether the operation was a read.
+ * @error: The error that was reported.
+ *
+ * Report an I/O error for a block job and possibly pause the job.  Return the
+ * action that was selected based on @on_err and @error.
+ */
+BlockErrorAction block_job_error_action(BlockJob *job, BlockDriverState *bs,
+                                        BlockdevOnError on_err,
+                                        int is_read, int error);
 #endif
diff --git a/monitor.c b/monitor.c
index 67064e2..d4bd5fe 100644
--- a/monitor.c
+++ b/monitor.c
@@ -450,6 +450,7 @@ static const char *monitor_event_names[] = {
     [QEVENT_SPICE_DISCONNECTED] = "SPICE_DISCONNECTED",
     [QEVENT_BLOCK_JOB_COMPLETED] = "BLOCK_JOB_COMPLETED",
     [QEVENT_BLOCK_JOB_CANCELLED] = "BLOCK_JOB_CANCELLED",
+    [QEVENT_BLOCK_JOB_ERROR] = "BLOCK_JOB_ERROR",
     [QEVENT_DEVICE_TRAY_MOVED] = "DEVICE_TRAY_MOVED",
     [QEVENT_SUSPEND] = "SUSPEND",
     [QEVENT_SUSPEND_DISK] = "SUSPEND_DISK",
diff --git a/monitor.h b/monitor.h
index 64c1561..43040af 100644
--- a/monitor.h
+++ b/monitor.h
@@ -38,6 +38,7 @@ typedef enum MonitorEvent {
     QEVENT_SPICE_DISCONNECTED,
     QEVENT_BLOCK_JOB_COMPLETED,
     QEVENT_BLOCK_JOB_CANCELLED,
+    QEVENT_BLOCK_JOB_ERROR,
     QEVENT_DEVICE_TRAY_MOVED,
     QEVENT_SUSPEND,
     QEVENT_SUSPEND_DISK,
diff --git a/qapi-schema.json b/qapi-schema.json
index 43d3345..6e4b5b7 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1127,11 +1127,14 @@
 #
 # @speed: the rate limit, bytes per second
 #
+# @io-status: the status of the job (since 1.3)
+#
 # Since: 1.1
 ##
 { 'type': 'BlockJobInfo',
   'data': {'type': 'str', 'device': 'str', 'len': 'int',
-           'offset': 'int', 'paused': 'bool', 'speed': 'int'} }
+           'offset': 'int', 'paused': 'bool', 'speed': 'int',
+           'io-status': 'BlockDeviceIoStatus'} }
 
 ##
 # @query-block-jobs:
@@ -1919,6 +1922,8 @@
 # operation.  It is an error to call this command if no operation is in
 # progress.  Resuming an already running job is not an error.
 #
+# This command also clears the error status of the job.
+#
 # @device: the device name
 #
 # Returns: Nothing on success
-- 
1.7.12


* [Qemu-devel] [PATCH v2 15/45] stream: add on-error argument
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (13 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 14/45] block: introduce block job error Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-26 20:53   ` Eric Blake
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 16/45] blkdebug: process all set_state rules in the old state Paolo Bonzini
                   ` (30 subsequent siblings)
  45 siblings, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

This patch adds support for error management to streaming.
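
The control flow this adds to the streaming loop can be sketched as a small
Python simulation.  It is a model under stated assumptions, not the QEMU
code: stream_chunks, its arguments and the string action names are all
hypothetical, and a "chunk" stands in for one iteration of the loop.

```python
def stream_chunks(num_chunks, failures, decide):
    """Simulate the streaming loop with per-chunk injected errors.

    failures maps a chunk index to a list of errno values, one consumed
    per attempt.  decide(err) returns 'stop', 'ignore' or 'report'.
    Returns (first_error, chunks_done, backing_file_dropped).
    """
    # Copy the lists so the caller's dictionary is not modified.
    remaining = {idx: list(errs) for idx, errs in failures.items()}
    first_error = 0
    chunk = 0
    while chunk < num_chunks:
        pending = remaining.get(chunk)
        if pending:
            err = pending.pop(0)
            action = decide(err)
            if action == "stop":
                # The job is paused; once resumed, the same chunk is retried.
                continue
            if first_error == 0:
                # Remember the first error even if it is ignored.
                first_error = err
            if action == "report":
                break
        chunk += 1
    # The backing file is dropped only after a complete, error-free run.
    dropped = chunk == num_chunks and first_error == 0
    return first_error, chunk, dropped
```

With a single recoverable failure, 'report' ends the job early, 'ignore'
runs to the end but still finishes with the remembered error, and 'stop'
retries the failed chunk and finishes cleanly.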

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/stream.c   | 28 +++++++++++++++++++++++++++-
 block_int.h      |  3 ++-
 blockdev.c       | 11 ++++++++---
 hmp.c            |  3 ++-
 qapi-schema.json |  9 +++++++--
 qmp-commands.hx  |  2 +-
 6 files changed, 47 insertions(+), 9 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index a8f585a..0c0fc7a 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -31,6 +31,7 @@ typedef struct StreamBlockJob {
     BlockJob common;
     RateLimit limit;
     BlockDriverState *base;
+    BlockdevOnError on_error;
     char backing_file_id[1024];
 } StreamBlockJob;
 
@@ -78,6 +79,7 @@ static void coroutine_fn stream_run(void *opaque)
     BlockDriverState *bs = s->common.bs;
     BlockDriverState *base = s->base;
     int64_t sector_num, end;
+    int error = 0;
     int ret = 0;
     int n = 0;
     void *buf;
@@ -142,7 +144,19 @@ wait:
             ret = stream_populate(bs, sector_num, n, buf);
         }
         if (ret < 0) {
-            break;
+            BlockErrorAction action =
+                block_job_error_action(&s->common, s->common.bs, s->on_error,
+                                       true, -ret);
+            if (action == BDRV_ACTION_STOP) {
+                n = 0;
+                continue;
+            }
+            if (error == 0) {
+                error = ret;
+            }
+            if (action == BDRV_ACTION_REPORT) {
+                break;
+            }
         }
         ret = 0;
 
@@ -154,6 +168,9 @@ wait:
         bdrv_disable_copy_on_read(bs);
     }
 
+    /* Do not remove the backing file if an error was there but ignored.  */
+    ret = error;
+
     if (!block_job_is_cancelled(&s->common) && sector_num == end && ret == 0) {
         const char *base_id = NULL, *base_fmt = NULL;
         if (base) {
@@ -189,11 +206,19 @@ static BlockJobType stream_job_type = {
 
 void stream_start(BlockDriverState *bs, BlockDriverState *base,
                   const char *base_id, int64_t speed,
+                  BlockdevOnError on_error,
                   BlockDriverCompletionFunc *cb,
                   void *opaque, Error **errp)
 {
     StreamBlockJob *s;
 
+    if ((on_error == BLOCKDEV_ON_ERROR_STOP ||
+         on_error == BLOCKDEV_ON_ERROR_ENOSPC) &&
+        !bdrv_iostatus_is_enabled(bs)) {
+        error_set(errp, QERR_INVALID_PARAMETER, "on-error");
+        return;
+    }
+
     s = block_job_create(&stream_job_type, bs, speed, cb, opaque, errp);
     if (!s) {
         return;
@@ -204,6 +229,7 @@ void stream_start(BlockDriverState *bs, BlockDriverState *base,
         pstrcpy(s->backing_file_id, sizeof(s->backing_file_id), base_id);
     }
 
+    s->on_error = on_error;
     s->common.co = qemu_coroutine_create(stream_run);
     trace_stream_start(bs, base, s, s->common.co, opaque);
     qemu_coroutine_enter(s->common.co, s);
diff --git a/block_int.h b/block_int.h
index 5257b10..057baae 100644
--- a/block_int.h
+++ b/block_int.h
@@ -299,6 +299,7 @@ void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
  * @base_id: The file name that will be written to @bs as the new
  * backing file if the job completes.  Ignored if @base is %NULL.
  * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
+ * @on_error: The action to take upon error.
  * @cb: Completion function for the job.
  * @opaque: Opaque pointer value passed to @cb.
  * @errp: Error object.
@@ -310,7 +311,7 @@ void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
  * @base_id in the written image and to @base in the live BlockDriverState.
  */
 void stream_start(BlockDriverState *bs, BlockDriverState *base,
-                  const char *base_id, int64_t speed,
+                  const char *base_id, int64_t speed, BlockdevOnError on_error,
                   BlockDriverCompletionFunc *cb,
                   void *opaque, Error **errp);
 
diff --git a/blockdev.c b/blockdev.c
index 4a7d2dd..425759e 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1091,13 +1091,18 @@ static void block_job_cb(void *opaque, int ret)
 }
 
 void qmp_block_stream(const char *device, bool has_base,
-                      const char *base, bool has_speed,
-                      int64_t speed, Error **errp)
+                      const char *base, bool has_speed, int64_t speed,
+                      bool has_on_error, BlockdevOnError on_error,
+                      Error **errp)
 {
     BlockDriverState *bs;
     BlockDriverState *base_bs = NULL;
     Error *local_err = NULL;
 
+    if (!has_on_error) {
+        on_error = BLOCKDEV_ON_ERROR_REPORT;
+    }
+
     bs = bdrv_find(device);
     if (!bs) {
         error_set(errp, QERR_DEVICE_NOT_FOUND, device);
@@ -1113,7 +1118,7 @@ void qmp_block_stream(const char *device, bool has_base,
     }
 
     stream_start(bs, base_bs, base, has_speed ? speed : 0,
-                 block_job_cb, bs, &local_err);
+                 on_error, block_job_cb, bs, &local_err);
     if (error_is_set(&local_err)) {
         error_propagate(errp, local_err);
         return;
diff --git a/hmp.c b/hmp.c
index 55601f7..df789b2 100644
--- a/hmp.c
+++ b/hmp.c
@@ -930,7 +930,8 @@ void hmp_block_stream(Monitor *mon, const QDict *qdict)
     int64_t speed = qdict_get_try_int(qdict, "speed", 0);
 
     qmp_block_stream(device, base != NULL, base,
-                     qdict_haskey(qdict, "speed"), speed, &error);
+                     qdict_haskey(qdict, "speed"), speed,
+                     true, BLOCKDEV_ON_ERROR_REPORT, &error);
 
     hmp_handle_error(mon, &error);
 }
diff --git a/qapi-schema.json b/qapi-schema.json
index 6e4b5b7..8719a9d 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1831,13 +1831,18 @@
 #
 # @speed:  #optional the maximum speed, in bytes per second
 #
+# @on-error: #optional the action to take on an error (default report).
+#            'stop' and 'enospc' can only be used if the block device
+#            supports io-status (see BlockInfo).  Since 1.3.
+#
 # Returns: Nothing on success
 #          If @device does not exist, DeviceNotFound
 #
 # Since: 1.1
 ##
-{ 'command': 'block-stream', 'data': { 'device': 'str', '*base': 'str',
-                                       '*speed': 'int' } }
+{ 'command': 'block-stream',
+  'data': { 'device': 'str', '*base': 'str', '*speed': 'int',
+            '*on-error': 'BlockdevOnError' } }
 
 ##
 # @block-job-set-speed:
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 85eacb5..07fa8fe 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -787,7 +787,7 @@ EQMP
 
     {
         .name       = "block-stream",
-        .args_type  = "device:B,base:s?,speed:o?",
+        .args_type  = "device:B,base:s?,speed:o?,on-error:s?",
         .mhandler.cmd_new = qmp_marshal_input_block_stream,
     },
 
-- 
1.7.12


* [Qemu-devel] [PATCH v2 16/45] blkdebug: process all set_state rules in the old state
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (14 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 15/45] stream: add on-error argument Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 17/45] qemu-iotests: map underscore to dash in QMP argument names Paolo Bonzini
                   ` (29 subsequent siblings)
  45 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

Currently it is impossible to write a blkdebug script that ping-pongs
between two states, because the second set-state rule will use the
state that is set in the first.  If you have

    [set-state]
    event = "..."
    state = "1"
    new_state = "2"

    [set-state]
    event = "..."
    state = "2"
    new_state = "1"

for example the state will remain locked at 1.  This can be fixed
by first processing all rules, and then setting the state.
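
The difference between the two strategies described above can be modelled
in Python.  This is a sketch of the description only: fire_event is a
hypothetical helper, rules are reduced to (state, new_state) pairs, and a
rule state of 0 is assumed to match any state.

```python
def fire_event(state, rules, deferred=True):
    """Apply all set-state rules for one event and return the next state.

    With deferred=False each rule is checked against the state left by
    the previous rule; with deferred=True every rule is checked against
    the state at the start of the event.
    """
    new_state = state
    for rule_state, rule_new_state in rules:
        seen = state if deferred else new_state
        if rule_state and rule_state != seen:
            # The rule belongs to a different state; skip it.
            continue
        new_state = rule_new_state
    return new_state
```

For the two rules shown above, [(1, 2), (2, 1)], the immediate variant
started in state 1 moves to 2 and straight back to 1 within a single
event, while the deferred variant alternates between 2 and 1 on
successive events.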

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: dropped stray debug printfs

 block/blkdebug.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 59dcea0..1206d52 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -28,6 +28,7 @@
 
 typedef struct BDRVBlkdebugState {
     int state;
+    int new_state;
     QLIST_HEAD(, BlkdebugRule) rules[BLKDBG_EVENT_MAX];
     QSIMPLEQ_HEAD(, BlkdebugRule) active_rules;
 } BDRVBlkdebugState;
@@ -403,12 +404,12 @@ static void blkdebug_close(BlockDriverState *bs)
 }
 
 static bool process_rule(BlockDriverState *bs, struct BlkdebugRule *rule,
-    int old_state, bool injected)
+    bool injected)
 {
     BDRVBlkdebugState *s = bs->opaque;
 
     /* Only process rules for the current state */
-    if (rule->state && rule->state != old_state) {
+    if (rule->state && rule->state != s->state) {
         return injected;
     }
 
@@ -423,7 +424,7 @@ static bool process_rule(BlockDriverState *bs, struct BlkdebugRule *rule,
         break;
 
     case ACTION_SET_STATE:
-        s->state = rule->options.set_state.new_state;
+        s->new_state = rule->options.set_state.new_state;
         break;
     }
     return injected;
@@ -433,15 +434,16 @@ static void blkdebug_debug_event(BlockDriverState *bs, BlkDebugEvent event)
 {
     BDRVBlkdebugState *s = bs->opaque;
     struct BlkdebugRule *rule;
-    int old_state = s->state;
     bool injected;
 
     assert((int)event >= 0 && event < BLKDBG_EVENT_MAX);
 
     injected = false;
+    s->new_state = s->state;
     QLIST_FOREACH(rule, &s->rules[event], next) {
-        injected = process_rule(bs, rule, old_state, injected);
+        injected = process_rule(bs, rule, injected);
     }
+    s->state = s->new_state;
 }
 
 static int64_t blkdebug_getlength(BlockDriverState *bs)
-- 
1.7.12


* [Qemu-devel] [PATCH v2 17/45] qemu-iotests: map underscore to dash in QMP argument names
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (15 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 16/45] blkdebug: process all set_state rules in the old state Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 18/45] qemu-iotests: add tests for streaming error handling Paolo Bonzini
                   ` (28 subsequent siblings)
  45 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

iotests.py provides a convenience function that uses Python keyword
arguments to represent QMP command arguments.  However, almost all
QMP commands use dashes for argument names (the sole exception is
block_set_io_throttle), and dashes are not allowed in a keyword
argument name.  Hence provide automatic conversion of underscores
to dashes.
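
A minimal sketch of the conversion, written for Python 3 rather than the
Python 2 idiom used in the patch.  Both to_qmp_args and build_command are
hypothetical helpers for illustration and are not part of iotests.py.

```python
def to_qmp_args(**kwargs):
    """Return the keyword arguments with underscores mapped to dashes."""
    return {key.replace('_', '-'): value for key, value in kwargs.items()}


def build_command(cmd, **kwargs):
    """Build the dictionary for a QMP command with converted arguments."""
    return {'execute': cmd, 'arguments': to_qmp_args(**kwargs)}
```

With this, build_command('block-stream', device='drive0', on_error='stop')
carries an "on-error" argument.  The trade-off is that an argument name
that genuinely contains an underscore would be rewritten as well.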

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 tests/qemu-iotests/iotests.py | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index e05b1d6..a94ea75 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -19,6 +19,7 @@
 import os
 import re
 import subprocess
+import string
 import unittest
 import sys; sys.path.append(os.path.join(os.path.dirname(__file__), '..', '..', 'QMP'))
 import qmp
@@ -96,9 +97,14 @@ class VM(object):
             os.remove(self._qemu_log_path)
             self._popen = None
 
+    underscore_to_dash = string.maketrans('_', '-')
     def qmp(self, cmd, **args):
         '''Invoke a QMP command and return the result dict'''
-        return self._qmp.cmd(cmd, args=args)
+        qmp_args = dict()
+        for k in args.keys():
+            qmp_args[k.translate(self.underscore_to_dash)] = args[k]
+
+        return self._qmp.cmd(cmd, args=qmp_args)
 
     def get_qmp_events(self, wait=False):
         '''Poll for queued QMP events and return a list of dicts'''
-- 
1.7.12


* [Qemu-devel] [PATCH v2 18/45] qemu-iotests: add tests for streaming error handling
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (16 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 17/45] qemu-iotests: map underscore to dash in QMP argument names Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 19/45] block: add bdrv_query_info Paolo Bonzini
                   ` (27 subsequent siblings)
  45 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

Add a test for each of report/ignore/stop.  The tests use blkdebug
to generate an error in the middle of a script.  The error is
recoverable (once = "on") so that we can test resuming a job after
stopping for an error.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 tests/qemu-iotests/030        | 220 ++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/030.out    |   4 +-
 tests/qemu-iotests/iotests.py |   7 ++
 4 files changed, 230 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index dfacdf1..dd4ef11 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -195,6 +195,226 @@ class TestSmallerBackingFile(ImageStreamingTestCase):
         self.assert_no_active_streams()
         self.vm.shutdown()
 
+class TestErrors(ImageStreamingTestCase):
+    image_len = 2 * 1024 * 1024 # MB
+
+    # this should match STREAM_BUFFER_SIZE (in bytes) in block/stream.c
+    STREAM_BUFFER_SIZE = 512 * 1024
+
+    def create_blkdebug_file(self, name, event, errno):
+        file = open(name, 'w')
+        file.write('''
+[inject-error]
+state = "1"
+event = "%s"
+errno = "%d"
+immediately = "off"
+once = "on"
+sector = "%d"
+
+[set-state]
+state = "1"
+event = "%s"
+new_state = "2"
+
+[set-state]
+state = "2"
+event = "%s"
+new_state = "1"
+''' % (event, errno, self.STREAM_BUFFER_SIZE / 512, event, event))
+        file.close()
+
+class TestEIO(TestErrors):
+    def setUp(self):
+        self.blkdebug_file = backing_img + ".blkdebug"
+        self.create_image(backing_img, TestErrors.image_len)
+        self.create_blkdebug_file(self.blkdebug_file, "read_aio", 5)
+        qemu_img('create', '-f', iotests.imgfmt,
+                 '-o', 'backing_file=blkdebug:%s:%s,backing_fmt=raw'
+                       % (self.blkdebug_file, backing_img),
+                 test_img)
+        self.vm = iotests.VM().add_drive(test_img)
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(test_img)
+        os.remove(backing_img)
+        os.remove(self.blkdebug_file)
+
+    def test_report(self):
+        self.assert_no_active_streams()
+
+        result = self.vm.qmp('block-stream', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        completed = False
+        error = False
+        while not completed:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_ERROR':
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/operation', 'read')
+                    error = True
+                elif event['event'] == 'BLOCK_JOB_COMPLETED':
+                    self.assertTrue(error, 'job completed unexpectedly')
+                    self.assert_qmp(event, 'data/type', 'stream')
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/error', 'Input/output error')
+                    self.assert_qmp(event, 'data/offset', self.STREAM_BUFFER_SIZE)
+                    self.assert_qmp(event, 'data/len', self.image_len)
+                    completed = True
+
+        self.assert_no_active_streams()
+        self.vm.shutdown()
+
+    def test_ignore(self):
+        self.assert_no_active_streams()
+
+        result = self.vm.qmp('block-stream', device='drive0', on_error='ignore')
+        self.assert_qmp(result, 'return', {})
+
+        error = False
+        completed = False
+        while not completed:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_ERROR':
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/operation', 'read')
+                    result = self.vm.qmp('query-block-jobs')
+                    self.assert_qmp(result, 'return[0]/paused', False)
+                    error = True
+                elif event['event'] == 'BLOCK_JOB_COMPLETED':
+                    self.assertTrue(error, 'job completed unexpectedly')
+                    self.assert_qmp(event, 'data/type', 'stream')
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/error', 'Input/output error')
+                    self.assert_qmp(event, 'data/offset', self.image_len)
+                    self.assert_qmp(event, 'data/len', self.image_len)
+                    completed = True
+
+        self.assert_no_active_streams()
+        self.vm.shutdown()
+
+    def test_stop(self):
+        self.assert_no_active_streams()
+
+        result = self.vm.qmp('block-stream', device='drive0', on_error='stop')
+        self.assert_qmp(result, 'return', {})
+
+        error = False
+        completed = False
+        while not completed:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_ERROR':
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/operation', 'read')
+
+                    result = self.vm.qmp('query-block-jobs')
+                    self.assert_qmp(result, 'return[0]/paused', True)
+                    self.assert_qmp(result, 'return[0]/offset', self.STREAM_BUFFER_SIZE)
+                    self.assert_qmp(result, 'return[0]/io-status', 'failed')
+
+                    result = self.vm.qmp('block-job-resume', device='drive0')
+                    self.assert_qmp(result, 'return', {})
+
+                    result = self.vm.qmp('query-block-jobs')
+                    self.assert_qmp(result, 'return[0]/paused', False)
+                    self.assert_qmp(result, 'return[0]/io-status', 'ok')
+                    error = True
+                elif event['event'] == 'BLOCK_JOB_COMPLETED':
+                    self.assertTrue(error, 'job completed unexpectedly')
+                    self.assert_qmp(event, 'data/type', 'stream')
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp_absent(event, 'data/error')
+                    self.assert_qmp(event, 'data/offset', self.image_len)
+                    self.assert_qmp(event, 'data/len', self.image_len)
+                    completed = True
+
+        self.assert_no_active_streams()
+        self.vm.shutdown()
+
+    def test_enospc(self):
+        self.assert_no_active_streams()
+
+        result = self.vm.qmp('block-stream', device='drive0', on_error='enospc')
+        self.assert_qmp(result, 'return', {})
+
+        completed = False
+        error = False
+        while not completed:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_ERROR':
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/operation', 'read')
+                    error = True
+                elif event['event'] == 'BLOCK_JOB_COMPLETED':
+                    self.assertTrue(error, 'job completed unexpectedly')
+                    self.assert_qmp(event, 'data/type', 'stream')
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/error', 'Input/output error')
+                    self.assert_qmp(event, 'data/offset', self.STREAM_BUFFER_SIZE)
+                    self.assert_qmp(event, 'data/len', self.image_len)
+                    completed = True
+
+        self.assert_no_active_streams()
+        self.vm.shutdown()
+
+class TestENOSPC(TestErrors):
+    def setUp(self):
+        self.blkdebug_file = backing_img + ".blkdebug"
+        self.create_image(backing_img, TestErrors.image_len)
+        self.create_blkdebug_file(self.blkdebug_file, "read_aio", 28)
+        qemu_img('create', '-f', iotests.imgfmt,
+                 '-o', 'backing_file=blkdebug:%s:%s,backing_fmt=raw'
+                       % (self.blkdebug_file, backing_img),
+                 test_img)
+        self.vm = iotests.VM().add_drive(test_img)
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(test_img)
+        os.remove(backing_img)
+        os.remove(self.blkdebug_file)
+
+    def test_enospc(self):
+        self.assert_no_active_streams()
+
+        result = self.vm.qmp('block-stream', device='drive0', on_error='enospc')
+        self.assert_qmp(result, 'return', {})
+
+        error = False
+        completed = False
+        while not completed:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_ERROR':
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/operation', 'read')
+
+                    result = self.vm.qmp('query-block-jobs')
+                    self.assert_qmp(result, 'return[0]/paused', True)
+                    self.assert_qmp(result, 'return[0]/offset', self.STREAM_BUFFER_SIZE)
+                    self.assert_qmp(result, 'return[0]/io-status', 'nospace')
+
+                    result = self.vm.qmp('block-job-resume', device='drive0')
+                    self.assert_qmp(result, 'return', {})
+
+                    result = self.vm.qmp('query-block-jobs')
+                    self.assert_qmp(result, 'return[0]/paused', False)
+                    self.assert_qmp(result, 'return[0]/io-status', 'ok')
+                    error = True
+                elif event['event'] == 'BLOCK_JOB_COMPLETED':
+                    self.assertTrue(error, 'job completed unexpectedly')
+                    self.assert_qmp(event, 'data/type', 'stream')
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp_absent(event, 'data/error')
+                    self.assert_qmp(event, 'data/offset', self.image_len)
+                    self.assert_qmp(event, 'data/len', self.image_len)
+                    completed = True
+
+        self.assert_no_active_streams()
+        self.vm.shutdown()
 
 class TestStreamStop(ImageStreamingTestCase):
     image_len = 8 * 1024 * 1024 * 1024 # GB
diff --git a/tests/qemu-iotests/030.out b/tests/qemu-iotests/030.out
index 594c16f..fa16b5c 100644
--- a/tests/qemu-iotests/030.out
+++ b/tests/qemu-iotests/030.out
@@ -1,5 +1,5 @@
-........
+.............
 ----------------------------------------------------------------------
-Ran 8 tests
+Ran 13 tests
 
 OK
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index a94ea75..3c60b2d 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -138,6 +138,13 @@ class QMPTestCase(unittest.TestCase):
                     self.fail('invalid index "%s" in path "%s" in "%s"' % (idx, path, str(d)))
         return d
 
+    def assert_qmp_absent(self, d, path):
+        try:
+            result = self.dictpath(d, path)
+        except AssertionError:
+            return
+        self.fail('path "%s" has value "%s"' % (path, str(result)))
+
     def assert_qmp(self, d, path, value):
         '''Assert that the value for a specific path in a QMP dict matches'''
         result = self.dictpath(d, path)
-- 
1.7.12


* [Qemu-devel] [PATCH v2 19/45] block: add bdrv_query_info
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (17 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 18/45] qemu-iotests: add tests for streaming error handling Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-10-15 15:42   ` Kevin Wolf
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 20/45] block: add bdrv_query_stats Paolo Bonzini
                   ` (26 subsequent siblings)
  45 siblings, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

Extract bdrv_query_info out of the implementation of "info block".

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: moved bdrv_query_info close to qmp_query_block.
        Fixed conflicts for the new field 'encryption_key_missing'
        too.

 block.c | 104 +++++++++++++++++++++++++++++++---------------------------------
 block.h |   1 +
 2 files changed, 52 insertions(+), 53 deletions(-)

diff --git a/block.c b/block.c
index 83a695b..1d95a5d 100644
--- a/block.c
+++ b/block.c
@@ -2653,69 +2653,67 @@ int coroutine_fn bdrv_co_is_allocated_above(BlockDriverState *top,
     return 0;
 }
 
+BlockInfo *bdrv_query_info(BlockDriverState *bs)
+{
+    BlockInfo *info = g_malloc0(sizeof(*info));
+    info->device = g_strdup(bs->device_name);
+    info->type = g_strdup("unknown");
+    info->locked = bdrv_dev_is_medium_locked(bs);
+    info->removable = bdrv_dev_has_removable_media(bs);
+
+    if (bdrv_dev_has_removable_media(bs)) {
+        info->has_tray_open = true;
+        info->tray_open = bdrv_dev_is_tray_open(bs);
+    }
+
+    if (bdrv_iostatus_is_enabled(bs)) {
+        info->has_io_status = true;
+        info->io_status = bs->iostatus;
+    }
+
+    if (bs->drv) {
+        info->has_inserted = true;
+        info->inserted = g_malloc0(sizeof(*info->inserted));
+        info->inserted->file = g_strdup(bs->filename);
+        info->inserted->ro = bs->read_only;
+        info->inserted->drv = g_strdup(bs->drv->format_name);
+        info->inserted->encrypted = bs->encrypted;
+        info->inserted->encryption_key_missing = bdrv_key_required(bs);
+
+        if (bs->backing_file[0]) {
+            info->inserted->has_backing_file = true;
+            info->inserted->backing_file = g_strdup(bs->backing_file);
+        }
+
+        if (bs->io_limits_enabled) {
+            info->inserted->bps =
+                           bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL];
+            info->inserted->bps_rd =
+                           bs->io_limits.bps[BLOCK_IO_LIMIT_READ];
+            info->inserted->bps_wr =
+                           bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE];
+            info->inserted->iops =
+                           bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL];
+            info->inserted->iops_rd =
+                           bs->io_limits.iops[BLOCK_IO_LIMIT_READ];
+            info->inserted->iops_wr =
+                           bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE];
+        }
+    }
+    return info;
+}
+
 BlockInfoList *qmp_query_block(Error **errp)
 {
-    BlockInfoList *head = NULL, *cur_item = NULL;
+    BlockInfoList *head = NULL, **p_next = &head;
     BlockDriverState *bs;
 
     QTAILQ_FOREACH(bs, &bdrv_states, list) {
         BlockInfoList *info = g_malloc0(sizeof(*info));
+        info->value = bdrv_query_info(bs);
 
-        info->value = g_malloc0(sizeof(*info->value));
-        info->value->device = g_strdup(bs->device_name);
-        info->value->type = g_strdup("unknown");
-        info->value->locked = bdrv_dev_is_medium_locked(bs);
-        info->value->removable = bdrv_dev_has_removable_media(bs);
-
-        if (bdrv_dev_has_removable_media(bs)) {
-            info->value->has_tray_open = true;
-            info->value->tray_open = bdrv_dev_is_tray_open(bs);
-        }
-
-        if (bdrv_iostatus_is_enabled(bs)) {
-            info->value->has_io_status = true;
-            info->value->io_status = bs->iostatus;
-        }
-
-        if (bs->drv) {
-            info->value->has_inserted = true;
-            info->value->inserted = g_malloc0(sizeof(*info->value->inserted));
-            info->value->inserted->file = g_strdup(bs->filename);
-            info->value->inserted->ro = bs->read_only;
-            info->value->inserted->drv = g_strdup(bs->drv->format_name);
-            info->value->inserted->encrypted = bs->encrypted;
-            info->value->inserted->encryption_key_missing = bdrv_key_required(bs);
-            if (bs->backing_file[0]) {
-                info->value->inserted->has_backing_file = true;
-                info->value->inserted->backing_file = g_strdup(bs->backing_file);
-            }
-
-            info->value->inserted->backing_file_depth =
-                bdrv_get_backing_file_depth(bs);
-
-            if (bs->io_limits_enabled) {
-                info->value->inserted->bps =
-                               bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL];
-                info->value->inserted->bps_rd =
-                               bs->io_limits.bps[BLOCK_IO_LIMIT_READ];
-                info->value->inserted->bps_wr =
-                               bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE];
-                info->value->inserted->iops =
-                               bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL];
-                info->value->inserted->iops_rd =
-                               bs->io_limits.iops[BLOCK_IO_LIMIT_READ];
-                info->value->inserted->iops_wr =
-                               bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE];
-            }
-        }
-
-        /* XXX: waiting for the qapi to support GSList */
-        if (!cur_item) {
-            head = cur_item = info;
-        } else {
-            cur_item->next = info;
-            cur_item = info;
-        }
+        *p_next = info;
+        p_next = &info->next;
     }
 
     return head;
diff --git a/block.h b/block.h
index 7db23b8..e450746 100644
--- a/block.h
+++ b/block.h
@@ -308,6 +308,7 @@ void bdrv_get_backing_filename(BlockDriverState *bs,
                                char *filename, int filename_size);
 void bdrv_get_full_backing_filename(BlockDriverState *bs,
                                     char *dest, size_t sz);
+BlockInfo *bdrv_query_info(BlockDriverState *s);
 int bdrv_can_snapshot(BlockDriverState *bs);
 int bdrv_is_snapshot(BlockDriverState *bs);
 BlockDriverState *bdrv_snapshots(void);
-- 
1.7.12
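
The list-building change in qmp_query_block above replaces the head/cur_item
bookkeeping with a pointer-to-pointer tail (p_next).  A minimal standalone
sketch of that idiom follows.  Node, list_from_array and list_free are
hypothetical names used only for illustration, and calloc stands in for
g_malloc0; none of this is QEMU code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for BlockInfoList: a value plus a next pointer. */
typedef struct Node {
    int value;
    struct Node *next;
} Node;

/*
 * Build a list in array order.  p_next always points at the pointer that
 * must receive the next node: &head at first, &last->next afterwards.
 * No special case is needed for the empty list.
 */
static Node *list_from_array(const int *values, size_t n)
{
    Node *head = NULL, **p_next = &head;
    size_t i;

    for (i = 0; i < n; i++) {
        Node *node = calloc(1, sizeof(*node));
        if (!node) {
            abort();            /* g_malloc0 also aborts on failure */
        }
        node->value = values[i];

        *p_next = node;         /* link the node in */
        p_next = &node->next;   /* advance the tail slot */
    }
    return head;
}

/* Release every node of a list built by list_from_array. */
static void list_free(Node *head)
{
    while (head) {
        Node *next = head->next;
        free(head);
        head = next;
    }
}
```

Because the first append writes through &head and later ones through
&node->next, the "is this the first element?" branch of the old code is no
longer needed.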


* [Qemu-devel] [PATCH v2 20/45] block: add bdrv_query_stats
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (18 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 19/45] block: add bdrv_query_info Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 21/45] block: add bdrv_open_backing_file Paolo Bonzini
                   ` (25 subsequent siblings)
  45 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

qmp_query_blockstat cannot fail, so remove its Error argument and turn it
into a new public function, bdrv_query_stats.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c | 18 ++++++------------
 block.h |  1 +
 2 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/block.c b/block.c
index 1d95a5d..703261d 100644
--- a/block.c
+++ b/block.c
@@ -2719,8 +2719,7 @@ BlockInfoList *qmp_query_block(Error **errp)
     return head;
 }
 
-/* Consider exposing this as a full fledged QMP command */
-static BlockStats *qmp_query_blockstat(const BlockDriverState *bs, Error **errp)
+BlockStats *bdrv_query_stats(const BlockDriverState *bs)
 {
     BlockStats *s;
 
@@ -2744,7 +2743,7 @@ static BlockStats *qmp_query_blockstat(const BlockDriverState *bs, Error **errp)
 
     if (bs->file) {
         s->has_parent = true;
-        s->parent = qmp_query_blockstat(bs->file, NULL);
+        s->parent = bdrv_query_stats(bs->file);
     }
 
     return s;
@@ -2752,20 +2751,15 @@ static BlockStats *qmp_query_blockstat(const BlockDriverState *bs, Error **errp)
 
 BlockStatsList *qmp_query_blockstats(Error **errp)
 {
-    BlockStatsList *head = NULL, *cur_item = NULL;
+    BlockStatsList *head = NULL, **p_next = &head;
     BlockDriverState *bs;
 
     QTAILQ_FOREACH(bs, &bdrv_states, list) {
         BlockStatsList *info = g_malloc0(sizeof(*info));
-        info->value = qmp_query_blockstat(bs, NULL);
+        info->value = bdrv_query_stats(bs);
 
-        /* XXX: waiting for the qapi to support GSList */
-        if (!cur_item) {
-            head = cur_item = info;
-        } else {
-            cur_item->next = info;
-            cur_item = info;
-        }
+        *p_next = info;
+        p_next = &info->next;
     }
 
     return head;
diff --git a/block.h b/block.h
index e450746..aa1121a 100644
--- a/block.h
+++ b/block.h
@@ -309,6 +309,7 @@ void bdrv_get_backing_filename(BlockDriverState *bs,
 void bdrv_get_full_backing_filename(BlockDriverState *bs,
                                     char *dest, size_t sz);
 BlockInfo *bdrv_query_info(BlockDriverState *s);
+BlockStats *bdrv_query_stats(const BlockDriverState *bs);
 int bdrv_can_snapshot(BlockDriverState *bs);
 int bdrv_is_snapshot(BlockDriverState *bs);
 BlockDriverState *bdrv_snapshots(void);
-- 
1.7.12


* [Qemu-devel] [PATCH v2 21/45] block: add bdrv_open_backing_file
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (19 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 20/45] block: add bdrv_query_stats Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-27 18:14   ` Jeff Cody
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 22/45] block: introduce new dirty bitmap functionality Paolo Bonzini
                   ` (24 subsequent siblings)
  45 siblings, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

Mirroring runs without the backing file so that it can be copied outside
QEMU.  However, we need to add it at the time the job is completed and
QEMU switches to the target.  Factor out the common bits of opening an
image and completing a mirroring operation.

The new function does not assume that the file is closed immediately after
it returns failure, so it keeps the BDRV_O_NO_BACKING flag up-to-date.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: do not close bs if the function fails (brain fart).

 block.c | 62 ++++++++++++++++++++++++++++++++++++++++++++------------------
 block.h |  1 +
 2 files changed, 45 insertions(+), 18 deletions(-)

diff --git a/block.c b/block.c
index 703261d..6ee7052 100644
--- a/block.c
+++ b/block.c
@@ -734,6 +734,48 @@ int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags)
     return 0;
 }
 
+int bdrv_open_backing_file(BlockDriverState *bs)
+{
+    char backing_filename[PATH_MAX];
+    int back_flags, ret;
+    BlockDriver *back_drv = NULL;
+
+    if (bs->backing_hd != NULL) {
+        return 0;
+    }
+
+    bs->open_flags &= ~BDRV_O_NO_BACKING;
+    if (bs->backing_file[0] == '\0') {
+        return 0;
+    }
+
+    bs->backing_hd = bdrv_new("");
+    bdrv_get_full_backing_filename(bs, backing_filename,
+                                   sizeof(backing_filename));
+
+    if (bs->backing_format[0] != '\0') {
+        back_drv = bdrv_find_format(bs->backing_format);
+    }
+
+    /* backing files always opened read-only */
+    back_flags = bs->open_flags & ~(BDRV_O_RDWR | BDRV_O_SNAPSHOT);
+
+    ret = bdrv_open(bs->backing_hd, backing_filename, back_flags, back_drv);
+    if (ret < 0) {
+        bdrv_delete(bs->backing_hd);
+        bs->backing_hd = NULL;
+        bs->open_flags |= BDRV_O_NO_BACKING;
+        return ret;
+    }
+    if (bs->is_temporary) {
+        bs->backing_hd->keep_read_only = !(bs->open_flags & BDRV_O_RDWR);
+    } else {
+        /* base images use the same setting as leaf */
+        bs->backing_hd->keep_read_only = bs->keep_read_only;
+    }
+    return 0;
+}
+
 /*
  * Opens a disk image (raw, qcow2, vmdk, ...)
  */
@@ -821,24 +863,8 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
     }
 
     /* If there is a backing file, use it */
-    if ((flags & BDRV_O_NO_BACKING) == 0 && bs->backing_file[0] != '\0') {
-        char backing_filename[PATH_MAX];
-        int back_flags;
-        BlockDriver *back_drv = NULL;
-
-        bs->backing_hd = bdrv_new("");
-        bdrv_get_full_backing_filename(bs, backing_filename,
-                                       sizeof(backing_filename));
-
-        if (bs->backing_format[0] != '\0') {
-            back_drv = bdrv_find_format(bs->backing_format);
-        }
-
-        /* backing files always opened read-only */
-        back_flags =
-            flags & ~(BDRV_O_RDWR | BDRV_O_SNAPSHOT | BDRV_O_NO_BACKING);
-
-        ret = bdrv_open(bs->backing_hd, backing_filename, back_flags, back_drv);
+    if ((flags & BDRV_O_NO_BACKING) == 0) {
+        ret = bdrv_open_backing_file(bs);
         if (ret < 0) {
             bdrv_close(bs);
             return ret;
diff --git a/block.h b/block.h
index aa1121a..08479e1 100644
--- a/block.h
+++ b/block.h
@@ -133,6 +133,7 @@ void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top);
 void bdrv_delete(BlockDriverState *bs);
 int bdrv_parse_cache_flags(const char *mode, int *flags);
 int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags);
+int bdrv_open_backing_file(BlockDriverState *bs);
 int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
               BlockDriver *drv);
 BlockReopenQueue *bdrv_reopen_queue(BlockReopenQueue *bs_queue,
-- 
1.7.12
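
The flag handling described in the commit message can be modelled in
isolation.  The sketch below is a simplified toy, not the QEMU function:
Image, IMG_NO_BACKING and open_backing are hypothetical names, and the
open_result parameter merely simulates the return value of the nested
bdrv_open() call.

```c
#include <assert.h>
#include <stdbool.h>

#define IMG_NO_BACKING 0x1   /* hypothetical flag value, not QEMU's */

typedef struct Image {
    int open_flags;
    bool has_backing_name;   /* stands in for bs->backing_file[0] != '\0' */
    bool backing_open;       /* stands in for bs->backing_hd != NULL */
} Image;

/*
 * Try to attach the backing file.  The NO_BACKING flag is cleared up
 * front and set again only if the open fails, so it always describes the
 * state the image is left in, even when the caller keeps the image open
 * after a failure.
 */
static int open_backing(Image *img, int open_result)
{
    if (img->backing_open) {
        return 0;                       /* already attached */
    }

    img->open_flags &= ~IMG_NO_BACKING;
    if (!img->has_backing_name) {
        return 0;                       /* nothing to open */
    }

    if (open_result < 0) {
        img->open_flags |= IMG_NO_BACKING;
        return open_result;
    }
    img->backing_open = true;
    return 0;
}
```

A failed attempt leaves the flag set and can simply be retried later,
which is the situation at mirror completion.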


* [Qemu-devel] [PATCH v2 22/45] block: introduce new dirty bitmap functionality
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (20 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 21/45] block: add bdrv_open_backing_file Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 23/45] block: export dirty bitmap information in query-block Paolo Bonzini
                   ` (23 subsequent siblings)
  45 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

Assert that write_compressed is never used with the dirty bitmap.
Setting the bits early is wrong, because a coroutine might concurrently
examine them and copy incomplete data from the source.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c | 51 +++++++++++++++++++++++++++++++++++++++++++++------
 block.h |  5 +++--
 2 files changed, 48 insertions(+), 8 deletions(-)

diff --git a/block.c b/block.c
index 6ee7052..2c1273c 100644
--- a/block.c
+++ b/block.c
@@ -2251,7 +2251,7 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
     }
 
     if (bs->dirty_bitmap) {
-        set_dirty_bitmap(bs, sector_num, nb_sectors, 1);
+        bdrv_set_dirty(bs, sector_num, nb_sectors);
     }
 
     if (bs->wr_highest_sector < sector_num + nb_sectors - 1) {
@@ -2818,9 +2818,7 @@ int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num,
     if (bdrv_check_request(bs, sector_num, nb_sectors))
         return -EIO;
 
-    if (bs->dirty_bitmap) {
-        set_dirty_bitmap(bs, sector_num, nb_sectors, 1);
-    }
+    assert(!bs->dirty_bitmap);
 
     return drv->bdrv_write_compressed(bs, sector_num, buf, nb_sectors);
 }
@@ -4063,13 +4061,54 @@ int bdrv_get_dirty(BlockDriverState *bs, int64_t sector)
 
     if (bs->dirty_bitmap &&
         (sector << BDRV_SECTOR_BITS) < bdrv_getlength(bs)) {
-        return !!(bs->dirty_bitmap[chunk / (sizeof(unsigned long) * 8)] &
-            (1UL << (chunk % (sizeof(unsigned long) * 8))));
+        return !!(bs->dirty_bitmap[chunk / BITS_PER_LONG] &
+            (1UL << (chunk % BITS_PER_LONG)));
     } else {
         return 0;
     }
 }
 
+int64_t bdrv_get_next_dirty(BlockDriverState *bs, int64_t sector)
+{
+    int64_t chunk;
+    int bit, elem;
+
+    /* Avoid an infinite loop.  */
+    assert(bs->dirty_count > 0);
+
+    sector = (sector | (BDRV_SECTORS_PER_DIRTY_CHUNK - 1)) + 1;
+    chunk = sector / (int64_t)BDRV_SECTORS_PER_DIRTY_CHUNK;
+
+    QEMU_BUILD_BUG_ON(sizeof(bs->dirty_bitmap[0]) * 8 != BITS_PER_LONG);
+    elem = chunk / BITS_PER_LONG;
+    bit = chunk % BITS_PER_LONG;
+    for (;;) {
+        if (sector >= bs->total_sectors) {
+            sector = 0;
+            bit = elem = 0;
+        }
+        if (bit == 0 && bs->dirty_bitmap[elem] == 0) {
+            sector += BDRV_SECTORS_PER_DIRTY_CHUNK * BITS_PER_LONG;
+            elem++;
+        } else {
+            if (bs->dirty_bitmap[elem] & (1UL << bit)) {
+                return sector;
+            }
+            sector += BDRV_SECTORS_PER_DIRTY_CHUNK;
+            if (++bit == BITS_PER_LONG) {
+                bit = 0;
+                elem++;
+            }
+        }
+    }
+}
+
+void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
+                    int nr_sectors)
+{
+    set_dirty_bitmap(bs, cur_sector, nr_sectors, 1);
+}
+
 void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector,
                       int nr_sectors)
 {
diff --git a/block.h b/block.h
index 08479e1..9097edd 100644
--- a/block.h
+++ b/block.h
@@ -348,8 +348,9 @@ void *qemu_blockalign(BlockDriverState *bs, size_t size);
 
 void bdrv_set_dirty_tracking(BlockDriverState *bs, int enable);
 int bdrv_get_dirty(BlockDriverState *bs, int64_t sector);
-void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector,
-                      int nr_sectors);
+void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector, int nr_sectors);
+void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector, int nr_sectors);
+int64_t bdrv_get_next_dirty(BlockDriverState *bs, int64_t sector);
 int64_t bdrv_get_dirty_count(BlockDriverState *bs);
 
 void bdrv_enable_copy_on_read(BlockDriverState *bs);
-- 
1.7.12
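
The scan in bdrv_get_next_dirty can be illustrated on a plain bitmap.  The
sketch below works in bit indices rather than sectors and is a simplified
model, not the patch's code: next_set_bit_wrap and WORD_BITS are
hypothetical names.  It keeps the same three ideas: start strictly after
the current position, wrap to the beginning at the end of the map, and
skip an all-clear word in one step.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/*
 * Return the index of the first set bit strictly after `start`, wrapping
 * around to bit 0 at the end of the bitmap.  The caller must guarantee
 * that at least one bit in [0, nbits) is set, otherwise the loop never
 * terminates (the patch asserts dirty_count > 0 for the same reason).
 */
static size_t next_set_bit_wrap(const unsigned long *map, size_t nbits,
                                size_t start)
{
    size_t bit = start + 1;

    for (;;) {
        if (bit >= nbits) {
            bit = 0;                    /* wrap around */
        }
        if (bit % WORD_BITS == 0 && map[bit / WORD_BITS] == 0) {
            bit += WORD_BITS;           /* skip an all-clear word */
        } else if (map[bit / WORD_BITS] & (1UL << (bit % WORD_BITS))) {
            return bit;
        } else {
            bit++;
        }
    }
}
```

If the only set bit is the one at `start`, the search goes all the way
around and returns `start` again.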


* [Qemu-devel] [PATCH v2 23/45] block: export dirty bitmap information in query-block
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (21 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 22/45] block: introduce new dirty bitmap functionality Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-10-15 16:08   ` Kevin Wolf
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 24/45] block: add block-job-complete Paolo Bonzini
                   ` (22 subsequent siblings)
  45 siblings, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: new

 block.c          |  6 ++++++
 qapi-schema.json | 20 ++++++++++++++++++--
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 2c1273c..074325d 100644
--- a/block.c
+++ b/block.c
@@ -2697,6 +2697,12 @@ BlockInfo *bdrv_query_info(BlockDriverState *bs)
         info->io_status = bs->iostatus;
     }
 
+    if (bs->dirty_bitmap) {
+        info->has_dirty = true;
+        info->dirty = g_malloc0(sizeof(*info->dirty));
+        info->dirty->count = bdrv_get_dirty_count(bs) * BDRV_SECTORS_PER_DIRTY_CHUNK;
+    }
+
     if (bs->drv) {
         info->has_inserted = true;
         info->inserted = g_malloc0(sizeof(*info->inserted));
diff --git a/qapi-schema.json b/qapi-schema.json
index 26ac21f..dd418b8 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -604,7 +604,7 @@
             '*backing_file': 'str', 'backing_file_depth': 'int',
             'encrypted': 'bool', 'encryption_key_missing': 'bool',
             'bps': 'int', 'bps_rd': 'int', 'bps_wr': 'int',
-            'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int'} }
+            'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int' } }
 
 ##
 # @BlockDeviceIoStatus:
@@ -622,6 +622,18 @@
 { 'enum': 'BlockDeviceIoStatus', 'data': [ 'ok', 'failed', 'nospace' ] }
 
 ##
+# @BlockDirtyInfo:
+#
+# Block dirty bitmap information.
+#
+# @count: number of dirty sectors according to the dirty bitmap
+#
+# Since: 1.3
+##
+{ 'type': 'BlockDirtyInfo',
+  'data': {'count': 'int'} }
+
+##
 # @BlockInfo:
 #
 # Block device information.  This structure describes a virtual device and
@@ -640,6 +652,9 @@
 # @tray_open: #optional True if the device has a tray and it is open
 #             (only present if removable is true)
 #
+# @dirty: #optional dirty bitmap information (only present if the dirty
+#         bitmap is enabled)
+#
 # @io-status: #optional @BlockDeviceIoStatus. Only present if the device
 #             supports it and the VM is configured to stop on errors
 #
@@ -651,7 +666,8 @@
 { 'type': 'BlockInfo',
   'data': {'device': 'str', 'type': 'str', 'removable': 'bool',
            'locked': 'bool', '*inserted': 'BlockDeviceInfo',
-           '*tray_open': 'bool', '*io-status': 'BlockDeviceIoStatus'} }
+           '*tray_open': 'bool', '*io-status': 'BlockDeviceIoStatus',
+           '*dirty': 'BlockDirtyInfo' } }
 
 ##
 # @query-block:
-- 
1.7.12
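
The count reported by query-block is in sectors, obtained by scaling the
number of dirty chunks.  A small worked example of that conversion
follows.  The constants are assumptions for illustration: 512-byte
sectors, and 2048 sectors per chunk to match the 1M default granularity
mentioned in the cover letter.  dirty_sectors and dirty_bytes are
hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_BITS             9      /* assumption: 512-byte sectors */
#define SECTORS_PER_DIRTY_CHUNK 2048   /* assumption: 1 MiB per chunk */

/* Scale a dirty-chunk count to sectors, as the 'count' field does. */
static int64_t dirty_sectors(int64_t dirty_chunks)
{
    return dirty_chunks * SECTORS_PER_DIRTY_CHUNK;
}

/* Convert the same count to bytes, e.g. for a progress display. */
static int64_t dirty_bytes(int64_t dirty_chunks)
{
    return dirty_sectors(dirty_chunks) << SECTOR_BITS;
}
```

Under these assumptions, three dirty chunks are reported as a count of
6144 sectors, i.e. 3 MiB still to be copied.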


* [Qemu-devel] [PATCH v2 24/45] block: add block-job-complete
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (22 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 23/45] block: export dirty bitmap information in query-block Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 25/45] block: introduce BLOCK_JOB_READY event Paolo Bonzini
                   ` (21 subsequent siblings)
  45 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

While streaming can be dropped as soon as it has progressed through the
whole image, mirroring needs to be completed manually for two reasons:
1) so that management knows exactly when the VM switches to the target;
2) because for other use cases, such as replication, we may leave the
operation running for the whole life of the virtual machine.

Add a new block job command that manually completes background operations.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: documentation fix, block_job_complete is asynchronous

 blockdev.c       | 13 +++++++++++++
 blockjob.c       | 10 ++++++++++
 blockjob.h       | 15 +++++++++++++++
 hmp-commands.hx  | 17 ++++++++++++++++-
 hmp.c            | 10 ++++++++++
 hmp.h            |  1 +
 qapi-schema.json | 25 +++++++++++++++++++++++++
 qerror.h         |  3 +++
 qmp-commands.hx  |  5 +++++
 trace-events     |  1 +
 10 files changed, 99 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index 425759e..24a839a 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1203,6 +1203,19 @@ void qmp_block_job_resume(const char *device, Error **errp)
     block_job_resume(job);
 }
 
+void qmp_block_job_complete(const char *device, Error **errp)
+{
+    BlockJob *job = find_block_job(device);
+
+    if (!job) {
+        error_set(errp, QERR_BLOCK_JOB_NOT_ACTIVE, device);
+        return;
+    }
+
+    trace_qmp_block_job_complete(job);
+    block_job_complete(job, errp);
+}
+
 static void do_qmp_query_block_jobs_one(void *opaque, BlockDriverState *bs)
 {
     BlockJobInfoList **prev = opaque;
diff --git a/blockjob.c b/blockjob.c
index 5dd9c1e..cdaae99 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -99,6 +99,16 @@ void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
     job->speed = speed;
 }
 
+void block_job_complete(BlockJob *job, Error **errp)
+{
+    if (job->paused || job->cancelled || !job->job_type->complete) {
+        error_set(errp, QERR_BLOCK_JOB_NOT_READY, job->bs->device_name);
+        return;
+    }
+
+    job->job_type->complete(job, errp);
+}
+
 void block_job_pause(BlockJob *job)
 {
     job->paused = true;
diff --git a/blockjob.h b/blockjob.h
index 2070a1b..37e0805 100644
--- a/blockjob.h
+++ b/blockjob.h
@@ -41,6 +41,12 @@ typedef struct BlockJobType {
 
     /** Optional callback for job types that support setting a speed limit */
     void (*set_speed)(BlockJob *job, int64_t speed, Error **errp);
+
+    /**
+     * Optional callback for job types whose completion must be triggered
+     * manually.
+     */
+    void (*complete)(BlockJob *job, Error **errp);
 } BlockJobType;
 
 /**
@@ -164,6 +170,15 @@ void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp);
 void block_job_cancel(BlockJob *job);
 
 /**
+ * block_job_complete:
+ * @job: The job to be completed.
+ * @errp: Error object.
+ *
+ * Asynchronously complete the specified job.
+ */
+void block_job_complete(BlockJob *job, Error **errp);
+
+/**
  * block_job_is_cancelled:
  * @job: The job being queried.
  *
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 27d90a2..4e52436 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -109,7 +109,22 @@ ETEXI
 STEXI
 @item block_job_cancel
 @findex block_job_cancel
-Stop an active block streaming operation.
+Stop an active background block operation (streaming, mirroring).
+ETEXI
+
+    {
+        .name       = "block_job_complete",
+        .args_type  = "device:B",
+        .params     = "device",
+        .help       = "stop an active background block operation",
+        .mhandler.cmd = hmp_block_job_complete,
+    },
+
+STEXI
+@item block_job_complete
+@findex block_job_complete
+Manually trigger completion of an active background block operation.
+For mirroring, this will switch the device to the destination path.
 ETEXI
 
     {
diff --git a/hmp.c b/hmp.c
index df789b2..7819110 100644
--- a/hmp.c
+++ b/hmp.c
@@ -978,6 +978,16 @@ void hmp_block_job_resume(Monitor *mon, const QDict *qdict)
     hmp_handle_error(mon, &error);
 }
 
+void hmp_block_job_complete(Monitor *mon, const QDict *qdict)
+{
+    Error *error = NULL;
+    const char *device = qdict_get_str(qdict, "device");
+
+    qmp_block_job_complete(device, &error);
+
+    hmp_handle_error(mon, &error);
+}
+
 typedef struct MigrationStatus
 {
     QEMUTimer *timer;
diff --git a/hmp.h b/hmp.h
index 71ea384..7bdd23c 100644
--- a/hmp.h
+++ b/hmp.h
@@ -66,6 +66,7 @@ void hmp_block_job_set_speed(Monitor *mon, const QDict *qdict);
 void hmp_block_job_cancel(Monitor *mon, const QDict *qdict);
 void hmp_block_job_pause(Monitor *mon, const QDict *qdict);
 void hmp_block_job_resume(Monitor *mon, const QDict *qdict);
+void hmp_block_job_complete(Monitor *mon, const QDict *qdict);
 void hmp_migrate(Monitor *mon, const QDict *qdict);
 void hmp_device_del(Monitor *mon, const QDict *qdict);
 void hmp_dump_guest_memory(Monitor *mon, const QDict *qdict);
diff --git a/qapi-schema.json b/qapi-schema.json
index dd418b8..1d78cb3 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1955,6 +1955,31 @@
 { 'command': 'block-job-resume', 'data': { 'device': 'str' } }
 
 ##
+# @block-job-complete:
+#
+# Manually trigger completion of an active background block operation.  This
+# is supported for drive mirroring, where it also switches the device to
+# write to the target path only.
+#
+# This command completes an active background block operation synchronously.
+# The ordering of this command's return with the BLOCK_JOB_COMPLETED event
+# is not defined.  Note that if an I/O error occurs during the processing of
+# this command: 1) the command itself will fail; 2) the error will be processed
+# according to the rerror/werror arguments that were specified when starting
+# the operation.
+#
+# A cancelled or paused job cannot be completed.
+#
+# @device: the device name
+#
+# Returns: Nothing on success
+#          If no background operation is active on this device, DeviceNotActive
+#
+# Since: 1.3
+##
+{ 'command': 'block-job-complete', 'data': { 'device': 'str' } }
+
+##
 # @ObjectTypeInfo:
 #
 # This structure describes a search result from @qom-list-types
diff --git a/qerror.h b/qerror.h
index c91708c..dacac52 100644
--- a/qerror.h
+++ b/qerror.h
@@ -54,6 +54,9 @@ void assert_no_error(Error *err);
 #define QERR_BLOCK_JOB_PAUSED \
     ERROR_CLASS_GENERIC_ERROR, "The block job for device '%s' is currently paused"
 
+#define QERR_BLOCK_JOB_NOT_READY \
+    ERROR_CLASS_GENERIC_ERROR, "The active block job for device '%s' cannot be completed"
+
 #define QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED \
     ERROR_CLASS_GENERIC_ERROR, "Block format '%s' used by device '%s' does not support feature '%s'"
 
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 07fa8fe..017544e 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -813,6 +813,11 @@ EQMP
         .mhandler.cmd_new = qmp_marshal_input_block_job_resume,
     },
     {
+        .name       = "block-job-complete",
+        .args_type  = "device:B",
+        .mhandler.cmd_new = qmp_marshal_input_block_job_complete,
+    },
+    {
         .name       = "transaction",
         .args_type  = "actions:q",
         .mhandler.cmd_new = qmp_marshal_input_transaction,
diff --git a/trace-events b/trace-events
index 75f7357..833bb44 100644
--- a/trace-events
+++ b/trace-events
@@ -79,6 +79,7 @@ stream_start(void *bs, void *base, void *s, void *co, void *opaque) "bs %p base
 qmp_block_job_cancel(void *job) "job %p"
 qmp_block_job_pause(void *job) "job %p"
 qmp_block_job_resume(void *job) "job %p"
+qmp_block_job_complete(void *job) "job %p"
 block_job_cb(void *bs, void *job, int ret) "bs %p job %p ret %d"
 qmp_block_stream(void *bs, void *job) "bs %p job %p"
 
-- 
1.7.12


* [Qemu-devel] [PATCH v2 25/45] block: introduce BLOCK_JOB_READY event
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (23 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 24/45] block: add block-job-complete Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-27  0:01   ` Eric Blake
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 26/45] mirror: introduce mirror job Paolo Bonzini
                   ` (20 subsequent siblings)
  45 siblings, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

Even for jobs that need to be manually completed, management may want
to take care of the completion itself, without requiring the user to issue
a command to terminate the job.  In this case we want to avoid having
management poll us continuously, waiting for completion to become available.
Thus, add a new event that signals the phase switch and the availability
of the block-job-complete command.
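
As a rough illustration of the event-driven flow described above, here is a
minimal sketch of how a management client might react to block job events
instead of polling.  The ClientState and ClientAction types and the
client_handle_event() function are hypothetical names invented for this
sketch; they are not part of QEMU.  Only the event names come from this
series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical actions a management client could take after an event. */
typedef enum {
    ACTION_NONE,          /* nothing to do, keep listening */
    ACTION_SEND_COMPLETE, /* issue block-job-complete for the device */
    ACTION_DONE,          /* the job is over, stop tracking it */
} ClientAction;

/* Hypothetical per-job state kept by the client. */
typedef struct {
    bool ready; /* true after BLOCK_JOB_READY, until an error resets it */
} ClientState;

/* Decide what to do for one event name received on the QMP socket. */
ClientAction client_handle_event(ClientState *st, const char *event)
{
    assert(st != NULL && event != NULL);

    if (strcmp(event, "BLOCK_JOB_READY") == 0) {
        st->ready = true;
        return ACTION_SEND_COMPLETE;
    }
    if (strcmp(event, "BLOCK_JOB_ERROR") == 0) {
        /* An error resets the "ready to complete" status. */
        st->ready = false;
        return ACTION_NONE;
    }
    if (strcmp(event, "BLOCK_JOB_COMPLETED") == 0 ||
        strcmp(event, "BLOCK_JOB_CANCELLED") == 0) {
        st->ready = false;
        return ACTION_DONE;
    }
    return ACTION_NONE; /* unrelated event */
}
```

The client never asks QEMU whether completion is available; it only waits
for BLOCK_JOB_READY and then sends the command.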

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 QMP/qmp-events.txt | 20 ++++++++++++++++++++
 blockdev.c         | 14 --------------
 blockjob.c         | 21 +++++++++++++++++++++
 blockjob.h         | 16 ++++++++++++++++
 monitor.c          |  1 +
 monitor.h          |  1 +
 qapi-schema.json   |  3 ++-
 7 files changed, 61 insertions(+), 15 deletions(-)

diff --git a/QMP/qmp-events.txt b/QMP/qmp-events.txt
index ae19db2..ceb36e3 100644
--- a/QMP/qmp-events.txt
+++ b/QMP/qmp-events.txt
@@ -116,6 +116,26 @@ Example:
               "action": "stop" },
     "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
 
+BLOCK_JOB_READY
+---------------
+
+Emitted when a block job is ready to complete.
+
+Data:
+
+- "device": device name (json-string)
+
+Example:
+
+{ "event": "BLOCK_JOB_READY",
+    "data": { "device": "ide0-hd1",
+              "operation": "write",
+              "action": "stop" },
+    "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
+
+Note: The "ready to complete" status is always reset by a BLOCK_JOB_ERROR
+event.
+
 DEVICE_TRAY_MOVED
 -----------------
 
diff --git a/blockdev.c b/blockdev.c
index 24a839a..9069ca1 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1052,20 +1052,6 @@ void qmp_block_resize(const char *device, int64_t size, Error **errp)
     }
 }
 
-static QObject *qobject_from_block_job(BlockJob *job)
-{
-    return qobject_from_jsonf("{ 'type': %s,"
-                              "'device': %s,"
-                              "'len': %" PRId64 ","
-                              "'offset': %" PRId64 ","
-                              "'speed': %" PRId64 " }",
-                              job->job_type->job_type,
-                              bdrv_get_device_name(job->bs),
-                              job->len,
-                              job->offset,
-                              job->speed);
-}
-
 static void block_job_cb(void *opaque, int ret)
 {
     BlockDriverState *bs = opaque;
diff --git a/blockjob.c b/blockjob.c
index cdaae99..8d2687c 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -224,6 +224,27 @@ static void block_job_iostatus_set_err(BlockJob *job, int error)
 }
 
 
+QObject *qobject_from_block_job(BlockJob *job)
+{
+    return qobject_from_jsonf("{ 'type': %s,"
+                              "'device': %s,"
+                              "'len': %" PRId64 ","
+                              "'offset': %" PRId64 ","
+                              "'speed': %" PRId64 " }",
+                              job->job_type->job_type,
+                              bdrv_get_device_name(job->bs),
+                              job->len,
+                              job->offset,
+                              job->speed);
+}
+
+void block_job_ready(BlockJob *job)
+{
+    QObject *data = qobject_from_block_job(job);
+    monitor_protocol_event(QEVENT_BLOCK_JOB_READY, data);
+    qobject_decref(data);
+}
+
 BlockErrorAction block_job_error_action(BlockJob *job, BlockDriverState *bs,
                                         BlockdevOnError on_err,
                                         int is_read, int error)
diff --git a/blockjob.h b/blockjob.h
index 37e0805..be37cc1 100644
--- a/blockjob.h
+++ b/blockjob.h
@@ -211,6 +211,22 @@ void block_job_pause(BlockJob *job);
 void block_job_resume(BlockJob *job);
 
 /**
+ * qobject_from_block_job:
+ * @job: The job whose information is requested.
+ *
+ * Return a QDict corresponding to @job's query-block-jobs entry.
+ */
+QObject *qobject_from_block_job(BlockJob *job);
+
+/**
+ * block_job_ready:
+ * @job: The job which is now ready to complete.
+ *
+ * Send a BLOCK_JOB_READY event for the specified job.
+ */
+void block_job_ready(BlockJob *job);
+
+/**
  * block_job_is_paused:
  * @job: The job being queried.
  *
diff --git a/monitor.c b/monitor.c
index d4bd5fe..c4ac395 100644
--- a/monitor.c
+++ b/monitor.c
@@ -451,6 +451,7 @@ static const char *monitor_event_names[] = {
     [QEVENT_BLOCK_JOB_COMPLETED] = "BLOCK_JOB_COMPLETED",
     [QEVENT_BLOCK_JOB_CANCELLED] = "BLOCK_JOB_CANCELLED",
     [QEVENT_BLOCK_JOB_ERROR] = "BLOCK_JOB_ERROR",
+    [QEVENT_BLOCK_JOB_READY] = "BLOCK_JOB_READY",
     [QEVENT_DEVICE_TRAY_MOVED] = "DEVICE_TRAY_MOVED",
     [QEVENT_SUSPEND] = "SUSPEND",
     [QEVENT_SUSPEND_DISK] = "SUSPEND_DISK",
diff --git a/monitor.h b/monitor.h
index 43040af..351f58a 100644
--- a/monitor.h
+++ b/monitor.h
@@ -39,6 +39,7 @@ typedef enum MonitorEvent {
     QEVENT_BLOCK_JOB_COMPLETED,
     QEVENT_BLOCK_JOB_CANCELLED,
     QEVENT_BLOCK_JOB_ERROR,
+    QEVENT_BLOCK_JOB_READY,
     QEVENT_DEVICE_TRAY_MOVED,
     QEVENT_SUSPEND,
     QEVENT_SUSPEND_DISK,
diff --git a/qapi-schema.json b/qapi-schema.json
index 1d78cb3..b466b7a 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1959,7 +1959,8 @@
 #
 # Manually trigger completion of an active background block operation.  This
 # is supported for drive mirroring, where it also switches the device to
-# write to the target path only.
+# write to the target path only.  The ability to complete is signaled with
+# a BLOCK_JOB_READY event.
 #
 # This command completes an active background block operation synchronously.
 # The ordering of this command's return with the BLOCK_JOB_COMPLETED event
-- 
1.7.12


* [Qemu-devel] [PATCH v2 26/45] mirror: introduce mirror job
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (24 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 25/45] block: introduce BLOCK_JOB_READY event Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-10-15 16:57   ` Kevin Wolf
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 27/45] qmp: add drive-mirror command Paolo Bonzini
                   ` (19 subsequent siblings)
  45 siblings, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

This patch adds the implementation of a new job that mirrors a disk to
a new image while letting the guest continue using the old image.
The target is treated as a "black box" and data is copied from the
source to the target in the background.  This can be used for several
purposes, including storage migration, continuous replication, and
observation of the guest I/O in an external program.  It is also a
first step in replacing the inefficient block migration code that is
part of QEMU.

The job is possibly never-ending, but it is logically structured into
two phases: 1) copy all data as fast as possible until the target
first gets in sync with the source; 2) keep the target in sync and
ensure that reopening to the target gets a correct (full) copy
of the source data.
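
The two phases can be pictured with a toy model of a dirty bitmap.  This
is only a sketch under simplifying assumptions: the ToyMirror type, the
toy_* functions and the tiny CHUNK/NCHUNKS sizes are made up for the
illustration, everything lives in memory, and there is no rate limiting,
flushing or error handling as in the real job.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { CHUNK = 4, NCHUNKS = 8 }; /* toy sizes, not QEMU's */

typedef struct {
    unsigned char source[CHUNK * NCHUNKS];
    unsigned char target[CHUNK * NCHUNKS];
    bool dirty[NCHUNKS]; /* one flag per chunk of the source */
    bool synced;         /* set when the target first catches up */
} ToyMirror;

/* Mark every chunk dirty, as a full initial synchronization would. */
void toy_mark_all_dirty(ToyMirror *m)
{
    for (size_t i = 0; i < NCHUNKS; i++) {
        m->dirty[i] = true;
    }
}

/* A guest write changes the source and dirties the chunk it touches. */
void toy_guest_write(ToyMirror *m, size_t offset, unsigned char byte)
{
    assert(offset < sizeof(m->source));
    m->source[offset] = byte;
    m->dirty[offset / CHUNK] = true;
}

size_t toy_dirty_count(const ToyMirror *m)
{
    size_t n = 0;
    for (size_t i = 0; i < NCHUNKS; i++) {
        n += m->dirty[i];
    }
    return n;
}

/* One loop iteration: copy at most one dirty chunk, then report how many
 * chunks are still dirty.  Reaching zero for the first time ends phase 1.
 */
size_t toy_mirror_step(ToyMirror *m)
{
    for (size_t i = 0; i < NCHUNKS; i++) {
        if (m->dirty[i]) {
            m->dirty[i] = false; /* clear before copying, like the job */
            memcpy(&m->target[i * CHUNK], &m->source[i * CHUNK], CHUNK);
            break;
        }
    }
    size_t cnt = toy_dirty_count(m);
    if (cnt == 0) {
        m->synced = true;
    }
    return cnt;
}
```

Once synced is set, later guest writes only dirty individual chunks, and
each further step brings the target back in sync; that is the second phase.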

The second phase is indicated by the progress in "info block-jobs"
reporting the current offset to be equal to the length of the file.
When the job is cancelled in the second phase, QEMU will run the
job until the source is clean and quiescent, then it will report
successful completion of the job.

In other words, the BLOCK_JOB_CANCELLED event means that the target
may _not_ be consistent with a past state of the source; the
BLOCK_JOB_COMPLETED event means that the target is consistent with
a past state of the source.  (Note that it was already possible for
management to lose the race against QEMU and get a completion
event instead of a cancellation.)
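
The rule above can be written down as a small decision function.  This is
a sketch only: toy_cancel_event() and toy_target_is_consistent() are
hypothetical helpers, and the has_error flag assumes that the client
separately tracks whether the completion event reported a failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Which event a cancel request ends with, depending on whether the job
 * had already reached the second (synced) phase when it was cancelled.
 */
const char *toy_cancel_event(bool synced)
{
    return synced ? "BLOCK_JOB_COMPLETED" : "BLOCK_JOB_CANCELLED";
}

/* Whether management may treat the target as a consistent copy of a
 * past state of the source after receiving the given final event.
 */
bool toy_target_is_consistent(const char *event, bool has_error)
{
    assert(event != NULL);
    return !has_error && strcmp(event, "BLOCK_JOB_COMPLETED") == 0;
}
```

Management should therefore decide based on the event it actually
receives, not on the command it sent.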

It is not yet possible to complete the job and switch over to the target
disk.  The next patches will fix this and add many refinements to the
basic idea introduced here.  These include improved error management,
some tunable knobs and performance optimizations.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: Always "goto immediate_exit" and similar code cleanups.
        Error checking for bdrv_flush.  Call bdrv_set_enable_write_cache
        on the target to make it always writeback.

 block/Makefile.objs |   2 +-
 block/mirror.c      | 234 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 block_int.h         |  20 +++++
 qapi-schema.json    |  17 ++++
 trace-events        |   7 ++
 5 files changed, 279 insertions(+), 1 deletion(-)
 create mode 100644 block/mirror.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index c45affc..f1a394a 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -9,4 +9,4 @@ block-obj-$(CONFIG_LIBISCSI) += iscsi.o
 block-obj-$(CONFIG_CURL) += curl.o
 block-obj-$(CONFIG_RBD) += rbd.o
 
-common-obj-y += stream.o
+common-obj-y += stream.o mirror.o
diff --git a/block/mirror.c b/block/mirror.c
new file mode 100644
index 0000000..09ea020
--- /dev/null
+++ b/block/mirror.c
@@ -0,0 +1,234 @@
+/*
+ * Image mirroring
+ *
+ * Copyright Red Hat, Inc. 2012
+ *
+ * Authors:
+ *  Paolo Bonzini  <pbonzini@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "trace.h"
+#include "blockjob.h"
+#include "block_int.h"
+#include "qemu/ratelimit.h"
+
+enum {
+    /*
+     * Size of data buffer for populating the image file.  This should be large
+     * enough to process multiple clusters in a single call, so that populating
+     * contiguous regions of the image is efficient.
+     */
+    BLOCK_SIZE = 512 * BDRV_SECTORS_PER_DIRTY_CHUNK, /* in bytes */
+};
+
+#define SLICE_TIME 100000000ULL /* ns */
+
+typedef struct MirrorBlockJob {
+    BlockJob common;
+    RateLimit limit;
+    BlockDriverState *target;
+    MirrorSyncMode mode;
+    int64_t sector_num;
+    uint8_t *buf;
+} MirrorBlockJob;
+
+static int coroutine_fn mirror_iteration(MirrorBlockJob *s)
+{
+    BlockDriverState *source = s->common.bs;
+    BlockDriverState *target = s->target;
+    QEMUIOVector qiov;
+    int ret, nb_sectors;
+    int64_t end;
+    struct iovec iov;
+
+    end = s->common.len >> BDRV_SECTOR_BITS;
+    s->sector_num = bdrv_get_next_dirty(source, s->sector_num);
+    nb_sectors = MIN(BDRV_SECTORS_PER_DIRTY_CHUNK, end - s->sector_num);
+    bdrv_reset_dirty(source, s->sector_num, nb_sectors);
+
+    /* Copy the dirty cluster.  */
+    iov.iov_base = s->buf;
+    iov.iov_len  = nb_sectors * 512;
+    qemu_iovec_init_external(&qiov, &iov, 1);
+
+    trace_mirror_one_iteration(s, s->sector_num, nb_sectors);
+    ret = bdrv_co_readv(source, s->sector_num, nb_sectors, &qiov);
+    if (ret < 0) {
+        return ret;
+    }
+    return bdrv_co_writev(target, s->sector_num, nb_sectors, &qiov);
+}
+
+static void coroutine_fn mirror_run(void *opaque)
+{
+    MirrorBlockJob *s = opaque;
+    BlockDriverState *bs = s->common.bs;
+    int64_t sector_num, end;
+    int ret = 0;
+    int n;
+    bool synced = false;
+
+    if (block_job_is_cancelled(&s->common)) {
+        goto immediate_exit;
+    }
+
+    s->common.len = bdrv_getlength(bs);
+    if (s->common.len < 0) {
+        block_job_completed(&s->common, s->common.len);
+        return;
+    }
+
+    end = s->common.len >> BDRV_SECTOR_BITS;
+    s->buf = qemu_blockalign(bs, BLOCK_SIZE);
+
+    if (s->mode != MIRROR_SYNC_MODE_NONE) {
+        /* First part, loop on the sectors and initialize the dirty bitmap.  */
+        BlockDriverState *base;
+        base = s->mode == MIRROR_SYNC_MODE_FULL ? NULL : bs->backing_hd;
+        for (sector_num = 0; sector_num < end; ) {
+            int64_t next = (sector_num | (BDRV_SECTORS_PER_DIRTY_CHUNK - 1)) + 1;
+            ret = bdrv_co_is_allocated_above(bs, base,
+                                             sector_num, next - sector_num, &n);
+
+            if (ret < 0) {
+                goto immediate_exit;
+            }
+
+            assert(n > 0);
+            if (ret == 1) {
+                bdrv_set_dirty(bs, sector_num, n);
+                sector_num = next;
+            } else {
+                sector_num += n;
+            }
+        }
+    }
+
+    s->sector_num = -1;
+    for (;;) {
+        uint64_t delay_ns;
+        int64_t cnt;
+        bool should_complete;
+
+        cnt = bdrv_get_dirty_count(bs);
+        if (cnt != 0) {
+            ret = mirror_iteration(s);
+            if (ret < 0) {
+                goto immediate_exit;
+            }
+            cnt = bdrv_get_dirty_count(bs);
+        }
+
+        should_complete = false;
+        if (cnt == 0) {
+            trace_mirror_before_flush(s);
+            if (bdrv_flush(s->target) < 0) {
+                goto immediate_exit;
+            }
+
+            /* We're out of the streaming phase.  From now on, if the job
+             * is cancelled we will actually complete all pending I/O and
+             * report completion.  This way, block-job-cancel will leave
+             * the target in a consistent state.
+             */
+            synced = true;
+            s->common.offset = end * BDRV_SECTOR_SIZE;
+            should_complete = block_job_is_cancelled(&s->common);
+            cnt = bdrv_get_dirty_count(bs);
+        }
+
+        if (cnt == 0 && should_complete) {
+            /* The dirty bitmap is not updated while operations are pending.
+             * If we're about to exit, wait for pending operations before
+             * calling bdrv_get_dirty_count(bs), or we may exit while the
+             * source has dirty data to copy!
+             *
+             * Note that I/O can be submitted by the guest while
+             * mirror_iteration runs.
+             */
+            trace_mirror_before_drain(s, cnt);
+            bdrv_drain_all();
+            cnt = bdrv_get_dirty_count(bs);
+        }
+
+        ret = 0;
+        trace_mirror_before_sleep(s, cnt, synced);
+        if (!synced) {
+            /* Publish progress */
+            s->common.offset = end * BDRV_SECTOR_SIZE - cnt * BLOCK_SIZE;
+
+            if (s->common.speed) {
+                delay_ns = ratelimit_calculate_delay(&s->limit, BDRV_SECTORS_PER_DIRTY_CHUNK);
+            } else {
+                delay_ns = 0;
+            }
+
+            /* Note that even when no rate limit is applied we need to yield
+             * with no pending I/O here so that qemu_aio_flush() returns.
+             */
+            block_job_sleep_ns(&s->common, rt_clock, delay_ns);
+            if (block_job_is_cancelled(&s->common)) {
+                break;
+            }
+        } else if (!should_complete) {
+            delay_ns = (cnt == 0 ? SLICE_TIME : 0);
+            block_job_sleep_ns(&s->common, rt_clock, delay_ns);
+        } else if (cnt == 0) {
+            /* The two disks are in sync.  Exit and report successful
+             * completion.
+             */
+            assert(QLIST_EMPTY(&bs->tracked_requests));
+            s->common.cancelled = false;
+            break;
+        }
+    }
+
+immediate_exit:
+    g_free(s->buf);
+    bdrv_set_dirty_tracking(bs, false);
+    bdrv_close(s->target);
+    bdrv_delete(s->target);
+    block_job_completed(&s->common, ret);
+}
+
+static void mirror_set_speed(BlockJob *job, int64_t speed, Error **errp)
+{
+    MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
+
+    if (speed < 0) {
+        error_set(errp, QERR_INVALID_PARAMETER, "speed");
+        return;
+    }
+    ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
+}
+
+static BlockJobType mirror_job_type = {
+    .instance_size = sizeof(MirrorBlockJob),
+    .job_type      = "mirror",
+    .set_speed     = mirror_set_speed,
+};
+
+void mirror_start(BlockDriverState *bs, BlockDriverState *target,
+                  int64_t speed, MirrorSyncMode mode,
+                  BlockDriverCompletionFunc *cb,
+                  void *opaque, Error **errp)
+{
+    MirrorBlockJob *s;
+
+    s = block_job_create(&mirror_job_type, bs, speed, cb, opaque, errp);
+    if (!s) {
+        return;
+    }
+
+    s->target = target;
+    s->mode = mode;
+    bdrv_set_dirty_tracking(bs, true);
+    bdrv_set_enable_write_cache(s->target, true);
+    s->common.co = qemu_coroutine_create(mirror_run);
+    trace_mirror_start(bs, s, s->common.co, opaque);
+    qemu_coroutine_enter(s->common.co, s);
+}
diff --git a/block_int.h b/block_int.h
index 057baae..62525cf 100644
--- a/block_int.h
+++ b/block_int.h
@@ -315,4 +315,24 @@ void stream_start(BlockDriverState *bs, BlockDriverState *base,
                   BlockDriverCompletionFunc *cb,
                   void *opaque, Error **errp);
 
+/**
+ * mirror_start:
+ * @bs: Block device to operate on.
+ * @target: Block device to write to.
+ * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
+ * @mode: Whether to collapse all images in the chain to the target.
+ * @cb: Completion function for the job.
+ * @opaque: Opaque pointer value passed to @cb.
+ * @errp: Error object.
+ *
+ * Start a mirroring operation on @bs.  Clusters that are allocated
+ * in @bs will be written to @target until the job is cancelled or
+ * manually completed.  At the end of a successful mirroring job,
+ * @bs will be switched to read from @target.
+ */
+void mirror_start(BlockDriverState *bs, BlockDriverState *target,
+                  int64_t speed, MirrorSyncMode mode,
+                  BlockDriverCompletionFunc *cb,
+                  void *opaque, Error **errp);
+
 #endif /* BLOCK_INT_H */
diff --git a/qapi-schema.json b/qapi-schema.json
index b466b7a..9ba2f86 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1127,6 +1127,23 @@
   'data': ['report', 'ignore', 'enospc', 'stop'] }
 
 ##
+# @MirrorSyncMode:
+#
+# An enumeration of possible behaviors for the initial synchronization
+# phase of storage mirroring.
+#
+# @top: copies data in the topmost image to the destination
+#
+# @full: copies data from all images to the destination
+#
+# @none: only copy data written from now on
+#
+# Since: 1.3
+##
+{ 'enum': 'MirrorSyncMode',
+  'data': ['top', 'full', 'none'] }
+
+##
 # @BlockJobInfo:
 #
 # Information about a long-running block device operation.
diff --git a/trace-events b/trace-events
index 833bb44..6ac4e4e 100644
--- a/trace-events
+++ b/trace-events
@@ -75,6 +75,13 @@ bdrv_co_do_copy_on_readv(void *bs, int64_t sector_num, int nb_sectors, int64_t c
 stream_one_iteration(void *s, int64_t sector_num, int nb_sectors, int is_allocated) "s %p sector_num %"PRId64" nb_sectors %d is_allocated %d"
 stream_start(void *bs, void *base, void *s, void *co, void *opaque) "bs %p base %p s %p co %p opaque %p"
 
+# block/mirror.c
+mirror_start(void *bs, void *s, void *co, void *opaque) "bs %p s %p co %p opaque %p"
+mirror_before_flush(void *s) "s %p"
+mirror_before_drain(void *s, int64_t cnt) "s %p dirty count %"PRId64
+mirror_before_sleep(void *s, int64_t cnt, int synced) "s %p dirty count %"PRId64" synced %d"
+mirror_one_iteration(void *s, int64_t sector_num, int nb_sectors) "s %p sector_num %"PRId64" nb_sectors %d"
+
 # blockdev.c
 qmp_block_job_cancel(void *job) "job %p"
 qmp_block_job_pause(void *job) "job %p"
-- 
1.7.12


* [Qemu-devel] [PATCH v2 27/45] qmp: add drive-mirror command
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (25 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 26/45] mirror: introduce mirror job Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-27  0:14   ` Eric Blake
                     ` (2 more replies)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 28/45] mirror: implement completion Paolo Bonzini
                   ` (18 subsequent siblings)
  45 siblings, 3 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

This adds the monitor commands that start the mirroring job.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 blockdev.c       | 125 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 hmp-commands.hx  |  21 ++++++++++
 hmp.c            |  28 +++++++++++++
 hmp.h            |   1 +
 qapi-schema.json |  33 +++++++++++++++
 qmp-commands.hx  |  42 +++++++++++++++++++
 6 files changed, 249 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index 9069ca1..722aab5 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1118,6 +1117,130 @@ void qmp_block_stream(const char *device, bool has_base,
     trace_qmp_block_stream(bs, bs->job);
 }
 
+void qmp_drive_mirror(const char *device, const char *target,
+                      bool has_format, const char *format,
+                      enum MirrorSyncMode sync,
+                      bool has_mode, enum NewImageMode mode,
+                      bool has_speed, int64_t speed, Error **errp)
+{
+    BlockDriverInfo bdi;
+    BlockDriverState *bs;
+    BlockDriverState *source, *target_bs;
+    BlockDriver *proto_drv;
+    BlockDriver *drv = NULL;
+    Error *local_err = NULL;
+    int flags;
+    uint64_t size;
+    int ret;
+
+    if (!has_speed) {
+        speed = 0;
+    }
+    if (!has_mode) {
+        mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
+    }
+
+    bs = bdrv_find(device);
+    if (!bs) {
+        error_set(errp, QERR_DEVICE_NOT_FOUND, device);
+        return;
+    }
+
+    if (!has_format) {
+        format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : bs->drv->format_name;
+    }
+    if (format) {
+        drv = bdrv_find_format(format);
+        if (!drv) {
+            error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
+            return;
+        }
+    }
+
+    if (!bdrv_is_inserted(bs)) {
+        error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
+        return;
+    }
+
+    if (bdrv_in_use(bs)) {
+        error_set(errp, QERR_DEVICE_IN_USE, device);
+        return;
+    }
+
+    flags = bs->open_flags | BDRV_O_RDWR;
+    source = bs->backing_hd;
+    if (!source && sync == MIRROR_SYNC_MODE_TOP) {
+        sync = MIRROR_SYNC_MODE_FULL;
+    }
+
+    proto_drv = bdrv_find_protocol(target);
+    if (!proto_drv) {
+        error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
+        return;
+    }
+
+    if (sync == MIRROR_SYNC_MODE_FULL && mode != NEW_IMAGE_MODE_EXISTING) {
+        /* create new image w/o backing file */
+        assert(format && drv);
+        bdrv_get_geometry(bs, &size);
+        size *= 512;
+        ret = bdrv_img_create(target, format,
+                              NULL, NULL, NULL, size, flags);
+    } else {
+        switch (mode) {
+        case NEW_IMAGE_MODE_EXISTING:
+            ret = 0;
+            break;
+        case NEW_IMAGE_MODE_ABSOLUTE_PATHS:
+            /* create new image with backing file */
+            ret = bdrv_img_create(target, format,
+                                  source->filename,
+                                  source->drv->format_name,
+                                  NULL, -1, flags);
+            break;
+        default:
+            abort();
+        }
+    }
+
+    if (ret) {
+        error_set(errp, QERR_OPEN_FILE_FAILED, target);
+        return;
+    }
+
+    target_bs = bdrv_new("");
+    ret = bdrv_open(target_bs, target, flags | BDRV_O_NO_BACKING, drv);
+
+    if (ret < 0) {
+        bdrv_delete(target_bs);
+        error_set(errp, QERR_OPEN_FILE_FAILED, target);
+        return;
+    }
+
+    /* We need a backing file if we will copy parts of a cluster.  */
+    if (bdrv_get_info(target_bs, &bdi) >= 0 && bdi.cluster_size != 0 &&
+        bdi.cluster_size >= BDRV_SECTORS_PER_DIRTY_CHUNK * 512) {
+        ret = bdrv_open_backing_file(target_bs);
+        if (ret < 0) {
+            bdrv_delete(target_bs);
+            error_set(errp, QERR_OPEN_FILE_FAILED, target);
+            return;
+        }
+    }
+
+    mirror_start(bs, target_bs, speed, sync, block_job_cb, bs, &local_err);
+    if (local_err != NULL) {
+        bdrv_delete(target_bs);
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    /* Grab a reference so hotplug does not delete the BlockDriverState from
+     * underneath us.
+     */
+    drive_get_ref(drive_get_by_blockdev(bs));
+}
+
 static BlockJob *find_block_job(const char *device)
 {
     BlockDriverState *bs;
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 4e52436..9ac4cf6 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1006,6 +1006,27 @@ Snapshot device, using snapshot file as target if provided
 ETEXI
 
     {
+        .name       = "drive_mirror",
+        .args_type  = "reuse:-n,full:-f,device:B,target:s,format:s?",
+        .params     = "[-n] [-f] device target [format]",
+        .help       = "initiates live storage\n\t\t\t"
+                      "migration for a device. The device's contents are\n\t\t\t"
+                      "copied to the new image file, including data that\n\t\t\t"
+                      "is written after the command is started.\n\t\t\t"
+                      "The -n flag requests QEMU to reuse the image found\n\t\t\t"
+                      "in new-image-file, instead of recreating it from scratch.\n\t\t\t"
+                      "The -f flag requests QEMU to copy the whole disk,\n\t\t\t"
+                      "so that the result does not need a backing file.\n\t\t\t",
+        .mhandler.cmd = hmp_drive_mirror,
+    },
+STEXI
+@item drive_mirror
+@findex drive_mirror
+Start mirroring a block device's writes to a new destination,
+using the specified target.
+ETEXI
+
+    {
         .name       = "drive_add",
         .args_type  = "pci_addr:s,opts:s",
         .params     = "[[<domain>:]<bus>:]<slot>\n"
diff --git a/hmp.c b/hmp.c
index 7819110..94d4d41 100644
--- a/hmp.c
+++ b/hmp.c
@@ -759,6 +759,34 @@ void hmp_block_resize(Monitor *mon, const QDict *qdict)
     hmp_handle_error(mon, &errp);
 }
 
+void hmp_drive_mirror(Monitor *mon, const QDict *qdict)
+{
+    const char *device = qdict_get_str(qdict, "device");
+    const char *filename = qdict_get_str(qdict, "target");
+    const char *format = qdict_get_try_str(qdict, "format");
+    int reuse = qdict_get_try_bool(qdict, "reuse", 0);
+    int full = qdict_get_try_bool(qdict, "full", 0);
+    enum NewImageMode mode;
+    Error *errp = NULL;
+
+    if (!filename) {
+        error_set(&errp, QERR_MISSING_PARAMETER, "target");
+        hmp_handle_error(mon, &errp);
+        return;
+    }
+
+    if (reuse) {
+        mode = NEW_IMAGE_MODE_EXISTING;
+    } else {
+        mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
+    }
+
+    qmp_drive_mirror(device, filename, !!format, format,
+                     full ? MIRROR_SYNC_MODE_FULL : MIRROR_SYNC_MODE_TOP,
+                     true, mode, false, 0, &errp);
+    hmp_handle_error(mon, &errp);
+}
+
 void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict)
 {
     const char *device = qdict_get_str(qdict, "device");
diff --git a/hmp.h b/hmp.h
index 7bdd23c..34eb2b3 100644
--- a/hmp.h
+++ b/hmp.h
@@ -51,6 +51,7 @@ void hmp_block_passwd(Monitor *mon, const QDict *qdict);
 void hmp_balloon(Monitor *mon, const QDict *qdict);
 void hmp_block_resize(Monitor *mon, const QDict *qdict);
 void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict);
+void hmp_drive_mirror(Monitor *mon, const QDict *qdict);
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict);
diff --git a/qapi-schema.json b/qapi-schema.json
index 9ba2f86..4827ed3 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1529,6 +1529,39 @@
   'returns': 'str' }
 
 ##
+# @drive-mirror
+#
+# Start mirroring a block device's writes to a new destination.
+#
+# @device:  the name of the device whose writes should be mirrored.
+#
+# @target: the target of the new image. If the file exists, or if it
+#          is a device, the existing file/device will be used as the new
+#          destination.  If it does not exist, a new file will be created.
+#
+# @format: #optional the format of the new destination, default is to
+#          probe if @mode is 'existing', else the format of the source
+#
+# @mode: #optional whether and how QEMU should create a new image, default is
+#        'absolute-paths'.
+#
+# @speed:  #optional the maximum speed, in bytes per second
+#
+# @sync: what parts of the disk image should be copied to the destination
+#        (all the disk, only the sectors allocated in the topmost image, or
+#        only new I/O).
+#
+# Returns: nothing on success
+#          If @device is not a valid block device, DeviceNotFound
+#
+# Since: 1.3
+##
+{ 'command': 'drive-mirror',
+  'data': { 'device': 'str', 'target': 'str', '*format': 'str',
+            'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
+            '*speed': 'int' } }
+
+##
 # @migrate_cancel
 #
 # Cancel the current executing migration process.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 017544e..25800a8 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -906,6 +906,48 @@ Example:
 EQMP
 
     {
+        .name       = "drive-mirror",
+        .args_type  = "sync:s,device:B,target:s,speed:i?,mode:s?,format:s?",
+        .mhandler.cmd_new = qmp_marshal_input_drive_mirror,
+    },
+
+SQMP
+drive-mirror
+------------
+
+Start mirroring a block device's writes to a new destination. target
+specifies the target of the new image. If the file exists, or if it is
+a device, it will be used as the new destination for writes. If it does not
+exist, a new file will be created. format specifies the format of the
+mirror image, default is to probe if mode='existing', else the format
+of the source.
+
+Arguments:
+
+- "device": device name to operate on (json-string)
+- "target": name of new image file (json-string)
+- "format": format of new image (json-string, optional)
+- "mode": how an image file should be created into the target
+  file/device (NewImageMode, optional, default 'absolute-paths')
+- "speed": maximum speed of the streaming job, in bytes per second
+  (json-int, optional)
+- "sync": what parts of the disk image should be copied to the destination;
+  possibilities include "full" for all the disk, "top" for only the sectors
+  allocated in the topmost image, or "none" to only replicate new I/O
+  (MirrorSyncMode).
+
+
+Example:
+
+-> { "execute": "drive-mirror", "arguments": { "device": "ide-hd0",
+                                               "target": "/some/place/my-image",
+                                               "sync": "full",
+                                               "format": "qcow2" } }
+<- { "return": {} }
+
+EQMP
+
+    {
         .name       = "balloon",
         .args_type  = "value:M",
         .mhandler.cmd_new = qmp_marshal_input_balloon,
-- 
1.7.12
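
For readers who want to try the command documented above, here is a minimal
sketch that builds the drive-mirror QMP payload and checks the argument names
against the documentation in this patch. The helper name build_drive_mirror is
hypothetical and not part of QEMU; the list of sync modes is taken from the
"sync" description ("full", "top", "none"), which says "possibilities include",
so the real enum may accept more values.

```python
import json

# Argument names and optionality as documented for drive-mirror in this patch.
REQUIRED = ("device", "target", "sync")
OPTIONAL = ("format", "mode", "speed")
# Sync modes named in the documentation text; assumed, not exhaustive.
SYNC_MODES = ("full", "top", "none")


def build_drive_mirror(**arguments):
    """Return a QMP command dict for drive-mirror (hypothetical helper)."""
    missing = [name for name in REQUIRED if name not in arguments]
    if missing:
        raise ValueError("missing required arguments: %s" % ", ".join(missing))
    unknown = [name for name in arguments if name not in REQUIRED + OPTIONAL]
    if unknown:
        raise ValueError("unknown arguments: %s" % ", ".join(unknown))
    if arguments["sync"] not in SYNC_MODES:
        raise ValueError("unexpected sync mode: %r" % arguments["sync"])
    return {"execute": "drive-mirror", "arguments": arguments}


# Same values as the example in the qmp-commands.hx text.
command = build_drive_mirror(device="ide-hd0",
                             target="/some/place/my-image",
                             sync="full",
                             format="qcow2")
print(json.dumps(command, sort_keys=True))
```

The resulting JSON text is what a management tool would write to the QMP
socket; sending it and reading the reply is left out of the sketch.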


* [Qemu-devel] [PATCH v2 28/45] mirror: implement completion
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (26 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 27/45] qmp: add drive-mirror command Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-10-15 17:49   ` Kevin Wolf
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 29/45] qemu-iotests: add mirroring test case Paolo Bonzini
                   ` (17 subsequent siblings)
  45 siblings, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

Switching to the target of the migration is done mostly asynchronously,
and reported to management via the BLOCK_JOB_COMPLETED event; the only
synchronous phase is opening the backing files.  bdrv_open_backing_file
can always be done, even for migration of the full image (aka sync:
'full').  In this case, qmp_drive_mirror will create the target disk
with no backing file at all, and bdrv_open_backing_file will be a no-op.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
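The interplay of the two new flags can be summarized with a small model. This
is only a sketch of the state transitions described above, not the C code: the
class and method names are made up for illustration, and the model ignores the
bdrv_open_backing_file step that mirror_complete performs first.

```python
class MirrorJobModel:
    """Toy model of the synced/complete flags (names are hypothetical)."""

    def __init__(self):
        self.synced = False
        self.complete = False
        self.events = []

    def reach_sync_point(self):
        # Corresponds to mirror_run seeing no dirty sectors after a flush:
        # the ready notification is emitted only on the first transition.
        if not self.synced:
            self.events.append("BLOCK_JOB_READY")
            self.synced = True

    def request_complete(self):
        # Corresponds to mirror_complete: refused until the job is synced.
        if not self.synced:
            raise RuntimeError("block job not ready")
        self.complete = True

    def finish(self, ret=0):
        # Corresponds to immediate_exit: the source and target are swapped
        # only if completion was requested and the job did not fail.
        swapped = self.complete and ret == 0
        self.events.append("BLOCK_JOB_COMPLETED")
        return swapped


job = MirrorJobModel()
job.reach_sync_point()
job.request_complete()
print(job.finish())   # completion requested: the device switches to the target

cancelled = MirrorJobModel()
cancelled.reach_sync_point()
print(cancelled.finish())   # cancel after ready: the device keeps the source
```

In the model, a job that ends after becoming ready but without a completion
request finishes with a consistent target and no swap, which matches what the
code comment in mirror_run says about block-job-cancel.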
 block/mirror.c | 41 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 09ea020..939834d 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -32,6 +32,8 @@ typedef struct MirrorBlockJob {
     RateLimit limit;
     BlockDriverState *target;
     MirrorSyncMode mode;
+    bool synced;
+    bool complete;
     int64_t sector_num;
     uint8_t *buf;
 } MirrorBlockJob;
@@ -70,7 +72,6 @@ static void coroutine_fn mirror_run(void *opaque)
     int64_t sector_num, end;
     int ret = 0;
     int n;
-    bool synced = false;
 
     if (block_job_is_cancelled(&s->common)) {
         goto immediate_exit;
@@ -135,9 +136,13 @@ static void coroutine_fn mirror_run(void *opaque)
              * report completion.  This way, block-job-cancel will leave
              * the target in a consistent state.
              */
-            synced = true;
             s->common.offset = end * BDRV_SECTOR_SIZE;
-            should_complete = block_job_is_cancelled(&s->common);
+            if (!s->synced) {
+                block_job_ready(&s->common);
+                s->synced = true;
+            }
+
+            should_complete = block_job_is_cancelled(&s->common) || s->complete;
             cnt = bdrv_get_dirty_count(bs);
         }
 
@@ -156,8 +161,8 @@ static void coroutine_fn mirror_run(void *opaque)
         }
 
         ret = 0;
-        trace_mirror_before_sleep(s, cnt, synced);
-        if (!synced) {
+        trace_mirror_before_sleep(s, cnt, s->synced);
+        if (!s->synced) {
             /* Publish progress */
             s->common.offset = end * BDRV_SECTOR_SIZE - cnt * BLOCK_SIZE;
 
@@ -190,6 +195,9 @@ static void coroutine_fn mirror_run(void *opaque)
 immediate_exit:
     g_free(s->buf);
     bdrv_set_dirty_tracking(bs, false);
+    if (s->complete && ret == 0) {
+        bdrv_swap(s->target, s->common.bs);
+    }
     bdrv_close(s->target);
     bdrv_delete(s->target);
     block_job_completed(&s->common, ret);
@@ -206,10 +214,33 @@ static void mirror_set_speed(BlockJob *job, int64_t speed, Error **errp)
     ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
 }
 
+static void mirror_complete(BlockJob *job, Error **errp)
+{
+    MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
+    int ret;
+
+    ret = bdrv_open_backing_file(s->target);
+    if (ret < 0) {
+        char backing_filename[PATH_MAX];
+        bdrv_get_full_backing_filename(s->target, backing_filename,
+                                       sizeof(backing_filename));
+        error_set(errp, QERR_OPEN_FILE_FAILED, backing_filename);
+        return;
+    }
+    if (!s->synced) {
+        error_set(errp, QERR_BLOCK_JOB_NOT_READY, job->bs->device_name);
+        return;
+    }
+
+    s->complete = true;
+    block_job_resume(job);
+}
+
 static BlockJobType mirror_job_type = {
     .instance_size = sizeof(MirrorBlockJob),
     .job_type      = "mirror",
     .set_speed     = mirror_set_speed,
+    .complete      = mirror_complete,
 };
 
 void mirror_start(BlockDriverState *bs, BlockDriverState *target,
-- 
1.7.12


* [Qemu-devel] [PATCH v2 29/45] qemu-iotests: add mirroring test case
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (27 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 28/45] mirror: implement completion Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-27  0:26   ` Eric Blake
  2012-10-18 12:43   ` Kevin Wolf
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 30/45] iostatus: forward block_job_iostatus_reset to block job Paolo Bonzini
                   ` (16 subsequent siblings)
  45 siblings, 2 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
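The create_image helper in this test fills every 512-byte sector with its own
sector number at the start and at the end, so that misplaced sectors show up
when images are compared. The sketch below reproduces that pattern in memory.
The function names are hypothetical, and unlike the test it uses floor division
(//), which an interpreter with true division would need for struct.pack to
accept the value.

```python
import struct

SECTOR_SIZE = 512


def make_sector(index):
    """Return one 512-byte sector tagged with its index at both ends."""
    # '>l504xl': big-endian 32-bit int, 504 pad bytes, big-endian 32-bit int.
    return struct.pack('>l504xl', index, index)


def make_image_bytes(size):
    """Return the raw contents of an image of the given size in bytes."""
    return b''.join(make_sector(offset // SECTOR_SIZE)
                    for offset in range(0, size, SECTOR_SIZE))


data = make_image_bytes(4 * SECTOR_SIZE)
print(len(data))
print(struct.unpack('>l504xl', data[SECTOR_SIZE:2 * SECTOR_SIZE]))
```

Writing the result to a file in binary mode would give the same contents that
the test's backing image is meant to have.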
 tests/qemu-iotests/040     | 353 +++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/040.out |   5 +
 tests/qemu-iotests/group   |   3 +-
 3 files changed, 360 insertions(+), 1 deletion(-)
 create mode 100755 tests/qemu-iotests/040
 create mode 100644 tests/qemu-iotests/040.out

diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
new file mode 100755
index 0000000..44f1e56
--- /dev/null
+++ b/tests/qemu-iotests/040
@@ -0,0 +1,353 @@
+#!/usr/bin/env python
+#
+# Tests for image mirroring.
+#
+# Copyright (C) 2012 IBM Corp.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import time
+import os
+import iotests
+from iotests import qemu_img, qemu_io
+import struct
+
+backing_img = os.path.join(iotests.test_dir, 'backing.img')
+target_backing_img = os.path.join(iotests.test_dir, 'target-backing.img')
+test_img = os.path.join(iotests.test_dir, 'test.img')
+target_img = os.path.join(iotests.test_dir, 'target.img')
+
+class ImageMirroringTestCase(iotests.QMPTestCase):
+    '''Abstract base class for image mirroring test cases'''
+
+    def assert_no_active_mirrors(self):
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return', [])
+
+    def cancel_and_wait(self, drive='drive0', wait_ready=True):
+        '''Cancel a block job and wait for it to finish'''
+        if wait_ready:
+            ready = False
+            while not ready:
+                for event in self.vm.get_qmp_events(wait=True):
+                    if event['event'] == 'BLOCK_JOB_READY':
+                        self.assert_qmp(event, 'data/type', 'mirror')
+                        self.assert_qmp(event, 'data/device', drive)
+                        ready = True
+
+        result = self.vm.qmp('block-job-cancel', device=drive,
+                             force=not wait_ready)
+        self.assert_qmp(result, 'return', {})
+
+        cancelled = False
+        while not cancelled:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_COMPLETED' or \
+                   event['event'] == 'BLOCK_JOB_CANCELLED':
+                    self.assert_qmp(event, 'data/type', 'mirror')
+                    self.assert_qmp(event, 'data/device', drive)
+                    if wait_ready:
+                        self.assertEquals(event['event'], 'BLOCK_JOB_COMPLETED')
+                        self.assert_qmp(event, 'data/offset', self.image_len)
+                        self.assert_qmp(event, 'data/len', self.image_len)
+                    cancelled = True
+
+        self.assert_no_active_mirrors()
+
+    def complete_and_wait(self, drive='drive0', wait_ready=True):
+        '''Complete a block job and wait for it to finish'''
+        if wait_ready:
+            ready = False
+            while not ready:
+                for event in self.vm.get_qmp_events(wait=True):
+                    if event['event'] == 'BLOCK_JOB_READY':
+                        self.assert_qmp(event, 'data/type', 'mirror')
+                        self.assert_qmp(event, 'data/device', drive)
+                        ready = True
+
+        result = self.vm.qmp('block-job-complete', device=drive)
+        self.assert_qmp(result, 'return', {})
+
+        completed = False
+        while not completed:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_COMPLETED':
+                    self.assert_qmp(event, 'data/type', 'mirror')
+                    self.assert_qmp(event, 'data/device', drive)
+                    self.assert_qmp_absent(event, 'data/error')
+                    self.assert_qmp(event, 'data/offset', self.image_len)
+                    self.assert_qmp(event, 'data/len', self.image_len)
+                    completed = True
+
+        self.assert_no_active_mirrors()
+
+    def create_image(self, name, size):
+        file = open(name, 'w')
+        i = 0
+        while i < size:
+            sector = struct.pack('>l504xl', i / 512, i / 512)
+            file.write(sector)
+            i = i + 512
+        file.close()
+
+    def compare_images(self, img1, img2):
+        try:
+            qemu_img('convert', '-f', iotests.imgfmt, '-O', 'raw', img1, img1 + '.raw')
+            qemu_img('convert', '-f', iotests.imgfmt, '-O', 'raw', img2, img2 + '.raw')
+            file1 = open(img1 + '.raw', 'r')
+            file2 = open(img2 + '.raw', 'r')
+            return file1.read() == file2.read()
+        finally:
+            if file1 is not None:
+                file1.close()
+            if file2 is not None:
+                file2.close()
+            try:
+                os.remove(img1 + '.raw')
+            except OSError:
+                pass
+            try:
+                os.remove(img2 + '.raw')
+            except OSError:
+                pass
+
+class TestSingleDrive(ImageMirroringTestCase):
+    image_len = 1 * 1024 * 1024 # MB
+
+    def setUp(self):
+        self.create_image(backing_img, TestSingleDrive.image_len)
+        qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % backing_img, test_img)
+        self.vm = iotests.VM().add_drive(test_img)
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(test_img)
+        os.remove(backing_img)
+        try:
+            os.remove(target_img)
+        except OSError:
+            pass
+
+    def test_complete(self):
+        self.assert_no_active_mirrors()
+
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        self.complete_and_wait()
+        result = self.vm.qmp('query-block')
+        self.assert_qmp(result, 'return[0]/inserted/file', target_img)
+        self.vm.shutdown()
+        self.assertTrue(self.compare_images(test_img, target_img),
+                        'target image does not match source after mirroring')
+
+    def test_cancel(self):
+        self.assert_no_active_mirrors()
+
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        self.cancel_and_wait()
+        result = self.vm.qmp('query-block')
+        self.assert_qmp(result, 'return[0]/inserted/file', test_img)
+        self.vm.shutdown()
+        self.assertTrue(self.compare_images(test_img, target_img),
+                        'target image does not match source after mirroring')
+
+    def test_pause(self):
+        self.assert_no_active_mirrors()
+
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('block-job-pause', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        time.sleep(1)
+        result = self.vm.qmp('query-block-jobs')
+        offset = self.dictpath(result, 'return[0]/offset')
+
+        time.sleep(1)
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return[0]/offset', offset)
+
+        result = self.vm.qmp('block-job-resume', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        self.complete_and_wait()
+        self.vm.shutdown()
+        self.assertTrue(self.compare_images(test_img, target_img),
+                        'target image does not match source after mirroring')
+
+    def test_large_cluster(self):
+        self.assert_no_active_mirrors()
+
+        qemu_img('create', '-f', iotests.imgfmt, '-o', 'cluster_size=%d,backing_file=%s'
+                        % (TestSingleDrive.image_len, backing_img), target_img)
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             mode='existing', target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        self.complete_and_wait()
+        result = self.vm.qmp('query-block')
+        self.assert_qmp(result, 'return[0]/inserted/file', target_img)
+        self.vm.shutdown()
+        self.assertTrue(self.compare_images(test_img, target_img),
+                        'target image does not match source after mirroring')
+
+    def test_image_not_found(self):
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             mode='existing', target=target_img)
+        self.assert_qmp(result, 'error/class', 'GenericError')
+
+        # Avoid failure on os.remove
+        qemu_img('create', '-f', iotests.imgfmt, '-o', 'cluster_size=%d,backing_file=%s'
+                        % (TestSingleDrive.image_len, test_img), target_img)
+
+    def test_device_not_found(self):
+        result = self.vm.qmp('drive-mirror', device='nonexistent', sync='full',
+                             target=target_img)
+        self.assert_qmp(result, 'error/class', 'DeviceNotFound')
+
+        # Avoid failure on os.remove
+        qemu_img('create', '-f', iotests.imgfmt, '-o', 'cluster_size=%d,backing_file=%s'
+                        % (TestSingleDrive.image_len, test_img), target_img)
+
+class TestMirrorNoBacking(ImageMirroringTestCase):
+    image_len = 2 * 1024 * 1024 # MB
+
+    def complete_and_wait(self, drive='drive0', wait_ready=True):
+        self.create_image(target_backing_img, TestMirrorNoBacking.image_len)
+        return ImageMirroringTestCase.complete_and_wait(self, drive, wait_ready)
+
+    def compare_images(self, img1, img2):
+        self.create_image(target_backing_img, TestMirrorNoBacking.image_len)
+        return ImageMirroringTestCase.compare_images(self, img1, img2)
+
+    def setUp(self):
+        self.create_image(backing_img, TestMirrorNoBacking.image_len)
+        qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % backing_img, test_img)
+        self.vm = iotests.VM().add_drive(test_img)
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(test_img)
+        os.remove(backing_img)
+        os.remove(target_backing_img)
+        os.remove(target_img)
+
+    def test_complete(self):
+        self.assert_no_active_mirrors()
+
+        qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % backing_img, target_img)
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             mode='existing', target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        self.complete_and_wait()
+        result = self.vm.qmp('query-block')
+        self.assert_qmp(result, 'return[0]/inserted/file', target_img)
+        self.vm.shutdown()
+        self.assertTrue(self.compare_images(test_img, target_img),
+                        'target image does not match source after mirroring')
+
+    def test_cancel(self):
+        self.assert_no_active_mirrors()
+
+        qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % backing_img, target_img)
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             mode='existing', target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        self.cancel_and_wait()
+        result = self.vm.qmp('query-block')
+        self.assert_qmp(result, 'return[0]/inserted/file', test_img)
+        self.vm.shutdown()
+        self.assertTrue(self.compare_images(test_img, target_img),
+                        'target image does not match source after mirroring')
+
+class TestSetSpeed(ImageMirroringTestCase):
+    image_len = 80 * 1024 * 1024 # MB
+
+    def setUp(self):
+        qemu_img('create', backing_img, str(TestSetSpeed.image_len))
+        qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % backing_img, test_img)
+        self.vm = iotests.VM().add_drive(test_img)
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(test_img)
+        os.remove(backing_img)
+        os.remove(target_img)
+
+    def test_set_speed(self):
+        self.assert_no_active_mirrors()
+
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        # Default speed is 0
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return[0]/device', 'drive0')
+        self.assert_qmp(result, 'return[0]/speed', 0)
+
+        result = self.vm.qmp('block-job-set-speed', device='drive0', speed=8 * 1024 * 1024)
+        self.assert_qmp(result, 'return', {})
+
+        # Ensure the speed we set was accepted
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return[0]/device', 'drive0')
+        self.assert_qmp(result, 'return[0]/speed', 8 * 1024 * 1024)
+
+        self.cancel_and_wait()
+
+        # Check setting speed in drive-mirror works
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             target=target_img, speed=4*1024*1024)
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return[0]/device', 'drive0')
+        self.assert_qmp(result, 'return[0]/speed', 4 * 1024 * 1024)
+
+        self.cancel_and_wait()
+
+    def test_set_speed_invalid(self):
+        self.assert_no_active_mirrors()
+
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             target=target_img, speed=-1)
+        self.assert_qmp(result, 'error/class', 'GenericError')
+
+        self.assert_no_active_mirrors()
+
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('block-job-set-speed', device='drive0', speed=-1)
+        self.assert_qmp(result, 'error/class', 'GenericError')
+
+        self.cancel_and_wait()
+
+if __name__ == '__main__':
+    iotests.main(supported_fmts=['qcow2', 'qed'])
diff --git a/tests/qemu-iotests/040.out b/tests/qemu-iotests/040.out
new file mode 100644
index 0000000..36376be
--- /dev/null
+++ b/tests/qemu-iotests/040.out
@@ -0,0 +1,5 @@
+..........
+----------------------------------------------------------------------
+Ran 10 tests
+
+OK
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index fa4da7d..90cff79 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -45,4 +45,5 @@
 036 rw auto quick
 037 rw auto backing
 038 rw auto backing
-039 rw auto
+039 rw auto
+040 rw auto backing
-- 
1.7.12


* [Qemu-devel] [PATCH v2 30/45] iostatus: forward block_job_iostatus_reset to block job
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (28 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 29/45] qemu-iotests: add mirroring test case Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 31/45] mirror: add support for on-source-error/on-target-error Paolo Bonzini
                   ` (15 subsequent siblings)
  45 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
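The forwarding chain is easiest to see in a small model: resetting the I/O
status of a device also resets its job, and the job type may in turn reset
other devices it uses. The Python below is only an illustration of that chain.
The function names mirror the C functions in this patch, while the Device and
Job classes and the string statuses are invented for the sketch; the callback
stands in for what a job type such as mirror would register.

```python
class Device:
    """Stand-in for a BlockDriverState (hypothetical model)."""

    def __init__(self, name, iostatus_enabled=True):
        self.name = name
        self.iostatus_enabled = iostatus_enabled
        self.iostatus = "ok"
        self.job = None


class Job:
    """Stand-in for a BlockJob with an optional iostatus_reset callback."""

    def __init__(self, iostatus_reset=None):
        self.iostatus = "ok"
        self.iostatus_reset = iostatus_reset


def bdrv_iostatus_reset(bs):
    # Only devices with I/O status tracking enabled are touched.
    if bs.iostatus_enabled:
        bs.iostatus = "ok"
        if bs.job is not None:
            block_job_iostatus_reset(bs.job)


def block_job_iostatus_reset(job):
    job.iostatus = "ok"
    # The callback is optional; job types without one are simply reset.
    if job.iostatus_reset is not None:
        job.iostatus_reset(job)


source = Device("source")
target = Device("target")
source.job = Job(iostatus_reset=lambda job: bdrv_iostatus_reset(target))

source.iostatus = source.job.iostatus = target.iostatus = "failed"
bdrv_iostatus_reset(source)
print(source.iostatus, source.job.iostatus, target.iostatus)
```

Because the target in the model has no job of its own, the chain ends there.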
 block.c    | 3 +++
 blockjob.c | 3 +++
 blockjob.h | 6 +++++-
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 074325d..0f57ec1 100644
--- a/block.c
+++ b/block.c
@@ -4162,6 +4162,9 @@ void bdrv_iostatus_reset(BlockDriverState *bs)
 {
     if (bdrv_iostatus_is_enabled(bs)) {
         bs->iostatus = BLOCK_DEVICE_IO_STATUS_OK;
+        if (bs->job) {
+            block_job_iostatus_reset(bs->job);
+        }
     }
 }
 
diff --git a/blockjob.c b/blockjob.c
index 8d2687c..52d290b 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -142,6 +142,9 @@ bool block_job_is_cancelled(BlockJob *job)
 void block_job_iostatus_reset(BlockJob *job)
 {
     job->iostatus = BLOCK_DEVICE_IO_STATUS_OK;
+    if (job->job_type->iostatus_reset) {
+        job->job_type->iostatus_reset(job);
+    }
 }
 
 struct BlockCancelData {
diff --git a/blockjob.h b/blockjob.h
index be37cc1..32bd793 100644
--- a/blockjob.h
+++ b/blockjob.h
@@ -42,6 +42,9 @@ typedef struct BlockJobType {
     /** Optional callback for job types that support setting a speed limit */
     void (*set_speed)(BlockJob *job, int64_t speed, Error **errp);
 
+    /** Optional callback for job types that need to forward I/O status reset */
+    void (*iostatus_reset)(BlockJob *job);
+
     /**
      * Optional callback for job types whose completion must be triggered
      * manually.
@@ -254,7 +257,8 @@ int block_job_cancel_sync(BlockJob *job);
  * block_job_iostatus_reset:
  * @job: The job whose I/O status should be reset.
  *
- * Reset I/O status on @job.
+ * Reset I/O status on @job and on BlockDriverState objects it uses,
+ * other than job->bs.
  */
 void block_job_iostatus_reset(BlockJob *job);
 
-- 
1.7.12


* [Qemu-devel] [PATCH v2 31/45] mirror: add support for on-source-error/on-target-error
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (29 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 30/45] iostatus: forward block_job_iostatus_reset to block job Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-10-18 13:07   ` Kevin Wolf
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 32/45] qmp: add pull_event function Paolo Bonzini
                   ` (14 subsequent siblings)
  45 siblings, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

Error management is important for mirroring; otherwise, an error on the
target (even something as "innocent" as ENOSPC) requires starting again
with a full copy.  Similar to on_read_error/on_write_error, two separate
knobs are provided for on_source_error (reads) and on_target_error (writes).
The default is 'report' for both.

The 'ignore' policy will leave the sector dirty, so that it will be
retried later.  Thus, it will not cause corruption.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: error handling for bdrv_flush, introduce mirror_error_action
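
        To illustrate why 'ignore' does not cause corruption, here is a rough
        model of the retry behaviour. It is a sketch, not the C code: all names
        are hypothetical, and the mapping of the 'enospc' policy (stop on
        ENOSPC, report any other error) is an assumption based on how the
        similar on_read_error/on_write_error policies are understood to work.

```python
import errno


def error_action(policy, error):
    """Map an on-error policy and an errno value to an action (assumed)."""
    if policy == "enospc":
        return "stop" if error == errno.ENOSPC else "report"
    if policy in ("report", "ignore", "stop"):
        return policy
    raise ValueError("unknown policy: %r" % policy)


def mirror_once(dirty, sector, write, policy):
    """Copy one dirty sector; on failure mark it dirty again for a retry."""
    dirty.discard(sector)
    try:
        write(sector)
    except OSError as exc:
        dirty.add(sector)  # same idea as the "Try again later" path
        return error_action(policy, exc.errno)
    return "ok"


def run(dirty, write, policy):
    """Copy until nothing is dirty, or hand 'report'/'stop' to the caller."""
    while dirty:
        action = mirror_once(dirty, min(dirty), write, policy)
        if action in ("report", "stop"):
            return action
    return "ok"


class FlakyTarget:
    """Target whose first few writes fail with ENOSPC (test double)."""

    def __init__(self, failures):
        self.failures = failures
        self.written = []

    def write(self, sector):
        if self.failures > 0:
            self.failures -= 1
            raise OSError(errno.ENOSPC, "No space left on device")
        self.written.append(sector)


target = FlakyTarget(failures=1)
dirty = {0, 1, 2}
print(run(dirty, target.write, "ignore"), target.written)
```

        In the model a failed sector is never dropped from the dirty set, so
        the job cannot finish while the target is missing data.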

 block/mirror.c   | 95 +++++++++++++++++++++++++++++++++++++++++++-------------
 block_int.h      |  4 +++
 blockdev.c       | 14 +++++++--
 hmp.c            |  3 +-
 qapi-schema.json | 11 ++++++-
 qmp-commands.hx  |  8 ++++-
 6 files changed, 109 insertions(+), 26 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 939834d..caec272 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -32,13 +32,28 @@ typedef struct MirrorBlockJob {
     RateLimit limit;
     BlockDriverState *target;
     MirrorSyncMode mode;
+    BlockdevOnError on_source_error, on_target_error;
     bool synced;
     bool complete;
     int64_t sector_num;
     uint8_t *buf;
 } MirrorBlockJob;
 
-static int coroutine_fn mirror_iteration(MirrorBlockJob *s)
+static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
+                                            int error)
+{
+    s->synced = false;
+    if (read) {
+        return block_job_error_action(&s->common, s->common.bs,
+                                      s->on_source_error, true, error);
+    } else {
+        return block_job_error_action(&s->common, s->target,
+                                      s->on_target_error, false, error);
+    }
+}
+
+static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
+                                         BlockErrorAction *p_action)
 {
     BlockDriverState *source = s->common.bs;
     BlockDriverState *target = s->target;
@@ -60,9 +75,21 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s)
     trace_mirror_one_iteration(s, s->sector_num, nb_sectors);
     ret = bdrv_co_readv(source, s->sector_num, nb_sectors, &qiov);
     if (ret < 0) {
-        return ret;
+        *p_action = mirror_error_action(s, true, -ret);
+        goto fail;
+    }
+    ret = bdrv_co_writev(target, s->sector_num, nb_sectors, &qiov);
+    if (ret < 0) {
+        *p_action = mirror_error_action(s, false, -ret);
+        s->synced = false;
+        goto fail;
     }
-    return bdrv_co_writev(target, s->sector_num, nb_sectors, &qiov);
+    return 0;
+
+fail:
+    /* Try again later.  */
+    bdrv_set_dirty(source, s->sector_num, nb_sectors);
+    return ret;
 }
 
 static void coroutine_fn mirror_run(void *opaque)
@@ -117,8 +144,9 @@ static void coroutine_fn mirror_run(void *opaque)
 
         cnt = bdrv_get_dirty_count(bs);
         if (cnt != 0) {
-            ret = mirror_iteration(s);
-            if (ret < 0) {
+            BlockErrorAction action = BDRV_ACTION_REPORT;
+            ret = mirror_iteration(s, &action);
+            if (ret < 0 && action == BDRV_ACTION_REPORT) {
                 goto immediate_exit;
             }
             cnt = bdrv_get_dirty_count(bs);
@@ -127,23 +155,26 @@ static void coroutine_fn mirror_run(void *opaque)
         should_complete = false;
         if (cnt == 0) {
             trace_mirror_before_flush(s);
-            if (bdrv_flush(s->target) < 0) {
-                goto immediate_exit;
-            }
-
-            /* We're out of the streaming phase.  From now on, if the job
-             * is cancelled we will actually complete all pending I/O and
-             * report completion.  This way, block-job-cancel will leave
-             * the target in a consistent state.
-             */
-            s->common.offset = end * BDRV_SECTOR_SIZE;
-            if (!s->synced) {
-                block_job_ready(&s->common);
-                s->synced = true;
+            ret = bdrv_flush(s->target);
+            if (ret < 0) {
+                if (mirror_error_action(s, false, -ret) == BDRV_ACTION_REPORT) {
+                    goto immediate_exit;
+                }
+            } else {
+                /* We're out of the streaming phase.  From now on, if the job
+                 * is cancelled we will actually complete all pending I/O and
+                 * report completion.  This way, block-job-cancel will leave
+                 * the target in a consistent state.
+                 */
+                s->common.offset = end * BDRV_SECTOR_SIZE;
+                if (!s->synced) {
+                    block_job_ready(&s->common);
+                    s->synced = true;
+                }
+
+                should_complete = block_job_is_cancelled(&s->common) || s->complete;
+                cnt = bdrv_get_dirty_count(bs);
             }
-
-            should_complete = block_job_is_cancelled(&s->common) || s->complete;
-            cnt = bdrv_get_dirty_count(bs);
         }
 
         if (cnt == 0 && should_complete) {
@@ -195,6 +226,7 @@ static void coroutine_fn mirror_run(void *opaque)
 immediate_exit:
     g_free(s->buf);
     bdrv_set_dirty_tracking(bs, false);
+    bdrv_iostatus_disable(s->target);
     if (s->complete && ret == 0) {
         bdrv_swap(s->target, s->common.bs);
     }
@@ -214,6 +246,13 @@ static void mirror_set_speed(BlockJob *job, int64_t speed, Error **errp)
     ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
 }
 
+static void mirror_iostatus_reset(BlockJob *job)
+{
+    MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
+
+    bdrv_iostatus_reset(s->target);
+}
+
 static void mirror_complete(BlockJob *job, Error **errp)
 {
     MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
@@ -240,25 +279,39 @@ static BlockJobType mirror_job_type = {
     .instance_size = sizeof(MirrorBlockJob),
     .job_type      = "mirror",
     .set_speed     = mirror_set_speed,
+    .iostatus_reset= mirror_iostatus_reset,
     .complete      = mirror_complete,
 };
 
 void mirror_start(BlockDriverState *bs, BlockDriverState *target,
                   int64_t speed, MirrorSyncMode mode,
+                  BlockdevOnError on_source_error,
+                  BlockdevOnError on_target_error,
                   BlockDriverCompletionFunc *cb,
                   void *opaque, Error **errp)
 {
     MirrorBlockJob *s;
 
+    if ((on_source_error == BLOCKDEV_ON_ERROR_STOP ||
+         on_source_error == BLOCKDEV_ON_ERROR_ENOSPC) &&
+        !bdrv_iostatus_is_enabled(bs)) {
+        error_set(errp, QERR_INVALID_PARAMETER, "on-source-error");
+        return;
+    }
+
     s = block_job_create(&mirror_job_type, bs, speed, cb, opaque, errp);
     if (!s) {
         return;
     }
 
+    s->on_source_error = on_source_error;
+    s->on_target_error = on_target_error;
     s->target = target;
     s->mode = mode;
     bdrv_set_dirty_tracking(bs, true);
     bdrv_set_enable_write_cache(s->target, true);
+    bdrv_set_on_error(s->target, on_target_error, on_target_error);
+    bdrv_iostatus_enable(s->target);
     s->common.co = qemu_coroutine_create(mirror_run);
     trace_mirror_start(bs, s, s->common.co, opaque);
     qemu_coroutine_enter(s->common.co, s);
diff --git a/block_int.h b/block_int.h
index 62525cf..a533c7b 100644
--- a/block_int.h
+++ b/block_int.h
@@ -321,6 +321,8 @@ void stream_start(BlockDriverState *bs, BlockDriverState *base,
  * @target: Block device to write to.
  * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
  * @mode: Whether to collapse all images in the chain to the target.
+ * @on_source_error: The action to take upon error reading from the source.
+ * @on_target_error: The action to take upon error writing to the target.
  * @cb: Completion function for the job.
  * @opaque: Opaque pointer value passed to @cb.
  * @errp: Error object.
@@ -332,6 +334,8 @@ void stream_start(BlockDriverState *bs, BlockDriverState *base,
  */
 void mirror_start(BlockDriverState *bs, BlockDriverState *target,
                   int64_t speed, MirrorSyncMode mode,
+                  BlockdevOnError on_source_error,
+                  BlockdevOnError on_target_error,
                   BlockDriverCompletionFunc *cb,
                   void *opaque, Error **errp);
 
diff --git a/blockdev.c b/blockdev.c
index 722aab5..84fee2f 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1121,7 +1121,10 @@ void qmp_drive_mirror(const char *device, const char *target,
                       bool has_format, const char *format,
                       enum MirrorSyncMode sync,
                       bool has_mode, enum NewImageMode mode,
-                      bool has_speed, int64_t speed, Error **errp)
+                      bool has_speed, int64_t speed,
+                      bool has_on_source_error, BlockdevOnError on_source_error,
+                      bool has_on_target_error, BlockdevOnError on_target_error,
+                      Error **errp)
 {
     BlockDriverInfo bdi;
     BlockDriverState *bs;
@@ -1136,6 +1139,12 @@ void qmp_drive_mirror(const char *device, const char *target,
     if (!has_speed) {
         speed = 0;
     }
+    if (!has_on_source_error) {
+        on_source_error = BLOCKDEV_ON_ERROR_REPORT;
+    }
+    if (!has_on_target_error) {
+        on_target_error = BLOCKDEV_ON_ERROR_REPORT;
+    }
     if (!has_mode) {
         mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
     }
@@ -1228,7 +1237,8 @@ void qmp_drive_mirror(const char *device, const char *target,
         }
     }
 
-    mirror_start(bs, target_bs, speed, sync, block_job_cb, bs, &local_err);
+    mirror_start(bs, target_bs, speed, sync, on_source_error, on_target_error,
+                 block_job_cb, bs, &local_err);
     if (local_err != NULL) {
         bdrv_delete(target_bs);
         error_propagate(errp, local_err);
diff --git a/hmp.c b/hmp.c
index 94d4d41..b4d2736 100644
--- a/hmp.c
+++ b/hmp.c
@@ -783,7 +783,8 @@ void hmp_drive_mirror(Monitor *mon, const QDict *qdict)
 
     qmp_drive_mirror(device, filename, !!format, format,
                      full ? MIRROR_SYNC_MODE_FULL : MIRROR_SYNC_MODE_TOP,
-                     true, mode, false, 0, &errp);
+                     true, mode, false, 0,
+                     false, 0, false, 0, &errp);
     hmp_handle_error(mon, &errp);
 }
 
diff --git a/qapi-schema.json b/qapi-schema.json
index 4827ed3..2947206 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1551,6 +1551,14 @@
 #        (all the disk, only the sectors allocated in the topmost image, or
 #        only new I/O).
 #
+# @on-source-error: #optional the action to take on an error on the source,
+#                   default 'report'.  'stop' and 'enospc' can only be used
+#                   if the block device supports io-status (see BlockInfo).
+#
+# @on-target-error: #optional the action to take on an error on the target,
+#                   default 'report' (no limitations, since this applies to
+#                   a different block device than @device).
+#
 # Returns: nothing on success
 #          If @device is not a valid block device, DeviceNotFound
 #
@@ -1559,7 +1567,8 @@
 { 'command': 'drive-mirror',
   'data': { 'device': 'str', 'target': 'str', '*format': 'str',
             'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
-            '*speed': 'int' } }
+            '*speed': 'int', '*on-source-error': 'BlockdevOnError',
+            '*on-target-error': 'BlockdevOnError' } }
 
 ##
 # @migrate_cancel
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 25800a8..ec97eaa 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -907,7 +907,8 @@ EQMP
 
     {
         .name       = "drive-mirror",
-        .args_type  = "sync:s,device:B,target:s,speed:i?,mode:s?,format:s?",
+        .args_type  = "sync:s,device:B,target:s,speed:i?,mode:s?,format:s?,"
+                      "on-source-error:s?,on-target-error:s?",
         .mhandler.cmd_new = qmp_marshal_input_drive_mirror,
     },
 
@@ -935,6 +936,11 @@ Arguments:
   possibilities include "full" for all the disk, "top" for only the sectors
   allocated in the topmost image, or "none" to only replicate new I/O
   (MirrorSyncMode).
+- "on-source-error": the action to take on an error on the source
+  (BlockdevOnError, default 'report')
+- "on-target-error": the action to take on an error on the target
+  (BlockdevOnError, default 'report')
+
 
 
 Example:
-- 
1.7.12

* [Qemu-devel] [PATCH v2 32/45] qmp: add pull_event function
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (30 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 31/45] mirror: add support for on-source-error/on-target-error Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-26 17:17   ` Luiz Capitulino
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 33/45] qemu-iotests: add testcases for mirroring on-source-error/on-target-error Paolo Bonzini
                   ` (13 subsequent siblings)
  45 siblings, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody, Luiz Capitulino

This function is unlike get_events in that it makes it easy to process
one event at a time.  This is useful in the mirroring test cases, where
we want to process just one event (BLOCK_JOB_ERROR) and leave the others
to a helper function.
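
The difference between taking one event and taking all of them can be
sketched with a toy in-memory queue.  This is only an illustration in
Python 3: EventQueue and drain_events are hypothetical names, there is no
socket involved, and it is not the QEMUMonitorProtocol code from the patch.

```python
from collections import deque


class EventQueue:
    """Toy in-memory queue; not the real QEMUMonitorProtocol class."""

    def __init__(self, events=()):
        self._events = deque(events)

    def pull_event(self):
        # Remove and return only the oldest event; None if the queue is empty.
        return self._events.popleft() if self._events else None

    def drain_events(self):
        # Remove and return everything that is queued, oldest first.
        events = list(self._events)
        self._events.clear()
        return events


queue = EventQueue([
    {'event': 'BLOCK_JOB_ERROR'},
    {'event': 'BLOCK_JOB_READY'},
    {'event': 'BLOCK_JOB_COMPLETED'},
])

first = queue.pull_event()    # the one event the test wants to inspect
rest = queue.drain_events()   # everything else, left for a helper
```

Pulling a single event leaves the remaining ones queued, so a helper can
still see them later.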

Cc: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 QMP/qmp.py | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/QMP/qmp.py b/QMP/qmp.py
index 36ecc1d..7a598f1 100644
--- a/QMP/qmp.py
+++ b/QMP/qmp.py
@@ -134,6 +134,26 @@ class QEMUMonitorProtocol:
             raise Exception(ret['error']['desc'])
         return ret['return']
 
+    def pull_event(self, wait=False):
+        """
+        Get and delete the first available QMP event.
+
+        @param wait: block until an event is available (bool)
+        """
+        self.__sock.setblocking(0)
+        try:
+            self.__json_read()
+        except socket.error, err:
+            if err[0] == errno.EAGAIN:
+                # No data available
+                pass
+        self.__sock.setblocking(1)
+        if not self.__events and wait:
+            self.__json_read(only_event=True)
+        event = self.__events[0]
+        del self.__events[0]
+        return event
+
     def get_events(self, wait=False):
         """
         Get a list of available QMP events.
-- 
1.7.12

* [Qemu-devel] [PATCH v2 33/45] qemu-iotests: add testcases for mirroring on-source-error/on-target-error
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (31 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 32/45] qmp: add pull_event function Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 34/45] host-utils: add ffsl Paolo Bonzini
                   ` (12 subsequent siblings)
  45 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

The new options are tested with blkdebug on both the source and the
target.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 tests/qemu-iotests/040        | 253 ++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/040.out    |   4 +-
 tests/qemu-iotests/iotests.py |   4 +
 3 files changed, 259 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
index 44f1e56..ec86c70 100755
--- a/tests/qemu-iotests/040
+++ b/tests/qemu-iotests/040
@@ -283,6 +283,259 @@ class TestMirrorNoBacking(ImageMirroringTestCase):
         self.assertTrue(self.compare_images(test_img, target_img),
                         'target image does not match source after mirroring')
 
+class TestReadErrors(ImageMirroringTestCase):
+    image_len = 2 * 1024 * 1024 # MB
+
+    # this should be a multiple of twice the default granularity
+    # so that we hit this offset first in state 1
+    MIRROR_GRANULARITY = 1024 * 1024
+
+    def create_blkdebug_file(self, name, event, errno):
+        file = open(name, 'w')
+        file.write('''
+[inject-error]
+state = "1"
+event = "%s"
+errno = "%d"
+immediately = "off"
+once = "on"
+sector = "%d"
+
+[set-state]
+state = "1"
+event = "%s"
+new_state = "2"
+
+[set-state]
+state = "2"
+event = "%s"
+new_state = "1"
+''' % (event, errno, self.MIRROR_GRANULARITY / 512, event, event))
+        file.close()
+
+    def setUp(self):
+        self.blkdebug_file = backing_img + ".blkdebug"
+        self.create_image(backing_img, TestReadErrors.image_len)
+        self.create_blkdebug_file(self.blkdebug_file, "read_aio", 5)
+        qemu_img('create', '-f', iotests.imgfmt,
+                 '-o', 'backing_file=blkdebug:%s:%s,backing_fmt=raw'
+                       % (self.blkdebug_file, backing_img),
+                 test_img)
+        self.vm = iotests.VM().add_drive(test_img)
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(test_img)
+        os.remove(backing_img)
+        os.remove(self.blkdebug_file)
+
+    def test_report_read(self):
+        self.assert_no_active_mirrors()
+
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        completed = False
+        error = False
+        while not completed:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_ERROR':
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/operation', 'read')
+                    error = True
+                elif event['event'] == 'BLOCK_JOB_READY':
+                    self.assertTrue(False, 'job completed unexpectedly')
+                elif event['event'] == 'BLOCK_JOB_COMPLETED':
+                    self.assertTrue(error, 'job completed unexpectedly')
+                    self.assert_qmp(event, 'data/type', 'mirror')
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/error', 'Input/output error')
+                    self.assert_qmp(event, 'data/len', self.image_len)
+                    completed = True
+
+        self.assert_no_active_mirrors()
+        self.vm.shutdown()
+
+    def test_ignore_read(self):
+        self.assert_no_active_mirrors()
+
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             target=target_img, on_source_error='ignore')
+        self.assert_qmp(result, 'return', {})
+
+        event = self.vm.get_qmp_event(wait=True)
+        self.assertEquals(event['event'], 'BLOCK_JOB_ERROR')
+        self.assert_qmp(event, 'data/device', 'drive0')
+        self.assert_qmp(event, 'data/operation', 'read')
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return[0]/paused', False)
+        self.complete_and_wait()
+        self.vm.shutdown()
+
+    def test_stop_read(self):
+        self.assert_no_active_mirrors()
+
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             target=target_img, on_source_error='stop')
+        self.assert_qmp(result, 'return', {})
+
+        error = False
+        ready = False
+        while not ready:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_ERROR':
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/operation', 'read')
+
+                    result = self.vm.qmp('query-block-jobs')
+                    self.assert_qmp(result, 'return[0]/paused', True)
+                    self.assert_qmp(result, 'return[0]/io-status', 'failed')
+
+                    result = self.vm.qmp('block-job-resume', device='drive0')
+                    self.assert_qmp(result, 'return', {})
+                    error = True
+                elif event['event'] == 'BLOCK_JOB_READY':
+                    self.assertTrue(error, 'job completed unexpectedly')
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    ready = True
+
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return[0]/paused', False)
+        self.assert_qmp(result, 'return[0]/io-status', 'ok')
+
+        self.complete_and_wait(wait_ready=False)
+        self.assert_no_active_mirrors()
+        self.vm.shutdown()
+
+class TestWriteErrors(ImageMirroringTestCase):
+    image_len = 2 * 1024 * 1024 # MB
+
+    # this should be a multiple of twice the default granularity
+    # so that we hit this offset first in state 1
+    MIRROR_GRANULARITY = 1024 * 1024
+
+    def create_blkdebug_file(self, name, event, errno):
+        file = open(name, 'w')
+        file.write('''
+[inject-error]
+state = "1"
+event = "%s"
+errno = "%d"
+immediately = "off"
+once = "on"
+sector = "%d"
+
+[set-state]
+state = "1"
+event = "%s"
+new_state = "2"
+
+[set-state]
+state = "2"
+event = "%s"
+new_state = "1"
+''' % (event, errno, self.MIRROR_GRANULARITY / 512, event, event))
+        file.close()
+
+    def setUp(self):
+        self.blkdebug_file = target_img + ".blkdebug"
+        self.create_image(backing_img, TestWriteErrors.image_len)
+        self.create_blkdebug_file(self.blkdebug_file, "write_aio", 5)
+        qemu_img('create', '-f', iotests.imgfmt, '-obacking_file=%s' %(backing_img), test_img)
+        self.vm = iotests.VM().add_drive(test_img)
+        self.target_img = 'blkdebug:%s:%s' % (self.blkdebug_file, target_img)
+        qemu_img('create', '-f', iotests.imgfmt, '-osize=%d' %(TestWriteErrors.image_len), target_img)
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(test_img)
+        os.remove(backing_img)
+        os.remove(self.blkdebug_file)
+
+    def test_report_write(self):
+        self.assert_no_active_mirrors()
+
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             mode='existing', target=self.target_img)
+        self.assert_qmp(result, 'return', {})
+
+        completed = False
+        error = False
+        while not completed:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_ERROR':
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/operation', 'write')
+                    error = True
+                elif event['event'] == 'BLOCK_JOB_READY':
+                    self.assertTrue(False, 'job completed unexpectedly')
+                elif event['event'] == 'BLOCK_JOB_COMPLETED':
+                    self.assertTrue(error, 'job completed unexpectedly')
+                    self.assert_qmp(event, 'data/type', 'mirror')
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/error', 'Input/output error')
+                    self.assert_qmp(event, 'data/len', self.image_len)
+                    completed = True
+
+        self.assert_no_active_mirrors()
+        self.vm.shutdown()
+
+    def test_ignore_write(self):
+        self.assert_no_active_mirrors()
+
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             mode='existing', target=self.target_img,
+                             on_target_error='ignore')
+        self.assert_qmp(result, 'return', {})
+
+        event = self.vm.get_qmp_event(wait=True)
+        self.assertEquals(event['event'], 'BLOCK_JOB_ERROR')
+        self.assert_qmp(event, 'data/device', 'drive0')
+        self.assert_qmp(event, 'data/operation', 'write')
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return[0]/paused', False)
+        self.complete_and_wait()
+        self.vm.shutdown()
+
+    def test_stop_write(self):
+        self.assert_no_active_mirrors()
+
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             mode='existing', target=self.target_img,
+                             on_target_error='stop')
+        self.assert_qmp(result, 'return', {})
+
+        error = False
+        ready = False
+        while not ready:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_ERROR':
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/operation', 'write')
+
+                    result = self.vm.qmp('query-block-jobs')
+                    self.assert_qmp(result, 'return[0]/paused', True)
+                    self.assert_qmp(result, 'return[0]/io-status', 'failed')
+
+                    result = self.vm.qmp('block-job-resume', device='drive0')
+                    self.assert_qmp(result, 'return', {})
+
+                    result = self.vm.qmp('query-block-jobs')
+                    self.assert_qmp(result, 'return[0]/paused', False)
+                    self.assert_qmp(result, 'return[0]/io-status', 'ok')
+                    error = True
+                elif event['event'] == 'BLOCK_JOB_READY':
+                    self.assertTrue(error, 'job completed unexpectedly')
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    ready = True
+
+        self.complete_and_wait(wait_ready=False)
+        self.assert_no_active_mirrors()
+        self.vm.shutdown()
+
 class TestSetSpeed(ImageMirroringTestCase):
     image_len = 80 * 1024 * 1024 # MB
 
diff --git a/tests/qemu-iotests/040.out b/tests/qemu-iotests/040.out
index 36376be..b6f2576 100644
--- a/tests/qemu-iotests/040.out
+++ b/tests/qemu-iotests/040.out
@@ -1,5 +1,5 @@
-..........
+................
 ----------------------------------------------------------------------
-Ran 10 tests
+Ran 16 tests
 
 OK
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 3c60b2d..735c674 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -106,6 +106,10 @@ class VM(object):
 
         return self._qmp.cmd(cmd, args=qmp_args)
 
+    def get_qmp_event(self, wait=False):
+        '''Poll for one queued QMP event and return it'''
+        return self._qmp.pull_event(wait=wait)
+
     def get_qmp_events(self, wait=False):
         '''Poll for queued QMP events and return a list of dicts'''
         events = self._qmp.get_events(wait=wait)
-- 
1.7.12

* [Qemu-devel] [PATCH v2 34/45] host-utils: add ffsl
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (32 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 33/45] qemu-iotests: add testcases for mirroring on-source-error/on-target-error Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-27  1:14   ` Eric Blake
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 35/45] add hierarchical bitmap data type and test cases Paolo Bonzini
                   ` (11 subsequent siblings)
  45 siblings, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

We can provide fast versions based on the other functions defined
by host-utils.h.  Some care is required on glibc, which provides
ffsl already.
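
As a rough model of the semantics (Python 3, arbitrary-precision
integers, so there is no word-size branch as in the C code), ffsl is
"count trailing zeros, plus one", with zero as a special case.  The ctz
helper below is an assumption of this sketch, not one of the host-utils.h
functions.

```python
def ctz(val):
    """Count trailing zero bits of a nonzero, non-negative integer."""
    # val & -val isolates the lowest set bit; bit_length() is its
    # 1-based position, so subtract one to get a 0-based count.
    return (val & -val).bit_length() - 1


def ffsl(val):
    """1-based index of the least significant set bit, or 0 if val is 0."""
    if val == 0:
        return 0
    return ctz(val) + 1
```

The zero check has to come first because counting trailing zeros of 0 has
no meaningful answer.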

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 host-utils.h | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/host-utils.h b/host-utils.h
index 821db93..2724be0 100644
--- a/host-utils.h
+++ b/host-utils.h
@@ -24,6 +24,7 @@
  */
 
 #include "compiler.h"   /* QEMU_GNUC_PREREQ */
+#include <string.h>     /* ffsl */
 
 #if defined(__x86_64__)
 #define __HAVE_FAST_MULU64__
@@ -234,3 +235,28 @@ static inline int ctpop64(uint64_t val)
     return val;
 #endif
 }
+
+/* glibc does not provide an inline version of ffsl, so always define
+ * ours.  We need to give it a different name, however.
+ */
+#ifdef __GLIBC__
+#define ffsl qemu_ffsl
+#endif
+static inline int ffsl(long val)
+{
+    if (!val) {
+        return 0;
+    }
+
+#if QEMU_GNUC_PREREQ(3, 4)
+    return __builtin_ctzl(val) + 1;
+#else
+    if (sizeof(long) == 4) {
+        return ctz32(val) + 1;
+    } else if (sizeof(long) == 8) {
+        return ctz64(val) + 1;
+    } else {
+        abort();
+    }
+#endif
+}
-- 
1.7.12

* [Qemu-devel] [PATCH v2 35/45] add hierarchical bitmap data type and test cases
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (33 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 34/45] host-utils: add ffsl Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-27  2:53   ` Eric Blake
  2012-10-24 14:41   ` Kevin Wolf
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 36/45] block: implement dirty bitmap using HBitmap Paolo Bonzini
                   ` (10 subsequent siblings)
  45 siblings, 2 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

HBitmap provides an array of bits.  The bits are stored as usual in an
array of unsigned longs, but HBitmap is also optimized to provide fast
iteration over set bits; going from one bit to the next is O(logB n)
worst case, with B = sizeof(long) * CHAR_BIT: the result is low enough
that the number of levels is in fact fixed.

In order to do this, it stacks multiple bitmaps with progressively coarser
granularity; in all levels except the last, bit N is set iff the N-th
unsigned long is nonzero in the immediately next level.  When iteration
completes on the last level it can examine the 2nd-last level to quickly
skip entire words, and even do so recursively to skip blocks of 64 words or
powers thereof (32 on 32-bit machines).

Given an index in the bitmap, it can be split into groups of bits like
this (for the 64-bit case):

     bits 0-57 => word in the last bitmap     | bits 58-63 => bit in the word
     bits 0-51 => word in the 2nd-last bitmap | bits 52-57 => bit in the word
     bits 0-45 => word in the 3rd-last bitmap | bits 46-51 => bit in the word

So it is easy to move up simply by shifting the index right by
log2(BITS_PER_LONG) bits.  To move down, you shift the index left
similarly, and add the word index within the group.  Iteration uses
ffs (find first set bit) to find the next word to examine; this
operation can be done in constant time in most current architectures.
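
The index arithmetic can be sketched as follows for the 64-bit case
(Python 3).  The helper names split, move_up and move_down are made up
for this illustration and do not appear in hbitmap.c.

```python
BITS_PER_LONG = 64
BITS_PER_LEVEL = 6   # log2(BITS_PER_LONG)


def split(index):
    """Split an index into (word number, bit number within that word)."""
    return index >> BITS_PER_LEVEL, index & (BITS_PER_LONG - 1)


def move_up(word):
    """A word number on one level is a bit index on the coarser level above."""
    return split(word)


def move_down(word, bit):
    """Bit `bit` of word `word` selects this word number one level below."""
    return (word << BITS_PER_LEVEL) + bit


word, bit = split(1000003)             # (15625, 3) in the last level
up_word, up_bit = move_up(word)        # (244, 9) in the 2nd-last level
back = move_down(up_word, up_bit)      # 15625 again
```

Moving down is the exact inverse of moving up, which is what lets the
iterator rebuild the position in the last level from the coarser ones.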

When setting or clearing a range of m bits on all levels, the work to perform
is O(m + m/W + m/W^2 + ...) with W the number of bits in a long, which is O(m)
like on a regular bitmap.

When iterating on a bitmap, each bit (on any level) is only visited
once.  Hence, the total cost of visiting a bitmap with m bits in it is
the number of bits that are set in all bitmaps.  Unless the bitmap is
extremely sparse, this is also O(m + m/W + m/W^2 + ...), so the amortized
cost of advancing from one bit to the next is usually constant.
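
To make the skipping concrete, here is a deliberately simplified
two-level model in Python 3.  It is not the HBitmap implementation: the
real code has a fixed number of levels, a sentinel bit and an iterator
structure, while build_summary and iter_set_bits below are hypothetical
helpers that only show how a coarser level lets iteration jump over zero
words.

```python
BITS = 64


def ctz(word):
    """Count trailing zero bits of a nonzero word."""
    return (word & -word).bit_length() - 1


def build_summary(words):
    """Coarser level: bit N is set iff words[N] is nonzero."""
    summary = [0] * ((len(words) + BITS - 1) // BITS)
    for n, word in enumerate(words):
        if word:
            summary[n // BITS] |= 1 << (n % BITS)
    return summary


def iter_set_bits(words, summary):
    """Yield the indices of set bits, visiting only nonzero words."""
    for s_index, s_word in enumerate(summary):
        while s_word:
            n = s_index * BITS + ctz(s_word)   # next nonzero word
            s_word &= s_word - 1               # clear the lowest set bit
            word = words[n]
            while word:
                yield n * BITS + ctz(word)
                word &= word - 1


words = [0] * 200
words[3] = 0b101
words[130] = 1 << 63
summary = build_summary(words)
found = list(iter_set_bits(words, summary))
```

With only two nonzero words out of 200, the inner loop runs for just
those two words; the other 198 are never read.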

Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: reworked iterator API, various other changes after review
        from Laszlo.  Added a testcase to test-hbitmap.c.

 hbitmap.c            | 400 ++++++++++++++++++++++++++++++++++++++++++++++++++
 hbitmap.h            | 207 ++++++++++++++++++++++++++
 tests/Makefile       |   2 +
 tests/test-hbitmap.c | 408 +++++++++++++++++++++++++++++++++++++++++++++++++++
 trace-events         |   5 +
 5 files changed, 1022 insertions(+)
 create mode 100644 hbitmap.c
 create mode 100644 hbitmap.h
 create mode 100644 tests/test-hbitmap.c

diff --git a/hbitmap.c b/hbitmap.c
new file mode 100644
index 0000000..90facab
--- /dev/null
+++ b/hbitmap.c
@@ -0,0 +1,400 @@
+/*
+ * Hierarchical Bitmap Data Type
+ *
+ * Copyright Red Hat, Inc., 2012
+ *
+ * Author: Paolo Bonzini <pbonzini@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "osdep.h"
+#include "hbitmap.h"
+#include "host-utils.h"
+#include "trace.h"
+#include <string.h>
+#include <glib.h>
+#include <assert.h>
+
+/* HBitmap provides an array of bits.  The bits are stored as usual in an
+ * array of unsigned longs, but HBitmap is also optimized to provide fast
+ * iteration over set bits; going from one bit to the next is O(logB n)
+ * worst case, with B = sizeof(long) * CHAR_BIT: the result is low enough
+ * that the number of levels is in fact fixed.
+ *
+ * In order to do this, it stacks multiple bitmaps with progressively coarser
+ * granularity; in all levels except the last, bit N is set iff the N-th
+ * unsigned long is nonzero in the immediately next level.  When iteration
+ * completes on the last level it can examine the 2nd-last level to quickly
+ * skip entire words, and even do so recursively to skip blocks of 64 words or
+ * powers thereof (32 on 32-bit machines).
+ *
+ * Given an index in the bitmap, it can be split into groups of bits like
+ * this (for the 64-bit case):
+ *
+ *   bits 0-57 => word in the last bitmap     | bits 58-63 => bit in the word
+ *   bits 0-51 => word in the 2nd-last bitmap | bits 52-57 => bit in the word
+ *   bits 0-45 => word in the 3rd-last bitmap | bits 46-51 => bit in the word
+ *
+ * So it is easy to move up simply by shifting the index right by
+ * log2(BITS_PER_LONG) bits.  To move down, you shift the index left
+ * similarly, and add the word index within the group.  Iteration uses
+ * ffs (find first set bit) to find the next word to examine; this
+ * operation can be done in constant time in most current architectures.
+ *
+ * Setting or clearing a range of m bits on all levels, the work to perform
+ * is O(m + m/W + m/W^2 + ...), which is O(m) like on a regular bitmap.
+ *
+ * When iterating on a bitmap, each bit (on any level) is only visited
+ * once.  Hence, the total cost of visiting a bitmap with m bits in it is
+ * the number of bits that are set in all bitmaps.  Unless the bitmap is
+ * extremely sparse, this is also O(m + m/W + m/W^2 + ...), so the amortized
+ * cost of advancing from one bit to the next is usually constant (worst case
+ * O(logB n) as in the non-amortized complexity).
+ */
+
+struct HBitmap {
+    /* Number of total bits in the bottom level.  */
+    uint64_t size;
+
+    /* Number of set bits in the bottom level.  */
+    uint64_t count;
+
+    /* A scaling factor.  Given a granularity of G, each bit in the bitmap
+     * will actually represent a group of 2^G elements.  Each operation on a
+     * range of bits first rounds the bits to determine which group they land
+     * in, and then affects the entire group; iteration will only visit the
+     * first bit of each group.  Here is an example of operations in a size-16,
+     * granularity-1 HBitmap:
+     *
+     *    initial state            00000000
+     *    set(start=0, count=9)    11111000 (iter: 0, 2, 4, 6, 8)
+     *    reset(start=1, count=3)  00111000 (iter: 4, 6, 8)
+     *    set(start=9, count=2)    00111100 (iter: 4, 6, 8, 10)
+     *    reset(start=5, count=5)  00000000
+     *
+     * From an implementation point of view, when setting or resetting bits,
+     * the bitmap will scale bit numbers right by this amount of bits.  When
+     * iterating, the bitmap will scale bit numbers left by this amount of
+     * bits.
+     */
+    int granularity;
+
+    /* A number of progressively less coarse bitmaps (i.e. level 0 is the
+     * coarsest).  Each bit in level N represents a word in level N+1 that
+     * has a set bit, except the last level where each bit represents the
+     * actual bitmap.
+     *
+     * Note that all bitmaps have the same number of levels.  Even a 1-bit
+     * bitmap will still allocate HBITMAP_LEVELS arrays.
+     */
+    unsigned long *levels[HBITMAP_LEVELS];
+};
+
+static inline int popcountl(unsigned long l)
+{
+    return BITS_PER_LONG == 32 ? ctpop32(l) : ctpop64(l);
+}
+
+/* Advance hbi to the next nonzero word and return it.  hbi->pos
+ * is updated.  Returns zero if we reach the end of the bitmap.
+ */
+unsigned long hbitmap_iter_skip_words(HBitmapIter *hbi)
+{
+    uint64_t pos = hbi->pos;
+    const HBitmap *hb = hbi->hb;
+    unsigned i = HBITMAP_LEVELS - 1;
+
+    unsigned long cur;
+    do {
+        cur = hbi->cur[--i];
+        pos >>= BITS_PER_LEVEL;
+    } while (cur == 0);
+
+    /* Check for end of iteration.  We always use fewer than BITS_PER_LONG
+     * bits in the level 0 bitmap; thus we can repurpose the most significant
+     * bit as a sentinel.  The sentinel is set in hbitmap_alloc and ensures
+     * that the above loop ends even without an explicit check on i.
+     */
+
+    if (i == 0 && cur == (1UL << (BITS_PER_LONG - 1))) {
+        return 0;
+    }
+    for (; i < HBITMAP_LEVELS - 1; i++) {
+        /* Shift back pos to the left, matching the right shifts above.
+         * The index of this word's least significant set bit provides
+         * the low-order bits.
+         */
+        pos = (pos << BITS_PER_LEVEL) + ffsl(cur) - 1;
+        hbi->cur[i] = cur & (cur - 1);
+
+        /* Set up next level for iteration.  */
+        cur = hb->levels[i + 1][pos];
+    }
+
+    hbi->pos = pos;
+    trace_hbitmap_iter_skip_words(hbi->hb, hbi, pos, cur);
+
+    assert(cur);
+    return cur;
+}
+
+void hbitmap_iter_init(HBitmapIter *hbi, const HBitmap *hb, uint64_t first)
+{
+    unsigned i, bit;
+    uint64_t pos;
+
+    hbi->hb = hb;
+    pos = first >> hb->granularity;
+    hbi->pos = pos >> BITS_PER_LEVEL;
+    hbi->granularity = hb->granularity;
+
+    for (i = HBITMAP_LEVELS; i-- > 0; ) {
+        bit = pos & (BITS_PER_LONG - 1);
+        pos >>= BITS_PER_LEVEL;
+
+        /* Drop bits representing items before first.  */
+        hbi->cur[i] = hb->levels[i][pos] & ~((1UL << bit) - 1);
+
+        /* We have already added level i+1, so the lowest set bit has
+         * been processed.  Clear it.
+         */
+        if (i != HBITMAP_LEVELS - 1) {
+            hbi->cur[i] &= ~(1UL << bit);
+        }
+    }
+}
+
+bool hbitmap_empty(const HBitmap *hb)
+{
+    return hb->count == 0;
+}
+
+int hbitmap_granularity(const HBitmap *hb)
+{
+    return hb->granularity;
+}
+
+uint64_t hbitmap_count(const HBitmap *hb)
+{
+    return hb->count << hb->granularity;
+}
+
+/* Count the number of set bits between start and end, not accounting for
+ * the granularity.  Also an example of how to use hbitmap_iter_next_word.
+ */
+static uint64_t hb_count_between(HBitmap *hb, uint64_t start, uint64_t last)
+{
+    HBitmapIter hbi;
+    uint64_t count = 0;
+    uint64_t end = last + 1;
+    unsigned long cur;
+    size_t pos;
+
+    hbitmap_iter_init(&hbi, hb, start << hb->granularity);
+    for (;;) {
+        pos = hbitmap_iter_next_word(&hbi, &cur);
+        if (pos >= (end >> BITS_PER_LEVEL)) {
+            break;
+        }
+        count += popcountl(cur);
+    }
+
+    if (pos == (end >> BITS_PER_LEVEL)) {
+        /* Drop bits representing the END-th and subsequent items.  */
+        int bit = end & (BITS_PER_LONG - 1);
+        cur &= (1UL << bit) - 1;
+        count += popcountl(cur);
+    }
+
+    return count;
+}
+
+/* Setting starts at the last layer and propagates up if an element
+ * changes from zero to non-zero.
+ */
+static inline bool hb_set_elem(unsigned long *elem, uint64_t start, uint64_t last)
+{
+    unsigned long mask;
+    bool changed;
+
+    assert((last >> BITS_PER_LEVEL) == (start >> BITS_PER_LEVEL));
+    assert(start <= last);
+
+    mask = 2UL << (last & (BITS_PER_LONG - 1));
+    mask -= 1UL << (start & (BITS_PER_LONG - 1));
+    changed = (*elem == 0);
+    *elem |= mask;
+    return changed;
+}
+
+/* The recursive workhorse (the depth is limited to HBITMAP_LEVELS)... */
+static void hb_set_between(HBitmap *hb, int level, uint64_t start, uint64_t last)
+{
+    size_t pos = start >> BITS_PER_LEVEL;
+    size_t lastpos = last >> BITS_PER_LEVEL;
+    bool changed = false;
+    size_t i;
+
+    i = pos;
+    if (i < lastpos) {
+        uint64_t next = (start | (BITS_PER_LONG - 1)) + 1;
+        changed |= hb_set_elem(&hb->levels[level][i], start, next - 1);
+        for (;;) {
+            start = next;
+            next += BITS_PER_LONG;
+            if (++i == lastpos) {
+                break;
+            }
+            changed |= (hb->levels[level][i] == 0);
+            hb->levels[level][i] = ~0UL;
+        }
+    }
+    changed |= hb_set_elem(&hb->levels[level][i], start, last);
+
+    /* If there was any change in this layer, we may have to update
+     * the one above.
+     */
+    if (level > 0 && changed) {
+        hb_set_between(hb, level - 1, pos, lastpos);
+    }
+}
+
+void hbitmap_set(HBitmap *hb, uint64_t start, uint64_t count)
+{
+    /* Compute range in the last layer.  */
+    uint64_t last = start + count - 1;
+
+    trace_hbitmap_set(hb, start, count,
+                      start >> hb->granularity, last >> hb->granularity);
+
+    start >>= hb->granularity;
+    last >>= hb->granularity;
+    count = last - start + 1;
+
+    hb->count += count - hb_count_between(hb, start, last);
+    hb_set_between(hb, HBITMAP_LEVELS - 1, start, last);
+}
+
+/* Resetting works the other way round: propagate up if the new
+ * value is zero.
+ */
+static inline bool hb_reset_elem(unsigned long *elem, uint64_t start, uint64_t last)
+{
+    unsigned long mask;
+    bool blanked;
+
+    assert((last >> BITS_PER_LEVEL) == (start >> BITS_PER_LEVEL));
+    assert(start <= last);
+
+    mask = 2UL << (last & (BITS_PER_LONG - 1));
+    mask -= 1UL << (start & (BITS_PER_LONG - 1));
+    blanked = *elem != 0 && ((*elem & ~mask) == 0);
+    *elem &= ~mask;
+    return blanked;
+}
+
+/* The recursive workhorse (the depth is limited to HBITMAP_LEVELS)... */
+static void hb_reset_between(HBitmap *hb, int level, uint64_t start, uint64_t last)
+{
+    size_t pos = start >> BITS_PER_LEVEL;
+    size_t lastpos = last >> BITS_PER_LEVEL;
+    bool changed = false;
+    size_t i;
+
+    i = pos;
+    if (i < lastpos) {
+        uint64_t next = (start | (BITS_PER_LONG - 1)) + 1;
+
+        /* Here we need a more complex test than when setting bits.  Even if
+         * something was changed, we must not blank bits in the upper level
+         * unless the lower-level word became entirely zero.  So, remove pos
+         * from the upper-level range if bits remain set.
+         */
+        if (hb_reset_elem(&hb->levels[level][i], start, next - 1)) {
+            changed = true;
+        } else {
+            pos++;
+        }
+
+        for (;;) {
+            start = next;
+            next += BITS_PER_LONG;
+            if (++i == lastpos) {
+                break;
+            }
+            changed |= (hb->levels[level][i] != 0);
+            hb->levels[level][i] = 0UL;
+        }
+    }
+
+    /* Same as above, this time for lastpos.  */
+    if (hb_reset_elem(&hb->levels[level][i], start, last)) {
+        changed = true;
+    } else {
+        lastpos--;
+    }
+
+    if (level > 0 && changed) {
+        hb_reset_between(hb, level - 1, pos, lastpos);
+    }
+}
+
+void hbitmap_reset(HBitmap *hb, uint64_t start, uint64_t count)
+{
+    /* Compute range in the last layer.  */
+    uint64_t last = start + count - 1;
+
+    trace_hbitmap_reset(hb, start, count,
+                        start >> hb->granularity, last >> hb->granularity);
+
+    start >>= hb->granularity;
+    last >>= hb->granularity;
+
+    hb->count -= hb_count_between(hb, start, last);
+    hb_reset_between(hb, HBITMAP_LEVELS - 1, start, last);
+}
+
+bool hbitmap_get(const HBitmap *hb, uint64_t item)
+{
+    /* Compute position and bit in the last layer.  */
+    uint64_t pos = item >> hb->granularity;
+    unsigned long bit = 1UL << (pos & (BITS_PER_LONG - 1));
+
+    return (hb->levels[HBITMAP_LEVELS - 1][pos >> BITS_PER_LEVEL] & bit) != 0;
+}
+
+void hbitmap_free(HBitmap *hb)
+{
+    unsigned i;
+    for (i = HBITMAP_LEVELS; i-- > 0; ) {
+        g_free(hb->levels[i]);
+    }
+    g_free(hb);
+}
+
+HBitmap *hbitmap_alloc(uint64_t size, int granularity)
+{
+    HBitmap *hb = g_malloc0(sizeof (struct HBitmap));
+    unsigned i;
+
+    assert(granularity >= 0 && granularity < 64);
+    size = (size + (1ULL << granularity) - 1) >> granularity;
+    assert(size <= ((uint64_t)1 << HBITMAP_LOG_MAX_SIZE));
+
+    hb->size = size;
+    hb->granularity = granularity;
+    for (i = HBITMAP_LEVELS; i-- > 0; ) {
+        size = MAX((size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
+        hb->levels[i] = g_malloc0(size * sizeof(unsigned long));
+    }
+
+    /* We necessarily have free bits in level 0 due to the definition
+     * of HBITMAP_LEVELS, so use one for a sentinel.  This speeds up
+     * hbitmap_iter_skip_words.
+     */
+    assert(size == 1);
+    hb->levels[0][0] |= 1UL << (BITS_PER_LONG - 1);
+    return hb;
+}
diff --git a/hbitmap.h b/hbitmap.h
new file mode 100644
index 0000000..7ddfb66
--- /dev/null
+++ b/hbitmap.h
@@ -0,0 +1,207 @@
+/*
+ * Hierarchical Bitmap Data Type
+ *
+ * Copyright Red Hat, Inc., 2012
+ *
+ * Author: Paolo Bonzini <pbonzini@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#ifndef HBITMAP_H
+#define HBITMAP_H 1
+
+#include <limits.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include "bitops.h"
+
+typedef struct HBitmap HBitmap;
+typedef struct HBitmapIter HBitmapIter;
+
+#define BITS_PER_LEVEL         (BITS_PER_LONG == 32 ? 5 : 6)
+
+/* For 32-bit, the largest that fits in a 4 GiB address space.
+ * For 64-bit, the number of sectors in 1 PiB.  Good luck, in
+ * either case... :)
+ */
+#define HBITMAP_LOG_MAX_SIZE   (BITS_PER_LONG == 32 ? 34 : 41)
+
+/* We need to place a sentinel in level 0 to speed up iteration.  Thus,
+ * we do this instead of HBITMAP_LOG_MAX_SIZE / BITS_PER_LEVEL.  The
+ * difference is that it allocates an extra level when HBITMAP_LOG_MAX_SIZE
+ * is an exact multiple of BITS_PER_LEVEL.
+ */
+#define HBITMAP_LEVELS         ((HBITMAP_LOG_MAX_SIZE / BITS_PER_LEVEL) + 1)
+
+struct HBitmapIter {
+    const HBitmap *hb;
+
+    /* Copied from hb for access in the inline functions (hb is opaque).  */
+    int granularity;
+
+    /* Entry offset into the last-level array of longs.  */
+    size_t pos;
+
+    /* The currently-active path in the tree.  Each item of cur[i] stores
+     * the bits (i.e. the subtrees) yet to be processed under that node.
+     */
+    unsigned long cur[HBITMAP_LEVELS];
+};
+
+/**
+ * hbitmap_alloc:
+ * @size: Number of bits in the bitmap.
+ * @granularity: Granularity of the bitmap.  Aligned groups of 2^@granularity
+ * bits will be represented by a single bit.  Each operation on a
+ * range of bits first rounds the bits to determine which group they land
+ * in, and then affects the entire group; iteration will only visit the first
+ * bit of each group.
+ *
+ * Allocate a new HBitmap.
+ */
+HBitmap *hbitmap_alloc(uint64_t size, int granularity);
+
+/**
+ * hbitmap_empty:
+ * @hb: HBitmap to operate on.
+ *
+ * Return whether the bitmap is empty.
+ */
+bool hbitmap_empty(const HBitmap *hb);
+
+/**
+ * hbitmap_granularity:
+ * @hb: HBitmap to operate on.
+ *
+ * Return the granularity of the HBitmap.
+ */
+int hbitmap_granularity(const HBitmap *hb);
+
+/**
+ * hbitmap_count:
+ * @hb: HBitmap to operate on.
+ *
+ * Return the number of set bits in the HBitmap, scaled up by its granularity.
+ */
+uint64_t hbitmap_count(const HBitmap *hb);
+
+/**
+ * hbitmap_set:
+ * @hb: HBitmap to operate on.
+ * @start: First bit to set (0-based).
+ * @count: Number of bits to set.
+ *
+ * Set a consecutive range of bits in an HBitmap.
+ */
+void hbitmap_set(HBitmap *hb, uint64_t start, uint64_t count);
+
+/**
+ * hbitmap_reset:
+ * @hb: HBitmap to operate on.
+ * @start: First bit to reset (0-based).
+ * @count: Number of bits to reset.
+ *
+ * Reset a consecutive range of bits in an HBitmap.
+ */
+void hbitmap_reset(HBitmap *hb, uint64_t start, uint64_t count);
+
+/**
+ * hbitmap_get:
+ * @hb: HBitmap to operate on.
+ * @item: Bit to query (0-based).
+ *
+ * Return whether the @item-th bit in an HBitmap is set.
+ */
+bool hbitmap_get(const HBitmap *hb, uint64_t item);
+
+/**
+ * hbitmap_free:
+ * @hb: HBitmap to operate on.
+ *
+ * Free an HBitmap and all of its associated memory.
+ */
+void hbitmap_free(HBitmap *hb);
+
+/**
+ * hbitmap_iter_init:
+ * @hbi: HBitmapIter to initialize.
+ * @hb: HBitmap to iterate on.
+ * @first: First bit to visit (0-based).
+ *
+ * Set up @hbi to iterate on the HBitmap @hb.  hbitmap_iter_next will return
+ * the lowest-numbered bit that is set in @hb, starting at @first.
+ *
+ * Concurrent setting of bits is acceptable, and will at worst cause the
+ * iteration to miss some of those bits.  Resetting bits before the current
+ * position of the iterator is also okay.  However, concurrent resetting of
+ * bits can lead to unexpected behavior if the iterator has not yet reached
+ * those bits.
+ */
+void hbitmap_iter_init(HBitmapIter *hbi, const HBitmap *hb, uint64_t first);
+
+/* hbitmap_iter_skip_words:
+ * @hbi: HBitmapIter to operate on.
+ *
+ * Internal function used by hbitmap_iter_next and hbitmap_iter_next_word.
+ */
+unsigned long hbitmap_iter_skip_words(HBitmapIter *hbi);
+
+/**
+ * hbitmap_iter_next:
+ * @hbi: HBitmapIter to operate on.
+ *
+ * Return the next bit that is set in @hbi's associated HBitmap,
+ * or -1 if all remaining bits are zero.
+ */
+static inline int64_t hbitmap_iter_next(HBitmapIter *hbi)
+{
+    unsigned long cur = hbi->cur[HBITMAP_LEVELS - 1];
+    int64_t item;
+
+    if (cur == 0) {
+        cur = hbitmap_iter_skip_words(hbi);
+        if (cur == 0) {
+            return -1;
+        }
+    }
+
+    /* The next call will resume work from the next bit.  */
+    hbi->cur[HBITMAP_LEVELS - 1] = cur & (cur - 1);
+    item = ((uint64_t)hbi->pos << BITS_PER_LEVEL) + ffsl(cur) - 1;
+
+    return item << hbi->granularity;
+}
+
+/**
+ * hbitmap_iter_next_word:
+ * @hbi: HBitmapIter to operate on.
+ * @p_cur: Location where to store the next non-zero word.
+ *
+ * Return the index of the next nonzero word that is set in @hbi's
+ * associated HBitmap, and set *p_cur to the content of that word
+ * (bits before the index that was passed to hbitmap_iter_init are
+ * trimmed on the first call).  Return -1, and set *p_cur to zero,
+ * if all remaining words are zero.
+ */
+static inline size_t hbitmap_iter_next_word(HBitmapIter *hbi, unsigned long *p_cur)
+{
+    unsigned long cur = hbi->cur[HBITMAP_LEVELS - 1];
+
+    if (cur == 0) {
+        cur = hbitmap_iter_skip_words(hbi);
+        if (cur == 0) {
+            *p_cur = 0;
+            return -1;
+        }
+    }
+
+    /* The next call will resume work from the next word.  */
+    hbi->cur[HBITMAP_LEVELS - 1] = 0;
+    *p_cur = cur;
+    return hbi->pos;
+}
+
+
+#endif
diff --git a/tests/Makefile b/tests/Makefile
index 26a67ce..f52a4f2 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -15,6 +15,7 @@ check-unit-y += tests/test-string-output-visitor$(EXESUF)
 check-unit-y += tests/test-coroutine$(EXESUF)
 check-unit-y += tests/test-visitor-serialization$(EXESUF)
 check-unit-y += tests/test-iov$(EXESUF)
+check-unit-y += tests/test-hbitmap$(EXESUF)
 
 check-block-$(CONFIG_POSIX) += tests/qemu-iotests-quick.sh
 
@@ -50,6 +51,7 @@ tests/check-qfloat$(EXESUF): tests/check-qfloat.o qfloat.o $(tools-obj-y)
 tests/check-qjson$(EXESUF): tests/check-qjson.o $(qobject-obj-y) $(tools-obj-y)
 tests/test-coroutine$(EXESUF): tests/test-coroutine.o $(coroutine-obj-y) $(tools-obj-y)
 tests/test-iov$(EXESUF): tests/test-iov.o iov.o
+tests/test-hbitmap$(EXESUF): tests/test-hbitmap.o hbitmap.o $(trace-obj-y)
 
 tests/test-qapi-types.c tests/test-qapi-types.h :\
 $(SRC_PATH)/qapi-schema-test.json $(SRC_PATH)/scripts/qapi-types.py
diff --git a/tests/test-hbitmap.c b/tests/test-hbitmap.c
new file mode 100644
index 0000000..b5d76a7
--- /dev/null
+++ b/tests/test-hbitmap.c
@@ -0,0 +1,408 @@
+/*
+ * Hierarchical bitmap unit-tests.
+ *
+ * Copyright (C) 2012 Red Hat Inc.
+ *
+ * Author: Paolo Bonzini <pbonzini@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include <glib.h>
+#include <stdarg.h>
+#include "hbitmap.h"
+
+#define LOG_BITS_PER_LONG          (BITS_PER_LONG == 32 ? 5 : 6)
+
+#define L1                         BITS_PER_LONG
+#define L2                         (BITS_PER_LONG * L1)
+#define L3                         (BITS_PER_LONG * L2)
+
+typedef struct TestHBitmapData {
+    HBitmap       *hb;
+    unsigned long *bits;
+    size_t         size;
+    int            granularity;
+} TestHBitmapData;
+
+
+/* Check that the HBitmap and the shadow bitmap contain the same data,
+ * ignoring the bits before "first".
+ */
+static void hbitmap_test_check(TestHBitmapData *data,
+                               uint64_t first)
+{
+    uint64_t count = 0;
+    size_t pos;
+    int bit;
+    HBitmapIter hbi;
+    int64_t i, next;
+
+    hbitmap_iter_init(&hbi, data->hb, first);
+
+    i = first;
+    for (;;) {
+        next = hbitmap_iter_next(&hbi);
+        if (next < 0) {
+            next = data->size;
+        }
+
+        while (i < next) {
+            pos = i >> LOG_BITS_PER_LONG;
+            bit = i & (BITS_PER_LONG - 1);
+            i++;
+            g_assert_cmpint(data->bits[pos] & (1UL << bit), ==, 0);
+        }
+
+        if (next == data->size) {
+            break;
+        }
+
+        pos = i >> LOG_BITS_PER_LONG;
+        bit = i & (BITS_PER_LONG - 1);
+        i++;
+        count++;
+        g_assert_cmpint(data->bits[pos] & (1UL << bit), !=, 0);
+    }
+
+    if (first == 0) {
+        g_assert_cmpint(count << data->granularity, ==, hbitmap_count(data->hb));
+    }
+}
+
+/* This is provided instead of a test setup function so that the sizes
+   are kept in the test functions (and not in main()) */
+static void hbitmap_test_init(TestHBitmapData *data,
+                              uint64_t size, int granularity)
+{
+    size_t n;
+    data->hb = hbitmap_alloc(size, granularity);
+
+    n = (size + BITS_PER_LONG - 1) / BITS_PER_LONG;
+    if (n == 0) {
+        n = 1;
+    }
+    data->bits = g_new0(unsigned long, n);
+    data->size = size;
+    data->granularity = granularity;
+    hbitmap_test_check(data, 0);
+}
+
+static void hbitmap_test_teardown(TestHBitmapData *data,
+                                  const void *unused)
+{
+    if (data->hb) {
+        hbitmap_free(data->hb);
+        data->hb = NULL;
+    }
+    if (data->bits) {
+        g_free(data->bits);
+        data->bits = NULL;
+    }
+}
+
+/* Set a range in the HBitmap and in the shadow "simple" bitmap.
+ * The two bitmaps are then tested against each other.
+ */
+static void hbitmap_test_set(TestHBitmapData *data,
+                             uint64_t first, uint64_t count)
+{
+    hbitmap_set(data->hb, first, count);
+    while (count-- != 0) {
+        size_t pos = first >> LOG_BITS_PER_LONG;
+        int bit = first & (BITS_PER_LONG - 1);
+        first++;
+
+        data->bits[pos] |= 1UL << bit;
+    }
+
+    if (data->granularity == 0) {
+        hbitmap_test_check(data, 0);
+    }
+}
+
+/* Reset a range in the HBitmap and in the shadow "simple" bitmap.
+ */
+static void hbitmap_test_reset(TestHBitmapData *data,
+                               uint64_t first, uint64_t count)
+{
+    hbitmap_reset(data->hb, first, count);
+    while (count-- != 0) {
+        size_t pos = first >> LOG_BITS_PER_LONG;
+        int bit = first & (BITS_PER_LONG - 1);
+        first++;
+
+        data->bits[pos] &= ~(1UL << bit);
+    }
+
+    if (data->granularity == 0) {
+        hbitmap_test_check(data, 0);
+    }
+}
+
+static void hbitmap_test_check_get(TestHBitmapData *data)
+{
+    uint64_t count = 0;
+    uint64_t i;
+
+    for (i = 0; i < data->size; i++) {
+        size_t pos = i >> LOG_BITS_PER_LONG;
+        int bit = i & (BITS_PER_LONG - 1);
+        unsigned long val = data->bits[pos] & (1UL << bit);
+        count += hbitmap_get(data->hb, i);
+        g_assert_cmpint(hbitmap_get(data->hb, i), ==, val != 0);
+    }
+    g_assert_cmpint(count, ==, hbitmap_count(data->hb));
+}
+
+static void test_hbitmap_zero(TestHBitmapData *data,
+                               const void *unused)
+{
+    hbitmap_test_init(data, 0, 0);
+}
+
+static void test_hbitmap_unaligned(TestHBitmapData *data,
+                                   const void *unused)
+{
+    hbitmap_test_init(data, L3 + 23, 0);
+    hbitmap_test_set(data, 0, 1);
+    hbitmap_test_set(data, L3 + 22, 1);
+}
+
+static void test_hbitmap_iter_empty(TestHBitmapData *data,
+                                    const void *unused)
+{
+    hbitmap_test_init(data, L1, 0);
+}
+
+static void test_hbitmap_iter_partial(TestHBitmapData *data,
+                                      const void *unused)
+{
+    hbitmap_test_init(data, L3, 0);
+    hbitmap_test_set(data, 0, L3);
+    hbitmap_test_check(data, 1);
+    hbitmap_test_check(data, L1 - 1);
+    hbitmap_test_check(data, L1);
+    hbitmap_test_check(data, L1 * 2 - 1);
+    hbitmap_test_check(data, L2 - 1);
+    hbitmap_test_check(data, L2);
+    hbitmap_test_check(data, L2 + 1);
+    hbitmap_test_check(data, L2 + L1);
+    hbitmap_test_check(data, L2 + L1 * 2 - 1);
+    hbitmap_test_check(data, L2 * 2 - 1);
+    hbitmap_test_check(data, L2 * 2);
+    hbitmap_test_check(data, L2 * 2 + 1);
+    hbitmap_test_check(data, L2 * 2 + L1);
+    hbitmap_test_check(data, L2 * 2 + L1 * 2 - 1);
+    hbitmap_test_check(data, L3 / 2);
+}
+
+static void test_hbitmap_iter_past(TestHBitmapData *data,
+                                    const void *unused)
+{
+    hbitmap_test_init(data, L3, 0);
+    hbitmap_test_set(data, 0, L3);
+    hbitmap_test_check(data, L3);
+}
+
+static void test_hbitmap_set_all(TestHBitmapData *data,
+                                 const void *unused)
+{
+    hbitmap_test_init(data, L3, 0);
+    hbitmap_test_set(data, 0, L3);
+}
+
+static void test_hbitmap_get_all(TestHBitmapData *data,
+                                 const void *unused)
+{
+    hbitmap_test_init(data, L3, 0);
+    hbitmap_test_set(data, 0, L3);
+    hbitmap_test_check_get(data);
+}
+
+static void test_hbitmap_get_some(TestHBitmapData *data,
+                                  const void *unused)
+{
+    hbitmap_test_init(data, 2 * L2, 0);
+    hbitmap_test_set(data, 10, 1);
+    hbitmap_test_check_get(data);
+    hbitmap_test_set(data, L1 - 1, 1);
+    hbitmap_test_check_get(data);
+    hbitmap_test_set(data, L1, 1);
+    hbitmap_test_check_get(data);
+    hbitmap_test_set(data, L2 - 1, 1);
+    hbitmap_test_check_get(data);
+    hbitmap_test_set(data, L2, 1);
+    hbitmap_test_check_get(data);
+}
+
+static void test_hbitmap_set_one(TestHBitmapData *data,
+                                 const void *unused)
+{
+    hbitmap_test_init(data, 2 * L2, 0);
+    hbitmap_test_set(data, 10, 1);
+    hbitmap_test_set(data, L1 - 1, 1);
+    hbitmap_test_set(data, L1, 1);
+    hbitmap_test_set(data, L2 - 1, 1);
+    hbitmap_test_set(data, L2, 1);
+}
+
+static void test_hbitmap_set_two_elem(TestHBitmapData *data,
+                                      const void *unused)
+{
+    hbitmap_test_init(data, 2 * L2, 0);
+    hbitmap_test_set(data, L1 - 1, 2);
+    hbitmap_test_set(data, L1 * 2 - 1, 4);
+    hbitmap_test_set(data, L1 * 4, L1 + 1);
+    hbitmap_test_set(data, L1 * 8 - 1, L1 + 1);
+    hbitmap_test_set(data, L2 - 1, 2);
+    hbitmap_test_set(data, L2 + L1 - 1, 8);
+    hbitmap_test_set(data, L2 + L1 * 4, L1 + 1);
+    hbitmap_test_set(data, L2 + L1 * 8 - 1, L1 + 1);
+}
+
+static void test_hbitmap_set(TestHBitmapData *data,
+                             const void *unused)
+{
+    hbitmap_test_init(data, L3 * 2, 0);
+    hbitmap_test_set(data, L1 - 1, L1 + 2);
+    hbitmap_test_set(data, L1 * 3 - 1, L1 + 2);
+    hbitmap_test_set(data, L1 * 5, L1 * 2 + 1);
+    hbitmap_test_set(data, L1 * 8 - 1, L1 * 2 + 1);
+    hbitmap_test_set(data, L2 - 1, L1 + 2);
+    hbitmap_test_set(data, L2 + L1 * 2 - 1, L1 + 2);
+    hbitmap_test_set(data, L2 + L1 * 4, L1 * 2 + 1);
+    hbitmap_test_set(data, L2 + L1 * 7 - 1, L1 * 2 + 1);
+    hbitmap_test_set(data, L2 * 2 - 1, L3 * 2 - L2 * 2);
+}
+
+static void test_hbitmap_set_twice(TestHBitmapData *data,
+                                   const void *unused)
+{
+    hbitmap_test_init(data, L1 * 3, 0);
+    hbitmap_test_set(data, 0, L1 * 3);
+    hbitmap_test_set(data, L1, 1);
+}
+
+static void test_hbitmap_set_overlap(TestHBitmapData *data,
+                                     const void *unused)
+{
+    hbitmap_test_init(data, L3 * 2, 0);
+    hbitmap_test_set(data, L1 - 1, L1 + 2);
+    hbitmap_test_set(data, L1 * 2 - 1, L1 * 2 + 2);
+    hbitmap_test_set(data, 0, L1 * 3);
+    hbitmap_test_set(data, L1 * 8 - 1, L2);
+    hbitmap_test_set(data, L2, L1);
+    hbitmap_test_set(data, L2 - L1 - 1, L1 * 8 + 2);
+    hbitmap_test_set(data, L2, L3 - L2 + 1);
+    hbitmap_test_set(data, L3 - L1, L1 * 3);
+    hbitmap_test_set(data, L3 - 1, 3);
+    hbitmap_test_set(data, L3 - 1, L2);
+}
+
+static void test_hbitmap_reset_empty(TestHBitmapData *data,
+                                     const void *unused)
+{
+    hbitmap_test_init(data, L3, 0);
+    hbitmap_test_reset(data, 0, L3);
+}
+
+static void test_hbitmap_reset(TestHBitmapData *data,
+                               const void *unused)
+{
+    hbitmap_test_init(data, L3 * 2, 0);
+    hbitmap_test_set(data, L1 - 1, L1 + 2);
+    hbitmap_test_reset(data, L1 * 2 - 1, L1 * 2 + 2);
+    hbitmap_test_set(data, 0, L1 * 3);
+    hbitmap_test_reset(data, L1 * 8 - 1, L2);
+    hbitmap_test_set(data, L2, L1);
+    hbitmap_test_reset(data, L2 - L1 - 1, L1 * 8 + 2);
+    hbitmap_test_set(data, L2, L3 - L2 + 1);
+    hbitmap_test_reset(data, L3 - L1, L1 * 3);
+    hbitmap_test_set(data, L3 - 1, 3);
+    hbitmap_test_reset(data, L3 - 1, L2);
+    hbitmap_test_set(data, 0, L3 * 2);
+    hbitmap_test_reset(data, 0, L1);
+    hbitmap_test_reset(data, 0, L2);
+    hbitmap_test_reset(data, L3, L3);
+    hbitmap_test_set(data, L3 / 2, L3);
+}
+
+static void test_hbitmap_granularity(TestHBitmapData *data,
+                                     const void *unused)
+{
+    /* Note that hbitmap_test_check has to be invoked manually in this test.  */
+    hbitmap_test_init(data, L1, 1);
+    hbitmap_test_set(data, 0, 1);
+    g_assert_cmpint(hbitmap_count(data->hb), ==, 2);
+    hbitmap_test_check(data, 0);
+    hbitmap_test_set(data, 2, 1);
+    g_assert_cmpint(hbitmap_count(data->hb), ==, 4);
+    hbitmap_test_check(data, 0);
+    hbitmap_test_set(data, 0, 3);
+    g_assert_cmpint(hbitmap_count(data->hb), ==, 4);
+    hbitmap_test_reset(data, 0, 1);
+    g_assert_cmpint(hbitmap_count(data->hb), ==, 2);
+}
+
+static void test_hbitmap_iter_granularity(TestHBitmapData *data,
+                                          const void *unused)
+{
+    HBitmapIter hbi;
+
+    /* Note that hbitmap_test_check has to be invoked manually in this test.  */
+    hbitmap_test_init(data, 131072 << 7, 7);
+    hbitmap_iter_init(&hbi, data->hb, 0);
+    g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
+
+    hbitmap_test_set(data, ((L2 + L1 + 1) << 7) + 8, 8);
+    hbitmap_iter_init(&hbi, data->hb, 0);
+    g_assert_cmpint(hbitmap_iter_next(&hbi), ==, (L2 + L1 + 1) << 7);
+    g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
+
+    hbitmap_iter_init(&hbi, data->hb, (L2 + L1 + 2) << 7);
+    g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
+
+    hbitmap_test_set(data, (131072 << 7) - 8, 8);
+    hbitmap_iter_init(&hbi, data->hb, 0);
+    g_assert_cmpint(hbitmap_iter_next(&hbi), ==, (L2 + L1 + 1) << 7);
+    g_assert_cmpint(hbitmap_iter_next(&hbi), ==, 131071 << 7);
+    g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
+
+    hbitmap_iter_init(&hbi, data->hb, (L2 + L1 + 2) << 7);
+    g_assert_cmpint(hbitmap_iter_next(&hbi), ==, 131071 << 7);
+    g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
+}
+
+static void hbitmap_test_add(const char *testpath,
+                             void (*test_func)(TestHBitmapData *data, const void *user_data))
+{
+    g_test_add(testpath, TestHBitmapData, NULL, NULL, test_func,
+               hbitmap_test_teardown);
+}
+
+int main(int argc, char **argv)
+{
+    g_test_init(&argc, &argv, NULL);
+    hbitmap_test_add("/hbitmap/size/0", test_hbitmap_zero);
+    hbitmap_test_add("/hbitmap/size/unaligned", test_hbitmap_unaligned);
+    hbitmap_test_add("/hbitmap/iter/empty", test_hbitmap_iter_empty);
+    hbitmap_test_add("/hbitmap/iter/past", test_hbitmap_iter_past);
+    hbitmap_test_add("/hbitmap/iter/partial", test_hbitmap_iter_partial);
+    hbitmap_test_add("/hbitmap/iter/granularity", test_hbitmap_iter_granularity);
+    hbitmap_test_add("/hbitmap/get/all", test_hbitmap_get_all);
+    hbitmap_test_add("/hbitmap/get/some", test_hbitmap_get_some);
+    hbitmap_test_add("/hbitmap/set/all", test_hbitmap_set_all);
+    hbitmap_test_add("/hbitmap/set/one", test_hbitmap_set_one);
+    hbitmap_test_add("/hbitmap/set/two-elem", test_hbitmap_set_two_elem);
+    hbitmap_test_add("/hbitmap/set/general", test_hbitmap_set);
+    hbitmap_test_add("/hbitmap/set/twice", test_hbitmap_set_twice);
+    hbitmap_test_add("/hbitmap/set/overlap", test_hbitmap_set_overlap);
+    hbitmap_test_add("/hbitmap/reset/empty", test_hbitmap_reset_empty);
+    hbitmap_test_add("/hbitmap/reset/general", test_hbitmap_reset);
+    hbitmap_test_add("/hbitmap/granularity", test_hbitmap_granularity);
+    g_test_run();
+
+    return 0;
+}
diff --git a/trace-events b/trace-events
index 6ac4e4e..e2aa164 100644
--- a/trace-events
+++ b/trace-events
@@ -1007,3 +1007,8 @@ spapr_pci_rtas_ibm_change_msi(unsigned func, unsigned req) "func %u, requested %
 spapr_pci_rtas_ibm_query_interrupt_source_number(unsigned ioa, unsigned intr) "queries for #%u, IRQ%u"
 spapr_pci_msi_write(uint64_t addr, uint64_t data, uint32_t dt_irq) "@%"PRIx64"<=%"PRIx64" IRQ %u"
 spapr_pci_lsi_set(const char *busname, int pin, uint32_t irq) "%s PIN%d IRQ %u"
+
+# hbitmap.c
+hbitmap_iter_skip_words(const void *hb, void *hbi, uint64_t pos, unsigned long cur) "hb %p hbi %p pos %"PRId64" cur 0x%lx"
+hbitmap_reset(void *hb, uint64_t start, uint64_t count, uint64_t sbit, uint64_t ebit) "hb %p items %"PRIu64",%"PRIu64" bits %"PRIu64"..%"PRIu64
+hbitmap_set(void *hb, uint64_t start, uint64_t count, uint64_t sbit, uint64_t ebit) "hb %p items %"PRIu64",%"PRIu64" bits %"PRIu64"..%"PRIu64
-- 
1.7.12
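
A note on the mask arithmetic in hb_set_elem and hb_reset_elem above:
both build a mask covering bits start..last of one word as "2UL << last"
minus "1UL << start".  Writing it this way stays well defined even when
last is the top bit of the word, because unsigned arithmetic wraps.  The
following is only a minimal standalone sketch of that computation;
range_mask64 is a hypothetical helper name that does not appear in the
patch, and it assumes 64-bit words rather than the patch's BITS_PER_LONG.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: return a word with bits
 * first..last (inclusive, both in 0..63) set, using the same two-step
 * computation as hb_set_elem.
 */
static uint64_t range_mask64(unsigned first, unsigned last)
{
    uint64_t mask;

    assert(first <= last && last < 64);

    /* One past the last bit; wraps around to 0 when last == 63.  */
    mask = UINT64_C(2) << last;

    /* Subtracting the first bit leaves exactly bits first..last set.
     * If mask wrapped to 0 above, the unsigned subtraction wraps too
     * and fills bits first..63.
     */
    mask -= UINT64_C(1) << first;
    return mask;
}
```

For instance, range_mask64(4, 7) gives 0xf0 and range_mask64(0, 63) gives
all ones; the more obvious "1 << (last + 1)" would be an out-of-range
shift in the second case.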

* [Qemu-devel] [PATCH v2 36/45] block: implement dirty bitmap using HBitmap
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (34 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 35/45] add hierarchical bitmap data type and test cases Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 37/45] block: make round_to_clusters public Paolo Bonzini
                   ` (9 subsequent siblings)
  45 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

This actually uses HBitmap for the dirty bitmap in the block layer, and
converts mirroring to use an HBitmapIter.

Reviewed-by: Laszlo Ersek <lersek@redhat.com> (except block/mirror.c parts)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
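As a rough illustration of the unit conversions this patch relies on:
hbitmap_set() and hbitmap_reset() take sector numbers, each aligned group
of 2^granularity sectors shares one bit, hbitmap_count() reports dirty
sectors, and bdrv_get_dirty_count() shifts that back down to chunks.  The
sketch below is standalone and calls no QEMU function; the helper names
and the value 11 for the log2 of sectors per chunk (1 MiB chunks of
512-byte sectors) are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration only: 2^11 sectors of 512 bytes = 1 MiB.  */
#define EXAMPLE_LOG_SECTORS_PER_CHUNK 11

/* Hypothetical helper: number of chunks touched by a request covering
 * count sectors starting at sector start.  Both ends are rounded to the
 * chunk they land in, as hbitmap_set() does before setting bits.
 */
static uint64_t chunks_touched(uint64_t start, uint64_t count, int granularity)
{
    uint64_t first, last;

    assert(count > 0);
    first = start >> granularity;
    last = (start + count - 1) >> granularity;
    return last - first + 1;
}

/* Hypothetical helper: the sector count that would be reported if
 * exactly that many chunks were dirty (chunks scaled by granularity).
 */
static uint64_t chunks_to_sectors(uint64_t chunks, int granularity)
{
    return chunks << granularity;
}
```

A two-sector write that straddles a chunk boundary, for example
chunks_touched(2047, 2, EXAMPLE_LOG_SECTORS_PER_CHUNK), dirties two
chunks, which corresponds to 4096 sectors; shifting that right by 11
again gives the chunk count of 2.
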
 Makefile.objs  |  2 +-
 block.c        | 94 ++++++++++------------------------------------------------
 block.h        |  6 ++--
 block/mirror.c | 12 ++++++--
 block_int.h    |  4 +--
 trace-events   |  1 +
 6 files changed, 33 insertions(+), 86 deletions(-)

diff --git a/Makefile.objs b/Makefile.objs
index 9401516..b7bf9f1 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -43,7 +43,7 @@ coroutine-obj-$(CONFIG_WIN32) += coroutine-win32.o
 
 block-obj-y = cutils.o iov.o cache-utils.o qemu-option.o module.o async.o
 block-obj-y += nbd.o block.o blockjob.o aio.o aes.o qemu-config.o
-block-obj-y += qemu-progress.o qemu-sockets.o
+block-obj-y += qemu-progress.o qemu-sockets.o hbitmap.o
 block-obj-y += $(coroutine-obj-y) $(qobject-obj-y) $(version-obj-y)
 block-obj-$(CONFIG_POSIX) += posix-aio-compat.o
 block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
diff --git a/block.c b/block.c
index 0f57ec1..af4237b 100644
--- a/block.c
+++ b/block.c
@@ -1267,7 +1267,6 @@ static void bdrv_move_feature_fields(BlockDriverState *bs_dest,
     bs_dest->iostatus           = bs_src->iostatus;
 
     /* dirty bitmap */
-    bs_dest->dirty_count        = bs_src->dirty_count;
     bs_dest->dirty_bitmap       = bs_src->dirty_bitmap;
 
     /* job */
@@ -1872,36 +1871,6 @@ int bdrv_read_unthrottled(BlockDriverState *bs, int64_t sector_num,
     return ret;
 }
 
-#define BITS_PER_LONG  (sizeof(unsigned long) * 8)
-
-static void set_dirty_bitmap(BlockDriverState *bs, int64_t sector_num,
-                             int nb_sectors, int dirty)
-{
-    int64_t start, end;
-    unsigned long val, idx, bit;
-
-    start = sector_num / BDRV_SECTORS_PER_DIRTY_CHUNK;
-    end = (sector_num + nb_sectors - 1) / BDRV_SECTORS_PER_DIRTY_CHUNK;
-
-    for (; start <= end; start++) {
-        idx = start / BITS_PER_LONG;
-        bit = start % BITS_PER_LONG;
-        val = bs->dirty_bitmap[idx];
-        if (dirty) {
-            if (!(val & (1UL << bit))) {
-                bs->dirty_count++;
-                val |= 1UL << bit;
-            }
-        } else {
-            if (val & (1UL << bit)) {
-                bs->dirty_count--;
-                val &= ~(1UL << bit);
-            }
-        }
-        bs->dirty_bitmap[idx] = val;
-    }
-}
-
 /* Return < 0 if error. Important errors are:
   -EIO         generic I/O error (may happen for all errors)
   -ENOMEDIUM   No media inserted.
@@ -4044,18 +4013,15 @@ void bdrv_set_dirty_tracking(BlockDriverState *bs, int enable)
 {
     int64_t bitmap_size;
 
-    bs->dirty_count = 0;
     if (enable) {
         if (!bs->dirty_bitmap) {
-            bitmap_size = (bdrv_getlength(bs) >> BDRV_SECTOR_BITS) +
-                    BDRV_SECTORS_PER_DIRTY_CHUNK * BITS_PER_LONG - 1;
-            bitmap_size /= BDRV_SECTORS_PER_DIRTY_CHUNK * BITS_PER_LONG;
-
-            bs->dirty_bitmap = g_new0(unsigned long, bitmap_size);
+            bitmap_size = (bdrv_getlength(bs) >> BDRV_SECTOR_BITS);
+            bs->dirty_bitmap = hbitmap_alloc(bitmap_size,
+                                             BDRV_LOG_SECTORS_PER_DIRTY_CHUNK);
         }
     } else {
         if (bs->dirty_bitmap) {
-            g_free(bs->dirty_bitmap);
+            hbitmap_free(bs->dirty_bitmap);
             bs->dirty_bitmap = NULL;
         }
     }
@@ -4063,67 +4029,37 @@ void bdrv_set_dirty_tracking(BlockDriverState *bs, int enable)
 
 int bdrv_get_dirty(BlockDriverState *bs, int64_t sector)
 {
-    int64_t chunk = sector / (int64_t)BDRV_SECTORS_PER_DIRTY_CHUNK;
-
-    if (bs->dirty_bitmap &&
-        (sector << BDRV_SECTOR_BITS) < bdrv_getlength(bs)) {
-        return !!(bs->dirty_bitmap[chunk / BITS_PER_LONG] &
-            (1UL << (chunk % BITS_PER_LONG)));
+    if (bs->dirty_bitmap) {
+        return hbitmap_get(bs->dirty_bitmap, sector);
     } else {
         return 0;
     }
 }
 
-int64_t bdrv_get_next_dirty(BlockDriverState *bs, int64_t sector)
+void bdrv_dirty_iter_init(BlockDriverState *bs, HBitmapIter *hbi)
 {
-    int64_t chunk;
-    int bit, elem;
-
-    /* Avoid an infinite loop.  */
-    assert(bs->dirty_count > 0);
-
-    sector = (sector | (BDRV_SECTORS_PER_DIRTY_CHUNK - 1)) + 1;
-    chunk = sector / (int64_t)BDRV_SECTORS_PER_DIRTY_CHUNK;
-
-    QEMU_BUILD_BUG_ON(sizeof(bs->dirty_bitmap[0]) * 8 != BITS_PER_LONG);
-    elem = chunk / BITS_PER_LONG;
-    bit = chunk % BITS_PER_LONG;
-    for (;;) {
-        if (sector >= bs->total_sectors) {
-            sector = 0;
-            bit = elem = 0;
-        }
-        if (bit == 0 && bs->dirty_bitmap[elem] == 0) {
-            sector += BDRV_SECTORS_PER_DIRTY_CHUNK * BITS_PER_LONG;
-            elem++;
-        } else {
-            if (bs->dirty_bitmap[elem] & (1UL << bit)) {
-                return sector;
-            }
-            sector += BDRV_SECTORS_PER_DIRTY_CHUNK;
-            if (++bit == BITS_PER_LONG) {
-                bit = 0;
-                elem++;
-            }
-        }
-    }
+    hbitmap_iter_init(hbi, bs->dirty_bitmap, 0);
 }
 
 void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
                     int nr_sectors)
 {
-    set_dirty_bitmap(bs, cur_sector, nr_sectors, 1);
+    hbitmap_set(bs->dirty_bitmap, cur_sector, nr_sectors);
 }
 
 void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector,
                       int nr_sectors)
 {
-    set_dirty_bitmap(bs, cur_sector, nr_sectors, 0);
+    hbitmap_reset(bs->dirty_bitmap, cur_sector, nr_sectors);
 }
 
 int64_t bdrv_get_dirty_count(BlockDriverState *bs)
 {
-    return bs->dirty_count;
+    if (bs->dirty_bitmap) {
+        return hbitmap_count(bs->dirty_bitmap) >> BDRV_LOG_SECTORS_PER_DIRTY_CHUNK;
+    } else {
+        return 0;
+    }
 }
 
 void bdrv_set_in_use(BlockDriverState *bs, int in_use)
diff --git a/block.h b/block.h
index 9097edd..676135d 100644
--- a/block.h
+++ b/block.h
@@ -344,13 +344,15 @@ int bdrv_img_create(const char *filename, const char *fmt,
 void bdrv_set_buffer_alignment(BlockDriverState *bs, int align);
 void *qemu_blockalign(BlockDriverState *bs, size_t size);
 
-#define BDRV_SECTORS_PER_DIRTY_CHUNK 2048
+#define BDRV_SECTORS_PER_DIRTY_CHUNK     (1 << BDRV_LOG_SECTORS_PER_DIRTY_CHUNK)
+#define BDRV_LOG_SECTORS_PER_DIRTY_CHUNK 11
 
+struct HBitmapIter;
 void bdrv_set_dirty_tracking(BlockDriverState *bs, int enable);
 int bdrv_get_dirty(BlockDriverState *bs, int64_t sector);
 void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector, int nr_sectors);
 void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector, int nr_sectors);
-int64_t bdrv_get_next_dirty(BlockDriverState *bs, int64_t sector);
+void bdrv_dirty_iter_init(BlockDriverState *bs, struct HBitmapIter *hbi);
 int64_t bdrv_get_dirty_count(BlockDriverState *bs);
 
 void bdrv_enable_copy_on_read(BlockDriverState *bs);
diff --git a/block/mirror.c b/block/mirror.c
index caec272..72e0986 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -36,6 +36,7 @@ typedef struct MirrorBlockJob {
     bool synced;
     bool complete;
     int64_t sector_num;
+    HBitmapIter hbi;
     uint8_t *buf;
 } MirrorBlockJob;
 
@@ -62,8 +63,15 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
     int64_t end;
     struct iovec iov;
 
+    s->sector_num = hbitmap_iter_next(&s->hbi);
+    if (s->sector_num < 0) {
+        bdrv_dirty_iter_init(source, &s->hbi);
+        s->sector_num = hbitmap_iter_next(&s->hbi);
+        trace_mirror_restart_iter(s, bdrv_get_dirty_count(source));
+        assert(s->sector_num >= 0);
+    }
+
     end = s->common.len >> BDRV_SECTOR_BITS;
-    s->sector_num = bdrv_get_next_dirty(source, s->sector_num);
     nb_sectors = MIN(BDRV_SECTORS_PER_DIRTY_CHUNK, end - s->sector_num);
     bdrv_reset_dirty(source, s->sector_num, nb_sectors);
 
@@ -136,7 +144,7 @@ static void coroutine_fn mirror_run(void *opaque)
         }
     }
 
-    s->sector_num = -1;
+    bdrv_dirty_iter_init(bs, &s->hbi);
     for (;;) {
         uint64_t delay_ns;
         int64_t cnt;
diff --git a/block_int.h b/block_int.h
index a533c7b..c09a679 100644
--- a/block_int.h
+++ b/block_int.h
@@ -32,6 +32,7 @@
 #include "qapi-types.h"
 #include "qerror.h"
 #include "monitor.h"
+#include "hbitmap.h"
 
 #define BLOCK_FLAG_ENCRYPT          1
 #define BLOCK_FLAG_COMPAT6          4
@@ -267,8 +268,7 @@ struct BlockDriverState {
     bool iostatus_enabled;
     BlockDeviceIoStatus iostatus;
     char device_name[32];
-    unsigned long *dirty_bitmap;
-    int64_t dirty_count;
+    HBitmap *dirty_bitmap;
     int in_use; /* users other than guest access, eg. block migration */
     QTAILQ_ENTRY(BlockDriverState) list;
 
diff --git a/trace-events b/trace-events
index e2aa164..99818d5 100644
--- a/trace-events
+++ b/trace-events
@@ -77,6 +77,7 @@ stream_start(void *bs, void *base, void *s, void *co, void *opaque) "bs %p base
 
 # block/mirror.c
 mirror_start(void *bs, void *s, void *co, void *opaque) "bs %p s %p co %p opaque %p"
+mirror_restart_iter(void *s, int64_t cnt) "s %p dirty count %"PRId64
 mirror_before_flush(void *s) "s %p"
 mirror_before_drain(void *s, int64_t cnt) "s %p dirty count %"PRId64
 mirror_before_sleep(void *s, int64_t cnt, int synced) "s %p dirty count %"PRId64" synced %d"
-- 
1.7.12
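
For readers following the mirror_iteration hunk above, here is a minimal
sketch of the "restart the iterator when it runs off the end" pattern.
It is not the HBitmap code: DirtyIter, dirty_iter_init, dirty_iter_next
and next_dirty_wrapping are hypothetical names, and a flat byte array
stands in for the real bitmap.  The only behaviour assumed from the patch
is that the iterator returns a negative value once it is exhausted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for HBitmapIter: a cursor over a flat array. */
typedef struct {
    const uint8_t *bits;   /* one byte per chunk, nonzero means dirty */
    int64_t nb_chunks;     /* number of entries in bits[] */
    int64_t pos;           /* next chunk to examine */
} DirtyIter;

static void dirty_iter_init(DirtyIter *it, const uint8_t *bits,
                            int64_t nb_chunks)
{
    it->bits = bits;
    it->nb_chunks = nb_chunks;
    it->pos = 0;
}

/* Return the next dirty chunk at or after the cursor, or -1 at the end. */
static int64_t dirty_iter_next(DirtyIter *it)
{
    while (it->pos < it->nb_chunks) {
        int64_t cur = it->pos++;
        if (it->bits[cur]) {
            return cur;
        }
    }
    return -1;
}

/*
 * Fetch the next dirty chunk, restarting from chunk 0 once if the cursor
 * ran off the end.  The caller guarantees that at least one chunk is
 * dirty, which is why the second lookup may assert.
 */
static int64_t next_dirty_wrapping(DirtyIter *it)
{
    int64_t chunk = dirty_iter_next(it);
    if (chunk < 0) {
        dirty_iter_init(it, it->bits, it->nb_chunks);
        chunk = dirty_iter_next(it);
        assert(chunk >= 0);
    }
    return chunk;
}
```

Unlike the removed bdrv_get_next_dirty, the caller does not pass the
previous sector back in; the cursor lives in the iterator, so the wrap
has to be handled explicitly at the call site.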

* [Qemu-devel] [PATCH v2 37/45] block: make round_to_clusters public
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

This is needed in the following patch.

Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
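A rough sketch of the rounding arithmetic, for reference.  This is not
the body of bdrv_round_to_clusters (which is not shown in this patch);
round_to_clusters_sketch is a hypothetical standalone helper that takes
the cluster size in sectors directly, and it assumes that a cluster size
of 0 means "no cluster information, leave the range alone".

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: widen [sector_num, sector_num + nb_sectors) so
 * that it starts and ends on a cluster boundary.  cluster_sectors == 0
 * is taken to mean that no cluster size is known.
 */
static void round_to_clusters_sketch(int64_t cluster_sectors,
                                     int64_t sector_num, int nb_sectors,
                                     int64_t *cluster_sector_num,
                                     int *cluster_nb_sectors)
{
    int64_t start, span, rounded;

    assert(sector_num >= 0 && nb_sectors >= 0 && cluster_sectors >= 0);
    if (cluster_sectors == 0) {
        *cluster_sector_num = sector_num;
        *cluster_nb_sectors = nb_sectors;
        return;
    }

    /* Align the start down, then align the covered length up. */
    start = sector_num - sector_num % cluster_sectors;
    span = sector_num - start + nb_sectors;
    rounded = (span + cluster_sectors - 1) / cluster_sectors
              * cluster_sectors;

    *cluster_sector_num = start;
    *cluster_nb_sectors = (int)rounded;
}
```

With 128-sector (64 KiB) clusters, a 10-sector request at sector 130
becomes the single cluster starting at sector 128, while a 16-sector
request at sector 120 straddles a boundary and becomes two clusters.
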
 block.c | 16 ++++++++--------
 block.h |  4 ++++
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/block.c b/block.c
index af4237b..16da2a9 100644
--- a/block.c
+++ b/block.c
@@ -1653,10 +1653,10 @@ static void tracked_request_begin(BdrvTrackedRequest *req,
 /**
  * Round a region to cluster boundaries
  */
-static void round_to_clusters(BlockDriverState *bs,
-                              int64_t sector_num, int nb_sectors,
-                              int64_t *cluster_sector_num,
-                              int *cluster_nb_sectors)
+void bdrv_round_to_clusters(BlockDriverState *bs,
+                            int64_t sector_num, int nb_sectors,
+                            int64_t *cluster_sector_num,
+                            int *cluster_nb_sectors)
 {
     BlockDriverInfo bdi;
 
@@ -1698,8 +1698,8 @@ static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
      * CoR read and write operations are atomic and guest writes cannot
      * interleave between them.
      */
-    round_to_clusters(bs, sector_num, nb_sectors,
-                      &cluster_sector_num, &cluster_nb_sectors);
+    bdrv_round_to_clusters(bs, sector_num, nb_sectors,
+                           &cluster_sector_num, &cluster_nb_sectors);
 
     do {
         retry = false;
@@ -2022,8 +2022,8 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BlockDriverState *bs,
     /* Cover entire cluster so no additional backing file I/O is required when
      * allocating cluster in the image file.
      */
-    round_to_clusters(bs, sector_num, nb_sectors,
-                      &cluster_sector_num, &cluster_nb_sectors);
+    bdrv_round_to_clusters(bs, sector_num, nb_sectors,
+                           &cluster_sector_num, &cluster_nb_sectors);
 
     trace_bdrv_co_do_copy_on_readv(bs, sector_num, nb_sectors,
                                    cluster_sector_num, cluster_nb_sectors);
diff --git a/block.h b/block.h
index 676135d..e586d67 100644
--- a/block.h
+++ b/block.h
@@ -303,6 +303,10 @@ int bdrv_get_flags(BlockDriverState *bs);
 int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num,
                           const uint8_t *buf, int nb_sectors);
 int bdrv_get_info(BlockDriverState *bs, BlockDriverInfo *bdi);
+void bdrv_round_to_clusters(BlockDriverState *bs,
+                            int64_t sector_num, int nb_sectors,
+                            int64_t *cluster_sector_num,
+                            int *cluster_nb_sectors);
 
 const char *bdrv_get_encrypted_filename(BlockDriverState *bs);
 void bdrv_get_backing_filename(BlockDriverState *bs,
-- 
1.7.12

* [Qemu-devel] [PATCH v2 38/45] mirror: perform COW if the cluster size is bigger than the granularity
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

While mirroring runs, the backing files for the target may not be ready
yet.  In that case, a copy-on-write operation on the target would fill
the missing sectors with zeros instead of the backing file's data.
Copy-on-write only happens if the granularity of the dirty bitmap is
smaller than the cluster size (and only for clusters that are allocated
in the source after the job has started copying).  So far, the
granularity was fixed to 1MB; to avoid the problem we detected the
situation and required the backing files to be available in that case
only.

However, we want to lower the granularity for efficiency, so we need
a better solution.  The solution is to always copy a whole cluster the
first time it is touched.  The code keeps a bitmap of clusters that
have already been allocated by the mirroring job, and only does "manual"
copy-on-write if the bit for the chunk being copied is still clear in
the bitmap.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: new testcase
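
        A minimal sketch of the first-touch logic described above, for
        illustration only.  CowState, pick_copy_range and CHUNK_SECTORS
        are hypothetical names; a bool array stands in for cow_bitmap,
        and the sketch assumes the cluster size is a power of two that
        is at least one chunk.  Clamping to the end of the image is left
        out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { CHUNK_SECTORS = 2048 };   /* 1 MiB chunks of 512-byte sectors */

/* Hypothetical job state: one flag per dirty-bitmap chunk. */
typedef struct {
    bool *cow_done;           /* true once the enclosing cluster was copied */
    int64_t cluster_sectors;  /* target cluster size, in sectors */
} CowState;

/*
 * Decide which range to copy for the dirty chunk starting at *sector_num.
 * The first time any chunk of a cluster is seen, the range is widened to
 * the whole cluster and every chunk in it is marked as done; afterwards
 * only the single chunk is copied.
 */
static void pick_copy_range(CowState *s, int64_t *sector_num,
                            int64_t *nb_sectors)
{
    int64_t chunk = *sector_num / CHUNK_SECTORS;

    assert(s->cluster_sectors >= CHUNK_SECTORS);
    assert((s->cluster_sectors & (s->cluster_sectors - 1)) == 0);

    *nb_sectors = CHUNK_SECTORS;
    if (!s->cow_done[chunk]) {
        int64_t start = *sector_num & ~(s->cluster_sectors - 1);
        int64_t first = start / CHUNK_SECTORS;
        int64_t count = s->cluster_sectors / CHUNK_SECTORS;
        int64_t i;

        for (i = 0; i < count; i++) {
            s->cow_done[first + i] = true;
        }
        *sector_num = start;
        *nb_sectors = s->cluster_sectors;
    }
}
```

        With 4 MiB clusters, the first dirty chunk seen in a cluster
        turns into a 4 MiB copy; later chunks of the same cluster are
        copied individually.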

 block/mirror.c             | 60 +++++++++++++++++++++++++++++++++++++++-------
 blockdev.c                 | 15 +++---------
 tests/qemu-iotests/040     | 21 ++++++++++++++++
 tests/qemu-iotests/040.out |  4 ++--
 trace-events               |  1 +
 5 files changed, 78 insertions(+), 23 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 72e0986..49f9bde 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -15,6 +15,7 @@
 #include "blockjob.h"
 #include "block_int.h"
 #include "qemu/ratelimit.h"
+#include "bitmap.h"
 
 enum {
     /*
@@ -36,6 +37,8 @@ typedef struct MirrorBlockJob {
     bool synced;
     bool complete;
     int64_t sector_num;
+    size_t buf_size;
+    unsigned long *cow_bitmap;
     HBitmapIter hbi;
     uint8_t *buf;
 } MirrorBlockJob;
@@ -60,7 +63,7 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
     BlockDriverState *target = s->target;
     QEMUIOVector qiov;
     int ret, nb_sectors;
-    int64_t end;
+    int64_t end, sector_num, cluster_num;
     struct iovec iov;
 
     s->sector_num = hbitmap_iter_next(&s->hbi);
@@ -71,22 +74,41 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
         assert(s->sector_num >= 0);
     }
 
+    /* If we have no backing file yet in the destination, and the cluster size
+     * is very large, we need to do COW ourselves.  The first time a cluster is
+     * copied, copy it entirely.
+     *
+     * Because both BDRV_SECTORS_PER_DIRTY_CHUNK and the cluster size are
+     * powers of two, the number of sectors to copy cannot exceed one cluster.
+     */
+    sector_num = s->sector_num;
+    nb_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
+    cluster_num = sector_num / BDRV_SECTORS_PER_DIRTY_CHUNK;
+    if (s->cow_bitmap && !test_bit(cluster_num, s->cow_bitmap)) {
+        trace_mirror_cow(s, sector_num);
+        bdrv_round_to_clusters(s->target,
+                               sector_num, BDRV_SECTORS_PER_DIRTY_CHUNK,
+                               &sector_num, &nb_sectors);
+        bitmap_set(s->cow_bitmap, sector_num / BDRV_SECTORS_PER_DIRTY_CHUNK,
+                   nb_sectors / BDRV_SECTORS_PER_DIRTY_CHUNK);
+    }
+
     end = s->common.len >> BDRV_SECTOR_BITS;
-    nb_sectors = MIN(BDRV_SECTORS_PER_DIRTY_CHUNK, end - s->sector_num);
-    bdrv_reset_dirty(source, s->sector_num, nb_sectors);
+    nb_sectors = MIN(nb_sectors, end - sector_num);
+    bdrv_reset_dirty(source, sector_num, nb_sectors);
 
     /* Copy the dirty cluster.  */
     iov.iov_base = s->buf;
     iov.iov_len  = nb_sectors * 512;
     qemu_iovec_init_external(&qiov, &iov, 1);
 
-    trace_mirror_one_iteration(s, s->sector_num, nb_sectors);
-    ret = bdrv_co_readv(source, s->sector_num, nb_sectors, &qiov);
+    trace_mirror_one_iteration(s, sector_num, nb_sectors);
+    ret = bdrv_co_readv(source, sector_num, nb_sectors, &qiov);
     if (ret < 0) {
         *p_action = mirror_error_action(s, true, -ret);
         goto fail;
     }
-    ret = bdrv_co_writev(target, s->sector_num, nb_sectors, &qiov);
+    ret = bdrv_co_writev(target, sector_num, nb_sectors, &qiov);
     if (ret < 0) {
         *p_action = mirror_error_action(s, false, -ret);
         s->synced = false;
@@ -96,7 +118,7 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
 
 fail:
     /* Try again later.  */
-    bdrv_set_dirty(source, s->sector_num, nb_sectors);
+    bdrv_set_dirty(source, sector_num, nb_sectors);
     return ret;
 }
 
@@ -104,7 +126,9 @@ static void coroutine_fn mirror_run(void *opaque)
 {
     MirrorBlockJob *s = opaque;
     BlockDriverState *bs = s->common.bs;
-    int64_t sector_num, end;
+    int64_t sector_num, end, length;
+    BlockDriverInfo bdi;
+    char backing_filename[1024];
     int ret = 0;
     int n;
 
@@ -118,8 +142,23 @@ static void coroutine_fn mirror_run(void *opaque)
         return;
     }
 
+    /* If we have no backing file yet in the destination, we cannot let
+     * the destination do COW.  Instead, we copy sectors around the
+     * dirty data if needed.  We need a bitmap to do that.
+     */
+    bdrv_get_backing_filename(s->target, backing_filename,
+                              sizeof(backing_filename));
+    if (backing_filename[0] && !s->target->backing_hd) {
+        bdrv_get_info(s->target, &bdi);
+        if (s->buf_size < bdi.cluster_size) {
+            s->buf_size = bdi.cluster_size;
+            length = (bdrv_getlength(bs) + BLOCK_SIZE - 1) / BLOCK_SIZE;
+            s->cow_bitmap = bitmap_new(length);
+        }
+    }
+
     end = s->common.len >> BDRV_SECTOR_BITS;
-    s->buf = qemu_blockalign(bs, BLOCK_SIZE);
+    s->buf = qemu_blockalign(bs, s->buf_size);
 
     if (s->mode != MIRROR_SYNC_MODE_NONE) {
         /* First part, loop on the sectors and initialize the dirty bitmap.  */
@@ -233,6 +272,7 @@ static void coroutine_fn mirror_run(void *opaque)
 
 immediate_exit:
     g_free(s->buf);
+    g_free(s->cow_bitmap);
     bdrv_set_dirty_tracking(bs, false);
     bdrv_iostatus_disable(s->target);
     if (s->complete && ret == 0) {
@@ -316,6 +356,8 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
     s->on_target_error = on_target_error;
     s->target = target;
     s->mode = mode;
+    s->buf_size = BLOCK_SIZE;
+
     bdrv_set_dirty_tracking(bs, true);
     bdrv_set_enable_write_cache(s->target, true);
     bdrv_set_on_error(s->target, on_target_error, on_target_error);
diff --git a/blockdev.c b/blockdev.c
index 84fee2f..c989ce6 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1126,7 +1126,6 @@ void qmp_drive_mirror(const char *device, const char *target,
                       bool has_on_target_error, BlockdevOnError on_target_error,
                       Error **errp)
 {
-    BlockDriverInfo bdi;
     BlockDriverState *bs;
     BlockDriverState *source, *target_bs;
     BlockDriver *proto_drv;
@@ -1217,6 +1216,9 @@ void qmp_drive_mirror(const char *device, const char *target,
         return;
     }
 
+    /* Mirroring takes care of copy-on-write using the source's backing
+     * file.
+     */
     target_bs = bdrv_new("");
     ret = bdrv_open(target_bs, target, flags | BDRV_O_NO_BACKING, drv);
 
@@ -1226,17 +1228,6 @@ void qmp_drive_mirror(const char *device, const char *target,
         return;
     }
 
-    /* We need a backing file if we will copy parts of a cluster.  */
-    if (bdrv_get_info(target_bs, &bdi) >= 0 && bdi.cluster_size != 0 &&
-        bdi.cluster_size >= BDRV_SECTORS_PER_DIRTY_CHUNK * 512) {
-        ret = bdrv_open_backing_file(target_bs);
-        if (ret < 0) {
-            bdrv_delete(target_bs);
-            error_set(errp, QERR_OPEN_FILE_FAILED, target);
-            return;
-        }
-    }
-
     mirror_start(bs, target_bs, speed, sync, on_source_error, on_target_error,
                  block_job_cb, bs, &local_err);
     if (local_err != NULL) {
diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
index ec86c70..39d07a6 100755
--- a/tests/qemu-iotests/040
+++ b/tests/qemu-iotests/040
@@ -283,6 +283,27 @@ class TestMirrorNoBacking(ImageMirroringTestCase):
         self.assertTrue(self.compare_images(test_img, target_img),
                         'target image does not match source after mirroring')
 
+    def test_large_cluster(self):
+        self.assert_no_active_mirrors()
+
+        # qemu-img create fails if the image is not there
+        qemu_img('create', '-f', iotests.imgfmt, '-o', 'size=%d'
+                        %(TestMirrorNoBacking.image_len), target_backing_img)
+        qemu_img('create', '-f', iotests.imgfmt, '-o', 'cluster_size=%d,backing_file=%s'
+                        % (TestMirrorNoBacking.image_len, target_backing_img), target_img)
+        os.remove(target_backing_img)
+
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             mode='existing', target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        self.complete_and_wait()
+        result = self.vm.qmp('query-block')
+        self.assert_qmp(result, 'return[0]/inserted/file', target_img)
+        self.vm.shutdown()
+        self.assertTrue(self.compare_images(test_img, target_img),
+                        'target image does not match source after mirroring')
+
 class TestReadErrors(ImageMirroringTestCase):
     image_len = 2 * 1024 * 1024 # MB
 
diff --git a/tests/qemu-iotests/040.out b/tests/qemu-iotests/040.out
index b6f2576..52d796e 100644
--- a/tests/qemu-iotests/040.out
+++ b/tests/qemu-iotests/040.out
@@ -1,5 +1,5 @@
-................
+.................
 ----------------------------------------------------------------------
-Ran 16 tests
+Ran 17 tests
 
 OK
diff --git a/trace-events b/trace-events
index 99818d5..8bca020 100644
--- a/trace-events
+++ b/trace-events
@@ -82,6 +82,7 @@ mirror_before_flush(void *s) "s %p"
 mirror_before_drain(void *s, int64_t cnt) "s %p dirty count %"PRId64
 mirror_before_sleep(void *s, int64_t cnt, int synced) "s %p dirty count %"PRId64" synced %d"
 mirror_one_iteration(void *s, int64_t sector_num, int nb_sectors) "s %p sector_num %"PRId64" nb_sectors %d"
+mirror_cow(void *s, int64_t sector_num) "s %p sector_num %"PRId64
 
 # blockdev.c
 qmp_block_job_cancel(void *job) "job %p"
-- 
1.7.12

* [Qemu-devel] [PATCH v2 39/45] block: return count of dirty sectors, not chunks
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
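A small arithmetic check of the unit change, as a sketch.  The constants
and function names below are hypothetical mirrors of the 1 MiB chunk
size used so far; the point is only that the old progress formula
(dirty count in chunks) and the new one (dirty count in sectors) agree.

```c
#include <assert.h>
#include <stdint.h>

enum {
    SECTOR_BITS   = 9,
    SECTOR_SIZE   = 1 << SECTOR_BITS,          /* 512 bytes */
    CHUNK_SECTORS = 2048,                      /* sectors per dirty chunk */
    CHUNK_BYTES   = CHUNK_SECTORS * SECTOR_SIZE /* 1 MiB */
};

/* Old convention: the dirty count is a number of chunks. */
static int64_t progress_from_chunks(int64_t end_sectors,
                                    int64_t dirty_chunks)
{
    return end_sectors * SECTOR_SIZE - dirty_chunks * CHUNK_BYTES;
}

/* New convention: the dirty count is a number of sectors. */
static int64_t progress_from_sectors(int64_t end_sectors,
                                     int64_t dirty_sectors)
{
    return (end_sectors - dirty_sectors) * SECTOR_SIZE;
}
```

For a 10 MiB image with 3 dirty chunks (6144 dirty sectors), both give
7 MiB of progress.  Once the count is in sectors, callers no longer need
to know the chunk size.
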
 block-migration.c | 2 +-
 block.c           | 4 ++--
 block/mirror.c    | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/block-migration.c b/block-migration.c
index 7def8ab..07eafd3 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -485,7 +485,7 @@ static int64_t get_remaining_dirty(void)
         dirty += bdrv_get_dirty_count(bmds->bs);
     }
 
-    return dirty * BLOCK_SIZE;
+    return dirty << BDRV_SECTOR_BITS;
 }
 
 static int is_stage2_completed(void)
diff --git a/block.c b/block.c
index 16da2a9..6fe6491 100644
--- a/block.c
+++ b/block.c
@@ -2669,7 +2669,7 @@ BlockInfo *bdrv_query_info(BlockDriverState *bs)
     if (bs->dirty_bitmap) {
         info->has_dirty = true;
         info->dirty = g_malloc0(sizeof(*info->dirty));
-        info->dirty->count = bdrv_get_dirty_count(bs) * BDRV_SECTORS_PER_DIRTY_CHUNK;
+        info->dirty->count = bdrv_get_dirty_count(bs);
     }
 
     if (bs->drv) {
@@ -4056,7 +4056,7 @@ void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector,
 int64_t bdrv_get_dirty_count(BlockDriverState *bs)
 {
     if (bs->dirty_bitmap) {
-        return hbitmap_count(bs->dirty_bitmap) >> BDRV_LOG_SECTORS_PER_DIRTY_CHUNK;
+        return hbitmap_count(bs->dirty_bitmap);
     } else {
         return 0;
     }
diff --git a/block/mirror.c b/block/mirror.c
index 49f9bde..179406b 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -242,7 +242,7 @@ static void coroutine_fn mirror_run(void *opaque)
         trace_mirror_before_sleep(s, cnt, s->synced);
         if (!s->synced) {
             /* Publish progress */
-            s->common.offset = end * BDRV_SECTOR_SIZE - cnt * BLOCK_SIZE;
+            s->common.offset = (end - cnt) * BDRV_SECTOR_SIZE;
 
             if (s->common.speed) {
                 delay_ns = ratelimit_calculate_delay(&s->limit, BDRV_SECTORS_PER_DIRTY_CHUNK);
-- 
1.7.12

* [Qemu-devel] [PATCH v2 40/45] block: allow customizing the granularity of the dirty bitmap
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: change the argument of bdrv_set_dirty_tracking from sectors
        to bytes.
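
        A sketch of the bytes-to-log2-sectors conversion, for reference.
        The function names are hypothetical and a portable loop replaces
        ffs().  Rejecting granularities below one sector is an assumption
        of this sketch, not something the patch checks.

```c
#include <assert.h>
#include <stdint.h>

enum { SECTOR_BITS = 9 };   /* 512-byte sectors */

/*
 * Convert a granularity in bytes to log2 of the granularity in sectors.
 * Returns -1 for granularity 0, which stands for "tracking disabled".
 */
static int granularity_to_log2_sectors(int granularity_bytes)
{
    int sectors, shift = 0;

    assert(granularity_bytes >= 0);
    /* Zero or a power of two: at most one bit may be set. */
    assert((granularity_bytes & (granularity_bytes - 1)) == 0);
    if (granularity_bytes == 0) {
        return -1;
    }

    sectors = granularity_bytes >> SECTOR_BITS;
    assert(sectors > 0);   /* sketch-only check: at least one sector */
    while ((sectors >> shift) != 1) {
        shift++;
    }
    return shift;
}

/* Inverse direction, as used when reporting the granularity in bytes. */
static int64_t log2_sectors_to_granularity(int log2_sectors)
{
    return (int64_t)(1 << SECTOR_BITS) << log2_sectors;
}
```

        A 1 MiB granularity maps to 11, matching the value of
        BDRV_LOG_SECTORS_PER_DIRTY_CHUNK that this patch removes.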

 block-migration.c |  5 +++--
 block.c           | 17 ++++++++++-------
 block.h           |  5 +----
 block/mirror.c    | 14 ++++----------
 qapi-schema.json  |  4 +++-
 5 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/block-migration.c b/block-migration.c
index 07eafd3..d5f0669 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -23,7 +23,8 @@
 #include "blockdev.h"
 #include <assert.h>
 
-#define BLOCK_SIZE (BDRV_SECTORS_PER_DIRTY_CHUNK << BDRV_SECTOR_BITS)
+#define BLOCK_SIZE                       (1 << 20)
+#define BDRV_SECTORS_PER_DIRTY_CHUNK     (BLOCK_SIZE >> BDRV_SECTOR_BITS)
 
 #define BLK_MIG_FLAG_DEVICE_BLOCK       0x01
 #define BLK_MIG_FLAG_EOS                0x02
@@ -264,7 +265,7 @@ static void set_dirty_tracking(int enable)
     BlkMigDevState *bmds;
 
     QSIMPLEQ_FOREACH(bmds, &block_mig_state.bmds_list, entry) {
-        bdrv_set_dirty_tracking(bmds->bs, enable);
+        bdrv_set_dirty_tracking(bmds->bs, enable ? BLOCK_SIZE : 0);
     }
 }
 
diff --git a/block.c b/block.c
index 6fe6491..b488c14 100644
--- a/block.c
+++ b/block.c
@@ -2670,6 +2670,8 @@ BlockInfo *bdrv_query_info(BlockDriverState *bs)
         info->has_dirty = true;
         info->dirty = g_malloc0(sizeof(*info->dirty));
         info->dirty->count = bdrv_get_dirty_count(bs);
+        info->dirty->granularity =
+            ((int64_t) BDRV_SECTOR_SIZE << hbitmap_granularity(bs->dirty_bitmap));
     }
 
     if (bs->drv) {
@@ -4009,16 +4011,17 @@ void *qemu_blockalign(BlockDriverState *bs, size_t size)
     return qemu_memalign((bs && bs->buffer_alignment) ? bs->buffer_alignment : 512, size);
 }
 
-void bdrv_set_dirty_tracking(BlockDriverState *bs, int enable)
+void bdrv_set_dirty_tracking(BlockDriverState *bs, int granularity)
 {
     int64_t bitmap_size;
 
-    if (enable) {
-        if (!bs->dirty_bitmap) {
-            bitmap_size = (bdrv_getlength(bs) >> BDRV_SECTOR_BITS);
-            bs->dirty_bitmap = hbitmap_alloc(bitmap_size,
-                                             BDRV_LOG_SECTORS_PER_DIRTY_CHUNK);
-        }
+    assert((granularity & (granularity - 1)) == 0);
+
+    if (granularity) {
+        granularity >>= BDRV_SECTOR_BITS;
+        assert(!bs->dirty_bitmap);
+        bitmap_size = (bdrv_getlength(bs) >> BDRV_SECTOR_BITS);
+        bs->dirty_bitmap = hbitmap_alloc(bitmap_size, ffs(granularity) - 1);
     } else {
         if (bs->dirty_bitmap) {
             hbitmap_free(bs->dirty_bitmap);
diff --git a/block.h b/block.h
index e586d67..b0985e2 100644
--- a/block.h
+++ b/block.h
@@ -348,11 +348,8 @@ int bdrv_img_create(const char *filename, const char *fmt,
 void bdrv_set_buffer_alignment(BlockDriverState *bs, int align);
 void *qemu_blockalign(BlockDriverState *bs, size_t size);
 
-#define BDRV_SECTORS_PER_DIRTY_CHUNK     (1 << BDRV_LOG_SECTORS_PER_DIRTY_CHUNK)
-#define BDRV_LOG_SECTORS_PER_DIRTY_CHUNK 11
-
 struct HBitmapIter;
-void bdrv_set_dirty_tracking(BlockDriverState *bs, int enable);
+void bdrv_set_dirty_tracking(BlockDriverState *bs, int granularity);
 int bdrv_get_dirty(BlockDriverState *bs, int64_t sector);
 void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector, int nr_sectors);
 void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector, int nr_sectors);
diff --git a/block/mirror.c b/block/mirror.c
index 179406b..eba6259 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -17,14 +17,8 @@
 #include "qemu/ratelimit.h"
 #include "bitmap.h"
 
-enum {
-    /*
-     * Size of data buffer for populating the image file.  This should be large
-     * enough to process multiple clusters in a single call, so that populating
-     * contiguous regions of the image is efficient.
-     */
-    BLOCK_SIZE = 512 * BDRV_SECTORS_PER_DIRTY_CHUNK, /* in bytes */
-};
+#define BLOCK_SIZE                       (1 << 20)
+#define BDRV_SECTORS_PER_DIRTY_CHUNK     (BLOCK_SIZE >> BDRV_SECTOR_BITS)
 
 #define SLICE_TIME 100000000ULL /* ns */
 
@@ -273,7 +267,7 @@ static void coroutine_fn mirror_run(void *opaque)
 immediate_exit:
     g_free(s->buf);
     g_free(s->cow_bitmap);
-    bdrv_set_dirty_tracking(bs, false);
+    bdrv_set_dirty_tracking(bs, 0);
     bdrv_iostatus_disable(s->target);
     if (s->complete && ret == 0) {
         bdrv_swap(s->target, s->common.bs);
@@ -358,7 +352,7 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
     s->mode = mode;
     s->buf_size = BLOCK_SIZE;
 
-    bdrv_set_dirty_tracking(bs, true);
+    bdrv_set_dirty_tracking(bs, BLOCK_SIZE);
     bdrv_set_enable_write_cache(s->target, true);
     bdrv_set_on_error(s->target, on_target_error, on_target_error);
     bdrv_iostatus_enable(s->target);
diff --git a/qapi-schema.json b/qapi-schema.json
index 2947206..dc28685 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -628,10 +628,12 @@
 #
 # @count: number of dirty sectors according to the dirty bitmap
 #
+# @granularity: granularity of the dirty bitmap in bytes
+#
 # Since: 1.3
 ##
 { 'type': 'BlockDirtyInfo',
-  'data': {'count': 'int'} }
+  'data': {'count': 'int', 'granularity': 'int'} }
 
 ##
 # @BlockInfo:
-- 
1.7.12

* [Qemu-devel] [PATCH v2 41/45] mirror: allow customizing the granularity
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

The desired granularity may be very different depending on the kind of
operation (e.g. continuous replication vs. collapse-to-raw) and whether
the VM is expected to perform lots of I/O while mirroring is in progress.

Allow the user to customize it, while providing a sane default so that
in general there will be no extra allocated space in the target compared
to the source.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: pick granularity in mirror_start, not qmp_drive_mirror.
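
        A sketch of the two chunk computations that now depend on the
        run-time granularity, for illustration.  chunk_count and
        next_chunk_boundary are hypothetical names; the expressions
        follow the ones used for the cow_bitmap length and for "next"
        in mirror_run, and assume a power-of-two granularity of at
        least one sector.

```c
#include <assert.h>
#include <stdint.h>

enum { SECTOR_BITS = 9 };   /* 512-byte sectors */

/* Number of chunks needed to cover an image of length_bytes bytes. */
static int64_t chunk_count(int64_t length_bytes, int64_t granularity)
{
    assert(granularity > 0 && (granularity & (granularity - 1)) == 0);
    return (length_bytes + granularity - 1) / granularity;
}

/* First sector of the chunk that follows the one containing sector_num. */
static int64_t next_chunk_boundary(int64_t sector_num, int64_t granularity)
{
    int64_t nb_sectors_chunk = granularity >> SECTOR_BITS;

    assert(nb_sectors_chunk > 0);
    assert((nb_sectors_chunk & (nb_sectors_chunk - 1)) == 0);
    /* Setting the low bits jumps to the chunk's last sector; +1 leaves it. */
    return (sector_num | (nb_sectors_chunk - 1)) + 1;
}
```

        With a 64 KiB granularity a chunk is 128 sectors, so sector 130
        lies in the chunk [128, 256) and the next boundary is 256.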

 block/mirror.c   | 50 ++++++++++++++++++++++++++++++++------------------
 block_int.h      |  3 ++-
 blockdev.c       | 15 ++++++++++++++-
 hmp.c            |  2 +-
 qapi-schema.json |  8 +++++++-
 qmp-commands.hx  |  8 +++++++-
 6 files changed, 63 insertions(+), 23 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index eba6259..335f17c 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -17,9 +17,6 @@
 #include "qemu/ratelimit.h"
 #include "bitmap.h"
 
-#define BLOCK_SIZE                       (1 << 20)
-#define BDRV_SECTORS_PER_DIRTY_CHUNK     (BLOCK_SIZE >> BDRV_SECTOR_BITS)
-
 #define SLICE_TIME 100000000ULL /* ns */
 
 typedef struct MirrorBlockJob {
@@ -31,6 +28,7 @@ typedef struct MirrorBlockJob {
     bool synced;
     bool complete;
     int64_t sector_num;
+    int64_t granularity;
     size_t buf_size;
     unsigned long *cow_bitmap;
     HBitmapIter hbi;
@@ -56,7 +54,7 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
     BlockDriverState *source = s->common.bs;
     BlockDriverState *target = s->target;
     QEMUIOVector qiov;
-    int ret, nb_sectors;
+    int ret, nb_sectors, nb_sectors_chunk;
     int64_t end, sector_num, cluster_num;
     struct iovec iov;
 
@@ -72,19 +70,19 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
      * is very large, we need to do COW ourselves.  The first time a cluster is
      * copied, copy it entirely.
      *
-     * Because both BDRV_SECTORS_PER_DIRTY_CHUNK and the cluster size are
-     * powers of two, the number of sectors to copy cannot exceed one cluster.
+     * Because both the granularity and the cluster size are powers of two, the
+     * number of sectors to copy cannot exceed one cluster.
      */
     sector_num = s->sector_num;
-    nb_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
-    cluster_num = sector_num / BDRV_SECTORS_PER_DIRTY_CHUNK;
+    nb_sectors_chunk = nb_sectors = s->granularity >> BDRV_SECTOR_BITS;
+    cluster_num = sector_num / nb_sectors_chunk;
     if (s->cow_bitmap && !test_bit(cluster_num, s->cow_bitmap)) {
         trace_mirror_cow(s, sector_num);
         bdrv_round_to_clusters(s->target,
-                               sector_num, BDRV_SECTORS_PER_DIRTY_CHUNK,
+                               sector_num, nb_sectors_chunk,
                                &sector_num, &nb_sectors);
-        bitmap_set(s->cow_bitmap, sector_num / BDRV_SECTORS_PER_DIRTY_CHUNK,
-                   nb_sectors / BDRV_SECTORS_PER_DIRTY_CHUNK);
+        bitmap_set(s->cow_bitmap, sector_num / nb_sectors_chunk,
+                   nb_sectors / nb_sectors_chunk);
     }
 
     end = s->common.len >> BDRV_SECTOR_BITS;
@@ -120,7 +118,7 @@ static void coroutine_fn mirror_run(void *opaque)
 {
     MirrorBlockJob *s = opaque;
     BlockDriverState *bs = s->common.bs;
-    int64_t sector_num, end, length;
+    int64_t sector_num, end, nb_sectors_chunk, length;
     BlockDriverInfo bdi;
     char backing_filename[1024];
     int ret = 0;
@@ -146,20 +144,21 @@ static void coroutine_fn mirror_run(void *opaque)
         bdrv_get_info(s->target, &bdi);
         if (s->buf_size < bdi.cluster_size) {
             s->buf_size = bdi.cluster_size;
-            length = (bdrv_getlength(bs) + BLOCK_SIZE - 1) / BLOCK_SIZE;
+            length = (bdrv_getlength(bs) + s->granularity - 1) / s->granularity;
             s->cow_bitmap = bitmap_new(length);
         }
     }
 
     end = s->common.len >> BDRV_SECTOR_BITS;
     s->buf = qemu_blockalign(bs, s->buf_size);
+    nb_sectors_chunk = s->granularity >> BDRV_SECTOR_BITS;
 
     if (s->mode != MIRROR_SYNC_MODE_NONE) {
         /* First part, loop on the sectors and initialize the dirty bitmap.  */
         BlockDriverState *base;
         base = s->mode == MIRROR_SYNC_MODE_FULL ? NULL : bs->backing_hd;
         for (sector_num = 0; sector_num < end; ) {
-            int64_t next = (sector_num | (BDRV_SECTORS_PER_DIRTY_CHUNK - 1)) + 1;
+            int64_t next = (sector_num | (nb_sectors_chunk - 1)) + 1;
             ret = bdrv_co_is_allocated_above(bs, base,
                                              sector_num, next - sector_num, &n);
 
@@ -239,7 +238,7 @@ static void coroutine_fn mirror_run(void *opaque)
             s->common.offset = (end - cnt) * BDRV_SECTOR_SIZE;
 
             if (s->common.speed) {
-                delay_ns = ratelimit_calculate_delay(&s->limit, BDRV_SECTORS_PER_DIRTY_CHUNK);
+                delay_ns = ratelimit_calculate_delay(&s->limit, nb_sectors_chunk);
             } else {
                 delay_ns = 0;
             }
@@ -326,7 +325,7 @@ static BlockJobType mirror_job_type = {
 };
 
 void mirror_start(BlockDriverState *bs, BlockDriverState *target,
-                  int64_t speed, MirrorSyncMode mode,
+                  int64_t speed, int64_t granularity, MirrorSyncMode mode,
                   BlockdevOnError on_source_error,
                   BlockdevOnError on_target_error,
                   BlockDriverCompletionFunc *cb,
@@ -334,6 +333,20 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
 {
     MirrorBlockJob *s;
 
+    if (granularity == 0) {
+        /* Choose the default granularity based on the target file's cluster
+         * size, clamped between 4k and 64k.  */
+        BlockDriverInfo bdi;
+        if (bdrv_get_info(target, &bdi) >= 0 && bdi.cluster_size != 0) {
+            granularity = MAX(4096, bdi.cluster_size);
+            granularity = MIN(65536, granularity);
+        } else {
+            granularity = 65536;
+        }
+    }
+
+    assert ((granularity & (granularity - 1)) == 0);
+
     if ((on_source_error == BLOCKDEV_ON_ERROR_STOP ||
          on_source_error == BLOCKDEV_ON_ERROR_ENOSPC) &&
         !bdrv_iostatus_is_enabled(bs)) {
@@ -350,9 +363,10 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
     s->on_target_error = on_target_error;
     s->target = target;
     s->mode = mode;
-    s->buf_size = BLOCK_SIZE;
+    s->granularity = granularity;
+    s->buf_size = granularity;
 
-    bdrv_set_dirty_tracking(bs, BLOCK_SIZE);
+    bdrv_set_dirty_tracking(bs, granularity);
     bdrv_set_enable_write_cache(s->target, true);
     bdrv_set_on_error(s->target, on_target_error, on_target_error);
     bdrv_iostatus_enable(s->target);
diff --git a/block_int.h b/block_int.h
index c09a679..032db74 100644
--- a/block_int.h
+++ b/block_int.h
@@ -320,6 +320,7 @@ void stream_start(BlockDriverState *bs, BlockDriverState *base,
  * @bs: Block device to operate on.
  * @target: Block device to write to.
  * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
+ * @granularity: The chosen granularity for the dirty bitmap.
  * @mode: Whether to collapse all images in the chain to the target.
  * @on_source_error: The action to take upon error reading from the source.
  * @on_target_error: The action to take upon error writing to the target.
@@ -333,7 +334,7 @@ void stream_start(BlockDriverState *bs, BlockDriverState *base,
  * @bs will be switched to read from @target.
  */
 void mirror_start(BlockDriverState *bs, BlockDriverState *target,
-                  int64_t speed, MirrorSyncMode mode,
+                  int64_t speed, int64_t granularity, MirrorSyncMode mode,
                   BlockdevOnError on_source_error,
                   BlockdevOnError on_target_error,
                   BlockDriverCompletionFunc *cb,
diff --git a/blockdev.c b/blockdev.c
index c989ce6..c53e861 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1122,6 +1122,7 @@ void qmp_drive_mirror(const char *device, const char *target,
                       enum MirrorSyncMode sync,
                       bool has_mode, enum NewImageMode mode,
                       bool has_speed, int64_t speed,
+                      bool has_granularity, uint32_t granularity,
                       bool has_on_source_error, BlockdevOnError on_source_error,
                       bool has_on_target_error, BlockdevOnError on_target_error,
                       Error **errp)
@@ -1147,6 +1148,17 @@ void qmp_drive_mirror(const char *device, const char *target,
     if (!has_mode) {
         mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
     }
+    if (!has_granularity) {
+        granularity = 0;
+    }
+    if (granularity != 0 && (granularity < 512 || granularity > 1048576 * 64)) {
+        error_set(errp, QERR_INVALID_PARAMETER, device);
+        return;
+    }
+    if (granularity & (granularity - 1)) {
+        error_set(errp, QERR_INVALID_PARAMETER, device);
+        return;
+    }
 
     bs = bdrv_find(device);
     if (!bs) {
@@ -1228,7 +1240,8 @@ void qmp_drive_mirror(const char *device, const char *target,
         return;
     }
 
-    mirror_start(bs, target_bs, speed, sync, on_source_error, on_target_error,
+    mirror_start(bs, target_bs, speed, granularity, sync,
+                 on_source_error, on_target_error,
                  block_job_cb, bs, &local_err);
     if (local_err != NULL) {
         bdrv_delete(target_bs);
diff --git a/hmp.c b/hmp.c
index b4d2736..12d1413 100644
--- a/hmp.c
+++ b/hmp.c
@@ -783,7 +783,7 @@ void hmp_drive_mirror(Monitor *mon, const QDict *qdict)
 
     qmp_drive_mirror(device, filename, !!format, format,
                      full ? MIRROR_SYNC_MODE_FULL : MIRROR_SYNC_MODE_TOP,
-                     true, mode, false, 0,
+                     true, mode, false, 0, false, 0,
                      false, 0, false, 0, &errp);
     hmp_handle_error(mon, &errp);
 }
diff --git a/qapi-schema.json b/qapi-schema.json
index dc28685..1b9a962 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1553,6 +1553,11 @@
 #        (all the disk, only the sectors allocated in the topmost image, or
 #        only new I/O).
 #
+# @granularity: #optional granularity of the dirty bitmap, default is 64K
+#               if the image format doesn't have clusters, 4K if the clusters
+#               are smaller than that, else the cluster size.  Must be a
+#               power of 2 between 512 and 64M.
+#
 # @on-source-error: #optional the action to take on an error on the source,
 #                   default 'report'.  'stop' and 'enospc' can only be used
 #                   if the block device supports io-status (see BlockInfo).
@@ -1569,7 +1574,8 @@
 { 'command': 'drive-mirror',
   'data': { 'device': 'str', 'target': 'str', '*format': 'str',
             'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
-            '*speed': 'int', '*on-source-error': 'BlockdevOnError',
+            '*speed': 'int', '*granularity': 'uint32',
+            '*on-source-error': 'BlockdevOnError',
             '*on-target-error': 'BlockdevOnError' } }
 
 ##
diff --git a/qmp-commands.hx b/qmp-commands.hx
index ec97eaa..e48ef5e 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -908,7 +908,8 @@ EQMP
     {
         .name       = "drive-mirror",
         .args_type  = "sync:s,device:B,target:s,speed:i?,mode:s?,format:s?,"
-                      "on-source-error:s?,on-target-error:s?",
+                      "on-source-error:s?,on-target-error:s?,"
+                      "granularity:i?",
         .mhandler.cmd_new = qmp_marshal_input_drive_mirror,
     },
 
@@ -932,6 +933,7 @@ Arguments:
   file/device (NewImageMode, optional, default 'absolute-paths')
 - "speed": maximum speed of the streaming job, in bytes per second
   (json-int)
+- "granularity": granularity of the dirty bitmap (json-int, optional)
 - "sync": what parts of the disk image should be copied to the destination;
   possibilities include "full" for all the disk, "top" for only the sectors
   allocated in the topmost image, or "none" to only replicate new I/O
@@ -941,6 +943,10 @@ Arguments:
 - "on-target-error": the action to take on an error on the target
   (BlockdevOnError, default 'report')
 
+The default value of the granularity is, if the image format defines
+a cluster size, the cluster size clamped to the range 4096 to 65536.  If it
+does not define a cluster size, the default value of the granularity
+is 65536.
 
 
 Example:
-- 
1.7.12

* [Qemu-devel] [PATCH v2 42/45] mirror: switch mirror_iteration to AIO
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (40 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 41/45] mirror: allow customizing the granularity Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 43/45] mirror: add buf-size argument to drive-mirror Paolo Bonzini
                   ` (3 subsequent siblings)
  45 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

There is really no change in the behavior of the job here, since
there is still a maximum of one in-flight I/O operation between
the source and the target.  However, this patch already introduces
the AIO callbacks (which are unmodified in the next patch)
and some of the logic to count in-flight operations and only
complete the job when there is none.
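
As a rough illustration of the counting logic described above (not code
from this patch), the sketch below models a job that counts an operation
when it is submitted, lets the completion callback record the first error
and drop the count, and only allows the job to finish once the count is
back to zero.  ToyJob, ToyOp and the toy_* functions are hypothetical
names; the real callbacks also consult mirror_error_action() before
recording an error, which is left out here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the job state; only the two fields that the
 * patch adds to MirrorBlockJob are modelled. */
typedef struct ToyJob {
    int in_flight;   /* operations submitted but not yet completed */
    int ret;         /* first error recorded by a callback, or 0 */
} ToyJob;

/* Hypothetical per-operation state handed to the completion callback. */
typedef struct ToyOp {
    ToyJob *job;
    int result;      /* value the simulated I/O will complete with */
} ToyOp;

/* Submit an operation: count it before the (simulated) I/O starts. */
void toy_submit(ToyJob *job, ToyOp *op, int result)
{
    op->job = job;
    op->result = result;
    job->in_flight++;
}

/* Completion callback: remember the first failure, then drop the count. */
void toy_complete(ToyOp *op)
{
    ToyJob *job = op->job;

    if (op->result < 0 && job->ret >= 0) {
        job->ret = op->result;
    }
    job->in_flight--;
}

/* The job may only finish once nothing is in flight. */
int toy_can_finish(const ToyJob *job)
{
    return job->in_flight == 0;
}
```

With a single in-flight operation, as in this patch, the counter only
ever takes the values 0 and 1, but the same bookkeeping keeps working
when more operations are allowed.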

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: some simplification thanks to mirror_error_action

 block/mirror.c | 155 +++++++++++++++++++++++++++++++++++++++++++--------------
 trace-events   |   2 +
 2 files changed, 119 insertions(+), 38 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 335f17c..fc39621 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -33,8 +33,19 @@ typedef struct MirrorBlockJob {
     unsigned long *cow_bitmap;
     HBitmapIter hbi;
     uint8_t *buf;
+
+    int in_flight;
+    int ret;
 } MirrorBlockJob;
 
+typedef struct MirrorOp {
+    MirrorBlockJob *s;
+    QEMUIOVector qiov;
+    struct iovec iov;
+    int64_t sector_num;
+    int nb_sectors;
+} MirrorOp;
+
 static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
                                             int error)
 {
@@ -48,15 +59,60 @@ static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
     }
 }
 
-static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
-                                         BlockErrorAction *p_action)
+static void mirror_iteration_done(MirrorOp *op)
+{
+    MirrorBlockJob *s = op->s;
+
+    s->in_flight--;
+    trace_mirror_iteration_done(s, op->sector_num, op->nb_sectors);
+    g_slice_free(MirrorOp, op);
+    qemu_coroutine_enter(s->common.co, NULL);
+}
+
+static void mirror_write_complete(void *opaque, int ret)
+{
+    MirrorOp *op = opaque;
+    MirrorBlockJob *s = op->s;
+    if (ret < 0) {
+        BlockDriverState *source = s->common.bs;
+        BlockErrorAction action;
+
+        bdrv_set_dirty(source, op->sector_num, op->nb_sectors);
+        action = mirror_error_action(s, false, -ret);
+        if (action == BDRV_ACTION_REPORT && s->ret >= 0) {
+            s->ret = ret;
+        }
+    }
+    mirror_iteration_done(op);
+}
+
+static void mirror_read_complete(void *opaque, int ret)
+{
+    MirrorOp *op = opaque;
+    MirrorBlockJob *s = op->s;
+    if (ret < 0) {
+        BlockDriverState *source = s->common.bs;
+        BlockErrorAction action;
+
+        bdrv_set_dirty(source, op->sector_num, op->nb_sectors);
+        action = mirror_error_action(s, true, -ret);
+        if (action == BDRV_ACTION_REPORT && s->ret >= 0) {
+            s->ret = ret;
+        }
+
+        mirror_iteration_done(op);
+        return;
+    }
+    bdrv_aio_writev(s->target, op->sector_num, &op->qiov, op->nb_sectors,
+                    mirror_write_complete, op);
+}
+
+static void coroutine_fn mirror_iteration(MirrorBlockJob *s)
 {
     BlockDriverState *source = s->common.bs;
-    BlockDriverState *target = s->target;
-    QEMUIOVector qiov;
-    int ret, nb_sectors, nb_sectors_chunk;
+    int nb_sectors, nb_sectors_chunk;
     int64_t end, sector_num, cluster_num;
-    struct iovec iov;
+    MirrorOp *op;
 
     s->sector_num = hbitmap_iter_next(&s->hbi);
     if (s->sector_num < 0) {
@@ -87,31 +143,30 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
 
     end = s->common.len >> BDRV_SECTOR_BITS;
     nb_sectors = MIN(nb_sectors, end - sector_num);
+
+    /* Allocate a MirrorOp that is used as an AIO callback.  */
+    op = g_slice_new(MirrorOp);
+    op->s = s;
+    op->iov.iov_base = s->buf;
+    op->iov.iov_len  = nb_sectors * 512;
+    op->sector_num = sector_num;
+    op->nb_sectors = nb_sectors;
+    qemu_iovec_init_external(&op->qiov, &op->iov, 1);
+
     bdrv_reset_dirty(source, sector_num, nb_sectors);
 
     /* Copy the dirty cluster.  */
-    iov.iov_base = s->buf;
-    iov.iov_len  = nb_sectors * 512;
-    qemu_iovec_init_external(&qiov, &iov, 1);
-
+    s->in_flight++;
     trace_mirror_one_iteration(s, sector_num, nb_sectors);
-    ret = bdrv_co_readv(source, sector_num, nb_sectors, &qiov);
-    if (ret < 0) {
-        *p_action = mirror_error_action(s, true, -ret);
-        goto fail;
-    }
-    ret = bdrv_co_writev(target, sector_num, nb_sectors, &qiov);
-    if (ret < 0) {
-        *p_action = mirror_error_action(s, false, -ret);
-        s->synced = false;
-        goto fail;
-    }
-    return 0;
+    bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
+                   mirror_read_complete, op);
+}
 
-fail:
-    /* Try again later.  */
-    bdrv_set_dirty(source, sector_num, nb_sectors);
-    return ret;
+static void mirror_drain(MirrorBlockJob *s)
+{
+    while (s->in_flight > 0) {
+        qemu_coroutine_yield();
+    }
 }
 
 static void coroutine_fn mirror_run(void *opaque)
@@ -119,6 +174,7 @@ static void coroutine_fn mirror_run(void *opaque)
     MirrorBlockJob *s = opaque;
     BlockDriverState *bs = s->common.bs;
     int64_t sector_num, end, nb_sectors_chunk, length;
+    uint64_t last_pause_ns;
     BlockDriverInfo bdi;
     char backing_filename[1024];
     int ret = 0;
@@ -177,28 +233,43 @@ static void coroutine_fn mirror_run(void *opaque)
     }
 
     bdrv_dirty_iter_init(bs, &s->hbi);
+    last_pause_ns = qemu_get_clock_ns(rt_clock);
     for (;;) {
         uint64_t delay_ns;
         int64_t cnt;
         bool should_complete;
 
+        if (s->ret < 0) {
+            ret = s->ret;
+            break;
+        }
+
         cnt = bdrv_get_dirty_count(bs);
-        if (cnt != 0) {
-            BlockErrorAction action = BDRV_ACTION_REPORT;
-            ret = mirror_iteration(s, &action);
-            if (ret < 0 && action == BDRV_ACTION_REPORT) {
-                goto immediate_exit;
+
+        /* Note that even when no rate limit is applied we need to yield
+         * periodically with no pending I/O so that qemu_aio_flush() returns.
+         * We do so every SLICE_TIME nanoseconds, or when there is an error,
+         * or when the source is clean, whichever comes first.
+         */
+        if (qemu_get_clock_ns(rt_clock) - last_pause_ns < SLICE_TIME &&
+            s->common.iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
+            if (s->in_flight > 0) {
+                trace_mirror_yield(s, cnt, s->in_flight);
+                qemu_coroutine_yield();
+                continue;
+            } else if (cnt != 0) {
+                mirror_iteration(s);
+                continue;
             }
-            cnt = bdrv_get_dirty_count(bs);
         }
 
         should_complete = false;
-        if (cnt == 0) {
+        if (s->in_flight == 0 && cnt == 0) {
             trace_mirror_before_flush(s);
             ret = bdrv_flush(s->target);
             if (ret < 0) {
                 if (mirror_error_action(s, false, -ret) == BDRV_ACTION_REPORT) {
-                    goto immediate_exit;
+                    break;
                 }
             } else {
                 /* We're out of the streaming phase.  From now on, if the job
@@ -243,15 +314,12 @@ static void coroutine_fn mirror_run(void *opaque)
                 delay_ns = 0;
             }
 
-            /* Note that even when no rate limit is applied we need to yield
-             * with no pending I/O here so that qemu_aio_flush() returns.
-             */
             block_job_sleep_ns(&s->common, rt_clock, delay_ns);
             if (block_job_is_cancelled(&s->common)) {
                 break;
             }
         } else if (!should_complete) {
-            delay_ns = (cnt == 0 ? SLICE_TIME : 0);
+            delay_ns = (s->in_flight == 0 && cnt == 0 ? SLICE_TIME : 0);
             block_job_sleep_ns(&s->common, rt_clock, delay_ns);
         } else if (cnt == 0) {
             /* The two disks are in sync.  Exit and report successful
@@ -261,9 +329,20 @@ static void coroutine_fn mirror_run(void *opaque)
             s->common.cancelled = false;
             break;
         }
+        last_pause_ns = qemu_get_clock_ns(rt_clock);
     }
 
 immediate_exit:
+    if (s->in_flight > 0) {
+        /* We get here only if something went wrong.  Either the job failed,
+         * or it was cancelled prematurely so that we do not guarantee that
+         * the target is a copy of the source.
+         */
+        assert(ret < 0 || (!s->synced && block_job_is_cancelled(&s->common)));
+        mirror_drain(s);
+    }
+
+    assert(s->in_flight == 0);
     g_free(s->buf);
     g_free(s->cow_bitmap);
     bdrv_set_dirty_tracking(bs, 0);
diff --git a/trace-events b/trace-events
index 8bca020..ed65538 100644
--- a/trace-events
+++ b/trace-events
@@ -83,6 +83,8 @@ mirror_before_drain(void *s, int64_t cnt) "s %p dirty count %"PRId64
 mirror_before_sleep(void *s, int64_t cnt, int synced) "s %p dirty count %"PRId64" synced %d"
 mirror_one_iteration(void *s, int64_t sector_num, int nb_sectors) "s %p sector_num %"PRId64" nb_sectors %d"
 mirror_cow(void *s, int64_t sector_num) "s %p sector_num %"PRId64
+mirror_iteration_done(void *s, int64_t sector_num, int nb_sectors) "s %p sector_num %"PRId64" nb_sectors %d"
+mirror_yield(void *s, int64_t cnt, int in_flight) "s %p dirty count %"PRId64" in_flight %d"
 
 # blockdev.c
 qmp_block_job_cancel(void *job) "job %p"
-- 
1.7.12

* [Qemu-devel] [PATCH v2 43/45] mirror: add buf-size argument to drive-mirror
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (41 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 42/45] mirror: switch mirror_iteration to AIO Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 44/45] mirror: support more than one in-flight AIO operation Paolo Bonzini
                   ` (2 subsequent siblings)
  45 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

This makes sense when the next commit starts using the extra buffer space
to perform many I/O operations asynchronously.
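
To make the sizing concrete, here is a small sketch (not part of the
patch) of how the requested buffer size relates to the granularity.
effective_buf_size() mirrors the MAX(buf_size, granularity) assignment
in mirror_start(); chunks_in_buffer() is a hypothetical helper that
assumes the buffer is later divided into granularity-sized pieces.

```c
#include <assert.h>
#include <stdint.h>

/* Same rule as the MAX(buf_size, granularity) assignment: the buffer can
 * never be smaller than one dirty-bitmap chunk. */
int64_t effective_buf_size(int64_t buf_size, int64_t granularity)
{
    return buf_size > granularity ? buf_size : granularity;
}

/* Hypothetical helper: number of granularity-sized chunks that fit in
 * the buffer once the rule above has been applied. */
int64_t chunks_in_buffer(int64_t buf_size, int64_t granularity)
{
    return effective_buf_size(buf_size, granularity) / granularity;
}
```

Under these assumptions the 10M default with a 64K granularity gives 160
chunks, while a 4096-byte request is rounded up to a single 64K chunk.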

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: new testcases
 block/mirror.c             |  6 +++---
 block_int.h                |  5 +++--
 blockdev.c                 |  9 ++++++++-
 hmp.c                      |  2 +-
 qapi-schema.json           |  5 ++++-
 qmp-commands.hx            |  4 +++-
 tests/qemu-iotests/040     | 31 +++++++++++++++++++++++++++++++
 tests/qemu-iotests/040.out |  4 ++--
 8 files changed, 55 insertions(+), 11 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index fc39621..e6426bb 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -404,8 +404,8 @@ static BlockJobType mirror_job_type = {
 };
 
 void mirror_start(BlockDriverState *bs, BlockDriverState *target,
-                  int64_t speed, int64_t granularity, MirrorSyncMode mode,
-                  BlockdevOnError on_source_error,
+                  int64_t speed, int64_t granularity, int64_t buf_size,
+                  MirrorSyncMode mode, BlockdevOnError on_source_error,
                   BlockdevOnError on_target_error,
                   BlockDriverCompletionFunc *cb,
                   void *opaque, Error **errp)
@@ -443,7 +443,7 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
     s->target = target;
     s->mode = mode;
     s->granularity = granularity;
-    s->buf_size = granularity;
+    s->buf_size = MAX(buf_size, granularity);
 
     bdrv_set_dirty_tracking(bs, granularity);
     bdrv_set_enable_write_cache(s->target, true);
diff --git a/block_int.h b/block_int.h
index 032db74..78d87fa 100644
--- a/block_int.h
+++ b/block_int.h
@@ -321,6 +321,7 @@ void stream_start(BlockDriverState *bs, BlockDriverState *base,
  * @target: Block device to write to.
  * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
  * @granularity: The chosen granularity for the dirty bitmap.
+ * @buf_size: The amount of data that can be in flight at one time.
  * @mode: Whether to collapse all images in the chain to the target.
  * @on_source_error: The action to take upon error reading from the source.
  * @on_target_error: The action to take upon error writing to the target.
@@ -334,8 +335,8 @@ void stream_start(BlockDriverState *bs, BlockDriverState *base,
  * @bs will be switched to read from @target.
  */
 void mirror_start(BlockDriverState *bs, BlockDriverState *target,
-                  int64_t speed, int64_t granularity, MirrorSyncMode mode,
-                  BlockdevOnError on_source_error,
+                  int64_t speed, int64_t granularity, int64_t buf_size,
+                  MirrorSyncMode mode, BlockdevOnError on_source_error,
                   BlockdevOnError on_target_error,
                   BlockDriverCompletionFunc *cb,
                   void *opaque, Error **errp);
diff --git a/blockdev.c b/blockdev.c
index c53e861..f001bba 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1117,12 +1117,15 @@ void qmp_block_stream(const char *device, bool has_base,
     trace_qmp_block_stream(bs, bs->job);
 }
 
+#define DEFAULT_MIRROR_BUF_SIZE   (10 << 20)
+
 void qmp_drive_mirror(const char *device, const char *target,
                       bool has_format, const char *format,
                       enum MirrorSyncMode sync,
                       bool has_mode, enum NewImageMode mode,
                       bool has_speed, int64_t speed,
                       bool has_granularity, uint32_t granularity,
+                      bool has_buf_size, int64_t buf_size,
                       bool has_on_source_error, BlockdevOnError on_source_error,
                       bool has_on_target_error, BlockdevOnError on_target_error,
                       Error **errp)
@@ -1151,6 +1154,10 @@ void qmp_drive_mirror(const char *device, const char *target,
     if (!has_granularity) {
         granularity = 0;
     }
+    if (!has_buf_size) {
+        buf_size = DEFAULT_MIRROR_BUF_SIZE;
+    }
+
     if (granularity != 0 && (granularity < 512 || granularity > 1048576 * 64)) {
         error_set(errp, QERR_INVALID_PARAMETER, device);
         return;
@@ -1240,7 +1247,7 @@ void qmp_drive_mirror(const char *device, const char *target,
         return;
     }
 
-    mirror_start(bs, target_bs, speed, granularity, sync,
+    mirror_start(bs, target_bs, speed, granularity, buf_size, sync,
                  on_source_error, on_target_error,
                  block_job_cb, bs, &local_err);
     if (local_err != NULL) {
diff --git a/hmp.c b/hmp.c
index 12d1413..b518045 100644
--- a/hmp.c
+++ b/hmp.c
@@ -783,7 +783,7 @@ void hmp_drive_mirror(Monitor *mon, const QDict *qdict)
 
     qmp_drive_mirror(device, filename, !!format, format,
                      full ? MIRROR_SYNC_MODE_FULL : MIRROR_SYNC_MODE_TOP,
-                     true, mode, false, 0, false, 0,
+                     true, mode, false, 0, false, 0, false, 0,
                      false, 0, false, 0, &errp);
     hmp_handle_error(mon, &errp);
 }
diff --git a/qapi-schema.json b/qapi-schema.json
index 1b9a962..653ce1e 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1558,6 +1558,9 @@
 #               are smaller than that, else the cluster size.  Must be a
 #               power of 2 between 512 and 64M.
 #
+# @buf-size: #optional maximum amount of data in flight from source to
+#            target.
+#
 # @on-source-error: #optional the action to take on an error on the source,
 #                   default 'report'.  'stop' and 'enospc' can only be used
 #                   if the block device supports io-status (see BlockInfo).
@@ -1575,7 +1578,7 @@
   'data': { 'device': 'str', 'target': 'str', '*format': 'str',
             'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
             '*speed': 'int', '*granularity': 'uint32',
-            '*on-source-error': 'BlockdevOnError',
+            '*buf-size': 'int', '*on-source-error': 'BlockdevOnError',
             '*on-target-error': 'BlockdevOnError' } }
 
 ##
diff --git a/qmp-commands.hx b/qmp-commands.hx
index e48ef5e..55f5b90 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -909,7 +909,7 @@ EQMP
         .name       = "drive-mirror",
         .args_type  = "sync:s,device:B,target:s,speed:i?,mode:s?,format:s?,"
                       "on-source-error:s?,on-target-error:s?,"
-                      "granularity:i?",
+                      "buf-size:i?,granularity:i?",
         .mhandler.cmd_new = qmp_marshal_input_drive_mirror,
     },
 
@@ -934,6 +934,8 @@ Arguments:
 - "speed": maximum speed of the streaming job, in bytes per second
   (json-int)
 - "granularity": granularity of the dirty bitmap (json-int, optional)
+- "buf-size": maximum amount of data in flight from source to target
+  (json-int, default 10M)
 - "sync": what parts of the disk image should be copied to the destination;
   possibilities include "full" for all the disk, "top" for only the sectors
   allocated in the topmost image, or "none" to only replicate new I/O
diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
index 39d07a6..9d06365 100755
--- a/tests/qemu-iotests/040
+++ b/tests/qemu-iotests/040
@@ -195,6 +195,37 @@ class TestSingleDrive(ImageMirroringTestCase):
         self.assertTrue(self.compare_images(test_img, target_img),
                         'target image does not match source after mirroring')
 
+    def test_small_buffer(self):
+        self.assert_no_active_mirrors()
+
+        # A small buffer is rounded up automatically
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             buf_size=4096, target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        self.complete_and_wait()
+        result = self.vm.qmp('query-block')
+        self.assert_qmp(result, 'return[0]/inserted/file', target_img)
+        self.vm.shutdown()
+        self.assertTrue(self.compare_images(test_img, target_img),
+                        'target image does not match source after mirroring')
+
+    def test_small_buffer2(self):
+        self.assert_no_active_mirrors()
+
+        qemu_img('create', '-f', iotests.imgfmt, '-o', 'cluster_size=%d,size=%d'
+                        % (TestSingleDrive.image_len, TestSingleDrive.image_len), target_img)
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             buf_size=65536, mode='existing', target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        self.complete_and_wait()
+        result = self.vm.qmp('query-block')
+        self.assert_qmp(result, 'return[0]/inserted/file', target_img)
+        self.vm.shutdown()
+        self.assertTrue(self.compare_images(test_img, target_img),
+                        'target image does not match source after mirroring')
+
     def test_large_cluster(self):
         self.assert_no_active_mirrors()
 
diff --git a/tests/qemu-iotests/040.out b/tests/qemu-iotests/040.out
index 52d796e..4176bb9 100644
--- a/tests/qemu-iotests/040.out
+++ b/tests/qemu-iotests/040.out
@@ -1,5 +1,5 @@
-.................
+...................
 ----------------------------------------------------------------------
-Ran 17 tests
+Ran 19 tests
 
 OK
-- 
1.7.12

* [Qemu-devel] [PATCH v2 44/45] mirror: support more than one in-flight AIO operation
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (42 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 43/45] mirror: add buf-size argument to drive-mirror Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 45/45] mirror: support arbitrarily-sized iterations Paolo Bonzini
  2012-09-27 14:05 ` [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Kevin Wolf
  45 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

With AIO support in place, we can start copying more than one chunk
in parallel.  This patch introduces the required infrastructure for
this: the buffer is split into multiple granularity-sized chunks,
and there is a free list to access them.
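
The idea can be sketched outside QEMU as follows.  This is only an
illustration under stated assumptions: Chunk, ChunkPool and the pool_*
functions are hypothetical names, plain pointers replace the QSIMPLEQ
macros, and the list is LIFO whereas the patch appends freed chunks to
the tail.  As in the patch, a free chunk keeps its list link in its own
first bytes, so no separate bookkeeping array is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* A free chunk stores the list link in its own first bytes. */
typedef struct Chunk {
    struct Chunk *next;
} Chunk;

typedef struct ChunkPool {
    uint8_t *buf;        /* backing buffer of n_chunks * chunk_size bytes */
    size_t chunk_size;   /* plays the role of the granularity */
    Chunk *free_head;    /* head of the free list */
    int free_count;      /* number of chunks currently on the list */
} ChunkPool;

/* Carve the buffer into chunks and put all of them on the free list.
 * Assumes chunk_size is a multiple of the pointer alignment, which holds
 * for power-of-two sizes of 512 bytes or more. */
int pool_init(ChunkPool *p, size_t chunk_size, int n_chunks)
{
    assert(chunk_size >= sizeof(Chunk));
    p->buf = malloc(chunk_size * (size_t)n_chunks);
    if (p->buf == NULL) {
        return -1;
    }
    p->chunk_size = chunk_size;
    p->free_head = NULL;
    p->free_count = 0;
    for (int i = n_chunks - 1; i >= 0; i--) {
        Chunk *c = (Chunk *)(p->buf + (size_t)i * chunk_size);
        c->next = p->free_head;
        p->free_head = c;
        p->free_count++;
    }
    return 0;
}

/* Take one chunk off the free list, or return NULL if none is left. */
void *pool_get(ChunkPool *p)
{
    Chunk *c = p->free_head;

    if (c == NULL) {
        return NULL;
    }
    p->free_head = c->next;
    p->free_count--;
    return c;
}

/* Give a chunk back once the I/O that used it has completed. */
void pool_put(ChunkPool *p, void *chunk)
{
    Chunk *c = chunk;

    c->next = p->free_head;
    p->free_head = c;
    p->free_count++;
}

void pool_destroy(ChunkPool *p)
{
    free(p->buf);
    p->buf = NULL;
    p->free_head = NULL;
    p->free_count = 0;
}
```

An operation that needs several chunks would check free_count first and
wait for completions when it is too low.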

Because of copy-on-write, a single operation may already require
multiple chunks to be available on the free list.

In addition, two different iterations on the HBitmap may want to
copy the same cluster.  We avoid this by keeping a bitmap of in-flight
I/O operations, and blocking until the previous iteration completes.
This should be a pretty rare occurrence, though; as long as there is
no overlap the next iteration can start before the previous one finishes.
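
The overlap check can be pictured with the sketch below, which is an
illustration rather than the code in this patch.  It keeps one bit per
chunk, refuses to start an operation whose range touches a busy chunk,
and clears the bits again on completion.  All names (chunk_busy,
try_start, finish and so on) are hypothetical; the patch uses the
existing bitmap helpers and blocks the coroutine instead of returning
false.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Is the given chunk covered by an operation that has not finished? */
bool chunk_busy(const unsigned long *map, size_t chunk)
{
    return (map[chunk / BITS_PER_WORD] >> (chunk % BITS_PER_WORD)) & 1UL;
}

/* Set or clear the bits for chunks [first, first + n). */
void mark_range(unsigned long *map, size_t first, size_t n, bool busy)
{
    for (size_t i = first; i < first + n; i++) {
        unsigned long bit = 1UL << (i % BITS_PER_WORD);

        if (busy) {
            map[i / BITS_PER_WORD] |= bit;
        } else {
            map[i / BITS_PER_WORD] &= ~bit;
        }
    }
}

/* Does any chunk in [first, first + n) belong to an in-flight operation? */
bool range_overlaps(const unsigned long *map, size_t first, size_t n)
{
    for (size_t i = first; i < first + n; i++) {
        if (chunk_busy(map, i)) {
            return true;
        }
    }
    return false;
}

/* Start an operation only if it does not overlap one in flight; the
 * caller is expected to wait and retry when this returns false. */
bool try_start(unsigned long *map, size_t first, size_t n)
{
    if (range_overlaps(map, first, n)) {
        return false;
    }
    mark_range(map, first, n, true);
    return true;
}

/* Completion: the chunks may be copied again by a later iteration. */
void finish(unsigned long *map, size_t first, size_t n)
{
    mark_range(map, first, n, false);
}
```

Non-overlapping ranges never wait on each other in this scheme, which
matches the expectation that blocking should be rare.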

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/mirror.c | 109 ++++++++++++++++++++++++++++++++++++++++++++++++++-------
 trace-events   |   4 ++-
 2 files changed, 100 insertions(+), 13 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index e6426bb..9545f90 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -17,7 +17,15 @@
 #include "qemu/ratelimit.h"
 #include "bitmap.h"
 
-#define SLICE_TIME 100000000ULL /* ns */
+#define SLICE_TIME    100000000ULL /* ns */
+#define MAX_IN_FLIGHT 16
+
+/* The mirroring buffer is a list of granularity-sized chunks.
+ * Free chunks are organized in a list.
+ */
+typedef struct MirrorBuffer {
+    QSIMPLEQ_ENTRY(MirrorBuffer) next;
+} MirrorBuffer;
 
 typedef struct MirrorBlockJob {
     BlockJob common;
@@ -33,7 +41,10 @@ typedef struct MirrorBlockJob {
     unsigned long *cow_bitmap;
     HBitmapIter hbi;
     uint8_t *buf;
+    QSIMPLEQ_HEAD(, MirrorBuffer) buf_free;
+    int buf_free_count;
 
+    unsigned long *in_flight_bitmap;
     int in_flight;
     int ret;
 } MirrorBlockJob;
@@ -41,7 +52,6 @@ typedef struct MirrorBlockJob {
 typedef struct MirrorOp {
     MirrorBlockJob *s;
     QEMUIOVector qiov;
-    struct iovec iov;
     int64_t sector_num;
     int nb_sectors;
 } MirrorOp;
@@ -62,8 +72,22 @@ static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
 static void mirror_iteration_done(MirrorOp *op)
 {
     MirrorBlockJob *s = op->s;
+    struct iovec *iov;
+    int64_t cluster_num;
+    int i, nb_chunks;
 
     s->in_flight--;
+    iov = op->qiov.iov;
+    for (i = 0; i < op->qiov.niov; i++) {
+        MirrorBuffer *buf = (MirrorBuffer *) iov[i].iov_base;
+        QSIMPLEQ_INSERT_TAIL(&s->buf_free, buf, next);
+        s->buf_free_count++;
+    }
+
+    cluster_num = op->sector_num / s->granularity;
+    nb_chunks = op->nb_sectors / s->granularity;
+    bitmap_clear(s->in_flight_bitmap, cluster_num, nb_chunks);
+
     trace_mirror_iteration_done(s, op->sector_num, op->nb_sectors);
     g_slice_free(MirrorOp, op);
     qemu_coroutine_enter(s->common.co, NULL);
@@ -110,8 +134,8 @@ static void mirror_read_complete(void *opaque, int ret)
 static void coroutine_fn mirror_iteration(MirrorBlockJob *s)
 {
     BlockDriverState *source = s->common.bs;
-    int nb_sectors, nb_sectors_chunk;
-    int64_t end, sector_num, cluster_num;
+    int nb_sectors, nb_sectors_chunk, nb_chunks;
+    int64_t end, sector_num, cluster_num, next_sector, hbitmap_next_sector;
     MirrorOp *op;
 
     s->sector_num = hbitmap_iter_next(&s->hbi);
@@ -122,6 +146,8 @@ static void coroutine_fn mirror_iteration(MirrorBlockJob *s)
         assert(s->sector_num >= 0);
     }
 
+    hbitmap_next_sector = s->sector_num;
+
     /* If we have no backing file yet in the destination, and the cluster size
      * is very large, we need to do COW ourselves.  The first time a cluster is
      * copied, copy it entirely.
@@ -137,21 +163,58 @@ static void coroutine_fn mirror_iteration(MirrorBlockJob *s)
         bdrv_round_to_clusters(s->target,
                                sector_num, nb_sectors_chunk,
                                &sector_num, &nb_sectors);
-        bitmap_set(s->cow_bitmap, sector_num / nb_sectors_chunk,
-                   nb_sectors / nb_sectors_chunk);
+
+        /* The rounding may make us copy sectors before the
+         * first dirty one.
+         */
+        cluster_num = sector_num / nb_sectors_chunk;
+    }
+
+    /* Wait for I/O to this cluster (from a previous iteration) to be done.  */
+    while (test_bit(cluster_num, s->in_flight_bitmap)) {
+        trace_mirror_yield_in_flight(s, sector_num, s->in_flight);
+        qemu_coroutine_yield();
     }
 
     end = s->common.len >> BDRV_SECTOR_BITS;
     nb_sectors = MIN(nb_sectors, end - sector_num);
+    nb_chunks = (nb_sectors + nb_sectors_chunk - 1) / nb_sectors_chunk;
+    while (s->buf_free_count < nb_chunks) {
+        trace_mirror_yield_buf_busy(s, nb_chunks, s->in_flight);
+        qemu_coroutine_yield();
+    }
+
+    /* We have enough free space to copy these sectors.  */
+    if (s->cow_bitmap) {
+        bitmap_set(s->cow_bitmap, cluster_num, nb_chunks);
+    }
 
     /* Allocate a MirrorOp that is used as an AIO callback.  */
     op = g_slice_new(MirrorOp);
     op->s = s;
-    op->iov.iov_base = s->buf;
-    op->iov.iov_len  = nb_sectors * 512;
     op->sector_num = sector_num;
     op->nb_sectors = nb_sectors;
-    qemu_iovec_init_external(&op->qiov, &op->iov, 1);
+
+    /* Now make a QEMUIOVector taking enough granularity-sized chunks
+     * from s->buf_free.
+     */
+    qemu_iovec_init(&op->qiov, nb_chunks);
+    next_sector = sector_num;
+    while (nb_chunks-- > 0) {
+        MirrorBuffer *buf = QSIMPLEQ_FIRST(&s->buf_free);
+        QSIMPLEQ_REMOVE_HEAD(&s->buf_free, next);
+        s->buf_free_count--;
+        qemu_iovec_add(&op->qiov, buf, s->granularity);
+
+        /* Advance the HBitmapIter in parallel, so that we do not examine
+         * the same sector twice.
+         */
+        if (next_sector > hbitmap_next_sector && bdrv_get_dirty(source, next_sector)) {
+            hbitmap_next_sector = hbitmap_iter_next(&s->hbi);
+        }
+
+        next_sector += nb_sectors_chunk;
+    }
 
     bdrv_reset_dirty(source, sector_num, nb_sectors);
 
@@ -162,6 +225,23 @@ static void coroutine_fn mirror_iteration(MirrorBlockJob *s)
                    mirror_read_complete, op);
 }
 
+static void mirror_free_init(MirrorBlockJob *s)
+{
+    int granularity = s->granularity;
+    size_t buf_size = s->buf_size;
+    uint8_t *buf = s->buf;
+
+    assert(s->buf_free_count == 0);
+    QSIMPLEQ_INIT(&s->buf_free);
+    while (buf_size != 0) {
+        MirrorBuffer *cur = (MirrorBuffer *)buf;
+        QSIMPLEQ_INSERT_TAIL(&s->buf_free, cur, next);
+        s->buf_free_count++;
+        buf_size -= granularity;
+        buf += granularity;
+    }
+}
+
 static void mirror_drain(MirrorBlockJob *s)
 {
     while (s->in_flight > 0) {
@@ -190,6 +270,9 @@ static void coroutine_fn mirror_run(void *opaque)
         return;
     }
 
+    length = (bdrv_getlength(bs) + s->granularity - 1) / s->granularity;
+    s->in_flight_bitmap = bitmap_new(length);
+
     /* If we have no backing file yet in the destination, we cannot let
      * the destination do COW.  Instead, we copy sectors around the
      * dirty data if needed.  We need a bitmap to do that.
@@ -200,7 +283,6 @@ static void coroutine_fn mirror_run(void *opaque)
         bdrv_get_info(s->target, &bdi);
         if (s->buf_size < bdi.cluster_size) {
             s->buf_size = bdi.cluster_size;
-            length = (bdrv_getlength(bs) + s->granularity - 1) / s->granularity;
             s->cow_bitmap = bitmap_new(length);
         }
     }
@@ -208,6 +290,7 @@ static void coroutine_fn mirror_run(void *opaque)
     end = s->common.len >> BDRV_SECTOR_BITS;
     s->buf = qemu_blockalign(bs, s->buf_size);
     nb_sectors_chunk = s->granularity >> BDRV_SECTOR_BITS;
+    mirror_free_init(s);
 
     if (s->mode != MIRROR_SYNC_MODE_NONE) {
         /* First part, loop on the sectors and initialize the dirty bitmap.  */
@@ -253,8 +336,9 @@ static void coroutine_fn mirror_run(void *opaque)
          */
         if (qemu_get_clock_ns(rt_clock) - last_pause_ns < SLICE_TIME &&
             s->common.iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
-            if (s->in_flight > 0) {
-                trace_mirror_yield(s, s->in_flight, cnt);
+            if (s->in_flight == MAX_IN_FLIGHT || s->buf_free_count == 0 ||
+                (cnt == 0 && s->in_flight > 0)) {
+                trace_mirror_yield(s, s->in_flight, s->buf_free_count, cnt);
                 qemu_coroutine_yield();
                 continue;
             } else if (cnt != 0) {
@@ -345,6 +429,7 @@ immediate_exit:
     assert(s->in_flight == 0);
     g_free(s->buf);
     g_free(s->cow_bitmap);
+    g_free(s->in_flight_bitmap);
     bdrv_set_dirty_tracking(bs, 0);
     bdrv_iostatus_disable(s->target);
     if (s->complete && ret == 0) {
diff --git a/trace-events b/trace-events
index ed65538..6521504 100644
--- a/trace-events
+++ b/trace-events
@@ -84,7 +84,9 @@ mirror_before_sleep(void *s, int64_t cnt, int synced) "s %p dirty count %"PRId64
 mirror_one_iteration(void *s, int64_t sector_num, int nb_sectors) "s %p sector_num %"PRId64" nb_sectors %d"
 mirror_cow(void *s, int64_t sector_num) "s %p sector_num %"PRId64
 mirror_iteration_done(void *s, int64_t sector_num, int nb_sectors) "s %p sector_num %"PRId64" nb_sectors %d"
-mirror_yield(void *s, int64_t cnt, int in_flight) "s %p dirty count %"PRId64" in_flight %d"
+mirror_yield(void *s, int64_t cnt, int buf_free_count, int in_flight) "s %p dirty count %"PRId64" free buffers %d in_flight %d"
+mirror_yield_in_flight(void *s, int64_t sector_num, int in_flight) "s %p sector_num %"PRId64" in_flight %d"
+mirror_yield_buf_busy(void *s, int nb_chunks, int in_flight) "s %p requested chunks %d in_flight %d"
 
 # blockdev.c
 qmp_block_job_cancel(void *job) "job %p"
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [Qemu-devel] [PATCH v2 45/45] mirror: support arbitrarily-sized iterations
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (43 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 44/45] mirror: support more than one in-flight AIO operation Paolo Bonzini
@ 2012-09-26 15:56 ` Paolo Bonzini
  2012-09-27 14:05 ` [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Kevin Wolf
  45 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jcody

Yet another optimization is to extend the mirroring iteration to include more
adjacent dirty blocks.  This limits the number of I/O operations and makes
mirroring efficient even with a small granularity.  Most of the infrastructure
is already in place; we only need to put a loop around the computation of
the origin and sector count of the iteration.
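
The loop can be pictured with a simplified sketch.  It is not the code in
this patch: plain bool arrays stand in for the HBitmap and the in-flight
bitmap, copy-on-write rounding is left out, and the function name
extend_run and its parameters are invented for the example.  Starting from
a dirty chunk, the run grows while the next chunk is dirty, is not already
being copied, and a free buffer chunk is still available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return how many consecutive chunks, starting at `first`, can be copied
 * in a single operation.  The run stops at the end of the device, at the
 * first clean chunk, at the first chunk that is already in flight, or
 * when the free buffer chunks are exhausted.
 */
static int64_t extend_run(const bool *dirty, const bool *in_flight,
                          int64_t nb_total, int64_t first,
                          int64_t free_chunks)
{
    int64_t n = 0;

    assert(first >= 0 && first <= nb_total);
    while (first + n < nb_total &&
           n < free_chunks &&
           dirty[first + n] &&
           !in_flight[first + n]) {
        n++;
    }
    return n;
}
```

With a small granularity, a long dirty region then costs one read and one
write instead of one pair per chunk, which is the saving the commit
message describes.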

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/mirror.c | 100 +++++++++++++++++++++++++++++++++++++++------------------
 trace-events   |   1 +
 2 files changed, 69 insertions(+), 32 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 9545f90..4e56101 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -135,7 +135,7 @@ static void coroutine_fn mirror_iteration(MirrorBlockJob *s)
 {
     BlockDriverState *source = s->common.bs;
     int nb_sectors, nb_sectors_chunk, nb_chunks;
-    int64_t end, sector_num, cluster_num, next_sector, hbitmap_next_sector;
+    int64_t end, sector_num, next_cluster, next_sector, hbitmap_next_sector;
     MirrorOp *op;
 
     s->sector_num = hbitmap_iter_next(&s->hbi);
@@ -147,47 +147,83 @@ static void coroutine_fn mirror_iteration(MirrorBlockJob *s)
     }
 
     hbitmap_next_sector = s->sector_num;
+    sector_num = s->sector_num;
+    nb_sectors_chunk = s->granularity >> BDRV_SECTOR_BITS;
+    end = s->common.len >> BDRV_SECTOR_BITS;
 
-    /* If we have no backing file yet in the destination, and the cluster size
-     * is very large, we need to do COW ourselves.  The first time a cluster is
-     * copied, copy it entirely.
+    /* Extend the QEMUIOVector to include all adjacent blocks that will
+     * be copied in this operation.
+     *
+     * We have to do this if we have no backing file yet in the destination,
+     * and the cluster size is very large.  Then we need to do COW ourselves.
+     * The first time a cluster is copied, copy it entirely.  Note that,
+     * because both the granularity and the cluster size are powers of two,
+     * the number of sectors to copy cannot exceed one cluster.
      *
-     * Because both the granularity and the cluster size are powers of two, the
-     * number of sectors to copy cannot exceed one cluster.
+     * We also want to extend the QEMUIOVector to include more adjacent
+     * dirty blocks if possible, to limit the number of I/O operations and
+     * run efficiently even with a small granularity.
      */
-    sector_num = s->sector_num;
-    nb_sectors_chunk = nb_sectors = s->granularity >> BDRV_SECTOR_BITS;
-    cluster_num = sector_num / nb_sectors_chunk;
-    if (s->cow_bitmap && !test_bit(cluster_num, s->cow_bitmap)) {
-        trace_mirror_cow(s, sector_num);
-        bdrv_round_to_clusters(s->target,
-                               sector_num, nb_sectors_chunk,
-                               &sector_num, &nb_sectors);
-
-        /* The rounding may make us copy sectors before the
-         * first dirty one.
-         */
-        cluster_num = sector_num / nb_sectors_chunk;
-    }
+    nb_chunks = 0;
+    nb_sectors = 0;
+    next_sector = sector_num;
+    next_cluster = sector_num / nb_sectors_chunk;
 
     /* Wait for I/O to this cluster (from a previous iteration) to be done.  */
-    while (test_bit(cluster_num, s->in_flight_bitmap)) {
+    while (test_bit(next_cluster, s->in_flight_bitmap)) {
         trace_mirror_yield_in_flight(s, sector_num, s->in_flight);
         qemu_coroutine_yield();
     }
 
-    end = s->common.len >> BDRV_SECTOR_BITS;
-    nb_sectors = MIN(nb_sectors, end - sector_num);
-    nb_chunks = (nb_sectors + nb_sectors_chunk - 1) / nb_sectors_chunk;
-    while (s->buf_free_count < nb_chunks) {
-        trace_mirror_yield_buf_busy(s, nb_chunks, s->in_flight);
-        qemu_coroutine_yield();
-    }
+    do {
+        int added_sectors, added_chunks;
 
-    /* We have enough free space to copy these sectors.  */
-    if (s->cow_bitmap) {
-        bitmap_set(s->cow_bitmap, cluster_num, nb_chunks);
-    }
+        if (!bdrv_get_dirty(source, next_sector) ||
+            test_bit(next_cluster, s->in_flight_bitmap)) {
+            assert(nb_sectors > 0);
+            break;
+        }
+
+        added_sectors = nb_sectors_chunk;
+        if (s->cow_bitmap && !test_bit(next_cluster, s->cow_bitmap)) {
+            bdrv_round_to_clusters(s->target,
+                                   next_sector, added_sectors,
+                                   &next_sector, &added_sectors);
+
+            /* On the first iteration, the rounding may make us copy
+             * sectors before the first dirty one.
+             */
+            if (next_sector < sector_num) {
+                assert(nb_sectors == 0);
+                sector_num = next_sector;
+                next_cluster = next_sector / nb_sectors_chunk;
+            }
+        }
+
+        added_sectors = MIN(added_sectors, end - (sector_num + nb_sectors));
+        added_chunks = (added_sectors + nb_sectors_chunk - 1) / nb_sectors_chunk;
+
+        /* When doing COW, it may happen that there is not enough space for
+         * a full cluster.  Wait if that is the case.
+         */
+        while (nb_chunks == 0 && s->buf_free_count < added_chunks) {
+            trace_mirror_yield_buf_busy(s, nb_chunks, s->in_flight);
+            qemu_coroutine_yield();
+        }
+        if (s->buf_free_count < nb_chunks + added_chunks) {
+            trace_mirror_break_buf_busy(s, nb_chunks, s->in_flight);
+            break;
+        }
+
+        /* We have enough free space to copy these sectors.  */
+        if (s->cow_bitmap) {
+            bitmap_set(s->cow_bitmap, next_cluster, added_chunks);
+        }
+        nb_sectors += added_sectors;
+        nb_chunks += added_chunks;
+        next_sector += added_sectors;
+        next_cluster += added_chunks;
+    } while (next_sector < end);
 
     /* Allocate a MirrorOp that is used as an AIO callback.  */
     op = g_slice_new(MirrorOp);
diff --git a/trace-events b/trace-events
index 6521504..412b5da 100644
--- a/trace-events
+++ b/trace-events
@@ -87,6 +87,7 @@ mirror_iteration_done(void *s, int64_t sector_num, int nb_sectors) "s %p sector_
 mirror_yield(void *s, int64_t cnt, int buf_free_count, int in_flight) "s %p dirty count %"PRId64" free buffers %d in_flight %d"
 mirror_yield_in_flight(void *s, int64_t sector_num, int in_flight) "s %p sector_num %"PRId64" in_flight %d"
 mirror_yield_buf_busy(void *s, int nb_chunks, int in_flight) "s %p requested chunks %d in_flight %d"
+mirror_break_buf_busy(void *s, int nb_chunks, int in_flight) "s %p requested chunks %d in_flight %d"
 
 # blockdev.c
 qmp_block_job_cancel(void *job) "job %p"
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 32/45] qmp: add pull_event function
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 32/45] qmp: add pull_event function Paolo Bonzini
@ 2012-09-26 17:17   ` Luiz Capitulino
  0 siblings, 0 replies; 102+ messages in thread
From: Luiz Capitulino @ 2012-09-26 17:17 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel

On Wed, 26 Sep 2012 17:56:38 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> This function is unlike get_events in that it makes it easy to process
> one event at a time.  This is useful in the mirroring test cases, where
> we want to process just one event (BLOCK_JOB_ERROR) and leave the others
> to a helper function.
> 
> Cc: Luiz Capitulino <lcapitulino@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Acked-by: Luiz Capitulino <lcapitulino@redhat.com>

> ---
>  QMP/qmp.py | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/QMP/qmp.py b/QMP/qmp.py
> index 36ecc1d..7a598f1 100644
> --- a/QMP/qmp.py
> +++ b/QMP/qmp.py
> @@ -134,6 +134,26 @@ class QEMUMonitorProtocol:
>              raise Exception(ret['error']['desc'])
>          return ret['return']
>  
> +    def pull_event(self, wait=False):
> +        """
> +        Get and delete the first available QMP event.
> +
> +        @param wait: block until an event is available (bool)
> +        """
> +        self.__sock.setblocking(0)
> +        try:
> +            self.__json_read()
> +        except socket.error, err:
> +            if err[0] == errno.EAGAIN:
> +                # No data available
> +                pass
> +        self.__sock.setblocking(1)
> +        if not self.__events and wait:
> +            self.__json_read(only_event=True)
> +        event = self.__events[0]
> +        del self.__events[0]
> +        return event
> +
>      def get_events(self, wait=False):
>          """
>          Get a list of available QMP events.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 06/45] block: add support for job pause/resume
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 06/45] block: add support for job pause/resume Paolo Bonzini
@ 2012-09-26 17:31   ` Eric Blake
  2012-09-27 12:18   ` Kevin Wolf
  1 sibling, 0 replies; 102+ messages in thread
From: Eric Blake @ 2012-09-26 17:31 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel

On 09/26/2012 09:56 AM, Paolo Bonzini wrote:
> Job pausing reuses the existing support for cancellable sleeps.  A pause
> happens at the next sleeping point and lasts until the coroutine is
> re-entered explicitly.  Cancellation was already doing a forced resume,
> so implement it explicitly in terms of resume.
> 
> Paused jobs cannot be canceled without first resuming them.  This ensures
> that I/O errors are never missed by management.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---

> +++ b/qapi-schema.json
> @@ -1098,6 +1098,8 @@
>  #
>  # @len: the maximum progress value
>  #
> +# @paused: whether the job is paused (since 1.2)

1.3

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 07/45] qmp: add block-job-pause and block-job-resume
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 07/45] qmp: add block-job-pause and block-job-resume Paolo Bonzini
@ 2012-09-26 17:45   ` Eric Blake
  2012-09-27  9:23     ` Paolo Bonzini
  0 siblings, 1 reply; 102+ messages in thread
From: Eric Blake @ 2012-09-26 17:45 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel

On 09/26/2012 09:56 AM, Paolo Bonzini wrote:
> Add QMP commands matching the functionality.
> 
> Paused jobs cannot be canceled without first resuming them.  This
> ensures that I/O errors are never missed by management.  However, an
> optional force argument can be specified to allow that.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>         v1->v2: document that the commands do not nest; a single resume
>         command will always resume.
> 

> +++ b/qapi-schema.json
> @@ -1855,12 +1855,55 @@
>  #
>  # @device: the device name
>  #
> +# @force: #optional whether to allow cancellation of a paused job (default false)
> +#

Do we need (since 1.3) designation on this argument?

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 11/45] iostatus: move BlockdevOnError declaration to QAPI
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 11/45] iostatus: move BlockdevOnError declaration to QAPI Paolo Bonzini
@ 2012-09-26 17:54   ` Eric Blake
  2012-09-27  9:23     ` Paolo Bonzini
  0 siblings, 1 reply; 102+ messages in thread
From: Eric Blake @ 2012-09-26 17:54 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel

On 09/26/2012 09:56 AM, Paolo Bonzini wrote:
> This will let block-stream reuse the enum.  Places that used the enums
> are renamed accordingly.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---

> +#
> +# @stop: for guest operations, stop the virtual machine;
> +#        for jobs, pause the job
> +#
> +# @enospc: same as @stop on ENOSPC, same as @report otherwise.
> +#
> +# Since: 1.3
> +##
> +{ 'enum': 'BlockdevOnError',
> +  'data': ['report', 'ignore', 'enospc', 'stop'] }

Bike-shedding - should the order of the docs match the order of the
'data' array (that is, should 'enospc' be last in both places)?

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 14/45] block: introduce block job error
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 14/45] block: introduce block job error Paolo Bonzini
@ 2012-09-26 19:10   ` Eric Blake
  2012-09-26 19:27     ` Eric Blake
  2012-09-27  9:24     ` Paolo Bonzini
  2012-09-27 13:41   ` Kevin Wolf
  1 sibling, 2 replies; 102+ messages in thread
From: Eric Blake @ 2012-09-26 19:10 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel

On 09/26/2012 09:56 AM, Paolo Bonzini wrote:
> The following behaviors are possible:
> 
> 'report': The behavior is the same as in 1.1.  An I/O error,
> respectively during a read or a write, will complete the job immediately
> with an error code.
> 
> 'ignore': An I/O error, respectively during a read or a write, will be
> ignored.  For streaming, the job will complete with an error and the
> backing file will be left in place.  For mirroring, the sector will be
> marked again as dirty and re-examined later.
> 
> 'stop': The job will be paused and the job iostatus will be set to
> failed or nospace, while the VM will keep running.  This can only be
> specified if the block device has rerror=stop and werror=stop or enospc.
> 
> 'enospc': Behaves as 'stop' for ENOSPC errors, 'report' for others.
> 

> +Emitted when a block job encounters an error.
> +
> +Data:
> +
> +- "device": device name (json-string)
> +- "operation": I/O operation (json-string, "read" or "write")

For symmetry with BLOCK_JOB_{CANCELLED,COMPLETED}, you also need:
- "type":     Job type ("stream" for image streaming, json-string)

Libvirt would like to key off of the 'type' field for all three events.
Besides, if management issues several block commands in a row, and only
then starts processing the pending event queue, it would be nice to know
whether the error stemmed from a 'stream', 'mirror', or (when combined
with Jeff's patches) 'commit' job.


> +++ b/qapi-schema.json
> @@ -1127,11 +1127,14 @@
>  #
>  # @speed: the rate limit, bytes per second
>  #
> +# @io-status: the status of the job (since 1.2)

1.3

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 14/45] block: introduce block job error
  2012-09-26 19:10   ` Eric Blake
@ 2012-09-26 19:27     ` Eric Blake
  2012-09-27  9:24     ` Paolo Bonzini
  1 sibling, 0 replies; 102+ messages in thread
From: Eric Blake @ 2012-09-26 19:27 UTC (permalink / raw)
  Cc: kwolf, Paolo Bonzini, jcody, qemu-devel

On 09/26/2012 01:10 PM, Eric Blake wrote:

>> +Emitted when a block job encounters an error.
>> +
>> +Data:
>> +
>> +- "device": device name (json-string)
>> +- "operation": I/O operation (json-string, "read" or "write")
> 
> For symmetry with BLOCK_JOB_{CANCELLED,COMPLETED}, you also need:
> - "type":     Job type ("stream" for image streaming, json-string)
> 
> Libvirt would like to key off of the 'type' field for all three events.
>  Besides, if management issues several block commands in a row, and only
> then starts processing the pending event queue, it would be nice to know
> whether the error stemmed from a 'stream', 'mirror', or (when combined
> with Jeff's patches) 'commit' job.

For that matter, maybe qapi-schema.json should define an enum:

{ 'enum': 'BlockJobType',
  'data': [ 'stream', 'mirror', 'commit' ] }

and have the job type listed throughout the various QMP calls as a
member of that enum, rather than open-coded strings, to make it easier
the next time we add a job type.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 15/45] stream: add on-error argument
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 15/45] stream: add on-error argument Paolo Bonzini
@ 2012-09-26 20:53   ` Eric Blake
  0 siblings, 0 replies; 102+ messages in thread
From: Eric Blake @ 2012-09-26 20:53 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel

On 09/26/2012 09:56 AM, Paolo Bonzini wrote:
> This patch adds support for error management to streaming.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block/stream.c   | 28 +++++++++++++++++++++++++++-
>  block_int.h      |  3 ++-
>  blockdev.c       | 11 ++++++++---
>  hmp.c            |  3 ++-
>  qapi-schema.json |  9 +++++++--
>  qmp-commands.hx  |  2 +-
>  6 files changed, 47 insertions(+), 9 deletions(-)
> 

> +++ b/qapi-schema.json
> @@ -1831,13 +1831,18 @@
>  #
>  # @speed:  #optional the maximum speed, in bytes per second
>  #
> +# @on-error: #optional the action to take on an error (default report).
> +#            'stop' and 'enospc' can only be used if the block device
> +#            supports io-status (see BlockInfo).  Since 1.2.

1.3

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 25/45] block: introduce BLOCK_JOB_READY event
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 25/45] block: introduce BLOCK_JOB_READY event Paolo Bonzini
@ 2012-09-27  0:01   ` Eric Blake
  2012-09-27  9:25     ` Paolo Bonzini
  0 siblings, 1 reply; 102+ messages in thread
From: Eric Blake @ 2012-09-27  0:01 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel

On 09/26/2012 09:56 AM, Paolo Bonzini wrote:
> Even for jobs that need to be manually completed, management may want
> to take care itself of the completion, not requiring the user to issue
> a command to terminate the job.  In this case we want to avoid that
> they poll us continuously, waiting for completion to become available.
> Thus, add a new event that signals the phase switch and the availability
> of the block-job-complete command.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---

>  
> +BLOCK_JOB_READY
> +---------------
> +
> +Emitted when a block job is ready to complete.
> +
> +Data:
> +
> +- "device": device name (json-string)
> +
> +Example:
> +
> +{ "event": "BLOCK_JOB_READY",
> +    "data": { "device": "ide0-hd1",
> +              "operation": "write",
> +              "action": "stop" },

You didn't document operation and action; are you missing documentation
for this event, or are they bad copy-and-paste in the example?

Again, libvirt would really like to have 'type':'mirror' in the data,
for symmetry with all the other BLOCK_JOB_* events.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 27/45] qmp: add drive-mirror command
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 27/45] qmp: add drive-mirror command Paolo Bonzini
@ 2012-09-27  0:14   ` Eric Blake
  2012-09-27 19:49   ` Jeff Cody
  2012-10-15 17:33   ` Kevin Wolf
  2 siblings, 0 replies; 102+ messages in thread
From: Eric Blake @ 2012-09-27  0:14 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel

On 09/26/2012 09:56 AM, Paolo Bonzini wrote:
> This adds the monitor commands that start the mirroring job.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  blockdev.c       | 125 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  hmp-commands.hx  |  21 ++++++++++
>  hmp.c            |  28 +++++++++++++
>  hmp.h            |   1 +
>  qapi-schema.json |  33 +++++++++++++++
>  qmp-commands.hx  |  42 +++++++++++++++++++
>  6 files changed, 249 insertions(+), 1 deletion(-)
> 

> +# @drive-mirror
> +#
> +# Start mirroring a block device's writes to a new destination.
> +#
> +# @device:  the name of the device whose writes should be mirrored.
> +#
> +# @target: the target of the new image. If the file exists, or if it
> +#          is a device, the existing file/device will be used as the new
> +#          destination.  If it does not exist, a new file will be created.
> +#
> +# @format: #optional the format of the new destination, default is to
> +#          probe is @mode is 'existing', else the format of the source

s/probe is/probe if/

> +- "device": device name to operate on (json-string)
> +- "target": name of new image file (json-string)
> +- "format": format of new image (json-string, optional)
> +- "mode": how an image file should be created into the target
> +  file/device (NewImageMode, optional, default 'absolute-paths')
> +- "speed": maximum speed of the streaming job, in bytes per second
> +  (json-int)

mention that speed is optional.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 29/45] qemu-iotests: add mirroring test case
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 29/45] qemu-iotests: add mirroring test case Paolo Bonzini
@ 2012-09-27  0:26   ` Eric Blake
  2012-10-18 12:43   ` Kevin Wolf
  1 sibling, 0 replies; 102+ messages in thread
From: Eric Blake @ 2012-09-27  0:26 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel

On 09/26/2012 09:56 AM, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  tests/qemu-iotests/040     | 353 +++++++++++++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/040.out |   5 +
>  tests/qemu-iotests/group   |   3 +-
>  3 files changed, 360 insertions(+), 1 deletion(-)
>  create mode 100755 tests/qemu-iotests/040
>  create mode 100644 tests/qemu-iotests/040.out
> 
> diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
> new file mode 100755
> index 0000000..44f1e56
> --- /dev/null
> +++ b/tests/qemu-iotests/040
> @@ -0,0 +1,353 @@
> +#!/usr/bin/env python
> +#
> +# Tests for image mirroring.
> +#
> +# Copyright (C) 2012 IBM Corp.
> +#

While you are copying material from IBM, I think you have enough
original content to also claim a Red Hat copyright on this file.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 34/45] host-utils: add ffsl
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 34/45] host-utils: add ffsl Paolo Bonzini
@ 2012-09-27  1:14   ` Eric Blake
  0 siblings, 0 replies; 102+ messages in thread
From: Eric Blake @ 2012-09-27  1:14 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel

On 09/26/2012 09:56 AM, Paolo Bonzini wrote:
> We can provide fast versions based on the other functions defined
> by host-utils.h.  Some care is required on glibc, which provides
> ffsl already.
> 

Why glibc chose <string.h> for ffsl even though it uses <strings.h> for
ffs (per POSIX) is beyond me.  At any rate, this report spurred me to
finally file an enhancement request against POSIX:

http://www.austingroupbugs.net/view.php?id=617

Someday in the future, we might just get ffsl standardized. :)
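
For reference, the semantics under discussion can be sketched portably.
This is only an illustration, not the code from the patch (which builds
on the helpers in host-utils.h); the name sketch_ffsl is an assumption,
chosen so that it cannot clash with glibc's own ffsl:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for ffsl(): returns the 1-based position of the
 * least significant set bit of value, or 0 if no bit is set. */
static inline int sketch_ffsl(long value)
{
    unsigned long v = (unsigned long)value;
    int pos = 1;

    if (v == 0) {
        return 0;
    }
    /* Shift right until the lowest set bit reaches bit 0. */
    while ((v & 1UL) == 0) {
        v >>= 1;
        pos++;
    }
    assert(pos <= (int)(sizeof(long) * CHAR_BIT));
    return pos;
}
```

A loop like this is slow compared to a count-trailing-zeros instruction,
which is presumably why the patch prefers the host-utils.h primitives.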

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 35/45] add hierarchical bitmap data type and test cases
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 35/45] add hierarchical bitmap data type and test cases Paolo Bonzini
@ 2012-09-27  2:53   ` Eric Blake
  2012-09-27  9:27     ` Paolo Bonzini
  2012-10-24 14:41   ` Kevin Wolf
  1 sibling, 1 reply; 102+ messages in thread
From: Eric Blake @ 2012-09-27  2:53 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel

On 09/26/2012 09:56 AM, Paolo Bonzini wrote:
> HBitmap provides an array of bits.  The bits are stored as usual in an
> array of unsigned longs, but HBitmap is also optimized to provide fast
> iteration over set bits; going from one bit to the next is O(logB n)
> worst case, with B = sizeof(long) * CHAR_BIT: the result is low enough
> that the number of levels is in fact fixed.

> +++ b/hbitmap.c
> @@ -0,0 +1,400 @@
> +/*
> + * Hierarchical Bitmap Data Type
> + *
> + * Copyright Red Hat, Inc., 2012

vs.

> +++ b/tests/test-hbitmap.c
> @@ -0,0 +1,408 @@
> +/*
> + * Hierarchical bitmap unit-tests.
> + *
> + * Copyright (C) 2012 Red Hat Inc.

Is there a preferred form for the copyright line?

> +++ b/hbitmap.h
> @@ -0,0 +1,207 @@

> +
> +/* We need to place a sentinel in level 0 to speed up iteration.  Thus,
> + * we do this instead of HBITMAP_LOG_MAX_SIZE / BITS_PER_LEVEL.  The
> + * difference is that it allocates an extra level when HBITMAP_LOG_MAX_SIZE
> + * is an exact multiple of BITS_PER_LEVEL.
> + */
> +#define HBITMAP_LEVELS         ((HBITMAP_LOG_MAX_SIZE / BITS_PER_LEVEL) + 1)

Comment is a bit misleading.  Don't you mean:
Thus, we do this instead of (HBITMAP_LOG_MAX_SIZE + BITS_PER_LEVEL -
1)/BITS_PER_LEVEL.
(aka ceil(1.0*HBITMAP_LOG_MAX_SIZE / BITS_PER_LEVEL))
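
To make the difference concrete, here is a minimal sketch comparing the
two formulas.  The helper names and the sample sizes are assumptions for
illustration only; they are not the values used in the patch:

```c
#include <assert.h>

/* Hypothetical helper: the formula used for HBITMAP_LEVELS in the patch. */
static inline int levels_floor_plus_one(int log_max_size, int bits_per_level)
{
    assert(bits_per_level > 0);
    return (log_max_size / bits_per_level) + 1;
}

/* Hypothetical helper: plain ceiling division, with no sentinel level. */
static inline int levels_ceil(int log_max_size, int bits_per_level)
{
    assert(bits_per_level > 0);
    return (log_max_size + bits_per_level - 1) / bits_per_level;
}
```

With log_max_size = 12 and bits_per_level = 6 (an exact multiple), the
first helper gives 3 and the second gives 2, so the extra level shows up.
With log_max_size = 13 both give 3.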

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 07/45] qmp: add block-job-pause and block-job-resume
  2012-09-26 17:45   ` Eric Blake
@ 2012-09-27  9:23     ` Paolo Bonzini
  0 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-27  9:23 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, jcody, qemu-devel

On 26/09/2012 19:45, Eric Blake wrote:
>> > @@ -1855,12 +1855,55 @@
>> >  #
>> >  # @device: the device name
>> >  #
>> > +# @force: #optional whether to allow cancellation of a paused job (default false)
>> > +#
> Do we need (since 1.3) designation on this argument?

Yes.

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 11/45] iostatus: move BlockdevOnError declaration to QAPI
  2012-09-26 17:54   ` Eric Blake
@ 2012-09-27  9:23     ` Paolo Bonzini
  0 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-27  9:23 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, jcody, qemu-devel

On 26/09/2012 19:54, Eric Blake wrote:
>> > +#
>> > +# @stop: for guest operations, stop the virtual machine;
>> > +#        for jobs, pause the job
>> > +#
>> > +# @enospc: same as @stop on ENOSPC, same as @report otherwise.
>> > +#
>> > +# Since: 1.3
>> > +##
>> > +{ 'enum': 'BlockdevOnError',
>> > +  'data': ['report', 'ignore', 'enospc', 'stop'] }
> Bike-shedding - should the order of the docs match the order of the
> 'data' array (that is, should 'enospc' be last in both places)?

Why not.

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 14/45] block: introduce block job error
  2012-09-26 19:10   ` Eric Blake
  2012-09-26 19:27     ` Eric Blake
@ 2012-09-27  9:24     ` Paolo Bonzini
  1 sibling, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-27  9:24 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, jcody, qemu-devel

On 26/09/2012 21:10, Eric Blake wrote:
>> > +- "device": device name (json-string)
>> > +- "operation": I/O operation (json-string, "read" or "write")
> For symmetry with BLOCK_JOB_{CANCELLED,COMPLETED}, you also need:
> - "type":     Job type ("stream" for image streaming, json-string)
> 
> Libvirt would like to key off of the 'type' field for all three events.
>  Besides, if management issues several block commands in a row, and only
> then starts processing the pending event queue, it would be nice to know
> whether the error stemmed from a 'stream', 'mirror', or (when combined
> with Jeff's patches) 'commit' job.
> 
> 

Let's add it as a follow-up.

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 25/45] block: introduce BLOCK_JOB_READY event
  2012-09-27  0:01   ` Eric Blake
@ 2012-09-27  9:25     ` Paolo Bonzini
  0 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-27  9:25 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, jcody, qemu-devel

On 27/09/2012 02:01, Eric Blake wrote:
>> > +{ "event": "BLOCK_JOB_READY",
>> > +    "data": { "device": "ide0-hd1",
>> > +              "operation": "write",
>> > +              "action": "stop" },
> You didn't document operation and action; are you missing documentation
> for this event, or are they bad copy-and-paste in the example?

The latter.

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 35/45] add hierarchical bitmap data type and test cases
  2012-09-27  2:53   ` Eric Blake
@ 2012-09-27  9:27     ` Paolo Bonzini
  0 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-27  9:27 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, jcody, qemu-devel

On 27/09/2012 04:53, Eric Blake wrote:
>> +++ b/hbitmap.c
>> > @@ -0,0 +1,400 @@
>> > +/*
>> > + * Hierarchical Bitmap Data Type
>> > + *
>> > + * Copyright Red Hat, Inc., 2012
> vs.
> 
>> > +++ b/tests/test-hbitmap.c
>> > @@ -0,0 +1,408 @@
>> > +/*
>> > + * Hierarchical bitmap unit-tests.
>> > + *
>> > + * Copyright (C) 2012 Red Hat Inc.
> Is there a preferred form for the copyright line?
> 

I say it depends on where you cut-and-paste from. :)

I think the one with the (C) is better.

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 02/45] blockdev: rename block_stream_cb to a generic block_job_cb
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 02/45] blockdev: rename block_stream_cb to a generic block_job_cb Paolo Bonzini
@ 2012-09-27 11:56   ` Kevin Wolf
  0 siblings, 0 replies; 102+ messages in thread
From: Kevin Wolf @ 2012-09-27 11:56 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, qemu-devel

On 26.09.2012 17:56, Paolo Bonzini wrote:
> From: Jeff Cody <jcody@redhat.com>
> 
> Signed-off-by: Jeff Cody <jcody@redhat.com>

This should also be signed off by you, not only by Jeff.

Kevin

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 03/45] block: fix documentation of block_job_cancel_sync
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 03/45] block: fix documentation of block_job_cancel_sync Paolo Bonzini
@ 2012-09-27 12:03   ` Kevin Wolf
  2012-09-27 12:08     ` Paolo Bonzini
  0 siblings, 1 reply; 102+ messages in thread
From: Kevin Wolf @ 2012-09-27 12:03 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, qemu-devel

On 26.09.2012 17:56, Paolo Bonzini wrote:
> Do this in a separate commit before we move the functions to
> blockjob.h.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>         v1->v2: split out of the next patch
> 
>  block_int.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/block_int.h b/block_int.h
> index ac4245c..7bb95b7 100644
> --- a/block_int.h
> +++ b/block_int.h
> @@ -425,10 +425,10 @@ void block_job_cancel(BlockJob *job);
>  bool block_job_is_cancelled(BlockJob *job);
>  
>  /**
> - * block_job_cancel:
> + * block_job_cancel_sync:
>   * @job: The job to be canceled.
>   *
> - * Asynchronously cancel the job and wait for it to reach a quiescent
> + * Synchronously cancel the job and wait for it to reach a quiescent
>   * state.  Note that the completion callback will still be called
>   * asynchronously, hence it is *not* valid to call #bdrv_delete
>   * immediately after #block_job_cancel_sync.  Users of block jobs

I still don't agree with the s/Async/Sync/, in my opinion it contradicts
the rest of the comment. If it did cancel the job synchronously, then
the job would be immediately completed, and there would be no need to
wait for a quiescent state nor would the completion callback occur later.

Kevin

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 03/45] block: fix documentation of block_job_cancel_sync
  2012-09-27 12:03   ` Kevin Wolf
@ 2012-09-27 12:08     ` Paolo Bonzini
  2012-09-27 12:13       ` Kevin Wolf
  0 siblings, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-27 12:08 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, qemu-devel

On 27/09/2012 14:03, Kevin Wolf wrote:
>> >  /**
>> > - * block_job_cancel:
>> > + * block_job_cancel_sync:
>> >   * @job: The job to be canceled.
>> >   *
>> > - * Asynchronously cancel the job and wait for it to reach a quiescent
>> > + * Synchronously cancel the job and wait for it to reach a quiescent
>> >   * state.  Note that the completion callback will still be called
>> >   * asynchronously, hence it is *not* valid to call #bdrv_delete
>> >   * immediately after #block_job_cancel_sync.  Users of block jobs
> I still don't agree with the s/Async/Sync/, in my opinion it contradicts
> the rest of the comment. If it did cancel the job synchronously, then
> the job would be immediately completed, and there would be no need to
> wait for a quiescent state nor would the completion callback occur later.

Now that I read it again, the comment is obsolete.

block_job_cancel_sync stalls until block_job_cancel_cb is called, and
that calls the completion callback.

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 03/45] block: fix documentation of block_job_cancel_sync
  2012-09-27 12:08     ` Paolo Bonzini
@ 2012-09-27 12:13       ` Kevin Wolf
  0 siblings, 0 replies; 102+ messages in thread
From: Kevin Wolf @ 2012-09-27 12:13 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, qemu-devel

On 27.09.2012 14:08, Paolo Bonzini wrote:
> On 27/09/2012 14:03, Kevin Wolf wrote:
>>>>  /**
>>>> - * block_job_cancel:
>>>> + * block_job_cancel_sync:
>>>>   * @job: The job to be canceled.
>>>>   *
>>>> - * Asynchronously cancel the job and wait for it to reach a quiescent
>>>> + * Synchronously cancel the job and wait for it to reach a quiescent
>>>>   * state.  Note that the completion callback will still be called
>>>>   * asynchronously, hence it is *not* valid to call #bdrv_delete
>>>>   * immediately after #block_job_cancel_sync.  Users of block jobs
>> I still don't agree with the s/Async/Sync/, in my opinion it contradicts
>> the rest of the comment. If it did cancel the job synchronously, then
>> the job would be immediately completed, and there would be no need to
>> wait for a quiescent state nor would the completion callback occur later.
> 
> Now that I read it again, the comment is obsolete.
> 
> block_job_cancel_sync stalls until block_job_cancel_cb is called, and
> that calls the completion callback.

Okay. Best you rephrase the whole comment then instead of changing just
one word.

Kevin

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 06/45] block: add support for job pause/resume
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 06/45] block: add support for job pause/resume Paolo Bonzini
  2012-09-26 17:31   ` Eric Blake
@ 2012-09-27 12:18   ` Kevin Wolf
  2012-09-27 12:27     ` Paolo Bonzini
  1 sibling, 1 reply; 102+ messages in thread
From: Kevin Wolf @ 2012-09-27 12:18 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, qemu-devel

On 26.09.2012 17:56, Paolo Bonzini wrote:
> Job pausing reuses the existing support for cancellable sleeps.  A pause
> happens at the next sleeping point and lasts until the coroutine is
> re-entered explicitly.  Cancellation was already doing a forced resume,
> so implement it explicitly in terms of resume.
> 
> Paused jobs cannot be canceled without first resuming them.  This ensures
> that I/O errors are never missed by management.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

I think there's a problem with terminology at least. What does "paused"
really mean? Is it that the job has been requested to pause, or that it
has actually yielded and is inactive?

The commit message seems to use the latter semantics (which I would
consider the intuitive one), the QMP documentation leaves it unclear,
but the code actually implements the former semantics.

Kevin

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 06/45] block: add support for job pause/resume
  2012-09-27 12:18   ` Kevin Wolf
@ 2012-09-27 12:27     ` Paolo Bonzini
  2012-09-27 12:45       ` Kevin Wolf
  0 siblings, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-27 12:27 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, qemu-devel

On 27/09/2012 14:18, Kevin Wolf wrote:
>> > 
>> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> I think there's a problem with terminology at least. What does "paused"
> really mean? Is it that the job has been requested to pause, or that it
> has actually yielded and is inactive?
> 
> The commit message seems to use the latter semantics (which I would
> consider the intuitive one),

You mean this: "Paused jobs cannot be canceled without first resuming
them".  I can add a specification, like "(even if the job actually has
not reached the sleeping point and thus is still running)".

> the QMP documentation leaves it unclear,
> but the code actually implements the former semantics.

This code comment is clear:

    /**
     * Set to true if the job is either paused, or will pause itself
     * as soon as possible (if busy == true).
     */
    bool paused;

but this one can indeed use some improvement.

/**
 * block_job_is_paused:
 * @job: The job being queried.
 *
 * Returns whether the job is currently paused.
 */
bool block_job_is_paused(BlockJob *job);


From the QMP client's point of view it doesn't really matter, does it?

- even after a job that writes to disk X has "really" paused, you cannot
read or write disk X.  It's still owned by QEMU, it hasn't been flushed,
it may play games like lazy refcounts.

- what matters is that a resume undoes a pause, even if it is still
pending (which it does).

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 09/45] block: rename block_job_complete to block_job_completed
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 09/45] block: rename block_job_complete to block_job_completed Paolo Bonzini
@ 2012-09-27 12:30   ` Kevin Wolf
  2012-09-27 20:31     ` Jeff Cody
  0 siblings, 1 reply; 102+ messages in thread
From: Kevin Wolf @ 2012-09-27 12:30 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, qemu-devel

On 26.09.2012 17:56, Paolo Bonzini wrote:
> The imperative will be used for the QMP command.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

I would still be glad if we found a better name. Having two functions
block_job_complete() and block_job_completed() sounds like a great
source for confusion.

Kevin

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 06/45] block: add support for job pause/resume
  2012-09-27 12:27     ` Paolo Bonzini
@ 2012-09-27 12:45       ` Kevin Wolf
  2012-09-27 12:57         ` Paolo Bonzini
  0 siblings, 1 reply; 102+ messages in thread
From: Kevin Wolf @ 2012-09-27 12:45 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, qemu-devel

On 27.09.2012 14:27, Paolo Bonzini wrote:
> On 27/09/2012 14:18, Kevin Wolf wrote:
>>>>
>>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> I think there's a problem with terminology at least. What does "paused"
>> really mean? Is it that the job has been requested to pause, or that it
>> has actually yielded and is inactive?
>>
>> The commit message seems to use the latter semantics (which I would
>> consider the intuitive one),
> 
> You mean this: "Paused jobs cannot be canceled without first resuming
> them".  I can add a specification, like "(even if the job actually has
> not reached the sleeping point and thus is still running)".

I actually meant "pause happens at the next sleeping point", which isn't
unspecific at all.

>> the QMP documentation leaves it unclear,
>> but the code actually implements the former semantics.
> 
> This code comment is clear:
> 
>     /**
>      * Set to true if the job is either paused, or will pause itself
>      * as soon as possible (if busy == true).
>      */
>     bool paused;

Yes, this one is a good and clear comment (and possibly I wouldn't even
have noticed without this comment)

> but this one can indeed use some improvement.
> 
> /**
>  * block_job_is_paused:
>  * @job: The job being queried.
>  *
>  * Returns whether the job is currently paused.
>  */
> bool block_job_is_paused(BlockJob *job);
> 
> 
> From the QMP client's point of view it doesn't really matter, does it?
> 
> - even after a job that writes to disk X has "really" paused, you cannot
> read or write disk X.  It's still owned by QEMU, it hasn't been flushed,
> it may play games like lazy refcounts.

I'm not sure about this one. Consider things like a built-in NBD server.
Probably we'll find more cases in the future, where some monitor command
might seem to be safe while a job is paused.

It makes me nervous that clients could make assumptions based on the
paused state despite having no way to make sure that a job is actually
stopped - the documentation doesn't even tell them about the fact that
"paused" doesn't really mean what they think it means.

> - what matters is that a resume undoes a pause, even if it is still
> pending (which it does).

Agreed, this part looks okay.

Kevin

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 06/45] block: add support for job pause/resume
  2012-09-27 12:45       ` Kevin Wolf
@ 2012-09-27 12:57         ` Paolo Bonzini
  2012-09-27 13:51           ` Kevin Wolf
  0 siblings, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-27 12:57 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, qemu-devel

On 27/09/2012 14:45, Kevin Wolf wrote:
> On 27.09.2012 14:27, Paolo Bonzini wrote:
>> On 27/09/2012 14:18, Kevin Wolf wrote:
>>>>>
>>>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> I think there's a problem with terminology at least. What does "paused"
>>> really mean? Is it that the job has been requested to pause, or that it
>>> has actually yielded and is inactive?
>>>
>>> The commit message seems to use the latter semantics (which I would
>>> consider the intuitive one),
>>
>> You mean this: "Paused jobs cannot be canceled without first resuming
>> them".  I can add a specification, like "(even if the job actually has
>> not reached the sleeping point and thus is still running)".
> 
> I actually meant "pause happens at the next sleeping point", which isn't
> unspecific at all.

Hmm, there are two aspects: 1) when things stop running; 2) when the job
reports itself to be paused.  The commit message describes (1)
precisely, and doesn't say anything about (2).  That's too specific for
a commit message, but the header file describes it precisely.

However, in the QMP documentation, the good comment for "bool paused;"
must be replicated in BlockJobInfo's "paused" member.

>> From the QMP client's point of view it doesn't really matter, does it?
>>
>> - even after a job that writes to disk X has "really" paused, you cannot
>> read or write disk X.  It's still owned by QEMU, it hasn't been flushed,
>> it may play games like lazy refcounts.
> 
> I'm not sure about this one. Consider things like a built-in NBD server.
> Probably we'll find more cases in the future, where some monitor command
> might seem to be safe while a job is paused.

Ok, that's a good point.  I'll add a "busy" member to BlockJobInfo.

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 14/45] block: introduce block job error
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 14/45] block: introduce block job error Paolo Bonzini
  2012-09-26 19:10   ` Eric Blake
@ 2012-09-27 13:41   ` Kevin Wolf
  2012-09-27 14:50     ` Paolo Bonzini
  1 sibling, 1 reply; 102+ messages in thread
From: Kevin Wolf @ 2012-09-27 13:41 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, qemu-devel

On 26.09.2012 17:56, Paolo Bonzini wrote:
> The following behaviors are possible:
> 
> 'report': The behavior is the same as in 1.1.  An I/O error,
> respectively during a read or a write, will complete the job immediately
> with an error code.
> 
> 'ignore': An I/O error, respectively during a read or a write, will be
> ignored.  For streaming, the job will complete with an error and the
> backing file will be left in place.  For mirroring, the sector will be
> marked again as dirty and re-examined later.
> 
> 'stop': The job will be paused and the job iostatus will be set to
> failed or nospace, while the VM will keep running.  This can only be
> specified if the block device has rerror=stop and werror=stop or enospc.

Why is that? I don't see the dependency on rerror/werror in the code,
and documentation doesn't mention it either.

> 'enospc': Behaves as 'stop' for ENOSPC errors, 'report' for others.
> 
> In all cases, even for 'report', the I/O error is reported as a QMP
> event BLOCK_JOB_ERROR, with the same arguments as BLOCK_IO_ERROR.
> 
> It is possible that while stopping the VM a BLOCK_IO_ERROR event will be
> reported and will clobber the event from BLOCK_JOB_ERROR, or vice versa.
> This is not really avoidable since stopping the VM completes all pending
> I/O requests.  In fact, it is already possible now that a series of
> BLOCK_IO_ERROR events are reported with rerror=stop, because vm_stop
> calls bdrv_drain_all and this can generate further errors.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>         v1->v2: introduced block_job_iostatus_reset.  Removed sorting
>         of iostatus values with "failed" overriding "nospace" but not
>         vice versa.  Documented that block-job-resume clears the
>         iostatus field.  Always set errors on the block job even if
>         they happen on the target; this removes the need to expose
>         the target's BlockInfo in "query-blockjobs".

> +BlockErrorAction block_job_error_action(BlockJob *job, BlockDriverState *bs,
> +                                        BlockdevOnError on_err,
> +                                        int is_read, int error)
> +{
> +    BlockErrorAction action;
> +
> +    switch (on_err) {
> +    case BLOCKDEV_ON_ERROR_ENOSPC:
> +        action = (error == ENOSPC) ? BDRV_ACTION_STOP : BDRV_ACTION_REPORT;
> +        break;
> +    case BLOCKDEV_ON_ERROR_STOP:
> +        action = BDRV_ACTION_STOP;
> +        break;
> +    case BLOCKDEV_ON_ERROR_REPORT:
> +        action = BDRV_ACTION_REPORT;
> +        break;
> +    case BLOCKDEV_ON_ERROR_IGNORE:
> +        action = BDRV_ACTION_IGNORE;
> +        break;
> +    default:
> +        abort();
> +    }

Isn't this a duplication of bdrv_get_error_action()?

Kevin

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 06/45] block: add support for job pause/resume
  2012-09-27 12:57         ` Paolo Bonzini
@ 2012-09-27 13:51           ` Kevin Wolf
  0 siblings, 0 replies; 102+ messages in thread
From: Kevin Wolf @ 2012-09-27 13:51 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, qemu-devel

On 27.09.2012 14:57, Paolo Bonzini wrote:
> On 27/09/2012 14:45, Kevin Wolf wrote:
>> On 27.09.2012 14:27, Paolo Bonzini wrote:
>>> On 27/09/2012 14:18, Kevin Wolf wrote:
>>>>>>
>>>>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>>> I think there's a problem with terminology at least. What does "paused"
>>>> really mean? Is it that the job has been requested to pause, or that it
>>>> has actually yielded and is inactive?
>>>>
>>>> The commit message seems to use the latter semantics (which I would
>>>> consider the intuitive one),
>>>
>>> You mean this: "Paused jobs cannot be canceled without first resuming
>>> them".  I can add a specification, like "(even if the job actually has
>>> not reached the sleeping point and thus is still running)".
>>
>> I actually meant "pause happens at the next sleeping point", which isn't
>> unspecific at all.
> 
> Hmm, there are two aspects: 1) when things stop running; 2) when the job
> reports itself to be paused.  The commit message describes (1)
> precisely, and doesn't say anything about (2).  That's too specific for
> a commit message, but the header file describes it precisely.

Yes, I understood that, I just found it confusing that both were called
"paused" in different contexts.

> However, in the QMP documentation, the good comment for "bool paused;"
> must be replicated in BlockJobInfo's "paused" member.
> 
>>> From the QMP client's point of view it doesn't really matter, does it?
>>>
>>> - even after a job that writes to disk X has "really" paused, you cannot
>>> read or write disk X.  It's still owned by QEMU, it hasn't been flushed,
>>> it may play games like lazy refcounts.
>>
>> I'm not sure about this one. Consider things like a built-in NBD server.
>> Probably we'll find more cases in the future, where some monitor command
>> might seem to be safe while a job is paused.
> 
> Ok, that's a good point.  I'll add a "busy" member to BlockJobInfo.

Ok, thanks. Together with the comment from the bool paused field this
should make pretty clear what clients would have to check for.
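
As a rough sketch of that check: the struct below is a hypothetical
mirror of the two fields being discussed, not the real BlockJobInfo, and
it assumes that "busy" stays true for as long as the job coroutine has
not yet reached its sleeping point:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical client-side view of the two flags under discussion. */
struct SketchJobInfo {
    bool paused;  /* pause requested, or the job is already paused */
    bool busy;    /* assumed: job has not yet yielded */
};

/* True only when a pause was requested and the job has actually stopped. */
static bool sketch_job_is_quiescent(const struct SketchJobInfo *info)
{
    assert(info != NULL);
    return info->paused && !info->busy;
}
```

A client that sees paused == true but busy == true would then know the
pause is still pending and that it should not treat the job as stopped.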

Kevin

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3
  2012-09-26 15:56 [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Paolo Bonzini
                   ` (44 preceding siblings ...)
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 45/45] mirror: support arbitrarily-sized iterations Paolo Bonzini
@ 2012-09-27 14:05 ` Kevin Wolf
  2012-09-27 14:57   ` Paolo Bonzini
  45 siblings, 1 reply; 102+ messages in thread
From: Kevin Wolf @ 2012-09-27 14:05 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, qemu-devel

On 26.09.2012 17:56, Paolo Bonzini wrote:
> Hi all, this is the resubmission of my block job patches, originally
> meant for 1.2.  This still does not include a persistent dirty bitmap,
> which I hope to post in October.
> 
> The patches are organized as follows:
> 
> 01-13   preparatory work for block job errors, including support for
>         pausing and resuming jobs
> 
> 14-18   introduce block job errors, and add support in block-stream

Completed review of patches 1-18 now. Do you think it would make sense
to split them off and do a v3 while the review goes on for the rest of
the series? Dealing with huge series is hard for me as a reviewer, and
probably for you as an author as well.

> 19-25   preparatory work for block mirroring: new commands/concepts
>         and creating new functions out of existing code.
> 
> 26-33   introduce a simple version of mirroring.  The initial patch
>         adds the mirroring logic, followed by the ability to switch to
>         the destination of migration and to handle errors during the job.
>         All these changes come with testcases.  Removing the ability to
>         query the target file is the main change from v1.

We can probably make a similar cut at patch 33 if necessary.

> 34-41   These patches introduce the first optimizations, namely supporting
>         an arbitrary granularity for the dirty bitmap.  The current default,
>         1M, is too coarse to let the job converge quickly and in almost
>         real-time.  These patches reimplement the block device dirty bitmap
>         to allow efficient iteration, and add cluster copy-on-write logic.
>         Cluster copy-on-write is needed because management will want to
>         start the copy before the backing file is in place in the destination;
>         if mirroring takes care of copy-on-write, BDRV_O_NO_BACKING can be
>         used even if the granularity is smaller than the cluster size.
> 
> 42-45   A second round of optimizations, replacing serialized read-write
>         operations with multiple asynchronous I/O operations.  The various
>         in-flight operations can be of arbitrary size.  The initial copy
>         will end up reading large chunks sequentially (10M by default),
>         while subsequent passes can mimic more closely the guest's I/O
>         patterns.

Kevin

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 14/45] block: introduce block job error
  2012-09-27 13:41   ` Kevin Wolf
@ 2012-09-27 14:50     ` Paolo Bonzini
  0 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-27 14:50 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, qemu-devel

On 27/09/2012 15:41, Kevin Wolf wrote:
>> > +BlockErrorAction block_job_error_action(BlockJob *job, BlockDriverState *bs,
>> > +                                        BlockdevOnError on_err,
>> > +                                        int is_read, int error)
>> > +{
>> > +    BlockErrorAction action;
>> > +
>> > +    switch (on_err) {
>> > +    case BLOCKDEV_ON_ERROR_ENOSPC:
>> > +        action = (error == ENOSPC) ? BDRV_ACTION_STOP : BDRV_ACTION_REPORT;
>> > +        break;
>> > +    case BLOCKDEV_ON_ERROR_STOP:
>> > +        action = BDRV_ACTION_STOP;
>> > +        break;
>> > +    case BLOCKDEV_ON_ERROR_REPORT:
>> > +        action = BDRV_ACTION_REPORT;
>> > +        break;
>> > +    case BLOCKDEV_ON_ERROR_IGNORE:
>> > +        action = BDRV_ACTION_IGNORE;
>> > +        break;
>> > +    default:
>> > +        abort();
>> > +    }
> Isn't this a duplication of bdrv_get_error_action()?

bdrv_get_error_action() has this:

BlockdevOnError on_err = is_read ? bs->on_read_error : bs->on_write_error;

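It could use some refactoring to share the switch statement, but it's
not a direct replacement: bdrv_get_error_action() picks on_err from the
BlockDriverState, while the job version takes it as a parameter.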
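
For illustration only, a minimal sketch of how the switch could be shared
between the two callers.  The types below are reduced stand-ins for the
real QEMU definitions, and the helper names (on_error_to_action,
device_error_action, job_error_action) are hypothetical; nothing here is
taken from the tree.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in enums; the real ones live in QEMU's generated and block headers. */
typedef enum {
    BLOCKDEV_ON_ERROR_REPORT,
    BLOCKDEV_ON_ERROR_IGNORE,
    BLOCKDEV_ON_ERROR_ENOSPC,
    BLOCKDEV_ON_ERROR_STOP,
} BlockdevOnError;

typedef enum {
    BDRV_ACTION_REPORT,
    BDRV_ACTION_IGNORE,
    BDRV_ACTION_STOP,
} BlockErrorAction;

/* Stand-in holding only the two policy fields used in this sketch. */
typedef struct {
    BlockdevOnError on_read_error;
    BlockdevOnError on_write_error;
} BlockDriverState;

/* Hypothetical shared helper: maps a policy plus an errno to an action. */
BlockErrorAction on_error_to_action(BlockdevOnError on_err, int error)
{
    switch (on_err) {
    case BLOCKDEV_ON_ERROR_ENOSPC:
        return (error == ENOSPC) ? BDRV_ACTION_STOP : BDRV_ACTION_REPORT;
    case BLOCKDEV_ON_ERROR_STOP:
        return BDRV_ACTION_STOP;
    case BLOCKDEV_ON_ERROR_REPORT:
        return BDRV_ACTION_REPORT;
    case BLOCKDEV_ON_ERROR_IGNORE:
        return BDRV_ACTION_IGNORE;
    default:
        abort();
    }
}

/* Device path: the policy is looked up in the BlockDriverState. */
BlockErrorAction device_error_action(const BlockDriverState *bs,
                                     bool is_read, int error)
{
    assert(bs != NULL);
    return on_error_to_action(is_read ? bs->on_read_error
                                      : bs->on_write_error, error);
}

/* Job path: the policy is passed in by the job itself. */
BlockErrorAction job_error_action(BlockdevOnError on_err, int error)
{
    return on_error_to_action(on_err, error);
}
```

With this split, the only difference between the two entry points is where
on_err comes from.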

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3
  2012-09-27 14:05 ` [Qemu-devel] [PATCH v2 00/45] Block job improvements for 1.3 Kevin Wolf
@ 2012-09-27 14:57   ` Paolo Bonzini
  0 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-27 14:57 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, qemu-devel

On 27/09/2012 16:05, Kevin Wolf wrote:
>> > 01-13   preparatory work for block job errors, including support for
>> >         pausing and resuming jobs
>> > 
>> > 14-18   introduce block job errors, and add support in block-stream
> Completed review of patches 1-18 now. Do you think it would make sense
> to split them off and do a v3 while the review goes on for the rest of
> the series? Dealing with huge series is hard for me as a reviewer, and
> probably for you as an author as well.

Definitely.

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 21/45] block: add bdrv_open_backing_file
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 21/45] block: add bdrv_open_backing_file Paolo Bonzini
@ 2012-09-27 18:14   ` Jeff Cody
  0 siblings, 0 replies; 102+ messages in thread
From: Jeff Cody @ 2012-09-27 18:14 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, qemu-devel

On 09/26/2012 11:56 AM, Paolo Bonzini wrote:
> Mirroring runs without the backing file so that it can be copied outside
> QEMU.  However, we need to add it at the time the job is completed and
> QEMU switches to the target.  Factor out the common bits of opening an
> image and completing a mirroring operation.
> 
> The new function does not assume that the file is closed immediately after
> it returns failure, so it keeps the BDRV_O_NO_BACKING flag up-to-date.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>         v1->v2: do not close bs if the function fails (brain fart).
> 
>  block.c | 62 ++++++++++++++++++++++++++++++++++++++++++++------------------
>  block.h |  1 +
>  2 files changed, 45 insertions(+), 18 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 703261d..6ee7052 100644
> --- a/block.c
> +++ b/block.c
> @@ -734,6 +734,48 @@ int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags)
>      return 0;
>  }
>  
> +int bdrv_open_backing_file(BlockDriverState *bs)
> +{
> +    char backing_filename[PATH_MAX];
> +    int back_flags, ret;
> +    BlockDriver *back_drv = NULL;
> +
> +    if (bs->backing_hd != NULL) {
> +        return 0;
> +    }
> +
> +    bs->open_flags &= ~BDRV_O_NO_BACKING;
> +    if (bs->backing_file[0] == '\0') {
> +        return 0;
> +    }
> +
> +    bs->backing_hd = bdrv_new("");
> +    bdrv_get_full_backing_filename(bs, backing_filename,
> +                                   sizeof(backing_filename));
> +
> +    if (bs->backing_format[0] != '\0') {
> +        back_drv = bdrv_find_format(bs->backing_format);
> +    }
> +
> +    /* backing files always opened read-only */
> +    back_flags = bs->open_flags & ~(BDRV_O_RDWR | BDRV_O_SNAPSHOT);
> +
> +    ret = bdrv_open(bs->backing_hd, backing_filename, back_flags, back_drv);
> +    if (ret < 0) {
> +        bdrv_delete(bs->backing_hd);
> +        bs->backing_hd = NULL;
> +        bs->open_flags |= BDRV_O_NO_BACKING;
> +        return ret;
> +    }
> +    if (bs->is_temporary) {
> +        bs->backing_hd->keep_read_only = !(bs->open_flags & BDRV_O_RDWR);
> +    } else {
> +        /* base images use the same setting as leaf */
> +        bs->backing_hd->keep_read_only = bs->keep_read_only;
> +    }

The bs->keep_read_only flag no longer exists.  I think you can safely delete
the above 6 lines: BDRV_O_ALLOW_RDWR is now used instead, and will be
pulled in from bs->open_flags (see commits be028ad and dc1c13d).
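
To illustrate the point about flags being inherited, here is a small
sketch of the masking done for the backing file.  The SKETCH_O_* names and
their bit values are placeholders chosen for this example, not the real
BDRV_O_* definitions, and backing_open_flags() is a hypothetical helper.

```c
#include <assert.h>

/* Placeholder flag bits; the values are illustrative assumptions. */
enum {
    SKETCH_O_RDWR       = 1 << 0,
    SKETCH_O_SNAPSHOT   = 1 << 1,
    SKETCH_O_NO_BACKING = 1 << 2,
    SKETCH_O_ALLOW_RDWR = 1 << 3,
};

/*
 * Derive the flags for opening a backing file from the flags of the
 * image on top of it.  The backing file is opened read-only and never
 * as a snapshot; every other bit, including the "allow read-write
 * later" bit, is passed through unchanged.
 */
int backing_open_flags(int open_flags)
{
    int cleared = SKETCH_O_RDWR | SKETCH_O_SNAPSHOT | SKETCH_O_NO_BACKING;

    assert(open_flags >= 0);
    return open_flags & ~cleared;
}
```

Because the allow-read-write bit is not part of the mask, no separate
per-image field is needed to carry it down the backing chain.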

> +    return 0;
> +}
> +
>  /*
>   * Opens a disk image (raw, qcow2, vmdk, ...)
>   */
> @@ -821,24 +863,8 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>      }
>  
>      /* If there is a backing file, use it */
> -    if ((flags & BDRV_O_NO_BACKING) == 0 && bs->backing_file[0] != '\0') {
> -        char backing_filename[PATH_MAX];
> -        int back_flags;
> -        BlockDriver *back_drv = NULL;
> -
> -        bs->backing_hd = bdrv_new("");
> -        bdrv_get_full_backing_filename(bs, backing_filename,
> -                                       sizeof(backing_filename));
> -
> -        if (bs->backing_format[0] != '\0') {
> -            back_drv = bdrv_find_format(bs->backing_format);
> -        }
> -
> -        /* backing files always opened read-only */
> -        back_flags =
> -            flags & ~(BDRV_O_RDWR | BDRV_O_SNAPSHOT | BDRV_O_NO_BACKING);
> -
> -        ret = bdrv_open(bs->backing_hd, backing_filename, back_flags, back_drv);
> +    if ((flags & BDRV_O_NO_BACKING) == 0) {
> +        ret = bdrv_open_backing_file(bs);
>          if (ret < 0) {
>              bdrv_close(bs);
>              return ret;
> diff --git a/block.h b/block.h
> index aa1121a..08479e1 100644
> --- a/block.h
> +++ b/block.h
> @@ -133,6 +133,7 @@ void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top);
>  void bdrv_delete(BlockDriverState *bs);
>  int bdrv_parse_cache_flags(const char *mode, int *flags);
>  int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags);
> +int bdrv_open_backing_file(BlockDriverState *bs);
>  int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>                BlockDriver *drv);
>  BlockReopenQueue *bdrv_reopen_queue(BlockReopenQueue *bs_queue,
> 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 27/45] qmp: add drive-mirror command
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 27/45] qmp: add drive-mirror command Paolo Bonzini
  2012-09-27  0:14   ` Eric Blake
@ 2012-09-27 19:49   ` Jeff Cody
  2012-10-15 17:33   ` Kevin Wolf
  2 siblings, 0 replies; 102+ messages in thread
From: Jeff Cody @ 2012-09-27 19:49 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, qemu-devel

On 09/26/2012 11:56 AM, Paolo Bonzini wrote:
> This adds the monitor commands that start the mirroring job.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  blockdev.c       | 125 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  hmp-commands.hx  |  21 ++++++++++
>  hmp.c            |  28 +++++++++++++
>  hmp.h            |   1 +
>  qapi-schema.json |  33 +++++++++++++++
>  qmp-commands.hx  |  42 +++++++++++++++++++
>  6 files changed, 249 insertions(+), 1 deletion(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 9069ca1..722aab5 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -1118,6 +1117,130 @@ void qmp_block_stream(const char *device, bool has_base,
>      trace_qmp_block_stream(bs, bs->job);
>  }
>  
> +void qmp_drive_mirror(const char *device, const char *target,
> +                      bool has_format, const char *format,
> +                      enum MirrorSyncMode sync,
> +                      bool has_mode, enum NewImageMode mode,
> +                      bool has_speed, int64_t speed, Error **errp)
> +{
> +    BlockDriverInfo bdi;
> +    BlockDriverState *bs;
> +    BlockDriverState *source, *target_bs;
> +    BlockDriver *proto_drv;
> +    BlockDriver *drv = NULL;
> +    Error *local_err = NULL;
> +    int flags;
> +    uint64_t size;
> +    int ret;
> +
> +    if (!has_speed) {
> +        speed = 0;
> +    }
> +    if (!has_mode) {
> +        mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
> +    }
> +
> +    bs = bdrv_find(device);
> +    if (!bs) {
> +        error_set(errp, QERR_DEVICE_NOT_FOUND, device);
> +        return;
> +    }
> +
> +    if (!has_format) {
> +        format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : bs->drv->format_name;
> +    }
> +    if (format) {
> +        drv = bdrv_find_format(format);
> +        if (!drv) {
> +            error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
> +            return;
> +        }
> +    }
> +
> +    if (!bdrv_is_inserted(bs)) {
> +        error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
> +        return;
> +    }
> +
> +    if (bdrv_in_use(bs)) {
> +        error_set(errp, QERR_DEVICE_IN_USE, device);
> +        return;
> +    }
> +
> +    flags = bs->open_flags | BDRV_O_RDWR;
> +    source = bs->backing_hd;
> +    if (!source && sync == MIRROR_SYNC_MODE_TOP) {
> +        sync = MIRROR_SYNC_MODE_FULL;
> +    }
> +
> +    proto_drv = bdrv_find_protocol(target);
> +    if (!proto_drv) {
> +        error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
> +        return;
> +    }
> +
> +    if (sync == MIRROR_SYNC_MODE_FULL && mode != NEW_IMAGE_MODE_EXISTING) {
> +        /* create new image w/o backing file */
> +        assert(format && drv);
> +        bdrv_get_geometry(bs, &size);
> +        size *= 512;
> +        ret = bdrv_img_create(target, format,
> +                              NULL, NULL, NULL, size, flags);
> +    } else {
> +        switch (mode) {
> +        case NEW_IMAGE_MODE_EXISTING:
> +            ret = 0;
> +            break;
> +        case NEW_IMAGE_MODE_ABSOLUTE_PATHS:
> +            /* create new image with backing file */
> +            ret = bdrv_img_create(target, format,
> +                                  source->filename,
> +                                  source->drv->format_name,

Should we assert(source->drv != NULL)?  Or, alternatively, use
bdrv_get_format_name(source) here.
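
A rough sketch of the two options, using reduced stand-in structs rather
than the real QEMU types.  Both helper names are hypothetical, and
format_name_or_null() is not meant to reproduce the signature of
bdrv_get_format_name().

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins with only the fields this sketch needs. */
typedef struct {
    const char *format_name;
} BlockDriver;

typedef struct {
    BlockDriver *drv;
} BlockDriverState;

/* Option 1: treat a missing driver as a programming error. */
const char *format_name_checked(const BlockDriverState *bs)
{
    assert(bs != NULL && bs->drv != NULL);
    return bs->drv->format_name;
}

/* Option 2: tolerate a missing driver and let the caller see NULL. */
const char *format_name_or_null(const BlockDriverState *bs)
{
    if (bs == NULL || bs->drv == NULL) {
        return NULL;
    }
    return bs->drv->format_name;
}
```

The second form only helps if the callee accepts a NULL format name;
whether bdrv_img_create() does is not something this sketch assumes.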

> +                                  NULL, -1, flags);
> +            break;
> +        default:
> +            abort();
> +        }
> +    }
> +
> +    if (ret) {
> +        error_set(errp, QERR_OPEN_FILE_FAILED, target);
> +        return;
> +    }
> +
> +    target_bs = bdrv_new("");
> +    ret = bdrv_open(target_bs, target, flags | BDRV_O_NO_BACKING, drv);
> +
> +    if (ret < 0) {
> +        bdrv_delete(target_bs);
> +        error_set(errp, QERR_OPEN_FILE_FAILED, target);
> +        return;
> +    }
> +
> +    /* We need a backing file if we will copy parts of a cluster.  */
> +    if (bdrv_get_info(target_bs, &bdi) >= 0 && bdi.cluster_size != 0 &&
> +        bdi.cluster_size >= BDRV_SECTORS_PER_DIRTY_CHUNK * 512) {
> +        ret = bdrv_open_backing_file(target_bs);
> +        if (ret < 0) {
> +            bdrv_delete(target_bs);
> +            error_set(errp, QERR_OPEN_FILE_FAILED, target);
> +            return;
> +        }
> +    }
> +
> +    mirror_start(bs, target_bs, speed, sync, block_job_cb, bs, &local_err);
> +    if (local_err != NULL) {
> +        bdrv_delete(target_bs);
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
> +    /* Grab a reference so hotplug does not delete the BlockDriverState from
> +     * underneath us.
> +     */
> +    drive_get_ref(drive_get_by_blockdev(bs));
> +}
> +
>  static BlockJob *find_block_job(const char *device)
>  {
>      BlockDriverState *bs;
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index 4e52436..9ac4cf6 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -1006,6 +1006,27 @@ Snapshot device, using snapshot file as target if provided
>  ETEXI
>  
>      {
> +        .name       = "drive_mirror",
> +        .args_type  = "reuse:-n,full:-f,device:B,target:s,format:s?",
> +        .params     = "[-n] [-f] device target [format]",
> +        .help       = "initiates live storage\n\t\t\t"
> +                      "migration for a device. The device's contents are\n\t\t\t"
> +                      "copied to the new image file, including data that\n\t\t\t"
> +                      "is written after the command is started.\n\t\t\t"
> +                      "The -n flag requests QEMU to reuse the image found\n\t\t\t"
> +                      "in new-image-file, instead of recreating it from scratch.\n\t\t\t"
> +                      "The -f flag requests QEMU to copy the whole disk,\n\t\t\t"
> +                      "so that the result does not need a backing file.\n\t\t\t",
> +        .mhandler.cmd = hmp_drive_mirror,
> +    },
> +STEXI
> +@item drive_mirror
> +@findex drive_mirror
> +Start mirroring a block device's writes to a new destination,
> +using the specified target.
> +ETEXI
> +
> +    {
>          .name       = "drive_add",
>          .args_type  = "pci_addr:s,opts:s",
>          .params     = "[[<domain>:]<bus>:]<slot>\n"
> diff --git a/hmp.c b/hmp.c
> index 7819110..94d4d41 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -759,6 +759,34 @@ void hmp_block_resize(Monitor *mon, const QDict *qdict)
>      hmp_handle_error(mon, &errp);
>  }
>  
> +void hmp_drive_mirror(Monitor *mon, const QDict *qdict)
> +{
> +    const char *device = qdict_get_str(qdict, "device");
> +    const char *filename = qdict_get_str(qdict, "target");
> +    const char *format = qdict_get_try_str(qdict, "format");
> +    int reuse = qdict_get_try_bool(qdict, "reuse", 0);
> +    int full = qdict_get_try_bool(qdict, "full", 0);
> +    enum NewImageMode mode;
> +    Error *errp = NULL;
> +
> +    if (!filename) {
> +        error_set(&errp, QERR_MISSING_PARAMETER, "target");
> +        hmp_handle_error(mon, &errp);
> +        return;
> +    }
> +
> +    if (reuse) {
> +        mode = NEW_IMAGE_MODE_EXISTING;
> +    } else {
> +        mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
> +    }
> +
> +    qmp_drive_mirror(device, filename, !!format, format,
> +                     full ? MIRROR_SYNC_MODE_FULL : MIRROR_SYNC_MODE_TOP,
> +                     true, mode, false, 0, &errp);
> +    hmp_handle_error(mon, &errp);
> +}
> +
>  void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict)
>  {
>      const char *device = qdict_get_str(qdict, "device");
> diff --git a/hmp.h b/hmp.h
> index 7bdd23c..34eb2b3 100644
> --- a/hmp.h
> +++ b/hmp.h
> @@ -51,6 +51,7 @@ void hmp_block_passwd(Monitor *mon, const QDict *qdict);
>  void hmp_balloon(Monitor *mon, const QDict *qdict);
>  void hmp_block_resize(Monitor *mon, const QDict *qdict);
>  void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict);
> +void hmp_drive_mirror(Monitor *mon, const QDict *qdict);
>  void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
>  void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict);
>  void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict);
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 9ba2f86..4827ed3 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -1529,6 +1529,39 @@
>    'returns': 'str' }
>  
>  ##
> +# @drive-mirror
> +#
> +# Start mirroring a block device's writes to a new destination.
> +#
> +# @device:  the name of the device whose writes should be mirrored.
> +#
> +# @target: the target of the new image. If the file exists, or if it
> +#          is a device, the existing file/device will be used as the new
> +#          destination.  If it does not exist, a new file will be created.
> +#
> +# @format: #optional the format of the new destination, default is to
> +#          probe is @mode is 'existing', else the format of the source
> +#
> +# @mode: #optional whether and how QEMU should create a new image, default is
> +#        'absolute-paths'.
> +#
> +# @speed:  #optional the maximum speed, in bytes per second
> +#
> +# @sync: what parts of the disk image should be copied to the destination
> +#        (all the disk, only the sectors allocated in the topmost image, or
> +#        only new I/O).
> +#
> +# Returns: nothing on success
> +#          If @device is not a valid block device, DeviceNotFound
> +#
> +# Since 1.3
> +##
> +{ 'command': 'drive-mirror',
> +  'data': { 'device': 'str', 'target': 'str', '*format': 'str',
> +            'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
> +            '*speed': 'int' } }
> +
> +##
>  # @migrate_cancel
>  #
>  # Cancel the current executing migration process.
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index 017544e..25800a8 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -906,6 +906,48 @@ Example:
>  EQMP
>  
>      {
> +        .name       = "drive-mirror",
> +        .args_type  = "sync:s,device:B,target:s,speed:i?,mode:s?,format:s?",
> +        .mhandler.cmd_new = qmp_marshal_input_drive_mirror,
> +    },
> +
> +SQMP
> +drive-mirror
> +------------
> +
> +Start mirroring a block device's writes to a new destination. target
> +specifies the target of the new image. If the file exists, or if it is
> +a device, it will be used as the new destination for writes. If does not
> +exist, a new file will be created. format specifies the format of the
> +mirror image, default is to probe if mode='existing', else the format
> +of the source.
> +
> +Arguments:
> +
> +- "device": device name to operate on (json-string)
> +- "target": name of new image file (json-string)
> +- "format": format of new image (json-string, optional)
> +- "mode": how an image file should be created into the target
> +  file/device (NewImageMode, optional, default 'absolute-paths')
> +- "speed": maximum speed of the streaming job, in bytes per second
> +  (json-int)
> +- "sync": what parts of the disk image should be copied to the destination;
> +  possibilities include "full" for all the disk, "top" for only the sectors
> +  allocated in the topmost image, or "none" to only replicate new I/O
> +  (MirrorSyncMode).
> +
> +
> +Example:
> +
> +-> { "execute": "drive-mirror", "arguments": { "device": "ide-hd0",
> +                                               "target": "/some/place/my-image",
> +                                               "sync": "full",
> +                                               "format": "qcow2" } }
> +<- { "return": {} }
> +
> +EQMP
> +
> +    {
>          .name       = "balloon",
>          .args_type  = "value:M",
>          .mhandler.cmd_new = qmp_marshal_input_balloon,
> 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 09/45] block: rename block_job_complete to block_job_completed
  2012-09-27 12:30   ` Kevin Wolf
@ 2012-09-27 20:31     ` Jeff Cody
  2012-09-28 11:00       ` Paolo Bonzini
  0 siblings, 1 reply; 102+ messages in thread
From: Jeff Cody @ 2012-09-27 20:31 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Paolo Bonzini, qemu-devel

On 09/27/2012 08:30 AM, Kevin Wolf wrote:
> On 26.09.2012 17:56, Paolo Bonzini wrote:
>> The imperative will be used for the QMP command.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> 
> I would still be glad if we found a better name. Having two functions
> block_job_complete() and block_job_completed() sounds like a great
> source for confusion.
> 
> Kevin
> 

If I understand correctly, what we have is:

block_job_completed(): cleans up when a job is done
block_job_complete(): requests that a block job be completed

How about renaming both of them, respectively, to:

block_job_cleanup():  cleans up when a job is done
block_job_request_completion(): requests that a block job be completed

I think that would remove any ambiguity.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 09/45] block: rename block_job_complete to block_job_completed
  2012-09-27 20:31     ` Jeff Cody
@ 2012-09-28 11:00       ` Paolo Bonzini
  0 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-09-28 11:00 UTC (permalink / raw)
  To: jcody; +Cc: Kevin Wolf, qemu-devel

On 27/09/2012 22:31, Jeff Cody wrote:
> 
> block_job_completed(): cleans up when a job is done

It also raises a BLOCK_JOB_COMPLETED event.

> block_job_complete(): requests that a block job be completed
> 
> How about renaming both of them, respectively, to:
> 
> block_job_cleanup():  cleans up when a job is done
> block_job_request_completion(): requests that a block job be completed
> 
> I think that would remove any ambiguity.

I'll keep this patch for now, since it doesn't introduce any ambiguity yet.

I could rename block_job_complete to block_job_request_completion as you
suggested, but it would be inconsistent with the QMP command
block-job-complete, so there are downsides as well.

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 19/45] block: add bdrv_query_info
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 19/45] block: add bdrv_query_info Paolo Bonzini
@ 2012-10-15 15:42   ` Kevin Wolf
  0 siblings, 0 replies; 102+ messages in thread
From: Kevin Wolf @ 2012-10-15 15:42 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, qemu-devel

On 26.09.2012 17:56, Paolo Bonzini wrote:
> Extract it out of the implementation of "info block".
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>         v1->v2: moved bdrv_query_info close to qmp_query_block.
>         Fixed conflicts for the new field 'encryption_key_missing'
>         too.
> 
>  block.c | 104 +++++++++++++++++++++++++++++++---------------------------------
>  block.h |   1 +
>  2 files changed, 52 insertions(+), 53 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 83a695b..1d95a5d 100644
> --- a/block.c
> +++ b/block.c
> @@ -2653,69 +2653,67 @@ int coroutine_fn bdrv_co_is_allocated_above(BlockDriverState *top,
>      return 0;
>  }
>  
> +BlockInfo *bdrv_query_info(BlockDriverState *bs)
> +{
> +    BlockInfo *info = g_malloc0(sizeof(*info));
> +    info->device = g_strdup(bs->device_name);
> +    info->type = g_strdup("unknown");
> +    info->locked = bdrv_dev_is_medium_locked(bs);
> +    info->removable = bdrv_dev_has_removable_media(bs);
> +
> +    if (bdrv_dev_has_removable_media(bs)) {
> +        info->has_tray_open = true;
> +        info->tray_open = bdrv_dev_is_tray_open(bs);
> +    }
> +
> +    if (bdrv_iostatus_is_enabled(bs)) {
> +        info->has_io_status = true;
> +        info->io_status = bs->iostatus;
> +    }
> +
> +    if (bs->drv) {
> +        info->has_inserted = true;
> +        info->inserted = g_malloc0(sizeof(*info->inserted));
> +        info->inserted->file = g_strdup(bs->filename);
> +        info->inserted->ro = bs->read_only;
> +        info->inserted->drv = g_strdup(bs->drv->format_name);
> +        info->inserted->encrypted = bs->encrypted;
> +        info->inserted->encryption_key_missing = bdrv_key_required(bs);
> +
> +        if (bs->backing_file[0]) {
> +            info->inserted->has_backing_file = true;
> +            info->inserted->backing_file = g_strdup(bs->backing_file);
> +        }
> +
> +        if (bs->io_limits_enabled) {
> +            info->inserted->bps =
> +                           bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL];
> +            info->inserted->bps_rd =
> +                           bs->io_limits.bps[BLOCK_IO_LIMIT_READ];
> +            info->inserted->bps_wr =
> +                           bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE];
> +            info->inserted->iops =
> +                           bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL];
> +            info->inserted->iops_rd =
> +                           bs->io_limits.iops[BLOCK_IO_LIMIT_READ];
> +            info->inserted->iops_wr =
> +                           bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE];
> +        }
> +    }
> +    return info;
> +}
> +
>  BlockInfoList *qmp_query_block(Error **errp)
>  {
> -    BlockInfoList *head = NULL, *cur_item = NULL;
> +    BlockInfoList *head = NULL, **p_next = &head;
>      BlockDriverState *bs;
>  
>      QTAILQ_FOREACH(bs, &bdrv_states, list) {
>          BlockInfoList *info = g_malloc0(sizeof(*info));
> +        info->value = bdrv_query_info(bs);
>  
> -        info->value = g_malloc0(sizeof(*info->value));
> -        info->value->device = g_strdup(bs->device_name);
> -        info->value->type = g_strdup("unknown");
> -        info->value->locked = bdrv_dev_is_medium_locked(bs);
> -        info->value->removable = bdrv_dev_has_removable_media(bs);
> -
> -        if (bdrv_dev_has_removable_media(bs)) {
> -            info->value->has_tray_open = true;
> -            info->value->tray_open = bdrv_dev_is_tray_open(bs);
> -        }
> -
> -        if (bdrv_iostatus_is_enabled(bs)) {
> -            info->value->has_io_status = true;
> -            info->value->io_status = bs->iostatus;
> -        }
> -
> -        if (bs->drv) {
> -            info->value->has_inserted = true;
> -            info->value->inserted = g_malloc0(sizeof(*info->value->inserted));
> -            info->value->inserted->file = g_strdup(bs->filename);
> -            info->value->inserted->ro = bs->read_only;
> -            info->value->inserted->drv = g_strdup(bs->drv->format_name);
> -            info->value->inserted->encrypted = bs->encrypted;
> -            info->value->inserted->encryption_key_missing = bdrv_key_required(bs);
> -            if (bs->backing_file[0]) {
> -                info->value->inserted->has_backing_file = true;
> -                info->value->inserted->backing_file = g_strdup(bs->backing_file);
> -            }
> -
> -            info->value->inserted->backing_file_depth =
> -                bdrv_get_backing_file_depth(bs);

Mismerge: This part is missing from bdrv_query_info.

Kevin

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 23/45] block: export dirty bitmap information in query-block
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 23/45] block: export dirty bitmap information in query-block Paolo Bonzini
@ 2012-10-15 16:08   ` Kevin Wolf
  0 siblings, 0 replies; 102+ messages in thread
From: Kevin Wolf @ 2012-10-15 16:08 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, qemu-devel

On 26.09.2012 17:56, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>         v1->v2: new
> 
>  block.c          |  6 ++++++
>  qapi-schema.json | 20 ++++++++++++++++++--
>  2 files changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 2c1273c..074325d 100644
> --- a/block.c
> +++ b/block.c
> @@ -2697,6 +2697,12 @@ BlockInfo *bdrv_query_info(BlockDriverState *bs)
>          info->io_status = bs->iostatus;
>      }
>  
> +    if (bs->dirty_bitmap) {
> +        info->has_dirty = true;
> +        info->dirty = g_malloc0(sizeof(*info->dirty));
> +        info->dirty->count = bdrv_get_dirty_count(bs) * BDRV_SECTORS_PER_DIRTY_CHUNK;
> +    }
> +
>      if (bs->drv) {
>          info->has_inserted = true;
>          info->inserted = g_malloc0(sizeof(*info->inserted));
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 26ac21f..dd418b8 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -604,7 +604,7 @@
>              '*backing_file': 'str', 'backing_file_depth': 'int',
>              'encrypted': 'bool', 'encryption_key_missing': 'bool',
>              'bps': 'int', 'bps_rd': 'int', 'bps_wr': 'int',
> -            'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int'} }
> +            'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int' } }
>  
>  ##
>  # @BlockDeviceIoStatus:
> @@ -622,6 +622,18 @@
>  { 'enum': 'BlockDeviceIoStatus', 'data': [ 'ok', 'failed', 'nospace' ] }
>  
>  ##
> +# @BlockDirtyInfo:
> +#
> +# Block dirty bitmap information.
> +#
> +# @count: number of dirty sectors according to the dirty bitmap
> +#
> +# Since: 1.3
> +##
> +{ 'type': 'BlockDirtyInfo',
> +  'data': {'count': 'int'} }

Can we use bytes instead of arbitrary units of 512 bytes? I don't want
to discuss with people why a sector is 512 bytes here even though all
their virtual disks use a 4k sector size...
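
As a sketch of what reporting bytes would mean, assuming the 1M dirty
chunk mentioned in the cover letter (2048 sectors of 512 bytes).  The
SKETCH_* constants and dirty_chunks_to_bytes() are made up for this
example.

```c
#include <assert.h>
#include <stdint.h>

enum {
    SKETCH_SECTOR_SIZE             = 512,   /* bytes per sector unit */
    SKETCH_SECTORS_PER_DIRTY_CHUNK = 2048,  /* assumed: 1 MiB chunks */
};

/* Convert a count of dirty bitmap chunks into a count of dirty bytes. */
int64_t dirty_chunks_to_bytes(int64_t dirty_chunks)
{
    assert(dirty_chunks >= 0);
    return dirty_chunks * SKETCH_SECTORS_PER_DIRTY_CHUNK * SKETCH_SECTOR_SIZE;
}
```

A byte count stays meaningful if the bitmap granularity or the guest's
sector size changes, which a count in 512-byte units does not.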

Kevin

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v2 26/45] mirror: introduce mirror job
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 26/45] mirror: introduce mirror job Paolo Bonzini
@ 2012-10-15 16:57   ` Kevin Wolf
  2012-10-16  6:36     ` Paolo Bonzini
  0 siblings, 1 reply; 102+ messages in thread
From: Kevin Wolf @ 2012-10-15 16:57 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, qemu-devel

On 26.09.2012 17:56, Paolo Bonzini wrote:
> This patch adds the implementation of a new job that mirrors a disk to
> a new image while letting the guest continue using the old image.
> The target is treated as a "black box" and data is copied from the
> source to the target in the background.  This can be used for several
> purposes, including storage migration, continuous replication, and
> observation of the guest I/O in an external program.  It is also a
> first step in replacing the inefficient block migration code that is
> part of QEMU.
> 
> The job is possibly never-ending, but it is logically structured into
> two phases: 1) copy all data as fast as possible until the target
> first gets in sync with the source; 2) keep target in sync and
> ensure that reopening to the target gets a correct (full) copy
> of the source data.
> 
> The second phase is indicated by the progress in "info block-jobs"
> reporting the current offset to be equal to the length of the file.
> When the job is cancelled in the second phase, QEMU will run the
> job until the source is clean and quiescent, then it will report
> successful completion of the job.
> 
> In other words, the BLOCK_JOB_CANCELLED event means that the target
> may _not_ be consistent with a past state of the source; the
> BLOCK_JOB_COMPLETED event means that the target is consistent with
> a past state of the source.  (Note that it could already happen
> that management lost the race against QEMU and got a completion
> event instead of cancellation).
> 
> It is not yet possible to complete the job and switch over to the target
> disk.  The next patches will fix this and add many refinements to the
> basic idea introduced here.  These include improved error management,
> some tunable knobs and performance optimizations.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>         v1->v2: Always "goto immediate_exit" and similar code cleanups.
>         Error checking for bdrv_flush.  Call bdrv_set_enable_write_cache
>         on the target to make it always writeback.
> 
>  block/Makefile.objs |   2 +-
>  block/mirror.c      | 234 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  block_int.h         |  20 +++++
>  qapi-schema.json    |  17 ++++
>  trace-events        |   7 ++
>  5 files changed, 279 insertions(+), 1 deletion(-)
>  create mode 100644 block/mirror.c
> 
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index c45affc..f1a394a 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -9,4 +9,4 @@ block-obj-$(CONFIG_LIBISCSI) += iscsi.o
>  block-obj-$(CONFIG_CURL) += curl.o
>  block-obj-$(CONFIG_RBD) += rbd.o
>  
> -common-obj-y += stream.o
> +common-obj-y += stream.o mirror.o
> diff --git a/block/mirror.c b/block/mirror.c
> new file mode 100644
> index 0000000..09ea020
> --- /dev/null
> +++ b/block/mirror.c
> @@ -0,0 +1,234 @@
> +/*
> + * Image mirroring
> + *
> + * Copyright Red Hat, Inc. 2012
> + *
> + * Authors:
> + *  Paolo Bonzini  <pbonzini@redhat.com>
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> + * See the COPYING.LIB file in the top-level directory.
> + *
> + */
> +
> +#include "trace.h"
> +#include "blockjob.h"
> +#include "block_int.h"
> +#include "qemu/ratelimit.h"
> +
> +enum {
> +    /*
> +     * Size of data buffer for populating the image file.  This should be large
> +     * enough to process multiple clusters in a single call, so that populating
> +     * contiguous regions of the image is efficient.
> +     */
> +    BLOCK_SIZE = 512 * BDRV_SECTORS_PER_DIRTY_CHUNK, /* in bytes */
> +};
> +
> +#define SLICE_TIME 100000000ULL /* ns */
> +
> +typedef struct MirrorBlockJob {
> +    BlockJob common;
> +    RateLimit limit;
> +    BlockDriverState *target;
> +    MirrorSyncMode mode;
> +    int64_t sector_num;
> +    uint8_t *buf;
> +} MirrorBlockJob;
> +
> +static int coroutine_fn mirror_iteration(MirrorBlockJob *s)
> +{
> +    BlockDriverState *source = s->common.bs;
> +    BlockDriverState *target = s->target;
> +    QEMUIOVector qiov;
> +    int ret, nb_sectors;
> +    int64_t end;
> +    struct iovec iov;
> +
> +    end = s->common.len >> BDRV_SECTOR_BITS;
> +    s->sector_num = bdrv_get_next_dirty(source, s->sector_num);
> +    nb_sectors = MIN(BDRV_SECTORS_PER_DIRTY_CHUNK, end - s->sector_num);
> +    bdrv_reset_dirty(source, s->sector_num, nb_sectors);
> +
> +    /* Copy the dirty cluster.  */
> +    iov.iov_base = s->buf;
> +    iov.iov_len  = nb_sectors * 512;
> +    qemu_iovec_init_external(&qiov, &iov, 1);
> +
> +    trace_mirror_one_iteration(s, s->sector_num, nb_sectors);
> +    ret = bdrv_co_readv(source, s->sector_num, nb_sectors, &qiov);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    return bdrv_co_writev(target, s->sector_num, nb_sectors, &qiov);
> +}
> +
> +static void coroutine_fn mirror_run(void *opaque)
> +{
> +    MirrorBlockJob *s = opaque;
> +    BlockDriverState *bs = s->common.bs;
> +    int64_t sector_num, end;
> +    int ret = 0;
> +    int n;
> +    bool synced = false;
> +
> +    if (block_job_is_cancelled(&s->common)) {
> +        goto immediate_exit;
> +    }
> +
> +    s->common.len = bdrv_getlength(bs);
> +    if (s->common.len < 0) {
> +        block_job_completed(&s->common, s->common.len);
> +        return;
> +    }
> +
> +    end = s->common.len >> BDRV_SECTOR_BITS;
> +    s->buf = qemu_blockalign(bs, BLOCK_SIZE);
> +
> +    if (s->mode != MIRROR_SYNC_MODE_NONE) {
> +        /* First part, loop on the sectors and initialize the dirty bitmap.  */
> +        BlockDriverState *base;
> +        base = s->mode == MIRROR_SYNC_MODE_FULL ? NULL : bs->backing_hd;
> +        for (sector_num = 0; sector_num < end; ) {
> +            int64_t next = (sector_num | (BDRV_SECTORS_PER_DIRTY_CHUNK - 1)) + 1;
> +            ret = bdrv_co_is_allocated_above(bs, base,
> +                                             sector_num, next - sector_num, &n);
> +
> +            if (ret < 0) {
> +                goto immediate_exit;
> +            }
> +
> +            assert(n > 0);
> +            if (ret == 1) {
> +                bdrv_set_dirty(bs, sector_num, n);
> +                sector_num = next;
> +            } else {
> +                sector_num += n;
> +            }
> +        }
> +    }
> +
> +    s->sector_num = -1;
> +    for (;;) {
> +        uint64_t delay_ns;
> +        int64_t cnt;
> +        bool should_complete;
> +
> +        cnt = bdrv_get_dirty_count(bs);
> +        if (cnt != 0) {
> +            ret = mirror_iteration(s);
> +            if (ret < 0) {
> +                goto immediate_exit;
> +            }
> +            cnt = bdrv_get_dirty_count(bs);
> +        }
> +
> +        should_complete = false;
> +        if (cnt == 0) {
> +            trace_mirror_before_flush(s);
> +            if (bdrv_flush(s->target) < 0) {
> +                goto immediate_exit;
> +            }

Are you sure that we should signal successful completion when
bdrv_flush() fails?

> +
> +            /* We're out of the streaming phase.  From now on, if the job
> +             * is cancelled we will actually complete all pending I/O and
> +             * report completion.  This way, block-job-cancel will leave
> +             * the target in a consistent state.
> +             */

Don't we have block_job_complete() for that now? Then I think the job
can be cancelled immediately, even in an inconsistent state.

> +            synced = true;
> +            s->common.offset = end * BDRV_SECTOR_SIZE;
> +            should_complete = block_job_is_cancelled(&s->common);
> +            cnt = bdrv_get_dirty_count(bs);
> +        }
> +
> +        if (cnt == 0 && should_complete) {
> +            /* The dirty bitmap is not updated while operations are pending.
> +             * If we're about to exit, wait for pending operations before
> +             * calling bdrv_get_dirty_count(bs), or we may exit while the
> +             * source has dirty data to copy!
> +             *
> +             * Note that I/O can be submitted by the guest while
> +             * mirror_populate runs.
> +             */
> +            trace_mirror_before_drain(s, cnt);
> +            bdrv_drain_all();
> +            cnt = bdrv_get_dirty_count(bs);
> +        }
> +
> +        ret = 0;
> +        trace_mirror_before_sleep(s, cnt, synced);
> +        if (!synced) {
> +            /* Publish progress */
> +            s->common.offset = end * BDRV_SECTOR_SIZE - cnt * BLOCK_SIZE;
> +
> +            if (s->common.speed) {
> +                delay_ns = ratelimit_calculate_delay(&s->limit, BDRV_SECTORS_PER_DIRTY_CHUNK);
> +            } else {
> +                delay_ns = 0;
> +            }
> +
> +            /* Note that even when no rate limit is applied we need to yield
> +             * with no pending I/O here so that qemu_aio_flush() returns.
> +             */
> +            block_job_sleep_ns(&s->common, rt_clock, delay_ns);
> +            if (block_job_is_cancelled(&s->common)) {
> +                break;
> +            }
> +        } else if (!should_complete) {
> +            delay_ns = (cnt == 0 ? SLICE_TIME : 0);
> +            block_job_sleep_ns(&s->common, rt_clock, delay_ns);

Why don't we check block_job_is_cancelled() here? I can't see how
cancellation works in the second phase, except when cnt becomes 0. But
this isn't guaranteed, is it?

> +        } else if (cnt == 0) {
> +            /* The two disks are in sync.  Exit and report successful
> +             * completion.
> +             */
> +            assert(QLIST_EMPTY(&bs->tracked_requests));
> +            s->common.cancelled = false;
> +            break;
> +        }
> +    }
> +
> +immediate_exit:
> +    g_free(s->buf);
> +    bdrv_set_dirty_tracking(bs, false);
> +    bdrv_close(s->target);
> +    bdrv_delete(s->target);
> +    block_job_completed(&s->common, ret);
> +}

Kevin

* Re: [Qemu-devel] [PATCH v2 27/45] qmp: add drive-mirror command
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 27/45] qmp: add drive-mirror command Paolo Bonzini
  2012-09-27  0:14   ` Eric Blake
  2012-09-27 19:49   ` Jeff Cody
@ 2012-10-15 17:33   ` Kevin Wolf
  2012-10-16  6:39     ` Paolo Bonzini
  2012-10-18 13:13     ` Paolo Bonzini
  2 siblings, 2 replies; 102+ messages in thread
From: Kevin Wolf @ 2012-10-15 17:33 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, qemu-devel

On 26.09.2012 17:56, Paolo Bonzini wrote:
> This adds the monitor commands that start the mirroring job.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  blockdev.c       | 125 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  hmp-commands.hx  |  21 ++++++++++
>  hmp.c            |  28 +++++++++++++
>  hmp.h            |   1 +
>  qapi-schema.json |  33 +++++++++++++++
>  qmp-commands.hx  |  42 +++++++++++++++++++
>  6 files changed, 249 insertions(+), 1 deletion(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 9069ca1..722aab5 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -1118,6 +1117,130 @@ void qmp_block_stream(const char *device, bool has_base,
>      trace_qmp_block_stream(bs, bs->job);
>  }
>  
> +void qmp_drive_mirror(const char *device, const char *target,
> +                      bool has_format, const char *format,
> +                      enum MirrorSyncMode sync,
> +                      bool has_mode, enum NewImageMode mode,
> +                      bool has_speed, int64_t speed, Error **errp)
> +{
> +    BlockDriverInfo bdi;
> +    BlockDriverState *bs;
> +    BlockDriverState *source, *target_bs;
> +    BlockDriver *proto_drv;
> +    BlockDriver *drv = NULL;
> +    Error *local_err = NULL;
> +    int flags;
> +    uint64_t size;
> +    int ret;
> +
> +    if (!has_speed) {
> +        speed = 0;
> +    }
> +    if (!has_mode) {
> +        mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
> +    }
> +
> +    bs = bdrv_find(device);
> +    if (!bs) {
> +        error_set(errp, QERR_DEVICE_NOT_FOUND, device);
> +        return;
> +    }
> +
> +    if (!has_format) {
> +        format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : bs->drv->format_name;

bs->drv can be NULL for removable media. (Worth a test case?)
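
To illustrate the concern, here is a minimal, self-contained sketch of
such a guard.  The struct layouts and the helper pick_default_format()
are stand-ins invented for this example, not QEMU's real definitions;
only the shape of the check matters (test for a missing medium before
touching bs->drv):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the QEMU types; hypothetical, illustration only. */
typedef struct BlockDriver {
    const char *format_name;
} BlockDriver;

typedef struct BlockDriverState {
    BlockDriver *drv;   /* NULL when no medium is inserted */
} BlockDriverState;

/*
 * Hypothetical helper: choose the default format for the target image.
 * Returns 0 on success and stores the format in *format (NULL means
 * "probe the existing image").  Returns -1 if the source has no medium,
 * instead of dereferencing a NULL bs->drv.
 */
static int pick_default_format(const BlockDriverState *bs, int use_existing,
                               const char **format)
{
    if (!bs->drv) {
        return -1;
    }
    *format = use_existing ? NULL : bs->drv->format_name;
    return 0;
}
```

In the patch itself the equivalent would presumably be to move the
bdrv_is_inserted() check above this block, but that is only a guess at
one possible fix.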

> +    }
> +    if (format) {
> +        drv = bdrv_find_format(format);
> +        if (!drv) {
> +            error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
> +            return;
> +        }
> +    }
> +
> +    if (!bdrv_is_inserted(bs)) {
> +        error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
> +        return;
> +    }
> +
> +    if (bdrv_in_use(bs)) {
> +        error_set(errp, QERR_DEVICE_IN_USE, device);
> +        return;
> +    }
> +
> +    flags = bs->open_flags | BDRV_O_RDWR;

The two questions from last time are still open:

Jeff's patches are in now, so we can do a bdrv_reopen() to remove
BDRV_O_RDWR again when completing the mirror job.

The other thing was the throttling. It's not entirely clear to me what
our conclusion was, but you suggested removing it entirely from the
mirror because I/O throttling on the target is equivalent. Of course,
you can only throttle the target once it has a name. What was your
plan for that?

> +    source = bs->backing_hd;
> +    if (!source && sync == MIRROR_SYNC_MODE_TOP) {
> +        sync = MIRROR_SYNC_MODE_FULL;
> +    }
> +
> +    proto_drv = bdrv_find_protocol(target);
> +    if (!proto_drv) {
> +        error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);

Not a great error message for this case.

Kevin

* Re: [Qemu-devel] [PATCH v2 28/45] mirror: implement completion
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 28/45] mirror: implement completion Paolo Bonzini
@ 2012-10-15 17:49   ` Kevin Wolf
  0 siblings, 0 replies; 102+ messages in thread
From: Kevin Wolf @ 2012-10-15 17:49 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, qemu-devel

On 26.09.2012 17:56, Paolo Bonzini wrote:
> Switching to the target of the migration is done mostly asynchronously,
> and reported to management via the BLOCK_JOB_COMPLETED event; the only
> synchronous phase is opening the backing files.  bdrv_open_backing_file
> can always be done, even for migration of the full image (aka sync:
> 'full').  In this case, qmp_drive_mirror will create the target disk
> with no backing file at all, and bdrv_open_backing_file will be a no-op.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block/mirror.c | 41 ++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 36 insertions(+), 5 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 09ea020..939834d 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -32,6 +32,8 @@ typedef struct MirrorBlockJob {
>      RateLimit limit;
>      BlockDriverState *target;
>      MirrorSyncMode mode;
> +    bool synced;
> +    bool complete;

Maybe rename this to should_complete or completion_requested? For a few
seconds I thought this would indicate that the job has completed.

Kevin

* Re: [Qemu-devel] [PATCH v2 26/45] mirror: introduce mirror job
  2012-10-15 16:57   ` Kevin Wolf
@ 2012-10-16  6:36     ` Paolo Bonzini
  2012-10-16  8:24       ` Kevin Wolf
  0 siblings, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-16  6:36 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, qemu-devel

On 15/10/2012 18:57, Kevin Wolf wrote:
> On 26.09.2012 17:56, Paolo Bonzini wrote:
>> This patch adds the implementation of a new job that mirrors a disk to
>> a new image while letting the guest continue using the old image.
>> The target is treated as a "black box" and data is copied from the
>> source to the target in the background.  This can be used for several
>> purposes, including storage migration, continuous replication, and
>> observation of the guest I/O in an external program.  It is also a
>> first step in replacing the inefficient block migration code that is
>> part of QEMU.
>>
>> The job is possibly never-ending, but it is logically structured into
>> two phases: 1) copy all data as fast as possible until the target
>> first gets in sync with the source; 2) keep target in sync and
>> ensure that reopening to the target gets a correct (full) copy
>> of the source data.
>>
>> The second phase is indicated by the progress in "info block-jobs"
>> reporting the current offset to be equal to the length of the file.
>> When the job is cancelled in the second phase, QEMU will run the
>> job until the source is clean and quiescent, then it will report
>> successful completion of the job.
>>
>> In other words, the BLOCK_JOB_CANCELLED event means that the target
>> may _not_ be consistent with a past state of the source; the
>> BLOCK_JOB_COMPLETED event means that the target is consistent with
>> a past state of the source.  (Note that it could already happen
>> that management lost the race against QEMU and got a completion
>> event instead of cancellation).
>>
>> It is not yet possible to complete the job and switch over to the target
>> disk.  The next patches will fix this and add many refinements to the
>> basic idea introduced here.  These include improved error management,
>> some tunable knobs and performance optimizations.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>         v1->v2: Always "goto immediate_exit" and similar code cleanups.
>>         Error checking for bdrv_flush.  Call bdrv_set_enable_write_cache
>>         on the target to make it always writeback.
>>
>>  block/Makefile.objs |   2 +-
>>  block/mirror.c      | 234 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  block_int.h         |  20 +++++
>>  qapi-schema.json    |  17 ++++
>>  trace-events        |   7 ++
>>  5 files changed, 279 insertions(+), 1 deletion(-)
>>  create mode 100644 block/mirror.c
>>
>> diff --git a/block/Makefile.objs b/block/Makefile.objs
>> index c45affc..f1a394a 100644
>> --- a/block/Makefile.objs
>> +++ b/block/Makefile.objs
>> @@ -9,4 +9,4 @@ block-obj-$(CONFIG_LIBISCSI) += iscsi.o
>>  block-obj-$(CONFIG_CURL) += curl.o
>>  block-obj-$(CONFIG_RBD) += rbd.o
>>  
>> -common-obj-y += stream.o
>> +common-obj-y += stream.o mirror.o
>> diff --git a/block/mirror.c b/block/mirror.c
>> new file mode 100644
>> index 0000000..09ea020
>> --- /dev/null
>> +++ b/block/mirror.c
>> @@ -0,0 +1,234 @@
>> +/*
>> + * Image mirroring
>> + *
>> + * Copyright Red Hat, Inc. 2012
>> + *
>> + * Authors:
>> + *  Paolo Bonzini  <pbonzini@redhat.com>
>> + *
>> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
>> + * See the COPYING.LIB file in the top-level directory.
>> + *
>> + */
>> +
>> +#include "trace.h"
>> +#include "blockjob.h"
>> +#include "block_int.h"
>> +#include "qemu/ratelimit.h"
>> +
>> +enum {
>> +    /*
>> +     * Size of data buffer for populating the image file.  This should be large
>> +     * enough to process multiple clusters in a single call, so that populating
>> +     * contiguous regions of the image is efficient.
>> +     */
>> +    BLOCK_SIZE = 512 * BDRV_SECTORS_PER_DIRTY_CHUNK, /* in bytes */
>> +};
>> +
>> +#define SLICE_TIME 100000000ULL /* ns */
>> +
>> +typedef struct MirrorBlockJob {
>> +    BlockJob common;
>> +    RateLimit limit;
>> +    BlockDriverState *target;
>> +    MirrorSyncMode mode;
>> +    int64_t sector_num;
>> +    uint8_t *buf;
>> +} MirrorBlockJob;
>> +
>> +static int coroutine_fn mirror_iteration(MirrorBlockJob *s)
>> +{
>> +    BlockDriverState *source = s->common.bs;
>> +    BlockDriverState *target = s->target;
>> +    QEMUIOVector qiov;
>> +    int ret, nb_sectors;
>> +    int64_t end;
>> +    struct iovec iov;
>> +
>> +    end = s->common.len >> BDRV_SECTOR_BITS;
>> +    s->sector_num = bdrv_get_next_dirty(source, s->sector_num);
>> +    nb_sectors = MIN(BDRV_SECTORS_PER_DIRTY_CHUNK, end - s->sector_num);
>> +    bdrv_reset_dirty(source, s->sector_num, nb_sectors);
>> +
>> +    /* Copy the dirty cluster.  */
>> +    iov.iov_base = s->buf;
>> +    iov.iov_len  = nb_sectors * 512;
>> +    qemu_iovec_init_external(&qiov, &iov, 1);
>> +
>> +    trace_mirror_one_iteration(s, s->sector_num, nb_sectors);
>> +    ret = bdrv_co_readv(source, s->sector_num, nb_sectors, &qiov);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    return bdrv_co_writev(target, s->sector_num, nb_sectors, &qiov);
>> +}
>> +
>> +static void coroutine_fn mirror_run(void *opaque)
>> +{
>> +    MirrorBlockJob *s = opaque;
>> +    BlockDriverState *bs = s->common.bs;
>> +    int64_t sector_num, end;
>> +    int ret = 0;
>> +    int n;
>> +    bool synced = false;
>> +
>> +    if (block_job_is_cancelled(&s->common)) {
>> +        goto immediate_exit;
>> +    }
>> +
>> +    s->common.len = bdrv_getlength(bs);
>> +    if (s->common.len < 0) {
>> +        block_job_completed(&s->common, s->common.len);
>> +        return;
>> +    }
>> +
>> +    end = s->common.len >> BDRV_SECTOR_BITS;
>> +    s->buf = qemu_blockalign(bs, BLOCK_SIZE);
>> +
>> +    if (s->mode != MIRROR_SYNC_MODE_NONE) {
>> +        /* First part, loop on the sectors and initialize the dirty bitmap.  */
>> +        BlockDriverState *base;
>> +        base = s->mode == MIRROR_SYNC_MODE_FULL ? NULL : bs->backing_hd;
>> +        for (sector_num = 0; sector_num < end; ) {
>> +            int64_t next = (sector_num | (BDRV_SECTORS_PER_DIRTY_CHUNK - 1)) + 1;
>> +            ret = bdrv_co_is_allocated_above(bs, base,
>> +                                             sector_num, next - sector_num, &n);
>> +
>> +            if (ret < 0) {
>> +                goto immediate_exit;
>> +            }
>> +
>> +            assert(n > 0);
>> +            if (ret == 1) {
>> +                bdrv_set_dirty(bs, sector_num, n);
>> +                sector_num = next;
>> +            } else {
>> +                sector_num += n;
>> +            }
>> +        }
>> +    }
>> +
>> +    s->sector_num = -1;
>> +    for (;;) {
>> +        uint64_t delay_ns;
>> +        int64_t cnt;
>> +        bool should_complete;
>> +
>> +        cnt = bdrv_get_dirty_count(bs);
>> +        if (cnt != 0) {
>> +            ret = mirror_iteration(s);
>> +            if (ret < 0) {
>> +                goto immediate_exit;
>> +            }
>> +            cnt = bdrv_get_dirty_count(bs);
>> +        }
>> +
>> +        should_complete = false;
>> +        if (cnt == 0) {
>> +            trace_mirror_before_flush(s);
>> +            if (bdrv_flush(s->target) < 0) {
>> +                goto immediate_exit;
>> +            }
> 
> Are you sure that we should signal successful completion when
> bdrv_flush() fails?

Hmm, of course not.
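
For reference, a minimal sketch of the problem and of one possible fix.
The names fake_flush(), run_job_buggy() and run_job_fixed() are
hypothetical stand-ins, not QEMU functions; the sketch only assumes that
the flush returns 0 or a negative errno and that the exit path reports
whatever is in ret:

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for bdrv_flush(): returns 0 or a negative errno.  Hypothetical. */
static int fake_flush(int fail)
{
    return fail ? -EIO : 0;
}

/* Shape of the code under review: the error is tested but never stored,
 * so the exit path still sees ret == 0 and reports success. */
static int run_job_buggy(int flush_fails)
{
    int ret = 0;

    if (fake_flush(flush_fails) < 0) {
        goto immediate_exit;
    }
immediate_exit:
    return ret;
}

/* Possible fix: keep the return value so the exit path reports it. */
static int run_job_fixed(int flush_fails)
{
    int ret;

    ret = fake_flush(flush_fails);
    if (ret < 0) {
        goto immediate_exit;
    }
immediate_exit:
    return ret;
}
```

With a failing flush the first variant returns 0 and the second returns
-EIO, which is the value the job should be completed with.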

>> +
>> +            /* We're out of the streaming phase.  From now on, if the job
>> +             * is cancelled we will actually complete all pending I/O and
>> +             * report completion.  This way, block-job-cancel will leave
>> +             * the target in a consistent state.
>> +             */
> 
> Don't we have block_job_complete() for that now? Then I think the job
> can be cancelled immediately, even in an inconsistent state.

The idea was that block-job-cancel will still leave the target in a
consistent state if executed during the second phase.  Otherwise it is
impossible to take a consistent snapshot and keep running on the first
image.

>> +            synced = true;
>> +            s->common.offset = end * BDRV_SECTOR_SIZE;
>> +            should_complete = block_job_is_cancelled(&s->common);
>> +            cnt = bdrv_get_dirty_count(bs);
>> +        }
>> +
>> +        if (cnt == 0 && should_complete) {
>> +            /* The dirty bitmap is not updated while operations are pending.
>> +             * If we're about to exit, wait for pending operations before
>> +             * calling bdrv_get_dirty_count(bs), or we may exit while the
>> +             * source has dirty data to copy!
>> +             *
>> +             * Note that I/O can be submitted by the guest while
>> +             * mirror_populate runs.
>> +             */
>> +            trace_mirror_before_drain(s, cnt);
>> +            bdrv_drain_all();
>> +            cnt = bdrv_get_dirty_count(bs);
>> +        }
>> +
>> +        ret = 0;
>> +        trace_mirror_before_sleep(s, cnt, synced);
>> +        if (!synced) {
>> +            /* Publish progress */
>> +            s->common.offset = end * BDRV_SECTOR_SIZE - cnt * BLOCK_SIZE;
>> +
>> +            if (s->common.speed) {
>> +                delay_ns = ratelimit_calculate_delay(&s->limit, BDRV_SECTORS_PER_DIRTY_CHUNK);
>> +            } else {
>> +                delay_ns = 0;
>> +            }
>> +
>> +            /* Note that even when no rate limit is applied we need to yield
>> +             * with no pending I/O here so that qemu_aio_flush() returns.
>> +             */
>> +            block_job_sleep_ns(&s->common, rt_clock, delay_ns);
>> +            if (block_job_is_cancelled(&s->common)) {
>> +                break;
>> +            }
>> +        } else if (!should_complete) {
>> +            delay_ns = (cnt == 0 ? SLICE_TIME : 0);
>> +            block_job_sleep_ns(&s->common, rt_clock, delay_ns);
> 
> Why don't we check block_job_is_cancelled() here? I can't see how
> cancellation works in the second phase, except when cnt becomes 0.

Indeed cancellation requires consistency (and hence cnt == 0) in the
second phase.

> But this isn't guaranteed, is it?

Not guaranteed, but in practice it works and you can always throttle
writes on the source to guarantee that it does.
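
As a toy model of the behaviour discussed here (not QEMU code: the
function simulate_mirror(), its parameters and the JobResult enum are
invented for illustration), one chunk is copied per iteration and the
guest dirties guest_rate chunks while the job sleeps:

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { JOB_CANCELLED, JOB_COMPLETED, JOB_STILL_RUNNING } JobResult;

/* dirty: dirty chunks at start; cancel_at: iteration at which the cancel
 * request arrives; max_iters: bound so the model always terminates. */
static JobResult simulate_mirror(int dirty, int guest_rate,
                                 int cancel_at, int max_iters)
{
    bool synced = false;
    bool cancelled = false;

    for (int iter = 0; iter < max_iters; iter++) {
        bool should_complete = false;

        if (iter == cancel_at) {
            cancelled = true;           /* block-job-cancel arrives */
        }
        if (dirty > 0) {
            dirty--;                    /* one mirror_iteration() */
        }
        if (dirty == 0) {
            synced = true;              /* second phase from now on */
            should_complete = cancelled;
        }

        if (!synced) {
            if (cancelled) {
                return JOB_CANCELLED;   /* first phase: stop at once */
            }
        } else if (should_complete) {
            return JOB_COMPLETED;       /* in sync: report completion */
        }
        /* Second phase with dirty chunks left: no cancellation check. */

        dirty += guest_rate;            /* guest writes while we sleep */
    }
    return JOB_STILL_RUNNING;
}
```

In this model a cancel during the first phase ends the job immediately,
a cancel in the second phase is reported as completion once the dirty
count is zero, and a guest that writes faster than the job copies keeps
the job running until its writes are throttled.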

>> +        } else if (cnt == 0) {
>> +            /* The two disks are in sync.  Exit and report successful
>> +             * completion.
>> +             */
>> +            assert(QLIST_EMPTY(&bs->tracked_requests));
>> +            s->common.cancelled = false;
>> +            break;
>> +        }
>> +    }
>> +
>> +immediate_exit:
>> +    g_free(s->buf);
>> +    bdrv_set_dirty_tracking(bs, false);
>> +    bdrv_close(s->target);
>> +    bdrv_delete(s->target);
>> +    block_job_completed(&s->common, ret);
>> +}
> 
> Kevin
> 

* Re: [Qemu-devel] [PATCH v2 27/45] qmp: add drive-mirror command
  2012-10-15 17:33   ` Kevin Wolf
@ 2012-10-16  6:39     ` Paolo Bonzini
  2012-10-18 13:13     ` Paolo Bonzini
  1 sibling, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-16  6:39 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, qemu-devel

On 15/10/2012 19:33, Kevin Wolf wrote:
>> > +
>> > +    flags = bs->open_flags | BDRV_O_RDWR;
> The two questions from last time are still open:
> 
> Jeff's patches are in now, so we can do a bdrv_reopen() to remove
> BDRV_O_RDWR again when completing the mirror job.

It's not a big change, so I'll add this to v3.

> The other thing was the throttling. It's not entirely clear to me what
> our conclusion was, but you suggested removing it entirely from the
> mirror because I/O throttling on the target is equivalent.

Yes, that was because at the time you were suggesting giving the target
a name.  In the end that had too many problems, so I left that part out
until Markus finishes his refactoring.

Paolo

* Re: [Qemu-devel] [PATCH v2 26/45] mirror: introduce mirror job
  2012-10-16  6:36     ` Paolo Bonzini
@ 2012-10-16  8:24       ` Kevin Wolf
  2012-10-16  8:35         ` Paolo Bonzini
  0 siblings, 1 reply; 102+ messages in thread
From: Kevin Wolf @ 2012-10-16  8:24 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, qemu-devel

On 16.10.2012 08:36, Paolo Bonzini wrote:
> On 15/10/2012 18:57, Kevin Wolf wrote:
>> On 26.09.2012 17:56, Paolo Bonzini wrote:
>>> +
>>> +            /* We're out of the streaming phase.  From now on, if the job
>>> +             * is cancelled we will actually complete all pending I/O and
>>> +             * report completion.  This way, block-job-cancel will leave
>>> +             * the target in a consistent state.
>>> +             */
>>
>> Don't we have block_job_complete() for that now? Then I think the job
>> can be cancelled immediately, even in an inconsistent state.
> 
> The idea was that block-job-cancel will still leave the target in a
> consistent state if executed during the second phase.  Otherwise it is
> impossible to take a consistent snapshot and keep running on the first
> image.

Yes, I noticed that when reading one of the following patches. However,
this behaviour didn't seem to be documented very well. IIRC, you do
mention it in the QMP documentation for block-job-complete, but wouldn't
it make sense to describe what cancel/complete mean in the documentation
for drive-mirror as well?

I'd also consider putting a comment in the code that explicitly says
that we intentionally wait for a consistent state before actually
cancelling. This is not the intuitive thing to do with cancel, so it
confused me.

Kevin

* Re: [Qemu-devel] [PATCH v2 26/45] mirror: introduce mirror job
  2012-10-16  8:24       ` Kevin Wolf
@ 2012-10-16  8:35         ` Paolo Bonzini
  0 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-16  8:35 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, qemu-devel

On 16/10/2012 10:24, Kevin Wolf wrote:
>> > The idea was that block-job-cancel will still leave the target in a
>> > consistent state if executed during the second phase.  Otherwise it is
>> > impossible to take a consistent snapshot and keep running on the first
>> > image.
> Yes, I noticed that when reading one of the following patches. However,
> this behaviour didn't seem to be documented very well. IIRC, you do
> mention it in the QMP documentation for block-job-complete, but wouldn't
> it make sense to describe what cancel/complete mean in the documentation
> for drive-mirror as well?
> 
> I'd also consider putting a comment in the code that explicitly says
> that we intentionally wait for a consistent state before actually
> cancelling. This is not the intuitive thing to do with cancel, so it
> confused me.

Ok, I'll add comments.

Paolo

* Re: [Qemu-devel] [PATCH v2 29/45] qemu-iotests: add mirroring test case
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 29/45] qemu-iotests: add mirroring test case Paolo Bonzini
  2012-09-27  0:26   ` Eric Blake
@ 2012-10-18 12:43   ` Kevin Wolf
  2012-10-18 12:50     ` Paolo Bonzini
  1 sibling, 1 reply; 102+ messages in thread
From: Kevin Wolf @ 2012-10-18 12:43 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, qemu-devel

On 26.09.2012 17:56, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  tests/qemu-iotests/040     | 353 +++++++++++++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/040.out |   5 +
>  tests/qemu-iotests/group   |   3 +-
>  3 files changed, 360 insertions(+), 1 deletion(-)
>  create mode 100755 tests/qemu-iotests/040
>  create mode 100644 tests/qemu-iotests/040.out

How about another case for cancelling while the image isn't ready yet?

Also, 040 is already taken. Should this become 041 then? Or if I missed
the real 041, this would have to become 044.

Kevin

* Re: [Qemu-devel] [PATCH v2 29/45] qemu-iotests: add mirroring test case
  2012-10-18 12:43   ` Kevin Wolf
@ 2012-10-18 12:50     ` Paolo Bonzini
  2012-10-18 13:08       ` Kevin Wolf
  0 siblings, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-18 12:50 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, qemu-devel

On 18/10/2012 14:43, Kevin Wolf wrote:
>> >  tests/qemu-iotests/040     | 353 +++++++++++++++++++++++++++++++++++++++++++++
>> >  tests/qemu-iotests/040.out |   5 +
>> >  tests/qemu-iotests/group   |   3 +-
>> >  3 files changed, 360 insertions(+), 1 deletion(-)
>> >  create mode 100755 tests/qemu-iotests/040
>> >  create mode 100644 tests/qemu-iotests/040.out
> How about another case for cancelling while the image isn't ready yet?

Sure.

> Also, 040 is already taken. Should this become 041 then? Or if I missed
> the real 041, this would have to become 044.

Yes, this is already 041 in my tree.

Should I submit the next batch up to here, or do you want to wait until 33?

Paolo

* Re: [Qemu-devel] [PATCH v2 31/45] mirror: add support for on-source-error/on-target-error
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 31/45] mirror: add support for on-source-error/on-target-error Paolo Bonzini
@ 2012-10-18 13:07   ` Kevin Wolf
  2012-10-18 13:10     ` Paolo Bonzini
  0 siblings, 1 reply; 102+ messages in thread
From: Kevin Wolf @ 2012-10-18 13:07 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, qemu-devel

On 26.09.2012 17:56, Paolo Bonzini wrote:
> Error management is important for mirroring; otherwise, an error on the
> target (even something as "innocent" as ENOSPC) requires starting again
> with a full copy.  Similar to on_read_error/on_write_error, two separate
> knobs are provided for on_source_error (reads) and on_target_error (writes).
> The default is 'report' for both.
> 
> The 'ignore' policy will leave the sector dirty, so that it will be
> retried later.  Thus, it will not cause corruption.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>         v1->v2: error handling for bdrv_flush, introduce mirror_error_action
> 
>  block/mirror.c   | 95 +++++++++++++++++++++++++++++++++++++++++++-------------
>  block_int.h      |  4 +++
>  blockdev.c       | 14 +++++++--
>  hmp.c            |  3 +-
>  qapi-schema.json | 11 ++++++-
>  qmp-commands.hx  |  8 ++++-
>  6 files changed, 109 insertions(+), 26 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 939834d..caec272 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -32,13 +32,28 @@ typedef struct MirrorBlockJob {
>      RateLimit limit;
>      BlockDriverState *target;
>      MirrorSyncMode mode;
> +    BlockdevOnError on_source_error, on_target_error;
>      bool synced;
>      bool complete;
>      int64_t sector_num;
>      uint8_t *buf;
>  } MirrorBlockJob;
>  
> -static int coroutine_fn mirror_iteration(MirrorBlockJob *s)
> +static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
> +                                            int error)
> +{
> +    s->synced = false;
> +    if (read) {
> +        return block_job_error_action(&s->common, s->common.bs,
> +                                      s->on_source_error, true, error);
> +    } else {
> +        return block_job_error_action(&s->common, s->target,
> +                                      s->on_target_error, false, error);

Here we produce an event that reports an error on s->bs, i.e. on the
source, even though the error was on the target. This makes some sense
today, while the target doesn't have a name, but once it has one, we had
better use the target name here.

Can we change this later on? If not, what's the way forward?

Kevin

* Re: [Qemu-devel] [PATCH v2 29/45] qemu-iotests: add mirroring test case
  2012-10-18 12:50     ` Paolo Bonzini
@ 2012-10-18 13:08       ` Kevin Wolf
  0 siblings, 0 replies; 102+ messages in thread
From: Kevin Wolf @ 2012-10-18 13:08 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, qemu-devel

On 18.10.2012 14:50, Paolo Bonzini wrote:
> On 18/10/2012 14:43, Kevin Wolf wrote:
>>>>  tests/qemu-iotests/040     | 353 +++++++++++++++++++++++++++++++++++++++++++++
>>>>  tests/qemu-iotests/040.out |   5 +
>>>>  tests/qemu-iotests/group   |   3 +-
>>>>  3 files changed, 360 insertions(+), 1 deletion(-)
>>>>  create mode 100755 tests/qemu-iotests/040
>>>>  create mode 100644 tests/qemu-iotests/040.out
>> How about another case for cancelling while the image isn't ready yet?
> 
> Sure.
> 
>> Also, 040 is already taken. Should this become 041 then? Or if I missed
>> the real 041, this would have to become 044.
> 
> Yes, this is already 041 in my tree.
> 
> Should I submit the next batch up to here, or do you want to wait until 33?

I've just completed review of 33/45, so you can submit up to 33.

Kevin


* Re: [Qemu-devel] [PATCH v2 31/45] mirror: add support for on-source-error/on-target-error
  2012-10-18 13:07   ` Kevin Wolf
@ 2012-10-18 13:10     ` Paolo Bonzini
  2012-10-18 13:56       ` Kevin Wolf
  0 siblings, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-18 13:10 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, qemu-devel

On 18/10/2012 15:07, Kevin Wolf wrote:
>> > +    s->synced = false;
>> > +    if (read) {
>> > +        return block_job_error_action(&s->common, s->common.bs,
>> > +                                      s->on_source_error, true, error);
>> > +    } else {
>> > +        return block_job_error_action(&s->common, s->target,
>> > +                                      s->on_target_error, false, error);
> Here we produce an event that reports an error on s->bs, i.e. on the
> source, even though the error was on the target.

More precisely, this is an event that reports an error on s->bs's job.
In principle there is no reason why asynchronous long-running operations
are tied to a block device (in fact migration fits the definition quite
well, with the only twist that the VM is stopped at the end), but that's
the API we're stuck with.

> This makes some sense
> today, while the target doesn't have a name, but once it has one, we had
> better use the target name here.
> 
> Can we change this later on? If not, what's the way forward?

Yes, we can change it to one of these:

1) produce both a BLOCK_JOB_ERROR event on the source and a
BLOCK_IO_ERROR event on the target;

2) add a "device" argument to the BLOCK_JOB_ERROR and fill it.

I think I prefer the latter, but it can be discussed separately.

Paolo


* Re: [Qemu-devel] [PATCH v2 27/45] qmp: add drive-mirror command
  2012-10-15 17:33   ` Kevin Wolf
  2012-10-16  6:39     ` Paolo Bonzini
@ 2012-10-18 13:13     ` Paolo Bonzini
  1 sibling, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-18 13:13 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, qemu-devel

On 15/10/2012 19:33, Kevin Wolf wrote:
> 
>> > +    source = bs->backing_hd;
>> > +    if (!source && sync == MIRROR_SYNC_MODE_TOP) {
>> > +        sync = MIRROR_SYNC_MODE_FULL;
>> > +    }
>> > +
>> > +    proto_drv = bdrv_find_protocol(target);
>> > +    if (!proto_drv) {
>> > +        error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
> Not a great error message for this case.

Hmm, this is cut-and-paste from qmp_transaction.

I will add an Error ** to bdrv_find_format and bdrv_find_protocol when I
have some time (certainly before 1.3).

Paolo


* Re: [Qemu-devel] [PATCH v2 31/45] mirror: add support for on-source-error/on-target-error
  2012-10-18 13:10     ` Paolo Bonzini
@ 2012-10-18 13:56       ` Kevin Wolf
  2012-10-18 14:52         ` Paolo Bonzini
  0 siblings, 1 reply; 102+ messages in thread
From: Kevin Wolf @ 2012-10-18 13:56 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, qemu-devel

On 18.10.2012 15:10, Paolo Bonzini wrote:
> On 18/10/2012 15:07, Kevin Wolf wrote:
>>>> +    s->synced = false;
>>>> +    if (read) {
>>>> +        return block_job_error_action(&s->common, s->common.bs,
>>>> +                                      s->on_source_error, true, error);
>>>> +    } else {
>>>> +        return block_job_error_action(&s->common, s->target,
>>>> +                                      s->on_target_error, false, error);
>> Here we produce an event that reports an error on s->bs, i.e. on the
>> source, even though the error was on the target.
> 
> More precisely, this is an event that reports an error on s->bs's job.
> In principle there is no reason why asynchronous long-running operations
> are tied to a block device (in fact migration fits the definition quite
> well, with the only twist that the VM is stopped at the end), but that's
> the API we're stuck with.

Yes, I think I have already mentioned more than once that it shouldn't be
a block job, but a background job without a reference to a (single)
BlockDriverState. What we have just doesn't make any sense - even for
block jobs, because block jobs working on a single BDS are the
exception, not the rule.

I should probably have tried to fix this when I first mentioned it, but
too many incoming patches prevent me from making any changes myself...

>> This makes some sense
>> today, while the target doesn't have a name, but once it has one, we had
>> better use the target name here.
>>
>> Can we change this later on? If not, what's the way forward?
> 
> Yes, we can change it to one of these:
> 
> 1) produce both a BLOCK_JOB_ERROR event on the source and a
> BLOCK_IO_ERROR event on the target;
> 
> 2) add a "device" argument to the BLOCK_JOB_ERROR and fill it.
> 
> I think I prefer the latter, but it can be discussed separately.

I already hate it again. But yeah, we can muddle through somehow, not a
blocker at this moment.

Kevin


* Re: [Qemu-devel] [PATCH v2 31/45] mirror: add support for on-source-error/on-target-error
  2012-10-18 13:56       ` Kevin Wolf
@ 2012-10-18 14:52         ` Paolo Bonzini
  2012-10-19  8:04           ` Kevin Wolf
  0 siblings, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-18 14:52 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, qemu-devel

On 18/10/2012 15:56, Kevin Wolf wrote:
> On 18.10.2012 15:10, Paolo Bonzini wrote:
>> On 18/10/2012 15:07, Kevin Wolf wrote:
>>>>> +    s->synced = false;
>>>>> +    if (read) {
>>>>> +        return block_job_error_action(&s->common, s->common.bs,
>>>>> +                                      s->on_source_error, true, error);
>>>>> +    } else {
>>>>> +        return block_job_error_action(&s->common, s->target,
>>>>> +                                      s->on_target_error, false, error);
>>> Here we produce an event that reports an error on s->bs, i.e. on the
>>> source, even though the error was on the target.
>>
>> More precisely, this is an event that reports an error on s->bs's job.
>> In principle there is no reason why asynchronous long-running operations
>> are tied to a block device (in fact migration fits the definition quite
>> well, with the only twist that the VM is stopped at the end), but that's
>> the API we're stuck with.
> 
> Yes, I think I have already mentioned more than once that it shouldn't be
> a block job, but a background job without a reference to a (single)
> BlockDriverState. What we have just doesn't make any sense - even for
> block jobs, because block jobs working on a single BDS are the
> exception, not the rule.

I'm quite at a loss with how to change this without breaking the API. :/

Unfortunately this came up after the first release with streaming.

Paolo


* Re: [Qemu-devel] [PATCH v2 31/45] mirror: add support for on-source-error/on-target-error
  2012-10-18 14:52         ` Paolo Bonzini
@ 2012-10-19  8:04           ` Kevin Wolf
  2012-10-19  9:30             ` Paolo Bonzini
  0 siblings, 1 reply; 102+ messages in thread
From: Kevin Wolf @ 2012-10-19  8:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, qemu-devel

On 18.10.2012 16:52, Paolo Bonzini wrote:
> On 18/10/2012 15:56, Kevin Wolf wrote:
>> On 18.10.2012 15:10, Paolo Bonzini wrote:
>>> On 18/10/2012 15:07, Kevin Wolf wrote:
>>>>>> +    s->synced = false;
>>>>>> +    if (read) {
>>>>>> +        return block_job_error_action(&s->common, s->common.bs,
>>>>>> +                                      s->on_source_error, true, error);
>>>>>> +    } else {
>>>>>> +        return block_job_error_action(&s->common, s->target,
>>>>>> +                                      s->on_target_error, false, error);
>>>> Here we produce an event that reports an error on s->bs, i.e. on the
>>>> source, even though the error was on the target.
>>>
>>> More precisely, this is an event that reports an error on s->bs's job.
>>> In principle there is no reason why asynchronous long-running operations
>>> are tied to a block device (in fact migration fits the definition quite
>>> well, with the only twist that the VM is stopped at the end), but that's
>>> the API we're stuck with.
>>
>> Yes, I think I have already mentioned more than once that it shouldn't be
>> a block job, but a background job without a reference to a (single)
>> BlockDriverState. What we have just doesn't make any sense - even for
>> block jobs, because block jobs working on a single BDS are the
>> exception, not the rule.
> 
> I'm quite at a loss with how to change this without breaking the API. :/
> 
> Unfortunately this came up after the first release with streaming.

Then let's break the API. Not immediately, I think we can keep some
useless compatibility fields in the implementation of background jobs
that would only be needed to allow the block job commands to be a
wrapper (mostly 'bool is_block_job' and 'BlockDriverState bs', I think;
maybe even just char* bs_name would be enough). Then deprecate block
jobs and at 1.6 or so remove them.

Kevin


* Re: [Qemu-devel] [PATCH v2 31/45] mirror: add support for on-source-error/on-target-error
  2012-10-19  8:04           ` Kevin Wolf
@ 2012-10-19  9:30             ` Paolo Bonzini
  0 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-19  9:30 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, qemu-devel


> Then let's break the API. Not immediately, I think we can keep some
> useless compatibility fields in the implementation of background jobs
> that would only be needed to allow the block job commands to be a
> wrapper (mostly 'bool is_block_job' and 'BlockDriverState bs', I think;
> maybe even just char* bs_name would be enough). Then deprecate block
> jobs and at 1.6 or so remove them.

That's a plan.  I promise to send fewer patches starting at the next release
cycle!

Paolo


* Re: [Qemu-devel] [PATCH v2 35/45] add hierarchical bitmap data type and test cases
  2012-09-26 15:56 ` [Qemu-devel] [PATCH v2 35/45] add hierarchical bitmap data type and test cases Paolo Bonzini
  2012-09-27  2:53   ` Eric Blake
@ 2012-10-24 14:41   ` Kevin Wolf
  2012-10-24 14:50     ` Paolo Bonzini
  1 sibling, 1 reply; 102+ messages in thread
From: Kevin Wolf @ 2012-10-24 14:41 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jcody, qemu-devel

On 26.09.2012 17:56, Paolo Bonzini wrote:
> HBitmap provides an array of bits.  The bits are stored as usual in an
> array of unsigned longs, but HBitmap is also optimized to provide fast
> iteration over set bits; going from one bit to the next is O(logB n)
> worst case, with B = sizeof(long) * CHAR_BIT: the result is low enough
> that the number of levels is in fact fixed.
> 
> In order to do this, it stacks multiple bitmaps with progressively coarser
> granularity; in all levels except the last, bit N is set iff the N-th
> unsigned long is nonzero in the immediately next level.  When iteration
> completes on the last level it can examine the 2nd-last level to quickly
> skip entire words, and even do so recursively to skip blocks of 64 words or
> powers thereof (32 on 32-bit machines).
> 
> Given an index in the bitmap, it can be split into groups of bits like
> this (for the 64-bit case):
> 
>      bits 0-57 => word in the last bitmap     | bits 58-63 => bit in the word
>      bits 0-51 => word in the 2nd-last bitmap | bits 52-57 => bit in the word
>      bits 0-45 => word in the 3rd-last bitmap | bits 46-51 => bit in the word
> 
> So it is easy to move up simply by shifting the index right by
> log2(BITS_PER_LONG) bits.  To move down, you shift the index left
> similarly, and add the word index within the group.  Iteration uses
> ffs (find first set bit) to find the next word to examine; this
> operation can be done in constant time in most current architectures.
> 
> When setting or clearing a range of m bits on all levels, the work to
> perform is O(m + m/W + m/W^2 + ...), where W is the number of bits per
> word; this is O(m), like on a regular bitmap.
> 
> When iterating on a bitmap, each bit (on any level) is only visited
> once.  Hence, the total cost of visiting a bitmap with m bits in it is
> the number of bits that are set in all bitmaps.  Unless the bitmap is
> extremely sparse, this is also O(m + m/W + m/W^2 + ...), so the amortized
> cost of advancing from one bit to the next is usually constant.
> 
> Reviewed-by: Laszlo Ersek <lersek@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
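
For readers following along, the layering described in the commit message
can be sketched with just two levels.  This is a simplified illustration
and not the code from the patch: the Sketch* names are made up, the word
size is fixed at 64 bits instead of using unsigned long, and a portable
loop stands in for ffsl.  A summary word has bit i set iff leaf word i is
nonzero, so the iterator can skip empty leaf words in one step.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_WORDS 64                 /* leaf words; hypothetical size */
#define SKETCH_BITS  (SKETCH_WORDS * 64)

typedef struct {
    uint64_t summary;                   /* bit i set iff leaf[i] != 0 */
    uint64_t leaf[SKETCH_WORDS];        /* the actual bits */
} SketchBitmap;

static void sketch_init(SketchBitmap *b)
{
    memset(b, 0, sizeof(*b));
}

static void sketch_set(SketchBitmap *b, unsigned bit)
{
    assert(bit < SKETCH_BITS);
    b->leaf[bit >> 6] |= UINT64_C(1) << (bit & 63);
    /* Moving up one level is a right shift by log2(64) = 6. */
    b->summary |= UINT64_C(1) << (bit >> 6);
}

/* Index of the lowest set bit; the word must be nonzero. */
static unsigned sketch_ctz(uint64_t word)
{
    unsigned n = 0;

    assert(word != 0);
    while (!(word & 1)) {
        word >>= 1;
        n++;
    }
    return n;
}

/* Return the first set bit at position >= start, or -1 if there is none. */
static int sketch_next(const SketchBitmap *b, unsigned start)
{
    unsigned w;
    uint64_t cur;

    if (start >= SKETCH_BITS) {
        return -1;
    }
    w = start >> 6;
    cur = b->leaf[w] & (~UINT64_C(0) << (start & 63));
    if (cur == 0) {
        /* Nothing left in this word: use the summary to skip empty words. */
        uint64_t up = (w == 63) ? 0 : b->summary & (~UINT64_C(0) << (w + 1));

        if (up == 0) {
            return -1;
        }
        w = sketch_ctz(up);
        cur = b->leaf[w];
    }
    /* Moving down: shift the word index left and add the bit within it. */
    return (int)((w << 6) + sketch_ctz(cur));
}
```

With bits 5, 700 and 4095 set, calling sketch_next() from 6 lands on 700
after a single look at the summary word, without touching leaf words 1-9.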

I'll continue (or actually only start really) reviewing this tomorrow,
but let me ask one question now so that I won't forget it:

> +struct HBitmapIter {
> +    const HBitmap *hb;
> +
> +    /* Copied from hb for access in the inline functions (hb is opaque).  */
> +    int granularity;
> +
> +    /* Entry offset into the last-level array of longs.  */
> +    size_t pos;

Other places, for example HBitmap.size/count and the local pos variable
in hbitmap_iter_skip_words(), use uint64_t. Why is size_t here enough?

> +
> +    /* The currently-active path in the tree.  Each item of cur[i] stores
> +     * the bits (i.e. the subtrees) yet to be processed under that node.
> +     */
> +    unsigned long cur[HBITMAP_LEVELS];
> +};

Kevin


* Re: [Qemu-devel] [PATCH v2 35/45] add hierarchical bitmap data type and test cases
  2012-10-24 14:41   ` Kevin Wolf
@ 2012-10-24 14:50     ` Paolo Bonzini
  0 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-24 14:50 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, qemu-devel

On 24/10/2012 16:41, Kevin Wolf wrote:
>> +struct HBitmapIter {
>> +    const HBitmap *hb;
>> +
>> +    /* Copied from hb for access in the inline functions (hb is opaque).  */
>> +    int granularity;
>> +
>> +    /* Entry offset into the last-level array of longs.  */
>> +    size_t pos;
> 
> Other places, for example HBitmap.size/count and the local pos variable
> in hbitmap_iter_skip_words(), use uint64_t. Why is size_t here enough?

size_t is enough if it is a word position.  uint64_t is necessary if it
is a bit position.  This line of hbitmap_iter_next explains it well:

    item = ((uint64_t)hbi->pos << BITS_PER_LEVEL) + ffsl(cur) - 1;

Here the word position hbi->pos is converted to a bit position, so we
need to convert size_t to uint64_t.

In fact, hbitmap_iter_skip_words could indeed use a size_t.  It uses
uint64_t because the line above used to be in hbitmap_iter_skip_words,
and used pos instead of hbi->pos.  With that code, using uint64_t saved
a cast.
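
As a small standalone illustration of why the cast matters (a sketch only,
not code from the patch: the sketch_* helpers are made up, 64-bit words are
assumed so that the shift is 6, and uint32_t stands in for a 32-bit size_t):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BITS_PER_LEVEL 6     /* log2(64), assuming 64-bit words */

/* Widen the word position first, then shift: no bits are lost. */
static uint64_t sketch_word_to_bit(size_t word_pos, unsigned bit_in_word)
{
    assert(bit_in_word < (1u << SKETCH_BITS_PER_LEVEL));
    return ((uint64_t)word_pos << SKETCH_BITS_PER_LEVEL) + bit_in_word;
}

/* Same computation with the shift result kept in 32 bits, as it would be
 * with a 32-bit size_t and no cast: the high bits are dropped before the
 * result is widened. */
static uint64_t sketch_word_to_bit_narrow(uint32_t word_pos,
                                          unsigned bit_in_word)
{
    assert(bit_in_word < (1u << SKETCH_BITS_PER_LEVEL));
    return (uint32_t)(word_pos << SKETCH_BITS_PER_LEVEL) + bit_in_word;
}
```

For word position 0x04000000 and bit 3 the first helper returns
0x100000003, while the narrow variant wraps around and returns 3.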

Paolo

> 
>> +
>> +    /* The currently-active path in the tree.  Each item of cur[i] stores
>> +     * the bits (i.e. the subtrees) yet to be processed under that node.
>> +     */
>> +    unsigned long cur[HBITMAP_LEVELS];
>> +};
> 
> Kevin
> 

